OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments - Explained Simply