Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models - Explained Simply | ArXiv Explained