More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models - Explained Simply | ArXiv Explained