SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild? - Explained Simply | ArXiv Explained