Urban Socio-Semantic Segmentation with Vision-Language Reasoning - Explained Simply | ArXiv Explained