Adapting Vision-Language Models for E-commerce Understanding at Scale - Explained Simply | ArXiv Explained