Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection - Explained Simply | ArXiv Explained