MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing - Explained Simply
