MiniCPM-V

Exceptional Benchmark Performance: State-of-the-art vision-language performance with an average score of 77.0 on OpenCompass, surpassing larger models such as GPT-4o-latest and Gemini-2.0 Pro
Revolutionary Video Processing: Efficient video understanding using a unified 3D-Resampler that compresses video tokens by a factor of 96, enabling high-FPS processing at up to 10 frames per second
Flexible Reasoning Modes: A controllable hybrid of fast and deep thinking modes, so users can switch between quick responses and complex reasoning
Advanced Text Recognition: Strong OCR and document parsing that handles high-resolution images of up to 1.8 million pixels, achieving leading scores on OCRBench and OmniDocBench
Versatile Platform Support: Easy deployment across platforms, with llama.cpp and Ollama support, 16 quantized model sizes, SGLang and vLLM integration, fine-tuning options, a WebUI demo, an iOS app, and an online web demo
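
To give a sense of what the 96x video token compression listed above means in practice, here is a rough back-of-the-envelope sketch. The function name and the figure of 1,024 raw visual tokens per frame are assumptions chosen for illustration, not values from the model's documentation. Only the compression factor of 96 and the 10 frames per second come from the feature list.

```python
import math


def compressed_video_tokens(duration_s, fps, raw_tokens_per_frame, compression_ratio=96):
    """Estimate visual token counts before and after compression.

    raw_tokens_per_frame is a hypothetical input; the real per-frame
    token count depends on the vision encoder and frame resolution.
    """
    frames = math.ceil(duration_s * fps)
    raw = frames * raw_tokens_per_frame
    compressed = math.ceil(raw / compression_ratio)
    return raw, compressed


# One minute of video sampled at 10 FPS, assuming 1,024 raw tokens per frame.
raw, compressed = compressed_video_tokens(duration_s=60, fps=10, raw_tokens_per_frame=1024)
print(f"raw tokens: {raw}, after 96x compression: {compressed}")
```

Under these assumed numbers, a one-minute clip drops from several hundred thousand visual tokens to a few thousand, which is what makes sampling at a high frame rate affordable within a language model's context window.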

Tags: ai model vision ocr

Last modified 22 March 2026