What is the best open-source LLM now?
There is no single “best” open-source LLM: it depends on your use case, compute budget, and priorities. That said, if you want concrete names, here are commonly recommended open-source models for different use cases.
The “best” model is the one that fits your product requirements, works within your compute constraints, and can be optimized for your specific tasks.
Model types
- Source models ("text" or "base"): predictive-text models trained on a large corpus of text
- Fine-tuned models ("chat", "instruct", "code"): expect input in a specific format and respond accordingly
- Embedding models (special case): smaller models built specifically for creating embedding vectors
- Multimodal models: accept text plus one or more other modalities (image, audio, video, …)
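In practice, the difference between a base model and a fine-tuned chat model is the prompt format: a base model simply continues raw text, while a chat model expects a templated conversation. A minimal sketch of a ChatML-style template (an illustration only; real templates vary per model family, so check the model card or use the tokenizer's built-in chat template):

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts into a ChatML-style prompt.

    Illustrative template only: real chat models ship their own template
    (e.g. applied via the tokenizer in the transformers library).
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # cue the model to answer
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
])
print(prompt)
```

Feeding a prompt like this to a base model is pointless; feeding plain unformatted text to a chat model usually degrades its answers.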
Explorers, Benchmarks, Leaderboards
- Arena - benchmark & compare the best AI models
- AI Models & API Providers Analysis - understand the AI landscape to choose the best model and provider for your use case
- BullshitBench - measure whether AI models challenge nonsensical prompts instead of confidently answering them
- CyberGym - evaluating AI agents' real-world cybersecurity capabilities at scale
- Dubesor LLM Benchmark table - small-scale manual performance comparison benchmark
- LLM Explorer - explore list of the open-source LLM models
- oobabooga benchmark - a list of models sorted by on-disk size for each score
- SWE-rebench - a continuously evolving and decontaminated benchmark for software engineering LLMs
- vakra - a benchmark for evaluating multi-hop, multi-source tool-calling in AI agents
Providers
- bartowski - providing GGUF versions of popular LLMs
- Open Thoughts - a team of researchers and engineers curating the best open reasoning datasets
- Tencent - a profile of a Chinese multinational technology conglomerate and holding company
- Unsloth AI - focusing on making AI more accessible to everyone (GGUFs etc.)
Related tools
- llmfit - hundreds of models & providers, one command to find what runs on your hardware
- outlines - structured outputs for LLMs
- llama-swap - reliable model swapping for any local OpenAI compatible server - llama.cpp, vllm, etc.
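Whether a model fits your hardware mostly comes down to arithmetic: weight memory is roughly parameter count times bits per weight, plus overhead for the KV cache and runtime. A rough sketch (the 20% overhead factor is an assumption for illustration, not a measured constant):

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 0.2) -> float:
    """Rough VRAM estimate (GB) for serving a model.

    params_b: parameter count in billions (7 for a 7B model)
    bits_per_weight: 16 for fp16, ~4.5 for a Q4_K_M-style quant
    overhead: fudge factor for KV cache, activations and runtime
    """
    weight_gb = params_b * bits_per_weight / 8  # billions of params * bytes per param = GB
    return weight_gb * (1 + overhead)

print(round(estimate_vram_gb(7, 16), 1))   # 7B at fp16: 14 GB of weights, ~16.8 GB total
print(round(estimate_vram_gb(7, 4.5), 1))  # 7B at ~4.5 bits/weight: ~4.7 GB total
```

This is the same estimate tools like llmfit automate across hundreds of model and quantization combinations.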
Specific (uncategorized) list of models
- wizard-math: Your logic partner. "This model is a specialized version of the WizardLM family, trained to excel at complex mathematical problems, logical reasoning, and puzzles. It comes in three sizes – 7B, 13B, and 70B. In my limited experience with the 7B variant, I found it fantastic: it helps me test my solutions and explore new ways to approach difficult problems. The model's ability to handle these subjects with precision and clarity makes it my go-to partner for all things logic and numbers."
- reader-lm: Web to markdown, instantly. "The model is super practical for my needs. Instead of manually creating .md files from the web content, I can feed them to reader-lm to get a perfectly structured markdown file. In my experience, while reader-lm does an amazing job for most of my needs, it sometimes struggles with really large or messy HTML code. But, it works well enough most of the time."
- llama-guard3: An LLM for safe prompts. "When working with LLMs, it’s crucial that our interactions are safe and responsible. While we can’t control an LLM’s response, we can ensure our prompts are appropriate. That’s exactly why I self-hosted Llama Guard 3. This powerful model acts as a dedicated content moderation tool for all my other local LLMs: its job is to classify every interaction against a set of safety categories. It checks prompts against 14 hazard categories (S1–S14). Given a prompt, it responds with a verdict stating whether the message was safe or unsafe; if unsafe, it flags the specific reason, such as S1 (Violent Crimes) or S10 (Hate)."
- InternVL2.5-4B is a compact multimodal large language model from the InternVL 2.5 series, combining a 300 million parameter InternViT vision encoder with a 3 billion parameter Qwen2.5 language model.
- Dynamic High Resolution Processing: Handles single images, multiple images, and video frames by dividing them into adaptive 448 by 448 pixel tiles with intelligent token reduction through pixel unshuffle operations
- Efficient Three Stage Training: Features a carefully designed pipeline with MLP warmup, optional vision encoder incremental learning for specialized domains, and full model instruction tuning with strict data quality controls
- Progressive Scaling Strategy: Trains the vision encoder with smaller language models first before transferring to larger ones, using less than one tenth of the tokens required by comparable models
- Advanced Data Quality Filtering: Employs a comprehensive pipeline with LLM based quality scoring, repetition detection, and heuristic rule based filtering to remove low quality samples and prevent model degradation
- Strong Multimodal Performance: Delivers competitive results on OCR, document parsing, chart understanding, multi image comprehension, and video analysis while preserving pure language capabilities through improved data curation
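The tile-count logic behind this kind of dynamic high-resolution processing can be sketched in a few lines: pick the rows-by-cols grid of 448-pixel tiles whose aspect ratio best matches the input image. This is a simplified stand-in, not InternVL's actual implementation (which also appends a global thumbnail tile and applies pixel-unshuffle token reduction):

```python
def tile_grid(width: int, height: int, tile: int = 448, max_tiles: int = 12):
    """Choose a rows x cols grid of fixed-size tiles whose aspect ratio
    best matches the image; a simplified take on dynamic-resolution tiling."""
    target = width / height
    best, best_diff = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles + 1):
            if cols * rows > max_tiles:
                continue  # respect the tile budget
            diff = abs(cols / rows - target)
            if diff < best_diff:
                best, best_diff = (rows, cols), diff
    rows, cols = best
    return rows, cols, (cols * tile, rows * tile)  # resize target in pixels

print(tile_grid(1920, 1080))  # → (1, 2, (896, 448)): a 16:9 image maps to a 1x2 grid
```

The image is then resized to the returned target and sliced into 448x448 tiles, each of which the vision encoder processes independently.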
Granite Vision 3.3 2b is a compact and efficient vision-language model released on June 11th, 2025, designed specifically for visual document understanding tasks.
- Superior Document Understanding Performance: Achieves improved scores across key benchmarks including ChartQA, DocVQA, TextVQA, and OCRBench, outperforming previous granite-vision versions
- Enhanced Safety Alignment: Features improved safety scores on RTVLM and VLGuard datasets, with better handling of political, racial, jailbreak, and misleading content
- Experimental Multipage Support: Trained to handle question answering tasks using up to 8 consecutive pages from a document, enabling long context processing
- Advanced Document Processing Features: Introduces novel capabilities including image segmentation and doctags generation for parsing documents into structured text formats
- Efficient Enterprise-Focused Design: Compact 2 billion parameter architecture optimized for visual document understanding tasks while maintaining 128 thousand token context length
The TrOCR large-sized model fine-tuned on SROIE is a specialized transformer-based optical character recognition system designed for extracting text from single-line images.
- Transformer Based Architecture: Encoder-decoder design with image Transformer encoder and text Transformer decoder for end-to-end optical character recognition
- Pretrained Component Initialization: Leverages BEiT weights for image encoder and RoBERTa weights for text decoder for better performance
- Patch Based Image Processing: Processes images as fixed-size 16 by 16 patches with linear embedding and position embeddings
- Autoregressive Text Generation: Decoder generates text tokens sequentially for accurate character recognition
- SROIE Dataset Specialization: Fine-tuned on the SROIE dataset for enhanced performance on printed text recognition tasks
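The patch arithmetic behind encoders like this is simple: an H by W image cut into 16 by 16 patches yields (H/16)*(W/16) tokens, each linearly embedded and given a position embedding. A quick sketch (the 384x384 input resolution is an assumption typical of large ViT/BEiT encoders, not a figure quoted from the TrOCR paper):

```python
def patch_count(img_h: int, img_w: int, patch: int = 16) -> int:
    """Number of patch tokens a ViT-style encoder produces
    (image dims must be divisible by the patch size)."""
    assert img_h % patch == 0 and img_w % patch == 0
    return (img_h // patch) * (img_w // patch)

print(patch_count(384, 384))  # 576 patch tokens at an assumed 384x384 input
print(patch_count(224, 224))  # 196 at the common 224x224 ViT resolution
```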
General purpose
- Qwen3.6 - a collection of the latest generation Qwen LLMs
- NVIDIA Nemotron v3 - a family of open models from NVIDIA with open weights, training data and recipes, delivering leading efficiency and accuracy for building specialized AI agents
- Gemma 4 - a family of open models built by Google DeepMind, that are multimodal, handling text and image input (with audio supported on small models) and generating text output
- Mistral Small 4 - A state-of-the-art model from Mistral, open-weight, with a granular Mixture-of-Experts architecture that fuses instruct, reasoning and agentic skills
- gpt-oss - a collection of open-weight models from OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases
- gpt-oss-puzzle-88B - a deployment-optimized large language model developed by NVIDIA, derived from OpenAI's gpt-oss-120b
- Hunyuan - a collection of Tencent's open-source efficient LLMs designed for versatile deployment across diverse computational environments
- Phi-4 - a family of small language, multi-modal and reasoning models from Microsoft
- OpenReasoning-Nemotron - a collection of models from NVIDIA, trained on 5M reasoning traces for math, code and science
- GLM-5 - a model targeting complex systems engineering and long-horizon agentic tasks
- Granite 4.0 - a collection of lightweight, state-of-the-art open foundation models from IBM that natively support multilingual capabilities, a wide range of coding tasks—including fill-in-the-middle (FIM) code completion—retrieval-augmented generation (RAG), tool usage and structured JSON output
- EXAONE-4.0 - a collection of LLMs from LG AI Research, integrating non-reasoning and reasoning modes
- ERNIE 4.5 - a collection of large-scale multimodal models from Baidu
- Seed-OSS - a collection of LLMs developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features
- Step-3.5-Flash - an open-source foundation model engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency
Coding
- Qwen3-Coder-Next - a collection of Qwen's open-weight language models designed specifically for coding agents and local development
- Devstral 2 - a couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents
- GLM-4.7 - a collection of agentic, reasoning and coding (ARC) foundation models
- MiniMax-M2 - a collection of SOTA models for real-world dev & agents
- OmniCoder-9B - a 9-billion parameter coding agent model built by Tesslate, fine-tuned on top of Qwen3.5-9B's hybrid architecture
- NousCoder-14B - a competitive programming model post-trained on Qwen3-14B via reinforcement learning
- FrogBoss-32B-2510 & FrogMini-14B-2510 - coding agents specialized in fixing bugs in code obtained by fine‑tuning a Qwen3‑32B and Qwen3‑14B language model, respectively, on debugging trajectories generated by Claude Sonnet 4 within the BugPilot framework
- Jan-code - a small code-tuned model that focuses on handling well-scoped subtasks reliably while keeping latency and compute requirements low
- Mellum-4b-base - an LLM from JetBrains, optimized for code-related tasks
- Stable-DiffCoder - a strong code diffusion large language model
Multimodal
- Qwen3-Omni - a collection of the natively end-to-end multilingual omni-modal foundation models from Qwen
- GLM-4.6V - a collection of open source multimodal models with native tool use from Zhipu AI
Image
- Qwen-Image - a collection of models for image generation, edit and decomposition from Qwen
- Qwen3-VL - a collection of the most powerful vision-language models in the Qwen series to date
- GLM-Image - an image generation model
- HunyuanImage - a collection of image generation models from Tencent
- HunyuanVideo - a collection of video generation models from Tencent
- Vidi - a collection of models for multimodal video understanding and creation
- FastVLM - a collection of VLMs with efficient vision encoding from Apple
- MiniCPM-V-4_5 - a GPT-4o Level MLLM for single image, multi image and high-FPS video understanding on your phone
- LFM2-VL - a collection of vision-language models, designed for on-device deployment
- ClipTagger-12b - a vision-language model (VLM) designed for video understanding at massive scale
Audio
- whisper-large-v3 - a state-of-the-art model for automatic speech recognition (ASR) and speech translation from OpenAI
- Nemotron Speech - a collection of open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S
- Qwen3-ASR - a collection of models that support language identification and ASR for 52 languages and dialects
- Qwen3-TTS - a collection of TTS models that cover 10 major languages as well as multiple dialectal voice profiles to meet global application needs
- Granite Speech - a collection of compact and efficient speech-language models from IBM, specifically designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST)
- Voxtral-Small-24B-2507 - an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance
- Voxtral-Mini-4B-Realtime-2602 - a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms
- Voxtral-4B-TTS-2603 - frontier, open-weights text-to-speech model that’s fast, instantly adaptable, and produces lifelike speech for voice agents
- chatterbox - first production-grade open-source TTS model
- VibeVoice - a collection of frontier text-to-speech models from Microsoft
- Kitten TTS - a collection of open-source realistic text-to-speech models designed for lightweight deployment and high-quality voice synthesis
- Streaming Sortformer Diarizer 4spk v2.1 - a streaming version of a novel end-to-end neural model for speaker diarization from NVIDIA
Retrieval-Augmented Generation
- Nemotron RAG - a set of tools to build retrieval-augmented generation (RAG) systems, improve search and ranking accuracy, and extract structured data from complex docs
- Qwen3-Embedding - a collection of the latest Qwen models, specifically designed for text embedding and ranking tasks
- Qwen3-VL-Embedding - an addition to the Qwen embedding models, specifically designed for multimodal information retrieval and cross-modal understanding
- Qwen3-Reranker - a collection of the latest Qwen models, engineered to refine embedding results
- Qwen3-VL-Reranker - a multimodal addition to the Qwen reranker models, designed to refine multimodal retrieval results
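The embed-then-rerank pipeline these models slot into can be illustrated with plain cosine similarity. The vectors below are toy stand-ins; in a real pipeline they would come from an embedding model such as Qwen3-Embedding, and the top-k candidates would then be rescored by a reranker:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors standing in for real model embeddings
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

# Stage 1: embedding retrieval, ranking all docs by cosine similarity
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
top_k = ranked[:2]  # stage 2 would pass these candidates to a reranker
print(top_k)  # → ['doc_a', 'doc_c']
```

The cheap embedding pass narrows millions of documents to a handful; the expensive reranker only ever sees that handful.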
Safeguards
- gpt-oss-safeguard - a collection of safety reasoning models built upon gpt-oss
- Granite Guardian Models - a collection of models created by IBM for safeguarding language models
- Qwen3Guard - a collection of safety moderation models built upon Qwen3
- NemoGuard - a collection of models from NVIDIA for content safety, topic-following and security guardrails
- Nemotron-3-Content-Safety - a content-safety moderator from NVIDIA for both inputs to and responses from LLMs and VLMs
- privacy-filter - a bidirectional token-classification model from OpenAI for personally identifiable information (PII) detection and masking in text
- AprielGuard - a safeguard model designed to detect and mitigate both safety risks and security threats in LLM interactions
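Most of these guard models return a compact, machine-readable verdict rather than free text. A sketch of parsing a Llama-Guard-style response, where the first line is "safe" or "unsafe" and an unsafe verdict is followed by comma-separated category codes (format per the Llama Guard model cards; other guard families use their own formats):

```python
def parse_guard_response(text: str) -> dict:
    """Parse a Llama-Guard-style moderation verdict.

    Expected format: first line "safe" or "unsafe"; if unsafe, the next
    line lists the violated category codes, e.g. "S1,S10".
    """
    lines = [l.strip() for l in text.strip().splitlines() if l.strip()]
    if lines[0].lower() == "safe":
        return {"safe": True, "categories": []}
    categories = lines[1].split(",") if len(lines) > 1 else []
    return {"safe": False, "categories": [c.strip() for c in categories]}

print(parse_guard_response("safe"))
print(parse_guard_response("unsafe\nS1,S10"))
```

Wiring this in front of another local model turns any chat endpoint into a moderated one: run the prompt through the guard first, and only forward it if the verdict is safe.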
Miscellaneous
- Marco-MoE - a suite of multilingual MoE models with highly-sparse architectures
- Jan-v3 - a 4B baseline model for fine-tuning, designed for downstream work: improved instruction following out of the box, strong starting point for fine-tuning and effective lightweight coding assistance
- Jan-v2-VL - a family of VLMs focused on reliable, many-step task execution
- Nemotron-Orchestrator-8B - a state-of-the-art 8B orchestration model designed to solve complex, multi-turn agentic tasks by coordinating a diverse set of expert models and tools
- Arch-Router-1.5B - the fastest LLM router model that aligns to subjective usage preferences
- Waypoint - a collection of real-time interactive video world models
- Hunyuan3D - a collection of everything related (models, datasets etc.) to 3D assets generation from Tencent
- Hunyuan-GameCraft-1.0 - a novel framework for high-dynamic interactive video generation in game environments
- void-model - a model from Netflix that removes objects from videos along with all interactions they induce on the scene: not just secondary effects like shadows and reflections, but also physical interactions, such as objects falling when a person is removed
Tags:
ai
model
Last modified 07 May 2026