What is the best open-source LLM now?
There is no single “best” open-source LLM: it depends on your use case, compute budget, and priorities. That said, if you want concrete names, here are commonly recommended open-source models for different use cases.
The “best” model is the one that fits your product requirements, works within your compute constraints, and can be optimized for your specific tasks.
Model types
- Source models ("text" or "base"): predictive-text models trained on a large corpus of text
- Fine-tuned models ("chat", "instruct", "code"): expect input in a specific format and respond accordingly
- Embedding models (special case): smaller models built specifically for creating embedding vectors
- Multimodal models: accept text plus one or more other modalities (image, audio, video, …)
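In practice, the difference between a base model and a fine-tuned chat model is the prompt format: a base model simply continues raw text, while a chat model expects a templated conversation. A minimal sketch of a ChatML-style template (an illustration only; real templates vary per model family, so check the model card or use the tokenizer's built-in chat template):

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts into a ChatML-style prompt.

    Illustrative template only: real chat models ship their own template
    (e.g. applied via the tokenizer in the transformers library).
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # cue the model to answer
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
])
print(prompt)
```

Feeding a prompt like this to a base model is pointless; feeding plain unformatted text to a chat model usually degrades its answers.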
Explorers, Benchmarks, Leaderboards
- Arena - benchmark & compare the best AI models
- AI Models & API Providers Analysis - understand the AI landscape to choose the best model and provider for your use case
- BullshitBench - measure whether AI models challenge nonsensical prompts instead of confidently answering them
- CyberGym - evaluating AI agents' real-world cybersecurity capabilities at scale
- Dubesor LLM Benchmark table - small-scale manual performance comparison benchmark
- LLM Explorer - explore list of the open-source LLM models
- oobabooga benchmark - a list of models sorted by on-disk size for each score
- SWE-rebench - a continuously evolving and decontaminated benchmark for software engineering LLMs
- vakra - a benchmark for evaluating multi-hop, multi-source tool-calling in AI agents
Providers
- bartowski - providing GGUF versions of popular LLMs
- Open Thoughts - a team of researchers and engineers curating the best open reasoning datasets
- Tencent - a profile of a Chinese multinational technology conglomerate and holding company
- Unsloth AI - focusing on making AI more accessible to everyone (GGUFs etc.)
Related tools
- llmfit - hundreds of models & providers, one command to find what runs on your hardware
- outlines - structured outputs for LLMs
- llama-swap - reliable model swapping for any local OpenAI compatible server - llama.cpp, vllm, etc.
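Whether a model fits your hardware mostly comes down to arithmetic: weight memory is roughly parameter count times bits per weight, plus overhead for the KV cache and runtime. A rough sketch (the 20% overhead factor is an assumption for illustration, not a measured constant):

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 0.2) -> float:
    """Rough VRAM estimate (GB) for serving a model.

    params_b: parameter count in billions (7 for a 7B model)
    bits_per_weight: 16 for fp16, ~4.5 for a Q4_K_M-style quant
    overhead: fudge factor for KV cache, activations and runtime
    """
    weight_gb = params_b * bits_per_weight / 8  # billions of params * bytes per param = GB
    return weight_gb * (1 + overhead)

print(round(estimate_vram_gb(7, 16), 1))   # 7B at fp16: 14 GB of weights, ~16.8 GB total
print(round(estimate_vram_gb(7, 4.5), 1))  # 7B at ~4.5 bits/weight: ~4.7 GB total
```

This is the same estimate tools like llmfit automate across hundreds of model and quantization combinations.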
Specific (uncategorized) list of models
- wizard-math: Your logic partner. "This model is a specialized version of the WizardLM family, trained to excel at complex mathematical problems, logical reasoning, and puzzles. It comes in three sizes – 7B, 13B, and 70B. In my limited experience with the 7B variant, I found it fantastic: it helps me test my solutions and explore new ways to approach difficult problems. The model's ability to handle these subjects with precision and clarity makes it my go-to partner for all things logic and numbers."
- reader-lm: Web to markdown, instantly. "The model is super practical for my needs. Instead of manually creating .md files from the web content, I can feed them to reader-lm to get a perfectly structured markdown file. In my experience, while reader-lm does an amazing job for most of my needs, it sometimes struggles with really large or messy HTML code. But, it works well enough most of the time."
- llama-guard3: An LLM for safe prompts. "When working with LLMs, it’s crucial that our interactions are safe and responsible. While we can’t control an LLM’s response, we can ensure our prompts are appropriate. That’s exactly why I self-hosted Llama Guard 3. This powerful model acts as a dedicated content moderation tool for all my other local LLMs: its job is to classify every interaction against a set of safety categories. It checks prompts against 14 hazard categories (S1–S14). Given a prompt, it responds with a verdict stating whether the message was safe or unsafe; if unsafe, it flags the specific reason, such as S1 (Violent Crimes) or S10 (Hate)."
- InternVL2.5-4B is a compact multimodal large language model from the InternVL 2.5 series, combining a 300 million parameter InternViT vision encoder with a 3 billion parameter Qwen2.5 language model.
- Dynamic High Resolution Processing: Handles single images, multiple images, and video frames by dividing them into adaptive 448 by 448 pixel tiles with intelligent token reduction through pixel unshuffle operations
- Efficient Three Stage Training: Features a carefully designed pipeline with MLP warmup, optional vision encoder incremental learning for specialized domains, and full model instruction tuning with strict data quality controls
- Progressive Scaling Strategy: Trains the vision encoder with smaller language models first before transferring to larger ones, using less than one tenth of the tokens required by comparable models
- Advanced Data Quality Filtering: Employs a comprehensive pipeline with LLM based quality scoring, repetition detection, and heuristic rule based filtering to remove low quality samples and prevent model degradation
- Strong Multimodal Performance: Delivers competitive results on OCR, document parsing, chart understanding, multi image comprehension, and video analysis while preserving pure language capabilities through improved data curation
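The tile-count logic behind this kind of dynamic high-resolution processing can be sketched in a few lines: pick the rows-by-cols grid of 448-pixel tiles whose aspect ratio best matches the input image. This is a simplified stand-in, not InternVL's actual implementation (which also appends a global thumbnail tile and applies pixel-unshuffle token reduction):

```python
def tile_grid(width: int, height: int, tile: int = 448, max_tiles: int = 12):
    """Choose a rows x cols grid of fixed-size tiles whose aspect ratio
    best matches the image; a simplified take on dynamic-resolution tiling."""
    target = width / height
    best, best_diff = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles + 1):
            if cols * rows > max_tiles:
                continue  # respect the tile budget
            diff = abs(cols / rows - target)
            if diff < best_diff:
                best, best_diff = (rows, cols), diff
    rows, cols = best
    return rows, cols, (cols * tile, rows * tile)  # resize target in pixels

print(tile_grid(1920, 1080))  # → (1, 2, (896, 448)): a 16:9 image maps to a 1x2 grid
```

The image is then resized to the returned target and sliced into 448x448 tiles, each of which the vision encoder processes independently.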
Granite Vision 3.3 2b is a compact and efficient vision-language model released on June 11th, 2025, designed specifically for visual document understanding tasks.
- Superior Document Understanding Performance: Achieves improved scores across key benchmarks including ChartQA, DocVQA, TextVQA, and OCRBench, outperforming previous granite-vision versions
- Enhanced Safety Alignment: Features improved safety scores on RTVLM and VLGuard datasets, with better handling of political, racial, jailbreak, and misleading content
- Experimental Multipage Support: Trained to handle question answering tasks using up to 8 consecutive pages from a document, enabling long context processing
- Advanced Document Processing Features: Introduces novel capabilities including image segmentation and doctags generation for parsing documents into structured text formats
- Efficient Enterprise-Focused Design: Compact 2 billion parameter architecture optimized for visual document understanding tasks while maintaining 128 thousand token context length
The TrOCR large-sized model fine-tuned on SROIE is a specialized transformer-based optical character recognition system designed for extracting text from single-line images.
- Transformer Based Architecture: Encoder-decoder design with image Transformer encoder and text Transformer decoder for end-to-end optical character recognition
- Pretrained Component Initialization: Leverages BEiT weights for image encoder and RoBERTa weights for text decoder for better performance
- Patch Based Image Processing: Processes images as fixed-size 16 by 16 patches with linear embedding and position embeddings
- Autoregressive Text Generation: Decoder generates text tokens sequentially for accurate character recognition
- SROIE Dataset Specialization: Fine-tuned on the SROIE dataset for enhanced performance on printed text recognition tasks
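The patch arithmetic behind encoders like this is simple: an H by W image cut into 16 by 16 patches yields (H/16)*(W/16) tokens, each linearly embedded and given a position embedding. A quick sketch (the 384x384 input resolution is an assumption typical of large ViT/BEiT encoders, not a figure quoted from the TrOCR paper):

```python
def patch_count(img_h: int, img_w: int, patch: int = 16) -> int:
    """Number of patch tokens a ViT-style encoder produces
    (image dims must be divisible by the patch size)."""
    assert img_h % patch == 0 and img_w % patch == 0
    return (img_h // patch) * (img_w // patch)

print(patch_count(384, 384))  # 576 patch tokens at an assumed 384x384 input
print(patch_count(224, 224))  # 196 at the common 224x224 ViT resolution
```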
General purpose
- Qwen3.6 - a collection of the latest generation Qwen LLMs
- NVIDIA Nemotron v3 - a family of open models from NVIDIA with open weights, training data and recipes, delivering leading efficiency and accuracy for building specialized AI agents
- Gemma 4 - a family of open models built by Google DeepMind, that are multimodal, handling text and image input (with audio supported on small models) and generating text output
- Mistral Small 4 - A state-of-the-art model from Mistral, open-weight, with a granular Mixture-of-Experts architecture that fuses instruct, reasoning and agentic skills
- gpt-oss - a collection of open-weight models from OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases
- gpt-oss-puzzle-88B - a deployment-optimized large language model developed by NVIDIA, derived from OpenAI's gpt-oss-120b
- Hunyuan - a collection of Tencent's open-source efficient LLMs designed for versatile deployment across diverse computational environments
- Phi-4 - a family of small language, multi-modal and reasoning models from Microsoft
- OpenReasoning-Nemotron - a collection of models from NVIDIA, trained on 5M reasoning traces for math, code and science
- GLM-5 - a model targeting complex systems engineering and long-horizon agentic tasks
- Granite 4.0 - a collection of lightweight, state-of-the-art open foundation models from IBM that natively support multilingual capabilities, a wide range of coding tasks—including fill-in-the-middle (FIM) code completion—retrieval-augmented generation (RAG), tool usage and structured JSON output
- EXAONE-4.0 - a collection of LLMs from LG AI Research, integrating non-reasoning and reasoning modes
- ERNIE 4.5 - a collection of large-scale multimodal models from Baidu
- Seed-OSS - a collection of LLMs developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features
- Step-3.5-Flash - an open-source foundation model engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency
Coding
- Qwen3-Coder-Next - a collection of Qwen's open-weight language models designed specifically for coding agents and local development
- Devstral 2 - a couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents
- GLM-4.7 - a collection of agentic, reasoning and coding (ARC) foundation models
- MiniMax-M2 - a collection of SOTA models for real-world dev & agents
- OmniCoder-9B - a 9-billion parameter coding agent model built by Tesslate, fine-tuned on top of Qwen3.5-9B's hybrid architecture
- NousCoder-14B - a competitive programming model post-trained on Qwen3-14B via reinforcement learning
- FrogBoss-32B-2510 & FrogMini-14B-2510 - coding agents specialized in fixing bugs in code obtained by fine‑tuning a Qwen3‑32B and Qwen3‑14B language model, respectively, on debugging trajectories generated by Claude Sonnet 4 within the BugPilot framework
- Jan-code - a small code-tuned model that focuses on handling well-scoped subtasks reliably while keeping latency and compute requirements low
- Mellum-4b-base - an LLM from JetBrains, optimized for code-related tasks
- Stable-DiffCoder - a strong code diffusion large language model
Multimodal
- Qwen3-Omni - a collection of the natively end-to-end multilingual omni-modal foundation models from Qwen
- GLM-4.6V - a collection of open source multimodal models with native tool use from Zhipu AI
Image
- Qwen-Image - a collection of models for image generation, edit and decomposition from Qwen
- Qwen3-VL - a collection of the most powerful vision-language models in the Qwen series to date
- GLM-Image - an image generation model
- HunyuanImage - a collection of image generation models from Tencent
- HunyuanVideo - a collection of video generation models from Tencent
- Vidi - a collection of models for multimodal video understanding and creation
- FastVLM - a collection of VLMs with efficient vision encoding from Apple
- MiniCPM-V-4_5 - a GPT-4o Level MLLM for single image, multi image and high-FPS video understanding on your phone
- LFM2-VL - a collection of vision-language models, designed for on-device deployment
- ClipTagger-12b - a vision-language model (VLM) designed for video understanding at massive scale
Audio
- whisper-large-v3 - a state-of-the-art model for automatic speech recognition (ASR) and speech translation from OpenAI
- Nemotron Speech - a collection of open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S
- Qwen3-ASR - a collection of models that support language identification and ASR for 52 languages and dialects
- Qwen3-TTS - a collection of TTS models that cover 10 major languages as well as multiple dialectal voice profiles to meet global application needs
- Granite Speech - a collection of compact and efficient speech-language models from IBM, specifically designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST)
- Voxtral-Small-24B-2507 - an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance
- Voxtral-Mini-4B-Realtime-2602 - a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms
- Voxtral-4B-TTS-2603 - frontier, open-weights text-to-speech model that’s fast, instantly adaptable, and produces lifelike speech for voice agents
- chatterbox - first production-grade open-source TTS model
- VibeVoice - a collection of frontier text-to-speech models from Microsoft
- Kitten TTS - a collection of open-source realistic text-to-speech models designed for lightweight deployment and high-quality voice synthesis
- Streaming Sortformer Diarizer 4spk v2.1 - a streaming version of a novel end-to-end neural model for speaker diarization from NVIDIA
Retrieval-Augmented Generation
- Nemotron RAG - a set of tools to build retrieval-augmented generation (RAG) systems, improve search and ranking accuracy, and extract structured data from complex docs
- Qwen3-Embedding - a collection of the latest Qwen models, specifically designed for text embedding and ranking tasks
- Qwen3-VL-Embedding - an addition to the Qwen embedding models, specifically designed for multimodal information retrieval and cross-modal understanding
- Qwen3-Reranker - a collection of the latest Qwen models, engineered to refine embedding results
- Qwen3-VL-Reranker - a multimodal addition to the Qwen reranker models, designed to refine multimodal retrieval results
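The embed-then-rerank pipeline these models slot into can be illustrated with plain cosine similarity. The vectors below are toy stand-ins; in a real pipeline they would come from an embedding model such as Qwen3-Embedding, and the top-k candidates would then be rescored by a reranker:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors standing in for real model embeddings
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

# Stage 1: embedding retrieval, ranking all docs by cosine similarity
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
top_k = ranked[:2]  # stage 2 would pass these candidates to a reranker
print(top_k)  # → ['doc_a', 'doc_c']
```

The cheap embedding pass narrows millions of documents to a handful; the expensive reranker only ever sees that handful.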
Safeguards
- gpt-oss-safeguard - a collection of safety reasoning models built upon gpt-oss
- Granite Guardian Models - a collection of models created by IBM for safeguarding language models
- Qwen3Guard - a collection of safety moderation models built upon Qwen3
- NemoGuard - a collection of models from NVIDIA for content safety, topic-following and security guardrails
- Nemotron-3-Content-Safety - a content-safety moderator from NVIDIA for both inputs to and responses from LLMs and VLMs
- privacy-filter - a bidirectional token-classification model from OpenAI for personally identifiable information (PII) detection and masking in text
- AprielGuard - a safeguard model designed to detect and mitigate both safety risks and security threats in LLM interactions
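Most of these guard models return a compact, machine-readable verdict rather than free text. A sketch of parsing a Llama-Guard-style response, where the first line is "safe" or "unsafe" and an unsafe verdict is followed by comma-separated category codes (format per the Llama Guard model cards; other guard families use their own formats):

```python
def parse_guard_response(text: str) -> dict:
    """Parse a Llama-Guard-style moderation verdict.

    Expected format: first line "safe" or "unsafe"; if unsafe, the next
    line lists the violated category codes, e.g. "S1,S10".
    """
    lines = [l.strip() for l in text.strip().splitlines() if l.strip()]
    if lines[0].lower() == "safe":
        return {"safe": True, "categories": []}
    categories = lines[1].split(",") if len(lines) > 1 else []
    return {"safe": False, "categories": [c.strip() for c in categories]}

print(parse_guard_response("safe"))
print(parse_guard_response("unsafe\nS1,S10"))
```

Wiring this in front of another local model turns any chat endpoint into a moderated one: run the prompt through the guard first, and only forward it if the verdict is safe.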
Miscellaneous
- Marco-MoE - a suite of multilingual MoE models with highly-sparse architectures
- Jan-v3 - a 4B baseline model for fine-tuning, designed for downstream work: improved instruction following out of the box, strong starting point for fine-tuning and effective lightweight coding assistance
- Jan-v2-VL - a family of VLMs focused on reliable, many-step task execution
- Nemotron-Orchestrator-8B - a state-of-the-art 8B orchestration model designed to solve complex, multi-turn agentic tasks by coordinating a diverse set of expert models and tools
- Arch-Router-1.5B - the fastest LLM router model that aligns to subjective usage preferences
- Waypoint - a collection of real-time interactive video world models
- Hunyuan3D - a collection of everything related (models, datasets etc.) to 3D assets generation from Tencent
- Hunyuan-GameCraft-1.0 - a novel framework for high-dynamic interactive video generation in game environments
- void-model - a model from Netflix that removes objects from videos along with all interactions they induce on the scene: not just secondary effects like shadows and reflections, but also physical interactions, such as objects falling when a person is removed
Tags:
ai
model
Last modified 07 May 2026