The code version of Qwen3, the large language model series developed by Qwen team, Alibaba Cloud.
This model has 235B parameters, with 22B active per token, powered by 128 experts (8 active). Note that it only supports non-thinking mode and does not generate
The smallest dense model in Alibaba’s Qwen3 family, released under the Apache 2.0 license. Despite its tiny size, it inherits much of what makes Qwen3 notable: strong reasoning, improved agent and tool-use capabilities, and broad multilingual support. Architecturally, Qwen3-0.6B is a 0.6B parameter causal LM with 32K context length. Like the rest of the Qwen3 line, it supports a hybrid behavior pattern that can switch between deeper reasoning and faster responses depending on your use case. By December 2025, Qwen3-0.6B is among the most downloaded text generation models on Hugging Face.
Why should you use Qwen3-0.6B:
Points to be cautious about:
If you can afford a bit more compute, I also recommend Qwen3-1.7B and Qwen3-4B. They keep the same hybrid reasoning and agent-friendly design, but offer better performance for more complex tasks than the 0.6B model.
Qwen3-VL is the latest and most capable VLM in Alibaba’s Qwen series, which represents a major leap over its predecessor Qwen2.5-VL. It delivers stronger multimodal reasoning, agentic capabilities, and long-context comprehension.
Two main editions are currently available: Qwen3-VL-235B-A22B and Qwen3-VL-30B-A3B. Both provide Instruct and Thinking variants and official FP8 versions for efficient inference.
The flagship Qwen3-VL-235B-A22B-Instruct rivals top-tier proprietary models such as Gemini-2.5-Pro and GPT-5 across multimodal benchmarks covering general Q&A, 2D/3D grounding, video understanding, OCR, and document comprehension. In text-only tasks, it performs on par with or surpasses frontier models like DeepSeek-V3-0324 and Claude-Opus-4 on leading benchmarks like MMLU, AIME25, and LiveBench1125.
Key features:
For more practical examples and use cases, explore the official Qwen3-VL cookbooks.
The image generation foundation model in the Qwen series. It stands out as a next-generation diffusion model that brings together text-aware visual generation, intelligent editing, and vision understanding. It adopts Apache 2.0, making it an excellent choice for commercial-ready image generation.
Why should you use Qwen-Image:
Note that the image editing version is Qwen-Image-Edit, which is built upon the 20B Qwen-Image model. The latest iteration, Qwen-Image-Edit-2509, further enhances editing consistency and introduces multi-image editing, supporting operations across one to three input images (e.g., “person + product” or “person + scene”). It also adds ControlNet-based conditioning (depth, edge, and keypoint maps) for more structured and controllable results.
If you are working with complex image editing workflows, also take a look at Qwen-Image-Layered. It introduces a layered RGBA representation that decomposes an image into multiple editable layers. This means you can edit them independently in a precise, non-destructive way, including recoloring, resizing, repositioning, object replacement, and clean deletion.
Points to be cautious about:
If you are considering Qwen-Image, I also recommend Qwen-Image-Lightning, a distilled and speed-optimized variant of the base model. It delivers high-quality image generation with a 12 to 25× speed improvement in most scenarios, with no significant loss in visual quality. By reducing inference steps to as few as 4 to 8, Qwen-Image-Lightning is ideal for real-time applications, high-throughput pipelines, and large-scale batch processing.
Alibaba’s Qwen 2.5 7B dominates coding and mathematical reasoning benchmarks. If your use case involves code generation, data analysis, or solving math problems, this model outperforms competitors in its size class.
Domain specialization. Qwen was trained with heavy emphasis on code and technical content. It understands programming patterns, can debug code, and generates working solutions more reliably than general-purpose models.
Best for: Code generation and completion · Mathematical reasoning · Technical documentation · Multilingual tasks (especially Chinese/English)
Hardware: Quantized (4-bit) requires 8GB RAM · Full precision (16-bit) requires 16GB RAM · Recommended: 12GB RAM for best performance
Download / Run locally: Available on Hugging Face under the Qwen org (Qwen 2.5 7B Instruct). For Ollama, pull the instruct-tagged variant: ollama pull qwen2.5:7b-instruct.
Last modified 22 March 2026