Edge, embedded, and privacy-critical workloads
Runs fully locally on consumer GPUs, edge servers, and mobile-class devices. Tuned for offline operation and latency-sensitive voice or UI loops.
Deploy on: on-device, edge servers, mobile, kiosks
Active parameters: 1B per token
Context window: 128K tokens
Cloud and on-prem production
Serve customer-facing apps, agent backends, and high-throughput services in your cloud or VPC.
Deploy on: AWS, GCP, Azure, on-prem (vLLM, SGLang, llama.cpp)
Active parameters: 3B per token
Context window: 128K tokens
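The cloud and on-prem targets above list vLLM among the supported serving stacks. As a rough sketch (not an official command), a model of this class could be exposed through vLLM's OpenAI-compatible server; the checkpoint name below is a placeholder, not a published model ID:

```shell
# Hedged sketch: serve an OpenAI-compatible endpoint with vLLM.
# "arcee-ai/trinity-model" is a placeholder — substitute the real
# checkpoint name. --max-model-len 131072 matches the 128K context.
vllm serve arcee-ai/trinity-model \
  --max-model-len 131072 \
  --port 8000
```

SGLang and llama.cpp expose comparable server entry points; the flag names differ per stack.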
Cloud deployment
Production-oriented instruct model trained to operate reliably in agent harnesses, handle complex toolchains, and excel at creative tasks.
Deploy on: Preview API (currently served at 128K context, 8-bit quantization)
Active parameters: 13B per token
Context window: 512K tokens
Cloud deployment
Trinity-Large-Thinking is a reasoning-optimized variant of Arcee AI's Trinity-Large family. Built on Trinity-Large-Base and post-trained with extended chain-of-thought reasoning and agentic RL, it delivers state-of-the-art performance on agentic benchmarks while maintaining strong general capabilities.
Deploy on: Arcee API (currently served at 256K context, BF16)
Active parameters: 13B per token
Context window: 512K tokens
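Assuming the Arcee API follows the widely used OpenAI-compatible chat-completions shape (an assumption; this page does not document the wire format), a request body could be assembled like this. The base URL and model ID are placeholders, not documented values:

```python
import json

# Hedged sketch: build a chat-completions payload for an
# OpenAI-compatible endpoint. API_BASE and MODEL_ID below are
# placeholders — consult the actual API docs for real values.
API_BASE = "https://api.example.com/v1"  # placeholder endpoint
MODEL_ID = "trinity-large-thinking"      # placeholder model ID

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble the JSON body for POST {API_BASE}/chat/completions."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_request("Summarize the Trinity model family.")
print(json.dumps(body, indent=2))
```

The body would then be sent with any HTTP client, with an API key in the `Authorization` header.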
Last modified 07 May 2026