Edge, embedded, and privacy-critical workloads
Runs fully locally on consumer GPUs, edge servers, and mobile-class devices. Tuned for offline operation and latency-sensitive voice or UI loops.
Deploy on: on-device, edge servers, mobile, kiosks
Active parameters: 1B per token
Context window: 128K tokens
Cloud and on-prem production
Serve customer-facing apps, agent backends, and high-throughput services in your cloud or VPC.
Deploy on: AWS, GCP, Azure, on-prem (vLLM, SGLang, llama.cpp)
Active parameters: 3B per token
Context window: 128K tokens
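The cloud and on-prem targets above list vLLM among the supported serving stacks. As a rough sketch (not an official command), a model of this class could be exposed through vLLM's OpenAI-compatible server; the checkpoint name below is a placeholder, not a published model ID:

```shell
# Hedged sketch: serve an OpenAI-compatible endpoint with vLLM.
# "arcee-ai/trinity-model" is a placeholder — substitute the real
# checkpoint name. --max-model-len 131072 matches the 128K context.
vllm serve arcee-ai/trinity-model \
  --max-model-len 131072 \
  --port 8000
```

SGLang and llama.cpp expose comparable server entry points; the flag names differ per stack.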
Cloud deployment
Production-oriented instruct model trained to operate reliably in agent harnesses, handle complex toolchains, and excel at creative tasks.
Deploy on: Preview API (currently served at 128K context, 8-bit quantization)
Active parameters: 13B per token
Context window: 512K tokens
Cloud deployment
Trinity-Large-Thinking is a reasoning-optimized variant of Arcee AI's Trinity-Large family. Built on Trinity-Large-Base and post-trained with extended chain-of-thought reasoning and agentic RL, it delivers state-of-the-art performance on agentic benchmarks while maintaining strong general capabilities.
Deploy on: Arcee API (currently served at 256K context, BF16)
Active parameters: 13B per token
Context window: 512K tokens
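Assuming the Arcee API follows the widely used OpenAI-compatible chat-completions shape (an assumption; this page does not document the wire format), a request body could be assembled like this. The base URL and model ID are placeholders, not documented values:

```python
import json

# Hedged sketch: build a chat-completions payload for an
# OpenAI-compatible endpoint. API_BASE and MODEL_ID below are
# placeholders — consult the actual API docs for real values.
API_BASE = "https://api.example.com/v1"  # placeholder endpoint
MODEL_ID = "trinity-large-thinking"      # placeholder model ID

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble the JSON body for POST {API_BASE}/chat/completions."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_request("Summarize the Trinity model family.")
print(json.dumps(body, indent=2))
```

The body would then be sent with any HTTP client, with an API key in the `Authorization` header.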
Last modified 07 May 2026