How to Use Xiaomi MiMo V2.5 Pro: Complete Step by Step Guide
On April 27, 2026, Xiaomi released and open-sourced MiMo-V2.5-Pro, its most capable AI model to date. This 1.02T-parameter Mixture-of-Experts model with 42B active parameters improves on its predecessor in general agentic capabilities, complex software engineering, and long-horizon tasks. Built on a hybrid-attention architecture with a 1M-token context window, V2.5-Pro can sustain complex autonomous tasks spanning more than a thousand tool calls. In internal testing, it built a complete SysY compiler in Rust from scratch in 4.3 hours, scoring 233/233; produced an 8,192-line video editor over 11.5 hours of autonomous work; and designed an analog circuit meeting all six target metrics simultaneously. This step-by-step guide walks through the architecture, capabilities, access methods, and ways to use MiMo V2.5 Pro in your own projects.
What You'll Learn:
- The MiMo-V2.5-Pro architecture: MoE, hybrid attention, Multi-Token Prediction, and 1M context
- Long-horizon agentic capabilities demonstrated through real benchmark results
- Three ways to access MiMo V2.5 Pro: AI Studio, API Platform, and Token Plan
- How to download and deploy the open-source model weights
- Model specifications, benchmark scores, and comparisons with other frontier models
- Token efficiency advantages and cost-saving strategies
What Is Xiaomi MiMo V2.5 Pro?
MiMo V2.5 Pro is Xiaomi's flagship open-source AI model, built for agentic work and long-horizon coherence. It is a 1.02T-parameter Mixture-of-Experts model with 42B parameters active per token, using a hybrid-attention architecture that interleaves Local Sliding Window Attention (SWA) and Global Attention (GA) at a 6:1 ratio with a 128-token window. This design cuts KV-cache storage by nearly 7x at long context while preserving performance through a learnable attention-sink bias. A lightweight Multi-Token Prediction (MTP) module with dense FFNs is natively integrated, roughly tripling output throughput and accelerating RL rollouts. The model was pre-trained on 27T tokens using FP8 mixed precision and has a 1M-token context window.
1.02T MoE with 42B Active
Mixture-of-Experts architecture with 1.02 trillion total parameters but only 42 billion active per token. Delivers frontier-level intelligence at a fraction of the compute cost through sparse activation.
Hybrid Attention (6:1 Ratio)
Interleaves Local Sliding Window Attention and Global Attention at a 6:1 ratio with 128-token windows. Cuts KV-cache storage by nearly 7x at long context while maintaining performance through a learnable attention-sink bias.
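The "nearly 7x" figure can be sanity-checked with simple arithmetic. The sketch below is a simplification, not the model's confirmed layout: it assumes layers repeat in groups of six sliding-window layers plus one global layer, that every layer stores the same number of bytes per cached token, and that a sliding-window layer caches at most 128 tokens. It ignores the attention-sink bias and any implementation overhead. The function name is chosen for this example only.

```python
def kv_cache_reduction(context_len: int, window: int = 128,
                       swa_layers: int = 6, ga_layers: int = 1) -> float:
    """Ratio of full-attention KV-cache size to hybrid KV-cache size.

    Assumes every layer stores the same number of bytes per cached token,
    so only the count of cached tokens per layer matters.
    """
    total_layers = swa_layers + ga_layers
    # Baseline: every layer caches the whole context.
    full = total_layers * context_len
    # Hybrid: global layers cache everything; sliding-window layers cache
    # at most `window` tokens each.
    hybrid = ga_layers * context_len + swa_layers * min(window, context_len)
    return full / hybrid


for n in (4_096, 131_072, 1_000_000):
    print(f"{n:>9} tokens: {kv_cache_reduction(n):.2f}x smaller KV cache")
```

Under these assumptions the ratio climbs toward 7 as the context grows, because the fixed 128-token windows become negligible next to the single global layer's cache.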
Multi-Token Prediction (MTP)
A lightweight MTP module with dense FFNs is natively integrated for training and inference. Roughly triples output throughput and accelerates reinforcement learning rollouts significantly.
1M-Token Context Window
Pre-trained at 32K sequence length on 27T tokens using FP8 mixed precision, with context extended up to 1 million tokens. Maintains strong coherence across ultra-long contexts and complex agentic trajectories.
Model Specifications
The following table shows the available model variants with their total and active parameter counts, context windows, and numeric precision. Both variants are open-sourced under a permissive license.
| Model | Total Params | Active Params | Context | Precision |
|---|---|---|---|---|
| MiMo-V2.5-Pro-Base | 1.02T | 42B | 256K | FP8 (E4M3) Mixed |
| MiMo-V2.5-Pro | 1.02T | 42B | 1M | FP8 (E4M3) Mixed |
Long-Horizon Agentic Capabilities
What sets MiMo V2.5 Pro apart from other models is its demonstrated ability to sustain complex, autonomous work over extended periods. When paired with a proper harness, V2.5-Pro can execute tasks spanning more than a thousand tool calls with strong coherence and self-correction. Here are three demonstrations from the official launch.
SysY Compiler in Rust — 233/233 Tests Passed
Sourced from Peking University's Compiler Principles course, this task asked the model to implement a complete SysY compiler in Rust from scratch: lexer, parser, AST, Koopa IR codegen, RISC-V assembly backend, and performance optimization. A task that typically takes a computer science undergraduate several weeks was completed in 4.3 hours across 672 tool calls, scoring a perfect 233/233 against the course's hidden test suite. The first build that compiled already passed 137/233 tests (59%), which suggests the overall architecture was sound before any test-driven iteration began. At turn 512, a refactoring pass regressed two tests; the model diagnosed the failures, recovered, and pushed on.
Full-Featured Video Editor — 8,192 Lines of Code
With just a few simple prompts, MiMo-V2.5-Pro delivered a working desktop video editor with multi-track timeline, clip trimming, cross-fades, audio mixing, and an export pipeline. The final build is 8,192 lines of code, produced over 1,868 tool calls across 11.5 hours of autonomous work. The demo also includes AI voice-over driven by MiMo-V2-TTS, showing cross-modal integration capabilities.
Analog EDA: FVF-LDO Design and Optimization
A graduate-level analog-circuit EDA task: design and optimize a complete FVF-LDO (Flipped-Voltage-Follower LDO) regulator from scratch in the TSMC 180nm CMOS process. The model sized the power transistor, tuned the compensation network, and picked bias voltages so that six metrics — phase margin, line regulation, load regulation, quiescent current, PSRR, and transient response — all landed within spec simultaneously. Wired into an ngspice simulation loop with Claude Code as the harness, the model produced a design improved by an order of magnitude over its initial attempt in about an hour of closed-loop iteration.
Three Ways to Access MiMo V2.5 Pro
Xiaomi provides multiple access methods depending on your use case: a free browser-based AI Studio for experimentation, a full API platform for integration, and a cost-effective Token Plan for heavy usage. The model is also fully open source for self-hosting.
AI Studio (Free)
Visit aistudio.xiaomimimo.com to try MiMo V2.5 Pro directly in your browser. No API key, no setup, no payment required. Simply select mimo-v2.5-pro from the model dropdown and start chatting. Best for quick experimentation and testing prompts.
API Platform
Visit platform.xiaomimimo.com, create an account, and generate an API key. Use the OpenAI-compatible endpoint with the model tag mimo-v2.5-pro. Standard chat completions format works out of the box for integration into your applications.
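To make "OpenAI-compatible" concrete, here is a minimal sketch of the raw HTTP request such an endpoint conventionally expects, built with the Python standard library only. The base URL is the one used in this guide's own example, and the `/chat/completions` path and Bearer header follow the OpenAI convention; treat both as assumptions to confirm against your dashboard. `build_chat_request` and the `MIMO_API_KEY` environment variable are names invented for this example. The request is built but not sent.

```python
import json
import os
import urllib.request

# Base URL copied from this guide's example; confirm it in your dashboard.
DEFAULT_BASE_URL = "https://platform.xiaomimimo.com/v1"


def build_chat_request(messages, api_key, base_url=DEFAULT_BASE_URL,
                       model="mimo-v2.5-pro", **params):
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# MIMO_API_KEY is a hypothetical variable name chosen for this sketch.
key = os.environ.get("MIMO_API_KEY", "your-api-key-here")
req = build_chat_request(
    [{"role": "user", "content": "Say hello."}], key, max_tokens=64
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=60)
```

Reading the key from the environment keeps it out of source control; any secret store would serve the same purpose.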
Token Plan (10x Cheaper)
Navigate to Subscription Details on the API platform and create a Token Plan API key. The Token Plan provides a fixed monthly token allowance at roughly 10x cheaper than standard pay-as-you-go rates. Choose your Dedicated Base URL region and use it with supported tools like OpenCode, Cline, or Cherry Studio.
Open Source (Self-Host)
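Download weights from Hugging Face (XiaomiMiMo/MiMo-V2.5-Pro). Deploy using SGLang or vLLM as recommended in the model card. The permissive license allows commercial use, modification, and redistribution. Self-hosting requires a large multi-GPU, likely multi-node, setup: at FP8 (one byte per parameter), the 1.02T parameters alone occupy roughly 1 TB, far more than a single 80-100GB accelerator holds. Check the model card for exact hardware requirements.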
API Usage Example
The Xiaomi MiMo API uses the OpenAI-compatible chat completions format. Here is a Python example showing how to call MiMo V2.5 Pro using the official API platform endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://platform.xiaomimimo.com/v1",
    api_key="your-api-key-here"
)

response = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
    temperature=0.7,
    max_tokens=4096
)

print(response.choices[0].message.content)
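Beyond printing the reply, it is often useful to record how many tokens each call consumed. The sketch below works on a plain dictionary in the standard OpenAI response shape; it assumes the MiMo endpoint returns the usual `usage` block, which this guide does not confirm. With the OpenAI Python SDK, `response.model_dump()` should give an equivalent dictionary. `summarize_completion` and the sample data are illustrative only.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the reply text and token counts from a chat completion dict."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}  # tolerate a missing usage block
    return {
        "text": choice["message"].get("content") or "",
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
    }


# Hand-written sample in the OpenAI response shape, not real API output.
sample = {
    "choices": [{
        "message": {"role": "assistant", "content": "def merge(a, b): ..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 31, "completion_tokens": 12, "total_tokens": 43},
}

print(summarize_completion(sample))
```

In the OpenAI format, a `finish_reason` of `"length"` indicates the reply was cut off by `max_tokens`, which is worth checking before treating generated code as complete.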
Token Efficiency and Cost Advantages
One of MiMo V2.5 Pro's standout features is its token efficiency. On ClawEval, V2.5-Pro achieves a 64% Pass^3 score using only approximately 70K tokens per trajectory — roughly 40-60% fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 at comparable capability levels. This means you get frontier-level intelligence at significantly lower cost, especially when combined with the Token Plan's 10x cheaper rates. The model's efficiency makes it particularly suitable for agentic workflows where each trajectory involves thousands of tool calls and lengthy context chains.
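To see what "40-60% fewer tokens" implies in absolute terms, the percentages can be inverted: if 70K tokens is a fraction f fewer than a baseline, the baseline used 70K / (1 - f). The sketch below only restates the figures quoted above; it is not a measurement, and the function name is invented for this example.

```python
def implied_baseline_tokens(tokens_used: float, fraction_fewer: float) -> float:
    """Tokens a baseline would use if `tokens_used` is `fraction_fewer` less."""
    if not 0 <= fraction_fewer < 1:
        raise ValueError("fraction_fewer must be in [0, 1)")
    return tokens_used / (1 - fraction_fewer)


mimo_tokens = 70_000  # approximate per-trajectory figure quoted for ClawEval
for f in (0.40, 0.60):
    baseline = implied_baseline_tokens(mimo_tokens, f)
    print(f"{f:.0%} fewer -> baseline of about {baseline:,.0f} tokens per trajectory")
```

Taken at face value, the quoted range corresponds to comparison models spending somewhere between roughly 117K and 175K tokens per trajectory.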
Token Plan Credit Reset
All users who purchased a Token Plan before 14:00 UTC on April 21, 2026 had their used Credit balance reset. If you were an early adopter, check your dashboard for the updated balance and new pricing tiers.
Step by Step: Get Started with MiMo V2.5 Pro
Try the Model in AI Studio
Go to aistudio.xiaomimimo.com to test MiMo V2.5 Pro directly in your browser with zero setup. This is the fastest way to evaluate the model's capabilities for your specific use case before committing to API integration.
Generate an API Key on the Platform
Visit platform.xiaomimimo.com, create your account, and navigate to the API keys section. Generate a new key and note your Dedicated Base URL region. Use this with the OpenAI-compatible endpoint to integrate MiMo V2.5 Pro into your applications.
Subscribe to the Token Plan for Heavy Usage
If you plan to use MiMo V2.5 Pro extensively, navigate to Subscription Details on the platform dashboard and purchase a Token Plan. This gives you a fixed monthly token allowance at roughly 10x cheaper than standard pay-as-you-go rates. The plan is ideal for coding agents, automated workflows, and long-running agentic tasks.
Use with Agentic Frameworks
MiMo V2.5 Pro works with Claude Code, OpenCode, Kilo, OpenClaw, Hermes Agent, Cherry Studio, Qwen Code, CodeBuddy, and Cline. The model exhibits remarkable harness awareness — it makes full use of environment affordances, manages its memory, and shapes its own context toward the final objective across thousands of tool calls.
Download and Self-Host the Open-Source Model
Visit huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro to download the full model weights, tokenizer, and model card. Deploy using SGLang or vLLM on a large multi-GPU, likely multi-node, setup: at FP8 precision the 1.02T parameters alone occupy roughly 1 TB, far more than a single 80-100GB accelerator holds, so check the model card for exact hardware requirements. The permissive license allows commercial use, modification, and redistribution.
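As a rough illustration of the vLLM route, the sketch below assembles a `vllm serve` command line in Python so the parallelism settings live in one place. The flags shown (`--tensor-parallel-size`, `--max-model-len`, `--trust-remote-code`) are standard vLLM options, but whether this model needs them, and which values fit your hardware, are assumptions; the model card's command takes precedence. The helper name and the default of 8 GPUs are placeholders.

```python
import shlex


def vllm_serve_command(model="XiaomiMiMo/MiMo-V2.5-Pro",
                       tensor_parallel_size=8,
                       max_model_len=None,
                       trust_remote_code=True):
    """Return an argv list for `vllm serve`; values are placeholders."""
    cmd = ["vllm", "serve", model,
           "--tensor-parallel-size", str(tensor_parallel_size)]
    if max_model_len is not None:
        # Capping the context length reduces KV-cache memory requirements.
        cmd += ["--max-model-len", str(max_model_len)]
    if trust_remote_code:
        cmd.append("--trust-remote-code")
    return cmd


# Print the command for review instead of launching it.
print(shlex.join(vllm_serve_command(max_model_len=131072)))
```

Printing the command rather than executing it makes it easy to review before a long model load; pass the list to `subprocess.run` once the values are confirmed.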
Build Long-Horizon Agentic Workflows
Leverage MiMo V2.5 Pro's ability to sustain thousands of tool calls across extended autonomous runs. The model excels at structured, self-correcting behavior — as demonstrated by the SysY compiler build where it diagnosed regressions at turn 512 and recovered. Use it for software engineering, research automation, document processing, and any task requiring multi-step reasoning with tool use.
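A harness for this kind of work is, at its core, a loop that sends the conversation to the model, runs any tools it requests, and feeds the results back. The sketch below shows that loop with a step budget, using the OpenAI-style `tool_calls` message format. It is a generic illustration, not the harness used in the demos above: `run_agent` is an invented name, and `fake_model` is a stub standing in for a real API call.

```python
import json
from typing import Callable


def run_agent(call_model: Callable[[list], dict], tools: dict,
              task: str, max_steps: int = 50) -> str:
    """Run a tool-calling loop until the model answers or the budget runs out.

    `call_model` takes the message list and returns an assistant message in
    the OpenAI chat format (optionally containing `tool_calls`).
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""
        for call in tool_calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"] or "{}")
            try:
                result = tools[name](**args)
            except Exception as exc:
                # Report failures back so the model can try to self-correct.
                result = f"error: {exc!r}"
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": str(result),
            })
    raise RuntimeError(f"no final answer within {max_steps} steps")


# Stub model for demonstration only: requests one tool call, then answers.
def fake_model(messages):
    if messages[-1]["role"] == "tool":
        return {"role": "assistant",
                "content": f"The sum is {messages[-1]['content']}."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "add",
                         "arguments": json.dumps({"a": 2, "b": 3})},
        }],
    }


print(run_agent(fake_model, {"add": lambda a, b: a + b}, "What is 2 + 3?"))
```

Returning tool errors as messages, rather than raising, gives the model a chance to recover from a failed step, and the step budget keeps a stuck run from looping indefinitely.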
Frequently Asked Questions
What is the difference between MiMo-V2.5-Pro-Base and MiMo-V2.5-Pro?
MiMo-V2.5-Pro-Base has a 256K context window and is the base pre-trained model without chat fine-tuning. MiMo-V2.5-Pro has a 1M context window and includes full chat and instruction fine-tuning for agentic and conversational use.
Is Xiaomi MiMo V2.5 Pro free to use?
The model weights are fully open source under a permissive license on Hugging Face, so you can self-host with no licensing fee, though you supply the hardware. For cloud API access via the Xiaomi platform, you use pay-as-you-go credits or a Token Plan subscription.
What hardware do I need to run MiMo V2.5 Pro locally?
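The model has 1.02T total parameters with 42B active per token. Only the active parameters are computed for each token, but in a standard deployment all experts must be loaded, so at FP8 (one byte per parameter) the weights alone occupy roughly 1 TB. That is far more than a single 80-100GB accelerator holds, so plan for a large multi-GPU, likely multi-node, deployment using SGLang or vLLM as recommended in the Hugging Face model card.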
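A back-of-the-envelope estimate helps when sizing hardware. The sketch below counts weight memory only, assuming one byte per parameter at FP8 and that all experts stay resident in GPU memory; it ignores the KV cache, activations, and framework overhead, so real requirements are higher. The 90% usable-memory fraction and the function names are assumptions made for this example.

```python
import math


def weight_memory_gb(total_params: float, bytes_per_param: float = 1.0) -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9


def min_gpus_for_weights(total_params: float, gpu_memory_gb: float,
                         bytes_per_param: float = 1.0,
                         usable_fraction: float = 0.9) -> int:
    """Smallest GPU count whose combined usable memory fits the weights."""
    needed = weight_memory_gb(total_params, bytes_per_param)
    return math.ceil(needed / (gpu_memory_gb * usable_fraction))


TOTAL_PARAMS = 1.02e12  # 1.02T parameters, as stated in this guide

print(f"FP8 weights: about {weight_memory_gb(TOTAL_PARAMS):,.0f} GB")
for gpu_gb in (80, 141):
    count = min_gpus_for_weights(TOTAL_PARAMS, gpu_gb)
    print(f"{gpu_gb} GB GPUs: at least {count} for weights alone")
```

Treat the output as a lower bound: long-context serving adds substantial KV-cache memory on top of the weights.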
How does MiMo V2.5 Pro compare to DeepSeek V4 Pro?
MiMo V2.5 Pro surpasses DeepSeek V4 Pro on most coding and agentic benchmarks. It ranks fourth on the Artificial Analysis Agentic Index behind GPT-5.5, Claude Opus 4.7, and GPT-5.4, while also using significantly fewer tokens per trajectory for comparable capability levels.
What is the Token Plan and how is it different from regular API credits?
The Token Plan gives a fixed monthly token allowance at roughly 10x cheaper rates than regular pay-as-you-go API credits. It is designed for heavy users of tools like OpenCode, Cline, and Cherry Studio. All users who purchased a Token Plan before April 21, 2026 had their used Credit balance reset.
About the author
Head of AI Agents Practice
Builds production AI agents for US financial services, healthcare, and retail. SOC 2 Type II / HIPAA-scope deployments on AWS Bedrock. Anthropic Claude, OpenAI, LangChain, MCP.
