update README.md

README.md CHANGED

@@ -66,8 +66,8 @@ license: apache-2.0
 
 We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
 MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning
-attention mechanism. The model is developed based on our previous MiniMax-Text-01 model
-
+attention mechanism. The model is developed based on our previous [MiniMax-Text-01 model](https://huggingface.co/MiniMaxAI/MiniMax-Text-01),
+which contains a total of 456 billion parameters with 45.9 billion parameters activated
 per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1
 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism
 in MiniMax-M1 enables efficient scaling of test-time compute – For example, compared to DeepSeek
@@ -79,7 +79,8 @@ We develop an efficient RL scaling framework for M1 highlighting two perspective
 CISPO, a novel algorithm that clips importance sampling weights instead of token updates, which
 outperforms other competitive RL variants; (2) Our hybrid-attention design naturally enhances the
 efficiency of RL, where we address unique challenges when scaling RL with the hybrid architecture. We
-train two versions of MiniMax-M1 models with [40K](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k) and
+train two versions of MiniMax-M1 models with [40K](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k) and
+[80K](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k) thinking budgets respectively. Experiments
 on standard benchmarks show that our models outperform other strong open-weight models such as
 the original DeepSeek-R1 and Qwen3-235B, particularly on complex software engineering, tool use,
 and long context tasks. With efficient scaling of test-time compute, MiniMax-M1 serves as a strong
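The CISPO description in the hunk above is terse, so here is a minimal, hypothetical sketch of what clipping importance sampling weights (rather than token updates) can look like in a policy-gradient loss. The function name, the clipping range `eps_low`/`eps_high`, and the tensor shapes are illustrative assumptions, not the released training code; the technical report is the authoritative description of CISPO.

```python
# Hedged sketch of a CISPO-style objective: the importance-sampling (IS)
# weight is clipped and detached, so it acts as a constant scale on a
# REINFORCE-style term and every token keeps a gradient, unlike PPO-style
# clipping, which zeroes the update for tokens whose ratio is clipped.
import torch

def clipped_is_weight_loss(logp_new: torch.Tensor,    # [B, T] log-probs, current policy
                           logp_old: torch.Tensor,    # [B, T] log-probs, behavior policy
                           advantages: torch.Tensor,  # [B, T] per-token advantages
                           mask: torch.Tensor,        # [B, T] 1 = response token, 0 = padding
                           eps_low: float = 0.2,      # assumed clipping range
                           eps_high: float = 0.2) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)                        # IS weight r_t
    weight = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    per_token = -weight * advantages * logp_new                   # gradient flows via logp_new only
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```

The point of interest is the `.detach()` on the clipped weight: clipping bounds how much an off-policy sample can scale the update, but it never removes a token from the gradient, which is the contrast with token-level clipping that the paragraph above draws.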
@@ -173,12 +174,9 @@ Alternatively, you can also deploy using Transformers directly. For detailed Tra
 The MiniMax-M1 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format. [MiniMax-M1 Function Call Guide](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k/blob/main/function_call_guide.md) provides detailed instructions on how to use the function calling feature of MiniMax-M1.
 
 
-## 5.
-
-
-## 6. Chatbot & API
+## 5. Chatbot & API
 For general use and evaluation, we provide a [Chatbot](https://chat.minimax.io/) with online search capabilities and the [online API](https://www.minimax.io/platform/) for developers. We also provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.
 
 
-##
+## 6. Contact Us
 Contact us at [[email protected]](mailto:[email protected]).
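As a quick orientation alongside the function call guide linked above, the snippet below shows one plausible way to exercise function calling against an OpenAI-compatible endpoint, such as one exposed by a local vLLM deployment. The endpoint URL, model name, and `get_weather` tool are placeholders chosen for illustration; the [MiniMax-M1 Function Call Guide](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k/blob/main/function_call_guide.md) remains the authoritative reference for the exact request and response format.

```python
# Hedged sketch: requesting a structured function call from an
# OpenAI-compatible endpoint (e.g. a local vLLM server). The base_url,
# model name, and tool schema are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M1-40k",
    messages=[{"role": "user", "content": "What is the weather in Shanghai today?"}],
    tools=tools,
)

# When the model decides a call is needed, the structured arguments
# are returned in tool_calls rather than as free-form text.
print(response.choices[0].message.tool_calls)
```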