sriting committed on
Commit 3e13426 · 1 Parent(s): c7a3b80

update README.md

Files changed (1)
  1. README.md +6 -8
README.md CHANGED
@@ -66,8 +66,8 @@ license: apache-2.0
 
 We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
 MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning
-attention mechanism. The model is developed based on our previous MiniMax-Text-01 model (MiniMax
-et al., 2025), which contains a total of 456 billion parameters with 45.9 billion parameters activated
+attention mechanism. The model is developed based on our previous [MiniMax-Text-01 model](https://huggingface.co/MiniMaxAI/MiniMax-Text-01),
+which contains a total of 456 billion parameters with 45.9 billion parameters activated
 per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1
 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism
 in MiniMax-M1 enables efficient scaling of test-time compute – For example, compared to DeepSeek
@@ -79,7 +79,8 @@ We develop an efficient RL scaling framework for M1 highlighting two perspective
 CISPO, a novel algorithm that clips importance sampling weights instead of token updates, which
 outperforms other competitive RL variants; (2) Our hybrid-attention design naturally enhances the
 efficiency of RL, where we address unique challenges when scaling RL with the hybrid architecture. We
-train two versions of MiniMax-M1 models with [40K](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k) and [80K](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k) thinking budgets respectively. Experiments
+train two versions of MiniMax-M1 models with [40K](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k) and
+[80K](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k) thinking budgets respectively. Experiments
 on standard benchmarks show that our models outperform other strong open-weight models such as
 the original DeepSeek-R1 and Qwen3-235B, particularly on complex software engineering, tool using,
 and long context tasks. With efficient scaling of test-time compute, MiniMax-M1 serves as a strong
@@ -173,12 +174,9 @@ Alternatively, you can also deploy using Transformers directly. For detailed Tra
 The MiniMax-M1 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format. [MiniMax-M1 Function Call Guide](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k/blob/main/function_call_guide.md) provides detailed instructions on how to use the function calling feature of MiniMax-M1.
 
 
-## 5. Citation
-
-
-## 6. Chatbot & API
+## 5. Chatbot & API
 For general use and evaluation, we provide a [Chatbot](https://chat.minimax.io/) with online search capabilities and the [online API](https://www.minimax.io/platform/) for developers. For general use and evaluation, we provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.
 
 
-## 7. Contact Us
+## 6. Contact Us
 Contact us at [[email protected]](mailto:[email protected]).
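
The CISPO sentence retained in the abstract is terse: the algorithm clips the importance sampling weight rather than the per-token update. A minimal sketch of that contrast in generic RL notation follows; the symbols and clipping bounds are illustrative and are not taken from the MiniMax-M1 report.

```latex
% Illustrative only: standard PPO-style token clipping vs. a CISPO-like
% clipped importance-sampling weight; sg(.) denotes stop-gradient.
% Both are written as objectives to maximize.
\[
\mathcal{L}_{\mathrm{PPO}}
  = \mathbb{E}_t\!\left[\min\bigl(r_t A_t,\ \mathrm{clip}(r_t,\,1-\epsilon,\,1+\epsilon)\,A_t\bigr)\right],
  \qquad r_t = \frac{\pi_\theta(o_t \mid s_t)}{\pi_{\mathrm{old}}(o_t \mid s_t)}
\]
\[
\mathcal{L}_{\mathrm{CISPO\text{-}like}}
  = \mathbb{E}_t\!\left[\mathrm{sg}\bigl(\mathrm{clip}(r_t,\,1-\epsilon_{\mathrm{low}},\,1+\epsilon_{\mathrm{high}})\bigr)\,A_t\,\log \pi_\theta(o_t \mid s_t)\right]
\]
```

The practical difference in this sketch: with PPO-style clipping, tokens whose ratio falls outside the clip range contribute no gradient, whereas bounding only the importance-sampling weight keeps a (reweighted) gradient flowing through every token.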
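
On the function calling paragraph kept in the diff: below is a hypothetical sketch of passing a tool schema through Hugging Face transformers' `apply_chat_template`. The example tool, its schema, and the assumption that MiniMax-M1's chat template consumes a `tools` argument this way are illustrative; the linked function_call_guide.md is the authoritative reference for the exact structured format the model emits.

```python
# Hypothetical sketch of tool-aware prompting for MiniMax-M1 via transformers.
# Whether the model's chat template accepts `tools` like this is an assumption;
# see function_call_guide.md in the model repo for the documented format.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "MiniMaxAI/MiniMax-M1-40k", trust_remote_code=True
)

# A tool described with a JSON-schema-style signature (illustrative example).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Shanghai?"}]

# Render the prompt; the model is then expected to decide whether a function
# call is needed and, if so, emit the function name and arguments in the
# structured format described in the function call guide.
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)
```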