update README.md

README.md CHANGED

@@ -66,8 +66,8 @@ license: apache-2.0
 
 We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
 MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning
-attention mechanism. The model is developed based on our previous MiniMax-Text-01 model
-
+attention mechanism. The model is developed based on our previous [MiniMax-Text-01 model](https://huggingface.co/MiniMaxAI/MiniMax-Text-01),
+which contains a total of 456 billion parameters with 45.9 billion parameters activated
 per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1
 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism
 in MiniMax-M1 enables efficient scaling of test-time compute – For example, compared to DeepSeek
@@ -79,7 +79,8 @@ We develop an efficient RL scaling framework for M1 highlighting two perspective
 CISPO, a novel algorithm that clips importance sampling weights instead of token updates, which
 outperforms other competitive RL variants; (2) Our hybrid-attention design naturally enhances the
 efficiency of RL, where we address unique challenges when scaling RL with the hybrid architecture. We
-train two versions of MiniMax-M1 models with [40K](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k) and
+train two versions of MiniMax-M1 models with [40K](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k) and
+[80K](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k) thinking budgets respectively. Experiments
 on standard benchmarks show that our models outperform other strong open-weight models such as
 the original DeepSeek-R1 and Qwen3-235B, particularly on complex software engineering, tool use,
 and long context tasks. With efficient scaling of test-time compute, MiniMax-M1 serves as a strong
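The CISPO description in the hunk above is terse, so here is a minimal, hypothetical sketch of what clipping importance sampling weights (rather than token updates) can look like in a policy-gradient loss. The function name, the clipping range `eps_low`/`eps_high`, and the tensor shapes are illustrative assumptions, not the released training code; the technical report is the authoritative description of CISPO.

```python
# Hedged sketch of a CISPO-style objective: the importance-sampling (IS)
# weight is clipped and detached, so it acts as a constant scale on a
# REINFORCE-style term and every token keeps a gradient, unlike PPO-style
# clipping, which zeroes the update for tokens whose ratio is clipped.
import torch

def clipped_is_weight_loss(logp_new: torch.Tensor,    # [B, T] log-probs, current policy
                           logp_old: torch.Tensor,    # [B, T] log-probs, behavior policy
                           advantages: torch.Tensor,  # [B, T] per-token advantages
                           mask: torch.Tensor,        # [B, T] 1 = response token, 0 = padding
                           eps_low: float = 0.2,      # assumed clipping range
                           eps_high: float = 0.2) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)                        # IS weight r_t
    weight = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    per_token = -weight * advantages * logp_new                   # gradient flows via logp_new only
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```

The point of interest is the `.detach()` on the clipped weight: clipping bounds how much an off-policy sample can scale the update, but it never removes a token from the gradient, which is the contrast with token-level clipping that the paragraph above draws.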
@@ -173,12 +174,9 @@ Alternatively, you can also deploy using Transformers directly. For detailed Tra
 The MiniMax-M1 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format. [MiniMax-M1 Function Call Guide](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k/blob/main/function_call_guide.md) provides detailed instructions on how to use the function calling feature of MiniMax-M1.
 
 
-## 5.
-
-
-## 6. Chatbot & API
+## 5. Chatbot & API
 For general use and evaluation, we provide a [Chatbot](https://chat.minimax.io/) with online search capabilities and the [online API](https://www.minimax.io/platform/) for developers. We also provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.
 
 
-##
+## 6. Contact Us
 Contact us at [[email protected]](mailto:[email protected]).
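As a quick orientation alongside the function call guide linked above, the snippet below shows one plausible way to exercise function calling against an OpenAI-compatible endpoint, such as one exposed by a local vLLM deployment. The endpoint URL, model name, and `get_weather` tool are placeholders chosen for illustration; the [MiniMax-M1 Function Call Guide](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k/blob/main/function_call_guide.md) remains the authoritative reference for the exact request and response format.

```python
# Hedged sketch: requesting a structured function call from an
# OpenAI-compatible endpoint (e.g. a local vLLM server). The base_url,
# model name, and tool schema are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M1-40k",
    messages=[{"role": "user", "content": "What is the weather in Shanghai today?"}],
    tools=tools,
)

# When the model decides a call is needed, the structured arguments
# are returned in tool_calls rather than as free-form text.
print(response.choices[0].message.tool_calls)
```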