Update README.md
## What's New
- [2025.06.06] The **MiniCPM4** series is released! This model achieves ultimate efficiency improvements while maintaining optimal performance at the same scale! It can achieve over 5x generation acceleration on typical end-side chips! You can find the technical report [here](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf). 🔥🔥🔥
- [2025.06.09] **MiniCPM4-8B-mlx** and **MiniCPM4-0.5B-mlx** are available, so you can now run MiniCPM4 on your Apple devices! Thanks to [pzc163](https://huggingface.co/pzc163) for providing this converted model version and related usage instructions.

## MiniCPM4 Series

The MiniCPM4 series comprises highly efficient large language models (LLMs) designed explicitly for end-side devices, achieving this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems.

- [MiniCPM4-8B-mlx](https://huggingface.co/openbmb/MiniCPM4-8B-mlx): MiniCPM4-8B in MLX format, which can be used on Apple silicon. (**<-- you are here**)
- [MiniCPM4-0.5B-mlx](https://huggingface.co/openbmb/MiniCPM4-0.5B-mlx): MiniCPM4-0.5B in MLX format, which can be used on Apple silicon.
- [MiniCPM4-8B](https://huggingface.co/openbmb/MiniCPM4-8B): The flagship of MiniCPM4, with 8B parameters, trained on 8T tokens.
- [MiniCPM4-0.5B](https://huggingface.co/openbmb/MiniCPM4-0.5B): The small version of MiniCPM4, with 0.5B parameters, trained on 1T tokens.
- [MiniCPM4-8B-Eagle-FRSpec](https://huggingface.co/openbmb/MiniCPM4-8B-Eagle-FRSpec): Eagle head for FRSpec, accelerating speculative inference for MiniCPM4-8B.

- ArkInfer -- Cross-platform Deployment System: Supports efficient deployment across multiple backend environments, providing flexible cross-platform adaptation capabilities

## How to Run MiniCPM4-8B-mlx
Here is a guide on how to run the `MiniCPM4-8B-mlx` model from the command line using `mlx-lm`, a tool that lets you interact with LLMs in the MLX format directly from your terminal and quickly test and use them.
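If `mlx-lm` is not installed yet, it is available from PyPI (a minimal setup sketch; it assumes Python and pip on an Apple silicon Mac):

```bash
# Install (or upgrade) the mlx-lm toolkit, which provides the mlx_lm.generate command.
pip install -U mlx-lm
```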
### Basic Usage
Here is a specific example. This command will load the `openbmb/MiniCPM4-8B-mlx` model and generate text based on the prompt you provide: "hello, pls tell me which one is the most powerful LLM in the World".
```bash
mlx_lm.generate --model openbmb/MiniCPM4-8B-mlx --prompt "hello, pls tell me which one is the most powerful LLM in the World"
```
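The first run downloads the model weights from the Hugging Face Hub into your local cache, which can take a while for an 8B model. If you prefer, you can fetch them ahead of time (an optional step; it assumes the Hugging Face CLI is installed, e.g. via `pip install -U "huggingface_hub[cli]"`):

```bash
# Optional: pre-download the MLX weights so generation starts immediately.
huggingface-cli download openbmb/MiniCPM4-8B-mlx
```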
### MLX-LM Command Line Parameters
- `mlx_lm.generate`: This is the primary command in the mlx-lm toolkit used for text generation.
- `--model openbmb/MiniCPM4-8B-mlx`: This parameter specifies the model to be loaded. `openbmb/MiniCPM4-8B-mlx` is the model's identifier on the Hugging Face Hub. mlx-lm will automatically download and cache the model from there.
- `--prompt "..."`: This parameter provides the initial text that you want the model to respond to or complete.
- `--max-tokens`: Sets the maximum number of tokens to generate. For example, `--max-tokens 200` will limit the output to 200 tokens.
- `--temp`: Controls the randomness of the output. Higher temperature values (like 0.8) produce more diverse and creative outputs, while lower values (like 0.2) make the output more deterministic and focused. The default value is usually 0.6; see the note below this list for how to check the defaults of your installed version.
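Exact flag names and default values can vary between `mlx-lm` releases, so it is worth checking what your installed version actually supports:

```bash
# Show all generation options and their defaults for the installed mlx-lm version.
mlx_lm.generate --help
```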
The following command will use a higher temperature value and limit the output length:
```bash
mlx_lm.generate --model openbmb/MiniCPM4-8B-mlx \
  --prompt "tell me a story about a robot who discovered music" \
  --max-tokens 500 \
  --temp 0.8
```
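For a multi-turn conversation rather than a one-shot completion, recent `mlx-lm` releases also include a small interactive chat entry point (a hedged sketch; confirm that `mlx_lm.chat` is available in your installed version):

```bash
# Start an interactive chat session with the model in the terminal.
mlx_lm.chat --model openbmb/MiniCPM4-8B-mlx
```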