## What's New
- [2025.06.06] The **MiniCPM4** series is released! This model achieves extreme efficiency improvements while maintaining optimal performance at the same scale, and can deliver over 5x generation acceleration on typical end-side chips. You can find the technical report [here](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf). 🔥🔥🔥
- [2025.06.09] **MiniCPM4-8B-mlx** is available and you can run MiniCPM4 on your Apple devices! Thanks to [pzc163](https://huggingface.co/pzc163) for providing this converted model version and related usage instructions.
## MiniCPM4 Series

Here is a guide on how to run the `MiniCPM-8B-mlx` model from the command line using `mlx-lm`, a tool that lets you quickly test and interact with LLMs in the MLX format directly from your terminal.
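
If you don't have `mlx-lm` installed yet, it is available from PyPI. Here is a minimal setup sketch (assuming a recent Python environment on an Apple silicon Mac):

```bash
# Install the MLX-LM toolkit, which provides the mlx_lm.generate command
pip install mlx-lm
```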

### Basic Usage
Here is a specific example. This command will load the `openbmb/MiniCPM-8B-mlx` model and generate text based on the prompt you provide: "hello, pls tell me which one is the most powerful LLM in the World".
```bash
mlx_lm.generate --model openbmb/MiniCPM-8B-mlx --prompt "hello, pls tell me which one is the most powerful LLM in the World"
```
### MLX-LM Command Line Parameters
- `mlx_lm.generate`: This is the primary command in the mlx-lm toolkit used for text generation.
- `--model openbmb/MiniCPM-8B-mlx`: This parameter specifies the model to be loaded. `openbmb/MiniCPM-8B-mlx` is the model's identifier on the Hugging Face Hub. mlx-lm will automatically download and cache the model from there.
- `--prompt "..."`: This parameter is used to provide the initial text that you want the model to respond to or complete.
- `--max-tokens`: Sets the maximum number of tokens to generate. For example, `--max-tokens 200` will limit the output to 200 tokens.
- `--temp`: Controls the randomness of the output. Higher temperature values (like 0.8) will produce more diverse and creative outputs, while lower values (like 0.2) will make the output more deterministic and focused. The default value is usually 0.6. A sketch combining these parameters is shown right after this list.
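
For instance, combining `--max-tokens` and `--temp`, a more deterministic, length-capped run might look like this (a sketch reusing the model identifier from above; the prompt is only an illustration):

```bash
# Low temperature for focused, repeatable output, capped at 200 tokens
mlx_lm.generate --model openbmb/MiniCPM-8B-mlx \
  --prompt "Explain the MLX framework in two sentences" \
  --max-tokens 200 \
  --temp 0.2
```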
The following command will use a higher temperature value and limit the output length:
```bash
mlx_lm.generate --model openbmb/MiniCPM-8B-mlx \
--prompt "tell me a story about a robot who discovered music" \
--max-tokens 500 \
--temp 0.8
```
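
If your installed version of mlx-lm also includes the `mlx_lm.chat` command (newer releases do; run `mlx_lm.chat --help` to check), you can chat with the model interactively instead of issuing one-shot generations:

```bash
# Start an interactive chat session with the same model
mlx_lm.chat --model openbmb/MiniCPM-8B-mlx
```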
|