CausalLM
/

miniG

@@ -49,7 +49,7 @@ Despite the absence of thorough alignment with human preferences, the model is u
 **Inference Parameters:** Our observations suggest that, if one desires to achieve results with fewer hallucinations, it is advisable to employ sampling with top_p=0.8 followed by a temperature setting of 0.3, or alternatively, to use pure temperature sampling with a setting of 0.2. **In general, a lower temperature is required compared to similar models**, which we tentatively attribute to overfitting on the vast dataset. The model inference should refer to THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b. We only guarantee best performance when using transformers for inference. In our testing, we also used lmdeploy, which resulted in a significant performance degradation for multimodal input.
-**Regarding Formatting:** We strongly recommend you double-check your input to ensure: 1. The system prompt is not empty. Even something as simple as "You are a helpful assistant." is expected. 2. Each role's content ends with a newline character ('\n') before being concatenated with the <|role|> tag. 3. There is always a newline character after the <|role|> tag. This will help ensure proper parsing and processing of your input.
 **Regarding [Benchmark Scores](https://huggingface.co/spaces/JosephusCheung/Goodharts-Law-on-Benchmarks-a-Page-for-miniG):** Generally, you shouldn't worry too much about them, as people can always train specifically to achieve good results. We mainly use them as a smoke test, a quick check to ensure no major regressions have occurred. In fact, if you actually read through the benchmark questions themselves, you'll often find yourself chuckling at how inane, low-quality, or even downright silly they are.
@@ -75,7 +75,7 @@ Despite the absence of thorough alignment with human preferences, the model is u
 **推理参数：**我们的观察表明，如果想要减少幻觉结果，建议使用top_p=0.8的采样方式，然后设置temperature为0.3，或者使用纯粹的temperature采样，设置为0.2。**总体来说，相比类似的模型，该模型需要较低的temperature**，我们暂时将其归因于在庞大数据集上的过拟合。模型推理应参考 THUDM/glm-4-9b-chat-1m 和 THUDM/glm-4v-9b。我们只保证使用 transformer 进行推理时的性能最佳。在我们的测试中，我们还使用了 lmdeploy，这导致多模态输入的性能显著下降。
-**关于格式：**我们强烈建议您仔细检查输入内容，以确保：1. 系统提示不为空。即使是像“You are a helpful assistant.”这样简单的提示也是预期的。2. 每个角色的内容在与 <|role|> 标签连接之前都以换行符 ('\n') 结尾。3. <|role|> 标签后始终有一个换行符。这将有助于确保正确解析和处理您的输入。
 **关于[基准测试分数](https://huggingface.co/spaces/JosephusCheung/Goodharts-Law-on-Benchmarks-a-Page-for-miniG)：**一般来说，你不应该太过在意这些分数，因为人们总是可以专门训练以取得好成绩。我们主要将它们作为一个冒烟测试，一种快速检查，确保没有发生重大回退。事实上，如果你真的去阅读这些基准测试问题本身，你常常会发现自己会忍不住笑出声来，因为它们是多么无聊、低质量，甚至荒谬可笑。

 **Inference Parameters:** Our observations suggest that, if one desires to achieve results with fewer hallucinations, it is advisable to employ sampling with top_p=0.8 followed by a temperature setting of 0.3, or alternatively, to use pure temperature sampling with a setting of 0.2. **In general, a lower temperature is required compared to similar models**, which we tentatively attribute to overfitting on the vast dataset. The model inference should refer to THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b. We only guarantee best performance when using transformers for inference. In our testing, we also used lmdeploy, which resulted in a significant performance degradation for multimodal input.
+**Regarding Formatting:** We strongly recommend you double-check your input to ensure: 1. The system prompt is not empty. Even something as simple as "You are a helpful assistant." is expected. 2. There is always a newline character after the <|role|> tag. This will help ensure proper parsing and processing of your input.
 **Regarding [Benchmark Scores](https://huggingface.co/spaces/JosephusCheung/Goodharts-Law-on-Benchmarks-a-Page-for-miniG):** Generally, you shouldn't worry too much about them, as people can always train specifically to achieve good results. We mainly use them as a smoke test, a quick check to ensure no major regressions have occurred. In fact, if you actually read through the benchmark questions themselves, you'll often find yourself chuckling at how inane, low-quality, or even downright silly they are.
 **推理参数：**我们的观察表明，如果想要减少幻觉结果，建议使用top_p=0.8的采样方式，然后设置temperature为0.3，或者使用纯粹的temperature采样，设置为0.2。**总体来说，相比类似的模型，该模型需要较低的temperature**，我们暂时将其归因于在庞大数据集上的过拟合。模型推理应参考 THUDM/glm-4-9b-chat-1m 和 THUDM/glm-4v-9b。我们只保证使用 transformer 进行推理时的性能最佳。在我们的测试中，我们还使用了 lmdeploy，这导致多模态输入的性能显著下降。
+**关于格式：**我们强烈建议您仔细检查输入内容，以确保：1. 系统提示不为空。即使是像“You are a helpful assistant.”这样简单的提示也是预期的。2. <|role|> 标签后始终有一个换行符。这将有助于确保正确解析和处理您的输入。
 **关于[基准测试分数](https://huggingface.co/spaces/JosephusCheung/Goodharts-Law-on-Benchmarks-a-Page-for-miniG)：**一般来说，你不应该太过在意这些分数，因为人们总是可以专门训练以取得好成绩。我们主要将它们作为一个冒烟测试，一种快速检查，确保没有发生重大回退。事实上，如果你真的去阅读这些基准测试问题本身，你常常会发现自己会忍不住笑出声来，因为它们是多么无聊、低质量，甚至荒谬可笑。