Text Generation
Transformers
Safetensors
chatglm
feature-extraction
custom_code
JosephusCheung commited on
Commit
49e75aa
·
verified ·
1 Parent(s): 4832d5c

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -49,7 +49,7 @@ Despite the absence of thorough alignment with human preferences, the model is u
49
 
50
  **Inference Parameters:** Our observations suggest that, if one desires to achieve results with fewer hallucinations, it is advisable to employ sampling with top_p=0.8 followed by a temperature setting of 0.3, or alternatively, to use pure temperature sampling with a setting of 0.2. **In general, a lower temperature is required compared to similar models**, which we tentatively attribute to overfitting on the vast dataset. The model inference should refer to THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b. We only guarantee best performance when using transformers for inference. In our testing, we also used lmdeploy, which resulted in a significant performance degradation for multimodal input.
51
 
52
- **Regarding Formatting:** We strongly recommend you double-check your input to ensure: 1. The system prompt is not empty. Even something as simple as "You are a helpful assistant." is expected. 2. Each role's content ends with a newline character ('\n') before being concatenated with the <|role|> tag. 3. There is always a newline character after the <|role|> tag. This will help ensure proper parsing and processing of your input.
53
 
54
  **Regarding [Benchmark Scores](https://huggingface.co/spaces/JosephusCheung/Goodharts-Law-on-Benchmarks-a-Page-for-miniG):** Generally, you shouldn't worry too much about them, as people can always train specifically to achieve good results. We mainly use them as a smoke test, a quick check to ensure no major regressions have occurred. In fact, if you actually read through the benchmark questions themselves, you'll often find yourself chuckling at how inane, low-quality, or even downright silly they are.
55
 
@@ -75,7 +75,7 @@ Despite the absence of thorough alignment with human preferences, the model is u
75
 
76
  **推理参数:**我们的观察表明,如果想要减少幻觉结果,建议使用top_p=0.8的采样方式,然后设置temperature为0.3,或者使用纯粹的temperature采样,设置为0.2。**总体来说,相比类似的模型,该模型需要较低的temperature**,我们暂时将其归因于在庞大数据集上的过拟合。模型推理应参考 THUDM/glm-4-9b-chat-1m 和 THUDM/glm-4v-9b。我们只保证使用 transformer 进行推理时的性能最佳。在我们的测试中,我们还使用了 lmdeploy,这导致多模态输入的性能显著下降。
77
 
78
- **关于格式:**我们强烈建议您仔细检查输入内容,以确保:1. 系统提示不为空。即使是像“You are a helpful assistant.”这样简单的提示也是预期的。2. 每个角色的内容在与 <|role|> 标签连接之前都以换行符 ('\n') 结尾。3. <|role|> 标签后始终有一个换行符。这将有助于确保正确解析和处理您的输入。
79
 
80
  **关于[基准测试分数](https://huggingface.co/spaces/JosephusCheung/Goodharts-Law-on-Benchmarks-a-Page-for-miniG):**一般来说,你不应该太过在意这些分数,因为人们总是可以专门训练以取得好成绩。我们主要将它们作为一个冒烟测试,一种快速检查,确保没有发生重大回退。事实上,如果你真的去阅读这些基准测试问题本身,你常常会发现自己会忍不住笑出声来,因为它们是多么无聊、低质量,甚至荒谬可笑。
81
 
 
49
 
50
  **Inference Parameters:** Our observations suggest that, if one desires to achieve results with fewer hallucinations, it is advisable to employ sampling with top_p=0.8 followed by a temperature setting of 0.3, or alternatively, to use pure temperature sampling with a setting of 0.2. **In general, a lower temperature is required compared to similar models**, which we tentatively attribute to overfitting on the vast dataset. The model inference should refer to THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b. We only guarantee best performance when using transformers for inference. In our testing, we also used lmdeploy, which resulted in a significant performance degradation for multimodal input.
51
 
52
+ **Regarding Formatting:** We strongly recommend you double-check your input to ensure: 1. The system prompt is not empty. Even something as simple as "You are a helpful assistant." is expected. 2. There is always a newline character after the <|role|> tag. This will help ensure proper parsing and processing of your input.
53
 
54
  **Regarding [Benchmark Scores](https://huggingface.co/spaces/JosephusCheung/Goodharts-Law-on-Benchmarks-a-Page-for-miniG):** Generally, you shouldn't worry too much about them, as people can always train specifically to achieve good results. We mainly use them as a smoke test, a quick check to ensure no major regressions have occurred. In fact, if you actually read through the benchmark questions themselves, you'll often find yourself chuckling at how inane, low-quality, or even downright silly they are.
55
 
 
75
 
76
  **推理参数:**我们的观察表明,如果想要减少幻觉结果,建议使用top_p=0.8的采样方式,然后设置temperature为0.3,或者使用纯粹的temperature采样,设置为0.2。**总体来说,相比类似的模型,该模型需要较低的temperature**,我们暂时将其归因于在庞大数据集上的过拟合。模型推理应参考 THUDM/glm-4-9b-chat-1m 和 THUDM/glm-4v-9b。我们只保证使用 transformer 进行推理时的性能最佳。在我们的测试中,我们还使用了 lmdeploy,这导致多模态输入的性能显著下降。
77
 
78
+ **关于格式:**我们强烈建议您仔细检查输入内容,以确保:1. 系统提示不为空。即使是像“You are a helpful assistant.”这样简单的提示也是预期的。2. <|role|> 标签后始终有一个换行符。这将有助于确保正确解析和处理您的输入。
79
 
80
  **关于[基准测试分数](https://huggingface.co/spaces/JosephusCheung/Goodharts-Law-on-Benchmarks-a-Page-for-miniG):**一般来说,你不应该太过在意这些分数,因为人们总是可以专门训练以取得好成绩。我们主要将它们作为一个冒烟测试,一种快速检查,确保没有发生重大回退。事实上,如果你真的去阅读这些基准测试问题本身,你常常会发现自己会忍不住笑出声来,因为它们是多么无聊、低质量,甚至荒谬可笑。
81