JosephusCheung committed · Commit dc05c42 · verified · 1 Parent(s): 582309e

Update README.md

Files changed (1)
1. README.md +2 -2
README.md CHANGED
@@ -47,7 +47,7 @@ Despite the absence of thorough alignment with human preferences, the model is u
 
 **Cautionary Notes:** **It is strongly recommended to utilize a standardized implementation for inference**, such as Hugging Face Transformers, to avoid the significant performance degradation that might occur when using accelerated kernels like vllm or lmdeploy - not to mention the potentially catastrophic effects of model quantization. **As of now, these accelerated inference implementations are known to severely compromise effective** vision inference, though they have a less pronounced impact on pure text performance.
 
-**Inference Parameters:** Our observations suggest that, if one desires to achieve results with fewer hallucinations, it is advisable to employ sampling with top_p=0.8 followed by a temperature setting of 0.3, or alternatively, to use pure temperature sampling with a setting of 0.2. **In general, a lower temperature is required compared to similar models**, which we tentatively attribute to overfitting on the vast dataset.
+**Inference Parameters:** Our observations suggest that, if one desires to achieve results with fewer hallucinations, it is advisable to employ sampling with top_p=0.8 followed by a temperature setting of 0.3, or alternatively, to use pure temperature sampling with a setting of 0.2. **In general, a lower temperature is required compared to similar models**, which we tentatively attribute to overfitting on the vast dataset. For model inference, refer to THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b; we only guarantee best performance when using Transformers for inference. In our testing, we also used lmdeploy, which resulted in a significant performance degradation on multimodal input.
 
 **Regarding Formatting:** We strongly recommend you double-check your input to ensure: 1. The system prompt is not empty. Even something as simple as "You are a helpful assistant." is expected. 2. Each role's content ends with a newline character ('\n') before being concatenated with the <|role|> tag. 3. There is always a newline character after the <|role|> tag. This will help ensure proper parsing and processing of your input.
 
@@ -73,7 +73,7 @@ Despite the absence of thorough alignment with human preferences, the model is u
 
 **Cautionary Notes:** **It is strongly recommended to use a standardized inference implementation**, such as Hugging Face Transformers, to avoid the significant performance degradation that can occur when using accelerated kernels such as vllm or lmdeploy, not to mention the potentially catastrophic effects of model quantization. **At present, these accelerated inference implementations are known to severely impair** the effectiveness of vision inference, although their impact on pure-text performance is smaller.
 
-**Inference Parameters:** Our observations indicate that, to obtain results with fewer hallucinations, it is advisable to sample with top_p=0.8 and then a temperature of 0.3, or to use pure temperature sampling set to 0.2. **Overall, this model requires a lower temperature than comparable models**, which we tentatively attribute to overfitting on the vast dataset.
+**Inference Parameters:** Our observations indicate that, to obtain results with fewer hallucinations, it is advisable to sample with top_p=0.8 and then a temperature of 0.3, or to use pure temperature sampling set to 0.2. **Overall, this model requires a lower temperature than comparable models**, which we tentatively attribute to overfitting on the vast dataset. For model inference, refer to THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b; we only guarantee best performance when using Transformers for inference. In our testing, we also used lmdeploy, which led to a significant performance degradation on multimodal input.
 
 **Regarding Formatting:** We strongly recommend that you double-check your input to ensure: 1. The system prompt is not empty; even something as simple as "You are a helpful assistant." is expected. 2. Each role's content ends with a newline character ('\n') before being concatenated with the <|role|> tag. 3. There is always a newline character after the <|role|> tag. This will help ensure your input is parsed and processed correctly.
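Since the updated paragraphs point readers at THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b for reference inference code, a minimal Transformers-based sketch along those lines may help. It follows the usage pattern documented for THUDM/glm-4v-9b; the `trust_remote_code` loading, the `image` key in the chat template, and the file name `example.jpg` are assumptions for illustration, not taken from this README.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# Standardized Transformers inference, as the README recommends over
# accelerated kernels like vllm/lmdeploy. trust_remote_code loads the
# repo's custom chatglm modeling and template code.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4v-9b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
).eval()

# Build a multimodal chat turn; a non-empty system prompt is expected.
image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
inputs = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "image": image, "content": "Describe this image."},
    ],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.8, temperature=0.3)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```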
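To make the two recommended decoding settings concrete, here is a small sketch of the equivalent Transformers generation configs. `GenerationConfig` and the sampling parameter names are standard Transformers API; `max_new_tokens=256` is an arbitrary illustrative value.

```python
from transformers import GenerationConfig

# Option 1: nucleus sampling with top_p=0.8 followed by a low temperature of 0.3,
# the setting the README suggests for fewer hallucinations.
fewer_hallucinations = GenerationConfig(
    do_sample=True, top_p=0.8, temperature=0.3, max_new_tokens=256
)

# Option 2: pure temperature sampling at 0.2 (top_p=1.0 disables nucleus truncation).
pure_temperature = GenerationConfig(
    do_sample=True, top_p=1.0, temperature=0.2, max_new_tokens=256
)

# Usage with a loaded model and tokenized inputs:
# outputs = model.generate(**inputs, generation_config=fewer_hallucinations)
```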
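The three formatting rules can also be read as a tiny prompt builder. The sketch below only illustrates the stated rules; it is not the repo's actual chat template, which should come from the model's tokenizer.

```python
def build_prompt(turns: list[tuple[str, str]]) -> str:
    """Concatenate chat turns following the README's three formatting rules."""
    parts = []
    for role, content in turns:
        if not content.endswith("\n"):
            content += "\n"  # rule 2: content ends with '\n' before the next <|role|> tag
        parts.append(f"<|{role}|>\n{content}")  # rule 3: newline right after the <|role|> tag
    parts.append("<|assistant|>\n")  # leave the assistant turn open for generation
    return "".join(parts)

prompt = build_prompt([
    ("system", "You are a helpful assistant."),  # rule 1: system prompt must not be empty
    ("user", "Describe the image in one sentence."),
])
print(prompt)
```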