JosephusCheung committed · Commit dc05c42 · verified · 1 Parent(s): 582309e

Update README.md

Files changed (1)
1. README.md +2 -2
README.md CHANGED
@@ -47,7 +47,7 @@ Despite the absence of thorough alignment with human preferences, the model is u
 
 **Cautionary Notes:** **It is strongly recommended to utilize a standardized implementation for inference**, such as Hugging Face Transformers, to avoid the significant performance degradation that might occur when using accelerated kernels like vllm or lmdeploy - not to mention the potentially catastrophic effects of model quantization. **As of now, these accelerated inference implementations are known to severely compromise effective** vision inference, though they have a less pronounced impact on pure text performance.
 
-**Inference Parameters:** Our observations suggest that, if one desires to achieve results with fewer hallucinations, it is advisable to employ sampling with top_p=0.8 followed by a temperature setting of 0.3, or alternatively, to use pure temperature sampling with a setting of 0.2. **In general, a lower temperature is required compared to similar models**, which we tentatively attribute to overfitting on the vast dataset.
+**Inference Parameters:** Our observations suggest that, if one desires to achieve results with fewer hallucinations, it is advisable to employ sampling with top_p=0.8 followed by a temperature setting of 0.3, or alternatively, to use pure temperature sampling with a setting of 0.2. **In general, a lower temperature is required compared to similar models**, which we tentatively attribute to overfitting on the vast dataset. For model inference, refer to THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b; we only guarantee best performance when using Transformers for inference. In our testing, we also used lmdeploy, which resulted in a significant performance degradation on multimodal input.
 
 **Regarding Formatting:** We strongly recommend you double-check your input to ensure: 1. The system prompt is not empty. Even something as simple as "You are a helpful assistant." is expected. 2. Each role's content ends with a newline character ('\n') before being concatenated with the <|role|> tag. 3. There is always a newline character after the <|role|> tag. This will help ensure proper parsing and processing of your input.
 
@@ -73,7 +73,7 @@ Despite the absence of thorough alignment with human preferences, the model is u
 
 **Cautionary Notes:** **It is strongly recommended to use a standardized inference implementation**, such as Hugging Face Transformers, to avoid the significant performance degradation that can occur when using accelerated kernels such as vllm or lmdeploy, not to mention the potentially catastrophic effects of model quantization. **At present, these accelerated inference implementations are known to severely impair** the effectiveness of vision inference, although their impact on pure-text performance is smaller.
 
-**Inference Parameters:** Our observations indicate that, to obtain results with fewer hallucinations, it is advisable to sample with top_p=0.8 and then a temperature of 0.3, or to use pure temperature sampling set to 0.2. **Overall, this model requires a lower temperature than comparable models**, which we tentatively attribute to overfitting on the vast dataset.
+**Inference Parameters:** Our observations indicate that, to obtain results with fewer hallucinations, it is advisable to sample with top_p=0.8 and then a temperature of 0.3, or to use pure temperature sampling set to 0.2. **Overall, this model requires a lower temperature than comparable models**, which we tentatively attribute to overfitting on the vast dataset. For model inference, refer to THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b; we only guarantee best performance when using Transformers for inference. In our testing, we also used lmdeploy, which led to a significant performance degradation on multimodal input.
 
 **Regarding Formatting:** We strongly recommend that you double-check your input to ensure: 1. The system prompt is not empty; even something as simple as "You are a helpful assistant." is expected. 2. Each role's content ends with a newline character ('\n') before being concatenated with the <|role|> tag. 3. There is always a newline character after the <|role|> tag. This will help ensure your input is parsed and processed correctly.
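Since the updated paragraphs point readers at THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b for reference inference code, a minimal Transformers-based sketch along those lines may help. It follows the usage pattern documented for THUDM/glm-4v-9b; the `trust_remote_code` loading, the `image` key in the chat template, and the file name `example.jpg` are assumptions for illustration, not taken from this README.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# Standardized Transformers inference, as the README recommends over
# accelerated kernels like vllm/lmdeploy. trust_remote_code loads the
# repo's custom chatglm modeling and template code.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4v-9b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
).eval()

# Build a multimodal chat turn; a non-empty system prompt is expected.
image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
inputs = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "image": image, "content": "Describe this image."},
    ],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.8, temperature=0.3)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```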
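To make the two recommended decoding settings concrete, here is a small sketch of the equivalent Transformers generation configs. `GenerationConfig` and the sampling parameter names are standard Transformers API; `max_new_tokens=256` is an arbitrary illustrative value.

```python
from transformers import GenerationConfig

# Option 1: nucleus sampling with top_p=0.8 followed by a low temperature of 0.3,
# the setting the README suggests for fewer hallucinations.
fewer_hallucinations = GenerationConfig(
    do_sample=True, top_p=0.8, temperature=0.3, max_new_tokens=256
)

# Option 2: pure temperature sampling at 0.2 (top_p=1.0 disables nucleus truncation).
pure_temperature = GenerationConfig(
    do_sample=True, top_p=1.0, temperature=0.2, max_new_tokens=256
)

# Usage with a loaded model and tokenized inputs:
# outputs = model.generate(**inputs, generation_config=fewer_hallucinations)
```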
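The three formatting rules can also be read as a tiny prompt builder. The sketch below only illustrates the stated rules; it is not the repo's actual chat template, which should come from the model's tokenizer.

```python
def build_prompt(turns: list[tuple[str, str]]) -> str:
    """Concatenate chat turns following the README's three formatting rules."""
    parts = []
    for role, content in turns:
        if not content.endswith("\n"):
            content += "\n"  # rule 2: content ends with '\n' before the next <|role|> tag
        parts.append(f"<|{role}|>\n{content}")  # rule 3: newline right after the <|role|> tag
    parts.append("<|assistant|>\n")  # leave the assistant turn open for generation
    return "".join(parts)

prompt = build_prompt([
    ("system", "You are a helpful assistant."),  # rule 1: system prompt must not be empty
    ("user", "Describe the image in one sentence."),
])
print(prompt)
```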