ZhangCNN committed
Commit 24e71ed · 1 Parent(s): 1afb151

Update README.md

Files changed (1): README.md (+6 -6)
README.md CHANGED
@@ -74,11 +74,11 @@ MindGLM was trained using a combination of open-source datasets and self-constructed datasets
 5. Training Process
 The model underwent a three-phase training approach:
 
-Supervised Fine-tuning: Using the ChatGLM2-6B foundational model, MindGLM was fine-tuned with a dedicated dataset for psychological counseling.
+- Supervised Fine-tuning: Using the ChatGLM2-6B foundational model, MindGLM was fine-tuned with a dedicated dataset for psychological counseling.
 
-Reward Model Training: A reward model was trained to evaluate and score the responses of the fine-tuned model.
+- Reward Model Training: A reward model was trained to evaluate and score the responses of the fine-tuned model.
 
-Reinforcement Learning: The model was further aligned using the PPO (Proximal Policy Optimization) algorithm to ensure its responses align with human preferences.
+- Reinforcement Learning: The model was further aligned using the PPO (Proximal Policy Optimization) algorithm to ensure its responses align with human preferences.
 
 6. Limitations
 While MindGLM is a powerful tool, users should be aware of its limitations:
@@ -165,11 +165,11 @@ MindGLM was trained on a combination of open-source datasets and self-constructed datasets to ensure comprehensive
 5. Training Process
 The model adopted a three-phase training approach (LoRA was used in all three phases):
 
-Supervised Fine-tuning: Using the ChatGLM2-6B base model, MindGLM was fine-tuned on a dedicated psychological-counseling dataset.
+- Supervised Fine-tuning: Using the ChatGLM2-6B base model, MindGLM was fine-tuned on a dedicated psychological-counseling dataset.
 
-Reward Model Training: A reward model was trained to evaluate and score the responses of the fine-tuned model.
+- Reward Model Training: A reward model was trained to evaluate and score the responses of the fine-tuned model.
 
-Reinforcement Learning: The model was further aligned using the PPO (Proximal Policy Optimization) algorithm to ensure its responses match human preferences.
+- Reinforcement Learning: The model was further aligned using the PPO (Proximal Policy Optimization) algorithm to ensure its responses match human preferences.
 
 6. Limitations
 While MindGLM is a powerful tool, users should also be aware of its limitations:
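The three phases in the hunks above follow the standard RLHF recipe, and the Chinese section adds that LoRA was used throughout. Below is a minimal sketch of what phase 1 (supervised fine-tuning with LoRA on ChatGLM2-6B) could look like with the Hugging Face transformers and peft libraries; the example dialogue, hyperparameters, and single-step loop are illustrative, not taken from the MindGLM repo.

```python
import torch
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModel, AutoTokenizer

BASE = "THUDM/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModel.from_pretrained(BASE, trust_remote_code=True).half().cuda()

# LoRA keeps the 6B base weights frozen; only the low-rank adapters are trained.
model = get_peft_model(model, LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8, lora_alpha=32, lora_dropout=0.1,  # illustrative hyperparameters
    target_modules=["query_key_value"],    # ChatGLM2's fused QKV projection
))

def build_example(prompt: str, response: str):
    # Loss is computed only on the response: prompt positions are masked to -100.
    p_ids = tokenizer.encode(prompt)
    r_ids = tokenizer.encode(response, add_special_tokens=False) + [tokenizer.eos_token_id]
    input_ids = torch.tensor([p_ids + r_ids]).cuda()
    labels = torch.tensor([[-100] * len(p_ids) + r_ids]).cuda()
    return input_ids, labels

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
input_ids, labels = build_example(
    "Client: I have not been sleeping well lately. What should I do?",
    "Counselor: That sounds exhausting. When did the sleeplessness start?",
)
loss = model(input_ids=input_ids, labels=labels).loss  # causal-LM cross-entropy
loss.backward()
optimizer.step()
```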
 
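The README does not spell out phase 2's setup, so the sketch below assumes the common recipe: a scalar-output scorer trained on preference pairs (a preferred counselor reply versus a rejected one) with a pairwise Bradley-Terry ranking loss. The encoder hfl/chinese-roberta-wwm-ext and the example pair are placeholders.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

RM_BASE = "hfl/chinese-roberta-wwm-ext"  # illustrative encoder for scoring replies
tokenizer = AutoTokenizer.from_pretrained(RM_BASE)
reward_model = AutoModelForSequenceClassification.from_pretrained(RM_BASE, num_labels=1)

def score(prompt: str, reply: str) -> torch.Tensor:
    # Encode the (prompt, reply) pair and read off a single scalar reward.
    batch = tokenizer(prompt, reply, return_tensors="pt", truncation=True)
    return reward_model(**batch).logits.squeeze(-1)

optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)
prompt = "Client: I feel anxious before every exam."
chosen = "Counselor: That sounds stressful. When did you first notice it?"
rejected = "Just stop worrying about it."

# Pairwise loss: push the chosen reply's score above the rejected one's.
loss = -F.logsigmoid(score(prompt, chosen) - score(prompt, rejected)).mean()
loss.backward()
optimizer.step()
```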
 
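Phase 3 can be illustrated with the trl library's classic PPOTrainer loop (pre-1.0 API). The checkpoint path is hypothetical, and wrapping the LoRA-tuned ChatGLM2 in trl's value-head class is an assumption; the README only states that PPO was used to align responses with human preferences. The scalar reward stands in for the phase-2 reward model's output.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

SFT_MODEL = "path/to/mindglm-sft"  # hypothetical phase-1 (SFT) checkpoint
tokenizer = AutoTokenizer.from_pretrained(SFT_MODEL, trust_remote_code=True)

# PPO needs the policy (with a value head) plus a frozen reference copy,
# which anchors a KL penalty so responses do not drift far from the SFT model.
policy = AutoModelForCausalLMWithValueHead.from_pretrained(SFT_MODEL, trust_remote_code=True)
ref = AutoModelForCausalLMWithValueHead.from_pretrained(SFT_MODEL, trust_remote_code=True)

config = PPOConfig(batch_size=1, mini_batch_size=1, learning_rate=1e-5)
ppo_trainer = PPOTrainer(config, policy, ref, tokenizer)

query = tokenizer.encode(
    "Client: I feel like a burden to everyone around me.", return_tensors="pt"
)[0]
response = ppo_trainer.generate([query], return_prompt=False, max_new_tokens=64)[0]

# Placeholder scalar: in the real loop this comes from the phase-2 reward model.
reward = torch.tensor(0.8)
stats = ppo_trainer.step([query], [response], [reward])  # one PPO update
```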