Update README.md
README.md (changed)
@@ -74,11 +74,11 @@ MindGLM was trained using a combination of open-source datasets and self-constructed datasets …

5. Training Process

The model underwent a three-phase training approach (minimal sketches of each phase follow the list):

- Supervised Fine-tuning: Using the ChatGLM2-6B foundation model, MindGLM was fine-tuned with a dedicated dataset for psychological counseling.
- Reward Model Training: A reward model was trained to evaluate and score the responses of the fine-tuned model.
- Reinforcement Learning: The model was further aligned using the PPO (Proximal Policy Optimization) algorithm to ensure its responses align with human preferences.
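To make the first phase concrete, here is a minimal LoRA supervised fine-tuning sketch built on Hugging Face transformers, peft, and datasets. The dataset file name, prompt handling, and hyperparameters are illustrative assumptions, not the project's released training script.

```python
# Minimal LoRA SFT sketch (illustrative; not the released MindGLM training script).
# Assumes a JSON file of {"prompt": ..., "response": ...} counseling dialogues.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModel, AutoTokenizer, Trainer, TrainingArguments

base = "THUDM/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModel.from_pretrained(base, trust_remote_code=True)

# Attach LoRA adapters so only a small set of extra weights is trained.
lora = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1,
                  target_modules=["query_key_value"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

def tokenize(example):
    # Concatenate prompt and response into one training sequence; padding tokens
    # are left in the loss here for brevity (a real script would mask them with -100).
    enc = tokenizer(example["prompt"] + example["response"],
                    max_length=1024, truncation=True, padding="max_length")
    enc["labels"] = list(enc["input_ids"])
    return enc

data = load_dataset("json", data_files="counseling_sft.json")["train"]
data = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mindglm-sft", per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=1e-4, logging_steps=10),
    train_dataset=data,
)
trainer.train()
```

The key point is that `task_type` and `target_modules` tell peft which projection matrices inside ChatGLM2's attention blocks receive the low-rank adapters, so the 6B base weights stay frozen.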
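The second phase can be pictured as a pairwise ranking objective: given a prompt, a human-preferred reply, and a rejected reply, the reward model should assign the preferred reply a higher scalar score. The backbone name and data layout below are assumptions; the project may use a different scorer.

```python
# Pairwise reward-model sketch (illustrative; backbone and data format are assumptions).
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "hfl/chinese-roberta-wwm-ext"          # placeholder Chinese encoder
tokenizer = AutoTokenizer.from_pretrained(name)
reward_model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

def score(prompt: str, response: str) -> torch.Tensor:
    # The single logit of the regression head serves as the reward score.
    batch = tokenizer(prompt, response, return_tensors="pt",
                      truncation=True, max_length=512)
    return reward_model(**batch).logits.squeeze(-1)

def train_step(prompt: str, chosen: str, rejected: str) -> float:
    # Ranking loss: push score(chosen) above score(rejected).
    loss = -F.logsigmoid(score(prompt, chosen) - score(prompt, rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The scalar produced by `score()` is exactly the quantity the PPO stage consumes as its reward signal.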
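The third phase is an RLHF-style PPO loop. The sketch below follows the classic PPOTrainer interface from the trl library and uses gpt2 as a stand-in policy so the example stays small and runnable; in MindGLM's pipeline the LoRA-tuned ChatGLM2-6B checkpoint and the phase-2 reward model would replace the stand-ins, and the exact trl API depends on the library version.

```python
# PPO alignment sketch with trl's classic API (illustrative; gpt2 is only a stand-in
# policy, and reward_fn is a placeholder for the trained reward model).
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

policy_name = "gpt2"                                  # stand-in for the SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained(policy_name)
tokenizer.pad_token = tokenizer.eos_token

# Policy with a value head for PPO, plus a frozen reference copy for the KL penalty.
policy = AutoModelForCausalLMWithValueHead.from_pretrained(policy_name)
ref_policy = AutoModelForCausalLMWithValueHead.from_pretrained(policy_name)

config = PPOConfig(batch_size=1, mini_batch_size=1, learning_rate=1e-5)
ppo_trainer = PPOTrainer(config, policy, ref_policy, tokenizer)

def reward_fn(prompt: str, reply: str) -> torch.Tensor:
    # Placeholder: the phase-2 reward model would score the (prompt, reply) pair here.
    return torch.tensor(1.0)

prompts = ["I have been under a lot of stress lately and cannot sleep. What should I do?"]

for prompt in prompts:
    query = tokenizer(prompt, return_tensors="pt").input_ids[0]
    # Sample a reply from the current policy.
    response = ppo_trainer.generate([query], return_prompt=False, do_sample=True,
                                    max_new_tokens=64,
                                    pad_token_id=tokenizer.eos_token_id)[0]
    reply = tokenizer.decode(response, skip_special_tokens=True)
    # One PPO update: raise the likelihood of high-reward replies while the
    # built-in KL penalty keeps the policy close to the reference model.
    ppo_trainer.step([query], [response], [reward_fn(prompt, reply)])
```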
6. Limitations

While MindGLM is a powerful tool, users should be aware of its limitations:
@@ -165,11 +165,11 @@ MindGLM is trained using a combination of open-source datasets and self-built datasets to ensure comprehensive …

5. Training Process

The model adopted a three-phase training approach (LoRA was used in all three phases):

- Supervised Fine-tuning: Using the ChatGLM2-6B base model, MindGLM was fine-tuned with a dedicated psychological-counseling dataset.
- Reward Model Training: A reward model was trained to evaluate and score the responses of the fine-tuned model.
- Reinforcement Learning: The model was further aligned using the PPO (Proximal Policy Optimization) algorithm to ensure its responses match human preferences.
6. Limitations

While MindGLM is a powerful tool, users should also be aware of its limitations: