Update README.md

README.md
---
Base Model: Just Merged ~ No Training Gates After Merge
### Model Overview

I have developed a Mixture of Experts (MoE) architecture with two always-active experts designed to work together for Python instruction tuning. Each expert possesses a distinct skill:

- **Expert 1**: Specializes in generating Mermaid diagrams, primarily from Python code, which requires a deep understanding of code structure and logic.
- **Expert 2**: Focuses on strict context obedience, ensuring that the model generates outputs based only on the provided instructions.

Together, the two experts cover complementary skills; the sketch below illustrates the layout.
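It is a minimal, illustrative PyTorch example rather than the actual implementation: the class name, dimensions, and gate design are assumptions, and its only purpose is to show both experts processing every token while a learned gate blends their outputs.

```python
import torch
import torch.nn as nn


class AlwaysActiveTwoExpertMoE(nn.Module):
    """Illustrative two-expert MoE feed-forward block with dense (always-active) routing."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048):
        super().__init__()
        # Two independent feed-forward experts, e.g. one leaning toward Mermaid
        # diagram generation and one toward strict context obedience.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(2)
        )
        # The gate scores both experts for every token; with only two experts
        # and no top-k truncation, nothing is ever routed away.
        self.gate = nn.Linear(d_model, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        weights = torch.softmax(self.gate(x), dim=-1)                            # (batch, seq, 2)
        expert_out = torch.stack([expert(x) for expert in self.experts], dim=-1)  # (batch, seq, d_model, 2)
        # Every token receives a weighted blend of BOTH expert outputs.
        return (expert_out * weights.unsqueeze(-2)).sum(dim=-1)


if __name__ == "__main__":
    layer = AlwaysActiveTwoExpertMoE()
    tokens = torch.randn(2, 16, 512)
    print(layer(tokens).shape)  # torch.Size([2, 16, 512])
```

Because there are only two experts and no top-k truncation, the gate never drops an expert; it only decides how much each one contributes to a given token.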
### Why Always-Active MoE Is Optimal

In this model, both experts are always active for each token, allowing them to complement each other:

- **Expert 1's understanding of Python code structure** enhances the model's ability to generate correct, well-structured Python code.
- **Expert 2's context obedience** ensures that the output remains aligned with the user's instructions, preventing unnecessary or irrelevant output, such as Mermaid diagrams that were not explicitly requested.

This setup allows me to efficiently train the model for Python instruction following. By leveraging both experts simultaneously, I ensure that the model generates syntactically correct Python code while strictly adhering to user prompts.
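The hypothetical snippet below shows that contrast with the Hugging Face `transformers` API: `MODEL_ID` is a placeholder, the tokenizer is assumed to ship a chat template, and `transformers` plus `accelerate` are assumed to be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "<this-repo-id>"  # placeholder: substitute the actual repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")


def ask(prompt: str) -> str:
    # Build a single-turn chat prompt and generate a reply.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


# Plain Python request: the reply should contain only code, no Mermaid diagram.
print(ask("Write a Python function that merges two sorted lists."))

# Diagram explicitly requested: only now should Mermaid output appear.
print(ask("Write a Python function that merges two sorted lists, "
          "then draw a Mermaid flowchart of its control flow."))
```

The comments describe the intended behavior; actual outputs depend on the model's training.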