Update README.md
README.md
@@ -132,7 +132,7 @@ Four evaluation metrics were employed across all subsets: language quality, over
 - **Instruction following:** This metric assessed the model's ability to follow specific instructions provided for each task.
 - **Overall score:** This metric combined the results from the previous three metrics, offering a comprehensive evaluation of the model's capabilities across all subsets.
 
-
+
 | Metric | [Vanila-Phi-3.5-Mini-4B](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | [GRAG-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | [GRAG-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | [GRAG-MERGED]() |
 |------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|
 | **Average_language_quality** | 85.88 | 89.61 | 89.1 | |
 | **extraction_recall_weighted_overall_score** | 35.2 | 52.3 | 48.8 | |