Update README.md
README.md
CHANGED
@@ -103,12 +103,11 @@ Four evaluation metrics were employed across all subsets: language quality, over
  - **Instruction following:** This metric assessed the model's ability to follow specific instructions provided for each task.
  - **Overall score:** This metric combined the results from the previous three metrics, offering a comprehensive evaluation of the model's capabilities across all subsets.

-
-
- | **Average_language_quality**
- | extraction_recall_overall_score
- | qa_multiple_references_overall_score
-
+ | Metric | [Vanila-Phi-3.5-Mini-4B](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | [GRAG-Phi3.5-SFT-Mini-4B](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | [GRAG-ORPO-Phi3-5-Mini-4B](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | [GRAG-Merge-Phi3.5-Mini-4B]() |
+ |------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|
+ | **Average_language_quality**             | 80.33 | 86.45 | | |
+ | **extraction_recall_overall_score**      | 64.43 | 65.68 | | |
+ | **qa_multiple_references_overall_score** | 59.82 | 63.12 | | |

  ## Model Details

  ### Data
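
For reference, the checkpoints compared in the added table are hosted on the Hugging Face Hub and can be loaded with the standard `transformers` API. The sketch below is illustrative only and not part of this commit; it assumes the SFT repo ID linked in the table, and the prompt and generation settings are placeholders.

```python
# Minimal sketch: load one of the checkpoints from the benchmark table and
# run a short chat-style generation. Repo ID taken from the table link;
# prompt and max_new_tokens are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Older transformers versions may additionally need trust_remote_code=True
# for Phi-3.5-based models.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

messages = [
    {"role": "user", "content": "Fasse den folgenden Absatz zusammen: ..."}
]

# Build the chat prompt with the model's own chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```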