Update README.md
README.md CHANGED
@@ -133,15 +133,16 @@ Four evaluation metrics were employed across all subsets: language quality, over
 - **Overall score:** This metric combined the results from the previous three metrics, offering a comprehensive evaluation of the model's capabilities across all subsets.
 
 
-| Metric | [Vanila-Phi-3.5-Mini-4B](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | [GRAG-PHI-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | [GRAG-PHI-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | [GRAG-PHI-MERGED]() | GPT-3.5-TURBO |
-|------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|----------------|
-|
-| **
-|
-|
-|
-|
-|
+| Metric | [Vanila-Phi-3.5-Mini-4B](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | **[GRAG-PHI-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI)** | [GRAG-PHI-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | [GRAG-PHI-MERGED]() | GPT-3.5-TURBO |
+|------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|----------------|
+| Average_language_quality | 75.11 | **78.88** | 78.13 | 85.41 | 91.86 |
+| **OVERALL SCORES (weighted):** | | | | | |
+| extraction_recall | 18.0 | **37.5** | 32.0 | 61.8 | 87.2 |
+| qa_multiple_references | 65.8 | **70.6** | 74.8 | 84.8 | 77.2 |
+| qa_without_time_difference | 71.2 | **88.0** | 87.3 | 88.0 | 83.1 |
+| qa_with_time_difference | 64.6 | **89.3** | 86.9 | 89.1 | 83.2 |
+| relevant_context | 72.3 | **72.8** | 69.1 | 84.4 | 89.5 |
+| summarizations | 74.6 | **83.2** | 81.1 | 84.9 | 86.9 |
 
 ## Model Details
 
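To make the aggregation behind the **OVERALL SCORES (weighted)** rows concrete, the sketch below combines per-subset scores into a single weighted score. The README does not state the actual weighting scheme used for the table, so the `overall_score` helper and the equal weights here are illustrative assumptions, not the evaluation code behind the reported numbers.

```python
def overall_score(subset_scores: dict[str, float],
                  weights: dict[str, float] | None = None) -> float:
    """Combine per-subset scores into one weighted overall score."""
    if weights is None:
        # Assumption: equal weights; the README does not specify the scheme.
        weights = {name: 1.0 for name in subset_scores}
    total = sum(weights[name] for name in subset_scores)
    return sum(score * weights[name] for name, score in subset_scores.items()) / total


# GRAG-PHI-SFT per-subset scores copied from the table above.
grag_phi_sft = {
    "extraction_recall": 37.5,
    "qa_multiple_references": 70.6,
    "qa_without_time_difference": 88.0,
    "qa_with_time_difference": 89.3,
    "relevant_context": 72.8,
    "summarizations": 83.2,
}

print(f"Overall score (equal weights): {overall_score(grag_phi_sft):.1f}")
```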