Update README.md
README.md CHANGED
@@ -133,15 +133,16 @@ Four evaluation metrics were employed across all subsets: language quality, over
 - **Overall score:** This metric combined the results from the previous three metrics, offering a comprehensive evaluation of the model's capabilities across all subsets.
 
 
-| Metric | [Vanila-Phi-3.5-Mini-4B](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | [GRAG-PHI-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | [GRAG-PHI-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | [GRAG-PHI-MERGED]() | GPT-3.5-TURBO |
-|------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|----------------|
-|
-| **
-|
-|
-|
-|
-|
+| Metric | [Vanila-Phi-3.5-Mini-4B](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | **[GRAG-PHI-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI)** | [GRAG-PHI-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | [GRAG-PHI-MERGED]() | GPT-3.5-TURBO |
+|------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|----------------|
+| Average_language_quality | 75.11 | **78.88** | 78.13 | 85.41 | 91.86 |
+| **OVERALL SCORES (weighted):** | | | | | |
+| extraction_recall | 18.0 | **37.5** | 32.0 | 61.8 | 87.2 |
+| qa_multiple_references | 65.8 | **70.6** | 74.8 | 84.8 | 77.2 |
+| qa_without_time_difference | 71.2 | **88.0** | 87.3 | 88.0 | 83.1 |
+| qa_with_time_difference | 64.6 | **89.3** | 86.9 | 89.1 | 83.2 |
+| relevant_context | 72.3 | **72.8** | 69.1 | 84.4 | 89.5 |
+| summarizations | 74.6 | **83.2** | 81.1 | 84.9 | 86.9 |
 
 ## Model Details
 
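To make the aggregation behind the **OVERALL SCORES (weighted)** rows concrete, the sketch below combines per-subset scores into a single weighted score. The README does not state the actual weighting scheme used for the table, so the `overall_score` helper and the equal weights here are illustrative assumptions, not the evaluation code behind the reported numbers.

```python
def overall_score(subset_scores: dict[str, float],
                  weights: dict[str, float] | None = None) -> float:
    """Combine per-subset scores into one weighted overall score."""
    if weights is None:
        # Assumption: equal weights; the README does not specify the scheme.
        weights = {name: 1.0 for name in subset_scores}
    total = sum(weights[name] for name in subset_scores)
    return sum(score * weights[name] for name, score in subset_scores.items()) / total


# GRAG-PHI-SFT per-subset scores copied from the table above.
grag_phi_sft = {
    "extraction_recall": 37.5,
    "qa_multiple_references": 70.6,
    "qa_without_time_difference": 88.0,
    "qa_with_time_difference": 89.3,
    "relevant_context": 72.8,
    "summarizations": 83.2,
}

print(f"Overall score (equal weights): {overall_score(grag_phi_sft):.1f}")
```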