avemio-digital committed on
Commit 60df963 (verified)
1 Parent(s): 5e7b4a1

Update README.md

Files changed (1)
  1. README.md +9 -8
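The per-file stat `+9 -8` and the hunk header `@@ -133,15 +133,16 @@` have to agree. The old range spans 15 lines, and 8 of them are removed, which leaves 7 context lines. Those 7 plus the 9 added lines give the 16-line new range. Below is a minimal Python sketch of that cross-check. `parse_hunk_header` is a hypothetical helper written for illustration. It is not part of git or the Hugging Face tooling.

```python
import re

# Values copied from this commit page.
HUNK_HEADER = "@@ -133,15 +133,16 @@"
ADDED, REMOVED = 9, 8  # per-file stat "README.md +9 -8"


def parse_hunk_header(header: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_len, new_start, new_len) from a unified-diff hunk header.

    Hypothetical helper for illustration only.
    """
    m = re.match(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = m.groups()
    # An omitted length means the range covers exactly one line.
    return int(old_start), int(old_len or 1), int(new_start), int(new_len or 1)


old_start, old_len, new_start, new_len = parse_hunk_header(HUNK_HEADER)
context = old_len - REMOVED  # lines that appear unchanged on both sides of the hunk
assert context + ADDED == new_len
print(f"old={old_len} new={new_len} context={context}")  # prints "old=15 new=16 context=7"
```

The same arithmetic applies to any unified-diff hunk. When a header omits a length, as in `@@ -1 +1 @@`, that range covers a single line.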
README.md CHANGED
@@ -133,15 +133,16 @@ Four evaluation metrics were employed across all subsets: language quality, over
 - **Overall score:** This metric combined the results from the previous three metrics, offering a comprehensive evaluation of the model's capabilities across all subsets.
 
 
-| Metric | [Vanila-Phi-3.5-Mini-4B](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | [GRAG-PHI-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | [GRAG-PHI-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | [GRAG-PHI-MERGED]() | GPT-3.5-TURBO |
+| Metric | [Vanila-Phi-3.5-Mini-4B](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | **[GRAG-PHI-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI)** | [GRAG-PHI-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | [GRAG-PHI-MERGED]() | GPT-3.5-TURBO |
 |------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|----------------|
-| **Average_language_quality** | 75.11 | 78.88 | 78.13 |85.41 |91.86 |
-| **extraction_recall_weighted_overall_score** | 18.0 | 37.5 | 32.0 |61.8 |87.2 |
-| **qa_multiple_references_weighted_overall_score** | 65.8 | 70.6 | 74.8 |84.8 |77.2 |
-| **qa_without_time_difference_weighted_overall_score** | 71.2 | 88.0 | 87.3 |88.0 |83.1 |
-| **qa_with_time_difference_weighted_overall_score** | 64.6 | 89.3 | 86.9 |89.1 |83.2 |
-| **relevant_context_weighted_overall_score** | 72.3 | 72.8 | 69.1 |84.4 |89.5 |
-| **summarizations_weighted_overall_score** | 74.6 | 83.2 | 81.1 |84.9 |86.9 |
+| Average_language_quality | 75.11 | **78.88** | 78.13 |85.41 |91.86 |
+| **OVERALL SCORES (weighted):** | | | | | |
+| extraction_recall | 18.0 | **37.5** | 32.0 |61.8 |87.2 |
+| qa_multiple_references | 65.8 | **70.6** | 74.8 |84.8 |77.2 |
+| qa_without_time_difference | 71.2 | **88.0** | 87.3 |88.0 |83.1 |
+| qa_with_time_difference | 64.6 | **89.3** | 86.9 |89.1 |83.2 |
+| relevant_context | 72.3 | **72.8** | 69.1 |84.4 |89.5 |
+| summarizations | 74.6 | **83.2** | 81.1 |84.9 |86.9 |
 
 
 ## Model Details
 
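This commit makes three changes to the table. It bolds the GRAG-PHI-SFT column, it adds an "OVERALL SCORES (weighted):" divider row, and it shortens the metric labels by dropping the bold markers and the `_weighted_overall_score` suffix. None of the scores change. Below is a minimal Python sketch that checks this for the seven data rows in the hunk. The helpers `cells` and `scores` are hypothetical and are not part of the repository.

```python
# Rows copied verbatim from the hunk: removed (-) lines first, then added (+) lines.
# The new "OVERALL SCORES (weighted):" divider row has no scores and is left out.
OLD_ROWS = [
    "| **Average_language_quality** | 75.11 | 78.88 | 78.13 |85.41 |91.86 |",
    "| **extraction_recall_weighted_overall_score** | 18.0 | 37.5 | 32.0 |61.8 |87.2 |",
    "| **qa_multiple_references_weighted_overall_score** | 65.8 | 70.6 | 74.8 |84.8 |77.2 |",
    "| **qa_without_time_difference_weighted_overall_score** | 71.2 | 88.0 | 87.3 |88.0 |83.1 |",
    "| **qa_with_time_difference_weighted_overall_score** | 64.6 | 89.3 | 86.9 |89.1 |83.2 |",
    "| **relevant_context_weighted_overall_score** | 72.3 | 72.8 | 69.1 |84.4 |89.5 |",
    "| **summarizations_weighted_overall_score** | 74.6 | 83.2 | 81.1 |84.9 |86.9 |",
]
NEW_ROWS = [
    "| Average_language_quality | 75.11 | **78.88** | 78.13 |85.41 |91.86 |",
    "| extraction_recall | 18.0 | **37.5** | 32.0 |61.8 |87.2 |",
    "| qa_multiple_references | 65.8 | **70.6** | 74.8 |84.8 |77.2 |",
    "| qa_without_time_difference | 71.2 | **88.0** | 87.3 |88.0 |83.1 |",
    "| qa_with_time_difference | 64.6 | **89.3** | 86.9 |89.1 |83.2 |",
    "| relevant_context | 72.3 | **72.8** | 69.1 |84.4 |89.5 |",
    "| summarizations | 74.6 | **83.2** | 81.1 |84.9 |86.9 |",
]


def cells(row: str) -> list[str]:
    """Split a Markdown table row into whitespace-stripped cells."""
    return [c.strip() for c in row.strip().strip("|").split("|")]


def scores(row: str) -> list[float]:
    """Parse every cell after the label as a float, ignoring ** bold markers."""
    return [float(c.replace("**", "")) for c in cells(row)[1:]]


assert len(OLD_ROWS) == len(NEW_ROWS)
for old, new in zip(OLD_ROWS, NEW_ROWS):
    # Old label minus bold markers and the "_weighted_overall_score" suffix.
    old_label = cells(old)[0].replace("**", "").removesuffix("_weighted_overall_score")
    assert old_label == cells(new)[0], (old_label, cells(new)[0])
    assert scores(old) == scores(new), old_label
print(f"{len(NEW_ROWS)} rows: labels shortened, scores unchanged")
```

The sketch uses `str.removesuffix`, so it needs Python 3.9 or later.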