avemio-digital committed on
Commit
cbf315e
verified
1 Parent(s): 63e4c6e

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -142,6 +142,19 @@ Four evaluation metrics were employed across all subsets: language quality, over
  | **reasoning_weighted_overall_score** | 69.4 | 71.5 | 73.4 | |
  | **relevant_context_weighted_overall_score** | 71.3 | 69.1 | 65.5 | |
  | **summarizations_weighted_overall_score** | 73.8 | 81.6 | 80.3 | |
+
+
+ | Metric | [Vanila-Phi-3.5-Mini-4B](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | [GRAG-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | [GRAG-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | [GRAG-MERGED]() | GPT-3.5-TURBO |
+ |------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|----------------|
+ | **Average_language_quality** | 85.88 | 89.61 | 89.1 | | |
+ | **extraction_recall_weighted_overall_score** | 35.2 | 52.3 | 48.8 | | |
+ | **qa_multiple_references_weighted_overall_score** | 65.3 | 71.0 | 74.0 | | |
+ | **qa_without_time_difference_weighted_overall_score** | 71.5 | 85.6 | 85.6 | | |
+ | **qa_with_time_difference_weighted_overall_score** | 65.3 | 87.9 | 85.4 | | |
+ | **reasoning_weighted_overall_score** | 69.4 | 71.5 | 73.4 | | |
+ | **relevant_context_weighted_overall_score** | 71.3 | 69.1 | 65.5 | | |
+ | **summarizations_weighted_overall_score** | 73.8 | 81.6 | 80.3 | | |
+
  ## Model Details
 
  ### Data