julsCadenas committed
Commit a389ab8 · verified · 1 Parent(s): 721ad7f

Update README.md

Files changed (1):
  1. README.md +33 -1

README.md CHANGED
@@ -139,4 +139,36 @@ The following table summarizes the ROUGE scores (Recall, Precision, and F1) for
  ## **Improvements**
  - Focus on enhancing **bigram overlap** (ROUGE-2) and overall **context understanding**.
  - Reduce **irrelevant content** for improved **precision**.
- - Improve **sequence coherence** for better **ROUGE-L** scores.
+ - Improve **sequence coherence** for better **ROUGE-L** scores.
+
+ ## **METEOR Score**
+
+ | Metric   | METEOR Score |
+ |----------|--------------|
+ | **Mean** | 0.2079       |
+ | **Min**  | 0.0915       |
+ | **Max**  | 0.3216       |
+ | **STD**  | 0.0769       |
+
+ ### **Interpretation**
+ - **Mean**: The average METEOR score indicates reasonably good word alignment and synonym matching, but there is still room for improvement.
+ - **Min**: The lowest METEOR score suggests that some summaries align poorly with their references.
+ - **Max**: The highest METEOR score shows the model's potential to generate well-aligned summaries.
+ - **STD**: The standard deviation indicates some variability in the model's performance across different summaries.
+
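+ Below is a minimal sketch (not part of the original evaluation code) of how these per-summary METEOR statistics could be reproduced, assuming the Hugging Face `evaluate` library; the `generated_summaries` and `reference_summaries` lists are illustrative placeholders:
+
+ ```python
+ # Sketch: compute METEOR per summary, then aggregate mean/min/max/std.
+ # Variable names and data are placeholders, not the actual evaluation set.
+ import evaluate
+ import numpy as np
+
+ meteor = evaluate.load("meteor")
+
+ generated_summaries = ["model summary 1", "model summary 2"]        # model outputs
+ reference_summaries = ["reference summary 1", "reference summary 2"]  # gold summaries
+
+ # Score each (prediction, reference) pair individually so the spread is visible.
+ scores = [
+     meteor.compute(predictions=[pred], references=[ref])["meteor"]
+     for pred, ref in zip(generated_summaries, reference_summaries)
+ ]
+
+ print(f"Mean: {np.mean(scores):.4f}")
+ print(f"Min:  {np.min(scores):.4f}")
+ print(f"Max:  {np.max(scores):.4f}")
+ print(f"STD:  {np.std(scores):.4f}")
+ ```
+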
+ ### **Conclusion**
+ - The model's **METEOR score** indicates generally solid alignment between the generated summaries and the reference content, though performance still varies in some cases.
+
+ ### **Improvements**
+ - Focus on improving **alignment** and **synonym usage** to achieve higher and more consistent **METEOR scores** across summaries.
+
+ ## **TLDR**
+
+ ### **Comparison & Final Evaluation**
+ - **BERTScore** suggests the model is good at generating relevant tokens (precision) but struggles with capturing all relevant content (recall).
+ - **ROUGE-1** is decent, but **ROUGE-2** and **ROUGE-L** show weak performance, particularly in terms of bigram relationships and sequence coherence.
+ - **METEOR** results show solid alignment overall, but there is significant variability, with some summaries scoring noticeably lower.
+
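+ For reference, a hedged sketch of how the metrics compared above could be computed with the Hugging Face `evaluate` library (the prediction/reference lists are placeholders, not the actual evaluation data):
+
+ ```python
+ # Sketch: compute ROUGE and BERTScore on the same prediction/reference pairs.
+ import evaluate
+
+ rouge = evaluate.load("rouge")
+ bertscore = evaluate.load("bertscore")
+
+ preds = ["model summary 1", "model summary 2"]          # placeholder model outputs
+ refs = ["reference summary 1", "reference summary 2"]   # placeholder references
+
+ rouge_scores = rouge.compute(predictions=preds, references=refs)
+ # aggregated rouge1, rouge2, rougeL, rougeLsum values
+
+ bert_scores = bertscore.compute(predictions=preds, references=refs, lang="en")
+ # per-example precision, recall, and f1 lists
+ ```
+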
+ ### **Conclusion**
+ - The model performs decently but lacks consistency, especially in **bigram overlap** (ROUGE-2) and in capturing **longer sequences** (ROUGE-L). There is room for improvement in **recall** and **precision** to make the summaries more relevant and coherent.
+ - Focus on improving **recall**, **bigram relationships**, and **precision** to achieve more consistent, high-quality summaries.