Update README.md
README.md
CHANGED
@@ -139,4 +139,36 @@ The following table summarizes the ROUGE scores (Recall, Precision, and F1) for

## **Improvements**
- Focus on enhancing **bigram overlap** (ROUGE-2) and overall **context understanding**.
- Reduce **irrelevant content** for improved **precision**.
- Improve **sequence coherence** for better **ROUGE-L** scores.
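
As a rough illustration of the ROUGE components discussed above (recall, precision, and F1 for ROUGE-1, ROUGE-2, and ROUGE-L), here is a minimal sketch assuming the `rouge_score` package; it is not this project's evaluation script, and the variable names are placeholders.

```python
# Minimal sketch: per-summary ROUGE-1/2/L recall, precision, and F1.
# Assumes the `rouge_score` package (pip install rouge-score).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the cat sat on the mat"          # gold summary (placeholder)
prediction = "a cat was sitting on the mat"   # model summary (placeholder)

# score(target, prediction) returns a dict of Score(precision, recall, fmeasure)
scores = scorer.score(reference, prediction)
for name, s in scores.items():
    print(f"{name}: recall={s.recall:.3f} precision={s.precision:.3f} f1={s.fmeasure:.3f}")
```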

## **METEOR Score**

| Metric   | METEOR Score |
|----------|--------------|
| **Mean** | 0.2079       |
| **Min**  | 0.0915       |
| **Max**  | 0.3216       |
| **STD**  | 0.0769       |

### **Interpretation**
- **Mean**: The average METEOR score indicates good word alignment and synonym matching overall, but still leaves room for improvement.
- **Min**: The lowest METEOR score suggests that some summaries do not align well with their references.
- **Max**: The highest METEOR score shows the model's potential for generating very well-aligned summaries.
- **STD**: The standard deviation indicates some variability in the model's performance across different summaries.
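
As an illustrative sketch only (not necessarily how the numbers above were produced), the per-summary METEOR scores and the Mean/Min/Max/STD statistics in the table could be computed with a recent NLTK release as follows; the function and variable names are placeholders.

```python
# Minimal sketch: per-summary METEOR scores plus Mean/Min/Max/STD statistics.
# Assumes a recent NLTK (pre-tokenized inputs) with WordNet and Punkt data.
import statistics

import nltk
from nltk.tokenize import word_tokenize
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet", quiet=True)  # synonym matching
nltk.download("punkt", quiet=True)    # word tokenization


def meteor_stats(references, predictions):
    """Score each (reference, prediction) pair and summarize the distribution."""
    scores = [
        meteor_score([word_tokenize(ref)], word_tokenize(pred))
        for ref, pred in zip(references, predictions)
    ]
    return {
        "mean": statistics.mean(scores),
        "min": min(scores),
        "max": max(scores),
        "std": statistics.stdev(scores),  # sample standard deviation
    }


# Placeholder usage (not the project's data):
# stats = meteor_stats(reference_summaries, generated_summaries)
# -> e.g. {"mean": 0.21, "min": 0.09, "max": 0.32, "std": 0.08}
```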

### **Conclusion**
- The model's **METEOR score** shows generally solid performance, producing summaries that align well with the reference content, though with noticeable variability in some cases.

### **Improvements**
- Focus on improving **alignment** and **synonym usage** to achieve higher and more consistent **METEOR scores** across summaries.

## **TLDR**

### **Comparison & Final Evaluation**
- **BERTScore** suggests the model is good at generating relevant tokens (precision) but struggles to capture all relevant content (recall).
- **ROUGE-1** is decent, but **ROUGE-2** and **ROUGE-L** show weak performance, particularly in bigram relationships and sequence coherence.
- **METEOR** results show solid alignment, but with significant variability, especially toward the lower end of the score range.
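
For reference, the precision/recall distinction mentioned for BERTScore can be inspected per summary with the `bert-score` package; the sketch below is an assumption about tooling (with placeholder data), not this project's pipeline, and requires a PyTorch backend.

```python
# Minimal sketch: separate BERTScore precision, recall, and F1 values.
# Assumes the `bert-score` package (pip install bert-score) and PyTorch.
from bert_score import score

predictions = ["a cat was sitting on the mat"]   # model summaries (placeholder)
references = ["the cat sat on the mat"]          # gold summaries (placeholder)

# Returns one tensor each for precision, recall, and F1 (one value per pair).
P, R, F1 = score(predictions, references, lang="en", verbose=False)
print(
    f"precision={P.mean().item():.3f} "
    f"recall={R.mean().item():.3f} "
    f"f1={F1.mean().item():.3f}"
)
```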

### **Conclusion**
- The model performs decently but lacks consistency, especially in **bigram overlap** (ROUGE-2) and in capturing **longer sequences** (ROUGE-L). There's room for improvement in **recall** and **precision** to make the summaries more relevant and coherent.
- Focus on improving **recall**, **bigram relationships**, and **precision** to achieve more consistent, high-quality summaries.