---
datasets:
- sentence-transformers/sentence-compression
language:
- en
metrics:
- sari
- rouge
base_model:
- facebook/bart-large
pipeline_tag: text2text-generation
tags:
- sentence-compression
- sentence-simplification
---
## Fine-Tuned BART-Large for Sentence Compression
### Model Overview
This model is a fine-tuned version of `facebook/bart-large` trained on the `sentence-transformers/sentence-compression` dataset. It generates compressed versions of input sentences while preserving fluency and meaning.
---
### Training Details
- **Base Model:** `facebook/bart-large`
- **Dataset:** `sentence-transformers/sentence-compression`
- **Batch Size:** 8
- **Epochs:** 5
- **Learning Rate:** 2e-5
- **Weight Decay:** 0.01
- **Evaluation Metric for Best Model:** SARI Penalized
- **Precision Mode:** FP16 (mixed-precision training)
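
For reference, this setup maps naturally onto the Hugging Face `Seq2SeqTrainer` API. The sketch below is an assumed reconstruction, not the author's exact script: the dataset column names (`text`, `simplified`), the train/validation split, and the `compute_metrics` implementation (SARI / SARI Penalized) are all assumptions.

```python
from datasets import load_dataset
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# A held-out validation split is assumed here; the card does not state how it was made.
dataset = load_dataset("sentence-transformers/sentence-compression")
splits = dataset["train"].train_test_split(test_size=0.1, seed=42)

def preprocess(batch):
    # Column names are assumptions; adjust to the actual dataset schema.
    model_inputs = tokenizer(batch["text"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["simplified"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = splits.map(preprocess, batched=True, remove_columns=splits["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bart-large-sentence-compression",
    per_device_train_batch_size=8,   # batch size 8
    per_device_eval_batch_size=8,
    num_train_epochs=5,              # 5 epochs
    learning_rate=2e-5,
    weight_decay=0.01,
    fp16=True,                       # FP16 mixed precision
    predict_with_generate=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="sari_penalized",  # assumed metric key from compute_metrics
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    # compute_metrics=...  # SARI / SARI Penalized computation not shown in this card
)
trainer.train()
```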
---
### Evaluation Results
#### Validation Set Performance
| Metric | Score |
|---------------------|-------|
| SARI | 89.68 |
| SARI Penalized | 88.42 |
| ROUGE-1 | 93.05 |
| ROUGE-2 | 88.47 |
| ROUGE-L | 92.98 |
#### Test Set Performance
| Metric | Score |
|---------------------|-------|
| SARI | 89.76 |
| SARI Penalized | 88.32 |
| ROUGE-1 | 93.14 |
| ROUGE-2 | 88.65 |
| ROUGE-L | 93.07 |
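
SARI and ROUGE scores like these can be computed with the Hugging Face `evaluate` library. The snippet below is a minimal illustration with toy strings, not the evaluation script used here; the SARI Penalized variant is a custom metric and is not part of `evaluate`.

```python
import evaluate

sari = evaluate.load("sari")
rouge = evaluate.load("rouge")

# Toy example; not drawn from the actual evaluation data.
sources = ["The quick brown fox jumped over the extremely lazy dog."]
predictions = ["The fox jumped over the dog."]
references = [["The fox jumped over the lazy dog."]]

# SARI scores the prediction against both the source and the references.
print(sari.compute(sources=sources, predictions=predictions, references=references))

# ROUGE compares predictions with references; evaluate returns scores in [0, 1],
# while the tables above report percentages.
print(rouge.compute(predictions=predictions, references=references))
```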
---
### Training Loss Curve
Training and evaluation loss over training steps are plotted below; the source figure is also available in the repository as `bart-large-sentence-compression_loss.eps`.
<img src="Training_and_Evaluation_Loss_Plot.png" alt="Training and evaluation loss curves" width="200" height="200">
---
## Usage
### Load the Model
```python
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "shahin-as/bart-large-sentence-compression"

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def compress_sentence(sentence):
    """Generate a compressed version of the input sentence."""
    inputs = tokenizer(sentence, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = model.generate(
        **inputs,
        max_length=50,
        num_beams=5,
        length_penalty=2.0,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Example usage
sentence = "Insert the sentence to be compressed here."
compressed_sentence = compress_sentence(sentence)
print("Original:", sentence)
print("Compressed:", compressed_sentence)
```
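
Alternatively, the model can be driven through the high-level `pipeline` API. The generation arguments below simply mirror the `compress_sentence` helper above and are illustrative, not prescriptive.

```python
from transformers import pipeline

# text2text-generation wraps the same encoder-decoder generate() call
compressor = pipeline(
    "text2text-generation",
    model="shahin-as/bart-large-sentence-compression",
)

result = compressor(
    "Insert the sentence to be compressed here.",
    max_length=50,
    num_beams=5,
)
print(result[0]["generated_text"])
```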