---
datasets:
- sentence-transformers/sentence-compression
language:
- en
metrics:
- sari
- rouge
base_model:
- facebook/bart-large
pipeline_tag: text2text-generation
tags:
- sentence-compression
- sentence-simplification
---
## Fine-Tuned BART-Large for Sentence Compression

### Model Overview

This model is a fine-tuned version of `facebook/bart-large` trained on the `sentence-transformers/sentence-compression` dataset. It generates compressed versions of input sentences while preserving fluency and meaning.

---

### Training Details

- **Base Model:** `facebook/bart-large`
- **Dataset:** `sentence-transformers/sentence-compression`
- **Batch Size:** 8
- **Epochs:** 5
- **Learning Rate:** 2e-5
- **Weight Decay:** 0.01
- **Evaluation Metric for Best Model:** SARI Penalized
- **Precision:** FP16 (mixed precision) for efficient training
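
As a rough guide, these hyperparameters correspond to a Hugging Face `Seq2SeqTrainingArguments` configuration along the following lines. This is a minimal sketch, not the original training script: the output directory, evaluation schedule, and metric key are assumptions.

```python
# Hedged sketch of the reported hyperparameters as Seq2SeqTrainingArguments.
# Output directory, evaluation schedule, and metric key are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-large-sentence-compression",  # assumed output path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    learning_rate=2e-5,
    weight_decay=0.01,
    fp16=True,                                     # mixed-precision training
    evaluation_strategy="epoch",                   # named "eval_strategy" in newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="sari_penalized",        # assumed metric key
    greater_is_better=True,
    predict_with_generate=True,
)
```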

---
### Evaluation Results

#### Validation Set Performance

| Metric              | Score |
|---------------------|-------|
| SARI                | 89.68 |
| SARI Penalized      | 88.42 |
| ROUGE-1             | 93.05 |
| ROUGE-2             | 88.47 |
| ROUGE-L             | 92.98 |

#### Test Set Performance

| Metric              | Score |
|---------------------|-------|
| SARI                | 89.76 |
| SARI Penalized      | 88.32 |
| ROUGE-1             | 93.14 |
| ROUGE-2             | 88.65 |
| ROUGE-L             | 93.07 |
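
Scores of this form can be approximated with the `evaluate` library. The snippet below is a hedged sketch: the exact evaluation script (including the "SARI Penalized" variant) is not published with this card, and the placeholder sentences are illustrative only.

```python
# Hedged sketch: computing SARI and ROUGE with the `evaluate` library.
# The "SARI Penalized" variant used for model selection is not reproduced here.
import evaluate

sari = evaluate.load("sari")
rouge = evaluate.load("rouge")

sources = ["Insert the original, uncompressed sentence here."]
predictions = ["Insert the model's compressed output here."]
references = [["Insert the reference compression here."]]

sari_score = sari.compute(sources=sources, predictions=predictions, references=references)
rouge_score = rouge.compute(predictions=predictions, references=[r[0] for r in references])

print("SARI:", sari_score["sari"])              # already on a 0-100 scale
print("ROUGE-1:", rouge_score["rouge1"] * 100)  # rouge returns 0-1 scores
print("ROUGE-2:", rouge_score["rouge2"] * 100)
print("ROUGE-L:", rouge_score["rougeL"] * 100)
```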

---
### Training Loss Curve

The training and evaluation loss over training steps are visualized below; the plot is also available as `bart-large-sentence-compression_loss.eps` in the repository.

<img src="Training_and_Evaluation_Loss_Plot.png" alt="Stats1" width="200" height="200">

---
## Usage

### Load the Model

```python
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "shahin-as/bart-large-sentence-compression"

model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = BartTokenizer.from_pretrained(model_name)

def compress_sentence(sentence):
    # Tokenize the input (BART accepts up to 1024 tokens)
    inputs = tokenizer(sentence, return_tensors="pt", max_length=1024, truncation=True)
    # Beam search with a length penalty favors fluent, complete compressions
    summary_ids = model.generate(**inputs, max_length=50, num_beams=5, length_penalty=2.0, early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Example usage
sentence = "Insert the sentence to be compressed here."
compressed_sentence = compress_sentence(sentence)
print("Original:", sentence)
print("Compressed:", compressed_sentence)
```
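
Alternatively, the high-level `pipeline` API can be used. This is a minimal sketch assuming the checkpoint is accessible on the Hub; the generation parameters simply mirror the example above and are not prescribed by the model.

```python
from transformers import pipeline

# Load the seq2seq checkpoint through the text2text-generation pipeline
compressor = pipeline("text2text-generation", model="shahin-as/bart-large-sentence-compression")

result = compressor(
    "Insert the sentence to be compressed here.",
    max_length=50,
    num_beams=5,
)
print(result[0]["generated_text"])
```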