---
datasets:
- sentence-transformers/sentence-compression
language:
- en
metrics:
- sari
- rouge
base_model:
- facebook/bart-large
pipeline_tag: text2text-generation
tags:
- sentence-compression
- sentence-simplification
---
## Fine-Tuned BART-Large for Sentence Compression

### Model Overview

This model is a fine-tuned version of `facebook/bart-large` trained on the `sentence-transformers/sentence-compression` dataset. It generates compressed versions of input sentences while preserving fluency and meaning.

---

### Training Details

- **Base Model:** `facebook/bart-large`
- **Dataset:** `sentence-transformers/sentence-compression`
- **Batch Size:** 8
- **Epochs:** 5
- **Learning Rate:** 2e-5
- **Weight Decay:** 0.01
- **Evaluation Metric for Best Model:** SARI Penalized
- **Precision:** FP16 (mixed precision) for efficient training
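
As a rough guide, these hyperparameters correspond to a Hugging Face `Seq2SeqTrainingArguments` configuration along the following lines. This is a minimal sketch, not the original training script: the output directory, evaluation schedule, and metric key are assumptions.

```python
# Hedged sketch of the reported hyperparameters as Seq2SeqTrainingArguments.
# Output directory, evaluation schedule, and metric key are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-large-sentence-compression",  # assumed output path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    learning_rate=2e-5,
    weight_decay=0.01,
    fp16=True,                                     # mixed-precision training
    evaluation_strategy="epoch",                   # named "eval_strategy" in newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="sari_penalized",        # assumed metric key
    greater_is_better=True,
    predict_with_generate=True,
)
```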

---
### Evaluation Results

#### Validation Set Performance

| Metric              | Score |
|---------------------|-------|
| SARI                | 89.68 |
| SARI Penalized      | 88.42 |
| ROUGE-1             | 93.05 |
| ROUGE-2             | 88.47 |
| ROUGE-L             | 92.98 |

#### Test Set Performance

| Metric              | Score |
|---------------------|-------|
| SARI                | 89.76 |
| SARI Penalized      | 88.32 |
| ROUGE-1             | 93.14 |
| ROUGE-2             | 88.65 |
| ROUGE-L             | 93.07 |
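
Scores of this form can be approximated with the `evaluate` library. The snippet below is a hedged sketch: the exact evaluation script (including the "SARI Penalized" variant) is not published with this card, and the placeholder sentences are illustrative only.

```python
# Hedged sketch: computing SARI and ROUGE with the `evaluate` library.
# The "SARI Penalized" variant used for model selection is not reproduced here.
import evaluate

sari = evaluate.load("sari")
rouge = evaluate.load("rouge")

sources = ["Insert the original, uncompressed sentence here."]
predictions = ["Insert the model's compressed output here."]
references = [["Insert the reference compression here."]]

sari_score = sari.compute(sources=sources, predictions=predictions, references=references)
rouge_score = rouge.compute(predictions=predictions, references=[r[0] for r in references])

print("SARI:", sari_score["sari"])              # already on a 0-100 scale
print("ROUGE-1:", rouge_score["rouge1"] * 100)  # rouge returns 0-1 scores
print("ROUGE-2:", rouge_score["rouge2"] * 100)
print("ROUGE-L:", rouge_score["rougeL"] * 100)
```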

---
### Training Loss Curve

The training and evaluation loss over training steps are visualized below; the plot is also available as `bart-large-sentence-compression_loss.eps` in the repository.

<img src="Training_and_Evaluation_Loss_Plot.png" alt="Stats1" width="200" height="200">

---
## Usage

### Load the Model

```python
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "shahin-as/bart-large-sentence-compression"

model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = BartTokenizer.from_pretrained(model_name)

def compress_sentence(sentence):
    # Tokenize the input (BART accepts up to 1024 tokens)
    inputs = tokenizer(sentence, return_tensors="pt", max_length=1024, truncation=True)
    # Beam search with a length penalty favors fluent, complete compressions
    summary_ids = model.generate(**inputs, max_length=50, num_beams=5, length_penalty=2.0, early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Example usage
sentence = "Insert the sentence to be compressed here."
compressed_sentence = compress_sentence(sentence)
print("Original:", sentence)
print("Compressed:", compressed_sentence)
```
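
Alternatively, the high-level `pipeline` API can be used. This is a minimal sketch assuming the checkpoint is accessible on the Hub; the generation parameters simply mirror the example above and are not prescribed by the model.

```python
from transformers import pipeline

# Load the seq2seq checkpoint through the text2text-generation pipeline
compressor = pipeline("text2text-generation", model="shahin-as/bart-large-sentence-compression")

result = compressor(
    "Insert the sentence to be compressed here.",
    max_length=50,
    num_beams=5,
)
print(result[0]["generated_text"])
```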