---
datasets:
- sentence-transformers/sentence-compression
language:
- en
metrics:
- sari
- rouge
base_model:
- facebook/bart-large
pipeline_tag: text2text-generation
tags:
- sentence-compression
- sentence-simplification
---
## Fine-Tuned BART-Large for Sentence Compression
### Model Overview
This model is a fine-tuned version of `facebook/bart-large` trained on the `sentence-transformers/sentence-compression` dataset. It generates compressed versions of input sentences while preserving their fluency and meaning.
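The dataset pairs each full sentence with a shorter compression. A minimal sketch of inspecting a training pair with the `datasets` library is below; the column names `text` and `simplified` are assumptions based on the dataset card:

```python
from datasets import load_dataset

# Load the training split of the compression dataset
dataset = load_dataset("sentence-transformers/sentence-compression", split="train")

# Each example pairs a full sentence with its compression
# (column names assumed from the dataset card)
example = dataset[0]
print("Original:  ", example["text"])
print("Compressed:", example["simplified"])
```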
---
### Training Details
- Base Model: `facebook/bart-large`
- Dataset: `sentence-transformers/sentence-compression`
- Batch Size: 8
- Epochs: 5
- Learning Rate: 2e-5
- Weight Decay: 0.01
- Evaluation Metric for Best Model: SARI Penalized
- Precision: FP16 (mixed precision) for efficient training
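For reference, a sketch of how these hyperparameters would map onto Hugging Face `Seq2SeqTrainingArguments`; the output directory and the metric key for SARI Penalized are illustrative assumptions, not the exact training script:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-large-sentence-compression",  # assumed output path
    per_device_train_batch_size=8,
    num_train_epochs=5,
    learning_rate=2e-5,
    weight_decay=0.01,
    fp16=True,                               # mixed-precision training
    predict_with_generate=True,              # decode during evaluation
    eval_strategy="epoch",                   # `evaluation_strategy` in older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="sari_penalized",  # assumed key for the custom metric
)
```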
---
### Evaluation Results
All scores are on a 0-100 scale (higher is better).
#### Validation Set Performance
| Metric | Score |
|---------------------|-------|
| SARI | 89.68 |
| SARI Penalized | 88.42 |
| ROUGE-1 | 93.05 |
| ROUGE-2 | 88.47 |
| ROUGE-L | 92.98 |
#### Test Set Performance
| Metric | Score |
|---------------------|-------|
| SARI | 89.76 |
| SARI Penalized | 88.32 |
| ROUGE-1 | 93.14 |
| ROUGE-2 | 88.65 |
| ROUGE-L | 93.07 |
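For reference, standard SARI and ROUGE can be computed with the Hugging Face `evaluate` library; a minimal sketch with toy inputs is below. SARI Penalized is a custom variant not shipped with `evaluate`, and `evaluate` reports ROUGE on a 0-1 scale (multiply by 100 to match the tables above).

```python
import evaluate

sari = evaluate.load("sari")
rouge = evaluate.load("rouge")

# Toy inputs: original sentences, model outputs, and reference compressions
sources = ["Insert the original, uncompressed sentence here."]
predictions = ["Insert the model's compressed output here."]
references = [["Insert the reference compression here."]]

# SARI compares the prediction against both the source and the references
print(sari.compute(sources=sources, predictions=predictions, references=references))
# ROUGE measures n-gram overlap with the references
print(rouge.compute(predictions=predictions, references=[refs[0] for refs in references]))
```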
---
### Training Loss Curve
Training and evaluation loss over training steps are shown below.
<img src="Training_and_Evaluation_Loss_Plot.png" alt="Training and evaluation loss curves" width="200" height="200">
---
## Usage
### Load the Model
```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "shahin-as/bart-large-sentence-compression"
model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = BartTokenizer.from_pretrained(model_name)

def compress_sentence(sentence):
    # Tokenize the input, truncating to BART's 1024-token limit
    inputs = tokenizer(sentence, return_tensors="pt", max_length=1024, truncation=True)
    # Generate the compression with beam search
    summary_ids = model.generate(**inputs, max_length=50, num_beams=5, length_penalty=2.0, early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Example usage
sentence = "Insert the sentence to be compressed here."
compressed_sentence = compress_sentence(sentence)
print("Original:", sentence)
print("Compressed:", compressed_sentence)
```
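### Batched Compression
For multiple inputs, the tokenizer can pad a batch so all sentences are compressed in one `generate` call. A sketch reusing the same model and generation settings as above:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "shahin-as/bart-large-sentence-compression"
model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = BartTokenizer.from_pretrained(model_name)

sentences = [
    "Insert the first sentence to be compressed here.",
    "Insert the second sentence to be compressed here.",
]
# Pad the batch to a common length so it can be processed in one pass
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, max_length=50, num_beams=5, length_penalty=2.0, early_stopping=True)
for original, ids in zip(sentences, summary_ids):
    print("Original:  ", original)
    print("Compressed:", tokenizer.decode(ids, skip_special_tokens=True))
```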