---
tags:
- summarization
- summary
- booksum
- long-document
- long-form
license:
- apache-2.0
- bsd-3-clause
datasets:
- kmfoda/booksum
metrics:
- rouge
inference: False
---
# long-t5-tglobal-xl + BookSum
- summarize long text and get a SparkNotes-esque summary of arbitrary topics!
- generalizes reasonably well to academic & narrative text. This is the XL checkpoint, which, **from a human-evaluation perspective, produces even better summaries** than the smaller checkpoints.
- A simple example/use case with the `base` model on ASR is [here](https://longt5-booksum-example.netlify.app/).
## Model description
A fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `kmfoda/booksum` dataset.
Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)
## How-To in Python
> `LLM.int8()` appears to be compatible with summarization and does not degrade the quality of the outputs; this is a crucial enabler for using this model on standard GPUs. A PR for this is in-progress [here](https://github.com/huggingface/transformers/pull/20341), and this model card will be updated with instructions once done :)
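Until that lands, here is a speculative sketch of what 8-bit loading could look like; it assumes `bitsandbytes` and `accelerate` are installed and is untested with this checkpoint:

```python
# Speculative sketch of LLM.int8() loading (see the note above) -- untested here;
# assumes `bitsandbytes` and `accelerate` are installed alongside transformers.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "pszemraj/long-t5-tglobal-xl-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # quantize linear layers to int8 via bitsandbytes
    device_map="auto",   # let accelerate place the weights on available GPUs
)
```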
Install or update transformers: `pip install -U transformers`
Summarize text with the `pipeline` API:
```python
import torch
from transformers import pipeline

# load the summarization pipeline (GPU 0 if available, otherwise CPU)
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)
print(result[0]["summary_text"])
```
Pass [other parameters related to beam search text generation](https://huggingface.co/blog/how-to-generate) when calling `summarizer` to get higher-quality results.
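For example (the values below are illustrative, not tuned defaults for this checkpoint):

```python
# Illustrative generation settings forwarded to model.generate() by the pipeline;
# the specific values are examples, not recommended defaults.
result = summarizer(
    long_text,
    min_length=8,
    max_length=256,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,  # discourage copying long spans verbatim
    repetition_penalty=2.5,
    num_beams=4,
    early_stopping=True,
)
print(result[0]["summary_text"])
```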
## Intended uses & limitations
- while this model seems to improve factual consistency, **do not take summaries as foolproof; check anything that seems odd**.
- specifically, watch for negation errors: the model may say _This thing does not have <ATTRIBUTE>_ when it should have said _This thing has a lot of <ATTRIBUTE>_.
- I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check such claims by paying attention to the sentences surrounding them in the summary.
## Training and evaluation data
- `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).
- **Initial fine-tuning** used only examples with at most 12,288 input tokens and at most 1,024 output tokens, for memory reasons. Per a brief analysis, examples whose inputs fall in the 12,288-16,384 token range are a **small** minority of this dataset.
- In addition, this initial training combined the training and validation sets and trained on the aggregate to increase the effective dataset size. **Therefore, take the validation-set results with a grain of salt; the primary metrics should (always) come from the test set.**
- The **final phases of fine-tuning** used the standard 16,384-token input / 1,024-token output convention, keeping all examples (truncating longer sequences). This did not appear to change the loss/performance much. A rough sketch of this preprocessing setup follows below.
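For illustration only, a sketch of how that truncation could be applied when tokenizing `kmfoda/booksum`; this is not the actual training script, and the column names (`chapter`, `summary_text`) are assumptions to verify against the dataset card:

```python
# Rough, illustrative preprocessing sketch -- not the exact training code.
# Column names "chapter" and "summary_text" are assumptions; check the dataset card.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-xl")
dataset = load_dataset("kmfoda/booksum")

def preprocess(example):
    # truncate source chapters to 16,384 tokens and reference summaries to 1,024
    model_inputs = tokenizer(example["chapter"], max_length=16384, truncation=True)
    labels = tokenizer(text_target=example["summary_text"], max_length=1024, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=dataset["train"].column_names)
```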
## Eval Results
Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.
**Please read the note above: because of the training methodology, these numbers will look better than results on the test set.** The model achieves the following results on the evaluation set:
- eval_loss: 1.2756
- eval_rouge1: 41.8013
- eval_rouge2: 12.0895
- eval_rougeL: 21.6007
- eval_rougeLsum: 39.5382
- eval_gen_len: 387.2945
- eval_runtime: 13908.4995
- eval_samples_per_second: 0.107
- eval_steps_per_second: 0.027
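As a rough illustration (not the official evaluation pipeline), ROUGE scores of this kind can be computed with the `evaluate` library; the `predictions`/`references` lists below are placeholders:

```python
# Illustrative ROUGE computation with the `evaluate` library (pip install evaluate rouge_score).
# `predictions` and `references` are placeholder strings, not real model outputs.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["a generated summary of the chapter ..."]
references = ["the reference summary from booksum ..."]
print(rouge.compute(predictions=predictions, references=references))
```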
---
## FAQ
### How can I run inference with this on CPU?
lol
---
## Training procedure
### Updates
Updates to this model/model card will be posted here as relevant. The model seems fairly converged, but if updates/improvements can be made using `kmfoda/booksum`, this repo will be updated.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0006
- train_batch_size: 1
- eval_batch_size: 1
- seed: 10350
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
\* _Prior training sessions used roughly similar parameters (learning rates were higher); multiple sessions were required, as this takes eons to train._
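As a hypothetical illustration only (the actual training script is not reproduced here), these settings map roughly onto `Seq2SeqTrainingArguments` as follows; the `output_dir` is a placeholder:

```python
# Hypothetical mapping of the listed hyperparameters onto Seq2SeqTrainingArguments;
# not the actual training configuration. output_dir is a placeholder.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-tglobal-xl-booksum",
    learning_rate=6e-4,
    per_device_train_batch_size=1,  # x 4 GPUs x 32 accumulation steps = 128 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    lr_scheduler_type="constant",
    num_train_epochs=1.0,
    seed=10350,
    optim="adamw_torch",  # Adam with betas=(0.9, 0.999), epsilon=1e-08 (defaults)
)
```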
### Framework versions
- Transformers 4.25.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.6.1
- Tokenizers 0.13.1