	---
	tags:
	- summarization
	- summary
	- booksum
	- long-document
	- long-form
	license:
	- apache-2.0
	- bsd-3-clause
	datasets:
	- kmfoda/booksum
	metrics:
	- rouge
	inference: False

	---

	# long-t5-tglobal-xl + BookSum

	- Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
	- Generalizes reasonably well to academic & narrative text. This is the XL checkpoint, which, from a human-evaluation perspective, produces even better summaries.
	- A simple example/use case with the `base` model on ASR is [here](https://longt5-booksum-example.netlify.app/).

	## Model description

	A fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `kmfoda/booksum` dataset.

	Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

	## How-To in Python

	> `LLM.int8()` appears to be compatible with summarization and does not degrade the quality of the outputs; this is a crucial enabler for running this model on standard GPUs. A PR for this is in progress [here](https://github.com/huggingface/transformers/pull/20341), and this model card will be updated with instructions once it is done :)

	Install or update transformers with `pip install -U transformers`.

	Summarize text with pipeline:

	```python
	import torch
	from transformers import pipeline

	# Load the summarization pipeline; use the first GPU if available, else CPU
	summarizer = pipeline(
	    "summarization",
	    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
	    device=0 if torch.cuda.is_available() else -1,
	)
	long_text = "Here is a lot of text I don't want to read. Replace me"

	# The pipeline returns a list with one dict per input text
	result = summarizer(long_text)
	print(result[0]["summary_text"])
	```

	Pass [other parameters related to beam search textgen](https://huggingface.co/blog/how-to-generate) when calling `summarizer` to get even higher quality results.
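
As a rough sketch of what passing such parameters can look like: the keyword arguments below are standard `generate()` options that the pipeline forwards to the model, but the specific values are illustrative assumptions, not settings recommended by this card. `summarize_with_beams` and `BEAM_SEARCH_KWARGS` are hypothetical names introduced only for this example.

```python
# Illustrative beam-search settings; the values are assumptions, tune them for your text.
BEAM_SEARCH_KWARGS = {
    "max_length": 512,          # upper bound on summary length, in tokens
    "min_length": 8,            # avoid empty or trivially short outputs
    "num_beams": 4,             # beam search width
    "no_repeat_ngram_size": 3,  # never repeat the same trigram in the output
    "early_stopping": True,     # stop once num_beams finished candidates exist
}


def summarize_with_beams(summarizer, text, **overrides):
    """Call a summarization pipeline with the defaults above, allowing per-call overrides."""
    kwargs = {**BEAM_SEARCH_KWARGS, **overrides}
    result = summarizer(text, **kwargs)
    return result[0]["summary_text"]
```

For example, `summarize_with_beams(summarizer, long_text, num_beams=8)` would override only the beam width, assuming `summarizer` and `long_text` are defined as in the pipeline snippet above.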


	## Intended uses & limitations

	- While this model seems to improve upon factual consistency, do not treat its summaries as foolproof; check anything that seems odd.
	- Watch for negation errors in particular (e.g. the model says _This thing does not have <ATTRIBUTE>_ when it should have said _This thing has a lot of <ATTRIBUTE>_).
	- I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check a claim by the model by paying attention to the sentences surrounding it.

	## Training and evaluation data

	- `kmfoda/booksum` dataset on Hugging Face; read [the original paper here](https://arxiv.org/abs/2105.08209).
	- Initial fine-tuning used only examples with at most 12288 input tokens and at most 1024 output tokens, for memory reasons. Per a brief analysis, examples in the 12288-16384 token range are a small minority of this dataset.
	- In addition, this initial training combined the training and validation sets and trained on them in aggregate to increase the functional dataset size. Therefore, take the validation set results with a grain of salt; the primary metrics should (always) be those on the test set.
	- The final phases of fine-tuning used the standard convention of 16384 input/1024 output tokens, keeping every example (and truncating longer sequences). This did not appear to change the loss/performance much.
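
The two length policies described above can be sketched roughly as follows. This is not the actual training code: the function names are hypothetical, and `count_tokens` is a whitespace stand-in for the real model tokenizer, used only to keep the example self-contained.

```python
INITIAL_MAX_INPUT_TOKENS = 12288  # initial-phase input limit
FINAL_MAX_INPUT_TOKENS = 16384    # final-phase input limit
MAX_OUTPUT_TOKENS = 1024          # output limit in both phases


def count_tokens(text):
    """Stand-in token counter (whitespace split); a real run would use the model tokenizer."""
    return len(text.split())


def keep_for_initial_phase(document, summary, count=count_tokens):
    """Initial phase: drop any pair whose input or output exceeds the limits."""
    return (
        count(document) <= INITIAL_MAX_INPUT_TOKENS
        and count(summary) <= MAX_OUTPUT_TOKENS
    )


def truncate_for_final_phase(input_ids, label_ids):
    """Final phase: keep every pair, truncating sequences that are too long."""
    return input_ids[:FINAL_MAX_INPUT_TOKENS], label_ids[:MAX_OUTPUT_TOKENS]
```

The difference between the phases is that the first one filters whole examples out, while the second keeps all of them and cuts off the overflow.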

	## Eval Results

	Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

	Please read the note above: because the validation set was included in training, these results look better than the test set results will. The model achieves the following results on the evaluation set:
	- eval_loss: 1.2756
	- eval_rouge1: 41.8013
	- eval_rouge2: 12.0895
	- eval_rougeL: 21.6007
	- eval_rougeLsum: 39.5382
	- eval_gen_len: 387.2945
	- eval_runtime: 13908.4995
	- eval_samples_per_second: 0.107
	- eval_steps_per_second: 0.027
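
As a quick sanity check on the throughput figures above, the runtime can be converted into hours and into approximate sample and step counts. This assumes `eval_runtime` is reported in seconds; the rates are rounded in the log, so the counts are only estimates.

```python
eval_runtime_s = 13908.4995  # eval_runtime, assumed to be in seconds
samples_per_second = 0.107   # eval_samples_per_second (rounded in the log)
steps_per_second = 0.027     # eval_steps_per_second (rounded in the log)

runtime_hours = eval_runtime_s / 3600
approx_samples = eval_runtime_s * samples_per_second
approx_steps = eval_runtime_s * steps_per_second

print(f"runtime: {runtime_hours:.2f} h")
print(f"approx. samples evaluated: {approx_samples:.0f}")
print(f"approx. evaluation steps: {approx_steps:.0f}")
```

The ratio of the two estimates is about four samples per step, which is consistent with the `total_eval_batch_size` of 4 listed under the training hyperparameters.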

	---

	## FAQ

	### How can I run inference with this on CPU?

	lol

	---

	## Training procedure

	### Updates

	Updates to this model/model card will be posted here as relevant. The model seems fairly converged, but if updates/improvements can be made using `kmfoda/booksum`, this repo will be updated.

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0006
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 10350
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 32
	- total_train_batch_size: 128
	- total_eval_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: constant
	- num_epochs: 1.0

	\*_Prior training sessions used roughly similar parameters (learning rates were higher); multiple sessions were required as this takes eons to train._
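
The total batch sizes in the list follow from the per-device values, which the arithmetic below spells out. The variable names simply mirror the hyperparameter names; this is not an excerpt from the training script.

```python
train_batch_size = 1              # per-device training batch size
eval_batch_size = 1               # per-device evaluation batch size
num_devices = 4                   # GPUs used for distributed training
gradient_accumulation_steps = 32  # micro-batches accumulated per optimizer step

# Each optimizer step sees one micro-batch per device per accumulation step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 4
```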


	### Framework versions

	- Transformers 4.25.0.dev0
	- PyTorch 1.13.0+cu117
	- Datasets 2.6.1
	- Tokenizers 0.13.1