README.md CHANGED

@@ -62,7 +62,8 @@ inference:
 
 # checkpoints
 
-This model is a fine-tuned version of [
+This model is a fine-tuned version of [google/pegasus-large](https://huggingface.co/google/pegasus-large) on the [booksum](https://github.com/salesforce/booksum) dataset for four total epochs.
+
 It achieves the following results on the evaluation set:
 - eval_loss: 1.1193
 - eval_runtime: 6.6754
@@ -71,13 +72,18 @@ It achieves the following results on the evaluation set:
 - epoch: 3.0
 - step: 900
 
+A 1-epoch checkpoint can be found at [pszemraj/pegasus-large-book-summary](https://huggingface.co/pszemraj/pegasus-large-book-summary), which is where the second training session started from.
+
 ## Model description
 
-
+- After some initial tests, it was found that models trained on the [booksum](https://github.com/salesforce/booksum) dataset seem to inherit the SparkNotes-style explanations of the reference summaries, so the user gets a shorter, easier-to-understand version of the text rather than one that is **just** more compact.
+- This quality is (anecdotally) favourable for learning/comprehension, because summaries from datasets that simply make the information more compact (*cough* arXiv) can be so dense that the time spent trying to _comprehend_ them can equal the time spent just reading the original material.
+
 
 ## Intended uses & limitations
 
-
+- Standard PEGASUS has a maximum input length of 1024 tokens, so during training the model only saw the first 1024 tokens of each chapter and learned to try to produce the chapter's summary from that excerpt. Keep this in mind when using the model: information at the end of a sequence longer than 1024 tokens may be excluded from the final summary, and the model will be biased towards information presented first.
+- This model was only trained on the dataset for an epoch but still provides reasonable results.
 
 ## Training and evaluation data
 
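As a rough usage sketch for the checkpoints described above: the snippet below loads the 1-epoch checkpoint that the README links ([pszemraj/pegasus-large-book-summary](https://huggingface.co/pszemraj/pegasus-large-book-summary)) with `transformers` and truncates the input to the 1024-token limit noted under the limitations. The generation settings (beam search, summary length) are illustrative assumptions, not settings documented by the model's author, and the final four-epoch checkpoint's Hub id is not given in this excerpt, so swap in the appropriate repo id.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: using the 1-epoch checkpoint linked in the README; replace with
# the repo id of the final four-epoch model if that is the one you want.
model_id = "pszemraj/pegasus-large-book-summary"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Put a book chapter or other long passage here."

# PEGASUS only attends to the first 1024 tokens, so anything beyond that is
# dropped; make that truncation explicit here.
inputs = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")

# Illustrative generation settings (assumptions, not the author's defaults).
summary_ids = model.generate(
    **inputs,
    num_beams=4,
    max_length=256,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```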