README.md CHANGED

@@ -62,7 +62,8 @@ inference:
 
 # checkpoints
 
-This model is a fine-tuned version of [
+This model is a fine-tuned version of [google/pegasus-large](https://huggingface.co/google/pegasus-large) on the [booksum](https://github.com/salesforce/booksum) dataset for four total epochs.
+
 It achieves the following results on the evaluation set:
 - eval_loss: 1.1193
 - eval_runtime: 6.6754
@@ -71,13 +72,18 @@ It achieves the following results on the evaluation set:
 - epoch: 3.0
 - step: 900
 
+A 1-epoch checkpoint can be found at [pszemraj/pegasus-large-book-summary](https://huggingface.co/pszemraj/pegasus-large-book-summary), which is where the second training session started from.
+
 ## Model description
 
-
+- After some initial tests, it was found that models trained on the [booksum](https://github.com/salesforce/booksum) dataset seem to inherit the SparkNotes-style explanations of the reference summaries, so the user gets a shorter, easier-to-understand version of the text rather than one that is **just** more compact.
+- This quality is (anecdotally) favourable for learning/comprehension, because summaries from datasets that simply make the information more compact (*cough* arXiv) can be so dense that the time spent trying to _comprehend_ them can equal the time spent just reading the original material.
+
 
 ## Intended uses & limitations
 
-
+- Standard PEGASUS has a maximum input length of 1024 tokens, so during training the model only saw the first 1024 tokens of each chapter and learned to try to produce the chapter's summary from that excerpt. Keep this in mind when using the model: information at the end of a sequence longer than 1024 tokens may be excluded from the final summary, and the model will be biased towards information presented first.
+- This model was only trained on the dataset for an epoch but still provides reasonable results.
 
 ## Training and evaluation data
 
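As a rough usage sketch for the checkpoints described above: the snippet below loads the 1-epoch checkpoint that the README links ([pszemraj/pegasus-large-book-summary](https://huggingface.co/pszemraj/pegasus-large-book-summary)) with `transformers` and truncates the input to the 1024-token limit noted under the limitations. The generation settings (beam search, summary length) are illustrative assumptions, not settings documented by the model's author, and the final four-epoch checkpoint's Hub id is not given in this excerpt, so swap in the appropriate repo id.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: using the 1-epoch checkpoint linked in the README; replace with
# the repo id of the final four-epoch model if that is the one you want.
model_id = "pszemraj/pegasus-large-book-summary"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Put a book chapter or other long passage here."

# PEGASUS only attends to the first 1024 tokens, so anything beyond that is
# dropped; make that truncation explicit here.
inputs = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")

# Illustrative generation settings (assumptions, not the author's defaults).
summary_ids = model.generate(
    **inputs,
    num_beams=4,
    max_length=256,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```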