--- |
|
language: de |
|
license: mit |
|
inference: false |
|
tags: |
|
- gptj |
|
- title generation |
|
- headline generation |
|
- teaser generation |
|
- news |
|
--- |
|
# GPT-J-Title-Teaser-1k |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
gptj-title-teaser-1k |
|
Version 1.0 / 22 December 2022 |
|
|
|
A proof of concept for multitask fine-tuning of [GPT-J-6B-8bit](https://huggingface.co/hivemind/gpt-j-6B-8bit) for German news title and teaser generation.
|
|
|
# Model Details |
|
|
|
## Model Description |
|
|
|
- **Developed by:** snipaid |
|
- **Model type:** gptj |
|
- **Language(s) (NLP):** de |
|
- **License:** MIT |
|
- **Finetuned from model:** [GPT-J-6B-8bit](https://huggingface.co/hivemind/gpt-j-6B-8bit) |
|
|
|
# Uses |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
This model is not intended for use! It is a preliminary version of gptj-title-teaser-10k, trained to validate the multitask fine-tuning approach.

For use, please refer to [gptj-title-teaser-10k](https://huggingface.co/snipaid/gptj-title-teaser-10k).
|
|
|
|
|
# Training Details |
|
|
|
## Training Data |
|
|
|
<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
|
|
|
The model was finetuned on a collection of 1,000 German-language news items scraped from various online news outlets.
|
|
|
For each news item, the dataset contains a title, a teaser, and the fulltext.
|
|
|
```
[
  {
    "title": ...,
    "teaser": ...,
    "fulltext": ...
  },
]
```
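
For illustration, a minimal sketch of loading data in this format; the file name `news.json` is hypothetical, as the dataset itself is not published here.

```python
import json

# Load the scraped news items (the file name is hypothetical).
with open("news.json", encoding="utf-8") as f:
    items = json.load(f)

# Each record carries the three fields described above.
print(items[0]["title"])
```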
|
|
|
## Training Procedure |
|
|
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
|
|
|
The model was finetuned with a causal language modeling (CLM) objective; the multitask capability comes from training on one title example and one teaser example per news item (see Preprocessing below).
|
|
|
### Preprocessing |
|
|
|
For each news item, two training inputs were constructed by concatenating the fulltext with the respective target snippet, as shown below.
|
```
f"[Text]: {item.fulltext} \n [Title]: {item.title}"
f"[Text]: {item.fulltext} \n [Teaser]: {item.teaser}"
```
|
This results in one input per task for each news item. |
|
|
|
*Note: The inserted prompt "[Text]:" marks the beginning of the news item's fulltext. |
|
In the same manner, "[Title]:" prompts for the news item's title and "[Teaser]:" for its teaser.*
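
A minimal sketch of this preprocessing step, assuming the items are plain dicts as in the JSON format above and that the base model's tokenizer (`EleutherAI/gpt-j-6B`) is used; this illustrates the concatenation scheme, not the exact training script.

```python
from transformers import AutoTokenizer

# A hypothetical item in the dataset format described above.
items = [
    {"title": "Beispieltitel", "teaser": "Beispielteaser", "fulltext": "Beispielvolltext"},
]

# Tokenizer of the base model (an assumption; the exact checkpoint may differ).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token  # GPT-J defines no pad token by default

def build_inputs(item: dict) -> list[str]:
    """Build one training string per task (title and teaser) for a news item."""
    return [
        f"[Text]: {item['fulltext']} \n [Title]: {item['title']}",
        f"[Text]: {item['fulltext']} \n [Teaser]: {item['teaser']}",
    ]

examples = [text for item in items for text in build_inputs(item)]

# With a causal LM objective the labels are the input ids themselves; the loss
# teaches the model to continue "[Text]: ..." with the prompted snippet.
batch = tokenizer(examples, truncation=True, max_length=1024,
                  padding=True, return_tensors="pt")
batch["labels"] = batch["input_ids"].clone()
```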
|
|
|
# Evaluation |
|
|
|
1,000 German news articles proved sufficient to validate the approach.
|
Evaluation showed that the model improved compared to the GPT-J baseline in: |
|
- German language capabilities (significantly)
|
- title generation (significantly) |
|
- teaser generation (slightly) |
|
|
|
The evaluation also suggested that there is still room for improvement with more data.
|
For the model trained with the same approach but 10x the amount of data, please refer to [gptj-title-teaser-10k](https://huggingface.co/snipaid/gptj-title-teaser-10k).
|
|
|
# Environmental Impact |
|
|
|
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> |
|
|
|
Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
- **Hardware Type:** A100 SXM4 |
|
- **Hours used:** 2h 42min |
|
- **Cloud Provider:** Vast.ai |
|
- **Compute Region:** Unknown |
|
- **Carbon Emitted:** ~0.47 kg CO2eq
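
As a rough plausibility check (the power draw and grid carbon intensity below are assumptions, not reported values): an A100 SXM4 has a TDP of about 400 W, so 2.7 h × 0.4 kW ≈ 1.08 kWh; at roughly 0.43 kg CO2eq/kWh this gives about 0.47 kg CO2eq, consistent with the estimate above.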
|
|
|
# Glossary |
|
|
|
**News Item**, aka news article. A particular piece of news, usually from a journalistic source. |
|
**Snippet**, a small section of text that is related to a news item. |
|
**Title**, aka headline. A few words that reflect the essence of the news story.

**Teaser**, aka lede. A few sentences that spark curiosity about the "best of the rest" of the news story.