---
datasets:
- HuggingFaceFW/fineweb
language:
- en
---

# Encoder-Decoder model with DeBERTa encoder

## Pre-trained models

- Encoder: `microsoft/deberta-v3-small`
- Decoder: `deliciouscat/deberta-v3-base-decoder-v0.1` (6 transformer layers, 8 attention heads)

Total: 297,511,524 (~298M) parameters.

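As an illustrative sketch of how such an encoder-decoder stack is assembled (and how the parameter count can be checked), a tiny `EncoderDecoderModel` can be built from scratch. The configs below are small BERT stand-ins chosen for illustration, not the actual DeBERTa-v3 checkpoints:

```python
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Tiny stand-in configs for illustration only; the real model pairs a
# DeBERTa-v3-small encoder with a 6-layer, 8-head transformer decoder.
enc_cfg = BertConfig(hidden_size=128, num_hidden_layers=2,
                     num_attention_heads=8, intermediate_size=256)
dec_cfg = BertConfig(hidden_size=128, num_hidden_layers=6,
                     num_attention_heads=8, intermediate_size=256,
                     is_decoder=True, add_cross_attention=True)

config = EncoderDecoderConfig.from_encoder_decoder_configs(enc_cfg, dec_cfg)
model = EncoderDecoderModel(config=config)

# Parameter counting works the same way on the full published checkpoint.
n_params = sum(p.numel() for p in model.parameters())
```

Running the same `sum(p.numel() ...)` expression on the released checkpoint is how a figure like ~298M parameters is obtained.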
## Data used

`HuggingFaceFW/fineweb`, from which 124,800 documents were sampled.

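The card does not say how the 124,800 documents were drawn; one common way to take a fixed-size sample from a large streaming corpus like fineweb is single-pass reservoir sampling, sketched here generically (function name and details are illustrative, not the actual sampling code):

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw a uniform random sample of k items from a stream in one pass,
    without knowing the stream's length in advance."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)    # fill the reservoir first
        else:
            j = rng.randint(0, i)     # replace with decreasing probability
            if j < k:
                reservoir[j] = item
    return reservoir
```

With `datasets.load_dataset("HuggingFaceFW/fineweb", streaming=True)`, the resulting iterator could be fed directly into such a sampler.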
## Training hyperparameters

- Optimizer: AdamW, lr=2.3e-5, betas=(0.875, 0.997)
- Batch size: 12 (the maximum that fit in a Colab Pro A100 environment)

Trained on a BART-style denoising objective.

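BART's denoising objective corrupts the input (e.g. by text infilling: replacing sampled spans with a single mask token) and trains the model to reconstruct the original text. A minimal illustrative corruption function follows; the name, mask probability, and span length are assumptions for the sketch, not the exact training code:

```python
import random
from typing import List

def text_infilling(tokens: List[str], mask_token: str = "[MASK]",
                   mask_prob: float = 0.15, max_span: int = 3,
                   seed: int = 0) -> List[str]:
    """BART-style text infilling: each sampled span of 1..max_span tokens
    is replaced by a single mask token; the decoder's target is the
    original, uncorrupted sequence."""
    rng = random.Random(seed)
    corrupted: List[str] = []
    i = 0
    while i < len(tokens):
        if rng.random() < mask_prob:
            span = rng.randint(1, max_span)  # whole span -> one mask token
            corrupted.append(mask_token)
            i += span
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted
```

The encoder sees the corrupted sequence while the decoder is trained to emit the original tokens.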
## How to use

```python
from transformers import AutoTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
tokenizer = AutoTokenizer.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
```

## Future work!

- Train on more scientific data
- Fine-tune on a keyword extraction task