# Finetuning RoBERTa on Commonsense QA

We follow a similar approach to [finetuning RACE](../README.race.md). Specifically
for each question we construct five inputs, one for each of the five candidate
answer choices. Each input is constructed by concatenating the question and
candidate answer. We then encode each input and pass the resulting "[CLS]"
representations through a fully-connected layer to predict the correct answer.
We train with a standard cross-entropy loss.

We also found it helpful to prepend a prefix of `Q:` to the question and `A:` to
the answer. The complete input format is:
```
<s> Q: Where would I not want a fox? </s> A: hen house </s>
```
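
For illustration, the same inputs can be produced with the `roberta.encode` call
used in the evaluation snippet in step 3) below. The sketch here is not part of
the repo's code: it assumes a loaded model bound to `roberta` (as in step 3) and
all candidate answers other than "hen house" are made-up placeholders.

```python
# Illustrative sketch only: build the five ranking inputs for one question.
# Assumes `roberta` is loaded as in step 3); the alternative candidates below
# are placeholders, not entries from the dataset.
question = 'Where would I not want a fox?'
candidates = ['hen house', 'zoo', 'storybook', 'mountain', 'backyard']

inputs = [
    roberta.encode('Q: ' + question, 'A: ' + answer, no_separator=True)
    for answer in candidates
]
# each element of `inputs` is the BPE encoding of
#   <s> Q: {question} </s> A: {candidate} </s>
# and is scored independently by the ranking head.
```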

Our final submission is based on a hyperparameter search over the learning rate
(1e-5, 2e-5, 3e-5), batch size (8, 16), number of training steps (2000, 3000,
4000) and random seed. We selected the model with the best performance on the
development set after 100 trials.
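
The sweep tooling itself is not included in this example. Purely as a
hypothetical sketch, a search over that grid could be driven by a loop like the
following, where each sampled configuration fills in the corresponding variables
of the `fairseq-train` command in step 2) below (all names here are our own).

```python
# Hypothetical sketch of the hyperparameter search described above; this is not
# the sweep script used for the actual submission.
import random

LEARNING_RATES = [1e-5, 2e-5, 3e-5]
BATCH_SIZES = [8, 16]
MAX_UPDATES = [2000, 3000, 4000]

for trial in range(100):
    config = {
        'LR': random.choice(LEARNING_RATES),
        'MAX_SENTENCES': random.choice(BATCH_SIZES),
        'MAX_UPDATES': random.choice(MAX_UPDATES),
        'SEED': random.randrange(10000),
    }
    # launch one fairseq-train run (step 2 below) with these values and keep
    # the checkpoint with the best reported dev accuracy
    print('trial {}: {}'.format(trial, config))
```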

### 1) Download data from the [Commonsense QA website](https://www.tau-nlp.org/commonsenseqa)
```bash
bash examples/roberta/commonsense_qa/download_cqa_data.sh
```

### 2) Finetune

```bash
MAX_UPDATES=3000      # Number of training steps.
WARMUP_UPDATES=150    # Linearly increase LR over this many steps.
LR=1e-05              # Peak LR for polynomial LR scheduler.
MAX_SENTENCES=16      # Batch size.
SEED=1                # Random seed.
ROBERTA_PATH=/path/to/roberta/model.pt
DATA_DIR=data/CommonsenseQA

# we use the --user-dir option to load the task from
# the examples/roberta/commonsense_qa directory:
FAIRSEQ_PATH=/path/to/fairseq
FAIRSEQ_USER_DIR=${FAIRSEQ_PATH}/examples/roberta/commonsense_qa

CUDA_VISIBLE_DEVICES=0 fairseq-train --fp16 --ddp-backend=legacy_ddp \
    $DATA_DIR \
    --user-dir $FAIRSEQ_USER_DIR \
    --restore-file $ROBERTA_PATH \
    --reset-optimizer --reset-dataloader --reset-meters \
    --no-epoch-checkpoints --no-last-checkpoints --no-save-optimizer-state \
    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \
    --task commonsense_qa --init-token 0 --bpe gpt2 \
    --arch roberta_large --max-positions 512 \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --criterion sentence_ranking --num-classes 5 \
    --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr $LR \
    --warmup-updates $WARMUP_UPDATES --total-num-update $MAX_UPDATES \
    --batch-size $MAX_SENTENCES \
    --max-update $MAX_UPDATES \
    --log-format simple --log-interval 25 \
    --seed $SEED
```

The above command assumes training on 1 GPU with 32GB of memory. For GPUs with
less memory, decrease `--batch-size` and increase `--update-freq` accordingly to
compensate, keeping the effective batch size the same (for example,
`--batch-size 8 --update-freq 2` accumulates gradients over two steps and is
equivalent to a batch size of 16).

### 3) Evaluate
```python
import json
import torch
from fairseq.models.roberta import RobertaModel
from examples.roberta import commonsense_qa  # load the Commonsense QA task
roberta = RobertaModel.from_pretrained('checkpoints', 'checkpoint_best.pt', 'data/CommonsenseQA')
roberta.eval()  # disable dropout
roberta.cuda()  # use the GPU (optional)
nsamples, ncorrect = 0, 0
with open('data/CommonsenseQA/valid.jsonl') as h:
    for line in h:
        example = json.loads(line)
        scores = []
        for choice in example['question']['choices']:
            input = roberta.encode(
                'Q: ' + example['question']['stem'],
                'A: ' + choice['text'],
                no_separator=True
            )
            score = roberta.predict('sentence_classification_head', input, return_logits=True)
            scores.append(score)
        pred = torch.cat(scores).argmax()
        answer = ord(example['answerKey']) - ord('A')
        nsamples += 1
        if pred == answer:
            ncorrect += 1

print('Accuracy: ' + str(ncorrect / float(nsamples)))
# Accuracy: 0.7846027846027847
```

The above snippet is not batched, which makes it quite slow. See [instructions
for batched prediction with RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta#batched-prediction).
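
As a rough, untested sketch along the lines of those instructions, the five
candidates of each question could at least be scored in a single forward pass.
`collate_tokens` and the pad index of 1 are taken from the linked batched
prediction example; `roberta` and `example` are assumed to be set up as in the
snippet above.

```python
# Untested sketch: score all five candidate answers of one question at once.
# Assumes `roberta` and `example` are defined as in the evaluation snippet above.
from fairseq.data.data_utils import collate_tokens

tokens = [
    roberta.encode(
        'Q: ' + example['question']['stem'],
        'A: ' + choice['text'],
        no_separator=True,
    )
    for choice in example['question']['choices']
]
batch = collate_tokens(tokens, pad_idx=1)  # right-pad to the longest candidate
logits = roberta.predict('sentence_classification_head', batch, return_logits=True)
pred = logits.argmax()  # index of the highest-scoring candidate
```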