---
datasets:
- stanfordnlp/imdb
base_model:
- lvwerra/gpt2-imdb
pipeline_tag: text-generation
license: mit
---
# Purpose of this finetuning


This model finetunes the base model [GPT2-IMDB](https://huggingface.co/lvwerra/gpt2-imdb) using [this DistilBERT sentiment classifier](https://huggingface.co/lvwerra/distilbert-imdb) as a reward function.
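
The scoring code isn't part of this card, but "reward function" here presumably means scoring each generated continuation with the classifier. A minimal sketch of one way to do that, using the NEGATIVE-class logit as the scalar reward (the exact reward shaping is an assumption):

```python
import torch
from transformers import pipeline

# Reward model: the DistilBERT classifier linked above.
sentiment_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

def negative_reward(text: str) -> torch.Tensor:
    # top_k=None returns scores for both labels; function_to_apply="none"
    # keeps raw logits rather than softmax probabilities.
    scores = sentiment_pipe(text, top_k=None, function_to_apply="none")
    # Scalar reward = the NEGATIVE-class logit (higher = more negative).
    return torch.tensor(next(s["score"] for s in scores if s["label"] == "NEGATIVE"))

print(negative_reward("This movie was a complete waste of time."))
```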

- The goal is to train the GPT2 model to continue a movie review prompt with negative-sentiment text.
- A separate training run produces a model that generates positive movie reviews. The eventual goal is to interpolate the weight spaces of the 'positively finetuned' and 'negatively finetuned' models, as per the [rewarded-soups paper](https://arxiv.org/abs/2306.04488), and test whether this results in (qualitatively) neutral reviews; a sketch of that interpolation step follows below.
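
The interpolation itself is not part of this repository; as a reference, here is a minimal sketch of the linear weight interpolation described in rewarded soups (checkpoint names and the coefficient `lam` are hypothetical placeholders):

```python
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint names for the two reward-specific finetunes.
neg = AutoModelForCausalLM.from_pretrained("gpt2-imdb-neg")
pos = AutoModelForCausalLM.from_pretrained("gpt2-imdb-pos")

lam = 0.5  # interpolation coefficient: 0.0 = all-negative model, 1.0 = all-positive
sd_neg, sd_pos = neg.state_dict(), pos.state_dict()

# Linearly interpolate every floating-point parameter; copy anything else
# (e.g. integer/bool buffers) from one checkpoint unchanged.
merged = {
    k: (1.0 - lam) * sd_neg[k] + lam * sd_pos[k] if sd_neg[k].is_floating_point() else sd_neg[k]
    for k in sd_neg
}

neg.load_state_dict(merged)               # reuse one model as the container
neg.save_pretrained("gpt2-imdb-neutral")  # hypothetical output name
```

Sweeping `lam` over [0, 1] would then give the family of interpolated models the rewarded-soups paper evaluates.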

## Model Params

Here are the training parameters:
- base_model = 'lvwerra/gpt2-imdb'
- dataset = 'stanfordnlp/imdb'
- batch_size = 16
- learning_rate = 1.41e-5
- output_max_length = 16
- output_min_length = 4

The exact training time was not recorded, but it was less than a couple of hours on a single A6000 GPU.
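
The training code isn't included here, but the model pairing and hyperparameters appear to match TRL's PPO sentiment example, so the loop was presumably something like the sketch below (assuming the classic `trl` v0.x `PPOTrainer` API; details such as the prompt-length sampler and sampling settings are assumptions):

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, pipeline
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer
from trl.core import LengthSampler

config = PPOConfig(
    model_name="lvwerra/gpt2-imdb",
    learning_rate=1.41e-5,
    batch_size=16,
)

tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)

# Prompts: short truncated IMDB reviews the model must continue.
dataset = load_dataset("stanfordnlp/imdb", split="train")
prompt_len = LengthSampler(2, 8)  # prompt lengths are an assumption

def tokenize(sample):
    sample["input_ids"] = tokenizer.encode(sample["text"])[: prompt_len()]
    sample["query"] = tokenizer.decode(sample["input_ids"])
    return sample

dataset = dataset.map(tokenize)
dataset.set_format(type="torch")

ppo_trainer = PPOTrainer(
    config, model, ref_model, tokenizer, dataset=dataset,
    data_collator=lambda data: {k: [d[k] for d in data] for k in data[0]},
)

sentiment_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")
output_len = LengthSampler(4, 16)  # output_min_length / output_max_length above

for batch in ppo_trainer.dataloader:
    queries = batch["input_ids"]
    responses = ppo_trainer.generate(
        queries, return_prompt=False, do_sample=True,
        max_new_tokens=output_len(), pad_token_id=tokenizer.eos_token_id,
    )
    batch["response"] = tokenizer.batch_decode(responses)

    # Score prompt + continuation; reward = raw NEGATIVE logit.
    texts = [q + r for q, r in zip(batch["query"], batch["response"])]
    scores = sentiment_pipe(texts, top_k=None, function_to_apply="none", batch_size=16)
    rewards = [
        torch.tensor(next(s["score"] for s in out if s["label"] == "NEGATIVE"))
        for out in scores
    ]

    ppo_trainer.step(queries, responses, rewards)
```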


### Results

![image/png](https://cdn-uploads.huggingface.co/production/uploads/671ad995ca9561981190dbb4/ndneRnA3jP563cKMEtMth.png)