---
license: apache-2.0
---

### DreamBank Custom Architecture

This repository contains the weights for the custom architecture presented in [Bertolini et al., 2023](https://arxiv.org/abs/2302.14828).

A working example of how to load and use the model can be found in the [Git repo](https://github.com/lorenzoscottb/Dream_Reports_Annotation/tree/main/Experiments/Supervised_Learning). The helper objects used below (`CustomDataset`, `BERT_PTM`, `validation`, `decode_clean`) are imported from `SL_utils.py`, which ships with that repository.

#### Use

```py
import torch, os
import numpy as np
import pandas as pd
from tqdm import tqdm

import transformers
from transformers import AutoModel
from transformers import AutoConfig
from transformers import BertTokenizerFast
from torch.utils.data import DataLoader

# Helper objects (CustomDataset, BERT_PTM, validation, decode_clean)
# defined in SL_utils.py of the linked repository
from SL_utils import *

# Emotion codes predicted by the model and their labels
Coding_emotions = {
    "AN": "Anger",
    "AP": "Apprehension",
    "SD": "Sadness",
    "CO": "Confusion",
    "HA": "Happiness",
}

emotions_list = list(Coding_emotions.keys())

test_sentences = [
    "In my dream I was followed by the scary monster.",
    "I was walking in a forest, surrounded by singing birds. I was in calm and peace.",
]

# All-zero placeholder labels: the dataset class expects a multi-label
# target column, but the true labels are unknown at inference time
test_sentences_target = len(test_sentences) * [[0, 0, 0, 0, 0]]
test_sentences_df = pd.DataFrame.from_dict(
    {
        "report": test_sentences,
        "Report_as_Multilabel": test_sentences_target,
    }
)
```
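
The same frame layout can be reused for new, unlabelled reports. A minimal sketch, where `my_reports` is a hypothetical list of strings (not part of the original example):

```py
# Hypothetical input: wrap any list of dream reports in the layout expected
# by CustomDataset, with all-zero placeholder labels
my_reports = ["Last night I dreamt I was flying over my home town."]
my_reports_df = pd.DataFrame.from_dict(
    {
        "report": my_reports,
        "Report_as_Multilabel": len(my_reports) * [[0] * len(emotions_list)],
    }
)
```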

```py
model_name = "bert-large-cased"
model_config = AutoConfig.from_pretrained(model_name)
tokenizer = BertTokenizerFast.from_pretrained(model_name, do_lower_case=False)
testing_set = CustomDataset(test_sentences_df, tokenizer, max_length=512)

test_params = {
    'batch_size': 2,
    'shuffle': True,
    'num_workers': 0
}

testing_loader = DataLoader(testing_set, **test_params)

model = BERT_PTM(
    model_config,
    model_name=model_name,
    n_classes=len(emotions_list),
    freeze_BERT=False,
)

# Load the weights of the pre-trained model (pytorch_model.bin from this repo)
model.load_state_dict(torch.load("path/to/pytorch_model.bin"))
model.to("cuda")
```
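
If no GPU is available, the same pipeline can run on CPU. A minimal sketch of the device handling (an assumption on top of the original snippet, which hard-codes `"cuda"`):

```py
# Optional: pick the device dynamically and fall back to CPU when no GPU is
# present; pass the same string to the validation() call below
device = "cuda" if torch.cuda.is_available() else "cpu"
model.load_state_dict(torch.load("path/to/pytorch_model.bin", map_location=device))
model.to(device)
model.eval()
```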

```py
# Run inference: validation() returns the model outputs, the (placeholder)
# targets, and the tokenised input ids
outputs, targets, ids = validation(model, testing_loader, device="cuda", return_inputs=True)

# Binarise the per-emotion scores with a 0.5 threshold
corr_outputs = np.array(outputs) >= 0.5
corr_outputs_df = pd.DataFrame(corr_outputs, columns=emotions_list)
corr_outputs_df = corr_outputs_df.astype(int)

# Decode the input ids back into (cleaned) report text
corr_outputs_df["report"] = [decode_clean(x, tokenizer) for x in tqdm(ids)]
```
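
To read the predictions per report, the binary columns can be mapped back to the emotion names. A small, hypothetical inspection step (not part of the original example):

```py
# List, for each report, the emotions predicted as present
for _, row in corr_outputs_df.iterrows():
    predicted = [Coding_emotions[code] for code in emotions_list if row[code] == 1]
    print(row["report"], "->", predicted if predicted else "no emotion predicted")
```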

### Cite

If you use the model, please cite the pre-print:

```bibtex
@misc{https://doi.org/10.48550/arxiv.2302.14828,
  doi       = {10.48550/ARXIV.2302.14828},
  url       = {https://arxiv.org/abs/2302.14828},
  author    = {Bertolini, Lorenzo and Elce, Valentina and Michalak, Adriana and Bernardi, Giulio and Weeds, Julie},
  keywords  = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title     = {Automatic Scoring of Dream Reports' Emotional Content with Large Language Models},
  publisher = {arXiv},
  year      = {2023},
  copyright = {Creative Commons Attribution 4.0 International}
}
```