---
license: apache-2.0
datasets:
- dair-ai/emotion
language:
- en
metrics:
- accuracy
tags:
- emotion
---
# Model

A `bert-base-uncased` model fine-tuned for emotion classification on the dair-ai/emotion dataset.

## Model Details

Base model: bert-base-uncased

Dataset: dair-ai/emotion

Training configuration (see the fine-tuning sketch after this list):

- num_train_epochs = 8
- learning_rate = 2e-5
- weight_decay = 0.01
- batch_size = 64
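
The exact training script is not part of this card; the block below is a minimal sketch, assuming the standard `transformers` `Trainer` API, of how an equivalent run could look with the hyperparameters listed above. The `output_dir` and the mapping of `batch_size` to `per_device_train_batch_size` are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Load the dair-ai/emotion dataset (6 labels) and the base checkpoint
dataset = load_dataset("dair-ai/emotion")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=6
)

# Tokenize the text column
def tokenize(batch):
    return tokenizer(batch["text"], padding=True, truncation=True)

encoded = dataset.map(tokenize, batched=True)

# Hyperparameters taken from the card; output_dir and the per-device
# batch-size mapping are assumptions
args = TrainingArguments(
    output_dir="bert-base-uncased-emotion_v2",
    num_train_epochs=8,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```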

## Evaluation results
```json
{
  "test_loss": 0.14830373227596283,
  "test_accuracy": 0.9415,
  "test_f1": 0.9411005763302622,
  "test_runtime": 8.372,
  "test_samples_per_second": 238.892,
  "test_steps_per_second": 3.822
}
```
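
Metrics like these are typically produced by passing a `compute_metrics` callback to the `Trainer` and evaluating on the test split. The sketch below shows one way to do that; the card does not state the F1 averaging mode, so `average="weighted"` is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Sketch of a metrics callback; "weighted" averaging is an assumption
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average="weighted"),
    }

# Passed as Trainer(..., compute_metrics=compute_metrics); the "test_" prefix
# in the results above comes from evaluating on the test split, e.g.
# trainer.evaluate(encoded["test"], metric_key_prefix="test")
```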

## How to use the model
```python
from transformers import pipeline
model_path = "daveni/twitter-xlm-roberta-emotion-es"
emotion_analysis = pipeline("text-classification", framework="pt", model=model_path, tokenizer=model_path)
emotion_analysis("Einstein dijo: Solo hay dos cosas infinitas, el universo y los pinches anuncios de bitcoin en Twitter. Paren ya carajo aaaaaaghhgggghhh me quiero murir")
```
```
[{'label': 'anger', 'score': 0.48307016491889954}]
```
## Full classification example
```python
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import numpy as np
from scipy.special import softmax
# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
model_path = "Cesar42/bert-base-uncased-emotion_v2"
tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)
# Load the PyTorch model
model = AutoModelForSequenceClassification.from_pretrained(model_path)
text = "Se ha quedao bonito día para publicar vídeo, ¿no? Hoy del tema más diferente que hemos tocado en el canal."
text = preprocess(text)
print(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)
# Print labels and scores
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = config.id2label[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")
```
Output: 

```
Se ha quedao bonito día para publicar vídeo, ¿no? Hoy del tema más diferente que hemos tocado en el canal.
1) joy 0.7887
2) others 0.1679
3) surprise 0.0152
4) sadness 0.0145
5) anger 0.0077
6) disgust 0.0033
7) fear 0.0027
```

### References

* bhadresh-savani/bert-base-uncased-emotion
* [Colab Notebook](https://github.com/bhadreshpsavani/ExploringSentimentalAnalysis/blob/main/SentimentalAnalysisWithDistilbert.ipynb) by bhadresh-savani