# torchMoji examples

## Initialization  
[create_twitter_vocab.py](create_twitter_vocab.py)  
Create a new vocabulary from a TSV file.  
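
As a rough illustration of what building a vocabulary from a TSV file involves, here is a minimal sketch. It is not the script's implementation: `build_vocab_from_tsv`, the `<pad>`/`<unk>` special-token names and the lowercase-plus-whitespace tokenization are all hypothetical simplifications. torchMoji uses its own tweet-aware tokenizer.

```python
import csv
import io
from collections import Counter


def build_vocab_from_tsv(lines, text_column=0, min_count=1, special_tokens=("<pad>", "<unk>")):
    """Hypothetical helper: build a word -> id vocabulary from one column of a TSV file."""
    counts = Counter()
    # QUOTE_NONE keeps quote characters in tweets as literal text
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > text_column:
            # Simplification: lowercase + whitespace split instead of a real tweet tokenizer
            counts.update(row[text_column].lower().split())
    # Reserve the lowest ids for special tokens (names are placeholders)
    vocab = {tok: i for i, tok in enumerate(special_tokens)}
    # Most frequent words get the smallest ids; ties are broken alphabetically
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count and word not in vocab:
            vocab[word] = len(vocab)
    return vocab


tsv = io.StringIO("hello world\t1\nHello there\t0\n")
print(build_vocab_from_tsv(tsv))
# → {'<pad>': 0, '<unk>': 1, 'hello': 2, 'there': 3, 'world': 4}
```

Ordering ids by frequency is one common convention. It makes it easy to cap the vocabulary size later by keeping only the lowest ids.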
  
[tokenize_dataset.py](tokenize_dataset.py)  
Tokenize a given dataset using the prebuilt vocabulary.  
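
Tokenizing with a prebuilt vocabulary usually means turning each text into a fixed-length row of integer ids. The sketch below shows that idea under stated assumptions: `tokenize_texts` is a hypothetical helper, padding id 0 and unknown-word id 1 are assumed conventions, and whitespace splitting stands in for torchMoji's real tokenizer.

```python
def tokenize_texts(texts, vocab, max_len=30, pad_id=0, unk_id=1):
    """Hypothetical helper: map each text to a fixed-length list of token ids."""
    rows = []
    for text in texts:
        # Out-of-vocabulary words map to unk_id; long texts are truncated to max_len
        ids = [vocab.get(tok, unk_id) for tok in text.lower().split()][:max_len]
        # Short texts are right-padded with pad_id so every row has the same length
        ids += [pad_id] * (max_len - len(ids))
        rows.append(ids)
    return rows


vocab = {"<pad>": 0, "<unk>": 1, "i": 2, "love": 3, "cats": 4}
print(tokenize_texts(["I love cats", "I love dogs and cats a lot"], vocab, max_len=5))
# → [[2, 3, 4, 0, 0], [2, 3, 1, 1, 4]]
```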
  
[vocab_extension.py](vocab_extension.py)  
Extend the given vocabulary using dataset-specific words.  
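
One plausible way to extend a vocabulary is to append frequent dataset words that it lacks, without renumbering existing entries. The sketch below assumes this approach; `extend_vocab`, `max_new` and `min_count` are hypothetical names, not the script's API.

```python
from collections import Counter


def extend_vocab(vocab, texts, max_new=10000, min_count=2):
    """Hypothetical helper: append frequent dataset words that the vocabulary lacks."""
    extended = dict(vocab)  # existing ids are left untouched
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # Only words seen at least min_count times and not already in the vocabulary qualify
    candidates = [w for w, c in counts.items() if c >= min_count and w not in extended]
    candidates.sort(key=lambda w: (-counts[w], w))  # most frequent first, ties alphabetical
    next_id = max(extended.values(), default=-1) + 1
    for word in candidates[:max_new]:
        extended[word] = next_id
        next_id += 1
    return extended


base = {"<pad>": 0, "<unk>": 1, "good": 2}
print(extend_vocab(base, ["good vibes", "vibes only", "so good", "lit lit"]))
# → {'<pad>': 0, '<unk>': 1, 'good': 2, 'lit': 3, 'vibes': 4}
```

Keeping the old ids fixed matters with a pretrained model. Each existing id still indexes the same row of the embedding matrix, so only rows for the new words need to be added.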
  
[dataset_split.py](dataset_split.py)  
Split a given dataset into training, validation and test sets.
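
A minimal sketch of a reproducible three-way split follows. `split_dataset` and its default 80/10/10 fractions are assumptions for illustration; the script may use different ratios or a stratified split.

```python
import random


def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=42):
    """Hypothetical helper: shuffle once, then cut into train/val/test slices."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
# → 80 10 10
```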
  
## Using the pretrained model/architecture
[score_texts_emojis.py](score_texts_emojis.py)  
Use torchMoji to score texts for emoji distribution.  
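
Scoring a text produces a probability distribution over torchMoji's 64 emoji classes. A common way to summarize it is to report the top-k emoji indices. The sketch below uses a toy 5-class vector, and `top_emojis` is a hypothetical helper, not a torchMoji function.

```python
def top_emojis(probs, k=5):
    """Return the indices of the k largest probabilities, highest first."""
    # sorted() is stable, so equal probabilities keep their original index order
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


probs = [0.10, 0.40, 0.05, 0.30, 0.15]  # toy 5-class vector; the real model outputs 64 classes
best = top_emojis(probs, k=3)
print(best, round(sum(probs[i] for i in best), 2))
# → [1, 3, 4] 0.85
```

Mapping indices back to actual emoji characters requires the model's emoji ordering, which is not shown here.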

[encode_texts.py](encode_texts.py)  
Use torchMoji to encode texts into 2304-dimensional feature vectors for further modeling/analysis.
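
As one example of "further analysis", encoded vectors can be compared with cosine similarity, for instance to find texts with a similar emotional tone. The sketch below assumes the encodings are available as plain lists of floats. The vectors are synthetic stand-ins, not real torchMoji output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


dim = 2304  # the feature size stated in this README
u = [1.0] * dim
v = [2.0] * dim               # same direction as u
w = [1.0, -1.0] * (dim // 2)  # orthogonal to u
print(round(cosine_similarity(u, v), 6), round(cosine_similarity(u, w), 6))
# → 1.0 0.0
```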

## Transfer learning
[finetune_youtube_last.py](finetune_youtube_last.py)  
Finetune the model on the SS-Youtube dataset using the 'last' method.  
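
The sketch below assumes 'last' means training only the final, task-specific layer while all pretrained layers stay frozen. It uses toy scalar "weights" and illustrative layer names to show what freezing means during an update. In PyTorch this is typically done by setting `requires_grad = False` on the frozen parameters.

```python
LAYERS = ["embed", "lstm_0", "lstm_1", "attention_layer", "output_layer"]  # illustrative names


def sgd_step(weights, grads, trainable, lr=0.1):
    """One plain SGD update that leaves frozen (non-trainable) layers unchanged."""
    return {
        name: (w - lr * grads[name]) if name in trainable else w
        for name, w in weights.items()
    }


# 'last': only the final, task-specific layer is trainable
trainable = {LAYERS[-1]}
weights = {name: 1.0 for name in LAYERS}  # toy scalar "weights", one per layer
grads = {name: 0.5 for name in LAYERS}
updated = sgd_step(weights, grads, trainable)
print({name: round(w, 3) for name, w in updated.items()})
# → {'embed': 1.0, 'lstm_0': 1.0, 'lstm_1': 1.0, 'attention_layer': 1.0, 'output_layer': 0.95}
```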
    
[finetune_insults_chain-thaw.py](finetune_insults_chain-thaw.py)  
Finetune the model on the Kaggle insults dataset (from the blog post) using the 'chain-thaw' method.  
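
Chain-thaw, as described for DeepMoji, runs in stages. It first trains the newly added layer(s), then fine-tunes each pretrained layer on its own starting from the first, and finally trains the whole model. Each stage is trained until convergence on a validation set. The sketch below only computes that stage schedule. `chain_thaw_stages` and the layer names are hypothetical, and the training loop itself is omitted.

```python
def chain_thaw_stages(layers, new_layers):
    """Hypothetical helper: list the trainable-layer sets for each chain-thaw stage."""
    new_layers = set(new_layers)
    stages = [set(new_layers)]  # 1) train only the newly added layer(s)
    # 2) train each pretrained layer on its own, from the first layer to the last
    stages += [{name} for name in layers if name not in new_layers]
    stages.append(set(layers))  # 3) finally train all layers together
    return stages


layers = ["embed", "lstm_0", "lstm_1", "attention_layer", "output_layer"]
for i, stage in enumerate(chain_thaw_stages(layers, new_layers=["output_layer"]), start=1):
    print(i, sorted(stage))
```

Thawing one layer at a time limits how far any single stage can move the pretrained weights, which is the motivation usually given for the method.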
  
[finetune_semeval_class-avg_f1.py](finetune_semeval_class-avg_f1.py)  
Finetune the model on the SemEval emotion dataset using the 'full' method and evaluate using the class-average F1 metric.
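
Class-average (macro) F1 computes an F1 score for each class and takes their unweighted mean, so rare classes count as much as frequent ones. The sketch below uses the standard single-label definition. How the script derives per-class predictions is not shown here and may differ. `class_avg_f1` is a hypothetical helper. scikit-learn's `f1_score(..., average='macro')` computes the same standard quantity.

```python
def class_avg_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (macro F1)."""
    labels = sorted(set(y_true) | set(y_pred)) if labels is None else list(labels)
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(class_avg_f1(y_true, y_pred), 4))
# → 0.6556
```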