startificial committed on
Commit b758e43 · verified · 1 Parent(s): 1374b1e

Upload 8 files

README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ language:
+ - en
+ tags:
+ - text-classification
+ - zero-shot-classification
+ metrics:
+ - accuracy
+ pipeline_tag: zero-shot-classification
+
+ ---
+ # DeBERTa-v3-base-mnli
+ ## Model description
+ This model was trained on the MultiNLI dataset, which consists of 392 702 NLI hypothesis-premise pairs.
+ The base model is [DeBERTa-v3-base from Microsoft](https://huggingface.co/microsoft/deberta-v3-base). The v3 variant of DeBERTa substantially outperforms previous versions of the model by using a different pre-training objective (see annex 11 of the original [DeBERTa paper](https://arxiv.org/pdf/2006.03654.pdf)). For a more powerful model, check out [DeBERTa-v3-base-mnli-fever-anli](https://huggingface.co/MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli), which was trained on even more data.
+ ## Intended uses & limitations
+ ### How to use the model
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+
+ model_name = "MoritzLaurer/DeBERTa-v3-base-mnli"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
+
+ premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
+ hypothesis = "The movie was good."
+
+ # Encode the premise-hypothesis pair and move all input tensors to the model's device
+ inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
+ with torch.no_grad():
+     output = model(**inputs)
+ # Softmax over the three NLI classes; the order matches id2label in config.json
+ prediction = torch.softmax(output.logits[0], -1).tolist()
+ label_names = ["entailment", "neutral", "contradiction"]
+ prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
+ print(prediction)
+ ```
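The card's `pipeline_tag` is `zero-shot-classification`. In Hugging Face Transformers, the zero-shot classification pipeline turns an NLI model like this one into a zero-shot classifier. It inserts each candidate label into a hypothesis template (by default "This example is {}."), scores every premise-hypothesis pair, and then normalizes the entailment logits. The sketch below reproduces only that final normalization step, in plain Python, so it runs without downloading the model. The logit values are hypothetical. The label indices assume the `id2label` order in config.json (0 = entailment, 1 = neutral, 2 = contradiction).

```python
import math
from typing import Dict, List, Sequence

# Label indices taken from id2label in this model's config.json
ENTAILMENT, NEUTRAL, CONTRADICTION = 0, 1, 2


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(
    nli_logits: Dict[str, Sequence[float]], multi_label: bool = False
) -> Dict[str, float]:
    """Turn per-label NLI logits [entailment, neutral, contradiction] into label scores.

    Single-label: softmax of the entailment logits across all candidate labels.
    Multi-label: for each label independently, softmax over [contradiction, entailment]
    and keep the entailment probability.
    """
    labels = list(nli_logits)
    if multi_label:
        return {
            label: softmax([nli_logits[label][CONTRADICTION], nli_logits[label][ENTAILMENT]])[1]
            for label in labels
        }
    probs = softmax([nli_logits[label][ENTAILMENT] for label in labels])
    return dict(zip(labels, probs))


# Hypothetical logits for the README's movie premise paired with
# "This example is {label}." for three candidate labels
example_logits = {
    "positive": [-1.2, 0.3, 2.1],
    "negative": [2.4, -0.5, -1.8],
    "mixed": [-0.3, 1.1, 0.2],
}

single = zero_shot_scores(example_logits)
print(max(single, key=single.get))  # negative
print(zero_shot_scores(example_logits, multi_label=True))
```

In practice, `pipeline("zero-shot-classification", model="MoritzLaurer/DeBERTa-v3-base-mnli")` from `transformers` handles the templating, batching, and this normalization for you. The multi-label branch corresponds to passing `multi_label=True`.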
+ ### Training data
+ This model was trained on the MultiNLI dataset, which consists of 392 702 NLI hypothesis-premise pairs.
+
+ ### Training procedure
+ DeBERTa-v3-base-mnli was trained using the Hugging Face Trainer with the following hyperparameters.
+ ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir="./results",          # placeholder; the output directory is not stated in this card
+     num_train_epochs=5,              # total number of training epochs
+     learning_rate=2e-05,
+     per_device_train_batch_size=32,  # batch size per device during training
+     per_device_eval_batch_size=32,   # batch size for evaluation
+     warmup_ratio=0.1,                # fraction of total training steps used for learning-rate warmup
+     weight_decay=0.06,               # strength of weight decay
+     fp16=True,                       # mixed precision training (requires a CUDA GPU)
+ )
+ ```
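To make `warmup_ratio=0.1` concrete, the sketch below works out the step counts and the resulting learning-rate curve. It assumes a single training device, no gradient accumulation, and the Trainer's default linear scheduler, which warms up linearly to the peak learning rate and then decays linearly to zero. The card does not state how many GPUs were used, so the absolute step counts are illustrative only.

```python
import math

# Values from the model card; NUM_DEVICES = 1 is an assumption (not stated in the card)
TRAIN_EXAMPLES = 392_702
BATCH_SIZE = 32
NUM_DEVICES = 1
EPOCHS = 5
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1

# One optimizer step per batch; the last, partial batch still counts as a step
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / (BATCH_SIZE * NUM_DEVICES))
total_steps = steps_per_epoch * EPOCHS
# Warmup steps are derived from warmup_ratio by rounding up
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_schedule_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 12272 61360 6136
print(linear_schedule_lr(warmup_steps))            # the peak learning rate
```

With more devices, `steps_per_epoch` shrinks proportionally, and the warmup length shrinks with it because it is a ratio of the total.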
+ ### Eval results
+ The model was evaluated on the MultiNLI matched test set and achieves an accuracy of 0.90.
+
+ ## Limitations and bias
+ Please consult the original DeBERTa paper and the literature on the different NLI datasets for potential biases.
+ ### BibTeX entry and citation info
+ If you want to cite this model, please cite the original DeBERTa paper and the respective NLI datasets, and include a link to this model on the Hugging Face Hub.
+
+ ### Ideas for cooperation or questions?
+ If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or on [LinkedIn](https://www.linkedin.com/in/moritz-laurer/).
+
+ ### Debugging and issues
+ Note that DeBERTa-v3 was released recently, and older versions of Hugging Face Transformers seem to have problems running the model (e.g. errors related to the tokenizer). Using Transformers==4.13 might solve some of these issues.
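A quick way to act on this note is to check the installed versions before loading the model. The sketch below uses only the standard library. It treats 4.13 as a minimum version, which is an interpretation of the note above rather than a stated requirement. It also checks for `sentencepiece`, because the slow `DebertaV2Tokenizer` named in tokenizer_config.json depends on that package. The helper names are hypothetical, and the version parsing is deliberately simple: it keeps only leading numeric components and ignores suffixes such as `.dev0`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Threshold taken from the debugging note in the model card
MIN_TRANSFORMERS = (4, 13)


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep the leading numeric components, e.g. '4.13.0.dev0' -> (4, 13, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # stop at a suffix such as '0rc1'
            break
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment() -> Dict[str, bool]:
    """Report whether transformers is at least 4.13 and sentencepiece is installed."""
    tf = installed_version("transformers")
    return {
        "transformers_ok": tf is not None and parse_version(tf) >= MIN_TRANSFORMERS,
        "sentencepiece_installed": installed_version("sentencepiece") is not None,
    }


print(check_environment())
```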
+
+ ## Model Recycling
+
+ [Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=0.97&mnli_lp=nan&20_newsgroup=-0.39&ag_news=0.19&amazon_reviews_multi=0.10&anli=1.31&boolq=0.81&cb=8.93&cola=0.01&copa=13.60&dbpedia=-0.23&esnli=-0.51&financial_phrasebank=0.61&imdb=-0.26&isear=-0.35&mnli=-0.34&mrpc=1.24&multirc=1.50&poem_sentiment=-0.19&qnli=0.30&qqp=0.13&rotten_tomatoes=-0.55&rte=3.57&sst2=0.35&sst_5bins=0.39&stsb=1.10&trec_coarse=-0.36&trec_fine=-0.02&tweet_ev_emoji=1.11&tweet_ev_emotion=-0.35&tweet_ev_hate=1.43&tweet_ev_irony=-2.65&tweet_ev_offensive=-1.69&tweet_ev_sentiment=-1.51&wic=0.57&wnli=-2.61&wsc=9.95&yahoo_answers=-0.33&model_name=MoritzLaurer%2FDeBERTa-v3-base-mnli&base_name=microsoft%2Fdeberta-v3-base) using MoritzLaurer/DeBERTa-v3-base-mnli as a base model yields an average score of 80.01, compared to 79.04 for microsoft/deberta-v3-base.
+
+ The model is ranked 1st among all tested models for the microsoft/deberta-v3-base architecture as of 09/01/2023.
+
+ Results:
+
+ | Dataset | Score |
+ |:---------------------|--------:|
+ | 20_newsgroup | 86.0196 |
+ | ag_news | 90.6333 |
+ | amazon_reviews_multi | 66.96 |
+ | anli | 60.0938 |
+ | boolq | 83.792 |
+ | cb | 83.9286 |
+ | cola | 86.5772 |
+ | copa | 72 |
+ | dbpedia | 79.2 |
+ | esnli | 91.419 |
+ | financial_phrasebank | 85.1 |
+ | imdb | 94.232 |
+ | isear | 71.5124 |
+ | mnli | 89.4426 |
+ | mrpc | 90.4412 |
+ | multirc | 63.7583 |
+ | poem_sentiment | 86.5385 |
+ | qnli | 93.8129 |
+ | qqp | 91.9144 |
+ | rotten_tomatoes | 89.8687 |
+ | rte | 85.9206 |
+ | sst2 | 95.4128 |
+ | sst_5bins | 57.3756 |
+ | stsb | 91.377 |
+ | trec_coarse | 97.4 |
+ | trec_fine | 91 |
+ | tweet_ev_emoji | 47.302 |
+ | tweet_ev_emotion | 83.6031 |
+ | tweet_ev_hate | 57.6431 |
+ | tweet_ev_irony | 77.1684 |
+ | tweet_ev_offensive | 83.3721 |
+ | tweet_ev_sentiment | 70.2947 |
+ | wic | 71.7868 |
+ | wnli | 67.6056 |
+ | wsc | 74.0385 |
+ | yahoo_answers | 71.7 |
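The 80.01 average quoted above can be reproduced from the results table as the plain arithmetic mean of the 36 per-dataset scores. Its gap to the 79.04 reported for microsoft/deberta-v3-base is the 0.97 average gain encoded in the chart link (`avg=0.97`). The check below is a minimal sketch. It assumes an unweighted mean, which the numbers are consistent with but the card does not state explicitly.

```python
# Per-dataset scores copied from the Model Recycling results table
scores = {
    "20_newsgroup": 86.0196, "ag_news": 90.6333, "amazon_reviews_multi": 66.96,
    "anli": 60.0938, "boolq": 83.792, "cb": 83.9286, "cola": 86.5772, "copa": 72,
    "dbpedia": 79.2, "esnli": 91.419, "financial_phrasebank": 85.1, "imdb": 94.232,
    "isear": 71.5124, "mnli": 89.4426, "mrpc": 90.4412, "multirc": 63.7583,
    "poem_sentiment": 86.5385, "qnli": 93.8129, "qqp": 91.9144,
    "rotten_tomatoes": 89.8687, "rte": 85.9206, "sst2": 95.4128,
    "sst_5bins": 57.3756, "stsb": 91.377, "trec_coarse": 97.4, "trec_fine": 91,
    "tweet_ev_emoji": 47.302, "tweet_ev_emotion": 83.6031, "tweet_ev_hate": 57.6431,
    "tweet_ev_irony": 77.1684, "tweet_ev_offensive": 83.3721,
    "tweet_ev_sentiment": 70.2947, "wic": 71.7868, "wnli": 67.6056, "wsc": 74.0385,
    "yahoo_answers": 71.7,
}
BASE_MODEL_AVERAGE = 79.04  # reported for microsoft/deberta-v3-base

# Unweighted mean over all datasets, then the difference to the base model
average = sum(scores.values()) / len(scores)
gain = average - BASE_MODEL_AVERAGE
print(len(scores), round(average, 2), round(gain, 2))  # 36 80.01 0.97
```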
+
+ For more information, see [Model Recycling](https://ibm.github.io/model-recycling/).
added_tokens.json ADDED
@@ -0,0 +1 @@
+ {"[MASK]": 128000}
config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "_name_or_path": "./results/nli-few-shot/mnli-3c/DeBERTa-v3-base-mnli",
+   "architectures": [
+     "DebertaV2ForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "entailment",
+     "1": "neutral",
+     "2": "contradiction"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "contradiction": 2,
+     "entailment": 0,
+     "neutral": 1
+   },
+   "layer_norm_eps": 1e-07,
+   "max_position_embeddings": 512,
+   "max_relative_positions": -1,
+   "model_type": "deberta-v2",
+   "norm_rel_ebd": "layer_norm",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "pooler_dropout": 0,
+   "pooler_hidden_act": "gelu",
+   "pooler_hidden_size": 768,
+   "pos_att_type": [
+     "p2c",
+     "c2p"
+   ],
+   "position_biased_input": false,
+   "position_buckets": 256,
+   "relative_attention": true,
+   "share_att_key": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.11.0",
+   "type_vocab_size": 0,
+   "vocab_size": 128100
+ }
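Two details of this config matter when using the model. First, the `id2label` order (0 = entailment, 1 = neutral, 2 = contradiction) is the order the README's `label_names` list must follow. Second, the attention geometry is 12 heads over a 768-dimensional hidden state. The standard-library sketch below parses a hand-abridged excerpt of the config and checks both. It also checks that the `[MASK]` id from added_tokens.json falls inside the 128,100-row embedding matrix. In practice you would read the full config.json file instead of the excerpt.

```python
import json

# Hand-abridged excerpt of config.json (values copied from the file above)
CONFIG_JSON = """
{
  "hidden_size": 768,
  "num_attention_heads": 12,
  "id2label": {"0": "entailment", "1": "neutral", "2": "contradiction"},
  "label2id": {"contradiction": 2, "entailment": 0, "neutral": 1},
  "vocab_size": 128100
}
"""
ADDED_TOKENS = {"[MASK]": 128000}  # contents of added_tokens.json

config = json.loads(CONFIG_JSON)

# JSON object keys are strings, so convert the id2label keys back to integers
id2label = {int(idx): label for idx, label in config["id2label"].items()}
label_names = [id2label[i] for i in range(len(id2label))]

# label2id must be the exact inverse of id2label
assert all(config["label2id"][label] == idx for idx, label in id2label.items())

# Per-head dimension of the self-attention layers
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Every added token id must index a row of the embedding matrix
assert all(token_id < config["vocab_size"] for token_id in ADDED_TOKENS.values())

print(label_names)  # ['entailment', 'neutral', 'contradiction']
print(head_dim)     # 64
```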
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d60ea56c15d3e58e568dbfc0f2d662b5b3afec5bb0bbe2e10751a4f5a9a6efe7
+ size 737726552
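The weight files in this commit are stored as Git LFS pointers. The repository itself holds only a `version` line, the SHA-256 `oid`, and the byte `size`, while the actual file (roughly 738 MB here) lives in LFS storage. Below is a minimal sketch for checking a downloaded file against such a pointer. The function names are hypothetical, and the sketch assumes the pointer consists of simple `key value` lines like the ones shown here.

```python
import hashlib
import os
import tempfile
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer: Dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's byte size and SHA-256 digest."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256" or os.path.getsize(path) != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in chunks so large weight files need not fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# The model.safetensors pointer from this commit
SAFETENSORS_POINTER = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:d60ea56c15d3e58e568dbfc0f2d662b5b3afec5bb0bbe2e10751a4f5a9a6efe7
size 737726552
""")

# Self-contained demo: a small temporary file and a pointer built for it
payload = b"example weights"
demo_pointer = {
    "version": "https://git-lfs.github.com/spec/v1",
    "oid": "sha256:" + hashlib.sha256(payload).hexdigest(),
    "size": str(len(payload)),
}
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_ok = matches_pointer(tmp.name, demo_pointer)
os.remove(tmp.name)
print(demo_ok)  # True
```

Checking the size first is a cheap way to reject truncated downloads before hashing hundreds of megabytes.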
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb31d72b63a8c2996d987db725d4cce57f589959ab4668938a3e0c6a5dd16470
+ size 737784811
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": "[CLS]", "eos_token": "[SEP]", "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": false, "bos_token": "[CLS]", "eos_token": "[SEP]", "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "split_by_punct": false, "sp_model_kwargs": {}, "vocab_type": "spm", "model_max_length": 512, "special_tokens_map_file": null, "tokenizer_file": null, "name_or_path": "microsoft/deberta-v3-base", "tokenizer_class": "DebertaV2Tokenizer"}
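Together, tokenizer_config.json and special_tokens_map.json define how the tokenizer frames a premise-hypothesis pair. `[CLS]` opens the sequence, `[SEP]` closes each segment, and `model_max_length` caps the total at 512 tokens. The pure-Python sketch below illustrates that layout. It also applies a simplified "longest first" truncation, which trims the longer segment one token at a time, in the spirit of the README's `truncation=True` call. It is not the library's exact implementation. It works on already-split token strings. The whitespace split in the example is only a stand-in for the real SentencePiece model (spm.model), and `frame_pair` is a hypothetical helper.

```python
from typing import List, Tuple

# Special tokens from special_tokens_map.json and the limit from tokenizer_config.json
CLS, SEP = "[CLS]", "[SEP]"
MODEL_MAX_LENGTH = 512


def frame_pair(premise: List[str], hypothesis: List[str],
               max_length: int = MODEL_MAX_LENGTH) -> Tuple[List[str], List[int]]:
    """Lay out [CLS] premise [SEP] hypothesis [SEP], trimming the longer segment first.

    Returns the token sequence and a matching attention mask (all ones, since nothing is padded).
    """
    if max_length < 3:
        raise ValueError("max_length must leave room for [CLS] and two [SEP] tokens")
    premise, hypothesis = list(premise), list(hypothesis)
    budget = max_length - 3  # room left after [CLS] and the two [SEP] tokens
    while len(premise) + len(hypothesis) > budget:
        if len(premise) > len(hypothesis):
            premise.pop()
        else:
            hypothesis.pop()
    tokens = [CLS, *premise, SEP, *hypothesis, SEP]
    return tokens, [1] * len(tokens)


# Whitespace split as a stand-in for SentencePiece tokenization
premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing.".split()
hypothesis = "The movie was good.".split()

tokens, mask = frame_pair(premise, hypothesis)
print(tokens[0], tokens[-1], len(tokens))  # [CLS] [SEP] 23

short, _ = frame_pair(premise, hypothesis, max_length=12)
print(short)
```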