tasksource/ModernBERT-large-nli

@@ -1,199 +1,190 @@
 ---
 library_name: transformers
-tags: []
 ---
 # Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+base_model:
+- answerdotai/ModernBERT-large
+license: apache-2.0
+language:
+- en
+pipeline_tag: zero-shot-classification
+datasets:
+- nyu-mll/glue
+- facebook/anli
+tags:
+- instruct
+- natural-language-inference
+- nli
 ---
 # Model Card for Model ID
+ModernBERT-large multi-task fine-tuned on tasksource NLI tasks, including MNLI, ANLI, SICK, WANLI, doc-nli, LingNLI, FOLIO, FOL-NLI, LogicNLI, Label-NLI, and all the datasets in the table below.
+This is the equivalent of an "instruct" version.
+The model was trained for 200k steps on an NVIDIA A30 GPU.
+It performs well on reasoning tasks (better than Llama 3.1 8B Instruct on ANLI and FOLIO), long-context reasoning, sentiment analysis, and zero-shot classification with new labels.
+| test_name                             |   test_accuracy |
+|:--------------------------------------|----------------:|
+| glue/mnli                             |            0.89 |
+| glue/qnli                             |            0.96 |
+| glue/rte                              |            0.91 |
+| glue/wnli                             |            0.64 |
+| glue/mrpc                             |            0.81 |
+| glue/qqp                              |            0.87 |
+| super_glue/boolq                      |            0.66 |
+| super_glue/cb                         |            0.86 |
+| super_glue/multirc                    |            0.9  |
+| super_glue/wic                        |            0.71 |
+| super_glue/axg                        |            1    |
+| anli/a1                               |            0.72 |
+| anli/a2                               |            0.54 |
+| anli/a3                               |            0.55 |
+| sick/label                            |            0.91 |
+| sick/entailment_AB                    |            0.93 |
+| snli                                  |            0.94 |
+| scitail/snli_format                   |            0.95 |
+| hans                                  |            1    |
+| WANLI                                 |            0.77 |
+| recast/recast_ner                     |            0.85 |
+| recast/recast_sentiment               |            0.97 |
+| recast/recast_verbnet                 |            0.89 |
+| recast/recast_megaveridicality        |            0.87 |
+| recast/recast_verbcorner              |            0.87 |
+| recast/recast_kg_relations            |            0.9  |
+| recast/recast_factuality              |            0.95 |
+| recast/recast_puns                    |            0.98 |
+| probability_words_nli/reasoning_1hop  |            1    |
+| probability_words_nli/usnli           |            0.79 |
+| probability_words_nli/reasoning_2hop  |            0.98 |
+| nan-nli                               |            0.85 |
+| nli_fever                             |            0.78 |
+| breaking_nli                          |            0.99 |
+| conj_nli                              |            0.72 |
+| fracas                                |            0.79 |
+| dialogue_nli                          |            0.94 |
+| mpe                                   |            0.75 |
+| dnc                                   |            0.91 |
+| recast_white/fnplus                   |            0.76 |
+| recast_white/sprl                     |            0.9  |
+| recast_white/dpr                      |            0.84 |
+| add_one_rte                           |            0.94 |
+| paws/labeled_final                    |            0.96 |
+| glue/cola                             |            0.87 |
+| glue/sst2                             |            0.96 |
+| pragmeval/pdtb                        |            0.56 |
+| lex_glue/scotus                       |            0.58 |
+| lex_glue/ledgar                       |            0.85 |
+| dynasent/dynabench.dynasent.r1.all/r1 |            0.83 |
+| dynasent/dynabench.dynasent.r2.all/r2 |            0.76 |
+| cycic_classification                  |            0.96 |
+| lingnli                               |            0.91 |
+| monotonicity-entailment               |            0.97 |
+| scinli                                |            0.88 |
+| naturallogic                          |            0.93 |
+| dynahate                              |            0.86 |
+| syntactic-augmentation-nli            |            0.94 |
+| autotnli                              |            0.92 |
+| defeasible-nli/atomic                 |            0.83 |
+| defeasible-nli/snli                   |            0.8  |
+| help-nli                              |            0.96 |
+| nli-veridicality-transitivity         |            0.99 |
+| lonli                                 |            0.99 |
+| dadc-limit-nli                        |            0.79 |
+| folio                                 |            0.71 |
+| tomi-nli                              |            0.54 |
+| puzzte                                |            0.59 |
+| temporal-nli                          |            0.93 |
+| counterfactually-augmented-snli       |            0.81 |
+| cnli                                  |            0.9  |
+| boolq-natural-perturbations           |            0.72 |
+| equate                                |            0.65 |
+| logiqa-2.0-nli                        |            0.58 |
+| mindgames                             |            0.96 |
+| ConTRoL-nli                           |            0.66 |
+| logical-fallacy                       |            0.38 |
+| cladder                               |            0.89 |
+| conceptrules_v2                       |            1    |
+| zero-shot-label-nli                   |            0.79 |
+| scone                                 |            1    |
+| monli                                 |            1    |
+| SpaceNLI                              |            1    |
+| propsegment/nli                       |            0.92 |
+| FLD.v2/default                        |            0.91 |
+| FLD.v2/star                           |            0.78 |
+| SDOH-NLI                              |            0.99 |
+| scifact_entailment                    |            0.87 |
+| feasibilityQA                         |            0.79 |
+| AdjectiveScaleProbe-nli               |            1    |
+| resnli                                |            1    |
+| semantic_fragments_nli                |            1    |
+| dataset_train_nli                     |            0.95 |
+| nlgraph                               |            0.97 |
+| ruletaker                             |            0.99 |
+| PARARULE-Plus                         |            1    |
+| logical-entailment                    |            0.93 |
+| nope                                  |            0.56 |
+| LogicNLI                              |            0.91 |
+| contract-nli/contractnli_a/seg        |            0.88 |
+| contract-nli/contractnli_b/full       |            0.84 |
+| nli4ct_semeval2024                    |            0.72 |
+| biosift-nli                           |            0.92 |
+| SIGA-nli                              |            0.57 |
+| FOL-nli                               |            0.79 |
+| doc-nli                               |            0.81 |
+| mctest-nli                            |            0.92 |
+| natural-language-satisfiability       |            0.92 |
+| idioms-nli                            |            0.83 |
+| lifecycle-entailment                  |            0.79 |
+| MSciNLI                               |            0.84 |
+| hover-3way/nli                        |            0.92 |
+| seahorse_summarization_evaluation     |            0.81 |
+| missing-item-prediction/contrastive   |            0.88 |
+| Pol_NLI                               |            0.93 |
+| synthetic-retrieval-NLI/count         |            0.72 |
+| synthetic-retrieval-NLI/position      |            0.9  |
+| synthetic-retrieval-NLI/binary        |            0.92 |
+| babi_nli                              |            0.98 |
+# Usage
+## [ZS] Zero-shot classification pipeline
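+```python
+from transformers import pipeline
+# Load the checkpoint into the zero-shot classification pipeline.
+classifier = pipeline("zero-shot-classification", model="tasksource/ModernBERT-large-nli")
+text = "one day I will see the world"
+candidate_labels = ["travel", "cooking", "dancing"]
+# The result contains the candidate labels with their scores.
+print(classifier(text, candidate_labels))
+```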
+The NLI training data for this model includes [label-nli](https://huggingface.co/datasets/tasksource/zero-shot-label-nli), an NLI dataset specifically constructed to improve this kind of zero-shot classification.
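To make the link between NLI and zero-shot classification concrete, here is a minimal sketch of how candidate labels become hypotheses and how entailment logits become label scores in single-label mode. The helper names (`build_hypotheses`, `rank_labels`, `classify`) are hypothetical and the logits are made up for illustration; only the `hypothesis_template` and `multi_label` arguments and the default template string come from the transformers zero-shot pipeline.

```python
import math

# Default hypothesis template of the transformers zero-shot pipeline.
DEFAULT_TEMPLATE = "This example is {}."


def build_hypotheses(labels, template=DEFAULT_TEMPLATE):
    """Turn each candidate label into an NLI hypothesis string."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(labels, entailment_logits):
    """Single-label mode: normalize entailment logits across labels, best first."""
    scores = softmax(entailment_logits)
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


def classify(text, labels, template=DEFAULT_TEMPLATE, multi_label=False):
    """Run the actual pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so the helpers above need no extra dependency.
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification", model="tasksource/ModernBERT-large-nli"
    )
    return classifier(
        text, labels, hypothesis_template=template, multi_label=multi_label
    )


# Made-up logits, for illustration only (not model output).
labels = ["travel", "cooking", "dancing"]
ranking = rank_labels(labels, [2.0, -1.0, 0.0])
```

A custom template such as `"This text is about {}."` (an assumed example, not taken from the model card) can be passed to `classify` when the default wording does not fit the labels, and `multi_label=True` scores each label independently instead of normalizing across them.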
+## [NLI] Natural language inference pipeline
+```python
+from transformers import pipeline
+pipe = pipeline("text-classification", model="tasksource/ModernBERT-large-nli")
+# Each input pairs a premise (text) with a hypothesis (text_pair).
+print(pipe([dict(text="there is a cat", text_pair="there is a black cat")]))
+```
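By default the text-classification pipeline returns only the top label for each pair. The sketch below, under stated assumptions, shows one way to batch several premise/hypothesis pairs and request every class score with `top_k=None`. The helper names are hypothetical, and the label names in the sample scores are assumed for illustration; the real names come from the checkpoint's configuration.

```python
def make_pairs(premises, hypotheses):
    """Pair each premise with its hypothesis in the dict format the pipeline accepts."""
    if len(premises) != len(hypotheses):
        raise ValueError("premises and hypotheses must have the same length")
    return [{"text": p, "text_pair": h} for p, h in zip(premises, hypotheses)]


def best_label(class_scores):
    """Pick the highest-scoring entry from a list of {'label', 'score'} dicts."""
    return max(class_scores, key=lambda item: item["score"])["label"]


def predict_all_scores(pairs):
    """Run the actual pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so the helpers above need no extra dependency.
    from transformers import pipeline

    pipe = pipeline("text-classification", model="tasksource/ModernBERT-large-nli")
    # top_k=None asks for every class score instead of only the top one.
    return pipe(pairs, top_k=None)


pairs = make_pairs(["there is a cat"], ["there is a black cat"])

# Made-up scores with assumed label names, for illustration only (not model output).
example_scores = [
    {"label": "neutral", "score": 0.7},
    {"label": "entailment", "score": 0.2},
    {"label": "contradiction", "score": 0.1},
]
```

Calling `predict_all_scores(pairs)` should yield one list of label/score dicts per input pair, which `best_label` can then reduce to a single prediction.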
+## Backbone for further fine-tuning
+This checkpoint has stronger reasoning and fine-grained abilities than the base ModernBERT-large model, so it can serve as a starting point for further fine-tuning.
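As a rough illustration of reusing the checkpoint for a new classification task, the sketch below loads it with a fresh label set. The helper names and the two-label sentiment example are hypothetical; `ignore_mismatched_sizes=True` is passed on the assumption that the new label count differs from the NLI head, in which case the classification head is reinitialized and must be trained.

```python
def build_label_maps(label_names):
    """Create the id2label / label2id mappings stored in the model config."""
    id2label = {i: name for i, name in enumerate(label_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_for_finetuning(label_names, checkpoint="tasksource/ModernBERT-large-nli"):
    """Load tokenizer and model with a classification head sized for label_names."""
    # Imported lazily so build_label_maps needs no extra dependency.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(label_names)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(label_names),
        id2label=id2label,
        label2id=label2id,
        # Allow the head to be resized when the label count differs.
        ignore_mismatched_sizes=True,
    )
    return tokenizer, model


# Hypothetical target task with two labels.
id2label, label2id = build_label_maps(["negative", "positive"])
```

The returned tokenizer and model can then be passed to a standard training loop or to the transformers `Trainer`.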
+# Citation
+```
+@inproceedings{sileo-2024-tasksource,
+    title = "tasksource: A Large Collection of {NLP} tasks with a Structured Dataset Preprocessing Framework",
+    author = "Sileo, Damien",
+    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
+    month = may,
+    year = "2024",
+    address = "Torino, Italia",
+    publisher = "ELRA and ICCL",
+    url = "https://aclanthology.org/2024.lrec-main.1361",
+    pages = "15655--15684",
+}
+```