---
license: mit
datasets:
- CodeIsAbstract/reasoning_dataset
language:
- en
metrics:
- accuracy
- f1
- recall
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
---
# Model Card for ReasoningTextClassifier
This model is a text classification model that identifies whether a given text expresses reasoning or not. It classifies text into two categories: "reasoning" (label 1) and "non-reasoning" (label 0).
## Model Details
### Model Description
This model is designed to classify text based on the presence of reasoning. It has been trained on the @CodeIsAbstract/reasoning_dataset, a dataset specifically created for this task. The model is intended to distinguish between text that presents logical arguments, explanations, or justifications (reasoning) and text that does not (non-reasoning).
- **Developed by:** Samarth Pusalkar
- **Shared by:** CodeIsAbstract
- **Model type:** Transformer-based text classification model (`BertForSequenceClassification`)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

### Model Sources

- **Repository:** [https://huggingface.co/CodeIsAbstract/ReasoningTextClassifier](https://huggingface.co/CodeIsAbstract/ReasoningTextClassifier)
- **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
NOTE: Calling this a "reasoning classification" model can be ambiguous: users may find that it does not classify a math problem solved step by step as reasoning.
Rather, the model is inclined towards detecting reasoning patterns in the language of the text, specifically the reasoning and thinking patterns of LLMs such as DeepSeek and Gemini, reflecting the bias of its training data.
## Uses
### Direct Use
The primary direct use of this model is to classify English text as either expressing reasoning (label 1) or not (label 0). Researchers, educators, and content analysts can use this model to automatically identify and categorize text based on the presence of reasoning.
This model can be used to score an LLM's output as reasoning or non-reasoning, for example as a signal when training a model to produce reasoning-like output; a sketch of this usage follows.
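As an illustration, the classifier's prediction can be converted into a scalar "reasoning score" per generation. The sketch below is a minimal, hypothetical example: the example texts are made up, and the assumption that label 1 corresponds to "reasoning" should be verified against the model's `config.id2label` mapping.
```
# Minimal sketch of scoring LLM outputs with the classifier.
# The label-to-class mapping is an assumption; check config.id2label.
from transformers import pipeline

classifier = pipeline("text-classification", model="CodeIsAbstract/ReasoningTextClassifier")

# Hypothetical LLM outputs to score; illustrative strings only.
llm_outputs = [
    "First, I need to figure out what the user is asking, then plan the steps...",
    "The capital of France is Paris.",
]

def reasoning_score(text: str) -> float:
    """Confidence that `text` exhibits reasoning, assuming label 1 = reasoning."""
    result = classifier(text, truncation=True)[0]
    # Label naming is an assumption; verify against model.config.id2label.
    is_reasoning = result["label"].endswith("1") or result["label"].lower() == "reasoning"
    return result["score"] if is_reasoning else 1.0 - result["score"]

print([reasoning_score(t) for t in llm_outputs])
```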
### Out-of-Scope Use
This model is intended for classifying English text. Its performance on other languages is not guaranteed. Misuse and out-of-scope scenarios include:
- **High-stakes decision making:** The model's output should not be used as the sole basis for critical decisions, especially in contexts where incorrect reasoning detection could have significant negative consequences (e.g., legal or medical domains), without careful validation and human oversight.
- **Detecting specific types of reasoning:** The model is trained on a general dataset and may not be optimized for detecting specific types or nuances of reasoning (e.g., causal reasoning, deductive reasoning, etc.).
- **Bias amplification:** If the training dataset contains biases, the model may perpetuate or amplify them in its predictions. Users should be aware of potential biases in the model's output, especially when it is used on text from underrepresented groups or sensitive topics.
- **Content generation:** This model is designed for classification, not text generation. It should not be used to generate text that is supposed to exhibit reasoning.
- **Use on non-textual data:** The model is specifically designed for text and should not be applied to other data types such as images or audio.
## Bias, Risks, and Limitations
The model's performance is subject to several limitations:
- **Dataset bias:** The inherent biases of the CodeIsAbstract/reasoning_dataset may be reflected in the model. The dataset's characteristics and potential biases should be further investigated in the dataset card.
- **Generalization:** The model's ability to generalize to text significantly different from the training data is not fully evaluated and may be limited.
- **Ambiguity of "reasoning":** The concept of "reasoning" can be subjective and context-dependent. The model's definition of reasoning is based on the dataset labels and may not align perfectly with all users' interpretations.
- **Technical limitations:** As a machine learning model, it is not guaranteed to be perfectly accurate. Errors in classification are possible, particularly with complex or nuanced text.
- **Evaluation limitations:** The evaluation metrics and testing data used to assess the model's performance may not fully capture its real-world effectiveness across all use cases. Further evaluation on diverse datasets and in different application scenarios is recommended.
### Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
- **Dataset understanding:** Users are strongly encouraged to examine the CodeIsAbstract/reasoning_dataset dataset card to understand the data the model was trained on and potential biases within it.
- **Careful evaluation:** Before deploying the model in any application, especially in sensitive contexts, thorough evaluation on data relevant to the specific use case is essential. Consider evaluating on diverse datasets and with metrics beyond standard accuracy.
- **Human oversight:** In applications where accuracy is critical, human review of the model's predictions is highly recommended. The model should be used as a tool to augment, not replace, human judgment.
- **Bias mitigation:** Explore techniques for bias detection and mitigation if the model is to be used in contexts where fairness is a concern.
- **Transparency:** When using this model, clearly communicate its limitations and potential biases to end-users.
## How to Get Started with the Model
Use the code below to get started with the model.
```
from transformers import pipeline

classifier = pipeline("text-classification", model="CodeIsAbstract/ReasoningTextClassifier")

# Example text exhibiting reasoning (an LLM-style "thinking" trace)
text_example = """The user wants a detailed reasoning for each step in solving the equation x+3=5. I need to break down the process into smaller steps and explain the logic behind each manipulation.
Here's a plan:
Start with the given equation: State the equation we are trying to solve.
Identify the goal: Explain that the goal is to isolate 'x' to find its value.
Recognize the operation with x: Identify that 3 is being added to x.
Determine the inverse operation: Explain that to isolate x, we need to perform the inverse operation of addition, which"""

output = classifier(text_example)
print(output)

# Example text without reasoning. This is an extreme case: the text looks like
# step-by-step math, but it lacks the thinking patterns of LLM reasoning traces.
text_example_non_reasoning = """.., we need to isolate x on one side of the equation.
to help user evaluate the output
The given equation is:
x + 3 = 5
To isolate x, we need to remove the +3 from the left side of the equation.
We can do this by subtracting 3 from both sides of the equation to maintain the equality."""

output_non_reasoning = classifier(text_example_non_reasoning)
print(output_non_reasoning)
```
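The pipeline returns one dictionary per input, of the form `[{'label': ..., 'score': ...}]`; the label string comes from the model's `config.id2label` mapping and the score is the probability of the predicted class.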
## Training Details
### Training Data
[CodeIsAbstract/reasoning_dataset](https://huggingface.co/datasets/CodeIsAbstract/reasoning_dataset), train split.
The training data is specifically designed to help classify text as reasoning (1) or non-reasoning (0), and is a derivative of the [Dolphin R1 dataset](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1).
### Training Procedure
The training procedure was the default Hugging Face `Trainer` loop with `BertForSequenceClassification`; a minimal sketch of the setup is shown after the hyperparameter list below.
#### Training Hyperparameters
- **Training regime:** fp16 mixed precision
- **model_architecture:** ModernBERT-base
- **learning_rate:** 3e-5
- **per_device_train_batch_size:** 704
- **per_device_eval_batch_size:** 512
- **num_train_epochs:** 2
- **gradient_accumulation_steps:** 4
- **dataloader_num_workers:** 4
- **weight_decay:** 0.001
- **warmup_ratio:** 0.03
- **logging_steps:** 50
- **evaluation_strategy:** steps
- **eval_steps:** 100
- **save_strategy:** steps
- **save_steps:** 200
- **load_best_model_at_end:** True
- **metric_for_best_model:** eval_loss
- **gradient_checkpointing:** True
- **fp16:** True
- **torch.backends.cudnn.benchmark:** True (enable cuDNN auto-tuner)
- **torch.backends.cuda.matmul.allow_tf32:** True (allow TF32 on Ampere)
- **torch.backends.cudnn.allow_tf32:** True
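A minimal sketch of how these hyperparameters could be wired into `TrainingArguments` and `Trainer` is shown below. It is a reconstruction from the values above, not the exact training script; the dataset column names, output directory, and tokenization details are assumptions.
```
# Hedged sketch of the training setup, reconstructed from the hyperparameters
# above. Dataset column names ("text", "label") are assumptions.
import torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

torch.backends.cudnn.benchmark = True         # enable cuDNN auto-tuner
torch.backends.cuda.matmul.allow_tf32 = True  # allow TF32 matmuls on Ampere
torch.backends.cudnn.allow_tf32 = True

dataset = load_dataset("CodeIsAbstract/reasoning_dataset")
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=2
)

def tokenize(batch):
    # The "text" column name is an assumption; adjust to the dataset's schema.
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="reasoning-classifier",  # hypothetical output path
    learning_rate=3e-5,
    per_device_train_batch_size=704,
    per_device_eval_batch_size=512,
    num_train_epochs=2,
    gradient_accumulation_steps=4,
    dataloader_num_workers=4,
    weight_decay=0.001,
    warmup_ratio=0.03,
    logging_steps=50,
    eval_strategy="steps",  # "evaluation_strategy" on older transformers versions
    eval_steps=100,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    gradient_checkpointing=True,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    processing_class=tokenizer,  # "tokenizer=" on older transformers versions
)
trainer.train()
```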
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The testing data comes from the same distribution as the training set: the test split of [CodeIsAbstract/reasoning_dataset](https://huggingface.co/datasets/CodeIsAbstract/reasoning_dataset).
Testing dataset size: ~165k samples.
#### Metrics
Tested on the test split of [CodeIsAbstract/reasoning_dataset](https://huggingface.co/datasets/CodeIsAbstract/reasoning_dataset):
- **eval_loss:** 0.003581336699426174
- **eval_model_preparation_time:** 0.0048
- **eval_accuracy:** 0.9991756576554733
- **eval_precision:** 0.9991760105961167
- **eval_recall:** 0.9991756576554733
- **eval_f1:** 0.9991756643183358
- **eval_runtime:** 447.9271
- **eval_samples_per_second:** 368.319
- **eval_steps_per_second:** 0.721
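The accuracy, precision, recall, and F1 values above are the kind of figures a `compute_metrics` callback would report during evaluation; a plausible reconstruction using scikit-learn is sketched below. The "weighted" averaging mode is an assumption, not confirmed by the training script.
```
# Hedged sketch of a compute_metrics callback that could produce the metrics
# above. The "weighted" averaging mode is an assumption.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```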
### Results
The model is able to classify the test set samples with near 100% accuracy.
## Environmental Impact
- **Hardware Type:** L40S
- **Hours used:** 6 hours
- **Cloud Provider:** Lightning-AI
- **Compute Region:** N.A.
- **Carbon Emitted:** N.A.
## Technical Specifications
### Model Architecture and Objective
A BERT-derived encoder (ModernBERT-base) with a sequence classification head (`BertForSequenceClassification`), fine-tuned for binary classification (reasoning vs. non-reasoning).
### Compute Infrastructure
#### Hardware
Trained on an L40S GPU for 6 hours; 2 epochs completed over the entire training set of ~1.04M texts.
#### Software
Trained with the Hugging Face `transformers` library: `BertForSequenceClassification` with a cross-entropy loss.
## Citation
**BibTeX:**
```
@misc{pusalkar2025reasoningclf,
  title={Sequence Classifier for Classifying Reasoning Text},
  author={Samarth Pusalkar},
  year={2025}
}
```
Citation for the base model:
```
@misc{modernbert,
title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
year={2024},
eprint={2412.13663},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13663},
}
```
## Model Card Authors
Samarth Pusalkar
## Model Card Contact
[[email protected]]([email protected]) |