Model Card for ReasoningTextClassifier

This model is a text classification model that identifies whether a given text expresses reasoning or not. It classifies text into two categories: "reasoning" (label 1) and "non-reasoning" (label 0).

Model Details

Model Description

This model is designed to classify text based on the presence of reasoning. It has been trained on the @CodeIsAbstract/reasoning_dataset, a dataset specifically created for this task. The model is intended to distinguish between text that presents logical arguments, explanations, or justifications (reasoning) and text that does not (non-reasoning).

  • Developed by: Samarth Pusalkar
  • Shared by: CodeIsAbstract
  • Model type: Transformer-based text classification model (BertForSequenceClassification)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: answerdotai/ModernBERT-base

Model Sources

  • Repository: https://huggingface.co/CodeIsAbstract/ReasoningTextClassifier

NOTE: Calling this a "reasoning classification" model can be ambiguous. Users may find that the model does not classify a math problem solved step by step as reasoning. Rather, the model is inclined towards detecting reasoning patterns in natural-language text, specifically the reasoning and thinking patterns of LLMs such as DeepSeek and Gemini, which reflects the bias of its training data.

Uses

Direct Use

The primary direct use of this model is to classify English text as either expressing reasoning (label 1) or not (label 0). Researchers, educators, and content analysts can use it to automatically identify and categorize text based on the presence of reasoning. It can also be used to score an LLM's output as reasoning or non-reasoning, potentially allowing that LLM to learn to produce reasoning-like output.
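As a rough sketch of the scoring use case, the snippet below ranks candidate LLM outputs by the classifier's confidence that they are reasoning. The helper name `rank_by_reasoning`, the label name `LABEL_1`, and the stub classifier are assumptions made for illustration; check the model's config for the actual label names, and pass the real `pipeline` object in place of the stub. It also assumes a two-class softmax output, so the score of the other class is one minus the reported score.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Dict[str, object]  # e.g. {"label": "LABEL_1", "score": 0.98}


def rank_by_reasoning(
    classify: Callable[[List[str]], List[Prediction]],
    texts: List[str],
    positive_label: str = "LABEL_1",  # assumed name of the "reasoning" class
) -> List[Tuple[str, float]]:
    """Return (text, P(reasoning)) pairs, highest score first."""
    predictions = classify(texts)
    scored = []
    for text, pred in zip(texts, predictions):
        score = float(pred["score"])
        # By default the pipeline reports only the predicted label's score,
        # so flip it when the predicted label is the non-reasoning class.
        if pred["label"] != positive_label:
            score = 1.0 - score
        scored.append((text, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def stub_classifier(texts: List[str]) -> List[Prediction]:
    """Hypothetical stand-in for the real pipeline, used only for this demo."""
    return [
        {"label": "LABEL_1", "score": 0.9} if "I need to" in t
        else {"label": "LABEL_0", "score": 0.8}
        for t in texts
    ]


candidates = ["The answer is 2.", "I need to isolate x first."]
ranked = rank_by_reasoning(stub_classifier, candidates)
print(ranked)
```

With the real model, `classify` would be the object returned by `pipeline("text-classification", model="CodeIsAbstract/ReasoningTextClassifier")`, which accepts a list of strings and returns one prediction per string.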

Out-of-Scope Use

This model is intended for classifying English text. Its performance on other languages is not guaranteed. Misuse and out-of-scope scenarios include:

  • High-stakes decision making: The model's output should not be used as the sole basis for critical decisions, especially in contexts where incorrect reasoning detection could have significant negative consequences (e.g., legal or medical domains) without careful validation and human oversight.
  • Detecting specific types of reasoning: The model is trained on a general dataset and may not be optimized for detecting specific types or nuances of reasoning (e.g., causal reasoning, deductive reasoning, etc.).
  • Bias amplification: If the training dataset contains biases, the model may perpetuate or amplify these biases in its predictions. Users should be aware of potential biases in the model's output, especially when used on text from underrepresented groups or sensitive topics.
  • Content generation: This model is designed for classification and not for generating text. It should not be used for generating text that is supposed to exhibit reasoning.
  • Use on non-textual data: The model is specifically designed for text and should not be applied to other data types such as images or audio.

Bias, Risks, and Limitations

The model's performance is subject to several limitations:

  • Dataset Bias: The inherent biases of the CodeIsAbstract/reasoning_dataset dataset may be reflected in the model. The dataset characteristics and potential biases should be further investigated in the dataset card.
  • Generalization: The model's ability to generalize to text significantly different from the training data is not fully evaluated and may be limited.
  • Ambiguity of "Reasoning": The concept of "reasoning" can be subjective and context-dependent. The model's definition of reasoning is based on the dataset labels and may not align perfectly with all users' interpretations.
  • Technical limitations: As a machine learning model, it is not guaranteed to be perfectly accurate. Errors in classification are possible, particularly with complex or nuanced text.
  • Evaluation Limitations: The evaluation metrics and testing data used to assess the model's performance may not fully capture its real-world effectiveness across all use cases. Further evaluation on diverse datasets and in different application scenarios is recommended.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

  • Dataset Understanding: Users are strongly encouraged to examine the CodeIsAbstract/reasoning_dataset dataset card to understand the data the model was trained on and potential biases within the data.
  • Careful Evaluation: Before deploying the model in any application, especially in sensitive contexts, thorough evaluation on data relevant to the specific use case is essential. Consider evaluating on diverse datasets and with metrics beyond standard accuracy.
  • Human Oversight: In applications where accuracy is critical, human review of the model's predictions is highly recommended. The model should be used as a tool to augment, not replace, human judgment.
  • Bias Mitigation: Explore techniques for bias detection and mitigation if the model is to be used in contexts where fairness is a concern.
  • Transparency: When using this model, clearly communicate its limitations and potential biases to end-users.
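As one possible way to look beyond accuracy when evaluating on your own data, the sketch below computes confusion counts, precision, recall and F1 with label 1 (reasoning) as the positive class. The function name `binary_report` and the toy labels are made up for illustration and are not taken from the model's evaluation.

```python
from typing import Dict, Sequence


def binary_report(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion counts and derived metrics, treating label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # reasoning, predicted reasoning
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-reasoning, predicted reasoning
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # reasoning, predicted non-reasoning
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-reasoning, predicted non-reasoning

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1,
    }


# Toy labels, invented for this example only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
report = binary_report(y_true, y_pred)
print(report)
```

On imbalanced data, accuracy can stay high while recall on the reasoning class drops, which is why the per-class counts are worth inspecting.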

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import pipeline

classifier = pipeline("text-classification", model="CodeIsAbstract/ReasoningTextClassifier")
text_example = """The user wants a detailed reasoning for each step in solving the equation x+3=5. I need to break down the process into smaller steps and explain the logic behind each manipulation.
                  Here's a plan:
                  
                  Start with the given equation: State the equation we are trying to solve.
                  Identify the goal: Explain that the goal is to isolate 'x' to find its value.
                  Recognize the operation with x: Identify that 3 is being added to x.
                  Determine the inverse operation: Explain that to isolate x, we need to perform the inverse operation of addition, which""" # Example text exhibiting reasoning
output = classifier(text_example)
print(output)

# Example text without reasoning. This is an extreme case: the text looks like
# mathematical reasoning but lacks the thinking patterns typical of LLMs.
text_example_non_reasoning = """.., we need to isolate x on one side of the equation.
                                to help user evaluate the output
                                
                                The given equation is:
                                x + 3 = 5
                                
                                To isolate x, we need to remove the +3 from the left side of the equation.
                                We can do this by subtracting 3 from both sides of the equation to maintain the equality."""
output_non_reasoning = classifier(text_example_non_reasoning)
print(output_non_reasoning)

Training Details

Training Data

The model was trained on the train split of @CodeIsAbstract/reasoning_dataset. The training data is specifically designed to help classify text as reasoning (1) or non-reasoning (0), and is a derivative of the Dolphin R1 dataset.

Training Procedure

The training procedure used the default implementation of Trainer with BertForSequenceClassification.

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • model_architecture: ModernBERT-base
  • learning_rate: 3e-5
  • per_device_train_batch_size: 704
  • per_device_eval_batch_size: 512
  • num_train_epochs: 2
  • gradient_accumulation_steps: 4
  • dataloader_num_workers: 4
  • weight_decay: 0.001
  • warmup_ratio: 0.03
  • logging_steps: 50
  • evaluation_strategy: steps
  • eval_steps: 100
  • save_strategy: steps
  • save_steps: 200
  • load_best_model_at_end: True
  • metric_for_best_model: eval_loss
  • gradient_checkpointing: True
  • fp16: True
  • torch.backends.cudnn.benchmark: True # Enable cudnn auto-tuner
  • torch.backends.cuda.matmul.allow_tf32: True # Allow TF32 on Ampere
  • torch.backends.cudnn.allow_tf32: True
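The batch-size settings above combine into a larger effective batch per optimizer step. The sketch below works through that arithmetic. It assumes a single GPU (`num_devices = 1`) and uses the rounded training-set size of 1.04M texts given elsewhere in this card, so the step counts are estimates. Trainer's exact count may differ slightly depending on how it rounds.

```python
import math

# Values taken from the hyperparameter list.
per_device_train_batch_size = 704
gradient_accumulation_steps = 4
num_train_epochs = 2
warmup_ratio = 0.03

# Assumptions for this sketch (not confirmed by the list above).
num_devices = 1            # assumed: one L40S GPU
train_samples = 1_040_000  # rounded "1.04M" training-set size

# Samples consumed per optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Approximate number of optimizer updates.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Approximate number of learning-rate warmup steps.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"effective batch size: {effective_batch_size}")
print(f"approx. optimizer steps: {total_steps} ({steps_per_epoch} per epoch)")
print(f"approx. warmup steps: {warmup_steps}")
```

Under these assumptions, eval_steps of 100 and save_steps of 200 translate to only a handful of evaluations and checkpoints over the whole run.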

Evaluation

Testing Data, Factors & Metrics

Testing Data

The testing data is the test split of @CodeIsAbstract/reasoning_dataset and comes from the same distribution as the training set. Test set size: 165k samples.

Metrics

Tested on the test split of @CodeIsAbstract/reasoning_dataset:

  • eval_loss: 0.003581336699426174
  • eval_model_preparation_time: 0.0048
  • eval_accuracy: 0.9991756576554733
  • eval_precision: 0.9991760105961167
  • eval_recall: 0.9991756576554733
  • eval_f1: 0.9991756643183358
  • eval_runtime: 447.9271
  • eval_samples_per_second: 368.319
  • eval_steps_per_second: 0.721

Results

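The model classifies the test set samples with about 99.92% accuracy (see eval_accuracy above). Because the test set comes from the same distribution as the training set, this figure may not carry over to other kinds of text.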
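To put the accuracy figure in concrete terms, the sketch below back-calculates the approximate number of evaluated samples and misclassifications from the reported eval_* values, and checks that the step count is consistent with the listed per_device_eval_batch_size of 512. The variable names are illustrative, and the results are estimates derived from rounded metrics.

```python
import math

# Reported evaluation metrics.
eval_accuracy = 0.9991756576554733
eval_runtime = 447.9271            # seconds
eval_samples_per_second = 368.319
eval_steps_per_second = 0.721
per_device_eval_batch_size = 512   # from the hyperparameter list

# Samples evaluated = runtime * throughput (should be close to "165k").
n_samples = round(eval_runtime * eval_samples_per_second)

# Misclassified samples implied by the accuracy.
n_errors = round(n_samples * (1 - eval_accuracy))

# Evaluation batches implied by the runtime, versus the batch size setting.
n_batches_from_runtime = round(eval_runtime * eval_steps_per_second)
n_batches_from_batch_size = math.ceil(n_samples / per_device_eval_batch_size)

print(f"samples evaluated: ~{n_samples}")
print(f"misclassified: ~{n_errors}")
print(f"batches: {n_batches_from_runtime} vs {n_batches_from_batch_size}")
```

This works out to roughly 136 misclassified texts out of about 164,980, which matches the stated test set size of 165k.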

Environmental Impact

  • Hardware Type: L40S
  • Hours used: 6
  • Cloud Provider: Lightning-AI
  • Compute Region: N.A.
  • Carbon Emitted: N.A.

Technical Specifications

Model Architecture and Objective

The architecture is derived from BERT and uses BertForSequenceClassification, i.e. an encoder with a sequence classification head.

Compute Infrastructure

Hardware

Trained on an L40S GPU for 6 hours. 2 epochs were completed over the entire training set of 1.04M texts.

Software

Trained with the Hugging Face transformers library, using BertForSequenceClassification with a cross-entropy loss.
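For readers unfamiliar with the objective, here is a minimal pure-Python sketch of cross-entropy for a single two-class example. It illustrates the formula only and is not the library's implementation. The function name and the example logits are made up.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy for one example: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    return log_sum_exp - logits[target]


# Index 0 = non-reasoning, index 1 = reasoning.
uncertain = cross_entropy([0.0, 0.0], target=1)         # equal logits
confident_right = cross_entropy([-4.0, 4.0], target=1)  # strongly favours the true class
confident_wrong = cross_entropy([-4.0, 4.0], target=0)  # strongly favours the wrong class

print(uncertain, confident_right, confident_wrong)
```

The loss is near zero when the model is confidently correct, equals ln 2 when the two classes are tied, and grows roughly linearly with the logit gap when the model is confidently wrong.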

Citation

BibTeX:

@misc{pusalkar2025reasoningclf,
      title={Sequence Classifier for classifying reasoning dataset}, 
      author={Samarth Pusalkar},
      year={2025}
}

Taken from base model:

@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}

Model Card Authors

Samarth Pusalkar

Model Card Contact

[email protected]
