---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- text-classification
- ai-content-detection
- bert
- transformers
- generated_from_trainer
model-index:
- name: answerdotai-ModernBERT-base-ai-detector
  results: []
---

# answerdotai-ModernBERT-base-ai-detector

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the **AI vs Human Text Classification dataset**.
It achieves the following results on the evaluation set:
- **Validation Loss:** `0.0036`

---

## **Model Description**

This model is based on **ModernBERT-base**, a lightweight and efficient BERT-based model.
It has been fine-tuned to classify text as **AI-generated vs human-written**, distinguishing text produced by **AI models (ChatGPT, DeepSeek, Claude, etc.)** from text written by human authors.

---

## **Intended Uses & Limitations**

### **Intended Uses**
- **AI-generated content detection** (e.g., ChatGPT, Claude, DeepSeek).
- **Text classification** that distinguishes human-written from AI-generated content.
- **Educational and research applications** in AI-content detection.

### **Limitations**
- **Not 100% accurate:** some AI-generated text can resemble human writing, and vice versa.
- **Limited dataset scope:** the model may struggle with **out-of-domain** text.
- **Bias risks:** if the training data contains bias, the model may inherit it.

---

## **Training and Evaluation Data**
- The model was fine-tuned on **35,894 training samples** and evaluated on **8,974 test samples**.
- The dataset combines **AI-generated samples (ChatGPT, Claude, DeepSeek, etc.)** with **human-written samples (Wikipedia, books, articles)**.
- Labels (see the config sketch below for wiring them into the checkpoint):
  - `1` → AI-generated text
  - `0` → Human-written text
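
As a minimal sketch (an assumption about setup, not necessarily how the original run was configured), this 0/1 convention can be baked into the model config at fine-tuning time so that downstream pipelines report readable label names instead of the generic `LABEL_0`/`LABEL_1`; the label strings `"human-written"` and `"ai-generated"` are illustrative choices:

```python
from transformers import AutoModelForSequenceClassification

# Hypothetical setup: load the base model for fine-tuning with the
# dataset's label convention (0 = human, 1 = AI) stored in the config.
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=2,
    id2label={0: "human-written", 1: "ai-generated"},
    label2id={"human-written": 0, "ai-generated": 1},
)
```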

---

## **Training Procedure**

### **Training Hyperparameters**
The following hyperparameters were used during training:

| Hyperparameter | Value |
|----------------|-------|
| **Learning Rate** | `2e-5` |
| **Train Batch Size** | `16` |
| **Eval Batch Size** | `16` |
| **Seed** | `42` |
| **Optimizer** | `AdamW` (`β1=0.9, β2=0.999, ε=1e-08`) |
| **LR Scheduler** | `linear` |
| **Epochs** | `3` |
| **Mixed Precision** | Native AMP (`fp16`) |
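
The snippet below sketches a `TrainingArguments` setup matching this table. It is a reconstruction, not the original training script: `output_dir` is an assumed name, and the 500-step eval/logging cadence is inferred from the results table below.

```python
from transformers import TrainingArguments

# A minimal sketch matching the hyperparameter table above.
training_args = TrainingArguments(
    output_dir="answerdotai-ModernBERT-base-ai-detector",  # assumed name
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    optim="adamw_torch",   # AdamW with default β1=0.9, β2=0.999, ε=1e-08
    seed=42,
    fp16=True,             # Native AMP mixed precision
    eval_strategy="steps",
    eval_steps=500,        # matches the 500-step cadence in the results below
    logging_steps=500,
)
```

These arguments would then be passed to a `Trainer` together with the tokenized train and eval splits.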

---

## **Training Results**

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 0.0505 | 0.22 | 500 | 0.0214 |
| 0.0114 | 0.44 | 1000 | 0.0110 |
| 0.0088 | 0.66 | 1500 | 0.0032 |
| 0.0 | 0.89 | 2000 | 0.0048 |
| 0.0068 | 1.11 | 2500 | 0.0035 |
| 0.0 | 1.33 | 3000 | 0.0040 |
| 0.0 | 1.55 | 3500 | 0.0097 |
| 0.0053 | 1.78 | 4000 | 0.0101 |
| 0.0 | 2.00 | 4500 | 0.0053 |
| 0.0 | 2.22 | 5000 | 0.0039 |
| 0.0017 | 2.45 | 5500 | 0.0046 |
| 0.0 | 2.67 | 6000 | 0.0043 |
| 0.0 | 2.89 | 6500 | 0.0036 |

---

## **Framework Versions**

| Library | Version |
|---------|---------|
| **Transformers** | `4.48.3` |
| **PyTorch** | `2.5.1+cu124` |
| **Datasets** | `3.3.2` |
| **Tokenizers** | `0.21.0` |

---

## **Model Usage**

To load and use the model for text classification:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Repo id of the fine-tuned checkpoint (adjust if it is hosted under a
# different namespace than the base model).
model_name = "answerdotai/ModernBERT-base-ai-detector"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create a text classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Run classification
text = "This text was written by an AI model like ChatGPT."
result = classifier(text)

print(result)
```
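
The pipeline returns a list of dictionaries of the form `[{"label": ..., "score": ...}]`. If the checkpoint was saved without custom label names (an assumption; see the config sketch in the data section above), the labels come back as the generic `LABEL_0`/`LABEL_1` and can be mapped onto the dataset convention like so:

```python
# Continues the example above; label_map is a hypothetical helper based on
# the dataset convention (0 = human-written, 1 = AI-generated).
label_map = {"LABEL_0": "human-written", "LABEL_1": "ai-generated"}
for pred in result:
    print(label_map.get(pred["label"], pred["label"]), round(pred["score"], 4))
```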