Plot Arc Character Classifier

A DeBERTa-v3-XSmall model fine-tuned to classify fictional characters based on their plot arc potential.

Model Description

This model classifies character descriptions into two categories:

  • STRONG (label 1): Characters with both internal conflict and external responsibilities/events
  • WEAK (label 0): Characters with no plot arc, pure internal conflict only, or pure external events only

This model corrects a bias in earlier versions that caused simple background characters (shopkeepers, guards) to be incorrectly classified as plot-significant.

Training Data

  • Dataset Size: 11,888 balanced examples (50/50 split)
  • Training Examples: 9,510
  • Validation Examples: 2,378
  • Source: Custom character descriptions from literature, annotated with a 4-way plot-arc classification

Label Mapping

  • STRONG (1): Characters classified as "BOTH" (internal conflict + external events)
  • WEAK (0): Characters classified as "NONE", "INTERNAL", or "EXTERNAL" (see the mapping sketch below)
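The binary target can be derived directly from the 4-way taxonomy. Below is a minimal sketch of that mapping; the function and argument names are illustrative and not part of the released training code.

def to_binary_label(arc_type: str) -> int:
    """Collapse the 4-way character taxonomy into the binary STRONG/WEAK target."""
    # Only characters with both internal conflict and external events count as STRONG.
    return 1 if arc_type == "BOTH" else 0  # NONE, INTERNAL, EXTERNAL -> WEAK

assert to_binary_label("BOTH") == 1
assert to_binary_label("INTERNAL") == 0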

Training Details

  • Base Model: microsoft/deberta-v3-xsmall (22M backbone parameters; ~70M total including the embedding layer)
  • Training Time: ~15 minutes
  • Batch Size: 8 (gradient accumulation steps = 2, effective batch size 16)
  • Max Sequence Length: 384 tokens
  • Learning Rate: 5e-5 with warmup
  • Early Stopping: Yes (training stopped after ~3.7 of 5 epochs); see the training sketch below
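For reference, here is a hedged training sketch that mirrors the hyperparameters listed above. It assumes a recent 🤗 Transformers and 🤗 Datasets installation; the warmup ratio, early-stopping patience, and model-selection metric are not stated in this card and are assumptions, and the tiny inline dataset merely stands in for the real 9,510/2,378 split.

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

base = "microsoft/deberta-v3-xsmall"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Tiny stand-in dataset; the real training split is not distributed with this card.
raw = Dataset.from_dict({
    "text": [
        "A king torn between duty and desire who must defend his crumbling realm.",
        "A baker who makes fresh bread daily and serves customers with a smile.",
    ],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=384)

dataset = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="plot-arc-classifier",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,      # effective batch size 16
    learning_rate=5e-5,
    warmup_ratio=0.1,                   # "with warmup"; the exact schedule is an assumption
    num_train_epochs=5,                 # early stopping ended the real run at ~3.7 epochs
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",  # selection metric is an assumption
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    eval_dataset=dataset,               # use the real held-out split in practice
    tokenizer=tokenizer,                # enables dynamic padding via DataCollatorWithPadding
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],  # patience is an assumption
)
# trainer.train()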

Performance

Validation Metrics

| Metric | Score |
|--------|-------|
| Accuracy | 79.6% |
| F1 (Strong) | 79.6% |
| Precision (Strong) | 77.7% |
| Recall (Strong) | 81.6% |
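The per-class numbers above can be reproduced with a standard compute_metrics function passed to the Trainer sketched in the Training Details section. The version below uses scikit-learn; the exact implementation behind the reported metrics is not published, so treat this as a sketch.

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Accuracy plus precision/recall/F1 for the STRONG class (label 1)."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary", pos_label=1
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision_strong": precision,
        "recall_strong": recall,
        "f1_strong": f1,
    }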

Synthetic Test Results

100% accuracy on diverse test cases including previously problematic examples:

| Character Type | Example | Prediction | Confidence |
|----------------|---------|------------|------------|
| Background (NONE) | Baker, Guard | WEAK ✅ | 98.9%, 98.5% |
| Pure Internal | Haunted Artist | WEAK ✅ | 93.9% |
| Pure External | Military Commander | WEAK ✅ | 94.5% |
| Both (Internal+External) | Conflicted King | STRONG ✅ | 95.1% |
| Both (Trauma+Mission) | PTSD Captain | STRONG ✅ | 95.5% |
| Both (Doubt+Quest) | Uncertain Prophet | STRONG ✅ | 96.0% |

Key Achievement: Fixed critical bias where simple background characters were incorrectly classified as plot-significant.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Mitchins/deberta-v3-xs-plot-arc-classifier")
model = AutoModelForSequenceClassification.from_pretrained("Mitchins/deberta-v3-xs-plot-arc-classifier")
model.eval()

# Example usage
def classify_character(description):
    inputs = tokenizer(description, return_tensors="pt", truncation=True, max_length=384)
    
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(probabilities, dim=-1).item()
    
    labels = {0: "WEAK", 1: "STRONG"}
    confidence = probabilities[0][predicted_class].item()
    
    return labels[predicted_class], confidence

# Test examples
examples = [
    "A baker who makes fresh bread daily and serves customers with a smile.",
    "A warrior haunted by past failures who must lead a desperate battle to save his homeland while confronting his inner demons.",
]

for desc in examples:
    label, conf = classify_character(desc)
    print(f"'{desc[:50]}...': {label} ({conf:.3f})")

Model Improvements

This model addresses critical issues from previous versions:

  1. Fixed Bias: No longer classifies simple background characters as STRONG
  2. Proper Discrimination: Requires both internal and external elements for STRONG classification
  3. Balanced Training: 50/50 split prevents class imbalance issues
  4. Clean Taxonomy: Based on proper 4-way character analysis

Limitations

  • Trained on English literary character descriptions
  • May not generalize well to other domains (screenwriting, gaming, etc.)
  • Performance may degrade on very short or very long descriptions
  • Cultural bias toward Western narrative structures

Ethical Considerations

This model is designed for narrative analysis and creative writing assistance. It should not be used to make judgments about real people or for any discriminatory purposes.

Citation

If you use this model, please cite:

@misc{plot-arc-classifier-2024,
  title={Plot Arc Character Classifier},
  author={Generated with Claude Code},
  year={2024},
  url={https://huggingface.co/Mitchins/deberta-v3-xs-plot-arc-classifier}
}

Training Infrastructure

  • Framework: 🤗 Transformers
  • Hardware: Apple Silicon (MPS)
  • Optimization: Memory-optimized for MPS training
  • Early Stopping: Enabled to prevent overfitting
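Inference does not require Apple Silicon. A minimal device-selection sketch (an illustration, not part of the released code) that prefers MPS, then CUDA, then CPU:

import torch

# Prefer Apple Silicon (MPS), then CUDA, then fall back to CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print(f"Running on {device}")
# model.to(device)  # move the model from the Usage section, and send tokenized inputs with .to(device)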

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]
