---
library_name: transformers
tags: []
---

# Model Card: DistilBERT-based Joke Detection (needed this because I'm German)

## Model Details

- **Model Type:** Fine-tuned DistilBERT base model (uncased)
- **Task:** Binary classification for joke detection
- **Output:** Binary label: `LABEL_1` (joke) or `LABEL_0` (no joke)

## Training Data

- **Dataset:** 200k Short Texts for Humor Detection (see the loading sketch after this list)
- **Link:** https://www.kaggle.com/datasets/deepcontractor/200k-short-texts-for-humor-detection
- **Size:** 200,000 labeled short texts
- **Distribution:** Equally balanced between humor and non-humor
- **Source:** Primarily from the r/jokes and r/cleanjokes subreddits
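
For a quick look at the data, here is a minimal loading sketch. The file name `dataset.csv` and the `text`/`humor` column names are taken from the Kaggle dataset description and may need adjusting for your local copy:

```python
import pandas as pd
from datasets import Dataset

# Assumed file and column names (text, humor) from the Kaggle dataset page.
df = pd.read_csv("dataset.csv")
df["label"] = df["humor"].astype(int)  # True/False -> 1 (joke) / 0 (no joke)

# Wrap as a Hugging Face Dataset and hold out a validation split.
ds = Dataset.from_pandas(df[["text", "label"]])
ds = ds.train_test_split(test_size=0.1, seed=42)
print(ds)
```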

## Base Model

DistilBERT base model (uncased): a distilled version of BERT that is smaller and faster while retaining most of BERT's performance.
## Usage

```python
from transformers import pipeline

model_id = "VitalContribution/JokeDetectBERT"
pipe = pipeline('text-classification', model=model_id)

joke_questionmark = "What do elves learn in school? The elf-abet."

out = pipe(joke_questionmark)[0]
label = out['label']        # 'LABEL_1' = joke, 'LABEL_0' = no joke
confidence = out['score']
result = "JOKE" if label == 'LABEL_1' else "NO JOKE"
print(f"Prediction: {result} ({confidence:.2f})")
```
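
If you want raw probabilities rather than the pipeline output, the checkpoint can also be loaded directly. A minimal sketch, assuming the standard `AutoModelForSequenceClassification` head that the pipeline uses under the hood:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "VitalContribution/JokeDetectBERT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("What do elves learn in school? The elf-abet.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Index 1 corresponds to LABEL_1 (joke), matching the pipeline example above.
probs = logits.softmax(dim=-1)[0]
print(f"P(joke) = {probs[1]:.2f}")
```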

## Training Details

| Parameter | Value |
|:----------|:------|
| Model | DistilBERT (base-uncased) |
| Task | Sequence Classification |
| Number of Classes | 2 |
| Batch Size | 32 (per device) |
| Learning Rate | 2e-4 |
| Weight Decay | 0.01 |
| Epochs | 2 |
| Warmup Steps | 100 |
| Best Model Selection | Based on `eval_loss` |
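
These hyperparameters map onto a standard `Trainer` setup. The sketch below only illustrates that mapping and is not the exact training script; the tokenization step and the `ds` splits from the data-loading sketch above are assumptions:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

# `ds` is the train/test split from the data-loading sketch above.
tokenized = ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="joke-detect-bert",   # illustrative output path
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-4,
    weight_decay=0.01,
    num_train_epochs=2,
    warmup_steps=100,
    evaluation_strategy="epoch",     # named `eval_strategy` in recent transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,     # best checkpoint selected by eval_loss
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)
trainer.train()
```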

### Model Evaluation

<img src="https://cdn-uploads.huggingface.co/production/uploads/63ae02ff20176b2d21669dd6/lEXXQguN-8-VVrFnlmW5o.png" width="600" alt="Model Evaluation Image 1">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63ae02ff20176b2d21669dd6/A99ZeYAr1jb32YF_yBin8.png" width="600" alt="Model Evaluation Image 2">