iTzMiNOS/distilbert-uncased-fake-news-detector

This model is a fine-tuned version of distilbert-base-uncased for fake news detection. It classifies news articles as Fake or Real based on their titles and content. It was trained with the transformers library on a custom fake news dataset, starting from the pre-trained DistilBERT weights.

Model Description

Model Type: DistilBERT (fine-tuned on distilbert-base-uncased)
Task: Binary classification (Fake vs. Real news)
Preprocessing: Text normalization (lowercasing, stopword removal, tokenization)
Dataset: Custom fake news dataset (not publicly available; use your own dataset to retrain or evaluate)
Evaluation Metrics: Accuracy, Classification Report (Precision, Recall, F1-Score)
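
The metrics listed above can be computed from predicted and true labels. The following is a minimal pure-Python sketch, not the evaluation code used for this model: the function name `binary_classification_report` and the sample labels are hypothetical, and the label convention (1 = Fake, 0 = Real) follows the usage example in this card.

```python
from collections import Counter

def binary_classification_report(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels for illustration only (1 = Fake, 0 = Real)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
report = binary_classification_report(y_true, y_pred)
print(report)
```

Here "Fake" is treated as the positive class; passing `positive=0` would report the same metrics for the "Real" class.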

Model Card

Intended Use

This model can be used to classify news articles into Fake or Real categories. It's suitable for applications where automated detection of misinformation or fake news is required.

Limitations

The model was trained on a single custom dataset, so it may not generalize well to other news domains or contexts.
Performance may degrade on input in other languages or on text with heavily domain-specific terminology not seen during training.

Training Information

Base Model: distilbert-base-uncased
Epochs: 2
Batch Size: 32
Learning Rate: 2e-5
Training Libraries: transformers, torch, datasets
Dataset: 20,800 entries from a custom dataset (50% real news, 50% fake news)
Preprocessing: The text is preprocessed using NLTK for stopword removal and tokenization.
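
The preprocessing steps named above (lowercasing, tokenization, stopword removal) might look roughly like the sketch below. This is an illustration under stated assumptions, not the original training pipeline: the small `STOPWORDS` set and the regex tokenizer are stand-ins chosen so the example runs without downloads. With NLTK installed and its corpora downloaded, `nltk.corpus.stopwords.words("english")` and `nltk.tokenize.word_tokenize` could be swapped in.

```python
import re

# Tiny stand-in stopword list (assumption, not NLTK's full English list).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "is", "are", "for"}

def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(tok for tok in tokens if tok not in STOPWORDS)

cleaned = preprocess("Breaking: The government announces new economic measures.")
print(cleaned)  # breaking government announces new economic measures
```

The card does not say whether inputs should be preprocessed the same way at inference time, so treat that as something to verify on your own data.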

Installation

To use this model, you'll need to install the following dependencies:

pip install transformers torch datasets nltk evaluate

Model Usage

Load the Model

To use the model for inference, you can load it directly from the Hugging Face Model Hub:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model
model_name = "iTzMiNOS/distilbert-uncased-fake-news-detector"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Example input text
text = "Breaking: The government announces new economic measures."

# Tokenize the input text
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
inputs = {key: value.to(device) for key, value in inputs.items()}

# Inference
with torch.no_grad():
    logits = model(**inputs).logits

# Get predicted class (Fake = 1, Real = 0)
predicted_class = torch.argmax(logits, dim=-1).item()
print("Predicted Class:", "Fake" if predicted_class == 1 else "Real")

Inference Output

Real: The model predicts the article is reliable.

Fake: The model predicts the article is unreliable.
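
To report a confidence alongside the label, the two logits can be passed through a softmax. The sketch below uses plain Python so it runs without a model download. The helper names `softmax` and `interpret` and the example logits are hypothetical, and the `ID2LABEL` mapping assumes the convention from the usage example (0 = Real, 1 = Fake). With the tensors from that example, `torch.softmax(logits, dim=-1)` gives the same probabilities, and `logits[0].tolist()` yields a list suitable for this function.

```python
import math

# Label mapping assumed from the usage example: 0 = Real, 1 = Fake
ID2LABEL = {0: "Real", 1: "Fake"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpret(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]

# Hypothetical logits for a single article
label, confidence = interpret([-1.2, 2.3])
print(f"{label} ({confidence:.2%})")
```

A softmax probability is not a calibrated measure of truthfulness; it only reflects how strongly the model prefers one class over the other.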

Conclusion

This model provides a starting point for detecting fake news articles based on their titles and content. It can be further fine-tuned or retrained for domain-specific tasks if needed.
