Tags: Text Classification, PyTorch, English, bert

Social Media Style Classifier for Climate Change Text

This model is bert-base-uncased fine-tuned on a binary classification task: deciding whether an English text about climate change is written in a social media style.

Social media texts were gathered from ClimaConvo and DEBAGREEMENT.

Non-social media texts were gathered from diverse sources including article abstracts (G11/climate_adaptation_abstracts), Wikipedia articles (pierre-pessarossi/wikipedia-climate-data), and IPCC reports (rlacombe/ClimateX).

The dataset contains about 60K instances, split 50/50 between the two classes. It was shuffled with random seed 42 and split 80/20 into training and test sets. The model was trained for three epochs on a V100 (16 GB) GPU with a batch size of 8; all other hyperparameters were the Hugging Face Trainer defaults.
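The split and the resulting training length can be sketched as follows. The card does not say which library performed the shuffle, so this sketch uses Python's `random` module: it illustrates the procedure but will not reproduce the original split. The step count assumes exactly 60,000 examples (the card says "about 60K"), single-device training and no gradient accumulation; the data list is a hypothetical stand-in.

```python
import math
import random


def shuffle_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` with a fixed seed, then split into train/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first (1 - test_fraction) share goes to training, the remainder to testing.
    return shuffled[: len(shuffled) - n_test], shuffled[len(shuffled) - n_test :]


def optimizer_steps(n_train, batch_size=8, epochs=3):
    """Optimizer steps for one device without gradient accumulation."""
    return math.ceil(n_train / batch_size) * epochs


# Hypothetical stand-in for the ~60K (text, label) pairs, half of them labelled 1.
data = [(f"text {i}", i % 2) for i in range(60_000)]
train, test = shuffle_split(data)
print(len(train), len(test))        # 48000 12000
print(optimizer_steps(len(train)))  # 18000
```

In `TrainingArguments` terms, and assuming the batch size of 8 is per device, this setup roughly corresponds to `num_train_epochs=3` and `per_device_train_batch_size=8`, with every other argument left at its default.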

The model was trained to evaluate a text style transfer task: converting formal-language texts into tweets.
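The card does not specify how the classifier scores the style transfer outputs. One common approach, assumed here, is style accuracy: the share of generated texts that the classifier assigns to the target style. The sketch takes predictions in the shape the `transformers` text-classification pipeline returns (dicts with `label` and `score`); the target label name `LABEL_1` is an assumption based on the library's default label naming.

```python
def style_transfer_accuracy(predictions, target_label="LABEL_1"):
    """Fraction of generated texts that the classifier assigns to the target (tweet) style."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    hits = sum(1 for p in predictions if p["label"] == target_label)
    return hits / len(predictions)


# Hypothetical classifier outputs for four style-transferred texts.
preds = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_1", "score": 0.91},
    {"label": "LABEL_0", "score": 0.87},
    {"label": "LABEL_1", "score": 0.99},
]
print(style_transfer_accuracy(preds))  # 0.75
```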

How to use

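```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/cc-tweets-classifier"

model = AutoModelForSequenceClassification.from_pretrained(model_name)
# model_max_length replaces the deprecated max_len argument; BERT accepts at most 512 tokens.
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Truncate inputs longer than 512 tokens instead of failing on them.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Yesterday was a great day!"

# Returns a list with one {"label": ..., "score": ...} dict per input text.
result = classifier(text)
print(result)
```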

Label 1 indicates that the text is predicted to be a tweet (social media style); label 0 indicates that it is not.
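To turn a pipeline prediction into the 0/1 label described above, something like the following helper can be used. It is a hypothetical helper, not part of the model or library: the actual label strings come from the model's `config.id2label`, which the card does not show, so it assumes either the library's default `LABEL_0`/`LABEL_1` names or bare `0`/`1`.

```python
def to_binary_label(prediction):
    """Map one pipeline prediction, e.g. {"label": "LABEL_1", "score": 0.97}, to 0 or 1."""
    # Handles "LABEL_1"-style names as well as a bare "1"; the model's real
    # label names are an assumption (they come from config.id2label).
    return int(prediction["label"].rsplit("_", 1)[-1])


# Hypothetical pipeline output for a single input text.
result = [{"label": "LABEL_1", "score": 0.97}]
is_social_media_style = to_binary_label(result[0]) == 1
print(is_social_media_style)  # True
```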

Evaluation

Evaluation results on the test set:

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.99747 |
| Precision | 1.0     |
| Recall    | 0.99493 |
| F1        | 0.99746 |
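For reference, these metrics derive from the confusion-matrix counts for the positive class (label 1) as sketched below. The counts in the usage line are toy values for illustration only, not the model's actual confusion matrix. The last lines check that the reported F1 follows from the reported precision and recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 for the positive class (label 1 = tweet style)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy counts only: with no false positives, precision is 1.0, as in the reported results.
print(classification_metrics(tp=95, fp=0, tn=100, fn=5))

# The reported F1 is the harmonic mean of the reported precision and recall.
p, r = 1.0, 0.99493
print(round(2 * p * r / (p + r), 5))  # 0.99746
```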