---
license: cc-by-4.0
datasets:
- vector-institute/NMB-Plus-Named-Entities
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: token-classification
tags:
- ner
- bias_detection
model-index:
- name: nmb-plus-bias-ner-bert
  results:
  - task:
      type: named-entity-recognition
      name: Named Entity Recognition (NER)
    dataset:
      type: vector-institute/NMB-Plus-Named-Entities
      name: Biased Named Entities
    metrics:
    - type: precision
      value: 0.6405
    - type: recall
      value: 0.5589
    - type: f1
      value: 0.5922
language:
- en
---

# Model Overview

A fine-tuned DistilBERT model for Named Entity Recognition (NER) in bias detection.

## Model Details

We fine-tuned `distilbert-base-uncased` on the `vector-institute/NMB-Plus-Named-Entities` dataset.

## How to Get Started with the Model

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "vector-institute/nmb-plus-bias-ner-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# BIO tagging scheme: O = outside, B-BIAS / I-BIAS = beginning / inside of a biased span
label_list = ["O", "B-BIAS", "I-BIAS"]
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    id2label=id2label,
    label2id=label2id,
)

ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Fox News reported that Joe Biden met with CNN executives."
predictions = ner_pipeline(text)
print(predictions)
```
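The pipeline returns one prediction per subword token. As a minimal sketch of how the B-/I- tags can be merged into spans (it operates on hypothetical predictions shaped like the pipeline output, not on real model output):

```python
def merge_bias_spans(tokens):
    """Merge BIO-tagged token predictions into (text, start, end) spans.

    `tokens` mirrors the shape of `pipeline("ner")` output: dicts with
    "word", "entity", "start", and "end" keys.
    """
    spans = []
    for tok in tokens:
        if tok["entity"] == "B-BIAS":
            # a B- tag opens a new biased span
            spans.append({"text": tok["word"], "start": tok["start"], "end": tok["end"]})
        elif tok["entity"] == "I-BIAS" and spans:
            # an I- tag extends the most recent span
            spans[-1]["text"] += " " + tok["word"]
            spans[-1]["end"] = tok["end"]
    return spans

# Hypothetical predictions for illustration; not actual model output
preds = [
    {"word": "fox", "entity": "B-BIAS", "start": 0, "end": 3},
    {"word": "news", "entity": "I-BIAS", "start": 4, "end": 8},
    {"word": "reported", "entity": "O", "start": 9, "end": 17},
]
print(merge_bias_spans(preds))  # → [{'text': 'fox news', 'start': 0, 'end': 8}]
```

Alternatively, `transformers` can do similar grouping natively via `pipeline("ner", ..., aggregation_strategy="simple")`.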
## Training Hyperparameters

Here are the `TrainingArguments` we used:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=32,
    num_train_epochs=10,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    output_dir="./results",
    logging_dir="./logs",
    logging_steps=50,
    group_by_length=True,
)
```

## Evaluation

We split the data into training (80%), validation (10%), and test (10%) sets.
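The card does not include the splitting code itself; a minimal pure-Python sketch of such an 80/10/10 split could look like this (in practice, `datasets.Dataset.train_test_split` serves the same purpose):

```python
import random

def train_val_test_split(items, seed=42):
    """Shuffle items and split them 80/10/10 into train/validation/test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * 0.8)
    n_val = int(len(shuffled) * 0.1)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # → 800 100 100
```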
### Results

We used common classification metrics:

- precision
- recall
- f1-score

#### Overall Results:

| Metric           | Precision | Recall | F1-Score | Support |
|------------------|-----------|--------|----------|---------|
| **Macro Avg**    | 0.6405    | 0.5589 | 0.5922   | 48710   |
| **Weighted Avg** | 0.9330    | 0.9418 | 0.9366   | 48710   |

#### Per-class Results:

| Label      | Precision | Recall | F1-Score | Support |
|------------|-----------|--------|----------|---------|
| **O**      | 0.9615    | 0.9792 | 0.9703   | 45921   |
| **B-BIAS** | 0.5314    | 0.4183 | 0.4681   | 930     |
| **I-BIAS** | 0.4286    | 0.2792 | 0.3381   | 1859    |
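As a sanity check, the overall numbers follow from the per-class rows: the macro average is the unweighted mean over the three labels, while the weighted average weights each label by its support:

```python
# (precision, recall, f1, support) per label, copied from the table above
per_class = {
    "O":      (0.9615, 0.9792, 0.9703, 45921),
    "B-BIAS": (0.5314, 0.4183, 0.4681, 930),
    "I-BIAS": (0.4286, 0.2792, 0.3381, 1859),
}

total = sum(row[3] for row in per_class.values())

# Macro average: unweighted mean over the three labels
macro = [round(sum(row[i] for row in per_class.values()) / len(per_class), 4)
         for i in range(3)]

# Weighted average: mean weighted by each label's support
weighted = [round(sum(row[i] * row[3] for row in per_class.values()) / total, 4)
            for i in range(3)]

print(total)     # → 48710
print(macro)     # → [0.6405, 0.5589, 0.5922]
print(weighted)  # → [0.933, 0.9418, 0.9366]
```

The gap between the macro and weighted averages reflects the class imbalance: the dominant `O` class scores high, while the rarer `B-BIAS` and `I-BIAS` labels are much harder.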
## Environmental Impact

Total energy consumption for fine-tuning was 0.032804 kWh.

**Local CO2 Emission:** approximately 3.12 grams of CO₂ equivalent.
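Taken together, these two figures imply a grid carbon-intensity factor of roughly 95 g CO₂e per kWh (inferred from the numbers above; the card does not state which factor was actually used):

```python
energy_kwh = 0.032804   # total fine-tuning energy, from above
emissions_g = 3.12      # reported CO2-equivalent emissions in grams

# Implied grid carbon intensity in g CO2e per kWh
intensity = emissions_g / energy_kwh
print(round(intensity))  # → 95
```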
## License

CC BY 4.0 (Creative Commons Attribution 4.0): allows sharing and adaptation with proper credit.