---
language:
- multilingual
- en
- es
- pt
tags:
- conspiracy-detection
- content-moderation
- bert
- prct
- social-media
- misinformation
- hate-speech
- PRCT
- cross-platform
- multilingual-classification
license: mit
datasets:
- custom
metrics:
- f1
- precision
- recall
widget:
- text: Immigration is necessary for economic growth and demographic balance.
  example_title: Non-PRCT Example
- text: >-
    They are deliberately replacing us with foreigners to change voting
    patterns.
  example_title: PRCT Example
pipeline_tag: text-classification
base_model:
- digitalepidemiologylab/covid-twitter-bert-v2
library_name: transformers
---

<div align="center">

# 🔍 CT-BERT-PRCT

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/bert-arquitecture.png" width="600"/>

**A specialized BERT model for detecting Population Replacement Conspiracy Theory content**

[Model on Hugging Face](https://huggingface.co/erikbranmarino/CT-BERT-PRCT)
[License: MIT](https://opensource.org/licenses/MIT)

</div>

## Overview

<table>
<tr>
<td>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/social-thumbnails/transformers-v4.0.0.png" width="240" />
</td>
<td>
<b>CT-BERT-PRCT</b> is a fine-tuned BERT model for detecting Population Replacement Conspiracy Theory content across multiple platforms and languages. <br/><br/>
<b>Key metrics:</b>
<ul>
<li>YouTube Accuracy: 83.8%</li>
<li>Telegram Accuracy: 71.9%</li>
<li>Cross-platform F1: 71.2%</li>
</ul>
</td>
</tr>
</table>

## Model description

CT-BERT-PRCT is a fine-tuned version of CT-BERT specifically adapted for detecting Population Replacement Conspiracy Theory (PRCT) content across social media platforms. The model has been trained to identify both explicit and implicit PRCT narratives while maintaining robust cross-platform generalization capabilities.

## Performance Visualization

<div align="center">
<img src="https://quickchart.io/chart?c={type:'radar',data:{labels:['Accuracy','Precision','Recall','F1-score'],datasets:[{label:'YouTube',data:[83.8,86.5,83.3,83.3],backgroundColor:'rgba(54,162,235,0.2)',borderColor:'rgb(54,162,235)'},{label:'Telegram',data:[71.9,74.2,71.9,71.2],backgroundColor:'rgba(255,99,132,0.2)',borderColor:'rgb(255,99,132)'}]}}" width="450" />
</div>

## Model Configuration

### Label Mapping
- 0: Non-PRCT content
- 1: PRCT content

### Model Architecture
- Base model: CT-BERT (digitalepidemiologylab/covid-twitter-bert-v2, BERT-large architecture)
- Hidden layers: 24
- Attention heads: 16
- Parameters: ~340M

### Input Requirements
- Maximum sequence length: 512 tokens
- Input type: Text (strings)
- Preprocessing: Standard BERT tokenization (see the sketch below)
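
As a concrete illustration, a minimal preprocessing sketch with `transformers` (assuming the tokenizer is bundled with the model repository, as is standard for Hub checkpoints):

```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with the model
tokenizer = AutoTokenizer.from_pretrained("erikbranmarino/CT-BERT-PRCT")

# Standard BERT tokenization, truncated to the 512-token maximum
encoded = tokenizer(
    "Immigration is necessary for economic growth and demographic balance.",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, seq_len]) with seq_len <= 512
```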

## Intended uses & limitations

### Intended uses

The model demonstrates strong performance on its primary training domain (YouTube - English) while maintaining reasonable effectiveness in cross-platform and multilingual scenarios (Telegram - Portuguese and Spanish), showing good generalization capabilities across different social media environments.

## Example Predictions

Here are some example texts and how the model classifies them:

| Example Text | Prediction | Confidence |
|-------------|------------|------------|
| "Immigration policies should be decided based on economic needs and humanitarian considerations." | Non-PRCT | 0.96 |
| "We need more controlled immigration to match our labor market demands." | Non-PRCT | 0.92 |
| "European countries must protect their cultural identity while respecting diverse backgrounds." | Non-PRCT | 0.78 |
| "Politicians are secretly working to change our demographics." | PRCT | 0.85 |
| "They're bringing in foreigners to replace native voters." | PRCT | 0.94 |
| "The elites have a plan to erase our culture through mass immigration." | PRCT | 0.97 |

*Note: These examples are simplified for illustration. The model evaluates nuanced content in context.*
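
The confidence values above correspond to class probabilities. A minimal sketch of reproducing them, assuming the checkpoint exposes the standard sequence-classification head:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("erikbranmarino/CT-BERT-PRCT")
model = AutoModelForSequenceClassification.from_pretrained("erikbranmarino/CT-BERT-PRCT")

inputs = tokenizer(
    "They're bringing in foreigners to replace native voters.",
    truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

# Index 1 is the PRCT class (see Label Mapping); its probability is the confidence
print(f"PRCT: {probs[1].item():.2f}, Non-PRCT: {probs[0].item():.2f}")
```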

## Online Demo

Try the model directly in your browser using the Hugging Face Inference API:

1. Go to the [model page](https://huggingface.co/erikbranmarino/CT-BERT-PRCT)
2. Navigate to the "Inference API" tab
3. Type or paste text into the input field
4. Click "Compute" to see the model's prediction

You can also integrate the API into your applications using the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
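
For example, a minimal sketch of calling the hosted endpoint with `requests` (the URL follows the standard Inference API pattern; `HF_TOKEN` is a placeholder for your own access token):

```python
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/erikbranmarino/CT-BERT-PRCT"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}  # your HF access token

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "They are deliberately replacing us with foreigners."},
)
print(response.json())  # label/score pairs for the input text
```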

## Example Usage

A minimal usage sketch with the `transformers` pipeline API:

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="erikbranmarino/CT-BERT-PRCT")

# Label 0 = Non-PRCT, label 1 = PRCT (see Label Mapping above)
print(classifier("The elites have a plan to erase our culture through mass immigration."))
```

## Contact

Erik Bran Marino ([email protected])