---
base_model: llava-hf/llava-v1.6-mistral-7b-hf
datasets:
- aliencaocao/multimodal_meme_classification_singapore
language: en
library_name: transformers
license: mit
metrics:
- accuracy
- roc_auc
pipeline_tag: image-text-to-text
tags:
- memes
- offensive
- singapore
- vlm
model-index:
- name: llava-1.6-mistral-7b-offensive-meme-singapore
  results:
  - task:
      type: image-classification
      name: Offensive Meme Classification
    dataset:
      name: Offensive Memes in Singapore Context
      type: aliencaocao/multimodal_meme_classification_singapore
      split: test
    metrics:
    - type: roc_auc
      value: 0.7345
      name: AUROC
    - type: accuracy
      value: 0.7259
      name: Accuracy
---

# Model Card for LLaVA-1.6-Mistral-7B-Offensive-Meme-Singapore

This model is described in the paper [Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models](https://arxiv.org/abs/2502.18101). It classifies memes as offensive or not offensive, specifically within the Singaporean context.

## Model Details

This model is a fine-tuned Vision-Language Model (VLM) designed to detect offensive memes in the Singaporean context. It leverages the strengths of VLMs to handle the nuanced, culturally specific nature of meme interpretation, addressing the limitations of traditional content-moderation systems. The model was fine-tuned on a dataset of 112K memes labeled by GPT-4V, using a pipeline that incorporates OCR, translation, and a 7-billion-parameter VLM (llava-hf/llava-v1.6-mistral-7b-hf). On a held-out test set it achieves 0.7259 accuracy and 0.7345 AUROC.

- **Developed by:** Cao Yuxuan, Wu Jiayang, Alistair Cheong Liang Chuen, Bryan Shan Guanrong, Theodore Lee Chong Jen, and Sherman Chann Zhi Shen
- **Model type:** Fine-tuned Vision-Language Model (VLM)
- **Language(s) (NLP):** English (with multilingual capabilities through the pipeline)
- **License:** MIT
- **Finetuned from model:** llava-hf/llava-v1.6-mistral-7b-hf
- **Repository:** https://github.com/aliencaocao/vlm-for-memes-aisg
- **Paper:** [Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models](https://arxiv.org/abs/2502.18101)

## Uses

### Direct Use

The model can be used directly to classify memes as offensive or not offensive. Input is expected to be a meme image; the pipeline applies OCR and translation where necessary, then passes the result to the VLM for classification.
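As a rough sketch of that flow (assuming Tesseract via `pytesseract` for OCR; the `translate_to_english` stub stands in for whatever translation service the actual pipeline uses, which is not specified here):

```python
# Sketch of the OCR -> translation -> prompt flow described above.
# Assumptions: pytesseract with a local Tesseract install for OCR;
# the translation step is a stub to be replaced with a real service.
from PIL import Image
import pytesseract

def translate_to_english(text: str) -> str:
    # Placeholder: plug in a real translation API here.
    return text

def build_classification_prompt(image_path: str) -> tuple[Image.Image, str]:
    image = Image.open(image_path)
    ocr_text = pytesseract.image_to_string(image).strip()  # extract overlaid meme text
    ocr_text = translate_to_english(ocr_text)
    # Mistral-style LLaVA prompt; the exact fine-tuning prompt may differ.
    prompt = (
        "[INST] <image>\n"
        f"Text in meme: {ocr_text}\n"
        "Is this meme offensive in the Singaporean context? Answer Yes or No. [/INST]"
    )
    return image, prompt
```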

### Downstream Use

This model can be integrated into larger content moderation systems to enhance the detection of offensive memes, specifically targeting the Singaporean context.

### Out-of-Scope Use

This model is specifically trained for the Singaporean context. Its performance may degrade significantly when applied to memes from other cultures or regions. It is also not suitable for general-purpose image classification tasks.

## Bias, Risks, and Limitations

The model's performance is inherently tied to the quality and representativeness of the training data. Biases present in the training data may be reflected in the model's output, particularly regarding the interpretation of culturally specific humor or references. The model may misclassify memes due to ambiguities in language or visual representation. It is crucial to use this model responsibly and acknowledge its limitations.

### Recommendations

Users should be aware of the potential biases and limitations of the model. Human review of the model's output is strongly recommended, especially in high-stakes scenarios. Further research into mitigating bias and enhancing robustness is needed.

## How to Get Started with the Model

The snippet below is a minimal sketch using the `transformers` LLaVA-NeXT classes. The repository id is assumed from this card's metadata, and the classification prompt is illustrative; the exact prompt used during fine-tuning may differ.
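```python
# Minimal usage sketch. Assumptions: the repo id below matches this card,
# and a Yes/No classification prompt in the Mistral chat format.
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "aliencaocao/llava-1.6-mistral-7b-offensive-meme-singapore"  # assumed repo id
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("meme.jpg")
prompt = (
    "[INST] <image>\n"
    "Is this meme offensive in the Singaporean context? Answer Yes or No. [/INST]"
)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=10)
print(processor.decode(output[0], skip_special_tokens=True))
```

Loading in float16 with `device_map="auto"` requires a CUDA-capable GPU and the `accelerate` package.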

## Training Details

### Training Data

A dataset of 112K memes labeled by GPT-4V (aliencaocao/multimodal_meme_classification_singapore, listed in the metadata above); see the paper for collection and labeling details.

### Training Procedure

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The held-out test split of the aliencaocao/multimodal_meme_classification_singapore dataset, as declared in this card's metadata.

#### Factors

[More Information Needed]

#### Metrics

Accuracy and AUROC (area under the ROC curve).

### Results

On the test split, the model achieves an AUROC of 0.7345 and an accuracy of 0.7259, as reported in the metadata above.
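For reference, a sketch of how these metric types can be computed with scikit-learn from binary labels and predicted offensiveness scores (placeholder data, not the actual evaluation):

```python
# Sketch: computing the reported metric types with scikit-learn.
# y_true holds gold 0/1 labels; y_score holds P(offensive) from the model.
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = [1, 0, 1, 1, 0]             # placeholder labels
y_score = [0.9, 0.2, 0.6, 0.4, 0.1]  # placeholder model scores

auroc = roc_auc_score(y_true, y_score)
accuracy = accuracy_score(y_true, [int(s >= 0.5) for s in y_score])
print(f"AUROC={auroc:.4f}  accuracy={accuracy:.4f}")
```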

#### Summary

[More Information Needed]

## Model Examination

[More Information Needed]

## Environmental Impact

[More Information Needed]

## Technical Specifications

[More Information Needed]

## Citation

```bibtex
@misc{yuxuan2025detectingoffensivememessocial,
      title={Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models},
      author={Cao Yuxuan and Wu Jiayang and Alistair Cheong Liang Chuen and Bryan Shan Guanrong and Theodore Lee Chong Jen and Sherman Chann Zhi Shen},
      year={2025},
      eprint={2502.18101},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.18101},
}
```

## Glossary

[More Information Needed]

## More Information

[More Information Needed]

## Model Card Authors

[More Information Needed]

## Model Card Contact

[More Information Needed]