Human-CentricAI/LLM-Refusal-Classifier


Human-CentricAI committed on Jan 13

Commit 76a2920 · verified · 1 Parent(s): 4e4e60c

Update README.md

Files changed (1)

README.md +3 -0


@@ -12,6 +12,9 @@ The model assigns one of five possible labels:
 3 (**Refusal Capability**): The model refuses to answer due to its own limitations, lack of information, or lack of ability to provide an adequate response. <br />
 4 (**Disclaimer Capability**): The model signals its limitations but attempts to provide an answer within its capacity  <br />
+Please cite: <br />
+Pasch, S. (2025). LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena. arXiv preprint arXiv:2501.03266.
 References <br />
 [1] Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; ... & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. <br />
 [2] Chiang, W. L., Zheng, L., Sheng, Y., Angelopoulos, A. N., Li, T., Li, D., ... & Stoica, I. (2024). Chatbot arena: An open platform for evaluating llms by human preference. arXiv preprint arXiv:2403.04132.