# Automatic Evaluation Model for RAIDEN Benchmark

This repository contains the automated evaluation model trained as part of the research presented in the paper "RAIDEN Benchmark: Evaluating Role-playing Conversational Agents with Measurement-Driven Custom Dialogues".

The model compares the quality of two different responses to the same dialogue turn and produces one of three evaluation outcomes: `win`, `tie`, or `lose`.
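
As a rough illustration, the sketch below shows how such a pairwise judge could be queried with the Hugging Face `transformers` library, assuming the model is exposed as a causal language model. The model path, prompt template, and verdict parsing here are placeholders for illustration only; please refer to the paper and GitHub repository below for the exact prompt format and inference settings.

```python
# Hypothetical usage sketch -- the model path, prompt template, and label
# parsing are assumptions for illustration; consult the official RAIDEN repo
# for the actual evaluation prompt and inference settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/raiden-evaluation-model"  # placeholder, not a real repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.bfloat16, device_map="auto"
)

def judge(dialogue_context: str, response_a: str, response_b: str) -> str:
    """Ask the model whether response A wins, ties, or loses against response B."""
    prompt = (
        "You are evaluating two candidate replies for the same dialogue turn.\n"
        f"Dialogue context:\n{dialogue_context}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Judge response A against response B. Answer with one word: win, tie, or lose.\n"
        "Answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    # Decode only the newly generated tokens and pull out the verdict keyword.
    completion = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    for verdict in ("win", "tie", "lose"):
        if verdict in completion.lower():
            return verdict
    return "tie"  # fall back to a tie if no keyword is found

print(judge(
    "User: Stay in character as Sherlock Holmes and greet me.",
    "Ah, a visitor! Do come in; the game is afoot.",
    "Hello, how can I assist you today?",
))
```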
For more detailed information, please refer to our paper and code:

- Paper: https://aclanthology.org/2025.coling-main.735.pdf
- GitHub repo: https://github.com/FrontierLabs/RAIDEN