Taka008 committed
Commit a63821c · verified · 1 Parent(s): 91cc6ea

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED

@@ -198,6 +198,7 @@ For more details, please refer to the [codes](https://github.com/llm-jp/llm-jp-j
 [AnswerCarefully-Eval](https://www.anlp.jp/proceedings/annual_meeting/2025/pdf_dir/Q4-19.pdf) assesses the safety of Japanese language model outputs using the LLM-as-a-Judge approach, based on the test set from [llm-jp/AnswerCarefully](https://huggingface.co/datasets/llm-jp/AnswerCarefully).
 We evaluated the models using `gpt-4o-2024-08-06`.
 The scores represent the average values obtained from three rounds of inference and evaluation.
+For more details, please refer to the [codes](https://github.com/llm-jp/llm-jp-judge/tree/v1.0.0).
 
 | Model name | Score | Acceptance rate (%, ↑) | Violation rate (%, ↓) |
 | :--- | ---: | ---: | ---: |
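
The changed section states that each reported score, acceptance rate, and violation rate is the average over three independent rounds of inference and evaluation. Below is a minimal sketch of that aggregation step, assuming per-round metrics are already available; the field names and numbers are illustrative placeholders and are not taken from llm-jp-judge output or from any actual model results.

```python
# Minimal sketch: averaging AnswerCarefully-Eval metrics over three rounds.
# Field names and values are hypothetical, not llm-jp-judge's actual schema.
from statistics import mean

def aggregate_rounds(rounds: list[dict]) -> dict:
    """Average the per-round score, acceptance rate, and violation rate."""
    return {
        "score": mean(r["score"] for r in rounds),
        "acceptance_rate_pct": mean(r["acceptance_rate_pct"] for r in rounds),
        "violation_rate_pct": mean(r["violation_rate_pct"] for r in rounds),
    }

# Three made-up rounds of judge results for one model (example values only).
rounds = [
    {"score": 4.1, "acceptance_rate_pct": 92.0, "violation_rate_pct": 2.1},
    {"score": 4.0, "acceptance_rate_pct": 91.5, "violation_rate_pct": 2.4},
    {"score": 4.2, "acceptance_rate_pct": 93.0, "violation_rate_pct": 1.9},
]
print(aggregate_rounds(rounds))
# approx: {'score': 4.1, 'acceptance_rate_pct': 92.17, 'violation_rate_pct': 2.13}
```

For the exact judging prompts and aggregation used in the reported results, refer to the linked llm-jp-judge repository at tag v1.0.0.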