Update README.md
README.md
CHANGED
@@ -107,7 +107,7 @@ We compare this to the original R1 model and test in both regimes where repetiti
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 66 | 92 |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 70 | 98 |
 
-Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing)
+Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing).
 
 
 We further use the first 50 prompts from (DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja)[https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja] to evaluate the percentage of valid Japanese `\<think\>` sections in model responses.
@@ -120,7 +120,7 @@ This benchmark contains more varied and complex prompts, meaning this is a more
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 84 |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 94 |
 
-Code for the DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja evaluation can be found [here](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing)
+Code for the DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja evaluation can be found [here](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing).
 
 # How this model was made
 
@@ -228,7 +228,7 @@ for output in outputs:
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 66 | 92 |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 70 | 98 |
 
-SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing)
+SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing)にあります。
 
 さらに、(DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja)[https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja]の最初の50プロンプトを使用して、モデル応答における有効な日本語の`<think>`セクションの割合を評価します。このベンチマークにはより多様で複雑なプロンプトが含まれており、モデルが日本語を信頼性高く出力できるかどうかを、より現実的に評価します。
 
@@ -239,7 +239,7 @@ SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.go
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 84 |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 94 |
 
-DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja評価コードは[こちら](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing)
+DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja評価コードは[こちら](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing)にあります。
 
 # 作成方法
 