shenzhi-wang committed on
Commit edc40ce · verified · 1 Parent(s): 4c98372

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -91,7 +91,9 @@ print(response)
 
 ### 3.1 Arena-Hard-Auto-v0.1
 
-All results below, except those for `Xwen-72B-Chat`, are sourced from [Arena-Hard-Auto](https://github.com/lmarena/arena-hard-auto) (accessed on February 1, 2025).
+All results below, except those for `Xwen-72B-Chat`, `DeepSeek-V3` and `DeepSeek-R1`, are sourced from [Arena-Hard-Auto](https://github.com/lmarena/arena-hard-auto) (accessed on February 1, 2025).
+
+The results of `DeepSeek-V3` and `DeepSeek-R1` are borrowed from their officially reported results.
 
 #### 3.1.1 No Style Control
 
@@ -99,9 +101,11 @@ All results below, except those for `Xwen-72B-Chat`, are sourced from [Arena-Har
 
 | | Score | 95% CIs |
 | --------------------------------- | ------------------------ | ----------- |
-| **Xwen-72B-Chat** 🔑 | **86.1** (Top-1 among 🔑) | (-1.5, 1.7) |
+| **Xwen-72B-Chat** 🔑 | **86.1** (Top-1 among 🔑 below 100B) | (-1.5, 1.7) |
 | Qwen2.5-72B-Instruct 🔑 | 78.0 | (-1.8, 1.8) |
 | Athene-v2-Chat 🔑 | 85.0 | (-1.4, 1.7) |
+| DeepSeek-V3 **(671B >> 72B)** 🔑 | 85.5 | N/A |
+| DeepSeek-R1 **(671B >> 72B)** 🔑 | **92.3** (Top-1 among 🔑) | N/A |
 | Llama-3.1-Nemotron-70B-Instruct 🔑 | 84.9 | (-1.7, 1.8) |
 | Llama-3.1-405B-Instruct-FP8 🔑 | 69.3 | (-2.4, 2.2) |
 | Claude-3-5-Sonnet-20241022 🔒 | 85.2 | (-1.4, 1.6) |