CultriX committed on
Commit 006d441 · verified · 1 Parent(s): 207ba8c

Update scrape-leaderboard.py
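
The comment block added in this commit lists the fields of each `benchmark_data` entry (rank, name, scores, hf_url, known_config). The sketch below shows one way such an entry might be laid out and consumed; it is an illustration under stated assumptions, not the code in scrape-leaderboard.py. The names `EXAMPLE_ENTRY`, `find_mergekit_config`, and `resolve_config` are hypothetical, the `hf_url` value assumes the usual huggingface.co/<model name> pattern, and the `merge_method:` check is a heuristic rather than an official marker. The script itself imports `requests` and `BeautifulSoup`; the sketch swaps in the standard-library `html.parser` so that it runs without third-party packages.

```python
from html.parser import HTMLParser
from typing import Callable, Optional

# Hypothetical entry following the fields named in the commit's comment block.
# Scores are taken from the example in the removed prompt text; hf_url is an
# assumption based on the usual huggingface.co/<model name> pattern.
EXAMPLE_ENTRY = {
    "rank": 44,
    "name": "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3",
    "scores": {
        "average": 40.10,
        "IFEval": 72.57,
        "BBH": 48.58,
        "MATH": 34.44,
        "GPQA": 17.34,
        "MUSR": 19.39,
        "MMLU-PRO": 48.26,
    },
    "hf_url": "https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3",
    "known_config": None,  # None means "not known yet, try scraping hf_url"
}


class _PreCollector(HTMLParser):
    """Collects the text content of every <pre> element in a page."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # nesting depth of open <pre> tags
        self._buffer = []    # text pieces of the <pre> currently being read
        self.blocks = []     # finished <pre> blocks, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._buffer))
                self._buffer.clear()

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def find_mergekit_config(html: str) -> Optional[str]:
    """Return the first <pre> block that looks like a MergeKit config, else None."""
    parser = _PreCollector()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        if "merge_method:" in block:  # heuristic assumption, not an official marker
            return block.strip()
    return None


def resolve_config(entry: dict, fetch_html: Callable[[str], str]) -> Optional[str]:
    """Prefer known_config; otherwise scrape the page behind hf_url."""
    if entry["known_config"] is not None:
        return entry["known_config"]
    return find_mergekit_config(fetch_html(entry["hf_url"]))
```

In a real run, `fetch_html` could be a thin wrapper such as `lambda url: requests.get(url, timeout=10).text`. Passing the fetcher in as a parameter keeps the lookup logic testable without network access.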

Files changed (1)
  1. scrape-leaderboard.py +6 -117
scrape-leaderboard.py CHANGED
@@ -1,123 +1,12 @@
 import requests
 from bs4 import BeautifulSoup
 
-#!/usr/bin/env python3
-
-def main():
-    print("""### INSTRUCTION ###
-Read the instructions below:
-
-1. You will be presented with benchmark scores by various LLM's.
-2. The layout of the data presented to you is as follows:
->>> START LAYOUT EXAMPLE <<<
---- (start of a new model marker)
-Model ranking
-Model name
-Model average score across benchmarks in %
-Models average score on IFEval benchmarks in %
-Models average score on BBH benchmarks in %
-Models average score on MATH benchmarks in %
-Models average score in GPQA benchmarks in %
-Models average score in MUSR benchmarks in %
-Models average score in MMLU-PRO benchmarks in %
-### (start of YAML-configuration marker)
-The YAML-configuration file that was used to create the model in mergekit.
-Note that this part is only available for certain models and not for all models on the list!
-The configuration starts and ends with '###'
-### (end of YAML-configuration marker)
->>> END LAYOUT EXAMPLE <<<
-
-For example, the following input could be possible:
->>> START INPUT EXAMPLE <<<
----
-44
-sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
-40.10 %
-
-72.57 %
-
-48.58 %
-
-34.44 %
-
-17.34 %
-
-19.39 %
-
-48.26 %
-###
-models:
-  - model: CultriX/SeQwence-14Bv1
-  - model: allknowingroger/Qwenslerp5-14B
-merge_method: slerp
-base_model: CultriX/SeQwence-14Bv1
-dtype: bfloat16
-parameters:
-  t: [0, 0.5, 1, 0.5, 0] # V shaped curve: Hermes for input & output, WizardMath in the middle layers
-###
-
----
-45
-
-sthenno-com/miscii-14b-1225
-40.08 %
-
-78.78 %
-
-50.91 %
-
-31.57 %
-
-17.00 %
-
-14.77 %
-
-47.46 %
----
->>> END INPUT EXAMPLE <<<
->>> START INTERPRETATION OF INPUT EXAMPLE <<<
----
-Model Rank: 44
-Model Name: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
-Model average score across benchmarks in %: 40.10
-Models average score on IFEval benchmarks in %: 72.57
-Models average score on BBH benchmarks in %: 48.58
-Models average score on MATH benchmarks in % 34.44
-Models average score in GPQA benchmarks in % 17.34
-Models average score in MUSR benchmarks in % 19.39
-Models average score in MMLU-PRO benchmarks in % 48.26
-### (THE CONFIGURATION FOR MERGING THE MODEL: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 was found)
-models:
-  - model: CultriX/SeQwence-14Bv1
-  - model: allknowingroger/Qwenslerp5-14B
-merge_method: slerp
-base_model: CultriX/SeQwence-14Bv1
-dtype: bfloat16
-parameters:
-  t: [0, 0.5, 1, 0.5, 0] # V shaped curve: Hermes for input & output, WizardMath in the middle layers
-###
----
-Model Rank: 45
-Model Name: sthenno-com/miscii-14b-1225
-Model average score across benchmarks in %: 40.08
-Models average score on IFEval benchmarks in %: 78.78
-Models average score on BBH benchmarks in %: 50.91
-Models average score on MATH benchmarks in % 31.57
-Models average score in GPQA benchmarks in % 17.00
-Models average score in MUSR benchmarks in % 14.77
-Models average score in MMLU-PRO benchmarks in % 47.46
-###
-THE MERGEKIT CONFIGURATION FOR MODEL sthenno-com/miscii-14b-1225 WAS NOT FOUND SO IT IS SKIPPED
-###
-
---- (next model etc...)
->>> END INTERPRETATION OF INPUT EXAMPLE <<<
-
-4. >>> INSTRUCTIONS <<<
-Below follows the scraped data from the leaderboard.
-
->>> DATA START <<<
-"""
+# 1. A list of model benchmark data from your “DATA START”. Each entry contains:
+#    - rank
+#    - name
+#    - scores (average, IFEval, BBH, MATH, GPQA, MUSR, MMLU-PRO)
+#    - hf_url: the Hugging Face URL to scrape for a MergeKit config
+#    - known_config: if we already know the configuration, store it here; otherwise None.
 benchmark_data = [
     {
         "rank": 44,