CultriX committed on
Commit 207ba8c · verified · 1 Parent(s): 1c2e4dc

Update scrape-leaderboard.py

Files changed (1)
  1. scrape-leaderboard.py +117 -6
scrape-leaderboard.py CHANGED
@@ -1,12 +1,123 @@
  import requests
  from bs4 import BeautifulSoup

- # 1. A list of model benchmark data from your "DATA START". Each entry contains:
- #    - rank
- #    - name
- #    - scores (average, IFEval, BBH, MATH, GPQA, MUSR, MMLU-PRO)
- #    - hf_url: the Hugging Face URL to scrape for a MergeKit config
- #    - known_config: if we already know the configuration, store it here; otherwise None.

  benchmark_data = [
      {
          "rank": 44,
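The removed comments describe the shape of each `benchmark_data` entry (rank, name, scores, hf_url, known_config). A hypothetical entry following that schema could look like the sketch below; the field names come from the comments, while the score values are taken from the example data later in this commit and the `hf_url` value is an assumed Hugging Face URL, not one confirmed by the source:

```python
# Hypothetical benchmark_data entry following the removed comments' schema.
# Field names are from the commit's comments; hf_url is an assumption.
entry = {
    "rank": 44,
    "name": "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3",
    "scores": {
        "average": 40.10,
        "IFEval": 72.57,
        "BBH": 48.58,
        "MATH": 34.44,
        "GPQA": 17.34,
        "MUSR": 19.39,
        "MMLU-PRO": 48.26,
    },
    # Assumed URL scheme for the model page to scrape for a MergeKit config.
    "hf_url": "https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3",
    "known_config": None,  # set when a MergeKit config is already known
}
```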
 
  import requests
  from bs4 import BeautifulSoup

+ #!/usr/bin/env python3
+
+ def main():
+     print("""### INSTRUCTION ###
+ Read the instructions below:
+
+ 1. You will be presented with benchmark scores for various LLMs.
+ 2. The layout of the data presented to you is as follows:
+ >>> START LAYOUT EXAMPLE <<<
+ --- (start of a new model marker)
+ Model ranking
+ Model name
+ Model's average score across benchmarks in %
+ Model's average score on IFEval benchmarks in %
+ Model's average score on BBH benchmarks in %
+ Model's average score on MATH benchmarks in %
+ Model's average score on GPQA benchmarks in %
+ Model's average score on MUSR benchmarks in %
+ Model's average score on MMLU-PRO benchmarks in %
+ ### (start of YAML-configuration marker)
+ The YAML configuration file that was used to create the model in mergekit.
+ Note that this part is only available for certain models, not for all models on the list!
+ The configuration starts and ends with '###'.
+ ### (end of YAML-configuration marker)
+ >>> END LAYOUT EXAMPLE <<<
+
+ For example, the following input could be possible:
+ >>> START INPUT EXAMPLE <<<
+ ---
+ 44
+ sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
+ 40.10 %
+
+ 72.57 %
+
+ 48.58 %
+
+ 34.44 %
+
+ 17.34 %
+
+ 19.39 %
+
+ 48.26 %
+ ###
+ models:
+   - model: CultriX/SeQwence-14Bv1
+   - model: allknowingroger/Qwenslerp5-14B
+ merge_method: slerp
+ base_model: CultriX/SeQwence-14Bv1
+ dtype: bfloat16
+ parameters:
+   t: [0, 0.5, 1, 0.5, 0] # V-shaped curve: Hermes for input & output, WizardMath in the middle layers
+ ###
+
+ ---
+ 45
+
+ sthenno-com/miscii-14b-1225
+ 40.08 %
+
+ 78.78 %
+
+ 50.91 %
+
+ 31.57 %
+
+ 17.00 %
+
+ 14.77 %
+
+ 47.46 %
+ ---
+ >>> END INPUT EXAMPLE <<<
+ >>> START INTERPRETATION OF INPUT EXAMPLE <<<
+ ---
+ Model Rank: 44
+ Model Name: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
+ Model's average score across benchmarks in %: 40.10
+ Model's average score on IFEval benchmarks in %: 72.57
+ Model's average score on BBH benchmarks in %: 48.58
+ Model's average score on MATH benchmarks in %: 34.44
+ Model's average score on GPQA benchmarks in %: 17.34
+ Model's average score on MUSR benchmarks in %: 19.39
+ Model's average score on MMLU-PRO benchmarks in %: 48.26
+ ### (THE CONFIGURATION FOR MERGING THE MODEL sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 WAS FOUND)
+ models:
+   - model: CultriX/SeQwence-14Bv1
+   - model: allknowingroger/Qwenslerp5-14B
+ merge_method: slerp
+ base_model: CultriX/SeQwence-14Bv1
+ dtype: bfloat16
+ parameters:
+   t: [0, 0.5, 1, 0.5, 0] # V-shaped curve: Hermes for input & output, WizardMath in the middle layers
+ ###
+ ---
+ Model Rank: 45
+ Model Name: sthenno-com/miscii-14b-1225
+ Model's average score across benchmarks in %: 40.08
+ Model's average score on IFEval benchmarks in %: 78.78
+ Model's average score on BBH benchmarks in %: 50.91
+ Model's average score on MATH benchmarks in %: 31.57
+ Model's average score on GPQA benchmarks in %: 17.00
+ Model's average score on MUSR benchmarks in %: 14.77
+ Model's average score on MMLU-PRO benchmarks in %: 47.46
+ ###
+ THE MERGEKIT CONFIGURATION FOR MODEL sthenno-com/miscii-14b-1225 WAS NOT FOUND, SO IT IS SKIPPED
+ ###
+
+ --- (next model etc...)
+ >>> END INTERPRETATION OF INPUT EXAMPLE <<<
+
+ 3. >>> INSTRUCTIONS <<<
+ Below follows the scraped data from the leaderboard.
+
+ >>> DATA START <<<
+ """)

  benchmark_data = [
      {
          "rank": 44,
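The prompt added in this commit describes a concrete layout: model blocks separated by `---`, each holding a rank, a name, seven percentage scores, and an optional MergeKit YAML section fenced by `###` lines. A minimal sketch of a parser for that layout is shown below; `parse_leaderboard` is a hypothetical helper, not part of the commit, and it assumes the separators appear exactly as the prompt's layout example shows:

```python
def parse_leaderboard(text):
    """Parse the '---'-delimited model blocks described in the prompt.

    Hypothetical sketch: assumes each block holds a rank, a model name,
    seven percentage scores, and an optional MergeKit YAML config
    fenced by '###' lines, as in the commit's layout example.
    """
    records = []
    for block in text.split("---"):
        block = block.strip()
        if not block:
            continue
        # Pull out the optional '###'-fenced MergeKit config, if present.
        config = None
        if "###" in block:
            head, _, rest = block.partition("###")
            config = rest.rsplit("###", 1)[0].strip() or None
            block = head
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 9:
            continue  # not a complete model block (rank + name + 7 scores)
        rank, name = int(lines[0]), lines[1]
        scores = [float(s.rstrip(" %")) for s in lines[2:9]]
        records.append({"rank": rank, "name": name,
                        "scores": scores, "config": config})
    return records
```

Splitting on the markers rather than scraping positions keeps the sketch tolerant of the blank lines that appear between scores in the prompt's input example.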