Commit History

[Fix] alpha choices
b16e2d1

Joschka Strueber committed on

[Ref, Fix] indentation error in answer key selection, longer explanation in demo, exclusion of broken dataset
c608f7f

Joschka Strueber committed on

[Fix] error in dataset name, error in digit check for str
3dfa66b

Joschka Strueber committed on

[Ref] list comprehensions in label filtering
64789e4

Joschka Strueber committed on

[Fix] error in label filtering
2d8352e

Joschka Strueber committed on

[Fix] error in filter_responses call
7c4f6b6

Joschka Strueber committed on

[Fix] import error
001064a

Joschka Strueber committed on

[Add] add bbh and gpqa benchmarks again with correct answer_index selection
0a42e99

Joschka Strueber committed on

[Ref] apply custom css to heatmap, increase size of images
4077e51

Joschka Strueber committed on

[Ref, Add] custom css for sizing, move demo utility to its own file
bd28414

Joschka Strueber committed on

[Add, Ref] Add more info and table on metric, move model list to data/
b90e0d3

Joschka Strueber committed on

[Fix] removal of not working benchmarks
c24946e

Joschka Strueber committed on

[Add] ignore datasets that are not functional atm
bf2618d

Joschka Strueber committed on

[Fix] key error for binary datasets
9e1c5ed

Joschka Strueber committed on

[Fix, Debug] wrong default model, check filter_labels
4b2993a

Joschka Strueber committed on

[Fix] error in deleting not-matching gt values
75132dc

Joschka Strueber committed on

[Add, Fix] add list of ungated models
5c5dc6a

Joschka Strueber committed on

[Ref, Fix] use cached list of usable models, convert logits to OneHot for EC as well
64b132e

Joschka Strueber committed on

[Debug] EC error
1f20712

Joschka Strueber committed on

[Add] only load cached models
ec5f717

Joschka Strueber committed on

[Fix] wrong API calls
9f3c166

Joschka Strueber committed on

[Ref] check number of saved and loaded models
e604b65

Joschka Strueber committed on

[Ref] check number of saved models
715aed5

Joschka Strueber committed on

[Fix] add check if cached files have been saved
81438ca

Joschka Strueber committed on

[Fix, Add] check for #api calls, bug in warning
047f32f

Joschka Strueber committed on

[Add, Fix] add loading mechanism for cached models, change error to warning when computing heatmap
93d753c

Joschka Strueber committed on

[Add] saving unblocked models as file to read from
1e010df

Joschka Strueber committed on

[Fix, Add] fix bug with metric names
d2471f2

Joschka Strueber committed on

[Fix] catch all errors from API access
1072829

Joschka Strueber committed on

[Add] cache loading data from hf
e64ca4e

Joschka Strueber committed on

[Add, Fix] change to CAPA, fix error in dataloading
ce6be70

Joschka Strueber committed on

[Add] filter gated models
5d4059c

Joschka Strueber committed on

[Add] error messages
75b9622

Joschka Strueber committed on

[Fix] convert logits to softmax for kappa_p
00b5438

Joschka Strueber committed on

[Fix] sim check for gts values
65ef274

Joschka Strueber committed on

[Fix] import error
8fd0ade

Joschka Strueber committed on

[Add, Ref] integrate similarity computation, fix one-hot for EC, add login option
0f7de99

Joschka Strueber committed on

[Add] load models and datasets from hub, compute similarities
a48b15f

Joschka Strueber committed on

[Add, Ref] matplotlib test, random test value for sim
874e761

Joschka Strueber committed on

[Ref] error handling
30bd486

Joschka Strueber committed on

[Ref] error handling
140bdab

Joschka Strueber committed on

[Fix] import error
2535891

Joschka Strueber committed on

[Add, Ref] pairwise sim, data loading, simple number example demo
f3cd231

Joschka Strueber committed on

[Fix] rendering issues
4adb140

Joschka Strueber committed on

[Fix] plotly heatmap
3b16cfa

Joschka Strueber committed on

[Ref] test list of models
c192c72

Joschka Strueber committed on

[Ref] test list of models
482c272

Joschka Strueber committed on

[Add] clear button, load the right data, create plot on click
53d5dd8

Joschka Strueber committed on

[Add] create heatmaps for multiselection
e1a6930

Joschka Strueber committed on

[Fix] load models from leaderboard
228927e

Joschka Strueber committed on