[Ref, Fix] indentation error in answer key selection, longer explanation in demo, exclusion of broken dataset c608f7f Joschka Strueber committed on 7 days ago
[Fix] error in dataset name, error in digit check for str 3dfa66b Joschka Strueber committed on 7 days ago
[Add] add bbh and gpqa benchmarks again with correct answer_index selection 0a42e99 Joschka Strueber committed on 7 days ago
[Ref] apply custom css to heatmap, increase size of images 4077e51 Joschka Strueber committed on 7 days ago
[Ref, Add] custom css for sizing, move demo utility to its own file bd28414 Joschka Strueber committed on 7 days ago
[Add, Ref] Add more info and table on metric, move model list to data/ b90e0d3 Joschka Strueber committed on 7 days ago
[Fix, Debug] wrong default model, check filter_labels 4b2993a Joschka Strueber committed on 8 days ago
[Ref, Fix] use cached list of usable models, convert logits to OneHot for EC as well 64b132e Joschka Strueber committed on 8 days ago
[Add, Fix] add loading mechanism for cached models, change error to warning when computing heatmap 93d753c Joschka Strueber committed on 8 days ago
[Add, Ref] integrate similarity computation, fix one-hot for EC, add login option 0f7de99 Joschka Strueber committed on 9 days ago
[Add] load models and datasets from hub, compute similarities a48b15f Joschka Strueber committed on 9 days ago
[Add, Ref] matplotlib test, random test value for sim 874e761 Joschka Strueber committed on 10 days ago
[Add, Ref] pairwise sim, data loading, simple number example demo f3cd231 Joschka Strueber committed on 10 days ago
[Add] clear button, load the right data, create plot on click 53d5dd8 Joschka Strueber committed on 10 days ago