Commit 27e5b96 · Parent: d8147b8
[ADD] Model submission guide and citation

Files changed:
- app.py +5 -0
- src/about.py +41 -21
app.py
CHANGED
@@ -371,8 +371,13 @@ with demo:
         )

         with gr.TabItem("π Open Ended Evaluation", elem_id="llm-benchmark-tab-table", id=1):
+            gr.Markdown("# Coming Soon!!!", elem_classes="markdown-text")
             pass
         with gr.TabItem("π Med Safety", elem_id="llm-benchmark-tab-table", id=2):
+            gr.Markdown("# Coming Soon!!!", elem_classes="markdown-text")
+            pass
+        with gr.TabItem("π Med Safety", elem_id="llm-benchmark-tab-table", id=2):
+            gr.Markdown("# Coming Soon!!!", elem_classes="markdown-text")
             pass

         with gr.TabItem("π About", elem_id="llm-benchmark-tab-table", id=3):
src/about.py
CHANGED
@@ -227,41 +227,61 @@ Users are advised to approach the results with an understanding of the inherent

 EVALUATION_QUEUE_TEXT = """

-Currently, the benchmark supports evaluation for models hosted on the huggingface hub and of type
-If your model needs a custom implementation, follow the steps outlined in the [clinical_ner_benchmark](https://github.com/WadoodAbdul/clinical_ner_benchmark/blob/e66eb566f34e33c4b6c3e5258ac85aba42ec7894/docs/custom_model_implementation.md) repo or reach out to our team!
-
-- Decoder: Transformer based autoregressive token generation model.
-- GLiNER: Architecture outlined in the [GLiNER Paper](https://arxiv.org/abs/2311.08526)
-
+Currently, the benchmark supports evaluation for models hosted on the huggingface hub and of decoder type. It doesn't support adapter models yet, but we will add support for adapters soon.
+
+## Submission Guide for the MEDIC Benchmark
+
+## First Steps Before Submitting a Model
+
+### 1. Ensure Your Model Loads with AutoClasses
+Verify that you can load your model and tokenizer using AutoClasses:
+```python
+from transformers import AutoConfig, AutoModel, AutoTokenizer
+config = AutoConfig.from_pretrained("your model name", revision=revision)
+model = AutoModel.from_pretrained("your model name", revision=revision)
+tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
+```
+Note:
+- If this step fails, debug your model before submitting.
+- Ensure your model is public.
+
+### 2. Convert Weights to Safetensors
+[Safetensors](https://huggingface.co/docs/safetensors/index) is a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!

+### 3. Complete Your Model Card
+When we add extra information about models to the leaderboard, it will be automatically taken from the model card.

+### 4. Select the Correct Model Type
+Choose the correct model category from the options below:
+- π’ : π’ pretrained model: new, base models, trained on a given text corpora using masked modelling, or new, base models continuously trained on further corpus (which may include IFT/chat data) using masked modelling
+- β : β fine-tuned models: pretrained models fine-tuned on more data or tasks.
+- π¦ : π¦ preference-tuned models: chat-like fine-tunes, either using IFT (datasets of task instructions), RLHF, or DPO (changing the model loss a bit with an added policy), etc.

+### 5. Select the Correct Precision
+Choose the right precision to avoid evaluation errors:
+- Not all models convert properly from float16 to bfloat16.
+- Incorrect precision can cause issues (e.g., loading a bf16 model in fp16 may generate NaNs).
+- If you select auto, the precision listed under `torch_dtype` in the model config will be used.

+### 6. Medically Oriented Model
+If the model has been specifically built for medical domains, i.e. pretrained/fine-tuned on significant medical data, make sure to check the `Domain specific` checkbox.
+
+### 7. Chat Template
+Select this option if your model uses a chat template. The chat template will be used during evaluation.
+- Before submitting, make sure the chat template is defined in the tokenizer config.

 Upon successful submission of your request, your model's results will be updated on the leaderboard within 5 working days!
 """

 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
 CITATION_BUTTON_TEXT = r"""
-@misc{
-      title={
-      author={
+@misc{kanithi2024mediccomprehensiveframeworkevaluating,
+      title={MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications},
+      author={Praveen K Kanithi and Clément Christophe and Marco AF Pimentel and Tathagata Raha and Nada Saadi and Hamza Javed and Svetlana Maslenkova and Nasir Hayat and Ronnie Rajan and Shadab Khan},
       year={2024},
-      eprint={
+      eprint={2409.07314},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
-      url={https://arxiv.org/abs/
-}
+      url={https://arxiv.org/abs/2409.07314},
+}
 """
|