TITLE = """<h1 align="center" id="space-title">🤗 Open LLM-Perf Leaderboard 🏋️</h1>"""
INTRODUCTION_TEXT = f"""
The 🤗 Open LLM-Perf Leaderboard 🏋️ aims to benchmark the performance (latency, throughput & memory) of Large Language Models (LLMs) on different hardware and with different backends and optimizations, using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) and [Optimum](https://github.com/huggingface/optimum) flavors.
Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking:
- Model evaluation requests should be made in the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and will be added to the 🤗 Open LLM-Perf Leaderboard 🏋️ automatically.
- Hardware/Backend/Optimization performance requests should be made in the [community discussions](https://huggingface.co/spaces/optimum/llm-perf-leaderboard/discussions) to assess their relevance and feasibility.
"""
ABOUT_TEXT = """<h3>About the 🤗 Open LLM-Perf Leaderboard 🏋️</h3>
<ul>
<li>To avoid communication-dependent results, only one GPU is used.</li>
<li>LLMs are evaluated with a batch size of 1, a prompt of 512 tokens, and 1000 generated tokens.</li>
<li>Peak memory is measured in MB during the generate pass with py3nvml, while ensuring the GPU is isolated (see the sketch below).</li>
<li>Each (Model Type, Weight Class) pair is represented by its best-scoring model. This LLM is then used for all hardware/backend/optimization experiments.</li>
<li>Score is the average evaluation score obtained from the <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">🤗 Open LLM Leaderboard</a>.</li>
</ul>
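
As a rough illustration of the peak-memory measurement, here is a minimal sketch (not the leaderboard's exact code) that polls GPU memory with py3nvml from a background thread while a generate pass runs; the device index and reporting helper are assumptions for the example:

```python
# Minimal sketch: poll GPU memory with py3nvml while generation runs.
# Assumes the benchmarked process is alone on device 0 (GPU isolation).
import threading
from py3nvml import py3nvml

def track_peak_memory(stop_event: threading.Event, result: dict, device: int = 0):
    py3nvml.nvmlInit()
    handle = py3nvml.nvmlDeviceGetHandleByIndex(device)
    peak_bytes = 0
    while not stop_event.is_set():
        # nvmlDeviceGetMemoryInfo reports used memory in bytes
        peak_bytes = max(peak_bytes, py3nvml.nvmlDeviceGetMemoryInfo(handle).used)
    result["peak_memory_mb"] = peak_bytes / 1e6  # report in MB
    py3nvml.nvmlShutdown()

# Usage: start the tracker thread, call model.generate(...), then set stop_event.
```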
"""
EXAMPLE_CONFIG_TEXT = """
Here's an example of the configuration file used to benchmark the models with Optimum-Benchmark:
```yaml
defaults:
  - backend: pytorch # default backend
  - benchmark: inference # default benchmark
  - experiment # inheriting from experiment config
  - _self_ # for hydra 1.1 compatibility
  - override hydra/job_logging: colorlog # colorful logging
  - override hydra/hydra_logging: colorlog # colorful logging

hydra:
  run:
    dir: llm-experiments/{experiment_name}
  job:
    chdir: true

experiment_name: {experiment_name}
model: {model}
device: cuda

backend:
  no_weights: true
  delete_cache: true
  torch_dtype: float16
  quantization_strategy: gptq
  bettertransformer: true

benchmark:
  memory: true
  input_shapes:
    batch_size: 1
    sequence_length: 512
  new_tokens: 1000
```
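
Assuming the config above is saved as `configs/experiment.yaml` (with the `{model}` and `{experiment_name}` placeholders filled in), a run could then be launched along these lines; the `optimum-benchmark` entrypoint name is an assumption here, so check the Optimum-Benchmark README for the exact invocation:

```python
# Illustrative launcher (not the leaderboard's harness): invokes the
# Hydra-based CLI on the config above. Assumes the config was saved as
# ./configs/experiment.yaml and the `optimum-benchmark` entrypoint is installed.
import subprocess

subprocess.run(
    ["optimum-benchmark", "--config-dir", "configs", "--config-name", "experiment"],
    check=True,
)
```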
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results."
CITATION_BUTTON_TEXT = r"""@misc{open-llm-perf-leaderboard,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  title = {Open LLM-Perf Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/optimum/llm-perf-leaderboard}",
}
@software{optimum-benchmark,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  publisher = {Hugging Face},
  title = {Optimum-Benchmark: A framework for benchmarking the performance of Transformers models across different hardware, backends and optimizations.},
}
"""