TITLE = """

🤗 Open LLM-Perf Leaderboard 🏋️

""" INTRODUCTION_TEXT = f""" The 🤗 Open LLM-Perf Leaderboard 🏋️ aims to benchmark the performance (latency & throughput) of Large Language Models (LLMs) with different hardwares, backends and optimizations using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) and [Optimum](https://github.com/huggingface/optimum) flavors. Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking: - Model evaluation requests should be made in the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and will be added to the 🤗 Open LLM-Perf Leaderboard 🏋️ automatically. - Hardware/Backend/Optimization performance requests should be made in the [community discussions](https://huggingface.co/spaces/optimum/llm-perf-leaderboard/discussions) to assess their relevance and feasibility. """ A100_TEXT = """

Single-GPU Benchmark (1xA100):

""" ABOUT_TEXT = """

About the benchmarks:

- The performance benchmarks were obtained using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark).
- Throughput is measured in tokens per second when generating 1000 tokens with a batch size of 1.
- Peak memory is measured in MB during the first forward pass of the model (no warmup).
- Open LLM Score is the average evaluation score obtained from the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
- Open LLM Tradeoff is the euclidean distance between an LLM and the "perfect LLM" (i.e. 0 latency and 100% accuracy), reflecting the tradeoff between latency and accuracy.
"""

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results."
CITATION_BUTTON_TEXT = r"""@misc{open-llm-perf-leaderboard,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  title = {Open LLM-Perf Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/optimum/llm-perf-leaderboard}",
}
@software{optimum-benchmark,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  publisher = {Hugging Face},
  title = {Optimum-Benchmark: A framework for benchmarking the performance of Transformers models with different hardwares, backends and optimizations.},
}
"""
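# The derived metrics described in ABOUT_TEXT can be sketched as follows.
# This is a hypothetical illustration, not the leaderboard's actual code:
# `max_latency_s` is an assumed normalization constant introduced here so
# that both axes of the tradeoff distance lie in [0, 1].

```python
import math


def throughput_tokens_per_s(num_tokens: int, generation_time_s: float) -> float:
    # Tokens per second for one decoding run,
    # e.g. generating 1000 tokens with a batch size of 1.
    return num_tokens / generation_time_s


def open_llm_tradeoff(latency_s: float, open_llm_score: float, max_latency_s: float) -> float:
    # Euclidean distance to the "perfect LLM" (0 latency, 100% accuracy),
    # with latency normalized by the assumed maximum latency.
    lat = latency_s / max_latency_s
    err = 1.0 - open_llm_score / 100.0
    return math.sqrt(lat**2 + err**2)
```

# A model with zero latency and a 100% Open LLM Score would sit at distance 0;
# lower distances mean a better latency/accuracy tradeoff.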