metadata

title: README
emoji: 💻
colorFrom: blue
colorTo: red
sdk: static
pinned: false

CIIRC CTU

Our team at the Czech Institute of Informatics, Robotics and Cybernetics focuses on developing NLP applications that use large language models. Apart from creating custom solutions for our worldwide partners, we aim to support the local NLP community by developing Czech-enabled LLMs and complementary evaluation tools.

CzechBench

Selecting the most capable model for a specific task and language is crucial for ensuring optimal performance, so we concentrated our efforts on developing a Czech-focused LLM evaluation suite. CzechBench, available on GitHub, is a collection of Czech evaluation tasks selected to assess multiple aspects of LLM capabilities. The suite now builds on the Language Model Evaluation Harness framework, providing an environment most LLM developers are already familiar with. Models are evaluated end to end, using only their final textual outputs, which allows direct performance comparison across both open-source and proprietary LLMs.
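
As a rough illustration of what scoring on final textual outputs means, the sketch below evaluates any model that can be wrapped as a plain prompt-to-text function, which is why the same procedure applies to a local open-source model or a proprietary API. This is not CzechBench's actual implementation: `normalize`, `exact_match_accuracy`, `toy_model`, and the sample data are hypothetical names chosen for illustration, and exact match is only one of many possible metrics.

```python
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(
    generate: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> float:
    """Score a model using only the text it returns for each prompt."""
    pairs = list(examples)
    if not pairs:
        raise ValueError("examples must not be empty")
    hits = sum(
        normalize(generate(prompt)) == normalize(reference)
        for prompt, reference in pairs
    )
    return hits / len(pairs)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model; any callable returning text would work."""
    answers = {"Hlavní město České republiky je": "  Praha "}
    return answers.get(prompt, "nevím")


# Illustrative (prompt, reference) pairs, not taken from CzechBench.
examples = [
    ("Hlavní město České republiky je", "praha"),
    ("2 + 2 =", "4"),
]
score = exact_match_accuracy(toy_model, examples)
```

Because the scorer never inspects logits or model internals, swapping `toy_model` for a function that calls a hosted API or a local checkpoint would leave the rest of the code unchanged.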

To start evaluating your own Czech-enabled models, you can follow the instructions on GitHub. We are currently working on providing an open leaderboard for CzechBench to allow for efficient sharing of evaluation results.