---
title: AI Language Monitor
emoji: 🌍
colorFrom: purple
colorTo: pink
sdk: docker
app_port: 8000
license: cc-by-sa-4.0
short_description: Evaluating LLM performance across all human languages.
datasets:
  - openlanguagedata/flores_plus
  - google/fleurs
  - mozilla-foundation/common_voice_1_0
  - CohereForAI/Global-MMLU
models:
  - meta-llama/Llama-3.3-70B-Instruct
  - mistralai/Mistral-Small-24B-Instruct-2501
  - deepseek-ai/DeepSeek-V3
  - microsoft/phi-4
  - openai/whisper-large-v3
  - google/gemma-3-27b-it
tags:
  - leaderboard
  - submission:manual
  - test:public
  - judge:auto
  - modality:text
  - modality:artefacts
  - eval:generation
  - language:English
  - language:German
---


# AI Language Monitor 🌍

Tracking language proficiency of AI models for every language

## System Architecture

The AI Language Monitor evaluates language models across 100+ languages using a comprehensive pipeline that combines model discovery, automated evaluation, and real-time visualization.

**Detailed Architecture:** See [`system_architecture_diagram.md`](system_architecture_diagram.md) for the complete diagram and component descriptions.

**Key Features:**

- **Model Discovery:** combines curated models with real-time trending models discovered via web scraping
- **Multi-Task Evaluation:** 7 tasks across 100+ languages, with origin tracking (human vs. machine-translated data); see the sketch after this list
- **Scalable Architecture:** dual deployment (local/GitHub vs. Google Cloud)
- **Real-time Visualization:** interactive web interface with country-level insights
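
To make the pipeline concrete, here is a minimal, illustrative sketch of how a curated model list and scraped trending models could be merged into per-language evaluation tasks with origin tracking. All names in it (`CURATED_MODELS`, `fetch_trending_models`, `EvalTask`, the language codes) are hypothetical examples and do not reflect the actual layout of the `evals/` package.

```python
# Illustrative sketch only: names and structure are hypothetical,
# not the actual implementation in evals/main.py.
from dataclasses import dataclass

# Curated models (a subset of those listed in the metadata above).
CURATED_MODELS = [
    "meta-llama/Llama-3.3-70B-Instruct",
    "deepseek-ai/DeepSeek-V3",
    "google/gemma-3-27b-it",
]

@dataclass
class EvalTask:
    model: str
    task: str      # e.g. "translation", "mmlu", "asr"
    language: str  # e.g. a FLORES+ code such as "eng_Latn"
    origin: str    # "human" or "machine" (origin tracking)

def fetch_trending_models() -> list[str]:
    """Placeholder for the web-scraping step that discovers trending models."""
    return []

def build_tasks(languages: list[str], tasks: list[str]) -> list[EvalTask]:
    """Cross every discovered model with every task/language pair."""
    models = sorted(set(CURATED_MODELS) | set(fetch_trending_models()))
    return [
        EvalTask(model=m, task=t, language=lang, origin="human")
        for m in models
        for t in tasks
        for lang in languages
    ]

if __name__ == "__main__":
    tasks = build_tasks(languages=["eng_Latn", "deu_Latn"], tasks=["translation"])
    print(f"{len(tasks)} evaluation tasks")
```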

## Evaluate

### Local Development

```sh
uv run --extra dev evals/main.py
```

## Explore

```sh
uv run evals/backend.py
cd frontend && npm i && npm start
```