HumanEval-V Benchmark Viewer
A simple data viewer for the HumanEval-V benchmark.
Visual-Centric Coding Tasks for Large Multimodal Models
📄 Paper • 🏠 Home Page • 💻 GitHub Repository • 🏆 Leaderboard • 🤗 Dataset • 🤗 Dataset Viewer
HumanEval-V is a novel benchmark designed to evaluate the diagram understanding and reasoning capabilities of Large Multimodal Models (LMMs) in programming contexts. Unlike existing benchmarks, HumanEval-V focuses on coding tasks that require sophisticated visual reasoning over complex diagrams, pushing the boundaries of LMMs' ability to comprehend and process visual information. The dataset includes 253 human-annotated Python coding tasks, each featuring a critical, self-explanatory diagram with minimal textual clues. These tasks require LMMs to generate Python code based on the visual context and predefined function signatures.
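Below is a minimal sketch of how the benchmark data might be loaded and inspected with the Hugging Face `datasets` library. The dataset id, split name, and field names (`image`, `function_signature`) are assumptions for illustration; refer to the dataset card for the exact schema.

```python
# Minimal sketch: load HumanEval-V and inspect one task.
# The dataset id, split, and field names below are assumptions --
# check the dataset card for the actual values.
from datasets import load_dataset

# Hypothetical dataset id and split
ds = load_dataset("HumanEval-V/HumanEval-V-Benchmark", split="test")

task = ds[0]
print(task.keys())  # list the fields available for each task

# Assumed fields: the task's diagram (a PIL image) and the
# predefined function signature the model must implement.
task["image"].show()
print(task["function_signature"])
```

Each task pairs a single diagram with a function signature, so a model's prediction is the Python function body it generates from that visual context.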
Key features: