Commit 96e3dc1 (parent 4b5b795): add link to evaluation code

Files changed: content.py (+7, -3)
```diff
@@ -11,7 +11,11 @@ benchname = 'KOFFVQA'
 Bottom_logo = f'''<img src="data:image/jpeg;base64,{bottom_logo}" style="width:20%;display:block;margin-left:auto;margin-right:auto">'''
 
 intro_md = f'''
-#
+# {benchname} Leaderboard
+
+* [Dataset](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data)
+* [Evaluation Code](https://github.com/maum-ai/KOFFVQA)
+* Report (coming soon)
 
 {benchname}π is a Free-Form VQA benchmark dataset designed to evaluate Vision-Language Models (VLMs) in Korean language environments. Unlike traditional multiple-choice or predefined answer formats, KOFFVQA challenges models to generate open-ended, natural-language answers to visually grounded questions. This allows for a more comprehensive assessment of a model's ability to understand and generate nuanced Korean responses.
 
@@ -29,10 +33,10 @@ The {benchname} benchmark is designed to evaluate and compare the performance of
 
 This benchmark includes a total of 275 Korean questions across 10 tasks. The questions are open-ended, free-form VQA (Visual Question Answering) with objective answers, allowing responses without strict format constraints.
 
-We will add more information about this benchmark soon.
-
 ## News
 
+* **2025-01-21**: [Evaluation code](https://github.com/maum-ai/KOFFVQA) release
+
 * **2024-12-06**: Leaderboard Release!
 
 '''.strip()
```
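Both hunks edit a module-level f-string template that the Space renders as the leaderboard intro. A minimal sketch of that pattern after this commit, reduced to just the lines the diff touches (`Bottom_logo` and the rest of the module are omitted):

```python
# Sketch of the intro_md construction in content.py after commit 96e3dc1.
# Only the diff-touched lines are reproduced; the full file contains more text.
benchname = 'KOFFVQA'

intro_md = f'''
# {benchname} Leaderboard

* [Dataset](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data)
* [Evaluation Code](https://github.com/maum-ai/KOFFVQA)
* Report (coming soon)
'''.strip()

# .strip() removes the leading/trailing newlines that the triple-quoted
# literal picks up, so the rendered markdown starts at the heading.
print(intro_md.splitlines()[0])  # -> # KOFFVQA Leaderboard
```

The `.strip()` call is what lets the template be written with the opening `'''` on its own line while still emitting markdown that begins directly with the `#` heading.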