---
title: Gaia Llamaindex Agent
emoji: 🦙
colorFrom: red
colorTo: pink
sdk: docker
app_file: app.py
pinned: false
short_description: Test To Pass GAIA
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference




# 🦙 GAIA Benchmark Agent with LlamaIndex

This Space implements a complete LlamaIndex agent designed to tackle the GAIA (General AI Assistants) benchmark questions.

## Features

- **Local LLM**: Runs the model inside the Hugging Face Space, with no external LLM API required
- **LlamaIndex Integration**: Uses ReAct agent framework for reasoning and tool use
- **GAIA API Integration**: Fetches questions and submits answers automatically
- **Tool Suite**: Web search, calculation, file reading, and more
- **User-Friendly Interface**: Gradio UI for testing and submission

## Architecture

```
📦 GAIA Agent
├── 🧠 Local LLM (DialoGPT/GPT-2)
├── 🔧 Agent Tools
│   ├── Web Search
│   ├── Calculator
│   ├── File Reader
│   └── GAIA API Client
├── 🤖 ReAct Agent (LlamaIndex)
└── 🖥️ Gradio Interface
```
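
As an illustration of how one of the tools above might look, here is a minimal sketch of a calculator written as a plain Python function. The name `calculate` and the set of supported operators are assumptions made for this example and are not taken from the Space's code. It parses the expression with `ast` instead of calling `eval()`, so only basic arithmetic is accepted.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression (hypothetical tool function)."""

    def _eval(node: ast.AST):
        # Plain numbers (bool is excluded even though it subclasses int).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)
```

A function like this could then be exposed to the ReAct agent, for example by wrapping it with LlamaIndex's `FunctionTool.from_defaults(fn=calculate)`; whether the Space registers its tools this way is not confirmed here.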

## Usage

1. **Test Single Questions**: Try individual GAIA questions
2. **Full Evaluation**: Process all 20 questions from the dataset
3. **Submit to GAIA**: Send answers for official scoring
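
The full-evaluation step can be sketched as a loop that runs the agent on each question and collects the answers for submission. This is a minimal sketch under stated assumptions: `run_evaluation`, the `agent` callable, and the field names `task_id`, `question`, and `submitted_answer` are hypothetical and may not match the Space's code or the GAIA API.

```python
from typing import Callable, Iterable


def run_evaluation(
    questions: Iterable[dict], agent: Callable[[str], str]
) -> list[dict]:
    """Run `agent` on every question and collect one answer per task.

    Field names are assumptions for this sketch, not the confirmed API schema.
    """
    answers = []
    for item in questions:
        try:
            answer = agent(item["question"])
        except Exception as exc:
            # Record the failure and continue with the remaining questions.
            answer = f"ERROR: {exc}"
        answers.append(
            {"task_id": item["task_id"], "submitted_answer": str(answer).strip()}
        )
    return answers
```

Catching exceptions per question keeps one failing task from aborting the whole run, which matters when every question counts toward the score.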

## Scoring Target

The goal is to reach **30% accuracy** on GAIA Level 1 questions, which corresponds to 6 correct answers if all 20 evaluation questions are scored.
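
To check a local run against this target, accuracy can be computed as the fraction of matching answers. The sketch below assumes case-insensitive exact matching after trimming whitespace; the official scorer may apply different normalization. The names `accuracy` and `meets_target` are hypothetical.

```python
def accuracy(predictions: dict[str, str], references: dict[str, str]) -> float:
    """Fraction of reference answers matched by the predictions.

    Matching rule (an assumption): case-insensitive, whitespace-trimmed equality.
    Tasks missing from `predictions` count as wrong.
    """
    if not references:
        return 0.0
    correct = sum(
        1
        for task_id, truth in references.items()
        if predictions.get(task_id, "").strip().lower() == truth.strip().lower()
    )
    return correct / len(references)


def meets_target(score: float, target: float = 0.30) -> bool:
    """Return True when the score reaches the 30% goal."""
    return score >= target
```

With 20 questions, `meets_target` becomes true once at least 6 answers match.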

## Hardware Requirements

- CPU: Works on free tier
- Memory: ~8GB recommended
- GPU: Optional but improves performance

## Getting Started

1. Clone or duplicate this Space
2. Run the application
3. Start with single question testing
4. Process all questions when ready
5. Submit to GAIA leaderboard

Built with ❤️ for the GAIA benchmark challenge!