---

comments: true
description: Explore the Roboflow 100 dataset featuring 100 diverse datasets designed to test object detection models across various domains, from healthcare to video games.
keywords: Roboflow 100, Ultralytics, object detection, dataset, benchmarking, machine learning, computer vision, diverse datasets, model evaluation
---


# Roboflow 100 Dataset

Roboflow 100, developed by [Roboflow](https://roboflow.com/?ref=ultralytics) and sponsored by Intel, is a groundbreaking [object detection](../../tasks/detect.md) benchmark. It includes 100 diverse datasets sampled from over 90,000 public datasets. This benchmark is designed to test the adaptability of models to various domains, including healthcare, aerial imagery, and video games.

<p align="center">
  <img width="640" src="https://user-images.githubusercontent.com/15908060/202452898-9ca6b8f7-4805-4e8e-949a-6e080d7b94d2.jpg" alt="Roboflow 100 Overview">
</p>

## Key Features

- Includes 100 datasets across seven domains: Aerial, Video Games, Microscopic, Underwater, Documents, Electromagnetic, and Real World.
- The benchmark comprises 224,714 images across 805 classes, thanks to over 11,170 hours of labeling efforts.
- All images are resized to 640x640 pixels, with a focus on eliminating class ambiguity and filtering out underrepresented classes.
- Annotations include bounding boxes for objects, making the benchmark suitable for [training](../../modes/train.md) and evaluating object detection models, as shown in the sketch below.
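
Since each dataset is distributed in a standard detection format, a single RF100 dataset can be trained on directly with the Ultralytics API once it has been exported in Ultralytics YOLO format (which includes a `data.yaml`). The snippet below is a minimal sketch; the dataset path is a hypothetical placeholder for wherever you downloaded an individual RF100 dataset:

```python
from ultralytics import YOLO

# Load a pretrained detection model
model = YOLO("yolov8s.pt")

# Train on a single RF100 dataset (the data path is a placeholder for your local copy)
model.train(data="path/to/rf100-dataset/data.yaml", epochs=1, imgsz=640, batch=16)
```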

## Dataset Structure

The Roboflow 100 dataset is organized into seven categories, each with a distinct set of datasets, images, and classes:

- **Aerial**: Consists of 7 datasets with a total of 9,683 images, covering 24 distinct classes.
- **Video Games**: Includes 7 datasets, featuring 11,579 images across 88 classes.
- **Microscopic**: Comprises 11 datasets with 13,378 images, spanning 28 classes.
- **Underwater**: Contains 5 datasets, encompassing 18,003 images in 39 classes.
- **Documents**: Consists of 8 datasets with 24,813 images, divided into 90 classes.
- **Electromagnetic**: Made up of 12 datasets, totaling 36,381 images in 41 classes.
- **Real World**: The largest category with 50 datasets, offering 110,615 images across 495 classes.

This structure enables a diverse and extensive testing ground for object detection models, reflecting real-world application scenarios.

## Benchmarking

Dataset benchmarking evaluates machine learning model performance on specific datasets using standardized metrics such as accuracy, mean Average Precision (mAP), and F1-score.
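
For a quick look at these metrics on a single dataset, a trained checkpoint can be validated with the Ultralytics Python API before running the full RF100 loop shown below. This is a minimal sketch; the checkpoint and dataset paths are placeholders:

```python
from ultralytics import YOLO

# Load a trained checkpoint (path is a placeholder)
model = YOLO("runs/detect/train/weights/best.pt")

# Validate on one dataset and print the standard detection metrics
metrics = model.val(data="path/to/rf100-dataset/data.yaml")
print(f"mAP50-95: {metrics.box.map:.3f}")
print(f"mAP50: {metrics.box.map50:.3f}")
print(f"Mean precision: {metrics.box.mp:.3f}, mean recall: {metrics.box.mr:.3f}")
```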

!!! Tip "Benchmarking"

    Benchmarking results will be stored in "ultralytics-benchmarks/evaluation.txt"


!!! Example "Benchmarking example"

    === "Python"


        ```python
        import os
        import shutil
        from pathlib import Path

        from ultralytics.utils.benchmarks import RF100Benchmark

        # Initialize RF100Benchmark and set API key
        benchmark = RF100Benchmark()
        benchmark.set_key(api_key="YOUR_ROBOFLOW_API_KEY")

        # Parse dataset and define file paths
        names, cfg_yamls = benchmark.parse_dataset()
        val_log_file = Path("ultralytics-benchmarks") / "validation.txt"
        eval_log_file = Path("ultralytics-benchmarks") / "evaluation.txt"

        # Run benchmarks on each dataset in RF100
        for ind, path in enumerate(cfg_yamls):
            path = Path(path)
            if path.exists():
                # Fix YAML file and run training
                benchmark.fix_yaml(str(path))
                os.system(f"yolo detect train data={path} model=yolov8s.pt epochs=1 batch=16")

                # Run validation and evaluate
                os.system(f"yolo detect val data={path} model=runs/detect/train/weights/best.pt > {val_log_file} 2>&1")
                benchmark.evaluate(str(path), str(val_log_file), str(eval_log_file), ind)

                # Remove the 'runs' directory
                runs_dir = Path.cwd() / "runs"
                shutil.rmtree(runs_dir)
            else:
                print("YAML file path does not exist")
                continue

        print("RF100 Benchmarking completed!")
        ```


## Applications

Roboflow 100 is invaluable for various applications related to computer vision and deep learning. Researchers and engineers can use this benchmark to:

- Evaluate the performance of object detection models in a multi-domain context.
- Test the adaptability of models to real-world scenarios beyond common object recognition.
- Benchmark the capabilities of object detection models across diverse datasets, including those in healthcare, aerial imagery, and video games.

For more ideas and inspiration on real-world applications, be sure to check out [our guides on real-world projects](../../guides/index.md).

## Usage

The Roboflow 100 dataset is available on both [GitHub](https://github.com/roboflow/roboflow-100-benchmark) and [Roboflow Universe](https://universe.roboflow.com/roboflow-100).

You can access the full benchmark from the Roboflow 100 GitHub repository, or download individual datasets from Roboflow Universe by clicking the export button on each dataset page.
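
Individual datasets can also be pulled programmatically with the `roboflow` Python package (`pip install roboflow`). The sketch below is a minimal example; the project name and version are hypothetical placeholders, so substitute the identifiers of the RF100 dataset you want from Roboflow Universe:

```python
from roboflow import Roboflow

# Authenticate with your Roboflow API key
rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")

# Select a dataset from the Roboflow 100 workspace (project slug and version are placeholders)
project = rf.workspace("roboflow-100").project("example-dataset")
dataset = project.version(1).download("yolov8")

print(dataset.location)  # local folder containing the images, labels, and data.yaml
```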

## Sample Data and Annotations

Roboflow 100 consists of datasets with diverse images and videos captured from various angles and domains. Here's a look at examples of annotated images in the RF100 benchmark.

<p align="center">
  <img width="640" src="https://blog.roboflow.com/content/images/2022/11/image-2.png" alt="Sample Data and Annotations">
</p>

The diversity of the Roboflow 100 benchmark, visible in the examples above, is a significant advancement over traditional benchmarks, which often focus on optimizing a single metric within a limited domain.

## Citations and Acknowledgments

If you use the Roboflow 100 dataset in your research or development work, please cite the following paper:

!!! Quote ""

    === "BibTeX"


        ```bibtex
        @misc{2211.13523,
            Author = {Floriana Ciaglia and Francesco Saverio Zuppichini and Paul Guerrie and Mark McQuade and Jacob Solawetz},
            Title = {Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
            Eprint = {arXiv:2211.13523},
        }
        ```


Our thanks go to the Roboflow team and all the contributors for their hard work in creating and sustaining the Roboflow 100 dataset.

If you are interested in exploring more datasets to enhance your object detection and machine learning projects, feel free to visit [our comprehensive dataset collection](../index.md).

## FAQ

### What is the Roboflow 100 dataset, and why is it significant for object detection?

The **Roboflow 100** dataset, developed by [Roboflow](https://roboflow.com/?ref=ultralytics) and sponsored by Intel, is a crucial [object detection](../../tasks/detect.md) benchmark. It features 100 diverse datasets from over 90,000 public datasets, covering domains such as healthcare, aerial imagery, and video games. This diversity ensures that models can adapt to various real-world scenarios, enhancing their robustness and performance.

### How can I use the Roboflow 100 dataset for benchmarking my object detection models?

To use the Roboflow 100 dataset for benchmarking, you can leverage the RF100Benchmark class from the Ultralytics library. Here's a brief example:

!!! Example "Benchmarking example"

    === "Python"

    

        ```python
        import os
        import shutil
        from pathlib import Path

        from ultralytics.utils.benchmarks import RF100Benchmark

        # Initialize RF100Benchmark and set API key
        benchmark = RF100Benchmark()
        benchmark.set_key(api_key="YOUR_ROBOFLOW_API_KEY")

        # Parse dataset and define file paths
        names, cfg_yamls = benchmark.parse_dataset()
        val_log_file = Path("ultralytics-benchmarks") / "validation.txt"
        eval_log_file = Path("ultralytics-benchmarks") / "evaluation.txt"

        # Run benchmarks on each dataset in RF100
        for ind, path in enumerate(cfg_yamls):
            path = Path(path)
            if path.exists():
                # Fix YAML file and run training
                benchmark.fix_yaml(str(path))
                os.system(f"yolo detect train data={path} model=yolov8s.pt epochs=1 batch=16")

                # Run validation and evaluate
                os.system(f"yolo detect val data={path} model=runs/detect/train/weights/best.pt > {val_log_file} 2>&1")
                benchmark.evaluate(str(path), str(val_log_file), str(eval_log_file), ind)

                # Remove 'runs' directory
                runs_dir = Path.cwd() / "runs"
                shutil.rmtree(runs_dir)
            else:
                print("YAML file path does not exist")
                continue

        print("RF100 Benchmarking completed!")
        ```


### Which domains are covered by the Roboflow 100 dataset?

The **Roboflow 100** dataset spans seven domains, each providing unique challenges and applications for object detection models:

1. **Aerial**: 7 datasets, 9,683 images, 24 classes
2. **Video Games**: 7 datasets, 11,579 images, 88 classes
3. **Microscopic**: 11 datasets, 13,378 images, 28 classes
4. **Underwater**: 5 datasets, 18,003 images, 39 classes
5. **Documents**: 8 datasets, 24,813 images, 90 classes
6. **Electromagnetic**: 12 datasets, 36,381 images, 41 classes
7. **Real World**: 50 datasets, 110,615 images, 495 classes

This setup allows for extensive and varied testing of models across different real-world applications.

### How do I access and download the Roboflow 100 dataset?

The **Roboflow 100** dataset is accessible on [GitHub](https://github.com/roboflow/roboflow-100-benchmark) and [Roboflow Universe](https://universe.roboflow.com/roboflow-100). You can download the entire dataset from GitHub or select individual datasets on Roboflow Universe using the export button.

### What should I include when citing the Roboflow 100 dataset in my research?

When using the Roboflow 100 dataset in your research, make sure to cite it properly. Here is the recommended citation:

!!! Quote ""

    === "BibTeX"


        ```bibtex
        @misc{2211.13523,
            Author = {Floriana Ciaglia and Francesco Saverio Zuppichini and Paul Guerrie and Mark McQuade and Jacob Solawetz},
            Title = {Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
            Eprint = {arXiv:2211.13523},
        }
        ```


For more details, you can refer to our [comprehensive dataset collection](../index.md).