Text Recognition
Overview
The text recognition dataset directory is organized as follows.
├── mixture
│   ├── coco_text
│   │   ├── train_label.txt
│   │   ├── train_words
│   ├── icdar_2011
│   │   ├── training_label.txt
│   │   ├── Challenge1_Training_Task3_Images_GT
│   ├── icdar_2013
│   │   ├── train_label.txt
│   │   ├── test_label_1015.txt
│   │   ├── test_label_1095.txt
│   │   ├── Challenge2_Training_Task3_Images_GT
│   │   ├── Challenge2_Test_Task3_Images
│   ├── icdar_2015
│   │   ├── train_label.txt
│   │   ├── test_label.txt
│   │   ├── ch4_training_word_images_gt
│   │   ├── ch4_test_word_images_gt
│   ├── IIIT5K
│   │   ├── train_label.txt
│   │   ├── test_label.txt
│   │   ├── train
│   │   ├── test
│   ├── ct80
│   │   ├── test_label.txt
│   │   ├── image
│   ├── svt
│   │   ├── test_label.txt
│   │   ├── image
│   ├── svtp
│   │   ├── test_label.txt
│   │   ├── image
│   ├── Syn90k
│   │   ├── shuffle_labels.txt
│   │   ├── label.txt
│   │   ├── label.lmdb
│   │   ├── mnt
│   ├── SynthText
│   │   ├── alphanumeric_labels.txt
│   │   ├── shuffle_labels.txt
│   │   ├── instances_train.txt
│   │   ├── label.txt
│   │   ├── label.lmdb
│   │   ├── synthtext
│   ├── SynthAdd
│   │   ├── label.txt
│   │   ├── label.lmdb
│   │   ├── SynthText_Add
│   ├── TextOCR
│   │   ├── image
│   │   ├── train_label.txt
│   │   ├── val_label.txt
│   ├── Totaltext
│   │   ├── imgs
│   │   ├── annotations
│   │   ├── train_label.txt
│   │   ├── test_label.txt
│   ├── OpenVINO
│   │   ├── image_1
│   │   ├── image_2
│   │   ├── image_5
│   │   ├── image_f
│   │   ├── image_val
│   │   ├── train_1_label.txt
│   │   ├── train_2_label.txt
│   │   ├── train_5_label.txt
│   │   ├── train_f_label.txt
│   │   ├── val_label.txt
│   ├── funsd
│   │   ├── imgs
│   │   ├── dst_imgs
│   │   ├── annotations
│   │   ├── train_label.txt
│   │   ├── test_label.txt
| Dataset | images | annotation file (training) | annotation file (test) |
|---|---|---|---|
| coco_text | homepage | train_label.txt | - |
| icdar_2011 | homepage | train_label.txt | - |
| icdar_2013 | homepage | train_label.txt | test_label_1015.txt |
| icdar_2015 | homepage | train_label.txt | test_label.txt |
| IIIT5K | homepage | train_label.txt | test_label.txt |
| ct80 | homepage | - | test_label.txt |
| svt | homepage | - | test_label.txt |
| svtp | unofficial homepage[1] | - | test_label.txt |
| MJSynth (Syn90k) | homepage | shuffle_labels.txt, label.txt | - |
| SynthText (Synth800k) | homepage | alphanumeric_labels.txt, shuffle_labels.txt, instances_train.txt, label.txt | - |
| SynthAdd | SynthText_Add.zip (code:627x) | label.txt | - |
| TextOCR | homepage | - | - |
| Totaltext | homepage | - | - |
| OpenVINO | Open Images | annotations | annotations |
| FUNSD | homepage | - | - |
[1] Since the official homepage is unavailable now, we provide an alternative for quick reference. However, we do not guarantee the correctness of the dataset.
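All of the datasets are expected to live under data/mixture in the MMOCR root, as shown in the listing above. As an optional sanity check before or after preparing the individual datasets, you can compare your local layout against that listing (this assumes the common tree utility is installed):
# create the shared root for the recognition datasets (if it does not exist yet)
mkdir -p /path/to/mmocr/data/mixture
# compare the local layout against the directory listing above
tree -L 3 /path/to/mmocr/data/mixture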
Preparation Steps
ICDAR 2013
- Step1: Download Challenge2_Test_Task3_Images.zip and Challenge2_Training_Task3_Images_GT.zip from homepage
- Step2: Download test_label_1015.txt and train_label.txt, then arrange the files as sketched below
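A minimal arrangement sketch, assuming the two archives and both label files have already been downloaded to /path/to/; the target layout follows the directory listing above (if an archive already contains a top-level folder of the same name, drop the -d option):
mkdir icdar_2013 && cd icdar_2013
mv /path/to/train_label.txt /path/to/test_label_1015.txt .
# unzip the image archives into folders matching the listing above
unzip /path/to/Challenge2_Training_Task3_Images_GT.zip -d Challenge2_Training_Task3_Images_GT
unzip /path/to/Challenge2_Test_Task3_Images.zip -d Challenge2_Test_Task3_Images
# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/icdar_2013 icdar_2013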
ICDAR 2015
- Step1: Download ch4_training_word_images_gt.zip and ch4_test_word_images_gt.zip from homepage
- Step2: Download train_label.txt and test_label.txt, then arrange the files as sketched below
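The layout mirrors ICDAR 2013 above; a minimal sketch under the same assumptions:
mkdir icdar_2015 && cd icdar_2015
mv /path/to/train_label.txt /path/to/test_label.txt .
unzip /path/to/ch4_training_word_images_gt.zip -d ch4_training_word_images_gt
unzip /path/to/ch4_test_word_images_gt.zip -d ch4_test_word_images_gt
# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/icdar_2015 icdar_2015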
IIIT5K
- Step1: Download IIIT5K-Word_V3.0.tar.gz from homepage
- Step2: Download train_label.txt and test_label.txt, then arrange the files as sketched below
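A minimal arrangement sketch, assuming the archive and both label files are already downloaded; the extracted archive is assumed to provide the train/ and test/ image folders shown in the listing above:
mkdir IIIT5K && cd IIIT5K
mv /path/to/train_label.txt /path/to/test_label.txt .
tar -xzf /path/to/IIIT5K-Word_V3.0.tar.gz
# if the archive extracts into its own IIIT5K/ folder, move train/ and test/ up one level
# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/IIIT5K IIIT5K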
svt
- Step1: Download svt.zip from homepage
- Step2: Download test_label.txt
- Step3:
python tools/data/textrecog/svt_converter.py <download_svt_dir_path>
ct80
- Step1: Download test_label.txt
svtp
- Step1: Download test_label.txt
coco_text
- Step1: Download from homepage
- Step2: Download train_label.txt
MJSynth (Syn90k)
- Step1: Download mjsynth.tar.gz from homepage
- Step2: Download label.txt (8,919,273 annotations) and shuffle_labels.txt (2,400,000 randomly sampled annotations). Please make sure you're using the right annotation to train the model by checking its dataset specs in Model Zoo.
- Step3:
mkdir Syn90k && cd Syn90k
mv /path/to/mjsynth.tar.gz .
tar -xzf mjsynth.tar.gz
mv /path/to/shuffle_labels.txt .
mv /path/to/label.txt .
# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/Syn90k Syn90k
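Optionally, verify that the downloaded annotation files match the sizes quoted in Step2 (the expected line counts are taken from the numbers above):
# expected: 8,919,273 lines in label.txt and 2,400,000 in shuffle_labels.txt
wc -l /path/to/Syn90k/label.txt /path/to/Syn90k/shuffle_labels.txt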
SynthText (Synth800k)
- Step1: Download SynthText.zip from homepage
- Step2: According to your actual needs, download the most appropriate one from the following options: label.txt (7,266,686 annotations), shuffle_labels.txt (2,400,000 randomly sampled annotations), alphanumeric_labels.txt (7,239,272 annotations with alphanumeric characters only) and instances_train.txt (7,266,686 character-level annotations). 
:::{warning}
Please make sure you're using the right annotation to train the model by checking its dataset specs in Model Zoo.
:::
- Step3:
mkdir SynthText && cd SynthText
mv /path/to/SynthText.zip .
unzip SynthText.zip
mv SynthText synthtext
mv /path/to/shuffle_labels.txt .
mv /path/to/label.txt .
mv /path/to/alphanumeric_labels.txt .
mv /path/to/instances_train.txt .
# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/SynthText SynthText
- Step4: Generate cropped images and labels:
cd /path/to/mmocr
python tools/data/textrecog/synthtext_converter.py data/mixture/SynthText/gt.mat data/mixture/SynthText/ data/mixture/SynthText/synthtext/SynthText_patch_horizontal --n_proc 8
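The converter expects the SynthText ground-truth file at data/mixture/SynthText/gt.mat; gt.mat is distributed inside SynthText.zip, so if it ended up under the renamed synthtext/ folder instead, move it (or point the first argument there). A quick pre-flight check:
# confirm the ground-truth file is where the converter expects it
ls -lh data/mixture/SynthText/gt.mat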
SynthAdd
- Step1: Download SynthText_Add.zip (code: 627x; see the table above)
- Step2:
mkdir SynthAdd && cd SynthAdd
mv /path/to/SynthText_Add.zip .
unzip SynthText_Add.zip
mv /path/to/label.txt .
# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/SynthAdd SynthAdd
:::{tip}
To convert an annotation file from txt format to lmdb format, run
python tools/data/utils/txt2lmdb.py -i <txt_label_path> -o <lmdb_label_path>
For example,
python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
:::
TextOCR
- Step1: Download train_val_images.zip, TextOCR_0.1_train.json and TextOCR_0.1_val.json to textocr/.
mkdir textocr && cd textocr
# Download TextOCR dataset
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json
# For images
unzip -q train_val_images.zip
mv train_images train
- Step2: Generate train_label.txt, val_label.txt and crop images using 4 processes with the following command:
python tools/data/textrecog/textocr_converter.py /path/to/textocr 4
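Optionally, spot-check the converter output; this assumes the generated label files are written next to the images, as in the directory listing above:
# the generated label files should be non-empty
wc -l /path/to/textocr/train_label.txt /path/to/textocr/val_label.txt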
Totaltext
- Step1: Download totaltext.zip from github dataset and groundtruth_text.zip from github Groundtruth (our totaltext_converter.py supports groundtruth in both .mat and .txt formats).
mkdir totaltext && cd totaltext
mkdir imgs && mkdir annotations
# For images
# in ./totaltext
unzip totaltext.zip
mv Images/Train imgs/training
mv Images/Test imgs/test
# For annotations
unzip groundtruth_text.zip
cd Groundtruth
mv Polygon/Train ../annotations/training
mv Polygon/Test ../annotations/test
- Step2: Generate cropped images, train_label.txt and test_label.txt with the following command (the cropped images will be saved to data/totaltext/dst_imgs/):
python tools/data/textrecog/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
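A quick spot-check of the conversion result, assuming the output locations named in the step above:
ls /path/to/totaltext/dst_imgs | head
wc -l /path/to/totaltext/train_label.txt /path/to/totaltext/test_label.txt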
OpenVINO
- Step0: Install awscli (e.g. pip install awscli).
- Step1: Download Open Images subsets train_1, train_2, train_5, train_f, and validation to openvino/.
mkdir openvino && cd openvino
# Download Open Images subsets
for s in 1 2 5 f; do
  aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_${s}.tar.gz .
done
aws s3 --no-sign-request cp s3://open-images-dataset/tar/validation.tar.gz .
# Download annotations
for s in 1 2 5 f; do
  wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_train_${s}.json
done
wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_validation.json
# Extract images
mkdir -p openimages_v5/val
for s in 1 2 5 f; do
  tar zxf train_${s}.tar.gz -C openimages_v5
done
tar zxf validation.tar.gz -C openimages_v5/val
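The Open Images subsets are large, so an optional check of what was downloaded and extracted can save a failed conversion run later (paths as in the commands above):
# rough size of the archives and the extracted images
du -sh train_*.tar.gz validation.tar.gz openimages_v5
# every subset should have a matching annotation file
ls text_spotting_openimages_v5_*.json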
- Step2: Generate train_{1,2,5,f}_label.txt, val_label.txt and crop images using 4 processes with the following command:
python tools/data/textrecog/openvino_converter.py /path/to/openvino 4
FUNSD
- Step1: Download dataset.zip to funsd/.
mkdir funsd && cd funsd
# Download FUNSD dataset
wget https://guillaumejaume.github.io/FUNSD/dataset.zip
unzip -q dataset.zip
# For images
mv dataset/training_data/images imgs && mv dataset/testing_data/images/* imgs/
# For annotations
mkdir annotations
mv dataset/training_data/annotations annotations/training && mv dataset/testing_data/annotations annotations/test
rm dataset.zip && rm -rf dataset
- Step2: Generate train_label.txt and test_label.txt and crop images using 4 processes with the following command (add --preserve-vertical if you wish to preserve the images containing vertical texts):
python tools/data/textrecog/funsd_converter.py PATH/TO/funsd --nproc 4
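Finally, once the datasets you need are in place, a simple overall check is to list every label file under the data root and compare it against the table at the top of this page:
cd /path/to/mmocr
find data/mixture -maxdepth 2 -name "*label*.txt" | sort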