omvishesh committed on
Commit 3319a69 · verified · 1 Parent(s): 51d22a6

Update README.md

Files changed (1)
  1. README.md +114 -31
README.md CHANGED
@@ -10,48 +10,131 @@ pinned: false
  short_description: An OCR application integrated with GOT OCR 2.0
  ---
 
- <mark>OCR Model Integration Using Gradio:</mark>
 
- **This project integrates a pre-trained OCR (Optical Character Recognition) model with a Gradio-based web interface. Users can upload an image (JPEG format), extract the text using the model, and search for specific keywords in the extracted text. The keywords are highlighted within the displayed results.**
+ # OCR Model Integration with Gradio
 
- **dependencies / libraries required:**
- torch
- transformers
- gradio
- pillow
- tiktoken
- torchvision
- torchaudio
- verovio
- accelerate
+ This project integrates a pre-trained OCR (Optical Character Recognition) model, GOT-OCR 2.0, with a Gradio-based web interface. It allows users to upload images, extract text, and perform keyword searches within the extracted text. Matching keywords are highlighted, enhancing the readability and usability of the extracted content.
 
- all these libraries are included in requirements.txt to install them : pip install -r requirements.txt
+ ---
+
+ ## Project Overview
+
+ - **OCR Model:** This project leverages the GOT-OCR 2.0 model from Hugging Face, a transformer-based model fine-tuned for OCR tasks.
+ - **Frontend:** The frontend interface is built using Gradio, offering a user-friendly experience for image uploads, text extraction, and keyword searching.
+ - **Keyword Search:** Users can input keywords to search within the extracted text. The search is case-insensitive, and keywords are highlighted using customizable HTML tags.
+
+ ---
+
+ ## Features
+
+ 1. **Image Upload**: Users can upload images in JPEG format to extract text.
+ 2. **Text Extraction**: The model processes the image and extracts any text found within.
+ 3. **Keyword Search**: Users can search for specific keywords in the extracted text. Matching keywords are highlighted for easy identification.
+ 4. **Highlighting**: Search results are highlighted in customizable colors using HTML, making it easy to locate keywords.
+
+ ---
+
+ ## Model Details
+
+ - **Model Name**: [GOT-OCR 2.0](https://huggingface.co/ucaslcl/GOT-OCR2_0)
+ - **Architecture**: Transformer-based model, fine-tuned specifically for OCR tasks.
+ - **Framework**: Hugging Face's Transformers library.
+ - **Device Compatibility**: This model requires a GPU, ideally with NVIDIA CUDA support, for efficient performance.
+ - **Deployment**: Currently hosted on Hugging Face Spaces, using a paid NVIDIA T4 GPU.
+
+ ### Model Components
+
+ - **Tokenizer**: Loaded with `AutoTokenizer` for tokenizing the extracted text.
+ - **Model**: Loaded with `AutoModel` for OCR, running in CUDA mode if a compatible GPU is available.
+
+ ---
+
+ ## Gradio Web Interface
+
+ This project utilizes Gradio to create a web interface that allows users to interact with the model seamlessly. The interface features:
+
+ 1. **Image Upload Section**: Users can upload an image (JPEG) to be processed by the OCR model.
+ 2. **Text Display**: Extracted text is displayed on the interface, allowing users to view the OCR results instantly.
+ 3. **Keyword Search and Highlighting**: A search box for keyword input enables users to locate specific terms within the extracted text, with matched keywords highlighted in a customizable color.
+
+ ---
+
+ ## Setup Instructions
+
+ ### Prerequisites
 
- **ALSO this model requires a GPU to run , so make sure you have NVIDIA CUDA or similar technologies.**
+ - **GPU Requirement**: This model requires a GPU to run efficiently. Ensure you have an NVIDIA CUDA-compatible device or similar technology.
+ - **Dependencies**: The required dependencies are included in `requirements.txt`. Install them with the following command:
+ ```bash
+ pip install -r requirements.txt
+ ```
 
- The current web page is running on the hugging face space which is using paid GPU that is Nvidia T4 medium.
+ ### Required Libraries
 
+ The following libraries are essential for running the project:
 
- **Project Overview**
- OCR Model: This project uses the GOT-OCR 2.0 model from Hugging Face.
- Frontend: The frontend is built using Gradio, which provides an easy-to-use web interface.
- Keyword Search: Users can search for specific keywords in the extracted text. The search is case-insensitive, and the matching keywords are highlighted using HTML <mark> tags with customizable colors.
+ - `torch`
+ - `transformers`
+ - `gradio`
+ - `pillow`
+ - `tiktoken`
+ - `torchvision`
+ - `torchaudio`
+ - `verovio`
+ - `accelerate`
 
- **Model Description**
- The project uses a pre-trained OCR model from Hugging Face:
+ ### Installation
 
- **Model Name: GOT-OCR 2.0**
- Architecture: Transformer-based model, fine-tuned for Optical Character Recognition.
- Framework: Hugging Face's transformers library.
- The model is loaded using the AutoTokenizer and AutoModel classes from Hugging Face and runs on a CUDA-enabled device.
+ 1. Clone the repository and navigate to the project directory.
+ 2. Install dependencies using:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 3. Launch the Gradio web app:
+ ```bash
+ python app.py
+ ```
 
- **Gradio Web Interface**
- The project uses Gradio to create an easy-to-use web interface for interacting with the model. The interface allows users to upload images, extract text, and search for keywords in the extracted text.
+ ---
+
+ ## Running the Project on Hugging Face Spaces
+
+ This project is currently deployed on Hugging Face Spaces using an NVIDIA T4 GPU. To configure your own deployment on Hugging Face, refer to the [Spaces Configuration Reference](https://huggingface.co/docs/hub/spaces-config-reference).
+
+ ---
+
+ ## Usage
+
+ 1. **Upload an Image**: Click on the "Upload" button to upload a JPEG image.
+ 2. **Extract Text**: The OCR model will process the image and display the extracted text.
+ 3. **Search for Keywords**: Enter a keyword in the search bar to locate it within the extracted text.
+ 4. **View Results**: The keyword will be highlighted, making it easy to spot within the extracted text.
+
+ ---
+
+ ## Additional Notes
+
+ - **Performance**: The OCR model is optimized for GPU-based execution. Running it on a CPU might be slower and is not recommended for real-time applications.
+ - **Customization**: The color of the highlighted keywords can be customized by modifying the HTML tags in the code.
+
+ ---
+
+ ## Example Usage
+
+ ```python
+ # Upload Image: Upload a JPEG image to the interface.
+ # Extract Text: The model processes the image and extracts text.
+ # Search Keywords: Search for keywords to highlight them in the text.
+ ```
+
+ ---
+
+ ## Support
+
+ For any issues or questions, feel free to open an issue on the repository or contact the developer.
+
+ ---
 
- **Gradio Setup**
- Image Upload: The user uploads an image, and the text is extracted using the OCR model.
- Keyword Search: Users input a keyword to search within the extracted text.
- Highlighting: Keywords found in the text are highlighted with a customizable color using HTML <mark> tags
 
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
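
The keyword search described in this README (case-insensitive matching, with hits wrapped in HTML `<mark>` tags in a customizable color) could be sketched roughly as below. This commit does not include `app.py`, so the function name `highlight_keyword`, its `color` parameter, and the default color are assumptions for illustration only, not the Space's actual code.

```python
import html
import re


def highlight_keyword(text: str, keyword: str, color: str = "yellow") -> str:
    """Return HTML in which every case-insensitive match of `keyword` is
    wrapped in a <mark> tag. Hypothetical helper, not taken from app.py."""
    keyword = keyword.strip()
    if not keyword:
        # Nothing to search for: return the text, escaped for safe display.
        return html.escape(text)

    # re.escape makes the keyword match literally; the capturing group makes
    # re.split keep the matched substrings at the odd indices of the result.
    pattern = re.compile(f"({re.escape(keyword)})", re.IGNORECASE)
    parts = pattern.split(text)

    pieces = []
    for index, part in enumerate(parts):
        safe = html.escape(part)  # escape OCR output before embedding in HTML
        if index % 2 == 1:
            pieces.append(f'<mark style="background-color: {color};">{safe}</mark>')
        else:
            pieces.append(safe)
    return "".join(pieces)


print(highlight_keyword("Invoice total. INVOICE paid.", "invoice"))
```

Splitting the raw text before escaping keeps the search from matching inside HTML entities such as `&amp;`. In a Gradio app, the returned string would typically be rendered through an HTML output component.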