Update README.md
README.md
CHANGED
@@ -10,48 +10,131 @@ pinned: false
short_description: An OCR application integrated with GOT OCR 2.0
---

-<mark>OCR Model Integration Using Gradio:</mark>
-torch
-transformers
-gradio
-pillow
-tiktoken
-torchvision
-torchaudio
-verovio
-accelerate
-The project uses a pre-trained OCR model from Hugging Face:
-**Gradio Setup**
-Image Upload: The user uploads an image, and the text is extracted using the OCR model.
-Keyword Search: Users input a keyword to search within the extracted text.
-Highlighting: Keywords found in the text are highlighted with a customizable color using HTML <mark> tags

# OCR Model Integration with Gradio

This project integrates a pre-trained OCR (Optical Character Recognition) model, GOT-OCR 2.0, with a Gradio-based web interface. It allows users to upload images, extract text, and perform keyword searches within the extracted text. Matching keywords are highlighted, enhancing the readability and usability of the extracted content.

---

## Project Overview

- **OCR Model:** This project leverages the GOT-OCR 2.0 model from Hugging Face, a transformer-based model fine-tuned for OCR tasks.
- **Frontend:** The frontend interface is built using Gradio, offering a user-friendly experience for image uploads, text extraction, and keyword searching.
- **Keyword Search:** Users can input keywords to search within the extracted text. The search is case-insensitive, and keywords are highlighted using customizable HTML tags.
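
The case-insensitive search and HTML highlighting described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code from `app.py`: the function name `highlight_keyword` and the `color` parameter are assumptions made for the example, and the `<mark>` tag is one plausible choice of highlighting tag.

```python
import html
import re


def highlight_keyword(text: str, keyword: str, color: str = "yellow") -> str:
    """Return `text` as HTML with each case-insensitive match of `keyword` wrapped in <mark>."""
    keyword = keyword.strip()
    if not keyword:
        # Nothing to search for: return the text, escaped for safe HTML display.
        return html.escape(text)

    # re.escape treats the keyword literally, so characters like "." or "(" are not regex syntax.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches and the match itself separately,
        # so the <mark> tags are the only markup in the result.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(
            f'<mark style="background-color: {color};">{html.escape(match.group(0))}</mark>'
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Escaping the text before it is rendered matters because OCR output can contain characters such as `<` or `&` that would otherwise be interpreted as HTML.
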

---

## Features

1. **Image Upload**: Users can upload images in JPEG format to extract text.
2. **Text Extraction**: The model processes the image and extracts any text found within.
3. **Keyword Search**: Users can search for specific keywords in the extracted text. Matching keywords are highlighted for easy identification.
4. **Highlighting**: Search results are highlighted in customizable colors using HTML, making it easy to locate keywords.

---

## Model Details

- **Model Name**: [GOT-OCR 2.0](https://huggingface.co/ucaslcl/GOT-OCR2_0)
- **Architecture**: Transformer-based model, fine-tuned specifically for OCR tasks.
- **Framework**: Hugging Face's Transformers library.
- **Device Compatibility**: The model needs a GPU, ideally one with NVIDIA CUDA support, to run efficiently.
- **Deployment**: Currently hosted on Hugging Face Spaces, using a paid NVIDIA T4 GPU.

### Model Components

- **Tokenizer**: Loaded with `AutoTokenizer`; the model uses it to decode its output tokens into text.
- **Model**: Loaded with `AutoModel` for OCR, running in CUDA mode if a compatible GPU is available.
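
A rough sketch of how the two components might be loaded is shown below. It is based on the usage published on the linked model card rather than on this project's `app.py`, so treat the details as assumptions: the keyword arguments, the helper names `pick_device`, `load_got_ocr` and `extract_text`, and in particular the `model.chat(..., ocr_type="ocr")` call, which comes from the model repository's remote code and is not part of the core Transformers API.

```python
MODEL_ID = "ucaslcl/GOT-OCR2_0"  # repository linked under Model Details


def pick_device(cuda_available: bool) -> str:
    """Return the torch device string; CUDA is the intended target for this model."""
    return "cuda" if cuda_available else "cpu"


def load_got_ocr(model_id: str = MODEL_ID):
    """Load the tokenizer and model (hypothetical helper, not taken from app.py)."""
    # Imported here so this module can be imported even when torch is not installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    device = pick_device(torch.cuda.is_available())
    # trust_remote_code=True is needed because the model class lives in the model repo.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer, model.eval().to(device)


def extract_text(tokenizer, model, image_path: str) -> str:
    """Run plain-text OCR on one image file."""
    # Assumption: `chat` is provided by the model's remote code, as shown on the model card.
    return model.chat(tokenizer, image_path, ocr_type="ocr")
```

Loading once at startup and reusing the `tokenizer` and `model` objects for every request avoids repeating the download and initialization. The published remote code may assume a CUDA device internally, so the CPU branch of `pick_device` should be read as a fallback that might not work in practice.
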

---

## Gradio Web Interface

This project uses Gradio to create a web interface for interacting with the model. The interface features:

1. **Image Upload Section**: Users can upload an image (JPEG) to be processed by the OCR model.
2. **Text Display**: Extracted text is displayed on the interface, allowing users to view the OCR results instantly.
3. **Keyword Search and Highlighting**: A search box for keyword input enables users to locate specific terms within the extracted text, with matched keywords highlighted in a customizable color.
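
The three interface parts listed above could be wired together with Gradio's `Blocks` API roughly as follows. This is a sketch under assumptions, not the layout used in `app.py`: `build_demo`, `fake_ocr` and `fake_highlight` are hypothetical names, and the two stub functions only stand in for the real OCR and highlighting logic.

```python
def fake_ocr(image_path):
    """Stand-in for the OCR call; a real app would run the model on `image_path`."""
    return f"(text extracted from {image_path})"


def fake_highlight(text, keyword):
    """Stand-in for the highlighter; a real app would return HTML with highlighted matches."""
    return f"<p>{text}</p>"


def build_demo(ocr_fn, highlight_fn):
    """Build the upload, text display and keyword search sections as one Gradio app."""
    import gradio as gr  # imported lazily so the stubs above work without Gradio installed

    with gr.Blocks(title="OCR with keyword search") as demo:
        # 1. Image upload: type="filepath" passes the handler a path on disk.
        image = gr.Image(type="filepath", label="Upload image")
        extract_btn = gr.Button("Extract text")

        # 2. Text display.
        extracted = gr.Textbox(label="Extracted text", lines=10)

        # 3. Keyword search and highlighting, rendered as HTML.
        keyword = gr.Textbox(label="Keyword")
        search_btn = gr.Button("Search")
        result = gr.HTML(label="Search result")

        extract_btn.click(ocr_fn, inputs=image, outputs=extracted)
        search_btn.click(highlight_fn, inputs=[extracted, keyword], outputs=result)
    return demo
```

Calling `build_demo(fake_ocr, fake_highlight).launch()` would start a local server. Passing the handlers in as arguments keeps the layout independent of the model, so the interface can be tried out without a GPU.
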

---

## Setup Instructions

### Prerequisites

- **GPU Requirement**: This model requires a GPU to run efficiently. Make sure an NVIDIA CUDA-compatible GPU is available.
- **Dependencies**: The required dependencies are listed in `requirements.txt`. Install them with the following command:

```bash
pip install -r requirements.txt
```

### Required Libraries

The following libraries are essential for running the project:

- `torch`
- `transformers`
- `gradio`
- `pillow`
- `tiktoken`
- `torchvision`
- `torchaudio`
- `verovio`
- `accelerate`
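
To confirm that the libraries listed above are installed before launching the app, a small check along these lines may help. It is an optional sketch using only the standard library; the function name `missing_packages` is made up for this example, and the names are assumed to match the distribution names used in `requirements.txt`.

```python
from importlib import metadata

# Distribution names as listed under Required Libraries.
REQUIRED = [
    "torch",
    "transformers",
    "gradio",
    "pillow",
    "tiktoken",
    "torchvision",
    "torchaudio",
    "verovio",
    "accelerate",
]


def missing_packages(names=REQUIRED):
    """Return the entries of `names` that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Running `print(missing_packages())` lists anything that still needs to be installed; an empty list means every dependency was found.
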

### Installation

1. Clone the repository and navigate to the project directory.
2. Install dependencies using:

   ```bash
   pip install -r requirements.txt
   ```

3. Launch the Gradio web app:

   ```bash
   python app.py
   ```

---

## Running the Project on Hugging Face Spaces

This project is currently deployed on Hugging Face Spaces using an NVIDIA T4 GPU. To configure your own deployment on Hugging Face, refer to the [Spaces Configuration Reference](https://huggingface.co/docs/hub/spaces-config-reference).

---

## Usage

1. **Upload an Image**: Click on the "Upload" button to upload a JPEG image.
2. **Extract Text**: The OCR model will process the image and display the extracted text.
3. **Search for Keywords**: Enter a keyword in the search bar to locate it within the extracted text.
4. **View Results**: The keyword will be highlighted, making it easy to spot within the extracted text.

---

## Additional Notes

- **Performance**: The OCR model is optimized for GPU-based execution. Running it on a CPU is likely to be much slower and is not recommended for real-time applications.
- **Customization**: The color of the highlighted keywords can be customized by modifying the HTML tags in the code.

---

## Example Usage

```python
# Upload Image: Upload a JPEG image to the interface.
# Extract Text: The model processes the image and extracts text.
# Search Keywords: Search for keywords to highlight them in the text.
```

---

## Support

For any issues or questions, feel free to open an issue on the repository or contact the developer.

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference