metadata

title: OCR App
emoji: 🏆
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
short_description: An OCR application integrated with GOT OCR 2.0

OCR Model Integration with Gradio

This project integrates a pre-trained OCR (Optical Character Recognition) model, GOT-OCR 2.0, with a Gradio-based web interface. It allows users to upload images, extract text, and perform keyword searches within the extracted text. Matching keywords are highlighted, enhancing the readability and usability of the extracted content.

Project Overview

OCR Model: This project leverages the GOT-OCR 2.0 model from Hugging Face, a transformer-based model fine-tuned for OCR tasks.
Frontend: The frontend interface is built using Gradio, offering a user-friendly experience for image uploads, text extraction, and keyword searching.
Keyword Search: Users can input keywords to search within the extracted text. The search is case-insensitive, and keywords are highlighted using customizable HTML tags.

Features

Image Upload: Users can upload images in JPEG format to extract text.
Text Extraction: The model processes the image and extracts any text found within.
Keyword Search: Users can search for specific keywords in the extracted text. Matching keywords are highlighted for easy identification.
Highlighting: Search results are highlighted in customizable colors using HTML, making it easy to locate keywords.
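
The case-insensitive search and HTML highlighting described above could look roughly like the sketch below. It is not the code from app.py: the function name `highlight_keywords`, the use of a `<mark>` tag, and the default color are assumptions made for illustration.

```python
import html
import re


def highlight_keywords(text: str, keyword: str, color: str = "yellow") -> str:
    """Return `text` as HTML with each case-insensitive match of `keyword` highlighted."""
    keyword = keyword.strip()
    if not keyword:
        # Nothing to search for: return the text, escaped for safe HTML display.
        return html.escape(text)

    # re.escape makes the keyword match literally; the capture group makes
    # re.split keep the matched pieces (they land at the odd indices).
    pattern = re.compile(f"({re.escape(keyword)})", re.IGNORECASE)
    pieces = pattern.split(text)

    out = []
    for index, piece in enumerate(pieces):
        safe = html.escape(piece)
        if index % 2 == 1:
            # Assumed markup: a <mark> tag with an inline background color.
            out.append(f'<mark style="background-color: {color}">{safe}</mark>')
        else:
            out.append(safe)
    return "".join(out)


print(highlight_keywords("Invoice total: 42 USD. INVOICE paid.", "invoice"))
```

Splitting the raw text first and escaping each piece afterwards keeps the search from matching inside HTML entities such as `&amp;`. Changing the `color` argument is one way to customize the highlight.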

Model Details

Model Name: GOT-OCR 2.0
Architecture: Transformer-based model, fine-tuned specifically for OCR tasks.
Framework: Hugging Face's Transformers library.
Device Compatibility: A GPU with NVIDIA CUDA support is recommended for efficient performance; CPU execution is slower.
Deployment: Currently hosted on Hugging Face Spaces, using a paid NVIDIA T4 GPU.

Model Components

Tokenizer: Loaded with AutoTokenizer; it converts between text and the token IDs the model works with, including decoding the model's output into the extracted text.
Model: Loaded with AutoModel for OCR, running in CUDA mode if a compatible GPU is available.
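
A minimal sketch of loading the two components is shown below. Several details are assumptions rather than facts from this README: the repository ID `stepfun-ai/GOT-OCR2_0`, the need for `trust_remote_code=True`, and the `chat(..., ocr_type="ocr")` method, which is provided by that repository's custom code according to its model card. The helper names are hypothetical.

```python
MODEL_ID = "stepfun-ai/GOT-OCR2_0"  # assumption: this README does not name the repository


def select_device(cuda_available: bool) -> str:
    """Pick the device string: CUDA when a compatible GPU is present, otherwise CPU."""
    return "cuda" if cuda_available else "cpu"


def load_ocr_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model, then move the model to the selected device."""
    # Heavy imports stay inside the function so importing this file is cheap.
    import torch
    from transformers import AutoModel, AutoTokenizer

    device = select_device(torch.cuda.is_available())
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,  # assumption: the model ships custom modeling code
        low_cpu_mem_usage=True,  # relies on the accelerate package
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer, model.eval().to(device), device


def extract_text(tokenizer, model, image_path: str) -> str:
    """Run plain OCR on one image file and return the recognized text."""
    # Assumption: `chat` comes from the repository's remote code, per its model card.
    return model.chat(tokenizer, image_path, ocr_type="ocr")
```

Calling `load_ocr_model()` once at startup and reusing the returned objects avoids reloading the weights for every uploaded image.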

Gradio Web Interface

This project uses Gradio to build the web interface through which users interact with the model. The interface features:

Image Upload Section: Users can upload an image (JPEG) to be processed by the OCR model.
Text Display: Extracted text is displayed on the interface, allowing users to view the OCR results instantly.
Keyword Search and Highlighting: A search box for keyword input enables users to locate specific terms within the extracted text, with matched keywords highlighted in a customizable color.
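
The three interface sections above could be wired together along these lines. This is a sketch under assumptions, not the layout in app.py: the component choices, labels, and the names `build_demo`, `placeholder_ocr`, and `placeholder_highlight` are hypothetical, and the two placeholder callbacks stand in for the real OCR and highlighting functions.

```python
import html


def placeholder_ocr(image_path: str) -> str:
    """Stand-in for the real OCR call; returns a dummy string."""
    return f"(text extracted from {image_path})"


def placeholder_highlight(text: str, keyword: str) -> str:
    """Stand-in for the real highlighter; returns the text as escaped HTML."""
    return f"<p>{html.escape(text)}</p>"


def build_demo(ocr_fn=placeholder_ocr, highlight_fn=placeholder_highlight):
    """Build a Gradio Blocks app with upload, text display, and keyword search."""
    import gradio as gr  # imported here so the file loads without Gradio installed

    with gr.Blocks(title="OCR App") as demo:
        # Image upload section; the callback receives a path to the uploaded file.
        image = gr.Image(type="filepath", label="Upload image (JPEG)")
        # Text display for the OCR result.
        extracted = gr.Textbox(label="Extracted text", lines=10)
        # Keyword search box and the HTML area that shows highlighted matches.
        keyword = gr.Textbox(label="Keyword")
        result = gr.HTML(label="Search result")

        image.upload(ocr_fn, inputs=image, outputs=extracted)
        keyword.submit(highlight_fn, inputs=[extracted, keyword], outputs=result)
    return demo
```

With real callbacks passed in, `build_demo(...).launch()` would start the app locally.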

Setup Instructions

Prerequisites

GPU Requirement: The model needs a GPU to run efficiently. An NVIDIA CUDA-compatible device is recommended.
Dependencies: All required dependencies are listed in requirements.txt. Install them with the following command:
```
pip install -r requirements.txt
```

Required Libraries

The following libraries are essential for running the project:

torch
transformers
gradio
pillow
tiktoken
torchvision
torchaudio
verovio
accelerate
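
To confirm that the libraries above are importable in the current environment, a small check like the following may help. The function name `missing_packages` is hypothetical. The one fact worth noting is that pillow is imported under the module name `PIL`, while the others are imported under their package names.

```python
from importlib.util import find_spec

# Package name (as installed with pip) -> module name (as imported).
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "gradio": "gradio",
    "pillow": "PIL",
    "tiktoken": "tiktoken",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "verovio": "verovio",
    "accelerate": "accelerate",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose module cannot be found, without importing them."""
    return [name for name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
print("Missing:", ", ".join(missing) if missing else "none")
```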

Installation

Clone the repository and navigate to the project directory.
Install dependencies using:
```
pip install -r requirements.txt
```
Launch the Gradio web app:
```
python app.py
```

Running the Project on Hugging Face Spaces

This project is currently deployed on Hugging Face Spaces using an NVIDIA T4 GPU. To configure your own deployment on Hugging Face, refer to the Spaces Configuration Reference.

Usage

Upload an Image: Click on the "Upload" button to upload a JPEG image.
Extract Text: The OCR model will process the image and display the extracted text.
Search for Keywords: Enter a keyword in the search bar to locate it within the extracted text.
View Results: The keyword will be highlighted, making it easy to spot within the extracted text.

Additional Notes

Performance: The OCR model is optimized for GPU-based execution. Running it on a CPU is slower and is not recommended for real-time applications.
Customization: The color of the highlighted keywords can be customized by modifying the HTML tags in the code.

Example Usage

Upload Image: Upload a JPEG image to the interface.
Extract Text: The model processes the image and extracts text.
Search Keywords: Search for keywords to highlight them in the text.

Support

For any issues or questions, feel free to open an issue on the repository or contact the developer.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference