---
title: OCR App
emoji: 🏆
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
short_description: An OCR application integrated with GOT OCR 2.0
---


# OCR Model Integration with Gradio

This project integrates a pre-trained OCR (Optical Character Recognition) model, GOT-OCR 2.0, with a Gradio-based web interface. Users upload an image, extract its text, and search the extracted text for keywords. Matching keywords are highlighted so they are easy to find in the output.

---

## Project Overview

- **OCR Model:** This project uses the GOT-OCR 2.0 model hosted on the Hugging Face Hub, a transformer-based model fine-tuned for OCR tasks.
- **Frontend:** The frontend interface is built using Gradio, offering a user-friendly experience for image uploads, text extraction, and keyword searching.
- **Keyword Search:** Users can input keywords to search within the extracted text. The search is case-insensitive, and keywords are highlighted using customizable HTML tags.

---

## Features

1. **Image Upload**: Users can upload images in JPEG format to extract text.
2. **Text Extraction**: The model processes the image and extracts any text found within.
3. **Keyword Search**: Users can search for specific keywords in the extracted text. Matching keywords are highlighted for easy identification.
4. **Highlighting**: Search results are highlighted in customizable colors using HTML, making it easy to locate keywords.
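
The keyword search and highlighting described above can be sketched as follows. This is a minimal illustration rather than the code in `app.py`: the function name `highlight_keyword`, the `color` parameter, and the choice of a `<mark>` tag are assumptions made for the example.

```python
import html
import re


def highlight_keyword(text: str, keyword: str, color: str = "yellow") -> str:
    """Return HTML with every case-insensitive match of `keyword` highlighted.

    Hypothetical helper: the name, signature, and <mark> tag are illustrative.
    """
    keyword = keyword.strip()
    if not keyword:
        # Nothing to search for: return the text, escaped for safe HTML display.
        return html.escape(text)

    # re.escape treats the keyword literally, so "a.b" does not match "axb".
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain segments and the match separately so OCR output
        # containing "<" or "&" cannot inject markup.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(
            f'<mark style="background-color: {color}">'
            f"{html.escape(match.group(0))}</mark>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight_keyword("Total due. TOTAL paid.", "total"))
```

Changing the highlight color is then a matter of passing a different CSS color, for example `highlight_keyword(text, "total", color="lightgreen")`.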

---

## Model Details

- **Model Name**: [GOT-OCR 2.0](https://huggingface.co/ucaslcl/GOT-OCR2_0)
- **Architecture**: Transformer-based model, fine-tuned specifically for OCR tasks.
- **Framework**: Hugging Face's Transformers library.
- **Device Compatibility**: The model is intended to run on a GPU, ideally an NVIDIA card with CUDA support. CPU execution is slower and not recommended.
- **Deployment**: Currently hosted on Hugging Face Spaces, using a paid NVIDIA T4 GPU.

### Model Components

- **Tokenizer**: Loaded with `AutoTokenizer`; the model uses it to build its prompt and to decode generated tokens into the extracted text.
- **Model**: Loaded with `AutoModel` for OCR, running in CUDA mode if a compatible GPU is available.
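
A minimal loading sketch for these two components is shown below. It follows the usage pattern published on the GOT-OCR 2.0 model card (`trust_remote_code=True` and the `chat` method with `ocr_type="ocr"`); the function names `load_got_ocr` and `extract_text` are hypothetical, and the CPU fallback is an assumption, since the model's remote code may itself require CUDA.

```python
MODEL_ID = "ucaslcl/GOT-OCR2_0"


def load_got_ocr(model_id: str = MODEL_ID):
    """Load the tokenizer and model (hypothetical helper, not from app.py).

    Heavy imports are deferred so that importing this module stays cheap.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Use CUDA when a compatible GPU is available.
    # Assumption: falling back to CPU may not work if the remote code calls .cuda().
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,  # the model ships its own modeling code
        low_cpu_mem_usage=True,
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    model = model.eval().to(device)
    return tokenizer, model, device


def extract_text(tokenizer, model, image_path: str) -> str:
    """Run plain-text OCR on one image file via the model's `chat` method."""
    return model.chat(tokenizer, image_path, ocr_type="ocr")
```

Calling `load_got_ocr()` downloads the model weights on first use, so it is best done once at application start-up rather than per request.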

---

## Gradio Web Interface

This project uses Gradio to build the web interface through which users interact with the model. The interface features:

1. **Image Upload Section**: Users can upload an image (JPEG) to be processed by the OCR model.
2. **Text Display**: Extracted text is displayed on the interface, allowing users to view the OCR results instantly.
3. **Keyword Search and Highlighting**: A search box for keyword input enables users to locate specific terms within the extracted text, with matched keywords highlighted in a customizable color.
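
The three parts of the interface could be wired together roughly as below. This is a sketch under assumptions, not the layout in `app.py`: `build_demo`, `extract_fn`, and `highlight_fn` are hypothetical names, and the two callbacks are passed in so that the OCR call and the highlighting logic stay separate from the layout.

```python
def build_demo(extract_fn, highlight_fn):
    """Build a Gradio Blocks app (hypothetical layout).

    extract_fn(image_path) -> str            extracted text
    highlight_fn(text, keyword) -> str       HTML with matches highlighted
    """
    import gradio as gr  # deferred so importing this module stays cheap

    with gr.Blocks(title="OCR App") as demo:
        with gr.Row():
            # type="filepath" hands the callback a path to the uploaded file.
            image = gr.Image(type="filepath", label="Upload image (JPEG)")
            extracted = gr.Textbox(label="Extracted text", lines=10)
        extract_btn = gr.Button("Extract text")

        keyword = gr.Textbox(label="Keyword")
        search_btn = gr.Button("Search")
        highlighted = gr.HTML(label="Search result")

        # Wire each button to its callback.
        extract_btn.click(extract_fn, inputs=image, outputs=extracted)
        search_btn.click(
            highlight_fn, inputs=[extracted, keyword], outputs=highlighted
        )
    return demo


# Usage (with your own callbacks):
# build_demo(my_extract_fn, my_highlight_fn).launch()
```

Rendering the search result in a `gr.HTML` component is what allows the highlight tags to show as colored text instead of raw markup.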

---

## Setup Instructions

### Prerequisites

- **GPU Requirement**: A GPU is strongly recommended for this model. An NVIDIA CUDA-compatible device is the configuration used by the hosted Space (an NVIDIA T4).
- **Dependencies**: The required dependencies are included in `requirements.txt`. Install them with the following command:
  ```bash
  pip install -r requirements.txt
  ```

### Required Libraries

The following libraries are essential for running the project:

- `torch`
- `transformers`
- `gradio`
- `pillow`
- `tiktoken`
- `torchvision`
- `torchaudio`
- `verovio`
- `accelerate`
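
To confirm that these libraries are installed before launching the app, a small check like the following may help. It is a sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the package names are taken from the list above.

```python
from importlib import metadata

# Distribution names as listed in this README.
REQUIRED = [
    "torch",
    "transformers",
    "gradio",
    "pillow",
    "tiktoken",
    "torchvision",
    "torchaudio",
    "verovio",
    "accelerate",
]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print(missing_packages())
```

An empty list means every required package was found; any names printed can be installed with `pip install -r requirements.txt`.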

### Installation

1. Clone the repository and navigate to the project directory.
2. Install dependencies using:
   ```bash
   pip install -r requirements.txt
   ```
3. Launch the Gradio web app:
   ```bash
   python app.py
   ```

---

## Running the Project on Hugging Face Spaces

This project is currently deployed on Hugging Face Spaces using an NVIDIA T4 GPU. To configure your own deployment on Hugging Face, refer to the [Spaces Configuration Reference](https://huggingface.co/docs/hub/spaces-config-reference).

---

## Usage

1. **Upload an Image**: Click on the "Upload" button to upload a JPEG image.
2. **Extract Text**: The OCR model will process the image and display the extracted text.
3. **Search for Keywords**: Enter a keyword in the search bar to locate it within the extracted text.
4. **View Results**: The keyword will be highlighted, making it easy to spot within the extracted text.

---

## Additional Notes

- **Performance**: The OCR model is optimized for GPU-based execution. Running it on a CPU is slower and is not recommended for real-time applications.
- **Customization**: The color of the highlighted keywords can be customized by modifying the HTML tags in the code.

---

## Example Usage

```python
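# 1. Upload image: pass a JPEG path to the interface ("sample.jpg" is a placeholder).
image_path = "sample.jpg"
# 2. Extract text: the model processes the image; a stand-in string is used here.
text = "Total amount due: 42 USD"
# 3. Search keywords: matching is case-insensitive, so "total" finds "Total".
print("total" in text.lower())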
```

---

## Support

For any issues or questions, feel free to open an issue on the repository or contact the developer.
