Spaces: omvishesh/OCR-app

omvishesh committed on Nov 6, 2024

Commit 3319a69 (verified) · 1 Parent(s): 51d22a6

Update README.md

Files changed (1)

README.md +114 -31


@@ -10,48 +10,131 @@ pinned: false
 short_description: An OCR application integrated with GOT OCR 2.0
 ---
-<mark>OCR Model Integration Using Gradio:</mark>
-**This project integrates a pre-trained OCR (Optical Character Recognition) model with a Gradio-based web interface. Users can upload an image (JPEG format), extract the text using the model, and search for specific keywords in the extracted text. The keywords are highlighted within the displayed results.**
-**dependencies / libraries required:**
-torch
-transformers
-gradio
-pillow
-tiktoken
-torchvision
-torchaudio
-verovio
-accelerate
-all these libraries are included in requirements.txt to install them : pip install -r requirements.txt
-**ALSO this model requires a GPU to run , so make sure you have NVIDIA CUDA or similar technologies.**
-The current web page is running on the hugging face space which is using paid GPU that is Nvidia T4 medium.
-**Project Overview**
-OCR Model: This project uses the GOT-OCR 2.0 model from Hugging Face.
-Frontend: The frontend is built using Gradio, which provides an easy-to-use web interface.
-Keyword Search: Users can search for specific keywords in the extracted text. The search is case-insensitive, and the matching keywords are highlighted using HTML <mark> tags with customizable colors.
-**Model Description**
-The project uses a pre-trained OCR model from Hugging Face:
-**Model Name: GOT-OCR 2.0**
-Architecture: Transformer-based model, fine-tuned for Optical Character Recognition.
-Framework: Hugging Face's transformers library.
-The model is loaded using the AutoTokenizer and AutoModel classes from Hugging Face and runs on a CUDA-enabled device.
-**Gradio Web Interface**
-The project uses Gradio to create an easy-to-use web interface for interacting with the model. The interface allows users to upload images, extract text, and search for keywords in the extracted text.
-**Gradio Setup**
-Image Upload: The user uploads an image, and the text is extracted using the OCR model.
-Keyword Search: Users input a keyword to search within the extracted text.
-Highlighting: Keywords found in the text are highlighted with a customizable color using HTML <mark> tags
+# OCR Model Integration with Gradio
+This project integrates a pre-trained OCR (Optical Character Recognition) model, GOT-OCR 2.0, with a Gradio-based web interface. It allows users to upload images, extract text, and perform keyword searches within the extracted text. Matching keywords are highlighted, enhancing the readability and usability of the extracted content.
+---
+## Project Overview
+- **OCR Model:** This project leverages the GOT-OCR 2.0 model from Hugging Face, a transformer-based model fine-tuned for OCR tasks.
+- **Frontend:** The frontend interface is built using Gradio, offering a user-friendly experience for image uploads, text extraction, and keyword searching.
+- **Keyword Search:** Users can input keywords to search within the extracted text. The search is case-insensitive, and matches are highlighted using HTML `<mark>` tags with a customizable color.
+---
+## Features
+1. **Image Upload**: Users can upload images in JPEG format to extract text.
+2. **Text Extraction**: The model processes the image and extracts any text found within.
+3. **Keyword Search**: Users can search for specific keywords in the extracted text. Matching keywords are highlighted for easy identification.
+4. **Highlighting**: Search results are highlighted in customizable colors using HTML, making it easy to locate keywords.
+---
+## Model Details
+- **Model Name**: [GOT-OCR 2.0](https://huggingface.co/ucaslcl/GOT-OCR2_0)
+- **Architecture**: Transformer-based model, fine-tuned specifically for OCR tasks.
+- **Framework**: Hugging Face's Transformers library.
+- **Device Compatibility**: This model requires a GPU, ideally with NVIDIA CUDA support, for efficient performance.
+- **Deployment**: Currently hosted on Hugging Face Spaces, using a paid NVIDIA T4 GPU.
+### Model Components
+- **Tokenizer**: Loaded with `AutoTokenizer`; the model uses it to encode its prompt and decode the recognized text.
+- **Model**: Loaded with `AutoModel` for OCR, running in CUDA mode if a compatible GPU is available.
+---
+## Gradio Web Interface
+This project utilizes Gradio to create a web interface that allows users to interact with the model seamlessly. The interface features:
+1. **Image Upload Section**: Users can upload an image (JPEG) to be processed by the OCR model.
+2. **Text Display**: Extracted text is displayed on the interface, allowing users to view the OCR results instantly.
+3. **Keyword Search and Highlighting**: A search box for keyword input enables users to locate specific terms within the extracted text, with matched keywords highlighted in a customizable color.
+---
+## Setup Instructions
+### Prerequisites
+- **GPU Requirement**: This model requires a GPU to run efficiently. Ensure you have an NVIDIA CUDA-compatible device or similar technology.
+- **Dependencies**: The required dependencies are included in `requirements.txt`. Install them with the following command:
+  ```bash
+  pip install -r requirements.txt
+  ```
+### Required Libraries
+The following libraries are essential for running the project:
+- `torch`
+- `transformers`
+- `gradio`
+- `pillow`
+- `tiktoken`
+- `torchvision`
+- `torchaudio`
+- `verovio`
+- `accelerate`
+### Installation
+1. Clone the repository and navigate to the project directory.
+2. Install dependencies using:
+   ```bash
+   pip install -r requirements.txt
+   ```
+3. Launch the Gradio web app:
+   ```bash
+   python app.py
+   ```
+---
+## Running the Project on Hugging Face Spaces
+This project is currently deployed on Hugging Face Spaces using an NVIDIA T4 GPU. To configure your own deployment on Hugging Face, refer to the [Spaces Configuration Reference](https://huggingface.co/docs/hub/spaces-config-reference).
+---
+## Usage
+1. **Upload an Image**: Click on the "Upload" button to upload a JPEG image.
+2. **Extract Text**: The OCR model will process the image and display the extracted text.
+3. **Search for Keywords**: Enter a keyword in the search bar to locate it within the extracted text.
+4. **View Results**: The keyword will be highlighted, making it easy to spot within the extracted text.
+---
+## Additional Notes
+- **Performance**: The OCR model is optimized for GPU-based execution. Running it on a CPU might be slower and is not recommended for real-time applications.
+- **Customization**: The color of the highlighted keywords can be customized by modifying the HTML tags in the code.
+---
+## Example Usage
+```python
+# Upload Image: Upload a JPEG image to the interface.
+# Extract Text: The model processes the image and extracts text.
+# Search Keywords: Search for keywords to highlight them in the text.
+```
+---
+## Support
+For any issues or questions, feel free to open an issue on the repository or contact the developer.
+---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
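
The README added in this commit describes a case-insensitive keyword search whose matches are highlighted with HTML `<mark>` tags in a customizable color, but the diff does not show the code behind it. The block below is a minimal sketch of that behavior, not the app's actual implementation: the function name `highlight_keyword`, the `color` parameter, and the HTML escaping of the extracted text are all assumptions made for illustration.

```python
import html
import re


def highlight_keyword(text: str, keyword: str, color: str = "yellow") -> str:
    """Return `text` as HTML with each case-insensitive match of `keyword` in <mark>.

    Hypothetical helper: `color` is assumed to be a trusted CSS color value.
    """
    # With no keyword, show the (escaped) text without any highlighting.
    if not keyword.strip():
        return html.escape(text)

    # re.escape treats the keyword literally, so "a.b" does not match "axb".
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches so OCR output such as "<" or "&"
        # cannot be interpreted as markup.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(
            f'<mark style="background-color: {color};">'
            f"{html.escape(match.group(0))}</mark>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

A string built this way could be shown in a component that renders HTML. Matching on the raw text and escaping each piece afterwards keeps the original casing of every match and avoids highlighting inside escaped entities.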