_\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\_  
_\\-------- **Image Data Extractor** -------\\_  
_\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\_

# Overview:
The **Image Data Extractor** is a Python-based tool that uses **PaddleOCR** to extract text from images of visiting cards and organize it into structured fields: name, designation, contact number, address, and company name. The **Mistral 7B model** structures the extracted text; if it is unavailable, the system falls back to the **Gliner urchade/gliner_mediumv2.1** model.

# Installation Guide:

1. **Create and Activate a Virtual Environment**
    ```bash
    python -m venv venv
    source venv/bin/activate  # For Linux/Mac
    # or
    venv\Scripts\activate  # For Windows
    ```

2. **Install Required Libraries**
    ```bash
    pip install -r requirements.txt
    ```

3. **Set up Hugging Face Token**
    - Add your Hugging Face token to the `.env` file before starting the application:
    ```bash
    HF_TOKEN=<your_huggingface_token>
    ```

4. **Run the Application**
    - With Docker:
    ```bash
    docker-compose up --build
    ```
    - Without Docker:
    ```bash
    python app.py
    ```
    
# File Structure Overview:

```
ImageDataExtractor/
│
├── app.py                       # Main Flask app
├── requirements.txt             # Dependencies
├── Dockerfile                   # Docker container setup
├── docker-compose.yml           # Docker Compose setup
│
├── utility/
│   └── utils.py                 # PaddleOCR integration, image preprocessing, and Mistral model processing
│
├── template/
│   ├── index.html               # UI for image uploads
│   └── result.html              # Display extracted results
│
├── Backup/
│   ├── modules/                 # Base classes for data processing models
│   │   ├── base.py
│   │   ├── data_proc.py
│   │   ├── evaluator.py
│   │   ├── layers.py
│   │   ├── run_evaluation.py
│   │   ├── span_rep.py
│   │   └── token_rep.py
│   ├── backup.py                # Backup handling
│   ├── model.py                 # Gliner Model integration and backup logic
│   ├── save_load.py             # Mistral 7B model integration and backup logic
│   └── train.py                 # Mistral 7B model integration and backup logic
│
└── .env                         # Environment variables (includes Hugging Face token)
```

# Program Overview:

### PaddleOCR Integration (utility/utils.py):
- **Text Extraction**: The tool uses **PaddleOCR** to extract text from visiting-card images (PNG, JPG, JPEG).
- **Preprocessing**: Applies basic image preprocessing to improve OCR text recognition.
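
As a rough illustration of the text-extraction step, the sketch below flattens OCR output into plain text lines. It assumes the nested result layout returned by `PaddleOCR.ocr()` in the 2.x releases (one list per image, each detection being `[box, (text, confidence)]`). The helper name `ocr_result_to_lines` and the `min_confidence` threshold are hypothetical and are not taken from `utility/utils.py`.

```python
from typing import List, Optional, Sequence


def ocr_result_to_lines(result: Optional[Sequence], min_confidence: float = 0.5) -> List[str]:
    """Flatten a PaddleOCR-style result into a list of text lines.

    Assumed layout (PaddleOCR 2.x): result[i] holds the detections for
    image i, and each detection is [box, (text, confidence)].
    """
    lines: List[str] = []
    for page in result or []:
        # A page can be None when no text was detected in that image.
        for detection in page or []:
            _box, (text, confidence) = detection
            text = text.strip()
            # Drop empty strings and low-confidence detections.
            if text and confidence >= min_confidence:
                lines.append(text)
    return lines


# Hand-written sample in the assumed layout, standing in for real OCR output.
sample = [[
    [[[10, 10], [200, 10], [200, 40], [10, 40]], ("Jane Doe", 0.98)],
    [[[10, 50], [200, 50], [200, 80], [10, 80]], ("Sales Manager", 0.95)],
    [[[10, 90], [200, 90], [200, 120], [10, 120]], ("~~", 0.12)],
]]
print(ocr_result_to_lines(sample))
```

Filtering on confidence is one simple way to keep OCR noise out of the later structuring step; the right threshold would depend on the image quality of the cards.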

### Mistral 7B Integration (utility/utils.py):
- **Data Structuring**: After text extraction, the **Mistral 7B model** processes the extracted data, structuring it into fields such as name, designation, contact number, address, and company name.
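
One way to turn a model reply into these fields is sketched below. It assumes the model has been prompted to answer with a JSON object; the key names in `FIELDS` and the function `structure_response` are hypothetical, chosen only to mirror the fields listed above, and the actual prompt and parsing in `utility/utils.py` may differ.

```python
import json
from typing import Dict

# Hypothetical key names mirroring the fields described in this README.
FIELDS = ("name", "designation", "contact_number", "address", "company_name")


def structure_response(raw: str) -> Dict[str, str]:
    """Map a model reply onto the five card fields, defaulting to ""."""
    # Keep only the outermost {...} in case the model adds extra prose.
    start, end = raw.find("{"), raw.rfind("}")
    try:
        data = json.loads(raw[start:end + 1]) if start != -1 and end > start else {}
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Missing or null fields become empty strings so the output shape is fixed.
    return {field: str(data.get(field) or "").strip() for field in FIELDS}


reply = 'Here is the result: {"name": "Jane Doe", "company_name": "Acme"} Thanks.'
print(structure_response(reply))
```

Returning every field, even when empty, keeps the result template simple because it never has to check whether a key exists.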
  
### Fallback Mechanism (Backup/backup.py):
- **Gliner urchade/gliner_mediumv2.1 Model**: If the Mistral model is unavailable, the system uses the **Gliner urchade/gliner_mediumv2.1 model** to perform the same task, so the service stays available.
- **Error Handling**: Detects failures in model availability and hands the request over to the fallback model.
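
The fallback pattern described above can be sketched as follows. This is a minimal illustration, not the code in `Backup/backup.py`: `extract_with_fallback`, `failing_primary`, and `stub_fallback` are hypothetical names, and the two extractors are stand-ins for the Mistral and Gliner calls.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)

# An extractor takes OCR text and returns structured fields.
Extractor = Callable[[str], Dict[str, str]]


def extract_with_fallback(text: str, primary: Extractor, fallback: Extractor) -> Dict[str, str]:
    """Try the primary extractor; on any error, use the fallback instead."""
    try:
        return primary(text)
    except Exception as exc:
        # Log the failure so outages of the primary model stay visible.
        logger.warning("Primary model failed (%s); using fallback.", exc)
        return fallback(text)


def failing_primary(text: str) -> Dict[str, str]:
    """Stand-in for a primary model that is currently unreachable."""
    raise RuntimeError("model unavailable")


def stub_fallback(text: str) -> Dict[str, str]:
    """Stand-in for the fallback model."""
    return {"name": text}


print(extract_with_fallback("Jane Doe", failing_primary, stub_fallback))
```

Passing the extractors in as callables keeps the switching logic independent of either model, which also makes it easy to test without loading any weights.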

### Web Interface (app.py):
- **Flask API**: Provides endpoints for image uploads and displays the results in a structured manner.
- **HTML Interface**: A frontend for users to upload images of visiting cards and view the parsed results.
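
Since the tool accepts PNG, JPG, and JPEG images, an upload endpoint would typically check the file extension before running OCR. The sketch below shows one possible check; `ALLOWED_EXTENSIONS` and `is_allowed_image` are hypothetical names and are not taken from `app.py`.

```python
from pathlib import Path

# Image types this README lists as supported inputs.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_allowed_image(filename: str) -> bool:
    """Return True if the filename has a supported image extension."""
    # Compare case-insensitively so "CARD.PNG" is accepted too.
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


for candidate in ("card.PNG", "card.jpeg", "card.pdf", "card"):
    print(candidate, is_allowed_image(candidate))
```

An extension check only filters obvious mistakes; it does not verify that the file content is actually an image.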

# Tree Map of the Program:

```
app.py
├── Handles Flask API and web interface
├── Manages file upload
├── Extracts text with PaddleOCR
├── Processes text with Mistral 7B
└── Displays structured results

utility/utils.py
├── PaddleOCR for text extraction
└── Mistral 7B for data structuring

Backup/backup.py
├── Gliner urchade/gliner_mediumv2.1 as fallback
└── Backup and error handling

Backup/model.py
└── Mistral 7B integration and processing logic
```

# Main Task:
The main objective is to extract and structure text data from visiting cards. The system identifies and organizes:
- **Name**
- **Designation**
- **Phone Number**
- **Address**
- **Company Name**

# References:

- [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
- [Mistral 7B Documentation](https://huggingface.co/)
- [Gliner urchade/gliner_mediumv2.1 Documentation](https://huggingface.co/)
- [Flask Documentation](https://flask.palletsprojects.com/)
- [Docker Documentation](https://docs.docker.com/)
- [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html)

---