prithivMLmods committed on
Commit
d8857fd
·
verified ·
1 Parent(s): fd20fa7

Update README.md

Files changed (1)
  1. README.md +87 -3
README.md CHANGED
@@ -1,3 +1,87 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - th
+ - zh
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ tags:
+ - text-generation-inference
+ ---
+ # **OCR-ReportLab**
+
+ ![OCR](https://github.com/user-attachments/assets/83a88cd7-c19e-4214-bf59-6ede1eb05e89)
+
+ > [!note]
+ > **OCR-ReportLab** is a collection of Colab notebooks that perform Optical Character Recognition (OCR) on images and generate DOCX or PDF documents containing both the original image and the extracted text. It supports multiple state-of-the-art vision-language models for experimentation and practical use.
+
+ ## Notebooks
+
+ You can launch and run the following notebooks directly in Google Colab:
+
+ - **Nanonets OCR:** [Open in Colab](https://colab.research.google.com/drive/1VvA-amvSVxGdWgIsh4_by6KWOtEs_Iqp)
+ - **Monkey OCR:** [Open in Colab](https://colab.research.google.com/drive/1vPCojbmlXjDFUt06FJ1tjgnj_zWK4mUo)
+ - **OCRFlux 3B:** [Open in Colab](https://colab.research.google.com/drive/1TDoCXzWdF2hxVLbISqW6DjXAzOyI7pzf)
+ - **Typhoon OCR:** [Open in Colab](https://colab.research.google.com/drive/1_59zvLNnn1kvbiSFxzA1WiqhpbW8RKbz)
+ - **... and more ...**
+ ## Features
+
+ - Extracts text from input images using various OCR models
+ - Embeds the image and extracted text into DOCX or PDF formats
+ - Designed for quick deployment via Google Colab
+
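The step of embedding an image and its extracted text into a PDF could be sketched with ReportLab roughly as follows. This is a minimal illustration under stated assumptions, not the notebooks' actual code: the function names `ocr_text_to_markup` and `build_pdf` are hypothetical, the extracted text is assumed to come from whichever OCR model was run, and the scaling limits (90% of the frame width, half the frame height) are arbitrary choices.

```python
from xml.sax.saxutils import escape


def ocr_text_to_markup(text: str) -> str:
    """Prepare raw OCR output for a ReportLab Paragraph (hypothetical helper).

    Paragraph interprets XML-like markup, so &, < and > are escaped first,
    then line breaks are turned into <br/> tags.
    """
    return escape(text).replace("\r\n", "\n").replace("\n", "<br/>")


def build_pdf(image_path: str, extracted_text: str, output_path: str) -> None:
    """Write a PDF with the source image followed by its extracted text."""
    # Imported lazily so the helper above can be used without ReportLab installed.
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.lib.utils import ImageReader
    from reportlab.platypus import Image, Paragraph, SimpleDocTemplate, Spacer

    doc = SimpleDocTemplate(output_path, pagesize=A4)
    styles = getSampleStyleSheet()

    # Shrink the image (never enlarge it) so it fits the page frame,
    # preserving the aspect ratio. The 0.9 and 0.5 limits are assumptions.
    img_width, img_height = ImageReader(image_path).getSize()
    scale = min(
        1.0,
        (doc.width * 0.9) / img_width,
        (doc.height * 0.5) / img_height,
    )

    story = [
        Image(image_path, width=img_width * scale, height=img_height * scale),
        Spacer(1, 12),
        Paragraph(ocr_text_to_markup(extracted_text), styles["BodyText"]),
    ]
    doc.build(story)
```

Escaping matters because OCR output often contains characters such as `<` or `&`, which a `Paragraph` can otherwise misparse as markup. A call might look like `build_pdf("page.png", text, "report.pdf")`, where both file names are placeholders.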
+ ## Supported Models
+
+ The repository currently supports the following OCR implementations:
+
+ - **Nanonets OCR**
+ - **Monkey OCR**
+ - **OCRFlux 3B**
+ - **Typhoon OCR 3B**
+
+ ## Installation
+
+ No installation is required. Simply click on the links above to run the notebooks in Google Colab. Make sure to upload your image file(s) when prompted and follow the instructions in the notebook.
+
+ ---
+
+ ## Other Images
+
+ ---
+
+ <table border="1" style="width:100%; table-layout:fixed;">
+ <tr>
+ <td style="text-align:center;">
+ <img src="https://github.com/user-attachments/assets/88429981-84d0-40b2-8d99-546c439d36f3" alt="OCR" width="100%">
+ <p>OCR</p>
+ </td>
+ <td style="text-align:center;">
+ <img src="https://github.com/user-attachments/assets/bb6bfbb5-3313-47c5-988e-47083531e398" alt="Caption" width="100%">
+ <p>Caption</p>
+ </td>
+ </tr>
+ </table>
+
+ ---
+
+ ![222](https://github.com/user-attachments/assets/eae0f85d-2963-4edf-96e9-caedfe048c4f)
+
+ ---
+
+ ## Dependencies
+
+ The notebooks are built using:
+
+ - Python
+ - PyTorch
+ - Hugging Face Transformers
+ - ReportLab
+ - Gradio (for UI)
+ - Qwen2.5-VL-based models
+
+ All dependencies are automatically installed in the Colab environment.
+
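To run similar code outside Colab, it can help to confirm that the libraries listed above are importable first. The following is a small sketch using only the standard library; the helper name `check_dependencies` and the `REQUIRED_MODULES` mapping are illustrative and not part of the notebooks.

```python
from importlib.util import find_spec

# Display name -> import name for the third-party libraries listed above.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "Hugging Face Transformers": "transformers",
    "ReportLab": "reportlab",
    "Gradio": "gradio",
}


def check_dependencies(modules: dict[str, str] = REQUIRED_MODULES) -> dict[str, bool]:
    """Map each library's display name to whether its module can be found."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: find_spec(module) is not None for name, module in modules.items()}


if __name__ == "__main__":
    for name, available in check_dependencies().items():
        print(f"{name}: {'ok' if available else 'missing'}")
```

Because `find_spec` only locates a module without importing it, the check stays fast even for heavy packages such as PyTorch.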
+ ## Author
+
+ Created and maintained by [PRITHIVSAKTHIUR](https://github.com/PRITHIVSAKTHIUR)