Anonymous1223334444 committed
Commit · 94b27ef
1 Parent(s): c2e3cf5
Add YAML metadata to README for Hugging Face Hub
README.md
CHANGED
@@ -1,3 +1,19 @@
+---
+tags:
+- multimodal
+- multilingual
+- pdf
+- embeddings
+- rag
+- google-cloud
+- vertex-ai
+- gemini
+- python
+datasets:
+- any
+license: mit
+---
+
 # Multimodal & Multilingual PDF Embedding Pipeline
 
 This repository hosts a Python pipeline that extracts text, tables, and images from PDF documents, generates multimodal descriptions for visual content (tables and images), and then creates multilingual text embeddings for all extracted information. The generated embeddings are stored in a JSON file, ready for use in Retrieval Augmented Generation (RAG) systems or other downstream applications.
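To sanity-check the metadata block this commit adds, the front matter can be separated from the README body and read back. The following is a minimal sketch, not part of the repository: the function name `split_front_matter` is hypothetical, and the parser assumes only the flat shape used here (scalar `key: value` pairs and simple `- item` lists). It is not a general YAML parser; a real YAML library would be the safer choice for anything more complex.

```python
def split_front_matter(readme_text):
    """Return (metadata, body) for a README that starts with a '---' block.

    Hypothetical helper: handles only flat scalars and '- item' lists,
    which is the shape of the metadata added in this commit.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, readme_text

    # Locate the closing '---' delimiter after the opening one.
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"), None
    )
    if end is None:
        return {}, readme_text

    metadata = {}
    current_key = None  # key whose list items are currently being collected
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                metadata[key] = value      # scalar, e.g. license: mit
                current_key = None
            else:
                metadata[key] = []         # list follows, e.g. tags:
                current_key = key

    body = "\n".join(lines[end + 1:])
    return metadata, body


README = """---
tags:
- multimodal
- multilingual
- pdf
- embeddings
- rag
- google-cloud
- vertex-ai
- gemini
- python
datasets:
- any
license: mit
---

# Multimodal & Multilingual PDF Embedding Pipeline
"""

metadata, body = split_front_matter(README)
print(metadata["license"], len(metadata["tags"]))
```

Keeping the metadata in a dictionary makes it straightforward to check that required keys such as `license` are present before pushing the README.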