Update README.md
README.md
@@ -1,10 +1,30 @@
 ---
-title: Pdf2markdown
-emoji:
-colorFrom:
+title: Pdf2markdown (Flask)
+emoji: 👁️
+colorFrom: green
 colorTo: blue
 sdk: docker
 pinned: false
+# For Docker Spaces, app_port in README.md informs Hugging Face which internal port your app listens on.
+# This should match the port Gunicorn (or your app server) binds to.
+app_port: 7860
 ---
 
+## PDF to Markdown Converter (Flask Version)
+
+This application converts PDF files (either uploaded or from a URL) into Markdown format.
+It extracts text, attempts to format it, identifies tables, and extracts images.
+
+Extracted images are uploaded to a Hugging Face Dataset repository named "pdf-images-extracted" (this can be configured).
+**Important:** For image uploading to work, you **must** set an `HF_TOKEN` with write access to datasets in your Hugging Face Space secrets.
+
+### Features
+- Upload PDF files directly.
+- Process PDFs from a publicly accessible URL.
+- Extracts plain text and attempts to preserve some layout.
+- Detects and formats tables into Markdown.
+- Extracts images from the PDF.
+- Performs OCR on extracted images to include text from images.
+- Uploads extracted images to a Hugging Face Dataset.
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
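The new front matter states that `app_port` must match the port Gunicorn binds to. The commit does not include the Dockerfile or the Gunicorn command, so the following is only a sketch. It assumes Gunicorn is started with a config file; the file name `gunicorn.conf.py`, the `app:app` module path, and the `read_app_port` helper are hypothetical. Gunicorn config files are plain Python, and `bind`, `workers`, and `timeout` are real Gunicorn settings. Reading the port from README.md keeps the two values from drifting apart.

```python
# gunicorn.conf.py (hypothetical file name); start with: gunicorn -c gunicorn.conf.py app:app
import re
from pathlib import Path

DEFAULT_PORT = 7860  # same value as `app_port` in the README front matter


def read_app_port(readme_text: str, default: int = DEFAULT_PORT) -> int:
    """Return `app_port` from the README's YAML front matter, or `default` if absent."""
    lines = readme_text.splitlines()
    # Front matter must start on the first line with a '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return default
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: stop before the Markdown body
        match = re.match(r"\s*app_port\s*:\s*(\d+)\s*$", line)
        if match:
            return int(match.group(1))
    return default


_readme = Path("README.md")
port = read_app_port(_readme.read_text(encoding="utf-8")) if _readme.is_file() else DEFAULT_PORT

# Gunicorn treats these module-level names as settings.
bind = f"0.0.0.0:{port}"  # all interfaces, so traffic forwarded into the container reaches the app
workers = 2               # illustrative; size to the Space's hardware
timeout = 300             # OCR on large PDFs can exceed Gunicorn's 30-second default
```

Reading the README at startup is a choice made for this sketch, not something Spaces requires. Hard-coding `bind = "0.0.0.0:7860"` works just as well, as long as it is kept in sync with `app_port`.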
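The README says detected tables are formatted into Markdown but does not show how. Below is one minimal way to do the formatting step. It assumes the detector returns each table as a list of rows whose cells are strings or `None`, which is the shape pdfplumber's `Page.extract_tables()` produces. The README does not name the extraction library. `rows_to_markdown` is a hypothetical helper, and the first row is assumed to be the header.

```python
from typing import Optional, Sequence


def _cell(value: Optional[str]) -> str:
    """Normalize one cell: None -> '', collapse line breaks, escape pipes so columns don't split."""
    if value is None:
        return ""
    text = " ".join(str(value).split())  # PDF cells often contain hard line breaks
    return text.replace("|", "\\|")


def rows_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render a table (first row assumed to be the header) as a GitHub-flavored Markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad ragged rows so every line has the same number of columns.
    padded = [[_cell(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, body = padded[0], padded[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * width) + "|",  # header/body separator row
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

For example, `[["Name", "Qty"], ["Apples", "3"], ["Pears", None]]` renders as a `| Name | Qty |` header, a `|---|---|` separator, and one line per remaining row, with the `None` cell left empty.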
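The README requires an `HF_TOKEN` secret with write access before images can be uploaded to the `pdf-images-extracted` dataset. Below is a minimal sketch of that upload path, assuming the `huggingface_hub` client library. `HfApi.create_repo` and `HfApi.upload_file` are real methods. The helper names, the `namespace` parameter (the README does not say which user or organization owns the dataset), and the `<pdf stem>/page<N>_img<M>.<ext>` file layout are hypothetical.

```python
import os
from pathlib import PurePosixPath

DATASET_NAME = "pdf-images-extracted"  # default repository name from the README (configurable there)


def get_hf_token() -> str:
    """Read the write-enabled token the README asks you to store as a Space secret."""
    token = os.environ.get("HF_TOKEN")  # Space secrets are exposed to the app as environment variables
    if not token:
        raise RuntimeError("HF_TOKEN is not set; cannot upload extracted images.")
    return token


def image_path_in_repo(pdf_name: str, page: int, index: int, ext: str = "png") -> str:
    """Hypothetical layout inside the dataset: <pdf stem>/page<N>_img<M>.<ext>."""
    stem = PurePosixPath(pdf_name).stem or "document"
    return f"{stem}/page{page}_img{index}.{ext}"


def upload_image(image_bytes: bytes, path_in_repo: str, namespace: str) -> str:
    """Upload one image to the dataset repo <namespace>/<DATASET_NAME>; return the repo id."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=get_hf_token())
    repo_id = f"{namespace}/{DATASET_NAME}"  # `namespace` (user or org) is an assumption
    api.create_repo(repo_id, repo_type="dataset", exist_ok=True)  # no-op if it already exists
    api.upload_file(
        path_or_fileobj=image_bytes,  # raw bytes are accepted as well as file paths
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
    )
    return repo_id
```

Each `upload_file` call creates its own commit. For PDFs with many images, it is probably gentler on the Hub to write them to a temporary folder and call `upload_folder` once.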