---
title: Pdf2markdown (Flask)
emoji: 👁️
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
# For Docker Spaces, app_port in README.md informs Hugging Face which internal port your app listens on.
# This should match the port Gunicorn (or your app server) binds to.
app_port: 7860
---
## PDF to Markdown Converter (Flask Version)
This application converts PDF files (either uploaded or from a URL) into Markdown format.
It extracts text while trying to preserve layout, detects tables, and extracts embedded images.
Extracted images are uploaded to a Hugging Face Dataset repository named "pdf-images-extracted" by default (the repository name is configurable).
**Important:** For image uploading to work, you **must** set an `HF_TOKEN` with write access to datasets in your Hugging Face Space secrets.
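A minimal sketch of how the upload step might read the token and push a single image, assuming the `huggingface_hub` library. The helper names `resolve_upload_config` and `upload_image`, the `HF_DATASET_REPO` variable, and the `images/` path prefix are hypothetical illustrations, not taken from this app's source.

```python
import os


def resolve_upload_config(env=None):
    """Return (token, repo_name) from the environment, or raise if the token is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; image upload is unavailable.")
    # HF_DATASET_REPO is a hypothetical override; the default name comes from this README.
    repo_name = env.get("HF_DATASET_REPO", "pdf-images-extracted")
    return token, repo_name


def upload_image(local_path, namespace, env=None):
    """Upload one extracted image to the dataset repo (needs network access)."""
    # Imported lazily so the config helper above has no third-party dependency.
    from huggingface_hub import HfApi

    token, repo_name = resolve_upload_config(env)
    api = HfApi(token=token)
    repo_id = f"{namespace}/{repo_name}"
    # Create the dataset repo on first use; a no-op if it already exists.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    return api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=f"images/{os.path.basename(local_path)}",  # assumed layout
        repo_id=repo_id,
        repo_type="dataset",
    )
```

Failing early when `HF_TOKEN` is absent makes the misconfiguration visible at startup rather than on the first upload attempt.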
### Features
- Upload PDF files directly.
- Process PDFs from a publicly accessible URL.
- Extracts plain text and attempts to preserve some layout.
- Detects and formats tables into Markdown.
- Extracts images from the PDF.
- Performs OCR on extracted images to include text from images.
- Uploads extracted images to a Hugging Face Dataset.
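As an illustration of the table-formatting feature, here is a small sketch that turns a detected table (a list of rows, each a list of cell values) into a Markdown table. The function name `table_to_markdown` and the input shape are assumptions for this example; the app's actual detection and formatting code may differ.

```python
def table_to_markdown(rows):
    """Render a list of rows as a Markdown table, treating the first row as the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def clean(cell):
        # Empty cells become blank; newlines and pipes would break the table syntax.
        text = "" if cell is None else str(cell)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    # Pad short rows so every line has the same number of columns.
    padded = [[clean(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, body = padded[0], padded[1:]

    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


print(table_to_markdown([["Item", "Qty"], ["Apples", 3], ["Pears", None]]))
```

Escaping `|` and flattening newlines inside cells keeps each row on a single line, which Markdown tables require.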
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference