title: Pdf2markdown (Flask)
emoji: 👁️
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
PDF to Markdown Converter (Flask Version)
This application converts PDF files (either uploaded or from a URL) into Markdown format. It extracts text, attempts to format it, identifies tables, and extracts images.
Extracted images are uploaded to a Hugging Face Dataset repository named "pdf-images-extracted" (this can be configured).
Important: For image uploading to work, you must set an HF_TOKEN with write access to datasets in your Hugging Face Space secrets.
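As a rough illustration of the token requirement, the sketch below reads HF_TOKEN from the environment and uploads one image with the `huggingface_hub` client. It is not taken from this app's source: the helper names `get_hf_token` and `upload_image`, the `images/` folder inside the repository, and the `your-username` namespace are all assumptions made for the example.

```python
import os


def get_hf_token(env=os.environ):
    """Return the HF_TOKEN secret, or raise if it is missing or blank."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to the Space secrets.")
    return token


def upload_image(local_path, repo_id, token):
    """Upload one extracted image to a dataset repo (hypothetical helper)."""
    # Imported lazily so the token check above works without the dependency.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # Assumption: the dataset repo may not exist yet, so create it if needed.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    return api.upload_file(
        path_or_fileobj=local_path,
        # Assumption: images are stored under an "images/" folder in the repo.
        path_in_repo=f"images/{os.path.basename(local_path)}",
        repo_id=repo_id,
        repo_type="dataset",
    )


if __name__ == "__main__":
    # "your-username" is a placeholder namespace, not a value from this README.
    upload_image("page1_img0.png", "your-username/pdf-images-extracted", get_hf_token())
```

Checking for the token up front lets the app report a clear configuration error instead of failing partway through an upload.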
Features
- Accepts PDF files uploaded directly.
- Processes PDFs from a publicly accessible URL.
- Extracts plain text and attempts to preserve some layout.
- Detects and formats tables into Markdown.
- Extracts images from the PDF.
- Performs OCR on extracted images so that text inside images is included in the output.
- Uploads extracted images to a Hugging Face Dataset.
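To make the table feature concrete, here is a minimal sketch of how detected table rows might be rendered as Markdown. It assumes a table arrives as a list of rows whose first row is the header and whose empty cells may be `None`; the function name `table_to_markdown` is hypothetical, and the app's actual formatting logic may differ.

```python
def table_to_markdown(rows):
    """Render a list of rows (first row = header) as a Markdown table."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def clean(cell):
        # Empty cells may be None; newlines and pipes would break the table.
        text = "" if cell is None else str(cell)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    # Pad short rows so every row has the same number of columns.
    padded = [[clean(c) for c in r] + [""] * (width - len(r)) for r in rows]
    lines = [
        "| " + " | ".join(padded[0]) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in padded[1:]]
    return "\n".join(lines)


if __name__ == "__main__":
    print(table_to_markdown([["Name", "Qty"], ["Bolt", 4], ["Nut", None]]))
```

Escaping pipe characters and flattening newlines keeps each extracted cell on a single table row, which matters because text pulled from a PDF often contains both.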
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference