Space: broadfield-dev/pdf2markdown

broadfield-dev committed on Jun 2

Commit 6ccfcef (verified) · 1 parent: 4b30e2b

Update README.md

Files changed (1)

README.md CHANGED

@@ -1,10 +1,30 @@
 ---
-title: Pdf2markdown
-emoji: 👁
-colorFrom: red
 colorTo: blue
 sdk: docker
 pinned: false
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Pdf2markdown (Flask)
+emoji: 👁️
+colorFrom: green
 colorTo: blue
 sdk: docker
 pinned: false
+# For Docker Spaces, app_port in README.md informs Hugging Face which internal port your app listens on.
+# This should match the port Gunicorn (or your app server) binds to.
+app_port: 7860
 ---
+## PDF to Markdown Converter (Flask Version)
+This application converts PDF files (either uploaded or from a URL) into Markdown format.
+It extracts text, attempts to format it, identifies tables, and extracts images.
+Extracted images are uploaded to a Hugging Face Dataset repository named "pdf-images-extracted" (this can be configured).
+**Important:** For image uploading to work, you **must** set an `HF_TOKEN` with write access to datasets in your Hugging Face Space secrets.
+### Features
+-   Upload PDF files directly.
+-   Process PDFs from a publicly accessible URL.
+-   Extracts plain text and attempts to preserve some layout.
+-   Detects and formats tables into Markdown.
+-   Extracts images from the PDF.
+-   Performs OCR on extracted images to include text from images.
+-   Uploads extracted images to a Hugging Face Dataset.
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
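
The new front matter notes that `app_port` should match the port the app server binds to, and the README requires an `HF_TOKEN` secret for image uploads. A minimal sketch of a startup check for both conditions follows. It is not part of this commit: the helper names `read_app_port` and `check_space_config` are hypothetical, and the front matter is parsed with a simple line scan rather than a real YAML parser.

```python
import os
import re


def read_app_port(readme_text):
    """Return the app_port declared in the README front matter, or None."""
    # Front matter is assumed to be the block between the first two '---' lines.
    match = re.match(r"\s*---\n(.*?)\n---", readme_text, flags=re.DOTALL)
    if not match:
        return None
    for line in match.group(1).splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip YAML comment lines
        key, sep, value = line.partition(":")
        if sep and key.strip() == "app_port":
            # Drop any trailing inline comment before converting.
            return int(value.split("#")[0].strip())
    return None


def check_space_config(readme_text, bind_port, env=os.environ):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    declared = read_app_port(readme_text)
    if declared is None:
        problems.append("README front matter does not declare app_port")
    elif declared != bind_port:
        problems.append(
            f"app_port is {declared} but the server binds to {bind_port}"
        )
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set; image uploads will fail")
    return problems


# Example front matter, trimmed from the README in this commit.
README = """---
title: Pdf2markdown (Flask)
sdk: docker
# This should match the port Gunicorn (or your app server) binds to.
app_port: 7860
---
"""

print(check_space_config(README, bind_port=7860, env={"HF_TOKEN": "hf_example"}))  # → []
```

Passing `env` explicitly keeps the check testable without touching the real environment. In a Space, the default `os.environ` would pick up the `HF_TOKEN` secret.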