broadfield-dev committed · Commit 6ccfcef · verified · 1 Parent(s): 4b30e2b

Update README.md

README.md CHANGED (+23 -3)
@@ -1,10 +1,30 @@
 ---
-title: Pdf2markdown
-emoji: 👁
-colorFrom: red
+title: Pdf2markdown (Flask)
+emoji: 👁️
+colorFrom: green
 colorTo: blue
 sdk: docker
 pinned: false
+# For Docker Spaces, app_port in README.md informs Hugging Face which internal port your app listens on.
+# This should match the port Gunicorn (or your app server) binds to.
+app_port: 7860
 ---
 
+## PDF to Markdown Converter (Flask Version)
+
+This application converts PDF files (either uploaded or from a URL) into Markdown format.
+It extracts text, attempts to format it, identifies tables, and extracts images.
+
+Extracted images are uploaded to a Hugging Face Dataset repository named "pdf-images-extracted" (this can be configured).
+**Important:** For image uploading to work, you **must** set an `HF_TOKEN` with write access to datasets in your Hugging Face Space secrets.
+
+### Features
+- Upload PDF files directly.
+- Process PDFs from a publicly accessible URL.
+- Extracts plain text and attempts to preserve some layout.
+- Detects and formats tables into Markdown.
+- Extracts images from the PDF.
+- Performs OCR on extracted images to include text from images.
+- Uploads extracted images to a Hugging Face Dataset.
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
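
The README text added in this commit says that extracted images are uploaded to a Hugging Face Dataset repository named "pdf-images-extracted" and that an `HF_TOKEN` secret with write access is required. The commit does not show the upload code itself, so the following is only a minimal Python sketch of how that step might look, assuming the `huggingface_hub` client library is installed. The helper names (`get_hf_token`, `image_path_in_repo`, `upload_image`) and the `images/<pdf>/page<N>_img<M>.png` layout are hypothetical and are not taken from the Space's source.

```python
import os
import re


def get_hf_token() -> str:
    """Return the token from the HF_TOKEN secret, or fail with a clear message."""
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to the Space secrets.")
    return token


def image_path_in_repo(pdf_name: str, page: int, index: int) -> str:
    """Build a path inside the dataset repo for one extracted image.

    The layout is an assumption made for this sketch only.
    """
    stem = os.path.splitext(os.path.basename(pdf_name))[0]
    # Replace anything outside a conservative character set with underscores.
    safe_stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_") or "document"
    return f"images/{safe_stem}/page{page}_img{index}.png"


def upload_image(local_path: str, pdf_name: str, page: int, index: int,
                 repo_name: str = "pdf-images-extracted") -> str:
    """Upload one image file to the dataset repo and return its path in the repo."""
    # Imported lazily so the pure helpers above work without the library.
    from huggingface_hub import HfApi

    api = HfApi(token=get_hf_token())
    # Assumption: the dataset lives under the account that owns the token.
    username = api.whoami()["name"]
    repo_id = f"{username}/{repo_name}"
    # Create the dataset repo on first use; do nothing if it already exists.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    target = image_path_in_repo(pdf_name, page, index)
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=target,
        repo_id=repo_id,
        repo_type="dataset",
    )
    return target
```

Failing early when `HF_TOKEN` is missing matches the README's warning that image uploading cannot work without it. A call such as `upload_image("/tmp/img.png", "report.pdf", 1, 0)` would need network access and a token with write access to datasets.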