| title: olmOCR Markdown Converter | |
| emoji: π | |
| colorFrom: yellow | |
| colorTo: blue | |
| sdk: gradio | |
| sdk_version: 3.50.2 | |
| app_file: app.py | |
| python_version: 3.11 | |
| license: mit | |
| # olmOCR Markdown Converter | |
| This Space uses the `olmOCR` model pipeline to convert PDFs (including scientific papers) into markdown `.txt` files that retain document structure, headers, and basic math formatting, ready for Calibre/Kindle or downstream parsing. | |
| - Vision + text anchor OCR pipeline (via `olmOCR`) | |
| - Extracts semantic structure via PDF TOC | |
| - Outputs clean `.txt` in markdown format | |
| - Hugging Face **Gradio Space with GPU support** | |
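As a rough illustration of the TOC step listed above, here is a minimal sketch of turning table-of-contents entries into markdown headers. It is not the code in this Space's `app.py`: the function name `toc_to_markdown` and the `(level, title, page)` entry shape are assumptions made for the example (the shape resembles what PDF libraries such as PyMuPDF return from `Document.get_toc()`), and the sketch does not call `olmOCR` itself.

```python
from typing import Iterable, Tuple

# Hypothetical entry shape: (heading level, title, page number).
TocEntry = Tuple[int, str, int]


def toc_to_markdown(toc: Iterable[TocEntry], max_level: int = 6) -> str:
    """Render TOC entries as markdown headers, one header per entry."""
    lines = []
    for level, title, _page in toc:
        # Clamp the level to the range markdown supports (1 to 6 hashes).
        depth = min(max(level, 1), max_level)
        lines.append("#" * depth + " " + title.strip())
    # Blank lines between headers keep them separate when rendered.
    return "\n\n".join(lines)


# Illustrative data only, not taken from a real PDF.
example_toc = [(1, "Introduction", 1), (2, "Related Work", 2), (1, "Method", 4)]
markdown = toc_to_markdown(example_toc)
print(markdown)
```

In a fuller pipeline, the OCR text for each page range would presumably be placed under the matching header before the `.txt` file is written; that part is left out here because it depends on details the README does not state.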
| ## Example Use | |
| Upload a scientific paper in PDF and download a markdown `.txt` version with preserved headers and inline structure. | |
| --- | |
| Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview) | |