	---
	title: olmOCR Markdown Converter
	emoji: 📝
	colorFrom: yellow
	colorTo: blue
	sdk: gradio
	sdk_version: 3.50.2
	app_file: app.py
	python_version: 3.11
	license: mit
	---

	# olmOCR Markdown Converter

	This Space uses the `olmOCR` model pipeline to convert PDFs (including scientific papers) into Markdown-formatted `.txt` files that retain document structure, headers, and basic math formatting. The output is ready for Calibre/Kindle or downstream parsing.

	- ✅ Vision + text anchor OCR pipeline (via `olmOCR`)
	- ✅ Extracts semantic structure via PDF TOC
	- ✅ Outputs clean `.txt` in markdown format
	- ✅ Hugging Face Gradio Space with GPU support
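
The TOC-based structure step listed above can be pictured with a small sketch. It assumes the table of contents is already available as `(level, title, page)` entries, which is the shape PyMuPDF's `Document.get_toc()` returns. The function name `toc_to_markdown_headings` and the sample entries are hypothetical and are not taken from this Space's `app.py`.

```python
def toc_to_markdown_headings(toc):
    """Turn (level, title, page) TOC entries into Markdown heading lines."""
    lines = []
    for level, title, _page in toc:
        # Clamp the level to 1..6, the heading depths Markdown supports.
        depth = max(1, min(int(level), 6))
        lines.append("#" * depth + " " + title.strip())
    return lines


# Hypothetical TOC, for illustration only.
example_toc = [(1, "Introduction", 1), (2, "Related Work", 2), (1, "Methods", 4)]
print("\n".join(toc_to_markdown_headings(example_toc)))
```

The page number is ignored here; a fuller version might use it to place each heading ahead of the OCR text for that page.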

	## Example Use

	Upload a scientific paper as a PDF and download a Markdown-formatted `.txt` version with headers and inline structure preserved.
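
The download half of this flow could look roughly like the sketch below. It assumes the converted Markdown is already held in a string and only shows writing it to a `.txt` file whose path can be handed to a file-download component. The helper name `save_markdown_as_txt` and the `stem` parameter are hypothetical, not part of `olmOCR` or of this Space's code.

```python
import tempfile
from pathlib import Path


def save_markdown_as_txt(markdown_text, stem="document"):
    """Write Markdown text to a UTF-8 .txt file and return its path."""
    # A fresh temporary directory avoids clashes between uploads.
    out_dir = Path(tempfile.mkdtemp())
    out_path = out_dir / f"{stem}.txt"
    out_path.write_text(markdown_text, encoding="utf-8")
    return out_path


# Hypothetical converted output, for illustration only.
path = save_markdown_as_txt("# Title\n\nBody text.\n", stem="paper")
print(path.name)
```

Keeping the `.txt` extension while the content is Markdown matches the output format described above.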

	---

	Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)