---
title: MinerU PDF Processor
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
---
# MinerU PDF API

A simple API for extracting text and tables from PDF documents using MinerU's magic-pdf library.

## Features

- Extract text from PDF documents
- Identify and extract tables from PDFs
- Handle both regular and scanned PDFs
- Simple JSON response format
## API Endpoints

### Health Check

```
GET /health
```

Returns the current status of the service.
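A minimal sketch of calling the health endpoint from Python with only the standard library. The response schema is not documented here, so the helper simply returns the decoded JSON if the body parses and the raw text otherwise. The function names and the injectable `opener` parameter are illustrative, not part of this project.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Build the /health URL, tolerating a trailing slash on the base URL."""
    return base_url.rstrip("/") + "/health"


def check_health(base_url="http://localhost:7860", timeout=5.0,
                 opener=urllib.request.urlopen):
    """Send GET /health and return the body (parsed JSON when possible).

    `opener` exists only so the call can be stubbed out in tests.
    """
    with opener(health_url(base_url), timeout=timeout) as resp:
        raw = resp.read().decode("utf-8")
    try:
        return json.loads(raw)  # assumption: the service answers with JSON
    except json.JSONDecodeError:
        return raw  # fall back to plain text if it does not
```

Calling `check_health()` with no arguments targets a local instance on port 7860; pass the Space URL as `base_url` to query the hosted deployment.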
### Extract PDF Content

```
POST /extract
```

Upload a PDF file to extract its text and tables.

#### Request

- `file`: The PDF file to process, sent as a `multipart/form-data` field
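The upload described above can be sketched with the standard library alone. The helper names (`build_multipart`, `extract_pdf`), the default base URL, and the long timeout are assumptions made for illustration; only the `/extract` path and the `file` field name come from this README.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field, filename, data, content_type="application/pdf"):
    """Encode one file as multipart/form-data.

    Returns (body bytes, value for the Content-Type header).
    The filename is assumed not to contain double quotes.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_pdf(pdf_path, base_url="http://localhost:7860", timeout=300.0):
    """POST a PDF to /extract and return the decoded JSON response.

    The generous timeout is a guess: extraction may be slow on large files.
    """
    path = Path(pdf_path)
    body, header = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        base_url.rstrip("/") + "/extract",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the third-party `requests` package is installed, `requests.post(url, files={"file": open(path, "rb")})` performs the same encoding in one call.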
#### Response

JSON object containing:

- `filename`: Original filename
- `pages`: Array of pages, each with its extracted text and tables
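As a rough illustration of consuming this response, the sketch below counts characters and tables per page. The per-page key names `text` and `tables` are assumptions inferred from the description above, not a documented schema, and the sample payload is invented purely to exercise the function.

```python
def summarize_extraction(result: dict) -> list:
    """Return one summary dict per page of an /extract response.

    Assumes each page is an object with a "text" string and a "tables" list;
    missing or null values are treated as empty.
    """
    summary = []
    for number, page in enumerate(result.get("pages", []), start=1):
        text = page.get("text") or ""
        tables = page.get("tables") or []
        summary.append(
            {"page": number, "characters": len(text), "tables": len(tables)}
        )
    return summary


# Made-up payload for illustration only; real responses may differ.
sample = {
    "filename": "report.pdf",
    "pages": [
        {"text": "Hello", "tables": ["<table placeholder>"]},
        {"text": "", "tables": []},
    ],
}

for row in summarize_extraction(sample):
    print(row)
```

Check a real response from the service before relying on these key names.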
## Deployment

This application is deployed as a Hugging Face Space using Docker.

## Local Development

To run this application locally:

1. Install the requirements:
   ```
   pip install -r requirements.txt
   ```
2. Run the application:
   ```
   python app.py
   ```
3. Access the API at `http://localhost:7860`
## Docker

You can also build and run with Docker:

```bash
docker build -t mineru-pdf-api .
docker run -p 7860:7860 mineru-pdf-api
```
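A freshly started container may need some time before it accepts requests. One way to handle that in a script is to poll the health endpoint until it responds. This sketch assumes `/health` answers with HTTP 200 once the service is ready; the function names, retry count, and delay are illustrative choices.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # connection refused, timeout, or error status


def wait_until_ready(url="http://localhost:7860/health", attempts=30,
                     delay=2.0, probe=is_up, sleep=time.sleep) -> bool:
    """Poll `url` up to `attempts` times, pausing `delay` seconds between tries.

    `probe` and `sleep` are parameters so the loop can be tested offline.
    """
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_ready()` right after `docker run` returns `True` as soon as the service answers, or `False` if it never does within the allotted attempts.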
## About

This API is built on top of MinerU and its magic-pdf package, a PDF extraction tool.

## API Documentation

Once deployed, you can access the auto-generated Swagger documentation at:

```
https://marcosremar2-docker-mineru.hf.space/docs
```

For ReDoc documentation:

```
https://marcosremar2-docker-mineru.hf.space/redoc
```
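Swagger UI and ReDoc are both rendered from an OpenAPI schema, so the same information can be read programmatically. The sketch below assumes the schema is served at `/openapi.json`, which is the default in FastAPI. This README does not name the framework, so treat that path as a guess. `list_operations` and `fetch_spec` are illustrative names.

```python
import json
import urllib.request

# Method keys allowed on an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> list:
    """Return "METHOD /path" strings for every operation in an OpenAPI document."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method in sorted(item):
            if method in HTTP_METHODS:  # skip non-method keys like "parameters"
                operations.append(f"{method.upper()} {path}")
    return operations


def fetch_spec(base_url: str, timeout: float = 10.0) -> dict:
    """Download the schema from <base_url>/openapi.json (assumed location)."""
    url = base_url.rstrip("/") + "/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `list_operations(fetch_spec("https://marcosremar2-docker-mineru.hf.space"))` would list the routes the running service exposes, provided the schema is available at that path.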