---
title: MinerU PDF Processor
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
---
# MinerU PDF API
A simple API for extracting text and tables from PDF documents using MinerU's magic-pdf library.
## Features
- Extract text from PDF documents
- Identify and extract tables from PDFs
- Works with both text-based and scanned PDFs
- Simple JSON response format
## API Endpoints
### Health Check
```
GET /health
```
Returns the current status of the service.
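As a rough illustration, the health check can be called from Python with only the standard library. The base URL below matches the local setup described in this README. The helper names `health_url` and `check_health` are hypothetical, and the response body format is not specified here, so the sketch tries JSON first and falls back to plain text.

```python
import json
import urllib.request

# Assumption: the service is running locally on the port from this README.
BASE_URL = "http://localhost:7860"


def health_url(base_url: str = BASE_URL) -> str:
    """Build the health-check URL, tolerating a trailing slash on the base."""
    return base_url.rstrip("/") + "/health"


def check_health(base_url: str = BASE_URL, timeout: float = 5.0):
    """Return (HTTP status code, body) for GET /health.

    The body is parsed as JSON when possible (an assumption about the
    response format) and returned as plain text otherwise.
    """
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        raw = resp.read().decode("utf-8", errors="replace")
        try:
            return resp.status, json.loads(raw)
        except json.JSONDecodeError:
            return resp.status, raw
```

Calling `check_health()` against a running instance returns the status code together with whatever the service reports. An unreachable service raises `urllib.error.URLError`.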
### Extract PDF Content
```
POST /extract
```
Upload a PDF file to extract its text and tables.
#### Request
- `file`: The PDF file to process, sent as a `multipart/form-data` field
#### Response
JSON object containing:
- `filename`: Original filename
- `pages`: Array of pages with text and tables
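The request and response described above could be exercised with a small client like the following sketch. It uses only the Python standard library and encodes the `file` field by hand. The helper names, the local base URL, and the generous timeout are assumptions for illustration, not part of this project.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(filename: str, data: bytes, field: str = "file"):
    """Encode a single file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    safe_name = filename.replace('"', "%22")  # keep the header well-formed
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{safe_name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_pdf(path: str, base_url: str = "http://localhost:7860",
                timeout: float = 300.0) -> dict:
    """POST a PDF to /extract and return the decoded JSON response."""
    with open(path, "rb") as fh:
        data = fh.read()
    body, content_type = build_multipart(os.path.basename(path), data)
    req = urllib.request.Request(
        base_url.rstrip("/") + "/extract",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Assumption: extraction can be slow on large or scanned PDFs,
    # hence the long default timeout.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a running instance, `result = extract_pdf("sample.pdf")` (a hypothetical file) should yield a dict with the `filename` and `pages` keys listed above. The structure of each page entry is not documented here, so inspect one entry before relying on specific keys.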
## Deployment
This application is deployed as a Hugging Face Space using Docker.
## Local Development
To run this application locally:
1. Install the requirements:
   ```
   pip install -r requirements.txt
   ```
2. Run the application:
   ```
   python app.py
   ```
3. Access the API at `http://localhost:7860`
## Docker
You can also build and run with Docker:
```bash
docker build -t mineru-pdf-api .
docker run -p 7860:7860 mineru-pdf-api
```
## About
This API is built on MinerU and its magic-pdf library, an open-source tool for extracting content from PDF documents.
## API Documentation
Once deployed, you can access the auto-generated Swagger documentation at:
```
https://marcosremar2-docker-mineru.hf.space/docs
```
For ReDoc documentation:
```
https://marcosremar2-docker-mineru.hf.space/redoc
```
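Swagger and ReDoc pages are normally rendered from an OpenAPI schema, so the same information can be read programmatically. The sketch below assumes the schema is served at `/openapi.json`, which is the FastAPI default. This README does not confirm the framework or that path, and the helper names are hypothetical.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(schema: dict) -> list:
    """Return sorted "METHOD /path" strings from an OpenAPI schema dict."""
    ops = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                ops.append(f"{key.upper()} {path}")
    return sorted(ops)


def fetch_schema(base_url: str, timeout: float = 10.0) -> dict:
    """Download the OpenAPI schema (assumed to live at /openapi.json)."""
    url = base_url.rstrip("/") + "/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `list_operations(fetch_schema("https://marcosremar2-docker-mineru.hf.space"))` would list the routes the deployed service exposes, provided the schema path assumption holds.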