---
title: MinerU PDF Processor
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
---

# MinerU PDF API

A simple API for extracting text and tables from PDF documents using MinerU's magic-pdf library.

## Features

- Extract text from PDF documents
- Identify and extract tables from PDFs
- Works with both regular and scanned PDFs
- Simple JSON response format

## API Endpoints

### Health Check

```
GET /health
```

Returns the current status of the service.
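
As a minimal sketch, the health check can be called from Python using only the standard library. The helper name `check_health` and the `opener` parameter are illustrative, not part of this project. The response body format is not documented here, so the sketch only inspects the HTTP status code and assumes a healthy service answers with 200.

```python
import urllib.error
import urllib.request


def check_health(base_url, opener=urllib.request.urlopen, timeout=10):
    """Return True if GET <base_url>/health answers with HTTP 200.

    `opener` is injectable so the function can be exercised without a
    running server; by default it is urllib.request.urlopen.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            # Assumption: a healthy service responds with status 200.
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


# Example (assumes the service is running locally on port 7860):
#     check_health("http://localhost:7860")
```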

### Extract PDF Content

```
POST /extract
```

Upload a PDF file to extract its text and tables.

#### Request

- `file`: The PDF file to process (multipart/form-data)

#### Response

JSON object containing:
- `filename`: Original filename
- `pages`: Array of pages with text and tables
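
The request and response described above can be exercised with a small client. This is a sketch using only the Python standard library; the helper names `build_multipart` and `extract_pdf`, the `application/pdf` part type, and the 300-second timeout are assumptions for illustration. Only the `file` field and the top-level `filename` and `pages` keys come from this README; the structure of each page entry is not specified here, so the sketch does not look inside it.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_pdf(base_url, pdf_path, timeout=300):
    """POST a PDF to <base_url>/extract and return the parsed JSON response."""
    with open(pdf_path, "rb") as fh:
        data = fh.read()
    # The form field is named `file`, as documented in the Request section.
    body, header = build_multipart("file", os.path.basename(pdf_path), data)
    request = urllib.request.Request(
        base_url.rstrip("/") + "/extract",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (assumes a local server on port 7860 and a file named sample.pdf):
#     result = extract_pdf("http://localhost:7860", "sample.pdf")
#     print(result["filename"], len(result["pages"]))
```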

## Deployment

This application is deployed as a Hugging Face Space using Docker.

## Local Development

To run this application locally:

1. Install the requirements:
   ```
   pip install -r requirements.txt
   ```

2. Run the application:
   ```
   python app.py
   ```

3. Access the API at `http://localhost:7860`

## Docker

You can also build and run with Docker:

```bash
docker build -t mineru-pdf-api .
docker run -p 7860:7860 mineru-pdf-api
```

## About

This API is built on top of MinerU and its magic-pdf library, a PDF extraction tool.

## API Documentation

Once deployed, you can access the auto-generated Swagger documentation at:

```
https://marcosremar2-docker-mineru.hf.space/docs
```

For ReDoc documentation:

```
https://marcosremar2-docker-mineru.hf.space/redoc
```