---
base_model:
- HuggingFaceTB/SmolVLM-256M-Instruct
language:
- en
library_name: mlx
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- mlx
---

# zboyles/SmolDocling-256M-preview-bf16
This model was converted to **MLX format** from [`ds4sd/SmolDocling-256M-preview`](https://huggingface.co/ds4sd/SmolDocling-256M-preview) using mlx-vlm version **0.1.18**.
* Refer to the [**original model card**](https://huggingface.co/ds4sd/SmolDocling-256M-preview) for more details on the model.
* Refer to the [**mlx-vlm repo**](https://github.com/Blaizzy/mlx-vlm) for more examples using `mlx-vlm`.
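
As a quick smoke test before the full pipeline below, you can load the converted weights and inspect the chat-formatted prompt the model expects. This is a minimal sketch that reuses only calls from the full example further down (the repo name is this model; `apply_chat_template` also accepts a plain string prompt):

```python
# Minimal sketch: load the MLX weights and build the chat-formatted prompt.
from mlx_vlm import load
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

repo = "zboyles/SmolDocling-256M-preview-bf16"
model, processor = load(repo, trust_remote_code=True)
config = load_config(repo)

prompt = apply_chat_template(
    processor, config, "Convert this page to docling.", add_generation_prompt=True
)
print(prompt)  # the templated prompt, including the image placeholder token
```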


## Use SmolDocling-256M-preview with Docling and MLX

> **Find Working MLX + Docling Example Code Below**


<div style="display: flex; align-items: center;">
    <img src="https://huggingface.co/ds4sd/SmolDocling-256M-preview/resolve/main/assets/SmolDocling_doctags1.png" alt="SmolDocling" style="width: 200px; height: auto; margin-right: 20px;">
    <div>
        <h3>SmolDocling-256M-preview</h3>
        <p>SmolDocling is a multimodal Image-Text-to-Text model designed for efficient document conversion. It retains Docling's most popular features while ensuring full compatibility with Docling through seamless support for <strong>DoclingDocuments</strong>.</p>
    </div>
</div>

This model was presented in the paper [SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion](https://huggingface.co/papers/2503.11576).

### πŸš€ Features:  
- 🏷️ **DocTags for Efficient Tokenization** – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with **DoclingDocuments** (see the illustrative sketch after this list).  
- πŸ” **OCR (Optical Character Recognition)** – Extracts text accurately from images.  
- πŸ“ **Layout and Localization** – Preserves document structure and document element **bounding boxes**.  
- πŸ’» **Code Recognition** – Detects and formats code blocks including identation.  
- πŸ”’ **Formula Recognition** – Identifies and processes mathematical expressions.  
- πŸ“Š **Chart Recognition** – Extracts and interprets chart data.  
- πŸ“‘ **Table Recognition** – Supports column and row headers for structured table extraction.  
- πŸ–ΌοΈ **Figure Classification** – Differentiates figures and graphical elements.  
- πŸ“ **Caption Correspondence** – Links captions to relevant images and figures.  
- πŸ“œ **List Grouping** – Organizes and structures list elements correctly.  
- πŸ“„ **Full-Page Conversion** – Processes entire pages for comprehensive document conversion including all page elements (code, equations, tables, charts etc.) 
- πŸ”² **OCR with Bounding Boxes** – OCR regions using a bounding box.
- πŸ“‚ **General Document Processing** – Trained for both scientific and non-scientific documents.  
- πŸ”„ **Seamless Docling Integration** – Import into **Docling** and export in multiple formats.
- πŸ’¨ **Fast inference using VLLM** – Avg of 0.35 secs per page on A100 GPU.

### 🚧 *Coming soon!*
- πŸ“Š **Better chart recognition πŸ› οΈ**
- πŸ“š **One shot multi-page inference ⏱️**
- πŸ§ͺ **Chemical Recognition**
- πŸ“™ **Datasets**

## ⌨️ Get started (**MLX** code examples)

You can use **mlx** to perform inference, and [Docling](https://github.com/docling-project/docling) to convert the results to a variety of output formats (Markdown, HTML, etc.):

<details>
<summary>πŸ“„ Single page image inference using MLX via `mlx-vlm` πŸ€–</summary>

```python
# Prerequisites:
# pip install -U mlx-vlm
# pip install docling_core

import sys

from pathlib import Path
from PIL import Image

from mlx_vlm import load
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config, load_image, stream_generate

from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument

# Variables
path_or_hf_repo = "zboyles/SmolDocling-256M-preview-bf16"
output_path = Path("output")
output_path.mkdir(exist_ok=True)

# Model Params
eos = "<end_of_utterance>"
verbose = True
kwargs = {
    "max_tokens": 8000,
    "temperature": 0.0,
}

# Load images
# Note: I manually downloaded the image
# image_src = "https://upload.wikimedia.org/wikipedia/commons/7/76/GazettedeFrance.jpg"
# image = load_image(image_src)
image_src = "images/GazettedeFrance.jpg"
image = Image.open(image_src).convert("RGB")

# Initialize processor and model
model, processor = load(
    path_or_hf_repo=path_or_hf_repo,
    trust_remote_code=True,
)
config = load_config(path_or_hf_repo)


# Create input messages - Docling Walkthrough Structure
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this page to docling."}
        ]
    },
]
prompt = apply_chat_template(processor, config, messages, add_generation_prompt=True)

# Alternatively, a simpler supported prompt-creation method:
# messages = [{"role": "user", "content": "Convert this page to docling."}]
# prompt = apply_chat_template(processor, config, messages, add_generation_prompt=True)


text = ""
last_response = None

for response in stream_generate(
    model=model,
    processor=processor,
    prompt=prompt,
    image=image,
    **kwargs
):
    if verbose:
        print(response.text, end="", flush=True)
    text += response.text
    last_response = response
    if eos in text:
        text = text.split(eos)[0].strip()
        break
print()

if verbose:
    print("\n" + "=" * 10)
    if len(text) == 0:
        print("No text generated for this prompt")
        sys.exit(0)
    print(
        f"Prompt: {last_response.prompt_tokens} tokens, "
        f"{last_response.prompt_tps:.3f} tokens-per-sec"
    )
    print(
        f"Generation: {last_response.generation_tokens} tokens, "
        f"{last_response.generation_tps:.3f} tokens-per-sec"
    )
    print(f"Peak memory: {last_response.peak_memory:.3f} GB")

# To convert to Docling Document, MD, HTML, etc.:
docling_output_path = output_path / Path(image_src).with_suffix(".dt").name
docling_output_path.write_text(text)
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([text], [image])
doc = DoclingDocument(name="Document")
doc.load_from_doctags(doctags_doc)
# export as any format
# HTML
doc.save_as_html(docling_output_path.with_suffix(".html"))
# MD
doc.save_as_markdown(docling_output_path.with_suffix(".md"))
```
</details>
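
The example also writes the raw DocTags to `output/*.dt`, so you can rebuild the document later without re-running inference. A small sketch, assuming the file and image paths produced by the example above:

```python
# Rebuild a DoclingDocument from previously saved DocTags.
from pathlib import Path

from PIL import Image
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument

doctags = Path("output/GazettedeFrance.dt").read_text()
image = Image.open("images/GazettedeFrance.jpg").convert("RGB")

doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [image])
doc = DoclingDocument(name="Document")
doc.load_from_doctags(doctags_doc)
print(doc.export_to_markdown())  # or save_as_html / save_as_markdown as above
```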

Thanks to [**@Blaizzy**](https://github.com/Blaizzy) for the [code examples](https://github.com/Blaizzy/mlx-vlm/tree/main/examples) that helped me quickly adapt the `docling` example.