---
base_model:
- HuggingFaceTB/SmolVLM-256M-Instruct
language:
- en
library_name: transformers
license: cdla-permissive-2.0
pipeline_tag: image-text-to-text
---
| Description | Instruction | Comment |
|---|---|---|
| Full conversion | Convert this page to docling. | DocTags representation |
| Chart | Convert chart to table. | (e.g., `<chart>`) |
| Formula | Convert formula to LaTeX. | (e.g., `<formula>`) |
| Code | Convert code to text. | (e.g., `<code>`) |
| Table | Convert table to OTSL. | (e.g., `<otsl>`) OTSL: Lysak et al., 2023 |
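To make the OTSL output concrete, here is a minimal sketch that decodes a simplified OTSL token stream into a 2-D grid. The token names (`<fcel>`, `<ecel>`, `<lcel>`, `<ucel>`, `<nl>`) follow the DocTags convention described in the SmolDocling paper, but treat the exact tag set and semantics here as assumptions to verify against the model's actual output.

```python
import re

# Hedged sketch: decode a simplified OTSL token stream into a 2-D grid.
# Tag names are assumptions based on the DocTags convention:
#   <fcel> full cell, <ecel> empty cell, <lcel> merged with the cell to
#   the left, <ucel> merged with the cell above, <nl> row separator.
def otsl_to_grid(otsl: str) -> list[list[str]]:
    rows: list[list[str]] = []
    row: list[str] = []
    # Each match is a tag plus any free text that follows it.
    for tag, text in re.findall(r"<(fcel|ecel|lcel|ucel|nl)>([^<]*)", otsl):
        if tag == "nl":            # end of the current row
            rows.append(row)
            row = []
        elif tag == "fcel":        # cell with content
            row.append(text.strip())
        elif tag == "ecel":        # empty cell
            row.append("")
        elif tag == "lcel":        # horizontal merge: copy value from the left
            row.append(row[-1] if row else "")
        elif tag == "ucel":        # vertical merge: copy value from above
            row.append(rows[-1][len(row)] if rows else "")
    if row:
        rows.append(row)
    return rows

grid = otsl_to_grid("<fcel>Name<fcel>Score<nl><fcel>A<fcel>0.9<nl>")
# grid → [["Name", "Score"], ["A", "0.9"]]
```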
#### Actions and Pipelines

- OCR the text in a specific location: `<loc_155><loc_233><loc_206><loc_237>`
- Identify element at: `<loc_247><loc_482><loc_252><loc_486>`
- Find all 'text' elements on the page, retrieve all section headers.
- Detect footer elements on the page.
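The `<loc_…>` tokens above encode a bounding box as `<loc_x1><loc_y1><loc_x2><loc_y2>`. The sketch below parses such a token sequence and scales it to pixel coordinates; the 0-500 normalized grid is an assumption based on the SmolDocling paper's location tokens, so verify it against the model's tokenizer before relying on it.

```python
import re

# Assumed size of the normalized location grid (0..500 per axis).
GRID = 500

def parse_loc(tokens: str, img_w: int, img_h: int) -> tuple[int, int, int, int]:
    """Parse <loc_x1><loc_y1><loc_x2><loc_y2> into a pixel bounding box."""
    vals = [int(v) for v in re.findall(r"<loc_(\d+)>", tokens)]
    if len(vals) != 4:
        raise ValueError(f"expected 4 loc tokens, got {len(vals)}")
    x1, y1, x2, y2 = vals
    # Scale from the normalized grid to the image's pixel dimensions.
    return (x1 * img_w // GRID, y1 * img_h // GRID,
            x2 * img_w // GRID, y2 * img_h // GRID)

box = parse_loc("<loc_155><loc_233><loc_206><loc_237>", img_w=1000, img_h=1000)
# box → (310, 466, 412, 474)
```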
|
#### Model Summary
- **Developed by:** Docling Team, IBM Research
- **Model type:** Multi-modal model (image+text)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Architecture:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
- **Finetuned from model:** Based on [SmolVLM-256M-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct)
- **Repository:** [Docling](https://github.com/docling-project/docling)
- **Paper:** [arXiv](https://arxiv.org/abs/2503.11576)
- **Project Page:** [Hugging Face](https://huggingface.co/ds4sd/SmolDocling-256M-preview)

**Citation:**
```
@misc{nassar2025smoldoclingultracompactvisionlanguagemodel,
title={SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion},
author={Ahmed Nassar and Andres Marafioti and Matteo Omenetti and Maksym Lysak and Nikolaos Livathinos and Christoph Auer and Lucas Morin and Rafael Teixeira de Lima and Yusik Kim and A. Said Gurbuz and Michele Dolfi and Miquel FarrΓ© and Peter W. J. Staar},
year={2025},
eprint={2503.11576},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.11576},
}
```
- **Demo:** [HF Space](https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo)