Docling simplifies document processing: it parses diverse formats, with advanced PDF understanding, and integrates with the generative AI ecosystem.
## Features
* 🗂️ Parsing of [multiple document formats][supported_formats] incl. PDF, DOCX, XLSX, HTML, images, and more
* 📑 Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
* 🧬 Unified, expressive [DoclingDocument][docling_document] representation format
* ↪️ Various [export formats][supported_formats] and options, including Markdown, HTML, and lossless JSON
* 🔒 Local execution capabilities for sensitive data and air-gapped environments
* 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
* 🔍 Extensive OCR support for scanned PDFs and images
* 💻 Simple and convenient CLI
### Coming soon
* 📝 Metadata extraction, including title, authors, references & language
* 📝 Inclusion of Visual Language Models ([SmolDocling](https://huggingface.co/blog/smolervlm#smoldocling))
* 📝 Chart understanding (bar charts, pie charts, line plots, etc.)
* 📝 Complex chemistry understanding (molecular structures)
## Get started
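As a minimal sketch of a first conversion, assuming Docling is installed (for example with `pip install docling`), the snippet below converts a document and exports it to Markdown. `convert_to_markdown` is a hypothetical helper name used only for illustration and is not part of Docling; the commented sample URL points to the Docling technical report on arXiv.

```python
from typing import Any, Optional


def convert_to_markdown(source: str, converter: Optional[Any] = None) -> str:
    """Convert a document (local path or URL) and return it as Markdown.

    `converter` is optional: pass an existing DocumentConverter to reuse it
    across several calls instead of creating a new one each time.
    """
    if converter is None:
        # Imported lazily so the helper can be defined without docling present.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    result = converter.convert(source)
    # result.document is the unified DoclingDocument representation.
    return result.document.export_to_markdown()


# Example call (downloads the PDF and runs the conversion pipeline locally):
# print(convert_to_markdown("https://arxiv.org/pdf/2408.09869"))
```

The same `result.document` object can likely be exported to the other formats listed under Features, for instance with `export_to_html()` or `export_to_dict()` for JSON-style output; check the export formats page for the options available in your version.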
## IBM ❤️ Open Source AI
Docling has been brought to you by IBM.
[supported_formats]: ./supported_formats.md
[docling_document]: ./concepts/docling_document.md
[integrations]: ./integrations/index.md