# Docling
[arXiv](https://arxiv.org/abs/2408.09869) [PyPI](https://pypi.org/project/docling/) [Poetry](https://python-poetry.org/) [Code style: black](https://github.com/psf/black) [Imports: isort](https://pycqa.github.io/isort/) [Pydantic](https://pydantic.dev) [pre-commit](https://github.com/pre-commit/pre-commit) [License: MIT](https://opensource.org/licenses/MIT)

Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.

## Features

* ⚡ Converts any PDF document to JSON or Markdown format, stable and lightning fast
* 📑 Understands detailed page layout, reading order and recovers table structures
* 📝 Extracts metadata from the document, such as title, authors, references and language
* 🔍 Includes OCR support for scanned PDFs
* 🤖 Integrates easily with LLM app / RAG frameworks like LlamaIndex 🦙 & LangChain 🦜🔗
* 💻 Provides a simple and convenient CLI
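
To make the LLM / RAG point above more concrete, here is a minimal sketch of what a downstream step might do with converted Markdown: split it into one chunk per heading before indexing. It does not call Docling itself. `SAMPLE_MARKDOWN` is a made-up stand-in for conversion output, and `split_markdown_by_heading` is a hypothetical helper, not part of Docling, LlamaIndex, or LangChain.

```python
import re
from typing import Dict, List

# Made-up stand-in for Markdown produced by a PDF conversion step (assumption).
SAMPLE_MARKDOWN = """# Sample Paper

Intro paragraph.

## Methods

We did things.

## Results

| metric | value |
|--------|-------|
| score  | 0.9   |
"""

# Matches ATX headings such as "# Title" or "### Subsection".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_heading(md_text: str) -> List[Dict[str, str]]:
    """Split Markdown into one chunk per heading (hypothetical helper).

    Limitation: lines starting with '#' inside fenced code blocks are
    also treated as headings.
    """
    chunks: List[Dict[str, str]] = []
    title = ""
    body: List[str] = []

    def flush() -> None:
        # Emit the current chunk unless it is completely empty.
        if title or any(line.strip() for line in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in md_text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    for chunk in split_markdown_by_heading(SAMPLE_MARKDOWN):
        print(f"{chunk['title']!r}: {len(chunk['text'])} chars")
```

Splitting on headings keeps each table together with the section it belongs to, which tends to suit retrieval better than fixed-size character windows. The frameworks named above ship their own splitters, so treat this only as an illustration of the idea.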