Docling/docs/index.md
Panos Vagenas 2d24faecd9
docs: add integrations, revamp docs (#693)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-07 14:15:54 +01:00

Docling

Docling parses documents and exports them to the desired format with ease and speed.

Features

  • 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to HTML, Markdown and JSON (with embedded and referenced images)
  • 📑 Advanced PDF document understanding, including page layout, reading order & table structures
  • 🧩 Unified, expressive DoclingDocument representation format
  • 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
  • 🔍 OCR support for scanned PDFs
  • 💻 Simple and convenient CLI
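
The conversion and export features listed above can be sketched in a few lines of Python. This is a minimal sketch rather than the project's reference usage: it assumes the `docling` package is installed and that `DocumentConverter`, `export_to_markdown()`, `export_to_html()` and `export_to_dict()` are available in the installed version. The helper names `export_document` and `convert_file`, and the short format keys `"md"`, `"html"` and `"json"`, are hypothetical and chosen only for illustration.

```python
import json


def export_document(doc, fmt: str) -> str:
    """Serialize a DoclingDocument-like object to Markdown, HTML or JSON.

    `fmt` is a hypothetical short key ("md", "html" or "json"), not a Docling option.
    """
    if fmt == "md":
        return doc.export_to_markdown()
    if fmt == "html":
        return doc.export_to_html()
    if fmt == "json":
        # export_to_dict() is assumed to return a JSON-serializable dict.
        return json.dumps(doc.export_to_dict(), indent=2)
    raise ValueError(f"unsupported format: {fmt!r}")


def convert_file(source: str, fmt: str = "md") -> str:
    """Convert a local path or URL with Docling and return the exported text."""
    # Imported lazily so export_document() stays usable without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return export_document(result.document, fmt)


# Example call (requires Docling; "report.pdf" is a placeholder path):
# print(convert_file("report.pdf", fmt="md"))
```

Keeping the export step separate from the conversion step means the same converted document can be written out in several formats without parsing the source file again.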

Coming soon

  • ♾️ Equation & code extraction
  • 📝 Metadata extraction, including title, authors, references & language
  • 🦜🔗 Native LangChain extension

Get started

IBM ❤️ Open Source AI

Docling has been brought to you by IBM.