
Docling parses documents and exports them to the desired format with ease and speed.
Features
- 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to HTML, Markdown and JSON (with embedded and referenced images)
- 📑 Advanced PDF document understanding, including page layout, reading order & table structures
- 🧩 Unified, expressive DoclingDocument representation format
- 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
- 🔍 OCR support for scanned PDFs
- 💻 Simple and convenient CLI
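
As a rough illustration of the parse-then-export flow listed above, here is a minimal Python sketch. It assumes the `docling` package is installed and that the converted document exposes `export_to_markdown()`, `export_to_html()` and `export_to_dict()`. The helper names `export_document` and `convert_file` are hypothetical, chosen for this example only; they are not part of Docling.

```python
import json
from pathlib import Path


def export_document(doc, fmt: str) -> str:
    """Serialize a converted document to Markdown, HTML, or JSON text.

    `doc` is assumed to behave like a DoclingDocument.
    """
    if fmt == "md":
        return doc.export_to_markdown()
    if fmt == "html":
        return doc.export_to_html()
    if fmt == "json":
        return json.dumps(doc.export_to_dict(), indent=2)
    raise ValueError(f"unsupported format: {fmt!r}")


def convert_file(source: str, fmt: str = "md", out_dir: str = ".") -> Path:
    """Convert a local file and write the result next to `out_dir` (hypothetical helper)."""
    # Imported lazily so export_document() can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    target = Path(out_dir) / f"{Path(source).stem}.{fmt}"
    target.write_text(export_document(result.document, fmt), encoding="utf-8")
    return target
```

Calling `convert_file("report.pdf", fmt="html")` would, under these assumptions, write `report.html` to the current directory; `report.pdf` is a placeholder file name.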
Coming soon
- ♾️ Equation & code extraction
- 📝 Metadata extraction, including title, authors, references & language
- 🦜🔗 Native LangChain extension
IBM ❤️ Open Source AI
Docling has been brought to you by IBM.