diff --git a/README.md b/README.md
index 1c350cd..cafb15b 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@

-# Docling
+# 🦆 Docling

 DS4SD%2Fdocling | Trendshift
@@ -29,7 +29,7 @@ Docling parses documents and exports them to the desired format with ease and sp
 * 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
 * 📑 Advanced PDF document understanding including page layout, reading order & table structures
 * 🧩 Unified, expressive [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) representation format
-* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
+* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
 * 💻 Simple and convenient CLI
@@ -65,8 +65,24 @@ result = converter.convert(source)
 print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
 ```

-Check out [Getting started](https://ds4sd.github.io/docling/).
-You will find lots of tuning options to leverage all the advanced capabilities.
+More [advanced usage options](https://ds4sd.github.io/docling/usage/) are available in
+the docs.
+
+## Documentation
+
+Check out Docling's [documentation](https://ds4sd.github.io/docling/), for details on
+installation, usage, concepts, recipes, extensions, and more.
+
+## Examples
+
+Go hands-on with our [examples](https://ds4sd.github.io/docling/examples/),
+demonstrating how to address different application use cases with Docling.
+
+## Integrations
+
+To further accelerate your AI application development, check out Docling's native
+[integrations](https://ds4sd.github.io/docling/integrations/) with popular frameworks
+and tools.
 ## Get help and support
diff --git a/docs/assets/docling_ecosystem.png b/docs/assets/docling_ecosystem.png
new file mode 100644
index 0000000..299f62d
Binary files /dev/null and b/docs/assets/docling_ecosystem.png differ
diff --git a/docs/assets/docling_ecosystem.pptx b/docs/assets/docling_ecosystem.pptx
new file mode 100644
index 0000000..14ec044
Binary files /dev/null and b/docs/assets/docling_ecosystem.pptx differ
diff --git a/docs/index.md b/docs/index.md
index 27c926f..3ae3ceb 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,5 +1,3 @@
-# Docling
-

 Docling
 DS4SD%2Fdocling | Trendshift
@@ -23,7 +21,7 @@ Docling parses documents and exports them to the desired format with ease and sp
 * 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
 * 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
 * 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
-* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
+* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
 * 💻 Simple and convenient CLI
diff --git a/docs/integrations/bee.md b/docs/integrations/bee.md
new file mode 100644
index 0000000..168fd78
--- /dev/null
+++ b/docs/integrations/bee.md
@@ -0,0 +1,9 @@
+Docling is available as an extraction backend in the [Bee][github] framework.
+
+- 💻 [Bee GitHub][github]
+- 📖 [Bee Docs][docs]
+- 📦 [Bee NPM][package]
+
+[github]: https://github.com/i-am-bee
+[docs]: https://i-am-bee.github.io/bee-agent-framework/
+[package]: https://www.npmjs.com/package/bee-agent-framework
diff --git a/docs/integrations/index.md b/docs/integrations/index.md
index 3539c2f..fb8d7fe 100644
--- a/docs/integrations/index.md
+++ b/docs/integrations/index.md
@@ -1 +1,6 @@
 Use the navigation on the left to browse through Docling integrations with popular frameworks and tools.
+
+

+ Docling
+

diff --git a/docs/integrations/instructlab.md b/docs/integrations/instructlab.md
new file mode 100644
index 0000000..5f3b331
--- /dev/null
+++ b/docs/integrations/instructlab.md
@@ -0,0 +1,17 @@
+Docling is powering document processing in [InstructLab](https://instructlab.ai/),
+enabling users to unlock the knowledge hidden in documents and present it to
+InstructLab's fine-tuning for aligning AI models to the user's specific data.
+
+More details can be found in this [blog post][blog].
+
+- 🏠 [InstructLab Home][home]
+- 💻 [InstructLab GitHub][github]
+- 🧑🏻‍💻 [InstructLab UI][ui]
+- 📖 [InstructLab Docs][docs]
+
+
+[home]: https://instructlab.ai
+[github]: https://github.com/instructlab
+[ui]: https://ui.instructlab.ai/
+[docs]: https://docs.instructlab.ai/
+[blog]: https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
diff --git a/docs/integrations/prodigy.md b/docs/integrations/prodigy.md
new file mode 100644
index 0000000..8bb40c2
--- /dev/null
+++ b/docs/integrations/prodigy.md
@@ -0,0 +1,9 @@
+Docling is available in [Prodigy][home] as a [Prodigy-PDF plugin][plugin] recipe.
+
+- 🌐 [Prodigy Home][home]
+- 🔌 [Prodigy-PDF Plugin][plugin]
+- 🧑🏽‍🍳 [pdf-spans.manual Recipe][recipe]
+
+[home]: https://prodi.gy/
+[plugin]: https://prodi.gy/docs/plugins#pdf
+[recipe]: https://prodi.gy/docs/plugins#pdf-spans.manual
diff --git a/docs/integrations/spacy.md b/docs/integrations/spacy.md
index 82a2089..a057870 100644
--- a/docs/integrations/spacy.md
+++ b/docs/integrations/spacy.md
@@ -1,3 +1,5 @@
+# spaCy
+
 Docling is available in [spaCy](https://spacy.io/) as the "SpaCy Layout" plugin:

 - 💻 [SpacyLayout GitHub][github]
diff --git a/docs/overrides/main.html b/docs/overrides/main.html
index 195acaf..74749ab 100644
--- a/docs/overrides/main.html
+++ b/docs/overrides/main.html
@@ -1,5 +1,7 @@
 {% extends "base.html" %}
+{#
 {% block announce %}

🎉 Docling has gone v2! Check out what's new and how to get started!

 {% endblock %}
+#}
diff --git a/mkdocs.yml b/mkdocs.yml
index 43012b1..dd4502d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -52,8 +52,8 @@ theme:
     - search.suggest
     - toc.follow
 nav:
-  - Get started:
-    - Home: index.md
+  - Home:
+    - "🦆 Docling": index.md
     - Installation: installation.md
     - Usage: usage.md
     - CLI: cli.md
@@ -85,10 +85,13 @@ nav:
     # - CLI: examples/cli.md
   - Integrations:
     - Integrations: integrations/index.md
+    - "🐝 Bee": integrations/bee.md
     - "Data Prep Kit": integrations/data_prep_kit.md
     - "DocETL": integrations/docetl.md
+    - "🐶 InstructLab": integrations/instructlab.md
     - "Kotaemon": integrations/kotaemon.md
-    - "LlamaIndex 🦙": integrations/llamaindex.md
+    - "🦙 LlamaIndex": integrations/llamaindex.md
+    - "Prodigy": integrations/prodigy.md
     - "spaCy": integrations/spacy.md
     # - "LangChain 🦜🔗": integrations/langchain.md
   # - API reference: