docs: extend integration docs & README (#456)

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Panos Vagenas 2024-11-28 09:41:21 +01:00 committed by GitHub
parent 211f4f7570
commit 84c46fdeb3
11 changed files with 71 additions and 10 deletions


@@ -4,7 +4,7 @@
 </a>
 </p>
-# Docling
+# 🦆 Docling
 <p align="center">
 <a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
@@ -29,7 +29,7 @@ Docling parses documents and exports them to the desired format with ease and sp
 * 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
 * 📑 Advanced PDF document understanding including page layout, reading order & table structures
 * 🧩 Unified, expressive [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) representation format
-* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
+* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
 * 💻 Simple and convenient CLI
@@ -65,8 +65,24 @@ result = converter.convert(source)
 print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
 ```
-Check out [Getting started](https://ds4sd.github.io/docling/).
-You will find lots of tuning options to leverage all the advanced capabilities.
+More [advanced usage options](https://ds4sd.github.io/docling/usage/) are available in
+the docs.
+## Documentation
+Check out Docling's [documentation](https://ds4sd.github.io/docling/), for details on
+installation, usage, concepts, recipes, extensions, and more.
+## Examples
+Go hands-on with our [examples](https://ds4sd.github.io/docling/examples/),
+demonstrating how to address different application use cases with Docling.
+## Integrations
+To further accelerate your AI application development, check out Docling's native
+[integrations](https://ds4sd.github.io/docling/integrations/) with popular frameworks
+and tools.
 ## Get help and support

Binary file not shown (size after change: 233 KiB).

Binary file not shown.


@@ -1,5 +1,3 @@
-# Docling
 <p align="center">
 <img loading="lazy" alt="Docling" src="assets/docling_processing.png" width="100%" />
 <a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
@@ -23,7 +21,7 @@ Docling parses documents and exports them to the desired format with ease and sp
 * 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
 * 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
 * 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
-* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
+* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
 * 💻 Simple and convenient CLI

docs/integrations/bee.md

@@ -0,0 +1,9 @@
+Docling is available as an extraction backend in the [Bee][github] framework.
+- 💻 [Bee GitHub][github]
+- 📖 [Bee Docs][docs]
+- 📦 [Bee NPM][package]
+[github]: https://github.com/i-am-bee
+[docs]: https://i-am-bee.github.io/bee-agent-framework/
+[package]: https://www.npmjs.com/package/bee-agent-framework


@@ -1 +1,6 @@
 Use the navigation on the left to browse through Docling integrations with popular frameworks and tools.
+<p align="center">
+<img loading="lazy" alt="Docling" src="../assets/docling_ecosystem.png" width="100%" />
+</p>


@@ -0,0 +1,17 @@
+Docling is powering document processing in [InstructLab](https://instructlab.ai/),
+enabling users to unlock the knowledge hidden in documents and present it to
+InstructLab's fine-tuning for aligning AI models to the user's specific data.
+More details can be found in this [blog post][blog].
+- 🏠 [InstructLab Home][home]
+- 💻 [InstructLab GitHub][github]
+- 🧑🏻‍💻 [InstructLab UI][ui]
+- 📖 [InstructLab Docs][docs]
+<!-- - 📝 [Blog post]() -->
+[home]: https://instructlab.ai
+[github]: https://github.com/instructlab
+[ui]: https://ui.instructlab.ai/
+[docs]: https://docs.instructlab.ai/
+[blog]: https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai


@@ -0,0 +1,9 @@
+Docling is available in [Prodigy][home] as a [Prodigy-PDF plugin][plugin] recipe.
+- 🌐 [Prodigy Home][home]
+- 🔌 [Prodigy-PDF Plugin][plugin]
+- 🧑🏽‍🍳 [pdf-spans.manual Recipe][recipe]
+[home]: https://prodi.gy/
+[plugin]: https://prodi.gy/docs/plugins#pdf
+[recipe]: https://prodi.gy/docs/plugins#pdf-spans.manual


@@ -1,3 +1,5 @@
+# spaCy
 Docling is available in [spaCy](https://spacy.io/) as the "SpaCy Layout" plugin:
 - 💻 [SpacyLayout GitHub][github]


@@ -1,5 +1,7 @@
 {% extends "base.html" %}
+{#
 {% block announce %}
 <p>🎉 Docling has gone v2! <a href="{{ 'v2' | url }}">Check out</a> what's new and how to get started!</p>
 {% endblock %}
+#}


@@ -52,8 +52,8 @@ theme:
 - search.suggest
 - toc.follow
 nav:
-- Get started:
-- Home: index.md
+- Home:
+- "🦆 Docling": index.md
 - Installation: installation.md
 - Usage: usage.md
 - CLI: cli.md
@@ -85,10 +85,13 @@ nav:
 # - CLI: examples/cli.md
 - Integrations:
 - Integrations: integrations/index.md
+- "🐝 Bee": integrations/bee.md
 - "Data Prep Kit": integrations/data_prep_kit.md
 - "DocETL": integrations/docetl.md
+- "🐶 InstructLab": integrations/instructlab.md
 - "Kotaemon": integrations/kotaemon.md
-- "LlamaIndex 🦙": integrations/llamaindex.md
+- "🦙 LlamaIndex": integrations/llamaindex.md
+- "Prodigy": integrations/prodigy.md
 - "spaCy": integrations/spacy.md
 # - "LangChain 🦜🔗": integrations/langchain.md
 # - API reference: