docs: extend integration docs & README (#456)

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Panos Vagenas 2024-11-28 09:41:21 +01:00 committed by GitHub
parent 211f4f7570
commit 84c46fdeb3
11 changed files with 71 additions and 10 deletions

@ -4,7 +4,7 @@
</a>
</p>
# Docling
# 🦆 Docling
<p align="center">
<a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
@ -29,7 +29,7 @@ Docling parses documents and exports them to the desired format with ease and sp
* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
* 📑 Advanced PDF document understanding including page layout, reading order & table structures
* 🧩 Unified, expressive [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) representation format
* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
* 🔍 OCR support for scanned PDFs
* 💻 Simple and convenient CLI
@ -65,8 +65,24 @@ result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
```
Check out [Getting started](https://ds4sd.github.io/docling/).
You will find lots of tuning options to leverage all the advanced capabilities.
More [advanced usage options](https://ds4sd.github.io/docling/usage/) are available in
the docs.
## Documentation
Check out Docling's [documentation](https://ds4sd.github.io/docling/), for details on
installation, usage, concepts, recipes, extensions, and more.
## Examples
Go hands-on with our [examples](https://ds4sd.github.io/docling/examples/),
demonstrating how to address different application use cases with Docling.
## Integrations
To further accelerate your AI application development, check out Docling's native
[integrations](https://ds4sd.github.io/docling/integrations/) with popular frameworks
and tools.
## Get help and support

Binary file not shown (image, 233 KiB).

Binary file not shown.

@ -1,5 +1,3 @@
# Docling
<p align="center">
<img loading="lazy" alt="Docling" src="assets/docling_processing.png" width="100%" />
<a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
@ -23,7 +21,7 @@ Docling parses documents and exports them to the desired format with ease and sp
* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
* 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
* 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
* 🔍 OCR support for scanned PDFs
* 💻 Simple and convenient CLI

docs/integrations/bee.md

@ -0,0 +1,9 @@
Docling is available as an extraction backend in the [Bee][github] framework.
- 💻 [Bee GitHub][github]
- 📖 [Bee Docs][docs]
- 📦 [Bee NPM][package]
[github]: https://github.com/i-am-bee
[docs]: https://i-am-bee.github.io/bee-agent-framework/
[package]: https://www.npmjs.com/package/bee-agent-framework

@ -1 +1,6 @@
Use the navigation on the left to browse through Docling integrations with popular frameworks and tools.
<p align="center">
<img loading="lazy" alt="Docling" src="../assets/docling_ecosystem.png" width="100%" />
</p>

@ -0,0 +1,17 @@
Docling is powering document processing in [InstructLab](https://instructlab.ai/),
enabling users to unlock the knowledge hidden in documents and present it to
InstructLab's fine-tuning for aligning AI models to the user's specific data.
More details can be found in this [blog post][blog].
- 🏠 [InstructLab Home][home]
- 💻 [InstructLab GitHub][github]
- 🧑🏻‍💻 [InstructLab UI][ui]
- 📖 [InstructLab Docs][docs]
<!-- - 📝 [Blog post]() -->
[home]: https://instructlab.ai
[github]: https://github.com/instructlab
[ui]: https://ui.instructlab.ai/
[docs]: https://docs.instructlab.ai/
[blog]: https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai

@ -0,0 +1,9 @@
Docling is available in [Prodigy][home] as a [Prodigy-PDF plugin][plugin] recipe.
- 🌐 [Prodigy Home][home]
- 🔌 [Prodigy-PDF Plugin][plugin]
- 🧑🏽‍🍳 [pdf-spans.manual Recipe][recipe]
[home]: https://prodi.gy/
[plugin]: https://prodi.gy/docs/plugins#pdf
[recipe]: https://prodi.gy/docs/plugins#pdf-spans.manual

@ -1,3 +1,5 @@
# spaCy
Docling is available in [spaCy](https://spacy.io/) as the "SpaCy Layout" plugin:
- 💻 [SpacyLayout GitHub][github]

@ -1,5 +1,7 @@
{% extends "base.html" %}
{#
{% block announce %}
<p>🎉 Docling has gone v2! <a href="{{ 'v2' | url }}">Check out</a> what's new and how to get started!</p>
{% endblock %}
#}

@ -52,8 +52,8 @@ theme:
- search.suggest
- toc.follow
nav:
- Get started:
- Home: index.md
- Home:
- "🦆 Docling": index.md
- Installation: installation.md
- Usage: usage.md
- CLI: cli.md
@ -85,10 +85,13 @@ nav:
# - CLI: examples/cli.md
- Integrations:
- Integrations: integrations/index.md
- "🐝 Bee": integrations/bee.md
- "Data Prep Kit": integrations/data_prep_kit.md
- "DocETL": integrations/docetl.md
- "🐶 InstructLab": integrations/instructlab.md
- "Kotaemon": integrations/kotaemon.md
- "LlamaIndex 🦙": integrations/llamaindex.md
- "🦙 LlamaIndex": integrations/llamaindex.md
- "Prodigy": integrations/prodigy.md
- "spaCy": integrations/spacy.md
# - "LangChain 🦜🔗": integrations/langchain.md
# - API reference: