docs: add DocETL, Kotaemon, spaCy integrations; minor docs improvements (#408)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
parent 97d571af97
commit 7a45b92078
@@ -1 +1 @@
-Use the navigation on the left to browse some core Docling concepts.
+Use the navigation on the left to browse through some core Docling concepts.
@@ -7,13 +7,14 @@
 
 [](https://arxiv.org/abs/2408.09869)
 [](https://pypi.org/project/docling/)
-
+[](https://pypi.org/project/docling/)
 [](https://python-poetry.org/)
 [](https://github.com/psf/black)
 [](https://pycqa.github.io/isort/)
 [](https://pydantic.dev)
 [](https://github.com/pre-commit/pre-commit)
 [](https://opensource.org/licenses/MIT)
+[](https://pepy.tech/projects/docling)
 
 Docling parses documents and exports them to the desired format with ease and speed.
 
9
docs/integrations/.template.md
Normal file
@@ -0,0 +1,9 @@
+Docling is available as a plugin for [EXAMPLE](https://example.com).
+
+- 💻 [GitHub][github]
+- 📖 [Docs][docs]
+- 📦 [PyPI][pypi]
+
+[github]: https://github.com/...
+[docs]: https://...
+[pypi]: https://pypi.org/project/...
@@ -1,13 +1,13 @@
 ## Get started
 
-Docling is used by the [Data Prep Kit \[↗\]](https://ibm.github.io/data-prep-kit/) open-source toolkit for preparing unstructured data for LLM application development ranging from laptop scale to datacenter scale.
+Docling is used by the [Data Prep Kit](https://ibm.github.io/data-prep-kit/) open-source toolkit for preparing unstructured data for LLM application development ranging from laptop scale to datacenter scale.
 
 Below you find the Data Prep Kit modules powered by Docling.
 
 ## PDF ingestion to Parquet
-- 💻 [GitHub \[↗\]](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/pdf2parquet)
+- 💻 [PDF-to-Parquet GitHub](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/pdf2parquet)
-- 📖 [API docs \[↗\]](https://ibm.github.io/data-prep-kit/transforms/language/pdf2parquet/python/)
+- 📖 [PDF-to-Parquet Docs](https://ibm.github.io/data-prep-kit/transforms/language/pdf2parquet/python/)
 
 ## Document chunking
-- 💻 [GitHub \[↗\]](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/doc_chunk)
+- 💻 [Doc Chunking GitHub](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/doc_chunk)
-- 📖 [API docs \[↗\]](https://ibm.github.io/data-prep-kit/transforms/language/doc_chunk/python/)
+- 📖 [Doc Chunking Docs](https://ibm.github.io/data-prep-kit/transforms/language/doc_chunk/python/)
9
docs/integrations/docetl.md
Normal file
@@ -0,0 +1,9 @@
+Docling is available as a file conversion method in [DocETL](https://github.com/ucbepic/docetl):
+
+- 💻 [DocETL GitHub][github]
+- 📖 [DocETL Docs][docs]
+- 📦 [DocETL PyPI][pypi]
+
+[github]: https://github.com/ucbepic/docetl
+[docs]: https://ucbepic.github.io/docetl/
+[pypi]: https://pypi.org/project/docetl/
9
docs/integrations/kotaemon.md
Normal file
@@ -0,0 +1,9 @@
+Docling is available in [Kotaemon](https://cinnamon.github.io/kotaemon/) as the `DoclingReader` loader:
+
+- 💻 [Kotaemon GitHub][github]
+- 📖 [DoclingReader Docs][docs]
+- ⚙️ [Docling Setup in Kotaemon][setup]
+
+[github]: https://github.com/Cinnamon/kotaemon
+[docs]: https://cinnamon.github.io/kotaemon/reference/loaders/docling_loader/
+[setup]: https://cinnamon.github.io/kotaemon/development/?h=docling#setup-multimodal-document-parsing-ocr-table-parsing-figure-extraction
@@ -1,8 +1,8 @@
 ## Get started
 
-Docling is available as an official [LlamaIndex \[↗\]](https://docs.llamaindex.ai/) extension.
+Docling is available as an official [LlamaIndex](https://docs.llamaindex.ai/) extension.
 
-To get started, check out the [step-by-step guide in LlamaIndex \[↗\]](https://docs.llamaindex.ai/en/stable/examples/data_connectors/DoclingReaderDemo/)<!--{target="_blank"}-->.
+To get started, check out the [step-by-step guide in LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/data_connectors/DoclingReaderDemo/).
 
 ## Components
 
@@ -10,16 +10,14 @@ To get started, check out the [step-by-step guide in LlamaIndex \[↗\]](https:/
 
 Reads document files and uses Docling to populate LlamaIndex `Document` objects — either serializing Docling's data model (losslessly, e.g. as JSON) or exporting to a simplified format (lossily, e.g. as Markdown).
 
-- 💻 [GitHub \[↗\]](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/readers/llama-index-readers-docling)<!--{target="_blank"}-->
+- 💻 [Docling Reader GitHub](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/readers/llama-index-readers-docling)
-- 📖 [API docs \[↗\]](https://docs.llamaindex.ai/en/stable/api_reference/readers/docling/)<!--{target="_blank"} -->
+- 📖 [Docling Reader Docs](https://docs.llamaindex.ai/en/stable/api_reference/readers/docling/)
-- 📦 [PyPI \[↗\]](https://pypi.org/project/llama-index-readers-docling/)<!--{target="_blank"}-->
+- 📦 [Docling Reader PyPI](https://pypi.org/project/llama-index-readers-docling/)
-- 🦙 [LlamaHub \[↗\]](https://llamahub.ai/l/readers/llama-index-readers-docling)<!--{target="_blank"}-->
 
 ### Docling Node Parser
 
 Reads LlamaIndex `Document` objects populated in Docling's format by Docling Reader and, using its knowledge of the Docling format, parses them to LlamaIndex `Node` objects for downstream usage in LlamaIndex applications, e.g. as chunks for embedding.
 
-- 💻 [GitHub \[↗\]](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/node_parser/llama-index-node-parser-docling)<!--{target="_blank"}-->
+- 💻 [Docling Node Parser GitHub](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/node_parser/llama-index-node-parser-docling)
-- 📖 [API docs \[↗\]](https://docs.llamaindex.ai/en/stable/api_reference/node_parser/docling/)<!--{target="_blank"} -->
+- 📖 [Docling Node Parser Docs](https://docs.llamaindex.ai/en/stable/api_reference/node_parser/docling/)
-- 📦 [PyPI \[↗\]](https://pypi.org/project/llama-index-node-parser-docling/)<!--{target="_blank"}-->
+- 📦 [Docling Node Parser PyPI](https://pypi.org/project/llama-index-node-parser-docling/)
-- 🦙 [LlamaHub \[↗\]](https://llamahub.ai/l/node_parser/llama-index-node-parser-docling)<!--{target="_blank"}-->
9
docs/integrations/spacy.md
Normal file
@@ -0,0 +1,9 @@
+Docling is available in [spaCy](https://spacy.io/) as the "SpaCy Layout" plugin:
+
+- 💻 [SpacyLayout GitHub][github]
+- 📖 [SpacyLayout Docs][docs]
+- 📦 [SpacyLayout PyPI][pypi]
+
+[github]: https://github.com/explosion/spacy-layout
+[docs]: https://github.com/explosion/spacy-layout?tab=readme-ov-file#readme
+[pypi]: https://pypi.org/project/spacy-layout/
@@ -38,6 +38,7 @@ theme:
 - content.code.annotate
 - content.code.copy
 - announce.dismiss
+- navigation.footer
 - navigation.tabs
 - navigation.indexes # <= if set, each "section" can have its own page, if index.md is used
 - navigation.instant
@@ -85,7 +86,10 @@ nav:
 - Integrations:
 - Integrations: integrations/index.md
 - "Data Prep Kit": integrations/data_prep_kit.md
+- "DocETL": integrations/docetl.md
+- "Kotaemon": integrations/kotaemon.md
 - "LlamaIndex 🦙": integrations/llamaindex.md
+- "spaCy": integrations/spacy.md
 # - "LangChain 🦜🔗": integrations/langchain.md
 # - API reference:
 # - API reference: api_reference/index.md