chore: move to docling-project org (#1160)
* chore: rename org

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update docs/faq/index.md

Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

* update github pages

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* revert test content

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
parent f94da44ec5
commit fa16b12316
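The rename applied throughout this commit can be sketched as a small search-and-replace helper. This is a minimal sketch, not the method the author used: the function name `rewrite_org_refs` and both patterns are assumptions inferred from the URLs changed in the diff. The Hugging Face URL (`huggingface.co/ds4sd/docling-models`) is left unchanged by the commit, so the sketch excludes it as well.

```python
import re

# Assumption: these two patterns cover the reference styles visible in this diff.
# "huggingface.co/ds4sd/..." is deliberately excluded, because the diff leaves
# the Hugging Face model URL unchanged.
ORG_REPO = re.compile(r"(?<!huggingface\.co/)ds4sd/(?=docling)", re.IGNORECASE)
PAGES_HOST = re.compile(r"ds4sd\.github\.io", re.IGNORECASE)


def rewrite_org_refs(text: str) -> str:
    """Rewrite old-org references in one string (hypothetical helper)."""
    text = ORG_REPO.sub("docling-project/", text)
    return PAGES_HOST.sub("docling-project.github.io", text)


print(rewrite_org_refs("https://github.com/DS4SD/docling/issues"))
# -> https://github.com/docling-project/docling/issues
print(rewrite_org_refs("https://ds4sd.github.io/docling/usage/"))
# -> https://docling-project.github.io/docling/usage/
print(rewrite_org_refs("https://huggingface.co/ds4sd/docling-models"))
# -> https://huggingface.co/ds4sd/docling-models (unchanged)
```

A blind replace like this also rewrites strings where the result may not be what was intended, such as recorded local paths in notebook outputs and the two-sided comparison in the CI workflow guards, so the output is worth reviewing by hand.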
.github/SECURITY.md (2 changes, vendored)
@@ -20,4 +20,4 @@ After the initial reply to your report, the security team will keep you informed
 
 ## Security Alerts
 
-We will send announcements of security vulnerabilities and steps to remediate on the [Docling announcements](https://github.com/DS4SD/docling/discussions/categories/announcements).
+We will send announcements of security vulnerabilities and steps to remediate on the [Docling announcements](https://github.com/docling-project/docling/discussions/categories/announcements).
.github/workflows/ci-docs.yml (2 changes, vendored)
@@ -10,7 +10,7 @@ on:
 
 jobs:
   build-docs:
-    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
+    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'docling-project/docling' && github.event.pull_request.head.repo.full_name != 'docling-project/docling') }}
     uses: ./.github/workflows/docs.yml
     with:
       deploy: false
.github/workflows/ci.yml (2 changes, vendored)
@@ -15,5 +15,5 @@ env:
 
 jobs:
   code-checks:
-    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
+    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'docling-project/docling' && github.event.pull_request.head.repo.full_name != 'docling-project/docling') }}
     uses: ./.github/workflows/checks.yml
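After the rename, both inequality checks in the two workflow `if:` guards compare against the same string, so the second check is redundant. The guard's logic can be sketched in Python as below. `should_run` and `UPSTREAM` are hypothetical names used only for illustration, and the case-insensitive comparison is an assumption about the runtime (GitHub documents that expressions ignore case when comparing strings), not something this diff shows.

```python
from typing import Optional

# Assumption: the upstream repository name after the org move.
UPSTREAM = "docling-project/docling"


def should_run(event_name: str, head_repo_full_name: Optional[str]) -> bool:
    """Mirror of the workflow guard: run on pushes, or on PRs opened from a fork."""
    if event_name == "push":
        return True
    # One comparison is enough once both sides of the original "&&" are identical.
    # Lower-casing models the assumed case-insensitive string comparison.
    return (head_repo_full_name or "").lower() != UPSTREAM.lower()


print(should_run("push", None))                               # True
print(should_run("pull_request", "docling-project/docling"))  # False
print(should_run("pull_request", "someone/docling"))          # True
```

Under that assumption, the guard skips pull requests whose head branch lives in the upstream repository (those are already covered by the push trigger) and runs for forks.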
CHANGELOG.md (666 changes)
File diff suppressed because it is too large
@@ -2,13 +2,13 @@
 Our project welcomes external contributions. If you have an itch, please feel
 free to scratch it.
 
-To contribute code or documentation, please submit a [pull request](https://github.com/DS4SD/docling/pulls).
+To contribute code or documentation, please submit a [pull request](https://github.com/docling-project/docling/pulls).
 
 A good way to familiarize yourself with the codebase and contribution process is
-to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/DS4SD/docling/issues).
+to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/docling-project/docling/issues).
 Before embarking on a more ambitious contribution, please quickly [get in touch](#communication) with us.
 
-For general questions or support requests, please refer to the [discussion section](https://github.com/DS4SD/docling/discussions).
+For general questions or support requests, please refer to the [discussion section](https://github.com/docling-project/docling/discussions).
 
 **Note: We appreciate your effort and want to avoid situations where a contribution
 requires extensive rework (by you or by us), sits in the backlog for a long time, or
@@ -16,14 +16,14 @@ cannot be accepted at all!**
 
 ### Proposing New Features
 
-If you would like to implement a new feature, please [raise an issue](https://github.com/DS4SD/docling/issues)
+If you would like to implement a new feature, please [raise an issue](https://github.com/docling-project/docling/issues)
 before sending a pull request so the feature can be discussed. This is to avoid
 you spending valuable time working on a feature that the project developers
 are not interested in accepting into the codebase.
 
 ### Fixing Bugs
 
-If you would like to fix a bug, please [raise an issue](https://github.com/DS4SD/docling/issues) before sending a
+If you would like to fix a bug, please [raise an issue](https://github.com/docling-project/docling/issues) before sending a
 pull request so it can be tracked.
 
 ### Merge Approval
@@ -78,7 +78,7 @@ This project strictly adheres to using dependencies that are compatible with the
 
 ## Communication
 
-Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions).
+Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).
 
 
 
README.md (28 changes)
@@ -1,6 +1,6 @@
 <p align="center">
-<a href="https://github.com/ds4sd/docling">
-<img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/docs/assets/docling_processing.png" width="100%"/>
+<a href="https://github.com/docling-project/docling">
+<img loading="lazy" alt="Docling" src="https://github.com/docling-project/docling/raw/main/docs/assets/docling_processing.png" width="100%"/>
 </a>
 </p>
 
@@ -11,7 +11,7 @@
 </p>
 
 [](https://arxiv.org/abs/2408.09869)
-[](https://ds4sd.github.io/docling/)
+[](https://docling-project.github.io/docling/)
 [](https://pypi.org/project/docling/)
 [](https://pypi.org/project/docling/)
 [](https://python-poetry.org/)
@@ -19,7 +19,7 @@
 [](https://pycqa.github.io/isort/)
 [](https://pydantic.dev)
 [](https://github.com/pre-commit/pre-commit)
 [](https://opensource.org/licenses/MIT)
 [](https://pepy.tech/projects/docling)
 
 Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
@@ -51,7 +51,7 @@ pip install docling
 
 Works on macOS, Linux and Windows environments. Both x86_64 and arm64 architectures.
 
-More [detailed installation instructions](https://ds4sd.github.io/docling/installation/) are available in the docs.
+More [detailed installation instructions](https://docling-project.github.io/docling/installation/) are available in the docs.
 
 ## Getting started
 
@@ -66,28 +66,28 @@ result = converter.convert(source)
 print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
 ```
 
-More [advanced usage options](https://ds4sd.github.io/docling/usage/) are available in
+More [advanced usage options](https://docling-project.github.io/docling/usage/) are available in
 the docs.
 
 ## Documentation
 
-Check out Docling's [documentation](https://ds4sd.github.io/docling/), for details on
+Check out Docling's [documentation](https://docling-project.github.io/docling/), for details on
 installation, usage, concepts, recipes, extensions, and more.
 
 ## Examples
 
-Go hands-on with our [examples](https://ds4sd.github.io/docling/examples/),
+Go hands-on with our [examples](https://docling-project.github.io/docling/examples/),
 demonstrating how to address different application use cases with Docling.
 
 ## Integrations
 
 To further accelerate your AI application development, check out Docling's native
-[integrations](https://ds4sd.github.io/docling/integrations/) with popular frameworks
+[integrations](https://docling-project.github.io/docling/integrations/) with popular frameworks
 and tools.
 
 ## Get help and support
 
-Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions).
+Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).
 
 ## Technical report
 
@@ -95,7 +95,7 @@ For more details on Docling's inner workings, check out the [Docling Technical R
 
 ## Contributing
 
-Please read [Contributing to Docling](https://github.com/DS4SD/docling/blob/main/CONTRIBUTING.md) for details.
+Please read [Contributing to Docling](https://github.com/docling-project/docling/blob/main/CONTRIBUTING.md) for details.
 
 ## References
 
@@ -123,6 +123,6 @@ For individual model usage, please refer to the model licenses found in the orig
 
 Docling has been brought to you by IBM.
 
-[supported_formats]: https://ds4sd.github.io/docling/usage/supported_formats/
-[docling_document]: https://ds4sd.github.io/docling/concepts/docling_document/
-[integrations]: https://ds4sd.github.io/docling/integrations/
+[supported_formats]: https://docling-project.github.io/docling/usage/supported_formats/
+[docling_document]: https://docling-project.github.io/docling/concepts/docling_document/
+[integrations]: https://docling-project.github.io/docling/integrations/
@@ -121,7 +121,7 @@ def download(
 "Using the CLI:",
 f"`docling --artifacts-path={output_dir} FILE`",
 "\n",
-"Using Python: see the documentation at <https://ds4sd.github.io/docling/usage>.",
+"Using Python: see the documentation at <https://docling-project.github.io/docling/usage>.",
 )
 
 
@@ -26,7 +26,7 @@ class OcrMacModel(BaseOcrModel):
 "ocrmac is not correctly installed. "
 "Please install it via `pip install ocrmac` to use this OCR engine. "
 "Alternatively, Docling has support for other OCR engines. See the documentation: "
-"https://ds4sd.github.io/docling/installation/"
+"https://docling-project.github.io/docling/installation/"
 )
 try:
 from ocrmac import ocrmac
@@ -31,14 +31,14 @@ class TesseractOcrModel(BaseOcrModel):
 "Note that tesserocr might have to be manually compiled for working with "
 "your Tesseract installation. The Docling documentation provides examples for it. "
 "Alternatively, Docling has support for other OCR engines. See the documentation: "
-"https://ds4sd.github.io/docling/installation/"
+"https://docling-project.github.io/docling/installation/"
 )
 missing_langs_errmsg = (
 "tesserocr is not correctly configured. No language models have been detected. "
 "Please ensure that the TESSDATA_PREFIX envvar points to tesseract languages dir. "
 "You can find more information how to setup other OCR engines in Docling "
 "documentation: "
-"https://ds4sd.github.io/docling/installation/"
+"https://docling-project.github.io/docling/installation/"
 )
 
 try:
@@ -7,7 +7,7 @@ pydantic datatype, which can express several features common to documents, such
 * Layout information (i.e. bounding boxes) for all items, if available
 * Provenance information
 
-The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/DS4SD/docling-core/tree/main/docling_core/types/doc).
+The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/docling-project/docling-core/tree/main/docling_core/types/doc).
 
 It also brings a set of document construction APIs to build up a `DoclingDocument` from scratch.
 
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/backend_xml_rag.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/backend_xml_rag.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
@@ -36,7 +36,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This is an example of using [Docling](https://ds4sd.github.io/docling/) for converting structured data (XML) into a unified document\n",
+"This is an example of using [Docling](https://docling-project.github.io/docling/) for converting structured data (XML) into a unified document\n",
 "representation format, `DoclingDocument`, and leverage its riched structured content for RAG applications.\n",
 "\n",
 "Data used in this example consist of patents from the [United States Patent and Trademark Office (USPTO)](https://www.uspto.gov/) and medical\n",
@@ -103,7 +103,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"> 👉 **NOTE**: As you see above, using the `HybridChunker` can sometimes lead to a warning from the transformers library, however this is a \"false alarm\" — for details check [here](https://ds4sd.github.io/docling/faq/#hybridchunker-triggers-warning-token-indices-sequence-length-is-longer-than-the-specified-maximum-sequence-length-for-this-model)."
+"> 👉 **NOTE**: As you see above, using the `HybridChunker` can sometimes lead to a warning from the transformers library, however this is a \"false alarm\" — for details check [here](https://docling-project.github.io/docling/faq/#hybridchunker-triggers-warning-token-indices-sequence-length-is-longer-than-the-specified-maximum-sequence-length-for-this-model)."
 ]
 },
 {
@@ -321,7 +321,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "docling-aMWN2FRM-py3.12",
+"display_name": "docling-hgXEfXco-py3.12",
 "language": "python",
 "name": "python3"
 },
@@ -36,7 +36,7 @@
 "## A recipe 🧑🍳 🐥 💚\n",
 "\n",
 "This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:\n",
-"- [Docling](https://ds4sd.github.io/docling/) for document parsing and chunking\n",
+"- [Docling](https://docling-project.github.io/docling/) for document parsing and chunking\n",
 "- [Azure AI Search](https://azure.microsoft.com/products/ai-services/ai-search/?msockid=0109678bea39665431e37323ebff6723) for vector indexing and retrieval\n",
 "- [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service?msockid=0109678bea39665431e37323ebff6723) for embeddings and chat completion\n",
 "\n",
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_haystack.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_haystack.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
@@ -247,7 +247,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/Users/pva/work/github.com/DS4SD/docling/.venv/lib/python3.12/site-packages/huggingface_hub/inference/_client.py:2232: FutureWarning: `stop_sequences` is a deprecated argument for `text_generation` task and will be removed in version '0.28.0'. Use `stop` instead.\n",
+"/Users/pva/work/github.com/docling-project/docling/.venv/lib/python3.12/site-packages/huggingface_hub/inference/_client.py:2232: FutureWarning: `stop_sequences` is a deprecated argument for `text_generation` task and will be removed in version '0.28.0'. Use `stop` instead.\n",
 " warnings.warn(\n"
 ]
 }
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_langchain.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_langchain.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
@@ -168,7 +168,7 @@
 "source": [
 "> Note: a message saying `\"Token indices sequence length is longer than the specified\n",
 "maximum sequence length...\"` can be ignored in this case — details\n",
-"[here](https://github.com/DS4SD/docling-core/issues/119#issuecomment-2577418826)."
+"[here](https://github.com/docling-project/docling-core/issues/119#issuecomment-2577418826)."
 ]
 },
 {
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_llamaindex.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_llamaindex.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"[](https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_weaviate.ipynb)"
+"[](https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_weaviate.ipynb)"
 ]
 },
 {
@@ -29,7 +29,7 @@
 "\n",
 "## A recipe 🧑🍳 🐥 💚\n",
 "\n",
-"This is a code recipe that uses [Weaviate](https://weaviate.io/) to perform RAG over PDF documents parsed by [Docling](https://ds4sd.github.io/docling/).\n",
+"This is a code recipe that uses [Weaviate](https://weaviate.io/) to perform RAG over PDF documents parsed by [Docling](https://docling-project.github.io/docling/).\n",
 "\n",
 "In this notebook, we accomplish the following:\n",
 "* Parse the top machine learning papers on [arXiv](https://arxiv.org/) using Docling\n",
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/hybrid_rag_qdrant\n",
+"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/hybrid_rag_qdrant\n",
 ".ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
@@ -109,7 +109,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/Users/pva/work/github.com/DS4SD/docling/.venv/lib/python3.12/site-packages/huggingface_hub/utils/tqdm.py:155: UserWarning: Cannot enable progress bars: environment variable `HF_HUB_DISABLE_PROGRESS_BARS=1` is set and has priority.\n",
+"/Users/pva/work/github.com/docling-project/docling/.venv/lib/python3.12/site-packages/huggingface_hub/utils/tqdm.py:155: UserWarning: Cannot enable progress bars: environment variable `HF_HUB_DISABLE_PROGRESS_BARS=1` is set and has priority.\n",
 " warnings.warn(\n"
 ]
 }
@@ -1,6 +1,6 @@
 # FAQ
 
-This is a collection of FAQ collected from the user questions on <https://github.com/DS4SD/docling/discussions>.
+This is a collection of FAQ collected from the user questions on <https://github.com/docling-project/docling/discussions>.
 
 
 ??? question "Is Python 3.13 supported?"
@@ -41,7 +41,7 @@ This is a collection of FAQ collected from the user questions on <https://github
 ]
 ```
 
-Source: Issue [#283](https://github.com/DS4SD/docling/issues/283#issuecomment-2465035868)
+Source: Issue [#283](https://github.com/docling-project/docling/issues/283#issuecomment-2465035868)
 
 
 ??? question "Are text styles (bold, underline, etc) supported?"
@@ -74,7 +74,7 @@ This is a collection of FAQ collected from the user questions on <https://github
 )
 ```
 
-Source: Issue [#326](https://github.com/DS4SD/docling/issues/326)
+Source: Issue [#326](https://github.com/docling-project/docling/issues/326)
 
 
 ??? question " Which model weights are needed to run Docling?"
@@ -84,7 +84,7 @@ This is a collection of FAQ collected from the user questions on <https://github
 
 For processing PDF documents, Docling requires the model weights from <https://huggingface.co/ds4sd/docling-models>.
 
-When OCR is enabled, some engines also require model artifacts. For example EasyOCR, for which Docling has [special pipeline options](https://github.com/DS4SD/docling/blob/main/docling/datamodel/pipeline_options.py#L68) to control the runtime behavior.
+When OCR is enabled, some engines also require model artifacts. For example EasyOCR, for which Docling has [special pipeline options](https://github.com/docling-project/docling/blob/main/docling/datamodel/pipeline_options.py#L68) to control the runtime behavior.
 
 
 ??? question "SSL error downloading model weights"
@@ -174,6 +174,6 @@ This is a collection of FAQ collected from the user questions on <https://github
 print(f"Model max length: {tokenizer.model_max_length}")
 ```
 
-Also see [docling#725](https://github.com/DS4SD/docling/issues/725).
+Also see [docling#725](https://github.com/docling-project/docling/issues/725).
 
-Source: Issue [docling-core#119](https://github.com/DS4SD/docling-core/issues/119)
+Source: Issue [docling-core#119](https://github.com/docling-project/docling-core/issues/119)
@@ -11,7 +11,7 @@
 [](https://pycqa.github.io/isort/)
 [](https://pydantic.dev)
 [](https://github.com/pre-commit/pre-commit)
 [](https://opensource.org/licenses/MIT)
 [](https://pepy.tech/projects/docling)
 
 Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
@@ -5,7 +5,7 @@ Docling is available as a converter in [Haystack](https://haystack.deepset.ai/):
 - 🧑🏽🍳 [Docling Haystack integration example][example]
 - 📦 [Docling Haystack integration PyPI][pypi]
 
-[github]: https://github.com/DS4SD/docling-haystack
+[github]: https://github.com/docling-project/docling-haystack
 [docs]: https://haystack.deepset.ai/integrations/docling
 [pypi]: https://pypi.org/project/docling-haystack
 [example]: ../examples/rag_haystack.ipynb
@@ -8,7 +8,7 @@ To get started, check out the [step-by-step guide in LangChain][guide].
 - 📦 [LangChain Docling integration PyPI][pypi]
 
 [docs]: https://python.langchain.com/docs/integrations/providers/docling/
-[github]: https://github.com/DS4SD/docling-langchain
+[github]: https://github.com/docling-project/docling-langchain
 [guide]: https://python.langchain.com/docs/integrations/document_loaders/docling/
 [example]: ../examples/rag_langchain.ipynb
 [pypi]: https://pypi.org/project/langchain-docling/
@@ -1,7 +1,7 @@
 site_name: Docling
-site_url: https://ds4sd.github.io/docling/
-repo_name: DS4SD/docling
-repo_url: https://github.com/DS4SD/docling
+site_url: https://docling-project.github.io/docling/
+repo_name: docling-project/docling
+repo_url: https://github.com/docling-project/docling
 
 theme:
   name: material
@@ -13,8 +13,8 @@ authors = [
 ]
 license = "MIT"
 readme = "README.md"
-repository = "https://github.com/DS4SD/docling"
-homepage = "https://github.com/DS4SD/docling"
+repository = "https://github.com/docling-project/docling"
+homepage = "https://github.com/docling-project/docling"
 keywords = [
 "docling",
 "convert",
@@ -179,7 +179,7 @@ def test_guess_format(tmp_path):
 # Non-Docling JSON
 # TODO: Docling JSON is currently the single supported JSON flavor and the pipeline
 # will try to validate *any* JSON (based on suffix/MIME) as Docling JSON; proper
-# disambiguation seen as part of https://github.com/DS4SD/docling/issues/802
+# disambiguation seen as part of https://github.com/docling-project/docling/issues/802
 test_str = "{}"
 stream = DocumentStream(name="test.json", stream=BytesIO(f"{test_str}".encode()))
 assert dci._guess_format(stream) == InputFormat.JSON_DOCLING