chore: move to docling-project org (#1160)

* chore: rename org

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update docs/faq/index.md

Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

* update github pages

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* revert test content

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Michele Dolfi 2025-03-14 12:35:29 +01:00 committed by GitHub
parent f94da44ec5
commit fa16b12316
26 changed files with 390 additions and 390 deletions
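
A rename of this kind is essentially a constrained search-and-replace. The sketch below is one way it could be scripted and is not taken from the commit itself: the function names, the regular expressions, and the list of file suffixes are assumptions made for illustration. It rewrites GitHub repository paths and GitHub Pages hosts while leaving the Hugging Face namespace alone, which matches what the diff shows (the `huggingface.co/ds4sd/docling-models` URL is unchanged).

```python
import re
from pathlib import Path

# Assumed patterns, written for illustration only.
# "DS4SD/docling..." in any letter case, except directly after the
# Hugging Face host, whose namespace this commit does not touch.
REPO_RE = re.compile(r"(?<!huggingface\.co/)ds4sd/(?=docling)", re.IGNORECASE)
# The GitHub Pages host of the old organization.
PAGES_RE = re.compile(r"ds4sd\.github\.io", re.IGNORECASE)


def rename_org(text: str) -> str:
    """Return text with old-organization references rewritten."""
    text = REPO_RE.sub("docling-project/", text)
    return PAGES_RE.sub("docling-project.github.io", text)


def rename_in_tree(root: Path, suffixes=(".md", ".py", ".yml", ".toml", ".ipynb")) -> int:
    """Rewrite matching files under root in place; return how many changed."""
    changed = 0
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            old = path.read_text(encoding="utf-8")
            new = rename_org(old)
            if new != old:
                path.write_text(new, encoding="utf-8")
                changed += 1
    return changed
```

Because the match is case-insensitive, `DS4SD/docling` and `ds4sd/docling` both map to the same new string. Running `rename_in_tree` over a checkout would also descend into directories such as a local virtual environment, so a real script would likely need an exclusion list.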

.github/SECURITY.md

@ -20,4 +20,4 @@ After the initial reply to your report, the security team will keep you informed
## Security Alerts ## Security Alerts
We will send announcements of security vulnerabilities and steps to remediate on the [Docling announcements](https://github.com/DS4SD/docling/discussions/categories/announcements). We will send announcements of security vulnerabilities and steps to remediate on the [Docling announcements](https://github.com/docling-project/docling/discussions/categories/announcements).

@ -10,7 +10,7 @@ on:
jobs: jobs:
build-docs: build-docs:
if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }} if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'docling-project/docling' && github.event.pull_request.head.repo.full_name != 'docling-project/docling') }}
uses: ./.github/workflows/docs.yml uses: ./.github/workflows/docs.yml
with: with:
deploy: false deploy: false

@ -15,5 +15,5 @@ env:
jobs: jobs:
code-checks: code-checks:
if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }} if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'docling-project/docling' && github.event.pull_request.head.repo.full_name != 'docling-project/docling') }}
uses: ./.github/workflows/checks.yml uses: ./.github/workflows/checks.yml
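
After the rename, both inequality checks in the workflow `if:` conditions above compare the head repository against the same string, so the second check no longer adds anything. The following is a minimal Python model of that condition, not the workflow syntax itself; the function name and its parameters are hypothetical. It lowercases both sides on the assumption that GitHub Actions expressions ignore case when comparing strings.

```python
def should_run(event_name: str, head_repo: str,
               upstream: str = "docling-project/docling") -> bool:
    """Model of the job condition: run on pushes, or on pull requests
    whose head repository is not the upstream repository (i.e. forks)."""
    # Case-insensitive comparison, assumed to mirror GitHub's expression semantics.
    return event_name == "push" or head_repo.lower() != upstream.lower()
```

Under this model a pull request opened from a fork runs the job, while one opened from a branch of the upstream repository does not, since the push event already covers it.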

File diff suppressed because it is too large

@ -2,13 +2,13 @@
Our project welcomes external contributions. If you have an itch, please feel Our project welcomes external contributions. If you have an itch, please feel
free to scratch it. free to scratch it.
To contribute code or documentation, please submit a [pull request](https://github.com/DS4SD/docling/pulls). To contribute code or documentation, please submit a [pull request](https://github.com/docling-project/docling/pulls).
A good way to familiarize yourself with the codebase and contribution process is A good way to familiarize yourself with the codebase and contribution process is
to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/DS4SD/docling/issues). to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/docling-project/docling/issues).
Before embarking on a more ambitious contribution, please quickly [get in touch](#communication) with us. Before embarking on a more ambitious contribution, please quickly [get in touch](#communication) with us.
For general questions or support requests, please refer to the [discussion section](https://github.com/DS4SD/docling/discussions). For general questions or support requests, please refer to the [discussion section](https://github.com/docling-project/docling/discussions).
**Note: We appreciate your effort and want to avoid situations where a contribution **Note: We appreciate your effort and want to avoid situations where a contribution
requires extensive rework (by you or by us), sits in the backlog for a long time, or requires extensive rework (by you or by us), sits in the backlog for a long time, or
@ -16,14 +16,14 @@ cannot be accepted at all!**
### Proposing New Features ### Proposing New Features
If you would like to implement a new feature, please [raise an issue](https://github.com/DS4SD/docling/issues) If you would like to implement a new feature, please [raise an issue](https://github.com/docling-project/docling/issues)
before sending a pull request so the feature can be discussed. This is to avoid before sending a pull request so the feature can be discussed. This is to avoid
you spending valuable time working on a feature that the project developers you spending valuable time working on a feature that the project developers
are not interested in accepting into the codebase. are not interested in accepting into the codebase.
### Fixing Bugs ### Fixing Bugs
If you would like to fix a bug, please [raise an issue](https://github.com/DS4SD/docling/issues) before sending a If you would like to fix a bug, please [raise an issue](https://github.com/docling-project/docling/issues) before sending a
pull request so it can be tracked. pull request so it can be tracked.
### Merge Approval ### Merge Approval
@ -78,7 +78,7 @@ This project strictly adheres to using dependencies that are compatible with the
## Communication ## Communication
Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions). Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).

@ -1,6 +1,6 @@
<p align="center"> <p align="center">
<a href="https://github.com/ds4sd/docling"> <a href="https://github.com/docling-project/docling">
<img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/docs/assets/docling_processing.png" width="100%"/> <img loading="lazy" alt="Docling" src="https://github.com/docling-project/docling/raw/main/docs/assets/docling_processing.png" width="100%"/>
</a> </a>
</p> </p>
@ -11,7 +11,7 @@
</p> </p>
[![arXiv](https://img.shields.io/badge/arXiv-2408.09869-b31b1b.svg)](https://arxiv.org/abs/2408.09869) [![arXiv](https://img.shields.io/badge/arXiv-2408.09869-b31b1b.svg)](https://arxiv.org/abs/2408.09869)
[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://ds4sd.github.io/docling/) [![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://docling-project.github.io/docling/)
[![PyPI version](https://img.shields.io/pypi/v/docling)](https://pypi.org/project/docling/) [![PyPI version](https://img.shields.io/pypi/v/docling)](https://pypi.org/project/docling/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/docling)](https://pypi.org/project/docling/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/docling)](https://pypi.org/project/docling/)
[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/) [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
@ -19,7 +19,7 @@
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/) [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev) [![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit) [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![License MIT](https://img.shields.io/github/license/DS4SD/docling)](https://opensource.org/licenses/MIT) [![License MIT](https://img.shields.io/github/license/docling-project/docling)](https://opensource.org/licenses/MIT)
[![PyPI Downloads](https://static.pepy.tech/badge/docling/month)](https://pepy.tech/projects/docling) [![PyPI Downloads](https://static.pepy.tech/badge/docling/month)](https://pepy.tech/projects/docling)
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem. Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
@ -51,7 +51,7 @@ pip install docling
Works on macOS, Linux and Windows environments. Both x86_64 and arm64 architectures. Works on macOS, Linux and Windows environments. Both x86_64 and arm64 architectures.
More [detailed installation instructions](https://ds4sd.github.io/docling/installation/) are available in the docs. More [detailed installation instructions](https://docling-project.github.io/docling/installation/) are available in the docs.
## Getting started ## Getting started
@ -66,28 +66,28 @@ result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]" print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
``` ```
More [advanced usage options](https://ds4sd.github.io/docling/usage/) are available in More [advanced usage options](https://docling-project.github.io/docling/usage/) are available in
the docs. the docs.
## Documentation ## Documentation
Check out Docling's [documentation](https://ds4sd.github.io/docling/), for details on Check out Docling's [documentation](https://docling-project.github.io/docling/), for details on
installation, usage, concepts, recipes, extensions, and more. installation, usage, concepts, recipes, extensions, and more.
## Examples ## Examples
Go hands-on with our [examples](https://ds4sd.github.io/docling/examples/), Go hands-on with our [examples](https://docling-project.github.io/docling/examples/),
demonstrating how to address different application use cases with Docling. demonstrating how to address different application use cases with Docling.
## Integrations ## Integrations
To further accelerate your AI application development, check out Docling's native To further accelerate your AI application development, check out Docling's native
[integrations](https://ds4sd.github.io/docling/integrations/) with popular frameworks [integrations](https://docling-project.github.io/docling/integrations/) with popular frameworks
and tools. and tools.
## Get help and support ## Get help and support
Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions). Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).
## Technical report ## Technical report
@ -95,7 +95,7 @@ For more details on Docling's inner workings, check out the [Docling Technical R
## Contributing ## Contributing
Please read [Contributing to Docling](https://github.com/DS4SD/docling/blob/main/CONTRIBUTING.md) for details. Please read [Contributing to Docling](https://github.com/docling-project/docling/blob/main/CONTRIBUTING.md) for details.
## References ## References
@ -123,6 +123,6 @@ For individual model usage, please refer to the model licenses found in the orig
Docling has been brought to you by IBM. Docling has been brought to you by IBM.
[supported_formats]: https://ds4sd.github.io/docling/usage/supported_formats/ [supported_formats]: https://docling-project.github.io/docling/usage/supported_formats/
[docling_document]: https://ds4sd.github.io/docling/concepts/docling_document/ [docling_document]: https://docling-project.github.io/docling/concepts/docling_document/
[integrations]: https://ds4sd.github.io/docling/integrations/ [integrations]: https://docling-project.github.io/docling/integrations/

@ -121,7 +121,7 @@ def download(
"Using the CLI:", "Using the CLI:",
f"`docling --artifacts-path={output_dir} FILE`", f"`docling --artifacts-path={output_dir} FILE`",
"\n", "\n",
"Using Python: see the documentation at <https://ds4sd.github.io/docling/usage>.", "Using Python: see the documentation at <https://docling-project.github.io/docling/usage>.",
) )

@ -26,7 +26,7 @@ class OcrMacModel(BaseOcrModel):
"ocrmac is not correctly installed. " "ocrmac is not correctly installed. "
"Please install it via `pip install ocrmac` to use this OCR engine. " "Please install it via `pip install ocrmac` to use this OCR engine. "
"Alternatively, Docling has support for other OCR engines. See the documentation: " "Alternatively, Docling has support for other OCR engines. See the documentation: "
"https://ds4sd.github.io/docling/installation/" "https://docling-project.github.io/docling/installation/"
) )
try: try:
from ocrmac import ocrmac from ocrmac import ocrmac

@ -31,14 +31,14 @@ class TesseractOcrModel(BaseOcrModel):
"Note that tesserocr might have to be manually compiled for working with " "Note that tesserocr might have to be manually compiled for working with "
"your Tesseract installation. The Docling documentation provides examples for it. " "your Tesseract installation. The Docling documentation provides examples for it. "
"Alternatively, Docling has support for other OCR engines. See the documentation: " "Alternatively, Docling has support for other OCR engines. See the documentation: "
"https://ds4sd.github.io/docling/installation/" "https://docling-project.github.io/docling/installation/"
) )
missing_langs_errmsg = ( missing_langs_errmsg = (
"tesserocr is not correctly configured. No language models have been detected. " "tesserocr is not correctly configured. No language models have been detected. "
"Please ensure that the TESSDATA_PREFIX envvar points to tesseract languages dir. " "Please ensure that the TESSDATA_PREFIX envvar points to tesseract languages dir. "
"You can find more information how to setup other OCR engines in Docling " "You can find more information how to setup other OCR engines in Docling "
"documentation: " "documentation: "
"https://ds4sd.github.io/docling/installation/" "https://docling-project.github.io/docling/installation/"
) )
try: try:

@ -7,7 +7,7 @@ pydantic datatype, which can express several features common to documents, such
* Layout information (i.e. bounding boxes) for all items, if available * Layout information (i.e. bounding boxes) for all items, if available
* Provenance information * Provenance information
The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/DS4SD/docling-core/tree/main/docling_core/types/doc). The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/docling-project/docling-core/tree/main/docling_core/types/doc).
It also brings a set of document construction APIs to build up a `DoclingDocument` from scratch. It also brings a set of document construction APIs to build up a `DoclingDocument` from scratch.

@ -4,7 +4,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/backend_xml_rag.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" "<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/backend_xml_rag.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
] ]
}, },
{ {
@ -36,7 +36,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"This is an example of using [Docling](https://ds4sd.github.io/docling/) for converting structured data (XML) into a unified document\n", "This is an example of using [Docling](https://docling-project.github.io/docling/) for converting structured data (XML) into a unified document\n",
"representation format, `DoclingDocument`, and leverage its riched structured content for RAG applications.\n", "representation format, `DoclingDocument`, and leverage its riched structured content for RAG applications.\n",
"\n", "\n",
"Data used in this example consist of patents from the [United States Patent and Trademark Office (USPTO)](https://www.uspto.gov/) and medical\n", "Data used in this example consist of patents from the [United States Patent and Trademark Office (USPTO)](https://www.uspto.gov/) and medical\n",

@ -103,7 +103,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"> 👉 **NOTE**: As you see above, using the `HybridChunker` can sometimes lead to a warning from the transformers library, however this is a \"false alarm\" — for details check [here](https://ds4sd.github.io/docling/faq/#hybridchunker-triggers-warning-token-indices-sequence-length-is-longer-than-the-specified-maximum-sequence-length-for-this-model)." "> 👉 **NOTE**: As you see above, using the `HybridChunker` can sometimes lead to a warning from the transformers library, however this is a \"false alarm\" — for details check [here](https://docling-project.github.io/docling/faq/#hybridchunker-triggers-warning-token-indices-sequence-length-is-longer-than-the-specified-maximum-sequence-length-for-this-model)."
] ]
}, },
{ {

@ -321,7 +321,7 @@
], ],
"metadata": { "metadata": {
"kernelspec": { "kernelspec": {
"display_name": "docling-aMWN2FRM-py3.12", "display_name": "docling-hgXEfXco-py3.12",
"language": "python", "language": "python",
"name": "python3" "name": "python3"
}, },

@ -36,7 +36,7 @@
"## A recipe 🧑‍🍳 🐥 💚\n", "## A recipe 🧑‍🍳 🐥 💚\n",
"\n", "\n",
"This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:\n", "This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:\n",
"- [Docling](https://ds4sd.github.io/docling/) for document parsing and chunking\n", "- [Docling](https://docling-project.github.io/docling/) for document parsing and chunking\n",
"- [Azure AI Search](https://azure.microsoft.com/products/ai-services/ai-search/?msockid=0109678bea39665431e37323ebff6723) for vector indexing and retrieval\n", "- [Azure AI Search](https://azure.microsoft.com/products/ai-services/ai-search/?msockid=0109678bea39665431e37323ebff6723) for vector indexing and retrieval\n",
"- [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service?msockid=0109678bea39665431e37323ebff6723) for embeddings and chat completion\n", "- [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service?msockid=0109678bea39665431e37323ebff6723) for embeddings and chat completion\n",
"\n", "\n",

@ -4,7 +4,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_haystack.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" "<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_haystack.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
] ]
}, },
{ {
@ -247,7 +247,7 @@
"name": "stderr", "name": "stderr",
"output_type": "stream", "output_type": "stream",
"text": [ "text": [
"/Users/pva/work/github.com/DS4SD/docling/.venv/lib/python3.12/site-packages/huggingface_hub/inference/_client.py:2232: FutureWarning: `stop_sequences` is a deprecated argument for `text_generation` task and will be removed in version '0.28.0'. Use `stop` instead.\n", "/Users/pva/work/github.com/docling-project/docling/.venv/lib/python3.12/site-packages/huggingface_hub/inference/_client.py:2232: FutureWarning: `stop_sequences` is a deprecated argument for `text_generation` task and will be removed in version '0.28.0'. Use `stop` instead.\n",
" warnings.warn(\n" " warnings.warn(\n"
] ]
} }

@ -4,7 +4,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_langchain.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" "<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_langchain.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
] ]
}, },
{ {
@ -168,7 +168,7 @@
"source": [ "source": [
"> Note: a message saying `\"Token indices sequence length is longer than the specified\n", "> Note: a message saying `\"Token indices sequence length is longer than the specified\n",
"maximum sequence length...\"` can be ignored in this case — details\n", "maximum sequence length...\"` can be ignored in this case — details\n",
"[here](https://github.com/DS4SD/docling-core/issues/119#issuecomment-2577418826)." "[here](https://github.com/docling-project/docling-core/issues/119#issuecomment-2577418826)."
] ]
}, },
{ {

@ -4,7 +4,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_llamaindex.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" "<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_llamaindex.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
] ]
}, },
{ {

@ -4,7 +4,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/rag_weaviate.ipynb)" "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/rag_weaviate.ipynb)"
] ]
}, },
{ {
@ -29,7 +29,7 @@
"\n", "\n",
"## A recipe 🧑‍🍳 🐥 💚\n", "## A recipe 🧑‍🍳 🐥 💚\n",
"\n", "\n",
"This is a code recipe that uses [Weaviate](https://weaviate.io/) to perform RAG over PDF documents parsed by [Docling](https://ds4sd.github.io/docling/).\n", "This is a code recipe that uses [Weaviate](https://weaviate.io/) to perform RAG over PDF documents parsed by [Docling](https://docling-project.github.io/docling/).\n",
"\n", "\n",
"In this notebook, we accomplish the following:\n", "In this notebook, we accomplish the following:\n",
"* Parse the top machine learning papers on [arXiv](https://arxiv.org/) using Docling\n", "* Parse the top machine learning papers on [arXiv](https://arxiv.org/) using Docling\n",

@ -4,7 +4,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"<a href=\"https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/hybrid_rag_qdrant\n", "<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/hybrid_rag_qdrant\n",
".ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" ".ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
] ]
}, },
@ -109,7 +109,7 @@
"name": "stderr", "name": "stderr",
"output_type": "stream", "output_type": "stream",
"text": [ "text": [
"/Users/pva/work/github.com/DS4SD/docling/.venv/lib/python3.12/site-packages/huggingface_hub/utils/tqdm.py:155: UserWarning: Cannot enable progress bars: environment variable `HF_HUB_DISABLE_PROGRESS_BARS=1` is set and has priority.\n", "/Users/pva/work/github.com/docling-project/docling/.venv/lib/python3.12/site-packages/huggingface_hub/utils/tqdm.py:155: UserWarning: Cannot enable progress bars: environment variable `HF_HUB_DISABLE_PROGRESS_BARS=1` is set and has priority.\n",
" warnings.warn(\n" " warnings.warn(\n"
] ]
} }

@ -1,6 +1,6 @@
# FAQ # FAQ
This is a collection of FAQ collected from the user questions on <https://github.com/DS4SD/docling/discussions>. This is a collection of FAQ collected from the user questions on <https://github.com/docling-project/docling/discussions>.
??? question "Is Python 3.13 supported?" ??? question "Is Python 3.13 supported?"
@ -41,7 +41,7 @@ This is a collection of FAQ collected from the user questions on <https://github
] ]
``` ```
Source: Issue [#283](https://github.com/DS4SD/docling/issues/283#issuecomment-2465035868) Source: Issue [#283](https://github.com/docling-project/docling/issues/283#issuecomment-2465035868)
??? question "Are text styles (bold, underline, etc) supported?" ??? question "Are text styles (bold, underline, etc) supported?"
@ -74,7 +74,7 @@ This is a collection of FAQ collected from the user questions on <https://github
) )
``` ```
Source: Issue [#326](https://github.com/DS4SD/docling/issues/326) Source: Issue [#326](https://github.com/docling-project/docling/issues/326)
??? question " Which model weights are needed to run Docling?" ??? question " Which model weights are needed to run Docling?"
@ -84,7 +84,7 @@ This is a collection of FAQ collected from the user questions on <https://github
For processing PDF documents, Docling requires the model weights from <https://huggingface.co/ds4sd/docling-models>. For processing PDF documents, Docling requires the model weights from <https://huggingface.co/ds4sd/docling-models>.
When OCR is enabled, some engines also require model artifacts. For example EasyOCR, for which Docling has [special pipeline options](https://github.com/DS4SD/docling/blob/main/docling/datamodel/pipeline_options.py#L68) to control the runtime behavior. When OCR is enabled, some engines also require model artifacts. For example EasyOCR, for which Docling has [special pipeline options](https://github.com/docling-project/docling/blob/main/docling/datamodel/pipeline_options.py#L68) to control the runtime behavior.
??? question "SSL error downloading model weights" ??? question "SSL error downloading model weights"
@ -174,6 +174,6 @@ This is a collection of FAQ collected from the user questions on <https://github
print(f"Model max length: {tokenizer.model_max_length}") print(f"Model max length: {tokenizer.model_max_length}")
``` ```
Also see [docling#725](https://github.com/DS4SD/docling/issues/725). Also see [docling#725](https://github.com/docling-project/docling/issues/725).
Source: Issue [docling-core#119](https://github.com/DS4SD/docling-core/issues/119) Source: Issue [docling-core#119](https://github.com/docling-project/docling-core/issues/119)

@ -11,7 +11,7 @@
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/) [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev) [![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit) [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![License MIT](https://img.shields.io/github/license/DS4SD/docling)](https://opensource.org/licenses/MIT) [![License MIT](https://img.shields.io/github/license/docling-project/docling)](https://opensource.org/licenses/MIT)
[![PyPI Downloads](https://static.pepy.tech/badge/docling/month)](https://pepy.tech/projects/docling) [![PyPI Downloads](https://static.pepy.tech/badge/docling/month)](https://pepy.tech/projects/docling)
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem. Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.

@ -5,7 +5,7 @@ Docling is available as a converter in [Haystack](https://haystack.deepset.ai/):
- 🧑🏽‍🍳 [Docling Haystack integration example][example] - 🧑🏽‍🍳 [Docling Haystack integration example][example]
- 📦 [Docling Haystack integration PyPI][pypi] - 📦 [Docling Haystack integration PyPI][pypi]
[github]: https://github.com/DS4SD/docling-haystack [github]: https://github.com/docling-project/docling-haystack
[docs]: https://haystack.deepset.ai/integrations/docling [docs]: https://haystack.deepset.ai/integrations/docling
[pypi]: https://pypi.org/project/docling-haystack [pypi]: https://pypi.org/project/docling-haystack
[example]: ../examples/rag_haystack.ipynb [example]: ../examples/rag_haystack.ipynb

@ -8,7 +8,7 @@ To get started, check out the [step-by-step guide in LangChain][guide].
- 📦 [LangChain Docling integration PyPI][pypi] - 📦 [LangChain Docling integration PyPI][pypi]
[docs]: https://python.langchain.com/docs/integrations/providers/docling/ [docs]: https://python.langchain.com/docs/integrations/providers/docling/
[github]: https://github.com/DS4SD/docling-langchain [github]: https://github.com/docling-project/docling-langchain
[guide]: https://python.langchain.com/docs/integrations/document_loaders/docling/ [guide]: https://python.langchain.com/docs/integrations/document_loaders/docling/
[example]: ../examples/rag_langchain.ipynb [example]: ../examples/rag_langchain.ipynb
[pypi]: https://pypi.org/project/langchain-docling/ [pypi]: https://pypi.org/project/langchain-docling/

@ -1,7 +1,7 @@
site_name: Docling site_name: Docling
site_url: https://ds4sd.github.io/docling/ site_url: https://docling-project.github.io/docling/
repo_name: DS4SD/docling repo_name: docling-project/docling
repo_url: https://github.com/DS4SD/docling repo_url: https://github.com/docling-project/docling
theme: theme:
name: material name: material

@ -13,8 +13,8 @@ authors = [
] ]
license = "MIT" license = "MIT"
readme = "README.md" readme = "README.md"
repository = "https://github.com/DS4SD/docling" repository = "https://github.com/docling-project/docling"
homepage = "https://github.com/DS4SD/docling" homepage = "https://github.com/docling-project/docling"
keywords = [ keywords = [
"docling", "docling",
"convert", "convert",

@ -179,7 +179,7 @@ def test_guess_format(tmp_path):
# Non-Docling JSON # Non-Docling JSON
# TODO: Docling JSON is currently the single supported JSON flavor and the pipeline # TODO: Docling JSON is currently the single supported JSON flavor and the pipeline
# will try to validate *any* JSON (based on suffix/MIME) as Docling JSON; proper # will try to validate *any* JSON (based on suffix/MIME) as Docling JSON; proper
# disambiguation seen as part of https://github.com/DS4SD/docling/issues/802 # disambiguation seen as part of https://github.com/docling-project/docling/issues/802
test_str = "{}" test_str = "{}"
stream = DocumentStream(name="test.json", stream=BytesIO(f"{test_str}".encode())) stream = DocumentStream(name="test.json", stream=BytesIO(f"{test_str}".encode()))
assert dci._guess_format(stream) == InputFormat.JSON_DOCLING assert dci._guess_format(stream) == InputFormat.JSON_DOCLING