feat: add Docling JSON ingestion (#783)

* feat: add Docling JSON ingestion

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* update conversion as per review comments, add tests, revert Docling JSON disambiguation, document intricacies

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* Update docling/backend/json/docling_json_backend.py

Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Panos Vagenas
2025-01-24 18:05:23 +01:00
committed by GitHub
parent e9768ae6a5
commit 88a0e66adc
8 changed files with 144 additions and 1 deletions

@@ -350,6 +350,8 @@ class _DocumentConversionInput(BaseModel):
             mime = FormatToMimeType[InputFormat.HTML][0]
         elif ext in FormatToExtensions[InputFormat.MD]:
             mime = FormatToMimeType[InputFormat.MD][0]
+        elif ext in FormatToExtensions[InputFormat.JSON_DOCLING]:
+            mime = FormatToMimeType[InputFormat.JSON_DOCLING][0]
         return mime

     @staticmethod
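
The hunk above extends an extension-to-MIME fallback so that files with a Docling JSON extension resolve to that format's first registered MIME type. A minimal, self-contained sketch of the same lookup follows. The enum members mirror the names in the diff, but the table contents (extensions and MIME strings) and the helper name `mime_from_extension` are assumptions for illustration, not values taken from the repository. The sketch also uses a loop over the table instead of the `elif` chain shown in the diff.

```python
from enum import Enum
from typing import Optional


class InputFormat(str, Enum):
    # Member names follow the diff; the string values are assumed.
    HTML = "html"
    MD = "md"
    JSON_DOCLING = "json_docling"


# Hypothetical stand-ins for the real lookup tables (contents are assumptions).
FormatToExtensions = {
    InputFormat.HTML: ["html", "htm", "xhtml"],
    InputFormat.MD: ["md"],
    InputFormat.JSON_DOCLING: ["json"],
}
FormatToMimeType = {
    InputFormat.HTML: ["text/html", "application/xhtml+xml"],
    InputFormat.MD: ["text/markdown", "text/x-markdown"],
    InputFormat.JSON_DOCLING: ["application/json"],
}


def mime_from_extension(ext: str) -> Optional[str]:
    """Return the first MIME type registered for an extension, or None."""
    ext = ext.lower().lstrip(".")  # accept ".JSON" as well as "json"
    for fmt, extensions in FormatToExtensions.items():
        if ext in extensions:
            # Index 0 matches the `[0]` used in the diff: first MIME type wins.
            return FormatToMimeType[fmt][0]
    return None  # unknown extension: the caller decides what to do
```

Note that a lookup like this inspects only the file extension, so under these assumed tables any `.json` file would map to the Docling JSON MIME type whether or not its content is actually a Docling document.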