Docling

Files

Guilhem VERMOREL b3d111a3cd fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)

Fix the wrong type of the text extracted by tesseract_ocr_cli_model

Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>

2025-03-31 10:53:49 +02:00

backend

fix: improve HTML layer detection, various MD fixes (#1241)

2025-03-26 16:07:14 +01:00

chunking

feat: expose new hybrid chunker, update docs (#384)

2024-12-09 08:28:29 +01:00

cli

feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199)

2025-03-19 15:38:54 +01:00

datamodel

feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199)

2025-03-19 15:38:54 +01:00

models

fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)

2025-03-31 10:53:49 +02:00

pipeline

feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199)

2025-03-19 15:38:54 +01:00

utils

feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905)

2025-03-18 10:38:19 +01:00

__init__.py

Initial commit

2024-07-15 09:42:42 +02:00

document_converter.py

fix(converter): Cache same pipeline class with different options (#1152)

2025-03-25 12:18:44 +01:00

exceptions.py

feat: Introduce the enable_remote_services option to allow remote connections while processing (#941)

2025-02-12 15:18:01 +01:00

py.typed

fix: Add py.typed marker file (#531)

2024-12-06 13:42:14 +01:00