Docling/docling/utils
Nikos Livathinos 3be2fb581f
feat: Introduce automatic language detection in TesseractOcrCliModel (#800)
* feat: Introduce automatic language detection in tesseract_ocr_cli model. Extend unit tests.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Add an example of how to use the "auto" language with the Tesseract OCR engines

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Refactor TesseractOcrModel and TesseractOcrCliModel to validate that the auto-detected
language is installed on the system and, if it is not, fall back to the default option with no language specified.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-01-26 08:07:56 +01:00
__init__.py Initial commit 2024-07-15 09:42:42 +02:00
accelerator_utils.py feat: Introduce support for GPU Accelerators (#593) 2024-12-13 17:45:22 +01:00
export.py feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
glm_utils.py feat: Code and equation model for PDF and code blocks in markdown (#752) 2025-01-24 16:54:22 +01:00
layout_postprocessor.py feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
ocr_utils.py feat: Introduce automatic language detection in TesseractOcrCliModel (#800) 2025-01-26 08:07:56 +01:00
profiling.py feat: Add pipeline timings and toggle visualization, establish debug settings (#183) 2024-10-30 15:04:19 +01:00
utils.py Initial commit 2024-07-15 09:42:42 +02:00
visualization.py chore: expose draw_clusters function (#803) 2025-01-24 17:35:29 +01:00