Docling/docs
Nikos Livathinos 3be2fb581f
feat: Introduce automatic language detection in TesseractOcrCliModel (#800)
* feat: Introduce automatic language detection in tesseract_ocr_cli model. Extend unit tests.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Add an example of how to use the "auto" language with the Tesseract OCR engines

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Refactor TesseractOcrModel and TesseractOcrCliModel to check whether the auto-detected
language is installed on the system and, if not, fall back to a default option without a language.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-01-26 08:07:56 +01:00
assets feat: expose new hybrid chunker, update docs (#384) 2024-12-09 08:28:29 +01:00
concepts docs: fix links between docs pages (#697) 2025-01-20 09:52:59 +01:00
examples feat: Introduce automatic language detection in TesseractOcrCliModel (#800) 2025-01-26 08:07:56 +01:00
integrations docs: add pointers to LangChain-side docs (#718) 2025-01-09 17:36:46 +01:00
overrides docs: extend integration docs & README (#456) 2024-11-28 09:41:21 +01:00
reference docs: specify docstring types (#702) 2025-01-08 09:05:18 +01:00
stylesheets docs: introduce docs site (#141) 2024-10-14 14:13:13 +02:00
faq.md docs: add styling for faq (#502) 2024-12-03 11:20:49 +01:00
index.md docs: add LangChain docs (#717) 2025-01-09 14:12:05 +01:00
installation.md feat(ocr): added support for RapidOCR engine (#415) 2024-11-27 13:57:41 +01:00
usage.md docs: update chunking usage docs, minor reorg (#550) 2024-12-10 16:03:02 +01:00
v2.md docs: fix links between docs pages (#697) 2025-01-20 09:52:59 +01:00