* feat: Introduce automatic language detection in tesseract_ocr_cli model. Extend unit tests.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* docs: Add example how to use "auto" language with tesseract OCR engines
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Refactor the TesseractOcrModel and TesseractOcrCliModel to validate if the auto-detected
language is installed in the system and if not fall back to a default option without language.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
---------
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* Add "auto" language for TesseractOcr
Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>
* Add tesseract-ocr-script-latn installation for the "auto" language
Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>
* Modify "auto" language in TesseractOcr to initialize the script readers lazily
Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>
* Finalize script readers
Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>
* Fix script models prefix for Linux
Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>
---------
Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>
- When the OCR is forced, any existing PDF cells are rejected.
- Introduce the force-ocr cmd parameter in docling CLI.
- Update unit tests.
- Add the full_page_ocr.py example in mkdocs.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
fix(TesseractOcrModel): Raise Exception if tesserocr has not loaded any languages. Provide a descriptive error message.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* Add settings to turn visualization on or off
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add profiling code to all models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Refactor and fix profiling codes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Visualization codes output PNG to debug dir
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for time logging
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Optimize imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add start_timestamps to ProfilingItem
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>