Docling/docling/models
Nikos Livathinos dae2a3b667
fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138)
* feat(OCR tests): Introduce fuzziness in the text validation of OCR tests

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix(TesseractOcrCliModel): Send stderr to devnull to avoid polluting the console with messages from the tesseract command

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2024-10-11 10:21:19 +02:00
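
The two changes described in this commit can be illustrated with a minimal sketch. This is not the code from the commit: the helper names `run_quiet` and `texts_match`, the use of `difflib`, and the `threshold` default are all assumptions chosen for illustration. The real tests may use a different similarity measure.

```python
import difflib
import subprocess
import sys


def run_quiet(cmd):
    """Run cmd and return its stdout as text, discarding anything written to stderr.

    Hypothetical helper: shows the general technique of routing a child
    process's stderr to devnull so its diagnostics never reach the console.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # drop the child's stderr
        text=True,
        check=True,
    )
    return result.stdout


def texts_match(expected, actual, threshold=0.9):
    """Return True when the two strings are similar enough.

    Hypothetical helper: the threshold value is an assumption, and
    difflib's ratio stands in for whatever metric the real tests use.
    """
    ratio = difflib.SequenceMatcher(None, expected, actual).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    # A stand-in command that writes to both streams; a real caller would
    # pass the OCR command line here instead.
    noisy = [
        sys.executable,
        "-c",
        "import sys; print('ok'); print('noise', file=sys.stderr)",
    ]
    print(run_quiet(noisy).strip())
    print(texts_match("Hello world", "Hello w0rld"))
```

A fuzzy comparison of this kind lets an OCR test tolerate a small number of misrecognized characters instead of failing on any difference from the reference text.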
__init__.py Initial commit 2024-07-15 09:42:42 +02:00
base_ocr_model.py feat: add options for choosing OCR engines (#118) 2024-10-08 19:07:08 +02:00
ds_glm_model.py feat: linux arm64 support and reducing dependencies (#69) 2024-09-10 15:43:27 +02:00
easyocr_model.py feat: add options for choosing OCR engines (#118) 2024-10-08 19:07:08 +02:00
layout_model.py feat: new torch-based docling models (#120) 2024-10-03 18:42:33 +02:00
page_assemble_model.py feat: Optimize table extraction quality, add configuration options (#11) 2024-07-17 16:13:21 +02:00
table_structure_model.py feat: Support tableformer model choice (#90) 2024-09-26 21:37:08 +02:00
tesseract_ocr_cli_model.py fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138) 2024-10-11 10:21:19 +02:00
tesseract_ocr_model.py feat: add options for choosing OCR engines (#118) 2024-10-08 19:07:08 +02:00