Docling/tests
Nikos Livathinos dae2a3b667
fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138)
* feat(OCR tests): Introduce fuzziness in the text validation of OCR tests

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
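
A fuzzy text check of the kind this commit describes could be sketched as below. This is a minimal illustration, not the code in verify_utils.py: the names `levenshtein` and `texts_match` and the 0.4 threshold are assumptions made for the example. The idea is to accept OCR output whose edit distance to the expected text, relative to the expected length, stays under a tolerance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def texts_match(expected: str, actual: str, fuzzy: bool, threshold: float = 0.4) -> bool:
    """Hypothetical helper: exact comparison unless fuzzy matching is requested.

    The threshold is an assumed value: the fraction of the expected text
    that is allowed to differ.
    """
    if not fuzzy or not expected:
        return expected == actual
    return levenshtein(expected, actual) / len(expected) < threshold
```

With this sketch, `texts_match("Docling", "Doc1ing", fuzzy=True)` passes because only one of seven characters differs, while the same pair fails when `fuzzy=False`.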

* fix(TesseractOcrCliModel): Send the stderr to devnull to avoid polluting the console with messages from tesseract cmd

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
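
Discarding a child process's stderr can be sketched with the standard library as follows. This is an illustration under assumptions, not the TesseractOcrCliModel code: `run_quiet` is a hypothetical helper, and the command in the usage lines is a stand-in Python one-liner rather than a real tesseract invocation.

```python
import subprocess
import sys


def run_quiet(cmd: list[str]) -> str:
    """Run a command, capture its stdout and discard its stderr."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,     # keep the output we want to parse
        stderr=subprocess.DEVNULL,  # drop diagnostics instead of printing them
        text=True,
        check=True,                 # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Stand-in command: writes one line to stdout and one to stderr.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('ok'); print('noise', file=sys.stderr)",
]
output = run_quiet(demo_cmd)
```

One trade-off of this approach is that genuine error messages from the command are also hidden, so the exit code becomes the only failure signal.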

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2024-10-11 10:21:19 +02:00
data feat: new torch-based docling models (#120) 2024-10-03 18:42:33 +02:00
data_scanned feat: add options for choosing OCR engines (#118) 2024-10-08 19:07:08 +02:00
__init__.py fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
test_backend_docling_parse.py fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
test_backend_pdfium.py fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
test_cli.py feat: add docling cli (#75) 2024-09-13 14:03:09 +02:00
test_e2e_conversion.py feat: Support tableformer model choice (#90) 2024-09-26 21:37:08 +02:00
test_e2e_ocr_conversion.py fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138) 2024-10-11 10:21:19 +02:00
test_interfaces.py feat: Support tableformer model choice (#90) 2024-09-26 21:37:08 +02:00
test_options.py feat: Support tableformer model choice (#90) 2024-09-26 21:37:08 +02:00
verify_utils.py fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138) 2024-10-11 10:21:19 +02:00