Docling/docling
Nikos Livathinos dae2a3b667
fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138)
* feat(OCR tests): Introduce fuzziness in the text validation of OCR tests

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix(TesseractOcrCliModel): Send the stderr to devnull to avoid polluting the console with messages from tesseract cmd

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2024-10-11 10:21:19 +02:00
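
The two changes named in the commit message above can be illustrated with a short Python sketch. It is not the code from the commit: the helper names `run_quiet`, `levenshtein`, and `fuzzy_equal`, and the 10% tolerance, are assumptions chosen for illustration. The first helper discards a child process's stderr while keeping its stdout. The second accepts OCR text that differs from the expected text by a small edit distance instead of requiring an exact match.

```python
import subprocess
import sys
from subprocess import DEVNULL, PIPE


def run_quiet(cmd: list[str]) -> str:
    """Run a command, return its stdout, and discard anything it writes to stderr."""
    result = subprocess.run(cmd, stdout=PIPE, stderr=DEVNULL, text=True, check=True)
    return result.stdout


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def fuzzy_equal(expected: str, actual: str, max_ratio: float = 0.1) -> bool:
    """True if the edit distance is at most max_ratio of the longer string's length.

    max_ratio is a hypothetical tolerance, not a value taken from the repository.
    """
    longest = max(len(expected), len(actual))
    if longest == 0:
        return True
    return levenshtein(expected, actual) / longest <= max_ratio


if __name__ == "__main__":
    # A stand-in child process that writes to both streams; only stdout is kept.
    child = [sys.executable, "-c", "import sys; print('text'); sys.stderr.write('noise')"]
    print(run_quiet(child).strip())
    # One wrong character out of eleven is within the 10% tolerance.
    print(fuzzy_equal("Hello world", "He1lo world"))
```

A ratio-based threshold scales with the text length, so a single misread character fails a very short string but passes a full page. Whether the actual tests use a ratio or an absolute distance is not stated in the commit message.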
backend test: improve typing definitions (part 1) (#72) 2024-09-12 15:56:29 +02:00
cli feat: add options for choosing OCR engines (#118) 2024-10-08 19:07:08 +02:00
datamodel feat: add options for choosing OCR engines (#118) 2024-10-08 19:07:08 +02:00
models fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138) 2024-10-11 10:21:19 +02:00
pipeline feat: add options for choosing OCR engines (#118) 2024-10-08 19:07:08 +02:00
utils fix: updated the render_as_doctags with the new arguments from docling-core (#93) 2024-09-23 20:12:18 +02:00
__init__.py Initial commit 2024-07-15 09:42:42 +02:00
document_converter.py fixed unload pdf backend resources (#129) 2024-10-08 10:46:43 +02:00