- When the OCR is forced, any existing PDF cells are rejected.
- Introduce the force-ocr cmd parameter in docling CLI.
- Update unit tests.
- Add the full_page_ocr.py example in mkdocs.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* feat(OCR tests): Introduce fuzziness in the text validation of OCR tests
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix(TesseractOcrCliModel): Send the stderr to devnull to avoid poluting the console with messages from tesseract cmd
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
---------
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>