Clément Doumouro
|
45265bf8b1
|
feat(ocr): auto-detect rotated pages in Tesseract (#1167)
* fix(ocr): tesseract support mis-oriented documents
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): update missing test data
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): rotate image to the natural orientation before layout prediction
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): move bounding bow rotation util to orientation.py
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): refactor rotation utilities
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): revert layout updates
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): update e2e OCR test data
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): avoid to swallow tesseract errors causing orientation detection failures
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): revert layout updates
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): update e2e OCR test data
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrCliModel`
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrModel`
* chore(ocr): default `TesseractOcrCliModel._is_auto` to `False`
* fix(ocr): fix `TesseractOcrCliModel._is_auto` computation
* chore(ocr): improve logging in case of OSD failure in `TesseractOcrCliModel` and `TesseractOcrModel`
---------
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
|
2025-05-21 18:12:33 +02:00 |
|