feat(ocr): auto-detect rotated pages in Tesseract (#1167)
* fix(ocr): tesseract support mis-oriented documents
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): update missing test data
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): rotate image to the natural orientation before layout prediction
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): move bounding box rotation util to orientation.py
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): refactor rotation utilities
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): revert layout updates
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): update e2e OCR test data
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): avoid swallowing tesseract errors causing orientation detection failures
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): revert layout updates
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): update e2e OCR test data
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrCliModel`
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrModel`
* chore(ocr): default `TesseractOcrCliModel._is_auto` to `False`
* fix(ocr): fix `TesseractOcrCliModel._is_auto` computation
* chore(ocr): improve logging in case of OSD failure in `TesseractOcrCliModel` and `TesseractOcrModel`

---------

Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
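The commit message mentions a bounding-box rotation utility and a fallback to unrotated OCR when orientation and script detection (OSD) fails. The sketch below illustrates both ideas in plain Python. It is not the code from this commit: the names `rotate_point_cw`, `rotate_bbox_cw`, and `rotation_or_zero` are hypothetical, and it assumes boxes are `(left, top, right, bottom)` tuples with a top-left origin and rotations in multiples of 90 degrees.

```python
import logging
from typing import Callable, Tuple

_log = logging.getLogger(__name__)

# Assumed box layout: (left, top, right, bottom), top-left origin, y grows downward.
BBox = Tuple[float, float, float, float]


def rotate_point_cw(x: float, y: float, angle: int,
                    width: float, height: float) -> Tuple[float, float]:
    """Map a point from an image of size (width, height) into the same
    image rotated clockwise by `angle` degrees (multiples of 90 only)."""
    angle %= 360
    if angle == 0:
        return (x, y)
    if angle == 90:
        return (height - y, x)
    if angle == 180:
        return (width - x, height - y)
    if angle == 270:
        return (y, width - x)
    raise ValueError(f"unsupported rotation angle: {angle}")


def rotate_bbox_cw(bbox: BBox, angle: int, width: float, height: float) -> BBox:
    """Rotate an axis-aligned box together with its image and re-normalize
    it so that left <= right and top <= bottom."""
    left, top, right, bottom = bbox
    x0, y0 = rotate_point_cw(left, top, angle, width, height)
    x1, y1 = rotate_point_cw(right, bottom, angle, width, height)
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def rotation_or_zero(detect: Callable[[], int]) -> int:
    """Run a hypothetical orientation detector; if it fails, log a warning
    and return 0 so that OCR proceeds without rotation."""
    try:
        return detect() % 360
    except Exception as exc:
        _log.warning("orientation detection failed, continuing unrotated: %s", exc)
        return 0


# Rotate a box with its 100x200 page by 90 degrees clockwise, then map it
# back by rotating the resulting 200x100 page a further 270 degrees.
page_box = (10, 20, 30, 60)
rotated = rotate_bbox_cw(page_box, 90, width=100, height=200)
restored = rotate_bbox_cw(rotated, 270, width=200, height=100)
```

The round trip at the end mirrors the general workflow the commit describes: OCR runs on the naturally oriented image, and the resulting boxes are mapped back into the coordinates of the original page.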
@@ -3099,9 +3099,9 @@
 "prov": [
 {
 "bbox": [
-323.4081115722656,
-266.14935302734375,
-553.295166015625,
+323.408203125,
+266.1492919921875,
+553.2952270507812,
 541.6512603759766
 ],
 "page": 1,
@@ -3122,9 +3122,9 @@
 {
 "bbox": [
 88.33030700683594,
-571.4317626953125,
+571.4317321777344,
 263.7049560546875,
-699.1134490966797
+699.1134796142578
 ],
 "page": 3,
 "span": [
@@ -3144,9 +3144,9 @@
 {
 "bbox": [
 53.05912780761719,
-251.1358642578125,
+251.135986328125,
 295.8506164550781,
-481.20867919921875
+481.2087097167969
 ],
 "page": 4,
 "span": [
@@ -3234,9 +3234,9 @@
 {
 "bbox": [
 98.93103790283203,
-497.91845703125,
+497.91851806640625,
 512.579833984375,
-654.5244903564453
+654.5245208740234
 ],
 "page": 4,
 "span": [
@@ -8153,7 +8153,7 @@
 62.02753829956055,
 440.3381042480469,
 285.78955078125,
-596.3199462890625
+596.3199310302734
 ],
 "page": 6,
 "span": [
@@ -10514,9 +10514,9 @@
 "prov": [
 {
 "bbox": [
-80.35527038574219,
+80.35525512695312,
 496.5545349121094,
-267.00823974609375,
+267.0082092285156,
 641.0637054443359
 ],
 "page": 7,
@@ -14214,10 +14214,10 @@
 "prov": [
 {
 "bbox": [
-72.65901947021484,
-452.14599609375,
-274.8346862792969,
-619.5191650390625
+72.6590347290039,
+452.1459655761719,
+274.83465576171875,
+619.5191955566406
 ],
 "page": 8,
 "span": [