feat(ocr): auto-detect rotated pages in Tesseract (#1167)
* fix(ocr): support mis-oriented documents in Tesseract
* fix(ocr): update missing test data
* fix(ocr): rotate image to the natural orientation before layout prediction
* fix(ocr): move bounding box rotation util to orientation.py
* fix(ocr): refactor rotation utilities
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* fix(ocr): avoid swallowing Tesseract errors that cause orientation detection failures
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrCliModel`
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrModel`
* chore(ocr): default `TesseractOcrCliModel._is_auto` to `False`
* fix(ocr): fix `TesseractOcrCliModel._is_auto` computation
* chore(ocr): improve logging in case of OSD failure in `TesseractOcrCliModel` and `TesseractOcrModel`

---------

Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
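The behaviour described in the message (detect the page orientation, fall back to no rotation when OSD fails, and map boxes between orientations) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names `parse_osd_rotation`, `rotation_or_default`, and `rotate_box_cw90` are hypothetical, and the sketch only assumes that Tesseract's OSD text output contains a `Rotate: <degrees>` line.

```python
import re
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), top-left origin


def parse_osd_rotation(osd_text: str) -> Optional[int]:
    """Return the angle from the `Rotate:` line of OSD output, or None if absent."""
    match = re.search(r"^Rotate:\s*(\d+)\s*$", osd_text, flags=re.MULTILINE)
    if match is None:
        return None
    angle = int(match.group(1)) % 360
    # Only right-angle rotations are meaningful here; anything else is rejected.
    return angle if angle in (0, 90, 180, 270) else None


def rotation_or_default(osd_text: Optional[str]) -> int:
    """Hypothetical fallback: use 0 degrees when OSD failed or could not be parsed."""
    if not osd_text:
        return 0
    angle = parse_osd_rotation(osd_text)
    return 0 if angle is None else angle


def rotate_box_cw90(box: Box, page_height: float) -> Box:
    """Map a box to the coordinates of the same page rotated 90 degrees clockwise.

    A point (x, y) on a page of height H lands at (H - y, x) after the rotation.
    """
    left, top, right, bottom = box
    return (page_height - bottom, left, page_height - top, right)


sample_osd = (
    "Page number: 0\n"
    "Orientation in degrees: 270\n"
    "Rotate: 90\n"
    "Orientation confidence: 2.45\n"
)
angle = rotation_or_default(sample_osd)
```

Returning 0 on failure mirrors the "proceed to OCR without rotation when OSD fails" items above: an unreadable orientation result degrades to the previous behaviour instead of aborting OCR.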
@@ -5951,7 +5951,7 @@
 "b": 465.596681609368,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.93938809633255,
+"confidence": 0.9393879771232605,
 "cells": [
 {
 "index": 77,
@@ -7406,7 +7406,7 @@
 "b": 534.1167018462124,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.5769621729850769,
+"confidence": 0.5769620537757874,
 "cells": [
 {
 "index": 134,
@@ -8046,7 +8046,7 @@
 "b": 650.6431884765625,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.6444893479347229,
+"confidence": 0.6444889903068542,
 "cells": [],
 "children": []
 }
@@ -10042,7 +10042,7 @@
 "b": 465.596681609368,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.93938809633255,
+"confidence": 0.9393879771232605,
 "cells": [
 {
 "index": 77,
@@ -11509,7 +11509,7 @@
 "b": 534.1167018462124,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.5769621729850769,
+"confidence": 0.5769620537757874,
 "cells": [
 {
 "index": 134,
@@ -12155,7 +12155,7 @@
 "b": 650.6431884765625,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.6444893479347229,
+"confidence": 0.6444889903068542,
 "cells": [],
 "children": []
 },
@@ -14148,7 +14148,7 @@
 "b": 465.596681609368,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.93938809633255,
+"confidence": 0.9393879771232605,
 "cells": [
 {
 "index": 77,
@@ -15615,7 +15615,7 @@
 "b": 534.1167018462124,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.5769621729850769,
+"confidence": 0.5769620537757874,
 "cells": [
 {
 "index": 134,
@@ -16261,7 +16261,7 @@
 "b": 650.6431884765625,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.6444893479347229,
+"confidence": 0.6444889903068542,
 "cells": [],
 "children": []
 },
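
Every hunk in this test-data update changes a `confidence` value only around the seventh decimal place, which is consistent with single-precision rounding noise rather than a change in results. A tolerance-based comparison, sketched below with the three value pairs from the hunks, would treat the old and new values as equal. The helper name `confidences_match` and the `1e-6` relative tolerance are assumptions for illustration, not part of this commit or its test suite.

```python
import math


def confidences_match(old: float, new: float, rel_tol: float = 1e-6) -> bool:
    """Compare two confidence scores up to a relative tolerance (assumed 1e-6)."""
    return math.isclose(old, new, rel_tol=rel_tol)


# (old, new) confidence pairs taken from the hunks above.
pairs = [
    (0.93938809633255, 0.9393879771232605),
    (0.5769621729850769, 0.5769620537757874),
    (0.6444893479347229, 0.6444889903068542),
]

all_close = all(confidences_match(old, new) for old, new in pairs)
exactly_equal = any(old == new for old, new in pairs)
```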