feat(ocr): auto-detect rotated pages in Tesseract (#1167)
* fix(ocr): support mis-oriented documents in Tesseract
* fix(ocr): update missing test data
* fix(ocr): rotate image to the natural orientation before layout prediction
* fix(ocr): move bounding box rotation util to orientation.py
* fix(ocr): refactor rotation utilities
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* fix(ocr): avoid swallowing tesseract errors that cause orientation detection failures
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrCliModel`
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrModel`
* chore(ocr): default `TesseractOcrCliModel._is_auto` to `False`
* fix(ocr): fix `TesseractOcrCliModel._is_auto` computation
* chore(ocr): improve logging in case of OSD failure in `TesseractOcrCliModel` and `TesseractOcrModel`

---------

Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
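The commit message mentions two ideas that a small example can make concrete: rotating bounding boxes along with the page, and continuing without rotation when orientation and script detection (OSD) fails. The following is a minimal sketch only, not the code from this commit. The names `rotate_point`, `rotate_box` and `detect_angle_or_zero` are hypothetical, and the sketch assumes a top-left coordinate origin and counter-clockwise rotations in multiples of 90 degrees.

```python
import logging
from typing import Callable, Tuple

# Hypothetical box type: (left, top, right, bottom), top-left origin.
Box = Tuple[float, float, float, float]

_log = logging.getLogger(__name__)


def rotate_point(x: float, y: float, angle: int, width: float, height: float):
    """Map a point of a width x height image into the image rotated
    counter-clockwise by `angle` degrees (0, 90, 180 or 270)."""
    if angle == 0:
        return x, y
    if angle == 90:
        return y, width - x
    if angle == 180:
        return width - x, height - y
    if angle == 270:
        return height - y, x
    raise ValueError(f"unsupported angle: {angle}")


def rotate_box(box: Box, angle: int, width: float, height: float) -> Box:
    """Rotate an axis-aligned box and re-normalise its corners."""
    l, t, r, b = box
    x0, y0 = rotate_point(l, t, angle, width, height)
    x1, y1 = rotate_point(r, b, angle, width, height)
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def detect_angle_or_zero(detect: Callable[[], int]) -> int:
    """Run an orientation detector; on failure, log and fall back to 0
    so that OCR can proceed on the unrotated page."""
    try:
        return detect() % 360
    except Exception as exc:  # assumption: any detector error is non-fatal
        _log.warning("Orientation detection failed, skipping rotation: %s", exc)
        return 0
```

Under these assumptions, rotating a box by 90 degrees and then by 270 degrees (with the image width and height swapped for the second call) returns the original box, which is a convenient sanity check for any such utility.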
@@ -14942,9 +14942,9 @@
 "page_no": 2,
 "bbox": {
 "l": 148.45364379882812,
-"t": 583.6257476806641,
+"t": 583.6257629394531,
 "r": 464.3608093261719,
-"b": 366.1538391113281,
+"b": 366.1537780761719,
 "coord_origin": "BOTTOMLEFT"
 },
 "charspan": [
@@ -15221,9 +15221,9 @@
 {
 "page_no": 7,
 "bbox": {
-"l": 164.6503143310547,
+"l": 164.65028381347656,
 "t": 628.2029113769531,
-"r": 449.550537109375,
+"r": 449.5505676269531,
 "b": 511.6590576171875,
 "coord_origin": "BOTTOMLEFT"
 },
@@ -15475,7 +15475,7 @@
 {
 "page_no": 8,
 "bbox": {
-"l": 140.70960998535156,
+"l": 140.70968627929688,
 "t": 283.9361572265625,
 "r": 472.73382568359375,
 "b": 198.32281494140625,
@@ -15804,10 +15804,10 @@
 {
 "page_no": 10,
 "bbox": {
-"l": 162.67434692382812,
-"t": 347.3774719238281,
-"r": 451.70068359375,
-"b": 128.786376953125,
+"l": 162.67430114746094,
+"t": 347.37744140625,
+"r": 451.70062255859375,
+"b": 128.78643798828125,
 "coord_origin": "BOTTOMLEFT"
 },
 "charspan": [
@@ -15875,9 +15875,9 @@
 {
 "page_no": 11,
 "bbox": {
-"l": 168.3928985595703,
+"l": 168.39285278320312,
 "t": 610.0334930419922,
-"r": 447.3513488769531,
+"r": 447.35137939453125,
 "b": 157.99432373046875,
 "coord_origin": "BOTTOMLEFT"
 },
@@ -17702,7 +17702,7 @@
 "page_no": 10,
 "bbox": {
 "l": 143.6376495361328,
-"t": 635.6522827148438,
+"t": 635.6522979736328,
 "r": 470.8485412597656,
 "b": 528.7375183105469,
 "coord_origin": "BOTTOMLEFT"
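
The coordinate changes in this test data are very small (the `t` value in the last hunk moves by roughly 1.5e-5). A reviewer who wants to check that regenerated reference data differs only by this kind of shift could compare boxes with a tolerance. The sketch below is an illustration only: `bboxes_close` is a hypothetical helper, and the `1e-3` tolerance is an assumed value, not something taken from the project's test suite.

```python
import math


def bboxes_close(a: dict, b: dict, abs_tol: float = 1e-3) -> bool:
    """Return True if every edge of two bbox dicts agrees within abs_tol.

    Hypothetical helper; the tolerance is an assumption for illustration.
    """
    return all(
        math.isclose(a[key], b[key], abs_tol=abs_tol)
        for key in ("l", "t", "r", "b")
    )


# Values copied from the last hunk of the diff; only "t" changed.
old = {
    "l": 143.6376495361328,
    "t": 635.6522827148438,
    "r": 470.8485412597656,
    "b": 528.7375183105469,
}
new = dict(old, t=635.6522979736328)

print(bboxes_close(old, new))                # within the assumed tolerance
print(bboxes_close(old, new, abs_tol=1e-6))  # a much stricter tolerance
```

With the assumed default tolerance the two boxes compare as equal, while the stricter `1e-6` tolerance flags the change.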