feat(ocr): auto-detect rotated pages in Tesseract (#1167)
* fix(ocr): support mis-oriented documents in Tesseract
* fix(ocr): update missing test data
* fix(ocr): rotate image to the natural orientation before layout prediction
* fix(ocr): move bounding box rotation util to orientation.py
* fix(ocr): refactor rotation utilities
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* fix(ocr): avoid swallowing tesseract errors causing orientation detection failures
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrCliModel`
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrModel`
* chore(ocr): default `TesseractOcrCliModel._is_auto` to `False`
* fix(ocr): fix `TesseractOcrCliModel._is_auto` computation
* chore(ocr): improve logging in case of OSD failure in `TesseractOcrCliModel` and `TesseractOcrModel`

---------

Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
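The commit message mentions two mechanisms: falling back to un-rotated OCR when OSD fails, and a bounding box rotation utility. The sketch below illustrates both ideas in isolation. It is not the code from this commit: `Box`, `safe_angle`, and `rotate_box` are hypothetical names, and the convention (counter-clockwise rotation by a multiple of 90 degrees, top-left origin, canvas expanded to fit) is an assumption made for the example.

```python
import logging
from dataclasses import dataclass
from typing import Callable

_log = logging.getLogger(__name__)


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with a top-left origin (hypothetical type)."""
    l: float
    t: float
    r: float
    b: float


def safe_angle(detect: Callable[[], int]) -> int:
    """Return the detected page rotation, or 0 if detection fails.

    `detect` stands in for whatever runs orientation detection (OSD).
    A failure is logged rather than silently swallowed, and OCR can
    then proceed without rotation.
    """
    try:
        angle = detect() % 360
    except Exception as exc:  # broad on purpose in this sketch
        _log.warning("OSD failed (%s); proceeding without rotation", exc)
        return 0
    return angle if angle in (0, 90, 180, 270) else 0


def rotate_box(box: Box, angle: int, width: float, height: float) -> Box:
    """Map `box` from an image of size (width, height) into the image
    obtained by rotating it counter-clockwise by `angle` degrees."""
    angle %= 360
    if angle not in (0, 90, 180, 270):
        raise ValueError("angle must be a multiple of 90")

    def rot(x: float, y: float) -> tuple:
        if angle == 0:
            return x, y
        if angle == 90:
            return y, width - x
        if angle == 180:
            return width - x, height - y
        return height - y, x  # 270 degrees

    x0, y0 = rot(box.l, box.t)
    x1, y1 = rot(box.r, box.b)
    # Rotation can swap which corner is top-left, so re-normalize.
    return Box(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
```

Under these assumptions, a box found on the rotated image is mapped back to the original page by rotating it by `(360 - angle) % 360` and passing the rotated image's size (width and height swapped for 90 and 270 degrees).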
@@ -1541,7 +1541,7 @@
 "b": 358.76782,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.5588339567184448,
+"confidence": 0.5588350296020508,
 "cells": [
 {
 "index": 18,
@@ -1581,7 +1581,7 @@
 "b": 406.74554,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.6312211155891418,
+"confidence": 0.6312209963798523,
 "cells": [
 {
 "index": 19,
@@ -2036,7 +2036,7 @@
 "b": 607.23564,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9843752980232239,
+"confidence": 0.9843751788139343,
 "cells": [
 {
 "index": 36,
@@ -2719,7 +2719,7 @@
 "b": 358.76782,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.5588339567184448,
+"confidence": 0.5588350296020508,
 "cells": [
 {
 "index": 18,
@@ -2765,7 +2765,7 @@
 "b": 406.74554,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.6312211155891418,
+"confidence": 0.6312209963798523,
 "cells": [
 {
 "index": 19,
@@ -3232,7 +3232,7 @@
 "b": 607.23564,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9843752980232239,
+"confidence": 0.9843751788139343,
 "cells": [
 {
 "index": 36,
@@ -3914,7 +3914,7 @@
 "b": 358.76782,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.5588339567184448,
+"confidence": 0.5588350296020508,
 "cells": [
 {
 "index": 18,
@@ -3960,7 +3960,7 @@
 "b": 406.74554,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.6312211155891418,
+"confidence": 0.6312209963798523,
 "cells": [
 {
 "index": 19,
@@ -4427,7 +4427,7 @@
 "b": 607.23564,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9843752980232239,
+"confidence": 0.9843751788139343,
 "cells": [
 {
 "index": 36,
@@ -5782,7 +5782,7 @@
 "b": 137.5481507594625,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9505067467689514,
+"confidence": 0.950506865978241,
 "cells": [
 {
 "index": 0,
@@ -6302,7 +6302,7 @@
 "b": 373.7119120634245,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.8727476000785828,
+"confidence": 0.8727474808692932,
 "cells": [
 {
 "index": 19,
@@ -7037,7 +7037,7 @@
 "b": 704.5687238902275,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.8504503965377808,
+"confidence": 0.8504500389099121,
 "cells": [
 {
 "index": 46,
@@ -7092,7 +7092,7 @@
 "b": 137.5481507594625,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9505067467689514,
+"confidence": 0.950506865978241,
 "cells": [
 {
 "index": 0,
@@ -7630,7 +7630,7 @@
 "b": 373.7119120634245,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.8727476000785828,
+"confidence": 0.8727474808692932,
 "cells": [
 {
 "index": 19,
@@ -8389,7 +8389,7 @@
 "b": 704.5687238902275,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.8504503965377808,
+"confidence": 0.8504500389099121,
 "cells": [
 {
 "index": 46,
@@ -8437,7 +8437,7 @@
 "b": 137.5481507594625,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9505067467689514,
+"confidence": 0.950506865978241,
 "cells": [
 {
 "index": 0,
@@ -8975,7 +8975,7 @@
 "b": 373.7119120634245,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.8727476000785828,
+"confidence": 0.8727474808692932,
 "cells": [
 {
 "index": 19,
@@ -9736,7 +9736,7 @@
 "b": 704.5687238902275,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.8504503965377808,
+"confidence": 0.8504500389099121,
 "cells": [
 {
 "index": 46,