feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745)
* Keep page.parsed_page.textline_cells and page.cells in sync, including OCR
* Make page.parsed_page the only source of truth for text cells
* Small fix
* Correctly compute PDF boxes from pymupdf
* Use different OCR engine order
* Add type hints and fix mypy
* One more test fix
* Remove `with pypdfium2_lock:` blocks from caller sites
* Fix typing

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
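The "single source of truth" idea in the first two items can be illustrated with a small sketch, assuming simplified stand-in classes. Only the names page.parsed_page, textline_cells and page.cells come from the commit message; `TextCell`, `ParsedPage`, `Page` and `add_ocr_cells` below are hypothetical and do not reproduce the project's actual classes. The point is that `cells` is a derived view over the parsed page rather than a second list, so OCR output written into the parsed page is automatically visible through `cells` and there is nothing to keep in sync.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TextCell:
    # Hypothetical simplified cell: text plus a (left, top, right, bottom) box.
    text: str
    bbox: Tuple[float, float, float, float]
    from_ocr: bool = False


@dataclass
class ParsedPage:
    # Hypothetical stand-in holding the authoritative list of text-line cells.
    textline_cells: List[TextCell] = field(default_factory=list)


@dataclass
class Page:
    parsed_page: Optional[ParsedPage] = None

    @property
    def cells(self) -> List[TextCell]:
        # Derived, not stored: always reflects parsed_page.textline_cells.
        if self.parsed_page is None:
            return []
        return self.parsed_page.textline_cells


def add_ocr_cells(page: Page, ocr_cells: List[TextCell]) -> None:
    # Hypothetical helper: OCR results go into the parsed page itself.
    # This sketch simply appends; a real pipeline may also need to drop
    # programmatic cells that overlap the OCR'd regions.
    if page.parsed_page is None:
        page.parsed_page = ParsedPage()
    page.parsed_page.textline_cells.extend(ocr_cells)


page = Page(parsed_page=ParsedPage([TextCell("Hello", (0.0, 0.0, 50.0, 10.0))]))
add_ocr_cells(page, [TextCell("World", (0.0, 12.0, 50.0, 22.0), from_ocr=True)])
print([c.text for c in page.cells])  # ['Hello', 'World']
```

Because `cells` is a read-only property, callers cannot accidentally create a diverging copy; every writer, including OCR, has to go through the parsed page.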
12 files changed (each file diff is too large to display):

* +51209 / -50265
* +45258 / -44727
* +2687 / -2628
* +28529 / -27703
* +2512 / -2453
* +2318 / -2200
* +4964 / -4669
* +1344 / -1226
* +23787 / -22725
* +684 / -625
* +4037 / -3978
* +1434 / -1375