Christoph Auer
|
e94d317c02
|
feat: Add adaptive OCR, factor out treatment of OCR areas and cell filtering (#38)
* Introduce adaptive OCR
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Factor out BaseOcrModel, add docling-parse backend tests, fixes
* Make easyocr default dep
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-08-20 15:28:03 +02:00 |
|
Michele Dolfi
|
78347bf679
|
feat: allow computing page images on-demand with scale and cache them (#36)
* feat: allow computing page images on-demand and cache them
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* feat: expose scale for export of page images and document elements
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix comment
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-08-20 13:27:19 +02:00 |
|
Michele Dolfi
|
794b20a50a
|
fix: type of path_or_stream in PdfDocumentBackend (#28)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-08-07 17:20:44 +02:00 |
|
mara004
|
3eca8b8485
|
refactor(pypdfium2): just forward input to PdfDocument directly (#17)
PdfDocument() should accept strings, paths, bytes, and byte streams. If it does not, please file a bug report.
Signed-off-by: mara004 <geisserml@gmail.com>
|
2024-07-25 08:54:57 +02:00 |
|
Christoph Auer
|
e2d996753b
|
Initial commit
|
2024-07-15 09:42:42 +02:00 |
|