Docling

Author	SHA1	Message	Date
Christoph Auer	e94d317c02	feat: Add adaptive OCR, factor out treatment of OCR areas and cell filtering (#38) * Introduce adaptive OCR * Factor out BaseOcrModel, add docling-parse backend tests, fixes * Make easyocr default dep Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2024-08-20 15:28:03 +02:00
Michele Dolfi	78347bf679	feat: allow computing page images on-demand with scale and cache them (#36) * feat: allow computing page images on-demand and cache them * feat: expose scale for export of page images and document elements * fix comment Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2024-08-20 13:27:19 +02:00
Michele Dolfi	794b20a50a	fix: type of path_or_stream in PdfDocumentBackend (#28) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2024-08-07 17:20:44 +02:00
mara004	3eca8b8485	refactor(pypdfium2): just forward input to PdfDocument directly (#17) PdfDocument() should accept strings, paths, bytes and byte streams. If not, please file a bug report. Signed-off-by: mara004 <geisserml@gmail.com>	2024-07-25 08:54:57 +02:00
Christoph Auer	e2d996753b	Initial commit	2024-07-15 09:42:42 +02:00