Christoph Auer
c253dd743a
Add redbooks to test data, small additions (#35)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-20 12:36:00 +02:00
Michele Dolfi
90dd676422
feat: update parser with bytesio interface and set as new default backend (#32)
* update parser with bytesio interface
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* change default backend
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update DEFAULT_BACKEND
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-14 12:30:00 +02:00
Michele Dolfi
794b20a50a
fix: type of path_or_stream in PdfDocumentBackend (#28)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-07 17:20:44 +02:00
Maxim Lysak
b8f5e38a8c
feat: introducing docling_backend (#26)
Uses our own docling_parse to reliably get PDF cells.
To get page images, this backend uses pypdfium2.
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
2024-08-07 16:22:36 +02:00
mara004
3eca8b8485
refactor(pypdfium2): just forward input to PdfDocument directly (#17)
PdfDocument() should accept strings, paths, bytes and byte streams. If not, please file a bug report.
Signed-off-by: mara004 <geisserml@gmail.com>
2024-07-25 08:54:57 +02:00
Christoph Auer
e2d996753b
Initial commit
2024-07-15 09:42:42 +02:00