Docling/docling
Michele Dolfi 1de2e4f924
feat: export document pages as multimodal output (#54)
* feat: export document pages as multimodal output
* create a single parquet output
* add loading into HF datasets library
* renaming
* cleanup

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-03 15:05:35 +02:00
backend                feat: Page-level error reporting from PDF backend, introduce PARTIAL_SUCCESS status (#47)  2024-08-23 16:18:41 +02:00
datamodel              feat: export document pages as multimodal output (#54)                                     2024-09-03 15:05:35 +02:00
models                 fix: Add unit tests (#51)                                                                  2024-08-30 14:08:20 +02:00
pipeline               feat: Add adaptive OCR, factor out treatment of OCR areas and cell filtering (#38)         2024-08-20 15:28:03 +02:00
utils                  feat: export document pages as multimodal output (#54)                                     2024-09-03 15:05:35 +02:00
__init__.py            Initial commit                                                                             2024-07-15 09:42:42 +02:00
document_converter.py  fix: refine conversion result (#52)                                                        2024-08-27 11:50:43 +02:00