Christoph Auer
e94d317c02
feat: Add adaptive OCR, factor out treatment of OCR areas and cell filtering (#38)
* Introduce adaptive OCR
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Factor out BaseOcrModel, add docling-parse backend tests, fixes
* Make easyocr default dep
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-20 15:28:03 +02:00
Michele Dolfi
78347bf679
feat: allow computing page images on-demand with scale and cache them (#36)
* feat: allow computing page images on-demand and cache them
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* feat: expose scale for export of page images and document elements
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix comment
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-20 13:27:19 +02:00
Michele Dolfi
63d80edca2
feat: output page images and extracted bbox (#31)
* Add assemble options and example saving pages and figures
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add options for different page elements, improve example and flip name of assemble_options
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-12 18:25:45 +02:00
Panos Vagenas
d603137383
feat: add simplified single-doc conversion (#20)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-07-26 16:55:33 +02:00
Christoph Auer
e2d996753b
Initial commit
2024-07-15 09:42:42 +02:00