Shubham Gupta
3f91e7d3f1
feat: added support for exporting DocItem to an image when page image is available (#379)
...
* Updated minimum docling-core version to 2.4.0
Signed-off-by: Shubham Gupta <26436285+sh-gupta@users.noreply.github.com>
* Deprecated the generate_table_images option
Signed-off-by: Shubham Gupta <26436285+sh-gupta@users.noreply.github.com>
* Updated examples to use get_image instead of element.image
Signed-off-by: Shubham Gupta <26436285+sh-gupta@users.noreply.github.com>
---------
Signed-off-by: Shubham Gupta <26436285+sh-gupta@users.noreply.github.com>
2024-11-19 16:28:52 +01:00
Michele Dolfi
ed785ea122
feat: expose ocr-lang in CLI (#375)
...
* feat: expose ocr-lang in CLI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use regex for supporting multiple sep
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-11-19 15:58:49 +01:00
Nikos Livathinos
c6b3763ecb
feat(OCR): Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning (#290)
...
- When the OCR is forced, any existing PDF cells are rejected.
- Introduce the force-ocr cmd parameter in docling CLI.
- Update unit tests.
- Add the full_page_ocr.py example in mkdocs.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2024-11-12 09:46:14 +01:00
Michele Dolfi
40ad987303
feat: pdf backend, table mode as options and artifacts path (#203)
...
* feat: add more options in the CLI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update CLI docs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* expose artifacts-path as argument
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-11-04 14:26:05 +01:00
Michele Dolfi
b346faf622
feat: add coverage_threshold to skip OCR for small images (#161)
...
* feat: add coverage_threshold to skip OCR for small images
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* filter individual boxes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename option
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-18 13:58:23 +02:00
Christoph Auer
7d3be0edeb
feat!: Docling v2 (#117)
...
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-16 21:02:03 +02:00
Michele Dolfi
f96ea86a00
feat: add options for choosing OCR engines (#118)
...
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
2024-10-08 19:07:08 +02:00
Christoph Auer
d6df76f90b
feat: Support tableformer model choice (#90)
...
* Support tableformer model choice
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update datamodel structure
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update docs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add test unit for table options
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Ensure import backwards-compatibility for PipelineOptions
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update README
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Adjust parameters on custom_convert
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Update Dockerfile
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
2024-09-26 21:37:08 +02:00