Docling/tests
Maxim Lysak d0a1180478
fix: Fixes for wordx (#432)
* fixes for referencing drawing blip in wordx

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added safety try-except when trying to load pillow image from a docx blob. Added explicit dependency on lxml.

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added test for word file with embedded emf images, re-generated full tests for docx, eased up dependency on lxml

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated lxml dependency version

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2024-11-26 14:44:43 +01:00
data fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
data_scanned feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
__init__.py fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
test_backend_asciidoc.py feat: Add pipeline timings and toggle visualization, establish debug settings (#183) 2024-10-30 15:04:19 +01:00
test_backend_docling_parse_v2.py chore: make tests lighter (#228) 2024-11-04 14:02:28 +01:00
test_backend_docling_parse.py chore: make tests lighter (#228) 2024-11-04 14:02:28 +01:00
test_backend_html.py fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
test_backend_msexcel.py feat: added excel backend (#334) 2024-11-19 12:21:17 +01:00
test_backend_msword.py fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
test_backend_pdfium.py chore: make tests lighter (#228) 2024-11-04 14:02:28 +01:00
test_backend_pptx.py feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
test_cli.py feat: add docling cli (#75) 2024-09-13 14:03:09 +02:00
test_e2e_conversion.py feat: Add pipeline timings and toggle visualization, establish debug settings (#183) 2024-10-30 15:04:19 +01:00
test_e2e_ocr_conversion.py feat: add support for ocrmac OCR engine on macOS (#276) 2024-11-20 12:51:19 +01:00
test_input_doc.py fix: set valid=false for invalid backends (#171) 2024-10-23 15:52:30 +02:00
test_interfaces.py feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
test_legacy_format_transform.py fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
test_options.py feat: add coverage_threshold to skip OCR for small images (#161) 2024-10-18 13:58:23 +02:00
verify_utils.py feat(OCR): Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning (#290) 2024-11-12 09:46:14 +01:00