Docling/tests
mkrssg 1350a8d3e5
fix(msword_backend): Identify text in the same line after an image #1425 (#1610)
* fix(msword_backend): Identify text in the same line after an image / image anchor #1425

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>

* test: add test file and case for fix(msword_backend): Identify text in the same line after an image / image anchor #1425

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>

* test: added groundtruth test files for fix(msword_backend): Identify text in the same line after an image / image anchor #1425

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>

* fix: extraneous empty paragraphs for test files

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>

---------

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>
Co-authored-by: Michael Krissgau <michael.krissgau@ibm.com>
2025-06-20 10:55:30 +02:00
data fix(msword_backend): Identify text in the same line after an image #1425 (#1610) 2025-06-20 10:55:30 +02:00
data_scanned feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
__init__.py fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
test_backend_asciidoc.py fix(asciidoc): set default size when missing in image directive (#1769) 2025-06-16 10:38:46 +02:00
test_backend_csv.py chore: fix or ignore runtime and deprecation warnings (#1660) 2025-05-28 17:55:31 +02:00
test_backend_docling_json.py feat: add Docling JSON ingestion (#783) 2025-01-24 18:05:23 +01:00
test_backend_docling_parse_v2.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_backend_docling_parse_v4.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_backend_docling_parse.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_backend_html.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_backend_jats.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_backend_markdown.py feat(markdown): add formatting & improve inline support (#1804) 2025-06-18 15:57:57 +02:00
test_backend_msexcel.py feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
test_backend_msword.py fix(msword_backend): Identify text in the same line after an image #1425 (#1610) 2025-06-20 10:55:30 +02:00
test_backend_patent_uspto.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_backend_pdfium.py fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549) 2025-05-19 15:26:00 +02:00
test_backend_pptx.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_backend_webp.py feat: support image/webp file type (#1415) 2025-05-14 09:47:28 +02:00
test_cli.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_code_formula.py fix: formula conversion with page_range param set (#1791) 2025-06-17 13:58:45 +02:00
test_data_gen_flag.py fix(markdown): handle nested lists (#910) 2025-02-07 12:55:12 +01:00
test_document_picture_classifier.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_e2e_conversion.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
test_e2e_ocr_conversion.py feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
test_input_doc.py fix: guess HTML content starting with script tag (#1673) 2025-06-02 08:43:24 +02:00
test_interfaces.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_invalid_input.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
test_legacy_format_transform.py chore: fix or ignore runtime and deprecation warnings (#1660) 2025-05-28 17:55:31 +02:00
test_options.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
test_settings_load.py fix(settings): fix nested settings load via environment variables (#1551) 2025-05-14 13:42:10 +02:00
verify_utils.py test: ensure utf-8 in test data utils (#1691) 2025-06-02 12:13:19 +02:00