Docling/docling
mkrssg 1350a8d3e5
fix(msword_backend): Identify text in the same line after an image #1425 (#1610)
* fix(msword_backend): Identify text in the same line after an image / image anchor #1425

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>

* test: add test file and case for fix(msword_backend): Identify text in the same line after an image / image anchor #1425

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>

* test: added groundtruth test files for fix(msword_backend): Identify text in the same line after an image / image anchor #1425

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>

* fix: extraneous empty paragraphs for test files

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>

---------

Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com>
Co-authored-by: Michael Krissgau <michael.krissgau@ibm.com>
2025-06-20 10:55:30 +02:00
backend | fix(msword_backend): Identify text in the same line after an image #1425 (#1610) | 2025-06-20 10:55:30 +02:00
chunking | feat: expose new hybrid chunker, update docs (#384) | 2024-12-09 08:28:29 +01:00
cli | feat: new vlm-models support (#1570) | 2025-06-02 17:01:06 +02:00
datamodel | feat: Maximum image size for Vlm models (#1802) | 2025-06-18 12:57:37 +02:00
models | fix: Ensure uninitialized pages are removed before assembling document (#1812) | 2025-06-19 07:33:25 +02:00
pipeline | fix: Ensure uninitialized pages are removed before assembling document (#1812) | 2025-06-19 07:33:25 +02:00
utils | feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) | 2025-06-13 19:01:55 +02:00
__init__.py | Initial commit | 2024-07-15 09:42:42 +02:00
document_converter.py | feat: new vlm-models support (#1570) | 2025-06-02 17:01:06 +02:00
exceptions.py | feat: Introduce the enable_remote_services option to allow remote connections while processing (#941) | 2025-02-12 15:18:01 +01:00
py.typed | fix: Add py.typed marker file (#531) | 2024-12-06 13:42:14 +01:00