Docling/tests/data
Pedro Ribeiro 98b5eeb844
fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549)
get merged_text from boundingbox instead of merging it to prevent overlaps

Signed-off-by: Pedro Ribeiro <pedro_ribeiro_93@hotmail.com>
2025-05-19 15:26:00 +02:00
asciidoc fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
csv feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
docx feat: add textbox content extraction in msword_backend (#1538) 2025-05-19 15:01:36 +02:00
groundtruth fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549) 2025-05-19 15:26:00 +02:00
html fix(HTML): handle row spans in header rows (#1536) 2025-05-09 15:14:32 +02:00
jats feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
md fix: improve HTML layer detection, various MD fixes (#1241) 2025-03-26 16:07:14 +01:00
pdf fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549) 2025-05-19 15:26:00 +02:00
pptx feat: Add PPTX notes slides (#474) 2025-03-19 14:52:09 +01:00
uspto feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
webp feat: support image/webp file type (#1415) 2025-05-14 09:47:28 +02:00
xlsx fix: added extraction of byte-images in excel (#804) 2025-01-24 18:48:02 +01:00
2305.03393v1-pg9-img.png feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00