Docling/tests/data/groundtruth/docling_v2
Christoph Auer 56a0e104f7
feat: Integrate ListItemMarkerProcessor into document assembly (#1825)
* Integrate ListItemMarkerProcessor into document assembly
* Update to final version
* Update all test cases
* Upgrade deps

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-07-01 10:04:58 +02:00
2203.01017v2.doctags.txt feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2203.01017v2.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2203.01017v2.md feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2203.01017v2.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
2206.01062.doctags.txt feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2206.01062.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2206.01062.md chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
2206.01062.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
2305.03393v1-pg9.doctags.txt feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
2305.03393v1-pg9.md feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.pages.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2305.03393v1.doctags.txt feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2305.03393v1.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2305.03393v1.md feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2305.03393v1.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
amt_handbook_sample.doctags.txt fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) 2025-02-17 14:11:55 +01:00
amt_handbook_sample.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
amt_handbook_sample.md docs: Add example for inspection of picture content (#624) 2025-01-29 10:39:00 +01:00
amt_handbook_sample.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
blocks.md.md fix: Pass tests, update docling-core to 2.22.0 (#1150) 2025-03-13 09:45:55 +01:00
bmj_sample.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
bmj_sample.xml.json feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
bmj_sample.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
code_and_formula.doctags.txt chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
code_and_formula.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
code_and_formula.md chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
code_and_formula.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
csv-comma-in-cell.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-comma-in-cell.csv.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
csv-comma-in-cell.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-comma.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-comma.csv.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
csv-comma.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-inconsistent-header.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-inconsistent-header.csv.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
csv-inconsistent-header.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-pipe.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-pipe.csv.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
csv-pipe.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-semicolon.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-semicolon.csv.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
csv-semicolon.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-tab.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-tab.csv.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
csv-tab.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-too-few-columns.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-too-few-columns.csv.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
csv-too-few-columns.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-too-many-columns.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-too-many-columns.csv.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
csv-too-many-columns.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
duck.md.md fix: fix single newline handling in MD backend (#824) 2025-01-28 19:05:55 +01:00
elife-56337.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
elife-56337.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
ending_with_table.md.md fix(markdown): fix parsing if doc ending with table (#873) 2025-02-03 14:38:38 +01:00
equations.docx.itxt fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295) 2025-04-08 17:11:37 +02:00
equations.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
equations.docx.md fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295) 2025-04-08 17:11:37 +02:00
example_8.html.itxt feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
example_8.html.json feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
example_8.html.md feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
example_01.html.itxt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_01.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_01.html.md feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_02.html.itxt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_02.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_02.html.md fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
example_03.html.itxt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_03.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_03.html.md fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
example_04.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_04.html.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
example_05.html.itxt fix: parse html with omitted body tag (#818) 2025-01-27 16:59:00 +01:00
example_05.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_05.html.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
example_06.html.itxt fix(html): handle address, details, and summary tags (#1436) 2025-04-23 09:30:59 +02:00
example_06.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_06.html.md fix(html): handle address, details, and summary tags (#1436) 2025-04-23 09:30:59 +02:00
example_07.html.itxt fix(html): handle nested empty lists (#1154) 2025-03-13 16:56:58 +01:00
example_07.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_07.html.md fix(html): handle nested empty lists (#1154) 2025-03-13 16:56:58 +01:00
example_08.html.itxt test: add missing ground truth files (#1667) 2025-05-28 13:26:49 +02:00
example_08.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
example_08.html.md test: add missing ground truth files (#1667) 2025-05-28 13:26:49 +02:00
inline_and_formatting.md.md fix(markdown): make parsing of rich table cells valid (#1821) 2025-06-26 19:50:45 +02:00
inline_and_formatting.md.yaml feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
ipa20180000016.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
ipa20180000016.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
ipa20180000016.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
ipa20200022300.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
ipa20200022300.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
ipa20200022300.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
lorem_ipsum.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
lorem_ipsum.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
mixed_without_h1.md.md fix: improve HTML layer detection, various MD fixes (#1241) 2025-03-26 16:07:14 +01:00
mixed.md.md fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
multi_page.doctags.txt feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
multi_page.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
multi_page.md feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
multi_page.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
nested.md.md fix(markdown): handle nested lists (#910) 2025-02-07 12:55:12 +01:00
pa20010031492.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pa20010031492.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
pa20010031492.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
pftaps057006474.itxt fix: Pass tests, update docling-core to 2.22.0 (#1150) 2025-03-13 09:45:55 +01:00
pftaps057006474.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
pftaps057006474.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
pg06442728.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pg06442728.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
pg06442728.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
picture_classification.doctags.txt fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) 2025-02-17 14:11:55 +01:00
picture_classification.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
picture_classification.md feat: New document picture classifier (#805) 2025-01-24 18:05:51 +01:00
picture_classification.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
pnas_sample.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pnas_sample.xml.json feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pnas_sample.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pntd.0008301.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pntd.0008301.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pone.0234687.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pone.0234687.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
powerpoint_bad_text.pptx.itxt fix: pptx line break and space handling (#1664) 2025-06-16 10:44:30 +02:00
powerpoint_bad_text.pptx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
powerpoint_bad_text.pptx.md fix: pptx line break and space handling (#1664) 2025-06-16 10:44:30 +02:00
powerpoint_sample.pptx.itxt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
powerpoint_sample.pptx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
powerpoint_sample.pptx.md feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.itxt feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
powerpoint_with_image.pptx.md feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
redp5110_sampled.doctags.txt feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
redp5110_sampled.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
redp5110_sampled.md feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
redp5110_sampled.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
right_to_left_01.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
right_to_left_01.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
right_to_left_02.doctags.txt chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
right_to_left_02.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
right_to_left_02.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_02.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
right_to_left_03.doctags.txt fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) 2025-02-17 14:11:55 +01:00
right_to_left_03.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
right_to_left_03.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_03.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
sample_sales_data.xlsm.itxt feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
sample_sales_data.xlsm.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
sample_sales_data.xlsm.md feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
tablecell.docx.itxt fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
tablecell.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
tablecell.docx.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_01.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_02.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_03.asciidoc.md fix(asciidoc): set default size when missing in image directive (#1769) 2025-06-16 10:38:46 +02:00
test_emf_docx.docx.itxt fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_emf_docx.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
test_emf_docx.docx.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test-01.xlsx.itxt fix: added extraction of byte-images in excel (#804) 2025-01-24 18:48:02 +01:00
test-01.xlsx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
test-01.xlsx.md fix: added extraction of byte-images in excel (#804) 2025-01-24 18:48:02 +01:00
textbox.docx.itxt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
textbox.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
textbox.docx.md feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
unit_test_01.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_01.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
unit_test_01.html.md fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
unit_test_formatting.docx.itxt fix(docx): ensure list items have a list parent (#1827) 2025-06-20 14:47:25 +02:00
unit_test_formatting.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
unit_test_formatting.docx.md feat(docx): add text formatting and hyperlink support (#630) 2025-04-03 15:11:50 +02:00
unit_test_headers_numbered.docx.itxt fix(docx): identifying numbered headers (#1231) 2025-03-25 11:41:02 +01:00
unit_test_headers_numbered.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
unit_test_headers_numbered.docx.md fix(docx): identifying numbered headers (#1231) 2025-03-25 11:41:02 +01:00
unit_test_headers.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
unit_test_headers.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
unit_test_lists.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.itxt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
wiki_duck.html.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
wiki_duck.html.md fix: improve HTML layer detection, various MD fixes (#1241) 2025-03-26 16:07:14 +01:00
wiki.md.md fix: fix single newline handling in MD backend (#824) 2025-01-28 19:05:55 +01:00
word_image_anchors.docx.itxt fix(msword_backend): Identify text in the same line after an image #1425 (#1610) 2025-06-20 10:55:30 +02:00
word_image_anchors.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
word_image_anchors.docx.md fix(msword_backend): Identify text in the same line after an image #1425 (#1610) 2025-06-20 10:55:30 +02:00
word_sample.docx.itxt fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
word_sample.docx.md fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.json fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_sample.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_sample.yaml fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_tables.docx.html feat(cli): add option for html with split-page mode (#1355) 2025-04-14 08:41:50 +02:00
word_tables.docx.itxt fix(docx): merged table cells not properly converted (#857) 2025-02-03 10:20:03 +01:00
word_tables.docx.json feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
word_tables.docx.md fix(docx): merged table cells not properly converted (#857) 2025-02-03 10:20:03 +01:00