Docling/tests/data/groundtruth/docling_v1
Christoph Auer 56a0e104f7
feat: Integrate ListItemMarkerProcessor into document assembly (#1825)
* Integrate ListItemMarkerProcessor into document assembly
* Update to final version
* Update all test cases
* Upgrade deps

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-07-01 10:04:58 +02:00
2203.01017v2.doctags.txt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
2203.01017v2.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2203.01017v2.md feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
2203.01017v2.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
2206.01062.doctags.txt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
2206.01062.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2206.01062.md feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
2206.01062.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
2305.03393v1-pg9.doctags.txt feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.json feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
2305.03393v1-pg9.md feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.pages.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2305.03393v1.doctags.txt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
2305.03393v1.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
2305.03393v1.md feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
2305.03393v1.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
amt_handbook_sample.doctags.txt fix: restrict click version and update lock file (#1582) 2025-05-13 10:40:08 +02:00
amt_handbook_sample.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
amt_handbook_sample.md docs: Add example for inspection of picture content (#624) 2025-01-29 10:39:00 +01:00
amt_handbook_sample.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
code_and_formula.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
code_and_formula.json chore: format JSON test files to enable comparison (#1511) 2025-05-02 10:52:18 +02:00
code_and_formula.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
code_and_formula.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
multi_page.doctags.txt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
multi_page.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
multi_page.md feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
multi_page.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
picture_classification.doctags.txt fix: restrict click version and update lock file (#1582) 2025-05-13 10:40:08 +02:00
picture_classification.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
picture_classification.md feat: New document picture classifier (#805) 2025-01-24 18:05:51 +01:00
picture_classification.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
redp5110_sampled.doctags.txt feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
redp5110_sampled.json feat: Integrate ListItemMarkerProcessor into document assembly (#1825) 2025-07-01 10:04:58 +02:00
redp5110_sampled.md feat: leverage new list modeling, capture default markers (#1856) 2025-06-27 16:37:15 +02:00
redp5110_sampled.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
right_to_left_01.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.json chore: format JSON test files to enable comparison (#1511) 2025-05-02 10:52:18 +02:00
right_to_left_01.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
right_to_left_02.doctags.txt fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_02.json chore: format JSON test files to enable comparison (#1511) 2025-05-02 10:52:18 +02:00
right_to_left_02.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_02.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
right_to_left_03.doctags.txt fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_03.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
right_to_left_03.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_03.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00