2203.01017v2.doctags.txt
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
2203.01017v2.json
feat: Integrate ListItemMarkerProcessor into document assembly ( #1825 )
2025-07-01 10:04:58 +02:00
2203.01017v2.md
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
2203.01017v2.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
2206.01062.doctags.txt
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
2206.01062.json
feat: Integrate ListItemMarkerProcessor into document assembly ( #1825 )
2025-07-01 10:04:58 +02:00
2206.01062.md
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
2206.01062.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
2305.03393v1-pg9.doctags.txt
feat: Use new TableFormer model weights and default to accurate model version ( #1100 )
2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.json
feat: support xlsm files ( #1520 )
2025-06-10 16:55:59 +02:00
2305.03393v1-pg9.md
feat: Use new TableFormer model weights and default to accurate model version ( #1100 )
2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.pages.json
feat: Integrate ListItemMarkerProcessor into document assembly ( #1825 )
2025-07-01 10:04:58 +02:00
2305.03393v1.doctags.txt
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
2305.03393v1.json
feat: Integrate ListItemMarkerProcessor into document assembly ( #1825 )
2025-07-01 10:04:58 +02:00
2305.03393v1.md
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
2305.03393v1.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
amt_handbook_sample.doctags.txt
fix: restrict click version and update lock file ( #1582 )
2025-05-13 10:40:08 +02:00
amt_handbook_sample.json
feat(ocr): auto-detect rotated pages in Tesseract ( #1167 )
2025-05-21 18:12:33 +02:00
amt_handbook_sample.md
docs: Add example for inspection of picture content ( #624 )
2025-01-29 10:39:00 +01:00
amt_handbook_sample.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
code_and_formula.doctags.txt
feat: Implement new reading-order model ( #916 )
2025-02-20 17:51:17 +01:00
code_and_formula.json
chore: format JSON test files to enable comparison ( #1511 )
2025-05-02 10:52:18 +02:00
code_and_formula.md
feat: Implement new reading-order model ( #916 )
2025-02-20 17:51:17 +01:00
code_and_formula.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
multi_page.doctags.txt
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
multi_page.json
feat: Integrate ListItemMarkerProcessor into document assembly ( #1825 )
2025-07-01 10:04:58 +02:00
multi_page.md
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
multi_page.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
picture_classification.doctags.txt
fix: restrict click version and update lock file ( #1582 )
2025-05-13 10:40:08 +02:00
picture_classification.json
feat(ocr): auto-detect rotated pages in Tesseract ( #1167 )
2025-05-21 18:12:33 +02:00
picture_classification.md
feat: New document picture classifier ( #805 )
2025-01-24 18:05:51 +01:00
picture_classification.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
redp5110_sampled.doctags.txt
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
redp5110_sampled.json
feat: Integrate ListItemMarkerProcessor into document assembly ( #1825 )
2025-07-01 10:04:58 +02:00
redp5110_sampled.md
feat: leverage new list modeling, capture default markers ( #1856 )
2025-06-27 16:37:15 +02:00
redp5110_sampled.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
right_to_left_01.doctags.txt
feat: Implement new reading-order model ( #916 )
2025-02-20 17:51:17 +01:00
right_to_left_01.json
chore: format JSON test files to enable comparison ( #1511 )
2025-05-02 10:52:18 +02:00
right_to_left_01.md
feat: Implement new reading-order model ( #916 )
2025-02-20 17:51:17 +01:00
right_to_left_01.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
right_to_left_02.doctags.txt
fix: Test cases for RTL programmatic PDFs and fixes for the formula model ( #903 )
2025-02-07 08:43:31 +01:00
right_to_left_02.json
chore: format JSON test files to enable comparison ( #1511 )
2025-05-02 10:52:18 +02:00
right_to_left_02.md
fix: Test cases for RTL programmatic PDFs and fixes for the formula model ( #903 )
2025-02-07 08:43:31 +01:00
right_to_left_02.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00
right_to_left_03.doctags.txt
fix: Test cases for RTL programmatic PDFs and fixes for the formula model ( #903 )
2025-02-07 08:43:31 +01:00
right_to_left_03.json
feat(ocr): auto-detect rotated pages in Tesseract ( #1167 )
2025-05-21 18:12:33 +02:00
right_to_left_03.md
fix: Test cases for RTL programmatic PDFs and fixes for the formula model ( #903 )
2025-02-07 08:43:31 +01:00
right_to_left_03.pages.json
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it ( #1745 )
2025-06-13 19:01:55 +02:00