Docling/tests at 7d3302cb48dd91cd29673d7c4eaf7326736d0685

NeoAnd/Docling


Christoph Auer 7d3302cb48 feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745)

* Keep page.parsed_page.textline_cells and page.cells in sync, including OCR

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Make page.parsed_page the only source of truth for text cells

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Small fix

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Correctly compute PDF boxes from pymupdf

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Use different OCR engine order

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add type hints and fix mypy

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* One more test fix

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove with pypdfium2_lock from caller sites

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix typing

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
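
The "only source of truth" idea in this commit message can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins written for illustration, not Docling's actual data model: `Page.cells` is assumed to be a read-only view derived from `parsed_page.textline_cells`, so OCR cells appended to the parsed page show up in `cells` without a second list to keep in sync.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextCell:
    # Hypothetical cell type: just the text and whether OCR produced it.
    text: str
    from_ocr: bool = False


@dataclass
class ParsedPage:
    # Hypothetical container holding the one authoritative list of cells.
    textline_cells: List[TextCell] = field(default_factory=list)


@dataclass
class Page:
    page_no: int
    parsed_page: Optional[ParsedPage] = None

    @property
    def cells(self) -> List[TextCell]:
        # Derived view: nothing is copied, so the two can never drift apart.
        if self.parsed_page is None:
            return []
        return self.parsed_page.textline_cells


page = Page(page_no=1, parsed_page=ParsedPage())
# Programmatic and OCR cells both land in the same list.
page.parsed_page.textline_cells.append(TextCell("Programmatic text"))
page.parsed_page.textline_cells.append(TextCell("Scanned text", from_ocr=True))
```

With this layout, any code that reads `page.cells` sees OCR results as soon as they are added to `parsed_page`, and a page without a parsed representation simply yields an empty list.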

2025-06-13 19:01:55 +02:00

feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745)

2025-06-13 19:01:55 +02:00

feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745)

2025-06-13 19:01:55 +02:00

__init__.py

fix: Add unit tests (#51)

2024-08-30 14:08:20 +02:00

test_backend_asciidoc.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_backend_csv.py

chore: fix or ignore runtime and deprecation warnings (#1660)

2025-05-28 17:55:31 +02:00

test_backend_docling_json.py

feat: add Docling JSON ingestion (#783)

2025-01-24 18:05:23 +01:00

test_backend_docling_parse_v2.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_backend_docling_parse_v4.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_backend_docling_parse.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_backend_html.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_backend_jats.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_backend_markdown.py

fix(markdown): handle nested lists (#910)

2025-02-07 12:55:12 +01:00

test_backend_msexcel.py

feat: support xlsm files (#1520)

2025-06-10 16:55:59 +02:00

test_backend_msword.py

test: mark flaky test (#1698)

2025-06-03 13:13:44 +02:00

test_backend_patent_uspto.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_backend_pdfium.py

fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549)

2025-05-19 15:26:00 +02:00

test_backend_pptx.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_backend_webp.py

feat: support image/webp file type (#1415)

2025-05-14 09:47:28 +02:00

test_cli.py

fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)

2025-02-07 08:43:31 +01:00

test_code_formula.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_data_gen_flag.py

fix(markdown): handle nested lists (#910)

2025-02-07 12:55:12 +01:00

test_document_picture_classifier.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_e2e_conversion.py

feat: new vlm-models support (#1570)

2025-06-02 17:01:06 +02:00

test_e2e_ocr_conversion.py

feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745)

2025-06-13 19:01:55 +02:00

test_input_doc.py

fix: guess HTML content starting with script tag (#1673)

2025-06-02 08:43:24 +02:00

test_interfaces.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_invalid_input.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

test_legacy_format_transform.py

chore: fix or ignore runtime and deprecation warnings (#1660)

2025-05-28 17:55:31 +02:00

test_options.py

feat: new vlm-models support (#1570)

2025-06-02 17:01:06 +02:00

test_settings_load.py

fix(settings): fix nested settings load via environment variables (#1551)

2025-05-14 13:42:10 +02:00

verify_utils.py

test: ensure utf-8 in test data utils (#1691)

2025-06-02 12:13:19 +02:00