Docling/tests
Simon Jégou bfcab3d677
feat(docx): add text formatting and hyperlink support (#630)
* feat: Enable markdown text formatting for docx

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Fix imports

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Use Formatting

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle hyperlink

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle formatting properly for DocItemLabel.PARAGRAPH

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Use inline group

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle bullet lists

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Strip elements

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Strip elements

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Run black and mypy

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle header and footer

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Use inline_fmt everywhere

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Run precommit

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Address feedback

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Fix add_list_item

Signed-off-by: SimJeg <sjegou@nvidia.com>

* fix minor bugs, mark helper methods internal

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: SimJeg <sjegou@nvidia.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Co-authored-by: Panos Vagenas <pva@zurich.ibm.com>
2025-04-03 15:11:50 +02:00
..
data feat(docx): add text formatting and hyperlink support (#630) 2025-04-03 15:11:50 +02:00
data_scanned feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
__init__.py fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
test_backend_asciidoc.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_backend_csv.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_backend_docling_json.py feat: add Docling JSON ingestion (#783) 2025-01-24 18:05:23 +01:00
test_backend_docling_parse_v2.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_backend_docling_parse_v4.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_backend_docling_parse.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_backend_html.py fix: improve HTML layer detection, various MD fixes (#1241) 2025-03-26 16:07:14 +01:00
test_backend_jats.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_backend_markdown.py fix(markdown): handle nested lists (#910) 2025-02-07 12:55:12 +01:00
test_backend_msexcel.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_backend_msword.py feat(docx): add text formatting and hyperlink support (#630) 2025-04-03 15:11:50 +02:00
test_backend_patent_uspto.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_backend_pdfium.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_backend_pptx.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_cli.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_code_formula.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_data_gen_flag.py fix(markdown): handle nested lists (#910) 2025-02-07 12:55:12 +01:00
test_document_picture_classifier.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_e2e_conversion.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_e2e_ocr_conversion.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_input_doc.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_interfaces.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_invalid_input.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_legacy_format_transform.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
test_options.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
verify_utils.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00