8 Commits

Author SHA1 Message Date
Panos Vagenas
0945973b79
fix: use first table row as col headers (#1156)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-03-13 15:34:18 +01:00
Suehtam
1d17e7397a
test: avoid testing exact JSON in CSV backend (#1038)
* feat: updated verify_export
Moved verify_export to verify_utils
Reuse verify_export in tests

Signed-off-by: Matheus Abdias <matheusfabdias@gmail.com>

* feat: replace verify_export with verify_document in CSV conversion tests

Signed-off-by: Matheus Abdias <matheusfabdias@gmail.com>

---------

Signed-off-by: Matheus Abdias <matheusfabdias@gmail.com>
2025-02-24 08:10:40 +01:00
Cesar Berrospi Ramis
1ac010354f
test: avoid testing exact JSON (#1027)
* test: avoid testing exact JSON

Avoid testing exact JSON output in html and xml backends.
Reuse the JSON verify helper function among backend test files.
Improve type annotations in html backend.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* Update tests/test_backend_patent_uspto.py

Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2025-02-20 16:20:07 +01:00
Michele Dolfi
e1436a8b05
test: validate actual docitems in tests (#966)
* validate actual docitems in tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove verbose print

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* disable test generation

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-02-14 17:47:53 +01:00
Cesar Berrospi Ramis
0cd81a8122
fix(docx): merged table cells not properly converted (#857)
* fix(docx): merged cells not properly converted

Fix conversion issue of merged cells in Word tables leading to repeated text.
Simplify Word table conversion code.
Add docx file with several table formats for regression tests.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* chore: add type hinting to docx backend

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-02-03 10:20:03 +01:00
Panos Vagenas
ba521dd88f
chore: add missing imports to Office type tests (#826)
* chore: add missing import to XLSX test

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* Update test_backend_msword.py [skip ci]

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* Update test_backend_pptx.py

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-28 16:17:44 +01:00
Peter W. J. Staar
f542460af3
fix: fix duplicate title and heading + add e2e tests for html and docx (#186)
* add real e2e tests for html and docx

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the output of itxt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the text

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the examples (1)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the output of the test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the tests, moved the ground-truth

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* moved the ground-truth data

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the html tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restructure title fix (#187)

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-30 13:14:56 +01:00
Panos Vagenas
b9f5c74a7d
fix: fix header levels for DOCX & HTML (#184)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-28 17:02:52 +01:00