Suehtam
|
1d17e7397a
|
test: avoid testing exact JSON in CSV backend (#1038)
* feat: updated verify_export
Moved verify_export to verify_utils
Reuse verify_export in tests
Signed-off-by: Matheus Abdias <matheusfabdias@gmail.com>
* feat: replace verify_export with verify_document in CSV conversion tests
Signed-off-by: Matheus Abdias <matheusfabdias@gmail.com>
---------
Signed-off-by: Matheus Abdias <matheusfabdias@gmail.com>
|
2025-02-24 08:10:40 +01:00 |
|
Cesar Berrospi Ramis
|
1ac010354f
|
test: avoid testing exact JSON (#1027)
* test: avoid testing exact JSON
Avoid testing exact JSON output in html and xml backends.
Reuse the JSON verify helper function among backend test files.
Improve type annotations in html backend.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* Update tests/test_backend_patent_uspto.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
|
2025-02-20 16:20:07 +01:00 |
|
Cesar Berrospi Ramis
|
a112d7a035
|
fix: parse html with omitted body tag (#818)
* fix: parse HTML files without body tag
Parse HTML files without a 'body' tag, since it is optional in the HTML5 specification.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* test: ensure docling converts HTML without body tag
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
|
2025-01-27 16:59:00 +01:00 |
|
Peter W. J. Staar
|
f542460af3
|
fix: fix duplicate title and heading + add e2e tests for html and docx (#186)
* add real e2e tests for html and docx
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the output of itxt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the text
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the tests (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the examples (1)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the output of the test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the tests, moved the ground-truth
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* moved the ground-truth data
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the html tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restructure title fix (#187)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-10-30 13:14:56 +01:00 |
|
Panos Vagenas
|
b9f5c74a7d
|
fix: fix header levels for DOCX & HTML (#184)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-10-28 17:02:52 +01:00 |
|