* add the pytests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* renamed the test folder and added the toplevel test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the toplevel function test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to start running all tests successfully
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the reference converted documents
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added first test for json and md output
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ran pre-commit
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* replaced deprecated json function with model_dump_json
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* replaced deprecated json function with model_dump_json
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Fix backend tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* commented out the drawing
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ci: avoid duplicate runs
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
* commented out json verification for now
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added verification of input cells
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformat code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added test to verify the cells in the pages
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added test to verify the cells in the pages (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added test to verify the cells in the pages (3)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* run all examples in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* make sure examples return failures
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* raise a failure if examples fail
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix examples
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* run examples after tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Add tests and update top_level_tests using only datamodels
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove unnecessary code
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Validate conversion status on e2e test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* package verify utils and add more tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* reduce docs in example, since they are already in the tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip batch_convert
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* pin docling-parse 1.1.2
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* updated the error messages
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* commented out the json verification for now
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* bumped GLM version
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Fix lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pin new docling-parse v1.1.3
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
- fields `output` & `assembled` need not be optional
- introduced "synonym" `ConversionResult` for `ConvertedDocument` & deprecated the latter
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
* Put safety-checks for failed parse of pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Introduce page-level error checks
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bump to docling-parse 1.1.1
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Introduce page-level error checks
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Put safety-checks for failed parse of pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bump to docling-parse 1.1.1
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Raise from page backend if page is not correctly parsed
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Put safety-checks for failed parse of pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bump to docling-parse 1.1.1
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Use docling-parse page-by-page
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Propagate document_hash to PDF backends, use docling-parse 1.0.0
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Upgrade lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* repin after more packages on pypi
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* Add assemble options and example saving pages and figures
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add options for different page elements, improve example and flip name of assemble_options
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
This backend uses our own docling_parse to reliably extract PDF cells.
To get page images, it uses pypdfium2.
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>