Commit Graph

6 Commits

Author SHA1 Message Date
Christoph Auer
a00c937e19
Ensure all models work only on valid pages (#158)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-18 08:54:06 +02:00
Christoph Auer
7d3be0edeb
feat!: Docling v2 (#117)
---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-16 21:02:03 +02:00
Michele Dolfi
27a7a152e1
feat: linux arm64 support and reducing dependencies (#69)
* feat: linux arm64 support and reducing dependencies

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* downgrade pyarrow for wider support

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-10 15:43:27 +02:00
Peter W. J. Staar
48f4d1ba52
fix: Add unit tests (#51)
* add the pytests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* renamed the test folder and added the toplevel test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the toplevel function test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* need to start running all tests successfully

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the reference converted documents

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added first test for json and md output

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran pre-commit

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* replaced deprecated json function with model_dump_json

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* replaced deprecated json function with model_dump_json

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Fix backend tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* commented out the drawing

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ci: avoid duplicate runs

Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

* commented out json verification for now

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added verification of input cells

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformat code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added test to verify the cells in the pages

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added test to verify the cells in the pages (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added test to verify the cells in the pages (3)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* run all examples in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* make sure examples return failures

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* raise a failure if examples fail

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* run examples after tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Add tests and update top_level_tests using only datamodels

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove unnecessary code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Validate conversion status on e2e test

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* package verify utils and add more tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* reduce docs in example, since they are already in the tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* skip batch_convert

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* pin docling-parse 1.1.2

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* updated the error messages

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* commented out the json verification for now

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* bumped GLM version

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Fix lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Pin new docling-parse v1.1.3

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-30 14:08:20 +02:00
Panos Vagenas
e46a66a176
fix: refine conversion result (#52)
- fields `output` & `assembled` need not be optional
- introduced "synonym" `ConversionResult` for `ConvertedDocument` & deprecated the latter

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-08-27 11:50:43 +02:00
Christoph Auer
e2d996753b Initial commit 2024-07-15 09:42:42 +02:00