Peter W. J. Staar
c0ba88edf1
feat(cli): add option for html with split-page mode (#1355)
* updated the cli to output html in split-page mode
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* add pin for new docling-core with html split argument
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* relock with fixed html export in docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update test results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update more tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update lock with docling-core fixes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update test results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add again chunking extras
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-14 08:41:50 +02:00
Maxim Lysak
2f72167ff6
feat: updated vlm pipeline (with latest changes from docling-core) (#1158)
* Draft implementation of Doctag backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated VLM pipeline doctags to docling conversion, now properly supports lists
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* preparing to migrate to new doctags deserializer
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-using DocTagsDocument.from_doctags_and_image_pairs
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* satisfying mypy and other checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added support for force_backend_text parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed unnecessary transformation
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Updated readme
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
---------
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-03-18 15:44:51 +01:00
Panos Vagenas
0945973b79
fix: use first table row as col headers (#1156)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-03-13 15:34:18 +01:00
Panos Vagenas
94751a78f4
fix(markdown): add support for HTML content (#855)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-02-03 12:21:05 +01:00
Cesar Berrospi Ramis
0cd81a8122
fix(docx): merged table cells not properly converted (#857)
* fix(docx): merged cells not properly converted
Fix conversion issue of merged cells in Word tables leading to repeated text.
Simplify Word table conversion code.
Add docx file with several table formats for regression tests.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* chore: add type hinting to docx backend
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-02-03 10:20:03 +01:00