Michele Dolfi
5458a88464
ci: add coverage and ruff (#1383)
...
* add coverage calculation and push
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* new codecov version and usage of token
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* enable ruff formatter instead of black and isort
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* apply ruff lint fixes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* apply ruff unsafe fixes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add removed imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* runs 1 on linter issues
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* finalize linter fixes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Update pyproject.toml
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-04-14 18:01:26 +02:00
Cesar Berrospi Ramis
eef2bdea77
feat(xlsx): create a page for each worksheet in XLSX backend (#1332)
...
* style(xlsx): enforce type hints in XLSX backend
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* feat(xlsx): create a page for each worksheet in XLSX backend
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* docs(xlsx): add docstrings to XLSX backend module.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* docling(xlsx): add bounding boxes and page size information in cell units
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-04-11 10:29:53 +02:00
Panos Vagenas
0945973b79
fix: use first table row as col headers (#1156)
...
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-03-13 15:34:18 +01:00
Peter W. J. Staar
a458e298ca
fix: added extraction of byte-images in excel (#804)
...
* fix(msexcel): ignore Mypy checking for _find_images_in_sheet function
Signed-off-by: Jiun An Tsai <andrew@247365-Macbook.local>
* fixed some issues
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* pinned pillow in pyproject
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Jiun An Tsai <andrew@247365-Macbook.local>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Jiun An Tsai <andrew@247365-Macbook.local>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-01-24 18:48:02 +01:00
Matteo
3213b247ad
feat: Code and equation model for PDF and code blocks in markdown (#752)
...
* propagated changes for new CodeItem class
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* Rebased branch on latest main. changes for CodeItem
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* removed unused files
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* chore: update lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* pin latest docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update docling-core pinning
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* pin docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use new add_code in backends and update typing in MD backend
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* added if statement for backend
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* removed unused import
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* removed print statements
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* gt for new pdf
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* Update docling/pipeline/standard_pdf_pipeline.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Matteo <43417658+Matteo-Omenetti@users.noreply.github.com>
* fixed doc comment of __call__ function of code_formula_model
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* fix artifacts_path type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move expansion_factor to base class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Matteo <43417658+Matteo-Omenetti@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2025-01-24 16:54:22 +01:00
Peter W. J. Staar
926dfd29d5
feat: added excel backend (#334)
...
* feat: added excel backend
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* first msexcel backend
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added tooling for the cli
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* first working version for excel parsing of tables
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added proper typing for mypy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added proper typing for mypy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactor EXCEL to XLSX
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the unit tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ran poetry lock
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* adding images to output [WIP]
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the mypy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the msexcel
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the msexcel (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the mypy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added tests for merged cells in excel
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-11-19 12:21:17 +01:00