Michele Dolfi
5458a88464
ci: add coverage and ruff ( #1383 )
...
* add coverage calculation and push
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* new codecov version and usage of token
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* enable ruff formatter instead of black and isort
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* apply ruff lint fixes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* apply ruff unsafe fixes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add removed imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* runs 1 on linter issues
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* finalize linter fixes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Update pyproject.toml
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-04-14 18:01:26 +02:00
Michele Dolfi
6eaae3cba0
feat: add factory for ocr engines via plugins ( #1010 )
...
* add factory for ocr engines
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* apply pre-commit after rebase
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add picture description factory
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix enable option
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* switch to create methods
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* make `options` an explicit kwarg
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* keep old lock of docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix lock
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add allow_external_plugins option
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add factory return and ignore options type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Co-authored-by: Panos Vagenas <pva@zurich.ibm.com>
2025-03-18 13:58:05 +01:00
Michele Dolfi
4cc6e3ea5e
feat: Describe pictures using vision models ( #259 )
...
* draft for picture description models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* vlm description using AutoModelForVision2Seq
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add generation options
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update vlm API
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* allow only localhost traffic
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename model
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* do not run with vlm api
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* more renaming
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix examples path
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* apply CLI download login
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix name of cli argument
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use with_smolvlm in models download
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-02-07 16:30:42 +01:00
Michele Dolfi
5ad6de0560
fix: enrichment models batch size and expose picture classifier ( #878 )
...
* expose picture classifier in CLI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use different batch size in each model
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove batch size from CLI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* cleanup imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-02-05 11:46:01 +01:00
Matteo
3213b247ad
feat: Code and equation model for PDF and code blocks in markdown ( #752 )
...
* propagated changes for new CodeItem class
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* Rebased branch on latest main. changes for CodeItem
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* removed unused files
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* chore: update lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* pin latest docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update docling-core pinning
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* pin docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use new add_code in backends and update typing in MD backend
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* added if statement for backend
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* removed unused import
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* removed print statements
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* gt for new pdf
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* Update docling/pipeline/standard_pdf_pipeline.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Matteo <43417658+Matteo-Omenetti@users.noreply.github.com>
* fixed doc comment of __call__ function of code_formula_model
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
* fix artifacts_path type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move expansion_factor to base class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Matteo Omenetti <omenetti.matteo@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Matteo <43417658+Matteo-Omenetti@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2025-01-24 16:54:22 +01:00
Michele Dolfi
57fc28d3d8
refactor: allow the usage of backends in the enrich models and generalize the interface ( #742 )
...
* fix get image with cropbox
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* allow the usage of backends in the enrich models and generalize the interface
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move logic in BaseTextImageEnrichmentModel
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* renaming
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-01-15 09:52:38 +01:00
Christoph Auer
2a2c65bf4f
feat: Add pipeline timings and toggle visualization, establish debug settings ( #183 )
...
* Add settings to turn visualization on or off
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add profiling code to all models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Refactor and fix profiling codes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Visualization codes output PNG to debug dir
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for time logging
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Optimize imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add start_timestamps to ProfilingItem
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-30 15:04:19 +01:00
Christoph Auer
7d3be0edeb
feat!: Docling v2 ( #117 )
...
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-16 21:02:03 +02:00