* Move expensive imports closer to usage
Signed-off-by: William Easton <bill.easton@elastic.co>
* DCO Remediation Commit for William Easton <bill.easton@elastic.co>
I, William Easton <bill.easton@elastic.co>, hereby add my Signed-off-by to this commit: 8a7412ce5bb131a01bb6403067aeb948c9093b0b
Signed-off-by: William Easton <bill.easton@elastic.co>
* formatting fixes
Signed-off-by: William Easton <bill.easton@elastic.co>
* DCO Remediation Commit for William Easton <bill.easton@elastic.co>
I, William Easton <bill.easton@elastic.co>, hereby add my Signed-off-by to this commit: 8a7412ce5bb131a01bb6403067aeb948c9093b0b
I, William Easton <bill.easton@elastic.co>, hereby add my Signed-off-by to this commit: 963e34325071db5e844841f10c27b396a054a0a1
Signed-off-by: William Easton <bill.easton@elastic.co>
* Fix baseocrmodel test issue
Signed-off-by: William Easton <bill.easton@elastic.co>
---------
Signed-off-by: William Easton <bill.easton@elastic.co>
* Keep page.parsed_page.textline_cells and page.cells in sync, including OCR
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make page.parsed_page the only source of truth for text cells
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Small fix
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Correctly compute PDF boxes from pymupdf
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Use different OCR engine order
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add type hints and fix mypy
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* One more test fix
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove with pypdfium2_lock from caller sites
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix typing
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* feat: adding new vlm-models support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* got microsoft/Phi-4-multimodal-instruct to work
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* working on vlm's
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the VLM part
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, now serious refacgtoring necessary
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the download_model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the formulate_prompt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* pixtral 12b runs via MLX and native transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the VlmPredictionToken
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring minimal_vlm_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the MyPy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added pipeline_model_specializations file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to get Phi4 working again ...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalising last points for vlms support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the pipeline for Phi4
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* streamlining all code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixing the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the html backend to the VLM pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the static load_from_doctags
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restore stable imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use AutoModelForVision2Seq for Pixtral and review example (including rename)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove unused value
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* refactor instances of VLM models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip compare example in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use lowercase and uppercase only
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename pipeline_vlm_model_spec
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move more argument to options and simplify model init
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add supported_devices
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove not-needed function
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* exclude minimal_vlm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* missing file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add message for transformers version
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename to specs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use module import and remove MLX from non-darwin
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove hf_vlm_model and add extra_generation_args
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use single HF VLM model class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove torch type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add docs for vision models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
fix: Support for RTL programmatic documents
fix(parser): detect and handle rotated pages
fix(parser): fix bug causing duplicated text
fix(formula): improve stopping criteria
chore: update lock file
fix: temporary constrain beautifulsoup
* switch to code formula model v1.0.1 and new test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* switch to code formula model v1.0.1 and new test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* cleaned up the data folder in the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* switch to code formula model v1.0.1 and new test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* added three test-files for right-to-left
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fix black
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* added new gt for test_e2e_conversion
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* added new gt for test_e2e_conversion
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* Add code to expose text direction of cell
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* new test file
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* update lock
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix mypy reports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix example filepaths
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add test data results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* pin wheel of latest docling-parse release
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use latest docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove debugging code
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix path to files in example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Revert unwanted RTL additions
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix test data paths in examples
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
* Upgraded Layout Postprocessing, sending old code back to ERZ
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Implement hierachical cluster layout processing
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pass nested cluster processing through full pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pass nested clusters through GLM as payload
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move to_docling_document from ds-glm to this repo
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clean up imports again
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* feat(Accelerator): Introduce options to control the num_threads and device from API, envvars, CLI.
- Introduce the AcceleratorOptions, AcceleratorDevice and use them to set the device where the models run.
- Introduce the accelerator_utils with function to decide the device and resolve the AUTO setting.
- Refactor the way how the docling-ibm-models are called to match the new init signature of models.
- Translate the accelerator options to the specific inputs for third-party models.
- Extend the docling CLI with parameters to set the num_threads and device.
- Add new unit tests.
- Write new example how to use the accelerator options.
* fix: Improve the pydantic objects in the pipeline_options and imports.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: TableStructureModel: Refactor the artifacts path to use the new structure for fast/accurate model
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* Updated test ground-truth
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Updated test ground-truth (again), bugfix for empty layout
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: Do proper check to set the device in EasyOCR, RapidOCR.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Correct the way to set GPU for EasyOCR, RapidOCR
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Ocr AccleratorDevice
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* Merge pull request #556 from DS4SD/cau/layout-processing-improvement
feat: layout processing improvements and bugfixes
* Update lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update HF model ref, reset test generate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Repin to release package versions
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Many layout processing improvements, add document index type
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update pinnings to docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update test GT
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix table box snapping
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for cluster pre-ordering
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Introduce OCR confidence, propagate to orphan in post-processing
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix form and key value area groups
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Adjust confidence in EasyOcr
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Roll back CLI changes from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update test GT
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update docling-core pinning
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Annoying fixes for historical python versions
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Updated test GT for legacy
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Comment cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
* Upgraded Layout Postprocessing, sending old code back to ERZ
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Implement hierachical cluster layout processing
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pass nested cluster processing through full pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pass nested clusters through GLM as payload
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move to_docling_document from ds-glm to this repo
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clean up imports again
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* feat(Accelerator): Introduce options to control the num_threads and device from API, envvars, CLI.
- Introduce the AcceleratorOptions, AcceleratorDevice and use them to set the device where the models run.
- Introduce the accelerator_utils with function to decide the device and resolve the AUTO setting.
- Refactor the way how the docling-ibm-models are called to match the new init signature of models.
- Translate the accelerator options to the specific inputs for third-party models.
- Extend the docling CLI with parameters to set the num_threads and device.
- Add new unit tests.
- Write new example how to use the accelerator options.
* fix: Improve the pydantic objects in the pipeline_options and imports.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: TableStructureModel: Refactor the artifacts path to use the new structure for fast/accurate model
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* Updated test ground-truth
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Updated test ground-truth (again), bugfix for empty layout
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: Do proper check to set the device in EasyOCR, RapidOCR.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* Rollback changes from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update test gt
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove unused debug settings
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Review fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Nail the accelerator defaults for MPS
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Add settings to turn visualization on or off
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add profiling code to all models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Refactor and fix profiling codes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Visualization codes output PNG to debug dir
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for time logging
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Optimize imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add start_timestamps to ProfilingItem
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>