* feat: adding new vlm-models support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* got microsoft/Phi-4-multimodal-instruct to work
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* working on VLMs
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the VLM part
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, now serious refactoring necessary
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the download_model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the formulate_prompt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* pixtral 12b runs via MLX and native transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the VlmPredictionToken
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring minimal_vlm_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the MyPy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added pipeline_model_specializations file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to get Phi4 working again ...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalising last points for vlms support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the pipeline for Phi4
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* streamlining all code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixing the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the html backend to the VLM pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the static load_from_doctags
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restore stable imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use AutoModelForVision2Seq for Pixtral and review example (including rename)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove unused value
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* refactor instances of VLM models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip compare example in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use lowercase and uppercase only
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename pipeline_vlm_model_spec
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move more arguments to options and simplify model init
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add supported_devices
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove not-needed function
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* exclude minimal_vlm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* missing file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add message for transformers version
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename to specs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use module import and remove MLX from non-darwin
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove hf_vlm_model and add extra_generation_args
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use single HF VLM model class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove torch type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add docs for vision models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* feat: Add visualization of bbox on page with html export.
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the cli
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the cli argument to show_layout
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Add DoclingParseV3 backend implementation
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Use docling-core with docling-parse types
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes and test updates
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix streams
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix streams
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reset tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* update test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* update test units
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back DoclingParse v1 backend, pipeline options
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update locks
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: update docling-core to 2.22.0
Update dependency library docling-core to latest release 2.22.0
Fix regression tests and ground truth files
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* Ground-truth files updated
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update tests, use TextCell.from_ocr property
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Text fixes, new test data
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename docling backend to v4
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Test all backends, fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reset all tests to use docling-parse v1 for now
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for DPv4 backend init, better test coverage
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* test_input_doc use default backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* Equation groups
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* fix: Proper handling of orphan IDs in layout postprocessing (#1118)
* Fix the handling of orphan IDs in layout postprocessing
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* chore: bump version to 2.25.2 [skip ci]
* docs: add description of DOCLING_ARTIFACTS_PATH env var (#1124)
add env var in docs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* fix(CLI): fix help message for abort options (#1130)
fix help message
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* perf: New revision code formula model and document picture classifier (#1140)
* new version code formula model
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* new version document picture classifier
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* new code formula model
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* restored original code formula test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
---------
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Co-authored-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* feat: Use new TableFormer model weights and default to accurate model version (#1100)
* feat: New tableformer model weights [WIP]
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Updated TF version
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated tests, after merging with Main, Switched to Accurate TF model by default
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* chore: bump version to 2.26.0 [skip ci]
* fix: Pass tests, update docling-core to 2.22.0 (#1150)
fix: update docling-core to 2.22.0
Update dependency library docling-core to latest release 2.22.0
Fix regression tests and ground truth files
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* Updating content hash
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
---------
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Matteo <43417658+Matteo-Omenetti@users.noreply.github.com>
Co-authored-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution in removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments, added enabled property to SmolDocling and the related VLM pipeline option, plus a few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
* refactor: upgrade BeautifulSoup4 with type hints
Upgrade dependency library BeautifulSoup4 to 4.13.3 (with type hints).
Refactor backends using BeautifulSoup4 to comply with type hints.
Apply style simplifications and improvements for consistency.
Remove variables and functions that are never used.
Remove code duplication between backends for parsing HTML tables.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* build: allow beautifulsoup4 version 4.12.3
Allow older version of beautifulsoup4 and ensure compatibility.
Update library dependencies.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* Testing fix for docling-core dt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* fix: Fix code_formula test unit, update test-cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: Fix code-formula model for new docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: Update fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update test cases for office formats
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update deps and lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clean up imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
* feat: Pass predicted page-headers and page-footers through to DoclingDocument furniture
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: Update all test GT
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: update all test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: update all test cases again
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update lock
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update lock to final docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
fix: Support for RTL programmatic documents
fix(parser): detect and handle rotated pages
fix(parser): fix bug causing duplicated text
fix(formula): improve stopping criteria
chore: update lock file
fix: temporarily constrain beautifulsoup
* switch to code formula model v1.0.1 and new test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* switch to code formula model v1.0.1 and new test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* cleaned up the data folder in the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* switch to code formula model v1.0.1 and new test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* added three test-files for right-to-left
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fix black
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* added new gt for test_e2e_conversion
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* added new gt for test_e2e_conversion
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* Add code to expose text direction of cell
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* new test file
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* update lock
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix mypy reports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix example filepaths
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add test data results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* pin wheel of latest docling-parse release
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use latest docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove debugging code
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix path to files in example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Revert unwanted RTL additions
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix test data paths in examples
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>