Docling/docling/models
Peter W. J. Staar cfdf4cea25
feat: new vlm-models support (#1570)
* feat: adding new vlm-models support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* got microsoft/Phi-4-multimodal-instruct to work

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on VLMs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the VLM part

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* all working, now serious refactoring necessary

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the download_model

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the formulate_prompt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* pixtral 12b runs via MLX and native transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the VlmPredictionToken

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring minimal_vlm_pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the MyPy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added pipeline_model_specializations file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* need to get Phi4 working again ...

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* finalising last points for VLM support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the pipeline for Phi4

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* streamlining all code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixing the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the html backend to the VLM pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the static load_from_doctags

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restore stable imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use AutoModelForVision2Seq for Pixtral and review example (including rename)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove unused value

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* refactor instances of VLM models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* skip compare example in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use lowercase and uppercase only

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename pipeline_vlm_model_spec

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move more arguments to options and simplify model init

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add supported_devices

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove not-needed function

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* exclude minimal_vlm

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add message for transformers version

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename to specs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use module import and remove MLX from non-darwin

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove hf_vlm_model and add extra_generation_args

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use single HF VLM model class

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove torch type

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for vision models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-02 17:01:06 +02:00
factories ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
plugins feat: add factory for ocr engines via plugins (#1010) 2025-03-18 13:58:05 +01:00
utils feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
vlm_models_inline feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
__init__.py Initial commit 2024-07-15 09:42:42 +02:00
api_vlm_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
base_model.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
base_ocr_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
code_formula_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
document_picture_classifier.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
easyocr_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
layout_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
ocr_mac_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
page_assemble_model.py feat: Establish confidence estimation for document and pages (#1313) 2025-05-21 12:32:49 +02:00
page_preprocessing_model.py chore: fix or ignore runtime and deprecation warnings (#1660) 2025-05-28 17:55:31 +02:00
picture_description_api_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
picture_description_base_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
picture_description_vlm_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
rapid_ocr_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
readingorder_model.py fix: updated the time-recorder label for reading order (#1490) 2025-04-29 13:02:53 +02:00
table_structure_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
tesseract_ocr_cli_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
tesseract_ocr_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00