Docling/docs/examples
Peter W. J. Staar cfdf4cea25
feat: new vlm-models support (#1570)
* feat: adding new vlm-models support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* got microsoft/Phi-4-multimodal-instruct to work

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on VLMs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the VLM part

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* all working, now serious refactoring necessary

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the download_model

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the formulate_prompt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* pixtral 12b runs via MLX and native transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the VlmPredictionToken

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring minimal_vlm_pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the MyPy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added pipeline_model_specializations file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* need to get Phi4 working again ...

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* finalising last points for VLM support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the pipeline for Phi4

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* streamlining all code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixing the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the html backend to the VLM pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the static load_from_doctags

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restore stable imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use AutoModelForVision2Seq for Pixtral and review example (including rename)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove unused value

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* refactor instances of VLM models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* skip compare example in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use lowercase and uppercase only

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename pipeline_vlm_model_spec

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move more arguments to options and simplify model init

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add supported_devices

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove not-needed function

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* exclude minimal_vlm

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add message for transformers version

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename to specs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use module import and remove MLX from non-darwin

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove hf_vlm_model and add extra_generation_args

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use single HF VLM model class

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove torch type

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for vision models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-02 17:01:06 +02:00
data docs: add advanced chunking & serialization example (#1589) 2025-05-14 14:35:07 +02:00
advanced_chunking_and_serialization.ipynb chore: fix chunking example data link (#1596) 2025-05-16 08:44:47 +02:00
backend_csv.ipynb feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
backend_xml_rag.ipynb chore: typo fix (#1465) 2025-04-28 08:52:09 +02:00
batch_convert.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
compare_vlm_models.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
custom_convert.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
develop_formula_understanding.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
develop_picture_enrichment.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
export_figures.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
export_multimodal.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
export_tables.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
full_page_ocr.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
hybrid_chunking.ipynb docs: add advanced chunking & serialization example (#1589) 2025-05-14 14:35:07 +02:00
index.md docs: add architecture outline (#341) 2024-11-15 12:52:41 +01:00
inspect_picture_content.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
minimal_vlm_pipeline.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
minimal.py docs: add advanced chunking & serialization example (#1589) 2025-05-14 14:35:07 +02:00
pictures_description_api.py docs: Examples for picture descriptions (#951) 2025-02-13 08:33:12 +01:00
pictures_description.ipynb chore: typo fix (#1465) 2025-04-28 08:52:09 +02:00
rag_azuresearch.ipynb ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
rag_haystack.ipynb ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
rag_langchain.ipynb ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
rag_llamaindex.ipynb chore: move to docling-project org (#1160) 2025-03-14 12:35:29 +01:00
rag_milvus.ipynb chore: typo fix (#1465) 2025-04-28 08:52:09 +02:00
rag_weaviate.ipynb chore: typo fix (#1465) 2025-04-28 08:52:09 +02:00
rapidocr_with_custom_models.py docs: Introduce example with custom models for RapidOCR (#874) 2025-02-04 10:07:00 +01:00
retrieval_qdrant.ipynb chore: move to docling-project org (#1160) 2025-03-14 12:35:29 +01:00
run_md.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
run_with_accelerator.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
run_with_formats.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
serialization.ipynb docs: add serialization docs, update chunking docs (#1556) 2025-05-08 21:43:01 +02:00
tesseract_lang_detection.py ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
translate.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00
visual_grounding.ipynb ci: add coverage and ruff (#1383) 2025-04-14 18:01:26 +02:00
vlm_pipeline_api_model.py feat: new vlm-models support (#1570) 2025-06-02 17:01:06 +02:00