Docling/mkdocs.yml
Peter W. J. Staar cfdf4cea25
feat: new vlm-models support (#1570)
* feat: adding new vlm-models support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* got microsoft/Phi-4-multimodal-instruct to work

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on vlm's

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the VLM part

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* all working, now serious refacgtoring necessary

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the download_model

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the formulate_prompt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* pixtral 12b runs via MLX and native transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the VlmPredictionToken

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring minimal_vlm_pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the MyPy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added pipeline_model_specializations file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* need to get Phi4 working again ...

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* finalising last points for vlms support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the pipeline for Phi4

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* streamlining all code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixing the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the html backend to the VLM pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the static load_from_doctags

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restore stable imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use AutoModelForVision2Seq for Pixtral and review example (including rename)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove unused value

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* refactor instances of VLM models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* skip compare example in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use lowercase and uppercase only

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename pipeline_vlm_model_spec

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move more argument to options and simplify model init

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add supported_devices

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove not-needed function

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* exclude minimal_vlm

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add message for transformers version

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename to specs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use module import and remove MLX from non-darwin

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove hf_vlm_model and add extra_generation_args

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use single HF VLM model class

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove torch type

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for vision models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-02 17:01:06 +02:00

171 lines
5.8 KiB
YAML

site_name: Docling
site_url: https://docling-project.github.io/docling/
repo_name: docling-project/docling
repo_url: https://github.com/docling-project/docling
theme:
name: material
custom_dir: docs/overrides
palette:
# Palette toggle for automatic mode
- media: "(prefers-color-scheme)"
scheme: default
primary: black
toggle:
icon: material/brightness-auto
name: Switch to light mode
# Palette toggle for light mode
- media: "(prefers-color-scheme: light)"
scheme: default
primary: black
toggle:
icon: material/brightness-7
name: Switch to dark mode
# Palette toggle for dark mode
- media: "(prefers-color-scheme: dark)"
scheme: slate
primary: black
toggle:
icon: material/brightness-4
name: Switch to system preference
logo: assets/logo.png
favicon: assets/logo.png
features:
- content.tabs.link
- content.code.annotate
- content.code.copy
- announce.dismiss
- navigation.footer
- navigation.tabs
- navigation.indexes # <= if set, each "section" can have its own page, if index.md is used
- navigation.instant
- navigation.instant.prefetch
# - navigation.instant.preview
- navigation.instant.progress
- navigation.path
- navigation.sections # <=
- navigation.top
- navigation.tracking
- search.suggest
- toc.follow
nav:
- Home:
- "Docling": index.md
- Installation:
- Installation: installation/index.md
- Usage:
- Usage: usage/index.md
- Supported formats: usage/supported_formats.md
- Enrichment features: usage/enrichments.md
- Vision models: usage/vision_models.md
- FAQ:
- FAQ: faq/index.md
- Concepts:
- Concepts: concepts/index.md
- Architecture: concepts/architecture.md
- Docling Document: concepts/docling_document.md
- Serialization: concepts/serialization.md
- Chunking: concepts/chunking.md
- Plugins: concepts/plugins.md
- Examples:
- Examples: examples/index.md
- 🔀 Conversion:
- "Simple conversion": examples/minimal.py
- "Custom conversion": examples/custom_convert.py
- "Batch conversion": examples/batch_convert.py
- "Multi-format conversion": examples/run_with_formats.py
- "VLM pipeline with SmolDocling": examples/minimal_vlm_pipeline.py
- "VLM pipeline with remote model": examples/vlm_pipeline_api_model.py
- "VLM comparison": examples/compare_vlm_models.py
- "Figure export": examples/export_figures.py
- "Table export": examples/export_tables.py
- "Multimodal export": examples/export_multimodal.py
- "Force full page OCR": examples/full_page_ocr.py
- "Automatic OCR language detection with tesseract": examples/tesseract_lang_detection.py
- "RapidOCR with custom OCR models": examples/rapidocr_with_custom_models.py
- "Accelerator options": examples/run_with_accelerator.py
- "Simple translation": examples/translate.py
- examples/backend_csv.ipynb
- examples/backend_xml_rag.ipynb
- ✂️ Serialization & chunking:
- examples/serialization.ipynb
- examples/hybrid_chunking.ipynb
- examples/advanced_chunking_and_serialization.ipynb
- 🤖 RAG with AI dev frameworks:
- examples/rag_haystack.ipynb
- examples/rag_langchain.ipynb
- examples/rag_llamaindex.ipynb
- examples/visual_grounding.ipynb
- 🖼️ Picture annotation:
- "Annotate picture with local VLM": examples/pictures_description.ipynb
- "Annotate picture with remote VLM": examples/pictures_description_api.py
- ✨ Enrichment development:
- "Figure enrichment": examples/develop_picture_enrichment.py
- "Formula enrichment": examples/develop_formula_understanding.py
- 🗂️ More examples:
- examples/rag_milvus.ipynb
- examples/rag_weaviate.ipynb
- RAG with Granite [↗]: https://github.com/ibm-granite-community/granite-snack-cookbook/blob/main/recipes/RAG/Granite_Docling_RAG.ipynb
- examples/rag_azuresearch.ipynb
- examples/retrieval_qdrant.ipynb
- Integrations:
- Integrations: integrations/index.md
- 🤖 Agentic / AI dev frameworks:
- "Bee Agent Framework": integrations/bee.md
- "Crew AI": integrations/crewai.md
- "Haystack": integrations/haystack.md
- "LangChain": integrations/langchain.md
- "LlamaIndex": integrations/llamaindex.md
- "txtai": integrations/txtai.md
- ⭐️ Featured:
- "Apify": integrations/apify.md
- "Data Prep Kit": integrations/data_prep_kit.md
- "InstructLab": integrations/instructlab.md
- "NVIDIA": integrations/nvidia.md
- "Prodigy": integrations/prodigy.md
- "RHEL AI": integrations/rhel_ai.md
- "spaCy": integrations/spacy.md
- 🗂️ More integrations:
- "Cloudera": integrations/cloudera.md
- "DocETL": integrations/docetl.md
- "Kotaemon": integrations/kotaemon.md
- "OpenContracts": integrations/opencontracts.md
- "Vectara": integrations/vectara.md
- Reference:
- Python API:
- Document Converter: reference/document_converter.md
- Pipeline options: reference/pipeline_options.md
- Docling Document: reference/docling_document.md
- CLI:
- CLI reference: reference/cli.md
markdown_extensions:
- pymdownx.superfences
- pymdownx.tabbed:
alternate_style: true
slugify: !!python/object/apply:pymdownx.slugs.slugify
kwds:
case: lower
- admonition
- pymdownx.details
- attr_list
- mkdocs-click
plugins:
- search
- mkdocs-jupyter
- mkdocstrings:
default_handler: python
options:
extensions:
- griffe_pydantic:
schema: true
preload_modules:
- docling
- docling_core
extra_css:
- stylesheets/extra.css