Docling/docs/examples
Christoph Auer 3c9fe76b70
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
* Skeleton for SmolDocling model and VLM Pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* WIP: SmolDocling inference and VLM pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fixes to preserve page image and demo export to html

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Enabled figure support in vlm_pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fix for table span compute in vlm_pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added tokens/sec measurement, improved example

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added capability for vlm_pipeline to grab text from preconfigured backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Exposed "force_backend_text" as pipeline parameter

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Flipped keep_backend to True for vlm_pipeline assembly to work

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated vlm pipeline assembly and smol docling model code to support updated doctags

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fixed the doctags starting tag, which broke elements on the first line during assembly

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated example of Smol Docling usage

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Update minimal smoldocling example

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix repo id

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleaned up unnecessary logging

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* More elegant solution in removing the input prompt

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* removed minimal_smol_docling example from CI checks

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Removed special html code wrapping when exporting to docling document, cleaned up comments

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Moved keep_backend = True to vlm pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added example on how to get original predicted doctags in minimal_smol_docling

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* removing changes from base_pipeline

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Replaced remaining strings to appropriate enums

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated poetry.lock

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* re-built poetry.lock

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Generalize and refactor VLM pipeline and models

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Rename example

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Move imports

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Expose control over using flash_attention_2

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix VLM example exclusion in CI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add back device_map and accelerate

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Make drawing code resilient against bad bboxes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: clean up code and comments

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: more cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: fix leftover .to(device)

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: add proper table provenance

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-02-26 14:43:26 +01:00
backend_csv.ipynb feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
backend_xml_rag.ipynb feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
batch_convert.py feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
custom_convert.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
develop_formula_understanding.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
develop_picture_enrichment.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
export_figures.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
export_multimodal.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
export_tables.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
full_page_ocr.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
hybrid_chunking.ipynb docs: add LangChain docs (#717) 2025-01-09 14:12:05 +01:00
index.md docs: add architecture outline (#341) 2024-11-15 12:52:41 +01:00
inspect_picture_content.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
minimal_vlm_pipeline.py feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054) 2025-02-26 14:43:26 +01:00
minimal.py chore: various minor docs fixes (#169) 2024-10-22 15:29:36 +02:00
pictures_description_api.py docs: Examples for picture descriptions (#951) 2025-02-13 08:33:12 +01:00
pictures_description.ipynb docs: revamp picture description example (#1015) 2025-02-19 11:28:54 +01:00
rag_azuresearch.ipynb docs: typo (#814) 2025-01-27 11:24:26 +01:00
rag_haystack.ipynb docs: add integrations, revamp docs (#693) 2025-01-07 14:15:54 +01:00
rag_langchain.ipynb docs: add LangChain docs (#717) 2025-01-09 14:12:05 +01:00
rag_llamaindex.ipynb docs: add integrations, revamp docs (#693) 2025-01-07 14:15:54 +01:00
rag_weaviate.ipynb docs: add integrations, revamp docs (#693) 2025-01-07 14:15:54 +01:00
rapidocr_with_custom_models.py docs: Introduce example with custom models for RapidOCR (#874) 2025-02-04 10:07:00 +01:00
retrieval_qdrant.ipynb docs: add integrations, revamp docs (#693) 2025-01-07 14:15:54 +01:00
run_md.py feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
run_with_accelerator.py feat: Support cuda:n GPU device allocation (#694) 2025-02-17 11:31:13 +01:00
run_with_formats.py feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
tesseract_lang_detection.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
translate.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00