Docling/docs/examples
Nikos Livathinos 19fad9261c
feat: Introduce support for GPU Accelerators (#593)
* Upgraded Layout Postprocessing, sending old code back to ERZ

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Implement hierarchical cluster layout processing

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Pass nested cluster processing through full pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Pass nested clusters through GLM as payload

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Move to_docling_document from ds-glm to this repo

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Clean up imports again

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat(Accelerator): Introduce options to control num_threads and device from the API, environment variables, and CLI.
- Introduce AcceleratorOptions and AcceleratorDevice and use them to set the device where the models run.
- Introduce accelerator_utils with a function that decides the device and resolves the AUTO setting.
- Refactor how the docling-ibm-models are called to match the new init signature of the models.
- Translate the accelerator options into the specific inputs for third-party models.
- Extend the docling CLI with parameters to set num_threads and device.
- Add new unit tests.
- Add a new example showing how to use the accelerator options.
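
The idea of resolving an AUTO device setting can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: the `Device` enum and the `resolve_device` function are hypothetical names chosen for illustration, the preference order (CUDA, then MPS, then CPU) is an assumption, and the availability flags are passed in explicitly so the sketch runs without any GPU library installed.

```python
from enum import Enum


class Device(str, Enum):
    """Hypothetical stand-in for the device choices named in the commit message."""

    AUTO = "auto"
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"


def resolve_device(requested: Device, cuda_available: bool, mps_available: bool) -> Device:
    """Return a concrete device for the requested setting.

    AUTO picks the first available accelerator (assumed order: CUDA, MPS, CPU).
    An explicit request for an unavailable accelerator falls back to CPU.
    """
    if requested is Device.AUTO:
        if cuda_available:
            return Device.CUDA
        if mps_available:
            return Device.MPS
        return Device.CPU
    if requested is Device.CUDA and not cuda_available:
        return Device.CPU
    if requested is Device.MPS and not mps_available:
        return Device.CPU
    return requested


if __name__ == "__main__":
    # In a real setup the flags would typically come from the ML framework,
    # e.g. torch.cuda.is_available() and torch.backends.mps.is_available().
    print(resolve_device(Device.AUTO, cuda_available=False, mps_available=True).value)
```

Passing the availability flags as arguments keeps the decision logic easy to unit-test on machines without an accelerator.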

* fix: Improve the pydantic objects in the pipeline_options and imports.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: TableStructureModel: Refactor the artifacts path to use the new structure for fast/accurate model

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Updated test ground-truth

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updated test ground-truth (again), bugfix for empty layout

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Do a proper check before setting the device in EasyOCR and RapidOCR.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Rollback changes from main

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update test gt

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove unused debug settings

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Review fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Nail the accelerator defaults for MPS

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
2024-12-13 17:45:22 +01:00
batch_convert.py | chore: make tests lighter (#228) | 2024-11-04 14:02:28 +01:00
custom_convert.py | feat: Introduce support for GPU Accelerators (#593) | 2024-12-13 17:45:22 +01:00
develop_picture_enrichment.py | feat: added support for exporting DocItem to an image when page image is available (#379) | 2024-11-19 16:28:52 +01:00
export_figures.py | fix: Update tests and examples for docling-core 2.5.1 (#449) | 2024-11-27 13:07:00 +01:00
export_multimodal.py | feat!: Docling v2 (#117) | 2024-10-16 21:02:03 +02:00
export_tables.py | feat!: Docling v2 (#117) | 2024-10-16 21:02:03 +02:00
full_page_ocr.py | feat(ocr): added support for RapidOCR engine (#415) | 2024-11-27 13:57:41 +01:00
hybrid_chunking.ipynb | feat: expose new hybrid chunker, update docs (#384) | 2024-12-09 08:28:29 +01:00
hybrid_rag_qdrant.ipynb | chore: fix Qdrant notebook Colab link (#319) | 2024-11-14 10:42:02 +01:00
index.md | docs: add architecture outline (#341) | 2024-11-15 12:52:41 +01:00
minimal.py | chore: various minor docs fixes (#169) | 2024-10-22 15:29:36 +02:00
rag_langchain.ipynb | feat!: Docling v2 (#117) | 2024-10-16 21:02:03 +02:00
rag_llamaindex.ipynb | docs: update LlamaIndex docs for Docling v2 (#182) | 2024-10-28 14:28:26 +01:00
run_md.py | feat: Support AsciiDoc and Markdown input format (#168) | 2024-10-23 16:14:26 +02:00
run_with_accelerator.py | feat: Introduce support for GPU Accelerators (#593) | 2024-12-13 17:45:22 +01:00
run_with_formats.py | fix: fix duplicate title and heading + add e2e tests for html and docx (#186) | 2024-10-30 13:14:56 +01:00