Docling/docling/utils
Christoph Auer 3960b199d6
feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905)
* Add DoclingParseV3 backend implementation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Use docling-core with docling-parse types

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes and test updates

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix streams

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix streams

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Reset tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* update test cases

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* update test units

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add back DoclingParse v1 backend, pipeline options

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update locks

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: update docling-core to 2.22.0

Update dependency library docling-core to latest release 2.22.0
Fix regression tests and ground truth files

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* Ground-truth files updated

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update tests, use TextCell.from_ocr property

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Text fixes, new test data

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Rename docling backend to v4

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Test all backends, fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Reset all tests to use docling-parse v1 for now

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for DPv4 backend init, better test coverage

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* test_input_doc use default backend

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-03-18 10:38:19 +01:00
..
__init__.py Initial commit 2024-07-15 09:42:42 +02:00
accelerator_utils.py feat: Support cuda:n GPU device allocation (#694) 2025-02-17 11:31:13 +01:00
export.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
glm_utils.py feat: Add content_layer property to items to address body, furniture and other roles (#735) 2025-02-10 12:07:49 +01:00
layout_postprocessor.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
locks.py fix: enable locks for threadsafe pdfium (#1052) 2025-03-02 20:06:44 +01:00
model_downloader.py fix: vlm using artifacts path (#1057) 2025-02-26 08:33:50 +01:00
ocr_utils.py feat: Introduce automatic language detection in TesseractOcrCliModel (#800) 2025-01-26 08:07:56 +01:00
profiling.py feat: Add pipeline timings and toggle visualization, establish debug settings (#183) 2024-10-30 15:04:19 +01:00
utils.py feat: new artifacts path and CLI utility (#876) 2025-02-06 15:46:32 +01:00
visualization.py feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00