Docling/docling/backend
Christoph Auer 3960b199d6
feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905)
* Add DoclingParseV3 backend implementation
* Use docling-core with docling-parse types
* Fixes and test updates
* Fix streams
* Fix streams
* Reset tests
* update test cases
* update test units
* Add back DoclingParse v1 backend, pipeline options
* Update locks
* fix: update docling-core to 2.22.0

  Update dependency library docling-core to latest release 2.22.0
  Fix regression tests and ground truth files

* Ground-truth files updated
* Update tests, use TextCell.from_ocr property
* Text fixes, new test data
* Rename docling backend to v4
* Test all backends, fixes
* Reset all tests to use docling-parse v1 for now
* Fixes for DPv4 backend init, better test coverage
* test_input_doc use default backend

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-03-18 10:38:19 +01:00
Name | Last commit | Date
docx | feat: equations to latex in MSWord backend (with inline groups) (#1114) | 2025-03-13 15:12:22 +01:00
json | feat: add Docling JSON ingestion (#783) | 2025-01-24 18:05:23 +01:00
xml | fix: Pass tests, update docling-core to 2.22.0 (#1150) | 2025-03-13 09:45:55 +01:00
__init__.py | Initial commit | 2024-07-15 09:42:42 +02:00
abstract_backend.py | feat: add Docling JSON ingestion (#783) | 2025-01-24 18:05:23 +01:00
asciidoc_backend.py | fix: use first table row as col headers (#1156) | 2025-03-13 15:34:18 +01:00
csv_backend.py | fix: use first table row as col headers (#1156) | 2025-03-13 15:34:18 +01:00
docling_parse_backend.py | feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) | 2025-03-18 10:38:19 +01:00
docling_parse_v2_backend.py | feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) | 2025-03-18 10:38:19 +01:00
docling_parse_v4_backend.py | feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) | 2025-03-18 10:38:19 +01:00
html_backend.py | fix(html): handle nested empty lists (#1154) | 2025-03-13 16:56:58 +01:00
md_backend.py | fix: use first table row as col headers (#1156) | 2025-03-13 15:34:18 +01:00
msexcel_backend.py | fix: use first table row as col headers (#1156) | 2025-03-13 15:34:18 +01:00
mspowerpoint_backend.py | fix: use first table row as col headers (#1156) | 2025-03-13 15:34:18 +01:00
msword_backend.py | fix: use first table row as col headers (#1156) | 2025-03-13 15:34:18 +01:00
pdf_backend.py | feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) | 2025-03-18 10:38:19 +01:00
pypdfium2_backend.py | feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) | 2025-03-18 10:38:19 +01:00