Docling/docling/utils at 56a0e104f76c5ba30ac0fcd247be61f911b560c1

NeoAnd/Docling

Christoph Auer 7d3302cb48 feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745)

* Keep page.parsed_page.textline_cells and page.cells in sync, including OCR

* Make page.parsed_page the only source of truth for text cells

* Small fix

* Correctly compute PDF boxes from pymupdf

* Use different OCR engine order

* Add type hints and fix mypy

* One more test fix

* Remove with pypdfium2_lock from caller sites

* Fix typing

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

2025-06-13 19:01:55 +02:00

__init__.py

Initial commit

2024-07-15 09:42:42 +02:00

accelerator_utils.py

feat: new vlm-models support (#1570)

2025-06-02 17:01:06 +02:00

api_image_request.py

feat: OllamaVlmModel for Granite Vision 3.2 (#1337)

2025-04-10 18:03:04 +02:00

export.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

glm_utils.py

ci: add coverage and ruff (#1383)

2025-04-14 18:01:26 +02:00

layout_postprocessor.py

feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745)

2025-06-13 19:01:55 +02:00

locks.py

fix: enable locks for threadsafe pdfium (#1052)

2025-03-02 20:06:44 +01:00

model_downloader.py

feat: new vlm-models support (#1570)

2025-06-02 17:01:06 +02:00

ocr_utils.py

feat(ocr): auto-detect rotated pages in Tesseract (#1167)

2025-05-21 18:12:33 +02:00

orientation.py

feat(ocr): auto-detect rotated pages in Tesseract (#1167)

2025-05-21 18:12:33 +02:00

profiling.py

feat: Add pipeline timings and toggle visualization, establish debug settings (#183)

2024-10-30 15:04:19 +01:00

utils.py

fix: usage of hashlib for FIPS (#1512)

2025-05-02 15:03:29 +02:00

visualization.py

feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905)

2025-03-18 10:38:19 +01:00