Docling/docling/backend
Michele Dolfi 57fc28d3d8
refactor: allow the usage of backends in the enrich models and generalize the interface (#742)
* fix get image with cropbox
* allow the usage of backends in the enrich models and generalize the interface
* move logic in BaseTextImageEnrichmentModel
* renaming

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-01-15 09:52:38 +01:00
xml feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
__init__.py Initial commit 2024-07-15 09:42:42 +02:00
abstract_backend.py feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
asciidoc_backend.py feat: Add pipeline timings and toggle visualization, establish debug settings (#183) 2024-10-30 15:04:19 +01:00
docling_parse_backend.py refactor: allow the usage of backends in the enrich models and generalize the interface (#742) 2025-01-15 09:52:38 +01:00
docling_parse_v2_backend.py refactor: allow the usage of backends in the enrich models and generalize the interface (#742) 2025-01-15 09:52:38 +01:00
html_backend.py fix: Let BeautifulSoup detect the HTML encoding (#695) 2025-01-07 15:49:28 +01:00
md_backend.py fix: handling of long sequence of unescaped underscore chars in markdown (#173) 2024-10-28 16:34:48 +01:00
msexcel_backend.py feat: added excel backend (#334) 2024-11-19 12:21:17 +01:00
mspowerpoint_backend.py fix(mspowerpoint): handle invalid images in PowerPoint slides (#650) 2025-01-07 13:58:10 +01:00
msword_backend.py fix: Correcting DefaultText ID for MS Word backend (#537) 2024-12-06 15:48:35 +01:00
pdf_backend.py feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
pypdfium2_backend.py refactor: allow the usage of backends in the enrich models and generalize the interface (#742) 2025-01-15 09:52:38 +01:00