Docling/docling/backend
Peter W. J. Staar 926dfd29d5
feat: added excel backend (#334)
* feat: added excel backend

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* first msexcel backend

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added tooling for the cli

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* first working version for excel parsing of tables

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added proper typing for mypy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added proper typing for mypy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactor EXCEL to XLSX

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the unit tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran poetry lock

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* adding images to output [WIP]

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the mypy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the msexcel

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the msexcel (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the mypy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added tests for merged cells in excel

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-11-19 12:21:17 +01:00
__init__.py Initial commit 2024-07-15 09:42:42 +02:00
abstract_backend.py feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
asciidoc_backend.py feat: Add pipeline timings and toggle visualization, establish debug settings (#183) 2024-10-30 15:04:19 +01:00
docling_parse_backend.py chore: fix typo (#241) 2024-11-05 16:20:04 +01:00
docling_parse_v2_backend.py chore: fix typo (#241) 2024-11-05 16:20:04 +01:00
html_backend.py Added handling of code blocks in html with <pre> tag (#302) 2024-11-11 15:00:11 +01:00
md_backend.py fix: handling of long sequence of unescaped underscore chars in markdown (#173) 2024-10-28 16:34:48 +01:00
msexcel_backend.py feat: added excel backend (#334) 2024-11-19 12:21:17 +01:00
mspowerpoint_backend.py feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
msword_backend.py fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
pdf_backend.py feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
pypdfium2_backend.py chore: fix typo (#241) 2024-11-05 16:20:04 +01:00
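The commit that introduced `msexcel_backend.py` mentions table parsing and tests for merged cells, but the listing does not show how merged ranges become table structure. Below is a minimal, stdlib-only sketch of one way to do it: each merged block keeps only its top-left (anchor) cell, which receives row and column spans, and every other position the block covers is skipped. `MergedRange`, `TableCell` and `build_cells` are hypothetical names used for illustration only. This is not the backend's actual code. The bounds are 0-based here, whereas spreadsheet libraries such as openpyxl report 1-based rows and columns.

```python
from dataclasses import dataclass


@dataclass
class MergedRange:
    # Hypothetical 0-based, inclusive bounds of one merged block of cells
    min_row: int
    min_col: int
    max_row: int
    max_col: int


@dataclass
class TableCell:
    text: str
    row: int
    col: int
    row_span: int = 1
    col_span: int = 1


def build_cells(grid, merged):
    """Hypothetical helper: turn a 2-D grid plus merged ranges into cells with spans."""
    # The top-left cell of each merged range is the one that carries the span
    anchors = {(m.min_row, m.min_col): m for m in merged}
    # Every other position inside a merged range is absorbed by its anchor
    covered = set()
    for m in merged:
        for r in range(m.min_row, m.max_row + 1):
            for c in range(m.min_col, m.max_col + 1):
                if (r, c) != (m.min_row, m.min_col):
                    covered.add((r, c))
    cells = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if (r, c) in covered:
                continue  # skip positions swallowed by a merge
            m = anchors.get((r, c))
            cells.append(
                TableCell(
                    text="" if value is None else str(value),
                    row=r,
                    col=c,
                    row_span=(m.max_row - m.min_row + 1) if m else 1,
                    col_span=(m.max_col - m.min_col + 1) if m else 1,
                )
            )
    return cells


# Usage: "Region" is merged over two rows, "Sales" over two columns
grid = [
    ["Region", "Sales", None],
    [None, "Q1", "Q2"],
    ["EU", 10, 12],
]
merged = [MergedRange(0, 0, 1, 0), MergedRange(0, 1, 0, 2)]
cells = build_cells(grid, merged)
```

Keeping only the anchor cell avoids emitting empty duplicate cells for the covered positions. This is the usual way HTML-style tables represent merges through `rowspan`/`colspan`.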