Docling/tests/data
Peter W. J. Staar 926dfd29d5
feat: added excel backend (#334)
* feat: added excel backend
* first msexcel backend
* added tooling for the cli
* first working version for excel parsing of tables
* added proper typing for mypy
* added proper typing for mypy
* refactor EXCEL to XLSX
* added the unit tests
* ran poetry lock
* adding images to output [WIP]
* reformatted the code
* fixed the mypy
* updated the msexcel
* updated the msexcel (2)
* fixed the mypy
* added tests for merged cells in excel
* reformatted the code

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-11-19 12:21:17 +01:00
docx fix: Handling of single-cell tables in DOCX backend (#314) 2024-11-12 15:20:55 +01:00
groundtruth feat: added excel backend (#334) 2024-11-19 12:21:17 +01:00
html fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
pptx feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
xlsx feat: added excel backend (#334) 2024-11-19 12:21:17 +01:00
2203.01017v2.pdf fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
2206.01062.pdf fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
2305.03393v1-pg9-img.png feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2305.03393v1-pg9.pdf fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
2305.03393v1.pdf fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
redp5110_sampled.pdf chore: make tests lighter (#228) 2024-11-04 14:02:28 +01:00
test_01.asciidoc feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_02.asciidoc feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00