Docling/docling/backend
Tobias Strebitzer 00d9405b0a
feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945)
* feat: Implement csv backend and format detection

Signed-off-by: Tobias Strebitzer <tobias.strebitzer@magloft.com>

* test: Implement csv parsing and format tests

* docs: Add example and CSV format documentation

* feat: Add support for various CSV dialects and update documentation

* feat: Add validation for delimiters and tests for inconsistent csv files

2025-02-14 08:55:09 +01:00
json feat: add Docling JSON ingestion (#783) 2025-01-24 18:05:23 +01:00
xml docs: description of supported formats and backends (#788) 2025-01-26 08:10:33 +01:00
__init__.py Initial commit 2024-07-15 09:42:42 +02:00
abstract_backend.py feat: add Docling JSON ingestion (#783) 2025-01-24 18:05:23 +01:00
asciidoc_backend.py feat: Code and equation model for PDF and code blocks in markdown (#752) 2025-01-24 16:54:22 +01:00
csv_backend.py feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
docling_parse_backend.py refactor: allow the usage of backends in the enrich models and generalize the interface (#742) 2025-01-15 09:52:38 +01:00
docling_parse_v2_backend.py refactor: allow the usage of backends in the enrich models and generalize the interface (#742) 2025-01-15 09:52:38 +01:00
html_backend.py fix: use new add_code in html backend and add more typing hints (#850) 2025-01-31 09:54:17 +01:00
md_backend.py fix(markdown): handle nested lists (#910) 2025-02-07 12:55:12 +01:00
msexcel_backend.py fix: added extraction of byte-images in excel (#804) 2025-01-24 18:48:02 +01:00
mspowerpoint_backend.py fix: Processing of placeholder shapes in pptx that have text but no bbox (#868) 2025-02-03 09:33:33 +01:00
msword_backend.py fix(msword_backend): handle conversion error in label parsing (#896) 2025-02-06 12:30:51 +01:00
pdf_backend.py feat: Code and equation model for PDF and code blocks in markdown (#752) 2025-01-24 16:54:22 +01:00
pypdfium2_backend.py refactor: allow the usage of backends in the enrich models and generalize the interface (#742) 2025-01-15 09:52:38 +01:00