Docling/docs/supported_formats.md
Tobias Strebitzer 00d9405b0a
feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945)
2025-02-14 08:55:09 +01:00


Docling can parse various document formats into a unified representation (Docling Document), which it can then export to several output formats. See Architecture for more details.

Below you can find a listing of all supported input and output formats.

Supported input formats

| Format | Description |
|--------|-------------|
| PDF | |
| DOCX, XLSX, PPTX | Default formats in MS Office 2007+, based on Office Open XML |
| Markdown | |
| AsciiDoc | |
| HTML, XHTML | |
| CSV | |
| PNG, JPEG, TIFF, BMP | Image formats |
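
As a rough illustration of the table above, the following sketch maps common file extensions to the listed format names. The mapping and the helper `guess_input_format` are hypothetical and not part of Docling, which performs its own format detection; the specific extensions chosen here (for example `.adoc` for AsciiDoc) are assumptions.

```python
from pathlib import Path

# Hypothetical extension-to-format mapping based on the table above.
# This is an illustration only; Docling's own detection may differ.
INPUT_FORMATS = {
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".xlsx": "XLSX",
    ".pptx": "PPTX",
    ".md": "Markdown",
    ".adoc": "AsciiDoc",
    ".asciidoc": "AsciiDoc",
    ".html": "HTML",
    ".htm": "HTML",
    ".xhtml": "XHTML",
    ".csv": "CSV",
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".bmp": "BMP",
}


def guess_input_format(path):
    """Return the format name for a path, or None if the extension is unknown."""
    return INPUT_FORMATS.get(Path(path).suffix.lower())
```

A check like this can serve as a quick pre-filter before handing files to a converter. It cannot identify schema-specific inputs, which share generic `.xml` or `.json` extensions.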

Schema-specific support:

| Format | Description |
|--------|-------------|
| USPTO XML | XML format followed by USPTO patents |
| PMC XML | XML format followed by PubMed Central® articles |
| Docling JSON | JSON-serialized Docling Document |

Supported output formats

| Format | Description |
|--------|-------------|
| HTML | Both image embedding and referencing are supported |
| Markdown | |
| JSON | Lossless serialization of Docling Document |
| Text | Plain text, i.e. without Markdown markers |
| Doctags | |
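
The output formats above could be selected by name with a small dispatch helper, sketched below. The helper names `export_document` and `convert_and_export` are hypothetical. The sketch assumes the Docling Document exposes `export_to_markdown()`, `export_to_html()`, `export_to_text()`, and `export_to_dict()`, and that `DocumentConverter().convert(source)` returns a result with a `document` attribute; check these against the Docling version you have installed. Doctags is left out because the name of its export method has differed between versions.

```python
import json

# Assumed export method names on a Docling Document; verify against your version.
EXPORTERS = {
    "markdown": lambda doc: doc.export_to_markdown(),
    "html": lambda doc: doc.export_to_html(),
    "text": lambda doc: doc.export_to_text(),
    # JSON output is produced here by serializing the dictionary export.
    "json": lambda doc: json.dumps(doc.export_to_dict()),
}


def export_document(doc, fmt):
    """Export a document-like object to the named format (case-insensitive)."""
    try:
        exporter = EXPORTERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"Unsupported output format: {fmt!r}") from None
    return exporter(doc)


def convert_and_export(source, fmt="markdown"):
    """Convert a source file with Docling, then export it (requires docling)."""
    # Imported lazily so the helper above is usable without docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return export_document(result.document, fmt)
```

For example, `convert_and_export("table.csv", "html")` would return the HTML export of a CSV input, assuming `table.csv` exists and Docling is installed.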