
Docling can parse various document formats into a unified representation (Docling Document), which it can also export to different formats. Check out [Architecture](./concepts/architecture.md) for more details.

Below you can find a listing of all supported input and output formats.

## Supported input formats

| Format | Description |
|--------|-------------|
| PDF | |
| DOCX, XLSX, PPTX | Default formats in MS Office 2007+, based on Office Open XML |
| Markdown | |
| AsciiDoc | |
| HTML, XHTML | |
| CSV | |
| PNG, JPEG, TIFF, BMP | Image formats |

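As a quick pre-flight check, one might map file extensions onto the input formats listed above before handing files to Docling. The sketch below is a minimal illustration: the `EXTENSIONS` map and the `guess_format` helper are hypothetical names introduced here, and the extension choices are assumptions rather than Docling's own format-detection rules.

```python
from pathlib import Path

# Hypothetical extension map for the formats listed above. This is an
# assumption for illustration; Docling's own detection may use other rules.
EXTENSIONS = {
    "PDF": {".pdf"},
    "DOCX": {".docx"},
    "XLSX": {".xlsx"},
    "PPTX": {".pptx"},
    "Markdown": {".md"},
    "AsciiDoc": {".adoc", ".asciidoc"},
    "HTML": {".html", ".htm", ".xhtml"},
    "CSV": {".csv"},
    "Image": {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"},
}


def guess_format(path):
    """Return the listed format name for `path`, or None if unrecognized."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    for name, suffixes in EXTENSIONS.items():
        if suffix in suffixes:
            return name
    return None
```

A caller could then skip or log files for which `guess_format` returns `None` instead of attempting a conversion.
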
Schema-specific support:

| Format | Description |
|--------|-------------|
| USPTO XML | XML format followed by [USPTO](https://www.uspto.gov/patents) patents |
| PMC XML | XML format followed by [PubMed Central®](https://pmc.ncbi.nlm.nih.gov/) articles |
| Docling JSON | JSON-serialized [Docling Document](./concepts/docling_document.md) |

## Supported output formats

| Format | Description |
|--------|-------------|
| HTML | Both image embedding and referencing are supported |
| Markdown | |
| JSON | Lossless serialization of Docling Document |
| Text | Plain text, i.e. without Markdown markers |
| Doctags | |
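
To illustrate how several of these output formats might be produced from one converted document, here is a minimal sketch. `write_exports` is a hypothetical helper introduced for this example, and the `export_to_markdown`, `export_to_html`, `export_to_dict`, and `export_to_text` method names are assumed to match the Docling Document API of your installed version, so check them before relying on this. It covers four of the listed formats.

```python
import json
from pathlib import Path


def write_exports(doc, out_dir, stem):
    """Write Markdown, HTML, JSON, and plain-text exports of `doc`.

    `doc` is assumed to be a Docling Document, for example:
        from docling.document_converter import DocumentConverter
        doc = DocumentConverter().convert("report.pdf").document
    ("report.pdf" is a placeholder path.)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Output format name -> (file suffix, rendered content).
    exports = {
        "markdown": (".md", doc.export_to_markdown()),
        "html": (".html", doc.export_to_html()),
        "json": (".json", json.dumps(doc.export_to_dict(), indent=2)),
        "text": (".txt", doc.export_to_text()),
    }

    written = {}
    for name, (suffix, content) in exports.items():
        path = out / f"{stem}{suffix}"
        path.write_text(content, encoding="utf-8")
        written[name] = path
    return written
```

The helper only depends on the export methods of the object it receives, so it can be exercised with a stand-in object in tests without running a full conversion.
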