data
fix: parse html with omitted body tag ( #818 )
2025-01-27 16:59:00 +01:00
data_scanned
feat: Updated Layout processing with forms and key-value areas ( #530 )
2024-12-17 17:32:24 +01:00
__init__.py
fix: Add unit tests ( #51 )
2024-08-30 14:08:20 +02:00
test_backend_asciidoc.py
feat: Add pipeline timings and toggle visualization, establish debug settings ( #183 )
2024-10-30 15:04:19 +01:00
test_backend_docling_json.py
feat: add Docling JSON ingestion ( #783 )
2025-01-24 18:05:23 +01:00
test_backend_docling_parse_v2.py
chore: make tests lighter ( #228 )
2024-11-04 14:02:28 +01:00
test_backend_docling_parse.py
chore: make tests lighter ( #228 )
2024-11-04 14:02:28 +01:00
test_backend_html.py
fix: parse html with omitted body tag ( #818 )
2025-01-27 16:59:00 +01:00
test_backend_msexcel.py
docs: description of supported formats and backends ( #788 )
2025-01-26 08:10:33 +01:00
test_backend_msword.py
fix: fix duplicate title and heading + add e2e tests for html and docx ( #186 )
2024-10-30 13:14:56 +01:00
test_backend_patent_uspto.py
docs: description of supported formats and backends ( #788 )
2025-01-26 08:10:33 +01:00
test_backend_pdfium.py
chore: make tests lighter ( #228 )
2024-11-04 14:02:28 +01:00
test_backend_pptx.py
feat: Extracting picture data for raster images found in PPTX ( #349 )
2024-11-18 15:22:28 +01:00
test_backend_pubmed.py
docs: description of supported formats and backends ( #788 )
2025-01-26 08:10:33 +01:00
test_cli.py
test: generate file from CLI in a temporary directory ( #618 )
2024-12-17 16:35:42 +01:00
test_code_formula.py
feat: Code and equation model for PDF and code blocks in markdown ( #752 )
2025-01-24 16:54:22 +01:00
test_document_picture_classifier.py
feat: New document picture classifier ( #805 )
2025-01-24 18:05:51 +01:00
test_e2e_conversion.py
feat: Add pipeline timings and toggle visualization, establish debug settings ( #183 )
2024-10-30 15:04:19 +01:00
test_e2e_ocr_conversion.py
feat: Introduce automatic language detection in TesseractOcrCliModel ( #800 )
2025-01-26 08:07:56 +01:00
test_input_doc.py
feat: add Docling JSON ingestion ( #783 )
2025-01-24 18:05:23 +01:00
test_interfaces.py
fix: improve handling of disallowed formats ( #429 )
2024-12-03 12:45:32 +01:00
test_invalid_input.py
fix: improve handling of disallowed formats ( #429 )
2024-12-03 12:45:32 +01:00
test_legacy_format_transform.py
fix: fix duplicate title and heading + add e2e tests for html and docx ( #186 )
2024-10-30 13:14:56 +01:00
test_options.py
feat: Introduce support for GPU Accelerators ( #593 )
2024-12-13 17:45:22 +01:00
verify_utils.py
feat(OCR): Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning ( #290 )
2024-11-12 09:46:14 +01:00