Docling/docling at 4d41db3f7abb86c8c65386bf94e7eb0bf22bb82b

NeoAnd/Docling

Cesar Berrospi Ramis a112d7a035 fix: parse html with omitted body tag (#818)

* fix: parse HTML files without body tag

Parse HTML files without a 'body' tag, since the tag is optional in the HTML5 specification.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* test: ensure docling converts HTML without body tag

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

2025-01-27 16:59:00 +01:00

fix: parse html with omitted body tag (#818)

2025-01-27 16:59:00 +01:00

feat: expose new hybrid chunker, update docs (#384)

2024-12-09 08:28:29 +01:00

feat: add platform info to CLI version printout (#816)

2025-01-27 16:04:57 +01:00

feat(ocr): expose rec_keys_path in RapidOcrOptions to support custom dictionaries (#786)

2025-01-27 13:38:15 +01:00

feat(ocr): expose rec_keys_path in RapidOcrOptions to support custom dictionaries (#786)

2025-01-27 13:38:15 +01:00

feat: New document picture classifier (#805)

2025-01-24 18:05:51 +01:00

feat: Introduce automatic language detection in TesseractOcrCliModel (#800)

2025-01-26 08:07:56 +01:00

__init__.py

Initial commit

2024-07-15 09:42:42 +02:00

document_converter.py

feat: add Docling JSON ingestion (#783)

2025-01-24 18:05:23 +01:00

exceptions.py

fix: improve handling of disallowed formats (#429)

2024-12-03 12:45:32 +01:00

py.typed

fix: Add py.typed marker file (#531)

2024-12-06 13:42:14 +01:00