Docling

Files

Guilhem VERMOREL b3d111a3cd fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)

fix wrong type text extracted by tesseract_ocr_cli_model

Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>

2025-03-31 10:53:49 +02:00

factories

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

plugins

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

__init__.py

Initial commit

2024-07-15 09:42:42 +02:00

base_model.py

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

base_ocr_model.py

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

code_formula_model.py

perf: New revision code formula model and document picture classifier (#1140)

2025-03-11 10:15:28 +01:00

document_picture_classifier.py

perf: New revision code formula model and document picture classifier (#1140)

2025-03-11 10:15:28 +01:00

easyocr_model.py

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

hf_mlx_model.py

feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199)

2025-03-19 15:38:54 +01:00

hf_vlm_model.py

feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)

2025-02-26 14:43:26 +01:00

layout_model.py

refactor: use org--name in artifacts-path (#912)

2025-02-07 13:58:05 +01:00

ocr_mac_model.py

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

page_assemble_model.py

feat: Implement new reading-order model (#916)

2025-02-20 17:51:17 +01:00

page_preprocessing_model.py

fix(debug): Missing translation of bbox to to_bounding_box (#1220)

2025-03-25 12:18:10 +01:00

picture_description_api_model.py

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

picture_description_base_model.py

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

picture_description_vlm_model.py

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

rapid_ocr_model.py

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00

readingorder_model.py

feat: Implement new reading-order model (#916)

2025-02-20 17:51:17 +01:00

table_structure_model.py

fix: Fixes tables when using OCR (#1261)

2025-03-29 10:06:00 +01:00

tesseract_ocr_cli_model.py

fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)

2025-03-31 10:53:49 +02:00

tesseract_ocr_model.py

feat: add factory for ocr engines via plugins (#1010)

2025-03-18 13:58:05 +01:00