Commit Graph

  • 00d9405b0a
    feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) Tobias Strebitzer 2025-02-14 15:55:09 +0800
  • 7493d5b01f
    docs: update example Dockerfile with download CLI (#929) Michele Dolfi 2025-02-13 14:19:50 +0100
  • af19c03f6e
    fix: update Pillow constraints (#958) Michele Dolfi 2025-02-13 14:19:37 +0100
  • 2d66e99b69
    docs: Examples for picture descriptions (#951) Michele Dolfi 2025-02-13 08:33:12 +0100
  • 2716c7d4ff
    feat: Introduce the enable_remote_services option to allow remote connections while processing (#941) Michele Dolfi 2025-02-12 15:18:01 +0100
  • 5101e2519e
    feat: allow artifacts_path to be defined as ENV (#940) Michele Dolfi 2025-02-12 13:08:37 +0100
  • c47ae700ec
    fix: Fix the initialization of the TesseractOcrModel (#935) Nikos Livathinos 2025-02-11 12:27:12 +0100
  • de462090e7
    chore: bump version to 2.21.0 [skip ci] github-actions[bot] 2025-02-10 11:43:05 +0000
  • cf78d5b7b9
    feat: Add content_layer property to items to address body, furniture and other roles (#735) Christoph Auer 2025-02-10 12:07:49 +0100
  • 3e26597995
    chore: bump version to 2.20.0 [skip ci] github-actions[bot] 2025-02-07 17:46:36 +0000
  • c18f47c5c0
    fix: remove unused httpx (#919) Michele Dolfi 2025-02-07 17:51:31 +0100
  • 4cc6e3ea5e
    feat: Describe pictures using vision models (#259) Michele Dolfi 2025-02-07 16:30:42 +0100
  • fba3cf9be7
    chore: bump version to 2.19.0 [skip ci] github-actions[bot] 2025-02-07 13:36:54 +0000
  • 02faf5376b
    refactor: use org--name in artifacts-path (#912) Michele Dolfi 2025-02-07 13:58:05 +0100
  • 90b766e2ae
    fix(markdown): handle nested lists (#910) Panos Vagenas 2025-02-07 12:55:12 +0100
  • 9114ada7bc
    fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) Michele Dolfi 2025-02-07 08:43:31 +0100
  • ed74fe2ec0
    feat: new artifacts path and CLI utility (#876) Michele Dolfi 2025-02-06 15:46:32 +0100
  • 722a6eb7b9
    fix(msword_backend): handle conversion error in label parsing (#896) Vladimir Gurevich 2025-02-06 13:30:51 +0200
  • 5ad6de0560
    fix: enrichment models batch size and expose picture classifier (#878) Michele Dolfi 2025-02-05 11:46:01 +0100
  • 17448163e7
    chore: fix docs search (#880) Panos Vagenas 2025-02-04 11:35:34 +0100
  • 6d3fea0196
    docs: Introduce example with custom models for RapidOCR (#874) Nikos Livathinos 2025-02-04 10:07:00 +0100
  • b5da4080c9
    chore: bump version to 2.18.0 [skip ci] github-actions[bot] 2025-02-03 14:58:50 +0000
  • 5ac2887e4a
    fix(markdown): fix parsing if doc ending with table (#873) Panos Vagenas 2025-02-03 14:38:38 +0100
  • a40544a546
    chore: clean up top-level file (#872) Panos Vagenas 2025-02-03 14:10:12 +0100
  • 94751a78f4
    fix(markdown): add support for HTML content (#855) Panos Vagenas 2025-02-03 12:21:05 +0100
  • 6a76b49a47
    feat: Expose equation exports (#869) Michele Dolfi 2025-02-03 10:31:19 +0100
  • 0cd81a8122
    fix(docx): merged table cells not properly converted (#857) Cesar Berrospi Ramis 2025-02-03 10:20:03 +0100
  • eff16b62cc
    fix: Processing of placeholder shapes in pptx that have text but no bbox (#868) Maxim Lysak 2025-02-03 09:33:33 +0100
  • b1cf796730
    fix: KeyError in tableformer prediction (#854) Maxim Lysak 2025-01-31 17:00:14 +0100
  • 70d68b6164
    feat: Add option to define page range (#852) Christoph Auer 2025-01-31 15:23:00 +0100
  • d727b04ad0
    feat(docx): Support of SDTs in docx backend (#853) Maxim Lysak 2025-01-31 14:52:24 +0100
  • 2c037ae62e
    fix: Fixed docx import with headers that are also lists (#842) Maxim Lysak 2025-01-31 10:51:21 +0100
  • 2a1f8afe7e
    fix: use new add_code in html backend and add more typing hints (#850) Michele Dolfi 2025-01-31 09:54:17 +0100
  • 4df085aa6c
    feat: Python 3.13 support (#841) Michele Dolfi 2025-01-30 17:26:42 +0100
  • bccb022fc8
    fix(markdown): fix empty block handling (#843) Panos Vagenas 2025-01-30 16:22:29 +0100
  • fea0a99a95
    fix: Fix for the crash when encountering WMF images in pptx and docx (#837) Maxim Lysak 2025-01-30 14:58:27 +0100
  • d01a2e73ee
    test: update results with new docling-core (#839) Michele Dolfi 2025-01-30 14:07:52 +0100
  • d7c082894e
    docs: updated the readme with upcoming features (#831) Peter W. J. Staar 2025-01-30 09:52:54 +0100
  • f9144f2bb6
    docs: Add example for inspection of picture content (#624) Christoph Auer 2025-01-29 10:39:00 +0100
  • 4d11d87d06
    chore: bump version to 2.17.0 [skip ci] github-actions[bot] 2025-01-28 18:37:26 +0000
  • 5aed9f8aeb
    fix: fix single newline handling in MD backend (#824) Panos Vagenas 2025-01-28 19:05:55 +0100
  • adf6353483
    fix: use file extension if filetype fails with PDF (#827) Cesar Berrospi Ramis 2025-01-28 19:03:54 +0100
  • ba521dd88f
    chore: add missing imports to Office type tests (#826) Panos Vagenas 2025-01-28 16:17:44 +0100
  • 6875913e34
    docs: document Docling JSON parsing (#819) Panos Vagenas 2025-01-28 13:23:30 +0100
  • 5139b48e4e
    docs: Add SSL verification error mitigation (#821) Anastas Stoyanovsky 2025-01-28 01:22:43 -0500
  • 6882e6c38d
    feat(CLI): Expose code and formula models in the CLI (#820) Michele Dolfi 2025-01-28 06:26:03 +0100
  • 4d41db3f7a
    docs(backend XML): do not delete temp file in notebook (#817) Cesar Berrospi Ramis 2025-01-27 18:53:39 +0100
  • a112d7a035
    fix: parse html with omitted body tag (#818) Cesar Berrospi Ramis 2025-01-27 16:59:00 +0100
  • 95b293a723
    feat: add platform info to CLI version printout (#816) Panos Vagenas 2025-01-27 16:04:57 +0100
  • 53327552e8
    feat(ocr): expose rec_keys_path in RapidOcrOptions to support custom dictionaries (#786) Yorick Terweijden 2025-01-27 14:38:15 +0200
  • 9022c6d855
    chore: update deps in lockfile (#815) Michele Dolfi 2025-01-27 12:41:18 +0100
  • 8a4ec77576
    docs: typo (#814) Farzad Sunavala 2025-01-27 04:24:26 -0600
  • b885b2fa3c
    docs: added markdown headings to enable TOC in github pages (#808) Farzad Sunavala 2025-01-27 02:40:35 -0600
  • c2ae1cc4ca
    docs: description of supported formats and backends (#788) Cesar Berrospi Ramis 2025-01-26 08:10:33 +0100
  • 3be2fb581f
    feat: Introduce automatic language detection in TesseractOcrCliModel (#800) Nikos Livathinos 2025-01-26 08:07:56 +0100
  • 9e4ca90db1
    chore: bump version to 2.16.0 [skip ci] github-actions[bot] 2025-01-24 18:21:14 +0000
  • a458e298ca
    fix: added extraction of byte-images in excel (#804) Peter W. J. Staar 2025-01-24 18:48:02 +0100
  • 16a218d871
    feat: New document picture classifier (#805) Matteo 2025-01-24 18:05:51 +0100
  • 88a0e66adc
    feat: add Docling JSON ingestion (#783) Panos Vagenas 2025-01-24 18:05:23 +0100
  • e9768ae6a5
    chore: expose draw_clusters function (#803) Yusik Kim 2025-01-24 17:35:29 +0100
  • 3213b247ad
    feat: Code and equation model for PDF and code blocks in markdown (#752) Matteo 2025-01-24 16:54:22 +0100
  • c58f75d0f7
    docs: fix minor typos (#801) Farzad Sunavala 2025-01-24 09:27:05 -0600
  • 9020a934be
    docs: add Azure RAG example (#675) Farzad Sunavala 2025-01-24 06:56:26 -0600
  • 8543c22687
    feat: add "auto" language for TesseractOcr (#759) Pavel Denisov 2025-01-23 12:40:50 +0100
  • c49b3526fb
    docs: fix links between docs pages (#697) Michele Dolfi 2025-01-20 09:52:59 +0100
  • e4c7210133
    ci: added action to generate llms.txt (#701) Selvam Palanimalai 2025-01-20 03:52:27 -0500
  • 670a08bded
    fix: Update docling-parse-v2 backend version with new parsing fixes (#769) Christoph Auer 2025-01-20 09:00:57 +0100
  • 768608351d
    docs: fix correct Accelerator pipeline options in docs/examples/custom_convert.py (#733) Iacopo Ghinassi 2025-01-19 15:55:26 +0000
  • 57fc28d3d8
    refactor: allow the usage of backends in the enrich models and generalize the interface (#742) Michele Dolfi 2025-01-15 09:52:38 +0100
  • f7e1cbf629
    docs: Example to translate documents (#739) Peter W. J. Staar 2025-01-15 06:51:15 +0100
  • 1976584be1
    chore: bump version to 2.15.1 [skip ci] github-actions[bot] 2025-01-10 10:29:32 +0000
  • 5a060f237d
    fix: Improve OCR results, stricten criteria before dropping bitmap areas (#719) Christoph Auer 2025-01-10 10:38:49 +0100
  • 9a6b5c8c8d
    docs: add pointers to LangChain-side docs (#718) Panos Vagenas 2025-01-09 17:36:46 +0100
  • 4fa8028bd8
    docs: add LangChain docs (#717) Panos Vagenas 2025-01-09 14:12:05 +0100
  • e64b5a2f62
    fix: allow earlier requests versions (#716) Michele Dolfi 2025-01-09 13:30:40 +0100
  • 9a94b54f6c
    chore: bump version to 2.15.0 [skip ci] github-actions[bot] 2025-01-08 12:06:38 +0000
  • 5cb4cf6f19
    fix: Correct scaling of debug visualizations, tune OCR (#700) Christoph Auer 2025-01-08 12:26:44 +0100
  • ead396ab40
    docs: specify docstring types (#702) Michele Dolfi 2025-01-08 09:05:18 +0100
  • 6701f34c85
    docs: add link to rag with granite (#698) Michele Dolfi 2025-01-07 20:01:41 +0100
  • 42856fdf79
    fix: Let BeautifulSoup detect the HTML encoding (#695) Christoph Auer 2025-01-07 15:49:28 +0100
  • 2d24faecd9
    docs: add integrations, revamp docs (#693) Panos Vagenas 2025-01-07 14:15:54 +0100
  • d49650c54f
    fix(mspowerpoint): handle invalid images in PowerPoint slides (#650) Jinfeng Sun 2025-01-07 20:58:10 +0800
  • 0ee849e8bc
    feat: added http header support for document converter and cli (#642) Luke Harrison 2025-01-07 04:15:14 -0500
  • 569038df42
    docs: Add OpenContracts as an integration (#679) JSIV 2025-01-07 04:14:42 -0500
  • 2b591f9872
    docs: add Weaviate RAG recipe notebook (#451) m-newhauser 2024-12-19 14:57:40 -0600
  • fc645ea531
    docs: document Haystack & Vectara support (#628) Panos Vagenas 2024-12-19 13:33:02 +0100
  • 1418fa1488
    chore: bump version to 2.14.0 [skip ci] github-actions[bot] 2024-12-18 07:04:47 +0000
  • fd034802b6
    feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) Lucas Morin 2024-12-17 19:27:09 +0100
  • e31f09f71f
    chore: bump version to 2.13.0 [skip ci] github-actions[bot] 2024-12-17 17:01:04 +0000
  • 60dc852f16
    feat: Updated Layout processing with forms and key-value areas (#530) Christoph Auer 2024-12-17 17:32:24 +0100
  • 00dec7a2f3
    test: generate file from CLI in a temporary directory (#618) Cesar Berrospi Ramis 2024-12-17 16:35:42 +0100
  • 4e087504cc
    feat: create a backend to parse USPTO patents into DoclingDocument (#606) Cesar Berrospi Ramis 2024-12-17 16:35:23 +0100
  • 3e599c7bbe
    docs: add Haystack RAG example (#615) Panos Vagenas 2024-12-17 14:24:40 +0100
  • 3b53bd38c8
    feat: Add Easyocr parameter recog_network (#613) itsainii 2024-12-17 16:47:18 +0800
  • 3bb3bf5715
    docs: Fix the path to the run_with_accelerator.py example (#608) Nikos Livathinos 2024-12-16 15:03:06 +0100
  • a2db5fbd0f
    chore: bump version to 2.12.0 [skip ci] github-actions[bot] 2024-12-13 18:27:00 +0000
  • 19fad9261c
    feat: Introduce support for GPU Accelerators (#593) Nikos Livathinos 2024-12-13 17:45:22 +0100
  • 365a1e7b98
    chore: bump version to 2.11.0 [skip ci] github-actions[bot] 2024-12-12 08:16:05 +0000
  • 3da166eafa
    feat: Add timeout limit to document parsing job. DS4SD#270 (#552) Abhishek Kumar 2024-12-11 19:36:10 +0530
  • aee9c0b324
    fix: Do not import python modules from deepsearch-glm (#569) Christoph Auer 2024-12-11 12:29:06 +0100