Commit Graph

  • 8012a3e4d6
    fix: Treat overflowing -v flags as DEBUG (#1419) Eugene 2025-04-19 13:02:41 +0400
  • 88948b0bba
    docs: Updated the [Usage] link in architecture.md (#1416) Leandro Rosas 2025-04-19 09:20:52 +0100
  • fa7fc9e63d
    fix(codecov): fix codecov argument and yaml file (#1399) Cesar Berrospi Ramis 2025-04-15 18:12:57 +0200
  • 550b1ca2f8
    chore: propagate docling-core fix (#1389) Panos Vagenas 2025-04-15 10:51:47 +0200
  • a7dd59c5cb
    docs(ocr): Add docs entry for OnnxTR OCR plugin (#1382) Felix Dittrich 2025-04-15 09:46:59 +0200
  • 06227e9970
    ci: sign pypi packages (#1392) Michele Dolfi 2025-04-15 08:59:16 +0200
  • 5458a88464
    ci: add coverage and ruff (#1383) Michele Dolfi 2025-04-14 18:01:26 +0200
  • 293c28ca7c
    docs(security): more statements about secure development (#1381) Michele Dolfi 2025-04-14 13:53:26 +0200
  • 01fbfd5652
    docs: Add testing in the docs (#1379) Michele Dolfi 2025-04-14 12:31:48 +0200
  • d9c3999175
    chore: update lock file (#1378) Michele Dolfi 2025-04-14 10:38:10 +0200
  • a026b4e84b
    docs: Add Notes for Installing in Intel macOS (#1377) Juil Park 2025-04-14 17:21:13 +0900
  • c391adb5f0
    chore: bump version to 2.30.0 [skip ci] github-actions[bot] 2025-04-14 08:20:31 +0000
  • 7e40ad3261
    fix(deps): widen typer upper bound (#1375) Michele Dolfi 2025-04-14 09:23:39 +0200
  • c0ba88edf1
    feat(cli): add option for html with split-page mode (#1355) Peter W. J. Staar 2025-04-14 08:41:50 +0200
  • 0de70e7991
    fix: auto-recognize .xlsx, .docx and .pptx files (#1340) Tim Kellogg 2025-04-14 01:45:13 -0400
  • b295da4bfe
    chore: Update repository URL in CITATION.cff (#1363) Simon Leiß 2025-04-14 06:57:04 +0200
  • 415b877984
    fix(docx): declare image_data variable when handling pictures (#1359) Cesar Berrospi Ramis 2025-04-11 13:04:00 +0200
  • 250399948d
    fix: Implement PictureDescriptionApiOptions.bitmap_area_threshold (#1248) Rowan Skewes 2025-04-11 19:14:05 +1000
  • eef2bdea77
    feat(xlsx): create a page for each worksheet in XLSX backend (#1332) Cesar Berrospi Ramis 2025-04-11 10:29:53 +0200
  • c605edd8e9
    feat: OllamaVlmModel for Granite Vision 3.2 (#1337) Gabe Goodhart 2025-04-10 10:03:04 -0600
  • 6b696b504a
    fix: Properly address page in pipeline _assemble_document when page_range is provided (#1334) Joan Fabrégat 2025-04-10 16:11:28 +0200
  • 72ab8e1821
    chore: bump version to 2.29.0 [skip ci] github-actions[bot] 2025-04-10 12:24:09 +0000
  • 355d8dc7a6
    chore: Logo parameter in docling CLI, prints cute ascii logo (#1294) Maxim Lysak 2025-04-09 05:29:48 +0200
  • 14e9c0ce9a
    fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295) Rafael Teixeira de Lima 2025-04-08 17:11:37 +0200
  • 0499cd1c1e
    feat: handle <code> tags as code blocks (#1320) Fernando Santos 2025-04-08 05:32:06 -0300
  • 2e99e5a54f
    docs: add plugins docs (#1319) Michele Dolfi 2025-04-08 09:44:37 +0200
  • 61de30966f
    chore: update lock file (#1315) Michele Dolfi 2025-04-07 17:47:51 +0200
  • dc3bf9ceac
    fix(pptx): check if picture shape has an image attached (#1316) Maxim Lysak 2025-04-07 17:36:56 +0200
  • bfcab3d677
    feat(docx): add text formatting and hyperlink support (#630) Simon Jégou 2025-04-03 15:11:50 +0200
  • 71148eb381
    docs: add visual grounding example (#1270) Panos Vagenas 2025-04-02 14:03:19 +0200
  • d2d68747f9
    fix(docx): Improve text parsing (#1268) Rafael Teixeira de Lima 2025-04-02 12:56:44 +0200
  • b3d111a3cd
    fix: Tesseract OCR CLI can't process images composed with numbers only (#1201) Guilhem VERMOREL 2025-03-31 10:53:49 +0200
  • 44f2b081ec
    chore: bump version to 2.28.4 [skip ci] github-actions[bot] 2025-03-29 11:56:42 +0000
  • 7afad7e52d
    fix: Fixes tables when using OCR (#1261) Maxim Lysak 2025-03-29 10:06:00 +0100
  • 124f921077
    chore: bump version to 2.28.3 [skip ci] github-actions[bot] 2025-03-28 18:30:03 +0000
  • 8bd71e8e33
    fix: Word-level pdf cells for tables (#1238) Maxim Lysak 2025-03-28 16:34:48 +0100
  • 82694b2136
    chore: bump version to 2.28.2 [skip ci] github-actions[bot] 2025-03-26 16:52:06 +0000
  • 9210812bfa
    fix: improve HTML layer detection, various MD fixes (#1241) Panos Vagenas 2025-03-26 16:07:14 +0100
  • 85c4df887b
    fix(html): fix HTML parsed heading level (#1244) Panos Vagenas 2025-03-26 10:30:23 +0100
  • 9eb1686f93
    chore: bump version to 2.28.1 [skip ci] github-actions[bot] 2025-03-25 18:20:23 +0000
  • 38b7108a22
    chore: update locked deps (#1239) Panos Vagenas 2025-03-25 15:48:02 +0100
  • 825b226fab
    fix(converter): Cache same pipeline class with different options (#1152) mislavmartinic 2025-03-26 00:18:44 +1300
  • 6df8827231
    fix(debug): Missing translation of bbox to to_bounding_box (#1220) Hoang-Long Do 2025-03-25 18:18:10 +0700
  • f739d0e4c5
    fix(docx): identifying numbered headers (#1231) Rafael Teixeira de Lima 2025-03-25 11:41:02 +0100
  • 0974ba4e1c
    docs(examples): batch conversion doc raises_on_error (#1147) Clément Doumouro 2025-03-25 11:14:39 +0100
  • 8ebb0bf1a0
    chore: properly clean up apt temporary files in Dockerfile (#1223) Peter Dave Hello 2025-03-25 18:10:09 +0800
  • 7df157204b
    chore: bump version to 2.28.0 [skip ci] github-actions[bot] 2025-03-19 15:18:10 +0000
  • 1c26769785
    feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199) Maxim Lysak 2025-03-19 15:38:54 +0100
  • b454aa1551
    feat: Add PPTX notes slides (#474) Maciej Wieczorek 2025-03-19 14:52:09 +0100
  • f5adfb9724
    fix: Determine correct page size in DoclingParseV4Backend (#1196) Christoph Auer 2025-03-19 11:05:42 +0100
  • d5f7798763
    test(html): fix regression test after docling-core update (#1197) Cesar Berrospi Ramis 2025-03-19 11:03:46 +0100
  • 0b707d0882
    fix(msword): Fixing function return in equations handling (#1194) Rafael Teixeira de Lima 2025-03-19 10:34:25 +0100
  • 1d680b0a32
    docs: Linux Foundation AI & Data (#1183) Michele Dolfi 2025-03-19 09:05:57 +0100
  • 54a78c307d
    docs: move apify to docs (#1182) Michele Dolfi 2025-03-18 16:43:55 +0100
  • 2f72167ff6
    feat: updated vlm pipeline (with latest changes from docling-core) (#1158) Maxim Lysak 2025-03-18 15:44:51 +0100
  • 1a2a9e4eff
    chore: bump version to 2.27.0 [skip ci] github-actions[bot] 2025-03-18 13:37:45 +0000
  • 6eaae3cba0
    feat: add factory for ocr engines via plugins (#1010) Michele Dolfi 2025-03-18 13:58:05 +0100
  • 3960b199d6
    feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) Christoph Auer 2025-03-18 10:38:19 +0100
  • 772487f9c9
    feat(actor): Docling Actor on Apify infrastructure (#875) Václav Vančura 2025-03-18 10:17:44 +0100
  • 7e01798417
    docs: fix spelling of picture in usage (#1165) serced 2025-03-17 09:33:51 +0100
  • fa16b12316
    chore: move to docling-project org (#1160) Michele Dolfi 2025-03-14 12:35:29 +0100
  • f94da44ec5
    fix(html): handle nested empty lists (#1154) Cesar Berrospi Ramis 2025-03-13 16:56:58 +0100
  • 0945973b79
    fix: use first table row as col headers (#1156) Panos Vagenas 2025-03-13 15:34:18 +0100
  • 6eb718f849
    feat: equations to latex in MSWord backend (with inline groups) (#1114) Rafael Teixeira de Lima 2025-03-13 15:12:22 +0100
  • aa92a57fa9
    fix: Pass tests, update docling-core to 2.22.0 (#1150) Cesar Berrospi Ramis 2025-03-13 09:45:55 +0100
  • 17c5bf1242
    chore: bump version to 2.26.0 [skip ci] github-actions[bot] 2025-03-11 11:12:43 +0000
  • eb97357b05
    feat: Use new TableFormer model weights and default to accurate model version (#1100) Christoph Auer 2025-03-11 10:53:49 +0100
  • 5e30381c0d
    perf: New revision code formula model and document picture classifier (#1140) Matteo 2025-03-11 09:15:28 +0000
  • 4d64c4c0b6
    fix(CLI): fix help message for abort options (#1130) Michele Dolfi 2025-03-07 14:47:49 +0100
  • e1c49ad727
    docs: add description of DOCLING_ARTIFACTS_PATH env var (#1124) Michele Dolfi 2025-03-06 07:30:07 +0100
  • a3c957ca6b
    chore: bump version to 2.25.2 [skip ci] github-actions[bot] 2025-03-05 14:51:57 +0000
  • c56ab3a66b
    fix: Proper handling of orphan IDs in layout postprocessing (#1118) Christoph Auer 2025-03-05 14:30:59 +0100
  • 357d41cc47
    docs: Enrichment models (#1097) Michele Dolfi 2025-03-04 14:24:38 +0100
  • b1e79cadc7
    chore: bump version to 2.25.1 [skip ci] github-actions[bot] 2025-03-03 00:56:40 +0000
  • 0c1e9391de
    chore: use gh cache for huggingface models (#1096) Michele Dolfi 2025-03-03 00:13:47 +0100
  • 8dc0562542
    fix: enable locks for threadsafe pdfium (#1052) Michele Dolfi 2025-03-02 20:06:44 +0100
  • e25d557c06
    refactor: add the contentlayer to html-backend (#1040) Peter W. J. Staar 2025-03-02 10:37:53 -0500
  • db3ceefd4a
    docs: improve docs on token limit warning triggered by HybridChunker (#1077) Panos Vagenas 2025-02-28 14:54:46 +0100
  • de7b963b09
    fix(html): use 'start' attribute when parsing ordered lists from HTML docs (#1062) Cesar Berrospi Ramis 2025-02-27 09:46:57 +0100
  • 37dd8c1cc7
    chore: bump version to 2.25.0 [skip ci] github-actions[bot] 2025-02-26 14:16:15 +0000
  • 3c9fe76b70
    feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054) Christoph Auer 2025-02-26 14:43:26 +0100
  • ab683e4fb6
    feat(cli): add option for downloading all models, refine help messages (#1061) Panos Vagenas 2025-02-26 13:27:29 +0100
  • e197225739
    fix: vlm using artifacts path (#1057) Michele Dolfi 2025-02-26 08:33:50 +0100
  • c84b973959
    docs: extend chunking docs, add FAQ on token limit (#1053) Panos Vagenas 2025-02-25 13:07:38 +0100
  • 1b0ead6907
    fix(html): Parse text in div elements as TextItem (#1041) Cesar Berrospi Ramis 2025-02-24 12:38:29 +0100
  • 1d17e7397a
    test: avoid testing exact JSON in CSV backend (#1038) Suehtam 2025-02-24 07:10:40 +0000
  • d8a81c3168
    chore: bump version to 2.24.0 [skip ci] github-actions[bot] 2025-02-20 18:31:20 +0000
  • c93e36988f
    feat: Implement new reading-order model (#916) Christoph Auer 2025-02-20 17:51:17 +0100
  • c031a7ae47
    chore: bump version to 2.23.1 [skip ci] github-actions[bot] 2025-02-20 16:26:41 +0000
  • 1ac010354f
    test: avoid testing exact JSON (#1027) Cesar Berrospi Ramis 2025-02-20 16:20:07 +0100
  • 6796f0a132
    fix: Runtime error when Pandas Series is not always of string type (#1024) fanszoro 2025-02-20 22:41:41 +0800
  • dfcc30dddb
    chore: Update tests and lockfile (#1021) Christoph Auer 2025-02-19 16:51:53 +0100
  • 27c04007bc
    docs: revamp picture description example (#1015) Panos Vagenas 2025-02-19 11:28:54 +0100
  • 7450050ace
    refactor: upgrade BeautifulSoup4 with type hints (#999) Cesar Berrospi Ramis 2025-02-18 11:30:47 +0100
  • 75db61127c
    chore: bump version to 2.23.0 [skip ci] github-actions[bot] 2025-02-17 14:22:49 +0000
  • 6e75f0b5d3
    fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) Maxim Lysak 2025-02-17 14:11:55 +0100
  • 77eb77bdc2
    feat: Support cuda:n GPU device allocation (#694) Ahmed Nassar 2025-02-17 11:31:13 +0100
  • 428b656793
    feat(xml-jats): parse XML JATS documents (#967) Cesar Berrospi Ramis 2025-02-17 10:43:31 +0100
  • e1436a8b05
    test: validate actual docitems in tests (#966) Michele Dolfi 2025-02-14 17:47:53 +0100
  • ffbde1d1b0
    chore: bump version to 2.22.0 [skip ci] github-actions[bot] 2025-02-14 08:53:20 +0000