Commit Graph

  • f45499ce93
    fix: Handle no result from RapidOcr reader (#558) Christoph Auer 2024-12-10 16:25:05 +0100
  • d0c9e8e508
    docs: update chunking usage docs, minor reorg (#550) Panos Vagenas 2024-12-10 16:03:02 +0100
  • a7df337654
    fix: make enum serializable with human-readable value (#555) Michele Dolfi 2024-12-10 13:12:44 +0100
  • eb30c4f763
    chore: bump version to 2.10.0 [skip ci] github-actions[bot] 2024-12-09 16:28:46 +0000
  • 7972d47f88
    fix: Call into docling-core for legacy document transform (#551) Christoph Auer 2024-12-09 17:06:47 +0100
  • 78f61a8522
    fix: Introduce Image format options in CLI. Silence the tqdm downloading messages. (#544) Nikos Livathinos 2024-12-09 15:57:37 +0100
  • aca57f0527
    feat: docling-parse v2 as default PDF backend (#549) Christoph Auer 2024-12-09 13:26:17 +0100
  • 9fd2cf847a
    chore: bump version to 2.9.0 [skip ci] github-actions[bot] 2024-12-09 09:33:55 +0000
  • c8ecdd987e
    feat: expose new hybrid chunker, update docs (#384) Panos Vagenas 2024-12-09 08:28:29 +0100
  • eb7ffcdd1c
    fix: Correcting DefaultText ID for MS Word backend (#537) Maxim Lysak 2024-12-06 15:48:35 +0100
  • 3e073dfbeb
    feat(MS Word backend): Make detection of headers and other styles localization agnostic (#534) Maxim Lysak 2024-12-06 15:17:56 +0100
  • 53039a8367
    ci: allow ! in conventionalcommits (#533) Michele Dolfi 2024-12-06 14:50:10 +0100
  • 9102fe1adc
    fix: Add py.typed marker file (#531) Sander Maijers 2024-12-06 13:42:14 +0100
  • e780333440
    docs: document new integrations (#532) Panos Vagenas 2024-12-06 13:18:14 +0100
  • 0d11e30dd8
    fix: Enable HTML export in CLI and add options for image mode (#513) Peter W. J. Staar 2024-12-06 12:37:57 +0100
  • b730b2d7a0
    fix: Missing text in docx (t tag) when embedded in a table (#528) Maxim Lysak 2024-12-06 12:37:25 +0100
  • c830b92b2e
    fix: restore pydantic version pin after fixes (#512) Michele Dolfi 2024-12-06 09:33:39 +0100
  • 8ada0bccc7
    fix: folder input in cli (#511) Michele Dolfi 2024-12-04 14:22:00 +0100
  • 9c788ae778
    chore: bump version to 2.8.3 [skip ci] github-actions[bot] 2024-12-03 15:16:47 +0000
  • 34c7c79858
    fix: improve handling of disallowed formats (#429) Christoph Auer 2024-12-03 12:45:32 +0100
  • 2254845da3
    chore: bump version to 2.8.2 [skip ci] github-actions[bot] 2024-12-03 10:47:29 +0000
  • 672962a8b2
    chore: update numpy lock (#500) Michele Dolfi 2024-12-03 11:21:31 +0100
  • c90c41c391
    fix: ParserError EOF inside string (#470) (#472) guglie 2024-12-03 11:21:18 +0100
  • 5ba3807f31
    docs: add styling for faq (#502) Michele Dolfi 2024-12-03 11:20:49 +0100
  • 051789d017
    perf: prevent temp file leftovers, reuse core type (#487) Panos Vagenas 2024-12-03 10:40:28 +0100
  • d3f84b2457
    fix: PermissionError when using tesseract_ocr_cli_model (#496) Gaspard Petit 2024-12-03 04:22:03 -0500
  • 33cff98d36
    docs: typo in faq (#484) Álvaro Huertas 2024-12-02 10:35:24 +0100
  • d4872103b8
    docs: add automatic api reference (#475) Michele Dolfi 2024-12-02 09:55:52 +0100
  • 8ccb3c6db6
    docs: introduce faq section (#468) Michele Dolfi 2024-11-29 22:34:56 +0100
  • cc46c938b6
    chore: bump version to 2.8.1 [skip ci] github-actions[bot] 2024-11-29 13:04:48 +0000
  • dd8de46267
    fix(cli): expose debug options (#467) Michele Dolfi 2024-11-29 13:25:58 +0100
  • af63818df5
    fix: remove unused deps (#466) Michele Dolfi 2024-11-29 13:18:06 +0100
  • 84c46fdeb3
    docs: extend integration docs & README (#456) Panos Vagenas 2024-11-28 09:41:21 +0100
  • 211f4f7570
    chore: bump version to 2.8.0 [skip ci] github-actions[bot] 2024-11-27 13:29:32 +0000
  • 85b29990be
    feat(ocr): added support for RapidOCR engine (#415) Swaymaw 2024-11-27 18:27:41 +0530
  • 767563bf8b
    fix: use correct image index in word backend (#442) Manuel030 2024-11-27 13:45:07 +0100
  • 29807a2d68
    fix: Update tests and examples for docling-core 2.5.1 (#449) Christoph Auer 2024-11-27 13:07:00 +0100
  • 6666d9ec07
    chore: bump version to 2.7.1 [skip ci] github-actions[bot] 2024-11-26 15:01:33 +0000
  • d0a1180478
    fix: Fixes for wordx (#432) Maxim Lysak 2024-11-26 14:44:43 +0100
  • d7072b4b56
    fix: force pydantic < 2.10.0 (#407) Michele Dolfi 2024-11-22 08:23:11 +0100
  • 2a1d3fd221
    chore: update the README (#409) Peter W. J. Staar 2024-11-21 17:28:53 +0100
  • 7a45b92078
    docs: add DocETL, Kotaemon, spaCy integrations; minor docs improvements (#408) Panos Vagenas 2024-11-21 17:23:04 +0100
  • 97d571af97
    chore: add downloads in README, security policy and update ci actions (#401) Michele Dolfi 2024-11-21 13:59:45 +0100
  • eb64f6d368
    chore: bump version to 2.7.0 [skip ci] github-actions[bot] 2024-11-20 15:36:51 +0000
  • 7b013abcf3
    fix: python3.9 support (#396) Michele Dolfi 2024-11-20 15:21:40 +0100
  • 6efa96c983
    feat: add support for ocrmac OCR engine on macOS (#276) nuridol 2024-11-20 20:51:19 +0900
  • 32ebf55e33
    fix: propagate document limits to converter (#388) Michele Dolfi 2024-11-20 08:36:51 +0100
  • 2cfaceb787
    chore: bump version to 2.6.0 [skip ci] github-actions[bot] 2024-11-19 16:07:34 +0000
  • 3f91e7d3f1
    feat: added support for exporting DocItem to an image when page image is available (#379) Shubham Gupta 2024-11-19 16:28:52 +0100
  • 911c3bda27
    docs: fixed typo in v2 example v2 (#378) Gaspard Petit 2024-11-19 10:27:19 -0500
  • ed785ea122
    feat: expose ocr-lang in CLI (#375) Michele Dolfi 2024-11-19 15:58:49 +0100
  • 926dfd29d5
    feat: added excel backend (#334) Peter W. J. Staar 2024-11-19 12:21:17 +0100
  • e6f89d520f
    chore: update lock of deps (#371) Michele Dolfi 2024-11-19 10:23:59 +0100
  • 7a97d7119f
    feat: Extracting picture data for raster images found in PPTX (#349) Maxim Lysak 2024-11-18 15:22:28 +0100
  • 7dbdbdeaf3
    ci: fix mergify (#350) Michele Dolfi 2024-11-15 17:13:01 +0100
  • 364d37ca96
    ci(Mergify): configuration update (#339) Michele Dolfi 2024-11-15 13:18:33 +0100
  • ca8524ecae
    docs: add automatic generation of CLI reference (#325) Michele Dolfi 2024-11-15 13:18:17 +0100
  • 25fd149c38
    docs: add architecture outline (#341) Panos Vagenas 2024-11-15 12:52:41 +0100
  • 835e077b02
    docs: fix parameter in usage.md (#332) Carl 2024-11-15 09:24:15 +0100
  • 8533039b0c
    fix: Fixing images in the input Word files (#330) Maxim Lysak 2024-11-14 13:33:34 +0100
  • bf2a85f1d4
    chore: fix Qdrant notebook Colab link (#319) Panos Vagenas 2024-11-14 10:42:02 +0100
  • 8b437adcde
    fix: reduce logging by keeping option for more verbose (#323) Michele Dolfi 2024-11-13 10:08:24 +0100
  • 5a44236ac2
    chore: bump version to 2.5.2 [skip ci] github-actions[bot] 2024-11-13 08:19:09 +0000
  • c9341bf22e
    fix: skip glm model downloads (#322) Michele Dolfi 2024-11-13 08:45:28 +0100
  • 2c0c439a44
    chore: bump version to 2.5.1 [skip ci] github-actions[bot] 2024-11-12 14:56:34 +0000
  • fb8ba861e2
    fix: Handling of single-cell tables in DOCX backend (#314) Maxim Lysak 2024-11-12 15:20:55 +0100
  • 7f5d35ea3c
    docs: Hybrid RAG with Qdrant (#312) Anush 2024-11-12 19:48:14 +0530
  • 93fc1be61a
    docs: add Data Prep Kit integration (#316) Panos Vagenas 2024-11-12 12:21:48 +0100
  • 777237ebc9
    chore: bump version to 2.5.0 [skip ci] github-actions[bot] 2024-11-12 10:19:55 +0000
  • 5d4a10b121
    fix: Configure env prefix for docling settings (#315) Christoph Auer 2024-11-12 10:57:16 +0100
  • c6b3763ecb
    feat(OCR): Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning (#290) Nikos Livathinos 2024-11-12 09:46:14 +0100
  • 81c8243a8b
    fix: Added handling of grouped elements in pptx backend (#307) Maxim Lysak 2024-11-11 16:38:21 +0100
  • 53bf2d1790
    Added handling of code blocks in html with <pre> tag (#302) Maxim Lysak 2024-11-11 15:00:11 +0100
  • 1239ade275
    docs: add navigation indices (#305) Panos Vagenas 2024-11-11 14:49:06 +0100
  • 97f214efdd
    fix: allow mps usage for easyocr (#286) Michele Dolfi 2024-11-10 14:26:17 +0100
  • be8aa17291
    chore: bump version to 2.4.2 [skip ci] github-actions[bot] 2024-11-08 16:31:47 +0000
  • 0eb065e9b6
    fix(EasyOcrModel): Support the use_gpu pipeline parameter in EasyOcrModel. Initialize easyocr (#282) Nikos Livathinos 2024-11-08 16:48:41 +0100
  • 118f162e64
    chore: bump version to 2.4.1 [skip ci] github-actions[bot] 2024-11-08 12:37:36 +0000
  • 704d792a79
    fix(tesserocr): Raise Exception if tesserocr has not loaded any languages (#279) Nikos Livathinos 2024-11-08 13:03:09 +0100
  • 6c22cba0a7
    chore: add issue templates (#251) Panos Vagenas 2024-11-05 23:18:20 +0100
  • c3098e3c12
    chore: fix typo (#241) Ikko Eltociear Ashimine 2024-11-06 00:20:04 +0900
  • a84ec276b0
    docs: update badges & credits (#248) Panos Vagenas 2024-11-05 13:57:06 +0100
  • 90836db90a
    fix: Dockerfile example copy command (#234) Anthony R 2024-11-05 12:48:27 +0100
  • 5ce02c5c59
    docs: add coming-soon section (#235) Panos Vagenas 2024-11-05 08:53:02 +0100
  • d5e65aedac
    docs: add artifacts-path param to CLI (#233) Panos Vagenas 2024-11-05 08:51:21 +0100
  • e30a9c25a2
    chore: bump version to 2.4.0 [skip ci] github-actions[bot] 2024-11-04 15:11:09 +0000
  • 862d78d271
    chore: update pyproject.toml metadata (#229) Panos Vagenas 2024-11-04 15:48:00 +0100
  • eeee3b4371
    docs: add explicit artifacts path example (#224) Panos Vagenas 2024-11-04 14:27:56 +0100
  • 5f5fea90a9
    docs: update custom convert and dockerfile (#226) Michele Dolfi 2024-11-04 14:27:40 +0100
  • 41acaa9e2e
    docs: correct spelling of 'individual' (#219) Vicky Sekhon 2024-11-04 08:27:02 -0500
  • 40ad987303
    feat: pdf backend, table mode as options and artifacts path (#203) Michele Dolfi 2024-11-04 14:26:05 +0100
  • af323c04ef
    fit: Specify encoding when writing output file (#214) Johnny Salazar 2024-11-04 20:24:13 +0700
  • 8fb445f46c
    chore: make tests lighter (#228) Panos Vagenas 2024-11-04 14:02:28 +0100
  • 244ca69cfd
    docs: update LlamaIndex docs (#196) Panos Vagenas 2024-11-01 20:55:28 +0100
  • 9d8865856d
    chore: bump version to 2.3.1 [skip ci] github-actions[bot] 2024-10-30 18:23:53 +0000
  • eb679ccbb4
    fix: simplify torch dependencies and update pinned docling deps (#190) Michele Dolfi 2024-10-30 18:44:08 +0100
  • 904d24d600
    fix: allow to explicitly initialize the pipeline (#189) Michele Dolfi 2024-10-30 17:54:53 +0100
  • 43349865d0
    chore: bump version to 2.3.0 [skip ci] github-actions[bot] 2024-10-30 14:47:37 +0000
  • 2a2c65bf4f
    feat: Add pipeline timings and toggle visualization, establish debug settings (#183) Christoph Auer 2024-10-30 15:04:19 +0100
  • 94a5290789
    chore: update the with input formats and DoclingDocument (#188) Peter W. J. Staar 2024-10-30 15:02:28 +0100