Commit Graph

  • f542460af3 fix: fix duplicate title and heading + add e2e tests for html and docx (#186) Peter W. J. Staar 2024-10-30 13:14:56 +01:00
  • dda2645d4c chore: bump version to 2.2.1 [skip ci] github-actions[bot] 2024-10-28 17:18:41 +00:00
  • b9f5c74a7d fix: fix header levels for DOCX & HTML (#184) Panos Vagenas 2024-10-28 17:02:52 +01:00
  • 94d0729c50 fix: handling of long sequence of unescaped underscore chars in markdown (#173) Maxim Lysak 2024-10-28 16:34:48 +01:00
  • 2cece27208 docs: update LlamaIndex docs for Docling v2 (#182) Panos Vagenas 2024-10-28 14:28:26 +01:00
  • 189d3c2d44 docs: fix batch convert (#177) Michele Dolfi 2024-10-26 05:50:34 +02:00
  • 7d19418b77 fix: HTML backend, fixes for Lists and nested texts (#180) Maxim Lysak 2024-10-25 20:14:04 +02:00
  • 88c1673057 fix: MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) Maxim Lysak 2024-10-25 18:02:20 +02:00
  • 77a89c3334 chore: make auto-release on request (#179) Michele Dolfi 2024-10-25 10:47:25 +02:00
  • 8d356aa247 docs: add export with embedded images (#175) Michele Dolfi 2024-10-24 20:19:41 +02:00
  • 8208c93e3a chore: bump version to 2.2.0 [skip ci] github-actions[bot] 2024-10-23 16:04:55 +00:00
  • 4116819b51 feat: Update to docling-parse v2 without history (#170) Peter W. J. Staar 2024-10-23 17:20:11 +02:00
  • 3023f18ba0 feat: Support AsciiDoc and Markdown input format (#168) Christoph Auer 2024-10-23 16:14:26 +02:00
  • 3496b4838f fix: set valid=false for invalid backends (#171) Michele Dolfi 2024-10-23 15:52:30 +02:00
  • b8d2286dd1 chore: various minor docs fixes (#169) Panos Vagenas 2024-10-22 15:29:36 +02:00
  • fa5f94ec10 Fix Typo errors in CONTRIBUTING.md file (#164) Mohamed Ali 2024-10-22 10:31:48 +05:30
  • d5460e2d1f chore: bump version to 2.1.0 [skip ci] github-actions[bot] 2024-10-18 13:21:15 +00:00
  • b346faf622 feat: add coverage_threshold to skip OCR for small images (#161) Michele Dolfi 2024-10-18 13:58:23 +02:00
  • f799e777c1 docs: typo fix (#155) ABHISHEK FADAKE 2024-10-18 17:26:48 +05:30
  • 63bef59d9e fix: fix legacy doc ref (#162) Panos Vagenas 2024-10-18 13:11:20 +02:00
  • bb7a58d45d ci: run ci also on forks (#160) Michele Dolfi 2024-10-18 12:32:27 +02:00
  • a00c937e19 Ensure all models work only on valid pages (#158) Christoph Auer 2024-10-18 08:54:06 +02:00
  • 034a411057 docs: add graphical band in readme (#154) Maxim Lysak 2024-10-17 18:15:40 +02:00
  • 61c092f445 docs: add use docling (#150) Michele Dolfi 2024-10-17 18:14:48 +02:00
  • 24f949ada2 chore: run apt-get update before install (#156) Michele Dolfi 2024-10-17 17:27:16 +02:00
  • a29c256041 chore: bump version to 2.0.0 [skip ci] github-actions[bot] 2024-10-16 19:48:06 +00:00
  • 7d3be0edeb feat!: Docling v2 (#117) Christoph Auer 2024-10-16 21:02:03 +02:00
  • d504432c1e docs: introduce docs site (#141) Panos Vagenas 2024-10-14 14:13:13 +02:00
  • 2b1e72d327 refactor: fix type of tesseractocr options (#140) Michele Dolfi 2024-10-14 08:40:22 +02:00
  • 4672b24c1a chore: bump version to 1.20.0 [skip ci] github-actions[bot] 2024-10-11 13:48:02 +00:00
  • 5e4944f15f feat: new experimental docling-parse v2 backend (#131) Christoph Auer 2024-10-11 15:12:49 +02:00
  • 2ec39636f0 chore: bump version to 1.19.1 [skip ci] github-actions[bot] 2024-10-11 08:52:09 +00:00
  • dae2a3b667 fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138) Nikos Livathinos 2024-10-11 10:21:19 +02:00
  • 5f1bd9e9c8 docs: simplify LlamaIndex example using Docling extension (#135) Panos Vagenas 2024-10-09 22:17:56 +02:00
  • 6924999f1f chore: explicitly manage pandas dependency (#134) Panos Vagenas 2024-10-09 14:50:39 +02:00
  • 0ffc1708d2 chore: bump version to 1.19.0 [skip ci] github-actions[bot] 2024-10-08 17:42:29 +00:00
  • f96ea86a00 feat: add options for choosing OCR engines (#118) Michele Dolfi 2024-10-08 19:07:08 +02:00
  • d412c363d7 fixed unload pdf backend resources (#129) Fasal Shah 2024-10-08 14:16:43 +05:30
  • 9b82ae3324 chore: bump version to 1.18.0 [skip ci] github-actions[bot] 2024-10-03 17:16:00 +00:00
  • 2422f706a1 feat: new torch-based docling models (#120) Maxim Lysak 2024-10-03 18:42:33 +02:00
  • 9ebbbc1245 chore: bump version to 1.17.0 [skip ci] github-actions[bot] 2024-10-03 13:44:52 +00:00
  • dde0aff8bd update examples (#123) Rui Dias Gomes 2024-10-03 13:28:25 +01:00
  • d44c62d7ce feat: windows support (#122) Michele Dolfi 2024-10-03 14:23:47 +02:00
  • cde671cf34 chore: bump version to 1.16.1 [skip ci] github-actions[bot] 2024-09-27 14:36:40 +00:00
  • 34bd887a7f fix: allow usage of opencv 4.6.x (#110) Michele Dolfi 2024-09-27 15:51:43 +02:00
  • c05b692d69 docs: document chunking (#111) Panos Vagenas 2024-09-27 11:16:04 +02:00
  • 6760571fe1 chore: bump version to 1.16.0 [skip ci] github-actions[bot] 2024-09-27 06:21:15 +00:00
  • d6df76f90b feat: Support tableformer model choice (#90) Christoph Auer 2024-09-26 21:37:08 +02:00
  • 39977b5631 chore: move examples extras to respective group (#103) Panos Vagenas 2024-09-25 15:47:48 +02:00
  • 3dfd02a7e9 chore: bump version to 1.15.0 [skip ci] github-actions[bot] 2024-09-24 15:58:16 +00:00
  • 6a03c208ec feat: add figure in markdown (#98) Michele Dolfi 2024-09-24 17:28:23 +02:00
  • 001d214a13 chore: bump version to 1.14.0 [skip ci] github-actions[bot] 2024-09-24 13:38:23 +00:00
  • d96b96c848 fix: fix OCR setting for pypdfium, minor refactor (#102) Panos Vagenas 2024-09-24 14:36:00 +02:00
  • f8f2303348 docs: document CLI, minor README revamp (#100) Panos Vagenas 2024-09-24 09:21:28 +02:00
  • f555815343 chore: add RAG notebook titles (#101) Panos Vagenas 2024-09-24 09:17:46 +02:00
  • 3c46e4266c feat: add URL support to CLI (#99) Panos Vagenas 2024-09-24 08:47:53 +02:00
  • c65a01c9b7 chore: bump version to 1.13.1 [skip ci] github-actions[bot] 2024-09-23 19:04:01 +00:00
  • 4794ce460a fix: updated the render_as_doctags with the new arguments from docling-core (#93) Peter W. J. Staar 2024-09-23 20:12:18 +02:00
  • dce9934a0f Updated to new, clean vector logo, svg and rendered png are provided (#96) Maxim Lysak 2024-09-23 15:31:21 +02:00
  • 1f4b224ab6 chore: switch to gh apps user (#92) Michele Dolfi 2024-09-20 17:02:27 +02:00
  • 6dd1e91c4a chore: bump version to 1.13.0 [skip ci] github-actions[bot] 2024-09-18 09:26:03 +00:00
  • 0da7519896 docs: updated Docling logo.png with transparent background (#88) Maxim Lysak 2024-09-18 10:39:11 +02:00
  • f19bd43798 feat: add table exports (#86) Michele Dolfi 2024-09-18 08:44:13 +02:00
  • 442443a102 fix: bumped the glm version and adjusted the tests (#83) Peter W. J. Staar 2024-09-18 07:43:49 +02:00
  • 8242bce4fa chore: bump version to 1.12.2 [skip ci] github-actions[bot] 2024-09-17 16:01:34 +00:00
  • fa9699fa3c fix(tests): Adjust the test data to match the new version of LayoutPredictor (#82) Nikos Livathinos 2024-09-17 15:50:35 +02:00
  • 30a0ef69b4 chore: Add PR template (#81) Michele Dolfi 2024-09-16 18:36:26 +02:00
  • f1932fd8c5 chore: bump version to 1.12.1 [skip ci] github-actions[bot] 2024-09-16 10:58:09 +00:00
  • 2870fdc857 fix: CLI compatibility with python 3.10 and 3.11 (#79) Michele Dolfi 2024-09-16 12:32:45 +02:00
  • 34b2772a2e chore: bump version to 1.12.0 [skip ci] github-actions[bot] 2024-09-13 12:34:15 +00:00
  • 98990784df feat: add docling cli (#75) Peter W. J. Staar 2024-09-13 14:03:09 +02:00
  • 8aa476ccd3 test: improve typing definitions (part 1) (#72) Michele Dolfi 2024-09-12 15:56:29 +02:00
  • 53569a1023 docs: showcase RAG with LlamaIndex and LangChain (#71) Panos Vagenas 2024-09-11 15:07:08 +02:00
  • 79932b7d69 test: check for stable obj_type (#70) Michele Dolfi 2024-09-11 12:53:59 +02:00
  • e66dc53765 chore: bump version to 1.11.0 [skip ci] github-actions[bot] 2024-09-10 16:18:59 +00:00
  • bdfdfbf092 feat: adding txt and doctags output (#68) Peter W. J. Staar 2024-09-10 17:30:52 +02:00
  • cd5b6293cc chore: bump version to 1.10.0 [skip ci] github-actions[bot] 2024-09-10 14:38:07 +00:00
  • 27a7a152e1 feat: linux arm64 support and reducing dependencies (#69) Michele Dolfi 2024-09-10 15:43:27 +02:00
  • 1051eb9465 chore: update README (#65) Panos Vagenas 2024-09-09 12:03:04 +02:00
  • 6f1811e050 chore: fix placeholders in license (#63) Michele Dolfi 2024-09-06 17:10:07 +02:00
  • d3711437f6 chore: bump version to 1.9.0 [skip ci] github-actions[bot] 2024-09-03 13:33:40 +00:00
  • 1de2e4f924 feat: export document pages as multimodal output (#54) Michele Dolfi 2024-09-03 15:05:35 +02:00
  • 69e5d951a3 docs: Update MAINTAINERS.md (#59) Christoph Auer 2024-09-02 12:34:38 +02:00
  • 85b7348846 docs: Mention quackling on README (#58) Christoph Auer 2024-09-02 12:27:29 +02:00
  • 66ed096c40 chore: bump version to 1.8.5 [skip ci] github-actions[bot] 2024-08-30 12:37:54 +00:00
  • 48f4d1ba52 fix: Add unit tests (#51) Peter W. J. Staar 2024-08-30 14:08:20 +02:00
  • 256f4d504e chore: bump version to 1.8.4 [skip ci] github-actions[bot] 2024-08-30 08:47:57 +00:00
  • de85e46ced fix: propagate row_section in tables (#57) Michele Dolfi 2024-08-30 10:36:00 +02:00
  • a8a60d52b1 docs: add instructions for cpu-only installation (#56) Michele Dolfi 2024-08-30 10:20:21 +02:00
  • 5c46749e70 chore: bump version to 1.8.3 [skip ci] github-actions[bot] 2024-08-28 10:37:38 +00:00
  • f49ee825c3 fix: table cells overlap and model warnings (#53) Michele Dolfi 2024-08-28 12:30:42 +02:00
  • d0403aaebf chore: bump version to 1.8.2 [skip ci] github-actions[bot] 2024-08-27 09:53:15 +00:00
  • e46a66a176 fix: refine conversion result (#52) Panos Vagenas 2024-08-27 11:50:43 +02:00
  • fe817b11d7 docs: update interface in README (#50) Michele Dolfi 2024-08-26 15:36:39 +02:00
  • 7052bee999 chore: bump version to 1.8.1 [skip ci] github-actions[bot] 2024-08-26 11:55:37 +00:00
  • 8cc147bc56 fix: align output formats (#49) Michele Dolfi 2024-08-26 13:30:26 +02:00
  • 053eae4bdf chore: bump version to 1.8.0 [skip ci] github-actions[bot] 2024-08-23 14:24:04 +00:00
  • a294b7e64a feat: Page-level error reporting from PDF backend, introduce PARTIAL_SUCCESS status (#47) Christoph Auer 2024-08-23 16:18:41 +02:00
  • 3226b20779 chore: bump version to 1.7.1 [skip ci] github-actions[bot] 2024-08-23 11:56:02 +00:00
  • 8808463cec fix: Better raise exception when a page fails to parse (#46) Christoph Auer 2024-08-23 13:51:42 +02:00