-
8012a3e4d6
fix: Treat overflowing -v flags as DEBUG (#1419)
Eugene
2025-04-19 13:02:41 +0400
-
88948b0bba
docs: Updated the [Usage] link in architecture.md (#1416)
Leandro Rosas
2025-04-19 09:20:52 +0100
-
fa7fc9e63d
fix(codecov): fix codecov argument and yaml file (#1399)
Cesar Berrospi Ramis
2025-04-15 18:12:57 +0200
-
550b1ca2f8
chore: propagate docling-core fix (#1389)
Panos Vagenas
2025-04-15 10:51:47 +0200
-
a7dd59c5cb
docs(ocr): Add docs entry for OnnxTR OCR plugin (#1382)
Felix Dittrich
2025-04-15 09:46:59 +0200
-
06227e9970
ci: sign pypi packages (#1392)
Michele Dolfi
2025-04-15 08:59:16 +0200
-
5458a88464
ci: add coverage and ruff (#1383)
Michele Dolfi
2025-04-14 18:01:26 +0200
-
293c28ca7c
docs(security): more statements about secure development (#1381)
Michele Dolfi
2025-04-14 13:53:26 +0200
-
01fbfd5652
docs: Add testing in the docs (#1379)
Michele Dolfi
2025-04-14 12:31:48 +0200
-
d9c3999175
chore: update lock file (#1378)
Michele Dolfi
2025-04-14 10:38:10 +0200
-
a026b4e84b
docs: Add Notes for Installing in Intel macOS (#1377)
Juil Park
2025-04-14 17:21:13 +0900
-
c391adb5f0
chore: bump version to 2.30.0 [skip ci]
github-actions[bot]
2025-04-14 08:20:31 +0000
-
7e40ad3261
fix(deps): widen typer upper bound (#1375)
Michele Dolfi
2025-04-14 09:23:39 +0200
-
c0ba88edf1
feat(cli): add option for html with split-page mode (#1355)
Peter W. J. Staar
2025-04-14 08:41:50 +0200
-
0de70e7991
fix: auto-recognize .xlsx, .docx and .pptx files (#1340)
Tim Kellogg
2025-04-14 01:45:13 -0400
-
b295da4bfe
chore: Update repository URL in CITATION.cff (#1363)
Simon Leiß
2025-04-14 06:57:04 +0200
-
415b877984
fix(docx): declare image_data variable when handling pictures (#1359)
Cesar Berrospi Ramis
2025-04-11 13:04:00 +0200
-
250399948d
fix: Implement PictureDescriptionApiOptions.bitmap_area_threshold (#1248)
Rowan Skewes
2025-04-11 19:14:05 +1000
-
eef2bdea77
feat(xlsx): create a page for each worksheet in XLSX backend (#1332)
Cesar Berrospi Ramis
2025-04-11 10:29:53 +0200
-
c605edd8e9
feat: OllamaVlmModel for Granite Vision 3.2 (#1337)
Gabe Goodhart
2025-04-10 10:03:04 -0600
-
6b696b504a
fix: Properly address page in pipeline _assemble_document when page_range is provided (#1334)
Joan Fabrégat
2025-04-10 16:11:28 +0200
-
72ab8e1821
chore: bump version to 2.29.0 [skip ci]
github-actions[bot]
2025-04-10 12:24:09 +0000
-
355d8dc7a6
chore: Logo parameter in docling CLI, prints cute ascii logo (#1294)
Maxim Lysak
2025-04-09 05:29:48 +0200
-
14e9c0ce9a
fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295)
Rafael Teixeira de Lima
2025-04-08 17:11:37 +0200
-
0499cd1c1e
feat: handle <code> tags as code blocks (#1320)
Fernando Santos
2025-04-08 05:32:06 -0300
-
2e99e5a54f
docs: add plugins docs (#1319)
Michele Dolfi
2025-04-08 09:44:37 +0200
-
61de30966f
chore: update lock file (#1315)
Michele Dolfi
2025-04-07 17:47:51 +0200
-
dc3bf9ceac
fix(pptx): check if picture shape has an image attached (#1316)
Maxim Lysak
2025-04-07 17:36:56 +0200
-
bfcab3d677
feat(docx): add text formatting and hyperlink support (#630)
Simon Jégou
2025-04-03 15:11:50 +0200
-
71148eb381
docs: add visual grounding example (#1270)
Panos Vagenas
2025-04-02 14:03:19 +0200
-
d2d68747f9
fix(docx): Improve text parsing (#1268)
Rafael Teixeira de Lima
2025-04-02 12:56:44 +0200
-
b3d111a3cd
fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)
Guilhem VERMOREL
2025-03-31 10:53:49 +0200
-
44f2b081ec
chore: bump version to 2.28.4 [skip ci]
github-actions[bot]
2025-03-29 11:56:42 +0000
-
7afad7e52d
fix: Fixes tables when using OCR (#1261)
Maxim Lysak
2025-03-29 10:06:00 +0100
-
124f921077
chore: bump version to 2.28.3 [skip ci]
github-actions[bot]
2025-03-28 18:30:03 +0000
-
8bd71e8e33
fix: Word-level pdf cells for tables (#1238)
Maxim Lysak
2025-03-28 16:34:48 +0100
-
82694b2136
chore: bump version to 2.28.2 [skip ci]
github-actions[bot]
2025-03-26 16:52:06 +0000
-
9210812bfa
fix: improve HTML layer detection, various MD fixes (#1241)
Panos Vagenas
2025-03-26 16:07:14 +0100
-
85c4df887b
fix(html): fix HTML parsed heading level (#1244)
Panos Vagenas
2025-03-26 10:30:23 +0100
-
9eb1686f93
chore: bump version to 2.28.1 [skip ci]
github-actions[bot]
2025-03-25 18:20:23 +0000
-
38b7108a22
chore: update locked deps (#1239)
Panos Vagenas
2025-03-25 15:48:02 +0100
-
825b226fab
fix(converter): Cache same pipeline class with different options (#1152)
mislavmartinic
2025-03-26 00:18:44 +1300
-
6df8827231
fix(debug): Missing translation of bbox to to_bounding_box (#1220)
Hoang-Long Do
2025-03-25 18:18:10 +0700
-
f739d0e4c5
fix(docx): identifying numbered headers (#1231)
Rafael Teixeira de Lima
2025-03-25 11:41:02 +0100
-
0974ba4e1c
docs(examples): batch conversion doc raises_on_error (#1147)
Clément Doumouro
2025-03-25 11:14:39 +0100
-
8ebb0bf1a0
chore: properly clean up apt temporary files in Dockerfile (#1223)
Peter Dave Hello
2025-03-25 18:10:09 +0800
-
7df157204b
chore: bump version to 2.28.0 [skip ci]
github-actions[bot]
2025-03-19 15:18:10 +0000
-
1c26769785
feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199)
Maxim Lysak
2025-03-19 15:38:54 +0100
-
b454aa1551
feat: Add PPTX notes slides (#474)
Maciej Wieczorek
2025-03-19 14:52:09 +0100
-
f5adfb9724
fix: Determine correct page size in DoclingParseV4Backend (#1196)
Christoph Auer
2025-03-19 11:05:42 +0100
-
d5f7798763
test(html): fix regression test after docling-core update (#1197)
Cesar Berrospi Ramis
2025-03-19 11:03:46 +0100
-
0b707d0882
fix(msword): Fixing function return in equations handling (#1194)
Rafael Teixeira de Lima
2025-03-19 10:34:25 +0100
-
1d680b0a32
docs: Linux Foundation AI & Data (#1183)
Michele Dolfi
2025-03-19 09:05:57 +0100
-
54a78c307d
docs: move apify to docs (#1182)
Michele Dolfi
2025-03-18 16:43:55 +0100
-
2f72167ff6
feat: updated vlm pipeline (with latest changes from docling-core) (#1158)
Maxim Lysak
2025-03-18 15:44:51 +0100
-
1a2a9e4eff
chore: bump version to 2.27.0 [skip ci]
github-actions[bot]
2025-03-18 13:37:45 +0000
-
6eaae3cba0
feat: add factory for ocr engines via plugins (#1010)
Michele Dolfi
2025-03-18 13:58:05 +0100
-
3960b199d6
feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905)
Christoph Auer
2025-03-18 10:38:19 +0100
-
772487f9c9
feat(actor): Docling Actor on Apify infrastructure (#875)
Václav Vančura
2025-03-18 10:17:44 +0100
-
7e01798417
docs: fix spelling of picture in usage (#1165)
serced
2025-03-17 09:33:51 +0100
-
fa16b12316
chore: move to docling-project org (#1160)
Michele Dolfi
2025-03-14 12:35:29 +0100
-
f94da44ec5
fix(html): handle nested empty lists (#1154)
Cesar Berrospi Ramis
2025-03-13 16:56:58 +0100
-
0945973b79
fix: use first table row as col headers (#1156)
Panos Vagenas
2025-03-13 15:34:18 +0100
-
6eb718f849
feat: equations to latex in MSWord backend (with inline groups) (#1114)
Rafael Teixeira de Lima
2025-03-13 15:12:22 +0100
-
aa92a57fa9
fix: Pass tests, update docling-core to 2.22.0 (#1150)
Cesar Berrospi Ramis
2025-03-13 09:45:55 +0100
-
17c5bf1242
chore: bump version to 2.26.0 [skip ci]
github-actions[bot]
2025-03-11 11:12:43 +0000
-
eb97357b05
feat: Use new TableFormer model weights and default to accurate model version (#1100)
Christoph Auer
2025-03-11 10:53:49 +0100
-
5e30381c0d
perf: New revision code formula model and document picture classifier (#1140)
Matteo
2025-03-11 09:15:28 +0000
-
4d64c4c0b6
fix(CLI): fix help message for abort options (#1130)
Michele Dolfi
2025-03-07 14:47:49 +0100
-
e1c49ad727
docs: add description of DOCLING_ARTIFACTS_PATH env var (#1124)
Michele Dolfi
2025-03-06 07:30:07 +0100
-
a3c957ca6b
chore: bump version to 2.25.2 [skip ci]
github-actions[bot]
2025-03-05 14:51:57 +0000
-
c56ab3a66b
fix: Proper handling of orphan IDs in layout postprocessing (#1118)
Christoph Auer
2025-03-05 14:30:59 +0100
-
357d41cc47
docs: Enrichment models (#1097)
Michele Dolfi
2025-03-04 14:24:38 +0100
-
b1e79cadc7
chore: bump version to 2.25.1 [skip ci]
github-actions[bot]
2025-03-03 00:56:40 +0000
-
0c1e9391de
chore: use gh cache for huggingface models (#1096)
Michele Dolfi
2025-03-03 00:13:47 +0100
-
8dc0562542
fix: enable locks for threadsafe pdfium (#1052)
Michele Dolfi
2025-03-02 20:06:44 +0100
-
e25d557c06
refactor: add the contentlayer to html-backend (#1040)
Peter W. J. Staar
2025-03-02 10:37:53 -0500
-
db3ceefd4a
docs: improve docs on token limit warning triggered by HybridChunker (#1077)
Panos Vagenas
2025-02-28 14:54:46 +0100
-
de7b963b09
fix(html): use 'start' attribute when parsing ordered lists from HTML docs (#1062)
Cesar Berrospi Ramis
2025-02-27 09:46:57 +0100
-
37dd8c1cc7
chore: bump version to 2.25.0 [skip ci]
github-actions[bot]
2025-02-26 14:16:15 +0000
-
3c9fe76b70
feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054)
Christoph Auer
2025-02-26 14:43:26 +0100
-
ab683e4fb6
feat(cli): add option for downloading all models, refine help messages (#1061)
Panos Vagenas
2025-02-26 13:27:29 +0100
-
e197225739
fix: vlm using artifacts path (#1057)
Michele Dolfi
2025-02-26 08:33:50 +0100
-
c84b973959
docs: extend chunking docs, add FAQ on token limit (#1053)
Panos Vagenas
2025-02-25 13:07:38 +0100
-
1b0ead6907
fix(html): Parse text in div elements as TextItem (#1041)
Cesar Berrospi Ramis
2025-02-24 12:38:29 +0100
-
1d17e7397a
test: avoid testing exact JSON in CSV backend (#1038)
Suehtam
2025-02-24 07:10:40 +0000
-
d8a81c3168
chore: bump version to 2.24.0 [skip ci]
github-actions[bot]
2025-02-20 18:31:20 +0000
-
c93e36988f
feat: Implement new reading-order model (#916)
Christoph Auer
2025-02-20 17:51:17 +0100
-
c031a7ae47
chore: bump version to 2.23.1 [skip ci]
github-actions[bot]
2025-02-20 16:26:41 +0000
-
1ac010354f
test: avoid testing exact JSON (#1027)
Cesar Berrospi Ramis
2025-02-20 16:20:07 +0100
-
6796f0a132
fix: Runtime error when Pandas Series is not always of string type (#1024)
fanszoro
2025-02-20 22:41:41 +0800
-
dfcc30dddb
chore: Update tests and lockfile (#1021)
Christoph Auer
2025-02-19 16:51:53 +0100
-
27c04007bc
docs: revamp picture description example (#1015)
Panos Vagenas
2025-02-19 11:28:54 +0100
-
7450050ace
refactor: upgrade BeautifulSoup4 with type hints (#999)
Cesar Berrospi Ramis
2025-02-18 11:30:47 +0100
-
75db61127c
chore: bump version to 2.23.0 [skip ci]
github-actions[bot]
2025-02-17 14:22:49 +0000
-
6e75f0b5d3
fix: Revise DocTags, fix iterate_items to output content_layer in items (#965)
Maxim Lysak
2025-02-17 14:11:55 +0100
-
77eb77bdc2
feat: Support cuda:n GPU device allocation (#694)
Ahmed Nassar
2025-02-17 11:31:13 +0100
-
428b656793
feat(xml-jats): parse XML JATS documents (#967)
Cesar Berrospi Ramis
2025-02-17 10:43:31 +0100
-
e1436a8b05
test: validate actual docitems in tests (#966)
Michele Dolfi
2025-02-14 17:47:53 +0100
-
ffbde1d1b0
chore: bump version to 2.22.0 [skip ci]
github-actions[bot]
2025-02-14 08:53:20 +0000