Docling/docs
Michele Dolfi 9114ada7bc
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
fix: Support for RTL programmatic documents
fix(parser): detect and handle rotated pages
fix(parser): fix bug causing duplicated text
fix(formula): improve stopping criteria
chore: update lock file
fix: temporary constrain beautifulsoup


* switch to code formula model v1.0.1 and new test pdf

Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>

* switch to code formula model v1.0.1 and new test pdf

Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>

* cleaned up the data folder in the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* switch to code formula model v1.0.1 and new test pdf

Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>

* added three test-files for right-to-left

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fix black

Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>

* added new gt for test_e2e_conversion

Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>

* added new gt for test_e2e_conversion

Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>

* Add code to expose text direction of cell

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* new test file

Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>

* update lock

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix mypy reports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix example filepaths

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add test data results

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* pin wheel of latest docling-parse release

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use latest docling-core

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove debugging code

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix path to files in example

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Revert unwanted RTL additions

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix test data paths in examples

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-02-07 08:43:31 +01:00
assets feat: expose new hybrid chunker, update docs (#384) 2024-12-09 08:28:29 +01:00
concepts docs: fix links between docs pages (#697) 2025-01-20 09:52:59 +01:00
examples fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
integrations docs: add pointers to LangChain-side docs (#718) 2025-01-09 17:36:46 +01:00
overrides docs: extend integration docs & README (#456) 2024-11-28 09:41:21 +01:00
reference docs: specify docstring types (#702) 2025-01-08 09:05:18 +01:00
stylesheets docs: introduce docs site (#141) 2024-10-14 14:13:13 +02:00
faq.md feat: Python 3.13 support (#841) 2025-01-30 17:26:42 +01:00
index.md docs: updated the readme with upcoming features (#831) 2025-01-30 09:52:54 +01:00
installation.md feat(ocr): added support for RapidOCR engine (#415) 2024-11-27 13:57:41 +01:00
supported_formats.md docs: document Docling JSON parsing (#819) 2025-01-28 13:23:30 +01:00
usage.md feat: new artifacts path and CLI utility (#876) 2025-02-06 15:46:32 +01:00
v2.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00