Docling/tests/data/groundtruth/docling_v2/equations.docx.md
Rafael Teixeira de Lima 14e9c0ce9a
fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295)
* Adding new latex symbols, simplifying how equations are added to text

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Identify headers through inhenrited style

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Log warning message instead of print

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Adding new latex symbols, simplifying how equations are added to text

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Identify headers through inhenrited style

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Log warning message instead of print

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)

fix wrong type text extracted by tesseract_ocr_cli_model

Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* fix(docx): Improve text parsing (#1268)

* chore: bump version to 2.28.4 [skip ci]

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Improve text parsing

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* fix: Tesseract OCR CLI can't process images composed with numbers only (#1201)

fix wrong type text extracted by tesseract_ocr_cli_model

Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Flexibilize heading detection

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Fix trailing space

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Remove trailing space

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

---------

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Guilhem VERMOREL <83694424+guilhemvermorel@users.noreply.github.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* docs: add visual grounding example (#1270)

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* feat(docx): add text formatting and hyperlink support (#630)

* feat: Enable markdown text formatting for docx

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Fix imports

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Use Formatting

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle hyperlink

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle formatting properly for DocItemLabel.PARAGRAPH

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Use inline group

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle bullet lists

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Strip elements

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Strip elements

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Run black and mypy

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle header and footer

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Use inline_fmt everywhere

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Run precommit

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Address feedback

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Fix add_list_item

Signed-off-by: SimJeg <sjegou@nvidia.com>

* fix minor bugs, mark helper methods internal

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: SimJeg <sjegou@nvidia.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Co-authored-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* fix(pptx): check if picture shape has an image attached (#1316)

Check if picture shape has an image attached in pptx backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* chore: update lock file (#1315)

chore: update lock

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* docs: add plugins docs (#1319)

add plugin docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* feat: handle <code> tags as code blocks (#1320)

handle <code> tags as code blocks

Signed-off-by: FernandoSSI <fernandosi2005@gmail.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Adding new latex symbols, simplifying how equations are added to text

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Identify headers through inhenrited style

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Log warning message instead of print

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Adding new latex symbols, simplifying how equations are added to text

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

---------

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Signed-off-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: SimJeg <sjegou@nvidia.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: FernandoSSI <fernandosi2005@gmail.com>
Co-authored-by: Guilhem VERMOREL <83694424+guilhemvermorel@users.noreply.github.com>
Co-authored-by: gvl4 <Guilhem.VERMOREL@3ds.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Simon Jégou <SimJeg@users.noreply.github.com>
Co-authored-by: Panos Vagenas <pva@zurich.ibm.com>
Co-authored-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Fernando Santos <121275806+FernandoSSI@users.noreply.github.com>
2025-04-08 17:11:37 +02:00

1.9 KiB
Vendored

This is a word document and this is an inline equation: A= \pi r^{2} . If instead, I want an equation by line, I can do this:

a^{2}+b^{2}=c^{2} \text{ \texttimes } 23

And that is an equation by itself. Cheers!

This is another equation:

f\left(x\right)=a_{0}+\sum_{n=1}^{ \infty }\left(a_{n}\cos(\frac{n \pi x}{L})+b_{n}\sin(\frac{n \pi x}{L})\right)

This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text.

This is a word document and this is an inline equation: A= \pi r^{2} . If instead, I want an equation by line, I can do this:

\left(x+a\right)^{n}=\sum_{k=0}^{n}\left(\genfrac{}{}{0pt}{}{n}{k}\right)x^{k}a^{n-k}

And that is an equation by itself. Cheers!

This is another equation:

\left(1+x\right)^{n}=1+\frac{nx}{1!}+\frac{n\left(n-1\right)x^{2}}{2!}+ \text{ \textellipsis }

This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text. This is text.

This is a word document and this is an inline equation: A= \pi r^{2} . If instead, I want an equation by line, I can do this:

e^{x}=1+\frac{x}{1!}+\frac{x^{2}}{2!}+\frac{x^{3}}{3!}+ \text{ \textellipsis } , - \infty  < x <  \infty

And that is an equation by itself. Cheers!