fix: Fixed docx import with headers that are also lists (#842)

* Fix docx import when headers are also list items: they are now recorded as the appropriate headers and subheaders. Unit test included.

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Update docling/backend/msword_backend.py

Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Maxim Lysak
2025-01-31 10:51:21 +01:00
committed by GitHub
parent 2a1f8afe7e
commit 2c037ae62e
5 changed files with 868 additions and 6 deletions
@@ -0,0 +1,43 @@
# Test Document
## Section 1
Paragraph 1.1
Paragraph 1.2
### Section 1.1
Paragraph 1.1.1
Paragraph 1.1.2
### Section 1.2
Paragraph 1.1.1
Paragraph 1.1.2
#### Section 1.2.3
Paragraph 1.2.3.1
Paragraph 1.2.3.1
## Section 2
Paragraph 2.1
Paragraph 2.2
#### Section 2.1.1
Paragraph 2.1.1.1
Paragraph 2.1.1.1
### Section 2.1
Paragraph 2.1.1
Paragraph 2.1.2
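
The expected output above encodes header depth through the number of leading `#` characters. One way to check such a fixture is to pull out only the heading lines and compare their levels. The sketch below does that with plain Python. `heading_outline` and `HEADING_RE` are hypothetical names chosen for illustration. They are not part of docling, and this is not how the project's own unit test is written.

```python
import re

# Hypothetical helper, not part of docling: matches an ATX heading such as "## Section 1".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def heading_outline(markdown_text: str) -> list[tuple[int, str]]:
    """Return (level, title) for every heading line, in document order."""
    outline = []
    for line in markdown_text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # The heading level is the number of leading '#' characters.
            outline.append((len(match.group(1)), match.group(2)))
    return outline


if __name__ == "__main__":
    sample = "# Test Document\n## Section 1\nParagraph 1.1\n### Section 1.1\n"
    for level, title in heading_outline(sample):
        print(level, title)
```

Comparing the outline of a freshly converted document against the outline of the stored fixture would show whether list-style headers still come out at the intended levels, under the assumption that both are exported as Markdown.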