fix: Fix for the crash when encountering WMF images in pptx and docx (#837)

* Fix for the crash when encountering WMF images in pptx and docx backends on non-Windows platforms

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated FAQ

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Maxim Lysak 2025-01-30 14:58:27 +01:00 committed by GitHub
parent d01a2e73ee
commit fea0a99a95
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
3 changed files with 14 additions and 7 deletions

@@ -271,13 +271,12 @@ class MsPowerpointDocumentBackend(DeclarativeDocumentBackend, PaginatedDocumentB
         return
     def handle_pictures(self, shape, parent_slide, slide_ind, doc):
-        # Get the image bytes
-        image = shape.image
-        image_bytes = image.blob
-        im_dpi, _ = image.dpi
         # Open it with PIL
         try:
+            # Get the image bytes
+            image = shape.image
+            image_bytes = image.blob
+            im_dpi, _ = image.dpi
             pil_image = Image.open(BytesIO(image_bytes))
             # shape has picture

@@ -520,11 +520,11 @@ class MsWordDocumentBackend(DeclarativeDocumentBackend):
                 image_data = image_part.blob  # Get the binary image data
             return image_data
-        image_data = get_docx_image(element, drawing_blip)
-        image_bytes = BytesIO(image_data)
         level = self.get_level()
         # Open the BytesIO object with PIL to create an Image
         try:
+            image_data = get_docx_image(element, drawing_blip)
+            image_bytes = BytesIO(image_data)
             pil_image = Image.open(image_bytes)
             doc.add_picture(
                 parent=self.parents[level - 1],

@@ -151,3 +151,11 @@ This is a collection of FAQ collected from the user questions on <https://github
pipeline_options = PdfPipelineOptions()
pipeline_options.ocr_options.lang = ["fr", "de", "es", "en"] # example of languages for EasyOCR
```
+??? question "Some images are missing from MS Word and PowerPoint"
+    ### Some images are missing from MS Word and PowerPoint
+    The image processing library used by Docling can handle embedded WMF images only on Windows.
+    If you are on another operating system, these images are ignored.
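
The pattern both backend hunks apply (moving the byte extraction inside the `try` block so that an unreadable picture is skipped instead of crashing the conversion) can be sketched in isolation. This is a minimal illustration and not the code from this commit: `open_picture_or_skip`, `read_bytes`, `open_image`, and the two fake decoders are hypothetical names, and catching `OSError` is an assumption about how the decoder reports unsupported formats (Pillow's `UnidentifiedImageError` is a subclass of `OSError`).

```python
import logging
from io import BytesIO
from typing import Any, Callable, Optional

_log = logging.getLogger(__name__)


def open_picture_or_skip(
    read_bytes: Callable[[], bytes],
    open_image: Callable[[BytesIO], Any],
) -> Optional[Any]:
    """Return the decoded image, or None if it cannot be read or decoded."""
    try:
        # Both steps sit inside the try block: extracting the bytes
        # can fail as well as decoding them.
        image_bytes = read_bytes()
        return open_image(BytesIO(image_bytes))
    except OSError as exc:
        # Assumption: unsupported formats surface as OSError.
        _log.warning("Skipping picture that could not be loaded: %s", exc)
        return None


def fake_wmf_decoder(stream: BytesIO) -> Any:
    # Hypothetical stand-in for a decoder with no WMF support on this platform.
    raise OSError("cannot find loader for this WMF file")


def fake_png_decoder(stream: BytesIO) -> str:
    # Hypothetical stand-in for a decoder that succeeds.
    return f"decoded {len(stream.getvalue())} bytes"


if __name__ == "__main__":
    print(open_picture_or_skip(lambda: b"\x00", fake_wmf_decoder))
    print(open_picture_or_skip(lambda: b"abc", fake_png_decoder))
```

With a real decoder, `open_image` would be something like `PIL.Image.open`. The caller treats a `None` result as "no picture" and continues with the rest of the document, which matches the behaviour the FAQ entry describes for WMF images outside Windows.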