feat: Page-level error reporting from PDF backend, introduce PARTIAL_SUCCESS status (#47)

* Add safety checks for pages that fail to parse

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Introduce page-level error checks

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Bump to docling-parse 1.1.1

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Introduce page-level error checks

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Christoph Auer
2024-08-23 16:18:41 +02:00
committed by GitHub
parent 3226b20779
commit a294b7e64a
7 changed files with 92 additions and 30 deletions
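The PARTIAL_SUCCESS status named in the title sits between success and failure. The sketch below shows one plausible rule for deriving a document-level status from per-page parse results. It is a self-contained stand-in, not docling's code: the enum's string values and the helper `resolve_status` are hypothetical. So is the rule itself, which treats "no page parsed" as FAILURE and a mix of parsed and unparsed pages as PARTIAL_SUCCESS.

```python
from enum import Enum
from typing import List


class ConversionStatus(str, Enum):
    # Stand-in for docling's enum; the string values are assumed for illustration.
    PENDING = "pending"
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"


def resolve_status(page_ok: List[bool]) -> ConversionStatus:
    """Hypothetical helper: map per-page parse outcomes to a document status."""
    if not page_ok or not any(page_ok):
        # Empty document, or no page could be parsed by the backend.
        return ConversionStatus.FAILURE
    if all(page_ok):
        # Every page parsed cleanly.
        return ConversionStatus.SUCCESS
    # Some pages parsed, others failed: keep the usable output but flag it.
    return ConversionStatus.PARTIAL_SUCCESS


print(resolve_status([True, True]).value)
print(resolve_status([True, False, True]).value)
print(resolve_status([False]).value)
```

Reporting partial success lets callers keep the pages that did convert instead of discarding the whole document because of a single bad page.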


@@ -19,6 +19,7 @@ from docling.datamodel.base_models import (
     AssembledUnit,
     ConversionStatus,
     DocumentStream,
+    ErrorItem,
     FigureElement,
     Page,
     PageElement,
@@ -118,7 +119,7 @@ class ConvertedDocument(BaseModel):
     input: InputDocument
     status: ConversionStatus = ConversionStatus.PENDING # failure, success
-    errors: List[Dict] = [] # structure to keep errors
+    errors: List[ErrorItem] = [] # structure to keep errors
     pages: List[Page] = []
     assembled: Optional[AssembledUnit] = None
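This change replaces the untyped `List[Dict]` with `List[ErrorItem]`, so each recorded error has a fixed shape. The diff does not show the fields of `ErrorItem`. The sketch below therefore assumes a minimal record (which component failed, which module, and a message) and builds it with stdlib dataclasses instead of pydantic. `ComponentType`, `ConvertedDocumentSketch`, `record_page_failures`, and the backend name `"ExampleBackend"` are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ComponentType(str, Enum):
    # Hypothetical: which part of the pipeline produced the error.
    PDF_BACKEND = "pdf_backend"
    MODEL = "model"


@dataclass
class ErrorItem:
    # Hypothetical fields; the diff only shows that an ErrorItem type exists.
    component_type: ComponentType
    module_name: str
    error_message: str


@dataclass
class ConvertedDocumentSketch:
    # A dataclass needs default_factory; a plain `= []` default raises ValueError here.
    errors: List[ErrorItem] = field(default_factory=list)


def record_page_failures(
    doc: ConvertedDocumentSketch, failed_pages: List[int], backend_name: str
) -> None:
    """Append one typed error per page that the backend failed to parse."""
    for page_no in failed_pages:
        doc.errors.append(
            ErrorItem(
                component_type=ComponentType.PDF_BACKEND,
                module_name=backend_name,
                error_message=f"Page {page_no} failed to parse.",
            )
        )


doc = ConvertedDocumentSketch()
record_page_failures(doc, [3, 7], backend_name="ExampleBackend")
for err in doc.errors:
    print(err.component_type.value, err.module_name, err.error_message)
```

The `errors: List[ErrorItem] = []` default in the diff is safe on a pydantic `BaseModel`, which copies mutable defaults per instance. The dataclass stand-in has to use `field(default_factory=list)` to get the same behavior.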