feat: expose ocr-lang in CLI (#375)

* feat: expose ocr-lang in CLI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use regex for supporting multiple sep

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
This commit is contained in:
Michele Dolfi
2024-11-19 15:58:49 +01:00
committed by GitHub
parent 926dfd29d5
commit ed785ea122
2 changed files with 19 additions and 0 deletions

View File

@@ -22,6 +22,7 @@ class TableStructureOptions(BaseModel):
class OcrOptions(BaseModel):
kind: str
lang: List[str]
force_full_page_ocr: bool = False # If enabled a full page OCR is always applied
bitmap_area_threshold: float = (
0.05 # percentage of the area for a bitmap to processed with OCR