
* feat: Introduce automatic language detection in tesseract_ocr_cli model. Extend unit tests.
  Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* docs: Add an example of how to use the "auto" language with the Tesseract OCR engines.
  Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Refactor TesseractOcrModel and TesseractOcrCliModel to validate that the auto-detected language is installed on the system and, if it is not, fall back to a default option without a language.
  Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
---------
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
10 lines
263 B
Python
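def map_tesseract_script(script: str) -> str:
    r"""Map a script name reported by Tesseract to the name used to select script data.

    Names without a special case are returned unchanged.
    """
    # Both Japanese syllabaries map to the single "Japanese" script.
    if script in ("Katakana", "Hiragana"):
        script = "Japanese"
    elif script == "Han":
        script = "HanS"
    elif script == "Korean":
        script = "Hangul"
    return script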
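The commit message describes validating the auto-detected language against what is installed and falling back to no language when it is missing. The following is a minimal sketch of that idea, not the actual logic in TesseractOcrModel or TesseractOcrCliModel. The helper `resolve_script` and the `installed` set are hypothetical names introduced only for illustration, and `map_tesseract_script` is repeated so the sketch runs on its own.

```python
from typing import Iterable, Optional


def map_tesseract_script(script: str) -> str:
    # Repeated from the listing so that this sketch is self-contained.
    if script == "Katakana" or script == "Hiragana":
        script = "Japanese"
    elif script == "Han":
        script = "HanS"
    elif script == "Korean":
        script = "Hangul"
    return script


def resolve_script(detected: str, installed: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: map the detected script, or return None if it is not installed."""
    mapped = map_tesseract_script(detected)
    # None stands in for "run OCR without an explicit language" (an assumption).
    return mapped if mapped in set(installed) else None


if __name__ == "__main__":
    installed = {"Japanese", "Latin"}  # assumed example data, not read from the system
    print(resolve_script("Hiragana", installed))
    print(resolve_script("Han", installed))
```

Returning `None` rather than raising keeps the fallback decision with the caller, which can then invoke the OCR engine without a language argument.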