fix: enrichment models batch size and expose picture classifier (#878)

* expose picture classifier in CLI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use different batch size in each model

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove batch size from CLI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cleanup imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
This commit is contained in:
Michele Dolfi
2025-02-05 11:46:01 +01:00
committed by GitHub
parent 17448163e7
commit 5ad6de0560
4 changed files with 13 additions and 1 deletions

View File

@@ -219,6 +219,13 @@ def convert(
bool,
typer.Option(..., help="Enable the formula enrichment model in the pipeline."),
] = False,
enrich_picture_classes: Annotated[
bool,
typer.Option(
...,
help="Enable the picture classification enrichment model in the pipeline.",
),
] = False,
artifacts_path: Annotated[
Optional[Path],
typer.Option(..., help="If provided, the location of the model artifacts."),
@@ -375,6 +382,7 @@ def convert(
do_table_structure=True,
do_code_enrichment=enrich_code,
do_formula_enrichment=enrich_formula,
do_picture_classification=enrich_picture_classes,
document_timeout=document_timeout,
)
pipeline_options.table_structure_options.do_cell_matching = (