* scaffolding in place
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* doing scaffolding for audio pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* WIP: got first transcription working
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, time to start cleaning up
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* first working ASR pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added openai-whisper as a first transcription model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updating with asr_options
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalised the first working ASR pipeline with Whisper
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* use whisper from the latest git commit
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Update docling/datamodel/pipeline_options.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
* Update docling/datamodel/pipeline_options.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
* updated comment
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* AudioBackend -> DummyBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* file rename
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename to NoOpBackend, add test for ASR pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Support every format in NoOpBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add missing audio file and test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Install ffmpeg system dependency for ASR test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
* feat: adding new vlm-models support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* got microsoft/Phi-4-multimodal-instruct to work
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* working on vlm's
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the VLM part
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, now serious refacgtoring necessary
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the download_model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the formulate_prompt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* pixtral 12b runs via MLX and native transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the VlmPredictionToken
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring minimal_vlm_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the MyPy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added pipeline_model_specializations file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to get Phi4 working again ...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalising last points for vlms support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the pipeline for Phi4
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* streamlining all code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixing the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the html backend to the VLM pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the static load_from_doctags
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restore stable imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use AutoModelForVision2Seq for Pixtral and review example (including rename)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove unused value
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* refactor instances of VLM models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip compare example in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use lowercase and uppercase only
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename pipeline_vlm_model_spec
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move more argument to options and simplify model init
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add supported_devices
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove not-needed function
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* exclude minimal_vlm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* missing file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add message for transformers version
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename to specs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use module import and remove MLX from non-darwin
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove hf_vlm_model and add extra_generation_args
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use single HF VLM model class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove torch type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add docs for vision models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* feat: Add visualization of bbox on page with html export.
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the cli
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the cli argument to show_layout
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* build: Add ollama sdk dependency
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add option plumbing for OllamaVlmOptions in pipeline_options
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Full implementation of OllamaVlmModel
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Connect "granite_vision_ollama" pipeline option to CLI
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* Revert "build: Add ollama sdk dependency"
After consideration, we're going to use the generic OpenAI API instead
of the Ollama-specific API to avoid duplicate work.
This reverts commit bc6b366468cdd66b52540aac9c7d8b584ab48ad0.
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Move OpenAI API call logic into utils.utils
This will allow reuse of this logic in a generic VLM model
NOTE: There is a subtle change here in the ordering of the text prompt and
the image in the call to the OpenAI API. When run against Ollama, this
ordering makes a big difference. If the prompt comes before the image, the
result is terse and not usable whereas the prompt coming after the image
works as expected and matches the non-OpenAI chat API.
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Refactor from Ollama SDK to generic OpenAI API
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Linting, formatting, and bug fixes
The one bug fix was in the timeout arg to openai_image_request. Otherwise,
this is all style changes to get MyPy and black passing cleanly.
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* remove model from download enum
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* generalize input args for other API providers
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename and refactor
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* require flag for remote services
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* disable example from CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add examples to docs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
logo parameter in docling cli, prints cute ascii logo
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
* Add DoclingParseV3 backend implementation
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Use docling-core with docling-parse types
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes and test updates
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix streams
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix streams
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reset tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* update test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* update test units
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back DoclingParse v1 backend, pipeline options
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update locks
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: update docling-core to 2.22.0
Update dependency library docling-core to latest release 2.22.0
Fix regression tests and ground truth files
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* Ground-truth files updated
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update tests, use TextCell.from_ocr property
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Text fixes, new test data
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename docling backend to v4
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Test all backends, fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reset all tests to use docling-parse v1 for now
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for DPv4 backend init, better test coverage
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* test_input_doc use default backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* feat: New tableformer model weights [WIP]
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Updated TF version
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated tests, after merging with Main, Switched to Accurate TF model by default
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
* added http header support for document converter and cli
Signed-off-by: Luke Harrison <Luke.Harrison1@ibm.com>
* fixed formatting and typing issues
Signed-off-by: Luke Harrison <Luke.Harrison1@ibm.com>
* use pydantic to parse dict
suggested by @dolfim-ibm
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Luke Harrison <luke.harrison1@ibm.com>
---------
Signed-off-by: Luke Harrison <Luke.Harrison1@ibm.com>
Signed-off-by: Luke Harrison <luke.harrison1@ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
* Upgraded Layout Postprocessing, sending old code back to ERZ
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Implement hierachical cluster layout processing
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pass nested cluster processing through full pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pass nested clusters through GLM as payload
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move to_docling_document from ds-glm to this repo
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clean up imports again
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* feat(Accelerator): Introduce options to control the num_threads and device from API, envvars, CLI.
- Introduce the AcceleratorOptions, AcceleratorDevice and use them to set the device where the models run.
- Introduce the accelerator_utils with function to decide the device and resolve the AUTO setting.
- Refactor the way how the docling-ibm-models are called to match the new init signature of models.
- Translate the accelerator options to the specific inputs for third-party models.
- Extend the docling CLI with parameters to set the num_threads and device.
- Add new unit tests.
- Write new example how to use the accelerator options.
* fix: Improve the pydantic objects in the pipeline_options and imports.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: TableStructureModel: Refactor the artifacts path to use the new structure for fast/accurate model
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* Updated test ground-truth
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Updated test ground-truth (again), bugfix for empty layout
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: Do proper check to set the device in EasyOCR, RapidOCR.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* Rollback changes from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update test gt
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove unused debug settings
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Review fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Nail the accelerator defaults for MPS
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Abhishek Kumar <abhishekrocketeer@gmail.com>
Testing:
(.venv) mario@Abhisheks-MacBook-Air docling % docling https://arxiv.org/pdf/2206.01062 --document-timeout=10 --verbose
INFO:docling.document_converter:Going to convert document batch...
INFO:docling.pipeline.base_pipeline:Processing document 2206.01062v1.pdf
WARNING:docling.pipeline.base_pipeline:Document processing time (24.555 seconds) exceeded the specified timeout of 10.000 seconds
INFO:docling.document_converter:Finished converting document 2206.01062v1.pdf in 36.29 sec.
WARNING:docling.cli.main:Document /var/folders/d7/dsfkllxs0xs8x2t4fcjknj4c0000gn/T/tmpl6p08u5i/2206.01062v1.pdf failed to convert.
INFO:docling.cli.main:Processed 1 docs, of which 1 failed
INFO:docling.cli.main:All documents were converted in 36.29 seconds.
(.venv) mario@Abhisheks-MacBook-Air docling % docling https://arxiv.org/pdf/2206.01062 --document-timeout=100 --verbose
INFO:docling.document_converter:Going to convert document batch...
INFO:docling.pipeline.base_pipeline:Processing document 2206.01062v1.pdf
INFO:docling.document_converter:Finished converting document 2206.01062v1.pdf in 58.36 sec.
INFO:docling.cli.main:writing Markdown output to 2206.01062v1.md
INFO:docling.cli.main:Processed 1 docs, of which 0 failed
INFO:docling.cli.main:All documents were converted in 58.56 seconds.
(.venv) mario@Abhisheks-MacBook-Air docling % docling https://arxiv.org/pdf/2206.01062 --verbose
INFO:docling.document_converter:Going to convert document batch...
INFO:docling.pipeline.base_pipeline:Processing document 2206.01062v1.pdf
INFO:docling.document_converter:Finished converting document 2206.01062v1.pdf in 59.82 sec.
INFO:docling.cli.main:writing Markdown output to 2206.01062v1.md
INFO:docling.cli.main:Processed 1 docs, of which 0 failed
INFO:docling.cli.main:All documents were converted in 59.88 seconds.
(.venv) mario@Abhisheks-MacBook-Air docling % docling
Usage: docling [OPTIONS] source
╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * input_sources source PDF files to convert. Can be local file / directory paths or URL. [default: None] [required] │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --from [docx|pptx|html|image|pdf|asciido Specify input formats to convert │
│ c|md|xlsx] from. Defaults to all formats. │
│ [default: None] │
│ --to [md|json|html|text|doctags] Specify output formats. Defaults to │
│ Markdown. │
│ [default: None] │
│ --image-export-mode [placeholder|embedded|referenced] Image export mode for the document │
│ (only in case of JSON, Markdown or │
│ HTML). With `placeholder`, only the │
│ position of the image is marked in │
│ the output. In `embedded` mode, the │
│ image is embedded as base64 encoded │
│ string. In `referenced` mode, the │
│ image is exported in PNG format and │
│ referenced from the main exported │
│ document. │
│ [default: embedded] │
│ --ocr --no-ocr If enabled, the bitmap content will │
│ be processed using OCR. │
│ [default: ocr] │
│ --force-ocr --no-force-ocr Replace any existing text with OCR │
│ generated text over the full │
│ content. │
│ [default: no-force-ocr] │
│ --ocr-engine [easyocr|tesseract_cli|tesseract| The OCR engine to use. │
│ ocrmac|rapidocr] [default: easyocr] │
│ --ocr-lang TEXT Provide a comma-separated list of │
│ languages used by the OCR engine. │
│ Note that each OCR engine has │
│ different values for the language │
│ names. │
│ [default: None] │
│ --pdf-backend [pypdfium2|dlparse_v1|dlparse_v2] The PDF backend to use. │
│ [default: dlparse_v2] │
│ --table-mode [fast|accurate] The mode to use in the table │
│ structure model. │
│ [default: fast] │
│ --artifacts-path PATH If provided, the location of the │
│ model artifacts. │
│ [default: None] │
│ --abort-on-error --no-abort-on-error If enabled, the bitmap content will │
│ be processed using OCR. │
│ [default: no-abort-on-error] │
│ --output PATH Output directory where results are │
│ saved. │
│ [default: .] │
│ --verbose -v INTEGER Set the verbosity level. -v for │
│ info logging, -vv for debug │
│ logging. │
│ [default: 0] │
│ --debug-visualize-cells --no-debug-visualize-cells Enable debug output which │
│ visualizes the PDF cells │
│ [default: no-debug-visualize-cells] │
│ --debug-visualize-ocr --no-debug-visualize-ocr Enable debug output which │
│ visualizes the OCR cells │
│ [default: no-debug-visualize-ocr] │
│ --debug-visualize-layout --no-debug-visualize-layout Enable debug output which │
│ visualizes the layour clusters │
│ [default: │
│ no-debug-visualize-layout] │
│ --debug-visualize-tables --no-debug-visualize-tables Enable debug output which │
│ visualizes the table cells │
│ [default: │
│ no-debug-visualize-tables] │
│ --version Show version information. │
│ --document-timeout FLOAT The timeout for processing each │
│ document, in seconds. │
│ [default: None] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
* fix: main: Introduce format options for Image with the same pdf pipeline_options.
Add RapidOcrOptions to the Union of ocr_options for PdfPipelineOptions
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Silence the tqdm messages during the downloading of model files
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Code styling
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Use the HF API to disable the tqdm progress bars
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
---------
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* Move to_docling_document from ds-glm to this repo
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Upgrade to ds-glm 1.0 and docling-parse 3.0
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update lock
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix DP2 backend code, change CLI default backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* updated README
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* removed duck in title
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the index.md
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the cli to export html
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added html to cli
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* removed the duck emoji, added the in the cli. Currently, the referenced seems broken
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* cleaning up the comments
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reference is now working
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Clean up styling and docs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pin docling-core>=2.7.1
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
* adding rapidocr engine for ocr in docling
Signed-off-by: swayam-singhal <swayam.singhal@inito.com>
* fixing styling format
Signed-off-by: Swaymaw <swaymaw@gmail.com>
* updating pyproject.toml and poetry.lock to fix ci bugs
Signed-off-by: Swaymaw <swaymaw@gmail.com>
* help poetry pinning for python3.9
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* simplifying rapidocr options so that device can be changed using a single option for all models
Signed-off-by: Swaymaw <swaymaw@gmail.com>
* fix styling issues and small bug in rapidOcrOptions
Signed-off-by: Swaymaw <swaymaw@gmail.com>
* use default device until we enable global management
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: swayam-singhal <swayam.singhal@inito.com>
Signed-off-by: Swaymaw <swaymaw@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: swayam-singhal <swayam.singhal@inito.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* feat: add support for `ocrmac` OCR engine on macOS
- Integrates `ocrmac` as an OCR engine option for macOS users.
- Adds configuration options and dependencies for `ocrmac`.
- Updates documentation to reflect new engine support.
This change allows macOS users to utilize `ocrmac` for improved OCR performance and compatibility.
Signed-off-by: Suhwan Seo <nuridol@gmail.com>
* updated the poetry lock
Signed-off-by: Suhwan Seo <nuridol@gmail.com>
* Fix linting issues, update CLI docs, and add error for ocrmac use on non-Mac systems
- Resolved formatting and linting issues
- Updated `--ocr-engine` CLI option documentation for `ocrmac`
- Added RuntimeError for attempts to use `ocrmac` on non-Mac platforms
Signed-off-by: Suhwan Seo <nuridol@gmail.com>
* feat: add support for `ocrmac` OCR engine on macOS
- Integrates `ocrmac` as an OCR engine option for macOS users.
- Adds configuration options and dependencies for `ocrmac`.
- Updates documentation to reflect new engine support.
This change allows macOS users to utilize `ocrmac` for improved OCR performance and compatibility.
Signed-off-by: Suhwan Seo <nuridol@gmail.com>
* docs: update examples and installation for ocrmac support
- Added `OcrMacOptions` to `custom_convert.py` and `full_page_ocr.py` examples.
- Included usage comments and examples for `OcrMacOptions` in OCR pipelines.
- Updated installation guide to include instructions for installing `ocrmac`, noting macOS version requirements (10.15+).
- Highlighted that `ocrmac` leverages Apple's Vision framework as an OCR backend.
This enhances documentation for users working on macOS to leverage `ocrmac` effectively.
Signed-off-by: Suhwan Seo <nuridol@gmail.com>
* fix: update `ocrmac` dependency with macOS-specific marker
- Added `sys_platform == 'darwin'` marker to the `ocrmac` dependency in `pyproject.toml` to specify macOS compatibility.
- Updated the content hash in `poetry.lock` to reflect the changes.
This ensures the `ocrmac` dependency is only installed on macOS systems.
Signed-off-by: Suhwan Seo <nuridol@gmail.com>
---------
Signed-off-by: Suhwan Seo <nuridol@gmail.com>
Co-authored-by: Suhwan Seo <nuridol@gmail.com>
- When the OCR is forced, any existing PDF cells are rejected.
- Introduce the force-ocr cmd parameter in docling CLI.
- Update unit tests.
- Add the full_page_ocr.py example in mkdocs.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Specify encoding when writing output file to avoid errors when default target encoding doesn't have all characters. utf8 seems like the most universal and supported encoding. Otherwise, the cli fails with encoding errors when input file contains unicode text (basically most files nowadays) and the target system has default encoding set to some one-byte charset like cp1252
Signed-off-by: Johnny Salazar <cepera.ang@gmail.com>
* Support tableformer model choice
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update datamodel structure
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update docs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add test unit for table options
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Ensure import backwards-compatibility for PipelineOptions
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update README
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Adjust parameters on custom_convert
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Update Dockerfile
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>