OCR for documents

Maturity: Beta. License: AGPL-3. Repository: OCA/knowledge.

This module was written to make uploaded documents, for example scans, searchable by running OCR on them.

It supports PDFs and every image format that Pillow can read.

Installation

To install this module, you need to:

  1. install Tesseract and the language data for the language(s) your documents use
  2. if you want to support OCR on PDFs, install ImageMagick
  3. install the module itself

On a Debian or Ubuntu system, you would typically run:

$ sudo apt-get install tesseract-ocr imagemagick
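
Before installing the module, it can help to confirm that the external programs are actually on the PATH of the user that runs Odoo. The following is a minimal sketch and not part of the module: the helper name find_ocr_tools is hypothetical, and it assumes the executables are called tesseract and convert (some ImageMagick 7 installations provide magick instead).

```python
import shutil


def find_ocr_tools(names=("tesseract", "convert")):
    """Map each executable name to its full path, or None if it is not on PATH.

    The default names are an assumption about how the tools are installed.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_ocr_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Run it as the same system user as the Odoo server, since PATH may differ between users.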

Configuration

To configure this module, go to:

  1. Settings/Technical/Parameters/System parameters and review the parameters whose names start with ocr.
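
The same parameters can also be inspected from an Odoo shell session. This is a rough sketch rather than anything the module ships: the function name list_ocr_parameters is hypothetical, and env is assumed to be the environment object that the Odoo shell provides.

```python
def list_ocr_parameters(env):
    """Return {key: value} for all system parameters whose key starts with 'ocr.'.

    `env` is assumed to be an Odoo environment, e.g. the one available in
    an Odoo shell session.
    """
    params = env["ir.config_parameter"].sudo().search([("key", "=like", "ocr.%")])
    return {param.key: param.value for param in params}
```

Which keys are returned depends on the installed version of the module, so the sketch does not assume any particular parameter exists.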

Usage

By default, character recognition is done asynchronously by a cron job that runs at night. Recognition takes a while, and running it in the background means users do not have to wait for indexing to finish. The interval of the cron job can be adjusted to your needs in the Scheduled Actions menu, under Settings. To force OCR to run immediately, set the configuration parameter ocr.synchronous to the value True.
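
For scripted deployments, the parameter can also be set from an Odoo shell instead of the user interface. The following is a minimal sketch under stated assumptions: enable_synchronous_ocr is a hypothetical helper name, env is assumed to be the Odoo shell environment, and the value is stored as the string "True" because system parameters are stored as text.

```python
def enable_synchronous_ocr(env):
    """Set the ocr.synchronous system parameter to "True".

    `env` is assumed to be an Odoo environment (for example from an Odoo shell).
    """
    env["ir.config_parameter"].sudo().set_param("ocr.synchronous", "True")
```

In an Odoo shell the change is only persisted once the transaction is committed, for example with env.cr.commit().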

Bug Tracker

Bugs are tracked on GitHub Issues. In case of trouble, please check there whether your issue has already been reported. If you spotted it first, help us smash it by providing detailed feedback.

Do not contact contributors directly about support or help with technical issues.

Credits

Authors

  • Therp BV

Maintainers

This module is maintained by the OCA.

Odoo Community Association

OCA, or the Odoo Community Association, is a nonprofit organization whose mission is to support the collaborative development of Odoo features and promote its widespread use.

This module is part of the OCA/knowledge project on GitHub.

You are welcome to contribute. To learn how, please visit https://odoo-community.org/page/Contribute.