Docling/CONTRIBUTING.md
Michele Dolfi 01fbfd5652
docs: Add testing in the docs (#1379)
* add testing to CONTRIBUTING
* document test generation
* typo

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-14 12:31:48 +02:00


Contributing In General

Our project welcomes external contributions. If you have an itch, please feel free to scratch it.

For more details on the contributing guidelines, head to the Docling Project community repository.

Developing

Usage of Poetry

We use Poetry to manage dependencies.

Installation

To install Poetry, follow the documentation here: https://python-poetry.org/docs/master/#installing-with-the-official-installer

  1. Install Poetry globally on your machine:

    curl -sSL https://install.python-poetry.org | python3 -
    

    The installation script prints the installation bin folder (referred to below as POETRY_BIN), which you need in the next steps.

  2. Make sure Poetry is in your $PATH:

    • for zsh:
      echo 'export PATH="POETRY_BIN:$PATH"' >> ~/.zshrc
      
    • for bash:
      echo 'export PATH="POETRY_BIN:$PATH"' >> ~/.bashrc
      
  3. The official guidelines linked above include useful details on configuring autocomplete for most shell environments, e.g., Bash and Zsh.
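As a minimal sketch of step 2, the snippet below adds the Poetry bin folder to PATH for the current shell session and then checks that the poetry command resolves. The helper name path_prepend is hypothetical, and the fallback value $HOME/.local/bin is an assumption; use the folder that the installer actually printed.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # otherwise put it first
    esac
}

# Assumption: fall back to $HOME/.local/bin when POETRY_BIN is not set.
# Replace this with the folder printed by the installation script.
POETRY_BIN="${POETRY_BIN:-$HOME/.local/bin}"
path_prepend "$POETRY_BIN"
export PATH

# Confirm that the shell can now find Poetry.
if command -v poetry >/dev/null 2>&1; then
    poetry --version
else
    echo "poetry not found in PATH; check POETRY_BIN=$POETRY_BIN" >&2
fi
```

Because the helper skips directories that are already listed, sourcing it repeatedly does not keep growing PATH.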

Create a Virtual Environment and Install Dependencies

To activate the Virtual Environment, run:

poetry shell

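This will spawn a shell with the Virtual Environment activated. If the Virtual Environment doesn't exist, Poetry will create one for you. Note that in Poetry 2.0 and later, the shell command is provided by the poetry-plugin-shell plugin rather than by Poetry itself. Then, to install dependencies, run: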

poetry install

(Advanced) Use a Specific Python Version

If you need to work with a specific (older) version of Python, run:

poetry env use $(which python3.8)

This creates a Virtual Environment with Python 3.8. For other versions, replace $(which python3.8) with the path to the interpreter (e.g., /usr/bin/python3.8) or use $(which pythonX.Y).
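If the requested interpreter is not installed, $(which pythonX.Y) expands to an empty string and the Poetry command fails with a less obvious error. The following sketch checks for the interpreter first; the helper name resolve_python is hypothetical and not part of Poetry.

```shell
# Hypothetical helper: print the full path of pythonX.Y, or fail with a message.
resolve_python() {
    if py_path="$(command -v "python$1")"; then
        printf '%s\n' "$py_path"
    else
        echo "python$1 is not installed or not in PATH" >&2
        return 1
    fi
}

# Example, to be run inside the repository:
#   interpreter="$(resolve_python 3.8)" && poetry env use "$interpreter"
```

The && in the example ensures that poetry env use only runs when an interpreter was actually found.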

Add a New Dependency

poetry add NAME

Coding Style Guidelines

We use the following tools to enforce code style:

  • isort, to sort imports
  • Black, to format code

We run a series of checks on the codebase on every commit using pre-commit. To install the hooks, run:

pre-commit install

To run the checks on-demand, run:

pre-commit run --all-files

Note: Checks like Black and isort will "fail" if they modify files, because pre-commit reports any hook that changes files as a failure. In these cases, git add the modified files and git commit again.
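The re-run described in the note can be sketched as a small shell function. The name check_with_retry is hypothetical; it runs a command once and, if that fails, stages changes to tracked files and runs the command a second time.

```shell
# Hypothetical helper: run a check; if it fails, stage tracked changes and retry once.
check_with_retry() {
    "$@" && return 0    # first pass succeeded, nothing more to do
    git add -u          # stage files modified by the hooks (tracked files only)
    "$@"                # second pass; its exit status is the final result
}

# Example, to be run inside the repository:
#   check_with_retry pre-commit run --all-files
```

If the second pass still fails, the failure is probably not a pure formatting change and needs a manual fix.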

Tests

When submitting a new feature or fix, please consider adding a short test for it.

Reference test documents

When a change improves the conversion results, multiple reference documents must be regenerated and reviewed.

The reference data can be regenerated with:

DOCLING_GEN_TEST_DATA=1 poetry run pytest

All PRs modifying the reference test data require a double review to guarantee we don't miss edge cases.
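To support that review, it can help to list exactly which files the regeneration touched. The sketch below uses only standard Git commands; the helper name changed_files is hypothetical, and it reports every changed file in the working tree, not only reference data.

```shell
# Hypothetical helper: list files that differ from the index in the working tree.
changed_files() {
    git diff --name-only                        # modified tracked files
    git ls-files --others --exclude-standard    # new, untracked files (honours .gitignore)
}

# Example, to be run inside the repository:
#   DOCLING_GEN_TEST_DATA=1 poetry run pytest   # regenerate the reference data
#   changed_files                               # see which files need review
```

Running changed_files before and after regeneration makes it easier to separate regenerated references from unrelated local edits.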

Documentation

We use MkDocs to write documentation.

To run the documentation server, run:

mkdocs serve

The server will be available at http://localhost:8000.

Pushing Documentation to GitHub Pages

Run the following:

mkdocs gh-deploy