docs: reflect supported Python versions, add badges (#10)

* docs: reflect supported Python versions, add badges

  Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* minor HTML fix

  Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-07-17 15:49:26 +02:00
commit 2baa35c548
parent 0dfa4548d3
1 changed file with 30 additions and 25 deletions
--- a/README.md
+++ b/README.md
@@ -1,9 +1,18 @@
 <p align="center">
-  <a href="https://github.com/ds4sd/docling"> <img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/logo.png" width="150" /> </a>
+  <a href="https://github.com/ds4sd/docling"> <img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/logo.png" width="150" />
 </p>

 # Docling

+[![PyPI version](https://img.shields.io/pypi/v/docling)](https://pypi.org/project/docling/)
+![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12-blue)
+[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
+[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
+[![License MIT](https://img.shields.io/github/license/ds4sd/deepsearch-toolkit)](https://opensource.org/licenses/MIT)
+
 Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.

 ## Features
@@ -12,25 +21,20 @@ Docling bundles PDF document conversion to JSON and Markdown in an easy, self-co
 * 📝 Extracts metadata from the document, such as title, authors, references and language
 * 🔍 Optionally applies OCR (use with scanned PDFs)

-## Setup
+## Installation

-For general usage, you can simply install `docling` through `pip` from the pypi package index.
-```
+To use Docling, simply install `docling` from your package manager, e.g. pip:
+```bash
 pip install docling
 ```

-**Notes**:
-* Works on macOS and Linux environments. Windows platforms are currently not tested.
+> [!NOTE]  
+> Works on macOS and Linux environments. Windows platforms are currently not tested.

 ### Development setup

-To develop for `docling`, you need Python 3.11 and `poetry`. Install poetry from [here](https://python-poetry.org/docs/#installing-with-the-official-installer).
-
-Once you have `poetry` installed and cloned this repo, create an environment and install `docling` from the repo root:
-
+To develop for Docling, you need Python 3.11 / 3.12 and Poetry. You can then install from your local clone's root dir:
 ```bash
-poetry env use $(which python3.11)
-poetry shell
 poetry install
 ```

@@ -45,23 +49,24 @@ The output of the above command will be written to `./scratch`.

 ### Enable or disable pipeline features

-You can control if table structure recognition or OCR should be performed by arguments passed to `DocumentConverter` 
+You can control if table structure recognition or OCR should be performed by arguments passed to `DocumentConverter`:
 ```python
 doc_converter = DocumentConverter(
    artifacts_path=artifacts_path,
-    pipeline_options=PipelineOptions(do_table_structure=False, # Controls if table structure is recovered. 
-                                     do_ocr=True), # Controls if OCR is applied (ignores programmatic content)
+    pipeline_options=PipelineOptions(
+        do_table_structure=False,  # controls if table structure is recovered 
+        do_ocr=True,  # controls if OCR is applied (ignores programmatic content)
+    ),
 )
 ```

 ### Impose limits on the document size

-You can limit the file size and number of pages which should be allowed to process per document.
+You can limit the file size and number of pages which should be allowed to process per document:
 ```python
-paths = [Path("./test/data/2206.01062.pdf")]
-
-input = DocumentConversionInput.from_paths(
-    paths, limits=DocumentLimits(max_num_pages=100, max_file_size=20971520)
+conv_input = DocumentConversionInput.from_paths(
+    paths=[Path("./test/data/2206.01062.pdf")],
+    limits=DocumentLimits(max_num_pages=100, max_file_size=20971520)
 )
 ```

@@ -71,12 +76,12 @@ You can convert PDFs from a binary stream instead of from the filesystem as foll
 ```python
 buf = BytesIO(your_binary_stream)
 docs = [DocumentStream(filename="my_doc.pdf", stream=buf)]
-input = DocumentConversionInput.from_streams(docs)
-converted_docs = doc_converter.convert(input)
+conv_input = DocumentConversionInput.from_streams(docs)
+converted_docs = doc_converter.convert(conv_input)
 ```
 ### Limit resource usage

-You can limit the CPU threads used by `docling` by setting the environment variable `OMP_NUM_THREADS` accordingly. The default setting is using 4 CPU threads.
+You can limit the CPU threads used by Docling by setting the environment variable `OMP_NUM_THREADS` accordingly. The default setting is using 4 CPU threads.


 ## Contributing
@@ -86,7 +91,7 @@ Please read [Contributing to Docling](https://github.com/DS4SD/docling/blob/main

 ## References

-If you use `Docling` in your projects, please consider citing the following:
+If you use Docling in your projects, please consider citing the following:

 ```bib
@software{Docling,
@@ -101,5 +106,5 @@ year = {2024}

 ## License

-The `Docling` codebase is under MIT license.
+The Docling codebase is under MIT license.
 For individual model usage, please refer to the model licenses found in the original packages.
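
Note on two values in the README examples touched by this diff: `max_file_size=20971520` is 20 MiB expressed in bytes, and `OMP_NUM_THREADS` is read from the process environment. The sketch below is illustrative only and does not call any Docling API. The helper name `mib_to_bytes` and the thread count of 2 are hypothetical choices, and setting the variable before importing the library is an assumption about the safest ordering, not something the README states.

```python
import os


def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mib * 1024 * 1024


# 20 MiB, the value passed as max_file_size in the README example.
max_file_size = mib_to_bytes(20)

# Cap the CPU threads via the environment variable named in the README.
# Assumption: this is done before the library is imported, so the value
# is already in place when any threaded runtime initialises.
os.environ["OMP_NUM_THREADS"] = "2"
```

Writing the limit as `mib_to_bytes(20)` rather than the literal `20971520` keeps the intended size readable at the call site.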