From 2baa35c548dd6d15dba449eb1dc707f8f08c0a2a Mon Sep 17 00:00:00 2001
From: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Date: Wed, 17 Jul 2024 15:49:26 +0200
Subject: [PATCH] docs: reflect supported Python versions, add badges (#10)

* docs: reflect supported Python versions, add badges

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* minor HTML fix

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
---
 README.md | 55 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 30 insertions(+), 25 deletions(-)
diff --git a/README.md b/README.md
index f70c015..847ede6 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,18 @@
 <p align="center">
-  <a href="https://github.com/ds4sd/docling"> <img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/logo.png" width="150" /> </a>
+  <a href="https://github.com/ds4sd/docling"> <img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/logo.png" width="150" />
 </p>
 
 # Docling
 
+[![PyPI version](https://img.shields.io/pypi/v/docling)](https://pypi.org/project/docling/)
+![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12-blue)
+[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
+[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
+[![License MIT](https://img.shields.io/github/license/ds4sd/docling)](https://opensource.org/licenses/MIT)
+
 Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.
 
 ## Features
@@ -12,25 +21,20 @@ Docling bundles PDF document conversion to JSON and Markdown in an easy, self-co
 * 📝 Extracts metadata from the document, such as title, authors, references and language
 * 🔍 Optionally applies OCR (use with scanned PDFs)
 
-## Setup
+## Installation
 
-For general usage, you can simply install `docling` through `pip` from the pypi package index.
-```
+To use Docling, simply install `docling` from your package manager, e.g. pip:
+```bash
 pip install docling
 ```
 
-**Notes**:
-* Works on macOS and Linux environments. Windows platforms are currently not tested.
+> [!NOTE]  
+> Works on macOS and Linux environments. Windows platforms are currently not tested.
 
 ### Development setup
 
-To develop for `docling`, you need Python 3.11 and `poetry`. Install poetry from [here](https://python-poetry.org/docs/#installing-with-the-official-installer).
-
-Once you have `poetry` installed and cloned this repo, create an environment and install `docling` from the repo root:
-
+To develop for Docling, you need Python 3.11 / 3.12 and Poetry. You can then install from your local clone's root dir:
 ```bash
-poetry env use $(which python3.11)
-poetry shell
 poetry install
 ```
 
@@ -45,23 +49,24 @@ The output of the above command will be written to `./scratch`.
 
 ### Enable or disable pipeline features
 
-You can control if table structure recognition or OCR should be performed by arguments passed to `DocumentConverter` 
+You can control if table structure recognition or OCR should be performed by arguments passed to `DocumentConverter`:
 ```python
 doc_converter = DocumentConverter(
     artifacts_path=artifacts_path,
-    pipeline_options=PipelineOptions(do_table_structure=False, # Controls if table structure is recovered. 
-                                     do_ocr=True), # Controls if OCR is applied (ignores programmatic content)
+    pipeline_options=PipelineOptions(
+        do_table_structure=False,  # controls if table structure is recovered
+        do_ocr=True,  # controls if OCR is applied (ignores programmatic content)
+    ),
 )
 ```
 
 ### Impose limits on the document size
 
-You can limit the file size and number of pages which should be allowed to process per document.
+You can limit the file size and number of pages allowed per document:
 ```python
-paths = [Path("./test/data/2206.01062.pdf")]
-
-input = DocumentConversionInput.from_paths(
-    paths, limits=DocumentLimits(max_num_pages=100, max_file_size=20971520)
+conv_input = DocumentConversionInput.from_paths(
+    paths=[Path("./test/data/2206.01062.pdf")],
+    limits=DocumentLimits(max_num_pages=100, max_file_size=20971520)
 )
 ```
 
@@ -71,12 +76,12 @@ You can convert PDFs from a binary stream instead of from the filesystem as foll
 ```python
 buf = BytesIO(your_binary_stream)
 docs = [DocumentStream(filename="my_doc.pdf", stream=buf)]
-input = DocumentConversionInput.from_streams(docs)
-converted_docs = doc_converter.convert(input)
+conv_input = DocumentConversionInput.from_streams(docs)
+converted_docs = doc_converter.convert(conv_input)
 ```
 ### Limit resource usage
 
-You can limit the CPU threads used by `docling` by setting the environment variable `OMP_NUM_THREADS` accordingly. The default setting is using 4 CPU threads.
+You can limit the CPU threads used by Docling by setting the environment variable `OMP_NUM_THREADS` accordingly. By default, 4 CPU threads are used.
 
 
 ## Contributing
@@ -86,7 +91,7 @@ Please read [Contributing to Docling](https://github.com/DS4SD/docling/blob/main
 
 ## References
 
-If you use `Docling` in your projects, please consider citing the following:
+If you use Docling in your projects, please consider citing the following:
 
 ```bib
 @software{Docling,
@@ -101,5 +106,5 @@ year = {2024}
 
 ## License
 
-The `Docling` codebase is under MIT license.
+The Docling codebase is under MIT license.
 For individual model usage, please refer to the model licenses found in the original packages.