chore: update README (#13)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
parent f09ffcc8f4
commit 28d1c746a6
13
README.md
@@ -1,5 +1,7 @@
 <p align="center">
-<a href="https://github.com/ds4sd/docling"> <img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/logo.png" width="150" />
+<a href="https://github.com/ds4sd/docling">
+<img loading="lazy" alt="Docling" src="https://github.com/DS4SD/docling/raw/main/logo.png" width="150" />
+</a>
 </p>
 
 # Docling
@@ -11,7 +13,7 @@
 [](https://pycqa.github.io/isort/)
 [](https://pydantic.dev)
 [](https://github.com/pre-commit/pre-commit)
 [](https://opensource.org/licenses/MIT)
 
 Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.
 
@@ -49,7 +51,7 @@ The output of the above command will be written to `./scratch`.
 
 ### Adjust pipeline features
 
-**Control pipeline options**
+#### Control pipeline options
 
 You can control if table structure recognition or OCR should be performed by arguments passed to `DocumentConverter`:
 ```python
@@ -62,16 +64,15 @@ doc_converter = DocumentConverter(
 )
 ```
 
-**Control table extraction options**
+#### Control table extraction options
 
 You can control if table structure recognition should map the recognized structure back to PDF cells (default) or use text cells from the structure prediction itself.
 This can improve output quality if you find that multiple columns in extracted tables are erroneously merged into one.
 
 
 ```python
 
 pipeline_options = PipelineOptions(do_table_structure=True)
-pipeline_options.table_structure_options.do_cell_matching = False # Uses text cells predicted from table structure model
+pipeline_options.table_structure_options.do_cell_matching = False # uses text cells predicted from table structure model
 
 doc_converter = DocumentConverter(
     artifacts_path=artifacts_path,