chore: update the with input formats and DoclingDocument (#188)
---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
parent f542460af3
commit 94a5290789
14
.github/workflows/cd-docs.yml
vendored
Normal file
@@ -0,0 +1,14 @@
+name: "Run Docs CD"
+
+on:
+  push:
+    branches:
+      - "main"
+
+jobs:
+  build-deploy-docs:
+    uses: ./.github/workflows/docs.yml
+    with:
+      deploy: true
+    permissions:
+      contents: write
6
.github/workflows/cd.yml
vendored
@@ -10,12 +10,6 @@ env:
 jobs:
   code-checks:
     uses: ./.github/workflows/checks.yml
-  build-deploy-docs:
-    uses: ./.github/workflows/docs.yml
-    with:
-      deploy: true
-    permissions:
-      contents: write
   pre-release-check:
     runs-on: ubuntu-latest
     outputs:
16
.github/workflows/ci-docs.yml
vendored
Normal file
@@ -0,0 +1,16 @@
+name: "Run Docs CI"
+
+on:
+  pull_request:
+    types: [opened, reopened, synchronize]
+  push:
+    branches:
+      - "**"
+      - "!gh-pages"
+
+jobs:
+  build-docs:
+    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
+    uses: ./.github/workflows/docs.yml
+    with:
+      deploy: false
6
.github/workflows/ci.yml
vendored
@@ -6,6 +6,7 @@ on:
   push:
     branches:
       - "**"
+      - "!main"
       - "!gh-pages"
 
 env:
@@ -16,8 +17,3 @@ jobs:
   code-checks:
     if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
     uses: ./.github/workflows/checks.yml
-  build-docs:
-    if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling') }}
-    uses: ./.github/workflows/docs.yml
-    with:
-      deploy: false
@@ -22,8 +22,9 @@ Docling parses documents and exports them to the desired format with ease and sp
 
 ## Features
 
-* 🗂️ Multi-format support for input (PDF, DOCX etc.) & output (Markdown, JSON etc.)
-* 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
+* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
+* 📑 Advanced PDF document understanding including page layout, reading order & table structures
+* 🧩 Unified, expressive [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) representation format
 * 📝 Metadata extraction, including title, authors, references & language
 * 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
@@ -7,6 +7,8 @@ pydantic datatype, which can express several features common to documents, such
 * Layout information (i.e. bounding boxes) for all items, if available
 * Provenance information
 
+The definition of the Pydantic types is implemented in the module `docling_core.types.doc`, more details in [source code definitions](https://github.com/DS4SD/docling-core/tree/main/docling_core/types/doc).
+
 It also brings a set of document construction APIs to build up a `DoclingDocument` from scratch.
 
 ## Example document structures
@@ -19,8 +19,9 @@ Docling parses documents and exports them to the desired format with ease and sp
 
 ## Features
 
-* 🗂️ Multi-format support for input (PDF, DOCX etc.) & output (Markdown, JSON etc.)
+* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
 * 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
+* 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
 * 📝 Metadata extraction, including title, authors, references & language
 * 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs