Christoph Auer
|
70d68b6164
|
feat: Add option to define page range (#852)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2025-01-31 15:23:00 +01:00 |
|
Panos Vagenas
|
88a0e66adc
|
feat: add Docling JSON ingestion (#783)
* feat: add Docling JSON ingestion
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
* update conversion as per review comments, add tests, revert Docling JSON disambiguation, document intricacies
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
* Update docling/backend/json/docling_json_backend.py
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
---------
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
|
2025-01-24 18:05:23 +01:00 |
|
Lucas Morin
|
fd034802b6
|
feat: Create a backend to transform PubMed XML files to DoclingDocument (#557)
Signed-off-by: lucas-morin <lucas.morin222@gmail.com>
|
2024-12-17 19:27:09 +01:00 |
|
Cesar Berrospi Ramis
|
4e087504cc
|
feat: create a backend to parse USPTO patents into DoclingDocument (#606)
* feat: add PATENT_USPTO as input format
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* feat: add USPTO backend parser
Add a backend implementation to parse patent applications and
grants from the United States Patent Office (USPTO).
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* refactor: change the name of the USPTO input format
Change the name of the patent USPTO input format to show the typical format (XML).
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* refactor: address several input formats with same mime type
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* refactor: group XML backend parsers in a subfolder
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* chore: add safe initialization of PatentUsptoDocumentBackend
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
|
2024-12-17 16:35:23 +01:00 |
|
Michele Dolfi
|
3496b4838f
|
fix: set valid=false for invalid backends (#171)
* fix: set valid=false for invalid backends
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Add test case for InputDocument
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-23 15:52:30 +02:00 |
|