Docling/tests/data/jats
Cesar Berrospi Ramis 428b656793
feat(xml-jats): parse XML JATS documents (#967)
* chore(xml-jats): separate authors and affiliations

In the XML PubMed (JATS) backend, convert authors and affiliations as they
are typically rendered in PDFs.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* fix(xml-jats): replace new line character by a space

Instead of removing newline characters from text, replace them with a space character.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* feat(xml-jats): improve existing parser and extend features

Partially support lists, respect reading order, parse more sections, support equations, and improve text formatting.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* chore(xml-jats): rename PubMed objects to JATS

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-02-17 10:43:31 +01:00
bmj_sample.xml feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
elife-56337.nxml feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
elife-56337.txt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
elife-56337.xml feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pnas_sample.xml feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pntd.0008301.nxml feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pntd.0008301.txt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pntd.0008301.xml feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pone.0234687.nxml feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pone.0234687.txt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pone.0234687.xml feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00