Docling/tests/data/2305.03393v1.pages.json
Peter W. J. Staar 48f4d1ba52
fix: Add unit tests (#51)
* add the pytests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* renamed the test folder and added the toplevel test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the toplevel function test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* need to start running all tests successfully

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the reference converted documents

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added first test for json and md output

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran pre-commit

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* replaced deprecated json function with model_dump_json

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Fix backend tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* commented out the drawing

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ci: avoid duplicate runs

Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

* commented out json verification for now

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added verification of input cells

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformat code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added test to verify the cells in the pages

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added test to verify the cells in the pages (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added test to verify the cells in the pages (3)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* run all examples in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* make sure examples return failures

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* raise a failure if examples fail

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* run examples after tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Add tests and update top_level_tests using only datamodels

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove unnecessary code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Validate conversion status on e2e test

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* package verify utils and add more tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* reduce docs in example, since they are already in the tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* skip batch_convert

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* pin docling-parse 1.1.2

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* updated the error messages

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* commented out the json verification for now

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* bumped GLM version

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Fix lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Pin new docling-parse v1.1.3

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-30 14:08:20 +02:00

1 line
954 KiB
JSON
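
The commit message above mentions tests that "verify the cells in the pages". As a rough illustration of what such a check could look like against the fixture below, here is a minimal sketch using only the standard library. The helper names `cluster_cells_match` and `cluster_text` are hypothetical and are not taken from the Docling test suite; the sample page is a hand-trimmed excerpt of page 0 with bounding-box values rounded for brevity.

```python
# Hypothetical helpers, not the Docling test utilities.
# PAGE is a hand-trimmed excerpt of page 0 of the fixture (bbox values rounded).
PAGE = {
    "page_no": 0,
    "size": {"width": 612.0, "height": 792.0},
    "cells": [
        {"id": 0, "text": "Optimized Table Tokenization for Table Structure",
         "bbox": {"l": 134.765, "t": 115.899, "r": 480.597, "b": 128.581,
                  "coord_origin": "1"}},
        {"id": 1, "text": "Recognition",
         "bbox": {"l": 266.675, "t": 133.832, "r": 348.685, "b": 146.514,
                  "coord_origin": "1"}},
    ],
}
# In the fixture, each layout cluster repeats the cells it covers.
PAGE["predictions"] = {"layout": {"clusters": [
    {"id": 0, "label": "Section-header", "confidence": 0.92,
     "bbox": {"l": 134.613, "t": 115.195, "r": 480.597, "b": 147.381,
              "coord_origin": "1"},
     "cells": [dict(cell) for cell in PAGE["cells"]]},
]}}


def cluster_cells_match(page):
    """Return True if every cell listed in a cluster equals the page-level cell with the same id."""
    by_id = {cell["id"]: cell for cell in page["cells"]}
    for cluster in page["predictions"]["layout"]["clusters"]:
        for cell in cluster["cells"]:
            if by_id.get(cell["id"]) != cell:
                return False
    return True


def cluster_text(cluster):
    """Join the cell texts of a cluster with single spaces, in listed order."""
    return " ".join(cell["text"] for cell in cluster["cells"])


if __name__ == "__main__":
    first = PAGE["predictions"]["layout"]["clusters"][0]
    print(cluster_cells_match(PAGE))
    print(cluster_text(first))
```

For the title cluster, joining the cell texts with spaces gives the same string as the `text` field of the corresponding assembled element in the fixture. Whether the real pipeline always assembles text this way (for example for hyphenated line ends) is not something this excerpt establishes.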

[{"page_no": 0, "page_hash": "7d7ef24bf2a048bcc229d37583b737ee85f67a02864236764abcaca9eabc8b68", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure", "bbox": {"l": 134.765, "t": 115.89910999999995, "r": 480.59735, "b": 128.58112000000006, "coord_origin": "1"}}, {"id": 1, "text": "Recognition", "bbox": {"l": 266.67499, "t": 133.83209, "r": 348.68506, "b": 146.51409999999998, "coord_origin": "1"}}, {"id": 2, "text": "Maksym Lysak", "bbox": {"l": 151.22598, "t": 171.67371000000003, "r": 217.04390999999998, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 3, "text": "[0000", "bbox": {"l": 217.04599, "t": 170.08209, "r": 235.18764, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 4, "text": "\u2212", "bbox": {"l": 235.18598999999998, "t": 169.69159000000002, "r": 241.4129, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 5, "text": "0002", "bbox": {"l": 241.41299000000004, "t": 170.08209, "r": 257.29932, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 6, "text": "\u2212", "bbox": {"l": 257.298, "t": 169.69159000000002, "r": 263.5249, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 7, "text": "3723", "bbox": {"l": 263.52499, "t": 170.08209, "r": 279.41132, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 8, "text": "\u2212", "bbox": {"l": 279.41, "t": 169.69159000000002, "r": 285.6369, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 9, "text": "$^{6960]}$, Ahmed Nassar[0000", "bbox": {"l": 285.63602, "t": 171.67371000000003, "r": 392.27664, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 10, "text": "\u2212", "bbox": {"l": 392.27502, "t": 169.69159000000002, "r": 398.50192, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 11, "text": "0002", "bbox": {"l": 398.50201, "t": 170.08209, "r": 414.38834, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 12, "text": "\u2212", "bbox": {"l": 414.38702, "t": 169.69159000000002, "r": 420.61392, "b": 
176.27484000000004, "coord_origin": "1"}}, {"id": 13, "text": "9468", "bbox": {"l": 420.61304, "t": 170.08209, "r": 436.49936, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 14, "text": "\u2212", "bbox": {"l": 436.49805000000003, "t": 169.69159000000002, "r": 442.72495000000004, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 15, "text": "$^{0822]}$,", "bbox": {"l": 442.72504, "t": 171.67371000000003, "r": 464.12963999999994, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 16, "text": "Nikolaos Livathinos", "bbox": {"l": 139.34305, "t": 183.62872000000004, "r": 224.80720999999997, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 17, "text": "[0000", "bbox": {"l": 224.80704000000003, "t": 182.03814999999997, "r": 242.94868, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 18, "text": "\u2212", "bbox": {"l": 242.94704000000002, "t": 181.64764000000002, "r": 249.17394999999996, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 19, "text": "0001", "bbox": {"l": 249.17404000000002, "t": 182.03814999999997, "r": 265.06036, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 20, "text": "\u2212", "bbox": {"l": 265.05905, "t": 181.64764000000002, "r": 271.28595, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 21, "text": "8513", "bbox": {"l": 271.28506, "t": 182.03814999999997, "r": 287.17139, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 22, "text": "\u2212", "bbox": {"l": 287.17007, "t": 181.64764000000002, "r": 293.39697, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 23, "text": "$^{3491]}$, Christoph Auer[0000", "bbox": {"l": 293.39706, "t": 183.62872000000004, "r": 404.1597, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 24, "text": "\u2212", "bbox": {"l": 404.15808, "t": 181.64764000000002, "r": 410.38498, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 25, "text": "0001", "bbox": {"l": 410.38507, "t": 182.03814999999997, "r": 426.27139, "b": 188.23090000000002, "coord_origin": "1"}}, 
{"id": 26, "text": "\u2212", "bbox": {"l": 426.27008, "t": 181.64764000000002, "r": 432.49697999999995, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 27, "text": "5761", "bbox": {"l": 432.49609, "t": 182.03814999999997, "r": 448.3824200000001, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 28, "text": "\u2212", "bbox": {"l": 448.3811, "t": 181.64764000000002, "r": 454.608, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 29, "text": "$^{0422]}$,", "bbox": {"l": 454.60808999999995, "t": 183.62872000000004, "r": 476.01270000000005, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 30, "text": "and Peter Staar", "bbox": {"l": 229.52109000000002, "t": 195.58374000000003, "r": 298.6087, "b": 204.38073999999995, "coord_origin": "1"}}, {"id": 31, "text": "[0000", "bbox": {"l": 298.60608, "t": 193.99316, "r": 316.74771, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 32, "text": "\u2212", "bbox": {"l": 316.74609, "t": 193.60266000000001, "r": 322.97299, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 33, "text": "0002", "bbox": {"l": 322.97308, "t": 193.99316, "r": 338.85941, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 34, "text": "\u2212", "bbox": {"l": 338.85809, "t": 193.60266000000001, "r": 345.08499, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 35, "text": "8088", "bbox": {"l": 345.08508, "t": 193.99316, "r": 360.97141, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 36, "text": "\u2212", "bbox": {"l": 360.97009, "t": 193.60266000000001, "r": 367.19699, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 37, "text": "0823]", "bbox": {"l": 367.19611, "t": 193.99316, "r": 385.33774, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 38, "text": "IBM Research", "bbox": {"l": 279.1051, "t": 217.20398, "r": 336.25153, "b": 225.27368, "coord_origin": "1"}}, {"id": 39, "text": "{mly,ahn,nli,cau,taa}@zurich.ibm.com", "bbox": {"l": 222.96609, "t": 228.80853000000002, "r": 392.38983, "b": 
236.27752999999996, "coord_origin": "1"}}, {"id": 40, "text": "Abstract.", "bbox": {"l": 163.1111, "t": 270.30115, "r": 206.6358, "b": 278.22748, "coord_origin": "1"}}, {"id": 41, "text": "Extracting tables from documents is a crucial task in any", "bbox": {"l": 211.6171, "t": 270.36395000000005, "r": 452.2447199999999, "b": 278.43364999999994, "coord_origin": "1"}}, {"id": 42, "text": "document conversion pipeline. Recently, transformer-based models have", "bbox": {"l": 163.1111, "t": 281.3229099999999, "r": 452.24246, "b": 289.39267, "coord_origin": "1"}}, {"id": 43, "text": "demonstrated that table-structure can be recognized with impressive ac-", "bbox": {"l": 163.1111, "t": 292.28189, "r": 452.24792, "b": 300.35165000000006, "coord_origin": "1"}}, {"id": 44, "text": "curacy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking", "bbox": {"l": 163.1111, "t": 303.24088, "r": 452.2407799999999, "b": 311.31064, "coord_origin": "1"}}, {"id": 45, "text": "only the image of a table, such models predict a sequence of tokens (e.g.", "bbox": {"l": 163.1111, "t": 314.19888, "r": 452.24609, "b": 322.26865, "coord_origin": "1"}}, {"id": 46, "text": "in HTML, LaTeX) which represent the structure of the table. 
Since the", "bbox": {"l": 163.1111, "t": 325.15787, "r": 452.24615000000006, "b": 333.22763, "coord_origin": "1"}}, {"id": 47, "text": "token representation of the table structure has a significant impact on", "bbox": {"l": 163.1111, "t": 336.11685, "r": 452.24707, "b": 344.18661, "coord_origin": "1"}}, {"id": 48, "text": "the accuracy and run-time performance of any Im2Seq model, we inves-", "bbox": {"l": 163.1111, "t": 347.07584, "r": 452.2459999999999, "b": 355.1456, "coord_origin": "1"}}, {"id": 49, "text": "tigate in this paper how table-structure representation can be optimised.", "bbox": {"l": 163.1111, "t": 358.03482, "r": 452.2479900000001, "b": 366.10458, "coord_origin": "1"}}, {"id": 50, "text": "We propose a new, optimised table-structure language (OTSL) with a", "bbox": {"l": 163.1111, "t": 368.9938, "r": 452.24609, "b": 377.06357, "coord_origin": "1"}}, {"id": 51, "text": "minimized vocabulary and specific rules. The benefits of OTSL are that", "bbox": {"l": 163.1111, "t": 379.95279, "r": 452.2417, "b": 388.02255, "coord_origin": "1"}}, {"id": 52, "text": "it reduces the number of tokens to 5 (HTML needs 28+) and shortens", "bbox": {"l": 163.1111, "t": 390.91177, "r": 452.2443200000001, "b": 398.98154, "coord_origin": "1"}}, {"id": 53, "text": "the sequence length to half of HTML on average. Consequently, model", "bbox": {"l": 163.1111, "t": 401.87076, "r": 452.24878000000007, "b": 409.94052, "coord_origin": "1"}}, {"id": 54, "text": "accuracy improves significantly, inference time is halved compared to", "bbox": {"l": 163.1111, "t": 412.82974, "r": 452.24063000000007, "b": 420.8995100000001, "coord_origin": "1"}}, {"id": 55, "text": "HTML-based models, and the predicted table structures are always syn-", "bbox": {"l": 163.1111, "t": 423.78774999999996, "r": 452.24161, "b": 431.85751000000005, "coord_origin": "1"}}, {"id": 56, "text": "tactically correct. 
This in turn eliminates most post-processing needs.", "bbox": {"l": 163.1111, "t": 434.74673, "r": 452.24429, "b": 442.8165, "coord_origin": "1"}}, {"id": 57, "text": "Popular table structure data-sets will be published in OTSL format to", "bbox": {"l": 163.1111, "t": 445.70572000000004, "r": 452.24603, "b": 453.77547999999996, "coord_origin": "1"}}, {"id": 58, "text": "the community.", "bbox": {"l": 163.1111, "t": 456.6647, "r": 225.56116, "b": 464.73447, "coord_origin": "1"}}, {"id": 59, "text": "Keywords:", "bbox": {"l": 163.1111, "t": 478.69394, "r": 211.94211, "b": 486.62024, "coord_origin": "1"}}, {"id": 60, "text": "Table Structure Recognition \u00b7 Data Representation \u00b7 Trans-", "bbox": {"l": 216.55208999999996, "t": 478.75671, "r": 452.24158, "b": 486.82648, "coord_origin": "1"}}, {"id": 61, "text": "formers \u00b7 Optimization.", "bbox": {"l": 163.11111, "t": 489.71573, "r": 257.64185, "b": 497.78549, "coord_origin": "1"}}, {"id": 62, "text": "1", "bbox": {"l": 134.76512, "t": 522.11969, "r": 141.48872, "b": 532.68808, "coord_origin": "1"}}, {"id": 63, "text": "Introduction", "bbox": {"l": 154.93832, "t": 522.11969, "r": 228.93384, "b": 532.68808, "coord_origin": "1"}}, {"id": 64, "text": "Tables are ubiquitous in documents such as scientific papers, patents, reports,", "bbox": {"l": 134.76512, "t": 548.2865400000001, "r": 480.5939, "b": 557.0835099999999, "coord_origin": "1"}}, {"id": 65, "text": "manuals, specification sheets or marketing material. 
They often encode highly", "bbox": {"l": 134.76512, "t": 560.24254, "r": 480.59180000000003, "b": 569.0395100000001, "coord_origin": "1"}}, {"id": 66, "text": "valuable information and therefore need to be extracted with high accuracy.", "bbox": {"l": 134.76512, "t": 572.19754, "r": 480.59283000000005, "b": 580.99451, "coord_origin": "1"}}, {"id": 67, "text": "Unfortunately, tables appear in documents in various sizes, styling and struc-", "bbox": {"l": 134.76512, "t": 584.15254, "r": 480.5959500000001, "b": 592.9495099999999, "coord_origin": "1"}}, {"id": 68, "text": "ture, making it difficult to recover their correct structure with simple analyt-", "bbox": {"l": 134.76512, "t": 596.10754, "r": 480.58688, "b": 604.90451, "coord_origin": "1"}}, {"id": 69, "text": "ical methods. Therefore, accurate table extraction is achieved these days with", "bbox": {"l": 134.76512, "t": 608.06255, "r": 480.59088, "b": 616.85951, "coord_origin": "1"}}, {"id": 70, "text": "machine-learning based methods.", "bbox": {"l": 134.76512, "t": 620.01755, "r": 279.32745, "b": 628.81451, "coord_origin": "1"}}, {"id": 71, "text": "In modern document understanding systems [1,15], table extraction is typi-", "bbox": {"l": 149.70811, "t": 632.14755, "r": 480.58899, "b": 640.94452, "coord_origin": "1"}}, {"id": 72, "text": "cally a two-step process. Firstly, every table on a page is located with a bounding", "bbox": {"l": 134.76512, "t": 644.1025500000001, "r": 480.59583, "b": 652.8995199999999, "coord_origin": "1"}}, {"id": 73, "text": "box, and secondly, their logical row and column structure is recognized. 
As of", "bbox": {"l": 134.76512, "t": 656.05756, "r": 480.59496999999993, "b": 664.85453, "coord_origin": "1"}}, {"id": 74, "text": "arXiv:2305.03393v1 [cs.CV] 5 May 2023", "bbox": {"l": 18.340218, "t": 209.47997999999995, "r": 36.339787, "b": 555.00003, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Section-header", "bbox": {"l": 134.61328639984131, "t": 115.19469738006592, "r": 480.59735, "b": 147.38129138946533, "coord_origin": "1"}, "confidence": 0.920151948928833, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure", "bbox": {"l": 134.765, "t": 115.89910999999995, "r": 480.59735, "b": 128.58112000000006, "coord_origin": "1"}}, {"id": 1, "text": "Recognition", "bbox": {"l": 266.67499, "t": 133.83209, "r": 348.68506, "b": 146.51409999999998, "coord_origin": "1"}}]}, {"id": 1, "label": "Text", "bbox": {"l": 138.6561770439148, "t": 168.9183345794678, "r": 476.0571910858154, "b": 204.38073999999995, "coord_origin": "1"}, "confidence": 0.9266575574874878, "cells": [{"id": 2, "text": "Maksym Lysak", "bbox": {"l": 151.22598, "t": 171.67371000000003, "r": 217.04390999999998, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 3, "text": "[0000", "bbox": {"l": 217.04599, "t": 170.08209, "r": 235.18764, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 4, "text": "\u2212", "bbox": {"l": 235.18598999999998, "t": 169.69159000000002, "r": 241.4129, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 5, "text": "0002", "bbox": {"l": 241.41299000000004, "t": 170.08209, "r": 257.29932, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 6, "text": "\u2212", "bbox": {"l": 257.298, "t": 169.69159000000002, "r": 263.5249, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 7, "text": "3723", "bbox": {"l": 263.52499, "t": 170.08209, "r": 279.41132, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 8, "text": "\u2212", "bbox": {"l": 279.41, "t": 169.69159000000002, "r": 285.6369, "b": 
176.27484000000004, "coord_origin": "1"}}, {"id": 9, "text": "$^{6960]}$, Ahmed Nassar[0000", "bbox": {"l": 285.63602, "t": 171.67371000000003, "r": 392.27664, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 10, "text": "\u2212", "bbox": {"l": 392.27502, "t": 169.69159000000002, "r": 398.50192, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 11, "text": "0002", "bbox": {"l": 398.50201, "t": 170.08209, "r": 414.38834, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 12, "text": "\u2212", "bbox": {"l": 414.38702, "t": 169.69159000000002, "r": 420.61392, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 13, "text": "9468", "bbox": {"l": 420.61304, "t": 170.08209, "r": 436.49936, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 14, "text": "\u2212", "bbox": {"l": 436.49805000000003, "t": 169.69159000000002, "r": 442.72495000000004, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 15, "text": "$^{0822]}$,", "bbox": {"l": 442.72504, "t": 171.67371000000003, "r": 464.12963999999994, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 16, "text": "Nikolaos Livathinos", "bbox": {"l": 139.34305, "t": 183.62872000000004, "r": 224.80720999999997, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 17, "text": "[0000", "bbox": {"l": 224.80704000000003, "t": 182.03814999999997, "r": 242.94868, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 18, "text": "\u2212", "bbox": {"l": 242.94704000000002, "t": 181.64764000000002, "r": 249.17394999999996, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 19, "text": "0001", "bbox": {"l": 249.17404000000002, "t": 182.03814999999997, "r": 265.06036, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 20, "text": "\u2212", "bbox": {"l": 265.05905, "t": 181.64764000000002, "r": 271.28595, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 21, "text": "8513", "bbox": {"l": 271.28506, "t": 182.03814999999997, "r": 287.17139, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 22, 
"text": "\u2212", "bbox": {"l": 287.17007, "t": 181.64764000000002, "r": 293.39697, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 23, "text": "$^{3491]}$, Christoph Auer[0000", "bbox": {"l": 293.39706, "t": 183.62872000000004, "r": 404.1597, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 24, "text": "\u2212", "bbox": {"l": 404.15808, "t": 181.64764000000002, "r": 410.38498, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 25, "text": "0001", "bbox": {"l": 410.38507, "t": 182.03814999999997, "r": 426.27139, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 26, "text": "\u2212", "bbox": {"l": 426.27008, "t": 181.64764000000002, "r": 432.49697999999995, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 27, "text": "5761", "bbox": {"l": 432.49609, "t": 182.03814999999997, "r": 448.3824200000001, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 28, "text": "\u2212", "bbox": {"l": 448.3811, "t": 181.64764000000002, "r": 454.608, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 29, "text": "$^{0422]}$,", "bbox": {"l": 454.60808999999995, "t": 183.62872000000004, "r": 476.01270000000005, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 30, "text": "and Peter Staar", "bbox": {"l": 229.52109000000002, "t": 195.58374000000003, "r": 298.6087, "b": 204.38073999999995, "coord_origin": "1"}}, {"id": 31, "text": "[0000", "bbox": {"l": 298.60608, "t": 193.99316, "r": 316.74771, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 32, "text": "\u2212", "bbox": {"l": 316.74609, "t": 193.60266000000001, "r": 322.97299, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 33, "text": "0002", "bbox": {"l": 322.97308, "t": 193.99316, "r": 338.85941, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 34, "text": "\u2212", "bbox": {"l": 338.85809, "t": 193.60266000000001, "r": 345.08499, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 35, "text": "8088", "bbox": {"l": 345.08508, "t": 193.99316, "r": 360.97141, "b": 
200.18591000000004, "coord_origin": "1"}}, {"id": 36, "text": "\u2212", "bbox": {"l": 360.97009, "t": 193.60266000000001, "r": 367.19699, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 37, "text": "0823]", "bbox": {"l": 367.19611, "t": 193.99316, "r": 385.33774, "b": 200.18591000000004, "coord_origin": "1"}}]}, {"id": 2, "label": "Text", "bbox": {"l": 222.96609, "t": 216.0551582336426, "r": 392.69108963012695, "b": 236.3769641876221, "coord_origin": "1"}, "confidence": 0.802741527557373, "cells": [{"id": 38, "text": "IBM Research", "bbox": {"l": 279.1051, "t": 217.20398, "r": 336.25153, "b": 225.27368, "coord_origin": "1"}}, {"id": 39, "text": "{mly,ahn,nli,cau,taa}@zurich.ibm.com", "bbox": {"l": 222.96609, "t": 228.80853000000002, "r": 392.38983, "b": 236.27752999999996, "coord_origin": "1"}}]}, {"id": 3, "label": "Text", "bbox": {"l": 162.13674287796022, "t": 269.4665313720702, "r": 452.41988639831544, "b": 464.73447, "coord_origin": "1"}, "confidence": 0.9719225168228149, "cells": [{"id": 40, "text": "Abstract.", "bbox": {"l": 163.1111, "t": 270.30115, "r": 206.6358, "b": 278.22748, "coord_origin": "1"}}, {"id": 41, "text": "Extracting tables from documents is a crucial task in any", "bbox": {"l": 211.6171, "t": 270.36395000000005, "r": 452.2447199999999, "b": 278.43364999999994, "coord_origin": "1"}}, {"id": 42, "text": "document conversion pipeline. Recently, transformer-based models have", "bbox": {"l": 163.1111, "t": 281.3229099999999, "r": 452.24246, "b": 289.39267, "coord_origin": "1"}}, {"id": 43, "text": "demonstrated that table-structure can be recognized with impressive ac-", "bbox": {"l": 163.1111, "t": 292.28189, "r": 452.24792, "b": 300.35165000000006, "coord_origin": "1"}}, {"id": 44, "text": "curacy using Image-to-Markup-Sequence (Im2Seq) approaches. 
Taking", "bbox": {"l": 163.1111, "t": 303.24088, "r": 452.2407799999999, "b": 311.31064, "coord_origin": "1"}}, {"id": 45, "text": "only the image of a table, such models predict a sequence of tokens (e.g.", "bbox": {"l": 163.1111, "t": 314.19888, "r": 452.24609, "b": 322.26865, "coord_origin": "1"}}, {"id": 46, "text": "in HTML, LaTeX) which represent the structure of the table. Since the", "bbox": {"l": 163.1111, "t": 325.15787, "r": 452.24615000000006, "b": 333.22763, "coord_origin": "1"}}, {"id": 47, "text": "token representation of the table structure has a significant impact on", "bbox": {"l": 163.1111, "t": 336.11685, "r": 452.24707, "b": 344.18661, "coord_origin": "1"}}, {"id": 48, "text": "the accuracy and run-time performance of any Im2Seq model, we inves-", "bbox": {"l": 163.1111, "t": 347.07584, "r": 452.2459999999999, "b": 355.1456, "coord_origin": "1"}}, {"id": 49, "text": "tigate in this paper how table-structure representation can be optimised.", "bbox": {"l": 163.1111, "t": 358.03482, "r": 452.2479900000001, "b": 366.10458, "coord_origin": "1"}}, {"id": 50, "text": "We propose a new, optimised table-structure language (OTSL) with a", "bbox": {"l": 163.1111, "t": 368.9938, "r": 452.24609, "b": 377.06357, "coord_origin": "1"}}, {"id": 51, "text": "minimized vocabulary and specific rules. The benefits of OTSL are that", "bbox": {"l": 163.1111, "t": 379.95279, "r": 452.2417, "b": 388.02255, "coord_origin": "1"}}, {"id": 52, "text": "it reduces the number of tokens to 5 (HTML needs 28+) and shortens", "bbox": {"l": 163.1111, "t": 390.91177, "r": 452.2443200000001, "b": 398.98154, "coord_origin": "1"}}, {"id": 53, "text": "the sequence length to half of HTML on average. 
Consequently, model", "bbox": {"l": 163.1111, "t": 401.87076, "r": 452.24878000000007, "b": 409.94052, "coord_origin": "1"}}, {"id": 54, "text": "accuracy improves significantly, inference time is halved compared to", "bbox": {"l": 163.1111, "t": 412.82974, "r": 452.24063000000007, "b": 420.8995100000001, "coord_origin": "1"}}, {"id": 55, "text": "HTML-based models, and the predicted table structures are always syn-", "bbox": {"l": 163.1111, "t": 423.78774999999996, "r": 452.24161, "b": 431.85751000000005, "coord_origin": "1"}}, {"id": 56, "text": "tactically correct. This in turn eliminates most post-processing needs.", "bbox": {"l": 163.1111, "t": 434.74673, "r": 452.24429, "b": 442.8165, "coord_origin": "1"}}, {"id": 57, "text": "Popular table structure data-sets will be published in OTSL format to", "bbox": {"l": 163.1111, "t": 445.70572000000004, "r": 452.24603, "b": 453.77547999999996, "coord_origin": "1"}}, {"id": 58, "text": "the community.", "bbox": {"l": 163.1111, "t": 456.6647, "r": 225.56116, "b": 464.73447, "coord_origin": "1"}}]}, {"id": 4, "label": "Text", "bbox": {"l": 162.67949237823487, "t": 477.7591037750244, "r": 452.24158, "b": 498.1963966369629, "coord_origin": "1"}, "confidence": 0.9469619989395142, "cells": [{"id": 59, "text": "Keywords:", "bbox": {"l": 163.1111, "t": 478.69394, "r": 211.94211, "b": 486.62024, "coord_origin": "1"}}, {"id": 60, "text": "Table Structure Recognition \u00b7 Data Representation \u00b7 Trans-", "bbox": {"l": 216.55208999999996, "t": 478.75671, "r": 452.24158, "b": 486.82648, "coord_origin": "1"}}, {"id": 61, "text": "formers \u00b7 Optimization.", "bbox": {"l": 163.11111, "t": 489.71573, "r": 257.64185, "b": 497.78549, "coord_origin": "1"}}]}, {"id": 5, "label": "Section-header", "bbox": {"l": 134.76512, "t": 521.4849472045898, "r": 228.93384, "b": 532.68808, "coord_origin": "1"}, "confidence": 0.9437072277069092, "cells": [{"id": 62, "text": "1", "bbox": {"l": 134.76512, "t": 522.11969, "r": 141.48872, "b": 
532.68808, "coord_origin": "1"}}, {"id": 63, "text": "Introduction", "bbox": {"l": 154.93832, "t": 522.11969, "r": 228.93384, "b": 532.68808, "coord_origin": "1"}}]}, {"id": 6, "label": "Text", "bbox": {"l": 134.0102459907532, "t": 547.7120315551757, "r": 480.5959500000001, "b": 628.8722877502441, "coord_origin": "1"}, "confidence": 0.9859005212783813, "cells": [{"id": 64, "text": "Tables are ubiquitous in documents such as scientific papers, patents, reports,", "bbox": {"l": 134.76512, "t": 548.2865400000001, "r": 480.5939, "b": 557.0835099999999, "coord_origin": "1"}}, {"id": 65, "text": "manuals, specification sheets or marketing material. They often encode highly", "bbox": {"l": 134.76512, "t": 560.24254, "r": 480.59180000000003, "b": 569.0395100000001, "coord_origin": "1"}}, {"id": 66, "text": "valuable information and therefore need to be extracted with high accuracy.", "bbox": {"l": 134.76512, "t": 572.19754, "r": 480.59283000000005, "b": 580.99451, "coord_origin": "1"}}, {"id": 67, "text": "Unfortunately, tables appear in documents in various sizes, styling and struc-", "bbox": {"l": 134.76512, "t": 584.15254, "r": 480.5959500000001, "b": 592.9495099999999, "coord_origin": "1"}}, {"id": 68, "text": "ture, making it difficult to recover their correct structure with simple analyt-", "bbox": {"l": 134.76512, "t": 596.10754, "r": 480.58688, "b": 604.90451, "coord_origin": "1"}}, {"id": 69, "text": "ical methods. 
Therefore, accurate table extraction is achieved these days with", "bbox": {"l": 134.76512, "t": 608.06255, "r": 480.59088, "b": 616.85951, "coord_origin": "1"}}, {"id": 70, "text": "machine-learning based methods.", "bbox": {"l": 134.76512, "t": 620.01755, "r": 279.32745, "b": 628.81451, "coord_origin": "1"}}]}, {"id": 7, "label": "Text", "bbox": {"l": 134.04418516159058, "t": 631.6932197570801, "r": 480.7483549118042, "b": 665.1588180541993, "coord_origin": "1"}, "confidence": 0.9777455925941467, "cells": [{"id": 71, "text": "In modern document understanding systems [1,15], table extraction is typi-", "bbox": {"l": 149.70811, "t": 632.14755, "r": 480.58899, "b": 640.94452, "coord_origin": "1"}}, {"id": 72, "text": "cally a two-step process. Firstly, every table on a page is located with a bounding", "bbox": {"l": 134.76512, "t": 644.1025500000001, "r": 480.59583, "b": 652.8995199999999, "coord_origin": "1"}}, {"id": 73, "text": "box, and secondly, their logical row and column structure is recognized. 
As of", "bbox": {"l": 134.76512, "t": 656.05756, "r": 480.59496999999993, "b": 664.85453, "coord_origin": "1"}}]}, {"id": 8, "label": "Page-header", "bbox": {"l": 16.3292133808136, "t": 209.47997999999995, "r": 36.60316228866577, "b": 555.00003, "coord_origin": "1"}, "confidence": 0.867455244064331, "cells": [{"id": 74, "text": "arXiv:2305.03393v1 [cs.CV] 5 May 2023", "bbox": {"l": 18.340218, "t": 209.47997999999995, "r": 36.339787, "b": 555.00003, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Section-header", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "Section-header", "bbox": {"l": 134.61328639984131, "t": 115.19469738006592, "r": 480.59735, "b": 147.38129138946533, "coord_origin": "1"}, "confidence": 0.920151948928833, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure", "bbox": {"l": 134.765, "t": 115.89910999999995, "r": 480.59735, "b": 128.58112000000006, "coord_origin": "1"}}, {"id": 1, "text": "Recognition", "bbox": {"l": 266.67499, "t": 133.83209, "r": 348.68506, "b": 146.51409999999998, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Text", "id": 1, "page_no": 0, "cluster": {"id": 1, "label": "Text", "bbox": {"l": 138.6561770439148, "t": 168.9183345794678, "r": 476.0571910858154, "b": 204.38073999999995, "coord_origin": "1"}, "confidence": 0.9266575574874878, "cells": [{"id": 2, "text": "Maksym Lysak", "bbox": {"l": 151.22598, "t": 171.67371000000003, "r": 217.04390999999998, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 3, "text": "[0000", "bbox": {"l": 217.04599, "t": 170.08209, "r": 235.18764, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 4, "text": "\u2212", "bbox": {"l": 235.18598999999998, "t": 169.69159000000002, "r": 241.4129, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 5, "text": "0002", "bbox": {"l": 
241.41299000000004, "t": 170.08209, "r": 257.29932, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 6, "text": "\u2212", "bbox": {"l": 257.298, "t": 169.69159000000002, "r": 263.5249, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 7, "text": "3723", "bbox": {"l": 263.52499, "t": 170.08209, "r": 279.41132, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 8, "text": "\u2212", "bbox": {"l": 279.41, "t": 169.69159000000002, "r": 285.6369, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 9, "text": "$^{6960]}$, Ahmed Nassar[0000", "bbox": {"l": 285.63602, "t": 171.67371000000003, "r": 392.27664, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 10, "text": "\u2212", "bbox": {"l": 392.27502, "t": 169.69159000000002, "r": 398.50192, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 11, "text": "0002", "bbox": {"l": 398.50201, "t": 170.08209, "r": 414.38834, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 12, "text": "\u2212", "bbox": {"l": 414.38702, "t": 169.69159000000002, "r": 420.61392, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 13, "text": "9468", "bbox": {"l": 420.61304, "t": 170.08209, "r": 436.49936, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 14, "text": "\u2212", "bbox": {"l": 436.49805000000003, "t": 169.69159000000002, "r": 442.72495000000004, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 15, "text": "$^{0822]}$,", "bbox": {"l": 442.72504, "t": 171.67371000000003, "r": 464.12963999999994, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 16, "text": "Nikolaos Livathinos", "bbox": {"l": 139.34305, "t": 183.62872000000004, "r": 224.80720999999997, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 17, "text": "[0000", "bbox": {"l": 224.80704000000003, "t": 182.03814999999997, "r": 242.94868, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 18, "text": "\u2212", "bbox": {"l": 242.94704000000002, "t": 181.64764000000002, "r": 249.17394999999996, "b": 188.23090000000002, 
"coord_origin": "1"}}, {"id": 19, "text": "0001", "bbox": {"l": 249.17404000000002, "t": 182.03814999999997, "r": 265.06036, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 20, "text": "\u2212", "bbox": {"l": 265.05905, "t": 181.64764000000002, "r": 271.28595, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 21, "text": "8513", "bbox": {"l": 271.28506, "t": 182.03814999999997, "r": 287.17139, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 22, "text": "\u2212", "bbox": {"l": 287.17007, "t": 181.64764000000002, "r": 293.39697, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 23, "text": "$^{3491]}$, Christoph Auer[0000", "bbox": {"l": 293.39706, "t": 183.62872000000004, "r": 404.1597, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 24, "text": "\u2212", "bbox": {"l": 404.15808, "t": 181.64764000000002, "r": 410.38498, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 25, "text": "0001", "bbox": {"l": 410.38507, "t": 182.03814999999997, "r": 426.27139, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 26, "text": "\u2212", "bbox": {"l": 426.27008, "t": 181.64764000000002, "r": 432.49697999999995, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 27, "text": "5761", "bbox": {"l": 432.49609, "t": 182.03814999999997, "r": 448.3824200000001, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 28, "text": "\u2212", "bbox": {"l": 448.3811, "t": 181.64764000000002, "r": 454.608, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 29, "text": "$^{0422]}$,", "bbox": {"l": 454.60808999999995, "t": 183.62872000000004, "r": 476.01270000000005, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 30, "text": "and Peter Staar", "bbox": {"l": 229.52109000000002, "t": 195.58374000000003, "r": 298.6087, "b": 204.38073999999995, "coord_origin": "1"}}, {"id": 31, "text": "[0000", "bbox": {"l": 298.60608, "t": 193.99316, "r": 316.74771, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 32, "text": "\u2212", "bbox": {"l": 
316.74609, "t": 193.60266000000001, "r": 322.97299, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 33, "text": "0002", "bbox": {"l": 322.97308, "t": 193.99316, "r": 338.85941, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 34, "text": "\u2212", "bbox": {"l": 338.85809, "t": 193.60266000000001, "r": 345.08499, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 35, "text": "8088", "bbox": {"l": 345.08508, "t": 193.99316, "r": 360.97141, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 36, "text": "\u2212", "bbox": {"l": 360.97009, "t": 193.60266000000001, "r": 367.19699, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 37, "text": "0823]", "bbox": {"l": 367.19611, "t": 193.99316, "r": 385.33774, "b": 200.18591000000004, "coord_origin": "1"}}]}, "text": "Maksym Lysak [0000 \u2212 0002 \u2212 3723 \u2212 $^{6960]}$, Ahmed Nassar[0000 \u2212 0002 \u2212 9468 \u2212 $^{0822]}$, Nikolaos Livathinos [0000 \u2212 0001 \u2212 8513 \u2212 $^{3491]}$, Christoph Auer[0000 \u2212 0001 \u2212 5761 \u2212 $^{0422]}$, and Peter Staar [0000 \u2212 0002 \u2212 8088 \u2212 0823]"}, {"label": "Text", "id": 2, "page_no": 0, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 222.96609, "t": 216.0551582336426, "r": 392.69108963012695, "b": 236.3769641876221, "coord_origin": "1"}, "confidence": 0.802741527557373, "cells": [{"id": 38, "text": "IBM Research", "bbox": {"l": 279.1051, "t": 217.20398, "r": 336.25153, "b": 225.27368, "coord_origin": "1"}}, {"id": 39, "text": "{mly,ahn,nli,cau,taa}@zurich.ibm.com", "bbox": {"l": 222.96609, "t": 228.80853000000002, "r": 392.38983, "b": 236.27752999999996, "coord_origin": "1"}}]}, "text": "IBM Research {mly,ahn,nli,cau,taa}@zurich.ibm.com"}, {"label": "Text", "id": 3, "page_no": 0, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 162.13674287796022, "t": 269.4665313720702, "r": 452.41988639831544, "b": 464.73447, "coord_origin": "1"}, "confidence": 0.9719225168228149, "cells": [{"id": 40, "text": "Abstract.", 
"bbox": {"l": 163.1111, "t": 270.30115, "r": 206.6358, "b": 278.22748, "coord_origin": "1"}}, {"id": 41, "text": "Extracting tables from documents is a crucial task in any", "bbox": {"l": 211.6171, "t": 270.36395000000005, "r": 452.2447199999999, "b": 278.43364999999994, "coord_origin": "1"}}, {"id": 42, "text": "document conversion pipeline. Recently, transformer-based models have", "bbox": {"l": 163.1111, "t": 281.3229099999999, "r": 452.24246, "b": 289.39267, "coord_origin": "1"}}, {"id": 43, "text": "demonstrated that table-structure can be recognized with impressive ac-", "bbox": {"l": 163.1111, "t": 292.28189, "r": 452.24792, "b": 300.35165000000006, "coord_origin": "1"}}, {"id": 44, "text": "curacy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking", "bbox": {"l": 163.1111, "t": 303.24088, "r": 452.2407799999999, "b": 311.31064, "coord_origin": "1"}}, {"id": 45, "text": "only the image of a table, such models predict a sequence of tokens (e.g.", "bbox": {"l": 163.1111, "t": 314.19888, "r": 452.24609, "b": 322.26865, "coord_origin": "1"}}, {"id": 46, "text": "in HTML, LaTeX) which represent the structure of the table. 
Since the", "bbox": {"l": 163.1111, "t": 325.15787, "r": 452.24615000000006, "b": 333.22763, "coord_origin": "1"}}, {"id": 47, "text": "token representation of the table structure has a significant impact on", "bbox": {"l": 163.1111, "t": 336.11685, "r": 452.24707, "b": 344.18661, "coord_origin": "1"}}, {"id": 48, "text": "the accuracy and run-time performance of any Im2Seq model, we inves-", "bbox": {"l": 163.1111, "t": 347.07584, "r": 452.2459999999999, "b": 355.1456, "coord_origin": "1"}}, {"id": 49, "text": "tigate in this paper how table-structure representation can be optimised.", "bbox": {"l": 163.1111, "t": 358.03482, "r": 452.2479900000001, "b": 366.10458, "coord_origin": "1"}}, {"id": 50, "text": "We propose a new, optimised table-structure language (OTSL) with a", "bbox": {"l": 163.1111, "t": 368.9938, "r": 452.24609, "b": 377.06357, "coord_origin": "1"}}, {"id": 51, "text": "minimized vocabulary and specific rules. The benefits of OTSL are that", "bbox": {"l": 163.1111, "t": 379.95279, "r": 452.2417, "b": 388.02255, "coord_origin": "1"}}, {"id": 52, "text": "it reduces the number of tokens to 5 (HTML needs 28+) and shortens", "bbox": {"l": 163.1111, "t": 390.91177, "r": 452.2443200000001, "b": 398.98154, "coord_origin": "1"}}, {"id": 53, "text": "the sequence length to half of HTML on average. Consequently, model", "bbox": {"l": 163.1111, "t": 401.87076, "r": 452.24878000000007, "b": 409.94052, "coord_origin": "1"}}, {"id": 54, "text": "accuracy improves significantly, inference time is halved compared to", "bbox": {"l": 163.1111, "t": 412.82974, "r": 452.24063000000007, "b": 420.8995100000001, "coord_origin": "1"}}, {"id": 55, "text": "HTML-based models, and the predicted table structures are always syn-", "bbox": {"l": 163.1111, "t": 423.78774999999996, "r": 452.24161, "b": 431.85751000000005, "coord_origin": "1"}}, {"id": 56, "text": "tactically correct. 
This in turn eliminates most post-processing needs.", "bbox": {"l": 163.1111, "t": 434.74673, "r": 452.24429, "b": 442.8165, "coord_origin": "1"}}, {"id": 57, "text": "Popular table structure data-sets will be published in OTSL format to", "bbox": {"l": 163.1111, "t": 445.70572000000004, "r": 452.24603, "b": 453.77547999999996, "coord_origin": "1"}}, {"id": 58, "text": "the community.", "bbox": {"l": 163.1111, "t": 456.6647, "r": 225.56116, "b": 464.73447, "coord_origin": "1"}}]}, "text": "Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. 
Popular table structure data-sets will be published in OTSL format to the community."}, {"label": "Text", "id": 4, "page_no": 0, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 162.67949237823487, "t": 477.7591037750244, "r": 452.24158, "b": 498.1963966369629, "coord_origin": "1"}, "confidence": 0.9469619989395142, "cells": [{"id": 59, "text": "Keywords:", "bbox": {"l": 163.1111, "t": 478.69394, "r": 211.94211, "b": 486.62024, "coord_origin": "1"}}, {"id": 60, "text": "Table Structure Recognition \u00b7 Data Representation \u00b7 Trans-", "bbox": {"l": 216.55208999999996, "t": 478.75671, "r": 452.24158, "b": 486.82648, "coord_origin": "1"}}, {"id": 61, "text": "formers \u00b7 Optimization.", "bbox": {"l": 163.11111, "t": 489.71573, "r": 257.64185, "b": 497.78549, "coord_origin": "1"}}]}, "text": "Keywords: Table Structure Recognition \u00b7 Data Representation \u00b7 Transformers \u00b7 Optimization."}, {"label": "Section-header", "id": 5, "page_no": 0, "cluster": {"id": 5, "label": "Section-header", "bbox": {"l": 134.76512, "t": 521.4849472045898, "r": 228.93384, "b": 532.68808, "coord_origin": "1"}, "confidence": 0.9437072277069092, "cells": [{"id": 62, "text": "1", "bbox": {"l": 134.76512, "t": 522.11969, "r": 141.48872, "b": 532.68808, "coord_origin": "1"}}, {"id": 63, "text": "Introduction", "bbox": {"l": 154.93832, "t": 522.11969, "r": 228.93384, "b": 532.68808, "coord_origin": "1"}}]}, "text": "1 Introduction"}, {"label": "Text", "id": 6, "page_no": 0, "cluster": {"id": 6, "label": "Text", "bbox": {"l": 134.0102459907532, "t": 547.7120315551757, "r": 480.5959500000001, "b": 628.8722877502441, "coord_origin": "1"}, "confidence": 0.9859005212783813, "cells": [{"id": 64, "text": "Tables are ubiquitous in documents such as scientific papers, patents, reports,", "bbox": {"l": 134.76512, "t": 548.2865400000001, "r": 480.5939, "b": 557.0835099999999, "coord_origin": "1"}}, {"id": 65, "text": "manuals, specification sheets or marketing material. 
They often encode highly", "bbox": {"l": 134.76512, "t": 560.24254, "r": 480.59180000000003, "b": 569.0395100000001, "coord_origin": "1"}}, {"id": 66, "text": "valuable information and therefore need to be extracted with high accuracy.", "bbox": {"l": 134.76512, "t": 572.19754, "r": 480.59283000000005, "b": 580.99451, "coord_origin": "1"}}, {"id": 67, "text": "Unfortunately, tables appear in documents in various sizes, styling and struc-", "bbox": {"l": 134.76512, "t": 584.15254, "r": 480.5959500000001, "b": 592.9495099999999, "coord_origin": "1"}}, {"id": 68, "text": "ture, making it difficult to recover their correct structure with simple analyt-", "bbox": {"l": 134.76512, "t": 596.10754, "r": 480.58688, "b": 604.90451, "coord_origin": "1"}}, {"id": 69, "text": "ical methods. Therefore, accurate table extraction is achieved these days with", "bbox": {"l": 134.76512, "t": 608.06255, "r": 480.59088, "b": 616.85951, "coord_origin": "1"}}, {"id": 70, "text": "machine-learning based methods.", "bbox": {"l": 134.76512, "t": 620.01755, "r": 279.32745, "b": 628.81451, "coord_origin": "1"}}]}, "text": "Tables are ubiquitous in documents such as scientific papers, patents, reports, manuals, specification sheets or marketing material. They often encode highly valuable information and therefore need to be extracted with high accuracy. Unfortunately, tables appear in documents in various sizes, styling and structure, making it difficult to recover their correct structure with simple analytical methods. 
Therefore, accurate table extraction is achieved these days with machine-learning based methods."}, {"label": "Text", "id": 7, "page_no": 0, "cluster": {"id": 7, "label": "Text", "bbox": {"l": 134.04418516159058, "t": 631.6932197570801, "r": 480.7483549118042, "b": 665.1588180541993, "coord_origin": "1"}, "confidence": 0.9777455925941467, "cells": [{"id": 71, "text": "In modern document understanding systems [1,15], table extraction is typi-", "bbox": {"l": 149.70811, "t": 632.14755, "r": 480.58899, "b": 640.94452, "coord_origin": "1"}}, {"id": 72, "text": "cally a two-step process. Firstly, every table on a page is located with a bounding", "bbox": {"l": 134.76512, "t": 644.1025500000001, "r": 480.59583, "b": 652.8995199999999, "coord_origin": "1"}}, {"id": 73, "text": "box, and secondly, their logical row and column structure is recognized. As of", "bbox": {"l": 134.76512, "t": 656.05756, "r": 480.59496999999993, "b": 664.85453, "coord_origin": "1"}}]}, "text": "In modern document understanding systems [1,15], table extraction is typically a two-step process. Firstly, every table on a page is located with a bounding box, and secondly, their logical row and column structure is recognized. 
As of"}, {"label": "Page-header", "id": 8, "page_no": 0, "cluster": {"id": 8, "label": "Page-header", "bbox": {"l": 16.3292133808136, "t": 209.47997999999995, "r": 36.60316228866577, "b": 555.00003, "coord_origin": "1"}, "confidence": 0.867455244064331, "cells": [{"id": 74, "text": "arXiv:2305.03393v1 [cs.CV] 5 May 2023", "bbox": {"l": 18.340218, "t": 209.47997999999995, "r": 36.339787, "b": 555.00003, "coord_origin": "1"}}]}, "text": "arXiv:2305.03393v1 [cs.CV] 5 May 2023"}], "body": [{"label": "Section-header", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "Section-header", "bbox": {"l": 134.61328639984131, "t": 115.19469738006592, "r": 480.59735, "b": 147.38129138946533, "coord_origin": "1"}, "confidence": 0.920151948928833, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure", "bbox": {"l": 134.765, "t": 115.89910999999995, "r": 480.59735, "b": 128.58112000000006, "coord_origin": "1"}}, {"id": 1, "text": "Recognition", "bbox": {"l": 266.67499, "t": 133.83209, "r": 348.68506, "b": 146.51409999999998, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Text", "id": 1, "page_no": 0, "cluster": {"id": 1, "label": "Text", "bbox": {"l": 138.6561770439148, "t": 168.9183345794678, "r": 476.0571910858154, "b": 204.38073999999995, "coord_origin": "1"}, "confidence": 0.9266575574874878, "cells": [{"id": 2, "text": "Maksym Lysak", "bbox": {"l": 151.22598, "t": 171.67371000000003, "r": 217.04390999999998, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 3, "text": "[0000", "bbox": {"l": 217.04599, "t": 170.08209, "r": 235.18764, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 4, "text": "\u2212", "bbox": {"l": 235.18598999999998, "t": 169.69159000000002, "r": 241.4129, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 5, "text": "0002", "bbox": {"l": 241.41299000000004, "t": 170.08209, "r": 257.29932, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 6, "text": 
"\u2212", "bbox": {"l": 257.298, "t": 169.69159000000002, "r": 263.5249, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 7, "text": "3723", "bbox": {"l": 263.52499, "t": 170.08209, "r": 279.41132, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 8, "text": "\u2212", "bbox": {"l": 279.41, "t": 169.69159000000002, "r": 285.6369, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 9, "text": "$^{6960]}$, Ahmed Nassar[0000", "bbox": {"l": 285.63602, "t": 171.67371000000003, "r": 392.27664, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 10, "text": "\u2212", "bbox": {"l": 392.27502, "t": 169.69159000000002, "r": 398.50192, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 11, "text": "0002", "bbox": {"l": 398.50201, "t": 170.08209, "r": 414.38834, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 12, "text": "\u2212", "bbox": {"l": 414.38702, "t": 169.69159000000002, "r": 420.61392, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 13, "text": "9468", "bbox": {"l": 420.61304, "t": 170.08209, "r": 436.49936, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 14, "text": "\u2212", "bbox": {"l": 436.49805000000003, "t": 169.69159000000002, "r": 442.72495000000004, "b": 176.27484000000004, "coord_origin": "1"}}, {"id": 15, "text": "$^{0822]}$,", "bbox": {"l": 442.72504, "t": 171.67371000000003, "r": 464.12963999999994, "b": 180.47069999999997, "coord_origin": "1"}}, {"id": 16, "text": "Nikolaos Livathinos", "bbox": {"l": 139.34305, "t": 183.62872000000004, "r": 224.80720999999997, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 17, "text": "[0000", "bbox": {"l": 224.80704000000003, "t": 182.03814999999997, "r": 242.94868, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 18, "text": "\u2212", "bbox": {"l": 242.94704000000002, "t": 181.64764000000002, "r": 249.17394999999996, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 19, "text": "0001", "bbox": {"l": 249.17404000000002, "t": 182.03814999999997, "r": 
265.06036, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 20, "text": "\u2212", "bbox": {"l": 265.05905, "t": 181.64764000000002, "r": 271.28595, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 21, "text": "8513", "bbox": {"l": 271.28506, "t": 182.03814999999997, "r": 287.17139, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 22, "text": "\u2212", "bbox": {"l": 287.17007, "t": 181.64764000000002, "r": 293.39697, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 23, "text": "$^{3491]}$, Christoph Auer[0000", "bbox": {"l": 293.39706, "t": 183.62872000000004, "r": 404.1597, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 24, "text": "\u2212", "bbox": {"l": 404.15808, "t": 181.64764000000002, "r": 410.38498, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 25, "text": "0001", "bbox": {"l": 410.38507, "t": 182.03814999999997, "r": 426.27139, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 26, "text": "\u2212", "bbox": {"l": 426.27008, "t": 181.64764000000002, "r": 432.49697999999995, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 27, "text": "5761", "bbox": {"l": 432.49609, "t": 182.03814999999997, "r": 448.3824200000001, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 28, "text": "\u2212", "bbox": {"l": 448.3811, "t": 181.64764000000002, "r": 454.608, "b": 188.23090000000002, "coord_origin": "1"}}, {"id": 29, "text": "$^{0422]}$,", "bbox": {"l": 454.60808999999995, "t": 183.62872000000004, "r": 476.01270000000005, "b": 192.42571999999996, "coord_origin": "1"}}, {"id": 30, "text": "and Peter Staar", "bbox": {"l": 229.52109000000002, "t": 195.58374000000003, "r": 298.6087, "b": 204.38073999999995, "coord_origin": "1"}}, {"id": 31, "text": "[0000", "bbox": {"l": 298.60608, "t": 193.99316, "r": 316.74771, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 32, "text": "\u2212", "bbox": {"l": 316.74609, "t": 193.60266000000001, "r": 322.97299, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 33, "text": 
"0002", "bbox": {"l": 322.97308, "t": 193.99316, "r": 338.85941, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 34, "text": "\u2212", "bbox": {"l": 338.85809, "t": 193.60266000000001, "r": 345.08499, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 35, "text": "8088", "bbox": {"l": 345.08508, "t": 193.99316, "r": 360.97141, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 36, "text": "\u2212", "bbox": {"l": 360.97009, "t": 193.60266000000001, "r": 367.19699, "b": 200.18591000000004, "coord_origin": "1"}}, {"id": 37, "text": "0823]", "bbox": {"l": 367.19611, "t": 193.99316, "r": 385.33774, "b": 200.18591000000004, "coord_origin": "1"}}]}, "text": "Maksym Lysak [0000 \u2212 0002 \u2212 3723 \u2212 $^{6960]}$, Ahmed Nassar[0000 \u2212 0002 \u2212 9468 \u2212 $^{0822]}$, Nikolaos Livathinos [0000 \u2212 0001 \u2212 8513 \u2212 $^{3491]}$, Christoph Auer[0000 \u2212 0001 \u2212 5761 \u2212 $^{0422]}$, and Peter Staar [0000 \u2212 0002 \u2212 8088 \u2212 0823]"}, {"label": "Text", "id": 2, "page_no": 0, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 222.96609, "t": 216.0551582336426, "r": 392.69108963012695, "b": 236.3769641876221, "coord_origin": "1"}, "confidence": 0.802741527557373, "cells": [{"id": 38, "text": "IBM Research", "bbox": {"l": 279.1051, "t": 217.20398, "r": 336.25153, "b": 225.27368, "coord_origin": "1"}}, {"id": 39, "text": "{mly,ahn,nli,cau,taa}@zurich.ibm.com", "bbox": {"l": 222.96609, "t": 228.80853000000002, "r": 392.38983, "b": 236.27752999999996, "coord_origin": "1"}}]}, "text": "IBM Research {mly,ahn,nli,cau,taa}@zurich.ibm.com"}, {"label": "Text", "id": 3, "page_no": 0, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 162.13674287796022, "t": 269.4665313720702, "r": 452.41988639831544, "b": 464.73447, "coord_origin": "1"}, "confidence": 0.9719225168228149, "cells": [{"id": 40, "text": "Abstract.", "bbox": {"l": 163.1111, "t": 270.30115, "r": 206.6358, "b": 278.22748, "coord_origin": "1"}}, {"id": 41, "text": 
"Extracting tables from documents is a crucial task in any", "bbox": {"l": 211.6171, "t": 270.36395000000005, "r": 452.2447199999999, "b": 278.43364999999994, "coord_origin": "1"}}, {"id": 42, "text": "document conversion pipeline. Recently, transformer-based models have", "bbox": {"l": 163.1111, "t": 281.3229099999999, "r": 452.24246, "b": 289.39267, "coord_origin": "1"}}, {"id": 43, "text": "demonstrated that table-structure can be recognized with impressive ac-", "bbox": {"l": 163.1111, "t": 292.28189, "r": 452.24792, "b": 300.35165000000006, "coord_origin": "1"}}, {"id": 44, "text": "curacy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking", "bbox": {"l": 163.1111, "t": 303.24088, "r": 452.2407799999999, "b": 311.31064, "coord_origin": "1"}}, {"id": 45, "text": "only the image of a table, such models predict a sequence of tokens (e.g.", "bbox": {"l": 163.1111, "t": 314.19888, "r": 452.24609, "b": 322.26865, "coord_origin": "1"}}, {"id": 46, "text": "in HTML, LaTeX) which represent the structure of the table. Since the", "bbox": {"l": 163.1111, "t": 325.15787, "r": 452.24615000000006, "b": 333.22763, "coord_origin": "1"}}, {"id": 47, "text": "token representation of the table structure has a significant impact on", "bbox": {"l": 163.1111, "t": 336.11685, "r": 452.24707, "b": 344.18661, "coord_origin": "1"}}, {"id": 48, "text": "the accuracy and run-time performance of any Im2Seq model, we inves-", "bbox": {"l": 163.1111, "t": 347.07584, "r": 452.2459999999999, "b": 355.1456, "coord_origin": "1"}}, {"id": 49, "text": "tigate in this paper how table-structure representation can be optimised.", "bbox": {"l": 163.1111, "t": 358.03482, "r": 452.2479900000001, "b": 366.10458, "coord_origin": "1"}}, {"id": 50, "text": "We propose a new, optimised table-structure language (OTSL) with a", "bbox": {"l": 163.1111, "t": 368.9938, "r": 452.24609, "b": 377.06357, "coord_origin": "1"}}, {"id": 51, "text": "minimized vocabulary and specific rules. 
The benefits of OTSL are that", "bbox": {"l": 163.1111, "t": 379.95279, "r": 452.2417, "b": 388.02255, "coord_origin": "1"}}, {"id": 52, "text": "it reduces the number of tokens to 5 (HTML needs 28+) and shortens", "bbox": {"l": 163.1111, "t": 390.91177, "r": 452.2443200000001, "b": 398.98154, "coord_origin": "1"}}, {"id": 53, "text": "the sequence length to half of HTML on average. Consequently, model", "bbox": {"l": 163.1111, "t": 401.87076, "r": 452.24878000000007, "b": 409.94052, "coord_origin": "1"}}, {"id": 54, "text": "accuracy improves significantly, inference time is halved compared to", "bbox": {"l": 163.1111, "t": 412.82974, "r": 452.24063000000007, "b": 420.8995100000001, "coord_origin": "1"}}, {"id": 55, "text": "HTML-based models, and the predicted table structures are always syn-", "bbox": {"l": 163.1111, "t": 423.78774999999996, "r": 452.24161, "b": 431.85751000000005, "coord_origin": "1"}}, {"id": 56, "text": "tactically correct. This in turn eliminates most post-processing needs.", "bbox": {"l": 163.1111, "t": 434.74673, "r": 452.24429, "b": 442.8165, "coord_origin": "1"}}, {"id": 57, "text": "Popular table structure data-sets will be published in OTSL format to", "bbox": {"l": 163.1111, "t": 445.70572000000004, "r": 452.24603, "b": 453.77547999999996, "coord_origin": "1"}}, {"id": 58, "text": "the community.", "bbox": {"l": 163.1111, "t": 456.6647, "r": 225.56116, "b": 464.73447, "coord_origin": "1"}}]}, "text": "Abstract. Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. 
Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure data-sets will be published in OTSL format to the community."}, {"label": "Text", "id": 4, "page_no": 0, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 162.67949237823487, "t": 477.7591037750244, "r": 452.24158, "b": 498.1963966369629, "coord_origin": "1"}, "confidence": 0.9469619989395142, "cells": [{"id": 59, "text": "Keywords:", "bbox": {"l": 163.1111, "t": 478.69394, "r": 211.94211, "b": 486.62024, "coord_origin": "1"}}, {"id": 60, "text": "Table Structure Recognition \u00b7 Data Representation \u00b7 Trans-", "bbox": {"l": 216.55208999999996, "t": 478.75671, "r": 452.24158, "b": 486.82648, "coord_origin": "1"}}, {"id": 61, "text": "formers \u00b7 Optimization.", "bbox": {"l": 163.11111, "t": 489.71573, "r": 257.64185, "b": 497.78549, "coord_origin": "1"}}]}, "text": "Keywords: Table Structure Recognition \u00b7 Data Representation \u00b7 Transformers \u00b7 Optimization."}, {"label": "Section-header", "id": 5, "page_no": 0, "cluster": {"id": 5, "label": "Section-header", "bbox": {"l": 134.76512, "t": 521.4849472045898, "r": 228.93384, "b": 532.68808, "coord_origin": "1"}, "confidence": 0.9437072277069092, "cells": [{"id": 62, "text": "1", "bbox": {"l": 134.76512, "t": 522.11969, "r": 141.48872, "b": 532.68808, "coord_origin": "1"}}, {"id": 
63, "text": "Introduction", "bbox": {"l": 154.93832, "t": 522.11969, "r": 228.93384, "b": 532.68808, "coord_origin": "1"}}]}, "text": "1 Introduction"}, {"label": "Text", "id": 6, "page_no": 0, "cluster": {"id": 6, "label": "Text", "bbox": {"l": 134.0102459907532, "t": 547.7120315551757, "r": 480.5959500000001, "b": 628.8722877502441, "coord_origin": "1"}, "confidence": 0.9859005212783813, "cells": [{"id": 64, "text": "Tables are ubiquitous in documents such as scientific papers, patents, reports,", "bbox": {"l": 134.76512, "t": 548.2865400000001, "r": 480.5939, "b": 557.0835099999999, "coord_origin": "1"}}, {"id": 65, "text": "manuals, specification sheets or marketing material. They often encode highly", "bbox": {"l": 134.76512, "t": 560.24254, "r": 480.59180000000003, "b": 569.0395100000001, "coord_origin": "1"}}, {"id": 66, "text": "valuable information and therefore need to be extracted with high accuracy.", "bbox": {"l": 134.76512, "t": 572.19754, "r": 480.59283000000005, "b": 580.99451, "coord_origin": "1"}}, {"id": 67, "text": "Unfortunately, tables appear in documents in various sizes, styling and struc-", "bbox": {"l": 134.76512, "t": 584.15254, "r": 480.5959500000001, "b": 592.9495099999999, "coord_origin": "1"}}, {"id": 68, "text": "ture, making it difficult to recover their correct structure with simple analyt-", "bbox": {"l": 134.76512, "t": 596.10754, "r": 480.58688, "b": 604.90451, "coord_origin": "1"}}, {"id": 69, "text": "ical methods. Therefore, accurate table extraction is achieved these days with", "bbox": {"l": 134.76512, "t": 608.06255, "r": 480.59088, "b": 616.85951, "coord_origin": "1"}}, {"id": 70, "text": "machine-learning based methods.", "bbox": {"l": 134.76512, "t": 620.01755, "r": 279.32745, "b": 628.81451, "coord_origin": "1"}}]}, "text": "Tables are ubiquitous in documents such as scientific papers, patents, reports, manuals, specification sheets or marketing material. 
They often encode highly valuable information and therefore need to be extracted with high accuracy. Unfortunately, tables appear in documents in various sizes, styling and structure, making it difficult to recover their correct structure with simple analytical methods. Therefore, accurate table extraction is achieved these days with machine-learning based methods."}, {"label": "Text", "id": 7, "page_no": 0, "cluster": {"id": 7, "label": "Text", "bbox": {"l": 134.04418516159058, "t": 631.6932197570801, "r": 480.7483549118042, "b": 665.1588180541993, "coord_origin": "1"}, "confidence": 0.9777455925941467, "cells": [{"id": 71, "text": "In modern document understanding systems [1,15], table extraction is typi-", "bbox": {"l": 149.70811, "t": 632.14755, "r": 480.58899, "b": 640.94452, "coord_origin": "1"}}, {"id": 72, "text": "cally a two-step process. Firstly, every table on a page is located with a bounding", "bbox": {"l": 134.76512, "t": 644.1025500000001, "r": 480.59583, "b": 652.8995199999999, "coord_origin": "1"}}, {"id": 73, "text": "box, and secondly, their logical row and column structure is recognized. As of", "bbox": {"l": 134.76512, "t": 656.05756, "r": 480.59496999999993, "b": 664.85453, "coord_origin": "1"}}]}, "text": "In modern document understanding systems [1,15], table extraction is typically a two-step process. Firstly, every table on a page is located with a bounding box, and secondly, their logical row and column structure is recognized. 
As of"}], "headers": [{"label": "Page-header", "id": 8, "page_no": 0, "cluster": {"id": 8, "label": "Page-header", "bbox": {"l": 16.3292133808136, "t": 209.47997999999995, "r": 36.60316228866577, "b": 555.00003, "coord_origin": "1"}, "confidence": 0.867455244064331, "cells": [{"id": 74, "text": "arXiv:2305.03393v1 [cs.CV] 5 May 2023", "bbox": {"l": 18.340218, "t": 209.47997999999995, "r": 36.339787, "b": 555.00003, "coord_origin": "1"}}]}, "text": "arXiv:2305.03393v1 [cs.CV] 5 May 2023"}]}}, {"page_no": 1, "page_hash": "45bd6ad4d3e145029fa89fbf741a81d8885eb87ef03d6744221c61e66358451b", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "2", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 3, "text": "Fig. 
1.", "bbox": {"l": 134.765, "t": 126.33416999999997, "r": 162.64424, "b": 134.26049999999998, "coord_origin": "1"}}, {"id": 4, "text": "Comparison between HTML and OTSL table structure representation: (A)", "bbox": {"l": 167.062, "t": 126.39697000000001, "r": 480.59106, "b": 134.46667000000002, "coord_origin": "1"}}, {"id": 5, "text": "table-example with complex row and column headers, including a 2D empty span,", "bbox": {"l": 134.765, "t": 137.35595999999998, "r": 480.59018, "b": 145.42566, "coord_origin": "1"}}, {"id": 6, "text": "(B)", "bbox": {"l": 134.765, "t": 148.31493999999998, "r": 147.95433, "b": 156.38464, "coord_origin": "1"}}, {"id": 7, "text": "minimal graphical representation of table structure using rectangular layout, (C)", "bbox": {"l": 152.39224, "t": 148.31493999999998, "r": 480.59096999999997, "b": 156.38464, "coord_origin": "1"}}, {"id": 8, "text": "HTML representation, (D) OTSL representation. This example demonstrates many of", "bbox": {"l": 134.765, "t": 159.27392999999995, "r": 480.59189, "b": 167.34362999999996, "coord_origin": "1"}}, {"id": 9, "text": "the key-features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case),", "bbox": {"l": 134.765, "t": 170.23290999999995, "r": 480.58914000000004, "b": 178.30260999999996, "coord_origin": "1"}}, {"id": 10, "text": "its reduced sequence length (55 versus 30) and a enhanced internal structure (variable", "bbox": {"l": 134.765, "t": 181.19188999999994, "r": 480.59020999999996, "b": 189.26160000000004, "coord_origin": "1"}}, {"id": 11, "text": "token sequence length per row in HTML versus a fixed length of rows in OTSL).", "bbox": {"l": 134.765, "t": 192.15088000000003, "r": 460.87109, "b": 200.22058000000004, "coord_origin": "1"}}, {"id": 12, "text": "C", "bbox": {"l": 396.41107, "t": 280.98352, "r": 402.97336, "b": 289.50903, "coord_origin": "1"}}, {"id": 13, "text": "C", "bbox": {"l": 418.58682, "t": 280.89792, "r": 425.14911, "b": 289.42343, "coord_origin": "1"}}, {"id": 
14, "text": "C", "bbox": {"l": 395.74835, "t": 303.23727, "r": 402.31064, "b": 311.76279, "coord_origin": "1"}}, {"id": 15, "text": "C", "bbox": {"l": 407.54214, "t": 303.36981, "r": 414.10443, "b": 311.89532, "coord_origin": "1"}}, {"id": 16, "text": "C", "bbox": {"l": 407.56335, "t": 314.40619, "r": 414.12564, "b": 322.9317, "coord_origin": "1"}}, {"id": 17, "text": "C", "bbox": {"l": 418.51108, "t": 292.08502000000004, "r": 425.07336, "b": 300.61053000000004, "coord_origin": "1"}}, {"id": 18, "text": "C", "bbox": {"l": 429.59744, "t": 292.09106, "r": 436.1597300000001, "b": 300.61658, "coord_origin": "1"}}, {"id": 19, "text": "C", "bbox": {"l": 440.68759000000006, "t": 292.01230000000004, "r": 447.24987999999996, "b": 300.53781000000004, "coord_origin": "1"}}, {"id": 20, "text": "C", "bbox": {"l": 418.6232, "t": 303.29483, "r": 425.18549, "b": 311.82034, "coord_origin": "1"}}, {"id": 21, "text": "C", "bbox": {"l": 429.7095299999999, "t": 303.30011, "r": 436.27182, "b": 311.82562, "coord_origin": "1"}}, {"id": 22, "text": "C", "bbox": {"l": 440.7996800000001, "t": 303.22211, "r": 447.36197, "b": 311.74762, "coord_origin": "1"}}, {"id": 23, "text": "C", "bbox": {"l": 418.62546, "t": 314.56903, "r": 425.18774, "b": 323.09454, "coord_origin": "1"}}, {"id": 24, "text": "C", "bbox": {"l": 429.71181999999993, "t": 314.57434, "r": 436.27411, "b": 323.09985, "coord_origin": "1"}}, {"id": 25, "text": "C", "bbox": {"l": 440.80194, "t": 314.49631, "r": 447.36423, "b": 323.02182, "coord_origin": "1"}}, {"id": 26, "text": "C", "bbox": {"l": 407.39746, "t": 325.29031, "r": 413.95975, "b": 333.81583, "coord_origin": "1"}}, {"id": 27, "text": "C", "bbox": {"l": 418.45959, "t": 325.45316, "r": 425.02188, "b": 333.97867, "coord_origin": "1"}}, {"id": 28, "text": "C", "bbox": {"l": 429.54593, "t": 325.4592, "r": 436.10822, "b": 333.98471, "coord_origin": "1"}}, {"id": 29, "text": "C", "bbox": {"l": 440.63608, "t": 325.38043, "r": 447.19836, "b": 333.90594, "coord_origin": "1"}}, 
{"id": 30, "text": "NL", "bbox": {"l": 451.89511000000005, "t": 280.15717, "r": 463.51273000000003, "b": 288.68268, "coord_origin": "1"}}, {"id": 31, "text": "NL", "bbox": {"l": 452.1557, "t": 291.59875000000005, "r": 463.77332, "b": 300.12427, "coord_origin": "1"}}, {"id": 32, "text": "NL", "bbox": {"l": 452.17688000000004, "t": 302.84265, "r": 463.79449000000005, "b": 311.36816, "coord_origin": "1"}}, {"id": 33, "text": "NL", "bbox": {"l": 452.09887999999995, "t": 314.12441999999993, "r": 463.71648999999996, "b": 322.6499299999999, "coord_origin": "1"}}, {"id": 34, "text": "NL", "bbox": {"l": 452.29733, "t": 325.46906, "r": 463.91495, "b": 333.99457, "coord_origin": "1"}}, {"id": 35, "text": "U", "bbox": {"l": 396.09677, "t": 314.49478, "r": 402.65906, "b": 323.02029000000005, "coord_origin": "1"}}, {"id": 36, "text": "U", "bbox": {"l": 395.99829, "t": 325.38876000000005, "r": 402.56058, "b": 333.91428, "coord_origin": "1"}}, {"id": 37, "text": "U", "bbox": {"l": 396.27475, "t": 292.27057, "r": 402.83704, "b": 300.79608, "coord_origin": "1"}}, {"id": 38, "text": "L", "bbox": {"l": 408.54724, "t": 280.96912, "r": 413.60074, "b": 289.49463, "coord_origin": "1"}}, {"id": 39, "text": "L", "bbox": {"l": 430.58966, "t": 280.49725, "r": 435.6431600000001, "b": 289.02277, "coord_origin": "1"}}, {"id": 40, "text": "L", "bbox": {"l": 441.08069, "t": 280.38062, "r": 446.13419, "b": 288.90613, "coord_origin": "1"}}, {"id": 41, "text": "X", "bbox": {"l": 407.97388, "t": 292.13425, "r": 414.03625, "b": 300.65976, "coord_origin": "1"}}, {"id": 42, "text": "NL", "bbox": {"l": 441.25640999999996, "t": 411.1807600000001, "r": 452.87402, "b": 419.7062700000001, "coord_origin": "1"}}, {"id": 43, "text": "vocabulary:", "bbox": {"l": 393.75256, "t": 399.7947700000001, "r": 432.48929, "b": 406.89935, "coord_origin": "1"}}, {"id": 44, "text": "5", "bbox": {"l": 434.5896000000001, "t": 399.7947700000001, "r": 438.80083999999994, "b": 406.89935, "coord_origin": "1"}}, {"id": 45, "text": 
"tokens", "bbox": {"l": 440.90573, "t": 399.7947700000001, "r": 463.22235, "b": 406.89935, "coord_origin": "1"}}, {"id": 46, "text": "D OTSL", "bbox": {"l": 384.11816, "t": 258.54718, "r": 413.99307, "b": 265.65179, "coord_origin": "1"}}, {"id": 47, "text": "sequence length:", "bbox": {"l": 393.75256, "t": 266.67505000000006, "r": 451.45129000000003, "b": 273.77966000000004, "coord_origin": "1"}}, {"id": 48, "text": "30", "bbox": {"l": 453.55083999999994, "t": 266.67505000000006, "r": 461.97485, "b": 273.77966000000004, "coord_origin": "1"}}, {"id": 49, "text": "vocabulary for this table:", "bbox": {"l": 151.79318, "t": 399.76016, "r": 233.89371000000003, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 50, "text": "12", "bbox": {"l": 235.99332, "t": 399.76016, "r": 244.41734000000002, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 51, "text": "tokens", "bbox": {"l": 246.52222, "t": 399.76016, "r": 268.83884, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 52, "text": "A", "bbox": {"l": 154.3298, "t": 213.57457999999997, "r": 159.79837, "b": 220.67920000000004, "coord_origin": "1"}}, {"id": 53, "text": "B", "bbox": {"l": 321.07053, "t": 213.57457999999997, "r": 326.53909, "b": 220.67920000000004, "coord_origin": "1"}}, {"id": 54, "text": "<table>", "bbox": {"l": 153.0947, "t": 280.30411, "r": 175.83888, "b": 286.69824, "coord_origin": "1"}}, {"id": 55, "text": "<tr>", "bbox": {"l": 160.67039, "t": 287.12088, "r": 172.79608, "b": 293.51501, "coord_origin": "1"}}, {"id": 56, "text": "<td", "bbox": {"l": 168.24603, "t": 293.93765, "r": 177.91019, "b": 300.33179, "coord_origin": "1"}}, {"id": 57, "text": "colspan=\u201c2\u201d", "bbox": {"l": 179.80525, "t": 293.93765, "r": 215.61517, "b": 300.33179, "coord_origin": "1"}}, {"id": 58, "text": "rowspan=\u201c2\u201d", "bbox": {"l": 217.50886999999997, "t": 293.93765, "r": 255.58945, "b": 300.33179, "coord_origin": "1"}}, {"id": 59, "text": ">", "bbox": {"l": 257.48315, "t": 293.93765, "r": 261.46414, 
"b": 300.33179, "coord_origin": "1"}}, {"id": 60, "text": "</td>", "bbox": {"l": 263.35785, "t": 293.93765, "r": 278.89804, "b": 300.33179, "coord_origin": "1"}}, {"id": 61, "text": "<td", "bbox": {"l": 280.79175, "t": 293.93765, "r": 290.4559, "b": 300.33179, "coord_origin": "1"}}, {"id": 62, "text": "colspan=\u201c3\u201d", "bbox": {"l": 292.35095, "t": 293.93765, "r": 328.16083, "b": 300.33179, "coord_origin": "1"}}, {"id": 63, "text": ">", "bbox": {"l": 330.05457, "t": 293.93765, "r": 334.03555, "b": 300.33179, "coord_origin": "1"}}, {"id": 64, "text": "</td>", "bbox": {"l": 335.92926, "t": 293.93765, "r": 351.46945, "b": 300.33179, "coord_origin": "1"}}, {"id": 65, "text": "</tr>", "bbox": {"l": 160.67039, "t": 300.75442999999996, "r": 174.68979, "b": 307.14856, "coord_origin": "1"}}, {"id": 66, "text": "<tr>", "bbox": {"l": 160.67039, "t": 307.57122999999996, "r": 172.79608, "b": 313.96536, "coord_origin": "1"}}, {"id": 67, "text": "<td>", "bbox": {"l": 168.24603, "t": 314.388, "r": 181.89255, "b": 320.78214, "coord_origin": "1"}}, {"id": 68, "text": "</td>", "bbox": {"l": 183.78624, "t": 314.388, "r": 199.32646, "b": 320.78214, "coord_origin": "1"}}, {"id": 69, "text": "<td>", "bbox": {"l": 201.22015, "t": 314.388, "r": 214.86666999999997, "b": 320.78214, "coord_origin": "1"}}, {"id": 70, "text": "</td>", "bbox": {"l": 216.76038, "t": 314.388, "r": 232.30058, "b": 320.78214, "coord_origin": "1"}}, {"id": 71, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 314.388, "r": 247.84079000000003, "b": 320.78214, "coord_origin": "1"}}, {"id": 72, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 314.388, "r": 265.27469, "b": 320.78214, "coord_origin": "1"}}, {"id": 73, "text": "</tr>", "bbox": {"l": 160.67039, "t": 321.20477, "r": 174.68979, "b": 327.59890999999993, "coord_origin": "1"}}, {"id": 74, "text": "<tr>", "bbox": {"l": 160.67039, "t": 328.02158, "r": 172.79608, "b": 334.41571000000005, "coord_origin": "1"}}, {"id": 75, "text": "<td 
rowspan=\u201c3\u201d > </td> <td> </td> <td> </td> <td> </td> <td> </td>", "bbox": {"l": 168.24603, "t": 334.83835, "r": 373.09091, "b": 341.23248, "coord_origin": "1"}}, {"id": 76, "text": "</tr>", "bbox": {"l": 160.67039, "t": 341.65512, "r": 174.68979, "b": 348.04926, "coord_origin": "1"}}, {"id": 77, "text": "<tr>", "bbox": {"l": 160.67039, "t": 348.47159, "r": 172.79608, "b": 354.86572, "coord_origin": "1"}}, {"id": 78, "text": "<td>", "bbox": {"l": 168.24603, "t": 355.28836000000007, "r": 181.89255, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 79, "text": "</td>", "bbox": {"l": 183.78624, "t": 355.28836000000007, "r": 199.32646, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 80, "text": "<td>", "bbox": {"l": 201.22015, "t": 355.28836000000007, "r": 214.86666999999997, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 81, "text": "</td>", "bbox": {"l": 216.76038, "t": 355.28836000000007, "r": 232.30058, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 82, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 355.28836000000007, "r": 247.84079000000003, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 83, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 355.28836000000007, "r": 265.27469, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 84, "text": "<td>", "bbox": {"l": 267.1684, "t": 355.28836000000007, "r": 280.81488, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 85, "text": "</td>", "bbox": {"l": 282.70862, "t": 355.28836000000007, "r": 298.24881, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 86, "text": "</tr>", "bbox": {"l": 160.67039, "t": 362.10516000000007, "r": 174.68979, "b": 368.49929999999995, "coord_origin": "1"}}, {"id": 87, "text": "<tr>", "bbox": {"l": 160.67039, "t": 368.92194, "r": 172.79608, "b": 375.31607, "coord_origin": "1"}}, {"id": 88, "text": "<td>", "bbox": {"l": 168.24603, "t": 375.73871, "r": 181.89255, "b": 382.13284, "coord_origin": "1"}}, {"id": 89, "text": 
"</td>", "bbox": {"l": 183.78624, "t": 375.73871, "r": 199.32646, "b": 382.13284, "coord_origin": "1"}}, {"id": 90, "text": "<td>", "bbox": {"l": 201.22015, "t": 375.73871, "r": 214.86666999999997, "b": 382.13284, "coord_origin": "1"}}, {"id": 91, "text": "</td>", "bbox": {"l": 216.76038, "t": 375.73871, "r": 232.30058, "b": 382.13284, "coord_origin": "1"}}, {"id": 92, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 375.73871, "r": 247.84079000000003, "b": 382.13284, "coord_origin": "1"}}, {"id": 93, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 375.73871, "r": 265.27469, "b": 382.13284, "coord_origin": "1"}}, {"id": 94, "text": "<td>", "bbox": {"l": 267.1684, "t": 375.73871, "r": 280.81488, "b": 382.13284, "coord_origin": "1"}}, {"id": 95, "text": "</td>", "bbox": {"l": 282.70862, "t": 375.73871, "r": 298.24881, "b": 382.13284, "coord_origin": "1"}}, {"id": 96, "text": "</tr>", "bbox": {"l": 160.67039, "t": 382.55551, "r": 174.68979, "b": 388.94965, "coord_origin": "1"}}, {"id": 97, "text": "</table>", "bbox": {"l": 153.0947, "t": 389.37228, "r": 177.73259, "b": 395.76642, "coord_origin": "1"}}, {"id": 98, "text": "C", "bbox": {"l": 395.06137, "t": 411.33353, "r": 401.62366, "b": 419.85904, "coord_origin": "1"}}, {"id": 99, "text": "L", "bbox": {"l": 407.42249, "t": 411.33353, "r": 412.47598, "b": 419.85904, "coord_origin": "1"}}, {"id": 100, "text": "U", "bbox": {"l": 418.69287, "t": 411.33353, "r": 425.25516, "b": 419.85904, "coord_origin": "1"}}, {"id": 101, "text": "X", "bbox": {"l": 430.5086099999999, "t": 411.33353, "r": 436.5709800000001, "b": 419.85904, "coord_origin": "1"}}, {"id": 102, "text": "<table>", "bbox": {"l": 152.36208, "t": 409.77362, "r": 175.10626, "b": 416.16776, "coord_origin": "1"}}, {"id": 103, "text": "<tr>", "bbox": {"l": 178.89366, "t": 409.77362, "r": 191.01935, "b": 416.16776, "coord_origin": "1"}}, {"id": 104, "text": "</tr>", "bbox": {"l": 194.80676, "t": 409.77362, "r": 208.82614, "b": 416.16776, "coord_origin": 
"1"}}, {"id": 105, "text": "<td>", "bbox": {"l": 212.61354, "t": 409.77362, "r": 226.26003999999998, "b": 416.16776, "coord_origin": "1"}}, {"id": 106, "text": "</td>", "bbox": {"l": 230.04745000000003, "t": 409.77362, "r": 245.58765000000002, "b": 416.16776, "coord_origin": "1"}}, {"id": 107, "text": "<td", "bbox": {"l": 249.37506000000002, "t": 409.77362, "r": 259.03918, "b": 416.16776, "coord_origin": "1"}}, {"id": 108, "text": "colspan=\"2\"", "bbox": {"l": 262.82797, "t": 409.77362, "r": 298.93646, "b": 416.16776, "coord_origin": "1"}}, {"id": 109, "text": "colspan=\"3\"", "bbox": {"l": 302.72385, "t": 409.77362, "r": 338.83234, "b": 416.16776, "coord_origin": "1"}}, {"id": 110, "text": "rowspan=\"2\"", "bbox": {"l": 152.36208, "t": 418.10522, "r": 190.74123, "b": 424.49936, "coord_origin": "1"}}, {"id": 111, "text": "rowspan=\"3\"", "bbox": {"l": 194.52863, "t": 418.10522, "r": 232.90777999999997, "b": 424.49936, "coord_origin": "1"}}, {"id": 112, "text": ">", "bbox": {"l": 236.69518999999997, "t": 418.10522, "r": 240.67617999999996, "b": 424.49936, "coord_origin": "1"}}, {"id": 113, "text": "</table>", "bbox": {"l": 244.46358, "t": 418.10522, "r": 269.10144, "b": 424.49936, "coord_origin": "1"}}, {"id": 114, "text": "C", "bbox": {"l": 154.50595, "t": 258.60095, "r": 159.62473, "b": 265.70556999999997, "coord_origin": "1"}}, {"id": 115, "text": "HTML", "bbox": {"l": 164.74348, "t": 258.60095, "r": 185.21857, "b": 265.70556999999997, "coord_origin": "1"}}, {"id": 116, "text": "sequence length:", "bbox": {"l": 164.3548, "t": 266.49707, "r": 222.05352999999997, "b": 273.60168, "coord_origin": "1"}}, {"id": 117, "text": "55", "bbox": {"l": 224.15326, "t": 266.49707, "r": 232.57729, "b": 273.60168, "coord_origin": "1"}}, {"id": 118, "text": "today,", "bbox": {"l": 134.765, "t": 452.31378, "r": 161.32928, "b": 461.11075, "coord_origin": "1"}}, {"id": 119, "text": "table detection", "bbox": {"l": 164.269, "t": 452.31378, "r": 226.28617999999997, "b": 461.11075, 
"coord_origin": "1"}}, {"id": 120, "text": "in documents is a well understood problem, and the latest", "bbox": {"l": 229.992, "t": 452.31378, "r": 480.59232000000003, "b": 461.11075, "coord_origin": "1"}}, {"id": 121, "text": "state-of-the-art (SOTA) object detection methods provide an accuracy compa-", "bbox": {"l": 134.76501, "t": 464.26877, "r": 480.59180000000003, "b": 473.06573, "coord_origin": "1"}}, {"id": 122, "text": "rable to human observers [7,8,10,14,23]. On the other hand, the problem of table", "bbox": {"l": 134.76501, "t": 476.22375, "r": 480.58673, "b": 485.02072, "coord_origin": "1"}}, {"id": 123, "text": "structure recognition (TSR) is a lot more challenging and remains a very active", "bbox": {"l": 134.76501, "t": 488.17975, "r": 480.58658, "b": 496.97672, "coord_origin": "1"}}, {"id": 124, "text": "area of research, in which many novel machine learning algorithms are being", "bbox": {"l": 134.76501, "t": 500.13474, "r": 480.58978, "b": 508.9317, "coord_origin": "1"}}, {"id": 125, "text": "explored [3,4,5,9,11,12,13,14,17,18,21,22].", "bbox": {"l": 134.76501, "t": 512.0897199999999, "r": 313.24597, "b": 520.88669, "coord_origin": "1"}}, {"id": 126, "text": "Recently emerging SOTA methods for table structure recognition employ", "bbox": {"l": 149.70901, "t": 524.55072, "r": 480.58884000000006, "b": 533.3476900000001, "coord_origin": "1"}}, {"id": 127, "text": "transformer-based models, in which an image of the table is provided to the net-", "bbox": {"l": 134.76501, "t": 536.50671, "r": 480.5917400000001, "b": 545.30368, "coord_origin": "1"}}, {"id": 128, "text": "work in order to predict the structure of the table as a sequence of tokens. 
These", "bbox": {"l": 134.76501, "t": 548.46172, "r": 480.58868, "b": 557.25868, "coord_origin": "1"}}, {"id": 129, "text": "image-to-sequence (Im2Seq) models are extremely powerful, since they allow for", "bbox": {"l": 134.76501, "t": 560.41672, "r": 480.58795, "b": 569.2136800000001, "coord_origin": "1"}}, {"id": 130, "text": "a purely data-driven solution. The tokens of the sequence typically belong to a", "bbox": {"l": 134.76501, "t": 572.37172, "r": 480.58978, "b": 581.16869, "coord_origin": "1"}}, {"id": 131, "text": "markup language such as HTML, Latex or Markdown, which allow to describe", "bbox": {"l": 134.76501, "t": 584.32672, "r": 480.59479, "b": 593.12369, "coord_origin": "1"}}, {"id": 132, "text": "table structure as rows, columns and spanning cells in various configurations.", "bbox": {"l": 134.76501, "t": 596.28271, "r": 480.58678999999995, "b": 605.0796799999999, "coord_origin": "1"}}, {"id": 133, "text": "In Figure 1, we illustrate how HTML is used to represent the table-structure", "bbox": {"l": 134.76501, "t": 608.23772, "r": 480.59476, "b": 617.03468, "coord_origin": "1"}}, {"id": 134, "text": "of a particular example table. Public table-structure data sets such as PubTab-", "bbox": {"l": 134.76501, "t": 620.19272, "r": 480.5938100000001, "b": 628.98969, "coord_origin": "1"}}, {"id": 135, "text": "Net [22], and FinTabNet [21], which were created in a semi-automated way from", "bbox": {"l": 134.76501, "t": 632.1477199999999, "r": 480.59482, "b": 640.94469, "coord_origin": "1"}}, {"id": 136, "text": "paired PDF and HTML sources (e.g. 
PubMed Central), popularized primarily", "bbox": {"l": 134.76501, "t": 644.10272, "r": 480.58771, "b": 652.89969, "coord_origin": "1"}}, {"id": 137, "text": "the use of HTML as ground-truth representation format for TSR.", "bbox": {"l": 134.76501, "t": 656.05772, "r": 421.45377, "b": 664.8547, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 134.28974075317385, "t": 93.54430103302002, "r": 139.49438409805296, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.862735390663147, "cells": [{"id": 0, "text": "2", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 167.31274366378784, "t": 92.9727201461792, "r": 231.72227, "b": 102.11999702453613, "coord_origin": "1"}, "confidence": 0.930519700050354, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "Text", "bbox": {"l": 133.9922842025757, "t": 125.57491407394411, "r": 480.75620498657224, "b": 200.46212196350098, "coord_origin": "1"}, "confidence": 0.8814666271209717, "cells": [{"id": 3, "text": "Fig. 
1.", "bbox": {"l": 134.765, "t": 126.33416999999997, "r": 162.64424, "b": 134.26049999999998, "coord_origin": "1"}}, {"id": 4, "text": "Comparison between HTML and OTSL table structure representation: (A)", "bbox": {"l": 167.062, "t": 126.39697000000001, "r": 480.59106, "b": 134.46667000000002, "coord_origin": "1"}}, {"id": 5, "text": "table-example with complex row and column headers, including a 2D empty span,", "bbox": {"l": 134.765, "t": 137.35595999999998, "r": 480.59018, "b": 145.42566, "coord_origin": "1"}}, {"id": 6, "text": "(B)", "bbox": {"l": 134.765, "t": 148.31493999999998, "r": 147.95433, "b": 156.38464, "coord_origin": "1"}}, {"id": 7, "text": "minimal graphical representation of table structure using rectangular layout, (C)", "bbox": {"l": 152.39224, "t": 148.31493999999998, "r": 480.59096999999997, "b": 156.38464, "coord_origin": "1"}}, {"id": 8, "text": "HTML representation, (D) OTSL representation. This example demonstrates many of", "bbox": {"l": 134.765, "t": 159.27392999999995, "r": 480.59189, "b": 167.34362999999996, "coord_origin": "1"}}, {"id": 9, "text": "the key-features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case),", "bbox": {"l": 134.765, "t": 170.23290999999995, "r": 480.58914000000004, "b": 178.30260999999996, "coord_origin": "1"}}, {"id": 10, "text": "its reduced sequence length (55 versus 30) and a enhanced internal structure (variable", "bbox": {"l": 134.765, "t": 181.19188999999994, "r": 480.59020999999996, "b": 189.26160000000004, "coord_origin": "1"}}, {"id": 11, "text": "token sequence length per row in HTML versus a fixed length of rows in OTSL).", "bbox": {"l": 134.765, "t": 192.15088000000003, "r": 460.87109, "b": 200.22058000000004, "coord_origin": "1"}}]}, {"id": 3, "label": "Picture", "bbox": {"l": 150.0213635444641, "t": 208.88499984741213, "r": 464.4815700531006, "b": 425.84868278503416, "coord_origin": "1"}, "confidence": 0.9741523265838623, "cells": [{"id": 12, "text": "C", "bbox": {"l": 
396.41107, "t": 280.98352, "r": 402.97336, "b": 289.50903, "coord_origin": "1"}}, {"id": 13, "text": "C", "bbox": {"l": 418.58682, "t": 280.89792, "r": 425.14911, "b": 289.42343, "coord_origin": "1"}}, {"id": 14, "text": "C", "bbox": {"l": 395.74835, "t": 303.23727, "r": 402.31064, "b": 311.76279, "coord_origin": "1"}}, {"id": 15, "text": "C", "bbox": {"l": 407.54214, "t": 303.36981, "r": 414.10443, "b": 311.89532, "coord_origin": "1"}}, {"id": 16, "text": "C", "bbox": {"l": 407.56335, "t": 314.40619, "r": 414.12564, "b": 322.9317, "coord_origin": "1"}}, {"id": 17, "text": "C", "bbox": {"l": 418.51108, "t": 292.08502000000004, "r": 425.07336, "b": 300.61053000000004, "coord_origin": "1"}}, {"id": 18, "text": "C", "bbox": {"l": 429.59744, "t": 292.09106, "r": 436.1597300000001, "b": 300.61658, "coord_origin": "1"}}, {"id": 19, "text": "C", "bbox": {"l": 440.68759000000006, "t": 292.01230000000004, "r": 447.24987999999996, "b": 300.53781000000004, "coord_origin": "1"}}, {"id": 20, "text": "C", "bbox": {"l": 418.6232, "t": 303.29483, "r": 425.18549, "b": 311.82034, "coord_origin": "1"}}, {"id": 21, "text": "C", "bbox": {"l": 429.7095299999999, "t": 303.30011, "r": 436.27182, "b": 311.82562, "coord_origin": "1"}}, {"id": 22, "text": "C", "bbox": {"l": 440.7996800000001, "t": 303.22211, "r": 447.36197, "b": 311.74762, "coord_origin": "1"}}, {"id": 23, "text": "C", "bbox": {"l": 418.62546, "t": 314.56903, "r": 425.18774, "b": 323.09454, "coord_origin": "1"}}, {"id": 24, "text": "C", "bbox": {"l": 429.71181999999993, "t": 314.57434, "r": 436.27411, "b": 323.09985, "coord_origin": "1"}}, {"id": 25, "text": "C", "bbox": {"l": 440.80194, "t": 314.49631, "r": 447.36423, "b": 323.02182, "coord_origin": "1"}}, {"id": 26, "text": "C", "bbox": {"l": 407.39746, "t": 325.29031, "r": 413.95975, "b": 333.81583, "coord_origin": "1"}}, {"id": 27, "text": "C", "bbox": {"l": 418.45959, "t": 325.45316, "r": 425.02188, "b": 333.97867, "coord_origin": "1"}}, {"id": 28, "text": "C", "bbox": 
{"l": 429.54593, "t": 325.4592, "r": 436.10822, "b": 333.98471, "coord_origin": "1"}}, {"id": 29, "text": "C", "bbox": {"l": 440.63608, "t": 325.38043, "r": 447.19836, "b": 333.90594, "coord_origin": "1"}}, {"id": 30, "text": "NL", "bbox": {"l": 451.89511000000005, "t": 280.15717, "r": 463.51273000000003, "b": 288.68268, "coord_origin": "1"}}, {"id": 31, "text": "NL", "bbox": {"l": 452.1557, "t": 291.59875000000005, "r": 463.77332, "b": 300.12427, "coord_origin": "1"}}, {"id": 32, "text": "NL", "bbox": {"l": 452.17688000000004, "t": 302.84265, "r": 463.79449000000005, "b": 311.36816, "coord_origin": "1"}}, {"id": 33, "text": "NL", "bbox": {"l": 452.09887999999995, "t": 314.12441999999993, "r": 463.71648999999996, "b": 322.6499299999999, "coord_origin": "1"}}, {"id": 34, "text": "NL", "bbox": {"l": 452.29733, "t": 325.46906, "r": 463.91495, "b": 333.99457, "coord_origin": "1"}}, {"id": 35, "text": "U", "bbox": {"l": 396.09677, "t": 314.49478, "r": 402.65906, "b": 323.02029000000005, "coord_origin": "1"}}, {"id": 36, "text": "U", "bbox": {"l": 395.99829, "t": 325.38876000000005, "r": 402.56058, "b": 333.91428, "coord_origin": "1"}}, {"id": 37, "text": "U", "bbox": {"l": 396.27475, "t": 292.27057, "r": 402.83704, "b": 300.79608, "coord_origin": "1"}}, {"id": 38, "text": "L", "bbox": {"l": 408.54724, "t": 280.96912, "r": 413.60074, "b": 289.49463, "coord_origin": "1"}}, {"id": 39, "text": "L", "bbox": {"l": 430.58966, "t": 280.49725, "r": 435.6431600000001, "b": 289.02277, "coord_origin": "1"}}, {"id": 40, "text": "L", "bbox": {"l": 441.08069, "t": 280.38062, "r": 446.13419, "b": 288.90613, "coord_origin": "1"}}, {"id": 41, "text": "X", "bbox": {"l": 407.97388, "t": 292.13425, "r": 414.03625, "b": 300.65976, "coord_origin": "1"}}, {"id": 42, "text": "NL", "bbox": {"l": 441.25640999999996, "t": 411.1807600000001, "r": 452.87402, "b": 419.7062700000001, "coord_origin": "1"}}, {"id": 43, "text": "vocabulary:", "bbox": {"l": 393.75256, "t": 399.7947700000001, "r": 
432.48929, "b": 406.89935, "coord_origin": "1"}}, {"id": 44, "text": "5", "bbox": {"l": 434.5896000000001, "t": 399.7947700000001, "r": 438.80083999999994, "b": 406.89935, "coord_origin": "1"}}, {"id": 45, "text": "tokens", "bbox": {"l": 440.90573, "t": 399.7947700000001, "r": 463.22235, "b": 406.89935, "coord_origin": "1"}}, {"id": 46, "text": "D OTSL", "bbox": {"l": 384.11816, "t": 258.54718, "r": 413.99307, "b": 265.65179, "coord_origin": "1"}}, {"id": 47, "text": "sequence length:", "bbox": {"l": 393.75256, "t": 266.67505000000006, "r": 451.45129000000003, "b": 273.77966000000004, "coord_origin": "1"}}, {"id": 48, "text": "30", "bbox": {"l": 453.55083999999994, "t": 266.67505000000006, "r": 461.97485, "b": 273.77966000000004, "coord_origin": "1"}}, {"id": 49, "text": "vocabulary for this table:", "bbox": {"l": 151.79318, "t": 399.76016, "r": 233.89371000000003, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 50, "text": "12", "bbox": {"l": 235.99332, "t": 399.76016, "r": 244.41734000000002, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 51, "text": "tokens", "bbox": {"l": 246.52222, "t": 399.76016, "r": 268.83884, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 52, "text": "A", "bbox": {"l": 154.3298, "t": 213.57457999999997, "r": 159.79837, "b": 220.67920000000004, "coord_origin": "1"}}, {"id": 53, "text": "B", "bbox": {"l": 321.07053, "t": 213.57457999999997, "r": 326.53909, "b": 220.67920000000004, "coord_origin": "1"}}, {"id": 54, "text": "<table>", "bbox": {"l": 153.0947, "t": 280.30411, "r": 175.83888, "b": 286.69824, "coord_origin": "1"}}, {"id": 55, "text": "<tr>", "bbox": {"l": 160.67039, "t": 287.12088, "r": 172.79608, "b": 293.51501, "coord_origin": "1"}}, {"id": 56, "text": "<td", "bbox": {"l": 168.24603, "t": 293.93765, "r": 177.91019, "b": 300.33179, "coord_origin": "1"}}, {"id": 57, "text": "colspan=\u201c2\u201d", "bbox": {"l": 179.80525, "t": 293.93765, "r": 215.61517, "b": 300.33179, "coord_origin": "1"}}, {"id": 58, 
"text": "rowspan=\u201c2\u201d", "bbox": {"l": 217.50886999999997, "t": 293.93765, "r": 255.58945, "b": 300.33179, "coord_origin": "1"}}, {"id": 59, "text": ">", "bbox": {"l": 257.48315, "t": 293.93765, "r": 261.46414, "b": 300.33179, "coord_origin": "1"}}, {"id": 60, "text": "</td>", "bbox": {"l": 263.35785, "t": 293.93765, "r": 278.89804, "b": 300.33179, "coord_origin": "1"}}, {"id": 61, "text": "<td", "bbox": {"l": 280.79175, "t": 293.93765, "r": 290.4559, "b": 300.33179, "coord_origin": "1"}}, {"id": 62, "text": "colspan=\u201c3\u201d", "bbox": {"l": 292.35095, "t": 293.93765, "r": 328.16083, "b": 300.33179, "coord_origin": "1"}}, {"id": 63, "text": ">", "bbox": {"l": 330.05457, "t": 293.93765, "r": 334.03555, "b": 300.33179, "coord_origin": "1"}}, {"id": 64, "text": "</td>", "bbox": {"l": 335.92926, "t": 293.93765, "r": 351.46945, "b": 300.33179, "coord_origin": "1"}}, {"id": 65, "text": "</tr>", "bbox": {"l": 160.67039, "t": 300.75442999999996, "r": 174.68979, "b": 307.14856, "coord_origin": "1"}}, {"id": 66, "text": "<tr>", "bbox": {"l": 160.67039, "t": 307.57122999999996, "r": 172.79608, "b": 313.96536, "coord_origin": "1"}}, {"id": 67, "text": "<td>", "bbox": {"l": 168.24603, "t": 314.388, "r": 181.89255, "b": 320.78214, "coord_origin": "1"}}, {"id": 68, "text": "</td>", "bbox": {"l": 183.78624, "t": 314.388, "r": 199.32646, "b": 320.78214, "coord_origin": "1"}}, {"id": 69, "text": "<td>", "bbox": {"l": 201.22015, "t": 314.388, "r": 214.86666999999997, "b": 320.78214, "coord_origin": "1"}}, {"id": 70, "text": "</td>", "bbox": {"l": 216.76038, "t": 314.388, "r": 232.30058, "b": 320.78214, "coord_origin": "1"}}, {"id": 71, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 314.388, "r": 247.84079000000003, "b": 320.78214, "coord_origin": "1"}}, {"id": 72, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 314.388, "r": 265.27469, "b": 320.78214, "coord_origin": "1"}}, {"id": 73, "text": "</tr>", "bbox": {"l": 160.67039, "t": 321.20477, "r": 
174.68979, "b": 327.59890999999993, "coord_origin": "1"}}, {"id": 74, "text": "<tr>", "bbox": {"l": 160.67039, "t": 328.02158, "r": 172.79608, "b": 334.41571000000005, "coord_origin": "1"}}, {"id": 75, "text": "<td rowspan=\u201c3\u201d > </td> <td> </td> <td> </td> <td> </td> <td> </td>", "bbox": {"l": 168.24603, "t": 334.83835, "r": 373.09091, "b": 341.23248, "coord_origin": "1"}}, {"id": 76, "text": "</tr>", "bbox": {"l": 160.67039, "t": 341.65512, "r": 174.68979, "b": 348.04926, "coord_origin": "1"}}, {"id": 77, "text": "<tr>", "bbox": {"l": 160.67039, "t": 348.47159, "r": 172.79608, "b": 354.86572, "coord_origin": "1"}}, {"id": 78, "text": "<td>", "bbox": {"l": 168.24603, "t": 355.28836000000007, "r": 181.89255, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 79, "text": "</td>", "bbox": {"l": 183.78624, "t": 355.28836000000007, "r": 199.32646, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 80, "text": "<td>", "bbox": {"l": 201.22015, "t": 355.28836000000007, "r": 214.86666999999997, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 81, "text": "</td>", "bbox": {"l": 216.76038, "t": 355.28836000000007, "r": 232.30058, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 82, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 355.28836000000007, "r": 247.84079000000003, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 83, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 355.28836000000007, "r": 265.27469, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 84, "text": "<td>", "bbox": {"l": 267.1684, "t": 355.28836000000007, "r": 280.81488, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 85, "text": "</td>", "bbox": {"l": 282.70862, "t": 355.28836000000007, "r": 298.24881, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 86, "text": "</tr>", "bbox": {"l": 160.67039, "t": 362.10516000000007, "r": 174.68979, "b": 368.49929999999995, "coord_origin": "1"}}, {"id": 87, "text": "<tr>", "bbox": {"l": 160.67039, "t": 
368.92194, "r": 172.79608, "b": 375.31607, "coord_origin": "1"}}, {"id": 88, "text": "<td>", "bbox": {"l": 168.24603, "t": 375.73871, "r": 181.89255, "b": 382.13284, "coord_origin": "1"}}, {"id": 89, "text": "</td>", "bbox": {"l": 183.78624, "t": 375.73871, "r": 199.32646, "b": 382.13284, "coord_origin": "1"}}, {"id": 90, "text": "<td>", "bbox": {"l": 201.22015, "t": 375.73871, "r": 214.86666999999997, "b": 382.13284, "coord_origin": "1"}}, {"id": 91, "text": "</td>", "bbox": {"l": 216.76038, "t": 375.73871, "r": 232.30058, "b": 382.13284, "coord_origin": "1"}}, {"id": 92, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 375.73871, "r": 247.84079000000003, "b": 382.13284, "coord_origin": "1"}}, {"id": 93, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 375.73871, "r": 265.27469, "b": 382.13284, "coord_origin": "1"}}, {"id": 94, "text": "<td>", "bbox": {"l": 267.1684, "t": 375.73871, "r": 280.81488, "b": 382.13284, "coord_origin": "1"}}, {"id": 95, "text": "</td>", "bbox": {"l": 282.70862, "t": 375.73871, "r": 298.24881, "b": 382.13284, "coord_origin": "1"}}, {"id": 96, "text": "</tr>", "bbox": {"l": 160.67039, "t": 382.55551, "r": 174.68979, "b": 388.94965, "coord_origin": "1"}}, {"id": 97, "text": "</table>", "bbox": {"l": 153.0947, "t": 389.37228, "r": 177.73259, "b": 395.76642, "coord_origin": "1"}}, {"id": 98, "text": "C", "bbox": {"l": 395.06137, "t": 411.33353, "r": 401.62366, "b": 419.85904, "coord_origin": "1"}}, {"id": 99, "text": "L", "bbox": {"l": 407.42249, "t": 411.33353, "r": 412.47598, "b": 419.85904, "coord_origin": "1"}}, {"id": 100, "text": "U", "bbox": {"l": 418.69287, "t": 411.33353, "r": 425.25516, "b": 419.85904, "coord_origin": "1"}}, {"id": 101, "text": "X", "bbox": {"l": 430.5086099999999, "t": 411.33353, "r": 436.5709800000001, "b": 419.85904, "coord_origin": "1"}}, {"id": 102, "text": "<table>", "bbox": {"l": 152.36208, "t": 409.77362, "r": 175.10626, "b": 416.16776, "coord_origin": "1"}}, {"id": 103, "text": "<tr>", "bbox": 
{"l": 178.89366, "t": 409.77362, "r": 191.01935, "b": 416.16776, "coord_origin": "1"}}, {"id": 104, "text": "</tr>", "bbox": {"l": 194.80676, "t": 409.77362, "r": 208.82614, "b": 416.16776, "coord_origin": "1"}}, {"id": 105, "text": "<td>", "bbox": {"l": 212.61354, "t": 409.77362, "r": 226.26003999999998, "b": 416.16776, "coord_origin": "1"}}, {"id": 106, "text": "</td>", "bbox": {"l": 230.04745000000003, "t": 409.77362, "r": 245.58765000000002, "b": 416.16776, "coord_origin": "1"}}, {"id": 107, "text": "<td", "bbox": {"l": 249.37506000000002, "t": 409.77362, "r": 259.03918, "b": 416.16776, "coord_origin": "1"}}, {"id": 108, "text": "colspan=\"2\"", "bbox": {"l": 262.82797, "t": 409.77362, "r": 298.93646, "b": 416.16776, "coord_origin": "1"}}, {"id": 109, "text": "colspan=\"3\"", "bbox": {"l": 302.72385, "t": 409.77362, "r": 338.83234, "b": 416.16776, "coord_origin": "1"}}, {"id": 110, "text": "rowspan=\"2\"", "bbox": {"l": 152.36208, "t": 418.10522, "r": 190.74123, "b": 424.49936, "coord_origin": "1"}}, {"id": 111, "text": "rowspan=\"3\"", "bbox": {"l": 194.52863, "t": 418.10522, "r": 232.90777999999997, "b": 424.49936, "coord_origin": "1"}}, {"id": 112, "text": ">", "bbox": {"l": 236.69518999999997, "t": 418.10522, "r": 240.67617999999996, "b": 424.49936, "coord_origin": "1"}}, {"id": 113, "text": "</table>", "bbox": {"l": 244.46358, "t": 418.10522, "r": 269.10144, "b": 424.49936, "coord_origin": "1"}}, {"id": 114, "text": "C", "bbox": {"l": 154.50595, "t": 258.60095, "r": 159.62473, "b": 265.70556999999997, "coord_origin": "1"}}, {"id": 115, "text": "HTML", "bbox": {"l": 164.74348, "t": 258.60095, "r": 185.21857, "b": 265.70556999999997, "coord_origin": "1"}}, {"id": 116, "text": "sequence length:", "bbox": {"l": 164.3548, "t": 266.49707, "r": 222.05352999999997, "b": 273.60168, "coord_origin": "1"}}, {"id": 117, "text": "55", "bbox": {"l": 224.15326, "t": 266.49707, "r": 232.57729, "b": 273.60168, "coord_origin": "1"}}]}, {"id": 4, "label": "Text", "bbox": 
{"l": 133.95978956222532, "t": 451.48462371826173, "r": 480.59232000000003, "b": 521.5370635986328, "coord_origin": "1"}, "confidence": 0.9744325280189514, "cells": [{"id": 118, "text": "today,", "bbox": {"l": 134.765, "t": 452.31378, "r": 161.32928, "b": 461.11075, "coord_origin": "1"}}, {"id": 119, "text": "table detection", "bbox": {"l": 164.269, "t": 452.31378, "r": 226.28617999999997, "b": 461.11075, "coord_origin": "1"}}, {"id": 120, "text": "in documents is a well understood problem, and the latest", "bbox": {"l": 229.992, "t": 452.31378, "r": 480.59232000000003, "b": 461.11075, "coord_origin": "1"}}, {"id": 121, "text": "state-of-the-art (SOTA) object detection methods provide an accuracy compa-", "bbox": {"l": 134.76501, "t": 464.26877, "r": 480.59180000000003, "b": 473.06573, "coord_origin": "1"}}, {"id": 122, "text": "rable to human observers [7,8,10,14,23]. On the other hand, the problem of table", "bbox": {"l": 134.76501, "t": 476.22375, "r": 480.58673, "b": 485.02072, "coord_origin": "1"}}, {"id": 123, "text": "structure recognition (TSR) is a lot more challenging and remains a very active", "bbox": {"l": 134.76501, "t": 488.17975, "r": 480.58658, "b": 496.97672, "coord_origin": "1"}}, {"id": 124, "text": "area of research, in which many novel machine learning algorithms are being", "bbox": {"l": 134.76501, "t": 500.13474, "r": 480.58978, "b": 508.9317, "coord_origin": "1"}}, {"id": 125, "text": "explored [3,4,5,9,11,12,13,14,17,18,21,22].", "bbox": {"l": 134.76501, "t": 512.0897199999999, "r": 313.24597, "b": 520.88669, "coord_origin": "1"}}]}, {"id": 5, "label": "Text", "bbox": {"l": 133.8620867729187, "t": 523.3501098632813, "r": 480.59482, "b": 665.1943176269532, "coord_origin": "1"}, "confidence": 0.9866191148757935, "cells": [{"id": 126, "text": "Recently emerging SOTA methods for table structure recognition employ", "bbox": {"l": 149.70901, "t": 524.55072, "r": 480.58884000000006, "b": 533.3476900000001, "coord_origin": "1"}}, {"id": 127, 
"text": "transformer-based models, in which an image of the table is provided to the net-", "bbox": {"l": 134.76501, "t": 536.50671, "r": 480.5917400000001, "b": 545.30368, "coord_origin": "1"}}, {"id": 128, "text": "work in order to predict the structure of the table as a sequence of tokens. These", "bbox": {"l": 134.76501, "t": 548.46172, "r": 480.58868, "b": 557.25868, "coord_origin": "1"}}, {"id": 129, "text": "image-to-sequence (Im2Seq) models are extremely powerful, since they allow for", "bbox": {"l": 134.76501, "t": 560.41672, "r": 480.58795, "b": 569.2136800000001, "coord_origin": "1"}}, {"id": 130, "text": "a purely data-driven solution. The tokens of the sequence typically belong to a", "bbox": {"l": 134.76501, "t": 572.37172, "r": 480.58978, "b": 581.16869, "coord_origin": "1"}}, {"id": 131, "text": "markup language such as HTML, Latex or Markdown, which allow to describe", "bbox": {"l": 134.76501, "t": 584.32672, "r": 480.59479, "b": 593.12369, "coord_origin": "1"}}, {"id": 132, "text": "table structure as rows, columns and spanning cells in various configurations.", "bbox": {"l": 134.76501, "t": 596.28271, "r": 480.58678999999995, "b": 605.0796799999999, "coord_origin": "1"}}, {"id": 133, "text": "In Figure 1, we illustrate how HTML is used to represent the table-structure", "bbox": {"l": 134.76501, "t": 608.23772, "r": 480.59476, "b": 617.03468, "coord_origin": "1"}}, {"id": 134, "text": "of a particular example table. Public table-structure data sets such as PubTab-", "bbox": {"l": 134.76501, "t": 620.19272, "r": 480.5938100000001, "b": 628.98969, "coord_origin": "1"}}, {"id": 135, "text": "Net [22], and FinTabNet [21], which were created in a semi-automated way from", "bbox": {"l": 134.76501, "t": 632.1477199999999, "r": 480.59482, "b": 640.94469, "coord_origin": "1"}}, {"id": 136, "text": "paired PDF and HTML sources (e.g. 
PubMed Central), popularized primarily", "bbox": {"l": 134.76501, "t": 644.10272, "r": 480.58771, "b": 652.89969, "coord_origin": "1"}}, {"id": 137, "text": "the use of HTML as ground-truth representation format for TSR.", "bbox": {"l": 134.76501, "t": 656.05772, "r": 421.45377, "b": 664.8547, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 1, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.28974075317385, "t": 93.54430103302002, "r": 139.49438409805296, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.862735390663147, "cells": [{"id": 0, "text": "2", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "2"}, {"label": "Page-header", "id": 1, "page_no": 1, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 167.31274366378784, "t": 92.9727201461792, "r": 231.72227, "b": 102.11999702453613, "coord_origin": "1"}, "confidence": 0.930519700050354, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "M. Lysak, et al."}, {"label": "Text", "id": 2, "page_no": 1, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.9922842025757, "t": 125.57491407394411, "r": 480.75620498657224, "b": 200.46212196350098, "coord_origin": "1"}, "confidence": 0.8814666271209717, "cells": [{"id": 3, "text": "Fig. 
1.", "bbox": {"l": 134.765, "t": 126.33416999999997, "r": 162.64424, "b": 134.26049999999998, "coord_origin": "1"}}, {"id": 4, "text": "Comparison between HTML and OTSL table structure representation: (A)", "bbox": {"l": 167.062, "t": 126.39697000000001, "r": 480.59106, "b": 134.46667000000002, "coord_origin": "1"}}, {"id": 5, "text": "table-example with complex row and column headers, including a 2D empty span,", "bbox": {"l": 134.765, "t": 137.35595999999998, "r": 480.59018, "b": 145.42566, "coord_origin": "1"}}, {"id": 6, "text": "(B)", "bbox": {"l": 134.765, "t": 148.31493999999998, "r": 147.95433, "b": 156.38464, "coord_origin": "1"}}, {"id": 7, "text": "minimal graphical representation of table structure using rectangular layout, (C)", "bbox": {"l": 152.39224, "t": 148.31493999999998, "r": 480.59096999999997, "b": 156.38464, "coord_origin": "1"}}, {"id": 8, "text": "HTML representation, (D) OTSL representation. This example demonstrates many of", "bbox": {"l": 134.765, "t": 159.27392999999995, "r": 480.59189, "b": 167.34362999999996, "coord_origin": "1"}}, {"id": 9, "text": "the key-features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case),", "bbox": {"l": 134.765, "t": 170.23290999999995, "r": 480.58914000000004, "b": 178.30260999999996, "coord_origin": "1"}}, {"id": 10, "text": "its reduced sequence length (55 versus 30) and a enhanced internal structure (variable", "bbox": {"l": 134.765, "t": 181.19188999999994, "r": 480.59020999999996, "b": 189.26160000000004, "coord_origin": "1"}}, {"id": 11, "text": "token sequence length per row in HTML versus a fixed length of rows in OTSL).", "bbox": {"l": 134.765, "t": 192.15088000000003, "r": 460.87109, "b": 200.22058000000004, "coord_origin": "1"}}]}, "text": "Fig. 1. 
Comparison between HTML and OTSL table structure representation: (A) table-example with complex row and column headers, including a 2D empty span, (B) minimal graphical representation of table structure using rectangular layout, (C) HTML representation, (D) OTSL representation. This example demonstrates many of the key-features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case), its reduced sequence length (55 versus 30) and a enhanced internal structure (variable token sequence length per row in HTML versus a fixed length of rows in OTSL)."}, {"label": "Picture", "id": 3, "page_no": 1, "cluster": {"id": 3, "label": "Picture", "bbox": {"l": 150.0213635444641, "t": 208.88499984741213, "r": 464.4815700531006, "b": 425.84868278503416, "coord_origin": "1"}, "confidence": 0.9741523265838623, "cells": [{"id": 12, "text": "C", "bbox": {"l": 396.41107, "t": 280.98352, "r": 402.97336, "b": 289.50903, "coord_origin": "1"}}, {"id": 13, "text": "C", "bbox": {"l": 418.58682, "t": 280.89792, "r": 425.14911, "b": 289.42343, "coord_origin": "1"}}, {"id": 14, "text": "C", "bbox": {"l": 395.74835, "t": 303.23727, "r": 402.31064, "b": 311.76279, "coord_origin": "1"}}, {"id": 15, "text": "C", "bbox": {"l": 407.54214, "t": 303.36981, "r": 414.10443, "b": 311.89532, "coord_origin": "1"}}, {"id": 16, "text": "C", "bbox": {"l": 407.56335, "t": 314.40619, "r": 414.12564, "b": 322.9317, "coord_origin": "1"}}, {"id": 17, "text": "C", "bbox": {"l": 418.51108, "t": 292.08502000000004, "r": 425.07336, "b": 300.61053000000004, "coord_origin": "1"}}, {"id": 18, "text": "C", "bbox": {"l": 429.59744, "t": 292.09106, "r": 436.1597300000001, "b": 300.61658, "coord_origin": "1"}}, {"id": 19, "text": "C", "bbox": {"l": 440.68759000000006, "t": 292.01230000000004, "r": 447.24987999999996, "b": 300.53781000000004, "coord_origin": "1"}}, {"id": 20, "text": "C", "bbox": {"l": 418.6232, "t": 303.29483, "r": 425.18549, "b": 311.82034, "coord_origin": "1"}}, {"id": 21, "text": "C", "bbox": 
{"l": 429.7095299999999, "t": 303.30011, "r": 436.27182, "b": 311.82562, "coord_origin": "1"}}, {"id": 22, "text": "C", "bbox": {"l": 440.7996800000001, "t": 303.22211, "r": 447.36197, "b": 311.74762, "coord_origin": "1"}}, {"id": 23, "text": "C", "bbox": {"l": 418.62546, "t": 314.56903, "r": 425.18774, "b": 323.09454, "coord_origin": "1"}}, {"id": 24, "text": "C", "bbox": {"l": 429.71181999999993, "t": 314.57434, "r": 436.27411, "b": 323.09985, "coord_origin": "1"}}, {"id": 25, "text": "C", "bbox": {"l": 440.80194, "t": 314.49631, "r": 447.36423, "b": 323.02182, "coord_origin": "1"}}, {"id": 26, "text": "C", "bbox": {"l": 407.39746, "t": 325.29031, "r": 413.95975, "b": 333.81583, "coord_origin": "1"}}, {"id": 27, "text": "C", "bbox": {"l": 418.45959, "t": 325.45316, "r": 425.02188, "b": 333.97867, "coord_origin": "1"}}, {"id": 28, "text": "C", "bbox": {"l": 429.54593, "t": 325.4592, "r": 436.10822, "b": 333.98471, "coord_origin": "1"}}, {"id": 29, "text": "C", "bbox": {"l": 440.63608, "t": 325.38043, "r": 447.19836, "b": 333.90594, "coord_origin": "1"}}, {"id": 30, "text": "NL", "bbox": {"l": 451.89511000000005, "t": 280.15717, "r": 463.51273000000003, "b": 288.68268, "coord_origin": "1"}}, {"id": 31, "text": "NL", "bbox": {"l": 452.1557, "t": 291.59875000000005, "r": 463.77332, "b": 300.12427, "coord_origin": "1"}}, {"id": 32, "text": "NL", "bbox": {"l": 452.17688000000004, "t": 302.84265, "r": 463.79449000000005, "b": 311.36816, "coord_origin": "1"}}, {"id": 33, "text": "NL", "bbox": {"l": 452.09887999999995, "t": 314.12441999999993, "r": 463.71648999999996, "b": 322.6499299999999, "coord_origin": "1"}}, {"id": 34, "text": "NL", "bbox": {"l": 452.29733, "t": 325.46906, "r": 463.91495, "b": 333.99457, "coord_origin": "1"}}, {"id": 35, "text": "U", "bbox": {"l": 396.09677, "t": 314.49478, "r": 402.65906, "b": 323.02029000000005, "coord_origin": "1"}}, {"id": 36, "text": "U", "bbox": {"l": 395.99829, "t": 325.38876000000005, "r": 402.56058, "b": 333.91428, 
"coord_origin": "1"}}, {"id": 37, "text": "U", "bbox": {"l": 396.27475, "t": 292.27057, "r": 402.83704, "b": 300.79608, "coord_origin": "1"}}, {"id": 38, "text": "L", "bbox": {"l": 408.54724, "t": 280.96912, "r": 413.60074, "b": 289.49463, "coord_origin": "1"}}, {"id": 39, "text": "L", "bbox": {"l": 430.58966, "t": 280.49725, "r": 435.6431600000001, "b": 289.02277, "coord_origin": "1"}}, {"id": 40, "text": "L", "bbox": {"l": 441.08069, "t": 280.38062, "r": 446.13419, "b": 288.90613, "coord_origin": "1"}}, {"id": 41, "text": "X", "bbox": {"l": 407.97388, "t": 292.13425, "r": 414.03625, "b": 300.65976, "coord_origin": "1"}}, {"id": 42, "text": "NL", "bbox": {"l": 441.25640999999996, "t": 411.1807600000001, "r": 452.87402, "b": 419.7062700000001, "coord_origin": "1"}}, {"id": 43, "text": "vocabulary:", "bbox": {"l": 393.75256, "t": 399.7947700000001, "r": 432.48929, "b": 406.89935, "coord_origin": "1"}}, {"id": 44, "text": "5", "bbox": {"l": 434.5896000000001, "t": 399.7947700000001, "r": 438.80083999999994, "b": 406.89935, "coord_origin": "1"}}, {"id": 45, "text": "tokens", "bbox": {"l": 440.90573, "t": 399.7947700000001, "r": 463.22235, "b": 406.89935, "coord_origin": "1"}}, {"id": 46, "text": "D OTSL", "bbox": {"l": 384.11816, "t": 258.54718, "r": 413.99307, "b": 265.65179, "coord_origin": "1"}}, {"id": 47, "text": "sequence length:", "bbox": {"l": 393.75256, "t": 266.67505000000006, "r": 451.45129000000003, "b": 273.77966000000004, "coord_origin": "1"}}, {"id": 48, "text": "30", "bbox": {"l": 453.55083999999994, "t": 266.67505000000006, "r": 461.97485, "b": 273.77966000000004, "coord_origin": "1"}}, {"id": 49, "text": "vocabulary for this table:", "bbox": {"l": 151.79318, "t": 399.76016, "r": 233.89371000000003, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 50, "text": "12", "bbox": {"l": 235.99332, "t": 399.76016, "r": 244.41734000000002, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 51, "text": "tokens", "bbox": {"l": 246.52222, "t": 399.76016, 
"r": 268.83884, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 52, "text": "A", "bbox": {"l": 154.3298, "t": 213.57457999999997, "r": 159.79837, "b": 220.67920000000004, "coord_origin": "1"}}, {"id": 53, "text": "B", "bbox": {"l": 321.07053, "t": 213.57457999999997, "r": 326.53909, "b": 220.67920000000004, "coord_origin": "1"}}, {"id": 54, "text": "<table>", "bbox": {"l": 153.0947, "t": 280.30411, "r": 175.83888, "b": 286.69824, "coord_origin": "1"}}, {"id": 55, "text": "<tr>", "bbox": {"l": 160.67039, "t": 287.12088, "r": 172.79608, "b": 293.51501, "coord_origin": "1"}}, {"id": 56, "text": "<td", "bbox": {"l": 168.24603, "t": 293.93765, "r": 177.91019, "b": 300.33179, "coord_origin": "1"}}, {"id": 57, "text": "colspan=\u201c2\u201d", "bbox": {"l": 179.80525, "t": 293.93765, "r": 215.61517, "b": 300.33179, "coord_origin": "1"}}, {"id": 58, "text": "rowspan=\u201c2\u201d", "bbox": {"l": 217.50886999999997, "t": 293.93765, "r": 255.58945, "b": 300.33179, "coord_origin": "1"}}, {"id": 59, "text": ">", "bbox": {"l": 257.48315, "t": 293.93765, "r": 261.46414, "b": 300.33179, "coord_origin": "1"}}, {"id": 60, "text": "</td>", "bbox": {"l": 263.35785, "t": 293.93765, "r": 278.89804, "b": 300.33179, "coord_origin": "1"}}, {"id": 61, "text": "<td", "bbox": {"l": 280.79175, "t": 293.93765, "r": 290.4559, "b": 300.33179, "coord_origin": "1"}}, {"id": 62, "text": "colspan=\u201c3\u201d", "bbox": {"l": 292.35095, "t": 293.93765, "r": 328.16083, "b": 300.33179, "coord_origin": "1"}}, {"id": 63, "text": ">", "bbox": {"l": 330.05457, "t": 293.93765, "r": 334.03555, "b": 300.33179, "coord_origin": "1"}}, {"id": 64, "text": "</td>", "bbox": {"l": 335.92926, "t": 293.93765, "r": 351.46945, "b": 300.33179, "coord_origin": "1"}}, {"id": 65, "text": "</tr>", "bbox": {"l": 160.67039, "t": 300.75442999999996, "r": 174.68979, "b": 307.14856, "coord_origin": "1"}}, {"id": 66, "text": "<tr>", "bbox": {"l": 160.67039, "t": 307.57122999999996, "r": 172.79608, "b": 313.96536, 
"coord_origin": "1"}}, {"id": 67, "text": "<td>", "bbox": {"l": 168.24603, "t": 314.388, "r": 181.89255, "b": 320.78214, "coord_origin": "1"}}, {"id": 68, "text": "</td>", "bbox": {"l": 183.78624, "t": 314.388, "r": 199.32646, "b": 320.78214, "coord_origin": "1"}}, {"id": 69, "text": "<td>", "bbox": {"l": 201.22015, "t": 314.388, "r": 214.86666999999997, "b": 320.78214, "coord_origin": "1"}}, {"id": 70, "text": "</td>", "bbox": {"l": 216.76038, "t": 314.388, "r": 232.30058, "b": 320.78214, "coord_origin": "1"}}, {"id": 71, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 314.388, "r": 247.84079000000003, "b": 320.78214, "coord_origin": "1"}}, {"id": 72, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 314.388, "r": 265.27469, "b": 320.78214, "coord_origin": "1"}}, {"id": 73, "text": "</tr>", "bbox": {"l": 160.67039, "t": 321.20477, "r": 174.68979, "b": 327.59890999999993, "coord_origin": "1"}}, {"id": 74, "text": "<tr>", "bbox": {"l": 160.67039, "t": 328.02158, "r": 172.79608, "b": 334.41571000000005, "coord_origin": "1"}}, {"id": 75, "text": "<td rowspan=\u201c3\u201d > </td> <td> </td> <td> </td> <td> </td> <td> </td>", "bbox": {"l": 168.24603, "t": 334.83835, "r": 373.09091, "b": 341.23248, "coord_origin": "1"}}, {"id": 76, "text": "</tr>", "bbox": {"l": 160.67039, "t": 341.65512, "r": 174.68979, "b": 348.04926, "coord_origin": "1"}}, {"id": 77, "text": "<tr>", "bbox": {"l": 160.67039, "t": 348.47159, "r": 172.79608, "b": 354.86572, "coord_origin": "1"}}, {"id": 78, "text": "<td>", "bbox": {"l": 168.24603, "t": 355.28836000000007, "r": 181.89255, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 79, "text": "</td>", "bbox": {"l": 183.78624, "t": 355.28836000000007, "r": 199.32646, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 80, "text": "<td>", "bbox": {"l": 201.22015, "t": 355.28836000000007, "r": 214.86666999999997, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 81, "text": "</td>", "bbox": {"l": 216.76038, "t": 
355.28836000000007, "r": 232.30058, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 82, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 355.28836000000007, "r": 247.84079000000003, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 83, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 355.28836000000007, "r": 265.27469, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 84, "text": "<td>", "bbox": {"l": 267.1684, "t": 355.28836000000007, "r": 280.81488, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 85, "text": "</td>", "bbox": {"l": 282.70862, "t": 355.28836000000007, "r": 298.24881, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 86, "text": "</tr>", "bbox": {"l": 160.67039, "t": 362.10516000000007, "r": 174.68979, "b": 368.49929999999995, "coord_origin": "1"}}, {"id": 87, "text": "<tr>", "bbox": {"l": 160.67039, "t": 368.92194, "r": 172.79608, "b": 375.31607, "coord_origin": "1"}}, {"id": 88, "text": "<td>", "bbox": {"l": 168.24603, "t": 375.73871, "r": 181.89255, "b": 382.13284, "coord_origin": "1"}}, {"id": 89, "text": "</td>", "bbox": {"l": 183.78624, "t": 375.73871, "r": 199.32646, "b": 382.13284, "coord_origin": "1"}}, {"id": 90, "text": "<td>", "bbox": {"l": 201.22015, "t": 375.73871, "r": 214.86666999999997, "b": 382.13284, "coord_origin": "1"}}, {"id": 91, "text": "</td>", "bbox": {"l": 216.76038, "t": 375.73871, "r": 232.30058, "b": 382.13284, "coord_origin": "1"}}, {"id": 92, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 375.73871, "r": 247.84079000000003, "b": 382.13284, "coord_origin": "1"}}, {"id": 93, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 375.73871, "r": 265.27469, "b": 382.13284, "coord_origin": "1"}}, {"id": 94, "text": "<td>", "bbox": {"l": 267.1684, "t": 375.73871, "r": 280.81488, "b": 382.13284, "coord_origin": "1"}}, {"id": 95, "text": "</td>", "bbox": {"l": 282.70862, "t": 375.73871, "r": 298.24881, "b": 382.13284, "coord_origin": "1"}}, {"id": 96, "text": "</tr>", "bbox": 
{"l": 160.67039, "t": 382.55551, "r": 174.68979, "b": 388.94965, "coord_origin": "1"}}, {"id": 97, "text": "</table>", "bbox": {"l": 153.0947, "t": 389.37228, "r": 177.73259, "b": 395.76642, "coord_origin": "1"}}, {"id": 98, "text": "C", "bbox": {"l": 395.06137, "t": 411.33353, "r": 401.62366, "b": 419.85904, "coord_origin": "1"}}, {"id": 99, "text": "L", "bbox": {"l": 407.42249, "t": 411.33353, "r": 412.47598, "b": 419.85904, "coord_origin": "1"}}, {"id": 100, "text": "U", "bbox": {"l": 418.69287, "t": 411.33353, "r": 425.25516, "b": 419.85904, "coord_origin": "1"}}, {"id": 101, "text": "X", "bbox": {"l": 430.5086099999999, "t": 411.33353, "r": 436.5709800000001, "b": 419.85904, "coord_origin": "1"}}, {"id": 102, "text": "<table>", "bbox": {"l": 152.36208, "t": 409.77362, "r": 175.10626, "b": 416.16776, "coord_origin": "1"}}, {"id": 103, "text": "<tr>", "bbox": {"l": 178.89366, "t": 409.77362, "r": 191.01935, "b": 416.16776, "coord_origin": "1"}}, {"id": 104, "text": "</tr>", "bbox": {"l": 194.80676, "t": 409.77362, "r": 208.82614, "b": 416.16776, "coord_origin": "1"}}, {"id": 105, "text": "<td>", "bbox": {"l": 212.61354, "t": 409.77362, "r": 226.26003999999998, "b": 416.16776, "coord_origin": "1"}}, {"id": 106, "text": "</td>", "bbox": {"l": 230.04745000000003, "t": 409.77362, "r": 245.58765000000002, "b": 416.16776, "coord_origin": "1"}}, {"id": 107, "text": "<td", "bbox": {"l": 249.37506000000002, "t": 409.77362, "r": 259.03918, "b": 416.16776, "coord_origin": "1"}}, {"id": 108, "text": "colspan=\"2\"", "bbox": {"l": 262.82797, "t": 409.77362, "r": 298.93646, "b": 416.16776, "coord_origin": "1"}}, {"id": 109, "text": "colspan=\"3\"", "bbox": {"l": 302.72385, "t": 409.77362, "r": 338.83234, "b": 416.16776, "coord_origin": "1"}}, {"id": 110, "text": "rowspan=\"2\"", "bbox": {"l": 152.36208, "t": 418.10522, "r": 190.74123, "b": 424.49936, "coord_origin": "1"}}, {"id": 111, "text": "rowspan=\"3\"", "bbox": {"l": 194.52863, "t": 418.10522, "r": 232.90777999999997, 
"b": 424.49936, "coord_origin": "1"}}, {"id": 112, "text": ">", "bbox": {"l": 236.69518999999997, "t": 418.10522, "r": 240.67617999999996, "b": 424.49936, "coord_origin": "1"}}, {"id": 113, "text": "</table>", "bbox": {"l": 244.46358, "t": 418.10522, "r": 269.10144, "b": 424.49936, "coord_origin": "1"}}, {"id": 114, "text": "C", "bbox": {"l": 154.50595, "t": 258.60095, "r": 159.62473, "b": 265.70556999999997, "coord_origin": "1"}}, {"id": 115, "text": "HTML", "bbox": {"l": 164.74348, "t": 258.60095, "r": 185.21857, "b": 265.70556999999997, "coord_origin": "1"}}, {"id": 116, "text": "sequence length:", "bbox": {"l": 164.3548, "t": 266.49707, "r": 222.05352999999997, "b": 273.60168, "coord_origin": "1"}}, {"id": 117, "text": "55", "bbox": {"l": 224.15326, "t": 266.49707, "r": 232.57729, "b": 273.60168, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "Text", "id": 4, "page_no": 1, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.95978956222532, "t": 451.48462371826173, "r": 480.59232000000003, "b": 521.5370635986328, "coord_origin": "1"}, "confidence": 0.9744325280189514, "cells": [{"id": 118, "text": "today,", "bbox": {"l": 134.765, "t": 452.31378, "r": 161.32928, "b": 461.11075, "coord_origin": "1"}}, {"id": 119, "text": "table detection", "bbox": {"l": 164.269, "t": 452.31378, "r": 226.28617999999997, "b": 461.11075, "coord_origin": "1"}}, {"id": 120, "text": "in documents is a well understood problem, and the latest", "bbox": {"l": 229.992, "t": 452.31378, "r": 480.59232000000003, "b": 461.11075, "coord_origin": "1"}}, {"id": 121, "text": "state-of-the-art (SOTA) object detection methods provide an accuracy compa-", "bbox": {"l": 134.76501, "t": 464.26877, "r": 480.59180000000003, "b": 473.06573, "coord_origin": "1"}}, {"id": 122, "text": "rable to human observers [7,8,10,14,23]. 
On the other hand, the problem of table", "bbox": {"l": 134.76501, "t": 476.22375, "r": 480.58673, "b": 485.02072, "coord_origin": "1"}}, {"id": 123, "text": "structure recognition (TSR) is a lot more challenging and remains a very active", "bbox": {"l": 134.76501, "t": 488.17975, "r": 480.58658, "b": 496.97672, "coord_origin": "1"}}, {"id": 124, "text": "area of research, in which many novel machine learning algorithms are being", "bbox": {"l": 134.76501, "t": 500.13474, "r": 480.58978, "b": 508.9317, "coord_origin": "1"}}, {"id": 125, "text": "explored [3,4,5,9,11,12,13,14,17,18,21,22].", "bbox": {"l": 134.76501, "t": 512.0897199999999, "r": 313.24597, "b": 520.88669, "coord_origin": "1"}}]}, "text": "today, table detection in documents is a well understood problem, and the latest state-of-the-art (SOTA) object detection methods provide an accuracy comparable to human observers [7,8,10,14,23]. On the other hand, the problem of table structure recognition (TSR) is a lot more challenging and remains a very active area of research, in which many novel machine learning algorithms are being explored [3,4,5,9,11,12,13,14,17,18,21,22]."}, {"label": "Text", "id": 5, "page_no": 1, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.8620867729187, "t": 523.3501098632813, "r": 480.59482, "b": 665.1943176269532, "coord_origin": "1"}, "confidence": 0.9866191148757935, "cells": [{"id": 126, "text": "Recently emerging SOTA methods for table structure recognition employ", "bbox": {"l": 149.70901, "t": 524.55072, "r": 480.58884000000006, "b": 533.3476900000001, "coord_origin": "1"}}, {"id": 127, "text": "transformer-based models, in which an image of the table is provided to the net-", "bbox": {"l": 134.76501, "t": 536.50671, "r": 480.5917400000001, "b": 545.30368, "coord_origin": "1"}}, {"id": 128, "text": "work in order to predict the structure of the table as a sequence of tokens. 
These", "bbox": {"l": 134.76501, "t": 548.46172, "r": 480.58868, "b": 557.25868, "coord_origin": "1"}}, {"id": 129, "text": "image-to-sequence (Im2Seq) models are extremely powerful, since they allow for", "bbox": {"l": 134.76501, "t": 560.41672, "r": 480.58795, "b": 569.2136800000001, "coord_origin": "1"}}, {"id": 130, "text": "a purely data-driven solution. The tokens of the sequence typically belong to a", "bbox": {"l": 134.76501, "t": 572.37172, "r": 480.58978, "b": 581.16869, "coord_origin": "1"}}, {"id": 131, "text": "markup language such as HTML, Latex or Markdown, which allow to describe", "bbox": {"l": 134.76501, "t": 584.32672, "r": 480.59479, "b": 593.12369, "coord_origin": "1"}}, {"id": 132, "text": "table structure as rows, columns and spanning cells in various configurations.", "bbox": {"l": 134.76501, "t": 596.28271, "r": 480.58678999999995, "b": 605.0796799999999, "coord_origin": "1"}}, {"id": 133, "text": "In Figure 1, we illustrate how HTML is used to represent the table-structure", "bbox": {"l": 134.76501, "t": 608.23772, "r": 480.59476, "b": 617.03468, "coord_origin": "1"}}, {"id": 134, "text": "of a particular example table. Public table-structure data sets such as PubTab-", "bbox": {"l": 134.76501, "t": 620.19272, "r": 480.5938100000001, "b": 628.98969, "coord_origin": "1"}}, {"id": 135, "text": "Net [22], and FinTabNet [21], which were created in a semi-automated way from", "bbox": {"l": 134.76501, "t": 632.1477199999999, "r": 480.59482, "b": 640.94469, "coord_origin": "1"}}, {"id": 136, "text": "paired PDF and HTML sources (e.g. 
PubMed Central), popularized primarily", "bbox": {"l": 134.76501, "t": 644.10272, "r": 480.58771, "b": 652.89969, "coord_origin": "1"}}, {"id": 137, "text": "the use of HTML as ground-truth representation format for TSR.", "bbox": {"l": 134.76501, "t": 656.05772, "r": 421.45377, "b": 664.8547, "coord_origin": "1"}}]}, "text": "Recently emerging SOTA methods for table structure recognition employ transformer-based models, in which an image of the table is provided to the network in order to predict the structure of the table as a sequence of tokens. These image-to-sequence (Im2Seq) models are extremely powerful, since they allow for a purely data-driven solution. The tokens of the sequence typically belong to a markup language such as HTML, Latex or Markdown, which allow to describe table structure as rows, columns and spanning cells in various configurations. In Figure 1, we illustrate how HTML is used to represent the table-structure of a particular example table. Public table-structure data sets such as PubTabNet [22], and FinTabNet [21], which were created in a semi-automated way from paired PDF and HTML sources (e.g. PubMed Central), popularized primarily the use of HTML as ground-truth representation format for TSR."}], "body": [{"label": "Text", "id": 2, "page_no": 1, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.9922842025757, "t": 125.57491407394411, "r": 480.75620498657224, "b": 200.46212196350098, "coord_origin": "1"}, "confidence": 0.8814666271209717, "cells": [{"id": 3, "text": "Fig. 
1.", "bbox": {"l": 134.765, "t": 126.33416999999997, "r": 162.64424, "b": 134.26049999999998, "coord_origin": "1"}}, {"id": 4, "text": "Comparison between HTML and OTSL table structure representation: (A)", "bbox": {"l": 167.062, "t": 126.39697000000001, "r": 480.59106, "b": 134.46667000000002, "coord_origin": "1"}}, {"id": 5, "text": "table-example with complex row and column headers, including a 2D empty span,", "bbox": {"l": 134.765, "t": 137.35595999999998, "r": 480.59018, "b": 145.42566, "coord_origin": "1"}}, {"id": 6, "text": "(B)", "bbox": {"l": 134.765, "t": 148.31493999999998, "r": 147.95433, "b": 156.38464, "coord_origin": "1"}}, {"id": 7, "text": "minimal graphical representation of table structure using rectangular layout, (C)", "bbox": {"l": 152.39224, "t": 148.31493999999998, "r": 480.59096999999997, "b": 156.38464, "coord_origin": "1"}}, {"id": 8, "text": "HTML representation, (D) OTSL representation. This example demonstrates many of", "bbox": {"l": 134.765, "t": 159.27392999999995, "r": 480.59189, "b": 167.34362999999996, "coord_origin": "1"}}, {"id": 9, "text": "the key-features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case),", "bbox": {"l": 134.765, "t": 170.23290999999995, "r": 480.58914000000004, "b": 178.30260999999996, "coord_origin": "1"}}, {"id": 10, "text": "its reduced sequence length (55 versus 30) and a enhanced internal structure (variable", "bbox": {"l": 134.765, "t": 181.19188999999994, "r": 480.59020999999996, "b": 189.26160000000004, "coord_origin": "1"}}, {"id": 11, "text": "token sequence length per row in HTML versus a fixed length of rows in OTSL).", "bbox": {"l": 134.765, "t": 192.15088000000003, "r": 460.87109, "b": 200.22058000000004, "coord_origin": "1"}}]}, "text": "Fig. 1. 
Comparison between HTML and OTSL table structure representation: (A) table-example with complex row and column headers, including a 2D empty span, (B) minimal graphical representation of table structure using rectangular layout, (C) HTML representation, (D) OTSL representation. This example demonstrates many of the key-features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case), its reduced sequence length (55 versus 30) and a enhanced internal structure (variable token sequence length per row in HTML versus a fixed length of rows in OTSL)."}, {"label": "Picture", "id": 3, "page_no": 1, "cluster": {"id": 3, "label": "Picture", "bbox": {"l": 150.0213635444641, "t": 208.88499984741213, "r": 464.4815700531006, "b": 425.84868278503416, "coord_origin": "1"}, "confidence": 0.9741523265838623, "cells": [{"id": 12, "text": "C", "bbox": {"l": 396.41107, "t": 280.98352, "r": 402.97336, "b": 289.50903, "coord_origin": "1"}}, {"id": 13, "text": "C", "bbox": {"l": 418.58682, "t": 280.89792, "r": 425.14911, "b": 289.42343, "coord_origin": "1"}}, {"id": 14, "text": "C", "bbox": {"l": 395.74835, "t": 303.23727, "r": 402.31064, "b": 311.76279, "coord_origin": "1"}}, {"id": 15, "text": "C", "bbox": {"l": 407.54214, "t": 303.36981, "r": 414.10443, "b": 311.89532, "coord_origin": "1"}}, {"id": 16, "text": "C", "bbox": {"l": 407.56335, "t": 314.40619, "r": 414.12564, "b": 322.9317, "coord_origin": "1"}}, {"id": 17, "text": "C", "bbox": {"l": 418.51108, "t": 292.08502000000004, "r": 425.07336, "b": 300.61053000000004, "coord_origin": "1"}}, {"id": 18, "text": "C", "bbox": {"l": 429.59744, "t": 292.09106, "r": 436.1597300000001, "b": 300.61658, "coord_origin": "1"}}, {"id": 19, "text": "C", "bbox": {"l": 440.68759000000006, "t": 292.01230000000004, "r": 447.24987999999996, "b": 300.53781000000004, "coord_origin": "1"}}, {"id": 20, "text": "C", "bbox": {"l": 418.6232, "t": 303.29483, "r": 425.18549, "b": 311.82034, "coord_origin": "1"}}, {"id": 21, "text": "C", "bbox": 
{"l": 429.7095299999999, "t": 303.30011, "r": 436.27182, "b": 311.82562, "coord_origin": "1"}}, {"id": 22, "text": "C", "bbox": {"l": 440.7996800000001, "t": 303.22211, "r": 447.36197, "b": 311.74762, "coord_origin": "1"}}, {"id": 23, "text": "C", "bbox": {"l": 418.62546, "t": 314.56903, "r": 425.18774, "b": 323.09454, "coord_origin": "1"}}, {"id": 24, "text": "C", "bbox": {"l": 429.71181999999993, "t": 314.57434, "r": 436.27411, "b": 323.09985, "coord_origin": "1"}}, {"id": 25, "text": "C", "bbox": {"l": 440.80194, "t": 314.49631, "r": 447.36423, "b": 323.02182, "coord_origin": "1"}}, {"id": 26, "text": "C", "bbox": {"l": 407.39746, "t": 325.29031, "r": 413.95975, "b": 333.81583, "coord_origin": "1"}}, {"id": 27, "text": "C", "bbox": {"l": 418.45959, "t": 325.45316, "r": 425.02188, "b": 333.97867, "coord_origin": "1"}}, {"id": 28, "text": "C", "bbox": {"l": 429.54593, "t": 325.4592, "r": 436.10822, "b": 333.98471, "coord_origin": "1"}}, {"id": 29, "text": "C", "bbox": {"l": 440.63608, "t": 325.38043, "r": 447.19836, "b": 333.90594, "coord_origin": "1"}}, {"id": 30, "text": "NL", "bbox": {"l": 451.89511000000005, "t": 280.15717, "r": 463.51273000000003, "b": 288.68268, "coord_origin": "1"}}, {"id": 31, "text": "NL", "bbox": {"l": 452.1557, "t": 291.59875000000005, "r": 463.77332, "b": 300.12427, "coord_origin": "1"}}, {"id": 32, "text": "NL", "bbox": {"l": 452.17688000000004, "t": 302.84265, "r": 463.79449000000005, "b": 311.36816, "coord_origin": "1"}}, {"id": 33, "text": "NL", "bbox": {"l": 452.09887999999995, "t": 314.12441999999993, "r": 463.71648999999996, "b": 322.6499299999999, "coord_origin": "1"}}, {"id": 34, "text": "NL", "bbox": {"l": 452.29733, "t": 325.46906, "r": 463.91495, "b": 333.99457, "coord_origin": "1"}}, {"id": 35, "text": "U", "bbox": {"l": 396.09677, "t": 314.49478, "r": 402.65906, "b": 323.02029000000005, "coord_origin": "1"}}, {"id": 36, "text": "U", "bbox": {"l": 395.99829, "t": 325.38876000000005, "r": 402.56058, "b": 333.91428, 
"coord_origin": "1"}}, {"id": 37, "text": "U", "bbox": {"l": 396.27475, "t": 292.27057, "r": 402.83704, "b": 300.79608, "coord_origin": "1"}}, {"id": 38, "text": "L", "bbox": {"l": 408.54724, "t": 280.96912, "r": 413.60074, "b": 289.49463, "coord_origin": "1"}}, {"id": 39, "text": "L", "bbox": {"l": 430.58966, "t": 280.49725, "r": 435.6431600000001, "b": 289.02277, "coord_origin": "1"}}, {"id": 40, "text": "L", "bbox": {"l": 441.08069, "t": 280.38062, "r": 446.13419, "b": 288.90613, "coord_origin": "1"}}, {"id": 41, "text": "X", "bbox": {"l": 407.97388, "t": 292.13425, "r": 414.03625, "b": 300.65976, "coord_origin": "1"}}, {"id": 42, "text": "NL", "bbox": {"l": 441.25640999999996, "t": 411.1807600000001, "r": 452.87402, "b": 419.7062700000001, "coord_origin": "1"}}, {"id": 43, "text": "vocabulary:", "bbox": {"l": 393.75256, "t": 399.7947700000001, "r": 432.48929, "b": 406.89935, "coord_origin": "1"}}, {"id": 44, "text": "5", "bbox": {"l": 434.5896000000001, "t": 399.7947700000001, "r": 438.80083999999994, "b": 406.89935, "coord_origin": "1"}}, {"id": 45, "text": "tokens", "bbox": {"l": 440.90573, "t": 399.7947700000001, "r": 463.22235, "b": 406.89935, "coord_origin": "1"}}, {"id": 46, "text": "D OTSL", "bbox": {"l": 384.11816, "t": 258.54718, "r": 413.99307, "b": 265.65179, "coord_origin": "1"}}, {"id": 47, "text": "sequence length:", "bbox": {"l": 393.75256, "t": 266.67505000000006, "r": 451.45129000000003, "b": 273.77966000000004, "coord_origin": "1"}}, {"id": 48, "text": "30", "bbox": {"l": 453.55083999999994, "t": 266.67505000000006, "r": 461.97485, "b": 273.77966000000004, "coord_origin": "1"}}, {"id": 49, "text": "vocabulary for this table:", "bbox": {"l": 151.79318, "t": 399.76016, "r": 233.89371000000003, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 50, "text": "12", "bbox": {"l": 235.99332, "t": 399.76016, "r": 244.41734000000002, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 51, "text": "tokens", "bbox": {"l": 246.52222, "t": 399.76016, 
"r": 268.83884, "b": 406.86474999999996, "coord_origin": "1"}}, {"id": 52, "text": "A", "bbox": {"l": 154.3298, "t": 213.57457999999997, "r": 159.79837, "b": 220.67920000000004, "coord_origin": "1"}}, {"id": 53, "text": "B", "bbox": {"l": 321.07053, "t": 213.57457999999997, "r": 326.53909, "b": 220.67920000000004, "coord_origin": "1"}}, {"id": 54, "text": "<table>", "bbox": {"l": 153.0947, "t": 280.30411, "r": 175.83888, "b": 286.69824, "coord_origin": "1"}}, {"id": 55, "text": "<tr>", "bbox": {"l": 160.67039, "t": 287.12088, "r": 172.79608, "b": 293.51501, "coord_origin": "1"}}, {"id": 56, "text": "<td", "bbox": {"l": 168.24603, "t": 293.93765, "r": 177.91019, "b": 300.33179, "coord_origin": "1"}}, {"id": 57, "text": "colspan=\u201c2\u201d", "bbox": {"l": 179.80525, "t": 293.93765, "r": 215.61517, "b": 300.33179, "coord_origin": "1"}}, {"id": 58, "text": "rowspan=\u201c2\u201d", "bbox": {"l": 217.50886999999997, "t": 293.93765, "r": 255.58945, "b": 300.33179, "coord_origin": "1"}}, {"id": 59, "text": ">", "bbox": {"l": 257.48315, "t": 293.93765, "r": 261.46414, "b": 300.33179, "coord_origin": "1"}}, {"id": 60, "text": "</td>", "bbox": {"l": 263.35785, "t": 293.93765, "r": 278.89804, "b": 300.33179, "coord_origin": "1"}}, {"id": 61, "text": "<td", "bbox": {"l": 280.79175, "t": 293.93765, "r": 290.4559, "b": 300.33179, "coord_origin": "1"}}, {"id": 62, "text": "colspan=\u201c3\u201d", "bbox": {"l": 292.35095, "t": 293.93765, "r": 328.16083, "b": 300.33179, "coord_origin": "1"}}, {"id": 63, "text": ">", "bbox": {"l": 330.05457, "t": 293.93765, "r": 334.03555, "b": 300.33179, "coord_origin": "1"}}, {"id": 64, "text": "</td>", "bbox": {"l": 335.92926, "t": 293.93765, "r": 351.46945, "b": 300.33179, "coord_origin": "1"}}, {"id": 65, "text": "</tr>", "bbox": {"l": 160.67039, "t": 300.75442999999996, "r": 174.68979, "b": 307.14856, "coord_origin": "1"}}, {"id": 66, "text": "<tr>", "bbox": {"l": 160.67039, "t": 307.57122999999996, "r": 172.79608, "b": 313.96536, 
"coord_origin": "1"}}, {"id": 67, "text": "<td>", "bbox": {"l": 168.24603, "t": 314.388, "r": 181.89255, "b": 320.78214, "coord_origin": "1"}}, {"id": 68, "text": "</td>", "bbox": {"l": 183.78624, "t": 314.388, "r": 199.32646, "b": 320.78214, "coord_origin": "1"}}, {"id": 69, "text": "<td>", "bbox": {"l": 201.22015, "t": 314.388, "r": 214.86666999999997, "b": 320.78214, "coord_origin": "1"}}, {"id": 70, "text": "</td>", "bbox": {"l": 216.76038, "t": 314.388, "r": 232.30058, "b": 320.78214, "coord_origin": "1"}}, {"id": 71, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 314.388, "r": 247.84079000000003, "b": 320.78214, "coord_origin": "1"}}, {"id": 72, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 314.388, "r": 265.27469, "b": 320.78214, "coord_origin": "1"}}, {"id": 73, "text": "</tr>", "bbox": {"l": 160.67039, "t": 321.20477, "r": 174.68979, "b": 327.59890999999993, "coord_origin": "1"}}, {"id": 74, "text": "<tr>", "bbox": {"l": 160.67039, "t": 328.02158, "r": 172.79608, "b": 334.41571000000005, "coord_origin": "1"}}, {"id": 75, "text": "<td rowspan=\u201c3\u201d > </td> <td> </td> <td> </td> <td> </td> <td> </td>", "bbox": {"l": 168.24603, "t": 334.83835, "r": 373.09091, "b": 341.23248, "coord_origin": "1"}}, {"id": 76, "text": "</tr>", "bbox": {"l": 160.67039, "t": 341.65512, "r": 174.68979, "b": 348.04926, "coord_origin": "1"}}, {"id": 77, "text": "<tr>", "bbox": {"l": 160.67039, "t": 348.47159, "r": 172.79608, "b": 354.86572, "coord_origin": "1"}}, {"id": 78, "text": "<td>", "bbox": {"l": 168.24603, "t": 355.28836000000007, "r": 181.89255, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 79, "text": "</td>", "bbox": {"l": 183.78624, "t": 355.28836000000007, "r": 199.32646, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 80, "text": "<td>", "bbox": {"l": 201.22015, "t": 355.28836000000007, "r": 214.86666999999997, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 81, "text": "</td>", "bbox": {"l": 216.76038, "t": 
355.28836000000007, "r": 232.30058, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 82, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 355.28836000000007, "r": 247.84079000000003, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 83, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 355.28836000000007, "r": 265.27469, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 84, "text": "<td>", "bbox": {"l": 267.1684, "t": 355.28836000000007, "r": 280.81488, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 85, "text": "</td>", "bbox": {"l": 282.70862, "t": 355.28836000000007, "r": 298.24881, "b": 361.68249999999995, "coord_origin": "1"}}, {"id": 86, "text": "</tr>", "bbox": {"l": 160.67039, "t": 362.10516000000007, "r": 174.68979, "b": 368.49929999999995, "coord_origin": "1"}}, {"id": 87, "text": "<tr>", "bbox": {"l": 160.67039, "t": 368.92194, "r": 172.79608, "b": 375.31607, "coord_origin": "1"}}, {"id": 88, "text": "<td>", "bbox": {"l": 168.24603, "t": 375.73871, "r": 181.89255, "b": 382.13284, "coord_origin": "1"}}, {"id": 89, "text": "</td>", "bbox": {"l": 183.78624, "t": 375.73871, "r": 199.32646, "b": 382.13284, "coord_origin": "1"}}, {"id": 90, "text": "<td>", "bbox": {"l": 201.22015, "t": 375.73871, "r": 214.86666999999997, "b": 382.13284, "coord_origin": "1"}}, {"id": 91, "text": "</td>", "bbox": {"l": 216.76038, "t": 375.73871, "r": 232.30058, "b": 382.13284, "coord_origin": "1"}}, {"id": 92, "text": "<td>", "bbox": {"l": 234.19427000000002, "t": 375.73871, "r": 247.84079000000003, "b": 382.13284, "coord_origin": "1"}}, {"id": 93, "text": "</td>", "bbox": {"l": 249.73447999999996, "t": 375.73871, "r": 265.27469, "b": 382.13284, "coord_origin": "1"}}, {"id": 94, "text": "<td>", "bbox": {"l": 267.1684, "t": 375.73871, "r": 280.81488, "b": 382.13284, "coord_origin": "1"}}, {"id": 95, "text": "</td>", "bbox": {"l": 282.70862, "t": 375.73871, "r": 298.24881, "b": 382.13284, "coord_origin": "1"}}, {"id": 96, "text": "</tr>", "bbox": 
{"l": 160.67039, "t": 382.55551, "r": 174.68979, "b": 388.94965, "coord_origin": "1"}}, {"id": 97, "text": "</table>", "bbox": {"l": 153.0947, "t": 389.37228, "r": 177.73259, "b": 395.76642, "coord_origin": "1"}}, {"id": 98, "text": "C", "bbox": {"l": 395.06137, "t": 411.33353, "r": 401.62366, "b": 419.85904, "coord_origin": "1"}}, {"id": 99, "text": "L", "bbox": {"l": 407.42249, "t": 411.33353, "r": 412.47598, "b": 419.85904, "coord_origin": "1"}}, {"id": 100, "text": "U", "bbox": {"l": 418.69287, "t": 411.33353, "r": 425.25516, "b": 419.85904, "coord_origin": "1"}}, {"id": 101, "text": "X", "bbox": {"l": 430.5086099999999, "t": 411.33353, "r": 436.5709800000001, "b": 419.85904, "coord_origin": "1"}}, {"id": 102, "text": "<table>", "bbox": {"l": 152.36208, "t": 409.77362, "r": 175.10626, "b": 416.16776, "coord_origin": "1"}}, {"id": 103, "text": "<tr>", "bbox": {"l": 178.89366, "t": 409.77362, "r": 191.01935, "b": 416.16776, "coord_origin": "1"}}, {"id": 104, "text": "</tr>", "bbox": {"l": 194.80676, "t": 409.77362, "r": 208.82614, "b": 416.16776, "coord_origin": "1"}}, {"id": 105, "text": "<td>", "bbox": {"l": 212.61354, "t": 409.77362, "r": 226.26003999999998, "b": 416.16776, "coord_origin": "1"}}, {"id": 106, "text": "</td>", "bbox": {"l": 230.04745000000003, "t": 409.77362, "r": 245.58765000000002, "b": 416.16776, "coord_origin": "1"}}, {"id": 107, "text": "<td", "bbox": {"l": 249.37506000000002, "t": 409.77362, "r": 259.03918, "b": 416.16776, "coord_origin": "1"}}, {"id": 108, "text": "colspan=\"2\"", "bbox": {"l": 262.82797, "t": 409.77362, "r": 298.93646, "b": 416.16776, "coord_origin": "1"}}, {"id": 109, "text": "colspan=\"3\"", "bbox": {"l": 302.72385, "t": 409.77362, "r": 338.83234, "b": 416.16776, "coord_origin": "1"}}, {"id": 110, "text": "rowspan=\"2\"", "bbox": {"l": 152.36208, "t": 418.10522, "r": 190.74123, "b": 424.49936, "coord_origin": "1"}}, {"id": 111, "text": "rowspan=\"3\"", "bbox": {"l": 194.52863, "t": 418.10522, "r": 232.90777999999997, 
"b": 424.49936, "coord_origin": "1"}}, {"id": 112, "text": ">", "bbox": {"l": 236.69518999999997, "t": 418.10522, "r": 240.67617999999996, "b": 424.49936, "coord_origin": "1"}}, {"id": 113, "text": "</table>", "bbox": {"l": 244.46358, "t": 418.10522, "r": 269.10144, "b": 424.49936, "coord_origin": "1"}}, {"id": 114, "text": "C", "bbox": {"l": 154.50595, "t": 258.60095, "r": 159.62473, "b": 265.70556999999997, "coord_origin": "1"}}, {"id": 115, "text": "HTML", "bbox": {"l": 164.74348, "t": 258.60095, "r": 185.21857, "b": 265.70556999999997, "coord_origin": "1"}}, {"id": 116, "text": "sequence length:", "bbox": {"l": 164.3548, "t": 266.49707, "r": 222.05352999999997, "b": 273.60168, "coord_origin": "1"}}, {"id": 117, "text": "55", "bbox": {"l": 224.15326, "t": 266.49707, "r": 232.57729, "b": 273.60168, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "Text", "id": 4, "page_no": 1, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.95978956222532, "t": 451.48462371826173, "r": 480.59232000000003, "b": 521.5370635986328, "coord_origin": "1"}, "confidence": 0.9744325280189514, "cells": [{"id": 118, "text": "today,", "bbox": {"l": 134.765, "t": 452.31378, "r": 161.32928, "b": 461.11075, "coord_origin": "1"}}, {"id": 119, "text": "table detection", "bbox": {"l": 164.269, "t": 452.31378, "r": 226.28617999999997, "b": 461.11075, "coord_origin": "1"}}, {"id": 120, "text": "in documents is a well understood problem, and the latest", "bbox": {"l": 229.992, "t": 452.31378, "r": 480.59232000000003, "b": 461.11075, "coord_origin": "1"}}, {"id": 121, "text": "state-of-the-art (SOTA) object detection methods provide an accuracy compa-", "bbox": {"l": 134.76501, "t": 464.26877, "r": 480.59180000000003, "b": 473.06573, "coord_origin": "1"}}, {"id": 122, "text": "rable to human observers [7,8,10,14,23]. 
On the other hand, the problem of table", "bbox": {"l": 134.76501, "t": 476.22375, "r": 480.58673, "b": 485.02072, "coord_origin": "1"}}, {"id": 123, "text": "structure recognition (TSR) is a lot more challenging and remains a very active", "bbox": {"l": 134.76501, "t": 488.17975, "r": 480.58658, "b": 496.97672, "coord_origin": "1"}}, {"id": 124, "text": "area of research, in which many novel machine learning algorithms are being", "bbox": {"l": 134.76501, "t": 500.13474, "r": 480.58978, "b": 508.9317, "coord_origin": "1"}}, {"id": 125, "text": "explored [3,4,5,9,11,12,13,14,17,18,21,22].", "bbox": {"l": 134.76501, "t": 512.0897199999999, "r": 313.24597, "b": 520.88669, "coord_origin": "1"}}]}, "text": "today, table detection in documents is a well understood problem, and the latest state-of-the-art (SOTA) object detection methods provide an accuracy comparable to human observers [7,8,10,14,23]. On the other hand, the problem of table structure recognition (TSR) is a lot more challenging and remains a very active area of research, in which many novel machine learning algorithms are being explored [3,4,5,9,11,12,13,14,17,18,21,22]."}, {"label": "Text", "id": 5, "page_no": 1, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.8620867729187, "t": 523.3501098632813, "r": 480.59482, "b": 665.1943176269532, "coord_origin": "1"}, "confidence": 0.9866191148757935, "cells": [{"id": 126, "text": "Recently emerging SOTA methods for table structure recognition employ", "bbox": {"l": 149.70901, "t": 524.55072, "r": 480.58884000000006, "b": 533.3476900000001, "coord_origin": "1"}}, {"id": 127, "text": "transformer-based models, in which an image of the table is provided to the net-", "bbox": {"l": 134.76501, "t": 536.50671, "r": 480.5917400000001, "b": 545.30368, "coord_origin": "1"}}, {"id": 128, "text": "work in order to predict the structure of the table as a sequence of tokens. 
These", "bbox": {"l": 134.76501, "t": 548.46172, "r": 480.58868, "b": 557.25868, "coord_origin": "1"}}, {"id": 129, "text": "image-to-sequence (Im2Seq) models are extremely powerful, since they allow for", "bbox": {"l": 134.76501, "t": 560.41672, "r": 480.58795, "b": 569.2136800000001, "coord_origin": "1"}}, {"id": 130, "text": "a purely data-driven solution. The tokens of the sequence typically belong to a", "bbox": {"l": 134.76501, "t": 572.37172, "r": 480.58978, "b": 581.16869, "coord_origin": "1"}}, {"id": 131, "text": "markup language such as HTML, Latex or Markdown, which allow to describe", "bbox": {"l": 134.76501, "t": 584.32672, "r": 480.59479, "b": 593.12369, "coord_origin": "1"}}, {"id": 132, "text": "table structure as rows, columns and spanning cells in various configurations.", "bbox": {"l": 134.76501, "t": 596.28271, "r": 480.58678999999995, "b": 605.0796799999999, "coord_origin": "1"}}, {"id": 133, "text": "In Figure 1, we illustrate how HTML is used to represent the table-structure", "bbox": {"l": 134.76501, "t": 608.23772, "r": 480.59476, "b": 617.03468, "coord_origin": "1"}}, {"id": 134, "text": "of a particular example table. Public table-structure data sets such as PubTab-", "bbox": {"l": 134.76501, "t": 620.19272, "r": 480.5938100000001, "b": 628.98969, "coord_origin": "1"}}, {"id": 135, "text": "Net [22], and FinTabNet [21], which were created in a semi-automated way from", "bbox": {"l": 134.76501, "t": 632.1477199999999, "r": 480.59482, "b": 640.94469, "coord_origin": "1"}}, {"id": 136, "text": "paired PDF and HTML sources (e.g. 
PubMed Central), popularized primarily", "bbox": {"l": 134.76501, "t": 644.10272, "r": 480.58771, "b": 652.89969, "coord_origin": "1"}}, {"id": 137, "text": "the use of HTML as ground-truth representation format for TSR.", "bbox": {"l": 134.76501, "t": 656.05772, "r": 421.45377, "b": 664.8547, "coord_origin": "1"}}]}, "text": "Recently emerging SOTA methods for table structure recognition employ transformer-based models, in which an image of the table is provided to the network in order to predict the structure of the table as a sequence of tokens. These image-to-sequence (Im2Seq) models are extremely powerful, since they allow for a purely data-driven solution. The tokens of the sequence typically belong to a markup language such as HTML, Latex or Markdown, which allow to describe table structure as rows, columns and spanning cells in various configurations. In Figure 1, we illustrate how HTML is used to represent the table-structure of a particular example table. Public table-structure data sets such as PubTabNet [22], and FinTabNet [21], which were created in a semi-automated way from paired PDF and HTML sources (e.g. 
PubMed Central), popularized primarily the use of HTML as ground-truth representation format for TSR."}], "headers": [{"label": "Page-header", "id": 0, "page_no": 1, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.28974075317385, "t": 93.54430103302002, "r": 139.49438409805296, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.862735390663147, "cells": [{"id": 0, "text": "2", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "2"}, {"label": "Page-header", "id": 1, "page_no": 1, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 167.31274366378784, "t": 92.9727201461792, "r": 231.72227, "b": 102.11999702453613, "coord_origin": "1"}, "confidence": 0.930519700050354, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "M. 
Lysak, et al."}]}}, {"page_no": 2, "page_hash": "69656f07bd8fb7afc53ab6f3d0e9153a337b550522493bf18d702c8406a9c545", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "3", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "While the majority of research in TSR is currently focused on the develop-", "bbox": {"l": 149.709, "t": 118.93377999999996, "r": 480.59183, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "ment and application of novel neural model architectures, the table structure", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.58675999999997, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "representation language (e.g. HTML in PubTabNet and FinTabNet) is usually", "bbox": {"l": 134.765, "t": 142.84479, "r": 480.5917400000001, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 5, "text": "adopted", "bbox": {"l": 134.765, "t": 154.7998, "r": 169.62514, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 6, "text": "as is", "bbox": {"l": 173.86099, "t": 154.7998, "r": 194.55531, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 7, "text": "for the sequence tokenization in Im2Seq models. 
In this paper,", "bbox": {"l": 199.60999, "t": 154.7998, "r": 480.58618, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 8, "text": "we aim for the opposite and investigate the impact of the table structure rep-", "bbox": {"l": 134.76498, "t": 166.75482, "r": 480.59167, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 9, "text": "resentation language with an otherwise unmodified Im2Seq transformer-based", "bbox": {"l": 134.76498, "t": 178.70983999999999, "r": 480.58968999999996, "b": 187.50684, "coord_origin": "1"}}, {"id": 10, "text": "architecture. Since the current state-of-the-art Im2Seq model is TableFormer [9],", "bbox": {"l": 134.76498, "t": 190.66485999999998, "r": 480.5917400000001, "b": 199.46185000000003, "coord_origin": "1"}}, {"id": 11, "text": "we select this model to perform our experiments.", "bbox": {"l": 134.76498, "t": 202.61987, "r": 348.35519, "b": 211.41687000000002, "coord_origin": "1"}}, {"id": 12, "text": "The main contribution of this paper is the introduction of a new optimised ta-", "bbox": {"l": 149.70898, "t": 214.83587999999997, "r": 480.5939, "b": 223.63287000000003, "coord_origin": "1"}}, {"id": 13, "text": "ble structure language (OTSL), specifically designed to describe table-structure", "bbox": {"l": 134.76498, "t": 226.79089, "r": 480.5938100000001, "b": 235.58789000000002, "coord_origin": "1"}}, {"id": 14, "text": "in an compact and structured way for Im2Seq models. OTSL has a number of", "bbox": {"l": 134.76498, "t": 238.74689, "r": 480.58667, "b": 247.54387999999994, "coord_origin": "1"}}, {"id": 15, "text": "key features, which make it very attractive to use in Im2Seq models. 
Specifically,", "bbox": {"l": 134.76498, "t": 250.70190000000002, "r": 480.5867, "b": 259.49890000000005, "coord_origin": "1"}}, {"id": 16, "text": "compared to other languages such as HTML, OTSL has a minimized vocabulary", "bbox": {"l": 134.76498, "t": 262.65692, "r": 480.58771, "b": 271.45392000000004, "coord_origin": "1"}}, {"id": 17, "text": "which yields short sequence length, strong inherent structure (e.g. strict rectan-", "bbox": {"l": 134.76498, "t": 274.61194, "r": 480.59572999999995, "b": 283.40891, "coord_origin": "1"}}, {"id": 18, "text": "gular layout) and a strict syntax with rules that only look backwards. The latter", "bbox": {"l": 134.76498, "t": 286.56692999999996, "r": 480.59274, "b": 295.36389, "coord_origin": "1"}}, {"id": 19, "text": "allows for syntax validation during inference and ensures a syntactically correct", "bbox": {"l": 134.76498, "t": 298.52190999999993, "r": 480.59473, "b": 307.31888, "coord_origin": "1"}}, {"id": 20, "text": "table-structure. These OTSL features are illustrated in Figure 1, in comparison", "bbox": {"l": 134.76498, "t": 310.47791, "r": 480.58667, "b": 319.27487, "coord_origin": "1"}}, {"id": 21, "text": "to HTML.", "bbox": {"l": 134.76498, "t": 322.43289, "r": 179.72021, "b": 331.22986, "coord_origin": "1"}}, {"id": 22, "text": "The paper is structured as follows. In section 2, we give an overview of the", "bbox": {"l": 149.70898, "t": 334.64789, "r": 480.5878000000001, "b": 343.44485000000003, "coord_origin": "1"}}, {"id": 23, "text": "latest developments in table-structure reconstruction. In section 3 we review", "bbox": {"l": 134.76498, "t": 346.60388000000006, "r": 480.59375, "b": 355.40085, "coord_origin": "1"}}, {"id": 24, "text": "the current HTML table encoding (popularised by PubTabNet and FinTabNet)", "bbox": {"l": 134.76498, "t": 358.55887, "r": 480.58673, "b": 367.3558300000001, "coord_origin": "1"}}, {"id": 25, "text": "and discuss its flaws. 
Subsequently, we introduce OTSL in section 4, which in-", "bbox": {"l": 134.76498, "t": 370.51385, "r": 480.59161, "b": 379.31082, "coord_origin": "1"}}, {"id": 26, "text": "cludes the language definition, syntax rules and error-correction procedures. In", "bbox": {"l": 134.76498, "t": 382.46883999999994, "r": 480.59177000000005, "b": 391.26581, "coord_origin": "1"}}, {"id": 27, "text": "section 5, we apply OTSL on the TableFormer architecture, compare it to Table-", "bbox": {"l": 134.76498, "t": 394.42383, "r": 480.58774, "b": 403.2207900000001, "coord_origin": "1"}}, {"id": 28, "text": "Former models trained on HTML and ultimately demonstrate the advantages", "bbox": {"l": 134.76498, "t": 406.37982, "r": 480.59469999999993, "b": 415.17679, "coord_origin": "1"}}, {"id": 29, "text": "of using OTSL. Finally, in section 6 we conclude our work and outline next", "bbox": {"l": 134.76498, "t": 418.33481, "r": 480.59567, "b": 427.13177, "coord_origin": "1"}}, {"id": 30, "text": "potential steps.", "bbox": {"l": 134.76498, "t": 430.28979, "r": 201.27232, "b": 439.08676, "coord_origin": "1"}}, {"id": 31, "text": "2", "bbox": {"l": 134.76498, "t": 462.08795, "r": 141.48859, "b": 472.65634, "coord_origin": "1"}}, {"id": 32, "text": "Related Work", "bbox": {"l": 154.93819, "t": 462.08795, "r": 236.76912999999996, "b": 472.65634, "coord_origin": "1"}}, {"id": 33, "text": "Approaches to formalize the logical structure and layout of tables in electronic", "bbox": {"l": 134.76498, "t": 488.68582, "r": 480.59067, "b": 497.48279, "coord_origin": "1"}}, {"id": 34, "text": "documents date back more than two decades [16]. 
In the recent past, a wide", "bbox": {"l": 134.76498, "t": 500.64081, "r": 480.5917400000001, "b": 509.43777, "coord_origin": "1"}}, {"id": 35, "text": "variety of computer vision methods have been explored to tackle the prob-", "bbox": {"l": 134.76498, "t": 512.5957900000001, "r": 480.58971999999994, "b": 521.39276, "coord_origin": "1"}}, {"id": 36, "text": "lem of table structure recognition, i.e. the correct identification of columns,", "bbox": {"l": 134.76498, "t": 524.55179, "r": 480.58966, "b": 533.34875, "coord_origin": "1"}}, {"id": 37, "text": "rows and spanning cells in a given table. Broadly speaking, the current deep-", "bbox": {"l": 134.76498, "t": 536.50679, "r": 480.5897499999999, "b": 545.30376, "coord_origin": "1"}}, {"id": 38, "text": "learning based approaches fall into three categories: object detection (OD) meth-", "bbox": {"l": 134.76498, "t": 548.4617900000001, "r": 480.58862000000005, "b": 557.2587599999999, "coord_origin": "1"}}, {"id": 39, "text": "ods, Graph-Neural-Network (GNN) methods and Image-to-Markup-Sequence", "bbox": {"l": 134.76498, "t": 560.41679, "r": 480.59072999999995, "b": 569.21376, "coord_origin": "1"}}, {"id": 40, "text": "(Im2Seq) methods. Object-detection based methods [11,12,13,14,21] rely on table-", "bbox": {"l": 134.76498, "t": 572.3718, "r": 484.12047999999993, "b": 581.16876, "coord_origin": "1"}}, {"id": 41, "text": "structure annotation using (overlapping) bounding boxes for training, and pro-", "bbox": {"l": 134.76498, "t": 584.3267999999999, "r": 480.59567, "b": 593.12376, "coord_origin": "1"}}, {"id": 42, "text": "duce bounding-box predictions to define table cells, rows, and columns on a table", "bbox": {"l": 134.76498, "t": 596.28279, "r": 480.58871, "b": 605.07976, "coord_origin": "1"}}, {"id": 43, "text": "image. 
Graph Neural Network (GNN) based methods [3,6,17,18], as the name", "bbox": {"l": 134.76498, "t": 608.23779, "r": 480.59075999999993, "b": 617.03476, "coord_origin": "1"}}, {"id": 44, "text": "suggests, represent tables as graph structures. The graph nodes represent the", "bbox": {"l": 134.76498, "t": 620.1927900000001, "r": 480.58574999999996, "b": 628.9897599999999, "coord_origin": "1"}}, {"id": 45, "text": "content of each table cell, an embedding vector from the table image, or geomet-", "bbox": {"l": 134.76498, "t": 632.1478, "r": 480.58875, "b": 640.94476, "coord_origin": "1"}}, {"id": 46, "text": "ric coordinates of the table cell. The edges of the graph define the relationship", "bbox": {"l": 134.76498, "t": 644.1028, "r": 480.58875, "b": 652.89977, "coord_origin": "1"}}, {"id": 47, "text": "between the nodes, e.g. if they belong to the same column, row, or table cell.", "bbox": {"l": 134.76498, "t": 656.05879, "r": 480.59069999999997, "b": 664.85577, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 194.03438358306886, "t": 93.05150842666626, "r": 447.54291000000006, "b": 102.33464670181274, "coord_origin": "1"}, "confidence": 0.9509506821632385, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 474.9551456451416, "t": 93.63219079971316, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8758732080459595, "cells": [{"id": 1, "text": "3", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "Text", "bbox": {"l": 133.98119487762452, "t": 118.18482828140259, "r": 480.74181804656985, "b": 212.0443296432495, "coord_origin": "1"}, "confidence": 0.9855654239654541, 
"cells": [{"id": 2, "text": "While the majority of research in TSR is currently focused on the develop-", "bbox": {"l": 149.709, "t": 118.93377999999996, "r": 480.59183, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "ment and application of novel neural model architectures, the table structure", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.58675999999997, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "representation language (e.g. HTML in PubTabNet and FinTabNet) is usually", "bbox": {"l": 134.765, "t": 142.84479, "r": 480.5917400000001, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 5, "text": "adopted", "bbox": {"l": 134.765, "t": 154.7998, "r": 169.62514, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 6, "text": "as is", "bbox": {"l": 173.86099, "t": 154.7998, "r": 194.55531, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 7, "text": "for the sequence tokenization in Im2Seq models. In this paper,", "bbox": {"l": 199.60999, "t": 154.7998, "r": 480.58618, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 8, "text": "we aim for the opposite and investigate the impact of the table structure rep-", "bbox": {"l": 134.76498, "t": 166.75482, "r": 480.59167, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 9, "text": "resentation language with an otherwise unmodified Im2Seq transformer-based", "bbox": {"l": 134.76498, "t": 178.70983999999999, "r": 480.58968999999996, "b": 187.50684, "coord_origin": "1"}}, {"id": 10, "text": "architecture. 
Since the current state-of-the-art Im2Seq model is TableFormer [9],", "bbox": {"l": 134.76498, "t": 190.66485999999998, "r": 480.5917400000001, "b": 199.46185000000003, "coord_origin": "1"}}, {"id": 11, "text": "we select this model to perform our experiments.", "bbox": {"l": 134.76498, "t": 202.61987, "r": 348.35519, "b": 211.41687000000002, "coord_origin": "1"}}]}, {"id": 3, "label": "Text", "bbox": {"l": 133.77240915298464, "t": 214.33996238708494, "r": 480.87480239868165, "b": 331.22986, "coord_origin": "1"}, "confidence": 0.9869152903556824, "cells": [{"id": 12, "text": "The main contribution of this paper is the introduction of a new optimised ta-", "bbox": {"l": 149.70898, "t": 214.83587999999997, "r": 480.5939, "b": 223.63287000000003, "coord_origin": "1"}}, {"id": 13, "text": "ble structure language (OTSL), specifically designed to describe table-structure", "bbox": {"l": 134.76498, "t": 226.79089, "r": 480.5938100000001, "b": 235.58789000000002, "coord_origin": "1"}}, {"id": 14, "text": "in an compact and structured way for Im2Seq models. OTSL has a number of", "bbox": {"l": 134.76498, "t": 238.74689, "r": 480.58667, "b": 247.54387999999994, "coord_origin": "1"}}, {"id": 15, "text": "key features, which make it very attractive to use in Im2Seq models. Specifically,", "bbox": {"l": 134.76498, "t": 250.70190000000002, "r": 480.5867, "b": 259.49890000000005, "coord_origin": "1"}}, {"id": 16, "text": "compared to other languages such as HTML, OTSL has a minimized vocabulary", "bbox": {"l": 134.76498, "t": 262.65692, "r": 480.58771, "b": 271.45392000000004, "coord_origin": "1"}}, {"id": 17, "text": "which yields short sequence length, strong inherent structure (e.g. strict rectan-", "bbox": {"l": 134.76498, "t": 274.61194, "r": 480.59572999999995, "b": 283.40891, "coord_origin": "1"}}, {"id": 18, "text": "gular layout) and a strict syntax with rules that only look backwards. 
The latter", "bbox": {"l": 134.76498, "t": 286.56692999999996, "r": 480.59274, "b": 295.36389, "coord_origin": "1"}}, {"id": 19, "text": "allows for syntax validation during inference and ensures a syntactically correct", "bbox": {"l": 134.76498, "t": 298.52190999999993, "r": 480.59473, "b": 307.31888, "coord_origin": "1"}}, {"id": 20, "text": "table-structure. These OTSL features are illustrated in Figure 1, in comparison", "bbox": {"l": 134.76498, "t": 310.47791, "r": 480.58667, "b": 319.27487, "coord_origin": "1"}}, {"id": 21, "text": "to HTML.", "bbox": {"l": 134.76498, "t": 322.43289, "r": 179.72021, "b": 331.22986, "coord_origin": "1"}}]}, {"id": 4, "label": "Text", "bbox": {"l": 133.75097465515137, "t": 333.3511470794678, "r": 480.60798740386963, "b": 439.85488815307616, "coord_origin": "1"}, "confidence": 0.9858303070068359, "cells": [{"id": 22, "text": "The paper is structured as follows. In section 2, we give an overview of the", "bbox": {"l": 149.70898, "t": 334.64789, "r": 480.5878000000001, "b": 343.44485000000003, "coord_origin": "1"}}, {"id": 23, "text": "latest developments in table-structure reconstruction. In section 3 we review", "bbox": {"l": 134.76498, "t": 346.60388000000006, "r": 480.59375, "b": 355.40085, "coord_origin": "1"}}, {"id": 24, "text": "the current HTML table encoding (popularised by PubTabNet and FinTabNet)", "bbox": {"l": 134.76498, "t": 358.55887, "r": 480.58673, "b": 367.3558300000001, "coord_origin": "1"}}, {"id": 25, "text": "and discuss its flaws. Subsequently, we introduce OTSL in section 4, which in-", "bbox": {"l": 134.76498, "t": 370.51385, "r": 480.59161, "b": 379.31082, "coord_origin": "1"}}, {"id": 26, "text": "cludes the language definition, syntax rules and error-correction procedures. 
In", "bbox": {"l": 134.76498, "t": 382.46883999999994, "r": 480.59177000000005, "b": 391.26581, "coord_origin": "1"}}, {"id": 27, "text": "section 5, we apply OTSL on the TableFormer architecture, compare it to Table-", "bbox": {"l": 134.76498, "t": 394.42383, "r": 480.58774, "b": 403.2207900000001, "coord_origin": "1"}}, {"id": 28, "text": "Former models trained on HTML and ultimately demonstrate the advantages", "bbox": {"l": 134.76498, "t": 406.37982, "r": 480.59469999999993, "b": 415.17679, "coord_origin": "1"}}, {"id": 29, "text": "of using OTSL. Finally, in section 6 we conclude our work and outline next", "bbox": {"l": 134.76498, "t": 418.33481, "r": 480.59567, "b": 427.13177, "coord_origin": "1"}}, {"id": 30, "text": "potential steps.", "bbox": {"l": 134.76498, "t": 430.28979, "r": 201.27232, "b": 439.08676, "coord_origin": "1"}}]}, {"id": 5, "label": "Section-header", "bbox": {"l": 134.49938735961913, "t": 461.4249195098877, "r": 236.76912999999996, "b": 472.65634, "coord_origin": "1"}, "confidence": 0.9480533599853516, "cells": [{"id": 31, "text": "2", "bbox": {"l": 134.76498, "t": 462.08795, "r": 141.48859, "b": 472.65634, "coord_origin": "1"}}, {"id": 32, "text": "Related Work", "bbox": {"l": 154.93819, "t": 462.08795, "r": 236.76912999999996, "b": 472.65634, "coord_origin": "1"}}]}, {"id": 6, "label": "Text", "bbox": {"l": 133.6534761428833, "t": 487.3701599121094, "r": 484.12047999999993, "b": 665.3428871154784, "coord_origin": "1"}, "confidence": 0.987200915813446, "cells": [{"id": 33, "text": "Approaches to formalize the logical structure and layout of tables in electronic", "bbox": {"l": 134.76498, "t": 488.68582, "r": 480.59067, "b": 497.48279, "coord_origin": "1"}}, {"id": 34, "text": "documents date back more than two decades [16]. 
In the recent past, a wide", "bbox": {"l": 134.76498, "t": 500.64081, "r": 480.5917400000001, "b": 509.43777, "coord_origin": "1"}}, {"id": 35, "text": "variety of computer vision methods have been explored to tackle the prob-", "bbox": {"l": 134.76498, "t": 512.5957900000001, "r": 480.58971999999994, "b": 521.39276, "coord_origin": "1"}}, {"id": 36, "text": "lem of table structure recognition, i.e. the correct identification of columns,", "bbox": {"l": 134.76498, "t": 524.55179, "r": 480.58966, "b": 533.34875, "coord_origin": "1"}}, {"id": 37, "text": "rows and spanning cells in a given table. Broadly speaking, the current deep-", "bbox": {"l": 134.76498, "t": 536.50679, "r": 480.5897499999999, "b": 545.30376, "coord_origin": "1"}}, {"id": 38, "text": "learning based approaches fall into three categories: object detection (OD) meth-", "bbox": {"l": 134.76498, "t": 548.4617900000001, "r": 480.58862000000005, "b": 557.2587599999999, "coord_origin": "1"}}, {"id": 39, "text": "ods, Graph-Neural-Network (GNN) methods and Image-to-Markup-Sequence", "bbox": {"l": 134.76498, "t": 560.41679, "r": 480.59072999999995, "b": 569.21376, "coord_origin": "1"}}, {"id": 40, "text": "(Im2Seq) methods. Object-detection based methods [11,12,13,14,21] rely on table-", "bbox": {"l": 134.76498, "t": 572.3718, "r": 484.12047999999993, "b": 581.16876, "coord_origin": "1"}}, {"id": 41, "text": "structure annotation using (overlapping) bounding boxes for training, and pro-", "bbox": {"l": 134.76498, "t": 584.3267999999999, "r": 480.59567, "b": 593.12376, "coord_origin": "1"}}, {"id": 42, "text": "duce bounding-box predictions to define table cells, rows, and columns on a table", "bbox": {"l": 134.76498, "t": 596.28279, "r": 480.58871, "b": 605.07976, "coord_origin": "1"}}, {"id": 43, "text": "image. 
Graph Neural Network (GNN) based methods [3,6,17,18], as the name", "bbox": {"l": 134.76498, "t": 608.23779, "r": 480.59075999999993, "b": 617.03476, "coord_origin": "1"}}, {"id": 44, "text": "suggests, represent tables as graph structures. The graph nodes represent the", "bbox": {"l": 134.76498, "t": 620.1927900000001, "r": 480.58574999999996, "b": 628.9897599999999, "coord_origin": "1"}}, {"id": 45, "text": "content of each table cell, an embedding vector from the table image, or geomet-", "bbox": {"l": 134.76498, "t": 632.1478, "r": 480.58875, "b": 640.94476, "coord_origin": "1"}}, {"id": 46, "text": "ric coordinates of the table cell. The edges of the graph define the relationship", "bbox": {"l": 134.76498, "t": 644.1028, "r": 480.58875, "b": 652.89977, "coord_origin": "1"}}, {"id": 47, "text": "between the nodes, e.g. if they belong to the same column, row, or table cell.", "bbox": {"l": 134.76498, "t": 656.05879, "r": 480.59069999999997, "b": 664.85577, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 2, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 194.03438358306886, "t": 93.05150842666626, "r": 447.54291000000006, "b": 102.33464670181274, "coord_origin": "1"}, "confidence": 0.9509506821632385, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 2, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 474.9551456451416, "t": 93.63219079971316, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8758732080459595, "cells": [{"id": 1, "text": "3", "bbox": {"l": 475.98431, "t": 
93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "3"}, {"label": "Text", "id": 2, "page_no": 2, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.98119487762452, "t": 118.18482828140259, "r": 480.74181804656985, "b": 212.0443296432495, "coord_origin": "1"}, "confidence": 0.9855654239654541, "cells": [{"id": 2, "text": "While the majority of research in TSR is currently focused on the develop-", "bbox": {"l": 149.709, "t": 118.93377999999996, "r": 480.59183, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "ment and application of novel neural model architectures, the table structure", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.58675999999997, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "representation language (e.g. HTML in PubTabNet and FinTabNet) is usually", "bbox": {"l": 134.765, "t": 142.84479, "r": 480.5917400000001, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 5, "text": "adopted", "bbox": {"l": 134.765, "t": 154.7998, "r": 169.62514, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 6, "text": "as is", "bbox": {"l": 173.86099, "t": 154.7998, "r": 194.55531, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 7, "text": "for the sequence tokenization in Im2Seq models. In this paper,", "bbox": {"l": 199.60999, "t": 154.7998, "r": 480.58618, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 8, "text": "we aim for the opposite and investigate the impact of the table structure rep-", "bbox": {"l": 134.76498, "t": 166.75482, "r": 480.59167, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 9, "text": "resentation language with an otherwise unmodified Im2Seq transformer-based", "bbox": {"l": 134.76498, "t": 178.70983999999999, "r": 480.58968999999996, "b": 187.50684, "coord_origin": "1"}}, {"id": 10, "text": "architecture. 
Since the current state-of-the-art Im2Seq model is TableFormer [9],", "bbox": {"l": 134.76498, "t": 190.66485999999998, "r": 480.5917400000001, "b": 199.46185000000003, "coord_origin": "1"}}, {"id": 11, "text": "we select this model to perform our experiments.", "bbox": {"l": 134.76498, "t": 202.61987, "r": 348.35519, "b": 211.41687000000002, "coord_origin": "1"}}]}, "text": "While the majority of research in TSR is currently focused on the development and application of novel neural model architectures, the table structure representation language (e.g. HTML in PubTabNet and FinTabNet) is usually adopted as is for the sequence tokenization in Im2Seq models. In this paper, we aim for the opposite and investigate the impact of the table structure representation language with an otherwise unmodified Im2Seq transformer-based architecture. Since the current state-of-the-art Im2Seq model is TableFormer [9], we select this model to perform our experiments."}, {"label": "Text", "id": 3, "page_no": 2, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 133.77240915298464, "t": 214.33996238708494, "r": 480.87480239868165, "b": 331.22986, "coord_origin": "1"}, "confidence": 0.9869152903556824, "cells": [{"id": 12, "text": "The main contribution of this paper is the introduction of a new optimised ta-", "bbox": {"l": 149.70898, "t": 214.83587999999997, "r": 480.5939, "b": 223.63287000000003, "coord_origin": "1"}}, {"id": 13, "text": "ble structure language (OTSL), specifically designed to describe table-structure", "bbox": {"l": 134.76498, "t": 226.79089, "r": 480.5938100000001, "b": 235.58789000000002, "coord_origin": "1"}}, {"id": 14, "text": "in an compact and structured way for Im2Seq models. OTSL has a number of", "bbox": {"l": 134.76498, "t": 238.74689, "r": 480.58667, "b": 247.54387999999994, "coord_origin": "1"}}, {"id": 15, "text": "key features, which make it very attractive to use in Im2Seq models. 
Specifically,", "bbox": {"l": 134.76498, "t": 250.70190000000002, "r": 480.5867, "b": 259.49890000000005, "coord_origin": "1"}}, {"id": 16, "text": "compared to other languages such as HTML, OTSL has a minimized vocabulary", "bbox": {"l": 134.76498, "t": 262.65692, "r": 480.58771, "b": 271.45392000000004, "coord_origin": "1"}}, {"id": 17, "text": "which yields short sequence length, strong inherent structure (e.g. strict rectan-", "bbox": {"l": 134.76498, "t": 274.61194, "r": 480.59572999999995, "b": 283.40891, "coord_origin": "1"}}, {"id": 18, "text": "gular layout) and a strict syntax with rules that only look backwards. The latter", "bbox": {"l": 134.76498, "t": 286.56692999999996, "r": 480.59274, "b": 295.36389, "coord_origin": "1"}}, {"id": 19, "text": "allows for syntax validation during inference and ensures a syntactically correct", "bbox": {"l": 134.76498, "t": 298.52190999999993, "r": 480.59473, "b": 307.31888, "coord_origin": "1"}}, {"id": 20, "text": "table-structure. These OTSL features are illustrated in Figure 1, in comparison", "bbox": {"l": 134.76498, "t": 310.47791, "r": 480.58667, "b": 319.27487, "coord_origin": "1"}}, {"id": 21, "text": "to HTML.", "bbox": {"l": 134.76498, "t": 322.43289, "r": 179.72021, "b": 331.22986, "coord_origin": "1"}}]}, "text": "The main contribution of this paper is the introduction of a new optimised table structure language (OTSL), specifically designed to describe table-structure in an compact and structured way for Im2Seq models. OTSL has a number of key features, which make it very attractive to use in Im2Seq models. Specifically, compared to other languages such as HTML, OTSL has a minimized vocabulary which yields short sequence length, strong inherent structure (e.g. strict rectangular layout) and a strict syntax with rules that only look backwards. The latter allows for syntax validation during inference and ensures a syntactically correct table-structure. 
These OTSL features are illustrated in Figure 1, in comparison to HTML."}, {"label": "Text", "id": 4, "page_no": 2, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.75097465515137, "t": 333.3511470794678, "r": 480.60798740386963, "b": 439.85488815307616, "coord_origin": "1"}, "confidence": 0.9858303070068359, "cells": [{"id": 22, "text": "The paper is structured as follows. In section 2, we give an overview of the", "bbox": {"l": 149.70898, "t": 334.64789, "r": 480.5878000000001, "b": 343.44485000000003, "coord_origin": "1"}}, {"id": 23, "text": "latest developments in table-structure reconstruction. In section 3 we review", "bbox": {"l": 134.76498, "t": 346.60388000000006, "r": 480.59375, "b": 355.40085, "coord_origin": "1"}}, {"id": 24, "text": "the current HTML table encoding (popularised by PubTabNet and FinTabNet)", "bbox": {"l": 134.76498, "t": 358.55887, "r": 480.58673, "b": 367.3558300000001, "coord_origin": "1"}}, {"id": 25, "text": "and discuss its flaws. Subsequently, we introduce OTSL in section 4, which in-", "bbox": {"l": 134.76498, "t": 370.51385, "r": 480.59161, "b": 379.31082, "coord_origin": "1"}}, {"id": 26, "text": "cludes the language definition, syntax rules and error-correction procedures. In", "bbox": {"l": 134.76498, "t": 382.46883999999994, "r": 480.59177000000005, "b": 391.26581, "coord_origin": "1"}}, {"id": 27, "text": "section 5, we apply OTSL on the TableFormer architecture, compare it to Table-", "bbox": {"l": 134.76498, "t": 394.42383, "r": 480.58774, "b": 403.2207900000001, "coord_origin": "1"}}, {"id": 28, "text": "Former models trained on HTML and ultimately demonstrate the advantages", "bbox": {"l": 134.76498, "t": 406.37982, "r": 480.59469999999993, "b": 415.17679, "coord_origin": "1"}}, {"id": 29, "text": "of using OTSL. 
Finally, in section 6 we conclude our work and outline next", "bbox": {"l": 134.76498, "t": 418.33481, "r": 480.59567, "b": 427.13177, "coord_origin": "1"}}, {"id": 30, "text": "potential steps.", "bbox": {"l": 134.76498, "t": 430.28979, "r": 201.27232, "b": 439.08676, "coord_origin": "1"}}]}, "text": "The paper is structured as follows. In section 2, we give an overview of the latest developments in table-structure reconstruction. In section 3 we review the current HTML table encoding (popularised by PubTabNet and FinTabNet) and discuss its flaws. Subsequently, we introduce OTSL in section 4, which includes the language definition, syntax rules and error-correction procedures. In section 5, we apply OTSL on the TableFormer architecture, compare it to TableFormer models trained on HTML and ultimately demonstrate the advantages of using OTSL. Finally, in section 6 we conclude our work and outline next potential steps."}, {"label": "Section-header", "id": 5, "page_no": 2, "cluster": {"id": 5, "label": "Section-header", "bbox": {"l": 134.49938735961913, "t": 461.4249195098877, "r": 236.76912999999996, "b": 472.65634, "coord_origin": "1"}, "confidence": 0.9480533599853516, "cells": [{"id": 31, "text": "2", "bbox": {"l": 134.76498, "t": 462.08795, "r": 141.48859, "b": 472.65634, "coord_origin": "1"}}, {"id": 32, "text": "Related Work", "bbox": {"l": 154.93819, "t": 462.08795, "r": 236.76912999999996, "b": 472.65634, "coord_origin": "1"}}]}, "text": "2 Related Work"}, {"label": "Text", "id": 6, "page_no": 2, "cluster": {"id": 6, "label": "Text", "bbox": {"l": 133.6534761428833, "t": 487.3701599121094, "r": 484.12047999999993, "b": 665.3428871154784, "coord_origin": "1"}, "confidence": 0.987200915813446, "cells": [{"id": 33, "text": "Approaches to formalize the logical structure and layout of tables in electronic", "bbox": {"l": 134.76498, "t": 488.68582, "r": 480.59067, "b": 497.48279, "coord_origin": "1"}}, {"id": 34, "text": "documents date back more than two decades 
[16]. In the recent past, a wide", "bbox": {"l": 134.76498, "t": 500.64081, "r": 480.5917400000001, "b": 509.43777, "coord_origin": "1"}}, {"id": 35, "text": "variety of computer vision methods have been explored to tackle the prob-", "bbox": {"l": 134.76498, "t": 512.5957900000001, "r": 480.58971999999994, "b": 521.39276, "coord_origin": "1"}}, {"id": 36, "text": "lem of table structure recognition, i.e. the correct identification of columns,", "bbox": {"l": 134.76498, "t": 524.55179, "r": 480.58966, "b": 533.34875, "coord_origin": "1"}}, {"id": 37, "text": "rows and spanning cells in a given table. Broadly speaking, the current deep-", "bbox": {"l": 134.76498, "t": 536.50679, "r": 480.5897499999999, "b": 545.30376, "coord_origin": "1"}}, {"id": 38, "text": "learning based approaches fall into three categories: object detection (OD) meth-", "bbox": {"l": 134.76498, "t": 548.4617900000001, "r": 480.58862000000005, "b": 557.2587599999999, "coord_origin": "1"}}, {"id": 39, "text": "ods, Graph-Neural-Network (GNN) methods and Image-to-Markup-Sequence", "bbox": {"l": 134.76498, "t": 560.41679, "r": 480.59072999999995, "b": 569.21376, "coord_origin": "1"}}, {"id": 40, "text": "(Im2Seq) methods. Object-detection based methods [11,12,13,14,21] rely on table-", "bbox": {"l": 134.76498, "t": 572.3718, "r": 484.12047999999993, "b": 581.16876, "coord_origin": "1"}}, {"id": 41, "text": "structure annotation using (overlapping) bounding boxes for training, and pro-", "bbox": {"l": 134.76498, "t": 584.3267999999999, "r": 480.59567, "b": 593.12376, "coord_origin": "1"}}, {"id": 42, "text": "duce bounding-box predictions to define table cells, rows, and columns on a table", "bbox": {"l": 134.76498, "t": 596.28279, "r": 480.58871, "b": 605.07976, "coord_origin": "1"}}, {"id": 43, "text": "image. 
Graph Neural Network (GNN) based methods [3,6,17,18], as the name", "bbox": {"l": 134.76498, "t": 608.23779, "r": 480.59075999999993, "b": 617.03476, "coord_origin": "1"}}, {"id": 44, "text": "suggests, represent tables as graph structures. The graph nodes represent the", "bbox": {"l": 134.76498, "t": 620.1927900000001, "r": 480.58574999999996, "b": 628.9897599999999, "coord_origin": "1"}}, {"id": 45, "text": "content of each table cell, an embedding vector from the table image, or geomet-", "bbox": {"l": 134.76498, "t": 632.1478, "r": 480.58875, "b": 640.94476, "coord_origin": "1"}}, {"id": 46, "text": "ric coordinates of the table cell. The edges of the graph define the relationship", "bbox": {"l": 134.76498, "t": 644.1028, "r": 480.58875, "b": 652.89977, "coord_origin": "1"}}, {"id": 47, "text": "between the nodes, e.g. if they belong to the same column, row, or table cell.", "bbox": {"l": 134.76498, "t": 656.05879, "r": 480.59069999999997, "b": 664.85577, "coord_origin": "1"}}]}, "text": "Approaches to formalize the logical structure and layout of tables in electronic documents date back more than two decades [16]. In the recent past, a wide variety of computer vision methods have been explored to tackle the problem of table structure recognition, i.e. the correct identification of columns, rows and spanning cells in a given table. Broadly speaking, the current deeplearning based approaches fall into three categories: object detection (OD) methods, Graph-Neural-Network (GNN) methods and Image-to-Markup-Sequence (Im2Seq) methods. Object-detection based methods [11,12,13,14,21] rely on tablestructure annotation using (overlapping) bounding boxes for training, and produce bounding-box predictions to define table cells, rows, and columns on a table image. Graph Neural Network (GNN) based methods [3,6,17,18], as the name suggests, represent tables as graph structures. 
The graph nodes represent the content of each table cell, an embedding vector from the table image, or geometric coordinates of the table cell. The edges of the graph define the relationship between the nodes, e.g. if they belong to the same column, row, or table cell."}], "body": [{"label": "Text", "id": 2, "page_no": 2, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.98119487762452, "t": 118.18482828140259, "r": 480.74181804656985, "b": 212.0443296432495, "coord_origin": "1"}, "confidence": 0.9855654239654541, "cells": [{"id": 2, "text": "While the majority of research in TSR is currently focused on the develop-", "bbox": {"l": 149.709, "t": 118.93377999999996, "r": 480.59183, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "ment and application of novel neural model architectures, the table structure", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.58675999999997, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "representation language (e.g. HTML in PubTabNet and FinTabNet) is usually", "bbox": {"l": 134.765, "t": 142.84479, "r": 480.5917400000001, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 5, "text": "adopted", "bbox": {"l": 134.765, "t": 154.7998, "r": 169.62514, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 6, "text": "as is", "bbox": {"l": 173.86099, "t": 154.7998, "r": 194.55531, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 7, "text": "for the sequence tokenization in Im2Seq models. 
In this paper,", "bbox": {"l": 199.60999, "t": 154.7998, "r": 480.58618, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 8, "text": "we aim for the opposite and investigate the impact of the table structure rep-", "bbox": {"l": 134.76498, "t": 166.75482, "r": 480.59167, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 9, "text": "resentation language with an otherwise unmodified Im2Seq transformer-based", "bbox": {"l": 134.76498, "t": 178.70983999999999, "r": 480.58968999999996, "b": 187.50684, "coord_origin": "1"}}, {"id": 10, "text": "architecture. Since the current state-of-the-art Im2Seq model is TableFormer [9],", "bbox": {"l": 134.76498, "t": 190.66485999999998, "r": 480.5917400000001, "b": 199.46185000000003, "coord_origin": "1"}}, {"id": 11, "text": "we select this model to perform our experiments.", "bbox": {"l": 134.76498, "t": 202.61987, "r": 348.35519, "b": 211.41687000000002, "coord_origin": "1"}}]}, "text": "While the majority of research in TSR is currently focused on the development and application of novel neural model architectures, the table structure representation language (e.g. HTML in PubTabNet and FinTabNet) is usually adopted as is for the sequence tokenization in Im2Seq models. In this paper, we aim for the opposite and investigate the impact of the table structure representation language with an otherwise unmodified Im2Seq transformer-based architecture. 
Since the current state-of-the-art Im2Seq model is TableFormer [9], we select this model to perform our experiments."}, {"label": "Text", "id": 3, "page_no": 2, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 133.77240915298464, "t": 214.33996238708494, "r": 480.87480239868165, "b": 331.22986, "coord_origin": "1"}, "confidence": 0.9869152903556824, "cells": [{"id": 12, "text": "The main contribution of this paper is the introduction of a new optimised ta-", "bbox": {"l": 149.70898, "t": 214.83587999999997, "r": 480.5939, "b": 223.63287000000003, "coord_origin": "1"}}, {"id": 13, "text": "ble structure language (OTSL), specifically designed to describe table-structure", "bbox": {"l": 134.76498, "t": 226.79089, "r": 480.5938100000001, "b": 235.58789000000002, "coord_origin": "1"}}, {"id": 14, "text": "in an compact and structured way for Im2Seq models. OTSL has a number of", "bbox": {"l": 134.76498, "t": 238.74689, "r": 480.58667, "b": 247.54387999999994, "coord_origin": "1"}}, {"id": 15, "text": "key features, which make it very attractive to use in Im2Seq models. Specifically,", "bbox": {"l": 134.76498, "t": 250.70190000000002, "r": 480.5867, "b": 259.49890000000005, "coord_origin": "1"}}, {"id": 16, "text": "compared to other languages such as HTML, OTSL has a minimized vocabulary", "bbox": {"l": 134.76498, "t": 262.65692, "r": 480.58771, "b": 271.45392000000004, "coord_origin": "1"}}, {"id": 17, "text": "which yields short sequence length, strong inherent structure (e.g. strict rectan-", "bbox": {"l": 134.76498, "t": 274.61194, "r": 480.59572999999995, "b": 283.40891, "coord_origin": "1"}}, {"id": 18, "text": "gular layout) and a strict syntax with rules that only look backwards. 
The latter", "bbox": {"l": 134.76498, "t": 286.56692999999996, "r": 480.59274, "b": 295.36389, "coord_origin": "1"}}, {"id": 19, "text": "allows for syntax validation during inference and ensures a syntactically correct", "bbox": {"l": 134.76498, "t": 298.52190999999993, "r": 480.59473, "b": 307.31888, "coord_origin": "1"}}, {"id": 20, "text": "table-structure. These OTSL features are illustrated in Figure 1, in comparison", "bbox": {"l": 134.76498, "t": 310.47791, "r": 480.58667, "b": 319.27487, "coord_origin": "1"}}, {"id": 21, "text": "to HTML.", "bbox": {"l": 134.76498, "t": 322.43289, "r": 179.72021, "b": 331.22986, "coord_origin": "1"}}]}, "text": "The main contribution of this paper is the introduction of a new optimised table structure language (OTSL), specifically designed to describe table-structure in an compact and structured way for Im2Seq models. OTSL has a number of key features, which make it very attractive to use in Im2Seq models. Specifically, compared to other languages such as HTML, OTSL has a minimized vocabulary which yields short sequence length, strong inherent structure (e.g. strict rectangular layout) and a strict syntax with rules that only look backwards. The latter allows for syntax validation during inference and ensures a syntactically correct table-structure. These OTSL features are illustrated in Figure 1, in comparison to HTML."}, {"label": "Text", "id": 4, "page_no": 2, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.75097465515137, "t": 333.3511470794678, "r": 480.60798740386963, "b": 439.85488815307616, "coord_origin": "1"}, "confidence": 0.9858303070068359, "cells": [{"id": 22, "text": "The paper is structured as follows. In section 2, we give an overview of the", "bbox": {"l": 149.70898, "t": 334.64789, "r": 480.5878000000001, "b": 343.44485000000003, "coord_origin": "1"}}, {"id": 23, "text": "latest developments in table-structure reconstruction. 
In section 3 we review", "bbox": {"l": 134.76498, "t": 346.60388000000006, "r": 480.59375, "b": 355.40085, "coord_origin": "1"}}, {"id": 24, "text": "the current HTML table encoding (popularised by PubTabNet and FinTabNet)", "bbox": {"l": 134.76498, "t": 358.55887, "r": 480.58673, "b": 367.3558300000001, "coord_origin": "1"}}, {"id": 25, "text": "and discuss its flaws. Subsequently, we introduce OTSL in section 4, which in-", "bbox": {"l": 134.76498, "t": 370.51385, "r": 480.59161, "b": 379.31082, "coord_origin": "1"}}, {"id": 26, "text": "cludes the language definition, syntax rules and error-correction procedures. In", "bbox": {"l": 134.76498, "t": 382.46883999999994, "r": 480.59177000000005, "b": 391.26581, "coord_origin": "1"}}, {"id": 27, "text": "section 5, we apply OTSL on the TableFormer architecture, compare it to Table-", "bbox": {"l": 134.76498, "t": 394.42383, "r": 480.58774, "b": 403.2207900000001, "coord_origin": "1"}}, {"id": 28, "text": "Former models trained on HTML and ultimately demonstrate the advantages", "bbox": {"l": 134.76498, "t": 406.37982, "r": 480.59469999999993, "b": 415.17679, "coord_origin": "1"}}, {"id": 29, "text": "of using OTSL. Finally, in section 6 we conclude our work and outline next", "bbox": {"l": 134.76498, "t": 418.33481, "r": 480.59567, "b": 427.13177, "coord_origin": "1"}}, {"id": 30, "text": "potential steps.", "bbox": {"l": 134.76498, "t": 430.28979, "r": 201.27232, "b": 439.08676, "coord_origin": "1"}}]}, "text": "The paper is structured as follows. In section 2, we give an overview of the latest developments in table-structure reconstruction. In section 3 we review the current HTML table encoding (popularised by PubTabNet and FinTabNet) and discuss its flaws. Subsequently, we introduce OTSL in section 4, which includes the language definition, syntax rules and error-correction procedures. 
In section 5, we apply OTSL on the TableFormer architecture, compare it to TableFormer models trained on HTML and ultimately demonstrate the advantages of using OTSL. Finally, in section 6 we conclude our work and outline next potential steps."}, {"label": "Section-header", "id": 5, "page_no": 2, "cluster": {"id": 5, "label": "Section-header", "bbox": {"l": 134.49938735961913, "t": 461.4249195098877, "r": 236.76912999999996, "b": 472.65634, "coord_origin": "1"}, "confidence": 0.9480533599853516, "cells": [{"id": 31, "text": "2", "bbox": {"l": 134.76498, "t": 462.08795, "r": 141.48859, "b": 472.65634, "coord_origin": "1"}}, {"id": 32, "text": "Related Work", "bbox": {"l": 154.93819, "t": 462.08795, "r": 236.76912999999996, "b": 472.65634, "coord_origin": "1"}}]}, "text": "2 Related Work"}, {"label": "Text", "id": 6, "page_no": 2, "cluster": {"id": 6, "label": "Text", "bbox": {"l": 133.6534761428833, "t": 487.3701599121094, "r": 484.12047999999993, "b": 665.3428871154784, "coord_origin": "1"}, "confidence": 0.987200915813446, "cells": [{"id": 33, "text": "Approaches to formalize the logical structure and layout of tables in electronic", "bbox": {"l": 134.76498, "t": 488.68582, "r": 480.59067, "b": 497.48279, "coord_origin": "1"}}, {"id": 34, "text": "documents date back more than two decades [16]. In the recent past, a wide", "bbox": {"l": 134.76498, "t": 500.64081, "r": 480.5917400000001, "b": 509.43777, "coord_origin": "1"}}, {"id": 35, "text": "variety of computer vision methods have been explored to tackle the prob-", "bbox": {"l": 134.76498, "t": 512.5957900000001, "r": 480.58971999999994, "b": 521.39276, "coord_origin": "1"}}, {"id": 36, "text": "lem of table structure recognition, i.e. the correct identification of columns,", "bbox": {"l": 134.76498, "t": 524.55179, "r": 480.58966, "b": 533.34875, "coord_origin": "1"}}, {"id": 37, "text": "rows and spanning cells in a given table. 
Broadly speaking, the current deep-", "bbox": {"l": 134.76498, "t": 536.50679, "r": 480.5897499999999, "b": 545.30376, "coord_origin": "1"}}, {"id": 38, "text": "learning based approaches fall into three categories: object detection (OD) meth-", "bbox": {"l": 134.76498, "t": 548.4617900000001, "r": 480.58862000000005, "b": 557.2587599999999, "coord_origin": "1"}}, {"id": 39, "text": "ods, Graph-Neural-Network (GNN) methods and Image-to-Markup-Sequence", "bbox": {"l": 134.76498, "t": 560.41679, "r": 480.59072999999995, "b": 569.21376, "coord_origin": "1"}}, {"id": 40, "text": "(Im2Seq) methods. Object-detection based methods [11,12,13,14,21] rely on table-", "bbox": {"l": 134.76498, "t": 572.3718, "r": 484.12047999999993, "b": 581.16876, "coord_origin": "1"}}, {"id": 41, "text": "structure annotation using (overlapping) bounding boxes for training, and pro-", "bbox": {"l": 134.76498, "t": 584.3267999999999, "r": 480.59567, "b": 593.12376, "coord_origin": "1"}}, {"id": 42, "text": "duce bounding-box predictions to define table cells, rows, and columns on a table", "bbox": {"l": 134.76498, "t": 596.28279, "r": 480.58871, "b": 605.07976, "coord_origin": "1"}}, {"id": 43, "text": "image. Graph Neural Network (GNN) based methods [3,6,17,18], as the name", "bbox": {"l": 134.76498, "t": 608.23779, "r": 480.59075999999993, "b": 617.03476, "coord_origin": "1"}}, {"id": 44, "text": "suggests, represent tables as graph structures. The graph nodes represent the", "bbox": {"l": 134.76498, "t": 620.1927900000001, "r": 480.58574999999996, "b": 628.9897599999999, "coord_origin": "1"}}, {"id": 45, "text": "content of each table cell, an embedding vector from the table image, or geomet-", "bbox": {"l": 134.76498, "t": 632.1478, "r": 480.58875, "b": 640.94476, "coord_origin": "1"}}, {"id": 46, "text": "ric coordinates of the table cell. 
The edges of the graph define the relationship", "bbox": {"l": 134.76498, "t": 644.1028, "r": 480.58875, "b": 652.89977, "coord_origin": "1"}}, {"id": 47, "text": "between the nodes, e.g. if they belong to the same column, row, or table cell.", "bbox": {"l": 134.76498, "t": 656.05879, "r": 480.59069999999997, "b": 664.85577, "coord_origin": "1"}}]}, "text": "Approaches to formalize the logical structure and layout of tables in electronic documents date back more than two decades [16]. In the recent past, a wide variety of computer vision methods have been explored to tackle the problem of table structure recognition, i.e. the correct identification of columns, rows and spanning cells in a given table. Broadly speaking, the current deeplearning based approaches fall into three categories: object detection (OD) methods, Graph-Neural-Network (GNN) methods and Image-to-Markup-Sequence (Im2Seq) methods. Object-detection based methods [11,12,13,14,21] rely on tablestructure annotation using (overlapping) bounding boxes for training, and produce bounding-box predictions to define table cells, rows, and columns on a table image. Graph Neural Network (GNN) based methods [3,6,17,18], as the name suggests, represent tables as graph structures. The graph nodes represent the content of each table cell, an embedding vector from the table image, or geometric coordinates of the table cell. The edges of the graph define the relationship between the nodes, e.g. 
if they belong to the same column, row, or table cell."}], "headers": [{"label": "Page-header", "id": 0, "page_no": 2, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 194.03438358306886, "t": 93.05150842666626, "r": 447.54291000000006, "b": 102.33464670181274, "coord_origin": "1"}, "confidence": 0.9509506821632385, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 2, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 474.9551456451416, "t": 93.63219079971316, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8758732080459595, "cells": [{"id": 1, "text": "3", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "3"}]}}, {"page_no": 3, "page_hash": "5afca9340c5bda646a75b8c2a1bde1b8f7b89e08a64a3cc4732fd11c1c6ead48", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "4", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 3, "text": "Other work [20] aims at predicting a grid for each table and deciding which cells", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.59375, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "must be merged using an attention network. 
Im2Seq methods cast the problem", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.58774, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": "as a sequence generation task [4,5,9,22], and therefore need an internal table-", "bbox": {"l": 134.765, "t": 142.84479, "r": 480.58675999999997, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 6, "text": "structure representation language, which is often implemented with standard", "bbox": {"l": 134.765, "t": 154.7998, "r": 480.5878000000001, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 7, "text": "markup languages (e.g. HTML, LaTeX, Markdown). In theory, Im2Seq methods", "bbox": {"l": 134.765, "t": 166.75482, "r": 480.59271, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 8, "text": "have a natural advantage over the OD and GNN methods by virtue of directly", "bbox": {"l": 134.765, "t": 178.70983999999999, "r": 480.5957599999999, "b": 187.50684, "coord_origin": "1"}}, {"id": 9, "text": "predicting the table-structure. As such, no post-processing or rules are needed", "bbox": {"l": 134.765, "t": 190.66485999999998, "r": 480.59271, "b": 199.46185000000003, "coord_origin": "1"}}, {"id": 10, "text": "in order to obtain the table-structure, which is necessary with OD and GNN", "bbox": {"l": 134.765, "t": 202.61987, "r": 480.59378, "b": 211.41687000000002, "coord_origin": "1"}}, {"id": 11, "text": "approaches. 
In practice, this is not entirely true, because a predicted sequence", "bbox": {"l": 134.765, "t": 214.57587, "r": 480.58783000000005, "b": 223.37285999999995, "coord_origin": "1"}}, {"id": 12, "text": "of table-structure markup does not necessarily have to be syntactically correct.", "bbox": {"l": 134.765, "t": 226.53088000000002, "r": 480.58978, "b": 235.32788000000005, "coord_origin": "1"}}, {"id": 13, "text": "Hence, depending on the quality of the predicted sequence, some post-processing", "bbox": {"l": 134.765, "t": 238.48590000000002, "r": 480.59572999999995, "b": 247.28290000000004, "coord_origin": "1"}}, {"id": 14, "text": "needs to be performed to ensure a syntactically valid (let alone correct) sequence.", "bbox": {"l": 134.765, "t": 250.44092, "r": 480.59473, "b": 259.23792000000003, "coord_origin": "1"}}, {"id": 15, "text": "Within the Im2Seq method, we find several popular models, namely the", "bbox": {"l": 149.709, "t": 262.65692, "r": 480.59280000000007, "b": 271.45392000000004, "coord_origin": "1"}}, {"id": 16, "text": "encoder-dual-decoder model (EDD) [22], TableFormer [9], Tabsplitter[2] and Ye", "bbox": {"l": 134.765, "t": 274.61194, "r": 480.59167, "b": 283.40891, "coord_origin": "1"}}, {"id": 17, "text": "et. al. [19]. EDD uses two consecutive long short-term memory (LSTM) decoders", "bbox": {"l": 134.765, "t": 286.56692999999996, "r": 480.59271, "b": 295.36389, "coord_origin": "1"}}, {"id": 18, "text": "to predict a table in HTML representation. The", "bbox": {"l": 134.765, "t": 298.52190999999993, "r": 342.02097, "b": 307.31888, "coord_origin": "1"}}, {"id": 19, "text": "tag decoder", "bbox": {"l": 345.064, "t": 298.52190999999993, "r": 393.04684, "b": 307.31888, "coord_origin": "1"}}, {"id": 20, "text": "predicts a sequence", "bbox": {"l": 397.16699, "t": 298.52190999999993, "r": 480.59082, "b": 307.31888, "coord_origin": "1"}}, {"id": 21, "text": "of HTML tags. 
For each decoded table cell (", "bbox": {"l": 134.76498, "t": 310.47791, "r": 333.29871, "b": 319.27487, "coord_origin": "1"}}, {"id": 22, "text": "<td>", "bbox": {"l": 333.29898, "t": 310.47791, "r": 356.9711, "b": 319.27487, "coord_origin": "1"}}, {"id": 23, "text": "), the attention is passed to", "bbox": {"l": 357.08499, "t": 310.47791, "r": 480.59433000000007, "b": 319.27487, "coord_origin": "1"}}, {"id": 24, "text": "the", "bbox": {"l": 134.76498, "t": 322.43289, "r": 148.59805, "b": 331.22986, "coord_origin": "1"}}, {"id": 25, "text": "cell decoder", "bbox": {"l": 152.27698, "t": 322.43289, "r": 202.1109, "b": 331.22986, "coord_origin": "1"}}, {"id": 26, "text": "to predict the content with an embedded OCR approach. The", "bbox": {"l": 206.86398, "t": 322.43289, "r": 480.58743, "b": 331.22986, "coord_origin": "1"}}, {"id": 27, "text": "latter makes it susceptible to transcription errors in the cell content of the table.", "bbox": {"l": 134.76498, "t": 334.38788, "r": 480.59476, "b": 343.18484, "coord_origin": "1"}}, {"id": 28, "text": "TableFormer address this reliance on OCR and uses two transformer decoders for", "bbox": {"l": 134.76498, "t": 346.34286, "r": 480.58675999999997, "b": 355.13983, "coord_origin": "1"}}, {"id": 29, "text": "HTML structure and cell bounding box prediction in an end-to-end architecture.", "bbox": {"l": 134.76498, "t": 358.29785, "r": 480.58868, "b": 367.09482, "coord_origin": "1"}}, {"id": 30, "text": "The predicted cell bounding box is then used to extract text tokens from an", "bbox": {"l": 134.76498, "t": 370.25284, "r": 480.58868, "b": 379.0498, "coord_origin": "1"}}, {"id": 31, "text": "originating (digital) PDF page, circumventing any need for OCR. 
TabSplitter", "bbox": {"l": 134.76498, "t": 382.20883, "r": 480.59357000000006, "b": 391.0058, "coord_origin": "1"}}, {"id": 32, "text": "[2]", "bbox": {"l": 134.76498, "t": 394.16382, "r": 144.76979, "b": 402.96078, "coord_origin": "1"}}, {"id": 33, "text": "proposes a compact double-matrix representation of table rows and columns", "bbox": {"l": 149.50908, "t": 394.16382, "r": 480.58667, "b": 402.96078, "coord_origin": "1"}}, {"id": 34, "text": "to do error detection and error correction of HTML structure sequences based", "bbox": {"l": 134.76498, "t": 406.1188, "r": 480.59569999999997, "b": 414.91576999999995, "coord_origin": "1"}}, {"id": 35, "text": "on predictions from [19]. This compact double-matrix representation can not be", "bbox": {"l": 134.76498, "t": 418.07379, "r": 480.59180000000003, "b": 426.87076, "coord_origin": "1"}}, {"id": 36, "text": "used directly by the Img2seq model training, so the model uses HTML as an", "bbox": {"l": 134.76498, "t": 430.02878, "r": 480.5878000000001, "b": 438.82574, "coord_origin": "1"}}, {"id": 37, "text": "intermediate form. Chi et. al. [4] introduce a data set and a baseline method", "bbox": {"l": 134.76498, "t": 441.98376, "r": 480.58868, "b": 450.78073, "coord_origin": "1"}}, {"id": 38, "text": "using bidirectional LSTMs to predict LaTeX code. 
Kayal", "bbox": {"l": 134.76498, "t": 453.93976000000004, "r": 384.5752, "b": 462.73672, "coord_origin": "1"}}, {"id": 39, "text": "[5]", "bbox": {"l": 391.55899, "t": 453.93976000000004, "r": 401.73236, "b": 462.73672, "coord_origin": "1"}}, {"id": 40, "text": "introduces Gated", "bbox": {"l": 406.55154, "t": 453.93976000000004, "r": 480.58777, "b": 462.73672, "coord_origin": "1"}}, {"id": 41, "text": "ResNet transformers to predict LaTeX code, and a separate OCR module to", "bbox": {"l": 134.76498, "t": 465.89474, "r": 480.59079, "b": 474.69171, "coord_origin": "1"}}, {"id": 42, "text": "extract content.", "bbox": {"l": 134.76498, "t": 477.84973, "r": 203.68625, "b": 486.6467, "coord_origin": "1"}}, {"id": 43, "text": "Im2Seq approaches have shown to be well-suited for the TSR task and allow a", "bbox": {"l": 149.70898, "t": 490.06573, "r": 480.59378, "b": 498.8627, "coord_origin": "1"}}, {"id": 44, "text": "full end-to-end network design that can output the final table structure without", "bbox": {"l": 134.76498, "t": 502.02072, "r": 480.58871, "b": 510.81769, "coord_origin": "1"}}, {"id": 45, "text": "pre- or post-processing logic. Furthermore, Im2Seq models have demonstrated", "bbox": {"l": 134.76498, "t": 513.9757099999999, "r": 480.58675999999997, "b": 522.7726700000001, "coord_origin": "1"}}, {"id": 46, "text": "to deliver state-of-the-art prediction accuracy [9]. 
This motivated the authors", "bbox": {"l": 134.76498, "t": 525.93069, "r": 480.58978, "b": 534.72766, "coord_origin": "1"}}, {"id": 47, "text": "to investigate if the performance (both in accuracy and inference time) can", "bbox": {"l": 134.76498, "t": 537.8857, "r": 480.58765, "b": 546.6826599999999, "coord_origin": "1"}}, {"id": 48, "text": "be further improved by optimising the table structure representation language.", "bbox": {"l": 134.76498, "t": 549.84169, "r": 480.58971999999994, "b": 558.63866, "coord_origin": "1"}}, {"id": 49, "text": "We believe this is a necessary step before further improving neural network", "bbox": {"l": 134.76498, "t": 561.79669, "r": 480.58871, "b": 570.59366, "coord_origin": "1"}}, {"id": 50, "text": "architectures for this task.", "bbox": {"l": 134.76498, "t": 573.75169, "r": 249.27811, "b": 582.54866, "coord_origin": "1"}}, {"id": 51, "text": "3", "bbox": {"l": 134.76498, "t": 605.54984, "r": 141.48859, "b": 616.11823, "coord_origin": "1"}}, {"id": 52, "text": "Problem Statement", "bbox": {"l": 154.93819, "t": 605.54984, "r": 269.62442, "b": 616.11823, "coord_origin": "1"}}, {"id": 53, "text": "All known Im2Seq based models for TSR fundamentally work in similar ways.", "bbox": {"l": 134.76498, "t": 632.14769, "r": 480.59064, "b": 640.94466, "coord_origin": "1"}}, {"id": 54, "text": "Given an image of a table, the Im2Seq model predicts the structure of the table", "bbox": {"l": 134.76498, "t": 644.1026899999999, "r": 480.5867, "b": 652.89966, "coord_origin": "1"}}, {"id": 55, "text": "by generating a sequence of tokens. 
These tokens originate from a finite vocab-", "bbox": {"l": 134.76498, "t": 656.0586900000001, "r": 480.5936899999999, "b": 664.85566, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 134.52096776962279, "t": 92.96537475585933, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.6325680017471313, "cells": [{"id": 0, "text": "4", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Text", "bbox": {"l": 133.76139278411867, "t": 117.85091514587407, "r": 480.6270435333252, "b": 259.4519508361816, "coord_origin": "1"}, "confidence": 0.9769367575645447, "cells": [{"id": 3, "text": "Other work [20] aims at predicting a grid for each table and deciding which cells", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.59375, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "must be merged using an attention network. Im2Seq methods cast the problem", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.58774, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": "as a sequence generation task [4,5,9,22], and therefore need an internal table-", "bbox": {"l": 134.765, "t": 142.84479, "r": 480.58675999999997, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 6, "text": "structure representation language, which is often implemented with standard", "bbox": {"l": 134.765, "t": 154.7998, "r": 480.5878000000001, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 7, "text": "markup languages (e.g. HTML, LaTeX, Markdown). 
In theory, Im2Seq methods", "bbox": {"l": 134.765, "t": 166.75482, "r": 480.59271, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 8, "text": "have a natural advantage over the OD and GNN methods by virtue of directly", "bbox": {"l": 134.765, "t": 178.70983999999999, "r": 480.5957599999999, "b": 187.50684, "coord_origin": "1"}}, {"id": 9, "text": "predicting the table-structure. As such, no post-processing or rules are needed", "bbox": {"l": 134.765, "t": 190.66485999999998, "r": 480.59271, "b": 199.46185000000003, "coord_origin": "1"}}, {"id": 10, "text": "in order to obtain the table-structure, which is necessary with OD and GNN", "bbox": {"l": 134.765, "t": 202.61987, "r": 480.59378, "b": 211.41687000000002, "coord_origin": "1"}}, {"id": 11, "text": "approaches. In practice, this is not entirely true, because a predicted sequence", "bbox": {"l": 134.765, "t": 214.57587, "r": 480.58783000000005, "b": 223.37285999999995, "coord_origin": "1"}}, {"id": 12, "text": "of table-structure markup does not necessarily have to be syntactically correct.", "bbox": {"l": 134.765, "t": 226.53088000000002, "r": 480.58978, "b": 235.32788000000005, "coord_origin": "1"}}, {"id": 13, "text": "Hence, depending on the quality of the predicted sequence, some post-processing", "bbox": {"l": 134.765, "t": 238.48590000000002, "r": 480.59572999999995, "b": 247.28290000000004, "coord_origin": "1"}}, {"id": 14, "text": "needs to be performed to ensure a syntactically valid (let alone correct) sequence.", "bbox": {"l": 134.765, "t": 250.44092, "r": 480.59473, "b": 259.23792000000003, "coord_origin": "1"}}]}, {"id": 2, "label": "Text", "bbox": {"l": 133.58259201049805, "t": 261.394985961914, "r": 480.79309158325196, "b": 486.6467, "coord_origin": "1"}, "confidence": 0.9808397889137268, "cells": [{"id": 15, "text": "Within the Im2Seq method, we find several popular models, namely the", "bbox": {"l": 149.709, "t": 262.65692, "r": 480.59280000000007, "b": 271.45392000000004, 
"coord_origin": "1"}}, {"id": 16, "text": "encoder-dual-decoder model (EDD) [22], TableFormer [9], Tabsplitter[2] and Ye", "bbox": {"l": 134.765, "t": 274.61194, "r": 480.59167, "b": 283.40891, "coord_origin": "1"}}, {"id": 17, "text": "et. al. [19]. EDD uses two consecutive long short-term memory (LSTM) decoders", "bbox": {"l": 134.765, "t": 286.56692999999996, "r": 480.59271, "b": 295.36389, "coord_origin": "1"}}, {"id": 18, "text": "to predict a table in HTML representation. The", "bbox": {"l": 134.765, "t": 298.52190999999993, "r": 342.02097, "b": 307.31888, "coord_origin": "1"}}, {"id": 19, "text": "tag decoder", "bbox": {"l": 345.064, "t": 298.52190999999993, "r": 393.04684, "b": 307.31888, "coord_origin": "1"}}, {"id": 20, "text": "predicts a sequence", "bbox": {"l": 397.16699, "t": 298.52190999999993, "r": 480.59082, "b": 307.31888, "coord_origin": "1"}}, {"id": 21, "text": "of HTML tags. For each decoded table cell (", "bbox": {"l": 134.76498, "t": 310.47791, "r": 333.29871, "b": 319.27487, "coord_origin": "1"}}, {"id": 22, "text": "<td>", "bbox": {"l": 333.29898, "t": 310.47791, "r": 356.9711, "b": 319.27487, "coord_origin": "1"}}, {"id": 23, "text": "), the attention is passed to", "bbox": {"l": 357.08499, "t": 310.47791, "r": 480.59433000000007, "b": 319.27487, "coord_origin": "1"}}, {"id": 24, "text": "the", "bbox": {"l": 134.76498, "t": 322.43289, "r": 148.59805, "b": 331.22986, "coord_origin": "1"}}, {"id": 25, "text": "cell decoder", "bbox": {"l": 152.27698, "t": 322.43289, "r": 202.1109, "b": 331.22986, "coord_origin": "1"}}, {"id": 26, "text": "to predict the content with an embedded OCR approach. 
The", "bbox": {"l": 206.86398, "t": 322.43289, "r": 480.58743, "b": 331.22986, "coord_origin": "1"}}, {"id": 27, "text": "latter makes it susceptible to transcription errors in the cell content of the table.", "bbox": {"l": 134.76498, "t": 334.38788, "r": 480.59476, "b": 343.18484, "coord_origin": "1"}}, {"id": 28, "text": "TableFormer address this reliance on OCR and uses two transformer decoders for", "bbox": {"l": 134.76498, "t": 346.34286, "r": 480.58675999999997, "b": 355.13983, "coord_origin": "1"}}, {"id": 29, "text": "HTML structure and cell bounding box prediction in an end-to-end architecture.", "bbox": {"l": 134.76498, "t": 358.29785, "r": 480.58868, "b": 367.09482, "coord_origin": "1"}}, {"id": 30, "text": "The predicted cell bounding box is then used to extract text tokens from an", "bbox": {"l": 134.76498, "t": 370.25284, "r": 480.58868, "b": 379.0498, "coord_origin": "1"}}, {"id": 31, "text": "originating (digital) PDF page, circumventing any need for OCR. TabSplitter", "bbox": {"l": 134.76498, "t": 382.20883, "r": 480.59357000000006, "b": 391.0058, "coord_origin": "1"}}, {"id": 32, "text": "[2]", "bbox": {"l": 134.76498, "t": 394.16382, "r": 144.76979, "b": 402.96078, "coord_origin": "1"}}, {"id": 33, "text": "proposes a compact double-matrix representation of table rows and columns", "bbox": {"l": 149.50908, "t": 394.16382, "r": 480.58667, "b": 402.96078, "coord_origin": "1"}}, {"id": 34, "text": "to do error detection and error correction of HTML structure sequences based", "bbox": {"l": 134.76498, "t": 406.1188, "r": 480.59569999999997, "b": 414.91576999999995, "coord_origin": "1"}}, {"id": 35, "text": "on predictions from [19]. 
This compact double-matrix representation can not be", "bbox": {"l": 134.76498, "t": 418.07379, "r": 480.59180000000003, "b": 426.87076, "coord_origin": "1"}}, {"id": 36, "text": "used directly by the Img2seq model training, so the model uses HTML as an", "bbox": {"l": 134.76498, "t": 430.02878, "r": 480.5878000000001, "b": 438.82574, "coord_origin": "1"}}, {"id": 37, "text": "intermediate form. Chi et. al. [4] introduce a data set and a baseline method", "bbox": {"l": 134.76498, "t": 441.98376, "r": 480.58868, "b": 450.78073, "coord_origin": "1"}}, {"id": 38, "text": "using bidirectional LSTMs to predict LaTeX code. Kayal", "bbox": {"l": 134.76498, "t": 453.93976000000004, "r": 384.5752, "b": 462.73672, "coord_origin": "1"}}, {"id": 39, "text": "[5]", "bbox": {"l": 391.55899, "t": 453.93976000000004, "r": 401.73236, "b": 462.73672, "coord_origin": "1"}}, {"id": 40, "text": "introduces Gated", "bbox": {"l": 406.55154, "t": 453.93976000000004, "r": 480.58777, "b": 462.73672, "coord_origin": "1"}}, {"id": 41, "text": "ResNet transformers to predict LaTeX code, and a separate OCR module to", "bbox": {"l": 134.76498, "t": 465.89474, "r": 480.59079, "b": 474.69171, "coord_origin": "1"}}, {"id": 42, "text": "extract content.", "bbox": {"l": 134.76498, "t": 477.84973, "r": 203.68625, "b": 486.6467, "coord_origin": "1"}}]}, {"id": 3, "label": "Text", "bbox": {"l": 133.88829259872435, "t": 488.7115905761719, "r": 480.59378, "b": 582.54866, "coord_origin": "1"}, "confidence": 0.9849583506584167, "cells": [{"id": 43, "text": "Im2Seq approaches have shown to be well-suited for the TSR task and allow a", "bbox": {"l": 149.70898, "t": 490.06573, "r": 480.59378, "b": 498.8627, "coord_origin": "1"}}, {"id": 44, "text": "full end-to-end network design that can output the final table structure without", "bbox": {"l": 134.76498, "t": 502.02072, "r": 480.58871, "b": 510.81769, "coord_origin": "1"}}, {"id": 45, "text": "pre- or post-processing logic. 
Furthermore, Im2Seq models have demonstrated", "bbox": {"l": 134.76498, "t": 513.9757099999999, "r": 480.58675999999997, "b": 522.7726700000001, "coord_origin": "1"}}, {"id": 46, "text": "to deliver state-of-the-art prediction accuracy [9]. This motivated the authors", "bbox": {"l": 134.76498, "t": 525.93069, "r": 480.58978, "b": 534.72766, "coord_origin": "1"}}, {"id": 47, "text": "to investigate if the performance (both in accuracy and inference time) can", "bbox": {"l": 134.76498, "t": 537.8857, "r": 480.58765, "b": 546.6826599999999, "coord_origin": "1"}}, {"id": 48, "text": "be further improved by optimising the table structure representation language.", "bbox": {"l": 134.76498, "t": 549.84169, "r": 480.58971999999994, "b": 558.63866, "coord_origin": "1"}}, {"id": 49, "text": "We believe this is a necessary step before further improving neural network", "bbox": {"l": 134.76498, "t": 561.79669, "r": 480.58871, "b": 570.59366, "coord_origin": "1"}}, {"id": 50, "text": "architectures for this task.", "bbox": {"l": 134.76498, "t": 573.75169, "r": 249.27811, "b": 582.54866, "coord_origin": "1"}}]}, {"id": 4, "label": "Section-header", "bbox": {"l": 134.42018623352052, "t": 605.1948657989502, "r": 269.62442, "b": 616.11823, "coord_origin": "1"}, "confidence": 0.9407892227172852, "cells": [{"id": 51, "text": "3", "bbox": {"l": 134.76498, "t": 605.54984, "r": 141.48859, "b": 616.11823, "coord_origin": "1"}}, {"id": 52, "text": "Problem Statement", "bbox": {"l": 154.93819, "t": 605.54984, "r": 269.62442, "b": 616.11823, "coord_origin": "1"}}]}, {"id": 5, "label": "Text", "bbox": {"l": 133.80312366485597, "t": 631.5329429626464, "r": 480.5936899999999, "b": 665.3024780273437, "coord_origin": "1"}, "confidence": 0.980012059211731, "cells": [{"id": 53, "text": "All known Im2Seq based models for TSR fundamentally work in similar ways.", "bbox": {"l": 134.76498, "t": 632.14769, "r": 480.59064, "b": 640.94466, "coord_origin": "1"}}, {"id": 54, "text": "Given an image of a 
table, the Im2Seq model predicts the structure of the table", "bbox": {"l": 134.76498, "t": 644.1026899999999, "r": 480.5867, "b": 652.89966, "coord_origin": "1"}}, {"id": 55, "text": "by generating a sequence of tokens. These tokens originate from a finite vocab-", "bbox": {"l": 134.76498, "t": 656.0586900000001, "r": 480.5936899999999, "b": 664.85566, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 3, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.52096776962279, "t": 92.96537475585933, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.6325680017471313, "cells": [{"id": 0, "text": "4", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "4 M. Lysak, et al."}, {"label": "Text", "id": 1, "page_no": 3, "cluster": {"id": 1, "label": "Text", "bbox": {"l": 133.76139278411867, "t": 117.85091514587407, "r": 480.6270435333252, "b": 259.4519508361816, "coord_origin": "1"}, "confidence": 0.9769367575645447, "cells": [{"id": 3, "text": "Other work [20] aims at predicting a grid for each table and deciding which cells", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.59375, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "must be merged using an attention network. 
Im2Seq methods cast the problem", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.58774, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": "as a sequence generation task [4,5,9,22], and therefore need an internal table-", "bbox": {"l": 134.765, "t": 142.84479, "r": 480.58675999999997, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 6, "text": "structure representation language, which is often implemented with standard", "bbox": {"l": 134.765, "t": 154.7998, "r": 480.5878000000001, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 7, "text": "markup languages (e.g. HTML, LaTeX, Markdown). In theory, Im2Seq methods", "bbox": {"l": 134.765, "t": 166.75482, "r": 480.59271, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 8, "text": "have a natural advantage over the OD and GNN methods by virtue of directly", "bbox": {"l": 134.765, "t": 178.70983999999999, "r": 480.5957599999999, "b": 187.50684, "coord_origin": "1"}}, {"id": 9, "text": "predicting the table-structure. As such, no post-processing or rules are needed", "bbox": {"l": 134.765, "t": 190.66485999999998, "r": 480.59271, "b": 199.46185000000003, "coord_origin": "1"}}, {"id": 10, "text": "in order to obtain the table-structure, which is necessary with OD and GNN", "bbox": {"l": 134.765, "t": 202.61987, "r": 480.59378, "b": 211.41687000000002, "coord_origin": "1"}}, {"id": 11, "text": "approaches. 
In practice, this is not entirely true, because a predicted sequence", "bbox": {"l": 134.765, "t": 214.57587, "r": 480.58783000000005, "b": 223.37285999999995, "coord_origin": "1"}}, {"id": 12, "text": "of table-structure markup does not necessarily have to be syntactically correct.", "bbox": {"l": 134.765, "t": 226.53088000000002, "r": 480.58978, "b": 235.32788000000005, "coord_origin": "1"}}, {"id": 13, "text": "Hence, depending on the quality of the predicted sequence, some post-processing", "bbox": {"l": 134.765, "t": 238.48590000000002, "r": 480.59572999999995, "b": 247.28290000000004, "coord_origin": "1"}}, {"id": 14, "text": "needs to be performed to ensure a syntactically valid (let alone correct) sequence.", "bbox": {"l": 134.765, "t": 250.44092, "r": 480.59473, "b": 259.23792000000003, "coord_origin": "1"}}]}, "text": "Other work [20] aims at predicting a grid for each table and deciding which cells must be merged using an attention network. Im2Seq methods cast the problem as a sequence generation task [4,5,9,22], and therefore need an internal tablestructure representation language, which is often implemented with standard markup languages (e.g. HTML, LaTeX, Markdown). In theory, Im2Seq methods have a natural advantage over the OD and GNN methods by virtue of directly predicting the table-structure. As such, no post-processing or rules are needed in order to obtain the table-structure, which is necessary with OD and GNN approaches. In practice, this is not entirely true, because a predicted sequence of table-structure markup does not necessarily have to be syntactically correct. 
Hence, depending on the quality of the predicted sequence, some post-processing needs to be performed to ensure a syntactically valid (let alone correct) sequence."}, {"label": "Text", "id": 2, "page_no": 3, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.58259201049805, "t": 261.394985961914, "r": 480.79309158325196, "b": 486.6467, "coord_origin": "1"}, "confidence": 0.9808397889137268, "cells": [{"id": 15, "text": "Within the Im2Seq method, we find several popular models, namely the", "bbox": {"l": 149.709, "t": 262.65692, "r": 480.59280000000007, "b": 271.45392000000004, "coord_origin": "1"}}, {"id": 16, "text": "encoder-dual-decoder model (EDD) [22], TableFormer [9], Tabsplitter[2] and Ye", "bbox": {"l": 134.765, "t": 274.61194, "r": 480.59167, "b": 283.40891, "coord_origin": "1"}}, {"id": 17, "text": "et. al. [19]. EDD uses two consecutive long short-term memory (LSTM) decoders", "bbox": {"l": 134.765, "t": 286.56692999999996, "r": 480.59271, "b": 295.36389, "coord_origin": "1"}}, {"id": 18, "text": "to predict a table in HTML representation. The", "bbox": {"l": 134.765, "t": 298.52190999999993, "r": 342.02097, "b": 307.31888, "coord_origin": "1"}}, {"id": 19, "text": "tag decoder", "bbox": {"l": 345.064, "t": 298.52190999999993, "r": 393.04684, "b": 307.31888, "coord_origin": "1"}}, {"id": 20, "text": "predicts a sequence", "bbox": {"l": 397.16699, "t": 298.52190999999993, "r": 480.59082, "b": 307.31888, "coord_origin": "1"}}, {"id": 21, "text": "of HTML tags. 
For each decoded table cell (", "bbox": {"l": 134.76498, "t": 310.47791, "r": 333.29871, "b": 319.27487, "coord_origin": "1"}}, {"id": 22, "text": "<td>", "bbox": {"l": 333.29898, "t": 310.47791, "r": 356.9711, "b": 319.27487, "coord_origin": "1"}}, {"id": 23, "text": "), the attention is passed to", "bbox": {"l": 357.08499, "t": 310.47791, "r": 480.59433000000007, "b": 319.27487, "coord_origin": "1"}}, {"id": 24, "text": "the", "bbox": {"l": 134.76498, "t": 322.43289, "r": 148.59805, "b": 331.22986, "coord_origin": "1"}}, {"id": 25, "text": "cell decoder", "bbox": {"l": 152.27698, "t": 322.43289, "r": 202.1109, "b": 331.22986, "coord_origin": "1"}}, {"id": 26, "text": "to predict the content with an embedded OCR approach. The", "bbox": {"l": 206.86398, "t": 322.43289, "r": 480.58743, "b": 331.22986, "coord_origin": "1"}}, {"id": 27, "text": "latter makes it susceptible to transcription errors in the cell content of the table.", "bbox": {"l": 134.76498, "t": 334.38788, "r": 480.59476, "b": 343.18484, "coord_origin": "1"}}, {"id": 28, "text": "TableFormer address this reliance on OCR and uses two transformer decoders for", "bbox": {"l": 134.76498, "t": 346.34286, "r": 480.58675999999997, "b": 355.13983, "coord_origin": "1"}}, {"id": 29, "text": "HTML structure and cell bounding box prediction in an end-to-end architecture.", "bbox": {"l": 134.76498, "t": 358.29785, "r": 480.58868, "b": 367.09482, "coord_origin": "1"}}, {"id": 30, "text": "The predicted cell bounding box is then used to extract text tokens from an", "bbox": {"l": 134.76498, "t": 370.25284, "r": 480.58868, "b": 379.0498, "coord_origin": "1"}}, {"id": 31, "text": "originating (digital) PDF page, circumventing any need for OCR. 
TabSplitter", "bbox": {"l": 134.76498, "t": 382.20883, "r": 480.59357000000006, "b": 391.0058, "coord_origin": "1"}}, {"id": 32, "text": "[2]", "bbox": {"l": 134.76498, "t": 394.16382, "r": 144.76979, "b": 402.96078, "coord_origin": "1"}}, {"id": 33, "text": "proposes a compact double-matrix representation of table rows and columns", "bbox": {"l": 149.50908, "t": 394.16382, "r": 480.58667, "b": 402.96078, "coord_origin": "1"}}, {"id": 34, "text": "to do error detection and error correction of HTML structure sequences based", "bbox": {"l": 134.76498, "t": 406.1188, "r": 480.59569999999997, "b": 414.91576999999995, "coord_origin": "1"}}, {"id": 35, "text": "on predictions from [19]. This compact double-matrix representation can not be", "bbox": {"l": 134.76498, "t": 418.07379, "r": 480.59180000000003, "b": 426.87076, "coord_origin": "1"}}, {"id": 36, "text": "used directly by the Img2seq model training, so the model uses HTML as an", "bbox": {"l": 134.76498, "t": 430.02878, "r": 480.5878000000001, "b": 438.82574, "coord_origin": "1"}}, {"id": 37, "text": "intermediate form. Chi et. al. [4] introduce a data set and a baseline method", "bbox": {"l": 134.76498, "t": 441.98376, "r": 480.58868, "b": 450.78073, "coord_origin": "1"}}, {"id": 38, "text": "using bidirectional LSTMs to predict LaTeX code. 
Kayal", "bbox": {"l": 134.76498, "t": 453.93976000000004, "r": 384.5752, "b": 462.73672, "coord_origin": "1"}}, {"id": 39, "text": "[5]", "bbox": {"l": 391.55899, "t": 453.93976000000004, "r": 401.73236, "b": 462.73672, "coord_origin": "1"}}, {"id": 40, "text": "introduces Gated", "bbox": {"l": 406.55154, "t": 453.93976000000004, "r": 480.58777, "b": 462.73672, "coord_origin": "1"}}, {"id": 41, "text": "ResNet transformers to predict LaTeX code, and a separate OCR module to", "bbox": {"l": 134.76498, "t": 465.89474, "r": 480.59079, "b": 474.69171, "coord_origin": "1"}}, {"id": 42, "text": "extract content.", "bbox": {"l": 134.76498, "t": 477.84973, "r": 203.68625, "b": 486.6467, "coord_origin": "1"}}]}, "text": "Within the Im2Seq method, we find several popular models, namely the encoder-dual-decoder model (EDD) [22], TableFormer [9], Tabsplitter[2] and Ye et. al. [19]. EDD uses two consecutive long short-term memory (LSTM) decoders to predict a table in HTML representation. The tag decoder predicts a sequence of HTML tags. For each decoded table cell ( <td> ), the attention is passed to the cell decoder to predict the content with an embedded OCR approach. The latter makes it susceptible to transcription errors in the cell content of the table. TableFormer address this reliance on OCR and uses two transformer decoders for HTML structure and cell bounding box prediction in an end-to-end architecture. The predicted cell bounding box is then used to extract text tokens from an originating (digital) PDF page, circumventing any need for OCR. TabSplitter [2] proposes a compact double-matrix representation of table rows and columns to do error detection and error correction of HTML structure sequences based on predictions from [19]. This compact double-matrix representation can not be used directly by the Img2seq model training, so the model uses HTML as an intermediate form. Chi et. al. 
[4] introduce a data set and a baseline method using bidirectional LSTMs to predict LaTeX code. Kayal [5] introduces Gated ResNet transformers to predict LaTeX code, and a separate OCR module to extract content."}, {"label": "Text", "id": 3, "page_no": 3, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 133.88829259872435, "t": 488.7115905761719, "r": 480.59378, "b": 582.54866, "coord_origin": "1"}, "confidence": 0.9849583506584167, "cells": [{"id": 43, "text": "Im2Seq approaches have shown to be well-suited for the TSR task and allow a", "bbox": {"l": 149.70898, "t": 490.06573, "r": 480.59378, "b": 498.8627, "coord_origin": "1"}}, {"id": 44, "text": "full end-to-end network design that can output the final table structure without", "bbox": {"l": 134.76498, "t": 502.02072, "r": 480.58871, "b": 510.81769, "coord_origin": "1"}}, {"id": 45, "text": "pre- or post-processing logic. Furthermore, Im2Seq models have demonstrated", "bbox": {"l": 134.76498, "t": 513.9757099999999, "r": 480.58675999999997, "b": 522.7726700000001, "coord_origin": "1"}}, {"id": 46, "text": "to deliver state-of-the-art prediction accuracy [9]. 
This motivated the authors", "bbox": {"l": 134.76498, "t": 525.93069, "r": 480.58978, "b": 534.72766, "coord_origin": "1"}}, {"id": 47, "text": "to investigate if the performance (both in accuracy and inference time) can", "bbox": {"l": 134.76498, "t": 537.8857, "r": 480.58765, "b": 546.6826599999999, "coord_origin": "1"}}, {"id": 48, "text": "be further improved by optimising the table structure representation language.", "bbox": {"l": 134.76498, "t": 549.84169, "r": 480.58971999999994, "b": 558.63866, "coord_origin": "1"}}, {"id": 49, "text": "We believe this is a necessary step before further improving neural network", "bbox": {"l": 134.76498, "t": 561.79669, "r": 480.58871, "b": 570.59366, "coord_origin": "1"}}, {"id": 50, "text": "architectures for this task.", "bbox": {"l": 134.76498, "t": 573.75169, "r": 249.27811, "b": 582.54866, "coord_origin": "1"}}]}, "text": "Im2Seq approaches have shown to be well-suited for the TSR task and allow a full end-to-end network design that can output the final table structure without pre- or post-processing logic. Furthermore, Im2Seq models have demonstrated to deliver state-of-the-art prediction accuracy [9]. This motivated the authors to investigate if the performance (both in accuracy and inference time) can be further improved by optimising the table structure representation language. 
We believe this is a necessary step before further improving neural network architectures for this task."}, {"label": "Section-header", "id": 4, "page_no": 3, "cluster": {"id": 4, "label": "Section-header", "bbox": {"l": 134.42018623352052, "t": 605.1948657989502, "r": 269.62442, "b": 616.11823, "coord_origin": "1"}, "confidence": 0.9407892227172852, "cells": [{"id": 51, "text": "3", "bbox": {"l": 134.76498, "t": 605.54984, "r": 141.48859, "b": 616.11823, "coord_origin": "1"}}, {"id": 52, "text": "Problem Statement", "bbox": {"l": 154.93819, "t": 605.54984, "r": 269.62442, "b": 616.11823, "coord_origin": "1"}}]}, "text": "3 Problem Statement"}, {"label": "Text", "id": 5, "page_no": 3, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.80312366485597, "t": 631.5329429626464, "r": 480.5936899999999, "b": 665.3024780273437, "coord_origin": "1"}, "confidence": 0.980012059211731, "cells": [{"id": 53, "text": "All known Im2Seq based models for TSR fundamentally work in similar ways.", "bbox": {"l": 134.76498, "t": 632.14769, "r": 480.59064, "b": 640.94466, "coord_origin": "1"}}, {"id": 54, "text": "Given an image of a table, the Im2Seq model predicts the structure of the table", "bbox": {"l": 134.76498, "t": 644.1026899999999, "r": 480.5867, "b": 652.89966, "coord_origin": "1"}}, {"id": 55, "text": "by generating a sequence of tokens. These tokens originate from a finite vocab-", "bbox": {"l": 134.76498, "t": 656.0586900000001, "r": 480.5936899999999, "b": 664.85566, "coord_origin": "1"}}]}, "text": "All known Im2Seq based models for TSR fundamentally work in similar ways. Given an image of a table, the Im2Seq model predicts the structure of the table by generating a sequence of tokens. 
These tokens originate from a finite vocab-"}], "body": [{"label": "Text", "id": 1, "page_no": 3, "cluster": {"id": 1, "label": "Text", "bbox": {"l": 133.76139278411867, "t": 117.85091514587407, "r": 480.6270435333252, "b": 259.4519508361816, "coord_origin": "1"}, "confidence": 0.9769367575645447, "cells": [{"id": 3, "text": "Other work [20] aims at predicting a grid for each table and deciding which cells", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.59375, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "must be merged using an attention network. Im2Seq methods cast the problem", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.58774, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": "as a sequence generation task [4,5,9,22], and therefore need an internal table-", "bbox": {"l": 134.765, "t": 142.84479, "r": 480.58675999999997, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 6, "text": "structure representation language, which is often implemented with standard", "bbox": {"l": 134.765, "t": 154.7998, "r": 480.5878000000001, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 7, "text": "markup languages (e.g. HTML, LaTeX, Markdown). In theory, Im2Seq methods", "bbox": {"l": 134.765, "t": 166.75482, "r": 480.59271, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 8, "text": "have a natural advantage over the OD and GNN methods by virtue of directly", "bbox": {"l": 134.765, "t": 178.70983999999999, "r": 480.5957599999999, "b": 187.50684, "coord_origin": "1"}}, {"id": 9, "text": "predicting the table-structure. As such, no post-processing or rules are needed", "bbox": {"l": 134.765, "t": 190.66485999999998, "r": 480.59271, "b": 199.46185000000003, "coord_origin": "1"}}, {"id": 10, "text": "in order to obtain the table-structure, which is necessary with OD and GNN", "bbox": {"l": 134.765, "t": 202.61987, "r": 480.59378, "b": 211.41687000000002, "coord_origin": "1"}}, {"id": 11, "text": "approaches. 
In practice, this is not entirely true, because a predicted sequence", "bbox": {"l": 134.765, "t": 214.57587, "r": 480.58783000000005, "b": 223.37285999999995, "coord_origin": "1"}}, {"id": 12, "text": "of table-structure markup does not necessarily have to be syntactically correct.", "bbox": {"l": 134.765, "t": 226.53088000000002, "r": 480.58978, "b": 235.32788000000005, "coord_origin": "1"}}, {"id": 13, "text": "Hence, depending on the quality of the predicted sequence, some post-processing", "bbox": {"l": 134.765, "t": 238.48590000000002, "r": 480.59572999999995, "b": 247.28290000000004, "coord_origin": "1"}}, {"id": 14, "text": "needs to be performed to ensure a syntactically valid (let alone correct) sequence.", "bbox": {"l": 134.765, "t": 250.44092, "r": 480.59473, "b": 259.23792000000003, "coord_origin": "1"}}]}, "text": "Other work [20] aims at predicting a grid for each table and deciding which cells must be merged using an attention network. Im2Seq methods cast the problem as a sequence generation task [4,5,9,22], and therefore need an internal tablestructure representation language, which is often implemented with standard markup languages (e.g. HTML, LaTeX, Markdown). In theory, Im2Seq methods have a natural advantage over the OD and GNN methods by virtue of directly predicting the table-structure. As such, no post-processing or rules are needed in order to obtain the table-structure, which is necessary with OD and GNN approaches. In practice, this is not entirely true, because a predicted sequence of table-structure markup does not necessarily have to be syntactically correct. 
Hence, depending on the quality of the predicted sequence, some post-processing needs to be performed to ensure a syntactically valid (let alone correct) sequence."}, {"label": "Text", "id": 2, "page_no": 3, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.58259201049805, "t": 261.394985961914, "r": 480.79309158325196, "b": 486.6467, "coord_origin": "1"}, "confidence": 0.9808397889137268, "cells": [{"id": 15, "text": "Within the Im2Seq method, we find several popular models, namely the", "bbox": {"l": 149.709, "t": 262.65692, "r": 480.59280000000007, "b": 271.45392000000004, "coord_origin": "1"}}, {"id": 16, "text": "encoder-dual-decoder model (EDD) [22], TableFormer [9], Tabsplitter[2] and Ye", "bbox": {"l": 134.765, "t": 274.61194, "r": 480.59167, "b": 283.40891, "coord_origin": "1"}}, {"id": 17, "text": "et. al. [19]. EDD uses two consecutive long short-term memory (LSTM) decoders", "bbox": {"l": 134.765, "t": 286.56692999999996, "r": 480.59271, "b": 295.36389, "coord_origin": "1"}}, {"id": 18, "text": "to predict a table in HTML representation. The", "bbox": {"l": 134.765, "t": 298.52190999999993, "r": 342.02097, "b": 307.31888, "coord_origin": "1"}}, {"id": 19, "text": "tag decoder", "bbox": {"l": 345.064, "t": 298.52190999999993, "r": 393.04684, "b": 307.31888, "coord_origin": "1"}}, {"id": 20, "text": "predicts a sequence", "bbox": {"l": 397.16699, "t": 298.52190999999993, "r": 480.59082, "b": 307.31888, "coord_origin": "1"}}, {"id": 21, "text": "of HTML tags. 
For each decoded table cell (", "bbox": {"l": 134.76498, "t": 310.47791, "r": 333.29871, "b": 319.27487, "coord_origin": "1"}}, {"id": 22, "text": "<td>", "bbox": {"l": 333.29898, "t": 310.47791, "r": 356.9711, "b": 319.27487, "coord_origin": "1"}}, {"id": 23, "text": "), the attention is passed to", "bbox": {"l": 357.08499, "t": 310.47791, "r": 480.59433000000007, "b": 319.27487, "coord_origin": "1"}}, {"id": 24, "text": "the", "bbox": {"l": 134.76498, "t": 322.43289, "r": 148.59805, "b": 331.22986, "coord_origin": "1"}}, {"id": 25, "text": "cell decoder", "bbox": {"l": 152.27698, "t": 322.43289, "r": 202.1109, "b": 331.22986, "coord_origin": "1"}}, {"id": 26, "text": "to predict the content with an embedded OCR approach. The", "bbox": {"l": 206.86398, "t": 322.43289, "r": 480.58743, "b": 331.22986, "coord_origin": "1"}}, {"id": 27, "text": "latter makes it susceptible to transcription errors in the cell content of the table.", "bbox": {"l": 134.76498, "t": 334.38788, "r": 480.59476, "b": 343.18484, "coord_origin": "1"}}, {"id": 28, "text": "TableFormer address this reliance on OCR and uses two transformer decoders for", "bbox": {"l": 134.76498, "t": 346.34286, "r": 480.58675999999997, "b": 355.13983, "coord_origin": "1"}}, {"id": 29, "text": "HTML structure and cell bounding box prediction in an end-to-end architecture.", "bbox": {"l": 134.76498, "t": 358.29785, "r": 480.58868, "b": 367.09482, "coord_origin": "1"}}, {"id": 30, "text": "The predicted cell bounding box is then used to extract text tokens from an", "bbox": {"l": 134.76498, "t": 370.25284, "r": 480.58868, "b": 379.0498, "coord_origin": "1"}}, {"id": 31, "text": "originating (digital) PDF page, circumventing any need for OCR. 
TabSplitter", "bbox": {"l": 134.76498, "t": 382.20883, "r": 480.59357000000006, "b": 391.0058, "coord_origin": "1"}}, {"id": 32, "text": "[2]", "bbox": {"l": 134.76498, "t": 394.16382, "r": 144.76979, "b": 402.96078, "coord_origin": "1"}}, {"id": 33, "text": "proposes a compact double-matrix representation of table rows and columns", "bbox": {"l": 149.50908, "t": 394.16382, "r": 480.58667, "b": 402.96078, "coord_origin": "1"}}, {"id": 34, "text": "to do error detection and error correction of HTML structure sequences based", "bbox": {"l": 134.76498, "t": 406.1188, "r": 480.59569999999997, "b": 414.91576999999995, "coord_origin": "1"}}, {"id": 35, "text": "on predictions from [19]. This compact double-matrix representation can not be", "bbox": {"l": 134.76498, "t": 418.07379, "r": 480.59180000000003, "b": 426.87076, "coord_origin": "1"}}, {"id": 36, "text": "used directly by the Img2seq model training, so the model uses HTML as an", "bbox": {"l": 134.76498, "t": 430.02878, "r": 480.5878000000001, "b": 438.82574, "coord_origin": "1"}}, {"id": 37, "text": "intermediate form. Chi et. al. [4] introduce a data set and a baseline method", "bbox": {"l": 134.76498, "t": 441.98376, "r": 480.58868, "b": 450.78073, "coord_origin": "1"}}, {"id": 38, "text": "using bidirectional LSTMs to predict LaTeX code. 
Kayal", "bbox": {"l": 134.76498, "t": 453.93976000000004, "r": 384.5752, "b": 462.73672, "coord_origin": "1"}}, {"id": 39, "text": "[5]", "bbox": {"l": 391.55899, "t": 453.93976000000004, "r": 401.73236, "b": 462.73672, "coord_origin": "1"}}, {"id": 40, "text": "introduces Gated", "bbox": {"l": 406.55154, "t": 453.93976000000004, "r": 480.58777, "b": 462.73672, "coord_origin": "1"}}, {"id": 41, "text": "ResNet transformers to predict LaTeX code, and a separate OCR module to", "bbox": {"l": 134.76498, "t": 465.89474, "r": 480.59079, "b": 474.69171, "coord_origin": "1"}}, {"id": 42, "text": "extract content.", "bbox": {"l": 134.76498, "t": 477.84973, "r": 203.68625, "b": 486.6467, "coord_origin": "1"}}]}, "text": "Within the Im2Seq method, we find several popular models, namely the encoder-dual-decoder model (EDD) [22], TableFormer [9], Tabsplitter[2] and Ye et. al. [19]. EDD uses two consecutive long short-term memory (LSTM) decoders to predict a table in HTML representation. The tag decoder predicts a sequence of HTML tags. For each decoded table cell ( <td> ), the attention is passed to the cell decoder to predict the content with an embedded OCR approach. The latter makes it susceptible to transcription errors in the cell content of the table. TableFormer address this reliance on OCR and uses two transformer decoders for HTML structure and cell bounding box prediction in an end-to-end architecture. The predicted cell bounding box is then used to extract text tokens from an originating (digital) PDF page, circumventing any need for OCR. TabSplitter [2] proposes a compact double-matrix representation of table rows and columns to do error detection and error correction of HTML structure sequences based on predictions from [19]. This compact double-matrix representation can not be used directly by the Img2seq model training, so the model uses HTML as an intermediate form. Chi et. al. 
[4] introduce a data set and a baseline method using bidirectional LSTMs to predict LaTeX code. Kayal [5] introduces Gated ResNet transformers to predict LaTeX code, and a separate OCR module to extract content."}, {"label": "Text", "id": 3, "page_no": 3, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 133.88829259872435, "t": 488.7115905761719, "r": 480.59378, "b": 582.54866, "coord_origin": "1"}, "confidence": 0.9849583506584167, "cells": [{"id": 43, "text": "Im2Seq approaches have shown to be well-suited for the TSR task and allow a", "bbox": {"l": 149.70898, "t": 490.06573, "r": 480.59378, "b": 498.8627, "coord_origin": "1"}}, {"id": 44, "text": "full end-to-end network design that can output the final table structure without", "bbox": {"l": 134.76498, "t": 502.02072, "r": 480.58871, "b": 510.81769, "coord_origin": "1"}}, {"id": 45, "text": "pre- or post-processing logic. Furthermore, Im2Seq models have demonstrated", "bbox": {"l": 134.76498, "t": 513.9757099999999, "r": 480.58675999999997, "b": 522.7726700000001, "coord_origin": "1"}}, {"id": 46, "text": "to deliver state-of-the-art prediction accuracy [9]. 
This motivated the authors", "bbox": {"l": 134.76498, "t": 525.93069, "r": 480.58978, "b": 534.72766, "coord_origin": "1"}}, {"id": 47, "text": "to investigate if the performance (both in accuracy and inference time) can", "bbox": {"l": 134.76498, "t": 537.8857, "r": 480.58765, "b": 546.6826599999999, "coord_origin": "1"}}, {"id": 48, "text": "be further improved by optimising the table structure representation language.", "bbox": {"l": 134.76498, "t": 549.84169, "r": 480.58971999999994, "b": 558.63866, "coord_origin": "1"}}, {"id": 49, "text": "We believe this is a necessary step before further improving neural network", "bbox": {"l": 134.76498, "t": 561.79669, "r": 480.58871, "b": 570.59366, "coord_origin": "1"}}, {"id": 50, "text": "architectures for this task.", "bbox": {"l": 134.76498, "t": 573.75169, "r": 249.27811, "b": 582.54866, "coord_origin": "1"}}]}, "text": "Im2Seq approaches have shown to be well-suited for the TSR task and allow a full end-to-end network design that can output the final table structure without pre- or post-processing logic. Furthermore, Im2Seq models have demonstrated to deliver state-of-the-art prediction accuracy [9]. This motivated the authors to investigate if the performance (both in accuracy and inference time) can be further improved by optimising the table structure representation language. 
We believe this is a necessary step before further improving neural network architectures for this task."}, {"label": "Section-header", "id": 4, "page_no": 3, "cluster": {"id": 4, "label": "Section-header", "bbox": {"l": 134.42018623352052, "t": 605.1948657989502, "r": 269.62442, "b": 616.11823, "coord_origin": "1"}, "confidence": 0.9407892227172852, "cells": [{"id": 51, "text": "3", "bbox": {"l": 134.76498, "t": 605.54984, "r": 141.48859, "b": 616.11823, "coord_origin": "1"}}, {"id": 52, "text": "Problem Statement", "bbox": {"l": 154.93819, "t": 605.54984, "r": 269.62442, "b": 616.11823, "coord_origin": "1"}}]}, "text": "3 Problem Statement"}, {"label": "Text", "id": 5, "page_no": 3, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.80312366485597, "t": 631.5329429626464, "r": 480.5936899999999, "b": 665.3024780273437, "coord_origin": "1"}, "confidence": 0.980012059211731, "cells": [{"id": 53, "text": "All known Im2Seq based models for TSR fundamentally work in similar ways.", "bbox": {"l": 134.76498, "t": 632.14769, "r": 480.59064, "b": 640.94466, "coord_origin": "1"}}, {"id": 54, "text": "Given an image of a table, the Im2Seq model predicts the structure of the table", "bbox": {"l": 134.76498, "t": 644.1026899999999, "r": 480.5867, "b": 652.89966, "coord_origin": "1"}}, {"id": 55, "text": "by generating a sequence of tokens. These tokens originate from a finite vocab-", "bbox": {"l": 134.76498, "t": 656.0586900000001, "r": 480.5936899999999, "b": 664.85566, "coord_origin": "1"}}]}, "text": "All known Im2Seq based models for TSR fundamentally work in similar ways. Given an image of a table, the Im2Seq model predicts the structure of the table by generating a sequence of tokens. 
These tokens originate from a finite vocab-"}], "headers": [{"label": "Page-header", "id": 0, "page_no": 3, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.52096776962279, "t": 92.96537475585933, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.6325680017471313, "cells": [{"id": 0, "text": "4", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "4 M. Lysak, et al."}]}}, {"page_no": 4, "page_hash": "d3b9daa8fd5d091fb5ef9bce44f085dd282a137e215574fec9556904b25cea8a", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "5", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "ulary and can be interpreted as a table structure. 
For example, with the HTML", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.58577999999994, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "tokens", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 162.48494, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "<table>", "bbox": {"l": 166.368, "t": 130.88878999999997, "r": 201.74918, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": ",", "bbox": {"l": 201.74899, "t": 130.88878999999997, "r": 204.51561, "b": 139.68579, "coord_origin": "1"}}, {"id": 6, "text": "</table>", "bbox": {"l": 208.39699, "t": 130.88878999999997, "r": 248.86904999999996, "b": 139.68579, "coord_origin": "1"}}, {"id": 7, "text": ",", "bbox": {"l": 248.86899, "t": 130.88878999999997, "r": 251.6356, "b": 139.68579, "coord_origin": "1"}}, {"id": 8, "text": "<tr>", "bbox": {"l": 255.51698, "t": 130.88878999999997, "r": 278.29846, "b": 139.68579, "coord_origin": "1"}}, {"id": 9, "text": ",", "bbox": {"l": 278.29797, "t": 130.88878999999997, "r": 281.06458, "b": 139.68579, "coord_origin": "1"}}, {"id": 10, "text": "</tr>", "bbox": {"l": 284.94598, "t": 130.88878999999997, "r": 312.81836, "b": 139.68579, "coord_origin": "1"}}, {"id": 11, "text": ",", "bbox": {"l": 312.81799, "t": 130.88878999999997, "r": 315.58459, "b": 139.68579, "coord_origin": "1"}}, {"id": 12, "text": "<td>", "bbox": {"l": 319.466, "t": 130.88878999999997, "r": 343.13812, "b": 139.68579, "coord_origin": "1"}}, {"id": 13, "text": "and", "bbox": {"l": 347.13202, "t": 130.88878999999997, "r": 363.17877, "b": 139.68579, "coord_origin": "1"}}, {"id": 14, "text": "</td>", "bbox": {"l": 367.06003, "t": 130.88878999999997, "r": 395.82306, "b": 139.68579, "coord_origin": "1"}}, {"id": 15, "text": ", one can construct", "bbox": {"l": 395.82303, "t": 130.88878999999997, "r": 480.59177000000005, "b": 139.68579, "coord_origin": "1"}}, {"id": 16, "text": "simple table structures without any spanning cells. 
In reality though, one needs", "bbox": {"l": 134.76501, "t": 142.84479, "r": 480.59365999999994, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 17, "text": "at least 28 HTML tokens to describe the most common complex tables observed", "bbox": {"l": 134.76501, "t": 154.7998, "r": 480.58577999999994, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 18, "text": "in real-world documents [21,22], due to a variety of spanning cells definitions in", "bbox": {"l": 134.76501, "t": 166.75482, "r": 480.59378, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 19, "text": "the HTML token vocabulary.", "bbox": {"l": 134.76501, "t": 178.70983999999999, "r": 261.92566, "b": 187.50684, "coord_origin": "1"}}, {"id": 20, "text": "Fig. 2.", "bbox": {"l": 145.60701, "t": 221.07928000000004, "r": 173.48625, "b": 229.00562000000002, "coord_origin": "1"}}, {"id": 21, "text": "Frequency of tokens in HTML and OTSL as they appear in PubTabNet.", "bbox": {"l": 176.56001, "t": 221.14209000000005, "r": 469.75223000000005, "b": 229.21178999999995, "coord_origin": "1"}}, {"id": 22, "text": "Obviously, HTML and other general-purpose markup languages were not de-", "bbox": {"l": 149.709, "t": 368.20679, "r": 480.59283000000005, "b": 377.00375, "coord_origin": "1"}}, {"id": 23, "text": "signed for Im2Seq models. As such, they have some serious drawbacks. First, the", "bbox": {"l": 134.765, "t": 380.16177, "r": 480.58664, "b": 388.9587399999999, "coord_origin": "1"}}, {"id": 24, "text": "token vocabulary needs to be artificially large in order to describe all plausible", "bbox": {"l": 134.765, "t": 392.11676, "r": 480.59180000000003, "b": 400.91373, "coord_origin": "1"}}, {"id": 25, "text": "tabular structures. Since most Im2Seq models use an autoregressive approach,", "bbox": {"l": 134.765, "t": 404.07175, "r": 480.5897499999999, "b": 412.86871, "coord_origin": "1"}}, {"id": 26, "text": "they generate the sequence token by token. 
Therefore, to reduce inference time,", "bbox": {"l": 134.765, "t": 416.02774, "r": 480.58871, "b": 424.82471, "coord_origin": "1"}}, {"id": 27, "text": "a shorter sequence length is critical. Every table-cell is represented by at least", "bbox": {"l": 134.765, "t": 427.98273, "r": 480.59265, "b": 436.77969, "coord_origin": "1"}}, {"id": 28, "text": "two tokens (", "bbox": {"l": 134.765, "t": 439.9377099999999, "r": 187.93439, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 29, "text": "<td>", "bbox": {"l": 187.931, "t": 439.9377099999999, "r": 211.60313, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 30, "text": "and", "bbox": {"l": 214.75400000000002, "t": 439.9377099999999, "r": 230.80075000000002, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 31, "text": "</td>", "bbox": {"l": 233.83898999999997, "t": 439.9377099999999, "r": 262.60202, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 32, "text": "). Furthermore, when tokenizing the HTML struc-", "bbox": {"l": 262.716, "t": 439.9377099999999, "r": 480.59009, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 33, "text": "ture, one needs to explicitly enumerate possible column-spans and row-spans", "bbox": {"l": 134.76501, "t": 451.8927, "r": 480.58777, "b": 460.68967, "coord_origin": "1"}}, {"id": 34, "text": "as words. In practice, this ends up requiring 28 different HTML tokens (when", "bbox": {"l": 134.76501, "t": 463.84769, "r": 480.58681999999993, "b": 472.64465, "coord_origin": "1"}}, {"id": 35, "text": "including column- and row-spans up to 10 cells) just to describe every table in", "bbox": {"l": 134.76501, "t": 475.80368, "r": 480.58681999999993, "b": 484.60065, "coord_origin": "1"}}, {"id": 36, "text": "the PubTabNet dataset. Clearly, not every token is equally represented, as is", "bbox": {"l": 134.76501, "t": 487.75867, "r": 480.59067, "b": 496.55563, "coord_origin": "1"}}, {"id": 37, "text": "depicted in Figure 2. 
This skewed distribution of tokens in combination with", "bbox": {"l": 134.76501, "t": 499.71365, "r": 480.59277, "b": 508.51062, "coord_origin": "1"}}, {"id": 38, "text": "variable token row-length makes it challenging for models to learn the HTML", "bbox": {"l": 134.76501, "t": 511.66864, "r": 480.59476, "b": 520.46561, "coord_origin": "1"}}, {"id": 39, "text": "structure.", "bbox": {"l": 134.76501, "t": 523.62363, "r": 176.92873, "b": 532.42059, "coord_origin": "1"}}, {"id": 40, "text": "Additionally, it would be desirable if the representation would easily allow", "bbox": {"l": 149.70901, "t": 536.04263, "r": 480.59289999999993, "b": 544.8396, "coord_origin": "1"}}, {"id": 41, "text": "an early detection of invalid sequences on-the-go, before the prediction of the", "bbox": {"l": 134.76501, "t": 547.99763, "r": 480.59085, "b": 556.7946000000001, "coord_origin": "1"}}, {"id": 42, "text": "entire table structure is completed. HTML is not well-suited for this purpose as", "bbox": {"l": 134.76501, "t": 559.95264, "r": 480.58984, "b": 568.7496, "coord_origin": "1"}}, {"id": 43, "text": "the verification of incomplete sequences is non-trivial or even impossible.", "bbox": {"l": 134.76501, "t": 571.90863, "r": 452.18933, "b": 580.7056, "coord_origin": "1"}}, {"id": 44, "text": "In a valid HTML table, the token sequence must describe a 2D grid of table", "bbox": {"l": 149.70901, "t": 584.32663, "r": 480.59283000000005, "b": 593.1236, "coord_origin": "1"}}, {"id": 45, "text": "cells, serialised in row-major ordering, where each row and each column have", "bbox": {"l": 134.76501, "t": 596.28262, "r": 480.58978, "b": 605.07959, "coord_origin": "1"}}, {"id": 46, "text": "the same length (while considering row- and column-spans). 
Furthermore, every", "bbox": {"l": 134.76501, "t": 608.23763, "r": 480.5936899999999, "b": 617.03459, "coord_origin": "1"}}, {"id": 47, "text": "opening tag in HTML needs to be matched by a closing tag in a correct hierar-", "bbox": {"l": 134.76501, "t": 620.19263, "r": 480.59091, "b": 628.98959, "coord_origin": "1"}}, {"id": 48, "text": "chical manner. Since the number of tokens for each table row and column can", "bbox": {"l": 134.76501, "t": 632.1476299999999, "r": 480.58582, "b": 640.9446, "coord_origin": "1"}}, {"id": 49, "text": "vary significantly, especially for large tables with many row- and column-spans,", "bbox": {"l": 134.76501, "t": 644.10263, "r": 480.59180000000003, "b": 652.8996, "coord_origin": "1"}}, {"id": 50, "text": "it is complex to verify the consistency of predicted structures during sequence", "bbox": {"l": 134.76501, "t": 656.05763, "r": 480.59473, "b": 664.85461, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 194.02211236953735, "t": 93.09380578994751, "r": 447.54291000000006, "b": 102.16611814498901, "coord_origin": "1"}, "confidence": 0.9483410120010376, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 475.13187446594236, "t": 93.52824125289919, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8685489892959595, "cells": [{"id": 1, "text": "5", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "Text", "bbox": {"l": 133.90025739669798, "t": 118.06202430725102, "r": 480.78725509643556, "b": 187.50684, "coord_origin": "1"}, "confidence": 0.9866952896118164, "cells": [{"id": 2, "text": "ulary and can be interpreted as a table structure. 
For example, with the HTML", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.58577999999994, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "tokens", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 162.48494, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "<table>", "bbox": {"l": 166.368, "t": 130.88878999999997, "r": 201.74918, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": ",", "bbox": {"l": 201.74899, "t": 130.88878999999997, "r": 204.51561, "b": 139.68579, "coord_origin": "1"}}, {"id": 6, "text": "</table>", "bbox": {"l": 208.39699, "t": 130.88878999999997, "r": 248.86904999999996, "b": 139.68579, "coord_origin": "1"}}, {"id": 7, "text": ",", "bbox": {"l": 248.86899, "t": 130.88878999999997, "r": 251.6356, "b": 139.68579, "coord_origin": "1"}}, {"id": 8, "text": "<tr>", "bbox": {"l": 255.51698, "t": 130.88878999999997, "r": 278.29846, "b": 139.68579, "coord_origin": "1"}}, {"id": 9, "text": ",", "bbox": {"l": 278.29797, "t": 130.88878999999997, "r": 281.06458, "b": 139.68579, "coord_origin": "1"}}, {"id": 10, "text": "</tr>", "bbox": {"l": 284.94598, "t": 130.88878999999997, "r": 312.81836, "b": 139.68579, "coord_origin": "1"}}, {"id": 11, "text": ",", "bbox": {"l": 312.81799, "t": 130.88878999999997, "r": 315.58459, "b": 139.68579, "coord_origin": "1"}}, {"id": 12, "text": "<td>", "bbox": {"l": 319.466, "t": 130.88878999999997, "r": 343.13812, "b": 139.68579, "coord_origin": "1"}}, {"id": 13, "text": "and", "bbox": {"l": 347.13202, "t": 130.88878999999997, "r": 363.17877, "b": 139.68579, "coord_origin": "1"}}, {"id": 14, "text": "</td>", "bbox": {"l": 367.06003, "t": 130.88878999999997, "r": 395.82306, "b": 139.68579, "coord_origin": "1"}}, {"id": 15, "text": ", one can construct", "bbox": {"l": 395.82303, "t": 130.88878999999997, "r": 480.59177000000005, "b": 139.68579, "coord_origin": "1"}}, {"id": 16, "text": "simple table structures without any spanning cells. 
In reality though, one needs", "bbox": {"l": 134.76501, "t": 142.84479, "r": 480.59365999999994, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 17, "text": "at least 28 HTML tokens to describe the most common complex tables observed", "bbox": {"l": 134.76501, "t": 154.7998, "r": 480.58577999999994, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 18, "text": "in real-world documents [21,22], due to a variety of spanning cells definitions in", "bbox": {"l": 134.76501, "t": 166.75482, "r": 480.59378, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 19, "text": "the HTML token vocabulary.", "bbox": {"l": 134.76501, "t": 178.70983999999999, "r": 261.92566, "b": 187.50684, "coord_origin": "1"}}]}, {"id": 3, "label": "Caption", "bbox": {"l": 145.19676303863525, "t": 220.18719520568845, "r": 469.75223000000005, "b": 229.42055854797366, "coord_origin": "1"}, "confidence": 0.8952732682228088, "cells": [{"id": 20, "text": "Fig. 2.", "bbox": {"l": 145.60701, "t": 221.07928000000004, "r": 173.48625, "b": 229.00562000000002, "coord_origin": "1"}}, {"id": 21, "text": "Frequency of tokens in HTML and OTSL as they appear in PubTabNet.", "bbox": {"l": 176.56001, "t": 221.14209000000005, "r": 469.75223000000005, "b": 229.21178999999995, "coord_origin": "1"}}]}, {"id": 4, "label": "Text", "bbox": {"l": 133.70604829788206, "t": 367.1275177001953, "r": 480.62745208740233, "b": 532.42059, "coord_origin": "1"}, "confidence": 0.9863564968109131, "cells": [{"id": 22, "text": "Obviously, HTML and other general-purpose markup languages were not de-", "bbox": {"l": 149.709, "t": 368.20679, "r": 480.59283000000005, "b": 377.00375, "coord_origin": "1"}}, {"id": 23, "text": "signed for Im2Seq models. As such, they have some serious drawbacks. 
First, the", "bbox": {"l": 134.765, "t": 380.16177, "r": 480.58664, "b": 388.9587399999999, "coord_origin": "1"}}, {"id": 24, "text": "token vocabulary needs to be artificially large in order to describe all plausible", "bbox": {"l": 134.765, "t": 392.11676, "r": 480.59180000000003, "b": 400.91373, "coord_origin": "1"}}, {"id": 25, "text": "tabular structures. Since most Im2Seq models use an autoregressive approach,", "bbox": {"l": 134.765, "t": 404.07175, "r": 480.5897499999999, "b": 412.86871, "coord_origin": "1"}}, {"id": 26, "text": "they generate the sequence token by token. Therefore, to reduce inference time,", "bbox": {"l": 134.765, "t": 416.02774, "r": 480.58871, "b": 424.82471, "coord_origin": "1"}}, {"id": 27, "text": "a shorter sequence length is critical. Every table-cell is represented by at least", "bbox": {"l": 134.765, "t": 427.98273, "r": 480.59265, "b": 436.77969, "coord_origin": "1"}}, {"id": 28, "text": "two tokens (", "bbox": {"l": 134.765, "t": 439.9377099999999, "r": 187.93439, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 29, "text": "<td>", "bbox": {"l": 187.931, "t": 439.9377099999999, "r": 211.60313, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 30, "text": "and", "bbox": {"l": 214.75400000000002, "t": 439.9377099999999, "r": 230.80075000000002, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 31, "text": "</td>", "bbox": {"l": 233.83898999999997, "t": 439.9377099999999, "r": 262.60202, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 32, "text": "). Furthermore, when tokenizing the HTML struc-", "bbox": {"l": 262.716, "t": 439.9377099999999, "r": 480.59009, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 33, "text": "ture, one needs to explicitly enumerate possible column-spans and row-spans", "bbox": {"l": 134.76501, "t": 451.8927, "r": 480.58777, "b": 460.68967, "coord_origin": "1"}}, {"id": 34, "text": "as words. 
In practice, this ends up requiring 28 different HTML tokens (when", "bbox": {"l": 134.76501, "t": 463.84769, "r": 480.58681999999993, "b": 472.64465, "coord_origin": "1"}}, {"id": 35, "text": "including column- and row-spans up to 10 cells) just to describe every table in", "bbox": {"l": 134.76501, "t": 475.80368, "r": 480.58681999999993, "b": 484.60065, "coord_origin": "1"}}, {"id": 36, "text": "the PubTabNet dataset. Clearly, not every token is equally represented, as is", "bbox": {"l": 134.76501, "t": 487.75867, "r": 480.59067, "b": 496.55563, "coord_origin": "1"}}, {"id": 37, "text": "depicted in Figure 2. This skewed distribution of tokens in combination with", "bbox": {"l": 134.76501, "t": 499.71365, "r": 480.59277, "b": 508.51062, "coord_origin": "1"}}, {"id": 38, "text": "variable token row-length makes it challenging for models to learn the HTML", "bbox": {"l": 134.76501, "t": 511.66864, "r": 480.59476, "b": 520.46561, "coord_origin": "1"}}, {"id": 39, "text": "structure.", "bbox": {"l": 134.76501, "t": 523.62363, "r": 176.92873, "b": 532.42059, "coord_origin": "1"}}]}, {"id": 5, "label": "Text", "bbox": {"l": 133.89939651489257, "t": 534.8984985351562, "r": 480.59289999999993, "b": 581.5316436767579, "coord_origin": "1"}, "confidence": 0.9800698161125183, "cells": [{"id": 40, "text": "Additionally, it would be desirable if the representation would easily allow", "bbox": {"l": 149.70901, "t": 536.04263, "r": 480.59289999999993, "b": 544.8396, "coord_origin": "1"}}, {"id": 41, "text": "an early detection of invalid sequences on-the-go, before the prediction of the", "bbox": {"l": 134.76501, "t": 547.99763, "r": 480.59085, "b": 556.7946000000001, "coord_origin": "1"}}, {"id": 42, "text": "entire table structure is completed. 
HTML is not well-suited for this purpose as", "bbox": {"l": 134.76501, "t": 559.95264, "r": 480.58984, "b": 568.7496, "coord_origin": "1"}}, {"id": 43, "text": "the verification of incomplete sequences is non-trivial or even impossible.", "bbox": {"l": 134.76501, "t": 571.90863, "r": 452.18933, "b": 580.7056, "coord_origin": "1"}}]}, {"id": 6, "label": "Text", "bbox": {"l": 133.75929164886475, "t": 583.1087310791016, "r": 480.59473, "b": 665.1034538269042, "coord_origin": "1"}, "confidence": 0.9856952428817749, "cells": [{"id": 44, "text": "In a valid HTML table, the token sequence must describe a 2D grid of table", "bbox": {"l": 149.70901, "t": 584.32663, "r": 480.59283000000005, "b": 593.1236, "coord_origin": "1"}}, {"id": 45, "text": "cells, serialised in row-major ordering, where each row and each column have", "bbox": {"l": 134.76501, "t": 596.28262, "r": 480.58978, "b": 605.07959, "coord_origin": "1"}}, {"id": 46, "text": "the same length (while considering row- and column-spans). Furthermore, every", "bbox": {"l": 134.76501, "t": 608.23763, "r": 480.5936899999999, "b": 617.03459, "coord_origin": "1"}}, {"id": 47, "text": "opening tag in HTML needs to be matched by a closing tag in a correct hierar-", "bbox": {"l": 134.76501, "t": 620.19263, "r": 480.59091, "b": 628.98959, "coord_origin": "1"}}, {"id": 48, "text": "chical manner. 
Since the number of tokens for each table row and column can", "bbox": {"l": 134.76501, "t": 632.1476299999999, "r": 480.58582, "b": 640.9446, "coord_origin": "1"}}, {"id": 49, "text": "vary significantly, especially for large tables with many row- and column-spans,", "bbox": {"l": 134.76501, "t": 644.10263, "r": 480.59180000000003, "b": 652.8996, "coord_origin": "1"}}, {"id": 50, "text": "it is complex to verify the consistency of predicted structures during sequence", "bbox": {"l": 134.76501, "t": 656.05763, "r": 480.59473, "b": 664.85461, "coord_origin": "1"}}]}, {"id": 7, "label": "Picture", "bbox": {"l": 137.53746843338013, "t": 229.03011989593506, "r": 476.1513336181641, "b": 339.5847587585449, "coord_origin": "1"}, "confidence": 0.9503862857818604, "cells": []}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 4, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 194.02211236953735, "t": 93.09380578994751, "r": 447.54291000000006, "b": 102.16611814498901, "coord_origin": "1"}, "confidence": 0.9483410120010376, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 4, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 475.13187446594236, "t": 93.52824125289919, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8685489892959595, "cells": [{"id": 1, "text": "5", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "5"}, {"label": "Text", "id": 2, "page_no": 4, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.90025739669798, "t": 
118.06202430725102, "r": 480.78725509643556, "b": 187.50684, "coord_origin": "1"}, "confidence": 0.9866952896118164, "cells": [{"id": 2, "text": "ulary and can be interpreted as a table structure. For example, with the HTML", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.58577999999994, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "tokens", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 162.48494, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "<table>", "bbox": {"l": 166.368, "t": 130.88878999999997, "r": 201.74918, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": ",", "bbox": {"l": 201.74899, "t": 130.88878999999997, "r": 204.51561, "b": 139.68579, "coord_origin": "1"}}, {"id": 6, "text": "</table>", "bbox": {"l": 208.39699, "t": 130.88878999999997, "r": 248.86904999999996, "b": 139.68579, "coord_origin": "1"}}, {"id": 7, "text": ",", "bbox": {"l": 248.86899, "t": 130.88878999999997, "r": 251.6356, "b": 139.68579, "coord_origin": "1"}}, {"id": 8, "text": "<tr>", "bbox": {"l": 255.51698, "t": 130.88878999999997, "r": 278.29846, "b": 139.68579, "coord_origin": "1"}}, {"id": 9, "text": ",", "bbox": {"l": 278.29797, "t": 130.88878999999997, "r": 281.06458, "b": 139.68579, "coord_origin": "1"}}, {"id": 10, "text": "</tr>", "bbox": {"l": 284.94598, "t": 130.88878999999997, "r": 312.81836, "b": 139.68579, "coord_origin": "1"}}, {"id": 11, "text": ",", "bbox": {"l": 312.81799, "t": 130.88878999999997, "r": 315.58459, "b": 139.68579, "coord_origin": "1"}}, {"id": 12, "text": "<td>", "bbox": {"l": 319.466, "t": 130.88878999999997, "r": 343.13812, "b": 139.68579, "coord_origin": "1"}}, {"id": 13, "text": "and", "bbox": {"l": 347.13202, "t": 130.88878999999997, "r": 363.17877, "b": 139.68579, "coord_origin": "1"}}, {"id": 14, "text": "</td>", "bbox": {"l": 367.06003, "t": 130.88878999999997, "r": 395.82306, "b": 139.68579, "coord_origin": "1"}}, {"id": 15, "text": ", one can construct", "bbox": {"l": 395.82303, "t": 
130.88878999999997, "r": 480.59177000000005, "b": 139.68579, "coord_origin": "1"}}, {"id": 16, "text": "simple table structures without any spanning cells. In reality though, one needs", "bbox": {"l": 134.76501, "t": 142.84479, "r": 480.59365999999994, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 17, "text": "at least 28 HTML tokens to describe the most common complex tables observed", "bbox": {"l": 134.76501, "t": 154.7998, "r": 480.58577999999994, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 18, "text": "in real-world documents [21,22], due to a variety of spanning cells definitions in", "bbox": {"l": 134.76501, "t": 166.75482, "r": 480.59378, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 19, "text": "the HTML token vocabulary.", "bbox": {"l": 134.76501, "t": 178.70983999999999, "r": 261.92566, "b": 187.50684, "coord_origin": "1"}}]}, "text": "ulary and can be interpreted as a table structure. For example, with the HTML tokens <table> , </table> , <tr> , </tr> , <td> and </td> , one can construct simple table structures without any spanning cells. In reality though, one needs at least 28 HTML tokens to describe the most common complex tables observed in real-world documents [21,22], due to a variety of spanning cells definitions in the HTML token vocabulary."}, {"label": "Caption", "id": 3, "page_no": 4, "cluster": {"id": 3, "label": "Caption", "bbox": {"l": 145.19676303863525, "t": 220.18719520568845, "r": 469.75223000000005, "b": 229.42055854797366, "coord_origin": "1"}, "confidence": 0.8952732682228088, "cells": [{"id": 20, "text": "Fig. 2.", "bbox": {"l": 145.60701, "t": 221.07928000000004, "r": 173.48625, "b": 229.00562000000002, "coord_origin": "1"}}, {"id": 21, "text": "Frequency of tokens in HTML and OTSL as they appear in PubTabNet.", "bbox": {"l": 176.56001, "t": 221.14209000000005, "r": 469.75223000000005, "b": 229.21178999999995, "coord_origin": "1"}}]}, "text": "Fig. 2. 
Frequency of tokens in HTML and OTSL as they appear in PubTabNet."}, {"label": "Text", "id": 4, "page_no": 4, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.70604829788206, "t": 367.1275177001953, "r": 480.62745208740233, "b": 532.42059, "coord_origin": "1"}, "confidence": 0.9863564968109131, "cells": [{"id": 22, "text": "Obviously, HTML and other general-purpose markup languages were not de-", "bbox": {"l": 149.709, "t": 368.20679, "r": 480.59283000000005, "b": 377.00375, "coord_origin": "1"}}, {"id": 23, "text": "signed for Im2Seq models. As such, they have some serious drawbacks. First, the", "bbox": {"l": 134.765, "t": 380.16177, "r": 480.58664, "b": 388.9587399999999, "coord_origin": "1"}}, {"id": 24, "text": "token vocabulary needs to be artificially large in order to describe all plausible", "bbox": {"l": 134.765, "t": 392.11676, "r": 480.59180000000003, "b": 400.91373, "coord_origin": "1"}}, {"id": 25, "text": "tabular structures. Since most Im2Seq models use an autoregressive approach,", "bbox": {"l": 134.765, "t": 404.07175, "r": 480.5897499999999, "b": 412.86871, "coord_origin": "1"}}, {"id": 26, "text": "they generate the sequence token by token. Therefore, to reduce inference time,", "bbox": {"l": 134.765, "t": 416.02774, "r": 480.58871, "b": 424.82471, "coord_origin": "1"}}, {"id": 27, "text": "a shorter sequence length is critical. 
Every table-cell is represented by at least", "bbox": {"l": 134.765, "t": 427.98273, "r": 480.59265, "b": 436.77969, "coord_origin": "1"}}, {"id": 28, "text": "two tokens (", "bbox": {"l": 134.765, "t": 439.9377099999999, "r": 187.93439, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 29, "text": "<td>", "bbox": {"l": 187.931, "t": 439.9377099999999, "r": 211.60313, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 30, "text": "and", "bbox": {"l": 214.75400000000002, "t": 439.9377099999999, "r": 230.80075000000002, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 31, "text": "</td>", "bbox": {"l": 233.83898999999997, "t": 439.9377099999999, "r": 262.60202, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 32, "text": "). Furthermore, when tokenizing the HTML struc-", "bbox": {"l": 262.716, "t": 439.9377099999999, "r": 480.59009, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 33, "text": "ture, one needs to explicitly enumerate possible column-spans and row-spans", "bbox": {"l": 134.76501, "t": 451.8927, "r": 480.58777, "b": 460.68967, "coord_origin": "1"}}, {"id": 34, "text": "as words. In practice, this ends up requiring 28 different HTML tokens (when", "bbox": {"l": 134.76501, "t": 463.84769, "r": 480.58681999999993, "b": 472.64465, "coord_origin": "1"}}, {"id": 35, "text": "including column- and row-spans up to 10 cells) just to describe every table in", "bbox": {"l": 134.76501, "t": 475.80368, "r": 480.58681999999993, "b": 484.60065, "coord_origin": "1"}}, {"id": 36, "text": "the PubTabNet dataset. Clearly, not every token is equally represented, as is", "bbox": {"l": 134.76501, "t": 487.75867, "r": 480.59067, "b": 496.55563, "coord_origin": "1"}}, {"id": 37, "text": "depicted in Figure 2. 
This skewed distribution of tokens in combination with", "bbox": {"l": 134.76501, "t": 499.71365, "r": 480.59277, "b": 508.51062, "coord_origin": "1"}}, {"id": 38, "text": "variable token row-length makes it challenging for models to learn the HTML", "bbox": {"l": 134.76501, "t": 511.66864, "r": 480.59476, "b": 520.46561, "coord_origin": "1"}}, {"id": 39, "text": "structure.", "bbox": {"l": 134.76501, "t": 523.62363, "r": 176.92873, "b": 532.42059, "coord_origin": "1"}}]}, "text": "Obviously, HTML and other general-purpose markup languages were not designed for Im2Seq models. As such, they have some serious drawbacks. First, the token vocabulary needs to be artificially large in order to describe all plausible tabular structures. Since most Im2Seq models use an autoregressive approach, they generate the sequence token by token. Therefore, to reduce inference time, a shorter sequence length is critical. Every table-cell is represented by at least two tokens ( <td> and </td> ). Furthermore, when tokenizing the HTML structure, one needs to explicitly enumerate possible column-spans and row-spans as words. In practice, this ends up requiring 28 different HTML tokens (when including column- and row-spans up to 10 cells) just to describe every table in the PubTabNet dataset. Clearly, not every token is equally represented, as is depicted in Figure 2. 
This skewed distribution of tokens in combination with variable token row-length makes it challenging for models to learn the HTML structure."}, {"label": "Text", "id": 5, "page_no": 4, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.89939651489257, "t": 534.8984985351562, "r": 480.59289999999993, "b": 581.5316436767579, "coord_origin": "1"}, "confidence": 0.9800698161125183, "cells": [{"id": 40, "text": "Additionally, it would be desirable if the representation would easily allow", "bbox": {"l": 149.70901, "t": 536.04263, "r": 480.59289999999993, "b": 544.8396, "coord_origin": "1"}}, {"id": 41, "text": "an early detection of invalid sequences on-the-go, before the prediction of the", "bbox": {"l": 134.76501, "t": 547.99763, "r": 480.59085, "b": 556.7946000000001, "coord_origin": "1"}}, {"id": 42, "text": "entire table structure is completed. HTML is not well-suited for this purpose as", "bbox": {"l": 134.76501, "t": 559.95264, "r": 480.58984, "b": 568.7496, "coord_origin": "1"}}, {"id": 43, "text": "the verification of incomplete sequences is non-trivial or even impossible.", "bbox": {"l": 134.76501, "t": 571.90863, "r": 452.18933, "b": 580.7056, "coord_origin": "1"}}]}, "text": "Additionally, it would be desirable if the representation would easily allow an early detection of invalid sequences on-the-go, before the prediction of the entire table structure is completed. 
HTML is not well-suited for this purpose as the verification of incomplete sequences is non-trivial or even impossible."}, {"label": "Text", "id": 6, "page_no": 4, "cluster": {"id": 6, "label": "Text", "bbox": {"l": 133.75929164886475, "t": 583.1087310791016, "r": 480.59473, "b": 665.1034538269042, "coord_origin": "1"}, "confidence": 0.9856952428817749, "cells": [{"id": 44, "text": "In a valid HTML table, the token sequence must describe a 2D grid of table", "bbox": {"l": 149.70901, "t": 584.32663, "r": 480.59283000000005, "b": 593.1236, "coord_origin": "1"}}, {"id": 45, "text": "cells, serialised in row-major ordering, where each row and each column have", "bbox": {"l": 134.76501, "t": 596.28262, "r": 480.58978, "b": 605.07959, "coord_origin": "1"}}, {"id": 46, "text": "the same length (while considering row- and column-spans). Furthermore, every", "bbox": {"l": 134.76501, "t": 608.23763, "r": 480.5936899999999, "b": 617.03459, "coord_origin": "1"}}, {"id": 47, "text": "opening tag in HTML needs to be matched by a closing tag in a correct hierar-", "bbox": {"l": 134.76501, "t": 620.19263, "r": 480.59091, "b": 628.98959, "coord_origin": "1"}}, {"id": 48, "text": "chical manner. Since the number of tokens for each table row and column can", "bbox": {"l": 134.76501, "t": 632.1476299999999, "r": 480.58582, "b": 640.9446, "coord_origin": "1"}}, {"id": 49, "text": "vary significantly, especially for large tables with many row- and column-spans,", "bbox": {"l": 134.76501, "t": 644.10263, "r": 480.59180000000003, "b": 652.8996, "coord_origin": "1"}}, {"id": 50, "text": "it is complex to verify the consistency of predicted structures during sequence", "bbox": {"l": 134.76501, "t": 656.05763, "r": 480.59473, "b": 664.85461, "coord_origin": "1"}}]}, "text": "In a valid HTML table, the token sequence must describe a 2D grid of table cells, serialised in row-major ordering, where each row and each column have the same length (while considering row- and column-spans). 
Furthermore, every opening tag in HTML needs to be matched by a closing tag in a correct hierarchical manner. Since the number of tokens for each table row and column can vary significantly, especially for large tables with many row- and column-spans, it is complex to verify the consistency of predicted structures during sequence"}, {"label": "Picture", "id": 7, "page_no": 4, "cluster": {"id": 7, "label": "Picture", "bbox": {"l": 137.53746843338013, "t": 229.03011989593506, "r": 476.1513336181641, "b": 339.5847587585449, "coord_origin": "1"}, "confidence": 0.9503862857818604, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "body": [{"label": "Text", "id": 2, "page_no": 4, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.90025739669798, "t": 118.06202430725102, "r": 480.78725509643556, "b": 187.50684, "coord_origin": "1"}, "confidence": 0.9866952896118164, "cells": [{"id": 2, "text": "ulary and can be interpreted as a table structure. 
For example, with the HTML", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.58577999999994, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "tokens", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 162.48494, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "<table>", "bbox": {"l": 166.368, "t": 130.88878999999997, "r": 201.74918, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": ",", "bbox": {"l": 201.74899, "t": 130.88878999999997, "r": 204.51561, "b": 139.68579, "coord_origin": "1"}}, {"id": 6, "text": "</table>", "bbox": {"l": 208.39699, "t": 130.88878999999997, "r": 248.86904999999996, "b": 139.68579, "coord_origin": "1"}}, {"id": 7, "text": ",", "bbox": {"l": 248.86899, "t": 130.88878999999997, "r": 251.6356, "b": 139.68579, "coord_origin": "1"}}, {"id": 8, "text": "<tr>", "bbox": {"l": 255.51698, "t": 130.88878999999997, "r": 278.29846, "b": 139.68579, "coord_origin": "1"}}, {"id": 9, "text": ",", "bbox": {"l": 278.29797, "t": 130.88878999999997, "r": 281.06458, "b": 139.68579, "coord_origin": "1"}}, {"id": 10, "text": "</tr>", "bbox": {"l": 284.94598, "t": 130.88878999999997, "r": 312.81836, "b": 139.68579, "coord_origin": "1"}}, {"id": 11, "text": ",", "bbox": {"l": 312.81799, "t": 130.88878999999997, "r": 315.58459, "b": 139.68579, "coord_origin": "1"}}, {"id": 12, "text": "<td>", "bbox": {"l": 319.466, "t": 130.88878999999997, "r": 343.13812, "b": 139.68579, "coord_origin": "1"}}, {"id": 13, "text": "and", "bbox": {"l": 347.13202, "t": 130.88878999999997, "r": 363.17877, "b": 139.68579, "coord_origin": "1"}}, {"id": 14, "text": "</td>", "bbox": {"l": 367.06003, "t": 130.88878999999997, "r": 395.82306, "b": 139.68579, "coord_origin": "1"}}, {"id": 15, "text": ", one can construct", "bbox": {"l": 395.82303, "t": 130.88878999999997, "r": 480.59177000000005, "b": 139.68579, "coord_origin": "1"}}, {"id": 16, "text": "simple table structures without any spanning cells. 
In reality though, one needs", "bbox": {"l": 134.76501, "t": 142.84479, "r": 480.59365999999994, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 17, "text": "at least 28 HTML tokens to describe the most common complex tables observed", "bbox": {"l": 134.76501, "t": 154.7998, "r": 480.58577999999994, "b": 163.59680000000003, "coord_origin": "1"}}, {"id": 18, "text": "in real-world documents [21,22], due to a variety of spanning cells definitions in", "bbox": {"l": 134.76501, "t": 166.75482, "r": 480.59378, "b": 175.55182000000002, "coord_origin": "1"}}, {"id": 19, "text": "the HTML token vocabulary.", "bbox": {"l": 134.76501, "t": 178.70983999999999, "r": 261.92566, "b": 187.50684, "coord_origin": "1"}}]}, "text": "ulary and can be interpreted as a table structure. For example, with the HTML tokens <table> , </table> , <tr> , </tr> , <td> and </td> , one can construct simple table structures without any spanning cells. In reality though, one needs at least 28 HTML tokens to describe the most common complex tables observed in real-world documents [21,22], due to a variety of spanning cells definitions in the HTML token vocabulary."}, {"label": "Caption", "id": 3, "page_no": 4, "cluster": {"id": 3, "label": "Caption", "bbox": {"l": 145.19676303863525, "t": 220.18719520568845, "r": 469.75223000000005, "b": 229.42055854797366, "coord_origin": "1"}, "confidence": 0.8952732682228088, "cells": [{"id": 20, "text": "Fig. 2.", "bbox": {"l": 145.60701, "t": 221.07928000000004, "r": 173.48625, "b": 229.00562000000002, "coord_origin": "1"}}, {"id": 21, "text": "Frequency of tokens in HTML and OTSL as they appear in PubTabNet.", "bbox": {"l": 176.56001, "t": 221.14209000000005, "r": 469.75223000000005, "b": 229.21178999999995, "coord_origin": "1"}}]}, "text": "Fig. 2. 
Frequency of tokens in HTML and OTSL as they appear in PubTabNet."}, {"label": "Text", "id": 4, "page_no": 4, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.70604829788206, "t": 367.1275177001953, "r": 480.62745208740233, "b": 532.42059, "coord_origin": "1"}, "confidence": 0.9863564968109131, "cells": [{"id": 22, "text": "Obviously, HTML and other general-purpose markup languages were not de-", "bbox": {"l": 149.709, "t": 368.20679, "r": 480.59283000000005, "b": 377.00375, "coord_origin": "1"}}, {"id": 23, "text": "signed for Im2Seq models. As such, they have some serious drawbacks. First, the", "bbox": {"l": 134.765, "t": 380.16177, "r": 480.58664, "b": 388.9587399999999, "coord_origin": "1"}}, {"id": 24, "text": "token vocabulary needs to be artificially large in order to describe all plausible", "bbox": {"l": 134.765, "t": 392.11676, "r": 480.59180000000003, "b": 400.91373, "coord_origin": "1"}}, {"id": 25, "text": "tabular structures. Since most Im2Seq models use an autoregressive approach,", "bbox": {"l": 134.765, "t": 404.07175, "r": 480.5897499999999, "b": 412.86871, "coord_origin": "1"}}, {"id": 26, "text": "they generate the sequence token by token. Therefore, to reduce inference time,", "bbox": {"l": 134.765, "t": 416.02774, "r": 480.58871, "b": 424.82471, "coord_origin": "1"}}, {"id": 27, "text": "a shorter sequence length is critical. 
Every table-cell is represented by at least", "bbox": {"l": 134.765, "t": 427.98273, "r": 480.59265, "b": 436.77969, "coord_origin": "1"}}, {"id": 28, "text": "two tokens (", "bbox": {"l": 134.765, "t": 439.9377099999999, "r": 187.93439, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 29, "text": "<td>", "bbox": {"l": 187.931, "t": 439.9377099999999, "r": 211.60313, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 30, "text": "and", "bbox": {"l": 214.75400000000002, "t": 439.9377099999999, "r": 230.80075000000002, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 31, "text": "</td>", "bbox": {"l": 233.83898999999997, "t": 439.9377099999999, "r": 262.60202, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 32, "text": "). Furthermore, when tokenizing the HTML struc-", "bbox": {"l": 262.716, "t": 439.9377099999999, "r": 480.59009, "b": 448.73467999999997, "coord_origin": "1"}}, {"id": 33, "text": "ture, one needs to explicitly enumerate possible column-spans and row-spans", "bbox": {"l": 134.76501, "t": 451.8927, "r": 480.58777, "b": 460.68967, "coord_origin": "1"}}, {"id": 34, "text": "as words. In practice, this ends up requiring 28 different HTML tokens (when", "bbox": {"l": 134.76501, "t": 463.84769, "r": 480.58681999999993, "b": 472.64465, "coord_origin": "1"}}, {"id": 35, "text": "including column- and row-spans up to 10 cells) just to describe every table in", "bbox": {"l": 134.76501, "t": 475.80368, "r": 480.58681999999993, "b": 484.60065, "coord_origin": "1"}}, {"id": 36, "text": "the PubTabNet dataset. Clearly, not every token is equally represented, as is", "bbox": {"l": 134.76501, "t": 487.75867, "r": 480.59067, "b": 496.55563, "coord_origin": "1"}}, {"id": 37, "text": "depicted in Figure 2. 
This skewed distribution of tokens in combination with", "bbox": {"l": 134.76501, "t": 499.71365, "r": 480.59277, "b": 508.51062, "coord_origin": "1"}}, {"id": 38, "text": "variable token row-length makes it challenging for models to learn the HTML", "bbox": {"l": 134.76501, "t": 511.66864, "r": 480.59476, "b": 520.46561, "coord_origin": "1"}}, {"id": 39, "text": "structure.", "bbox": {"l": 134.76501, "t": 523.62363, "r": 176.92873, "b": 532.42059, "coord_origin": "1"}}]}, "text": "Obviously, HTML and other general-purpose markup languages were not designed for Im2Seq models. As such, they have some serious drawbacks. First, the token vocabulary needs to be artificially large in order to describe all plausible tabular structures. Since most Im2Seq models use an autoregressive approach, they generate the sequence token by token. Therefore, to reduce inference time, a shorter sequence length is critical. Every table-cell is represented by at least two tokens ( <td> and </td> ). Furthermore, when tokenizing the HTML structure, one needs to explicitly enumerate possible column-spans and row-spans as words. In practice, this ends up requiring 28 different HTML tokens (when including column- and row-spans up to 10 cells) just to describe every table in the PubTabNet dataset. Clearly, not every token is equally represented, as is depicted in Figure 2. 
This skewed distribution of tokens in combination with variable token row-length makes it challenging for models to learn the HTML structure."}, {"label": "Text", "id": 5, "page_no": 4, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.89939651489257, "t": 534.8984985351562, "r": 480.59289999999993, "b": 581.5316436767579, "coord_origin": "1"}, "confidence": 0.9800698161125183, "cells": [{"id": 40, "text": "Additionally, it would be desirable if the representation would easily allow", "bbox": {"l": 149.70901, "t": 536.04263, "r": 480.59289999999993, "b": 544.8396, "coord_origin": "1"}}, {"id": 41, "text": "an early detection of invalid sequences on-the-go, before the prediction of the", "bbox": {"l": 134.76501, "t": 547.99763, "r": 480.59085, "b": 556.7946000000001, "coord_origin": "1"}}, {"id": 42, "text": "entire table structure is completed. HTML is not well-suited for this purpose as", "bbox": {"l": 134.76501, "t": 559.95264, "r": 480.58984, "b": 568.7496, "coord_origin": "1"}}, {"id": 43, "text": "the verification of incomplete sequences is non-trivial or even impossible.", "bbox": {"l": 134.76501, "t": 571.90863, "r": 452.18933, "b": 580.7056, "coord_origin": "1"}}]}, "text": "Additionally, it would be desirable if the representation would easily allow an early detection of invalid sequences on-the-go, before the prediction of the entire table structure is completed. 
HTML is not well-suited for this purpose as the verification of incomplete sequences is non-trivial or even impossible."}, {"label": "Text", "id": 6, "page_no": 4, "cluster": {"id": 6, "label": "Text", "bbox": {"l": 133.75929164886475, "t": 583.1087310791016, "r": 480.59473, "b": 665.1034538269042, "coord_origin": "1"}, "confidence": 0.9856952428817749, "cells": [{"id": 44, "text": "In a valid HTML table, the token sequence must describe a 2D grid of table", "bbox": {"l": 149.70901, "t": 584.32663, "r": 480.59283000000005, "b": 593.1236, "coord_origin": "1"}}, {"id": 45, "text": "cells, serialised in row-major ordering, where each row and each column have", "bbox": {"l": 134.76501, "t": 596.28262, "r": 480.58978, "b": 605.07959, "coord_origin": "1"}}, {"id": 46, "text": "the same length (while considering row- and column-spans). Furthermore, every", "bbox": {"l": 134.76501, "t": 608.23763, "r": 480.5936899999999, "b": 617.03459, "coord_origin": "1"}}, {"id": 47, "text": "opening tag in HTML needs to be matched by a closing tag in a correct hierar-", "bbox": {"l": 134.76501, "t": 620.19263, "r": 480.59091, "b": 628.98959, "coord_origin": "1"}}, {"id": 48, "text": "chical manner. Since the number of tokens for each table row and column can", "bbox": {"l": 134.76501, "t": 632.1476299999999, "r": 480.58582, "b": 640.9446, "coord_origin": "1"}}, {"id": 49, "text": "vary significantly, especially for large tables with many row- and column-spans,", "bbox": {"l": 134.76501, "t": 644.10263, "r": 480.59180000000003, "b": 652.8996, "coord_origin": "1"}}, {"id": 50, "text": "it is complex to verify the consistency of predicted structures during sequence", "bbox": {"l": 134.76501, "t": 656.05763, "r": 480.59473, "b": 664.85461, "coord_origin": "1"}}]}, "text": "In a valid HTML table, the token sequence must describe a 2D grid of table cells, serialised in row-major ordering, where each row and each column have the same length (while considering row- and column-spans). 
Furthermore, every opening tag in HTML needs to be matched by a closing tag in a correct hierarchical manner. Since the number of tokens for each table row and column can vary significantly, especially for large tables with many row- and column-spans, it is complex to verify the consistency of predicted structures during sequence"}, {"label": "Picture", "id": 7, "page_no": 4, "cluster": {"id": 7, "label": "Picture", "bbox": {"l": 137.53746843338013, "t": 229.03011989593506, "r": 476.1513336181641, "b": 339.5847587585449, "coord_origin": "1"}, "confidence": 0.9503862857818604, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "headers": [{"label": "Page-header", "id": 0, "page_no": 4, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 194.02211236953735, "t": 93.09380578994751, "r": 447.54291000000006, "b": 102.16611814498901, "coord_origin": "1"}, "confidence": 0.9483410120010376, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 4, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 475.13187446594236, "t": 93.52824125289919, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8685489892959595, "cells": [{"id": 1, "text": "5", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "5"}]}}, {"page_no": 5, "page_hash": "eaaaaebf96b567c9bd5696b2dd4d747b3b3ad40e15ca8dc8968c56060315f228", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "6", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 
167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 3, "text": "generation. Implicitly, this also means that Im2Seq models need to learn these", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.59479, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "complex syntax rules, simply to deliver valid output.", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 364.62503, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": "In practice, we observe two major issues with prediction quality when train-", "bbox": {"l": 149.709, "t": 143.48279000000002, "r": 480.58981, "b": 152.27979000000005, "coord_origin": "1"}}, {"id": 6, "text": "ing Im2Seq models on HTML table structure generation from images. On the", "bbox": {"l": 134.765, "t": 155.43781, "r": 480.59378, "b": 164.23479999999995, "coord_origin": "1"}}, {"id": 7, "text": "one hand, we find that on large tables, the visual attention of the model often", "bbox": {"l": 134.765, "t": 167.39282000000003, "r": 480.5867, "b": 176.18982000000005, "coord_origin": "1"}}, {"id": 8, "text": "starts to drift and is not accurately moving forward cell by cell anymore. 
This", "bbox": {"l": 134.765, "t": 179.34784000000002, "r": 480.59476, "b": 188.14484000000004, "coord_origin": "1"}}, {"id": 9, "text": "manifests itself in either in an increasing", "bbox": {"l": 134.765, "t": 191.30286, "r": 314.27805, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 10, "text": "location drift", "bbox": {"l": 318.056, "t": 191.30286, "r": 374.08664, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 11, "text": "for proposed table-cells", "bbox": {"l": 378.80899, "t": 191.30286, "r": 480.58594, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 12, "text": "in later rows on the same column or even complete loss of vertical alignment, as", "bbox": {"l": 134.76498, "t": 203.25885000000005, "r": 480.58771, "b": 212.05584999999996, "coord_origin": "1"}}, {"id": 13, "text": "illustrated in Figure 5. Addressing this with post-processing is partially possible,", "bbox": {"l": 134.76498, "t": 215.21387000000004, "r": 480.59569999999997, "b": 224.01085999999998, "coord_origin": "1"}}, {"id": 14, "text": "but clearly undesired. On the other hand, we find many instances of predictions", "bbox": {"l": 134.76498, "t": 227.16887999999994, "r": 480.59454, "b": 235.96587999999997, "coord_origin": "1"}}, {"id": 15, "text": "with structural inconsistencies or plain invalid HTML output, as shown in Fig-", "bbox": {"l": 134.76498, "t": 239.12390000000005, "r": 480.58759000000003, "b": 247.92089999999996, "coord_origin": "1"}}, {"id": 16, "text": "ure 6, which are nearly impossible to properly correct. 
Both problems seriously", "bbox": {"l": 134.76498, "t": 251.07892000000004, "r": 480.59277, "b": 259.87591999999995, "coord_origin": "1"}}, {"id": 17, "text": "impact the TSR model performance, since they reflect not only in the task of", "bbox": {"l": 134.76498, "t": 263.03394000000003, "r": 480.59463999999997, "b": 271.83092999999997, "coord_origin": "1"}}, {"id": 18, "text": "pure structure recognition but also in the equally crucial recognition or matching", "bbox": {"l": 134.76498, "t": 274.98992999999996, "r": 480.58978, "b": 283.78693, "coord_origin": "1"}}, {"id": 19, "text": "of table cell content.", "bbox": {"l": 134.76498, "t": 286.94495, "r": 223.57262, "b": 295.74191, "coord_origin": "1"}}, {"id": 20, "text": "4", "bbox": {"l": 134.76498, "t": 320.6311, "r": 141.48859, "b": 331.19949, "coord_origin": "1"}}, {"id": 21, "text": "Optimised Table Structure Language", "bbox": {"l": 154.93819, "t": 320.6311, "r": 372.50848, "b": 331.19949, "coord_origin": "1"}}, {"id": 22, "text": "To mitigate the issues with HTML in Im2Seq-based TSR models laid out before,", "bbox": {"l": 134.76498, "t": 349.11697, "r": 480.59075999999993, "b": 357.91394, "coord_origin": "1"}}, {"id": 23, "text": "we propose here our Optimised Table Structure Language (OTSL). OTSL is", "bbox": {"l": 134.76498, "t": 361.07196000000005, "r": 480.58875, "b": 369.86893, "coord_origin": "1"}}, {"id": 24, "text": "designed to express table structure with a minimized vocabulary and a simple", "bbox": {"l": 134.76498, "t": 373.02795, "r": 480.58681999999993, "b": 381.82492, "coord_origin": "1"}}, {"id": 25, "text": "set of rules, which are both significantly reduced compared to HTML. 
At the", "bbox": {"l": 134.76498, "t": 384.98294, "r": 480.58875, "b": 393.77991, "coord_origin": "1"}}, {"id": 26, "text": "same time, OTSL enables easy error detection and correction during sequence", "bbox": {"l": 134.76498, "t": 396.93793, "r": 480.58978, "b": 405.73489, "coord_origin": "1"}}, {"id": 27, "text": "generation. We further demonstrate how the compact structure representation", "bbox": {"l": 134.76498, "t": 408.89291, "r": 480.59473, "b": 417.68988, "coord_origin": "1"}}, {"id": 28, "text": "and minimized sequence length improves prediction accuracy and inference time", "bbox": {"l": 134.76498, "t": 420.8479, "r": 480.58868, "b": 429.64487, "coord_origin": "1"}}, {"id": 29, "text": "in the TableFormer architecture.", "bbox": {"l": 134.76498, "t": 432.80289, "r": 276.67325, "b": 441.59985, "coord_origin": "1"}}, {"id": 30, "text": "4.1", "bbox": {"l": 134.76498, "t": 465.87192, "r": 149.40204, "b": 474.67886, "coord_origin": "1"}}, {"id": 31, "text": "Language Definition", "bbox": {"l": 160.85902, "t": 465.87192, "r": 261.80109, "b": 474.67886, "coord_origin": "1"}}, {"id": 32, "text": "In Figure 3, we illustrate how the OTSL is defined. 
In essence, the OTSL defines", "bbox": {"l": 134.76498, "t": 488.99789, "r": 480.58871, "b": 497.79486, "coord_origin": "1"}}, {"id": 33, "text": "only 5 tokens that directly describe a tabular structure based on an atomic 2D", "bbox": {"l": 134.76498, "t": 500.95288, "r": 480.5867, "b": 509.74985, "coord_origin": "1"}}, {"id": 34, "text": "grid.", "bbox": {"l": 134.76498, "t": 512.90887, "r": 154.7131, "b": 521.7058400000001, "coord_origin": "1"}}, {"id": 35, "text": "The OTSL vocabulary is comprised of the following tokens:", "bbox": {"l": 149.70898, "t": 525.5018600000001, "r": 409.31137, "b": 534.29883, "coord_origin": "1"}}, {"id": 36, "text": "-", "bbox": {"l": 140.99298, "t": 547.96989, "r": 146.72047, "b": 556.77682, "coord_origin": "1"}}, {"id": 37, "text": "\"C\" cell -", "bbox": {"l": 151.70099, "t": 547.97986, "r": 193.20619, "b": 556.77682, "coord_origin": "1"}}, {"id": 38, "text": "a new table cell", "bbox": {"l": 196.52199, "t": 547.97986, "r": 263.46564, "b": 556.77682, "coord_origin": "1"}}, {"id": 39, "text": "that either has or does not have cell content", "bbox": {"l": 267.815, "t": 547.97986, "r": 460.54443, "b": 556.77682, "coord_origin": "1"}}, {"id": 40, "text": "-", "bbox": {"l": 140.99301, "t": 560.5629, "r": 146.7205, "b": 569.36983, "coord_origin": "1"}}, {"id": 41, "text": "\"L\" cell -", "bbox": {"l": 151.70102, "t": 560.57286, "r": 194.30011, "b": 569.36983, "coord_origin": "1"}}, {"id": 42, "text": "left-looking cell", "bbox": {"l": 198.65903, "t": 560.57286, "r": 264.51779, "b": 569.36983, "coord_origin": "1"}}, {"id": 43, "text": ", merging with the left neighbor cell to create a", "bbox": {"l": 264.51804, "t": 560.57286, "r": 480.59392999999994, "b": 569.36983, "coord_origin": "1"}}, {"id": 44, "text": "span", "bbox": {"l": 151.70103, "t": 572.52786, "r": 171.67604, "b": 581.32483, "coord_origin": "1"}}, {"id": 45, "text": "-", "bbox": {"l": 140.99304, "t": 585.11189, "r": 146.72054, "b": 593.91882, "coord_origin": "1"}}, {"id": 
46, "text": "\"U\" cell -", "bbox": {"l": 151.70105, "t": 585.12186, "r": 194.11086, "b": 593.91882, "coord_origin": "1"}}, {"id": 47, "text": "up-looking cell", "bbox": {"l": 197.74805, "t": 585.12186, "r": 259.89474, "b": 593.91882, "coord_origin": "1"}}, {"id": 48, "text": ", merging with the upper neighbor cell to create a", "bbox": {"l": 259.89206, "t": 585.12186, "r": 480.58856, "b": 593.91882, "coord_origin": "1"}}, {"id": 49, "text": "span", "bbox": {"l": 151.70105, "t": 597.07686, "r": 171.67606, "b": 605.87383, "coord_origin": "1"}}, {"id": 50, "text": "-", "bbox": {"l": 140.99304, "t": 609.6599, "r": 146.72054, "b": 618.46683, "coord_origin": "1"}}, {"id": 51, "text": "\"X\" cell -", "bbox": {"l": 151.70105, "t": 609.66986, "r": 193.48323, "b": 618.46683, "coord_origin": "1"}}, {"id": 52, "text": "cross cell", "bbox": {"l": 196.79904, "t": 609.66986, "r": 236.12042, "b": 618.46683, "coord_origin": "1"}}, {"id": 53, "text": ", to merge with both left and upper neighbor cells", "bbox": {"l": 236.12505, "t": 609.66986, "r": 454.55496, "b": 618.46683, "coord_origin": "1"}}, {"id": 54, "text": "-", "bbox": {"l": 140.99304, "t": 622.2538900000001, "r": 146.72054, "b": 631.06082, "coord_origin": "1"}}, {"id": 55, "text": "\"NL\" -", "bbox": {"l": 151.70105, "t": 622.26385, "r": 181.99434, "b": 631.06082, "coord_origin": "1"}}, {"id": 56, "text": "new-line", "bbox": {"l": 185.31705, "t": 622.26385, "r": 221.46236, "b": 631.06082, "coord_origin": "1"}}, {"id": 57, "text": ", switch to the next row.", "bbox": {"l": 221.46104, "t": 622.26385, "r": 328.61676, "b": 631.06082, "coord_origin": "1"}}, {"id": 58, "text": "A notable attribute of OTSL is that it has the capability of achieving lossless", "bbox": {"l": 149.70905, "t": 644.10286, "r": 480.59280000000007, "b": 652.8998300000001, "coord_origin": "1"}}, {"id": 59, "text": "conversion to HTML.", "bbox": {"l": 134.76505, "t": 656.05786, "r": 228.22321, "b": 664.85484, "coord_origin": "1"}}], "predictions": 
{"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 134.1282597541809, "t": 93.76585235595701, "r": 139.453120136261, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8473044633865356, "cells": [{"id": 0, "text": "6", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 167.2993927001953, "t": 93.00047779083252, "r": 231.72227, "b": 101.91809320449829, "coord_origin": "1"}, "confidence": 0.9042621850967407, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "Text", "bbox": {"l": 133.94254274368288, "t": 118.29494304656987, "r": 480.59479, "b": 140.69587211608882, "coord_origin": "1"}, "confidence": 0.9729993939399719, "cells": [{"id": 3, "text": "generation. Implicitly, this also means that Im2Seq models need to learn these", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.59479, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "complex syntax rules, simply to deliver valid output.", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 364.62503, "b": 139.68579, "coord_origin": "1"}}]}, {"id": 3, "label": "Text", "bbox": {"l": 133.6434519767761, "t": 142.55638961791988, "r": 480.59569999999997, "b": 295.74191, "coord_origin": "1"}, "confidence": 0.9872115850448608, "cells": [{"id": 5, "text": "In practice, we observe two major issues with prediction quality when train-", "bbox": {"l": 149.709, "t": 143.48279000000002, "r": 480.58981, "b": 152.27979000000005, "coord_origin": "1"}}, {"id": 6, "text": "ing Im2Seq models on HTML table structure generation from images. 
On the", "bbox": {"l": 134.765, "t": 155.43781, "r": 480.59378, "b": 164.23479999999995, "coord_origin": "1"}}, {"id": 7, "text": "one hand, we find that on large tables, the visual attention of the model often", "bbox": {"l": 134.765, "t": 167.39282000000003, "r": 480.5867, "b": 176.18982000000005, "coord_origin": "1"}}, {"id": 8, "text": "starts to drift and is not accurately moving forward cell by cell anymore. This", "bbox": {"l": 134.765, "t": 179.34784000000002, "r": 480.59476, "b": 188.14484000000004, "coord_origin": "1"}}, {"id": 9, "text": "manifests itself in either in an increasing", "bbox": {"l": 134.765, "t": 191.30286, "r": 314.27805, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 10, "text": "location drift", "bbox": {"l": 318.056, "t": 191.30286, "r": 374.08664, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 11, "text": "for proposed table-cells", "bbox": {"l": 378.80899, "t": 191.30286, "r": 480.58594, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 12, "text": "in later rows on the same column or even complete loss of vertical alignment, as", "bbox": {"l": 134.76498, "t": 203.25885000000005, "r": 480.58771, "b": 212.05584999999996, "coord_origin": "1"}}, {"id": 13, "text": "illustrated in Figure 5. Addressing this with post-processing is partially possible,", "bbox": {"l": 134.76498, "t": 215.21387000000004, "r": 480.59569999999997, "b": 224.01085999999998, "coord_origin": "1"}}, {"id": 14, "text": "but clearly undesired. On the other hand, we find many instances of predictions", "bbox": {"l": 134.76498, "t": 227.16887999999994, "r": 480.59454, "b": 235.96587999999997, "coord_origin": "1"}}, {"id": 15, "text": "with structural inconsistencies or plain invalid HTML output, as shown in Fig-", "bbox": {"l": 134.76498, "t": 239.12390000000005, "r": 480.58759000000003, "b": 247.92089999999996, "coord_origin": "1"}}, {"id": 16, "text": "ure 6, which are nearly impossible to properly correct. 
Both problems seriously", "bbox": {"l": 134.76498, "t": 251.07892000000004, "r": 480.59277, "b": 259.87591999999995, "coord_origin": "1"}}, {"id": 17, "text": "impact the TSR model performance, since they reflect not only in the task of", "bbox": {"l": 134.76498, "t": 263.03394000000003, "r": 480.59463999999997, "b": 271.83092999999997, "coord_origin": "1"}}, {"id": 18, "text": "pure structure recognition but also in the equally crucial recognition or matching", "bbox": {"l": 134.76498, "t": 274.98992999999996, "r": 480.58978, "b": 283.78693, "coord_origin": "1"}}, {"id": 19, "text": "of table cell content.", "bbox": {"l": 134.76498, "t": 286.94495, "r": 223.57262, "b": 295.74191, "coord_origin": "1"}}]}, {"id": 4, "label": "Section-header", "bbox": {"l": 134.07444734573366, "t": 319.69540557861325, "r": 372.50848, "b": 331.54225502014157, "coord_origin": "1"}, "confidence": 0.9534726738929749, "cells": [{"id": 20, "text": "4", "bbox": {"l": 134.76498, "t": 320.6311, "r": 141.48859, "b": 331.19949, "coord_origin": "1"}}, {"id": 21, "text": "Optimised Table Structure Language", "bbox": {"l": 154.93819, "t": 320.6311, "r": 372.50848, "b": 331.19949, "coord_origin": "1"}}]}, {"id": 5, "label": "Text", "bbox": {"l": 133.82858533859255, "t": 348.3478282928467, "r": 480.59473, "b": 441.59985, "coord_origin": "1"}, "confidence": 0.9874445796012878, "cells": [{"id": 22, "text": "To mitigate the issues with HTML in Im2Seq-based TSR models laid out before,", "bbox": {"l": 134.76498, "t": 349.11697, "r": 480.59075999999993, "b": 357.91394, "coord_origin": "1"}}, {"id": 23, "text": "we propose here our Optimised Table Structure Language (OTSL). 
OTSL is", "bbox": {"l": 134.76498, "t": 361.07196000000005, "r": 480.58875, "b": 369.86893, "coord_origin": "1"}}, {"id": 24, "text": "designed to express table structure with a minimized vocabulary and a simple", "bbox": {"l": 134.76498, "t": 373.02795, "r": 480.58681999999993, "b": 381.82492, "coord_origin": "1"}}, {"id": 25, "text": "set of rules, which are both significantly reduced compared to HTML. At the", "bbox": {"l": 134.76498, "t": 384.98294, "r": 480.58875, "b": 393.77991, "coord_origin": "1"}}, {"id": 26, "text": "same time, OTSL enables easy error detection and correction during sequence", "bbox": {"l": 134.76498, "t": 396.93793, "r": 480.58978, "b": 405.73489, "coord_origin": "1"}}, {"id": 27, "text": "generation. We further demonstrate how the compact structure representation", "bbox": {"l": 134.76498, "t": 408.89291, "r": 480.59473, "b": 417.68988, "coord_origin": "1"}}, {"id": 28, "text": "and minimized sequence length improves prediction accuracy and inference time", "bbox": {"l": 134.76498, "t": 420.8479, "r": 480.58868, "b": 429.64487, "coord_origin": "1"}}, {"id": 29, "text": "in the TableFormer architecture.", "bbox": {"l": 134.76498, "t": 432.80289, "r": 276.67325, "b": 441.59985, "coord_origin": "1"}}]}, {"id": 6, "label": "Section-header", "bbox": {"l": 134.02143745422362, "t": 465.00746841430663, "r": 261.80109, "b": 475.0406295776367, "coord_origin": "1"}, "confidence": 0.9574865698814392, "cells": [{"id": 30, "text": "4.1", "bbox": {"l": 134.76498, "t": 465.87192, "r": 149.40204, "b": 474.67886, "coord_origin": "1"}}, {"id": 31, "text": "Language Definition", "bbox": {"l": 160.85902, "t": 465.87192, "r": 261.80109, "b": 474.67886, "coord_origin": "1"}}]}, {"id": 7, "label": "Text", "bbox": {"l": 134.03182640075684, "t": 488.4044437408447, "r": 480.58871, "b": 522.0173652648926, "coord_origin": "1"}, "confidence": 0.9783343076705933, "cells": [{"id": 32, "text": "In Figure 3, we illustrate how the OTSL is defined. 
In essence, the OTSL defines", "bbox": {"l": 134.76498, "t": 488.99789, "r": 480.58871, "b": 497.79486, "coord_origin": "1"}}, {"id": 33, "text": "only 5 tokens that directly describe a tabular structure based on an atomic 2D", "bbox": {"l": 134.76498, "t": 500.95288, "r": 480.5867, "b": 509.74985, "coord_origin": "1"}}, {"id": 34, "text": "grid.", "bbox": {"l": 134.76498, "t": 512.90887, "r": 154.7131, "b": 521.7058400000001, "coord_origin": "1"}}]}, {"id": 8, "label": "Text", "bbox": {"l": 149.35654392242432, "t": 525.0188541412354, "r": 409.31137, "b": 535.0435180664062, "coord_origin": "1"}, "confidence": 0.8631380796432495, "cells": [{"id": 35, "text": "The OTSL vocabulary is comprised of the following tokens:", "bbox": {"l": 149.70898, "t": 525.5018600000001, "r": 409.31137, "b": 534.29883, "coord_origin": "1"}}]}, {"id": 9, "label": "List-item", "bbox": {"l": 139.9448733329773, "t": 546.6955352783203, "r": 460.54443, "b": 556.77682, "coord_origin": "1"}, "confidence": 0.9208247065544128, "cells": [{"id": 36, "text": "-", "bbox": {"l": 140.99298, "t": 547.96989, "r": 146.72047, "b": 556.77682, "coord_origin": "1"}}, {"id": 37, "text": "\"C\" cell -", "bbox": {"l": 151.70099, "t": 547.97986, "r": 193.20619, "b": 556.77682, "coord_origin": "1"}}, {"id": 38, "text": "a new table cell", "bbox": {"l": 196.52199, "t": 547.97986, "r": 263.46564, "b": 556.77682, "coord_origin": "1"}}, {"id": 39, "text": "that either has or does not have cell content", "bbox": {"l": 267.815, "t": 547.97986, "r": 460.54443, "b": 556.77682, "coord_origin": "1"}}]}, {"id": 10, "label": "List-item", "bbox": {"l": 139.97167739868163, "t": 559.1281276702881, "r": 480.59392999999994, "b": 581.8816543579102, "coord_origin": "1"}, "confidence": 0.9447745084762573, "cells": [{"id": 40, "text": "-", "bbox": {"l": 140.99301, "t": 560.5629, "r": 146.7205, "b": 569.36983, "coord_origin": "1"}}, {"id": 41, "text": "\"L\" cell -", "bbox": {"l": 151.70102, "t": 560.57286, "r": 194.30011, "b": 
569.36983, "coord_origin": "1"}}, {"id": 42, "text": "left-looking cell", "bbox": {"l": 198.65903, "t": 560.57286, "r": 264.51779, "b": 569.36983, "coord_origin": "1"}}, {"id": 43, "text": ", merging with the left neighbor cell to create a", "bbox": {"l": 264.51804, "t": 560.57286, "r": 480.59392999999994, "b": 569.36983, "coord_origin": "1"}}, {"id": 44, "text": "span", "bbox": {"l": 151.70103, "t": 572.52786, "r": 171.67604, "b": 581.32483, "coord_origin": "1"}}]}, {"id": 11, "label": "List-item", "bbox": {"l": 140.17970438003542, "t": 584.0574760437012, "r": 480.58856, "b": 607.0045509338379, "coord_origin": "1"}, "confidence": 0.9388728141784668, "cells": [{"id": 45, "text": "-", "bbox": {"l": 140.99304, "t": 585.11189, "r": 146.72054, "b": 593.91882, "coord_origin": "1"}}, {"id": 46, "text": "\"U\" cell -", "bbox": {"l": 151.70105, "t": 585.12186, "r": 194.11086, "b": 593.91882, "coord_origin": "1"}}, {"id": 47, "text": "up-looking cell", "bbox": {"l": 197.74805, "t": 585.12186, "r": 259.89474, "b": 593.91882, "coord_origin": "1"}}, {"id": 48, "text": ", merging with the upper neighbor cell to create a", "bbox": {"l": 259.89206, "t": 585.12186, "r": 480.58856, "b": 593.91882, "coord_origin": "1"}}, {"id": 49, "text": "span", "bbox": {"l": 151.70105, "t": 597.07686, "r": 171.67606, "b": 605.87383, "coord_origin": "1"}}]}, {"id": 12, "label": "List-item", "bbox": {"l": 139.92364311218262, "t": 608.5861701965332, "r": 454.55496, "b": 619.1174583435059, "coord_origin": "1"}, "confidence": 0.9166865348815918, "cells": [{"id": 50, "text": "-", "bbox": {"l": 140.99304, "t": 609.6599, "r": 146.72054, "b": 618.46683, "coord_origin": "1"}}, {"id": 51, "text": "\"X\" cell -", "bbox": {"l": 151.70105, "t": 609.66986, "r": 193.48323, "b": 618.46683, "coord_origin": "1"}}, {"id": 52, "text": "cross cell", "bbox": {"l": 196.79904, "t": 609.66986, "r": 236.12042, "b": 618.46683, "coord_origin": "1"}}, {"id": 53, "text": ", to merge with both left and upper neighbor cells", 
"bbox": {"l": 236.12505, "t": 609.66986, "r": 454.55496, "b": 618.46683, "coord_origin": "1"}}]}, {"id": 13, "label": "List-item", "bbox": {"l": 139.8769658088684, "t": 621.1636688232421, "r": 328.61676, "b": 631.06082, "coord_origin": "1"}, "confidence": 0.9090225696563721, "cells": [{"id": 54, "text": "-", "bbox": {"l": 140.99304, "t": 622.2538900000001, "r": 146.72054, "b": 631.06082, "coord_origin": "1"}}, {"id": 55, "text": "\"NL\" -", "bbox": {"l": 151.70105, "t": 622.26385, "r": 181.99434, "b": 631.06082, "coord_origin": "1"}}, {"id": 56, "text": "new-line", "bbox": {"l": 185.31705, "t": 622.26385, "r": 221.46236, "b": 631.06082, "coord_origin": "1"}}, {"id": 57, "text": ", switch to the next row.", "bbox": {"l": 221.46104, "t": 622.26385, "r": 328.61676, "b": 631.06082, "coord_origin": "1"}}]}, {"id": 14, "label": "Text", "bbox": {"l": 134.1934679031372, "t": 643.1055770874024, "r": 480.59280000000007, "b": 664.85484, "coord_origin": "1"}, "confidence": 0.9640712141990662, "cells": [{"id": 58, "text": "A notable attribute of OTSL is that it has the capability of achieving lossless", "bbox": {"l": 149.70905, "t": 644.10286, "r": 480.59280000000007, "b": 652.8998300000001, "coord_origin": "1"}}, {"id": 59, "text": "conversion to HTML.", "bbox": {"l": 134.76505, "t": 656.05786, "r": 228.22321, "b": 664.85484, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 5, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.1282597541809, "t": 93.76585235595701, "r": 139.453120136261, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8473044633865356, "cells": [{"id": 0, "text": "6", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "6"}, {"label": "Page-header", "id": 1, "page_no": 5, "cluster": {"id": 1, "label": "Page-header", 
"bbox": {"l": 167.2993927001953, "t": 93.00047779083252, "r": 231.72227, "b": 101.91809320449829, "coord_origin": "1"}, "confidence": 0.9042621850967407, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "M. Lysak, et al."}, {"label": "Text", "id": 2, "page_no": 5, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.94254274368288, "t": 118.29494304656987, "r": 480.59479, "b": 140.69587211608882, "coord_origin": "1"}, "confidence": 0.9729993939399719, "cells": [{"id": 3, "text": "generation. Implicitly, this also means that Im2Seq models need to learn these", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.59479, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "complex syntax rules, simply to deliver valid output.", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 364.62503, "b": 139.68579, "coord_origin": "1"}}]}, "text": "generation. Implicitly, this also means that Im2Seq models need to learn these complex syntax rules, simply to deliver valid output."}, {"label": "Text", "id": 3, "page_no": 5, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 133.6434519767761, "t": 142.55638961791988, "r": 480.59569999999997, "b": 295.74191, "coord_origin": "1"}, "confidence": 0.9872115850448608, "cells": [{"id": 5, "text": "In practice, we observe two major issues with prediction quality when train-", "bbox": {"l": 149.709, "t": 143.48279000000002, "r": 480.58981, "b": 152.27979000000005, "coord_origin": "1"}}, {"id": 6, "text": "ing Im2Seq models on HTML table structure generation from images. 
On the", "bbox": {"l": 134.765, "t": 155.43781, "r": 480.59378, "b": 164.23479999999995, "coord_origin": "1"}}, {"id": 7, "text": "one hand, we find that on large tables, the visual attention of the model often", "bbox": {"l": 134.765, "t": 167.39282000000003, "r": 480.5867, "b": 176.18982000000005, "coord_origin": "1"}}, {"id": 8, "text": "starts to drift and is not accurately moving forward cell by cell anymore. This", "bbox": {"l": 134.765, "t": 179.34784000000002, "r": 480.59476, "b": 188.14484000000004, "coord_origin": "1"}}, {"id": 9, "text": "manifests itself in either in an increasing", "bbox": {"l": 134.765, "t": 191.30286, "r": 314.27805, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 10, "text": "location drift", "bbox": {"l": 318.056, "t": 191.30286, "r": 374.08664, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 11, "text": "for proposed table-cells", "bbox": {"l": 378.80899, "t": 191.30286, "r": 480.58594, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 12, "text": "in later rows on the same column or even complete loss of vertical alignment, as", "bbox": {"l": 134.76498, "t": 203.25885000000005, "r": 480.58771, "b": 212.05584999999996, "coord_origin": "1"}}, {"id": 13, "text": "illustrated in Figure 5. Addressing this with post-processing is partially possible,", "bbox": {"l": 134.76498, "t": 215.21387000000004, "r": 480.59569999999997, "b": 224.01085999999998, "coord_origin": "1"}}, {"id": 14, "text": "but clearly undesired. On the other hand, we find many instances of predictions", "bbox": {"l": 134.76498, "t": 227.16887999999994, "r": 480.59454, "b": 235.96587999999997, "coord_origin": "1"}}, {"id": 15, "text": "with structural inconsistencies or plain invalid HTML output, as shown in Fig-", "bbox": {"l": 134.76498, "t": 239.12390000000005, "r": 480.58759000000003, "b": 247.92089999999996, "coord_origin": "1"}}, {"id": 16, "text": "ure 6, which are nearly impossible to properly correct. 
Both problems seriously", "bbox": {"l": 134.76498, "t": 251.07892000000004, "r": 480.59277, "b": 259.87591999999995, "coord_origin": "1"}}, {"id": 17, "text": "impact the TSR model performance, since they reflect not only in the task of", "bbox": {"l": 134.76498, "t": 263.03394000000003, "r": 480.59463999999997, "b": 271.83092999999997, "coord_origin": "1"}}, {"id": 18, "text": "pure structure recognition but also in the equally crucial recognition or matching", "bbox": {"l": 134.76498, "t": 274.98992999999996, "r": 480.58978, "b": 283.78693, "coord_origin": "1"}}, {"id": 19, "text": "of table cell content.", "bbox": {"l": 134.76498, "t": 286.94495, "r": 223.57262, "b": 295.74191, "coord_origin": "1"}}]}, "text": "In practice, we observe two major issues with prediction quality when training Im2Seq models on HTML table structure generation from images. On the one hand, we find that on large tables, the visual attention of the model often starts to drift and is not accurately moving forward cell by cell anymore. This manifests itself in either in an increasing location drift for proposed table-cells in later rows on the same column or even complete loss of vertical alignment, as illustrated in Figure 5. Addressing this with post-processing is partially possible, but clearly undesired. On the other hand, we find many instances of predictions with structural inconsistencies or plain invalid HTML output, as shown in Figure 6, which are nearly impossible to properly correct. 
Both problems seriously impact the TSR model performance, since they reflect not only in the task of pure structure recognition but also in the equally crucial recognition or matching of table cell content."}, {"label": "Section-header", "id": 4, "page_no": 5, "cluster": {"id": 4, "label": "Section-header", "bbox": {"l": 134.07444734573366, "t": 319.69540557861325, "r": 372.50848, "b": 331.54225502014157, "coord_origin": "1"}, "confidence": 0.9534726738929749, "cells": [{"id": 20, "text": "4", "bbox": {"l": 134.76498, "t": 320.6311, "r": 141.48859, "b": 331.19949, "coord_origin": "1"}}, {"id": 21, "text": "Optimised Table Structure Language", "bbox": {"l": 154.93819, "t": 320.6311, "r": 372.50848, "b": 331.19949, "coord_origin": "1"}}]}, "text": "4 Optimised Table Structure Language"}, {"label": "Text", "id": 5, "page_no": 5, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.82858533859255, "t": 348.3478282928467, "r": 480.59473, "b": 441.59985, "coord_origin": "1"}, "confidence": 0.9874445796012878, "cells": [{"id": 22, "text": "To mitigate the issues with HTML in Im2Seq-based TSR models laid out before,", "bbox": {"l": 134.76498, "t": 349.11697, "r": 480.59075999999993, "b": 357.91394, "coord_origin": "1"}}, {"id": 23, "text": "we propose here our Optimised Table Structure Language (OTSL). OTSL is", "bbox": {"l": 134.76498, "t": 361.07196000000005, "r": 480.58875, "b": 369.86893, "coord_origin": "1"}}, {"id": 24, "text": "designed to express table structure with a minimized vocabulary and a simple", "bbox": {"l": 134.76498, "t": 373.02795, "r": 480.58681999999993, "b": 381.82492, "coord_origin": "1"}}, {"id": 25, "text": "set of rules, which are both significantly reduced compared to HTML. 
At the", "bbox": {"l": 134.76498, "t": 384.98294, "r": 480.58875, "b": 393.77991, "coord_origin": "1"}}, {"id": 26, "text": "same time, OTSL enables easy error detection and correction during sequence", "bbox": {"l": 134.76498, "t": 396.93793, "r": 480.58978, "b": 405.73489, "coord_origin": "1"}}, {"id": 27, "text": "generation. We further demonstrate how the compact structure representation", "bbox": {"l": 134.76498, "t": 408.89291, "r": 480.59473, "b": 417.68988, "coord_origin": "1"}}, {"id": 28, "text": "and minimized sequence length improves prediction accuracy and inference time", "bbox": {"l": 134.76498, "t": 420.8479, "r": 480.58868, "b": 429.64487, "coord_origin": "1"}}, {"id": 29, "text": "in the TableFormer architecture.", "bbox": {"l": 134.76498, "t": 432.80289, "r": 276.67325, "b": 441.59985, "coord_origin": "1"}}]}, "text": "To mitigate the issues with HTML in Im2Seq-based TSR models laid out before, we propose here our Optimised Table Structure Language (OTSL). OTSL is designed to express table structure with a minimized vocabulary and a simple set of rules, which are both significantly reduced compared to HTML. At the same time, OTSL enables easy error detection and correction during sequence generation. 
We further demonstrate how the compact structure representation and minimized sequence length improves prediction accuracy and inference time in the TableFormer architecture."}, {"label": "Section-header", "id": 6, "page_no": 5, "cluster": {"id": 6, "label": "Section-header", "bbox": {"l": 134.02143745422362, "t": 465.00746841430663, "r": 261.80109, "b": 475.0406295776367, "coord_origin": "1"}, "confidence": 0.9574865698814392, "cells": [{"id": 30, "text": "4.1", "bbox": {"l": 134.76498, "t": 465.87192, "r": 149.40204, "b": 474.67886, "coord_origin": "1"}}, {"id": 31, "text": "Language Definition", "bbox": {"l": 160.85902, "t": 465.87192, "r": 261.80109, "b": 474.67886, "coord_origin": "1"}}]}, "text": "4.1 Language Definition"}, {"label": "Text", "id": 7, "page_no": 5, "cluster": {"id": 7, "label": "Text", "bbox": {"l": 134.03182640075684, "t": 488.4044437408447, "r": 480.58871, "b": 522.0173652648926, "coord_origin": "1"}, "confidence": 0.9783343076705933, "cells": [{"id": 32, "text": "In Figure 3, we illustrate how the OTSL is defined. In essence, the OTSL defines", "bbox": {"l": 134.76498, "t": 488.99789, "r": 480.58871, "b": 497.79486, "coord_origin": "1"}}, {"id": 33, "text": "only 5 tokens that directly describe a tabular structure based on an atomic 2D", "bbox": {"l": 134.76498, "t": 500.95288, "r": 480.5867, "b": 509.74985, "coord_origin": "1"}}, {"id": 34, "text": "grid.", "bbox": {"l": 134.76498, "t": 512.90887, "r": 154.7131, "b": 521.7058400000001, "coord_origin": "1"}}]}, "text": "In Figure 3, we illustrate how the OTSL is defined. 
In essence, the OTSL defines only 5 tokens that directly describe a tabular structure based on an atomic 2D grid."}, {"label": "Text", "id": 8, "page_no": 5, "cluster": {"id": 8, "label": "Text", "bbox": {"l": 149.35654392242432, "t": 525.0188541412354, "r": 409.31137, "b": 535.0435180664062, "coord_origin": "1"}, "confidence": 0.8631380796432495, "cells": [{"id": 35, "text": "The OTSL vocabulary is comprised of the following tokens:", "bbox": {"l": 149.70898, "t": 525.5018600000001, "r": 409.31137, "b": 534.29883, "coord_origin": "1"}}]}, "text": "The OTSL vocabulary is comprised of the following tokens:"}, {"label": "List-item", "id": 9, "page_no": 5, "cluster": {"id": 9, "label": "List-item", "bbox": {"l": 139.9448733329773, "t": 546.6955352783203, "r": 460.54443, "b": 556.77682, "coord_origin": "1"}, "confidence": 0.9208247065544128, "cells": [{"id": 36, "text": "-", "bbox": {"l": 140.99298, "t": 547.96989, "r": 146.72047, "b": 556.77682, "coord_origin": "1"}}, {"id": 37, "text": "\"C\" cell -", "bbox": {"l": 151.70099, "t": 547.97986, "r": 193.20619, "b": 556.77682, "coord_origin": "1"}}, {"id": 38, "text": "a new table cell", "bbox": {"l": 196.52199, "t": 547.97986, "r": 263.46564, "b": 556.77682, "coord_origin": "1"}}, {"id": 39, "text": "that either has or does not have cell content", "bbox": {"l": 267.815, "t": 547.97986, "r": 460.54443, "b": 556.77682, "coord_origin": "1"}}]}, "text": "-\"C\" cell a new table cell that either has or does not have cell content"}, {"label": "List-item", "id": 10, "page_no": 5, "cluster": {"id": 10, "label": "List-item", "bbox": {"l": 139.97167739868163, "t": 559.1281276702881, "r": 480.59392999999994, "b": 581.8816543579102, "coord_origin": "1"}, "confidence": 0.9447745084762573, "cells": [{"id": 40, "text": "-", "bbox": {"l": 140.99301, "t": 560.5629, "r": 146.7205, "b": 569.36983, "coord_origin": "1"}}, {"id": 41, "text": "\"L\" cell -", "bbox": {"l": 151.70102, "t": 560.57286, "r": 194.30011, "b": 569.36983, 
"coord_origin": "1"}}, {"id": 42, "text": "left-looking cell", "bbox": {"l": 198.65903, "t": 560.57286, "r": 264.51779, "b": 569.36983, "coord_origin": "1"}}, {"id": 43, "text": ", merging with the left neighbor cell to create a", "bbox": {"l": 264.51804, "t": 560.57286, "r": 480.59392999999994, "b": 569.36983, "coord_origin": "1"}}, {"id": 44, "text": "span", "bbox": {"l": 151.70103, "t": 572.52786, "r": 171.67604, "b": 581.32483, "coord_origin": "1"}}]}, "text": "-\"L\" cell left-looking cell , merging with the left neighbor cell to create a span"}, {"label": "List-item", "id": 11, "page_no": 5, "cluster": {"id": 11, "label": "List-item", "bbox": {"l": 140.17970438003542, "t": 584.0574760437012, "r": 480.58856, "b": 607.0045509338379, "coord_origin": "1"}, "confidence": 0.9388728141784668, "cells": [{"id": 45, "text": "-", "bbox": {"l": 140.99304, "t": 585.11189, "r": 146.72054, "b": 593.91882, "coord_origin": "1"}}, {"id": 46, "text": "\"U\" cell -", "bbox": {"l": 151.70105, "t": 585.12186, "r": 194.11086, "b": 593.91882, "coord_origin": "1"}}, {"id": 47, "text": "up-looking cell", "bbox": {"l": 197.74805, "t": 585.12186, "r": 259.89474, "b": 593.91882, "coord_origin": "1"}}, {"id": 48, "text": ", merging with the upper neighbor cell to create a", "bbox": {"l": 259.89206, "t": 585.12186, "r": 480.58856, "b": 593.91882, "coord_origin": "1"}}, {"id": 49, "text": "span", "bbox": {"l": 151.70105, "t": 597.07686, "r": 171.67606, "b": 605.87383, "coord_origin": "1"}}]}, "text": "-\"U\" cell up-looking cell , merging with the upper neighbor cell to create a span"}, {"label": "List-item", "id": 12, "page_no": 5, "cluster": {"id": 12, "label": "List-item", "bbox": {"l": 139.92364311218262, "t": 608.5861701965332, "r": 454.55496, "b": 619.1174583435059, "coord_origin": "1"}, "confidence": 0.9166865348815918, "cells": [{"id": 50, "text": "-", "bbox": {"l": 140.99304, "t": 609.6599, "r": 146.72054, "b": 618.46683, "coord_origin": "1"}}, {"id": 51, "text": "\"X\" cell -", 
"bbox": {"l": 151.70105, "t": 609.66986, "r": 193.48323, "b": 618.46683, "coord_origin": "1"}}, {"id": 52, "text": "cross cell", "bbox": {"l": 196.79904, "t": 609.66986, "r": 236.12042, "b": 618.46683, "coord_origin": "1"}}, {"id": 53, "text": ", to merge with both left and upper neighbor cells", "bbox": {"l": 236.12505, "t": 609.66986, "r": 454.55496, "b": 618.46683, "coord_origin": "1"}}]}, "text": "-\"X\" cell cross cell , to merge with both left and upper neighbor cells"}, {"label": "List-item", "id": 13, "page_no": 5, "cluster": {"id": 13, "label": "List-item", "bbox": {"l": 139.8769658088684, "t": 621.1636688232421, "r": 328.61676, "b": 631.06082, "coord_origin": "1"}, "confidence": 0.9090225696563721, "cells": [{"id": 54, "text": "-", "bbox": {"l": 140.99304, "t": 622.2538900000001, "r": 146.72054, "b": 631.06082, "coord_origin": "1"}}, {"id": 55, "text": "\"NL\" -", "bbox": {"l": 151.70105, "t": 622.26385, "r": 181.99434, "b": 631.06082, "coord_origin": "1"}}, {"id": 56, "text": "new-line", "bbox": {"l": 185.31705, "t": 622.26385, "r": 221.46236, "b": 631.06082, "coord_origin": "1"}}, {"id": 57, "text": ", switch to the next row.", "bbox": {"l": 221.46104, "t": 622.26385, "r": 328.61676, "b": 631.06082, "coord_origin": "1"}}]}, "text": "-\"NL\" new-line , switch to the next row."}, {"label": "Text", "id": 14, "page_no": 5, "cluster": {"id": 14, "label": "Text", "bbox": {"l": 134.1934679031372, "t": 643.1055770874024, "r": 480.59280000000007, "b": 664.85484, "coord_origin": "1"}, "confidence": 0.9640712141990662, "cells": [{"id": 58, "text": "A notable attribute of OTSL is that it has the capability of achieving lossless", "bbox": {"l": 149.70905, "t": 644.10286, "r": 480.59280000000007, "b": 652.8998300000001, "coord_origin": "1"}}, {"id": 59, "text": "conversion to HTML.", "bbox": {"l": 134.76505, "t": 656.05786, "r": 228.22321, "b": 664.85484, "coord_origin": "1"}}]}, "text": "A notable attribute of OTSL is that it has the capability of achieving lossless 
conversion to HTML."}], "body": [{"label": "Text", "id": 2, "page_no": 5, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.94254274368288, "t": 118.29494304656987, "r": 480.59479, "b": 140.69587211608882, "coord_origin": "1"}, "confidence": 0.9729993939399719, "cells": [{"id": 3, "text": "generation. Implicitly, this also means that Im2Seq models need to learn these", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.59479, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "complex syntax rules, simply to deliver valid output.", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 364.62503, "b": 139.68579, "coord_origin": "1"}}]}, "text": "generation. Implicitly, this also means that Im2Seq models need to learn these complex syntax rules, simply to deliver valid output."}, {"label": "Text", "id": 3, "page_no": 5, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 133.6434519767761, "t": 142.55638961791988, "r": 480.59569999999997, "b": 295.74191, "coord_origin": "1"}, "confidence": 0.9872115850448608, "cells": [{"id": 5, "text": "In practice, we observe two major issues with prediction quality when train-", "bbox": {"l": 149.709, "t": 143.48279000000002, "r": 480.58981, "b": 152.27979000000005, "coord_origin": "1"}}, {"id": 6, "text": "ing Im2Seq models on HTML table structure generation from images. On the", "bbox": {"l": 134.765, "t": 155.43781, "r": 480.59378, "b": 164.23479999999995, "coord_origin": "1"}}, {"id": 7, "text": "one hand, we find that on large tables, the visual attention of the model often", "bbox": {"l": 134.765, "t": 167.39282000000003, "r": 480.5867, "b": 176.18982000000005, "coord_origin": "1"}}, {"id": 8, "text": "starts to drift and is not accurately moving forward cell by cell anymore. 
This", "bbox": {"l": 134.765, "t": 179.34784000000002, "r": 480.59476, "b": 188.14484000000004, "coord_origin": "1"}}, {"id": 9, "text": "manifests itself in either in an increasing", "bbox": {"l": 134.765, "t": 191.30286, "r": 314.27805, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 10, "text": "location drift", "bbox": {"l": 318.056, "t": 191.30286, "r": 374.08664, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 11, "text": "for proposed table-cells", "bbox": {"l": 378.80899, "t": 191.30286, "r": 480.58594, "b": 200.09984999999995, "coord_origin": "1"}}, {"id": 12, "text": "in later rows on the same column or even complete loss of vertical alignment, as", "bbox": {"l": 134.76498, "t": 203.25885000000005, "r": 480.58771, "b": 212.05584999999996, "coord_origin": "1"}}, {"id": 13, "text": "illustrated in Figure 5. Addressing this with post-processing is partially possible,", "bbox": {"l": 134.76498, "t": 215.21387000000004, "r": 480.59569999999997, "b": 224.01085999999998, "coord_origin": "1"}}, {"id": 14, "text": "but clearly undesired. On the other hand, we find many instances of predictions", "bbox": {"l": 134.76498, "t": 227.16887999999994, "r": 480.59454, "b": 235.96587999999997, "coord_origin": "1"}}, {"id": 15, "text": "with structural inconsistencies or plain invalid HTML output, as shown in Fig-", "bbox": {"l": 134.76498, "t": 239.12390000000005, "r": 480.58759000000003, "b": 247.92089999999996, "coord_origin": "1"}}, {"id": 16, "text": "ure 6, which are nearly impossible to properly correct. 
Both problems seriously", "bbox": {"l": 134.76498, "t": 251.07892000000004, "r": 480.59277, "b": 259.87591999999995, "coord_origin": "1"}}, {"id": 17, "text": "impact the TSR model performance, since they reflect not only in the task of", "bbox": {"l": 134.76498, "t": 263.03394000000003, "r": 480.59463999999997, "b": 271.83092999999997, "coord_origin": "1"}}, {"id": 18, "text": "pure structure recognition but also in the equally crucial recognition or matching", "bbox": {"l": 134.76498, "t": 274.98992999999996, "r": 480.58978, "b": 283.78693, "coord_origin": "1"}}, {"id": 19, "text": "of table cell content.", "bbox": {"l": 134.76498, "t": 286.94495, "r": 223.57262, "b": 295.74191, "coord_origin": "1"}}]}, "text": "In practice, we observe two major issues with prediction quality when training Im2Seq models on HTML table structure generation from images. On the one hand, we find that on large tables, the visual attention of the model often starts to drift and is not accurately moving forward cell by cell anymore. This manifests itself in either in an increasing location drift for proposed table-cells in later rows on the same column or even complete loss of vertical alignment, as illustrated in Figure 5. Addressing this with post-processing is partially possible, but clearly undesired. On the other hand, we find many instances of predictions with structural inconsistencies or plain invalid HTML output, as shown in Figure 6, which are nearly impossible to properly correct. 
Both problems seriously impact the TSR model performance, since they reflect not only in the task of pure structure recognition but also in the equally crucial recognition or matching of table cell content."}, {"label": "Section-header", "id": 4, "page_no": 5, "cluster": {"id": 4, "label": "Section-header", "bbox": {"l": 134.07444734573366, "t": 319.69540557861325, "r": 372.50848, "b": 331.54225502014157, "coord_origin": "1"}, "confidence": 0.9534726738929749, "cells": [{"id": 20, "text": "4", "bbox": {"l": 134.76498, "t": 320.6311, "r": 141.48859, "b": 331.19949, "coord_origin": "1"}}, {"id": 21, "text": "Optimised Table Structure Language", "bbox": {"l": 154.93819, "t": 320.6311, "r": 372.50848, "b": 331.19949, "coord_origin": "1"}}]}, "text": "4 Optimised Table Structure Language"}, {"label": "Text", "id": 5, "page_no": 5, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.82858533859255, "t": 348.3478282928467, "r": 480.59473, "b": 441.59985, "coord_origin": "1"}, "confidence": 0.9874445796012878, "cells": [{"id": 22, "text": "To mitigate the issues with HTML in Im2Seq-based TSR models laid out before,", "bbox": {"l": 134.76498, "t": 349.11697, "r": 480.59075999999993, "b": 357.91394, "coord_origin": "1"}}, {"id": 23, "text": "we propose here our Optimised Table Structure Language (OTSL). OTSL is", "bbox": {"l": 134.76498, "t": 361.07196000000005, "r": 480.58875, "b": 369.86893, "coord_origin": "1"}}, {"id": 24, "text": "designed to express table structure with a minimized vocabulary and a simple", "bbox": {"l": 134.76498, "t": 373.02795, "r": 480.58681999999993, "b": 381.82492, "coord_origin": "1"}}, {"id": 25, "text": "set of rules, which are both significantly reduced compared to HTML. 
At the", "bbox": {"l": 134.76498, "t": 384.98294, "r": 480.58875, "b": 393.77991, "coord_origin": "1"}}, {"id": 26, "text": "same time, OTSL enables easy error detection and correction during sequence", "bbox": {"l": 134.76498, "t": 396.93793, "r": 480.58978, "b": 405.73489, "coord_origin": "1"}}, {"id": 27, "text": "generation. We further demonstrate how the compact structure representation", "bbox": {"l": 134.76498, "t": 408.89291, "r": 480.59473, "b": 417.68988, "coord_origin": "1"}}, {"id": 28, "text": "and minimized sequence length improves prediction accuracy and inference time", "bbox": {"l": 134.76498, "t": 420.8479, "r": 480.58868, "b": 429.64487, "coord_origin": "1"}}, {"id": 29, "text": "in the TableFormer architecture.", "bbox": {"l": 134.76498, "t": 432.80289, "r": 276.67325, "b": 441.59985, "coord_origin": "1"}}]}, "text": "To mitigate the issues with HTML in Im2Seq-based TSR models laid out before, we propose here our Optimised Table Structure Language (OTSL). OTSL is designed to express table structure with a minimized vocabulary and a simple set of rules, which are both significantly reduced compared to HTML. At the same time, OTSL enables easy error detection and correction during sequence generation. 
We further demonstrate how the compact structure representation and minimized sequence length improves prediction accuracy and inference time in the TableFormer architecture."}, {"label": "Section-header", "id": 6, "page_no": 5, "cluster": {"id": 6, "label": "Section-header", "bbox": {"l": 134.02143745422362, "t": 465.00746841430663, "r": 261.80109, "b": 475.0406295776367, "coord_origin": "1"}, "confidence": 0.9574865698814392, "cells": [{"id": 30, "text": "4.1", "bbox": {"l": 134.76498, "t": 465.87192, "r": 149.40204, "b": 474.67886, "coord_origin": "1"}}, {"id": 31, "text": "Language Definition", "bbox": {"l": 160.85902, "t": 465.87192, "r": 261.80109, "b": 474.67886, "coord_origin": "1"}}]}, "text": "4.1 Language Definition"}, {"label": "Text", "id": 7, "page_no": 5, "cluster": {"id": 7, "label": "Text", "bbox": {"l": 134.03182640075684, "t": 488.4044437408447, "r": 480.58871, "b": 522.0173652648926, "coord_origin": "1"}, "confidence": 0.9783343076705933, "cells": [{"id": 32, "text": "In Figure 3, we illustrate how the OTSL is defined. In essence, the OTSL defines", "bbox": {"l": 134.76498, "t": 488.99789, "r": 480.58871, "b": 497.79486, "coord_origin": "1"}}, {"id": 33, "text": "only 5 tokens that directly describe a tabular structure based on an atomic 2D", "bbox": {"l": 134.76498, "t": 500.95288, "r": 480.5867, "b": 509.74985, "coord_origin": "1"}}, {"id": 34, "text": "grid.", "bbox": {"l": 134.76498, "t": 512.90887, "r": 154.7131, "b": 521.7058400000001, "coord_origin": "1"}}]}, "text": "In Figure 3, we illustrate how the OTSL is defined. 
In essence, the OTSL defines only 5 tokens that directly describe a tabular structure based on an atomic 2D grid."}, {"label": "Text", "id": 8, "page_no": 5, "cluster": {"id": 8, "label": "Text", "bbox": {"l": 149.35654392242432, "t": 525.0188541412354, "r": 409.31137, "b": 535.0435180664062, "coord_origin": "1"}, "confidence": 0.8631380796432495, "cells": [{"id": 35, "text": "The OTSL vocabulary is comprised of the following tokens:", "bbox": {"l": 149.70898, "t": 525.5018600000001, "r": 409.31137, "b": 534.29883, "coord_origin": "1"}}]}, "text": "The OTSL vocabulary is comprised of the following tokens:"}, {"label": "List-item", "id": 9, "page_no": 5, "cluster": {"id": 9, "label": "List-item", "bbox": {"l": 139.9448733329773, "t": 546.6955352783203, "r": 460.54443, "b": 556.77682, "coord_origin": "1"}, "confidence": 0.9208247065544128, "cells": [{"id": 36, "text": "-", "bbox": {"l": 140.99298, "t": 547.96989, "r": 146.72047, "b": 556.77682, "coord_origin": "1"}}, {"id": 37, "text": "\"C\" cell -", "bbox": {"l": 151.70099, "t": 547.97986, "r": 193.20619, "b": 556.77682, "coord_origin": "1"}}, {"id": 38, "text": "a new table cell", "bbox": {"l": 196.52199, "t": 547.97986, "r": 263.46564, "b": 556.77682, "coord_origin": "1"}}, {"id": 39, "text": "that either has or does not have cell content", "bbox": {"l": 267.815, "t": 547.97986, "r": 460.54443, "b": 556.77682, "coord_origin": "1"}}]}, "text": "-\"C\" cell a new table cell that either has or does not have cell content"}, {"label": "List-item", "id": 10, "page_no": 5, "cluster": {"id": 10, "label": "List-item", "bbox": {"l": 139.97167739868163, "t": 559.1281276702881, "r": 480.59392999999994, "b": 581.8816543579102, "coord_origin": "1"}, "confidence": 0.9447745084762573, "cells": [{"id": 40, "text": "-", "bbox": {"l": 140.99301, "t": 560.5629, "r": 146.7205, "b": 569.36983, "coord_origin": "1"}}, {"id": 41, "text": "\"L\" cell -", "bbox": {"l": 151.70102, "t": 560.57286, "r": 194.30011, "b": 569.36983, 
"coord_origin": "1"}}, {"id": 42, "text": "left-looking cell", "bbox": {"l": 198.65903, "t": 560.57286, "r": 264.51779, "b": 569.36983, "coord_origin": "1"}}, {"id": 43, "text": ", merging with the left neighbor cell to create a", "bbox": {"l": 264.51804, "t": 560.57286, "r": 480.59392999999994, "b": 569.36983, "coord_origin": "1"}}, {"id": 44, "text": "span", "bbox": {"l": 151.70103, "t": 572.52786, "r": 171.67604, "b": 581.32483, "coord_origin": "1"}}]}, "text": "-\"L\" cell left-looking cell , merging with the left neighbor cell to create a span"}, {"label": "List-item", "id": 11, "page_no": 5, "cluster": {"id": 11, "label": "List-item", "bbox": {"l": 140.17970438003542, "t": 584.0574760437012, "r": 480.58856, "b": 607.0045509338379, "coord_origin": "1"}, "confidence": 0.9388728141784668, "cells": [{"id": 45, "text": "-", "bbox": {"l": 140.99304, "t": 585.11189, "r": 146.72054, "b": 593.91882, "coord_origin": "1"}}, {"id": 46, "text": "\"U\" cell -", "bbox": {"l": 151.70105, "t": 585.12186, "r": 194.11086, "b": 593.91882, "coord_origin": "1"}}, {"id": 47, "text": "up-looking cell", "bbox": {"l": 197.74805, "t": 585.12186, "r": 259.89474, "b": 593.91882, "coord_origin": "1"}}, {"id": 48, "text": ", merging with the upper neighbor cell to create a", "bbox": {"l": 259.89206, "t": 585.12186, "r": 480.58856, "b": 593.91882, "coord_origin": "1"}}, {"id": 49, "text": "span", "bbox": {"l": 151.70105, "t": 597.07686, "r": 171.67606, "b": 605.87383, "coord_origin": "1"}}]}, "text": "-\"U\" cell up-looking cell , merging with the upper neighbor cell to create a span"}, {"label": "List-item", "id": 12, "page_no": 5, "cluster": {"id": 12, "label": "List-item", "bbox": {"l": 139.92364311218262, "t": 608.5861701965332, "r": 454.55496, "b": 619.1174583435059, "coord_origin": "1"}, "confidence": 0.9166865348815918, "cells": [{"id": 50, "text": "-", "bbox": {"l": 140.99304, "t": 609.6599, "r": 146.72054, "b": 618.46683, "coord_origin": "1"}}, {"id": 51, "text": "\"X\" cell -", 
"bbox": {"l": 151.70105, "t": 609.66986, "r": 193.48323, "b": 618.46683, "coord_origin": "1"}}, {"id": 52, "text": "cross cell", "bbox": {"l": 196.79904, "t": 609.66986, "r": 236.12042, "b": 618.46683, "coord_origin": "1"}}, {"id": 53, "text": ", to merge with both left and upper neighbor cells", "bbox": {"l": 236.12505, "t": 609.66986, "r": 454.55496, "b": 618.46683, "coord_origin": "1"}}]}, "text": "-\"X\" cell cross cell , to merge with both left and upper neighbor cells"}, {"label": "List-item", "id": 13, "page_no": 5, "cluster": {"id": 13, "label": "List-item", "bbox": {"l": 139.8769658088684, "t": 621.1636688232421, "r": 328.61676, "b": 631.06082, "coord_origin": "1"}, "confidence": 0.9090225696563721, "cells": [{"id": 54, "text": "-", "bbox": {"l": 140.99304, "t": 622.2538900000001, "r": 146.72054, "b": 631.06082, "coord_origin": "1"}}, {"id": 55, "text": "\"NL\" -", "bbox": {"l": 151.70105, "t": 622.26385, "r": 181.99434, "b": 631.06082, "coord_origin": "1"}}, {"id": 56, "text": "new-line", "bbox": {"l": 185.31705, "t": 622.26385, "r": 221.46236, "b": 631.06082, "coord_origin": "1"}}, {"id": 57, "text": ", switch to the next row.", "bbox": {"l": 221.46104, "t": 622.26385, "r": 328.61676, "b": 631.06082, "coord_origin": "1"}}]}, "text": "-\"NL\" new-line , switch to the next row."}, {"label": "Text", "id": 14, "page_no": 5, "cluster": {"id": 14, "label": "Text", "bbox": {"l": 134.1934679031372, "t": 643.1055770874024, "r": 480.59280000000007, "b": 664.85484, "coord_origin": "1"}, "confidence": 0.9640712141990662, "cells": [{"id": 58, "text": "A notable attribute of OTSL is that it has the capability of achieving lossless", "bbox": {"l": 149.70905, "t": 644.10286, "r": 480.59280000000007, "b": 652.8998300000001, "coord_origin": "1"}}, {"id": 59, "text": "conversion to HTML.", "bbox": {"l": 134.76505, "t": 656.05786, "r": 228.22321, "b": 664.85484, "coord_origin": "1"}}]}, "text": "A notable attribute of OTSL is that it has the capability of achieving lossless 
conversion to HTML."}], "headers": [{"label": "Page-header", "id": 0, "page_no": 5, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.1282597541809, "t": 93.76585235595701, "r": 139.453120136261, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8473044633865356, "cells": [{"id": 0, "text": "6", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "6"}, {"label": "Page-header", "id": 1, "page_no": 5, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 167.2993927001953, "t": 93.00047779083252, "r": 231.72227, "b": 101.91809320449829, "coord_origin": "1"}, "confidence": 0.9042621850967407, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "M. Lysak, et al."}]}}, {"page_no": 6, "page_hash": "d786b8d564d7a7c122f2cf573f0cc1f11ea0a559d93f19cf020c11360bce00b4", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "7", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Fig. 
3.", "bbox": {"l": 134.765, "t": 125.79918999999984, "r": 162.64424, "b": 133.72551999999996, "coord_origin": "1"}}, {"id": 3, "text": "OTSL description of table structure: A - table example; B - graphical repre-", "bbox": {"l": 166.276, "t": 125.86200000000008, "r": 480.58675999999997, "b": 133.93169999999998, "coord_origin": "1"}}, {"id": 4, "text": "sentation of table structure; C - mapping structure on a grid; D - OTSL structure", "bbox": {"l": 134.765, "t": 136.82097999999996, "r": 480.5874, "b": 144.89068999999995, "coord_origin": "1"}}, {"id": 5, "text": "encoding; E - explanation on cell encoding", "bbox": {"l": 134.765, "t": 147.77997000000005, "r": 306.1156, "b": 155.84966999999995, "coord_origin": "1"}}, {"id": 6, "text": "C", "bbox": {"l": 374.49326, "t": 168.59362999999996, "r": 381.66843, "b": 177.91540999999995, "coord_origin": "1"}}, {"id": 7, "text": "C", "bbox": {"l": 398.74011, "t": 168.50005999999996, "r": 405.91528, "b": 177.82183999999995, "coord_origin": "1"}}, {"id": 8, "text": "C", "bbox": {"l": 373.76862, "t": 192.92553999999996, "r": 380.94379, "b": 202.24730999999997, "coord_origin": "1"}}, {"id": 9, "text": "C", "bbox": {"l": 386.66388, "t": 193.07061999999996, "r": 393.83905, "b": 202.39239999999995, "coord_origin": "1"}}, {"id": 10, "text": "C", "bbox": {"l": 386.68707, "t": 205.13756999999998, "r": 393.86224, "b": 214.45934999999997, "coord_origin": "1"}}, {"id": 11, "text": "C", "bbox": {"l": 398.65729, "t": 180.73279000000002, "r": 405.83246, "b": 190.05457, "coord_origin": "1"}}, {"id": 12, "text": "C", "bbox": {"l": 410.77908, "t": 180.73859000000004, "r": 417.95425, "b": 190.06035999999995, "coord_origin": "1"}}, {"id": 13, "text": "C", "bbox": {"l": 422.90503, "t": 180.65247, "r": 430.08020000000005, "b": 189.97424, "coord_origin": "1"}}, {"id": 14, "text": "C", "bbox": {"l": 398.7807, "t": 192.98865, "r": 405.95587, "b": 202.31042000000002, "coord_origin": "1"}}, {"id": 15, "text": "C", "bbox": {"l": 410.90164, "t": 192.99487, 
"r": 418.07681, "b": 202.31664999999998, "coord_origin": "1"}}, {"id": 16, "text": "C", "bbox": {"l": 423.02753, "t": 192.909, "r": 430.2027, "b": 202.23077, "coord_origin": "1"}}, {"id": 17, "text": "C", "bbox": {"l": 398.78235, "t": 205.31573000000003, "r": 405.95752, "b": 214.63751000000002, "coord_origin": "1"}}, {"id": 18, "text": "C", "bbox": {"l": 410.90414, "t": 205.32196, "r": 418.07932, "b": 214.64373999999998, "coord_origin": "1"}}, {"id": 19, "text": "C", "bbox": {"l": 423.03003, "t": 205.23614999999995, "r": 430.20520000000005, "b": 214.55791999999997, "coord_origin": "1"}}, {"id": 20, "text": "C", "bbox": {"l": 386.50574, "t": 217.03882, "r": 393.68091, "b": 226.36059999999998, "coord_origin": "1"}}, {"id": 21, "text": "C", "bbox": {"l": 398.60181, "t": 217.21704, "r": 405.77698, "b": 226.53882, "coord_origin": "1"}}, {"id": 22, "text": "C", "bbox": {"l": 410.72275, "t": 217.22321, "r": 417.89792, "b": 226.54498, "coord_origin": "1"}}, {"id": 23, "text": "C", "bbox": {"l": 422.84869, "t": 217.13738999999998, "r": 430.02386, "b": 226.45916999999997, "coord_origin": "1"}}, {"id": 24, "text": "NL", "bbox": {"l": 435.16009999999994, "t": 167.69011999999998, "r": 447.86273, "b": 177.01189999999997, "coord_origin": "1"}}, {"id": 25, "text": "NL", "bbox": {"l": 435.44415, "t": 180.20025999999996, "r": 448.14679, "b": 189.52202999999997, "coord_origin": "1"}}, {"id": 26, "text": "NL", "bbox": {"l": 435.46735, "t": 192.49474999999995, "r": 448.16998000000007, "b": 201.81652999999994, "coord_origin": "1"}}, {"id": 27, "text": "NL", "bbox": {"l": 435.38202, "t": 204.83025999999995, "r": 448.08466, "b": 214.15204000000006, "coord_origin": "1"}}, {"id": 28, "text": "NL", "bbox": {"l": 435.59906, "t": 217.2337, "r": 448.3017, "b": 226.55548, "coord_origin": "1"}}, {"id": 29, "text": "U", "bbox": {"l": 374.14957, "t": 205.23492, "r": 381.32474, "b": 214.55669999999998, "coord_origin": "1"}}, {"id": 30, "text": "U", "bbox": {"l": 374.0419, "t": 217.14648, "r": 
381.21707, "b": 226.46826, "coord_origin": "1"}}, {"id": 31, "text": "U", "bbox": {"l": 374.34418, "t": 180.93488000000002, "r": 381.51935, "b": 190.25665000000004, "coord_origin": "1"}}, {"id": 32, "text": "L", "bbox": {"l": 387.76285, "t": 168.57788000000005, "r": 393.28833, "b": 177.89966000000004, "coord_origin": "1"}}, {"id": 33, "text": "L", "bbox": {"l": 411.86395, "t": 168.06195000000002, "r": 417.38943, "b": 177.38373, "coord_origin": "1"}}, {"id": 34, "text": "L", "bbox": {"l": 423.33563, "t": 167.93439, "r": 428.86111, "b": 177.25616000000002, "coord_origin": "1"}}, {"id": 35, "text": "X", "bbox": {"l": 387.13593, "t": 180.78576999999996, "r": 393.76453, "b": 190.10753999999997, "coord_origin": "1"}}, {"id": 36, "text": "C", "bbox": {"l": 282.2594, "t": 244.50878999999998, "r": 289.43457, "b": 253.83056999999997, "coord_origin": "1"}}, {"id": 37, "text": "U", "bbox": {"l": 282.11035, "t": 256.85022000000004, "r": 289.28552, "b": 266.172, "coord_origin": "1"}}, {"id": 38, "text": "U", "bbox": {"l": 282.40848, "t": 269.13300000000004, "r": 289.58365, "b": 278.45477000000005, "coord_origin": "1"}}, {"id": 39, "text": "L", "bbox": {"l": 295.52902, "t": 244.49347, "r": 301.0545, "b": 253.81525, "coord_origin": "1"}}, {"id": 40, "text": "L", "bbox": {"l": 307.46613, "t": 244.57372999999995, "r": 312.99161, "b": 253.89550999999994, "coord_origin": "1"}}, {"id": 41, "text": "L", "bbox": {"l": 318.76886, "t": 244.44037000000003, "r": 324.29434, "b": 253.76215000000002, "coord_origin": "1"}}, {"id": 42, "text": "X", "bbox": {"l": 294.9021, "t": 256.70154, "r": 301.03976, "b": 266.02332, "coord_origin": "1"}}, {"id": 43, "text": "X X", "bbox": {"l": 307.17743, "t": 256.70154, "r": 325.59039, "b": 266.02332, "coord_origin": "1"}}, {"id": 44, "text": "X", "bbox": {"l": 294.78949, "t": 269.25420999999994, "r": 300.92715, "b": 278.57599000000005, "coord_origin": "1"}}, {"id": 45, "text": "X X", "bbox": {"l": 307.06482, "t": 269.25420999999994, "r": 325.47778, "b": 
278.57599000000005, "coord_origin": "1"}}, {"id": 46, "text": "C", "bbox": {"l": 195.93939, "t": 268.74798999999996, "r": 203.11456, "b": 278.06976, "coord_origin": "1"}}, {"id": 47, "text": "L", "bbox": {"l": 209.20891, "t": 268.73267, "r": 214.73439, "b": 278.05444, "coord_origin": "1"}}, {"id": 48, "text": "L", "bbox": {"l": 221.14551, "t": 268.81293000000005, "r": 226.67099, "b": 278.13469999999995, "coord_origin": "1"}}, {"id": 49, "text": "L", "bbox": {"l": 232.44858, "t": 268.67957, "r": 237.97405999999998, "b": 278.00134, "coord_origin": "1"}}, {"id": 50, "text": "C", "bbox": {"l": 196.21715, "t": 244.53961000000004, "r": 203.39232, "b": 253.86139000000003, "coord_origin": "1"}}, {"id": 51, "text": "C", "bbox": {"l": 250.32143, "t": 244.09813999999994, "r": 257.49661, "b": 253.41992000000005, "coord_origin": "1"}}, {"id": 52, "text": "U", "bbox": {"l": 250.17235999999997, "t": 256.43951000000004, "r": 257.34753, "b": 265.76129000000003, "coord_origin": "1"}}, {"id": 53, "text": "U", "bbox": {"l": 250.47049000000004, "t": 268.72222999999997, "r": 257.64566, "b": 278.04400999999996, "coord_origin": "1"}}, {"id": 54, "text": "1", "bbox": {"l": 334.51135, "t": 242.99463000000003, "r": 337.22485, "b": 249.20911, "coord_origin": "1"}}, {"id": 55, "text": "- simple cells: \"C\"", "bbox": {"l": 339.93835, "t": 242.99463000000003, "r": 391.49472, "b": 249.20911, "coord_origin": "1"}}, {"id": 56, "text": "2", "bbox": {"l": 334.51135, "t": 252.93255999999997, "r": 337.33313, "b": 259.14703, "coord_origin": "1"}}, {"id": 57, "text": "- horizontal merges: \"C\", \"L\"", "bbox": {"l": 340.15491, "t": 252.93255999999997, "r": 421.98624, "b": 259.14703, "coord_origin": "1"}}, {"id": 58, "text": "3", "bbox": {"l": 334.51135, "t": 262.87048000000004, "r": 337.29868, "b": 269.08496, "coord_origin": "1"}}, {"id": 59, "text": "- vertical merges: \"C\", \"U\"", "bbox": {"l": 340.086, "t": 262.87048000000004, "r": 415.34375, "b": 269.08496, "coord_origin": "1"}}, {"id": 60, 
"text": "4", "bbox": {"l": 334.51135, "t": 272.80841, "r": 337.30188, "b": 279.02288999999996, "coord_origin": "1"}}, {"id": 61, "text": "- 2d merges: \"C\", \"L\", \"U\", \"X\"", "bbox": {"l": 340.09241, "t": 272.80841, "r": 426.59875, "b": 279.02288999999996, "coord_origin": "1"}}, {"id": 62, "text": "1", "bbox": {"l": 185.67178, "t": 244.04224, "r": 189.35544, "b": 250.25671, "coord_origin": "1"}}, {"id": 63, "text": "2", "bbox": {"l": 185.96759, "t": 268.34766, "r": 189.65125, "b": 274.56213, "coord_origin": "1"}}, {"id": 64, "text": "3", "bbox": {"l": 239.34152, "t": 243.62523999999996, "r": 243.02518, "b": 249.83972000000006, "coord_origin": "1"}}, {"id": 65, "text": "4", "bbox": {"l": 271.32852, "t": 243.49390000000005, "r": 275.01218, "b": 249.70836999999995, "coord_origin": "1"}}, {"id": 66, "text": "2", "bbox": {"l": 229.81627, "t": 166.51495, "r": 233.49992000000003, "b": 172.72942999999998, "coord_origin": "1"}}, {"id": 67, "text": "1", "bbox": {"l": 257.24402, "t": 189.961, "r": 260.92767, "b": 196.17548, "coord_origin": "1"}}, {"id": 68, "text": "3", "bbox": {"l": 186.87526, "t": 177.97668, "r": 190.55891, "b": 184.19115999999997, "coord_origin": "1"}}, {"id": 69, "text": "4", "bbox": {"l": 196.48746, "t": 169.01520000000005, "r": 200.17111, "b": 175.22968000000003, "coord_origin": "1"}}, {"id": 70, "text": "A", "bbox": {"l": 169.74728, "t": 167.88225999999997, "r": 175.72659, "b": 175.65039000000002, "coord_origin": "1"}}, {"id": 71, "text": "B", "bbox": {"l": 169.74728, "t": 206.83867999999995, "r": 175.72659, "b": 214.60681, "coord_origin": "1"}}, {"id": 72, "text": "C", "bbox": {"l": 274.29419, "t": 168.27972, "r": 280.2735, "b": 176.04785000000004, "coord_origin": "1"}}, {"id": 73, "text": "D", "bbox": {"l": 359.56152, "t": 168.27972, "r": 365.54083, "b": 176.04785000000004, "coord_origin": "1"}}, {"id": 74, "text": "E", "bbox": {"l": 169.74728, "t": 243.21149000000003, "r": 175.27112, "b": 250.97960999999998, "coord_origin": "1"}}, {"id": 75, 
"text": "4.2", "bbox": {"l": 134.765, "t": 305.29581, "r": 149.40205, "b": 314.10275, "coord_origin": "1"}}, {"id": 76, "text": "Language Syntax", "bbox": {"l": 160.85904, "t": 305.29581, "r": 246.65197999999998, "b": 314.10275, "coord_origin": "1"}}, {"id": 77, "text": "The OTSL representation follows these syntax rules:", "bbox": {"l": 134.765, "t": 325.24777, "r": 363.79617, "b": 334.04474, "coord_origin": "1"}}, {"id": 78, "text": "1.", "bbox": {"l": 138.97299, "t": 347.18079, "r": 146.71991, "b": 355.97775, "coord_origin": "1"}}, {"id": 79, "text": "Left-looking cell rule", "bbox": {"l": 151.70099, "t": 347.17081, "r": 257.37927, "b": 355.97775, "coord_origin": "1"}}, {"id": 80, "text": ": The left neighbour of an \"L\" cell must be either", "bbox": {"l": 257.383, "t": 347.18079, "r": 480.58902, "b": 355.97775, "coord_origin": "1"}}, {"id": 81, "text": "another \"L\" cell or a \"C\" cell.", "bbox": {"l": 151.70099, "t": 359.13678, "r": 283.59387, "b": 367.93375, "coord_origin": "1"}}, {"id": 82, "text": "2.", "bbox": {"l": 138.97299, "t": 371.09479, "r": 146.71991, "b": 379.89175, "coord_origin": "1"}}, {"id": 83, "text": "Up-looking cell rule", "bbox": {"l": 151.70099, "t": 371.08481, "r": 252.11203, "b": 379.89175, "coord_origin": "1"}}, {"id": 84, "text": ": The upper neighbour of a \"U\" cell must be either", "bbox": {"l": 252.112, "t": 371.09479, "r": 480.59229000000005, "b": 379.89175, "coord_origin": "1"}}, {"id": 85, "text": "another \"U\" cell or a \"C\" cell.", "bbox": {"l": 151.70099, "t": 383.04977, "r": 284.8392, "b": 391.84673999999995, "coord_origin": "1"}}, {"id": 86, "text": "3.", "bbox": {"l": 138.97299, "t": 395.0077800000001, "r": 146.71991, "b": 403.80475, "coord_origin": "1"}}, {"id": 87, "text": "Cross cell rule", "bbox": {"l": 151.70099, "t": 394.99780000000004, "r": 223.3042, "b": 403.80475, "coord_origin": "1"}}, {"id": 88, "text": ":", "bbox": {"l": 223.30699, "t": 395.0077800000001, "r": 226.07360999999997, "b": 403.80475, 
"coord_origin": "1"}}, {"id": 89, "text": "The left neighbour of an \"X\" cell must be either another \"X\" cell or a \"U\"", "bbox": {"l": 151.70099, "t": 406.96677, "r": 480.59238, "b": 415.76373, "coord_origin": "1"}}, {"id": 90, "text": "cell, and the upper neighbour of an \"X\" cell must be either another \"X\" cell", "bbox": {"l": 151.70099, "t": 418.9217499999999, "r": 480.59219, "b": 427.71871999999996, "coord_origin": "1"}}, {"id": 91, "text": "or an \"L\" cell.", "bbox": {"l": 151.70099, "t": 430.87674, "r": 214.39663999999996, "b": 439.67371, "coord_origin": "1"}}, {"id": 92, "text": "4.", "bbox": {"l": 138.97299, "t": 442.83572, "r": 146.71991, "b": 451.63269, "coord_origin": "1"}}, {"id": 93, "text": "First row rule", "bbox": {"l": 151.70099, "t": 442.82574, "r": 221.32263, "b": 451.63269, "coord_origin": "1"}}, {"id": 94, "text": ": Only \"L\" cells and \"C\" cells are allowed in the first row.", "bbox": {"l": 221.32700000000003, "t": 442.83572, "r": 474.59018, "b": 451.63269, "coord_origin": "1"}}, {"id": 95, "text": "5.", "bbox": {"l": 138.97299, "t": 454.7937299999999, "r": 146.71991, "b": 463.5907, "coord_origin": "1"}}, {"id": 96, "text": "First column rule", "bbox": {"l": 151.70099, "t": 454.78375, "r": 240.71982, "b": 463.5907, "coord_origin": "1"}}, {"id": 97, "text": ": Only \"U\" cells and \"C\" cells are allowed in the first", "bbox": {"l": 240.71599, "t": 454.7937299999999, "r": 480.58746, "b": 463.5907, "coord_origin": "1"}}, {"id": 98, "text": "column.", "bbox": {"l": 151.70099, "t": 466.74872, "r": 186.0072, "b": 475.54568, "coord_origin": "1"}}, {"id": 99, "text": "6.", "bbox": {"l": 138.97299, "t": 478.70673, "r": 146.71991, "b": 487.50369, "coord_origin": "1"}}, {"id": 100, "text": "Rectangular rule", "bbox": {"l": 151.70099, "t": 478.69675, "r": 235.15768, "b": 487.50369, "coord_origin": "1"}}, {"id": 101, "text": ": The table representation is always rectangular - all rows", "bbox": {"l": 235.15697999999998, "t": 478.70673, "r": 
480.59457, "b": 487.50369, "coord_origin": "1"}}, {"id": 102, "text": "must have an equal number of tokens, terminated with \"NL\" token.", "bbox": {"l": 151.70099, "t": 490.66272, "r": 448.04147, "b": 499.45969, "coord_origin": "1"}}, {"id": 103, "text": "The application of these rules gives OTSL a set of unique properties. First", "bbox": {"l": 149.70898, "t": 512.59271, "r": 480.59583, "b": 521.38968, "coord_origin": "1"}}, {"id": 104, "text": "of all, the OTSL enforces a strictly rectangular structure representation, where", "bbox": {"l": 134.76498, "t": 524.5477000000001, "r": 480.59079, "b": 533.34467, "coord_origin": "1"}}, {"id": 105, "text": "every new-line token starts a new row. As a consequence, all rows and all columns", "bbox": {"l": 134.76498, "t": 536.5027, "r": 480.59482, "b": 545.29967, "coord_origin": "1"}}, {"id": 106, "text": "have exactly the same number of tokens, irrespective of cell spans. Secondly, the", "bbox": {"l": 134.76498, "t": 548.4586899999999, "r": 480.58865000000003, "b": 557.25566, "coord_origin": "1"}}, {"id": 107, "text": "OTSL representation is unambiguous: Every table structure is represented in one", "bbox": {"l": 134.76498, "t": 560.4137000000001, "r": 480.59365999999994, "b": 569.21066, "coord_origin": "1"}}, {"id": 108, "text": "way. In this representation every table cell corresponds to a \"C\"-cell token, which", "bbox": {"l": 134.76498, "t": 572.3687, "r": 480.58673, "b": 581.16566, "coord_origin": "1"}}, {"id": 109, "text": "in case of spans is always located in the top-left corner of the table cell definition.", "bbox": {"l": 134.76498, "t": 584.3237, "r": 480.59171, "b": 593.12067, "coord_origin": "1"}}, {"id": 110, "text": "Third, OTSL syntax rules are only backward-looking. 
As a consequence, every", "bbox": {"l": 134.76498, "t": 596.2787, "r": 480.59180000000003, "b": 605.07567, "coord_origin": "1"}}, {"id": 111, "text": "predicted token can be validated straight during sequence generation by looking", "bbox": {"l": 134.76498, "t": 608.2347, "r": 480.5936899999999, "b": 617.03166, "coord_origin": "1"}}, {"id": 112, "text": "at the previously predicted sequence. As such, OTSL can guarantee that every", "bbox": {"l": 134.76498, "t": 620.1897, "r": 480.59072999999995, "b": 628.98666, "coord_origin": "1"}}, {"id": 113, "text": "predicted sequence is syntactically valid.", "bbox": {"l": 134.76498, "t": 632.1447000000001, "r": 311.19769, "b": 640.9416699999999, "coord_origin": "1"}}, {"id": 114, "text": "These characteristics can be easily learned by sequence generator networks,", "bbox": {"l": 149.70898, "t": 644.1026899999999, "r": 480.59186, "b": 652.89966, "coord_origin": "1"}}, {"id": 115, "text": "as we demonstrate further below. We find strong indications that this pattern", "bbox": {"l": 134.76498, "t": 656.05769, "r": 480.59265, "b": 664.8546699999999, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 193.9747784614563, "t": 93.12438640594485, "r": 447.54291000000006, "b": 102.22475852966306, "coord_origin": "1"}, "confidence": 0.9503458738327026, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 475.39760971069336, "t": 93.39061431884761, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8631047010421753, "cells": [{"id": 1, "text": "7", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "Caption", "bbox": {"l": 133.8881320953369, 
"t": 124.88460788726809, "r": 480.5874, "b": 156.37949838638303, "coord_origin": "1"}, "confidence": 0.9609795808792114, "cells": [{"id": 2, "text": "Fig. 3.", "bbox": {"l": 134.765, "t": 125.79918999999984, "r": 162.64424, "b": 133.72551999999996, "coord_origin": "1"}}, {"id": 3, "text": "OTSL description of table structure: A - table example; B - graphical repre-", "bbox": {"l": 166.276, "t": 125.86200000000008, "r": 480.58675999999997, "b": 133.93169999999998, "coord_origin": "1"}}, {"id": 4, "text": "sentation of table structure; C - mapping structure on a grid; D - OTSL structure", "bbox": {"l": 134.765, "t": 136.82097999999996, "r": 480.5874, "b": 144.89068999999995, "coord_origin": "1"}}, {"id": 5, "text": "encoding; E - explanation on cell encoding", "bbox": {"l": 134.765, "t": 147.77997000000005, "r": 306.1156, "b": 155.84966999999995, "coord_origin": "1"}}]}, {"id": 3, "label": "Picture", "bbox": {"l": 164.22023735046386, "t": 163.8766965866089, "r": 448.976096534729, "b": 280.3828954696655, "coord_origin": "1"}, "confidence": 0.9576331973075867, "cells": [{"id": 6, "text": "C", "bbox": {"l": 374.49326, "t": 168.59362999999996, "r": 381.66843, "b": 177.91540999999995, "coord_origin": "1"}}, {"id": 7, "text": "C", "bbox": {"l": 398.74011, "t": 168.50005999999996, "r": 405.91528, "b": 177.82183999999995, "coord_origin": "1"}}, {"id": 8, "text": "C", "bbox": {"l": 373.76862, "t": 192.92553999999996, "r": 380.94379, "b": 202.24730999999997, "coord_origin": "1"}}, {"id": 9, "text": "C", "bbox": {"l": 386.66388, "t": 193.07061999999996, "r": 393.83905, "b": 202.39239999999995, "coord_origin": "1"}}, {"id": 10, "text": "C", "bbox": {"l": 386.68707, "t": 205.13756999999998, "r": 393.86224, "b": 214.45934999999997, "coord_origin": "1"}}, {"id": 11, "text": "C", "bbox": {"l": 398.65729, "t": 180.73279000000002, "r": 405.83246, "b": 190.05457, "coord_origin": "1"}}, {"id": 12, "text": "C", "bbox": {"l": 410.77908, "t": 180.73859000000004, "r": 417.95425, "b": 
190.06035999999995, "coord_origin": "1"}}, {"id": 13, "text": "C", "bbox": {"l": 422.90503, "t": 180.65247, "r": 430.08020000000005, "b": 189.97424, "coord_origin": "1"}}, {"id": 14, "text": "C", "bbox": {"l": 398.7807, "t": 192.98865, "r": 405.95587, "b": 202.31042000000002, "coord_origin": "1"}}, {"id": 15, "text": "C", "bbox": {"l": 410.90164, "t": 192.99487, "r": 418.07681, "b": 202.31664999999998, "coord_origin": "1"}}, {"id": 16, "text": "C", "bbox": {"l": 423.02753, "t": 192.909, "r": 430.2027, "b": 202.23077, "coord_origin": "1"}}, {"id": 17, "text": "C", "bbox": {"l": 398.78235, "t": 205.31573000000003, "r": 405.95752, "b": 214.63751000000002, "coord_origin": "1"}}, {"id": 18, "text": "C", "bbox": {"l": 410.90414, "t": 205.32196, "r": 418.07932, "b": 214.64373999999998, "coord_origin": "1"}}, {"id": 19, "text": "C", "bbox": {"l": 423.03003, "t": 205.23614999999995, "r": 430.20520000000005, "b": 214.55791999999997, "coord_origin": "1"}}, {"id": 20, "text": "C", "bbox": {"l": 386.50574, "t": 217.03882, "r": 393.68091, "b": 226.36059999999998, "coord_origin": "1"}}, {"id": 21, "text": "C", "bbox": {"l": 398.60181, "t": 217.21704, "r": 405.77698, "b": 226.53882, "coord_origin": "1"}}, {"id": 22, "text": "C", "bbox": {"l": 410.72275, "t": 217.22321, "r": 417.89792, "b": 226.54498, "coord_origin": "1"}}, {"id": 23, "text": "C", "bbox": {"l": 422.84869, "t": 217.13738999999998, "r": 430.02386, "b": 226.45916999999997, "coord_origin": "1"}}, {"id": 24, "text": "NL", "bbox": {"l": 435.16009999999994, "t": 167.69011999999998, "r": 447.86273, "b": 177.01189999999997, "coord_origin": "1"}}, {"id": 25, "text": "NL", "bbox": {"l": 435.44415, "t": 180.20025999999996, "r": 448.14679, "b": 189.52202999999997, "coord_origin": "1"}}, {"id": 26, "text": "NL", "bbox": {"l": 435.46735, "t": 192.49474999999995, "r": 448.16998000000007, "b": 201.81652999999994, "coord_origin": "1"}}, {"id": 27, "text": "NL", "bbox": {"l": 435.38202, "t": 204.83025999999995, "r": 448.08466, "b": 
214.15204000000006, "coord_origin": "1"}}, {"id": 28, "text": "NL", "bbox": {"l": 435.59906, "t": 217.2337, "r": 448.3017, "b": 226.55548, "coord_origin": "1"}}, {"id": 29, "text": "U", "bbox": {"l": 374.14957, "t": 205.23492, "r": 381.32474, "b": 214.55669999999998, "coord_origin": "1"}}, {"id": 30, "text": "U", "bbox": {"l": 374.0419, "t": 217.14648, "r": 381.21707, "b": 226.46826, "coord_origin": "1"}}, {"id": 31, "text": "U", "bbox": {"l": 374.34418, "t": 180.93488000000002, "r": 381.51935, "b": 190.25665000000004, "coord_origin": "1"}}, {"id": 32, "text": "L", "bbox": {"l": 387.76285, "t": 168.57788000000005, "r": 393.28833, "b": 177.89966000000004, "coord_origin": "1"}}, {"id": 33, "text": "L", "bbox": {"l": 411.86395, "t": 168.06195000000002, "r": 417.38943, "b": 177.38373, "coord_origin": "1"}}, {"id": 34, "text": "L", "bbox": {"l": 423.33563, "t": 167.93439, "r": 428.86111, "b": 177.25616000000002, "coord_origin": "1"}}, {"id": 35, "text": "X", "bbox": {"l": 387.13593, "t": 180.78576999999996, "r": 393.76453, "b": 190.10753999999997, "coord_origin": "1"}}, {"id": 36, "text": "C", "bbox": {"l": 282.2594, "t": 244.50878999999998, "r": 289.43457, "b": 253.83056999999997, "coord_origin": "1"}}, {"id": 37, "text": "U", "bbox": {"l": 282.11035, "t": 256.85022000000004, "r": 289.28552, "b": 266.172, "coord_origin": "1"}}, {"id": 38, "text": "U", "bbox": {"l": 282.40848, "t": 269.13300000000004, "r": 289.58365, "b": 278.45477000000005, "coord_origin": "1"}}, {"id": 39, "text": "L", "bbox": {"l": 295.52902, "t": 244.49347, "r": 301.0545, "b": 253.81525, "coord_origin": "1"}}, {"id": 40, "text": "L", "bbox": {"l": 307.46613, "t": 244.57372999999995, "r": 312.99161, "b": 253.89550999999994, "coord_origin": "1"}}, {"id": 41, "text": "L", "bbox": {"l": 318.76886, "t": 244.44037000000003, "r": 324.29434, "b": 253.76215000000002, "coord_origin": "1"}}, {"id": 42, "text": "X", "bbox": {"l": 294.9021, "t": 256.70154, "r": 301.03976, "b": 266.02332, "coord_origin": "1"}}, 
{"id": 43, "text": "X X", "bbox": {"l": 307.17743, "t": 256.70154, "r": 325.59039, "b": 266.02332, "coord_origin": "1"}}, {"id": 44, "text": "X", "bbox": {"l": 294.78949, "t": 269.25420999999994, "r": 300.92715, "b": 278.57599000000005, "coord_origin": "1"}}, {"id": 45, "text": "X X", "bbox": {"l": 307.06482, "t": 269.25420999999994, "r": 325.47778, "b": 278.57599000000005, "coord_origin": "1"}}, {"id": 46, "text": "C", "bbox": {"l": 195.93939, "t": 268.74798999999996, "r": 203.11456, "b": 278.06976, "coord_origin": "1"}}, {"id": 47, "text": "L", "bbox": {"l": 209.20891, "t": 268.73267, "r": 214.73439, "b": 278.05444, "coord_origin": "1"}}, {"id": 48, "text": "L", "bbox": {"l": 221.14551, "t": 268.81293000000005, "r": 226.67099, "b": 278.13469999999995, "coord_origin": "1"}}, {"id": 49, "text": "L", "bbox": {"l": 232.44858, "t": 268.67957, "r": 237.97405999999998, "b": 278.00134, "coord_origin": "1"}}, {"id": 50, "text": "C", "bbox": {"l": 196.21715, "t": 244.53961000000004, "r": 203.39232, "b": 253.86139000000003, "coord_origin": "1"}}, {"id": 51, "text": "C", "bbox": {"l": 250.32143, "t": 244.09813999999994, "r": 257.49661, "b": 253.41992000000005, "coord_origin": "1"}}, {"id": 52, "text": "U", "bbox": {"l": 250.17235999999997, "t": 256.43951000000004, "r": 257.34753, "b": 265.76129000000003, "coord_origin": "1"}}, {"id": 53, "text": "U", "bbox": {"l": 250.47049000000004, "t": 268.72222999999997, "r": 257.64566, "b": 278.04400999999996, "coord_origin": "1"}}, {"id": 54, "text": "1", "bbox": {"l": 334.51135, "t": 242.99463000000003, "r": 337.22485, "b": 249.20911, "coord_origin": "1"}}, {"id": 55, "text": "- simple cells: \"C\"", "bbox": {"l": 339.93835, "t": 242.99463000000003, "r": 391.49472, "b": 249.20911, "coord_origin": "1"}}, {"id": 56, "text": "2", "bbox": {"l": 334.51135, "t": 252.93255999999997, "r": 337.33313, "b": 259.14703, "coord_origin": "1"}}, {"id": 57, "text": "- horizontal merges: \"C\", \"L\"", "bbox": {"l": 340.15491, "t": 252.93255999999997, 
"r": 421.98624, "b": 259.14703, "coord_origin": "1"}}, {"id": 58, "text": "3", "bbox": {"l": 334.51135, "t": 262.87048000000004, "r": 337.29868, "b": 269.08496, "coord_origin": "1"}}, {"id": 59, "text": "- vertical merges: \"C\", \"U\"", "bbox": {"l": 340.086, "t": 262.87048000000004, "r": 415.34375, "b": 269.08496, "coord_origin": "1"}}, {"id": 60, "text": "4", "bbox": {"l": 334.51135, "t": 272.80841, "r": 337.30188, "b": 279.02288999999996, "coord_origin": "1"}}, {"id": 61, "text": "- 2d merges: \"C\", \"L\", \"U\", \"X\"", "bbox": {"l": 340.09241, "t": 272.80841, "r": 426.59875, "b": 279.02288999999996, "coord_origin": "1"}}, {"id": 62, "text": "1", "bbox": {"l": 185.67178, "t": 244.04224, "r": 189.35544, "b": 250.25671, "coord_origin": "1"}}, {"id": 63, "text": "2", "bbox": {"l": 185.96759, "t": 268.34766, "r": 189.65125, "b": 274.56213, "coord_origin": "1"}}, {"id": 64, "text": "3", "bbox": {"l": 239.34152, "t": 243.62523999999996, "r": 243.02518, "b": 249.83972000000006, "coord_origin": "1"}}, {"id": 65, "text": "4", "bbox": {"l": 271.32852, "t": 243.49390000000005, "r": 275.01218, "b": 249.70836999999995, "coord_origin": "1"}}, {"id": 66, "text": "2", "bbox": {"l": 229.81627, "t": 166.51495, "r": 233.49992000000003, "b": 172.72942999999998, "coord_origin": "1"}}, {"id": 67, "text": "1", "bbox": {"l": 257.24402, "t": 189.961, "r": 260.92767, "b": 196.17548, "coord_origin": "1"}}, {"id": 68, "text": "3", "bbox": {"l": 186.87526, "t": 177.97668, "r": 190.55891, "b": 184.19115999999997, "coord_origin": "1"}}, {"id": 69, "text": "4", "bbox": {"l": 196.48746, "t": 169.01520000000005, "r": 200.17111, "b": 175.22968000000003, "coord_origin": "1"}}, {"id": 70, "text": "A", "bbox": {"l": 169.74728, "t": 167.88225999999997, "r": 175.72659, "b": 175.65039000000002, "coord_origin": "1"}}, {"id": 71, "text": "B", "bbox": {"l": 169.74728, "t": 206.83867999999995, "r": 175.72659, "b": 214.60681, "coord_origin": "1"}}, {"id": 72, "text": "C", "bbox": {"l": 274.29419, "t": 
168.27972, "r": 280.2735, "b": 176.04785000000004, "coord_origin": "1"}}, {"id": 73, "text": "D", "bbox": {"l": 359.56152, "t": 168.27972, "r": 365.54083, "b": 176.04785000000004, "coord_origin": "1"}}, {"id": 74, "text": "E", "bbox": {"l": 169.74728, "t": 243.21149000000003, "r": 175.27112, "b": 250.97960999999998, "coord_origin": "1"}}]}, {"id": 4, "label": "Section-header", "bbox": {"l": 134.28743534088133, "t": 304.4804964065552, "r": 246.78787822723388, "b": 314.2943344116211, "coord_origin": "1"}, "confidence": 0.9464288949966431, "cells": [{"id": 75, "text": "4.2", "bbox": {"l": 134.765, "t": 305.29581, "r": 149.40205, "b": 314.10275, "coord_origin": "1"}}, {"id": 76, "text": "Language Syntax", "bbox": {"l": 160.85904, "t": 305.29581, "r": 246.65197999999998, "b": 314.10275, "coord_origin": "1"}}]}, {"id": 5, "label": "Text", "bbox": {"l": 134.2309673309326, "t": 324.43218154907225, "r": 363.79617, "b": 334.19743423461915, "coord_origin": "1"}, "confidence": 0.9260643720626831, "cells": [{"id": 77, "text": "The OTSL representation follows these syntax rules:", "bbox": {"l": 134.765, "t": 325.24777, "r": 363.79617, "b": 334.04474, "coord_origin": "1"}}]}, {"id": 6, "label": "List-item", "bbox": {"l": 138.97299, "t": 346.1299736022949, "r": 480.58902, "b": 367.93375, "coord_origin": "1"}, "confidence": 0.9650311470031738, "cells": [{"id": 78, "text": "1.", "bbox": {"l": 138.97299, "t": 347.18079, "r": 146.71991, "b": 355.97775, "coord_origin": "1"}}, {"id": 79, "text": "Left-looking cell rule", "bbox": {"l": 151.70099, "t": 347.17081, "r": 257.37927, "b": 355.97775, "coord_origin": "1"}}, {"id": 80, "text": ": The left neighbour of an \"L\" cell must be either", "bbox": {"l": 257.383, "t": 347.18079, "r": 480.58902, "b": 355.97775, "coord_origin": "1"}}, {"id": 81, "text": "another \"L\" cell or a \"C\" cell.", "bbox": {"l": 151.70099, "t": 359.13678, "r": 283.59387, "b": 367.93375, "coord_origin": "1"}}]}, {"id": 7, "label": "List-item", "bbox": {"l": 
138.19280376434327, "t": 370.04180603027345, "r": 480.59229000000005, "b": 391.84673999999995, "coord_origin": "1"}, "confidence": 0.9545656442642212, "cells": [{"id": 82, "text": "2.", "bbox": {"l": 138.97299, "t": 371.09479, "r": 146.71991, "b": 379.89175, "coord_origin": "1"}}, {"id": 83, "text": "Up-looking cell rule", "bbox": {"l": 151.70099, "t": 371.08481, "r": 252.11203, "b": 379.89175, "coord_origin": "1"}}, {"id": 84, "text": ": The upper neighbour of a \"U\" cell must be either", "bbox": {"l": 252.112, "t": 371.09479, "r": 480.59229000000005, "b": 379.89175, "coord_origin": "1"}}, {"id": 85, "text": "another \"U\" cell or a \"C\" cell.", "bbox": {"l": 151.70099, "t": 383.04977, "r": 284.8392, "b": 391.84673999999995, "coord_origin": "1"}}]}, {"id": 8, "label": "Section-header", "bbox": {"l": 138.0652765274048, "t": 394.5083381652832, "r": 226.07360999999997, "b": 403.80475, "coord_origin": "1"}, "confidence": 0.729752779006958, "cells": [{"id": 86, "text": "3.", "bbox": {"l": 138.97299, "t": 395.0077800000001, "r": 146.71991, "b": 403.80475, "coord_origin": "1"}}, {"id": 87, "text": "Cross cell rule", "bbox": {"l": 151.70099, "t": 394.99780000000004, "r": 223.3042, "b": 403.80475, "coord_origin": "1"}}, {"id": 88, "text": ":", "bbox": {"l": 223.30699, "t": 395.0077800000001, "r": 226.07360999999997, "b": 403.80475, "coord_origin": "1"}}]}, {"id": 9, "label": "List-item", "bbox": {"l": 146.40036334991456, "t": 395.0077800000001, "r": 480.59238, "b": 439.67371, "coord_origin": "1"}, "confidence": 0.7146523594856262, "cells": [{"id": 88, "text": ":", "bbox": {"l": 223.30699, "t": 395.0077800000001, "r": 226.07360999999997, "b": 403.80475, "coord_origin": "1"}}, {"id": 89, "text": "The left neighbour of an \"X\" cell must be either another \"X\" cell or a \"U\"", "bbox": {"l": 151.70099, "t": 406.96677, "r": 480.59238, "b": 415.76373, "coord_origin": "1"}}, {"id": 90, "text": "cell, and the upper neighbour of an \"X\" cell must be either another \"X\" cell", 
"bbox": {"l": 151.70099, "t": 418.9217499999999, "r": 480.59219, "b": 427.71871999999996, "coord_origin": "1"}}, {"id": 91, "text": "or an \"L\" cell.", "bbox": {"l": 151.70099, "t": 430.87674, "r": 214.39663999999996, "b": 439.67371, "coord_origin": "1"}}]}, {"id": 10, "label": "List-item", "bbox": {"l": 138.3949067115784, "t": 442.1132652282715, "r": 474.59018, "b": 452.20458526611327, "coord_origin": "1"}, "confidence": 0.921468198299408, "cells": [{"id": 92, "text": "4.", "bbox": {"l": 138.97299, "t": 442.83572, "r": 146.71991, "b": 451.63269, "coord_origin": "1"}}, {"id": 93, "text": "First row rule", "bbox": {"l": 151.70099, "t": 442.82574, "r": 221.32263, "b": 451.63269, "coord_origin": "1"}}, {"id": 94, "text": ": Only \"L\" cells and \"C\" cells are allowed in the first row.", "bbox": {"l": 221.32700000000003, "t": 442.83572, "r": 474.59018, "b": 451.63269, "coord_origin": "1"}}]}, {"id": 11, "label": "List-item", "bbox": {"l": 138.3254817008972, "t": 453.90531692504885, "r": 480.58746, "b": 475.54568, "coord_origin": "1"}, "confidence": 0.9438549280166626, "cells": [{"id": 95, "text": "5.", "bbox": {"l": 138.97299, "t": 454.7937299999999, "r": 146.71991, "b": 463.5907, "coord_origin": "1"}}, {"id": 96, "text": "First column rule", "bbox": {"l": 151.70099, "t": 454.78375, "r": 240.71982, "b": 463.5907, "coord_origin": "1"}}, {"id": 97, "text": ": Only \"U\" cells and \"C\" cells are allowed in the first", "bbox": {"l": 240.71599, "t": 454.7937299999999, "r": 480.58746, "b": 463.5907, "coord_origin": "1"}}, {"id": 98, "text": "column.", "bbox": {"l": 151.70099, "t": 466.74872, "r": 186.0072, "b": 475.54568, "coord_origin": "1"}}]}, {"id": 12, "label": "List-item", "bbox": {"l": 138.22427701950073, "t": 477.50852966308594, "r": 480.59457, "b": 499.71804428100586, "coord_origin": "1"}, "confidence": 0.9670841693878174, "cells": [{"id": 99, "text": "6.", "bbox": {"l": 138.97299, "t": 478.70673, "r": 146.71991, "b": 487.50369, "coord_origin": "1"}}, {"id": 100, 
"text": "Rectangular rule", "bbox": {"l": 151.70099, "t": 478.69675, "r": 235.15768, "b": 487.50369, "coord_origin": "1"}}, {"id": 101, "text": ": The table representation is always rectangular - all rows", "bbox": {"l": 235.15697999999998, "t": 478.70673, "r": 480.59457, "b": 487.50369, "coord_origin": "1"}}, {"id": 102, "text": "must have an equal number of tokens, terminated with \"NL\" token.", "bbox": {"l": 151.70099, "t": 490.66272, "r": 448.04147, "b": 499.45969, "coord_origin": "1"}}]}, {"id": 13, "label": "Text", "bbox": {"l": 133.61584539413454, "t": 511.45877265930176, "r": 480.59583, "b": 642.2503395080566, "coord_origin": "1"}, "confidence": 0.9846742153167725, "cells": [{"id": 103, "text": "The application of these rules gives OTSL a set of unique properties. First", "bbox": {"l": 149.70898, "t": 512.59271, "r": 480.59583, "b": 521.38968, "coord_origin": "1"}}, {"id": 104, "text": "of all, the OTSL enforces a strictly rectangular structure representation, where", "bbox": {"l": 134.76498, "t": 524.5477000000001, "r": 480.59079, "b": 533.34467, "coord_origin": "1"}}, {"id": 105, "text": "every new-line token starts a new row. As a consequence, all rows and all columns", "bbox": {"l": 134.76498, "t": 536.5027, "r": 480.59482, "b": 545.29967, "coord_origin": "1"}}, {"id": 106, "text": "have exactly the same number of tokens, irrespective of cell spans. Secondly, the", "bbox": {"l": 134.76498, "t": 548.4586899999999, "r": 480.58865000000003, "b": 557.25566, "coord_origin": "1"}}, {"id": 107, "text": "OTSL representation is unambiguous: Every table structure is represented in one", "bbox": {"l": 134.76498, "t": 560.4137000000001, "r": 480.59365999999994, "b": 569.21066, "coord_origin": "1"}}, {"id": 108, "text": "way. 
In this representation every table cell corresponds to a \"C\"-cell token, which", "bbox": {"l": 134.76498, "t": 572.3687, "r": 480.58673, "b": 581.16566, "coord_origin": "1"}}, {"id": 109, "text": "in case of spans is always located in the top-left corner of the table cell definition.", "bbox": {"l": 134.76498, "t": 584.3237, "r": 480.59171, "b": 593.12067, "coord_origin": "1"}}, {"id": 110, "text": "Third, OTSL syntax rules are only backward-looking. As a consequence, every", "bbox": {"l": 134.76498, "t": 596.2787, "r": 480.59180000000003, "b": 605.07567, "coord_origin": "1"}}, {"id": 111, "text": "predicted token can be validated straight during sequence generation by looking", "bbox": {"l": 134.76498, "t": 608.2347, "r": 480.5936899999999, "b": 617.03166, "coord_origin": "1"}}, {"id": 112, "text": "at the previously predicted sequence. As such, OTSL can guarantee that every", "bbox": {"l": 134.76498, "t": 620.1897, "r": 480.59072999999995, "b": 628.98666, "coord_origin": "1"}}, {"id": 113, "text": "predicted sequence is syntactically valid.", "bbox": {"l": 134.76498, "t": 632.1447000000001, "r": 311.19769, "b": 640.9416699999999, "coord_origin": "1"}}]}, {"id": 14, "label": "Text", "bbox": {"l": 134.0440538406372, "t": 643.1018760681153, "r": 480.59265, "b": 665.0898582458495, "coord_origin": "1"}, "confidence": 0.9682873487472534, "cells": [{"id": 114, "text": "These characteristics can be easily learned by sequence generator networks,", "bbox": {"l": 149.70898, "t": 644.1026899999999, "r": 480.59186, "b": 652.89966, "coord_origin": "1"}}, {"id": 115, "text": "as we demonstrate further below. 
We find strong indications that this pattern", "bbox": {"l": 134.76498, "t": 656.05769, "r": 480.59265, "b": 664.8546699999999, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 6, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 193.9747784614563, "t": 93.12438640594485, "r": 447.54291000000006, "b": 102.22475852966306, "coord_origin": "1"}, "confidence": 0.9503458738327026, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 6, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 475.39760971069336, "t": 93.39061431884761, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8631047010421753, "cells": [{"id": 1, "text": "7", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "7"}, {"label": "Caption", "id": 2, "page_no": 6, "cluster": {"id": 2, "label": "Caption", "bbox": {"l": 133.8881320953369, "t": 124.88460788726809, "r": 480.5874, "b": 156.37949838638303, "coord_origin": "1"}, "confidence": 0.9609795808792114, "cells": [{"id": 2, "text": "Fig. 
3.", "bbox": {"l": 134.765, "t": 125.79918999999984, "r": 162.64424, "b": 133.72551999999996, "coord_origin": "1"}}, {"id": 3, "text": "OTSL description of table structure: A - table example; B - graphical repre-", "bbox": {"l": 166.276, "t": 125.86200000000008, "r": 480.58675999999997, "b": 133.93169999999998, "coord_origin": "1"}}, {"id": 4, "text": "sentation of table structure; C - mapping structure on a grid; D - OTSL structure", "bbox": {"l": 134.765, "t": 136.82097999999996, "r": 480.5874, "b": 144.89068999999995, "coord_origin": "1"}}, {"id": 5, "text": "encoding; E - explanation on cell encoding", "bbox": {"l": 134.765, "t": 147.77997000000005, "r": 306.1156, "b": 155.84966999999995, "coord_origin": "1"}}]}, "text": "Fig. 3. OTSL description of table structure: A - table example; B - graphical representation of table structure; C - mapping structure on a grid; D - OTSL structure encoding; E - explanation on cell encoding"}, {"label": "Picture", "id": 3, "page_no": 6, "cluster": {"id": 3, "label": "Picture", "bbox": {"l": 164.22023735046386, "t": 163.8766965866089, "r": 448.976096534729, "b": 280.3828954696655, "coord_origin": "1"}, "confidence": 0.9576331973075867, "cells": [{"id": 6, "text": "C", "bbox": {"l": 374.49326, "t": 168.59362999999996, "r": 381.66843, "b": 177.91540999999995, "coord_origin": "1"}}, {"id": 7, "text": "C", "bbox": {"l": 398.74011, "t": 168.50005999999996, "r": 405.91528, "b": 177.82183999999995, "coord_origin": "1"}}, {"id": 8, "text": "C", "bbox": {"l": 373.76862, "t": 192.92553999999996, "r": 380.94379, "b": 202.24730999999997, "coord_origin": "1"}}, {"id": 9, "text": "C", "bbox": {"l": 386.66388, "t": 193.07061999999996, "r": 393.83905, "b": 202.39239999999995, "coord_origin": "1"}}, {"id": 10, "text": "C", "bbox": {"l": 386.68707, "t": 205.13756999999998, "r": 393.86224, "b": 214.45934999999997, "coord_origin": "1"}}, {"id": 11, "text": "C", "bbox": {"l": 398.65729, "t": 180.73279000000002, "r": 405.83246, "b": 190.05457, 
"coord_origin": "1"}}, {"id": 12, "text": "C", "bbox": {"l": 410.77908, "t": 180.73859000000004, "r": 417.95425, "b": 190.06035999999995, "coord_origin": "1"}}, {"id": 13, "text": "C", "bbox": {"l": 422.90503, "t": 180.65247, "r": 430.08020000000005, "b": 189.97424, "coord_origin": "1"}}, {"id": 14, "text": "C", "bbox": {"l": 398.7807, "t": 192.98865, "r": 405.95587, "b": 202.31042000000002, "coord_origin": "1"}}, {"id": 15, "text": "C", "bbox": {"l": 410.90164, "t": 192.99487, "r": 418.07681, "b": 202.31664999999998, "coord_origin": "1"}}, {"id": 16, "text": "C", "bbox": {"l": 423.02753, "t": 192.909, "r": 430.2027, "b": 202.23077, "coord_origin": "1"}}, {"id": 17, "text": "C", "bbox": {"l": 398.78235, "t": 205.31573000000003, "r": 405.95752, "b": 214.63751000000002, "coord_origin": "1"}}, {"id": 18, "text": "C", "bbox": {"l": 410.90414, "t": 205.32196, "r": 418.07932, "b": 214.64373999999998, "coord_origin": "1"}}, {"id": 19, "text": "C", "bbox": {"l": 423.03003, "t": 205.23614999999995, "r": 430.20520000000005, "b": 214.55791999999997, "coord_origin": "1"}}, {"id": 20, "text": "C", "bbox": {"l": 386.50574, "t": 217.03882, "r": 393.68091, "b": 226.36059999999998, "coord_origin": "1"}}, {"id": 21, "text": "C", "bbox": {"l": 398.60181, "t": 217.21704, "r": 405.77698, "b": 226.53882, "coord_origin": "1"}}, {"id": 22, "text": "C", "bbox": {"l": 410.72275, "t": 217.22321, "r": 417.89792, "b": 226.54498, "coord_origin": "1"}}, {"id": 23, "text": "C", "bbox": {"l": 422.84869, "t": 217.13738999999998, "r": 430.02386, "b": 226.45916999999997, "coord_origin": "1"}}, {"id": 24, "text": "NL", "bbox": {"l": 435.16009999999994, "t": 167.69011999999998, "r": 447.86273, "b": 177.01189999999997, "coord_origin": "1"}}, {"id": 25, "text": "NL", "bbox": {"l": 435.44415, "t": 180.20025999999996, "r": 448.14679, "b": 189.52202999999997, "coord_origin": "1"}}, {"id": 26, "text": "NL", "bbox": {"l": 435.46735, "t": 192.49474999999995, "r": 448.16998000000007, "b": 201.81652999999994, 
"coord_origin": "1"}}, {"id": 27, "text": "NL", "bbox": {"l": 435.38202, "t": 204.83025999999995, "r": 448.08466, "b": 214.15204000000006, "coord_origin": "1"}}, {"id": 28, "text": "NL", "bbox": {"l": 435.59906, "t": 217.2337, "r": 448.3017, "b": 226.55548, "coord_origin": "1"}}, {"id": 29, "text": "U", "bbox": {"l": 374.14957, "t": 205.23492, "r": 381.32474, "b": 214.55669999999998, "coord_origin": "1"}}, {"id": 30, "text": "U", "bbox": {"l": 374.0419, "t": 217.14648, "r": 381.21707, "b": 226.46826, "coord_origin": "1"}}, {"id": 31, "text": "U", "bbox": {"l": 374.34418, "t": 180.93488000000002, "r": 381.51935, "b": 190.25665000000004, "coord_origin": "1"}}, {"id": 32, "text": "L", "bbox": {"l": 387.76285, "t": 168.57788000000005, "r": 393.28833, "b": 177.89966000000004, "coord_origin": "1"}}, {"id": 33, "text": "L", "bbox": {"l": 411.86395, "t": 168.06195000000002, "r": 417.38943, "b": 177.38373, "coord_origin": "1"}}, {"id": 34, "text": "L", "bbox": {"l": 423.33563, "t": 167.93439, "r": 428.86111, "b": 177.25616000000002, "coord_origin": "1"}}, {"id": 35, "text": "X", "bbox": {"l": 387.13593, "t": 180.78576999999996, "r": 393.76453, "b": 190.10753999999997, "coord_origin": "1"}}, {"id": 36, "text": "C", "bbox": {"l": 282.2594, "t": 244.50878999999998, "r": 289.43457, "b": 253.83056999999997, "coord_origin": "1"}}, {"id": 37, "text": "U", "bbox": {"l": 282.11035, "t": 256.85022000000004, "r": 289.28552, "b": 266.172, "coord_origin": "1"}}, {"id": 38, "text": "U", "bbox": {"l": 282.40848, "t": 269.13300000000004, "r": 289.58365, "b": 278.45477000000005, "coord_origin": "1"}}, {"id": 39, "text": "L", "bbox": {"l": 295.52902, "t": 244.49347, "r": 301.0545, "b": 253.81525, "coord_origin": "1"}}, {"id": 40, "text": "L", "bbox": {"l": 307.46613, "t": 244.57372999999995, "r": 312.99161, "b": 253.89550999999994, "coord_origin": "1"}}, {"id": 41, "text": "L", "bbox": {"l": 318.76886, "t": 244.44037000000003, "r": 324.29434, "b": 253.76215000000002, "coord_origin": "1"}}, 
{"id": 42, "text": "X", "bbox": {"l": 294.9021, "t": 256.70154, "r": 301.03976, "b": 266.02332, "coord_origin": "1"}}, {"id": 43, "text": "X X", "bbox": {"l": 307.17743, "t": 256.70154, "r": 325.59039, "b": 266.02332, "coord_origin": "1"}}, {"id": 44, "text": "X", "bbox": {"l": 294.78949, "t": 269.25420999999994, "r": 300.92715, "b": 278.57599000000005, "coord_origin": "1"}}, {"id": 45, "text": "X X", "bbox": {"l": 307.06482, "t": 269.25420999999994, "r": 325.47778, "b": 278.57599000000005, "coord_origin": "1"}}, {"id": 46, "text": "C", "bbox": {"l": 195.93939, "t": 268.74798999999996, "r": 203.11456, "b": 278.06976, "coord_origin": "1"}}, {"id": 47, "text": "L", "bbox": {"l": 209.20891, "t": 268.73267, "r": 214.73439, "b": 278.05444, "coord_origin": "1"}}, {"id": 48, "text": "L", "bbox": {"l": 221.14551, "t": 268.81293000000005, "r": 226.67099, "b": 278.13469999999995, "coord_origin": "1"}}, {"id": 49, "text": "L", "bbox": {"l": 232.44858, "t": 268.67957, "r": 237.97405999999998, "b": 278.00134, "coord_origin": "1"}}, {"id": 50, "text": "C", "bbox": {"l": 196.21715, "t": 244.53961000000004, "r": 203.39232, "b": 253.86139000000003, "coord_origin": "1"}}, {"id": 51, "text": "C", "bbox": {"l": 250.32143, "t": 244.09813999999994, "r": 257.49661, "b": 253.41992000000005, "coord_origin": "1"}}, {"id": 52, "text": "U", "bbox": {"l": 250.17235999999997, "t": 256.43951000000004, "r": 257.34753, "b": 265.76129000000003, "coord_origin": "1"}}, {"id": 53, "text": "U", "bbox": {"l": 250.47049000000004, "t": 268.72222999999997, "r": 257.64566, "b": 278.04400999999996, "coord_origin": "1"}}, {"id": 54, "text": "1", "bbox": {"l": 334.51135, "t": 242.99463000000003, "r": 337.22485, "b": 249.20911, "coord_origin": "1"}}, {"id": 55, "text": "- simple cells: \"C\"", "bbox": {"l": 339.93835, "t": 242.99463000000003, "r": 391.49472, "b": 249.20911, "coord_origin": "1"}}, {"id": 56, "text": "2", "bbox": {"l": 334.51135, "t": 252.93255999999997, "r": 337.33313, "b": 259.14703, 
"coord_origin": "1"}}, {"id": 57, "text": "- horizontal merges: \"C\", \"L\"", "bbox": {"l": 340.15491, "t": 252.93255999999997, "r": 421.98624, "b": 259.14703, "coord_origin": "1"}}, {"id": 58, "text": "3", "bbox": {"l": 334.51135, "t": 262.87048000000004, "r": 337.29868, "b": 269.08496, "coord_origin": "1"}}, {"id": 59, "text": "- vertical merges: \"C\", \"U\"", "bbox": {"l": 340.086, "t": 262.87048000000004, "r": 415.34375, "b": 269.08496, "coord_origin": "1"}}, {"id": 60, "text": "4", "bbox": {"l": 334.51135, "t": 272.80841, "r": 337.30188, "b": 279.02288999999996, "coord_origin": "1"}}, {"id": 61, "text": "- 2d merges: \"C\", \"L\", \"U\", \"X\"", "bbox": {"l": 340.09241, "t": 272.80841, "r": 426.59875, "b": 279.02288999999996, "coord_origin": "1"}}, {"id": 62, "text": "1", "bbox": {"l": 185.67178, "t": 244.04224, "r": 189.35544, "b": 250.25671, "coord_origin": "1"}}, {"id": 63, "text": "2", "bbox": {"l": 185.96759, "t": 268.34766, "r": 189.65125, "b": 274.56213, "coord_origin": "1"}}, {"id": 64, "text": "3", "bbox": {"l": 239.34152, "t": 243.62523999999996, "r": 243.02518, "b": 249.83972000000006, "coord_origin": "1"}}, {"id": 65, "text": "4", "bbox": {"l": 271.32852, "t": 243.49390000000005, "r": 275.01218, "b": 249.70836999999995, "coord_origin": "1"}}, {"id": 66, "text": "2", "bbox": {"l": 229.81627, "t": 166.51495, "r": 233.49992000000003, "b": 172.72942999999998, "coord_origin": "1"}}, {"id": 67, "text": "1", "bbox": {"l": 257.24402, "t": 189.961, "r": 260.92767, "b": 196.17548, "coord_origin": "1"}}, {"id": 68, "text": "3", "bbox": {"l": 186.87526, "t": 177.97668, "r": 190.55891, "b": 184.19115999999997, "coord_origin": "1"}}, {"id": 69, "text": "4", "bbox": {"l": 196.48746, "t": 169.01520000000005, "r": 200.17111, "b": 175.22968000000003, "coord_origin": "1"}}, {"id": 70, "text": "A", "bbox": {"l": 169.74728, "t": 167.88225999999997, "r": 175.72659, "b": 175.65039000000002, "coord_origin": "1"}}, {"id": 71, "text": "B", "bbox": {"l": 169.74728, "t": 
206.83867999999995, "r": 175.72659, "b": 214.60681, "coord_origin": "1"}}, {"id": 72, "text": "C", "bbox": {"l": 274.29419, "t": 168.27972, "r": 280.2735, "b": 176.04785000000004, "coord_origin": "1"}}, {"id": 73, "text": "D", "bbox": {"l": 359.56152, "t": 168.27972, "r": 365.54083, "b": 176.04785000000004, "coord_origin": "1"}}, {"id": 74, "text": "E", "bbox": {"l": 169.74728, "t": 243.21149000000003, "r": 175.27112, "b": 250.97960999999998, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "Section-header", "id": 4, "page_no": 6, "cluster": {"id": 4, "label": "Section-header", "bbox": {"l": 134.28743534088133, "t": 304.4804964065552, "r": 246.78787822723388, "b": 314.2943344116211, "coord_origin": "1"}, "confidence": 0.9464288949966431, "cells": [{"id": 75, "text": "4.2", "bbox": {"l": 134.765, "t": 305.29581, "r": 149.40205, "b": 314.10275, "coord_origin": "1"}}, {"id": 76, "text": "Language Syntax", "bbox": {"l": 160.85904, "t": 305.29581, "r": 246.65197999999998, "b": 314.10275, "coord_origin": "1"}}]}, "text": "4.2 Language Syntax"}, {"label": "Text", "id": 5, "page_no": 6, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 134.2309673309326, "t": 324.43218154907225, "r": 363.79617, "b": 334.19743423461915, "coord_origin": "1"}, "confidence": 0.9260643720626831, "cells": [{"id": 77, "text": "The OTSL representation follows these syntax rules:", "bbox": {"l": 134.765, "t": 325.24777, "r": 363.79617, "b": 334.04474, "coord_origin": "1"}}]}, "text": "The OTSL representation follows these syntax rules:"}, {"label": "List-item", "id": 6, "page_no": 6, "cluster": {"id": 6, "label": "List-item", "bbox": {"l": 138.97299, "t": 346.1299736022949, "r": 480.58902, "b": 367.93375, "coord_origin": "1"}, "confidence": 0.9650311470031738, "cells": [{"id": 78, "text": "1.", "bbox": {"l": 138.97299, "t": 347.18079, "r": 146.71991, "b": 355.97775, "coord_origin": "1"}}, {"id": 79, "text": 
"Left-looking cell rule", "bbox": {"l": 151.70099, "t": 347.17081, "r": 257.37927, "b": 355.97775, "coord_origin": "1"}}, {"id": 80, "text": ": The left neighbour of an \"L\" cell must be either", "bbox": {"l": 257.383, "t": 347.18079, "r": 480.58902, "b": 355.97775, "coord_origin": "1"}}, {"id": 81, "text": "another \"L\" cell or a \"C\" cell.", "bbox": {"l": 151.70099, "t": 359.13678, "r": 283.59387, "b": 367.93375, "coord_origin": "1"}}]}, "text": "1. Left-looking cell rule : The left neighbour of an \"L\" cell must be either another \"L\" cell or a \"C\" cell."}, {"label": "List-item", "id": 7, "page_no": 6, "cluster": {"id": 7, "label": "List-item", "bbox": {"l": 138.19280376434327, "t": 370.04180603027345, "r": 480.59229000000005, "b": 391.84673999999995, "coord_origin": "1"}, "confidence": 0.9545656442642212, "cells": [{"id": 82, "text": "2.", "bbox": {"l": 138.97299, "t": 371.09479, "r": 146.71991, "b": 379.89175, "coord_origin": "1"}}, {"id": 83, "text": "Up-looking cell rule", "bbox": {"l": 151.70099, "t": 371.08481, "r": 252.11203, "b": 379.89175, "coord_origin": "1"}}, {"id": 84, "text": ": The upper neighbour of a \"U\" cell must be either", "bbox": {"l": 252.112, "t": 371.09479, "r": 480.59229000000005, "b": 379.89175, "coord_origin": "1"}}, {"id": 85, "text": "another \"U\" cell or a \"C\" cell.", "bbox": {"l": 151.70099, "t": 383.04977, "r": 284.8392, "b": 391.84673999999995, "coord_origin": "1"}}]}, "text": "2. 
Up-looking cell rule : The upper neighbour of a \"U\" cell must be either another \"U\" cell or a \"C\" cell."}, {"label": "Section-header", "id": 8, "page_no": 6, "cluster": {"id": 8, "label": "Section-header", "bbox": {"l": 138.0652765274048, "t": 394.5083381652832, "r": 226.07360999999997, "b": 403.80475, "coord_origin": "1"}, "confidence": 0.729752779006958, "cells": [{"id": 86, "text": "3.", "bbox": {"l": 138.97299, "t": 395.0077800000001, "r": 146.71991, "b": 403.80475, "coord_origin": "1"}}, {"id": 87, "text": "Cross cell rule", "bbox": {"l": 151.70099, "t": 394.99780000000004, "r": 223.3042, "b": 403.80475, "coord_origin": "1"}}, {"id": 88, "text": ":", "bbox": {"l": 223.30699, "t": 395.0077800000001, "r": 226.07360999999997, "b": 403.80475, "coord_origin": "1"}}]}, "text": "3. Cross cell rule :"}, {"label": "List-item", "id": 9, "page_no": 6, "cluster": {"id": 9, "label": "List-item", "bbox": {"l": 146.40036334991456, "t": 395.0077800000001, "r": 480.59238, "b": 439.67371, "coord_origin": "1"}, "confidence": 0.7146523594856262, "cells": [{"id": 88, "text": ":", "bbox": {"l": 223.30699, "t": 395.0077800000001, "r": 226.07360999999997, "b": 403.80475, "coord_origin": "1"}}, {"id": 89, "text": "The left neighbour of an \"X\" cell must be either another \"X\" cell or a \"U\"", "bbox": {"l": 151.70099, "t": 406.96677, "r": 480.59238, "b": 415.76373, "coord_origin": "1"}}, {"id": 90, "text": "cell, and the upper neighbour of an \"X\" cell must be either another \"X\" cell", "bbox": {"l": 151.70099, "t": 418.9217499999999, "r": 480.59219, "b": 427.71871999999996, "coord_origin": "1"}}, {"id": 91, "text": "or an \"L\" cell.", "bbox": {"l": 151.70099, "t": 430.87674, "r": 214.39663999999996, "b": 439.67371, "coord_origin": "1"}}]}, "text": ": The left neighbour of an \"X\" cell must be either another \"X\" cell or a \"U\" cell, and the upper neighbour of an \"X\" cell must be either another \"X\" cell or an \"L\" cell."}, {"label": "List-item", "id": 10, "page_no": 
6, "cluster": {"id": 10, "label": "List-item", "bbox": {"l": 138.3949067115784, "t": 442.1132652282715, "r": 474.59018, "b": 452.20458526611327, "coord_origin": "1"}, "confidence": 0.921468198299408, "cells": [{"id": 92, "text": "4.", "bbox": {"l": 138.97299, "t": 442.83572, "r": 146.71991, "b": 451.63269, "coord_origin": "1"}}, {"id": 93, "text": "First row rule", "bbox": {"l": 151.70099, "t": 442.82574, "r": 221.32263, "b": 451.63269, "coord_origin": "1"}}, {"id": 94, "text": ": Only \"L\" cells and \"C\" cells are allowed in the first row.", "bbox": {"l": 221.32700000000003, "t": 442.83572, "r": 474.59018, "b": 451.63269, "coord_origin": "1"}}]}, "text": "4. First row rule : Only \"L\" cells and \"C\" cells are allowed in the first row."}, {"label": "List-item", "id": 11, "page_no": 6, "cluster": {"id": 11, "label": "List-item", "bbox": {"l": 138.3254817008972, "t": 453.90531692504885, "r": 480.58746, "b": 475.54568, "coord_origin": "1"}, "confidence": 0.9438549280166626, "cells": [{"id": 95, "text": "5.", "bbox": {"l": 138.97299, "t": 454.7937299999999, "r": 146.71991, "b": 463.5907, "coord_origin": "1"}}, {"id": 96, "text": "First column rule", "bbox": {"l": 151.70099, "t": 454.78375, "r": 240.71982, "b": 463.5907, "coord_origin": "1"}}, {"id": 97, "text": ": Only \"U\" cells and \"C\" cells are allowed in the first", "bbox": {"l": 240.71599, "t": 454.7937299999999, "r": 480.58746, "b": 463.5907, "coord_origin": "1"}}, {"id": 98, "text": "column.", "bbox": {"l": 151.70099, "t": 466.74872, "r": 186.0072, "b": 475.54568, "coord_origin": "1"}}]}, "text": "5. 
First column rule : Only \"U\" cells and \"C\" cells are allowed in the first column."}, {"label": "List-item", "id": 12, "page_no": 6, "cluster": {"id": 12, "label": "List-item", "bbox": {"l": 138.22427701950073, "t": 477.50852966308594, "r": 480.59457, "b": 499.71804428100586, "coord_origin": "1"}, "confidence": 0.9670841693878174, "cells": [{"id": 99, "text": "6.", "bbox": {"l": 138.97299, "t": 478.70673, "r": 146.71991, "b": 487.50369, "coord_origin": "1"}}, {"id": 100, "text": "Rectangular rule", "bbox": {"l": 151.70099, "t": 478.69675, "r": 235.15768, "b": 487.50369, "coord_origin": "1"}}, {"id": 101, "text": ": The table representation is always rectangular - all rows", "bbox": {"l": 235.15697999999998, "t": 478.70673, "r": 480.59457, "b": 487.50369, "coord_origin": "1"}}, {"id": 102, "text": "must have an equal number of tokens, terminated with \"NL\" token.", "bbox": {"l": 151.70099, "t": 490.66272, "r": 448.04147, "b": 499.45969, "coord_origin": "1"}}]}, "text": "6. Rectangular rule : The table representation is always rectangular - all rows must have an equal number of tokens, terminated with \"NL\" token."}, {"label": "Text", "id": 13, "page_no": 6, "cluster": {"id": 13, "label": "Text", "bbox": {"l": 133.61584539413454, "t": 511.45877265930176, "r": 480.59583, "b": 642.2503395080566, "coord_origin": "1"}, "confidence": 0.9846742153167725, "cells": [{"id": 103, "text": "The application of these rules gives OTSL a set of unique properties. First", "bbox": {"l": 149.70898, "t": 512.59271, "r": 480.59583, "b": 521.38968, "coord_origin": "1"}}, {"id": 104, "text": "of all, the OTSL enforces a strictly rectangular structure representation, where", "bbox": {"l": 134.76498, "t": 524.5477000000001, "r": 480.59079, "b": 533.34467, "coord_origin": "1"}}, {"id": 105, "text": "every new-line token starts a new row. 
As a consequence, all rows and all columns", "bbox": {"l": 134.76498, "t": 536.5027, "r": 480.59482, "b": 545.29967, "coord_origin": "1"}}, {"id": 106, "text": "have exactly the same number of tokens, irrespective of cell spans. Secondly, the", "bbox": {"l": 134.76498, "t": 548.4586899999999, "r": 480.58865000000003, "b": 557.25566, "coord_origin": "1"}}, {"id": 107, "text": "OTSL representation is unambiguous: Every table structure is represented in one", "bbox": {"l": 134.76498, "t": 560.4137000000001, "r": 480.59365999999994, "b": 569.21066, "coord_origin": "1"}}, {"id": 108, "text": "way. In this representation every table cell corresponds to a \"C\"-cell token, which", "bbox": {"l": 134.76498, "t": 572.3687, "r": 480.58673, "b": 581.16566, "coord_origin": "1"}}, {"id": 109, "text": "in case of spans is always located in the top-left corner of the table cell definition.", "bbox": {"l": 134.76498, "t": 584.3237, "r": 480.59171, "b": 593.12067, "coord_origin": "1"}}, {"id": 110, "text": "Third, OTSL syntax rules are only backward-looking. As a consequence, every", "bbox": {"l": 134.76498, "t": 596.2787, "r": 480.59180000000003, "b": 605.07567, "coord_origin": "1"}}, {"id": 111, "text": "predicted token can be validated straight during sequence generation by looking", "bbox": {"l": 134.76498, "t": 608.2347, "r": 480.5936899999999, "b": 617.03166, "coord_origin": "1"}}, {"id": 112, "text": "at the previously predicted sequence. As such, OTSL can guarantee that every", "bbox": {"l": 134.76498, "t": 620.1897, "r": 480.59072999999995, "b": 628.98666, "coord_origin": "1"}}, {"id": 113, "text": "predicted sequence is syntactically valid.", "bbox": {"l": 134.76498, "t": 632.1447000000001, "r": 311.19769, "b": 640.9416699999999, "coord_origin": "1"}}]}, "text": "The application of these rules gives OTSL a set of unique properties. First of all, the OTSL enforces a strictly rectangular structure representation, where every new-line token starts a new row. 
As a consequence, all rows and all columns have exactly the same number of tokens, irrespective of cell spans. Secondly, the OTSL representation is unambiguous: Every table structure is represented in one way. In this representation every table cell corresponds to a \"C\"-cell token, which in case of spans is always located in the top-left corner of the table cell definition. Third, OTSL syntax rules are only backward-looking. As a consequence, every predicted token can be validated straight during sequence generation by looking at the previously predicted sequence. As such, OTSL can guarantee that every predicted sequence is syntactically valid."}, {"label": "Text", "id": 14, "page_no": 6, "cluster": {"id": 14, "label": "Text", "bbox": {"l": 134.0440538406372, "t": 643.1018760681153, "r": 480.59265, "b": 665.0898582458495, "coord_origin": "1"}, "confidence": 0.9682873487472534, "cells": [{"id": 114, "text": "These characteristics can be easily learned by sequence generator networks,", "bbox": {"l": 149.70898, "t": 644.1026899999999, "r": 480.59186, "b": 652.89966, "coord_origin": "1"}}, {"id": 115, "text": "as we demonstrate further below. We find strong indications that this pattern", "bbox": {"l": 134.76498, "t": 656.05769, "r": 480.59265, "b": 664.8546699999999, "coord_origin": "1"}}]}, "text": "These characteristics can be easily learned by sequence generator networks, as we demonstrate further below. We find strong indications that this pattern"}], "body": [{"label": "Caption", "id": 2, "page_no": 6, "cluster": {"id": 2, "label": "Caption", "bbox": {"l": 133.8881320953369, "t": 124.88460788726809, "r": 480.5874, "b": 156.37949838638303, "coord_origin": "1"}, "confidence": 0.9609795808792114, "cells": [{"id": 2, "text": "Fig. 
3.", "bbox": {"l": 134.765, "t": 125.79918999999984, "r": 162.64424, "b": 133.72551999999996, "coord_origin": "1"}}, {"id": 3, "text": "OTSL description of table structure: A - table example; B - graphical repre-", "bbox": {"l": 166.276, "t": 125.86200000000008, "r": 480.58675999999997, "b": 133.93169999999998, "coord_origin": "1"}}, {"id": 4, "text": "sentation of table structure; C - mapping structure on a grid; D - OTSL structure", "bbox": {"l": 134.765, "t": 136.82097999999996, "r": 480.5874, "b": 144.89068999999995, "coord_origin": "1"}}, {"id": 5, "text": "encoding; E - explanation on cell encoding", "bbox": {"l": 134.765, "t": 147.77997000000005, "r": 306.1156, "b": 155.84966999999995, "coord_origin": "1"}}]}, "text": "Fig. 3. OTSL description of table structure: A - table example; B - graphical representation of table structure; C - mapping structure on a grid; D - OTSL structure encoding; E - explanation on cell encoding"}, {"label": "Picture", "id": 3, "page_no": 6, "cluster": {"id": 3, "label": "Picture", "bbox": {"l": 164.22023735046386, "t": 163.8766965866089, "r": 448.976096534729, "b": 280.3828954696655, "coord_origin": "1"}, "confidence": 0.9576331973075867, "cells": [{"id": 6, "text": "C", "bbox": {"l": 374.49326, "t": 168.59362999999996, "r": 381.66843, "b": 177.91540999999995, "coord_origin": "1"}}, {"id": 7, "text": "C", "bbox": {"l": 398.74011, "t": 168.50005999999996, "r": 405.91528, "b": 177.82183999999995, "coord_origin": "1"}}, {"id": 8, "text": "C", "bbox": {"l": 373.76862, "t": 192.92553999999996, "r": 380.94379, "b": 202.24730999999997, "coord_origin": "1"}}, {"id": 9, "text": "C", "bbox": {"l": 386.66388, "t": 193.07061999999996, "r": 393.83905, "b": 202.39239999999995, "coord_origin": "1"}}, {"id": 10, "text": "C", "bbox": {"l": 386.68707, "t": 205.13756999999998, "r": 393.86224, "b": 214.45934999999997, "coord_origin": "1"}}, {"id": 11, "text": "C", "bbox": {"l": 398.65729, "t": 180.73279000000002, "r": 405.83246, "b": 190.05457, 
"coord_origin": "1"}}, {"id": 12, "text": "C", "bbox": {"l": 410.77908, "t": 180.73859000000004, "r": 417.95425, "b": 190.06035999999995, "coord_origin": "1"}}, {"id": 13, "text": "C", "bbox": {"l": 422.90503, "t": 180.65247, "r": 430.08020000000005, "b": 189.97424, "coord_origin": "1"}}, {"id": 14, "text": "C", "bbox": {"l": 398.7807, "t": 192.98865, "r": 405.95587, "b": 202.31042000000002, "coord_origin": "1"}}, {"id": 15, "text": "C", "bbox": {"l": 410.90164, "t": 192.99487, "r": 418.07681, "b": 202.31664999999998, "coord_origin": "1"}}, {"id": 16, "text": "C", "bbox": {"l": 423.02753, "t": 192.909, "r": 430.2027, "b": 202.23077, "coord_origin": "1"}}, {"id": 17, "text": "C", "bbox": {"l": 398.78235, "t": 205.31573000000003, "r": 405.95752, "b": 214.63751000000002, "coord_origin": "1"}}, {"id": 18, "text": "C", "bbox": {"l": 410.90414, "t": 205.32196, "r": 418.07932, "b": 214.64373999999998, "coord_origin": "1"}}, {"id": 19, "text": "C", "bbox": {"l": 423.03003, "t": 205.23614999999995, "r": 430.20520000000005, "b": 214.55791999999997, "coord_origin": "1"}}, {"id": 20, "text": "C", "bbox": {"l": 386.50574, "t": 217.03882, "r": 393.68091, "b": 226.36059999999998, "coord_origin": "1"}}, {"id": 21, "text": "C", "bbox": {"l": 398.60181, "t": 217.21704, "r": 405.77698, "b": 226.53882, "coord_origin": "1"}}, {"id": 22, "text": "C", "bbox": {"l": 410.72275, "t": 217.22321, "r": 417.89792, "b": 226.54498, "coord_origin": "1"}}, {"id": 23, "text": "C", "bbox": {"l": 422.84869, "t": 217.13738999999998, "r": 430.02386, "b": 226.45916999999997, "coord_origin": "1"}}, {"id": 24, "text": "NL", "bbox": {"l": 435.16009999999994, "t": 167.69011999999998, "r": 447.86273, "b": 177.01189999999997, "coord_origin": "1"}}, {"id": 25, "text": "NL", "bbox": {"l": 435.44415, "t": 180.20025999999996, "r": 448.14679, "b": 189.52202999999997, "coord_origin": "1"}}, {"id": 26, "text": "NL", "bbox": {"l": 435.46735, "t": 192.49474999999995, "r": 448.16998000000007, "b": 201.81652999999994, 
"coord_origin": "1"}}, {"id": 27, "text": "NL", "bbox": {"l": 435.38202, "t": 204.83025999999995, "r": 448.08466, "b": 214.15204000000006, "coord_origin": "1"}}, {"id": 28, "text": "NL", "bbox": {"l": 435.59906, "t": 217.2337, "r": 448.3017, "b": 226.55548, "coord_origin": "1"}}, {"id": 29, "text": "U", "bbox": {"l": 374.14957, "t": 205.23492, "r": 381.32474, "b": 214.55669999999998, "coord_origin": "1"}}, {"id": 30, "text": "U", "bbox": {"l": 374.0419, "t": 217.14648, "r": 381.21707, "b": 226.46826, "coord_origin": "1"}}, {"id": 31, "text": "U", "bbox": {"l": 374.34418, "t": 180.93488000000002, "r": 381.51935, "b": 190.25665000000004, "coord_origin": "1"}}, {"id": 32, "text": "L", "bbox": {"l": 387.76285, "t": 168.57788000000005, "r": 393.28833, "b": 177.89966000000004, "coord_origin": "1"}}, {"id": 33, "text": "L", "bbox": {"l": 411.86395, "t": 168.06195000000002, "r": 417.38943, "b": 177.38373, "coord_origin": "1"}}, {"id": 34, "text": "L", "bbox": {"l": 423.33563, "t": 167.93439, "r": 428.86111, "b": 177.25616000000002, "coord_origin": "1"}}, {"id": 35, "text": "X", "bbox": {"l": 387.13593, "t": 180.78576999999996, "r": 393.76453, "b": 190.10753999999997, "coord_origin": "1"}}, {"id": 36, "text": "C", "bbox": {"l": 282.2594, "t": 244.50878999999998, "r": 289.43457, "b": 253.83056999999997, "coord_origin": "1"}}, {"id": 37, "text": "U", "bbox": {"l": 282.11035, "t": 256.85022000000004, "r": 289.28552, "b": 266.172, "coord_origin": "1"}}, {"id": 38, "text": "U", "bbox": {"l": 282.40848, "t": 269.13300000000004, "r": 289.58365, "b": 278.45477000000005, "coord_origin": "1"}}, {"id": 39, "text": "L", "bbox": {"l": 295.52902, "t": 244.49347, "r": 301.0545, "b": 253.81525, "coord_origin": "1"}}, {"id": 40, "text": "L", "bbox": {"l": 307.46613, "t": 244.57372999999995, "r": 312.99161, "b": 253.89550999999994, "coord_origin": "1"}}, {"id": 41, "text": "L", "bbox": {"l": 318.76886, "t": 244.44037000000003, "r": 324.29434, "b": 253.76215000000002, "coord_origin": "1"}}, 
{"id": 42, "text": "X", "bbox": {"l": 294.9021, "t": 256.70154, "r": 301.03976, "b": 266.02332, "coord_origin": "1"}}, {"id": 43, "text": "X X", "bbox": {"l": 307.17743, "t": 256.70154, "r": 325.59039, "b": 266.02332, "coord_origin": "1"}}, {"id": 44, "text": "X", "bbox": {"l": 294.78949, "t": 269.25420999999994, "r": 300.92715, "b": 278.57599000000005, "coord_origin": "1"}}, {"id": 45, "text": "X X", "bbox": {"l": 307.06482, "t": 269.25420999999994, "r": 325.47778, "b": 278.57599000000005, "coord_origin": "1"}}, {"id": 46, "text": "C", "bbox": {"l": 195.93939, "t": 268.74798999999996, "r": 203.11456, "b": 278.06976, "coord_origin": "1"}}, {"id": 47, "text": "L", "bbox": {"l": 209.20891, "t": 268.73267, "r": 214.73439, "b": 278.05444, "coord_origin": "1"}}, {"id": 48, "text": "L", "bbox": {"l": 221.14551, "t": 268.81293000000005, "r": 226.67099, "b": 278.13469999999995, "coord_origin": "1"}}, {"id": 49, "text": "L", "bbox": {"l": 232.44858, "t": 268.67957, "r": 237.97405999999998, "b": 278.00134, "coord_origin": "1"}}, {"id": 50, "text": "C", "bbox": {"l": 196.21715, "t": 244.53961000000004, "r": 203.39232, "b": 253.86139000000003, "coord_origin": "1"}}, {"id": 51, "text": "C", "bbox": {"l": 250.32143, "t": 244.09813999999994, "r": 257.49661, "b": 253.41992000000005, "coord_origin": "1"}}, {"id": 52, "text": "U", "bbox": {"l": 250.17235999999997, "t": 256.43951000000004, "r": 257.34753, "b": 265.76129000000003, "coord_origin": "1"}}, {"id": 53, "text": "U", "bbox": {"l": 250.47049000000004, "t": 268.72222999999997, "r": 257.64566, "b": 278.04400999999996, "coord_origin": "1"}}, {"id": 54, "text": "1", "bbox": {"l": 334.51135, "t": 242.99463000000003, "r": 337.22485, "b": 249.20911, "coord_origin": "1"}}, {"id": 55, "text": "- simple cells: \"C\"", "bbox": {"l": 339.93835, "t": 242.99463000000003, "r": 391.49472, "b": 249.20911, "coord_origin": "1"}}, {"id": 56, "text": "2", "bbox": {"l": 334.51135, "t": 252.93255999999997, "r": 337.33313, "b": 259.14703, 
"coord_origin": "1"}}, {"id": 57, "text": "- horizontal merges: \"C\", \"L\"", "bbox": {"l": 340.15491, "t": 252.93255999999997, "r": 421.98624, "b": 259.14703, "coord_origin": "1"}}, {"id": 58, "text": "3", "bbox": {"l": 334.51135, "t": 262.87048000000004, "r": 337.29868, "b": 269.08496, "coord_origin": "1"}}, {"id": 59, "text": "- vertical merges: \"C\", \"U\"", "bbox": {"l": 340.086, "t": 262.87048000000004, "r": 415.34375, "b": 269.08496, "coord_origin": "1"}}, {"id": 60, "text": "4", "bbox": {"l": 334.51135, "t": 272.80841, "r": 337.30188, "b": 279.02288999999996, "coord_origin": "1"}}, {"id": 61, "text": "- 2d merges: \"C\", \"L\", \"U\", \"X\"", "bbox": {"l": 340.09241, "t": 272.80841, "r": 426.59875, "b": 279.02288999999996, "coord_origin": "1"}}, {"id": 62, "text": "1", "bbox": {"l": 185.67178, "t": 244.04224, "r": 189.35544, "b": 250.25671, "coord_origin": "1"}}, {"id": 63, "text": "2", "bbox": {"l": 185.96759, "t": 268.34766, "r": 189.65125, "b": 274.56213, "coord_origin": "1"}}, {"id": 64, "text": "3", "bbox": {"l": 239.34152, "t": 243.62523999999996, "r": 243.02518, "b": 249.83972000000006, "coord_origin": "1"}}, {"id": 65, "text": "4", "bbox": {"l": 271.32852, "t": 243.49390000000005, "r": 275.01218, "b": 249.70836999999995, "coord_origin": "1"}}, {"id": 66, "text": "2", "bbox": {"l": 229.81627, "t": 166.51495, "r": 233.49992000000003, "b": 172.72942999999998, "coord_origin": "1"}}, {"id": 67, "text": "1", "bbox": {"l": 257.24402, "t": 189.961, "r": 260.92767, "b": 196.17548, "coord_origin": "1"}}, {"id": 68, "text": "3", "bbox": {"l": 186.87526, "t": 177.97668, "r": 190.55891, "b": 184.19115999999997, "coord_origin": "1"}}, {"id": 69, "text": "4", "bbox": {"l": 196.48746, "t": 169.01520000000005, "r": 200.17111, "b": 175.22968000000003, "coord_origin": "1"}}, {"id": 70, "text": "A", "bbox": {"l": 169.74728, "t": 167.88225999999997, "r": 175.72659, "b": 175.65039000000002, "coord_origin": "1"}}, {"id": 71, "text": "B", "bbox": {"l": 169.74728, "t": 
206.83867999999995, "r": 175.72659, "b": 214.60681, "coord_origin": "1"}}, {"id": 72, "text": "C", "bbox": {"l": 274.29419, "t": 168.27972, "r": 280.2735, "b": 176.04785000000004, "coord_origin": "1"}}, {"id": 73, "text": "D", "bbox": {"l": 359.56152, "t": 168.27972, "r": 365.54083, "b": 176.04785000000004, "coord_origin": "1"}}, {"id": 74, "text": "E", "bbox": {"l": 169.74728, "t": 243.21149000000003, "r": 175.27112, "b": 250.97960999999998, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "Section-header", "id": 4, "page_no": 6, "cluster": {"id": 4, "label": "Section-header", "bbox": {"l": 134.28743534088133, "t": 304.4804964065552, "r": 246.78787822723388, "b": 314.2943344116211, "coord_origin": "1"}, "confidence": 0.9464288949966431, "cells": [{"id": 75, "text": "4.2", "bbox": {"l": 134.765, "t": 305.29581, "r": 149.40205, "b": 314.10275, "coord_origin": "1"}}, {"id": 76, "text": "Language Syntax", "bbox": {"l": 160.85904, "t": 305.29581, "r": 246.65197999999998, "b": 314.10275, "coord_origin": "1"}}]}, "text": "4.2 Language Syntax"}, {"label": "Text", "id": 5, "page_no": 6, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 134.2309673309326, "t": 324.43218154907225, "r": 363.79617, "b": 334.19743423461915, "coord_origin": "1"}, "confidence": 0.9260643720626831, "cells": [{"id": 77, "text": "The OTSL representation follows these syntax rules:", "bbox": {"l": 134.765, "t": 325.24777, "r": 363.79617, "b": 334.04474, "coord_origin": "1"}}]}, "text": "The OTSL representation follows these syntax rules:"}, {"label": "List-item", "id": 6, "page_no": 6, "cluster": {"id": 6, "label": "List-item", "bbox": {"l": 138.97299, "t": 346.1299736022949, "r": 480.58902, "b": 367.93375, "coord_origin": "1"}, "confidence": 0.9650311470031738, "cells": [{"id": 78, "text": "1.", "bbox": {"l": 138.97299, "t": 347.18079, "r": 146.71991, "b": 355.97775, "coord_origin": "1"}}, {"id": 79, "text": 
"Left-looking cell rule", "bbox": {"l": 151.70099, "t": 347.17081, "r": 257.37927, "b": 355.97775, "coord_origin": "1"}}, {"id": 80, "text": ": The left neighbour of an \"L\" cell must be either", "bbox": {"l": 257.383, "t": 347.18079, "r": 480.58902, "b": 355.97775, "coord_origin": "1"}}, {"id": 81, "text": "another \"L\" cell or a \"C\" cell.", "bbox": {"l": 151.70099, "t": 359.13678, "r": 283.59387, "b": 367.93375, "coord_origin": "1"}}]}, "text": "1. Left-looking cell rule : The left neighbour of an \"L\" cell must be either another \"L\" cell or a \"C\" cell."}, {"label": "List-item", "id": 7, "page_no": 6, "cluster": {"id": 7, "label": "List-item", "bbox": {"l": 138.19280376434327, "t": 370.04180603027345, "r": 480.59229000000005, "b": 391.84673999999995, "coord_origin": "1"}, "confidence": 0.9545656442642212, "cells": [{"id": 82, "text": "2.", "bbox": {"l": 138.97299, "t": 371.09479, "r": 146.71991, "b": 379.89175, "coord_origin": "1"}}, {"id": 83, "text": "Up-looking cell rule", "bbox": {"l": 151.70099, "t": 371.08481, "r": 252.11203, "b": 379.89175, "coord_origin": "1"}}, {"id": 84, "text": ": The upper neighbour of a \"U\" cell must be either", "bbox": {"l": 252.112, "t": 371.09479, "r": 480.59229000000005, "b": 379.89175, "coord_origin": "1"}}, {"id": 85, "text": "another \"U\" cell or a \"C\" cell.", "bbox": {"l": 151.70099, "t": 383.04977, "r": 284.8392, "b": 391.84673999999995, "coord_origin": "1"}}]}, "text": "2. 
Up-looking cell rule : The upper neighbour of a \"U\" cell must be either another \"U\" cell or a \"C\" cell."}, {"label": "Section-header", "id": 8, "page_no": 6, "cluster": {"id": 8, "label": "Section-header", "bbox": {"l": 138.0652765274048, "t": 394.5083381652832, "r": 226.07360999999997, "b": 403.80475, "coord_origin": "1"}, "confidence": 0.729752779006958, "cells": [{"id": 86, "text": "3.", "bbox": {"l": 138.97299, "t": 395.0077800000001, "r": 146.71991, "b": 403.80475, "coord_origin": "1"}}, {"id": 87, "text": "Cross cell rule", "bbox": {"l": 151.70099, "t": 394.99780000000004, "r": 223.3042, "b": 403.80475, "coord_origin": "1"}}, {"id": 88, "text": ":", "bbox": {"l": 223.30699, "t": 395.0077800000001, "r": 226.07360999999997, "b": 403.80475, "coord_origin": "1"}}]}, "text": "3. Cross cell rule :"}, {"label": "List-item", "id": 9, "page_no": 6, "cluster": {"id": 9, "label": "List-item", "bbox": {"l": 146.40036334991456, "t": 395.0077800000001, "r": 480.59238, "b": 439.67371, "coord_origin": "1"}, "confidence": 0.7146523594856262, "cells": [{"id": 88, "text": ":", "bbox": {"l": 223.30699, "t": 395.0077800000001, "r": 226.07360999999997, "b": 403.80475, "coord_origin": "1"}}, {"id": 89, "text": "The left neighbour of an \"X\" cell must be either another \"X\" cell or a \"U\"", "bbox": {"l": 151.70099, "t": 406.96677, "r": 480.59238, "b": 415.76373, "coord_origin": "1"}}, {"id": 90, "text": "cell, and the upper neighbour of an \"X\" cell must be either another \"X\" cell", "bbox": {"l": 151.70099, "t": 418.9217499999999, "r": 480.59219, "b": 427.71871999999996, "coord_origin": "1"}}, {"id": 91, "text": "or an \"L\" cell.", "bbox": {"l": 151.70099, "t": 430.87674, "r": 214.39663999999996, "b": 439.67371, "coord_origin": "1"}}]}, "text": ": The left neighbour of an \"X\" cell must be either another \"X\" cell or a \"U\" cell, and the upper neighbour of an \"X\" cell must be either another \"X\" cell or an \"L\" cell."}, {"label": "List-item", "id": 10, "page_no": 
6, "cluster": {"id": 10, "label": "List-item", "bbox": {"l": 138.3949067115784, "t": 442.1132652282715, "r": 474.59018, "b": 452.20458526611327, "coord_origin": "1"}, "confidence": 0.921468198299408, "cells": [{"id": 92, "text": "4.", "bbox": {"l": 138.97299, "t": 442.83572, "r": 146.71991, "b": 451.63269, "coord_origin": "1"}}, {"id": 93, "text": "First row rule", "bbox": {"l": 151.70099, "t": 442.82574, "r": 221.32263, "b": 451.63269, "coord_origin": "1"}}, {"id": 94, "text": ": Only \"L\" cells and \"C\" cells are allowed in the first row.", "bbox": {"l": 221.32700000000003, "t": 442.83572, "r": 474.59018, "b": 451.63269, "coord_origin": "1"}}]}, "text": "4. First row rule : Only \"L\" cells and \"C\" cells are allowed in the first row."}, {"label": "List-item", "id": 11, "page_no": 6, "cluster": {"id": 11, "label": "List-item", "bbox": {"l": 138.3254817008972, "t": 453.90531692504885, "r": 480.58746, "b": 475.54568, "coord_origin": "1"}, "confidence": 0.9438549280166626, "cells": [{"id": 95, "text": "5.", "bbox": {"l": 138.97299, "t": 454.7937299999999, "r": 146.71991, "b": 463.5907, "coord_origin": "1"}}, {"id": 96, "text": "First column rule", "bbox": {"l": 151.70099, "t": 454.78375, "r": 240.71982, "b": 463.5907, "coord_origin": "1"}}, {"id": 97, "text": ": Only \"U\" cells and \"C\" cells are allowed in the first", "bbox": {"l": 240.71599, "t": 454.7937299999999, "r": 480.58746, "b": 463.5907, "coord_origin": "1"}}, {"id": 98, "text": "column.", "bbox": {"l": 151.70099, "t": 466.74872, "r": 186.0072, "b": 475.54568, "coord_origin": "1"}}]}, "text": "5. 
First column rule : Only \"U\" cells and \"C\" cells are allowed in the first column."}, {"label": "List-item", "id": 12, "page_no": 6, "cluster": {"id": 12, "label": "List-item", "bbox": {"l": 138.22427701950073, "t": 477.50852966308594, "r": 480.59457, "b": 499.71804428100586, "coord_origin": "1"}, "confidence": 0.9670841693878174, "cells": [{"id": 99, "text": "6.", "bbox": {"l": 138.97299, "t": 478.70673, "r": 146.71991, "b": 487.50369, "coord_origin": "1"}}, {"id": 100, "text": "Rectangular rule", "bbox": {"l": 151.70099, "t": 478.69675, "r": 235.15768, "b": 487.50369, "coord_origin": "1"}}, {"id": 101, "text": ": The table representation is always rectangular - all rows", "bbox": {"l": 235.15697999999998, "t": 478.70673, "r": 480.59457, "b": 487.50369, "coord_origin": "1"}}, {"id": 102, "text": "must have an equal number of tokens, terminated with \"NL\" token.", "bbox": {"l": 151.70099, "t": 490.66272, "r": 448.04147, "b": 499.45969, "coord_origin": "1"}}]}, "text": "6. Rectangular rule : The table representation is always rectangular - all rows must have an equal number of tokens, terminated with \"NL\" token."}, {"label": "Text", "id": 13, "page_no": 6, "cluster": {"id": 13, "label": "Text", "bbox": {"l": 133.61584539413454, "t": 511.45877265930176, "r": 480.59583, "b": 642.2503395080566, "coord_origin": "1"}, "confidence": 0.9846742153167725, "cells": [{"id": 103, "text": "The application of these rules gives OTSL a set of unique properties. First", "bbox": {"l": 149.70898, "t": 512.59271, "r": 480.59583, "b": 521.38968, "coord_origin": "1"}}, {"id": 104, "text": "of all, the OTSL enforces a strictly rectangular structure representation, where", "bbox": {"l": 134.76498, "t": 524.5477000000001, "r": 480.59079, "b": 533.34467, "coord_origin": "1"}}, {"id": 105, "text": "every new-line token starts a new row. 
As a consequence, all rows and all columns", "bbox": {"l": 134.76498, "t": 536.5027, "r": 480.59482, "b": 545.29967, "coord_origin": "1"}}, {"id": 106, "text": "have exactly the same number of tokens, irrespective of cell spans. Secondly, the", "bbox": {"l": 134.76498, "t": 548.4586899999999, "r": 480.58865000000003, "b": 557.25566, "coord_origin": "1"}}, {"id": 107, "text": "OTSL representation is unambiguous: Every table structure is represented in one", "bbox": {"l": 134.76498, "t": 560.4137000000001, "r": 480.59365999999994, "b": 569.21066, "coord_origin": "1"}}, {"id": 108, "text": "way. In this representation every table cell corresponds to a \"C\"-cell token, which", "bbox": {"l": 134.76498, "t": 572.3687, "r": 480.58673, "b": 581.16566, "coord_origin": "1"}}, {"id": 109, "text": "in case of spans is always located in the top-left corner of the table cell definition.", "bbox": {"l": 134.76498, "t": 584.3237, "r": 480.59171, "b": 593.12067, "coord_origin": "1"}}, {"id": 110, "text": "Third, OTSL syntax rules are only backward-looking. As a consequence, every", "bbox": {"l": 134.76498, "t": 596.2787, "r": 480.59180000000003, "b": 605.07567, "coord_origin": "1"}}, {"id": 111, "text": "predicted token can be validated straight during sequence generation by looking", "bbox": {"l": 134.76498, "t": 608.2347, "r": 480.5936899999999, "b": 617.03166, "coord_origin": "1"}}, {"id": 112, "text": "at the previously predicted sequence. As such, OTSL can guarantee that every", "bbox": {"l": 134.76498, "t": 620.1897, "r": 480.59072999999995, "b": 628.98666, "coord_origin": "1"}}, {"id": 113, "text": "predicted sequence is syntactically valid.", "bbox": {"l": 134.76498, "t": 632.1447000000001, "r": 311.19769, "b": 640.9416699999999, "coord_origin": "1"}}]}, "text": "The application of these rules gives OTSL a set of unique properties. First of all, the OTSL enforces a strictly rectangular structure representation, where every new-line token starts a new row. 
As a consequence, all rows and all columns have exactly the same number of tokens, irrespective of cell spans. Secondly, the OTSL representation is unambiguous: Every table structure is represented in one way. In this representation every table cell corresponds to a \"C\"-cell token, which in case of spans is always located in the top-left corner of the table cell definition. Third, OTSL syntax rules are only backward-looking. As a consequence, every predicted token can be validated straight during sequence generation by looking at the previously predicted sequence. As such, OTSL can guarantee that every predicted sequence is syntactically valid."}, {"label": "Text", "id": 14, "page_no": 6, "cluster": {"id": 14, "label": "Text", "bbox": {"l": 134.0440538406372, "t": 643.1018760681153, "r": 480.59265, "b": 665.0898582458495, "coord_origin": "1"}, "confidence": 0.9682873487472534, "cells": [{"id": 114, "text": "These characteristics can be easily learned by sequence generator networks,", "bbox": {"l": 149.70898, "t": 644.1026899999999, "r": 480.59186, "b": 652.89966, "coord_origin": "1"}}, {"id": 115, "text": "as we demonstrate further below. We find strong indications that this pattern", "bbox": {"l": 134.76498, "t": 656.05769, "r": 480.59265, "b": 664.8546699999999, "coord_origin": "1"}}]}, "text": "These characteristics can be easily learned by sequence generator networks, as we demonstrate further below. 
We find strong indications that this pattern"}], "headers": [{"label": "Page-header", "id": 0, "page_no": 6, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 193.9747784614563, "t": 93.12438640594485, "r": 447.54291000000006, "b": 102.22475852966306, "coord_origin": "1"}, "confidence": 0.9503458738327026, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 6, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 475.39760971069336, "t": 93.39061431884761, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8631047010421753, "cells": [{"id": 1, "text": "7", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "7"}]}}, {"page_no": 7, "page_hash": "839d5ba3f9d079e8b42470002e4d7cb9ac60681cd9e2f2e3bf41afa6884a170e", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "8", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 3, "text": "reduces significantly the column drift seen in the HTML based models (see Fig-", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.58884000000006, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "ure 5).", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 163.56389, "b": 139.68579, "coord_origin": "1"}}, {"id": 5, "text": "4.3", "bbox": {"l": 134.765, "t": 
161.55682000000002, "r": 149.40205, "b": 170.36377000000005, "coord_origin": "1"}}, {"id": 6, "text": "Error-detection and -mitigation", "bbox": {"l": 160.85904, "t": 161.55682000000002, "r": 319.34708, "b": 170.36377000000005, "coord_origin": "1"}}, {"id": 7, "text": "The design of OTSL allows to validate a table structure easily on an unfinished", "bbox": {"l": 134.765, "t": 182.28179999999998, "r": 480.59572999999995, "b": 191.0788, "coord_origin": "1"}}, {"id": 8, "text": "sequence. The detection of an invalid sequence token is a clear indication of a", "bbox": {"l": 134.765, "t": 194.23779000000002, "r": 480.59473, "b": 203.03479000000004, "coord_origin": "1"}}, {"id": 9, "text": "prediction mistake, however a valid sequence by itself does not guarantee pre-", "bbox": {"l": 134.765, "t": 206.19281, "r": 480.58678999999995, "b": 214.98981000000003, "coord_origin": "1"}}, {"id": 10, "text": "diction correctness. Different heuristics can be used to correct token errors in", "bbox": {"l": 134.765, "t": 218.14783, "r": 480.59177000000005, "b": 226.94482000000005, "coord_origin": "1"}}, {"id": 11, "text": "an invalid sequence and thus increase the chances for accurate predictions. Such", "bbox": {"l": 134.765, "t": 230.10284000000001, "r": 480.58768, "b": 238.89984000000004, "coord_origin": "1"}}, {"id": 12, "text": "heuristics can be applied either after the prediction of each token, or at the end", "bbox": {"l": 134.765, "t": 242.05786, "r": 480.5867, "b": 250.85486000000003, "coord_origin": "1"}}, {"id": 13, "text": "on the entire predicted sequence. 
For example a simple heuristic which can cor-", "bbox": {"l": 134.765, "t": 254.01288, "r": 480.5938100000001, "b": 262.80988, "coord_origin": "1"}}, {"id": 14, "text": "rect the predicted OTSL sequence on-the-fly is to verify if the token with the", "bbox": {"l": 134.765, "t": 265.96887000000004, "r": 480.59069999999997, "b": 274.76586999999995, "coord_origin": "1"}}, {"id": 15, "text": "highest prediction confidence invalidates the predicted sequence, and replace it", "bbox": {"l": 134.765, "t": 277.92389000000003, "r": 480.5957599999999, "b": 286.72086, "coord_origin": "1"}}, {"id": 16, "text": "by the token with the next highest confidence until OTSL rules are satisfied.", "bbox": {"l": 134.765, "t": 289.8788799999999, "r": 469.40369, "b": 298.67584, "coord_origin": "1"}}, {"id": 17, "text": "5", "bbox": {"l": 134.765, "t": 321.164, "r": 141.4886, "b": 331.73239000000007, "coord_origin": "1"}}, {"id": 18, "text": "Experiments", "bbox": {"l": 154.9382, "t": 321.164, "r": 229.03534, "b": 331.73239000000007, "coord_origin": "1"}}, {"id": 19, "text": "To evaluate the impact of OTSL on prediction accuracy and inference times, we", "bbox": {"l": 134.765, "t": 347.24985, "r": 480.59375, "b": 356.04681, "coord_origin": "1"}}, {"id": 20, "text": "conducted a series of experiments based on the TableFormer model (Figure 4)", "bbox": {"l": 134.765, "t": 359.2048300000001, "r": 480.59476, "b": 368.0018, "coord_origin": "1"}}, {"id": 21, "text": "with two objectives: Firstly we evaluate the prediction quality and performance", "bbox": {"l": 134.765, "t": 371.15982, "r": 480.58786000000003, "b": 379.95679, "coord_origin": "1"}}, {"id": 22, "text": "of OTSL vs. 
HTML after performing Hyper Parameter Optimization (HPO) on", "bbox": {"l": 134.765, "t": 383.11481000000003, "r": 480.58777, "b": 391.91177, "coord_origin": "1"}}, {"id": 23, "text": "the", "bbox": {"l": 134.765, "t": 395.06978999999995, "r": 148.59807, "b": 403.86676, "coord_origin": "1"}}, {"id": 24, "text": "canonical", "bbox": {"l": 151.627, "t": 395.06978999999995, "r": 191.84703, "b": 403.86676, "coord_origin": "1"}}, {"id": 25, "text": "PubTabNet data set. Secondly we pick the best hyper-parameters", "bbox": {"l": 195.90201, "t": 395.06978999999995, "r": 480.59528, "b": 403.86676, "coord_origin": "1"}}, {"id": 26, "text": "found in the first step and evaluate how OTSL impacts the performance of", "bbox": {"l": 134.76501, "t": 407.02478, "r": 480.59283000000005, "b": 415.82175, "coord_origin": "1"}}, {"id": 27, "text": "TableFormer after training on other publicly available data sets (FinTabNet,", "bbox": {"l": 134.76501, "t": 418.98077, "r": 480.59476, "b": 427.77774, "coord_origin": "1"}}, {"id": 28, "text": "PubTables-1M [14]). The ground truth (GT) from all data sets has been con-", "bbox": {"l": 134.76501, "t": 430.93576, "r": 480.59171, "b": 439.73273, "coord_origin": "1"}}, {"id": 29, "text": "verted into OTSL format for this purpose, and will be made publicly available.", "bbox": {"l": 134.76501, "t": 442.8907500000001, "r": 479.30258, "b": 451.6877099999999, "coord_origin": "1"}}, {"id": 30, "text": "Fig. 
4.", "bbox": {"l": 134.76501, "t": 484.64813, "r": 162.64424, "b": 492.57443, "coord_origin": "1"}}, {"id": 31, "text": "Architecture sketch of the TableFormer model, which is a representative for the", "bbox": {"l": 165.19601, "t": 484.71091, "r": 480.59082, "b": 492.78067, "coord_origin": "1"}}, {"id": 32, "text": "Im2Seq approach.", "bbox": {"l": 134.76501, "t": 495.66989, "r": 206.70245, "b": 503.73965, "coord_origin": "1"}}, {"id": 33, "text": "1.", "bbox": {"l": 147.30025, "t": 540.73164, "r": 149.70605, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 34, "text": "Item", "bbox": {"l": 150.90895, "t": 540.73164, "r": 155.72055, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 35, "text": "Amount", "bbox": {"l": 162.75987, "t": 535.3938, "r": 172.2963, "b": 537.76224, "coord_origin": "1"}}, {"id": 36, "text": "Names", "bbox": {"l": 147.63603, "t": 535.3661500000001, "r": 155.91753, "b": 537.73459, "coord_origin": "1"}}, {"id": 37, "text": "1000", "bbox": {"l": 158.48466, "t": 540.73164, "r": 164.10178, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 38, "text": "500", "bbox": {"l": 158.48466, "t": 544.67065, "r": 162.69737, "b": 547.03909, "coord_origin": "1"}}, {"id": 39, "text": "3500", "bbox": {"l": 158.48466, "t": 548.91264, "r": 164.10178, "b": 551.28108, "coord_origin": "1"}}, {"id": 40, "text": "150", "bbox": {"l": 158.48466, "t": 553.15465, "r": 162.69737, "b": 555.52309, "coord_origin": "1"}}, {"id": 41, "text": "unit", "bbox": {"l": 168.81696, "t": 540.73164, "r": 172.88876, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 42, "text": "unit", "bbox": {"l": 168.81696, "t": 544.67065, "r": 172.88876, "b": 547.03909, "coord_origin": "1"}}, {"id": 43, "text": "unit", "bbox": {"l": 168.81696, "t": 548.91264, "r": 172.88876, "b": 551.28108, "coord_origin": "1"}}, {"id": 44, "text": "unit", "bbox": {"l": 168.81696, "t": 553.15465, "r": 172.88876, "b": 555.52309, "coord_origin": "1"}}, {"id": 45, "text": "2.", "bbox": {"l": 147.30025, 
"t": 544.67065, "r": 149.70605, "b": 547.03909, "coord_origin": "1"}}, {"id": 46, "text": "Item", "bbox": {"l": 150.90895, "t": 544.67065, "r": 155.72055, "b": 547.03909, "coord_origin": "1"}}, {"id": 47, "text": "3.", "bbox": {"l": 147.30025, "t": 548.91264, "r": 149.70605, "b": 551.28108, "coord_origin": "1"}}, {"id": 48, "text": "Item", "bbox": {"l": 150.90895, "t": 548.91264, "r": 155.72055, "b": 551.28108, "coord_origin": "1"}}, {"id": 49, "text": "4.", "bbox": {"l": 147.30025, "t": 553.15465, "r": 149.70605, "b": 555.52309, "coord_origin": "1"}}, {"id": 50, "text": "Item", "bbox": {"l": 150.90895, "t": 553.15465, "r": 155.72055, "b": 555.52309, "coord_origin": "1"}}, {"id": 51, "text": "Extracted", "bbox": {"l": 152.05046, "t": 517.0098, "r": 171.24945, "b": 521.27298, "coord_origin": "1"}}, {"id": 52, "text": "Table Images", "bbox": {"l": 148.13347, "t": 522.3122900000001, "r": 175.16759, "b": 526.57547, "coord_origin": "1"}}, {"id": 53, "text": "Standardized", "bbox": {"l": 193.53331, "t": 524.51422, "r": 220.31973, "b": 528.7774, "coord_origin": "1"}}, {"id": 54, "text": "Images", "bbox": {"l": 199.47311, "t": 529.8167100000001, "r": 214.37889, "b": 534.0799, "coord_origin": "1"}}, {"id": 55, "text": "BBox", "bbox": {"l": 273.61066, "t": 509.9053, "r": 284.47275, "b": 514.16849, "coord_origin": "1"}}, {"id": 56, "text": "Decoder", "bbox": {"l": 270.45187, "t": 513.6928399999999, "r": 287.63242, "b": 517.9560200000001, "coord_origin": "1"}}, {"id": 57, "text": "BBoxes", "bbox": {"l": 332.47852, "t": 508.14438, "r": 348.14014, "b": 512.40756, "coord_origin": "1"}}, {"id": 58, "text": "BBoxes can be", "bbox": {"l": 376.68622, "t": 521.12024, "r": 407.25497, "b": 525.38342, "coord_origin": "1"}}, {"id": 59, "text": "traced back to the", "bbox": {"l": 373.90869, "t": 525.66525, "r": 410.03506, "b": 529.92844, "coord_origin": "1"}}, {"id": 60, "text": "original image to", "bbox": {"l": 375.29871, "t": 530.21024, "r": 408.64902, "b": 534.47342, "coord_origin": 
"1"}}, {"id": 61, "text": "extract content", "bbox": {"l": 377.06747, "t": 534.75522, "r": 406.88312, "b": 539.01843, "coord_origin": "1"}}, {"id": 62, "text": "Structure Tags sequence", "bbox": {"l": 383.56683, "t": 563.24176, "r": 433.76544, "b": 567.50497, "coord_origin": "1"}}, {"id": 63, "text": "provide full description of", "bbox": {"l": 383.52768, "t": 567.78676, "r": 433.80764999999997, "b": 572.04997, "coord_origin": "1"}}, {"id": 64, "text": "the table structure", "bbox": {"l": 390.47522, "t": 572.33177, "r": 426.85703, "b": 576.59499, "coord_origin": "1"}}, {"id": 65, "text": "Structure Tags", "bbox": {"l": 293.94702, "t": 577.89143, "r": 323.1691, "b": 582.15465, "coord_origin": "1"}}, {"id": 66, "text": "in OTSL format", "bbox": {"l": 293.94702, "t": 582.43648, "r": 324.59396, "b": 586.69969, "coord_origin": "1"}}, {"id": 67, "text": "BBoxes in sync", "bbox": {"l": 333.07819, "t": 541.82269, "r": 364.14691, "b": 546.08591, "coord_origin": "1"}}, {"id": 68, "text": "with tag sequence", "bbox": {"l": 333.07819, "t": 545.6102, "r": 369.71542, "b": 549.87341, "coord_origin": "1"}}, {"id": 69, "text": "Encoder", "bbox": {"l": 232.65881000000002, "t": 515.24139, "r": 249.58894000000004, "b": 519.50458, "coord_origin": "1"}}, {"id": 70, "text": "Structure", "bbox": {"l": 269.8219, "t": 545.97102, "r": 288.26279, "b": 550.23424, "coord_origin": "1"}}, {"id": 71, "text": "Decoder", "bbox": {"l": 270.45187, "t": 549.75851, "r": 287.63242, "b": 554.0217299999999, "coord_origin": "1"}}, {"id": 72, "text": "[x1, y2, x2, y2]", "bbox": {"l": 332.17676, "t": 515.91205, "r": 358.11206, "b": 520.17523, "coord_origin": "1"}}, {"id": 73, "text": "[x1', y2', x2', y2']", "bbox": {"l": 332.17676, "t": 521.9720500000001, "r": 361.58298, "b": 526.23523, "coord_origin": "1"}}, {"id": 74, "text": "[x1'', y2'', x2'', y2'']", "bbox": {"l": 332.17676, "t": 528.03204, "r": 364.76474, "b": 532.29523, "coord_origin": "1"}}, {"id": 75, "text": "...", "bbox": {"l": 332.17676, "t": 
534.09204, "r": 335.96548, "b": 538.35524, "coord_origin": "1"}}, {"id": 76, "text": "1", "bbox": {"l": 326.8894, "t": 516.39508, "r": 329.41641, "b": 520.6582599999999, "coord_origin": "1"}}, {"id": 77, "text": "2", "bbox": {"l": 327.04089, "t": 522.4247700000001, "r": 329.5679, "b": 526.68796, "coord_origin": "1"}}, {"id": 78, "text": "3", "bbox": {"l": 327.04089, "t": 528.51508, "r": 329.5679, "b": 532.77826, "coord_origin": "1"}}, {"id": 79, "text": "3", "bbox": {"l": 424.14102, "t": 527.4428399999999, "r": 426.66803, "b": 531.7060200000001, "coord_origin": "1"}}, {"id": 80, "text": "2", "bbox": {"l": 453.0018, "t": 517.4539500000001, "r": 455.52881, "b": 521.71713, "coord_origin": "1"}}, {"id": 81, "text": "1", "bbox": {"l": 423.85825, "t": 517.06281, "r": 426.38525, "b": 521.32599, "coord_origin": "1"}}, {"id": 82, "text": "C", "bbox": {"l": 333.4342, "t": 557.36679, "r": 337.27542, "b": 562.35719, "coord_origin": "1"}}, {"id": 83, "text": "C", "bbox": {"l": 340.35397, "t": 557.31679, "r": 344.19519, "b": 562.30719, "coord_origin": "1"}}, {"id": 84, "text": "C", "bbox": {"l": 340.30978, "t": 563.8653899999999, "r": 344.151, "b": 568.8557900000001, "coord_origin": "1"}}, {"id": 85, "text": "C", "bbox": {"l": 346.79904, "t": 563.8686700000001, "r": 350.64026, "b": 568.85907, "coord_origin": "1"}}, {"id": 86, "text": "C", "bbox": {"l": 333.59583, "t": 563.82271, "r": 337.43704, "b": 568.81311, "coord_origin": "1"}}, {"id": 87, "text": "C", "bbox": {"l": 340.37543, "t": 570.42673, "r": 344.21664, "b": 575.41713, "coord_origin": "1"}}, {"id": 88, "text": "C", "bbox": {"l": 346.86469, "t": 570.43001, "r": 350.7059, "b": 575.42041, "coord_origin": "1"}}, {"id": 89, "text": "C", "bbox": {"l": 333.66144, "t": 570.38405, "r": 337.50266, "b": 575.37445, "coord_origin": "1"}}, {"id": 90, "text": "C", "bbox": {"l": 340.37671, "t": 577.02606, "r": 344.21793, "b": 582.0164599999999, "coord_origin": "1"}}, {"id": 91, "text": "C", "bbox": {"l": 346.86597, "t": 577.02934, "r": 
350.70718, "b": 582.01974, "coord_origin": "1"}}, {"id": 92, "text": "C", "bbox": {"l": 333.66272, "t": 576.98338, "r": 337.50394, "b": 581.97379, "coord_origin": "1"}}, {"id": 93, "text": "C", "bbox": {"l": 340.27948, "t": 583.39737, "r": 344.1207, "b": 588.38777, "coord_origin": "1"}}, {"id": 94, "text": "C", "bbox": {"l": 346.76874, "t": 583.40068, "r": 350.60995, "b": 588.39108, "coord_origin": "1"}}, {"id": 95, "text": "C", "bbox": {"l": 333.56549, "t": 583.35474, "r": 337.40671, "b": 588.34514, "coord_origin": "1"}}, {"id": 96, "text": "NL", "bbox": {"l": 353.03326, "t": 556.8831299999999, "r": 359.83362, "b": 561.87354, "coord_origin": "1"}}, {"id": 97, "text": "NL", "bbox": {"l": 353.18604, "t": 563.58044, "r": 359.98639, "b": 568.57085, "coord_origin": "1"}}, {"id": 98, "text": "NL", "bbox": {"l": 353.19864, "t": 570.1623500000001, "r": 359.99899, "b": 575.15276, "coord_origin": "1"}}, {"id": 99, "text": "NL", "bbox": {"l": 353.1532, "t": 576.76611, "r": 359.95355, "b": 581.75652, "coord_origin": "1"}}, {"id": 100, "text": "NL", "bbox": {"l": 353.26935, "t": 583.40628, "r": 360.0697, "b": 588.3966800000001, "coord_origin": "1"}}, {"id": 101, "text": "L", "bbox": {"l": 347.37979, "t": 557.08235, "r": 350.33786, "b": 562.07275, "coord_origin": "1"}}, {"id": 102, "text": "3", "bbox": {"l": 331.14026, "t": 564.2907700000001, "r": 333.66727, "b": 568.55399, "coord_origin": "1"}}, {"id": 103, "text": "2", "bbox": {"l": 340.80972, "t": 554.59312, "r": 343.33673, "b": 558.85634, "coord_origin": "1"}}, {"id": 104, "text": "1", "bbox": {"l": 330.97992, "t": 554.83035, "r": 333.50693, "b": 559.09357, "coord_origin": "1"}}, {"id": 105, "text": "We rely on standard metrics such as Tree Edit Distance score (TEDs) for", "bbox": {"l": 149.709, "t": 620.19278, "r": 480.58792, "b": 628.98975, "coord_origin": "1"}}, {"id": 106, "text": "table structure prediction, and Mean Average Precision (mAP) with 0.75 Inter-", "bbox": {"l": 134.765, "t": 632.14778, "r": 480.58871, "b": 
640.94475, "coord_origin": "1"}}, {"id": 107, "text": "section Over Union (IOU) threshold for the bounding-box predictions of table", "bbox": {"l": 134.765, "t": 644.1027799999999, "r": 480.5917400000001, "b": 652.89975, "coord_origin": "1"}}, {"id": 108, "text": "cells. The predicted OTSL structures were converted back to HTML format in", "bbox": {"l": 134.765, "t": 656.0577900000001, "r": 480.58968999999996, "b": 664.8547599999999, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 134.19006814956666, "t": 93.66879501342771, "r": 139.46353826522827, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.7701128125190735, "cells": [{"id": 0, "text": "8", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 167.4087100982666, "t": 92.9255227088928, "r": 231.72227, "b": 101.94012937545779, "coord_origin": "1"}, "confidence": 0.8090205788612366, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "Text", "bbox": {"l": 134.20023822784424, "t": 118.29313030242918, "r": 480.58884000000006, "b": 140.21612749099734, "coord_origin": "1"}, "confidence": 0.9690903425216675, "cells": [{"id": 3, "text": "reduces significantly the column drift seen in the HTML based models (see Fig-", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.58884000000006, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "ure 5).", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 163.56389, "b": 139.68579, "coord_origin": "1"}}]}, {"id": 3, "label": "Section-header", "bbox": {"l": 134.25577239990233, "t": 161.1968942642212, "r": 319.34708, "b": 
171.12788085937495, "coord_origin": "1"}, "confidence": 0.9505801796913147, "cells": [{"id": 5, "text": "4.3", "bbox": {"l": 134.765, "t": 161.55682000000002, "r": 149.40205, "b": 170.36377000000005, "coord_origin": "1"}}, {"id": 6, "text": "Error-detection and -mitigation", "bbox": {"l": 160.85904, "t": 161.55682000000002, "r": 319.34708, "b": 170.36377000000005, "coord_origin": "1"}}]}, {"id": 4, "label": "Text", "bbox": {"l": 133.90631275177, "t": 181.4434524536133, "r": 480.5957599999999, "b": 299.0146385192871, "coord_origin": "1"}, "confidence": 0.9878641366958618, "cells": [{"id": 7, "text": "The design of OTSL allows to validate a table structure easily on an unfinished", "bbox": {"l": 134.765, "t": 182.28179999999998, "r": 480.59572999999995, "b": 191.0788, "coord_origin": "1"}}, {"id": 8, "text": "sequence. The detection of an invalid sequence token is a clear indication of a", "bbox": {"l": 134.765, "t": 194.23779000000002, "r": 480.59473, "b": 203.03479000000004, "coord_origin": "1"}}, {"id": 9, "text": "prediction mistake, however a valid sequence by itself does not guarantee pre-", "bbox": {"l": 134.765, "t": 206.19281, "r": 480.58678999999995, "b": 214.98981000000003, "coord_origin": "1"}}, {"id": 10, "text": "diction correctness. Different heuristics can be used to correct token errors in", "bbox": {"l": 134.765, "t": 218.14783, "r": 480.59177000000005, "b": 226.94482000000005, "coord_origin": "1"}}, {"id": 11, "text": "an invalid sequence and thus increase the chances for accurate predictions. Such", "bbox": {"l": 134.765, "t": 230.10284000000001, "r": 480.58768, "b": 238.89984000000004, "coord_origin": "1"}}, {"id": 12, "text": "heuristics can be applied either after the prediction of each token, or at the end", "bbox": {"l": 134.765, "t": 242.05786, "r": 480.5867, "b": 250.85486000000003, "coord_origin": "1"}}, {"id": 13, "text": "on the entire predicted sequence. 
For example a simple heuristic which can cor-", "bbox": {"l": 134.765, "t": 254.01288, "r": 480.5938100000001, "b": 262.80988, "coord_origin": "1"}}, {"id": 14, "text": "rect the predicted OTSL sequence on-the-fly is to verify if the token with the", "bbox": {"l": 134.765, "t": 265.96887000000004, "r": 480.59069999999997, "b": 274.76586999999995, "coord_origin": "1"}}, {"id": 15, "text": "highest prediction confidence invalidates the predicted sequence, and replace it", "bbox": {"l": 134.765, "t": 277.92389000000003, "r": 480.5957599999999, "b": 286.72086, "coord_origin": "1"}}, {"id": 16, "text": "by the token with the next highest confidence until OTSL rules are satisfied.", "bbox": {"l": 134.765, "t": 289.8788799999999, "r": 469.40369, "b": 298.67584, "coord_origin": "1"}}]}, {"id": 5, "label": "Section-header", "bbox": {"l": 134.63143787384033, "t": 320.433532333374, "r": 229.03534, "b": 332.14910888671875, "coord_origin": "1"}, "confidence": 0.9535147547721863, "cells": [{"id": 17, "text": "5", "bbox": {"l": 134.765, "t": 321.164, "r": 141.4886, "b": 331.73239000000007, "coord_origin": "1"}}, {"id": 18, "text": "Experiments", "bbox": {"l": 154.9382, "t": 321.164, "r": 229.03534, "b": 331.73239000000007, "coord_origin": "1"}}]}, {"id": 6, "label": "Text", "bbox": {"l": 133.6389286994934, "t": 346.10840950012204, "r": 480.60244274139404, "b": 452.3212429046631, "coord_origin": "1"}, "confidence": 0.9873077869415283, "cells": [{"id": 19, "text": "To evaluate the impact of OTSL on prediction accuracy and inference times, we", "bbox": {"l": 134.765, "t": 347.24985, "r": 480.59375, "b": 356.04681, "coord_origin": "1"}}, {"id": 20, "text": "conducted a series of experiments based on the TableFormer model (Figure 4)", "bbox": {"l": 134.765, "t": 359.2048300000001, "r": 480.59476, "b": 368.0018, "coord_origin": "1"}}, {"id": 21, "text": "with two objectives: Firstly we evaluate the prediction quality and performance", "bbox": {"l": 134.765, "t": 371.15982, "r": 
480.58786000000003, "b": 379.95679, "coord_origin": "1"}}, {"id": 22, "text": "of OTSL vs. HTML after performing Hyper Parameter Optimization (HPO) on", "bbox": {"l": 134.765, "t": 383.11481000000003, "r": 480.58777, "b": 391.91177, "coord_origin": "1"}}, {"id": 23, "text": "the", "bbox": {"l": 134.765, "t": 395.06978999999995, "r": 148.59807, "b": 403.86676, "coord_origin": "1"}}, {"id": 24, "text": "canonical", "bbox": {"l": 151.627, "t": 395.06978999999995, "r": 191.84703, "b": 403.86676, "coord_origin": "1"}}, {"id": 25, "text": "PubTabNet data set. Secondly we pick the best hyper-parameters", "bbox": {"l": 195.90201, "t": 395.06978999999995, "r": 480.59528, "b": 403.86676, "coord_origin": "1"}}, {"id": 26, "text": "found in the first step and evaluate how OTSL impacts the performance of", "bbox": {"l": 134.76501, "t": 407.02478, "r": 480.59283000000005, "b": 415.82175, "coord_origin": "1"}}, {"id": 27, "text": "TableFormer after training on other publicly available data sets (FinTabNet,", "bbox": {"l": 134.76501, "t": 418.98077, "r": 480.59476, "b": 427.77774, "coord_origin": "1"}}, {"id": 28, "text": "PubTables-1M [14]). The ground truth (GT) from all data sets has been con-", "bbox": {"l": 134.76501, "t": 430.93576, "r": 480.59171, "b": 439.73273, "coord_origin": "1"}}, {"id": 29, "text": "verted into OTSL format for this purpose, and will be made publicly available.", "bbox": {"l": 134.76501, "t": 442.8907500000001, "r": 479.30258, "b": 451.6877099999999, "coord_origin": "1"}}]}, {"id": 7, "label": "Caption", "bbox": {"l": 134.0367874145508, "t": 483.7284702301025, "r": 480.59082, "b": 504.30859222412107, "coord_origin": "1"}, "confidence": 0.942292332649231, "cells": [{"id": 30, "text": "Fig. 
4.", "bbox": {"l": 134.76501, "t": 484.64813, "r": 162.64424, "b": 492.57443, "coord_origin": "1"}}, {"id": 31, "text": "Architecture sketch of the TableFormer model, which is a representative for the", "bbox": {"l": 165.19601, "t": 484.71091, "r": 480.59082, "b": 492.78067, "coord_origin": "1"}}, {"id": 32, "text": "Im2Seq approach.", "bbox": {"l": 134.76501, "t": 495.66989, "r": 206.70245, "b": 503.73965, "coord_origin": "1"}}]}, {"id": 8, "label": "Picture", "bbox": {"l": 141.42980690002443, "t": 506.86558113098147, "r": 472.3452730178833, "b": 594.0726608276367, "coord_origin": "1"}, "confidence": 0.970181941986084, "cells": [{"id": 33, "text": "1.", "bbox": {"l": 147.30025, "t": 540.73164, "r": 149.70605, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 34, "text": "Item", "bbox": {"l": 150.90895, "t": 540.73164, "r": 155.72055, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 35, "text": "Amount", "bbox": {"l": 162.75987, "t": 535.3938, "r": 172.2963, "b": 537.76224, "coord_origin": "1"}}, {"id": 36, "text": "Names", "bbox": {"l": 147.63603, "t": 535.3661500000001, "r": 155.91753, "b": 537.73459, "coord_origin": "1"}}, {"id": 37, "text": "1000", "bbox": {"l": 158.48466, "t": 540.73164, "r": 164.10178, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 38, "text": "500", "bbox": {"l": 158.48466, "t": 544.67065, "r": 162.69737, "b": 547.03909, "coord_origin": "1"}}, {"id": 39, "text": "3500", "bbox": {"l": 158.48466, "t": 548.91264, "r": 164.10178, "b": 551.28108, "coord_origin": "1"}}, {"id": 40, "text": "150", "bbox": {"l": 158.48466, "t": 553.15465, "r": 162.69737, "b": 555.52309, "coord_origin": "1"}}, {"id": 41, "text": "unit", "bbox": {"l": 168.81696, "t": 540.73164, "r": 172.88876, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 42, "text": "unit", "bbox": {"l": 168.81696, "t": 544.67065, "r": 172.88876, "b": 547.03909, "coord_origin": "1"}}, {"id": 43, "text": "unit", "bbox": {"l": 168.81696, "t": 548.91264, "r": 172.88876, "b": 
551.28108, "coord_origin": "1"}}, {"id": 44, "text": "unit", "bbox": {"l": 168.81696, "t": 553.15465, "r": 172.88876, "b": 555.52309, "coord_origin": "1"}}, {"id": 45, "text": "2.", "bbox": {"l": 147.30025, "t": 544.67065, "r": 149.70605, "b": 547.03909, "coord_origin": "1"}}, {"id": 46, "text": "Item", "bbox": {"l": 150.90895, "t": 544.67065, "r": 155.72055, "b": 547.03909, "coord_origin": "1"}}, {"id": 47, "text": "3.", "bbox": {"l": 147.30025, "t": 548.91264, "r": 149.70605, "b": 551.28108, "coord_origin": "1"}}, {"id": 48, "text": "Item", "bbox": {"l": 150.90895, "t": 548.91264, "r": 155.72055, "b": 551.28108, "coord_origin": "1"}}, {"id": 49, "text": "4.", "bbox": {"l": 147.30025, "t": 553.15465, "r": 149.70605, "b": 555.52309, "coord_origin": "1"}}, {"id": 50, "text": "Item", "bbox": {"l": 150.90895, "t": 553.15465, "r": 155.72055, "b": 555.52309, "coord_origin": "1"}}, {"id": 51, "text": "Extracted", "bbox": {"l": 152.05046, "t": 517.0098, "r": 171.24945, "b": 521.27298, "coord_origin": "1"}}, {"id": 52, "text": "Table Images", "bbox": {"l": 148.13347, "t": 522.3122900000001, "r": 175.16759, "b": 526.57547, "coord_origin": "1"}}, {"id": 53, "text": "Standardized", "bbox": {"l": 193.53331, "t": 524.51422, "r": 220.31973, "b": 528.7774, "coord_origin": "1"}}, {"id": 54, "text": "Images", "bbox": {"l": 199.47311, "t": 529.8167100000001, "r": 214.37889, "b": 534.0799, "coord_origin": "1"}}, {"id": 55, "text": "BBox", "bbox": {"l": 273.61066, "t": 509.9053, "r": 284.47275, "b": 514.16849, "coord_origin": "1"}}, {"id": 56, "text": "Decoder", "bbox": {"l": 270.45187, "t": 513.6928399999999, "r": 287.63242, "b": 517.9560200000001, "coord_origin": "1"}}, {"id": 57, "text": "BBoxes", "bbox": {"l": 332.47852, "t": 508.14438, "r": 348.14014, "b": 512.40756, "coord_origin": "1"}}, {"id": 58, "text": "BBoxes can be", "bbox": {"l": 376.68622, "t": 521.12024, "r": 407.25497, "b": 525.38342, "coord_origin": "1"}}, {"id": 59, "text": "traced back to the", "bbox": {"l": 
373.90869, "t": 525.66525, "r": 410.03506, "b": 529.92844, "coord_origin": "1"}}, {"id": 60, "text": "original image to", "bbox": {"l": 375.29871, "t": 530.21024, "r": 408.64902, "b": 534.47342, "coord_origin": "1"}}, {"id": 61, "text": "extract content", "bbox": {"l": 377.06747, "t": 534.75522, "r": 406.88312, "b": 539.01843, "coord_origin": "1"}}, {"id": 62, "text": "Structure Tags sequence", "bbox": {"l": 383.56683, "t": 563.24176, "r": 433.76544, "b": 567.50497, "coord_origin": "1"}}, {"id": 63, "text": "provide full description of", "bbox": {"l": 383.52768, "t": 567.78676, "r": 433.80764999999997, "b": 572.04997, "coord_origin": "1"}}, {"id": 64, "text": "the table structure", "bbox": {"l": 390.47522, "t": 572.33177, "r": 426.85703, "b": 576.59499, "coord_origin": "1"}}, {"id": 65, "text": "Structure Tags", "bbox": {"l": 293.94702, "t": 577.89143, "r": 323.1691, "b": 582.15465, "coord_origin": "1"}}, {"id": 66, "text": "in OTSL format", "bbox": {"l": 293.94702, "t": 582.43648, "r": 324.59396, "b": 586.69969, "coord_origin": "1"}}, {"id": 67, "text": "BBoxes in sync", "bbox": {"l": 333.07819, "t": 541.82269, "r": 364.14691, "b": 546.08591, "coord_origin": "1"}}, {"id": 68, "text": "with tag sequence", "bbox": {"l": 333.07819, "t": 545.6102, "r": 369.71542, "b": 549.87341, "coord_origin": "1"}}, {"id": 69, "text": "Encoder", "bbox": {"l": 232.65881000000002, "t": 515.24139, "r": 249.58894000000004, "b": 519.50458, "coord_origin": "1"}}, {"id": 70, "text": "Structure", "bbox": {"l": 269.8219, "t": 545.97102, "r": 288.26279, "b": 550.23424, "coord_origin": "1"}}, {"id": 71, "text": "Decoder", "bbox": {"l": 270.45187, "t": 549.75851, "r": 287.63242, "b": 554.0217299999999, "coord_origin": "1"}}, {"id": 72, "text": "[x1, y2, x2, y2]", "bbox": {"l": 332.17676, "t": 515.91205, "r": 358.11206, "b": 520.17523, "coord_origin": "1"}}, {"id": 73, "text": "[x1', y2', x2', y2']", "bbox": {"l": 332.17676, "t": 521.9720500000001, "r": 361.58298, "b": 526.23523, "coord_origin": 
"1"}}, {"id": 74, "text": "[x1'', y2'', x2'', y2'']", "bbox": {"l": 332.17676, "t": 528.03204, "r": 364.76474, "b": 532.29523, "coord_origin": "1"}}, {"id": 75, "text": "...", "bbox": {"l": 332.17676, "t": 534.09204, "r": 335.96548, "b": 538.35524, "coord_origin": "1"}}, {"id": 76, "text": "1", "bbox": {"l": 326.8894, "t": 516.39508, "r": 329.41641, "b": 520.6582599999999, "coord_origin": "1"}}, {"id": 77, "text": "2", "bbox": {"l": 327.04089, "t": 522.4247700000001, "r": 329.5679, "b": 526.68796, "coord_origin": "1"}}, {"id": 78, "text": "3", "bbox": {"l": 327.04089, "t": 528.51508, "r": 329.5679, "b": 532.77826, "coord_origin": "1"}}, {"id": 79, "text": "3", "bbox": {"l": 424.14102, "t": 527.4428399999999, "r": 426.66803, "b": 531.7060200000001, "coord_origin": "1"}}, {"id": 80, "text": "2", "bbox": {"l": 453.0018, "t": 517.4539500000001, "r": 455.52881, "b": 521.71713, "coord_origin": "1"}}, {"id": 81, "text": "1", "bbox": {"l": 423.85825, "t": 517.06281, "r": 426.38525, "b": 521.32599, "coord_origin": "1"}}, {"id": 82, "text": "C", "bbox": {"l": 333.4342, "t": 557.36679, "r": 337.27542, "b": 562.35719, "coord_origin": "1"}}, {"id": 83, "text": "C", "bbox": {"l": 340.35397, "t": 557.31679, "r": 344.19519, "b": 562.30719, "coord_origin": "1"}}, {"id": 84, "text": "C", "bbox": {"l": 340.30978, "t": 563.8653899999999, "r": 344.151, "b": 568.8557900000001, "coord_origin": "1"}}, {"id": 85, "text": "C", "bbox": {"l": 346.79904, "t": 563.8686700000001, "r": 350.64026, "b": 568.85907, "coord_origin": "1"}}, {"id": 86, "text": "C", "bbox": {"l": 333.59583, "t": 563.82271, "r": 337.43704, "b": 568.81311, "coord_origin": "1"}}, {"id": 87, "text": "C", "bbox": {"l": 340.37543, "t": 570.42673, "r": 344.21664, "b": 575.41713, "coord_origin": "1"}}, {"id": 88, "text": "C", "bbox": {"l": 346.86469, "t": 570.43001, "r": 350.7059, "b": 575.42041, "coord_origin": "1"}}, {"id": 89, "text": "C", "bbox": {"l": 333.66144, "t": 570.38405, "r": 337.50266, "b": 575.37445, 
"coord_origin": "1"}}, {"id": 90, "text": "C", "bbox": {"l": 340.37671, "t": 577.02606, "r": 344.21793, "b": 582.0164599999999, "coord_origin": "1"}}, {"id": 91, "text": "C", "bbox": {"l": 346.86597, "t": 577.02934, "r": 350.70718, "b": 582.01974, "coord_origin": "1"}}, {"id": 92, "text": "C", "bbox": {"l": 333.66272, "t": 576.98338, "r": 337.50394, "b": 581.97379, "coord_origin": "1"}}, {"id": 93, "text": "C", "bbox": {"l": 340.27948, "t": 583.39737, "r": 344.1207, "b": 588.38777, "coord_origin": "1"}}, {"id": 94, "text": "C", "bbox": {"l": 346.76874, "t": 583.40068, "r": 350.60995, "b": 588.39108, "coord_origin": "1"}}, {"id": 95, "text": "C", "bbox": {"l": 333.56549, "t": 583.35474, "r": 337.40671, "b": 588.34514, "coord_origin": "1"}}, {"id": 96, "text": "NL", "bbox": {"l": 353.03326, "t": 556.8831299999999, "r": 359.83362, "b": 561.87354, "coord_origin": "1"}}, {"id": 97, "text": "NL", "bbox": {"l": 353.18604, "t": 563.58044, "r": 359.98639, "b": 568.57085, "coord_origin": "1"}}, {"id": 98, "text": "NL", "bbox": {"l": 353.19864, "t": 570.1623500000001, "r": 359.99899, "b": 575.15276, "coord_origin": "1"}}, {"id": 99, "text": "NL", "bbox": {"l": 353.1532, "t": 576.76611, "r": 359.95355, "b": 581.75652, "coord_origin": "1"}}, {"id": 100, "text": "NL", "bbox": {"l": 353.26935, "t": 583.40628, "r": 360.0697, "b": 588.3966800000001, "coord_origin": "1"}}, {"id": 101, "text": "L", "bbox": {"l": 347.37979, "t": 557.08235, "r": 350.33786, "b": 562.07275, "coord_origin": "1"}}, {"id": 102, "text": "3", "bbox": {"l": 331.14026, "t": 564.2907700000001, "r": 333.66727, "b": 568.55399, "coord_origin": "1"}}, {"id": 103, "text": "2", "bbox": {"l": 340.80972, "t": 554.59312, "r": 343.33673, "b": 558.85634, "coord_origin": "1"}}, {"id": 104, "text": "1", "bbox": {"l": 330.97992, "t": 554.83035, "r": 333.50693, "b": 559.09357, "coord_origin": "1"}}]}, {"id": 9, "label": "Text", "bbox": {"l": 133.83853654861448, "t": 619.5480606079101, "r": 480.5917400000001, "b": 
665.1434852600097, "coord_origin": "1"}, "confidence": 0.9766379594802856, "cells": [{"id": 105, "text": "We rely on standard metrics such as Tree Edit Distance score (TEDs) for", "bbox": {"l": 149.709, "t": 620.19278, "r": 480.58792, "b": 628.98975, "coord_origin": "1"}}, {"id": 106, "text": "table structure prediction, and Mean Average Precision (mAP) with 0.75 Inter-", "bbox": {"l": 134.765, "t": 632.14778, "r": 480.58871, "b": 640.94475, "coord_origin": "1"}}, {"id": 107, "text": "section Over Union (IOU) threshold for the bounding-box predictions of table", "bbox": {"l": 134.765, "t": 644.1027799999999, "r": 480.5917400000001, "b": 652.89975, "coord_origin": "1"}}, {"id": 108, "text": "cells. The predicted OTSL structures were converted back to HTML format in", "bbox": {"l": 134.765, "t": 656.0577900000001, "r": 480.58968999999996, "b": 664.8547599999999, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 7, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.19006814956666, "t": 93.66879501342771, "r": 139.46353826522827, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.7701128125190735, "cells": [{"id": 0, "text": "8", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "8"}, {"label": "Page-header", "id": 1, "page_no": 7, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 167.4087100982666, "t": 92.9255227088928, "r": 231.72227, "b": 101.94012937545779, "coord_origin": "1"}, "confidence": 0.8090205788612366, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "M. 
Lysak, et al."}, {"label": "Text", "id": 2, "page_no": 7, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 134.20023822784424, "t": 118.29313030242918, "r": 480.58884000000006, "b": 140.21612749099734, "coord_origin": "1"}, "confidence": 0.9690903425216675, "cells": [{"id": 3, "text": "reduces significantly the column drift seen in the HTML based models (see Fig-", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.58884000000006, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "ure 5).", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 163.56389, "b": 139.68579, "coord_origin": "1"}}]}, "text": "reduces significantly the column drift seen in the HTML based models (see Figure 5)."}, {"label": "Section-header", "id": 3, "page_no": 7, "cluster": {"id": 3, "label": "Section-header", "bbox": {"l": 134.25577239990233, "t": 161.1968942642212, "r": 319.34708, "b": 171.12788085937495, "coord_origin": "1"}, "confidence": 0.9505801796913147, "cells": [{"id": 5, "text": "4.3", "bbox": {"l": 134.765, "t": 161.55682000000002, "r": 149.40205, "b": 170.36377000000005, "coord_origin": "1"}}, {"id": 6, "text": "Error-detection and -mitigation", "bbox": {"l": 160.85904, "t": 161.55682000000002, "r": 319.34708, "b": 170.36377000000005, "coord_origin": "1"}}]}, "text": "4.3 Error-detection and -mitigation"}, {"label": "Text", "id": 4, "page_no": 7, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.90631275177, "t": 181.4434524536133, "r": 480.5957599999999, "b": 299.0146385192871, "coord_origin": "1"}, "confidence": 0.9878641366958618, "cells": [{"id": 7, "text": "The design of OTSL allows to validate a table structure easily on an unfinished", "bbox": {"l": 134.765, "t": 182.28179999999998, "r": 480.59572999999995, "b": 191.0788, "coord_origin": "1"}}, {"id": 8, "text": "sequence. 
The detection of an invalid sequence token is a clear indication of a", "bbox": {"l": 134.765, "t": 194.23779000000002, "r": 480.59473, "b": 203.03479000000004, "coord_origin": "1"}}, {"id": 9, "text": "prediction mistake, however a valid sequence by itself does not guarantee pre-", "bbox": {"l": 134.765, "t": 206.19281, "r": 480.58678999999995, "b": 214.98981000000003, "coord_origin": "1"}}, {"id": 10, "text": "diction correctness. Different heuristics can be used to correct token errors in", "bbox": {"l": 134.765, "t": 218.14783, "r": 480.59177000000005, "b": 226.94482000000005, "coord_origin": "1"}}, {"id": 11, "text": "an invalid sequence and thus increase the chances for accurate predictions. Such", "bbox": {"l": 134.765, "t": 230.10284000000001, "r": 480.58768, "b": 238.89984000000004, "coord_origin": "1"}}, {"id": 12, "text": "heuristics can be applied either after the prediction of each token, or at the end", "bbox": {"l": 134.765, "t": 242.05786, "r": 480.5867, "b": 250.85486000000003, "coord_origin": "1"}}, {"id": 13, "text": "on the entire predicted sequence. For example a simple heuristic which can cor-", "bbox": {"l": 134.765, "t": 254.01288, "r": 480.5938100000001, "b": 262.80988, "coord_origin": "1"}}, {"id": 14, "text": "rect the predicted OTSL sequence on-the-fly is to verify if the token with the", "bbox": {"l": 134.765, "t": 265.96887000000004, "r": 480.59069999999997, "b": 274.76586999999995, "coord_origin": "1"}}, {"id": 15, "text": "highest prediction confidence invalidates the predicted sequence, and replace it", "bbox": {"l": 134.765, "t": 277.92389000000003, "r": 480.5957599999999, "b": 286.72086, "coord_origin": "1"}}, {"id": 16, "text": "by the token with the next highest confidence until OTSL rules are satisfied.", "bbox": {"l": 134.765, "t": 289.8788799999999, "r": 469.40369, "b": 298.67584, "coord_origin": "1"}}]}, "text": "The design of OTSL allows to validate a table structure easily on an unfinished sequence. 
The detection of an invalid sequence token is a clear indication of a prediction mistake, however a valid sequence by itself does not guarantee prediction correctness. Different heuristics can be used to correct token errors in an invalid sequence and thus increase the chances for accurate predictions. Such heuristics can be applied either after the prediction of each token, or at the end on the entire predicted sequence. For example a simple heuristic which can correct the predicted OTSL sequence on-the-fly is to verify if the token with the highest prediction confidence invalidates the predicted sequence, and replace it by the token with the next highest confidence until OTSL rules are satisfied."}, {"label": "Section-header", "id": 5, "page_no": 7, "cluster": {"id": 5, "label": "Section-header", "bbox": {"l": 134.63143787384033, "t": 320.433532333374, "r": 229.03534, "b": 332.14910888671875, "coord_origin": "1"}, "confidence": 0.9535147547721863, "cells": [{"id": 17, "text": "5", "bbox": {"l": 134.765, "t": 321.164, "r": 141.4886, "b": 331.73239000000007, "coord_origin": "1"}}, {"id": 18, "text": "Experiments", "bbox": {"l": 154.9382, "t": 321.164, "r": 229.03534, "b": 331.73239000000007, "coord_origin": "1"}}]}, "text": "5 Experiments"}, {"label": "Text", "id": 6, "page_no": 7, "cluster": {"id": 6, "label": "Text", "bbox": {"l": 133.6389286994934, "t": 346.10840950012204, "r": 480.60244274139404, "b": 452.3212429046631, "coord_origin": "1"}, "confidence": 0.9873077869415283, "cells": [{"id": 19, "text": "To evaluate the impact of OTSL on prediction accuracy and inference times, we", "bbox": {"l": 134.765, "t": 347.24985, "r": 480.59375, "b": 356.04681, "coord_origin": "1"}}, {"id": 20, "text": "conducted a series of experiments based on the TableFormer model (Figure 4)", "bbox": {"l": 134.765, "t": 359.2048300000001, "r": 480.59476, "b": 368.0018, "coord_origin": "1"}}, {"id": 21, "text": "with two objectives: Firstly we evaluate the prediction quality and 
performance", "bbox": {"l": 134.765, "t": 371.15982, "r": 480.58786000000003, "b": 379.95679, "coord_origin": "1"}}, {"id": 22, "text": "of OTSL vs. HTML after performing Hyper Parameter Optimization (HPO) on", "bbox": {"l": 134.765, "t": 383.11481000000003, "r": 480.58777, "b": 391.91177, "coord_origin": "1"}}, {"id": 23, "text": "the", "bbox": {"l": 134.765, "t": 395.06978999999995, "r": 148.59807, "b": 403.86676, "coord_origin": "1"}}, {"id": 24, "text": "canonical", "bbox": {"l": 151.627, "t": 395.06978999999995, "r": 191.84703, "b": 403.86676, "coord_origin": "1"}}, {"id": 25, "text": "PubTabNet data set. Secondly we pick the best hyper-parameters", "bbox": {"l": 195.90201, "t": 395.06978999999995, "r": 480.59528, "b": 403.86676, "coord_origin": "1"}}, {"id": 26, "text": "found in the first step and evaluate how OTSL impacts the performance of", "bbox": {"l": 134.76501, "t": 407.02478, "r": 480.59283000000005, "b": 415.82175, "coord_origin": "1"}}, {"id": 27, "text": "TableFormer after training on other publicly available data sets (FinTabNet,", "bbox": {"l": 134.76501, "t": 418.98077, "r": 480.59476, "b": 427.77774, "coord_origin": "1"}}, {"id": 28, "text": "PubTables-1M [14]). The ground truth (GT) from all data sets has been con-", "bbox": {"l": 134.76501, "t": 430.93576, "r": 480.59171, "b": 439.73273, "coord_origin": "1"}}, {"id": 29, "text": "verted into OTSL format for this purpose, and will be made publicly available.", "bbox": {"l": 134.76501, "t": 442.8907500000001, "r": 479.30258, "b": 451.6877099999999, "coord_origin": "1"}}]}, "text": "To evaluate the impact of OTSL on prediction accuracy and inference times, we conducted a series of experiments based on the TableFormer model (Figure 4) with two objectives: Firstly we evaluate the prediction quality and performance of OTSL vs. HTML after performing Hyper Parameter Optimization (HPO) on the canonical PubTabNet data set. 
Secondly we pick the best hyper-parameters found in the first step and evaluate how OTSL impacts the performance of TableFormer after training on other publicly available data sets (FinTabNet, PubTables-1M [14]). The ground truth (GT) from all data sets has been converted into OTSL format for this purpose, and will be made publicly available."}, {"label": "Caption", "id": 7, "page_no": 7, "cluster": {"id": 7, "label": "Caption", "bbox": {"l": 134.0367874145508, "t": 483.7284702301025, "r": 480.59082, "b": 504.30859222412107, "coord_origin": "1"}, "confidence": 0.942292332649231, "cells": [{"id": 30, "text": "Fig. 4.", "bbox": {"l": 134.76501, "t": 484.64813, "r": 162.64424, "b": 492.57443, "coord_origin": "1"}}, {"id": 31, "text": "Architecture sketch of the TableFormer model, which is a representative for the", "bbox": {"l": 165.19601, "t": 484.71091, "r": 480.59082, "b": 492.78067, "coord_origin": "1"}}, {"id": 32, "text": "Im2Seq approach.", "bbox": {"l": 134.76501, "t": 495.66989, "r": 206.70245, "b": 503.73965, "coord_origin": "1"}}]}, "text": "Fig. 4. 
Architecture sketch of the TableFormer model, which is a representative for the Im2Seq approach."}, {"label": "Picture", "id": 8, "page_no": 7, "cluster": {"id": 8, "label": "Picture", "bbox": {"l": 141.42980690002443, "t": 506.86558113098147, "r": 472.3452730178833, "b": 594.0726608276367, "coord_origin": "1"}, "confidence": 0.970181941986084, "cells": [{"id": 33, "text": "1.", "bbox": {"l": 147.30025, "t": 540.73164, "r": 149.70605, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 34, "text": "Item", "bbox": {"l": 150.90895, "t": 540.73164, "r": 155.72055, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 35, "text": "Amount", "bbox": {"l": 162.75987, "t": 535.3938, "r": 172.2963, "b": 537.76224, "coord_origin": "1"}}, {"id": 36, "text": "Names", "bbox": {"l": 147.63603, "t": 535.3661500000001, "r": 155.91753, "b": 537.73459, "coord_origin": "1"}}, {"id": 37, "text": "1000", "bbox": {"l": 158.48466, "t": 540.73164, "r": 164.10178, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 38, "text": "500", "bbox": {"l": 158.48466, "t": 544.67065, "r": 162.69737, "b": 547.03909, "coord_origin": "1"}}, {"id": 39, "text": "3500", "bbox": {"l": 158.48466, "t": 548.91264, "r": 164.10178, "b": 551.28108, "coord_origin": "1"}}, {"id": 40, "text": "150", "bbox": {"l": 158.48466, "t": 553.15465, "r": 162.69737, "b": 555.52309, "coord_origin": "1"}}, {"id": 41, "text": "unit", "bbox": {"l": 168.81696, "t": 540.73164, "r": 172.88876, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 42, "text": "unit", "bbox": {"l": 168.81696, "t": 544.67065, "r": 172.88876, "b": 547.03909, "coord_origin": "1"}}, {"id": 43, "text": "unit", "bbox": {"l": 168.81696, "t": 548.91264, "r": 172.88876, "b": 551.28108, "coord_origin": "1"}}, {"id": 44, "text": "unit", "bbox": {"l": 168.81696, "t": 553.15465, "r": 172.88876, "b": 555.52309, "coord_origin": "1"}}, {"id": 45, "text": "2.", "bbox": {"l": 147.30025, "t": 544.67065, "r": 149.70605, "b": 547.03909, "coord_origin": "1"}}, {"id": 
46, "text": "Item", "bbox": {"l": 150.90895, "t": 544.67065, "r": 155.72055, "b": 547.03909, "coord_origin": "1"}}, {"id": 47, "text": "3.", "bbox": {"l": 147.30025, "t": 548.91264, "r": 149.70605, "b": 551.28108, "coord_origin": "1"}}, {"id": 48, "text": "Item", "bbox": {"l": 150.90895, "t": 548.91264, "r": 155.72055, "b": 551.28108, "coord_origin": "1"}}, {"id": 49, "text": "4.", "bbox": {"l": 147.30025, "t": 553.15465, "r": 149.70605, "b": 555.52309, "coord_origin": "1"}}, {"id": 50, "text": "Item", "bbox": {"l": 150.90895, "t": 553.15465, "r": 155.72055, "b": 555.52309, "coord_origin": "1"}}, {"id": 51, "text": "Extracted", "bbox": {"l": 152.05046, "t": 517.0098, "r": 171.24945, "b": 521.27298, "coord_origin": "1"}}, {"id": 52, "text": "Table Images", "bbox": {"l": 148.13347, "t": 522.3122900000001, "r": 175.16759, "b": 526.57547, "coord_origin": "1"}}, {"id": 53, "text": "Standardized", "bbox": {"l": 193.53331, "t": 524.51422, "r": 220.31973, "b": 528.7774, "coord_origin": "1"}}, {"id": 54, "text": "Images", "bbox": {"l": 199.47311, "t": 529.8167100000001, "r": 214.37889, "b": 534.0799, "coord_origin": "1"}}, {"id": 55, "text": "BBox", "bbox": {"l": 273.61066, "t": 509.9053, "r": 284.47275, "b": 514.16849, "coord_origin": "1"}}, {"id": 56, "text": "Decoder", "bbox": {"l": 270.45187, "t": 513.6928399999999, "r": 287.63242, "b": 517.9560200000001, "coord_origin": "1"}}, {"id": 57, "text": "BBoxes", "bbox": {"l": 332.47852, "t": 508.14438, "r": 348.14014, "b": 512.40756, "coord_origin": "1"}}, {"id": 58, "text": "BBoxes can be", "bbox": {"l": 376.68622, "t": 521.12024, "r": 407.25497, "b": 525.38342, "coord_origin": "1"}}, {"id": 59, "text": "traced back to the", "bbox": {"l": 373.90869, "t": 525.66525, "r": 410.03506, "b": 529.92844, "coord_origin": "1"}}, {"id": 60, "text": "original image to", "bbox": {"l": 375.29871, "t": 530.21024, "r": 408.64902, "b": 534.47342, "coord_origin": "1"}}, {"id": 61, "text": "extract content", "bbox": {"l": 377.06747, "t": 
534.75522, "r": 406.88312, "b": 539.01843, "coord_origin": "1"}}, {"id": 62, "text": "Structure Tags sequence", "bbox": {"l": 383.56683, "t": 563.24176, "r": 433.76544, "b": 567.50497, "coord_origin": "1"}}, {"id": 63, "text": "provide full description of", "bbox": {"l": 383.52768, "t": 567.78676, "r": 433.80764999999997, "b": 572.04997, "coord_origin": "1"}}, {"id": 64, "text": "the table structure", "bbox": {"l": 390.47522, "t": 572.33177, "r": 426.85703, "b": 576.59499, "coord_origin": "1"}}, {"id": 65, "text": "Structure Tags", "bbox": {"l": 293.94702, "t": 577.89143, "r": 323.1691, "b": 582.15465, "coord_origin": "1"}}, {"id": 66, "text": "in OTSL format", "bbox": {"l": 293.94702, "t": 582.43648, "r": 324.59396, "b": 586.69969, "coord_origin": "1"}}, {"id": 67, "text": "BBoxes in sync", "bbox": {"l": 333.07819, "t": 541.82269, "r": 364.14691, "b": 546.08591, "coord_origin": "1"}}, {"id": 68, "text": "with tag sequence", "bbox": {"l": 333.07819, "t": 545.6102, "r": 369.71542, "b": 549.87341, "coord_origin": "1"}}, {"id": 69, "text": "Encoder", "bbox": {"l": 232.65881000000002, "t": 515.24139, "r": 249.58894000000004, "b": 519.50458, "coord_origin": "1"}}, {"id": 70, "text": "Structure", "bbox": {"l": 269.8219, "t": 545.97102, "r": 288.26279, "b": 550.23424, "coord_origin": "1"}}, {"id": 71, "text": "Decoder", "bbox": {"l": 270.45187, "t": 549.75851, "r": 287.63242, "b": 554.0217299999999, "coord_origin": "1"}}, {"id": 72, "text": "[x1, y2, x2, y2]", "bbox": {"l": 332.17676, "t": 515.91205, "r": 358.11206, "b": 520.17523, "coord_origin": "1"}}, {"id": 73, "text": "[x1', y2', x2', y2']", "bbox": {"l": 332.17676, "t": 521.9720500000001, "r": 361.58298, "b": 526.23523, "coord_origin": "1"}}, {"id": 74, "text": "[x1'', y2'', x2'', y2'']", "bbox": {"l": 332.17676, "t": 528.03204, "r": 364.76474, "b": 532.29523, "coord_origin": "1"}}, {"id": 75, "text": "...", "bbox": {"l": 332.17676, "t": 534.09204, "r": 335.96548, "b": 538.35524, "coord_origin": "1"}}, {"id": 76, 
"text": "1", "bbox": {"l": 326.8894, "t": 516.39508, "r": 329.41641, "b": 520.6582599999999, "coord_origin": "1"}}, {"id": 77, "text": "2", "bbox": {"l": 327.04089, "t": 522.4247700000001, "r": 329.5679, "b": 526.68796, "coord_origin": "1"}}, {"id": 78, "text": "3", "bbox": {"l": 327.04089, "t": 528.51508, "r": 329.5679, "b": 532.77826, "coord_origin": "1"}}, {"id": 79, "text": "3", "bbox": {"l": 424.14102, "t": 527.4428399999999, "r": 426.66803, "b": 531.7060200000001, "coord_origin": "1"}}, {"id": 80, "text": "2", "bbox": {"l": 453.0018, "t": 517.4539500000001, "r": 455.52881, "b": 521.71713, "coord_origin": "1"}}, {"id": 81, "text": "1", "bbox": {"l": 423.85825, "t": 517.06281, "r": 426.38525, "b": 521.32599, "coord_origin": "1"}}, {"id": 82, "text": "C", "bbox": {"l": 333.4342, "t": 557.36679, "r": 337.27542, "b": 562.35719, "coord_origin": "1"}}, {"id": 83, "text": "C", "bbox": {"l": 340.35397, "t": 557.31679, "r": 344.19519, "b": 562.30719, "coord_origin": "1"}}, {"id": 84, "text": "C", "bbox": {"l": 340.30978, "t": 563.8653899999999, "r": 344.151, "b": 568.8557900000001, "coord_origin": "1"}}, {"id": 85, "text": "C", "bbox": {"l": 346.79904, "t": 563.8686700000001, "r": 350.64026, "b": 568.85907, "coord_origin": "1"}}, {"id": 86, "text": "C", "bbox": {"l": 333.59583, "t": 563.82271, "r": 337.43704, "b": 568.81311, "coord_origin": "1"}}, {"id": 87, "text": "C", "bbox": {"l": 340.37543, "t": 570.42673, "r": 344.21664, "b": 575.41713, "coord_origin": "1"}}, {"id": 88, "text": "C", "bbox": {"l": 346.86469, "t": 570.43001, "r": 350.7059, "b": 575.42041, "coord_origin": "1"}}, {"id": 89, "text": "C", "bbox": {"l": 333.66144, "t": 570.38405, "r": 337.50266, "b": 575.37445, "coord_origin": "1"}}, {"id": 90, "text": "C", "bbox": {"l": 340.37671, "t": 577.02606, "r": 344.21793, "b": 582.0164599999999, "coord_origin": "1"}}, {"id": 91, "text": "C", "bbox": {"l": 346.86597, "t": 577.02934, "r": 350.70718, "b": 582.01974, "coord_origin": "1"}}, {"id": 92, "text": "C", 
"bbox": {"l": 333.66272, "t": 576.98338, "r": 337.50394, "b": 581.97379, "coord_origin": "1"}}, {"id": 93, "text": "C", "bbox": {"l": 340.27948, "t": 583.39737, "r": 344.1207, "b": 588.38777, "coord_origin": "1"}}, {"id": 94, "text": "C", "bbox": {"l": 346.76874, "t": 583.40068, "r": 350.60995, "b": 588.39108, "coord_origin": "1"}}, {"id": 95, "text": "C", "bbox": {"l": 333.56549, "t": 583.35474, "r": 337.40671, "b": 588.34514, "coord_origin": "1"}}, {"id": 96, "text": "NL", "bbox": {"l": 353.03326, "t": 556.8831299999999, "r": 359.83362, "b": 561.87354, "coord_origin": "1"}}, {"id": 97, "text": "NL", "bbox": {"l": 353.18604, "t": 563.58044, "r": 359.98639, "b": 568.57085, "coord_origin": "1"}}, {"id": 98, "text": "NL", "bbox": {"l": 353.19864, "t": 570.1623500000001, "r": 359.99899, "b": 575.15276, "coord_origin": "1"}}, {"id": 99, "text": "NL", "bbox": {"l": 353.1532, "t": 576.76611, "r": 359.95355, "b": 581.75652, "coord_origin": "1"}}, {"id": 100, "text": "NL", "bbox": {"l": 353.26935, "t": 583.40628, "r": 360.0697, "b": 588.3966800000001, "coord_origin": "1"}}, {"id": 101, "text": "L", "bbox": {"l": 347.37979, "t": 557.08235, "r": 350.33786, "b": 562.07275, "coord_origin": "1"}}, {"id": 102, "text": "3", "bbox": {"l": 331.14026, "t": 564.2907700000001, "r": 333.66727, "b": 568.55399, "coord_origin": "1"}}, {"id": 103, "text": "2", "bbox": {"l": 340.80972, "t": 554.59312, "r": 343.33673, "b": 558.85634, "coord_origin": "1"}}, {"id": 104, "text": "1", "bbox": {"l": 330.97992, "t": 554.83035, "r": 333.50693, "b": 559.09357, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "Text", "id": 9, "page_no": 7, "cluster": {"id": 9, "label": "Text", "bbox": {"l": 133.83853654861448, "t": 619.5480606079101, "r": 480.5917400000001, "b": 665.1434852600097, "coord_origin": "1"}, "confidence": 0.9766379594802856, "cells": [{"id": 105, "text": "We rely on standard metrics such as Tree Edit Distance 
score (TEDs) for", "bbox": {"l": 149.709, "t": 620.19278, "r": 480.58792, "b": 628.98975, "coord_origin": "1"}}, {"id": 106, "text": "table structure prediction, and Mean Average Precision (mAP) with 0.75 Inter-", "bbox": {"l": 134.765, "t": 632.14778, "r": 480.58871, "b": 640.94475, "coord_origin": "1"}}, {"id": 107, "text": "section Over Union (IOU) threshold for the bounding-box predictions of table", "bbox": {"l": 134.765, "t": 644.1027799999999, "r": 480.5917400000001, "b": 652.89975, "coord_origin": "1"}}, {"id": 108, "text": "cells. The predicted OTSL structures were converted back to HTML format in", "bbox": {"l": 134.765, "t": 656.0577900000001, "r": 480.58968999999996, "b": 664.8547599999999, "coord_origin": "1"}}]}, "text": "We rely on standard metrics such as Tree Edit Distance score (TEDs) for table structure prediction, and Mean Average Precision (mAP) with 0.75 Intersection Over Union (IOU) threshold for the bounding-box predictions of table cells. The predicted OTSL structures were converted back to HTML format in"}], "body": [{"label": "Text", "id": 2, "page_no": 7, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 134.20023822784424, "t": 118.29313030242918, "r": 480.58884000000006, "b": 140.21612749099734, "coord_origin": "1"}, "confidence": 0.9690903425216675, "cells": [{"id": 3, "text": "reduces significantly the column drift seen in the HTML based models (see Fig-", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.58884000000006, "b": 127.73077, "coord_origin": "1"}}, {"id": 4, "text": "ure 5).", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 163.56389, "b": 139.68579, "coord_origin": "1"}}]}, "text": "reduces significantly the column drift seen in the HTML based models (see Figure 5)."}, {"label": "Section-header", "id": 3, "page_no": 7, "cluster": {"id": 3, "label": "Section-header", "bbox": {"l": 134.25577239990233, "t": 161.1968942642212, "r": 319.34708, "b": 171.12788085937495, "coord_origin": "1"}, "confidence": 
0.9505801796913147, "cells": [{"id": 5, "text": "4.3", "bbox": {"l": 134.765, "t": 161.55682000000002, "r": 149.40205, "b": 170.36377000000005, "coord_origin": "1"}}, {"id": 6, "text": "Error-detection and -mitigation", "bbox": {"l": 160.85904, "t": 161.55682000000002, "r": 319.34708, "b": 170.36377000000005, "coord_origin": "1"}}]}, "text": "4.3 Error-detection and -mitigation"}, {"label": "Text", "id": 4, "page_no": 7, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.90631275177, "t": 181.4434524536133, "r": 480.5957599999999, "b": 299.0146385192871, "coord_origin": "1"}, "confidence": 0.9878641366958618, "cells": [{"id": 7, "text": "The design of OTSL allows to validate a table structure easily on an unfinished", "bbox": {"l": 134.765, "t": 182.28179999999998, "r": 480.59572999999995, "b": 191.0788, "coord_origin": "1"}}, {"id": 8, "text": "sequence. The detection of an invalid sequence token is a clear indication of a", "bbox": {"l": 134.765, "t": 194.23779000000002, "r": 480.59473, "b": 203.03479000000004, "coord_origin": "1"}}, {"id": 9, "text": "prediction mistake, however a valid sequence by itself does not guarantee pre-", "bbox": {"l": 134.765, "t": 206.19281, "r": 480.58678999999995, "b": 214.98981000000003, "coord_origin": "1"}}, {"id": 10, "text": "diction correctness. Different heuristics can be used to correct token errors in", "bbox": {"l": 134.765, "t": 218.14783, "r": 480.59177000000005, "b": 226.94482000000005, "coord_origin": "1"}}, {"id": 11, "text": "an invalid sequence and thus increase the chances for accurate predictions. Such", "bbox": {"l": 134.765, "t": 230.10284000000001, "r": 480.58768, "b": 238.89984000000004, "coord_origin": "1"}}, {"id": 12, "text": "heuristics can be applied either after the prediction of each token, or at the end", "bbox": {"l": 134.765, "t": 242.05786, "r": 480.5867, "b": 250.85486000000003, "coord_origin": "1"}}, {"id": 13, "text": "on the entire predicted sequence. 
For example a simple heuristic which can cor-", "bbox": {"l": 134.765, "t": 254.01288, "r": 480.5938100000001, "b": 262.80988, "coord_origin": "1"}}, {"id": 14, "text": "rect the predicted OTSL sequence on-the-fly is to verify if the token with the", "bbox": {"l": 134.765, "t": 265.96887000000004, "r": 480.59069999999997, "b": 274.76586999999995, "coord_origin": "1"}}, {"id": 15, "text": "highest prediction confidence invalidates the predicted sequence, and replace it", "bbox": {"l": 134.765, "t": 277.92389000000003, "r": 480.5957599999999, "b": 286.72086, "coord_origin": "1"}}, {"id": 16, "text": "by the token with the next highest confidence until OTSL rules are satisfied.", "bbox": {"l": 134.765, "t": 289.8788799999999, "r": 469.40369, "b": 298.67584, "coord_origin": "1"}}]}, "text": "The design of OTSL allows to validate a table structure easily on an unfinished sequence. The detection of an invalid sequence token is a clear indication of a prediction mistake, however a valid sequence by itself does not guarantee prediction correctness. Different heuristics can be used to correct token errors in an invalid sequence and thus increase the chances for accurate predictions. Such heuristics can be applied either after the prediction of each token, or at the end on the entire predicted sequence. 
For example a simple heuristic which can correct the predicted OTSL sequence on-the-fly is to verify if the token with the highest prediction confidence invalidates the predicted sequence, and replace it by the token with the next highest confidence until OTSL rules are satisfied."}, {"label": "Section-header", "id": 5, "page_no": 7, "cluster": {"id": 5, "label": "Section-header", "bbox": {"l": 134.63143787384033, "t": 320.433532333374, "r": 229.03534, "b": 332.14910888671875, "coord_origin": "1"}, "confidence": 0.9535147547721863, "cells": [{"id": 17, "text": "5", "bbox": {"l": 134.765, "t": 321.164, "r": 141.4886, "b": 331.73239000000007, "coord_origin": "1"}}, {"id": 18, "text": "Experiments", "bbox": {"l": 154.9382, "t": 321.164, "r": 229.03534, "b": 331.73239000000007, "coord_origin": "1"}}]}, "text": "5 Experiments"}, {"label": "Text", "id": 6, "page_no": 7, "cluster": {"id": 6, "label": "Text", "bbox": {"l": 133.6389286994934, "t": 346.10840950012204, "r": 480.60244274139404, "b": 452.3212429046631, "coord_origin": "1"}, "confidence": 0.9873077869415283, "cells": [{"id": 19, "text": "To evaluate the impact of OTSL on prediction accuracy and inference times, we", "bbox": {"l": 134.765, "t": 347.24985, "r": 480.59375, "b": 356.04681, "coord_origin": "1"}}, {"id": 20, "text": "conducted a series of experiments based on the TableFormer model (Figure 4)", "bbox": {"l": 134.765, "t": 359.2048300000001, "r": 480.59476, "b": 368.0018, "coord_origin": "1"}}, {"id": 21, "text": "with two objectives: Firstly we evaluate the prediction quality and performance", "bbox": {"l": 134.765, "t": 371.15982, "r": 480.58786000000003, "b": 379.95679, "coord_origin": "1"}}, {"id": 22, "text": "of OTSL vs. 
HTML after performing Hyper Parameter Optimization (HPO) on", "bbox": {"l": 134.765, "t": 383.11481000000003, "r": 480.58777, "b": 391.91177, "coord_origin": "1"}}, {"id": 23, "text": "the", "bbox": {"l": 134.765, "t": 395.06978999999995, "r": 148.59807, "b": 403.86676, "coord_origin": "1"}}, {"id": 24, "text": "canonical", "bbox": {"l": 151.627, "t": 395.06978999999995, "r": 191.84703, "b": 403.86676, "coord_origin": "1"}}, {"id": 25, "text": "PubTabNet data set. Secondly we pick the best hyper-parameters", "bbox": {"l": 195.90201, "t": 395.06978999999995, "r": 480.59528, "b": 403.86676, "coord_origin": "1"}}, {"id": 26, "text": "found in the first step and evaluate how OTSL impacts the performance of", "bbox": {"l": 134.76501, "t": 407.02478, "r": 480.59283000000005, "b": 415.82175, "coord_origin": "1"}}, {"id": 27, "text": "TableFormer after training on other publicly available data sets (FinTabNet,", "bbox": {"l": 134.76501, "t": 418.98077, "r": 480.59476, "b": 427.77774, "coord_origin": "1"}}, {"id": 28, "text": "PubTables-1M [14]). The ground truth (GT) from all data sets has been con-", "bbox": {"l": 134.76501, "t": 430.93576, "r": 480.59171, "b": 439.73273, "coord_origin": "1"}}, {"id": 29, "text": "verted into OTSL format for this purpose, and will be made publicly available.", "bbox": {"l": 134.76501, "t": 442.8907500000001, "r": 479.30258, "b": 451.6877099999999, "coord_origin": "1"}}]}, "text": "To evaluate the impact of OTSL on prediction accuracy and inference times, we conducted a series of experiments based on the TableFormer model (Figure 4) with two objectives: Firstly we evaluate the prediction quality and performance of OTSL vs. HTML after performing Hyper Parameter Optimization (HPO) on the canonical PubTabNet data set. Secondly we pick the best hyper-parameters found in the first step and evaluate how OTSL impacts the performance of TableFormer after training on other publicly available data sets (FinTabNet, PubTables-1M [14]). 
The ground truth (GT) from all data sets has been converted into OTSL format for this purpose, and will be made publicly available."}, {"label": "Caption", "id": 7, "page_no": 7, "cluster": {"id": 7, "label": "Caption", "bbox": {"l": 134.0367874145508, "t": 483.7284702301025, "r": 480.59082, "b": 504.30859222412107, "coord_origin": "1"}, "confidence": 0.942292332649231, "cells": [{"id": 30, "text": "Fig. 4.", "bbox": {"l": 134.76501, "t": 484.64813, "r": 162.64424, "b": 492.57443, "coord_origin": "1"}}, {"id": 31, "text": "Architecture sketch of the TableFormer model, which is a representative for the", "bbox": {"l": 165.19601, "t": 484.71091, "r": 480.59082, "b": 492.78067, "coord_origin": "1"}}, {"id": 32, "text": "Im2Seq approach.", "bbox": {"l": 134.76501, "t": 495.66989, "r": 206.70245, "b": 503.73965, "coord_origin": "1"}}]}, "text": "Fig. 4. Architecture sketch of the TableFormer model, which is a representative for the Im2Seq approach."}, {"label": "Picture", "id": 8, "page_no": 7, "cluster": {"id": 8, "label": "Picture", "bbox": {"l": 141.42980690002443, "t": 506.86558113098147, "r": 472.3452730178833, "b": 594.0726608276367, "coord_origin": "1"}, "confidence": 0.970181941986084, "cells": [{"id": 33, "text": "1.", "bbox": {"l": 147.30025, "t": 540.73164, "r": 149.70605, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 34, "text": "Item", "bbox": {"l": 150.90895, "t": 540.73164, "r": 155.72055, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 35, "text": "Amount", "bbox": {"l": 162.75987, "t": 535.3938, "r": 172.2963, "b": 537.76224, "coord_origin": "1"}}, {"id": 36, "text": "Names", "bbox": {"l": 147.63603, "t": 535.3661500000001, "r": 155.91753, "b": 537.73459, "coord_origin": "1"}}, {"id": 37, "text": "1000", "bbox": {"l": 158.48466, "t": 540.73164, "r": 164.10178, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 38, "text": "500", "bbox": {"l": 158.48466, "t": 544.67065, "r": 162.69737, "b": 547.03909, "coord_origin": "1"}}, {"id": 39, 
"text": "3500", "bbox": {"l": 158.48466, "t": 548.91264, "r": 164.10178, "b": 551.28108, "coord_origin": "1"}}, {"id": 40, "text": "150", "bbox": {"l": 158.48466, "t": 553.15465, "r": 162.69737, "b": 555.52309, "coord_origin": "1"}}, {"id": 41, "text": "unit", "bbox": {"l": 168.81696, "t": 540.73164, "r": 172.88876, "b": 543.1000799999999, "coord_origin": "1"}}, {"id": 42, "text": "unit", "bbox": {"l": 168.81696, "t": 544.67065, "r": 172.88876, "b": 547.03909, "coord_origin": "1"}}, {"id": 43, "text": "unit", "bbox": {"l": 168.81696, "t": 548.91264, "r": 172.88876, "b": 551.28108, "coord_origin": "1"}}, {"id": 44, "text": "unit", "bbox": {"l": 168.81696, "t": 553.15465, "r": 172.88876, "b": 555.52309, "coord_origin": "1"}}, {"id": 45, "text": "2.", "bbox": {"l": 147.30025, "t": 544.67065, "r": 149.70605, "b": 547.03909, "coord_origin": "1"}}, {"id": 46, "text": "Item", "bbox": {"l": 150.90895, "t": 544.67065, "r": 155.72055, "b": 547.03909, "coord_origin": "1"}}, {"id": 47, "text": "3.", "bbox": {"l": 147.30025, "t": 548.91264, "r": 149.70605, "b": 551.28108, "coord_origin": "1"}}, {"id": 48, "text": "Item", "bbox": {"l": 150.90895, "t": 548.91264, "r": 155.72055, "b": 551.28108, "coord_origin": "1"}}, {"id": 49, "text": "4.", "bbox": {"l": 147.30025, "t": 553.15465, "r": 149.70605, "b": 555.52309, "coord_origin": "1"}}, {"id": 50, "text": "Item", "bbox": {"l": 150.90895, "t": 553.15465, "r": 155.72055, "b": 555.52309, "coord_origin": "1"}}, {"id": 51, "text": "Extracted", "bbox": {"l": 152.05046, "t": 517.0098, "r": 171.24945, "b": 521.27298, "coord_origin": "1"}}, {"id": 52, "text": "Table Images", "bbox": {"l": 148.13347, "t": 522.3122900000001, "r": 175.16759, "b": 526.57547, "coord_origin": "1"}}, {"id": 53, "text": "Standardized", "bbox": {"l": 193.53331, "t": 524.51422, "r": 220.31973, "b": 528.7774, "coord_origin": "1"}}, {"id": 54, "text": "Images", "bbox": {"l": 199.47311, "t": 529.8167100000001, "r": 214.37889, "b": 534.0799, "coord_origin": "1"}}, 
{"id": 55, "text": "BBox", "bbox": {"l": 273.61066, "t": 509.9053, "r": 284.47275, "b": 514.16849, "coord_origin": "1"}}, {"id": 56, "text": "Decoder", "bbox": {"l": 270.45187, "t": 513.6928399999999, "r": 287.63242, "b": 517.9560200000001, "coord_origin": "1"}}, {"id": 57, "text": "BBoxes", "bbox": {"l": 332.47852, "t": 508.14438, "r": 348.14014, "b": 512.40756, "coord_origin": "1"}}, {"id": 58, "text": "BBoxes can be", "bbox": {"l": 376.68622, "t": 521.12024, "r": 407.25497, "b": 525.38342, "coord_origin": "1"}}, {"id": 59, "text": "traced back to the", "bbox": {"l": 373.90869, "t": 525.66525, "r": 410.03506, "b": 529.92844, "coord_origin": "1"}}, {"id": 60, "text": "original image to", "bbox": {"l": 375.29871, "t": 530.21024, "r": 408.64902, "b": 534.47342, "coord_origin": "1"}}, {"id": 61, "text": "extract content", "bbox": {"l": 377.06747, "t": 534.75522, "r": 406.88312, "b": 539.01843, "coord_origin": "1"}}, {"id": 62, "text": "Structure Tags sequence", "bbox": {"l": 383.56683, "t": 563.24176, "r": 433.76544, "b": 567.50497, "coord_origin": "1"}}, {"id": 63, "text": "provide full description of", "bbox": {"l": 383.52768, "t": 567.78676, "r": 433.80764999999997, "b": 572.04997, "coord_origin": "1"}}, {"id": 64, "text": "the table structure", "bbox": {"l": 390.47522, "t": 572.33177, "r": 426.85703, "b": 576.59499, "coord_origin": "1"}}, {"id": 65, "text": "Structure Tags", "bbox": {"l": 293.94702, "t": 577.89143, "r": 323.1691, "b": 582.15465, "coord_origin": "1"}}, {"id": 66, "text": "in OTSL format", "bbox": {"l": 293.94702, "t": 582.43648, "r": 324.59396, "b": 586.69969, "coord_origin": "1"}}, {"id": 67, "text": "BBoxes in sync", "bbox": {"l": 333.07819, "t": 541.82269, "r": 364.14691, "b": 546.08591, "coord_origin": "1"}}, {"id": 68, "text": "with tag sequence", "bbox": {"l": 333.07819, "t": 545.6102, "r": 369.71542, "b": 549.87341, "coord_origin": "1"}}, {"id": 69, "text": "Encoder", "bbox": {"l": 232.65881000000002, "t": 515.24139, "r": 
249.58894000000004, "b": 519.50458, "coord_origin": "1"}}, {"id": 70, "text": "Structure", "bbox": {"l": 269.8219, "t": 545.97102, "r": 288.26279, "b": 550.23424, "coord_origin": "1"}}, {"id": 71, "text": "Decoder", "bbox": {"l": 270.45187, "t": 549.75851, "r": 287.63242, "b": 554.0217299999999, "coord_origin": "1"}}, {"id": 72, "text": "[x1, y2, x2, y2]", "bbox": {"l": 332.17676, "t": 515.91205, "r": 358.11206, "b": 520.17523, "coord_origin": "1"}}, {"id": 73, "text": "[x1', y2', x2', y2']", "bbox": {"l": 332.17676, "t": 521.9720500000001, "r": 361.58298, "b": 526.23523, "coord_origin": "1"}}, {"id": 74, "text": "[x1'', y2'', x2'', y2'']", "bbox": {"l": 332.17676, "t": 528.03204, "r": 364.76474, "b": 532.29523, "coord_origin": "1"}}, {"id": 75, "text": "...", "bbox": {"l": 332.17676, "t": 534.09204, "r": 335.96548, "b": 538.35524, "coord_origin": "1"}}, {"id": 76, "text": "1", "bbox": {"l": 326.8894, "t": 516.39508, "r": 329.41641, "b": 520.6582599999999, "coord_origin": "1"}}, {"id": 77, "text": "2", "bbox": {"l": 327.04089, "t": 522.4247700000001, "r": 329.5679, "b": 526.68796, "coord_origin": "1"}}, {"id": 78, "text": "3", "bbox": {"l": 327.04089, "t": 528.51508, "r": 329.5679, "b": 532.77826, "coord_origin": "1"}}, {"id": 79, "text": "3", "bbox": {"l": 424.14102, "t": 527.4428399999999, "r": 426.66803, "b": 531.7060200000001, "coord_origin": "1"}}, {"id": 80, "text": "2", "bbox": {"l": 453.0018, "t": 517.4539500000001, "r": 455.52881, "b": 521.71713, "coord_origin": "1"}}, {"id": 81, "text": "1", "bbox": {"l": 423.85825, "t": 517.06281, "r": 426.38525, "b": 521.32599, "coord_origin": "1"}}, {"id": 82, "text": "C", "bbox": {"l": 333.4342, "t": 557.36679, "r": 337.27542, "b": 562.35719, "coord_origin": "1"}}, {"id": 83, "text": "C", "bbox": {"l": 340.35397, "t": 557.31679, "r": 344.19519, "b": 562.30719, "coord_origin": "1"}}, {"id": 84, "text": "C", "bbox": {"l": 340.30978, "t": 563.8653899999999, "r": 344.151, "b": 568.8557900000001, "coord_origin": "1"}}, 
{"id": 85, "text": "C", "bbox": {"l": 346.79904, "t": 563.8686700000001, "r": 350.64026, "b": 568.85907, "coord_origin": "1"}}, {"id": 86, "text": "C", "bbox": {"l": 333.59583, "t": 563.82271, "r": 337.43704, "b": 568.81311, "coord_origin": "1"}}, {"id": 87, "text": "C", "bbox": {"l": 340.37543, "t": 570.42673, "r": 344.21664, "b": 575.41713, "coord_origin": "1"}}, {"id": 88, "text": "C", "bbox": {"l": 346.86469, "t": 570.43001, "r": 350.7059, "b": 575.42041, "coord_origin": "1"}}, {"id": 89, "text": "C", "bbox": {"l": 333.66144, "t": 570.38405, "r": 337.50266, "b": 575.37445, "coord_origin": "1"}}, {"id": 90, "text": "C", "bbox": {"l": 340.37671, "t": 577.02606, "r": 344.21793, "b": 582.0164599999999, "coord_origin": "1"}}, {"id": 91, "text": "C", "bbox": {"l": 346.86597, "t": 577.02934, "r": 350.70718, "b": 582.01974, "coord_origin": "1"}}, {"id": 92, "text": "C", "bbox": {"l": 333.66272, "t": 576.98338, "r": 337.50394, "b": 581.97379, "coord_origin": "1"}}, {"id": 93, "text": "C", "bbox": {"l": 340.27948, "t": 583.39737, "r": 344.1207, "b": 588.38777, "coord_origin": "1"}}, {"id": 94, "text": "C", "bbox": {"l": 346.76874, "t": 583.40068, "r": 350.60995, "b": 588.39108, "coord_origin": "1"}}, {"id": 95, "text": "C", "bbox": {"l": 333.56549, "t": 583.35474, "r": 337.40671, "b": 588.34514, "coord_origin": "1"}}, {"id": 96, "text": "NL", "bbox": {"l": 353.03326, "t": 556.8831299999999, "r": 359.83362, "b": 561.87354, "coord_origin": "1"}}, {"id": 97, "text": "NL", "bbox": {"l": 353.18604, "t": 563.58044, "r": 359.98639, "b": 568.57085, "coord_origin": "1"}}, {"id": 98, "text": "NL", "bbox": {"l": 353.19864, "t": 570.1623500000001, "r": 359.99899, "b": 575.15276, "coord_origin": "1"}}, {"id": 99, "text": "NL", "bbox": {"l": 353.1532, "t": 576.76611, "r": 359.95355, "b": 581.75652, "coord_origin": "1"}}, {"id": 100, "text": "NL", "bbox": {"l": 353.26935, "t": 583.40628, "r": 360.0697, "b": 588.3966800000001, "coord_origin": "1"}}, {"id": 101, "text": "L", "bbox": 
{"l": 347.37979, "t": 557.08235, "r": 350.33786, "b": 562.07275, "coord_origin": "1"}}, {"id": 102, "text": "3", "bbox": {"l": 331.14026, "t": 564.2907700000001, "r": 333.66727, "b": 568.55399, "coord_origin": "1"}}, {"id": 103, "text": "2", "bbox": {"l": 340.80972, "t": 554.59312, "r": 343.33673, "b": 558.85634, "coord_origin": "1"}}, {"id": 104, "text": "1", "bbox": {"l": 330.97992, "t": 554.83035, "r": 333.50693, "b": 559.09357, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "Text", "id": 9, "page_no": 7, "cluster": {"id": 9, "label": "Text", "bbox": {"l": 133.83853654861448, "t": 619.5480606079101, "r": 480.5917400000001, "b": 665.1434852600097, "coord_origin": "1"}, "confidence": 0.9766379594802856, "cells": [{"id": 105, "text": "We rely on standard metrics such as Tree Edit Distance score (TEDs) for", "bbox": {"l": 149.709, "t": 620.19278, "r": 480.58792, "b": 628.98975, "coord_origin": "1"}}, {"id": 106, "text": "table structure prediction, and Mean Average Precision (mAP) with 0.75 Inter-", "bbox": {"l": 134.765, "t": 632.14778, "r": 480.58871, "b": 640.94475, "coord_origin": "1"}}, {"id": 107, "text": "section Over Union (IOU) threshold for the bounding-box predictions of table", "bbox": {"l": 134.765, "t": 644.1027799999999, "r": 480.5917400000001, "b": 652.89975, "coord_origin": "1"}}, {"id": 108, "text": "cells. The predicted OTSL structures were converted back to HTML format in", "bbox": {"l": 134.765, "t": 656.0577900000001, "r": 480.58968999999996, "b": 664.8547599999999, "coord_origin": "1"}}]}, "text": "We rely on standard metrics such as Tree Edit Distance score (TEDs) for table structure prediction, and Mean Average Precision (mAP) with 0.75 Intersection Over Union (IOU) threshold for the bounding-box predictions of table cells. 
The predicted OTSL structures were converted back to HTML format in"}], "headers": [{"label": "Page-header", "id": 0, "page_no": 7, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.19006814956666, "t": 93.66879501342771, "r": 139.46353826522827, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.7701128125190735, "cells": [{"id": 0, "text": "8", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 139.37193, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "8"}, {"label": "Page-header", "id": 1, "page_no": 7, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 167.4087100982666, "t": 92.9255227088928, "r": 231.72227, "b": 101.94012937545779, "coord_origin": "1"}, "confidence": 0.8090205788612366, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.81335, "t": 93.77099999999996, "r": 178.07675, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37415, "t": 93.77099999999996, "r": 231.72227, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "M. Lysak, et al."}]}}, {"page_no": 8, "page_hash": "d50e5f3b8b4d1d5b04d5b253b187da6f40784bee5bf36b7eaefcabbc89e7b7a9", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "9", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "order to compute the TED score. 
Inference timing results for all experiments", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.5936899999999, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "were obtained from the same machine on a single core with AMD EPYC 7763", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.59579, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "CPU @2.45 GHz.", "bbox": {"l": 134.765, "t": 142.84479, "r": 210.78462, "b": 151.64178000000004, "coord_origin": "1"}}, {"id": 5, "text": "5.1", "bbox": {"l": 134.765, "t": 169.18584999999996, "r": 149.40205, "b": 177.9928, "coord_origin": "1"}}, {"id": 6, "text": "Hyper Parameter Optimization", "bbox": {"l": 160.85904, "t": 169.18584999999996, "r": 318.44843, "b": 177.9928, "coord_origin": "1"}}, {"id": 7, "text": "We have chosen the PubTabNet data set to perform HPO, since it includes a", "bbox": {"l": 134.765, "t": 185.58582, "r": 480.59183, "b": 194.38280999999995, "coord_origin": "1"}}, {"id": 8, "text": "highly diverse set of tables. Also we report TED scores separately for simple and", "bbox": {"l": 134.765, "t": 197.54083000000003, "r": 480.59183, "b": 206.33783000000005, "coord_origin": "1"}}, {"id": 9, "text": "complex tables (tables with cell spans). Results are presented in Table. 1. It is", "bbox": {"l": 134.765, "t": 209.49585000000002, "r": 480.59177000000005, "b": 218.29285000000004, "coord_origin": "1"}}, {"id": 10, "text": "evident that with OTSL, our model achieves the same TED score and slightly", "bbox": {"l": 134.765, "t": 221.45087, "r": 480.59277, "b": 230.24785999999995, "coord_origin": "1"}}, {"id": 11, "text": "better mAP scores in comparison to HTML. 
However OTSL yields a", "bbox": {"l": 134.765, "t": 233.40588000000002, "r": 440.94159, "b": 242.20288000000005, "coord_origin": "1"}}, {"id": 12, "text": "2x speed", "bbox": {"l": 444.86798, "t": 233.40588000000002, "r": 480.58786000000003, "b": 242.20288000000005, "coord_origin": "1"}}, {"id": 13, "text": "up", "bbox": {"l": 134.76498, "t": 245.36188000000004, "r": 145.20081, "b": 254.15886999999998, "coord_origin": "1"}}, {"id": 14, "text": "in the inference runtime over HTML.", "bbox": {"l": 149.14899, "t": 245.36188000000004, "r": 311.21957, "b": 254.15886999999998, "coord_origin": "1"}}, {"id": 15, "text": "Table", "bbox": {"l": 134.76498, "t": 275.07232999999997, "r": 160.11836, "b": 282.9986, "coord_origin": "1"}}, {"id": 16, "text": "1.", "bbox": {"l": 167.34528, "t": 275.07232999999997, "r": 175.59526, "b": 282.9986, "coord_origin": "1"}}, {"id": 17, "text": "HPO performed in OTSL and HTML representation on the same", "bbox": {"l": 188.13298, "t": 275.13507000000004, "r": 480.59365999999994, "b": 283.2048300000001, "coord_origin": "1"}}, {"id": 18, "text": "transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. 
Ef-", "bbox": {"l": 134.76498, "t": 286.09409, "r": 480.59444999999994, "b": 294.16385, "coord_origin": "1"}}, {"id": 19, "text": "fects of reducing the # of layers in encoder and decoder stages of the model show that", "bbox": {"l": 134.76498, "t": 297.05307, "r": 480.5954, "b": 305.12283, "coord_origin": "1"}}, {"id": 20, "text": "smaller models trained on OTSL perform better, especially in recognizing complex", "bbox": {"l": 134.76498, "t": 308.01205, "r": 480.59451, "b": 316.08182, "coord_origin": "1"}}, {"id": 21, "text": "table structures, and maintain a much higher mAP score than the HTML counterpart.", "bbox": {"l": 134.76498, "t": 318.97104, "r": 480.59441999999996, "b": 327.0408, "coord_origin": "1"}}, {"id": 22, "text": "#", "bbox": {"l": 160.37, "t": 341.73495, "r": 168.04793, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 23, "text": "enc-layers", "bbox": {"l": 144.592, "t": 354.68594, "r": 183.82806, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 24, "text": "#", "bbox": {"l": 207.974, "t": 341.73495, "r": 215.65193, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 25, "text": "dec-layers", "bbox": {"l": 192.19499, "t": 354.68594, "r": 231.43106, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 26, "text": "Language", "bbox": {"l": 239.79799999999997, "t": 347.21396, "r": 278.31766, "b": 355.28372, "coord_origin": "1"}}, {"id": 27, "text": "TEDs", "bbox": {"l": 324.67001, "t": 341.73495, "r": 348.26419, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 28, "text": "mAP", "bbox": {"l": 396.271, "t": 341.73495, "r": 417.12683, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 29, "text": "(0.75)", "bbox": {"l": 394.927, "t": 352.69394000000005, "r": 418.47278, "b": 360.7637, "coord_origin": "1"}}, {"id": 30, "text": "Inference", "bbox": {"l": 430.771, "t": 341.73495, "r": 467.1423, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 31, "text": "time (secs)", "bbox": {"l": 427.14801, "t": 352.69394000000005, "r": 
470.76056, "b": 360.7637, "coord_origin": "1"}}, {"id": 32, "text": "simple", "bbox": {"l": 286.686, "t": 354.68594, "r": 312.33261, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 33, "text": "complex", "bbox": {"l": 320.702, "t": 354.68594, "r": 353.71988, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 34, "text": "all", "bbox": {"l": 369.306, "t": 354.68594, "r": 379.03094, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 35, "text": "6", "bbox": {"l": 161.90601, "t": 373.51596, "r": 166.51294, "b": 381.58572, "coord_origin": "1"}}, {"id": 36, "text": "6", "bbox": {"l": 209.509, "t": 373.51596, "r": 214.11594, "b": 381.58572, "coord_origin": "1"}}, {"id": 37, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 368.03595, "r": 271.40527, "b": 376.10571, "coord_origin": "1"}}, {"id": 38, "text": "0.965", "bbox": {"l": 289.017, "t": 368.03595, "r": 310.00375, "b": 376.10571, "coord_origin": "1"}}, {"id": 39, "text": "0.934", "bbox": {"l": 326.71701, "t": 368.03595, "r": 347.70377, "b": 376.10571, "coord_origin": "1"}}, {"id": 40, "text": "0.955", "bbox": {"l": 363.67599, "t": 368.03595, "r": 384.66275, "b": 376.10571, "coord_origin": "1"}}, {"id": 41, "text": "0.88", "bbox": {"l": 397.26999, "t": 367.97317999999996, "r": 416.12723, "b": 375.89948, "coord_origin": "1"}}, {"id": 42, "text": "2.73", "bbox": {"l": 439.52701, "t": 367.97317999999996, "r": 458.38425, "b": 375.89948, "coord_origin": "1"}}, {"id": 43, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 380.98795, "r": 272.93954, "b": 389.05771, "coord_origin": "1"}}, {"id": 44, "text": "0.969", "bbox": {"l": 289.017, "t": 380.98795, "r": 310.00375, "b": 389.05771, "coord_origin": "1"}}, {"id": 45, "text": "0.927", "bbox": {"l": 326.71701, "t": 380.98795, "r": 347.70377, "b": 389.05771, "coord_origin": "1"}}, {"id": 46, "text": "0.955", "bbox": {"l": 363.67599, "t": 380.98795, "r": 384.66275, "b": 389.05771, "coord_origin": "1"}}, {"id": 47, "text": "0.857", "bbox": {"l": 
396.20599, "t": 380.98795, "r": 417.19275, "b": 389.05771, "coord_origin": "1"}}, {"id": 48, "text": "5.39", "bbox": {"l": 440.767, "t": 380.98795, "r": 457.14682, "b": 389.05771, "coord_origin": "1"}}, {"id": 49, "text": "4", "bbox": {"l": 161.90601, "t": 399.81696, "r": 166.51294, "b": 407.88672, "coord_origin": "1"}}, {"id": 50, "text": "4", "bbox": {"l": 209.509, "t": 399.81696, "r": 214.11594, "b": 407.88672, "coord_origin": "1"}}, {"id": 51, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 394.33795, "r": 271.40527, "b": 402.40771, "coord_origin": "1"}}, {"id": 52, "text": "0.938", "bbox": {"l": 289.017, "t": 394.33795, "r": 310.00375, "b": 402.40771, "coord_origin": "1"}}, {"id": 53, "text": "0.904", "bbox": {"l": 326.71701, "t": 394.33795, "r": 347.70377, "b": 402.40771, "coord_origin": "1"}}, {"id": 54, "text": "0.927", "bbox": {"l": 363.67599, "t": 394.33795, "r": 384.66275, "b": 402.40771, "coord_origin": "1"}}, {"id": 55, "text": "0.853", "bbox": {"l": 394.61801, "t": 394.27518, "r": 418.77887, "b": 402.20148, "coord_origin": "1"}}, {"id": 56, "text": "1.97", "bbox": {"l": 439.52701, "t": 394.27518, "r": 458.38425, "b": 402.20148, "coord_origin": "1"}}, {"id": 57, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 407.28894, "r": 272.93954, "b": 415.3587, "coord_origin": "1"}}, {"id": 58, "text": "0.952", "bbox": {"l": 289.017, "t": 407.28894, "r": 310.00375, "b": 415.3587, "coord_origin": "1"}}, {"id": 59, "text": "0.909", "bbox": {"l": 326.71701, "t": 407.28894, "r": 347.70377, "b": 415.3587, "coord_origin": "1"}}, {"id": 60, "text": "0.938", "bbox": {"l": 362.08801, "t": 407.22617, "r": 386.24887, "b": 415.15247, "coord_origin": "1"}}, {"id": 61, "text": "0.843", "bbox": {"l": 396.20599, "t": 407.28894, "r": 417.19275, "b": 415.3587, "coord_origin": "1"}}, {"id": 62, "text": "3.77", "bbox": {"l": 440.767, "t": 407.28894, "r": 457.14682, "b": 415.3587, "coord_origin": "1"}}, {"id": 63, "text": "2", "bbox": {"l": 161.90601, "t": 426.11795, 
"r": 166.51294, "b": 434.1877099999999, "coord_origin": "1"}}, {"id": 64, "text": "4", "bbox": {"l": 209.509, "t": 426.11795, "r": 214.11594, "b": 434.1877099999999, "coord_origin": "1"}}, {"id": 65, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 420.63895, "r": 271.40527, "b": 428.70871, "coord_origin": "1"}}, {"id": 66, "text": "0.923", "bbox": {"l": 289.017, "t": 420.63895, "r": 310.00375, "b": 428.70871, "coord_origin": "1"}}, {"id": 67, "text": "0.897", "bbox": {"l": 326.71701, "t": 420.63895, "r": 347.70377, "b": 428.70871, "coord_origin": "1"}}, {"id": 68, "text": "0.915", "bbox": {"l": 363.67599, "t": 420.63895, "r": 384.66275, "b": 428.70871, "coord_origin": "1"}}, {"id": 69, "text": "0.859", "bbox": {"l": 394.61801, "t": 420.57617, "r": 418.77887, "b": 428.50247, "coord_origin": "1"}}, {"id": 70, "text": "1.91", "bbox": {"l": 439.52701, "t": 420.57617, "r": 458.38425, "b": 428.50247, "coord_origin": "1"}}, {"id": 71, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 433.58994, "r": 272.93954, "b": 441.6597, "coord_origin": "1"}}, {"id": 72, "text": "0.945", "bbox": {"l": 289.017, "t": 433.58994, "r": 310.00375, "b": 441.6597, "coord_origin": "1"}}, {"id": 73, "text": "0.901", "bbox": {"l": 326.71701, "t": 433.58994, "r": 347.70377, "b": 441.6597, "coord_origin": "1"}}, {"id": 74, "text": "0.931", "bbox": {"l": 362.08801, "t": 433.5271599999999, "r": 386.24887, "b": 441.45346, "coord_origin": "1"}}, {"id": 75, "text": "0.834", "bbox": {"l": 396.20599, "t": 433.58994, "r": 417.19275, "b": 441.6597, "coord_origin": "1"}}, {"id": 76, "text": "3.81", "bbox": {"l": 440.767, "t": 433.58994, "r": 457.14682, "b": 441.6597, "coord_origin": "1"}}, {"id": 77, "text": "4", "bbox": {"l": 161.90601, "t": 452.41995, "r": 166.51294, "b": 460.48972, "coord_origin": "1"}}, {"id": 78, "text": "2", "bbox": {"l": 209.509, "t": 452.41995, "r": 214.11594, "b": 460.48972, "coord_origin": "1"}}, {"id": 79, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 
446.9399399999999, "r": 271.40527, "b": 455.0097, "coord_origin": "1"}}, {"id": 80, "text": "0.952", "bbox": {"l": 289.017, "t": 446.9399399999999, "r": 310.00375, "b": 455.0097, "coord_origin": "1"}}, {"id": 81, "text": "0.92", "bbox": {"l": 329.021, "t": 446.9399399999999, "r": 345.40082, "b": 455.0097, "coord_origin": "1"}}, {"id": 82, "text": "0.942", "bbox": {"l": 362.08801, "t": 446.87717, "r": 386.24887, "b": 454.80347, "coord_origin": "1"}}, {"id": 83, "text": "0.857", "bbox": {"l": 394.61801, "t": 446.87717, "r": 418.77887, "b": 454.80347, "coord_origin": "1"}}, {"id": 84, "text": "1.22", "bbox": {"l": 439.52701, "t": 446.87717, "r": 458.38425, "b": 454.80347, "coord_origin": "1"}}, {"id": 85, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 459.8919399999999, "r": 272.93954, "b": 467.9617, "coord_origin": "1"}}, {"id": 86, "text": "0.944", "bbox": {"l": 289.017, "t": 459.8919399999999, "r": 310.00375, "b": 467.9617, "coord_origin": "1"}}, {"id": 87, "text": "0.903", "bbox": {"l": 326.71701, "t": 459.8919399999999, "r": 347.70377, "b": 467.9617, "coord_origin": "1"}}, {"id": 88, "text": "0.931", "bbox": {"l": 363.67599, "t": 459.8919399999999, "r": 384.66275, "b": 467.9617, "coord_origin": "1"}}, {"id": 89, "text": "0.824", "bbox": {"l": 396.20599, "t": 459.8919399999999, "r": 417.19275, "b": 467.9617, "coord_origin": "1"}}, {"id": 90, "text": "2", "bbox": {"l": 446.65302, "t": 459.8919399999999, "r": 451.25995, "b": 467.9617, "coord_origin": "1"}}, {"id": 91, "text": "5.2", "bbox": {"l": 134.765, "t": 508.15179, "r": 149.40205, "b": 516.95874, "coord_origin": "1"}}, {"id": 92, "text": "Quantitative Results", "bbox": {"l": 160.85904, "t": 508.15179, "r": 264.40332, "b": 516.95874, "coord_origin": "1"}}, {"id": 93, "text": "We picked the model parameter configuration that produced the best prediction", "bbox": {"l": 134.765, "t": 524.55078, "r": 480.59075999999993, "b": 533.34775, "coord_origin": "1"}}, {"id": 94, "text": "quality (enc=6, dec=6, 
heads=8) with PubTabNet alone, then independently", "bbox": {"l": 134.765, "t": 536.50677, "r": 480.58675999999997, "b": 545.3037400000001, "coord_origin": "1"}}, {"id": 95, "text": "trained and evaluated it on three publicly available data sets: PubTabNet (395k", "bbox": {"l": 134.765, "t": 548.4617800000001, "r": 480.59572999999995, "b": 557.25874, "coord_origin": "1"}}, {"id": 96, "text": "samples), FinTabNet (113k samples) and PubTables-1M (about 1M samples).", "bbox": {"l": 134.765, "t": 560.41678, "r": 480.59177000000005, "b": 569.21375, "coord_origin": "1"}}, {"id": 97, "text": "Performance results are presented in Table. 2. It is clearly evident that the model", "bbox": {"l": 134.765, "t": 572.37178, "r": 480.59069999999997, "b": 581.16875, "coord_origin": "1"}}, {"id": 98, "text": "trained on OTSL outperforms HTML across the board, keeping high TEDs and", "bbox": {"l": 134.765, "t": 584.32678, "r": 480.5957599999999, "b": 593.12375, "coord_origin": "1"}}, {"id": 99, "text": "mAP scores even on difficult financial tables (FinTabNet) that contain sparse", "bbox": {"l": 134.765, "t": 596.28278, "r": 480.58774, "b": 605.07974, "coord_origin": "1"}}, {"id": 100, "text": "and large tables.", "bbox": {"l": 134.765, "t": 608.2377799999999, "r": 206.78664, "b": 617.03474, "coord_origin": "1"}}, {"id": 101, "text": "Additionally, the results show that OTSL has an advantage over HTML", "bbox": {"l": 149.709, "t": 620.19278, "r": 480.59271, "b": 628.98975, "coord_origin": "1"}}, {"id": 102, "text": "when applied on a bigger data set like PubTables-1M and achieves significantly", "bbox": {"l": 134.765, "t": 632.14778, "r": 480.5957599999999, "b": 640.94475, "coord_origin": "1"}}, {"id": 103, "text": "improved scores. 
Finally, OTSL achieves faster inference due to fewer decoding", "bbox": {"l": 134.765, "t": 644.1027799999999, "r": 480.59283000000005, "b": 652.89975, "coord_origin": "1"}}, {"id": 104, "text": "steps which is a result of the reduced sequence representation.", "bbox": {"l": 134.765, "t": 656.0577900000001, "r": 405.79651, "b": 664.8547599999999, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 193.94394721984864, "t": 93.11655950546265, "r": 447.54291000000006, "b": 102.24131870269775, "coord_origin": "1"}, "confidence": 0.9502049088478088, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 474.9051853179932, "t": 93.4998132705689, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.870819091796875, "cells": [{"id": 1, "text": "9", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "Text", "bbox": {"l": 133.90584583282472, "t": 118.23915395736697, "r": 480.59579, "b": 151.64178000000004, "coord_origin": "1"}, "confidence": 0.9810612201690674, "cells": [{"id": 2, "text": "order to compute the TED score. 
Inference timing results for all experiments", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.5936899999999, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "were obtained from the same machine on a single core with AMD EPYC 7763", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.59579, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "CPU @2.45 GHz.", "bbox": {"l": 134.765, "t": 142.84479, "r": 210.78462, "b": 151.64178000000004, "coord_origin": "1"}}]}, {"id": 3, "label": "Section-header", "bbox": {"l": 134.28504238128662, "t": 168.39932327270503, "r": 318.44843, "b": 178.3033452987671, "coord_origin": "1"}, "confidence": 0.9505251049995422, "cells": [{"id": 5, "text": "5.1", "bbox": {"l": 134.765, "t": 169.18584999999996, "r": 149.40205, "b": 177.9928, "coord_origin": "1"}}, {"id": 6, "text": "Hyper Parameter Optimization", "bbox": {"l": 160.85904, "t": 169.18584999999996, "r": 318.44843, "b": 177.9928, "coord_origin": "1"}}]}, {"id": 4, "label": "Text", "bbox": {"l": 133.80440769195556, "t": 184.85479145050044, "r": 481.1519771575928, "b": 254.36992263793945, "coord_origin": "1"}, "confidence": 0.9858020544052124, "cells": [{"id": 7, "text": "We have chosen the PubTabNet data set to perform HPO, since it includes a", "bbox": {"l": 134.765, "t": 185.58582, "r": 480.59183, "b": 194.38280999999995, "coord_origin": "1"}}, {"id": 8, "text": "highly diverse set of tables. Also we report TED scores separately for simple and", "bbox": {"l": 134.765, "t": 197.54083000000003, "r": 480.59183, "b": 206.33783000000005, "coord_origin": "1"}}, {"id": 9, "text": "complex tables (tables with cell spans). Results are presented in Table. 1. 
It is", "bbox": {"l": 134.765, "t": 209.49585000000002, "r": 480.59177000000005, "b": 218.29285000000004, "coord_origin": "1"}}, {"id": 10, "text": "evident that with OTSL, our model achieves the same TED score and slightly", "bbox": {"l": 134.765, "t": 221.45087, "r": 480.59277, "b": 230.24785999999995, "coord_origin": "1"}}, {"id": 11, "text": "better mAP scores in comparison to HTML. However OTSL yields a", "bbox": {"l": 134.765, "t": 233.40588000000002, "r": 440.94159, "b": 242.20288000000005, "coord_origin": "1"}}, {"id": 12, "text": "2x speed", "bbox": {"l": 444.86798, "t": 233.40588000000002, "r": 480.58786000000003, "b": 242.20288000000005, "coord_origin": "1"}}, {"id": 13, "text": "up", "bbox": {"l": 134.76498, "t": 245.36188000000004, "r": 145.20081, "b": 254.15886999999998, "coord_origin": "1"}}, {"id": 14, "text": "in the inference runtime over HTML.", "bbox": {"l": 149.14899, "t": 245.36188000000004, "r": 311.21957, "b": 254.15886999999998, "coord_origin": "1"}}]}, {"id": 5, "label": "Caption", "bbox": {"l": 133.88543272018433, "t": 274.21845130920406, "r": 480.5954, "b": 327.4440181732178, "coord_origin": "1"}, "confidence": 0.9517639875411987, "cells": [{"id": 15, "text": "Table", "bbox": {"l": 134.76498, "t": 275.07232999999997, "r": 160.11836, "b": 282.9986, "coord_origin": "1"}}, {"id": 16, "text": "1.", "bbox": {"l": 167.34528, "t": 275.07232999999997, "r": 175.59526, "b": 282.9986, "coord_origin": "1"}}, {"id": 17, "text": "HPO performed in OTSL and HTML representation on the same", "bbox": {"l": 188.13298, "t": 275.13507000000004, "r": 480.59365999999994, "b": 283.2048300000001, "coord_origin": "1"}}, {"id": 18, "text": "transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. 
Ef-", "bbox": {"l": 134.76498, "t": 286.09409, "r": 480.59444999999994, "b": 294.16385, "coord_origin": "1"}}, {"id": 19, "text": "fects of reducing the # of layers in encoder and decoder stages of the model show that", "bbox": {"l": 134.76498, "t": 297.05307, "r": 480.5954, "b": 305.12283, "coord_origin": "1"}}, {"id": 20, "text": "smaller models trained on OTSL perform better, especially in recognizing complex", "bbox": {"l": 134.76498, "t": 308.01205, "r": 480.59451, "b": 316.08182, "coord_origin": "1"}}, {"id": 21, "text": "table structures, and maintain a much higher mAP score than the HTML counterpart.", "bbox": {"l": 134.76498, "t": 318.97104, "r": 480.59441999999996, "b": 327.0408, "coord_origin": "1"}}]}, {"id": 6, "label": "Table", "bbox": {"l": 139.82041025161743, "t": 337.08411598205566, "r": 474.8002452850342, "b": 469.7329902648926, "coord_origin": "1"}, "confidence": 0.990551233291626, "cells": [{"id": 22, "text": "#", "bbox": {"l": 160.37, "t": 341.73495, "r": 168.04793, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 23, "text": "enc-layers", "bbox": {"l": 144.592, "t": 354.68594, "r": 183.82806, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 24, "text": "#", "bbox": {"l": 207.974, "t": 341.73495, "r": 215.65193, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 25, "text": "dec-layers", "bbox": {"l": 192.19499, "t": 354.68594, "r": 231.43106, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 26, "text": "Language", "bbox": {"l": 239.79799999999997, "t": 347.21396, "r": 278.31766, "b": 355.28372, "coord_origin": "1"}}, {"id": 27, "text": "TEDs", "bbox": {"l": 324.67001, "t": 341.73495, "r": 348.26419, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 28, "text": "mAP", "bbox": {"l": 396.271, "t": 341.73495, "r": 417.12683, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 29, "text": "(0.75)", "bbox": {"l": 394.927, "t": 352.69394000000005, "r": 418.47278, "b": 360.7637, "coord_origin": "1"}}, {"id": 30, "text": 
"Inference", "bbox": {"l": 430.771, "t": 341.73495, "r": 467.1423, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 31, "text": "time (secs)", "bbox": {"l": 427.14801, "t": 352.69394000000005, "r": 470.76056, "b": 360.7637, "coord_origin": "1"}}, {"id": 32, "text": "simple", "bbox": {"l": 286.686, "t": 354.68594, "r": 312.33261, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 33, "text": "complex", "bbox": {"l": 320.702, "t": 354.68594, "r": 353.71988, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 34, "text": "all", "bbox": {"l": 369.306, "t": 354.68594, "r": 379.03094, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 35, "text": "6", "bbox": {"l": 161.90601, "t": 373.51596, "r": 166.51294, "b": 381.58572, "coord_origin": "1"}}, {"id": 36, "text": "6", "bbox": {"l": 209.509, "t": 373.51596, "r": 214.11594, "b": 381.58572, "coord_origin": "1"}}, {"id": 37, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 368.03595, "r": 271.40527, "b": 376.10571, "coord_origin": "1"}}, {"id": 38, "text": "0.965", "bbox": {"l": 289.017, "t": 368.03595, "r": 310.00375, "b": 376.10571, "coord_origin": "1"}}, {"id": 39, "text": "0.934", "bbox": {"l": 326.71701, "t": 368.03595, "r": 347.70377, "b": 376.10571, "coord_origin": "1"}}, {"id": 40, "text": "0.955", "bbox": {"l": 363.67599, "t": 368.03595, "r": 384.66275, "b": 376.10571, "coord_origin": "1"}}, {"id": 41, "text": "0.88", "bbox": {"l": 397.26999, "t": 367.97317999999996, "r": 416.12723, "b": 375.89948, "coord_origin": "1"}}, {"id": 42, "text": "2.73", "bbox": {"l": 439.52701, "t": 367.97317999999996, "r": 458.38425, "b": 375.89948, "coord_origin": "1"}}, {"id": 43, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 380.98795, "r": 272.93954, "b": 389.05771, "coord_origin": "1"}}, {"id": 44, "text": "0.969", "bbox": {"l": 289.017, "t": 380.98795, "r": 310.00375, "b": 389.05771, "coord_origin": "1"}}, {"id": 45, "text": "0.927", "bbox": {"l": 326.71701, "t": 380.98795, "r": 347.70377, "b": 
389.05771, "coord_origin": "1"}}, {"id": 46, "text": "0.955", "bbox": {"l": 363.67599, "t": 380.98795, "r": 384.66275, "b": 389.05771, "coord_origin": "1"}}, {"id": 47, "text": "0.857", "bbox": {"l": 396.20599, "t": 380.98795, "r": 417.19275, "b": 389.05771, "coord_origin": "1"}}, {"id": 48, "text": "5.39", "bbox": {"l": 440.767, "t": 380.98795, "r": 457.14682, "b": 389.05771, "coord_origin": "1"}}, {"id": 49, "text": "4", "bbox": {"l": 161.90601, "t": 399.81696, "r": 166.51294, "b": 407.88672, "coord_origin": "1"}}, {"id": 50, "text": "4", "bbox": {"l": 209.509, "t": 399.81696, "r": 214.11594, "b": 407.88672, "coord_origin": "1"}}, {"id": 51, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 394.33795, "r": 271.40527, "b": 402.40771, "coord_origin": "1"}}, {"id": 52, "text": "0.938", "bbox": {"l": 289.017, "t": 394.33795, "r": 310.00375, "b": 402.40771, "coord_origin": "1"}}, {"id": 53, "text": "0.904", "bbox": {"l": 326.71701, "t": 394.33795, "r": 347.70377, "b": 402.40771, "coord_origin": "1"}}, {"id": 54, "text": "0.927", "bbox": {"l": 363.67599, "t": 394.33795, "r": 384.66275, "b": 402.40771, "coord_origin": "1"}}, {"id": 55, "text": "0.853", "bbox": {"l": 394.61801, "t": 394.27518, "r": 418.77887, "b": 402.20148, "coord_origin": "1"}}, {"id": 56, "text": "1.97", "bbox": {"l": 439.52701, "t": 394.27518, "r": 458.38425, "b": 402.20148, "coord_origin": "1"}}, {"id": 57, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 407.28894, "r": 272.93954, "b": 415.3587, "coord_origin": "1"}}, {"id": 58, "text": "0.952", "bbox": {"l": 289.017, "t": 407.28894, "r": 310.00375, "b": 415.3587, "coord_origin": "1"}}, {"id": 59, "text": "0.909", "bbox": {"l": 326.71701, "t": 407.28894, "r": 347.70377, "b": 415.3587, "coord_origin": "1"}}, {"id": 60, "text": "0.938", "bbox": {"l": 362.08801, "t": 407.22617, "r": 386.24887, "b": 415.15247, "coord_origin": "1"}}, {"id": 61, "text": "0.843", "bbox": {"l": 396.20599, "t": 407.28894, "r": 417.19275, "b": 415.3587, 
"coord_origin": "1"}}, {"id": 62, "text": "3.77", "bbox": {"l": 440.767, "t": 407.28894, "r": 457.14682, "b": 415.3587, "coord_origin": "1"}}, {"id": 63, "text": "2", "bbox": {"l": 161.90601, "t": 426.11795, "r": 166.51294, "b": 434.1877099999999, "coord_origin": "1"}}, {"id": 64, "text": "4", "bbox": {"l": 209.509, "t": 426.11795, "r": 214.11594, "b": 434.1877099999999, "coord_origin": "1"}}, {"id": 65, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 420.63895, "r": 271.40527, "b": 428.70871, "coord_origin": "1"}}, {"id": 66, "text": "0.923", "bbox": {"l": 289.017, "t": 420.63895, "r": 310.00375, "b": 428.70871, "coord_origin": "1"}}, {"id": 67, "text": "0.897", "bbox": {"l": 326.71701, "t": 420.63895, "r": 347.70377, "b": 428.70871, "coord_origin": "1"}}, {"id": 68, "text": "0.915", "bbox": {"l": 363.67599, "t": 420.63895, "r": 384.66275, "b": 428.70871, "coord_origin": "1"}}, {"id": 69, "text": "0.859", "bbox": {"l": 394.61801, "t": 420.57617, "r": 418.77887, "b": 428.50247, "coord_origin": "1"}}, {"id": 70, "text": "1.91", "bbox": {"l": 439.52701, "t": 420.57617, "r": 458.38425, "b": 428.50247, "coord_origin": "1"}}, {"id": 71, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 433.58994, "r": 272.93954, "b": 441.6597, "coord_origin": "1"}}, {"id": 72, "text": "0.945", "bbox": {"l": 289.017, "t": 433.58994, "r": 310.00375, "b": 441.6597, "coord_origin": "1"}}, {"id": 73, "text": "0.901", "bbox": {"l": 326.71701, "t": 433.58994, "r": 347.70377, "b": 441.6597, "coord_origin": "1"}}, {"id": 74, "text": "0.931", "bbox": {"l": 362.08801, "t": 433.5271599999999, "r": 386.24887, "b": 441.45346, "coord_origin": "1"}}, {"id": 75, "text": "0.834", "bbox": {"l": 396.20599, "t": 433.58994, "r": 417.19275, "b": 441.6597, "coord_origin": "1"}}, {"id": 76, "text": "3.81", "bbox": {"l": 440.767, "t": 433.58994, "r": 457.14682, "b": 441.6597, "coord_origin": "1"}}, {"id": 77, "text": "4", "bbox": {"l": 161.90601, "t": 452.41995, "r": 166.51294, "b": 460.48972, 
"coord_origin": "1"}}, {"id": 78, "text": "2", "bbox": {"l": 209.509, "t": 452.41995, "r": 214.11594, "b": 460.48972, "coord_origin": "1"}}, {"id": 79, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 446.9399399999999, "r": 271.40527, "b": 455.0097, "coord_origin": "1"}}, {"id": 80, "text": "0.952", "bbox": {"l": 289.017, "t": 446.9399399999999, "r": 310.00375, "b": 455.0097, "coord_origin": "1"}}, {"id": 81, "text": "0.92", "bbox": {"l": 329.021, "t": 446.9399399999999, "r": 345.40082, "b": 455.0097, "coord_origin": "1"}}, {"id": 82, "text": "0.942", "bbox": {"l": 362.08801, "t": 446.87717, "r": 386.24887, "b": 454.80347, "coord_origin": "1"}}, {"id": 83, "text": "0.857", "bbox": {"l": 394.61801, "t": 446.87717, "r": 418.77887, "b": 454.80347, "coord_origin": "1"}}, {"id": 84, "text": "1.22", "bbox": {"l": 439.52701, "t": 446.87717, "r": 458.38425, "b": 454.80347, "coord_origin": "1"}}, {"id": 85, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 459.8919399999999, "r": 272.93954, "b": 467.9617, "coord_origin": "1"}}, {"id": 86, "text": "0.944", "bbox": {"l": 289.017, "t": 459.8919399999999, "r": 310.00375, "b": 467.9617, "coord_origin": "1"}}, {"id": 87, "text": "0.903", "bbox": {"l": 326.71701, "t": 459.8919399999999, "r": 347.70377, "b": 467.9617, "coord_origin": "1"}}, {"id": 88, "text": "0.931", "bbox": {"l": 363.67599, "t": 459.8919399999999, "r": 384.66275, "b": 467.9617, "coord_origin": "1"}}, {"id": 89, "text": "0.824", "bbox": {"l": 396.20599, "t": 459.8919399999999, "r": 417.19275, "b": 467.9617, "coord_origin": "1"}}, {"id": 90, "text": "2", "bbox": {"l": 446.65302, "t": 459.8919399999999, "r": 451.25995, "b": 467.9617, "coord_origin": "1"}}]}, {"id": 7, "label": "Section-header", "bbox": {"l": 134.48985929489137, "t": 507.6188140869141, "r": 264.40332, "b": 517.7784141540527, "coord_origin": "1"}, "confidence": 0.9545547962188721, "cells": [{"id": 91, "text": "5.2", "bbox": {"l": 134.765, "t": 508.15179, "r": 149.40205, "b": 516.95874, 
"coord_origin": "1"}}, {"id": 92, "text": "Quantitative Results", "bbox": {"l": 160.85904, "t": 508.15179, "r": 264.40332, "b": 516.95874, "coord_origin": "1"}}]}, {"id": 8, "label": "Text", "bbox": {"l": 133.97792644500734, "t": 523.5121616363525, "r": 480.5957599999999, "b": 617.5317226409912, "coord_origin": "1"}, "confidence": 0.9885255098342896, "cells": [{"id": 93, "text": "We picked the model parameter configuration that produced the best prediction", "bbox": {"l": 134.765, "t": 524.55078, "r": 480.59075999999993, "b": 533.34775, "coord_origin": "1"}}, {"id": 94, "text": "quality (enc=6, dec=6, heads=8) with PubTabNet alone, then independently", "bbox": {"l": 134.765, "t": 536.50677, "r": 480.58675999999997, "b": 545.3037400000001, "coord_origin": "1"}}, {"id": 95, "text": "trained and evaluated it on three publicly available data sets: PubTabNet (395k", "bbox": {"l": 134.765, "t": 548.4617800000001, "r": 480.59572999999995, "b": 557.25874, "coord_origin": "1"}}, {"id": 96, "text": "samples), FinTabNet (113k samples) and PubTables-1M (about 1M samples).", "bbox": {"l": 134.765, "t": 560.41678, "r": 480.59177000000005, "b": 569.21375, "coord_origin": "1"}}, {"id": 97, "text": "Performance results are presented in Table. 2. 
It is clearly evident that the model", "bbox": {"l": 134.765, "t": 572.37178, "r": 480.59069999999997, "b": 581.16875, "coord_origin": "1"}}, {"id": 98, "text": "trained on OTSL outperforms HTML across the board, keeping high TEDs and", "bbox": {"l": 134.765, "t": 584.32678, "r": 480.5957599999999, "b": 593.12375, "coord_origin": "1"}}, {"id": 99, "text": "mAP scores even on difficult financial tables (FinTabNet) that contain sparse", "bbox": {"l": 134.765, "t": 596.28278, "r": 480.58774, "b": 605.07974, "coord_origin": "1"}}, {"id": 100, "text": "and large tables.", "bbox": {"l": 134.765, "t": 608.2377799999999, "r": 206.78664, "b": 617.03474, "coord_origin": "1"}}]}, {"id": 9, "label": "Text", "bbox": {"l": 133.90371551513672, "t": 619.2685958862304, "r": 480.6639009475708, "b": 665.2616912841796, "coord_origin": "1"}, "confidence": 0.9859562516212463, "cells": [{"id": 101, "text": "Additionally, the results show that OTSL has an advantage over HTML", "bbox": {"l": 149.709, "t": 620.19278, "r": 480.59271, "b": 628.98975, "coord_origin": "1"}}, {"id": 102, "text": "when applied on a bigger data set like PubTables-1M and achieves significantly", "bbox": {"l": 134.765, "t": 632.14778, "r": 480.5957599999999, "b": 640.94475, "coord_origin": "1"}}, {"id": 103, "text": "improved scores. 
Finally, OTSL achieves faster inference due to fewer decoding", "bbox": {"l": 134.765, "t": 644.1027799999999, "r": 480.59283000000005, "b": 652.89975, "coord_origin": "1"}}, {"id": 104, "text": "steps which is a result of the reduced sequence representation.", "bbox": {"l": 134.765, "t": 656.0577900000001, "r": 405.79651, "b": 664.8547599999999, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {"6": {"label": "Table", "id": 6, "page_no": 8, "cluster": {"id": 6, "label": "Table", "bbox": {"l": 139.82041025161743, "t": 337.08411598205566, "r": 474.8002452850342, "b": 469.7329902648926, "coord_origin": "1"}, "confidence": 0.990551233291626, "cells": [{"id": 22, "text": "#", "bbox": {"l": 160.37, "t": 341.73495, "r": 168.04793, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 23, "text": "enc-layers", "bbox": {"l": 144.592, "t": 354.68594, "r": 183.82806, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 24, "text": "#", "bbox": {"l": 207.974, "t": 341.73495, "r": 215.65193, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 25, "text": "dec-layers", "bbox": {"l": 192.19499, "t": 354.68594, "r": 231.43106, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 26, "text": "Language", "bbox": {"l": 239.79799999999997, "t": 347.21396, "r": 278.31766, "b": 355.28372, "coord_origin": "1"}}, {"id": 27, "text": "TEDs", "bbox": {"l": 324.67001, "t": 341.73495, "r": 348.26419, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 28, "text": "mAP", "bbox": {"l": 396.271, "t": 341.73495, "r": 417.12683, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 29, "text": "(0.75)", "bbox": {"l": 394.927, "t": 352.69394000000005, "r": 418.47278, "b": 360.7637, "coord_origin": "1"}}, {"id": 30, "text": "Inference", "bbox": {"l": 430.771, "t": 341.73495, "r": 467.1423, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 31, "text": "time (secs)", "bbox": {"l": 427.14801, "t": 352.69394000000005, "r": 470.76056, "b": 360.7637, "coord_origin": "1"}}, {"id": 32, 
"text": "simple", "bbox": {"l": 286.686, "t": 354.68594, "r": 312.33261, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 33, "text": "complex", "bbox": {"l": 320.702, "t": 354.68594, "r": 353.71988, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 34, "text": "all", "bbox": {"l": 369.306, "t": 354.68594, "r": 379.03094, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 35, "text": "6", "bbox": {"l": 161.90601, "t": 373.51596, "r": 166.51294, "b": 381.58572, "coord_origin": "1"}}, {"id": 36, "text": "6", "bbox": {"l": 209.509, "t": 373.51596, "r": 214.11594, "b": 381.58572, "coord_origin": "1"}}, {"id": 37, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 368.03595, "r": 271.40527, "b": 376.10571, "coord_origin": "1"}}, {"id": 38, "text": "0.965", "bbox": {"l": 289.017, "t": 368.03595, "r": 310.00375, "b": 376.10571, "coord_origin": "1"}}, {"id": 39, "text": "0.934", "bbox": {"l": 326.71701, "t": 368.03595, "r": 347.70377, "b": 376.10571, "coord_origin": "1"}}, {"id": 40, "text": "0.955", "bbox": {"l": 363.67599, "t": 368.03595, "r": 384.66275, "b": 376.10571, "coord_origin": "1"}}, {"id": 41, "text": "0.88", "bbox": {"l": 397.26999, "t": 367.97317999999996, "r": 416.12723, "b": 375.89948, "coord_origin": "1"}}, {"id": 42, "text": "2.73", "bbox": {"l": 439.52701, "t": 367.97317999999996, "r": 458.38425, "b": 375.89948, "coord_origin": "1"}}, {"id": 43, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 380.98795, "r": 272.93954, "b": 389.05771, "coord_origin": "1"}}, {"id": 44, "text": "0.969", "bbox": {"l": 289.017, "t": 380.98795, "r": 310.00375, "b": 389.05771, "coord_origin": "1"}}, {"id": 45, "text": "0.927", "bbox": {"l": 326.71701, "t": 380.98795, "r": 347.70377, "b": 389.05771, "coord_origin": "1"}}, {"id": 46, "text": "0.955", "bbox": {"l": 363.67599, "t": 380.98795, "r": 384.66275, "b": 389.05771, "coord_origin": "1"}}, {"id": 47, "text": "0.857", "bbox": {"l": 396.20599, "t": 380.98795, "r": 417.19275, "b": 389.05771, 
"coord_origin": "1"}}, {"id": 48, "text": "5.39", "bbox": {"l": 440.767, "t": 380.98795, "r": 457.14682, "b": 389.05771, "coord_origin": "1"}}, {"id": 49, "text": "4", "bbox": {"l": 161.90601, "t": 399.81696, "r": 166.51294, "b": 407.88672, "coord_origin": "1"}}, {"id": 50, "text": "4", "bbox": {"l": 209.509, "t": 399.81696, "r": 214.11594, "b": 407.88672, "coord_origin": "1"}}, {"id": 51, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 394.33795, "r": 271.40527, "b": 402.40771, "coord_origin": "1"}}, {"id": 52, "text": "0.938", "bbox": {"l": 289.017, "t": 394.33795, "r": 310.00375, "b": 402.40771, "coord_origin": "1"}}, {"id": 53, "text": "0.904", "bbox": {"l": 326.71701, "t": 394.33795, "r": 347.70377, "b": 402.40771, "coord_origin": "1"}}, {"id": 54, "text": "0.927", "bbox": {"l": 363.67599, "t": 394.33795, "r": 384.66275, "b": 402.40771, "coord_origin": "1"}}, {"id": 55, "text": "0.853", "bbox": {"l": 394.61801, "t": 394.27518, "r": 418.77887, "b": 402.20148, "coord_origin": "1"}}, {"id": 56, "text": "1.97", "bbox": {"l": 439.52701, "t": 394.27518, "r": 458.38425, "b": 402.20148, "coord_origin": "1"}}, {"id": 57, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 407.28894, "r": 272.93954, "b": 415.3587, "coord_origin": "1"}}, {"id": 58, "text": "0.952", "bbox": {"l": 289.017, "t": 407.28894, "r": 310.00375, "b": 415.3587, "coord_origin": "1"}}, {"id": 59, "text": "0.909", "bbox": {"l": 326.71701, "t": 407.28894, "r": 347.70377, "b": 415.3587, "coord_origin": "1"}}, {"id": 60, "text": "0.938", "bbox": {"l": 362.08801, "t": 407.22617, "r": 386.24887, "b": 415.15247, "coord_origin": "1"}}, {"id": 61, "text": "0.843", "bbox": {"l": 396.20599, "t": 407.28894, "r": 417.19275, "b": 415.3587, "coord_origin": "1"}}, {"id": 62, "text": "3.77", "bbox": {"l": 440.767, "t": 407.28894, "r": 457.14682, "b": 415.3587, "coord_origin": "1"}}, {"id": 63, "text": "2", "bbox": {"l": 161.90601, "t": 426.11795, "r": 166.51294, "b": 434.1877099999999, "coord_origin": 
"1"}}, {"id": 64, "text": "4", "bbox": {"l": 209.509, "t": 426.11795, "r": 214.11594, "b": 434.1877099999999, "coord_origin": "1"}}, {"id": 65, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 420.63895, "r": 271.40527, "b": 428.70871, "coord_origin": "1"}}, {"id": 66, "text": "0.923", "bbox": {"l": 289.017, "t": 420.63895, "r": 310.00375, "b": 428.70871, "coord_origin": "1"}}, {"id": 67, "text": "0.897", "bbox": {"l": 326.71701, "t": 420.63895, "r": 347.70377, "b": 428.70871, "coord_origin": "1"}}, {"id": 68, "text": "0.915", "bbox": {"l": 363.67599, "t": 420.63895, "r": 384.66275, "b": 428.70871, "coord_origin": "1"}}, {"id": 69, "text": "0.859", "bbox": {"l": 394.61801, "t": 420.57617, "r": 418.77887, "b": 428.50247, "coord_origin": "1"}}, {"id": 70, "text": "1.91", "bbox": {"l": 439.52701, "t": 420.57617, "r": 458.38425, "b": 428.50247, "coord_origin": "1"}}, {"id": 71, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 433.58994, "r": 272.93954, "b": 441.6597, "coord_origin": "1"}}, {"id": 72, "text": "0.945", "bbox": {"l": 289.017, "t": 433.58994, "r": 310.00375, "b": 441.6597, "coord_origin": "1"}}, {"id": 73, "text": "0.901", "bbox": {"l": 326.71701, "t": 433.58994, "r": 347.70377, "b": 441.6597, "coord_origin": "1"}}, {"id": 74, "text": "0.931", "bbox": {"l": 362.08801, "t": 433.5271599999999, "r": 386.24887, "b": 441.45346, "coord_origin": "1"}}, {"id": 75, "text": "0.834", "bbox": {"l": 396.20599, "t": 433.58994, "r": 417.19275, "b": 441.6597, "coord_origin": "1"}}, {"id": 76, "text": "3.81", "bbox": {"l": 440.767, "t": 433.58994, "r": 457.14682, "b": 441.6597, "coord_origin": "1"}}, {"id": 77, "text": "4", "bbox": {"l": 161.90601, "t": 452.41995, "r": 166.51294, "b": 460.48972, "coord_origin": "1"}}, {"id": 78, "text": "2", "bbox": {"l": 209.509, "t": 452.41995, "r": 214.11594, "b": 460.48972, "coord_origin": "1"}}, {"id": 79, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 446.9399399999999, "r": 271.40527, "b": 455.0097, 
"coord_origin": "1"}}, {"id": 80, "text": "0.952", "bbox": {"l": 289.017, "t": 446.9399399999999, "r": 310.00375, "b": 455.0097, "coord_origin": "1"}}, {"id": 81, "text": "0.92", "bbox": {"l": 329.021, "t": 446.9399399999999, "r": 345.40082, "b": 455.0097, "coord_origin": "1"}}, {"id": 82, "text": "0.942", "bbox": {"l": 362.08801, "t": 446.87717, "r": 386.24887, "b": 454.80347, "coord_origin": "1"}}, {"id": 83, "text": "0.857", "bbox": {"l": 394.61801, "t": 446.87717, "r": 418.77887, "b": 454.80347, "coord_origin": "1"}}, {"id": 84, "text": "1.22", "bbox": {"l": 439.52701, "t": 446.87717, "r": 458.38425, "b": 454.80347, "coord_origin": "1"}}, {"id": 85, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 459.8919399999999, "r": 272.93954, "b": 467.9617, "coord_origin": "1"}}, {"id": 86, "text": "0.944", "bbox": {"l": 289.017, "t": 459.8919399999999, "r": 310.00375, "b": 467.9617, "coord_origin": "1"}}, {"id": 87, "text": "0.903", "bbox": {"l": 326.71701, "t": 459.8919399999999, "r": 347.70377, "b": 467.9617, "coord_origin": "1"}}, {"id": 88, "text": "0.931", "bbox": {"l": 363.67599, "t": 459.8919399999999, "r": 384.66275, "b": 467.9617, "coord_origin": "1"}}, {"id": 89, "text": "0.824", "bbox": {"l": 396.20599, "t": 459.8919399999999, "r": 417.19275, "b": 467.9617, "coord_origin": "1"}}, {"id": 90, "text": "2", "bbox": {"l": 446.65302, "t": 459.8919399999999, "r": 451.25995, "b": 467.9617, "coord_origin": "1"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "ched", "lcel", "lcel", "ched", "ched", "nl", "ched", "ched", "ucel", "ched", "ched", "ched", "ched", "ched", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 8, 
"table_cells": [{"bbox": {"l": 160.37, "t": 341.73495, "r": 168.04793, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "#", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 144.592, "t": 354.68594, "r": 183.82806, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "enc-layers", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 207.974, "t": 341.73495, "r": 215.65193, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "#", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 192.19499, "t": 354.68594, "r": 231.43106, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "dec-layers", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 239.79799999999997, "t": 347.21396, "r": 278.31766, "b": 355.28372, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Language", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 324.67001, "t": 341.73495, "r": 348.26419, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 3, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 6, "text": "TEDs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 396.271, "t": 341.73495, "r": 
417.12683, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.927, "t": 352.69394000000005, "r": 418.47278, "b": 360.7637, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "(0.75)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 430.771, "t": 341.73495, "r": 467.1423, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "Inference", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.14801, "t": 352.69394000000005, "r": 470.76056, "b": 360.7637, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "time (secs)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 286.686, "t": 354.68594, "r": 312.33261, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 320.702, "t": 354.68594, "r": 353.71988, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 369.306, "t": 354.68594, "r": 379.03094, "b": 362.75570999999997, "coord_origin": "1"}, 
"row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "all", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 373.51596, "r": 166.51294, "b": 381.58572, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 373.51596, "r": 214.11594, "b": 381.58572, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 368.03595, "r": 272.93954, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 368.03595, "r": 310.00375, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.965 0.969", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 368.03595, "r": 347.70377, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.934 0.927", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 363.67599, "t": 368.03595, "r": 384.66275, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, 
"start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.955 0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 396.20599, "t": 367.97317999999996, "r": 417.19275, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.88 0.857", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 367.97317999999996, "r": 458.38425, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "2.73 5.39", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 399.81696, "r": 166.51294, "b": 407.88672, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 399.81696, "r": 214.11594, "b": 407.88672, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 394.33795, "r": 272.93954, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 394.33795, "r": 310.00375, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.938 0.952", 
"column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 394.33795, "r": 347.70377, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.904 0.909", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 363.67599, "t": 394.33795, "r": 384.66275, "b": 402.40771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.927", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.61801, "t": 394.27518, "r": 418.77887, "b": 402.20148, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.853", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 394.27518, "r": 458.38425, "b": 402.20148, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "1.97", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 362.08801, "t": 407.22617, "r": 386.24887, "b": 428.70871, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.938 0.915", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 396.20599, "t": 407.28894, "r": 417.19275, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.843", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.767, "t": 
407.28894, "r": 457.14682, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "3.77", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 426.11795, "r": 166.51294, "b": 434.1877099999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 426.11795, "r": 214.11594, "b": 434.1877099999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 420.63895, "r": 272.93954, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 420.63895, "r": 310.00375, "b": 428.70871, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.923", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 420.63895, "r": 347.70377, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.897 0.901", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.61801, "t": 420.57617, "r": 418.77887, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 
1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.859 0.834", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 420.57617, "r": 458.38425, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "1.91 3.81", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 433.58994, "r": 310.00375, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 362.08801, "t": 433.5271599999999, "r": 386.24887, "b": 441.45346, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.931", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 452.41995, "r": 166.51294, "b": 460.48972, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 452.41995, "r": 214.11594, "b": 460.48972, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 446.9399399999999, "r": 272.93954, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, 
"end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 446.9399399999999, "r": 310.00375, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.952 0.944", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 446.9399399999999, "r": 347.70377, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.92 0.903", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 362.08801, "t": 446.87717, "r": 386.24887, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.942 0.931", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.61801, "t": 446.87717, "r": 418.77887, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.857 0.824", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 446.87717, "r": 458.38425, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "1.22 2", "column_header": false, "row_header": false, "row_section": false}]}}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 8, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 193.94394721984864, "t": 93.11655950546265, "r": 447.54291000000006, "b": 
102.24131870269775, "coord_origin": "1"}, "confidence": 0.9502049088478088, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 8, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 474.9051853179932, "t": 93.4998132705689, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.870819091796875, "cells": [{"id": 1, "text": "9", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "9"}, {"label": "Text", "id": 2, "page_no": 8, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.90584583282472, "t": 118.23915395736697, "r": 480.59579, "b": 151.64178000000004, "coord_origin": "1"}, "confidence": 0.9810612201690674, "cells": [{"id": 2, "text": "order to compute the TED score. Inference timing results for all experiments", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.5936899999999, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "were obtained from the same machine on a single core with AMD EPYC 7763", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.59579, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "CPU @2.45 GHz.", "bbox": {"l": 134.765, "t": 142.84479, "r": 210.78462, "b": 151.64178000000004, "coord_origin": "1"}}]}, "text": "order to compute the TED score. 
Inference timing results for all experiments were obtained from the same machine on a single core with AMD EPYC 7763 CPU @2.45 GHz."}, {"label": "Section-header", "id": 3, "page_no": 8, "cluster": {"id": 3, "label": "Section-header", "bbox": {"l": 134.28504238128662, "t": 168.39932327270503, "r": 318.44843, "b": 178.3033452987671, "coord_origin": "1"}, "confidence": 0.9505251049995422, "cells": [{"id": 5, "text": "5.1", "bbox": {"l": 134.765, "t": 169.18584999999996, "r": 149.40205, "b": 177.9928, "coord_origin": "1"}}, {"id": 6, "text": "Hyper Parameter Optimization", "bbox": {"l": 160.85904, "t": 169.18584999999996, "r": 318.44843, "b": 177.9928, "coord_origin": "1"}}]}, "text": "5.1 Hyper Parameter Optimization"}, {"label": "Text", "id": 4, "page_no": 8, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.80440769195556, "t": 184.85479145050044, "r": 481.1519771575928, "b": 254.36992263793945, "coord_origin": "1"}, "confidence": 0.9858020544052124, "cells": [{"id": 7, "text": "We have chosen the PubTabNet data set to perform HPO, since it includes a", "bbox": {"l": 134.765, "t": 185.58582, "r": 480.59183, "b": 194.38280999999995, "coord_origin": "1"}}, {"id": 8, "text": "highly diverse set of tables. Also we report TED scores separately for simple and", "bbox": {"l": 134.765, "t": 197.54083000000003, "r": 480.59183, "b": 206.33783000000005, "coord_origin": "1"}}, {"id": 9, "text": "complex tables (tables with cell spans). Results are presented in Table. 1. It is", "bbox": {"l": 134.765, "t": 209.49585000000002, "r": 480.59177000000005, "b": 218.29285000000004, "coord_origin": "1"}}, {"id": 10, "text": "evident that with OTSL, our model achieves the same TED score and slightly", "bbox": {"l": 134.765, "t": 221.45087, "r": 480.59277, "b": 230.24785999999995, "coord_origin": "1"}}, {"id": 11, "text": "better mAP scores in comparison to HTML. 
However OTSL yields a", "bbox": {"l": 134.765, "t": 233.40588000000002, "r": 440.94159, "b": 242.20288000000005, "coord_origin": "1"}}, {"id": 12, "text": "2x speed", "bbox": {"l": 444.86798, "t": 233.40588000000002, "r": 480.58786000000003, "b": 242.20288000000005, "coord_origin": "1"}}, {"id": 13, "text": "up", "bbox": {"l": 134.76498, "t": 245.36188000000004, "r": 145.20081, "b": 254.15886999999998, "coord_origin": "1"}}, {"id": 14, "text": "in the inference runtime over HTML.", "bbox": {"l": 149.14899, "t": 245.36188000000004, "r": 311.21957, "b": 254.15886999999998, "coord_origin": "1"}}]}, "text": "We have chosen the PubTabNet data set to perform HPO, since it includes a highly diverse set of tables. Also we report TED scores separately for simple and complex tables (tables with cell spans). Results are presented in Table. 1. It is evident that with OTSL, our model achieves the same TED score and slightly better mAP scores in comparison to HTML. However OTSL yields a 2x speed up in the inference runtime over HTML."}, {"label": "Caption", "id": 5, "page_no": 8, "cluster": {"id": 5, "label": "Caption", "bbox": {"l": 133.88543272018433, "t": 274.21845130920406, "r": 480.5954, "b": 327.4440181732178, "coord_origin": "1"}, "confidence": 0.9517639875411987, "cells": [{"id": 15, "text": "Table", "bbox": {"l": 134.76498, "t": 275.07232999999997, "r": 160.11836, "b": 282.9986, "coord_origin": "1"}}, {"id": 16, "text": "1.", "bbox": {"l": 167.34528, "t": 275.07232999999997, "r": 175.59526, "b": 282.9986, "coord_origin": "1"}}, {"id": 17, "text": "HPO performed in OTSL and HTML representation on the same", "bbox": {"l": 188.13298, "t": 275.13507000000004, "r": 480.59365999999994, "b": 283.2048300000001, "coord_origin": "1"}}, {"id": 18, "text": "transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. 
Ef-", "bbox": {"l": 134.76498, "t": 286.09409, "r": 480.59444999999994, "b": 294.16385, "coord_origin": "1"}}, {"id": 19, "text": "fects of reducing the # of layers in encoder and decoder stages of the model show that", "bbox": {"l": 134.76498, "t": 297.05307, "r": 480.5954, "b": 305.12283, "coord_origin": "1"}}, {"id": 20, "text": "smaller models trained on OTSL perform better, especially in recognizing complex", "bbox": {"l": 134.76498, "t": 308.01205, "r": 480.59451, "b": 316.08182, "coord_origin": "1"}}, {"id": 21, "text": "table structures, and maintain a much higher mAP score than the HTML counterpart.", "bbox": {"l": 134.76498, "t": 318.97104, "r": 480.59441999999996, "b": 327.0408, "coord_origin": "1"}}]}, "text": "Table 1. HPO performed in OTSL and HTML representation on the same transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. Effects of reducing the # of layers in encoder and decoder stages of the model show that smaller models trained on OTSL perform better, especially in recognizing complex table structures, and maintain a much higher mAP score than the HTML counterpart."}, {"label": "Table", "id": 6, "page_no": 8, "cluster": {"id": 6, "label": "Table", "bbox": {"l": 139.82041025161743, "t": 337.08411598205566, "r": 474.8002452850342, "b": 469.7329902648926, "coord_origin": "1"}, "confidence": 0.990551233291626, "cells": [{"id": 22, "text": "#", "bbox": {"l": 160.37, "t": 341.73495, "r": 168.04793, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 23, "text": "enc-layers", "bbox": {"l": 144.592, "t": 354.68594, "r": 183.82806, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 24, "text": "#", "bbox": {"l": 207.974, "t": 341.73495, "r": 215.65193, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 25, "text": "dec-layers", "bbox": {"l": 192.19499, "t": 354.68594, "r": 231.43106, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 26, "text": "Language", "bbox": {"l": 239.79799999999997, "t": 347.21396, "r": 
278.31766, "b": 355.28372, "coord_origin": "1"}}, {"id": 27, "text": "TEDs", "bbox": {"l": 324.67001, "t": 341.73495, "r": 348.26419, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 28, "text": "mAP", "bbox": {"l": 396.271, "t": 341.73495, "r": 417.12683, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 29, "text": "(0.75)", "bbox": {"l": 394.927, "t": 352.69394000000005, "r": 418.47278, "b": 360.7637, "coord_origin": "1"}}, {"id": 30, "text": "Inference", "bbox": {"l": 430.771, "t": 341.73495, "r": 467.1423, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 31, "text": "time (secs)", "bbox": {"l": 427.14801, "t": 352.69394000000005, "r": 470.76056, "b": 360.7637, "coord_origin": "1"}}, {"id": 32, "text": "simple", "bbox": {"l": 286.686, "t": 354.68594, "r": 312.33261, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 33, "text": "complex", "bbox": {"l": 320.702, "t": 354.68594, "r": 353.71988, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 34, "text": "all", "bbox": {"l": 369.306, "t": 354.68594, "r": 379.03094, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 35, "text": "6", "bbox": {"l": 161.90601, "t": 373.51596, "r": 166.51294, "b": 381.58572, "coord_origin": "1"}}, {"id": 36, "text": "6", "bbox": {"l": 209.509, "t": 373.51596, "r": 214.11594, "b": 381.58572, "coord_origin": "1"}}, {"id": 37, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 368.03595, "r": 271.40527, "b": 376.10571, "coord_origin": "1"}}, {"id": 38, "text": "0.965", "bbox": {"l": 289.017, "t": 368.03595, "r": 310.00375, "b": 376.10571, "coord_origin": "1"}}, {"id": 39, "text": "0.934", "bbox": {"l": 326.71701, "t": 368.03595, "r": 347.70377, "b": 376.10571, "coord_origin": "1"}}, {"id": 40, "text": "0.955", "bbox": {"l": 363.67599, "t": 368.03595, "r": 384.66275, "b": 376.10571, "coord_origin": "1"}}, {"id": 41, "text": "0.88", "bbox": {"l": 397.26999, "t": 367.97317999999996, "r": 416.12723, "b": 375.89948, "coord_origin": "1"}}, {"id": 42, "text": 
"2.73", "bbox": {"l": 439.52701, "t": 367.97317999999996, "r": 458.38425, "b": 375.89948, "coord_origin": "1"}}, {"id": 43, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 380.98795, "r": 272.93954, "b": 389.05771, "coord_origin": "1"}}, {"id": 44, "text": "0.969", "bbox": {"l": 289.017, "t": 380.98795, "r": 310.00375, "b": 389.05771, "coord_origin": "1"}}, {"id": 45, "text": "0.927", "bbox": {"l": 326.71701, "t": 380.98795, "r": 347.70377, "b": 389.05771, "coord_origin": "1"}}, {"id": 46, "text": "0.955", "bbox": {"l": 363.67599, "t": 380.98795, "r": 384.66275, "b": 389.05771, "coord_origin": "1"}}, {"id": 47, "text": "0.857", "bbox": {"l": 396.20599, "t": 380.98795, "r": 417.19275, "b": 389.05771, "coord_origin": "1"}}, {"id": 48, "text": "5.39", "bbox": {"l": 440.767, "t": 380.98795, "r": 457.14682, "b": 389.05771, "coord_origin": "1"}}, {"id": 49, "text": "4", "bbox": {"l": 161.90601, "t": 399.81696, "r": 166.51294, "b": 407.88672, "coord_origin": "1"}}, {"id": 50, "text": "4", "bbox": {"l": 209.509, "t": 399.81696, "r": 214.11594, "b": 407.88672, "coord_origin": "1"}}, {"id": 51, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 394.33795, "r": 271.40527, "b": 402.40771, "coord_origin": "1"}}, {"id": 52, "text": "0.938", "bbox": {"l": 289.017, "t": 394.33795, "r": 310.00375, "b": 402.40771, "coord_origin": "1"}}, {"id": 53, "text": "0.904", "bbox": {"l": 326.71701, "t": 394.33795, "r": 347.70377, "b": 402.40771, "coord_origin": "1"}}, {"id": 54, "text": "0.927", "bbox": {"l": 363.67599, "t": 394.33795, "r": 384.66275, "b": 402.40771, "coord_origin": "1"}}, {"id": 55, "text": "0.853", "bbox": {"l": 394.61801, "t": 394.27518, "r": 418.77887, "b": 402.20148, "coord_origin": "1"}}, {"id": 56, "text": "1.97", "bbox": {"l": 439.52701, "t": 394.27518, "r": 458.38425, "b": 402.20148, "coord_origin": "1"}}, {"id": 57, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 407.28894, "r": 272.93954, "b": 415.3587, "coord_origin": "1"}}, {"id": 58, "text": 
"0.952", "bbox": {"l": 289.017, "t": 407.28894, "r": 310.00375, "b": 415.3587, "coord_origin": "1"}}, {"id": 59, "text": "0.909", "bbox": {"l": 326.71701, "t": 407.28894, "r": 347.70377, "b": 415.3587, "coord_origin": "1"}}, {"id": 60, "text": "0.938", "bbox": {"l": 362.08801, "t": 407.22617, "r": 386.24887, "b": 415.15247, "coord_origin": "1"}}, {"id": 61, "text": "0.843", "bbox": {"l": 396.20599, "t": 407.28894, "r": 417.19275, "b": 415.3587, "coord_origin": "1"}}, {"id": 62, "text": "3.77", "bbox": {"l": 440.767, "t": 407.28894, "r": 457.14682, "b": 415.3587, "coord_origin": "1"}}, {"id": 63, "text": "2", "bbox": {"l": 161.90601, "t": 426.11795, "r": 166.51294, "b": 434.1877099999999, "coord_origin": "1"}}, {"id": 64, "text": "4", "bbox": {"l": 209.509, "t": 426.11795, "r": 214.11594, "b": 434.1877099999999, "coord_origin": "1"}}, {"id": 65, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 420.63895, "r": 271.40527, "b": 428.70871, "coord_origin": "1"}}, {"id": 66, "text": "0.923", "bbox": {"l": 289.017, "t": 420.63895, "r": 310.00375, "b": 428.70871, "coord_origin": "1"}}, {"id": 67, "text": "0.897", "bbox": {"l": 326.71701, "t": 420.63895, "r": 347.70377, "b": 428.70871, "coord_origin": "1"}}, {"id": 68, "text": "0.915", "bbox": {"l": 363.67599, "t": 420.63895, "r": 384.66275, "b": 428.70871, "coord_origin": "1"}}, {"id": 69, "text": "0.859", "bbox": {"l": 394.61801, "t": 420.57617, "r": 418.77887, "b": 428.50247, "coord_origin": "1"}}, {"id": 70, "text": "1.91", "bbox": {"l": 439.52701, "t": 420.57617, "r": 458.38425, "b": 428.50247, "coord_origin": "1"}}, {"id": 71, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 433.58994, "r": 272.93954, "b": 441.6597, "coord_origin": "1"}}, {"id": 72, "text": "0.945", "bbox": {"l": 289.017, "t": 433.58994, "r": 310.00375, "b": 441.6597, "coord_origin": "1"}}, {"id": 73, "text": "0.901", "bbox": {"l": 326.71701, "t": 433.58994, "r": 347.70377, "b": 441.6597, "coord_origin": "1"}}, {"id": 74, "text": "0.931", 
"bbox": {"l": 362.08801, "t": 433.5271599999999, "r": 386.24887, "b": 441.45346, "coord_origin": "1"}}, {"id": 75, "text": "0.834", "bbox": {"l": 396.20599, "t": 433.58994, "r": 417.19275, "b": 441.6597, "coord_origin": "1"}}, {"id": 76, "text": "3.81", "bbox": {"l": 440.767, "t": 433.58994, "r": 457.14682, "b": 441.6597, "coord_origin": "1"}}, {"id": 77, "text": "4", "bbox": {"l": 161.90601, "t": 452.41995, "r": 166.51294, "b": 460.48972, "coord_origin": "1"}}, {"id": 78, "text": "2", "bbox": {"l": 209.509, "t": 452.41995, "r": 214.11594, "b": 460.48972, "coord_origin": "1"}}, {"id": 79, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 446.9399399999999, "r": 271.40527, "b": 455.0097, "coord_origin": "1"}}, {"id": 80, "text": "0.952", "bbox": {"l": 289.017, "t": 446.9399399999999, "r": 310.00375, "b": 455.0097, "coord_origin": "1"}}, {"id": 81, "text": "0.92", "bbox": {"l": 329.021, "t": 446.9399399999999, "r": 345.40082, "b": 455.0097, "coord_origin": "1"}}, {"id": 82, "text": "0.942", "bbox": {"l": 362.08801, "t": 446.87717, "r": 386.24887, "b": 454.80347, "coord_origin": "1"}}, {"id": 83, "text": "0.857", "bbox": {"l": 394.61801, "t": 446.87717, "r": 418.77887, "b": 454.80347, "coord_origin": "1"}}, {"id": 84, "text": "1.22", "bbox": {"l": 439.52701, "t": 446.87717, "r": 458.38425, "b": 454.80347, "coord_origin": "1"}}, {"id": 85, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 459.8919399999999, "r": 272.93954, "b": 467.9617, "coord_origin": "1"}}, {"id": 86, "text": "0.944", "bbox": {"l": 289.017, "t": 459.8919399999999, "r": 310.00375, "b": 467.9617, "coord_origin": "1"}}, {"id": 87, "text": "0.903", "bbox": {"l": 326.71701, "t": 459.8919399999999, "r": 347.70377, "b": 467.9617, "coord_origin": "1"}}, {"id": 88, "text": "0.931", "bbox": {"l": 363.67599, "t": 459.8919399999999, "r": 384.66275, "b": 467.9617, "coord_origin": "1"}}, {"id": 89, "text": "0.824", "bbox": {"l": 396.20599, "t": 459.8919399999999, "r": 417.19275, "b": 467.9617, 
"coord_origin": "1"}}, {"id": 90, "text": "2", "bbox": {"l": 446.65302, "t": 459.8919399999999, "r": 451.25995, "b": 467.9617, "coord_origin": "1"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "ched", "lcel", "lcel", "ched", "ched", "nl", "ched", "ched", "ucel", "ched", "ched", "ched", "ched", "ched", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 8, "table_cells": [{"bbox": {"l": 160.37, "t": 341.73495, "r": 168.04793, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "#", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 144.592, "t": 354.68594, "r": 183.82806, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "enc-layers", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 207.974, "t": 341.73495, "r": 215.65193, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "#", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 192.19499, "t": 354.68594, "r": 231.43106, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "dec-layers", "column_header": true, "row_header": false, "row_section": 
false}, {"bbox": {"l": 239.79799999999997, "t": 347.21396, "r": 278.31766, "b": 355.28372, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Language", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 324.67001, "t": 341.73495, "r": 348.26419, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 3, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 6, "text": "TEDs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 396.271, "t": 341.73495, "r": 417.12683, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.927, "t": 352.69394000000005, "r": 418.47278, "b": 360.7637, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "(0.75)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 430.771, "t": 341.73495, "r": 467.1423, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "Inference", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.14801, "t": 352.69394000000005, "r": 470.76056, "b": 360.7637, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "time (secs)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 286.686, "t": 354.68594, "r": 
312.33261, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 320.702, "t": 354.68594, "r": 353.71988, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 369.306, "t": 354.68594, "r": 379.03094, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "all", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 373.51596, "r": 166.51294, "b": 381.58572, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 373.51596, "r": 214.11594, "b": 381.58572, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 368.03595, "r": 272.93954, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 368.03595, "r": 310.00375, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, 
"start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.965 0.969", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 368.03595, "r": 347.70377, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.934 0.927", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 363.67599, "t": 368.03595, "r": 384.66275, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.955 0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 396.20599, "t": 367.97317999999996, "r": 417.19275, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.88 0.857", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 367.97317999999996, "r": 458.38425, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "2.73 5.39", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 399.81696, "r": 166.51294, "b": 407.88672, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 399.81696, "r": 214.11594, "b": 407.88672, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, 
"start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 394.33795, "r": 272.93954, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 394.33795, "r": 310.00375, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.938 0.952", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 394.33795, "r": 347.70377, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.904 0.909", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 363.67599, "t": 394.33795, "r": 384.66275, "b": 402.40771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.927", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.61801, "t": 394.27518, "r": 418.77887, "b": 402.20148, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.853", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 394.27518, "r": 458.38425, "b": 402.20148, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "1.97", "column_header": false, 
"row_header": false, "row_section": false}, {"bbox": {"l": 362.08801, "t": 407.22617, "r": 386.24887, "b": 428.70871, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.938 0.915", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 396.20599, "t": 407.28894, "r": 417.19275, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.843", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.767, "t": 407.28894, "r": 457.14682, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "3.77", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 426.11795, "r": 166.51294, "b": 434.1877099999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 426.11795, "r": 214.11594, "b": 434.1877099999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 420.63895, "r": 272.93954, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 420.63895, "r": 
310.00375, "b": 428.70871, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.923", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 420.63895, "r": 347.70377, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.897 0.901", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.61801, "t": 420.57617, "r": 418.77887, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.859 0.834", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 420.57617, "r": 458.38425, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "1.91 3.81", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 433.58994, "r": 310.00375, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 362.08801, "t": 433.5271599999999, "r": 386.24887, "b": 441.45346, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.931", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 452.41995, "r": 166.51294, "b": 460.48972, "coord_origin": "1"}, "row_span": 1, "col_span": 1, 
"start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 452.41995, "r": 214.11594, "b": 460.48972, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 446.9399399999999, "r": 272.93954, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 446.9399399999999, "r": 310.00375, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.952 0.944", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 446.9399399999999, "r": 347.70377, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.92 0.903", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 362.08801, "t": 446.87717, "r": 386.24887, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.942 0.931", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.61801, "t": 446.87717, "r": 418.77887, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, 
"start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.857 0.824", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 446.87717, "r": 458.38425, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "1.22 2", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "Section-header", "id": 7, "page_no": 8, "cluster": {"id": 7, "label": "Section-header", "bbox": {"l": 134.48985929489137, "t": 507.6188140869141, "r": 264.40332, "b": 517.7784141540527, "coord_origin": "1"}, "confidence": 0.9545547962188721, "cells": [{"id": 91, "text": "5.2", "bbox": {"l": 134.765, "t": 508.15179, "r": 149.40205, "b": 516.95874, "coord_origin": "1"}}, {"id": 92, "text": "Quantitative Results", "bbox": {"l": 160.85904, "t": 508.15179, "r": 264.40332, "b": 516.95874, "coord_origin": "1"}}]}, "text": "5.2 Quantitative Results"}, {"label": "Text", "id": 8, "page_no": 8, "cluster": {"id": 8, "label": "Text", "bbox": {"l": 133.97792644500734, "t": 523.5121616363525, "r": 480.5957599999999, "b": 617.5317226409912, "coord_origin": "1"}, "confidence": 0.9885255098342896, "cells": [{"id": 93, "text": "We picked the model parameter configuration that produced the best prediction", "bbox": {"l": 134.765, "t": 524.55078, "r": 480.59075999999993, "b": 533.34775, "coord_origin": "1"}}, {"id": 94, "text": "quality (enc=6, dec=6, heads=8) with PubTabNet alone, then independently", "bbox": {"l": 134.765, "t": 536.50677, "r": 480.58675999999997, "b": 545.3037400000001, "coord_origin": "1"}}, {"id": 95, "text": "trained and evaluated it on three publicly available data sets: PubTabNet (395k", "bbox": {"l": 134.765, "t": 548.4617800000001, "r": 480.59572999999995, "b": 557.25874, "coord_origin": "1"}}, {"id": 96, "text": "samples), FinTabNet (113k samples) and PubTables-1M (about 1M samples).", "bbox": 
{"l": 134.765, "t": 560.41678, "r": 480.59177000000005, "b": 569.21375, "coord_origin": "1"}}, {"id": 97, "text": "Performance results are presented in Table. 2. It is clearly evident that the model", "bbox": {"l": 134.765, "t": 572.37178, "r": 480.59069999999997, "b": 581.16875, "coord_origin": "1"}}, {"id": 98, "text": "trained on OTSL outperforms HTML across the board, keeping high TEDs and", "bbox": {"l": 134.765, "t": 584.32678, "r": 480.5957599999999, "b": 593.12375, "coord_origin": "1"}}, {"id": 99, "text": "mAP scores even on difficult financial tables (FinTabNet) that contain sparse", "bbox": {"l": 134.765, "t": 596.28278, "r": 480.58774, "b": 605.07974, "coord_origin": "1"}}, {"id": 100, "text": "and large tables.", "bbox": {"l": 134.765, "t": 608.2377799999999, "r": 206.78664, "b": 617.03474, "coord_origin": "1"}}]}, "text": "We picked the model parameter configuration that produced the best prediction quality (enc=6, dec=6, heads=8) with PubTabNet alone, then independently trained and evaluated it on three publicly available data sets: PubTabNet (395k samples), FinTabNet (113k samples) and PubTables-1M (about 1M samples). Performance results are presented in Table. 2. 
It is clearly evident that the model trained on OTSL outperforms HTML across the board, keeping high TEDs and mAP scores even on difficult financial tables (FinTabNet) that contain sparse and large tables."}, {"label": "Text", "id": 9, "page_no": 8, "cluster": {"id": 9, "label": "Text", "bbox": {"l": 133.90371551513672, "t": 619.2685958862304, "r": 480.6639009475708, "b": 665.2616912841796, "coord_origin": "1"}, "confidence": 0.9859562516212463, "cells": [{"id": 101, "text": "Additionally, the results show that OTSL has an advantage over HTML", "bbox": {"l": 149.709, "t": 620.19278, "r": 480.59271, "b": 628.98975, "coord_origin": "1"}}, {"id": 102, "text": "when applied on a bigger data set like PubTables-1M and achieves significantly", "bbox": {"l": 134.765, "t": 632.14778, "r": 480.5957599999999, "b": 640.94475, "coord_origin": "1"}}, {"id": 103, "text": "improved scores. Finally, OTSL achieves faster inference due to fewer decoding", "bbox": {"l": 134.765, "t": 644.1027799999999, "r": 480.59283000000005, "b": 652.89975, "coord_origin": "1"}}, {"id": 104, "text": "steps which is a result of the reduced sequence representation.", "bbox": {"l": 134.765, "t": 656.0577900000001, "r": 405.79651, "b": 664.8547599999999, "coord_origin": "1"}}]}, "text": "Additionally, the results show that OTSL has an advantage over HTML when applied on a bigger data set like PubTables-1M and achieves significantly improved scores. Finally, OTSL achieves faster inference due to fewer decoding steps which is a result of the reduced sequence representation."}], "body": [{"label": "Text", "id": 2, "page_no": 8, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 133.90584583282472, "t": 118.23915395736697, "r": 480.59579, "b": 151.64178000000004, "coord_origin": "1"}, "confidence": 0.9810612201690674, "cells": [{"id": 2, "text": "order to compute the TED score. 
Inference timing results for all experiments", "bbox": {"l": 134.765, "t": 118.93377999999996, "r": 480.5936899999999, "b": 127.73077, "coord_origin": "1"}}, {"id": 3, "text": "were obtained from the same machine on a single core with AMD EPYC 7763", "bbox": {"l": 134.765, "t": 130.88878999999997, "r": 480.59579, "b": 139.68579, "coord_origin": "1"}}, {"id": 4, "text": "CPU @2.45 GHz.", "bbox": {"l": 134.765, "t": 142.84479, "r": 210.78462, "b": 151.64178000000004, "coord_origin": "1"}}]}, "text": "order to compute the TED score. Inference timing results for all experiments were obtained from the same machine on a single core with AMD EPYC 7763 CPU @2.45 GHz."}, {"label": "Section-header", "id": 3, "page_no": 8, "cluster": {"id": 3, "label": "Section-header", "bbox": {"l": 134.28504238128662, "t": 168.39932327270503, "r": 318.44843, "b": 178.3033452987671, "coord_origin": "1"}, "confidence": 0.9505251049995422, "cells": [{"id": 5, "text": "5.1", "bbox": {"l": 134.765, "t": 169.18584999999996, "r": 149.40205, "b": 177.9928, "coord_origin": "1"}}, {"id": 6, "text": "Hyper Parameter Optimization", "bbox": {"l": 160.85904, "t": 169.18584999999996, "r": 318.44843, "b": 177.9928, "coord_origin": "1"}}]}, "text": "5.1 Hyper Parameter Optimization"}, {"label": "Text", "id": 4, "page_no": 8, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.80440769195556, "t": 184.85479145050044, "r": 481.1519771575928, "b": 254.36992263793945, "coord_origin": "1"}, "confidence": 0.9858020544052124, "cells": [{"id": 7, "text": "We have chosen the PubTabNet data set to perform HPO, since it includes a", "bbox": {"l": 134.765, "t": 185.58582, "r": 480.59183, "b": 194.38280999999995, "coord_origin": "1"}}, {"id": 8, "text": "highly diverse set of tables. Also we report TED scores separately for simple and", "bbox": {"l": 134.765, "t": 197.54083000000003, "r": 480.59183, "b": 206.33783000000005, "coord_origin": "1"}}, {"id": 9, "text": "complex tables (tables with cell spans). 
Results are presented in Table. 1. It is", "bbox": {"l": 134.765, "t": 209.49585000000002, "r": 480.59177000000005, "b": 218.29285000000004, "coord_origin": "1"}}, {"id": 10, "text": "evident that with OTSL, our model achieves the same TED score and slightly", "bbox": {"l": 134.765, "t": 221.45087, "r": 480.59277, "b": 230.24785999999995, "coord_origin": "1"}}, {"id": 11, "text": "better mAP scores in comparison to HTML. However OTSL yields a", "bbox": {"l": 134.765, "t": 233.40588000000002, "r": 440.94159, "b": 242.20288000000005, "coord_origin": "1"}}, {"id": 12, "text": "2x speed", "bbox": {"l": 444.86798, "t": 233.40588000000002, "r": 480.58786000000003, "b": 242.20288000000005, "coord_origin": "1"}}, {"id": 13, "text": "up", "bbox": {"l": 134.76498, "t": 245.36188000000004, "r": 145.20081, "b": 254.15886999999998, "coord_origin": "1"}}, {"id": 14, "text": "in the inference runtime over HTML.", "bbox": {"l": 149.14899, "t": 245.36188000000004, "r": 311.21957, "b": 254.15886999999998, "coord_origin": "1"}}]}, "text": "We have chosen the PubTabNet data set to perform HPO, since it includes a highly diverse set of tables. Also we report TED scores separately for simple and complex tables (tables with cell spans). Results are presented in Table. 1. It is evident that with OTSL, our model achieves the same TED score and slightly better mAP scores in comparison to HTML. 
However OTSL yields a 2x speed up in the inference runtime over HTML."}, {"label": "Caption", "id": 5, "page_no": 8, "cluster": {"id": 5, "label": "Caption", "bbox": {"l": 133.88543272018433, "t": 274.21845130920406, "r": 480.5954, "b": 327.4440181732178, "coord_origin": "1"}, "confidence": 0.9517639875411987, "cells": [{"id": 15, "text": "Table", "bbox": {"l": 134.76498, "t": 275.07232999999997, "r": 160.11836, "b": 282.9986, "coord_origin": "1"}}, {"id": 16, "text": "1.", "bbox": {"l": 167.34528, "t": 275.07232999999997, "r": 175.59526, "b": 282.9986, "coord_origin": "1"}}, {"id": 17, "text": "HPO performed in OTSL and HTML representation on the same", "bbox": {"l": 188.13298, "t": 275.13507000000004, "r": 480.59365999999994, "b": 283.2048300000001, "coord_origin": "1"}}, {"id": 18, "text": "transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. Ef-", "bbox": {"l": 134.76498, "t": 286.09409, "r": 480.59444999999994, "b": 294.16385, "coord_origin": "1"}}, {"id": 19, "text": "fects of reducing the # of layers in encoder and decoder stages of the model show that", "bbox": {"l": 134.76498, "t": 297.05307, "r": 480.5954, "b": 305.12283, "coord_origin": "1"}}, {"id": 20, "text": "smaller models trained on OTSL perform better, especially in recognizing complex", "bbox": {"l": 134.76498, "t": 308.01205, "r": 480.59451, "b": 316.08182, "coord_origin": "1"}}, {"id": 21, "text": "table structures, and maintain a much higher mAP score than the HTML counterpart.", "bbox": {"l": 134.76498, "t": 318.97104, "r": 480.59441999999996, "b": 327.0408, "coord_origin": "1"}}]}, "text": "Table 1. HPO performed in OTSL and HTML representation on the same transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. 
Effects of reducing the # of layers in encoder and decoder stages of the model show that smaller models trained on OTSL perform better, especially in recognizing complex table structures, and maintain a much higher mAP score than the HTML counterpart."}, {"label": "Table", "id": 6, "page_no": 8, "cluster": {"id": 6, "label": "Table", "bbox": {"l": 139.82041025161743, "t": 337.08411598205566, "r": 474.8002452850342, "b": 469.7329902648926, "coord_origin": "1"}, "confidence": 0.990551233291626, "cells": [{"id": 22, "text": "#", "bbox": {"l": 160.37, "t": 341.73495, "r": 168.04793, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 23, "text": "enc-layers", "bbox": {"l": 144.592, "t": 354.68594, "r": 183.82806, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 24, "text": "#", "bbox": {"l": 207.974, "t": 341.73495, "r": 215.65193, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 25, "text": "dec-layers", "bbox": {"l": 192.19499, "t": 354.68594, "r": 231.43106, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 26, "text": "Language", "bbox": {"l": 239.79799999999997, "t": 347.21396, "r": 278.31766, "b": 355.28372, "coord_origin": "1"}}, {"id": 27, "text": "TEDs", "bbox": {"l": 324.67001, "t": 341.73495, "r": 348.26419, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 28, "text": "mAP", "bbox": {"l": 396.271, "t": 341.73495, "r": 417.12683, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 29, "text": "(0.75)", "bbox": {"l": 394.927, "t": 352.69394000000005, "r": 418.47278, "b": 360.7637, "coord_origin": "1"}}, {"id": 30, "text": "Inference", "bbox": {"l": 430.771, "t": 341.73495, "r": 467.1423, "b": 349.8047199999999, "coord_origin": "1"}}, {"id": 31, "text": "time (secs)", "bbox": {"l": 427.14801, "t": 352.69394000000005, "r": 470.76056, "b": 360.7637, "coord_origin": "1"}}, {"id": 32, "text": "simple", "bbox": {"l": 286.686, "t": 354.68594, "r": 312.33261, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 33, "text": "complex", "bbox": 
{"l": 320.702, "t": 354.68594, "r": 353.71988, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 34, "text": "all", "bbox": {"l": 369.306, "t": 354.68594, "r": 379.03094, "b": 362.75570999999997, "coord_origin": "1"}}, {"id": 35, "text": "6", "bbox": {"l": 161.90601, "t": 373.51596, "r": 166.51294, "b": 381.58572, "coord_origin": "1"}}, {"id": 36, "text": "6", "bbox": {"l": 209.509, "t": 373.51596, "r": 214.11594, "b": 381.58572, "coord_origin": "1"}}, {"id": 37, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 368.03595, "r": 271.40527, "b": 376.10571, "coord_origin": "1"}}, {"id": 38, "text": "0.965", "bbox": {"l": 289.017, "t": 368.03595, "r": 310.00375, "b": 376.10571, "coord_origin": "1"}}, {"id": 39, "text": "0.934", "bbox": {"l": 326.71701, "t": 368.03595, "r": 347.70377, "b": 376.10571, "coord_origin": "1"}}, {"id": 40, "text": "0.955", "bbox": {"l": 363.67599, "t": 368.03595, "r": 384.66275, "b": 376.10571, "coord_origin": "1"}}, {"id": 41, "text": "0.88", "bbox": {"l": 397.26999, "t": 367.97317999999996, "r": 416.12723, "b": 375.89948, "coord_origin": "1"}}, {"id": 42, "text": "2.73", "bbox": {"l": 439.52701, "t": 367.97317999999996, "r": 458.38425, "b": 375.89948, "coord_origin": "1"}}, {"id": 43, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 380.98795, "r": 272.93954, "b": 389.05771, "coord_origin": "1"}}, {"id": 44, "text": "0.969", "bbox": {"l": 289.017, "t": 380.98795, "r": 310.00375, "b": 389.05771, "coord_origin": "1"}}, {"id": 45, "text": "0.927", "bbox": {"l": 326.71701, "t": 380.98795, "r": 347.70377, "b": 389.05771, "coord_origin": "1"}}, {"id": 46, "text": "0.955", "bbox": {"l": 363.67599, "t": 380.98795, "r": 384.66275, "b": 389.05771, "coord_origin": "1"}}, {"id": 47, "text": "0.857", "bbox": {"l": 396.20599, "t": 380.98795, "r": 417.19275, "b": 389.05771, "coord_origin": "1"}}, {"id": 48, "text": "5.39", "bbox": {"l": 440.767, "t": 380.98795, "r": 457.14682, "b": 389.05771, "coord_origin": "1"}}, {"id": 49, "text": "4", 
"bbox": {"l": 161.90601, "t": 399.81696, "r": 166.51294, "b": 407.88672, "coord_origin": "1"}}, {"id": 50, "text": "4", "bbox": {"l": 209.509, "t": 399.81696, "r": 214.11594, "b": 407.88672, "coord_origin": "1"}}, {"id": 51, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 394.33795, "r": 271.40527, "b": 402.40771, "coord_origin": "1"}}, {"id": 52, "text": "0.938", "bbox": {"l": 289.017, "t": 394.33795, "r": 310.00375, "b": 402.40771, "coord_origin": "1"}}, {"id": 53, "text": "0.904", "bbox": {"l": 326.71701, "t": 394.33795, "r": 347.70377, "b": 402.40771, "coord_origin": "1"}}, {"id": 54, "text": "0.927", "bbox": {"l": 363.67599, "t": 394.33795, "r": 384.66275, "b": 402.40771, "coord_origin": "1"}}, {"id": 55, "text": "0.853", "bbox": {"l": 394.61801, "t": 394.27518, "r": 418.77887, "b": 402.20148, "coord_origin": "1"}}, {"id": 56, "text": "1.97", "bbox": {"l": 439.52701, "t": 394.27518, "r": 458.38425, "b": 402.20148, "coord_origin": "1"}}, {"id": 57, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 407.28894, "r": 272.93954, "b": 415.3587, "coord_origin": "1"}}, {"id": 58, "text": "0.952", "bbox": {"l": 289.017, "t": 407.28894, "r": 310.00375, "b": 415.3587, "coord_origin": "1"}}, {"id": 59, "text": "0.909", "bbox": {"l": 326.71701, "t": 407.28894, "r": 347.70377, "b": 415.3587, "coord_origin": "1"}}, {"id": 60, "text": "0.938", "bbox": {"l": 362.08801, "t": 407.22617, "r": 386.24887, "b": 415.15247, "coord_origin": "1"}}, {"id": 61, "text": "0.843", "bbox": {"l": 396.20599, "t": 407.28894, "r": 417.19275, "b": 415.3587, "coord_origin": "1"}}, {"id": 62, "text": "3.77", "bbox": {"l": 440.767, "t": 407.28894, "r": 457.14682, "b": 415.3587, "coord_origin": "1"}}, {"id": 63, "text": "2", "bbox": {"l": 161.90601, "t": 426.11795, "r": 166.51294, "b": 434.1877099999999, "coord_origin": "1"}}, {"id": 64, "text": "4", "bbox": {"l": 209.509, "t": 426.11795, "r": 214.11594, "b": 434.1877099999999, "coord_origin": "1"}}, {"id": 65, "text": "OTSL", "bbox": {"l": 
246.71000999999998, "t": 420.63895, "r": 271.40527, "b": 428.70871, "coord_origin": "1"}}, {"id": 66, "text": "0.923", "bbox": {"l": 289.017, "t": 420.63895, "r": 310.00375, "b": 428.70871, "coord_origin": "1"}}, {"id": 67, "text": "0.897", "bbox": {"l": 326.71701, "t": 420.63895, "r": 347.70377, "b": 428.70871, "coord_origin": "1"}}, {"id": 68, "text": "0.915", "bbox": {"l": 363.67599, "t": 420.63895, "r": 384.66275, "b": 428.70871, "coord_origin": "1"}}, {"id": 69, "text": "0.859", "bbox": {"l": 394.61801, "t": 420.57617, "r": 418.77887, "b": 428.50247, "coord_origin": "1"}}, {"id": 70, "text": "1.91", "bbox": {"l": 439.52701, "t": 420.57617, "r": 458.38425, "b": 428.50247, "coord_origin": "1"}}, {"id": 71, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 433.58994, "r": 272.93954, "b": 441.6597, "coord_origin": "1"}}, {"id": 72, "text": "0.945", "bbox": {"l": 289.017, "t": 433.58994, "r": 310.00375, "b": 441.6597, "coord_origin": "1"}}, {"id": 73, "text": "0.901", "bbox": {"l": 326.71701, "t": 433.58994, "r": 347.70377, "b": 441.6597, "coord_origin": "1"}}, {"id": 74, "text": "0.931", "bbox": {"l": 362.08801, "t": 433.5271599999999, "r": 386.24887, "b": 441.45346, "coord_origin": "1"}}, {"id": 75, "text": "0.834", "bbox": {"l": 396.20599, "t": 433.58994, "r": 417.19275, "b": 441.6597, "coord_origin": "1"}}, {"id": 76, "text": "3.81", "bbox": {"l": 440.767, "t": 433.58994, "r": 457.14682, "b": 441.6597, "coord_origin": "1"}}, {"id": 77, "text": "4", "bbox": {"l": 161.90601, "t": 452.41995, "r": 166.51294, "b": 460.48972, "coord_origin": "1"}}, {"id": 78, "text": "2", "bbox": {"l": 209.509, "t": 452.41995, "r": 214.11594, "b": 460.48972, "coord_origin": "1"}}, {"id": 79, "text": "OTSL", "bbox": {"l": 246.71000999999998, "t": 446.9399399999999, "r": 271.40527, "b": 455.0097, "coord_origin": "1"}}, {"id": 80, "text": "0.952", "bbox": {"l": 289.017, "t": 446.9399399999999, "r": 310.00375, "b": 455.0097, "coord_origin": "1"}}, {"id": 81, "text": "0.92", "bbox": 
{"l": 329.021, "t": 446.9399399999999, "r": 345.40082, "b": 455.0097, "coord_origin": "1"}}, {"id": 82, "text": "0.942", "bbox": {"l": 362.08801, "t": 446.87717, "r": 386.24887, "b": 454.80347, "coord_origin": "1"}}, {"id": 83, "text": "0.857", "bbox": {"l": 394.61801, "t": 446.87717, "r": 418.77887, "b": 454.80347, "coord_origin": "1"}}, {"id": 84, "text": "1.22", "bbox": {"l": 439.52701, "t": 446.87717, "r": 458.38425, "b": 454.80347, "coord_origin": "1"}}, {"id": 85, "text": "HTML", "bbox": {"l": 245.17598999999998, "t": 459.8919399999999, "r": 272.93954, "b": 467.9617, "coord_origin": "1"}}, {"id": 86, "text": "0.944", "bbox": {"l": 289.017, "t": 459.8919399999999, "r": 310.00375, "b": 467.9617, "coord_origin": "1"}}, {"id": 87, "text": "0.903", "bbox": {"l": 326.71701, "t": 459.8919399999999, "r": 347.70377, "b": 467.9617, "coord_origin": "1"}}, {"id": 88, "text": "0.931", "bbox": {"l": 363.67599, "t": 459.8919399999999, "r": 384.66275, "b": 467.9617, "coord_origin": "1"}}, {"id": 89, "text": "0.824", "bbox": {"l": 396.20599, "t": 459.8919399999999, "r": 417.19275, "b": 467.9617, "coord_origin": "1"}}, {"id": 90, "text": "2", "bbox": {"l": 446.65302, "t": 459.8919399999999, "r": 451.25995, "b": 467.9617, "coord_origin": "1"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "ched", "lcel", "lcel", "ched", "ched", "nl", "ched", "ched", "ucel", "ched", "ched", "ched", "ched", "ched", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 8, "table_cells": [{"bbox": {"l": 160.37, "t": 341.73495, "r": 168.04793, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 
1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "#", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 144.592, "t": 354.68594, "r": 183.82806, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "enc-layers", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 207.974, "t": 341.73495, "r": 215.65193, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "#", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 192.19499, "t": 354.68594, "r": 231.43106, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "dec-layers", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 239.79799999999997, "t": 347.21396, "r": 278.31766, "b": 355.28372, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Language", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 324.67001, "t": 341.73495, "r": 348.26419, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 3, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 6, "text": "TEDs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 396.271, "t": 341.73495, "r": 417.12683, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": 
"mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.927, "t": 352.69394000000005, "r": 418.47278, "b": 360.7637, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "(0.75)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 430.771, "t": 341.73495, "r": 467.1423, "b": 349.8047199999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "Inference", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.14801, "t": 352.69394000000005, "r": 470.76056, "b": 360.7637, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "time (secs)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 286.686, "t": 354.68594, "r": 312.33261, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 320.702, "t": 354.68594, "r": 353.71988, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 369.306, "t": 354.68594, "r": 379.03094, "b": 362.75570999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "all", "column_header": true, "row_header": false, 
"row_section": false}, {"bbox": {"l": 161.90601, "t": 373.51596, "r": 166.51294, "b": 381.58572, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 373.51596, "r": 214.11594, "b": 381.58572, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 368.03595, "r": 272.93954, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 368.03595, "r": 310.00375, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.965 0.969", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 368.03595, "r": 347.70377, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.934 0.927", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 363.67599, "t": 368.03595, "r": 384.66275, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.955 0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 396.20599, "t": 367.97317999999996, "r": 417.19275, 
"b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.88 0.857", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 367.97317999999996, "r": 458.38425, "b": 389.05771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "2.73 5.39", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 399.81696, "r": 166.51294, "b": 407.88672, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 399.81696, "r": 214.11594, "b": 407.88672, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 394.33795, "r": 272.93954, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 394.33795, "r": 310.00375, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.938 0.952", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 394.33795, "r": 347.70377, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, 
"start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.904 0.909", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 363.67599, "t": 394.33795, "r": 384.66275, "b": 402.40771, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.927", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.61801, "t": 394.27518, "r": 418.77887, "b": 402.20148, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.853", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 394.27518, "r": 458.38425, "b": 402.20148, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "1.97", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 362.08801, "t": 407.22617, "r": 386.24887, "b": 428.70871, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.938 0.915", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 396.20599, "t": 407.28894, "r": 417.19275, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.843", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.767, "t": 407.28894, "r": 457.14682, "b": 415.3587, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 7, "end_col_offset_idx": 
8, "text": "3.77", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 426.11795, "r": 166.51294, "b": 434.1877099999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 426.11795, "r": 214.11594, "b": 434.1877099999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 420.63895, "r": 272.93954, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 420.63895, "r": 310.00375, "b": 428.70871, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.923", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 420.63895, "r": 347.70377, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.897 0.901", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.61801, "t": 420.57617, "r": 418.77887, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.859 0.834", "column_header": false, "row_header": false, "row_section": 
false}, {"bbox": {"l": 439.52701, "t": 420.57617, "r": 458.38425, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "1.91 3.81", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 433.58994, "r": 310.00375, "b": 441.6597, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 362.08801, "t": 433.5271599999999, "r": 386.24887, "b": 441.45346, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.931", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 161.90601, "t": 452.41995, "r": 166.51294, "b": 460.48972, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 209.509, "t": 452.41995, "r": 214.11594, "b": 460.48972, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 245.17598999999998, "t": 446.9399399999999, "r": 272.93954, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "OTSL HTML", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 289.017, "t": 446.9399399999999, "r": 310.00375, "b": 467.9617, 
"coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.952 0.944", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.71701, "t": 446.9399399999999, "r": 347.70377, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.92 0.903", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 362.08801, "t": 446.87717, "r": 386.24887, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.942 0.931", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.61801, "t": 446.87717, "r": 418.77887, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "0.857 0.824", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 439.52701, "t": 446.87717, "r": 458.38425, "b": 467.9617, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 7, "end_col_offset_idx": 8, "text": "1.22 2", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "Section-header", "id": 7, "page_no": 8, "cluster": {"id": 7, "label": "Section-header", "bbox": {"l": 134.48985929489137, "t": 507.6188140869141, "r": 264.40332, "b": 517.7784141540527, "coord_origin": "1"}, "confidence": 0.9545547962188721, "cells": [{"id": 91, "text": "5.2", "bbox": {"l": 134.765, "t": 508.15179, "r": 149.40205, "b": 516.95874, "coord_origin": "1"}}, {"id": 92, "text": "Quantitative Results", "bbox": {"l": 160.85904, "t": 508.15179, 
"r": 264.40332, "b": 516.95874, "coord_origin": "1"}}]}, "text": "5.2 Quantitative Results"}, {"label": "Text", "id": 8, "page_no": 8, "cluster": {"id": 8, "label": "Text", "bbox": {"l": 133.97792644500734, "t": 523.5121616363525, "r": 480.5957599999999, "b": 617.5317226409912, "coord_origin": "1"}, "confidence": 0.9885255098342896, "cells": [{"id": 93, "text": "We picked the model parameter configuration that produced the best prediction", "bbox": {"l": 134.765, "t": 524.55078, "r": 480.59075999999993, "b": 533.34775, "coord_origin": "1"}}, {"id": 94, "text": "quality (enc=6, dec=6, heads=8) with PubTabNet alone, then independently", "bbox": {"l": 134.765, "t": 536.50677, "r": 480.58675999999997, "b": 545.3037400000001, "coord_origin": "1"}}, {"id": 95, "text": "trained and evaluated it on three publicly available data sets: PubTabNet (395k", "bbox": {"l": 134.765, "t": 548.4617800000001, "r": 480.59572999999995, "b": 557.25874, "coord_origin": "1"}}, {"id": 96, "text": "samples), FinTabNet (113k samples) and PubTables-1M (about 1M samples).", "bbox": {"l": 134.765, "t": 560.41678, "r": 480.59177000000005, "b": 569.21375, "coord_origin": "1"}}, {"id": 97, "text": "Performance results are presented in Table. 2. 
It is clearly evident that the model", "bbox": {"l": 134.765, "t": 572.37178, "r": 480.59069999999997, "b": 581.16875, "coord_origin": "1"}}, {"id": 98, "text": "trained on OTSL outperforms HTML across the board, keeping high TEDs and", "bbox": {"l": 134.765, "t": 584.32678, "r": 480.5957599999999, "b": 593.12375, "coord_origin": "1"}}, {"id": 99, "text": "mAP scores even on difficult financial tables (FinTabNet) that contain sparse", "bbox": {"l": 134.765, "t": 596.28278, "r": 480.58774, "b": 605.07974, "coord_origin": "1"}}, {"id": 100, "text": "and large tables.", "bbox": {"l": 134.765, "t": 608.2377799999999, "r": 206.78664, "b": 617.03474, "coord_origin": "1"}}]}, "text": "We picked the model parameter configuration that produced the best prediction quality (enc=6, dec=6, heads=8) with PubTabNet alone, then independently trained and evaluated it on three publicly available data sets: PubTabNet (395k samples), FinTabNet (113k samples) and PubTables-1M (about 1M samples). Performance results are presented in Table. 2. It is clearly evident that the model trained on OTSL outperforms HTML across the board, keeping high TEDs and mAP scores even on difficult financial tables (FinTabNet) that contain sparse and large tables."}, {"label": "Text", "id": 9, "page_no": 8, "cluster": {"id": 9, "label": "Text", "bbox": {"l": 133.90371551513672, "t": 619.2685958862304, "r": 480.6639009475708, "b": 665.2616912841796, "coord_origin": "1"}, "confidence": 0.9859562516212463, "cells": [{"id": 101, "text": "Additionally, the results show that OTSL has an advantage over HTML", "bbox": {"l": 149.709, "t": 620.19278, "r": 480.59271, "b": 628.98975, "coord_origin": "1"}}, {"id": 102, "text": "when applied on a bigger data set like PubTables-1M and achieves significantly", "bbox": {"l": 134.765, "t": 632.14778, "r": 480.5957599999999, "b": 640.94475, "coord_origin": "1"}}, {"id": 103, "text": "improved scores. 
Finally, OTSL achieves faster inference due to fewer decoding", "bbox": {"l": 134.765, "t": 644.1027799999999, "r": 480.59283000000005, "b": 652.89975, "coord_origin": "1"}}, {"id": 104, "text": "steps which is a result of the reduced sequence representation.", "bbox": {"l": 134.765, "t": 656.0577900000001, "r": 405.79651, "b": 664.8547599999999, "coord_origin": "1"}}]}, "text": "Additionally, the results show that OTSL has an advantage over HTML when applied on a bigger data set like PubTables-1M and achieves significantly improved scores. Finally, OTSL achieves faster inference due to fewer decoding steps which is a result of the reduced sequence representation."}], "headers": [{"label": "Page-header", "id": 0, "page_no": 8, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 193.94394721984864, "t": 93.11655950546265, "r": 447.54291000000006, "b": 102.24131870269775, "coord_origin": "1"}, "confidence": 0.9502049088478088, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 8, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 474.9051853179932, "t": 93.4998132705689, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.870819091796875, "cells": [{"id": 1, "text": "9", "bbox": {"l": 475.98431, "t": 93.77099999999996, "r": 480.59125000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "9"}]}}, {"page_no": 9, "page_hash": "a1509c4093fe25dbcb07c87f394506182323289a17dd189679c0b6d8238c5aae", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "10", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 
93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 3, "text": "Table 2.", "bbox": {"l": 134.765, "t": 115.83618000000001, "r": 173.09366, "b": 123.76251000000002, "coord_origin": "1"}}, {"id": 4, "text": "TSR and cell detection results compared between OTSL and HTML on", "bbox": {"l": 181.30299, "t": 115.89899000000003, "r": 480.59151999999995, "b": 123.96868999999992, "coord_origin": "1"}}, {"id": 5, "text": "the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using Table-", "bbox": {"l": 134.765, "t": 126.85797000000014, "r": 480.59357000000006, "b": 134.92767000000003, "coord_origin": "1"}}, {"id": 6, "text": "Former [9] (with enc=6, dec=6, heads=8).", "bbox": {"l": 134.765, "t": 137.81696, "r": 305.95691, "b": 145.88666, "coord_origin": "1"}}, {"id": 7, "text": "Data set", "bbox": {"l": 160.782, "t": 166.55895999999996, "r": 194.99779, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 8, "text": "Language", "bbox": {"l": 215.52499000000003, "t": 166.534, "r": 254.04465, "b": 174.6037, "coord_origin": "1"}}, {"id": 9, "text": "TEDs", "bbox": {"l": 300.397, "t": 161.07898, "r": 323.99118, "b": 169.14868, "coord_origin": "1"}}, {"id": 10, "text": "mAP(0.75)", "bbox": {"l": 370.345, "t": 166.55895999999996, "r": 414.74661, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 11, "text": "Inference", "bbox": {"l": 426.737, "t": 161.07898, "r": 463.10830999999996, "b": 169.14868, "coord_origin": "1"}}, {"id": 12, "text": "time (secs)", "bbox": {"l": 423.11401, "t": 172.03796, "r": 466.72656, "b": 180.10766999999998, "coord_origin": "1"}}, {"id": 13, "text": "simple", "bbox": {"l": 262.41299, "t": 174.03101000000004, "r": 288.0596, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 14, "text": "complex", "bbox": {"l": 296.42899, "t": 
174.03101000000004, "r": 329.44687, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 15, "text": "all", "bbox": {"l": 345.03299, "t": 174.03101000000004, "r": 354.75793, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 16, "text": "PubTabNet", "bbox": {"l": 154.53799, "t": 192.85999000000004, "r": 201.24129, "b": 200.92969000000005, "coord_origin": "1"}}, {"id": 17, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 187.38098000000002, "r": 247.13226000000003, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 18, "text": "0.965", "bbox": {"l": 264.74399, "t": 187.38098000000002, "r": 285.73074, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 19, "text": "0.934", "bbox": {"l": 302.444, "t": 187.38098000000002, "r": 323.43076, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 20, "text": "0.955", "bbox": {"l": 339.40302, "t": 187.38098000000002, "r": 360.38977, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 21, "text": "0.88", "bbox": {"l": 383.116, "t": 187.31817999999998, "r": 401.97324, "b": 195.24451, "coord_origin": "1"}}, {"id": 22, "text": "2.73", "bbox": {"l": 435.49300999999997, "t": 187.31817999999998, "r": 454.35025, "b": 195.24451, "coord_origin": "1"}}, {"id": 23, "text": "HTML", "bbox": {"l": 220.903, "t": 200.33196999999996, "r": 248.66655999999998, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 24, "text": "0.969", "bbox": {"l": 264.74399, "t": 200.33196999999996, "r": 285.73074, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 25, "text": "0.927", "bbox": {"l": 302.444, "t": 200.33196999999996, "r": 323.43076, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 26, "text": "0.955", "bbox": {"l": 339.40302, "t": 200.33196999999996, "r": 360.38977, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 27, "text": "0.857", "bbox": {"l": 382.052, "t": 200.33196999999996, "r": 403.03876, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 28, "text": "5.39", "bbox": {"l": 436.73199000000005, 
"t": 200.33196999999996, "r": 453.11182, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 29, "text": "FinTabNet", "bbox": {"l": 155.94501, "t": 219.16198999999995, "r": 199.83374, "b": 227.23168999999996, "coord_origin": "1"}}, {"id": 30, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 213.68201, "r": 247.13226000000003, "b": 221.75171, "coord_origin": "1"}}, {"id": 31, "text": "0.955", "bbox": {"l": 264.74399, "t": 213.68201, "r": 285.73074, "b": 221.75171, "coord_origin": "1"}}, {"id": 32, "text": "0.961", "bbox": {"l": 302.444, "t": 213.68201, "r": 323.43076, "b": 221.75171, "coord_origin": "1"}}, {"id": 33, "text": "0.959", "bbox": {"l": 337.815, "t": 213.61919999999998, "r": 361.97586, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 34, "text": "0.862", "bbox": {"l": 380.46399, "t": 213.61919999999998, "r": 404.62485, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 35, "text": "1.85", "bbox": {"l": 435.49300999999997, "t": 213.61919999999998, "r": 454.35025, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 36, "text": "HTML", "bbox": {"l": 220.903, "t": 226.63396999999998, "r": 248.66655999999998, "b": 234.70367, "coord_origin": "1"}}, {"id": 37, "text": "0.917", "bbox": {"l": 264.74399, "t": 226.63396999999998, "r": 285.73074, "b": 234.70367, "coord_origin": "1"}}, {"id": 38, "text": "0.922", "bbox": {"l": 302.444, "t": 226.63396999999998, "r": 323.43076, "b": 234.70367, "coord_origin": "1"}}, {"id": 39, "text": "0.92", "bbox": {"l": 341.70599, "t": 226.63396999999998, "r": 358.08582, "b": 234.70367, "coord_origin": "1"}}, {"id": 40, "text": "0.722", "bbox": {"l": 382.052, "t": 226.63396999999998, "r": 403.03876, "b": 234.70367, "coord_origin": "1"}}, {"id": 41, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 226.63396999999998, "r": 453.11182, "b": 234.70367, "coord_origin": "1"}}, {"id": 42, "text": "PubTables-1M", "bbox": {"l": 148.62601, "t": 245.46294999999998, "r": 207.1524, "b": 253.53265, "coord_origin": "1"}}, 
{"id": 43, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 239.98297000000002, "r": 247.13226000000003, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 44, "text": "0.987", "bbox": {"l": 264.74399, "t": 239.98297000000002, "r": 285.73074, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 45, "text": "0.964", "bbox": {"l": 302.444, "t": 239.98297000000002, "r": 323.43076, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 46, "text": "0.977", "bbox": {"l": 337.815, "t": 239.92016999999998, "r": 361.97586, "b": 247.8465, "coord_origin": "1"}}, {"id": 47, "text": "0.896", "bbox": {"l": 380.46399, "t": 239.92016999999998, "r": 404.62485, "b": 247.8465, "coord_origin": "1"}}, {"id": 48, "text": "1.79", "bbox": {"l": 435.49300999999997, "t": 239.92016999999998, "r": 454.35025, "b": 247.8465, "coord_origin": "1"}}, {"id": 49, "text": "HTML", "bbox": {"l": 220.903, "t": 252.93499999999995, "r": 248.66655999999998, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 50, "text": "0.983", "bbox": {"l": 264.74399, "t": 252.93499999999995, "r": 285.73074, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 51, "text": "0.944", "bbox": {"l": 302.444, "t": 252.93499999999995, "r": 323.43076, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 52, "text": "0.966", "bbox": {"l": 339.40302, "t": 252.93499999999995, "r": 360.38977, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 53, "text": "0.889", "bbox": {"l": 382.052, "t": 252.93499999999995, "r": 403.03876, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 54, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 252.93499999999995, "r": 453.11182, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 55, "text": "5.3", "bbox": {"l": 134.765, "t": 288.91479, "r": 149.40205, "b": 297.72173999999995, "coord_origin": "1"}}, {"id": 56, "text": "Qualitative Results", "bbox": {"l": 160.85904, "t": 288.91479, "r": 257.08679, "b": 297.72173999999995, "coord_origin": "1"}}, {"id": 57, "text": 
"To illustrate the qualitative differences between OTSL and HTML, Figure 5", "bbox": {"l": 134.765, "t": 309.86078, "r": 480.58777, "b": 318.65775, "coord_origin": "1"}}, {"id": 58, "text": "demonstrates less overlap and more accurate bounding boxes with OTSL. In", "bbox": {"l": 134.765, "t": 321.81577, "r": 480.58889999999997, "b": 330.61273, "coord_origin": "1"}}, {"id": 59, "text": "Figure 6, OTSL proves to be more effective in handling tables with longer to-", "bbox": {"l": 134.765, "t": 333.77075, "r": 480.58681999999993, "b": 342.56772, "coord_origin": "1"}}, {"id": 60, "text": "ken sequences, resulting in even more precise structure prediction and bounding", "bbox": {"l": 134.765, "t": 345.72574, "r": 480.58981, "b": 354.52271, "coord_origin": "1"}}, {"id": 61, "text": "boxes.", "bbox": {"l": 134.765, "t": 357.68073, "r": 161.65704, "b": 366.47769, "coord_origin": "1"}}, {"id": 62, "text": "Fig. 5.", "bbox": {"l": 134.765, "t": 397.59012, "r": 162.64424, "b": 405.51642, "coord_origin": "1"}}, {"id": 63, "text": "The OTSL model produces more accurate bounding boxes with less over-", "bbox": {"l": 167.384, "t": 397.65289, "r": 480.59106, "b": 405.72266, "coord_origin": "1"}}, {"id": 64, "text": "lap (E) than the HTML model (D), when predicting the structure of a sparse ta-", "bbox": {"l": 134.765, "t": 408.61190999999997, "r": 480.59106, "b": 416.68167000000005, "coord_origin": "1"}}, {"id": 65, "text": "ble (A), at twice the inference speed because of shorter sequence length (B),(C).", "bbox": {"l": 134.765, "t": 419.57089, "r": 480.58838000000003, "b": 427.64066, "coord_origin": "1"}}, {"id": 66, "text": "\"PMC2807444_006_00.png\" PubTabNet.", "bbox": {"l": 134.765, "t": 430.52987999999993, "r": 304.69171, "b": 438.59964, "coord_origin": "1"}}, {"id": 67, "text": "<table>", "bbox": {"l": 180.12473, "t": 516.2332200000001, "r": 190.62042, "b": 518.94992, "coord_origin": "1"}}, {"id": 68, "text": "<tr><td></td><td colspan=\"4\"></td><td colspan=\"6\"></td><td 
colspan=\"3\"></td></tr>", "bbox": {"l": 183.2438, "t": 520.13208, "r": 304.54797, "b": 522.84879, "coord_origin": "1"}}, {"id": 69, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 524.03094, "r": 388.42313, "b": 526.74765, "coord_origin": "1"}}, {"id": 70, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 527.9297799999999, "r": 388.42313, "b": 530.64648, "coord_origin": "1"}}, {"id": 71, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 531.82861, "r": 388.42313, "b": 534.54532, "coord_origin": "1"}}, {"id": 72, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 535.72748, "r": 388.42313, "b": 538.44418, "coord_origin": "1"}}, {"id": 73, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 539.62631, "r": 388.42313, "b": 542.34303, "coord_origin": "1"}}, {"id": 74, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 543.52516, "r": 388.42313, "b": 546.24188, "coord_origin": "1"}}, {"id": 75, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 547.42401, "r": 388.42313, "b": 550.14073, "coord_origin": "1"}}, {"id": 76, "text": 
"<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 551.32286, "r": 388.42313, "b": 554.03958, "coord_origin": "1"}}, {"id": 77, "text": "</table>", "bbox": {"l": 180.12473, "t": 555.22173, "r": 191.86806, "b": 557.93845, "coord_origin": "1"}}, {"id": 78, "text": "C", "bbox": {"l": 407.38348, "t": 518.30042, "r": 408.82025, "b": 521.01712, "coord_origin": "1"}}, {"id": 79, "text": "C L L L C L L L L L C L L NL", "bbox": {"l": 410.25699, "t": 518.30042, "r": 450.48605, "b": 521.01712, "coord_origin": "1"}}, {"id": 80, "text": "C", "bbox": {"l": 407.38348, "t": 522.19925, "r": 408.82025, "b": 524.9159500000001, "coord_origin": "1"}}, {"id": 81, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 522.19925, "r": 450.48605, "b": 524.9159500000001, "coord_origin": "1"}}, {"id": 82, "text": "C", "bbox": {"l": 407.38348, "t": 526.09808, "r": 408.82025, "b": 528.81479, "coord_origin": "1"}}, {"id": 83, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 526.09808, "r": 450.48605, "b": 528.81479, "coord_origin": "1"}}, {"id": 84, "text": "C", "bbox": {"l": 407.38348, "t": 529.99695, "r": 408.82025, "b": 532.7136499999999, "coord_origin": "1"}}, {"id": 85, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 529.99695, "r": 450.48605, "b": 532.7136499999999, "coord_origin": "1"}}, {"id": 86, "text": "C", "bbox": {"l": 407.38348, "t": 533.8957800000001, "r": 408.82025, "b": 536.6125, "coord_origin": "1"}}, {"id": 87, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 533.8957800000001, "r": 450.48605, "b": 536.6125, "coord_origin": "1"}}, {"id": 88, "text": "C", "bbox": {"l": 407.38348, "t": 537.79463, "r": 408.82025, "b": 540.51135, "coord_origin": "1"}}, {"id": 89, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 537.79463, "r": 450.48605, "b": 540.51135, 
"coord_origin": "1"}}, {"id": 90, "text": "C", "bbox": {"l": 407.38348, "t": 541.69348, "r": 408.82025, "b": 544.4102, "coord_origin": "1"}}, {"id": 91, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 541.69348, "r": 450.48605, "b": 544.4102, "coord_origin": "1"}}, {"id": 92, "text": "C", "bbox": {"l": 407.38348, "t": 545.59233, "r": 408.82025, "b": 548.3090500000001, "coord_origin": "1"}}, {"id": 93, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 545.59233, "r": 450.48605, "b": 548.3090500000001, "coord_origin": "1"}}, {"id": 94, "text": "C", "bbox": {"l": 407.38348, "t": 549.4911999999999, "r": 408.82025, "b": 552.2079200000001, "coord_origin": "1"}}, {"id": 95, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 549.4911999999999, "r": 450.48605, "b": 552.2079200000001, "coord_origin": "1"}}, {"id": 96, "text": "HTML", "bbox": {"l": 164.52881, "t": 509.45859, "r": 181.8528, "b": 515.31, "coord_origin": "1"}}, {"id": 97, "text": "#", "bbox": {"l": 183.58441, "t": 509.45859, "r": 186.3974, "b": 515.31, "coord_origin": "1"}}, {"id": 98, "text": "tokens:", "bbox": {"l": 189.2104, "t": 509.45859, "r": 208.90137, "b": 515.31, "coord_origin": "1"}}, {"id": 99, "text": "258", "bbox": {"l": 210.63269, "t": 509.45859, "r": 221.04044, "b": 515.31, "coord_origin": "1"}}, {"id": 100, "text": "OTSL", "bbox": {"l": 390.20203, "t": 509.60361, "r": 406.83609, "b": 515.45502, "coord_origin": "1"}}, {"id": 101, "text": "#", "bbox": {"l": 408.56952, "t": 509.60361, "r": 411.38251, "b": 515.45502, "coord_origin": "1"}}, {"id": 102, "text": "tokens:", "bbox": {"l": 414.1955, "t": 509.60361, "r": 433.88647000000003, "b": 515.45502, "coord_origin": "1"}}, {"id": 103, "text": "135", "bbox": {"l": 435.61737, "t": 509.60361, "r": 446.02512, "b": 515.45502, "coord_origin": "1"}}, {"id": 104, "text": "B", "bbox": {"l": 167.19316, "t": 519.07236, "r": 172.8231, "b": 526.3866, "coord_origin": "1"}}, {"id": 105, "text": "A", 
"bbox": {"l": 187.33745, "t": 448.62485, "r": 192.96739, "b": 455.93909, "coord_origin": "1"}}, {"id": 106, "text": "D", "bbox": {"l": 167.38654, "t": 566.0051599999999, "r": 173.01648, "b": 573.3194, "coord_origin": "1"}}, {"id": 107, "text": "E", "bbox": {"l": 248.45621000000003, "t": 621.78008, "r": 253.65727, "b": 629.09431, "coord_origin": "1"}}, {"id": 108, "text": "C", "bbox": {"l": 395.90057, "t": 519.19946, "r": 401.53052, "b": 526.5137, "coord_origin": "1"}}, {"id": 109, "text": "HTML", "bbox": {"l": 171.62886, "t": 580.28853, "r": 177.48148, "b": 597.26784, "coord_origin": "1"}}, {"id": 110, "text": "OTSL", "bbox": {"l": 251.05969000000002, "t": 633.63408, "r": 256.91235, "b": 649.92345, "coord_origin": "1"}}, {"id": 111, "text": "HTML model shows", "bbox": {"l": 372.14645, "t": 601.45724, "r": 427.0379, "b": 607.30864, "coord_origin": "1"}}, {"id": 112, "text": "bounding box drifting", "bbox": {"l": 372.14645, "t": 607.89948, "r": 430.06838999999997, "b": 613.75087, "coord_origin": "1"}}, {"id": 113, "text": "OTSL model shows", "bbox": {"l": 176.88042, "t": 642.87209, "r": 231.08191, "b": 648.72348, "coord_origin": "1"}}, {"id": 114, "text": "clean bounding box", "bbox": {"l": 176.88042, "t": 649.3143, "r": 230.99271000000002, "b": 655.1657, "coord_origin": "1"}}, {"id": 115, "text": "alignment", "bbox": {"l": 176.88042, "t": 655.7565500000001, "r": 203.93219, "b": 661.60794, "coord_origin": "1"}}, {"id": 116, "text": "\u2264", "bbox": {"l": 215.93231000000003, "t": 557.56342, "r": 218.4697, "b": 569.15967, "coord_origin": "1"}}, {"id": 117, "text": "\u03bc", "bbox": {"l": 229.05689999999998, "t": 557.56342, "r": 231.71908999999997, "b": 569.15967, "coord_origin": "1"}}, {"id": 118, "text": "\u03bc", "bbox": {"l": 342.63354, "t": 430.19678, "r": 344.81915, "b": 439.71716, "coord_origin": "1"}}, {"id": 119, "text": "S", "bbox": {"l": 261.20892, "t": 448.46124, "r": 263.56973, "b": 451.19727, "coord_origin": "1"}}, {"id": 120, "text": "I", "bbox": {"l": 
312.33463, "t": 448.46124, "r": 313.6362, "b": 451.19727, "coord_origin": "1"}}, {"id": 121, "text": "R", "bbox": {"l": 377.41125, "t": 448.46124, "r": 380.05737, "b": 451.19727, "coord_origin": "1"}}, {"id": 122, "text": "ST", "bbox": {"l": 200.63976, "t": 453.33997, "r": 205.82492, "b": 456.07599, "coord_origin": "1"}}, {"id": 123, "text": "0.03", "bbox": {"l": 222.20833000000002, "t": 453.33997, "r": 229.76836, "b": 456.07599, "coord_origin": "1"}}, {"id": 124, "text": "0.06", "bbox": {"l": 243.26666, "t": 453.33997, "r": 250.82669, "b": 456.07599, "coord_origin": "1"}}, {"id": 125, "text": "0.12", "bbox": {"l": 264.29657, "t": 453.33997, "r": 271.84949, "b": 456.07599, "coord_origin": "1"}}, {"id": 126, "text": "0.25", "bbox": {"l": 285.31943, "t": 453.33997, "r": 292.87946, "b": 456.07599, "coord_origin": "1"}}, {"id": 127, "text": "0.5", "bbox": {"l": 306.37775, "t": 453.33997, "r": 311.77319, "b": 456.07599, "coord_origin": "1"}}, {"id": 128, "text": "1", "bbox": {"l": 323.41699, "t": 453.33997, "r": 325.58157, "b": 456.07599, "coord_origin": "1"}}, {"id": 129, "text": "2", "bbox": {"l": 334.45807, "t": 453.33997, "r": 336.62265, "b": 456.07599, "coord_origin": "1"}}, {"id": 130, "text": "4", "bbox": {"l": 345.52756, "t": 453.33997, "r": 347.69214, "b": 456.07599, "coord_origin": "1"}}, {"id": 131, "text": "8", "bbox": {"l": 356.56863, "t": 453.33997, "r": 358.73322, "b": 456.07599, "coord_origin": "1"}}, {"id": 132, "text": "16", "bbox": {"l": 367.63812, "t": 453.33997, "r": 371.97089, "b": 456.07599, "coord_origin": "1"}}, {"id": 133, "text": "32", "bbox": {"l": 382.6734, "t": 453.33997, "r": 387.00616, "b": 456.07599, "coord_origin": "1"}}, {"id": 134, "text": "64", "bbox": {"l": 397.73727, "t": 453.33997, "r": 402.07001, "b": 456.07599, "coord_origin": "1"}}, {"id": 135, "text": "\u2265", "bbox": {"l": 412.78879, "t": 447.99298, "r": 414.93463, "b": 457.79964999999993, "coord_origin": "1"}}, {"id": 136, "text": " 128", "bbox": {"l": 414.95697, "t": 
453.33997, "r": 422.51746, "b": 456.07599, "coord_origin": "1"}}, {"id": 137, "text": "63", "bbox": {"l": 200.63998, "t": 463.92444, "r": 204.57674, "b": 466.66043, "coord_origin": "1"}}, {"id": 138, "text": "1", "bbox": {"l": 367.62604, "t": 463.92444, "r": 369.58032, "b": 466.66043, "coord_origin": "1"}}, {"id": 139, "text": "1", "bbox": {"l": 382.66132, "t": 463.92444, "r": 384.6156, "b": 466.66043, "coord_origin": "1"}}, {"id": 140, "text": "3", "bbox": {"l": 397.72504, "t": 463.92444, "r": 399.67932, "b": 466.66043, "coord_origin": "1"}}, {"id": 141, "text": "199", "bbox": {"l": 200.64, "t": 468.80313, "r": 206.51694, "b": 471.53915, "coord_origin": "1"}}, {"id": 142, "text": "5", "bbox": {"l": 264.29047, "t": 468.80313, "r": 266.25885, "b": 471.53915, "coord_origin": "1"}}, {"id": 143, "text": "1", "bbox": {"l": 306.37213, "t": 468.80313, "r": 308.34052, "b": 471.53915, "coord_origin": "1"}}, {"id": 144, "text": "2", "bbox": {"l": 345.51526, "t": 468.80313, "r": 347.48364, "b": 471.53915, "coord_origin": "1"}}, {"id": 145, "text": "4", "bbox": {"l": 356.55634, "t": 468.80313, "r": 358.52472, "b": 471.53915, "coord_origin": "1"}}, {"id": 146, "text": "1", "bbox": {"l": 367.62582, "t": 468.80313, "r": 369.59418, "b": 471.53915, "coord_origin": "1"}}, {"id": 147, "text": "1", "bbox": {"l": 382.66107, "t": 468.80313, "r": 384.62946, "b": 471.53915, "coord_origin": "1"}}, {"id": 148, "text": "416", "bbox": {"l": 200.64, "t": 473.68185, "r": 206.51694, "b": 476.41788, "coord_origin": "1"}}, {"id": 149, "text": "4", "bbox": {"l": 264.29047, "t": 473.68185, "r": 266.25885, "b": 476.41788, "coord_origin": "1"}}, {"id": 150, "text": "230", "bbox": {"l": 200.64, "t": 478.53214, "r": 206.51694, "b": 481.26816, "coord_origin": "1"}}, {"id": 151, "text": "1", "bbox": {"l": 243.26373, "t": 478.53214, "r": 245.2321, "b": 481.26816, "coord_origin": "1"}}, {"id": 152, "text": "9", "bbox": {"l": 264.29047, "t": 478.53214, "r": 266.25885, "b": 481.26816, "coord_origin": "1"}}, 
{"id": 153, "text": "1", "bbox": {"l": 323.40466, "t": 478.53214, "r": 325.37305, "b": 481.26816, "coord_origin": "1"}}, {"id": 154, "text": "1", "bbox": {"l": 397.72519, "t": 478.53214, "r": 399.69354, "b": 481.26816, "coord_origin": "1"}}, {"id": 155, "text": "276", "bbox": {"l": 200.64, "t": 483.41086, "r": 206.51694, "b": 486.14688, "coord_origin": "1"}}, {"id": 156, "text": "2", "bbox": {"l": 382.66132, "t": 483.41086, "r": 384.61563, "b": 486.14688, "coord_origin": "1"}}, {"id": 157, "text": "12", "bbox": {"l": 397.72513, "t": 483.41086, "r": 401.64819, "b": 486.14688, "coord_origin": "1"}}, {"id": 158, "text": "1", "bbox": {"l": 412.78928, "t": 483.41086, "r": 414.74359, "b": 486.14688, "coord_origin": "1"}}, {"id": 159, "text": "320", "bbox": {"l": 200.64014, "t": 488.28958, "r": 207.14445, "b": 491.0256, "coord_origin": "1"}}, {"id": 160, "text": "1", "bbox": {"l": 367.62616, "t": 488.28958, "r": 369.78375, "b": 491.0256, "coord_origin": "1"}}, {"id": 161, "text": "4", "bbox": {"l": 382.66141, "t": 488.28958, "r": 384.81897, "b": 491.0256, "coord_origin": "1"}}, {"id": 162, "text": "20", "bbox": {"l": 397.7251, "t": 488.28958, "r": 402.05087, "b": 491.0256, "coord_origin": "1"}}, {"id": 163, "text": "2013", "bbox": {"l": 200.64032, "t": 493.1683, "r": 208.48566, "b": 495.90433, "coord_origin": "1"}}, {"id": 164, "text": "3", "bbox": {"l": 264.29044, "t": 493.1683, "r": 266.25879, "b": 495.90433, "coord_origin": "1"}}, {"id": 165, "text": "\u03bc", "bbox": {"l": 227.91466, "t": 665.82603, "r": 230.10028, "b": 675.3464, "coord_origin": "1"}}, {"id": 166, "text": "\u2265", "bbox": {"l": 300.58057, "t": 683.62195, "r": 302.72638, "b": 693.428658, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 134.6792824745178, "t": 93.56233406066895, "r": 144.24872789382934, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8677384853363037, "cells": [{"id": 0, "text": "10", "bbox": {"l": 134.765, 
"t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 167.24963665008545, "t": 92.96470441818235, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8613899946212769, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "Caption", "bbox": {"l": 134.00595617294312, "t": 114.83857812881467, "r": 480.59357000000006, "b": 146.49228200912478, "coord_origin": "1"}, "confidence": 0.9548113346099854, "cells": [{"id": 3, "text": "Table 2.", "bbox": {"l": 134.765, "t": 115.83618000000001, "r": 173.09366, "b": 123.76251000000002, "coord_origin": "1"}}, {"id": 4, "text": "TSR and cell detection results compared between OTSL and HTML on", "bbox": {"l": 181.30299, "t": 115.89899000000003, "r": 480.59151999999995, "b": 123.96868999999992, "coord_origin": "1"}}, {"id": 5, "text": "the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using Table-", "bbox": {"l": 134.765, "t": 126.85797000000014, "r": 480.59357000000006, "b": 134.92767000000003, "coord_origin": "1"}}, {"id": 6, "text": "Former [9] (with enc=6, dec=6, heads=8).", "bbox": {"l": 134.765, "t": 137.81696, "r": 305.95691, "b": 145.88666, "coord_origin": "1"}}]}, {"id": 3, "label": "Table", "bbox": {"l": 143.8171488761902, "t": 156.13133182525632, "r": 470.8412103652954, "b": 263.2244602203368, "coord_origin": "1"}, "confidence": 0.9879505038261414, "cells": [{"id": 7, "text": "Data set", "bbox": {"l": 160.782, "t": 166.55895999999996, "r": 194.99779, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 8, "text": "Language", "bbox": {"l": 215.52499000000003, "t": 166.534, "r": 254.04465, "b": 174.6037, "coord_origin": 
"1"}}, {"id": 9, "text": "TEDs", "bbox": {"l": 300.397, "t": 161.07898, "r": 323.99118, "b": 169.14868, "coord_origin": "1"}}, {"id": 10, "text": "mAP(0.75)", "bbox": {"l": 370.345, "t": 166.55895999999996, "r": 414.74661, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 11, "text": "Inference", "bbox": {"l": 426.737, "t": 161.07898, "r": 463.10830999999996, "b": 169.14868, "coord_origin": "1"}}, {"id": 12, "text": "time (secs)", "bbox": {"l": 423.11401, "t": 172.03796, "r": 466.72656, "b": 180.10766999999998, "coord_origin": "1"}}, {"id": 13, "text": "simple", "bbox": {"l": 262.41299, "t": 174.03101000000004, "r": 288.0596, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 14, "text": "complex", "bbox": {"l": 296.42899, "t": 174.03101000000004, "r": 329.44687, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 15, "text": "all", "bbox": {"l": 345.03299, "t": 174.03101000000004, "r": 354.75793, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 16, "text": "PubTabNet", "bbox": {"l": 154.53799, "t": 192.85999000000004, "r": 201.24129, "b": 200.92969000000005, "coord_origin": "1"}}, {"id": 17, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 187.38098000000002, "r": 247.13226000000003, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 18, "text": "0.965", "bbox": {"l": 264.74399, "t": 187.38098000000002, "r": 285.73074, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 19, "text": "0.934", "bbox": {"l": 302.444, "t": 187.38098000000002, "r": 323.43076, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 20, "text": "0.955", "bbox": {"l": 339.40302, "t": 187.38098000000002, "r": 360.38977, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 21, "text": "0.88", "bbox": {"l": 383.116, "t": 187.31817999999998, "r": 401.97324, "b": 195.24451, "coord_origin": "1"}}, {"id": 22, "text": "2.73", "bbox": {"l": 435.49300999999997, "t": 187.31817999999998, "r": 454.35025, "b": 195.24451, "coord_origin": "1"}}, {"id": 23, "text": "HTML", 
"bbox": {"l": 220.903, "t": 200.33196999999996, "r": 248.66655999999998, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 24, "text": "0.969", "bbox": {"l": 264.74399, "t": 200.33196999999996, "r": 285.73074, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 25, "text": "0.927", "bbox": {"l": 302.444, "t": 200.33196999999996, "r": 323.43076, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 26, "text": "0.955", "bbox": {"l": 339.40302, "t": 200.33196999999996, "r": 360.38977, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 27, "text": "0.857", "bbox": {"l": 382.052, "t": 200.33196999999996, "r": 403.03876, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 28, "text": "5.39", "bbox": {"l": 436.73199000000005, "t": 200.33196999999996, "r": 453.11182, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 29, "text": "FinTabNet", "bbox": {"l": 155.94501, "t": 219.16198999999995, "r": 199.83374, "b": 227.23168999999996, "coord_origin": "1"}}, {"id": 30, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 213.68201, "r": 247.13226000000003, "b": 221.75171, "coord_origin": "1"}}, {"id": 31, "text": "0.955", "bbox": {"l": 264.74399, "t": 213.68201, "r": 285.73074, "b": 221.75171, "coord_origin": "1"}}, {"id": 32, "text": "0.961", "bbox": {"l": 302.444, "t": 213.68201, "r": 323.43076, "b": 221.75171, "coord_origin": "1"}}, {"id": 33, "text": "0.959", "bbox": {"l": 337.815, "t": 213.61919999999998, "r": 361.97586, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 34, "text": "0.862", "bbox": {"l": 380.46399, "t": 213.61919999999998, "r": 404.62485, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 35, "text": "1.85", "bbox": {"l": 435.49300999999997, "t": 213.61919999999998, "r": 454.35025, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 36, "text": "HTML", "bbox": {"l": 220.903, "t": 226.63396999999998, "r": 248.66655999999998, "b": 234.70367, "coord_origin": "1"}}, {"id": 37, "text": "0.917", "bbox": {"l": 264.74399, "t": 
226.63396999999998, "r": 285.73074, "b": 234.70367, "coord_origin": "1"}}, {"id": 38, "text": "0.922", "bbox": {"l": 302.444, "t": 226.63396999999998, "r": 323.43076, "b": 234.70367, "coord_origin": "1"}}, {"id": 39, "text": "0.92", "bbox": {"l": 341.70599, "t": 226.63396999999998, "r": 358.08582, "b": 234.70367, "coord_origin": "1"}}, {"id": 40, "text": "0.722", "bbox": {"l": 382.052, "t": 226.63396999999998, "r": 403.03876, "b": 234.70367, "coord_origin": "1"}}, {"id": 41, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 226.63396999999998, "r": 453.11182, "b": 234.70367, "coord_origin": "1"}}, {"id": 42, "text": "PubTables-1M", "bbox": {"l": 148.62601, "t": 245.46294999999998, "r": 207.1524, "b": 253.53265, "coord_origin": "1"}}, {"id": 43, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 239.98297000000002, "r": 247.13226000000003, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 44, "text": "0.987", "bbox": {"l": 264.74399, "t": 239.98297000000002, "r": 285.73074, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 45, "text": "0.964", "bbox": {"l": 302.444, "t": 239.98297000000002, "r": 323.43076, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 46, "text": "0.977", "bbox": {"l": 337.815, "t": 239.92016999999998, "r": 361.97586, "b": 247.8465, "coord_origin": "1"}}, {"id": 47, "text": "0.896", "bbox": {"l": 380.46399, "t": 239.92016999999998, "r": 404.62485, "b": 247.8465, "coord_origin": "1"}}, {"id": 48, "text": "1.79", "bbox": {"l": 435.49300999999997, "t": 239.92016999999998, "r": 454.35025, "b": 247.8465, "coord_origin": "1"}}, {"id": 49, "text": "HTML", "bbox": {"l": 220.903, "t": 252.93499999999995, "r": 248.66655999999998, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 50, "text": "0.983", "bbox": {"l": 264.74399, "t": 252.93499999999995, "r": 285.73074, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 51, "text": "0.944", "bbox": {"l": 302.444, "t": 252.93499999999995, "r": 323.43076, "b": 261.00469999999996, 
"coord_origin": "1"}}, {"id": 52, "text": "0.966", "bbox": {"l": 339.40302, "t": 252.93499999999995, "r": 360.38977, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 53, "text": "0.889", "bbox": {"l": 382.052, "t": 252.93499999999995, "r": 403.03876, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 54, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 252.93499999999995, "r": 453.11182, "b": 261.00469999999996, "coord_origin": "1"}}]}, {"id": 4, "label": "Section-header", "bbox": {"l": 134.25314598083494, "t": 288.23322944641114, "r": 257.1956182479858, "b": 298.2838571548462, "coord_origin": "1"}, "confidence": 0.9522386193275452, "cells": [{"id": 55, "text": "5.3", "bbox": {"l": 134.765, "t": 288.91479, "r": 149.40205, "b": 297.72173999999995, "coord_origin": "1"}}, {"id": 56, "text": "Qualitative Results", "bbox": {"l": 160.85904, "t": 288.91479, "r": 257.08679, "b": 297.72173999999995, "coord_origin": "1"}}]}, {"id": 5, "label": "Text", "bbox": {"l": 133.7931432723999, "t": 308.9267612457275, "r": 480.6096508026123, "b": 366.47769, "coord_origin": "1"}, "confidence": 0.9832314252853394, "cells": [{"id": 57, "text": "To illustrate the qualitative differences between OTSL and HTML, Figure 5", "bbox": {"l": 134.765, "t": 309.86078, "r": 480.58777, "b": 318.65775, "coord_origin": "1"}}, {"id": 58, "text": "demonstrates less overlap and more accurate bounding boxes with OTSL. 
In", "bbox": {"l": 134.765, "t": 321.81577, "r": 480.58889999999997, "b": 330.61273, "coord_origin": "1"}}, {"id": 59, "text": "Figure 6, OTSL proves to be more effective in handling tables with longer to-", "bbox": {"l": 134.765, "t": 333.77075, "r": 480.58681999999993, "b": 342.56772, "coord_origin": "1"}}, {"id": 60, "text": "ken sequences, resulting in even more precise structure prediction and bounding", "bbox": {"l": 134.765, "t": 345.72574, "r": 480.58981, "b": 354.52271, "coord_origin": "1"}}, {"id": 61, "text": "boxes.", "bbox": {"l": 134.765, "t": 357.68073, "r": 161.65704, "b": 366.47769, "coord_origin": "1"}}]}, {"id": 6, "label": "Caption", "bbox": {"l": 133.93432788848875, "t": 396.78733520507814, "r": 480.59106, "b": 439.71716, "coord_origin": "1"}, "confidence": 0.7760405540466309, "cells": [{"id": 62, "text": "Fig. 5.", "bbox": {"l": 134.765, "t": 397.59012, "r": 162.64424, "b": 405.51642, "coord_origin": "1"}}, {"id": 63, "text": "The OTSL model produces more accurate bounding boxes with less over-", "bbox": {"l": 167.384, "t": 397.65289, "r": 480.59106, "b": 405.72266, "coord_origin": "1"}}, {"id": 64, "text": "lap (E) than the HTML model (D), when predicting the structure of a sparse ta-", "bbox": {"l": 134.765, "t": 408.61190999999997, "r": 480.59106, "b": 416.68167000000005, "coord_origin": "1"}}, {"id": 65, "text": "ble (A), at twice the inference speed because of shorter sequence length (B),(C).", "bbox": {"l": 134.765, "t": 419.57089, "r": 480.58838000000003, "b": 427.64066, "coord_origin": "1"}}, {"id": 66, "text": "\"PMC2807444_006_00.png\" PubTabNet.", "bbox": {"l": 134.765, "t": 430.52987999999993, "r": 304.69171, "b": 438.59964, "coord_origin": "1"}}, {"id": 118, "text": "\u03bc", "bbox": {"l": 342.63354, "t": 430.19678, "r": 344.81915, "b": 439.71716, "coord_origin": "1"}}]}, {"id": 7, "label": "Picture", "bbox": {"l": 162.9001407623291, "t": 443.7800834655762, "r": 451.33742237091064, "b": 663.5160186767579, "coord_origin": "1"}, 
"confidence": 0.945287823677063, "cells": [{"id": 67, "text": "<table>", "bbox": {"l": 180.12473, "t": 516.2332200000001, "r": 190.62042, "b": 518.94992, "coord_origin": "1"}}, {"id": 68, "text": "<tr><td></td><td colspan=\"4\"></td><td colspan=\"6\"></td><td colspan=\"3\"></td></tr>", "bbox": {"l": 183.2438, "t": 520.13208, "r": 304.54797, "b": 522.84879, "coord_origin": "1"}}, {"id": 69, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 524.03094, "r": 388.42313, "b": 526.74765, "coord_origin": "1"}}, {"id": 70, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 527.9297799999999, "r": 388.42313, "b": 530.64648, "coord_origin": "1"}}, {"id": 71, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 531.82861, "r": 388.42313, "b": 534.54532, "coord_origin": "1"}}, {"id": 72, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 535.72748, "r": 388.42313, "b": 538.44418, "coord_origin": "1"}}, {"id": 73, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 539.62631, "r": 388.42313, "b": 542.34303, "coord_origin": "1"}}, {"id": 74, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 543.52516, "r": 388.42313, "b": 546.24188, "coord_origin": "1"}}, {"id": 75, "text": 
"<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 547.42401, "r": 388.42313, "b": 550.14073, "coord_origin": "1"}}, {"id": 76, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 551.32286, "r": 388.42313, "b": 554.03958, "coord_origin": "1"}}, {"id": 77, "text": "</table>", "bbox": {"l": 180.12473, "t": 555.22173, "r": 191.86806, "b": 557.93845, "coord_origin": "1"}}, {"id": 78, "text": "C", "bbox": {"l": 407.38348, "t": 518.30042, "r": 408.82025, "b": 521.01712, "coord_origin": "1"}}, {"id": 79, "text": "C L L L C L L L L L C L L NL", "bbox": {"l": 410.25699, "t": 518.30042, "r": 450.48605, "b": 521.01712, "coord_origin": "1"}}, {"id": 80, "text": "C", "bbox": {"l": 407.38348, "t": 522.19925, "r": 408.82025, "b": 524.9159500000001, "coord_origin": "1"}}, {"id": 81, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 522.19925, "r": 450.48605, "b": 524.9159500000001, "coord_origin": "1"}}, {"id": 82, "text": "C", "bbox": {"l": 407.38348, "t": 526.09808, "r": 408.82025, "b": 528.81479, "coord_origin": "1"}}, {"id": 83, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 526.09808, "r": 450.48605, "b": 528.81479, "coord_origin": "1"}}, {"id": 84, "text": "C", "bbox": {"l": 407.38348, "t": 529.99695, "r": 408.82025, "b": 532.7136499999999, "coord_origin": "1"}}, {"id": 85, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 529.99695, "r": 450.48605, "b": 532.7136499999999, "coord_origin": "1"}}, {"id": 86, "text": "C", "bbox": {"l": 407.38348, "t": 533.8957800000001, "r": 408.82025, "b": 536.6125, "coord_origin": "1"}}, {"id": 87, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 533.8957800000001, "r": 450.48605, "b": 536.6125, "coord_origin": "1"}}, 
{"id": 88, "text": "C", "bbox": {"l": 407.38348, "t": 537.79463, "r": 408.82025, "b": 540.51135, "coord_origin": "1"}}, {"id": 89, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 537.79463, "r": 450.48605, "b": 540.51135, "coord_origin": "1"}}, {"id": 90, "text": "C", "bbox": {"l": 407.38348, "t": 541.69348, "r": 408.82025, "b": 544.4102, "coord_origin": "1"}}, {"id": 91, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 541.69348, "r": 450.48605, "b": 544.4102, "coord_origin": "1"}}, {"id": 92, "text": "C", "bbox": {"l": 407.38348, "t": 545.59233, "r": 408.82025, "b": 548.3090500000001, "coord_origin": "1"}}, {"id": 93, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 545.59233, "r": 450.48605, "b": 548.3090500000001, "coord_origin": "1"}}, {"id": 94, "text": "C", "bbox": {"l": 407.38348, "t": 549.4911999999999, "r": 408.82025, "b": 552.2079200000001, "coord_origin": "1"}}, {"id": 95, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 549.4911999999999, "r": 450.48605, "b": 552.2079200000001, "coord_origin": "1"}}, {"id": 96, "text": "HTML", "bbox": {"l": 164.52881, "t": 509.45859, "r": 181.8528, "b": 515.31, "coord_origin": "1"}}, {"id": 97, "text": "#", "bbox": {"l": 183.58441, "t": 509.45859, "r": 186.3974, "b": 515.31, "coord_origin": "1"}}, {"id": 98, "text": "tokens:", "bbox": {"l": 189.2104, "t": 509.45859, "r": 208.90137, "b": 515.31, "coord_origin": "1"}}, {"id": 99, "text": "258", "bbox": {"l": 210.63269, "t": 509.45859, "r": 221.04044, "b": 515.31, "coord_origin": "1"}}, {"id": 100, "text": "OTSL", "bbox": {"l": 390.20203, "t": 509.60361, "r": 406.83609, "b": 515.45502, "coord_origin": "1"}}, {"id": 101, "text": "#", "bbox": {"l": 408.56952, "t": 509.60361, "r": 411.38251, "b": 515.45502, "coord_origin": "1"}}, {"id": 102, "text": "tokens:", "bbox": {"l": 414.1955, "t": 509.60361, "r": 433.88647000000003, "b": 515.45502, "coord_origin": "1"}}, {"id": 103, "text": "135", 
"bbox": {"l": 435.61737, "t": 509.60361, "r": 446.02512, "b": 515.45502, "coord_origin": "1"}}, {"id": 104, "text": "B", "bbox": {"l": 167.19316, "t": 519.07236, "r": 172.8231, "b": 526.3866, "coord_origin": "1"}}, {"id": 105, "text": "A", "bbox": {"l": 187.33745, "t": 448.62485, "r": 192.96739, "b": 455.93909, "coord_origin": "1"}}, {"id": 106, "text": "D", "bbox": {"l": 167.38654, "t": 566.0051599999999, "r": 173.01648, "b": 573.3194, "coord_origin": "1"}}, {"id": 107, "text": "E", "bbox": {"l": 248.45621000000003, "t": 621.78008, "r": 253.65727, "b": 629.09431, "coord_origin": "1"}}, {"id": 108, "text": "C", "bbox": {"l": 395.90057, "t": 519.19946, "r": 401.53052, "b": 526.5137, "coord_origin": "1"}}, {"id": 109, "text": "HTML", "bbox": {"l": 171.62886, "t": 580.28853, "r": 177.48148, "b": 597.26784, "coord_origin": "1"}}, {"id": 110, "text": "OTSL", "bbox": {"l": 251.05969000000002, "t": 633.63408, "r": 256.91235, "b": 649.92345, "coord_origin": "1"}}, {"id": 111, "text": "HTML model shows", "bbox": {"l": 372.14645, "t": 601.45724, "r": 427.0379, "b": 607.30864, "coord_origin": "1"}}, {"id": 112, "text": "bounding box drifting", "bbox": {"l": 372.14645, "t": 607.89948, "r": 430.06838999999997, "b": 613.75087, "coord_origin": "1"}}, {"id": 113, "text": "OTSL model shows", "bbox": {"l": 176.88042, "t": 642.87209, "r": 231.08191, "b": 648.72348, "coord_origin": "1"}}, {"id": 114, "text": "clean bounding box", "bbox": {"l": 176.88042, "t": 649.3143, "r": 230.99271000000002, "b": 655.1657, "coord_origin": "1"}}, {"id": 115, "text": "alignment", "bbox": {"l": 176.88042, "t": 655.7565500000001, "r": 203.93219, "b": 661.60794, "coord_origin": "1"}}, {"id": 116, "text": "\u2264", "bbox": {"l": 215.93231000000003, "t": 557.56342, "r": 218.4697, "b": 569.15967, "coord_origin": "1"}}, {"id": 117, "text": "\u03bc", "bbox": {"l": 229.05689999999998, "t": 557.56342, "r": 231.71908999999997, "b": 569.15967, "coord_origin": "1"}}, {"id": 119, "text": "S", "bbox": {"l": 
261.20892, "t": 448.46124, "r": 263.56973, "b": 451.19727, "coord_origin": "1"}}, {"id": 120, "text": "I", "bbox": {"l": 312.33463, "t": 448.46124, "r": 313.6362, "b": 451.19727, "coord_origin": "1"}}, {"id": 121, "text": "R", "bbox": {"l": 377.41125, "t": 448.46124, "r": 380.05737, "b": 451.19727, "coord_origin": "1"}}, {"id": 122, "text": "ST", "bbox": {"l": 200.63976, "t": 453.33997, "r": 205.82492, "b": 456.07599, "coord_origin": "1"}}, {"id": 123, "text": "0.03", "bbox": {"l": 222.20833000000002, "t": 453.33997, "r": 229.76836, "b": 456.07599, "coord_origin": "1"}}, {"id": 124, "text": "0.06", "bbox": {"l": 243.26666, "t": 453.33997, "r": 250.82669, "b": 456.07599, "coord_origin": "1"}}, {"id": 125, "text": "0.12", "bbox": {"l": 264.29657, "t": 453.33997, "r": 271.84949, "b": 456.07599, "coord_origin": "1"}}, {"id": 126, "text": "0.25", "bbox": {"l": 285.31943, "t": 453.33997, "r": 292.87946, "b": 456.07599, "coord_origin": "1"}}, {"id": 127, "text": "0.5", "bbox": {"l": 306.37775, "t": 453.33997, "r": 311.77319, "b": 456.07599, "coord_origin": "1"}}, {"id": 128, "text": "1", "bbox": {"l": 323.41699, "t": 453.33997, "r": 325.58157, "b": 456.07599, "coord_origin": "1"}}, {"id": 129, "text": "2", "bbox": {"l": 334.45807, "t": 453.33997, "r": 336.62265, "b": 456.07599, "coord_origin": "1"}}, {"id": 130, "text": "4", "bbox": {"l": 345.52756, "t": 453.33997, "r": 347.69214, "b": 456.07599, "coord_origin": "1"}}, {"id": 131, "text": "8", "bbox": {"l": 356.56863, "t": 453.33997, "r": 358.73322, "b": 456.07599, "coord_origin": "1"}}, {"id": 132, "text": "16", "bbox": {"l": 367.63812, "t": 453.33997, "r": 371.97089, "b": 456.07599, "coord_origin": "1"}}, {"id": 133, "text": "32", "bbox": {"l": 382.6734, "t": 453.33997, "r": 387.00616, "b": 456.07599, "coord_origin": "1"}}, {"id": 134, "text": "64", "bbox": {"l": 397.73727, "t": 453.33997, "r": 402.07001, "b": 456.07599, "coord_origin": "1"}}, {"id": 135, "text": "\u2265", "bbox": {"l": 412.78879, "t": 447.99298, "r": 
414.93463, "b": 457.79964999999993, "coord_origin": "1"}}, {"id": 136, "text": " 128", "bbox": {"l": 414.95697, "t": 453.33997, "r": 422.51746, "b": 456.07599, "coord_origin": "1"}}, {"id": 137, "text": "63", "bbox": {"l": 200.63998, "t": 463.92444, "r": 204.57674, "b": 466.66043, "coord_origin": "1"}}, {"id": 138, "text": "1", "bbox": {"l": 367.62604, "t": 463.92444, "r": 369.58032, "b": 466.66043, "coord_origin": "1"}}, {"id": 139, "text": "1", "bbox": {"l": 382.66132, "t": 463.92444, "r": 384.6156, "b": 466.66043, "coord_origin": "1"}}, {"id": 140, "text": "3", "bbox": {"l": 397.72504, "t": 463.92444, "r": 399.67932, "b": 466.66043, "coord_origin": "1"}}, {"id": 141, "text": "199", "bbox": {"l": 200.64, "t": 468.80313, "r": 206.51694, "b": 471.53915, "coord_origin": "1"}}, {"id": 142, "text": "5", "bbox": {"l": 264.29047, "t": 468.80313, "r": 266.25885, "b": 471.53915, "coord_origin": "1"}}, {"id": 143, "text": "1", "bbox": {"l": 306.37213, "t": 468.80313, "r": 308.34052, "b": 471.53915, "coord_origin": "1"}}, {"id": 144, "text": "2", "bbox": {"l": 345.51526, "t": 468.80313, "r": 347.48364, "b": 471.53915, "coord_origin": "1"}}, {"id": 145, "text": "4", "bbox": {"l": 356.55634, "t": 468.80313, "r": 358.52472, "b": 471.53915, "coord_origin": "1"}}, {"id": 146, "text": "1", "bbox": {"l": 367.62582, "t": 468.80313, "r": 369.59418, "b": 471.53915, "coord_origin": "1"}}, {"id": 147, "text": "1", "bbox": {"l": 382.66107, "t": 468.80313, "r": 384.62946, "b": 471.53915, "coord_origin": "1"}}, {"id": 148, "text": "416", "bbox": {"l": 200.64, "t": 473.68185, "r": 206.51694, "b": 476.41788, "coord_origin": "1"}}, {"id": 149, "text": "4", "bbox": {"l": 264.29047, "t": 473.68185, "r": 266.25885, "b": 476.41788, "coord_origin": "1"}}, {"id": 150, "text": "230", "bbox": {"l": 200.64, "t": 478.53214, "r": 206.51694, "b": 481.26816, "coord_origin": "1"}}, {"id": 151, "text": "1", "bbox": {"l": 243.26373, "t": 478.53214, "r": 245.2321, "b": 481.26816, "coord_origin": "1"}}, 
{"id": 152, "text": "9", "bbox": {"l": 264.29047, "t": 478.53214, "r": 266.25885, "b": 481.26816, "coord_origin": "1"}}, {"id": 153, "text": "1", "bbox": {"l": 323.40466, "t": 478.53214, "r": 325.37305, "b": 481.26816, "coord_origin": "1"}}, {"id": 154, "text": "1", "bbox": {"l": 397.72519, "t": 478.53214, "r": 399.69354, "b": 481.26816, "coord_origin": "1"}}, {"id": 155, "text": "276", "bbox": {"l": 200.64, "t": 483.41086, "r": 206.51694, "b": 486.14688, "coord_origin": "1"}}, {"id": 156, "text": "2", "bbox": {"l": 382.66132, "t": 483.41086, "r": 384.61563, "b": 486.14688, "coord_origin": "1"}}, {"id": 157, "text": "12", "bbox": {"l": 397.72513, "t": 483.41086, "r": 401.64819, "b": 486.14688, "coord_origin": "1"}}, {"id": 158, "text": "1", "bbox": {"l": 412.78928, "t": 483.41086, "r": 414.74359, "b": 486.14688, "coord_origin": "1"}}, {"id": 159, "text": "320", "bbox": {"l": 200.64014, "t": 488.28958, "r": 207.14445, "b": 491.0256, "coord_origin": "1"}}, {"id": 160, "text": "1", "bbox": {"l": 367.62616, "t": 488.28958, "r": 369.78375, "b": 491.0256, "coord_origin": "1"}}, {"id": 161, "text": "4", "bbox": {"l": 382.66141, "t": 488.28958, "r": 384.81897, "b": 491.0256, "coord_origin": "1"}}, {"id": 162, "text": "20", "bbox": {"l": 397.7251, "t": 488.28958, "r": 402.05087, "b": 491.0256, "coord_origin": "1"}}, {"id": 163, "text": "2013", "bbox": {"l": 200.64032, "t": 493.1683, "r": 208.48566, "b": 495.90433, "coord_origin": "1"}}, {"id": 164, "text": "3", "bbox": {"l": 264.29044, "t": 493.1683, "r": 266.25879, "b": 495.90433, "coord_origin": "1"}}]}, {"id": 8, "label": "Text", "bbox": {"l": 227.91466, "t": 665.82603, "r": 230.10028, "b": 675.3464, "coord_origin": "1"}, "confidence": -1.0, "cells": [{"id": 165, "text": "\u03bc", "bbox": {"l": 227.91466, "t": 665.82603, "r": 230.10028, "b": 675.3464, "coord_origin": "1"}}]}, {"id": 9, "label": "Text", "bbox": {"l": 300.58057, "t": 683.62195, "r": 302.72638, "b": 693.428658, "coord_origin": "1"}, "confidence": -1.0, 
"cells": [{"id": 166, "text": "\u2265", "bbox": {"l": 300.58057, "t": 683.62195, "r": 302.72638, "b": 693.428658, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {"3": {"label": "Table", "id": 3, "page_no": 9, "cluster": {"id": 3, "label": "Table", "bbox": {"l": 143.8171488761902, "t": 156.13133182525632, "r": 470.8412103652954, "b": 263.2244602203368, "coord_origin": "1"}, "confidence": 0.9879505038261414, "cells": [{"id": 7, "text": "Data set", "bbox": {"l": 160.782, "t": 166.55895999999996, "r": 194.99779, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 8, "text": "Language", "bbox": {"l": 215.52499000000003, "t": 166.534, "r": 254.04465, "b": 174.6037, "coord_origin": "1"}}, {"id": 9, "text": "TEDs", "bbox": {"l": 300.397, "t": 161.07898, "r": 323.99118, "b": 169.14868, "coord_origin": "1"}}, {"id": 10, "text": "mAP(0.75)", "bbox": {"l": 370.345, "t": 166.55895999999996, "r": 414.74661, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 11, "text": "Inference", "bbox": {"l": 426.737, "t": 161.07898, "r": 463.10830999999996, "b": 169.14868, "coord_origin": "1"}}, {"id": 12, "text": "time (secs)", "bbox": {"l": 423.11401, "t": 172.03796, "r": 466.72656, "b": 180.10766999999998, "coord_origin": "1"}}, {"id": 13, "text": "simple", "bbox": {"l": 262.41299, "t": 174.03101000000004, "r": 288.0596, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 14, "text": "complex", "bbox": {"l": 296.42899, "t": 174.03101000000004, "r": 329.44687, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 15, "text": "all", "bbox": {"l": 345.03299, "t": 174.03101000000004, "r": 354.75793, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 16, "text": "PubTabNet", "bbox": {"l": 154.53799, "t": 192.85999000000004, "r": 201.24129, "b": 200.92969000000005, "coord_origin": "1"}}, {"id": 17, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 187.38098000000002, "r": 247.13226000000003, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 18, "text": 
"0.965", "bbox": {"l": 264.74399, "t": 187.38098000000002, "r": 285.73074, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 19, "text": "0.934", "bbox": {"l": 302.444, "t": 187.38098000000002, "r": 323.43076, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 20, "text": "0.955", "bbox": {"l": 339.40302, "t": 187.38098000000002, "r": 360.38977, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 21, "text": "0.88", "bbox": {"l": 383.116, "t": 187.31817999999998, "r": 401.97324, "b": 195.24451, "coord_origin": "1"}}, {"id": 22, "text": "2.73", "bbox": {"l": 435.49300999999997, "t": 187.31817999999998, "r": 454.35025, "b": 195.24451, "coord_origin": "1"}}, {"id": 23, "text": "HTML", "bbox": {"l": 220.903, "t": 200.33196999999996, "r": 248.66655999999998, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 24, "text": "0.969", "bbox": {"l": 264.74399, "t": 200.33196999999996, "r": 285.73074, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 25, "text": "0.927", "bbox": {"l": 302.444, "t": 200.33196999999996, "r": 323.43076, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 26, "text": "0.955", "bbox": {"l": 339.40302, "t": 200.33196999999996, "r": 360.38977, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 27, "text": "0.857", "bbox": {"l": 382.052, "t": 200.33196999999996, "r": 403.03876, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 28, "text": "5.39", "bbox": {"l": 436.73199000000005, "t": 200.33196999999996, "r": 453.11182, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 29, "text": "FinTabNet", "bbox": {"l": 155.94501, "t": 219.16198999999995, "r": 199.83374, "b": 227.23168999999996, "coord_origin": "1"}}, {"id": 30, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 213.68201, "r": 247.13226000000003, "b": 221.75171, "coord_origin": "1"}}, {"id": 31, "text": "0.955", "bbox": {"l": 264.74399, "t": 213.68201, "r": 285.73074, "b": 221.75171, "coord_origin": "1"}}, {"id": 32, "text": "0.961", "bbox": {"l": 302.444, 
"t": 213.68201, "r": 323.43076, "b": 221.75171, "coord_origin": "1"}}, {"id": 33, "text": "0.959", "bbox": {"l": 337.815, "t": 213.61919999999998, "r": 361.97586, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 34, "text": "0.862", "bbox": {"l": 380.46399, "t": 213.61919999999998, "r": 404.62485, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 35, "text": "1.85", "bbox": {"l": 435.49300999999997, "t": 213.61919999999998, "r": 454.35025, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 36, "text": "HTML", "bbox": {"l": 220.903, "t": 226.63396999999998, "r": 248.66655999999998, "b": 234.70367, "coord_origin": "1"}}, {"id": 37, "text": "0.917", "bbox": {"l": 264.74399, "t": 226.63396999999998, "r": 285.73074, "b": 234.70367, "coord_origin": "1"}}, {"id": 38, "text": "0.922", "bbox": {"l": 302.444, "t": 226.63396999999998, "r": 323.43076, "b": 234.70367, "coord_origin": "1"}}, {"id": 39, "text": "0.92", "bbox": {"l": 341.70599, "t": 226.63396999999998, "r": 358.08582, "b": 234.70367, "coord_origin": "1"}}, {"id": 40, "text": "0.722", "bbox": {"l": 382.052, "t": 226.63396999999998, "r": 403.03876, "b": 234.70367, "coord_origin": "1"}}, {"id": 41, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 226.63396999999998, "r": 453.11182, "b": 234.70367, "coord_origin": "1"}}, {"id": 42, "text": "PubTables-1M", "bbox": {"l": 148.62601, "t": 245.46294999999998, "r": 207.1524, "b": 253.53265, "coord_origin": "1"}}, {"id": 43, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 239.98297000000002, "r": 247.13226000000003, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 44, "text": "0.987", "bbox": {"l": 264.74399, "t": 239.98297000000002, "r": 285.73074, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 45, "text": "0.964", "bbox": {"l": 302.444, "t": 239.98297000000002, "r": 323.43076, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 46, "text": "0.977", "bbox": {"l": 337.815, "t": 239.92016999999998, "r": 361.97586, "b": 247.8465, 
"coord_origin": "1"}}, {"id": 47, "text": "0.896", "bbox": {"l": 380.46399, "t": 239.92016999999998, "r": 404.62485, "b": 247.8465, "coord_origin": "1"}}, {"id": 48, "text": "1.79", "bbox": {"l": 435.49300999999997, "t": 239.92016999999998, "r": 454.35025, "b": 247.8465, "coord_origin": "1"}}, {"id": 49, "text": "HTML", "bbox": {"l": 220.903, "t": 252.93499999999995, "r": 248.66655999999998, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 50, "text": "0.983", "bbox": {"l": 264.74399, "t": 252.93499999999995, "r": 285.73074, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 51, "text": "0.944", "bbox": {"l": 302.444, "t": 252.93499999999995, "r": 323.43076, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 52, "text": "0.966", "bbox": {"l": 339.40302, "t": 252.93499999999995, "r": 360.38977, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 53, "text": "0.889", "bbox": {"l": 382.052, "t": 252.93499999999995, "r": 403.03876, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 54, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 252.93499999999995, "r": 453.11182, "b": 261.00469999999996, "coord_origin": "1"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "lcel", "lcel", "ched", "ched", "nl", "ched", "ucel", "ched", "ched", "ched", "ucel", "ucel", "nl", "rhed", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 8, "num_cols": 7, "table_cells": [{"bbox": {"l": 215.52499000000003, "t": 166.534, "r": 254.04465, "b": 174.6037, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Language", 
"column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 300.397, "t": 161.07898, "r": 323.99118, "b": 169.14868, "coord_origin": "1"}, "row_span": 1, "col_span": 3, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 5, "text": "TEDs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 370.345, "t": 166.55895999999996, "r": 414.74661, "b": 174.62865999999997, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "mAP(0.75)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 423.11401, "t": 161.07898, "r": 466.72656, "b": 180.10766999999998, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "Inference time (secs)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 262.41299, "t": 174.03101000000004, "r": 288.0596, "b": 182.10071000000005, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 296.42899, "t": 174.03101000000004, "r": 329.44687, "b": 182.10071000000005, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 345.03299, "t": 174.03101000000004, "r": 354.75793, "b": 182.10071000000005, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "all", "column_header": 
true, "row_header": false, "row_section": false}, {"bbox": {"l": 154.53799, "t": 192.85999000000004, "r": 201.24129, "b": 200.92969000000005, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 222.43700000000004, "t": 187.38098000000002, "r": 247.13226000000003, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "OTSL", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 187.38098000000002, "r": 285.73074, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.965", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 187.38098000000002, "r": 323.43076, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.934", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 339.40302, "t": 187.38098000000002, "r": 360.38977, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 383.116, "t": 187.31817999999998, "r": 401.97324, "b": 195.24451, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.88", 
"column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 435.49300999999997, "t": 187.31817999999998, "r": 454.35025, "b": 195.24451, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "2.73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.903, "t": 200.33196999999996, "r": 248.66655999999998, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "HTML", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 200.33196999999996, "r": 285.73074, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.969", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 200.33196999999996, "r": 323.43076, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.927", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 339.40302, "t": 200.33196999999996, "r": 360.38977, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 382.052, "t": 200.33196999999996, "r": 403.03876, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.857", 
"column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 436.73199000000005, "t": 200.33196999999996, "r": 453.11182, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "5.39", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 155.94501, "t": 219.16198999999995, "r": 199.83374, "b": 227.23168999999996, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 222.43700000000004, "t": 213.68201, "r": 247.13226000000003, "b": 221.75171, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "OTSL", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 213.68201, "r": 285.73074, "b": 221.75171, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 213.68201, "r": 323.43076, "b": 221.75171, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.961", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 337.815, "t": 213.61919999999998, "r": 361.97586, "b": 221.54552999999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.959", "column_header": false, 
"row_header": false, "row_section": false}, {"bbox": {"l": 380.46399, "t": 213.61919999999998, "r": 404.62485, "b": 221.54552999999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.862", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 435.49300999999997, "t": 213.61919999999998, "r": 454.35025, "b": 221.54552999999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "1.85", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.903, "t": 226.63396999999998, "r": 248.66655999999998, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "HTML", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 226.63396999999998, "r": 285.73074, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.917", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 226.63396999999998, "r": 323.43076, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.922", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 341.70599, "t": 226.63396999999998, "r": 358.08582, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.92", "column_header": false, "row_header": false, 
"row_section": false}, {"bbox": {"l": 382.052, "t": 226.63396999999998, "r": 403.03876, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.722", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 436.73199000000005, "t": 226.63396999999998, "r": 453.11182, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "3.26", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 148.62601, "t": 245.46294999999998, "r": 207.1524, "b": 253.53265, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTables-1M", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 222.43700000000004, "t": 239.98297000000002, "r": 247.13226000000003, "b": 248.05267000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "OTSL", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 239.98297000000002, "r": 285.73074, "b": 248.05267000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.987", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 239.98297000000002, "r": 323.43076, "b": 248.05267000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.964", "column_header": false, "row_header": false, 
"row_section": false}, {"bbox": {"l": 337.815, "t": 239.92016999999998, "r": 361.97586, "b": 247.8465, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.977", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 380.46399, "t": 239.92016999999998, "r": 404.62485, "b": 247.8465, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.896", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 435.49300999999997, "t": 239.92016999999998, "r": 454.35025, "b": 247.8465, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "1.79", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.903, "t": 252.93499999999995, "r": 248.66655999999998, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "HTML", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 252.93499999999995, "r": 285.73074, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.983", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 252.93499999999995, "r": 323.43076, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.944", "column_header": false, "row_header": false, "row_section": false}, 
{"bbox": {"l": 339.40302, "t": 252.93499999999995, "r": 360.38977, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.966", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 382.052, "t": 252.93499999999995, "r": 403.03876, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.889", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 436.73199000000005, "t": 252.93499999999995, "r": 453.11182, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "3.26", "column_header": false, "row_header": false, "row_section": false}]}}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 9, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.6792824745178, "t": 93.56233406066895, "r": 144.24872789382934, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8677384853363037, "cells": [{"id": 0, "text": "10", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "10"}, {"label": "Page-header", "id": 1, "page_no": 9, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 167.24963665008545, "t": 92.96470441818235, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8613899946212769, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 
93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "M. Lysak, et al."}, {"label": "Caption", "id": 2, "page_no": 9, "cluster": {"id": 2, "label": "Caption", "bbox": {"l": 134.00595617294312, "t": 114.83857812881467, "r": 480.59357000000006, "b": 146.49228200912478, "coord_origin": "1"}, "confidence": 0.9548113346099854, "cells": [{"id": 3, "text": "Table 2.", "bbox": {"l": 134.765, "t": 115.83618000000001, "r": 173.09366, "b": 123.76251000000002, "coord_origin": "1"}}, {"id": 4, "text": "TSR and cell detection results compared between OTSL and HTML on", "bbox": {"l": 181.30299, "t": 115.89899000000003, "r": 480.59151999999995, "b": 123.96868999999992, "coord_origin": "1"}}, {"id": 5, "text": "the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using Table-", "bbox": {"l": 134.765, "t": 126.85797000000014, "r": 480.59357000000006, "b": 134.92767000000003, "coord_origin": "1"}}, {"id": 6, "text": "Former [9] (with enc=6, dec=6, heads=8).", "bbox": {"l": 134.765, "t": 137.81696, "r": 305.95691, "b": 145.88666, "coord_origin": "1"}}]}, "text": "Table 2. 
TSR and cell detection results compared between OTSL and HTML on the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using TableFormer [9] (with enc=6, dec=6, heads=8)."}, {"label": "Table", "id": 3, "page_no": 9, "cluster": {"id": 3, "label": "Table", "bbox": {"l": 143.8171488761902, "t": 156.13133182525632, "r": 470.8412103652954, "b": 263.2244602203368, "coord_origin": "1"}, "confidence": 0.9879505038261414, "cells": [{"id": 7, "text": "Data set", "bbox": {"l": 160.782, "t": 166.55895999999996, "r": 194.99779, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 8, "text": "Language", "bbox": {"l": 215.52499000000003, "t": 166.534, "r": 254.04465, "b": 174.6037, "coord_origin": "1"}}, {"id": 9, "text": "TEDs", "bbox": {"l": 300.397, "t": 161.07898, "r": 323.99118, "b": 169.14868, "coord_origin": "1"}}, {"id": 10, "text": "mAP(0.75)", "bbox": {"l": 370.345, "t": 166.55895999999996, "r": 414.74661, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 11, "text": "Inference", "bbox": {"l": 426.737, "t": 161.07898, "r": 463.10830999999996, "b": 169.14868, "coord_origin": "1"}}, {"id": 12, "text": "time (secs)", "bbox": {"l": 423.11401, "t": 172.03796, "r": 466.72656, "b": 180.10766999999998, "coord_origin": "1"}}, {"id": 13, "text": "simple", "bbox": {"l": 262.41299, "t": 174.03101000000004, "r": 288.0596, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 14, "text": "complex", "bbox": {"l": 296.42899, "t": 174.03101000000004, "r": 329.44687, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 15, "text": "all", "bbox": {"l": 345.03299, "t": 174.03101000000004, "r": 354.75793, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 16, "text": "PubTabNet", "bbox": {"l": 154.53799, "t": 192.85999000000004, "r": 201.24129, "b": 200.92969000000005, "coord_origin": "1"}}, {"id": 17, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 187.38098000000002, "r": 247.13226000000003, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 18, 
"text": "0.965", "bbox": {"l": 264.74399, "t": 187.38098000000002, "r": 285.73074, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 19, "text": "0.934", "bbox": {"l": 302.444, "t": 187.38098000000002, "r": 323.43076, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 20, "text": "0.955", "bbox": {"l": 339.40302, "t": 187.38098000000002, "r": 360.38977, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 21, "text": "0.88", "bbox": {"l": 383.116, "t": 187.31817999999998, "r": 401.97324, "b": 195.24451, "coord_origin": "1"}}, {"id": 22, "text": "2.73", "bbox": {"l": 435.49300999999997, "t": 187.31817999999998, "r": 454.35025, "b": 195.24451, "coord_origin": "1"}}, {"id": 23, "text": "HTML", "bbox": {"l": 220.903, "t": 200.33196999999996, "r": 248.66655999999998, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 24, "text": "0.969", "bbox": {"l": 264.74399, "t": 200.33196999999996, "r": 285.73074, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 25, "text": "0.927", "bbox": {"l": 302.444, "t": 200.33196999999996, "r": 323.43076, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 26, "text": "0.955", "bbox": {"l": 339.40302, "t": 200.33196999999996, "r": 360.38977, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 27, "text": "0.857", "bbox": {"l": 382.052, "t": 200.33196999999996, "r": 403.03876, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 28, "text": "5.39", "bbox": {"l": 436.73199000000005, "t": 200.33196999999996, "r": 453.11182, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 29, "text": "FinTabNet", "bbox": {"l": 155.94501, "t": 219.16198999999995, "r": 199.83374, "b": 227.23168999999996, "coord_origin": "1"}}, {"id": 30, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 213.68201, "r": 247.13226000000003, "b": 221.75171, "coord_origin": "1"}}, {"id": 31, "text": "0.955", "bbox": {"l": 264.74399, "t": 213.68201, "r": 285.73074, "b": 221.75171, "coord_origin": "1"}}, {"id": 32, "text": "0.961", "bbox": {"l": 
302.444, "t": 213.68201, "r": 323.43076, "b": 221.75171, "coord_origin": "1"}}, {"id": 33, "text": "0.959", "bbox": {"l": 337.815, "t": 213.61919999999998, "r": 361.97586, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 34, "text": "0.862", "bbox": {"l": 380.46399, "t": 213.61919999999998, "r": 404.62485, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 35, "text": "1.85", "bbox": {"l": 435.49300999999997, "t": 213.61919999999998, "r": 454.35025, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 36, "text": "HTML", "bbox": {"l": 220.903, "t": 226.63396999999998, "r": 248.66655999999998, "b": 234.70367, "coord_origin": "1"}}, {"id": 37, "text": "0.917", "bbox": {"l": 264.74399, "t": 226.63396999999998, "r": 285.73074, "b": 234.70367, "coord_origin": "1"}}, {"id": 38, "text": "0.922", "bbox": {"l": 302.444, "t": 226.63396999999998, "r": 323.43076, "b": 234.70367, "coord_origin": "1"}}, {"id": 39, "text": "0.92", "bbox": {"l": 341.70599, "t": 226.63396999999998, "r": 358.08582, "b": 234.70367, "coord_origin": "1"}}, {"id": 40, "text": "0.722", "bbox": {"l": 382.052, "t": 226.63396999999998, "r": 403.03876, "b": 234.70367, "coord_origin": "1"}}, {"id": 41, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 226.63396999999998, "r": 453.11182, "b": 234.70367, "coord_origin": "1"}}, {"id": 42, "text": "PubTables-1M", "bbox": {"l": 148.62601, "t": 245.46294999999998, "r": 207.1524, "b": 253.53265, "coord_origin": "1"}}, {"id": 43, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 239.98297000000002, "r": 247.13226000000003, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 44, "text": "0.987", "bbox": {"l": 264.74399, "t": 239.98297000000002, "r": 285.73074, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 45, "text": "0.964", "bbox": {"l": 302.444, "t": 239.98297000000002, "r": 323.43076, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 46, "text": "0.977", "bbox": {"l": 337.815, "t": 239.92016999999998, "r": 361.97586, "b": 
247.8465, "coord_origin": "1"}}, {"id": 47, "text": "0.896", "bbox": {"l": 380.46399, "t": 239.92016999999998, "r": 404.62485, "b": 247.8465, "coord_origin": "1"}}, {"id": 48, "text": "1.79", "bbox": {"l": 435.49300999999997, "t": 239.92016999999998, "r": 454.35025, "b": 247.8465, "coord_origin": "1"}}, {"id": 49, "text": "HTML", "bbox": {"l": 220.903, "t": 252.93499999999995, "r": 248.66655999999998, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 50, "text": "0.983", "bbox": {"l": 264.74399, "t": 252.93499999999995, "r": 285.73074, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 51, "text": "0.944", "bbox": {"l": 302.444, "t": 252.93499999999995, "r": 323.43076, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 52, "text": "0.966", "bbox": {"l": 339.40302, "t": 252.93499999999995, "r": 360.38977, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 53, "text": "0.889", "bbox": {"l": 382.052, "t": 252.93499999999995, "r": 403.03876, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 54, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 252.93499999999995, "r": 453.11182, "b": 261.00469999999996, "coord_origin": "1"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "lcel", "lcel", "ched", "ched", "nl", "ched", "ucel", "ched", "ched", "ched", "ucel", "ucel", "nl", "rhed", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 8, "num_cols": 7, "table_cells": [{"bbox": {"l": 215.52499000000003, "t": 166.534, "r": 254.04465, "b": 174.6037, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Language", 
"column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 300.397, "t": 161.07898, "r": 323.99118, "b": 169.14868, "coord_origin": "1"}, "row_span": 1, "col_span": 3, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 5, "text": "TEDs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 370.345, "t": 166.55895999999996, "r": 414.74661, "b": 174.62865999999997, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "mAP(0.75)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 423.11401, "t": 161.07898, "r": 466.72656, "b": 180.10766999999998, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "Inference time (secs)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 262.41299, "t": 174.03101000000004, "r": 288.0596, "b": 182.10071000000005, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 296.42899, "t": 174.03101000000004, "r": 329.44687, "b": 182.10071000000005, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 345.03299, "t": 174.03101000000004, "r": 354.75793, "b": 182.10071000000005, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "all", "column_header": 
true, "row_header": false, "row_section": false}, {"bbox": {"l": 154.53799, "t": 192.85999000000004, "r": 201.24129, "b": 200.92969000000005, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 222.43700000000004, "t": 187.38098000000002, "r": 247.13226000000003, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "OTSL", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 187.38098000000002, "r": 285.73074, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.965", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 187.38098000000002, "r": 323.43076, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.934", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 339.40302, "t": 187.38098000000002, "r": 360.38977, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 383.116, "t": 187.31817999999998, "r": 401.97324, "b": 195.24451, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.88", 
"column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 435.49300999999997, "t": 187.31817999999998, "r": 454.35025, "b": 195.24451, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "2.73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.903, "t": 200.33196999999996, "r": 248.66655999999998, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "HTML", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 200.33196999999996, "r": 285.73074, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.969", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 200.33196999999996, "r": 323.43076, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.927", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 339.40302, "t": 200.33196999999996, "r": 360.38977, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 382.052, "t": 200.33196999999996, "r": 403.03876, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.857", 
"column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 436.73199000000005, "t": 200.33196999999996, "r": 453.11182, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "5.39", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 155.94501, "t": 219.16198999999995, "r": 199.83374, "b": 227.23168999999996, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 222.43700000000004, "t": 213.68201, "r": 247.13226000000003, "b": 221.75171, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "OTSL", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 213.68201, "r": 285.73074, "b": 221.75171, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 213.68201, "r": 323.43076, "b": 221.75171, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.961", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 337.815, "t": 213.61919999999998, "r": 361.97586, "b": 221.54552999999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.959", "column_header": false, 
"row_header": false, "row_section": false}, {"bbox": {"l": 380.46399, "t": 213.61919999999998, "r": 404.62485, "b": 221.54552999999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.862", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 435.49300999999997, "t": 213.61919999999998, "r": 454.35025, "b": 221.54552999999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "1.85", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.903, "t": 226.63396999999998, "r": 248.66655999999998, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "HTML", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 226.63396999999998, "r": 285.73074, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.917", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 226.63396999999998, "r": 323.43076, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.922", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 341.70599, "t": 226.63396999999998, "r": 358.08582, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.92", "column_header": false, "row_header": false, 
"row_section": false}, {"bbox": {"l": 382.052, "t": 226.63396999999998, "r": 403.03876, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.722", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 436.73199000000005, "t": 226.63396999999998, "r": 453.11182, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "3.26", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 148.62601, "t": 245.46294999999998, "r": 207.1524, "b": 253.53265, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTables-1M", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 222.43700000000004, "t": 239.98297000000002, "r": 247.13226000000003, "b": 248.05267000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "OTSL", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 239.98297000000002, "r": 285.73074, "b": 248.05267000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.987", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 239.98297000000002, "r": 323.43076, "b": 248.05267000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.964", "column_header": false, "row_header": false, 
"row_section": false}, {"bbox": {"l": 337.815, "t": 239.92016999999998, "r": 361.97586, "b": 247.8465, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.977", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 380.46399, "t": 239.92016999999998, "r": 404.62485, "b": 247.8465, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.896", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 435.49300999999997, "t": 239.92016999999998, "r": 454.35025, "b": 247.8465, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "1.79", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.903, "t": 252.93499999999995, "r": 248.66655999999998, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "HTML", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 252.93499999999995, "r": 285.73074, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.983", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 252.93499999999995, "r": 323.43076, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.944", "column_header": false, "row_header": false, "row_section": false}, 
{"bbox": {"l": 339.40302, "t": 252.93499999999995, "r": 360.38977, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.966", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 382.052, "t": 252.93499999999995, "r": 403.03876, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.889", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 436.73199000000005, "t": 252.93499999999995, "r": 453.11182, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "3.26", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "Section-header", "id": 4, "page_no": 9, "cluster": {"id": 4, "label": "Section-header", "bbox": {"l": 134.25314598083494, "t": 288.23322944641114, "r": 257.1956182479858, "b": 298.2838571548462, "coord_origin": "1"}, "confidence": 0.9522386193275452, "cells": [{"id": 55, "text": "5.3", "bbox": {"l": 134.765, "t": 288.91479, "r": 149.40205, "b": 297.72173999999995, "coord_origin": "1"}}, {"id": 56, "text": "Qualitative Results", "bbox": {"l": 160.85904, "t": 288.91479, "r": 257.08679, "b": 297.72173999999995, "coord_origin": "1"}}]}, "text": "5.3 Qualitative Results"}, {"label": "Text", "id": 5, "page_no": 9, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.7931432723999, "t": 308.9267612457275, "r": 480.6096508026123, "b": 366.47769, "coord_origin": "1"}, "confidence": 0.9832314252853394, "cells": [{"id": 57, "text": "To illustrate the qualitative differences between OTSL and HTML, Figure 5", "bbox": {"l": 134.765, "t": 309.86078, "r": 480.58777, "b": 
318.65775, "coord_origin": "1"}}, {"id": 58, "text": "demonstrates less overlap and more accurate bounding boxes with OTSL. In", "bbox": {"l": 134.765, "t": 321.81577, "r": 480.58889999999997, "b": 330.61273, "coord_origin": "1"}}, {"id": 59, "text": "Figure 6, OTSL proves to be more effective in handling tables with longer to-", "bbox": {"l": 134.765, "t": 333.77075, "r": 480.58681999999993, "b": 342.56772, "coord_origin": "1"}}, {"id": 60, "text": "ken sequences, resulting in even more precise structure prediction and bounding", "bbox": {"l": 134.765, "t": 345.72574, "r": 480.58981, "b": 354.52271, "coord_origin": "1"}}, {"id": 61, "text": "boxes.", "bbox": {"l": 134.765, "t": 357.68073, "r": 161.65704, "b": 366.47769, "coord_origin": "1"}}]}, "text": "To illustrate the qualitative differences between OTSL and HTML, Figure 5 demonstrates less overlap and more accurate bounding boxes with OTSL. In Figure 6, OTSL proves to be more effective in handling tables with longer token sequences, resulting in even more precise structure prediction and bounding boxes."}, {"label": "Caption", "id": 6, "page_no": 9, "cluster": {"id": 6, "label": "Caption", "bbox": {"l": 133.93432788848875, "t": 396.78733520507814, "r": 480.59106, "b": 439.71716, "coord_origin": "1"}, "confidence": 0.7760405540466309, "cells": [{"id": 62, "text": "Fig. 
5.", "bbox": {"l": 134.765, "t": 397.59012, "r": 162.64424, "b": 405.51642, "coord_origin": "1"}}, {"id": 63, "text": "The OTSL model produces more accurate bounding boxes with less over-", "bbox": {"l": 167.384, "t": 397.65289, "r": 480.59106, "b": 405.72266, "coord_origin": "1"}}, {"id": 64, "text": "lap (E) than the HTML model (D), when predicting the structure of a sparse ta-", "bbox": {"l": 134.765, "t": 408.61190999999997, "r": 480.59106, "b": 416.68167000000005, "coord_origin": "1"}}, {"id": 65, "text": "ble (A), at twice the inference speed because of shorter sequence length (B),(C).", "bbox": {"l": 134.765, "t": 419.57089, "r": 480.58838000000003, "b": 427.64066, "coord_origin": "1"}}, {"id": 66, "text": "\"PMC2807444_006_00.png\" PubTabNet.", "bbox": {"l": 134.765, "t": 430.52987999999993, "r": 304.69171, "b": 438.59964, "coord_origin": "1"}}, {"id": 118, "text": "\u03bc", "bbox": {"l": 342.63354, "t": 430.19678, "r": 344.81915, "b": 439.71716, "coord_origin": "1"}}]}, "text": "Fig. 5. The OTSL model produces more accurate bounding boxes with less overlap (E) than the HTML model (D), when predicting the structure of a sparse table (A), at twice the inference speed because of shorter sequence length (B),(C). \"PMC2807444_006_00.png\" PubTabNet. 
\u03bc"}, {"label": "Picture", "id": 7, "page_no": 9, "cluster": {"id": 7, "label": "Picture", "bbox": {"l": 162.9001407623291, "t": 443.7800834655762, "r": 451.33742237091064, "b": 663.5160186767579, "coord_origin": "1"}, "confidence": 0.945287823677063, "cells": [{"id": 67, "text": "<table>", "bbox": {"l": 180.12473, "t": 516.2332200000001, "r": 190.62042, "b": 518.94992, "coord_origin": "1"}}, {"id": 68, "text": "<tr><td></td><td colspan=\"4\"></td><td colspan=\"6\"></td><td colspan=\"3\"></td></tr>", "bbox": {"l": 183.2438, "t": 520.13208, "r": 304.54797, "b": 522.84879, "coord_origin": "1"}}, {"id": 69, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 524.03094, "r": 388.42313, "b": 526.74765, "coord_origin": "1"}}, {"id": 70, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 527.9297799999999, "r": 388.42313, "b": 530.64648, "coord_origin": "1"}}, {"id": 71, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 531.82861, "r": 388.42313, "b": 534.54532, "coord_origin": "1"}}, {"id": 72, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 535.72748, "r": 388.42313, "b": 538.44418, "coord_origin": "1"}}, {"id": 73, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 539.62631, "r": 388.42313, "b": 542.34303, "coord_origin": "1"}}, {"id": 74, "text": 
"<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 543.52516, "r": 388.42313, "b": 546.24188, "coord_origin": "1"}}, {"id": 75, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 547.42401, "r": 388.42313, "b": 550.14073, "coord_origin": "1"}}, {"id": 76, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 551.32286, "r": 388.42313, "b": 554.03958, "coord_origin": "1"}}, {"id": 77, "text": "</table>", "bbox": {"l": 180.12473, "t": 555.22173, "r": 191.86806, "b": 557.93845, "coord_origin": "1"}}, {"id": 78, "text": "C", "bbox": {"l": 407.38348, "t": 518.30042, "r": 408.82025, "b": 521.01712, "coord_origin": "1"}}, {"id": 79, "text": "C L L L C L L L L L C L L NL", "bbox": {"l": 410.25699, "t": 518.30042, "r": 450.48605, "b": 521.01712, "coord_origin": "1"}}, {"id": 80, "text": "C", "bbox": {"l": 407.38348, "t": 522.19925, "r": 408.82025, "b": 524.9159500000001, "coord_origin": "1"}}, {"id": 81, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 522.19925, "r": 450.48605, "b": 524.9159500000001, "coord_origin": "1"}}, {"id": 82, "text": "C", "bbox": {"l": 407.38348, "t": 526.09808, "r": 408.82025, "b": 528.81479, "coord_origin": "1"}}, {"id": 83, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 526.09808, "r": 450.48605, "b": 528.81479, "coord_origin": "1"}}, {"id": 84, "text": "C", "bbox": {"l": 407.38348, "t": 529.99695, "r": 408.82025, "b": 532.7136499999999, "coord_origin": "1"}}, {"id": 85, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 529.99695, "r": 450.48605, "b": 532.7136499999999, "coord_origin": "1"}}, {"id": 86, "text": "C", "bbox": 
{"l": 407.38348, "t": 533.8957800000001, "r": 408.82025, "b": 536.6125, "coord_origin": "1"}}, {"id": 87, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 533.8957800000001, "r": 450.48605, "b": 536.6125, "coord_origin": "1"}}, {"id": 88, "text": "C", "bbox": {"l": 407.38348, "t": 537.79463, "r": 408.82025, "b": 540.51135, "coord_origin": "1"}}, {"id": 89, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 537.79463, "r": 450.48605, "b": 540.51135, "coord_origin": "1"}}, {"id": 90, "text": "C", "bbox": {"l": 407.38348, "t": 541.69348, "r": 408.82025, "b": 544.4102, "coord_origin": "1"}}, {"id": 91, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 541.69348, "r": 450.48605, "b": 544.4102, "coord_origin": "1"}}, {"id": 92, "text": "C", "bbox": {"l": 407.38348, "t": 545.59233, "r": 408.82025, "b": 548.3090500000001, "coord_origin": "1"}}, {"id": 93, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 545.59233, "r": 450.48605, "b": 548.3090500000001, "coord_origin": "1"}}, {"id": 94, "text": "C", "bbox": {"l": 407.38348, "t": 549.4911999999999, "r": 408.82025, "b": 552.2079200000001, "coord_origin": "1"}}, {"id": 95, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 549.4911999999999, "r": 450.48605, "b": 552.2079200000001, "coord_origin": "1"}}, {"id": 96, "text": "HTML", "bbox": {"l": 164.52881, "t": 509.45859, "r": 181.8528, "b": 515.31, "coord_origin": "1"}}, {"id": 97, "text": "#", "bbox": {"l": 183.58441, "t": 509.45859, "r": 186.3974, "b": 515.31, "coord_origin": "1"}}, {"id": 98, "text": "tokens:", "bbox": {"l": 189.2104, "t": 509.45859, "r": 208.90137, "b": 515.31, "coord_origin": "1"}}, {"id": 99, "text": "258", "bbox": {"l": 210.63269, "t": 509.45859, "r": 221.04044, "b": 515.31, "coord_origin": "1"}}, {"id": 100, "text": "OTSL", "bbox": {"l": 390.20203, "t": 509.60361, "r": 406.83609, "b": 515.45502, "coord_origin": "1"}}, {"id": 101, "text": "#", "bbox": 
{"l": 408.56952, "t": 509.60361, "r": 411.38251, "b": 515.45502, "coord_origin": "1"}}, {"id": 102, "text": "tokens:", "bbox": {"l": 414.1955, "t": 509.60361, "r": 433.88647000000003, "b": 515.45502, "coord_origin": "1"}}, {"id": 103, "text": "135", "bbox": {"l": 435.61737, "t": 509.60361, "r": 446.02512, "b": 515.45502, "coord_origin": "1"}}, {"id": 104, "text": "B", "bbox": {"l": 167.19316, "t": 519.07236, "r": 172.8231, "b": 526.3866, "coord_origin": "1"}}, {"id": 105, "text": "A", "bbox": {"l": 187.33745, "t": 448.62485, "r": 192.96739, "b": 455.93909, "coord_origin": "1"}}, {"id": 106, "text": "D", "bbox": {"l": 167.38654, "t": 566.0051599999999, "r": 173.01648, "b": 573.3194, "coord_origin": "1"}}, {"id": 107, "text": "E", "bbox": {"l": 248.45621000000003, "t": 621.78008, "r": 253.65727, "b": 629.09431, "coord_origin": "1"}}, {"id": 108, "text": "C", "bbox": {"l": 395.90057, "t": 519.19946, "r": 401.53052, "b": 526.5137, "coord_origin": "1"}}, {"id": 109, "text": "HTML", "bbox": {"l": 171.62886, "t": 580.28853, "r": 177.48148, "b": 597.26784, "coord_origin": "1"}}, {"id": 110, "text": "OTSL", "bbox": {"l": 251.05969000000002, "t": 633.63408, "r": 256.91235, "b": 649.92345, "coord_origin": "1"}}, {"id": 111, "text": "HTML model shows", "bbox": {"l": 372.14645, "t": 601.45724, "r": 427.0379, "b": 607.30864, "coord_origin": "1"}}, {"id": 112, "text": "bounding box drifting", "bbox": {"l": 372.14645, "t": 607.89948, "r": 430.06838999999997, "b": 613.75087, "coord_origin": "1"}}, {"id": 113, "text": "OTSL model shows", "bbox": {"l": 176.88042, "t": 642.87209, "r": 231.08191, "b": 648.72348, "coord_origin": "1"}}, {"id": 114, "text": "clean bounding box", "bbox": {"l": 176.88042, "t": 649.3143, "r": 230.99271000000002, "b": 655.1657, "coord_origin": "1"}}, {"id": 115, "text": "alignment", "bbox": {"l": 176.88042, "t": 655.7565500000001, "r": 203.93219, "b": 661.60794, "coord_origin": "1"}}, {"id": 116, "text": "\u2264", "bbox": {"l": 215.93231000000003, "t": 
557.56342, "r": 218.4697, "b": 569.15967, "coord_origin": "1"}}, {"id": 117, "text": "\u03bc", "bbox": {"l": 229.05689999999998, "t": 557.56342, "r": 231.71908999999997, "b": 569.15967, "coord_origin": "1"}}, {"id": 119, "text": "S", "bbox": {"l": 261.20892, "t": 448.46124, "r": 263.56973, "b": 451.19727, "coord_origin": "1"}}, {"id": 120, "text": "I", "bbox": {"l": 312.33463, "t": 448.46124, "r": 313.6362, "b": 451.19727, "coord_origin": "1"}}, {"id": 121, "text": "R", "bbox": {"l": 377.41125, "t": 448.46124, "r": 380.05737, "b": 451.19727, "coord_origin": "1"}}, {"id": 122, "text": "ST", "bbox": {"l": 200.63976, "t": 453.33997, "r": 205.82492, "b": 456.07599, "coord_origin": "1"}}, {"id": 123, "text": "0.03", "bbox": {"l": 222.20833000000002, "t": 453.33997, "r": 229.76836, "b": 456.07599, "coord_origin": "1"}}, {"id": 124, "text": "0.06", "bbox": {"l": 243.26666, "t": 453.33997, "r": 250.82669, "b": 456.07599, "coord_origin": "1"}}, {"id": 125, "text": "0.12", "bbox": {"l": 264.29657, "t": 453.33997, "r": 271.84949, "b": 456.07599, "coord_origin": "1"}}, {"id": 126, "text": "0.25", "bbox": {"l": 285.31943, "t": 453.33997, "r": 292.87946, "b": 456.07599, "coord_origin": "1"}}, {"id": 127, "text": "0.5", "bbox": {"l": 306.37775, "t": 453.33997, "r": 311.77319, "b": 456.07599, "coord_origin": "1"}}, {"id": 128, "text": "1", "bbox": {"l": 323.41699, "t": 453.33997, "r": 325.58157, "b": 456.07599, "coord_origin": "1"}}, {"id": 129, "text": "2", "bbox": {"l": 334.45807, "t": 453.33997, "r": 336.62265, "b": 456.07599, "coord_origin": "1"}}, {"id": 130, "text": "4", "bbox": {"l": 345.52756, "t": 453.33997, "r": 347.69214, "b": 456.07599, "coord_origin": "1"}}, {"id": 131, "text": "8", "bbox": {"l": 356.56863, "t": 453.33997, "r": 358.73322, "b": 456.07599, "coord_origin": "1"}}, {"id": 132, "text": "16", "bbox": {"l": 367.63812, "t": 453.33997, "r": 371.97089, "b": 456.07599, "coord_origin": "1"}}, {"id": 133, "text": "32", "bbox": {"l": 382.6734, "t": 453.33997, "r": 
387.00616, "b": 456.07599, "coord_origin": "1"}}, {"id": 134, "text": "64", "bbox": {"l": 397.73727, "t": 453.33997, "r": 402.07001, "b": 456.07599, "coord_origin": "1"}}, {"id": 135, "text": "\u2265", "bbox": {"l": 412.78879, "t": 447.99298, "r": 414.93463, "b": 457.79964999999993, "coord_origin": "1"}}, {"id": 136, "text": " 128", "bbox": {"l": 414.95697, "t": 453.33997, "r": 422.51746, "b": 456.07599, "coord_origin": "1"}}, {"id": 137, "text": "63", "bbox": {"l": 200.63998, "t": 463.92444, "r": 204.57674, "b": 466.66043, "coord_origin": "1"}}, {"id": 138, "text": "1", "bbox": {"l": 367.62604, "t": 463.92444, "r": 369.58032, "b": 466.66043, "coord_origin": "1"}}, {"id": 139, "text": "1", "bbox": {"l": 382.66132, "t": 463.92444, "r": 384.6156, "b": 466.66043, "coord_origin": "1"}}, {"id": 140, "text": "3", "bbox": {"l": 397.72504, "t": 463.92444, "r": 399.67932, "b": 466.66043, "coord_origin": "1"}}, {"id": 141, "text": "199", "bbox": {"l": 200.64, "t": 468.80313, "r": 206.51694, "b": 471.53915, "coord_origin": "1"}}, {"id": 142, "text": "5", "bbox": {"l": 264.29047, "t": 468.80313, "r": 266.25885, "b": 471.53915, "coord_origin": "1"}}, {"id": 143, "text": "1", "bbox": {"l": 306.37213, "t": 468.80313, "r": 308.34052, "b": 471.53915, "coord_origin": "1"}}, {"id": 144, "text": "2", "bbox": {"l": 345.51526, "t": 468.80313, "r": 347.48364, "b": 471.53915, "coord_origin": "1"}}, {"id": 145, "text": "4", "bbox": {"l": 356.55634, "t": 468.80313, "r": 358.52472, "b": 471.53915, "coord_origin": "1"}}, {"id": 146, "text": "1", "bbox": {"l": 367.62582, "t": 468.80313, "r": 369.59418, "b": 471.53915, "coord_origin": "1"}}, {"id": 147, "text": "1", "bbox": {"l": 382.66107, "t": 468.80313, "r": 384.62946, "b": 471.53915, "coord_origin": "1"}}, {"id": 148, "text": "416", "bbox": {"l": 200.64, "t": 473.68185, "r": 206.51694, "b": 476.41788, "coord_origin": "1"}}, {"id": 149, "text": "4", "bbox": {"l": 264.29047, "t": 473.68185, "r": 266.25885, "b": 476.41788, "coord_origin": 
"1"}}, {"id": 150, "text": "230", "bbox": {"l": 200.64, "t": 478.53214, "r": 206.51694, "b": 481.26816, "coord_origin": "1"}}, {"id": 151, "text": "1", "bbox": {"l": 243.26373, "t": 478.53214, "r": 245.2321, "b": 481.26816, "coord_origin": "1"}}, {"id": 152, "text": "9", "bbox": {"l": 264.29047, "t": 478.53214, "r": 266.25885, "b": 481.26816, "coord_origin": "1"}}, {"id": 153, "text": "1", "bbox": {"l": 323.40466, "t": 478.53214, "r": 325.37305, "b": 481.26816, "coord_origin": "1"}}, {"id": 154, "text": "1", "bbox": {"l": 397.72519, "t": 478.53214, "r": 399.69354, "b": 481.26816, "coord_origin": "1"}}, {"id": 155, "text": "276", "bbox": {"l": 200.64, "t": 483.41086, "r": 206.51694, "b": 486.14688, "coord_origin": "1"}}, {"id": 156, "text": "2", "bbox": {"l": 382.66132, "t": 483.41086, "r": 384.61563, "b": 486.14688, "coord_origin": "1"}}, {"id": 157, "text": "12", "bbox": {"l": 397.72513, "t": 483.41086, "r": 401.64819, "b": 486.14688, "coord_origin": "1"}}, {"id": 158, "text": "1", "bbox": {"l": 412.78928, "t": 483.41086, "r": 414.74359, "b": 486.14688, "coord_origin": "1"}}, {"id": 159, "text": "320", "bbox": {"l": 200.64014, "t": 488.28958, "r": 207.14445, "b": 491.0256, "coord_origin": "1"}}, {"id": 160, "text": "1", "bbox": {"l": 367.62616, "t": 488.28958, "r": 369.78375, "b": 491.0256, "coord_origin": "1"}}, {"id": 161, "text": "4", "bbox": {"l": 382.66141, "t": 488.28958, "r": 384.81897, "b": 491.0256, "coord_origin": "1"}}, {"id": 162, "text": "20", "bbox": {"l": 397.7251, "t": 488.28958, "r": 402.05087, "b": 491.0256, "coord_origin": "1"}}, {"id": 163, "text": "2013", "bbox": {"l": 200.64032, "t": 493.1683, "r": 208.48566, "b": 495.90433, "coord_origin": "1"}}, {"id": 164, "text": "3", "bbox": {"l": 264.29044, "t": 493.1683, "r": 266.25879, "b": 495.90433, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "Text", "id": 8, "page_no": 9, "cluster": {"id": 8, "label": "Text", "bbox": 
{"l": 227.91466, "t": 665.82603, "r": 230.10028, "b": 675.3464, "coord_origin": "1"}, "confidence": -1.0, "cells": [{"id": 165, "text": "\u03bc", "bbox": {"l": 227.91466, "t": 665.82603, "r": 230.10028, "b": 675.3464, "coord_origin": "1"}}]}, "text": "\u03bc"}, {"label": "Text", "id": 9, "page_no": 9, "cluster": {"id": 9, "label": "Text", "bbox": {"l": 300.58057, "t": 683.62195, "r": 302.72638, "b": 693.428658, "coord_origin": "1"}, "confidence": -1.0, "cells": [{"id": 166, "text": "\u2265", "bbox": {"l": 300.58057, "t": 683.62195, "r": 302.72638, "b": 693.428658, "coord_origin": "1"}}]}, "text": "\u2265"}], "body": [{"label": "Caption", "id": 2, "page_no": 9, "cluster": {"id": 2, "label": "Caption", "bbox": {"l": 134.00595617294312, "t": 114.83857812881467, "r": 480.59357000000006, "b": 146.49228200912478, "coord_origin": "1"}, "confidence": 0.9548113346099854, "cells": [{"id": 3, "text": "Table 2.", "bbox": {"l": 134.765, "t": 115.83618000000001, "r": 173.09366, "b": 123.76251000000002, "coord_origin": "1"}}, {"id": 4, "text": "TSR and cell detection results compared between OTSL and HTML on", "bbox": {"l": 181.30299, "t": 115.89899000000003, "r": 480.59151999999995, "b": 123.96868999999992, "coord_origin": "1"}}, {"id": 5, "text": "the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using Table-", "bbox": {"l": 134.765, "t": 126.85797000000014, "r": 480.59357000000006, "b": 134.92767000000003, "coord_origin": "1"}}, {"id": 6, "text": "Former [9] (with enc=6, dec=6, heads=8).", "bbox": {"l": 134.765, "t": 137.81696, "r": 305.95691, "b": 145.88666, "coord_origin": "1"}}]}, "text": "Table 2. 
TSR and cell detection results compared between OTSL and HTML on the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using TableFormer [9] (with enc=6, dec=6, heads=8)."}, {"label": "Table", "id": 3, "page_no": 9, "cluster": {"id": 3, "label": "Table", "bbox": {"l": 143.8171488761902, "t": 156.13133182525632, "r": 470.8412103652954, "b": 263.2244602203368, "coord_origin": "1"}, "confidence": 0.9879505038261414, "cells": [{"id": 7, "text": "Data set", "bbox": {"l": 160.782, "t": 166.55895999999996, "r": 194.99779, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 8, "text": "Language", "bbox": {"l": 215.52499000000003, "t": 166.534, "r": 254.04465, "b": 174.6037, "coord_origin": "1"}}, {"id": 9, "text": "TEDs", "bbox": {"l": 300.397, "t": 161.07898, "r": 323.99118, "b": 169.14868, "coord_origin": "1"}}, {"id": 10, "text": "mAP(0.75)", "bbox": {"l": 370.345, "t": 166.55895999999996, "r": 414.74661, "b": 174.62865999999997, "coord_origin": "1"}}, {"id": 11, "text": "Inference", "bbox": {"l": 426.737, "t": 161.07898, "r": 463.10830999999996, "b": 169.14868, "coord_origin": "1"}}, {"id": 12, "text": "time (secs)", "bbox": {"l": 423.11401, "t": 172.03796, "r": 466.72656, "b": 180.10766999999998, "coord_origin": "1"}}, {"id": 13, "text": "simple", "bbox": {"l": 262.41299, "t": 174.03101000000004, "r": 288.0596, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 14, "text": "complex", "bbox": {"l": 296.42899, "t": 174.03101000000004, "r": 329.44687, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 15, "text": "all", "bbox": {"l": 345.03299, "t": 174.03101000000004, "r": 354.75793, "b": 182.10071000000005, "coord_origin": "1"}}, {"id": 16, "text": "PubTabNet", "bbox": {"l": 154.53799, "t": 192.85999000000004, "r": 201.24129, "b": 200.92969000000005, "coord_origin": "1"}}, {"id": 17, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 187.38098000000002, "r": 247.13226000000003, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 18, 
"text": "0.965", "bbox": {"l": 264.74399, "t": 187.38098000000002, "r": 285.73074, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 19, "text": "0.934", "bbox": {"l": 302.444, "t": 187.38098000000002, "r": 323.43076, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 20, "text": "0.955", "bbox": {"l": 339.40302, "t": 187.38098000000002, "r": 360.38977, "b": 195.45068000000003, "coord_origin": "1"}}, {"id": 21, "text": "0.88", "bbox": {"l": 383.116, "t": 187.31817999999998, "r": 401.97324, "b": 195.24451, "coord_origin": "1"}}, {"id": 22, "text": "2.73", "bbox": {"l": 435.49300999999997, "t": 187.31817999999998, "r": 454.35025, "b": 195.24451, "coord_origin": "1"}}, {"id": 23, "text": "HTML", "bbox": {"l": 220.903, "t": 200.33196999999996, "r": 248.66655999999998, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 24, "text": "0.969", "bbox": {"l": 264.74399, "t": 200.33196999999996, "r": 285.73074, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 25, "text": "0.927", "bbox": {"l": 302.444, "t": 200.33196999999996, "r": 323.43076, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 26, "text": "0.955", "bbox": {"l": 339.40302, "t": 200.33196999999996, "r": 360.38977, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 27, "text": "0.857", "bbox": {"l": 382.052, "t": 200.33196999999996, "r": 403.03876, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 28, "text": "5.39", "bbox": {"l": 436.73199000000005, "t": 200.33196999999996, "r": 453.11182, "b": 208.40166999999997, "coord_origin": "1"}}, {"id": 29, "text": "FinTabNet", "bbox": {"l": 155.94501, "t": 219.16198999999995, "r": 199.83374, "b": 227.23168999999996, "coord_origin": "1"}}, {"id": 30, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 213.68201, "r": 247.13226000000003, "b": 221.75171, "coord_origin": "1"}}, {"id": 31, "text": "0.955", "bbox": {"l": 264.74399, "t": 213.68201, "r": 285.73074, "b": 221.75171, "coord_origin": "1"}}, {"id": 32, "text": "0.961", "bbox": {"l": 
302.444, "t": 213.68201, "r": 323.43076, "b": 221.75171, "coord_origin": "1"}}, {"id": 33, "text": "0.959", "bbox": {"l": 337.815, "t": 213.61919999999998, "r": 361.97586, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 34, "text": "0.862", "bbox": {"l": 380.46399, "t": 213.61919999999998, "r": 404.62485, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 35, "text": "1.85", "bbox": {"l": 435.49300999999997, "t": 213.61919999999998, "r": 454.35025, "b": 221.54552999999999, "coord_origin": "1"}}, {"id": 36, "text": "HTML", "bbox": {"l": 220.903, "t": 226.63396999999998, "r": 248.66655999999998, "b": 234.70367, "coord_origin": "1"}}, {"id": 37, "text": "0.917", "bbox": {"l": 264.74399, "t": 226.63396999999998, "r": 285.73074, "b": 234.70367, "coord_origin": "1"}}, {"id": 38, "text": "0.922", "bbox": {"l": 302.444, "t": 226.63396999999998, "r": 323.43076, "b": 234.70367, "coord_origin": "1"}}, {"id": 39, "text": "0.92", "bbox": {"l": 341.70599, "t": 226.63396999999998, "r": 358.08582, "b": 234.70367, "coord_origin": "1"}}, {"id": 40, "text": "0.722", "bbox": {"l": 382.052, "t": 226.63396999999998, "r": 403.03876, "b": 234.70367, "coord_origin": "1"}}, {"id": 41, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 226.63396999999998, "r": 453.11182, "b": 234.70367, "coord_origin": "1"}}, {"id": 42, "text": "PubTables-1M", "bbox": {"l": 148.62601, "t": 245.46294999999998, "r": 207.1524, "b": 253.53265, "coord_origin": "1"}}, {"id": 43, "text": "OTSL", "bbox": {"l": 222.43700000000004, "t": 239.98297000000002, "r": 247.13226000000003, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 44, "text": "0.987", "bbox": {"l": 264.74399, "t": 239.98297000000002, "r": 285.73074, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 45, "text": "0.964", "bbox": {"l": 302.444, "t": 239.98297000000002, "r": 323.43076, "b": 248.05267000000003, "coord_origin": "1"}}, {"id": 46, "text": "0.977", "bbox": {"l": 337.815, "t": 239.92016999999998, "r": 361.97586, "b": 
247.8465, "coord_origin": "1"}}, {"id": 47, "text": "0.896", "bbox": {"l": 380.46399, "t": 239.92016999999998, "r": 404.62485, "b": 247.8465, "coord_origin": "1"}}, {"id": 48, "text": "1.79", "bbox": {"l": 435.49300999999997, "t": 239.92016999999998, "r": 454.35025, "b": 247.8465, "coord_origin": "1"}}, {"id": 49, "text": "HTML", "bbox": {"l": 220.903, "t": 252.93499999999995, "r": 248.66655999999998, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 50, "text": "0.983", "bbox": {"l": 264.74399, "t": 252.93499999999995, "r": 285.73074, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 51, "text": "0.944", "bbox": {"l": 302.444, "t": 252.93499999999995, "r": 323.43076, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 52, "text": "0.966", "bbox": {"l": 339.40302, "t": 252.93499999999995, "r": 360.38977, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 53, "text": "0.889", "bbox": {"l": 382.052, "t": 252.93499999999995, "r": 403.03876, "b": 261.00469999999996, "coord_origin": "1"}}, {"id": 54, "text": "3.26", "bbox": {"l": 436.73199000000005, "t": 252.93499999999995, "r": 453.11182, "b": 261.00469999999996, "coord_origin": "1"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "lcel", "lcel", "ched", "ched", "nl", "ched", "ucel", "ched", "ched", "ched", "ucel", "ucel", "nl", "rhed", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 8, "num_cols": 7, "table_cells": [{"bbox": {"l": 215.52499000000003, "t": 166.534, "r": 254.04465, "b": 174.6037, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Language", 
"column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 300.397, "t": 161.07898, "r": 323.99118, "b": 169.14868, "coord_origin": "1"}, "row_span": 1, "col_span": 3, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 5, "text": "TEDs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 370.345, "t": 166.55895999999996, "r": 414.74661, "b": 174.62865999999997, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "mAP(0.75)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 423.11401, "t": 161.07898, "r": 466.72656, "b": 180.10766999999998, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 2, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "Inference time (secs)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 262.41299, "t": 174.03101000000004, "r": 288.0596, "b": 182.10071000000005, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 296.42899, "t": 174.03101000000004, "r": 329.44687, "b": 182.10071000000005, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 345.03299, "t": 174.03101000000004, "r": 354.75793, "b": 182.10071000000005, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "all", "column_header": 
true, "row_header": false, "row_section": false}, {"bbox": {"l": 154.53799, "t": 192.85999000000004, "r": 201.24129, "b": 200.92969000000005, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 222.43700000000004, "t": 187.38098000000002, "r": 247.13226000000003, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "OTSL", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 187.38098000000002, "r": 285.73074, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.965", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 187.38098000000002, "r": 323.43076, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.934", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 339.40302, "t": 187.38098000000002, "r": 360.38977, "b": 195.45068000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 383.116, "t": 187.31817999999998, "r": 401.97324, "b": 195.24451, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.88", 
"column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 435.49300999999997, "t": 187.31817999999998, "r": 454.35025, "b": 195.24451, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "2.73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.903, "t": 200.33196999999996, "r": 248.66655999999998, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "HTML", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 200.33196999999996, "r": 285.73074, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.969", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 200.33196999999996, "r": 323.43076, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.927", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 339.40302, "t": 200.33196999999996, "r": 360.38977, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 382.052, "t": 200.33196999999996, "r": 403.03876, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.857", 
"column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 436.73199000000005, "t": 200.33196999999996, "r": 453.11182, "b": 208.40166999999997, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "5.39", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 155.94501, "t": 219.16198999999995, "r": 199.83374, "b": 227.23168999999996, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 222.43700000000004, "t": 213.68201, "r": 247.13226000000003, "b": 221.75171, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "OTSL", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 213.68201, "r": 285.73074, "b": 221.75171, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.955", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 213.68201, "r": 323.43076, "b": 221.75171, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.961", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 337.815, "t": 213.61919999999998, "r": 361.97586, "b": 221.54552999999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.959", "column_header": false, 
"row_header": false, "row_section": false}, {"bbox": {"l": 380.46399, "t": 213.61919999999998, "r": 404.62485, "b": 221.54552999999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.862", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 435.49300999999997, "t": 213.61919999999998, "r": 454.35025, "b": 221.54552999999999, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "1.85", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.903, "t": 226.63396999999998, "r": 248.66655999999998, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "HTML", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 226.63396999999998, "r": 285.73074, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.917", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 226.63396999999998, "r": 323.43076, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.922", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 341.70599, "t": 226.63396999999998, "r": 358.08582, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.92", "column_header": false, "row_header": false, 
"row_section": false}, {"bbox": {"l": 382.052, "t": 226.63396999999998, "r": 403.03876, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.722", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 436.73199000000005, "t": 226.63396999999998, "r": 453.11182, "b": 234.70367, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "3.26", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 148.62601, "t": 245.46294999999998, "r": 207.1524, "b": 253.53265, "coord_origin": "1"}, "row_span": 2, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTables-1M", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 222.43700000000004, "t": 239.98297000000002, "r": 247.13226000000003, "b": 248.05267000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "OTSL", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 239.98297000000002, "r": 285.73074, "b": 248.05267000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.987", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 239.98297000000002, "r": 323.43076, "b": 248.05267000000003, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.964", "column_header": false, "row_header": false, 
"row_section": false}, {"bbox": {"l": 337.815, "t": 239.92016999999998, "r": 361.97586, "b": 247.8465, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.977", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 380.46399, "t": 239.92016999999998, "r": 404.62485, "b": 247.8465, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.896", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 435.49300999999997, "t": 239.92016999999998, "r": 454.35025, "b": 247.8465, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "1.79", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.903, "t": 252.93499999999995, "r": 248.66655999999998, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "HTML", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 264.74399, "t": 252.93499999999995, "r": 285.73074, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.983", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 302.444, "t": 252.93499999999995, "r": 323.43076, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0.944", "column_header": false, "row_header": false, "row_section": false}, 
{"bbox": {"l": 339.40302, "t": 252.93499999999995, "r": 360.38977, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "0.966", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 382.052, "t": 252.93499999999995, "r": 403.03876, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0.889", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 436.73199000000005, "t": 252.93499999999995, "r": 453.11182, "b": 261.00469999999996, "coord_origin": "1"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 6, "end_col_offset_idx": 7, "text": "3.26", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "Section-header", "id": 4, "page_no": 9, "cluster": {"id": 4, "label": "Section-header", "bbox": {"l": 134.25314598083494, "t": 288.23322944641114, "r": 257.1956182479858, "b": 298.2838571548462, "coord_origin": "1"}, "confidence": 0.9522386193275452, "cells": [{"id": 55, "text": "5.3", "bbox": {"l": 134.765, "t": 288.91479, "r": 149.40205, "b": 297.72173999999995, "coord_origin": "1"}}, {"id": 56, "text": "Qualitative Results", "bbox": {"l": 160.85904, "t": 288.91479, "r": 257.08679, "b": 297.72173999999995, "coord_origin": "1"}}]}, "text": "5.3 Qualitative Results"}, {"label": "Text", "id": 5, "page_no": 9, "cluster": {"id": 5, "label": "Text", "bbox": {"l": 133.7931432723999, "t": 308.9267612457275, "r": 480.6096508026123, "b": 366.47769, "coord_origin": "1"}, "confidence": 0.9832314252853394, "cells": [{"id": 57, "text": "To illustrate the qualitative differences between OTSL and HTML, Figure 5", "bbox": {"l": 134.765, "t": 309.86078, "r": 480.58777, "b": 
318.65775, "coord_origin": "1"}}, {"id": 58, "text": "demonstrates less overlap and more accurate bounding boxes with OTSL. In", "bbox": {"l": 134.765, "t": 321.81577, "r": 480.58889999999997, "b": 330.61273, "coord_origin": "1"}}, {"id": 59, "text": "Figure 6, OTSL proves to be more effective in handling tables with longer to-", "bbox": {"l": 134.765, "t": 333.77075, "r": 480.58681999999993, "b": 342.56772, "coord_origin": "1"}}, {"id": 60, "text": "ken sequences, resulting in even more precise structure prediction and bounding", "bbox": {"l": 134.765, "t": 345.72574, "r": 480.58981, "b": 354.52271, "coord_origin": "1"}}, {"id": 61, "text": "boxes.", "bbox": {"l": 134.765, "t": 357.68073, "r": 161.65704, "b": 366.47769, "coord_origin": "1"}}]}, "text": "To illustrate the qualitative differences between OTSL and HTML, Figure 5 demonstrates less overlap and more accurate bounding boxes with OTSL. In Figure 6, OTSL proves to be more effective in handling tables with longer token sequences, resulting in even more precise structure prediction and bounding boxes."}, {"label": "Caption", "id": 6, "page_no": 9, "cluster": {"id": 6, "label": "Caption", "bbox": {"l": 133.93432788848875, "t": 396.78733520507814, "r": 480.59106, "b": 439.71716, "coord_origin": "1"}, "confidence": 0.7760405540466309, "cells": [{"id": 62, "text": "Fig. 
5.", "bbox": {"l": 134.765, "t": 397.59012, "r": 162.64424, "b": 405.51642, "coord_origin": "1"}}, {"id": 63, "text": "The OTSL model produces more accurate bounding boxes with less over-", "bbox": {"l": 167.384, "t": 397.65289, "r": 480.59106, "b": 405.72266, "coord_origin": "1"}}, {"id": 64, "text": "lap (E) than the HTML model (D), when predicting the structure of a sparse ta-", "bbox": {"l": 134.765, "t": 408.61190999999997, "r": 480.59106, "b": 416.68167000000005, "coord_origin": "1"}}, {"id": 65, "text": "ble (A), at twice the inference speed because of shorter sequence length (B),(C).", "bbox": {"l": 134.765, "t": 419.57089, "r": 480.58838000000003, "b": 427.64066, "coord_origin": "1"}}, {"id": 66, "text": "\"PMC2807444_006_00.png\" PubTabNet.", "bbox": {"l": 134.765, "t": 430.52987999999993, "r": 304.69171, "b": 438.59964, "coord_origin": "1"}}, {"id": 118, "text": "\u03bc", "bbox": {"l": 342.63354, "t": 430.19678, "r": 344.81915, "b": 439.71716, "coord_origin": "1"}}]}, "text": "Fig. 5. The OTSL model produces more accurate bounding boxes with less overlap (E) than the HTML model (D), when predicting the structure of a sparse table (A), at twice the inference speed because of shorter sequence length (B),(C). \"PMC2807444_006_00.png\" PubTabNet. 
\u03bc"}, {"label": "Picture", "id": 7, "page_no": 9, "cluster": {"id": 7, "label": "Picture", "bbox": {"l": 162.9001407623291, "t": 443.7800834655762, "r": 451.33742237091064, "b": 663.5160186767579, "coord_origin": "1"}, "confidence": 0.945287823677063, "cells": [{"id": 67, "text": "<table>", "bbox": {"l": 180.12473, "t": 516.2332200000001, "r": 190.62042, "b": 518.94992, "coord_origin": "1"}}, {"id": 68, "text": "<tr><td></td><td colspan=\"4\"></td><td colspan=\"6\"></td><td colspan=\"3\"></td></tr>", "bbox": {"l": 183.2438, "t": 520.13208, "r": 304.54797, "b": 522.84879, "coord_origin": "1"}}, {"id": 69, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 524.03094, "r": 388.42313, "b": 526.74765, "coord_origin": "1"}}, {"id": 70, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 527.9297799999999, "r": 388.42313, "b": 530.64648, "coord_origin": "1"}}, {"id": 71, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 531.82861, "r": 388.42313, "b": 534.54532, "coord_origin": "1"}}, {"id": 72, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 535.72748, "r": 388.42313, "b": 538.44418, "coord_origin": "1"}}, {"id": 73, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 539.62631, "r": 388.42313, "b": 542.34303, "coord_origin": "1"}}, {"id": 74, "text": 
"<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 543.52516, "r": 388.42313, "b": 546.24188, "coord_origin": "1"}}, {"id": 75, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 547.42401, "r": 388.42313, "b": 550.14073, "coord_origin": "1"}}, {"id": 76, "text": "<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>", "bbox": {"l": 183.2438, "t": 551.32286, "r": 388.42313, "b": 554.03958, "coord_origin": "1"}}, {"id": 77, "text": "</table>", "bbox": {"l": 180.12473, "t": 555.22173, "r": 191.86806, "b": 557.93845, "coord_origin": "1"}}, {"id": 78, "text": "C", "bbox": {"l": 407.38348, "t": 518.30042, "r": 408.82025, "b": 521.01712, "coord_origin": "1"}}, {"id": 79, "text": "C L L L C L L L L L C L L NL", "bbox": {"l": 410.25699, "t": 518.30042, "r": 450.48605, "b": 521.01712, "coord_origin": "1"}}, {"id": 80, "text": "C", "bbox": {"l": 407.38348, "t": 522.19925, "r": 408.82025, "b": 524.9159500000001, "coord_origin": "1"}}, {"id": 81, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 522.19925, "r": 450.48605, "b": 524.9159500000001, "coord_origin": "1"}}, {"id": 82, "text": "C", "bbox": {"l": 407.38348, "t": 526.09808, "r": 408.82025, "b": 528.81479, "coord_origin": "1"}}, {"id": 83, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 526.09808, "r": 450.48605, "b": 528.81479, "coord_origin": "1"}}, {"id": 84, "text": "C", "bbox": {"l": 407.38348, "t": 529.99695, "r": 408.82025, "b": 532.7136499999999, "coord_origin": "1"}}, {"id": 85, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 529.99695, "r": 450.48605, "b": 532.7136499999999, "coord_origin": "1"}}, {"id": 86, "text": "C", "bbox": 
{"l": 407.38348, "t": 533.8957800000001, "r": 408.82025, "b": 536.6125, "coord_origin": "1"}}, {"id": 87, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 533.8957800000001, "r": 450.48605, "b": 536.6125, "coord_origin": "1"}}, {"id": 88, "text": "C", "bbox": {"l": 407.38348, "t": 537.79463, "r": 408.82025, "b": 540.51135, "coord_origin": "1"}}, {"id": 89, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 537.79463, "r": 450.48605, "b": 540.51135, "coord_origin": "1"}}, {"id": 90, "text": "C", "bbox": {"l": 407.38348, "t": 541.69348, "r": 408.82025, "b": 544.4102, "coord_origin": "1"}}, {"id": 91, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 541.69348, "r": 450.48605, "b": 544.4102, "coord_origin": "1"}}, {"id": 92, "text": "C", "bbox": {"l": 407.38348, "t": 545.59233, "r": 408.82025, "b": 548.3090500000001, "coord_origin": "1"}}, {"id": 93, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 545.59233, "r": 450.48605, "b": 548.3090500000001, "coord_origin": "1"}}, {"id": 94, "text": "C", "bbox": {"l": 407.38348, "t": 549.4911999999999, "r": 408.82025, "b": 552.2079200000001, "coord_origin": "1"}}, {"id": 95, "text": "C C C C C C C C C C C C C NL", "bbox": {"l": 410.25699, "t": 549.4911999999999, "r": 450.48605, "b": 552.2079200000001, "coord_origin": "1"}}, {"id": 96, "text": "HTML", "bbox": {"l": 164.52881, "t": 509.45859, "r": 181.8528, "b": 515.31, "coord_origin": "1"}}, {"id": 97, "text": "#", "bbox": {"l": 183.58441, "t": 509.45859, "r": 186.3974, "b": 515.31, "coord_origin": "1"}}, {"id": 98, "text": "tokens:", "bbox": {"l": 189.2104, "t": 509.45859, "r": 208.90137, "b": 515.31, "coord_origin": "1"}}, {"id": 99, "text": "258", "bbox": {"l": 210.63269, "t": 509.45859, "r": 221.04044, "b": 515.31, "coord_origin": "1"}}, {"id": 100, "text": "OTSL", "bbox": {"l": 390.20203, "t": 509.60361, "r": 406.83609, "b": 515.45502, "coord_origin": "1"}}, {"id": 101, "text": "#", "bbox": 
{"l": 408.56952, "t": 509.60361, "r": 411.38251, "b": 515.45502, "coord_origin": "1"}}, {"id": 102, "text": "tokens:", "bbox": {"l": 414.1955, "t": 509.60361, "r": 433.88647000000003, "b": 515.45502, "coord_origin": "1"}}, {"id": 103, "text": "135", "bbox": {"l": 435.61737, "t": 509.60361, "r": 446.02512, "b": 515.45502, "coord_origin": "1"}}, {"id": 104, "text": "B", "bbox": {"l": 167.19316, "t": 519.07236, "r": 172.8231, "b": 526.3866, "coord_origin": "1"}}, {"id": 105, "text": "A", "bbox": {"l": 187.33745, "t": 448.62485, "r": 192.96739, "b": 455.93909, "coord_origin": "1"}}, {"id": 106, "text": "D", "bbox": {"l": 167.38654, "t": 566.0051599999999, "r": 173.01648, "b": 573.3194, "coord_origin": "1"}}, {"id": 107, "text": "E", "bbox": {"l": 248.45621000000003, "t": 621.78008, "r": 253.65727, "b": 629.09431, "coord_origin": "1"}}, {"id": 108, "text": "C", "bbox": {"l": 395.90057, "t": 519.19946, "r": 401.53052, "b": 526.5137, "coord_origin": "1"}}, {"id": 109, "text": "HTML", "bbox": {"l": 171.62886, "t": 580.28853, "r": 177.48148, "b": 597.26784, "coord_origin": "1"}}, {"id": 110, "text": "OTSL", "bbox": {"l": 251.05969000000002, "t": 633.63408, "r": 256.91235, "b": 649.92345, "coord_origin": "1"}}, {"id": 111, "text": "HTML model shows", "bbox": {"l": 372.14645, "t": 601.45724, "r": 427.0379, "b": 607.30864, "coord_origin": "1"}}, {"id": 112, "text": "bounding box drifting", "bbox": {"l": 372.14645, "t": 607.89948, "r": 430.06838999999997, "b": 613.75087, "coord_origin": "1"}}, {"id": 113, "text": "OTSL model shows", "bbox": {"l": 176.88042, "t": 642.87209, "r": 231.08191, "b": 648.72348, "coord_origin": "1"}}, {"id": 114, "text": "clean bounding box", "bbox": {"l": 176.88042, "t": 649.3143, "r": 230.99271000000002, "b": 655.1657, "coord_origin": "1"}}, {"id": 115, "text": "alignment", "bbox": {"l": 176.88042, "t": 655.7565500000001, "r": 203.93219, "b": 661.60794, "coord_origin": "1"}}, {"id": 116, "text": "\u2264", "bbox": {"l": 215.93231000000003, "t": 
557.56342, "r": 218.4697, "b": 569.15967, "coord_origin": "1"}}, {"id": 117, "text": "\u03bc", "bbox": {"l": 229.05689999999998, "t": 557.56342, "r": 231.71908999999997, "b": 569.15967, "coord_origin": "1"}}, {"id": 119, "text": "S", "bbox": {"l": 261.20892, "t": 448.46124, "r": 263.56973, "b": 451.19727, "coord_origin": "1"}}, {"id": 120, "text": "I", "bbox": {"l": 312.33463, "t": 448.46124, "r": 313.6362, "b": 451.19727, "coord_origin": "1"}}, {"id": 121, "text": "R", "bbox": {"l": 377.41125, "t": 448.46124, "r": 380.05737, "b": 451.19727, "coord_origin": "1"}}, {"id": 122, "text": "ST", "bbox": {"l": 200.63976, "t": 453.33997, "r": 205.82492, "b": 456.07599, "coord_origin": "1"}}, {"id": 123, "text": "0.03", "bbox": {"l": 222.20833000000002, "t": 453.33997, "r": 229.76836, "b": 456.07599, "coord_origin": "1"}}, {"id": 124, "text": "0.06", "bbox": {"l": 243.26666, "t": 453.33997, "r": 250.82669, "b": 456.07599, "coord_origin": "1"}}, {"id": 125, "text": "0.12", "bbox": {"l": 264.29657, "t": 453.33997, "r": 271.84949, "b": 456.07599, "coord_origin": "1"}}, {"id": 126, "text": "0.25", "bbox": {"l": 285.31943, "t": 453.33997, "r": 292.87946, "b": 456.07599, "coord_origin": "1"}}, {"id": 127, "text": "0.5", "bbox": {"l": 306.37775, "t": 453.33997, "r": 311.77319, "b": 456.07599, "coord_origin": "1"}}, {"id": 128, "text": "1", "bbox": {"l": 323.41699, "t": 453.33997, "r": 325.58157, "b": 456.07599, "coord_origin": "1"}}, {"id": 129, "text": "2", "bbox": {"l": 334.45807, "t": 453.33997, "r": 336.62265, "b": 456.07599, "coord_origin": "1"}}, {"id": 130, "text": "4", "bbox": {"l": 345.52756, "t": 453.33997, "r": 347.69214, "b": 456.07599, "coord_origin": "1"}}, {"id": 131, "text": "8", "bbox": {"l": 356.56863, "t": 453.33997, "r": 358.73322, "b": 456.07599, "coord_origin": "1"}}, {"id": 132, "text": "16", "bbox": {"l": 367.63812, "t": 453.33997, "r": 371.97089, "b": 456.07599, "coord_origin": "1"}}, {"id": 133, "text": "32", "bbox": {"l": 382.6734, "t": 453.33997, "r": 
387.00616, "b": 456.07599, "coord_origin": "1"}}, {"id": 134, "text": "64", "bbox": {"l": 397.73727, "t": 453.33997, "r": 402.07001, "b": 456.07599, "coord_origin": "1"}}, {"id": 135, "text": "\u2265", "bbox": {"l": 412.78879, "t": 447.99298, "r": 414.93463, "b": 457.79964999999993, "coord_origin": "1"}}, {"id": 136, "text": " 128", "bbox": {"l": 414.95697, "t": 453.33997, "r": 422.51746, "b": 456.07599, "coord_origin": "1"}}, {"id": 137, "text": "63", "bbox": {"l": 200.63998, "t": 463.92444, "r": 204.57674, "b": 466.66043, "coord_origin": "1"}}, {"id": 138, "text": "1", "bbox": {"l": 367.62604, "t": 463.92444, "r": 369.58032, "b": 466.66043, "coord_origin": "1"}}, {"id": 139, "text": "1", "bbox": {"l": 382.66132, "t": 463.92444, "r": 384.6156, "b": 466.66043, "coord_origin": "1"}}, {"id": 140, "text": "3", "bbox": {"l": 397.72504, "t": 463.92444, "r": 399.67932, "b": 466.66043, "coord_origin": "1"}}, {"id": 141, "text": "199", "bbox": {"l": 200.64, "t": 468.80313, "r": 206.51694, "b": 471.53915, "coord_origin": "1"}}, {"id": 142, "text": "5", "bbox": {"l": 264.29047, "t": 468.80313, "r": 266.25885, "b": 471.53915, "coord_origin": "1"}}, {"id": 143, "text": "1", "bbox": {"l": 306.37213, "t": 468.80313, "r": 308.34052, "b": 471.53915, "coord_origin": "1"}}, {"id": 144, "text": "2", "bbox": {"l": 345.51526, "t": 468.80313, "r": 347.48364, "b": 471.53915, "coord_origin": "1"}}, {"id": 145, "text": "4", "bbox": {"l": 356.55634, "t": 468.80313, "r": 358.52472, "b": 471.53915, "coord_origin": "1"}}, {"id": 146, "text": "1", "bbox": {"l": 367.62582, "t": 468.80313, "r": 369.59418, "b": 471.53915, "coord_origin": "1"}}, {"id": 147, "text": "1", "bbox": {"l": 382.66107, "t": 468.80313, "r": 384.62946, "b": 471.53915, "coord_origin": "1"}}, {"id": 148, "text": "416", "bbox": {"l": 200.64, "t": 473.68185, "r": 206.51694, "b": 476.41788, "coord_origin": "1"}}, {"id": 149, "text": "4", "bbox": {"l": 264.29047, "t": 473.68185, "r": 266.25885, "b": 476.41788, "coord_origin": 
"1"}}, {"id": 150, "text": "230", "bbox": {"l": 200.64, "t": 478.53214, "r": 206.51694, "b": 481.26816, "coord_origin": "1"}}, {"id": 151, "text": "1", "bbox": {"l": 243.26373, "t": 478.53214, "r": 245.2321, "b": 481.26816, "coord_origin": "1"}}, {"id": 152, "text": "9", "bbox": {"l": 264.29047, "t": 478.53214, "r": 266.25885, "b": 481.26816, "coord_origin": "1"}}, {"id": 153, "text": "1", "bbox": {"l": 323.40466, "t": 478.53214, "r": 325.37305, "b": 481.26816, "coord_origin": "1"}}, {"id": 154, "text": "1", "bbox": {"l": 397.72519, "t": 478.53214, "r": 399.69354, "b": 481.26816, "coord_origin": "1"}}, {"id": 155, "text": "276", "bbox": {"l": 200.64, "t": 483.41086, "r": 206.51694, "b": 486.14688, "coord_origin": "1"}}, {"id": 156, "text": "2", "bbox": {"l": 382.66132, "t": 483.41086, "r": 384.61563, "b": 486.14688, "coord_origin": "1"}}, {"id": 157, "text": "12", "bbox": {"l": 397.72513, "t": 483.41086, "r": 401.64819, "b": 486.14688, "coord_origin": "1"}}, {"id": 158, "text": "1", "bbox": {"l": 412.78928, "t": 483.41086, "r": 414.74359, "b": 486.14688, "coord_origin": "1"}}, {"id": 159, "text": "320", "bbox": {"l": 200.64014, "t": 488.28958, "r": 207.14445, "b": 491.0256, "coord_origin": "1"}}, {"id": 160, "text": "1", "bbox": {"l": 367.62616, "t": 488.28958, "r": 369.78375, "b": 491.0256, "coord_origin": "1"}}, {"id": 161, "text": "4", "bbox": {"l": 382.66141, "t": 488.28958, "r": 384.81897, "b": 491.0256, "coord_origin": "1"}}, {"id": 162, "text": "20", "bbox": {"l": 397.7251, "t": 488.28958, "r": 402.05087, "b": 491.0256, "coord_origin": "1"}}, {"id": 163, "text": "2013", "bbox": {"l": 200.64032, "t": 493.1683, "r": 208.48566, "b": 495.90433, "coord_origin": "1"}}, {"id": 164, "text": "3", "bbox": {"l": 264.29044, "t": 493.1683, "r": 266.25879, "b": 495.90433, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "Text", "id": 8, "page_no": 9, "cluster": {"id": 8, "label": "Text", "bbox": 
{"l": 227.91466, "t": 665.82603, "r": 230.10028, "b": 675.3464, "coord_origin": "1"}, "confidence": -1.0, "cells": [{"id": 165, "text": "\u03bc", "bbox": {"l": 227.91466, "t": 665.82603, "r": 230.10028, "b": 675.3464, "coord_origin": "1"}}]}, "text": "\u03bc"}, {"label": "Text", "id": 9, "page_no": 9, "cluster": {"id": 9, "label": "Text", "bbox": {"l": 300.58057, "t": 683.62195, "r": 302.72638, "b": 693.428658, "coord_origin": "1"}, "confidence": -1.0, "cells": [{"id": 166, "text": "\u2265", "bbox": {"l": 300.58057, "t": 683.62195, "r": 302.72638, "b": 693.428658, "coord_origin": "1"}}]}, "text": "\u2265"}], "headers": [{"label": "Page-header", "id": 0, "page_no": 9, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.6792824745178, "t": 93.56233406066895, "r": 144.24872789382934, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8677384853363037, "cells": [{"id": 0, "text": "10", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "10"}, {"label": "Page-header", "id": 1, "page_no": 9, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 167.24963665008545, "t": 92.96470441818235, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8613899946212769, "cells": [{"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "M. 
Lysak, et al."}]}}, {"page_no": 10, "page_hash": "ac5ff01e648170bbe641d6fd95dc4f952a8e0bf62308f109b7c49678cef97005", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "11", "bbox": {"l": 471.37561, "t": 93.77099999999996, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Fig. 6.", "bbox": {"l": 134.765, "t": 125.79918999999984, "r": 162.64424, "b": 133.72551999999996, "coord_origin": "1"}}, {"id": 3, "text": "Visualization of predicted structure and detected bounding boxes on a complex", "bbox": {"l": 165.215, "t": 125.86200000000008, "r": 480.58752, "b": 133.93169999999998, "coord_origin": "1"}}, {"id": 4, "text": "table with many rows. The OTSL model (B) captured repeating pattern of horizontally", "bbox": {"l": 134.765, "t": 136.82097999999996, "r": 480.58823, "b": 144.89068999999995, "coord_origin": "1"}}, {"id": 5, "text": "merged cells from the GT (A), unlike the HTML model (C). The HTML model also", "bbox": {"l": 134.765, "t": 147.77997000000005, "r": 480.5881999999999, "b": 155.84966999999995, "coord_origin": "1"}}, {"id": 6, "text": "didn\u2019t complete the HTML sequence correctly and displayed a lot more of drift and", "bbox": {"l": 134.765, "t": 158.73895000000005, "r": 480.58838000000003, "b": 166.80864999999994, "coord_origin": "1"}}, {"id": 7, "text": "overlap of bounding boxes. 
\"PMC5406406_003_01.png\" PubTabNet.", "bbox": {"l": 134.765, "t": 169.69794000000002, "r": 415.84454, "b": 177.76764000000003, "coord_origin": "1"}}, {"id": 8, "text": "B", "bbox": {"l": 171.5049, "t": 312.45032, "r": 177.59613, "b": 320.36386, "coord_origin": "1"}}, {"id": 9, "text": "C", "bbox": {"l": 171.05823, "t": 492.65274, "r": 177.14946, "b": 500.56628, "coord_origin": "1"}}, {"id": 10, "text": "Incorrect end of HTML sequence", "bbox": {"l": 283.047, "t": 627.48166, "r": 374.96332, "b": 633.4168099999999, "coord_origin": "1"}}, {"id": 11, "text": "Horizontally merged cells are not present", "bbox": {"l": 283.047, "t": 617.35776, "r": 398.05978, "b": 623.29291, "coord_origin": "1"}}, {"id": 12, "text": "Repeating pattern is well represented in predictions", "bbox": {"l": 293.64209, "t": 465.59784, "r": 437.50800000000004, "b": 471.53299, "coord_origin": "1"}}, {"id": 13, "text": "Repeating pattern of", "bbox": {"l": 181.89114, "t": 288.35962000000006, "r": 239.23492, "b": 294.2947700000001, "coord_origin": "1"}}, {"id": 14, "text": "horizontally merged cells", "bbox": {"l": 181.89114, "t": 294.89423, "r": 251.52917, "b": 300.82938, "coord_origin": "1"}}, {"id": 15, "text": "A", "bbox": {"l": 247.83432, "t": 184.75989000000004, "r": 253.61339, "b": 194.81635000000006, "coord_origin": "1"}}, {"id": 16, "text": "Bounding box drifting at the end", "bbox": {"l": 292.18976, "t": 607.80609, "r": 381.54663, "b": 613.7412400000001, "coord_origin": "1"}}, {"id": 17, "text": "OTSL", "bbox": {"l": 172.27777, "t": 381.36288, "r": 180.18666, "b": 403.40067, "coord_origin": "1"}}, {"id": 18, "text": "HTML", "bbox": {"l": 172.27747, "t": 555.7769499999999, "r": 180.18663, "b": 578.7478, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 194.17212467193605, "t": 93.14918889999387, "r": 447.54291000000006, "b": 102.19556579589846, "coord_origin": "1"}, "confidence": 0.9478436708450317, "cells": [{"id": 0, "text": 
"Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 471.22020263671874, "t": 93.60166683197019, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8945758938789368, "cells": [{"id": 1, "text": "11", "bbox": {"l": 471.37561, "t": 93.77099999999996, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "Caption", "bbox": {"l": 134.0015642166138, "t": 124.99403343200686, "r": 480.82831478118896, "b": 178.36690464019773, "coord_origin": "1"}, "confidence": 0.95703125, "cells": [{"id": 2, "text": "Fig. 6.", "bbox": {"l": 134.765, "t": 125.79918999999984, "r": 162.64424, "b": 133.72551999999996, "coord_origin": "1"}}, {"id": 3, "text": "Visualization of predicted structure and detected bounding boxes on a complex", "bbox": {"l": 165.215, "t": 125.86200000000008, "r": 480.58752, "b": 133.93169999999998, "coord_origin": "1"}}, {"id": 4, "text": "table with many rows. The OTSL model (B) captured repeating pattern of horizontally", "bbox": {"l": 134.765, "t": 136.82097999999996, "r": 480.58823, "b": 144.89068999999995, "coord_origin": "1"}}, {"id": 5, "text": "merged cells from the GT (A), unlike the HTML model (C). The HTML model also", "bbox": {"l": 134.765, "t": 147.77997000000005, "r": 480.5881999999999, "b": 155.84966999999995, "coord_origin": "1"}}, {"id": 6, "text": "didn\u2019t complete the HTML sequence correctly and displayed a lot more of drift and", "bbox": {"l": 134.765, "t": 158.73895000000005, "r": 480.58838000000003, "b": 166.80864999999994, "coord_origin": "1"}}, {"id": 7, "text": "overlap of bounding boxes. 
\"PMC5406406_003_01.png\" PubTabNet.", "bbox": {"l": 134.765, "t": 169.69794000000002, "r": 415.84454, "b": 177.76764000000003, "coord_origin": "1"}}]}, {"id": 3, "label": "Picture", "bbox": {"l": 168.2693000793457, "t": 182.13025588989262, "r": 447.7568544387817, "b": 634.443228149414, "coord_origin": "1"}, "confidence": 0.7700356245040894, "cells": [{"id": 8, "text": "B", "bbox": {"l": 171.5049, "t": 312.45032, "r": 177.59613, "b": 320.36386, "coord_origin": "1"}}, {"id": 9, "text": "C", "bbox": {"l": 171.05823, "t": 492.65274, "r": 177.14946, "b": 500.56628, "coord_origin": "1"}}, {"id": 10, "text": "Incorrect end of HTML sequence", "bbox": {"l": 283.047, "t": 627.48166, "r": 374.96332, "b": 633.4168099999999, "coord_origin": "1"}}, {"id": 11, "text": "Horizontally merged cells are not present", "bbox": {"l": 283.047, "t": 617.35776, "r": 398.05978, "b": 623.29291, "coord_origin": "1"}}, {"id": 12, "text": "Repeating pattern is well represented in predictions", "bbox": {"l": 293.64209, "t": 465.59784, "r": 437.50800000000004, "b": 471.53299, "coord_origin": "1"}}, {"id": 13, "text": "Repeating pattern of", "bbox": {"l": 181.89114, "t": 288.35962000000006, "r": 239.23492, "b": 294.2947700000001, "coord_origin": "1"}}, {"id": 14, "text": "horizontally merged cells", "bbox": {"l": 181.89114, "t": 294.89423, "r": 251.52917, "b": 300.82938, "coord_origin": "1"}}, {"id": 15, "text": "A", "bbox": {"l": 247.83432, "t": 184.75989000000004, "r": 253.61339, "b": 194.81635000000006, "coord_origin": "1"}}, {"id": 16, "text": "Bounding box drifting at the end", "bbox": {"l": 292.18976, "t": 607.80609, "r": 381.54663, "b": 613.7412400000001, "coord_origin": "1"}}, {"id": 17, "text": "OTSL", "bbox": {"l": 172.27777, "t": 381.36288, "r": 180.18666, "b": 403.40067, "coord_origin": "1"}}, {"id": 18, "text": "HTML", "bbox": {"l": 172.27747, "t": 555.7769499999999, "r": 180.18663, "b": 578.7478, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, 
"figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 10, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 194.17212467193605, "t": 93.14918889999387, "r": 447.54291000000006, "b": 102.19556579589846, "coord_origin": "1"}, "confidence": 0.9478436708450317, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 10, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 471.22020263671874, "t": 93.60166683197019, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8945758938789368, "cells": [{"id": 1, "text": "11", "bbox": {"l": 471.37561, "t": 93.77099999999996, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "11"}, {"label": "Caption", "id": 2, "page_no": 10, "cluster": {"id": 2, "label": "Caption", "bbox": {"l": 134.0015642166138, "t": 124.99403343200686, "r": 480.82831478118896, "b": 178.36690464019773, "coord_origin": "1"}, "confidence": 0.95703125, "cells": [{"id": 2, "text": "Fig. 6.", "bbox": {"l": 134.765, "t": 125.79918999999984, "r": 162.64424, "b": 133.72551999999996, "coord_origin": "1"}}, {"id": 3, "text": "Visualization of predicted structure and detected bounding boxes on a complex", "bbox": {"l": 165.215, "t": 125.86200000000008, "r": 480.58752, "b": 133.93169999999998, "coord_origin": "1"}}, {"id": 4, "text": "table with many rows. The OTSL model (B) captured repeating pattern of horizontally", "bbox": {"l": 134.765, "t": 136.82097999999996, "r": 480.58823, "b": 144.89068999999995, "coord_origin": "1"}}, {"id": 5, "text": "merged cells from the GT (A), unlike the HTML model (C). 
The HTML model also", "bbox": {"l": 134.765, "t": 147.77997000000005, "r": 480.5881999999999, "b": 155.84966999999995, "coord_origin": "1"}}, {"id": 6, "text": "didn\u2019t complete the HTML sequence correctly and displayed a lot more of drift and", "bbox": {"l": 134.765, "t": 158.73895000000005, "r": 480.58838000000003, "b": 166.80864999999994, "coord_origin": "1"}}, {"id": 7, "text": "overlap of bounding boxes. \"PMC5406406_003_01.png\" PubTabNet.", "bbox": {"l": 134.765, "t": 169.69794000000002, "r": 415.84454, "b": 177.76764000000003, "coord_origin": "1"}}]}, "text": "Fig. 6. Visualization of predicted structure and detected bounding boxes on a complex table with many rows. The OTSL model (B) captured repeating pattern of horizontally merged cells from the GT (A), unlike the HTML model (C). The HTML model also didn\u2019t complete the HTML sequence correctly and displayed a lot more of drift and overlap of bounding boxes. \"PMC5406406_003_01.png\" PubTabNet."}, {"label": "Picture", "id": 3, "page_no": 10, "cluster": {"id": 3, "label": "Picture", "bbox": {"l": 168.2693000793457, "t": 182.13025588989262, "r": 447.7568544387817, "b": 634.443228149414, "coord_origin": "1"}, "confidence": 0.7700356245040894, "cells": [{"id": 8, "text": "B", "bbox": {"l": 171.5049, "t": 312.45032, "r": 177.59613, "b": 320.36386, "coord_origin": "1"}}, {"id": 9, "text": "C", "bbox": {"l": 171.05823, "t": 492.65274, "r": 177.14946, "b": 500.56628, "coord_origin": "1"}}, {"id": 10, "text": "Incorrect end of HTML sequence", "bbox": {"l": 283.047, "t": 627.48166, "r": 374.96332, "b": 633.4168099999999, "coord_origin": "1"}}, {"id": 11, "text": "Horizontally merged cells are not present", "bbox": {"l": 283.047, "t": 617.35776, "r": 398.05978, "b": 623.29291, "coord_origin": "1"}}, {"id": 12, "text": "Repeating pattern is well represented in predictions", "bbox": {"l": 293.64209, "t": 465.59784, "r": 437.50800000000004, "b": 471.53299, "coord_origin": "1"}}, {"id": 13, "text": "Repeating 
pattern of", "bbox": {"l": 181.89114, "t": 288.35962000000006, "r": 239.23492, "b": 294.2947700000001, "coord_origin": "1"}}, {"id": 14, "text": "horizontally merged cells", "bbox": {"l": 181.89114, "t": 294.89423, "r": 251.52917, "b": 300.82938, "coord_origin": "1"}}, {"id": 15, "text": "A", "bbox": {"l": 247.83432, "t": 184.75989000000004, "r": 253.61339, "b": 194.81635000000006, "coord_origin": "1"}}, {"id": 16, "text": "Bounding box drifting at the end", "bbox": {"l": 292.18976, "t": 607.80609, "r": 381.54663, "b": 613.7412400000001, "coord_origin": "1"}}, {"id": 17, "text": "OTSL", "bbox": {"l": 172.27777, "t": 381.36288, "r": 180.18666, "b": 403.40067, "coord_origin": "1"}}, {"id": 18, "text": "HTML", "bbox": {"l": 172.27747, "t": 555.7769499999999, "r": 180.18663, "b": 578.7478, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "body": [{"label": "Caption", "id": 2, "page_no": 10, "cluster": {"id": 2, "label": "Caption", "bbox": {"l": 134.0015642166138, "t": 124.99403343200686, "r": 480.82831478118896, "b": 178.36690464019773, "coord_origin": "1"}, "confidence": 0.95703125, "cells": [{"id": 2, "text": "Fig. 6.", "bbox": {"l": 134.765, "t": 125.79918999999984, "r": 162.64424, "b": 133.72551999999996, "coord_origin": "1"}}, {"id": 3, "text": "Visualization of predicted structure and detected bounding boxes on a complex", "bbox": {"l": 165.215, "t": 125.86200000000008, "r": 480.58752, "b": 133.93169999999998, "coord_origin": "1"}}, {"id": 4, "text": "table with many rows. The OTSL model (B) captured repeating pattern of horizontally", "bbox": {"l": 134.765, "t": 136.82097999999996, "r": 480.58823, "b": 144.89068999999995, "coord_origin": "1"}}, {"id": 5, "text": "merged cells from the GT (A), unlike the HTML model (C). 
The HTML model also", "bbox": {"l": 134.765, "t": 147.77997000000005, "r": 480.5881999999999, "b": 155.84966999999995, "coord_origin": "1"}}, {"id": 6, "text": "didn\u2019t complete the HTML sequence correctly and displayed a lot more of drift and", "bbox": {"l": 134.765, "t": 158.73895000000005, "r": 480.58838000000003, "b": 166.80864999999994, "coord_origin": "1"}}, {"id": 7, "text": "overlap of bounding boxes. \"PMC5406406_003_01.png\" PubTabNet.", "bbox": {"l": 134.765, "t": 169.69794000000002, "r": 415.84454, "b": 177.76764000000003, "coord_origin": "1"}}]}, "text": "Fig. 6. Visualization of predicted structure and detected bounding boxes on a complex table with many rows. The OTSL model (B) captured repeating pattern of horizontally merged cells from the GT (A), unlike the HTML model (C). The HTML model also didn\u2019t complete the HTML sequence correctly and displayed a lot more of drift and overlap of bounding boxes. \"PMC5406406_003_01.png\" PubTabNet."}, {"label": "Picture", "id": 3, "page_no": 10, "cluster": {"id": 3, "label": "Picture", "bbox": {"l": 168.2693000793457, "t": 182.13025588989262, "r": 447.7568544387817, "b": 634.443228149414, "coord_origin": "1"}, "confidence": 0.7700356245040894, "cells": [{"id": 8, "text": "B", "bbox": {"l": 171.5049, "t": 312.45032, "r": 177.59613, "b": 320.36386, "coord_origin": "1"}}, {"id": 9, "text": "C", "bbox": {"l": 171.05823, "t": 492.65274, "r": 177.14946, "b": 500.56628, "coord_origin": "1"}}, {"id": 10, "text": "Incorrect end of HTML sequence", "bbox": {"l": 283.047, "t": 627.48166, "r": 374.96332, "b": 633.4168099999999, "coord_origin": "1"}}, {"id": 11, "text": "Horizontally merged cells are not present", "bbox": {"l": 283.047, "t": 617.35776, "r": 398.05978, "b": 623.29291, "coord_origin": "1"}}, {"id": 12, "text": "Repeating pattern is well represented in predictions", "bbox": {"l": 293.64209, "t": 465.59784, "r": 437.50800000000004, "b": 471.53299, "coord_origin": "1"}}, {"id": 13, "text": "Repeating 
pattern of", "bbox": {"l": 181.89114, "t": 288.35962000000006, "r": 239.23492, "b": 294.2947700000001, "coord_origin": "1"}}, {"id": 14, "text": "horizontally merged cells", "bbox": {"l": 181.89114, "t": 294.89423, "r": 251.52917, "b": 300.82938, "coord_origin": "1"}}, {"id": 15, "text": "A", "bbox": {"l": 247.83432, "t": 184.75989000000004, "r": 253.61339, "b": 194.81635000000006, "coord_origin": "1"}}, {"id": 16, "text": "Bounding box drifting at the end", "bbox": {"l": 292.18976, "t": 607.80609, "r": 381.54663, "b": 613.7412400000001, "coord_origin": "1"}}, {"id": 17, "text": "OTSL", "bbox": {"l": 172.27777, "t": 381.36288, "r": 180.18666, "b": 403.40067, "coord_origin": "1"}}, {"id": 18, "text": "HTML", "bbox": {"l": 172.27747, "t": 555.7769499999999, "r": 180.18663, "b": 578.7478, "coord_origin": "1"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "headers": [{"label": "Page-header", "id": 0, "page_no": 10, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 194.17212467193605, "t": 93.14918889999387, "r": 447.54291000000006, "b": 102.19556579589846, "coord_origin": "1"}, "confidence": 0.9478436708450317, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 10, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 471.22020263671874, "t": 93.60166683197019, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.8945758938789368, "cells": [{"id": 1, "text": "11", "bbox": {"l": 471.37561, "t": 93.77099999999996, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "11"}]}}, {"page_no": 11, "page_hash": 
"6a9aa589dc4faead43b032ec733af0c4a6fedfa834aa56b1bfefc7458ea949cc", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "12", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 3, "text": "6", "bbox": {"l": 134.765, "t": 117.54894999999988, "r": 141.4886, "b": 128.11737000000005, "coord_origin": "1"}}, {"id": 4, "text": "Conclusion", "bbox": {"l": 154.9382, "t": 117.54894999999988, "r": 219.25478999999999, "b": 128.11737000000005, "coord_origin": "1"}}, {"id": 5, "text": "We demonstrated that representing tables in HTML for the task of table struc-", "bbox": {"l": 134.765, "t": 146.86377000000005, "r": 480.59476, "b": 155.66076999999996, "coord_origin": "1"}}, {"id": 6, "text": "ture recognition with Im2Seq models is ill-suited and has serious limitations.", "bbox": {"l": 134.765, "t": 158.81879000000004, "r": 480.59476, "b": 167.61577999999997, "coord_origin": "1"}}, {"id": 7, "text": "Furthermore, we presented in this paper an Optimized Table Structure Language", "bbox": {"l": 134.765, "t": 170.77380000000005, "r": 480.58978, "b": 179.57079999999996, "coord_origin": "1"}}, {"id": 8, "text": "(OTSL) which, when compared to commonly used general purpose languages,", "bbox": {"l": 134.765, "t": 182.72979999999995, "r": 480.59569999999997, "b": 191.52679, "coord_origin": "1"}}, {"id": 9, "text": "has several key benefits.", "bbox": {"l": 134.765, "t": 194.68480999999997, "r": 239.5387, "b": 203.48181, "coord_origin": "1"}}, {"id": 10, "text": "First and foremost, given the same network configuration, inference time for", "bbox": {"l": 149.709, "t": 207.44379000000004, "r": 480.59283000000005, 
"b": 216.24077999999997, "coord_origin": "1"}}, {"id": 11, "text": "a table-structure prediction is about 2 times faster compared to the conventional", "bbox": {"l": 134.765, "t": 219.39880000000005, "r": 480.59365999999994, "b": 228.19579999999996, "coord_origin": "1"}}, {"id": 12, "text": "HTML approach. This is primarily owed to the shorter sequence length of the", "bbox": {"l": 134.765, "t": 231.35382000000004, "r": 480.59079, "b": 240.15081999999995, "coord_origin": "1"}}, {"id": 13, "text": "OTSL representation. Additional performance benefits can be obtained with", "bbox": {"l": 134.765, "t": 243.30884000000003, "r": 480.58786000000003, "b": 252.10582999999997, "coord_origin": "1"}}, {"id": 14, "text": "HPO (hyper parameter optimization). As we demonstrate in our experiments,", "bbox": {"l": 134.765, "t": 255.26482999999996, "r": 480.59479, "b": 264.06183, "coord_origin": "1"}}, {"id": 15, "text": "models trained on OTSL can be significantly smaller, e.g. by reducing the number", "bbox": {"l": 134.765, "t": 267.21984999999995, "r": 480.5878000000001, "b": 276.01685, "coord_origin": "1"}}, {"id": 16, "text": "of encoder and decoder layers, while preserving comparatively good prediction", "bbox": {"l": 134.765, "t": 279.17487000000006, "r": 480.59268, "b": 287.97183, "coord_origin": "1"}}, {"id": 17, "text": "quality. 
This can further improve inference performance, yielding 5-6 times faster", "bbox": {"l": 134.765, "t": 291.12985, "r": 480.58871, "b": 299.92682, "coord_origin": "1"}}, {"id": 18, "text": "inference speed in OTSL with prediction quality comparable to models trained", "bbox": {"l": 134.765, "t": 303.08484, "r": 480.59375, "b": 311.88181, "coord_origin": "1"}}, {"id": 19, "text": "on HTML (see Table 1).", "bbox": {"l": 134.765, "t": 315.03983, "r": 240.92351000000002, "b": 323.83679, "coord_origin": "1"}}, {"id": 20, "text": "Secondly, OTSL has more inherent structure and a significantly restricted vo-", "bbox": {"l": 149.709, "t": 327.79883, "r": 480.58984, "b": 336.5957900000001, "coord_origin": "1"}}, {"id": 21, "text": "cabulary size. This allows autoregressive models to perform better in the TED", "bbox": {"l": 134.765, "t": 339.75482, "r": 480.59473, "b": 348.55179, "coord_origin": "1"}}, {"id": 22, "text": "metric, but especially with regards to prediction accuracy of the table-cell bound-", "bbox": {"l": 134.765, "t": 351.70981, "r": 480.58664, "b": 360.50677, "coord_origin": "1"}}, {"id": 23, "text": "ing boxes (see Table 2). As shown in Figure 5, we observe that the OTSL dras-", "bbox": {"l": 134.765, "t": 363.66479, "r": 480.59479, "b": 372.46176, "coord_origin": "1"}}, {"id": 24, "text": "tically reduces the drift for table cell bounding boxes at high row count and in", "bbox": {"l": 134.765, "t": 375.61978, "r": 480.58971999999994, "b": 384.41675, "coord_origin": "1"}}, {"id": 25, "text": "sparse tables. This leads to more accurate predictions and a significant reduction", "bbox": {"l": 134.765, "t": 387.57477, "r": 480.58673, "b": 396.37173, "coord_origin": "1"}}, {"id": 26, "text": "in post-processing complexity, which is an undesired necessity in HTML-based", "bbox": {"l": 134.765, "t": 399.53076, "r": 480.58574999999996, "b": 408.32773, "coord_origin": "1"}}, {"id": 27, "text": "Im2Seq models. 
Significant novelty lies in OTSL syntactical rules, which are few,", "bbox": {"l": 134.765, "t": 411.48575, "r": 480.58675999999997, "b": 420.28271, "coord_origin": "1"}}, {"id": 28, "text": "simple and always backwards looking. Each new token can be validated only by", "bbox": {"l": 134.765, "t": 423.44073, "r": 480.59482, "b": 432.23769999999996, "coord_origin": "1"}}, {"id": 29, "text": "analyzing the sequence of previous tokens, without requiring the entire sequence", "bbox": {"l": 134.765, "t": 435.39572, "r": 480.58777, "b": 444.19269, "coord_origin": "1"}}, {"id": 30, "text": "to detect mistakes. This in return allows to perform structural error detection", "bbox": {"l": 134.765, "t": 447.35071, "r": 480.58968999999996, "b": 456.14767, "coord_origin": "1"}}, {"id": 31, "text": "and correction on-the-fly during sequence generation.", "bbox": {"l": 134.765, "t": 459.30569, "r": 366.77698, "b": 468.10266, "coord_origin": "1"}}, {"id": 32, "text": "References", "bbox": {"l": 134.765, "t": 493.82083, "r": 197.68642, "b": 504.38922, "coord_origin": "1"}}, {"id": 33, "text": "1.", "bbox": {"l": 139.371, "t": 522.87985, "r": 146.46127, "b": 530.94962, "coord_origin": "1"}}, {"id": 34, "text": "Auer, C., Dolfi, M., Carvalho, A., Ramis, C.B., Staar, P.W.J.: Delivering doc-", "bbox": {"l": 151.01955, "t": 522.87985, "r": 480.5920100000001, "b": 530.94962, "coord_origin": "1"}}, {"id": 35, "text": "ument conversion as a cloud service with high throughput and responsiveness.", "bbox": {"l": 151.51801, "t": 533.83887, "r": 480.58667, "b": 541.90862, "coord_origin": "1"}}, {"id": 36, "text": "CoRR", "bbox": {"l": 151.51801, "t": 544.79785, "r": 176.34149, "b": 552.86761, "coord_origin": "1"}}, {"id": 37, "text": "abs/2206.00785", "bbox": {"l": 179.464, "t": 544.73509, "r": 250.67963, "b": 552.66139, "coord_origin": "1"}}, {"id": 38, "text": "(2022).", "bbox": {"l": 253.804, "t": 544.79785, "r": 281.9567, "b": 552.86761, "coord_origin": "1"}}, {"id": 39, "text": 
"https://doi.org/10.48550/arXiv.2206.00785", "bbox": {"l": 285.078, "t": 545.44344, "r": 478.03403000000003, "b": 552.91245, "coord_origin": "1"}}, {"id": 40, "text": ",", "bbox": {"l": 478.0319799999999, "t": 544.79785, "r": 480.59099999999995, "b": 552.86761, "coord_origin": "1"}}, {"id": 41, "text": "https://doi.org/10.48550/arXiv.2206.00785", "bbox": {"l": 151.51797, "t": 556.4024400000001, "r": 344.474, "b": 563.87144, "coord_origin": "1"}}, {"id": 42, "text": "2.", "bbox": {"l": 139.37097, "t": 567.51884, "r": 145.94186, "b": 575.58861, "coord_origin": "1"}}, {"id": 43, "text": "Chen, B., Peng, D., Zhang, J., Ren, Y., Jin, L.: Complex table structure recognition", "bbox": {"l": 150.16624, "t": 567.51884, "r": 480.58636, "b": 575.58861, "coord_origin": "1"}}, {"id": 44, "text": "in the wild using transformer and identity matrix-based augmentation. In: Porwal,", "bbox": {"l": 151.51797, "t": 578.47784, "r": 480.59012, "b": 586.5476100000001, "coord_origin": "1"}}, {"id": 45, "text": "U., Forn\u00e9s, A., Shafait, F. (eds.) Frontiers in Handwriting Recognition. pp. 545-", "bbox": {"l": 151.51797, "t": 589.43684, "r": 480.5920100000001, "b": 597.50661, "coord_origin": "1"}}, {"id": 46, "text": "561. Springer International Publishing, Cham (2022)", "bbox": {"l": 151.51797, "t": 600.39584, "r": 364.17856, "b": 608.46561, "coord_origin": "1"}}, {"id": 47, "text": "3.", "bbox": {"l": 139.37097, "t": 612.1588399999999, "r": 146.4379, "b": 620.22861, "coord_origin": "1"}}, {"id": 48, "text": "Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.L.: Complicated table", "bbox": {"l": 150.98117, "t": 612.1588399999999, "r": 480.58731000000006, "b": 620.22861, "coord_origin": "1"}}, {"id": 49, "text": "structure recognition. 
arXiv preprint arXiv:1908.04729 (2019)", "bbox": {"l": 151.51797, "t": 623.11784, "r": 400.22525, "b": 631.18761, "coord_origin": "1"}}, {"id": 50, "text": "4.", "bbox": {"l": 139.37097, "t": 634.88084, "r": 146.52443, "b": 642.95061, "coord_origin": "1"}}, {"id": 51, "text": "Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific", "bbox": {"l": 151.12335, "t": 634.88084, "r": 480.58826, "b": 642.95061, "coord_origin": "1"}}, {"id": 52, "text": "table recognition. In: 2019 International Conference on Document Analysis and", "bbox": {"l": 151.51797, "t": 645.83984, "r": 480.58752, "b": 653.9096099999999, "coord_origin": "1"}}, {"id": 53, "text": "Recognition (ICDAR). pp. 894-901. IEEE (2019)", "bbox": {"l": 151.51797, "t": 656.79785, "r": 350.11115, "b": 664.86761, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 134.6935380935669, "t": 93.01469650268552, "r": 231.72049000000004, "b": 101.84788713455202, "coord_origin": "1"}, "confidence": 0.6001661419868469, "cells": [{"id": 0, "text": "12", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Section-header", "bbox": {"l": 134.32137451171874, "t": 116.9173613548279, "r": 219.25478999999999, "b": 128.11737000000005, "coord_origin": "1"}, "confidence": 0.9443027973175049, "cells": [{"id": 3, "text": "6", "bbox": {"l": 134.765, "t": 117.54894999999988, "r": 141.4886, "b": 128.11737000000005, "coord_origin": "1"}}, {"id": 4, "text": "Conclusion", "bbox": {"l": 154.9382, "t": 117.54894999999988, "r": 219.25478999999999, "b": 128.11737000000005, "coord_origin": 
"1"}}]}, {"id": 2, "label": "Text", "bbox": {"l": 134.0799774169922, "t": 146.1485215187073, "r": 480.59569999999997, "b": 203.48181, "coord_origin": "1"}, "confidence": 0.9849004745483398, "cells": [{"id": 5, "text": "We demonstrated that representing tables in HTML for the task of table struc-", "bbox": {"l": 134.765, "t": 146.86377000000005, "r": 480.59476, "b": 155.66076999999996, "coord_origin": "1"}}, {"id": 6, "text": "ture recognition with Im2Seq models is ill-suited and has serious limitations.", "bbox": {"l": 134.765, "t": 158.81879000000004, "r": 480.59476, "b": 167.61577999999997, "coord_origin": "1"}}, {"id": 7, "text": "Furthermore, we presented in this paper an Optimized Table Structure Language", "bbox": {"l": 134.765, "t": 170.77380000000005, "r": 480.58978, "b": 179.57079999999996, "coord_origin": "1"}}, {"id": 8, "text": "(OTSL) which, when compared to commonly used general purpose languages,", "bbox": {"l": 134.765, "t": 182.72979999999995, "r": 480.59569999999997, "b": 191.52679, "coord_origin": "1"}}, {"id": 9, "text": "has several key benefits.", "bbox": {"l": 134.765, "t": 194.68480999999997, "r": 239.5387, "b": 203.48181, "coord_origin": "1"}}]}, {"id": 3, "label": "Text", "bbox": {"l": 133.63015937805176, "t": 206.26369628906252, "r": 480.64513664245607, "b": 324.5816196441651, "coord_origin": "1"}, "confidence": 0.9870830178260803, "cells": [{"id": 10, "text": "First and foremost, given the same network configuration, inference time for", "bbox": {"l": 149.709, "t": 207.44379000000004, "r": 480.59283000000005, "b": 216.24077999999997, "coord_origin": "1"}}, {"id": 11, "text": "a table-structure prediction is about 2 times faster compared to the conventional", "bbox": {"l": 134.765, "t": 219.39880000000005, "r": 480.59365999999994, "b": 228.19579999999996, "coord_origin": "1"}}, {"id": 12, "text": "HTML approach. 
This is primarily owed to the shorter sequence length of the", "bbox": {"l": 134.765, "t": 231.35382000000004, "r": 480.59079, "b": 240.15081999999995, "coord_origin": "1"}}, {"id": 13, "text": "OTSL representation. Additional performance benefits can be obtained with", "bbox": {"l": 134.765, "t": 243.30884000000003, "r": 480.58786000000003, "b": 252.10582999999997, "coord_origin": "1"}}, {"id": 14, "text": "HPO (hyper parameter optimization). As we demonstrate in our experiments,", "bbox": {"l": 134.765, "t": 255.26482999999996, "r": 480.59479, "b": 264.06183, "coord_origin": "1"}}, {"id": 15, "text": "models trained on OTSL can be significantly smaller, e.g. by reducing the number", "bbox": {"l": 134.765, "t": 267.21984999999995, "r": 480.5878000000001, "b": 276.01685, "coord_origin": "1"}}, {"id": 16, "text": "of encoder and decoder layers, while preserving comparatively good prediction", "bbox": {"l": 134.765, "t": 279.17487000000006, "r": 480.59268, "b": 287.97183, "coord_origin": "1"}}, {"id": 17, "text": "quality. This can further improve inference performance, yielding 5-6 times faster", "bbox": {"l": 134.765, "t": 291.12985, "r": 480.58871, "b": 299.92682, "coord_origin": "1"}}, {"id": 18, "text": "inference speed in OTSL with prediction quality comparable to models trained", "bbox": {"l": 134.765, "t": 303.08484, "r": 480.59375, "b": 311.88181, "coord_origin": "1"}}, {"id": 19, "text": "on HTML (see Table 1).", "bbox": {"l": 134.765, "t": 315.03983, "r": 240.92351000000002, "b": 323.83679, "coord_origin": "1"}}]}, {"id": 4, "label": "Text", "bbox": {"l": 133.82413501739504, "t": 326.87730903625487, "r": 480.59482, "b": 468.2926139831543, "coord_origin": "1"}, "confidence": 0.986232340335846, "cells": [{"id": 20, "text": "Secondly, OTSL has more inherent structure and a significantly restricted vo-", "bbox": {"l": 149.709, "t": 327.79883, "r": 480.58984, "b": 336.5957900000001, "coord_origin": "1"}}, {"id": 21, "text": "cabulary size. 
This allows autoregressive models to perform better in the TED", "bbox": {"l": 134.765, "t": 339.75482, "r": 480.59473, "b": 348.55179, "coord_origin": "1"}}, {"id": 22, "text": "metric, but especially with regards to prediction accuracy of the table-cell bound-", "bbox": {"l": 134.765, "t": 351.70981, "r": 480.58664, "b": 360.50677, "coord_origin": "1"}}, {"id": 23, "text": "ing boxes (see Table 2). As shown in Figure 5, we observe that the OTSL dras-", "bbox": {"l": 134.765, "t": 363.66479, "r": 480.59479, "b": 372.46176, "coord_origin": "1"}}, {"id": 24, "text": "tically reduces the drift for table cell bounding boxes at high row count and in", "bbox": {"l": 134.765, "t": 375.61978, "r": 480.58971999999994, "b": 384.41675, "coord_origin": "1"}}, {"id": 25, "text": "sparse tables. This leads to more accurate predictions and a significant reduction", "bbox": {"l": 134.765, "t": 387.57477, "r": 480.58673, "b": 396.37173, "coord_origin": "1"}}, {"id": 26, "text": "in post-processing complexity, which is an undesired necessity in HTML-based", "bbox": {"l": 134.765, "t": 399.53076, "r": 480.58574999999996, "b": 408.32773, "coord_origin": "1"}}, {"id": 27, "text": "Im2Seq models. Significant novelty lies in OTSL syntactical rules, which are few,", "bbox": {"l": 134.765, "t": 411.48575, "r": 480.58675999999997, "b": 420.28271, "coord_origin": "1"}}, {"id": 28, "text": "simple and always backwards looking. Each new token can be validated only by", "bbox": {"l": 134.765, "t": 423.44073, "r": 480.59482, "b": 432.23769999999996, "coord_origin": "1"}}, {"id": 29, "text": "analyzing the sequence of previous tokens, without requiring the entire sequence", "bbox": {"l": 134.765, "t": 435.39572, "r": 480.58777, "b": 444.19269, "coord_origin": "1"}}, {"id": 30, "text": "to detect mistakes. 
This in return allows to perform structural error detection", "bbox": {"l": 134.765, "t": 447.35071, "r": 480.58968999999996, "b": 456.14767, "coord_origin": "1"}}, {"id": 31, "text": "and correction on-the-fly during sequence generation.", "bbox": {"l": 134.765, "t": 459.30569, "r": 366.77698, "b": 468.10266, "coord_origin": "1"}}]}, {"id": 5, "label": "Section-header", "bbox": {"l": 134.31680746078493, "t": 493.0167823791504, "r": 197.68642, "b": 504.38922, "coord_origin": "1"}, "confidence": 0.952782154083252, "cells": [{"id": 32, "text": "References", "bbox": {"l": 134.765, "t": 493.82083, "r": 197.68642, "b": 504.38922, "coord_origin": "1"}}]}, {"id": 6, "label": "List-item", "bbox": {"l": 139.371, "t": 522.1764713287354, "r": 480.5920100000001, "b": 564.6129249572754, "coord_origin": "1"}, "confidence": 0.9788757562637329, "cells": [{"id": 33, "text": "1.", "bbox": {"l": 139.371, "t": 522.87985, "r": 146.46127, "b": 530.94962, "coord_origin": "1"}}, {"id": 34, "text": "Auer, C., Dolfi, M., Carvalho, A., Ramis, C.B., Staar, P.W.J.: Delivering doc-", "bbox": {"l": 151.01955, "t": 522.87985, "r": 480.5920100000001, "b": 530.94962, "coord_origin": "1"}}, {"id": 35, "text": "ument conversion as a cloud service with high throughput and responsiveness.", "bbox": {"l": 151.51801, "t": 533.83887, "r": 480.58667, "b": 541.90862, "coord_origin": "1"}}, {"id": 36, "text": "CoRR", "bbox": {"l": 151.51801, "t": 544.79785, "r": 176.34149, "b": 552.86761, "coord_origin": "1"}}, {"id": 37, "text": "abs/2206.00785", "bbox": {"l": 179.464, "t": 544.73509, "r": 250.67963, "b": 552.66139, "coord_origin": "1"}}, {"id": 38, "text": "(2022).", "bbox": {"l": 253.804, "t": 544.79785, "r": 281.9567, "b": 552.86761, "coord_origin": "1"}}, {"id": 39, "text": "https://doi.org/10.48550/arXiv.2206.00785", "bbox": {"l": 285.078, "t": 545.44344, "r": 478.03403000000003, "b": 552.91245, "coord_origin": "1"}}, {"id": 40, "text": ",", "bbox": {"l": 478.0319799999999, "t": 544.79785, "r": 
480.59099999999995, "b": 552.86761, "coord_origin": "1"}}, {"id": 41, "text": "https://doi.org/10.48550/arXiv.2206.00785", "bbox": {"l": 151.51797, "t": 556.4024400000001, "r": 344.474, "b": 563.87144, "coord_origin": "1"}}]}, {"id": 7, "label": "List-item", "bbox": {"l": 138.86715145111086, "t": 566.1212036132812, "r": 480.61741333007814, "b": 609.1713466644287, "coord_origin": "1"}, "confidence": 0.9785996675491333, "cells": [{"id": 42, "text": "2.", "bbox": {"l": 139.37097, "t": 567.51884, "r": 145.94186, "b": 575.58861, "coord_origin": "1"}}, {"id": 43, "text": "Chen, B., Peng, D., Zhang, J., Ren, Y., Jin, L.: Complex table structure recognition", "bbox": {"l": 150.16624, "t": 567.51884, "r": 480.58636, "b": 575.58861, "coord_origin": "1"}}, {"id": 44, "text": "in the wild using transformer and identity matrix-based augmentation. In: Porwal,", "bbox": {"l": 151.51797, "t": 578.47784, "r": 480.59012, "b": 586.5476100000001, "coord_origin": "1"}}, {"id": 45, "text": "U., Forn\u00e9s, A., Shafait, F. (eds.) Frontiers in Handwriting Recognition. pp. 545-", "bbox": {"l": 151.51797, "t": 589.43684, "r": 480.5920100000001, "b": 597.50661, "coord_origin": "1"}}, {"id": 46, "text": "561. Springer International Publishing, Cham (2022)", "bbox": {"l": 151.51797, "t": 600.39584, "r": 364.17856, "b": 608.46561, "coord_origin": "1"}}]}, {"id": 8, "label": "List-item", "bbox": {"l": 138.72738218307495, "t": 610.5866088867188, "r": 480.58731000000006, "b": 631.8376350402832, "coord_origin": "1"}, "confidence": 0.9714517593383789, "cells": [{"id": 47, "text": "3.", "bbox": {"l": 139.37097, "t": 612.1588399999999, "r": 146.4379, "b": 620.22861, "coord_origin": "1"}}, {"id": 48, "text": "Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.L.: Complicated table", "bbox": {"l": 150.98117, "t": 612.1588399999999, "r": 480.58731000000006, "b": 620.22861, "coord_origin": "1"}}, {"id": 49, "text": "structure recognition. 
arXiv preprint arXiv:1908.04729 (2019)", "bbox": {"l": 151.51797, "t": 623.11784, "r": 400.22525, "b": 631.18761, "coord_origin": "1"}}]}, {"id": 9, "label": "List-item", "bbox": {"l": 138.95939712524412, "t": 634.1483551025391, "r": 480.58826, "b": 665.3444732666015, "coord_origin": "1"}, "confidence": 0.9804890155792236, "cells": [{"id": 50, "text": "4.", "bbox": {"l": 139.37097, "t": 634.88084, "r": 146.52443, "b": 642.95061, "coord_origin": "1"}}, {"id": 51, "text": "Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific", "bbox": {"l": 151.12335, "t": 634.88084, "r": 480.58826, "b": 642.95061, "coord_origin": "1"}}, {"id": 52, "text": "table recognition. In: 2019 International Conference on Document Analysis and", "bbox": {"l": 151.51797, "t": 645.83984, "r": 480.58752, "b": 653.9096099999999, "coord_origin": "1"}}, {"id": 53, "text": "Recognition (ICDAR). pp. 894-901. IEEE (2019)", "bbox": {"l": 151.51797, "t": 656.79785, "r": 350.11115, "b": 664.86761, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 11, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.6935380935669, "t": 93.01469650268552, "r": 231.72049000000004, "b": 101.84788713455202, "coord_origin": "1"}, "confidence": 0.6001661419868469, "cells": [{"id": 0, "text": "12", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "12 M. 
Lysak, et al."}, {"label": "Section-header", "id": 1, "page_no": 11, "cluster": {"id": 1, "label": "Section-header", "bbox": {"l": 134.32137451171874, "t": 116.9173613548279, "r": 219.25478999999999, "b": 128.11737000000005, "coord_origin": "1"}, "confidence": 0.9443027973175049, "cells": [{"id": 3, "text": "6", "bbox": {"l": 134.765, "t": 117.54894999999988, "r": 141.4886, "b": 128.11737000000005, "coord_origin": "1"}}, {"id": 4, "text": "Conclusion", "bbox": {"l": 154.9382, "t": 117.54894999999988, "r": 219.25478999999999, "b": 128.11737000000005, "coord_origin": "1"}}]}, "text": "6 Conclusion"}, {"label": "Text", "id": 2, "page_no": 11, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 134.0799774169922, "t": 146.1485215187073, "r": 480.59569999999997, "b": 203.48181, "coord_origin": "1"}, "confidence": 0.9849004745483398, "cells": [{"id": 5, "text": "We demonstrated that representing tables in HTML for the task of table struc-", "bbox": {"l": 134.765, "t": 146.86377000000005, "r": 480.59476, "b": 155.66076999999996, "coord_origin": "1"}}, {"id": 6, "text": "ture recognition with Im2Seq models is ill-suited and has serious limitations.", "bbox": {"l": 134.765, "t": 158.81879000000004, "r": 480.59476, "b": 167.61577999999997, "coord_origin": "1"}}, {"id": 7, "text": "Furthermore, we presented in this paper an Optimized Table Structure Language", "bbox": {"l": 134.765, "t": 170.77380000000005, "r": 480.58978, "b": 179.57079999999996, "coord_origin": "1"}}, {"id": 8, "text": "(OTSL) which, when compared to commonly used general purpose languages,", "bbox": {"l": 134.765, "t": 182.72979999999995, "r": 480.59569999999997, "b": 191.52679, "coord_origin": "1"}}, {"id": 9, "text": "has several key benefits.", "bbox": {"l": 134.765, "t": 194.68480999999997, "r": 239.5387, "b": 203.48181, "coord_origin": "1"}}]}, "text": "We demonstrated that representing tables in HTML for the task of table structure recognition with Im2Seq models is ill-suited and has serious 
limitations. Furthermore, we presented in this paper an Optimized Table Structure Language (OTSL) which, when compared to commonly used general purpose languages, has several key benefits."}, {"label": "Text", "id": 3, "page_no": 11, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 133.63015937805176, "t": 206.26369628906252, "r": 480.64513664245607, "b": 324.5816196441651, "coord_origin": "1"}, "confidence": 0.9870830178260803, "cells": [{"id": 10, "text": "First and foremost, given the same network configuration, inference time for", "bbox": {"l": 149.709, "t": 207.44379000000004, "r": 480.59283000000005, "b": 216.24077999999997, "coord_origin": "1"}}, {"id": 11, "text": "a table-structure prediction is about 2 times faster compared to the conventional", "bbox": {"l": 134.765, "t": 219.39880000000005, "r": 480.59365999999994, "b": 228.19579999999996, "coord_origin": "1"}}, {"id": 12, "text": "HTML approach. This is primarily owed to the shorter sequence length of the", "bbox": {"l": 134.765, "t": 231.35382000000004, "r": 480.59079, "b": 240.15081999999995, "coord_origin": "1"}}, {"id": 13, "text": "OTSL representation. Additional performance benefits can be obtained with", "bbox": {"l": 134.765, "t": 243.30884000000003, "r": 480.58786000000003, "b": 252.10582999999997, "coord_origin": "1"}}, {"id": 14, "text": "HPO (hyper parameter optimization). As we demonstrate in our experiments,", "bbox": {"l": 134.765, "t": 255.26482999999996, "r": 480.59479, "b": 264.06183, "coord_origin": "1"}}, {"id": 15, "text": "models trained on OTSL can be significantly smaller, e.g. by reducing the number", "bbox": {"l": 134.765, "t": 267.21984999999995, "r": 480.5878000000001, "b": 276.01685, "coord_origin": "1"}}, {"id": 16, "text": "of encoder and decoder layers, while preserving comparatively good prediction", "bbox": {"l": 134.765, "t": 279.17487000000006, "r": 480.59268, "b": 287.97183, "coord_origin": "1"}}, {"id": 17, "text": "quality. 
This can further improve inference performance, yielding 5-6 times faster", "bbox": {"l": 134.765, "t": 291.12985, "r": 480.58871, "b": 299.92682, "coord_origin": "1"}}, {"id": 18, "text": "inference speed in OTSL with prediction quality comparable to models trained", "bbox": {"l": 134.765, "t": 303.08484, "r": 480.59375, "b": 311.88181, "coord_origin": "1"}}, {"id": 19, "text": "on HTML (see Table 1).", "bbox": {"l": 134.765, "t": 315.03983, "r": 240.92351000000002, "b": 323.83679, "coord_origin": "1"}}]}, "text": "First and foremost, given the same network configuration, inference time for a table-structure prediction is about 2 times faster compared to the conventional HTML approach. This is primarily owed to the shorter sequence length of the OTSL representation. Additional performance benefits can be obtained with HPO (hyper parameter optimization). As we demonstrate in our experiments, models trained on OTSL can be significantly smaller, e.g. by reducing the number of encoder and decoder layers, while preserving comparatively good prediction quality. This can further improve inference performance, yielding 5-6 times faster inference speed in OTSL with prediction quality comparable to models trained on HTML (see Table 1)."}, {"label": "Text", "id": 4, "page_no": 11, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.82413501739504, "t": 326.87730903625487, "r": 480.59482, "b": 468.2926139831543, "coord_origin": "1"}, "confidence": 0.986232340335846, "cells": [{"id": 20, "text": "Secondly, OTSL has more inherent structure and a significantly restricted vo-", "bbox": {"l": 149.709, "t": 327.79883, "r": 480.58984, "b": 336.5957900000001, "coord_origin": "1"}}, {"id": 21, "text": "cabulary size. 
This allows autoregressive models to perform better in the TED", "bbox": {"l": 134.765, "t": 339.75482, "r": 480.59473, "b": 348.55179, "coord_origin": "1"}}, {"id": 22, "text": "metric, but especially with regards to prediction accuracy of the table-cell bound-", "bbox": {"l": 134.765, "t": 351.70981, "r": 480.58664, "b": 360.50677, "coord_origin": "1"}}, {"id": 23, "text": "ing boxes (see Table 2). As shown in Figure 5, we observe that the OTSL dras-", "bbox": {"l": 134.765, "t": 363.66479, "r": 480.59479, "b": 372.46176, "coord_origin": "1"}}, {"id": 24, "text": "tically reduces the drift for table cell bounding boxes at high row count and in", "bbox": {"l": 134.765, "t": 375.61978, "r": 480.58971999999994, "b": 384.41675, "coord_origin": "1"}}, {"id": 25, "text": "sparse tables. This leads to more accurate predictions and a significant reduction", "bbox": {"l": 134.765, "t": 387.57477, "r": 480.58673, "b": 396.37173, "coord_origin": "1"}}, {"id": 26, "text": "in post-processing complexity, which is an undesired necessity in HTML-based", "bbox": {"l": 134.765, "t": 399.53076, "r": 480.58574999999996, "b": 408.32773, "coord_origin": "1"}}, {"id": 27, "text": "Im2Seq models. Significant novelty lies in OTSL syntactical rules, which are few,", "bbox": {"l": 134.765, "t": 411.48575, "r": 480.58675999999997, "b": 420.28271, "coord_origin": "1"}}, {"id": 28, "text": "simple and always backwards looking. Each new token can be validated only by", "bbox": {"l": 134.765, "t": 423.44073, "r": 480.59482, "b": 432.23769999999996, "coord_origin": "1"}}, {"id": 29, "text": "analyzing the sequence of previous tokens, without requiring the entire sequence", "bbox": {"l": 134.765, "t": 435.39572, "r": 480.58777, "b": 444.19269, "coord_origin": "1"}}, {"id": 30, "text": "to detect mistakes. 
This in return allows to perform structural error detection", "bbox": {"l": 134.765, "t": 447.35071, "r": 480.58968999999996, "b": 456.14767, "coord_origin": "1"}}, {"id": 31, "text": "and correction on-the-fly during sequence generation.", "bbox": {"l": 134.765, "t": 459.30569, "r": 366.77698, "b": 468.10266, "coord_origin": "1"}}]}, "text": "Secondly, OTSL has more inherent structure and a significantly restricted vocabulary size. This allows autoregressive models to perform better in the TED metric, but especially with regards to prediction accuracy of the table-cell bounding boxes (see Table 2). As shown in Figure 5, we observe that the OTSL drastically reduces the drift for table cell bounding boxes at high row count and in sparse tables. This leads to more accurate predictions and a significant reduction in post-processing complexity, which is an undesired necessity in HTML-based Im2Seq models. Significant novelty lies in OTSL syntactical rules, which are few, simple and always backwards looking. Each new token can be validated only by analyzing the sequence of previous tokens, without requiring the entire sequence to detect mistakes. 
This in return allows to perform structural error detection and correction on-the-fly during sequence generation."}, {"label": "Section-header", "id": 5, "page_no": 11, "cluster": {"id": 5, "label": "Section-header", "bbox": {"l": 134.31680746078493, "t": 493.0167823791504, "r": 197.68642, "b": 504.38922, "coord_origin": "1"}, "confidence": 0.952782154083252, "cells": [{"id": 32, "text": "References", "bbox": {"l": 134.765, "t": 493.82083, "r": 197.68642, "b": 504.38922, "coord_origin": "1"}}]}, "text": "References"}, {"label": "List-item", "id": 6, "page_no": 11, "cluster": {"id": 6, "label": "List-item", "bbox": {"l": 139.371, "t": 522.1764713287354, "r": 480.5920100000001, "b": 564.6129249572754, "coord_origin": "1"}, "confidence": 0.9788757562637329, "cells": [{"id": 33, "text": "1.", "bbox": {"l": 139.371, "t": 522.87985, "r": 146.46127, "b": 530.94962, "coord_origin": "1"}}, {"id": 34, "text": "Auer, C., Dolfi, M., Carvalho, A., Ramis, C.B., Staar, P.W.J.: Delivering doc-", "bbox": {"l": 151.01955, "t": 522.87985, "r": 480.5920100000001, "b": 530.94962, "coord_origin": "1"}}, {"id": 35, "text": "ument conversion as a cloud service with high throughput and responsiveness.", "bbox": {"l": 151.51801, "t": 533.83887, "r": 480.58667, "b": 541.90862, "coord_origin": "1"}}, {"id": 36, "text": "CoRR", "bbox": {"l": 151.51801, "t": 544.79785, "r": 176.34149, "b": 552.86761, "coord_origin": "1"}}, {"id": 37, "text": "abs/2206.00785", "bbox": {"l": 179.464, "t": 544.73509, "r": 250.67963, "b": 552.66139, "coord_origin": "1"}}, {"id": 38, "text": "(2022).", "bbox": {"l": 253.804, "t": 544.79785, "r": 281.9567, "b": 552.86761, "coord_origin": "1"}}, {"id": 39, "text": "https://doi.org/10.48550/arXiv.2206.00785", "bbox": {"l": 285.078, "t": 545.44344, "r": 478.03403000000003, "b": 552.91245, "coord_origin": "1"}}, {"id": 40, "text": ",", "bbox": {"l": 478.0319799999999, "t": 544.79785, "r": 480.59099999999995, "b": 552.86761, "coord_origin": "1"}}, {"id": 41, "text": 
"https://doi.org/10.48550/arXiv.2206.00785", "bbox": {"l": 151.51797, "t": 556.4024400000001, "r": 344.474, "b": 563.87144, "coord_origin": "1"}}]}, "text": "1. Auer, C., Dolfi, M., Carvalho, A., Ramis, C.B., Staar, P.W.J.: Delivering document conversion as a cloud service with high throughput and responsiveness. CoRR abs/2206.00785 (2022). https://doi.org/10.48550/arXiv.2206.00785 , https://doi.org/10.48550/arXiv.2206.00785"}, {"label": "List-item", "id": 7, "page_no": 11, "cluster": {"id": 7, "label": "List-item", "bbox": {"l": 138.86715145111086, "t": 566.1212036132812, "r": 480.61741333007814, "b": 609.1713466644287, "coord_origin": "1"}, "confidence": 0.9785996675491333, "cells": [{"id": 42, "text": "2.", "bbox": {"l": 139.37097, "t": 567.51884, "r": 145.94186, "b": 575.58861, "coord_origin": "1"}}, {"id": 43, "text": "Chen, B., Peng, D., Zhang, J., Ren, Y., Jin, L.: Complex table structure recognition", "bbox": {"l": 150.16624, "t": 567.51884, "r": 480.58636, "b": 575.58861, "coord_origin": "1"}}, {"id": 44, "text": "in the wild using transformer and identity matrix-based augmentation. In: Porwal,", "bbox": {"l": 151.51797, "t": 578.47784, "r": 480.59012, "b": 586.5476100000001, "coord_origin": "1"}}, {"id": 45, "text": "U., Forn\u00e9s, A., Shafait, F. (eds.) Frontiers in Handwriting Recognition. pp. 545-", "bbox": {"l": 151.51797, "t": 589.43684, "r": 480.5920100000001, "b": 597.50661, "coord_origin": "1"}}, {"id": 46, "text": "561. Springer International Publishing, Cham (2022)", "bbox": {"l": 151.51797, "t": 600.39584, "r": 364.17856, "b": 608.46561, "coord_origin": "1"}}]}, "text": "2. Chen, B., Peng, D., Zhang, J., Ren, Y., Jin, L.: Complex table structure recognition in the wild using transformer and identity matrix-based augmentation. In: Porwal, U., Forn\u00e9s, A., Shafait, F. (eds.) Frontiers in Handwriting Recognition. pp. 545561. 
Springer International Publishing, Cham (2022)"}, {"label": "List-item", "id": 8, "page_no": 11, "cluster": {"id": 8, "label": "List-item", "bbox": {"l": 138.72738218307495, "t": 610.5866088867188, "r": 480.58731000000006, "b": 631.8376350402832, "coord_origin": "1"}, "confidence": 0.9714517593383789, "cells": [{"id": 47, "text": "3.", "bbox": {"l": 139.37097, "t": 612.1588399999999, "r": 146.4379, "b": 620.22861, "coord_origin": "1"}}, {"id": 48, "text": "Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.L.: Complicated table", "bbox": {"l": 150.98117, "t": 612.1588399999999, "r": 480.58731000000006, "b": 620.22861, "coord_origin": "1"}}, {"id": 49, "text": "structure recognition. arXiv preprint arXiv:1908.04729 (2019)", "bbox": {"l": 151.51797, "t": 623.11784, "r": 400.22525, "b": 631.18761, "coord_origin": "1"}}]}, "text": "3. Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.L.: Complicated table structure recognition. arXiv preprint arXiv:1908.04729 (2019)"}, {"label": "List-item", "id": 9, "page_no": 11, "cluster": {"id": 9, "label": "List-item", "bbox": {"l": 138.95939712524412, "t": 634.1483551025391, "r": 480.58826, "b": 665.3444732666015, "coord_origin": "1"}, "confidence": 0.9804890155792236, "cells": [{"id": 50, "text": "4.", "bbox": {"l": 139.37097, "t": 634.88084, "r": 146.52443, "b": 642.95061, "coord_origin": "1"}}, {"id": 51, "text": "Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific", "bbox": {"l": 151.12335, "t": 634.88084, "r": 480.58826, "b": 642.95061, "coord_origin": "1"}}, {"id": 52, "text": "table recognition. In: 2019 International Conference on Document Analysis and", "bbox": {"l": 151.51797, "t": 645.83984, "r": 480.58752, "b": 653.9096099999999, "coord_origin": "1"}}, {"id": 53, "text": "Recognition (ICDAR). pp. 894-901. IEEE (2019)", "bbox": {"l": 151.51797, "t": 656.79785, "r": 350.11115, "b": 664.86761, "coord_origin": "1"}}]}, "text": "4. 
Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific table recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 894-901. IEEE (2019)"}], "body": [{"label": "Section-header", "id": 1, "page_no": 11, "cluster": {"id": 1, "label": "Section-header", "bbox": {"l": 134.32137451171874, "t": 116.9173613548279, "r": 219.25478999999999, "b": 128.11737000000005, "coord_origin": "1"}, "confidence": 0.9443027973175049, "cells": [{"id": 3, "text": "6", "bbox": {"l": 134.765, "t": 117.54894999999988, "r": 141.4886, "b": 128.11737000000005, "coord_origin": "1"}}, {"id": 4, "text": "Conclusion", "bbox": {"l": 154.9382, "t": 117.54894999999988, "r": 219.25478999999999, "b": 128.11737000000005, "coord_origin": "1"}}]}, "text": "6 Conclusion"}, {"label": "Text", "id": 2, "page_no": 11, "cluster": {"id": 2, "label": "Text", "bbox": {"l": 134.0799774169922, "t": 146.1485215187073, "r": 480.59569999999997, "b": 203.48181, "coord_origin": "1"}, "confidence": 0.9849004745483398, "cells": [{"id": 5, "text": "We demonstrated that representing tables in HTML for the task of table struc-", "bbox": {"l": 134.765, "t": 146.86377000000005, "r": 480.59476, "b": 155.66076999999996, "coord_origin": "1"}}, {"id": 6, "text": "ture recognition with Im2Seq models is ill-suited and has serious limitations.", "bbox": {"l": 134.765, "t": 158.81879000000004, "r": 480.59476, "b": 167.61577999999997, "coord_origin": "1"}}, {"id": 7, "text": "Furthermore, we presented in this paper an Optimized Table Structure Language", "bbox": {"l": 134.765, "t": 170.77380000000005, "r": 480.58978, "b": 179.57079999999996, "coord_origin": "1"}}, {"id": 8, "text": "(OTSL) which, when compared to commonly used general purpose languages,", "bbox": {"l": 134.765, "t": 182.72979999999995, "r": 480.59569999999997, "b": 191.52679, "coord_origin": "1"}}, {"id": 9, "text": "has several key benefits.", "bbox": {"l": 134.765, "t": 194.68480999999997, "r": 239.5387, 
"b": 203.48181, "coord_origin": "1"}}]}, "text": "We demonstrated that representing tables in HTML for the task of table structure recognition with Im2Seq models is ill-suited and has serious limitations. Furthermore, we presented in this paper an Optimized Table Structure Language (OTSL) which, when compared to commonly used general purpose languages, has several key benefits."}, {"label": "Text", "id": 3, "page_no": 11, "cluster": {"id": 3, "label": "Text", "bbox": {"l": 133.63015937805176, "t": 206.26369628906252, "r": 480.64513664245607, "b": 324.5816196441651, "coord_origin": "1"}, "confidence": 0.9870830178260803, "cells": [{"id": 10, "text": "First and foremost, given the same network configuration, inference time for", "bbox": {"l": 149.709, "t": 207.44379000000004, "r": 480.59283000000005, "b": 216.24077999999997, "coord_origin": "1"}}, {"id": 11, "text": "a table-structure prediction is about 2 times faster compared to the conventional", "bbox": {"l": 134.765, "t": 219.39880000000005, "r": 480.59365999999994, "b": 228.19579999999996, "coord_origin": "1"}}, {"id": 12, "text": "HTML approach. This is primarily owed to the shorter sequence length of the", "bbox": {"l": 134.765, "t": 231.35382000000004, "r": 480.59079, "b": 240.15081999999995, "coord_origin": "1"}}, {"id": 13, "text": "OTSL representation. Additional performance benefits can be obtained with", "bbox": {"l": 134.765, "t": 243.30884000000003, "r": 480.58786000000003, "b": 252.10582999999997, "coord_origin": "1"}}, {"id": 14, "text": "HPO (hyper parameter optimization). As we demonstrate in our experiments,", "bbox": {"l": 134.765, "t": 255.26482999999996, "r": 480.59479, "b": 264.06183, "coord_origin": "1"}}, {"id": 15, "text": "models trained on OTSL can be significantly smaller, e.g. 
by reducing the number", "bbox": {"l": 134.765, "t": 267.21984999999995, "r": 480.5878000000001, "b": 276.01685, "coord_origin": "1"}}, {"id": 16, "text": "of encoder and decoder layers, while preserving comparatively good prediction", "bbox": {"l": 134.765, "t": 279.17487000000006, "r": 480.59268, "b": 287.97183, "coord_origin": "1"}}, {"id": 17, "text": "quality. This can further improve inference performance, yielding 5-6 times faster", "bbox": {"l": 134.765, "t": 291.12985, "r": 480.58871, "b": 299.92682, "coord_origin": "1"}}, {"id": 18, "text": "inference speed in OTSL with prediction quality comparable to models trained", "bbox": {"l": 134.765, "t": 303.08484, "r": 480.59375, "b": 311.88181, "coord_origin": "1"}}, {"id": 19, "text": "on HTML (see Table 1).", "bbox": {"l": 134.765, "t": 315.03983, "r": 240.92351000000002, "b": 323.83679, "coord_origin": "1"}}]}, "text": "First and foremost, given the same network configuration, inference time for a table-structure prediction is about 2 times faster compared to the conventional HTML approach. This is primarily owed to the shorter sequence length of the OTSL representation. Additional performance benefits can be obtained with HPO (hyper parameter optimization). As we demonstrate in our experiments, models trained on OTSL can be significantly smaller, e.g. by reducing the number of encoder and decoder layers, while preserving comparatively good prediction quality. 
This can further improve inference performance, yielding 5-6 times faster inference speed in OTSL with prediction quality comparable to models trained on HTML (see Table 1)."}, {"label": "Text", "id": 4, "page_no": 11, "cluster": {"id": 4, "label": "Text", "bbox": {"l": 133.82413501739504, "t": 326.87730903625487, "r": 480.59482, "b": 468.2926139831543, "coord_origin": "1"}, "confidence": 0.986232340335846, "cells": [{"id": 20, "text": "Secondly, OTSL has more inherent structure and a significantly restricted vo-", "bbox": {"l": 149.709, "t": 327.79883, "r": 480.58984, "b": 336.5957900000001, "coord_origin": "1"}}, {"id": 21, "text": "cabulary size. This allows autoregressive models to perform better in the TED", "bbox": {"l": 134.765, "t": 339.75482, "r": 480.59473, "b": 348.55179, "coord_origin": "1"}}, {"id": 22, "text": "metric, but especially with regards to prediction accuracy of the table-cell bound-", "bbox": {"l": 134.765, "t": 351.70981, "r": 480.58664, "b": 360.50677, "coord_origin": "1"}}, {"id": 23, "text": "ing boxes (see Table 2). As shown in Figure 5, we observe that the OTSL dras-", "bbox": {"l": 134.765, "t": 363.66479, "r": 480.59479, "b": 372.46176, "coord_origin": "1"}}, {"id": 24, "text": "tically reduces the drift for table cell bounding boxes at high row count and in", "bbox": {"l": 134.765, "t": 375.61978, "r": 480.58971999999994, "b": 384.41675, "coord_origin": "1"}}, {"id": 25, "text": "sparse tables. This leads to more accurate predictions and a significant reduction", "bbox": {"l": 134.765, "t": 387.57477, "r": 480.58673, "b": 396.37173, "coord_origin": "1"}}, {"id": 26, "text": "in post-processing complexity, which is an undesired necessity in HTML-based", "bbox": {"l": 134.765, "t": 399.53076, "r": 480.58574999999996, "b": 408.32773, "coord_origin": "1"}}, {"id": 27, "text": "Im2Seq models. 
Significant novelty lies in OTSL syntactical rules, which are few,", "bbox": {"l": 134.765, "t": 411.48575, "r": 480.58675999999997, "b": 420.28271, "coord_origin": "1"}}, {"id": 28, "text": "simple and always backwards looking. Each new token can be validated only by", "bbox": {"l": 134.765, "t": 423.44073, "r": 480.59482, "b": 432.23769999999996, "coord_origin": "1"}}, {"id": 29, "text": "analyzing the sequence of previous tokens, without requiring the entire sequence", "bbox": {"l": 134.765, "t": 435.39572, "r": 480.58777, "b": 444.19269, "coord_origin": "1"}}, {"id": 30, "text": "to detect mistakes. This in return allows to perform structural error detection", "bbox": {"l": 134.765, "t": 447.35071, "r": 480.58968999999996, "b": 456.14767, "coord_origin": "1"}}, {"id": 31, "text": "and correction on-the-fly during sequence generation.", "bbox": {"l": 134.765, "t": 459.30569, "r": 366.77698, "b": 468.10266, "coord_origin": "1"}}]}, "text": "Secondly, OTSL has more inherent structure and a significantly restricted vocabulary size. This allows autoregressive models to perform better in the TED metric, but especially with regards to prediction accuracy of the table-cell bounding boxes (see Table 2). As shown in Figure 5, we observe that the OTSL drastically reduces the drift for table cell bounding boxes at high row count and in sparse tables. This leads to more accurate predictions and a significant reduction in post-processing complexity, which is an undesired necessity in HTML-based Im2Seq models. Significant novelty lies in OTSL syntactical rules, which are few, simple and always backwards looking. Each new token can be validated only by analyzing the sequence of previous tokens, without requiring the entire sequence to detect mistakes. 
This in return allows to perform structural error detection and correction on-the-fly during sequence generation."}, {"label": "Section-header", "id": 5, "page_no": 11, "cluster": {"id": 5, "label": "Section-header", "bbox": {"l": 134.31680746078493, "t": 493.0167823791504, "r": 197.68642, "b": 504.38922, "coord_origin": "1"}, "confidence": 0.952782154083252, "cells": [{"id": 32, "text": "References", "bbox": {"l": 134.765, "t": 493.82083, "r": 197.68642, "b": 504.38922, "coord_origin": "1"}}]}, "text": "References"}, {"label": "List-item", "id": 6, "page_no": 11, "cluster": {"id": 6, "label": "List-item", "bbox": {"l": 139.371, "t": 522.1764713287354, "r": 480.5920100000001, "b": 564.6129249572754, "coord_origin": "1"}, "confidence": 0.9788757562637329, "cells": [{"id": 33, "text": "1.", "bbox": {"l": 139.371, "t": 522.87985, "r": 146.46127, "b": 530.94962, "coord_origin": "1"}}, {"id": 34, "text": "Auer, C., Dolfi, M., Carvalho, A., Ramis, C.B., Staar, P.W.J.: Delivering doc-", "bbox": {"l": 151.01955, "t": 522.87985, "r": 480.5920100000001, "b": 530.94962, "coord_origin": "1"}}, {"id": 35, "text": "ument conversion as a cloud service with high throughput and responsiveness.", "bbox": {"l": 151.51801, "t": 533.83887, "r": 480.58667, "b": 541.90862, "coord_origin": "1"}}, {"id": 36, "text": "CoRR", "bbox": {"l": 151.51801, "t": 544.79785, "r": 176.34149, "b": 552.86761, "coord_origin": "1"}}, {"id": 37, "text": "abs/2206.00785", "bbox": {"l": 179.464, "t": 544.73509, "r": 250.67963, "b": 552.66139, "coord_origin": "1"}}, {"id": 38, "text": "(2022).", "bbox": {"l": 253.804, "t": 544.79785, "r": 281.9567, "b": 552.86761, "coord_origin": "1"}}, {"id": 39, "text": "https://doi.org/10.48550/arXiv.2206.00785", "bbox": {"l": 285.078, "t": 545.44344, "r": 478.03403000000003, "b": 552.91245, "coord_origin": "1"}}, {"id": 40, "text": ",", "bbox": {"l": 478.0319799999999, "t": 544.79785, "r": 480.59099999999995, "b": 552.86761, "coord_origin": "1"}}, {"id": 41, "text": 
"https://doi.org/10.48550/arXiv.2206.00785", "bbox": {"l": 151.51797, "t": 556.4024400000001, "r": 344.474, "b": 563.87144, "coord_origin": "1"}}]}, "text": "1. Auer, C., Dolfi, M., Carvalho, A., Ramis, C.B., Staar, P.W.J.: Delivering document conversion as a cloud service with high throughput and responsiveness. CoRR abs/2206.00785 (2022). https://doi.org/10.48550/arXiv.2206.00785 , https://doi.org/10.48550/arXiv.2206.00785"}, {"label": "List-item", "id": 7, "page_no": 11, "cluster": {"id": 7, "label": "List-item", "bbox": {"l": 138.86715145111086, "t": 566.1212036132812, "r": 480.61741333007814, "b": 609.1713466644287, "coord_origin": "1"}, "confidence": 0.9785996675491333, "cells": [{"id": 42, "text": "2.", "bbox": {"l": 139.37097, "t": 567.51884, "r": 145.94186, "b": 575.58861, "coord_origin": "1"}}, {"id": 43, "text": "Chen, B., Peng, D., Zhang, J., Ren, Y., Jin, L.: Complex table structure recognition", "bbox": {"l": 150.16624, "t": 567.51884, "r": 480.58636, "b": 575.58861, "coord_origin": "1"}}, {"id": 44, "text": "in the wild using transformer and identity matrix-based augmentation. In: Porwal,", "bbox": {"l": 151.51797, "t": 578.47784, "r": 480.59012, "b": 586.5476100000001, "coord_origin": "1"}}, {"id": 45, "text": "U., Forn\u00e9s, A., Shafait, F. (eds.) Frontiers in Handwriting Recognition. pp. 545-", "bbox": {"l": 151.51797, "t": 589.43684, "r": 480.5920100000001, "b": 597.50661, "coord_origin": "1"}}, {"id": 46, "text": "561. Springer International Publishing, Cham (2022)", "bbox": {"l": 151.51797, "t": 600.39584, "r": 364.17856, "b": 608.46561, "coord_origin": "1"}}]}, "text": "2. Chen, B., Peng, D., Zhang, J., Ren, Y., Jin, L.: Complex table structure recognition in the wild using transformer and identity matrix-based augmentation. In: Porwal, U., Forn\u00e9s, A., Shafait, F. (eds.) Frontiers in Handwriting Recognition. pp. 545561. 
Springer International Publishing, Cham (2022)"}, {"label": "List-item", "id": 8, "page_no": 11, "cluster": {"id": 8, "label": "List-item", "bbox": {"l": 138.72738218307495, "t": 610.5866088867188, "r": 480.58731000000006, "b": 631.8376350402832, "coord_origin": "1"}, "confidence": 0.9714517593383789, "cells": [{"id": 47, "text": "3.", "bbox": {"l": 139.37097, "t": 612.1588399999999, "r": 146.4379, "b": 620.22861, "coord_origin": "1"}}, {"id": 48, "text": "Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.L.: Complicated table", "bbox": {"l": 150.98117, "t": 612.1588399999999, "r": 480.58731000000006, "b": 620.22861, "coord_origin": "1"}}, {"id": 49, "text": "structure recognition. arXiv preprint arXiv:1908.04729 (2019)", "bbox": {"l": 151.51797, "t": 623.11784, "r": 400.22525, "b": 631.18761, "coord_origin": "1"}}]}, "text": "3. Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.L.: Complicated table structure recognition. arXiv preprint arXiv:1908.04729 (2019)"}, {"label": "List-item", "id": 9, "page_no": 11, "cluster": {"id": 9, "label": "List-item", "bbox": {"l": 138.95939712524412, "t": 634.1483551025391, "r": 480.58826, "b": 665.3444732666015, "coord_origin": "1"}, "confidence": 0.9804890155792236, "cells": [{"id": 50, "text": "4.", "bbox": {"l": 139.37097, "t": 634.88084, "r": 146.52443, "b": 642.95061, "coord_origin": "1"}}, {"id": 51, "text": "Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific", "bbox": {"l": 151.12335, "t": 634.88084, "r": 480.58826, "b": 642.95061, "coord_origin": "1"}}, {"id": 52, "text": "table recognition. In: 2019 International Conference on Document Analysis and", "bbox": {"l": 151.51797, "t": 645.83984, "r": 480.58752, "b": 653.9096099999999, "coord_origin": "1"}}, {"id": 53, "text": "Recognition (ICDAR). pp. 894-901. IEEE (2019)", "bbox": {"l": 151.51797, "t": 656.79785, "r": 350.11115, "b": 664.86761, "coord_origin": "1"}}]}, "text": "4. 
Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific table recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 894-901. IEEE (2019)"}], "headers": [{"label": "Page-header", "id": 0, "page_no": 11, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.6935380935669, "t": 93.01469650268552, "r": 231.72049000000004, "b": 101.84788713455202, "coord_origin": "1"}, "confidence": 0.6001661419868469, "cells": [{"id": 0, "text": "12", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "12 M. Lysak, et al."}]}}, {"page_no": 12, "page_hash": "467ed0563b555b6fd2a0bd2e4a7bf596c066b8f08d2e1fd33f6c6d8b1c445759", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "13", "bbox": {"l": 471.37561, "t": 93.77099999999996, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "5.", "bbox": {"l": 139.371, "t": 119.67400999999995, "r": 146.04857, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 3, "text": "Kayal, P., Anand, M., Desai, H., Singh, M.: Tables to latex: structure and content", "bbox": {"l": 150.34157, "t": 119.67400999999995, "r": 480.58826, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 4, "text": "extraction from scientific tables. 
International Journal on Document Analysis and", "bbox": {"l": 151.51801, "t": 130.63300000000004, "r": 480.59479, "b": 138.70270000000005, "coord_origin": "1"}}, {"id": 5, "text": "Recognition (IJDAR) pp. 1-10 (2022)", "bbox": {"l": 151.51801, "t": 141.59198000000004, "r": 304.04364, "b": 149.66168000000005, "coord_origin": "1"}}, {"id": 6, "text": "6.", "bbox": {"l": 139.371, "t": 152.56195000000002, "r": 145.93991, "b": 160.63165000000004, "coord_origin": "1"}}, {"id": 7, "text": "Lee, E., Kwon, J., Yang, H., Park, J., Lee, S., Koo, H.I., Cho, N.I.: Table structure", "bbox": {"l": 150.16298, "t": 152.56195000000002, "r": 480.59015, "b": 160.63165000000004, "coord_origin": "1"}}, {"id": 8, "text": "recognition based on grid shape graph. In: 2022 Asia-Pacific Signal and Information", "bbox": {"l": 151.51801, "t": 163.52094, "r": 480.5903, "b": 171.59064, "coord_origin": "1"}}, {"id": 9, "text": "Processing Association Annual Summit and Conference (APSIPA ASC). pp. 1868-", "bbox": {"l": 151.51801, "t": 174.47992, "r": 480.59286000000003, "b": 182.54962, "coord_origin": "1"}}, {"id": 10, "text": "1873. 
IEEE (2022)", "bbox": {"l": 151.51801, "t": 185.4389, "r": 226.37399, "b": 193.50860999999998, "coord_origin": "1"}}, {"id": 11, "text": "7.", "bbox": {"l": 139.371, "t": 196.40886999999998, "r": 146.31418, "b": 204.47857999999997, "coord_origin": "1"}}, {"id": 12, "text": "Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: Tablebank: A benchmark", "bbox": {"l": 150.77789, "t": 196.40886999999998, "r": 480.59012, "b": 204.47857999999997, "coord_origin": "1"}}, {"id": 13, "text": "dataset for table detection and recognition (2019)", "bbox": {"l": 151.51801, "t": 207.36785999999995, "r": 352.01746, "b": 215.43755999999996, "coord_origin": "1"}}, {"id": 14, "text": "8.", "bbox": {"l": 139.371, "t": 218.33887000000004, "r": 146.37106, "b": 226.40857000000005, "coord_origin": "1"}}, {"id": 15, "text": "Livathinos, N., Berrospi, C., Lysak, M., Kuropiatnyk, V., Nassar, A., Carvalho,", "bbox": {"l": 150.87132, "t": 218.33887000000004, "r": 480.58731000000006, "b": 226.40857000000005, "coord_origin": "1"}}, {"id": 16, "text": "A., Dolfi, M., Auer, C., Dinkla, K., Staar, P.: Robust pdf document conversion", "bbox": {"l": 151.51801, "t": 229.29785000000004, "r": 480.59020999999996, "b": 237.36755000000005, "coord_origin": "1"}}, {"id": 17, "text": "using recurrent neural networks. 
Proceedings of the AAAI Conference on Artificial", "bbox": {"l": 151.51801, "t": 240.25684, "r": 480.59473, "b": 248.32654000000002, "coord_origin": "1"}}, {"id": 18, "text": "Intelligence", "bbox": {"l": 151.51801, "t": 251.21582, "r": 197.08617, "b": 259.28552, "coord_origin": "1"}}, {"id": 19, "text": "35", "bbox": {"l": 199.40001, "t": 251.15301999999997, "r": 210.00726, "b": 259.07935, "coord_origin": "1"}}, {"id": 20, "text": "(17), 15137-15145 (May 2021),", "bbox": {"l": 210.007, "t": 251.21582, "r": 332.37683, "b": 259.28552, "coord_origin": "1"}}, {"id": 21, "text": "https://ojs.aaai.org/index.php/", "bbox": {"l": 334.69901, "t": 251.86139000000003, "r": 480.59039000000007, "b": 259.33038, "coord_origin": "1"}}, {"id": 22, "text": "AAAI/article/view/17777", "bbox": {"l": 151.51801, "t": 262.8194, "r": 259.75769, "b": 270.28839000000005, "coord_origin": "1"}}, {"id": 23, "text": "9.", "bbox": {"l": 139.371, "t": 273.14484000000004, "r": 146.14218, "b": 281.21457, "coord_origin": "1"}}, {"id": 24, "text": "Nassar, A., Livathinos, N., Lysak, M., Staar, P.: Tableformer: Table structure un-", "bbox": {"l": 150.49533, "t": 273.14484000000004, "r": 480.5881999999999, "b": 281.21457, "coord_origin": "1"}}, {"id": 25, "text": "derstanding with transformers. In: Proceedings of the IEEE/CVF Conference on", "bbox": {"l": 151.51801, "t": 284.10379, "r": 480.59387000000004, "b": 292.17355, "coord_origin": "1"}}, {"id": 26, "text": "Computer Vision and Pattern Recognition (CVPR). pp. 
4614-4623 (June 2022)", "bbox": {"l": 151.51801, "t": 295.06277, "r": 473.44308000000007, "b": 303.13254, "coord_origin": "1"}}, {"id": 27, "text": "10.", "bbox": {"l": 134.76401, "t": 306.03277999999995, "r": 146.49922, "b": 314.10254000000003, "coord_origin": "1"}}, {"id": 28, "text": "Pfitzmann, B., Auer, C., Dolfi, M., Nassar, A.S., Staar, P.W.J.: Doclaynet: A", "bbox": {"l": 151.09138, "t": 306.03277999999995, "r": 480.58905, "b": 314.10254000000003, "coord_origin": "1"}}, {"id": 29, "text": "large human-annotated dataset for document-layout segmentation. In: Zhang, A.,", "bbox": {"l": 151.51801, "t": 316.99179, "r": 480.59015, "b": 325.06155, "coord_origin": "1"}}, {"id": 30, "text": "Rangwala, H. (eds.) KDD \u201922: The 28th ACM SIGKDD Conference on Knowledge", "bbox": {"l": 151.51801, "t": 327.95078, "r": 480.59113, "b": 336.02054, "coord_origin": "1"}}, {"id": 31, "text": "Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022. pp.", "bbox": {"l": 151.51801, "t": 338.90976, "r": 480.59113, "b": 346.97952, "coord_origin": "1"}}, {"id": 32, "text": "3743-3751. 
ACM (2022).", "bbox": {"l": 151.51801, "t": 349.86874, "r": 251.14098999999996, "b": 357.93851, "coord_origin": "1"}}, {"id": 33, "text": "https://doi.org/10.1145/3534678.3539043", "bbox": {"l": 253.99001, "t": 350.5143100000001, "r": 437.53311, "b": 357.98333999999994, "coord_origin": "1"}}, {"id": 34, "text": ",", "bbox": {"l": 437.53201, "t": 349.86874, "r": 440.09102999999993, "b": 357.93851, "coord_origin": "1"}}, {"id": 35, "text": "https://", "bbox": {"l": 442.94202000000007, "t": 350.5143100000001, "r": 480.59372, "b": 357.98333999999994, "coord_origin": "1"}}, {"id": 36, "text": "doi.org/10.1145/3534678.3539043", "bbox": {"l": 151.51801, "t": 361.47329999999994, "r": 297.40939, "b": 368.94232, "coord_origin": "1"}}, {"id": 37, "text": "11.", "bbox": {"l": 134.76401, "t": 371.79773, "r": 146.03854, "b": 379.86749, "coord_origin": "1"}}, {"id": 38, "text": "Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: Cascadetabnet:", "bbox": {"l": 150.4505, "t": 371.79773, "r": 480.58914000000004, "b": 379.86749, "coord_origin": "1"}}, {"id": 39, "text": "An approach for end to end table detection and structure recognition from image-", "bbox": {"l": 151.51801, "t": 382.7567399999999, "r": 480.59296, "b": 390.82651, "coord_origin": "1"}}, {"id": 40, "text": "based documents. In: Proceedings of the IEEE/CVF conference on computer vision", "bbox": {"l": 151.51801, "t": 393.71573, "r": 480.59293, "b": 401.78549, "coord_origin": "1"}}, {"id": 41, "text": "and pattern recognition workshops. pp. 
572-573 (2020)", "bbox": {"l": 151.51801, "t": 404.67471, "r": 373.82727, "b": 412.74448, "coord_origin": "1"}}, {"id": 42, "text": "12.", "bbox": {"l": 134.76401, "t": 415.64471, "r": 145.91106, "b": 423.71448000000004, "coord_origin": "1"}}, {"id": 43, "text": "Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: Deepdesrt: Deep learning", "bbox": {"l": 150.27309, "t": 415.64471, "r": 480.5874, "b": 423.71448000000004, "coord_origin": "1"}}, {"id": 44, "text": "for detection and structure recognition of tables in document images. In: 2017 14th", "bbox": {"l": 151.51801, "t": 426.60373, "r": 480.59469999999993, "b": 434.67349, "coord_origin": "1"}}, {"id": 45, "text": "IAPR international conference on document analysis and recognition (ICDAR).", "bbox": {"l": 151.51801, "t": 437.5627099999999, "r": 480.58844, "b": 445.63248, "coord_origin": "1"}}, {"id": 46, "text": "vol. 1, pp. 1162-1167. IEEE (2017)", "bbox": {"l": 151.51801, "t": 448.5217, "r": 292.91455, "b": 456.59146, "coord_origin": "1"}}, {"id": 47, "text": "13.", "bbox": {"l": 134.76401, "t": 459.4917, "r": 145.7785, "b": 467.56146, "coord_origin": "1"}}, {"id": 48, "text": "Siddiqui, S.A., Fateh, I.A., Rizvi, S.T.R., Dengel, A., Ahmed, S.: Deeptabstr: Deep", "bbox": {"l": 150.08871, "t": 459.4917, "r": 480.59006, "b": 467.56146, "coord_origin": "1"}}, {"id": 49, "text": "learning based table structure recognition. In: 2019 International Conference on", "bbox": {"l": 151.51801, "t": 470.45071, "r": 480.59116, "b": 478.52048, "coord_origin": "1"}}, {"id": 50, "text": "Document Analysis and Recognition (ICDAR). pp. 
1403-1409 (2019).", "bbox": {"l": 151.51801, "t": 481.4097, "r": 439.05963, "b": 489.47946, "coord_origin": "1"}}, {"id": 51, "text": "https://", "bbox": {"l": 442.94202000000007, "t": 482.05527, "r": 480.59372, "b": 489.52429, "coord_origin": "1"}}, {"id": 52, "text": "doi.org/10.1109/ICDAR.2019.00226", "bbox": {"l": 151.51801, "t": 493.01425, "r": 302.11584, "b": 500.48328, "coord_origin": "1"}}, {"id": 53, "text": "14.", "bbox": {"l": 134.76401, "t": 503.33868, "r": 146.15501, "b": 511.40845, "coord_origin": "1"}}, {"id": 54, "text": "Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive ta-", "bbox": {"l": 150.61252, "t": 503.33868, "r": 480.59088, "b": 511.40845, "coord_origin": "1"}}, {"id": 55, "text": "ble extraction from unstructured documents. In: Proceedings of the IEEE/CVF", "bbox": {"l": 151.51801, "t": 514.2977000000001, "r": 480.59286000000003, "b": 522.3674599999999, "coord_origin": "1"}}, {"id": 56, "text": "Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4634-4642", "bbox": {"l": 151.51801, "t": 525.25668, "r": 480.58838000000003, "b": 533.32645, "coord_origin": "1"}}, {"id": 57, "text": "(June 2022)", "bbox": {"l": 151.51801, "t": 536.21568, "r": 199.24704, "b": 544.28545, "coord_origin": "1"}}, {"id": 58, "text": "15.", "bbox": {"l": 134.76401, "t": 547.18568, "r": 146.16588, "b": 555.25545, "coord_origin": "1"}}, {"id": 59, "text": "Staar, P.W.J., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A ma-", "bbox": {"l": 150.62764, "t": 547.18568, "r": 480.58734000000004, "b": 555.25545, "coord_origin": "1"}}, {"id": 60, "text": "chine learning platform to ingest documents at scale. 
In: Proceedings of the 24th", "bbox": {"l": 151.51801, "t": 558.14468, "r": 480.58838000000003, "b": 566.2144499999999, "coord_origin": "1"}}, {"id": 61, "text": "ACM SIGKDD International Conference on Knowledge Discovery & Data Min-", "bbox": {"l": 151.51801, "t": 569.1036799999999, "r": 480.59109, "b": 577.17345, "coord_origin": "1"}}, {"id": 62, "text": "ing. pp. 774-782. KDD \u201918, Association for Computing Machinery, New York, NY,", "bbox": {"l": 151.51801, "t": 580.06268, "r": 480.59195, "b": 588.1324500000001, "coord_origin": "1"}}, {"id": 63, "text": "USA (2018).", "bbox": {"l": 151.51801, "t": 591.0216800000001, "r": 200.75787, "b": 599.09145, "coord_origin": "1"}}, {"id": 64, "text": "https://doi.org/10.1145/3219819.3219834", "bbox": {"l": 202.916, "t": 591.66727, "r": 386.45911, "b": 599.1362799999999, "coord_origin": "1"}}, {"id": 65, "text": ",", "bbox": {"l": 386.45801, "t": 591.0216800000001, "r": 389.01703, "b": 599.09145, "coord_origin": "1"}}, {"id": 66, "text": "https://doi.org/10.", "bbox": {"l": 391.173, "t": 591.66727, "r": 480.59583, "b": 599.1362799999999, "coord_origin": "1"}}, {"id": 67, "text": "1145/3219819.3219834", "bbox": {"l": 151.51801, "t": 602.62627, "r": 245.63831, "b": 610.09528, "coord_origin": "1"}}, {"id": 68, "text": "16.", "bbox": {"l": 134.76401, "t": 612.95068, "r": 146.62019, "b": 621.02045, "coord_origin": "1"}}, {"id": 69, "text": "Wang, X.: Tabular Abstraction, Editing, and Formatting. Ph.D. 
thesis, CAN", "bbox": {"l": 151.25977, "t": 612.95068, "r": 480.59542999999996, "b": 621.02045, "coord_origin": "1"}}, {"id": 70, "text": "(1996), aAINN09397", "bbox": {"l": 151.51801, "t": 623.90968, "r": 234.43031, "b": 631.97945, "coord_origin": "1"}}, {"id": 71, "text": "17.", "bbox": {"l": 134.76401, "t": 634.87968, "r": 146.30539, "b": 642.9494500000001, "coord_origin": "1"}}, {"id": 72, "text": "Xue, W., Li, Q., Tao, D.: Res2tim: Reconstruct syntactic structures from table", "bbox": {"l": 150.82175, "t": 634.87968, "r": 480.58731000000006, "b": 642.9494500000001, "coord_origin": "1"}}, {"id": 73, "text": "images. In: 2019 International Conference on Document Analysis and Recognition", "bbox": {"l": 151.51801, "t": 645.8386800000001, "r": 480.59119, "b": 653.90845, "coord_origin": "1"}}, {"id": 74, "text": "(ICDAR). pp. 749-755. IEEE (2019)", "bbox": {"l": 151.51801, "t": 656.79768, "r": 299.30307, "b": 664.86745, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 194.0724666595459, "t": 93.14808425903323, "r": 447.54291000000006, "b": 102.36713447570799, "coord_origin": "1"}, "confidence": 0.9549390077590942, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "Page-header", "bbox": {"l": 471.1661275863647, "t": 93.57991390228267, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.9042201042175293, "cells": [{"id": 1, "text": "13", "bbox": {"l": 471.37561, "t": 93.77099999999996, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 2, "label": "List-item", "bbox": {"l": 138.69608402252197, "t": 119.06796512603762, "r": 480.59479, "b": 150.90859880447385, "coord_origin": "1"}, "confidence": 0.976601779460907, "cells": [{"id": 2, "text": "5.", "bbox": {"l": 
139.371, "t": 119.67400999999995, "r": 146.04857, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 3, "text": "Kayal, P., Anand, M., Desai, H., Singh, M.: Tables to latex: structure and content", "bbox": {"l": 150.34157, "t": 119.67400999999995, "r": 480.58826, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 4, "text": "extraction from scientific tables. International Journal on Document Analysis and", "bbox": {"l": 151.51801, "t": 130.63300000000004, "r": 480.59479, "b": 138.70270000000005, "coord_origin": "1"}}, {"id": 5, "text": "Recognition (IJDAR) pp. 1-10 (2022)", "bbox": {"l": 151.51801, "t": 141.59198000000004, "r": 304.04364, "b": 149.66168000000005, "coord_origin": "1"}}]}, {"id": 3, "label": "List-item", "bbox": {"l": 138.54494819641113, "t": 151.70322275161743, "r": 480.75314083099363, "b": 193.50860999999998, "coord_origin": "1"}, "confidence": 0.9813704490661621, "cells": [{"id": 6, "text": "6.", "bbox": {"l": 139.371, "t": 152.56195000000002, "r": 145.93991, "b": 160.63165000000004, "coord_origin": "1"}}, {"id": 7, "text": "Lee, E., Kwon, J., Yang, H., Park, J., Lee, S., Koo, H.I., Cho, N.I.: Table structure", "bbox": {"l": 150.16298, "t": 152.56195000000002, "r": 480.59015, "b": 160.63165000000004, "coord_origin": "1"}}, {"id": 8, "text": "recognition based on grid shape graph. In: 2022 Asia-Pacific Signal and Information", "bbox": {"l": 151.51801, "t": 163.52094, "r": 480.5903, "b": 171.59064, "coord_origin": "1"}}, {"id": 9, "text": "Processing Association Annual Summit and Conference (APSIPA ASC). pp. 1868-", "bbox": {"l": 151.51801, "t": 174.47992, "r": 480.59286000000003, "b": 182.54962, "coord_origin": "1"}}, {"id": 10, "text": "1873. 
IEEE (2022)", "bbox": {"l": 151.51801, "t": 185.4389, "r": 226.37399, "b": 193.50860999999998, "coord_origin": "1"}}]}, {"id": 4, "label": "List-item", "bbox": {"l": 139.07085943222046, "t": 195.3876846313476, "r": 480.59012, "b": 215.5838447570801, "coord_origin": "1"}, "confidence": 0.9723377227783203, "cells": [{"id": 11, "text": "7.", "bbox": {"l": 139.371, "t": 196.40886999999998, "r": 146.31418, "b": 204.47857999999997, "coord_origin": "1"}}, {"id": 12, "text": "Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: Tablebank: A benchmark", "bbox": {"l": 150.77789, "t": 196.40886999999998, "r": 480.59012, "b": 204.47857999999997, "coord_origin": "1"}}, {"id": 13, "text": "dataset for table detection and recognition (2019)", "bbox": {"l": 151.51801, "t": 207.36785999999995, "r": 352.01746, "b": 215.43755999999996, "coord_origin": "1"}}]}, {"id": 5, "label": "List-item", "bbox": {"l": 138.5443937301636, "t": 217.49708290100102, "r": 480.8269432067871, "b": 270.28839000000005, "coord_origin": "1"}, "confidence": 0.9826642274856567, "cells": [{"id": 14, "text": "8.", "bbox": {"l": 139.371, "t": 218.33887000000004, "r": 146.37106, "b": 226.40857000000005, "coord_origin": "1"}}, {"id": 15, "text": "Livathinos, N., Berrospi, C., Lysak, M., Kuropiatnyk, V., Nassar, A., Carvalho,", "bbox": {"l": 150.87132, "t": 218.33887000000004, "r": 480.58731000000006, "b": 226.40857000000005, "coord_origin": "1"}}, {"id": 16, "text": "A., Dolfi, M., Auer, C., Dinkla, K., Staar, P.: Robust pdf document conversion", "bbox": {"l": 151.51801, "t": 229.29785000000004, "r": 480.59020999999996, "b": 237.36755000000005, "coord_origin": "1"}}, {"id": 17, "text": "using recurrent neural networks. 
Proceedings of the AAAI Conference on Artificial", "bbox": {"l": 151.51801, "t": 240.25684, "r": 480.59473, "b": 248.32654000000002, "coord_origin": "1"}}, {"id": 18, "text": "Intelligence", "bbox": {"l": 151.51801, "t": 251.21582, "r": 197.08617, "b": 259.28552, "coord_origin": "1"}}, {"id": 19, "text": "35", "bbox": {"l": 199.40001, "t": 251.15301999999997, "r": 210.00726, "b": 259.07935, "coord_origin": "1"}}, {"id": 20, "text": "(17), 15137-15145 (May 2021),", "bbox": {"l": 210.007, "t": 251.21582, "r": 332.37683, "b": 259.28552, "coord_origin": "1"}}, {"id": 21, "text": "https://ojs.aaai.org/index.php/", "bbox": {"l": 334.69901, "t": 251.86139000000003, "r": 480.59039000000007, "b": 259.33038, "coord_origin": "1"}}, {"id": 22, "text": "AAAI/article/view/17777", "bbox": {"l": 151.51801, "t": 262.8194, "r": 259.75769, "b": 270.28839000000005, "coord_origin": "1"}}]}, {"id": 6, "label": "List-item", "bbox": {"l": 138.21877613067628, "t": 272.1957120895386, "r": 480.59387000000004, "b": 304.09056758880615, "coord_origin": "1"}, "confidence": 0.9749984741210938, "cells": [{"id": 23, "text": "9.", "bbox": {"l": 139.371, "t": 273.14484000000004, "r": 146.14218, "b": 281.21457, "coord_origin": "1"}}, {"id": 24, "text": "Nassar, A., Livathinos, N., Lysak, M., Staar, P.: Tableformer: Table structure un-", "bbox": {"l": 150.49533, "t": 273.14484000000004, "r": 480.5881999999999, "b": 281.21457, "coord_origin": "1"}}, {"id": 25, "text": "derstanding with transformers. In: Proceedings of the IEEE/CVF Conference on", "bbox": {"l": 151.51801, "t": 284.10379, "r": 480.59387000000004, "b": 292.17355, "coord_origin": "1"}}, {"id": 26, "text": "Computer Vision and Pattern Recognition (CVPR). pp. 
4614-4623 (June 2022)", "bbox": {"l": 151.51801, "t": 295.06277, "r": 473.44308000000007, "b": 303.13254, "coord_origin": "1"}}]}, {"id": 7, "label": "List-item", "bbox": {"l": 134.74402370452881, "t": 305.29434299468994, "r": 480.6158374786377, "b": 369.1853977203369, "coord_origin": "1"}, "confidence": 0.9822485446929932, "cells": [{"id": 27, "text": "10.", "bbox": {"l": 134.76401, "t": 306.03277999999995, "r": 146.49922, "b": 314.10254000000003, "coord_origin": "1"}}, {"id": 28, "text": "Pfitzmann, B., Auer, C., Dolfi, M., Nassar, A.S., Staar, P.W.J.: Doclaynet: A", "bbox": {"l": 151.09138, "t": 306.03277999999995, "r": 480.58905, "b": 314.10254000000003, "coord_origin": "1"}}, {"id": 29, "text": "large human-annotated dataset for document-layout segmentation. In: Zhang, A.,", "bbox": {"l": 151.51801, "t": 316.99179, "r": 480.59015, "b": 325.06155, "coord_origin": "1"}}, {"id": 30, "text": "Rangwala, H. (eds.) KDD \u201922: The 28th ACM SIGKDD Conference on Knowledge", "bbox": {"l": 151.51801, "t": 327.95078, "r": 480.59113, "b": 336.02054, "coord_origin": "1"}}, {"id": 31, "text": "Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022. pp.", "bbox": {"l": 151.51801, "t": 338.90976, "r": 480.59113, "b": 346.97952, "coord_origin": "1"}}, {"id": 32, "text": "3743-3751. 
ACM (2022).", "bbox": {"l": 151.51801, "t": 349.86874, "r": 251.14098999999996, "b": 357.93851, "coord_origin": "1"}}, {"id": 33, "text": "https://doi.org/10.1145/3534678.3539043", "bbox": {"l": 253.99001, "t": 350.5143100000001, "r": 437.53311, "b": 357.98333999999994, "coord_origin": "1"}}, {"id": 34, "text": ",", "bbox": {"l": 437.53201, "t": 349.86874, "r": 440.09102999999993, "b": 357.93851, "coord_origin": "1"}}, {"id": 35, "text": "https://", "bbox": {"l": 442.94202000000007, "t": 350.5143100000001, "r": 480.59372, "b": 357.98333999999994, "coord_origin": "1"}}, {"id": 36, "text": "doi.org/10.1145/3534678.3539043", "bbox": {"l": 151.51801, "t": 361.47329999999994, "r": 297.40939, "b": 368.94232, "coord_origin": "1"}}]}, {"id": 8, "label": "List-item", "bbox": {"l": 134.48021450042725, "t": 370.85761642456055, "r": 480.59296, "b": 413.06162338256837, "coord_origin": "1"}, "confidence": 0.9819908142089844, "cells": [{"id": 37, "text": "11.", "bbox": {"l": 134.76401, "t": 371.79773, "r": 146.03854, "b": 379.86749, "coord_origin": "1"}}, {"id": 38, "text": "Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: Cascadetabnet:", "bbox": {"l": 150.4505, "t": 371.79773, "r": 480.58914000000004, "b": 379.86749, "coord_origin": "1"}}, {"id": 39, "text": "An approach for end to end table detection and structure recognition from image-", "bbox": {"l": 151.51801, "t": 382.7567399999999, "r": 480.59296, "b": 390.82651, "coord_origin": "1"}}, {"id": 40, "text": "based documents. In: Proceedings of the IEEE/CVF conference on computer vision", "bbox": {"l": 151.51801, "t": 393.71573, "r": 480.59293, "b": 401.78549, "coord_origin": "1"}}, {"id": 41, "text": "and pattern recognition workshops. pp. 
572-573 (2020)", "bbox": {"l": 151.51801, "t": 404.67471, "r": 373.82727, "b": 412.74448, "coord_origin": "1"}}]}, {"id": 9, "label": "List-item", "bbox": {"l": 134.6136074066162, "t": 414.916438293457, "r": 480.62972831726074, "b": 457.31890296936035, "coord_origin": "1"}, "confidence": 0.9810307025909424, "cells": [{"id": 42, "text": "12.", "bbox": {"l": 134.76401, "t": 415.64471, "r": 145.91106, "b": 423.71448000000004, "coord_origin": "1"}}, {"id": 43, "text": "Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: Deepdesrt: Deep learning", "bbox": {"l": 150.27309, "t": 415.64471, "r": 480.5874, "b": 423.71448000000004, "coord_origin": "1"}}, {"id": 44, "text": "for detection and structure recognition of tables in document images. In: 2017 14th", "bbox": {"l": 151.51801, "t": 426.60373, "r": 480.59469999999993, "b": 434.67349, "coord_origin": "1"}}, {"id": 45, "text": "IAPR international conference on document analysis and recognition (ICDAR).", "bbox": {"l": 151.51801, "t": 437.5627099999999, "r": 480.58844, "b": 445.63248, "coord_origin": "1"}}, {"id": 46, "text": "vol. 1, pp. 1162-1167. IEEE (2017)", "bbox": {"l": 151.51801, "t": 448.5217, "r": 292.91455, "b": 456.59146, "coord_origin": "1"}}]}, {"id": 10, "label": "List-item", "bbox": {"l": 134.72238492965698, "t": 458.3810577392578, "r": 480.75556297302245, "b": 501.21110343933105, "coord_origin": "1"}, "confidence": 0.9801984429359436, "cells": [{"id": 47, "text": "13.", "bbox": {"l": 134.76401, "t": 459.4917, "r": 145.7785, "b": 467.56146, "coord_origin": "1"}}, {"id": 48, "text": "Siddiqui, S.A., Fateh, I.A., Rizvi, S.T.R., Dengel, A., Ahmed, S.: Deeptabstr: Deep", "bbox": {"l": 150.08871, "t": 459.4917, "r": 480.59006, "b": 467.56146, "coord_origin": "1"}}, {"id": 49, "text": "learning based table structure recognition. 
In: 2019 International Conference on", "bbox": {"l": 151.51801, "t": 470.45071, "r": 480.59116, "b": 478.52048, "coord_origin": "1"}}, {"id": 50, "text": "Document Analysis and Recognition (ICDAR). pp. 1403-1409 (2019).", "bbox": {"l": 151.51801, "t": 481.4097, "r": 439.05963, "b": 489.47946, "coord_origin": "1"}}, {"id": 51, "text": "https://", "bbox": {"l": 442.94202000000007, "t": 482.05527, "r": 480.59372, "b": 489.52429, "coord_origin": "1"}}, {"id": 52, "text": "doi.org/10.1109/ICDAR.2019.00226", "bbox": {"l": 151.51801, "t": 493.01425, "r": 302.11584, "b": 500.48328, "coord_origin": "1"}}]}, {"id": 11, "label": "List-item", "bbox": {"l": 134.37410717010496, "t": 502.09606246948243, "r": 480.59286000000003, "b": 544.6769313812256, "coord_origin": "1"}, "confidence": 0.981369137763977, "cells": [{"id": 53, "text": "14.", "bbox": {"l": 134.76401, "t": 503.33868, "r": 146.15501, "b": 511.40845, "coord_origin": "1"}}, {"id": 54, "text": "Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive ta-", "bbox": {"l": 150.61252, "t": 503.33868, "r": 480.59088, "b": 511.40845, "coord_origin": "1"}}, {"id": 55, "text": "ble extraction from unstructured documents. In: Proceedings of the IEEE/CVF", "bbox": {"l": 151.51801, "t": 514.2977000000001, "r": 480.59286000000003, "b": 522.3674599999999, "coord_origin": "1"}}, {"id": 56, "text": "Conference on Computer Vision and Pattern Recognition (CVPR). pp. 
4634-4642", "bbox": {"l": 151.51801, "t": 525.25668, "r": 480.58838000000003, "b": 533.32645, "coord_origin": "1"}}, {"id": 57, "text": "(June 2022)", "bbox": {"l": 151.51801, "t": 536.21568, "r": 199.24704, "b": 544.28545, "coord_origin": "1"}}]}, {"id": 12, "label": "List-item", "bbox": {"l": 134.60514450073242, "t": 546.2972602844238, "r": 480.6208276748657, "b": 610.09528, "coord_origin": "1"}, "confidence": 0.9817614555358887, "cells": [{"id": 58, "text": "15.", "bbox": {"l": 134.76401, "t": 547.18568, "r": 146.16588, "b": 555.25545, "coord_origin": "1"}}, {"id": 59, "text": "Staar, P.W.J., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A ma-", "bbox": {"l": 150.62764, "t": 547.18568, "r": 480.58734000000004, "b": 555.25545, "coord_origin": "1"}}, {"id": 60, "text": "chine learning platform to ingest documents at scale. In: Proceedings of the 24th", "bbox": {"l": 151.51801, "t": 558.14468, "r": 480.58838000000003, "b": 566.2144499999999, "coord_origin": "1"}}, {"id": 61, "text": "ACM SIGKDD International Conference on Knowledge Discovery & Data Min-", "bbox": {"l": 151.51801, "t": 569.1036799999999, "r": 480.59109, "b": 577.17345, "coord_origin": "1"}}, {"id": 62, "text": "ing. pp. 774-782. 
KDD \u201918, Association for Computing Machinery, New York, NY,", "bbox": {"l": 151.51801, "t": 580.06268, "r": 480.59195, "b": 588.1324500000001, "coord_origin": "1"}}, {"id": 63, "text": "USA (2018).", "bbox": {"l": 151.51801, "t": 591.0216800000001, "r": 200.75787, "b": 599.09145, "coord_origin": "1"}}, {"id": 64, "text": "https://doi.org/10.1145/3219819.3219834", "bbox": {"l": 202.916, "t": 591.66727, "r": 386.45911, "b": 599.1362799999999, "coord_origin": "1"}}, {"id": 65, "text": ",", "bbox": {"l": 386.45801, "t": 591.0216800000001, "r": 389.01703, "b": 599.09145, "coord_origin": "1"}}, {"id": 66, "text": "https://doi.org/10.", "bbox": {"l": 391.173, "t": 591.66727, "r": 480.59583, "b": 599.1362799999999, "coord_origin": "1"}}, {"id": 67, "text": "1145/3219819.3219834", "bbox": {"l": 151.51801, "t": 602.62627, "r": 245.63831, "b": 610.09528, "coord_origin": "1"}}]}, {"id": 13, "label": "List-item", "bbox": {"l": 134.76401, "t": 612.154292678833, "r": 480.59542999999996, "b": 632.0587142944336, "coord_origin": "1"}, "confidence": 0.9665940999984741, "cells": [{"id": 68, "text": "16.", "bbox": {"l": 134.76401, "t": 612.95068, "r": 146.62019, "b": 621.02045, "coord_origin": "1"}}, {"id": 69, "text": "Wang, X.: Tabular Abstraction, Editing, and Formatting. Ph.D. 
thesis, CAN", "bbox": {"l": 151.25977, "t": 612.95068, "r": 480.59542999999996, "b": 621.02045, "coord_origin": "1"}}, {"id": 70, "text": "(1996), aAINN09397", "bbox": {"l": 151.51801, "t": 623.90968, "r": 234.43031, "b": 631.97945, "coord_origin": "1"}}]}, {"id": 14, "label": "List-item", "bbox": {"l": 134.76401, "t": 634.2881629943848, "r": 480.59119, "b": 665.3440200805663, "coord_origin": "1"}, "confidence": 0.9786126017570496, "cells": [{"id": 71, "text": "17.", "bbox": {"l": 134.76401, "t": 634.87968, "r": 146.30539, "b": 642.9494500000001, "coord_origin": "1"}}, {"id": 72, "text": "Xue, W., Li, Q., Tao, D.: Res2tim: Reconstruct syntactic structures from table", "bbox": {"l": 150.82175, "t": 634.87968, "r": 480.58731000000006, "b": 642.9494500000001, "coord_origin": "1"}}, {"id": 73, "text": "images. In: 2019 International Conference on Document Analysis and Recognition", "bbox": {"l": 151.51801, "t": 645.8386800000001, "r": 480.59119, "b": 653.90845, "coord_origin": "1"}}, {"id": 74, "text": "(ICDAR). pp. 749-755. 
IEEE (2019)", "bbox": {"l": 151.51801, "t": 656.79768, "r": 299.30307, "b": 664.86745, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 12, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 194.0724666595459, "t": 93.14808425903323, "r": 447.54291000000006, "b": 102.36713447570799, "coord_origin": "1"}, "confidence": 0.9549390077590942, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 12, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 471.1661275863647, "t": 93.57991390228267, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.9042201042175293, "cells": [{"id": 1, "text": "13", "bbox": {"l": 471.37561, "t": 93.77099999999996, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "13"}, {"label": "List-item", "id": 2, "page_no": 12, "cluster": {"id": 2, "label": "List-item", "bbox": {"l": 138.69608402252197, "t": 119.06796512603762, "r": 480.59479, "b": 150.90859880447385, "coord_origin": "1"}, "confidence": 0.976601779460907, "cells": [{"id": 2, "text": "5.", "bbox": {"l": 139.371, "t": 119.67400999999995, "r": 146.04857, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 3, "text": "Kayal, P., Anand, M., Desai, H., Singh, M.: Tables to latex: structure and content", "bbox": {"l": 150.34157, "t": 119.67400999999995, "r": 480.58826, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 4, "text": "extraction from scientific tables. 
International Journal on Document Analysis and", "bbox": {"l": 151.51801, "t": 130.63300000000004, "r": 480.59479, "b": 138.70270000000005, "coord_origin": "1"}}, {"id": 5, "text": "Recognition (IJDAR) pp. 1-10 (2022)", "bbox": {"l": 151.51801, "t": 141.59198000000004, "r": 304.04364, "b": 149.66168000000005, "coord_origin": "1"}}]}, "text": "5. Kayal, P., Anand, M., Desai, H., Singh, M.: Tables to latex: structure and content extraction from scientific tables. International Journal on Document Analysis and Recognition (IJDAR) pp. 1-10 (2022)"}, {"label": "List-item", "id": 3, "page_no": 12, "cluster": {"id": 3, "label": "List-item", "bbox": {"l": 138.54494819641113, "t": 151.70322275161743, "r": 480.75314083099363, "b": 193.50860999999998, "coord_origin": "1"}, "confidence": 0.9813704490661621, "cells": [{"id": 6, "text": "6.", "bbox": {"l": 139.371, "t": 152.56195000000002, "r": 145.93991, "b": 160.63165000000004, "coord_origin": "1"}}, {"id": 7, "text": "Lee, E., Kwon, J., Yang, H., Park, J., Lee, S., Koo, H.I., Cho, N.I.: Table structure", "bbox": {"l": 150.16298, "t": 152.56195000000002, "r": 480.59015, "b": 160.63165000000004, "coord_origin": "1"}}, {"id": 8, "text": "recognition based on grid shape graph. In: 2022 Asia-Pacific Signal and Information", "bbox": {"l": 151.51801, "t": 163.52094, "r": 480.5903, "b": 171.59064, "coord_origin": "1"}}, {"id": 9, "text": "Processing Association Annual Summit and Conference (APSIPA ASC). pp. 1868-", "bbox": {"l": 151.51801, "t": 174.47992, "r": 480.59286000000003, "b": 182.54962, "coord_origin": "1"}}, {"id": 10, "text": "1873. IEEE (2022)", "bbox": {"l": 151.51801, "t": 185.4389, "r": 226.37399, "b": 193.50860999999998, "coord_origin": "1"}}]}, "text": "6. Lee, E., Kwon, J., Yang, H., Park, J., Lee, S., Koo, H.I., Cho, N.I.: Table structure recognition based on grid shape graph. In: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). pp. 18681873. 
IEEE (2022)"}, {"label": "List-item", "id": 4, "page_no": 12, "cluster": {"id": 4, "label": "List-item", "bbox": {"l": 139.07085943222046, "t": 195.3876846313476, "r": 480.59012, "b": 215.5838447570801, "coord_origin": "1"}, "confidence": 0.9723377227783203, "cells": [{"id": 11, "text": "7.", "bbox": {"l": 139.371, "t": 196.40886999999998, "r": 146.31418, "b": 204.47857999999997, "coord_origin": "1"}}, {"id": 12, "text": "Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: Tablebank: A benchmark", "bbox": {"l": 150.77789, "t": 196.40886999999998, "r": 480.59012, "b": 204.47857999999997, "coord_origin": "1"}}, {"id": 13, "text": "dataset for table detection and recognition (2019)", "bbox": {"l": 151.51801, "t": 207.36785999999995, "r": 352.01746, "b": 215.43755999999996, "coord_origin": "1"}}]}, "text": "7. Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: Tablebank: A benchmark dataset for table detection and recognition (2019)"}, {"label": "List-item", "id": 5, "page_no": 12, "cluster": {"id": 5, "label": "List-item", "bbox": {"l": 138.5443937301636, "t": 217.49708290100102, "r": 480.8269432067871, "b": 270.28839000000005, "coord_origin": "1"}, "confidence": 0.9826642274856567, "cells": [{"id": 14, "text": "8.", "bbox": {"l": 139.371, "t": 218.33887000000004, "r": 146.37106, "b": 226.40857000000005, "coord_origin": "1"}}, {"id": 15, "text": "Livathinos, N., Berrospi, C., Lysak, M., Kuropiatnyk, V., Nassar, A., Carvalho,", "bbox": {"l": 150.87132, "t": 218.33887000000004, "r": 480.58731000000006, "b": 226.40857000000005, "coord_origin": "1"}}, {"id": 16, "text": "A., Dolfi, M., Auer, C., Dinkla, K., Staar, P.: Robust pdf document conversion", "bbox": {"l": 151.51801, "t": 229.29785000000004, "r": 480.59020999999996, "b": 237.36755000000005, "coord_origin": "1"}}, {"id": 17, "text": "using recurrent neural networks. 
Proceedings of the AAAI Conference on Artificial", "bbox": {"l": 151.51801, "t": 240.25684, "r": 480.59473, "b": 248.32654000000002, "coord_origin": "1"}}, {"id": 18, "text": "Intelligence", "bbox": {"l": 151.51801, "t": 251.21582, "r": 197.08617, "b": 259.28552, "coord_origin": "1"}}, {"id": 19, "text": "35", "bbox": {"l": 199.40001, "t": 251.15301999999997, "r": 210.00726, "b": 259.07935, "coord_origin": "1"}}, {"id": 20, "text": "(17), 15137-15145 (May 2021),", "bbox": {"l": 210.007, "t": 251.21582, "r": 332.37683, "b": 259.28552, "coord_origin": "1"}}, {"id": 21, "text": "https://ojs.aaai.org/index.php/", "bbox": {"l": 334.69901, "t": 251.86139000000003, "r": 480.59039000000007, "b": 259.33038, "coord_origin": "1"}}, {"id": 22, "text": "AAAI/article/view/17777", "bbox": {"l": 151.51801, "t": 262.8194, "r": 259.75769, "b": 270.28839000000005, "coord_origin": "1"}}]}, "text": "8. Livathinos, N., Berrospi, C., Lysak, M., Kuropiatnyk, V., Nassar, A., Carvalho, A., Dolfi, M., Auer, C., Dinkla, K., Staar, P.: Robust pdf document conversion using recurrent neural networks. Proceedings of the AAAI Conference on Artificial Intelligence 35 (17), 15137-15145 (May 2021), https://ojs.aaai.org/index.php/ AAAI/article/view/17777"}, {"label": "List-item", "id": 6, "page_no": 12, "cluster": {"id": 6, "label": "List-item", "bbox": {"l": 138.21877613067628, "t": 272.1957120895386, "r": 480.59387000000004, "b": 304.09056758880615, "coord_origin": "1"}, "confidence": 0.9749984741210938, "cells": [{"id": 23, "text": "9.", "bbox": {"l": 139.371, "t": 273.14484000000004, "r": 146.14218, "b": 281.21457, "coord_origin": "1"}}, {"id": 24, "text": "Nassar, A., Livathinos, N., Lysak, M., Staar, P.: Tableformer: Table structure un-", "bbox": {"l": 150.49533, "t": 273.14484000000004, "r": 480.5881999999999, "b": 281.21457, "coord_origin": "1"}}, {"id": 25, "text": "derstanding with transformers. 
In: Proceedings of the IEEE/CVF Conference on", "bbox": {"l": 151.51801, "t": 284.10379, "r": 480.59387000000004, "b": 292.17355, "coord_origin": "1"}}, {"id": 26, "text": "Computer Vision and Pattern Recognition (CVPR). pp. 4614-4623 (June 2022)", "bbox": {"l": 151.51801, "t": 295.06277, "r": 473.44308000000007, "b": 303.13254, "coord_origin": "1"}}]}, "text": "9. Nassar, A., Livathinos, N., Lysak, M., Staar, P.: Tableformer: Table structure understanding with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4614-4623 (June 2022)"}, {"label": "List-item", "id": 7, "page_no": 12, "cluster": {"id": 7, "label": "List-item", "bbox": {"l": 134.74402370452881, "t": 305.29434299468994, "r": 480.6158374786377, "b": 369.1853977203369, "coord_origin": "1"}, "confidence": 0.9822485446929932, "cells": [{"id": 27, "text": "10.", "bbox": {"l": 134.76401, "t": 306.03277999999995, "r": 146.49922, "b": 314.10254000000003, "coord_origin": "1"}}, {"id": 28, "text": "Pfitzmann, B., Auer, C., Dolfi, M., Nassar, A.S., Staar, P.W.J.: Doclaynet: A", "bbox": {"l": 151.09138, "t": 306.03277999999995, "r": 480.58905, "b": 314.10254000000003, "coord_origin": "1"}}, {"id": 29, "text": "large human-annotated dataset for document-layout segmentation. In: Zhang, A.,", "bbox": {"l": 151.51801, "t": 316.99179, "r": 480.59015, "b": 325.06155, "coord_origin": "1"}}, {"id": 30, "text": "Rangwala, H. (eds.) KDD \u201922: The 28th ACM SIGKDD Conference on Knowledge", "bbox": {"l": 151.51801, "t": 327.95078, "r": 480.59113, "b": 336.02054, "coord_origin": "1"}}, {"id": 31, "text": "Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022. pp.", "bbox": {"l": 151.51801, "t": 338.90976, "r": 480.59113, "b": 346.97952, "coord_origin": "1"}}, {"id": 32, "text": "3743-3751. 
ACM (2022).", "bbox": {"l": 151.51801, "t": 349.86874, "r": 251.14098999999996, "b": 357.93851, "coord_origin": "1"}}, {"id": 33, "text": "https://doi.org/10.1145/3534678.3539043", "bbox": {"l": 253.99001, "t": 350.5143100000001, "r": 437.53311, "b": 357.98333999999994, "coord_origin": "1"}}, {"id": 34, "text": ",", "bbox": {"l": 437.53201, "t": 349.86874, "r": 440.09102999999993, "b": 357.93851, "coord_origin": "1"}}, {"id": 35, "text": "https://", "bbox": {"l": 442.94202000000007, "t": 350.5143100000001, "r": 480.59372, "b": 357.98333999999994, "coord_origin": "1"}}, {"id": 36, "text": "doi.org/10.1145/3534678.3539043", "bbox": {"l": 151.51801, "t": 361.47329999999994, "r": 297.40939, "b": 368.94232, "coord_origin": "1"}}]}, "text": "10. Pfitzmann, B., Auer, C., Dolfi, M., Nassar, A.S., Staar, P.W.J.: Doclaynet: A large human-annotated dataset for document-layout segmentation. In: Zhang, A., Rangwala, H. (eds.) KDD \u201922: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022. pp. 3743-3751. ACM (2022). https://doi.org/10.1145/3534678.3539043 , https:// doi.org/10.1145/3534678.3539043"}, {"label": "List-item", "id": 8, "page_no": 12, "cluster": {"id": 8, "label": "List-item", "bbox": {"l": 134.48021450042725, "t": 370.85761642456055, "r": 480.59296, "b": 413.06162338256837, "coord_origin": "1"}, "confidence": 0.9819908142089844, "cells": [{"id": 37, "text": "11.", "bbox": {"l": 134.76401, "t": 371.79773, "r": 146.03854, "b": 379.86749, "coord_origin": "1"}}, {"id": 38, "text": "Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: Cascadetabnet:", "bbox": {"l": 150.4505, "t": 371.79773, "r": 480.58914000000004, "b": 379.86749, "coord_origin": "1"}}, {"id": 39, "text": "An approach for end to end table detection and structure recognition from image-", "bbox": {"l": 151.51801, "t": 382.7567399999999, "r": 480.59296, "b": 390.82651, "coord_origin": "1"}}, {"id": 40, "text": "based documents. 
In: Proceedings of the IEEE/CVF conference on computer vision", "bbox": {"l": 151.51801, "t": 393.71573, "r": 480.59293, "b": 401.78549, "coord_origin": "1"}}, {"id": 41, "text": "and pattern recognition workshops. pp. 572-573 (2020)", "bbox": {"l": 151.51801, "t": 404.67471, "r": 373.82727, "b": 412.74448, "coord_origin": "1"}}]}, "text": "11. Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: Cascadetabnet: An approach for end to end table detection and structure recognition from imagebased documents. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 572-573 (2020)"}, {"label": "List-item", "id": 9, "page_no": 12, "cluster": {"id": 9, "label": "List-item", "bbox": {"l": 134.6136074066162, "t": 414.916438293457, "r": 480.62972831726074, "b": 457.31890296936035, "coord_origin": "1"}, "confidence": 0.9810307025909424, "cells": [{"id": 42, "text": "12.", "bbox": {"l": 134.76401, "t": 415.64471, "r": 145.91106, "b": 423.71448000000004, "coord_origin": "1"}}, {"id": 43, "text": "Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: Deepdesrt: Deep learning", "bbox": {"l": 150.27309, "t": 415.64471, "r": 480.5874, "b": 423.71448000000004, "coord_origin": "1"}}, {"id": 44, "text": "for detection and structure recognition of tables in document images. In: 2017 14th", "bbox": {"l": 151.51801, "t": 426.60373, "r": 480.59469999999993, "b": 434.67349, "coord_origin": "1"}}, {"id": 45, "text": "IAPR international conference on document analysis and recognition (ICDAR).", "bbox": {"l": 151.51801, "t": 437.5627099999999, "r": 480.58844, "b": 445.63248, "coord_origin": "1"}}, {"id": 46, "text": "vol. 1, pp. 1162-1167. IEEE (2017)", "bbox": {"l": 151.51801, "t": 448.5217, "r": 292.91455, "b": 456.59146, "coord_origin": "1"}}]}, "text": "12. Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: Deepdesrt: Deep learning for detection and structure recognition of tables in document images. 
In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR). vol. 1, pp. 1162-1167. IEEE (2017)"}, {"label": "List-item", "id": 10, "page_no": 12, "cluster": {"id": 10, "label": "List-item", "bbox": {"l": 134.72238492965698, "t": 458.3810577392578, "r": 480.75556297302245, "b": 501.21110343933105, "coord_origin": "1"}, "confidence": 0.9801984429359436, "cells": [{"id": 47, "text": "13.", "bbox": {"l": 134.76401, "t": 459.4917, "r": 145.7785, "b": 467.56146, "coord_origin": "1"}}, {"id": 48, "text": "Siddiqui, S.A., Fateh, I.A., Rizvi, S.T.R., Dengel, A., Ahmed, S.: Deeptabstr: Deep", "bbox": {"l": 150.08871, "t": 459.4917, "r": 480.59006, "b": 467.56146, "coord_origin": "1"}}, {"id": 49, "text": "learning based table structure recognition. In: 2019 International Conference on", "bbox": {"l": 151.51801, "t": 470.45071, "r": 480.59116, "b": 478.52048, "coord_origin": "1"}}, {"id": 50, "text": "Document Analysis and Recognition (ICDAR). pp. 1403-1409 (2019).", "bbox": {"l": 151.51801, "t": 481.4097, "r": 439.05963, "b": 489.47946, "coord_origin": "1"}}, {"id": 51, "text": "https://", "bbox": {"l": 442.94202000000007, "t": 482.05527, "r": 480.59372, "b": 489.52429, "coord_origin": "1"}}, {"id": 52, "text": "doi.org/10.1109/ICDAR.2019.00226", "bbox": {"l": 151.51801, "t": 493.01425, "r": 302.11584, "b": 500.48328, "coord_origin": "1"}}]}, "text": "13. Siddiqui, S.A., Fateh, I.A., Rizvi, S.T.R., Dengel, A., Ahmed, S.: Deeptabstr: Deep learning based table structure recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1403-1409 (2019). 
https:// doi.org/10.1109/ICDAR.2019.00226"}, {"label": "List-item", "id": 11, "page_no": 12, "cluster": {"id": 11, "label": "List-item", "bbox": {"l": 134.37410717010496, "t": 502.09606246948243, "r": 480.59286000000003, "b": 544.6769313812256, "coord_origin": "1"}, "confidence": 0.981369137763977, "cells": [{"id": 53, "text": "14.", "bbox": {"l": 134.76401, "t": 503.33868, "r": 146.15501, "b": 511.40845, "coord_origin": "1"}}, {"id": 54, "text": "Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive ta-", "bbox": {"l": 150.61252, "t": 503.33868, "r": 480.59088, "b": 511.40845, "coord_origin": "1"}}, {"id": 55, "text": "ble extraction from unstructured documents. In: Proceedings of the IEEE/CVF", "bbox": {"l": 151.51801, "t": 514.2977000000001, "r": 480.59286000000003, "b": 522.3674599999999, "coord_origin": "1"}}, {"id": 56, "text": "Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4634-4642", "bbox": {"l": 151.51801, "t": 525.25668, "r": 480.58838000000003, "b": 533.32645, "coord_origin": "1"}}, {"id": 57, "text": "(June 2022)", "bbox": {"l": 151.51801, "t": 536.21568, "r": 199.24704, "b": 544.28545, "coord_origin": "1"}}]}, "text": "14. Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive table extraction from unstructured documents. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 
4634-4642 (June 2022)"}, {"label": "List-item", "id": 12, "page_no": 12, "cluster": {"id": 12, "label": "List-item", "bbox": {"l": 134.60514450073242, "t": 546.2972602844238, "r": 480.6208276748657, "b": 610.09528, "coord_origin": "1"}, "confidence": 0.9817614555358887, "cells": [{"id": 58, "text": "15.", "bbox": {"l": 134.76401, "t": 547.18568, "r": 146.16588, "b": 555.25545, "coord_origin": "1"}}, {"id": 59, "text": "Staar, P.W.J., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A ma-", "bbox": {"l": 150.62764, "t": 547.18568, "r": 480.58734000000004, "b": 555.25545, "coord_origin": "1"}}, {"id": 60, "text": "chine learning platform to ingest documents at scale. In: Proceedings of the 24th", "bbox": {"l": 151.51801, "t": 558.14468, "r": 480.58838000000003, "b": 566.2144499999999, "coord_origin": "1"}}, {"id": 61, "text": "ACM SIGKDD International Conference on Knowledge Discovery & Data Min-", "bbox": {"l": 151.51801, "t": 569.1036799999999, "r": 480.59109, "b": 577.17345, "coord_origin": "1"}}, {"id": 62, "text": "ing. pp. 774-782. KDD \u201918, Association for Computing Machinery, New York, NY,", "bbox": {"l": 151.51801, "t": 580.06268, "r": 480.59195, "b": 588.1324500000001, "coord_origin": "1"}}, {"id": 63, "text": "USA (2018).", "bbox": {"l": 151.51801, "t": 591.0216800000001, "r": 200.75787, "b": 599.09145, "coord_origin": "1"}}, {"id": 64, "text": "https://doi.org/10.1145/3219819.3219834", "bbox": {"l": 202.916, "t": 591.66727, "r": 386.45911, "b": 599.1362799999999, "coord_origin": "1"}}, {"id": 65, "text": ",", "bbox": {"l": 386.45801, "t": 591.0216800000001, "r": 389.01703, "b": 599.09145, "coord_origin": "1"}}, {"id": 66, "text": "https://doi.org/10.", "bbox": {"l": 391.173, "t": 591.66727, "r": 480.59583, "b": 599.1362799999999, "coord_origin": "1"}}, {"id": 67, "text": "1145/3219819.3219834", "bbox": {"l": 151.51801, "t": 602.62627, "r": 245.63831, "b": 610.09528, "coord_origin": "1"}}]}, "text": "15. 
Staar, P.W.J., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A machine learning platform to ingest documents at scale. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 774-782. KDD \u201918, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3219819.3219834 , https://doi.org/10. 1145/3219819.3219834"}, {"label": "List-item", "id": 13, "page_no": 12, "cluster": {"id": 13, "label": "List-item", "bbox": {"l": 134.76401, "t": 612.154292678833, "r": 480.59542999999996, "b": 632.0587142944336, "coord_origin": "1"}, "confidence": 0.9665940999984741, "cells": [{"id": 68, "text": "16.", "bbox": {"l": 134.76401, "t": 612.95068, "r": 146.62019, "b": 621.02045, "coord_origin": "1"}}, {"id": 69, "text": "Wang, X.: Tabular Abstraction, Editing, and Formatting. Ph.D. thesis, CAN", "bbox": {"l": 151.25977, "t": 612.95068, "r": 480.59542999999996, "b": 621.02045, "coord_origin": "1"}}, {"id": 70, "text": "(1996), aAINN09397", "bbox": {"l": 151.51801, "t": 623.90968, "r": 234.43031, "b": 631.97945, "coord_origin": "1"}}]}, "text": "16. Wang, X.: Tabular Abstraction, Editing, and Formatting. Ph.D. thesis, CAN (1996), aAINN09397"}, {"label": "List-item", "id": 14, "page_no": 12, "cluster": {"id": 14, "label": "List-item", "bbox": {"l": 134.76401, "t": 634.2881629943848, "r": 480.59119, "b": 665.3440200805663, "coord_origin": "1"}, "confidence": 0.9786126017570496, "cells": [{"id": 71, "text": "17.", "bbox": {"l": 134.76401, "t": 634.87968, "r": 146.30539, "b": 642.9494500000001, "coord_origin": "1"}}, {"id": 72, "text": "Xue, W., Li, Q., Tao, D.: Res2tim: Reconstruct syntactic structures from table", "bbox": {"l": 150.82175, "t": 634.87968, "r": 480.58731000000006, "b": 642.9494500000001, "coord_origin": "1"}}, {"id": 73, "text": "images. 
In: 2019 International Conference on Document Analysis and Recognition", "bbox": {"l": 151.51801, "t": 645.8386800000001, "r": 480.59119, "b": 653.90845, "coord_origin": "1"}}, {"id": 74, "text": "(ICDAR). pp. 749-755. IEEE (2019)", "bbox": {"l": 151.51801, "t": 656.79768, "r": 299.30307, "b": 664.86745, "coord_origin": "1"}}]}, "text": "17. Xue, W., Li, Q., Tao, D.: Res2tim: Reconstruct syntactic structures from table images. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 749-755. IEEE (2019)"}], "body": [{"label": "List-item", "id": 2, "page_no": 12, "cluster": {"id": 2, "label": "List-item", "bbox": {"l": 138.69608402252197, "t": 119.06796512603762, "r": 480.59479, "b": 150.90859880447385, "coord_origin": "1"}, "confidence": 0.976601779460907, "cells": [{"id": 2, "text": "5.", "bbox": {"l": 139.371, "t": 119.67400999999995, "r": 146.04857, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 3, "text": "Kayal, P., Anand, M., Desai, H., Singh, M.: Tables to latex: structure and content", "bbox": {"l": 150.34157, "t": 119.67400999999995, "r": 480.58826, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 4, "text": "extraction from scientific tables. International Journal on Document Analysis and", "bbox": {"l": 151.51801, "t": 130.63300000000004, "r": 480.59479, "b": 138.70270000000005, "coord_origin": "1"}}, {"id": 5, "text": "Recognition (IJDAR) pp. 1-10 (2022)", "bbox": {"l": 151.51801, "t": 141.59198000000004, "r": 304.04364, "b": 149.66168000000005, "coord_origin": "1"}}]}, "text": "5. Kayal, P., Anand, M., Desai, H., Singh, M.: Tables to latex: structure and content extraction from scientific tables. International Journal on Document Analysis and Recognition (IJDAR) pp. 
1-10 (2022)"}, {"label": "List-item", "id": 3, "page_no": 12, "cluster": {"id": 3, "label": "List-item", "bbox": {"l": 138.54494819641113, "t": 151.70322275161743, "r": 480.75314083099363, "b": 193.50860999999998, "coord_origin": "1"}, "confidence": 0.9813704490661621, "cells": [{"id": 6, "text": "6.", "bbox": {"l": 139.371, "t": 152.56195000000002, "r": 145.93991, "b": 160.63165000000004, "coord_origin": "1"}}, {"id": 7, "text": "Lee, E., Kwon, J., Yang, H., Park, J., Lee, S., Koo, H.I., Cho, N.I.: Table structure", "bbox": {"l": 150.16298, "t": 152.56195000000002, "r": 480.59015, "b": 160.63165000000004, "coord_origin": "1"}}, {"id": 8, "text": "recognition based on grid shape graph. In: 2022 Asia-Pacific Signal and Information", "bbox": {"l": 151.51801, "t": 163.52094, "r": 480.5903, "b": 171.59064, "coord_origin": "1"}}, {"id": 9, "text": "Processing Association Annual Summit and Conference (APSIPA ASC). pp. 1868-", "bbox": {"l": 151.51801, "t": 174.47992, "r": 480.59286000000003, "b": 182.54962, "coord_origin": "1"}}, {"id": 10, "text": "1873. IEEE (2022)", "bbox": {"l": 151.51801, "t": 185.4389, "r": 226.37399, "b": 193.50860999999998, "coord_origin": "1"}}]}, "text": "6. Lee, E., Kwon, J., Yang, H., Park, J., Lee, S., Koo, H.I., Cho, N.I.: Table structure recognition based on grid shape graph. In: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). pp. 18681873. 
IEEE (2022)"}, {"label": "List-item", "id": 4, "page_no": 12, "cluster": {"id": 4, "label": "List-item", "bbox": {"l": 139.07085943222046, "t": 195.3876846313476, "r": 480.59012, "b": 215.5838447570801, "coord_origin": "1"}, "confidence": 0.9723377227783203, "cells": [{"id": 11, "text": "7.", "bbox": {"l": 139.371, "t": 196.40886999999998, "r": 146.31418, "b": 204.47857999999997, "coord_origin": "1"}}, {"id": 12, "text": "Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: Tablebank: A benchmark", "bbox": {"l": 150.77789, "t": 196.40886999999998, "r": 480.59012, "b": 204.47857999999997, "coord_origin": "1"}}, {"id": 13, "text": "dataset for table detection and recognition (2019)", "bbox": {"l": 151.51801, "t": 207.36785999999995, "r": 352.01746, "b": 215.43755999999996, "coord_origin": "1"}}]}, "text": "7. Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: Tablebank: A benchmark dataset for table detection and recognition (2019)"}, {"label": "List-item", "id": 5, "page_no": 12, "cluster": {"id": 5, "label": "List-item", "bbox": {"l": 138.5443937301636, "t": 217.49708290100102, "r": 480.8269432067871, "b": 270.28839000000005, "coord_origin": "1"}, "confidence": 0.9826642274856567, "cells": [{"id": 14, "text": "8.", "bbox": {"l": 139.371, "t": 218.33887000000004, "r": 146.37106, "b": 226.40857000000005, "coord_origin": "1"}}, {"id": 15, "text": "Livathinos, N., Berrospi, C., Lysak, M., Kuropiatnyk, V., Nassar, A., Carvalho,", "bbox": {"l": 150.87132, "t": 218.33887000000004, "r": 480.58731000000006, "b": 226.40857000000005, "coord_origin": "1"}}, {"id": 16, "text": "A., Dolfi, M., Auer, C., Dinkla, K., Staar, P.: Robust pdf document conversion", "bbox": {"l": 151.51801, "t": 229.29785000000004, "r": 480.59020999999996, "b": 237.36755000000005, "coord_origin": "1"}}, {"id": 17, "text": "using recurrent neural networks. 
Proceedings of the AAAI Conference on Artificial", "bbox": {"l": 151.51801, "t": 240.25684, "r": 480.59473, "b": 248.32654000000002, "coord_origin": "1"}}, {"id": 18, "text": "Intelligence", "bbox": {"l": 151.51801, "t": 251.21582, "r": 197.08617, "b": 259.28552, "coord_origin": "1"}}, {"id": 19, "text": "35", "bbox": {"l": 199.40001, "t": 251.15301999999997, "r": 210.00726, "b": 259.07935, "coord_origin": "1"}}, {"id": 20, "text": "(17), 15137-15145 (May 2021),", "bbox": {"l": 210.007, "t": 251.21582, "r": 332.37683, "b": 259.28552, "coord_origin": "1"}}, {"id": 21, "text": "https://ojs.aaai.org/index.php/", "bbox": {"l": 334.69901, "t": 251.86139000000003, "r": 480.59039000000007, "b": 259.33038, "coord_origin": "1"}}, {"id": 22, "text": "AAAI/article/view/17777", "bbox": {"l": 151.51801, "t": 262.8194, "r": 259.75769, "b": 270.28839000000005, "coord_origin": "1"}}]}, "text": "8. Livathinos, N., Berrospi, C., Lysak, M., Kuropiatnyk, V., Nassar, A., Carvalho, A., Dolfi, M., Auer, C., Dinkla, K., Staar, P.: Robust pdf document conversion using recurrent neural networks. Proceedings of the AAAI Conference on Artificial Intelligence 35 (17), 15137-15145 (May 2021), https://ojs.aaai.org/index.php/ AAAI/article/view/17777"}, {"label": "List-item", "id": 6, "page_no": 12, "cluster": {"id": 6, "label": "List-item", "bbox": {"l": 138.21877613067628, "t": 272.1957120895386, "r": 480.59387000000004, "b": 304.09056758880615, "coord_origin": "1"}, "confidence": 0.9749984741210938, "cells": [{"id": 23, "text": "9.", "bbox": {"l": 139.371, "t": 273.14484000000004, "r": 146.14218, "b": 281.21457, "coord_origin": "1"}}, {"id": 24, "text": "Nassar, A., Livathinos, N., Lysak, M., Staar, P.: Tableformer: Table structure un-", "bbox": {"l": 150.49533, "t": 273.14484000000004, "r": 480.5881999999999, "b": 281.21457, "coord_origin": "1"}}, {"id": 25, "text": "derstanding with transformers. 
In: Proceedings of the IEEE/CVF Conference on", "bbox": {"l": 151.51801, "t": 284.10379, "r": 480.59387000000004, "b": 292.17355, "coord_origin": "1"}}, {"id": 26, "text": "Computer Vision and Pattern Recognition (CVPR). pp. 4614-4623 (June 2022)", "bbox": {"l": 151.51801, "t": 295.06277, "r": 473.44308000000007, "b": 303.13254, "coord_origin": "1"}}]}, "text": "9. Nassar, A., Livathinos, N., Lysak, M., Staar, P.: Tableformer: Table structure understanding with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4614-4623 (June 2022)"}, {"label": "List-item", "id": 7, "page_no": 12, "cluster": {"id": 7, "label": "List-item", "bbox": {"l": 134.74402370452881, "t": 305.29434299468994, "r": 480.6158374786377, "b": 369.1853977203369, "coord_origin": "1"}, "confidence": 0.9822485446929932, "cells": [{"id": 27, "text": "10.", "bbox": {"l": 134.76401, "t": 306.03277999999995, "r": 146.49922, "b": 314.10254000000003, "coord_origin": "1"}}, {"id": 28, "text": "Pfitzmann, B., Auer, C., Dolfi, M., Nassar, A.S., Staar, P.W.J.: Doclaynet: A", "bbox": {"l": 151.09138, "t": 306.03277999999995, "r": 480.58905, "b": 314.10254000000003, "coord_origin": "1"}}, {"id": 29, "text": "large human-annotated dataset for document-layout segmentation. In: Zhang, A.,", "bbox": {"l": 151.51801, "t": 316.99179, "r": 480.59015, "b": 325.06155, "coord_origin": "1"}}, {"id": 30, "text": "Rangwala, H. (eds.) KDD \u201922: The 28th ACM SIGKDD Conference on Knowledge", "bbox": {"l": 151.51801, "t": 327.95078, "r": 480.59113, "b": 336.02054, "coord_origin": "1"}}, {"id": 31, "text": "Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022. pp.", "bbox": {"l": 151.51801, "t": 338.90976, "r": 480.59113, "b": 346.97952, "coord_origin": "1"}}, {"id": 32, "text": "3743-3751. 
ACM (2022).", "bbox": {"l": 151.51801, "t": 349.86874, "r": 251.14098999999996, "b": 357.93851, "coord_origin": "1"}}, {"id": 33, "text": "https://doi.org/10.1145/3534678.3539043", "bbox": {"l": 253.99001, "t": 350.5143100000001, "r": 437.53311, "b": 357.98333999999994, "coord_origin": "1"}}, {"id": 34, "text": ",", "bbox": {"l": 437.53201, "t": 349.86874, "r": 440.09102999999993, "b": 357.93851, "coord_origin": "1"}}, {"id": 35, "text": "https://", "bbox": {"l": 442.94202000000007, "t": 350.5143100000001, "r": 480.59372, "b": 357.98333999999994, "coord_origin": "1"}}, {"id": 36, "text": "doi.org/10.1145/3534678.3539043", "bbox": {"l": 151.51801, "t": 361.47329999999994, "r": 297.40939, "b": 368.94232, "coord_origin": "1"}}]}, "text": "10. Pfitzmann, B., Auer, C., Dolfi, M., Nassar, A.S., Staar, P.W.J.: Doclaynet: A large human-annotated dataset for document-layout segmentation. In: Zhang, A., Rangwala, H. (eds.) KDD \u201922: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022. pp. 3743-3751. ACM (2022). https://doi.org/10.1145/3534678.3539043 , https:// doi.org/10.1145/3534678.3539043"}, {"label": "List-item", "id": 8, "page_no": 12, "cluster": {"id": 8, "label": "List-item", "bbox": {"l": 134.48021450042725, "t": 370.85761642456055, "r": 480.59296, "b": 413.06162338256837, "coord_origin": "1"}, "confidence": 0.9819908142089844, "cells": [{"id": 37, "text": "11.", "bbox": {"l": 134.76401, "t": 371.79773, "r": 146.03854, "b": 379.86749, "coord_origin": "1"}}, {"id": 38, "text": "Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: Cascadetabnet:", "bbox": {"l": 150.4505, "t": 371.79773, "r": 480.58914000000004, "b": 379.86749, "coord_origin": "1"}}, {"id": 39, "text": "An approach for end to end table detection and structure recognition from image-", "bbox": {"l": 151.51801, "t": 382.7567399999999, "r": 480.59296, "b": 390.82651, "coord_origin": "1"}}, {"id": 40, "text": "based documents. 
In: Proceedings of the IEEE/CVF conference on computer vision", "bbox": {"l": 151.51801, "t": 393.71573, "r": 480.59293, "b": 401.78549, "coord_origin": "1"}}, {"id": 41, "text": "and pattern recognition workshops. pp. 572-573 (2020)", "bbox": {"l": 151.51801, "t": 404.67471, "r": 373.82727, "b": 412.74448, "coord_origin": "1"}}]}, "text": "11. Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: Cascadetabnet: An approach for end to end table detection and structure recognition from imagebased documents. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 572-573 (2020)"}, {"label": "List-item", "id": 9, "page_no": 12, "cluster": {"id": 9, "label": "List-item", "bbox": {"l": 134.6136074066162, "t": 414.916438293457, "r": 480.62972831726074, "b": 457.31890296936035, "coord_origin": "1"}, "confidence": 0.9810307025909424, "cells": [{"id": 42, "text": "12.", "bbox": {"l": 134.76401, "t": 415.64471, "r": 145.91106, "b": 423.71448000000004, "coord_origin": "1"}}, {"id": 43, "text": "Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: Deepdesrt: Deep learning", "bbox": {"l": 150.27309, "t": 415.64471, "r": 480.5874, "b": 423.71448000000004, "coord_origin": "1"}}, {"id": 44, "text": "for detection and structure recognition of tables in document images. In: 2017 14th", "bbox": {"l": 151.51801, "t": 426.60373, "r": 480.59469999999993, "b": 434.67349, "coord_origin": "1"}}, {"id": 45, "text": "IAPR international conference on document analysis and recognition (ICDAR).", "bbox": {"l": 151.51801, "t": 437.5627099999999, "r": 480.58844, "b": 445.63248, "coord_origin": "1"}}, {"id": 46, "text": "vol. 1, pp. 1162-1167. IEEE (2017)", "bbox": {"l": 151.51801, "t": 448.5217, "r": 292.91455, "b": 456.59146, "coord_origin": "1"}}]}, "text": "12. Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: Deepdesrt: Deep learning for detection and structure recognition of tables in document images. 
In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR). vol. 1, pp. 1162-1167. IEEE (2017)"}, {"label": "List-item", "id": 10, "page_no": 12, "cluster": {"id": 10, "label": "List-item", "bbox": {"l": 134.72238492965698, "t": 458.3810577392578, "r": 480.75556297302245, "b": 501.21110343933105, "coord_origin": "1"}, "confidence": 0.9801984429359436, "cells": [{"id": 47, "text": "13.", "bbox": {"l": 134.76401, "t": 459.4917, "r": 145.7785, "b": 467.56146, "coord_origin": "1"}}, {"id": 48, "text": "Siddiqui, S.A., Fateh, I.A., Rizvi, S.T.R., Dengel, A., Ahmed, S.: Deeptabstr: Deep", "bbox": {"l": 150.08871, "t": 459.4917, "r": 480.59006, "b": 467.56146, "coord_origin": "1"}}, {"id": 49, "text": "learning based table structure recognition. In: 2019 International Conference on", "bbox": {"l": 151.51801, "t": 470.45071, "r": 480.59116, "b": 478.52048, "coord_origin": "1"}}, {"id": 50, "text": "Document Analysis and Recognition (ICDAR). pp. 1403-1409 (2019).", "bbox": {"l": 151.51801, "t": 481.4097, "r": 439.05963, "b": 489.47946, "coord_origin": "1"}}, {"id": 51, "text": "https://", "bbox": {"l": 442.94202000000007, "t": 482.05527, "r": 480.59372, "b": 489.52429, "coord_origin": "1"}}, {"id": 52, "text": "doi.org/10.1109/ICDAR.2019.00226", "bbox": {"l": 151.51801, "t": 493.01425, "r": 302.11584, "b": 500.48328, "coord_origin": "1"}}]}, "text": "13. Siddiqui, S.A., Fateh, I.A., Rizvi, S.T.R., Dengel, A., Ahmed, S.: Deeptabstr: Deep learning based table structure recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1403-1409 (2019). 
https:// doi.org/10.1109/ICDAR.2019.00226"}, {"label": "List-item", "id": 11, "page_no": 12, "cluster": {"id": 11, "label": "List-item", "bbox": {"l": 134.37410717010496, "t": 502.09606246948243, "r": 480.59286000000003, "b": 544.6769313812256, "coord_origin": "1"}, "confidence": 0.981369137763977, "cells": [{"id": 53, "text": "14.", "bbox": {"l": 134.76401, "t": 503.33868, "r": 146.15501, "b": 511.40845, "coord_origin": "1"}}, {"id": 54, "text": "Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive ta-", "bbox": {"l": 150.61252, "t": 503.33868, "r": 480.59088, "b": 511.40845, "coord_origin": "1"}}, {"id": 55, "text": "ble extraction from unstructured documents. In: Proceedings of the IEEE/CVF", "bbox": {"l": 151.51801, "t": 514.2977000000001, "r": 480.59286000000003, "b": 522.3674599999999, "coord_origin": "1"}}, {"id": 56, "text": "Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4634-4642", "bbox": {"l": 151.51801, "t": 525.25668, "r": 480.58838000000003, "b": 533.32645, "coord_origin": "1"}}, {"id": 57, "text": "(June 2022)", "bbox": {"l": 151.51801, "t": 536.21568, "r": 199.24704, "b": 544.28545, "coord_origin": "1"}}]}, "text": "14. Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive table extraction from unstructured documents. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 
4634-4642 (June 2022)"}, {"label": "List-item", "id": 12, "page_no": 12, "cluster": {"id": 12, "label": "List-item", "bbox": {"l": 134.60514450073242, "t": 546.2972602844238, "r": 480.6208276748657, "b": 610.09528, "coord_origin": "1"}, "confidence": 0.9817614555358887, "cells": [{"id": 58, "text": "15.", "bbox": {"l": 134.76401, "t": 547.18568, "r": 146.16588, "b": 555.25545, "coord_origin": "1"}}, {"id": 59, "text": "Staar, P.W.J., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A ma-", "bbox": {"l": 150.62764, "t": 547.18568, "r": 480.58734000000004, "b": 555.25545, "coord_origin": "1"}}, {"id": 60, "text": "chine learning platform to ingest documents at scale. In: Proceedings of the 24th", "bbox": {"l": 151.51801, "t": 558.14468, "r": 480.58838000000003, "b": 566.2144499999999, "coord_origin": "1"}}, {"id": 61, "text": "ACM SIGKDD International Conference on Knowledge Discovery & Data Min-", "bbox": {"l": 151.51801, "t": 569.1036799999999, "r": 480.59109, "b": 577.17345, "coord_origin": "1"}}, {"id": 62, "text": "ing. pp. 774-782. KDD \u201918, Association for Computing Machinery, New York, NY,", "bbox": {"l": 151.51801, "t": 580.06268, "r": 480.59195, "b": 588.1324500000001, "coord_origin": "1"}}, {"id": 63, "text": "USA (2018).", "bbox": {"l": 151.51801, "t": 591.0216800000001, "r": 200.75787, "b": 599.09145, "coord_origin": "1"}}, {"id": 64, "text": "https://doi.org/10.1145/3219819.3219834", "bbox": {"l": 202.916, "t": 591.66727, "r": 386.45911, "b": 599.1362799999999, "coord_origin": "1"}}, {"id": 65, "text": ",", "bbox": {"l": 386.45801, "t": 591.0216800000001, "r": 389.01703, "b": 599.09145, "coord_origin": "1"}}, {"id": 66, "text": "https://doi.org/10.", "bbox": {"l": 391.173, "t": 591.66727, "r": 480.59583, "b": 599.1362799999999, "coord_origin": "1"}}, {"id": 67, "text": "1145/3219819.3219834", "bbox": {"l": 151.51801, "t": 602.62627, "r": 245.63831, "b": 610.09528, "coord_origin": "1"}}]}, "text": "15. 
Staar, P.W.J., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: A machine learning platform to ingest documents at scale. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 774-782. KDD \u201918, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3219819.3219834 , https://doi.org/10. 1145/3219819.3219834"}, {"label": "List-item", "id": 13, "page_no": 12, "cluster": {"id": 13, "label": "List-item", "bbox": {"l": 134.76401, "t": 612.154292678833, "r": 480.59542999999996, "b": 632.0587142944336, "coord_origin": "1"}, "confidence": 0.9665940999984741, "cells": [{"id": 68, "text": "16.", "bbox": {"l": 134.76401, "t": 612.95068, "r": 146.62019, "b": 621.02045, "coord_origin": "1"}}, {"id": 69, "text": "Wang, X.: Tabular Abstraction, Editing, and Formatting. Ph.D. thesis, CAN", "bbox": {"l": 151.25977, "t": 612.95068, "r": 480.59542999999996, "b": 621.02045, "coord_origin": "1"}}, {"id": 70, "text": "(1996), aAINN09397", "bbox": {"l": 151.51801, "t": 623.90968, "r": 234.43031, "b": 631.97945, "coord_origin": "1"}}]}, "text": "16. Wang, X.: Tabular Abstraction, Editing, and Formatting. Ph.D. thesis, CAN (1996), aAINN09397"}, {"label": "List-item", "id": 14, "page_no": 12, "cluster": {"id": 14, "label": "List-item", "bbox": {"l": 134.76401, "t": 634.2881629943848, "r": 480.59119, "b": 665.3440200805663, "coord_origin": "1"}, "confidence": 0.9786126017570496, "cells": [{"id": 71, "text": "17.", "bbox": {"l": 134.76401, "t": 634.87968, "r": 146.30539, "b": 642.9494500000001, "coord_origin": "1"}}, {"id": 72, "text": "Xue, W., Li, Q., Tao, D.: Res2tim: Reconstruct syntactic structures from table", "bbox": {"l": 150.82175, "t": 634.87968, "r": 480.58731000000006, "b": 642.9494500000001, "coord_origin": "1"}}, {"id": 73, "text": "images. 
In: 2019 International Conference on Document Analysis and Recognition", "bbox": {"l": 151.51801, "t": 645.8386800000001, "r": 480.59119, "b": 653.90845, "coord_origin": "1"}}, {"id": 74, "text": "(ICDAR). pp. 749-755. IEEE (2019)", "bbox": {"l": 151.51801, "t": 656.79768, "r": 299.30307, "b": 664.86745, "coord_origin": "1"}}]}, "text": "17. Xue, W., Li, Q., Tao, D.: Res2tim: Reconstruct syntactic structures from table images. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 749-755. IEEE (2019)"}], "headers": [{"label": "Page-header", "id": 0, "page_no": 12, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 194.0724666595459, "t": 93.14808425903323, "r": 447.54291000000006, "b": 102.36713447570799, "coord_origin": "1"}, "confidence": 0.9549390077590942, "cells": [{"id": 0, "text": "Optimized Table Tokenization for Table Structure Recognition", "bbox": {"l": 194.478, "t": 93.77099999999996, "r": 447.54291000000006, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "Optimized Table Tokenization for Table Structure Recognition"}, {"label": "Page-header", "id": 1, "page_no": 12, "cluster": {"id": 1, "label": "Page-header", "bbox": {"l": 471.1661275863647, "t": 93.57991390228267, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.9042201042175293, "cells": [{"id": 1, "text": "13", "bbox": {"l": 471.37561, "t": 93.77099999999996, "r": 480.5894799999999, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "13"}]}}, {"page_no": 13, "page_hash": "435efd2ece1dfed60a8dcc1f7fd72dde2cb58c59f5aebc4d5ae2227510195b42", "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "14", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 
182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 3, "text": "18.", "bbox": {"l": 134.765, "t": 119.67400999999995, "r": 146.07936, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 4, "text": "Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: Tgrnet: A table graph reconstruc-", "bbox": {"l": 150.5069, "t": 119.67400999999995, "r": 480.5892, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 5, "text": "tion network for table structure recognition. In: Proceedings of the IEEE/CVF", "bbox": {"l": 151.51801, "t": 130.63300000000004, "r": 480.59113, "b": 138.70270000000005, "coord_origin": "1"}}, {"id": 6, "text": "International Conference on Computer Vision. pp. 1295-1304 (2021)", "bbox": {"l": 151.51801, "t": 141.59198000000004, "r": 427.53329, "b": 149.66168000000005, "coord_origin": "1"}}, {"id": 7, "text": "19.", "bbox": {"l": 134.765, "t": 152.55096000000003, "r": 146.19109, "b": 160.62067000000002, "coord_origin": "1"}}, {"id": 8, "text": "Ye, J., Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Xiao, R.: Pingan-vcgroup\u2019s", "bbox": {"l": 150.66234, "t": 152.55096000000003, "r": 480.5936899999999, "b": 160.62067000000002, "coord_origin": "1"}}, {"id": 9, "text": "solution for icdar 2021 competition on scientific literature parsing task b: Ta-", "bbox": {"l": 151.51801, "t": 163.50995, "r": 480.59469999999993, "b": 171.57965000000002, "coord_origin": "1"}}, {"id": 10, "text": "ble recognition to html (2021).", "bbox": {"l": 151.51801, "t": 174.46893, "r": 280.64047, "b": 182.53864, "coord_origin": "1"}}, {"id": 11, "text": "https://doi.org/10.48550/ARXIV.2105.01848", "bbox": {"l": 285.078, "t": 175.11450000000002, "r": 478.03403000000003, "b": 182.58349999999996, "coord_origin": "1"}}, {"id": 12, "text": ",", "bbox": {"l": 478.0319799999999, "t": 174.46893, "r": 480.59099999999995, "b": 182.53864, "coord_origin": "1"}}, {"id": 13, "text": "https://arxiv.org/abs/2105.01848", "bbox": {"l": 
151.51797, "t": 186.07349, "r": 302.11584, "b": 193.54247999999995, "coord_origin": "1"}}, {"id": 14, "text": "20.", "bbox": {"l": 134.76497, "t": 196.38689999999997, "r": 145.65964, "b": 204.45659999999998, "coord_origin": "1"}}, {"id": 15, "text": "Zhang, Z., Zhang, J., Du, J., Wang, F.: Split, embed and merge: An accurate table", "bbox": {"l": 149.92294, "t": 196.38689999999997, "r": 480.5935400000001, "b": 204.45659999999998, "coord_origin": "1"}}, {"id": 16, "text": "structure recognizer. Pattern Recognition", "bbox": {"l": 151.51797, "t": 207.34491000000003, "r": 318.55124, "b": 215.41461000000004, "coord_origin": "1"}}, {"id": 17, "text": "126", "bbox": {"l": 321.62097, "t": 207.2821, "r": 337.53186, "b": 215.20844, "coord_origin": "1"}}, {"id": 18, "text": ", 108565 (2022)", "bbox": {"l": 337.53296, "t": 207.34491000000003, "r": 399.46927, "b": 215.41461000000004, "coord_origin": "1"}}, {"id": 19, "text": "21.", "bbox": {"l": 134.76495, "t": 218.30389000000002, "r": 145.7213, "b": 226.3736, "coord_origin": "1"}}, {"id": 20, "text": "Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor", "bbox": {"l": 150.00871, "t": 218.30389000000002, "r": 480.59012, "b": 226.3736, "coord_origin": "1"}}, {"id": 21, "text": "(gte): A framework for joint table identification and cell structure recognition using", "bbox": {"l": 151.51796, "t": 229.26288, "r": 480.59102999999993, "b": 237.33258, "coord_origin": "1"}}, {"id": 22, "text": "visual context. In: 2021 IEEE Winter Conference on Applications of Computer Vi-", "bbox": {"l": 151.51796, "t": 240.22186, "r": 480.59119, "b": 248.29156, "coord_origin": "1"}}, {"id": 23, "text": "sion (WACV). pp. 
697-706 (2021).", "bbox": {"l": 151.51796, "t": 251.18084999999996, "r": 293.44086, "b": 259.25055, "coord_origin": "1"}}, {"id": 24, "text": "https://doi.org/10.1109/WACV48630.2021.", "bbox": {"l": 297.04996, "t": 251.82641999999998, "r": 480.59305000000006, "b": 259.29540999999995, "coord_origin": "1"}}, {"id": 25, "text": "00074", "bbox": {"l": 151.51796, "t": 262.7854, "r": 175.05028, "b": 270.25438999999994, "coord_origin": "1"}}, {"id": 26, "text": "22.", "bbox": {"l": 134.76495, "t": 273.09882000000005, "r": 146.36798, "b": 281.16855000000004, "coord_origin": "1"}}, {"id": 27, "text": "Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition:", "bbox": {"l": 150.90846, "t": 273.09882000000005, "r": 480.59094, "b": 281.16855000000004, "coord_origin": "1"}}, {"id": 28, "text": "Data, model, and evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M.", "bbox": {"l": 151.51796, "t": 284.05777, "r": 480.58832000000007, "b": 292.12753, "coord_origin": "1"}}, {"id": 29, "text": "(eds.) Computer Vision - ECCV 2020. pp. 564-580. Springer International Pub-", "bbox": {"l": 151.51796, "t": 295.01675, "r": 480.59558, "b": 303.08651999999995, "coord_origin": "1"}}, {"id": 30, "text": "lishing, Cham (2020)", "bbox": {"l": 151.51796, "t": 305.97574, "r": 236.02359, "b": 314.0455, "coord_origin": "1"}}, {"id": 31, "text": "23.", "bbox": {"l": 134.76495, "t": 316.93472, "r": 145.69547, "b": 325.00449000000003, "coord_origin": "1"}}, {"id": 32, "text": "Zhong, X., Tang, J., Yepes, A.J.: Publaynet: largest dataset ever for document lay-", "bbox": {"l": 149.97276, "t": 316.93472, "r": 480.59454, "b": 325.00449000000003, "coord_origin": "1"}}, {"id": 33, "text": "out analysis. In: 2019 International Conference on Document Analysis and Recog-", "bbox": {"l": 151.51796, "t": 327.8927299999999, "r": 480.59387000000004, "b": 335.96249, "coord_origin": "1"}}, {"id": 34, "text": "nition (ICDAR). pp. 1015-1022. 
IEEE (2019)", "bbox": {"l": 151.51796, "t": 338.85172, "r": 335.13635, "b": 346.92148, "coord_origin": "1"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "Page-header", "bbox": {"l": 134.765, "t": 92.97494831085203, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.6591073870658875, "cells": [{"id": 0, "text": "14", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}]}, {"id": 1, "label": "List-item", "bbox": {"l": 134.6354066848755, "t": 118.99243412017825, "r": 480.59113, "b": 150.7261442184448, "coord_origin": "1"}, "confidence": 0.977252721786499, "cells": [{"id": 3, "text": "18.", "bbox": {"l": 134.765, "t": 119.67400999999995, "r": 146.07936, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 4, "text": "Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: Tgrnet: A table graph reconstruc-", "bbox": {"l": 150.5069, "t": 119.67400999999995, "r": 480.5892, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 5, "text": "tion network for table structure recognition. In: Proceedings of the IEEE/CVF", "bbox": {"l": 151.51801, "t": 130.63300000000004, "r": 480.59113, "b": 138.70270000000005, "coord_origin": "1"}}, {"id": 6, "text": "International Conference on Computer Vision. pp. 
1295-1304 (2021)", "bbox": {"l": 151.51801, "t": 141.59198000000004, "r": 427.53329, "b": 149.66168000000005, "coord_origin": "1"}}]}, {"id": 2, "label": "List-item", "bbox": {"l": 134.765, "t": 151.89857425689695, "r": 480.9535074234009, "b": 193.6309467315674, "coord_origin": "1"}, "confidence": 0.9806671142578125, "cells": [{"id": 7, "text": "19.", "bbox": {"l": 134.765, "t": 152.55096000000003, "r": 146.19109, "b": 160.62067000000002, "coord_origin": "1"}}, {"id": 8, "text": "Ye, J., Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Xiao, R.: Pingan-vcgroup\u2019s", "bbox": {"l": 150.66234, "t": 152.55096000000003, "r": 480.5936899999999, "b": 160.62067000000002, "coord_origin": "1"}}, {"id": 9, "text": "solution for icdar 2021 competition on scientific literature parsing task b: Ta-", "bbox": {"l": 151.51801, "t": 163.50995, "r": 480.59469999999993, "b": 171.57965000000002, "coord_origin": "1"}}, {"id": 10, "text": "ble recognition to html (2021).", "bbox": {"l": 151.51801, "t": 174.46893, "r": 280.64047, "b": 182.53864, "coord_origin": "1"}}, {"id": 11, "text": "https://doi.org/10.48550/ARXIV.2105.01848", "bbox": {"l": 285.078, "t": 175.11450000000002, "r": 478.03403000000003, "b": 182.58349999999996, "coord_origin": "1"}}, {"id": 12, "text": ",", "bbox": {"l": 478.0319799999999, "t": 174.46893, "r": 480.59099999999995, "b": 182.53864, "coord_origin": "1"}}, {"id": 13, "text": "https://arxiv.org/abs/2105.01848", "bbox": {"l": 151.51797, "t": 186.07349, "r": 302.11584, "b": 193.54247999999995, "coord_origin": "1"}}]}, {"id": 3, "label": "List-item", "bbox": {"l": 134.35293531417847, "t": 195.45381202697752, "r": 480.5935400000001, "b": 215.6006504058838, "coord_origin": "1"}, "confidence": 0.9731975793838501, "cells": [{"id": 14, "text": "20.", "bbox": {"l": 134.76497, "t": 196.38689999999997, "r": 145.65964, "b": 204.45659999999998, "coord_origin": "1"}}, {"id": 15, "text": "Zhang, Z., Zhang, J., Du, J., Wang, F.: Split, embed and merge: An accurate table", "bbox": 
{"l": 149.92294, "t": 196.38689999999997, "r": 480.5935400000001, "b": 204.45659999999998, "coord_origin": "1"}}, {"id": 16, "text": "structure recognizer. Pattern Recognition", "bbox": {"l": 151.51797, "t": 207.34491000000003, "r": 318.55124, "b": 215.41461000000004, "coord_origin": "1"}}, {"id": 17, "text": "126", "bbox": {"l": 321.62097, "t": 207.2821, "r": 337.53186, "b": 215.20844, "coord_origin": "1"}}, {"id": 18, "text": ", 108565 (2022)", "bbox": {"l": 337.53296, "t": 207.34491000000003, "r": 399.46927, "b": 215.41461000000004, "coord_origin": "1"}}]}, {"id": 4, "label": "List-item", "bbox": {"l": 134.22648782730104, "t": 217.66440296173096, "r": 480.80447273254396, "b": 270.25438999999994, "coord_origin": "1"}, "confidence": 0.9837887287139893, "cells": [{"id": 19, "text": "21.", "bbox": {"l": 134.76495, "t": 218.30389000000002, "r": 145.7213, "b": 226.3736, "coord_origin": "1"}}, {"id": 20, "text": "Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor", "bbox": {"l": 150.00871, "t": 218.30389000000002, "r": 480.59012, "b": 226.3736, "coord_origin": "1"}}, {"id": 21, "text": "(gte): A framework for joint table identification and cell structure recognition using", "bbox": {"l": 151.51796, "t": 229.26288, "r": 480.59102999999993, "b": 237.33258, "coord_origin": "1"}}, {"id": 22, "text": "visual context. In: 2021 IEEE Winter Conference on Applications of Computer Vi-", "bbox": {"l": 151.51796, "t": 240.22186, "r": 480.59119, "b": 248.29156, "coord_origin": "1"}}, {"id": 23, "text": "sion (WACV). pp. 
697-706 (2021).", "bbox": {"l": 151.51796, "t": 251.18084999999996, "r": 293.44086, "b": 259.25055, "coord_origin": "1"}}, {"id": 24, "text": "https://doi.org/10.1109/WACV48630.2021.", "bbox": {"l": 297.04996, "t": 251.82641999999998, "r": 480.59305000000006, "b": 259.29540999999995, "coord_origin": "1"}}, {"id": 25, "text": "00074", "bbox": {"l": 151.51796, "t": 262.7854, "r": 175.05028, "b": 270.25438999999994, "coord_origin": "1"}}]}, {"id": 5, "label": "List-item", "bbox": {"l": 133.9917151451111, "t": 272.07529678344736, "r": 480.59558, "b": 314.3335727691651, "coord_origin": "1"}, "confidence": 0.9830114841461182, "cells": [{"id": 26, "text": "22.", "bbox": {"l": 134.76495, "t": 273.09882000000005, "r": 146.36798, "b": 281.16855000000004, "coord_origin": "1"}}, {"id": 27, "text": "Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition:", "bbox": {"l": 150.90846, "t": 273.09882000000005, "r": 480.59094, "b": 281.16855000000004, "coord_origin": "1"}}, {"id": 28, "text": "Data, model, and evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M.", "bbox": {"l": 151.51796, "t": 284.05777, "r": 480.58832000000007, "b": 292.12753, "coord_origin": "1"}}, {"id": 29, "text": "(eds.) Computer Vision - ECCV 2020. pp. 564-580. 
Springer International Pub-", "bbox": {"l": 151.51796, "t": 295.01675, "r": 480.59558, "b": 303.08651999999995, "coord_origin": "1"}}, {"id": 30, "text": "lishing, Cham (2020)", "bbox": {"l": 151.51796, "t": 305.97574, "r": 236.02359, "b": 314.0455, "coord_origin": "1"}}]}, {"id": 6, "label": "List-item", "bbox": {"l": 134.23336029052734, "t": 316.3024206161499, "r": 480.59454, "b": 347.29821166992184, "coord_origin": "1"}, "confidence": 0.9820854067802429, "cells": [{"id": 31, "text": "23.", "bbox": {"l": 134.76495, "t": 316.93472, "r": 145.69547, "b": 325.00449000000003, "coord_origin": "1"}}, {"id": 32, "text": "Zhong, X., Tang, J., Yepes, A.J.: Publaynet: largest dataset ever for document lay-", "bbox": {"l": 149.97276, "t": 316.93472, "r": 480.59454, "b": 325.00449000000003, "coord_origin": "1"}}, {"id": 33, "text": "out analysis. In: 2019 International Conference on Document Analysis and Recog-", "bbox": {"l": 151.51796, "t": 327.8927299999999, "r": 480.59387000000004, "b": 335.96249, "coord_origin": "1"}}, {"id": 34, "text": "nition (ICDAR). pp. 1015-1022. 
IEEE (2019)", "bbox": {"l": 151.51796, "t": 338.85172, "r": 335.13635, "b": 346.92148, "coord_origin": "1"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "Page-header", "id": 0, "page_no": 13, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.765, "t": 92.97494831085203, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.6591073870658875, "cells": [{"id": 0, "text": "14", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "14 M. Lysak, et al."}, {"label": "List-item", "id": 1, "page_no": 13, "cluster": {"id": 1, "label": "List-item", "bbox": {"l": 134.6354066848755, "t": 118.99243412017825, "r": 480.59113, "b": 150.7261442184448, "coord_origin": "1"}, "confidence": 0.977252721786499, "cells": [{"id": 3, "text": "18.", "bbox": {"l": 134.765, "t": 119.67400999999995, "r": 146.07936, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 4, "text": "Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: Tgrnet: A table graph reconstruc-", "bbox": {"l": 150.5069, "t": 119.67400999999995, "r": 480.5892, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 5, "text": "tion network for table structure recognition. In: Proceedings of the IEEE/CVF", "bbox": {"l": 151.51801, "t": 130.63300000000004, "r": 480.59113, "b": 138.70270000000005, "coord_origin": "1"}}, {"id": 6, "text": "International Conference on Computer Vision. pp. 1295-1304 (2021)", "bbox": {"l": 151.51801, "t": 141.59198000000004, "r": 427.53329, "b": 149.66168000000005, "coord_origin": "1"}}]}, "text": "18. 
Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: Tgrnet: A table graph reconstruction network for table structure recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1295-1304 (2021)"}, {"label": "List-item", "id": 2, "page_no": 13, "cluster": {"id": 2, "label": "List-item", "bbox": {"l": 134.765, "t": 151.89857425689695, "r": 480.9535074234009, "b": 193.6309467315674, "coord_origin": "1"}, "confidence": 0.9806671142578125, "cells": [{"id": 7, "text": "19.", "bbox": {"l": 134.765, "t": 152.55096000000003, "r": 146.19109, "b": 160.62067000000002, "coord_origin": "1"}}, {"id": 8, "text": "Ye, J., Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Xiao, R.: Pingan-vcgroup\u2019s", "bbox": {"l": 150.66234, "t": 152.55096000000003, "r": 480.5936899999999, "b": 160.62067000000002, "coord_origin": "1"}}, {"id": 9, "text": "solution for icdar 2021 competition on scientific literature parsing task b: Ta-", "bbox": {"l": 151.51801, "t": 163.50995, "r": 480.59469999999993, "b": 171.57965000000002, "coord_origin": "1"}}, {"id": 10, "text": "ble recognition to html (2021).", "bbox": {"l": 151.51801, "t": 174.46893, "r": 280.64047, "b": 182.53864, "coord_origin": "1"}}, {"id": 11, "text": "https://doi.org/10.48550/ARXIV.2105.01848", "bbox": {"l": 285.078, "t": 175.11450000000002, "r": 478.03403000000003, "b": 182.58349999999996, "coord_origin": "1"}}, {"id": 12, "text": ",", "bbox": {"l": 478.0319799999999, "t": 174.46893, "r": 480.59099999999995, "b": 182.53864, "coord_origin": "1"}}, {"id": 13, "text": "https://arxiv.org/abs/2105.01848", "bbox": {"l": 151.51797, "t": 186.07349, "r": 302.11584, "b": 193.54247999999995, "coord_origin": "1"}}]}, "text": "19. Ye, J., Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Xiao, R.: Pingan-vcgroup\u2019s solution for icdar 2021 competition on scientific literature parsing task b: Table recognition to html (2021). 
https://doi.org/10.48550/ARXIV.2105.01848 , https://arxiv.org/abs/2105.01848"}, {"label": "List-item", "id": 3, "page_no": 13, "cluster": {"id": 3, "label": "List-item", "bbox": {"l": 134.35293531417847, "t": 195.45381202697752, "r": 480.5935400000001, "b": 215.6006504058838, "coord_origin": "1"}, "confidence": 0.9731975793838501, "cells": [{"id": 14, "text": "20.", "bbox": {"l": 134.76497, "t": 196.38689999999997, "r": 145.65964, "b": 204.45659999999998, "coord_origin": "1"}}, {"id": 15, "text": "Zhang, Z., Zhang, J., Du, J., Wang, F.: Split, embed and merge: An accurate table", "bbox": {"l": 149.92294, "t": 196.38689999999997, "r": 480.5935400000001, "b": 204.45659999999998, "coord_origin": "1"}}, {"id": 16, "text": "structure recognizer. Pattern Recognition", "bbox": {"l": 151.51797, "t": 207.34491000000003, "r": 318.55124, "b": 215.41461000000004, "coord_origin": "1"}}, {"id": 17, "text": "126", "bbox": {"l": 321.62097, "t": 207.2821, "r": 337.53186, "b": 215.20844, "coord_origin": "1"}}, {"id": 18, "text": ", 108565 (2022)", "bbox": {"l": 337.53296, "t": 207.34491000000003, "r": 399.46927, "b": 215.41461000000004, "coord_origin": "1"}}]}, "text": "20. Zhang, Z., Zhang, J., Du, J., Wang, F.: Split, embed and merge: An accurate table structure recognizer. 
Pattern Recognition 126 , 108565 (2022)"}, {"label": "List-item", "id": 4, "page_no": 13, "cluster": {"id": 4, "label": "List-item", "bbox": {"l": 134.22648782730104, "t": 217.66440296173096, "r": 480.80447273254396, "b": 270.25438999999994, "coord_origin": "1"}, "confidence": 0.9837887287139893, "cells": [{"id": 19, "text": "21.", "bbox": {"l": 134.76495, "t": 218.30389000000002, "r": 145.7213, "b": 226.3736, "coord_origin": "1"}}, {"id": 20, "text": "Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor", "bbox": {"l": 150.00871, "t": 218.30389000000002, "r": 480.59012, "b": 226.3736, "coord_origin": "1"}}, {"id": 21, "text": "(gte): A framework for joint table identification and cell structure recognition using", "bbox": {"l": 151.51796, "t": 229.26288, "r": 480.59102999999993, "b": 237.33258, "coord_origin": "1"}}, {"id": 22, "text": "visual context. In: 2021 IEEE Winter Conference on Applications of Computer Vi-", "bbox": {"l": 151.51796, "t": 240.22186, "r": 480.59119, "b": 248.29156, "coord_origin": "1"}}, {"id": 23, "text": "sion (WACV). pp. 697-706 (2021).", "bbox": {"l": 151.51796, "t": 251.18084999999996, "r": 293.44086, "b": 259.25055, "coord_origin": "1"}}, {"id": 24, "text": "https://doi.org/10.1109/WACV48630.2021.", "bbox": {"l": 297.04996, "t": 251.82641999999998, "r": 480.59305000000006, "b": 259.29540999999995, "coord_origin": "1"}}, {"id": 25, "text": "00074", "bbox": {"l": 151.51796, "t": 262.7854, "r": 175.05028, "b": 270.25438999999994, "coord_origin": "1"}}]}, "text": "21. Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). pp. 697-706 (2021). https://doi.org/10.1109/WACV48630.2021. 
00074"}, {"label": "List-item", "id": 5, "page_no": 13, "cluster": {"id": 5, "label": "List-item", "bbox": {"l": 133.9917151451111, "t": 272.07529678344736, "r": 480.59558, "b": 314.3335727691651, "coord_origin": "1"}, "confidence": 0.9830114841461182, "cells": [{"id": 26, "text": "22.", "bbox": {"l": 134.76495, "t": 273.09882000000005, "r": 146.36798, "b": 281.16855000000004, "coord_origin": "1"}}, {"id": 27, "text": "Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition:", "bbox": {"l": 150.90846, "t": 273.09882000000005, "r": 480.59094, "b": 281.16855000000004, "coord_origin": "1"}}, {"id": 28, "text": "Data, model, and evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M.", "bbox": {"l": 151.51796, "t": 284.05777, "r": 480.58832000000007, "b": 292.12753, "coord_origin": "1"}}, {"id": 29, "text": "(eds.) Computer Vision - ECCV 2020. pp. 564-580. Springer International Pub-", "bbox": {"l": 151.51796, "t": 295.01675, "r": 480.59558, "b": 303.08651999999995, "coord_origin": "1"}}, {"id": 30, "text": "lishing, Cham (2020)", "bbox": {"l": 151.51796, "t": 305.97574, "r": 236.02359, "b": 314.0455, "coord_origin": "1"}}]}, "text": "22. Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition: Data, model, and evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision - ECCV 2020. pp. 564-580. 
Springer International Publishing, Cham (2020)"}, {"label": "List-item", "id": 6, "page_no": 13, "cluster": {"id": 6, "label": "List-item", "bbox": {"l": 134.23336029052734, "t": 316.3024206161499, "r": 480.59454, "b": 347.29821166992184, "coord_origin": "1"}, "confidence": 0.9820854067802429, "cells": [{"id": 31, "text": "23.", "bbox": {"l": 134.76495, "t": 316.93472, "r": 145.69547, "b": 325.00449000000003, "coord_origin": "1"}}, {"id": 32, "text": "Zhong, X., Tang, J., Yepes, A.J.: Publaynet: largest dataset ever for document lay-", "bbox": {"l": 149.97276, "t": 316.93472, "r": 480.59454, "b": 325.00449000000003, "coord_origin": "1"}}, {"id": 33, "text": "out analysis. In: 2019 International Conference on Document Analysis and Recog-", "bbox": {"l": 151.51796, "t": 327.8927299999999, "r": 480.59387000000004, "b": 335.96249, "coord_origin": "1"}}, {"id": 34, "text": "nition (ICDAR). pp. 1015-1022. IEEE (2019)", "bbox": {"l": 151.51796, "t": 338.85172, "r": 335.13635, "b": 346.92148, "coord_origin": "1"}}]}, "text": "23. Zhong, X., Tang, J., Yepes, A.J.: Publaynet: largest dataset ever for document layout analysis. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1015-1022. IEEE (2019)"}], "body": [{"label": "List-item", "id": 1, "page_no": 13, "cluster": {"id": 1, "label": "List-item", "bbox": {"l": 134.6354066848755, "t": 118.99243412017825, "r": 480.59113, "b": 150.7261442184448, "coord_origin": "1"}, "confidence": 0.977252721786499, "cells": [{"id": 3, "text": "18.", "bbox": {"l": 134.765, "t": 119.67400999999995, "r": 146.07936, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 4, "text": "Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: Tgrnet: A table graph reconstruc-", "bbox": {"l": 150.5069, "t": 119.67400999999995, "r": 480.5892, "b": 127.74370999999985, "coord_origin": "1"}}, {"id": 5, "text": "tion network for table structure recognition. 
In: Proceedings of the IEEE/CVF", "bbox": {"l": 151.51801, "t": 130.63300000000004, "r": 480.59113, "b": 138.70270000000005, "coord_origin": "1"}}, {"id": 6, "text": "International Conference on Computer Vision. pp. 1295-1304 (2021)", "bbox": {"l": 151.51801, "t": 141.59198000000004, "r": 427.53329, "b": 149.66168000000005, "coord_origin": "1"}}]}, "text": "18. Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: Tgrnet: A table graph reconstruction network for table structure recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1295-1304 (2021)"}, {"label": "List-item", "id": 2, "page_no": 13, "cluster": {"id": 2, "label": "List-item", "bbox": {"l": 134.765, "t": 151.89857425689695, "r": 480.9535074234009, "b": 193.6309467315674, "coord_origin": "1"}, "confidence": 0.9806671142578125, "cells": [{"id": 7, "text": "19.", "bbox": {"l": 134.765, "t": 152.55096000000003, "r": 146.19109, "b": 160.62067000000002, "coord_origin": "1"}}, {"id": 8, "text": "Ye, J., Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Xiao, R.: Pingan-vcgroup\u2019s", "bbox": {"l": 150.66234, "t": 152.55096000000003, "r": 480.5936899999999, "b": 160.62067000000002, "coord_origin": "1"}}, {"id": 9, "text": "solution for icdar 2021 competition on scientific literature parsing task b: Ta-", "bbox": {"l": 151.51801, "t": 163.50995, "r": 480.59469999999993, "b": 171.57965000000002, "coord_origin": "1"}}, {"id": 10, "text": "ble recognition to html (2021).", "bbox": {"l": 151.51801, "t": 174.46893, "r": 280.64047, "b": 182.53864, "coord_origin": "1"}}, {"id": 11, "text": "https://doi.org/10.48550/ARXIV.2105.01848", "bbox": {"l": 285.078, "t": 175.11450000000002, "r": 478.03403000000003, "b": 182.58349999999996, "coord_origin": "1"}}, {"id": 12, "text": ",", "bbox": {"l": 478.0319799999999, "t": 174.46893, "r": 480.59099999999995, "b": 182.53864, "coord_origin": "1"}}, {"id": 13, "text": "https://arxiv.org/abs/2105.01848", "bbox": {"l": 151.51797, "t": 186.07349, "r": 
302.11584, "b": 193.54247999999995, "coord_origin": "1"}}]}, "text": "19. Ye, J., Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Xiao, R.: Pingan-vcgroup\u2019s solution for icdar 2021 competition on scientific literature parsing task b: Table recognition to html (2021). https://doi.org/10.48550/ARXIV.2105.01848 , https://arxiv.org/abs/2105.01848"}, {"label": "List-item", "id": 3, "page_no": 13, "cluster": {"id": 3, "label": "List-item", "bbox": {"l": 134.35293531417847, "t": 195.45381202697752, "r": 480.5935400000001, "b": 215.6006504058838, "coord_origin": "1"}, "confidence": 0.9731975793838501, "cells": [{"id": 14, "text": "20.", "bbox": {"l": 134.76497, "t": 196.38689999999997, "r": 145.65964, "b": 204.45659999999998, "coord_origin": "1"}}, {"id": 15, "text": "Zhang, Z., Zhang, J., Du, J., Wang, F.: Split, embed and merge: An accurate table", "bbox": {"l": 149.92294, "t": 196.38689999999997, "r": 480.5935400000001, "b": 204.45659999999998, "coord_origin": "1"}}, {"id": 16, "text": "structure recognizer. Pattern Recognition", "bbox": {"l": 151.51797, "t": 207.34491000000003, "r": 318.55124, "b": 215.41461000000004, "coord_origin": "1"}}, {"id": 17, "text": "126", "bbox": {"l": 321.62097, "t": 207.2821, "r": 337.53186, "b": 215.20844, "coord_origin": "1"}}, {"id": 18, "text": ", 108565 (2022)", "bbox": {"l": 337.53296, "t": 207.34491000000003, "r": 399.46927, "b": 215.41461000000004, "coord_origin": "1"}}]}, "text": "20. Zhang, Z., Zhang, J., Du, J., Wang, F.: Split, embed and merge: An accurate table structure recognizer. 
Pattern Recognition 126 , 108565 (2022)"}, {"label": "List-item", "id": 4, "page_no": 13, "cluster": {"id": 4, "label": "List-item", "bbox": {"l": 134.22648782730104, "t": 217.66440296173096, "r": 480.80447273254396, "b": 270.25438999999994, "coord_origin": "1"}, "confidence": 0.9837887287139893, "cells": [{"id": 19, "text": "21.", "bbox": {"l": 134.76495, "t": 218.30389000000002, "r": 145.7213, "b": 226.3736, "coord_origin": "1"}}, {"id": 20, "text": "Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor", "bbox": {"l": 150.00871, "t": 218.30389000000002, "r": 480.59012, "b": 226.3736, "coord_origin": "1"}}, {"id": 21, "text": "(gte): A framework for joint table identification and cell structure recognition using", "bbox": {"l": 151.51796, "t": 229.26288, "r": 480.59102999999993, "b": 237.33258, "coord_origin": "1"}}, {"id": 22, "text": "visual context. In: 2021 IEEE Winter Conference on Applications of Computer Vi-", "bbox": {"l": 151.51796, "t": 240.22186, "r": 480.59119, "b": 248.29156, "coord_origin": "1"}}, {"id": 23, "text": "sion (WACV). pp. 697-706 (2021).", "bbox": {"l": 151.51796, "t": 251.18084999999996, "r": 293.44086, "b": 259.25055, "coord_origin": "1"}}, {"id": 24, "text": "https://doi.org/10.1109/WACV48630.2021.", "bbox": {"l": 297.04996, "t": 251.82641999999998, "r": 480.59305000000006, "b": 259.29540999999995, "coord_origin": "1"}}, {"id": 25, "text": "00074", "bbox": {"l": 151.51796, "t": 262.7854, "r": 175.05028, "b": 270.25438999999994, "coord_origin": "1"}}]}, "text": "21. Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). pp. 697-706 (2021). https://doi.org/10.1109/WACV48630.2021. 
00074"}, {"label": "List-item", "id": 5, "page_no": 13, "cluster": {"id": 5, "label": "List-item", "bbox": {"l": 133.9917151451111, "t": 272.07529678344736, "r": 480.59558, "b": 314.3335727691651, "coord_origin": "1"}, "confidence": 0.9830114841461182, "cells": [{"id": 26, "text": "22.", "bbox": {"l": 134.76495, "t": 273.09882000000005, "r": 146.36798, "b": 281.16855000000004, "coord_origin": "1"}}, {"id": 27, "text": "Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition:", "bbox": {"l": 150.90846, "t": 273.09882000000005, "r": 480.59094, "b": 281.16855000000004, "coord_origin": "1"}}, {"id": 28, "text": "Data, model, and evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M.", "bbox": {"l": 151.51796, "t": 284.05777, "r": 480.58832000000007, "b": 292.12753, "coord_origin": "1"}}, {"id": 29, "text": "(eds.) Computer Vision - ECCV 2020. pp. 564-580. Springer International Pub-", "bbox": {"l": 151.51796, "t": 295.01675, "r": 480.59558, "b": 303.08651999999995, "coord_origin": "1"}}, {"id": 30, "text": "lishing, Cham (2020)", "bbox": {"l": 151.51796, "t": 305.97574, "r": 236.02359, "b": 314.0455, "coord_origin": "1"}}]}, "text": "22. Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition: Data, model, and evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision - ECCV 2020. pp. 564-580. 
Springer International Publishing, Cham (2020)"}, {"label": "List-item", "id": 6, "page_no": 13, "cluster": {"id": 6, "label": "List-item", "bbox": {"l": 134.23336029052734, "t": 316.3024206161499, "r": 480.59454, "b": 347.29821166992184, "coord_origin": "1"}, "confidence": 0.9820854067802429, "cells": [{"id": 31, "text": "23.", "bbox": {"l": 134.76495, "t": 316.93472, "r": 145.69547, "b": 325.00449000000003, "coord_origin": "1"}}, {"id": 32, "text": "Zhong, X., Tang, J., Yepes, A.J.: Publaynet: largest dataset ever for document lay-", "bbox": {"l": 149.97276, "t": 316.93472, "r": 480.59454, "b": 325.00449000000003, "coord_origin": "1"}}, {"id": 33, "text": "out analysis. In: 2019 International Conference on Document Analysis and Recog-", "bbox": {"l": 151.51796, "t": 327.8927299999999, "r": 480.59387000000004, "b": 335.96249, "coord_origin": "1"}}, {"id": 34, "text": "nition (ICDAR). pp. 1015-1022. IEEE (2019)", "bbox": {"l": 151.51796, "t": 338.85172, "r": 335.13635, "b": 346.92148, "coord_origin": "1"}}]}, "text": "23. Zhong, X., Tang, J., Yepes, A.J.: Publaynet: largest dataset ever for document layout analysis. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1015-1022. IEEE (2019)"}], "headers": [{"label": "Page-header", "id": 0, "page_no": 13, "cluster": {"id": 0, "label": "Page-header", "bbox": {"l": 134.765, "t": 92.97494831085203, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}, "confidence": 0.6591073870658875, "cells": [{"id": 0, "text": "14", "bbox": {"l": 134.765, "t": 93.77099999999996, "r": 143.97887, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 1, "text": "M.", "bbox": {"l": 167.82053, "t": 93.77099999999996, "r": 178.08249, "b": 101.84069999999997, "coord_origin": "1"}}, {"id": 2, "text": "Lysak, et al.", "bbox": {"l": 182.37929, "t": 93.77099999999996, "r": 231.72049000000004, "b": 101.84069999999997, "coord_origin": "1"}}]}, "text": "14 M. Lysak, et al."}]}}]