Docling/tests/data/groundtruth/docling_v1/2203.01017v2.pages.json
Christoph Auer 7d3be0edeb
feat!: Docling v2 (#117)
---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-16 21:02:03 +02:00


[{"page_no": 0, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "TableFormer: Table Structure Understanding with Transformers.", "bbox": {"l": 96.301003, "t": 107.03412000000003, "r": 498.92708999999996, "b": 119.93133999999998, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar", "bbox": {"l": 142.47701, "t": 146.68535999999995, "r": 452.75027, "b": 157.37334999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "IBM Research", "bbox": {"l": 262.918, "t": 160.63239, "r": 332.30597, "b": 171.32037000000003, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "{", "bbox": {"l": 208.123, "t": 175.96123999999998, "r": 212.73083, "b": 184.42553999999996, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "ahn,nli,mly,taa", "bbox": {"l": 212.73, "t": 177.08203000000003, "r": 293.42761, "b": 184.00409000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "}", "bbox": {"l": 293.42798, "t": 175.96123999999998, "r": 298.0358, "b": 184.42553999999996, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "@zurich.ibm.com", "bbox": {"l": 298.03497, "t": 177.08203000000003, "r": 378.73257, "b": 184.00409000000002, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Abstract", "bbox": {"l": 145.99498, "t": 215.48297000000002, "r": 190.48029, "b": 226.23071000000004, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "Tables organize valuable content in a concise and com-", "bbox": {"l": 62.066978, "t": 241.39508, "r": 286.36493, "b": 249.98284999999998, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "pact representation. This content is extremely valuable for", "bbox": {"l": 50.111977, "t": 253.3501, "r": 286.36508, "b": 261.93787, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "systems such as search engines, Knowledge Graph\u2019s, etc,", "bbox": {"l": 50.111977, "t": 265.30511, "r": 286.36508, "b": 273.89288, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "since they enhance their predictive capabilities. 
Unfortu-", "bbox": {"l": 50.111977, "t": 277.26111000000003, "r": 286.36505, "b": 285.84888, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "nately, tables come in a large variety of shapes and sizes.", "bbox": {"l": 50.111977, "t": 289.21609, "r": 286.36505, "b": 297.80386, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Furthermore, they can have complex column/row-header", "bbox": {"l": 50.111977, "t": 301.17108, "r": 286.36505, "b": 309.75884999999994, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "configurations, multiline rows, different variety of separa-", "bbox": {"l": 50.111977, "t": 313.12607, "r": 286.36508, "b": 321.71384, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "tion lines, missing entries, etc. As such, the correct iden-", "bbox": {"l": 50.111977, "t": 325.08105, "r": 286.36508, "b": 333.66882, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "tification of the table-structure from an image is a non-", "bbox": {"l": 50.111977, "t": 337.03604, "r": 286.36505, "b": 345.62381, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "trivial task. In this paper, we present a new table-structure", "bbox": {"l": 50.111977, "t": 348.99203, "r": 286.36508, "b": 357.5798, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "identification model. The latter improves the latest end-to-", "bbox": {"l": 50.111977, "t": 360.94701999999995, "r": 286.36505, "b": 369.53479, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "end deep learning model (i.e. encoder-dual-decoder from", "bbox": {"l": 50.111977, "t": 372.90201, "r": 286.36508, "b": 381.48978, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "PubTabNet) in two significant ways. First, we introduce a", "bbox": {"l": 50.111977, "t": 384.85699, "r": 286.36505, "b": 393.44476, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "new object detection decoder for table-cells. 
In this way,", "bbox": {"l": 50.111977, "t": 396.81198, "r": 286.36511, "b": 405.39975000000004, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "we can obtain the content of the table-cells from program-", "bbox": {"l": 50.111977, "t": 408.76697, "r": 286.36508, "b": 417.35474, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "matic PDF\u2019s directly from the PDF source and avoid the", "bbox": {"l": 50.111977, "t": 420.72296000000006, "r": 286.36505, "b": 429.31073, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "training of the custom OCR decoders.", "bbox": {"l": 50.111977, "t": 432.67795, "r": 207.23216, "b": 441.26572, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "This architectural", "bbox": {"l": 214.09639, "t": 432.67795, "r": 286.36508, "b": 441.26572, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "change leads to more accurate table-content extraction and", "bbox": {"l": 50.111977, "t": 444.63293, "r": 286.36508, "b": 453.2207, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "allows us to tackle non-english tables. Second, we replace", "bbox": {"l": 50.111977, "t": 456.58792000000005, "r": 286.36505, "b": 465.17569, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "the LSTM decoders with transformer based decoders. 
This", "bbox": {"l": 50.111977, "t": 468.54291, "r": 286.36505, "b": 477.13068, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "upgrade improves significantly the previous state-of-the-art", "bbox": {"l": 50.111977, "t": 480.4989, "r": 286.36508, "b": 489.08667, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "tree-editing-distance-score (TEDS) from 91% to 98.5% on", "bbox": {"l": 50.111977, "t": 492.45389, "r": 286.36505, "b": 501.04166, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "simple tables and from 88.7% to 95% on complex tables.", "bbox": {"l": 50.111977, "t": 504.40887, "r": 276.65152, "b": 512.9966400000001, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "1.", "bbox": {"l": 50.111977, "t": 539.94276, "r": 58.121296, "b": 550.69049, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Introduction", "bbox": {"l": 68.800385, "t": 539.94276, "r": 126.94804, "b": 550.69049, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "The occurrence of tables in documents is ubiquitous.", "bbox": {"l": 62.066978, "t": 560.7832, "r": 286.36496, "b": 569.68976, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "They often summarise quantitative or factual data, which is", "bbox": {"l": 50.111977, "t": 572.7382, "r": 286.36508, "b": 581.64476, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "cumbersome to describe in verbose text but nevertheless ex-", "bbox": {"l": 50.111977, "t": 584.69321, "r": 286.36505, "b": 593.5997600000001, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "tremely valuable. Unfortunately, this compact representa-", "bbox": {"l": 50.111977, "t": 596.6492000000001, "r": 286.36505, "b": 605.55576, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "tion is often not easy to parse by machines. 
There are many", "bbox": {"l": 50.111977, "t": 608.6042, "r": 286.36505, "b": 617.51076, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "implicit conventions used to obtain a compact table repre-", "bbox": {"l": 50.111977, "t": 620.5592, "r": 286.36505, "b": 629.46576, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "sentation. For example, tables often have complex column-", "bbox": {"l": 50.111977, "t": 632.51421, "r": 286.36508, "b": 641.42076, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "and row-headers in order to reduce duplicated cell content.", "bbox": {"l": 50.111977, "t": 644.46921, "r": 286.36508, "b": 653.37576, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "Lines of different shapes and sizes are leveraged to separate", "bbox": {"l": 50.111977, "t": 656.42421, "r": 286.36502, "b": 665.33077, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "content or indicate a tree structure. Additionally, tables can", "bbox": {"l": 50.111977, "t": 668.3802000000001, "r": 286.36505, "b": 677.28677, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "also have empty/missing table-entries or multi-row textual", "bbox": {"l": 50.111977, "t": 680.33521, "r": 286.36505, "b": 689.2417800000001, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "table-entries. Fig. 
1 shows a table which presents all these", "bbox": {"l": 50.111977, "t": 692.290207, "r": 286.36505, "b": 701.196777, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "issues.", "bbox": {"l": 50.111977, "t": 704.245209, "r": 76.403275, "b": 713.151779, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "a.", "bbox": {"l": 315.56702, "t": 218.00684, "r": 324.01007, "b": 226.75482, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Picture of a table:", "bbox": {"l": 328.2316, "t": 218.00684, "r": 408.4407, "b": 226.75482, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "b.", "bbox": {"l": 315.56702, "t": 313.69478999999995, "r": 325.05786, "b": 322.44281, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Red-annotation of bounding boxes,", "bbox": {"l": 329.80325, "t": 313.69478999999995, "r": 486.40194999999994, "b": 322.44281, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "Blue-predictions by TableFormer", "bbox": {"l": 326.46252, "t": 324.49478, "r": 472.47411999999997, "b": 333.2428, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "c.", "bbox": {"l": 315.56702, "t": 420.1828, "r": 324.81039, "b": 428.93082, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "Structure predicted by TableFormer:", "bbox": {"l": 329.4321, "t": 420.1828, "r": 491.1912500000001, "b": 428.93082, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "1", "bbox": {"l": 408.14752, "t": 342.82828, "r": 412.54001, "b": 351.61322, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "0", "bbox": {"l": 356.11011, "t": 341.57217, "r": 360.50259, "b": 350.35712, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "2", "bbox": {"l": 500.6777, "t": 340.93768, "r": 505.0701900000001, "b": 349.7226299999999, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "3", "bbox": {"l": 356.13382, "t": 351.74789, "r": 360.52631, "b": 360.53284, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "4", "bbox": {"l": 402.53992, "t": 355.8765, "r": 406.9324, "b": 364.66144, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "5", "bbox": {"l": 
448.58178999999996, "t": 352.84018, "r": 452.97427, "b": 361.62512, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "6", "bbox": {"l": 491.65161000000006, "t": 353.70657, "r": 496.0441, "b": 362.49152, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "7", "bbox": {"l": 535.13843, "t": 353.33969, "r": 539.53088, "b": 362.12463, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "8", "bbox": {"l": 348.82822, "t": 387.09781, "r": 353.2207, "b": 395.88275, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "9", "bbox": {"l": 389.27151, "t": 375.37228, "r": 393.664, "b": 384.15723, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "10", "bbox": {"l": 442.67479999999995, "t": 375.64621, "r": 451.45889000000005, "b": 384.43115, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "11", "bbox": {"l": 477.4382299999999, "t": 375.534, "r": 485.90167, "b": 384.31894000000005, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "12", "bbox": {"l": 522.57263, "t": 375.64621, "r": 531.35669, "b": 384.43115, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "13", "bbox": {"l": 400.22992, "t": 387.11429, "r": 409.01401, "b": 395.89923, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "14", "bbox": {"l": 442.30792, "t": 386.98981000000003, "r": 451.0920100000001, "b": 395.77475000000004, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "15", "bbox": {"l": 478.21941999999996, "t": 387.37469, "r": 487.00351000000006, "b": 396.15964, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "16", "bbox": {"l": 523.2287, "t": 386.98981000000003, "r": 532.01276, "b": 395.77475000000004, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "1", "bbox": {"l": 411.57233, "t": 399.42477, "r": 415.96481, "b": 408.20972, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "7", "bbox": {"l": 415.96393, "t": 399.42477, "r": 420.35641, "b": 408.20972, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "18", "bbox": {"l": 442.30521, "t": 399.0371999999999, "r": 451.08929, "b": 407.82213999999993, "coord_origin": "TOPLEFT"}}, {"id": 74, 
"text": "19", "bbox": {"l": 478.77893, "t": 398.99639999999994, "r": 487.56302, "b": 407.78133999999994, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "20", "bbox": {"l": 523.97241, "t": 398.6114799999999, "r": 532.75647, "b": 407.39642, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "1", "bbox": {"l": 347.24872, "t": 437.68588, "r": 351.6412, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "0", "bbox": {"l": 318.88071, "t": 437.68588, "r": 323.27319, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "2", "bbox": {"l": 394.10422, "t": 437.68588, "r": 398.4967, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "3", "bbox": {"l": 318.77316, "t": 449.5455, "r": 323.16565, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "4", "bbox": {"l": 347.24872, "t": 449.5455, "r": 351.6412, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "5", "bbox": {"l": 394.10422, "t": 449.5455, "r": 398.4967, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "6", "bbox": {"l": 440.95941000000005, "t": 449.5455, "r": 445.3519, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "7", "bbox": {"l": 487.81491, "t": 449.5455, "r": 492.2074, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "8", "bbox": {"l": 318.77316, "t": 473.70425, "r": 323.16565, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "9", "bbox": {"l": 347.24872, "t": 461.8446, "r": 351.6412, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "10", "bbox": {"l": 394.10422, "t": 461.8446, "r": 402.88831, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "11", "bbox": {"l": 440.95941000000005, "t": 461.8446, "r": 449.42285, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "12", "bbox": {"l": 487.81491, "t": 461.8446, "r": 496.599, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "13", "bbox": {"l": 347.24872, "t": 473.70425, "r": 356.03281, "b": 
482.4892, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "14", "bbox": {"l": 394.10422, "t": 473.70425, "r": 402.88831, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "15", "bbox": {"l": 440.95941000000005, "t": 473.70425, "r": 449.7435, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "16", "bbox": {"l": 487.81491, "t": 473.70425, "r": 496.599, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "17", "bbox": {"l": 347.24872, "t": 485.12469, "r": 356.03281, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "18", "bbox": {"l": 394.10422, "t": 485.12469, "r": 402.88831, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "19", "bbox": {"l": 440.95941000000005, "t": 485.12469, "r": 449.7435, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "20", "bbox": {"l": 487.81491, "t": 485.12469, "r": 496.599, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "1", "bbox": {"l": 451.9457100000001, "t": 235.34704999999997, "r": 457.95050000000003, "b": 245.47748, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "3", "bbox": {"l": 385.09399, "t": 357.76030999999995, "r": 391.09879, "b": 367.89072, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "3", "bbox": {"l": 366.70102, "t": 449.12082, "r": 372.70581, "b": 459.25122, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "2", "bbox": {"l": 331.19681, "t": 269.35266, "r": 337.2016, "b": 279.48308999999995, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "2", "bbox": {"l": 333.43451, "t": 380.7265, "r": 339.4393, "b": 390.85689999999994, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "2", "bbox": {"l": 331.90424, "t": 473.32291, "r": 337.90903, "b": 483.45331, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "1", "bbox": {"l": 478.07210999999995, "t": 341.0368000000001, "r": 484.0769, "b": 351.16720999999995, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "1", "bbox": {"l": 459.87621999999993, "t": 437.5936, "r": 465.88101, "b": 
447.724, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "3", "bbox": {"l": 384.0329, "t": 252.67895999999996, "r": 390.03769, "b": 262.80939, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "Figure 1:", "bbox": {"l": 308.862, "t": 514.50037, "r": 345.73361, "b": 523.40692, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Picture of a table with subtle, complex features", "bbox": {"l": 353.17566, "t": 514.50037, "r": 545.11511, "b": 523.40692, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "such as (1) multi-column headers, (2) cell with multi-row", "bbox": {"l": 308.862, "t": 526.45535, "r": 545.11511, "b": 535.3619100000001, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "text and (3) cells with no content. Image from PubTabNet", "bbox": {"l": 308.862, "t": 538.41035, "r": 545.11517, "b": 547.31691, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "evaluation set, filename: \u2018PMC2944238 004 02\u2019.", "bbox": {"l": 308.862, "t": 550.36635, "r": 505.6917700000001, "b": 559.2729, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "Recently, significant progress has been made with vi-", "bbox": {"l": 320.81699, "t": 584.40936, "r": 545.11493, "b": 593.31592, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "sion based approaches to extract tables in documents. 
For", "bbox": {"l": 308.862, "t": 596.36436, "r": 545.11517, "b": 605.2709199999999, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "the sake of completeness, the issue of table extraction from", "bbox": {"l": 308.862, "t": 608.31937, "r": 545.11511, "b": 617.22592, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "documents is typically decomposed into two separate chal-", "bbox": {"l": 308.862, "t": 620.27437, "r": 545.11505, "b": 629.18092, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "lenges, i.e.", "bbox": {"l": 308.862, "t": 632.23036, "r": 353.6937, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "(1)", "bbox": {"l": 362.11209, "t": 632.23036, "r": 374.66617, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "finding the location of the table(s) on a", "bbox": {"l": 377.35785, "t": 632.23036, "r": 545.11505, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "document-page and (2) finding the structure of a given table", "bbox": {"l": 308.862, "t": 644.18536, "r": 545.11517, "b": 653.09192, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "in the document.", "bbox": {"l": 308.862, "t": 656.14037, "r": 375.55167, "b": 665.04693, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "The first problem is called table-location and has been", "bbox": {"l": 320.81699, "t": 668.38036, "r": 545.11493, "b": 677.28693, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "previously addressed [30, 38, 19, 21, 23, 26, 8] with state-", "bbox": {"l": 308.862, "t": 680.33536, "r": 545.11511, "b": 689.24193, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "of-the-art object-detection networks (e.g. YOLO and later", "bbox": {"l": 308.862, "t": 692.290359, "r": 545.11511, "b": 701.19693, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "on Mask-RCNN [9]). 
For all practical purposes, it can be", "bbox": {"l": 308.862, "t": 704.245361, "r": 545.11499, "b": 713.151932, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "1", "bbox": {"l": 295.121, "t": 734.133366, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022", "bbox": {"l": 18.340221, "t": 207.82001000000002, "r": 36.339779, "b": 560.00003, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "section_header", "bbox": {"l": 95.52344512939453, "t": 105.71222686767578, "r": 498.92708999999996, "b": 119.93133999999998, "coord_origin": "TOPLEFT"}, "confidence": 0.8868062496185303, "cells": [{"id": 0, "text": "TableFormer: Table Structure Understanding with Transformers.", "bbox": {"l": 96.301003, "t": 107.03412000000003, "r": 498.92708999999996, "b": 119.93133999999998, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "section_header", "bbox": {"l": 141.76437377929688, "t": 145.46343994140625, "r": 453.2174987792969, "b": 171.32037000000003, "coord_origin": "TOPLEFT"}, "confidence": 0.7586213946342468, "cells": [{"id": 1, "text": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar", "bbox": {"l": 142.47701, "t": 146.68535999999995, "r": 452.75027, "b": 157.37334999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "IBM Research", "bbox": {"l": 262.918, "t": 160.63239, "r": 332.30597, "b": 171.32037000000003, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "text", "bbox": {"l": 208.123, "t": 175.634521484375, "r": 379.2890319824219, "b": 185.38323974609375, "coord_origin": "TOPLEFT"}, "confidence": 0.909633457660675, "cells": [{"id": 3, "text": "{", "bbox": {"l": 208.123, "t": 175.96123999999998, "r": 212.73083, "b": 184.42553999999996, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "ahn,nli,mly,taa", "bbox": {"l": 212.73, "t": 177.08203000000003, "r": 293.42761, "b": 184.00409000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "}", "bbox": 
{"l": 293.42798, "t": 175.96123999999998, "r": 298.0358, "b": 184.42553999999996, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "@zurich.ibm.com", "bbox": {"l": 298.03497, "t": 177.08203000000003, "r": 378.73257, "b": 184.00409000000002, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "section_header", "bbox": {"l": 144.82810974121094, "t": 215.0791015625, "r": 190.83473205566406, "b": 226.23071000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9258671998977661, "cells": [{"id": 7, "text": "Abstract", "bbox": {"l": 145.99498, "t": 215.48297000000002, "r": 190.48029, "b": 226.23071000000004, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "text", "bbox": {"l": 48.843414306640625, "t": 240.1605987548828, "r": 286.7501220703125, "b": 513.9630126953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9838882088661194, "cells": [{"id": 8, "text": "Tables organize valuable content in a concise and com-", "bbox": {"l": 62.066978, "t": 241.39508, "r": 286.36493, "b": 249.98284999999998, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "pact representation. This content is extremely valuable for", "bbox": {"l": 50.111977, "t": 253.3501, "r": 286.36508, "b": 261.93787, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "systems such as search engines, Knowledge Graph\u2019s, etc,", "bbox": {"l": 50.111977, "t": 265.30511, "r": 286.36508, "b": 273.89288, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "since they enhance their predictive capabilities. 
Unfortu-", "bbox": {"l": 50.111977, "t": 277.26111000000003, "r": 286.36505, "b": 285.84888, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "nately, tables come in a large variety of shapes and sizes.", "bbox": {"l": 50.111977, "t": 289.21609, "r": 286.36505, "b": 297.80386, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Furthermore, they can have complex column/row-header", "bbox": {"l": 50.111977, "t": 301.17108, "r": 286.36505, "b": 309.75884999999994, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "configurations, multiline rows, different variety of separa-", "bbox": {"l": 50.111977, "t": 313.12607, "r": 286.36508, "b": 321.71384, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "tion lines, missing entries, etc. As such, the correct iden-", "bbox": {"l": 50.111977, "t": 325.08105, "r": 286.36508, "b": 333.66882, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "tification of the table-structure from an image is a non-", "bbox": {"l": 50.111977, "t": 337.03604, "r": 286.36505, "b": 345.62381, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "trivial task. In this paper, we present a new table-structure", "bbox": {"l": 50.111977, "t": 348.99203, "r": 286.36508, "b": 357.5798, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "identification model. The latter improves the latest end-to-", "bbox": {"l": 50.111977, "t": 360.94701999999995, "r": 286.36505, "b": 369.53479, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "end deep learning model (i.e. encoder-dual-decoder from", "bbox": {"l": 50.111977, "t": 372.90201, "r": 286.36508, "b": 381.48978, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "PubTabNet) in two significant ways. First, we introduce a", "bbox": {"l": 50.111977, "t": 384.85699, "r": 286.36505, "b": 393.44476, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "new object detection decoder for table-cells. 
In this way,", "bbox": {"l": 50.111977, "t": 396.81198, "r": 286.36511, "b": 405.39975000000004, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "we can obtain the content of the table-cells from program-", "bbox": {"l": 50.111977, "t": 408.76697, "r": 286.36508, "b": 417.35474, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "matic PDF\u2019s directly from the PDF source and avoid the", "bbox": {"l": 50.111977, "t": 420.72296000000006, "r": 286.36505, "b": 429.31073, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "training of the custom OCR decoders.", "bbox": {"l": 50.111977, "t": 432.67795, "r": 207.23216, "b": 441.26572, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "This architectural", "bbox": {"l": 214.09639, "t": 432.67795, "r": 286.36508, "b": 441.26572, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "change leads to more accurate table-content extraction and", "bbox": {"l": 50.111977, "t": 444.63293, "r": 286.36508, "b": 453.2207, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "allows us to tackle non-english tables. Second, we replace", "bbox": {"l": 50.111977, "t": 456.58792000000005, "r": 286.36505, "b": 465.17569, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "the LSTM decoders with transformer based decoders. 
This", "bbox": {"l": 50.111977, "t": 468.54291, "r": 286.36505, "b": 477.13068, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "upgrade improves significantly the previous state-of-the-art", "bbox": {"l": 50.111977, "t": 480.4989, "r": 286.36508, "b": 489.08667, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "tree-editing-distance-score (TEDS) from 91% to 98.5% on", "bbox": {"l": 50.111977, "t": 492.45389, "r": 286.36505, "b": 501.04166, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "simple tables and from 88.7% to 95% on complex tables.", "bbox": {"l": 50.111977, "t": 504.40887, "r": 276.65152, "b": 512.9966400000001, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "section_header", "bbox": {"l": 49.7277946472168, "t": 539.0704956054688, "r": 126.95994567871094, "b": 550.69049, "coord_origin": "TOPLEFT"}, "confidence": 0.9317677617073059, "cells": [{"id": 32, "text": "1.", "bbox": {"l": 50.111977, "t": 539.94276, "r": 58.121296, "b": 550.69049, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Introduction", "bbox": {"l": 68.800385, "t": 539.94276, "r": 126.94804, "b": 550.69049, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "text", "bbox": {"l": 49.31346130371094, "t": 559.62060546875, "r": 286.5247497558594, "b": 713.151779, "coord_origin": "TOPLEFT"}, "confidence": 0.9841895699501038, "cells": [{"id": 34, "text": "The occurrence of tables in documents is ubiquitous.", "bbox": {"l": 62.066978, "t": 560.7832, "r": 286.36496, "b": 569.68976, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "They often summarise quantitative or factual data, which is", "bbox": {"l": 50.111977, "t": 572.7382, "r": 286.36508, "b": 581.64476, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "cumbersome to describe in verbose text but nevertheless ex-", "bbox": {"l": 50.111977, "t": 584.69321, "r": 286.36505, "b": 593.5997600000001, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "tremely valuable. 
Unfortunately, this compact representa-", "bbox": {"l": 50.111977, "t": 596.6492000000001, "r": 286.36505, "b": 605.55576, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "tion is often not easy to parse by machines. There are many", "bbox": {"l": 50.111977, "t": 608.6042, "r": 286.36505, "b": 617.51076, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "implicit conventions used to obtain a compact table repre-", "bbox": {"l": 50.111977, "t": 620.5592, "r": 286.36505, "b": 629.46576, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "sentation. For example, tables often have complex column-", "bbox": {"l": 50.111977, "t": 632.51421, "r": 286.36508, "b": 641.42076, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "and row-headers in order to reduce duplicated cell content.", "bbox": {"l": 50.111977, "t": 644.46921, "r": 286.36508, "b": 653.37576, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "Lines of different shapes and sizes are leveraged to separate", "bbox": {"l": 50.111977, "t": 656.42421, "r": 286.36502, "b": 665.33077, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "content or indicate a tree structure. Additionally, tables can", "bbox": {"l": 50.111977, "t": 668.3802000000001, "r": 286.36505, "b": 677.28677, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "also have empty/missing table-entries or multi-row textual", "bbox": {"l": 50.111977, "t": 680.33521, "r": 286.36505, "b": 689.2417800000001, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "table-entries. Fig. 
1 shows a table which presents all these", "bbox": {"l": 50.111977, "t": 692.290207, "r": 286.36505, "b": 701.196777, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "issues.", "bbox": {"l": 50.111977, "t": 704.245209, "r": 76.403275, "b": 713.151779, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "section_header", "bbox": {"l": 315.26983642578125, "t": 216.9366455078125, "r": 408.4407, "b": 226.75482, "coord_origin": "TOPLEFT"}, "confidence": 0.6724019646644592, "cells": [{"id": 47, "text": "a.", "bbox": {"l": 315.56702, "t": 218.00684, "r": 324.01007, "b": 226.75482, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Picture of a table:", "bbox": {"l": 328.2316, "t": 218.00684, "r": 408.4407, "b": 226.75482, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "text", "bbox": {"l": 315.56702, "t": 313.69478999999995, "r": 486.40194999999994, "b": 333.2428, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 49, "text": "b.", "bbox": {"l": 315.56702, "t": 313.69478999999995, "r": 325.05786, "b": 322.44281, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Red-annotation of bounding boxes,", "bbox": {"l": 329.80325, "t": 313.69478999999995, "r": 486.40194999999994, "b": 322.44281, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "Blue-predictions by TableFormer", "bbox": {"l": 326.46252, "t": 324.49478, "r": 472.47411999999997, "b": 333.2428, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "text", "bbox": {"l": 315.56702, "t": 420.1828, "r": 324.81039, "b": 428.93082, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 52, "text": "c.", "bbox": {"l": 315.56702, "t": 420.1828, "r": 324.81039, "b": 428.93082, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "text", "bbox": {"l": 329.4321, "t": 420.1828, "r": 491.1912500000001, "b": 428.93082, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 53, "text": "Structure predicted by TableFormer:", "bbox": {"l": 329.4321, "t": 420.1828, "r": 491.1912500000001, "b": 428.93082, 
"coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "picture", "bbox": {"l": 314.78173828125, "t": 338.0652770996094, "r": 539.53088, "b": 410.0494384765625, "coord_origin": "TOPLEFT"}, "confidence": 0.8742761611938477, "cells": [{"id": 54, "text": "1", "bbox": {"l": 408.14752, "t": 342.82828, "r": 412.54001, "b": 351.61322, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "0", "bbox": {"l": 356.11011, "t": 341.57217, "r": 360.50259, "b": 350.35712, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "2", "bbox": {"l": 500.6777, "t": 340.93768, "r": 505.0701900000001, "b": 349.7226299999999, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "3", "bbox": {"l": 356.13382, "t": 351.74789, "r": 360.52631, "b": 360.53284, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "4", "bbox": {"l": 402.53992, "t": 355.8765, "r": 406.9324, "b": 364.66144, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "5", "bbox": {"l": 448.58178999999996, "t": 352.84018, "r": 452.97427, "b": 361.62512, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "6", "bbox": {"l": 491.65161000000006, "t": 353.70657, "r": 496.0441, "b": 362.49152, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "7", "bbox": {"l": 535.13843, "t": 353.33969, "r": 539.53088, "b": 362.12463, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "8", "bbox": {"l": 348.82822, "t": 387.09781, "r": 353.2207, "b": 395.88275, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "9", "bbox": {"l": 389.27151, "t": 375.37228, "r": 393.664, "b": 384.15723, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "10", "bbox": {"l": 442.67479999999995, "t": 375.64621, "r": 451.45889000000005, "b": 384.43115, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "11", "bbox": {"l": 477.4382299999999, "t": 375.534, "r": 485.90167, "b": 384.31894000000005, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "12", "bbox": {"l": 522.57263, "t": 375.64621, "r": 531.35669, "b": 384.43115, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "13", "bbox": {"l": 400.22992, "t": 387.11429, 
"r": 409.01401, "b": 395.89923, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "14", "bbox": {"l": 442.30792, "t": 386.98981000000003, "r": 451.0920100000001, "b": 395.77475000000004, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "15", "bbox": {"l": 478.21941999999996, "t": 387.37469, "r": 487.00351000000006, "b": 396.15964, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "16", "bbox": {"l": 523.2287, "t": 386.98981000000003, "r": 532.01276, "b": 395.77475000000004, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "1", "bbox": {"l": 411.57233, "t": 399.42477, "r": 415.96481, "b": 408.20972, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "7", "bbox": {"l": 415.96393, "t": 399.42477, "r": 420.35641, "b": 408.20972, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "18", "bbox": {"l": 442.30521, "t": 399.0371999999999, "r": 451.08929, "b": 407.82213999999993, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "19", "bbox": {"l": 478.77893, "t": 398.99639999999994, "r": 487.56302, "b": 407.78133999999994, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "20", "bbox": {"l": 523.97241, "t": 398.6114799999999, "r": 532.75647, "b": 407.39642, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "3", "bbox": {"l": 385.09399, "t": 357.76030999999995, "r": 391.09879, "b": 367.89072, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "2", "bbox": {"l": 333.43451, "t": 380.7265, "r": 339.4393, "b": 390.85689999999994, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "1", "bbox": {"l": 478.07210999999995, "t": 341.0368000000001, "r": 484.0769, "b": 351.16720999999995, "coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "table", "bbox": {"l": 315.7172546386719, "t": 433.823486328125, "r": 536.835693359375, "b": 496.0290222167969, "coord_origin": "TOPLEFT"}, "confidence": 0.8056102991104126, "cells": [{"id": 76, "text": "1", "bbox": {"l": 347.24872, "t": 437.68588, "r": 351.6412, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "0", "bbox": {"l": 318.88071, "t": 437.68588, 
"r": 323.27319, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "2", "bbox": {"l": 394.10422, "t": 437.68588, "r": 398.4967, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "3", "bbox": {"l": 318.77316, "t": 449.5455, "r": 323.16565, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "4", "bbox": {"l": 347.24872, "t": 449.5455, "r": 351.6412, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "5", "bbox": {"l": 394.10422, "t": 449.5455, "r": 398.4967, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "6", "bbox": {"l": 440.95941000000005, "t": 449.5455, "r": 445.3519, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "7", "bbox": {"l": 487.81491, "t": 449.5455, "r": 492.2074, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "8", "bbox": {"l": 318.77316, "t": 473.70425, "r": 323.16565, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "9", "bbox": {"l": 347.24872, "t": 461.8446, "r": 351.6412, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "10", "bbox": {"l": 394.10422, "t": 461.8446, "r": 402.88831, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "11", "bbox": {"l": 440.95941000000005, "t": 461.8446, "r": 449.42285, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "12", "bbox": {"l": 487.81491, "t": 461.8446, "r": 496.599, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "13", "bbox": {"l": 347.24872, "t": 473.70425, "r": 356.03281, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "14", "bbox": {"l": 394.10422, "t": 473.70425, "r": 402.88831, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "15", "bbox": {"l": 440.95941000000005, "t": 473.70425, "r": 449.7435, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "16", "bbox": {"l": 487.81491, "t": 473.70425, "r": 496.599, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "17", "bbox": {"l": 
347.24872, "t": 485.12469, "r": 356.03281, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "18", "bbox": {"l": 394.10422, "t": 485.12469, "r": 402.88831, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "19", "bbox": {"l": 440.95941000000005, "t": 485.12469, "r": 449.7435, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "20", "bbox": {"l": 487.81491, "t": 485.12469, "r": 496.599, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "3", "bbox": {"l": 366.70102, "t": 449.12082, "r": 372.70581, "b": 459.25122, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "2", "bbox": {"l": 331.90424, "t": 473.32291, "r": 337.90903, "b": 483.45331, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "1", "bbox": {"l": 459.87621999999993, "t": 437.5936, "r": 465.88101, "b": 447.724, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "table", "bbox": {"l": 315.65362548828125, "t": 228.7234344482422, "r": 537.1475219726562, "b": 302.80145263671875, "coord_origin": "TOPLEFT"}, "confidence": 0.6515864729881287, "cells": [{"id": 97, "text": "1", "bbox": {"l": 451.9457100000001, "t": 235.34704999999997, "r": 457.95050000000003, "b": 245.47748, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "2", "bbox": {"l": 331.19681, "t": 269.35266, "r": 337.2016, "b": 279.48308999999995, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "3", "bbox": {"l": 384.0329, "t": 252.67895999999996, "r": 390.03769, "b": 262.80939, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "caption", "bbox": {"l": 308.1024475097656, "t": 513.773681640625, "r": 545.11517, "b": 559.2729, "coord_origin": "TOPLEFT"}, "confidence": 0.92146235704422, "cells": [{"id": 106, "text": "Figure 1:", "bbox": {"l": 308.862, "t": 514.50037, "r": 345.73361, "b": 523.40692, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Picture of a table with subtle, complex features", "bbox": {"l": 353.17566, "t": 514.50037, "r": 545.11511, "b": 523.40692, "coord_origin": "TOPLEFT"}}, {"id": 
108, "text": "such as (1) multi-column headers, (2) cell with multi-row", "bbox": {"l": 308.862, "t": 526.45535, "r": 545.11511, "b": 535.3619100000001, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "text and (3) cells with no content. Image from PubTabNet", "bbox": {"l": 308.862, "t": 538.41035, "r": 545.11517, "b": 547.31691, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "evaluation set, filename: \u2018PMC2944238 004 02\u2019.", "bbox": {"l": 308.862, "t": 550.36635, "r": 505.6917700000001, "b": 559.2729, "coord_origin": "TOPLEFT"}}]}, {"id": 15, "label": "text", "bbox": {"l": 307.88861083984375, "t": 583.6217651367188, "r": 545.5301513671875, "b": 665.04693, "coord_origin": "TOPLEFT"}, "confidence": 0.9848759770393372, "cells": [{"id": 111, "text": "Recently, significant progress has been made with vi-", "bbox": {"l": 320.81699, "t": 584.40936, "r": 545.11493, "b": 593.31592, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "sion based approaches to extract tables in documents. 
For", "bbox": {"l": 308.862, "t": 596.36436, "r": 545.11517, "b": 605.2709199999999, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "the sake of completeness, the issue of table extraction from", "bbox": {"l": 308.862, "t": 608.31937, "r": 545.11511, "b": 617.22592, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "documents is typically decomposed into two separate chal-", "bbox": {"l": 308.862, "t": 620.27437, "r": 545.11505, "b": 629.18092, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "lenges, i.e.", "bbox": {"l": 308.862, "t": 632.23036, "r": 353.6937, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "(1)", "bbox": {"l": 362.11209, "t": 632.23036, "r": 374.66617, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "finding the location of the table(s) on a", "bbox": {"l": 377.35785, "t": 632.23036, "r": 545.11505, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "document-page and (2) finding the structure of a given table", "bbox": {"l": 308.862, "t": 644.18536, "r": 545.11517, "b": 653.09192, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "in the document.", "bbox": {"l": 308.862, "t": 656.14037, "r": 375.55167, "b": 665.04693, "coord_origin": "TOPLEFT"}}]}, {"id": 16, "label": "text", "bbox": {"l": 307.9762268066406, "t": 667.4489135742188, "r": 545.4558715820312, "b": 713.8033447265625, "coord_origin": "TOPLEFT"}, "confidence": 0.9791521430015564, "cells": [{"id": 120, "text": "The first problem is called table-location and has been", "bbox": {"l": 320.81699, "t": 668.38036, "r": 545.11493, "b": 677.28693, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "previously addressed [30, 38, 19, 21, 23, 26, 8] with state-", "bbox": {"l": 308.862, "t": 680.33536, "r": 545.11511, "b": 689.24193, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "of-the-art object-detection networks (e.g. 
YOLO and later", "bbox": {"l": 308.862, "t": 692.290359, "r": 545.11511, "b": 701.19693, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "on Mask-RCNN [9]). For all practical purposes, it can be", "bbox": {"l": 308.862, "t": 704.245361, "r": 545.11499, "b": 713.151932, "coord_origin": "TOPLEFT"}}]}, {"id": 17, "label": "page_footer", "bbox": {"l": 295.121, "t": 733.4591674804688, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}, "confidence": 0.8045891523361206, "cells": [{"id": 124, "text": "1", "bbox": {"l": 295.121, "t": 734.133366, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}}]}, {"id": 18, "label": "page_header", "bbox": {"l": 17.166410446166992, "t": 207.82001000000002, "r": 36.339779, "b": 560.00003, "coord_origin": "TOPLEFT"}, "confidence": 0.8773143887519836, "cells": [{"id": 125, "text": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022", "bbox": {"l": 18.340221, "t": 207.82001000000002, "r": 36.339779, "b": 560.00003, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {"12": {"label": "table", "id": 12, "page_no": 0, "cluster": {"id": 12, "label": "table", "bbox": {"l": 315.7172546386719, "t": 433.823486328125, "r": 536.835693359375, "b": 496.0290222167969, "coord_origin": "TOPLEFT"}, "confidence": 0.8056102991104126, "cells": [{"id": 76, "text": "1", "bbox": {"l": 347.24872, "t": 437.68588, "r": 351.6412, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "0", "bbox": {"l": 318.88071, "t": 437.68588, "r": 323.27319, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "2", "bbox": {"l": 394.10422, "t": 437.68588, "r": 398.4967, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "3", "bbox": {"l": 318.77316, "t": 449.5455, "r": 323.16565, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "4", "bbox": {"l": 347.24872, "t": 449.5455, "r": 351.6412, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "5", "bbox": {"l": 394.10422, "t": 449.5455, "r": 398.4967, "b": 
458.33044, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "6", "bbox": {"l": 440.95941000000005, "t": 449.5455, "r": 445.3519, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "7", "bbox": {"l": 487.81491, "t": 449.5455, "r": 492.2074, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "8", "bbox": {"l": 318.77316, "t": 473.70425, "r": 323.16565, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "9", "bbox": {"l": 347.24872, "t": 461.8446, "r": 351.6412, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "10", "bbox": {"l": 394.10422, "t": 461.8446, "r": 402.88831, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "11", "bbox": {"l": 440.95941000000005, "t": 461.8446, "r": 449.42285, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "12", "bbox": {"l": 487.81491, "t": 461.8446, "r": 496.599, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "13", "bbox": {"l": 347.24872, "t": 473.70425, "r": 356.03281, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "14", "bbox": {"l": 394.10422, "t": 473.70425, "r": 402.88831, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "15", "bbox": {"l": 440.95941000000005, "t": 473.70425, "r": 449.7435, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "16", "bbox": {"l": 487.81491, "t": 473.70425, "r": 496.599, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "17", "bbox": {"l": 347.24872, "t": 485.12469, "r": 356.03281, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "18", "bbox": {"l": 394.10422, "t": 485.12469, "r": 402.88831, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "19", "bbox": {"l": 440.95941000000005, "t": 485.12469, "r": 449.7435, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "20", "bbox": {"l": 487.81491, "t": 485.12469, "r": 496.599, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "3", "bbox": {"l": 
366.70102, "t": 449.12082, "r": 372.70581, "b": 459.25122, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "2", "bbox": {"l": 331.90424, "t": 473.32291, "r": 337.90903, "b": 483.45331, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "1", "bbox": {"l": 459.87621999999993, "t": 437.5936, "r": 465.88101, "b": 447.724, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "lcel", "ched", "lcel", "ched", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "ucel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "ucel", "nl"], "num_rows": 5, "num_cols": 6, "table_cells": [{"bbox": {"l": 347.24872, "t": 437.68588, "r": 351.6412, "b": 446.47083, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 318.88071, "t": 437.68588, "r": 323.27319, "b": 446.47083, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "0", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 437.5936, "r": 465.88101, "b": 447.724, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "2 1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 318.77316, "t": 449.5455, "r": 323.16565, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 449.5455, 
"r": 351.6412, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 366.70102, "t": 449.12082, "r": 398.4967, "b": 459.25122, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5 3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 449.5455, "r": 445.3519, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 449.5455, "r": 492.2074, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 318.77316, "t": 473.70425, "r": 323.16565, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 461.8446, "r": 351.6412, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 461.8446, "r": 402.88831, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, 
"start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "10", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 461.8446, "r": 449.42285, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "11", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 461.8446, "r": 496.599, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "12", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 473.70425, "r": 356.03281, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "13", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 473.70425, "r": 402.88831, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 473.70425, "r": 449.7435, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "15", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 473.70425, "r": 496.599, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, 
"end_col_offset_idx": 5, "text": "16", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 485.12469, "r": 356.03281, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "17", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 485.12469, "r": 402.88831, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 485.12469, "r": 449.7435, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 485.12469, "r": 496.599, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "20", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90424, "t": 473.32291, "r": 337.90903, "b": 483.45331, "coord_origin": "TOPLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}]}, "13": {"label": "table", "id": 13, "page_no": 0, "cluster": {"id": 13, "label": "table", "bbox": {"l": 315.65362548828125, "t": 228.7234344482422, "r": 537.1475219726562, "b": 302.80145263671875, "coord_origin": "TOPLEFT"}, "confidence": 0.6515864729881287, "cells": [{"id": 97, "text": "1", 
"bbox": {"l": 451.9457100000001, "t": 235.34704999999997, "r": 457.95050000000003, "b": 245.47748, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "2", "bbox": {"l": 331.19681, "t": 269.35266, "r": 337.2016, "b": 279.48308999999995, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "3", "bbox": {"l": 384.0329, "t": 252.67895999999996, "r": 390.03769, "b": 262.80939, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ecel", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 1, "num_cols": 2, "table_cells": [{"bbox": {"l": 451.9457100000001, "t": 235.34704999999997, "r": 457.95050000000003, "b": 245.47748, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 384.0329, "t": 252.67895999999996, "r": 390.03769, "b": 262.80939, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": true, "row_header": false, "row_section": false}]}}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "section_header", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "section_header", "bbox": {"l": 95.52344512939453, "t": 105.71222686767578, "r": 498.92708999999996, "b": 119.93133999999998, "coord_origin": "TOPLEFT"}, "confidence": 0.8868062496185303, "cells": [{"id": 0, "text": "TableFormer: Table Structure Understanding with Transformers.", "bbox": {"l": 96.301003, "t": 107.03412000000003, "r": 498.92708999999996, "b": 119.93133999999998, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer: Table Structure Understanding with Transformers."}, {"label": 
"section_header", "id": 1, "page_no": 0, "cluster": {"id": 1, "label": "section_header", "bbox": {"l": 141.76437377929688, "t": 145.46343994140625, "r": 453.2174987792969, "b": 171.32037000000003, "coord_origin": "TOPLEFT"}, "confidence": 0.7586213946342468, "cells": [{"id": 1, "text": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar", "bbox": {"l": 142.47701, "t": 146.68535999999995, "r": 452.75027, "b": 157.37334999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "IBM Research", "bbox": {"l": 262.918, "t": 160.63239, "r": 332.30597, "b": 171.32037000000003, "coord_origin": "TOPLEFT"}}]}, "text": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar IBM Research"}, {"label": "text", "id": 2, "page_no": 0, "cluster": {"id": 2, "label": "text", "bbox": {"l": 208.123, "t": 175.634521484375, "r": 379.2890319824219, "b": 185.38323974609375, "coord_origin": "TOPLEFT"}, "confidence": 0.909633457660675, "cells": [{"id": 3, "text": "{", "bbox": {"l": 208.123, "t": 175.96123999999998, "r": 212.73083, "b": 184.42553999999996, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "ahn,nli,mly,taa", "bbox": {"l": 212.73, "t": 177.08203000000003, "r": 293.42761, "b": 184.00409000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "}", "bbox": {"l": 293.42798, "t": 175.96123999999998, "r": 298.0358, "b": 184.42553999999996, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "@zurich.ibm.com", "bbox": {"l": 298.03497, "t": 177.08203000000003, "r": 378.73257, "b": 184.00409000000002, "coord_origin": "TOPLEFT"}}]}, "text": "{ ahn,nli,mly,taa } @zurich.ibm.com"}, {"label": "section_header", "id": 3, "page_no": 0, "cluster": {"id": 3, "label": "section_header", "bbox": {"l": 144.82810974121094, "t": 215.0791015625, "r": 190.83473205566406, "b": 226.23071000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9258671998977661, "cells": [{"id": 7, "text": "Abstract", "bbox": {"l": 145.99498, "t": 215.48297000000002, "r": 190.48029, "b": 226.23071000000004, 
"coord_origin": "TOPLEFT"}}]}, "text": "Abstract"}, {"label": "text", "id": 4, "page_no": 0, "cluster": {"id": 4, "label": "text", "bbox": {"l": 48.843414306640625, "t": 240.1605987548828, "r": 286.7501220703125, "b": 513.9630126953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9838882088661194, "cells": [{"id": 8, "text": "Tables organize valuable content in a concise and com-", "bbox": {"l": 62.066978, "t": 241.39508, "r": 286.36493, "b": 249.98284999999998, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "pact representation. This content is extremely valuable for", "bbox": {"l": 50.111977, "t": 253.3501, "r": 286.36508, "b": 261.93787, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "systems such as search engines, Knowledge Graph\u2019s, etc,", "bbox": {"l": 50.111977, "t": 265.30511, "r": 286.36508, "b": 273.89288, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "since they enhance their predictive capabilities. Unfortu-", "bbox": {"l": 50.111977, "t": 277.26111000000003, "r": 286.36505, "b": 285.84888, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "nately, tables come in a large variety of shapes and sizes.", "bbox": {"l": 50.111977, "t": 289.21609, "r": 286.36505, "b": 297.80386, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Furthermore, they can have complex column/row-header", "bbox": {"l": 50.111977, "t": 301.17108, "r": 286.36505, "b": 309.75884999999994, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "configurations, multiline rows, different variety of separa-", "bbox": {"l": 50.111977, "t": 313.12607, "r": 286.36508, "b": 321.71384, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "tion lines, missing entries, etc. 
As such, the correct iden-", "bbox": {"l": 50.111977, "t": 325.08105, "r": 286.36508, "b": 333.66882, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "tification of the table-structure from an image is a non-", "bbox": {"l": 50.111977, "t": 337.03604, "r": 286.36505, "b": 345.62381, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "trivial task. In this paper, we present a new table-structure", "bbox": {"l": 50.111977, "t": 348.99203, "r": 286.36508, "b": 357.5798, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "identification model. The latter improves the latest end-to-", "bbox": {"l": 50.111977, "t": 360.94701999999995, "r": 286.36505, "b": 369.53479, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "end deep learning model (i.e. encoder-dual-decoder from", "bbox": {"l": 50.111977, "t": 372.90201, "r": 286.36508, "b": 381.48978, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "PubTabNet) in two significant ways. First, we introduce a", "bbox": {"l": 50.111977, "t": 384.85699, "r": 286.36505, "b": 393.44476, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "new object detection decoder for table-cells. 
In this way,", "bbox": {"l": 50.111977, "t": 396.81198, "r": 286.36511, "b": 405.39975000000004, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "we can obtain the content of the table-cells from program-", "bbox": {"l": 50.111977, "t": 408.76697, "r": 286.36508, "b": 417.35474, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "matic PDF\u2019s directly from the PDF source and avoid the", "bbox": {"l": 50.111977, "t": 420.72296000000006, "r": 286.36505, "b": 429.31073, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "training of the custom OCR decoders.", "bbox": {"l": 50.111977, "t": 432.67795, "r": 207.23216, "b": 441.26572, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "This architectural", "bbox": {"l": 214.09639, "t": 432.67795, "r": 286.36508, "b": 441.26572, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "change leads to more accurate table-content extraction and", "bbox": {"l": 50.111977, "t": 444.63293, "r": 286.36508, "b": 453.2207, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "allows us to tackle non-english tables. Second, we replace", "bbox": {"l": 50.111977, "t": 456.58792000000005, "r": 286.36505, "b": 465.17569, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "the LSTM decoders with transformer based decoders. This", "bbox": {"l": 50.111977, "t": 468.54291, "r": 286.36505, "b": 477.13068, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "upgrade improves significantly the previous state-of-the-art", "bbox": {"l": 50.111977, "t": 480.4989, "r": 286.36508, "b": 489.08667, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "tree-editing-distance-score (TEDS) from 91% to 98.5% on", "bbox": {"l": 50.111977, "t": 492.45389, "r": 286.36505, "b": 501.04166, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "simple tables and from 88.7% to 95% on complex tables.", "bbox": {"l": 50.111977, "t": 504.40887, "r": 276.65152, "b": 512.9966400000001, "coord_origin": "TOPLEFT"}}]}, "text": "Tables organize valuable content in a concise and compact representation. 
This content is extremely valuable for systems such as search engines, Knowledge Graph\u2019s, etc, since they enhance their predictive capabilities. Unfortunately, tables come in a large variety of shapes and sizes. Furthermore, they can have complex column/row-header configurations, multiline rows, different variety of separation lines, missing entries, etc. As such, the correct identification of the table-structure from an image is a nontrivial task. In this paper, we present a new table-structure identification model. The latter improves the latest end-toend deep learning model (i.e. encoder-dual-decoder from PubTabNet) in two significant ways. First, we introduce a new object detection decoder for table-cells. In this way, we can obtain the content of the table-cells from programmatic PDF\u2019s directly from the PDF source and avoid the training of the custom OCR decoders. This architectural change leads to more accurate table-content extraction and allows us to tackle non-english tables. Second, we replace the LSTM decoders with transformer based decoders. This upgrade improves significantly the previous state-of-the-art tree-editing-distance-score (TEDS) from 91% to 98.5% on simple tables and from 88.7% to 95% on complex tables."}, {"label": "section_header", "id": 5, "page_no": 0, "cluster": {"id": 5, "label": "section_header", "bbox": {"l": 49.7277946472168, "t": 539.0704956054688, "r": 126.95994567871094, "b": 550.69049, "coord_origin": "TOPLEFT"}, "confidence": 0.9317677617073059, "cells": [{"id": 32, "text": "1.", "bbox": {"l": 50.111977, "t": 539.94276, "r": 58.121296, "b": 550.69049, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Introduction", "bbox": {"l": 68.800385, "t": 539.94276, "r": 126.94804, "b": 550.69049, "coord_origin": "TOPLEFT"}}]}, "text": "1. 
Introduction"}, {"label": "text", "id": 6, "page_no": 0, "cluster": {"id": 6, "label": "text", "bbox": {"l": 49.31346130371094, "t": 559.62060546875, "r": 286.5247497558594, "b": 713.151779, "coord_origin": "TOPLEFT"}, "confidence": 0.9841895699501038, "cells": [{"id": 34, "text": "The occurrence of tables in documents is ubiquitous.", "bbox": {"l": 62.066978, "t": 560.7832, "r": 286.36496, "b": 569.68976, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "They often summarise quantitative or factual data, which is", "bbox": {"l": 50.111977, "t": 572.7382, "r": 286.36508, "b": 581.64476, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "cumbersome to describe in verbose text but nevertheless ex-", "bbox": {"l": 50.111977, "t": 584.69321, "r": 286.36505, "b": 593.5997600000001, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "tremely valuable. Unfortunately, this compact representa-", "bbox": {"l": 50.111977, "t": 596.6492000000001, "r": 286.36505, "b": 605.55576, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "tion is often not easy to parse by machines. There are many", "bbox": {"l": 50.111977, "t": 608.6042, "r": 286.36505, "b": 617.51076, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "implicit conventions used to obtain a compact table repre-", "bbox": {"l": 50.111977, "t": 620.5592, "r": 286.36505, "b": 629.46576, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "sentation. For example, tables often have complex column-", "bbox": {"l": 50.111977, "t": 632.51421, "r": 286.36508, "b": 641.42076, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "and row-headers in order to reduce duplicated cell content.", "bbox": {"l": 50.111977, "t": 644.46921, "r": 286.36508, "b": 653.37576, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "Lines of different shapes and sizes are leveraged to separate", "bbox": {"l": 50.111977, "t": 656.42421, "r": 286.36502, "b": 665.33077, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "content or indicate a tree structure. 
Additionally, tables can", "bbox": {"l": 50.111977, "t": 668.3802000000001, "r": 286.36505, "b": 677.28677, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "also have empty/missing table-entries or multi-row textual", "bbox": {"l": 50.111977, "t": 680.33521, "r": 286.36505, "b": 689.2417800000001, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "table-entries. Fig. 1 shows a table which presents all these", "bbox": {"l": 50.111977, "t": 692.290207, "r": 286.36505, "b": 701.196777, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "issues.", "bbox": {"l": 50.111977, "t": 704.245209, "r": 76.403275, "b": 713.151779, "coord_origin": "TOPLEFT"}}]}, "text": "The occurrence of tables in documents is ubiquitous. They often summarise quantitative or factual data, which is cumbersome to describe in verbose text but nevertheless extremely valuable. Unfortunately, this compact representation is often not easy to parse by machines. There are many implicit conventions used to obtain a compact table representation. For example, tables often have complex columnand row-headers in order to reduce duplicated cell content. Lines of different shapes and sizes are leveraged to separate content or indicate a tree structure. Additionally, tables can also have empty/missing table-entries or multi-row textual table-entries. Fig. 1 shows a table which presents all these issues."}, {"label": "section_header", "id": 7, "page_no": 0, "cluster": {"id": 7, "label": "section_header", "bbox": {"l": 315.26983642578125, "t": 216.9366455078125, "r": 408.4407, "b": 226.75482, "coord_origin": "TOPLEFT"}, "confidence": 0.6724019646644592, "cells": [{"id": 47, "text": "a.", "bbox": {"l": 315.56702, "t": 218.00684, "r": 324.01007, "b": 226.75482, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Picture of a table:", "bbox": {"l": 328.2316, "t": 218.00684, "r": 408.4407, "b": 226.75482, "coord_origin": "TOPLEFT"}}]}, "text": "a. 
Picture of a table:"}, {"label": "text", "id": 8, "page_no": 0, "cluster": {"id": 8, "label": "text", "bbox": {"l": 315.56702, "t": 313.69478999999995, "r": 486.40194999999994, "b": 333.2428, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 49, "text": "b.", "bbox": {"l": 315.56702, "t": 313.69478999999995, "r": 325.05786, "b": 322.44281, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Red-annotation of bounding boxes,", "bbox": {"l": 329.80325, "t": 313.69478999999995, "r": 486.40194999999994, "b": 322.44281, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "Blue-predictions by TableFormer", "bbox": {"l": 326.46252, "t": 324.49478, "r": 472.47411999999997, "b": 333.2428, "coord_origin": "TOPLEFT"}}]}, "text": "b. Red-annotation of bounding boxes, Blue-predictions by TableFormer"}, {"label": "text", "id": 9, "page_no": 0, "cluster": {"id": 9, "label": "text", "bbox": {"l": 315.56702, "t": 420.1828, "r": 324.81039, "b": 428.93082, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 52, "text": "c.", "bbox": {"l": 315.56702, "t": 420.1828, "r": 324.81039, "b": 428.93082, "coord_origin": "TOPLEFT"}}]}, "text": "c."}, {"label": "text", "id": 10, "page_no": 0, "cluster": {"id": 10, "label": "text", "bbox": {"l": 329.4321, "t": 420.1828, "r": 491.1912500000001, "b": 428.93082, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 53, "text": "Structure predicted by TableFormer:", "bbox": {"l": 329.4321, "t": 420.1828, "r": 491.1912500000001, "b": 428.93082, "coord_origin": "TOPLEFT"}}]}, "text": "Structure predicted by TableFormer:"}, {"label": "picture", "id": 11, "page_no": 0, "cluster": {"id": 11, "label": "picture", "bbox": {"l": 314.78173828125, "t": 338.0652770996094, "r": 539.53088, "b": 410.0494384765625, "coord_origin": "TOPLEFT"}, "confidence": 0.8742761611938477, "cells": [{"id": 54, "text": "1", "bbox": {"l": 408.14752, "t": 342.82828, "r": 412.54001, "b": 351.61322, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "0", 
"bbox": {"l": 356.11011, "t": 341.57217, "r": 360.50259, "b": 350.35712, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "2", "bbox": {"l": 500.6777, "t": 340.93768, "r": 505.0701900000001, "b": 349.7226299999999, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "3", "bbox": {"l": 356.13382, "t": 351.74789, "r": 360.52631, "b": 360.53284, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "4", "bbox": {"l": 402.53992, "t": 355.8765, "r": 406.9324, "b": 364.66144, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "5", "bbox": {"l": 448.58178999999996, "t": 352.84018, "r": 452.97427, "b": 361.62512, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "6", "bbox": {"l": 491.65161000000006, "t": 353.70657, "r": 496.0441, "b": 362.49152, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "7", "bbox": {"l": 535.13843, "t": 353.33969, "r": 539.53088, "b": 362.12463, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "8", "bbox": {"l": 348.82822, "t": 387.09781, "r": 353.2207, "b": 395.88275, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "9", "bbox": {"l": 389.27151, "t": 375.37228, "r": 393.664, "b": 384.15723, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "10", "bbox": {"l": 442.67479999999995, "t": 375.64621, "r": 451.45889000000005, "b": 384.43115, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "11", "bbox": {"l": 477.4382299999999, "t": 375.534, "r": 485.90167, "b": 384.31894000000005, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "12", "bbox": {"l": 522.57263, "t": 375.64621, "r": 531.35669, "b": 384.43115, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "13", "bbox": {"l": 400.22992, "t": 387.11429, "r": 409.01401, "b": 395.89923, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "14", "bbox": {"l": 442.30792, "t": 386.98981000000003, "r": 451.0920100000001, "b": 395.77475000000004, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "15", "bbox": {"l": 478.21941999999996, "t": 387.37469, "r": 487.00351000000006, "b": 396.15964, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": 
"16", "bbox": {"l": 523.2287, "t": 386.98981000000003, "r": 532.01276, "b": 395.77475000000004, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "1", "bbox": {"l": 411.57233, "t": 399.42477, "r": 415.96481, "b": 408.20972, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "7", "bbox": {"l": 415.96393, "t": 399.42477, "r": 420.35641, "b": 408.20972, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "18", "bbox": {"l": 442.30521, "t": 399.0371999999999, "r": 451.08929, "b": 407.82213999999993, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "19", "bbox": {"l": 478.77893, "t": 398.99639999999994, "r": 487.56302, "b": 407.78133999999994, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "20", "bbox": {"l": 523.97241, "t": 398.6114799999999, "r": 532.75647, "b": 407.39642, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "3", "bbox": {"l": 385.09399, "t": 357.76030999999995, "r": 391.09879, "b": 367.89072, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "2", "bbox": {"l": 333.43451, "t": 380.7265, "r": 339.4393, "b": 390.85689999999994, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "1", "bbox": {"l": 478.07210999999995, "t": 341.0368000000001, "r": 484.0769, "b": 351.16720999999995, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "table", "id": 12, "page_no": 0, "cluster": {"id": 12, "label": "table", "bbox": {"l": 315.7172546386719, "t": 433.823486328125, "r": 536.835693359375, "b": 496.0290222167969, "coord_origin": "TOPLEFT"}, "confidence": 0.8056102991104126, "cells": [{"id": 76, "text": "1", "bbox": {"l": 347.24872, "t": 437.68588, "r": 351.6412, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "0", "bbox": {"l": 318.88071, "t": 437.68588, "r": 323.27319, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "2", "bbox": {"l": 394.10422, "t": 437.68588, "r": 398.4967, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "3", "bbox": {"l": 318.77316, 
"t": 449.5455, "r": 323.16565, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "4", "bbox": {"l": 347.24872, "t": 449.5455, "r": 351.6412, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "5", "bbox": {"l": 394.10422, "t": 449.5455, "r": 398.4967, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "6", "bbox": {"l": 440.95941000000005, "t": 449.5455, "r": 445.3519, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "7", "bbox": {"l": 487.81491, "t": 449.5455, "r": 492.2074, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "8", "bbox": {"l": 318.77316, "t": 473.70425, "r": 323.16565, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "9", "bbox": {"l": 347.24872, "t": 461.8446, "r": 351.6412, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "10", "bbox": {"l": 394.10422, "t": 461.8446, "r": 402.88831, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "11", "bbox": {"l": 440.95941000000005, "t": 461.8446, "r": 449.42285, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "12", "bbox": {"l": 487.81491, "t": 461.8446, "r": 496.599, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "13", "bbox": {"l": 347.24872, "t": 473.70425, "r": 356.03281, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "14", "bbox": {"l": 394.10422, "t": 473.70425, "r": 402.88831, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "15", "bbox": {"l": 440.95941000000005, "t": 473.70425, "r": 449.7435, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "16", "bbox": {"l": 487.81491, "t": 473.70425, "r": 496.599, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "17", "bbox": {"l": 347.24872, "t": 485.12469, "r": 356.03281, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "18", "bbox": {"l": 394.10422, "t": 485.12469, "r": 402.88831, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": 
"19", "bbox": {"l": 440.95941000000005, "t": 485.12469, "r": 449.7435, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "20", "bbox": {"l": 487.81491, "t": 485.12469, "r": 496.599, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "3", "bbox": {"l": 366.70102, "t": 449.12082, "r": 372.70581, "b": 459.25122, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "2", "bbox": {"l": 331.90424, "t": 473.32291, "r": 337.90903, "b": 483.45331, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "1", "bbox": {"l": 459.87621999999993, "t": 437.5936, "r": 465.88101, "b": 447.724, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "lcel", "ched", "lcel", "ched", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "ucel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "ucel", "nl"], "num_rows": 5, "num_cols": 6, "table_cells": [{"bbox": {"l": 347.24872, "t": 437.68588, "r": 351.6412, "b": 446.47083, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 318.88071, "t": 437.68588, "r": 323.27319, "b": 446.47083, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "0", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 437.5936, "r": 465.88101, "b": 447.724, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "2 1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 318.77316, "t": 449.5455, "r": 323.16565, "b": 458.33044, 
"coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 449.5455, "r": 351.6412, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 366.70102, "t": 449.12082, "r": 398.4967, "b": 459.25122, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5 3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 449.5455, "r": 445.3519, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 449.5455, "r": 492.2074, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 318.77316, "t": 473.70425, "r": 323.16565, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 461.8446, "r": 351.6412, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, 
"end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 461.8446, "r": 402.88831, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "10", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 461.8446, "r": 449.42285, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "11", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 461.8446, "r": 496.599, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "12", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 473.70425, "r": 356.03281, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "13", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 473.70425, "r": 402.88831, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 473.70425, "r": 449.7435, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, 
"text": "15", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 473.70425, "r": 496.599, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "16", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 485.12469, "r": 356.03281, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "17", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 485.12469, "r": 402.88831, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 485.12469, "r": 449.7435, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 485.12469, "r": 496.599, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "20", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90424, "t": 473.32291, "r": 337.90903, "b": 483.45331, "coord_origin": "TOPLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}]}, 
{"label": "table", "id": 13, "page_no": 0, "cluster": {"id": 13, "label": "table", "bbox": {"l": 315.65362548828125, "t": 228.7234344482422, "r": 537.1475219726562, "b": 302.80145263671875, "coord_origin": "TOPLEFT"}, "confidence": 0.6515864729881287, "cells": [{"id": 97, "text": "1", "bbox": {"l": 451.9457100000001, "t": 235.34704999999997, "r": 457.95050000000003, "b": 245.47748, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "2", "bbox": {"l": 331.19681, "t": 269.35266, "r": 337.2016, "b": 279.48308999999995, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "3", "bbox": {"l": 384.0329, "t": 252.67895999999996, "r": 390.03769, "b": 262.80939, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ecel", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 1, "num_cols": 2, "table_cells": [{"bbox": {"l": 451.9457100000001, "t": 235.34704999999997, "r": 457.95050000000003, "b": 245.47748, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 384.0329, "t": 252.67895999999996, "r": 390.03769, "b": 262.80939, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": true, "row_header": false, "row_section": false}]}, {"label": "caption", "id": 14, "page_no": 0, "cluster": {"id": 14, "label": "caption", "bbox": {"l": 308.1024475097656, "t": 513.773681640625, "r": 545.11517, "b": 559.2729, "coord_origin": "TOPLEFT"}, "confidence": 0.92146235704422, "cells": [{"id": 106, "text": "Figure 1:", "bbox": {"l": 308.862, "t": 514.50037, "r": 345.73361, "b": 523.40692, "coord_origin": "TOPLEFT"}}, {"id": 
107, "text": "Picture of a table with subtle, complex features", "bbox": {"l": 353.17566, "t": 514.50037, "r": 545.11511, "b": 523.40692, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "such as (1) multi-column headers, (2) cell with multi-row", "bbox": {"l": 308.862, "t": 526.45535, "r": 545.11511, "b": 535.3619100000001, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "text and (3) cells with no content. Image from PubTabNet", "bbox": {"l": 308.862, "t": 538.41035, "r": 545.11517, "b": 547.31691, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "evaluation set, filename: \u2018PMC2944238 004 02\u2019.", "bbox": {"l": 308.862, "t": 550.36635, "r": 505.6917700000001, "b": 559.2729, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: \u2018PMC2944238 004 02\u2019."}, {"label": "text", "id": 15, "page_no": 0, "cluster": {"id": 15, "label": "text", "bbox": {"l": 307.88861083984375, "t": 583.6217651367188, "r": 545.5301513671875, "b": 665.04693, "coord_origin": "TOPLEFT"}, "confidence": 0.9848759770393372, "cells": [{"id": 111, "text": "Recently, significant progress has been made with vi-", "bbox": {"l": 320.81699, "t": 584.40936, "r": 545.11493, "b": 593.31592, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "sion based approaches to extract tables in documents. 
For", "bbox": {"l": 308.862, "t": 596.36436, "r": 545.11517, "b": 605.2709199999999, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "the sake of completeness, the issue of table extraction from", "bbox": {"l": 308.862, "t": 608.31937, "r": 545.11511, "b": 617.22592, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "documents is typically decomposed into two separate chal-", "bbox": {"l": 308.862, "t": 620.27437, "r": 545.11505, "b": 629.18092, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "lenges, i.e.", "bbox": {"l": 308.862, "t": 632.23036, "r": 353.6937, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "(1)", "bbox": {"l": 362.11209, "t": 632.23036, "r": 374.66617, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "finding the location of the table(s) on a", "bbox": {"l": 377.35785, "t": 632.23036, "r": 545.11505, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "document-page and (2) finding the structure of a given table", "bbox": {"l": 308.862, "t": 644.18536, "r": 545.11517, "b": 653.09192, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "in the document.", "bbox": {"l": 308.862, "t": 656.14037, "r": 375.55167, "b": 665.04693, "coord_origin": "TOPLEFT"}}]}, "text": "Recently, significant progress has been made with vision based approaches to extract tables in documents. For the sake of completeness, the issue of table extraction from documents is typically decomposed into two separate challenges, i.e. 
(1) finding the location of the table(s) on a document-page and (2) finding the structure of a given table in the document."}, {"label": "text", "id": 16, "page_no": 0, "cluster": {"id": 16, "label": "text", "bbox": {"l": 307.9762268066406, "t": 667.4489135742188, "r": 545.4558715820312, "b": 713.8033447265625, "coord_origin": "TOPLEFT"}, "confidence": 0.9791521430015564, "cells": [{"id": 120, "text": "The first problem is called table-location and has been", "bbox": {"l": 320.81699, "t": 668.38036, "r": 545.11493, "b": 677.28693, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "previously addressed [30, 38, 19, 21, 23, 26, 8] with state-", "bbox": {"l": 308.862, "t": 680.33536, "r": 545.11511, "b": 689.24193, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "of-the-art object-detection networks (e.g. YOLO and later", "bbox": {"l": 308.862, "t": 692.290359, "r": 545.11511, "b": 701.19693, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "on Mask-RCNN [9]). For all practical purposes, it can be", "bbox": {"l": 308.862, "t": 704.245361, "r": 545.11499, "b": 713.151932, "coord_origin": "TOPLEFT"}}]}, "text": "The first problem is called table-location and has been previously addressed [30, 38, 19, 21, 23, 26, 8] with stateof-the-art object-detection networks (e.g. YOLO and later on Mask-RCNN [9]). 
For all practical purposes, it can be"}, {"label": "page_footer", "id": 17, "page_no": 0, "cluster": {"id": 17, "label": "page_footer", "bbox": {"l": 295.121, "t": 733.4591674804688, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}, "confidence": 0.8045891523361206, "cells": [{"id": 124, "text": "1", "bbox": {"l": 295.121, "t": 734.133366, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}}]}, "text": "1"}, {"label": "page_header", "id": 18, "page_no": 0, "cluster": {"id": 18, "label": "page_header", "bbox": {"l": 17.166410446166992, "t": 207.82001000000002, "r": 36.339779, "b": 560.00003, "coord_origin": "TOPLEFT"}, "confidence": 0.8773143887519836, "cells": [{"id": 125, "text": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022", "bbox": {"l": 18.340221, "t": 207.82001000000002, "r": 36.339779, "b": 560.00003, "coord_origin": "TOPLEFT"}}]}, "text": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022"}], "body": [{"label": "section_header", "id": 0, "page_no": 0, "cluster": {"id": 0, "label": "section_header", "bbox": {"l": 95.52344512939453, "t": 105.71222686767578, "r": 498.92708999999996, "b": 119.93133999999998, "coord_origin": "TOPLEFT"}, "confidence": 0.8868062496185303, "cells": [{"id": 0, "text": "TableFormer: Table Structure Understanding with Transformers.", "bbox": {"l": 96.301003, "t": 107.03412000000003, "r": 498.92708999999996, "b": 119.93133999999998, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer: Table Structure Understanding with Transformers."}, {"label": "section_header", "id": 1, "page_no": 0, "cluster": {"id": 1, "label": "section_header", "bbox": {"l": 141.76437377929688, "t": 145.46343994140625, "r": 453.2174987792969, "b": 171.32037000000003, "coord_origin": "TOPLEFT"}, "confidence": 0.7586213946342468, "cells": [{"id": 1, "text": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar", "bbox": {"l": 142.47701, "t": 146.68535999999995, "r": 452.75027, "b": 157.37334999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "IBM 
Research", "bbox": {"l": 262.918, "t": 160.63239, "r": 332.30597, "b": 171.32037000000003, "coord_origin": "TOPLEFT"}}]}, "text": "Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar IBM Research"}, {"label": "text", "id": 2, "page_no": 0, "cluster": {"id": 2, "label": "text", "bbox": {"l": 208.123, "t": 175.634521484375, "r": 379.2890319824219, "b": 185.38323974609375, "coord_origin": "TOPLEFT"}, "confidence": 0.909633457660675, "cells": [{"id": 3, "text": "{", "bbox": {"l": 208.123, "t": 175.96123999999998, "r": 212.73083, "b": 184.42553999999996, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "ahn,nli,mly,taa", "bbox": {"l": 212.73, "t": 177.08203000000003, "r": 293.42761, "b": 184.00409000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "}", "bbox": {"l": 293.42798, "t": 175.96123999999998, "r": 298.0358, "b": 184.42553999999996, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "@zurich.ibm.com", "bbox": {"l": 298.03497, "t": 177.08203000000003, "r": 378.73257, "b": 184.00409000000002, "coord_origin": "TOPLEFT"}}]}, "text": "{ ahn,nli,mly,taa } @zurich.ibm.com"}, {"label": "section_header", "id": 3, "page_no": 0, "cluster": {"id": 3, "label": "section_header", "bbox": {"l": 144.82810974121094, "t": 215.0791015625, "r": 190.83473205566406, "b": 226.23071000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9258671998977661, "cells": [{"id": 7, "text": "Abstract", "bbox": {"l": 145.99498, "t": 215.48297000000002, "r": 190.48029, "b": 226.23071000000004, "coord_origin": "TOPLEFT"}}]}, "text": "Abstract"}, {"label": "text", "id": 4, "page_no": 0, "cluster": {"id": 4, "label": "text", "bbox": {"l": 48.843414306640625, "t": 240.1605987548828, "r": 286.7501220703125, "b": 513.9630126953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9838882088661194, "cells": [{"id": 8, "text": "Tables organize valuable content in a concise and com-", "bbox": {"l": 62.066978, "t": 241.39508, "r": 286.36493, "b": 249.98284999999998, "coord_origin": "TOPLEFT"}}, {"id": 9, 
"text": "pact representation. This content is extremely valuable for", "bbox": {"l": 50.111977, "t": 253.3501, "r": 286.36508, "b": 261.93787, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "systems such as search engines, Knowledge Graph\u2019s, etc,", "bbox": {"l": 50.111977, "t": 265.30511, "r": 286.36508, "b": 273.89288, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "since they enhance their predictive capabilities. Unfortu-", "bbox": {"l": 50.111977, "t": 277.26111000000003, "r": 286.36505, "b": 285.84888, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "nately, tables come in a large variety of shapes and sizes.", "bbox": {"l": 50.111977, "t": 289.21609, "r": 286.36505, "b": 297.80386, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Furthermore, they can have complex column/row-header", "bbox": {"l": 50.111977, "t": 301.17108, "r": 286.36505, "b": 309.75884999999994, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "configurations, multiline rows, different variety of separa-", "bbox": {"l": 50.111977, "t": 313.12607, "r": 286.36508, "b": 321.71384, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "tion lines, missing entries, etc. As such, the correct iden-", "bbox": {"l": 50.111977, "t": 325.08105, "r": 286.36508, "b": 333.66882, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "tification of the table-structure from an image is a non-", "bbox": {"l": 50.111977, "t": 337.03604, "r": 286.36505, "b": 345.62381, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "trivial task. In this paper, we present a new table-structure", "bbox": {"l": 50.111977, "t": 348.99203, "r": 286.36508, "b": 357.5798, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "identification model. The latter improves the latest end-to-", "bbox": {"l": 50.111977, "t": 360.94701999999995, "r": 286.36505, "b": 369.53479, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "end deep learning model (i.e. 
encoder-dual-decoder from", "bbox": {"l": 50.111977, "t": 372.90201, "r": 286.36508, "b": 381.48978, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "PubTabNet) in two significant ways. First, we introduce a", "bbox": {"l": 50.111977, "t": 384.85699, "r": 286.36505, "b": 393.44476, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "new object detection decoder for table-cells. In this way,", "bbox": {"l": 50.111977, "t": 396.81198, "r": 286.36511, "b": 405.39975000000004, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "we can obtain the content of the table-cells from program-", "bbox": {"l": 50.111977, "t": 408.76697, "r": 286.36508, "b": 417.35474, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "matic PDF\u2019s directly from the PDF source and avoid the", "bbox": {"l": 50.111977, "t": 420.72296000000006, "r": 286.36505, "b": 429.31073, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "training of the custom OCR decoders.", "bbox": {"l": 50.111977, "t": 432.67795, "r": 207.23216, "b": 441.26572, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "This architectural", "bbox": {"l": 214.09639, "t": 432.67795, "r": 286.36508, "b": 441.26572, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "change leads to more accurate table-content extraction and", "bbox": {"l": 50.111977, "t": 444.63293, "r": 286.36508, "b": 453.2207, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "allows us to tackle non-english tables. Second, we replace", "bbox": {"l": 50.111977, "t": 456.58792000000005, "r": 286.36505, "b": 465.17569, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "the LSTM decoders with transformer based decoders. 
This", "bbox": {"l": 50.111977, "t": 468.54291, "r": 286.36505, "b": 477.13068, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "upgrade improves significantly the previous state-of-the-art", "bbox": {"l": 50.111977, "t": 480.4989, "r": 286.36508, "b": 489.08667, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "tree-editing-distance-score (TEDS) from 91% to 98.5% on", "bbox": {"l": 50.111977, "t": 492.45389, "r": 286.36505, "b": 501.04166, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "simple tables and from 88.7% to 95% on complex tables.", "bbox": {"l": 50.111977, "t": 504.40887, "r": 276.65152, "b": 512.9966400000001, "coord_origin": "TOPLEFT"}}]}, "text": "Tables organize valuable content in a concise and compact representation. This content is extremely valuable for systems such as search engines, Knowledge Graph\u2019s, etc, since they enhance their predictive capabilities. Unfortunately, tables come in a large variety of shapes and sizes. Furthermore, they can have complex column/row-header configurations, multiline rows, different variety of separation lines, missing entries, etc. As such, the correct identification of the table-structure from an image is a nontrivial task. In this paper, we present a new table-structure identification model. The latter improves the latest end-toend deep learning model (i.e. encoder-dual-decoder from PubTabNet) in two significant ways. First, we introduce a new object detection decoder for table-cells. In this way, we can obtain the content of the table-cells from programmatic PDF\u2019s directly from the PDF source and avoid the training of the custom OCR decoders. This architectural change leads to more accurate table-content extraction and allows us to tackle non-english tables. Second, we replace the LSTM decoders with transformer based decoders. 
This upgrade improves significantly the previous state-of-the-art tree-editing-distance-score (TEDS) from 91% to 98.5% on simple tables and from 88.7% to 95% on complex tables."}, {"label": "section_header", "id": 5, "page_no": 0, "cluster": {"id": 5, "label": "section_header", "bbox": {"l": 49.7277946472168, "t": 539.0704956054688, "r": 126.95994567871094, "b": 550.69049, "coord_origin": "TOPLEFT"}, "confidence": 0.9317677617073059, "cells": [{"id": 32, "text": "1.", "bbox": {"l": 50.111977, "t": 539.94276, "r": 58.121296, "b": 550.69049, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Introduction", "bbox": {"l": 68.800385, "t": 539.94276, "r": 126.94804, "b": 550.69049, "coord_origin": "TOPLEFT"}}]}, "text": "1. Introduction"}, {"label": "text", "id": 6, "page_no": 0, "cluster": {"id": 6, "label": "text", "bbox": {"l": 49.31346130371094, "t": 559.62060546875, "r": 286.5247497558594, "b": 713.151779, "coord_origin": "TOPLEFT"}, "confidence": 0.9841895699501038, "cells": [{"id": 34, "text": "The occurrence of tables in documents is ubiquitous.", "bbox": {"l": 62.066978, "t": 560.7832, "r": 286.36496, "b": 569.68976, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "They often summarise quantitative or factual data, which is", "bbox": {"l": 50.111977, "t": 572.7382, "r": 286.36508, "b": 581.64476, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "cumbersome to describe in verbose text but nevertheless ex-", "bbox": {"l": 50.111977, "t": 584.69321, "r": 286.36505, "b": 593.5997600000001, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "tremely valuable. Unfortunately, this compact representa-", "bbox": {"l": 50.111977, "t": 596.6492000000001, "r": 286.36505, "b": 605.55576, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "tion is often not easy to parse by machines. 
There are many", "bbox": {"l": 50.111977, "t": 608.6042, "r": 286.36505, "b": 617.51076, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "implicit conventions used to obtain a compact table repre-", "bbox": {"l": 50.111977, "t": 620.5592, "r": 286.36505, "b": 629.46576, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "sentation. For example, tables often have complex column-", "bbox": {"l": 50.111977, "t": 632.51421, "r": 286.36508, "b": 641.42076, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "and row-headers in order to reduce duplicated cell content.", "bbox": {"l": 50.111977, "t": 644.46921, "r": 286.36508, "b": 653.37576, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "Lines of different shapes and sizes are leveraged to separate", "bbox": {"l": 50.111977, "t": 656.42421, "r": 286.36502, "b": 665.33077, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "content or indicate a tree structure. Additionally, tables can", "bbox": {"l": 50.111977, "t": 668.3802000000001, "r": 286.36505, "b": 677.28677, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "also have empty/missing table-entries or multi-row textual", "bbox": {"l": 50.111977, "t": 680.33521, "r": 286.36505, "b": 689.2417800000001, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "table-entries. Fig. 1 shows a table which presents all these", "bbox": {"l": 50.111977, "t": 692.290207, "r": 286.36505, "b": 701.196777, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "issues.", "bbox": {"l": 50.111977, "t": 704.245209, "r": 76.403275, "b": 713.151779, "coord_origin": "TOPLEFT"}}]}, "text": "The occurrence of tables in documents is ubiquitous. They often summarise quantitative or factual data, which is cumbersome to describe in verbose text but nevertheless extremely valuable. Unfortunately, this compact representation is often not easy to parse by machines. There are many implicit conventions used to obtain a compact table representation. 
For example, tables often have complex columnand row-headers in order to reduce duplicated cell content. Lines of different shapes and sizes are leveraged to separate content or indicate a tree structure. Additionally, tables can also have empty/missing table-entries or multi-row textual table-entries. Fig. 1 shows a table which presents all these issues."}, {"label": "section_header", "id": 7, "page_no": 0, "cluster": {"id": 7, "label": "section_header", "bbox": {"l": 315.26983642578125, "t": 216.9366455078125, "r": 408.4407, "b": 226.75482, "coord_origin": "TOPLEFT"}, "confidence": 0.6724019646644592, "cells": [{"id": 47, "text": "a.", "bbox": {"l": 315.56702, "t": 218.00684, "r": 324.01007, "b": 226.75482, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Picture of a table:", "bbox": {"l": 328.2316, "t": 218.00684, "r": 408.4407, "b": 226.75482, "coord_origin": "TOPLEFT"}}]}, "text": "a. Picture of a table:"}, {"label": "text", "id": 8, "page_no": 0, "cluster": {"id": 8, "label": "text", "bbox": {"l": 315.56702, "t": 313.69478999999995, "r": 486.40194999999994, "b": 333.2428, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 49, "text": "b.", "bbox": {"l": 315.56702, "t": 313.69478999999995, "r": 325.05786, "b": 322.44281, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Red-annotation of bounding boxes,", "bbox": {"l": 329.80325, "t": 313.69478999999995, "r": 486.40194999999994, "b": 322.44281, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "Blue-predictions by TableFormer", "bbox": {"l": 326.46252, "t": 324.49478, "r": 472.47411999999997, "b": 333.2428, "coord_origin": "TOPLEFT"}}]}, "text": "b. 
Red-annotation of bounding boxes, Blue-predictions by TableFormer"}, {"label": "text", "id": 9, "page_no": 0, "cluster": {"id": 9, "label": "text", "bbox": {"l": 315.56702, "t": 420.1828, "r": 324.81039, "b": 428.93082, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 52, "text": "c.", "bbox": {"l": 315.56702, "t": 420.1828, "r": 324.81039, "b": 428.93082, "coord_origin": "TOPLEFT"}}]}, "text": "c."}, {"label": "text", "id": 10, "page_no": 0, "cluster": {"id": 10, "label": "text", "bbox": {"l": 329.4321, "t": 420.1828, "r": 491.1912500000001, "b": 428.93082, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 53, "text": "Structure predicted by TableFormer:", "bbox": {"l": 329.4321, "t": 420.1828, "r": 491.1912500000001, "b": 428.93082, "coord_origin": "TOPLEFT"}}]}, "text": "Structure predicted by TableFormer:"}, {"label": "picture", "id": 11, "page_no": 0, "cluster": {"id": 11, "label": "picture", "bbox": {"l": 314.78173828125, "t": 338.0652770996094, "r": 539.53088, "b": 410.0494384765625, "coord_origin": "TOPLEFT"}, "confidence": 0.8742761611938477, "cells": [{"id": 54, "text": "1", "bbox": {"l": 408.14752, "t": 342.82828, "r": 412.54001, "b": 351.61322, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "0", "bbox": {"l": 356.11011, "t": 341.57217, "r": 360.50259, "b": 350.35712, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "2", "bbox": {"l": 500.6777, "t": 340.93768, "r": 505.0701900000001, "b": 349.7226299999999, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "3", "bbox": {"l": 356.13382, "t": 351.74789, "r": 360.52631, "b": 360.53284, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "4", "bbox": {"l": 402.53992, "t": 355.8765, "r": 406.9324, "b": 364.66144, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "5", "bbox": {"l": 448.58178999999996, "t": 352.84018, "r": 452.97427, "b": 361.62512, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "6", "bbox": {"l": 491.65161000000006, "t": 353.70657, "r": 496.0441, "b": 362.49152, 
"coord_origin": "TOPLEFT"}}, {"id": 61, "text": "7", "bbox": {"l": 535.13843, "t": 353.33969, "r": 539.53088, "b": 362.12463, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "8", "bbox": {"l": 348.82822, "t": 387.09781, "r": 353.2207, "b": 395.88275, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "9", "bbox": {"l": 389.27151, "t": 375.37228, "r": 393.664, "b": 384.15723, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "10", "bbox": {"l": 442.67479999999995, "t": 375.64621, "r": 451.45889000000005, "b": 384.43115, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "11", "bbox": {"l": 477.4382299999999, "t": 375.534, "r": 485.90167, "b": 384.31894000000005, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "12", "bbox": {"l": 522.57263, "t": 375.64621, "r": 531.35669, "b": 384.43115, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "13", "bbox": {"l": 400.22992, "t": 387.11429, "r": 409.01401, "b": 395.89923, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "14", "bbox": {"l": 442.30792, "t": 386.98981000000003, "r": 451.0920100000001, "b": 395.77475000000004, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "15", "bbox": {"l": 478.21941999999996, "t": 387.37469, "r": 487.00351000000006, "b": 396.15964, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "16", "bbox": {"l": 523.2287, "t": 386.98981000000003, "r": 532.01276, "b": 395.77475000000004, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "1", "bbox": {"l": 411.57233, "t": 399.42477, "r": 415.96481, "b": 408.20972, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "7", "bbox": {"l": 415.96393, "t": 399.42477, "r": 420.35641, "b": 408.20972, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "18", "bbox": {"l": 442.30521, "t": 399.0371999999999, "r": 451.08929, "b": 407.82213999999993, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "19", "bbox": {"l": 478.77893, "t": 398.99639999999994, "r": 487.56302, "b": 407.78133999999994, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "20", "bbox": {"l": 523.97241, "t": 
398.6114799999999, "r": 532.75647, "b": 407.39642, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "3", "bbox": {"l": 385.09399, "t": 357.76030999999995, "r": 391.09879, "b": 367.89072, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "2", "bbox": {"l": 333.43451, "t": 380.7265, "r": 339.4393, "b": 390.85689999999994, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "1", "bbox": {"l": 478.07210999999995, "t": 341.0368000000001, "r": 484.0769, "b": 351.16720999999995, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "table", "id": 12, "page_no": 0, "cluster": {"id": 12, "label": "table", "bbox": {"l": 315.7172546386719, "t": 433.823486328125, "r": 536.835693359375, "b": 496.0290222167969, "coord_origin": "TOPLEFT"}, "confidence": 0.8056102991104126, "cells": [{"id": 76, "text": "1", "bbox": {"l": 347.24872, "t": 437.68588, "r": 351.6412, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "0", "bbox": {"l": 318.88071, "t": 437.68588, "r": 323.27319, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "2", "bbox": {"l": 394.10422, "t": 437.68588, "r": 398.4967, "b": 446.47083, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "3", "bbox": {"l": 318.77316, "t": 449.5455, "r": 323.16565, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "4", "bbox": {"l": 347.24872, "t": 449.5455, "r": 351.6412, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "5", "bbox": {"l": 394.10422, "t": 449.5455, "r": 398.4967, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "6", "bbox": {"l": 440.95941000000005, "t": 449.5455, "r": 445.3519, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "7", "bbox": {"l": 487.81491, "t": 449.5455, "r": 492.2074, "b": 458.33044, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "8", "bbox": {"l": 318.77316, "t": 473.70425, "r": 323.16565, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 85, 
"text": "9", "bbox": {"l": 347.24872, "t": 461.8446, "r": 351.6412, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "10", "bbox": {"l": 394.10422, "t": 461.8446, "r": 402.88831, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "11", "bbox": {"l": 440.95941000000005, "t": 461.8446, "r": 449.42285, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "12", "bbox": {"l": 487.81491, "t": 461.8446, "r": 496.599, "b": 470.62955, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "13", "bbox": {"l": 347.24872, "t": 473.70425, "r": 356.03281, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "14", "bbox": {"l": 394.10422, "t": 473.70425, "r": 402.88831, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "15", "bbox": {"l": 440.95941000000005, "t": 473.70425, "r": 449.7435, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "16", "bbox": {"l": 487.81491, "t": 473.70425, "r": 496.599, "b": 482.4892, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "17", "bbox": {"l": 347.24872, "t": 485.12469, "r": 356.03281, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "18", "bbox": {"l": 394.10422, "t": 485.12469, "r": 402.88831, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "19", "bbox": {"l": 440.95941000000005, "t": 485.12469, "r": 449.7435, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "20", "bbox": {"l": 487.81491, "t": 485.12469, "r": 496.599, "b": 493.90964, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "3", "bbox": {"l": 366.70102, "t": 449.12082, "r": 372.70581, "b": 459.25122, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "2", "bbox": {"l": 331.90424, "t": 473.32291, "r": 337.90903, "b": 483.45331, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "1", "bbox": {"l": 459.87621999999993, "t": 437.5936, "r": 465.88101, "b": 447.724, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "lcel", "ched", "lcel", "ched", "nl", "fcel", 
"fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "ucel", "nl", "fcel", "fcel", "fcel", "fcel", "fcel", "ucel", "nl"], "num_rows": 5, "num_cols": 6, "table_cells": [{"bbox": {"l": 347.24872, "t": 437.68588, "r": 351.6412, "b": 446.47083, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 318.88071, "t": 437.68588, "r": 323.27319, "b": 446.47083, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "0", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 437.5936, "r": 465.88101, "b": 447.724, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "2 1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 318.77316, "t": 449.5455, "r": 323.16565, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 449.5455, "r": 351.6412, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 366.70102, "t": 449.12082, "r": 398.4967, "b": 459.25122, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, 
"end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5 3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 449.5455, "r": 445.3519, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 449.5455, "r": 492.2074, "b": 458.33044, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 318.77316, "t": 473.70425, "r": 323.16565, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 461.8446, "r": 351.6412, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 461.8446, "r": 402.88831, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "10", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 461.8446, "r": 449.42285, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": 
"11", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 461.8446, "r": 496.599, "b": 470.62955, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "12", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 473.70425, "r": 356.03281, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "13", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 394.10422, "t": 473.70425, "r": 402.88831, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 473.70425, "r": 449.7435, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "15", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 473.70425, "r": 496.599, "b": 482.4892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "16", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 347.24872, "t": 485.12469, "r": 356.03281, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "17", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 
394.10422, "t": 485.12469, "r": 402.88831, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 440.95941000000005, "t": 485.12469, "r": 449.7435, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 487.81491, "t": 485.12469, "r": 496.599, "b": 493.90964, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "20", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 331.90424, "t": 473.32291, "r": 337.90903, "b": 483.45331, "coord_origin": "TOPLEFT"}, "row_span": 3, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "2", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "table", "id": 13, "page_no": 0, "cluster": {"id": 13, "label": "table", "bbox": {"l": 315.65362548828125, "t": 228.7234344482422, "r": 537.1475219726562, "b": 302.80145263671875, "coord_origin": "TOPLEFT"}, "confidence": 0.6515864729881287, "cells": [{"id": 97, "text": "1", "bbox": {"l": 451.9457100000001, "t": 235.34704999999997, "r": 457.95050000000003, "b": 245.47748, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "2", "bbox": {"l": 331.19681, "t": 269.35266, "r": 337.2016, "b": 279.48308999999995, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "3", "bbox": {"l": 384.0329, "t": 252.67895999999996, "r": 390.03769, "b": 262.80939, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ecel", "ched", 
"ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "fcel", "fcel", "fcel", "fcel", "nl", "ucel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 1, "num_cols": 2, "table_cells": [{"bbox": {"l": 451.9457100000001, "t": 235.34704999999997, "r": 457.95050000000003, "b": 245.47748, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 384.0329, "t": 252.67895999999996, "r": 390.03769, "b": 262.80939, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "3", "column_header": true, "row_header": false, "row_section": false}]}, {"label": "caption", "id": 14, "page_no": 0, "cluster": {"id": 14, "label": "caption", "bbox": {"l": 308.1024475097656, "t": 513.773681640625, "r": 545.11517, "b": 559.2729, "coord_origin": "TOPLEFT"}, "confidence": 0.92146235704422, "cells": [{"id": 106, "text": "Figure 1:", "bbox": {"l": 308.862, "t": 514.50037, "r": 345.73361, "b": 523.40692, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Picture of a table with subtle, complex features", "bbox": {"l": 353.17566, "t": 514.50037, "r": 545.11511, "b": 523.40692, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "such as (1) multi-column headers, (2) cell with multi-row", "bbox": {"l": 308.862, "t": 526.45535, "r": 545.11511, "b": 535.3619100000001, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "text and (3) cells with no content. 
Image from PubTabNet", "bbox": {"l": 308.862, "t": 538.41035, "r": 545.11517, "b": 547.31691, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "evaluation set, filename: \u2018PMC2944238 004 02\u2019.", "bbox": {"l": 308.862, "t": 550.36635, "r": 505.6917700000001, "b": 559.2729, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: \u2018PMC2944238 004 02\u2019."}, {"label": "text", "id": 15, "page_no": 0, "cluster": {"id": 15, "label": "text", "bbox": {"l": 307.88861083984375, "t": 583.6217651367188, "r": 545.5301513671875, "b": 665.04693, "coord_origin": "TOPLEFT"}, "confidence": 0.9848759770393372, "cells": [{"id": 111, "text": "Recently, significant progress has been made with vi-", "bbox": {"l": 320.81699, "t": 584.40936, "r": 545.11493, "b": 593.31592, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "sion based approaches to extract tables in documents. 
For", "bbox": {"l": 308.862, "t": 596.36436, "r": 545.11517, "b": 605.2709199999999, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "the sake of completeness, the issue of table extraction from", "bbox": {"l": 308.862, "t": 608.31937, "r": 545.11511, "b": 617.22592, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "documents is typically decomposed into two separate chal-", "bbox": {"l": 308.862, "t": 620.27437, "r": 545.11505, "b": 629.18092, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "lenges, i.e.", "bbox": {"l": 308.862, "t": 632.23036, "r": 353.6937, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "(1)", "bbox": {"l": 362.11209, "t": 632.23036, "r": 374.66617, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "finding the location of the table(s) on a", "bbox": {"l": 377.35785, "t": 632.23036, "r": 545.11505, "b": 641.13692, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "document-page and (2) finding the structure of a given table", "bbox": {"l": 308.862, "t": 644.18536, "r": 545.11517, "b": 653.09192, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "in the document.", "bbox": {"l": 308.862, "t": 656.14037, "r": 375.55167, "b": 665.04693, "coord_origin": "TOPLEFT"}}]}, "text": "Recently, significant progress has been made with vision based approaches to extract tables in documents. For the sake of completeness, the issue of table extraction from documents is typically decomposed into two separate challenges, i.e. 
(1) finding the location of the table(s) on a document-page and (2) finding the structure of a given table in the document."}, {"label": "text", "id": 16, "page_no": 0, "cluster": {"id": 16, "label": "text", "bbox": {"l": 307.9762268066406, "t": 667.4489135742188, "r": 545.4558715820312, "b": 713.8033447265625, "coord_origin": "TOPLEFT"}, "confidence": 0.9791521430015564, "cells": [{"id": 120, "text": "The first problem is called table-location and has been", "bbox": {"l": 320.81699, "t": 668.38036, "r": 545.11493, "b": 677.28693, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "previously addressed [30, 38, 19, 21, 23, 26, 8] with state-", "bbox": {"l": 308.862, "t": 680.33536, "r": 545.11511, "b": 689.24193, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "of-the-art object-detection networks (e.g. YOLO and later", "bbox": {"l": 308.862, "t": 692.290359, "r": 545.11511, "b": 701.19693, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "on Mask-RCNN [9]). For all practical purposes, it can be", "bbox": {"l": 308.862, "t": 704.245361, "r": 545.11499, "b": 713.151932, "coord_origin": "TOPLEFT"}}]}, "text": "The first problem is called table-location and has been previously addressed [30, 38, 19, 21, 23, 26, 8] with stateof-the-art object-detection networks (e.g. YOLO and later on Mask-RCNN [9]). 
For all practical purposes, it can be"}], "headers": [{"label": "page_footer", "id": 17, "page_no": 0, "cluster": {"id": 17, "label": "page_footer", "bbox": {"l": 295.121, "t": 733.4591674804688, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}, "confidence": 0.8045891523361206, "cells": [{"id": 124, "text": "1", "bbox": {"l": 295.121, "t": 734.133366, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}}]}, "text": "1"}, {"label": "page_header", "id": 18, "page_no": 0, "cluster": {"id": 18, "label": "page_header", "bbox": {"l": 17.166410446166992, "t": 207.82001000000002, "r": 36.339779, "b": 560.00003, "coord_origin": "TOPLEFT"}, "confidence": 0.8773143887519836, "cells": [{"id": 125, "text": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022", "bbox": {"l": 18.340221, "t": 207.82001000000002, "r": 36.339779, "b": 560.00003, "coord_origin": "TOPLEFT"}}]}, "text": "arXiv:2203.01017v2 [cs.CV] 11 Mar 2022"}]}}, {"page_no": 1, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "considered as a solved problem, given enough ground-truth", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36505, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "data to train on.", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 112.64721999999999, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "The second problem is called table-structure decompo-", "bbox": {"l": 62.067001, "t": 99.57141000000001, "r": 286.36496, "b": 108.47797000000003, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "sition.", "bbox": {"l": 50.112, "t": 111.52643, "r": 74.749512, "b": 120.43297999999993, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "The latter is a long standing problem in the com-", "bbox": {"l": 81.334793, "t": 111.52643, "r": 286.36514, "b": 120.43297999999993, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "munity of document understanding [6, 4, 14]. 
Contrary to", "bbox": {"l": 50.112, "t": 123.48145, "r": 286.36511, "b": 132.38800000000003, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "the table-location problem, there are no commonly used ap-", "bbox": {"l": 50.112, "t": 135.43646, "r": 286.36511, "b": 144.34302000000002, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "proaches that can easily be re-purposed to solve this prob-", "bbox": {"l": 50.112, "t": 147.39246000000003, "r": 286.36505, "b": 156.29900999999995, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "lem. Lately, a set of new model-architectures has been pro-", "bbox": {"l": 50.112, "t": 159.34747000000004, "r": 286.36511, "b": 168.25402999999994, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "posed by the community to address table-structure decom-", "bbox": {"l": 50.112, "t": 171.30249000000003, "r": 286.36508, "b": 180.20905000000005, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "position [37, 36, 18, 20]. All these models have some weak-", "bbox": {"l": 50.112, "t": 183.25751000000002, "r": 286.36511, "b": 192.16405999999995, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "nesses (see Sec. 2). 
The common denominator here is the", "bbox": {"l": 50.112, "t": 195.21252000000004, "r": 286.36508, "b": 204.11908000000005, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "reliance on textual features and/or the inability to provide", "bbox": {"l": 50.112, "t": 207.16754000000003, "r": 286.36514, "b": 216.07410000000004, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "the bounding box of each table-cell in the original image.", "bbox": {"l": 50.112, "t": 219.12354000000005, "r": 278.66397, "b": 228.03008999999997, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "In this paper, we want to address these weaknesses and", "bbox": {"l": 62.067001, "t": 231.53156, "r": 286.36493, "b": 240.43811000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "present a robust table-structure decomposition algorithm.", "bbox": {"l": 50.112, "t": 243.48657000000003, "r": 286.36511, "b": 252.39313000000004, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "The design criteria for our model are the following. First,", "bbox": {"l": 50.112, "t": 255.44159000000002, "r": 286.36511, "b": 264.34813999999994, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "we want our algorithm to be language agnostic. In this way,", "bbox": {"l": 50.112, "t": 267.39661, "r": 286.36502, "b": 276.30316000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "we can obtain the structure of any table, irregardless of the", "bbox": {"l": 50.112, "t": 279.35155999999995, "r": 286.36508, "b": 288.25815, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "language.", "bbox": {"l": 50.112, "t": 291.30759, "r": 88.567635, "b": 300.21414, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Second, we want our algorithm to leverage as", "bbox": {"l": 95.501602, "t": 291.30759, "r": 286.36505, "b": 300.21414, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "much data as possible from the original PDF document. 
For", "bbox": {"l": 50.112, "t": 303.26257, "r": 286.36508, "b": 312.16913, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "programmatic PDF documents, the text-cells can often be", "bbox": {"l": 50.112, "t": 315.21756, "r": 286.36511, "b": 324.12411, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "extracted much faster and with higher accuracy compared", "bbox": {"l": 50.112, "t": 327.17255, "r": 286.36505, "b": 336.0791, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "to OCR methods. Last but not least, we want to have a di-", "bbox": {"l": 50.112, "t": 339.12753, "r": 286.36511, "b": 348.03409, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "rect link between the table-cell and its bounding box in the", "bbox": {"l": 50.112, "t": 351.08353, "r": 286.36508, "b": 359.99008, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "image.", "bbox": {"l": 50.112, "t": 363.03851, "r": 76.951241, "b": 371.94507, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "To meet the design criteria listed above, we developed a", "bbox": {"l": 62.067001, "t": 375.4465, "r": 286.36499, "b": 384.35306, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "new model called", "bbox": {"l": 50.112, "t": 387.40149, "r": 120.98594, "b": 396.30804, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "TableFormer", "bbox": {"l": 123.901, "t": 387.28192, "r": 179.7314, "b": 396.23830999999996, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "and a synthetically gener-", "bbox": {"l": 182.646, "t": 387.40149, "r": 286.36658, "b": 396.30804, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "ated table structure dataset called", "bbox": {"l": 50.112, "t": 399.35648, "r": 181.75778, "b": 408.26302999999996, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "SynthTabNet", "bbox": {"l": 184.104, "t": 399.23690999999997, "r": 240.2034, "b": 408.1933, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "$^{1}$. 
In partic-", "bbox": {"l": 240.20401, "t": 399.35648, "r": 286.36069, "b": 408.26302999999996, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "ular, our contributions in this work can be summarised as", "bbox": {"l": 50.112015, "t": 411.31146, "r": 286.36511, "b": 420.21802, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "follows:", "bbox": {"l": 50.112015, "t": 423.26645, "r": 82.520355, "b": 432.173, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "\u2022", "bbox": {"l": 61.569016, "t": 444.55145, "r": 70.741714, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "We propose", "bbox": {"l": 73.034889, "t": 444.55145, "r": 117.10054, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "TableFormer", "bbox": {"l": 119.59001, "t": 444.43188, "r": 175.42041, "b": 453.38828, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": ", a transformer based model", "bbox": {"l": 175.42102, "t": 444.55145, "r": 286.36453, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "that predicts tables structure and bounding boxes for", "bbox": {"l": 70.037018, "t": 456.50644000000005, "r": 286.3649, "b": 465.41299, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "the table content simultaneously in an end-to-end ap-", "bbox": {"l": 70.037018, "t": 468.46143, "r": 286.3649, "b": 477.36798, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "proach.", "bbox": {"l": 70.037018, "t": 480.41641, "r": 99.635902, "b": 489.32297, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "\u2022", "bbox": {"l": 61.569016, "t": 502.15341, "r": 71.619438, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Across all benchmark datasets", "bbox": {"l": 74.132042, "t": 502.15341, "r": 196.10396, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "TableFormer", "bbox": {"l": 200.31001, "t": 502.03384, "r": 256.14041, "b": 510.99023, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "signif-", "bbox": {"l": 260.35001, "t": 502.15341, "r": 286.36237, "b": 
511.05997, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "icantly outperforms existing state-of-the-art metrics,", "bbox": {"l": 70.037003, "t": 514.1084000000001, "r": 286.3649, "b": 523.01495, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "while being much more efficient in training and infer-", "bbox": {"l": 70.037003, "t": 526.06439, "r": 286.36487, "b": 534.97095, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "ence to existing works.", "bbox": {"l": 70.037003, "t": 538.0193899999999, "r": 161.65305, "b": 546.9259500000001, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "\u2022", "bbox": {"l": 61.569, "t": 559.75639, "r": 71.115913, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "We present", "bbox": {"l": 73.502647, "t": 559.75639, "r": 116.71199, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "SynthTabNet", "bbox": {"l": 121.583, "t": 559.63684, "r": 177.68239, "b": 568.59322, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "a synthetically generated", "bbox": {"l": 182.55301, "t": 559.75639, "r": 286.36328, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "dataset, with various appearance styles and complex-", "bbox": {"l": 70.03701, "t": 571.7114, "r": 286.36493, "b": 580.6179500000001, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "ity.", "bbox": {"l": 70.03701, "t": 583.6664000000001, "r": 82.400597, "b": 592.57295, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "\u2022", "bbox": {"l": 61.569008000000004, "t": 605.4034, "r": 72.332527, "b": 614.30995, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "An augmented dataset based on PubTabNet [37],", "bbox": {"l": 75.023399, "t": 605.4034, "r": 286.36508, "b": 614.30995, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "FinTabNet [36], and TableBank [17] with generated", "bbox": {"l": 70.03701, "t": 617.3584, "r": 286.36487, "b": 626.26495, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "ground-truth for reproducibility.", "bbox": {"l": 70.03701, "t": 
629.31439, "r": 198.05641, "b": 638.22095, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "The paper is structured as follows. In Sec. 2, we give", "bbox": {"l": 62.067009000000006, "t": 650.59839, "r": 286.36496, "b": 659.50494, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "a brief overview of the current state-of-the-art. In Sec. 3,", "bbox": {"l": 50.112007, "t": 662.55339, "r": 286.36511, "b": 671.45995, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "we describe the datasets on which we train. In Sec. 4, we", "bbox": {"l": 50.112007, "t": 674.50839, "r": 286.36511, "b": 683.41496, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "introduce the TableFormer model-architecture and describe", "bbox": {"l": 50.112007, "t": 686.46339, "r": 286.36511, "b": 695.369957, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "$^{1}$https://github.com/IBM/SynthTabNet", "bbox": {"l": 60.97100100000001, "t": 705.596275, "r": 183.73055, "b": 712.721542, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "its results & performance in Sec. 5. 
As a conclusion, we de-", "bbox": {"l": 308.862, "t": 75.20836999999995, "r": 545.11511, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "scribe how this new model-architecture can be re-purposed", "bbox": {"l": 308.862, "t": 87.16339000000005, "r": 545.11505, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "for other tasks in the computer-vision community.", "bbox": {"l": 308.862, "t": 99.11841000000004, "r": 508.08417000000003, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "2.", "bbox": {"l": 308.862, "t": 121.73193000000003, "r": 315.5831, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "Previous work and State of the Art", "bbox": {"l": 324.54456, "t": 121.73193000000003, "r": 498.28021, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "Identifying the structure of a table has been an outstand-", "bbox": {"l": 320.81699, "t": 142.22136999999998, "r": 545.11493, "b": 151.12793, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "ing problem in the document-parsing community, that mo-", "bbox": {"l": 308.862, "t": 154.17638999999997, "r": 545.11505, "b": 163.08294999999998, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "tivates many organised public challenges [6, 4, 14].", "bbox": {"l": 308.862, "t": 166.13140999999996, "r": 522.55975, "b": 175.03796, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "The", "bbox": {"l": 529.62323, "t": 166.13140999999996, "r": 545.11505, "b": 175.03796, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "difficulty of the problem can be attributed to a number of", "bbox": {"l": 308.862, "t": 178.08642999999995, "r": 545.11517, "b": 186.99298, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "factors. 
First, there is a large variety in the shapes and sizes", "bbox": {"l": 308.862, "t": 190.04143999999997, "r": 545.11511, "b": 198.94799999999998, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "of tables.", "bbox": {"l": 308.862, "t": 201.99645999999996, "r": 346.97891, "b": 210.90301999999997, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Such large variety requires a flexible method.", "bbox": {"l": 354.86929, "t": 201.99645999999996, "r": 545.11511, "b": 210.90301999999997, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "This is especially true for complex column- and row head-", "bbox": {"l": 308.862, "t": 213.95245, "r": 545.11505, "b": 222.85901, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "ers, which can be extremely intricate and demanding.", "bbox": {"l": 308.862, "t": 225.90747, "r": 530.9184, "b": 234.81403, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "A", "bbox": {"l": 537.92212, "t": 225.90747, "r": 545.11511, "b": 234.81403, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "second factor of complexity is the lack of data with regard", "bbox": {"l": 308.862, "t": 237.86248999999998, "r": 545.11517, "b": 246.76904000000002, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "to table-structure. 
Until the publication of PubTabNet [37],", "bbox": {"l": 308.862, "t": 249.8175, "r": 545.11511, "b": 258.72406, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "there were no large datasets (i.e.", "bbox": {"l": 308.862, "t": 261.77252, "r": 439.8402699999999, "b": 270.67908, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": ">", "bbox": {"l": 444.43999999999994, "t": 261.61310000000003, "r": 452.1889, "b": 270.45989999999995, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "100", "bbox": {"l": 455.89001, "t": 261.61310000000003, "r": 470.83392000000003, "b": 270.45989999999995, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "K tables) that pro-", "bbox": {"l": 470.83401, "t": 261.77252, "r": 545.11517, "b": 270.67908, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "vided structure information. This happens primarily due to", "bbox": {"l": 308.862, "t": 273.72748, "r": 545.11511, "b": 282.63406, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "the fact that tables are notoriously time-consuming to an-", "bbox": {"l": 308.862, "t": 285.6835, "r": 545.11511, "b": 294.59006, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "notate by hand. 
However, this has definitely changed in re-", "bbox": {"l": 308.862, "t": 297.63849, "r": 545.11511, "b": 306.54504, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "cent years with the deliverance of PubTabNet [37], FinTab-", "bbox": {"l": 308.862, "t": 309.59348, "r": 545.11517, "b": 318.50003000000004, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "Net [36], TableBank [17] etc.", "bbox": {"l": 308.862, "t": 321.54846, "r": 425.92255, "b": 330.45502, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Before the rising popularity of deep neural networks,", "bbox": {"l": 320.81699, "t": 333.56946, "r": 545.11499, "b": 342.47601, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "the community relied heavily on heuristic and/or statistical", "bbox": {"l": 308.862, "t": 345.52444, "r": 545.11499, "b": 354.43100000000004, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "methods to do table structure identification [3, 7, 11, 5, 13,", "bbox": {"l": 308.862, "t": 357.47943, "r": 545.11517, "b": 366.38599, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "28]. Although such methods work well on constrained ta-", "bbox": {"l": 308.862, "t": 369.43542, "r": 545.11511, "b": 378.34198, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "bles [12], a more data-driven approach can be applied due", "bbox": {"l": 308.862, "t": 381.39041, "r": 545.11505, "b": 390.29697, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "to the advent of convolutional neural networks (CNNs) and", "bbox": {"l": 308.862, "t": 393.3453999999999, "r": 545.11505, "b": 402.25195, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "the availability of large datasets. 
To the best-of-our knowl-", "bbox": {"l": 308.862, "t": 405.30038, "r": 545.11517, "b": 414.20694, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "edge, there are currently two different types of network ar-", "bbox": {"l": 308.862, "t": 417.25537, "r": 545.11523, "b": 426.16193, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "chitecture that are being pursued for state-of-the-art table-", "bbox": {"l": 308.862, "t": 429.21136000000007, "r": 545.11511, "b": 438.11792, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "structure identification.", "bbox": {"l": 308.862, "t": 441.16635, "r": 401.28503, "b": 450.0729099999999, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "Image-to-Text networks", "bbox": {"l": 320.81699, "t": 453.06778, "r": 423.26236, "b": 462.02417, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": ": In this type of network, one", "bbox": {"l": 423.26697, "t": 453.18735, "r": 545.10956, "b": 462.0939, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "predicts a sequence of tokens starting from an encoded", "bbox": {"l": 308.86197, "t": 465.14233, "r": 545.11511, "b": 474.04889, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "image.", "bbox": {"l": 308.86197, "t": 477.09732, "r": 335.7012, "b": 486.00388, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "Such sequences of tokens can be HTML table", "bbox": {"l": 345.85309, "t": 477.09732, "r": 545.11505, "b": 486.00388, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "tags [37, 17] or LaTeX symbols[10]. The choice of sym-", "bbox": {"l": 308.86197, "t": 489.05231, "r": 545.11493, "b": 497.95886, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "bols is ultimately not very important, since one can be trans-", "bbox": {"l": 308.86197, "t": 501.00729, "r": 545.11499, "b": 509.91385, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "formed into the other. 
There are however subtle variations", "bbox": {"l": 308.86197, "t": 512.9632899999999, "r": 545.11505, "b": 521.8698400000001, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "in the Image-to-Text networks. The easiest network archi-", "bbox": {"l": 308.86197, "t": 524.91827, "r": 545.11505, "b": 533.82483, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "tectures are \u201cimage-encoder", "bbox": {"l": 308.86197, "t": 536.87328, "r": 420.94119, "b": 545.77983, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "\u2192", "bbox": {"l": 423.59497, "t": 536.1559599999999, "r": 433.5575600000001, "b": 545.56065, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "text-decoder\u201d (IETD), sim-", "bbox": {"l": 436.21198, "t": 536.87328, "r": 545.11316, "b": 545.77983, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "ilar to network architectures that try to provide captions to", "bbox": {"l": 308.86197, "t": 548.82828, "r": 545.11511, "b": 557.73483, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "images [32]. In these IETD networks, one expects as output", "bbox": {"l": 308.86197, "t": 560.78328, "r": 545.11493, "b": 569.68983, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "the LaTeX/HTML string of the entire table, i.e. the sym-", "bbox": {"l": 308.86197, "t": 572.73828, "r": 545.11499, "b": 581.6448399999999, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "bols necessary for creating the table with the content of the", "bbox": {"l": 308.86197, "t": 584.69427, "r": 545.11505, "b": 593.60083, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "table. 
Another approach is the \u201cimage-encoder", "bbox": {"l": 308.86197, "t": 596.6492800000001, "r": 497.07541, "b": 605.55583, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "\u2192", "bbox": {"l": 499.80496, "t": 595.93196, "r": 509.76755, "b": 605.33665, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "dual de-", "bbox": {"l": 512.50098, "t": 596.6492800000001, "r": 545.10852, "b": 605.55583, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "coder\u201d (IEDD) networks. In these type of networks, one has", "bbox": {"l": 308.86197, "t": 608.60428, "r": 545.11511, "b": 617.5108299999999, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "two consecutive decoders with different purposes. The first", "bbox": {"l": 308.86197, "t": 620.55928, "r": 545.11505, "b": 629.46584, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "decoder is the", "bbox": {"l": 308.86197, "t": 632.51428, "r": 364.78201, "b": 641.42084, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "tag-decoder", "bbox": {"l": 367.57397, "t": 632.60394, "r": 415.61362, "b": 641.1917, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": ", i.e. it only produces the HTM-", "bbox": {"l": 415.61298, "t": 632.51428, "r": 545.11688, "b": 641.42084, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "L/LaTeX tags which construct an empty table. 
The second", "bbox": {"l": 308.86197, "t": 644.46928, "r": 545.11511, "b": 653.37584, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "content-decoder", "bbox": {"l": 308.86197, "t": 656.51494, "r": 373.59894, "b": 665.1027, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "uses the encoding of the image in combi-", "bbox": {"l": 376.90698, "t": 656.4252799999999, "r": 545.11548, "b": 665.33184, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "nation with the output encoding of each cell-tag (from the", "bbox": {"l": 308.862, "t": 668.38028, "r": 545.11517, "b": 677.28684, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "tag-decoder", "bbox": {"l": 308.862, "t": 680.42494, "r": 356.90164, "b": 689.0127, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": ") to generate the textual content of each table", "bbox": {"l": 357.13101, "t": 680.33528, "r": 545.1153, "b": 689.24184, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "cell. The network architecture of IEDD is certainly more", "bbox": {"l": 308.862, "t": 692.290283, "r": 545.11511, "b": 701.196846, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "elaborate, but it has the advantage that one can pre-train the", "bbox": {"l": 308.862, "t": 704.245285, "r": 545.11517, "b": 713.151848, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "2", "bbox": {"l": 295.121, "t": 734.133282, "r": 300.10229, "b": 743.039845, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 49.57566452026367, "t": 74.31916046142578, "r": 286.4354248046875, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}, "confidence": 0.9656872153282166, "cells": [{"id": 0, "text": "considered as a solved problem, given enough ground-truth", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36505, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "data to train on.", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 112.64721999999999, "b": 96.06994999999995, 
"coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "text", "bbox": {"l": 49.447235107421875, "t": 98.2728271484375, "r": 286.4484558105469, "b": 228.4640655517578, "coord_origin": "TOPLEFT"}, "confidence": 0.9838619828224182, "cells": [{"id": 2, "text": "The second problem is called table-structure decompo-", "bbox": {"l": 62.067001, "t": 99.57141000000001, "r": 286.36496, "b": 108.47797000000003, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "sition.", "bbox": {"l": 50.112, "t": 111.52643, "r": 74.749512, "b": 120.43297999999993, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "The latter is a long standing problem in the com-", "bbox": {"l": 81.334793, "t": 111.52643, "r": 286.36514, "b": 120.43297999999993, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "munity of document understanding [6, 4, 14]. Contrary to", "bbox": {"l": 50.112, "t": 123.48145, "r": 286.36511, "b": 132.38800000000003, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "the table-location problem, there are no commonly used ap-", "bbox": {"l": 50.112, "t": 135.43646, "r": 286.36511, "b": 144.34302000000002, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "proaches that can easily be re-purposed to solve this prob-", "bbox": {"l": 50.112, "t": 147.39246000000003, "r": 286.36505, "b": 156.29900999999995, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "lem. Lately, a set of new model-architectures has been pro-", "bbox": {"l": 50.112, "t": 159.34747000000004, "r": 286.36511, "b": 168.25402999999994, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "posed by the community to address table-structure decom-", "bbox": {"l": 50.112, "t": 171.30249000000003, "r": 286.36508, "b": 180.20905000000005, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "position [37, 36, 18, 20]. All these models have some weak-", "bbox": {"l": 50.112, "t": 183.25751000000002, "r": 286.36511, "b": 192.16405999999995, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "nesses (see Sec. 2). 
The common denominator here is the", "bbox": {"l": 50.112, "t": 195.21252000000004, "r": 286.36508, "b": 204.11908000000005, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "reliance on textual features and/or the inability to provide", "bbox": {"l": 50.112, "t": 207.16754000000003, "r": 286.36514, "b": 216.07410000000004, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "the bounding box of each table-cell in the original image.", "bbox": {"l": 50.112, "t": 219.12354000000005, "r": 278.66397, "b": 228.03008999999997, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "text", "bbox": {"l": 49.21666717529297, "t": 230.39610290527344, "r": 286.7552185058594, "b": 371.94507, "coord_origin": "TOPLEFT"}, "confidence": 0.9855936169624329, "cells": [{"id": 14, "text": "In this paper, we want to address these weaknesses and", "bbox": {"l": 62.067001, "t": 231.53156, "r": 286.36493, "b": 240.43811000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "present a robust table-structure decomposition algorithm.", "bbox": {"l": 50.112, "t": 243.48657000000003, "r": 286.36511, "b": 252.39313000000004, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "The design criteria for our model are the following. First,", "bbox": {"l": 50.112, "t": 255.44159000000002, "r": 286.36511, "b": 264.34813999999994, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "we want our algorithm to be language agnostic. 
In this way,", "bbox": {"l": 50.112, "t": 267.39661, "r": 286.36502, "b": 276.30316000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "we can obtain the structure of any table, irregardless of the", "bbox": {"l": 50.112, "t": 279.35155999999995, "r": 286.36508, "b": 288.25815, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "language.", "bbox": {"l": 50.112, "t": 291.30759, "r": 88.567635, "b": 300.21414, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Second, we want our algorithm to leverage as", "bbox": {"l": 95.501602, "t": 291.30759, "r": 286.36505, "b": 300.21414, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "much data as possible from the original PDF document. For", "bbox": {"l": 50.112, "t": 303.26257, "r": 286.36508, "b": 312.16913, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "programmatic PDF documents, the text-cells can often be", "bbox": {"l": 50.112, "t": 315.21756, "r": 286.36511, "b": 324.12411, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "extracted much faster and with higher accuracy compared", "bbox": {"l": 50.112, "t": 327.17255, "r": 286.36505, "b": 336.0791, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "to OCR methods. 
Last but not least, we want to have a di-", "bbox": {"l": 50.112, "t": 339.12753, "r": 286.36511, "b": 348.03409, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "rect link between the table-cell and its bounding box in the", "bbox": {"l": 50.112, "t": 351.08353, "r": 286.36508, "b": 359.99008, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "image.", "bbox": {"l": 50.112, "t": 363.03851, "r": 76.951241, "b": 371.94507, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "text", "bbox": {"l": 49.4359016418457, "t": 374.26544189453125, "r": 286.4716491699219, "b": 432.173, "coord_origin": "TOPLEFT"}, "confidence": 0.9820109009742737, "cells": [{"id": 27, "text": "To meet the design criteria listed above, we developed a", "bbox": {"l": 62.067001, "t": 375.4465, "r": 286.36499, "b": 384.35306, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "new model called", "bbox": {"l": 50.112, "t": 387.40149, "r": 120.98594, "b": 396.30804, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "TableFormer", "bbox": {"l": 123.901, "t": 387.28192, "r": 179.7314, "b": 396.23830999999996, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "and a synthetically gener-", "bbox": {"l": 182.646, "t": 387.40149, "r": 286.36658, "b": 396.30804, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "ated table structure dataset called", "bbox": {"l": 50.112, "t": 399.35648, "r": 181.75778, "b": 408.26302999999996, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "SynthTabNet", "bbox": {"l": 184.104, "t": 399.23690999999997, "r": 240.2034, "b": 408.1933, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "$^{1}$. 
In partic-", "bbox": {"l": 240.20401, "t": 399.35648, "r": 286.36069, "b": 408.26302999999996, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "ular, our contributions in this work can be summarised as", "bbox": {"l": 50.112015, "t": 411.31146, "r": 286.36511, "b": 420.21802, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "follows:", "bbox": {"l": 50.112015, "t": 423.26645, "r": 82.520355, "b": 432.173, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "list_item", "bbox": {"l": 60.82297897338867, "t": 443.35675048828125, "r": 286.6727294921875, "b": 489.6653747558594, "coord_origin": "TOPLEFT"}, "confidence": 0.9822155237197876, "cells": [{"id": 36, "text": "\u2022", "bbox": {"l": 61.569016, "t": 444.55145, "r": 70.741714, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "We propose", "bbox": {"l": 73.034889, "t": 444.55145, "r": 117.10054, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "TableFormer", "bbox": {"l": 119.59001, "t": 444.43188, "r": 175.42041, "b": 453.38828, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": ", a transformer based model", "bbox": {"l": 175.42102, "t": 444.55145, "r": 286.36453, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "that predicts tables structure and bounding boxes for", "bbox": {"l": 70.037018, "t": 456.50644000000005, "r": 286.3649, "b": 465.41299, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "the table content simultaneously in an end-to-end ap-", "bbox": {"l": 70.037018, "t": 468.46143, "r": 286.3649, "b": 477.36798, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "proach.", "bbox": {"l": 70.037018, "t": 480.41641, "r": 99.635902, "b": 489.32297, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "list_item", "bbox": {"l": 60.85062026977539, "t": 500.852294921875, "r": 286.6067810058594, "b": 547.2149047851562, "coord_origin": "TOPLEFT"}, "confidence": 0.9822708964347839, "cells": [{"id": 43, "text": "\u2022", "bbox": {"l": 61.569016, "t": 502.15341, "r": 71.619438, "b": 
511.05997, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Across all benchmark datasets", "bbox": {"l": 74.132042, "t": 502.15341, "r": 196.10396, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "TableFormer", "bbox": {"l": 200.31001, "t": 502.03384, "r": 256.14041, "b": 510.99023, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "signif-", "bbox": {"l": 260.35001, "t": 502.15341, "r": 286.36237, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "icantly outperforms existing state-of-the-art metrics,", "bbox": {"l": 70.037003, "t": 514.1084000000001, "r": 286.3649, "b": 523.01495, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "while being much more efficient in training and infer-", "bbox": {"l": 70.037003, "t": 526.06439, "r": 286.36487, "b": 534.97095, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "ence to existing works.", "bbox": {"l": 70.037003, "t": 538.0193899999999, "r": 161.65305, "b": 546.9259500000001, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "list_item", "bbox": {"l": 60.90078353881836, "t": 558.7830810546875, "r": 286.4057922363281, "b": 593.1300048828125, "coord_origin": "TOPLEFT"}, "confidence": 0.980295717716217, "cells": [{"id": 50, "text": "\u2022", "bbox": {"l": 61.569, "t": 559.75639, "r": 71.115913, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "We present", "bbox": {"l": 73.502647, "t": 559.75639, "r": 116.71199, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "SynthTabNet", "bbox": {"l": 121.583, "t": 559.63684, "r": 177.68239, "b": 568.59322, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "a synthetically generated", "bbox": {"l": 182.55301, "t": 559.75639, "r": 286.36328, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "dataset, with various appearance styles and complex-", "bbox": {"l": 70.03701, "t": 571.7114, "r": 286.36493, "b": 580.6179500000001, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "ity.", "bbox": {"l": 70.03701, "t": 
583.6664000000001, "r": 82.400597, "b": 592.57295, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "list_item", "bbox": {"l": 60.752044677734375, "t": 604.2050170898438, "r": 286.601806640625, "b": 638.6813354492188, "coord_origin": "TOPLEFT"}, "confidence": 0.9806388020515442, "cells": [{"id": 56, "text": "\u2022", "bbox": {"l": 61.569008000000004, "t": 605.4034, "r": 72.332527, "b": 614.30995, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "An augmented dataset based on PubTabNet [37],", "bbox": {"l": 75.023399, "t": 605.4034, "r": 286.36508, "b": 614.30995, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "FinTabNet [36], and TableBank [17] with generated", "bbox": {"l": 70.03701, "t": 617.3584, "r": 286.36487, "b": 626.26495, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "ground-truth for reproducibility.", "bbox": {"l": 70.03701, "t": 629.31439, "r": 198.05641, "b": 638.22095, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "text", "bbox": {"l": 49.25575256347656, "t": 649.7069091796875, "r": 286.68890380859375, "b": 695.369957, "coord_origin": "TOPLEFT"}, "confidence": 0.9742557406425476, "cells": [{"id": 60, "text": "The paper is structured as follows. In Sec. 2, we give", "bbox": {"l": 62.067009000000006, "t": 650.59839, "r": 286.36496, "b": 659.50494, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "a brief overview of the current state-of-the-art. In Sec. 3,", "bbox": {"l": 50.112007, "t": 662.55339, "r": 286.36511, "b": 671.45995, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "we describe the datasets on which we train. In Sec. 
4, we", "bbox": {"l": 50.112007, "t": 674.50839, "r": 286.36511, "b": 683.41496, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "introduce the TableFormer model-architecture and describe", "bbox": {"l": 50.112007, "t": 686.46339, "r": 286.36511, "b": 695.369957, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "footnote", "bbox": {"l": 60.58177185058594, "t": 704.4016723632812, "r": 183.79791259765625, "b": 713.3980102539062, "coord_origin": "TOPLEFT"}, "confidence": 0.8953584432601929, "cells": [{"id": 64, "text": "$^{1}$https://github.com/IBM/SynthTabNet", "bbox": {"l": 60.97100100000001, "t": 705.596275, "r": 183.73055, "b": 712.721542, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "text", "bbox": {"l": 308.0365905761719, "t": 74.14588165283203, "r": 545.3168334960938, "b": 108.69783782958984, "coord_origin": "TOPLEFT"}, "confidence": 0.9774291515350342, "cells": [{"id": 65, "text": "its results & performance in Sec. 5. As a conclusion, we de-", "bbox": {"l": 308.862, "t": 75.20836999999995, "r": 545.11511, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "scribe how this new model-architecture can be re-purposed", "bbox": {"l": 308.862, "t": 87.16339000000005, "r": 545.11505, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "for other tasks in the computer-vision community.", "bbox": {"l": 308.862, "t": 99.11841000000004, "r": 508.08417000000003, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "section_header", "bbox": {"l": 308.0834045410156, "t": 120.92444610595703, "r": 498.345947265625, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}, "confidence": 0.9428719282150269, "cells": [{"id": 68, "text": "2.", "bbox": {"l": 308.862, "t": 121.73193000000003, "r": 315.5831, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "Previous work and State of the Art", "bbox": {"l": 324.54456, "t": 121.73193000000003, "r": 498.28021, "b": 132.47968000000003, 
"coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "text", "bbox": {"l": 307.6423034667969, "t": 140.84927368164062, "r": 545.3845825195312, "b": 330.45502, "coord_origin": "TOPLEFT"}, "confidence": 0.9871960878372192, "cells": [{"id": 70, "text": "Identifying the structure of a table has been an outstand-", "bbox": {"l": 320.81699, "t": 142.22136999999998, "r": 545.11493, "b": 151.12793, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "ing problem in the document-parsing community, that mo-", "bbox": {"l": 308.862, "t": 154.17638999999997, "r": 545.11505, "b": 163.08294999999998, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "tivates many organised public challenges [6, 4, 14].", "bbox": {"l": 308.862, "t": 166.13140999999996, "r": 522.55975, "b": 175.03796, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "The", "bbox": {"l": 529.62323, "t": 166.13140999999996, "r": 545.11505, "b": 175.03796, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "difficulty of the problem can be attributed to a number of", "bbox": {"l": 308.862, "t": 178.08642999999995, "r": 545.11517, "b": 186.99298, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "factors. 
First, there is a large variety in the shapes and sizes", "bbox": {"l": 308.862, "t": 190.04143999999997, "r": 545.11511, "b": 198.94799999999998, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "of tables.", "bbox": {"l": 308.862, "t": 201.99645999999996, "r": 346.97891, "b": 210.90301999999997, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Such large variety requires a flexible method.", "bbox": {"l": 354.86929, "t": 201.99645999999996, "r": 545.11511, "b": 210.90301999999997, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "This is especially true for complex column- and row head-", "bbox": {"l": 308.862, "t": 213.95245, "r": 545.11505, "b": 222.85901, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "ers, which can be extremely intricate and demanding.", "bbox": {"l": 308.862, "t": 225.90747, "r": 530.9184, "b": 234.81403, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "A", "bbox": {"l": 537.92212, "t": 225.90747, "r": 545.11511, "b": 234.81403, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "second factor of complexity is the lack of data with regard", "bbox": {"l": 308.862, "t": 237.86248999999998, "r": 545.11517, "b": 246.76904000000002, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "to table-structure. 
Until the publication of PubTabNet [37],", "bbox": {"l": 308.862, "t": 249.8175, "r": 545.11511, "b": 258.72406, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "there were no large datasets (i.e.", "bbox": {"l": 308.862, "t": 261.77252, "r": 439.8402699999999, "b": 270.67908, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": ">", "bbox": {"l": 444.43999999999994, "t": 261.61310000000003, "r": 452.1889, "b": 270.45989999999995, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "100", "bbox": {"l": 455.89001, "t": 261.61310000000003, "r": 470.83392000000003, "b": 270.45989999999995, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "K tables) that pro-", "bbox": {"l": 470.83401, "t": 261.77252, "r": 545.11517, "b": 270.67908, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "vided structure information. This happens primarily due to", "bbox": {"l": 308.862, "t": 273.72748, "r": 545.11511, "b": 282.63406, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "the fact that tables are notoriously time-consuming to an-", "bbox": {"l": 308.862, "t": 285.6835, "r": 545.11511, "b": 294.59006, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "notate by hand. 
However, this has definitely changed in re-", "bbox": {"l": 308.862, "t": 297.63849, "r": 545.11511, "b": 306.54504, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "cent years with the deliverance of PubTabNet [37], FinTab-", "bbox": {"l": 308.862, "t": 309.59348, "r": 545.11517, "b": 318.50003000000004, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "Net [36], TableBank [17] etc.", "bbox": {"l": 308.862, "t": 321.54846, "r": 425.92255, "b": 330.45502, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "text", "bbox": {"l": 307.82720947265625, "t": 332.40576171875, "r": 545.1534423828125, "b": 450.0729099999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9869063496589661, "cells": [{"id": 92, "text": "Before the rising popularity of deep neural networks,", "bbox": {"l": 320.81699, "t": 333.56946, "r": 545.11499, "b": 342.47601, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "the community relied heavily on heuristic and/or statistical", "bbox": {"l": 308.862, "t": 345.52444, "r": 545.11499, "b": 354.43100000000004, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "methods to do table structure identification [3, 7, 11, 5, 13,", "bbox": {"l": 308.862, "t": 357.47943, "r": 545.11517, "b": 366.38599, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "28]. Although such methods work well on constrained ta-", "bbox": {"l": 308.862, "t": 369.43542, "r": 545.11511, "b": 378.34198, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "bles [12], a more data-driven approach can be applied due", "bbox": {"l": 308.862, "t": 381.39041, "r": 545.11505, "b": 390.29697, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "to the advent of convolutional neural networks (CNNs) and", "bbox": {"l": 308.862, "t": 393.3453999999999, "r": 545.11505, "b": 402.25195, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "the availability of large datasets. 
To the best-of-our knowl-", "bbox": {"l": 308.862, "t": 405.30038, "r": 545.11517, "b": 414.20694, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "edge, there are currently two different types of network ar-", "bbox": {"l": 308.862, "t": 417.25537, "r": 545.11523, "b": 426.16193, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "chitecture that are being pursued for state-of-the-art table-", "bbox": {"l": 308.862, "t": 429.21136000000007, "r": 545.11511, "b": 438.11792, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "structure identification.", "bbox": {"l": 308.862, "t": 441.16635, "r": 401.28503, "b": 450.0729099999999, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "text", "bbox": {"l": 307.8529968261719, "t": 452.412353515625, "r": 545.3386840820312, "b": 713.151848, "coord_origin": "TOPLEFT"}, "confidence": 0.9843320846557617, "cells": [{"id": 102, "text": "Image-to-Text networks", "bbox": {"l": 320.81699, "t": 453.06778, "r": 423.26236, "b": 462.02417, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": ": In this type of network, one", "bbox": {"l": 423.26697, "t": 453.18735, "r": 545.10956, "b": 462.0939, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "predicts a sequence of tokens starting from an encoded", "bbox": {"l": 308.86197, "t": 465.14233, "r": 545.11511, "b": 474.04889, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "image.", "bbox": {"l": 308.86197, "t": 477.09732, "r": 335.7012, "b": 486.00388, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "Such sequences of tokens can be HTML table", "bbox": {"l": 345.85309, "t": 477.09732, "r": 545.11505, "b": 486.00388, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "tags [37, 17] or LaTeX symbols[10]. 
The choice of sym-", "bbox": {"l": 308.86197, "t": 489.05231, "r": 545.11493, "b": 497.95886, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "bols is ultimately not very important, since one can be trans-", "bbox": {"l": 308.86197, "t": 501.00729, "r": 545.11499, "b": 509.91385, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "formed into the other. There are however subtle variations", "bbox": {"l": 308.86197, "t": 512.9632899999999, "r": 545.11505, "b": 521.8698400000001, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "in the Image-to-Text networks. The easiest network archi-", "bbox": {"l": 308.86197, "t": 524.91827, "r": 545.11505, "b": 533.82483, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "tectures are \u201cimage-encoder", "bbox": {"l": 308.86197, "t": 536.87328, "r": 420.94119, "b": 545.77983, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "\u2192", "bbox": {"l": 423.59497, "t": 536.1559599999999, "r": 433.5575600000001, "b": 545.56065, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "text-decoder\u201d (IETD), sim-", "bbox": {"l": 436.21198, "t": 536.87328, "r": 545.11316, "b": 545.77983, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "ilar to network architectures that try to provide captions to", "bbox": {"l": 308.86197, "t": 548.82828, "r": 545.11511, "b": 557.73483, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "images [32]. In these IETD networks, one expects as output", "bbox": {"l": 308.86197, "t": 560.78328, "r": 545.11493, "b": 569.68983, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "the LaTeX/HTML string of the entire table, i.e. the sym-", "bbox": {"l": 308.86197, "t": 572.73828, "r": 545.11499, "b": 581.6448399999999, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "bols necessary for creating the table with the content of the", "bbox": {"l": 308.86197, "t": 584.69427, "r": 545.11505, "b": 593.60083, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "table. 
Another approach is the \u201cimage-encoder", "bbox": {"l": 308.86197, "t": 596.6492800000001, "r": 497.07541, "b": 605.55583, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "\u2192", "bbox": {"l": 499.80496, "t": 595.93196, "r": 509.76755, "b": 605.33665, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "dual de-", "bbox": {"l": 512.50098, "t": 596.6492800000001, "r": 545.10852, "b": 605.55583, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "coder\u201d (IEDD) networks. In these type of networks, one has", "bbox": {"l": 308.86197, "t": 608.60428, "r": 545.11511, "b": 617.5108299999999, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "two consecutive decoders with different purposes. The first", "bbox": {"l": 308.86197, "t": 620.55928, "r": 545.11505, "b": 629.46584, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "decoder is the", "bbox": {"l": 308.86197, "t": 632.51428, "r": 364.78201, "b": 641.42084, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "tag-decoder", "bbox": {"l": 367.57397, "t": 632.60394, "r": 415.61362, "b": 641.1917, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": ", i.e. it only produces the HTM-", "bbox": {"l": 415.61298, "t": 632.51428, "r": 545.11688, "b": 641.42084, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "L/LaTeX tags which construct an empty table. 
The second", "bbox": {"l": 308.86197, "t": 644.46928, "r": 545.11511, "b": 653.37584, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "content-decoder", "bbox": {"l": 308.86197, "t": 656.51494, "r": 373.59894, "b": 665.1027, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "uses the encoding of the image in combi-", "bbox": {"l": 376.90698, "t": 656.4252799999999, "r": 545.11548, "b": 665.33184, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "nation with the output encoding of each cell-tag (from the", "bbox": {"l": 308.862, "t": 668.38028, "r": 545.11517, "b": 677.28684, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "tag-decoder", "bbox": {"l": 308.862, "t": 680.42494, "r": 356.90164, "b": 689.0127, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": ") to generate the textual content of each table", "bbox": {"l": 357.13101, "t": 680.33528, "r": 545.1153, "b": 689.24184, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "cell. The network architecture of IEDD is certainly more", "bbox": {"l": 308.862, "t": 692.290283, "r": 545.11511, "b": 701.196846, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "elaborate, but it has the advantage that one can pre-train the", "bbox": {"l": 308.862, "t": 704.245285, "r": 545.11517, "b": 713.151848, "coord_origin": "TOPLEFT"}}]}, {"id": 15, "label": "page_footer", "bbox": {"l": 294.5776062011719, "t": 733.1296997070312, "r": 300.2464904785156, "b": 743.039845, "coord_origin": "TOPLEFT"}, "confidence": 0.8778082132339478, "cells": [{"id": 134, "text": "2", "bbox": {"l": 295.121, "t": 734.133282, "r": 300.10229, "b": 743.039845, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 1, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.57566452026367, "t": 74.31916046142578, "r": 286.4354248046875, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}, "confidence": 0.9656872153282166, 
"cells": [{"id": 0, "text": "considered as a solved problem, given enough ground-truth", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36505, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "data to train on.", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 112.64721999999999, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}]}, "text": "considered as a solved problem, given enough ground-truth data to train on."}, {"label": "text", "id": 1, "page_no": 1, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.447235107421875, "t": 98.2728271484375, "r": 286.4484558105469, "b": 228.4640655517578, "coord_origin": "TOPLEFT"}, "confidence": 0.9838619828224182, "cells": [{"id": 2, "text": "The second problem is called table-structure decompo-", "bbox": {"l": 62.067001, "t": 99.57141000000001, "r": 286.36496, "b": 108.47797000000003, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "sition.", "bbox": {"l": 50.112, "t": 111.52643, "r": 74.749512, "b": 120.43297999999993, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "The latter is a long standing problem in the com-", "bbox": {"l": 81.334793, "t": 111.52643, "r": 286.36514, "b": 120.43297999999993, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "munity of document understanding [6, 4, 14]. Contrary to", "bbox": {"l": 50.112, "t": 123.48145, "r": 286.36511, "b": 132.38800000000003, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "the table-location problem, there are no commonly used ap-", "bbox": {"l": 50.112, "t": 135.43646, "r": 286.36511, "b": 144.34302000000002, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "proaches that can easily be re-purposed to solve this prob-", "bbox": {"l": 50.112, "t": 147.39246000000003, "r": 286.36505, "b": 156.29900999999995, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "lem. 
Lately, a set of new model-architectures has been pro-", "bbox": {"l": 50.112, "t": 159.34747000000004, "r": 286.36511, "b": 168.25402999999994, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "posed by the community to address table-structure decom-", "bbox": {"l": 50.112, "t": 171.30249000000003, "r": 286.36508, "b": 180.20905000000005, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "position [37, 36, 18, 20]. All these models have some weak-", "bbox": {"l": 50.112, "t": 183.25751000000002, "r": 286.36511, "b": 192.16405999999995, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "nesses (see Sec. 2). The common denominator here is the", "bbox": {"l": 50.112, "t": 195.21252000000004, "r": 286.36508, "b": 204.11908000000005, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "reliance on textual features and/or the inability to provide", "bbox": {"l": 50.112, "t": 207.16754000000003, "r": 286.36514, "b": 216.07410000000004, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "the bounding box of each table-cell in the original image.", "bbox": {"l": 50.112, "t": 219.12354000000005, "r": 278.66397, "b": 228.03008999999997, "coord_origin": "TOPLEFT"}}]}, "text": "The second problem is called table-structure decomposition. The latter is a long standing problem in the community of document understanding [6, 4, 14]. Contrary to the table-location problem, there are no commonly used approaches that can easily be re-purposed to solve this problem. Lately, a set of new model-architectures has been proposed by the community to address table-structure decomposition [37, 36, 18, 20]. All these models have some weaknesses (see Sec. 2). 
The common denominator here is the reliance on textual features and/or the inability to provide the bounding box of each table-cell in the original image."}, {"label": "text", "id": 2, "page_no": 1, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.21666717529297, "t": 230.39610290527344, "r": 286.7552185058594, "b": 371.94507, "coord_origin": "TOPLEFT"}, "confidence": 0.9855936169624329, "cells": [{"id": 14, "text": "In this paper, we want to address these weaknesses and", "bbox": {"l": 62.067001, "t": 231.53156, "r": 286.36493, "b": 240.43811000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "present a robust table-structure decomposition algorithm.", "bbox": {"l": 50.112, "t": 243.48657000000003, "r": 286.36511, "b": 252.39313000000004, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "The design criteria for our model are the following. First,", "bbox": {"l": 50.112, "t": 255.44159000000002, "r": 286.36511, "b": 264.34813999999994, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "we want our algorithm to be language agnostic. In this way,", "bbox": {"l": 50.112, "t": 267.39661, "r": 286.36502, "b": 276.30316000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "we can obtain the structure of any table, irregardless of the", "bbox": {"l": 50.112, "t": 279.35155999999995, "r": 286.36508, "b": 288.25815, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "language.", "bbox": {"l": 50.112, "t": 291.30759, "r": 88.567635, "b": 300.21414, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Second, we want our algorithm to leverage as", "bbox": {"l": 95.501602, "t": 291.30759, "r": 286.36505, "b": 300.21414, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "much data as possible from the original PDF document. 
For", "bbox": {"l": 50.112, "t": 303.26257, "r": 286.36508, "b": 312.16913, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "programmatic PDF documents, the text-cells can often be", "bbox": {"l": 50.112, "t": 315.21756, "r": 286.36511, "b": 324.12411, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "extracted much faster and with higher accuracy compared", "bbox": {"l": 50.112, "t": 327.17255, "r": 286.36505, "b": 336.0791, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "to OCR methods. Last but not least, we want to have a di-", "bbox": {"l": 50.112, "t": 339.12753, "r": 286.36511, "b": 348.03409, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "rect link between the table-cell and its bounding box in the", "bbox": {"l": 50.112, "t": 351.08353, "r": 286.36508, "b": 359.99008, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "image.", "bbox": {"l": 50.112, "t": 363.03851, "r": 76.951241, "b": 371.94507, "coord_origin": "TOPLEFT"}}]}, "text": "In this paper, we want to address these weaknesses and present a robust table-structure decomposition algorithm. The design criteria for our model are the following. First, we want our algorithm to be language agnostic. In this way, we can obtain the structure of any table, irregardless of the language. Second, we want our algorithm to leverage as much data as possible from the original PDF document. For programmatic PDF documents, the text-cells can often be extracted much faster and with higher accuracy compared to OCR methods. 
Last but not least, we want to have a direct link between the table-cell and its bounding box in the image."}, {"label": "text", "id": 3, "page_no": 1, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.4359016418457, "t": 374.26544189453125, "r": 286.4716491699219, "b": 432.173, "coord_origin": "TOPLEFT"}, "confidence": 0.9820109009742737, "cells": [{"id": 27, "text": "To meet the design criteria listed above, we developed a", "bbox": {"l": 62.067001, "t": 375.4465, "r": 286.36499, "b": 384.35306, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "new model called", "bbox": {"l": 50.112, "t": 387.40149, "r": 120.98594, "b": 396.30804, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "TableFormer", "bbox": {"l": 123.901, "t": 387.28192, "r": 179.7314, "b": 396.23830999999996, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "and a synthetically gener-", "bbox": {"l": 182.646, "t": 387.40149, "r": 286.36658, "b": 396.30804, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "ated table structure dataset called", "bbox": {"l": 50.112, "t": 399.35648, "r": 181.75778, "b": 408.26302999999996, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "SynthTabNet", "bbox": {"l": 184.104, "t": 399.23690999999997, "r": 240.2034, "b": 408.1933, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "$^{1}$. In partic-", "bbox": {"l": 240.20401, "t": 399.35648, "r": 286.36069, "b": 408.26302999999996, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "ular, our contributions in this work can be summarised as", "bbox": {"l": 50.112015, "t": 411.31146, "r": 286.36511, "b": 420.21802, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "follows:", "bbox": {"l": 50.112015, "t": 423.26645, "r": 82.520355, "b": 432.173, "coord_origin": "TOPLEFT"}}]}, "text": "To meet the design criteria listed above, we developed a new model called TableFormer and a synthetically generated table structure dataset called SynthTabNet $^{1}$. 
In particular, our contributions in this work can be summarised as follows:"}, {"label": "list_item", "id": 4, "page_no": 1, "cluster": {"id": 4, "label": "list_item", "bbox": {"l": 60.82297897338867, "t": 443.35675048828125, "r": 286.6727294921875, "b": 489.6653747558594, "coord_origin": "TOPLEFT"}, "confidence": 0.9822155237197876, "cells": [{"id": 36, "text": "\u2022", "bbox": {"l": 61.569016, "t": 444.55145, "r": 70.741714, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "We propose", "bbox": {"l": 73.034889, "t": 444.55145, "r": 117.10054, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "TableFormer", "bbox": {"l": 119.59001, "t": 444.43188, "r": 175.42041, "b": 453.38828, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": ", a transformer based model", "bbox": {"l": 175.42102, "t": 444.55145, "r": 286.36453, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "that predicts tables structure and bounding boxes for", "bbox": {"l": 70.037018, "t": 456.50644000000005, "r": 286.3649, "b": 465.41299, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "the table content simultaneously in an end-to-end ap-", "bbox": {"l": 70.037018, "t": 468.46143, "r": 286.3649, "b": 477.36798, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "proach.", "bbox": {"l": 70.037018, "t": 480.41641, "r": 99.635902, "b": 489.32297, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 We propose TableFormer , a transformer based model that predicts tables structure and bounding boxes for the table content simultaneously in an end-to-end approach."}, {"label": "list_item", "id": 5, "page_no": 1, "cluster": {"id": 5, "label": "list_item", "bbox": {"l": 60.85062026977539, "t": 500.852294921875, "r": 286.6067810058594, "b": 547.2149047851562, "coord_origin": "TOPLEFT"}, "confidence": 0.9822708964347839, "cells": [{"id": 43, "text": "\u2022", "bbox": {"l": 61.569016, "t": 502.15341, "r": 71.619438, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": 
"Across all benchmark datasets", "bbox": {"l": 74.132042, "t": 502.15341, "r": 196.10396, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "TableFormer", "bbox": {"l": 200.31001, "t": 502.03384, "r": 256.14041, "b": 510.99023, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "signif-", "bbox": {"l": 260.35001, "t": 502.15341, "r": 286.36237, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "icantly outperforms existing state-of-the-art metrics,", "bbox": {"l": 70.037003, "t": 514.1084000000001, "r": 286.3649, "b": 523.01495, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "while being much more efficient in training and infer-", "bbox": {"l": 70.037003, "t": 526.06439, "r": 286.36487, "b": 534.97095, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "ence to existing works.", "bbox": {"l": 70.037003, "t": 538.0193899999999, "r": 161.65305, "b": 546.9259500000001, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 Across all benchmark datasets TableFormer significantly outperforms existing state-of-the-art metrics, while being much more efficient in training and inference to existing works."}, {"label": "list_item", "id": 6, "page_no": 1, "cluster": {"id": 6, "label": "list_item", "bbox": {"l": 60.90078353881836, "t": 558.7830810546875, "r": 286.4057922363281, "b": 593.1300048828125, "coord_origin": "TOPLEFT"}, "confidence": 0.980295717716217, "cells": [{"id": 50, "text": "\u2022", "bbox": {"l": 61.569, "t": 559.75639, "r": 71.115913, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "We present", "bbox": {"l": 73.502647, "t": 559.75639, "r": 116.71199, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "SynthTabNet", "bbox": {"l": 121.583, "t": 559.63684, "r": 177.68239, "b": 568.59322, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "a synthetically generated", "bbox": {"l": 182.55301, "t": 559.75639, "r": 286.36328, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "dataset, with various appearance 
styles and complex-", "bbox": {"l": 70.03701, "t": 571.7114, "r": 286.36493, "b": 580.6179500000001, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "ity.", "bbox": {"l": 70.03701, "t": 583.6664000000001, "r": 82.400597, "b": 592.57295, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 We present SynthTabNet a synthetically generated dataset, with various appearance styles and complexity."}, {"label": "list_item", "id": 7, "page_no": 1, "cluster": {"id": 7, "label": "list_item", "bbox": {"l": 60.752044677734375, "t": 604.2050170898438, "r": 286.601806640625, "b": 638.6813354492188, "coord_origin": "TOPLEFT"}, "confidence": 0.9806388020515442, "cells": [{"id": 56, "text": "\u2022", "bbox": {"l": 61.569008000000004, "t": 605.4034, "r": 72.332527, "b": 614.30995, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "An augmented dataset based on PubTabNet [37],", "bbox": {"l": 75.023399, "t": 605.4034, "r": 286.36508, "b": 614.30995, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "FinTabNet [36], and TableBank [17] with generated", "bbox": {"l": 70.03701, "t": 617.3584, "r": 286.36487, "b": 626.26495, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "ground-truth for reproducibility.", "bbox": {"l": 70.03701, "t": 629.31439, "r": 198.05641, "b": 638.22095, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 An augmented dataset based on PubTabNet [37], FinTabNet [36], and TableBank [17] with generated ground-truth for reproducibility."}, {"label": "text", "id": 8, "page_no": 1, "cluster": {"id": 8, "label": "text", "bbox": {"l": 49.25575256347656, "t": 649.7069091796875, "r": 286.68890380859375, "b": 695.369957, "coord_origin": "TOPLEFT"}, "confidence": 0.9742557406425476, "cells": [{"id": 60, "text": "The paper is structured as follows. In Sec. 2, we give", "bbox": {"l": 62.067009000000006, "t": 650.59839, "r": 286.36496, "b": 659.50494, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "a brief overview of the current state-of-the-art. In Sec. 
3,", "bbox": {"l": 50.112007, "t": 662.55339, "r": 286.36511, "b": 671.45995, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "we describe the datasets on which we train. In Sec. 4, we", "bbox": {"l": 50.112007, "t": 674.50839, "r": 286.36511, "b": 683.41496, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "introduce the TableFormer model-architecture and describe", "bbox": {"l": 50.112007, "t": 686.46339, "r": 286.36511, "b": 695.369957, "coord_origin": "TOPLEFT"}}]}, "text": "The paper is structured as follows. In Sec. 2, we give a brief overview of the current state-of-the-art. In Sec. 3, we describe the datasets on which we train. In Sec. 4, we introduce the TableFormer model-architecture and describe"}, {"label": "footnote", "id": 9, "page_no": 1, "cluster": {"id": 9, "label": "footnote", "bbox": {"l": 60.58177185058594, "t": 704.4016723632812, "r": 183.79791259765625, "b": 713.3980102539062, "coord_origin": "TOPLEFT"}, "confidence": 0.8953584432601929, "cells": [{"id": 64, "text": "$^{1}$https://github.com/IBM/SynthTabNet", "bbox": {"l": 60.97100100000001, "t": 705.596275, "r": 183.73055, "b": 712.721542, "coord_origin": "TOPLEFT"}}]}, "text": "$^{1}$https://github.com/IBM/SynthTabNet"}, {"label": "text", "id": 10, "page_no": 1, "cluster": {"id": 10, "label": "text", "bbox": {"l": 308.0365905761719, "t": 74.14588165283203, "r": 545.3168334960938, "b": 108.69783782958984, "coord_origin": "TOPLEFT"}, "confidence": 0.9774291515350342, "cells": [{"id": 65, "text": "its results & performance in Sec. 5. 
As a conclusion, we de-", "bbox": {"l": 308.862, "t": 75.20836999999995, "r": 545.11511, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "scribe how this new model-architecture can be re-purposed", "bbox": {"l": 308.862, "t": 87.16339000000005, "r": 545.11505, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "for other tasks in the computer-vision community.", "bbox": {"l": 308.862, "t": 99.11841000000004, "r": 508.08417000000003, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}]}, "text": "its results & performance in Sec. 5. As a conclusion, we describe how this new model-architecture can be re-purposed for other tasks in the computer-vision community."}, {"label": "section_header", "id": 11, "page_no": 1, "cluster": {"id": 11, "label": "section_header", "bbox": {"l": 308.0834045410156, "t": 120.92444610595703, "r": 498.345947265625, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}, "confidence": 0.9428719282150269, "cells": [{"id": 68, "text": "2.", "bbox": {"l": 308.862, "t": 121.73193000000003, "r": 315.5831, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "Previous work and State of the Art", "bbox": {"l": 324.54456, "t": 121.73193000000003, "r": 498.28021, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}}]}, "text": "2. 
Previous work and State of the Art"}, {"label": "text", "id": 12, "page_no": 1, "cluster": {"id": 12, "label": "text", "bbox": {"l": 307.6423034667969, "t": 140.84927368164062, "r": 545.3845825195312, "b": 330.45502, "coord_origin": "TOPLEFT"}, "confidence": 0.9871960878372192, "cells": [{"id": 70, "text": "Identifying the structure of a table has been an outstand-", "bbox": {"l": 320.81699, "t": 142.22136999999998, "r": 545.11493, "b": 151.12793, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "ing problem in the document-parsing community, that mo-", "bbox": {"l": 308.862, "t": 154.17638999999997, "r": 545.11505, "b": 163.08294999999998, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "tivates many organised public challenges [6, 4, 14].", "bbox": {"l": 308.862, "t": 166.13140999999996, "r": 522.55975, "b": 175.03796, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "The", "bbox": {"l": 529.62323, "t": 166.13140999999996, "r": 545.11505, "b": 175.03796, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "difficulty of the problem can be attributed to a number of", "bbox": {"l": 308.862, "t": 178.08642999999995, "r": 545.11517, "b": 186.99298, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "factors. 
First, there is a large variety in the shapes and sizes", "bbox": {"l": 308.862, "t": 190.04143999999997, "r": 545.11511, "b": 198.94799999999998, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "of tables.", "bbox": {"l": 308.862, "t": 201.99645999999996, "r": 346.97891, "b": 210.90301999999997, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Such large variety requires a flexible method.", "bbox": {"l": 354.86929, "t": 201.99645999999996, "r": 545.11511, "b": 210.90301999999997, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "This is especially true for complex column- and row head-", "bbox": {"l": 308.862, "t": 213.95245, "r": 545.11505, "b": 222.85901, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "ers, which can be extremely intricate and demanding.", "bbox": {"l": 308.862, "t": 225.90747, "r": 530.9184, "b": 234.81403, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "A", "bbox": {"l": 537.92212, "t": 225.90747, "r": 545.11511, "b": 234.81403, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "second factor of complexity is the lack of data with regard", "bbox": {"l": 308.862, "t": 237.86248999999998, "r": 545.11517, "b": 246.76904000000002, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "to table-structure. 
Until the publication of PubTabNet [37],", "bbox": {"l": 308.862, "t": 249.8175, "r": 545.11511, "b": 258.72406, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "there were no large datasets (i.e.", "bbox": {"l": 308.862, "t": 261.77252, "r": 439.8402699999999, "b": 270.67908, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": ">", "bbox": {"l": 444.43999999999994, "t": 261.61310000000003, "r": 452.1889, "b": 270.45989999999995, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "100", "bbox": {"l": 455.89001, "t": 261.61310000000003, "r": 470.83392000000003, "b": 270.45989999999995, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "K tables) that pro-", "bbox": {"l": 470.83401, "t": 261.77252, "r": 545.11517, "b": 270.67908, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "vided structure information. This happens primarily due to", "bbox": {"l": 308.862, "t": 273.72748, "r": 545.11511, "b": 282.63406, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "the fact that tables are notoriously time-consuming to an-", "bbox": {"l": 308.862, "t": 285.6835, "r": 545.11511, "b": 294.59006, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "notate by hand. However, this has definitely changed in re-", "bbox": {"l": 308.862, "t": 297.63849, "r": 545.11511, "b": 306.54504, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "cent years with the deliverance of PubTabNet [37], FinTab-", "bbox": {"l": 308.862, "t": 309.59348, "r": 545.11517, "b": 318.50003000000004, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "Net [36], TableBank [17] etc.", "bbox": {"l": 308.862, "t": 321.54846, "r": 425.92255, "b": 330.45502, "coord_origin": "TOPLEFT"}}]}, "text": "Identifying the structure of a table has been an outstanding problem in the document-parsing community, that motivates many organised public challenges [6, 4, 14]. The difficulty of the problem can be attributed to a number of factors. First, there is a large variety in the shapes and sizes of tables. 
Such large variety requires a flexible method. This is especially true for complex column- and row headers, which can be extremely intricate and demanding. A second factor of complexity is the lack of data with regard to table-structure. Until the publication of PubTabNet [37], there were no large datasets (i.e. > 100 K tables) that provided structure information. This happens primarily due to the fact that tables are notoriously time-consuming to annotate by hand. However, this has definitely changed in recent years with the deliverance of PubTabNet [37], FinTabNet [36], TableBank [17] etc."}, {"label": "text", "id": 13, "page_no": 1, "cluster": {"id": 13, "label": "text", "bbox": {"l": 307.82720947265625, "t": 332.40576171875, "r": 545.1534423828125, "b": 450.0729099999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9869063496589661, "cells": [{"id": 92, "text": "Before the rising popularity of deep neural networks,", "bbox": {"l": 320.81699, "t": 333.56946, "r": 545.11499, "b": 342.47601, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "the community relied heavily on heuristic and/or statistical", "bbox": {"l": 308.862, "t": 345.52444, "r": 545.11499, "b": 354.43100000000004, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "methods to do table structure identification [3, 7, 11, 5, 13,", "bbox": {"l": 308.862, "t": 357.47943, "r": 545.11517, "b": 366.38599, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "28]. 
Although such methods work well on constrained ta-", "bbox": {"l": 308.862, "t": 369.43542, "r": 545.11511, "b": 378.34198, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "bles [12], a more data-driven approach can be applied due", "bbox": {"l": 308.862, "t": 381.39041, "r": 545.11505, "b": 390.29697, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "to the advent of convolutional neural networks (CNNs) and", "bbox": {"l": 308.862, "t": 393.3453999999999, "r": 545.11505, "b": 402.25195, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "the availability of large datasets. To the best-of-our knowl-", "bbox": {"l": 308.862, "t": 405.30038, "r": 545.11517, "b": 414.20694, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "edge, there are currently two different types of network ar-", "bbox": {"l": 308.862, "t": 417.25537, "r": 545.11523, "b": 426.16193, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "chitecture that are being pursued for state-of-the-art table-", "bbox": {"l": 308.862, "t": 429.21136000000007, "r": 545.11511, "b": 438.11792, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "structure identification.", "bbox": {"l": 308.862, "t": 441.16635, "r": 401.28503, "b": 450.0729099999999, "coord_origin": "TOPLEFT"}}]}, "text": "Before the rising popularity of deep neural networks, the community relied heavily on heuristic and/or statistical methods to do table structure identification [3, 7, 11, 5, 13, 28]. Although such methods work well on constrained tables [12], a more data-driven approach can be applied due to the advent of convolutional neural networks (CNNs) and the availability of large datasets. 
To the best-of-our knowledge, there are currently two different types of network architecture that are being pursued for state-of-the-art tablestructure identification."}, {"label": "text", "id": 14, "page_no": 1, "cluster": {"id": 14, "label": "text", "bbox": {"l": 307.8529968261719, "t": 452.412353515625, "r": 545.3386840820312, "b": 713.151848, "coord_origin": "TOPLEFT"}, "confidence": 0.9843320846557617, "cells": [{"id": 102, "text": "Image-to-Text networks", "bbox": {"l": 320.81699, "t": 453.06778, "r": 423.26236, "b": 462.02417, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": ": In this type of network, one", "bbox": {"l": 423.26697, "t": 453.18735, "r": 545.10956, "b": 462.0939, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "predicts a sequence of tokens starting from an encoded", "bbox": {"l": 308.86197, "t": 465.14233, "r": 545.11511, "b": 474.04889, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "image.", "bbox": {"l": 308.86197, "t": 477.09732, "r": 335.7012, "b": 486.00388, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "Such sequences of tokens can be HTML table", "bbox": {"l": 345.85309, "t": 477.09732, "r": 545.11505, "b": 486.00388, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "tags [37, 17] or LaTeX symbols[10]. The choice of sym-", "bbox": {"l": 308.86197, "t": 489.05231, "r": 545.11493, "b": 497.95886, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "bols is ultimately not very important, since one can be trans-", "bbox": {"l": 308.86197, "t": 501.00729, "r": 545.11499, "b": 509.91385, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "formed into the other. There are however subtle variations", "bbox": {"l": 308.86197, "t": 512.9632899999999, "r": 545.11505, "b": 521.8698400000001, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "in the Image-to-Text networks. 
The easiest network archi-", "bbox": {"l": 308.86197, "t": 524.91827, "r": 545.11505, "b": 533.82483, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "tectures are \u201cimage-encoder", "bbox": {"l": 308.86197, "t": 536.87328, "r": 420.94119, "b": 545.77983, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "\u2192", "bbox": {"l": 423.59497, "t": 536.1559599999999, "r": 433.5575600000001, "b": 545.56065, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "text-decoder\u201d (IETD), sim-", "bbox": {"l": 436.21198, "t": 536.87328, "r": 545.11316, "b": 545.77983, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "ilar to network architectures that try to provide captions to", "bbox": {"l": 308.86197, "t": 548.82828, "r": 545.11511, "b": 557.73483, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "images [32]. In these IETD networks, one expects as output", "bbox": {"l": 308.86197, "t": 560.78328, "r": 545.11493, "b": 569.68983, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "the LaTeX/HTML string of the entire table, i.e. the sym-", "bbox": {"l": 308.86197, "t": 572.73828, "r": 545.11499, "b": 581.6448399999999, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "bols necessary for creating the table with the content of the", "bbox": {"l": 308.86197, "t": 584.69427, "r": 545.11505, "b": 593.60083, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "table. Another approach is the \u201cimage-encoder", "bbox": {"l": 308.86197, "t": 596.6492800000001, "r": 497.07541, "b": 605.55583, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "\u2192", "bbox": {"l": 499.80496, "t": 595.93196, "r": 509.76755, "b": 605.33665, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "dual de-", "bbox": {"l": 512.50098, "t": 596.6492800000001, "r": 545.10852, "b": 605.55583, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "coder\u201d (IEDD) networks. 
In these type of networks, one has", "bbox": {"l": 308.86197, "t": 608.60428, "r": 545.11511, "b": 617.5108299999999, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "two consecutive decoders with different purposes. The first", "bbox": {"l": 308.86197, "t": 620.55928, "r": 545.11505, "b": 629.46584, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "decoder is the", "bbox": {"l": 308.86197, "t": 632.51428, "r": 364.78201, "b": 641.42084, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "tag-decoder", "bbox": {"l": 367.57397, "t": 632.60394, "r": 415.61362, "b": 641.1917, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": ", i.e. it only produces the HTM-", "bbox": {"l": 415.61298, "t": 632.51428, "r": 545.11688, "b": 641.42084, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "L/LaTeX tags which construct an empty table. The second", "bbox": {"l": 308.86197, "t": 644.46928, "r": 545.11511, "b": 653.37584, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "content-decoder", "bbox": {"l": 308.86197, "t": 656.51494, "r": 373.59894, "b": 665.1027, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "uses the encoding of the image in combi-", "bbox": {"l": 376.90698, "t": 656.4252799999999, "r": 545.11548, "b": 665.33184, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "nation with the output encoding of each cell-tag (from the", "bbox": {"l": 308.862, "t": 668.38028, "r": 545.11517, "b": 677.28684, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "tag-decoder", "bbox": {"l": 308.862, "t": 680.42494, "r": 356.90164, "b": 689.0127, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": ") to generate the textual content of each table", "bbox": {"l": 357.13101, "t": 680.33528, "r": 545.1153, "b": 689.24184, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "cell. 
The network architecture of IEDD is certainly more", "bbox": {"l": 308.862, "t": 692.290283, "r": 545.11511, "b": 701.196846, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "elaborate, but it has the advantage that one can pre-train the", "bbox": {"l": 308.862, "t": 704.245285, "r": 545.11517, "b": 713.151848, "coord_origin": "TOPLEFT"}}]}, "text": "Image-to-Text networks : In this type of network, one predicts a sequence of tokens starting from an encoded image. Such sequences of tokens can be HTML table tags [37, 17] or LaTeX symbols[10]. The choice of symbols is ultimately not very important, since one can be transformed into the other. There are however subtle variations in the Image-to-Text networks. The easiest network architectures are \u201cimage-encoder \u2192 text-decoder\u201d (IETD), similar to network architectures that try to provide captions to images [32]. In these IETD networks, one expects as output the LaTeX/HTML string of the entire table, i.e. the symbols necessary for creating the table with the content of the table. Another approach is the \u201cimage-encoder \u2192 dual decoder\u201d (IEDD) networks. In these type of networks, one has two consecutive decoders with different purposes. The first decoder is the tag-decoder , i.e. it only produces the HTML/LaTeX tags which construct an empty table. The second content-decoder uses the encoding of the image in combination with the output encoding of each cell-tag (from the tag-decoder ) to generate the textual content of each table cell. 
The network architecture of IEDD is certainly more elaborate, but it has the advantage that one can pre-train the"}, {"label": "page_footer", "id": 15, "page_no": 1, "cluster": {"id": 15, "label": "page_footer", "bbox": {"l": 294.5776062011719, "t": 733.1296997070312, "r": 300.2464904785156, "b": 743.039845, "coord_origin": "TOPLEFT"}, "confidence": 0.8778082132339478, "cells": [{"id": 134, "text": "2", "bbox": {"l": 295.121, "t": 734.133282, "r": 300.10229, "b": 743.039845, "coord_origin": "TOPLEFT"}}]}, "text": "2"}], "body": [{"label": "text", "id": 0, "page_no": 1, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.57566452026367, "t": 74.31916046142578, "r": 286.4354248046875, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}, "confidence": 0.9656872153282166, "cells": [{"id": 0, "text": "considered as a solved problem, given enough ground-truth", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36505, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "data to train on.", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 112.64721999999999, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}]}, "text": "considered as a solved problem, given enough ground-truth data to train on."}, {"label": "text", "id": 1, "page_no": 1, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.447235107421875, "t": 98.2728271484375, "r": 286.4484558105469, "b": 228.4640655517578, "coord_origin": "TOPLEFT"}, "confidence": 0.9838619828224182, "cells": [{"id": 2, "text": "The second problem is called table-structure decompo-", "bbox": {"l": 62.067001, "t": 99.57141000000001, "r": 286.36496, "b": 108.47797000000003, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "sition.", "bbox": {"l": 50.112, "t": 111.52643, "r": 74.749512, "b": 120.43297999999993, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "The latter is a long standing problem in the com-", "bbox": {"l": 81.334793, "t": 111.52643, "r": 286.36514, "b": 120.43297999999993, "coord_origin": 
"TOPLEFT"}}, {"id": 5, "text": "munity of document understanding [6, 4, 14]. Contrary to", "bbox": {"l": 50.112, "t": 123.48145, "r": 286.36511, "b": 132.38800000000003, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "the table-location problem, there are no commonly used ap-", "bbox": {"l": 50.112, "t": 135.43646, "r": 286.36511, "b": 144.34302000000002, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "proaches that can easily be re-purposed to solve this prob-", "bbox": {"l": 50.112, "t": 147.39246000000003, "r": 286.36505, "b": 156.29900999999995, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "lem. Lately, a set of new model-architectures has been pro-", "bbox": {"l": 50.112, "t": 159.34747000000004, "r": 286.36511, "b": 168.25402999999994, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "posed by the community to address table-structure decom-", "bbox": {"l": 50.112, "t": 171.30249000000003, "r": 286.36508, "b": 180.20905000000005, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "position [37, 36, 18, 20]. All these models have some weak-", "bbox": {"l": 50.112, "t": 183.25751000000002, "r": 286.36511, "b": 192.16405999999995, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "nesses (see Sec. 2). The common denominator here is the", "bbox": {"l": 50.112, "t": 195.21252000000004, "r": 286.36508, "b": 204.11908000000005, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "reliance on textual features and/or the inability to provide", "bbox": {"l": 50.112, "t": 207.16754000000003, "r": 286.36514, "b": 216.07410000000004, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "the bounding box of each table-cell in the original image.", "bbox": {"l": 50.112, "t": 219.12354000000005, "r": 278.66397, "b": 228.03008999999997, "coord_origin": "TOPLEFT"}}]}, "text": "The second problem is called table-structure decomposition. The latter is a long standing problem in the community of document understanding [6, 4, 14]. 
Contrary to the table-location problem, there are no commonly used approaches that can easily be re-purposed to solve this problem. Lately, a set of new model-architectures has been proposed by the community to address table-structure decomposition [37, 36, 18, 20]. All these models have some weaknesses (see Sec. 2). The common denominator here is the reliance on textual features and/or the inability to provide the bounding box of each table-cell in the original image."}, {"label": "text", "id": 2, "page_no": 1, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.21666717529297, "t": 230.39610290527344, "r": 286.7552185058594, "b": 371.94507, "coord_origin": "TOPLEFT"}, "confidence": 0.9855936169624329, "cells": [{"id": 14, "text": "In this paper, we want to address these weaknesses and", "bbox": {"l": 62.067001, "t": 231.53156, "r": 286.36493, "b": 240.43811000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "present a robust table-structure decomposition algorithm.", "bbox": {"l": 50.112, "t": 243.48657000000003, "r": 286.36511, "b": 252.39313000000004, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "The design criteria for our model are the following. First,", "bbox": {"l": 50.112, "t": 255.44159000000002, "r": 286.36511, "b": 264.34813999999994, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "we want our algorithm to be language agnostic. 
In this way,", "bbox": {"l": 50.112, "t": 267.39661, "r": 286.36502, "b": 276.30316000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "we can obtain the structure of any table, irregardless of the", "bbox": {"l": 50.112, "t": 279.35155999999995, "r": 286.36508, "b": 288.25815, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "language.", "bbox": {"l": 50.112, "t": 291.30759, "r": 88.567635, "b": 300.21414, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Second, we want our algorithm to leverage as", "bbox": {"l": 95.501602, "t": 291.30759, "r": 286.36505, "b": 300.21414, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "much data as possible from the original PDF document. For", "bbox": {"l": 50.112, "t": 303.26257, "r": 286.36508, "b": 312.16913, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "programmatic PDF documents, the text-cells can often be", "bbox": {"l": 50.112, "t": 315.21756, "r": 286.36511, "b": 324.12411, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "extracted much faster and with higher accuracy compared", "bbox": {"l": 50.112, "t": 327.17255, "r": 286.36505, "b": 336.0791, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "to OCR methods. Last but not least, we want to have a di-", "bbox": {"l": 50.112, "t": 339.12753, "r": 286.36511, "b": 348.03409, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "rect link between the table-cell and its bounding box in the", "bbox": {"l": 50.112, "t": 351.08353, "r": 286.36508, "b": 359.99008, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "image.", "bbox": {"l": 50.112, "t": 363.03851, "r": 76.951241, "b": 371.94507, "coord_origin": "TOPLEFT"}}]}, "text": "In this paper, we want to address these weaknesses and present a robust table-structure decomposition algorithm. The design criteria for our model are the following. First, we want our algorithm to be language agnostic. In this way, we can obtain the structure of any table, irregardless of the language. 
Second, we want our algorithm to leverage as much data as possible from the original PDF document. For programmatic PDF documents, the text-cells can often be extracted much faster and with higher accuracy compared to OCR methods. Last but not least, we want to have a direct link between the table-cell and its bounding box in the image."}, {"label": "text", "id": 3, "page_no": 1, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.4359016418457, "t": 374.26544189453125, "r": 286.4716491699219, "b": 432.173, "coord_origin": "TOPLEFT"}, "confidence": 0.9820109009742737, "cells": [{"id": 27, "text": "To meet the design criteria listed above, we developed a", "bbox": {"l": 62.067001, "t": 375.4465, "r": 286.36499, "b": 384.35306, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "new model called", "bbox": {"l": 50.112, "t": 387.40149, "r": 120.98594, "b": 396.30804, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "TableFormer", "bbox": {"l": 123.901, "t": 387.28192, "r": 179.7314, "b": 396.23830999999996, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "and a synthetically gener-", "bbox": {"l": 182.646, "t": 387.40149, "r": 286.36658, "b": 396.30804, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "ated table structure dataset called", "bbox": {"l": 50.112, "t": 399.35648, "r": 181.75778, "b": 408.26302999999996, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "SynthTabNet", "bbox": {"l": 184.104, "t": 399.23690999999997, "r": 240.2034, "b": 408.1933, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "$^{1}$. 
In partic-", "bbox": {"l": 240.20401, "t": 399.35648, "r": 286.36069, "b": 408.26302999999996, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "ular, our contributions in this work can be summarised as", "bbox": {"l": 50.112015, "t": 411.31146, "r": 286.36511, "b": 420.21802, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "follows:", "bbox": {"l": 50.112015, "t": 423.26645, "r": 82.520355, "b": 432.173, "coord_origin": "TOPLEFT"}}]}, "text": "To meet the design criteria listed above, we developed a new model called TableFormer and a synthetically generated table structure dataset called SynthTabNet $^{1}$. In particular, our contributions in this work can be summarised as follows:"}, {"label": "list_item", "id": 4, "page_no": 1, "cluster": {"id": 4, "label": "list_item", "bbox": {"l": 60.82297897338867, "t": 443.35675048828125, "r": 286.6727294921875, "b": 489.6653747558594, "coord_origin": "TOPLEFT"}, "confidence": 0.9822155237197876, "cells": [{"id": 36, "text": "\u2022", "bbox": {"l": 61.569016, "t": 444.55145, "r": 70.741714, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "We propose", "bbox": {"l": 73.034889, "t": 444.55145, "r": 117.10054, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "TableFormer", "bbox": {"l": 119.59001, "t": 444.43188, "r": 175.42041, "b": 453.38828, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": ", a transformer based model", "bbox": {"l": 175.42102, "t": 444.55145, "r": 286.36453, "b": 453.45801, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "that predicts tables structure and bounding boxes for", "bbox": {"l": 70.037018, "t": 456.50644000000005, "r": 286.3649, "b": 465.41299, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "the table content simultaneously in an end-to-end ap-", "bbox": {"l": 70.037018, "t": 468.46143, "r": 286.3649, "b": 477.36798, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "proach.", "bbox": {"l": 70.037018, "t": 480.41641, "r": 99.635902, "b": 489.32297, "coord_origin": 
"TOPLEFT"}}]}, "text": "\u2022 We propose TableFormer , a transformer based model that predicts tables structure and bounding boxes for the table content simultaneously in an end-to-end approach."}, {"label": "list_item", "id": 5, "page_no": 1, "cluster": {"id": 5, "label": "list_item", "bbox": {"l": 60.85062026977539, "t": 500.852294921875, "r": 286.6067810058594, "b": 547.2149047851562, "coord_origin": "TOPLEFT"}, "confidence": 0.9822708964347839, "cells": [{"id": 43, "text": "\u2022", "bbox": {"l": 61.569016, "t": 502.15341, "r": 71.619438, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Across all benchmark datasets", "bbox": {"l": 74.132042, "t": 502.15341, "r": 196.10396, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "TableFormer", "bbox": {"l": 200.31001, "t": 502.03384, "r": 256.14041, "b": 510.99023, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "signif-", "bbox": {"l": 260.35001, "t": 502.15341, "r": 286.36237, "b": 511.05997, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "icantly outperforms existing state-of-the-art metrics,", "bbox": {"l": 70.037003, "t": 514.1084000000001, "r": 286.3649, "b": 523.01495, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "while being much more efficient in training and infer-", "bbox": {"l": 70.037003, "t": 526.06439, "r": 286.36487, "b": 534.97095, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "ence to existing works.", "bbox": {"l": 70.037003, "t": 538.0193899999999, "r": 161.65305, "b": 546.9259500000001, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 Across all benchmark datasets TableFormer significantly outperforms existing state-of-the-art metrics, while being much more efficient in training and inference to existing works."}, {"label": "list_item", "id": 6, "page_no": 1, "cluster": {"id": 6, "label": "list_item", "bbox": {"l": 60.90078353881836, "t": 558.7830810546875, "r": 286.4057922363281, "b": 593.1300048828125, "coord_origin": "TOPLEFT"}, "confidence": 
0.980295717716217, "cells": [{"id": 50, "text": "\u2022", "bbox": {"l": 61.569, "t": 559.75639, "r": 71.115913, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "We present", "bbox": {"l": 73.502647, "t": 559.75639, "r": 116.71199, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "SynthTabNet", "bbox": {"l": 121.583, "t": 559.63684, "r": 177.68239, "b": 568.59322, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "a synthetically generated", "bbox": {"l": 182.55301, "t": 559.75639, "r": 286.36328, "b": 568.66295, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "dataset, with various appearance styles and complex-", "bbox": {"l": 70.03701, "t": 571.7114, "r": 286.36493, "b": 580.6179500000001, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "ity.", "bbox": {"l": 70.03701, "t": 583.6664000000001, "r": 82.400597, "b": 592.57295, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 We present SynthTabNet a synthetically generated dataset, with various appearance styles and complexity."}, {"label": "list_item", "id": 7, "page_no": 1, "cluster": {"id": 7, "label": "list_item", "bbox": {"l": 60.752044677734375, "t": 604.2050170898438, "r": 286.601806640625, "b": 638.6813354492188, "coord_origin": "TOPLEFT"}, "confidence": 0.9806388020515442, "cells": [{"id": 56, "text": "\u2022", "bbox": {"l": 61.569008000000004, "t": 605.4034, "r": 72.332527, "b": 614.30995, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "An augmented dataset based on PubTabNet [37],", "bbox": {"l": 75.023399, "t": 605.4034, "r": 286.36508, "b": 614.30995, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "FinTabNet [36], and TableBank [17] with generated", "bbox": {"l": 70.03701, "t": 617.3584, "r": 286.36487, "b": 626.26495, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "ground-truth for reproducibility.", "bbox": {"l": 70.03701, "t": 629.31439, "r": 198.05641, "b": 638.22095, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 An augmented dataset based on PubTabNet [37], FinTabNet 
[36], and TableBank [17] with generated ground-truth for reproducibility."}, {"label": "text", "id": 8, "page_no": 1, "cluster": {"id": 8, "label": "text", "bbox": {"l": 49.25575256347656, "t": 649.7069091796875, "r": 286.68890380859375, "b": 695.369957, "coord_origin": "TOPLEFT"}, "confidence": 0.9742557406425476, "cells": [{"id": 60, "text": "The paper is structured as follows. In Sec. 2, we give", "bbox": {"l": 62.067009000000006, "t": 650.59839, "r": 286.36496, "b": 659.50494, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "a brief overview of the current state-of-the-art. In Sec. 3,", "bbox": {"l": 50.112007, "t": 662.55339, "r": 286.36511, "b": 671.45995, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "we describe the datasets on which we train. In Sec. 4, we", "bbox": {"l": 50.112007, "t": 674.50839, "r": 286.36511, "b": 683.41496, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "introduce the TableFormer model-architecture and describe", "bbox": {"l": 50.112007, "t": 686.46339, "r": 286.36511, "b": 695.369957, "coord_origin": "TOPLEFT"}}]}, "text": "The paper is structured as follows. In Sec. 2, we give a brief overview of the current state-of-the-art. In Sec. 3, we describe the datasets on which we train. In Sec. 
4, we introduce the TableFormer model-architecture and describe"}, {"label": "footnote", "id": 9, "page_no": 1, "cluster": {"id": 9, "label": "footnote", "bbox": {"l": 60.58177185058594, "t": 704.4016723632812, "r": 183.79791259765625, "b": 713.3980102539062, "coord_origin": "TOPLEFT"}, "confidence": 0.8953584432601929, "cells": [{"id": 64, "text": "$^{1}$https://github.com/IBM/SynthTabNet", "bbox": {"l": 60.97100100000001, "t": 705.596275, "r": 183.73055, "b": 712.721542, "coord_origin": "TOPLEFT"}}]}, "text": "$^{1}$https://github.com/IBM/SynthTabNet"}, {"label": "text", "id": 10, "page_no": 1, "cluster": {"id": 10, "label": "text", "bbox": {"l": 308.0365905761719, "t": 74.14588165283203, "r": 545.3168334960938, "b": 108.69783782958984, "coord_origin": "TOPLEFT"}, "confidence": 0.9774291515350342, "cells": [{"id": 65, "text": "its results & performance in Sec. 5. As a conclusion, we de-", "bbox": {"l": 308.862, "t": 75.20836999999995, "r": 545.11511, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "scribe how this new model-architecture can be re-purposed", "bbox": {"l": 308.862, "t": 87.16339000000005, "r": 545.11505, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "for other tasks in the computer-vision community.", "bbox": {"l": 308.862, "t": 99.11841000000004, "r": 508.08417000000003, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}]}, "text": "its results & performance in Sec. 5. 
As a conclusion, we describe how this new model-architecture can be re-purposed for other tasks in the computer-vision community."}, {"label": "section_header", "id": 11, "page_no": 1, "cluster": {"id": 11, "label": "section_header", "bbox": {"l": 308.0834045410156, "t": 120.92444610595703, "r": 498.345947265625, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}, "confidence": 0.9428719282150269, "cells": [{"id": 68, "text": "2.", "bbox": {"l": 308.862, "t": 121.73193000000003, "r": 315.5831, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "Previous work and State of the Art", "bbox": {"l": 324.54456, "t": 121.73193000000003, "r": 498.28021, "b": 132.47968000000003, "coord_origin": "TOPLEFT"}}]}, "text": "2. Previous work and State of the Art"}, {"label": "text", "id": 12, "page_no": 1, "cluster": {"id": 12, "label": "text", "bbox": {"l": 307.6423034667969, "t": 140.84927368164062, "r": 545.3845825195312, "b": 330.45502, "coord_origin": "TOPLEFT"}, "confidence": 0.9871960878372192, "cells": [{"id": 70, "text": "Identifying the structure of a table has been an outstand-", "bbox": {"l": 320.81699, "t": 142.22136999999998, "r": 545.11493, "b": 151.12793, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "ing problem in the document-parsing community, that mo-", "bbox": {"l": 308.862, "t": 154.17638999999997, "r": 545.11505, "b": 163.08294999999998, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "tivates many organised public challenges [6, 4, 14].", "bbox": {"l": 308.862, "t": 166.13140999999996, "r": 522.55975, "b": 175.03796, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "The", "bbox": {"l": 529.62323, "t": 166.13140999999996, "r": 545.11505, "b": 175.03796, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "difficulty of the problem can be attributed to a number of", "bbox": {"l": 308.862, "t": 178.08642999999995, "r": 545.11517, "b": 186.99298, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "factors. 
First, there is a large variety in the shapes and sizes", "bbox": {"l": 308.862, "t": 190.04143999999997, "r": 545.11511, "b": 198.94799999999998, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "of tables.", "bbox": {"l": 308.862, "t": 201.99645999999996, "r": 346.97891, "b": 210.90301999999997, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Such large variety requires a flexible method.", "bbox": {"l": 354.86929, "t": 201.99645999999996, "r": 545.11511, "b": 210.90301999999997, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "This is especially true for complex column- and row head-", "bbox": {"l": 308.862, "t": 213.95245, "r": 545.11505, "b": 222.85901, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "ers, which can be extremely intricate and demanding.", "bbox": {"l": 308.862, "t": 225.90747, "r": 530.9184, "b": 234.81403, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "A", "bbox": {"l": 537.92212, "t": 225.90747, "r": 545.11511, "b": 234.81403, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "second factor of complexity is the lack of data with regard", "bbox": {"l": 308.862, "t": 237.86248999999998, "r": 545.11517, "b": 246.76904000000002, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "to table-structure. 
Until the publication of PubTabNet [37],", "bbox": {"l": 308.862, "t": 249.8175, "r": 545.11511, "b": 258.72406, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "there were no large datasets (i.e.", "bbox": {"l": 308.862, "t": 261.77252, "r": 439.8402699999999, "b": 270.67908, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": ">", "bbox": {"l": 444.43999999999994, "t": 261.61310000000003, "r": 452.1889, "b": 270.45989999999995, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "100", "bbox": {"l": 455.89001, "t": 261.61310000000003, "r": 470.83392000000003, "b": 270.45989999999995, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "K tables) that pro-", "bbox": {"l": 470.83401, "t": 261.77252, "r": 545.11517, "b": 270.67908, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "vided structure information. This happens primarily due to", "bbox": {"l": 308.862, "t": 273.72748, "r": 545.11511, "b": 282.63406, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "the fact that tables are notoriously time-consuming to an-", "bbox": {"l": 308.862, "t": 285.6835, "r": 545.11511, "b": 294.59006, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "notate by hand. However, this has definitely changed in re-", "bbox": {"l": 308.862, "t": 297.63849, "r": 545.11511, "b": 306.54504, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "cent years with the deliverance of PubTabNet [37], FinTab-", "bbox": {"l": 308.862, "t": 309.59348, "r": 545.11517, "b": 318.50003000000004, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "Net [36], TableBank [17] etc.", "bbox": {"l": 308.862, "t": 321.54846, "r": 425.92255, "b": 330.45502, "coord_origin": "TOPLEFT"}}]}, "text": "Identifying the structure of a table has been an outstanding problem in the document-parsing community, that motivates many organised public challenges [6, 4, 14]. The difficulty of the problem can be attributed to a number of factors. First, there is a large variety in the shapes and sizes of tables. 
Such large variety requires a flexible method. This is especially true for complex column- and row headers, which can be extremely intricate and demanding. A second factor of complexity is the lack of data with regard to table-structure. Until the publication of PubTabNet [37], there were no large datasets (i.e. > 100 K tables) that provided structure information. This happens primarily due to the fact that tables are notoriously time-consuming to annotate by hand. However, this has definitely changed in recent years with the deliverance of PubTabNet [37], FinTabNet [36], TableBank [17] etc."}, {"label": "text", "id": 13, "page_no": 1, "cluster": {"id": 13, "label": "text", "bbox": {"l": 307.82720947265625, "t": 332.40576171875, "r": 545.1534423828125, "b": 450.0729099999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9869063496589661, "cells": [{"id": 92, "text": "Before the rising popularity of deep neural networks,", "bbox": {"l": 320.81699, "t": 333.56946, "r": 545.11499, "b": 342.47601, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "the community relied heavily on heuristic and/or statistical", "bbox": {"l": 308.862, "t": 345.52444, "r": 545.11499, "b": 354.43100000000004, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "methods to do table structure identification [3, 7, 11, 5, 13,", "bbox": {"l": 308.862, "t": 357.47943, "r": 545.11517, "b": 366.38599, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "28]. 
Although such methods work well on constrained ta-", "bbox": {"l": 308.862, "t": 369.43542, "r": 545.11511, "b": 378.34198, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "bles [12], a more data-driven approach can be applied due", "bbox": {"l": 308.862, "t": 381.39041, "r": 545.11505, "b": 390.29697, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "to the advent of convolutional neural networks (CNNs) and", "bbox": {"l": 308.862, "t": 393.3453999999999, "r": 545.11505, "b": 402.25195, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "the availability of large datasets. To the best-of-our knowl-", "bbox": {"l": 308.862, "t": 405.30038, "r": 545.11517, "b": 414.20694, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "edge, there are currently two different types of network ar-", "bbox": {"l": 308.862, "t": 417.25537, "r": 545.11523, "b": 426.16193, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "chitecture that are being pursued for state-of-the-art table-", "bbox": {"l": 308.862, "t": 429.21136000000007, "r": 545.11511, "b": 438.11792, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "structure identification.", "bbox": {"l": 308.862, "t": 441.16635, "r": 401.28503, "b": 450.0729099999999, "coord_origin": "TOPLEFT"}}]}, "text": "Before the rising popularity of deep neural networks, the community relied heavily on heuristic and/or statistical methods to do table structure identification [3, 7, 11, 5, 13, 28]. Although such methods work well on constrained tables [12], a more data-driven approach can be applied due to the advent of convolutional neural networks (CNNs) and the availability of large datasets. 
To the best-of-our knowledge, there are currently two different types of network architecture that are being pursued for state-of-the-art tablestructure identification."}, {"label": "text", "id": 14, "page_no": 1, "cluster": {"id": 14, "label": "text", "bbox": {"l": 307.8529968261719, "t": 452.412353515625, "r": 545.3386840820312, "b": 713.151848, "coord_origin": "TOPLEFT"}, "confidence": 0.9843320846557617, "cells": [{"id": 102, "text": "Image-to-Text networks", "bbox": {"l": 320.81699, "t": 453.06778, "r": 423.26236, "b": 462.02417, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": ": In this type of network, one", "bbox": {"l": 423.26697, "t": 453.18735, "r": 545.10956, "b": 462.0939, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "predicts a sequence of tokens starting from an encoded", "bbox": {"l": 308.86197, "t": 465.14233, "r": 545.11511, "b": 474.04889, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "image.", "bbox": {"l": 308.86197, "t": 477.09732, "r": 335.7012, "b": 486.00388, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "Such sequences of tokens can be HTML table", "bbox": {"l": 345.85309, "t": 477.09732, "r": 545.11505, "b": 486.00388, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "tags [37, 17] or LaTeX symbols[10]. The choice of sym-", "bbox": {"l": 308.86197, "t": 489.05231, "r": 545.11493, "b": 497.95886, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "bols is ultimately not very important, since one can be trans-", "bbox": {"l": 308.86197, "t": 501.00729, "r": 545.11499, "b": 509.91385, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "formed into the other. There are however subtle variations", "bbox": {"l": 308.86197, "t": 512.9632899999999, "r": 545.11505, "b": 521.8698400000001, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "in the Image-to-Text networks. 
The easiest network archi-", "bbox": {"l": 308.86197, "t": 524.91827, "r": 545.11505, "b": 533.82483, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "tectures are \u201cimage-encoder", "bbox": {"l": 308.86197, "t": 536.87328, "r": 420.94119, "b": 545.77983, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "\u2192", "bbox": {"l": 423.59497, "t": 536.1559599999999, "r": 433.5575600000001, "b": 545.56065, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "text-decoder\u201d (IETD), sim-", "bbox": {"l": 436.21198, "t": 536.87328, "r": 545.11316, "b": 545.77983, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "ilar to network architectures that try to provide captions to", "bbox": {"l": 308.86197, "t": 548.82828, "r": 545.11511, "b": 557.73483, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "images [32]. In these IETD networks, one expects as output", "bbox": {"l": 308.86197, "t": 560.78328, "r": 545.11493, "b": 569.68983, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "the LaTeX/HTML string of the entire table, i.e. the sym-", "bbox": {"l": 308.86197, "t": 572.73828, "r": 545.11499, "b": 581.6448399999999, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "bols necessary for creating the table with the content of the", "bbox": {"l": 308.86197, "t": 584.69427, "r": 545.11505, "b": 593.60083, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "table. Another approach is the \u201cimage-encoder", "bbox": {"l": 308.86197, "t": 596.6492800000001, "r": 497.07541, "b": 605.55583, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "\u2192", "bbox": {"l": 499.80496, "t": 595.93196, "r": 509.76755, "b": 605.33665, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "dual de-", "bbox": {"l": 512.50098, "t": 596.6492800000001, "r": 545.10852, "b": 605.55583, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "coder\u201d (IEDD) networks. 
In these type of networks, one has", "bbox": {"l": 308.86197, "t": 608.60428, "r": 545.11511, "b": 617.5108299999999, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "two consecutive decoders with different purposes. The first", "bbox": {"l": 308.86197, "t": 620.55928, "r": 545.11505, "b": 629.46584, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "decoder is the", "bbox": {"l": 308.86197, "t": 632.51428, "r": 364.78201, "b": 641.42084, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "tag-decoder", "bbox": {"l": 367.57397, "t": 632.60394, "r": 415.61362, "b": 641.1917, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": ", i.e. it only produces the HTM-", "bbox": {"l": 415.61298, "t": 632.51428, "r": 545.11688, "b": 641.42084, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "L/LaTeX tags which construct an empty table. The second", "bbox": {"l": 308.86197, "t": 644.46928, "r": 545.11511, "b": 653.37584, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "content-decoder", "bbox": {"l": 308.86197, "t": 656.51494, "r": 373.59894, "b": 665.1027, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "uses the encoding of the image in combi-", "bbox": {"l": 376.90698, "t": 656.4252799999999, "r": 545.11548, "b": 665.33184, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "nation with the output encoding of each cell-tag (from the", "bbox": {"l": 308.862, "t": 668.38028, "r": 545.11517, "b": 677.28684, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "tag-decoder", "bbox": {"l": 308.862, "t": 680.42494, "r": 356.90164, "b": 689.0127, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": ") to generate the textual content of each table", "bbox": {"l": 357.13101, "t": 680.33528, "r": 545.1153, "b": 689.24184, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "cell. 
The network architecture of IEDD is certainly more", "bbox": {"l": 308.862, "t": 692.290283, "r": 545.11511, "b": 701.196846, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "elaborate, but it has the advantage that one can pre-train the", "bbox": {"l": 308.862, "t": 704.245285, "r": 545.11517, "b": 713.151848, "coord_origin": "TOPLEFT"}}]}, "text": "Image-to-Text networks : In this type of network, one predicts a sequence of tokens starting from an encoded image. Such sequences of tokens can be HTML table tags [37, 17] or LaTeX symbols[10]. The choice of symbols is ultimately not very important, since one can be transformed into the other. There are however subtle variations in the Image-to-Text networks. The easiest network architectures are \u201cimage-encoder \u2192 text-decoder\u201d (IETD), similar to network architectures that try to provide captions to images [32]. In these IETD networks, one expects as output the LaTeX/HTML string of the entire table, i.e. the symbols necessary for creating the table with the content of the table. Another approach is the \u201cimage-encoder \u2192 dual decoder\u201d (IEDD) networks. In these type of networks, one has two consecutive decoders with different purposes. The first decoder is the tag-decoder , i.e. it only produces the HTML/LaTeX tags which construct an empty table. The second content-decoder uses the encoding of the image in combination with the output encoding of each cell-tag (from the tag-decoder ) to generate the textual content of each table cell. 
The network architecture of IEDD is certainly more elaborate, but it has the advantage that one can pre-train the"}], "headers": [{"label": "page_footer", "id": 15, "page_no": 1, "cluster": {"id": 15, "label": "page_footer", "bbox": {"l": 294.5776062011719, "t": 733.1296997070312, "r": 300.2464904785156, "b": 743.039845, "coord_origin": "TOPLEFT"}, "confidence": 0.8778082132339478, "cells": [{"id": 134, "text": "2", "bbox": {"l": 295.121, "t": 734.133282, "r": 300.10229, "b": 743.039845, "coord_origin": "TOPLEFT"}}]}, "text": "2"}]}}, {"page_no": 2, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "tag-decoder which is constrained to the table-tags.", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 250.15102, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "In", "bbox": {"l": 62.067001, "t": 87.21935999999994, "r": 70.365845, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "practice,", "bbox": {"l": 76.931198, "t": 87.21935999999994, "r": 110.95348000000001, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "both", "bbox": {"l": 118.54498, "t": 87.21935999999994, "r": 136.25848, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "network", "bbox": {"l": 142.82384, "t": 87.21935999999994, "r": 175.37166, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "architectures", "bbox": {"l": 181.94698, "t": 87.21935999999994, "r": 232.83594000000002, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "(IETD", "bbox": {"l": 239.41125, "t": 87.21935999999994, "r": 265.41364, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "and", "bbox": {"l": 271.979, "t": 87.21935999999994, "r": 286.36499, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "IEDD) require an implicit, custom trained object-character-", "bbox": {"l": 50.112, "t": 99.17437999999993, "r": 286.36505, "b": 
108.08092999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "recognition (OCR) to obtain the content of the table-cells.", "bbox": {"l": 50.112, "t": 111.13036999999997, "r": 286.36511, "b": 120.03692999999998, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "In the case of IETD, this OCR engine is implicit in the de-", "bbox": {"l": 50.112, "t": 123.08538999999996, "r": 286.36505, "b": 131.99194, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "coder similar to [24]. For the IEDD, the OCR is solely em-", "bbox": {"l": 50.112, "t": 135.04040999999995, "r": 286.36514, "b": 143.94696, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "bedded in the content-decoder. This reliance on a custom,", "bbox": {"l": 50.112, "t": 146.99541999999997, "r": 286.36511, "b": 155.90197999999998, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "implicit OCR decoder is of course problematic. OCR is a", "bbox": {"l": 50.112, "t": 158.95043999999996, "r": 286.36505, "b": 167.85699, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "well known and extremely tough problem, that often needs", "bbox": {"l": 50.112, "t": 170.90545999999995, "r": 286.36508, "b": 179.81201, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "custom training for each individual language. However, the", "bbox": {"l": 50.112, "t": 182.86145, "r": 286.36508, "b": 191.76801, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "limited availability for non-english content in the current", "bbox": {"l": 50.112, "t": 194.81646999999998, "r": 286.36511, "b": 203.72302000000002, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "datasets, makes it impractical to apply the IETD and IEDD", "bbox": {"l": 50.112, "t": 206.77148, "r": 286.36511, "b": 215.67804, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "methods on tables with other languages. 
Additionally, OCR", "bbox": {"l": 50.112, "t": 218.7265, "r": 286.36505, "b": 227.63306, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "can be completely omitted if the tables originate from pro-", "bbox": {"l": 50.112, "t": 230.68151999999998, "r": 286.36505, "b": 239.58807000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "grammatic PDF documents with known positions of each", "bbox": {"l": 50.112, "t": 242.63653999999997, "r": 286.36511, "b": 251.54309, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "cell. The latter was the inspiration for the work of this pa-", "bbox": {"l": 50.112, "t": 254.59253, "r": 286.36508, "b": 263.49908000000005, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "per.", "bbox": {"l": 50.112, "t": 266.54755, "r": 64.776947, "b": 275.45410000000004, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "Graph Neural networks", "bbox": {"l": 62.067001, "t": 278.43895999999995, "r": 171.56593, "b": 287.39536, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": ":", "bbox": {"l": 171.56799, "t": 278.55853, "r": 174.3376, "b": 287.46509, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Graph Neural networks", "bbox": {"l": 185.18687, "t": 278.55853, "r": 286.35709, "b": 287.46509, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "(GNN\u2019s) take a radically different approach to table-", "bbox": {"l": 50.111992, "t": 290.51453000000004, "r": 286.36511, "b": 299.42108, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "structure extraction.", "bbox": {"l": 50.111992, "t": 302.46950999999996, "r": 131.16771, "b": 311.37607, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "Note that one table cell can consti-", "bbox": {"l": 138.84888, "t": 302.46950999999996, "r": 286.36508, "b": 311.37607, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "tute out of multiple text-cells. 
To obtain the table-structure,", "bbox": {"l": 50.111992, "t": 314.4245, "r": 286.36505, "b": 323.33105, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "one creates an initial graph, where each of the text-cells", "bbox": {"l": 50.111992, "t": 326.37949000000003, "r": 286.36508, "b": 335.28604, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "becomes a node in the graph similar to [33, 34, 2]. Each", "bbox": {"l": 50.111992, "t": 338.33447, "r": 286.36505, "b": 347.2410300000001, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "node is then associated with en embedding vector coming", "bbox": {"l": 50.111992, "t": 350.28946, "r": 286.36505, "b": 359.19601, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "from the encoded image, its coordinates and the encoded", "bbox": {"l": 50.111992, "t": 362.24545000000006, "r": 286.36508, "b": 371.15201, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "text. Furthermore, nodes that represent adjacent text-cells", "bbox": {"l": 50.111992, "t": 374.20044, "r": 286.36508, "b": 383.10699, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "are linked. Graph Convolutional Networks (GCN\u2019s) based", "bbox": {"l": 50.111992, "t": 386.15542999999997, "r": 286.36508, "b": 395.06198, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "methods take the image as an input, but also the position of", "bbox": {"l": 50.111992, "t": 398.11041000000006, "r": 286.36508, "b": 407.01697, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "the text-cells and their content [18]. 
The purpose of a GCN", "bbox": {"l": 50.111992, "t": 410.0654, "r": 286.36508, "b": 418.97195, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "is to transform the input graph into a new graph, which re-", "bbox": {"l": 50.111992, "t": 422.02038999999996, "r": 286.36505, "b": 430.92694, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "places the old links with new ones.", "bbox": {"l": 50.111992, "t": 433.97638, "r": 198.2359, "b": 442.88293, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "The new links then", "bbox": {"l": 205.92703, "t": 433.97638, "r": 286.36505, "b": 442.88293, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "represent the table-structure. With this approach, one can", "bbox": {"l": 50.111992, "t": 445.93137, "r": 286.36508, "b": 454.83792000000005, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "avoid the need to build custom OCR decoders. However,", "bbox": {"l": 50.111992, "t": 457.88635, "r": 286.36505, "b": 466.79291, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "the quality of the reconstructed structure is not comparable", "bbox": {"l": 50.111992, "t": 469.84134, "r": 286.36505, "b": 478.74789, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "to the current state-of-the-art [18].", "bbox": {"l": 50.111992, "t": 481.79633, "r": 186.49998, "b": 490.70288, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Hybrid Deep Learning-Rule-Based approach", "bbox": {"l": 62.066994, "t": 493.68875, "r": 252.88068000000004, "b": 502.64514, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": ": A pop-", "bbox": {"l": 252.88199, "t": 493.80832, "r": 286.36627, "b": 502.71487, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "ular current model for table-structure identification is the", "bbox": {"l": 50.111984, "t": 505.76331, "r": 286.36505, "b": 514.66986, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "use of a hybrid Deep Learning-Rule-Based approach similar", "bbox": {"l": 50.111984, "t": 517.71829, "r": 286.36505, "b": 526.6248499999999, 
"coord_origin": "TOPLEFT"}}, {"id": 49, "text": "to [27, 29]. In this approach, one first detects the position of", "bbox": {"l": 50.111984, "t": 529.67328, "r": 286.36508, "b": 538.57985, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "the table-cells with object detection (e.g. YoloVx or Mask-", "bbox": {"l": 50.111984, "t": 541.62929, "r": 286.36508, "b": 550.53584, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "RCNN), then classifies the table into different types (from", "bbox": {"l": 50.111984, "t": 553.58429, "r": 286.36511, "b": 562.4908399999999, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "its images) and finally uses different rule-sets to obtain", "bbox": {"l": 50.111984, "t": 565.5392899999999, "r": 286.36511, "b": 574.44585, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "its table-structure. Currently, this approach achieves state-", "bbox": {"l": 50.111984, "t": 577.49429, "r": 286.36502, "b": 586.40085, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "of-the-art results, but is not an end-to-end deep-learning", "bbox": {"l": 50.111984, "t": 589.4493, "r": 286.36505, "b": 598.35585, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "method. 
As such, new rules need to be written if different", "bbox": {"l": 50.111984, "t": 601.4043, "r": 286.36502, "b": 610.31085, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "types of tables are encountered.", "bbox": {"l": 50.111984, "t": 613.36029, "r": 175.98943, "b": 622.26685, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "3.", "bbox": {"l": 50.111984, "t": 635.94484, "r": 57.82375699999999, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Datasets", "bbox": {"l": 68.106125, "t": 635.94484, "r": 105.22546, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "We rely on large-scale datasets such as PubTabNet [37],", "bbox": {"l": 62.06698600000001, "t": 656.42529, "r": 286.36493, "b": 665.33186, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "FinTabNet [36], and TableBank [17] datasets to train and", "bbox": {"l": 50.111984, "t": 668.38029, "r": 286.36508, "b": 677.2868599999999, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "evaluate our models. 
These datasets span over various ap-", "bbox": {"l": 50.111984, "t": 680.3353, "r": 286.36502, "b": 689.24186, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "pearance styles and content.", "bbox": {"l": 50.111984, "t": 692.290298, "r": 166.24602, "b": 701.196861, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "We also introduce our own", "bbox": {"l": 173.68808, "t": 692.290298, "r": 286.36508, "b": 701.196861, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "synthetically generated SynthTabNet dataset to fix an im-", "bbox": {"l": 50.111984, "t": 704.2453, "r": 286.36505, "b": 713.151863, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PubTabNet + FinTabNet", "bbox": {"l": 380.79849, "t": 79.81176999999991, "r": 486.84909, "b": 88.55975000000001, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Rows / Columns", "bbox": {"l": 396.76776, "t": 242.02697999999998, "r": 469.78748, "b": 250.77495999999996, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "0", "bbox": {"l": 320.97653, "t": 233.42296999999996, "r": 324.79254, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "20", "bbox": {"l": 410.483, "t": 233.42296999999996, "r": 418.11319, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "40", "bbox": {"l": 500.84949, "t": 233.42296999999996, "r": 508.47968000000003, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "10", "bbox": {"l": 365.29999, "t": 233.42296999999996, "r": 372.93018, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "30", "bbox": {"l": 455.66626, "t": 233.42296999999996, "r": 463.29645, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "50", "bbox": {"l": 542.03528, "t": 233.42296999999996, "r": 549.66547, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "0", "bbox": {"l": 316.04474, "t": 230.44617000000005, "r": 319.86075, "b": 236.27819999999997, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "2", "bbox": {"l": 312.62521, "t": 198.69073000000003, "r": 316.44122, "b": 
204.52277000000004, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "0", "bbox": {"l": 316.43942, "t": 198.69073000000003, "r": 320.2554, "b": 204.52277000000004, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "4", "bbox": {"l": 313.14951, "t": 168.09795999999994, "r": 316.96552, "b": 173.92998999999998, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "0", "bbox": {"l": 316.96371, "t": 168.09795999999994, "r": 320.77969, "b": 173.92998999999998, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "6", "bbox": {"l": 312.92972, "t": 136.58771000000002, "r": 316.74573, "b": 142.41974000000005, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "0", "bbox": {"l": 316.74393, "t": 136.58771000000002, "r": 320.55991, "b": 142.41974000000005, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "8", "bbox": {"l": 312.48227, "t": 105.60175000000004, "r": 316.29828, "b": 111.43377999999996, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "0", "bbox": {"l": 316.29648, "t": 105.60175000000004, "r": 320.11246, "b": 111.43377999999996, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "1", "bbox": {"l": 312.48227, "t": 212.25922000000003, "r": 316.29828, "b": 218.09124999999995, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "0", "bbox": {"l": 316.29648, "t": 212.25922000000003, "r": 320.11246, "b": 218.09124999999995, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "3", "bbox": {"l": 313.07639, "t": 183.72198000000003, "r": 316.8924, "b": 189.55402000000004, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "0", "bbox": {"l": 316.89059, "t": 183.72198000000003, "r": 320.70657, "b": 189.55402000000004, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "5", "bbox": {"l": 312.76321, "t": 152.47400000000005, "r": 316.57922, "b": 158.30602999999996, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "0", "bbox": {"l": 316.57742, "t": 152.47400000000005, "r": 320.3934, "b": 158.30602999999996, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "7", "bbox": {"l": 312.19775, "t": 120.57050000000004, "r": 
316.01376, "b": 126.40252999999996, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "0", "bbox": {"l": 316.01196, "t": 120.57050000000004, "r": 319.82794, "b": 126.40252999999996, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "9", "bbox": {"l": 312.8165, "t": 90.1087, "r": 316.63251, "b": 95.94073000000003, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "0", "bbox": {"l": 316.63071, "t": 90.1087, "r": 320.44669, "b": 95.94073000000003, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "0", "bbox": {"l": 532.17426, "t": 222.72729000000004, "r": 536.94427, "b": 230.01727000000005, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "10K", "bbox": {"l": 532.87952, "t": 108.26702999999986, "r": 547.61249, "b": 115.55700999999999, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "8K", "bbox": {"l": 532.7735, "t": 130.78101000000004, "r": 542.73877, "b": 138.07097999999996, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "6K", "bbox": {"l": 532.79901, "t": 153.92352000000005, "r": 542.76428, "b": 161.21349999999995, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "4K", "bbox": {"l": 532.5705, "t": 176.75800000000004, "r": 542.53577, "b": 184.04796999999996, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "2K", "bbox": {"l": 532.14551, "t": 199.6463, "r": 542.11078, "b": 206.93628, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Figure 2:", "bbox": {"l": 308.862, "t": 267.83636, "r": 346.06238, "b": 276.74292, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Distribution of the tables across different table", "bbox": {"l": 354.49072, "t": 267.83636, "r": 545.11511, "b": 276.74292, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "dimensions in PubTabNet + FinTabNet datasets", "bbox": {"l": 308.862, "t": 279.79132000000004, "r": 498.56989, "b": 288.6979099999999, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "balance in the previous datasets.", "bbox": {"l": 308.862, "t": 317.47336, "r": 437.27002, "b": 326.37991, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "The 
PubTabNet dataset contains 509k tables delivered as", "bbox": {"l": 320.81699, "t": 331.53137, "r": 545.11505, "b": 340.43793, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "annotated PNG images. The annotations consist of the table", "bbox": {"l": 308.862, "t": 343.48635999999993, "r": 545.11517, "b": 352.39291, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "structure represented in HTML format, the tokenized text", "bbox": {"l": 308.862, "t": 355.44235, "r": 545.11505, "b": 364.34890999999993, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "and its bounding boxes per table cell. Fig. 1 shows the ap-", "bbox": {"l": 308.862, "t": 367.39734, "r": 545.11505, "b": 376.30389, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "pearance style of PubTabNet. Depending on its complexity,", "bbox": {"l": 308.862, "t": 379.35233, "r": 545.11511, "b": 388.25888, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "a table is characterized as \u201csimple\u201d when it does not contain", "bbox": {"l": 308.862, "t": 391.30731, "r": 545.11511, "b": 400.21386999999993, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "row spans or column spans, otherwise it is \u201ccomplex\u201d. The", "bbox": {"l": 308.862, "t": 403.26230000000004, "r": 545.11505, "b": 412.16885, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "dataset is divided into Train and Val splits (roughly 98% and", "bbox": {"l": 308.862, "t": 415.21729, "r": 545.11511, "b": 424.12384, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "2%). 
The Train split consists of 54% simple and 46% com-", "bbox": {"l": 308.862, "t": 427.17328, "r": 545.11517, "b": 436.0798300000001, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "plex tables and the Val split of 51% and 49% respectively.", "bbox": {"l": 308.862, "t": 439.12827, "r": 545.11517, "b": 448.03482, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "The FinTabNet dataset contains 112k tables delivered as", "bbox": {"l": 308.862, "t": 451.08325, "r": 545.11511, "b": 459.98981000000003, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "single-page PDF documents with mixed table structures and", "bbox": {"l": 308.862, "t": 463.03824, "r": 545.11505, "b": 471.94479, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "text content. Similarly to the PubTabNet, the annotations", "bbox": {"l": 308.862, "t": 474.99323, "r": 545.11511, "b": 483.89978, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "of FinTabNet include the table structure in HTML, the to-", "bbox": {"l": 308.862, "t": 486.94922, "r": 545.11511, "b": 495.85577, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "kenized text and the bounding boxes on a table cell basis.", "bbox": {"l": 308.862, "t": 498.90421, "r": 545.11511, "b": 507.81076, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "The dataset is divided into Train, Test and Val splits (81%,", "bbox": {"l": 308.862, "t": 510.85919, "r": 545.11517, "b": 519.76575, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "9.5%, 9.5%), and each one is almost equally divided into", "bbox": {"l": 308.862, "t": 522.8141800000001, "r": 545.11517, "b": 531.72073, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "simple and complex tables (Train: 48% simple, 52% com-", "bbox": {"l": 308.862, "t": 534.76917, "r": 545.11505, "b": 543.67574, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "plex, Test: 48% simple, 52% complex, Test: 53% simple,", "bbox": {"l": 308.862, "t": 546.72418, "r": 545.11511, "b": 555.6307400000001, "coord_origin": "TOPLEFT"}}, {"id": 
121, "text": "47% complex). Finally the TableBank dataset consists of", "bbox": {"l": 308.862, "t": 558.6801800000001, "r": 545.11511, "b": 567.58673, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "145k tables provided as JPEG images. The latter has anno-", "bbox": {"l": 308.862, "t": 570.63518, "r": 545.11505, "b": 579.54173, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "tations for the table structure, but only few with bounding", "bbox": {"l": 308.862, "t": 582.59018, "r": 545.11499, "b": 591.49673, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "boxes of the table cells. The entire dataset consists of sim-", "bbox": {"l": 308.862, "t": 594.54518, "r": 545.11517, "b": 603.45174, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "ple tables and it is divided into 90% Train, 3% Test and 7%", "bbox": {"l": 308.862, "t": 606.50018, "r": 545.11511, "b": 615.40674, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Val splits.", "bbox": {"l": 308.862, "t": 618.45518, "r": 348.16446, "b": 627.36174, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "Due to the heterogeneity across the dataset formats, it", "bbox": {"l": 320.81699, "t": 632.51419, "r": 545.11487, "b": 641.42075, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "was necessary to combine all available data into one homog-", "bbox": {"l": 308.862, "t": 644.46919, "r": 545.11511, "b": 653.37575, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "enized dataset before we could train our models for practi-", "bbox": {"l": 308.862, "t": 656.42419, "r": 545.11511, "b": 665.33076, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "cal purposes. 
Given the size of PubTabNet, we adopted its", "bbox": {"l": 308.862, "t": 668.38019, "r": 545.11499, "b": 677.28676, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "annotation format and we extracted and converted all tables", "bbox": {"l": 308.862, "t": 680.33519, "r": 545.11505, "b": 689.24176, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "as PNG images with a resolution of 72 dpi. Additionally,", "bbox": {"l": 308.862, "t": 692.290192, "r": 545.11505, "b": 701.196762, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "we have filtered out tables with extreme sizes due to small", "bbox": {"l": 308.862, "t": 704.245193, "r": 545.11511, "b": 713.151764, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "3", "bbox": {"l": 295.121, "t": 734.133198, "r": 300.10229, "b": 743.039761, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 49.76145935058594, "t": 74.29176330566406, "r": 250.15102, "b": 84.22994995117188, "coord_origin": "TOPLEFT"}, "confidence": 0.8767215609550476, "cells": [{"id": 0, "text": "tag-decoder which is constrained to the table-tags.", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 250.15102, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "text", "bbox": {"l": 49.38212203979492, "t": 85.73994445800781, "r": 286.748046875, "b": 275.4715270996094, "coord_origin": "TOPLEFT"}, "confidence": 0.9822593927383423, "cells": [{"id": 1, "text": "In", "bbox": {"l": 62.067001, "t": 87.21935999999994, "r": 70.365845, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "practice,", "bbox": {"l": 76.931198, "t": 87.21935999999994, "r": 110.95348000000001, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "both", "bbox": {"l": 118.54498, "t": 87.21935999999994, "r": 136.25848, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "network", "bbox": {"l": 142.82384, "t": 87.21935999999994, "r": 175.37166, "b": 
96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "architectures", "bbox": {"l": 181.94698, "t": 87.21935999999994, "r": 232.83594000000002, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "(IETD", "bbox": {"l": 239.41125, "t": 87.21935999999994, "r": 265.41364, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "and", "bbox": {"l": 271.979, "t": 87.21935999999994, "r": 286.36499, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "IEDD) require an implicit, custom trained object-character-", "bbox": {"l": 50.112, "t": 99.17437999999993, "r": 286.36505, "b": 108.08092999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "recognition (OCR) to obtain the content of the table-cells.", "bbox": {"l": 50.112, "t": 111.13036999999997, "r": 286.36511, "b": 120.03692999999998, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "In the case of IETD, this OCR engine is implicit in the de-", "bbox": {"l": 50.112, "t": 123.08538999999996, "r": 286.36505, "b": 131.99194, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "coder similar to [24]. For the IEDD, the OCR is solely em-", "bbox": {"l": 50.112, "t": 135.04040999999995, "r": 286.36514, "b": 143.94696, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "bedded in the content-decoder. This reliance on a custom,", "bbox": {"l": 50.112, "t": 146.99541999999997, "r": 286.36511, "b": 155.90197999999998, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "implicit OCR decoder is of course problematic. OCR is a", "bbox": {"l": 50.112, "t": 158.95043999999996, "r": 286.36505, "b": 167.85699, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "well known and extremely tough problem, that often needs", "bbox": {"l": 50.112, "t": 170.90545999999995, "r": 286.36508, "b": 179.81201, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "custom training for each individual language. 
However, the", "bbox": {"l": 50.112, "t": 182.86145, "r": 286.36508, "b": 191.76801, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "limited availability for non-english content in the current", "bbox": {"l": 50.112, "t": 194.81646999999998, "r": 286.36511, "b": 203.72302000000002, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "datasets, makes it impractical to apply the IETD and IEDD", "bbox": {"l": 50.112, "t": 206.77148, "r": 286.36511, "b": 215.67804, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "methods on tables with other languages. Additionally, OCR", "bbox": {"l": 50.112, "t": 218.7265, "r": 286.36505, "b": 227.63306, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "can be completely omitted if the tables originate from pro-", "bbox": {"l": 50.112, "t": 230.68151999999998, "r": 286.36505, "b": 239.58807000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "grammatic PDF documents with known positions of each", "bbox": {"l": 50.112, "t": 242.63653999999997, "r": 286.36511, "b": 251.54309, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "cell. 
The latter was the inspiration for the work of this pa-", "bbox": {"l": 50.112, "t": 254.59253, "r": 286.36508, "b": 263.49908000000005, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "per.", "bbox": {"l": 50.112, "t": 266.54755, "r": 64.776947, "b": 275.45410000000004, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "text", "bbox": {"l": 49.258270263671875, "t": 277.4222106933594, "r": 286.6703186035156, "b": 490.70288, "coord_origin": "TOPLEFT"}, "confidence": 0.9878448843955994, "cells": [{"id": 23, "text": "Graph Neural networks", "bbox": {"l": 62.067001, "t": 278.43895999999995, "r": 171.56593, "b": 287.39536, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": ":", "bbox": {"l": 171.56799, "t": 278.55853, "r": 174.3376, "b": 287.46509, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Graph Neural networks", "bbox": {"l": 185.18687, "t": 278.55853, "r": 286.35709, "b": 287.46509, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "(GNN\u2019s) take a radically different approach to table-", "bbox": {"l": 50.111992, "t": 290.51453000000004, "r": 286.36511, "b": 299.42108, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "structure extraction.", "bbox": {"l": 50.111992, "t": 302.46950999999996, "r": 131.16771, "b": 311.37607, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "Note that one table cell can consti-", "bbox": {"l": 138.84888, "t": 302.46950999999996, "r": 286.36508, "b": 311.37607, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "tute out of multiple text-cells. To obtain the table-structure,", "bbox": {"l": 50.111992, "t": 314.4245, "r": 286.36505, "b": 323.33105, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "one creates an initial graph, where each of the text-cells", "bbox": {"l": 50.111992, "t": 326.37949000000003, "r": 286.36508, "b": 335.28604, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "becomes a node in the graph similar to [33, 34, 2]. 
Each", "bbox": {"l": 50.111992, "t": 338.33447, "r": 286.36505, "b": 347.2410300000001, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "node is then associated with en embedding vector coming", "bbox": {"l": 50.111992, "t": 350.28946, "r": 286.36505, "b": 359.19601, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "from the encoded image, its coordinates and the encoded", "bbox": {"l": 50.111992, "t": 362.24545000000006, "r": 286.36508, "b": 371.15201, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "text. Furthermore, nodes that represent adjacent text-cells", "bbox": {"l": 50.111992, "t": 374.20044, "r": 286.36508, "b": 383.10699, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "are linked. Graph Convolutional Networks (GCN\u2019s) based", "bbox": {"l": 50.111992, "t": 386.15542999999997, "r": 286.36508, "b": 395.06198, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "methods take the image as an input, but also the position of", "bbox": {"l": 50.111992, "t": 398.11041000000006, "r": 286.36508, "b": 407.01697, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "the text-cells and their content [18]. The purpose of a GCN", "bbox": {"l": 50.111992, "t": 410.0654, "r": 286.36508, "b": 418.97195, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "is to transform the input graph into a new graph, which re-", "bbox": {"l": 50.111992, "t": 422.02038999999996, "r": 286.36505, "b": 430.92694, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "places the old links with new ones.", "bbox": {"l": 50.111992, "t": 433.97638, "r": 198.2359, "b": 442.88293, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "The new links then", "bbox": {"l": 205.92703, "t": 433.97638, "r": 286.36505, "b": 442.88293, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "represent the table-structure. With this approach, one can", "bbox": {"l": 50.111992, "t": 445.93137, "r": 286.36508, "b": 454.83792000000005, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "avoid the need to build custom OCR decoders. 
However,", "bbox": {"l": 50.111992, "t": 457.88635, "r": 286.36505, "b": 466.79291, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "the quality of the reconstructed structure is not comparable", "bbox": {"l": 50.111992, "t": 469.84134, "r": 286.36505, "b": 478.74789, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "to the current state-of-the-art [18].", "bbox": {"l": 50.111992, "t": 481.79633, "r": 186.49998, "b": 490.70288, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "text", "bbox": {"l": 49.33047866821289, "t": 492.9067077636719, "r": 286.740234375, "b": 622.9658203125, "coord_origin": "TOPLEFT"}, "confidence": 0.9875094294548035, "cells": [{"id": 45, "text": "Hybrid Deep Learning-Rule-Based approach", "bbox": {"l": 62.066994, "t": 493.68875, "r": 252.88068000000004, "b": 502.64514, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": ": A pop-", "bbox": {"l": 252.88199, "t": 493.80832, "r": 286.36627, "b": 502.71487, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "ular current model for table-structure identification is the", "bbox": {"l": 50.111984, "t": 505.76331, "r": 286.36505, "b": 514.66986, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "use of a hybrid Deep Learning-Rule-Based approach similar", "bbox": {"l": 50.111984, "t": 517.71829, "r": 286.36505, "b": 526.6248499999999, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "to [27, 29]. In this approach, one first detects the position of", "bbox": {"l": 50.111984, "t": 529.67328, "r": 286.36508, "b": 538.57985, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "the table-cells with object detection (e.g. 
YoloVx or Mask-", "bbox": {"l": 50.111984, "t": 541.62929, "r": 286.36508, "b": 550.53584, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "RCNN), then classifies the table into different types (from", "bbox": {"l": 50.111984, "t": 553.58429, "r": 286.36511, "b": 562.4908399999999, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "its images) and finally uses different rule-sets to obtain", "bbox": {"l": 50.111984, "t": 565.5392899999999, "r": 286.36511, "b": 574.44585, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "its table-structure. Currently, this approach achieves state-", "bbox": {"l": 50.111984, "t": 577.49429, "r": 286.36502, "b": 586.40085, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "of-the-art results, but is not an end-to-end deep-learning", "bbox": {"l": 50.111984, "t": 589.4493, "r": 286.36505, "b": 598.35585, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "method. As such, new rules need to be written if different", "bbox": {"l": 50.111984, "t": 601.4043, "r": 286.36502, "b": 610.31085, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "types of tables are encountered.", "bbox": {"l": 50.111984, "t": 613.36029, "r": 175.98943, "b": 622.26685, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "section_header", "bbox": {"l": 49.298309326171875, "t": 635.1035766601562, "r": 105.35755920410156, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9423062205314636, "cells": [{"id": 57, "text": "3.", "bbox": {"l": 50.111984, "t": 635.94484, "r": 57.82375699999999, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Datasets", "bbox": {"l": 68.106125, "t": 635.94484, "r": 105.22546, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "text", "bbox": {"l": 49.53636169433594, "t": 655.4588012695312, "r": 286.584716796875, "b": 713.151863, "coord_origin": "TOPLEFT"}, "confidence": 0.9862047433853149, "cells": [{"id": 59, "text": "We rely on large-scale datasets such as PubTabNet [37],", "bbox": 
{"l": 62.06698600000001, "t": 656.42529, "r": 286.36493, "b": 665.33186, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "FinTabNet [36], and TableBank [17] datasets to train and", "bbox": {"l": 50.111984, "t": 668.38029, "r": 286.36508, "b": 677.2868599999999, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "evaluate our models. These datasets span over various ap-", "bbox": {"l": 50.111984, "t": 680.3353, "r": 286.36502, "b": 689.24186, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "pearance styles and content.", "bbox": {"l": 50.111984, "t": 692.290298, "r": 166.24602, "b": 701.196861, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "We also introduce our own", "bbox": {"l": 173.68808, "t": 692.290298, "r": 286.36508, "b": 701.196861, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "synthetically generated SynthTabNet dataset to fix an im-", "bbox": {"l": 50.111984, "t": 704.2453, "r": 286.36505, "b": 713.151863, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "picture", "bbox": {"l": 312.10369873046875, "t": 78.44086456298828, "r": 550.38916015625, "b": 250.77495999999996, "coord_origin": "TOPLEFT"}, "confidence": 0.9746916890144348, "cells": [{"id": 65, "text": "PubTabNet + FinTabNet", "bbox": {"l": 380.79849, "t": 79.81176999999991, "r": 486.84909, "b": 88.55975000000001, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Rows / Columns", "bbox": {"l": 396.76776, "t": 242.02697999999998, "r": 469.78748, "b": 250.77495999999996, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "0", "bbox": {"l": 320.97653, "t": 233.42296999999996, "r": 324.79254, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "20", "bbox": {"l": 410.483, "t": 233.42296999999996, "r": 418.11319, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "40", "bbox": {"l": 500.84949, "t": 233.42296999999996, "r": 508.47968000000003, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "10", "bbox": {"l": 365.29999, "t": 233.42296999999996, "r": 372.93018, "b": 
239.255, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "30", "bbox": {"l": 455.66626, "t": 233.42296999999996, "r": 463.29645, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "50", "bbox": {"l": 542.03528, "t": 233.42296999999996, "r": 549.66547, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "0", "bbox": {"l": 316.04474, "t": 230.44617000000005, "r": 319.86075, "b": 236.27819999999997, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "2", "bbox": {"l": 312.62521, "t": 198.69073000000003, "r": 316.44122, "b": 204.52277000000004, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "0", "bbox": {"l": 316.43942, "t": 198.69073000000003, "r": 320.2554, "b": 204.52277000000004, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "4", "bbox": {"l": 313.14951, "t": 168.09795999999994, "r": 316.96552, "b": 173.92998999999998, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "0", "bbox": {"l": 316.96371, "t": 168.09795999999994, "r": 320.77969, "b": 173.92998999999998, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "6", "bbox": {"l": 312.92972, "t": 136.58771000000002, "r": 316.74573, "b": 142.41974000000005, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "0", "bbox": {"l": 316.74393, "t": 136.58771000000002, "r": 320.55991, "b": 142.41974000000005, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "8", "bbox": {"l": 312.48227, "t": 105.60175000000004, "r": 316.29828, "b": 111.43377999999996, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "0", "bbox": {"l": 316.29648, "t": 105.60175000000004, "r": 320.11246, "b": 111.43377999999996, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "1", "bbox": {"l": 312.48227, "t": 212.25922000000003, "r": 316.29828, "b": 218.09124999999995, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "0", "bbox": {"l": 316.29648, "t": 212.25922000000003, "r": 320.11246, "b": 218.09124999999995, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "3", "bbox": {"l": 313.07639, "t": 183.72198000000003, "r": 316.8924, "b": 
189.55402000000004, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "0", "bbox": {"l": 316.89059, "t": 183.72198000000003, "r": 320.70657, "b": 189.55402000000004, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "5", "bbox": {"l": 312.76321, "t": 152.47400000000005, "r": 316.57922, "b": 158.30602999999996, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "0", "bbox": {"l": 316.57742, "t": 152.47400000000005, "r": 320.3934, "b": 158.30602999999996, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "7", "bbox": {"l": 312.19775, "t": 120.57050000000004, "r": 316.01376, "b": 126.40252999999996, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "0", "bbox": {"l": 316.01196, "t": 120.57050000000004, "r": 319.82794, "b": 126.40252999999996, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "9", "bbox": {"l": 312.8165, "t": 90.1087, "r": 316.63251, "b": 95.94073000000003, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "0", "bbox": {"l": 316.63071, "t": 90.1087, "r": 320.44669, "b": 95.94073000000003, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "0", "bbox": {"l": 532.17426, "t": 222.72729000000004, "r": 536.94427, "b": 230.01727000000005, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "10K", "bbox": {"l": 532.87952, "t": 108.26702999999986, "r": 547.61249, "b": 115.55700999999999, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "8K", "bbox": {"l": 532.7735, "t": 130.78101000000004, "r": 542.73877, "b": 138.07097999999996, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "6K", "bbox": {"l": 532.79901, "t": 153.92352000000005, "r": 542.76428, "b": 161.21349999999995, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "4K", "bbox": {"l": 532.5705, "t": 176.75800000000004, "r": 542.53577, "b": 184.04796999999996, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "2K", "bbox": {"l": 532.14551, "t": 199.6463, "r": 542.11078, "b": 206.93628, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "caption", "bbox": {"l": 308.2174377441406, "t": 267.0390930175781, "r": 545.11511, "b": 
288.6979099999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9667505025863647, "cells": [{"id": 98, "text": "Figure 2:", "bbox": {"l": 308.862, "t": 267.83636, "r": 346.06238, "b": 276.74292, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Distribution of the tables across different table", "bbox": {"l": 354.49072, "t": 267.83636, "r": 545.11511, "b": 276.74292, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "dimensions in PubTabNet + FinTabNet datasets", "bbox": {"l": 308.862, "t": 279.79132000000004, "r": 498.56989, "b": 288.6979099999999, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "text", "bbox": {"l": 308.1451416015625, "t": 316.442138671875, "r": 437.27002, "b": 326.37991, "coord_origin": "TOPLEFT"}, "confidence": 0.8879812359809875, "cells": [{"id": 101, "text": "balance in the previous datasets.", "bbox": {"l": 308.862, "t": 317.47336, "r": 437.27002, "b": 326.37991, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "text", "bbox": {"l": 307.77337646484375, "t": 330.03399658203125, "r": 545.3823852539062, "b": 627.379638671875, "coord_origin": "TOPLEFT"}, "confidence": 0.9870319366455078, "cells": [{"id": 102, "text": "The PubTabNet dataset contains 509k tables delivered as", "bbox": {"l": 320.81699, "t": 331.53137, "r": 545.11505, "b": 340.43793, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "annotated PNG images. The annotations consist of the table", "bbox": {"l": 308.862, "t": 343.48635999999993, "r": 545.11517, "b": 352.39291, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "structure represented in HTML format, the tokenized text", "bbox": {"l": 308.862, "t": 355.44235, "r": 545.11505, "b": 364.34890999999993, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "and its bounding boxes per table cell. Fig. 1 shows the ap-", "bbox": {"l": 308.862, "t": 367.39734, "r": 545.11505, "b": 376.30389, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "pearance style of PubTabNet. 
Depending on its complexity,", "bbox": {"l": 308.862, "t": 379.35233, "r": 545.11511, "b": 388.25888, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "a table is characterized as \u201csimple\u201d when it does not contain", "bbox": {"l": 308.862, "t": 391.30731, "r": 545.11511, "b": 400.21386999999993, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "row spans or column spans, otherwise it is \u201ccomplex\u201d. The", "bbox": {"l": 308.862, "t": 403.26230000000004, "r": 545.11505, "b": 412.16885, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "dataset is divided into Train and Val splits (roughly 98% and", "bbox": {"l": 308.862, "t": 415.21729, "r": 545.11511, "b": 424.12384, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "2%). The Train split consists of 54% simple and 46% com-", "bbox": {"l": 308.862, "t": 427.17328, "r": 545.11517, "b": 436.0798300000001, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "plex tables and the Val split of 51% and 49% respectively.", "bbox": {"l": 308.862, "t": 439.12827, "r": 545.11517, "b": 448.03482, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "The FinTabNet dataset contains 112k tables delivered as", "bbox": {"l": 308.862, "t": 451.08325, "r": 545.11511, "b": 459.98981000000003, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "single-page PDF documents with mixed table structures and", "bbox": {"l": 308.862, "t": 463.03824, "r": 545.11505, "b": 471.94479, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "text content. 
Similarly to the PubTabNet, the annotations", "bbox": {"l": 308.862, "t": 474.99323, "r": 545.11511, "b": 483.89978, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "of FinTabNet include the table structure in HTML, the to-", "bbox": {"l": 308.862, "t": 486.94922, "r": 545.11511, "b": 495.85577, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "kenized text and the bounding boxes on a table cell basis.", "bbox": {"l": 308.862, "t": 498.90421, "r": 545.11511, "b": 507.81076, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "The dataset is divided into Train, Test and Val splits (81%,", "bbox": {"l": 308.862, "t": 510.85919, "r": 545.11517, "b": 519.76575, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "9.5%, 9.5%), and each one is almost equally divided into", "bbox": {"l": 308.862, "t": 522.8141800000001, "r": 545.11517, "b": 531.72073, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "simple and complex tables (Train: 48% simple, 52% com-", "bbox": {"l": 308.862, "t": 534.76917, "r": 545.11505, "b": 543.67574, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "plex, Test: 48% simple, 52% complex, Test: 53% simple,", "bbox": {"l": 308.862, "t": 546.72418, "r": 545.11511, "b": 555.6307400000001, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "47% complex). Finally the TableBank dataset consists of", "bbox": {"l": 308.862, "t": 558.6801800000001, "r": 545.11511, "b": 567.58673, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "145k tables provided as JPEG images. The latter has anno-", "bbox": {"l": 308.862, "t": 570.63518, "r": 545.11505, "b": 579.54173, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "tations for the table structure, but only few with bounding", "bbox": {"l": 308.862, "t": 582.59018, "r": 545.11499, "b": 591.49673, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "boxes of the table cells. 
The entire dataset consists of sim-", "bbox": {"l": 308.862, "t": 594.54518, "r": 545.11517, "b": 603.45174, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "ple tables and it is divided into 90% Train, 3% Test and 7%", "bbox": {"l": 308.862, "t": 606.50018, "r": 545.11511, "b": 615.40674, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Val splits.", "bbox": {"l": 308.862, "t": 618.45518, "r": 348.16446, "b": 627.36174, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "text", "bbox": {"l": 308.01678466796875, "t": 631.5999755859375, "r": 545.2085571289062, "b": 713.151764, "coord_origin": "TOPLEFT"}, "confidence": 0.9840090274810791, "cells": [{"id": 127, "text": "Due to the heterogeneity across the dataset formats, it", "bbox": {"l": 320.81699, "t": 632.51419, "r": 545.11487, "b": 641.42075, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "was necessary to combine all available data into one homog-", "bbox": {"l": 308.862, "t": 644.46919, "r": 545.11511, "b": 653.37575, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "enized dataset before we could train our models for practi-", "bbox": {"l": 308.862, "t": 656.42419, "r": 545.11511, "b": 665.33076, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "cal purposes. Given the size of PubTabNet, we adopted its", "bbox": {"l": 308.862, "t": 668.38019, "r": 545.11499, "b": 677.28676, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "annotation format and we extracted and converted all tables", "bbox": {"l": 308.862, "t": 680.33519, "r": 545.11505, "b": 689.24176, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "as PNG images with a resolution of 72 dpi. 
Additionally,", "bbox": {"l": 308.862, "t": 692.290192, "r": 545.11505, "b": 701.196762, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "we have filtered out tables with extreme sizes due to small", "bbox": {"l": 308.862, "t": 704.245193, "r": 545.11511, "b": 713.151764, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "page_footer", "bbox": {"l": 294.38531494140625, "t": 733.3271484375, "r": 300.10229, "b": 743.039761, "coord_origin": "TOPLEFT"}, "confidence": 0.8715606927871704, "cells": [{"id": 134, "text": "3", "bbox": {"l": 295.121, "t": 734.133198, "r": 300.10229, "b": 743.039761, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 2, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.76145935058594, "t": 74.29176330566406, "r": 250.15102, "b": 84.22994995117188, "coord_origin": "TOPLEFT"}, "confidence": 0.8767215609550476, "cells": [{"id": 0, "text": "tag-decoder which is constrained to the table-tags.", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 250.15102, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}]}, "text": "tag-decoder which is constrained to the table-tags."}, {"label": "text", "id": 1, "page_no": 2, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.38212203979492, "t": 85.73994445800781, "r": 286.748046875, "b": 275.4715270996094, "coord_origin": "TOPLEFT"}, "confidence": 0.9822593927383423, "cells": [{"id": 1, "text": "In", "bbox": {"l": 62.067001, "t": 87.21935999999994, "r": 70.365845, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "practice,", "bbox": {"l": 76.931198, "t": 87.21935999999994, "r": 110.95348000000001, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "both", "bbox": {"l": 118.54498, "t": 87.21935999999994, "r": 136.25848, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "network", "bbox": {"l": 
142.82384, "t": 87.21935999999994, "r": 175.37166, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "architectures", "bbox": {"l": 181.94698, "t": 87.21935999999994, "r": 232.83594000000002, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "(IETD", "bbox": {"l": 239.41125, "t": 87.21935999999994, "r": 265.41364, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "and", "bbox": {"l": 271.979, "t": 87.21935999999994, "r": 286.36499, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "IEDD) require an implicit, custom trained object-character-", "bbox": {"l": 50.112, "t": 99.17437999999993, "r": 286.36505, "b": 108.08092999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "recognition (OCR) to obtain the content of the table-cells.", "bbox": {"l": 50.112, "t": 111.13036999999997, "r": 286.36511, "b": 120.03692999999998, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "In the case of IETD, this OCR engine is implicit in the de-", "bbox": {"l": 50.112, "t": 123.08538999999996, "r": 286.36505, "b": 131.99194, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "coder similar to [24]. For the IEDD, the OCR is solely em-", "bbox": {"l": 50.112, "t": 135.04040999999995, "r": 286.36514, "b": 143.94696, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "bedded in the content-decoder. This reliance on a custom,", "bbox": {"l": 50.112, "t": 146.99541999999997, "r": 286.36511, "b": 155.90197999999998, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "implicit OCR decoder is of course problematic. OCR is a", "bbox": {"l": 50.112, "t": 158.95043999999996, "r": 286.36505, "b": 167.85699, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "well known and extremely tough problem, that often needs", "bbox": {"l": 50.112, "t": 170.90545999999995, "r": 286.36508, "b": 179.81201, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "custom training for each individual language. 
However, the", "bbox": {"l": 50.112, "t": 182.86145, "r": 286.36508, "b": 191.76801, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "limited availability for non-english content in the current", "bbox": {"l": 50.112, "t": 194.81646999999998, "r": 286.36511, "b": 203.72302000000002, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "datasets, makes it impractical to apply the IETD and IEDD", "bbox": {"l": 50.112, "t": 206.77148, "r": 286.36511, "b": 215.67804, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "methods on tables with other languages. Additionally, OCR", "bbox": {"l": 50.112, "t": 218.7265, "r": 286.36505, "b": 227.63306, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "can be completely omitted if the tables originate from pro-", "bbox": {"l": 50.112, "t": 230.68151999999998, "r": 286.36505, "b": 239.58807000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "grammatic PDF documents with known positions of each", "bbox": {"l": 50.112, "t": 242.63653999999997, "r": 286.36511, "b": 251.54309, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "cell. The latter was the inspiration for the work of this pa-", "bbox": {"l": 50.112, "t": 254.59253, "r": 286.36508, "b": 263.49908000000005, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "per.", "bbox": {"l": 50.112, "t": 266.54755, "r": 64.776947, "b": 275.45410000000004, "coord_origin": "TOPLEFT"}}]}, "text": "In practice, both network architectures (IETD and IEDD) require an implicit, custom trained object-characterrecognition (OCR) to obtain the content of the table-cells. In the case of IETD, this OCR engine is implicit in the decoder similar to [24]. For the IEDD, the OCR is solely embedded in the content-decoder. This reliance on a custom, implicit OCR decoder is of course problematic. OCR is a well known and extremely tough problem, that often needs custom training for each individual language. 
However, the limited availability for non-english content in the current datasets, makes it impractical to apply the IETD and IEDD methods on tables with other languages. Additionally, OCR can be completely omitted if the tables originate from programmatic PDF documents with known positions of each cell. The latter was the inspiration for the work of this paper."}, {"label": "text", "id": 2, "page_no": 2, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.258270263671875, "t": 277.4222106933594, "r": 286.6703186035156, "b": 490.70288, "coord_origin": "TOPLEFT"}, "confidence": 0.9878448843955994, "cells": [{"id": 23, "text": "Graph Neural networks", "bbox": {"l": 62.067001, "t": 278.43895999999995, "r": 171.56593, "b": 287.39536, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": ":", "bbox": {"l": 171.56799, "t": 278.55853, "r": 174.3376, "b": 287.46509, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Graph Neural networks", "bbox": {"l": 185.18687, "t": 278.55853, "r": 286.35709, "b": 287.46509, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "(GNN\u2019s) take a radically different approach to table-", "bbox": {"l": 50.111992, "t": 290.51453000000004, "r": 286.36511, "b": 299.42108, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "structure extraction.", "bbox": {"l": 50.111992, "t": 302.46950999999996, "r": 131.16771, "b": 311.37607, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "Note that one table cell can consti-", "bbox": {"l": 138.84888, "t": 302.46950999999996, "r": 286.36508, "b": 311.37607, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "tute out of multiple text-cells. 
To obtain the table-structure,", "bbox": {"l": 50.111992, "t": 314.4245, "r": 286.36505, "b": 323.33105, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "one creates an initial graph, where each of the text-cells", "bbox": {"l": 50.111992, "t": 326.37949000000003, "r": 286.36508, "b": 335.28604, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "becomes a node in the graph similar to [33, 34, 2]. Each", "bbox": {"l": 50.111992, "t": 338.33447, "r": 286.36505, "b": 347.2410300000001, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "node is then associated with en embedding vector coming", "bbox": {"l": 50.111992, "t": 350.28946, "r": 286.36505, "b": 359.19601, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "from the encoded image, its coordinates and the encoded", "bbox": {"l": 50.111992, "t": 362.24545000000006, "r": 286.36508, "b": 371.15201, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "text. Furthermore, nodes that represent adjacent text-cells", "bbox": {"l": 50.111992, "t": 374.20044, "r": 286.36508, "b": 383.10699, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "are linked. Graph Convolutional Networks (GCN\u2019s) based", "bbox": {"l": 50.111992, "t": 386.15542999999997, "r": 286.36508, "b": 395.06198, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "methods take the image as an input, but also the position of", "bbox": {"l": 50.111992, "t": 398.11041000000006, "r": 286.36508, "b": 407.01697, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "the text-cells and their content [18]. 
The purpose of a GCN", "bbox": {"l": 50.111992, "t": 410.0654, "r": 286.36508, "b": 418.97195, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "is to transform the input graph into a new graph, which re-", "bbox": {"l": 50.111992, "t": 422.02038999999996, "r": 286.36505, "b": 430.92694, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "places the old links with new ones.", "bbox": {"l": 50.111992, "t": 433.97638, "r": 198.2359, "b": 442.88293, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "The new links then", "bbox": {"l": 205.92703, "t": 433.97638, "r": 286.36505, "b": 442.88293, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "represent the table-structure. With this approach, one can", "bbox": {"l": 50.111992, "t": 445.93137, "r": 286.36508, "b": 454.83792000000005, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "avoid the need to build custom OCR decoders. However,", "bbox": {"l": 50.111992, "t": 457.88635, "r": 286.36505, "b": 466.79291, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "the quality of the reconstructed structure is not comparable", "bbox": {"l": 50.111992, "t": 469.84134, "r": 286.36505, "b": 478.74789, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "to the current state-of-the-art [18].", "bbox": {"l": 50.111992, "t": 481.79633, "r": 186.49998, "b": 490.70288, "coord_origin": "TOPLEFT"}}]}, "text": "Graph Neural networks : Graph Neural networks (GNN\u2019s) take a radically different approach to tablestructure extraction. Note that one table cell can constitute out of multiple text-cells. To obtain the table-structure, one creates an initial graph, where each of the text-cells becomes a node in the graph similar to [33, 34, 2]. Each node is then associated with en embedding vector coming from the encoded image, its coordinates and the encoded text. Furthermore, nodes that represent adjacent text-cells are linked. 
Graph Convolutional Networks (GCN\u2019s) based methods take the image as an input, but also the position of the text-cells and their content [18]. The purpose of a GCN is to transform the input graph into a new graph, which replaces the old links with new ones. The new links then represent the table-structure. With this approach, one can avoid the need to build custom OCR decoders. However, the quality of the reconstructed structure is not comparable to the current state-of-the-art [18]."}, {"label": "text", "id": 3, "page_no": 2, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.33047866821289, "t": 492.9067077636719, "r": 286.740234375, "b": 622.9658203125, "coord_origin": "TOPLEFT"}, "confidence": 0.9875094294548035, "cells": [{"id": 45, "text": "Hybrid Deep Learning-Rule-Based approach", "bbox": {"l": 62.066994, "t": 493.68875, "r": 252.88068000000004, "b": 502.64514, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": ": A pop-", "bbox": {"l": 252.88199, "t": 493.80832, "r": 286.36627, "b": 502.71487, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "ular current model for table-structure identification is the", "bbox": {"l": 50.111984, "t": 505.76331, "r": 286.36505, "b": 514.66986, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "use of a hybrid Deep Learning-Rule-Based approach similar", "bbox": {"l": 50.111984, "t": 517.71829, "r": 286.36505, "b": 526.6248499999999, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "to [27, 29]. In this approach, one first detects the position of", "bbox": {"l": 50.111984, "t": 529.67328, "r": 286.36508, "b": 538.57985, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "the table-cells with object detection (e.g. 
YoloVx or Mask-", "bbox": {"l": 50.111984, "t": 541.62929, "r": 286.36508, "b": 550.53584, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "RCNN), then classifies the table into different types (from", "bbox": {"l": 50.111984, "t": 553.58429, "r": 286.36511, "b": 562.4908399999999, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "its images) and finally uses different rule-sets to obtain", "bbox": {"l": 50.111984, "t": 565.5392899999999, "r": 286.36511, "b": 574.44585, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "its table-structure. Currently, this approach achieves state-", "bbox": {"l": 50.111984, "t": 577.49429, "r": 286.36502, "b": 586.40085, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "of-the-art results, but is not an end-to-end deep-learning", "bbox": {"l": 50.111984, "t": 589.4493, "r": 286.36505, "b": 598.35585, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "method. As such, new rules need to be written if different", "bbox": {"l": 50.111984, "t": 601.4043, "r": 286.36502, "b": 610.31085, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "types of tables are encountered.", "bbox": {"l": 50.111984, "t": 613.36029, "r": 175.98943, "b": 622.26685, "coord_origin": "TOPLEFT"}}]}, "text": "Hybrid Deep Learning-Rule-Based approach : A popular current model for table-structure identification is the use of a hybrid Deep Learning-Rule-Based approach similar to [27, 29]. In this approach, one first detects the position of the table-cells with object detection (e.g. YoloVx or MaskRCNN), then classifies the table into different types (from its images) and finally uses different rule-sets to obtain its table-structure. Currently, this approach achieves stateof-the-art results, but is not an end-to-end deep-learning method. 
As such, new rules need to be written if different types of tables are encountered."}, {"label": "section_header", "id": 4, "page_no": 2, "cluster": {"id": 4, "label": "section_header", "bbox": {"l": 49.298309326171875, "t": 635.1035766601562, "r": 105.35755920410156, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9423062205314636, "cells": [{"id": 57, "text": "3.", "bbox": {"l": 50.111984, "t": 635.94484, "r": 57.82375699999999, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Datasets", "bbox": {"l": 68.106125, "t": 635.94484, "r": 105.22546, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}}]}, "text": "3. Datasets"}, {"label": "text", "id": 5, "page_no": 2, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.53636169433594, "t": 655.4588012695312, "r": 286.584716796875, "b": 713.151863, "coord_origin": "TOPLEFT"}, "confidence": 0.9862047433853149, "cells": [{"id": 59, "text": "We rely on large-scale datasets such as PubTabNet [37],", "bbox": {"l": 62.06698600000001, "t": 656.42529, "r": 286.36493, "b": 665.33186, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "FinTabNet [36], and TableBank [17] datasets to train and", "bbox": {"l": 50.111984, "t": 668.38029, "r": 286.36508, "b": 677.2868599999999, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "evaluate our models. 
These datasets span over various ap-", "bbox": {"l": 50.111984, "t": 680.3353, "r": 286.36502, "b": 689.24186, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "pearance styles and content.", "bbox": {"l": 50.111984, "t": 692.290298, "r": 166.24602, "b": 701.196861, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "We also introduce our own", "bbox": {"l": 173.68808, "t": 692.290298, "r": 286.36508, "b": 701.196861, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "synthetically generated SynthTabNet dataset to fix an im-", "bbox": {"l": 50.111984, "t": 704.2453, "r": 286.36505, "b": 713.151863, "coord_origin": "TOPLEFT"}}]}, "text": "We rely on large-scale datasets such as PubTabNet [37], FinTabNet [36], and TableBank [17] datasets to train and evaluate our models. These datasets span over various appearance styles and content. We also introduce our own synthetically generated SynthTabNet dataset to fix an im-"}, {"label": "picture", "id": 6, "page_no": 2, "cluster": {"id": 6, "label": "picture", "bbox": {"l": 312.10369873046875, "t": 78.44086456298828, "r": 550.38916015625, "b": 250.77495999999996, "coord_origin": "TOPLEFT"}, "confidence": 0.9746916890144348, "cells": [{"id": 65, "text": "PubTabNet + FinTabNet", "bbox": {"l": 380.79849, "t": 79.81176999999991, "r": 486.84909, "b": 88.55975000000001, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Rows / Columns", "bbox": {"l": 396.76776, "t": 242.02697999999998, "r": 469.78748, "b": 250.77495999999996, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "0", "bbox": {"l": 320.97653, "t": 233.42296999999996, "r": 324.79254, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "20", "bbox": {"l": 410.483, "t": 233.42296999999996, "r": 418.11319, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "40", "bbox": {"l": 500.84949, "t": 233.42296999999996, "r": 508.47968000000003, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "10", "bbox": {"l": 365.29999, "t": 233.42296999999996, "r": 
372.93018, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "30", "bbox": {"l": 455.66626, "t": 233.42296999999996, "r": 463.29645, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "50", "bbox": {"l": 542.03528, "t": 233.42296999999996, "r": 549.66547, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "0", "bbox": {"l": 316.04474, "t": 230.44617000000005, "r": 319.86075, "b": 236.27819999999997, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "2", "bbox": {"l": 312.62521, "t": 198.69073000000003, "r": 316.44122, "b": 204.52277000000004, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "0", "bbox": {"l": 316.43942, "t": 198.69073000000003, "r": 320.2554, "b": 204.52277000000004, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "4", "bbox": {"l": 313.14951, "t": 168.09795999999994, "r": 316.96552, "b": 173.92998999999998, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "0", "bbox": {"l": 316.96371, "t": 168.09795999999994, "r": 320.77969, "b": 173.92998999999998, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "6", "bbox": {"l": 312.92972, "t": 136.58771000000002, "r": 316.74573, "b": 142.41974000000005, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "0", "bbox": {"l": 316.74393, "t": 136.58771000000002, "r": 320.55991, "b": 142.41974000000005, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "8", "bbox": {"l": 312.48227, "t": 105.60175000000004, "r": 316.29828, "b": 111.43377999999996, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "0", "bbox": {"l": 316.29648, "t": 105.60175000000004, "r": 320.11246, "b": 111.43377999999996, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "1", "bbox": {"l": 312.48227, "t": 212.25922000000003, "r": 316.29828, "b": 218.09124999999995, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "0", "bbox": {"l": 316.29648, "t": 212.25922000000003, "r": 320.11246, "b": 218.09124999999995, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "3", "bbox": {"l": 313.07639, "t": 183.72198000000003, "r": 316.8924, "b": 
189.55402000000004, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "0", "bbox": {"l": 316.89059, "t": 183.72198000000003, "r": 320.70657, "b": 189.55402000000004, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "5", "bbox": {"l": 312.76321, "t": 152.47400000000005, "r": 316.57922, "b": 158.30602999999996, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "0", "bbox": {"l": 316.57742, "t": 152.47400000000005, "r": 320.3934, "b": 158.30602999999996, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "7", "bbox": {"l": 312.19775, "t": 120.57050000000004, "r": 316.01376, "b": 126.40252999999996, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "0", "bbox": {"l": 316.01196, "t": 120.57050000000004, "r": 319.82794, "b": 126.40252999999996, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "9", "bbox": {"l": 312.8165, "t": 90.1087, "r": 316.63251, "b": 95.94073000000003, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "0", "bbox": {"l": 316.63071, "t": 90.1087, "r": 320.44669, "b": 95.94073000000003, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "0", "bbox": {"l": 532.17426, "t": 222.72729000000004, "r": 536.94427, "b": 230.01727000000005, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "10K", "bbox": {"l": 532.87952, "t": 108.26702999999986, "r": 547.61249, "b": 115.55700999999999, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "8K", "bbox": {"l": 532.7735, "t": 130.78101000000004, "r": 542.73877, "b": 138.07097999999996, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "6K", "bbox": {"l": 532.79901, "t": 153.92352000000005, "r": 542.76428, "b": 161.21349999999995, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "4K", "bbox": {"l": 532.5705, "t": 176.75800000000004, "r": 542.53577, "b": 184.04796999999996, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "2K", "bbox": {"l": 532.14551, "t": 199.6463, "r": 542.11078, "b": 206.93628, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", 
"id": 7, "page_no": 2, "cluster": {"id": 7, "label": "caption", "bbox": {"l": 308.2174377441406, "t": 267.0390930175781, "r": 545.11511, "b": 288.6979099999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9667505025863647, "cells": [{"id": 98, "text": "Figure 2:", "bbox": {"l": 308.862, "t": 267.83636, "r": 346.06238, "b": 276.74292, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Distribution of the tables across different table", "bbox": {"l": 354.49072, "t": 267.83636, "r": 545.11511, "b": 276.74292, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "dimensions in PubTabNet + FinTabNet datasets", "bbox": {"l": 308.862, "t": 279.79132000000004, "r": 498.56989, "b": 288.6979099999999, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets"}, {"label": "text", "id": 8, "page_no": 2, "cluster": {"id": 8, "label": "text", "bbox": {"l": 308.1451416015625, "t": 316.442138671875, "r": 437.27002, "b": 326.37991, "coord_origin": "TOPLEFT"}, "confidence": 0.8879812359809875, "cells": [{"id": 101, "text": "balance in the previous datasets.", "bbox": {"l": 308.862, "t": 317.47336, "r": 437.27002, "b": 326.37991, "coord_origin": "TOPLEFT"}}]}, "text": "balance in the previous datasets."}, {"label": "text", "id": 9, "page_no": 2, "cluster": {"id": 9, "label": "text", "bbox": {"l": 307.77337646484375, "t": 330.03399658203125, "r": 545.3823852539062, "b": 627.379638671875, "coord_origin": "TOPLEFT"}, "confidence": 0.9870319366455078, "cells": [{"id": 102, "text": "The PubTabNet dataset contains 509k tables delivered as", "bbox": {"l": 320.81699, "t": 331.53137, "r": 545.11505, "b": 340.43793, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "annotated PNG images. 
The annotations consist of the table", "bbox": {"l": 308.862, "t": 343.48635999999993, "r": 545.11517, "b": 352.39291, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "structure represented in HTML format, the tokenized text", "bbox": {"l": 308.862, "t": 355.44235, "r": 545.11505, "b": 364.34890999999993, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "and its bounding boxes per table cell. Fig. 1 shows the ap-", "bbox": {"l": 308.862, "t": 367.39734, "r": 545.11505, "b": 376.30389, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "pearance style of PubTabNet. Depending on its complexity,", "bbox": {"l": 308.862, "t": 379.35233, "r": 545.11511, "b": 388.25888, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "a table is characterized as \u201csimple\u201d when it does not contain", "bbox": {"l": 308.862, "t": 391.30731, "r": 545.11511, "b": 400.21386999999993, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "row spans or column spans, otherwise it is \u201ccomplex\u201d. The", "bbox": {"l": 308.862, "t": 403.26230000000004, "r": 545.11505, "b": 412.16885, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "dataset is divided into Train and Val splits (roughly 98% and", "bbox": {"l": 308.862, "t": 415.21729, "r": 545.11511, "b": 424.12384, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "2%). 
The Train split consists of 54% simple and 46% com-", "bbox": {"l": 308.862, "t": 427.17328, "r": 545.11517, "b": 436.0798300000001, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "plex tables and the Val split of 51% and 49% respectively.", "bbox": {"l": 308.862, "t": 439.12827, "r": 545.11517, "b": 448.03482, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "The FinTabNet dataset contains 112k tables delivered as", "bbox": {"l": 308.862, "t": 451.08325, "r": 545.11511, "b": 459.98981000000003, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "single-page PDF documents with mixed table structures and", "bbox": {"l": 308.862, "t": 463.03824, "r": 545.11505, "b": 471.94479, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "text content. Similarly to the PubTabNet, the annotations", "bbox": {"l": 308.862, "t": 474.99323, "r": 545.11511, "b": 483.89978, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "of FinTabNet include the table structure in HTML, the to-", "bbox": {"l": 308.862, "t": 486.94922, "r": 545.11511, "b": 495.85577, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "kenized text and the bounding boxes on a table cell basis.", "bbox": {"l": 308.862, "t": 498.90421, "r": 545.11511, "b": 507.81076, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "The dataset is divided into Train, Test and Val splits (81%,", "bbox": {"l": 308.862, "t": 510.85919, "r": 545.11517, "b": 519.76575, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "9.5%, 9.5%), and each one is almost equally divided into", "bbox": {"l": 308.862, "t": 522.8141800000001, "r": 545.11517, "b": 531.72073, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "simple and complex tables (Train: 48% simple, 52% com-", "bbox": {"l": 308.862, "t": 534.76917, "r": 545.11505, "b": 543.67574, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "plex, Test: 48% simple, 52% complex, Test: 53% simple,", "bbox": {"l": 308.862, "t": 546.72418, "r": 545.11511, "b": 555.6307400000001, "coord_origin": "TOPLEFT"}}, {"id": 
121, "text": "47% complex). Finally the TableBank dataset consists of", "bbox": {"l": 308.862, "t": 558.6801800000001, "r": 545.11511, "b": 567.58673, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "145k tables provided as JPEG images. The latter has anno-", "bbox": {"l": 308.862, "t": 570.63518, "r": 545.11505, "b": 579.54173, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "tations for the table structure, but only few with bounding", "bbox": {"l": 308.862, "t": 582.59018, "r": 545.11499, "b": 591.49673, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "boxes of the table cells. The entire dataset consists of sim-", "bbox": {"l": 308.862, "t": 594.54518, "r": 545.11517, "b": 603.45174, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "ple tables and it is divided into 90% Train, 3% Test and 7%", "bbox": {"l": 308.862, "t": 606.50018, "r": 545.11511, "b": 615.40674, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Val splits.", "bbox": {"l": 308.862, "t": 618.45518, "r": 348.16446, "b": 627.36174, "coord_origin": "TOPLEFT"}}]}, "text": "The PubTabNet dataset contains 509k tables delivered as annotated PNG images. The annotations consist of the table structure represented in HTML format, the tokenized text and its bounding boxes per table cell. Fig. 1 shows the appearance style of PubTabNet. Depending on its complexity, a table is characterized as \u201csimple\u201d when it does not contain row spans or column spans, otherwise it is \u201ccomplex\u201d. The dataset is divided into Train and Val splits (roughly 98% and 2%). The Train split consists of 54% simple and 46% complex tables and the Val split of 51% and 49% respectively. The FinTabNet dataset contains 112k tables delivered as single-page PDF documents with mixed table structures and text content. Similarly to the PubTabNet, the annotations of FinTabNet include the table structure in HTML, the tokenized text and the bounding boxes on a table cell basis. 
The dataset is divided into Train, Test and Val splits (81%, 9.5%, 9.5%), and each one is almost equally divided into simple and complex tables (Train: 48% simple, 52% complex, Test: 48% simple, 52% complex, Test: 53% simple, 47% complex). Finally the TableBank dataset consists of 145k tables provided as JPEG images. The latter has annotations for the table structure, but only few with bounding boxes of the table cells. The entire dataset consists of simple tables and it is divided into 90% Train, 3% Test and 7% Val splits."}, {"label": "text", "id": 10, "page_no": 2, "cluster": {"id": 10, "label": "text", "bbox": {"l": 308.01678466796875, "t": 631.5999755859375, "r": 545.2085571289062, "b": 713.151764, "coord_origin": "TOPLEFT"}, "confidence": 0.9840090274810791, "cells": [{"id": 127, "text": "Due to the heterogeneity across the dataset formats, it", "bbox": {"l": 320.81699, "t": 632.51419, "r": 545.11487, "b": 641.42075, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "was necessary to combine all available data into one homog-", "bbox": {"l": 308.862, "t": 644.46919, "r": 545.11511, "b": 653.37575, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "enized dataset before we could train our models for practi-", "bbox": {"l": 308.862, "t": 656.42419, "r": 545.11511, "b": 665.33076, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "cal purposes. Given the size of PubTabNet, we adopted its", "bbox": {"l": 308.862, "t": 668.38019, "r": 545.11499, "b": 677.28676, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "annotation format and we extracted and converted all tables", "bbox": {"l": 308.862, "t": 680.33519, "r": 545.11505, "b": 689.24176, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "as PNG images with a resolution of 72 dpi. 
Additionally,", "bbox": {"l": 308.862, "t": 692.290192, "r": 545.11505, "b": 701.196762, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "we have filtered out tables with extreme sizes due to small", "bbox": {"l": 308.862, "t": 704.245193, "r": 545.11511, "b": 713.151764, "coord_origin": "TOPLEFT"}}]}, "text": "Due to the heterogeneity across the dataset formats, it was necessary to combine all available data into one homogenized dataset before we could train our models for practical purposes. Given the size of PubTabNet, we adopted its annotation format and we extracted and converted all tables as PNG images with a resolution of 72 dpi. Additionally, we have filtered out tables with extreme sizes due to small"}, {"label": "page_footer", "id": 11, "page_no": 2, "cluster": {"id": 11, "label": "page_footer", "bbox": {"l": 294.38531494140625, "t": 733.3271484375, "r": 300.10229, "b": 743.039761, "coord_origin": "TOPLEFT"}, "confidence": 0.8715606927871704, "cells": [{"id": 134, "text": "3", "bbox": {"l": 295.121, "t": 734.133198, "r": 300.10229, "b": 743.039761, "coord_origin": "TOPLEFT"}}]}, "text": "3"}], "body": [{"label": "text", "id": 0, "page_no": 2, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.76145935058594, "t": 74.29176330566406, "r": 250.15102, "b": 84.22994995117188, "coord_origin": "TOPLEFT"}, "confidence": 0.8767215609550476, "cells": [{"id": 0, "text": "tag-decoder which is constrained to the table-tags.", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 250.15102, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}]}, "text": "tag-decoder which is constrained to the table-tags."}, {"label": "text", "id": 1, "page_no": 2, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.38212203979492, "t": 85.73994445800781, "r": 286.748046875, "b": 275.4715270996094, "coord_origin": "TOPLEFT"}, "confidence": 0.9822593927383423, "cells": [{"id": 1, "text": "In", "bbox": {"l": 62.067001, "t": 87.21935999999994, "r": 70.365845, "b": 96.12591999999995, 
"coord_origin": "TOPLEFT"}}, {"id": 2, "text": "practice,", "bbox": {"l": 76.931198, "t": 87.21935999999994, "r": 110.95348000000001, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "both", "bbox": {"l": 118.54498, "t": 87.21935999999994, "r": 136.25848, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "network", "bbox": {"l": 142.82384, "t": 87.21935999999994, "r": 175.37166, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "architectures", "bbox": {"l": 181.94698, "t": 87.21935999999994, "r": 232.83594000000002, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "(IETD", "bbox": {"l": 239.41125, "t": 87.21935999999994, "r": 265.41364, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "and", "bbox": {"l": 271.979, "t": 87.21935999999994, "r": 286.36499, "b": 96.12591999999995, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "IEDD) require an implicit, custom trained object-character-", "bbox": {"l": 50.112, "t": 99.17437999999993, "r": 286.36505, "b": 108.08092999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "recognition (OCR) to obtain the content of the table-cells.", "bbox": {"l": 50.112, "t": 111.13036999999997, "r": 286.36511, "b": 120.03692999999998, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "In the case of IETD, this OCR engine is implicit in the de-", "bbox": {"l": 50.112, "t": 123.08538999999996, "r": 286.36505, "b": 131.99194, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "coder similar to [24]. For the IEDD, the OCR is solely em-", "bbox": {"l": 50.112, "t": 135.04040999999995, "r": 286.36514, "b": 143.94696, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "bedded in the content-decoder. This reliance on a custom,", "bbox": {"l": 50.112, "t": 146.99541999999997, "r": 286.36511, "b": 155.90197999999998, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "implicit OCR decoder is of course problematic. 
OCR is a", "bbox": {"l": 50.112, "t": 158.95043999999996, "r": 286.36505, "b": 167.85699, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "well known and extremely tough problem, that often needs", "bbox": {"l": 50.112, "t": 170.90545999999995, "r": 286.36508, "b": 179.81201, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "custom training for each individual language. However, the", "bbox": {"l": 50.112, "t": 182.86145, "r": 286.36508, "b": 191.76801, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "limited availability for non-english content in the current", "bbox": {"l": 50.112, "t": 194.81646999999998, "r": 286.36511, "b": 203.72302000000002, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "datasets, makes it impractical to apply the IETD and IEDD", "bbox": {"l": 50.112, "t": 206.77148, "r": 286.36511, "b": 215.67804, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "methods on tables with other languages. Additionally, OCR", "bbox": {"l": 50.112, "t": 218.7265, "r": 286.36505, "b": 227.63306, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "can be completely omitted if the tables originate from pro-", "bbox": {"l": 50.112, "t": 230.68151999999998, "r": 286.36505, "b": 239.58807000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "grammatic PDF documents with known positions of each", "bbox": {"l": 50.112, "t": 242.63653999999997, "r": 286.36511, "b": 251.54309, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "cell. The latter was the inspiration for the work of this pa-", "bbox": {"l": 50.112, "t": 254.59253, "r": 286.36508, "b": 263.49908000000005, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "per.", "bbox": {"l": 50.112, "t": 266.54755, "r": 64.776947, "b": 275.45410000000004, "coord_origin": "TOPLEFT"}}]}, "text": "In practice, both network architectures (IETD and IEDD) require an implicit, custom trained object-characterrecognition (OCR) to obtain the content of the table-cells. 
In the case of IETD, this OCR engine is implicit in the decoder similar to [24]. For the IEDD, the OCR is solely embedded in the content-decoder. This reliance on a custom, implicit OCR decoder is of course problematic. OCR is a well known and extremely tough problem, that often needs custom training for each individual language. However, the limited availability for non-english content in the current datasets, makes it impractical to apply the IETD and IEDD methods on tables with other languages. Additionally, OCR can be completely omitted if the tables originate from programmatic PDF documents with known positions of each cell. The latter was the inspiration for the work of this paper."}, {"label": "text", "id": 2, "page_no": 2, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.258270263671875, "t": 277.4222106933594, "r": 286.6703186035156, "b": 490.70288, "coord_origin": "TOPLEFT"}, "confidence": 0.9878448843955994, "cells": [{"id": 23, "text": "Graph Neural networks", "bbox": {"l": 62.067001, "t": 278.43895999999995, "r": 171.56593, "b": 287.39536, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": ":", "bbox": {"l": 171.56799, "t": 278.55853, "r": 174.3376, "b": 287.46509, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Graph Neural networks", "bbox": {"l": 185.18687, "t": 278.55853, "r": 286.35709, "b": 287.46509, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "(GNN\u2019s) take a radically different approach to table-", "bbox": {"l": 50.111992, "t": 290.51453000000004, "r": 286.36511, "b": 299.42108, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "structure extraction.", "bbox": {"l": 50.111992, "t": 302.46950999999996, "r": 131.16771, "b": 311.37607, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "Note that one table cell can consti-", "bbox": {"l": 138.84888, "t": 302.46950999999996, "r": 286.36508, "b": 311.37607, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "tute out of multiple text-cells. 
To obtain the table-structure,", "bbox": {"l": 50.111992, "t": 314.4245, "r": 286.36505, "b": 323.33105, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "one creates an initial graph, where each of the text-cells", "bbox": {"l": 50.111992, "t": 326.37949000000003, "r": 286.36508, "b": 335.28604, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "becomes a node in the graph similar to [33, 34, 2]. Each", "bbox": {"l": 50.111992, "t": 338.33447, "r": 286.36505, "b": 347.2410300000001, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "node is then associated with en embedding vector coming", "bbox": {"l": 50.111992, "t": 350.28946, "r": 286.36505, "b": 359.19601, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "from the encoded image, its coordinates and the encoded", "bbox": {"l": 50.111992, "t": 362.24545000000006, "r": 286.36508, "b": 371.15201, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "text. Furthermore, nodes that represent adjacent text-cells", "bbox": {"l": 50.111992, "t": 374.20044, "r": 286.36508, "b": 383.10699, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "are linked. Graph Convolutional Networks (GCN\u2019s) based", "bbox": {"l": 50.111992, "t": 386.15542999999997, "r": 286.36508, "b": 395.06198, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "methods take the image as an input, but also the position of", "bbox": {"l": 50.111992, "t": 398.11041000000006, "r": 286.36508, "b": 407.01697, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "the text-cells and their content [18]. 
The purpose of a GCN", "bbox": {"l": 50.111992, "t": 410.0654, "r": 286.36508, "b": 418.97195, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "is to transform the input graph into a new graph, which re-", "bbox": {"l": 50.111992, "t": 422.02038999999996, "r": 286.36505, "b": 430.92694, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "places the old links with new ones.", "bbox": {"l": 50.111992, "t": 433.97638, "r": 198.2359, "b": 442.88293, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "The new links then", "bbox": {"l": 205.92703, "t": 433.97638, "r": 286.36505, "b": 442.88293, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "represent the table-structure. With this approach, one can", "bbox": {"l": 50.111992, "t": 445.93137, "r": 286.36508, "b": 454.83792000000005, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "avoid the need to build custom OCR decoders. However,", "bbox": {"l": 50.111992, "t": 457.88635, "r": 286.36505, "b": 466.79291, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "the quality of the reconstructed structure is not comparable", "bbox": {"l": 50.111992, "t": 469.84134, "r": 286.36505, "b": 478.74789, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "to the current state-of-the-art [18].", "bbox": {"l": 50.111992, "t": 481.79633, "r": 186.49998, "b": 490.70288, "coord_origin": "TOPLEFT"}}]}, "text": "Graph Neural networks : Graph Neural networks (GNN\u2019s) take a radically different approach to tablestructure extraction. Note that one table cell can constitute out of multiple text-cells. To obtain the table-structure, one creates an initial graph, where each of the text-cells becomes a node in the graph similar to [33, 34, 2]. Each node is then associated with en embedding vector coming from the encoded image, its coordinates and the encoded text. Furthermore, nodes that represent adjacent text-cells are linked. 
Graph Convolutional Networks (GCN\u2019s) based methods take the image as an input, but also the position of the text-cells and their content [18]. The purpose of a GCN is to transform the input graph into a new graph, which replaces the old links with new ones. The new links then represent the table-structure. With this approach, one can avoid the need to build custom OCR decoders. However, the quality of the reconstructed structure is not comparable to the current state-of-the-art [18]."}, {"label": "text", "id": 3, "page_no": 2, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.33047866821289, "t": 492.9067077636719, "r": 286.740234375, "b": 622.9658203125, "coord_origin": "TOPLEFT"}, "confidence": 0.9875094294548035, "cells": [{"id": 45, "text": "Hybrid Deep Learning-Rule-Based approach", "bbox": {"l": 62.066994, "t": 493.68875, "r": 252.88068000000004, "b": 502.64514, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": ": A pop-", "bbox": {"l": 252.88199, "t": 493.80832, "r": 286.36627, "b": 502.71487, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "ular current model for table-structure identification is the", "bbox": {"l": 50.111984, "t": 505.76331, "r": 286.36505, "b": 514.66986, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "use of a hybrid Deep Learning-Rule-Based approach similar", "bbox": {"l": 50.111984, "t": 517.71829, "r": 286.36505, "b": 526.6248499999999, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "to [27, 29]. In this approach, one first detects the position of", "bbox": {"l": 50.111984, "t": 529.67328, "r": 286.36508, "b": 538.57985, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "the table-cells with object detection (e.g. 
YoloVx or Mask-", "bbox": {"l": 50.111984, "t": 541.62929, "r": 286.36508, "b": 550.53584, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "RCNN), then classifies the table into different types (from", "bbox": {"l": 50.111984, "t": 553.58429, "r": 286.36511, "b": 562.4908399999999, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "its images) and finally uses different rule-sets to obtain", "bbox": {"l": 50.111984, "t": 565.5392899999999, "r": 286.36511, "b": 574.44585, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "its table-structure. Currently, this approach achieves state-", "bbox": {"l": 50.111984, "t": 577.49429, "r": 286.36502, "b": 586.40085, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "of-the-art results, but is not an end-to-end deep-learning", "bbox": {"l": 50.111984, "t": 589.4493, "r": 286.36505, "b": 598.35585, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "method. As such, new rules need to be written if different", "bbox": {"l": 50.111984, "t": 601.4043, "r": 286.36502, "b": 610.31085, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "types of tables are encountered.", "bbox": {"l": 50.111984, "t": 613.36029, "r": 175.98943, "b": 622.26685, "coord_origin": "TOPLEFT"}}]}, "text": "Hybrid Deep Learning-Rule-Based approach : A popular current model for table-structure identification is the use of a hybrid Deep Learning-Rule-Based approach similar to [27, 29]. In this approach, one first detects the position of the table-cells with object detection (e.g. YoloVx or MaskRCNN), then classifies the table into different types (from its images) and finally uses different rule-sets to obtain its table-structure. Currently, this approach achieves stateof-the-art results, but is not an end-to-end deep-learning method. 
As such, new rules need to be written if different types of tables are encountered."}, {"label": "section_header", "id": 4, "page_no": 2, "cluster": {"id": 4, "label": "section_header", "bbox": {"l": 49.298309326171875, "t": 635.1035766601562, "r": 105.35755920410156, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9423062205314636, "cells": [{"id": 57, "text": "3.", "bbox": {"l": 50.111984, "t": 635.94484, "r": 57.82375699999999, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Datasets", "bbox": {"l": 68.106125, "t": 635.94484, "r": 105.22546, "b": 646.6925699999999, "coord_origin": "TOPLEFT"}}]}, "text": "3. Datasets"}, {"label": "text", "id": 5, "page_no": 2, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.53636169433594, "t": 655.4588012695312, "r": 286.584716796875, "b": 713.151863, "coord_origin": "TOPLEFT"}, "confidence": 0.9862047433853149, "cells": [{"id": 59, "text": "We rely on large-scale datasets such as PubTabNet [37],", "bbox": {"l": 62.06698600000001, "t": 656.42529, "r": 286.36493, "b": 665.33186, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "FinTabNet [36], and TableBank [17] datasets to train and", "bbox": {"l": 50.111984, "t": 668.38029, "r": 286.36508, "b": 677.2868599999999, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "evaluate our models. 
These datasets span over various ap-", "bbox": {"l": 50.111984, "t": 680.3353, "r": 286.36502, "b": 689.24186, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "pearance styles and content.", "bbox": {"l": 50.111984, "t": 692.290298, "r": 166.24602, "b": 701.196861, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "We also introduce our own", "bbox": {"l": 173.68808, "t": 692.290298, "r": 286.36508, "b": 701.196861, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "synthetically generated SynthTabNet dataset to fix an im-", "bbox": {"l": 50.111984, "t": 704.2453, "r": 286.36505, "b": 713.151863, "coord_origin": "TOPLEFT"}}]}, "text": "We rely on large-scale datasets such as PubTabNet [37], FinTabNet [36], and TableBank [17] datasets to train and evaluate our models. These datasets span over various appearance styles and content. We also introduce our own synthetically generated SynthTabNet dataset to fix an im-"}, {"label": "picture", "id": 6, "page_no": 2, "cluster": {"id": 6, "label": "picture", "bbox": {"l": 312.10369873046875, "t": 78.44086456298828, "r": 550.38916015625, "b": 250.77495999999996, "coord_origin": "TOPLEFT"}, "confidence": 0.9746916890144348, "cells": [{"id": 65, "text": "PubTabNet + FinTabNet", "bbox": {"l": 380.79849, "t": 79.81176999999991, "r": 486.84909, "b": 88.55975000000001, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Rows / Columns", "bbox": {"l": 396.76776, "t": 242.02697999999998, "r": 469.78748, "b": 250.77495999999996, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "0", "bbox": {"l": 320.97653, "t": 233.42296999999996, "r": 324.79254, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "20", "bbox": {"l": 410.483, "t": 233.42296999999996, "r": 418.11319, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "40", "bbox": {"l": 500.84949, "t": 233.42296999999996, "r": 508.47968000000003, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "10", "bbox": {"l": 365.29999, "t": 233.42296999999996, "r": 
372.93018, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "30", "bbox": {"l": 455.66626, "t": 233.42296999999996, "r": 463.29645, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "50", "bbox": {"l": 542.03528, "t": 233.42296999999996, "r": 549.66547, "b": 239.255, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "0", "bbox": {"l": 316.04474, "t": 230.44617000000005, "r": 319.86075, "b": 236.27819999999997, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "2", "bbox": {"l": 312.62521, "t": 198.69073000000003, "r": 316.44122, "b": 204.52277000000004, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "0", "bbox": {"l": 316.43942, "t": 198.69073000000003, "r": 320.2554, "b": 204.52277000000004, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "4", "bbox": {"l": 313.14951, "t": 168.09795999999994, "r": 316.96552, "b": 173.92998999999998, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "0", "bbox": {"l": 316.96371, "t": 168.09795999999994, "r": 320.77969, "b": 173.92998999999998, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "6", "bbox": {"l": 312.92972, "t": 136.58771000000002, "r": 316.74573, "b": 142.41974000000005, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "0", "bbox": {"l": 316.74393, "t": 136.58771000000002, "r": 320.55991, "b": 142.41974000000005, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "8", "bbox": {"l": 312.48227, "t": 105.60175000000004, "r": 316.29828, "b": 111.43377999999996, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "0", "bbox": {"l": 316.29648, "t": 105.60175000000004, "r": 320.11246, "b": 111.43377999999996, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "1", "bbox": {"l": 312.48227, "t": 212.25922000000003, "r": 316.29828, "b": 218.09124999999995, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "0", "bbox": {"l": 316.29648, "t": 212.25922000000003, "r": 320.11246, "b": 218.09124999999995, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "3", "bbox": {"l": 313.07639, "t": 183.72198000000003, "r": 316.8924, "b": 
189.55402000000004, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "0", "bbox": {"l": 316.89059, "t": 183.72198000000003, "r": 320.70657, "b": 189.55402000000004, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "5", "bbox": {"l": 312.76321, "t": 152.47400000000005, "r": 316.57922, "b": 158.30602999999996, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "0", "bbox": {"l": 316.57742, "t": 152.47400000000005, "r": 320.3934, "b": 158.30602999999996, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "7", "bbox": {"l": 312.19775, "t": 120.57050000000004, "r": 316.01376, "b": 126.40252999999996, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "0", "bbox": {"l": 316.01196, "t": 120.57050000000004, "r": 319.82794, "b": 126.40252999999996, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "9", "bbox": {"l": 312.8165, "t": 90.1087, "r": 316.63251, "b": 95.94073000000003, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "0", "bbox": {"l": 316.63071, "t": 90.1087, "r": 320.44669, "b": 95.94073000000003, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "0", "bbox": {"l": 532.17426, "t": 222.72729000000004, "r": 536.94427, "b": 230.01727000000005, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "10K", "bbox": {"l": 532.87952, "t": 108.26702999999986, "r": 547.61249, "b": 115.55700999999999, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "8K", "bbox": {"l": 532.7735, "t": 130.78101000000004, "r": 542.73877, "b": 138.07097999999996, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "6K", "bbox": {"l": 532.79901, "t": 153.92352000000005, "r": 542.76428, "b": 161.21349999999995, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "4K", "bbox": {"l": 532.5705, "t": 176.75800000000004, "r": 542.53577, "b": 184.04796999999996, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "2K", "bbox": {"l": 532.14551, "t": 199.6463, "r": 542.11078, "b": 206.93628, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", 
"id": 7, "page_no": 2, "cluster": {"id": 7, "label": "caption", "bbox": {"l": 308.2174377441406, "t": 267.0390930175781, "r": 545.11511, "b": 288.6979099999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9667505025863647, "cells": [{"id": 98, "text": "Figure 2:", "bbox": {"l": 308.862, "t": 267.83636, "r": 346.06238, "b": 276.74292, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Distribution of the tables across different table", "bbox": {"l": 354.49072, "t": 267.83636, "r": 545.11511, "b": 276.74292, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "dimensions in PubTabNet + FinTabNet datasets", "bbox": {"l": 308.862, "t": 279.79132000000004, "r": 498.56989, "b": 288.6979099999999, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets"}, {"label": "text", "id": 8, "page_no": 2, "cluster": {"id": 8, "label": "text", "bbox": {"l": 308.1451416015625, "t": 316.442138671875, "r": 437.27002, "b": 326.37991, "coord_origin": "TOPLEFT"}, "confidence": 0.8879812359809875, "cells": [{"id": 101, "text": "balance in the previous datasets.", "bbox": {"l": 308.862, "t": 317.47336, "r": 437.27002, "b": 326.37991, "coord_origin": "TOPLEFT"}}]}, "text": "balance in the previous datasets."}, {"label": "text", "id": 9, "page_no": 2, "cluster": {"id": 9, "label": "text", "bbox": {"l": 307.77337646484375, "t": 330.03399658203125, "r": 545.3823852539062, "b": 627.379638671875, "coord_origin": "TOPLEFT"}, "confidence": 0.9870319366455078, "cells": [{"id": 102, "text": "The PubTabNet dataset contains 509k tables delivered as", "bbox": {"l": 320.81699, "t": 331.53137, "r": 545.11505, "b": 340.43793, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "annotated PNG images. 
The annotations consist of the table", "bbox": {"l": 308.862, "t": 343.48635999999993, "r": 545.11517, "b": 352.39291, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "structure represented in HTML format, the tokenized text", "bbox": {"l": 308.862, "t": 355.44235, "r": 545.11505, "b": 364.34890999999993, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "and its bounding boxes per table cell. Fig. 1 shows the ap-", "bbox": {"l": 308.862, "t": 367.39734, "r": 545.11505, "b": 376.30389, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "pearance style of PubTabNet. Depending on its complexity,", "bbox": {"l": 308.862, "t": 379.35233, "r": 545.11511, "b": 388.25888, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "a table is characterized as \u201csimple\u201d when it does not contain", "bbox": {"l": 308.862, "t": 391.30731, "r": 545.11511, "b": 400.21386999999993, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "row spans or column spans, otherwise it is \u201ccomplex\u201d. The", "bbox": {"l": 308.862, "t": 403.26230000000004, "r": 545.11505, "b": 412.16885, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "dataset is divided into Train and Val splits (roughly 98% and", "bbox": {"l": 308.862, "t": 415.21729, "r": 545.11511, "b": 424.12384, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "2%). 
The Train split consists of 54% simple and 46% com-", "bbox": {"l": 308.862, "t": 427.17328, "r": 545.11517, "b": 436.0798300000001, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "plex tables and the Val split of 51% and 49% respectively.", "bbox": {"l": 308.862, "t": 439.12827, "r": 545.11517, "b": 448.03482, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "The FinTabNet dataset contains 112k tables delivered as", "bbox": {"l": 308.862, "t": 451.08325, "r": 545.11511, "b": 459.98981000000003, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "single-page PDF documents with mixed table structures and", "bbox": {"l": 308.862, "t": 463.03824, "r": 545.11505, "b": 471.94479, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "text content. Similarly to the PubTabNet, the annotations", "bbox": {"l": 308.862, "t": 474.99323, "r": 545.11511, "b": 483.89978, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "of FinTabNet include the table structure in HTML, the to-", "bbox": {"l": 308.862, "t": 486.94922, "r": 545.11511, "b": 495.85577, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "kenized text and the bounding boxes on a table cell basis.", "bbox": {"l": 308.862, "t": 498.90421, "r": 545.11511, "b": 507.81076, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "The dataset is divided into Train, Test and Val splits (81%,", "bbox": {"l": 308.862, "t": 510.85919, "r": 545.11517, "b": 519.76575, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "9.5%, 9.5%), and each one is almost equally divided into", "bbox": {"l": 308.862, "t": 522.8141800000001, "r": 545.11517, "b": 531.72073, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "simple and complex tables (Train: 48% simple, 52% com-", "bbox": {"l": 308.862, "t": 534.76917, "r": 545.11505, "b": 543.67574, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "plex, Test: 48% simple, 52% complex, Test: 53% simple,", "bbox": {"l": 308.862, "t": 546.72418, "r": 545.11511, "b": 555.6307400000001, "coord_origin": "TOPLEFT"}}, {"id": 
121, "text": "47% complex). Finally the TableBank dataset consists of", "bbox": {"l": 308.862, "t": 558.6801800000001, "r": 545.11511, "b": 567.58673, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "145k tables provided as JPEG images. The latter has anno-", "bbox": {"l": 308.862, "t": 570.63518, "r": 545.11505, "b": 579.54173, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "tations for the table structure, but only few with bounding", "bbox": {"l": 308.862, "t": 582.59018, "r": 545.11499, "b": 591.49673, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "boxes of the table cells. The entire dataset consists of sim-", "bbox": {"l": 308.862, "t": 594.54518, "r": 545.11517, "b": 603.45174, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "ple tables and it is divided into 90% Train, 3% Test and 7%", "bbox": {"l": 308.862, "t": 606.50018, "r": 545.11511, "b": 615.40674, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Val splits.", "bbox": {"l": 308.862, "t": 618.45518, "r": 348.16446, "b": 627.36174, "coord_origin": "TOPLEFT"}}]}, "text": "The PubTabNet dataset contains 509k tables delivered as annotated PNG images. The annotations consist of the table structure represented in HTML format, the tokenized text and its bounding boxes per table cell. Fig. 1 shows the appearance style of PubTabNet. Depending on its complexity, a table is characterized as \u201csimple\u201d when it does not contain row spans or column spans, otherwise it is \u201ccomplex\u201d. The dataset is divided into Train and Val splits (roughly 98% and 2%). The Train split consists of 54% simple and 46% complex tables and the Val split of 51% and 49% respectively. The FinTabNet dataset contains 112k tables delivered as single-page PDF documents with mixed table structures and text content. Similarly to the PubTabNet, the annotations of FinTabNet include the table structure in HTML, the tokenized text and the bounding boxes on a table cell basis. 
The dataset is divided into Train, Test and Val splits (81%, 9.5%, 9.5%), and each one is almost equally divided into simple and complex tables (Train: 48% simple, 52% complex, Test: 48% simple, 52% complex, Test: 53% simple, 47% complex). Finally the TableBank dataset consists of 145k tables provided as JPEG images. The latter has annotations for the table structure, but only few with bounding boxes of the table cells. The entire dataset consists of simple tables and it is divided into 90% Train, 3% Test and 7% Val splits."}, {"label": "text", "id": 10, "page_no": 2, "cluster": {"id": 10, "label": "text", "bbox": {"l": 308.01678466796875, "t": 631.5999755859375, "r": 545.2085571289062, "b": 713.151764, "coord_origin": "TOPLEFT"}, "confidence": 0.9840090274810791, "cells": [{"id": 127, "text": "Due to the heterogeneity across the dataset formats, it", "bbox": {"l": 320.81699, "t": 632.51419, "r": 545.11487, "b": 641.42075, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "was necessary to combine all available data into one homog-", "bbox": {"l": 308.862, "t": 644.46919, "r": 545.11511, "b": 653.37575, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "enized dataset before we could train our models for practi-", "bbox": {"l": 308.862, "t": 656.42419, "r": 545.11511, "b": 665.33076, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "cal purposes. Given the size of PubTabNet, we adopted its", "bbox": {"l": 308.862, "t": 668.38019, "r": 545.11499, "b": 677.28676, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "annotation format and we extracted and converted all tables", "bbox": {"l": 308.862, "t": 680.33519, "r": 545.11505, "b": 689.24176, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "as PNG images with a resolution of 72 dpi. 
Additionally,", "bbox": {"l": 308.862, "t": 692.290192, "r": 545.11505, "b": 701.196762, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "we have filtered out tables with extreme sizes due to small", "bbox": {"l": 308.862, "t": 704.245193, "r": 545.11511, "b": 713.151764, "coord_origin": "TOPLEFT"}}]}, "text": "Due to the heterogeneity across the dataset formats, it was necessary to combine all available data into one homogenized dataset before we could train our models for practical purposes. Given the size of PubTabNet, we adopted its annotation format and we extracted and converted all tables as PNG images with a resolution of 72 dpi. Additionally, we have filtered out tables with extreme sizes due to small"}], "headers": [{"label": "page_footer", "id": 11, "page_no": 2, "cluster": {"id": 11, "label": "page_footer", "bbox": {"l": 294.38531494140625, "t": 733.3271484375, "r": 300.10229, "b": 743.039761, "coord_origin": "TOPLEFT"}, "confidence": 0.8715606927871704, "cells": [{"id": 134, "text": "3", "bbox": {"l": 295.121, "t": 734.133198, "r": 300.10229, "b": 743.039761, "coord_origin": "TOPLEFT"}}]}, "text": "3"}]}}, {"page_no": 3, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "amount of such tables, and kept only those ones ranging", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36511, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "between 1*1 and 20*10 (rows/columns).", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 212.28319, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "The availability of the bounding boxes for all table cells", "bbox": {"l": 62.067001, "t": 100.96038999999996, "r": 286.36502, "b": 109.86694, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "is essential to train our models. 
In order to distinguish be-", "bbox": {"l": 50.112, "t": 112.91540999999995, "r": 286.36508, "b": 121.82195999999999, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "tween empty and non-empty bounding boxes, we have in-", "bbox": {"l": 50.112, "t": 124.87041999999997, "r": 286.36508, "b": 133.77697999999998, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "troduced a binary class in the annotation. Unfortunately, the", "bbox": {"l": 50.112, "t": 136.82641999999998, "r": 286.36511, "b": 145.73297000000002, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "original datasets either omit the bounding boxes for whole", "bbox": {"l": 50.112, "t": 148.78143, "r": 286.36511, "b": 157.68799, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "tables (e.g. TableBank) or they narrow their scope only to", "bbox": {"l": 50.112, "t": 160.73645, "r": 286.36508, "b": 169.64301, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "non-empty cells. Therefore, it was imperative to introduce", "bbox": {"l": 50.112, "t": 172.69146999999998, "r": 286.36505, "b": 181.59802000000002, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "a data pre-processing procedure that generates the missing", "bbox": {"l": 50.112, "t": 184.64648, "r": 286.36508, "b": 193.55304, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "bounding boxes out of the annotation information. This pro-", "bbox": {"l": 50.112, "t": 196.60248, "r": 286.36508, "b": 205.50903000000005, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "cedure first parses the provided table structure and calcu-", "bbox": {"l": 50.112, "t": 208.5575, "r": 286.36508, "b": 217.46405000000004, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "lates the dimensions of the most fine-grained grid that cov-", "bbox": {"l": 50.112, "t": 220.51251000000002, "r": 286.36511, "b": 229.41907000000003, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "ers the table structure. 
Notice that each table cell may oc-", "bbox": {"l": 50.112, "t": 232.46753, "r": 286.36508, "b": 241.37408000000005, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "cupy multiple grid squares due to row or column spans. In", "bbox": {"l": 50.112, "t": 244.42255, "r": 286.36508, "b": 253.32910000000004, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "case of PubTabNet we had to compute missing bounding", "bbox": {"l": 50.112, "t": 256.37756, "r": 286.36505, "b": 265.28412000000003, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "boxes for 48% of the simple and 69% of the complex ta-", "bbox": {"l": 50.112, "t": 268.33356000000003, "r": 286.36505, "b": 277.24010999999996, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "bles.", "bbox": {"l": 50.112, "t": 280.28853999999995, "r": 68.652397, "b": 289.1951, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "Regarding FinTabNet, 68% of the simple and 98%", "bbox": {"l": 75.566444, "t": 280.28853999999995, "r": 286.36514, "b": 289.1951, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "of the complex tables require the generation of bounding", "bbox": {"l": 50.112, "t": 292.24353, "r": 286.36511, "b": 301.15009, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "boxes.", "bbox": {"l": 50.112, "t": 304.19852000000003, "r": 75.695961, "b": 313.10507, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "As it is illustrated in Fig. 2, the table distributions from", "bbox": {"l": 62.067001, "t": 317.99550999999997, "r": 286.36499, "b": 326.90207, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "all datasets are skewed towards simpler structures with", "bbox": {"l": 50.112, "t": 329.95151, "r": 286.36511, "b": 338.8580600000001, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "fewer number of rows/columns. 
Additionally, there is very", "bbox": {"l": 50.112, "t": 341.90649, "r": 286.36502, "b": 350.81305, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "limited variance in the table styles, which in case of Pub-", "bbox": {"l": 50.112, "t": 353.8614799999999, "r": 286.36505, "b": 362.76804, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "TabNet and FinTabNet means one styling format for the", "bbox": {"l": 50.112, "t": 365.81647, "r": 286.36508, "b": 374.72301999999996, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "majority of the tables.", "bbox": {"l": 50.112, "t": 377.77145, "r": 141.58859, "b": 386.67801, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "Similar limitations appear also in", "bbox": {"l": 148.70189, "t": 377.77145, "r": 286.36508, "b": 386.67801, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "the type of table content, which in some cases (e.g. FinTab-", "bbox": {"l": 50.112, "t": 389.72644, "r": 286.36508, "b": 398.63300000000004, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Net) is restricted to a certain domain. 
Ultimately, the lack", "bbox": {"l": 50.112, "t": 401.68243, "r": 286.36511, "b": 410.58899, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "of diversity in the training dataset damages the ability of the", "bbox": {"l": 50.112, "t": 413.63742, "r": 286.36511, "b": 422.54398, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "models to generalize well on unseen data.", "bbox": {"l": 50.112, "t": 425.59241, "r": 216.39774, "b": 434.49896, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Motivated by those observations we aimed at generating", "bbox": {"l": 62.067001, "t": 439.3894, "r": 286.36499, "b": 448.2959599999999, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "a synthetic table dataset named", "bbox": {"l": 50.112, "t": 451.34439, "r": 172.14388, "b": 460.25095, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "SynthTabNet", "bbox": {"l": 174.14801, "t": 451.43405, "r": 224.70818999999997, "b": 460.02182, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": ". This approach", "bbox": {"l": 224.70801, "t": 451.34439, "r": 286.36655, "b": 460.25095, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "offers control over: 1) the size of the dataset, 2) the table", "bbox": {"l": 50.112015, "t": 463.30038, "r": 286.36505, "b": 472.20694, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "structure, 3) the table style and 4) the type of content. 
The", "bbox": {"l": 50.112015, "t": 475.25537, "r": 286.36511, "b": 484.16193, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "complexity of the table structure is described by the size of", "bbox": {"l": 50.112015, "t": 487.21036, "r": 286.36511, "b": 496.11691, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "the table header and the table body, as well as the percentage", "bbox": {"l": 50.112015, "t": 499.16534, "r": 286.36508, "b": 508.0719, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "of the table cells covered by row spans and column spans.", "bbox": {"l": 50.112015, "t": 511.12033, "r": 286.36505, "b": 520.02689, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "A set of carefully designed styling templates provides the", "bbox": {"l": 50.112015, "t": 523.07632, "r": 286.36508, "b": 531.98288, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "basis to build a wide range of table appearances. Lastly, the", "bbox": {"l": 50.112015, "t": 535.0313100000001, "r": 286.36508, "b": 543.93788, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "table content is generated out of a curated collection of text", "bbox": {"l": 50.112015, "t": 546.98633, "r": 286.36511, "b": 555.89288, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "corpora. By controlling the size and scope of the synthetic", "bbox": {"l": 50.112015, "t": 558.94133, "r": 286.36508, "b": 567.84789, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "datasets we are able to train and evaluate our models in a", "bbox": {"l": 50.112015, "t": 570.89633, "r": 286.36511, "b": 579.8028899999999, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "variety of different conditions. 
For example, we can first", "bbox": {"l": 50.112015, "t": 582.85133, "r": 286.36511, "b": 591.75789, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "generate a highly diverse dataset to train our models and", "bbox": {"l": 50.112015, "t": 594.80733, "r": 286.36505, "b": 603.71388, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "then evaluate their performance on other synthetic datasets", "bbox": {"l": 50.112015, "t": 606.76233, "r": 286.36508, "b": 615.6688800000001, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "which are focused on a specific domain.", "bbox": {"l": 50.112015, "t": 618.71733, "r": 209.7527, "b": 627.62389, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "In this regard, we have prepared four synthetic datasets,", "bbox": {"l": 62.067017, "t": 632.51433, "r": 286.36499, "b": 641.42088, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "each one containing 150k examples. The corpora to gener-", "bbox": {"l": 50.112015, "t": 644.46933, "r": 286.36508, "b": 653.37589, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "ate the table text consists of the most frequent terms appear-", "bbox": {"l": 50.112015, "t": 656.42532, "r": 286.36511, "b": 665.33189, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "ing in PubTabNet and FinTabNet together with randomly", "bbox": {"l": 50.112015, "t": 668.38033, "r": 286.36505, "b": 677.28689, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "generated text. The first two synthetic datasets have been", "bbox": {"l": 50.112015, "t": 680.33533, "r": 286.36508, "b": 689.24189, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "fine-tuned to mimic the appearance of the original datasets", "bbox": {"l": 50.112015, "t": 692.290329, "r": 286.36508, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "but encompass more complicated table structures. 
The third", "bbox": {"l": 50.112015, "t": 704.245331, "r": 286.36511, "b": 713.151894, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Tags", "bbox": {"l": 412.332, "t": 73.61437999999998, "r": 430.90231, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Bbox", "bbox": {"l": 442.85742, "t": 73.61437999999998, "r": 464.4463799999999, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Size", "bbox": {"l": 477.78632, "t": 73.61437999999998, "r": 494.94193, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "Format", "bbox": {"l": 508.28186, "t": 73.61437999999998, "r": 536.91437, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "PubTabNet", "bbox": {"l": 317.06, "t": 85.9673499999999, "r": 361.64264, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "3", "bbox": {"l": 417.85599, "t": 85.6684600000001, "r": 425.37775, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "3", "bbox": {"l": 449.89569, "t": 85.6684600000001, "r": 457.41745000000003, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "509k", "bbox": {"l": 476.401, "t": 85.9673499999999, "r": 496.3262, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PNG", "bbox": {"l": 512.63495, "t": 85.9673499999999, "r": 532.56012, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "FinTabNet", "bbox": {"l": 317.06, "t": 97.92236000000003, "r": 359.43094, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "3", "bbox": {"l": 417.85599, "t": 97.62347, "r": 425.37775, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "3", "bbox": {"l": 449.89569, "t": 97.62347, "r": 457.41745000000003, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "112k", "bbox": {"l": 476.401, "t": 97.92236000000003, "r": 496.3262, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PDF", "bbox": {"l": 
513.46185, "t": 97.92236000000003, "r": 531.73328, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "TableBank", "bbox": {"l": 317.06, "t": 109.87836000000004, "r": 359.97888, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "3", "bbox": {"l": 417.85599, "t": 109.57947000000001, "r": 425.37775, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "7", "bbox": {"l": 450.81226, "t": 109.57947000000001, "r": 456.50091999999995, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "145k", "bbox": {"l": 476.401, "t": 109.87836000000004, "r": 496.3262, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "JPEG", "bbox": {"l": 511.25017999999994, "t": 109.87836000000004, "r": 533.94501, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Combined-Tabnet(*)", "bbox": {"l": 317.06, "t": 121.83336999999995, "r": 400.37723, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "3", "bbox": {"l": 417.85599, "t": 121.53448000000003, "r": 425.37775, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "3", "bbox": {"l": 449.89569, "t": 121.53448000000003, "r": 457.41745000000003, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "400k", "bbox": {"l": 476.401, "t": 121.83336999999995, "r": 496.3262, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "PNG", "bbox": {"l": 512.63495, "t": 121.83336999999995, "r": 532.56012, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Combined(**)", "bbox": {"l": 317.06, "t": 133.78839000000005, "r": 375.17184, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "3", "bbox": {"l": 417.85599, "t": 133.48950000000002, "r": 425.37775, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "3", "bbox": {"l": 449.89569, "t": 133.48950000000002, "r": 457.41745000000003, "b": 
142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "500k", "bbox": {"l": 476.401, "t": 133.78839000000005, "r": 496.3262, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PNG", "bbox": {"l": 512.63495, "t": 133.78839000000005, "r": 532.56012, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "SynthTabNet", "bbox": {"l": 317.06, "t": 145.74341000000004, "r": 369.39352, "b": 154.64995999999996, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "3", "bbox": {"l": 417.85599, "t": 145.44446000000005, "r": 425.37775, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "3", "bbox": {"l": 449.89569, "t": 145.44446000000005, "r": 457.41745000000003, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "600k", "bbox": {"l": 476.401, "t": 145.74334999999996, "r": 496.3262, "b": 154.6499, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "PNG", "bbox": {"l": 512.63495, "t": 145.74334999999996, "r": 532.56012, "b": 154.6499, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "Table 1:", "bbox": {"l": 308.862, "t": 167.66138, "r": 344.6178, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Both", "bbox": {"l": 361.07602, "t": 167.66138, "r": 380.45328, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "\u201cCombined-Tabnet\u201d", "bbox": {"l": 386.56799, "t": 167.75104, "r": 468.67974999999996, "b": 176.33880999999997, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "and", "bbox": {"l": 474.79599, "t": 167.66138, "r": 489.18198, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "\u201dCombined-", "bbox": {"l": 495.29898000000003, "t": 167.75104, "r": 545.112, "b": 176.33880999999997, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Tabnet\u201d", "bbox": {"l": 308.862, "t": 179.70605, "r": 341.16077, "b": 188.29381999999998, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "are variations of the following: (*) The Combined-", 
"bbox": {"l": 343.457, "t": 179.61639000000002, "r": 545.11005, "b": 188.52295000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Tabnet dataset is the processed combination of PubTabNet", "bbox": {"l": 308.862, "t": 191.57141000000001, "r": 545.11505, "b": 200.47797000000003, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "and Fintabnet. (**) The combined dataset is the processed", "bbox": {"l": 308.862, "t": 203.52643, "r": 545.11499, "b": 212.43298000000004, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "combination of PubTabNet, Fintabnet and TableBank.", "bbox": {"l": 308.862, "t": 215.48242000000005, "r": 523.93469, "b": 224.38897999999995, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "one adopts a colorful appearance with high contrast and the", "bbox": {"l": 308.862, "t": 249.62041999999997, "r": 545.11517, "b": 258.52698, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "last one contains tables with sparse content. Lastly, we have", "bbox": {"l": 308.862, "t": 261.57543999999996, "r": 545.11517, "b": 270.48199, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "combined all synthetic datasets into one big unified syn-", "bbox": {"l": 308.862, "t": 273.5304, "r": 545.11505, "b": 282.43698, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "thetic dataset of 600k examples.", "bbox": {"l": 308.862, "t": 285.48541000000006, "r": 436.82169, "b": 294.39197, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "Tab. 
1 summarizes the various attributes of the datasets.", "bbox": {"l": 320.81699, "t": 297.77240000000006, "r": 542.74396, "b": 306.67896, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "4.", "bbox": {"l": 308.862, "t": 321.18396, "r": 316.28476, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "The TableFormer model", "bbox": {"l": 326.18176, "t": 321.18396, "r": 444.93607000000003, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Given the image of a table, TableFormer is able to pre-", "bbox": {"l": 320.81699, "t": 341.93939, "r": 545.11499, "b": 350.84594999999996, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "dict: 1) a sequence of tokens that represent the structure of", "bbox": {"l": 308.862, "t": 353.89438, "r": 545.11511, "b": 362.80092999999994, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "a table, and 2) a bounding box coupled to a subset of those", "bbox": {"l": 308.862, "t": 365.84937, "r": 545.11517, "b": 374.75592, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "tokens. The conversion of an image into a sequence of to-", "bbox": {"l": 308.862, "t": 377.80435, "r": 545.11505, "b": 386.71091, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "kens is a well-known task [35, 16]. 
While attention is often", "bbox": {"l": 308.862, "t": 389.75934000000007, "r": 545.11517, "b": 398.66588999999993, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "used as an implicit method to associate each token of the", "bbox": {"l": 308.862, "t": 401.71432000000004, "r": 545.11523, "b": 410.62088, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "sequence with a position in the original image, an explicit", "bbox": {"l": 308.862, "t": 413.67032, "r": 545.11517, "b": 422.57687, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "association between the individual table-cells and the image", "bbox": {"l": 308.862, "t": 425.62531, "r": 545.11505, "b": 434.53186, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "bounding boxes is also required.", "bbox": {"l": 308.862, "t": 437.58029, "r": 437.9375, "b": 446.48685000000006, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "4.1.", "bbox": {"l": 308.862, "t": 457.69427, "r": 323.14081, "b": 467.54633, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Model architecture.", "bbox": {"l": 332.66003, "t": 457.69427, "r": 420.16058, "b": 467.54633, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "We now describe in detail the proposed method, which", "bbox": {"l": 320.81699, "t": 476.76529, "r": 545.11487, "b": 485.67184, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "is composed of three main components, see Fig.", "bbox": {"l": 308.862, "t": 488.72028, "r": 509.02054, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "4.", "bbox": {"l": 515.58588, "t": 488.72028, "r": 523.05786, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Our", "bbox": {"l": 529.62323, "t": 488.72028, "r": 545.11505, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "CNN Backbone Network", "bbox": {"l": 308.862, "t": 500.76492, "r": 406.34601, "b": 509.35269, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "encodes the input as a feature vec-", "bbox": {"l": 408.87201, "t": 500.67526, "r": 
545.1106, "b": 509.58182, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "tor of predefined length.", "bbox": {"l": 308.862, "t": 512.63126, "r": 409.39459, "b": 521.53781, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "The input feature vector of the", "bbox": {"l": 416.72705, "t": 512.63126, "r": 545.11505, "b": 521.53781, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "encoded image is passed to the", "bbox": {"l": 308.862, "t": 524.58624, "r": 436.194, "b": 533.4928, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "Structure Decoder", "bbox": {"l": 439.526, "t": 524.6759, "r": 513.86694, "b": 533.26367, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "to pro-", "bbox": {"l": 517.43201, "t": 524.58624, "r": 545.10815, "b": 533.4928, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "duce a sequence of HTML tags that represent the structure", "bbox": {"l": 308.862, "t": 536.54124, "r": 545.11511, "b": 545.4478, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "of the table.", "bbox": {"l": 308.862, "t": 548.49625, "r": 358.5455, "b": 557.4028000000001, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "With each prediction of an HTML standard", "bbox": {"l": 365.19055, "t": 548.49625, "r": 545.11517, "b": 557.4028000000001, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "data cell (\u2018", "bbox": {"l": 308.862, "t": 560.45125, "r": 352.40851, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "<", "bbox": {"l": 352.409, "t": 560.29184, "r": 360.1579, "b": 569.13863, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "td", "bbox": {"l": 360.15799, "t": 560.45125, "r": 367.90891, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": ">", "bbox": {"l": 367.909, "t": 560.29184, "r": 375.6579, "b": 569.13863, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "\u2019) the hidden state of that cell is passed to", "bbox": {"l": 375.65799, "t": 560.45125, "r": 545.11182, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "the 
Cell BBox Decoder. As for spanning cells, such as row", "bbox": {"l": 308.862, "t": 572.40724, "r": 545.11499, "b": 581.3138, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "or column span, the tag is broken down to \u2018", "bbox": {"l": 308.862, "t": 584.3622399999999, "r": 483.11768, "b": 593.2688, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "<", "bbox": {"l": 483.11902, "t": 584.20284, "r": 490.86792, "b": 593.04962, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "\u2019, \u2018rowspan=\u2019", "bbox": {"l": 490.86800999999997, "t": 584.3622399999999, "r": 545.11438, "b": 593.2688, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "or \u2018colspan=\u2019, with the number of spanning cells (attribute),", "bbox": {"l": 308.862, "t": 596.31725, "r": 545.11493, "b": 605.2238, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "and \u2018", "bbox": {"l": 308.862, "t": 608.27225, "r": 329.64395, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": ">", "bbox": {"l": 329.646, "t": 608.11284, "r": 337.3949, "b": 616.9596300000001, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "\u2019. The hidden state attached to \u2018", "bbox": {"l": 337.39398, "t": 608.27225, "r": 468.5914, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "<", "bbox": {"l": 468.59496999999993, "t": 608.11284, "r": 476.34387000000004, "b": 616.9596300000001, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "\u2019 is passed to the", "bbox": {"l": 476.3439599999999, "t": 608.27225, "r": 545.11572, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "Cell BBox Decoder. 
A shared feed forward network (FFN)", "bbox": {"l": 308.86197, "t": 620.22725, "r": 545.11499, "b": 629.1338000000001, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "receives the hidden states from the Structure Decoder, to", "bbox": {"l": 308.86197, "t": 632.1822500000001, "r": 545.11517, "b": 641.08881, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "provide the final detection predictions of the bounding box", "bbox": {"l": 308.86197, "t": 644.13824, "r": 545.11511, "b": 653.0448, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "coordinates and their classification.", "bbox": {"l": 308.86197, "t": 656.09325, "r": 449.42432, "b": 664.99981, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "CNN Backbone Network.", "bbox": {"l": 320.81696, "t": 668.2607, "r": 431.90985, "b": 677.21707, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "A ResNet-18 CNN is the", "bbox": {"l": 439.49896, "t": 668.3802499999999, "r": 545.11255, "b": 677.2868100000001, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "backbone that receives the table image and encodes it as a", "bbox": {"l": 308.86197, "t": 680.33525, "r": 545.11499, "b": 689.24181, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "vector of predefined length. 
The network has been modified", "bbox": {"l": 308.86197, "t": 692.290253, "r": 545.11511, "b": 701.196815, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "by removing the linear and pooling layer, as we are not per-", "bbox": {"l": 308.86197, "t": 704.245255, "r": 545.11505, "b": 713.1518169999999, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "4", "bbox": {"l": 295.12097, "t": 734.133251, "r": 300.10226, "b": 743.039814, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 49.493873596191406, "t": 74.18077087402344, "r": 286.36511, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}, "confidence": 0.9614068269729614, "cells": [{"id": 0, "text": "amount of such tables, and kept only those ones ranging", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36511, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "between 1*1 and 20*10 (rows/columns).", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 212.28319, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "text", "bbox": {"l": 49.20685577392578, "t": 100.19926452636719, "r": 286.6800842285156, "b": 313.10507, "coord_origin": "TOPLEFT"}, "confidence": 0.9880395531654358, "cells": [{"id": 2, "text": "The availability of the bounding boxes for all table cells", "bbox": {"l": 62.067001, "t": 100.96038999999996, "r": 286.36502, "b": 109.86694, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "is essential to train our models. In order to distinguish be-", "bbox": {"l": 50.112, "t": 112.91540999999995, "r": 286.36508, "b": 121.82195999999999, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "tween empty and non-empty bounding boxes, we have in-", "bbox": {"l": 50.112, "t": 124.87041999999997, "r": 286.36508, "b": 133.77697999999998, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "troduced a binary class in the annotation. 
Unfortunately, the", "bbox": {"l": 50.112, "t": 136.82641999999998, "r": 286.36511, "b": 145.73297000000002, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "original datasets either omit the bounding boxes for whole", "bbox": {"l": 50.112, "t": 148.78143, "r": 286.36511, "b": 157.68799, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "tables (e.g. TableBank) or they narrow their scope only to", "bbox": {"l": 50.112, "t": 160.73645, "r": 286.36508, "b": 169.64301, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "non-empty cells. Therefore, it was imperative to introduce", "bbox": {"l": 50.112, "t": 172.69146999999998, "r": 286.36505, "b": 181.59802000000002, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "a data pre-processing procedure that generates the missing", "bbox": {"l": 50.112, "t": 184.64648, "r": 286.36508, "b": 193.55304, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "bounding boxes out of the annotation information. This pro-", "bbox": {"l": 50.112, "t": 196.60248, "r": 286.36508, "b": 205.50903000000005, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "cedure first parses the provided table structure and calcu-", "bbox": {"l": 50.112, "t": 208.5575, "r": 286.36508, "b": 217.46405000000004, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "lates the dimensions of the most fine-grained grid that cov-", "bbox": {"l": 50.112, "t": 220.51251000000002, "r": 286.36511, "b": 229.41907000000003, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "ers the table structure. Notice that each table cell may oc-", "bbox": {"l": 50.112, "t": 232.46753, "r": 286.36508, "b": 241.37408000000005, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "cupy multiple grid squares due to row or column spans. 
In", "bbox": {"l": 50.112, "t": 244.42255, "r": 286.36508, "b": 253.32910000000004, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "case of PubTabNet we had to compute missing bounding", "bbox": {"l": 50.112, "t": 256.37756, "r": 286.36505, "b": 265.28412000000003, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "boxes for 48% of the simple and 69% of the complex ta-", "bbox": {"l": 50.112, "t": 268.33356000000003, "r": 286.36505, "b": 277.24010999999996, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "bles.", "bbox": {"l": 50.112, "t": 280.28853999999995, "r": 68.652397, "b": 289.1951, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "Regarding FinTabNet, 68% of the simple and 98%", "bbox": {"l": 75.566444, "t": 280.28853999999995, "r": 286.36514, "b": 289.1951, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "of the complex tables require the generation of bounding", "bbox": {"l": 50.112, "t": 292.24353, "r": 286.36511, "b": 301.15009, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "boxes.", "bbox": {"l": 50.112, "t": 304.19852000000003, "r": 75.695961, "b": 313.10507, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "text", "bbox": {"l": 49.406982421875, "t": 317.0073547363281, "r": 286.68450927734375, "b": 434.8613586425781, "coord_origin": "TOPLEFT"}, "confidence": 0.9874395728111267, "cells": [{"id": 21, "text": "As it is illustrated in Fig. 2, the table distributions from", "bbox": {"l": 62.067001, "t": 317.99550999999997, "r": 286.36499, "b": 326.90207, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "all datasets are skewed towards simpler structures with", "bbox": {"l": 50.112, "t": 329.95151, "r": 286.36511, "b": 338.8580600000001, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "fewer number of rows/columns. 
Additionally, there is very", "bbox": {"l": 50.112, "t": 341.90649, "r": 286.36502, "b": 350.81305, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "limited variance in the table styles, which in case of Pub-", "bbox": {"l": 50.112, "t": 353.8614799999999, "r": 286.36505, "b": 362.76804, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "TabNet and FinTabNet means one styling format for the", "bbox": {"l": 50.112, "t": 365.81647, "r": 286.36508, "b": 374.72301999999996, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "majority of the tables.", "bbox": {"l": 50.112, "t": 377.77145, "r": 141.58859, "b": 386.67801, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "Similar limitations appear also in", "bbox": {"l": 148.70189, "t": 377.77145, "r": 286.36508, "b": 386.67801, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "the type of table content, which in some cases (e.g. FinTab-", "bbox": {"l": 50.112, "t": 389.72644, "r": 286.36508, "b": 398.63300000000004, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Net) is restricted to a certain domain. 
Ultimately, the lack", "bbox": {"l": 50.112, "t": 401.68243, "r": 286.36511, "b": 410.58899, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "of diversity in the training dataset damages the ability of the", "bbox": {"l": 50.112, "t": 413.63742, "r": 286.36511, "b": 422.54398, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "models to generalize well on unseen data.", "bbox": {"l": 50.112, "t": 425.59241, "r": 216.39774, "b": 434.49896, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "text", "bbox": {"l": 49.20225143432617, "t": 438.12713623046875, "r": 286.90093994140625, "b": 627.7101440429688, "coord_origin": "TOPLEFT"}, "confidence": 0.9876185655593872, "cells": [{"id": 32, "text": "Motivated by those observations we aimed at generating", "bbox": {"l": 62.067001, "t": 439.3894, "r": 286.36499, "b": 448.2959599999999, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "a synthetic table dataset named", "bbox": {"l": 50.112, "t": 451.34439, "r": 172.14388, "b": 460.25095, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "SynthTabNet", "bbox": {"l": 174.14801, "t": 451.43405, "r": 224.70818999999997, "b": 460.02182, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": ". This approach", "bbox": {"l": 224.70801, "t": 451.34439, "r": 286.36655, "b": 460.25095, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "offers control over: 1) the size of the dataset, 2) the table", "bbox": {"l": 50.112015, "t": 463.30038, "r": 286.36505, "b": 472.20694, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "structure, 3) the table style and 4) the type of content. 
The", "bbox": {"l": 50.112015, "t": 475.25537, "r": 286.36511, "b": 484.16193, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "complexity of the table structure is described by the size of", "bbox": {"l": 50.112015, "t": 487.21036, "r": 286.36511, "b": 496.11691, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "the table header and the table body, as well as the percentage", "bbox": {"l": 50.112015, "t": 499.16534, "r": 286.36508, "b": 508.0719, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "of the table cells covered by row spans and column spans.", "bbox": {"l": 50.112015, "t": 511.12033, "r": 286.36505, "b": 520.02689, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "A set of carefully designed styling templates provides the", "bbox": {"l": 50.112015, "t": 523.07632, "r": 286.36508, "b": 531.98288, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "basis to build a wide range of table appearances. Lastly, the", "bbox": {"l": 50.112015, "t": 535.0313100000001, "r": 286.36508, "b": 543.93788, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "table content is generated out of a curated collection of text", "bbox": {"l": 50.112015, "t": 546.98633, "r": 286.36511, "b": 555.89288, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "corpora. By controlling the size and scope of the synthetic", "bbox": {"l": 50.112015, "t": 558.94133, "r": 286.36508, "b": 567.84789, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "datasets we are able to train and evaluate our models in a", "bbox": {"l": 50.112015, "t": 570.89633, "r": 286.36511, "b": 579.8028899999999, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "variety of different conditions. 
For example, we can first", "bbox": {"l": 50.112015, "t": 582.85133, "r": 286.36511, "b": 591.75789, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "generate a highly diverse dataset to train our models and", "bbox": {"l": 50.112015, "t": 594.80733, "r": 286.36505, "b": 603.71388, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "then evaluate their performance on other synthetic datasets", "bbox": {"l": 50.112015, "t": 606.76233, "r": 286.36508, "b": 615.6688800000001, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "which are focused on a specific domain.", "bbox": {"l": 50.112015, "t": 618.71733, "r": 209.7527, "b": 627.62389, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "text", "bbox": {"l": 49.49843215942383, "t": 631.5836181640625, "r": 286.4927978515625, "b": 713.2897338867188, "coord_origin": "TOPLEFT"}, "confidence": 0.9870707392692566, "cells": [{"id": 50, "text": "In this regard, we have prepared four synthetic datasets,", "bbox": {"l": 62.067017, "t": 632.51433, "r": 286.36499, "b": 641.42088, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "each one containing 150k examples. The corpora to gener-", "bbox": {"l": 50.112015, "t": 644.46933, "r": 286.36508, "b": 653.37589, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "ate the table text consists of the most frequent terms appear-", "bbox": {"l": 50.112015, "t": 656.42532, "r": 286.36511, "b": 665.33189, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "ing in PubTabNet and FinTabNet together with randomly", "bbox": {"l": 50.112015, "t": 668.38033, "r": 286.36505, "b": 677.28689, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "generated text. 
The first two synthetic datasets have been", "bbox": {"l": 50.112015, "t": 680.33533, "r": 286.36508, "b": 689.24189, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "fine-tuned to mimic the appearance of the original datasets", "bbox": {"l": 50.112015, "t": 692.290329, "r": 286.36508, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "but encompass more complicated table structures. The third", "bbox": {"l": 50.112015, "t": 704.245331, "r": 286.36511, "b": 713.151894, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "table", "bbox": {"l": 310.6772766113281, "t": 73.19307708740234, "r": 542.958251953125, "b": 155.2208251953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9777666330337524, "cells": [{"id": 57, "text": "Tags", "bbox": {"l": 412.332, "t": 73.61437999999998, "r": 430.90231, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Bbox", "bbox": {"l": 442.85742, "t": 73.61437999999998, "r": 464.4463799999999, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Size", "bbox": {"l": 477.78632, "t": 73.61437999999998, "r": 494.94193, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "Format", "bbox": {"l": 508.28186, "t": 73.61437999999998, "r": 536.91437, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "PubTabNet", "bbox": {"l": 317.06, "t": 85.9673499999999, "r": 361.64264, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "3", "bbox": {"l": 417.85599, "t": 85.6684600000001, "r": 425.37775, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "3", "bbox": {"l": 449.89569, "t": 85.6684600000001, "r": 457.41745000000003, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "509k", "bbox": {"l": 476.401, "t": 85.9673499999999, "r": 496.3262, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PNG", "bbox": {"l": 512.63495, "t": 85.9673499999999, "r": 532.56012, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, 
{"id": 66, "text": "FinTabNet", "bbox": {"l": 317.06, "t": 97.92236000000003, "r": 359.43094, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "3", "bbox": {"l": 417.85599, "t": 97.62347, "r": 425.37775, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "3", "bbox": {"l": 449.89569, "t": 97.62347, "r": 457.41745000000003, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "112k", "bbox": {"l": 476.401, "t": 97.92236000000003, "r": 496.3262, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PDF", "bbox": {"l": 513.46185, "t": 97.92236000000003, "r": 531.73328, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "TableBank", "bbox": {"l": 317.06, "t": 109.87836000000004, "r": 359.97888, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "3", "bbox": {"l": 417.85599, "t": 109.57947000000001, "r": 425.37775, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "7", "bbox": {"l": 450.81226, "t": 109.57947000000001, "r": 456.50091999999995, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "145k", "bbox": {"l": 476.401, "t": 109.87836000000004, "r": 496.3262, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "JPEG", "bbox": {"l": 511.25017999999994, "t": 109.87836000000004, "r": 533.94501, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Combined-Tabnet(*)", "bbox": {"l": 317.06, "t": 121.83336999999995, "r": 400.37723, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "3", "bbox": {"l": 417.85599, "t": 121.53448000000003, "r": 425.37775, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "3", "bbox": {"l": 449.89569, "t": 121.53448000000003, "r": 457.41745000000003, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "400k", "bbox": {"l": 476.401, "t": 121.83336999999995, "r": 
496.3262, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "PNG", "bbox": {"l": 512.63495, "t": 121.83336999999995, "r": 532.56012, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Combined(**)", "bbox": {"l": 317.06, "t": 133.78839000000005, "r": 375.17184, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "3", "bbox": {"l": 417.85599, "t": 133.48950000000002, "r": 425.37775, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "3", "bbox": {"l": 449.89569, "t": 133.48950000000002, "r": 457.41745000000003, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "500k", "bbox": {"l": 476.401, "t": 133.78839000000005, "r": 496.3262, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PNG", "bbox": {"l": 512.63495, "t": 133.78839000000005, "r": 532.56012, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "SynthTabNet", "bbox": {"l": 317.06, "t": 145.74341000000004, "r": 369.39352, "b": 154.64995999999996, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "3", "bbox": {"l": 417.85599, "t": 145.44446000000005, "r": 425.37775, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "3", "bbox": {"l": 449.89569, "t": 145.44446000000005, "r": 457.41745000000003, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "600k", "bbox": {"l": 476.401, "t": 145.74334999999996, "r": 496.3262, "b": 154.6499, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "PNG", "bbox": {"l": 512.63495, "t": 145.74334999999996, "r": 532.56012, "b": 154.6499, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "caption", "bbox": {"l": 307.88067626953125, "t": 167.05953979492188, "r": 545.414306640625, "b": 224.38897999999995, "coord_origin": "TOPLEFT"}, "confidence": 0.9668342471122742, "cells": [{"id": 91, "text": "Table 1:", "bbox": {"l": 308.862, "t": 167.66138, "r": 344.6178, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, 
{"id": 92, "text": "Both", "bbox": {"l": 361.07602, "t": 167.66138, "r": 380.45328, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "\u201cCombined-Tabnet\u201d", "bbox": {"l": 386.56799, "t": 167.75104, "r": 468.67974999999996, "b": 176.33880999999997, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "and", "bbox": {"l": 474.79599, "t": 167.66138, "r": 489.18198, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "\u201dCombined-", "bbox": {"l": 495.29898000000003, "t": 167.75104, "r": 545.112, "b": 176.33880999999997, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Tabnet\u201d", "bbox": {"l": 308.862, "t": 179.70605, "r": 341.16077, "b": 188.29381999999998, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "are variations of the following: (*) The Combined-", "bbox": {"l": 343.457, "t": 179.61639000000002, "r": 545.11005, "b": 188.52295000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Tabnet dataset is the processed combination of PubTabNet", "bbox": {"l": 308.862, "t": 191.57141000000001, "r": 545.11505, "b": 200.47797000000003, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "and Fintabnet. (**) The combined dataset is the processed", "bbox": {"l": 308.862, "t": 203.52643, "r": 545.11499, "b": 212.43298000000004, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "combination of PubTabNet, Fintabnet and TableBank.", "bbox": {"l": 308.862, "t": 215.48242000000005, "r": 523.93469, "b": 224.38897999999995, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "text", "bbox": {"l": 307.9458923339844, "t": 248.9580078125, "r": 545.1485595703125, "b": 294.39197, "coord_origin": "TOPLEFT"}, "confidence": 0.9791285991668701, "cells": [{"id": 101, "text": "one adopts a colorful appearance with high contrast and the", "bbox": {"l": 308.862, "t": 249.62041999999997, "r": 545.11517, "b": 258.52698, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "last one contains tables with sparse content. 
Lastly, we have", "bbox": {"l": 308.862, "t": 261.57543999999996, "r": 545.11517, "b": 270.48199, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "combined all synthetic datasets into one big unified syn-", "bbox": {"l": 308.862, "t": 273.5304, "r": 545.11505, "b": 282.43698, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "thetic dataset of 600k examples.", "bbox": {"l": 308.862, "t": 285.48541000000006, "r": 436.82169, "b": 294.39197, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "text", "bbox": {"l": 320.0391845703125, "t": 297.109375, "r": 542.74396, "b": 306.67896, "coord_origin": "TOPLEFT"}, "confidence": 0.9049100279808044, "cells": [{"id": 105, "text": "Tab. 1 summarizes the various attributes of the datasets.", "bbox": {"l": 320.81699, "t": 297.77240000000006, "r": 542.74396, "b": 306.67896, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "section_header", "bbox": {"l": 307.9829406738281, "t": 320.3982849121094, "r": 444.93607000000003, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}, "confidence": 0.946631908416748, "cells": [{"id": 106, "text": "4.", "bbox": {"l": 308.862, "t": 321.18396, "r": 316.28476, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "The TableFormer model", "bbox": {"l": 326.18176, "t": 321.18396, "r": 444.93607000000003, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "text", "bbox": {"l": 307.7573547363281, "t": 340.8102111816406, "r": 545.5172119140625, "b": 447.71630859375, "coord_origin": "TOPLEFT"}, "confidence": 0.988465428352356, "cells": [{"id": 108, "text": "Given the image of a table, TableFormer is able to pre-", "bbox": {"l": 320.81699, "t": 341.93939, "r": 545.11499, "b": 350.84594999999996, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "dict: 1) a sequence of tokens that represent the structure of", "bbox": {"l": 308.862, "t": 353.89438, "r": 545.11511, "b": 362.80092999999994, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "a table, and 2) a bounding 
box coupled to a subset of those", "bbox": {"l": 308.862, "t": 365.84937, "r": 545.11517, "b": 374.75592, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "tokens. The conversion of an image into a sequence of to-", "bbox": {"l": 308.862, "t": 377.80435, "r": 545.11505, "b": 386.71091, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "kens is a well-known task [35, 16]. While attention is often", "bbox": {"l": 308.862, "t": 389.75934000000007, "r": 545.11517, "b": 398.66588999999993, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "used as an implicit method to associate each token of the", "bbox": {"l": 308.862, "t": 401.71432000000004, "r": 545.11523, "b": 410.62088, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "sequence with a position in the original image, an explicit", "bbox": {"l": 308.862, "t": 413.67032, "r": 545.11517, "b": 422.57687, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "association between the individual table-cells and the image", "bbox": {"l": 308.862, "t": 425.62531, "r": 545.11505, "b": 434.53186, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "bounding boxes is also required.", "bbox": {"l": 308.862, "t": 437.58029, "r": 437.9375, "b": 446.48685000000006, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "section_header", "bbox": {"l": 307.8253173828125, "t": 457.084716796875, "r": 420.16058, "b": 467.54633, "coord_origin": "TOPLEFT"}, "confidence": 0.9376939535140991, "cells": [{"id": 117, "text": "4.1.", "bbox": {"l": 308.862, "t": 457.69427, "r": 323.14081, "b": 467.54633, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Model architecture.", "bbox": {"l": 332.66003, "t": 457.69427, "r": 420.16058, "b": 467.54633, "coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "text", "bbox": {"l": 307.7397155761719, "t": 475.38958740234375, "r": 545.5748901367188, "b": 664.99981, "coord_origin": "TOPLEFT"}, "confidence": 0.9878638386726379, "cells": [{"id": 119, "text": "We now describe in detail the proposed method, which", "bbox": 
{"l": 320.81699, "t": 476.76529, "r": 545.11487, "b": 485.67184, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "is composed of three main components, see Fig.", "bbox": {"l": 308.862, "t": 488.72028, "r": 509.02054, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "4.", "bbox": {"l": 515.58588, "t": 488.72028, "r": 523.05786, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Our", "bbox": {"l": 529.62323, "t": 488.72028, "r": 545.11505, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "CNN Backbone Network", "bbox": {"l": 308.862, "t": 500.76492, "r": 406.34601, "b": 509.35269, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "encodes the input as a feature vec-", "bbox": {"l": 408.87201, "t": 500.67526, "r": 545.1106, "b": 509.58182, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "tor of predefined length.", "bbox": {"l": 308.862, "t": 512.63126, "r": 409.39459, "b": 521.53781, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "The input feature vector of the", "bbox": {"l": 416.72705, "t": 512.63126, "r": 545.11505, "b": 521.53781, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "encoded image is passed to the", "bbox": {"l": 308.862, "t": 524.58624, "r": 436.194, "b": 533.4928, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "Structure Decoder", "bbox": {"l": 439.526, "t": 524.6759, "r": 513.86694, "b": 533.26367, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "to pro-", "bbox": {"l": 517.43201, "t": 524.58624, "r": 545.10815, "b": 533.4928, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "duce a sequence of HTML tags that represent the structure", "bbox": {"l": 308.862, "t": 536.54124, "r": 545.11511, "b": 545.4478, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "of the table.", "bbox": {"l": 308.862, "t": 548.49625, "r": 358.5455, "b": 557.4028000000001, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "With each prediction of an HTML standard", "bbox": {"l": 365.19055, "t": 548.49625, "r": 545.11517, 
"b": 557.4028000000001, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "data cell (\u2018", "bbox": {"l": 308.862, "t": 560.45125, "r": 352.40851, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "<", "bbox": {"l": 352.409, "t": 560.29184, "r": 360.1579, "b": 569.13863, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "td", "bbox": {"l": 360.15799, "t": 560.45125, "r": 367.90891, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": ">", "bbox": {"l": 367.909, "t": 560.29184, "r": 375.6579, "b": 569.13863, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "\u2019) the hidden state of that cell is passed to", "bbox": {"l": 375.65799, "t": 560.45125, "r": 545.11182, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "the Cell BBox Decoder. As for spanning cells, such as row", "bbox": {"l": 308.862, "t": 572.40724, "r": 545.11499, "b": 581.3138, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "or column span, the tag is broken down to \u2018", "bbox": {"l": 308.862, "t": 584.3622399999999, "r": 483.11768, "b": 593.2688, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "<", "bbox": {"l": 483.11902, "t": 584.20284, "r": 490.86792, "b": 593.04962, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "\u2019, \u2018rowspan=\u2019", "bbox": {"l": 490.86800999999997, "t": 584.3622399999999, "r": 545.11438, "b": 593.2688, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "or \u2018colspan=\u2019, with the number of spanning cells (attribute),", "bbox": {"l": 308.862, "t": 596.31725, "r": 545.11493, "b": 605.2238, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "and \u2018", "bbox": {"l": 308.862, "t": 608.27225, "r": 329.64395, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": ">", "bbox": {"l": 329.646, "t": 608.11284, "r": 337.3949, "b": 616.9596300000001, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "\u2019. 
The hidden state attached to \u2018", "bbox": {"l": 337.39398, "t": 608.27225, "r": 468.5914, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "<", "bbox": {"l": 468.59496999999993, "t": 608.11284, "r": 476.34387000000004, "b": 616.9596300000001, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "\u2019 is passed to the", "bbox": {"l": 476.3439599999999, "t": 608.27225, "r": 545.11572, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "Cell BBox Decoder. A shared feed forward network (FFN)", "bbox": {"l": 308.86197, "t": 620.22725, "r": 545.11499, "b": 629.1338000000001, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "receives the hidden states from the Structure Decoder, to", "bbox": {"l": 308.86197, "t": 632.1822500000001, "r": 545.11517, "b": 641.08881, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "provide the final detection predictions of the bounding box", "bbox": {"l": 308.86197, "t": 644.13824, "r": 545.11511, "b": 653.0448, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "coordinates and their classification.", "bbox": {"l": 308.86197, "t": 656.09325, "r": 449.42432, "b": 664.99981, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "text", "bbox": {"l": 307.9578552246094, "t": 667.443115234375, "r": 545.4042358398438, "b": 713.30419921875, "coord_origin": "TOPLEFT"}, "confidence": 0.9786657691001892, "cells": [{"id": 152, "text": "CNN Backbone Network.", "bbox": {"l": 320.81696, "t": 668.2607, "r": 431.90985, "b": 677.21707, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "A ResNet-18 CNN is the", "bbox": {"l": 439.49896, "t": 668.3802499999999, "r": 545.11255, "b": 677.2868100000001, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "backbone that receives the table image and encodes it as a", "bbox": {"l": 308.86197, "t": 680.33525, "r": 545.11499, "b": 689.24181, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "vector of predefined length. 
The network has been modified", "bbox": {"l": 308.86197, "t": 692.290253, "r": 545.11511, "b": 701.196815, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "by removing the linear and pooling layer, as we are not per-", "bbox": {"l": 308.86197, "t": 704.245255, "r": 545.11505, "b": 713.1518169999999, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "page_footer", "bbox": {"l": 294.33935546875, "t": 733.3119506835938, "r": 300.1784973144531, "b": 743.039814, "coord_origin": "TOPLEFT"}, "confidence": 0.8562639951705933, "cells": [{"id": 157, "text": "4", "bbox": {"l": 295.12097, "t": 734.133251, "r": 300.10226, "b": 743.039814, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {"5": {"label": "table", "id": 5, "page_no": 3, "cluster": {"id": 5, "label": "table", "bbox": {"l": 310.6772766113281, "t": 73.19307708740234, "r": 542.958251953125, "b": 155.2208251953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9777666330337524, "cells": [{"id": 57, "text": "Tags", "bbox": {"l": 412.332, "t": 73.61437999999998, "r": 430.90231, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Bbox", "bbox": {"l": 442.85742, "t": 73.61437999999998, "r": 464.4463799999999, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Size", "bbox": {"l": 477.78632, "t": 73.61437999999998, "r": 494.94193, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "Format", "bbox": {"l": 508.28186, "t": 73.61437999999998, "r": 536.91437, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "PubTabNet", "bbox": {"l": 317.06, "t": 85.9673499999999, "r": 361.64264, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "3", "bbox": {"l": 417.85599, "t": 85.6684600000001, "r": 425.37775, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "3", "bbox": {"l": 449.89569, "t": 85.6684600000001, "r": 457.41745000000003, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "509k", "bbox": {"l": 
476.401, "t": 85.9673499999999, "r": 496.3262, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PNG", "bbox": {"l": 512.63495, "t": 85.9673499999999, "r": 532.56012, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "FinTabNet", "bbox": {"l": 317.06, "t": 97.92236000000003, "r": 359.43094, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "3", "bbox": {"l": 417.85599, "t": 97.62347, "r": 425.37775, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "3", "bbox": {"l": 449.89569, "t": 97.62347, "r": 457.41745000000003, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "112k", "bbox": {"l": 476.401, "t": 97.92236000000003, "r": 496.3262, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PDF", "bbox": {"l": 513.46185, "t": 97.92236000000003, "r": 531.73328, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "TableBank", "bbox": {"l": 317.06, "t": 109.87836000000004, "r": 359.97888, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "3", "bbox": {"l": 417.85599, "t": 109.57947000000001, "r": 425.37775, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "7", "bbox": {"l": 450.81226, "t": 109.57947000000001, "r": 456.50091999999995, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "145k", "bbox": {"l": 476.401, "t": 109.87836000000004, "r": 496.3262, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "JPEG", "bbox": {"l": 511.25017999999994, "t": 109.87836000000004, "r": 533.94501, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Combined-Tabnet(*)", "bbox": {"l": 317.06, "t": 121.83336999999995, "r": 400.37723, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "3", "bbox": {"l": 417.85599, "t": 121.53448000000003, "r": 425.37775, "b": 130.74987999999996, "coord_origin": 
"TOPLEFT"}}, {"id": 78, "text": "3", "bbox": {"l": 449.89569, "t": 121.53448000000003, "r": 457.41745000000003, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "400k", "bbox": {"l": 476.401, "t": 121.83336999999995, "r": 496.3262, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "PNG", "bbox": {"l": 512.63495, "t": 121.83336999999995, "r": 532.56012, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Combined(**)", "bbox": {"l": 317.06, "t": 133.78839000000005, "r": 375.17184, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "3", "bbox": {"l": 417.85599, "t": 133.48950000000002, "r": 425.37775, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "3", "bbox": {"l": 449.89569, "t": 133.48950000000002, "r": 457.41745000000003, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "500k", "bbox": {"l": 476.401, "t": 133.78839000000005, "r": 496.3262, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PNG", "bbox": {"l": 512.63495, "t": 133.78839000000005, "r": 532.56012, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "SynthTabNet", "bbox": {"l": 317.06, "t": 145.74341000000004, "r": 369.39352, "b": 154.64995999999996, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "3", "bbox": {"l": 417.85599, "t": 145.44446000000005, "r": 425.37775, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "3", "bbox": {"l": 449.89569, "t": 145.44446000000005, "r": 457.41745000000003, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "600k", "bbox": {"l": 476.401, "t": 145.74334999999996, "r": 496.3262, "b": 154.6499, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "PNG", "bbox": {"l": 512.63495, "t": 145.74334999999996, "r": 532.56012, "b": 154.6499, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ecel", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", 
"fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 5, "table_cells": [{"bbox": {"l": 412.332, "t": 73.61437999999998, "r": 430.90231, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Tags", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 442.85742, "t": 73.61437999999998, "r": 464.4463799999999, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Bbox", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 477.78632, "t": 73.61437999999998, "r": 494.94193, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "Size", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 508.28186, "t": 73.61437999999998, "r": 536.91437, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "Format", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 85.9673499999999, "r": 361.64264, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 85.6684600000001, 
"r": 425.37775, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 85.6684600000001, "r": 457.41745000000003, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 85.9673499999999, "r": 496.3262, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "509k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 85.9673499999999, "r": 532.56012, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 97.92236000000003, "r": 359.43094, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 97.62347, "r": 425.37775, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 97.62347, "r": 
457.41745000000003, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 97.92236000000003, "r": 496.3262, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "112k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.46185, "t": 97.92236000000003, "r": 531.73328, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PDF", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 109.87836000000004, "r": 359.97888, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableBank", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 109.57947000000001, "r": 425.37775, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 450.81226, "t": 109.57947000000001, "r": 456.50091999999995, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, 
"t": 109.87836000000004, "r": 496.3262, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "145k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 511.25017999999994, "t": 109.87836000000004, "r": 533.94501, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "JPEG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 121.83336999999995, "r": 400.37723, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined-Tabnet(*)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 121.53448000000003, "r": 425.37775, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 121.53448000000003, "r": 457.41745000000003, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 121.83336999999995, "r": 496.3262, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "400k", "column_header": false, "row_header": false, 
"row_section": false}, {"bbox": {"l": 512.63495, "t": 121.83336999999995, "r": 532.56012, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 133.78839000000005, "r": 375.17184, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined(**)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 133.48950000000002, "r": 425.37775, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 133.48950000000002, "r": 457.41745000000003, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 133.78839000000005, "r": 496.3262, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "500k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 133.78839000000005, "r": 532.56012, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", 
"column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 145.74341000000004, "r": 369.39352, "b": 154.64995999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "SynthTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 145.44446000000005, "r": 425.37775, "b": 154.65985, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 145.44446000000005, "r": 457.41745000000003, "b": 154.65985, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 145.74334999999996, "r": 496.3262, "b": 154.6499, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "600k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 145.74334999999996, "r": 532.56012, "b": 154.6499, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}]}}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 3, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.493873596191406, "t": 74.18077087402344, "r": 286.36511, "b": 96.06994999999995, 
"coord_origin": "TOPLEFT"}, "confidence": 0.9614068269729614, "cells": [{"id": 0, "text": "amount of such tables, and kept only those ones ranging", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36511, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "between 1*1 and 20*10 (rows/columns).", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 212.28319, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}]}, "text": "amount of such tables, and kept only those ones ranging between 1*1 and 20*10 (rows/columns)."}, {"label": "text", "id": 1, "page_no": 3, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.20685577392578, "t": 100.19926452636719, "r": 286.6800842285156, "b": 313.10507, "coord_origin": "TOPLEFT"}, "confidence": 0.9880395531654358, "cells": [{"id": 2, "text": "The availability of the bounding boxes for all table cells", "bbox": {"l": 62.067001, "t": 100.96038999999996, "r": 286.36502, "b": 109.86694, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "is essential to train our models. In order to distinguish be-", "bbox": {"l": 50.112, "t": 112.91540999999995, "r": 286.36508, "b": 121.82195999999999, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "tween empty and non-empty bounding boxes, we have in-", "bbox": {"l": 50.112, "t": 124.87041999999997, "r": 286.36508, "b": 133.77697999999998, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "troduced a binary class in the annotation. Unfortunately, the", "bbox": {"l": 50.112, "t": 136.82641999999998, "r": 286.36511, "b": 145.73297000000002, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "original datasets either omit the bounding boxes for whole", "bbox": {"l": 50.112, "t": 148.78143, "r": 286.36511, "b": 157.68799, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "tables (e.g. TableBank) or they narrow their scope only to", "bbox": {"l": 50.112, "t": 160.73645, "r": 286.36508, "b": 169.64301, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "non-empty cells. 
Therefore, it was imperative to introduce", "bbox": {"l": 50.112, "t": 172.69146999999998, "r": 286.36505, "b": 181.59802000000002, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "a data pre-processing procedure that generates the missing", "bbox": {"l": 50.112, "t": 184.64648, "r": 286.36508, "b": 193.55304, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "bounding boxes out of the annotation information. This pro-", "bbox": {"l": 50.112, "t": 196.60248, "r": 286.36508, "b": 205.50903000000005, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "cedure first parses the provided table structure and calcu-", "bbox": {"l": 50.112, "t": 208.5575, "r": 286.36508, "b": 217.46405000000004, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "lates the dimensions of the most fine-grained grid that cov-", "bbox": {"l": 50.112, "t": 220.51251000000002, "r": 286.36511, "b": 229.41907000000003, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "ers the table structure. Notice that each table cell may oc-", "bbox": {"l": 50.112, "t": 232.46753, "r": 286.36508, "b": 241.37408000000005, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "cupy multiple grid squares due to row or column spans. 
In", "bbox": {"l": 50.112, "t": 244.42255, "r": 286.36508, "b": 253.32910000000004, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "case of PubTabNet we had to compute missing bounding", "bbox": {"l": 50.112, "t": 256.37756, "r": 286.36505, "b": 265.28412000000003, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "boxes for 48% of the simple and 69% of the complex ta-", "bbox": {"l": 50.112, "t": 268.33356000000003, "r": 286.36505, "b": 277.24010999999996, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "bles.", "bbox": {"l": 50.112, "t": 280.28853999999995, "r": 68.652397, "b": 289.1951, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "Regarding FinTabNet, 68% of the simple and 98%", "bbox": {"l": 75.566444, "t": 280.28853999999995, "r": 286.36514, "b": 289.1951, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "of the complex tables require the generation of bounding", "bbox": {"l": 50.112, "t": 292.24353, "r": 286.36511, "b": 301.15009, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "boxes.", "bbox": {"l": 50.112, "t": 304.19852000000003, "r": 75.695961, "b": 313.10507, "coord_origin": "TOPLEFT"}}]}, "text": "The availability of the bounding boxes for all table cells is essential to train our models. In order to distinguish between empty and non-empty bounding boxes, we have introduced a binary class in the annotation. Unfortunately, the original datasets either omit the bounding boxes for whole tables (e.g. TableBank) or they narrow their scope only to non-empty cells. Therefore, it was imperative to introduce a data pre-processing procedure that generates the missing bounding boxes out of the annotation information. This procedure first parses the provided table structure and calculates the dimensions of the most fine-grained grid that covers the table structure. Notice that each table cell may occupy multiple grid squares due to row or column spans. 
In case of PubTabNet we had to compute missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes."}, {"label": "text", "id": 2, "page_no": 3, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.406982421875, "t": 317.0073547363281, "r": 286.68450927734375, "b": 434.8613586425781, "coord_origin": "TOPLEFT"}, "confidence": 0.9874395728111267, "cells": [{"id": 21, "text": "As it is illustrated in Fig. 2, the table distributions from", "bbox": {"l": 62.067001, "t": 317.99550999999997, "r": 286.36499, "b": 326.90207, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "all datasets are skewed towards simpler structures with", "bbox": {"l": 50.112, "t": 329.95151, "r": 286.36511, "b": 338.8580600000001, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "fewer number of rows/columns. Additionally, there is very", "bbox": {"l": 50.112, "t": 341.90649, "r": 286.36502, "b": 350.81305, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "limited variance in the table styles, which in case of Pub-", "bbox": {"l": 50.112, "t": 353.8614799999999, "r": 286.36505, "b": 362.76804, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "TabNet and FinTabNet means one styling format for the", "bbox": {"l": 50.112, "t": 365.81647, "r": 286.36508, "b": 374.72301999999996, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "majority of the tables.", "bbox": {"l": 50.112, "t": 377.77145, "r": 141.58859, "b": 386.67801, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "Similar limitations appear also in", "bbox": {"l": 148.70189, "t": 377.77145, "r": 286.36508, "b": 386.67801, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "the type of table content, which in some cases (e.g. FinTab-", "bbox": {"l": 50.112, "t": 389.72644, "r": 286.36508, "b": 398.63300000000004, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Net) is restricted to a certain domain. 
Ultimately, the lack", "bbox": {"l": 50.112, "t": 401.68243, "r": 286.36511, "b": 410.58899, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "of diversity in the training dataset damages the ability of the", "bbox": {"l": 50.112, "t": 413.63742, "r": 286.36511, "b": 422.54398, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "models to generalize well on unseen data.", "bbox": {"l": 50.112, "t": 425.59241, "r": 216.39774, "b": 434.49896, "coord_origin": "TOPLEFT"}}]}, "text": "As it is illustrated in Fig. 2, the table distributions from all datasets are skewed towards simpler structures with fewer number of rows/columns. Additionally, there is very limited variance in the table styles, which in case of PubTabNet and FinTabNet means one styling format for the majority of the tables. Similar limitations appear also in the type of table content, which in some cases (e.g. FinTabNet) is restricted to a certain domain. Ultimately, the lack of diversity in the training dataset damages the ability of the models to generalize well on unseen data."}, {"label": "text", "id": 3, "page_no": 3, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.20225143432617, "t": 438.12713623046875, "r": 286.90093994140625, "b": 627.7101440429688, "coord_origin": "TOPLEFT"}, "confidence": 0.9876185655593872, "cells": [{"id": 32, "text": "Motivated by those observations we aimed at generating", "bbox": {"l": 62.067001, "t": 439.3894, "r": 286.36499, "b": 448.2959599999999, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "a synthetic table dataset named", "bbox": {"l": 50.112, "t": 451.34439, "r": 172.14388, "b": 460.25095, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "SynthTabNet", "bbox": {"l": 174.14801, "t": 451.43405, "r": 224.70818999999997, "b": 460.02182, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": ". 
This approach", "bbox": {"l": 224.70801, "t": 451.34439, "r": 286.36655, "b": 460.25095, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "offers control over: 1) the size of the dataset, 2) the table", "bbox": {"l": 50.112015, "t": 463.30038, "r": 286.36505, "b": 472.20694, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "structure, 3) the table style and 4) the type of content. The", "bbox": {"l": 50.112015, "t": 475.25537, "r": 286.36511, "b": 484.16193, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "complexity of the table structure is described by the size of", "bbox": {"l": 50.112015, "t": 487.21036, "r": 286.36511, "b": 496.11691, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "the table header and the table body, as well as the percentage", "bbox": {"l": 50.112015, "t": 499.16534, "r": 286.36508, "b": 508.0719, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "of the table cells covered by row spans and column spans.", "bbox": {"l": 50.112015, "t": 511.12033, "r": 286.36505, "b": 520.02689, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "A set of carefully designed styling templates provides the", "bbox": {"l": 50.112015, "t": 523.07632, "r": 286.36508, "b": 531.98288, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "basis to build a wide range of table appearances. Lastly, the", "bbox": {"l": 50.112015, "t": 535.0313100000001, "r": 286.36508, "b": 543.93788, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "table content is generated out of a curated collection of text", "bbox": {"l": 50.112015, "t": 546.98633, "r": 286.36511, "b": 555.89288, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "corpora. 
By controlling the size and scope of the synthetic", "bbox": {"l": 50.112015, "t": 558.94133, "r": 286.36508, "b": 567.84789, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "datasets we are able to train and evaluate our models in a", "bbox": {"l": 50.112015, "t": 570.89633, "r": 286.36511, "b": 579.8028899999999, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "variety of different conditions. For example, we can first", "bbox": {"l": 50.112015, "t": 582.85133, "r": 286.36511, "b": 591.75789, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "generate a highly diverse dataset to train our models and", "bbox": {"l": 50.112015, "t": 594.80733, "r": 286.36505, "b": 603.71388, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "then evaluate their performance on other synthetic datasets", "bbox": {"l": 50.112015, "t": 606.76233, "r": 286.36508, "b": 615.6688800000001, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "which are focused on a specific domain.", "bbox": {"l": 50.112015, "t": 618.71733, "r": 209.7527, "b": 627.62389, "coord_origin": "TOPLEFT"}}]}, "text": "Motivated by those observations we aimed at generating a synthetic table dataset named SynthTabNet . This approach offers control over: 1) the size of the dataset, 2) the table structure, 3) the table style and 4) the type of content. The complexity of the table structure is described by the size of the table header and the table body, as well as the percentage of the table cells covered by row spans and column spans. A set of carefully designed styling templates provides the basis to build a wide range of table appearances. Lastly, the table content is generated out of a curated collection of text corpora. By controlling the size and scope of the synthetic datasets we are able to train and evaluate our models in a variety of different conditions. 
For example, we can first generate a highly diverse dataset to train our models and then evaluate their performance on other synthetic datasets which are focused on a specific domain."}, {"label": "text", "id": 4, "page_no": 3, "cluster": {"id": 4, "label": "text", "bbox": {"l": 49.49843215942383, "t": 631.5836181640625, "r": 286.4927978515625, "b": 713.2897338867188, "coord_origin": "TOPLEFT"}, "confidence": 0.9870707392692566, "cells": [{"id": 50, "text": "In this regard, we have prepared four synthetic datasets,", "bbox": {"l": 62.067017, "t": 632.51433, "r": 286.36499, "b": 641.42088, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "each one containing 150k examples. The corpora to gener-", "bbox": {"l": 50.112015, "t": 644.46933, "r": 286.36508, "b": 653.37589, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "ate the table text consists of the most frequent terms appear-", "bbox": {"l": 50.112015, "t": 656.42532, "r": 286.36511, "b": 665.33189, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "ing in PubTabNet and FinTabNet together with randomly", "bbox": {"l": 50.112015, "t": 668.38033, "r": 286.36505, "b": 677.28689, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "generated text. The first two synthetic datasets have been", "bbox": {"l": 50.112015, "t": 680.33533, "r": 286.36508, "b": 689.24189, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "fine-tuned to mimic the appearance of the original datasets", "bbox": {"l": 50.112015, "t": 692.290329, "r": 286.36508, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "but encompass more complicated table structures. The third", "bbox": {"l": 50.112015, "t": 704.245331, "r": 286.36511, "b": 713.151894, "coord_origin": "TOPLEFT"}}]}, "text": "In this regard, we have prepared four synthetic datasets, each one containing 150k examples. The corpora to generate the table text consists of the most frequent terms appearing in PubTabNet and FinTabNet together with randomly generated text. 
The first two synthetic datasets have been fine-tuned to mimic the appearance of the original datasets but encompass more complicated table structures. The third"}, {"label": "table", "id": 5, "page_no": 3, "cluster": {"id": 5, "label": "table", "bbox": {"l": 310.6772766113281, "t": 73.19307708740234, "r": 542.958251953125, "b": 155.2208251953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9777666330337524, "cells": [{"id": 57, "text": "Tags", "bbox": {"l": 412.332, "t": 73.61437999999998, "r": 430.90231, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Bbox", "bbox": {"l": 442.85742, "t": 73.61437999999998, "r": 464.4463799999999, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Size", "bbox": {"l": 477.78632, "t": 73.61437999999998, "r": 494.94193, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "Format", "bbox": {"l": 508.28186, "t": 73.61437999999998, "r": 536.91437, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "PubTabNet", "bbox": {"l": 317.06, "t": 85.9673499999999, "r": 361.64264, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "3", "bbox": {"l": 417.85599, "t": 85.6684600000001, "r": 425.37775, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "3", "bbox": {"l": 449.89569, "t": 85.6684600000001, "r": 457.41745000000003, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "509k", "bbox": {"l": 476.401, "t": 85.9673499999999, "r": 496.3262, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PNG", "bbox": {"l": 512.63495, "t": 85.9673499999999, "r": 532.56012, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "FinTabNet", "bbox": {"l": 317.06, "t": 97.92236000000003, "r": 359.43094, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "3", "bbox": {"l": 417.85599, "t": 97.62347, "r": 425.37775, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": 
"3", "bbox": {"l": 449.89569, "t": 97.62347, "r": 457.41745000000003, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "112k", "bbox": {"l": 476.401, "t": 97.92236000000003, "r": 496.3262, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PDF", "bbox": {"l": 513.46185, "t": 97.92236000000003, "r": 531.73328, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "TableBank", "bbox": {"l": 317.06, "t": 109.87836000000004, "r": 359.97888, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "3", "bbox": {"l": 417.85599, "t": 109.57947000000001, "r": 425.37775, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "7", "bbox": {"l": 450.81226, "t": 109.57947000000001, "r": 456.50091999999995, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "145k", "bbox": {"l": 476.401, "t": 109.87836000000004, "r": 496.3262, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "JPEG", "bbox": {"l": 511.25017999999994, "t": 109.87836000000004, "r": 533.94501, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Combined-Tabnet(*)", "bbox": {"l": 317.06, "t": 121.83336999999995, "r": 400.37723, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "3", "bbox": {"l": 417.85599, "t": 121.53448000000003, "r": 425.37775, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "3", "bbox": {"l": 449.89569, "t": 121.53448000000003, "r": 457.41745000000003, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "400k", "bbox": {"l": 476.401, "t": 121.83336999999995, "r": 496.3262, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "PNG", "bbox": {"l": 512.63495, "t": 121.83336999999995, "r": 532.56012, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Combined(**)", "bbox": {"l": 317.06, "t": 133.78839000000005, "r": 
375.17184, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "3", "bbox": {"l": 417.85599, "t": 133.48950000000002, "r": 425.37775, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "3", "bbox": {"l": 449.89569, "t": 133.48950000000002, "r": 457.41745000000003, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "500k", "bbox": {"l": 476.401, "t": 133.78839000000005, "r": 496.3262, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PNG", "bbox": {"l": 512.63495, "t": 133.78839000000005, "r": 532.56012, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "SynthTabNet", "bbox": {"l": 317.06, "t": 145.74341000000004, "r": 369.39352, "b": 154.64995999999996, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "3", "bbox": {"l": 417.85599, "t": 145.44446000000005, "r": 425.37775, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "3", "bbox": {"l": 449.89569, "t": 145.44446000000005, "r": 457.41745000000003, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "600k", "bbox": {"l": 476.401, "t": 145.74334999999996, "r": 496.3262, "b": 154.6499, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "PNG", "bbox": {"l": 512.63495, "t": 145.74334999999996, "r": 532.56012, "b": 154.6499, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ecel", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 5, "table_cells": [{"bbox": {"l": 412.332, "t": 73.61437999999998, "r": 430.90231, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": 
"Tags", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 442.85742, "t": 73.61437999999998, "r": 464.4463799999999, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Bbox", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 477.78632, "t": 73.61437999999998, "r": 494.94193, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "Size", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 508.28186, "t": 73.61437999999998, "r": 536.91437, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "Format", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 85.9673499999999, "r": 361.64264, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 85.6684600000001, "r": 425.37775, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 85.6684600000001, "r": 457.41745000000003, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": 
"3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 85.9673499999999, "r": 496.3262, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "509k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 85.9673499999999, "r": 532.56012, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 97.92236000000003, "r": 359.43094, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 97.62347, "r": 425.37775, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 97.62347, "r": 457.41745000000003, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 97.92236000000003, "r": 496.3262, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": 
"112k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.46185, "t": 97.92236000000003, "r": 531.73328, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PDF", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 109.87836000000004, "r": 359.97888, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableBank", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 109.57947000000001, "r": 425.37775, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 450.81226, "t": 109.57947000000001, "r": 456.50091999999995, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 109.87836000000004, "r": 496.3262, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "145k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 511.25017999999994, "t": 109.87836000000004, "r": 533.94501, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 
4, "end_col_offset_idx": 5, "text": "JPEG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 121.83336999999995, "r": 400.37723, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined-Tabnet(*)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 121.53448000000003, "r": 425.37775, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 121.53448000000003, "r": 457.41745000000003, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 121.83336999999995, "r": 496.3262, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "400k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 121.83336999999995, "r": 532.56012, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 133.78839000000005, "r": 375.17184, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, 
"end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined(**)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 133.48950000000002, "r": 425.37775, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 133.48950000000002, "r": 457.41745000000003, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 133.78839000000005, "r": 496.3262, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "500k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 133.78839000000005, "r": 532.56012, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 145.74341000000004, "r": 369.39352, "b": 154.64995999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "SynthTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 145.44446000000005, "r": 425.37775, "b": 154.65985, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 
1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 145.44446000000005, "r": 457.41745000000003, "b": 154.65985, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 145.74334999999996, "r": 496.3262, "b": 154.6499, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "600k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 145.74334999999996, "r": 532.56012, "b": 154.6499, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "caption", "id": 6, "page_no": 3, "cluster": {"id": 6, "label": "caption", "bbox": {"l": 307.88067626953125, "t": 167.05953979492188, "r": 545.414306640625, "b": 224.38897999999995, "coord_origin": "TOPLEFT"}, "confidence": 0.9668342471122742, "cells": [{"id": 91, "text": "Table 1:", "bbox": {"l": 308.862, "t": 167.66138, "r": 344.6178, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Both", "bbox": {"l": 361.07602, "t": 167.66138, "r": 380.45328, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "\u201cCombined-Tabnet\u201d", "bbox": {"l": 386.56799, "t": 167.75104, "r": 468.67974999999996, "b": 176.33880999999997, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "and", "bbox": {"l": 474.79599, "t": 167.66138, "r": 489.18198, "b": 176.56793000000005, 
"coord_origin": "TOPLEFT"}}, {"id": 95, "text": "\u201dCombined-", "bbox": {"l": 495.29898000000003, "t": 167.75104, "r": 545.112, "b": 176.33880999999997, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Tabnet\u201d", "bbox": {"l": 308.862, "t": 179.70605, "r": 341.16077, "b": 188.29381999999998, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "are variations of the following: (*) The Combined-", "bbox": {"l": 343.457, "t": 179.61639000000002, "r": 545.11005, "b": 188.52295000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Tabnet dataset is the processed combination of PubTabNet", "bbox": {"l": 308.862, "t": 191.57141000000001, "r": 545.11505, "b": 200.47797000000003, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "and Fintabnet. (**) The combined dataset is the processed", "bbox": {"l": 308.862, "t": 203.52643, "r": 545.11499, "b": 212.43298000000004, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "combination of PubTabNet, Fintabnet and TableBank.", "bbox": {"l": 308.862, "t": 215.48242000000005, "r": 523.93469, "b": 224.38897999999995, "coord_origin": "TOPLEFT"}}]}, "text": "Table 1: Both \u201cCombined-Tabnet\u201d and \u201dCombinedTabnet\u201d are variations of the following: (*) The CombinedTabnet dataset is the processed combination of PubTabNet and Fintabnet. (**) The combined dataset is the processed combination of PubTabNet, Fintabnet and TableBank."}, {"label": "text", "id": 7, "page_no": 3, "cluster": {"id": 7, "label": "text", "bbox": {"l": 307.9458923339844, "t": 248.9580078125, "r": 545.1485595703125, "b": 294.39197, "coord_origin": "TOPLEFT"}, "confidence": 0.9791285991668701, "cells": [{"id": 101, "text": "one adopts a colorful appearance with high contrast and the", "bbox": {"l": 308.862, "t": 249.62041999999997, "r": 545.11517, "b": 258.52698, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "last one contains tables with sparse content. 
Lastly, we have", "bbox": {"l": 308.862, "t": 261.57543999999996, "r": 545.11517, "b": 270.48199, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "combined all synthetic datasets into one big unified syn-", "bbox": {"l": 308.862, "t": 273.5304, "r": 545.11505, "b": 282.43698, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "thetic dataset of 600k examples.", "bbox": {"l": 308.862, "t": 285.48541000000006, "r": 436.82169, "b": 294.39197, "coord_origin": "TOPLEFT"}}]}, "text": "one adopts a colorful appearance with high contrast and the last one contains tables with sparse content. Lastly, we have combined all synthetic datasets into one big unified synthetic dataset of 600k examples."}, {"label": "text", "id": 8, "page_no": 3, "cluster": {"id": 8, "label": "text", "bbox": {"l": 320.0391845703125, "t": 297.109375, "r": 542.74396, "b": 306.67896, "coord_origin": "TOPLEFT"}, "confidence": 0.9049100279808044, "cells": [{"id": 105, "text": "Tab. 1 summarizes the various attributes of the datasets.", "bbox": {"l": 320.81699, "t": 297.77240000000006, "r": 542.74396, "b": 306.67896, "coord_origin": "TOPLEFT"}}]}, "text": "Tab. 1 summarizes the various attributes of the datasets."}, {"label": "section_header", "id": 9, "page_no": 3, "cluster": {"id": 9, "label": "section_header", "bbox": {"l": 307.9829406738281, "t": 320.3982849121094, "r": 444.93607000000003, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}, "confidence": 0.946631908416748, "cells": [{"id": 106, "text": "4.", "bbox": {"l": 308.862, "t": 321.18396, "r": 316.28476, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "The TableFormer model", "bbox": {"l": 326.18176, "t": 321.18396, "r": 444.93607000000003, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}}]}, "text": "4. 
The TableFormer model"}, {"label": "text", "id": 10, "page_no": 3, "cluster": {"id": 10, "label": "text", "bbox": {"l": 307.7573547363281, "t": 340.8102111816406, "r": 545.5172119140625, "b": 447.71630859375, "coord_origin": "TOPLEFT"}, "confidence": 0.988465428352356, "cells": [{"id": 108, "text": "Given the image of a table, TableFormer is able to pre-", "bbox": {"l": 320.81699, "t": 341.93939, "r": 545.11499, "b": 350.84594999999996, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "dict: 1) a sequence of tokens that represent the structure of", "bbox": {"l": 308.862, "t": 353.89438, "r": 545.11511, "b": 362.80092999999994, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "a table, and 2) a bounding box coupled to a subset of those", "bbox": {"l": 308.862, "t": 365.84937, "r": 545.11517, "b": 374.75592, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "tokens. The conversion of an image into a sequence of to-", "bbox": {"l": 308.862, "t": 377.80435, "r": 545.11505, "b": 386.71091, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "kens is a well-known task [35, 16]. 
While attention is often", "bbox": {"l": 308.862, "t": 389.75934000000007, "r": 545.11517, "b": 398.66588999999993, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "used as an implicit method to associate each token of the", "bbox": {"l": 308.862, "t": 401.71432000000004, "r": 545.11523, "b": 410.62088, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "sequence with a position in the original image, an explicit", "bbox": {"l": 308.862, "t": 413.67032, "r": 545.11517, "b": 422.57687, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "association between the individual table-cells and the image", "bbox": {"l": 308.862, "t": 425.62531, "r": 545.11505, "b": 434.53186, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "bounding boxes is also required.", "bbox": {"l": 308.862, "t": 437.58029, "r": 437.9375, "b": 446.48685000000006, "coord_origin": "TOPLEFT"}}]}, "text": "Given the image of a table, TableFormer is able to predict: 1) a sequence of tokens that represent the structure of a table, and 2) a bounding box coupled to a subset of those tokens. The conversion of an image into a sequence of tokens is a well-known task [35, 16]. While attention is often used as an implicit method to associate each token of the sequence with a position in the original image, an explicit association between the individual table-cells and the image bounding boxes is also required."}, {"label": "section_header", "id": 11, "page_no": 3, "cluster": {"id": 11, "label": "section_header", "bbox": {"l": 307.8253173828125, "t": 457.084716796875, "r": 420.16058, "b": 467.54633, "coord_origin": "TOPLEFT"}, "confidence": 0.9376939535140991, "cells": [{"id": 117, "text": "4.1.", "bbox": {"l": 308.862, "t": 457.69427, "r": 323.14081, "b": 467.54633, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Model architecture.", "bbox": {"l": 332.66003, "t": 457.69427, "r": 420.16058, "b": 467.54633, "coord_origin": "TOPLEFT"}}]}, "text": "4.1. 
Model architecture."}, {"label": "text", "id": 12, "page_no": 3, "cluster": {"id": 12, "label": "text", "bbox": {"l": 307.7397155761719, "t": 475.38958740234375, "r": 545.5748901367188, "b": 664.99981, "coord_origin": "TOPLEFT"}, "confidence": 0.9878638386726379, "cells": [{"id": 119, "text": "We now describe in detail the proposed method, which", "bbox": {"l": 320.81699, "t": 476.76529, "r": 545.11487, "b": 485.67184, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "is composed of three main components, see Fig.", "bbox": {"l": 308.862, "t": 488.72028, "r": 509.02054, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "4.", "bbox": {"l": 515.58588, "t": 488.72028, "r": 523.05786, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Our", "bbox": {"l": 529.62323, "t": 488.72028, "r": 545.11505, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "CNN Backbone Network", "bbox": {"l": 308.862, "t": 500.76492, "r": 406.34601, "b": 509.35269, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "encodes the input as a feature vec-", "bbox": {"l": 408.87201, "t": 500.67526, "r": 545.1106, "b": 509.58182, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "tor of predefined length.", "bbox": {"l": 308.862, "t": 512.63126, "r": 409.39459, "b": 521.53781, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "The input feature vector of the", "bbox": {"l": 416.72705, "t": 512.63126, "r": 545.11505, "b": 521.53781, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "encoded image is passed to the", "bbox": {"l": 308.862, "t": 524.58624, "r": 436.194, "b": 533.4928, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "Structure Decoder", "bbox": {"l": 439.526, "t": 524.6759, "r": 513.86694, "b": 533.26367, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "to pro-", "bbox": {"l": 517.43201, "t": 524.58624, "r": 545.10815, "b": 533.4928, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "duce a sequence of HTML tags that represent the structure", 
"bbox": {"l": 308.862, "t": 536.54124, "r": 545.11511, "b": 545.4478, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "of the table.", "bbox": {"l": 308.862, "t": 548.49625, "r": 358.5455, "b": 557.4028000000001, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "With each prediction of an HTML standard", "bbox": {"l": 365.19055, "t": 548.49625, "r": 545.11517, "b": 557.4028000000001, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "data cell (\u2018", "bbox": {"l": 308.862, "t": 560.45125, "r": 352.40851, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "<", "bbox": {"l": 352.409, "t": 560.29184, "r": 360.1579, "b": 569.13863, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "td", "bbox": {"l": 360.15799, "t": 560.45125, "r": 367.90891, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": ">", "bbox": {"l": 367.909, "t": 560.29184, "r": 375.6579, "b": 569.13863, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "\u2019) the hidden state of that cell is passed to", "bbox": {"l": 375.65799, "t": 560.45125, "r": 545.11182, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "the Cell BBox Decoder. 
As for spanning cells, such as row", "bbox": {"l": 308.862, "t": 572.40724, "r": 545.11499, "b": 581.3138, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "or column span, the tag is broken down to \u2018", "bbox": {"l": 308.862, "t": 584.3622399999999, "r": 483.11768, "b": 593.2688, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "<", "bbox": {"l": 483.11902, "t": 584.20284, "r": 490.86792, "b": 593.04962, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "\u2019, \u2018rowspan=\u2019", "bbox": {"l": 490.86800999999997, "t": 584.3622399999999, "r": 545.11438, "b": 593.2688, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "or \u2018colspan=\u2019, with the number of spanning cells (attribute),", "bbox": {"l": 308.862, "t": 596.31725, "r": 545.11493, "b": 605.2238, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "and \u2018", "bbox": {"l": 308.862, "t": 608.27225, "r": 329.64395, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": ">", "bbox": {"l": 329.646, "t": 608.11284, "r": 337.3949, "b": 616.9596300000001, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "\u2019. The hidden state attached to \u2018", "bbox": {"l": 337.39398, "t": 608.27225, "r": 468.5914, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "<", "bbox": {"l": 468.59496999999993, "t": 608.11284, "r": 476.34387000000004, "b": 616.9596300000001, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "\u2019 is passed to the", "bbox": {"l": 476.3439599999999, "t": 608.27225, "r": 545.11572, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "Cell BBox Decoder. 
A shared feed forward network (FFN)", "bbox": {"l": 308.86197, "t": 620.22725, "r": 545.11499, "b": 629.1338000000001, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "receives the hidden states from the Structure Decoder, to", "bbox": {"l": 308.86197, "t": 632.1822500000001, "r": 545.11517, "b": 641.08881, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "provide the final detection predictions of the bounding box", "bbox": {"l": 308.86197, "t": 644.13824, "r": 545.11511, "b": 653.0448, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "coordinates and their classification.", "bbox": {"l": 308.86197, "t": 656.09325, "r": 449.42432, "b": 664.99981, "coord_origin": "TOPLEFT"}}]}, "text": "We now describe in detail the proposed method, which is composed of three main components, see Fig. 4. Our CNN Backbone Network encodes the input as a feature vector of predefined length. The input feature vector of the encoded image is passed to the Structure Decoder to produce a sequence of HTML tags that represent the structure of the table. With each prediction of an HTML standard data cell (\u2018 < td > \u2019) the hidden state of that cell is passed to the Cell BBox Decoder. As for spanning cells, such as row or column span, the tag is broken down to \u2018 < \u2019, \u2018rowspan=\u2019 or \u2018colspan=\u2019, with the number of spanning cells (attribute), and \u2018 > \u2019. The hidden state attached to \u2018 < \u2019 is passed to the Cell BBox Decoder. 
A shared feed forward network (FFN) receives the hidden states from the Structure Decoder, to provide the final detection predictions of the bounding box coordinates and their classification."}, {"label": "text", "id": 13, "page_no": 3, "cluster": {"id": 13, "label": "text", "bbox": {"l": 307.9578552246094, "t": 667.443115234375, "r": 545.4042358398438, "b": 713.30419921875, "coord_origin": "TOPLEFT"}, "confidence": 0.9786657691001892, "cells": [{"id": 152, "text": "CNN Backbone Network.", "bbox": {"l": 320.81696, "t": 668.2607, "r": 431.90985, "b": 677.21707, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "A ResNet-18 CNN is the", "bbox": {"l": 439.49896, "t": 668.3802499999999, "r": 545.11255, "b": 677.2868100000001, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "backbone that receives the table image and encodes it as a", "bbox": {"l": 308.86197, "t": 680.33525, "r": 545.11499, "b": 689.24181, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "vector of predefined length. The network has been modified", "bbox": {"l": 308.86197, "t": 692.290253, "r": 545.11511, "b": 701.196815, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "by removing the linear and pooling layer, as we are not per-", "bbox": {"l": 308.86197, "t": 704.245255, "r": 545.11505, "b": 713.1518169999999, "coord_origin": "TOPLEFT"}}]}, "text": "CNN Backbone Network. A ResNet-18 CNN is the backbone that receives the table image and encodes it as a vector of predefined length. 
The network has been modified by removing the linear and pooling layer, as we are not per-"}, {"label": "page_footer", "id": 14, "page_no": 3, "cluster": {"id": 14, "label": "page_footer", "bbox": {"l": 294.33935546875, "t": 733.3119506835938, "r": 300.1784973144531, "b": 743.039814, "coord_origin": "TOPLEFT"}, "confidence": 0.8562639951705933, "cells": [{"id": 157, "text": "4", "bbox": {"l": 295.12097, "t": 734.133251, "r": 300.10226, "b": 743.039814, "coord_origin": "TOPLEFT"}}]}, "text": "4"}], "body": [{"label": "text", "id": 0, "page_no": 3, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.493873596191406, "t": 74.18077087402344, "r": 286.36511, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}, "confidence": 0.9614068269729614, "cells": [{"id": 0, "text": "amount of such tables, and kept only those ones ranging", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36511, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "between 1*1 and 20*10 (rows/columns).", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 212.28319, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}]}, "text": "amount of such tables, and kept only those ones ranging between 1*1 and 20*10 (rows/columns)."}, {"label": "text", "id": 1, "page_no": 3, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.20685577392578, "t": 100.19926452636719, "r": 286.6800842285156, "b": 313.10507, "coord_origin": "TOPLEFT"}, "confidence": 0.9880395531654358, "cells": [{"id": 2, "text": "The availability of the bounding boxes for all table cells", "bbox": {"l": 62.067001, "t": 100.96038999999996, "r": 286.36502, "b": 109.86694, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "is essential to train our models. 
In order to distinguish be-", "bbox": {"l": 50.112, "t": 112.91540999999995, "r": 286.36508, "b": 121.82195999999999, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "tween empty and non-empty bounding boxes, we have in-", "bbox": {"l": 50.112, "t": 124.87041999999997, "r": 286.36508, "b": 133.77697999999998, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "troduced a binary class in the annotation. Unfortunately, the", "bbox": {"l": 50.112, "t": 136.82641999999998, "r": 286.36511, "b": 145.73297000000002, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "original datasets either omit the bounding boxes for whole", "bbox": {"l": 50.112, "t": 148.78143, "r": 286.36511, "b": 157.68799, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "tables (e.g. TableBank) or they narrow their scope only to", "bbox": {"l": 50.112, "t": 160.73645, "r": 286.36508, "b": 169.64301, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "non-empty cells. Therefore, it was imperative to introduce", "bbox": {"l": 50.112, "t": 172.69146999999998, "r": 286.36505, "b": 181.59802000000002, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "a data pre-processing procedure that generates the missing", "bbox": {"l": 50.112, "t": 184.64648, "r": 286.36508, "b": 193.55304, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "bounding boxes out of the annotation information. This pro-", "bbox": {"l": 50.112, "t": 196.60248, "r": 286.36508, "b": 205.50903000000005, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "cedure first parses the provided table structure and calcu-", "bbox": {"l": 50.112, "t": 208.5575, "r": 286.36508, "b": 217.46405000000004, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "lates the dimensions of the most fine-grained grid that cov-", "bbox": {"l": 50.112, "t": 220.51251000000002, "r": 286.36511, "b": 229.41907000000003, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "ers the table structure. 
Notice that each table cell may oc-", "bbox": {"l": 50.112, "t": 232.46753, "r": 286.36508, "b": 241.37408000000005, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "cupy multiple grid squares due to row or column spans. In", "bbox": {"l": 50.112, "t": 244.42255, "r": 286.36508, "b": 253.32910000000004, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "case of PubTabNet we had to compute missing bounding", "bbox": {"l": 50.112, "t": 256.37756, "r": 286.36505, "b": 265.28412000000003, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "boxes for 48% of the simple and 69% of the complex ta-", "bbox": {"l": 50.112, "t": 268.33356000000003, "r": 286.36505, "b": 277.24010999999996, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "bles.", "bbox": {"l": 50.112, "t": 280.28853999999995, "r": 68.652397, "b": 289.1951, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "Regarding FinTabNet, 68% of the simple and 98%", "bbox": {"l": 75.566444, "t": 280.28853999999995, "r": 286.36514, "b": 289.1951, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "of the complex tables require the generation of bounding", "bbox": {"l": 50.112, "t": 292.24353, "r": 286.36511, "b": 301.15009, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "boxes.", "bbox": {"l": 50.112, "t": 304.19852000000003, "r": 75.695961, "b": 313.10507, "coord_origin": "TOPLEFT"}}]}, "text": "The availability of the bounding boxes for all table cells is essential to train our models. In order to distinguish between empty and non-empty bounding boxes, we have introduced a binary class in the annotation. Unfortunately, the original datasets either omit the bounding boxes for whole tables (e.g. TableBank) or they narrow their scope only to non-empty cells. Therefore, it was imperative to introduce a data pre-processing procedure that generates the missing bounding boxes out of the annotation information. 
This procedure first parses the provided table structure and calculates the dimensions of the most fine-grained grid that covers the table structure. Notice that each table cell may occupy multiple grid squares due to row or column spans. In case of PubTabNet we had to compute missing bounding boxes for 48% of the simple and 69% of the complex tables. Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes."}, {"label": "text", "id": 2, "page_no": 3, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.406982421875, "t": 317.0073547363281, "r": 286.68450927734375, "b": 434.8613586425781, "coord_origin": "TOPLEFT"}, "confidence": 0.9874395728111267, "cells": [{"id": 21, "text": "As it is illustrated in Fig. 2, the table distributions from", "bbox": {"l": 62.067001, "t": 317.99550999999997, "r": 286.36499, "b": 326.90207, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "all datasets are skewed towards simpler structures with", "bbox": {"l": 50.112, "t": 329.95151, "r": 286.36511, "b": 338.8580600000001, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "fewer number of rows/columns. 
Additionally, there is very", "bbox": {"l": 50.112, "t": 341.90649, "r": 286.36502, "b": 350.81305, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "limited variance in the table styles, which in case of Pub-", "bbox": {"l": 50.112, "t": 353.8614799999999, "r": 286.36505, "b": 362.76804, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "TabNet and FinTabNet means one styling format for the", "bbox": {"l": 50.112, "t": 365.81647, "r": 286.36508, "b": 374.72301999999996, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "majority of the tables.", "bbox": {"l": 50.112, "t": 377.77145, "r": 141.58859, "b": 386.67801, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "Similar limitations appear also in", "bbox": {"l": 148.70189, "t": 377.77145, "r": 286.36508, "b": 386.67801, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "the type of table content, which in some cases (e.g. FinTab-", "bbox": {"l": 50.112, "t": 389.72644, "r": 286.36508, "b": 398.63300000000004, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Net) is restricted to a certain domain. Ultimately, the lack", "bbox": {"l": 50.112, "t": 401.68243, "r": 286.36511, "b": 410.58899, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "of diversity in the training dataset damages the ability of the", "bbox": {"l": 50.112, "t": 413.63742, "r": 286.36511, "b": 422.54398, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "models to generalize well on unseen data.", "bbox": {"l": 50.112, "t": 425.59241, "r": 216.39774, "b": 434.49896, "coord_origin": "TOPLEFT"}}]}, "text": "As it is illustrated in Fig. 2, the table distributions from all datasets are skewed towards simpler structures with fewer number of rows/columns. Additionally, there is very limited variance in the table styles, which in case of PubTabNet and FinTabNet means one styling format for the majority of the tables. Similar limitations appear also in the type of table content, which in some cases (e.g. FinTabNet) is restricted to a certain domain. 
Ultimately, the lack of diversity in the training dataset damages the ability of the models to generalize well on unseen data."}, {"label": "text", "id": 3, "page_no": 3, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.20225143432617, "t": 438.12713623046875, "r": 286.90093994140625, "b": 627.7101440429688, "coord_origin": "TOPLEFT"}, "confidence": 0.9876185655593872, "cells": [{"id": 32, "text": "Motivated by those observations we aimed at generating", "bbox": {"l": 62.067001, "t": 439.3894, "r": 286.36499, "b": 448.2959599999999, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "a synthetic table dataset named", "bbox": {"l": 50.112, "t": 451.34439, "r": 172.14388, "b": 460.25095, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "SynthTabNet", "bbox": {"l": 174.14801, "t": 451.43405, "r": 224.70818999999997, "b": 460.02182, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": ". This approach", "bbox": {"l": 224.70801, "t": 451.34439, "r": 286.36655, "b": 460.25095, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "offers control over: 1) the size of the dataset, 2) the table", "bbox": {"l": 50.112015, "t": 463.30038, "r": 286.36505, "b": 472.20694, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "structure, 3) the table style and 4) the type of content. 
The", "bbox": {"l": 50.112015, "t": 475.25537, "r": 286.36511, "b": 484.16193, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "complexity of the table structure is described by the size of", "bbox": {"l": 50.112015, "t": 487.21036, "r": 286.36511, "b": 496.11691, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "the table header and the table body, as well as the percentage", "bbox": {"l": 50.112015, "t": 499.16534, "r": 286.36508, "b": 508.0719, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "of the table cells covered by row spans and column spans.", "bbox": {"l": 50.112015, "t": 511.12033, "r": 286.36505, "b": 520.02689, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "A set of carefully designed styling templates provides the", "bbox": {"l": 50.112015, "t": 523.07632, "r": 286.36508, "b": 531.98288, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "basis to build a wide range of table appearances. Lastly, the", "bbox": {"l": 50.112015, "t": 535.0313100000001, "r": 286.36508, "b": 543.93788, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "table content is generated out of a curated collection of text", "bbox": {"l": 50.112015, "t": 546.98633, "r": 286.36511, "b": 555.89288, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "corpora. By controlling the size and scope of the synthetic", "bbox": {"l": 50.112015, "t": 558.94133, "r": 286.36508, "b": 567.84789, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "datasets we are able to train and evaluate our models in a", "bbox": {"l": 50.112015, "t": 570.89633, "r": 286.36511, "b": 579.8028899999999, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "variety of different conditions. 
For example, we can first", "bbox": {"l": 50.112015, "t": 582.85133, "r": 286.36511, "b": 591.75789, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "generate a highly diverse dataset to train our models and", "bbox": {"l": 50.112015, "t": 594.80733, "r": 286.36505, "b": 603.71388, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "then evaluate their performance on other synthetic datasets", "bbox": {"l": 50.112015, "t": 606.76233, "r": 286.36508, "b": 615.6688800000001, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "which are focused on a specific domain.", "bbox": {"l": 50.112015, "t": 618.71733, "r": 209.7527, "b": 627.62389, "coord_origin": "TOPLEFT"}}]}, "text": "Motivated by those observations we aimed at generating a synthetic table dataset named SynthTabNet . This approach offers control over: 1) the size of the dataset, 2) the table structure, 3) the table style and 4) the type of content. The complexity of the table structure is described by the size of the table header and the table body, as well as the percentage of the table cells covered by row spans and column spans. A set of carefully designed styling templates provides the basis to build a wide range of table appearances. Lastly, the table content is generated out of a curated collection of text corpora. By controlling the size and scope of the synthetic datasets we are able to train and evaluate our models in a variety of different conditions. 
For example, we can first generate a highly diverse dataset to train our models and then evaluate their performance on other synthetic datasets which are focused on a specific domain."}, {"label": "text", "id": 4, "page_no": 3, "cluster": {"id": 4, "label": "text", "bbox": {"l": 49.49843215942383, "t": 631.5836181640625, "r": 286.4927978515625, "b": 713.2897338867188, "coord_origin": "TOPLEFT"}, "confidence": 0.9870707392692566, "cells": [{"id": 50, "text": "In this regard, we have prepared four synthetic datasets,", "bbox": {"l": 62.067017, "t": 632.51433, "r": 286.36499, "b": 641.42088, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "each one containing 150k examples. The corpora to gener-", "bbox": {"l": 50.112015, "t": 644.46933, "r": 286.36508, "b": 653.37589, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "ate the table text consists of the most frequent terms appear-", "bbox": {"l": 50.112015, "t": 656.42532, "r": 286.36511, "b": 665.33189, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "ing in PubTabNet and FinTabNet together with randomly", "bbox": {"l": 50.112015, "t": 668.38033, "r": 286.36505, "b": 677.28689, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "generated text. The first two synthetic datasets have been", "bbox": {"l": 50.112015, "t": 680.33533, "r": 286.36508, "b": 689.24189, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "fine-tuned to mimic the appearance of the original datasets", "bbox": {"l": 50.112015, "t": 692.290329, "r": 286.36508, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "but encompass more complicated table structures. The third", "bbox": {"l": 50.112015, "t": 704.245331, "r": 286.36511, "b": 713.151894, "coord_origin": "TOPLEFT"}}]}, "text": "In this regard, we have prepared four synthetic datasets, each one containing 150k examples. The corpora to generate the table text consists of the most frequent terms appearing in PubTabNet and FinTabNet together with randomly generated text. 
The first two synthetic datasets have been fine-tuned to mimic the appearance of the original datasets but encompass more complicated table structures. The third"}, {"label": "table", "id": 5, "page_no": 3, "cluster": {"id": 5, "label": "table", "bbox": {"l": 310.6772766113281, "t": 73.19307708740234, "r": 542.958251953125, "b": 155.2208251953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9777666330337524, "cells": [{"id": 57, "text": "Tags", "bbox": {"l": 412.332, "t": 73.61437999999998, "r": 430.90231, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Bbox", "bbox": {"l": 442.85742, "t": 73.61437999999998, "r": 464.4463799999999, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Size", "bbox": {"l": 477.78632, "t": 73.61437999999998, "r": 494.94193, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "Format", "bbox": {"l": 508.28186, "t": 73.61437999999998, "r": 536.91437, "b": 82.52094, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "PubTabNet", "bbox": {"l": 317.06, "t": 85.9673499999999, "r": 361.64264, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "3", "bbox": {"l": 417.85599, "t": 85.6684600000001, "r": 425.37775, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "3", "bbox": {"l": 449.89569, "t": 85.6684600000001, "r": 457.41745000000003, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "509k", "bbox": {"l": 476.401, "t": 85.9673499999999, "r": 496.3262, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PNG", "bbox": {"l": 512.63495, "t": 85.9673499999999, "r": 532.56012, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "FinTabNet", "bbox": {"l": 317.06, "t": 97.92236000000003, "r": 359.43094, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "3", "bbox": {"l": 417.85599, "t": 97.62347, "r": 425.37775, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": 
"3", "bbox": {"l": 449.89569, "t": 97.62347, "r": 457.41745000000003, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "112k", "bbox": {"l": 476.401, "t": 97.92236000000003, "r": 496.3262, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PDF", "bbox": {"l": 513.46185, "t": 97.92236000000003, "r": 531.73328, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "TableBank", "bbox": {"l": 317.06, "t": 109.87836000000004, "r": 359.97888, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "3", "bbox": {"l": 417.85599, "t": 109.57947000000001, "r": 425.37775, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "7", "bbox": {"l": 450.81226, "t": 109.57947000000001, "r": 456.50091999999995, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "145k", "bbox": {"l": 476.401, "t": 109.87836000000004, "r": 496.3262, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "JPEG", "bbox": {"l": 511.25017999999994, "t": 109.87836000000004, "r": 533.94501, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Combined-Tabnet(*)", "bbox": {"l": 317.06, "t": 121.83336999999995, "r": 400.37723, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "3", "bbox": {"l": 417.85599, "t": 121.53448000000003, "r": 425.37775, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "3", "bbox": {"l": 449.89569, "t": 121.53448000000003, "r": 457.41745000000003, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "400k", "bbox": {"l": 476.401, "t": 121.83336999999995, "r": 496.3262, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "PNG", "bbox": {"l": 512.63495, "t": 121.83336999999995, "r": 532.56012, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Combined(**)", "bbox": {"l": 317.06, "t": 133.78839000000005, "r": 
375.17184, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "3", "bbox": {"l": 417.85599, "t": 133.48950000000002, "r": 425.37775, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "3", "bbox": {"l": 449.89569, "t": 133.48950000000002, "r": 457.41745000000003, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "500k", "bbox": {"l": 476.401, "t": 133.78839000000005, "r": 496.3262, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PNG", "bbox": {"l": 512.63495, "t": 133.78839000000005, "r": 532.56012, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "SynthTabNet", "bbox": {"l": 317.06, "t": 145.74341000000004, "r": 369.39352, "b": 154.64995999999996, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "3", "bbox": {"l": 417.85599, "t": 145.44446000000005, "r": 425.37775, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "3", "bbox": {"l": 449.89569, "t": 145.44446000000005, "r": 457.41745000000003, "b": 154.65985, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "600k", "bbox": {"l": 476.401, "t": 145.74334999999996, "r": 496.3262, "b": 154.6499, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "PNG", "bbox": {"l": 512.63495, "t": 145.74334999999996, "r": 532.56012, "b": 154.6499, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ecel", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 5, "table_cells": [{"bbox": {"l": 412.332, "t": 73.61437999999998, "r": 430.90231, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": 
"Tags", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 442.85742, "t": 73.61437999999998, "r": 464.4463799999999, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Bbox", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 477.78632, "t": 73.61437999999998, "r": 494.94193, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "Size", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 508.28186, "t": 73.61437999999998, "r": 536.91437, "b": 82.52094, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "Format", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 85.9673499999999, "r": 361.64264, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "PubTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 85.6684600000001, "r": 425.37775, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 85.6684600000001, "r": 457.41745000000003, "b": 94.88385000000017, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": 
"3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 85.9673499999999, "r": 496.3262, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "509k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 85.9673499999999, "r": 532.56012, "b": 94.87390000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 97.92236000000003, "r": 359.43094, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "FinTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 97.62347, "r": 425.37775, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 97.62347, "r": 457.41745000000003, "b": 106.83887000000016, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 97.92236000000003, "r": 496.3262, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": 
"112k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.46185, "t": 97.92236000000003, "r": 531.73328, "b": 106.82892000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PDF", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 109.87836000000004, "r": 359.97888, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableBank", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 109.57947000000001, "r": 425.37775, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 450.81226, "t": 109.57947000000001, "r": 456.50091999999995, "b": 118.79485999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 109.87836000000004, "r": 496.3262, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "145k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 511.25017999999994, "t": 109.87836000000004, "r": 533.94501, "b": 118.78490999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 
4, "end_col_offset_idx": 5, "text": "JPEG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 121.83336999999995, "r": 400.37723, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined-Tabnet(*)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 121.53448000000003, "r": 425.37775, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 121.53448000000003, "r": 457.41745000000003, "b": 130.74987999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 121.83336999999995, "r": 496.3262, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "400k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 121.83336999999995, "r": 532.56012, "b": 130.73992999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 133.78839000000005, "r": 375.17184, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, 
"end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Combined(**)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 133.48950000000002, "r": 425.37775, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 133.48950000000002, "r": 457.41745000000003, "b": 142.70489999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 133.78839000000005, "r": 496.3262, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "500k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 133.78839000000005, "r": 532.56012, "b": 142.69494999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 317.06, "t": 145.74341000000004, "r": 369.39352, "b": 154.64995999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "SynthTabNet", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 417.85599, "t": 145.44446000000005, "r": 425.37775, "b": 154.65985, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 
1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 449.89569, "t": 145.44446000000005, "r": 457.41745000000003, "b": 154.65985, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 476.401, "t": 145.74334999999996, "r": 496.3262, "b": 154.6499, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "600k", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.63495, "t": 145.74334999999996, "r": 532.56012, "b": 154.6499, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PNG", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "caption", "id": 6, "page_no": 3, "cluster": {"id": 6, "label": "caption", "bbox": {"l": 307.88067626953125, "t": 167.05953979492188, "r": 545.414306640625, "b": 224.38897999999995, "coord_origin": "TOPLEFT"}, "confidence": 0.9668342471122742, "cells": [{"id": 91, "text": "Table 1:", "bbox": {"l": 308.862, "t": 167.66138, "r": 344.6178, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Both", "bbox": {"l": 361.07602, "t": 167.66138, "r": 380.45328, "b": 176.56793000000005, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "\u201cCombined-Tabnet\u201d", "bbox": {"l": 386.56799, "t": 167.75104, "r": 468.67974999999996, "b": 176.33880999999997, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "and", "bbox": {"l": 474.79599, "t": 167.66138, "r": 489.18198, "b": 176.56793000000005, 
"coord_origin": "TOPLEFT"}}, {"id": 95, "text": "\u201dCombined-", "bbox": {"l": 495.29898000000003, "t": 167.75104, "r": 545.112, "b": 176.33880999999997, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Tabnet\u201d", "bbox": {"l": 308.862, "t": 179.70605, "r": 341.16077, "b": 188.29381999999998, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "are variations of the following: (*) The Combined-", "bbox": {"l": 343.457, "t": 179.61639000000002, "r": 545.11005, "b": 188.52295000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Tabnet dataset is the processed combination of PubTabNet", "bbox": {"l": 308.862, "t": 191.57141000000001, "r": 545.11505, "b": 200.47797000000003, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "and Fintabnet. (**) The combined dataset is the processed", "bbox": {"l": 308.862, "t": 203.52643, "r": 545.11499, "b": 212.43298000000004, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "combination of PubTabNet, Fintabnet and TableBank.", "bbox": {"l": 308.862, "t": 215.48242000000005, "r": 523.93469, "b": 224.38897999999995, "coord_origin": "TOPLEFT"}}]}, "text": "Table 1: Both \u201cCombined-Tabnet\u201d and \u201dCombinedTabnet\u201d are variations of the following: (*) The CombinedTabnet dataset is the processed combination of PubTabNet and Fintabnet. (**) The combined dataset is the processed combination of PubTabNet, Fintabnet and TableBank."}, {"label": "text", "id": 7, "page_no": 3, "cluster": {"id": 7, "label": "text", "bbox": {"l": 307.9458923339844, "t": 248.9580078125, "r": 545.1485595703125, "b": 294.39197, "coord_origin": "TOPLEFT"}, "confidence": 0.9791285991668701, "cells": [{"id": 101, "text": "one adopts a colorful appearance with high contrast and the", "bbox": {"l": 308.862, "t": 249.62041999999997, "r": 545.11517, "b": 258.52698, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "last one contains tables with sparse content. 
Lastly, we have", "bbox": {"l": 308.862, "t": 261.57543999999996, "r": 545.11517, "b": 270.48199, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "combined all synthetic datasets into one big unified syn-", "bbox": {"l": 308.862, "t": 273.5304, "r": 545.11505, "b": 282.43698, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "thetic dataset of 600k examples.", "bbox": {"l": 308.862, "t": 285.48541000000006, "r": 436.82169, "b": 294.39197, "coord_origin": "TOPLEFT"}}]}, "text": "one adopts a colorful appearance with high contrast and the last one contains tables with sparse content. Lastly, we have combined all synthetic datasets into one big unified synthetic dataset of 600k examples."}, {"label": "text", "id": 8, "page_no": 3, "cluster": {"id": 8, "label": "text", "bbox": {"l": 320.0391845703125, "t": 297.109375, "r": 542.74396, "b": 306.67896, "coord_origin": "TOPLEFT"}, "confidence": 0.9049100279808044, "cells": [{"id": 105, "text": "Tab. 1 summarizes the various attributes of the datasets.", "bbox": {"l": 320.81699, "t": 297.77240000000006, "r": 542.74396, "b": 306.67896, "coord_origin": "TOPLEFT"}}]}, "text": "Tab. 1 summarizes the various attributes of the datasets."}, {"label": "section_header", "id": 9, "page_no": 3, "cluster": {"id": 9, "label": "section_header", "bbox": {"l": 307.9829406738281, "t": 320.3982849121094, "r": 444.93607000000003, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}, "confidence": 0.946631908416748, "cells": [{"id": 106, "text": "4.", "bbox": {"l": 308.862, "t": 321.18396, "r": 316.28476, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "The TableFormer model", "bbox": {"l": 326.18176, "t": 321.18396, "r": 444.93607000000003, "b": 331.93167000000005, "coord_origin": "TOPLEFT"}}]}, "text": "4. 
The TableFormer model"}, {"label": "text", "id": 10, "page_no": 3, "cluster": {"id": 10, "label": "text", "bbox": {"l": 307.7573547363281, "t": 340.8102111816406, "r": 545.5172119140625, "b": 447.71630859375, "coord_origin": "TOPLEFT"}, "confidence": 0.988465428352356, "cells": [{"id": 108, "text": "Given the image of a table, TableFormer is able to pre-", "bbox": {"l": 320.81699, "t": 341.93939, "r": 545.11499, "b": 350.84594999999996, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "dict: 1) a sequence of tokens that represent the structure of", "bbox": {"l": 308.862, "t": 353.89438, "r": 545.11511, "b": 362.80092999999994, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "a table, and 2) a bounding box coupled to a subset of those", "bbox": {"l": 308.862, "t": 365.84937, "r": 545.11517, "b": 374.75592, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "tokens. The conversion of an image into a sequence of to-", "bbox": {"l": 308.862, "t": 377.80435, "r": 545.11505, "b": 386.71091, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "kens is a well-known task [35, 16]. 
While attention is often", "bbox": {"l": 308.862, "t": 389.75934000000007, "r": 545.11517, "b": 398.66588999999993, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "used as an implicit method to associate each token of the", "bbox": {"l": 308.862, "t": 401.71432000000004, "r": 545.11523, "b": 410.62088, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "sequence with a position in the original image, an explicit", "bbox": {"l": 308.862, "t": 413.67032, "r": 545.11517, "b": 422.57687, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "association between the individual table-cells and the image", "bbox": {"l": 308.862, "t": 425.62531, "r": 545.11505, "b": 434.53186, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "bounding boxes is also required.", "bbox": {"l": 308.862, "t": 437.58029, "r": 437.9375, "b": 446.48685000000006, "coord_origin": "TOPLEFT"}}]}, "text": "Given the image of a table, TableFormer is able to predict: 1) a sequence of tokens that represent the structure of a table, and 2) a bounding box coupled to a subset of those tokens. The conversion of an image into a sequence of tokens is a well-known task [35, 16]. While attention is often used as an implicit method to associate each token of the sequence with a position in the original image, an explicit association between the individual table-cells and the image bounding boxes is also required."}, {"label": "section_header", "id": 11, "page_no": 3, "cluster": {"id": 11, "label": "section_header", "bbox": {"l": 307.8253173828125, "t": 457.084716796875, "r": 420.16058, "b": 467.54633, "coord_origin": "TOPLEFT"}, "confidence": 0.9376939535140991, "cells": [{"id": 117, "text": "4.1.", "bbox": {"l": 308.862, "t": 457.69427, "r": 323.14081, "b": 467.54633, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Model architecture.", "bbox": {"l": 332.66003, "t": 457.69427, "r": 420.16058, "b": 467.54633, "coord_origin": "TOPLEFT"}}]}, "text": "4.1. 
Model architecture."}, {"label": "text", "id": 12, "page_no": 3, "cluster": {"id": 12, "label": "text", "bbox": {"l": 307.7397155761719, "t": 475.38958740234375, "r": 545.5748901367188, "b": 664.99981, "coord_origin": "TOPLEFT"}, "confidence": 0.9878638386726379, "cells": [{"id": 119, "text": "We now describe in detail the proposed method, which", "bbox": {"l": 320.81699, "t": 476.76529, "r": 545.11487, "b": 485.67184, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "is composed of three main components, see Fig.", "bbox": {"l": 308.862, "t": 488.72028, "r": 509.02054, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "4.", "bbox": {"l": 515.58588, "t": 488.72028, "r": 523.05786, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Our", "bbox": {"l": 529.62323, "t": 488.72028, "r": 545.11505, "b": 497.62683, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "CNN Backbone Network", "bbox": {"l": 308.862, "t": 500.76492, "r": 406.34601, "b": 509.35269, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "encodes the input as a feature vec-", "bbox": {"l": 408.87201, "t": 500.67526, "r": 545.1106, "b": 509.58182, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "tor of predefined length.", "bbox": {"l": 308.862, "t": 512.63126, "r": 409.39459, "b": 521.53781, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "The input feature vector of the", "bbox": {"l": 416.72705, "t": 512.63126, "r": 545.11505, "b": 521.53781, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "encoded image is passed to the", "bbox": {"l": 308.862, "t": 524.58624, "r": 436.194, "b": 533.4928, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "Structure Decoder", "bbox": {"l": 439.526, "t": 524.6759, "r": 513.86694, "b": 533.26367, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "to pro-", "bbox": {"l": 517.43201, "t": 524.58624, "r": 545.10815, "b": 533.4928, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "duce a sequence of HTML tags that represent the structure", 
"bbox": {"l": 308.862, "t": 536.54124, "r": 545.11511, "b": 545.4478, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "of the table.", "bbox": {"l": 308.862, "t": 548.49625, "r": 358.5455, "b": 557.4028000000001, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "With each prediction of an HTML standard", "bbox": {"l": 365.19055, "t": 548.49625, "r": 545.11517, "b": 557.4028000000001, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "data cell (\u2018", "bbox": {"l": 308.862, "t": 560.45125, "r": 352.40851, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "<", "bbox": {"l": 352.409, "t": 560.29184, "r": 360.1579, "b": 569.13863, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "td", "bbox": {"l": 360.15799, "t": 560.45125, "r": 367.90891, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": ">", "bbox": {"l": 367.909, "t": 560.29184, "r": 375.6579, "b": 569.13863, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "\u2019) the hidden state of that cell is passed to", "bbox": {"l": 375.65799, "t": 560.45125, "r": 545.11182, "b": 569.3578, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "the Cell BBox Decoder. 
As for spanning cells, such as row", "bbox": {"l": 308.862, "t": 572.40724, "r": 545.11499, "b": 581.3138, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "or column span, the tag is broken down to \u2018", "bbox": {"l": 308.862, "t": 584.3622399999999, "r": 483.11768, "b": 593.2688, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "<", "bbox": {"l": 483.11902, "t": 584.20284, "r": 490.86792, "b": 593.04962, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "\u2019, \u2018rowspan=\u2019", "bbox": {"l": 490.86800999999997, "t": 584.3622399999999, "r": 545.11438, "b": 593.2688, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "or \u2018colspan=\u2019, with the number of spanning cells (attribute),", "bbox": {"l": 308.862, "t": 596.31725, "r": 545.11493, "b": 605.2238, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "and \u2018", "bbox": {"l": 308.862, "t": 608.27225, "r": 329.64395, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": ">", "bbox": {"l": 329.646, "t": 608.11284, "r": 337.3949, "b": 616.9596300000001, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "\u2019. The hidden state attached to \u2018", "bbox": {"l": 337.39398, "t": 608.27225, "r": 468.5914, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "<", "bbox": {"l": 468.59496999999993, "t": 608.11284, "r": 476.34387000000004, "b": 616.9596300000001, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "\u2019 is passed to the", "bbox": {"l": 476.3439599999999, "t": 608.27225, "r": 545.11572, "b": 617.1788, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "Cell BBox Decoder. 
A shared feed forward network (FFN)", "bbox": {"l": 308.86197, "t": 620.22725, "r": 545.11499, "b": 629.1338000000001, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "receives the hidden states from the Structure Decoder, to", "bbox": {"l": 308.86197, "t": 632.1822500000001, "r": 545.11517, "b": 641.08881, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "provide the final detection predictions of the bounding box", "bbox": {"l": 308.86197, "t": 644.13824, "r": 545.11511, "b": 653.0448, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "coordinates and their classification.", "bbox": {"l": 308.86197, "t": 656.09325, "r": 449.42432, "b": 664.99981, "coord_origin": "TOPLEFT"}}]}, "text": "We now describe in detail the proposed method, which is composed of three main components, see Fig. 4. Our CNN Backbone Network encodes the input as a feature vector of predefined length. The input feature vector of the encoded image is passed to the Structure Decoder to produce a sequence of HTML tags that represent the structure of the table. With each prediction of an HTML standard data cell (\u2018 < td > \u2019) the hidden state of that cell is passed to the Cell BBox Decoder. As for spanning cells, such as row or column span, the tag is broken down to \u2018 < \u2019, \u2018rowspan=\u2019 or \u2018colspan=\u2019, with the number of spanning cells (attribute), and \u2018 > \u2019. The hidden state attached to \u2018 < \u2019 is passed to the Cell BBox Decoder. 
A shared feed forward network (FFN) receives the hidden states from the Structure Decoder, to provide the final detection predictions of the bounding box coordinates and their classification."}, {"label": "text", "id": 13, "page_no": 3, "cluster": {"id": 13, "label": "text", "bbox": {"l": 307.9578552246094, "t": 667.443115234375, "r": 545.4042358398438, "b": 713.30419921875, "coord_origin": "TOPLEFT"}, "confidence": 0.9786657691001892, "cells": [{"id": 152, "text": "CNN Backbone Network.", "bbox": {"l": 320.81696, "t": 668.2607, "r": 431.90985, "b": 677.21707, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "A ResNet-18 CNN is the", "bbox": {"l": 439.49896, "t": 668.3802499999999, "r": 545.11255, "b": 677.2868100000001, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "backbone that receives the table image and encodes it as a", "bbox": {"l": 308.86197, "t": 680.33525, "r": 545.11499, "b": 689.24181, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "vector of predefined length. The network has been modified", "bbox": {"l": 308.86197, "t": 692.290253, "r": 545.11511, "b": 701.196815, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "by removing the linear and pooling layer, as we are not per-", "bbox": {"l": 308.86197, "t": 704.245255, "r": 545.11505, "b": 713.1518169999999, "coord_origin": "TOPLEFT"}}]}, "text": "CNN Backbone Network. A ResNet-18 CNN is the backbone that receives the table image and encodes it as a vector of predefined length. 
The network has been modified by removing the linear and pooling layer, as we are not per-"}], "headers": [{"label": "page_footer", "id": 14, "page_no": 3, "cluster": {"id": 14, "label": "page_footer", "bbox": {"l": 294.33935546875, "t": 733.3119506835938, "r": 300.1784973144531, "b": 743.039814, "coord_origin": "TOPLEFT"}, "confidence": 0.8562639951705933, "cells": [{"id": 157, "text": "4", "bbox": {"l": 295.12097, "t": 734.133251, "r": 300.10226, "b": 743.039814, "coord_origin": "TOPLEFT"}}]}, "text": "4"}]}}, {"page_no": 4, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "1.", "bbox": {"l": 81.688072, "t": 122.43970000000002, "r": 84.927567, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Item", "bbox": {"l": 86.54731, "t": 122.43970000000002, "r": 93.026291, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "Amount", "bbox": {"l": 102.50498, "t": 115.25214000000005, "r": 115.3461, "b": 118.44135000000006, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Names", "bbox": {"l": 82.140205, "t": 115.21489999999994, "r": 93.291527, "b": 118.40410999999995, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "1000", "bbox": {"l": 96.748268, "t": 122.43970000000002, "r": 104.3119, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "500", "bbox": {"l": 96.748268, "t": 127.74370999999985, "r": 102.42083, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "3500", "bbox": {"l": 96.748268, "t": 133.45569, "r": 104.3119, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "150", "bbox": {"l": 96.748268, "t": 139.16772000000003, "r": 102.42083, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "unit", "bbox": {"l": 110.66107, "t": 122.43970000000002, "r": 116.14391, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "unit", "bbox": {"l": 110.66107, "t": 127.74370999999985, "r": 116.14391, "b": 130.93291999999997, 
"coord_origin": "TOPLEFT"}}, {"id": 10, "text": "unit", "bbox": {"l": 110.66107, "t": 133.45569, "r": 116.14391, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "unit", "bbox": {"l": 110.66107, "t": 139.16772000000003, "r": 116.14391, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "2.", "bbox": {"l": 81.688072, "t": 127.74370999999985, "r": 84.927567, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Item", "bbox": {"l": 86.54731, "t": 127.74370999999985, "r": 93.026291, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "3.", "bbox": {"l": 81.688072, "t": 133.45569, "r": 84.927567, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Item", "bbox": {"l": 86.54731, "t": 133.45569, "r": 93.026291, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "4.", "bbox": {"l": 81.688072, "t": 139.16772000000003, "r": 84.927567, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "Item", "bbox": {"l": 86.54731, "t": 139.16772000000003, "r": 93.026291, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "Extracted", "bbox": {"l": 88.084389, "t": 90.49738000000002, "r": 113.93649, "b": 96.23798, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Table Images", "bbox": {"l": 82.81002, "t": 97.63738999999998, "r": 119.21240000000002, "b": 103.37798999999995, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Standardized", "bbox": {"l": 143.94247, "t": 100.60235999999998, "r": 180.01131, "b": 106.34295999999995, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Images", "bbox": {"l": 151.94064, "t": 107.74237000000005, "r": 172.0118, "b": 113.48297000000014, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "BBox", "bbox": {"l": 251.76939000000002, "t": 80.93096999999989, "r": 266.39557, "b": 86.67156999999997, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "Decoder", "bbox": {"l": 247.51601, "t": 86.03101000000004, "r": 270.65021, "b": 
91.77161000000001, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "BBoxes", "bbox": {"l": 331.03699, "t": 78.55980999999997, "r": 352.12589, "b": 84.30042000000003, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "BBoxes can be", "bbox": {"l": 390.56421, "t": 96.03223000000003, "r": 431.7261, "b": 101.77282999999989, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "traced back to the", "bbox": {"l": 386.82422, "t": 102.15228000000013, "r": 435.46966999999995, "b": 107.89287999999999, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "original image to", "bbox": {"l": 388.69589, "t": 108.27228000000002, "r": 433.6032400000001, "b": 114.01288000000011, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "extract content", "bbox": {"l": 391.07761, "t": 114.39227000000005, "r": 431.22542999999996, "b": 120.13286999999991, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Structure Tags sequence", "bbox": {"l": 431.22650000000004, "t": 151.68511999999998, "r": 498.82068, "b": 157.42571999999996, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "provide full description of", "bbox": {"l": 431.1738, "t": 157.80517999999995, "r": 498.87753000000004, "b": 163.54578000000004, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "the table structure", "bbox": {"l": 440.5289, "t": 163.92516999999998, "r": 489.51827999999995, "b": 169.66576999999995, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Structure Tags", "bbox": {"l": 328.37479, "t": 178.25385000000006, "r": 367.72333, "b": 183.99445000000003, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "BBoxes in sync", "bbox": {"l": 331.84451, "t": 123.90886999999998, "r": 373.67963, "b": 129.64948000000015, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "with tag sequence", "bbox": {"l": 331.84451, "t": 129.00885000000017, "r": 381.17786, "b": 134.74945000000002, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "Encoder", "bbox": {"l": 196.62633, "t": 88.11621000000002, "r": 219.42332, "b": 93.85681, "coord_origin": "TOPLEFT"}}, {"id": 
36, "text": "Structure", "bbox": {"l": 246.66771, "t": 129.4946900000001, "r": 271.49899, "b": 135.23528999999996, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Decoder", "bbox": {"l": 247.51601, "t": 134.59473000000003, "r": 270.65021, "b": 140.33533, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "[x1, y2, x2, y2]", "bbox": {"l": 330.63071, "t": 89.01923, "r": 365.55347, "b": 94.75982999999997, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "[x1', y2', x2', y2']", "bbox": {"l": 330.63071, "t": 97.17926, "r": 370.22717, "b": 102.91985999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "[x1'', y2'', x2'', y2'']", "bbox": {"l": 330.63071, "t": 105.33922999999993, "r": 374.51157, "b": 111.07983000000002, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "...", "bbox": {"l": 330.63071, "t": 113.49926999999991, "r": 335.73233, "b": 119.23987, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "<TR>", "bbox": {"l": 322.30579, "t": 141.79236000000003, "r": 335.05988, "b": 146.57617000000005, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "<TD>", "bbox": {"l": 322.30579, "t": 148.93231000000003, "r": 335.05988, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "1", "bbox": {"l": 337.54971, "t": 148.55579, "r": 340.95242, "b": 154.29638999999997, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "</TD><TD colspan=\"2\">", "bbox": {"l": 343.56262, "t": 148.93231000000003, "r": 398.91446, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "</TD>", "bbox": {"l": 407.41718, "t": 148.93231000000003, "r": 421.58801, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "</TR><TR>", "bbox": {"l": 322.30579, "t": 156.07232999999997, "r": 349.23022, "b": 160.85613999999998, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "<TD>", "bbox": {"l": 322.30579, "t": 163.21234000000004, "r": 335.05988, "b": 167.99614999999994, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "</TD><TD>...", "bbox": {"l": 343.56155, "t": 
163.21234000000004, "r": 374.73685, "b": 167.99614999999994, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "...", "bbox": {"l": 322.30579, "t": 170.35235999999998, "r": 326.55716, "b": 175.13617, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "1", "bbox": {"l": 323.51111, "t": 89.66967999999997, "r": 326.91382, "b": 95.41027999999994, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "2", "bbox": {"l": 323.71509, "t": 97.78887999999995, "r": 327.1178, "b": 103.52948000000004, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "3", "bbox": {"l": 323.71509, "t": 105.98969, "r": 327.1178, "b": 111.73029000000008, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "2", "bbox": {"l": 401.4816, "t": 148.54625999999996, "r": 404.88431, "b": 154.28687000000002, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "3", "bbox": {"l": 337.6976, "t": 162.68451000000005, "r": 341.10031, "b": 168.42511000000002, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "3", "bbox": {"l": 454.46378, "t": 104.54584, "r": 457.86648999999994, "b": 110.28644000000008, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "2", "bbox": {"l": 493.32580999999993, "t": 91.09546, "r": 496.72852, "b": 96.83605999999997, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "1", "bbox": {"l": 454.08298, "t": 90.56879000000015, "r": 457.48569000000003, "b": 96.30939000000001, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Figure 3:", "bbox": {"l": 50.112, "t": 204.10535000000004, "r": 86.883949, "b": 213.01189999999997, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "TableFormer", "bbox": {"l": 94.020996, "t": 203.98577999999998, "r": 149.85141, "b": 212.94214, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "takes in an image of the PDF and creates bounding box and HTML structure predictions that are", "bbox": {"l": 152.86099, "t": 204.10535000000004, "r": 545.10846, "b": 213.01189999999997, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "synchronized. 
The bounding boxes grabs the content from the PDF and inserts it in the structure.", "bbox": {"l": 50.111992, "t": 216.06035999999995, "r": 436.0134, "b": 224.96691999999996, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "Input Image", "bbox": {"l": 74.253464, "t": 258.21472000000006, "r": 101.75846, "b": 264.17474000000004, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "Tokenised Tags", "bbox": {"l": 122.29972, "t": 258.34520999999995, "r": 157.83972, "b": 264.30524, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "Multi-Head Attention", "bbox": {"l": 78.549347, "t": 371.38579999999996, "r": 125.68359000000001, "b": 377.04782, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Add", "bbox": {"l": 78.513298, "t": 391.31857, "r": 84.644547, "b": 396.98059, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "& Normalisation", "bbox": {"l": 116.52705, "t": 391.31857, "r": 125.11079999999998, "b": 396.98059, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "Feed Forward Network", "bbox": {"l": 76.024773, "t": 424.45309, "r": 127.92327000000002, "b": 430.11511, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "Add", "bbox": {"l": 78.382828, "t": 444.88956, "r": 84.514076, "b": 450.55157, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "& Normalisation", "bbox": {"l": 116.39658, "t": 444.88956, "r": 124.98033, "b": 450.55157, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Linear", "bbox": {"l": 167.46945, "t": 462.44324, "r": 181.6292, "b": 468.10526, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "Softmax", "bbox": {"l": 165.61292, "t": 478.47107, "r": 184.43242, "b": 484.13309, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "CNN BACKBONE ENCODER", "bbox": {"l": 65.319511, "t": 324.26235999999994, "r": 132.9245, "b": 330.22235000000006, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "[30, 1, 2, 3, 4, \u2026 3, ", "bbox": {"l": 119.51457, "t": 269.66394, "r": 162.98782, "b": 274.72992, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "4, 5, 8, 31]", "bbox": {"l": 
128.72858, "t": 274.91394, "r": 151.41083, "b": 279.97992, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Positional ", "bbox": {"l": 60.434211999999995, "t": 338.95993, "r": 80.27021, "b": 344.26993, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Encoding", "bbox": {"l": 60.598457, "t": 343.38605, "r": 78.854958, "b": 348.69604, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Positional ", "bbox": {"l": 134.82877, "t": 293.37762, "r": 154.66476, "b": 298.68762, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "Encoding", "bbox": {"l": 134.99303, "t": 297.80370999999997, "r": 153.24953, "b": 303.11371, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Add & Normalisation", "bbox": {"l": 150.55193, "t": 345.35861, "r": 197.14943, "b": 351.02063, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Add", "bbox": {"l": 150.55193, "t": 394.4234, "r": 156.68318, "b": 400.08542, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "& Normalisation", "bbox": {"l": 188.56567, "t": 394.4234, "r": 197.14943, "b": 400.08542, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "Multi-Head Attention", "bbox": {"l": 150.18539, "t": 375.66843, "r": 197.31964, "b": 381.33044, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Add", "bbox": {"l": 150.55193, "t": 440.24847000000005, "r": 156.68318, "b": 445.91049, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "& Normalisation", "bbox": {"l": 188.56567, "t": 440.24847000000005, "r": 197.14943, "b": 445.91049, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Feed Forward Network", "bbox": {"l": 147.86377, "t": 422.09335, "r": 199.76227, "b": 427.75537, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "Linear", "bbox": {"l": 241.56567000000004, "t": 314.26285000000007, "r": 255.72542, "b": 319.92487, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "Linear", "bbox": {"l": 241.91730000000004, "t": 361.36493, "r": 256.07706, "b": 367.02695, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Attention", "bbox": {"l": 228.054, "t": 336.61929000000003, 
"r": 248.72363000000004, "b": 342.28131, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "Network", "bbox": {"l": 246.2919, "t": 336.61929000000003, "r": 269.39325, "b": 342.28131, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "MLP", "bbox": {"l": 228.44568000000004, "t": 405.14682, "r": 238.73892, "b": 410.80884, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Linear ", "bbox": {"l": 256.29767, "t": 405.2032500000001, "r": 271.77792, "b": 410.86526, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "Sigmoid", "bbox": {"l": 239.54543, "t": 382.21344, "r": 258.08942, "b": 387.87546, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Transformer Encoder Network", "bbox": {"l": 54.14704100000001, "t": 384.87183, "r": 59.51152, "b": 449.78326, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "x2", "bbox": {"l": 54.235424, "t": 373.81232, "r": 59.30449699999999, "b": 378.45421999999996, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Encoded Output", "bbox": {"l": 85.295891, "t": 484.53189, "r": 122.16431, "b": 490.36688, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "Encoded Output", "bbox": {"l": 229.66599, "t": 279.54607999999996, "r": 265.3194, "b": 285.45572000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Predicted Tags", "bbox": {"l": 157.17369, "t": 500.3031, "r": 190.41711, "b": 506.12943, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Bounding Boxes & ", "bbox": {"l": 227.81598999999997, "t": 438.05542, "r": 270.78442, "b": 443.89206, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Classification", "bbox": {"l": 233.70262, "t": 444.06183, "r": 263.51105, "b": 449.8904999999999, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Transformer ", "bbox": {"l": 184.74655, "t": 293.39502, "r": 212.16055, "b": 298.75903, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "Decoder Network", "bbox": {"l": 178.91229, "t": 299.14502, "r": 216.74378999999996, "b": 304.50903, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "x4", "bbox": {"l": 194.24574, "t": 
282.7822, "r": 198.89099, "b": 287.84817999999996, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "CELL BBOX DECODER", "bbox": {"l": 221.45587, "t": 271.86914, "r": 276.47089, "b": 277.82916, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "Masked Multi-Head ", "bbox": {"l": 151.65219, "t": 323.44241, "r": 197.29019, "b": 329.10443, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "Attention", "bbox": {"l": 163.43277, "t": 329.44241, "r": 184.19028, "b": 335.10443, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Figure 4: Given an input image of a table, the", "bbox": {"l": 50.112, "t": 527.90237, "r": 229.78752, "b": 536.80893, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Encoder", "bbox": {"l": 231.787, "t": 527.7828099999999, "r": 267.76196, "b": 536.7392, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "pro-", "bbox": {"l": 269.76401, "t": 527.90237, "r": 286.36169, "b": 536.80893, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "duces fixed-length features that represent the input image.", "bbox": {"l": 50.112015, "t": 539.85738, "r": 286.36508, "b": 548.76393, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "The features are then passed to both the", "bbox": {"l": 50.112015, "t": 551.81337, "r": 205.84735, "b": 560.71992, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "Structure Decoder", "bbox": {"l": 208.01802, "t": 551.69382, "r": 286.36392, "b": 560.6501900000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "and", "bbox": {"l": 50.112015, "t": 563.76837, "r": 64.498009, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Cell BBox Decoder", "bbox": {"l": 68.165016, "t": 563.64882, "r": 151.31288, "b": 572.60519, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": ".", "bbox": {"l": 151.31302, "t": 563.76837, "r": 153.80367, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "During training, the", "bbox": {"l": 160.41884, "t": 563.76837, "r": 241.93283000000002, "b": 572.67493, "coord_origin": "TOPLEFT"}}, 
{"id": 117, "text": "Structure", "bbox": {"l": 245.59502, "t": 563.64882, "r": 286.362, "b": 572.60519, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Decoder", "bbox": {"l": 50.112015, "t": 575.60382, "r": 85.519089, "b": 584.5602, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "receives \u2018tokenized tags\u2019 of the HTML code that", "bbox": {"l": 88.623016, "t": 575.7233699999999, "r": 286.36072, "b": 584.6299300000001, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "represent the table structure. Afterwards, a transformer en-", "bbox": {"l": 50.112015, "t": 587.6783800000001, "r": 286.36511, "b": 596.58493, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "coder and decoder architecture is employed to produce fea-", "bbox": {"l": 50.112015, "t": 599.63338, "r": 286.36508, "b": 608.53993, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "tures that are received by a linear layer, and the", "bbox": {"l": 50.112015, "t": 611.58838, "r": 240.43756000000002, "b": 620.4949300000001, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "Cell BBox", "bbox": {"l": 243.19801, "t": 611.46883, "r": 286.36597, "b": 620.4252, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "Decoder. The linear layer is applied to the features to", "bbox": {"l": 50.112015, "t": 623.42482, "r": 286.36511, "b": 632.3812, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "predict the tags. 
Simultaneously, the Cell BBox Decoder", "bbox": {"l": 50.112015, "t": 635.37982, "r": 286.36508, "b": 644.3362, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "selects features referring to the data cells (\u2018", "bbox": {"l": 50.112015, "t": 647.45438, "r": 220.58205, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "<", "bbox": {"l": 220.57802000000004, "t": 647.29497, "r": 228.32693, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "td", "bbox": {"l": 228.32700999999997, "t": 647.45438, "r": 236.07791000000003, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": ">", "bbox": {"l": 236.07802000000004, "t": 647.29497, "r": 243.82693, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "\u2019, \u2018", "bbox": {"l": 243.82602, "t": 647.45438, "r": 255.29298000000003, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "<", "bbox": {"l": 255.29102000000003, "t": 647.29497, "r": 263.03992, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "\u2019) and", "bbox": {"l": 263.04001, "t": 647.45438, "r": 286.36246, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "passes them through an attention network, an MLP, and a", "bbox": {"l": 50.112015, "t": 659.40938, "r": 286.36511, "b": 668.31594, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "linear layer to predict the bounding boxes.", "bbox": {"l": 50.112015, "t": 671.36438, "r": 218.46996, "b": 680.27094, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "forming classification, and adding an adaptive pooling", "bbox": {"l": 308.862, "t": 249.53441999999995, "r": 523.05786, "b": 258.44097999999997, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "layer", "bbox": {"l": 525.19983, "t": 249.53441999999995, "r": 545.11505, "b": 258.44097999999997, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "of size 28*28. 
ResNet by default downsamples the", "bbox": {"l": 308.862, "t": 261.49042, "r": 517.55847, "b": 270.39697, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "image", "bbox": {"l": 520.76642, "t": 261.49042, "r": 545.11499, "b": 270.39697, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "resolution by 32 and then the encoded image is provided", "bbox": {"l": 308.862, "t": 273.44537, "r": 534.80377, "b": 282.35196, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "to", "bbox": {"l": 537.36414, "t": 273.44537, "r": 545.11505, "b": 282.35196, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "both the", "bbox": {"l": 308.862, "t": 285.40039, "r": 341.24045, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "Structure Decoder", "bbox": {"l": 343.73099, "t": 285.49005, "r": 417.23508, "b": 294.07782000000003, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": ", and", "bbox": {"l": 417.23398, "t": 285.40039, "r": 436.60129, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "Cell BBox Decoder", "bbox": {"l": 439.09198, "t": 285.49005, "r": 516.56116, "b": 294.07782000000003, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": ".", "bbox": {"l": 516.56097, "t": 285.40039, "r": 519.05164, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "Structure Decoder.", "bbox": {"l": 320.81696, "t": 297.33981, "r": 400.86649, "b": 306.2962, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "The transformer architecture of", "bbox": {"l": 403.91394, "t": 297.45938, "r": 528.33685, "b": 306.36594, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "this", "bbox": {"l": 530.7179, "t": 297.45938, "r": 545.11383, "b": 306.36594, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "component is based on the work proposed in [31].", "bbox": {"l": 308.86194, "t": 309.41437, "r": 517.5285, "b": 318.32092, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "After", "bbox": {"l": 524.09387, "t": 309.41437, "r": 545.11493, "b": 318.32092, 
"coord_origin": "TOPLEFT"}}, {"id": 151, "text": "extensive experimentation, the", "bbox": {"l": 308.86194, "t": 321.36934999999994, "r": 432.35833999999994, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Structure Decoder", "bbox": {"l": 435.81995000000006, "t": 321.45901, "r": 510.29041, "b": 330.04678, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "is", "bbox": {"l": 513.97797, "t": 321.36934999999994, "r": 520.62305, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "mod-", "bbox": {"l": 524.08008, "t": 321.36934999999994, "r": 545.11115, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "eled as a transformer encoder with two encoder layers", "bbox": {"l": 308.86197, "t": 333.32434, "r": 527.76013, "b": 342.2309, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "and", "bbox": {"l": 530.729, "t": 333.32434, "r": 545.11499, "b": 342.2309, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "a transformer decoder made from a stack of 4 decoder", "bbox": {"l": 308.86197, "t": 345.27933, "r": 526.85352, "b": 354.18588, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "lay-", "bbox": {"l": 529.62311, "t": 345.27933, "r": 545.11493, "b": 354.18588, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "ers that comprise mainly of multi-head attention and", "bbox": {"l": 308.86197, "t": 357.23532, "r": 524.51245, "b": 366.14188, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "feed", "bbox": {"l": 527.96948, "t": 357.23532, "r": 545.11511, "b": 366.14188, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "forward layers.", "bbox": {"l": 308.86197, "t": 369.19031000000007, "r": 370.39096, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "This configuration uses fewer layers", "bbox": {"l": 377.44449, "t": 369.19031000000007, "r": 526.91339, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "and", "bbox": {"l": 530.72906, "t": 369.19031000000007, "r": 545.11505, "b": 
378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "heads in comparison to networks applied to other", "bbox": {"l": 308.86197, "t": 381.14529000000005, "r": 505.46395999999993, "b": 390.05185, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "problems", "bbox": {"l": 508.03430000000003, "t": 381.14529000000005, "r": 545.11511, "b": 390.05185, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "(e.g. \u201cScene Understanding\u201d, \u201cImage Captioning\u201d),", "bbox": {"l": 308.86197, "t": 393.10028, "r": 517.68799, "b": 402.00684, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "some-", "bbox": {"l": 520.76642, "t": 393.10028, "r": 545.11499, "b": 402.00684, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "thing which we relate to the simplicity of table images.", "bbox": {"l": 308.86197, "t": 405.05526999999995, "r": 528.01935, "b": 413.96182, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "The transformer encoder receives an encoded", "bbox": {"l": 320.81696, "t": 417.11426, "r": 515.49609, "b": 426.02081, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "image", "bbox": {"l": 520.7663, "t": 417.11426, "r": 545.11487, "b": 426.02081, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "from the", "bbox": {"l": 308.86197, "t": 429.0692399999999, "r": 343.72107, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "CNN Backbone Network", "bbox": {"l": 347.03796, "t": 429.15891, "r": 446.45471000000003, "b": 437.74667, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "and refines it", "bbox": {"l": 449.93996999999996, "t": 429.0692399999999, "r": 503.06055000000003, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "through", "bbox": {"l": 506.37808, "t": 429.0692399999999, "r": 537.3717, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "a", "bbox": {"l": 540.68927, "t": 429.0692399999999, "r": 545.11267, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "multi-head dot-product attention 
layer, followed by a", "bbox": {"l": 308.86197, "t": 441.02423, "r": 522.78894, "b": 449.93079, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "Feed", "bbox": {"l": 525.7478, "t": 441.02423, "r": 545.11511, "b": 449.93079, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "Forward Network.", "bbox": {"l": 308.86197, "t": 452.97922, "r": 384.14929, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "During training, the transformer", "bbox": {"l": 393.37466, "t": 452.97922, "r": 527.84985, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "de-", "bbox": {"l": 532.39282, "t": 452.97922, "r": 545.11505, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "coder receives as input the output feature produced by", "bbox": {"l": 308.86197, "t": 464.93521, "r": 529.7627, "b": 473.84177, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "the", "bbox": {"l": 532.94073, "t": 464.93521, "r": 545.11505, "b": 473.84177, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "transformer encoder, and the tokenized input of the", "bbox": {"l": 308.86197, "t": 476.8902, "r": 514.17126, "b": 485.79675, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "HTML", "bbox": {"l": 516.89105, "t": 476.8902, "r": 545.11511, "b": 485.79675, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "ground-truth tags. 
Using a stack of multi-head attention", "bbox": {"l": 308.86197, "t": 488.84518, "r": 527.63068, "b": 497.75174, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "lay-", "bbox": {"l": 529.62317, "t": 488.84518, "r": 545.11499, "b": 497.75174, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "ers, different aspects of the tag sequence could be", "bbox": {"l": 308.86197, "t": 500.80017, "r": 508.3630999999999, "b": 509.70673, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "inferred.", "bbox": {"l": 511.09286000000003, "t": 500.80017, "r": 545.11511, "b": 509.70673, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "This is achieved by each attention head on a layer operating", "bbox": {"l": 308.86197, "t": 512.7551599999999, "r": 545.11499, "b": 521.6617100000001, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "in a different subspace, and then combining altogether their", "bbox": {"l": 308.86197, "t": 524.71115, "r": 545.11511, "b": 533.61771, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "attention score.", "bbox": {"l": 308.86197, "t": 536.66615, "r": 369.73349, "b": 545.57271, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "Cell BBox Decoder.", "bbox": {"l": 320.81696, "t": 548.6046, "r": 404.76184, "b": 557.56097, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "Our architecture allows to simul-", "bbox": {"l": 410.34094, "t": 548.72415, "r": 545.11505, "b": 557.63071, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "taneously predict HTML tags and bounding boxes for each", "bbox": {"l": 308.86194, "t": 560.68015, "r": 545.11493, "b": 569.5867000000001, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "table cell without the need of a separate object detector end", "bbox": {"l": 308.86194, "t": 572.6351500000001, "r": 545.11511, "b": 581.5417, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "to end. 
This approach is inspired by DETR [1] which em-", "bbox": {"l": 308.86194, "t": 584.59015, "r": 545.11493, "b": 593.4967, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "ploys a Transformer Encoder, and Decoder that looks for", "bbox": {"l": 308.86194, "t": 596.54515, "r": 545.11499, "b": 605.45171, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "a specific number of object queries (potential object detec-", "bbox": {"l": 308.86194, "t": 608.50015, "r": 545.11505, "b": 617.40671, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "tions). As our model utilizes a transformer architecture, the", "bbox": {"l": 308.86194, "t": 620.45515, "r": 545.11505, "b": 629.36171, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "hidden state of the", "bbox": {"l": 308.86194, "t": 632.41115, "r": 381.67859, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "<", "bbox": {"l": 383.99695, "t": 632.25174, "r": 391.74585, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "td", "bbox": {"l": 391.74594, "t": 632.41115, "r": 399.49686, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": ">", "bbox": {"l": 399.49695, "t": 632.25174, "r": 407.24585, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "\u2019 and \u2018", "bbox": {"l": 407.24594, "t": 632.41115, "r": 432.90958, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "<", "bbox": {"l": 432.90792999999996, "t": 632.25174, "r": 440.65683000000007, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "\u2019 HTML structure tags be-", "bbox": {"l": 440.65691999999996, "t": 632.41115, "r": 545.11475, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "come the object query.", "bbox": {"l": 308.86194, "t": 644.3661500000001, "r": 398.96371, "b": 653.27271, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "The encoding generated by the", "bbox": {"l": 320.81693, "t": 656.42516, "r": 444.34316999999993, "b": 
665.33172, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "CNN Backbone Network", "bbox": {"l": 447.00591999999995, "t": 656.51482, "r": 545.1076, "b": 665.10258, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "along with the features acquired for every data cell from the", "bbox": {"l": 308.86194, "t": 668.38016, "r": 545.11505, "b": 677.2867200000001, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "Transformer Decoder are then passed to the attention net-", "bbox": {"l": 308.86194, "t": 680.33516, "r": 545.11505, "b": 689.24172, "coord_origin": "TOPLEFT"}}, {"id": 212, "text": "work. The attention network takes both inputs and learns to", "bbox": {"l": 308.86194, "t": 692.290161, "r": 545.11505, "b": 701.196724, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "provide an attention weighted encoding. This weighted at-", "bbox": {"l": 308.86194, "t": 704.245163, "r": 545.11505, "b": 713.151726, "coord_origin": "TOPLEFT"}}, {"id": 214, "text": "5", "bbox": {"l": 295.12094, "t": 734.13316, "r": 300.10223, "b": 743.039722, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "picture", "bbox": {"l": 74.30538940429688, "t": 77.91118621826172, "r": 519.9801025390625, "b": 183.99445000000003, "coord_origin": "TOPLEFT"}, "confidence": 0.9296938180923462, "cells": [{"id": 0, "text": "1.", "bbox": {"l": 81.688072, "t": 122.43970000000002, "r": 84.927567, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Item", "bbox": {"l": 86.54731, "t": 122.43970000000002, "r": 93.026291, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "Amount", "bbox": {"l": 102.50498, "t": 115.25214000000005, "r": 115.3461, "b": 118.44135000000006, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Names", "bbox": {"l": 82.140205, "t": 115.21489999999994, "r": 93.291527, "b": 118.40410999999995, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "1000", "bbox": {"l": 96.748268, "t": 122.43970000000002, "r": 104.3119, "b": 
125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "500", "bbox": {"l": 96.748268, "t": 127.74370999999985, "r": 102.42083, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "3500", "bbox": {"l": 96.748268, "t": 133.45569, "r": 104.3119, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "150", "bbox": {"l": 96.748268, "t": 139.16772000000003, "r": 102.42083, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "unit", "bbox": {"l": 110.66107, "t": 122.43970000000002, "r": 116.14391, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "unit", "bbox": {"l": 110.66107, "t": 127.74370999999985, "r": 116.14391, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "unit", "bbox": {"l": 110.66107, "t": 133.45569, "r": 116.14391, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "unit", "bbox": {"l": 110.66107, "t": 139.16772000000003, "r": 116.14391, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "2.", "bbox": {"l": 81.688072, "t": 127.74370999999985, "r": 84.927567, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Item", "bbox": {"l": 86.54731, "t": 127.74370999999985, "r": 93.026291, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "3.", "bbox": {"l": 81.688072, "t": 133.45569, "r": 84.927567, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Item", "bbox": {"l": 86.54731, "t": 133.45569, "r": 93.026291, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "4.", "bbox": {"l": 81.688072, "t": 139.16772000000003, "r": 84.927567, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "Item", "bbox": {"l": 86.54731, "t": 139.16772000000003, "r": 93.026291, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "Extracted", "bbox": {"l": 88.084389, "t": 90.49738000000002, "r": 113.93649, "b": 96.23798, "coord_origin": 
"TOPLEFT"}}, {"id": 19, "text": "Table Images", "bbox": {"l": 82.81002, "t": 97.63738999999998, "r": 119.21240000000002, "b": 103.37798999999995, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Standardized", "bbox": {"l": 143.94247, "t": 100.60235999999998, "r": 180.01131, "b": 106.34295999999995, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Images", "bbox": {"l": 151.94064, "t": 107.74237000000005, "r": 172.0118, "b": 113.48297000000014, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "BBox", "bbox": {"l": 251.76939000000002, "t": 80.93096999999989, "r": 266.39557, "b": 86.67156999999997, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "Decoder", "bbox": {"l": 247.51601, "t": 86.03101000000004, "r": 270.65021, "b": 91.77161000000001, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "BBoxes", "bbox": {"l": 331.03699, "t": 78.55980999999997, "r": 352.12589, "b": 84.30042000000003, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "BBoxes can be", "bbox": {"l": 390.56421, "t": 96.03223000000003, "r": 431.7261, "b": 101.77282999999989, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "traced back to the", "bbox": {"l": 386.82422, "t": 102.15228000000013, "r": 435.46966999999995, "b": 107.89287999999999, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "original image to", "bbox": {"l": 388.69589, "t": 108.27228000000002, "r": 433.6032400000001, "b": 114.01288000000011, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "extract content", "bbox": {"l": 391.07761, "t": 114.39227000000005, "r": 431.22542999999996, "b": 120.13286999999991, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Structure Tags sequence", "bbox": {"l": 431.22650000000004, "t": 151.68511999999998, "r": 498.82068, "b": 157.42571999999996, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "provide full description of", "bbox": {"l": 431.1738, "t": 157.80517999999995, "r": 498.87753000000004, "b": 163.54578000000004, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "the table structure", "bbox": {"l": 
440.5289, "t": 163.92516999999998, "r": 489.51827999999995, "b": 169.66576999999995, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Structure Tags", "bbox": {"l": 328.37479, "t": 178.25385000000006, "r": 367.72333, "b": 183.99445000000003, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "BBoxes in sync", "bbox": {"l": 331.84451, "t": 123.90886999999998, "r": 373.67963, "b": 129.64948000000015, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "with tag sequence", "bbox": {"l": 331.84451, "t": 129.00885000000017, "r": 381.17786, "b": 134.74945000000002, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "Encoder", "bbox": {"l": 196.62633, "t": 88.11621000000002, "r": 219.42332, "b": 93.85681, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Structure", "bbox": {"l": 246.66771, "t": 129.4946900000001, "r": 271.49899, "b": 135.23528999999996, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Decoder", "bbox": {"l": 247.51601, "t": 134.59473000000003, "r": 270.65021, "b": 140.33533, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "[x1, y2, x2, y2]", "bbox": {"l": 330.63071, "t": 89.01923, "r": 365.55347, "b": 94.75982999999997, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "[x1', y2', x2', y2']", "bbox": {"l": 330.63071, "t": 97.17926, "r": 370.22717, "b": 102.91985999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "[x1'', y2'', x2'', y2'']", "bbox": {"l": 330.63071, "t": 105.33922999999993, "r": 374.51157, "b": 111.07983000000002, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "...", "bbox": {"l": 330.63071, "t": 113.49926999999991, "r": 335.73233, "b": 119.23987, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "<TR>", "bbox": {"l": 322.30579, "t": 141.79236000000003, "r": 335.05988, "b": 146.57617000000005, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "<TD>", "bbox": {"l": 322.30579, "t": 148.93231000000003, "r": 335.05988, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "1", "bbox": {"l": 337.54971, "t": 148.55579, "r": 340.95242, 
"b": 154.29638999999997, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "</TD><TD colspan=\"2\">", "bbox": {"l": 343.56262, "t": 148.93231000000003, "r": 398.91446, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "</TD>", "bbox": {"l": 407.41718, "t": 148.93231000000003, "r": 421.58801, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "</TR><TR>", "bbox": {"l": 322.30579, "t": 156.07232999999997, "r": 349.23022, "b": 160.85613999999998, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "<TD>", "bbox": {"l": 322.30579, "t": 163.21234000000004, "r": 335.05988, "b": 167.99614999999994, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "</TD><TD>...", "bbox": {"l": 343.56155, "t": 163.21234000000004, "r": 374.73685, "b": 167.99614999999994, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "...", "bbox": {"l": 322.30579, "t": 170.35235999999998, "r": 326.55716, "b": 175.13617, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "1", "bbox": {"l": 323.51111, "t": 89.66967999999997, "r": 326.91382, "b": 95.41027999999994, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "2", "bbox": {"l": 323.71509, "t": 97.78887999999995, "r": 327.1178, "b": 103.52948000000004, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "3", "bbox": {"l": 323.71509, "t": 105.98969, "r": 327.1178, "b": 111.73029000000008, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "2", "bbox": {"l": 401.4816, "t": 148.54625999999996, "r": 404.88431, "b": 154.28687000000002, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "3", "bbox": {"l": 337.6976, "t": 162.68451000000005, "r": 341.10031, "b": 168.42511000000002, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "3", "bbox": {"l": 454.46378, "t": 104.54584, "r": 457.86648999999994, "b": 110.28644000000008, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "2", "bbox": {"l": 493.32580999999993, "t": 91.09546, "r": 496.72852, "b": 96.83605999999997, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "1", "bbox": {"l": 454.08298, 
"t": 90.56879000000015, "r": 457.48569000000003, "b": 96.30939000000001, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "caption", "bbox": {"l": 49.33068084716797, "t": 202.95303344726562, "r": 545.10846, "b": 225.3606414794922, "coord_origin": "TOPLEFT"}, "confidence": 0.9677655100822449, "cells": [{"id": 59, "text": "Figure 3:", "bbox": {"l": 50.112, "t": 204.10535000000004, "r": 86.883949, "b": 213.01189999999997, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "TableFormer", "bbox": {"l": 94.020996, "t": 203.98577999999998, "r": 149.85141, "b": 212.94214, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "takes in an image of the PDF and creates bounding box and HTML structure predictions that are", "bbox": {"l": 152.86099, "t": 204.10535000000004, "r": 545.10846, "b": 213.01189999999997, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure.", "bbox": {"l": 50.111992, "t": 216.06035999999995, "r": 436.0134, "b": 224.96691999999996, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "picture", "bbox": {"l": 53.03322982788086, "t": 257.6654052734375, "r": 285.3731689453125, "b": 507.6688232421875, "coord_origin": "TOPLEFT"}, "confidence": 0.9724959135055542, "cells": [{"id": 63, "text": "Input Image", "bbox": {"l": 74.253464, "t": 258.21472000000006, "r": 101.75846, "b": 264.17474000000004, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "Tokenised Tags", "bbox": {"l": 122.29972, "t": 258.34520999999995, "r": 157.83972, "b": 264.30524, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "Multi-Head Attention", "bbox": {"l": 78.549347, "t": 371.38579999999996, "r": 125.68359000000001, "b": 377.04782, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Add", "bbox": {"l": 78.513298, "t": 391.31857, "r": 84.644547, "b": 396.98059, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "& Normalisation", "bbox": {"l": 116.52705, "t": 391.31857, "r": 125.11079999999998, "b": 396.98059, 
"coord_origin": "TOPLEFT"}}, {"id": 68, "text": "Feed Forward Network", "bbox": {"l": 76.024773, "t": 424.45309, "r": 127.92327000000002, "b": 430.11511, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "Add", "bbox": {"l": 78.382828, "t": 444.88956, "r": 84.514076, "b": 450.55157, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "& Normalisation", "bbox": {"l": 116.39658, "t": 444.88956, "r": 124.98033, "b": 450.55157, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Linear", "bbox": {"l": 167.46945, "t": 462.44324, "r": 181.6292, "b": 468.10526, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "Softmax", "bbox": {"l": 165.61292, "t": 478.47107, "r": 184.43242, "b": 484.13309, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "CNN BACKBONE ENCODER", "bbox": {"l": 65.319511, "t": 324.26235999999994, "r": 132.9245, "b": 330.22235000000006, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "[30, 1, 2, 3, 4, \u2026 3, ", "bbox": {"l": 119.51457, "t": 269.66394, "r": 162.98782, "b": 274.72992, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "4, 5, 8, 31]", "bbox": {"l": 128.72858, "t": 274.91394, "r": 151.41083, "b": 279.97992, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Positional ", "bbox": {"l": 60.434211999999995, "t": 338.95993, "r": 80.27021, "b": 344.26993, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Encoding", "bbox": {"l": 60.598457, "t": 343.38605, "r": 78.854958, "b": 348.69604, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Positional ", "bbox": {"l": 134.82877, "t": 293.37762, "r": 154.66476, "b": 298.68762, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "Encoding", "bbox": {"l": 134.99303, "t": 297.80370999999997, "r": 153.24953, "b": 303.11371, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Add & Normalisation", "bbox": {"l": 150.55193, "t": 345.35861, "r": 197.14943, "b": 351.02063, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Add", "bbox": {"l": 150.55193, "t": 394.4234, "r": 156.68318, "b": 400.08542, "coord_origin": "TOPLEFT"}}, {"id": 
82, "text": "& Normalisation", "bbox": {"l": 188.56567, "t": 394.4234, "r": 197.14943, "b": 400.08542, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "Multi-Head Attention", "bbox": {"l": 150.18539, "t": 375.66843, "r": 197.31964, "b": 381.33044, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Add", "bbox": {"l": 150.55193, "t": 440.24847000000005, "r": 156.68318, "b": 445.91049, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "& Normalisation", "bbox": {"l": 188.56567, "t": 440.24847000000005, "r": 197.14943, "b": 445.91049, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Feed Forward Network", "bbox": {"l": 147.86377, "t": 422.09335, "r": 199.76227, "b": 427.75537, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "Linear", "bbox": {"l": 241.56567000000004, "t": 314.26285000000007, "r": 255.72542, "b": 319.92487, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "Linear", "bbox": {"l": 241.91730000000004, "t": 361.36493, "r": 256.07706, "b": 367.02695, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Attention", "bbox": {"l": 228.054, "t": 336.61929000000003, "r": 248.72363000000004, "b": 342.28131, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "Network", "bbox": {"l": 246.2919, "t": 336.61929000000003, "r": 269.39325, "b": 342.28131, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "MLP", "bbox": {"l": 228.44568000000004, "t": 405.14682, "r": 238.73892, "b": 410.80884, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Linear ", "bbox": {"l": 256.29767, "t": 405.2032500000001, "r": 271.77792, "b": 410.86526, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "Sigmoid", "bbox": {"l": 239.54543, "t": 382.21344, "r": 258.08942, "b": 387.87546, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Transformer Encoder Network", "bbox": {"l": 54.14704100000001, "t": 384.87183, "r": 59.51152, "b": 449.78326, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "x2", "bbox": {"l": 54.235424, "t": 373.81232, "r": 59.30449699999999, "b": 378.45421999999996, "coord_origin": "TOPLEFT"}}, 
{"id": 96, "text": "Encoded Output", "bbox": {"l": 85.295891, "t": 484.53189, "r": 122.16431, "b": 490.36688, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "Encoded Output", "bbox": {"l": 229.66599, "t": 279.54607999999996, "r": 265.3194, "b": 285.45572000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Predicted Tags", "bbox": {"l": 157.17369, "t": 500.3031, "r": 190.41711, "b": 506.12943, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Bounding Boxes & ", "bbox": {"l": 227.81598999999997, "t": 438.05542, "r": 270.78442, "b": 443.89206, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Classification", "bbox": {"l": 233.70262, "t": 444.06183, "r": 263.51105, "b": 449.8904999999999, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Transformer ", "bbox": {"l": 184.74655, "t": 293.39502, "r": 212.16055, "b": 298.75903, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "Decoder Network", "bbox": {"l": 178.91229, "t": 299.14502, "r": 216.74378999999996, "b": 304.50903, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "x4", "bbox": {"l": 194.24574, "t": 282.7822, "r": 198.89099, "b": 287.84817999999996, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "CELL BBOX DECODER", "bbox": {"l": 221.45587, "t": 271.86914, "r": 276.47089, "b": 277.82916, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "Masked Multi-Head ", "bbox": {"l": 151.65219, "t": 323.44241, "r": 197.29019, "b": 329.10443, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "Attention", "bbox": {"l": 163.43277, "t": 329.44241, "r": 184.19028, "b": 335.10443, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "caption", "bbox": {"l": 49.31888961791992, "t": 527.1819458007812, "r": 286.778076171875, "b": 680.416748046875, "coord_origin": "TOPLEFT"}, "confidence": 0.8913402557373047, "cells": [{"id": 107, "text": "Figure 4: Given an input image of a table, the", "bbox": {"l": 50.112, "t": 527.90237, "r": 229.78752, "b": 536.80893, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Encoder", "bbox": {"l": 
231.787, "t": 527.7828099999999, "r": 267.76196, "b": 536.7392, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "pro-", "bbox": {"l": 269.76401, "t": 527.90237, "r": 286.36169, "b": 536.80893, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "duces fixed-length features that represent the input image.", "bbox": {"l": 50.112015, "t": 539.85738, "r": 286.36508, "b": 548.76393, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "The features are then passed to both the", "bbox": {"l": 50.112015, "t": 551.81337, "r": 205.84735, "b": 560.71992, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "Structure Decoder", "bbox": {"l": 208.01802, "t": 551.69382, "r": 286.36392, "b": 560.6501900000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "and", "bbox": {"l": 50.112015, "t": 563.76837, "r": 64.498009, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Cell BBox Decoder", "bbox": {"l": 68.165016, "t": 563.64882, "r": 151.31288, "b": 572.60519, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": ".", "bbox": {"l": 151.31302, "t": 563.76837, "r": 153.80367, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "During training, the", "bbox": {"l": 160.41884, "t": 563.76837, "r": 241.93283000000002, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Structure", "bbox": {"l": 245.59502, "t": 563.64882, "r": 286.362, "b": 572.60519, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Decoder", "bbox": {"l": 50.112015, "t": 575.60382, "r": 85.519089, "b": 584.5602, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "receives \u2018tokenized tags\u2019 of the HTML code that", "bbox": {"l": 88.623016, "t": 575.7233699999999, "r": 286.36072, "b": 584.6299300000001, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "represent the table structure. 
Afterwards, a transformer en-", "bbox": {"l": 50.112015, "t": 587.6783800000001, "r": 286.36511, "b": 596.58493, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "coder and decoder architecture is employed to produce fea-", "bbox": {"l": 50.112015, "t": 599.63338, "r": 286.36508, "b": 608.53993, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "tures that are received by a linear layer, and the", "bbox": {"l": 50.112015, "t": 611.58838, "r": 240.43756000000002, "b": 620.4949300000001, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "Cell BBox", "bbox": {"l": 243.19801, "t": 611.46883, "r": 286.36597, "b": 620.4252, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "Decoder. The linear layer is applied to the features to", "bbox": {"l": 50.112015, "t": 623.42482, "r": 286.36511, "b": 632.3812, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "predict the tags. Simultaneously, the Cell BBox Decoder", "bbox": {"l": 50.112015, "t": 635.37982, "r": 286.36508, "b": 644.3362, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "selects features referring to the data cells (\u2018", "bbox": {"l": 50.112015, "t": 647.45438, "r": 220.58205, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "<", "bbox": {"l": 220.57802000000004, "t": 647.29497, "r": 228.32693, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "td", "bbox": {"l": 228.32700999999997, "t": 647.45438, "r": 236.07791000000003, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": ">", "bbox": {"l": 236.07802000000004, "t": 647.29497, "r": 243.82693, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "\u2019, \u2018", "bbox": {"l": 243.82602, "t": 647.45438, "r": 255.29298000000003, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "<", "bbox": {"l": 255.29102000000003, "t": 647.29497, "r": 263.03992, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "\u2019) and", "bbox": {"l": 263.04001, "t": 647.45438, "r": 286.36246, "b": 
656.36093, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "passes them through an attention network, an MLP, and a", "bbox": {"l": 50.112015, "t": 659.40938, "r": 286.36511, "b": 668.31594, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "linear layer to predict the bounding boxes.", "bbox": {"l": 50.112015, "t": 671.36438, "r": 218.46996, "b": 680.27094, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "text", "bbox": {"l": 308.1139831542969, "t": 248.6479034423828, "r": 545.3682861328125, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9831458330154419, "cells": [{"id": 135, "text": "forming classification, and adding an adaptive pooling", "bbox": {"l": 308.862, "t": 249.53441999999995, "r": 523.05786, "b": 258.44097999999997, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "layer", "bbox": {"l": 525.19983, "t": 249.53441999999995, "r": 545.11505, "b": 258.44097999999997, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "of size 28*28. ResNet by default downsamples the", "bbox": {"l": 308.862, "t": 261.49042, "r": 517.55847, "b": 270.39697, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "image", "bbox": {"l": 520.76642, "t": 261.49042, "r": 545.11499, "b": 270.39697, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "resolution by 32 and then the encoded image is provided", "bbox": {"l": 308.862, "t": 273.44537, "r": 534.80377, "b": 282.35196, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "to", "bbox": {"l": 537.36414, "t": 273.44537, "r": 545.11505, "b": 282.35196, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "both the", "bbox": {"l": 308.862, "t": 285.40039, "r": 341.24045, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "Structure Decoder", "bbox": {"l": 343.73099, "t": 285.49005, "r": 417.23508, "b": 294.07782000000003, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": ", and", "bbox": {"l": 417.23398, "t": 285.40039, "r": 436.60129, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}, {"id": 144, 
"text": "Cell BBox Decoder", "bbox": {"l": 439.09198, "t": 285.49005, "r": 516.56116, "b": 294.07782000000003, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": ".", "bbox": {"l": 516.56097, "t": 285.40039, "r": 519.05164, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "text", "bbox": {"l": 308.0434265136719, "t": 296.45684814453125, "r": 545.462646484375, "b": 414.4779052734375, "coord_origin": "TOPLEFT"}, "confidence": 0.9862009882926941, "cells": [{"id": 146, "text": "Structure Decoder.", "bbox": {"l": 320.81696, "t": 297.33981, "r": 400.86649, "b": 306.2962, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "The transformer architecture of", "bbox": {"l": 403.91394, "t": 297.45938, "r": 528.33685, "b": 306.36594, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "this", "bbox": {"l": 530.7179, "t": 297.45938, "r": 545.11383, "b": 306.36594, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "component is based on the work proposed in [31].", "bbox": {"l": 308.86194, "t": 309.41437, "r": 517.5285, "b": 318.32092, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "After", "bbox": {"l": 524.09387, "t": 309.41437, "r": 545.11493, "b": 318.32092, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "extensive experimentation, the", "bbox": {"l": 308.86194, "t": 321.36934999999994, "r": 432.35833999999994, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Structure Decoder", "bbox": {"l": 435.81995000000006, "t": 321.45901, "r": 510.29041, "b": 330.04678, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "is", "bbox": {"l": 513.97797, "t": 321.36934999999994, "r": 520.62305, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "mod-", "bbox": {"l": 524.08008, "t": 321.36934999999994, "r": 545.11115, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "eled as a transformer encoder with two encoder layers", "bbox": {"l": 308.86197, "t": 333.32434, "r": 527.76013, "b": 342.2309, "coord_origin": "TOPLEFT"}}, 
{"id": 156, "text": "and", "bbox": {"l": 530.729, "t": 333.32434, "r": 545.11499, "b": 342.2309, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "a transformer decoder made from a stack of 4 decoder", "bbox": {"l": 308.86197, "t": 345.27933, "r": 526.85352, "b": 354.18588, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "lay-", "bbox": {"l": 529.62311, "t": 345.27933, "r": 545.11493, "b": 354.18588, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "ers that comprise mainly of multi-head attention and", "bbox": {"l": 308.86197, "t": 357.23532, "r": 524.51245, "b": 366.14188, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "feed", "bbox": {"l": 527.96948, "t": 357.23532, "r": 545.11511, "b": 366.14188, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "forward layers.", "bbox": {"l": 308.86197, "t": 369.19031000000007, "r": 370.39096, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "This configuration uses fewer layers", "bbox": {"l": 377.44449, "t": 369.19031000000007, "r": 526.91339, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "and", "bbox": {"l": 530.72906, "t": 369.19031000000007, "r": 545.11505, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "heads in comparison to networks applied to other", "bbox": {"l": 308.86197, "t": 381.14529000000005, "r": 505.46395999999993, "b": 390.05185, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "problems", "bbox": {"l": 508.03430000000003, "t": 381.14529000000005, "r": 545.11511, "b": 390.05185, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "(e.g. 
\u201cScene Understanding\u201d, \u201cImage Captioning\u201d),", "bbox": {"l": 308.86197, "t": 393.10028, "r": 517.68799, "b": 402.00684, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "some-", "bbox": {"l": 520.76642, "t": 393.10028, "r": 545.11499, "b": 402.00684, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "thing which we relate to the simplicity of table images.", "bbox": {"l": 308.86197, "t": 405.05526999999995, "r": 528.01935, "b": 413.96182, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "text", "bbox": {"l": 307.9245300292969, "t": 416.45196533203125, "r": 545.5032348632812, "b": 545.57271, "coord_origin": "TOPLEFT"}, "confidence": 0.9851906895637512, "cells": [{"id": 169, "text": "The transformer encoder receives an encoded", "bbox": {"l": 320.81696, "t": 417.11426, "r": 515.49609, "b": 426.02081, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "image", "bbox": {"l": 520.7663, "t": 417.11426, "r": 545.11487, "b": 426.02081, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "from the", "bbox": {"l": 308.86197, "t": 429.0692399999999, "r": 343.72107, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "CNN Backbone Network", "bbox": {"l": 347.03796, "t": 429.15891, "r": 446.45471000000003, "b": 437.74667, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "and refines it", "bbox": {"l": 449.93996999999996, "t": 429.0692399999999, "r": 503.06055000000003, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "through", "bbox": {"l": 506.37808, "t": 429.0692399999999, "r": 537.3717, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "a", "bbox": {"l": 540.68927, "t": 429.0692399999999, "r": 545.11267, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "multi-head dot-product attention layer, followed by a", "bbox": {"l": 308.86197, "t": 441.02423, "r": 522.78894, "b": 449.93079, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "Feed", "bbox": {"l": 525.7478, "t": 441.02423, "r": 545.11511, "b": 
449.93079, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "Forward Network.", "bbox": {"l": 308.86197, "t": 452.97922, "r": 384.14929, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "During training, the transformer", "bbox": {"l": 393.37466, "t": 452.97922, "r": 527.84985, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "de-", "bbox": {"l": 532.39282, "t": 452.97922, "r": 545.11505, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "coder receives as input the output feature produced by", "bbox": {"l": 308.86197, "t": 464.93521, "r": 529.7627, "b": 473.84177, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "the", "bbox": {"l": 532.94073, "t": 464.93521, "r": 545.11505, "b": 473.84177, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "transformer encoder, and the tokenized input of the", "bbox": {"l": 308.86197, "t": 476.8902, "r": 514.17126, "b": 485.79675, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "HTML", "bbox": {"l": 516.89105, "t": 476.8902, "r": 545.11511, "b": 485.79675, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "ground-truth tags. 
Using a stack of multi-head attention", "bbox": {"l": 308.86197, "t": 488.84518, "r": 527.63068, "b": 497.75174, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "lay-", "bbox": {"l": 529.62317, "t": 488.84518, "r": 545.11499, "b": 497.75174, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "ers, different aspects of the tag sequence could be", "bbox": {"l": 308.86197, "t": 500.80017, "r": 508.3630999999999, "b": 509.70673, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "inferred.", "bbox": {"l": 511.09286000000003, "t": 500.80017, "r": 545.11511, "b": 509.70673, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "This is achieved by each attention head on a layer operating", "bbox": {"l": 308.86197, "t": 512.7551599999999, "r": 545.11499, "b": 521.6617100000001, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "in a different subspace, and then combining altogether their", "bbox": {"l": 308.86197, "t": 524.71115, "r": 545.11511, "b": 533.61771, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "attention score.", "bbox": {"l": 308.86197, "t": 536.66615, "r": 369.73349, "b": 545.57271, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "text", "bbox": {"l": 307.90594482421875, "t": 547.4575805664062, "r": 545.403076171875, "b": 653.4934692382812, "coord_origin": "TOPLEFT"}, "confidence": 0.9869197010993958, "cells": [{"id": 192, "text": "Cell BBox Decoder.", "bbox": {"l": 320.81696, "t": 548.6046, "r": 404.76184, "b": 557.56097, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "Our architecture allows to simul-", "bbox": {"l": 410.34094, "t": 548.72415, "r": 545.11505, "b": 557.63071, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "taneously predict HTML tags and bounding boxes for each", "bbox": {"l": 308.86194, "t": 560.68015, "r": 545.11493, "b": 569.5867000000001, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "table cell without the need of a separate object detector end", "bbox": {"l": 308.86194, "t": 572.6351500000001, "r": 545.11511, "b": 581.5417, 
"coord_origin": "TOPLEFT"}}, {"id": 196, "text": "to end. This approach is inspired by DETR [1] which em-", "bbox": {"l": 308.86194, "t": 584.59015, "r": 545.11493, "b": 593.4967, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "ploys a Transformer Encoder, and Decoder that looks for", "bbox": {"l": 308.86194, "t": 596.54515, "r": 545.11499, "b": 605.45171, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "a specific number of object queries (potential object detec-", "bbox": {"l": 308.86194, "t": 608.50015, "r": 545.11505, "b": 617.40671, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "tions). As our model utilizes a transformer architecture, the", "bbox": {"l": 308.86194, "t": 620.45515, "r": 545.11505, "b": 629.36171, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "hidden state of the", "bbox": {"l": 308.86194, "t": 632.41115, "r": 381.67859, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "<", "bbox": {"l": 383.99695, "t": 632.25174, "r": 391.74585, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "td", "bbox": {"l": 391.74594, "t": 632.41115, "r": 399.49686, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": ">", "bbox": {"l": 399.49695, "t": 632.25174, "r": 407.24585, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "\u2019 and \u2018", "bbox": {"l": 407.24594, "t": 632.41115, "r": 432.90958, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "<", "bbox": {"l": 432.90792999999996, "t": 632.25174, "r": 440.65683000000007, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "\u2019 HTML structure tags be-", "bbox": {"l": 440.65691999999996, "t": 632.41115, "r": 545.11475, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "come the object query.", "bbox": {"l": 308.86194, "t": 644.3661500000001, "r": 398.96371, "b": 653.27271, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "text", "bbox": {"l": 307.9397277832031, "t": 
655.742919921875, "r": 545.2218627929688, "b": 713.3260498046875, "coord_origin": "TOPLEFT"}, "confidence": 0.9852352142333984, "cells": [{"id": 208, "text": "The encoding generated by the", "bbox": {"l": 320.81693, "t": 656.42516, "r": 444.34316999999993, "b": 665.33172, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "CNN Backbone Network", "bbox": {"l": 447.00591999999995, "t": 656.51482, "r": 545.1076, "b": 665.10258, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "along with the features acquired for every data cell from the", "bbox": {"l": 308.86194, "t": 668.38016, "r": 545.11505, "b": 677.2867200000001, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "Transformer Decoder are then passed to the attention net-", "bbox": {"l": 308.86194, "t": 680.33516, "r": 545.11505, "b": 689.24172, "coord_origin": "TOPLEFT"}}, {"id": 212, "text": "work. The attention network takes both inputs and learns to", "bbox": {"l": 308.86194, "t": 692.290161, "r": 545.11505, "b": 701.196724, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "provide an attention weighted encoding. 
This weighted at-", "bbox": {"l": 308.86194, "t": 704.245163, "r": 545.11505, "b": 713.151726, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "page_footer", "bbox": {"l": 294.5858459472656, "t": 733.3272094726562, "r": 300.10223, "b": 743.039722, "coord_origin": "TOPLEFT"}, "confidence": 0.8719565868377686, "cells": [{"id": 214, "text": "5", "bbox": {"l": 295.12094, "t": 734.13316, "r": 300.10223, "b": 743.039722, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "picture", "id": 0, "page_no": 4, "cluster": {"id": 0, "label": "picture", "bbox": {"l": 74.30538940429688, "t": 77.91118621826172, "r": 519.9801025390625, "b": 183.99445000000003, "coord_origin": "TOPLEFT"}, "confidence": 0.9296938180923462, "cells": [{"id": 0, "text": "1.", "bbox": {"l": 81.688072, "t": 122.43970000000002, "r": 84.927567, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Item", "bbox": {"l": 86.54731, "t": 122.43970000000002, "r": 93.026291, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "Amount", "bbox": {"l": 102.50498, "t": 115.25214000000005, "r": 115.3461, "b": 118.44135000000006, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Names", "bbox": {"l": 82.140205, "t": 115.21489999999994, "r": 93.291527, "b": 118.40410999999995, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "1000", "bbox": {"l": 96.748268, "t": 122.43970000000002, "r": 104.3119, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "500", "bbox": {"l": 96.748268, "t": 127.74370999999985, "r": 102.42083, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "3500", "bbox": {"l": 96.748268, "t": 133.45569, "r": 104.3119, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "150", "bbox": {"l": 96.748268, "t": 139.16772000000003, "r": 102.42083, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 8, 
"text": "unit", "bbox": {"l": 110.66107, "t": 122.43970000000002, "r": 116.14391, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "unit", "bbox": {"l": 110.66107, "t": 127.74370999999985, "r": 116.14391, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "unit", "bbox": {"l": 110.66107, "t": 133.45569, "r": 116.14391, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "unit", "bbox": {"l": 110.66107, "t": 139.16772000000003, "r": 116.14391, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "2.", "bbox": {"l": 81.688072, "t": 127.74370999999985, "r": 84.927567, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Item", "bbox": {"l": 86.54731, "t": 127.74370999999985, "r": 93.026291, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "3.", "bbox": {"l": 81.688072, "t": 133.45569, "r": 84.927567, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Item", "bbox": {"l": 86.54731, "t": 133.45569, "r": 93.026291, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "4.", "bbox": {"l": 81.688072, "t": 139.16772000000003, "r": 84.927567, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "Item", "bbox": {"l": 86.54731, "t": 139.16772000000003, "r": 93.026291, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "Extracted", "bbox": {"l": 88.084389, "t": 90.49738000000002, "r": 113.93649, "b": 96.23798, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Table Images", "bbox": {"l": 82.81002, "t": 97.63738999999998, "r": 119.21240000000002, "b": 103.37798999999995, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Standardized", "bbox": {"l": 143.94247, "t": 100.60235999999998, "r": 180.01131, "b": 106.34295999999995, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Images", "bbox": {"l": 151.94064, "t": 107.74237000000005, "r": 172.0118, "b": 113.48297000000014, "coord_origin": "TOPLEFT"}}, 
{"id": 22, "text": "BBox", "bbox": {"l": 251.76939000000002, "t": 80.93096999999989, "r": 266.39557, "b": 86.67156999999997, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "Decoder", "bbox": {"l": 247.51601, "t": 86.03101000000004, "r": 270.65021, "b": 91.77161000000001, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "BBoxes", "bbox": {"l": 331.03699, "t": 78.55980999999997, "r": 352.12589, "b": 84.30042000000003, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "BBoxes can be", "bbox": {"l": 390.56421, "t": 96.03223000000003, "r": 431.7261, "b": 101.77282999999989, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "traced back to the", "bbox": {"l": 386.82422, "t": 102.15228000000013, "r": 435.46966999999995, "b": 107.89287999999999, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "original image to", "bbox": {"l": 388.69589, "t": 108.27228000000002, "r": 433.6032400000001, "b": 114.01288000000011, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "extract content", "bbox": {"l": 391.07761, "t": 114.39227000000005, "r": 431.22542999999996, "b": 120.13286999999991, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Structure Tags sequence", "bbox": {"l": 431.22650000000004, "t": 151.68511999999998, "r": 498.82068, "b": 157.42571999999996, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "provide full description of", "bbox": {"l": 431.1738, "t": 157.80517999999995, "r": 498.87753000000004, "b": 163.54578000000004, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "the table structure", "bbox": {"l": 440.5289, "t": 163.92516999999998, "r": 489.51827999999995, "b": 169.66576999999995, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Structure Tags", "bbox": {"l": 328.37479, "t": 178.25385000000006, "r": 367.72333, "b": 183.99445000000003, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "BBoxes in sync", "bbox": {"l": 331.84451, "t": 123.90886999999998, "r": 373.67963, "b": 129.64948000000015, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "with tag sequence", "bbox": {"l": 
331.84451, "t": 129.00885000000017, "r": 381.17786, "b": 134.74945000000002, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "Encoder", "bbox": {"l": 196.62633, "t": 88.11621000000002, "r": 219.42332, "b": 93.85681, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Structure", "bbox": {"l": 246.66771, "t": 129.4946900000001, "r": 271.49899, "b": 135.23528999999996, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Decoder", "bbox": {"l": 247.51601, "t": 134.59473000000003, "r": 270.65021, "b": 140.33533, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "[x1, y2, x2, y2]", "bbox": {"l": 330.63071, "t": 89.01923, "r": 365.55347, "b": 94.75982999999997, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "[x1', y2', x2', y2']", "bbox": {"l": 330.63071, "t": 97.17926, "r": 370.22717, "b": 102.91985999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "[x1'', y2'', x2'', y2'']", "bbox": {"l": 330.63071, "t": 105.33922999999993, "r": 374.51157, "b": 111.07983000000002, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "...", "bbox": {"l": 330.63071, "t": 113.49926999999991, "r": 335.73233, "b": 119.23987, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "<TR>", "bbox": {"l": 322.30579, "t": 141.79236000000003, "r": 335.05988, "b": 146.57617000000005, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "<TD>", "bbox": {"l": 322.30579, "t": 148.93231000000003, "r": 335.05988, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "1", "bbox": {"l": 337.54971, "t": 148.55579, "r": 340.95242, "b": 154.29638999999997, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "</TD><TD colspan=\"2\">", "bbox": {"l": 343.56262, "t": 148.93231000000003, "r": 398.91446, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "</TD>", "bbox": {"l": 407.41718, "t": 148.93231000000003, "r": 421.58801, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "</TR><TR>", "bbox": {"l": 322.30579, "t": 156.07232999999997, "r": 349.23022, "b": 
160.85613999999998, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "<TD>", "bbox": {"l": 322.30579, "t": 163.21234000000004, "r": 335.05988, "b": 167.99614999999994, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "</TD><TD>...", "bbox": {"l": 343.56155, "t": 163.21234000000004, "r": 374.73685, "b": 167.99614999999994, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "...", "bbox": {"l": 322.30579, "t": 170.35235999999998, "r": 326.55716, "b": 175.13617, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "1", "bbox": {"l": 323.51111, "t": 89.66967999999997, "r": 326.91382, "b": 95.41027999999994, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "2", "bbox": {"l": 323.71509, "t": 97.78887999999995, "r": 327.1178, "b": 103.52948000000004, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "3", "bbox": {"l": 323.71509, "t": 105.98969, "r": 327.1178, "b": 111.73029000000008, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "2", "bbox": {"l": 401.4816, "t": 148.54625999999996, "r": 404.88431, "b": 154.28687000000002, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "3", "bbox": {"l": 337.6976, "t": 162.68451000000005, "r": 341.10031, "b": 168.42511000000002, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "3", "bbox": {"l": 454.46378, "t": 104.54584, "r": 457.86648999999994, "b": 110.28644000000008, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "2", "bbox": {"l": 493.32580999999993, "t": 91.09546, "r": 496.72852, "b": 96.83605999999997, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "1", "bbox": {"l": 454.08298, "t": 90.56879000000015, "r": 457.48569000000003, "b": 96.30939000000001, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", "id": 1, "page_no": 4, "cluster": {"id": 1, "label": "caption", "bbox": {"l": 49.33068084716797, "t": 202.95303344726562, "r": 545.10846, "b": 225.3606414794922, "coord_origin": "TOPLEFT"}, "confidence": 0.9677655100822449, "cells": [{"id": 59, "text": 
"Figure 3:", "bbox": {"l": 50.112, "t": 204.10535000000004, "r": 86.883949, "b": 213.01189999999997, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "TableFormer", "bbox": {"l": 94.020996, "t": 203.98577999999998, "r": 149.85141, "b": 212.94214, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "takes in an image of the PDF and creates bounding box and HTML structure predictions that are", "bbox": {"l": 152.86099, "t": 204.10535000000004, "r": 545.10846, "b": 213.01189999999997, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure.", "bbox": {"l": 50.111992, "t": 216.06035999999995, "r": 436.0134, "b": 224.96691999999996, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure."}, {"label": "picture", "id": 2, "page_no": 4, "cluster": {"id": 2, "label": "picture", "bbox": {"l": 53.03322982788086, "t": 257.6654052734375, "r": 285.3731689453125, "b": 507.6688232421875, "coord_origin": "TOPLEFT"}, "confidence": 0.9724959135055542, "cells": [{"id": 63, "text": "Input Image", "bbox": {"l": 74.253464, "t": 258.21472000000006, "r": 101.75846, "b": 264.17474000000004, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "Tokenised Tags", "bbox": {"l": 122.29972, "t": 258.34520999999995, "r": 157.83972, "b": 264.30524, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "Multi-Head Attention", "bbox": {"l": 78.549347, "t": 371.38579999999996, "r": 125.68359000000001, "b": 377.04782, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Add", "bbox": {"l": 78.513298, "t": 391.31857, "r": 84.644547, "b": 396.98059, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "& Normalisation", "bbox": {"l": 116.52705, "t": 391.31857, "r": 125.11079999999998, "b": 396.98059, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": 
"Feed Forward Network", "bbox": {"l": 76.024773, "t": 424.45309, "r": 127.92327000000002, "b": 430.11511, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "Add", "bbox": {"l": 78.382828, "t": 444.88956, "r": 84.514076, "b": 450.55157, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "& Normalisation", "bbox": {"l": 116.39658, "t": 444.88956, "r": 124.98033, "b": 450.55157, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Linear", "bbox": {"l": 167.46945, "t": 462.44324, "r": 181.6292, "b": 468.10526, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "Softmax", "bbox": {"l": 165.61292, "t": 478.47107, "r": 184.43242, "b": 484.13309, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "CNN BACKBONE ENCODER", "bbox": {"l": 65.319511, "t": 324.26235999999994, "r": 132.9245, "b": 330.22235000000006, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "[30, 1, 2, 3, 4, \u2026 3, ", "bbox": {"l": 119.51457, "t": 269.66394, "r": 162.98782, "b": 274.72992, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "4, 5, 8, 31]", "bbox": {"l": 128.72858, "t": 274.91394, "r": 151.41083, "b": 279.97992, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Positional ", "bbox": {"l": 60.434211999999995, "t": 338.95993, "r": 80.27021, "b": 344.26993, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Encoding", "bbox": {"l": 60.598457, "t": 343.38605, "r": 78.854958, "b": 348.69604, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Positional ", "bbox": {"l": 134.82877, "t": 293.37762, "r": 154.66476, "b": 298.68762, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "Encoding", "bbox": {"l": 134.99303, "t": 297.80370999999997, "r": 153.24953, "b": 303.11371, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Add & Normalisation", "bbox": {"l": 150.55193, "t": 345.35861, "r": 197.14943, "b": 351.02063, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Add", "bbox": {"l": 150.55193, "t": 394.4234, "r": 156.68318, "b": 400.08542, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "& Normalisation", "bbox": {"l": 
188.56567, "t": 394.4234, "r": 197.14943, "b": 400.08542, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "Multi-Head Attention", "bbox": {"l": 150.18539, "t": 375.66843, "r": 197.31964, "b": 381.33044, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Add", "bbox": {"l": 150.55193, "t": 440.24847000000005, "r": 156.68318, "b": 445.91049, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "& Normalisation", "bbox": {"l": 188.56567, "t": 440.24847000000005, "r": 197.14943, "b": 445.91049, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Feed Forward Network", "bbox": {"l": 147.86377, "t": 422.09335, "r": 199.76227, "b": 427.75537, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "Linear", "bbox": {"l": 241.56567000000004, "t": 314.26285000000007, "r": 255.72542, "b": 319.92487, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "Linear", "bbox": {"l": 241.91730000000004, "t": 361.36493, "r": 256.07706, "b": 367.02695, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Attention", "bbox": {"l": 228.054, "t": 336.61929000000003, "r": 248.72363000000004, "b": 342.28131, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "Network", "bbox": {"l": 246.2919, "t": 336.61929000000003, "r": 269.39325, "b": 342.28131, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "MLP", "bbox": {"l": 228.44568000000004, "t": 405.14682, "r": 238.73892, "b": 410.80884, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Linear ", "bbox": {"l": 256.29767, "t": 405.2032500000001, "r": 271.77792, "b": 410.86526, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "Sigmoid", "bbox": {"l": 239.54543, "t": 382.21344, "r": 258.08942, "b": 387.87546, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Transformer Encoder Network", "bbox": {"l": 54.14704100000001, "t": 384.87183, "r": 59.51152, "b": 449.78326, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "x2", "bbox": {"l": 54.235424, "t": 373.81232, "r": 59.30449699999999, "b": 378.45421999999996, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Encoded Output", "bbox": 
{"l": 85.295891, "t": 484.53189, "r": 122.16431, "b": 490.36688, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "Encoded Output", "bbox": {"l": 229.66599, "t": 279.54607999999996, "r": 265.3194, "b": 285.45572000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Predicted Tags", "bbox": {"l": 157.17369, "t": 500.3031, "r": 190.41711, "b": 506.12943, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Bounding Boxes & ", "bbox": {"l": 227.81598999999997, "t": 438.05542, "r": 270.78442, "b": 443.89206, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Classification", "bbox": {"l": 233.70262, "t": 444.06183, "r": 263.51105, "b": 449.8904999999999, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Transformer ", "bbox": {"l": 184.74655, "t": 293.39502, "r": 212.16055, "b": 298.75903, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "Decoder Network", "bbox": {"l": 178.91229, "t": 299.14502, "r": 216.74378999999996, "b": 304.50903, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "x4", "bbox": {"l": 194.24574, "t": 282.7822, "r": 198.89099, "b": 287.84817999999996, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "CELL BBOX DECODER", "bbox": {"l": 221.45587, "t": 271.86914, "r": 276.47089, "b": 277.82916, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "Masked Multi-Head ", "bbox": {"l": 151.65219, "t": 323.44241, "r": 197.29019, "b": 329.10443, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "Attention", "bbox": {"l": 163.43277, "t": 329.44241, "r": 184.19028, "b": 335.10443, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", "id": 3, "page_no": 4, "cluster": {"id": 3, "label": "caption", "bbox": {"l": 49.31888961791992, "t": 527.1819458007812, "r": 286.778076171875, "b": 680.416748046875, "coord_origin": "TOPLEFT"}, "confidence": 0.8913402557373047, "cells": [{"id": 107, "text": "Figure 4: Given an input image of a table, the", "bbox": {"l": 50.112, "t": 527.90237, "r": 
229.78752, "b": 536.80893, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Encoder", "bbox": {"l": 231.787, "t": 527.7828099999999, "r": 267.76196, "b": 536.7392, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "pro-", "bbox": {"l": 269.76401, "t": 527.90237, "r": 286.36169, "b": 536.80893, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "duces fixed-length features that represent the input image.", "bbox": {"l": 50.112015, "t": 539.85738, "r": 286.36508, "b": 548.76393, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "The features are then passed to both the", "bbox": {"l": 50.112015, "t": 551.81337, "r": 205.84735, "b": 560.71992, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "Structure Decoder", "bbox": {"l": 208.01802, "t": 551.69382, "r": 286.36392, "b": 560.6501900000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "and", "bbox": {"l": 50.112015, "t": 563.76837, "r": 64.498009, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Cell BBox Decoder", "bbox": {"l": 68.165016, "t": 563.64882, "r": 151.31288, "b": 572.60519, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": ".", "bbox": {"l": 151.31302, "t": 563.76837, "r": 153.80367, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "During training, the", "bbox": {"l": 160.41884, "t": 563.76837, "r": 241.93283000000002, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Structure", "bbox": {"l": 245.59502, "t": 563.64882, "r": 286.362, "b": 572.60519, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Decoder", "bbox": {"l": 50.112015, "t": 575.60382, "r": 85.519089, "b": 584.5602, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "receives \u2018tokenized tags\u2019 of the HTML code that", "bbox": {"l": 88.623016, "t": 575.7233699999999, "r": 286.36072, "b": 584.6299300000001, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "represent the table structure. 
Afterwards, a transformer en-", "bbox": {"l": 50.112015, "t": 587.6783800000001, "r": 286.36511, "b": 596.58493, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "coder and decoder architecture is employed to produce fea-", "bbox": {"l": 50.112015, "t": 599.63338, "r": 286.36508, "b": 608.53993, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "tures that are received by a linear layer, and the", "bbox": {"l": 50.112015, "t": 611.58838, "r": 240.43756000000002, "b": 620.4949300000001, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "Cell BBox", "bbox": {"l": 243.19801, "t": 611.46883, "r": 286.36597, "b": 620.4252, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "Decoder. The linear layer is applied to the features to", "bbox": {"l": 50.112015, "t": 623.42482, "r": 286.36511, "b": 632.3812, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "predict the tags. Simultaneously, the Cell BBox Decoder", "bbox": {"l": 50.112015, "t": 635.37982, "r": 286.36508, "b": 644.3362, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "selects features referring to the data cells (\u2018", "bbox": {"l": 50.112015, "t": 647.45438, "r": 220.58205, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "<", "bbox": {"l": 220.57802000000004, "t": 647.29497, "r": 228.32693, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "td", "bbox": {"l": 228.32700999999997, "t": 647.45438, "r": 236.07791000000003, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": ">", "bbox": {"l": 236.07802000000004, "t": 647.29497, "r": 243.82693, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "\u2019, \u2018", "bbox": {"l": 243.82602, "t": 647.45438, "r": 255.29298000000003, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "<", "bbox": {"l": 255.29102000000003, "t": 647.29497, "r": 263.03992, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "\u2019) and", "bbox": {"l": 263.04001, "t": 647.45438, "r": 286.36246, "b": 
656.36093, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "passes them through an attention network, an MLP, and a", "bbox": {"l": 50.112015, "t": 659.40938, "r": 286.36511, "b": 668.31594, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "linear layer to predict the bounding boxes.", "bbox": {"l": 50.112015, "t": 671.36438, "r": 218.46996, "b": 680.27094, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives \u2018tokenized tags\u2019 of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (\u2018 < td > \u2019, \u2018 < \u2019) and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes."}, {"label": "text", "id": 4, "page_no": 4, "cluster": {"id": 4, "label": "text", "bbox": {"l": 308.1139831542969, "t": 248.6479034423828, "r": 545.3682861328125, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9831458330154419, "cells": [{"id": 135, "text": "forming classification, and adding an adaptive pooling", "bbox": {"l": 308.862, "t": 249.53441999999995, "r": 523.05786, "b": 258.44097999999997, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "layer", "bbox": {"l": 525.19983, "t": 249.53441999999995, "r": 545.11505, "b": 258.44097999999997, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "of size 28*28. 
ResNet by default downsamples the", "bbox": {"l": 308.862, "t": 261.49042, "r": 517.55847, "b": 270.39697, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "image", "bbox": {"l": 520.76642, "t": 261.49042, "r": 545.11499, "b": 270.39697, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "resolution by 32 and then the encoded image is provided", "bbox": {"l": 308.862, "t": 273.44537, "r": 534.80377, "b": 282.35196, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "to", "bbox": {"l": 537.36414, "t": 273.44537, "r": 545.11505, "b": 282.35196, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "both the", "bbox": {"l": 308.862, "t": 285.40039, "r": 341.24045, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "Structure Decoder", "bbox": {"l": 343.73099, "t": 285.49005, "r": 417.23508, "b": 294.07782000000003, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": ", and", "bbox": {"l": 417.23398, "t": 285.40039, "r": 436.60129, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "Cell BBox Decoder", "bbox": {"l": 439.09198, "t": 285.49005, "r": 516.56116, "b": 294.07782000000003, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": ".", "bbox": {"l": 516.56097, "t": 285.40039, "r": 519.05164, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}]}, "text": "forming classification, and adding an adaptive pooling layer of size 28*28. 
ResNet by default downsamples the image resolution by 32 and then the encoded image is provided to both the Structure Decoder , and Cell BBox Decoder ."}, {"label": "text", "id": 5, "page_no": 4, "cluster": {"id": 5, "label": "text", "bbox": {"l": 308.0434265136719, "t": 296.45684814453125, "r": 545.462646484375, "b": 414.4779052734375, "coord_origin": "TOPLEFT"}, "confidence": 0.9862009882926941, "cells": [{"id": 146, "text": "Structure Decoder.", "bbox": {"l": 320.81696, "t": 297.33981, "r": 400.86649, "b": 306.2962, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "The transformer architecture of", "bbox": {"l": 403.91394, "t": 297.45938, "r": 528.33685, "b": 306.36594, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "this", "bbox": {"l": 530.7179, "t": 297.45938, "r": 545.11383, "b": 306.36594, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "component is based on the work proposed in [31].", "bbox": {"l": 308.86194, "t": 309.41437, "r": 517.5285, "b": 318.32092, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "After", "bbox": {"l": 524.09387, "t": 309.41437, "r": 545.11493, "b": 318.32092, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "extensive experimentation, the", "bbox": {"l": 308.86194, "t": 321.36934999999994, "r": 432.35833999999994, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Structure Decoder", "bbox": {"l": 435.81995000000006, "t": 321.45901, "r": 510.29041, "b": 330.04678, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "is", "bbox": {"l": 513.97797, "t": 321.36934999999994, "r": 520.62305, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "mod-", "bbox": {"l": 524.08008, "t": 321.36934999999994, "r": 545.11115, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "eled as a transformer encoder with two encoder layers", "bbox": {"l": 308.86197, "t": 333.32434, "r": 527.76013, "b": 342.2309, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "and", "bbox": {"l": 530.729, "t": 333.32434, "r": 
545.11499, "b": 342.2309, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "a transformer decoder made from a stack of 4 decoder", "bbox": {"l": 308.86197, "t": 345.27933, "r": 526.85352, "b": 354.18588, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "lay-", "bbox": {"l": 529.62311, "t": 345.27933, "r": 545.11493, "b": 354.18588, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "ers that comprise mainly of multi-head attention and", "bbox": {"l": 308.86197, "t": 357.23532, "r": 524.51245, "b": 366.14188, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "feed", "bbox": {"l": 527.96948, "t": 357.23532, "r": 545.11511, "b": 366.14188, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "forward layers.", "bbox": {"l": 308.86197, "t": 369.19031000000007, "r": 370.39096, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "This configuration uses fewer layers", "bbox": {"l": 377.44449, "t": 369.19031000000007, "r": 526.91339, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "and", "bbox": {"l": 530.72906, "t": 369.19031000000007, "r": 545.11505, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "heads in comparison to networks applied to other", "bbox": {"l": 308.86197, "t": 381.14529000000005, "r": 505.46395999999993, "b": 390.05185, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "problems", "bbox": {"l": 508.03430000000003, "t": 381.14529000000005, "r": 545.11511, "b": 390.05185, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "(e.g. 
\u201cScene Understanding\u201d, \u201cImage Captioning\u201d),", "bbox": {"l": 308.86197, "t": 393.10028, "r": 517.68799, "b": 402.00684, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "some-", "bbox": {"l": 520.76642, "t": 393.10028, "r": 545.11499, "b": 402.00684, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "thing which we relate to the simplicity of table images.", "bbox": {"l": 308.86197, "t": 405.05526999999995, "r": 528.01935, "b": 413.96182, "coord_origin": "TOPLEFT"}}]}, "text": "Structure Decoder. The transformer architecture of this component is based on the work proposed in [31]. After extensive experimentation, the Structure Decoder is modeled as a transformer encoder with two encoder layers and a transformer decoder made from a stack of 4 decoder layers that comprise mainly of multi-head attention and feed forward layers. This configuration uses fewer layers and heads in comparison to networks applied to other problems (e.g. \u201cScene Understanding\u201d, \u201cImage Captioning\u201d), something which we relate to the simplicity of table images."}, {"label": "text", "id": 6, "page_no": 4, "cluster": {"id": 6, "label": "text", "bbox": {"l": 307.9245300292969, "t": 416.45196533203125, "r": 545.5032348632812, "b": 545.57271, "coord_origin": "TOPLEFT"}, "confidence": 0.9851906895637512, "cells": [{"id": 169, "text": "The transformer encoder receives an encoded", "bbox": {"l": 320.81696, "t": 417.11426, "r": 515.49609, "b": 426.02081, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "image", "bbox": {"l": 520.7663, "t": 417.11426, "r": 545.11487, "b": 426.02081, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "from the", "bbox": {"l": 308.86197, "t": 429.0692399999999, "r": 343.72107, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "CNN Backbone Network", "bbox": {"l": 347.03796, "t": 429.15891, "r": 446.45471000000003, "b": 437.74667, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "and refines it", "bbox": {"l": 
449.93996999999996, "t": 429.0692399999999, "r": 503.06055000000003, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "through", "bbox": {"l": 506.37808, "t": 429.0692399999999, "r": 537.3717, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "a", "bbox": {"l": 540.68927, "t": 429.0692399999999, "r": 545.11267, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "multi-head dot-product attention layer, followed by a", "bbox": {"l": 308.86197, "t": 441.02423, "r": 522.78894, "b": 449.93079, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "Feed", "bbox": {"l": 525.7478, "t": 441.02423, "r": 545.11511, "b": 449.93079, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "Forward Network.", "bbox": {"l": 308.86197, "t": 452.97922, "r": 384.14929, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "During training, the transformer", "bbox": {"l": 393.37466, "t": 452.97922, "r": 527.84985, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "de-", "bbox": {"l": 532.39282, "t": 452.97922, "r": 545.11505, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "coder receives as input the output feature produced by", "bbox": {"l": 308.86197, "t": 464.93521, "r": 529.7627, "b": 473.84177, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "the", "bbox": {"l": 532.94073, "t": 464.93521, "r": 545.11505, "b": 473.84177, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "transformer encoder, and the tokenized input of the", "bbox": {"l": 308.86197, "t": 476.8902, "r": 514.17126, "b": 485.79675, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "HTML", "bbox": {"l": 516.89105, "t": 476.8902, "r": 545.11511, "b": 485.79675, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "ground-truth tags. 
Using a stack of multi-head attention", "bbox": {"l": 308.86197, "t": 488.84518, "r": 527.63068, "b": 497.75174, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "lay-", "bbox": {"l": 529.62317, "t": 488.84518, "r": 545.11499, "b": 497.75174, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "ers, different aspects of the tag sequence could be", "bbox": {"l": 308.86197, "t": 500.80017, "r": 508.3630999999999, "b": 509.70673, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "inferred.", "bbox": {"l": 511.09286000000003, "t": 500.80017, "r": 545.11511, "b": 509.70673, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "This is achieved by each attention head on a layer operating", "bbox": {"l": 308.86197, "t": 512.7551599999999, "r": 545.11499, "b": 521.6617100000001, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "in a different subspace, and then combining altogether their", "bbox": {"l": 308.86197, "t": 524.71115, "r": 545.11511, "b": 533.61771, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "attention score.", "bbox": {"l": 308.86197, "t": 536.66615, "r": 369.73349, "b": 545.57271, "coord_origin": "TOPLEFT"}}]}, "text": "The transformer encoder receives an encoded image from the CNN Backbone Network and refines it through a multi-head dot-product attention layer, followed by a Feed Forward Network. During training, the transformer decoder receives as input the output feature produced by the transformer encoder, and the tokenized input of the HTML ground-truth tags. Using a stack of multi-head attention layers, different aspects of the tag sequence could be inferred. 
This is achieved by each attention head on a layer operating in a different subspace, and then combining altogether their attention score."}, {"label": "text", "id": 7, "page_no": 4, "cluster": {"id": 7, "label": "text", "bbox": {"l": 307.90594482421875, "t": 547.4575805664062, "r": 545.403076171875, "b": 653.4934692382812, "coord_origin": "TOPLEFT"}, "confidence": 0.9869197010993958, "cells": [{"id": 192, "text": "Cell BBox Decoder.", "bbox": {"l": 320.81696, "t": 548.6046, "r": 404.76184, "b": 557.56097, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "Our architecture allows to simul-", "bbox": {"l": 410.34094, "t": 548.72415, "r": 545.11505, "b": 557.63071, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "taneously predict HTML tags and bounding boxes for each", "bbox": {"l": 308.86194, "t": 560.68015, "r": 545.11493, "b": 569.5867000000001, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "table cell without the need of a separate object detector end", "bbox": {"l": 308.86194, "t": 572.6351500000001, "r": 545.11511, "b": 581.5417, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "to end. This approach is inspired by DETR [1] which em-", "bbox": {"l": 308.86194, "t": 584.59015, "r": 545.11493, "b": 593.4967, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "ploys a Transformer Encoder, and Decoder that looks for", "bbox": {"l": 308.86194, "t": 596.54515, "r": 545.11499, "b": 605.45171, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "a specific number of object queries (potential object detec-", "bbox": {"l": 308.86194, "t": 608.50015, "r": 545.11505, "b": 617.40671, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "tions). 
As our model utilizes a transformer architecture, the", "bbox": {"l": 308.86194, "t": 620.45515, "r": 545.11505, "b": 629.36171, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "hidden state of the", "bbox": {"l": 308.86194, "t": 632.41115, "r": 381.67859, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "<", "bbox": {"l": 383.99695, "t": 632.25174, "r": 391.74585, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "td", "bbox": {"l": 391.74594, "t": 632.41115, "r": 399.49686, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": ">", "bbox": {"l": 399.49695, "t": 632.25174, "r": 407.24585, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "\u2019 and \u2018", "bbox": {"l": 407.24594, "t": 632.41115, "r": 432.90958, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "<", "bbox": {"l": 432.90792999999996, "t": 632.25174, "r": 440.65683000000007, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "\u2019 HTML structure tags be-", "bbox": {"l": 440.65691999999996, "t": 632.41115, "r": 545.11475, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "come the object query.", "bbox": {"l": 308.86194, "t": 644.3661500000001, "r": 398.96371, "b": 653.27271, "coord_origin": "TOPLEFT"}}]}, "text": "Cell BBox Decoder. Our architecture allows to simultaneously predict HTML tags and bounding boxes for each table cell without the need of a separate object detector end to end. This approach is inspired by DETR [1] which employs a Transformer Encoder, and Decoder that looks for a specific number of object queries (potential object detections). 
As our model utilizes a transformer architecture, the hidden state of the < td > \u2019 and \u2018 < \u2019 HTML structure tags become the object query."}, {"label": "text", "id": 8, "page_no": 4, "cluster": {"id": 8, "label": "text", "bbox": {"l": 307.9397277832031, "t": 655.742919921875, "r": 545.2218627929688, "b": 713.3260498046875, "coord_origin": "TOPLEFT"}, "confidence": 0.9852352142333984, "cells": [{"id": 208, "text": "The encoding generated by the", "bbox": {"l": 320.81693, "t": 656.42516, "r": 444.34316999999993, "b": 665.33172, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "CNN Backbone Network", "bbox": {"l": 447.00591999999995, "t": 656.51482, "r": 545.1076, "b": 665.10258, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "along with the features acquired for every data cell from the", "bbox": {"l": 308.86194, "t": 668.38016, "r": 545.11505, "b": 677.2867200000001, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "Transformer Decoder are then passed to the attention net-", "bbox": {"l": 308.86194, "t": 680.33516, "r": 545.11505, "b": 689.24172, "coord_origin": "TOPLEFT"}}, {"id": 212, "text": "work. The attention network takes both inputs and learns to", "bbox": {"l": 308.86194, "t": 692.290161, "r": 545.11505, "b": 701.196724, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "provide an attention weighted encoding. This weighted at-", "bbox": {"l": 308.86194, "t": 704.245163, "r": 545.11505, "b": 713.151726, "coord_origin": "TOPLEFT"}}]}, "text": "The encoding generated by the CNN Backbone Network along with the features acquired for every data cell from the Transformer Decoder are then passed to the attention network. The attention network takes both inputs and learns to provide an attention weighted encoding. 
This weighted at-"}, {"label": "page_footer", "id": 9, "page_no": 4, "cluster": {"id": 9, "label": "page_footer", "bbox": {"l": 294.5858459472656, "t": 733.3272094726562, "r": 300.10223, "b": 743.039722, "coord_origin": "TOPLEFT"}, "confidence": 0.8719565868377686, "cells": [{"id": 214, "text": "5", "bbox": {"l": 295.12094, "t": 734.13316, "r": 300.10223, "b": 743.039722, "coord_origin": "TOPLEFT"}}]}, "text": "5"}], "body": [{"label": "picture", "id": 0, "page_no": 4, "cluster": {"id": 0, "label": "picture", "bbox": {"l": 74.30538940429688, "t": 77.91118621826172, "r": 519.9801025390625, "b": 183.99445000000003, "coord_origin": "TOPLEFT"}, "confidence": 0.9296938180923462, "cells": [{"id": 0, "text": "1.", "bbox": {"l": 81.688072, "t": 122.43970000000002, "r": 84.927567, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Item", "bbox": {"l": 86.54731, "t": 122.43970000000002, "r": 93.026291, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "Amount", "bbox": {"l": 102.50498, "t": 115.25214000000005, "r": 115.3461, "b": 118.44135000000006, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Names", "bbox": {"l": 82.140205, "t": 115.21489999999994, "r": 93.291527, "b": 118.40410999999995, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "1000", "bbox": {"l": 96.748268, "t": 122.43970000000002, "r": 104.3119, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "500", "bbox": {"l": 96.748268, "t": 127.74370999999985, "r": 102.42083, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "3500", "bbox": {"l": 96.748268, "t": 133.45569, "r": 104.3119, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "150", "bbox": {"l": 96.748268, "t": 139.16772000000003, "r": 102.42083, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "unit", "bbox": {"l": 110.66107, "t": 122.43970000000002, "r": 116.14391, "b": 125.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 9, 
"text": "unit", "bbox": {"l": 110.66107, "t": 127.74370999999985, "r": 116.14391, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "unit", "bbox": {"l": 110.66107, "t": 133.45569, "r": 116.14391, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "unit", "bbox": {"l": 110.66107, "t": 139.16772000000003, "r": 116.14391, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "2.", "bbox": {"l": 81.688072, "t": 127.74370999999985, "r": 84.927567, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Item", "bbox": {"l": 86.54731, "t": 127.74370999999985, "r": 93.026291, "b": 130.93291999999997, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "3.", "bbox": {"l": 81.688072, "t": 133.45569, "r": 84.927567, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Item", "bbox": {"l": 86.54731, "t": 133.45569, "r": 93.026291, "b": 136.6449, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "4.", "bbox": {"l": 81.688072, "t": 139.16772000000003, "r": 84.927567, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "Item", "bbox": {"l": 86.54731, "t": 139.16772000000003, "r": 93.026291, "b": 142.35693000000003, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "Extracted", "bbox": {"l": 88.084389, "t": 90.49738000000002, "r": 113.93649, "b": 96.23798, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Table Images", "bbox": {"l": 82.81002, "t": 97.63738999999998, "r": 119.21240000000002, "b": 103.37798999999995, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Standardized", "bbox": {"l": 143.94247, "t": 100.60235999999998, "r": 180.01131, "b": 106.34295999999995, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Images", "bbox": {"l": 151.94064, "t": 107.74237000000005, "r": 172.0118, "b": 113.48297000000014, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "BBox", "bbox": {"l": 251.76939000000002, "t": 80.93096999999989, "r": 266.39557, "b": 86.67156999999997, "coord_origin": 
"TOPLEFT"}}, {"id": 23, "text": "Decoder", "bbox": {"l": 247.51601, "t": 86.03101000000004, "r": 270.65021, "b": 91.77161000000001, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "BBoxes", "bbox": {"l": 331.03699, "t": 78.55980999999997, "r": 352.12589, "b": 84.30042000000003, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "BBoxes can be", "bbox": {"l": 390.56421, "t": 96.03223000000003, "r": 431.7261, "b": 101.77282999999989, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "traced back to the", "bbox": {"l": 386.82422, "t": 102.15228000000013, "r": 435.46966999999995, "b": 107.89287999999999, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "original image to", "bbox": {"l": 388.69589, "t": 108.27228000000002, "r": 433.6032400000001, "b": 114.01288000000011, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "extract content", "bbox": {"l": 391.07761, "t": 114.39227000000005, "r": 431.22542999999996, "b": 120.13286999999991, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Structure Tags sequence", "bbox": {"l": 431.22650000000004, "t": 151.68511999999998, "r": 498.82068, "b": 157.42571999999996, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "provide full description of", "bbox": {"l": 431.1738, "t": 157.80517999999995, "r": 498.87753000000004, "b": 163.54578000000004, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "the table structure", "bbox": {"l": 440.5289, "t": 163.92516999999998, "r": 489.51827999999995, "b": 169.66576999999995, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Structure Tags", "bbox": {"l": 328.37479, "t": 178.25385000000006, "r": 367.72333, "b": 183.99445000000003, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "BBoxes in sync", "bbox": {"l": 331.84451, "t": 123.90886999999998, "r": 373.67963, "b": 129.64948000000015, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "with tag sequence", "bbox": {"l": 331.84451, "t": 129.00885000000017, "r": 381.17786, "b": 134.74945000000002, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "Encoder", 
"bbox": {"l": 196.62633, "t": 88.11621000000002, "r": 219.42332, "b": 93.85681, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Structure", "bbox": {"l": 246.66771, "t": 129.4946900000001, "r": 271.49899, "b": 135.23528999999996, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Decoder", "bbox": {"l": 247.51601, "t": 134.59473000000003, "r": 270.65021, "b": 140.33533, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "[x1, y2, x2, y2]", "bbox": {"l": 330.63071, "t": 89.01923, "r": 365.55347, "b": 94.75982999999997, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "[x1', y2', x2', y2']", "bbox": {"l": 330.63071, "t": 97.17926, "r": 370.22717, "b": 102.91985999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "[x1'', y2'', x2'', y2'']", "bbox": {"l": 330.63071, "t": 105.33922999999993, "r": 374.51157, "b": 111.07983000000002, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "...", "bbox": {"l": 330.63071, "t": 113.49926999999991, "r": 335.73233, "b": 119.23987, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "<TR>", "bbox": {"l": 322.30579, "t": 141.79236000000003, "r": 335.05988, "b": 146.57617000000005, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "<TD>", "bbox": {"l": 322.30579, "t": 148.93231000000003, "r": 335.05988, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "1", "bbox": {"l": 337.54971, "t": 148.55579, "r": 340.95242, "b": 154.29638999999997, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "</TD><TD colspan=\"2\">", "bbox": {"l": 343.56262, "t": 148.93231000000003, "r": 398.91446, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "</TD>", "bbox": {"l": 407.41718, "t": 148.93231000000003, "r": 421.58801, "b": 153.71613000000002, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "</TR><TR>", "bbox": {"l": 322.30579, "t": 156.07232999999997, "r": 349.23022, "b": 160.85613999999998, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "<TD>", "bbox": {"l": 322.30579, "t": 163.21234000000004, "r": 335.05988, "b": 
167.99614999999994, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "</TD><TD>...", "bbox": {"l": 343.56155, "t": 163.21234000000004, "r": 374.73685, "b": 167.99614999999994, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "...", "bbox": {"l": 322.30579, "t": 170.35235999999998, "r": 326.55716, "b": 175.13617, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "1", "bbox": {"l": 323.51111, "t": 89.66967999999997, "r": 326.91382, "b": 95.41027999999994, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "2", "bbox": {"l": 323.71509, "t": 97.78887999999995, "r": 327.1178, "b": 103.52948000000004, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "3", "bbox": {"l": 323.71509, "t": 105.98969, "r": 327.1178, "b": 111.73029000000008, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "2", "bbox": {"l": 401.4816, "t": 148.54625999999996, "r": 404.88431, "b": 154.28687000000002, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "3", "bbox": {"l": 337.6976, "t": 162.68451000000005, "r": 341.10031, "b": 168.42511000000002, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "3", "bbox": {"l": 454.46378, "t": 104.54584, "r": 457.86648999999994, "b": 110.28644000000008, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "2", "bbox": {"l": 493.32580999999993, "t": 91.09546, "r": 496.72852, "b": 96.83605999999997, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "1", "bbox": {"l": 454.08298, "t": 90.56879000000015, "r": 457.48569000000003, "b": 96.30939000000001, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", "id": 1, "page_no": 4, "cluster": {"id": 1, "label": "caption", "bbox": {"l": 49.33068084716797, "t": 202.95303344726562, "r": 545.10846, "b": 225.3606414794922, "coord_origin": "TOPLEFT"}, "confidence": 0.9677655100822449, "cells": [{"id": 59, "text": "Figure 3:", "bbox": {"l": 50.112, "t": 204.10535000000004, "r": 86.883949, "b": 213.01189999999997, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": 
"TableFormer", "bbox": {"l": 94.020996, "t": 203.98577999999998, "r": 149.85141, "b": 212.94214, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "takes in an image of the PDF and creates bounding box and HTML structure predictions that are", "bbox": {"l": 152.86099, "t": 204.10535000000004, "r": 545.10846, "b": 213.01189999999997, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure.", "bbox": {"l": 50.111992, "t": 216.06035999999995, "r": 436.0134, "b": 224.96691999999996, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure."}, {"label": "picture", "id": 2, "page_no": 4, "cluster": {"id": 2, "label": "picture", "bbox": {"l": 53.03322982788086, "t": 257.6654052734375, "r": 285.3731689453125, "b": 507.6688232421875, "coord_origin": "TOPLEFT"}, "confidence": 0.9724959135055542, "cells": [{"id": 63, "text": "Input Image", "bbox": {"l": 74.253464, "t": 258.21472000000006, "r": 101.75846, "b": 264.17474000000004, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "Tokenised Tags", "bbox": {"l": 122.29972, "t": 258.34520999999995, "r": 157.83972, "b": 264.30524, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "Multi-Head Attention", "bbox": {"l": 78.549347, "t": 371.38579999999996, "r": 125.68359000000001, "b": 377.04782, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Add", "bbox": {"l": 78.513298, "t": 391.31857, "r": 84.644547, "b": 396.98059, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "& Normalisation", "bbox": {"l": 116.52705, "t": 391.31857, "r": 125.11079999999998, "b": 396.98059, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "Feed Forward Network", "bbox": {"l": 76.024773, "t": 424.45309, "r": 127.92327000000002, "b": 430.11511, "coord_origin": "TOPLEFT"}}, {"id": 69, 
"text": "Add", "bbox": {"l": 78.382828, "t": 444.88956, "r": 84.514076, "b": 450.55157, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "& Normalisation", "bbox": {"l": 116.39658, "t": 444.88956, "r": 124.98033, "b": 450.55157, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Linear", "bbox": {"l": 167.46945, "t": 462.44324, "r": 181.6292, "b": 468.10526, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "Softmax", "bbox": {"l": 165.61292, "t": 478.47107, "r": 184.43242, "b": 484.13309, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "CNN BACKBONE ENCODER", "bbox": {"l": 65.319511, "t": 324.26235999999994, "r": 132.9245, "b": 330.22235000000006, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "[30, 1, 2, 3, 4, \u2026 3, ", "bbox": {"l": 119.51457, "t": 269.66394, "r": 162.98782, "b": 274.72992, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "4, 5, 8, 31]", "bbox": {"l": 128.72858, "t": 274.91394, "r": 151.41083, "b": 279.97992, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Positional ", "bbox": {"l": 60.434211999999995, "t": 338.95993, "r": 80.27021, "b": 344.26993, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Encoding", "bbox": {"l": 60.598457, "t": 343.38605, "r": 78.854958, "b": 348.69604, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Positional ", "bbox": {"l": 134.82877, "t": 293.37762, "r": 154.66476, "b": 298.68762, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "Encoding", "bbox": {"l": 134.99303, "t": 297.80370999999997, "r": 153.24953, "b": 303.11371, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Add & Normalisation", "bbox": {"l": 150.55193, "t": 345.35861, "r": 197.14943, "b": 351.02063, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Add", "bbox": {"l": 150.55193, "t": 394.4234, "r": 156.68318, "b": 400.08542, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "& Normalisation", "bbox": {"l": 188.56567, "t": 394.4234, "r": 197.14943, "b": 400.08542, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "Multi-Head Attention", "bbox": {"l": 
150.18539, "t": 375.66843, "r": 197.31964, "b": 381.33044, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Add", "bbox": {"l": 150.55193, "t": 440.24847000000005, "r": 156.68318, "b": 445.91049, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "& Normalisation", "bbox": {"l": 188.56567, "t": 440.24847000000005, "r": 197.14943, "b": 445.91049, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Feed Forward Network", "bbox": {"l": 147.86377, "t": 422.09335, "r": 199.76227, "b": 427.75537, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "Linear", "bbox": {"l": 241.56567000000004, "t": 314.26285000000007, "r": 255.72542, "b": 319.92487, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "Linear", "bbox": {"l": 241.91730000000004, "t": 361.36493, "r": 256.07706, "b": 367.02695, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Attention", "bbox": {"l": 228.054, "t": 336.61929000000003, "r": 248.72363000000004, "b": 342.28131, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "Network", "bbox": {"l": 246.2919, "t": 336.61929000000003, "r": 269.39325, "b": 342.28131, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "MLP", "bbox": {"l": 228.44568000000004, "t": 405.14682, "r": 238.73892, "b": 410.80884, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Linear ", "bbox": {"l": 256.29767, "t": 405.2032500000001, "r": 271.77792, "b": 410.86526, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "Sigmoid", "bbox": {"l": 239.54543, "t": 382.21344, "r": 258.08942, "b": 387.87546, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Transformer Encoder Network", "bbox": {"l": 54.14704100000001, "t": 384.87183, "r": 59.51152, "b": 449.78326, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "x2", "bbox": {"l": 54.235424, "t": 373.81232, "r": 59.30449699999999, "b": 378.45421999999996, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Encoded Output", "bbox": {"l": 85.295891, "t": 484.53189, "r": 122.16431, "b": 490.36688, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "Encoded Output", "bbox": {"l": 
229.66599, "t": 279.54607999999996, "r": 265.3194, "b": 285.45572000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Predicted Tags", "bbox": {"l": 157.17369, "t": 500.3031, "r": 190.41711, "b": 506.12943, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Bounding Boxes & ", "bbox": {"l": 227.81598999999997, "t": 438.05542, "r": 270.78442, "b": 443.89206, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Classification", "bbox": {"l": 233.70262, "t": 444.06183, "r": 263.51105, "b": 449.8904999999999, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Transformer ", "bbox": {"l": 184.74655, "t": 293.39502, "r": 212.16055, "b": 298.75903, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "Decoder Network", "bbox": {"l": 178.91229, "t": 299.14502, "r": 216.74378999999996, "b": 304.50903, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "x4", "bbox": {"l": 194.24574, "t": 282.7822, "r": 198.89099, "b": 287.84817999999996, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "CELL BBOX DECODER", "bbox": {"l": 221.45587, "t": 271.86914, "r": 276.47089, "b": 277.82916, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "Masked Multi-Head ", "bbox": {"l": 151.65219, "t": 323.44241, "r": 197.29019, "b": 329.10443, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "Attention", "bbox": {"l": 163.43277, "t": 329.44241, "r": 184.19028, "b": 335.10443, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", "id": 3, "page_no": 4, "cluster": {"id": 3, "label": "caption", "bbox": {"l": 49.31888961791992, "t": 527.1819458007812, "r": 286.778076171875, "b": 680.416748046875, "coord_origin": "TOPLEFT"}, "confidence": 0.8913402557373047, "cells": [{"id": 107, "text": "Figure 4: Given an input image of a table, the", "bbox": {"l": 50.112, "t": 527.90237, "r": 229.78752, "b": 536.80893, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Encoder", "bbox": {"l": 231.787, "t": 527.7828099999999, "r": 
267.76196, "b": 536.7392, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "pro-", "bbox": {"l": 269.76401, "t": 527.90237, "r": 286.36169, "b": 536.80893, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "duces fixed-length features that represent the input image.", "bbox": {"l": 50.112015, "t": 539.85738, "r": 286.36508, "b": 548.76393, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "The features are then passed to both the", "bbox": {"l": 50.112015, "t": 551.81337, "r": 205.84735, "b": 560.71992, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "Structure Decoder", "bbox": {"l": 208.01802, "t": 551.69382, "r": 286.36392, "b": 560.6501900000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "and", "bbox": {"l": 50.112015, "t": 563.76837, "r": 64.498009, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Cell BBox Decoder", "bbox": {"l": 68.165016, "t": 563.64882, "r": 151.31288, "b": 572.60519, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": ".", "bbox": {"l": 151.31302, "t": 563.76837, "r": 153.80367, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "During training, the", "bbox": {"l": 160.41884, "t": 563.76837, "r": 241.93283000000002, "b": 572.67493, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Structure", "bbox": {"l": 245.59502, "t": 563.64882, "r": 286.362, "b": 572.60519, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Decoder", "bbox": {"l": 50.112015, "t": 575.60382, "r": 85.519089, "b": 584.5602, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "receives \u2018tokenized tags\u2019 of the HTML code that", "bbox": {"l": 88.623016, "t": 575.7233699999999, "r": 286.36072, "b": 584.6299300000001, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "represent the table structure. 
Afterwards, a transformer en-", "bbox": {"l": 50.112015, "t": 587.6783800000001, "r": 286.36511, "b": 596.58493, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "coder and decoder architecture is employed to produce fea-", "bbox": {"l": 50.112015, "t": 599.63338, "r": 286.36508, "b": 608.53993, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "tures that are received by a linear layer, and the", "bbox": {"l": 50.112015, "t": 611.58838, "r": 240.43756000000002, "b": 620.4949300000001, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "Cell BBox", "bbox": {"l": 243.19801, "t": 611.46883, "r": 286.36597, "b": 620.4252, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "Decoder. The linear layer is applied to the features to", "bbox": {"l": 50.112015, "t": 623.42482, "r": 286.36511, "b": 632.3812, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "predict the tags. Simultaneously, the Cell BBox Decoder", "bbox": {"l": 50.112015, "t": 635.37982, "r": 286.36508, "b": 644.3362, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "selects features referring to the data cells (\u2018", "bbox": {"l": 50.112015, "t": 647.45438, "r": 220.58205, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "<", "bbox": {"l": 220.57802000000004, "t": 647.29497, "r": 228.32693, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "td", "bbox": {"l": 228.32700999999997, "t": 647.45438, "r": 236.07791000000003, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": ">", "bbox": {"l": 236.07802000000004, "t": 647.29497, "r": 243.82693, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "\u2019, \u2018", "bbox": {"l": 243.82602, "t": 647.45438, "r": 255.29298000000003, "b": 656.36093, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "<", "bbox": {"l": 255.29102000000003, "t": 647.29497, "r": 263.03992, "b": 656.14175, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "\u2019) and", "bbox": {"l": 263.04001, "t": 647.45438, "r": 286.36246, "b": 
656.36093, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "passes them through an attention network, an MLP, and a", "bbox": {"l": 50.112015, "t": 659.40938, "r": 286.36511, "b": 668.31594, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "linear layer to predict the bounding boxes.", "bbox": {"l": 50.112015, "t": 671.36438, "r": 218.46996, "b": 680.27094, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives \u2018tokenized tags\u2019 of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (\u2018 < td > \u2019, \u2018 < \u2019) and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes."}, {"label": "text", "id": 4, "page_no": 4, "cluster": {"id": 4, "label": "text", "bbox": {"l": 308.1139831542969, "t": 248.6479034423828, "r": 545.3682861328125, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9831458330154419, "cells": [{"id": 135, "text": "forming classification, and adding an adaptive pooling", "bbox": {"l": 308.862, "t": 249.53441999999995, "r": 523.05786, "b": 258.44097999999997, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "layer", "bbox": {"l": 525.19983, "t": 249.53441999999995, "r": 545.11505, "b": 258.44097999999997, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "of size 28*28. 
ResNet by default downsamples the", "bbox": {"l": 308.862, "t": 261.49042, "r": 517.55847, "b": 270.39697, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "image", "bbox": {"l": 520.76642, "t": 261.49042, "r": 545.11499, "b": 270.39697, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "resolution by 32 and then the encoded image is provided", "bbox": {"l": 308.862, "t": 273.44537, "r": 534.80377, "b": 282.35196, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "to", "bbox": {"l": 537.36414, "t": 273.44537, "r": 545.11505, "b": 282.35196, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "both the", "bbox": {"l": 308.862, "t": 285.40039, "r": 341.24045, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "Structure Decoder", "bbox": {"l": 343.73099, "t": 285.49005, "r": 417.23508, "b": 294.07782000000003, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": ", and", "bbox": {"l": 417.23398, "t": 285.40039, "r": 436.60129, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "Cell BBox Decoder", "bbox": {"l": 439.09198, "t": 285.49005, "r": 516.56116, "b": 294.07782000000003, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": ".", "bbox": {"l": 516.56097, "t": 285.40039, "r": 519.05164, "b": 294.3069499999999, "coord_origin": "TOPLEFT"}}]}, "text": "forming classification, and adding an adaptive pooling layer of size 28*28. 
ResNet by default downsamples the image resolution by 32 and then the encoded image is provided to both the Structure Decoder , and Cell BBox Decoder ."}, {"label": "text", "id": 5, "page_no": 4, "cluster": {"id": 5, "label": "text", "bbox": {"l": 308.0434265136719, "t": 296.45684814453125, "r": 545.462646484375, "b": 414.4779052734375, "coord_origin": "TOPLEFT"}, "confidence": 0.9862009882926941, "cells": [{"id": 146, "text": "Structure Decoder.", "bbox": {"l": 320.81696, "t": 297.33981, "r": 400.86649, "b": 306.2962, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "The transformer architecture of", "bbox": {"l": 403.91394, "t": 297.45938, "r": 528.33685, "b": 306.36594, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "this", "bbox": {"l": 530.7179, "t": 297.45938, "r": 545.11383, "b": 306.36594, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "component is based on the work proposed in [31].", "bbox": {"l": 308.86194, "t": 309.41437, "r": 517.5285, "b": 318.32092, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "After", "bbox": {"l": 524.09387, "t": 309.41437, "r": 545.11493, "b": 318.32092, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "extensive experimentation, the", "bbox": {"l": 308.86194, "t": 321.36934999999994, "r": 432.35833999999994, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Structure Decoder", "bbox": {"l": 435.81995000000006, "t": 321.45901, "r": 510.29041, "b": 330.04678, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "is", "bbox": {"l": 513.97797, "t": 321.36934999999994, "r": 520.62305, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "mod-", "bbox": {"l": 524.08008, "t": 321.36934999999994, "r": 545.11115, "b": 330.27591, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "eled as a transformer encoder with two encoder layers", "bbox": {"l": 308.86197, "t": 333.32434, "r": 527.76013, "b": 342.2309, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "and", "bbox": {"l": 530.729, "t": 333.32434, "r": 
545.11499, "b": 342.2309, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "a transformer decoder made from a stack of 4 decoder", "bbox": {"l": 308.86197, "t": 345.27933, "r": 526.85352, "b": 354.18588, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "lay-", "bbox": {"l": 529.62311, "t": 345.27933, "r": 545.11493, "b": 354.18588, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "ers that comprise mainly of multi-head attention and", "bbox": {"l": 308.86197, "t": 357.23532, "r": 524.51245, "b": 366.14188, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "feed", "bbox": {"l": 527.96948, "t": 357.23532, "r": 545.11511, "b": 366.14188, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "forward layers.", "bbox": {"l": 308.86197, "t": 369.19031000000007, "r": 370.39096, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "This configuration uses fewer layers", "bbox": {"l": 377.44449, "t": 369.19031000000007, "r": 526.91339, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "and", "bbox": {"l": 530.72906, "t": 369.19031000000007, "r": 545.11505, "b": 378.09685999999994, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "heads in comparison to networks applied to other", "bbox": {"l": 308.86197, "t": 381.14529000000005, "r": 505.46395999999993, "b": 390.05185, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "problems", "bbox": {"l": 508.03430000000003, "t": 381.14529000000005, "r": 545.11511, "b": 390.05185, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "(e.g. 
\u201cScene Understanding\u201d, \u201cImage Captioning\u201d),", "bbox": {"l": 308.86197, "t": 393.10028, "r": 517.68799, "b": 402.00684, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "some-", "bbox": {"l": 520.76642, "t": 393.10028, "r": 545.11499, "b": 402.00684, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "thing which we relate to the simplicity of table images.", "bbox": {"l": 308.86197, "t": 405.05526999999995, "r": 528.01935, "b": 413.96182, "coord_origin": "TOPLEFT"}}]}, "text": "Structure Decoder. The transformer architecture of this component is based on the work proposed in [31]. After extensive experimentation, the Structure Decoder is modeled as a transformer encoder with two encoder layers and a transformer decoder made from a stack of 4 decoder layers that comprise mainly of multi-head attention and feed forward layers. This configuration uses fewer layers and heads in comparison to networks applied to other problems (e.g. \u201cScene Understanding\u201d, \u201cImage Captioning\u201d), something which we relate to the simplicity of table images."}, {"label": "text", "id": 6, "page_no": 4, "cluster": {"id": 6, "label": "text", "bbox": {"l": 307.9245300292969, "t": 416.45196533203125, "r": 545.5032348632812, "b": 545.57271, "coord_origin": "TOPLEFT"}, "confidence": 0.9851906895637512, "cells": [{"id": 169, "text": "The transformer encoder receives an encoded", "bbox": {"l": 320.81696, "t": 417.11426, "r": 515.49609, "b": 426.02081, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "image", "bbox": {"l": 520.7663, "t": 417.11426, "r": 545.11487, "b": 426.02081, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "from the", "bbox": {"l": 308.86197, "t": 429.0692399999999, "r": 343.72107, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "CNN Backbone Network", "bbox": {"l": 347.03796, "t": 429.15891, "r": 446.45471000000003, "b": 437.74667, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "and refines it", "bbox": {"l": 
449.93996999999996, "t": 429.0692399999999, "r": 503.06055000000003, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "through", "bbox": {"l": 506.37808, "t": 429.0692399999999, "r": 537.3717, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "a", "bbox": {"l": 540.68927, "t": 429.0692399999999, "r": 545.11267, "b": 437.9758, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "multi-head dot-product attention layer, followed by a", "bbox": {"l": 308.86197, "t": 441.02423, "r": 522.78894, "b": 449.93079, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "Feed", "bbox": {"l": 525.7478, "t": 441.02423, "r": 545.11511, "b": 449.93079, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "Forward Network.", "bbox": {"l": 308.86197, "t": 452.97922, "r": 384.14929, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "During training, the transformer", "bbox": {"l": 393.37466, "t": 452.97922, "r": 527.84985, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "de-", "bbox": {"l": 532.39282, "t": 452.97922, "r": 545.11505, "b": 461.88577, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "coder receives as input the output feature produced by", "bbox": {"l": 308.86197, "t": 464.93521, "r": 529.7627, "b": 473.84177, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "the", "bbox": {"l": 532.94073, "t": 464.93521, "r": 545.11505, "b": 473.84177, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "transformer encoder, and the tokenized input of the", "bbox": {"l": 308.86197, "t": 476.8902, "r": 514.17126, "b": 485.79675, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "HTML", "bbox": {"l": 516.89105, "t": 476.8902, "r": 545.11511, "b": 485.79675, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "ground-truth tags. 
Using a stack of multi-head attention", "bbox": {"l": 308.86197, "t": 488.84518, "r": 527.63068, "b": 497.75174, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "lay-", "bbox": {"l": 529.62317, "t": 488.84518, "r": 545.11499, "b": 497.75174, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "ers, different aspects of the tag sequence could be", "bbox": {"l": 308.86197, "t": 500.80017, "r": 508.3630999999999, "b": 509.70673, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "inferred.", "bbox": {"l": 511.09286000000003, "t": 500.80017, "r": 545.11511, "b": 509.70673, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "This is achieved by each attention head on a layer operating", "bbox": {"l": 308.86197, "t": 512.7551599999999, "r": 545.11499, "b": 521.6617100000001, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "in a different subspace, and then combining altogether their", "bbox": {"l": 308.86197, "t": 524.71115, "r": 545.11511, "b": 533.61771, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "attention score.", "bbox": {"l": 308.86197, "t": 536.66615, "r": 369.73349, "b": 545.57271, "coord_origin": "TOPLEFT"}}]}, "text": "The transformer encoder receives an encoded image from the CNN Backbone Network and refines it through a multi-head dot-product attention layer, followed by a Feed Forward Network. During training, the transformer decoder receives as input the output feature produced by the transformer encoder, and the tokenized input of the HTML ground-truth tags. Using a stack of multi-head attention layers, different aspects of the tag sequence could be inferred. 
This is achieved by each attention head on a layer operating in a different subspace, and then combining altogether their attention score."}, {"label": "text", "id": 7, "page_no": 4, "cluster": {"id": 7, "label": "text", "bbox": {"l": 307.90594482421875, "t": 547.4575805664062, "r": 545.403076171875, "b": 653.4934692382812, "coord_origin": "TOPLEFT"}, "confidence": 0.9869197010993958, "cells": [{"id": 192, "text": "Cell BBox Decoder.", "bbox": {"l": 320.81696, "t": 548.6046, "r": 404.76184, "b": 557.56097, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "Our architecture allows to simul-", "bbox": {"l": 410.34094, "t": 548.72415, "r": 545.11505, "b": 557.63071, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "taneously predict HTML tags and bounding boxes for each", "bbox": {"l": 308.86194, "t": 560.68015, "r": 545.11493, "b": 569.5867000000001, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "table cell without the need of a separate object detector end", "bbox": {"l": 308.86194, "t": 572.6351500000001, "r": 545.11511, "b": 581.5417, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "to end. This approach is inspired by DETR [1] which em-", "bbox": {"l": 308.86194, "t": 584.59015, "r": 545.11493, "b": 593.4967, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "ploys a Transformer Encoder, and Decoder that looks for", "bbox": {"l": 308.86194, "t": 596.54515, "r": 545.11499, "b": 605.45171, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "a specific number of object queries (potential object detec-", "bbox": {"l": 308.86194, "t": 608.50015, "r": 545.11505, "b": 617.40671, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "tions). 
As our model utilizes a transformer architecture, the", "bbox": {"l": 308.86194, "t": 620.45515, "r": 545.11505, "b": 629.36171, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "hidden state of the", "bbox": {"l": 308.86194, "t": 632.41115, "r": 381.67859, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "<", "bbox": {"l": 383.99695, "t": 632.25174, "r": 391.74585, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "td", "bbox": {"l": 391.74594, "t": 632.41115, "r": 399.49686, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": ">", "bbox": {"l": 399.49695, "t": 632.25174, "r": 407.24585, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "\u2019 and \u2018", "bbox": {"l": 407.24594, "t": 632.41115, "r": 432.90958, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "<", "bbox": {"l": 432.90792999999996, "t": 632.25174, "r": 440.65683000000007, "b": 641.09853, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "\u2019 HTML structure tags be-", "bbox": {"l": 440.65691999999996, "t": 632.41115, "r": 545.11475, "b": 641.3177000000001, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "come the object query.", "bbox": {"l": 308.86194, "t": 644.3661500000001, "r": 398.96371, "b": 653.27271, "coord_origin": "TOPLEFT"}}]}, "text": "Cell BBox Decoder. Our architecture allows to simultaneously predict HTML tags and bounding boxes for each table cell without the need of a separate object detector end to end. This approach is inspired by DETR [1] which employs a Transformer Encoder, and Decoder that looks for a specific number of object queries (potential object detections). 
As our model utilizes a transformer architecture, the hidden state of the < td > \u2019 and \u2018 < \u2019 HTML structure tags become the object query."}, {"label": "text", "id": 8, "page_no": 4, "cluster": {"id": 8, "label": "text", "bbox": {"l": 307.9397277832031, "t": 655.742919921875, "r": 545.2218627929688, "b": 713.3260498046875, "coord_origin": "TOPLEFT"}, "confidence": 0.9852352142333984, "cells": [{"id": 208, "text": "The encoding generated by the", "bbox": {"l": 320.81693, "t": 656.42516, "r": 444.34316999999993, "b": 665.33172, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "CNN Backbone Network", "bbox": {"l": 447.00591999999995, "t": 656.51482, "r": 545.1076, "b": 665.10258, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "along with the features acquired for every data cell from the", "bbox": {"l": 308.86194, "t": 668.38016, "r": 545.11505, "b": 677.2867200000001, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "Transformer Decoder are then passed to the attention net-", "bbox": {"l": 308.86194, "t": 680.33516, "r": 545.11505, "b": 689.24172, "coord_origin": "TOPLEFT"}}, {"id": 212, "text": "work. The attention network takes both inputs and learns to", "bbox": {"l": 308.86194, "t": 692.290161, "r": 545.11505, "b": 701.196724, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "provide an attention weighted encoding. This weighted at-", "bbox": {"l": 308.86194, "t": 704.245163, "r": 545.11505, "b": 713.151726, "coord_origin": "TOPLEFT"}}]}, "text": "The encoding generated by the CNN Backbone Network along with the features acquired for every data cell from the Transformer Decoder are then passed to the attention network. The attention network takes both inputs and learns to provide an attention weighted encoding. 
This weighted at-"}], "headers": [{"label": "page_footer", "id": 9, "page_no": 4, "cluster": {"id": 9, "label": "page_footer", "bbox": {"l": 294.5858459472656, "t": 733.3272094726562, "r": 300.10223, "b": 743.039722, "coord_origin": "TOPLEFT"}, "confidence": 0.8719565868377686, "cells": [{"id": 214, "text": "5", "bbox": {"l": 295.12094, "t": 734.13316, "r": 300.10223, "b": 743.039722, "coord_origin": "TOPLEFT"}}]}, "text": "5"}]}}, {"page_no": 5, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "tention encoding is then multiplied to the encoded image to", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36514, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "produce a feature for each table cell. Notice that this is dif-", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 286.36508, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "ferent than the typical object detection problem where im-", "bbox": {"l": 50.112, "t": 99.11841000000004, "r": 286.36508, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "balances between the number of detections and the amount", "bbox": {"l": 50.112, "t": 111.07343000000003, "r": 286.36508, "b": 119.97997999999984, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "of objects may exist. 
In our case, we know up front that", "bbox": {"l": 50.112, "t": 123.02844000000005, "r": 286.36508, "b": 131.93499999999995, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "the produced detections always match with the table cells", "bbox": {"l": 50.112, "t": 134.98443999999995, "r": 286.36514, "b": 143.89099, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "in number and correspondence.", "bbox": {"l": 50.112, "t": 146.93944999999997, "r": 175.16254, "b": 155.84600999999998, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "The output features for each table cell are then fed", "bbox": {"l": 62.067001, "t": 159.62445000000002, "r": 286.36496, "b": 168.53101000000004, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "into the feed-forward network (FFN). The FFN consists", "bbox": {"l": 50.112, "t": 171.58043999999995, "r": 286.36511, "b": 180.48699999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "of a Multi-Layer Perceptron (3 layers with ReLU activa-", "bbox": {"l": 50.112, "t": 183.53545999999994, "r": 286.36511, "b": 192.44201999999996, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "tion function) that predicts the normalized coordinates for", "bbox": {"l": 50.112, "t": 195.49048000000005, "r": 286.36511, "b": 204.39702999999997, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "the bounding box of each table cell. 
Finally, the predicted", "bbox": {"l": 50.112, "t": 207.44550000000004, "r": 286.36511, "b": 216.35204999999996, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "bounding boxes are classified based on whether they are", "bbox": {"l": 50.112, "t": 219.40051000000005, "r": 286.36511, "b": 228.30706999999995, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "empty or not using a linear layer.", "bbox": {"l": 50.112, "t": 231.35650999999996, "r": 181.54855, "b": 240.26306, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "Loss Functions.", "bbox": {"l": 62.067001, "t": 243.92193999999995, "r": 129.21492, "b": 252.87829999999997, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "We formulate a multi-task loss Eq. 2", "bbox": {"l": 134.451, "t": 244.04150000000004, "r": 286.36078, "b": 252.94806000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "to train our network. The Cross-Entropy loss (denoted as", "bbox": {"l": 50.112007, "t": 255.99652000000003, "r": 286.36511, "b": 264.90308000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "l$_{s}$", "bbox": {"l": 50.112007, "t": 267.79309, "r": 56.84528, "b": 276.63989000000004, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": ") is used to train the", "bbox": {"l": 57.343006, "t": 267.95250999999996, "r": 135.39996, "b": 276.85907, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Structure Decoder", "bbox": {"l": 137.735, "t": 268.04218000000003, "r": 211.07965, "b": 276.62994000000003, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "which predicts the", "bbox": {"l": 213.63699, "t": 267.95250999999996, "r": 286.36395, "b": 276.85907, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "structure tokens. 
As for the", "bbox": {"l": 50.112, "t": 279.90747, "r": 158.82388, "b": 288.81406, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "Cell BBox Decoder", "bbox": {"l": 161.31799, "t": 279.99712999999997, "r": 238.79712, "b": 288.58493, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "it is trained", "bbox": {"l": 241.521, "t": 279.90747, "r": 286.36264, "b": 288.81406, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "with a combination of losses denoted as", "bbox": {"l": 50.112, "t": 291.86249, "r": 211.3766, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "l$_{box}$", "bbox": {"l": 214.271, "t": 291.70309, "r": 229.19780000000003, "b": 300.54987, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": ".", "bbox": {"l": 229.696, "t": 291.86249, "r": 232.18665000000001, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "l$_{box}$", "bbox": {"l": 236.49001, "t": 291.70309, "r": 251.41681000000003, "b": 300.54987, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "consists", "bbox": {"l": 254.81099999999998, "t": 291.86249, "r": 286.36255, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "of the generally used", "bbox": {"l": 50.112, "t": 303.81747, "r": 137.45412, "b": 312.72403, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "l$_{1}$", "bbox": {"l": 141.298, "t": 303.65808, "r": 148.24258, "b": 312.50485, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "loss for object detection and the", "bbox": {"l": 152.58601, "t": 303.81747, "r": 286.36377, "b": 312.72403, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "IoU loss (", "bbox": {"l": 50.112015, "t": 315.77245999999997, "r": 89.683464, "b": 324.67902, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "l$_{iou}$", "bbox": {"l": 89.68602, "t": 315.61307, "r": 104.12046, "b": 324.45984, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": ") to be scale invariant as explained in [25]. 
In", "bbox": {"l": 104.61802, "t": 315.77245999999997, "r": 286.36572, "b": 324.67902, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "comparison to DETR, we do not use the Hungarian algo-", "bbox": {"l": 50.112019, "t": 327.72845, "r": 286.36511, "b": 336.6350100000001, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "rithm [15] to match the predicted bounding boxes with the", "bbox": {"l": 50.112019, "t": 339.68344, "r": 286.36508, "b": 348.59, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "ground-truth boxes, as we have already achieved a one-to-", "bbox": {"l": 50.112019, "t": 351.63843, "r": 286.36511, "b": 360.54498, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "one match through two steps: 1) Our token input sequence", "bbox": {"l": 50.112019, "t": 363.59341, "r": 286.36508, "b": 372.49996999999996, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "is naturally ordered, therefore the hidden states of the table", "bbox": {"l": 50.112019, "t": 375.5484, "r": 286.36511, "b": 384.45496, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "data cells are also in order when they are provided as in-", "bbox": {"l": 50.112019, "t": 387.50339, "r": 286.36514, "b": 396.40994, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "put to the", "bbox": {"l": 50.112019, "t": 399.45938, "r": 88.68721, "b": 408.36594, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "Cell BBox Decoder", "bbox": {"l": 91.646019, "t": 399.54904, "r": 170.0517, "b": 408.13681, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": ", and 2) Our bounding boxes", "bbox": {"l": 170.05103, "t": 399.45938, "r": 286.36438, "b": 408.36594, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "generation mechanism (see Sec.", "bbox": {"l": 50.112022, "t": 411.41437, "r": 181.96703, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "3)", "bbox": {"l": 189.09029, "t": 411.41437, "r": 197.74918, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "ensures a one-to-one", "bbox": {"l": 200.34789, 
"t": 411.41437, "r": 286.36511, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "mapping between the cell content and its bounding box for", "bbox": {"l": 50.112022, "t": 423.36934999999994, "r": 286.36511, "b": 432.27591, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "all post-processed datasets.", "bbox": {"l": 50.112022, "t": 435.32434, "r": 158.2959, "b": 444.2309, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "The loss used to train the TableFormer can be defined as", "bbox": {"l": 62.067024, "t": 448.01035, "r": 286.36499, "b": 456.9169, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "following:", "bbox": {"l": 50.112022, "t": 459.96533, "r": 91.377113, "b": 468.87189, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "l$_{box}$", "bbox": {"l": 125.71502, "t": 493.28094, "r": 140.64182, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "=", "bbox": {"l": 143.90701, "t": 493.28094, "r": 151.65593, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "\u03bb$_{iou}$l$_{iou}$", "bbox": {"l": 154.42302, "t": 493.28094, "r": 186.62846, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "+", "bbox": {"l": 189.34003, "t": 493.28094, "r": 197.08894, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "\u03bb$_{l}$$_{1}$", "bbox": {"l": 199.30302, "t": 493.28094, "r": 211.64659, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "l", "bbox": {"l": 124.33002, "t": 508.22495, "r": 127.30286, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "=", "bbox": {"l": 130.26602, "t": 508.22495, "r": 138.01494, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "\u03bbl$_{s}$", "bbox": {"l": 140.78203, "t": 508.22495, "r": 153.32629, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "+ (1", "bbox": {"l": 156.03903, "t": 508.22495, "r": 174.85541, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "\u2212", "bbox": {"l": 177.07103, "t": 507.66702, 
"r": 184.81995, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "\u03bb", "bbox": {"l": 187.03304, "t": 508.22495, "r": 192.84422, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": ")", "bbox": {"l": 192.84503, "t": 508.22495, "r": 196.71948, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "l$_{box}$", "bbox": {"l": 196.71902, "t": 508.22495, "r": 211.64583, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "(1)", "bbox": {"l": 274.74603, "t": 501.01132, "r": 286.36243, "b": 509.91788, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "where", "bbox": {"l": 50.11203, "t": 531.30933, "r": 74.450661, "b": 540.21588, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "\u03bb", "bbox": {"l": 76.941032, "t": 531.14993, "r": 82.75222, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "\u2208", "bbox": {"l": 85.520035, "t": 530.5920100000001, "r": 92.162102, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "[0, 1], and", "bbox": {"l": 94.653038, "t": 531.30933, "r": 135.59932, "b": 540.21588, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "\u03bb$_{iou}$, \u03bb$_{l}$$_{1}$", "bbox": {"l": 138.09004, "t": 531.14993, "r": 172.63162, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "\u2208$_{R}$", "bbox": {"l": 175.89705, "t": 530.5920100000001, "r": 192.50104, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "are hyper-parameters.", "bbox": {"l": 194.99205, "t": 531.30933, "r": 281.59692, "b": 540.21588, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "5.", "bbox": {"l": 50.112045, "t": 555.91689, "r": 57.92831799999999, "b": 566.66461, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "Experimental Results", "bbox": {"l": 68.350014, "t": 555.91689, "r": 171.98335, "b": 566.66461, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "5.1.", "bbox": {"l": 50.112045, "t": 576.26433, "r": 64.693237, "b": 586.1163799999999, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": 
"Implementation Details", "bbox": {"l": 74.414032, "t": 576.26433, "r": 179.17502, "b": 586.1163799999999, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "TableFormer uses ResNet-18 as the", "bbox": {"l": 62.067047, "t": 595.73433, "r": 202.97806, "b": 604.64088, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "CNN Backbone Net-", "bbox": {"l": 205.38405, "t": 595.82399, "r": 286.36008, "b": 604.41174, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "work", "bbox": {"l": 50.112045, "t": 607.77899, "r": 70.037247, "b": 616.3667399999999, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": ". The input images are resized to 448*448 pixels and", "bbox": {"l": 70.037048, "t": 607.68933, "r": 286.36496, "b": 616.59589, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "the feature map has a dimension of 28*28. Additionally, we", "bbox": {"l": 50.112049, "t": 619.64433, "r": 286.36517, "b": 628.55089, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "enforce the following input constraints:", "bbox": {"l": 50.112049, "t": 631.60033, "r": 207.03294, "b": 640.50688, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Image width and height", "bbox": {"l": 91.661049, "t": 654.54532, "r": 186.01683, "b": 663.45187, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "\u2264", "bbox": {"l": 188.50705, "t": 653.828, "r": 196.25597, "b": 663.2327, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "1024 pixels", "bbox": {"l": 198.74605, "t": 654.54532, "r": 244.81310999999997, "b": 663.45187, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "Structural tags length", "bbox": {"l": 101.01604, "t": 669.48932, "r": 186.24606, "b": 678.39588, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "\u2264", "bbox": {"l": 188.73605, "t": 668.77201, "r": 196.48497, "b": 678.1767, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "512 tokens.", "bbox": {"l": 198.97505, "t": 669.48932, "r": 244.81296999999998, "b": 678.39588, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "(2)", "bbox": {"l": 274.74606, "t": 
662.11731, "r": 286.36246, "b": 671.02388, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Although input constraints are used also by other methods,", "bbox": {"l": 50.112061, "t": 692.290314, "r": 286.36514, "b": 701.196877, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "such as EDD, ours are less restrictive due to the improved", "bbox": {"l": 50.112061, "t": 704.245316, "r": 286.36514, "b": 713.151878, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "runtime performance and lower memory footprint of Table-", "bbox": {"l": 308.86206, "t": 75.20830999999998, "r": 545.11523, "b": 84.11487, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Former.", "bbox": {"l": 308.86206, "t": 87.16332999999997, "r": 339.98523, "b": 96.06988999999999, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "This allows to utilize input samples with longer", "bbox": {"l": 346.88931, "t": 87.16332999999997, "r": 545.11523, "b": 96.06988999999999, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "sequences and images with larger dimensions.", "bbox": {"l": 308.86206, "t": 99.11835000000008, "r": 492.96097, "b": 108.0249, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "The Transformer Encoder consists of two \u201cTransformer", "bbox": {"l": 320.81705, "t": 116.22937000000002, "r": 545.11499, "b": 125.13593000000003, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Encoder Layers\u201d, with an input feature size of 512, feed", "bbox": {"l": 308.86206, "t": 128.18439, "r": 545.11517, "b": 137.09094000000005, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "forward network of 1024, and 4 attention heads. 
As for the", "bbox": {"l": 308.86206, "t": 140.13940000000002, "r": 545.11505, "b": 149.04596000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Transformer Decoder it is composed of four \u201cTransformer", "bbox": {"l": 308.86206, "t": 152.09442, "r": 545.11511, "b": 161.00098000000003, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Decoder Layers\u201d with similar input and output dimensions", "bbox": {"l": 308.86206, "t": 164.04944, "r": 545.11517, "b": 172.95599000000004, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "as the \u201cTransformer Encoder Layers\u201d.", "bbox": {"l": 308.86206, "t": 176.00543000000005, "r": 467.21756000000005, "b": 184.91198999999995, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Even though our", "bbox": {"l": 475.43671, "t": 176.00543000000005, "r": 545.11511, "b": 184.91198999999995, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "model uses fewer layers and heads than the default imple-", "bbox": {"l": 308.86206, "t": 187.96045000000004, "r": 545.11511, "b": 196.86699999999996, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "mentation parameters, our extensive experimentation has", "bbox": {"l": 308.86206, "t": 199.91547000000003, "r": 545.11511, "b": 208.82201999999995, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "proved this setup to be more suitable for table images. We", "bbox": {"l": 308.86206, "t": 211.87048000000004, "r": 545.11517, "b": 220.77704000000006, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "attribute this finding to the inherent design of table im-", "bbox": {"l": 308.86206, "t": 223.82550000000003, "r": 545.11511, "b": 232.73206000000005, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "ages, which contain mostly lines and text, unlike the more", "bbox": {"l": 308.86206, "t": 235.78052000000002, "r": 545.11511, "b": 244.68706999999995, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "elaborate content present in other scopes (e.g. 
the COCO", "bbox": {"l": 308.86206, "t": 247.73650999999995, "r": 545.11523, "b": 256.64306999999997, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "dataset).", "bbox": {"l": 308.86206, "t": 259.69152999999994, "r": 342.3364, "b": 268.59808, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Moreover, we have added ResNet blocks to the", "bbox": {"l": 348.95157, "t": 259.69152999999994, "r": 545.11517, "b": 268.59808, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "inputs of the Structure Decoder and Cell BBox Decoder.", "bbox": {"l": 308.86206, "t": 271.64655000000005, "r": 545.11517, "b": 280.55310000000003, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "This prevents a decoder having a stronger influence over the", "bbox": {"l": 308.86206, "t": 283.6015300000001, "r": 545.1153, "b": 292.50809, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "learned weights which would damage the other prediction", "bbox": {"l": 308.86206, "t": 295.55652, "r": 545.11511, "b": 304.46307, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "task (structure vs bounding boxes), but learn task specific", "bbox": {"l": 308.86206, "t": 307.51151, "r": 545.11511, "b": 316.41806, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "weights instead. 
Lastly our dropout layers are set to 0.5.", "bbox": {"l": 308.86206, "t": 319.4674999999999, "r": 532.48267, "b": 328.37405, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "For training, TableFormer is trained with 3 Adam opti-", "bbox": {"l": 320.81705, "t": 336.57751, "r": 545.11499, "b": 345.48407000000003, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "mizers, each one for the", "bbox": {"l": 308.86206, "t": 348.5325000000001, "r": 403.7359, "b": 357.43906, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "CNN Backbone Network", "bbox": {"l": 406.07605, "t": 348.62216, "r": 503.54016, "b": 357.20993, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": ",", "bbox": {"l": 503.53906, "t": 348.5325000000001, "r": 506.02972, "b": 357.43906, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "Structure", "bbox": {"l": 508.40004999999996, "t": 348.62216, "r": 545.11224, "b": 357.20993, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "Decoder", "bbox": {"l": 308.86206, "t": 360.57715, "r": 343.1633, "b": 369.16492000000005, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": ", and", "bbox": {"l": 343.16306, "t": 360.48749, "r": 362.2016, "b": 369.39404, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Cell BBox Decoder", "bbox": {"l": 364.28604, "t": 360.57715, "r": 440.93829, "b": 369.16492000000005, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": ". 
Taking the PubTabNet as", "bbox": {"l": 440.93903, "t": 360.48749, "r": 545.10797, "b": 369.39404, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "an example for our parameter set up, the initializing learn-", "bbox": {"l": 308.86203, "t": 372.44247, "r": 545.11511, "b": 381.34903, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "ing rate is 0.001 for 12 epochs with a batch size of 24, and", "bbox": {"l": 308.86203, "t": 384.3984699999999, "r": 545.11517, "b": 393.30502, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "\u03bb", "bbox": {"l": 308.86203, "t": 396.19406000000004, "r": 314.67322, "b": 405.04083, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "set to 0.5.", "bbox": {"l": 318.65802, "t": 396.35345, "r": 360.39139, "b": 405.2600100000001, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "Afterwards, we reduce the learning rate to", "bbox": {"l": 367.96295, "t": 396.35345, "r": 545.10803, "b": 405.2600100000001, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "0.0001, the batch size to 18 and train for 12 more epochs or", "bbox": {"l": 308.86203, "t": 408.30844, "r": 545.11511, "b": 417.215, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "convergence.", "bbox": {"l": 308.86203, "t": 420.26343, "r": 360.9664, "b": 429.16998, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "TableFormer is implemented with PyTorch and Torchvi-", "bbox": {"l": 320.81702, "t": 437.37441999999993, "r": 545.11499, "b": 446.28098, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "sion libraries [22].", "bbox": {"l": 308.86203, "t": 449.32941, "r": 384.62759, "b": 458.23596, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "To speed up the inference, the image", "bbox": {"l": 391.37228, "t": 449.32941, "r": 545.11511, "b": 458.23596, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "undergoes a single forward pass through the", "bbox": {"l": 308.86203, "t": 461.28439, "r": 494.00693000000007, "b": 470.19095, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "CNN Back-", 
"bbox": {"l": 498.07803, "t": 461.37405, "r": 545.11145, "b": 469.96182, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "bone Network", "bbox": {"l": 308.86203, "t": 473.32904, "r": 364.44336, "b": 481.91681, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "and transformer encoder. This eliminates the", "bbox": {"l": 367.06104, "t": 473.23938, "r": 545.11267, "b": 482.14594, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "overhead of generating the same features for each decoding", "bbox": {"l": 308.86203, "t": 485.19437, "r": 545.11511, "b": 494.10092, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "step. Similarly, we employ a \u2019caching\u2019 technique to preform", "bbox": {"l": 308.86203, "t": 497.14935, "r": 545.11523, "b": 506.05591, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "faster autoregressive decoding. This is achieved by storing", "bbox": {"l": 308.86203, "t": 509.10535, "r": 545.11511, "b": 518.0119, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "the features of decoded tokens so we can reuse them for", "bbox": {"l": 308.86203, "t": 521.06033, "r": 545.11517, "b": 529.9668899999999, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "each time step. 
Therefore, we only compute the attention", "bbox": {"l": 308.86203, "t": 533.01532, "r": 545.11517, "b": 541.9218900000001, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "for each new tag.", "bbox": {"l": 308.86203, "t": 544.97034, "r": 377.21548, "b": 553.87689, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "5.2.", "bbox": {"l": 308.86203, "t": 579.55432, "r": 323.9046, "b": 589.40637, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "Generalization", "bbox": {"l": 333.93301, "t": 579.55432, "r": 397.44281, "b": 589.40637, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "TableFormer is evaluated on three major publicly avail-", "bbox": {"l": 320.81702, "t": 603.44933, "r": 545.11493, "b": 612.3558800000001, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "able datasets of different nature to prove the generalization", "bbox": {"l": 308.86203, "t": 615.40433, "r": 545.11511, "b": 624.31088, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "and effectiveness of our model. The datasets used for eval-", "bbox": {"l": 308.86203, "t": 627.35933, "r": 545.11517, "b": 636.26588, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "uation are the PubTabNet, FinTabNet and TableBank which", "bbox": {"l": 308.86203, "t": 639.31433, "r": 545.11511, "b": 648.22089, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "stem from the scientific, financial and general domains re-", "bbox": {"l": 308.86203, "t": 651.27032, "r": 545.11517, "b": 660.17688, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "spectively.", "bbox": {"l": 308.86203, "t": 663.22533, "r": 350.70493, "b": 672.13189, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "We also share our baseline results on the challenging", "bbox": {"l": 320.81702, "t": 680.33533, "r": 545.11505, "b": 689.24189, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "SynthTabNet dataset.", "bbox": {"l": 308.86203, "t": 692.290329, "r": 396.21411, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "Throughout our 
experiments, the", "bbox": {"l": 406.40585, "t": 692.290329, "r": 545.11523, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "same parameters stated in Sec. 5.1 are utilized.", "bbox": {"l": 308.86203, "t": 704.246323, "r": 495.93982, "b": 713.152893, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "6", "bbox": {"l": 295.12103, "t": 734.133327, "r": 300.10233, "b": 743.03989, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 49.35792922973633, "t": 74.26022338867188, "r": 286.5823059082031, "b": 156.22055053710938, "coord_origin": "TOPLEFT"}, "confidence": 0.9868088960647583, "cells": [{"id": 0, "text": "tention encoding is then multiplied to the encoded image to", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36514, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "produce a feature for each table cell. Notice that this is dif-", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 286.36508, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "ferent than the typical object detection problem where im-", "bbox": {"l": 50.112, "t": 99.11841000000004, "r": 286.36508, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "balances between the number of detections and the amount", "bbox": {"l": 50.112, "t": 111.07343000000003, "r": 286.36508, "b": 119.97997999999984, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "of objects may exist. 
In our case, we know up front that", "bbox": {"l": 50.112, "t": 123.02844000000005, "r": 286.36508, "b": 131.93499999999995, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "the produced detections always match with the table cells", "bbox": {"l": 50.112, "t": 134.98443999999995, "r": 286.36514, "b": 143.89099, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "in number and correspondence.", "bbox": {"l": 50.112, "t": 146.93944999999997, "r": 175.16254, "b": 155.84600999999998, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "text", "bbox": {"l": 49.385826110839844, "t": 158.92007446289062, "r": 286.5838928222656, "b": 240.56492614746094, "coord_origin": "TOPLEFT"}, "confidence": 0.9860329031944275, "cells": [{"id": 7, "text": "The output features for each table cell are then fed", "bbox": {"l": 62.067001, "t": 159.62445000000002, "r": 286.36496, "b": 168.53101000000004, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "into the feed-forward network (FFN). The FFN consists", "bbox": {"l": 50.112, "t": 171.58043999999995, "r": 286.36511, "b": 180.48699999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "of a Multi-Layer Perceptron (3 layers with ReLU activa-", "bbox": {"l": 50.112, "t": 183.53545999999994, "r": 286.36511, "b": 192.44201999999996, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "tion function) that predicts the normalized coordinates for", "bbox": {"l": 50.112, "t": 195.49048000000005, "r": 286.36511, "b": 204.39702999999997, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "the bounding box of each table cell. 
Finally, the predicted", "bbox": {"l": 50.112, "t": 207.44550000000004, "r": 286.36511, "b": 216.35204999999996, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "bounding boxes are classified based on whether they are", "bbox": {"l": 50.112, "t": 219.40051000000005, "r": 286.36511, "b": 228.30706999999995, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "empty or not using a linear layer.", "bbox": {"l": 50.112, "t": 231.35650999999996, "r": 181.54855, "b": 240.26306, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "text", "bbox": {"l": 49.27542495727539, "t": 243.0098419189453, "r": 286.86236572265625, "b": 444.3459777832031, "coord_origin": "TOPLEFT"}, "confidence": 0.9873637557029724, "cells": [{"id": 14, "text": "Loss Functions.", "bbox": {"l": 62.067001, "t": 243.92193999999995, "r": 129.21492, "b": 252.87829999999997, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "We formulate a multi-task loss Eq. 2", "bbox": {"l": 134.451, "t": 244.04150000000004, "r": 286.36078, "b": 252.94806000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "to train our network. The Cross-Entropy loss (denoted as", "bbox": {"l": 50.112007, "t": 255.99652000000003, "r": 286.36511, "b": 264.90308000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "l$_{s}$", "bbox": {"l": 50.112007, "t": 267.79309, "r": 56.84528, "b": 276.63989000000004, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": ") is used to train the", "bbox": {"l": 57.343006, "t": 267.95250999999996, "r": 135.39996, "b": 276.85907, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Structure Decoder", "bbox": {"l": 137.735, "t": 268.04218000000003, "r": 211.07965, "b": 276.62994000000003, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "which predicts the", "bbox": {"l": 213.63699, "t": 267.95250999999996, "r": 286.36395, "b": 276.85907, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "structure tokens. 
As for the", "bbox": {"l": 50.112, "t": 279.90747, "r": 158.82388, "b": 288.81406, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "Cell BBox Decoder", "bbox": {"l": 161.31799, "t": 279.99712999999997, "r": 238.79712, "b": 288.58493, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "it is trained", "bbox": {"l": 241.521, "t": 279.90747, "r": 286.36264, "b": 288.81406, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "with a combination of losses denoted as", "bbox": {"l": 50.112, "t": 291.86249, "r": 211.3766, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "l$_{box}$", "bbox": {"l": 214.271, "t": 291.70309, "r": 229.19780000000003, "b": 300.54987, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": ".", "bbox": {"l": 229.696, "t": 291.86249, "r": 232.18665000000001, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "l$_{box}$", "bbox": {"l": 236.49001, "t": 291.70309, "r": 251.41681000000003, "b": 300.54987, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "consists", "bbox": {"l": 254.81099999999998, "t": 291.86249, "r": 286.36255, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "of the generally used", "bbox": {"l": 50.112, "t": 303.81747, "r": 137.45412, "b": 312.72403, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "l$_{1}$", "bbox": {"l": 141.298, "t": 303.65808, "r": 148.24258, "b": 312.50485, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "loss for object detection and the", "bbox": {"l": 152.58601, "t": 303.81747, "r": 286.36377, "b": 312.72403, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "IoU loss (", "bbox": {"l": 50.112015, "t": 315.77245999999997, "r": 89.683464, "b": 324.67902, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "l$_{iou}$", "bbox": {"l": 89.68602, "t": 315.61307, "r": 104.12046, "b": 324.45984, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": ") to be scale invariant as explained in [25]. 
In", "bbox": {"l": 104.61802, "t": 315.77245999999997, "r": 286.36572, "b": 324.67902, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "comparison to DETR, we do not use the Hungarian algo-", "bbox": {"l": 50.112019, "t": 327.72845, "r": 286.36511, "b": 336.6350100000001, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "rithm [15] to match the predicted bounding boxes with the", "bbox": {"l": 50.112019, "t": 339.68344, "r": 286.36508, "b": 348.59, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "ground-truth boxes, as we have already achieved a one-to-", "bbox": {"l": 50.112019, "t": 351.63843, "r": 286.36511, "b": 360.54498, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "one match through two steps: 1) Our token input sequence", "bbox": {"l": 50.112019, "t": 363.59341, "r": 286.36508, "b": 372.49996999999996, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "is naturally ordered, therefore the hidden states of the table", "bbox": {"l": 50.112019, "t": 375.5484, "r": 286.36511, "b": 384.45496, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "data cells are also in order when they are provided as in-", "bbox": {"l": 50.112019, "t": 387.50339, "r": 286.36514, "b": 396.40994, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "put to the", "bbox": {"l": 50.112019, "t": 399.45938, "r": 88.68721, "b": 408.36594, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "Cell BBox Decoder", "bbox": {"l": 91.646019, "t": 399.54904, "r": 170.0517, "b": 408.13681, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": ", and 2) Our bounding boxes", "bbox": {"l": 170.05103, "t": 399.45938, "r": 286.36438, "b": 408.36594, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "generation mechanism (see Sec.", "bbox": {"l": 50.112022, "t": 411.41437, "r": 181.96703, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "3)", "bbox": {"l": 189.09029, "t": 411.41437, "r": 197.74918, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "ensures a one-to-one", "bbox": {"l": 200.34789, 
"t": 411.41437, "r": 286.36511, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "mapping between the cell content and its bounding box for", "bbox": {"l": 50.112022, "t": 423.36934999999994, "r": 286.36511, "b": 432.27591, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "all post-processed datasets.", "bbox": {"l": 50.112022, "t": 435.32434, "r": 158.2959, "b": 444.2309, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "text", "bbox": {"l": 49.76386642456055, "t": 446.9576110839844, "r": 286.42083740234375, "b": 469.1785583496094, "coord_origin": "TOPLEFT"}, "confidence": 0.9724196195602417, "cells": [{"id": 49, "text": "The loss used to train the TableFormer can be defined as", "bbox": {"l": 62.067024, "t": 448.01035, "r": 286.36499, "b": 456.9169, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "following:", "bbox": {"l": 50.112022, "t": 459.96533, "r": 91.377113, "b": 468.87189, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "formula", "bbox": {"l": 123.5527114868164, "t": 492.1290588378906, "r": 286.36480712890625, "b": 518.0797119140625, "coord_origin": "TOPLEFT"}, "confidence": 0.9423348307609558, "cells": [{"id": 51, "text": "l$_{box}$", "bbox": {"l": 125.71502, "t": 493.28094, "r": 140.64182, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "=", "bbox": {"l": 143.90701, "t": 493.28094, "r": 151.65593, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "\u03bb$_{iou}$l$_{iou}$", "bbox": {"l": 154.42302, "t": 493.28094, "r": 186.62846, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "+", "bbox": {"l": 189.34003, "t": 493.28094, "r": 197.08894, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "\u03bb$_{l}$$_{1}$", "bbox": {"l": 199.30302, "t": 493.28094, "r": 211.64659, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "l", "bbox": {"l": 124.33002, "t": 508.22495, "r": 127.30286, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "=", "bbox": {"l": 
130.26602, "t": 508.22495, "r": 138.01494, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "\u03bbl$_{s}$", "bbox": {"l": 140.78203, "t": 508.22495, "r": 153.32629, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "+ (1", "bbox": {"l": 156.03903, "t": 508.22495, "r": 174.85541, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "\u2212", "bbox": {"l": 177.07103, "t": 507.66702, "r": 184.81995, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "\u03bb", "bbox": {"l": 187.03304, "t": 508.22495, "r": 192.84422, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": ")", "bbox": {"l": 192.84503, "t": 508.22495, "r": 196.71948, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "l$_{box}$", "bbox": {"l": 196.71902, "t": 508.22495, "r": 211.64583, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "(1)", "bbox": {"l": 274.74603, "t": 501.01132, "r": 286.36243, "b": 509.91788, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "text", "bbox": {"l": 49.46774673461914, "t": 530.3292236328125, "r": 281.59692, "b": 541.0584106445312, "coord_origin": "TOPLEFT"}, "confidence": 0.9363928437232971, "cells": [{"id": 65, "text": "where", "bbox": {"l": 50.11203, "t": 531.30933, "r": 74.450661, "b": 540.21588, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "\u03bb", "bbox": {"l": 76.941032, "t": 531.14993, "r": 82.75222, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "\u2208", "bbox": {"l": 85.520035, "t": 530.5920100000001, "r": 92.162102, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "[0, 1], and", "bbox": {"l": 94.653038, "t": 531.30933, "r": 135.59932, "b": 540.21588, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "\u03bb$_{iou}$, \u03bb$_{l}$$_{1}$", "bbox": {"l": 138.09004, "t": 531.14993, "r": 172.63162, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "\u2208$_{R}$", "bbox": {"l": 175.89705, "t": 530.5920100000001, "r": 192.50104, "b": 
539.9967, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "are hyper-parameters.", "bbox": {"l": 194.99205, "t": 531.30933, "r": 281.59692, "b": 540.21588, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "section_header", "bbox": {"l": 49.37754821777344, "t": 554.845703125, "r": 171.98335, "b": 566.986572265625, "coord_origin": "TOPLEFT"}, "confidence": 0.9554555416107178, "cells": [{"id": 72, "text": "5.", "bbox": {"l": 50.112045, "t": 555.91689, "r": 57.92831799999999, "b": 566.66461, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "Experimental Results", "bbox": {"l": 68.350014, "t": 555.91689, "r": 171.98335, "b": 566.66461, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "section_header", "bbox": {"l": 49.518253326416016, "t": 575.6320190429688, "r": 179.17800903320312, "b": 586.595947265625, "coord_origin": "TOPLEFT"}, "confidence": 0.9538504481315613, "cells": [{"id": 74, "text": "5.1.", "bbox": {"l": 50.112045, "t": 576.26433, "r": 64.693237, "b": 586.1163799999999, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "Implementation Details", "bbox": {"l": 74.414032, "t": 576.26433, "r": 179.17502, "b": 586.1163799999999, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "text", "bbox": {"l": 49.45502471923828, "t": 594.9777221679688, "r": 286.3757629394531, "b": 641.0610961914062, "coord_origin": "TOPLEFT"}, "confidence": 0.9856163263320923, "cells": [{"id": 76, "text": "TableFormer uses ResNet-18 as the", "bbox": {"l": 62.067047, "t": 595.73433, "r": 202.97806, "b": 604.64088, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "CNN Backbone Net-", "bbox": {"l": 205.38405, "t": 595.82399, "r": 286.36008, "b": 604.41174, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "work", "bbox": {"l": 50.112045, "t": 607.77899, "r": 70.037247, "b": 616.3667399999999, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": ". 
The input images are resized to 448*448 pixels and", "bbox": {"l": 70.037048, "t": 607.68933, "r": 286.36496, "b": 616.59589, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "the feature map has a dimension of 28*28. Additionally, we", "bbox": {"l": 50.112049, "t": 619.64433, "r": 286.36517, "b": 628.55089, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "enforce the following input constraints:", "bbox": {"l": 50.112049, "t": 631.60033, "r": 207.03294, "b": 640.50688, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "formula", "bbox": {"l": 91.31732940673828, "t": 653.6121215820312, "r": 286.36246, "b": 678.39588, "coord_origin": "TOPLEFT"}, "confidence": 0.8441831469535828, "cells": [{"id": 82, "text": "Image width and height", "bbox": {"l": 91.661049, "t": 654.54532, "r": 186.01683, "b": 663.45187, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "\u2264", "bbox": {"l": 188.50705, "t": 653.828, "r": 196.25597, "b": 663.2327, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "1024 pixels", "bbox": {"l": 198.74605, "t": 654.54532, "r": 244.81310999999997, "b": 663.45187, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "Structural tags length", "bbox": {"l": 101.01604, "t": 669.48932, "r": 186.24606, "b": 678.39588, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "\u2264", "bbox": {"l": 188.73605, "t": 668.77201, "r": 196.48497, "b": 678.1767, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "512 tokens.", "bbox": {"l": 198.97505, "t": 669.48932, "r": 244.81296999999998, "b": 678.39588, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "(2)", "bbox": {"l": 274.74606, "t": 662.11731, "r": 286.36246, "b": 671.02388, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "text", "bbox": {"l": 49.31819534301758, "t": 691.8294067382812, "r": 286.36514, "b": 713.1538696289062, "coord_origin": "TOPLEFT"}, "confidence": 0.9724978804588318, "cells": [{"id": 89, "text": "Although input constraints are used also by other methods,", "bbox": {"l": 50.112061, "t": 692.290314, "r": 
286.36514, "b": 701.196877, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "such as EDD, ours are less restrictive due to the improved", "bbox": {"l": 50.112061, "t": 704.245316, "r": 286.36514, "b": 713.151878, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "text", "bbox": {"l": 307.9281921386719, "t": 74.27256774902344, "r": 545.5595703125, "b": 108.48387145996094, "coord_origin": "TOPLEFT"}, "confidence": 0.9799237847328186, "cells": [{"id": 91, "text": "runtime performance and lower memory footprint of Table-", "bbox": {"l": 308.86206, "t": 75.20830999999998, "r": 545.11523, "b": 84.11487, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Former.", "bbox": {"l": 308.86206, "t": 87.16332999999997, "r": 339.98523, "b": 96.06988999999999, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "This allows to utilize input samples with longer", "bbox": {"l": 346.88931, "t": 87.16332999999997, "r": 545.11523, "b": 96.06988999999999, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "sequences and images with larger dimensions.", "bbox": {"l": 308.86206, "t": 99.11835000000008, "r": 492.96097, "b": 108.0249, "coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "text", "bbox": {"l": 307.79034423828125, "t": 114.75221252441406, "r": 545.6121215820312, "b": 328.51171875, "coord_origin": "TOPLEFT"}, "confidence": 0.9880929589271545, "cells": [{"id": 95, "text": "The Transformer Encoder consists of two \u201cTransformer", "bbox": {"l": 320.81705, "t": 116.22937000000002, "r": 545.11499, "b": 125.13593000000003, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Encoder Layers\u201d, with an input feature size of 512, feed", "bbox": {"l": 308.86206, "t": 128.18439, "r": 545.11517, "b": 137.09094000000005, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "forward network of 1024, and 4 attention heads. 
As for the", "bbox": {"l": 308.86206, "t": 140.13940000000002, "r": 545.11505, "b": 149.04596000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Transformer Decoder it is composed of four \u201cTransformer", "bbox": {"l": 308.86206, "t": 152.09442, "r": 545.11511, "b": 161.00098000000003, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Decoder Layers\u201d with similar input and output dimensions", "bbox": {"l": 308.86206, "t": 164.04944, "r": 545.11517, "b": 172.95599000000004, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "as the \u201cTransformer Encoder Layers\u201d.", "bbox": {"l": 308.86206, "t": 176.00543000000005, "r": 467.21756000000005, "b": 184.91198999999995, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Even though our", "bbox": {"l": 475.43671, "t": 176.00543000000005, "r": 545.11511, "b": 184.91198999999995, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "model uses fewer layers and heads than the default imple-", "bbox": {"l": 308.86206, "t": 187.96045000000004, "r": 545.11511, "b": 196.86699999999996, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "mentation parameters, our extensive experimentation has", "bbox": {"l": 308.86206, "t": 199.91547000000003, "r": 545.11511, "b": 208.82201999999995, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "proved this setup to be more suitable for table images. We", "bbox": {"l": 308.86206, "t": 211.87048000000004, "r": 545.11517, "b": 220.77704000000006, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "attribute this finding to the inherent design of table im-", "bbox": {"l": 308.86206, "t": 223.82550000000003, "r": 545.11511, "b": 232.73206000000005, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "ages, which contain mostly lines and text, unlike the more", "bbox": {"l": 308.86206, "t": 235.78052000000002, "r": 545.11511, "b": 244.68706999999995, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "elaborate content present in other scopes (e.g. 
the COCO", "bbox": {"l": 308.86206, "t": 247.73650999999995, "r": 545.11523, "b": 256.64306999999997, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "dataset).", "bbox": {"l": 308.86206, "t": 259.69152999999994, "r": 342.3364, "b": 268.59808, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Moreover, we have added ResNet blocks to the", "bbox": {"l": 348.95157, "t": 259.69152999999994, "r": 545.11517, "b": 268.59808, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "inputs of the Structure Decoder and Cell BBox Decoder.", "bbox": {"l": 308.86206, "t": 271.64655000000005, "r": 545.11517, "b": 280.55310000000003, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "This prevents a decoder having a stronger influence over the", "bbox": {"l": 308.86206, "t": 283.6015300000001, "r": 545.1153, "b": 292.50809, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "learned weights which would damage the other prediction", "bbox": {"l": 308.86206, "t": 295.55652, "r": 545.11511, "b": 304.46307, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "task (structure vs bounding boxes), but learn task specific", "bbox": {"l": 308.86206, "t": 307.51151, "r": 545.11511, "b": 316.41806, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "weights instead. 
Lastly our dropout layers are set to 0.5.", "bbox": {"l": 308.86206, "t": 319.4674999999999, "r": 532.48267, "b": 328.37405, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "text", "bbox": {"l": 307.6786193847656, "t": 335.59222412109375, "r": 545.5258178710938, "b": 429.6965637207031, "coord_origin": "TOPLEFT"}, "confidence": 0.9877589344978333, "cells": [{"id": 115, "text": "For training, TableFormer is trained with 3 Adam opti-", "bbox": {"l": 320.81705, "t": 336.57751, "r": 545.11499, "b": 345.48407000000003, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "mizers, each one for the", "bbox": {"l": 308.86206, "t": 348.5325000000001, "r": 403.7359, "b": 357.43906, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "CNN Backbone Network", "bbox": {"l": 406.07605, "t": 348.62216, "r": 503.54016, "b": 357.20993, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": ",", "bbox": {"l": 503.53906, "t": 348.5325000000001, "r": 506.02972, "b": 357.43906, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "Structure", "bbox": {"l": 508.40004999999996, "t": 348.62216, "r": 545.11224, "b": 357.20993, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "Decoder", "bbox": {"l": 308.86206, "t": 360.57715, "r": 343.1633, "b": 369.16492000000005, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": ", and", "bbox": {"l": 343.16306, "t": 360.48749, "r": 362.2016, "b": 369.39404, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Cell BBox Decoder", "bbox": {"l": 364.28604, "t": 360.57715, "r": 440.93829, "b": 369.16492000000005, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": ". 
Taking the PubTabNet as", "bbox": {"l": 440.93903, "t": 360.48749, "r": 545.10797, "b": 369.39404, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "an example for our parameter set up, the initializing learn-", "bbox": {"l": 308.86203, "t": 372.44247, "r": 545.11511, "b": 381.34903, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "ing rate is 0.001 for 12 epochs with a batch size of 24, and", "bbox": {"l": 308.86203, "t": 384.3984699999999, "r": 545.11517, "b": 393.30502, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "\u03bb", "bbox": {"l": 308.86203, "t": 396.19406000000004, "r": 314.67322, "b": 405.04083, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "set to 0.5.", "bbox": {"l": 318.65802, "t": 396.35345, "r": 360.39139, "b": 405.2600100000001, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "Afterwards, we reduce the learning rate to", "bbox": {"l": 367.96295, "t": 396.35345, "r": 545.10803, "b": 405.2600100000001, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "0.0001, the batch size to 18 and train for 12 more epochs or", "bbox": {"l": 308.86203, "t": 408.30844, "r": 545.11511, "b": 417.215, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "convergence.", "bbox": {"l": 308.86203, "t": 420.26343, "r": 360.9664, "b": 429.16998, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "text", "bbox": {"l": 307.8315124511719, "t": 436.140380859375, "r": 545.4056396484375, "b": 554.6504516601562, "coord_origin": "TOPLEFT"}, "confidence": 0.9884491562843323, "cells": [{"id": 131, "text": "TableFormer is implemented with PyTorch and Torchvi-", "bbox": {"l": 320.81702, "t": 437.37441999999993, "r": 545.11499, "b": 446.28098, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "sion libraries [22].", "bbox": {"l": 308.86203, "t": 449.32941, "r": 384.62759, "b": 458.23596, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "To speed up the inference, the image", "bbox": {"l": 391.37228, "t": 449.32941, "r": 545.11511, "b": 458.23596, "coord_origin": "TOPLEFT"}}, {"id": 
134, "text": "undergoes a single forward pass through the", "bbox": {"l": 308.86203, "t": 461.28439, "r": 494.00693000000007, "b": 470.19095, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "CNN Back-", "bbox": {"l": 498.07803, "t": 461.37405, "r": 545.11145, "b": 469.96182, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "bone Network", "bbox": {"l": 308.86203, "t": 473.32904, "r": 364.44336, "b": 481.91681, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "and transformer encoder. This eliminates the", "bbox": {"l": 367.06104, "t": 473.23938, "r": 545.11267, "b": 482.14594, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "overhead of generating the same features for each decoding", "bbox": {"l": 308.86203, "t": 485.19437, "r": 545.11511, "b": 494.10092, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "step. Similarly, we employ a \u2019caching\u2019 technique to preform", "bbox": {"l": 308.86203, "t": 497.14935, "r": 545.11523, "b": 506.05591, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "faster autoregressive decoding. This is achieved by storing", "bbox": {"l": 308.86203, "t": 509.10535, "r": 545.11511, "b": 518.0119, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "the features of decoded tokens so we can reuse them for", "bbox": {"l": 308.86203, "t": 521.06033, "r": 545.11517, "b": 529.9668899999999, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "each time step. 
Therefore, we only compute the attention", "bbox": {"l": 308.86203, "t": 533.01532, "r": 545.11517, "b": 541.9218900000001, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "for each new tag.", "bbox": {"l": 308.86203, "t": 544.97034, "r": 377.21548, "b": 553.87689, "coord_origin": "TOPLEFT"}}]}, {"id": 15, "label": "section_header", "bbox": {"l": 308.1548767089844, "t": 578.8842163085938, "r": 397.44281, "b": 589.40637, "coord_origin": "TOPLEFT"}, "confidence": 0.9450808763504028, "cells": [{"id": 144, "text": "5.2.", "bbox": {"l": 308.86203, "t": 579.55432, "r": 323.9046, "b": 589.40637, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "Generalization", "bbox": {"l": 333.93301, "t": 579.55432, "r": 397.44281, "b": 589.40637, "coord_origin": "TOPLEFT"}}]}, {"id": 16, "label": "text", "bbox": {"l": 308.0038757324219, "t": 602.3896484375, "r": 545.11517, "b": 672.86962890625, "coord_origin": "TOPLEFT"}, "confidence": 0.9880596995353699, "cells": [{"id": 146, "text": "TableFormer is evaluated on three major publicly avail-", "bbox": {"l": 320.81702, "t": 603.44933, "r": 545.11493, "b": 612.3558800000001, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "able datasets of different nature to prove the generalization", "bbox": {"l": 308.86203, "t": 615.40433, "r": 545.11511, "b": 624.31088, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "and effectiveness of our model. 
The datasets used for eval-", "bbox": {"l": 308.86203, "t": 627.35933, "r": 545.11517, "b": 636.26588, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "uation are the PubTabNet, FinTabNet and TableBank which", "bbox": {"l": 308.86203, "t": 639.31433, "r": 545.11511, "b": 648.22089, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "stem from the scientific, financial and general domains re-", "bbox": {"l": 308.86203, "t": 651.27032, "r": 545.11517, "b": 660.17688, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "spectively.", "bbox": {"l": 308.86203, "t": 663.22533, "r": 350.70493, "b": 672.13189, "coord_origin": "TOPLEFT"}}]}, {"id": 17, "label": "text", "bbox": {"l": 308.2522277832031, "t": 679.5938720703125, "r": 545.2968139648438, "b": 713.7670288085938, "coord_origin": "TOPLEFT"}, "confidence": 0.9830910563468933, "cells": [{"id": 152, "text": "We also share our baseline results on the challenging", "bbox": {"l": 320.81702, "t": 680.33533, "r": 545.11505, "b": 689.24189, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "SynthTabNet dataset.", "bbox": {"l": 308.86203, "t": 692.290329, "r": 396.21411, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "Throughout our experiments, the", "bbox": {"l": 406.40585, "t": 692.290329, "r": 545.11523, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "same parameters stated in Sec. 
5.1 are utilized.", "bbox": {"l": 308.86203, "t": 704.246323, "r": 495.93982, "b": 713.152893, "coord_origin": "TOPLEFT"}}]}, {"id": 18, "label": "page_footer", "bbox": {"l": 294.4747009277344, "t": 733.2755126953125, "r": 300.3784484863281, "b": 743.03989, "coord_origin": "TOPLEFT"}, "confidence": 0.88204026222229, "cells": [{"id": 156, "text": "6", "bbox": {"l": 295.12103, "t": 734.133327, "r": 300.10233, "b": 743.03989, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 5, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.35792922973633, "t": 74.26022338867188, "r": 286.5823059082031, "b": 156.22055053710938, "coord_origin": "TOPLEFT"}, "confidence": 0.9868088960647583, "cells": [{"id": 0, "text": "tention encoding is then multiplied to the encoded image to", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36514, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "produce a feature for each table cell. Notice that this is dif-", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 286.36508, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "ferent than the typical object detection problem where im-", "bbox": {"l": 50.112, "t": 99.11841000000004, "r": 286.36508, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "balances between the number of detections and the amount", "bbox": {"l": 50.112, "t": 111.07343000000003, "r": 286.36508, "b": 119.97997999999984, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "of objects may exist. 
In our case, we know up front that", "bbox": {"l": 50.112, "t": 123.02844000000005, "r": 286.36508, "b": 131.93499999999995, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "the produced detections always match with the table cells", "bbox": {"l": 50.112, "t": 134.98443999999995, "r": 286.36514, "b": 143.89099, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "in number and correspondence.", "bbox": {"l": 50.112, "t": 146.93944999999997, "r": 175.16254, "b": 155.84600999999998, "coord_origin": "TOPLEFT"}}]}, "text": "tention encoding is then multiplied to the encoded image to produce a feature for each table cell. Notice that this is different than the typical object detection problem where imbalances between the number of detections and the amount of objects may exist. In our case, we know up front that the produced detections always match with the table cells in number and correspondence."}, {"label": "text", "id": 1, "page_no": 5, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.385826110839844, "t": 158.92007446289062, "r": 286.5838928222656, "b": 240.56492614746094, "coord_origin": "TOPLEFT"}, "confidence": 0.9860329031944275, "cells": [{"id": 7, "text": "The output features for each table cell are then fed", "bbox": {"l": 62.067001, "t": 159.62445000000002, "r": 286.36496, "b": 168.53101000000004, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "into the feed-forward network (FFN). The FFN consists", "bbox": {"l": 50.112, "t": 171.58043999999995, "r": 286.36511, "b": 180.48699999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "of a Multi-Layer Perceptron (3 layers with ReLU activa-", "bbox": {"l": 50.112, "t": 183.53545999999994, "r": 286.36511, "b": 192.44201999999996, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "tion function) that predicts the normalized coordinates for", "bbox": {"l": 50.112, "t": 195.49048000000005, "r": 286.36511, "b": 204.39702999999997, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "the bounding box of each table cell. 
Finally, the predicted", "bbox": {"l": 50.112, "t": 207.44550000000004, "r": 286.36511, "b": 216.35204999999996, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "bounding boxes are classified based on whether they are", "bbox": {"l": 50.112, "t": 219.40051000000005, "r": 286.36511, "b": 228.30706999999995, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "empty or not using a linear layer.", "bbox": {"l": 50.112, "t": 231.35650999999996, "r": 181.54855, "b": 240.26306, "coord_origin": "TOPLEFT"}}]}, "text": "The output features for each table cell are then fed into the feed-forward network (FFN). The FFN consists of a Multi-Layer Perceptron (3 layers with ReLU activation function) that predicts the normalized coordinates for the bounding box of each table cell. Finally, the predicted bounding boxes are classified based on whether they are empty or not using a linear layer."}, {"label": "text", "id": 2, "page_no": 5, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.27542495727539, "t": 243.0098419189453, "r": 286.86236572265625, "b": 444.3459777832031, "coord_origin": "TOPLEFT"}, "confidence": 0.9873637557029724, "cells": [{"id": 14, "text": "Loss Functions.", "bbox": {"l": 62.067001, "t": 243.92193999999995, "r": 129.21492, "b": 252.87829999999997, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "We formulate a multi-task loss Eq. 2", "bbox": {"l": 134.451, "t": 244.04150000000004, "r": 286.36078, "b": 252.94806000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "to train our network. 
The Cross-Entropy loss (denoted as", "bbox": {"l": 50.112007, "t": 255.99652000000003, "r": 286.36511, "b": 264.90308000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "l$_{s}$", "bbox": {"l": 50.112007, "t": 267.79309, "r": 56.84528, "b": 276.63989000000004, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": ") is used to train the", "bbox": {"l": 57.343006, "t": 267.95250999999996, "r": 135.39996, "b": 276.85907, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Structure Decoder", "bbox": {"l": 137.735, "t": 268.04218000000003, "r": 211.07965, "b": 276.62994000000003, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "which predicts the", "bbox": {"l": 213.63699, "t": 267.95250999999996, "r": 286.36395, "b": 276.85907, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "structure tokens. As for the", "bbox": {"l": 50.112, "t": 279.90747, "r": 158.82388, "b": 288.81406, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "Cell BBox Decoder", "bbox": {"l": 161.31799, "t": 279.99712999999997, "r": 238.79712, "b": 288.58493, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "it is trained", "bbox": {"l": 241.521, "t": 279.90747, "r": 286.36264, "b": 288.81406, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "with a combination of losses denoted as", "bbox": {"l": 50.112, "t": 291.86249, "r": 211.3766, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "l$_{box}$", "bbox": {"l": 214.271, "t": 291.70309, "r": 229.19780000000003, "b": 300.54987, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": ".", "bbox": {"l": 229.696, "t": 291.86249, "r": 232.18665000000001, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "l$_{box}$", "bbox": {"l": 236.49001, "t": 291.70309, "r": 251.41681000000003, "b": 300.54987, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "consists", "bbox": {"l": 254.81099999999998, "t": 291.86249, "r": 286.36255, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "of the generally used", "bbox": {"l": 50.112, "t": 
303.81747, "r": 137.45412, "b": 312.72403, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "l$_{1}$", "bbox": {"l": 141.298, "t": 303.65808, "r": 148.24258, "b": 312.50485, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "loss for object detection and the", "bbox": {"l": 152.58601, "t": 303.81747, "r": 286.36377, "b": 312.72403, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "IoU loss (", "bbox": {"l": 50.112015, "t": 315.77245999999997, "r": 89.683464, "b": 324.67902, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "l$_{iou}$", "bbox": {"l": 89.68602, "t": 315.61307, "r": 104.12046, "b": 324.45984, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": ") to be scale invariant as explained in [25]. In", "bbox": {"l": 104.61802, "t": 315.77245999999997, "r": 286.36572, "b": 324.67902, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "comparison to DETR, we do not use the Hungarian algo-", "bbox": {"l": 50.112019, "t": 327.72845, "r": 286.36511, "b": 336.6350100000001, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "rithm [15] to match the predicted bounding boxes with the", "bbox": {"l": 50.112019, "t": 339.68344, "r": 286.36508, "b": 348.59, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "ground-truth boxes, as we have already achieved a one-to-", "bbox": {"l": 50.112019, "t": 351.63843, "r": 286.36511, "b": 360.54498, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "one match through two steps: 1) Our token input sequence", "bbox": {"l": 50.112019, "t": 363.59341, "r": 286.36508, "b": 372.49996999999996, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "is naturally ordered, therefore the hidden states of the table", "bbox": {"l": 50.112019, "t": 375.5484, "r": 286.36511, "b": 384.45496, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "data cells are also in order when they are provided as in-", "bbox": {"l": 50.112019, "t": 387.50339, "r": 286.36514, "b": 396.40994, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "put to the", "bbox": {"l": 50.112019, "t": 399.45938, 
"r": 88.68721, "b": 408.36594, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "Cell BBox Decoder", "bbox": {"l": 91.646019, "t": 399.54904, "r": 170.0517, "b": 408.13681, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": ", and 2) Our bounding boxes", "bbox": {"l": 170.05103, "t": 399.45938, "r": 286.36438, "b": 408.36594, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "generation mechanism (see Sec.", "bbox": {"l": 50.112022, "t": 411.41437, "r": 181.96703, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "3)", "bbox": {"l": 189.09029, "t": 411.41437, "r": 197.74918, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "ensures a one-to-one", "bbox": {"l": 200.34789, "t": 411.41437, "r": 286.36511, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "mapping between the cell content and its bounding box for", "bbox": {"l": 50.112022, "t": 423.36934999999994, "r": 286.36511, "b": 432.27591, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "all post-processed datasets.", "bbox": {"l": 50.112022, "t": 435.32434, "r": 158.2959, "b": 444.2309, "coord_origin": "TOPLEFT"}}]}, "text": "Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$_{box}$ . l$_{box}$ consists of the generally used l$_{1}$ loss for object detection and the IoU loss ( l$_{iou}$ ) to be scale invariant as explained in [25]. 
In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets."}, {"label": "text", "id": 3, "page_no": 5, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.76386642456055, "t": 446.9576110839844, "r": 286.42083740234375, "b": 469.1785583496094, "coord_origin": "TOPLEFT"}, "confidence": 0.9724196195602417, "cells": [{"id": 49, "text": "The loss used to train the TableFormer can be defined as", "bbox": {"l": 62.067024, "t": 448.01035, "r": 286.36499, "b": 456.9169, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "following:", "bbox": {"l": 50.112022, "t": 459.96533, "r": 91.377113, "b": 468.87189, "coord_origin": "TOPLEFT"}}]}, "text": "The loss used to train the TableFormer can be defined as following:"}, {"label": "formula", "id": 4, "page_no": 5, "cluster": {"id": 4, "label": "formula", "bbox": {"l": 123.5527114868164, "t": 492.1290588378906, "r": 286.36480712890625, "b": 518.0797119140625, "coord_origin": "TOPLEFT"}, "confidence": 0.9423348307609558, "cells": [{"id": 51, "text": "l$_{box}$", "bbox": {"l": 125.71502, "t": 493.28094, "r": 140.64182, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "=", "bbox": {"l": 143.90701, "t": 493.28094, "r": 151.65593, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "\u03bb$_{iou}$l$_{iou}$", "bbox": {"l": 154.42302, "t": 493.28094, "r": 186.62846, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "+", "bbox": {"l": 189.34003, "t": 493.28094, "r": 197.08894, "b": 502.12772, "coord_origin": "TOPLEFT"}}, 
{"id": 55, "text": "\u03bb$_{l}$$_{1}$", "bbox": {"l": 199.30302, "t": 493.28094, "r": 211.64659, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "l", "bbox": {"l": 124.33002, "t": 508.22495, "r": 127.30286, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "=", "bbox": {"l": 130.26602, "t": 508.22495, "r": 138.01494, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "\u03bbl$_{s}$", "bbox": {"l": 140.78203, "t": 508.22495, "r": 153.32629, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "+ (1", "bbox": {"l": 156.03903, "t": 508.22495, "r": 174.85541, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "\u2212", "bbox": {"l": 177.07103, "t": 507.66702, "r": 184.81995, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "\u03bb", "bbox": {"l": 187.03304, "t": 508.22495, "r": 192.84422, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": ")", "bbox": {"l": 192.84503, "t": 508.22495, "r": 196.71948, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "l$_{box}$", "bbox": {"l": 196.71902, "t": 508.22495, "r": 211.64583, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "(1)", "bbox": {"l": 274.74603, "t": 501.01132, "r": 286.36243, "b": 509.91788, "coord_origin": "TOPLEFT"}}]}, "text": "l$_{box}$ = \u03bb$_{iou}$l$_{iou}$ + \u03bb$_{l}$$_{1}$ l = \u03bbl$_{s}$ + (1 \u2212 \u03bb ) l$_{box}$ (1)"}, {"label": "text", "id": 5, "page_no": 5, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.46774673461914, "t": 530.3292236328125, "r": 281.59692, "b": 541.0584106445312, "coord_origin": "TOPLEFT"}, "confidence": 0.9363928437232971, "cells": [{"id": 65, "text": "where", "bbox": {"l": 50.11203, "t": 531.30933, "r": 74.450661, "b": 540.21588, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "\u03bb", "bbox": {"l": 76.941032, "t": 531.14993, "r": 82.75222, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "\u2208", "bbox": {"l": 85.520035, 
"t": 530.5920100000001, "r": 92.162102, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "[0, 1], and", "bbox": {"l": 94.653038, "t": 531.30933, "r": 135.59932, "b": 540.21588, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "\u03bb$_{iou}$, \u03bb$_{l}$$_{1}$", "bbox": {"l": 138.09004, "t": 531.14993, "r": 172.63162, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "\u2208$_{R}$", "bbox": {"l": 175.89705, "t": 530.5920100000001, "r": 192.50104, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "are hyper-parameters.", "bbox": {"l": 194.99205, "t": 531.30933, "r": 281.59692, "b": 540.21588, "coord_origin": "TOPLEFT"}}]}, "text": "where \u03bb \u2208 [0, 1], and \u03bb$_{iou}$, \u03bb$_{l}$$_{1}$ \u2208$_{R}$ are hyper-parameters."}, {"label": "section_header", "id": 6, "page_no": 5, "cluster": {"id": 6, "label": "section_header", "bbox": {"l": 49.37754821777344, "t": 554.845703125, "r": 171.98335, "b": 566.986572265625, "coord_origin": "TOPLEFT"}, "confidence": 0.9554555416107178, "cells": [{"id": 72, "text": "5.", "bbox": {"l": 50.112045, "t": 555.91689, "r": 57.92831799999999, "b": 566.66461, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "Experimental Results", "bbox": {"l": 68.350014, "t": 555.91689, "r": 171.98335, "b": 566.66461, "coord_origin": "TOPLEFT"}}]}, "text": "5. Experimental Results"}, {"label": "section_header", "id": 7, "page_no": 5, "cluster": {"id": 7, "label": "section_header", "bbox": {"l": 49.518253326416016, "t": 575.6320190429688, "r": 179.17800903320312, "b": 586.595947265625, "coord_origin": "TOPLEFT"}, "confidence": 0.9538504481315613, "cells": [{"id": 74, "text": "5.1.", "bbox": {"l": 50.112045, "t": 576.26433, "r": 64.693237, "b": 586.1163799999999, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "Implementation Details", "bbox": {"l": 74.414032, "t": 576.26433, "r": 179.17502, "b": 586.1163799999999, "coord_origin": "TOPLEFT"}}]}, "text": "5.1. 
Implementation Details"}, {"label": "text", "id": 8, "page_no": 5, "cluster": {"id": 8, "label": "text", "bbox": {"l": 49.45502471923828, "t": 594.9777221679688, "r": 286.3757629394531, "b": 641.0610961914062, "coord_origin": "TOPLEFT"}, "confidence": 0.9856163263320923, "cells": [{"id": 76, "text": "TableFormer uses ResNet-18 as the", "bbox": {"l": 62.067047, "t": 595.73433, "r": 202.97806, "b": 604.64088, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "CNN Backbone Net-", "bbox": {"l": 205.38405, "t": 595.82399, "r": 286.36008, "b": 604.41174, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "work", "bbox": {"l": 50.112045, "t": 607.77899, "r": 70.037247, "b": 616.3667399999999, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": ". The input images are resized to 448*448 pixels and", "bbox": {"l": 70.037048, "t": 607.68933, "r": 286.36496, "b": 616.59589, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "the feature map has a dimension of 28*28. Additionally, we", "bbox": {"l": 50.112049, "t": 619.64433, "r": 286.36517, "b": 628.55089, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "enforce the following input constraints:", "bbox": {"l": 50.112049, "t": 631.60033, "r": 207.03294, "b": 640.50688, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer uses ResNet-18 as the CNN Backbone Network . The input images are resized to 448*448 pixels and the feature map has a dimension of 28*28. 
Additionally, we enforce the following input constraints:"}, {"label": "formula", "id": 9, "page_no": 5, "cluster": {"id": 9, "label": "formula", "bbox": {"l": 91.31732940673828, "t": 653.6121215820312, "r": 286.36246, "b": 678.39588, "coord_origin": "TOPLEFT"}, "confidence": 0.8441831469535828, "cells": [{"id": 82, "text": "Image width and height", "bbox": {"l": 91.661049, "t": 654.54532, "r": 186.01683, "b": 663.45187, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "\u2264", "bbox": {"l": 188.50705, "t": 653.828, "r": 196.25597, "b": 663.2327, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "1024 pixels", "bbox": {"l": 198.74605, "t": 654.54532, "r": 244.81310999999997, "b": 663.45187, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "Structural tags length", "bbox": {"l": 101.01604, "t": 669.48932, "r": 186.24606, "b": 678.39588, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "\u2264", "bbox": {"l": 188.73605, "t": 668.77201, "r": 196.48497, "b": 678.1767, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "512 tokens.", "bbox": {"l": 198.97505, "t": 669.48932, "r": 244.81296999999998, "b": 678.39588, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "(2)", "bbox": {"l": 274.74606, "t": 662.11731, "r": 286.36246, "b": 671.02388, "coord_origin": "TOPLEFT"}}]}, "text": "Image width and height \u2264 1024 pixels Structural tags length \u2264 512 tokens. 
(2)"}, {"label": "text", "id": 10, "page_no": 5, "cluster": {"id": 10, "label": "text", "bbox": {"l": 49.31819534301758, "t": 691.8294067382812, "r": 286.36514, "b": 713.1538696289062, "coord_origin": "TOPLEFT"}, "confidence": 0.9724978804588318, "cells": [{"id": 89, "text": "Although input constraints are used also by other methods,", "bbox": {"l": 50.112061, "t": 692.290314, "r": 286.36514, "b": 701.196877, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "such as EDD, ours are less restrictive due to the improved", "bbox": {"l": 50.112061, "t": 704.245316, "r": 286.36514, "b": 713.151878, "coord_origin": "TOPLEFT"}}]}, "text": "Although input constraints are used also by other methods, such as EDD, ours are less restrictive due to the improved"}, {"label": "text", "id": 11, "page_no": 5, "cluster": {"id": 11, "label": "text", "bbox": {"l": 307.9281921386719, "t": 74.27256774902344, "r": 545.5595703125, "b": 108.48387145996094, "coord_origin": "TOPLEFT"}, "confidence": 0.9799237847328186, "cells": [{"id": 91, "text": "runtime performance and lower memory footprint of Table-", "bbox": {"l": 308.86206, "t": 75.20830999999998, "r": 545.11523, "b": 84.11487, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Former.", "bbox": {"l": 308.86206, "t": 87.16332999999997, "r": 339.98523, "b": 96.06988999999999, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "This allows to utilize input samples with longer", "bbox": {"l": 346.88931, "t": 87.16332999999997, "r": 545.11523, "b": 96.06988999999999, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "sequences and images with larger dimensions.", "bbox": {"l": 308.86206, "t": 99.11835000000008, "r": 492.96097, "b": 108.0249, "coord_origin": "TOPLEFT"}}]}, "text": "runtime performance and lower memory footprint of TableFormer. 
This allows to utilize input samples with longer sequences and images with larger dimensions."}, {"label": "text", "id": 12, "page_no": 5, "cluster": {"id": 12, "label": "text", "bbox": {"l": 307.79034423828125, "t": 114.75221252441406, "r": 545.6121215820312, "b": 328.51171875, "coord_origin": "TOPLEFT"}, "confidence": 0.9880929589271545, "cells": [{"id": 95, "text": "The Transformer Encoder consists of two \u201cTransformer", "bbox": {"l": 320.81705, "t": 116.22937000000002, "r": 545.11499, "b": 125.13593000000003, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Encoder Layers\u201d, with an input feature size of 512, feed", "bbox": {"l": 308.86206, "t": 128.18439, "r": 545.11517, "b": 137.09094000000005, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "forward network of 1024, and 4 attention heads. As for the", "bbox": {"l": 308.86206, "t": 140.13940000000002, "r": 545.11505, "b": 149.04596000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Transformer Decoder it is composed of four \u201cTransformer", "bbox": {"l": 308.86206, "t": 152.09442, "r": 545.11511, "b": 161.00098000000003, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Decoder Layers\u201d with similar input and output dimensions", "bbox": {"l": 308.86206, "t": 164.04944, "r": 545.11517, "b": 172.95599000000004, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "as the \u201cTransformer Encoder Layers\u201d.", "bbox": {"l": 308.86206, "t": 176.00543000000005, "r": 467.21756000000005, "b": 184.91198999999995, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Even though our", "bbox": {"l": 475.43671, "t": 176.00543000000005, "r": 545.11511, "b": 184.91198999999995, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "model uses fewer layers and heads than the default imple-", "bbox": {"l": 308.86206, "t": 187.96045000000004, "r": 545.11511, "b": 196.86699999999996, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "mentation parameters, our extensive experimentation has", "bbox": {"l": 308.86206, 
"t": 199.91547000000003, "r": 545.11511, "b": 208.82201999999995, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "proved this setup to be more suitable for table images. We", "bbox": {"l": 308.86206, "t": 211.87048000000004, "r": 545.11517, "b": 220.77704000000006, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "attribute this finding to the inherent design of table im-", "bbox": {"l": 308.86206, "t": 223.82550000000003, "r": 545.11511, "b": 232.73206000000005, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "ages, which contain mostly lines and text, unlike the more", "bbox": {"l": 308.86206, "t": 235.78052000000002, "r": 545.11511, "b": 244.68706999999995, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "elaborate content present in other scopes (e.g. the COCO", "bbox": {"l": 308.86206, "t": 247.73650999999995, "r": 545.11523, "b": 256.64306999999997, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "dataset).", "bbox": {"l": 308.86206, "t": 259.69152999999994, "r": 342.3364, "b": 268.59808, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Moreover, we have added ResNet blocks to the", "bbox": {"l": 348.95157, "t": 259.69152999999994, "r": 545.11517, "b": 268.59808, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "inputs of the Structure Decoder and Cell BBox Decoder.", "bbox": {"l": 308.86206, "t": 271.64655000000005, "r": 545.11517, "b": 280.55310000000003, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "This prevents a decoder having a stronger influence over the", "bbox": {"l": 308.86206, "t": 283.6015300000001, "r": 545.1153, "b": 292.50809, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "learned weights which would damage the other prediction", "bbox": {"l": 308.86206, "t": 295.55652, "r": 545.11511, "b": 304.46307, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "task (structure vs bounding boxes), but learn task specific", "bbox": {"l": 308.86206, "t": 307.51151, "r": 545.11511, "b": 316.41806, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": 
"weights instead. Lastly our dropout layers are set to 0.5.", "bbox": {"l": 308.86206, "t": 319.4674999999999, "r": 532.48267, "b": 328.37405, "coord_origin": "TOPLEFT"}}]}, "text": "The Transformer Encoder consists of two \u201cTransformer Encoder Layers\u201d, with an input feature size of 512, feed forward network of 1024, and 4 attention heads. As for the Transformer Decoder it is composed of four \u201cTransformer Decoder Layers\u201d with similar input and output dimensions as the \u201cTransformer Encoder Layers\u201d. Even though our model uses fewer layers and heads than the default implementation parameters, our extensive experimentation has proved this setup to be more suitable for table images. We attribute this finding to the inherent design of table images, which contain mostly lines and text, unlike the more elaborate content present in other scopes (e.g. the COCO dataset). Moreover, we have added ResNet blocks to the inputs of the Structure Decoder and Cell BBox Decoder. This prevents a decoder having a stronger influence over the learned weights which would damage the other prediction task (structure vs bounding boxes), but learn task specific weights instead. 
Lastly our dropout layers are set to 0.5."}, {"label": "text", "id": 13, "page_no": 5, "cluster": {"id": 13, "label": "text", "bbox": {"l": 307.6786193847656, "t": 335.59222412109375, "r": 545.5258178710938, "b": 429.6965637207031, "coord_origin": "TOPLEFT"}, "confidence": 0.9877589344978333, "cells": [{"id": 115, "text": "For training, TableFormer is trained with 3 Adam opti-", "bbox": {"l": 320.81705, "t": 336.57751, "r": 545.11499, "b": 345.48407000000003, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "mizers, each one for the", "bbox": {"l": 308.86206, "t": 348.5325000000001, "r": 403.7359, "b": 357.43906, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "CNN Backbone Network", "bbox": {"l": 406.07605, "t": 348.62216, "r": 503.54016, "b": 357.20993, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": ",", "bbox": {"l": 503.53906, "t": 348.5325000000001, "r": 506.02972, "b": 357.43906, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "Structure", "bbox": {"l": 508.40004999999996, "t": 348.62216, "r": 545.11224, "b": 357.20993, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "Decoder", "bbox": {"l": 308.86206, "t": 360.57715, "r": 343.1633, "b": 369.16492000000005, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": ", and", "bbox": {"l": 343.16306, "t": 360.48749, "r": 362.2016, "b": 369.39404, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Cell BBox Decoder", "bbox": {"l": 364.28604, "t": 360.57715, "r": 440.93829, "b": 369.16492000000005, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": ". 
Taking the PubTabNet as", "bbox": {"l": 440.93903, "t": 360.48749, "r": 545.10797, "b": 369.39404, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "an example for our parameter set up, the initializing learn-", "bbox": {"l": 308.86203, "t": 372.44247, "r": 545.11511, "b": 381.34903, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "ing rate is 0.001 for 12 epochs with a batch size of 24, and", "bbox": {"l": 308.86203, "t": 384.3984699999999, "r": 545.11517, "b": 393.30502, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "\u03bb", "bbox": {"l": 308.86203, "t": 396.19406000000004, "r": 314.67322, "b": 405.04083, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "set to 0.5.", "bbox": {"l": 318.65802, "t": 396.35345, "r": 360.39139, "b": 405.2600100000001, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "Afterwards, we reduce the learning rate to", "bbox": {"l": 367.96295, "t": 396.35345, "r": 545.10803, "b": 405.2600100000001, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "0.0001, the batch size to 18 and train for 12 more epochs or", "bbox": {"l": 308.86203, "t": 408.30844, "r": 545.11511, "b": 417.215, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "convergence.", "bbox": {"l": 308.86203, "t": 420.26343, "r": 360.9664, "b": 429.16998, "coord_origin": "TOPLEFT"}}]}, "text": "For training, TableFormer is trained with 3 Adam optimizers, each one for the CNN Backbone Network , Structure Decoder , and Cell BBox Decoder . Taking the PubTabNet as an example for our parameter set up, the initializing learning rate is 0.001 for 12 epochs with a batch size of 24, and \u03bb set to 0.5. 
Afterwards, we reduce the learning rate to 0.0001, the batch size to 18 and train for 12 more epochs or convergence."}, {"label": "text", "id": 14, "page_no": 5, "cluster": {"id": 14, "label": "text", "bbox": {"l": 307.8315124511719, "t": 436.140380859375, "r": 545.4056396484375, "b": 554.6504516601562, "coord_origin": "TOPLEFT"}, "confidence": 0.9884491562843323, "cells": [{"id": 131, "text": "TableFormer is implemented with PyTorch and Torchvi-", "bbox": {"l": 320.81702, "t": 437.37441999999993, "r": 545.11499, "b": 446.28098, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "sion libraries [22].", "bbox": {"l": 308.86203, "t": 449.32941, "r": 384.62759, "b": 458.23596, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "To speed up the inference, the image", "bbox": {"l": 391.37228, "t": 449.32941, "r": 545.11511, "b": 458.23596, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "undergoes a single forward pass through the", "bbox": {"l": 308.86203, "t": 461.28439, "r": 494.00693000000007, "b": 470.19095, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "CNN Back-", "bbox": {"l": 498.07803, "t": 461.37405, "r": 545.11145, "b": 469.96182, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "bone Network", "bbox": {"l": 308.86203, "t": 473.32904, "r": 364.44336, "b": 481.91681, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "and transformer encoder. This eliminates the", "bbox": {"l": 367.06104, "t": 473.23938, "r": 545.11267, "b": 482.14594, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "overhead of generating the same features for each decoding", "bbox": {"l": 308.86203, "t": 485.19437, "r": 545.11511, "b": 494.10092, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "step. Similarly, we employ a \u2019caching\u2019 technique to preform", "bbox": {"l": 308.86203, "t": 497.14935, "r": 545.11523, "b": 506.05591, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "faster autoregressive decoding. 
This is achieved by storing", "bbox": {"l": 308.86203, "t": 509.10535, "r": 545.11511, "b": 518.0119, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "the features of decoded tokens so we can reuse them for", "bbox": {"l": 308.86203, "t": 521.06033, "r": 545.11517, "b": 529.9668899999999, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "each time step. Therefore, we only compute the attention", "bbox": {"l": 308.86203, "t": 533.01532, "r": 545.11517, "b": 541.9218900000001, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "for each new tag.", "bbox": {"l": 308.86203, "t": 544.97034, "r": 377.21548, "b": 553.87689, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer is implemented with PyTorch and Torchvision libraries [22]. To speed up the inference, the image undergoes a single forward pass through the CNN Backbone Network and transformer encoder. This eliminates the overhead of generating the same features for each decoding step. Similarly, we employ a \u2019caching\u2019 technique to preform faster autoregressive decoding. This is achieved by storing the features of decoded tokens so we can reuse them for each time step. Therefore, we only compute the attention for each new tag."}, {"label": "section_header", "id": 15, "page_no": 5, "cluster": {"id": 15, "label": "section_header", "bbox": {"l": 308.1548767089844, "t": 578.8842163085938, "r": 397.44281, "b": 589.40637, "coord_origin": "TOPLEFT"}, "confidence": 0.9450808763504028, "cells": [{"id": 144, "text": "5.2.", "bbox": {"l": 308.86203, "t": 579.55432, "r": 323.9046, "b": 589.40637, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "Generalization", "bbox": {"l": 333.93301, "t": 579.55432, "r": 397.44281, "b": 589.40637, "coord_origin": "TOPLEFT"}}]}, "text": "5.2. 
Generalization"}, {"label": "text", "id": 16, "page_no": 5, "cluster": {"id": 16, "label": "text", "bbox": {"l": 308.0038757324219, "t": 602.3896484375, "r": 545.11517, "b": 672.86962890625, "coord_origin": "TOPLEFT"}, "confidence": 0.9880596995353699, "cells": [{"id": 146, "text": "TableFormer is evaluated on three major publicly avail-", "bbox": {"l": 320.81702, "t": 603.44933, "r": 545.11493, "b": 612.3558800000001, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "able datasets of different nature to prove the generalization", "bbox": {"l": 308.86203, "t": 615.40433, "r": 545.11511, "b": 624.31088, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "and effectiveness of our model. The datasets used for eval-", "bbox": {"l": 308.86203, "t": 627.35933, "r": 545.11517, "b": 636.26588, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "uation are the PubTabNet, FinTabNet and TableBank which", "bbox": {"l": 308.86203, "t": 639.31433, "r": 545.11511, "b": 648.22089, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "stem from the scientific, financial and general domains re-", "bbox": {"l": 308.86203, "t": 651.27032, "r": 545.11517, "b": 660.17688, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "spectively.", "bbox": {"l": 308.86203, "t": 663.22533, "r": 350.70493, "b": 672.13189, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer is evaluated on three major publicly available datasets of different nature to prove the generalization and effectiveness of our model. 
The datasets used for evaluation are the PubTabNet, FinTabNet and TableBank which stem from the scientific, financial and general domains respectively."}, {"label": "text", "id": 17, "page_no": 5, "cluster": {"id": 17, "label": "text", "bbox": {"l": 308.2522277832031, "t": 679.5938720703125, "r": 545.2968139648438, "b": 713.7670288085938, "coord_origin": "TOPLEFT"}, "confidence": 0.9830910563468933, "cells": [{"id": 152, "text": "We also share our baseline results on the challenging", "bbox": {"l": 320.81702, "t": 680.33533, "r": 545.11505, "b": 689.24189, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "SynthTabNet dataset.", "bbox": {"l": 308.86203, "t": 692.290329, "r": 396.21411, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "Throughout our experiments, the", "bbox": {"l": 406.40585, "t": 692.290329, "r": 545.11523, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "same parameters stated in Sec. 5.1 are utilized.", "bbox": {"l": 308.86203, "t": 704.246323, "r": 495.93982, "b": 713.152893, "coord_origin": "TOPLEFT"}}]}, "text": "We also share our baseline results on the challenging SynthTabNet dataset. Throughout our experiments, the same parameters stated in Sec. 
5.1 are utilized."}, {"label": "page_footer", "id": 18, "page_no": 5, "cluster": {"id": 18, "label": "page_footer", "bbox": {"l": 294.4747009277344, "t": 733.2755126953125, "r": 300.3784484863281, "b": 743.03989, "coord_origin": "TOPLEFT"}, "confidence": 0.88204026222229, "cells": [{"id": 156, "text": "6", "bbox": {"l": 295.12103, "t": 734.133327, "r": 300.10233, "b": 743.03989, "coord_origin": "TOPLEFT"}}]}, "text": "6"}], "body": [{"label": "text", "id": 0, "page_no": 5, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.35792922973633, "t": 74.26022338867188, "r": 286.5823059082031, "b": 156.22055053710938, "coord_origin": "TOPLEFT"}, "confidence": 0.9868088960647583, "cells": [{"id": 0, "text": "tention encoding is then multiplied to the encoded image to", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 286.36514, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "produce a feature for each table cell. Notice that this is dif-", "bbox": {"l": 50.112, "t": 87.16339000000005, "r": 286.36508, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "ferent than the typical object detection problem where im-", "bbox": {"l": 50.112, "t": 99.11841000000004, "r": 286.36508, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "balances between the number of detections and the amount", "bbox": {"l": 50.112, "t": 111.07343000000003, "r": 286.36508, "b": 119.97997999999984, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "of objects may exist. 
In our case, we know up front that", "bbox": {"l": 50.112, "t": 123.02844000000005, "r": 286.36508, "b": 131.93499999999995, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "the produced detections always match with the table cells", "bbox": {"l": 50.112, "t": 134.98443999999995, "r": 286.36514, "b": 143.89099, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "in number and correspondence.", "bbox": {"l": 50.112, "t": 146.93944999999997, "r": 175.16254, "b": 155.84600999999998, "coord_origin": "TOPLEFT"}}]}, "text": "tention encoding is then multiplied to the encoded image to produce a feature for each table cell. Notice that this is different than the typical object detection problem where imbalances between the number of detections and the amount of objects may exist. In our case, we know up front that the produced detections always match with the table cells in number and correspondence."}, {"label": "text", "id": 1, "page_no": 5, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.385826110839844, "t": 158.92007446289062, "r": 286.5838928222656, "b": 240.56492614746094, "coord_origin": "TOPLEFT"}, "confidence": 0.9860329031944275, "cells": [{"id": 7, "text": "The output features for each table cell are then fed", "bbox": {"l": 62.067001, "t": 159.62445000000002, "r": 286.36496, "b": 168.53101000000004, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "into the feed-forward network (FFN). The FFN consists", "bbox": {"l": 50.112, "t": 171.58043999999995, "r": 286.36511, "b": 180.48699999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "of a Multi-Layer Perceptron (3 layers with ReLU activa-", "bbox": {"l": 50.112, "t": 183.53545999999994, "r": 286.36511, "b": 192.44201999999996, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "tion function) that predicts the normalized coordinates for", "bbox": {"l": 50.112, "t": 195.49048000000005, "r": 286.36511, "b": 204.39702999999997, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "the bounding box of each table cell. 
Finally, the predicted", "bbox": {"l": 50.112, "t": 207.44550000000004, "r": 286.36511, "b": 216.35204999999996, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "bounding boxes are classified based on whether they are", "bbox": {"l": 50.112, "t": 219.40051000000005, "r": 286.36511, "b": 228.30706999999995, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "empty or not using a linear layer.", "bbox": {"l": 50.112, "t": 231.35650999999996, "r": 181.54855, "b": 240.26306, "coord_origin": "TOPLEFT"}}]}, "text": "The output features for each table cell are then fed into the feed-forward network (FFN). The FFN consists of a Multi-Layer Perceptron (3 layers with ReLU activation function) that predicts the normalized coordinates for the bounding box of each table cell. Finally, the predicted bounding boxes are classified based on whether they are empty or not using a linear layer."}, {"label": "text", "id": 2, "page_no": 5, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.27542495727539, "t": 243.0098419189453, "r": 286.86236572265625, "b": 444.3459777832031, "coord_origin": "TOPLEFT"}, "confidence": 0.9873637557029724, "cells": [{"id": 14, "text": "Loss Functions.", "bbox": {"l": 62.067001, "t": 243.92193999999995, "r": 129.21492, "b": 252.87829999999997, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "We formulate a multi-task loss Eq. 2", "bbox": {"l": 134.451, "t": 244.04150000000004, "r": 286.36078, "b": 252.94806000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "to train our network. 
The Cross-Entropy loss (denoted as", "bbox": {"l": 50.112007, "t": 255.99652000000003, "r": 286.36511, "b": 264.90308000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "l$_{s}$", "bbox": {"l": 50.112007, "t": 267.79309, "r": 56.84528, "b": 276.63989000000004, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": ") is used to train the", "bbox": {"l": 57.343006, "t": 267.95250999999996, "r": 135.39996, "b": 276.85907, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Structure Decoder", "bbox": {"l": 137.735, "t": 268.04218000000003, "r": 211.07965, "b": 276.62994000000003, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "which predicts the", "bbox": {"l": 213.63699, "t": 267.95250999999996, "r": 286.36395, "b": 276.85907, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "structure tokens. As for the", "bbox": {"l": 50.112, "t": 279.90747, "r": 158.82388, "b": 288.81406, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "Cell BBox Decoder", "bbox": {"l": 161.31799, "t": 279.99712999999997, "r": 238.79712, "b": 288.58493, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "it is trained", "bbox": {"l": 241.521, "t": 279.90747, "r": 286.36264, "b": 288.81406, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "with a combination of losses denoted as", "bbox": {"l": 50.112, "t": 291.86249, "r": 211.3766, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "l$_{box}$", "bbox": {"l": 214.271, "t": 291.70309, "r": 229.19780000000003, "b": 300.54987, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": ".", "bbox": {"l": 229.696, "t": 291.86249, "r": 232.18665000000001, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "l$_{box}$", "bbox": {"l": 236.49001, "t": 291.70309, "r": 251.41681000000003, "b": 300.54987, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "consists", "bbox": {"l": 254.81099999999998, "t": 291.86249, "r": 286.36255, "b": 300.76904, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "of the generally used", "bbox": {"l": 50.112, "t": 
303.81747, "r": 137.45412, "b": 312.72403, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "l$_{1}$", "bbox": {"l": 141.298, "t": 303.65808, "r": 148.24258, "b": 312.50485, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "loss for object detection and the", "bbox": {"l": 152.58601, "t": 303.81747, "r": 286.36377, "b": 312.72403, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "IoU loss (", "bbox": {"l": 50.112015, "t": 315.77245999999997, "r": 89.683464, "b": 324.67902, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "l$_{iou}$", "bbox": {"l": 89.68602, "t": 315.61307, "r": 104.12046, "b": 324.45984, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": ") to be scale invariant as explained in [25]. In", "bbox": {"l": 104.61802, "t": 315.77245999999997, "r": 286.36572, "b": 324.67902, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "comparison to DETR, we do not use the Hungarian algo-", "bbox": {"l": 50.112019, "t": 327.72845, "r": 286.36511, "b": 336.6350100000001, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "rithm [15] to match the predicted bounding boxes with the", "bbox": {"l": 50.112019, "t": 339.68344, "r": 286.36508, "b": 348.59, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "ground-truth boxes, as we have already achieved a one-to-", "bbox": {"l": 50.112019, "t": 351.63843, "r": 286.36511, "b": 360.54498, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "one match through two steps: 1) Our token input sequence", "bbox": {"l": 50.112019, "t": 363.59341, "r": 286.36508, "b": 372.49996999999996, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "is naturally ordered, therefore the hidden states of the table", "bbox": {"l": 50.112019, "t": 375.5484, "r": 286.36511, "b": 384.45496, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "data cells are also in order when they are provided as in-", "bbox": {"l": 50.112019, "t": 387.50339, "r": 286.36514, "b": 396.40994, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "put to the", "bbox": {"l": 50.112019, "t": 399.45938, 
"r": 88.68721, "b": 408.36594, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "Cell BBox Decoder", "bbox": {"l": 91.646019, "t": 399.54904, "r": 170.0517, "b": 408.13681, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": ", and 2) Our bounding boxes", "bbox": {"l": 170.05103, "t": 399.45938, "r": 286.36438, "b": 408.36594, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "generation mechanism (see Sec.", "bbox": {"l": 50.112022, "t": 411.41437, "r": 181.96703, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "3)", "bbox": {"l": 189.09029, "t": 411.41437, "r": 197.74918, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "ensures a one-to-one", "bbox": {"l": 200.34789, "t": 411.41437, "r": 286.36511, "b": 420.32092, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "mapping between the cell content and its bounding box for", "bbox": {"l": 50.112022, "t": 423.36934999999994, "r": 286.36511, "b": 432.27591, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "all post-processed datasets.", "bbox": {"l": 50.112022, "t": 435.32434, "r": 158.2959, "b": 444.2309, "coord_origin": "TOPLEFT"}}]}, "text": "Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$_{box}$ . l$_{box}$ consists of the generally used l$_{1}$ loss for object detection and the IoU loss ( l$_{iou}$ ) to be scale invariant as explained in [25]. 
In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets."}, {"label": "text", "id": 3, "page_no": 5, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.76386642456055, "t": 446.9576110839844, "r": 286.42083740234375, "b": 469.1785583496094, "coord_origin": "TOPLEFT"}, "confidence": 0.9724196195602417, "cells": [{"id": 49, "text": "The loss used to train the TableFormer can be defined as", "bbox": {"l": 62.067024, "t": 448.01035, "r": 286.36499, "b": 456.9169, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "following:", "bbox": {"l": 50.112022, "t": 459.96533, "r": 91.377113, "b": 468.87189, "coord_origin": "TOPLEFT"}}]}, "text": "The loss used to train the TableFormer can be defined as following:"}, {"label": "formula", "id": 4, "page_no": 5, "cluster": {"id": 4, "label": "formula", "bbox": {"l": 123.5527114868164, "t": 492.1290588378906, "r": 286.36480712890625, "b": 518.0797119140625, "coord_origin": "TOPLEFT"}, "confidence": 0.9423348307609558, "cells": [{"id": 51, "text": "l$_{box}$", "bbox": {"l": 125.71502, "t": 493.28094, "r": 140.64182, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "=", "bbox": {"l": 143.90701, "t": 493.28094, "r": 151.65593, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "\u03bb$_{iou}$l$_{iou}$", "bbox": {"l": 154.42302, "t": 493.28094, "r": 186.62846, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "+", "bbox": {"l": 189.34003, "t": 493.28094, "r": 197.08894, "b": 502.12772, "coord_origin": "TOPLEFT"}}, 
{"id": 55, "text": "\u03bb$_{l}$$_{1}$", "bbox": {"l": 199.30302, "t": 493.28094, "r": 211.64659, "b": 502.12772, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "l", "bbox": {"l": 124.33002, "t": 508.22495, "r": 127.30286, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "=", "bbox": {"l": 130.26602, "t": 508.22495, "r": 138.01494, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "\u03bbl$_{s}$", "bbox": {"l": 140.78203, "t": 508.22495, "r": 153.32629, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "+ (1", "bbox": {"l": 156.03903, "t": 508.22495, "r": 174.85541, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "\u2212", "bbox": {"l": 177.07103, "t": 507.66702, "r": 184.81995, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "\u03bb", "bbox": {"l": 187.03304, "t": 508.22495, "r": 192.84422, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": ")", "bbox": {"l": 192.84503, "t": 508.22495, "r": 196.71948, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "l$_{box}$", "bbox": {"l": 196.71902, "t": 508.22495, "r": 211.64583, "b": 517.07172, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "(1)", "bbox": {"l": 274.74603, "t": 501.01132, "r": 286.36243, "b": 509.91788, "coord_origin": "TOPLEFT"}}]}, "text": "l$_{box}$ = \u03bb$_{iou}$l$_{iou}$ + \u03bb$_{l}$$_{1}$ l = \u03bbl$_{s}$ + (1 \u2212 \u03bb ) l$_{box}$ (1)"}, {"label": "text", "id": 5, "page_no": 5, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.46774673461914, "t": 530.3292236328125, "r": 281.59692, "b": 541.0584106445312, "coord_origin": "TOPLEFT"}, "confidence": 0.9363928437232971, "cells": [{"id": 65, "text": "where", "bbox": {"l": 50.11203, "t": 531.30933, "r": 74.450661, "b": 540.21588, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "\u03bb", "bbox": {"l": 76.941032, "t": 531.14993, "r": 82.75222, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "\u2208", "bbox": {"l": 85.520035, 
"t": 530.5920100000001, "r": 92.162102, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "[0, 1], and", "bbox": {"l": 94.653038, "t": 531.30933, "r": 135.59932, "b": 540.21588, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "\u03bb$_{iou}$, \u03bb$_{l}$$_{1}$", "bbox": {"l": 138.09004, "t": 531.14993, "r": 172.63162, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "\u2208$_{R}$", "bbox": {"l": 175.89705, "t": 530.5920100000001, "r": 192.50104, "b": 539.9967, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "are hyper-parameters.", "bbox": {"l": 194.99205, "t": 531.30933, "r": 281.59692, "b": 540.21588, "coord_origin": "TOPLEFT"}}]}, "text": "where \u03bb \u2208 [0, 1], and \u03bb$_{iou}$, \u03bb$_{l}$$_{1}$ \u2208$_{R}$ are hyper-parameters."}, {"label": "section_header", "id": 6, "page_no": 5, "cluster": {"id": 6, "label": "section_header", "bbox": {"l": 49.37754821777344, "t": 554.845703125, "r": 171.98335, "b": 566.986572265625, "coord_origin": "TOPLEFT"}, "confidence": 0.9554555416107178, "cells": [{"id": 72, "text": "5.", "bbox": {"l": 50.112045, "t": 555.91689, "r": 57.92831799999999, "b": 566.66461, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "Experimental Results", "bbox": {"l": 68.350014, "t": 555.91689, "r": 171.98335, "b": 566.66461, "coord_origin": "TOPLEFT"}}]}, "text": "5. Experimental Results"}, {"label": "section_header", "id": 7, "page_no": 5, "cluster": {"id": 7, "label": "section_header", "bbox": {"l": 49.518253326416016, "t": 575.6320190429688, "r": 179.17800903320312, "b": 586.595947265625, "coord_origin": "TOPLEFT"}, "confidence": 0.9538504481315613, "cells": [{"id": 74, "text": "5.1.", "bbox": {"l": 50.112045, "t": 576.26433, "r": 64.693237, "b": 586.1163799999999, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "Implementation Details", "bbox": {"l": 74.414032, "t": 576.26433, "r": 179.17502, "b": 586.1163799999999, "coord_origin": "TOPLEFT"}}]}, "text": "5.1. 
Implementation Details"}, {"label": "text", "id": 8, "page_no": 5, "cluster": {"id": 8, "label": "text", "bbox": {"l": 49.45502471923828, "t": 594.9777221679688, "r": 286.3757629394531, "b": 641.0610961914062, "coord_origin": "TOPLEFT"}, "confidence": 0.9856163263320923, "cells": [{"id": 76, "text": "TableFormer uses ResNet-18 as the", "bbox": {"l": 62.067047, "t": 595.73433, "r": 202.97806, "b": 604.64088, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "CNN Backbone Net-", "bbox": {"l": 205.38405, "t": 595.82399, "r": 286.36008, "b": 604.41174, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "work", "bbox": {"l": 50.112045, "t": 607.77899, "r": 70.037247, "b": 616.3667399999999, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": ". The input images are resized to 448*448 pixels and", "bbox": {"l": 70.037048, "t": 607.68933, "r": 286.36496, "b": 616.59589, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "the feature map has a dimension of 28*28. Additionally, we", "bbox": {"l": 50.112049, "t": 619.64433, "r": 286.36517, "b": 628.55089, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "enforce the following input constraints:", "bbox": {"l": 50.112049, "t": 631.60033, "r": 207.03294, "b": 640.50688, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer uses ResNet-18 as the CNN Backbone Network . The input images are resized to 448*448 pixels and the feature map has a dimension of 28*28. 
Additionally, we enforce the following input constraints:"}, {"label": "formula", "id": 9, "page_no": 5, "cluster": {"id": 9, "label": "formula", "bbox": {"l": 91.31732940673828, "t": 653.6121215820312, "r": 286.36246, "b": 678.39588, "coord_origin": "TOPLEFT"}, "confidence": 0.8441831469535828, "cells": [{"id": 82, "text": "Image width and height", "bbox": {"l": 91.661049, "t": 654.54532, "r": 186.01683, "b": 663.45187, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "\u2264", "bbox": {"l": 188.50705, "t": 653.828, "r": 196.25597, "b": 663.2327, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "1024 pixels", "bbox": {"l": 198.74605, "t": 654.54532, "r": 244.81310999999997, "b": 663.45187, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "Structural tags length", "bbox": {"l": 101.01604, "t": 669.48932, "r": 186.24606, "b": 678.39588, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "\u2264", "bbox": {"l": 188.73605, "t": 668.77201, "r": 196.48497, "b": 678.1767, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "512 tokens.", "bbox": {"l": 198.97505, "t": 669.48932, "r": 244.81296999999998, "b": 678.39588, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "(2)", "bbox": {"l": 274.74606, "t": 662.11731, "r": 286.36246, "b": 671.02388, "coord_origin": "TOPLEFT"}}]}, "text": "Image width and height \u2264 1024 pixels Structural tags length \u2264 512 tokens. 
(2)"}, {"label": "text", "id": 10, "page_no": 5, "cluster": {"id": 10, "label": "text", "bbox": {"l": 49.31819534301758, "t": 691.8294067382812, "r": 286.36514, "b": 713.1538696289062, "coord_origin": "TOPLEFT"}, "confidence": 0.9724978804588318, "cells": [{"id": 89, "text": "Although input constraints are used also by other methods,", "bbox": {"l": 50.112061, "t": 692.290314, "r": 286.36514, "b": 701.196877, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "such as EDD, ours are less restrictive due to the improved", "bbox": {"l": 50.112061, "t": 704.245316, "r": 286.36514, "b": 713.151878, "coord_origin": "TOPLEFT"}}]}, "text": "Although input constraints are used also by other methods, such as EDD, ours are less restrictive due to the improved"}, {"label": "text", "id": 11, "page_no": 5, "cluster": {"id": 11, "label": "text", "bbox": {"l": 307.9281921386719, "t": 74.27256774902344, "r": 545.5595703125, "b": 108.48387145996094, "coord_origin": "TOPLEFT"}, "confidence": 0.9799237847328186, "cells": [{"id": 91, "text": "runtime performance and lower memory footprint of Table-", "bbox": {"l": 308.86206, "t": 75.20830999999998, "r": 545.11523, "b": 84.11487, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Former.", "bbox": {"l": 308.86206, "t": 87.16332999999997, "r": 339.98523, "b": 96.06988999999999, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "This allows to utilize input samples with longer", "bbox": {"l": 346.88931, "t": 87.16332999999997, "r": 545.11523, "b": 96.06988999999999, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "sequences and images with larger dimensions.", "bbox": {"l": 308.86206, "t": 99.11835000000008, "r": 492.96097, "b": 108.0249, "coord_origin": "TOPLEFT"}}]}, "text": "runtime performance and lower memory footprint of TableFormer. 
This allows to utilize input samples with longer sequences and images with larger dimensions."}, {"label": "text", "id": 12, "page_no": 5, "cluster": {"id": 12, "label": "text", "bbox": {"l": 307.79034423828125, "t": 114.75221252441406, "r": 545.6121215820312, "b": 328.51171875, "coord_origin": "TOPLEFT"}, "confidence": 0.9880929589271545, "cells": [{"id": 95, "text": "The Transformer Encoder consists of two \u201cTransformer", "bbox": {"l": 320.81705, "t": 116.22937000000002, "r": 545.11499, "b": 125.13593000000003, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Encoder Layers\u201d, with an input feature size of 512, feed", "bbox": {"l": 308.86206, "t": 128.18439, "r": 545.11517, "b": 137.09094000000005, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "forward network of 1024, and 4 attention heads. As for the", "bbox": {"l": 308.86206, "t": 140.13940000000002, "r": 545.11505, "b": 149.04596000000004, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Transformer Decoder it is composed of four \u201cTransformer", "bbox": {"l": 308.86206, "t": 152.09442, "r": 545.11511, "b": 161.00098000000003, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Decoder Layers\u201d with similar input and output dimensions", "bbox": {"l": 308.86206, "t": 164.04944, "r": 545.11517, "b": 172.95599000000004, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "as the \u201cTransformer Encoder Layers\u201d.", "bbox": {"l": 308.86206, "t": 176.00543000000005, "r": 467.21756000000005, "b": 184.91198999999995, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Even though our", "bbox": {"l": 475.43671, "t": 176.00543000000005, "r": 545.11511, "b": 184.91198999999995, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "model uses fewer layers and heads than the default imple-", "bbox": {"l": 308.86206, "t": 187.96045000000004, "r": 545.11511, "b": 196.86699999999996, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "mentation parameters, our extensive experimentation has", "bbox": {"l": 308.86206, 
"t": 199.91547000000003, "r": 545.11511, "b": 208.82201999999995, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "proved this setup to be more suitable for table images. We", "bbox": {"l": 308.86206, "t": 211.87048000000004, "r": 545.11517, "b": 220.77704000000006, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "attribute this finding to the inherent design of table im-", "bbox": {"l": 308.86206, "t": 223.82550000000003, "r": 545.11511, "b": 232.73206000000005, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "ages, which contain mostly lines and text, unlike the more", "bbox": {"l": 308.86206, "t": 235.78052000000002, "r": 545.11511, "b": 244.68706999999995, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "elaborate content present in other scopes (e.g. the COCO", "bbox": {"l": 308.86206, "t": 247.73650999999995, "r": 545.11523, "b": 256.64306999999997, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "dataset).", "bbox": {"l": 308.86206, "t": 259.69152999999994, "r": 342.3364, "b": 268.59808, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Moreover, we have added ResNet blocks to the", "bbox": {"l": 348.95157, "t": 259.69152999999994, "r": 545.11517, "b": 268.59808, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "inputs of the Structure Decoder and Cell BBox Decoder.", "bbox": {"l": 308.86206, "t": 271.64655000000005, "r": 545.11517, "b": 280.55310000000003, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "This prevents a decoder having a stronger influence over the", "bbox": {"l": 308.86206, "t": 283.6015300000001, "r": 545.1153, "b": 292.50809, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "learned weights which would damage the other prediction", "bbox": {"l": 308.86206, "t": 295.55652, "r": 545.11511, "b": 304.46307, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "task (structure vs bounding boxes), but learn task specific", "bbox": {"l": 308.86206, "t": 307.51151, "r": 545.11511, "b": 316.41806, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": 
"weights instead. Lastly our dropout layers are set to 0.5.", "bbox": {"l": 308.86206, "t": 319.4674999999999, "r": 532.48267, "b": 328.37405, "coord_origin": "TOPLEFT"}}]}, "text": "The Transformer Encoder consists of two \u201cTransformer Encoder Layers\u201d, with an input feature size of 512, feed forward network of 1024, and 4 attention heads. As for the Transformer Decoder it is composed of four \u201cTransformer Decoder Layers\u201d with similar input and output dimensions as the \u201cTransformer Encoder Layers\u201d. Even though our model uses fewer layers and heads than the default implementation parameters, our extensive experimentation has proved this setup to be more suitable for table images. We attribute this finding to the inherent design of table images, which contain mostly lines and text, unlike the more elaborate content present in other scopes (e.g. the COCO dataset). Moreover, we have added ResNet blocks to the inputs of the Structure Decoder and Cell BBox Decoder. This prevents a decoder having a stronger influence over the learned weights which would damage the other prediction task (structure vs bounding boxes), but learn task specific weights instead. 
Lastly our dropout layers are set to 0.5."}, {"label": "text", "id": 13, "page_no": 5, "cluster": {"id": 13, "label": "text", "bbox": {"l": 307.6786193847656, "t": 335.59222412109375, "r": 545.5258178710938, "b": 429.6965637207031, "coord_origin": "TOPLEFT"}, "confidence": 0.9877589344978333, "cells": [{"id": 115, "text": "For training, TableFormer is trained with 3 Adam opti-", "bbox": {"l": 320.81705, "t": 336.57751, "r": 545.11499, "b": 345.48407000000003, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "mizers, each one for the", "bbox": {"l": 308.86206, "t": 348.5325000000001, "r": 403.7359, "b": 357.43906, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "CNN Backbone Network", "bbox": {"l": 406.07605, "t": 348.62216, "r": 503.54016, "b": 357.20993, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": ",", "bbox": {"l": 503.53906, "t": 348.5325000000001, "r": 506.02972, "b": 357.43906, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "Structure", "bbox": {"l": 508.40004999999996, "t": 348.62216, "r": 545.11224, "b": 357.20993, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "Decoder", "bbox": {"l": 308.86206, "t": 360.57715, "r": 343.1633, "b": 369.16492000000005, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": ", and", "bbox": {"l": 343.16306, "t": 360.48749, "r": 362.2016, "b": 369.39404, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Cell BBox Decoder", "bbox": {"l": 364.28604, "t": 360.57715, "r": 440.93829, "b": 369.16492000000005, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": ". 
Taking the PubTabNet as", "bbox": {"l": 440.93903, "t": 360.48749, "r": 545.10797, "b": 369.39404, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "an example for our parameter set up, the initializing learn-", "bbox": {"l": 308.86203, "t": 372.44247, "r": 545.11511, "b": 381.34903, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "ing rate is 0.001 for 12 epochs with a batch size of 24, and", "bbox": {"l": 308.86203, "t": 384.3984699999999, "r": 545.11517, "b": 393.30502, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "\u03bb", "bbox": {"l": 308.86203, "t": 396.19406000000004, "r": 314.67322, "b": 405.04083, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "set to 0.5.", "bbox": {"l": 318.65802, "t": 396.35345, "r": 360.39139, "b": 405.2600100000001, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "Afterwards, we reduce the learning rate to", "bbox": {"l": 367.96295, "t": 396.35345, "r": 545.10803, "b": 405.2600100000001, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "0.0001, the batch size to 18 and train for 12 more epochs or", "bbox": {"l": 308.86203, "t": 408.30844, "r": 545.11511, "b": 417.215, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "convergence.", "bbox": {"l": 308.86203, "t": 420.26343, "r": 360.9664, "b": 429.16998, "coord_origin": "TOPLEFT"}}]}, "text": "For training, TableFormer is trained with 3 Adam optimizers, each one for the CNN Backbone Network , Structure Decoder , and Cell BBox Decoder . Taking the PubTabNet as an example for our parameter set up, the initializing learning rate is 0.001 for 12 epochs with a batch size of 24, and \u03bb set to 0.5. 
Afterwards, we reduce the learning rate to 0.0001, the batch size to 18 and train for 12 more epochs or convergence."}, {"label": "text", "id": 14, "page_no": 5, "cluster": {"id": 14, "label": "text", "bbox": {"l": 307.8315124511719, "t": 436.140380859375, "r": 545.4056396484375, "b": 554.6504516601562, "coord_origin": "TOPLEFT"}, "confidence": 0.9884491562843323, "cells": [{"id": 131, "text": "TableFormer is implemented with PyTorch and Torchvi-", "bbox": {"l": 320.81702, "t": 437.37441999999993, "r": 545.11499, "b": 446.28098, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "sion libraries [22].", "bbox": {"l": 308.86203, "t": 449.32941, "r": 384.62759, "b": 458.23596, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "To speed up the inference, the image", "bbox": {"l": 391.37228, "t": 449.32941, "r": 545.11511, "b": 458.23596, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "undergoes a single forward pass through the", "bbox": {"l": 308.86203, "t": 461.28439, "r": 494.00693000000007, "b": 470.19095, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "CNN Back-", "bbox": {"l": 498.07803, "t": 461.37405, "r": 545.11145, "b": 469.96182, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "bone Network", "bbox": {"l": 308.86203, "t": 473.32904, "r": 364.44336, "b": 481.91681, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "and transformer encoder. This eliminates the", "bbox": {"l": 367.06104, "t": 473.23938, "r": 545.11267, "b": 482.14594, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "overhead of generating the same features for each decoding", "bbox": {"l": 308.86203, "t": 485.19437, "r": 545.11511, "b": 494.10092, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "step. Similarly, we employ a \u2019caching\u2019 technique to preform", "bbox": {"l": 308.86203, "t": 497.14935, "r": 545.11523, "b": 506.05591, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "faster autoregressive decoding. 
This is achieved by storing", "bbox": {"l": 308.86203, "t": 509.10535, "r": 545.11511, "b": 518.0119, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "the features of decoded tokens so we can reuse them for", "bbox": {"l": 308.86203, "t": 521.06033, "r": 545.11517, "b": 529.9668899999999, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "each time step. Therefore, we only compute the attention", "bbox": {"l": 308.86203, "t": 533.01532, "r": 545.11517, "b": 541.9218900000001, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "for each new tag.", "bbox": {"l": 308.86203, "t": 544.97034, "r": 377.21548, "b": 553.87689, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer is implemented with PyTorch and Torchvision libraries [22]. To speed up the inference, the image undergoes a single forward pass through the CNN Backbone Network and transformer encoder. This eliminates the overhead of generating the same features for each decoding step. Similarly, we employ a \u2019caching\u2019 technique to preform faster autoregressive decoding. This is achieved by storing the features of decoded tokens so we can reuse them for each time step. Therefore, we only compute the attention for each new tag."}, {"label": "section_header", "id": 15, "page_no": 5, "cluster": {"id": 15, "label": "section_header", "bbox": {"l": 308.1548767089844, "t": 578.8842163085938, "r": 397.44281, "b": 589.40637, "coord_origin": "TOPLEFT"}, "confidence": 0.9450808763504028, "cells": [{"id": 144, "text": "5.2.", "bbox": {"l": 308.86203, "t": 579.55432, "r": 323.9046, "b": 589.40637, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "Generalization", "bbox": {"l": 333.93301, "t": 579.55432, "r": 397.44281, "b": 589.40637, "coord_origin": "TOPLEFT"}}]}, "text": "5.2. 
Generalization"}, {"label": "text", "id": 16, "page_no": 5, "cluster": {"id": 16, "label": "text", "bbox": {"l": 308.0038757324219, "t": 602.3896484375, "r": 545.11517, "b": 672.86962890625, "coord_origin": "TOPLEFT"}, "confidence": 0.9880596995353699, "cells": [{"id": 146, "text": "TableFormer is evaluated on three major publicly avail-", "bbox": {"l": 320.81702, "t": 603.44933, "r": 545.11493, "b": 612.3558800000001, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "able datasets of different nature to prove the generalization", "bbox": {"l": 308.86203, "t": 615.40433, "r": 545.11511, "b": 624.31088, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "and effectiveness of our model. The datasets used for eval-", "bbox": {"l": 308.86203, "t": 627.35933, "r": 545.11517, "b": 636.26588, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "uation are the PubTabNet, FinTabNet and TableBank which", "bbox": {"l": 308.86203, "t": 639.31433, "r": 545.11511, "b": 648.22089, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "stem from the scientific, financial and general domains re-", "bbox": {"l": 308.86203, "t": 651.27032, "r": 545.11517, "b": 660.17688, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "spectively.", "bbox": {"l": 308.86203, "t": 663.22533, "r": 350.70493, "b": 672.13189, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer is evaluated on three major publicly available datasets of different nature to prove the generalization and effectiveness of our model. 
The datasets used for evaluation are the PubTabNet, FinTabNet and TableBank which stem from the scientific, financial and general domains respectively."}, {"label": "text", "id": 17, "page_no": 5, "cluster": {"id": 17, "label": "text", "bbox": {"l": 308.2522277832031, "t": 679.5938720703125, "r": 545.2968139648438, "b": 713.7670288085938, "coord_origin": "TOPLEFT"}, "confidence": 0.9830910563468933, "cells": [{"id": 152, "text": "We also share our baseline results on the challenging", "bbox": {"l": 320.81702, "t": 680.33533, "r": 545.11505, "b": 689.24189, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "SynthTabNet dataset.", "bbox": {"l": 308.86203, "t": 692.290329, "r": 396.21411, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "Throughout our experiments, the", "bbox": {"l": 406.40585, "t": 692.290329, "r": 545.11523, "b": 701.196892, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "same parameters stated in Sec. 5.1 are utilized.", "bbox": {"l": 308.86203, "t": 704.246323, "r": 495.93982, "b": 713.152893, "coord_origin": "TOPLEFT"}}]}, "text": "We also share our baseline results on the challenging SynthTabNet dataset. Throughout our experiments, the same parameters stated in Sec. 
5.1 are utilized."}], "headers": [{"label": "page_footer", "id": 18, "page_no": 5, "cluster": {"id": 18, "label": "page_footer", "bbox": {"l": 294.4747009277344, "t": 733.2755126953125, "r": 300.3784484863281, "b": 743.03989, "coord_origin": "TOPLEFT"}, "confidence": 0.88204026222229, "cells": [{"id": 156, "text": "6", "bbox": {"l": 295.12103, "t": 734.133327, "r": 300.10233, "b": 743.03989, "coord_origin": "TOPLEFT"}}]}, "text": "6"}]}}, {"page_no": 6, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "5.3.", "bbox": {"l": 50.112, "t": 74.40137000000016, "r": 63.704811, "b": 84.25342, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Datasets and Metrics", "bbox": {"l": 72.766685, "t": 74.40137000000016, "r": 167.89825, "b": 84.25342, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "The Tree-Edit-Distance-Based Similarity (TEDS) met-", "bbox": {"l": 62.067001, "t": 93.35039999999992, "r": 286.36499, "b": 102.25696000000016, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "ric was introduced in [37]. It represents the prediction, and", "bbox": {"l": 50.112, "t": 105.30542000000003, "r": 286.36511, "b": 114.21198000000015, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "ground-truth as a tree structure of HTML tags. 
This simi-", "bbox": {"l": 50.112, "t": 117.26044000000002, "r": 286.36505, "b": 126.16699000000006, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "larity is calculated as:", "bbox": {"l": 50.112, "t": 129.21642999999995, "r": 136.71687, "b": 138.12298999999996, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "TEDS (", "bbox": {"l": 86.218994, "t": 157.05798000000004, "r": 118.8784, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "T$_{a}$, T$_{b}$", "bbox": {"l": 118.87499, "t": 157.05798000000004, "r": 143.26962, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": ") = 1", "bbox": {"l": 143.76799, "t": 157.05798000000004, "r": 165.9019, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "\u2212", "bbox": {"l": 168.12099, "t": 156.50012000000004, "r": 175.8699, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "EditDist (", "bbox": {"l": 179.27899, "t": 150.31799, "r": 221.95677, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "T$_{a}$, T$_{b}$", "bbox": {"l": 221.95200000000003, "t": 150.31799, "r": 246.34663, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": ")", "bbox": {"l": 246.84499999999997, "t": 150.31799, "r": 250.71945, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "max (", "bbox": {"l": 182.21201, "t": 163.89197000000001, "r": 206.29161, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "|", "bbox": {"l": 206.289, "t": 163.33411, "r": 209.05661, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "T$_{a}$", "bbox": {"l": 209.056, "t": 163.89197000000001, "r": 219.19968, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "|", "bbox": {"l": 219.69700999999998, "t": 163.33411, "r": 222.46461000000002, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": ",", "bbox": {"l": 224.125, "t": 163.89197000000001, "r": 
226.89261, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "|", "bbox": {"l": 228.55299000000002, "t": 163.33411, "r": 231.3206, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "T$_{b}$", "bbox": {"l": 231.31999, "t": 163.89197000000001, "r": 240.64563, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "|", "bbox": {"l": 241.144, "t": 163.33411, "r": 243.91161, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": ")", "bbox": {"l": 243.911, "t": 163.89197000000001, "r": 247.78545, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "(3)", "bbox": {"l": 274.746, "t": 157.21740999999997, "r": 286.3624, "b": 166.12396, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "where", "bbox": {"l": 62.067001, "t": 181.16241000000002, "r": 86.405632, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "T$_{a}$", "bbox": {"l": 88.581001, "t": 181.00298999999995, "r": 98.724663, "b": 189.84978999999998, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "and", "bbox": {"l": 101.399, "t": 181.16241000000002, "r": 115.785, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "T$_{b}$", "bbox": {"l": 117.961, "t": 181.00298999999995, "r": 127.28664, "b": 189.84978999999998, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "represent tables in tree structure HTML", "bbox": {"l": 129.95999, "t": 181.16241000000002, "r": 286.36285, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "format. 
EditDist denotes the tree-edit distance, and", "bbox": {"l": 50.111992, "t": 193.11743, "r": 252.78116000000003, "b": 202.02399000000003, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "|", "bbox": {"l": 255.18201, "t": 192.40015000000005, "r": 257.94962, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "T", "bbox": {"l": 257.94901, "t": 192.95800999999994, "r": 263.77115, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "|", "bbox": {"l": 265.155, "t": 192.40015000000005, "r": 267.92261, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "rep-", "bbox": {"l": 270.32199, "t": 193.11743, "r": 286.36179, "b": 202.02399000000003, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "resents the number of nodes in", "bbox": {"l": 50.111984, "t": 205.07245, "r": 172.13388, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "T", "bbox": {"l": 174.62399, "t": 204.91301999999996, "r": 180.44614, "b": 213.75982999999997, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": ".", "bbox": {"l": 181.82899, "t": 205.07245, "r": 184.31964, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "5.4.", "bbox": {"l": 50.112, "t": 224.81946000000005, "r": 64.551605, "b": 234.67151, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Quantitative Analysis", "bbox": {"l": 74.178009, "t": 224.81946000000005, "r": 170.45169, "b": 234.67151, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "Structure.", "bbox": {"l": 62.067001, "t": 243.6499, "r": 105.32461, "b": 252.60626000000002, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "As shown in Tab.", "bbox": {"l": 112.12600000000002, "t": 243.76946999999996, "r": 184.68361, "b": 252.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "2, TableFormer outper-", "bbox": {"l": 191.4781, "t": 243.76946999999996, "r": 286.36188, "b": 252.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "forms all SOTA methods across 
different datasets by a large", "bbox": {"l": 50.112, "t": 255.72448999999995, "r": 286.36508, "b": 264.63104, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "margin for predicting the table structure from an image.", "bbox": {"l": 50.112, "t": 267.67949999999996, "r": 286.36508, "b": 276.58606, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "All the more, our model outperforms pre-trained methods.", "bbox": {"l": 50.112, "t": 279.63446, "r": 286.36508, "b": 288.54105, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "During the evaluation we do not apply any table filtering.", "bbox": {"l": 50.112, "t": 291.59048, "r": 286.36514, "b": 300.49704, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "We also provide our baseline results on the SynthTabNet", "bbox": {"l": 50.112, "t": 303.54547, "r": 286.36508, "b": 312.45203000000004, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "dataset. It has been observed that large tables (e.g. tables", "bbox": {"l": 50.112, "t": 315.50046, "r": 286.36505, "b": 324.40700999999996, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "that occupy half of the page or more) yield poor predictions.", "bbox": {"l": 50.112, "t": 327.45544, "r": 286.36508, "b": 336.362, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "We attribute this issue to the image resizing during the pre-", "bbox": {"l": 50.112, "t": 339.41043, "r": 286.36508, "b": 348.31699000000003, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "processing step, that produces downsampled images with", "bbox": {"l": 50.112, "t": 351.36542, "r": 286.36505, "b": 360.27197, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "indistinguishable features. 
This problem can be addressed", "bbox": {"l": 50.112, "t": 363.32141, "r": 286.36508, "b": 372.2279700000001, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "by treating such big tables with a separate model which ac-", "bbox": {"l": 50.112, "t": 375.2764, "r": 286.36511, "b": 384.18295000000006, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "cepts a large input image size.", "bbox": {"l": 50.112, "t": 387.23138, "r": 170.01187, "b": 396.13794, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "Model", "bbox": {"l": 78.843002, "t": 420.69037, "r": 104.85535, "b": 429.59692, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "TEDS", "bbox": {"l": 211.2, "t": 414.71237, "r": 236.10649, "b": 423.61893, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "Dataset", "bbox": {"l": 129.338, "t": 426.66736, "r": 159.21584, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "Simple", "bbox": {"l": 171.17096, "t": 426.66736, "r": 199.40497, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Complex", "bbox": {"l": 211.36009, "t": 426.66736, "r": 247.74349999999998, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "All", "bbox": {"l": 264.54044, "t": 426.66736, "r": 277.27264, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "EDD", "bbox": {"l": 81.612, "t": 443.62436, "r": 102.08514, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "PTN", "bbox": {"l": 134.87206, "t": 443.62436, "r": 153.69141, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "91.1", "bbox": {"l": 176.56554, "t": 443.62436, "r": 194.00009, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "88.7", "bbox": {"l": 220.82938000000001, "t": 443.62436, "r": 238.26393, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "89.9", "bbox": {"l": 262.18414, "t": 443.62436, "r": 279.61868, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "GTE", "bbox": {"l": 
82.165001, "t": 455.58035, "r": 101.5323, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PTN", "bbox": {"l": 134.86716, "t": 455.58035, "r": 153.68651, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "-", "bbox": {"l": 183.62411, "t": 455.58035, "r": 186.94167, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "-", "bbox": {"l": 227.88795000000002, "t": 455.58035, "r": 231.20551, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "93.01", "bbox": {"l": 259.69855, "t": 455.58035, "r": 282.11441, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 468.13336, "r": 117.38329000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PTN", "bbox": {"l": 134.86766, "t": 468.13336, "r": 153.68701, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "98.5", "bbox": {"l": 176.57111, "t": 468.13336, "r": 194.00566, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "95.0", "bbox": {"l": 220.83495, "t": 468.13336, "r": 238.26950000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "96.75", "bbox": {"l": 259.698, "t": 468.01379, "r": 282.11386, "b": 476.97018, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "EDD", "bbox": {"l": 81.612, "t": 483.32635, "r": 102.08514, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "FTN", "bbox": {"l": 134.87206, "t": 483.32635, "r": 153.69141, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "88.4", "bbox": {"l": 176.56554, "t": 483.32635, "r": 194.00009, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "92.08", "bbox": {"l": 218.33870999999996, "t": 483.32635, "r": 240.75455999999997, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "90.6", "bbox": {"l": 262.18411, "t": 483.32635, "r": 279.61865, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "GTE", "bbox": {"l": 82.165001, "t": 495.28134, 
"r": 101.5323, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "FTN", "bbox": {"l": 134.86716, "t": 495.28134, "r": 153.68651, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "-", "bbox": {"l": 183.62411, "t": 495.28134, "r": 186.94167, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "-", "bbox": {"l": 227.88795000000002, "t": 495.28134, "r": 231.20551, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "87.14", "bbox": {"l": 259.69855, "t": 495.28134, "r": 282.11441, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "GTE (FT)", "bbox": {"l": 71.789001, "t": 507.23633, "r": 111.90838999999998, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "FTN", "bbox": {"l": 134.86221, "t": 507.23633, "r": 153.68156, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "-", "bbox": {"l": 183.62914, "t": 507.23633, "r": 186.94669, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "-", "bbox": {"l": 227.89297, "t": 507.23633, "r": 231.21053000000003, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "91.02", "bbox": {"l": 259.6936, "t": 507.23633, "r": 282.10947, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 519.1913099999999, "r": 117.38329000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "FTN", "bbox": {"l": 134.86766, "t": 519.1913099999999, "r": 153.68701, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "97.5", "bbox": {"l": 176.57111, "t": 519.1913099999999, "r": 194.00566, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "96.0", "bbox": {"l": 220.83495, "t": 519.1913099999999, "r": 238.26950000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "96.8", "bbox": {"l": 262.189, "t": 519.0717500000001, "r": 279.62354, "b": 528.02814, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "EDD", 
"bbox": {"l": 81.612, "t": 536.49837, "r": 102.08514, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "TB", "bbox": {"l": 137.91064, "t": 536.49837, "r": 150.64285, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "86.0", "bbox": {"l": 176.56554, "t": 536.49837, "r": 194.00009, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "-", "bbox": {"l": 227.89285, "t": 536.49837, "r": 231.21040000000002, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "86.0", "bbox": {"l": 262.18411, "t": 536.49837, "r": 279.61865, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 548.45436, "r": 117.38329000000002, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "TB", "bbox": {"l": 137.90625, "t": 548.45436, "r": 150.63846, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "89.6", "bbox": {"l": 176.57111, "t": 548.45436, "r": 194.00566, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "-", "bbox": {"l": 227.88845999999998, "t": 548.45436, "r": 231.20601, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "89.6", "bbox": {"l": 262.189, "t": 548.3348100000001, "r": 279.62354, "b": 557.2911799999999, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 568.00237, "r": 117.38329000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "STN", "bbox": {"l": 134.86766, "t": 568.00237, "r": 153.68701, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "96.9", "bbox": {"l": 176.57111, "t": 568.00237, "r": 194.00566, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "95.7", "bbox": {"l": 220.83495, "t": 568.00237, "r": 238.26950000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "96.7", "bbox": {"l": 262.1897, "t": 568.00237, "r": 279.62424, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": 
"Table 2: Structure results on PubTabNet (PTN), FinTabNet", "bbox": {"l": 50.112, "t": 592.43336, "r": 286.36511, "b": 601.33992, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "(FTN), TableBank (TB) and SynthTabNet (STN).", "bbox": {"l": 50.112, "t": 604.38837, "r": 247.46114, "b": 613.29492, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "FT: Model was trained on PubTabNet then finetuned.", "bbox": {"l": 50.112, "t": 616.34337, "r": 261.78732, "b": 625.24992, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "Cell Detection.", "bbox": {"l": 62.067001, "t": 644.3498099999999, "r": 124.72179, "b": 653.30618, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "Like any object detector, our", "bbox": {"l": 128.20401, "t": 644.46936, "r": 242.9333, "b": 653.37592, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Cell BBox", "bbox": {"l": 245.55401999999998, "t": 644.55902, "r": 286.36084, "b": 653.1467700000001, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "Detector", "bbox": {"l": 50.112015, "t": 656.51402, "r": 84.971146, "b": 665.10178, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "provides bounding boxes that can be improved", "bbox": {"l": 89.515015, "t": 656.42436, "r": 286.366, "b": 665.33092, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "with post-processing during inference. We make use of the", "bbox": {"l": 50.112015, "t": 668.37936, "r": 286.36511, "b": 677.28593, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "grid-like structure of tables to refine the predictions. A de-", "bbox": {"l": 50.112015, "t": 680.33536, "r": 286.36505, "b": 689.24193, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "tailed explanation on the post-processing is available in the", "bbox": {"l": 50.112015, "t": 692.290359, "r": 286.36511, "b": 701.19693, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "supplementary material. As shown in Tab. 
3, we evaluate", "bbox": {"l": 50.112015, "t": 704.245361, "r": 286.36508, "b": 713.151932, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "our", "bbox": {"l": 308.862, "t": 75.20836999999995, "r": 322.14215, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Cell BBox Decoder", "bbox": {"l": 325.45401, "t": 75.29803000000004, "r": 404.56702, "b": 83.88580000000002, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "accuracy for cells with a class la-", "bbox": {"l": 408.104, "t": 75.20836999999995, "r": 545.10968, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "bel of \u2018content\u2019 only using the PASCAL VOC mAP metric", "bbox": {"l": 308.862, "t": 87.16339000000005, "r": 545.11511, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "for pre-processing and post-processing.", "bbox": {"l": 308.862, "t": 99.11841000000004, "r": 470.22626, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Note that we do", "bbox": {"l": 477.52884, "t": 99.11841000000004, "r": 545.11511, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "not have post-processing results for SynthTabNet as images", "bbox": {"l": 308.862, "t": 111.07343000000003, "r": 545.11517, "b": 119.97997999999984, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "are only provided. 
To compare the performance of our pro-", "bbox": {"l": 308.862, "t": 123.02844000000005, "r": 545.11511, "b": 131.93499999999995, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "posed approach, we\u2019ve integrated TableFormer\u2019s", "bbox": {"l": 308.862, "t": 134.98443999999995, "r": 502.01691000000005, "b": 143.89099, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "Cell BBox", "bbox": {"l": 504.47299, "t": 135.07410000000004, "r": 545.11041, "b": 143.66187000000002, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "Decoder", "bbox": {"l": 308.862, "t": 147.02910999999995, "r": 343.16324, "b": 155.61688000000004, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "into EDD architecture. As mentioned previously,", "bbox": {"l": 346.371, "t": 146.93944999999997, "r": 545.11493, "b": 155.84600999999998, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "the Structure Decoder provides the", "bbox": {"l": 308.862, "t": 158.89446999999996, "r": 446.15652, "b": 167.80102999999997, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "Cell BBox Decoder", "bbox": {"l": 448.28998000000007, "t": 158.98413000000005, "r": 525.04181, "b": 167.57190000000003, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "with", "bbox": {"l": 527.39899, "t": 158.89446999999996, "r": 545.11249, "b": 167.80102999999997, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "the features needed to predict the bounding box predictions.", "bbox": {"l": 308.862, "t": 170.84948999999995, "r": 545.11511, "b": 179.75603999999998, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "Therefore, the accuracy of the", "bbox": {"l": 308.862, "t": 182.80449999999996, "r": 432.86642000000006, "b": 191.71105999999997, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "Structure Decoder", "bbox": {"l": 436.39001, "t": 182.89417000000003, "r": 510.93021, "b": 191.48193000000003, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "directly", "bbox": {"l": 514.677, "t": 182.80449999999996, "r": 545.11273, "b": 
191.71105999999997, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "influences the accuracy of the", "bbox": {"l": 308.862, "t": 194.75951999999995, "r": 431.17285, "b": 203.66607999999997, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "Cell BBox Decoder", "bbox": {"l": 434.6790199999999, "t": 194.84918000000005, "r": 514.18054, "b": 203.43695000000002, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": ". If the", "bbox": {"l": 514.17603, "t": 194.75951999999995, "r": 545.10992, "b": 203.66607999999997, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "Structure Decoder", "bbox": {"l": 308.86203, "t": 206.80517999999995, "r": 382.35614, "b": 215.39293999999995, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "predicts an extra column, this will result", "bbox": {"l": 385.07501, "t": 206.71551999999997, "r": 545.11426, "b": 215.62207, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "in an extra column of predicted bounding boxes.", "bbox": {"l": 308.862, "t": 218.67052999999999, "r": 501.6981799999999, "b": 227.57709, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "Model", "bbox": {"l": 339.323, "t": 253.66436999999996, "r": 365.33536, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "Dataset", "bbox": {"l": 401.04132, "t": 253.66436999999996, "r": 430.91916, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "mAP", "bbox": {"l": 454.10214, "t": 253.66436999999996, "r": 474.58523999999994, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "mAP (PP)", "bbox": {"l": 486.54034, "t": 253.66436999999996, "r": 527.2276, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "EDD+BBox", "bbox": {"l": 327.65601, "t": 270.62134000000003, "r": 377.00076, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "PubTabNet", "bbox": {"l": 393.69809, "t": 270.62134000000003, "r": 438.28073, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "79.2", "bbox": {"l": 455.63559, "t": 
270.62134000000003, "r": 473.07013, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "82.7", "bbox": {"l": 498.16592, "t": 270.62134000000003, "r": 515.60046, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 282.57631999999995, "r": 377.86331, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "PubTabNet", "bbox": {"l": 393.69388, "t": 282.57631999999995, "r": 438.27652, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "82.1", "bbox": {"l": 455.63101, "t": 282.45676, "r": 473.06555000000003, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "86.8", "bbox": {"l": 498.1713, "t": 282.45676, "r": 515.60583, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 294.53131, "r": 377.86331, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "SynthTabNet", "bbox": {"l": 389.81842, "t": 294.53131, "r": 442.15194999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "87.7", "bbox": {"l": 455.63135, "t": 294.53131, "r": 473.06589, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "-", "bbox": {"l": 505.22515999999996, "t": 294.53131, "r": 508.54268999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "Table 3:", "bbox": {"l": 308.862, "t": 316.44931, "r": 341.49951, "b": 325.35587, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "Cell Bounding Box detection results on PubTab-", "bbox": {"l": 348.60284, "t": 316.44931, "r": 545.11517, "b": 325.35587, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "Net, and FinTabNet. 
PP: Post-processing.", "bbox": {"l": 308.862, "t": 328.4043, "r": 474.97845, "b": 337.3108500000001, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "Cell Content.", "bbox": {"l": 320.81699, "t": 367.6797199999999, "r": 378.94876, "b": 376.63611, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "In this section, we evaluate the entire", "bbox": {"l": 387.07898, "t": 367.79929, "r": 545.11566, "b": 376.70584, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "pipeline of recovering a table with content.", "bbox": {"l": 308.86197, "t": 379.75426999999996, "r": 487.19257, "b": 388.66083, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "Here we put", "bbox": {"l": 493.96713, "t": 379.75426999999996, "r": 545.11511, "b": 388.66083, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "our approach to test by capitalizing on extracting content", "bbox": {"l": 308.86197, "t": 391.70926, "r": 545.11505, "b": 400.61581, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "from the PDF cells rather than decoding from images. Tab.", "bbox": {"l": 308.86197, "t": 403.66525, "r": 545.11523, "b": 412.57181, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "4", "bbox": {"l": 308.86197, "t": 415.62024, "r": 314.08096, "b": 424.52679, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "shows the TEDs score of HTML code representing the", "bbox": {"l": 316.69046, "t": 415.62024, "r": 545.11517, "b": 424.52679, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "structure of the table along with the content inserted in the", "bbox": {"l": 308.86197, "t": 427.57523, "r": 545.11505, "b": 436.48177999999996, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "data cell and compared with the ground-truth. 
Our method", "bbox": {"l": 308.86197, "t": 439.53021, "r": 545.11505, "b": 448.43677, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "achieved a", "bbox": {"l": 308.86197, "t": 451.4852, "r": 350.23666, "b": 460.39175, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "5.3%", "bbox": {"l": 352.17596, "t": 451.36563, "r": 374.59183, "b": 460.32201999999995, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "increase over the state-of-the-art, and com-", "bbox": {"l": 376.53296, "t": 451.4852, "r": 545.11011, "b": 460.39175, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "mercial solutions. We believe our scores would be higher", "bbox": {"l": 308.86197, "t": 463.44019, "r": 545.11511, "b": 472.34674, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "if the HTML ground-truth matched the extracted PDF cell", "bbox": {"l": 308.86197, "t": 475.39618, "r": 545.11517, "b": 484.30273, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "content. Unfortunately, there are small discrepancies such", "bbox": {"l": 308.86197, "t": 487.35117, "r": 545.11511, "b": 496.25772, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "as spacings around words or special characters with various", "bbox": {"l": 308.86197, "t": 499.30615, "r": 545.11505, "b": 508.21271, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "unicode representations.", "bbox": {"l": 308.86197, "t": 511.26114, "r": 405.69846, "b": 520.16769, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "Model", "bbox": {"l": 358.01099, "t": 552.23337, "r": 384.02335, "b": 561.1399200000001, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "TEDS", "bbox": {"l": 449.03400000000005, "t": 546.25537, "r": 473.94049000000007, "b": 555.16193, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "Simple", "bbox": {"l": 408.50598, "t": 558.21037, "r": 436.73999, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "Complex", "bbox": {"l": 448.6951, "t": 558.21037, "r": 485.07849, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 
187, "text": "All", "bbox": {"l": 499.3848, "t": 558.21037, "r": 512.117, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "Tabula", "bbox": {"l": 357.68201, "t": 575.16736, "r": 384.3519, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "78.0", "bbox": {"l": 413.90097, "t": 575.16736, "r": 431.33550999999994, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "57.8", "bbox": {"l": 458.16479000000004, "t": 575.16736, "r": 475.59933000000007, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "67.9", "bbox": {"l": 497.0289, "t": 575.16736, "r": 514.46344, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "Traprange", "bbox": {"l": 350.72299, "t": 587.12236, "r": 391.31064, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "60.8", "bbox": {"l": 413.90582, "t": 587.12236, "r": 431.34036, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "49.9", "bbox": {"l": 458.16965, "t": 587.12236, "r": 475.60419, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "55.4", "bbox": {"l": 497.03374999999994, "t": 587.12236, "r": 514.46832, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "Camelot", "bbox": {"l": 354.13599, "t": 599.07835, "r": 387.89923, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "80.0", "bbox": {"l": 413.90161, "t": 599.07835, "r": 431.33615, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "66.0", "bbox": {"l": 458.16544, "t": 599.07835, "r": 475.59998, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "73.0", "bbox": {"l": 497.02954000000005, "t": 599.07835, "r": 514.46411, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "Acrobat Pro", "bbox": {"l": 346.55899, "t": 611.03336, "r": 395.47534, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "68.9", "bbox": {"l": 413.90616, "t": 611.03336, "r": 431.34069999999997, "b": 
619.93991, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "61.8", "bbox": {"l": 458.16998000000007, "t": 611.03336, "r": 475.60452, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": "65.3", "bbox": {"l": 497.03409, "t": 611.03336, "r": 514.46863, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "EDD", "bbox": {"l": 360.78101, "t": 622.9883600000001, "r": 381.25415, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "91.2", "bbox": {"l": 413.90158, "t": 622.9883600000001, "r": 431.33612, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "85.4", "bbox": {"l": 458.16541, "t": 622.9883600000001, "r": 475.59995000000004, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "88.3", "bbox": {"l": 497.0295100000001, "t": 622.9883600000001, "r": 514.46405, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "TableFormer", "bbox": {"l": 345.483, "t": 634.94336, "r": 396.5513, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "95.4", "bbox": {"l": 413.90616, "t": 634.94336, "r": 431.34069999999997, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "90.1", "bbox": {"l": 458.16998000000007, "t": 634.94336, "r": 475.60452, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "93.6", "bbox": {"l": 497.03400000000005, "t": 634.82381, "r": 514.46857, "b": 643.78018, "coord_origin": "TOPLEFT"}}, {"id": 212, "text": "Table 4:", "bbox": {"l": 308.862, "t": 656.86136, "r": 341.73862, "b": 665.76792, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "Results of structure with content retrieved using", "bbox": {"l": 349.55927, "t": 656.86136, "r": 545.11517, "b": 665.76792, "coord_origin": "TOPLEFT"}}, {"id": 214, "text": "cell detection on PubTabNet. 
In all cases the input is PDF", "bbox": {"l": 308.862, "t": 668.81636, "r": 545.11505, "b": 677.7229199999999, "coord_origin": "TOPLEFT"}}, {"id": 215, "text": "documents with cropped tables.", "bbox": {"l": 308.862, "t": 680.77136, "r": 435.03836, "b": 689.6779300000001, "coord_origin": "TOPLEFT"}}, {"id": 216, "text": "7", "bbox": {"l": 295.121, "t": 734.133358, "r": 300.10229, "b": 743.039921, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "section_header", "bbox": {"l": 49.507503509521484, "t": 73.60204315185547, "r": 168.0131378173828, "b": 84.25342, "coord_origin": "TOPLEFT"}, "confidence": 0.9554283022880554, "cells": [{"id": 0, "text": "5.3.", "bbox": {"l": 50.112, "t": 74.40137000000016, "r": 63.704811, "b": 84.25342, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Datasets and Metrics", "bbox": {"l": 72.766685, "t": 74.40137000000016, "r": 167.89825, "b": 84.25342, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "text", "bbox": {"l": 49.59501647949219, "t": 92.33658599853516, "r": 286.36511, "b": 138.36517333984375, "coord_origin": "TOPLEFT"}, "confidence": 0.9862996935844421, "cells": [{"id": 2, "text": "The Tree-Edit-Distance-Based Similarity (TEDS) met-", "bbox": {"l": 62.067001, "t": 93.35039999999992, "r": 286.36499, "b": 102.25696000000016, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "ric was introduced in [37]. It represents the prediction, and", "bbox": {"l": 50.112, "t": 105.30542000000003, "r": 286.36511, "b": 114.21198000000015, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "ground-truth as a tree structure of HTML tags. 
This simi-", "bbox": {"l": 50.112, "t": 117.26044000000002, "r": 286.36505, "b": 126.16699000000006, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "larity is calculated as:", "bbox": {"l": 50.112, "t": 129.21642999999995, "r": 136.71687, "b": 138.12298999999996, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "formula", "bbox": {"l": 85.5722427368164, "t": 149.28536987304688, "r": 286.3624, "b": 173.28463745117188, "coord_origin": "TOPLEFT"}, "confidence": 0.9500426650047302, "cells": [{"id": 6, "text": "TEDS (", "bbox": {"l": 86.218994, "t": 157.05798000000004, "r": 118.8784, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "T$_{a}$, T$_{b}$", "bbox": {"l": 118.87499, "t": 157.05798000000004, "r": 143.26962, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": ") = 1", "bbox": {"l": 143.76799, "t": 157.05798000000004, "r": 165.9019, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "\u2212", "bbox": {"l": 168.12099, "t": 156.50012000000004, "r": 175.8699, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "EditDist (", "bbox": {"l": 179.27899, "t": 150.31799, "r": 221.95677, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "T$_{a}$, T$_{b}$", "bbox": {"l": 221.95200000000003, "t": 150.31799, "r": 246.34663, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": ")", "bbox": {"l": 246.84499999999997, "t": 150.31799, "r": 250.71945, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "max (", "bbox": {"l": 182.21201, "t": 163.89197000000001, "r": 206.29161, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "|", "bbox": {"l": 206.289, "t": 163.33411, "r": 209.05661, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "T$_{a}$", "bbox": {"l": 209.056, "t": 163.89197000000001, "r": 219.19968, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "|", "bbox": 
{"l": 219.69700999999998, "t": 163.33411, "r": 222.46461000000002, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": ",", "bbox": {"l": 224.125, "t": 163.89197000000001, "r": 226.89261, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "|", "bbox": {"l": 228.55299000000002, "t": 163.33411, "r": 231.3206, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "T$_{b}$", "bbox": {"l": 231.31999, "t": 163.89197000000001, "r": 240.64563, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "|", "bbox": {"l": 241.144, "t": 163.33411, "r": 243.91161, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": ")", "bbox": {"l": 243.911, "t": 163.89197000000001, "r": 247.78545, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "(3)", "bbox": {"l": 274.746, "t": 157.21740999999997, "r": 286.3624, "b": 166.12396, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "text", "bbox": {"l": 49.815025329589844, "t": 180.52044677734375, "r": 286.4786376953125, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9735332727432251, "cells": [{"id": 23, "text": "where", "bbox": {"l": 62.067001, "t": 181.16241000000002, "r": 86.405632, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "T$_{a}$", "bbox": {"l": 88.581001, "t": 181.00298999999995, "r": 98.724663, "b": 189.84978999999998, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "and", "bbox": {"l": 101.399, "t": 181.16241000000002, "r": 115.785, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "T$_{b}$", "bbox": {"l": 117.961, "t": 181.00298999999995, "r": 127.28664, "b": 189.84978999999998, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "represent tables in tree structure HTML", "bbox": {"l": 129.95999, "t": 181.16241000000002, "r": 286.36285, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "format. 
EditDist denotes the tree-edit distance, and", "bbox": {"l": 50.111992, "t": 193.11743, "r": 252.78116000000003, "b": 202.02399000000003, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "|", "bbox": {"l": 255.18201, "t": 192.40015000000005, "r": 257.94962, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "T", "bbox": {"l": 257.94901, "t": 192.95800999999994, "r": 263.77115, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "|", "bbox": {"l": 265.155, "t": 192.40015000000005, "r": 267.92261, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "rep-", "bbox": {"l": 270.32199, "t": 193.11743, "r": 286.36179, "b": 202.02399000000003, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "resents the number of nodes in", "bbox": {"l": 50.111984, "t": 205.07245, "r": 172.13388, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "T", "bbox": {"l": 174.62399, "t": 204.91301999999996, "r": 180.44614, "b": 213.75982999999997, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": ".", "bbox": {"l": 181.82899, "t": 205.07245, "r": 184.31964, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "section_header", "bbox": {"l": 49.47447204589844, "t": 224.4459991455078, "r": 170.64169311523438, "b": 235.01736450195312, "coord_origin": "TOPLEFT"}, "confidence": 0.9588840007781982, "cells": [{"id": 36, "text": "5.4.", "bbox": {"l": 50.112, "t": 224.81946000000005, "r": 64.551605, "b": 234.67151, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Quantitative Analysis", "bbox": {"l": 74.178009, "t": 224.81946000000005, "r": 170.45169, "b": 234.67151, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "text", "bbox": {"l": 49.47037124633789, "t": 242.6270294189453, "r": 286.4912414550781, "b": 396.757568359375, "coord_origin": "TOPLEFT"}, "confidence": 0.9855114221572876, "cells": [{"id": 38, "text": "Structure.", "bbox": {"l": 62.067001, "t": 243.6499, "r": 105.32461, "b": 
252.60626000000002, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "As shown in Tab.", "bbox": {"l": 112.12600000000002, "t": 243.76946999999996, "r": 184.68361, "b": 252.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "2, TableFormer outper-", "bbox": {"l": 191.4781, "t": 243.76946999999996, "r": 286.36188, "b": 252.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "forms all SOTA methods across different datasets by a large", "bbox": {"l": 50.112, "t": 255.72448999999995, "r": 286.36508, "b": 264.63104, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "margin for predicting the table structure from an image.", "bbox": {"l": 50.112, "t": 267.67949999999996, "r": 286.36508, "b": 276.58606, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "All the more, our model outperforms pre-trained methods.", "bbox": {"l": 50.112, "t": 279.63446, "r": 286.36508, "b": 288.54105, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "During the evaluation we do not apply any table filtering.", "bbox": {"l": 50.112, "t": 291.59048, "r": 286.36514, "b": 300.49704, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "We also provide our baseline results on the SynthTabNet", "bbox": {"l": 50.112, "t": 303.54547, "r": 286.36508, "b": 312.45203000000004, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "dataset. It has been observed that large tables (e.g. 
tables", "bbox": {"l": 50.112, "t": 315.50046, "r": 286.36505, "b": 324.40700999999996, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "that occupy half of the page or more) yield poor predictions.", "bbox": {"l": 50.112, "t": 327.45544, "r": 286.36508, "b": 336.362, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "We attribute this issue to the image resizing during the pre-", "bbox": {"l": 50.112, "t": 339.41043, "r": 286.36508, "b": 348.31699000000003, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "processing step, that produces downsampled images with", "bbox": {"l": 50.112, "t": 351.36542, "r": 286.36505, "b": 360.27197, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "indistinguishable features. This problem can be addressed", "bbox": {"l": 50.112, "t": 363.32141, "r": 286.36508, "b": 372.2279700000001, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "by treating such big tables with a separate model which ac-", "bbox": {"l": 50.112, "t": 375.2764, "r": 286.36511, "b": 384.18295000000006, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "cepts a large input image size.", "bbox": {"l": 50.112, "t": 387.23138, "r": 170.01187, "b": 396.13794, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "table", "bbox": {"l": 53.36846160888672, "t": 409.1356506347656, "r": 283.0443420410156, "b": 582.397705078125, "coord_origin": "TOPLEFT"}, "confidence": 0.989250659942627, "cells": [{"id": 53, "text": "Model", "bbox": {"l": 78.843002, "t": 420.69037, "r": 104.85535, "b": 429.59692, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "TEDS", "bbox": {"l": 211.2, "t": 414.71237, "r": 236.10649, "b": 423.61893, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "Dataset", "bbox": {"l": 129.338, "t": 426.66736, "r": 159.21584, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "Simple", "bbox": {"l": 171.17096, "t": 426.66736, "r": 199.40497, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Complex", "bbox": {"l": 211.36009, "t": 
426.66736, "r": 247.74349999999998, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "All", "bbox": {"l": 264.54044, "t": 426.66736, "r": 277.27264, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "EDD", "bbox": {"l": 81.612, "t": 443.62436, "r": 102.08514, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "PTN", "bbox": {"l": 134.87206, "t": 443.62436, "r": 153.69141, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "91.1", "bbox": {"l": 176.56554, "t": 443.62436, "r": 194.00009, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "88.7", "bbox": {"l": 220.82938000000001, "t": 443.62436, "r": 238.26393, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "89.9", "bbox": {"l": 262.18414, "t": 443.62436, "r": 279.61868, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "GTE", "bbox": {"l": 82.165001, "t": 455.58035, "r": 101.5323, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PTN", "bbox": {"l": 134.86716, "t": 455.58035, "r": 153.68651, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "-", "bbox": {"l": 183.62411, "t": 455.58035, "r": 186.94167, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "-", "bbox": {"l": 227.88795000000002, "t": 455.58035, "r": 231.20551, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "93.01", "bbox": {"l": 259.69855, "t": 455.58035, "r": 282.11441, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 468.13336, "r": 117.38329000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PTN", "bbox": {"l": 134.86766, "t": 468.13336, "r": 153.68701, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "98.5", "bbox": {"l": 176.57111, "t": 468.13336, "r": 194.00566, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "95.0", "bbox": {"l": 220.83495, "t": 468.13336, "r": 
238.26950000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "96.75", "bbox": {"l": 259.698, "t": 468.01379, "r": 282.11386, "b": 476.97018, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "EDD", "bbox": {"l": 81.612, "t": 483.32635, "r": 102.08514, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "FTN", "bbox": {"l": 134.87206, "t": 483.32635, "r": 153.69141, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "88.4", "bbox": {"l": 176.56554, "t": 483.32635, "r": 194.00009, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "92.08", "bbox": {"l": 218.33870999999996, "t": 483.32635, "r": 240.75455999999997, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "90.6", "bbox": {"l": 262.18411, "t": 483.32635, "r": 279.61865, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "GTE", "bbox": {"l": 82.165001, "t": 495.28134, "r": 101.5323, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "FTN", "bbox": {"l": 134.86716, "t": 495.28134, "r": 153.68651, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "-", "bbox": {"l": 183.62411, "t": 495.28134, "r": 186.94167, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "-", "bbox": {"l": 227.88795000000002, "t": 495.28134, "r": 231.20551, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "87.14", "bbox": {"l": 259.69855, "t": 495.28134, "r": 282.11441, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "GTE (FT)", "bbox": {"l": 71.789001, "t": 507.23633, "r": 111.90838999999998, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "FTN", "bbox": {"l": 134.86221, "t": 507.23633, "r": 153.68156, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "-", "bbox": {"l": 183.62914, "t": 507.23633, "r": 186.94669, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "-", "bbox": {"l": 227.89297, "t": 507.23633, "r": 231.21053000000003, "b": 516.14288, 
"coord_origin": "TOPLEFT"}}, {"id": 88, "text": "91.02", "bbox": {"l": 259.6936, "t": 507.23633, "r": 282.10947, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 519.1913099999999, "r": 117.38329000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "FTN", "bbox": {"l": 134.86766, "t": 519.1913099999999, "r": 153.68701, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "97.5", "bbox": {"l": 176.57111, "t": 519.1913099999999, "r": 194.00566, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "96.0", "bbox": {"l": 220.83495, "t": 519.1913099999999, "r": 238.26950000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "96.8", "bbox": {"l": 262.189, "t": 519.0717500000001, "r": 279.62354, "b": 528.02814, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "EDD", "bbox": {"l": 81.612, "t": 536.49837, "r": 102.08514, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "TB", "bbox": {"l": 137.91064, "t": 536.49837, "r": 150.64285, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "86.0", "bbox": {"l": 176.56554, "t": 536.49837, "r": 194.00009, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "-", "bbox": {"l": 227.89285, "t": 536.49837, "r": 231.21040000000002, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "86.0", "bbox": {"l": 262.18411, "t": 536.49837, "r": 279.61865, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 548.45436, "r": 117.38329000000002, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "TB", "bbox": {"l": 137.90625, "t": 548.45436, "r": 150.63846, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "89.6", "bbox": {"l": 176.57111, "t": 548.45436, "r": 194.00566, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "-", "bbox": {"l": 
227.88845999999998, "t": 548.45436, "r": 231.20601, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "89.6", "bbox": {"l": 262.189, "t": 548.3348100000001, "r": 279.62354, "b": 557.2911799999999, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 568.00237, "r": 117.38329000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "STN", "bbox": {"l": 134.86766, "t": 568.00237, "r": 153.68701, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "96.9", "bbox": {"l": 176.57111, "t": 568.00237, "r": 194.00566, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "95.7", "bbox": {"l": 220.83495, "t": 568.00237, "r": 238.26950000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "96.7", "bbox": {"l": 262.1897, "t": 568.00237, "r": 279.62424, "b": 576.90892, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "text", "bbox": {"l": 49.4423828125, "t": 591.6051635742188, "r": 286.63427734375, "b": 613.4329223632812, "coord_origin": "TOPLEFT"}, "confidence": 0.7209144830703735, "cells": [{"id": 109, "text": "Table 2: Structure results on PubTabNet (PTN), FinTabNet", "bbox": {"l": 50.112, "t": 592.43336, "r": 286.36511, "b": 601.33992, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "(FTN), TableBank (TB) and SynthTabNet (STN).", "bbox": {"l": 50.112, "t": 604.38837, "r": 247.46114, "b": 613.29492, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "text", "bbox": {"l": 49.50996780395508, "t": 615.5875244140625, "r": 261.78732, "b": 625.24992, "coord_origin": "TOPLEFT"}, "confidence": 0.6433371901512146, "cells": [{"id": 111, "text": "FT: Model was trained on PubTabNet then finetuned.", "bbox": {"l": 50.112, "t": 616.34337, "r": 261.78732, "b": 625.24992, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "text", "bbox": {"l": 49.4313850402832, "t": 643.5670166015625, "r": 286.515869140625, "b": 713.6913452148438, "coord_origin": "TOPLEFT"}, "confidence": 
0.9854632616043091, "cells": [{"id": 112, "text": "Cell Detection.", "bbox": {"l": 62.067001, "t": 644.3498099999999, "r": 124.72179, "b": 653.30618, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "Like any object detector, our", "bbox": {"l": 128.20401, "t": 644.46936, "r": 242.9333, "b": 653.37592, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Cell BBox", "bbox": {"l": 245.55401999999998, "t": 644.55902, "r": 286.36084, "b": 653.1467700000001, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "Detector", "bbox": {"l": 50.112015, "t": 656.51402, "r": 84.971146, "b": 665.10178, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "provides bounding boxes that can be improved", "bbox": {"l": 89.515015, "t": 656.42436, "r": 286.366, "b": 665.33092, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "with post-processing during inference. We make use of the", "bbox": {"l": 50.112015, "t": 668.37936, "r": 286.36511, "b": 677.28593, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "grid-like structure of tables to refine the predictions. A de-", "bbox": {"l": 50.112015, "t": 680.33536, "r": 286.36505, "b": 689.24193, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "tailed explanation on the post-processing is available in the", "bbox": {"l": 50.112015, "t": 692.290359, "r": 286.36511, "b": 701.19693, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "supplementary material. As shown in Tab. 
3, we evaluate", "bbox": {"l": 50.112015, "t": 704.245361, "r": 286.36508, "b": 713.151932, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "text", "bbox": {"l": 307.97955322265625, "t": 74.48530578613281, "r": 545.258544921875, "b": 227.81777954101562, "coord_origin": "TOPLEFT"}, "confidence": 0.9713152647018433, "cells": [{"id": 121, "text": "our", "bbox": {"l": 308.862, "t": 75.20836999999995, "r": 322.14215, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Cell BBox Decoder", "bbox": {"l": 325.45401, "t": 75.29803000000004, "r": 404.56702, "b": 83.88580000000002, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "accuracy for cells with a class la-", "bbox": {"l": 408.104, "t": 75.20836999999995, "r": 545.10968, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "bel of \u2018content\u2019 only using the PASCAL VOC mAP metric", "bbox": {"l": 308.862, "t": 87.16339000000005, "r": 545.11511, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "for pre-processing and post-processing.", "bbox": {"l": 308.862, "t": 99.11841000000004, "r": 470.22626, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Note that we do", "bbox": {"l": 477.52884, "t": 99.11841000000004, "r": 545.11511, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "not have post-processing results for SynthTabNet as images", "bbox": {"l": 308.862, "t": 111.07343000000003, "r": 545.11517, "b": 119.97997999999984, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "are only provided. 
To compare the performance of our pro-", "bbox": {"l": 308.862, "t": 123.02844000000005, "r": 545.11511, "b": 131.93499999999995, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "posed approach, we\u2019ve integrated TableFormer\u2019s", "bbox": {"l": 308.862, "t": 134.98443999999995, "r": 502.01691000000005, "b": 143.89099, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "Cell BBox", "bbox": {"l": 504.47299, "t": 135.07410000000004, "r": 545.11041, "b": 143.66187000000002, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "Decoder", "bbox": {"l": 308.862, "t": 147.02910999999995, "r": 343.16324, "b": 155.61688000000004, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "into EDD architecture. As mentioned previously,", "bbox": {"l": 346.371, "t": 146.93944999999997, "r": 545.11493, "b": 155.84600999999998, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "the Structure Decoder provides the", "bbox": {"l": 308.862, "t": 158.89446999999996, "r": 446.15652, "b": 167.80102999999997, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "Cell BBox Decoder", "bbox": {"l": 448.28998000000007, "t": 158.98413000000005, "r": 525.04181, "b": 167.57190000000003, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "with", "bbox": {"l": 527.39899, "t": 158.89446999999996, "r": 545.11249, "b": 167.80102999999997, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "the features needed to predict the bounding box predictions.", "bbox": {"l": 308.862, "t": 170.84948999999995, "r": 545.11511, "b": 179.75603999999998, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "Therefore, the accuracy of the", "bbox": {"l": 308.862, "t": 182.80449999999996, "r": 432.86642000000006, "b": 191.71105999999997, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "Structure Decoder", "bbox": {"l": 436.39001, "t": 182.89417000000003, "r": 510.93021, "b": 191.48193000000003, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "directly", "bbox": {"l": 514.677, "t": 182.80449999999996, "r": 545.11273, "b": 
191.71105999999997, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "influences the accuracy of the", "bbox": {"l": 308.862, "t": 194.75951999999995, "r": 431.17285, "b": 203.66607999999997, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "Cell BBox Decoder", "bbox": {"l": 434.6790199999999, "t": 194.84918000000005, "r": 514.18054, "b": 203.43695000000002, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": ". If the", "bbox": {"l": 514.17603, "t": 194.75951999999995, "r": 545.10992, "b": 203.66607999999997, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "Structure Decoder", "bbox": {"l": 308.86203, "t": 206.80517999999995, "r": 382.35614, "b": 215.39293999999995, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "predicts an extra column, this will result", "bbox": {"l": 385.07501, "t": 206.71551999999997, "r": 545.11426, "b": 215.62207, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "in an extra column of predicted bounding boxes.", "bbox": {"l": 308.862, "t": 218.67052999999999, "r": 501.6981799999999, "b": 227.57709, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "table", "bbox": {"l": 308.4067077636719, "t": 247.87644958496094, "r": 533.64208984375, "b": 303.8056640625, "coord_origin": "TOPLEFT"}, "confidence": 0.9691707491874695, "cells": [{"id": 146, "text": "Model", "bbox": {"l": 339.323, "t": 253.66436999999996, "r": 365.33536, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "Dataset", "bbox": {"l": 401.04132, "t": 253.66436999999996, "r": 430.91916, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "mAP", "bbox": {"l": 454.10214, "t": 253.66436999999996, "r": 474.58523999999994, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "mAP (PP)", "bbox": {"l": 486.54034, "t": 253.66436999999996, "r": 527.2276, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "EDD+BBox", "bbox": {"l": 327.65601, "t": 270.62134000000003, "r": 377.00076, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 
151, "text": "PubTabNet", "bbox": {"l": 393.69809, "t": 270.62134000000003, "r": 438.28073, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "79.2", "bbox": {"l": 455.63559, "t": 270.62134000000003, "r": 473.07013, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "82.7", "bbox": {"l": 498.16592, "t": 270.62134000000003, "r": 515.60046, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 282.57631999999995, "r": 377.86331, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "PubTabNet", "bbox": {"l": 393.69388, "t": 282.57631999999995, "r": 438.27652, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "82.1", "bbox": {"l": 455.63101, "t": 282.45676, "r": 473.06555000000003, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "86.8", "bbox": {"l": 498.1713, "t": 282.45676, "r": 515.60583, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 294.53131, "r": 377.86331, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "SynthTabNet", "bbox": {"l": 389.81842, "t": 294.53131, "r": 442.15194999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "87.7", "bbox": {"l": 455.63135, "t": 294.53131, "r": 473.06589, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "-", "bbox": {"l": 505.22515999999996, "t": 294.53131, "r": 508.54268999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "caption", "bbox": {"l": 308.0755615234375, "t": 315.7750549316406, "r": 545.11517, "b": 337.8129577636719, "coord_origin": "TOPLEFT"}, "confidence": 0.9519906044006348, "cells": [{"id": 162, "text": "Table 3:", "bbox": {"l": 308.862, "t": 316.44931, "r": 341.49951, "b": 325.35587, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "Cell Bounding Box detection results on PubTab-", "bbox": {"l": 348.60284, "t": 
316.44931, "r": 545.11517, "b": 325.35587, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "Net, and FinTabNet. PP: Post-processing.", "bbox": {"l": 308.862, "t": 328.4043, "r": 474.97845, "b": 337.3108500000001, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "text", "bbox": {"l": 307.8382873535156, "t": 367.0015563964844, "r": 545.3449096679688, "b": 520.16769, "coord_origin": "TOPLEFT"}, "confidence": 0.9835002422332764, "cells": [{"id": 165, "text": "Cell Content.", "bbox": {"l": 320.81699, "t": 367.6797199999999, "r": 378.94876, "b": 376.63611, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "In this section, we evaluate the entire", "bbox": {"l": 387.07898, "t": 367.79929, "r": 545.11566, "b": 376.70584, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "pipeline of recovering a table with content.", "bbox": {"l": 308.86197, "t": 379.75426999999996, "r": 487.19257, "b": 388.66083, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "Here we put", "bbox": {"l": 493.96713, "t": 379.75426999999996, "r": 545.11511, "b": 388.66083, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "our approach to test by capitalizing on extracting content", "bbox": {"l": 308.86197, "t": 391.70926, "r": 545.11505, "b": 400.61581, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "from the PDF cells rather than decoding from images. 
Tab.", "bbox": {"l": 308.86197, "t": 403.66525, "r": 545.11523, "b": 412.57181, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "4", "bbox": {"l": 308.86197, "t": 415.62024, "r": 314.08096, "b": 424.52679, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "shows the TEDs score of HTML code representing the", "bbox": {"l": 316.69046, "t": 415.62024, "r": 545.11517, "b": 424.52679, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "structure of the table along with the content inserted in the", "bbox": {"l": 308.86197, "t": 427.57523, "r": 545.11505, "b": 436.48177999999996, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "data cell and compared with the ground-truth. Our method", "bbox": {"l": 308.86197, "t": 439.53021, "r": 545.11505, "b": 448.43677, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "achieved a", "bbox": {"l": 308.86197, "t": 451.4852, "r": 350.23666, "b": 460.39175, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "5.3%", "bbox": {"l": 352.17596, "t": 451.36563, "r": 374.59183, "b": 460.32201999999995, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "increase over the state-of-the-art, and com-", "bbox": {"l": 376.53296, "t": 451.4852, "r": 545.11011, "b": 460.39175, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "mercial solutions. We believe our scores would be higher", "bbox": {"l": 308.86197, "t": 463.44019, "r": 545.11511, "b": 472.34674, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "if the HTML ground-truth matched the extracted PDF cell", "bbox": {"l": 308.86197, "t": 475.39618, "r": 545.11517, "b": 484.30273, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "content. 
Unfortunately, there are small discrepancies such", "bbox": {"l": 308.86197, "t": 487.35117, "r": 545.11511, "b": 496.25772, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "as spacings around words or special characters with various", "bbox": {"l": 308.86197, "t": 499.30615, "r": 545.11505, "b": 508.21271, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "unicode representations.", "bbox": {"l": 308.86197, "t": 511.26114, "r": 405.69846, "b": 520.16769, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "table", "bbox": {"l": 332.9688720703125, "t": 540.2835083007812, "r": 520.942138671875, "b": 643.84991, "coord_origin": "TOPLEFT"}, "confidence": 0.9775565266609192, "cells": [{"id": 183, "text": "Model", "bbox": {"l": 358.01099, "t": 552.23337, "r": 384.02335, "b": 561.1399200000001, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "TEDS", "bbox": {"l": 449.03400000000005, "t": 546.25537, "r": 473.94049000000007, "b": 555.16193, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "Simple", "bbox": {"l": 408.50598, "t": 558.21037, "r": 436.73999, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "Complex", "bbox": {"l": 448.6951, "t": 558.21037, "r": 485.07849, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "All", "bbox": {"l": 499.3848, "t": 558.21037, "r": 512.117, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "Tabula", "bbox": {"l": 357.68201, "t": 575.16736, "r": 384.3519, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "78.0", "bbox": {"l": 413.90097, "t": 575.16736, "r": 431.33550999999994, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "57.8", "bbox": {"l": 458.16479000000004, "t": 575.16736, "r": 475.59933000000007, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "67.9", "bbox": {"l": 497.0289, "t": 575.16736, "r": 514.46344, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "Traprange", "bbox": {"l": 
350.72299, "t": 587.12236, "r": 391.31064, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "60.8", "bbox": {"l": 413.90582, "t": 587.12236, "r": 431.34036, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "49.9", "bbox": {"l": 458.16965, "t": 587.12236, "r": 475.60419, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "55.4", "bbox": {"l": 497.03374999999994, "t": 587.12236, "r": 514.46832, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "Camelot", "bbox": {"l": 354.13599, "t": 599.07835, "r": 387.89923, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "80.0", "bbox": {"l": 413.90161, "t": 599.07835, "r": 431.33615, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "66.0", "bbox": {"l": 458.16544, "t": 599.07835, "r": 475.59998, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "73.0", "bbox": {"l": 497.02954000000005, "t": 599.07835, "r": 514.46411, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "Acrobat Pro", "bbox": {"l": 346.55899, "t": 611.03336, "r": 395.47534, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "68.9", "bbox": {"l": 413.90616, "t": 611.03336, "r": 431.34069999999997, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "61.8", "bbox": {"l": 458.16998000000007, "t": 611.03336, "r": 475.60452, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": "65.3", "bbox": {"l": 497.03409, "t": 611.03336, "r": 514.46863, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "EDD", "bbox": {"l": 360.78101, "t": 622.9883600000001, "r": 381.25415, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "91.2", "bbox": {"l": 413.90158, "t": 622.9883600000001, "r": 431.33612, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "85.4", "bbox": {"l": 458.16541, "t": 622.9883600000001, "r": 475.59995000000004, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 207, 
"text": "88.3", "bbox": {"l": 497.0295100000001, "t": 622.9883600000001, "r": 514.46405, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "TableFormer", "bbox": {"l": 345.483, "t": 634.94336, "r": 396.5513, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "95.4", "bbox": {"l": 413.90616, "t": 634.94336, "r": 431.34069999999997, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "90.1", "bbox": {"l": 458.16998000000007, "t": 634.94336, "r": 475.60452, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "93.6", "bbox": {"l": 497.03400000000005, "t": 634.82381, "r": 514.46857, "b": 643.78018, "coord_origin": "TOPLEFT"}}]}, {"id": 15, "label": "caption", "bbox": {"l": 307.9747619628906, "t": 655.8218994140625, "r": 545.1710815429688, "b": 689.8007202148438, "coord_origin": "TOPLEFT"}, "confidence": 0.954140305519104, "cells": [{"id": 212, "text": "Table 4:", "bbox": {"l": 308.862, "t": 656.86136, "r": 341.73862, "b": 665.76792, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "Results of structure with content retrieved using", "bbox": {"l": 349.55927, "t": 656.86136, "r": 545.11517, "b": 665.76792, "coord_origin": "TOPLEFT"}}, {"id": 214, "text": "cell detection on PubTabNet. 
In all cases the input is PDF", "bbox": {"l": 308.862, "t": 668.81636, "r": 545.11505, "b": 677.7229199999999, "coord_origin": "TOPLEFT"}}, {"id": 215, "text": "documents with cropped tables.", "bbox": {"l": 308.862, "t": 680.77136, "r": 435.03836, "b": 689.6779300000001, "coord_origin": "TOPLEFT"}}]}, {"id": 16, "label": "page_footer", "bbox": {"l": 294.5538330078125, "t": 733.197021484375, "r": 300.1892395019531, "b": 743.039921, "coord_origin": "TOPLEFT"}, "confidence": 0.8787975907325745, "cells": [{"id": 216, "text": "7", "bbox": {"l": 295.121, "t": 734.133358, "r": 300.10229, "b": 743.039921, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {"6": {"label": "table", "id": 6, "page_no": 6, "cluster": {"id": 6, "label": "table", "bbox": {"l": 53.36846160888672, "t": 409.1356506347656, "r": 283.0443420410156, "b": 582.397705078125, "coord_origin": "TOPLEFT"}, "confidence": 0.989250659942627, "cells": [{"id": 53, "text": "Model", "bbox": {"l": 78.843002, "t": 420.69037, "r": 104.85535, "b": 429.59692, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "TEDS", "bbox": {"l": 211.2, "t": 414.71237, "r": 236.10649, "b": 423.61893, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "Dataset", "bbox": {"l": 129.338, "t": 426.66736, "r": 159.21584, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "Simple", "bbox": {"l": 171.17096, "t": 426.66736, "r": 199.40497, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Complex", "bbox": {"l": 211.36009, "t": 426.66736, "r": 247.74349999999998, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "All", "bbox": {"l": 264.54044, "t": 426.66736, "r": 277.27264, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "EDD", "bbox": {"l": 81.612, "t": 443.62436, "r": 102.08514, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "PTN", "bbox": {"l": 134.87206, "t": 443.62436, "r": 153.69141, "b": 452.53091, "coord_origin": 
"TOPLEFT"}}, {"id": 61, "text": "91.1", "bbox": {"l": 176.56554, "t": 443.62436, "r": 194.00009, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "88.7", "bbox": {"l": 220.82938000000001, "t": 443.62436, "r": 238.26393, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "89.9", "bbox": {"l": 262.18414, "t": 443.62436, "r": 279.61868, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "GTE", "bbox": {"l": 82.165001, "t": 455.58035, "r": 101.5323, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PTN", "bbox": {"l": 134.86716, "t": 455.58035, "r": 153.68651, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "-", "bbox": {"l": 183.62411, "t": 455.58035, "r": 186.94167, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "-", "bbox": {"l": 227.88795000000002, "t": 455.58035, "r": 231.20551, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "93.01", "bbox": {"l": 259.69855, "t": 455.58035, "r": 282.11441, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 468.13336, "r": 117.38329000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PTN", "bbox": {"l": 134.86766, "t": 468.13336, "r": 153.68701, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "98.5", "bbox": {"l": 176.57111, "t": 468.13336, "r": 194.00566, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "95.0", "bbox": {"l": 220.83495, "t": 468.13336, "r": 238.26950000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "96.75", "bbox": {"l": 259.698, "t": 468.01379, "r": 282.11386, "b": 476.97018, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "EDD", "bbox": {"l": 81.612, "t": 483.32635, "r": 102.08514, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "FTN", "bbox": {"l": 134.87206, "t": 483.32635, "r": 153.69141, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": 
"88.4", "bbox": {"l": 176.56554, "t": 483.32635, "r": 194.00009, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "92.08", "bbox": {"l": 218.33870999999996, "t": 483.32635, "r": 240.75455999999997, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "90.6", "bbox": {"l": 262.18411, "t": 483.32635, "r": 279.61865, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "GTE", "bbox": {"l": 82.165001, "t": 495.28134, "r": 101.5323, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "FTN", "bbox": {"l": 134.86716, "t": 495.28134, "r": 153.68651, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "-", "bbox": {"l": 183.62411, "t": 495.28134, "r": 186.94167, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "-", "bbox": {"l": 227.88795000000002, "t": 495.28134, "r": 231.20551, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "87.14", "bbox": {"l": 259.69855, "t": 495.28134, "r": 282.11441, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "GTE (FT)", "bbox": {"l": 71.789001, "t": 507.23633, "r": 111.90838999999998, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "FTN", "bbox": {"l": 134.86221, "t": 507.23633, "r": 153.68156, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "-", "bbox": {"l": 183.62914, "t": 507.23633, "r": 186.94669, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "-", "bbox": {"l": 227.89297, "t": 507.23633, "r": 231.21053000000003, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "91.02", "bbox": {"l": 259.6936, "t": 507.23633, "r": 282.10947, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 519.1913099999999, "r": 117.38329000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "FTN", "bbox": {"l": 134.86766, "t": 519.1913099999999, "r": 153.68701, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 
91, "text": "97.5", "bbox": {"l": 176.57111, "t": 519.1913099999999, "r": 194.00566, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "96.0", "bbox": {"l": 220.83495, "t": 519.1913099999999, "r": 238.26950000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "96.8", "bbox": {"l": 262.189, "t": 519.0717500000001, "r": 279.62354, "b": 528.02814, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "EDD", "bbox": {"l": 81.612, "t": 536.49837, "r": 102.08514, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "TB", "bbox": {"l": 137.91064, "t": 536.49837, "r": 150.64285, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "86.0", "bbox": {"l": 176.56554, "t": 536.49837, "r": 194.00009, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "-", "bbox": {"l": 227.89285, "t": 536.49837, "r": 231.21040000000002, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "86.0", "bbox": {"l": 262.18411, "t": 536.49837, "r": 279.61865, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 548.45436, "r": 117.38329000000002, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "TB", "bbox": {"l": 137.90625, "t": 548.45436, "r": 150.63846, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "89.6", "bbox": {"l": 176.57111, "t": 548.45436, "r": 194.00566, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "-", "bbox": {"l": 227.88845999999998, "t": 548.45436, "r": 231.20601, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "89.6", "bbox": {"l": 262.189, "t": 548.3348100000001, "r": 279.62354, "b": 557.2911799999999, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 568.00237, "r": 117.38329000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "STN", "bbox": {"l": 134.86766, "t": 568.00237, "r": 153.68701, "b": 
576.90892, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "96.9", "bbox": {"l": 176.57111, "t": 568.00237, "r": 194.00566, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "95.7", "bbox": {"l": 220.83495, "t": 568.00237, "r": 238.26950000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "96.7", "bbox": {"l": 262.1897, "t": 568.00237, "r": 279.62424, "b": 576.90892, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 11, "num_cols": 5, "table_cells": [{"bbox": {"l": 78.843002, "t": 420.69037, "r": 104.85535, "b": 429.59692, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 211.2, "t": 414.71237, "r": 247.74349999999998, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 129.338, "t": 426.66736, "r": 159.21584, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, 
"row_section": false}, {"bbox": {"l": 171.17096, "t": 426.66736, "r": 199.40497, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 264.54044, "t": 426.66736, "r": 277.27264, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 81.612, "t": 443.62436, "r": 102.08514, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87206, "t": 443.62436, "r": 153.69141, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56554, "t": 443.62436, "r": 194.00009, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "91.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.82938000000001, "t": 443.62436, "r": 238.26393, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18414, "t": 
443.62436, "r": 279.61868, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.165001, "t": 455.58035, "r": 101.5323, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86716, "t": 455.58035, "r": 153.68651, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411, "t": 455.58035, "r": 186.94167, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795000000002, "t": 455.58035, "r": 231.20551, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69855, "t": 455.58035, "r": 282.11441, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "93.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 468.13336, "r": 117.38329000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}, 
"row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766, "t": 468.13336, "r": 153.68701, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 468.13336, "r": 194.00566, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "98.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83495, "t": 468.13336, "r": 238.26950000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.698, "t": 468.01379, "r": 282.11386, "b": 476.97018, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.75", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.612, "t": 483.32635, "r": 102.08514, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87206, "t": 483.32635, "r": 153.69141, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, 
"end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56554, "t": 483.32635, "r": 194.00009, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "88.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 218.33870999999996, "t": 483.32635, "r": 240.75455999999997, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "92.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18411, "t": 483.32635, "r": 279.61865, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "90.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.165001, "t": 495.28134, "r": 101.5323, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86716, "t": 495.28134, "r": 153.68651, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411, "t": 495.28134, "r": 186.94167, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, 
"end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795000000002, "t": 495.28134, "r": 231.20551, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69855, "t": 495.28134, "r": 282.11441, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "87.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 71.789001, "t": 507.23633, "r": 111.90838999999998, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE (FT)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86221, "t": 507.23633, "r": 153.68156, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62914, "t": 507.23633, "r": 186.94669, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89297, "t": 507.23633, "r": 231.21053000000003, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": 
false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.6936, "t": 507.23633, "r": 282.10947, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "91.02", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 519.1913099999999, "r": 117.38329000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766, "t": 519.1913099999999, "r": 153.68701, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 519.1913099999999, "r": 194.00566, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "97.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83495, "t": 519.1913099999999, "r": 238.26950000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189, "t": 519.0717500000001, "r": 279.62354, "b": 528.02814, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.8", 
"column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.612, "t": 536.49837, "r": 102.08514, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.91064, "t": 536.49837, "r": 150.64285, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56554, "t": 536.49837, "r": 194.00009, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89285, "t": 536.49837, "r": 231.21040000000002, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18411, "t": 536.49837, "r": 279.61865, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 548.45436, "r": 117.38329000000002, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": 
false}, {"bbox": {"l": 137.90625, "t": 548.45436, "r": 150.63846, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 548.45436, "r": 194.00566, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88845999999998, "t": 548.45436, "r": 231.20601, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189, "t": 548.3348100000001, "r": 279.62354, "b": 557.2911799999999, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 568.00237, "r": 117.38329000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766, "t": 568.00237, "r": 153.68701, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "STN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 
568.00237, "r": 194.00566, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "96.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83495, "t": 568.00237, "r": 238.26950000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1897, "t": 568.00237, "r": 279.62424, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.7", "column_header": false, "row_header": false, "row_section": false}]}, "11": {"label": "table", "id": 11, "page_no": 6, "cluster": {"id": 11, "label": "table", "bbox": {"l": 308.4067077636719, "t": 247.87644958496094, "r": 533.64208984375, "b": 303.8056640625, "coord_origin": "TOPLEFT"}, "confidence": 0.9691707491874695, "cells": [{"id": 146, "text": "Model", "bbox": {"l": 339.323, "t": 253.66436999999996, "r": 365.33536, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "Dataset", "bbox": {"l": 401.04132, "t": 253.66436999999996, "r": 430.91916, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "mAP", "bbox": {"l": 454.10214, "t": 253.66436999999996, "r": 474.58523999999994, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "mAP (PP)", "bbox": {"l": 486.54034, "t": 253.66436999999996, "r": 527.2276, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "EDD+BBox", "bbox": {"l": 327.65601, "t": 270.62134000000003, "r": 377.00076, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "PubTabNet", "bbox": {"l": 393.69809, "t": 
270.62134000000003, "r": 438.28073, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "79.2", "bbox": {"l": 455.63559, "t": 270.62134000000003, "r": 473.07013, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "82.7", "bbox": {"l": 498.16592, "t": 270.62134000000003, "r": 515.60046, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 282.57631999999995, "r": 377.86331, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "PubTabNet", "bbox": {"l": 393.69388, "t": 282.57631999999995, "r": 438.27652, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "82.1", "bbox": {"l": 455.63101, "t": 282.45676, "r": 473.06555000000003, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "86.8", "bbox": {"l": 498.1713, "t": 282.45676, "r": 515.60583, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 294.53131, "r": 377.86331, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "SynthTabNet", "bbox": {"l": 389.81842, "t": 294.53131, "r": 442.15194999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "87.7", "bbox": {"l": 455.63135, "t": 294.53131, "r": 473.06589, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "-", "bbox": {"l": 505.22515999999996, "t": 294.53131, "r": 508.54268999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "ched", "nl", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 4, "num_cols": 4, "table_cells": [{"bbox": {"l": 339.323, "t": 253.66436999999996, "r": 365.33536, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", 
"column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 401.04132, "t": 253.66436999999996, "r": 430.91916, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 454.10214, "t": 253.66436999999996, "r": 474.58523999999994, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 486.54034, "t": 253.66436999999996, "r": 527.2276, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "mAP (PP)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 327.65601, "t": 270.62134000000003, "r": 377.00076, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD+BBox", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.69809, "t": 270.62134000000003, "r": 438.28073, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63559, "t": 270.62134000000003, "r": 473.07013, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, 
"end_col_offset_idx": 3, "text": "79.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.16592, "t": 270.62134000000003, "r": 515.60046, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "82.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.79501, "t": 282.57631999999995, "r": 377.86331, "b": 291.48288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.69388, "t": 282.57631999999995, "r": 438.27652, "b": 291.48288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63101, "t": 282.45676, "r": 473.06555000000003, "b": 291.41315, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "82.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1713, "t": 282.45676, "r": 515.60583, "b": 291.41315, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "86.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.79501, "t": 294.53131, "r": 377.86331, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 
1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 389.81842, "t": 294.53131, "r": 442.15194999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "SynthTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63135, "t": 294.53131, "r": 473.06589, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "87.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 505.22515999999996, "t": 294.53131, "r": 508.54268999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}]}, "14": {"label": "table", "id": 14, "page_no": 6, "cluster": {"id": 14, "label": "table", "bbox": {"l": 332.9688720703125, "t": 540.2835083007812, "r": 520.942138671875, "b": 643.84991, "coord_origin": "TOPLEFT"}, "confidence": 0.9775565266609192, "cells": [{"id": 183, "text": "Model", "bbox": {"l": 358.01099, "t": 552.23337, "r": 384.02335, "b": 561.1399200000001, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "TEDS", "bbox": {"l": 449.03400000000005, "t": 546.25537, "r": 473.94049000000007, "b": 555.16193, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "Simple", "bbox": {"l": 408.50598, "t": 558.21037, "r": 436.73999, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "Complex", "bbox": {"l": 448.6951, "t": 558.21037, "r": 485.07849, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "All", "bbox": {"l": 499.3848, "t": 558.21037, "r": 512.117, "b": 567.11693, 
"coord_origin": "TOPLEFT"}}, {"id": 188, "text": "Tabula", "bbox": {"l": 357.68201, "t": 575.16736, "r": 384.3519, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "78.0", "bbox": {"l": 413.90097, "t": 575.16736, "r": 431.33550999999994, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "57.8", "bbox": {"l": 458.16479000000004, "t": 575.16736, "r": 475.59933000000007, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "67.9", "bbox": {"l": 497.0289, "t": 575.16736, "r": 514.46344, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "Traprange", "bbox": {"l": 350.72299, "t": 587.12236, "r": 391.31064, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "60.8", "bbox": {"l": 413.90582, "t": 587.12236, "r": 431.34036, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "49.9", "bbox": {"l": 458.16965, "t": 587.12236, "r": 475.60419, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "55.4", "bbox": {"l": 497.03374999999994, "t": 587.12236, "r": 514.46832, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "Camelot", "bbox": {"l": 354.13599, "t": 599.07835, "r": 387.89923, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "80.0", "bbox": {"l": 413.90161, "t": 599.07835, "r": 431.33615, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "66.0", "bbox": {"l": 458.16544, "t": 599.07835, "r": 475.59998, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "73.0", "bbox": {"l": 497.02954000000005, "t": 599.07835, "r": 514.46411, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "Acrobat Pro", "bbox": {"l": 346.55899, "t": 611.03336, "r": 395.47534, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "68.9", "bbox": {"l": 413.90616, "t": 611.03336, "r": 431.34069999999997, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "61.8", "bbox": {"l": 
458.16998000000007, "t": 611.03336, "r": 475.60452, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": "65.3", "bbox": {"l": 497.03409, "t": 611.03336, "r": 514.46863, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "EDD", "bbox": {"l": 360.78101, "t": 622.9883600000001, "r": 381.25415, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "91.2", "bbox": {"l": 413.90158, "t": 622.9883600000001, "r": 431.33612, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "85.4", "bbox": {"l": 458.16541, "t": 622.9883600000001, "r": 475.59995000000004, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "88.3", "bbox": {"l": 497.0295100000001, "t": 622.9883600000001, "r": 514.46405, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "TableFormer", "bbox": {"l": 345.483, "t": 634.94336, "r": 396.5513, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "95.4", "bbox": {"l": 413.90616, "t": 634.94336, "r": 431.34069999999997, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "90.1", "bbox": {"l": 458.16998000000007, "t": 634.94336, "r": 475.60452, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "93.6", "bbox": {"l": 497.03400000000005, "t": 634.82381, "r": 514.46857, "b": 643.78018, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["fcel", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 4, "table_cells": [{"bbox": {"l": 358.01099, "t": 552.23337, "r": 384.02335, "b": 561.1399200000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": false, "row_header": false, 
"row_section": false}, {"bbox": {"l": 448.6951, "t": 546.25537, "r": 485.07849, "b": 567.11693, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 408.50598, "t": 558.21037, "r": 436.73999, "b": 567.11693, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 499.3848, "t": 558.21037, "r": 512.117, "b": 567.11693, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 357.68201, "t": 575.16736, "r": 384.3519, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Tabula", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90097, "t": 575.16736, "r": 431.33550999999994, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "78.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16479000000004, "t": 575.16736, "r": 475.59933000000007, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "57.8", "column_header": false, "row_header": false, "row_section": false}, 
{"bbox": {"l": 497.0289, "t": 575.16736, "r": 514.46344, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "67.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 350.72299, "t": 587.12236, "r": 391.31064, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Traprange", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90582, "t": 587.12236, "r": 431.34036, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "60.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16965, "t": 587.12236, "r": 475.60419, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "49.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03374999999994, "t": 587.12236, "r": 514.46832, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "55.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 354.13599, "t": 599.07835, "r": 387.89923, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Camelot", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90161, "t": 599.07835, "r": 431.33615, 
"b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "80.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16544, "t": 599.07835, "r": 475.59998, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "66.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.02954000000005, "t": 599.07835, "r": 514.46411, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "73.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 346.55899, "t": 611.03336, "r": 395.47534, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Acrobat Pro", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90616, "t": 611.03336, "r": 431.34069999999997, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "68.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998000000007, "t": 611.03336, "r": 475.60452, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "61.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03409, "t": 611.03336, "r": 514.46863, "b": 619.93991, "coord_origin": "TOPLEFT"}, 
"row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "65.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 360.78101, "t": 622.9883600000001, "r": 381.25415, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90158, "t": 622.9883600000001, "r": 431.33612, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "91.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16541, "t": 622.9883600000001, "r": 475.59995000000004, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "85.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0295100000001, "t": 622.9883600000001, "r": 514.46405, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 345.483, "t": 634.94336, "r": 396.5513, "b": 643.84991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90616, "t": 634.94336, "r": 431.34069999999997, "b": 643.84991, "coord_origin": "TOPLEFT"}, "row_span": 1, 
"col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "95.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998000000007, "t": 634.94336, "r": 475.60452, "b": 643.84991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "90.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03400000000005, "t": 634.82381, "r": 514.46857, "b": 643.78018, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "93.6", "column_header": false, "row_header": false, "row_section": false}]}}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "section_header", "id": 0, "page_no": 6, "cluster": {"id": 0, "label": "section_header", "bbox": {"l": 49.507503509521484, "t": 73.60204315185547, "r": 168.0131378173828, "b": 84.25342, "coord_origin": "TOPLEFT"}, "confidence": 0.9554283022880554, "cells": [{"id": 0, "text": "5.3.", "bbox": {"l": 50.112, "t": 74.40137000000016, "r": 63.704811, "b": 84.25342, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Datasets and Metrics", "bbox": {"l": 72.766685, "t": 74.40137000000016, "r": 167.89825, "b": 84.25342, "coord_origin": "TOPLEFT"}}]}, "text": "5.3. 
Datasets and Metrics"}, {"label": "text", "id": 1, "page_no": 6, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.59501647949219, "t": 92.33658599853516, "r": 286.36511, "b": 138.36517333984375, "coord_origin": "TOPLEFT"}, "confidence": 0.9862996935844421, "cells": [{"id": 2, "text": "The Tree-Edit-Distance-Based Similarity (TEDS) met-", "bbox": {"l": 62.067001, "t": 93.35039999999992, "r": 286.36499, "b": 102.25696000000016, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "ric was introduced in [37]. It represents the prediction, and", "bbox": {"l": 50.112, "t": 105.30542000000003, "r": 286.36511, "b": 114.21198000000015, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "ground-truth as a tree structure of HTML tags. This simi-", "bbox": {"l": 50.112, "t": 117.26044000000002, "r": 286.36505, "b": 126.16699000000006, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "larity is calculated as:", "bbox": {"l": 50.112, "t": 129.21642999999995, "r": 136.71687, "b": 138.12298999999996, "coord_origin": "TOPLEFT"}}]}, "text": "The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It represents the prediction, and ground-truth as a tree structure of HTML tags. 
This similarity is calculated as:"}, {"label": "formula", "id": 2, "page_no": 6, "cluster": {"id": 2, "label": "formula", "bbox": {"l": 85.5722427368164, "t": 149.28536987304688, "r": 286.3624, "b": 173.28463745117188, "coord_origin": "TOPLEFT"}, "confidence": 0.9500426650047302, "cells": [{"id": 6, "text": "TEDS (", "bbox": {"l": 86.218994, "t": 157.05798000000004, "r": 118.8784, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "T$_{a}$, T$_{b}$", "bbox": {"l": 118.87499, "t": 157.05798000000004, "r": 143.26962, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": ") = 1", "bbox": {"l": 143.76799, "t": 157.05798000000004, "r": 165.9019, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "\u2212", "bbox": {"l": 168.12099, "t": 156.50012000000004, "r": 175.8699, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "EditDist (", "bbox": {"l": 179.27899, "t": 150.31799, "r": 221.95677, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "T$_{a}$, T$_{b}$", "bbox": {"l": 221.95200000000003, "t": 150.31799, "r": 246.34663, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": ")", "bbox": {"l": 246.84499999999997, "t": 150.31799, "r": 250.71945, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "max (", "bbox": {"l": 182.21201, "t": 163.89197000000001, "r": 206.29161, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "|", "bbox": {"l": 206.289, "t": 163.33411, "r": 209.05661, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "T$_{a}$", "bbox": {"l": 209.056, "t": 163.89197000000001, "r": 219.19968, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "|", "bbox": {"l": 219.69700999999998, "t": 163.33411, "r": 222.46461000000002, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": ",", "bbox": {"l": 224.125, "t": 163.89197000000001, "r": 
226.89261, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "|", "bbox": {"l": 228.55299000000002, "t": 163.33411, "r": 231.3206, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "T$_{b}$", "bbox": {"l": 231.31999, "t": 163.89197000000001, "r": 240.64563, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "|", "bbox": {"l": 241.144, "t": 163.33411, "r": 243.91161, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": ")", "bbox": {"l": 243.911, "t": 163.89197000000001, "r": 247.78545, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "(3)", "bbox": {"l": 274.746, "t": 157.21740999999997, "r": 286.3624, "b": 166.12396, "coord_origin": "TOPLEFT"}}]}, "text": "TEDS ( T$_{a}$, T$_{b}$ ) = 1 \u2212 EditDist ( T$_{a}$, T$_{b}$ ) max ( | T$_{a}$ | , | T$_{b}$ | ) (3)"}, {"label": "text", "id": 3, "page_no": 6, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.815025329589844, "t": 180.52044677734375, "r": 286.4786376953125, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9735332727432251, "cells": [{"id": 23, "text": "where", "bbox": {"l": 62.067001, "t": 181.16241000000002, "r": 86.405632, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "T$_{a}$", "bbox": {"l": 88.581001, "t": 181.00298999999995, "r": 98.724663, "b": 189.84978999999998, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "and", "bbox": {"l": 101.399, "t": 181.16241000000002, "r": 115.785, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "T$_{b}$", "bbox": {"l": 117.961, "t": 181.00298999999995, "r": 127.28664, "b": 189.84978999999998, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "represent tables in tree structure HTML", "bbox": {"l": 129.95999, "t": 181.16241000000002, "r": 286.36285, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "format. 
EditDist denotes the tree-edit distance, and", "bbox": {"l": 50.111992, "t": 193.11743, "r": 252.78116000000003, "b": 202.02399000000003, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "|", "bbox": {"l": 255.18201, "t": 192.40015000000005, "r": 257.94962, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "T", "bbox": {"l": 257.94901, "t": 192.95800999999994, "r": 263.77115, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "|", "bbox": {"l": 265.155, "t": 192.40015000000005, "r": 267.92261, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "rep-", "bbox": {"l": 270.32199, "t": 193.11743, "r": 286.36179, "b": 202.02399000000003, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "resents the number of nodes in", "bbox": {"l": 50.111984, "t": 205.07245, "r": 172.13388, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "T", "bbox": {"l": 174.62399, "t": 204.91301999999996, "r": 180.44614, "b": 213.75982999999997, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": ".", "bbox": {"l": 181.82899, "t": 205.07245, "r": 184.31964, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}}]}, "text": "where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T ."}, {"label": "section_header", "id": 4, "page_no": 6, "cluster": {"id": 4, "label": "section_header", "bbox": {"l": 49.47447204589844, "t": 224.4459991455078, "r": 170.64169311523438, "b": 235.01736450195312, "coord_origin": "TOPLEFT"}, "confidence": 0.9588840007781982, "cells": [{"id": 36, "text": "5.4.", "bbox": {"l": 50.112, "t": 224.81946000000005, "r": 64.551605, "b": 234.67151, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Quantitative Analysis", "bbox": {"l": 74.178009, "t": 224.81946000000005, "r": 170.45169, "b": 234.67151, "coord_origin": "TOPLEFT"}}]}, "text": "5.4. 
Quantitative Analysis"}, {"label": "text", "id": 5, "page_no": 6, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.47037124633789, "t": 242.6270294189453, "r": 286.4912414550781, "b": 396.757568359375, "coord_origin": "TOPLEFT"}, "confidence": 0.9855114221572876, "cells": [{"id": 38, "text": "Structure.", "bbox": {"l": 62.067001, "t": 243.6499, "r": 105.32461, "b": 252.60626000000002, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "As shown in Tab.", "bbox": {"l": 112.12600000000002, "t": 243.76946999999996, "r": 184.68361, "b": 252.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "2, TableFormer outper-", "bbox": {"l": 191.4781, "t": 243.76946999999996, "r": 286.36188, "b": 252.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "forms all SOTA methods across different datasets by a large", "bbox": {"l": 50.112, "t": 255.72448999999995, "r": 286.36508, "b": 264.63104, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "margin for predicting the table structure from an image.", "bbox": {"l": 50.112, "t": 267.67949999999996, "r": 286.36508, "b": 276.58606, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "All the more, our model outperforms pre-trained methods.", "bbox": {"l": 50.112, "t": 279.63446, "r": 286.36508, "b": 288.54105, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "During the evaluation we do not apply any table filtering.", "bbox": {"l": 50.112, "t": 291.59048, "r": 286.36514, "b": 300.49704, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "We also provide our baseline results on the SynthTabNet", "bbox": {"l": 50.112, "t": 303.54547, "r": 286.36508, "b": 312.45203000000004, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "dataset. It has been observed that large tables (e.g. 
tables", "bbox": {"l": 50.112, "t": 315.50046, "r": 286.36505, "b": 324.40700999999996, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "that occupy half of the page or more) yield poor predictions.", "bbox": {"l": 50.112, "t": 327.45544, "r": 286.36508, "b": 336.362, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "We attribute this issue to the image resizing during the pre-", "bbox": {"l": 50.112, "t": 339.41043, "r": 286.36508, "b": 348.31699000000003, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "processing step, that produces downsampled images with", "bbox": {"l": 50.112, "t": 351.36542, "r": 286.36505, "b": 360.27197, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "indistinguishable features. This problem can be addressed", "bbox": {"l": 50.112, "t": 363.32141, "r": 286.36508, "b": 372.2279700000001, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "by treating such big tables with a separate model which ac-", "bbox": {"l": 50.112, "t": 375.2764, "r": 286.36511, "b": 384.18295000000006, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "cepts a large input image size.", "bbox": {"l": 50.112, "t": 387.23138, "r": 170.01187, "b": 396.13794, "coord_origin": "TOPLEFT"}}]}, "text": "Structure. As shown in Tab. 2, TableFormer outperforms all SOTA methods across different datasets by a large margin for predicting the table structure from an image. All the more, our model outperforms pre-trained methods. During the evaluation we do not apply any table filtering. We also provide our baseline results on the SynthTabNet dataset. It has been observed that large tables (e.g. tables that occupy half of the page or more) yield poor predictions. We attribute this issue to the image resizing during the preprocessing step, that produces downsampled images with indistinguishable features. 
This problem can be addressed by treating such big tables with a separate model which accepts a large input image size."}, {"label": "table", "id": 6, "page_no": 6, "cluster": {"id": 6, "label": "table", "bbox": {"l": 53.36846160888672, "t": 409.1356506347656, "r": 283.0443420410156, "b": 582.397705078125, "coord_origin": "TOPLEFT"}, "confidence": 0.989250659942627, "cells": [{"id": 53, "text": "Model", "bbox": {"l": 78.843002, "t": 420.69037, "r": 104.85535, "b": 429.59692, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "TEDS", "bbox": {"l": 211.2, "t": 414.71237, "r": 236.10649, "b": 423.61893, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "Dataset", "bbox": {"l": 129.338, "t": 426.66736, "r": 159.21584, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "Simple", "bbox": {"l": 171.17096, "t": 426.66736, "r": 199.40497, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Complex", "bbox": {"l": 211.36009, "t": 426.66736, "r": 247.74349999999998, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "All", "bbox": {"l": 264.54044, "t": 426.66736, "r": 277.27264, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "EDD", "bbox": {"l": 81.612, "t": 443.62436, "r": 102.08514, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "PTN", "bbox": {"l": 134.87206, "t": 443.62436, "r": 153.69141, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "91.1", "bbox": {"l": 176.56554, "t": 443.62436, "r": 194.00009, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "88.7", "bbox": {"l": 220.82938000000001, "t": 443.62436, "r": 238.26393, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "89.9", "bbox": {"l": 262.18414, "t": 443.62436, "r": 279.61868, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "GTE", "bbox": {"l": 82.165001, "t": 455.58035, "r": 101.5323, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": 
"PTN", "bbox": {"l": 134.86716, "t": 455.58035, "r": 153.68651, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "-", "bbox": {"l": 183.62411, "t": 455.58035, "r": 186.94167, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "-", "bbox": {"l": 227.88795000000002, "t": 455.58035, "r": 231.20551, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "93.01", "bbox": {"l": 259.69855, "t": 455.58035, "r": 282.11441, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 468.13336, "r": 117.38329000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PTN", "bbox": {"l": 134.86766, "t": 468.13336, "r": 153.68701, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "98.5", "bbox": {"l": 176.57111, "t": 468.13336, "r": 194.00566, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "95.0", "bbox": {"l": 220.83495, "t": 468.13336, "r": 238.26950000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "96.75", "bbox": {"l": 259.698, "t": 468.01379, "r": 282.11386, "b": 476.97018, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "EDD", "bbox": {"l": 81.612, "t": 483.32635, "r": 102.08514, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "FTN", "bbox": {"l": 134.87206, "t": 483.32635, "r": 153.69141, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "88.4", "bbox": {"l": 176.56554, "t": 483.32635, "r": 194.00009, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "92.08", "bbox": {"l": 218.33870999999996, "t": 483.32635, "r": 240.75455999999997, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "90.6", "bbox": {"l": 262.18411, "t": 483.32635, "r": 279.61865, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "GTE", "bbox": {"l": 82.165001, "t": 495.28134, "r": 101.5323, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "FTN", "bbox": {"l": 
134.86716, "t": 495.28134, "r": 153.68651, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "-", "bbox": {"l": 183.62411, "t": 495.28134, "r": 186.94167, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "-", "bbox": {"l": 227.88795000000002, "t": 495.28134, "r": 231.20551, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "87.14", "bbox": {"l": 259.69855, "t": 495.28134, "r": 282.11441, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "GTE (FT)", "bbox": {"l": 71.789001, "t": 507.23633, "r": 111.90838999999998, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "FTN", "bbox": {"l": 134.86221, "t": 507.23633, "r": 153.68156, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "-", "bbox": {"l": 183.62914, "t": 507.23633, "r": 186.94669, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "-", "bbox": {"l": 227.89297, "t": 507.23633, "r": 231.21053000000003, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "91.02", "bbox": {"l": 259.6936, "t": 507.23633, "r": 282.10947, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 519.1913099999999, "r": 117.38329000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "FTN", "bbox": {"l": 134.86766, "t": 519.1913099999999, "r": 153.68701, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "97.5", "bbox": {"l": 176.57111, "t": 519.1913099999999, "r": 194.00566, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "96.0", "bbox": {"l": 220.83495, "t": 519.1913099999999, "r": 238.26950000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "96.8", "bbox": {"l": 262.189, "t": 519.0717500000001, "r": 279.62354, "b": 528.02814, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "EDD", "bbox": {"l": 81.612, "t": 536.49837, "r": 102.08514, "b": 545.40492, "coord_origin": "TOPLEFT"}}, 
{"id": 95, "text": "TB", "bbox": {"l": 137.91064, "t": 536.49837, "r": 150.64285, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "86.0", "bbox": {"l": 176.56554, "t": 536.49837, "r": 194.00009, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "-", "bbox": {"l": 227.89285, "t": 536.49837, "r": 231.21040000000002, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "86.0", "bbox": {"l": 262.18411, "t": 536.49837, "r": 279.61865, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 548.45436, "r": 117.38329000000002, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "TB", "bbox": {"l": 137.90625, "t": 548.45436, "r": 150.63846, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "89.6", "bbox": {"l": 176.57111, "t": 548.45436, "r": 194.00566, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "-", "bbox": {"l": 227.88845999999998, "t": 548.45436, "r": 231.20601, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "89.6", "bbox": {"l": 262.189, "t": 548.3348100000001, "r": 279.62354, "b": 557.2911799999999, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 568.00237, "r": 117.38329000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "STN", "bbox": {"l": 134.86766, "t": 568.00237, "r": 153.68701, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "96.9", "bbox": {"l": 176.57111, "t": 568.00237, "r": 194.00566, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "95.7", "bbox": {"l": 220.83495, "t": 568.00237, "r": 238.26950000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "96.7", "bbox": {"l": 262.1897, "t": 568.00237, "r": 279.62424, "b": 576.90892, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", 
"rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 11, "num_cols": 5, "table_cells": [{"bbox": {"l": 78.843002, "t": 420.69037, "r": 104.85535, "b": 429.59692, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 211.2, "t": 414.71237, "r": 247.74349999999998, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 129.338, "t": 426.66736, "r": 159.21584, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 171.17096, "t": 426.66736, "r": 199.40497, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 264.54044, "t": 426.66736, "r": 277.27264, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 
4, "end_col_offset_idx": 5, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 81.612, "t": 443.62436, "r": 102.08514, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87206, "t": 443.62436, "r": 153.69141, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56554, "t": 443.62436, "r": 194.00009, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "91.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.82938000000001, "t": 443.62436, "r": 238.26393, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18414, "t": 443.62436, "r": 279.61868, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.165001, "t": 455.58035, "r": 101.5323, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, 
"row_header": true, "row_section": false}, {"bbox": {"l": 134.86716, "t": 455.58035, "r": 153.68651, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411, "t": 455.58035, "r": 186.94167, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795000000002, "t": 455.58035, "r": 231.20551, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69855, "t": 455.58035, "r": 282.11441, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "93.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 468.13336, "r": 117.38329000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766, "t": 468.13336, "r": 153.68701, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 
176.57111, "t": 468.13336, "r": 194.00566, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "98.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83495, "t": 468.13336, "r": 238.26950000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.698, "t": 468.01379, "r": 282.11386, "b": 476.97018, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.75", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.612, "t": 483.32635, "r": 102.08514, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87206, "t": 483.32635, "r": 153.69141, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56554, "t": 483.32635, "r": 194.00009, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "88.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 218.33870999999996, "t": 483.32635, "r": 240.75455999999997, "b": 492.23291, 
"coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "92.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18411, "t": 483.32635, "r": 279.61865, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "90.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.165001, "t": 495.28134, "r": 101.5323, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86716, "t": 495.28134, "r": 153.68651, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411, "t": 495.28134, "r": 186.94167, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795000000002, "t": 495.28134, "r": 231.20551, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69855, "t": 495.28134, "r": 282.11441, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, 
"end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "87.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 71.789001, "t": 507.23633, "r": 111.90838999999998, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE (FT)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86221, "t": 507.23633, "r": 153.68156, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62914, "t": 507.23633, "r": 186.94669, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89297, "t": 507.23633, "r": 231.21053000000003, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.6936, "t": 507.23633, "r": 282.10947, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "91.02", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 519.1913099999999, "r": 117.38329000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, 
"start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766, "t": 519.1913099999999, "r": 153.68701, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 519.1913099999999, "r": 194.00566, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "97.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83495, "t": 519.1913099999999, "r": 238.26950000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189, "t": 519.0717500000001, "r": 279.62354, "b": 528.02814, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.612, "t": 536.49837, "r": 102.08514, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.91064, "t": 536.49837, "r": 150.64285, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, 
"start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56554, "t": 536.49837, "r": 194.00009, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89285, "t": 536.49837, "r": 231.21040000000002, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18411, "t": 536.49837, "r": 279.61865, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 548.45436, "r": 117.38329000000002, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.90625, "t": 548.45436, "r": 150.63846, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 548.45436, "r": 194.00566, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": 
"89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88845999999998, "t": 548.45436, "r": 231.20601, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189, "t": 548.3348100000001, "r": 279.62354, "b": 557.2911799999999, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 568.00237, "r": 117.38329000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766, "t": 568.00237, "r": 153.68701, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "STN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 568.00237, "r": 194.00566, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "96.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83495, "t": 568.00237, "r": 238.26950000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.7", "column_header": 
false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1897, "t": 568.00237, "r": 279.62424, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.7", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "text", "id": 7, "page_no": 6, "cluster": {"id": 7, "label": "text", "bbox": {"l": 49.4423828125, "t": 591.6051635742188, "r": 286.63427734375, "b": 613.4329223632812, "coord_origin": "TOPLEFT"}, "confidence": 0.7209144830703735, "cells": [{"id": 109, "text": "Table 2: Structure results on PubTabNet (PTN), FinTabNet", "bbox": {"l": 50.112, "t": 592.43336, "r": 286.36511, "b": 601.33992, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "(FTN), TableBank (TB) and SynthTabNet (STN).", "bbox": {"l": 50.112, "t": 604.38837, "r": 247.46114, "b": 613.29492, "coord_origin": "TOPLEFT"}}]}, "text": "Table 2: Structure results on PubTabNet (PTN), FinTabNet (FTN), TableBank (TB) and SynthTabNet (STN)."}, {"label": "text", "id": 8, "page_no": 6, "cluster": {"id": 8, "label": "text", "bbox": {"l": 49.50996780395508, "t": 615.5875244140625, "r": 261.78732, "b": 625.24992, "coord_origin": "TOPLEFT"}, "confidence": 0.6433371901512146, "cells": [{"id": 111, "text": "FT: Model was trained on PubTabNet then finetuned.", "bbox": {"l": 50.112, "t": 616.34337, "r": 261.78732, "b": 625.24992, "coord_origin": "TOPLEFT"}}]}, "text": "FT: Model was trained on PubTabNet then finetuned."}, {"label": "text", "id": 9, "page_no": 6, "cluster": {"id": 9, "label": "text", "bbox": {"l": 49.4313850402832, "t": 643.5670166015625, "r": 286.515869140625, "b": 713.6913452148438, "coord_origin": "TOPLEFT"}, "confidence": 0.9854632616043091, "cells": [{"id": 112, "text": "Cell Detection.", "bbox": {"l": 62.067001, "t": 644.3498099999999, "r": 124.72179, "b": 653.30618, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "Like any 
object detector, our", "bbox": {"l": 128.20401, "t": 644.46936, "r": 242.9333, "b": 653.37592, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Cell BBox", "bbox": {"l": 245.55401999999998, "t": 644.55902, "r": 286.36084, "b": 653.1467700000001, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "Detector", "bbox": {"l": 50.112015, "t": 656.51402, "r": 84.971146, "b": 665.10178, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "provides bounding boxes that can be improved", "bbox": {"l": 89.515015, "t": 656.42436, "r": 286.366, "b": 665.33092, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "with post-processing during inference. We make use of the", "bbox": {"l": 50.112015, "t": 668.37936, "r": 286.36511, "b": 677.28593, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "grid-like structure of tables to refine the predictions. A de-", "bbox": {"l": 50.112015, "t": 680.33536, "r": 286.36505, "b": 689.24193, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "tailed explanation on the post-processing is available in the", "bbox": {"l": 50.112015, "t": 692.290359, "r": 286.36511, "b": 701.19693, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "supplementary material. As shown in Tab. 3, we evaluate", "bbox": {"l": 50.112015, "t": 704.245361, "r": 286.36508, "b": 713.151932, "coord_origin": "TOPLEFT"}}]}, "text": "Cell Detection. Like any object detector, our Cell BBox Detector provides bounding boxes that can be improved with post-processing during inference. We make use of the grid-like structure of tables to refine the predictions. A detailed explanation on the post-processing is available in the supplementary material. As shown in Tab. 
3, we evaluate"}, {"label": "text", "id": 10, "page_no": 6, "cluster": {"id": 10, "label": "text", "bbox": {"l": 307.97955322265625, "t": 74.48530578613281, "r": 545.258544921875, "b": 227.81777954101562, "coord_origin": "TOPLEFT"}, "confidence": 0.9713152647018433, "cells": [{"id": 121, "text": "our", "bbox": {"l": 308.862, "t": 75.20836999999995, "r": 322.14215, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Cell BBox Decoder", "bbox": {"l": 325.45401, "t": 75.29803000000004, "r": 404.56702, "b": 83.88580000000002, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "accuracy for cells with a class la-", "bbox": {"l": 408.104, "t": 75.20836999999995, "r": 545.10968, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "bel of \u2018content\u2019 only using the PASCAL VOC mAP metric", "bbox": {"l": 308.862, "t": 87.16339000000005, "r": 545.11511, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "for pre-processing and post-processing.", "bbox": {"l": 308.862, "t": 99.11841000000004, "r": 470.22626, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Note that we do", "bbox": {"l": 477.52884, "t": 99.11841000000004, "r": 545.11511, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "not have post-processing results for SynthTabNet as images", "bbox": {"l": 308.862, "t": 111.07343000000003, "r": 545.11517, "b": 119.97997999999984, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "are only provided. 
To compare the performance of our pro-", "bbox": {"l": 308.862, "t": 123.02844000000005, "r": 545.11511, "b": 131.93499999999995, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "posed approach, we\u2019ve integrated TableFormer\u2019s", "bbox": {"l": 308.862, "t": 134.98443999999995, "r": 502.01691000000005, "b": 143.89099, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "Cell BBox", "bbox": {"l": 504.47299, "t": 135.07410000000004, "r": 545.11041, "b": 143.66187000000002, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "Decoder", "bbox": {"l": 308.862, "t": 147.02910999999995, "r": 343.16324, "b": 155.61688000000004, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "into EDD architecture. As mentioned previously,", "bbox": {"l": 346.371, "t": 146.93944999999997, "r": 545.11493, "b": 155.84600999999998, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "the Structure Decoder provides the", "bbox": {"l": 308.862, "t": 158.89446999999996, "r": 446.15652, "b": 167.80102999999997, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "Cell BBox Decoder", "bbox": {"l": 448.28998000000007, "t": 158.98413000000005, "r": 525.04181, "b": 167.57190000000003, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "with", "bbox": {"l": 527.39899, "t": 158.89446999999996, "r": 545.11249, "b": 167.80102999999997, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "the features needed to predict the bounding box predictions.", "bbox": {"l": 308.862, "t": 170.84948999999995, "r": 545.11511, "b": 179.75603999999998, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "Therefore, the accuracy of the", "bbox": {"l": 308.862, "t": 182.80449999999996, "r": 432.86642000000006, "b": 191.71105999999997, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "Structure Decoder", "bbox": {"l": 436.39001, "t": 182.89417000000003, "r": 510.93021, "b": 191.48193000000003, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "directly", "bbox": {"l": 514.677, "t": 182.80449999999996, "r": 545.11273, "b": 
191.71105999999997, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "influences the accuracy of the", "bbox": {"l": 308.862, "t": 194.75951999999995, "r": 431.17285, "b": 203.66607999999997, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "Cell BBox Decoder", "bbox": {"l": 434.6790199999999, "t": 194.84918000000005, "r": 514.18054, "b": 203.43695000000002, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": ". If the", "bbox": {"l": 514.17603, "t": 194.75951999999995, "r": 545.10992, "b": 203.66607999999997, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "Structure Decoder", "bbox": {"l": 308.86203, "t": 206.80517999999995, "r": 382.35614, "b": 215.39293999999995, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "predicts an extra column, this will result", "bbox": {"l": 385.07501, "t": 206.71551999999997, "r": 545.11426, "b": 215.62207, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "in an extra column of predicted bounding boxes.", "bbox": {"l": 308.862, "t": 218.67052999999999, "r": 501.6981799999999, "b": 227.57709, "coord_origin": "TOPLEFT"}}]}, "text": "our Cell BBox Decoder accuracy for cells with a class label of \u2018content\u2019 only using the PASCAL VOC mAP metric for pre-processing and post-processing. Note that we do not have post-processing results for SynthTabNet as images are only provided. To compare the performance of our proposed approach, we\u2019ve integrated TableFormer\u2019s Cell BBox Decoder into EDD architecture. As mentioned previously, the Structure Decoder provides the Cell BBox Decoder with the features needed to predict the bounding box predictions. Therefore, the accuracy of the Structure Decoder directly influences the accuracy of the Cell BBox Decoder . 
If the Structure Decoder predicts an extra column, this will result in an extra column of predicted bounding boxes."}, {"label": "table", "id": 11, "page_no": 6, "cluster": {"id": 11, "label": "table", "bbox": {"l": 308.4067077636719, "t": 247.87644958496094, "r": 533.64208984375, "b": 303.8056640625, "coord_origin": "TOPLEFT"}, "confidence": 0.9691707491874695, "cells": [{"id": 146, "text": "Model", "bbox": {"l": 339.323, "t": 253.66436999999996, "r": 365.33536, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "Dataset", "bbox": {"l": 401.04132, "t": 253.66436999999996, "r": 430.91916, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "mAP", "bbox": {"l": 454.10214, "t": 253.66436999999996, "r": 474.58523999999994, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "mAP (PP)", "bbox": {"l": 486.54034, "t": 253.66436999999996, "r": 527.2276, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "EDD+BBox", "bbox": {"l": 327.65601, "t": 270.62134000000003, "r": 377.00076, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "PubTabNet", "bbox": {"l": 393.69809, "t": 270.62134000000003, "r": 438.28073, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "79.2", "bbox": {"l": 455.63559, "t": 270.62134000000003, "r": 473.07013, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "82.7", "bbox": {"l": 498.16592, "t": 270.62134000000003, "r": 515.60046, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 282.57631999999995, "r": 377.86331, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "PubTabNet", "bbox": {"l": 393.69388, "t": 282.57631999999995, "r": 438.27652, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "82.1", "bbox": {"l": 455.63101, "t": 282.45676, "r": 473.06555000000003, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": 
"86.8", "bbox": {"l": 498.1713, "t": 282.45676, "r": 515.60583, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 294.53131, "r": 377.86331, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "SynthTabNet", "bbox": {"l": 389.81842, "t": 294.53131, "r": 442.15194999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "87.7", "bbox": {"l": 455.63135, "t": 294.53131, "r": 473.06589, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "-", "bbox": {"l": 505.22515999999996, "t": 294.53131, "r": 508.54268999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "ched", "nl", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 4, "num_cols": 4, "table_cells": [{"bbox": {"l": 339.323, "t": 253.66436999999996, "r": 365.33536, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 401.04132, "t": 253.66436999999996, "r": 430.91916, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 454.10214, "t": 253.66436999999996, "r": 474.58523999999994, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 486.54034, "t": 253.66436999999996, "r": 527.2276, "b": 262.57092, "coord_origin": "TOPLEFT"}, 
"row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "mAP (PP)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 327.65601, "t": 270.62134000000003, "r": 377.00076, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD+BBox", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.69809, "t": 270.62134000000003, "r": 438.28073, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63559, "t": 270.62134000000003, "r": 473.07013, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "79.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.16592, "t": 270.62134000000003, "r": 515.60046, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "82.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.79501, "t": 282.57631999999995, "r": 377.86331, "b": 291.48288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.69388, "t": 282.57631999999995, "r": 438.27652, "b": 
291.48288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63101, "t": 282.45676, "r": 473.06555000000003, "b": 291.41315, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "82.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1713, "t": 282.45676, "r": 515.60583, "b": 291.41315, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "86.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.79501, "t": 294.53131, "r": 377.86331, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 389.81842, "t": 294.53131, "r": 442.15194999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "SynthTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63135, "t": 294.53131, "r": 473.06589, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "87.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 505.22515999999996, "t": 294.53131, "r": 508.54268999999994, "b": 303.43787, 
"coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "caption", "id": 12, "page_no": 6, "cluster": {"id": 12, "label": "caption", "bbox": {"l": 308.0755615234375, "t": 315.7750549316406, "r": 545.11517, "b": 337.8129577636719, "coord_origin": "TOPLEFT"}, "confidence": 0.9519906044006348, "cells": [{"id": 162, "text": "Table 3:", "bbox": {"l": 308.862, "t": 316.44931, "r": 341.49951, "b": 325.35587, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "Cell Bounding Box detection results on PubTab-", "bbox": {"l": 348.60284, "t": 316.44931, "r": 545.11517, "b": 325.35587, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "Net, and FinTabNet. PP: Post-processing.", "bbox": {"l": 308.862, "t": 328.4043, "r": 474.97845, "b": 337.3108500000001, "coord_origin": "TOPLEFT"}}]}, "text": "Table 3: Cell Bounding Box detection results on PubTabNet, and FinTabNet. 
PP: Post-processing."}, {"label": "text", "id": 13, "page_no": 6, "cluster": {"id": 13, "label": "text", "bbox": {"l": 307.8382873535156, "t": 367.0015563964844, "r": 545.3449096679688, "b": 520.16769, "coord_origin": "TOPLEFT"}, "confidence": 0.9835002422332764, "cells": [{"id": 165, "text": "Cell Content.", "bbox": {"l": 320.81699, "t": 367.6797199999999, "r": 378.94876, "b": 376.63611, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "In this section, we evaluate the entire", "bbox": {"l": 387.07898, "t": 367.79929, "r": 545.11566, "b": 376.70584, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "pipeline of recovering a table with content.", "bbox": {"l": 308.86197, "t": 379.75426999999996, "r": 487.19257, "b": 388.66083, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "Here we put", "bbox": {"l": 493.96713, "t": 379.75426999999996, "r": 545.11511, "b": 388.66083, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "our approach to test by capitalizing on extracting content", "bbox": {"l": 308.86197, "t": 391.70926, "r": 545.11505, "b": 400.61581, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "from the PDF cells rather than decoding from images. Tab.", "bbox": {"l": 308.86197, "t": 403.66525, "r": 545.11523, "b": 412.57181, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "4", "bbox": {"l": 308.86197, "t": 415.62024, "r": 314.08096, "b": 424.52679, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "shows the TEDs score of HTML code representing the", "bbox": {"l": 316.69046, "t": 415.62024, "r": 545.11517, "b": 424.52679, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "structure of the table along with the content inserted in the", "bbox": {"l": 308.86197, "t": 427.57523, "r": 545.11505, "b": 436.48177999999996, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "data cell and compared with the ground-truth. 
Our method", "bbox": {"l": 308.86197, "t": 439.53021, "r": 545.11505, "b": 448.43677, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "achieved a", "bbox": {"l": 308.86197, "t": 451.4852, "r": 350.23666, "b": 460.39175, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "5.3%", "bbox": {"l": 352.17596, "t": 451.36563, "r": 374.59183, "b": 460.32201999999995, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "increase over the state-of-the-art, and com-", "bbox": {"l": 376.53296, "t": 451.4852, "r": 545.11011, "b": 460.39175, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "mercial solutions. We believe our scores would be higher", "bbox": {"l": 308.86197, "t": 463.44019, "r": 545.11511, "b": 472.34674, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "if the HTML ground-truth matched the extracted PDF cell", "bbox": {"l": 308.86197, "t": 475.39618, "r": 545.11517, "b": 484.30273, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "content. Unfortunately, there are small discrepancies such", "bbox": {"l": 308.86197, "t": 487.35117, "r": 545.11511, "b": 496.25772, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "as spacings around words or special characters with various", "bbox": {"l": 308.86197, "t": 499.30615, "r": 545.11505, "b": 508.21271, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "unicode representations.", "bbox": {"l": 308.86197, "t": 511.26114, "r": 405.69846, "b": 520.16769, "coord_origin": "TOPLEFT"}}]}, "text": "Cell Content. In this section, we evaluate the entire pipeline of recovering a table with content. Here we put our approach to test by capitalizing on extracting content from the PDF cells rather than decoding from images. Tab. 4 shows the TEDs score of HTML code representing the structure of the table along with the content inserted in the data cell and compared with the ground-truth. Our method achieved a 5.3% increase over the state-of-the-art, and commercial solutions. 
We believe our scores would be higher if the HTML ground-truth matched the extracted PDF cell content. Unfortunately, there are small discrepancies such as spacings around words or special characters with various unicode representations."}, {"label": "table", "id": 14, "page_no": 6, "cluster": {"id": 14, "label": "table", "bbox": {"l": 332.9688720703125, "t": 540.2835083007812, "r": 520.942138671875, "b": 643.84991, "coord_origin": "TOPLEFT"}, "confidence": 0.9775565266609192, "cells": [{"id": 183, "text": "Model", "bbox": {"l": 358.01099, "t": 552.23337, "r": 384.02335, "b": 561.1399200000001, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "TEDS", "bbox": {"l": 449.03400000000005, "t": 546.25537, "r": 473.94049000000007, "b": 555.16193, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "Simple", "bbox": {"l": 408.50598, "t": 558.21037, "r": 436.73999, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "Complex", "bbox": {"l": 448.6951, "t": 558.21037, "r": 485.07849, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "All", "bbox": {"l": 499.3848, "t": 558.21037, "r": 512.117, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "Tabula", "bbox": {"l": 357.68201, "t": 575.16736, "r": 384.3519, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "78.0", "bbox": {"l": 413.90097, "t": 575.16736, "r": 431.33550999999994, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "57.8", "bbox": {"l": 458.16479000000004, "t": 575.16736, "r": 475.59933000000007, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "67.9", "bbox": {"l": 497.0289, "t": 575.16736, "r": 514.46344, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "Traprange", "bbox": {"l": 350.72299, "t": 587.12236, "r": 391.31064, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "60.8", "bbox": {"l": 413.90582, "t": 587.12236, "r": 431.34036, "b": 596.02892, "coord_origin": 
"TOPLEFT"}}, {"id": 194, "text": "49.9", "bbox": {"l": 458.16965, "t": 587.12236, "r": 475.60419, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "55.4", "bbox": {"l": 497.03374999999994, "t": 587.12236, "r": 514.46832, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "Camelot", "bbox": {"l": 354.13599, "t": 599.07835, "r": 387.89923, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "80.0", "bbox": {"l": 413.90161, "t": 599.07835, "r": 431.33615, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "66.0", "bbox": {"l": 458.16544, "t": 599.07835, "r": 475.59998, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "73.0", "bbox": {"l": 497.02954000000005, "t": 599.07835, "r": 514.46411, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "Acrobat Pro", "bbox": {"l": 346.55899, "t": 611.03336, "r": 395.47534, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "68.9", "bbox": {"l": 413.90616, "t": 611.03336, "r": 431.34069999999997, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "61.8", "bbox": {"l": 458.16998000000007, "t": 611.03336, "r": 475.60452, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": "65.3", "bbox": {"l": 497.03409, "t": 611.03336, "r": 514.46863, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "EDD", "bbox": {"l": 360.78101, "t": 622.9883600000001, "r": 381.25415, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "91.2", "bbox": {"l": 413.90158, "t": 622.9883600000001, "r": 431.33612, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "85.4", "bbox": {"l": 458.16541, "t": 622.9883600000001, "r": 475.59995000000004, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "88.3", "bbox": {"l": 497.0295100000001, "t": 622.9883600000001, "r": 514.46405, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "TableFormer", "bbox": {"l": 345.483, "t": 
634.94336, "r": 396.5513, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "95.4", "bbox": {"l": 413.90616, "t": 634.94336, "r": 431.34069999999997, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "90.1", "bbox": {"l": 458.16998000000007, "t": 634.94336, "r": 475.60452, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "93.6", "bbox": {"l": 497.03400000000005, "t": 634.82381, "r": 514.46857, "b": 643.78018, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["fcel", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 4, "table_cells": [{"bbox": {"l": 358.01099, "t": 552.23337, "r": 384.02335, "b": 561.1399200000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 448.6951, "t": 546.25537, "r": 485.07849, "b": 567.11693, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 408.50598, "t": 558.21037, "r": 436.73999, "b": 567.11693, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 499.3848, "t": 558.21037, "r": 512.117, "b": 567.11693, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, 
"start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 357.68201, "t": 575.16736, "r": 384.3519, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Tabula", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90097, "t": 575.16736, "r": 431.33550999999994, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "78.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16479000000004, "t": 575.16736, "r": 475.59933000000007, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "57.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0289, "t": 575.16736, "r": 514.46344, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "67.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 350.72299, "t": 587.12236, "r": 391.31064, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Traprange", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90582, "t": 587.12236, "r": 431.34036, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, 
"start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "60.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16965, "t": 587.12236, "r": 475.60419, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "49.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03374999999994, "t": 587.12236, "r": 514.46832, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "55.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 354.13599, "t": 599.07835, "r": 387.89923, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Camelot", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90161, "t": 599.07835, "r": 431.33615, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "80.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16544, "t": 599.07835, "r": 475.59998, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "66.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.02954000000005, "t": 599.07835, "r": 514.46411, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": 
"73.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 346.55899, "t": 611.03336, "r": 395.47534, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Acrobat Pro", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90616, "t": 611.03336, "r": 431.34069999999997, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "68.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998000000007, "t": 611.03336, "r": 475.60452, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "61.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03409, "t": 611.03336, "r": 514.46863, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "65.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 360.78101, "t": 622.9883600000001, "r": 381.25415, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90158, "t": 622.9883600000001, "r": 431.33612, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "91.2", "column_header": false, 
"row_header": false, "row_section": false}, {"bbox": {"l": 458.16541, "t": 622.9883600000001, "r": 475.59995000000004, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "85.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0295100000001, "t": 622.9883600000001, "r": 514.46405, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 345.483, "t": 634.94336, "r": 396.5513, "b": 643.84991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90616, "t": 634.94336, "r": 431.34069999999997, "b": 643.84991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "95.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998000000007, "t": 634.94336, "r": 475.60452, "b": 643.84991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "90.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03400000000005, "t": 634.82381, "r": 514.46857, "b": 643.78018, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "93.6", "column_header": false, "row_header": 
false, "row_section": false}]}, {"label": "caption", "id": 15, "page_no": 6, "cluster": {"id": 15, "label": "caption", "bbox": {"l": 307.9747619628906, "t": 655.8218994140625, "r": 545.1710815429688, "b": 689.8007202148438, "coord_origin": "TOPLEFT"}, "confidence": 0.954140305519104, "cells": [{"id": 212, "text": "Table 4:", "bbox": {"l": 308.862, "t": 656.86136, "r": 341.73862, "b": 665.76792, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "Results of structure with content retrieved using", "bbox": {"l": 349.55927, "t": 656.86136, "r": 545.11517, "b": 665.76792, "coord_origin": "TOPLEFT"}}, {"id": 214, "text": "cell detection on PubTabNet. In all cases the input is PDF", "bbox": {"l": 308.862, "t": 668.81636, "r": 545.11505, "b": 677.7229199999999, "coord_origin": "TOPLEFT"}}, {"id": 215, "text": "documents with cropped tables.", "bbox": {"l": 308.862, "t": 680.77136, "r": 435.03836, "b": 689.6779300000001, "coord_origin": "TOPLEFT"}}]}, "text": "Table 4: Results of structure with content retrieved using cell detection on PubTabNet. 
In all cases the input is PDF documents with cropped tables."}, {"label": "page_footer", "id": 16, "page_no": 6, "cluster": {"id": 16, "label": "page_footer", "bbox": {"l": 294.5538330078125, "t": 733.197021484375, "r": 300.1892395019531, "b": 743.039921, "coord_origin": "TOPLEFT"}, "confidence": 0.8787975907325745, "cells": [{"id": 216, "text": "7", "bbox": {"l": 295.121, "t": 734.133358, "r": 300.10229, "b": 743.039921, "coord_origin": "TOPLEFT"}}]}, "text": "7"}], "body": [{"label": "section_header", "id": 0, "page_no": 6, "cluster": {"id": 0, "label": "section_header", "bbox": {"l": 49.507503509521484, "t": 73.60204315185547, "r": 168.0131378173828, "b": 84.25342, "coord_origin": "TOPLEFT"}, "confidence": 0.9554283022880554, "cells": [{"id": 0, "text": "5.3.", "bbox": {"l": 50.112, "t": 74.40137000000016, "r": 63.704811, "b": 84.25342, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Datasets and Metrics", "bbox": {"l": 72.766685, "t": 74.40137000000016, "r": 167.89825, "b": 84.25342, "coord_origin": "TOPLEFT"}}]}, "text": "5.3. Datasets and Metrics"}, {"label": "text", "id": 1, "page_no": 6, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.59501647949219, "t": 92.33658599853516, "r": 286.36511, "b": 138.36517333984375, "coord_origin": "TOPLEFT"}, "confidence": 0.9862996935844421, "cells": [{"id": 2, "text": "The Tree-Edit-Distance-Based Similarity (TEDS) met-", "bbox": {"l": 62.067001, "t": 93.35039999999992, "r": 286.36499, "b": 102.25696000000016, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "ric was introduced in [37]. It represents the prediction, and", "bbox": {"l": 50.112, "t": 105.30542000000003, "r": 286.36511, "b": 114.21198000000015, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "ground-truth as a tree structure of HTML tags. 
This simi-", "bbox": {"l": 50.112, "t": 117.26044000000002, "r": 286.36505, "b": 126.16699000000006, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "larity is calculated as:", "bbox": {"l": 50.112, "t": 129.21642999999995, "r": 136.71687, "b": 138.12298999999996, "coord_origin": "TOPLEFT"}}]}, "text": "The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It represents the prediction, and ground-truth as a tree structure of HTML tags. This similarity is calculated as:"}, {"label": "formula", "id": 2, "page_no": 6, "cluster": {"id": 2, "label": "formula", "bbox": {"l": 85.5722427368164, "t": 149.28536987304688, "r": 286.3624, "b": 173.28463745117188, "coord_origin": "TOPLEFT"}, "confidence": 0.9500426650047302, "cells": [{"id": 6, "text": "TEDS (", "bbox": {"l": 86.218994, "t": 157.05798000000004, "r": 118.8784, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "T$_{a}$, T$_{b}$", "bbox": {"l": 118.87499, "t": 157.05798000000004, "r": 143.26962, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": ") = 1", "bbox": {"l": 143.76799, "t": 157.05798000000004, "r": 165.9019, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "\u2212", "bbox": {"l": 168.12099, "t": 156.50012000000004, "r": 175.8699, "b": 165.90479000000005, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "EditDist (", "bbox": {"l": 179.27899, "t": 150.31799, "r": 221.95677, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "T$_{a}$, T$_{b}$", "bbox": {"l": 221.95200000000003, "t": 150.31799, "r": 246.34663, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": ")", "bbox": {"l": 246.84499999999997, "t": 150.31799, "r": 250.71945, "b": 159.16479000000004, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "max (", "bbox": {"l": 182.21201, "t": 163.89197000000001, "r": 206.29161, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "|", "bbox": {"l": 206.289, "t": 
163.33411, "r": 209.05661, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "T$_{a}$", "bbox": {"l": 209.056, "t": 163.89197000000001, "r": 219.19968, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "|", "bbox": {"l": 219.69700999999998, "t": 163.33411, "r": 222.46461000000002, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": ",", "bbox": {"l": 224.125, "t": 163.89197000000001, "r": 226.89261, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "|", "bbox": {"l": 228.55299000000002, "t": 163.33411, "r": 231.3206, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "T$_{b}$", "bbox": {"l": 231.31999, "t": 163.89197000000001, "r": 240.64563, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "|", "bbox": {"l": 241.144, "t": 163.33411, "r": 243.91161, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": ")", "bbox": {"l": 243.911, "t": 163.89197000000001, "r": 247.78545, "b": 172.73877000000005, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "(3)", "bbox": {"l": 274.746, "t": 157.21740999999997, "r": 286.3624, "b": 166.12396, "coord_origin": "TOPLEFT"}}]}, "text": "TEDS ( T$_{a}$, T$_{b}$ ) = 1 \u2212 EditDist ( T$_{a}$, T$_{b}$ ) max ( | T$_{a}$ | , | T$_{b}$ | ) (3)"}, {"label": "text", "id": 3, "page_no": 6, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.815025329589844, "t": 180.52044677734375, "r": 286.4786376953125, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9735332727432251, "cells": [{"id": 23, "text": "where", "bbox": {"l": 62.067001, "t": 181.16241000000002, "r": 86.405632, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "T$_{a}$", "bbox": {"l": 88.581001, "t": 181.00298999999995, "r": 98.724663, "b": 189.84978999999998, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "and", "bbox": {"l": 101.399, "t": 181.16241000000002, "r": 115.785, "b": 
190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "T$_{b}$", "bbox": {"l": 117.961, "t": 181.00298999999995, "r": 127.28664, "b": 189.84978999999998, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "represent tables in tree structure HTML", "bbox": {"l": 129.95999, "t": 181.16241000000002, "r": 286.36285, "b": 190.06897000000004, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "format. EditDist denotes the tree-edit distance, and", "bbox": {"l": 50.111992, "t": 193.11743, "r": 252.78116000000003, "b": 202.02399000000003, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "|", "bbox": {"l": 255.18201, "t": 192.40015000000005, "r": 257.94962, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "T", "bbox": {"l": 257.94901, "t": 192.95800999999994, "r": 263.77115, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "|", "bbox": {"l": 265.155, "t": 192.40015000000005, "r": 267.92261, "b": 201.80480999999997, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "rep-", "bbox": {"l": 270.32199, "t": 193.11743, "r": 286.36179, "b": 202.02399000000003, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "resents the number of nodes in", "bbox": {"l": 50.111984, "t": 205.07245, "r": 172.13388, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "T", "bbox": {"l": 174.62399, "t": 204.91301999999996, "r": 180.44614, "b": 213.75982999999997, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": ".", "bbox": {"l": 181.82899, "t": 205.07245, "r": 184.31964, "b": 213.97900000000004, "coord_origin": "TOPLEFT"}}]}, "text": "where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. 
EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T ."}, {"label": "section_header", "id": 4, "page_no": 6, "cluster": {"id": 4, "label": "section_header", "bbox": {"l": 49.47447204589844, "t": 224.4459991455078, "r": 170.64169311523438, "b": 235.01736450195312, "coord_origin": "TOPLEFT"}, "confidence": 0.9588840007781982, "cells": [{"id": 36, "text": "5.4.", "bbox": {"l": 50.112, "t": 224.81946000000005, "r": 64.551605, "b": 234.67151, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Quantitative Analysis", "bbox": {"l": 74.178009, "t": 224.81946000000005, "r": 170.45169, "b": 234.67151, "coord_origin": "TOPLEFT"}}]}, "text": "5.4. Quantitative Analysis"}, {"label": "text", "id": 5, "page_no": 6, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.47037124633789, "t": 242.6270294189453, "r": 286.4912414550781, "b": 396.757568359375, "coord_origin": "TOPLEFT"}, "confidence": 0.9855114221572876, "cells": [{"id": 38, "text": "Structure.", "bbox": {"l": 62.067001, "t": 243.6499, "r": 105.32461, "b": 252.60626000000002, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "As shown in Tab.", "bbox": {"l": 112.12600000000002, "t": 243.76946999999996, "r": 184.68361, "b": 252.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "2, TableFormer outper-", "bbox": {"l": 191.4781, "t": 243.76946999999996, "r": 286.36188, "b": 252.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "forms all SOTA methods across different datasets by a large", "bbox": {"l": 50.112, "t": 255.72448999999995, "r": 286.36508, "b": 264.63104, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "margin for predicting the table structure from an image.", "bbox": {"l": 50.112, "t": 267.67949999999996, "r": 286.36508, "b": 276.58606, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "All the more, our model outperforms pre-trained methods.", "bbox": {"l": 50.112, "t": 279.63446, "r": 286.36508, "b": 288.54105, "coord_origin": "TOPLEFT"}}, {"id": 44, 
"text": "During the evaluation we do not apply any table filtering.", "bbox": {"l": 50.112, "t": 291.59048, "r": 286.36514, "b": 300.49704, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "We also provide our baseline results on the SynthTabNet", "bbox": {"l": 50.112, "t": 303.54547, "r": 286.36508, "b": 312.45203000000004, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "dataset. It has been observed that large tables (e.g. tables", "bbox": {"l": 50.112, "t": 315.50046, "r": 286.36505, "b": 324.40700999999996, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "that occupy half of the page or more) yield poor predictions.", "bbox": {"l": 50.112, "t": 327.45544, "r": 286.36508, "b": 336.362, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "We attribute this issue to the image resizing during the pre-", "bbox": {"l": 50.112, "t": 339.41043, "r": 286.36508, "b": 348.31699000000003, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "processing step, that produces downsampled images with", "bbox": {"l": 50.112, "t": 351.36542, "r": 286.36505, "b": 360.27197, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "indistinguishable features. This problem can be addressed", "bbox": {"l": 50.112, "t": 363.32141, "r": 286.36508, "b": 372.2279700000001, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "by treating such big tables with a separate model which ac-", "bbox": {"l": 50.112, "t": 375.2764, "r": 286.36511, "b": 384.18295000000006, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "cepts a large input image size.", "bbox": {"l": 50.112, "t": 387.23138, "r": 170.01187, "b": 396.13794, "coord_origin": "TOPLEFT"}}]}, "text": "Structure. As shown in Tab. 2, TableFormer outperforms all SOTA methods across different datasets by a large margin for predicting the table structure from an image. All the more, our model outperforms pre-trained methods. During the evaluation we do not apply any table filtering. We also provide our baseline results on the SynthTabNet dataset. 
It has been observed that large tables (e.g. tables that occupy half of the page or more) yield poor predictions. We attribute this issue to the image resizing during the preprocessing step, that produces downsampled images with indistinguishable features. This problem can be addressed by treating such big tables with a separate model which accepts a large input image size."}, {"label": "table", "id": 6, "page_no": 6, "cluster": {"id": 6, "label": "table", "bbox": {"l": 53.36846160888672, "t": 409.1356506347656, "r": 283.0443420410156, "b": 582.397705078125, "coord_origin": "TOPLEFT"}, "confidence": 0.989250659942627, "cells": [{"id": 53, "text": "Model", "bbox": {"l": 78.843002, "t": 420.69037, "r": 104.85535, "b": 429.59692, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "TEDS", "bbox": {"l": 211.2, "t": 414.71237, "r": 236.10649, "b": 423.61893, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "Dataset", "bbox": {"l": 129.338, "t": 426.66736, "r": 159.21584, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "Simple", "bbox": {"l": 171.17096, "t": 426.66736, "r": 199.40497, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Complex", "bbox": {"l": 211.36009, "t": 426.66736, "r": 247.74349999999998, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "All", "bbox": {"l": 264.54044, "t": 426.66736, "r": 277.27264, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "EDD", "bbox": {"l": 81.612, "t": 443.62436, "r": 102.08514, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "PTN", "bbox": {"l": 134.87206, "t": 443.62436, "r": 153.69141, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "91.1", "bbox": {"l": 176.56554, "t": 443.62436, "r": 194.00009, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "88.7", "bbox": {"l": 220.82938000000001, "t": 443.62436, "r": 238.26393, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": 
"89.9", "bbox": {"l": 262.18414, "t": 443.62436, "r": 279.61868, "b": 452.53091, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "GTE", "bbox": {"l": 82.165001, "t": 455.58035, "r": 101.5323, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PTN", "bbox": {"l": 134.86716, "t": 455.58035, "r": 153.68651, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "-", "bbox": {"l": 183.62411, "t": 455.58035, "r": 186.94167, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "-", "bbox": {"l": 227.88795000000002, "t": 455.58035, "r": 231.20551, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "93.01", "bbox": {"l": 259.69855, "t": 455.58035, "r": 282.11441, "b": 464.48691, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 468.13336, "r": 117.38329000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "PTN", "bbox": {"l": 134.86766, "t": 468.13336, "r": 153.68701, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "98.5", "bbox": {"l": 176.57111, "t": 468.13336, "r": 194.00566, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "95.0", "bbox": {"l": 220.83495, "t": 468.13336, "r": 238.26950000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "96.75", "bbox": {"l": 259.698, "t": 468.01379, "r": 282.11386, "b": 476.97018, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "EDD", "bbox": {"l": 81.612, "t": 483.32635, "r": 102.08514, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "FTN", "bbox": {"l": 134.87206, "t": 483.32635, "r": 153.69141, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "88.4", "bbox": {"l": 176.56554, "t": 483.32635, "r": 194.00009, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "92.08", "bbox": {"l": 218.33870999999996, "t": 483.32635, "r": 240.75455999999997, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "90.6", "bbox": {"l": 
262.18411, "t": 483.32635, "r": 279.61865, "b": 492.23291, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "GTE", "bbox": {"l": 82.165001, "t": 495.28134, "r": 101.5323, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "FTN", "bbox": {"l": 134.86716, "t": 495.28134, "r": 153.68651, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "-", "bbox": {"l": 183.62411, "t": 495.28134, "r": 186.94167, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "-", "bbox": {"l": 227.88795000000002, "t": 495.28134, "r": 231.20551, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "87.14", "bbox": {"l": 259.69855, "t": 495.28134, "r": 282.11441, "b": 504.1879, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "GTE (FT)", "bbox": {"l": 71.789001, "t": 507.23633, "r": 111.90838999999998, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "FTN", "bbox": {"l": 134.86221, "t": 507.23633, "r": 153.68156, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "-", "bbox": {"l": 183.62914, "t": 507.23633, "r": 186.94669, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "-", "bbox": {"l": 227.89297, "t": 507.23633, "r": 231.21053000000003, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "91.02", "bbox": {"l": 259.6936, "t": 507.23633, "r": 282.10947, "b": 516.14288, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 519.1913099999999, "r": 117.38329000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "FTN", "bbox": {"l": 134.86766, "t": 519.1913099999999, "r": 153.68701, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "97.5", "bbox": {"l": 176.57111, "t": 519.1913099999999, "r": 194.00566, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "96.0", "bbox": {"l": 220.83495, "t": 519.1913099999999, "r": 238.26950000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}}, {"id": 
93, "text": "96.8", "bbox": {"l": 262.189, "t": 519.0717500000001, "r": 279.62354, "b": 528.02814, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "EDD", "bbox": {"l": 81.612, "t": 536.49837, "r": 102.08514, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "TB", "bbox": {"l": 137.91064, "t": 536.49837, "r": 150.64285, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "86.0", "bbox": {"l": 176.56554, "t": 536.49837, "r": 194.00009, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "-", "bbox": {"l": 227.89285, "t": 536.49837, "r": 231.21040000000002, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "86.0", "bbox": {"l": 262.18411, "t": 536.49837, "r": 279.61865, "b": 545.40492, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 548.45436, "r": 117.38329000000002, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "TB", "bbox": {"l": 137.90625, "t": 548.45436, "r": 150.63846, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "89.6", "bbox": {"l": 176.57111, "t": 548.45436, "r": 194.00566, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "-", "bbox": {"l": 227.88845999999998, "t": 548.45436, "r": 231.20601, "b": 557.36092, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "89.6", "bbox": {"l": 262.189, "t": 548.3348100000001, "r": 279.62354, "b": 557.2911799999999, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "TableFormer", "bbox": {"l": 66.315002, "t": 568.00237, "r": 117.38329000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "STN", "bbox": {"l": 134.86766, "t": 568.00237, "r": 153.68701, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "96.9", "bbox": {"l": 176.57111, "t": 568.00237, "r": 194.00566, "b": 576.90892, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "95.7", "bbox": {"l": 220.83495, "t": 568.00237, "r": 238.26950000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}}, 
{"id": 108, "text": "96.7", "bbox": {"l": 262.1897, "t": 568.00237, "r": 279.62424, "b": 576.90892, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 11, "num_cols": 5, "table_cells": [{"bbox": {"l": 78.843002, "t": 420.69037, "r": 104.85535, "b": 429.59692, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 211.2, "t": 414.71237, "r": 247.74349999999998, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 129.338, "t": 426.66736, "r": 159.21584, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 171.17096, "t": 426.66736, "r": 199.40497, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "Simple", "column_header": true, 
"row_header": false, "row_section": false}, {"bbox": {"l": 264.54044, "t": 426.66736, "r": 277.27264, "b": 435.57391000000007, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 81.612, "t": 443.62436, "r": 102.08514, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87206, "t": 443.62436, "r": 153.69141, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56554, "t": 443.62436, "r": 194.00009, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "91.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.82938000000001, "t": 443.62436, "r": 238.26393, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18414, "t": 443.62436, "r": 279.61868, "b": 452.53091, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.165001, 
"t": 455.58035, "r": 101.5323, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86716, "t": 455.58035, "r": 153.68651, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411, "t": 455.58035, "r": 186.94167, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795000000002, "t": 455.58035, "r": 231.20551, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69855, "t": 455.58035, "r": 282.11441, "b": 464.48691, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "93.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 468.13336, "r": 117.38329000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766, "t": 468.13336, "r": 153.68701, "b": 477.03992, "coord_origin": 
"TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 468.13336, "r": 194.00566, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "98.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83495, "t": 468.13336, "r": 238.26950000000002, "b": 477.03992, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.698, "t": 468.01379, "r": 282.11386, "b": 476.97018, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.75", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.612, "t": 483.32635, "r": 102.08514, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.87206, "t": 483.32635, "r": 153.69141, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56554, "t": 483.32635, "r": 194.00009, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, 
"end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "88.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 218.33870999999996, "t": 483.32635, "r": 240.75455999999997, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "92.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18411, "t": 483.32635, "r": 279.61865, "b": 492.23291, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "90.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 82.165001, "t": 495.28134, "r": 101.5323, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86716, "t": 495.28134, "r": 153.68651, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62411, "t": 495.28134, "r": 186.94167, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88795000000002, "t": 495.28134, "r": 231.20551, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, 
"end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.69855, "t": 495.28134, "r": 282.11441, "b": 504.1879, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "87.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 71.789001, "t": 507.23633, "r": 111.90838999999998, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "GTE (FT)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86221, "t": 507.23633, "r": 153.68156, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.62914, "t": 507.23633, "r": 186.94669, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89297, "t": 507.23633, "r": 231.21053000000003, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 259.6936, "t": 507.23633, "r": 282.10947, "b": 516.14288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "91.02", "column_header": false, 
"row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 519.1913099999999, "r": 117.38329000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766, "t": 519.1913099999999, "r": 153.68701, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "FTN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 519.1913099999999, "r": 194.00566, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "97.5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83495, "t": 519.1913099999999, "r": 238.26950000000002, "b": 528.0978700000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189, "t": 519.0717500000001, "r": 279.62354, "b": 528.02814, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 81.612, "t": 536.49837, "r": 102.08514, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", 
"column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.91064, "t": 536.49837, "r": 150.64285, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.56554, "t": 536.49837, "r": 194.00009, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.89285, "t": 536.49837, "r": 231.21040000000002, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.18411, "t": 536.49837, "r": 279.61865, "b": 545.40492, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "86.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 548.45436, "r": 117.38329000000002, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 137.90625, "t": 548.45436, "r": 150.63846, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "TB", "column_header": false, "row_header": false, "row_section": 
false}, {"bbox": {"l": 176.57111, "t": 548.45436, "r": 194.00566, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 227.88845999999998, "t": 548.45436, "r": 231.20601, "b": 557.36092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.189, "t": 548.3348100000001, "r": 279.62354, "b": 557.2911799999999, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "89.6", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 66.315002, "t": 568.00237, "r": 117.38329000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 134.86766, "t": 568.00237, "r": 153.68701, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "STN", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 176.57111, "t": 568.00237, "r": 194.00566, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "96.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 220.83495, 
"t": 568.00237, "r": 238.26950000000002, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "95.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 262.1897, "t": 568.00237, "r": 279.62424, "b": 576.90892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 10, "end_row_offset_idx": 11, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "96.7", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "text", "id": 7, "page_no": 6, "cluster": {"id": 7, "label": "text", "bbox": {"l": 49.4423828125, "t": 591.6051635742188, "r": 286.63427734375, "b": 613.4329223632812, "coord_origin": "TOPLEFT"}, "confidence": 0.7209144830703735, "cells": [{"id": 109, "text": "Table 2: Structure results on PubTabNet (PTN), FinTabNet", "bbox": {"l": 50.112, "t": 592.43336, "r": 286.36511, "b": 601.33992, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "(FTN), TableBank (TB) and SynthTabNet (STN).", "bbox": {"l": 50.112, "t": 604.38837, "r": 247.46114, "b": 613.29492, "coord_origin": "TOPLEFT"}}]}, "text": "Table 2: Structure results on PubTabNet (PTN), FinTabNet (FTN), TableBank (TB) and SynthTabNet (STN)."}, {"label": "text", "id": 8, "page_no": 6, "cluster": {"id": 8, "label": "text", "bbox": {"l": 49.50996780395508, "t": 615.5875244140625, "r": 261.78732, "b": 625.24992, "coord_origin": "TOPLEFT"}, "confidence": 0.6433371901512146, "cells": [{"id": 111, "text": "FT: Model was trained on PubTabNet then finetuned.", "bbox": {"l": 50.112, "t": 616.34337, "r": 261.78732, "b": 625.24992, "coord_origin": "TOPLEFT"}}]}, "text": "FT: Model was trained on PubTabNet then finetuned."}, {"label": "text", "id": 9, "page_no": 6, "cluster": {"id": 9, "label": "text", "bbox": {"l": 49.4313850402832, "t": 643.5670166015625, "r": 286.515869140625, "b": 713.6913452148438, 
"coord_origin": "TOPLEFT"}, "confidence": 0.9854632616043091, "cells": [{"id": 112, "text": "Cell Detection.", "bbox": {"l": 62.067001, "t": 644.3498099999999, "r": 124.72179, "b": 653.30618, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "Like any object detector, our", "bbox": {"l": 128.20401, "t": 644.46936, "r": 242.9333, "b": 653.37592, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Cell BBox", "bbox": {"l": 245.55401999999998, "t": 644.55902, "r": 286.36084, "b": 653.1467700000001, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "Detector", "bbox": {"l": 50.112015, "t": 656.51402, "r": 84.971146, "b": 665.10178, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "provides bounding boxes that can be improved", "bbox": {"l": 89.515015, "t": 656.42436, "r": 286.366, "b": 665.33092, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "with post-processing during inference. We make use of the", "bbox": {"l": 50.112015, "t": 668.37936, "r": 286.36511, "b": 677.28593, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "grid-like structure of tables to refine the predictions. A de-", "bbox": {"l": 50.112015, "t": 680.33536, "r": 286.36505, "b": 689.24193, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "tailed explanation on the post-processing is available in the", "bbox": {"l": 50.112015, "t": 692.290359, "r": 286.36511, "b": 701.19693, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "supplementary material. As shown in Tab. 3, we evaluate", "bbox": {"l": 50.112015, "t": 704.245361, "r": 286.36508, "b": 713.151932, "coord_origin": "TOPLEFT"}}]}, "text": "Cell Detection. Like any object detector, our Cell BBox Detector provides bounding boxes that can be improved with post-processing during inference. We make use of the grid-like structure of tables to refine the predictions. A detailed explanation on the post-processing is available in the supplementary material. As shown in Tab. 
3, we evaluate"}, {"label": "text", "id": 10, "page_no": 6, "cluster": {"id": 10, "label": "text", "bbox": {"l": 307.97955322265625, "t": 74.48530578613281, "r": 545.258544921875, "b": 227.81777954101562, "coord_origin": "TOPLEFT"}, "confidence": 0.9713152647018433, "cells": [{"id": 121, "text": "our", "bbox": {"l": 308.862, "t": 75.20836999999995, "r": 322.14215, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Cell BBox Decoder", "bbox": {"l": 325.45401, "t": 75.29803000000004, "r": 404.56702, "b": 83.88580000000002, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "accuracy for cells with a class la-", "bbox": {"l": 408.104, "t": 75.20836999999995, "r": 545.10968, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "bel of \u2018content\u2019 only using the PASCAL VOC mAP metric", "bbox": {"l": 308.862, "t": 87.16339000000005, "r": 545.11511, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "for pre-processing and post-processing.", "bbox": {"l": 308.862, "t": 99.11841000000004, "r": 470.22626, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Note that we do", "bbox": {"l": 477.52884, "t": 99.11841000000004, "r": 545.11511, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "not have post-processing results for SynthTabNet as images", "bbox": {"l": 308.862, "t": 111.07343000000003, "r": 545.11517, "b": 119.97997999999984, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "are only provided. 
To compare the performance of our pro-", "bbox": {"l": 308.862, "t": 123.02844000000005, "r": 545.11511, "b": 131.93499999999995, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "posed approach, we\u2019ve integrated TableFormer\u2019s", "bbox": {"l": 308.862, "t": 134.98443999999995, "r": 502.01691000000005, "b": 143.89099, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "Cell BBox", "bbox": {"l": 504.47299, "t": 135.07410000000004, "r": 545.11041, "b": 143.66187000000002, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "Decoder", "bbox": {"l": 308.862, "t": 147.02910999999995, "r": 343.16324, "b": 155.61688000000004, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "into EDD architecture. As mentioned previously,", "bbox": {"l": 346.371, "t": 146.93944999999997, "r": 545.11493, "b": 155.84600999999998, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "the Structure Decoder provides the", "bbox": {"l": 308.862, "t": 158.89446999999996, "r": 446.15652, "b": 167.80102999999997, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "Cell BBox Decoder", "bbox": {"l": 448.28998000000007, "t": 158.98413000000005, "r": 525.04181, "b": 167.57190000000003, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "with", "bbox": {"l": 527.39899, "t": 158.89446999999996, "r": 545.11249, "b": 167.80102999999997, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "the features needed to predict the bounding box predictions.", "bbox": {"l": 308.862, "t": 170.84948999999995, "r": 545.11511, "b": 179.75603999999998, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "Therefore, the accuracy of the", "bbox": {"l": 308.862, "t": 182.80449999999996, "r": 432.86642000000006, "b": 191.71105999999997, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "Structure Decoder", "bbox": {"l": 436.39001, "t": 182.89417000000003, "r": 510.93021, "b": 191.48193000000003, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "directly", "bbox": {"l": 514.677, "t": 182.80449999999996, "r": 545.11273, "b": 
191.71105999999997, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "influences the accuracy of the", "bbox": {"l": 308.862, "t": 194.75951999999995, "r": 431.17285, "b": 203.66607999999997, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "Cell BBox Decoder", "bbox": {"l": 434.6790199999999, "t": 194.84918000000005, "r": 514.18054, "b": 203.43695000000002, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": ". If the", "bbox": {"l": 514.17603, "t": 194.75951999999995, "r": 545.10992, "b": 203.66607999999997, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "Structure Decoder", "bbox": {"l": 308.86203, "t": 206.80517999999995, "r": 382.35614, "b": 215.39293999999995, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "predicts an extra column, this will result", "bbox": {"l": 385.07501, "t": 206.71551999999997, "r": 545.11426, "b": 215.62207, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "in an extra column of predicted bounding boxes.", "bbox": {"l": 308.862, "t": 218.67052999999999, "r": 501.6981799999999, "b": 227.57709, "coord_origin": "TOPLEFT"}}]}, "text": "our Cell BBox Decoder accuracy for cells with a class label of \u2018content\u2019 only using the PASCAL VOC mAP metric for pre-processing and post-processing. Note that we do not have post-processing results for SynthTabNet as images are only provided. To compare the performance of our proposed approach, we\u2019ve integrated TableFormer\u2019s Cell BBox Decoder into EDD architecture. As mentioned previously, the Structure Decoder provides the Cell BBox Decoder with the features needed to predict the bounding box predictions. Therefore, the accuracy of the Structure Decoder directly influences the accuracy of the Cell BBox Decoder . 
If the Structure Decoder predicts an extra column, this will result in an extra column of predicted bounding boxes."}, {"label": "table", "id": 11, "page_no": 6, "cluster": {"id": 11, "label": "table", "bbox": {"l": 308.4067077636719, "t": 247.87644958496094, "r": 533.64208984375, "b": 303.8056640625, "coord_origin": "TOPLEFT"}, "confidence": 0.9691707491874695, "cells": [{"id": 146, "text": "Model", "bbox": {"l": 339.323, "t": 253.66436999999996, "r": 365.33536, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "Dataset", "bbox": {"l": 401.04132, "t": 253.66436999999996, "r": 430.91916, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "mAP", "bbox": {"l": 454.10214, "t": 253.66436999999996, "r": 474.58523999999994, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "mAP (PP)", "bbox": {"l": 486.54034, "t": 253.66436999999996, "r": 527.2276, "b": 262.57092, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "EDD+BBox", "bbox": {"l": 327.65601, "t": 270.62134000000003, "r": 377.00076, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "PubTabNet", "bbox": {"l": 393.69809, "t": 270.62134000000003, "r": 438.28073, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "79.2", "bbox": {"l": 455.63559, "t": 270.62134000000003, "r": 473.07013, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "82.7", "bbox": {"l": 498.16592, "t": 270.62134000000003, "r": 515.60046, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 282.57631999999995, "r": 377.86331, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "PubTabNet", "bbox": {"l": 393.69388, "t": 282.57631999999995, "r": 438.27652, "b": 291.48288, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "82.1", "bbox": {"l": 455.63101, "t": 282.45676, "r": 473.06555000000003, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": 
"86.8", "bbox": {"l": 498.1713, "t": 282.45676, "r": 515.60583, "b": 291.41315, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "TableFormer", "bbox": {"l": 326.79501, "t": 294.53131, "r": 377.86331, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "SynthTabNet", "bbox": {"l": 389.81842, "t": 294.53131, "r": 442.15194999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "87.7", "bbox": {"l": 455.63135, "t": 294.53131, "r": 473.06589, "b": 303.43787, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "-", "bbox": {"l": 505.22515999999996, "t": 294.53131, "r": 508.54268999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "ched", "nl", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "nl", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 4, "num_cols": 4, "table_cells": [{"bbox": {"l": 339.323, "t": 253.66436999999996, "r": 365.33536, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 401.04132, "t": 253.66436999999996, "r": 430.91916, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Dataset", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 454.10214, "t": 253.66436999999996, "r": 474.58523999999994, "b": 262.57092, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "mAP", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 486.54034, "t": 253.66436999999996, "r": 527.2276, "b": 262.57092, "coord_origin": "TOPLEFT"}, 
"row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "mAP (PP)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 327.65601, "t": 270.62134000000003, "r": 377.00076, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD+BBox", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.69809, "t": 270.62134000000003, "r": 438.28073, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63559, "t": 270.62134000000003, "r": 473.07013, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "79.2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.16592, "t": 270.62134000000003, "r": 515.60046, "b": 279.52788999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "82.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.79501, "t": 282.57631999999995, "r": 377.86331, "b": 291.48288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 393.69388, "t": 282.57631999999995, "r": 438.27652, "b": 
291.48288, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "PubTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63101, "t": 282.45676, "r": 473.06555000000003, "b": 291.41315, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "82.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 498.1713, "t": 282.45676, "r": 515.60583, "b": 291.41315, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "86.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 326.79501, "t": 294.53131, "r": 377.86331, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 389.81842, "t": 294.53131, "r": 442.15194999999994, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "SynthTabNet", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 455.63135, "t": 294.53131, "r": 473.06589, "b": 303.43787, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "87.7", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 505.22515999999996, "t": 294.53131, "r": 508.54268999999994, "b": 303.43787, 
"coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "-", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "caption", "id": 12, "page_no": 6, "cluster": {"id": 12, "label": "caption", "bbox": {"l": 308.0755615234375, "t": 315.7750549316406, "r": 545.11517, "b": 337.8129577636719, "coord_origin": "TOPLEFT"}, "confidence": 0.9519906044006348, "cells": [{"id": 162, "text": "Table 3:", "bbox": {"l": 308.862, "t": 316.44931, "r": 341.49951, "b": 325.35587, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "Cell Bounding Box detection results on PubTab-", "bbox": {"l": 348.60284, "t": 316.44931, "r": 545.11517, "b": 325.35587, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "Net, and FinTabNet. PP: Post-processing.", "bbox": {"l": 308.862, "t": 328.4043, "r": 474.97845, "b": 337.3108500000001, "coord_origin": "TOPLEFT"}}]}, "text": "Table 3: Cell Bounding Box detection results on PubTabNet, and FinTabNet. 
PP: Post-processing."}, {"label": "text", "id": 13, "page_no": 6, "cluster": {"id": 13, "label": "text", "bbox": {"l": 307.8382873535156, "t": 367.0015563964844, "r": 545.3449096679688, "b": 520.16769, "coord_origin": "TOPLEFT"}, "confidence": 0.9835002422332764, "cells": [{"id": 165, "text": "Cell Content.", "bbox": {"l": 320.81699, "t": 367.6797199999999, "r": 378.94876, "b": 376.63611, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "In this section, we evaluate the entire", "bbox": {"l": 387.07898, "t": 367.79929, "r": 545.11566, "b": 376.70584, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "pipeline of recovering a table with content.", "bbox": {"l": 308.86197, "t": 379.75426999999996, "r": 487.19257, "b": 388.66083, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "Here we put", "bbox": {"l": 493.96713, "t": 379.75426999999996, "r": 545.11511, "b": 388.66083, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "our approach to test by capitalizing on extracting content", "bbox": {"l": 308.86197, "t": 391.70926, "r": 545.11505, "b": 400.61581, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "from the PDF cells rather than decoding from images. Tab.", "bbox": {"l": 308.86197, "t": 403.66525, "r": 545.11523, "b": 412.57181, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "4", "bbox": {"l": 308.86197, "t": 415.62024, "r": 314.08096, "b": 424.52679, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "shows the TEDs score of HTML code representing the", "bbox": {"l": 316.69046, "t": 415.62024, "r": 545.11517, "b": 424.52679, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "structure of the table along with the content inserted in the", "bbox": {"l": 308.86197, "t": 427.57523, "r": 545.11505, "b": 436.48177999999996, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "data cell and compared with the ground-truth. 
Our method", "bbox": {"l": 308.86197, "t": 439.53021, "r": 545.11505, "b": 448.43677, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "achieved a", "bbox": {"l": 308.86197, "t": 451.4852, "r": 350.23666, "b": 460.39175, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "5.3%", "bbox": {"l": 352.17596, "t": 451.36563, "r": 374.59183, "b": 460.32201999999995, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "increase over the state-of-the-art, and com-", "bbox": {"l": 376.53296, "t": 451.4852, "r": 545.11011, "b": 460.39175, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "mercial solutions. We believe our scores would be higher", "bbox": {"l": 308.86197, "t": 463.44019, "r": 545.11511, "b": 472.34674, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "if the HTML ground-truth matched the extracted PDF cell", "bbox": {"l": 308.86197, "t": 475.39618, "r": 545.11517, "b": 484.30273, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "content. Unfortunately, there are small discrepancies such", "bbox": {"l": 308.86197, "t": 487.35117, "r": 545.11511, "b": 496.25772, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "as spacings around words or special characters with various", "bbox": {"l": 308.86197, "t": 499.30615, "r": 545.11505, "b": 508.21271, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "unicode representations.", "bbox": {"l": 308.86197, "t": 511.26114, "r": 405.69846, "b": 520.16769, "coord_origin": "TOPLEFT"}}]}, "text": "Cell Content. In this section, we evaluate the entire pipeline of recovering a table with content. Here we put our approach to test by capitalizing on extracting content from the PDF cells rather than decoding from images. Tab. 4 shows the TEDs score of HTML code representing the structure of the table along with the content inserted in the data cell and compared with the ground-truth. Our method achieved a 5.3% increase over the state-of-the-art, and commercial solutions. 
We believe our scores would be higher if the HTML ground-truth matched the extracted PDF cell content. Unfortunately, there are small discrepancies such as spacings around words or special characters with various unicode representations."}, {"label": "table", "id": 14, "page_no": 6, "cluster": {"id": 14, "label": "table", "bbox": {"l": 332.9688720703125, "t": 540.2835083007812, "r": 520.942138671875, "b": 643.84991, "coord_origin": "TOPLEFT"}, "confidence": 0.9775565266609192, "cells": [{"id": 183, "text": "Model", "bbox": {"l": 358.01099, "t": 552.23337, "r": 384.02335, "b": 561.1399200000001, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "TEDS", "bbox": {"l": 449.03400000000005, "t": 546.25537, "r": 473.94049000000007, "b": 555.16193, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "Simple", "bbox": {"l": 408.50598, "t": 558.21037, "r": 436.73999, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "Complex", "bbox": {"l": 448.6951, "t": 558.21037, "r": 485.07849, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "All", "bbox": {"l": 499.3848, "t": 558.21037, "r": 512.117, "b": 567.11693, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "Tabula", "bbox": {"l": 357.68201, "t": 575.16736, "r": 384.3519, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "78.0", "bbox": {"l": 413.90097, "t": 575.16736, "r": 431.33550999999994, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "57.8", "bbox": {"l": 458.16479000000004, "t": 575.16736, "r": 475.59933000000007, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "67.9", "bbox": {"l": 497.0289, "t": 575.16736, "r": 514.46344, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "Traprange", "bbox": {"l": 350.72299, "t": 587.12236, "r": 391.31064, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "60.8", "bbox": {"l": 413.90582, "t": 587.12236, "r": 431.34036, "b": 596.02892, "coord_origin": 
"TOPLEFT"}}, {"id": 194, "text": "49.9", "bbox": {"l": 458.16965, "t": 587.12236, "r": 475.60419, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "55.4", "bbox": {"l": 497.03374999999994, "t": 587.12236, "r": 514.46832, "b": 596.02892, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "Camelot", "bbox": {"l": 354.13599, "t": 599.07835, "r": 387.89923, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "80.0", "bbox": {"l": 413.90161, "t": 599.07835, "r": 431.33615, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "66.0", "bbox": {"l": 458.16544, "t": 599.07835, "r": 475.59998, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "73.0", "bbox": {"l": 497.02954000000005, "t": 599.07835, "r": 514.46411, "b": 607.98491, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "Acrobat Pro", "bbox": {"l": 346.55899, "t": 611.03336, "r": 395.47534, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "68.9", "bbox": {"l": 413.90616, "t": 611.03336, "r": 431.34069999999997, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "61.8", "bbox": {"l": 458.16998000000007, "t": 611.03336, "r": 475.60452, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": "65.3", "bbox": {"l": 497.03409, "t": 611.03336, "r": 514.46863, "b": 619.93991, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "EDD", "bbox": {"l": 360.78101, "t": 622.9883600000001, "r": 381.25415, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "91.2", "bbox": {"l": 413.90158, "t": 622.9883600000001, "r": 431.33612, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "85.4", "bbox": {"l": 458.16541, "t": 622.9883600000001, "r": 475.59995000000004, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "88.3", "bbox": {"l": 497.0295100000001, "t": 622.9883600000001, "r": 514.46405, "b": 631.89491, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "TableFormer", "bbox": {"l": 345.483, "t": 
634.94336, "r": 396.5513, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "95.4", "bbox": {"l": 413.90616, "t": 634.94336, "r": 431.34069999999997, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "90.1", "bbox": {"l": 458.16998000000007, "t": 634.94336, "r": 475.60452, "b": 643.84991, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "93.6", "bbox": {"l": 497.03400000000005, "t": 634.82381, "r": 514.46857, "b": 643.78018, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["fcel", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 4, "table_cells": [{"bbox": {"l": 358.01099, "t": 552.23337, "r": 384.02335, "b": 561.1399200000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Model", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 448.6951, "t": 546.25537, "r": 485.07849, "b": 567.11693, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "TEDS Complex", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 408.50598, "t": 558.21037, "r": 436.73999, "b": 567.11693, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "Simple", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 499.3848, "t": 558.21037, "r": 512.117, "b": 567.11693, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 0, "end_row_offset_idx": 1, 
"start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "All", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 357.68201, "t": 575.16736, "r": 384.3519, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Tabula", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90097, "t": 575.16736, "r": 431.33550999999994, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "78.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16479000000004, "t": 575.16736, "r": 475.59933000000007, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "57.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0289, "t": 575.16736, "r": 514.46344, "b": 584.0739100000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "67.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 350.72299, "t": 587.12236, "r": 391.31064, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Traprange", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90582, "t": 587.12236, "r": 431.34036, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, 
"start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "60.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16965, "t": 587.12236, "r": 475.60419, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "49.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03374999999994, "t": 587.12236, "r": 514.46832, "b": 596.02892, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "55.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 354.13599, "t": 599.07835, "r": 387.89923, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Camelot", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90161, "t": 599.07835, "r": 431.33615, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "80.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16544, "t": 599.07835, "r": 475.59998, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "66.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.02954000000005, "t": 599.07835, "r": 514.46411, "b": 607.98491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": 
"73.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 346.55899, "t": 611.03336, "r": 395.47534, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Acrobat Pro", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90616, "t": 611.03336, "r": 431.34069999999997, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "68.9", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998000000007, "t": 611.03336, "r": 475.60452, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "61.8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03409, "t": 611.03336, "r": 514.46863, "b": 619.93991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "65.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 360.78101, "t": 622.9883600000001, "r": 381.25415, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "EDD", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90158, "t": 622.9883600000001, "r": 431.33612, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "91.2", "column_header": false, 
"row_header": false, "row_section": false}, {"bbox": {"l": 458.16541, "t": 622.9883600000001, "r": 475.59995000000004, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "85.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.0295100000001, "t": 622.9883600000001, "r": 514.46405, "b": 631.89491, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "88.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 345.483, "t": 634.94336, "r": 396.5513, "b": 643.84991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "TableFormer", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 413.90616, "t": 634.94336, "r": 431.34069999999997, "b": 643.84991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "95.4", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 458.16998000000007, "t": 634.94336, "r": 475.60452, "b": 643.84991, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "90.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 497.03400000000005, "t": 634.82381, "r": 514.46857, "b": 643.78018, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "93.6", "column_header": false, "row_header": 
false, "row_section": false}]}, {"label": "caption", "id": 15, "page_no": 6, "cluster": {"id": 15, "label": "caption", "bbox": {"l": 307.9747619628906, "t": 655.8218994140625, "r": 545.1710815429688, "b": 689.8007202148438, "coord_origin": "TOPLEFT"}, "confidence": 0.954140305519104, "cells": [{"id": 212, "text": "Table 4:", "bbox": {"l": 308.862, "t": 656.86136, "r": 341.73862, "b": 665.76792, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "Results of structure with content retrieved using", "bbox": {"l": 349.55927, "t": 656.86136, "r": 545.11517, "b": 665.76792, "coord_origin": "TOPLEFT"}}, {"id": 214, "text": "cell detection on PubTabNet. In all cases the input is PDF", "bbox": {"l": 308.862, "t": 668.81636, "r": 545.11505, "b": 677.7229199999999, "coord_origin": "TOPLEFT"}}, {"id": 215, "text": "documents with cropped tables.", "bbox": {"l": 308.862, "t": 680.77136, "r": 435.03836, "b": 689.6779300000001, "coord_origin": "TOPLEFT"}}]}, "text": "Table 4: Results of structure with content retrieved using cell detection on PubTabNet. 
In all cases the input is PDF documents with cropped tables."}], "headers": [{"label": "page_footer", "id": 16, "page_no": 6, "cluster": {"id": 16, "label": "page_footer", "bbox": {"l": 294.5538330078125, "t": 733.197021484375, "r": 300.1892395019531, "b": 743.039921, "coord_origin": "TOPLEFT"}, "confidence": 0.8787975907325745, "cells": [{"id": 216, "text": "7", "bbox": {"l": 295.121, "t": 734.133358, "r": 300.10229, "b": 743.039921, "coord_origin": "TOPLEFT"}}]}, "text": "7"}]}}, {"page_no": 7, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "b.", "bbox": {"l": 53.811783000000005, "t": 208.23328000000004, "r": 62.219952, "b": 216.10645, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Structure predicted by TableFormer, with superimposed matched PDF cell text:", "bbox": {"l": 66.424026, "t": 208.23328000000004, "r": 385.93451, "b": 216.10645, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "Japanese language (previously unseen by TableFormer):", "bbox": {"l": 53.811783000000005, "t": 94.28112999999996, "r": 284.34592, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Example table from FinTabNet:", "bbox": {"l": 304.83081, "t": 94.28112999999996, "r": 431.09119, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "a.", "bbox": {"l": 53.286037, "t": 78.68756000000008, "r": 61.550289, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells", "bbox": {"l": 65.682419, "t": 78.68756000000008, "r": 499.55563, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 209.93285, "t": 222.18073000000004, "r": 241.04458999999997, "b": 226.36212, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "\u53c2\u8003\u6587\u732e", "bbox": {"l": 263.76489, "t": 222.18073000000004, "r": 284.50589, "b": 226.36212, "coord_origin": 
"TOPLEFT"}}, {"id": 8, "text": "\u51fa\u5178", "bbox": {"l": 110.24990999999999, "t": 229.66594999999995, "r": 120.62018, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 175.36609, "t": 229.66594999999995, "r": 196.1071, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "\u6570", "bbox": {"l": 196.10756, "t": 229.66594999999995, "r": 201.29247, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "\u82f1\u8a9e", "bbox": {"l": 209.62408, "t": 229.66594999999995, "r": 219.99435, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "\u65e5\u672c\u8a9e", "bbox": {"l": 229.19814, "t": 229.66594999999995, "r": 244.75377, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "\u82f1\u8a9e", "bbox": {"l": 256.1142, "t": 229.66594999999995, "r": 266.48447, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "\u65e5\u672c\u8a9e", "bbox": {"l": 278.38434, "t": 229.66594999999995, "r": 293.93997, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Association for Computational Linguistics(ACL2003)", "bbox": {"l": 55.53052099999999, "t": 236.42584, "r": 162.7131, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "65", "bbox": {"l": 184.39731, "t": 236.42584, "r": 189.56456, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "65", "bbox": {"l": 208.99026, "t": 236.42584, "r": 214.15752, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "0", "bbox": {"l": 234.87517, "t": 236.42584, "r": 237.45833000000002, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "150", "bbox": {"l": 256.88446, "t": 236.42584, "r": 264.6358, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "0", "bbox": {"l": 284.06134, "t": 236.42584, "r": 286.6445, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Computational Linguistics(COLING2002)", 
"bbox": {"l": 55.53052099999999, "t": 242.62048000000004, "r": 139.72253, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "140", "bbox": {"l": 183.10536, "t": 242.62048000000004, "r": 190.8567, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "140", "bbox": {"l": 207.69832, "t": 242.62048000000004, "r": 215.44965999999997, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "0", "bbox": {"l": 234.87517, "t": 242.62048000000004, "r": 237.45833000000002, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "150", "bbox": {"l": 256.88446, "t": 242.62048000000004, "r": 264.6358, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "0", "bbox": {"l": 284.06134, "t": 242.62048000000004, "r": 286.6445, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a", "bbox": {"l": 55.53052099999999, "t": 249.79845999999998, "r": 97.013, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "2003", "bbox": {"l": 92.698288, "t": 249.58942000000002, "r": 103.03371, "b": 253.94732999999997, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "\u5e74\u7dcf\u5408\u5927\u4f1a", "bbox": {"l": 103.03389, "t": 249.79845999999998, "r": 128.96027, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "150", "bbox": {"l": 183.10536, "t": 248.81506000000002, "r": 190.8567, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "8", "bbox": {"l": 210.28223, "t": 248.81506000000002, "r": 212.86539, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "142", "bbox": {"l": 232.29153, "t": 248.81506000000002, "r": 240.04287999999997, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "223", "bbox": {"l": 256.88446, "t": 248.81506000000002, "r": 264.6358, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "147", "bbox": {"l": 281.47742, "t": 248.81506000000002, 
"r": 289.22876, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 257.28369, "r": 91.827637, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "65", "bbox": {"l": 88.052673, "t": 257.07465, "r": 93.219925, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "\u56de\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 93.220474, "t": 257.28369, "r": 119.14685, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 257.07465, "r": 129.88177, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "177", "bbox": {"l": 183.10536, "t": 256.30029, "r": 190.8567, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "1", "bbox": {"l": 210.28223, "t": 256.30029, "r": 212.86539, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "176", "bbox": {"l": 232.29153, "t": 256.30029, "r": 240.04287999999997, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "150", "bbox": {"l": 256.88446, "t": 256.30029, "r": 264.6358, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "236", "bbox": {"l": 281.47742, "t": 256.30029, "r": 289.22876, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "\u7b2c", "bbox": {"l": 55.53052099999999, "t": 264.5108, "r": 60.715424, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "17", "bbox": {"l": 60.17654799999999, "t": 264.30175999999994, "r": 65.343796, "b": 268.65967, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "\u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 65.344376, "t": 264.5108, "r": 122.38297000000001, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 264.30175999999994, "r": 129.88177, 
"b": 268.65967, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "208", "bbox": {"l": 183.10536, "t": 263.52739999999994, "r": 190.8567, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "5", "bbox": {"l": 210.28223, "t": 263.52739999999994, "r": 212.86539, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "203", "bbox": {"l": 232.29153, "t": 263.52739999999994, "r": 240.04287999999997, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "152", "bbox": {"l": 256.88446, "t": 263.52739999999994, "r": 264.6358, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "244", "bbox": {"l": 281.47742, "t": 263.52739999999994, "r": 289.22876, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 271.73785, "r": 107.38374, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "146", "bbox": {"l": 101.99034, "t": 271.52881, "r": 109.74168000000002, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "\u301c", "bbox": {"l": 109.74204, "t": 271.73785, "r": 114.92695000000002, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "155", "bbox": {"l": 114.38793, "t": 271.52881, "r": 122.13927, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "\u56de", "bbox": {"l": 122.13963, "t": 271.73785, "r": 127.32454000000001, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "98", "bbox": {"l": 184.39731, "t": 270.75446, "r": 189.56456, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "2", "bbox": {"l": 210.28223, "t": 270.75446, "r": 212.86539, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "96", "bbox": {"l": 233.58348, "t": 270.75446, "r": 238.75072999999998, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "150", "bbox": {"l": 256.88446, "t": 
270.75446, "r": 264.6358, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "232", "bbox": {"l": 281.47742, "t": 270.75446, "r": 289.22876, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "WWW", "bbox": {"l": 55.53052099999999, "t": 279.01392, "r": 68.68605, "b": 283.37183, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "\u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "bbox": {"l": 68.685814, "t": 279.22295999999994, "r": 110.16829999999999, "b": 283.40436, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "107", "bbox": {"l": 183.10536, "t": 277.98157000000003, "r": 190.8567, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "73", "bbox": {"l": 208.99026, "t": 277.98157000000003, "r": 214.15752, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "34", "bbox": {"l": 233.58348, "t": 277.98157000000003, "r": 238.75072999999998, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "147", "bbox": {"l": 256.88446, "t": 277.98157000000003, "r": 264.6358, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "96", "bbox": {"l": 282.76938, "t": 277.98157000000003, "r": 287.93661, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "\u8a08", "bbox": {"l": 169.61508, "t": 286.45004, "r": 174.79999, "b": 290.63141, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "945", "bbox": {"l": 183.10536, "t": 285.46667, "r": 190.8567, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "294", "bbox": {"l": 207.69832, "t": 285.46667, "r": 215.44965999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "651", "bbox": {"l": 232.29153, "t": 285.46667, "r": 240.04287999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "1122", "bbox": {"l": 255.76506, "t": 285.46667, "r": 265.75204, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "955", 
"bbox": {"l": 281.47742, "t": 285.46667, "r": 289.22876, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "Text is aligned to match original for ease of viewing", "bbox": {"l": 380.42731, "t": 292.30426, "r": 549.42175, "b": 298.60284, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Weighted Average Grant Date Fair", "bbox": {"l": 459.04861, "t": 221.62415, "r": 542.00018, "b": 226.68933000000004, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Value", "bbox": {"l": 493.82193, "t": 227.83416999999997, "r": 507.2258, "b": 232.89935000000003, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "RS", "bbox": {"l": 393.2442, "t": 236.74712999999997, "r": 400.74588, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "U", "bbox": {"l": 400.74643, "t": 236.74712999999997, "r": 404.64523, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "s", "bbox": {"l": 404.6463, "t": 236.74712999999997, "r": 407.34631, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Shares (in millions)", "bbox": {"l": 392.09671, "t": 221.57446000000004, "r": 438.0145, "b": 226.63964999999996, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "PSUs", "bbox": {"l": 427.18323, "t": 236.74712999999997, "r": 440.98778999999996, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "RSUs", "bbox": {"l": 468.38254, "t": 236.74712999999997, "r": 482.48465000000004, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PSUs", "bbox": {"l": 516.92578, "t": 236.74712999999997, "r": 530.73035, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Nonvested on Janua", "bbox": {"l": 306.11493, "t": 244.61084000000005, "r": 355.6532, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "ry 1", "bbox": {"l": 355.65427, "t": 244.61084000000005, "r": 364.65607, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "1.", "bbox": {"l": 396.24661, "t": 244.91327, "r": 400.75238, "b": 
249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "1", "bbox": {"l": 400.7529, "t": 244.91327, "r": 403.75531, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "0.3", "bbox": {"l": 429.81838999999997, "t": 244.91327, "r": 437.32708999999994, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "90.10", "bbox": {"l": 465.52859, "t": 244.91327, "r": 478.40103, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "$", "bbox": {"l": 480.97552, "t": 244.91327, "r": 483.55001999999996, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "$ 91.19", "bbox": {"l": 513.44824, "t": 244.91327, "r": 531.46967, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Granted", "bbox": {"l": 306.11493, "t": 253.68451000000005, "r": 325.62674, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "0.", "bbox": {"l": 396.24661, "t": 253.68451000000005, "r": 400.75238, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "5", "bbox": {"l": 400.7529, "t": 253.68451000000005, "r": 403.75531, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "0.1", "bbox": {"l": 429.81838999999997, "t": 253.68451000000005, "r": 437.32708999999994, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "117.44", "bbox": {"l": 466.43579000000005, "t": 253.68451000000005, "r": 482.54831, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "122.41", "bbox": {"l": 514.29065, "t": 253.68451000000005, "r": 530.80981, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Vested", "bbox": {"l": 306.11493, "t": 261.54822, "r": 322.62866, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "(0.", "bbox": {"l": 394.43222, "t": 261.54822, "r": 400.73563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "5", "bbox": {"l": 400.73456, "t": 261.54822, "r": 403.73697, "b": 
266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": ")", "bbox": {"l": 403.73804, "t": 261.54822, "r": 405.53625, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "(0.1)", "bbox": {"l": 427.7016, "t": 261.54822, "r": 438.80563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "87.08", "bbox": {"l": 468.55533, "t": 261.54822, "r": 482.07043, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "81.14", "bbox": {"l": 516.01862, "t": 261.54822, "r": 529.53375, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Canceled or forfeited", "bbox": {"l": 306.11493, "t": 269.64148, "r": 356.24771, "b": 274.70667000000003, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "(0.", "bbox": {"l": 394.43222, "t": 270.31946000000005, "r": 400.73563, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "1", "bbox": {"l": 400.73456, "t": 270.31946000000005, "r": 403.73697, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": ")", "bbox": {"l": 403.73804, "t": 270.31946000000005, "r": 405.53625, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "-", "bbox": {"l": 431.02802, "t": 270.31946000000005, "r": 436.4280099999999, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "102.01", "bbox": {"l": 465.83099000000004, "t": 270.31946000000005, "r": 482.35013, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "92.18", "bbox": {"l": 516.01862, "t": 270.31946000000005, "r": 529.53375, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Nonvested on December 31", "bbox": {"l": 306.11493, "t": 278.48572, "r": 373.35764, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "1.0", "bbox": {"l": 396.24661, "t": 278.48572, "r": 403.75531, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "0.3", "bbox": {"l": 429.51599, "t": 278.48572, "r": 437.02469, "b": 
283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "104.85 $", "bbox": {"l": 463.7142, "t": 278.48572, "r": 484.73965000000004, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "$ 104.51", "bbox": {"l": 512.99463, "t": 278.48572, "r": 534.02008, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "Figure 5:", "bbox": {"l": 50.112, "t": 320.87735, "r": 86.864021, "b": 329.78391, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration", "bbox": {"l": 93.917542, "t": 320.87735, "r": 545.11371, "b": 329.78391, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is", "bbox": {"l": 50.112, "t": 332.83233999999993, "r": 545.11371, "b": 341.73889, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from", "bbox": {"l": 50.112, "t": 344.78732, "r": 545.11377, "b": 353.69388, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "the FinTabNet dataset.", "bbox": {"l": 50.112, "t": 356.74332, "r": 139.79532, "b": 365.64987, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "Red - PDF cells, Green - predicted bounding boxes", "bbox": {"l": 220.26282, "t": 381.77722, "r": 342.07819, "b": 386.44281, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "Ground Truth", "bbox": {"l": 53.715248, "t": 381.77722, "r": 85.657333, "b": 386.44281, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "16", "bbox": {"l": 437.37939, "t": 400.55295, "r": 443.69870000000003, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "17", "bbox": {"l": 450.33203, "t": 400.55295, "r": 456.6513100000001, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "18", "bbox": {"l": 463.28464, 
"t": 400.55295, "r": 469.60394, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "19", "bbox": {"l": 476.23724000000004, "t": 400.55295, "r": 482.5565500000001, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "20", "bbox": {"l": 489.18988, "t": 400.55295, "r": 495.50916, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "21", "bbox": {"l": 502.14251999999993, "t": 400.55295, "r": 508.46178999999995, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "22", "bbox": {"l": 515.09509, "t": 400.55295, "r": 521.41443, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "23", "bbox": {"l": 385.2814, "t": 411.03836000000007, "r": 391.60071, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "24", "bbox": {"l": 398.52341, "t": 411.03836000000007, "r": 404.84271, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "25", "bbox": {"l": 411.47604, "t": 411.03836000000007, "r": 417.79535, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "26", "bbox": {"l": 437.37939, "t": 411.03836000000007, "r": 443.69870000000003, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "27", "bbox": {"l": 450.33203, "t": 411.03836000000007, "r": 456.6513100000001, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "28", "bbox": {"l": 463.28464, "t": 411.03836000000007, "r": 469.60394, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "30", "bbox": {"l": 385.2814, "t": 421.0697, "r": 391.60071, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "31", "bbox": {"l": 398.52341, "t": 421.0697, "r": 404.84271, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "32", "bbox": {"l": 411.47604, "t": 421.0697, "r": 417.79532, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "33", "bbox": {"l": 424.42865, "t": 421.0697, "r": 430.74796, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "34", 
"bbox": {"l": 437.38129, "t": 421.0697, "r": 443.70056, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "35", "bbox": {"l": 450.33389000000005, "t": 421.0697, "r": 456.65319999999997, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "36", "bbox": {"l": 463.2865, "t": 421.0697, "r": 469.6058, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "37", "bbox": {"l": 476.23914, "t": 421.0697, "r": 482.55841, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "38", "bbox": {"l": 489.1917700000001, "t": 421.0697, "r": 495.51105, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "39", "bbox": {"l": 502.14438, "t": 421.0697, "r": 508.46368, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "40", "bbox": {"l": 515.09705, "t": 421.0697, "r": 521.41632, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "41", "bbox": {"l": 528.04962, "t": 421.0697, "r": 534.3689, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "42", "bbox": {"l": 385.2814, "t": 432.04431, "r": 391.60071, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "43", "bbox": {"l": 398.52341, "t": 432.04431, "r": 404.84271, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "44", "bbox": {"l": 411.47604, "t": 432.04431, "r": 417.79532, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "45", "bbox": {"l": 424.42865, "t": 432.04431, "r": 430.74796, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "46", "bbox": {"l": 437.38129, "t": 432.04431, "r": 443.70056, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "47", "bbox": {"l": 450.33389000000005, "t": 432.04431, "r": 456.65319999999997, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "48", "bbox": {"l": 463.2865, "t": 432.04431, "r": 469.6058, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "49", "bbox": {"l": 476.23914, "t": 432.04431, "r": 
482.55841, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "50", "bbox": {"l": 489.1917700000001, "t": 432.04431, "r": 495.51105, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "51", "bbox": {"l": 502.14438, "t": 432.04431, "r": 508.46368, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "52", "bbox": {"l": 515.09705, "t": 432.04431, "r": 521.41632, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "53", "bbox": {"l": 528.04962, "t": 432.04431, "r": 534.3689, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "0", "bbox": {"l": 385.2814, "t": 389.20004, "r": 388.44073, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "1", "bbox": {"l": 398.52341, "t": 389.20004, "r": 401.68274, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "2", "bbox": {"l": 411.4754, "t": 389.20004, "r": 414.63474, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "3", "bbox": {"l": 424.4274, "t": 389.20004, "r": 427.58673, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "4", "bbox": {"l": 437.37939, "t": 389.20004, "r": 440.53870000000006, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "5", "bbox": {"l": 450.33136, "t": 389.20004, "r": 453.49069000000003, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "6", "bbox": {"l": 463.28336, "t": 389.20004, "r": 466.44269, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "7", "bbox": {"l": 476.23535, "t": 389.20004, "r": 479.39468, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "8", "bbox": {"l": 489.18735, "t": 389.20004, "r": 492.34668, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "9", "bbox": {"l": 502.13933999999995, "t": 389.20004, "r": 505.29868000000005, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "10", "bbox": {"l": 515.09131, "t": 389.20004, "r": 521.41064, "b": 395.51868, "coord_origin": 
"TOPLEFT"}}, {"id": 174, "text": "11", "bbox": {"l": 528.04364, "t": 389.20004, "r": 534.13104, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "12", "bbox": {"l": 385.2814, "t": 398.97464, "r": 391.60071, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "13", "bbox": {"l": 398.52341, "t": 398.97464, "r": 404.84271, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "14", "bbox": {"l": 411.47604, "t": 398.97464, "r": 417.79535, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "15", "bbox": {"l": 424.42719, "t": 406.77463000000006, "r": 430.74648999999994, "b": 413.09326, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "29", "bbox": {"l": 502.86941999999993, "t": 410.99438, "r": 509.18871999999993, "b": 417.31302, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "Predicted Structure", "bbox": {"l": 384.35437, "t": 381.77722, "r": 430.99261, "b": 386.44281, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table.", "bbox": {"l": 62.595001, "t": 458.72836, "r": 532.63049, "b": 467.63492, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "5.5.", "bbox": {"l": 50.112, "t": 491.39536, "r": 64.448898, "b": 501.24741, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "Qualitative Analysis", "bbox": {"l": 74.006828, "t": 491.39536, "r": 163.7558, "b": 501.24741, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "We showcase several visualizations for the different", "bbox": {"l": 62.067001, "t": 536.87337, "r": 286.36499, "b": 545.77992, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "components of our network on various", "bbox": {"l": 50.112, "t": 548.82837, "r": 211.15741, "b": 557.73492, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "\u201ccomplex\u201d", "bbox": {"l": 215.10000999999997, "t": 548.91803, "r": 259.17453, "b": 557.50578, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "tables", "bbox": 
{"l": 263.12, "t": 548.82837, "r": 286.36273, "b": 557.73492, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "within datasets presented in this work in Fig. 5 and Fig. 6", "bbox": {"l": 50.112, "t": 560.78337, "r": 286.36505, "b": 569.68993, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "As it is shown, our model is able to predict bounding boxes", "bbox": {"l": 50.112, "t": 572.73837, "r": 286.36508, "b": 581.6449299999999, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "for all table cells, even for the empty ones. Additionally,", "bbox": {"l": 50.112, "t": 584.69337, "r": 286.36508, "b": 593.59993, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "our post-processing techniques can extract the cell content", "bbox": {"l": 50.112, "t": 596.64937, "r": 286.36505, "b": 605.55592, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "by matching the predicted bounding boxes to the PDF cells", "bbox": {"l": 50.112, "t": 608.60437, "r": 286.36508, "b": 617.51093, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "based on their overlap and spatial proximity. The left part", "bbox": {"l": 50.112, "t": 620.55937, "r": 286.36508, "b": 629.46593, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "of Fig. 5 demonstrates also the adaptability of our method", "bbox": {"l": 50.112, "t": 632.51437, "r": 286.36508, "b": 641.42093, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "to any language, as it can successfully extract Japanese", "bbox": {"l": 50.112, "t": 644.46938, "r": 286.36508, "b": 653.37593, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "text, although the training set contains only English content.", "bbox": {"l": 50.112, "t": 656.42438, "r": 286.36511, "b": 665.33094, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "We provide more visualizations including the intermediate", "bbox": {"l": 50.112, "t": 668.38037, "r": 286.36508, "b": 677.28694, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "steps in the supplementary material. 
Overall these illustra-", "bbox": {"l": 50.112, "t": 680.33537, "r": 286.36511, "b": 689.24194, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "tions justify the versatility of our method across a diverse", "bbox": {"l": 50.112, "t": 692.290375, "r": 286.36511, "b": 701.196945, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "range of table appearances and content type.", "bbox": {"l": 50.112, "t": 704.245377, "r": 226.88833999999997, "b": 713.1519470000001, "coord_origin": "TOPLEFT"}}, {"id": 201, "text": "6.", "bbox": {"l": 308.862, "t": 490.70892, "r": 316.07382, "b": 501.45663, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "Future Work & Conclusion", "bbox": {"l": 325.68954, "t": 490.70892, "r": 460.84848, "b": 501.45663, "coord_origin": "TOPLEFT"}}, {"id": 203, "text": "In this paper, we presented TableFormer an end-to-end", "bbox": {"l": 320.81699, "t": 512.89337, "r": 545.11505, "b": 521.79993, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "transformer based approach to predict table structures and", "bbox": {"l": 308.862, "t": 524.84836, "r": 545.11517, "b": 533.75491, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "bounding boxes of cells from an image. This approach en-", "bbox": {"l": 308.862, "t": 536.80336, "r": 545.11511, "b": 545.70992, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "ables us to recreate the table structure, and extract the cell", "bbox": {"l": 308.862, "t": 548.75836, "r": 545.11505, "b": 557.6649199999999, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "content from PDF or OCR by using bounding boxes. 
Ad-", "bbox": {"l": 308.862, "t": 560.71336, "r": 545.11517, "b": 569.61992, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "ditionally, it provides the versatility required in real-world", "bbox": {"l": 308.862, "t": 572.66837, "r": 545.11511, "b": 581.57492, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "scenarios when dealing with various types of PDF docu-", "bbox": {"l": 308.862, "t": 584.62436, "r": 545.11511, "b": 593.53091, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "ments, and languages.", "bbox": {"l": 308.862, "t": 596.57936, "r": 400.46808, "b": 605.48592, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "Furthermore, our method outper-", "bbox": {"l": 408.37839, "t": 596.57936, "r": 545.11511, "b": 605.48592, "coord_origin": "TOPLEFT"}}, {"id": 212, "text": "forms all state-of-the-arts with a wide margin. Finally, we", "bbox": {"l": 308.862, "t": 608.53436, "r": 545.11505, "b": 617.44092, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "introduce \u201cSynthTabNet\u201d a challenging synthetically gen-", "bbox": {"l": 308.862, "t": 620.48936, "r": 545.11511, "b": 629.3959199999999, "coord_origin": "TOPLEFT"}}, {"id": 214, "text": "erated dataset that reinforces missing characteristics from", "bbox": {"l": 308.862, "t": 632.4443699999999, "r": 545.11505, "b": 641.35092, "coord_origin": "TOPLEFT"}}, {"id": 215, "text": "other datasets.", "bbox": {"l": 308.862, "t": 644.39937, "r": 365.85803, "b": 653.30592, "coord_origin": "TOPLEFT"}}, {"id": 216, "text": "References", "bbox": {"l": 308.862, "t": 672.09892, "r": 364.40585, "b": 682.84664, "coord_origin": "TOPLEFT"}}, {"id": 217, "text": "[1]", "bbox": {"l": 313.345, "t": 693.9617920000001, "r": 323.80792, "b": 701.977753, "coord_origin": "TOPLEFT"}}, {"id": 218, "text": "Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas", "bbox": {"l": 326.05127, "t": 693.9617920000001, "r": 545.10852, "b": 701.977753, "coord_origin": "TOPLEFT"}}, {"id": 219, "text": "Usunier, Alexander Kirillov, and 
Sergey Zagoruyko. End-to-", "bbox": {"l": 328.78101, "t": 704.920792, "r": 545.1134, "b": 712.936752, "coord_origin": "TOPLEFT"}}, {"id": 220, "text": "8", "bbox": {"l": 295.121, "t": 734.133366, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 53.811783000000005, "t": 208.23328000000004, "r": 62.219952, "b": 216.10645, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 0, "text": "b.", "bbox": {"l": 53.811783000000005, "t": 208.23328000000004, "r": 62.219952, "b": 216.10645, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "text", "bbox": {"l": 66.424026, "t": 208.23328000000004, "r": 385.93451, "b": 216.10645, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 1, "text": "Structure predicted by TableFormer, with superimposed matched PDF cell text:", "bbox": {"l": 66.424026, "t": 208.23328000000004, "r": 385.93451, "b": 216.10645, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "text", "bbox": {"l": 53.811783000000005, "t": 94.28112999999996, "r": 284.34592, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 2, "text": "Japanese language (previously unseen by TableFormer):", "bbox": {"l": 53.811783000000005, "t": 94.28112999999996, "r": 284.34592, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "picture", "bbox": {"l": 304.83081, "t": 94.28112999999996, "r": 554.8255615234375, "b": 180.62570190429688, "coord_origin": "TOPLEFT"}, "confidence": 0.7697690725326538, "cells": [{"id": 3, "text": "Example table from FinTabNet:", "bbox": {"l": 304.83081, "t": 94.28112999999996, "r": 431.09119, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "text", "bbox": {"l": 53.286037, "t": 78.68756000000008, "r": 61.550289, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 4, "text": "a.", "bbox": {"l": 53.286037, "t": 
78.68756000000008, "r": 61.550289, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "text", "bbox": {"l": 65.682419, "t": 77.8168716430664, "r": 500.1541748046875, "b": 86.9799575805664, "coord_origin": "TOPLEFT"}, "confidence": 0.6126988530158997, "cells": [{"id": 5, "text": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells", "bbox": {"l": 65.682419, "t": 78.68756000000008, "r": 499.55563, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "table", "bbox": {"l": 53.6285400390625, "t": 218.94859313964844, "r": 298.5574035644531, "b": 292.3999938964844, "coord_origin": "TOPLEFT"}, "confidence": 0.8824083805084229, "cells": [{"id": 6, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 209.93285, "t": 222.18073000000004, "r": 241.04458999999997, "b": 226.36212, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "\u53c2\u8003\u6587\u732e", "bbox": {"l": 263.76489, "t": 222.18073000000004, "r": 284.50589, "b": 226.36212, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "\u51fa\u5178", "bbox": {"l": 110.24990999999999, "t": 229.66594999999995, "r": 120.62018, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 175.36609, "t": 229.66594999999995, "r": 196.1071, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "\u6570", "bbox": {"l": 196.10756, "t": 229.66594999999995, "r": 201.29247, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "\u82f1\u8a9e", "bbox": {"l": 209.62408, "t": 229.66594999999995, "r": 219.99435, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "\u65e5\u672c\u8a9e", "bbox": {"l": 229.19814, "t": 229.66594999999995, "r": 244.75377, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "\u82f1\u8a9e", "bbox": {"l": 256.1142, "t": 229.66594999999995, "r": 266.48447, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": 
"\u65e5\u672c\u8a9e", "bbox": {"l": 278.38434, "t": 229.66594999999995, "r": 293.93997, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Association for Computational Linguistics(ACL2003)", "bbox": {"l": 55.53052099999999, "t": 236.42584, "r": 162.7131, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "65", "bbox": {"l": 184.39731, "t": 236.42584, "r": 189.56456, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "65", "bbox": {"l": 208.99026, "t": 236.42584, "r": 214.15752, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "0", "bbox": {"l": 234.87517, "t": 236.42584, "r": 237.45833000000002, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "150", "bbox": {"l": 256.88446, "t": 236.42584, "r": 264.6358, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "0", "bbox": {"l": 284.06134, "t": 236.42584, "r": 286.6445, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Computational Linguistics(COLING2002)", "bbox": {"l": 55.53052099999999, "t": 242.62048000000004, "r": 139.72253, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "140", "bbox": {"l": 183.10536, "t": 242.62048000000004, "r": 190.8567, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "140", "bbox": {"l": 207.69832, "t": 242.62048000000004, "r": 215.44965999999997, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "0", "bbox": {"l": 234.87517, "t": 242.62048000000004, "r": 237.45833000000002, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "150", "bbox": {"l": 256.88446, "t": 242.62048000000004, "r": 264.6358, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "0", "bbox": {"l": 284.06134, "t": 242.62048000000004, "r": 286.6445, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a", "bbox": {"l": 55.53052099999999, "t": 
249.79845999999998, "r": 97.013, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "2003", "bbox": {"l": 92.698288, "t": 249.58942000000002, "r": 103.03371, "b": 253.94732999999997, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "\u5e74\u7dcf\u5408\u5927\u4f1a", "bbox": {"l": 103.03389, "t": 249.79845999999998, "r": 128.96027, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "150", "bbox": {"l": 183.10536, "t": 248.81506000000002, "r": 190.8567, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "8", "bbox": {"l": 210.28223, "t": 248.81506000000002, "r": 212.86539, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "142", "bbox": {"l": 232.29153, "t": 248.81506000000002, "r": 240.04287999999997, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "223", "bbox": {"l": 256.88446, "t": 248.81506000000002, "r": 264.6358, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "147", "bbox": {"l": 281.47742, "t": 248.81506000000002, "r": 289.22876, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 257.28369, "r": 91.827637, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "65", "bbox": {"l": 88.052673, "t": 257.07465, "r": 93.219925, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "\u56de\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 93.220474, "t": 257.28369, "r": 119.14685, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 257.07465, "r": 129.88177, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "177", "bbox": {"l": 183.10536, "t": 256.30029, "r": 190.8567, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "1", "bbox": {"l": 210.28223, "t": 256.30029, "r": 212.86539, "b": 
260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "176", "bbox": {"l": 232.29153, "t": 256.30029, "r": 240.04287999999997, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "150", "bbox": {"l": 256.88446, "t": 256.30029, "r": 264.6358, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "236", "bbox": {"l": 281.47742, "t": 256.30029, "r": 289.22876, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "\u7b2c", "bbox": {"l": 55.53052099999999, "t": 264.5108, "r": 60.715424, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "17", "bbox": {"l": 60.17654799999999, "t": 264.30175999999994, "r": 65.343796, "b": 268.65967, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "\u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 65.344376, "t": 264.5108, "r": 122.38297000000001, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 264.30175999999994, "r": 129.88177, "b": 268.65967, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "208", "bbox": {"l": 183.10536, "t": 263.52739999999994, "r": 190.8567, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "5", "bbox": {"l": 210.28223, "t": 263.52739999999994, "r": 212.86539, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "203", "bbox": {"l": 232.29153, "t": 263.52739999999994, "r": 240.04287999999997, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "152", "bbox": {"l": 256.88446, "t": 263.52739999999994, "r": 264.6358, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "244", "bbox": {"l": 281.47742, "t": 263.52739999999994, "r": 289.22876, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 271.73785, "r": 107.38374, "b": 275.91925000000003, "coord_origin": 
"TOPLEFT"}}, {"id": 54, "text": "146", "bbox": {"l": 101.99034, "t": 271.52881, "r": 109.74168000000002, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "\u301c", "bbox": {"l": 109.74204, "t": 271.73785, "r": 114.92695000000002, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "155", "bbox": {"l": 114.38793, "t": 271.52881, "r": 122.13927, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "\u56de", "bbox": {"l": 122.13963, "t": 271.73785, "r": 127.32454000000001, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "98", "bbox": {"l": 184.39731, "t": 270.75446, "r": 189.56456, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "2", "bbox": {"l": 210.28223, "t": 270.75446, "r": 212.86539, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "96", "bbox": {"l": 233.58348, "t": 270.75446, "r": 238.75072999999998, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "150", "bbox": {"l": 256.88446, "t": 270.75446, "r": 264.6358, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "232", "bbox": {"l": 281.47742, "t": 270.75446, "r": 289.22876, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "WWW", "bbox": {"l": 55.53052099999999, "t": 279.01392, "r": 68.68605, "b": 283.37183, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "\u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "bbox": {"l": 68.685814, "t": 279.22295999999994, "r": 110.16829999999999, "b": 283.40436, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "107", "bbox": {"l": 183.10536, "t": 277.98157000000003, "r": 190.8567, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "73", "bbox": {"l": 208.99026, "t": 277.98157000000003, "r": 214.15752, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "34", "bbox": {"l": 233.58348, "t": 277.98157000000003, "r": 238.75072999999998, 
"b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "147", "bbox": {"l": 256.88446, "t": 277.98157000000003, "r": 264.6358, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "96", "bbox": {"l": 282.76938, "t": 277.98157000000003, "r": 287.93661, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "\u8a08", "bbox": {"l": 169.61508, "t": 286.45004, "r": 174.79999, "b": 290.63141, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "945", "bbox": {"l": 183.10536, "t": 285.46667, "r": 190.8567, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "294", "bbox": {"l": 207.69832, "t": 285.46667, "r": 215.44965999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "651", "bbox": {"l": 232.29153, "t": 285.46667, "r": 240.04287999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "1122", "bbox": {"l": 255.76506, "t": 285.46667, "r": 265.75204, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "955", "bbox": {"l": 281.47742, "t": 285.46667, "r": 289.22876, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "caption", "bbox": {"l": 380.340087890625, "t": 291.73724365234375, "r": 549.7123413085938, "b": 299.1470642089844, "coord_origin": "TOPLEFT"}, "confidence": 0.7500573396682739, "cells": [{"id": 76, "text": "Text is aligned to match original for ease of viewing", "bbox": {"l": 380.42731, "t": 292.30426, "r": 549.42175, "b": 298.60284, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "table", "bbox": {"l": 304.9219970703125, "t": 218.51490783691406, "r": 550.2321166992188, "b": 287.9006652832031, "coord_origin": "TOPLEFT"}, "confidence": 0.8900098204612732, "cells": [{"id": 77, "text": "Weighted Average Grant Date Fair", "bbox": {"l": 459.04861, "t": 221.62415, "r": 542.00018, "b": 226.68933000000004, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Value", "bbox": {"l": 493.82193, "t": 
227.83416999999997, "r": 507.2258, "b": 232.89935000000003, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "RS", "bbox": {"l": 393.2442, "t": 236.74712999999997, "r": 400.74588, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "U", "bbox": {"l": 400.74643, "t": 236.74712999999997, "r": 404.64523, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "s", "bbox": {"l": 404.6463, "t": 236.74712999999997, "r": 407.34631, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Shares (in millions)", "bbox": {"l": 392.09671, "t": 221.57446000000004, "r": 438.0145, "b": 226.63964999999996, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "PSUs", "bbox": {"l": 427.18323, "t": 236.74712999999997, "r": 440.98778999999996, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "RSUs", "bbox": {"l": 468.38254, "t": 236.74712999999997, "r": 482.48465000000004, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PSUs", "bbox": {"l": 516.92578, "t": 236.74712999999997, "r": 530.73035, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Nonvested on Janua", "bbox": {"l": 306.11493, "t": 244.61084000000005, "r": 355.6532, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "ry 1", "bbox": {"l": 355.65427, "t": 244.61084000000005, "r": 364.65607, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "1.", "bbox": {"l": 396.24661, "t": 244.91327, "r": 400.75238, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "1", "bbox": {"l": 400.7529, "t": 244.91327, "r": 403.75531, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "0.3", "bbox": {"l": 429.81838999999997, "t": 244.91327, "r": 437.32708999999994, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "90.10", "bbox": {"l": 465.52859, "t": 244.91327, "r": 478.40103, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "$", "bbox": {"l": 
480.97552, "t": 244.91327, "r": 483.55001999999996, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "$ 91.19", "bbox": {"l": 513.44824, "t": 244.91327, "r": 531.46967, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Granted", "bbox": {"l": 306.11493, "t": 253.68451000000005, "r": 325.62674, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "0.", "bbox": {"l": 396.24661, "t": 253.68451000000005, "r": 400.75238, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "5", "bbox": {"l": 400.7529, "t": 253.68451000000005, "r": 403.75531, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "0.1", "bbox": {"l": 429.81838999999997, "t": 253.68451000000005, "r": 437.32708999999994, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "117.44", "bbox": {"l": 466.43579000000005, "t": 253.68451000000005, "r": 482.54831, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "122.41", "bbox": {"l": 514.29065, "t": 253.68451000000005, "r": 530.80981, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Vested", "bbox": {"l": 306.11493, "t": 261.54822, "r": 322.62866, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "(0.", "bbox": {"l": 394.43222, "t": 261.54822, "r": 400.73563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "5", "bbox": {"l": 400.73456, "t": 261.54822, "r": 403.73697, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": ")", "bbox": {"l": 403.73804, "t": 261.54822, "r": 405.53625, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "(0.1)", "bbox": {"l": 427.7016, "t": 261.54822, "r": 438.80563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "87.08", "bbox": {"l": 468.55533, "t": 261.54822, "r": 482.07043, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "81.14", "bbox": {"l": 516.01862, "t": 
261.54822, "r": 529.53375, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Canceled or forfeited", "bbox": {"l": 306.11493, "t": 269.64148, "r": 356.24771, "b": 274.70667000000003, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "(0.", "bbox": {"l": 394.43222, "t": 270.31946000000005, "r": 400.73563, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "1", "bbox": {"l": 400.73456, "t": 270.31946000000005, "r": 403.73697, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": ")", "bbox": {"l": 403.73804, "t": 270.31946000000005, "r": 405.53625, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "-", "bbox": {"l": 431.02802, "t": 270.31946000000005, "r": 436.4280099999999, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "102.01", "bbox": {"l": 465.83099000000004, "t": 270.31946000000005, "r": 482.35013, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "92.18", "bbox": {"l": 516.01862, "t": 270.31946000000005, "r": 529.53375, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Nonvested on December 31", "bbox": {"l": 306.11493, "t": 278.48572, "r": 373.35764, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "1.0", "bbox": {"l": 396.24661, "t": 278.48572, "r": 403.75531, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "0.3", "bbox": {"l": 429.51599, "t": 278.48572, "r": 437.02469, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "104.85 $", "bbox": {"l": 463.7142, "t": 278.48572, "r": 484.73965000000004, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "$ 104.51", "bbox": {"l": 512.99463, "t": 278.48572, "r": 534.02008, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "caption", "bbox": {"l": 49.38380813598633, "t": 319.77777099609375, "r": 545.11377, "b": 365.64987, "coord_origin": "TOPLEFT"}, "confidence": 0.9140278697013855, 
"cells": [{"id": 119, "text": "Figure 5:", "bbox": {"l": 50.112, "t": 320.87735, "r": 86.864021, "b": 329.78391, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration", "bbox": {"l": 93.917542, "t": 320.87735, "r": 545.11371, "b": 329.78391, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is", "bbox": {"l": 50.112, "t": 332.83233999999993, "r": 545.11371, "b": 341.73889, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from", "bbox": {"l": 50.112, "t": 344.78732, "r": 545.11377, "b": 353.69388, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "the FinTabNet dataset.", "bbox": {"l": 50.112, "t": 356.74332, "r": 139.79532, "b": 365.64987, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "picture", "bbox": {"l": 216.76930236816406, "t": 380.49066162109375, "r": 375.7828674316406, "b": 443.34698486328125, "coord_origin": "TOPLEFT"}, "confidence": 0.8057794570922852, "cells": [{"id": 124, "text": "Red - PDF cells, Green - predicted bounding boxes", "bbox": {"l": 220.26282, "t": 381.77722, "r": 342.07819, "b": 386.44281, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "picture", "bbox": {"l": 51.73619842529297, "t": 380.48077392578125, "r": 211.83766174316406, "b": 443.65802001953125, "coord_origin": "TOPLEFT"}, "confidence": 0.8307981491088867, "cells": [{"id": 125, "text": "Ground Truth", "bbox": {"l": 53.715248, "t": 381.77722, "r": 85.657333, "b": 386.44281, "coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "picture", "bbox": {"l": 383.13629150390625, "t": 381.2313232421875, "r": 542.1132202148438, "b": 442.7750244140625, "coord_origin": "TOPLEFT"}, "confidence": 0.7880472540855408, "cells": [{"id": 126, "text": 
"16", "bbox": {"l": 437.37939, "t": 400.55295, "r": 443.69870000000003, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "17", "bbox": {"l": 450.33203, "t": 400.55295, "r": 456.6513100000001, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "18", "bbox": {"l": 463.28464, "t": 400.55295, "r": 469.60394, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "19", "bbox": {"l": 476.23724000000004, "t": 400.55295, "r": 482.5565500000001, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "20", "bbox": {"l": 489.18988, "t": 400.55295, "r": 495.50916, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "21", "bbox": {"l": 502.14251999999993, "t": 400.55295, "r": 508.46178999999995, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "22", "bbox": {"l": 515.09509, "t": 400.55295, "r": 521.41443, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "23", "bbox": {"l": 385.2814, "t": 411.03836000000007, "r": 391.60071, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "24", "bbox": {"l": 398.52341, "t": 411.03836000000007, "r": 404.84271, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "25", "bbox": {"l": 411.47604, "t": 411.03836000000007, "r": 417.79535, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "26", "bbox": {"l": 437.37939, "t": 411.03836000000007, "r": 443.69870000000003, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "27", "bbox": {"l": 450.33203, "t": 411.03836000000007, "r": 456.6513100000001, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "28", "bbox": {"l": 463.28464, "t": 411.03836000000007, "r": 469.60394, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "30", "bbox": {"l": 385.2814, "t": 421.0697, "r": 391.60071, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "31", "bbox": {"l": 398.52341, "t": 421.0697, "r": 404.84271, "b": 427.38834, 
"coord_origin": "TOPLEFT"}}, {"id": 141, "text": "32", "bbox": {"l": 411.47604, "t": 421.0697, "r": 417.79532, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "33", "bbox": {"l": 424.42865, "t": 421.0697, "r": 430.74796, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "34", "bbox": {"l": 437.38129, "t": 421.0697, "r": 443.70056, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "35", "bbox": {"l": 450.33389000000005, "t": 421.0697, "r": 456.65319999999997, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "36", "bbox": {"l": 463.2865, "t": 421.0697, "r": 469.6058, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "37", "bbox": {"l": 476.23914, "t": 421.0697, "r": 482.55841, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "38", "bbox": {"l": 489.1917700000001, "t": 421.0697, "r": 495.51105, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "39", "bbox": {"l": 502.14438, "t": 421.0697, "r": 508.46368, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "40", "bbox": {"l": 515.09705, "t": 421.0697, "r": 521.41632, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "41", "bbox": {"l": 528.04962, "t": 421.0697, "r": 534.3689, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "42", "bbox": {"l": 385.2814, "t": 432.04431, "r": 391.60071, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "43", "bbox": {"l": 398.52341, "t": 432.04431, "r": 404.84271, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "44", "bbox": {"l": 411.47604, "t": 432.04431, "r": 417.79532, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "45", "bbox": {"l": 424.42865, "t": 432.04431, "r": 430.74796, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "46", "bbox": {"l": 437.38129, "t": 432.04431, "r": 443.70056, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "47", "bbox": 
{"l": 450.33389000000005, "t": 432.04431, "r": 456.65319999999997, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "48", "bbox": {"l": 463.2865, "t": 432.04431, "r": 469.6058, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "49", "bbox": {"l": 476.23914, "t": 432.04431, "r": 482.55841, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "50", "bbox": {"l": 489.1917700000001, "t": 432.04431, "r": 495.51105, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "51", "bbox": {"l": 502.14438, "t": 432.04431, "r": 508.46368, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "52", "bbox": {"l": 515.09705, "t": 432.04431, "r": 521.41632, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "53", "bbox": {"l": 528.04962, "t": 432.04431, "r": 534.3689, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "0", "bbox": {"l": 385.2814, "t": 389.20004, "r": 388.44073, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "1", "bbox": {"l": 398.52341, "t": 389.20004, "r": 401.68274, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "2", "bbox": {"l": 411.4754, "t": 389.20004, "r": 414.63474, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "3", "bbox": {"l": 424.4274, "t": 389.20004, "r": 427.58673, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "4", "bbox": {"l": 437.37939, "t": 389.20004, "r": 440.53870000000006, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "5", "bbox": {"l": 450.33136, "t": 389.20004, "r": 453.49069000000003, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "6", "bbox": {"l": 463.28336, "t": 389.20004, "r": 466.44269, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "7", "bbox": {"l": 476.23535, "t": 389.20004, "r": 479.39468, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "8", "bbox": {"l": 489.18735, "t": 389.20004, "r": 492.34668, 
"b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "9", "bbox": {"l": 502.13933999999995, "t": 389.20004, "r": 505.29868000000005, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "10", "bbox": {"l": 515.09131, "t": 389.20004, "r": 521.41064, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "11", "bbox": {"l": 528.04364, "t": 389.20004, "r": 534.13104, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "12", "bbox": {"l": 385.2814, "t": 398.97464, "r": 391.60071, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "13", "bbox": {"l": 398.52341, "t": 398.97464, "r": 404.84271, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "14", "bbox": {"l": 411.47604, "t": 398.97464, "r": 417.79535, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "15", "bbox": {"l": 424.42719, "t": 406.77463000000006, "r": 430.74648999999994, "b": 413.09326, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "29", "bbox": {"l": 502.86941999999993, "t": 410.99438, "r": 509.18871999999993, "b": 417.31302, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "Predicted Structure", "bbox": {"l": 384.35437, "t": 381.77722, "r": 430.99261, "b": 386.44281, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "caption", "bbox": {"l": 62.00014114379883, "t": 457.9582824707031, "r": 532.63049, "b": 467.8396301269531, "coord_origin": "TOPLEFT"}, "confidence": 0.9153729677200317, "cells": [{"id": 181, "text": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table.", "bbox": {"l": 62.595001, "t": 458.72836, "r": 532.63049, "b": 467.63492, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "section_header", "bbox": {"l": 49.432151794433594, "t": 490.32525634765625, "r": 164.0072479248047, "b": 501.24741, "coord_origin": "TOPLEFT"}, "confidence": 0.9561254978179932, "cells": [{"id": 182, "text": "5.5.", "bbox": {"l": 50.112, "t": 491.39536, "r": 64.448898, 
"b": 501.24741, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "Qualitative Analysis", "bbox": {"l": 74.006828, "t": 491.39536, "r": 163.7558, "b": 501.24741, "coord_origin": "TOPLEFT"}}]}, {"id": 15, "label": "text", "bbox": {"l": 49.34812545776367, "t": 535.7822875976562, "r": 286.7104187011719, "b": 713.9708251953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9852354526519775, "cells": [{"id": 184, "text": "We showcase several visualizations for the different", "bbox": {"l": 62.067001, "t": 536.87337, "r": 286.36499, "b": 545.77992, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "components of our network on various", "bbox": {"l": 50.112, "t": 548.82837, "r": 211.15741, "b": 557.73492, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "\u201ccomplex\u201d", "bbox": {"l": 215.10000999999997, "t": 548.91803, "r": 259.17453, "b": 557.50578, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "tables", "bbox": {"l": 263.12, "t": 548.82837, "r": 286.36273, "b": 557.73492, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "within datasets presented in this work in Fig. 5 and Fig. 6", "bbox": {"l": 50.112, "t": 560.78337, "r": 286.36505, "b": 569.68993, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "As it is shown, our model is able to predict bounding boxes", "bbox": {"l": 50.112, "t": 572.73837, "r": 286.36508, "b": 581.6449299999999, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "for all table cells, even for the empty ones. Additionally,", "bbox": {"l": 50.112, "t": 584.69337, "r": 286.36508, "b": 593.59993, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "our post-processing techniques can extract the cell content", "bbox": {"l": 50.112, "t": 596.64937, "r": 286.36505, "b": 605.55592, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "by matching the predicted bounding boxes to the PDF cells", "bbox": {"l": 50.112, "t": 608.60437, "r": 286.36508, "b": 617.51093, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "based on their overlap and spatial proximity. 
The left part", "bbox": {"l": 50.112, "t": 620.55937, "r": 286.36508, "b": 629.46593, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "of Fig. 5 demonstrates also the adaptability of our method", "bbox": {"l": 50.112, "t": 632.51437, "r": 286.36508, "b": 641.42093, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "to any language, as it can successfully extract Japanese", "bbox": {"l": 50.112, "t": 644.46938, "r": 286.36508, "b": 653.37593, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "text, although the training set contains only English content.", "bbox": {"l": 50.112, "t": 656.42438, "r": 286.36511, "b": 665.33094, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "We provide more visualizations including the intermediate", "bbox": {"l": 50.112, "t": 668.38037, "r": 286.36508, "b": 677.28694, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "steps in the supplementary material. Overall these illustra-", "bbox": {"l": 50.112, "t": 680.33537, "r": 286.36511, "b": 689.24194, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "tions justify the versatility of our method across a diverse", "bbox": {"l": 50.112, "t": 692.290375, "r": 286.36511, "b": 701.196945, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "range of table appearances and content type.", "bbox": {"l": 50.112, "t": 704.245377, "r": 226.88833999999997, "b": 713.1519470000001, "coord_origin": "TOPLEFT"}}]}, {"id": 16, "label": "section_header", "bbox": {"l": 308.2789306640625, "t": 489.6516418457031, "r": 460.84848, "b": 501.45663, "coord_origin": "TOPLEFT"}, "confidence": 0.9436547160148621, "cells": [{"id": 201, "text": "6.", "bbox": {"l": 308.862, "t": 490.70892, "r": 316.07382, "b": 501.45663, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "Future Work & Conclusion", "bbox": {"l": 325.68954, "t": 490.70892, "r": 460.84848, "b": 501.45663, "coord_origin": "TOPLEFT"}}]}, {"id": 17, "label": "text", "bbox": {"l": 307.99957275390625, "t": 511.78887939453125, "r": 545.2568359375, "b": 653.30592, 
"coord_origin": "TOPLEFT"}, "confidence": 0.9875592589378357, "cells": [{"id": 203, "text": "In this paper, we presented TableFormer an end-to-end", "bbox": {"l": 320.81699, "t": 512.89337, "r": 545.11505, "b": 521.79993, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "transformer based approach to predict table structures and", "bbox": {"l": 308.862, "t": 524.84836, "r": 545.11517, "b": 533.75491, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "bounding boxes of cells from an image. This approach en-", "bbox": {"l": 308.862, "t": 536.80336, "r": 545.11511, "b": 545.70992, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "ables us to recreate the table structure, and extract the cell", "bbox": {"l": 308.862, "t": 548.75836, "r": 545.11505, "b": 557.6649199999999, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "content from PDF or OCR by using bounding boxes. Ad-", "bbox": {"l": 308.862, "t": 560.71336, "r": 545.11517, "b": 569.61992, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "ditionally, it provides the versatility required in real-world", "bbox": {"l": 308.862, "t": 572.66837, "r": 545.11511, "b": 581.57492, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "scenarios when dealing with various types of PDF docu-", "bbox": {"l": 308.862, "t": 584.62436, "r": 545.11511, "b": 593.53091, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "ments, and languages.", "bbox": {"l": 308.862, "t": 596.57936, "r": 400.46808, "b": 605.48592, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "Furthermore, our method outper-", "bbox": {"l": 408.37839, "t": 596.57936, "r": 545.11511, "b": 605.48592, "coord_origin": "TOPLEFT"}}, {"id": 212, "text": "forms all state-of-the-arts with a wide margin. 
Finally, we", "bbox": {"l": 308.862, "t": 608.53436, "r": 545.11505, "b": 617.44092, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "introduce \u201cSynthTabNet\u201d a challenging synthetically gen-", "bbox": {"l": 308.862, "t": 620.48936, "r": 545.11511, "b": 629.3959199999999, "coord_origin": "TOPLEFT"}}, {"id": 214, "text": "erated dataset that reinforces missing characteristics from", "bbox": {"l": 308.862, "t": 632.4443699999999, "r": 545.11505, "b": 641.35092, "coord_origin": "TOPLEFT"}}, {"id": 215, "text": "other datasets.", "bbox": {"l": 308.862, "t": 644.39937, "r": 365.85803, "b": 653.30592, "coord_origin": "TOPLEFT"}}]}, {"id": 18, "label": "section_header", "bbox": {"l": 308.3702392578125, "t": 671.6679077148438, "r": 364.48675537109375, "b": 682.84664, "coord_origin": "TOPLEFT"}, "confidence": 0.9442476034164429, "cells": [{"id": 216, "text": "References", "bbox": {"l": 308.862, "t": 672.09892, "r": 364.40585, "b": 682.84664, "coord_origin": "TOPLEFT"}}]}, {"id": 19, "label": "list_item", "bbox": {"l": 313.0051574707031, "t": 692.8663940429688, "r": 545.1151123046875, "b": 713.3478393554688, "coord_origin": "TOPLEFT"}, "confidence": 0.8318727612495422, "cells": [{"id": 217, "text": "[1]", "bbox": {"l": 313.345, "t": 693.9617920000001, "r": 323.80792, "b": 701.977753, "coord_origin": "TOPLEFT"}}, {"id": 218, "text": "Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas", "bbox": {"l": 326.05127, "t": 693.9617920000001, "r": 545.10852, "b": 701.977753, "coord_origin": "TOPLEFT"}}, {"id": 219, "text": "Usunier, Alexander Kirillov, and Sergey Zagoruyko. 
End-to-", "bbox": {"l": 328.78101, "t": 704.920792, "r": 545.1134, "b": 712.936752, "coord_origin": "TOPLEFT"}}]}, {"id": 20, "label": "page_footer", "bbox": {"l": 294.6642761230469, "t": 733.6441650390625, "r": 300.13397216796875, "b": 743.039928, "coord_origin": "TOPLEFT"}, "confidence": 0.8709819912910461, "cells": [{"id": 220, "text": "8", "bbox": {"l": 295.121, "t": 734.133366, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}}]}, {"id": 21, "label": "picture", "bbox": {"l": 49.97499084472656, "t": 103.71235656738281, "r": 301.6349182128906, "b": 187.57875061035156, "coord_origin": "TOPLEFT"}, "confidence": 0.7873188853263855, "cells": []}]}, "tablestructure": {"table_map": {"6": {"label": "table", "id": 6, "page_no": 7, "cluster": {"id": 6, "label": "table", "bbox": {"l": 53.6285400390625, "t": 218.94859313964844, "r": 298.5574035644531, "b": 292.3999938964844, "coord_origin": "TOPLEFT"}, "confidence": 0.8824083805084229, "cells": [{"id": 6, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 209.93285, "t": 222.18073000000004, "r": 241.04458999999997, "b": 226.36212, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "\u53c2\u8003\u6587\u732e", "bbox": {"l": 263.76489, "t": 222.18073000000004, "r": 284.50589, "b": 226.36212, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "\u51fa\u5178", "bbox": {"l": 110.24990999999999, "t": 229.66594999999995, "r": 120.62018, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 175.36609, "t": 229.66594999999995, "r": 196.1071, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "\u6570", "bbox": {"l": 196.10756, "t": 229.66594999999995, "r": 201.29247, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "\u82f1\u8a9e", "bbox": {"l": 209.62408, "t": 229.66594999999995, "r": 219.99435, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "\u65e5\u672c\u8a9e", "bbox": {"l": 229.19814, "t": 229.66594999999995, "r": 244.75377, 
"b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "\u82f1\u8a9e", "bbox": {"l": 256.1142, "t": 229.66594999999995, "r": 266.48447, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "\u65e5\u672c\u8a9e", "bbox": {"l": 278.38434, "t": 229.66594999999995, "r": 293.93997, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Association for Computational Linguistics(ACL2003)", "bbox": {"l": 55.53052099999999, "t": 236.42584, "r": 162.7131, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "65", "bbox": {"l": 184.39731, "t": 236.42584, "r": 189.56456, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "65", "bbox": {"l": 208.99026, "t": 236.42584, "r": 214.15752, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "0", "bbox": {"l": 234.87517, "t": 236.42584, "r": 237.45833000000002, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "150", "bbox": {"l": 256.88446, "t": 236.42584, "r": 264.6358, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "0", "bbox": {"l": 284.06134, "t": 236.42584, "r": 286.6445, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Computational Linguistics(COLING2002)", "bbox": {"l": 55.53052099999999, "t": 242.62048000000004, "r": 139.72253, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "140", "bbox": {"l": 183.10536, "t": 242.62048000000004, "r": 190.8567, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "140", "bbox": {"l": 207.69832, "t": 242.62048000000004, "r": 215.44965999999997, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "0", "bbox": {"l": 234.87517, "t": 242.62048000000004, "r": 237.45833000000002, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "150", "bbox": {"l": 256.88446, "t": 242.62048000000004, "r": 264.6358, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "0", "bbox": {"l": 
284.06134, "t": 242.62048000000004, "r": 286.6445, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a", "bbox": {"l": 55.53052099999999, "t": 249.79845999999998, "r": 97.013, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "2003", "bbox": {"l": 92.698288, "t": 249.58942000000002, "r": 103.03371, "b": 253.94732999999997, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "\u5e74\u7dcf\u5408\u5927\u4f1a", "bbox": {"l": 103.03389, "t": 249.79845999999998, "r": 128.96027, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "150", "bbox": {"l": 183.10536, "t": 248.81506000000002, "r": 190.8567, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "8", "bbox": {"l": 210.28223, "t": 248.81506000000002, "r": 212.86539, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "142", "bbox": {"l": 232.29153, "t": 248.81506000000002, "r": 240.04287999999997, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "223", "bbox": {"l": 256.88446, "t": 248.81506000000002, "r": 264.6358, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "147", "bbox": {"l": 281.47742, "t": 248.81506000000002, "r": 289.22876, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 257.28369, "r": 91.827637, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "65", "bbox": {"l": 88.052673, "t": 257.07465, "r": 93.219925, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "\u56de\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 93.220474, "t": 257.28369, "r": 119.14685, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 257.07465, "r": 129.88177, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": 
"177", "bbox": {"l": 183.10536, "t": 256.30029, "r": 190.8567, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "1", "bbox": {"l": 210.28223, "t": 256.30029, "r": 212.86539, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "176", "bbox": {"l": 232.29153, "t": 256.30029, "r": 240.04287999999997, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "150", "bbox": {"l": 256.88446, "t": 256.30029, "r": 264.6358, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "236", "bbox": {"l": 281.47742, "t": 256.30029, "r": 289.22876, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "\u7b2c", "bbox": {"l": 55.53052099999999, "t": 264.5108, "r": 60.715424, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "17", "bbox": {"l": 60.17654799999999, "t": 264.30175999999994, "r": 65.343796, "b": 268.65967, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "\u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 65.344376, "t": 264.5108, "r": 122.38297000000001, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 264.30175999999994, "r": 129.88177, "b": 268.65967, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "208", "bbox": {"l": 183.10536, "t": 263.52739999999994, "r": 190.8567, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "5", "bbox": {"l": 210.28223, "t": 263.52739999999994, "r": 212.86539, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "203", "bbox": {"l": 232.29153, "t": 263.52739999999994, "r": 240.04287999999997, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "152", "bbox": {"l": 256.88446, "t": 263.52739999999994, "r": 264.6358, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "244", "bbox": {"l": 281.47742, "t": 263.52739999999994, "r": 289.22876, "b": 267.88531, "coord_origin": 
"TOPLEFT"}}, {"id": 53, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 271.73785, "r": 107.38374, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "146", "bbox": {"l": 101.99034, "t": 271.52881, "r": 109.74168000000002, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "\u301c", "bbox": {"l": 109.74204, "t": 271.73785, "r": 114.92695000000002, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "155", "bbox": {"l": 114.38793, "t": 271.52881, "r": 122.13927, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "\u56de", "bbox": {"l": 122.13963, "t": 271.73785, "r": 127.32454000000001, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "98", "bbox": {"l": 184.39731, "t": 270.75446, "r": 189.56456, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "2", "bbox": {"l": 210.28223, "t": 270.75446, "r": 212.86539, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "96", "bbox": {"l": 233.58348, "t": 270.75446, "r": 238.75072999999998, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "150", "bbox": {"l": 256.88446, "t": 270.75446, "r": 264.6358, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "232", "bbox": {"l": 281.47742, "t": 270.75446, "r": 289.22876, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "WWW", "bbox": {"l": 55.53052099999999, "t": 279.01392, "r": 68.68605, "b": 283.37183, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "\u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "bbox": {"l": 68.685814, "t": 279.22295999999994, "r": 110.16829999999999, "b": 283.40436, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "107", "bbox": {"l": 183.10536, "t": 277.98157000000003, "r": 190.8567, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "73", "bbox": {"l": 
208.99026, "t": 277.98157000000003, "r": 214.15752, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "34", "bbox": {"l": 233.58348, "t": 277.98157000000003, "r": 238.75072999999998, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "147", "bbox": {"l": 256.88446, "t": 277.98157000000003, "r": 264.6358, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "96", "bbox": {"l": 282.76938, "t": 277.98157000000003, "r": 287.93661, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "\u8a08", "bbox": {"l": 169.61508, "t": 286.45004, "r": 174.79999, "b": 290.63141, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "945", "bbox": {"l": 183.10536, "t": 285.46667, "r": 190.8567, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "294", "bbox": {"l": 207.69832, "t": 285.46667, "r": 215.44965999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "651", "bbox": {"l": 232.29153, "t": 285.46667, "r": 240.04287999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "1122", "bbox": {"l": 255.76506, "t": 285.46667, "r": 265.75204, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "955", "bbox": {"l": 281.47742, "t": 285.46667, "r": 289.22876, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "lcel", "ched", "lcel", "nl", "ched", "ched", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 10, "num_cols": 6, "table_cells": 
[{"bbox": {"l": 209.93285, "t": 222.18073000000004, "r": 241.04458999999997, "b": 226.36212, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 4, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 263.76489, "t": 222.18073000000004, "r": 284.50589, "b": 226.36212, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 6, "text": "\u53c2\u8003\u6587\u732e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 110.24990999999999, "t": 229.66594999999995, "r": 120.62018, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u51fa\u5178", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 175.36609, "t": 229.66594999999995, "r": 201.29247, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "\u30d5\u30a1\u30a4\u30eb \u6570", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 209.62408, "t": 229.66594999999995, "r": 219.99435, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 229.19814, "t": 229.66594999999995, "r": 244.75377, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, 
"text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 256.1142, "t": 229.66594999999995, "r": 266.48447, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 278.38434, "t": 229.66594999999995, "r": 293.93997, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 236.42584, "r": 162.7131, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Association for Computational Linguistics(ACL2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39731, "t": 236.42584, "r": 189.56456, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026, "t": 236.42584, "r": 214.15752, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.87517, "t": 236.42584, "r": 237.45833000000002, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, 
"end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 236.42584, "r": 264.6358, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134, "t": 236.42584, "r": 286.6445, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 242.62048000000004, "r": 139.72253, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Computational Linguistics(COLING2002)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 242.62048000000004, "r": 190.8567, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.69832, "t": 242.62048000000004, "r": 215.44965999999997, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.87517, "t": 242.62048000000004, "r": 237.45833000000002, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, 
"start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 242.62048000000004, "r": 264.6358, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134, "t": 242.62048000000004, "r": 286.6445, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 249.58942000000002, "r": 128.96027, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a 2003 \u5e74\u7dcf\u5408\u5927\u4f1a", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 248.81506000000002, "r": 190.8567, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 248.81506000000002, "r": 212.86539, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 248.81506000000002, "r": 
240.04287999999997, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "142", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 248.81506000000002, "r": 264.6358, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "223", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 248.81506000000002, "r": 289.22876, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 257.07465, "r": 129.88177, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c 65 \u56de\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 256.30029, "r": 190.8567, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "177", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 256.30029, "r": 212.86539, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "1", "column_header": false, 
"row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 256.30029, "r": 240.04287999999997, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "176", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 256.30029, "r": 264.6358, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 256.30029, "r": 289.22876, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "236", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 264.30175999999994, "r": 129.88177, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u7b2c 17 \u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 263.52739999999994, "r": 190.8567, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "208", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 263.52739999999994, "r": 212.86539, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, 
"start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 263.52739999999994, "r": 240.04287999999997, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "203", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 263.52739999999994, "r": 264.6358, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "152", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 263.52739999999994, "r": 289.22876, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "244", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 271.52881, "r": 127.32454000000001, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c 146 \u301c 155 \u56de", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39731, "t": 270.75446, "r": 189.56456, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "98", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 270.75446, "r": 212.86539, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 
1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348, "t": 270.75446, "r": 238.75072999999998, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 270.75446, "r": 264.6358, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 270.75446, "r": 289.22876, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "232", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 279.01392, "r": 110.16829999999999, "b": 283.40436, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "WWW \u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 277.98157000000003, "r": 190.8567, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "107", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026, "t": 277.98157000000003, "r": 214.15752, "b": 
282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348, "t": 277.98157000000003, "r": 238.75072999999998, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "34", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 277.98157000000003, "r": 264.6358, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 282.76938, "t": 277.98157000000003, "r": 287.93661, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.10536, "t": 285.46667, "r": 190.8567, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.69832, "t": 285.46667, "r": 215.44965999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "294", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 285.46667, "r": 
240.04287999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "651", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 255.76506, "t": 285.46667, "r": 265.75204, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "1122", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 285.46667, "r": 289.22876, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "955", "column_header": false, "row_header": false, "row_section": false}]}, "8": {"label": "table", "id": 8, "page_no": 7, "cluster": {"id": 8, "label": "table", "bbox": {"l": 304.9219970703125, "t": 218.51490783691406, "r": 550.2321166992188, "b": 287.9006652832031, "coord_origin": "TOPLEFT"}, "confidence": 0.8900098204612732, "cells": [{"id": 77, "text": "Weighted Average Grant Date Fair", "bbox": {"l": 459.04861, "t": 221.62415, "r": 542.00018, "b": 226.68933000000004, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Value", "bbox": {"l": 493.82193, "t": 227.83416999999997, "r": 507.2258, "b": 232.89935000000003, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "RS", "bbox": {"l": 393.2442, "t": 236.74712999999997, "r": 400.74588, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "U", "bbox": {"l": 400.74643, "t": 236.74712999999997, "r": 404.64523, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "s", "bbox": {"l": 404.6463, "t": 236.74712999999997, "r": 407.34631, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Shares (in millions)", "bbox": {"l": 392.09671, 
"t": 221.57446000000004, "r": 438.0145, "b": 226.63964999999996, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "PSUs", "bbox": {"l": 427.18323, "t": 236.74712999999997, "r": 440.98778999999996, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "RSUs", "bbox": {"l": 468.38254, "t": 236.74712999999997, "r": 482.48465000000004, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PSUs", "bbox": {"l": 516.92578, "t": 236.74712999999997, "r": 530.73035, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Nonvested on Janua", "bbox": {"l": 306.11493, "t": 244.61084000000005, "r": 355.6532, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "ry 1", "bbox": {"l": 355.65427, "t": 244.61084000000005, "r": 364.65607, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "1.", "bbox": {"l": 396.24661, "t": 244.91327, "r": 400.75238, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "1", "bbox": {"l": 400.7529, "t": 244.91327, "r": 403.75531, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "0.3", "bbox": {"l": 429.81838999999997, "t": 244.91327, "r": 437.32708999999994, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "90.10", "bbox": {"l": 465.52859, "t": 244.91327, "r": 478.40103, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "$", "bbox": {"l": 480.97552, "t": 244.91327, "r": 483.55001999999996, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "$ 91.19", "bbox": {"l": 513.44824, "t": 244.91327, "r": 531.46967, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Granted", "bbox": {"l": 306.11493, "t": 253.68451000000005, "r": 325.62674, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "0.", "bbox": {"l": 396.24661, "t": 253.68451000000005, "r": 400.75238, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "5", "bbox": {"l": 
400.7529, "t": 253.68451000000005, "r": 403.75531, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "0.1", "bbox": {"l": 429.81838999999997, "t": 253.68451000000005, "r": 437.32708999999994, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "117.44", "bbox": {"l": 466.43579000000005, "t": 253.68451000000005, "r": 482.54831, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "122.41", "bbox": {"l": 514.29065, "t": 253.68451000000005, "r": 530.80981, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Vested", "bbox": {"l": 306.11493, "t": 261.54822, "r": 322.62866, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "(0.", "bbox": {"l": 394.43222, "t": 261.54822, "r": 400.73563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "5", "bbox": {"l": 400.73456, "t": 261.54822, "r": 403.73697, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": ")", "bbox": {"l": 403.73804, "t": 261.54822, "r": 405.53625, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "(0.1)", "bbox": {"l": 427.7016, "t": 261.54822, "r": 438.80563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "87.08", "bbox": {"l": 468.55533, "t": 261.54822, "r": 482.07043, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "81.14", "bbox": {"l": 516.01862, "t": 261.54822, "r": 529.53375, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Canceled or forfeited", "bbox": {"l": 306.11493, "t": 269.64148, "r": 356.24771, "b": 274.70667000000003, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "(0.", "bbox": {"l": 394.43222, "t": 270.31946000000005, "r": 400.73563, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "1", "bbox": {"l": 400.73456, "t": 270.31946000000005, "r": 403.73697, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": ")", "bbox": {"l": 403.73804, "t": 
270.31946000000005, "r": 405.53625, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "-", "bbox": {"l": 431.02802, "t": 270.31946000000005, "r": 436.4280099999999, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "102.01", "bbox": {"l": 465.83099000000004, "t": 270.31946000000005, "r": 482.35013, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "92.18", "bbox": {"l": 516.01862, "t": 270.31946000000005, "r": 529.53375, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Nonvested on December 31", "bbox": {"l": 306.11493, "t": 278.48572, "r": 373.35764, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "1.0", "bbox": {"l": 396.24661, "t": 278.48572, "r": 403.75531, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "0.3", "bbox": {"l": 429.51599, "t": 278.48572, "r": 437.02469, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "104.85 $", "bbox": {"l": 463.7142, "t": 278.48572, "r": 484.73965000000004, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "$ 104.51", "bbox": {"l": 512.99463, "t": 278.48572, "r": 534.02008, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ecel", "ched", "lcel", "ched", "lcel", "nl", "ecel", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 5, "table_cells": [{"bbox": {"l": 459.04861, "t": 221.62415, "r": 542.00018, "b": 232.89935000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "Weighted Average Grant Date Fair Value", "column_header": true, "row_header": false, "row_section": 
false}, {"bbox": {"l": 393.2442, "t": 236.74712999999997, "r": 407.34631, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "RS U s", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 392.09671, "t": 221.57446000000004, "r": 438.0145, "b": 226.63964999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "Shares (in millions)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.18323, "t": 236.74712999999997, "r": 440.98778999999996, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 468.38254, "t": 236.74712999999997, "r": 482.48465000000004, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "RSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 516.92578, "t": 236.74712999999997, "r": 530.73035, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 244.61084000000005, "r": 364.65607, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on Janua ry 1", "column_header": 
false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.24661, "t": 244.91327, "r": 403.75531, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1. 1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.81838999999997, "t": 244.91327, "r": 437.32708999999994, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.52859, "t": 244.91327, "r": 483.55001999999996, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "90.10 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.44824, "t": 244.91327, "r": 531.46967, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 91.19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 253.68451000000005, "r": 325.62674, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Granted", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.24661, "t": 253.68451000000005, "r": 403.75531, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "0. 
5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.81838999999997, "t": 253.68451000000005, "r": 437.32708999999994, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 466.43579000000005, "t": 253.68451000000005, "r": 482.54831, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "117.44", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 514.29065, "t": 253.68451000000005, "r": 530.80981, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "122.41", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 261.54822, "r": 322.62866, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Vested", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.43222, "t": 261.54822, "r": 405.53625, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 
5 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 427.7016, "t": 261.54822, "r": 438.80563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "(0.1)", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 468.55533, "t": 261.54822, "r": 482.07043, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "87.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.01862, "t": 261.54822, "r": 529.53375, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "81.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 269.64148, "r": 356.24771, "b": 274.70667000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Canceled or forfeited", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.43222, "t": 270.31946000000005, "r": 405.53625, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 
1 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 431.02802, "t": 270.31946000000005, "r": 436.4280099999999, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.83099000000004, "t": 270.31946000000005, "r": 482.35013, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "102.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.01862, "t": 270.31946000000005, "r": 529.53375, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "92.18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 278.48572, "r": 373.35764, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on December 31", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.24661, "t": 278.48572, "r": 403.75531, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.51599, "t": 278.48572, "r": 437.02469, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, 
"text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 463.7142, "t": 278.48572, "r": 484.73965000000004, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "104.85 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.99463, "t": 278.48572, "r": 534.02008, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 104.51", "column_header": false, "row_header": false, "row_section": false}]}}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 7, "cluster": {"id": 0, "label": "text", "bbox": {"l": 53.811783000000005, "t": 208.23328000000004, "r": 62.219952, "b": 216.10645, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 0, "text": "b.", "bbox": {"l": 53.811783000000005, "t": 208.23328000000004, "r": 62.219952, "b": 216.10645, "coord_origin": "TOPLEFT"}}]}, "text": "b."}, {"label": "text", "id": 1, "page_no": 7, "cluster": {"id": 1, "label": "text", "bbox": {"l": 66.424026, "t": 208.23328000000004, "r": 385.93451, "b": 216.10645, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 1, "text": "Structure predicted by TableFormer, with superimposed matched PDF cell text:", "bbox": {"l": 66.424026, "t": 208.23328000000004, "r": 385.93451, "b": 216.10645, "coord_origin": "TOPLEFT"}}]}, "text": "Structure predicted by TableFormer, with superimposed matched PDF cell text:"}, {"label": "text", "id": 2, "page_no": 7, "cluster": {"id": 2, "label": "text", "bbox": {"l": 53.811783000000005, "t": 94.28112999999996, "r": 284.34592, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": 
[{"id": 2, "text": "Japanese language (previously unseen by TableFormer):", "bbox": {"l": 53.811783000000005, "t": 94.28112999999996, "r": 284.34592, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}}]}, "text": "Japanese language (previously unseen by TableFormer):"}, {"label": "picture", "id": 3, "page_no": 7, "cluster": {"id": 3, "label": "picture", "bbox": {"l": 304.83081, "t": 94.28112999999996, "r": 554.8255615234375, "b": 180.62570190429688, "coord_origin": "TOPLEFT"}, "confidence": 0.7697690725326538, "cells": [{"id": 3, "text": "Example table from FinTabNet:", "bbox": {"l": 304.83081, "t": 94.28112999999996, "r": 431.09119, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "text", "id": 4, "page_no": 7, "cluster": {"id": 4, "label": "text", "bbox": {"l": 53.286037, "t": 78.68756000000008, "r": 61.550289, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 4, "text": "a.", "bbox": {"l": 53.286037, "t": 78.68756000000008, "r": 61.550289, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}}]}, "text": "a."}, {"label": "text", "id": 5, "page_no": 7, "cluster": {"id": 5, "label": "text", "bbox": {"l": 65.682419, "t": 77.8168716430664, "r": 500.1541748046875, "b": 86.9799575805664, "coord_origin": "TOPLEFT"}, "confidence": 0.6126988530158997, "cells": [{"id": 5, "text": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells", "bbox": {"l": 65.682419, "t": 78.68756000000008, "r": 499.55563, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}}]}, "text": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells"}, {"label": "table", "id": 6, "page_no": 7, "cluster": {"id": 6, "label": "table", "bbox": {"l": 53.6285400390625, "t": 218.94859313964844, "r": 298.5574035644531, "b": 292.3999938964844, "coord_origin": 
"TOPLEFT"}, "confidence": 0.8824083805084229, "cells": [{"id": 6, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 209.93285, "t": 222.18073000000004, "r": 241.04458999999997, "b": 226.36212, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "\u53c2\u8003\u6587\u732e", "bbox": {"l": 263.76489, "t": 222.18073000000004, "r": 284.50589, "b": 226.36212, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "\u51fa\u5178", "bbox": {"l": 110.24990999999999, "t": 229.66594999999995, "r": 120.62018, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 175.36609, "t": 229.66594999999995, "r": 196.1071, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "\u6570", "bbox": {"l": 196.10756, "t": 229.66594999999995, "r": 201.29247, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "\u82f1\u8a9e", "bbox": {"l": 209.62408, "t": 229.66594999999995, "r": 219.99435, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "\u65e5\u672c\u8a9e", "bbox": {"l": 229.19814, "t": 229.66594999999995, "r": 244.75377, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "\u82f1\u8a9e", "bbox": {"l": 256.1142, "t": 229.66594999999995, "r": 266.48447, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "\u65e5\u672c\u8a9e", "bbox": {"l": 278.38434, "t": 229.66594999999995, "r": 293.93997, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Association for Computational Linguistics(ACL2003)", "bbox": {"l": 55.53052099999999, "t": 236.42584, "r": 162.7131, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "65", "bbox": {"l": 184.39731, "t": 236.42584, "r": 189.56456, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "65", "bbox": {"l": 208.99026, "t": 236.42584, "r": 214.15752, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "0", "bbox": {"l": 234.87517, "t": 236.42584, "r": 237.45833000000002, "b": 
240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "150", "bbox": {"l": 256.88446, "t": 236.42584, "r": 264.6358, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "0", "bbox": {"l": 284.06134, "t": 236.42584, "r": 286.6445, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Computational Linguistics(COLING2002)", "bbox": {"l": 55.53052099999999, "t": 242.62048000000004, "r": 139.72253, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "140", "bbox": {"l": 183.10536, "t": 242.62048000000004, "r": 190.8567, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "140", "bbox": {"l": 207.69832, "t": 242.62048000000004, "r": 215.44965999999997, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "0", "bbox": {"l": 234.87517, "t": 242.62048000000004, "r": 237.45833000000002, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "150", "bbox": {"l": 256.88446, "t": 242.62048000000004, "r": 264.6358, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "0", "bbox": {"l": 284.06134, "t": 242.62048000000004, "r": 286.6445, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a", "bbox": {"l": 55.53052099999999, "t": 249.79845999999998, "r": 97.013, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "2003", "bbox": {"l": 92.698288, "t": 249.58942000000002, "r": 103.03371, "b": 253.94732999999997, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "\u5e74\u7dcf\u5408\u5927\u4f1a", "bbox": {"l": 103.03389, "t": 249.79845999999998, "r": 128.96027, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "150", "bbox": {"l": 183.10536, "t": 248.81506000000002, "r": 190.8567, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "8", "bbox": {"l": 210.28223, "t": 248.81506000000002, "r": 212.86539, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, 
{"id": 32, "text": "142", "bbox": {"l": 232.29153, "t": 248.81506000000002, "r": 240.04287999999997, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "223", "bbox": {"l": 256.88446, "t": 248.81506000000002, "r": 264.6358, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "147", "bbox": {"l": 281.47742, "t": 248.81506000000002, "r": 289.22876, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 257.28369, "r": 91.827637, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "65", "bbox": {"l": 88.052673, "t": 257.07465, "r": 93.219925, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "\u56de\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 93.220474, "t": 257.28369, "r": 119.14685, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 257.07465, "r": 129.88177, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "177", "bbox": {"l": 183.10536, "t": 256.30029, "r": 190.8567, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "1", "bbox": {"l": 210.28223, "t": 256.30029, "r": 212.86539, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "176", "bbox": {"l": 232.29153, "t": 256.30029, "r": 240.04287999999997, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "150", "bbox": {"l": 256.88446, "t": 256.30029, "r": 264.6358, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "236", "bbox": {"l": 281.47742, "t": 256.30029, "r": 289.22876, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "\u7b2c", "bbox": {"l": 55.53052099999999, "t": 264.5108, "r": 60.715424, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "17", "bbox": {"l": 60.17654799999999, "t": 264.30175999999994, 
"r": 65.343796, "b": 268.65967, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "\u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 65.344376, "t": 264.5108, "r": 122.38297000000001, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 264.30175999999994, "r": 129.88177, "b": 268.65967, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "208", "bbox": {"l": 183.10536, "t": 263.52739999999994, "r": 190.8567, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "5", "bbox": {"l": 210.28223, "t": 263.52739999999994, "r": 212.86539, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "203", "bbox": {"l": 232.29153, "t": 263.52739999999994, "r": 240.04287999999997, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "152", "bbox": {"l": 256.88446, "t": 263.52739999999994, "r": 264.6358, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "244", "bbox": {"l": 281.47742, "t": 263.52739999999994, "r": 289.22876, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 271.73785, "r": 107.38374, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "146", "bbox": {"l": 101.99034, "t": 271.52881, "r": 109.74168000000002, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "\u301c", "bbox": {"l": 109.74204, "t": 271.73785, "r": 114.92695000000002, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "155", "bbox": {"l": 114.38793, "t": 271.52881, "r": 122.13927, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "\u56de", "bbox": {"l": 122.13963, "t": 271.73785, "r": 127.32454000000001, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "98", "bbox": {"l": 184.39731, "t": 270.75446, "r": 189.56456, "b": 275.11237000000006, 
"coord_origin": "TOPLEFT"}}, {"id": 59, "text": "2", "bbox": {"l": 210.28223, "t": 270.75446, "r": 212.86539, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "96", "bbox": {"l": 233.58348, "t": 270.75446, "r": 238.75072999999998, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "150", "bbox": {"l": 256.88446, "t": 270.75446, "r": 264.6358, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "232", "bbox": {"l": 281.47742, "t": 270.75446, "r": 289.22876, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "WWW", "bbox": {"l": 55.53052099999999, "t": 279.01392, "r": 68.68605, "b": 283.37183, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "\u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "bbox": {"l": 68.685814, "t": 279.22295999999994, "r": 110.16829999999999, "b": 283.40436, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "107", "bbox": {"l": 183.10536, "t": 277.98157000000003, "r": 190.8567, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "73", "bbox": {"l": 208.99026, "t": 277.98157000000003, "r": 214.15752, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "34", "bbox": {"l": 233.58348, "t": 277.98157000000003, "r": 238.75072999999998, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "147", "bbox": {"l": 256.88446, "t": 277.98157000000003, "r": 264.6358, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "96", "bbox": {"l": 282.76938, "t": 277.98157000000003, "r": 287.93661, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "\u8a08", "bbox": {"l": 169.61508, "t": 286.45004, "r": 174.79999, "b": 290.63141, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "945", "bbox": {"l": 183.10536, "t": 285.46667, "r": 190.8567, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "294", "bbox": {"l": 207.69832, "t": 285.46667, "r": 215.44965999999997, 
"b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "651", "bbox": {"l": 232.29153, "t": 285.46667, "r": 240.04287999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "1122", "bbox": {"l": 255.76506, "t": 285.46667, "r": 265.75204, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "955", "bbox": {"l": 281.47742, "t": 285.46667, "r": 289.22876, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "lcel", "ched", "lcel", "nl", "ched", "ched", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 10, "num_cols": 6, "table_cells": [{"bbox": {"l": 209.93285, "t": 222.18073000000004, "r": 241.04458999999997, "b": 226.36212, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 4, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 263.76489, "t": 222.18073000000004, "r": 284.50589, "b": 226.36212, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 6, "text": "\u53c2\u8003\u6587\u732e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 110.24990999999999, "t": 229.66594999999995, "r": 120.62018, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, 
"end_row_offset_idx": 2, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u51fa\u5178", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 175.36609, "t": 229.66594999999995, "r": 201.29247, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "\u30d5\u30a1\u30a4\u30eb \u6570", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 209.62408, "t": 229.66594999999995, "r": 219.99435, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 229.19814, "t": 229.66594999999995, "r": 244.75377, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 256.1142, "t": 229.66594999999995, "r": 266.48447, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 278.38434, "t": 229.66594999999995, "r": 293.93997, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 236.42584, "r": 162.7131, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, 
"row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Association for Computational Linguistics(ACL2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39731, "t": 236.42584, "r": 189.56456, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026, "t": 236.42584, "r": 214.15752, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.87517, "t": 236.42584, "r": 237.45833000000002, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 236.42584, "r": 264.6358, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134, "t": 236.42584, "r": 286.6445, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 242.62048000000004, "r": 139.72253, "b": 246.97839, 
"coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Computational Linguistics(COLING2002)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 242.62048000000004, "r": 190.8567, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.69832, "t": 242.62048000000004, "r": 215.44965999999997, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.87517, "t": 242.62048000000004, "r": 237.45833000000002, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 242.62048000000004, "r": 264.6358, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134, "t": 242.62048000000004, "r": 286.6445, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 249.58942000000002, "r": 
128.96027, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a 2003 \u5e74\u7dcf\u5408\u5927\u4f1a", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 248.81506000000002, "r": 190.8567, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 248.81506000000002, "r": 212.86539, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 248.81506000000002, "r": 240.04287999999997, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "142", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 248.81506000000002, "r": 264.6358, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "223", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 248.81506000000002, "r": 289.22876, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "147", "column_header": 
false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 257.07465, "r": 129.88177, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c 65 \u56de\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 256.30029, "r": 190.8567, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "177", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 256.30029, "r": 212.86539, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 256.30029, "r": 240.04287999999997, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "176", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 256.30029, "r": 264.6358, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 256.30029, "r": 289.22876, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 
5, "end_col_offset_idx": 6, "text": "236", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 264.30175999999994, "r": 129.88177, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u7b2c 17 \u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 263.52739999999994, "r": 190.8567, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "208", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 263.52739999999994, "r": 212.86539, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 263.52739999999994, "r": 240.04287999999997, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "203", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 263.52739999999994, "r": 264.6358, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "152", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 263.52739999999994, "r": 289.22876, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, 
"start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "244", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 271.52881, "r": 127.32454000000001, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c 146 \u301c 155 \u56de", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39731, "t": 270.75446, "r": 189.56456, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "98", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 270.75446, "r": 212.86539, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348, "t": 270.75446, "r": 238.75072999999998, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 270.75446, "r": 264.6358, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 270.75446, "r": 289.22876, "b": 
275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "232", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 279.01392, "r": 110.16829999999999, "b": 283.40436, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "WWW \u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 277.98157000000003, "r": 190.8567, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "107", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026, "t": 277.98157000000003, "r": 214.15752, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348, "t": 277.98157000000003, "r": 238.75072999999998, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "34", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 277.98157000000003, "r": 264.6358, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "147", "column_header": false, "row_header": false, "row_section": 
false}, {"bbox": {"l": 282.76938, "t": 277.98157000000003, "r": 287.93661, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.10536, "t": 285.46667, "r": 190.8567, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.69832, "t": 285.46667, "r": 215.44965999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "294", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 285.46667, "r": 240.04287999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "651", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 255.76506, "t": 285.46667, "r": 265.75204, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "1122", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 285.46667, "r": 289.22876, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "955", "column_header": false, "row_header": false, "row_section": 
false}]}, {"label": "caption", "id": 7, "page_no": 7, "cluster": {"id": 7, "label": "caption", "bbox": {"l": 380.340087890625, "t": 291.73724365234375, "r": 549.7123413085938, "b": 299.1470642089844, "coord_origin": "TOPLEFT"}, "confidence": 0.7500573396682739, "cells": [{"id": 76, "text": "Text is aligned to match original for ease of viewing", "bbox": {"l": 380.42731, "t": 292.30426, "r": 549.42175, "b": 298.60284, "coord_origin": "TOPLEFT"}}]}, "text": "Text is aligned to match original for ease of viewing"}, {"label": "table", "id": 8, "page_no": 7, "cluster": {"id": 8, "label": "table", "bbox": {"l": 304.9219970703125, "t": 218.51490783691406, "r": 550.2321166992188, "b": 287.9006652832031, "coord_origin": "TOPLEFT"}, "confidence": 0.8900098204612732, "cells": [{"id": 77, "text": "Weighted Average Grant Date Fair", "bbox": {"l": 459.04861, "t": 221.62415, "r": 542.00018, "b": 226.68933000000004, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Value", "bbox": {"l": 493.82193, "t": 227.83416999999997, "r": 507.2258, "b": 232.89935000000003, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "RS", "bbox": {"l": 393.2442, "t": 236.74712999999997, "r": 400.74588, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "U", "bbox": {"l": 400.74643, "t": 236.74712999999997, "r": 404.64523, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "s", "bbox": {"l": 404.6463, "t": 236.74712999999997, "r": 407.34631, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Shares (in millions)", "bbox": {"l": 392.09671, "t": 221.57446000000004, "r": 438.0145, "b": 226.63964999999996, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "PSUs", "bbox": {"l": 427.18323, "t": 236.74712999999997, "r": 440.98778999999996, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "RSUs", "bbox": {"l": 468.38254, "t": 236.74712999999997, "r": 482.48465000000004, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PSUs", "bbox": {"l": 
516.92578, "t": 236.74712999999997, "r": 530.73035, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Nonvested on Janua", "bbox": {"l": 306.11493, "t": 244.61084000000005, "r": 355.6532, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "ry 1", "bbox": {"l": 355.65427, "t": 244.61084000000005, "r": 364.65607, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "1.", "bbox": {"l": 396.24661, "t": 244.91327, "r": 400.75238, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "1", "bbox": {"l": 400.7529, "t": 244.91327, "r": 403.75531, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "0.3", "bbox": {"l": 429.81838999999997, "t": 244.91327, "r": 437.32708999999994, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "90.10", "bbox": {"l": 465.52859, "t": 244.91327, "r": 478.40103, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "$", "bbox": {"l": 480.97552, "t": 244.91327, "r": 483.55001999999996, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "$ 91.19", "bbox": {"l": 513.44824, "t": 244.91327, "r": 531.46967, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Granted", "bbox": {"l": 306.11493, "t": 253.68451000000005, "r": 325.62674, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "0.", "bbox": {"l": 396.24661, "t": 253.68451000000005, "r": 400.75238, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "5", "bbox": {"l": 400.7529, "t": 253.68451000000005, "r": 403.75531, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "0.1", "bbox": {"l": 429.81838999999997, "t": 253.68451000000005, "r": 437.32708999999994, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "117.44", "bbox": {"l": 466.43579000000005, "t": 253.68451000000005, "r": 482.54831, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "122.41", 
"bbox": {"l": 514.29065, "t": 253.68451000000005, "r": 530.80981, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Vested", "bbox": {"l": 306.11493, "t": 261.54822, "r": 322.62866, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "(0.", "bbox": {"l": 394.43222, "t": 261.54822, "r": 400.73563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "5", "bbox": {"l": 400.73456, "t": 261.54822, "r": 403.73697, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": ")", "bbox": {"l": 403.73804, "t": 261.54822, "r": 405.53625, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "(0.1)", "bbox": {"l": 427.7016, "t": 261.54822, "r": 438.80563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "87.08", "bbox": {"l": 468.55533, "t": 261.54822, "r": 482.07043, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "81.14", "bbox": {"l": 516.01862, "t": 261.54822, "r": 529.53375, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Canceled or forfeited", "bbox": {"l": 306.11493, "t": 269.64148, "r": 356.24771, "b": 274.70667000000003, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "(0.", "bbox": {"l": 394.43222, "t": 270.31946000000005, "r": 400.73563, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "1", "bbox": {"l": 400.73456, "t": 270.31946000000005, "r": 403.73697, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": ")", "bbox": {"l": 403.73804, "t": 270.31946000000005, "r": 405.53625, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "-", "bbox": {"l": 431.02802, "t": 270.31946000000005, "r": 436.4280099999999, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "102.01", "bbox": {"l": 465.83099000000004, "t": 270.31946000000005, "r": 482.35013, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "92.18", "bbox": {"l": 516.01862, "t": 
270.31946000000005, "r": 529.53375, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "Nonvested on December 31", "bbox": {"l": 306.11493, "t": 278.48572, "r": 373.35764, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "1.0", "bbox": {"l": 396.24661, "t": 278.48572, "r": 403.75531, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "0.3", "bbox": {"l": 429.51599, "t": 278.48572, "r": 437.02469, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "104.85 $", "bbox": {"l": 463.7142, "t": 278.48572, "r": 484.73965000000004, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "$ 104.51", "bbox": {"l": 512.99463, "t": 278.48572, "r": 534.02008, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ecel", "ched", "lcel", "ched", "lcel", "nl", "ecel", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 5, "table_cells": [{"bbox": {"l": 459.04861, "t": 221.62415, "r": 542.00018, "b": 232.89935000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "Weighted Average Grant Date Fair Value", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 393.2442, "t": 236.74712999999997, "r": 407.34631, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "RS U s", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 392.09671, "t": 221.57446000000004, "r": 438.0145, "b": 226.63964999999996, 
"coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "Shares (in millions)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.18323, "t": 236.74712999999997, "r": 440.98778999999996, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 468.38254, "t": 236.74712999999997, "r": 482.48465000000004, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "RSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 516.92578, "t": 236.74712999999997, "r": 530.73035, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 244.61084000000005, "r": 364.65607, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on Janua ry 1", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.24661, "t": 244.91327, "r": 403.75531, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1. 
1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.81838999999997, "t": 244.91327, "r": 437.32708999999994, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.52859, "t": 244.91327, "r": 483.55001999999996, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "90.10 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.44824, "t": 244.91327, "r": 531.46967, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 91.19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 253.68451000000005, "r": 325.62674, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Granted", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.24661, "t": 253.68451000000005, "r": 403.75531, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "0. 
5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.81838999999997, "t": 253.68451000000005, "r": 437.32708999999994, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 466.43579000000005, "t": 253.68451000000005, "r": 482.54831, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "117.44", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 514.29065, "t": 253.68451000000005, "r": 530.80981, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "122.41", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 261.54822, "r": 322.62866, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Vested", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.43222, "t": 261.54822, "r": 405.53625, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 
5 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 427.7016, "t": 261.54822, "r": 438.80563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "(0.1)", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 468.55533, "t": 261.54822, "r": 482.07043, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "87.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.01862, "t": 261.54822, "r": 529.53375, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "81.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 269.64148, "r": 356.24771, "b": 274.70667000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Canceled or forfeited", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.43222, "t": 270.31946000000005, "r": 405.53625, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 
1 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 431.02802, "t": 270.31946000000005, "r": 436.4280099999999, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.83099000000004, "t": 270.31946000000005, "r": 482.35013, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "102.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.01862, "t": 270.31946000000005, "r": 529.53375, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "92.18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 278.48572, "r": 373.35764, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on December 31", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.24661, "t": 278.48572, "r": 403.75531, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.51599, "t": 278.48572, "r": 437.02469, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, 
"text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 463.7142, "t": 278.48572, "r": 484.73965000000004, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "104.85 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.99463, "t": 278.48572, "r": 534.02008, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 104.51", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "caption", "id": 9, "page_no": 7, "cluster": {"id": 9, "label": "caption", "bbox": {"l": 49.38380813598633, "t": 319.77777099609375, "r": 545.11377, "b": 365.64987, "coord_origin": "TOPLEFT"}, "confidence": 0.9140278697013855, "cells": [{"id": 119, "text": "Figure 5:", "bbox": {"l": 50.112, "t": 320.87735, "r": 86.864021, "b": 329.78391, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration", "bbox": {"l": 93.917542, "t": 320.87735, "r": 545.11371, "b": 329.78391, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "demonstrates TableFormer predictions on previously unseen language (Japanese). 
Additionally, we see that TableFormer is", "bbox": {"l": 50.112, "t": 332.83233999999993, "r": 545.11371, "b": 341.73889, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from", "bbox": {"l": 50.112, "t": 344.78732, "r": 545.11377, "b": 353.69388, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "the FinTabNet dataset.", "bbox": {"l": 50.112, "t": 356.74332, "r": 139.79532, "b": 365.64987, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset."}, {"label": "picture", "id": 10, "page_no": 7, "cluster": {"id": 10, "label": "picture", "bbox": {"l": 216.76930236816406, "t": 380.49066162109375, "r": 375.7828674316406, "b": 443.34698486328125, "coord_origin": "TOPLEFT"}, "confidence": 0.8057794570922852, "cells": [{"id": 124, "text": "Red - PDF cells, Green - predicted bounding boxes", "bbox": {"l": 220.26282, "t": 381.77722, "r": 342.07819, "b": 386.44281, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 11, "page_no": 7, "cluster": {"id": 11, "label": "picture", "bbox": {"l": 51.73619842529297, "t": 380.48077392578125, "r": 211.83766174316406, "b": 443.65802001953125, "coord_origin": "TOPLEFT"}, "confidence": 0.8307981491088867, "cells": [{"id": 125, "text": "Ground Truth", "bbox": {"l": 53.715248, "t": 381.77722, "r": 85.657333, "b": 386.44281, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, 
{"label": "picture", "id": 12, "page_no": 7, "cluster": {"id": 12, "label": "picture", "bbox": {"l": 383.13629150390625, "t": 381.2313232421875, "r": 542.1132202148438, "b": 442.7750244140625, "coord_origin": "TOPLEFT"}, "confidence": 0.7880472540855408, "cells": [{"id": 126, "text": "16", "bbox": {"l": 437.37939, "t": 400.55295, "r": 443.69870000000003, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "17", "bbox": {"l": 450.33203, "t": 400.55295, "r": 456.6513100000001, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "18", "bbox": {"l": 463.28464, "t": 400.55295, "r": 469.60394, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "19", "bbox": {"l": 476.23724000000004, "t": 400.55295, "r": 482.5565500000001, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "20", "bbox": {"l": 489.18988, "t": 400.55295, "r": 495.50916, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "21", "bbox": {"l": 502.14251999999993, "t": 400.55295, "r": 508.46178999999995, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "22", "bbox": {"l": 515.09509, "t": 400.55295, "r": 521.41443, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "23", "bbox": {"l": 385.2814, "t": 411.03836000000007, "r": 391.60071, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "24", "bbox": {"l": 398.52341, "t": 411.03836000000007, "r": 404.84271, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "25", "bbox": {"l": 411.47604, "t": 411.03836000000007, "r": 417.79535, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "26", "bbox": {"l": 437.37939, "t": 411.03836000000007, "r": 443.69870000000003, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "27", "bbox": {"l": 450.33203, "t": 411.03836000000007, "r": 456.6513100000001, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "28", "bbox": {"l": 463.28464, "t": 411.03836000000007, "r": 
469.60394, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "30", "bbox": {"l": 385.2814, "t": 421.0697, "r": 391.60071, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "31", "bbox": {"l": 398.52341, "t": 421.0697, "r": 404.84271, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "32", "bbox": {"l": 411.47604, "t": 421.0697, "r": 417.79532, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "33", "bbox": {"l": 424.42865, "t": 421.0697, "r": 430.74796, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "34", "bbox": {"l": 437.38129, "t": 421.0697, "r": 443.70056, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "35", "bbox": {"l": 450.33389000000005, "t": 421.0697, "r": 456.65319999999997, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "36", "bbox": {"l": 463.2865, "t": 421.0697, "r": 469.6058, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "37", "bbox": {"l": 476.23914, "t": 421.0697, "r": 482.55841, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "38", "bbox": {"l": 489.1917700000001, "t": 421.0697, "r": 495.51105, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "39", "bbox": {"l": 502.14438, "t": 421.0697, "r": 508.46368, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "40", "bbox": {"l": 515.09705, "t": 421.0697, "r": 521.41632, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "41", "bbox": {"l": 528.04962, "t": 421.0697, "r": 534.3689, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "42", "bbox": {"l": 385.2814, "t": 432.04431, "r": 391.60071, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "43", "bbox": {"l": 398.52341, "t": 432.04431, "r": 404.84271, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "44", "bbox": {"l": 411.47604, "t": 432.04431, "r": 417.79532, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 154, 
"text": "45", "bbox": {"l": 424.42865, "t": 432.04431, "r": 430.74796, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "46", "bbox": {"l": 437.38129, "t": 432.04431, "r": 443.70056, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "47", "bbox": {"l": 450.33389000000005, "t": 432.04431, "r": 456.65319999999997, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "48", "bbox": {"l": 463.2865, "t": 432.04431, "r": 469.6058, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "49", "bbox": {"l": 476.23914, "t": 432.04431, "r": 482.55841, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "50", "bbox": {"l": 489.1917700000001, "t": 432.04431, "r": 495.51105, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "51", "bbox": {"l": 502.14438, "t": 432.04431, "r": 508.46368, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "52", "bbox": {"l": 515.09705, "t": 432.04431, "r": 521.41632, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "53", "bbox": {"l": 528.04962, "t": 432.04431, "r": 534.3689, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "0", "bbox": {"l": 385.2814, "t": 389.20004, "r": 388.44073, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "1", "bbox": {"l": 398.52341, "t": 389.20004, "r": 401.68274, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "2", "bbox": {"l": 411.4754, "t": 389.20004, "r": 414.63474, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "3", "bbox": {"l": 424.4274, "t": 389.20004, "r": 427.58673, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "4", "bbox": {"l": 437.37939, "t": 389.20004, "r": 440.53870000000006, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "5", "bbox": {"l": 450.33136, "t": 389.20004, "r": 453.49069000000003, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "6", "bbox": {"l": 463.28336, "t": 
389.20004, "r": 466.44269, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "7", "bbox": {"l": 476.23535, "t": 389.20004, "r": 479.39468, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "8", "bbox": {"l": 489.18735, "t": 389.20004, "r": 492.34668, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "9", "bbox": {"l": 502.13933999999995, "t": 389.20004, "r": 505.29868000000005, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "10", "bbox": {"l": 515.09131, "t": 389.20004, "r": 521.41064, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "11", "bbox": {"l": 528.04364, "t": 389.20004, "r": 534.13104, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "12", "bbox": {"l": 385.2814, "t": 398.97464, "r": 391.60071, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "13", "bbox": {"l": 398.52341, "t": 398.97464, "r": 404.84271, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "14", "bbox": {"l": 411.47604, "t": 398.97464, "r": 417.79535, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "15", "bbox": {"l": 424.42719, "t": 406.77463000000006, "r": 430.74648999999994, "b": 413.09326, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "29", "bbox": {"l": 502.86941999999993, "t": 410.99438, "r": 509.18871999999993, "b": 417.31302, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "Predicted Structure", "bbox": {"l": 384.35437, "t": 381.77722, "r": 430.99261, "b": 386.44281, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", "id": 13, "page_no": 7, "cluster": {"id": 13, "label": "caption", "bbox": {"l": 62.00014114379883, "t": 457.9582824707031, "r": 532.63049, "b": 467.8396301269531, "coord_origin": "TOPLEFT"}, "confidence": 0.9153729677200317, "cells": [{"id": 181, "text": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) 
from generated SynthTabNet table.", "bbox": {"l": 62.595001, "t": 458.72836, "r": 532.63049, "b": 467.63492, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table."}, {"label": "section_header", "id": 14, "page_no": 7, "cluster": {"id": 14, "label": "section_header", "bbox": {"l": 49.432151794433594, "t": 490.32525634765625, "r": 164.0072479248047, "b": 501.24741, "coord_origin": "TOPLEFT"}, "confidence": 0.9561254978179932, "cells": [{"id": 182, "text": "5.5.", "bbox": {"l": 50.112, "t": 491.39536, "r": 64.448898, "b": 501.24741, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "Qualitative Analysis", "bbox": {"l": 74.006828, "t": 491.39536, "r": 163.7558, "b": 501.24741, "coord_origin": "TOPLEFT"}}]}, "text": "5.5. Qualitative Analysis"}, {"label": "text", "id": 15, "page_no": 7, "cluster": {"id": 15, "label": "text", "bbox": {"l": 49.34812545776367, "t": 535.7822875976562, "r": 286.7104187011719, "b": 713.9708251953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9852354526519775, "cells": [{"id": 184, "text": "We showcase several visualizations for the different", "bbox": {"l": 62.067001, "t": 536.87337, "r": 286.36499, "b": 545.77992, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "components of our network on various", "bbox": {"l": 50.112, "t": 548.82837, "r": 211.15741, "b": 557.73492, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "\u201ccomplex\u201d", "bbox": {"l": 215.10000999999997, "t": 548.91803, "r": 259.17453, "b": 557.50578, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "tables", "bbox": {"l": 263.12, "t": 548.82837, "r": 286.36273, "b": 557.73492, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "within datasets presented in this work in Fig. 5 and Fig. 
6", "bbox": {"l": 50.112, "t": 560.78337, "r": 286.36505, "b": 569.68993, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "As it is shown, our model is able to predict bounding boxes", "bbox": {"l": 50.112, "t": 572.73837, "r": 286.36508, "b": 581.6449299999999, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "for all table cells, even for the empty ones. Additionally,", "bbox": {"l": 50.112, "t": 584.69337, "r": 286.36508, "b": 593.59993, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "our post-processing techniques can extract the cell content", "bbox": {"l": 50.112, "t": 596.64937, "r": 286.36505, "b": 605.55592, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "by matching the predicted bounding boxes to the PDF cells", "bbox": {"l": 50.112, "t": 608.60437, "r": 286.36508, "b": 617.51093, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "based on their overlap and spatial proximity. The left part", "bbox": {"l": 50.112, "t": 620.55937, "r": 286.36508, "b": 629.46593, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "of Fig. 5 demonstrates also the adaptability of our method", "bbox": {"l": 50.112, "t": 632.51437, "r": 286.36508, "b": 641.42093, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "to any language, as it can successfully extract Japanese", "bbox": {"l": 50.112, "t": 644.46938, "r": 286.36508, "b": 653.37593, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "text, although the training set contains only English content.", "bbox": {"l": 50.112, "t": 656.42438, "r": 286.36511, "b": 665.33094, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "We provide more visualizations including the intermediate", "bbox": {"l": 50.112, "t": 668.38037, "r": 286.36508, "b": 677.28694, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "steps in the supplementary material. 
Overall these illustra-", "bbox": {"l": 50.112, "t": 680.33537, "r": 286.36511, "b": 689.24194, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "tions justify the versatility of our method across a diverse", "bbox": {"l": 50.112, "t": 692.290375, "r": 286.36511, "b": 701.196945, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "range of table appearances and content type.", "bbox": {"l": 50.112, "t": 704.245377, "r": 226.88833999999997, "b": 713.1519470000001, "coord_origin": "TOPLEFT"}}]}, "text": "We showcase several visualizations for the different components of our network on various \u201ccomplex\u201d tables within datasets presented in this work in Fig. 5 and Fig. 6 As it is shown, our model is able to predict bounding boxes for all table cells, even for the empty ones. Additionally, our post-processing techniques can extract the cell content by matching the predicted bounding boxes to the PDF cells based on their overlap and spatial proximity. The left part of Fig. 5 demonstrates also the adaptability of our method to any language, as it can successfully extract Japanese text, although the training set contains only English content. We provide more visualizations including the intermediate steps in the supplementary material. Overall these illustrations justify the versatility of our method across a diverse range of table appearances and content type."}, {"label": "section_header", "id": 16, "page_no": 7, "cluster": {"id": 16, "label": "section_header", "bbox": {"l": 308.2789306640625, "t": 489.6516418457031, "r": 460.84848, "b": 501.45663, "coord_origin": "TOPLEFT"}, "confidence": 0.9436547160148621, "cells": [{"id": 201, "text": "6.", "bbox": {"l": 308.862, "t": 490.70892, "r": 316.07382, "b": 501.45663, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "Future Work & Conclusion", "bbox": {"l": 325.68954, "t": 490.70892, "r": 460.84848, "b": 501.45663, "coord_origin": "TOPLEFT"}}]}, "text": "6. 
Future Work & Conclusion"}, {"label": "text", "id": 17, "page_no": 7, "cluster": {"id": 17, "label": "text", "bbox": {"l": 307.99957275390625, "t": 511.78887939453125, "r": 545.2568359375, "b": 653.30592, "coord_origin": "TOPLEFT"}, "confidence": 0.9875592589378357, "cells": [{"id": 203, "text": "In this paper, we presented TableFormer an end-to-end", "bbox": {"l": 320.81699, "t": 512.89337, "r": 545.11505, "b": 521.79993, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "transformer based approach to predict table structures and", "bbox": {"l": 308.862, "t": 524.84836, "r": 545.11517, "b": 533.75491, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "bounding boxes of cells from an image. This approach en-", "bbox": {"l": 308.862, "t": 536.80336, "r": 545.11511, "b": 545.70992, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "ables us to recreate the table structure, and extract the cell", "bbox": {"l": 308.862, "t": 548.75836, "r": 545.11505, "b": 557.6649199999999, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "content from PDF or OCR by using bounding boxes. Ad-", "bbox": {"l": 308.862, "t": 560.71336, "r": 545.11517, "b": 569.61992, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "ditionally, it provides the versatility required in real-world", "bbox": {"l": 308.862, "t": 572.66837, "r": 545.11511, "b": 581.57492, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "scenarios when dealing with various types of PDF docu-", "bbox": {"l": 308.862, "t": 584.62436, "r": 545.11511, "b": 593.53091, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "ments, and languages.", "bbox": {"l": 308.862, "t": 596.57936, "r": 400.46808, "b": 605.48592, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "Furthermore, our method outper-", "bbox": {"l": 408.37839, "t": 596.57936, "r": 545.11511, "b": 605.48592, "coord_origin": "TOPLEFT"}}, {"id": 212, "text": "forms all state-of-the-arts with a wide margin. 
Finally, we", "bbox": {"l": 308.862, "t": 608.53436, "r": 545.11505, "b": 617.44092, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "introduce \u201cSynthTabNet\u201d a challenging synthetically gen-", "bbox": {"l": 308.862, "t": 620.48936, "r": 545.11511, "b": 629.3959199999999, "coord_origin": "TOPLEFT"}}, {"id": 214, "text": "erated dataset that reinforces missing characteristics from", "bbox": {"l": 308.862, "t": 632.4443699999999, "r": 545.11505, "b": 641.35092, "coord_origin": "TOPLEFT"}}, {"id": 215, "text": "other datasets.", "bbox": {"l": 308.862, "t": 644.39937, "r": 365.85803, "b": 653.30592, "coord_origin": "TOPLEFT"}}]}, "text": "In this paper, we presented TableFormer an end-to-end transformer based approach to predict table structures and bounding boxes of cells from an image. This approach enables us to recreate the table structure, and extract the cell content from PDF or OCR by using bounding boxes. Additionally, it provides the versatility required in real-world scenarios when dealing with various types of PDF documents, and languages. Furthermore, our method outperforms all state-of-the-arts with a wide margin. 
Finally, we introduce \u201cSynthTabNet\u201d a challenging synthetically generated dataset that reinforces missing characteristics from other datasets."}, {"label": "section_header", "id": 18, "page_no": 7, "cluster": {"id": 18, "label": "section_header", "bbox": {"l": 308.3702392578125, "t": 671.6679077148438, "r": 364.48675537109375, "b": 682.84664, "coord_origin": "TOPLEFT"}, "confidence": 0.9442476034164429, "cells": [{"id": 216, "text": "References", "bbox": {"l": 308.862, "t": 672.09892, "r": 364.40585, "b": 682.84664, "coord_origin": "TOPLEFT"}}]}, "text": "References"}, {"label": "list_item", "id": 19, "page_no": 7, "cluster": {"id": 19, "label": "list_item", "bbox": {"l": 313.0051574707031, "t": 692.8663940429688, "r": 545.1151123046875, "b": 713.3478393554688, "coord_origin": "TOPLEFT"}, "confidence": 0.8318727612495422, "cells": [{"id": 217, "text": "[1]", "bbox": {"l": 313.345, "t": 693.9617920000001, "r": 323.80792, "b": 701.977753, "coord_origin": "TOPLEFT"}}, {"id": 218, "text": "Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas", "bbox": {"l": 326.05127, "t": 693.9617920000001, "r": 545.10852, "b": 701.977753, "coord_origin": "TOPLEFT"}}, {"id": 219, "text": "Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-", "bbox": {"l": 328.78101, "t": 704.920792, "r": 545.1134, "b": 712.936752, "coord_origin": "TOPLEFT"}}]}, "text": "[1] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 
End-to-"}, {"label": "page_footer", "id": 20, "page_no": 7, "cluster": {"id": 20, "label": "page_footer", "bbox": {"l": 294.6642761230469, "t": 733.6441650390625, "r": 300.13397216796875, "b": 743.039928, "coord_origin": "TOPLEFT"}, "confidence": 0.8709819912910461, "cells": [{"id": 220, "text": "8", "bbox": {"l": 295.121, "t": 734.133366, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}}]}, "text": "8"}, {"label": "picture", "id": 21, "page_no": 7, "cluster": {"id": 21, "label": "picture", "bbox": {"l": 49.97499084472656, "t": 103.71235656738281, "r": 301.6349182128906, "b": 187.57875061035156, "coord_origin": "TOPLEFT"}, "confidence": 0.7873188853263855, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "body": [{"label": "text", "id": 0, "page_no": 7, "cluster": {"id": 0, "label": "text", "bbox": {"l": 53.811783000000005, "t": 208.23328000000004, "r": 62.219952, "b": 216.10645, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 0, "text": "b.", "bbox": {"l": 53.811783000000005, "t": 208.23328000000004, "r": 62.219952, "b": 216.10645, "coord_origin": "TOPLEFT"}}]}, "text": "b."}, {"label": "text", "id": 1, "page_no": 7, "cluster": {"id": 1, "label": "text", "bbox": {"l": 66.424026, "t": 208.23328000000004, "r": 385.93451, "b": 216.10645, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 1, "text": "Structure predicted by TableFormer, with superimposed matched PDF cell text:", "bbox": {"l": 66.424026, "t": 208.23328000000004, "r": 385.93451, "b": 216.10645, "coord_origin": "TOPLEFT"}}]}, "text": "Structure predicted by TableFormer, with superimposed matched PDF cell text:"}, {"label": "text", "id": 2, "page_no": 7, "cluster": {"id": 2, "label": "text", "bbox": {"l": 53.811783000000005, "t": 94.28112999999996, "r": 284.34592, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 2, "text": "Japanese language (previously unseen by 
TableFormer):", "bbox": {"l": 53.811783000000005, "t": 94.28112999999996, "r": 284.34592, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}}]}, "text": "Japanese language (previously unseen by TableFormer):"}, {"label": "picture", "id": 3, "page_no": 7, "cluster": {"id": 3, "label": "picture", "bbox": {"l": 304.83081, "t": 94.28112999999996, "r": 554.8255615234375, "b": 180.62570190429688, "coord_origin": "TOPLEFT"}, "confidence": 0.7697690725326538, "cells": [{"id": 3, "text": "Example table from FinTabNet:", "bbox": {"l": 304.83081, "t": 94.28112999999996, "r": 431.09119, "b": 102.15430000000003, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "text", "id": 4, "page_no": 7, "cluster": {"id": 4, "label": "text", "bbox": {"l": 53.286037, "t": 78.68756000000008, "r": 61.550289, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}, "confidence": -1.0, "cells": [{"id": 4, "text": "a.", "bbox": {"l": 53.286037, "t": 78.68756000000008, "r": 61.550289, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}}]}, "text": "a."}, {"label": "text", "id": 5, "page_no": 7, "cluster": {"id": 5, "label": "text", "bbox": {"l": 65.682419, "t": 77.8168716430664, "r": 500.1541748046875, "b": 86.9799575805664, "coord_origin": "TOPLEFT"}, "confidence": 0.6126988530158997, "cells": [{"id": 5, "text": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells", "bbox": {"l": 65.682419, "t": 78.68756000000008, "r": 499.55563, "b": 86.56073000000004, "coord_origin": "TOPLEFT"}}]}, "text": "Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells"}, {"label": "table", "id": 6, "page_no": 7, "cluster": {"id": 6, "label": "table", "bbox": {"l": 53.6285400390625, "t": 218.94859313964844, "r": 298.5574035644531, "b": 292.3999938964844, "coord_origin": "TOPLEFT"}, "confidence": 0.8824083805084229, "cells": [{"id": 6, 
"text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 209.93285, "t": 222.18073000000004, "r": 241.04458999999997, "b": 226.36212, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "\u53c2\u8003\u6587\u732e", "bbox": {"l": 263.76489, "t": 222.18073000000004, "r": 284.50589, "b": 226.36212, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "\u51fa\u5178", "bbox": {"l": 110.24990999999999, "t": 229.66594999999995, "r": 120.62018, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "\u30d5\u30a1\u30a4\u30eb", "bbox": {"l": 175.36609, "t": 229.66594999999995, "r": 196.1071, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "\u6570", "bbox": {"l": 196.10756, "t": 229.66594999999995, "r": 201.29247, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "\u82f1\u8a9e", "bbox": {"l": 209.62408, "t": 229.66594999999995, "r": 219.99435, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "\u65e5\u672c\u8a9e", "bbox": {"l": 229.19814, "t": 229.66594999999995, "r": 244.75377, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "\u82f1\u8a9e", "bbox": {"l": 256.1142, "t": 229.66594999999995, "r": 266.48447, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "\u65e5\u672c\u8a9e", "bbox": {"l": 278.38434, "t": 229.66594999999995, "r": 293.93997, "b": 233.84735, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Association for Computational Linguistics(ACL2003)", "bbox": {"l": 55.53052099999999, "t": 236.42584, "r": 162.7131, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "65", "bbox": {"l": 184.39731, "t": 236.42584, "r": 189.56456, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "65", "bbox": {"l": 208.99026, "t": 236.42584, "r": 214.15752, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "0", "bbox": {"l": 234.87517, "t": 236.42584, "r": 237.45833000000002, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": 
"150", "bbox": {"l": 256.88446, "t": 236.42584, "r": 264.6358, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "0", "bbox": {"l": 284.06134, "t": 236.42584, "r": 286.6445, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Computational Linguistics(COLING2002)", "bbox": {"l": 55.53052099999999, "t": 242.62048000000004, "r": 139.72253, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "140", "bbox": {"l": 183.10536, "t": 242.62048000000004, "r": 190.8567, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "140", "bbox": {"l": 207.69832, "t": 242.62048000000004, "r": 215.44965999999997, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "0", "bbox": {"l": 234.87517, "t": 242.62048000000004, "r": 237.45833000000002, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "150", "bbox": {"l": 256.88446, "t": 242.62048000000004, "r": 264.6358, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "0", "bbox": {"l": 284.06134, "t": 242.62048000000004, "r": 286.6445, "b": 246.97839, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a", "bbox": {"l": 55.53052099999999, "t": 249.79845999999998, "r": 97.013, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "2003", "bbox": {"l": 92.698288, "t": 249.58942000000002, "r": 103.03371, "b": 253.94732999999997, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "\u5e74\u7dcf\u5408\u5927\u4f1a", "bbox": {"l": 103.03389, "t": 249.79845999999998, "r": 128.96027, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "150", "bbox": {"l": 183.10536, "t": 248.81506000000002, "r": 190.8567, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "8", "bbox": {"l": 210.28223, "t": 248.81506000000002, "r": 212.86539, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "142", "bbox": {"l": 232.29153, "t": 
248.81506000000002, "r": 240.04287999999997, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "223", "bbox": {"l": 256.88446, "t": 248.81506000000002, "r": 264.6358, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "147", "bbox": {"l": 281.47742, "t": 248.81506000000002, "r": 289.22876, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 257.28369, "r": 91.827637, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "65", "bbox": {"l": 88.052673, "t": 257.07465, "r": 93.219925, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "\u56de\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 93.220474, "t": 257.28369, "r": 119.14685, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 257.07465, "r": 129.88177, "b": 261.43255999999997, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "177", "bbox": {"l": 183.10536, "t": 256.30029, "r": 190.8567, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "1", "bbox": {"l": 210.28223, "t": 256.30029, "r": 212.86539, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "176", "bbox": {"l": 232.29153, "t": 256.30029, "r": 240.04287999999997, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "150", "bbox": {"l": 256.88446, "t": 256.30029, "r": 264.6358, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "236", "bbox": {"l": 281.47742, "t": 256.30029, "r": 289.22876, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "\u7b2c", "bbox": {"l": 55.53052099999999, "t": 264.5108, "r": 60.715424, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "17", "bbox": {"l": 60.17654799999999, "t": 264.30175999999994, "r": 65.343796, "b": 268.65967, "coord_origin": 
"TOPLEFT"}}, {"id": 46, "text": "\u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a", "bbox": {"l": 65.344376, "t": 264.5108, "r": 122.38297000000001, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "(2003)", "bbox": {"l": 116.45073999999998, "t": 264.30175999999994, "r": 129.88177, "b": 268.65967, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "208", "bbox": {"l": 183.10536, "t": 263.52739999999994, "r": 190.8567, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "5", "bbox": {"l": 210.28223, "t": 263.52739999999994, "r": 212.86539, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "203", "bbox": {"l": 232.29153, "t": 263.52739999999994, "r": 240.04287999999997, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "152", "bbox": {"l": 256.88446, "t": 263.52739999999994, "r": 264.6358, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "244", "bbox": {"l": 281.47742, "t": 263.52739999999994, "r": 289.22876, "b": 267.88531, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c", "bbox": {"l": 55.53052099999999, "t": 271.73785, "r": 107.38374, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "146", "bbox": {"l": 101.99034, "t": 271.52881, "r": 109.74168000000002, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "\u301c", "bbox": {"l": 109.74204, "t": 271.73785, "r": 114.92695000000002, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "155", "bbox": {"l": 114.38793, "t": 271.52881, "r": 122.13927, "b": 275.88671999999997, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "\u56de", "bbox": {"l": 122.13963, "t": 271.73785, "r": 127.32454000000001, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "98", "bbox": {"l": 184.39731, "t": 270.75446, "r": 189.56456, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": 
"2", "bbox": {"l": 210.28223, "t": 270.75446, "r": 212.86539, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "96", "bbox": {"l": 233.58348, "t": 270.75446, "r": 238.75072999999998, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "150", "bbox": {"l": 256.88446, "t": 270.75446, "r": 264.6358, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "232", "bbox": {"l": 281.47742, "t": 270.75446, "r": 289.22876, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "WWW", "bbox": {"l": 55.53052099999999, "t": 279.01392, "r": 68.68605, "b": 283.37183, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "\u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "bbox": {"l": 68.685814, "t": 279.22295999999994, "r": 110.16829999999999, "b": 283.40436, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "107", "bbox": {"l": 183.10536, "t": 277.98157000000003, "r": 190.8567, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "73", "bbox": {"l": 208.99026, "t": 277.98157000000003, "r": 214.15752, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "34", "bbox": {"l": 233.58348, "t": 277.98157000000003, "r": 238.75072999999998, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "147", "bbox": {"l": 256.88446, "t": 277.98157000000003, "r": 264.6358, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "96", "bbox": {"l": 282.76938, "t": 277.98157000000003, "r": 287.93661, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "\u8a08", "bbox": {"l": 169.61508, "t": 286.45004, "r": 174.79999, "b": 290.63141, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "945", "bbox": {"l": 183.10536, "t": 285.46667, "r": 190.8567, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "294", "bbox": {"l": 207.69832, "t": 285.46667, "r": 215.44965999999997, "b": 289.8245800000001, "coord_origin": 
"TOPLEFT"}}, {"id": 73, "text": "651", "bbox": {"l": 232.29153, "t": 285.46667, "r": 240.04287999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "1122", "bbox": {"l": 255.76506, "t": 285.46667, "r": 265.75204, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "955", "bbox": {"l": 281.47742, "t": 285.46667, "r": 289.22876, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ched", "ched", "ched", "lcel", "ched", "lcel", "nl", "ched", "ched", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 10, "num_cols": 6, "table_cells": [{"bbox": {"l": 209.93285, "t": 222.18073000000004, "r": 241.04458999999997, "b": 226.36212, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 2, "end_col_offset_idx": 4, "text": "\u8ad6\u6587\u30d5\u30a1\u30a4\u30eb", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 263.76489, "t": 222.18073000000004, "r": 284.50589, "b": 226.36212, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 4, "end_col_offset_idx": 6, "text": "\u53c2\u8003\u6587\u732e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 110.24990999999999, "t": 229.66594999999995, "r": 120.62018, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 0, 
"end_col_offset_idx": 1, "text": "\u51fa\u5178", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 175.36609, "t": 229.66594999999995, "r": 201.29247, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "\u30d5\u30a1\u30a4\u30eb \u6570", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 209.62408, "t": 229.66594999999995, "r": 219.99435, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 229.19814, "t": 229.66594999999995, "r": 244.75377, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 256.1142, "t": 229.66594999999995, "r": 266.48447, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "\u82f1\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 278.38434, "t": 229.66594999999995, "r": 293.93997, "b": 233.84735, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "\u65e5\u672c\u8a9e", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 236.42584, "r": 162.7131, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 
2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Association for Computational Linguistics(ACL2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39731, "t": 236.42584, "r": 189.56456, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026, "t": 236.42584, "r": 214.15752, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "65", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.87517, "t": 236.42584, "r": 237.45833000000002, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 236.42584, "r": 264.6358, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134, "t": 236.42584, "r": 286.6445, "b": 240.78375000000005, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 242.62048000000004, "r": 139.72253, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, 
"start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Computational Linguistics(COLING2002)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 242.62048000000004, "r": 190.8567, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.69832, "t": 242.62048000000004, "r": 215.44965999999997, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "140", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 234.87517, "t": 242.62048000000004, "r": 237.45833000000002, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 242.62048000000004, "r": 264.6358, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 284.06134, "t": 242.62048000000004, "r": 286.6445, "b": 246.97839, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 249.58942000000002, "r": 128.96027, "b": 253.97986000000003, "coord_origin": "TOPLEFT"}, 
"row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u96fb\u6c17\u60c5\u5831\u901a\u4fe1\u5b66\u4f1a 2003 \u5e74\u7dcf\u5408\u5927\u4f1a", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 248.81506000000002, "r": 190.8567, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 248.81506000000002, "r": 212.86539, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "8", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 248.81506000000002, "r": 240.04287999999997, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "142", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 248.81506000000002, "r": 264.6358, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "223", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 248.81506000000002, "r": 289.22876, "b": 253.17296999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": 
{"l": 55.53052099999999, "t": 257.07465, "r": 129.88177, "b": 261.46509000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u60c5\u5831\u51e6\u7406\u5b66\u4f1a\u7b2c 65 \u56de\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 256.30029, "r": 190.8567, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "177", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 256.30029, "r": 212.86539, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 256.30029, "r": 240.04287999999997, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "176", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 256.30029, "r": 264.6358, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 256.30029, "r": 289.22876, "b": 260.65819999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "236", "column_header": 
false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 264.30175999999994, "r": 129.88177, "b": 268.69219999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u7b2c 17 \u56de\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a (2003)", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 263.52739999999994, "r": 190.8567, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "208", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 263.52739999999994, "r": 212.86539, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 263.52739999999994, "r": 240.04287999999997, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "203", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 263.52739999999994, "r": 264.6358, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "152", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 263.52739999999994, "r": 289.22876, "b": 267.88531, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, 
"start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "244", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 271.52881, "r": 127.32454000000001, "b": 275.91925000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u7814\u7a76\u4f1a\u7b2c 146 \u301c 155 \u56de", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 184.39731, "t": 270.75446, "r": 189.56456, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "98", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 210.28223, "t": 270.75446, "r": 212.86539, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "2", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348, "t": 270.75446, "r": 238.75072999999998, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 270.75446, "r": 264.6358, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "150", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 270.75446, "r": 289.22876, "b": 275.11237000000006, "coord_origin": "TOPLEFT"}, "row_span": 1, 
"col_span": 1, "start_row_offset_idx": 7, "end_row_offset_idx": 8, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "232", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 55.53052099999999, "t": 279.01392, "r": 110.16829999999999, "b": 283.40436, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "WWW \u304b\u3089\u53ce\u96c6\u3057\u305f\u8ad6\u6587", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 183.10536, "t": 277.98157000000003, "r": 190.8567, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "107", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 208.99026, "t": 277.98157000000003, "r": 214.15752, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "73", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 233.58348, "t": 277.98157000000003, "r": 238.75072999999998, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "34", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 256.88446, "t": 277.98157000000003, "r": 264.6358, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "147", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 282.76938, "t": 277.98157000000003, "r": 
287.93661, "b": 282.33948000000004, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 8, "end_row_offset_idx": 9, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "96", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 183.10536, "t": 285.46667, "r": 190.8567, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "945", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 207.69832, "t": 285.46667, "r": 215.44965999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "294", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 232.29153, "t": 285.46667, "r": 240.04287999999997, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "651", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 255.76506, "t": 285.46667, "r": 265.75204, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "1122", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 281.47742, "t": 285.46667, "r": 289.22876, "b": 289.8245800000001, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 9, "end_row_offset_idx": 10, "start_col_offset_idx": 5, "end_col_offset_idx": 6, "text": "955", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "caption", "id": 7, "page_no": 7, "cluster": 
{"id": 7, "label": "caption", "bbox": {"l": 380.340087890625, "t": 291.73724365234375, "r": 549.7123413085938, "b": 299.1470642089844, "coord_origin": "TOPLEFT"}, "confidence": 0.7500573396682739, "cells": [{"id": 76, "text": "Text is aligned to match original for ease of viewing", "bbox": {"l": 380.42731, "t": 292.30426, "r": 549.42175, "b": 298.60284, "coord_origin": "TOPLEFT"}}]}, "text": "Text is aligned to match original for ease of viewing"}, {"label": "table", "id": 8, "page_no": 7, "cluster": {"id": 8, "label": "table", "bbox": {"l": 304.9219970703125, "t": 218.51490783691406, "r": 550.2321166992188, "b": 287.9006652832031, "coord_origin": "TOPLEFT"}, "confidence": 0.8900098204612732, "cells": [{"id": 77, "text": "Weighted Average Grant Date Fair", "bbox": {"l": 459.04861, "t": 221.62415, "r": 542.00018, "b": 226.68933000000004, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Value", "bbox": {"l": 493.82193, "t": 227.83416999999997, "r": 507.2258, "b": 232.89935000000003, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "RS", "bbox": {"l": 393.2442, "t": 236.74712999999997, "r": 400.74588, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "U", "bbox": {"l": 400.74643, "t": 236.74712999999997, "r": 404.64523, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "s", "bbox": {"l": 404.6463, "t": 236.74712999999997, "r": 407.34631, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Shares (in millions)", "bbox": {"l": 392.09671, "t": 221.57446000000004, "r": 438.0145, "b": 226.63964999999996, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "PSUs", "bbox": {"l": 427.18323, "t": 236.74712999999997, "r": 440.98778999999996, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "RSUs", "bbox": {"l": 468.38254, "t": 236.74712999999997, "r": 482.48465000000004, "b": 241.81232, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "PSUs", "bbox": {"l": 516.92578, "t": 236.74712999999997, "r": 530.73035, "b": 241.81232, 
"coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Nonvested on Janua", "bbox": {"l": 306.11493, "t": 244.61084000000005, "r": 355.6532, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "ry 1", "bbox": {"l": 355.65427, "t": 244.61084000000005, "r": 364.65607, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "1.", "bbox": {"l": 396.24661, "t": 244.91327, "r": 400.75238, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "1", "bbox": {"l": 400.7529, "t": 244.91327, "r": 403.75531, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "0.3", "bbox": {"l": 429.81838999999997, "t": 244.91327, "r": 437.32708999999994, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "90.10", "bbox": {"l": 465.52859, "t": 244.91327, "r": 478.40103, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "$", "bbox": {"l": 480.97552, "t": 244.91327, "r": 483.55001999999996, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "$ 91.19", "bbox": {"l": 513.44824, "t": 244.91327, "r": 531.46967, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Granted", "bbox": {"l": 306.11493, "t": 253.68451000000005, "r": 325.62674, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "0.", "bbox": {"l": 396.24661, "t": 253.68451000000005, "r": 400.75238, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "5", "bbox": {"l": 400.7529, "t": 253.68451000000005, "r": 403.75531, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "0.1", "bbox": {"l": 429.81838999999997, "t": 253.68451000000005, "r": 437.32708999999994, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "117.44", "bbox": {"l": 466.43579000000005, "t": 253.68451000000005, "r": 482.54831, "b": 258.74969, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "122.41", "bbox": {"l": 514.29065, "t": 253.68451000000005, "r": 530.80981, "b": 
258.74969, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Vested", "bbox": {"l": 306.11493, "t": 261.54822, "r": 322.62866, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "(0.", "bbox": {"l": 394.43222, "t": 261.54822, "r": 400.73563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "5", "bbox": {"l": 400.73456, "t": 261.54822, "r": 403.73697, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": ")", "bbox": {"l": 403.73804, "t": 261.54822, "r": 405.53625, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "(0.1)", "bbox": {"l": 427.7016, "t": 261.54822, "r": 438.80563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "87.08", "bbox": {"l": 468.55533, "t": 261.54822, "r": 482.07043, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "81.14", "bbox": {"l": 516.01862, "t": 261.54822, "r": 529.53375, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Canceled or forfeited", "bbox": {"l": 306.11493, "t": 269.64148, "r": 356.24771, "b": 274.70667000000003, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "(0.", "bbox": {"l": 394.43222, "t": 270.31946000000005, "r": 400.73563, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "1", "bbox": {"l": 400.73456, "t": 270.31946000000005, "r": 403.73697, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": ")", "bbox": {"l": 403.73804, "t": 270.31946000000005, "r": 405.53625, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "-", "bbox": {"l": 431.02802, "t": 270.31946000000005, "r": 436.4280099999999, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "102.01", "bbox": {"l": 465.83099000000004, "t": 270.31946000000005, "r": 482.35013, "b": 275.38464, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "92.18", "bbox": {"l": 516.01862, "t": 270.31946000000005, "r": 529.53375, "b": 275.38464, "coord_origin": 
"TOPLEFT"}}, {"id": 114, "text": "Nonvested on December 31", "bbox": {"l": 306.11493, "t": 278.48572, "r": 373.35764, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "1.0", "bbox": {"l": 396.24661, "t": 278.48572, "r": 403.75531, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "0.3", "bbox": {"l": 429.51599, "t": 278.48572, "r": 437.02469, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "104.85 $", "bbox": {"l": 463.7142, "t": 278.48572, "r": 484.73965000000004, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "$ 104.51", "bbox": {"l": 512.99463, "t": 278.48572, "r": 534.02008, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}}]}, "text": null, "otsl_seq": ["ecel", "ched", "lcel", "ched", "lcel", "nl", "ecel", "ched", "ched", "ched", "ched", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl", "rhed", "fcel", "fcel", "fcel", "fcel", "nl"], "num_rows": 7, "num_cols": 5, "table_cells": [{"bbox": {"l": 459.04861, "t": 221.62415, "r": 542.00018, "b": 232.89935000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, "start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 3, "end_col_offset_idx": 5, "text": "Weighted Average Grant Date Fair Value", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 393.2442, "t": 236.74712999999997, "r": 407.34631, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "RS U s", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 392.09671, "t": 221.57446000000004, "r": 438.0145, "b": 226.63964999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 2, 
"start_row_offset_idx": 0, "end_row_offset_idx": 1, "start_col_offset_idx": 1, "end_col_offset_idx": 3, "text": "Shares (in millions)", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 427.18323, "t": 236.74712999999997, "r": 440.98778999999996, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 468.38254, "t": 236.74712999999997, "r": 482.48465000000004, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "RSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 516.92578, "t": 236.74712999999997, "r": 530.73035, "b": 241.81232, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 1, "end_row_offset_idx": 2, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "PSUs", "column_header": true, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 244.61084000000005, "r": 364.65607, "b": 249.67602999999997, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on Janua ry 1", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.24661, "t": 244.91327, "r": 403.75531, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1. 
1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.81838999999997, "t": 244.91327, "r": 437.32708999999994, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.52859, "t": 244.91327, "r": 483.55001999999996, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "90.10 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 513.44824, "t": 244.91327, "r": 531.46967, "b": 249.97844999999995, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 2, "end_row_offset_idx": 3, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 91.19", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 253.68451000000005, "r": 325.62674, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Granted", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.24661, "t": 253.68451000000005, "r": 403.75531, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "0. 
5", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.81838999999997, "t": 253.68451000000005, "r": 437.32708999999994, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "0.1", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 466.43579000000005, "t": 253.68451000000005, "r": 482.54831, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "117.44", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 514.29065, "t": 253.68451000000005, "r": 530.80981, "b": 258.74969, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 3, "end_row_offset_idx": 4, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "122.41", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 261.54822, "r": 322.62866, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Vested", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.43222, "t": 261.54822, "r": 405.53625, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 
5 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 427.7016, "t": 261.54822, "r": 438.80563, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "(0.1)", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 468.55533, "t": 261.54822, "r": 482.07043, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "87.08", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.01862, "t": 261.54822, "r": 529.53375, "b": 266.61339999999996, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 4, "end_row_offset_idx": 5, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "81.14", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 269.64148, "r": 356.24771, "b": 274.70667000000003, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Canceled or forfeited", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 394.43222, "t": 270.31946000000005, "r": 405.53625, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "(0. 
1 )", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 431.02802, "t": 270.31946000000005, "r": 436.4280099999999, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 2, "end_col_offset_idx": 3, "text": "-", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 465.83099000000004, "t": 270.31946000000005, "r": 482.35013, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "102.01", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 516.01862, "t": 270.31946000000005, "r": 529.53375, "b": 275.38464, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 5, "end_row_offset_idx": 6, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "92.18", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 306.11493, "t": 278.48572, "r": 373.35764, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 0, "end_col_offset_idx": 1, "text": "Nonvested on December 31", "column_header": false, "row_header": true, "row_section": false}, {"bbox": {"l": 396.24661, "t": 278.48572, "r": 403.75531, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 1, "end_col_offset_idx": 2, "text": "1.0", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 429.51599, "t": 278.48572, "r": 437.02469, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 2, "end_col_offset_idx": 3, 
"text": "0.3", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 463.7142, "t": 278.48572, "r": 484.73965000000004, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 3, "end_col_offset_idx": 4, "text": "104.85 $", "column_header": false, "row_header": false, "row_section": false}, {"bbox": {"l": 512.99463, "t": 278.48572, "r": 534.02008, "b": 283.55092999999994, "coord_origin": "TOPLEFT"}, "row_span": 1, "col_span": 1, "start_row_offset_idx": 6, "end_row_offset_idx": 7, "start_col_offset_idx": 4, "end_col_offset_idx": 5, "text": "$ 104.51", "column_header": false, "row_header": false, "row_section": false}]}, {"label": "caption", "id": 9, "page_no": 7, "cluster": {"id": 9, "label": "caption", "bbox": {"l": 49.38380813598633, "t": 319.77777099609375, "r": 545.11377, "b": 365.64987, "coord_origin": "TOPLEFT"}, "confidence": 0.9140278697013855, "cells": [{"id": 119, "text": "Figure 5:", "bbox": {"l": 50.112, "t": 320.87735, "r": 86.864021, "b": 329.78391, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration", "bbox": {"l": 93.917542, "t": 320.87735, "r": 545.11371, "b": 329.78391, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "demonstrates TableFormer predictions on previously unseen language (Japanese). 
Additionally, we see that TableFormer is", "bbox": {"l": 50.112, "t": 332.83233999999993, "r": 545.11371, "b": 341.73889, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from", "bbox": {"l": 50.112, "t": 344.78732, "r": 545.11377, "b": 353.69388, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "the FinTabNet dataset.", "bbox": {"l": 50.112, "t": 356.74332, "r": 139.79532, "b": 365.64987, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset."}, {"label": "picture", "id": 10, "page_no": 7, "cluster": {"id": 10, "label": "picture", "bbox": {"l": 216.76930236816406, "t": 380.49066162109375, "r": 375.7828674316406, "b": 443.34698486328125, "coord_origin": "TOPLEFT"}, "confidence": 0.8057794570922852, "cells": [{"id": 124, "text": "Red - PDF cells, Green - predicted bounding boxes", "bbox": {"l": 220.26282, "t": 381.77722, "r": 342.07819, "b": 386.44281, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 11, "page_no": 7, "cluster": {"id": 11, "label": "picture", "bbox": {"l": 51.73619842529297, "t": 380.48077392578125, "r": 211.83766174316406, "b": 443.65802001953125, "coord_origin": "TOPLEFT"}, "confidence": 0.8307981491088867, "cells": [{"id": 125, "text": "Ground Truth", "bbox": {"l": 53.715248, "t": 381.77722, "r": 85.657333, "b": 386.44281, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, 
{"label": "picture", "id": 12, "page_no": 7, "cluster": {"id": 12, "label": "picture", "bbox": {"l": 383.13629150390625, "t": 381.2313232421875, "r": 542.1132202148438, "b": 442.7750244140625, "coord_origin": "TOPLEFT"}, "confidence": 0.7880472540855408, "cells": [{"id": 126, "text": "16", "bbox": {"l": 437.37939, "t": 400.55295, "r": 443.69870000000003, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "17", "bbox": {"l": 450.33203, "t": 400.55295, "r": 456.6513100000001, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "18", "bbox": {"l": 463.28464, "t": 400.55295, "r": 469.60394, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "19", "bbox": {"l": 476.23724000000004, "t": 400.55295, "r": 482.5565500000001, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "20", "bbox": {"l": 489.18988, "t": 400.55295, "r": 495.50916, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "21", "bbox": {"l": 502.14251999999993, "t": 400.55295, "r": 508.46178999999995, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "22", "bbox": {"l": 515.09509, "t": 400.55295, "r": 521.41443, "b": 406.87158, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "23", "bbox": {"l": 385.2814, "t": 411.03836000000007, "r": 391.60071, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "24", "bbox": {"l": 398.52341, "t": 411.03836000000007, "r": 404.84271, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "25", "bbox": {"l": 411.47604, "t": 411.03836000000007, "r": 417.79535, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "26", "bbox": {"l": 437.37939, "t": 411.03836000000007, "r": 443.69870000000003, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "27", "bbox": {"l": 450.33203, "t": 411.03836000000007, "r": 456.6513100000001, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "28", "bbox": {"l": 463.28464, "t": 411.03836000000007, "r": 
469.60394, "b": 417.35699, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "30", "bbox": {"l": 385.2814, "t": 421.0697, "r": 391.60071, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "31", "bbox": {"l": 398.52341, "t": 421.0697, "r": 404.84271, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "32", "bbox": {"l": 411.47604, "t": 421.0697, "r": 417.79532, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "33", "bbox": {"l": 424.42865, "t": 421.0697, "r": 430.74796, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "34", "bbox": {"l": 437.38129, "t": 421.0697, "r": 443.70056, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "35", "bbox": {"l": 450.33389000000005, "t": 421.0697, "r": 456.65319999999997, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "36", "bbox": {"l": 463.2865, "t": 421.0697, "r": 469.6058, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "37", "bbox": {"l": 476.23914, "t": 421.0697, "r": 482.55841, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "38", "bbox": {"l": 489.1917700000001, "t": 421.0697, "r": 495.51105, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "39", "bbox": {"l": 502.14438, "t": 421.0697, "r": 508.46368, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "40", "bbox": {"l": 515.09705, "t": 421.0697, "r": 521.41632, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "41", "bbox": {"l": 528.04962, "t": 421.0697, "r": 534.3689, "b": 427.38834, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "42", "bbox": {"l": 385.2814, "t": 432.04431, "r": 391.60071, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "43", "bbox": {"l": 398.52341, "t": 432.04431, "r": 404.84271, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "44", "bbox": {"l": 411.47604, "t": 432.04431, "r": 417.79532, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 154, 
"text": "45", "bbox": {"l": 424.42865, "t": 432.04431, "r": 430.74796, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "46", "bbox": {"l": 437.38129, "t": 432.04431, "r": 443.70056, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "47", "bbox": {"l": 450.33389000000005, "t": 432.04431, "r": 456.65319999999997, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "48", "bbox": {"l": 463.2865, "t": 432.04431, "r": 469.6058, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "49", "bbox": {"l": 476.23914, "t": 432.04431, "r": 482.55841, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "50", "bbox": {"l": 489.1917700000001, "t": 432.04431, "r": 495.51105, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "51", "bbox": {"l": 502.14438, "t": 432.04431, "r": 508.46368, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "52", "bbox": {"l": 515.09705, "t": 432.04431, "r": 521.41632, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "53", "bbox": {"l": 528.04962, "t": 432.04431, "r": 534.3689, "b": 438.36295, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "0", "bbox": {"l": 385.2814, "t": 389.20004, "r": 388.44073, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "1", "bbox": {"l": 398.52341, "t": 389.20004, "r": 401.68274, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "2", "bbox": {"l": 411.4754, "t": 389.20004, "r": 414.63474, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "3", "bbox": {"l": 424.4274, "t": 389.20004, "r": 427.58673, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "4", "bbox": {"l": 437.37939, "t": 389.20004, "r": 440.53870000000006, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "5", "bbox": {"l": 450.33136, "t": 389.20004, "r": 453.49069000000003, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "6", "bbox": {"l": 463.28336, "t": 
389.20004, "r": 466.44269, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "7", "bbox": {"l": 476.23535, "t": 389.20004, "r": 479.39468, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "8", "bbox": {"l": 489.18735, "t": 389.20004, "r": 492.34668, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "9", "bbox": {"l": 502.13933999999995, "t": 389.20004, "r": 505.29868000000005, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "10", "bbox": {"l": 515.09131, "t": 389.20004, "r": 521.41064, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "11", "bbox": {"l": 528.04364, "t": 389.20004, "r": 534.13104, "b": 395.51868, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "12", "bbox": {"l": 385.2814, "t": 398.97464, "r": 391.60071, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "13", "bbox": {"l": 398.52341, "t": 398.97464, "r": 404.84271, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "14", "bbox": {"l": 411.47604, "t": 398.97464, "r": 417.79535, "b": 405.29327, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "15", "bbox": {"l": 424.42719, "t": 406.77463000000006, "r": 430.74648999999994, "b": 413.09326, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "29", "bbox": {"l": 502.86941999999993, "t": 410.99438, "r": 509.18871999999993, "b": 417.31302, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "Predicted Structure", "bbox": {"l": 384.35437, "t": 381.77722, "r": 430.99261, "b": 386.44281, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", "id": 13, "page_no": 7, "cluster": {"id": 13, "label": "caption", "bbox": {"l": 62.00014114379883, "t": 457.9582824707031, "r": 532.63049, "b": 467.8396301269531, "coord_origin": "TOPLEFT"}, "confidence": 0.9153729677200317, "cells": [{"id": 181, "text": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) 
from generated SynthTabNet table.", "bbox": {"l": 62.595001, "t": 458.72836, "r": 532.63049, "b": 467.63492, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table."}, {"label": "section_header", "id": 14, "page_no": 7, "cluster": {"id": 14, "label": "section_header", "bbox": {"l": 49.432151794433594, "t": 490.32525634765625, "r": 164.0072479248047, "b": 501.24741, "coord_origin": "TOPLEFT"}, "confidence": 0.9561254978179932, "cells": [{"id": 182, "text": "5.5.", "bbox": {"l": 50.112, "t": 491.39536, "r": 64.448898, "b": 501.24741, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "Qualitative Analysis", "bbox": {"l": 74.006828, "t": 491.39536, "r": 163.7558, "b": 501.24741, "coord_origin": "TOPLEFT"}}]}, "text": "5.5. Qualitative Analysis"}, {"label": "text", "id": 15, "page_no": 7, "cluster": {"id": 15, "label": "text", "bbox": {"l": 49.34812545776367, "t": 535.7822875976562, "r": 286.7104187011719, "b": 713.9708251953125, "coord_origin": "TOPLEFT"}, "confidence": 0.9852354526519775, "cells": [{"id": 184, "text": "We showcase several visualizations for the different", "bbox": {"l": 62.067001, "t": 536.87337, "r": 286.36499, "b": 545.77992, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "components of our network on various", "bbox": {"l": 50.112, "t": 548.82837, "r": 211.15741, "b": 557.73492, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "\u201ccomplex\u201d", "bbox": {"l": 215.10000999999997, "t": 548.91803, "r": 259.17453, "b": 557.50578, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "tables", "bbox": {"l": 263.12, "t": 548.82837, "r": 286.36273, "b": 557.73492, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "within datasets presented in this work in Fig. 5 and Fig. 
6", "bbox": {"l": 50.112, "t": 560.78337, "r": 286.36505, "b": 569.68993, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "As it is shown, our model is able to predict bounding boxes", "bbox": {"l": 50.112, "t": 572.73837, "r": 286.36508, "b": 581.6449299999999, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "for all table cells, even for the empty ones. Additionally,", "bbox": {"l": 50.112, "t": 584.69337, "r": 286.36508, "b": 593.59993, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "our post-processing techniques can extract the cell content", "bbox": {"l": 50.112, "t": 596.64937, "r": 286.36505, "b": 605.55592, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": "by matching the predicted bounding boxes to the PDF cells", "bbox": {"l": 50.112, "t": 608.60437, "r": 286.36508, "b": 617.51093, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "based on their overlap and spatial proximity. The left part", "bbox": {"l": 50.112, "t": 620.55937, "r": 286.36508, "b": 629.46593, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "of Fig. 5 demonstrates also the adaptability of our method", "bbox": {"l": 50.112, "t": 632.51437, "r": 286.36508, "b": 641.42093, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "to any language, as it can successfully extract Japanese", "bbox": {"l": 50.112, "t": 644.46938, "r": 286.36508, "b": 653.37593, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "text, although the training set contains only English content.", "bbox": {"l": 50.112, "t": 656.42438, "r": 286.36511, "b": 665.33094, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "We provide more visualizations including the intermediate", "bbox": {"l": 50.112, "t": 668.38037, "r": 286.36508, "b": 677.28694, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "steps in the supplementary material. 
Overall these illustra-", "bbox": {"l": 50.112, "t": 680.33537, "r": 286.36511, "b": 689.24194, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "tions justify the versatility of our method across a diverse", "bbox": {"l": 50.112, "t": 692.290375, "r": 286.36511, "b": 701.196945, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "range of table appearances and content type.", "bbox": {"l": 50.112, "t": 704.245377, "r": 226.88833999999997, "b": 713.1519470000001, "coord_origin": "TOPLEFT"}}]}, "text": "We showcase several visualizations for the different components of our network on various \u201ccomplex\u201d tables within datasets presented in this work in Fig. 5 and Fig. 6 As it is shown, our model is able to predict bounding boxes for all table cells, even for the empty ones. Additionally, our post-processing techniques can extract the cell content by matching the predicted bounding boxes to the PDF cells based on their overlap and spatial proximity. The left part of Fig. 5 demonstrates also the adaptability of our method to any language, as it can successfully extract Japanese text, although the training set contains only English content. We provide more visualizations including the intermediate steps in the supplementary material. Overall these illustrations justify the versatility of our method across a diverse range of table appearances and content type."}, {"label": "section_header", "id": 16, "page_no": 7, "cluster": {"id": 16, "label": "section_header", "bbox": {"l": 308.2789306640625, "t": 489.6516418457031, "r": 460.84848, "b": 501.45663, "coord_origin": "TOPLEFT"}, "confidence": 0.9436547160148621, "cells": [{"id": 201, "text": "6.", "bbox": {"l": 308.862, "t": 490.70892, "r": 316.07382, "b": 501.45663, "coord_origin": "TOPLEFT"}}, {"id": 202, "text": "Future Work & Conclusion", "bbox": {"l": 325.68954, "t": 490.70892, "r": 460.84848, "b": 501.45663, "coord_origin": "TOPLEFT"}}]}, "text": "6. 
Future Work & Conclusion"}, {"label": "text", "id": 17, "page_no": 7, "cluster": {"id": 17, "label": "text", "bbox": {"l": 307.99957275390625, "t": 511.78887939453125, "r": 545.2568359375, "b": 653.30592, "coord_origin": "TOPLEFT"}, "confidence": 0.9875592589378357, "cells": [{"id": 203, "text": "In this paper, we presented TableFormer an end-to-end", "bbox": {"l": 320.81699, "t": 512.89337, "r": 545.11505, "b": 521.79993, "coord_origin": "TOPLEFT"}}, {"id": 204, "text": "transformer based approach to predict table structures and", "bbox": {"l": 308.862, "t": 524.84836, "r": 545.11517, "b": 533.75491, "coord_origin": "TOPLEFT"}}, {"id": 205, "text": "bounding boxes of cells from an image. This approach en-", "bbox": {"l": 308.862, "t": 536.80336, "r": 545.11511, "b": 545.70992, "coord_origin": "TOPLEFT"}}, {"id": 206, "text": "ables us to recreate the table structure, and extract the cell", "bbox": {"l": 308.862, "t": 548.75836, "r": 545.11505, "b": 557.6649199999999, "coord_origin": "TOPLEFT"}}, {"id": 207, "text": "content from PDF or OCR by using bounding boxes. Ad-", "bbox": {"l": 308.862, "t": 560.71336, "r": 545.11517, "b": 569.61992, "coord_origin": "TOPLEFT"}}, {"id": 208, "text": "ditionally, it provides the versatility required in real-world", "bbox": {"l": 308.862, "t": 572.66837, "r": 545.11511, "b": 581.57492, "coord_origin": "TOPLEFT"}}, {"id": 209, "text": "scenarios when dealing with various types of PDF docu-", "bbox": {"l": 308.862, "t": 584.62436, "r": 545.11511, "b": 593.53091, "coord_origin": "TOPLEFT"}}, {"id": 210, "text": "ments, and languages.", "bbox": {"l": 308.862, "t": 596.57936, "r": 400.46808, "b": 605.48592, "coord_origin": "TOPLEFT"}}, {"id": 211, "text": "Furthermore, our method outper-", "bbox": {"l": 408.37839, "t": 596.57936, "r": 545.11511, "b": 605.48592, "coord_origin": "TOPLEFT"}}, {"id": 212, "text": "forms all state-of-the-arts with a wide margin. 
Finally, we", "bbox": {"l": 308.862, "t": 608.53436, "r": 545.11505, "b": 617.44092, "coord_origin": "TOPLEFT"}}, {"id": 213, "text": "introduce \u201cSynthTabNet\u201d a challenging synthetically gen-", "bbox": {"l": 308.862, "t": 620.48936, "r": 545.11511, "b": 629.3959199999999, "coord_origin": "TOPLEFT"}}, {"id": 214, "text": "erated dataset that reinforces missing characteristics from", "bbox": {"l": 308.862, "t": 632.4443699999999, "r": 545.11505, "b": 641.35092, "coord_origin": "TOPLEFT"}}, {"id": 215, "text": "other datasets.", "bbox": {"l": 308.862, "t": 644.39937, "r": 365.85803, "b": 653.30592, "coord_origin": "TOPLEFT"}}]}, "text": "In this paper, we presented TableFormer an end-to-end transformer based approach to predict table structures and bounding boxes of cells from an image. This approach enables us to recreate the table structure, and extract the cell content from PDF or OCR by using bounding boxes. Additionally, it provides the versatility required in real-world scenarios when dealing with various types of PDF documents, and languages. Furthermore, our method outperforms all state-of-the-arts with a wide margin. 
Finally, we introduce \u201cSynthTabNet\u201d a challenging synthetically generated dataset that reinforces missing characteristics from other datasets."}, {"label": "section_header", "id": 18, "page_no": 7, "cluster": {"id": 18, "label": "section_header", "bbox": {"l": 308.3702392578125, "t": 671.6679077148438, "r": 364.48675537109375, "b": 682.84664, "coord_origin": "TOPLEFT"}, "confidence": 0.9442476034164429, "cells": [{"id": 216, "text": "References", "bbox": {"l": 308.862, "t": 672.09892, "r": 364.40585, "b": 682.84664, "coord_origin": "TOPLEFT"}}]}, "text": "References"}, {"label": "list_item", "id": 19, "page_no": 7, "cluster": {"id": 19, "label": "list_item", "bbox": {"l": 313.0051574707031, "t": 692.8663940429688, "r": 545.1151123046875, "b": 713.3478393554688, "coord_origin": "TOPLEFT"}, "confidence": 0.8318727612495422, "cells": [{"id": 217, "text": "[1]", "bbox": {"l": 313.345, "t": 693.9617920000001, "r": 323.80792, "b": 701.977753, "coord_origin": "TOPLEFT"}}, {"id": 218, "text": "Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas", "bbox": {"l": 326.05127, "t": 693.9617920000001, "r": 545.10852, "b": 701.977753, "coord_origin": "TOPLEFT"}}, {"id": 219, "text": "Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-", "bbox": {"l": 328.78101, "t": 704.920792, "r": 545.1134, "b": 712.936752, "coord_origin": "TOPLEFT"}}]}, "text": "[1] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 
End-to-"}, {"label": "picture", "id": 21, "page_no": 7, "cluster": {"id": 21, "label": "picture", "bbox": {"l": 49.97499084472656, "t": 103.71235656738281, "r": 301.6349182128906, "b": 187.57875061035156, "coord_origin": "TOPLEFT"}, "confidence": 0.7873188853263855, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "headers": [{"label": "page_footer", "id": 20, "page_no": 7, "cluster": {"id": 20, "label": "page_footer", "bbox": {"l": 294.6642761230469, "t": 733.6441650390625, "r": 300.13397216796875, "b": 743.039928, "coord_origin": "TOPLEFT"}, "confidence": 0.8709819912910461, "cells": [{"id": 220, "text": "8", "bbox": {"l": 295.121, "t": 734.133366, "r": 300.10229, "b": 743.039928, "coord_origin": "TOPLEFT"}}]}, "text": "8"}]}}, {"page_no": 8, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "end object detection with transformers. In Andrea Vedaldi,", "bbox": {"l": 70.030998, "t": 75.88378999999998, "r": 286.36334, "b": 83.89977999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Horst Bischof, Thomas Brox, and Jan-Michael Frahm, edi-", "bbox": {"l": 70.030998, "t": 86.84276999999997, "r": 286.36331, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "tors,", "bbox": {"l": 70.030998, "t": 97.80078000000003, "r": 85.722198, "b": 105.81677000000002, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Computer Vision - ECCV 2020", "bbox": {"l": 87.889, "t": 97.88147000000004, "r": 199.93315, "b": 105.61053000000004, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": ", pages 213-229, Cham,", "bbox": {"l": 199.936, "t": 97.80078000000003, "r": 286.36313, "b": 105.81677000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "2020. Springer International Publishing. 
5", "bbox": {"l": 70.031006, "t": 108.75977, "r": 221.94871999999998, "b": 116.77575999999999, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "[2]", "bbox": {"l": 54.595005, "t": 120.03174000000013, "r": 65.206657, "b": 128.04773, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanx-", "bbox": {"l": 67.481873, "t": 120.03174000000013, "r": 286.35852, "b": 128.04773, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "uan Yin, and Xian-Ling Mao.", "bbox": {"l": 70.031006, "t": 130.99072, "r": 179.67215, "b": 139.00671, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Complicated table structure", "bbox": {"l": 185.58101, "t": 130.99072, "r": 286.36334, "b": 139.00671, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "recognition.", "bbox": {"l": 70.031006, "t": 141.94970999999998, "r": 113.11456, "b": 149.96569999999997, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "arXiv preprint arXiv:1908.04729", "bbox": {"l": 116.34200999999999, "t": 142.0304, "r": 235.3082, "b": 149.75946, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": ", 2019. 3", "bbox": {"l": 235.30701, "t": 141.94970999999998, "r": 267.67572, "b": 149.96569999999997, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "[3]", "bbox": {"l": 54.595001, "t": 153.22168, "r": 65.103195, "b": 161.23766999999998, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "Bertrand Couasnon and Aurelie Lemaitre.", "bbox": {"l": 67.356239, "t": 153.22168, "r": 218.77876, "b": 161.23766999999998, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Recognition of Ta-", "bbox": {"l": 220.97999999999996, "t": 153.30237, "r": 286.36301, "b": 161.03143, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "bles and Forms", "bbox": {"l": 70.030991, "t": 164.26135, "r": 125.26401000000001, "b": 171.99041999999997, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": ", pages 647-677. 
Springer London, London,", "bbox": {"l": 125.26098999999999, "t": 164.18066, "r": 286.36029, "b": 172.19665999999995, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "2014. 2", "bbox": {"l": 70.030991, "t": 175.13867000000005, "r": 97.916496, "b": 183.15466000000004, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "[4]", "bbox": {"l": 54.59499, "t": 186.41063999999994, "r": 65.806984, "b": 194.42664000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Herv\u00b4e D\u00b4ejean, Jean-Luc Meunier, Liangcai Gao, Yilun", "bbox": {"l": 68.210922, "t": 186.41063999999994, "r": 286.36401, "b": 194.42664000000002, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. IC-", "bbox": {"l": 70.030983, "t": 197.36963000000003, "r": 286.36331, "b": 205.38562000000002, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "DAR 2019 Competition on Table Detection and Recognition", "bbox": {"l": 70.030983, "t": 208.32861000000003, "r": 286.36334, "b": 216.3446, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "(cTDaR), Apr. 2019. http://sac.founderit.com/. 2", "bbox": {"l": 70.030983, "t": 219.2876, "r": 245.83519, "b": 227.30358999999999, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "[5]", "bbox": {"l": 54.594982, "t": 230.55957, "r": 65.381134, "b": 238.57556, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and", "bbox": {"l": 67.693779, "t": 230.55957, "r": 286.35849, "b": 238.57556, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "Stavros J Perantonis. Automatic table detection in document", "bbox": {"l": 70.030983, "t": 241.51855, "r": 286.36334, "b": 249.53454999999997, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "images. 
In", "bbox": {"l": 70.030983, "t": 252.47655999999995, "r": 108.39821, "b": 260.49255000000005, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "International Conference on Pattern Recognition", "bbox": {"l": 110.64498000000002, "t": 252.55724999999995, "r": 286.3595, "b": 260.28632000000005, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "and Image Analysis", "bbox": {"l": 70.030983, "t": 263.51624000000004, "r": 140.57861, "b": 271.24530000000004, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": ", pages 609-618. Springer, 2005. 2", "bbox": {"l": 140.57797, "t": 263.43555000000003, "r": 266.47522, "b": 271.45154, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "[6]", "bbox": {"l": 54.594971, "t": 274.70758, "r": 64.848648, "b": 282.72351, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Max G\u00a8obel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi.", "bbox": {"l": 67.047119, "t": 274.70758, "r": 286.36676, "b": 282.72351, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Icdar 2013 table competition.", "bbox": {"l": 70.030975, "t": 285.66655999999995, "r": 179.57349, "b": 293.68253, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "In", "bbox": {"l": 187.01559, "t": 285.66655999999995, "r": 194.4846, "b": 293.68253, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "2013 12th International", "bbox": {"l": 198.04398, "t": 285.74725, "r": 286.36304, "b": 293.47632, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Conference on Document Analysis and Recognition", "bbox": {"l": 70.030975, "t": 296.70624, "r": 260.19937, "b": 304.43530000000004, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": ", pages", "bbox": {"l": 260.198, "t": 296.62555, "r": 286.36197, "b": 304.64151, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "1449-1453, 2013. 
2", "bbox": {"l": 70.030991, "t": 307.5845299999999, "r": 142.74849, "b": 315.6004899999999, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "[7]", "bbox": {"l": 54.59499, "t": 318.85654, "r": 65.61586, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "EA Green and M Krishnamoorthy.", "bbox": {"l": 67.978821, "t": 318.85654, "r": 199.492, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Recognition of tables", "bbox": {"l": 206.98792, "t": 318.85654, "r": 286.35849, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "using table grammars. procs.", "bbox": {"l": 70.030991, "t": 329.8145400000001, "r": 176.28284, "b": 337.83051, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "In", "bbox": {"l": 182.60416, "t": 329.8145400000001, "r": 190.07317, "b": 337.83051, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Symposium on Document", "bbox": {"l": 193.28299, "t": 329.89522999999997, "r": 286.36319, "b": 337.62429999999995, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Analysis and Recognition (SDAIR\u201995)", "bbox": {"l": 70.030991, "t": 340.85425, "r": 206.34717, "b": 348.58331, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": ", pages 261-277. 
2", "bbox": {"l": 206.34599, "t": 340.77356, "r": 274.82239, "b": 348.78952, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "[8]", "bbox": {"l": 54.594986000000006, "t": 352.0455600000001, "r": 65.04657, "b": 360.06152, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Di-", "bbox": {"l": 67.287483, "t": 352.0455600000001, "r": 286.35849, "b": 360.06152, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "dier Stricker, and Muhammad Zeshan Afzal.", "bbox": {"l": 70.030983, "t": 363.00458, "r": 234.12507999999997, "b": 371.02054, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Castabdetec-", "bbox": {"l": 240.05186, "t": 363.00458, "r": 286.36331, "b": 371.02054, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "tors: Cascade network for table detection in document im-", "bbox": {"l": 70.030983, "t": 373.96356, "r": 286.36331, "b": 381.97952, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "ages with recursive feature pyramid and switchable atrous", "bbox": {"l": 70.030983, "t": 384.92255, "r": 286.36331, "b": 392.93851, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "convolution.", "bbox": {"l": 70.030983, "t": 395.88153, "r": 114.57605, "b": 403.89749, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "Journal of Imaging", "bbox": {"l": 117.80399000000001, "t": 395.96222, "r": 186.7287, "b": 403.69128, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": ", 7(10), 2021. 1", "bbox": {"l": 186.728, "t": 395.88153, "r": 243.00113999999996, "b": 403.89749, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "[9]", "bbox": {"l": 54.595001, "t": 407.15253000000007, "r": 65.334427, "b": 415.1684900000001, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Gir-", "bbox": {"l": 67.637054, "t": 407.15253000000007, "r": 286.35852, "b": 415.1684900000001, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "shick. Mask r-cnn. 
In", "bbox": {"l": 70.030998, "t": 418.11151, "r": 147.13306, "b": 426.12747, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Proceedings of the IEEE International", "bbox": {"l": 149.15601, "t": 418.1922, "r": 286.35989, "b": 425.92126, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "Conference on Computer Vision (ICCV)", "bbox": {"l": 70.031006, "t": 429.15118, "r": 213.48445, "b": 436.88025, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": ", Oct 2017. 1", "bbox": {"l": 213.483, "t": 429.07050000000004, "r": 261.04083, "b": 437.08646000000005, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "[10]", "bbox": {"l": 50.112, "t": 440.3424999999999, "r": 65.399307, "b": 448.3584599999999, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "Yelin He, X. Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bing-", "bbox": {"l": 67.693321, "t": 440.3424999999999, "r": 286.3587, "b": 448.3584599999999, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "cong Li, Xin Tang, and Rong Xiao.", "bbox": {"l": 70.030998, "t": 451.30151, "r": 202.74268, "b": 459.31747, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "Pingan-vcgroup\u2019s so-", "bbox": {"l": 209.00122, "t": 451.30151, "r": 286.36331, "b": 459.31747, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "lution for icdar 2021 competition on scientific table image", "bbox": {"l": 70.030998, "t": 462.2605, "r": 286.36334, "b": 470.27646, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "recognition to latex.", "bbox": {"l": 70.030998, "t": 473.21948, "r": 141.86981, "b": 481.23544, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "ArXiv", "bbox": {"l": 145.097, "t": 473.30017, "r": 166.01561, "b": 481.02924, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": ", abs/2105.01846, 2021. 
2", "bbox": {"l": 166.015, "t": 473.21948, "r": 259.90216, "b": 481.23544, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "[11]", "bbox": {"l": 50.112, "t": 484.49048, "r": 66.033806, "b": 492.50644, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and", "bbox": {"l": 68.423035, "t": 484.49048, "r": 286.35873, "b": 492.50644, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "Gordon Wilfong. Medium-independent table detection. In", "bbox": {"l": 70.030998, "t": 495.44946, "r": 286.36331, "b": 503.46542, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "Document Recognition and Retrieval VII", "bbox": {"l": 70.030998, "t": 506.48914, "r": 227.40926, "b": 514.2182, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": ", volume 3967,", "bbox": {"l": 227.40500000000003, "t": 506.40845, "r": 286.35913, "b": 514.4244100000001, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "pages 291-302. International Society for Optics and Photon-", "bbox": {"l": 70.031006, "t": 517.36743, "r": 286.36328, "b": 525.38339, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "ics, 1999. 2", "bbox": {"l": 70.031006, "t": 528.32642, "r": 112.36138000000001, "b": 536.34238, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "[12]", "bbox": {"l": 50.112007, "t": 539.59842, "r": 65.466705, "b": 547.61438, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Matthew Hurst. A constraint-based approach to table struc-", "bbox": {"l": 67.770828, "t": 539.59842, "r": 286.35873, "b": 547.61438, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "ture derivation. 
In", "bbox": {"l": 70.031006, "t": 550.55742, "r": 136.28374, "b": 558.57338, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Proceedings of the Seventh International", "bbox": {"l": 138.811, "t": 550.63812, "r": 286.36206, "b": 558.36716, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Conference on Document Analysis and Recognition - Volume", "bbox": {"l": 70.031006, "t": 561.5971199999999, "r": 286.36334, "b": 569.32616, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "2", "bbox": {"l": 70.031006, "t": 572.55612, "r": 74.514206, "b": 580.28516, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": ", ICDAR \u201903, page 911, USA, 2003. IEEE Computer Soci-", "bbox": {"l": 74.514008, "t": 572.47542, "r": 286.36313, "b": 580.4913799999999, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "ety. 2", "bbox": {"l": 70.031006, "t": 583.4334100000001, "r": 90.357834, "b": 591.44937, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "[13]", "bbox": {"l": 50.112007, "t": 594.70541, "r": 66.270439, "b": 602.72137, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Thotreingam Kasar, Philippine Barlas, Sebastien Adam,", "bbox": {"l": 68.695168, "t": 594.70541, "r": 286.35873, "b": 602.72137, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "Cl\u00b4ement Chatelain, and Thierry Paquet. 
Learning to detect", "bbox": {"l": 70.031006, "t": 605.66441, "r": 286.3631, "b": 613.68037, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "tables in scanned document images using line information.", "bbox": {"l": 70.031006, "t": 616.62341, "r": 286.36331, "b": 624.63937, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "In", "bbox": {"l": 70.031006, "t": 627.58241, "r": 77.500015, "b": 635.5983699999999, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "2013 12th International Conference on Document Analy-", "bbox": {"l": 79.920006, "t": 627.6631199999999, "r": 286.3624, "b": 635.39215, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "sis and Recognition", "bbox": {"l": 70.031006, "t": 638.62212, "r": 140.67728, "b": 646.35115, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": ", pages 1185-1189. IEEE, 2013. 2", "bbox": {"l": 140.67599, "t": 638.54141, "r": 264.43921, "b": 646.55737, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "[14]", "bbox": {"l": 50.111992, "t": 649.81342, "r": 66.534035, "b": 657.82938, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank", "bbox": {"l": 68.998329, "t": 649.81342, "r": 286.35873, "b": 657.82938, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Singh.", "bbox": {"l": 70.030991, "t": 660.77142, "r": 93.200165, "b": 668.78738, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Icdar 2021 competition on scientific table image", "bbox": {"l": 102.20243, "t": 660.77142, "r": 286.36334, "b": 668.78738, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "recognition to latex, 2021. 2", "bbox": {"l": 70.030991, "t": 671.73042, "r": 171.9969, "b": 679.74638, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "[15]", "bbox": {"l": 50.111992, "t": 683.00243, "r": 65.515968, "b": 691.01839, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Harold W Kuhn. 
The hungarian method for the assignment", "bbox": {"l": 67.827499, "t": 683.00243, "r": 286.3587, "b": 691.01839, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "problem.", "bbox": {"l": 70.030991, "t": 693.9614260000001, "r": 102.15761, "b": 701.977386, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Naval research logistics quarterly", "bbox": {"l": 107.54999, "t": 694.0421220000001, "r": 231.47461, "b": 701.771156, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": ", 2(1-2):83-97,", "bbox": {"l": 231.47598, "t": 693.9614260000001, "r": 286.35931, "b": 701.977386, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "1955. 6", "bbox": {"l": 70.030975, "t": 704.920425, "r": 97.916481, "b": 712.936386, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "[16]", "bbox": {"l": 308.86197, "t": 75.88342000000011, "r": 324.74973, "b": 83.89940999999999, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sag-", "bbox": {"l": 327.13382, "t": 75.88342000000011, "r": 545.1087, "b": 83.89940999999999, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "nik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and", "bbox": {"l": 328.78098, "t": 86.84142999999995, "r": 545.1134, "b": 94.85741999999993, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Tamara L. 
Berg.", "bbox": {"l": 328.78098, "t": 97.80042000000003, "r": 390.96295, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Babytalk:", "bbox": {"l": 400.27008, "t": 97.80042000000003, "r": 435.1404099999999, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Understanding and generat-", "bbox": {"l": 441.71277, "t": 97.80042000000003, "r": 545.11328, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "ing simple image descriptions.", "bbox": {"l": 328.78098, "t": 108.75940000000003, "r": 440.80719, "b": 116.7753899999999, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "IEEE Transactions on Pat-", "bbox": {"l": 446.63498, "t": 108.84009000000003, "r": 545.11304, "b": 116.56914999999992, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "tern Analysis and Machine Intelligence", "bbox": {"l": 328.78098, "t": 119.79907000000003, "r": 471.13153, "b": 127.52814000000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": ", 35(12):2891-2903,", "bbox": {"l": 471.13300000000004, "t": 119.71838000000002, "r": 545.11475, "b": 127.73437999999999, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "2013. 
4", "bbox": {"l": 328.78101, "t": 130.67737, "r": 356.6665, "b": 138.69335999999998, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "[17]", "bbox": {"l": 308.862, "t": 142.12334999999996, "r": 325.24371, "b": 150.13933999999995, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming", "bbox": {"l": 327.70197, "t": 142.12334999999996, "r": 545.10883, "b": 150.13933999999995, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Zhou, and Zhoujun Li.", "bbox": {"l": 328.78101, "t": 153.08136000000002, "r": 414.44598, "b": 161.09735, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Tablebank: A benchmark dataset", "bbox": {"l": 421.82532, "t": 153.08136000000002, "r": 545.1134, "b": 161.09735, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "for table detection and recognition, 2019. 2, 3", "bbox": {"l": 328.78101, "t": 164.04034000000001, "r": 493.62835999999993, "b": 172.05633999999998, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "[18]", "bbox": {"l": 308.862, "t": 175.48632999999995, "r": 324.26599, "b": 183.50232000000005, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and", "bbox": {"l": 326.57751, "t": 175.48632999999995, "r": 545.10876, "b": 183.50232000000005, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Xianhui Liu. 
Gfte: Graph-based financial table extraction.", "bbox": {"l": 328.78101, "t": 186.44530999999995, "r": 545.11334, "b": 194.46130000000005, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Gio-", "bbox": {"l": 328.78101, "t": 197.40430000000003, "r": 545.11346, "b": 205.42029000000002, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "vanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair", "bbox": {"l": 328.78101, "t": 208.36328000000003, "r": 545.11353, "b": 216.37927000000002, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "Escalante, and Roberto Vezzani, editors,", "bbox": {"l": 328.78101, "t": 219.32227, "r": 479.26413, "b": 227.33826, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Pattern Recogni-", "bbox": {"l": 483.11902, "t": 219.40295000000003, "r": 545.11273, "b": 227.13202, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "tion. ICPR International Workshops and Challenges", "bbox": {"l": 328.78101, "t": 230.36095999999998, "r": 519.39771, "b": 238.09002999999996, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": ", pages", "bbox": {"l": 519.401, "t": 230.28026999999997, "r": 545.10767, "b": 238.29625999999996, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "644-658, Cham, 2021. Springer International Publishing. 
2,", "bbox": {"l": 328.78101, "t": 241.23925999999994, "r": 545.11328, "b": 249.25525000000005, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "3", "bbox": {"l": 328.78101, "t": 252.19824000000006, "r": 333.26422, "b": 260.21423000000004, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "[19]", "bbox": {"l": 308.862, "t": 263.64423, "r": 324.26477, "b": 271.66022, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Vik-", "bbox": {"l": 326.57611, "t": 263.64423, "r": 545.10883, "b": 271.66022, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "tor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele", "bbox": {"l": 328.78101, "t": 274.60321, "r": 545.1134, "b": 282.61917000000005, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Ro-", "bbox": {"l": 328.78101, "t": 285.56219, "r": 545.11328, "b": 293.57816, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "bust pdf document conversion using recurrent neural net-", "bbox": {"l": 328.78101, "t": 296.52118, "r": 545.11334, "b": 304.53714, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "works.", "bbox": {"l": 328.78101, "t": 307.47919, "r": 352.84683, "b": 315.49515, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "Proceedings of the AAAI Conference on Artificial", "bbox": {"l": 360.23599, "t": 307.55988, "r": 545.1142, "b": 315.28894, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "Intelligence", "bbox": {"l": 328.78101, "t": 318.51886, "r": 371.02173, "b": 326.24792, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": ", 35(17):15137-15145, May 2021. 
1", "bbox": {"l": 371.021, "t": 318.43817, "r": 502.26227, "b": 326.45413, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "[20]", "bbox": {"l": 308.862, "t": 329.88419, "r": 323.82672, "b": 337.90015, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang,", "bbox": {"l": 326.07233, "t": 329.88419, "r": 545.10876, "b": 337.90015, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "Yongpan Wang, and Gui-Song Xia. Parsing table structures", "bbox": {"l": 328.78101, "t": 340.8432, "r": 545.11346, "b": 348.85916, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "in the wild. In", "bbox": {"l": 328.78101, "t": 351.80219000000005, "r": 382.7767, "b": 359.81815000000006, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "Proceedings of the IEEE/CVF International", "bbox": {"l": 385.54102, "t": 351.88287, "r": 545.11609, "b": 359.61194, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "Conference on Computer Vision", "bbox": {"l": 328.78101, "t": 362.84186, "r": 443.59579, "b": 370.57092, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": ", pages 944-952, 2021. 
2", "bbox": {"l": 443.59399, "t": 362.76117, "r": 534.48645, "b": 370.77713, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "[21]", "bbox": {"l": 308.862, "t": 374.20618, "r": 324.60281, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "Shubham", "bbox": {"l": 326.96487, "t": 374.20618, "r": 362.6604, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "Singh", "bbox": {"l": 368.69479, "t": 374.20618, "r": 389.6134, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "Paliwal,", "bbox": {"l": 395.6478, "t": 374.20618, "r": 424.56445, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "D", "bbox": {"l": 431.5492899999999, "t": 374.20618, "r": 438.0230399999999, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Vishwanath,", "bbox": {"l": 444.05743, "t": 374.20618, "r": 488.5038799999999, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "Rohit", "bbox": {"l": 495.47974, "t": 374.20618, "r": 515.41205, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "Rahul,", "bbox": {"l": 521.44641, "t": 374.20618, "r": 545.10876, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "Monika Sharma, and Lovekesh Vig. 
Tablenet: Deep learn-", "bbox": {"l": 328.78101, "t": 385.16516, "r": 545.1134, "b": 393.18112, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "ing model for end-to-end table detection and tabular data ex-", "bbox": {"l": 328.78101, "t": 396.12415, "r": 545.11346, "b": 404.14011, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "traction from scanned document images.", "bbox": {"l": 328.78101, "t": 407.08313, "r": 478.00881999999996, "b": 415.09909, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "In", "bbox": {"l": 484.0701, "t": 407.08313, "r": 491.53912, "b": 415.09909, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "2019 Interna-", "bbox": {"l": 494.668, "t": 407.16382, "r": 545.11298, "b": 414.89288, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "tional Conference on Document Analysis and Recognition", "bbox": {"l": 328.78101, "t": 418.12280000000004, "r": 545.11334, "b": 425.85187, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "(ICDAR)", "bbox": {"l": 328.78101, "t": 429.08179, "r": 360.83591, "b": 436.8108500000001, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": ", pages 128-133. IEEE, 2019. 
1", "bbox": {"l": 360.836, "t": 429.0011, "r": 475.63287, "b": 437.01706, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "[22]", "bbox": {"l": 308.862, "t": 440.44611, "r": 324.57407, "b": 448.46207, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer,", "bbox": {"l": 326.93179, "t": 440.44611, "r": 545.1087, "b": 448.46207, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "James Bradbury, Gregory Chanan, Trevor Killeen, Zeming", "bbox": {"l": 328.78101, "t": 451.40509, "r": 545.11346, "b": 459.42105, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison,", "bbox": {"l": 328.78101, "t": 462.36407, "r": 545.11328, "b": 470.38004, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "Andreas Kopf, Edward Yang, Zachary DeVito, Martin Rai-", "bbox": {"l": 328.78101, "t": 473.32306, "r": 545.11328, "b": 481.33902, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "son, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner,", "bbox": {"l": 328.78101, "t": 484.28204, "r": 545.11328, "b": 492.298, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An im-", "bbox": {"l": 328.78101, "t": 495.24103, "r": 545.1134, "b": 503.25699, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "perative style, high-performance deep learning library. In H.", "bbox": {"l": 328.78101, "t": 506.20001, "r": 545.1134, "b": 514.21597, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00b4e-Buc, E.", "bbox": {"l": 328.78101, "t": 517.159, "r": 545.1098, "b": 525.17496, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "Fox, and R. 
Garnett, editors,", "bbox": {"l": 328.78101, "t": 528.117, "r": 434.56659, "b": 536.13297, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "Advances in Neural Informa-", "bbox": {"l": 437.86401, "t": 528.19769, "r": 545.11115, "b": 535.9267600000001, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "tion Processing Systems 32", "bbox": {"l": 328.78101, "t": 539.15671, "r": 425.73471, "b": 546.8857399999999, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": ", pages 8024-8035. Curran Asso-", "bbox": {"l": 425.73602, "t": 539.076, "r": 545.11475, "b": 547.09196, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "ciates, Inc., 2019. 6", "bbox": {"l": 328.78101, "t": 550.035, "r": 399.74109, "b": 558.05096, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "[23]", "bbox": {"l": 308.862, "t": 561.481, "r": 324.50351, "b": 569.49696, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish", "bbox": {"l": 326.85068, "t": 561.481, "r": 545.10876, "b": 569.49696, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "Visave, and Kavita Sultanpure. Cascadetabnet: An approach", "bbox": {"l": 328.78101, "t": 572.44, "r": 545.1134, "b": 580.45596, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "for end to end table detection and structure recognition from", "bbox": {"l": 328.78101, "t": 583.399, "r": 545.11334, "b": 591.4149600000001, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "image-based documents. 
In", "bbox": {"l": 328.78101, "t": 594.358, "r": 431.61667, "b": 602.37396, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "Proceedings of the IEEE/CVF", "bbox": {"l": 434.69101000000006, "t": 594.4387099999999, "r": 545.11224, "b": 602.16774, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "Conference on Computer Vision and Pattern Recognition", "bbox": {"l": 328.78101, "t": 605.39671, "r": 545.1134, "b": 613.12575, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "Workshops", "bbox": {"l": 328.78101, "t": 616.35571, "r": 367.8028, "b": 624.08475, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": ", pages 572-573, 2020. 1", "bbox": {"l": 367.802, "t": 616.2750100000001, "r": 458.69446000000005, "b": 624.29097, "coord_origin": "TOPLEFT"}}, {"id": 186, "text": "[24]", "bbox": {"l": 308.862, "t": 627.72101, "r": 324.69476, "b": 635.73697, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait.", "bbox": {"l": 327.07065, "t": 627.72101, "r": 545.1087, "b": 635.73697, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "Rethinking table recognition using graph neural networks.", "bbox": {"l": 328.78101, "t": 638.68001, "r": 545.11328, "b": 646.69597, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "In", "bbox": {"l": 328.78101, "t": 649.63901, "r": 336.25003, "b": 657.65497, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 338.10001, "t": 649.71971, "r": 545.11621, "b": 657.44875, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "Recognition (ICDAR)", "bbox": {"l": 328.78101, "t": 660.67871, "r": 406.32245, "b": 668.40775, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": ", pages 142-147. IEEE, 2019. 
3", "bbox": {"l": 406.32202, "t": 660.5980099999999, "r": 521.1189, "b": 668.61398, "coord_origin": "TOPLEFT"}}, {"id": 193, "text": "[25]", "bbox": {"l": 308.86203, "t": 672.04301, "r": 324.71329, "b": 680.05898, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir", "bbox": {"l": 327.09195, "t": 672.04301, "r": 545.10876, "b": 680.05898, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "Sadeghian, Ian Reid, and Silvio Savarese.", "bbox": {"l": 328.78104, "t": 683.0020099999999, "r": 482.81488, "b": 691.01797, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "Generalized in-", "bbox": {"l": 488.75064, "t": 683.0020099999999, "r": 545.1134, "b": 691.01797, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "tersection over union: A metric and a loss for bounding box", "bbox": {"l": 328.78104, "t": 693.961014, "r": 545.11334, "b": 701.976974, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "regression. In", "bbox": {"l": 328.78104, "t": 704.920013, "r": 379.1543, "b": 712.935974, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "Proceedings of the IEEE/CVF Conference on", "bbox": {"l": 381.61603, "t": 705.00071, "r": 545.10938, "b": 712.729744, "coord_origin": "TOPLEFT"}}, {"id": 200, "text": "9", "bbox": {"l": 295.12103, "t": 734.1325870000001, "r": 300.10233, "b": 743.0391500000001, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "list_item", "bbox": {"l": 69.20614624023438, "t": 74.8327865600586, "r": 286.36334, "b": 116.86038208007812, "coord_origin": "TOPLEFT"}, "confidence": 0.7310391664505005, "cells": [{"id": 0, "text": "end object detection with transformers. 
In Andrea Vedaldi,", "bbox": {"l": 70.030998, "t": 75.88378999999998, "r": 286.36334, "b": 83.89977999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Horst Bischof, Thomas Brox, and Jan-Michael Frahm, edi-", "bbox": {"l": 70.030998, "t": 86.84276999999997, "r": 286.36331, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "tors,", "bbox": {"l": 70.030998, "t": 97.80078000000003, "r": 85.722198, "b": 105.81677000000002, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Computer Vision - ECCV 2020", "bbox": {"l": 87.889, "t": 97.88147000000004, "r": 199.93315, "b": 105.61053000000004, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": ", pages 213-229, Cham,", "bbox": {"l": 199.936, "t": 97.80078000000003, "r": 286.36313, "b": 105.81677000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "2020. Springer International Publishing. 5", "bbox": {"l": 70.031006, "t": 108.75977, "r": 221.94871999999998, "b": 116.77575999999999, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "list_item", "bbox": {"l": 54.220462799072266, "t": 118.9831771850586, "r": 286.4865417480469, "b": 150.44512939453125, "coord_origin": "TOPLEFT"}, "confidence": 0.937275767326355, "cells": [{"id": 6, "text": "[2]", "bbox": {"l": 54.595005, "t": 120.03174000000013, "r": 65.206657, "b": 128.04773, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanx-", "bbox": {"l": 67.481873, "t": 120.03174000000013, "r": 286.35852, "b": 128.04773, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "uan Yin, and Xian-Ling Mao.", "bbox": {"l": 70.031006, "t": 130.99072, "r": 179.67215, "b": 139.00671, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Complicated table structure", "bbox": {"l": 185.58101, "t": 130.99072, "r": 286.36334, "b": 139.00671, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "recognition.", "bbox": {"l": 70.031006, "t": 141.94970999999998, "r": 113.11456, "b": 149.96569999999997, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": 
"arXiv preprint arXiv:1908.04729", "bbox": {"l": 116.34200999999999, "t": 142.0304, "r": 235.3082, "b": 149.75946, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": ", 2019. 3", "bbox": {"l": 235.30701, "t": 141.94970999999998, "r": 267.67572, "b": 149.96569999999997, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "list_item", "bbox": {"l": 54.171104431152344, "t": 152.46226501464844, "r": 286.6875, "b": 183.15466000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9378376603126526, "cells": [{"id": 13, "text": "[3]", "bbox": {"l": 54.595001, "t": 153.22168, "r": 65.103195, "b": 161.23766999999998, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "Bertrand Couasnon and Aurelie Lemaitre.", "bbox": {"l": 67.356239, "t": 153.22168, "r": 218.77876, "b": 161.23766999999998, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Recognition of Ta-", "bbox": {"l": 220.97999999999996, "t": 153.30237, "r": 286.36301, "b": 161.03143, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "bles and Forms", "bbox": {"l": 70.030991, "t": 164.26135, "r": 125.26401000000001, "b": 171.99041999999997, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": ", pages 647-677. Springer London, London,", "bbox": {"l": 125.26098999999999, "t": 164.18066, "r": 286.36029, "b": 172.19665999999995, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "2014. 
2", "bbox": {"l": 70.030991, "t": 175.13867000000005, "r": 97.916496, "b": 183.15466000000004, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "list_item", "bbox": {"l": 54.094303131103516, "t": 185.52200317382812, "r": 286.5216979980469, "b": 227.71087646484375, "coord_origin": "TOPLEFT"}, "confidence": 0.9648825526237488, "cells": [{"id": 19, "text": "[4]", "bbox": {"l": 54.59499, "t": 186.41063999999994, "r": 65.806984, "b": 194.42664000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Herv\u00b4e D\u00b4ejean, Jean-Luc Meunier, Liangcai Gao, Yilun", "bbox": {"l": 68.210922, "t": 186.41063999999994, "r": 286.36401, "b": 194.42664000000002, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. IC-", "bbox": {"l": 70.030983, "t": 197.36963000000003, "r": 286.36331, "b": 205.38562000000002, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "DAR 2019 Competition on Table Detection and Recognition", "bbox": {"l": 70.030983, "t": 208.32861000000003, "r": 286.36334, "b": 216.3446, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "(cTDaR), Apr. 2019. http://sac.founderit.com/. 2", "bbox": {"l": 70.030983, "t": 219.2876, "r": 245.83519, "b": 227.30358999999999, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "list_item", "bbox": {"l": 54.09014129638672, "t": 229.7616424560547, "r": 286.36334, "b": 271.5709228515625, "coord_origin": "TOPLEFT"}, "confidence": 0.9620944261550903, "cells": [{"id": 24, "text": "[5]", "bbox": {"l": 54.594982, "t": 230.55957, "r": 65.381134, "b": 238.57556, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and", "bbox": {"l": 67.693779, "t": 230.55957, "r": 286.35849, "b": 238.57556, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "Stavros J Perantonis. Automatic table detection in document", "bbox": {"l": 70.030983, "t": 241.51855, "r": 286.36334, "b": 249.53454999999997, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "images. 
In", "bbox": {"l": 70.030983, "t": 252.47655999999995, "r": 108.39821, "b": 260.49255000000005, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "International Conference on Pattern Recognition", "bbox": {"l": 110.64498000000002, "t": 252.55724999999995, "r": 286.3595, "b": 260.28632000000005, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "and Image Analysis", "bbox": {"l": 70.030983, "t": 263.51624000000004, "r": 140.57861, "b": 271.24530000000004, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": ", pages 609-618. Springer, 2005. 2", "bbox": {"l": 140.57797, "t": 263.43555000000003, "r": 266.47522, "b": 271.45154, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "list_item", "bbox": {"l": 54.06441879272461, "t": 273.7647399902344, "r": 286.9118347167969, "b": 315.6004899999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9555517435073853, "cells": [{"id": 31, "text": "[6]", "bbox": {"l": 54.594971, "t": 274.70758, "r": 64.848648, "b": 282.72351, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Max G\u00a8obel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi.", "bbox": {"l": 67.047119, "t": 274.70758, "r": 286.36676, "b": 282.72351, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Icdar 2013 table competition.", "bbox": {"l": 70.030975, "t": 285.66655999999995, "r": 179.57349, "b": 293.68253, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "In", "bbox": {"l": 187.01559, "t": 285.66655999999995, "r": 194.4846, "b": 293.68253, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "2013 12th International", "bbox": {"l": 198.04398, "t": 285.74725, "r": 286.36304, "b": 293.47632, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Conference on Document Analysis and Recognition", "bbox": {"l": 70.030975, "t": 296.70624, "r": 260.19937, "b": 304.43530000000004, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": ", pages", "bbox": {"l": 260.198, "t": 296.62555, "r": 286.36197, "b": 304.64151, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "1449-1453, 2013. 
2", "bbox": {"l": 70.030991, "t": 307.5845299999999, "r": 142.74849, "b": 315.6004899999999, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "list_item", "bbox": {"l": 54.08487319946289, "t": 317.6698913574219, "r": 286.5190734863281, "b": 348.78952, "coord_origin": "TOPLEFT"}, "confidence": 0.9479843378067017, "cells": [{"id": 39, "text": "[7]", "bbox": {"l": 54.59499, "t": 318.85654, "r": 65.61586, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "EA Green and M Krishnamoorthy.", "bbox": {"l": 67.978821, "t": 318.85654, "r": 199.492, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Recognition of tables", "bbox": {"l": 206.98792, "t": 318.85654, "r": 286.35849, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "using table grammars. procs.", "bbox": {"l": 70.030991, "t": 329.8145400000001, "r": 176.28284, "b": 337.83051, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "In", "bbox": {"l": 182.60416, "t": 329.8145400000001, "r": 190.07317, "b": 337.83051, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Symposium on Document", "bbox": {"l": 193.28299, "t": 329.89522999999997, "r": 286.36319, "b": 337.62429999999995, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Analysis and Recognition (SDAIR\u201995)", "bbox": {"l": 70.030991, "t": 340.85425, "r": 206.34717, "b": 348.58331, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": ", pages 261-277. 
2", "bbox": {"l": 206.34599, "t": 340.77356, "r": 274.82239, "b": 348.78952, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "list_item", "bbox": {"l": 54.01877212524414, "t": 351.1116027832031, "r": 286.37677001953125, "b": 404.01751708984375, "coord_origin": "TOPLEFT"}, "confidence": 0.948442816734314, "cells": [{"id": 47, "text": "[8]", "bbox": {"l": 54.594986000000006, "t": 352.0455600000001, "r": 65.04657, "b": 360.06152, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Di-", "bbox": {"l": 67.287483, "t": 352.0455600000001, "r": 286.35849, "b": 360.06152, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "dier Stricker, and Muhammad Zeshan Afzal.", "bbox": {"l": 70.030983, "t": 363.00458, "r": 234.12507999999997, "b": 371.02054, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Castabdetec-", "bbox": {"l": 240.05186, "t": 363.00458, "r": 286.36331, "b": 371.02054, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "tors: Cascade network for table detection in document im-", "bbox": {"l": 70.030983, "t": 373.96356, "r": 286.36331, "b": 381.97952, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "ages with recursive feature pyramid and switchable atrous", "bbox": {"l": 70.030983, "t": 384.92255, "r": 286.36331, "b": 392.93851, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "convolution.", "bbox": {"l": 70.030983, "t": 395.88153, "r": 114.57605, "b": 403.89749, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "Journal of Imaging", "bbox": {"l": 117.80399000000001, "t": 395.96222, "r": 186.7287, "b": 403.69128, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": ", 7(10), 2021. 
1", "bbox": {"l": 186.728, "t": 395.88153, "r": 243.00113999999996, "b": 403.89749, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "list_item", "bbox": {"l": 53.796630859375, "t": 406.2373352050781, "r": 286.63372802734375, "b": 437.5993957519531, "coord_origin": "TOPLEFT"}, "confidence": 0.9330759048461914, "cells": [{"id": 56, "text": "[9]", "bbox": {"l": 54.595001, "t": 407.15253000000007, "r": 65.334427, "b": 415.1684900000001, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Gir-", "bbox": {"l": 67.637054, "t": 407.15253000000007, "r": 286.35852, "b": 415.1684900000001, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "shick. Mask r-cnn. In", "bbox": {"l": 70.030998, "t": 418.11151, "r": 147.13306, "b": 426.12747, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Proceedings of the IEEE International", "bbox": {"l": 149.15601, "t": 418.1922, "r": 286.35989, "b": 425.92126, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "Conference on Computer Vision (ICCV)", "bbox": {"l": 70.031006, "t": 429.15118, "r": 213.48445, "b": 436.88025, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": ", Oct 2017. 1", "bbox": {"l": 213.483, "t": 429.07050000000004, "r": 261.04083, "b": 437.08646000000005, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "list_item", "bbox": {"l": 49.86316680908203, "t": 439.81085205078125, "r": 286.36334, "b": 481.95904541015625, "coord_origin": "TOPLEFT"}, "confidence": 0.9274739027023315, "cells": [{"id": 62, "text": "[10]", "bbox": {"l": 50.112, "t": 440.3424999999999, "r": 65.399307, "b": 448.3584599999999, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "Yelin He, X. 
Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bing-", "bbox": {"l": 67.693321, "t": 440.3424999999999, "r": 286.3587, "b": 448.3584599999999, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "cong Li, Xin Tang, and Rong Xiao.", "bbox": {"l": 70.030998, "t": 451.30151, "r": 202.74268, "b": 459.31747, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "Pingan-vcgroup\u2019s so-", "bbox": {"l": 209.00122, "t": 451.30151, "r": 286.36331, "b": 459.31747, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "lution for icdar 2021 competition on scientific table image", "bbox": {"l": 70.030998, "t": 462.2605, "r": 286.36334, "b": 470.27646, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "recognition to latex.", "bbox": {"l": 70.030998, "t": 473.21948, "r": 141.86981, "b": 481.23544, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "ArXiv", "bbox": {"l": 145.097, "t": 473.30017, "r": 166.01561, "b": 481.02924, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": ", abs/2105.01846, 2021. 2", "bbox": {"l": 166.015, "t": 473.21948, "r": 259.90216, "b": 481.23544, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "list_item", "bbox": {"l": 49.55924987792969, "t": 483.82781982421875, "r": 286.4127197265625, "b": 536.34238, "coord_origin": "TOPLEFT"}, "confidence": 0.9299948811531067, "cells": [{"id": 70, "text": "[11]", "bbox": {"l": 50.112, "t": 484.49048, "r": 66.033806, "b": 492.50644, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and", "bbox": {"l": 68.423035, "t": 484.49048, "r": 286.35873, "b": 492.50644, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "Gordon Wilfong. Medium-independent table detection. 
In", "bbox": {"l": 70.030998, "t": 495.44946, "r": 286.36331, "b": 503.46542, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "Document Recognition and Retrieval VII", "bbox": {"l": 70.030998, "t": 506.48914, "r": 227.40926, "b": 514.2182, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": ", volume 3967,", "bbox": {"l": 227.40500000000003, "t": 506.40845, "r": 286.35913, "b": 514.4244100000001, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "pages 291-302. International Society for Optics and Photon-", "bbox": {"l": 70.031006, "t": 517.36743, "r": 286.36328, "b": 525.38339, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "ics, 1999. 2", "bbox": {"l": 70.031006, "t": 528.32642, "r": 112.36138000000001, "b": 536.34238, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "list_item", "bbox": {"l": 49.559425354003906, "t": 539.2149658203125, "r": 286.9141845703125, "b": 591.44937, "coord_origin": "TOPLEFT"}, "confidence": 0.9394100904464722, "cells": [{"id": 77, "text": "[12]", "bbox": {"l": 50.112007, "t": 539.59842, "r": 65.466705, "b": 547.61438, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Matthew Hurst. A constraint-based approach to table struc-", "bbox": {"l": 67.770828, "t": 539.59842, "r": 286.35873, "b": 547.61438, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "ture derivation. In", "bbox": {"l": 70.031006, "t": 550.55742, "r": 136.28374, "b": 558.57338, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Proceedings of the Seventh International", "bbox": {"l": 138.811, "t": 550.63812, "r": 286.36206, "b": 558.36716, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Conference on Document Analysis and Recognition - Volume", "bbox": {"l": 70.031006, "t": 561.5971199999999, "r": 286.36334, "b": 569.32616, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "2", "bbox": {"l": 70.031006, "t": 572.55612, "r": 74.514206, "b": 580.28516, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": ", ICDAR \u201903, page 911, USA, 2003. 
IEEE Computer Soci-", "bbox": {"l": 74.514008, "t": 572.47542, "r": 286.36313, "b": 580.4913799999999, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "ety. 2", "bbox": {"l": 70.031006, "t": 583.4334100000001, "r": 90.357834, "b": 591.44937, "coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "list_item", "bbox": {"l": 49.5648307800293, "t": 593.7275390625, "r": 286.607177734375, "b": 647.1483154296875, "coord_origin": "TOPLEFT"}, "confidence": 0.9298840761184692, "cells": [{"id": 85, "text": "[13]", "bbox": {"l": 50.112007, "t": 594.70541, "r": 66.270439, "b": 602.72137, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Thotreingam Kasar, Philippine Barlas, Sebastien Adam,", "bbox": {"l": 68.695168, "t": 594.70541, "r": 286.35873, "b": 602.72137, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "Cl\u00b4ement Chatelain, and Thierry Paquet. Learning to detect", "bbox": {"l": 70.031006, "t": 605.66441, "r": 286.3631, "b": 613.68037, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "tables in scanned document images using line information.", "bbox": {"l": 70.031006, "t": 616.62341, "r": 286.36331, "b": 624.63937, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "In", "bbox": {"l": 70.031006, "t": 627.58241, "r": 77.500015, "b": 635.5983699999999, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "2013 12th International Conference on Document Analy-", "bbox": {"l": 79.920006, "t": 627.6631199999999, "r": 286.3624, "b": 635.39215, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "sis and Recognition", "bbox": {"l": 70.031006, "t": 638.62212, "r": 140.67728, "b": 646.35115, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": ", pages 1185-1189. IEEE, 2013. 
2", "bbox": {"l": 140.67599, "t": 638.54141, "r": 264.43921, "b": 646.55737, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "list_item", "bbox": {"l": 49.71070861816406, "t": 649.1871337890625, "r": 286.4481201171875, "b": 680.2498168945312, "coord_origin": "TOPLEFT"}, "confidence": 0.9115259051322937, "cells": [{"id": 93, "text": "[14]", "bbox": {"l": 50.111992, "t": 649.81342, "r": 66.534035, "b": 657.82938, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank", "bbox": {"l": 68.998329, "t": 649.81342, "r": 286.35873, "b": 657.82938, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Singh.", "bbox": {"l": 70.030991, "t": 660.77142, "r": 93.200165, "b": 668.78738, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Icdar 2021 competition on scientific table image", "bbox": {"l": 102.20243, "t": 660.77142, "r": 286.36334, "b": 668.78738, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "recognition to latex, 2021. 2", "bbox": {"l": 70.030991, "t": 671.73042, "r": 171.9969, "b": 679.74638, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "list_item", "bbox": {"l": 49.51462936401367, "t": 682.48046875, "r": 286.42413330078125, "b": 712.936386, "coord_origin": "TOPLEFT"}, "confidence": 0.9122310876846313, "cells": [{"id": 98, "text": "[15]", "bbox": {"l": 50.111992, "t": 683.00243, "r": 65.515968, "b": 691.01839, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Harold W Kuhn. 
The hungarian method for the assignment", "bbox": {"l": 67.827499, "t": 683.00243, "r": 286.3587, "b": 691.01839, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "problem.", "bbox": {"l": 70.030991, "t": 693.9614260000001, "r": 102.15761, "b": 701.977386, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Naval research logistics quarterly", "bbox": {"l": 107.54999, "t": 694.0421220000001, "r": 231.47461, "b": 701.771156, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": ", 2(1-2):83-97,", "bbox": {"l": 231.47598, "t": 693.9614260000001, "r": 286.35931, "b": 701.977386, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "1955. 6", "bbox": {"l": 70.030975, "t": 704.920425, "r": 97.916481, "b": 712.936386, "coord_origin": "TOPLEFT"}}]}, {"id": 15, "label": "list_item", "bbox": {"l": 308.4394836425781, "t": 74.8146743774414, "r": 545.1665649414062, "b": 138.69335999999998, "coord_origin": "TOPLEFT"}, "confidence": 0.9389601349830627, "cells": [{"id": 104, "text": "[16]", "bbox": {"l": 308.86197, "t": 75.88342000000011, "r": 324.74973, "b": 83.89940999999999, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sag-", "bbox": {"l": 327.13382, "t": 75.88342000000011, "r": 545.1087, "b": 83.89940999999999, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "nik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and", "bbox": {"l": 328.78098, "t": 86.84142999999995, "r": 545.1134, "b": 94.85741999999993, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Tamara L. 
Berg.", "bbox": {"l": 328.78098, "t": 97.80042000000003, "r": 390.96295, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Babytalk:", "bbox": {"l": 400.27008, "t": 97.80042000000003, "r": 435.1404099999999, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Understanding and generat-", "bbox": {"l": 441.71277, "t": 97.80042000000003, "r": 545.11328, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "ing simple image descriptions.", "bbox": {"l": 328.78098, "t": 108.75940000000003, "r": 440.80719, "b": 116.7753899999999, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "IEEE Transactions on Pat-", "bbox": {"l": 446.63498, "t": 108.84009000000003, "r": 545.11304, "b": 116.56914999999992, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "tern Analysis and Machine Intelligence", "bbox": {"l": 328.78098, "t": 119.79907000000003, "r": 471.13153, "b": 127.52814000000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": ", 35(12):2891-2903,", "bbox": {"l": 471.13300000000004, "t": 119.71838000000002, "r": 545.11475, "b": 127.73437999999999, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "2013. 
4", "bbox": {"l": 328.78101, "t": 130.67737, "r": 356.6665, "b": 138.69335999999998, "coord_origin": "TOPLEFT"}}]}, {"id": 16, "label": "list_item", "bbox": {"l": 308.39459228515625, "t": 141.0391845703125, "r": 545.1134, "b": 172.29119873046875, "coord_origin": "TOPLEFT"}, "confidence": 0.9253131747245789, "cells": [{"id": 115, "text": "[17]", "bbox": {"l": 308.862, "t": 142.12334999999996, "r": 325.24371, "b": 150.13933999999995, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming", "bbox": {"l": 327.70197, "t": 142.12334999999996, "r": 545.10883, "b": 150.13933999999995, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Zhou, and Zhoujun Li.", "bbox": {"l": 328.78101, "t": 153.08136000000002, "r": 414.44598, "b": 161.09735, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Tablebank: A benchmark dataset", "bbox": {"l": 421.82532, "t": 153.08136000000002, "r": 545.1134, "b": 161.09735, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "for table detection and recognition, 2019. 2, 3", "bbox": {"l": 328.78101, "t": 164.04034000000001, "r": 493.62835999999993, "b": 172.05633999999998, "coord_origin": "TOPLEFT"}}]}, {"id": 17, "label": "list_item", "bbox": {"l": 308.69390869140625, "t": 174.55084228515625, "r": 545.3489990234375, "b": 260.21423000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9299733638763428, "cells": [{"id": 120, "text": "[18]", "bbox": {"l": 308.862, "t": 175.48632999999995, "r": 324.26599, "b": 183.50232000000005, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and", "bbox": {"l": 326.57751, "t": 175.48632999999995, "r": 545.10876, "b": 183.50232000000005, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Xianhui Liu. 
Gfte: Graph-based financial table extraction.", "bbox": {"l": 328.78101, "t": 186.44530999999995, "r": 545.11334, "b": 194.46130000000005, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Gio-", "bbox": {"l": 328.78101, "t": 197.40430000000003, "r": 545.11346, "b": 205.42029000000002, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "vanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair", "bbox": {"l": 328.78101, "t": 208.36328000000003, "r": 545.11353, "b": 216.37927000000002, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "Escalante, and Roberto Vezzani, editors,", "bbox": {"l": 328.78101, "t": 219.32227, "r": 479.26413, "b": 227.33826, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Pattern Recogni-", "bbox": {"l": 483.11902, "t": 219.40295000000003, "r": 545.11273, "b": 227.13202, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "tion. ICPR International Workshops and Challenges", "bbox": {"l": 328.78101, "t": 230.36095999999998, "r": 519.39771, "b": 238.09002999999996, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": ", pages", "bbox": {"l": 519.401, "t": 230.28026999999997, "r": 545.10767, "b": 238.29625999999996, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "644-658, Cham, 2021. Springer International Publishing. 
2,", "bbox": {"l": 328.78101, "t": 241.23925999999994, "r": 545.11328, "b": 249.25525000000005, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "3", "bbox": {"l": 328.78101, "t": 252.19824000000006, "r": 333.26422, "b": 260.21423000000004, "coord_origin": "TOPLEFT"}}]}, {"id": 18, "label": "list_item", "bbox": {"l": 308.6376647949219, "t": 263.07110595703125, "r": 545.2516479492188, "b": 326.943115234375, "coord_origin": "TOPLEFT"}, "confidence": 0.9373217821121216, "cells": [{"id": 131, "text": "[19]", "bbox": {"l": 308.862, "t": 263.64423, "r": 324.26477, "b": 271.66022, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Vik-", "bbox": {"l": 326.57611, "t": 263.64423, "r": 545.10883, "b": 271.66022, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "tor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele", "bbox": {"l": 328.78101, "t": 274.60321, "r": 545.1134, "b": 282.61917000000005, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Ro-", "bbox": {"l": 328.78101, "t": 285.56219, "r": 545.11328, "b": 293.57816, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "bust pdf document conversion using recurrent neural net-", "bbox": {"l": 328.78101, "t": 296.52118, "r": 545.11334, "b": 304.53714, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "works.", "bbox": {"l": 328.78101, "t": 307.47919, "r": 352.84683, "b": 315.49515, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "Proceedings of the AAAI Conference on Artificial", "bbox": {"l": 360.23599, "t": 307.55988, "r": 545.1142, "b": 315.28894, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "Intelligence", "bbox": {"l": 328.78101, "t": 318.51886, "r": 371.02173, "b": 326.24792, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": ", 35(17):15137-15145, May 2021. 
1", "bbox": {"l": 371.021, "t": 318.43817, "r": 502.26227, "b": 326.45413, "coord_origin": "TOPLEFT"}}]}, {"id": 19, "label": "list_item", "bbox": {"l": 308.6220703125, "t": 328.9691467285156, "r": 545.3649291992188, "b": 371.1004638671875, "coord_origin": "TOPLEFT"}, "confidence": 0.902729332447052, "cells": [{"id": 140, "text": "[20]", "bbox": {"l": 308.862, "t": 329.88419, "r": 323.82672, "b": 337.90015, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang,", "bbox": {"l": 326.07233, "t": 329.88419, "r": 545.10876, "b": 337.90015, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "Yongpan Wang, and Gui-Song Xia. Parsing table structures", "bbox": {"l": 328.78101, "t": 340.8432, "r": 545.11346, "b": 348.85916, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "in the wild. In", "bbox": {"l": 328.78101, "t": 351.80219000000005, "r": 382.7767, "b": 359.81815000000006, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "Proceedings of the IEEE/CVF International", "bbox": {"l": 385.54102, "t": 351.88287, "r": 545.11609, "b": 359.61194, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "Conference on Computer Vision", "bbox": {"l": 328.78101, "t": 362.84186, "r": 443.59579, "b": 370.57092, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": ", pages 944-952, 2021. 
2", "bbox": {"l": 443.59399, "t": 362.76117, "r": 534.48645, "b": 370.77713, "coord_origin": "TOPLEFT"}}]}, {"id": 20, "label": "list_item", "bbox": {"l": 308.6834716796875, "t": 373.1802978515625, "r": 545.1424560546875, "b": 437.55621337890625, "coord_origin": "TOPLEFT"}, "confidence": 0.8827157020568848, "cells": [{"id": 147, "text": "[21]", "bbox": {"l": 308.862, "t": 374.20618, "r": 324.60281, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "Shubham", "bbox": {"l": 326.96487, "t": 374.20618, "r": 362.6604, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "Singh", "bbox": {"l": 368.69479, "t": 374.20618, "r": 389.6134, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "Paliwal,", "bbox": {"l": 395.6478, "t": 374.20618, "r": 424.56445, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "D", "bbox": {"l": 431.5492899999999, "t": 374.20618, "r": 438.0230399999999, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Vishwanath,", "bbox": {"l": 444.05743, "t": 374.20618, "r": 488.5038799999999, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "Rohit", "bbox": {"l": 495.47974, "t": 374.20618, "r": 515.41205, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "Rahul,", "bbox": {"l": 521.44641, "t": 374.20618, "r": 545.10876, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "Monika Sharma, and Lovekesh Vig. 
Tablenet: Deep learn-", "bbox": {"l": 328.78101, "t": 385.16516, "r": 545.1134, "b": 393.18112, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "ing model for end-to-end table detection and tabular data ex-", "bbox": {"l": 328.78101, "t": 396.12415, "r": 545.11346, "b": 404.14011, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "traction from scanned document images.", "bbox": {"l": 328.78101, "t": 407.08313, "r": 478.00881999999996, "b": 415.09909, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "In", "bbox": {"l": 484.0701, "t": 407.08313, "r": 491.53912, "b": 415.09909, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "2019 Interna-", "bbox": {"l": 494.668, "t": 407.16382, "r": 545.11298, "b": 414.89288, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "tional Conference on Document Analysis and Recognition", "bbox": {"l": 328.78101, "t": 418.12280000000004, "r": 545.11334, "b": 425.85187, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "(ICDAR)", "bbox": {"l": 328.78101, "t": 429.08179, "r": 360.83591, "b": 436.8108500000001, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": ", pages 128-133. IEEE, 2019. 
1", "bbox": {"l": 360.836, "t": 429.0011, "r": 475.63287, "b": 437.01706, "coord_origin": "TOPLEFT"}}]}, {"id": 21, "label": "list_item", "bbox": {"l": 308.78057861328125, "t": 439.68524169921875, "r": 545.1746215820312, "b": 558.05096, "coord_origin": "TOPLEFT"}, "confidence": 0.8896440863609314, "cells": [{"id": 163, "text": "[22]", "bbox": {"l": 308.862, "t": 440.44611, "r": 324.57407, "b": 448.46207, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer,", "bbox": {"l": 326.93179, "t": 440.44611, "r": 545.1087, "b": 448.46207, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "James Bradbury, Gregory Chanan, Trevor Killeen, Zeming", "bbox": {"l": 328.78101, "t": 451.40509, "r": 545.11346, "b": 459.42105, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison,", "bbox": {"l": 328.78101, "t": 462.36407, "r": 545.11328, "b": 470.38004, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "Andreas Kopf, Edward Yang, Zachary DeVito, Martin Rai-", "bbox": {"l": 328.78101, "t": 473.32306, "r": 545.11328, "b": 481.33902, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "son, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner,", "bbox": {"l": 328.78101, "t": 484.28204, "r": 545.11328, "b": 492.298, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An im-", "bbox": {"l": 328.78101, "t": 495.24103, "r": 545.1134, "b": 503.25699, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "perative style, high-performance deep learning library. In H.", "bbox": {"l": 328.78101, "t": 506.20001, "r": 545.1134, "b": 514.21597, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00b4e-Buc, E.", "bbox": {"l": 328.78101, "t": 517.159, "r": 545.1098, "b": 525.17496, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "Fox, and R. 
Garnett, editors,", "bbox": {"l": 328.78101, "t": 528.117, "r": 434.56659, "b": 536.13297, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "Advances in Neural Informa-", "bbox": {"l": 437.86401, "t": 528.19769, "r": 545.11115, "b": 535.9267600000001, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "tion Processing Systems 32", "bbox": {"l": 328.78101, "t": 539.15671, "r": 425.73471, "b": 546.8857399999999, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": ", pages 8024-8035. Curran Asso-", "bbox": {"l": 425.73602, "t": 539.076, "r": 545.11475, "b": 547.09196, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "ciates, Inc., 2019. 6", "bbox": {"l": 328.78101, "t": 550.035, "r": 399.74109, "b": 558.05096, "coord_origin": "TOPLEFT"}}]}, {"id": 22, "label": "list_item", "bbox": {"l": 308.73394775390625, "t": 560.5276489257812, "r": 545.4642944335938, "b": 625.1737060546875, "coord_origin": "TOPLEFT"}, "confidence": 0.9025442600250244, "cells": [{"id": 177, "text": "[23]", "bbox": {"l": 308.862, "t": 561.481, "r": 324.50351, "b": 569.49696, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish", "bbox": {"l": 326.85068, "t": 561.481, "r": 545.10876, "b": 569.49696, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "Visave, and Kavita Sultanpure. Cascadetabnet: An approach", "bbox": {"l": 328.78101, "t": 572.44, "r": 545.1134, "b": 580.45596, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "for end to end table detection and structure recognition from", "bbox": {"l": 328.78101, "t": 583.399, "r": 545.11334, "b": 591.4149600000001, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "image-based documents. 
In", "bbox": {"l": 328.78101, "t": 594.358, "r": 431.61667, "b": 602.37396, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "Proceedings of the IEEE/CVF", "bbox": {"l": 434.69101000000006, "t": 594.4387099999999, "r": 545.11224, "b": 602.16774, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "Conference on Computer Vision and Pattern Recognition", "bbox": {"l": 328.78101, "t": 605.39671, "r": 545.1134, "b": 613.12575, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "Workshops", "bbox": {"l": 328.78101, "t": 616.35571, "r": 367.8028, "b": 624.08475, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": ", pages 572-573, 2020. 1", "bbox": {"l": 367.802, "t": 616.2750100000001, "r": 458.69446000000005, "b": 624.29097, "coord_origin": "TOPLEFT"}}]}, {"id": 23, "label": "list_item", "bbox": {"l": 308.49481201171875, "t": 627.0421752929688, "r": 545.31982421875, "b": 669.146484375, "coord_origin": "TOPLEFT"}, "confidence": 0.8777534365653992, "cells": [{"id": 186, "text": "[24]", "bbox": {"l": 308.862, "t": 627.72101, "r": 324.69476, "b": 635.73697, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait.", "bbox": {"l": 327.07065, "t": 627.72101, "r": 545.1087, "b": 635.73697, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "Rethinking table recognition using graph neural networks.", "bbox": {"l": 328.78101, "t": 638.68001, "r": 545.11328, "b": 646.69597, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "In", "bbox": {"l": 328.78101, "t": 649.63901, "r": 336.25003, "b": 657.65497, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 338.10001, "t": 649.71971, "r": 545.11621, "b": 657.44875, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "Recognition (ICDAR)", "bbox": {"l": 328.78101, "t": 660.67871, "r": 406.32245, "b": 668.40775, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": ", pages 142-147. IEEE, 2019. 
3", "bbox": {"l": 406.32202, "t": 660.5980099999999, "r": 521.1189, "b": 668.61398, "coord_origin": "TOPLEFT"}}]}, {"id": 24, "label": "list_item", "bbox": {"l": 308.78839111328125, "t": 671.11767578125, "r": 545.2333374023438, "b": 713.0277709960938, "coord_origin": "TOPLEFT"}, "confidence": 0.8654534220695496, "cells": [{"id": 193, "text": "[25]", "bbox": {"l": 308.86203, "t": 672.04301, "r": 324.71329, "b": 680.05898, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir", "bbox": {"l": 327.09195, "t": 672.04301, "r": 545.10876, "b": 680.05898, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "Sadeghian, Ian Reid, and Silvio Savarese.", "bbox": {"l": 328.78104, "t": 683.0020099999999, "r": 482.81488, "b": 691.01797, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "Generalized in-", "bbox": {"l": 488.75064, "t": 683.0020099999999, "r": 545.1134, "b": 691.01797, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "tersection over union: A metric and a loss for bounding box", "bbox": {"l": 328.78104, "t": 693.961014, "r": 545.11334, "b": 701.976974, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "regression. 
In", "bbox": {"l": 328.78104, "t": 704.920013, "r": 379.1543, "b": 712.935974, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "Proceedings of the IEEE/CVF Conference on", "bbox": {"l": 381.61603, "t": 705.00071, "r": 545.10938, "b": 712.729744, "coord_origin": "TOPLEFT"}}]}, {"id": 25, "label": "page_footer", "bbox": {"l": 294.42181396484375, "t": 733.50244140625, "r": 300.25152587890625, "b": 743.0391500000001, "coord_origin": "TOPLEFT"}, "confidence": 0.8797808885574341, "cells": [{"id": 200, "text": "9", "bbox": {"l": 295.12103, "t": 734.1325870000001, "r": 300.10233, "b": 743.0391500000001, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "list_item", "id": 0, "page_no": 8, "cluster": {"id": 0, "label": "list_item", "bbox": {"l": 69.20614624023438, "t": 74.8327865600586, "r": 286.36334, "b": 116.86038208007812, "coord_origin": "TOPLEFT"}, "confidence": 0.7310391664505005, "cells": [{"id": 0, "text": "end object detection with transformers. In Andrea Vedaldi,", "bbox": {"l": 70.030998, "t": 75.88378999999998, "r": 286.36334, "b": 83.89977999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Horst Bischof, Thomas Brox, and Jan-Michael Frahm, edi-", "bbox": {"l": 70.030998, "t": 86.84276999999997, "r": 286.36331, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "tors,", "bbox": {"l": 70.030998, "t": 97.80078000000003, "r": 85.722198, "b": 105.81677000000002, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Computer Vision - ECCV 2020", "bbox": {"l": 87.889, "t": 97.88147000000004, "r": 199.93315, "b": 105.61053000000004, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": ", pages 213-229, Cham,", "bbox": {"l": 199.936, "t": 97.80078000000003, "r": 286.36313, "b": 105.81677000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "2020. Springer International Publishing. 
5", "bbox": {"l": 70.031006, "t": 108.75977, "r": 221.94871999999998, "b": 116.77575999999999, "coord_origin": "TOPLEFT"}}]}, "text": "end object detection with transformers. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision - ECCV 2020 , pages 213-229, Cham, 2020. Springer International Publishing. 5"}, {"label": "list_item", "id": 1, "page_no": 8, "cluster": {"id": 1, "label": "list_item", "bbox": {"l": 54.220462799072266, "t": 118.9831771850586, "r": 286.4865417480469, "b": 150.44512939453125, "coord_origin": "TOPLEFT"}, "confidence": 0.937275767326355, "cells": [{"id": 6, "text": "[2]", "bbox": {"l": 54.595005, "t": 120.03174000000013, "r": 65.206657, "b": 128.04773, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanx-", "bbox": {"l": 67.481873, "t": 120.03174000000013, "r": 286.35852, "b": 128.04773, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "uan Yin, and Xian-Ling Mao.", "bbox": {"l": 70.031006, "t": 130.99072, "r": 179.67215, "b": 139.00671, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Complicated table structure", "bbox": {"l": 185.58101, "t": 130.99072, "r": 286.36334, "b": 139.00671, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "recognition.", "bbox": {"l": 70.031006, "t": 141.94970999999998, "r": 113.11456, "b": 149.96569999999997, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "arXiv preprint arXiv:1908.04729", "bbox": {"l": 116.34200999999999, "t": 142.0304, "r": 235.3082, "b": 149.75946, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": ", 2019. 3", "bbox": {"l": 235.30701, "t": 141.94970999999998, "r": 267.67572, "b": 149.96569999999997, "coord_origin": "TOPLEFT"}}]}, "text": "[2] Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, and Xian-Ling Mao. Complicated table structure recognition. arXiv preprint arXiv:1908.04729 , 2019. 
3"}, {"label": "list_item", "id": 2, "page_no": 8, "cluster": {"id": 2, "label": "list_item", "bbox": {"l": 54.171104431152344, "t": 152.46226501464844, "r": 286.6875, "b": 183.15466000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9378376603126526, "cells": [{"id": 13, "text": "[3]", "bbox": {"l": 54.595001, "t": 153.22168, "r": 65.103195, "b": 161.23766999999998, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "Bertrand Couasnon and Aurelie Lemaitre.", "bbox": {"l": 67.356239, "t": 153.22168, "r": 218.77876, "b": 161.23766999999998, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Recognition of Ta-", "bbox": {"l": 220.97999999999996, "t": 153.30237, "r": 286.36301, "b": 161.03143, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "bles and Forms", "bbox": {"l": 70.030991, "t": 164.26135, "r": 125.26401000000001, "b": 171.99041999999997, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": ", pages 647-677. Springer London, London,", "bbox": {"l": 125.26098999999999, "t": 164.18066, "r": 286.36029, "b": 172.19665999999995, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "2014. 2", "bbox": {"l": 70.030991, "t": 175.13867000000005, "r": 97.916496, "b": 183.15466000000004, "coord_origin": "TOPLEFT"}}]}, "text": "[3] Bertrand Couasnon and Aurelie Lemaitre. Recognition of Tables and Forms , pages 647-677. Springer London, London, 2014. 
2"}, {"label": "list_item", "id": 3, "page_no": 8, "cluster": {"id": 3, "label": "list_item", "bbox": {"l": 54.094303131103516, "t": 185.52200317382812, "r": 286.5216979980469, "b": 227.71087646484375, "coord_origin": "TOPLEFT"}, "confidence": 0.9648825526237488, "cells": [{"id": 19, "text": "[4]", "bbox": {"l": 54.59499, "t": 186.41063999999994, "r": 65.806984, "b": 194.42664000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Herv\u00b4e D\u00b4ejean, Jean-Luc Meunier, Liangcai Gao, Yilun", "bbox": {"l": 68.210922, "t": 186.41063999999994, "r": 286.36401, "b": 194.42664000000002, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. IC-", "bbox": {"l": 70.030983, "t": 197.36963000000003, "r": 286.36331, "b": 205.38562000000002, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "DAR 2019 Competition on Table Detection and Recognition", "bbox": {"l": 70.030983, "t": 208.32861000000003, "r": 286.36334, "b": 216.3446, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "(cTDaR), Apr. 2019. http://sac.founderit.com/. 2", "bbox": {"l": 70.030983, "t": 219.2876, "r": 245.83519, "b": 227.30358999999999, "coord_origin": "TOPLEFT"}}]}, "text": "[4] Herv\u00b4e D\u00b4ejean, Jean-Luc Meunier, Liangcai Gao, Yilun Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR), Apr. 2019. http://sac.founderit.com/. 
2"}, {"label": "list_item", "id": 4, "page_no": 8, "cluster": {"id": 4, "label": "list_item", "bbox": {"l": 54.09014129638672, "t": 229.7616424560547, "r": 286.36334, "b": 271.5709228515625, "coord_origin": "TOPLEFT"}, "confidence": 0.9620944261550903, "cells": [{"id": 24, "text": "[5]", "bbox": {"l": 54.594982, "t": 230.55957, "r": 65.381134, "b": 238.57556, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and", "bbox": {"l": 67.693779, "t": 230.55957, "r": 286.35849, "b": 238.57556, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "Stavros J Perantonis. Automatic table detection in document", "bbox": {"l": 70.030983, "t": 241.51855, "r": 286.36334, "b": 249.53454999999997, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "images. In", "bbox": {"l": 70.030983, "t": 252.47655999999995, "r": 108.39821, "b": 260.49255000000005, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "International Conference on Pattern Recognition", "bbox": {"l": 110.64498000000002, "t": 252.55724999999995, "r": 286.3595, "b": 260.28632000000005, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "and Image Analysis", "bbox": {"l": 70.030983, "t": 263.51624000000004, "r": 140.57861, "b": 271.24530000000004, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": ", pages 609-618. Springer, 2005. 2", "bbox": {"l": 140.57797, "t": 263.43555000000003, "r": 266.47522, "b": 271.45154, "coord_origin": "TOPLEFT"}}]}, "text": "[5] Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and Stavros J Perantonis. Automatic table detection in document images. In International Conference on Pattern Recognition and Image Analysis , pages 609-618. Springer, 2005. 
2"}, {"label": "list_item", "id": 5, "page_no": 8, "cluster": {"id": 5, "label": "list_item", "bbox": {"l": 54.06441879272461, "t": 273.7647399902344, "r": 286.9118347167969, "b": 315.6004899999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9555517435073853, "cells": [{"id": 31, "text": "[6]", "bbox": {"l": 54.594971, "t": 274.70758, "r": 64.848648, "b": 282.72351, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Max G\u00a8obel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi.", "bbox": {"l": 67.047119, "t": 274.70758, "r": 286.36676, "b": 282.72351, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Icdar 2013 table competition.", "bbox": {"l": 70.030975, "t": 285.66655999999995, "r": 179.57349, "b": 293.68253, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "In", "bbox": {"l": 187.01559, "t": 285.66655999999995, "r": 194.4846, "b": 293.68253, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "2013 12th International", "bbox": {"l": 198.04398, "t": 285.74725, "r": 286.36304, "b": 293.47632, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Conference on Document Analysis and Recognition", "bbox": {"l": 70.030975, "t": 296.70624, "r": 260.19937, "b": 304.43530000000004, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": ", pages", "bbox": {"l": 260.198, "t": 296.62555, "r": 286.36197, "b": 304.64151, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "1449-1453, 2013. 2", "bbox": {"l": 70.030991, "t": 307.5845299999999, "r": 142.74849, "b": 315.6004899999999, "coord_origin": "TOPLEFT"}}]}, "text": "[6] Max G\u00a8obel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. Icdar 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition , pages 1449-1453, 2013. 
2"}, {"label": "list_item", "id": 6, "page_no": 8, "cluster": {"id": 6, "label": "list_item", "bbox": {"l": 54.08487319946289, "t": 317.6698913574219, "r": 286.5190734863281, "b": 348.78952, "coord_origin": "TOPLEFT"}, "confidence": 0.9479843378067017, "cells": [{"id": 39, "text": "[7]", "bbox": {"l": 54.59499, "t": 318.85654, "r": 65.61586, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "EA Green and M Krishnamoorthy.", "bbox": {"l": 67.978821, "t": 318.85654, "r": 199.492, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Recognition of tables", "bbox": {"l": 206.98792, "t": 318.85654, "r": 286.35849, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "using table grammars. procs.", "bbox": {"l": 70.030991, "t": 329.8145400000001, "r": 176.28284, "b": 337.83051, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "In", "bbox": {"l": 182.60416, "t": 329.8145400000001, "r": 190.07317, "b": 337.83051, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Symposium on Document", "bbox": {"l": 193.28299, "t": 329.89522999999997, "r": 286.36319, "b": 337.62429999999995, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Analysis and Recognition (SDAIR\u201995)", "bbox": {"l": 70.030991, "t": 340.85425, "r": 206.34717, "b": 348.58331, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": ", pages 261-277. 2", "bbox": {"l": 206.34599, "t": 340.77356, "r": 274.82239, "b": 348.78952, "coord_origin": "TOPLEFT"}}]}, "text": "[7] EA Green and M Krishnamoorthy. Recognition of tables using table grammars. procs. In Symposium on Document Analysis and Recognition (SDAIR\u201995) , pages 261-277. 
2"}, {"label": "list_item", "id": 7, "page_no": 8, "cluster": {"id": 7, "label": "list_item", "bbox": {"l": 54.01877212524414, "t": 351.1116027832031, "r": 286.37677001953125, "b": 404.01751708984375, "coord_origin": "TOPLEFT"}, "confidence": 0.948442816734314, "cells": [{"id": 47, "text": "[8]", "bbox": {"l": 54.594986000000006, "t": 352.0455600000001, "r": 65.04657, "b": 360.06152, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Di-", "bbox": {"l": 67.287483, "t": 352.0455600000001, "r": 286.35849, "b": 360.06152, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "dier Stricker, and Muhammad Zeshan Afzal.", "bbox": {"l": 70.030983, "t": 363.00458, "r": 234.12507999999997, "b": 371.02054, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Castabdetec-", "bbox": {"l": 240.05186, "t": 363.00458, "r": 286.36331, "b": 371.02054, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "tors: Cascade network for table detection in document im-", "bbox": {"l": 70.030983, "t": 373.96356, "r": 286.36331, "b": 381.97952, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "ages with recursive feature pyramid and switchable atrous", "bbox": {"l": 70.030983, "t": 384.92255, "r": 286.36331, "b": 392.93851, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "convolution.", "bbox": {"l": 70.030983, "t": 395.88153, "r": 114.57605, "b": 403.89749, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "Journal of Imaging", "bbox": {"l": 117.80399000000001, "t": 395.96222, "r": 186.7287, "b": 403.69128, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": ", 7(10), 2021. 1", "bbox": {"l": 186.728, "t": 395.88153, "r": 243.00113999999996, "b": 403.89749, "coord_origin": "TOPLEFT"}}]}, "text": "[8] Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. Castabdetectors: Cascade network for table detection in document images with recursive feature pyramid and switchable atrous convolution. Journal of Imaging , 7(10), 2021. 
1"}, {"label": "list_item", "id": 8, "page_no": 8, "cluster": {"id": 8, "label": "list_item", "bbox": {"l": 53.796630859375, "t": 406.2373352050781, "r": 286.63372802734375, "b": 437.5993957519531, "coord_origin": "TOPLEFT"}, "confidence": 0.9330759048461914, "cells": [{"id": 56, "text": "[9]", "bbox": {"l": 54.595001, "t": 407.15253000000007, "r": 65.334427, "b": 415.1684900000001, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Gir-", "bbox": {"l": 67.637054, "t": 407.15253000000007, "r": 286.35852, "b": 415.1684900000001, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "shick. Mask r-cnn. In", "bbox": {"l": 70.030998, "t": 418.11151, "r": 147.13306, "b": 426.12747, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Proceedings of the IEEE International", "bbox": {"l": 149.15601, "t": 418.1922, "r": 286.35989, "b": 425.92126, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "Conference on Computer Vision (ICCV)", "bbox": {"l": 70.031006, "t": 429.15118, "r": 213.48445, "b": 436.88025, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": ", Oct 2017. 1", "bbox": {"l": 213.483, "t": 429.07050000000004, "r": 261.04083, "b": 437.08646000000005, "coord_origin": "TOPLEFT"}}]}, "text": "[9] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) , Oct 2017. 1"}, {"label": "list_item", "id": 9, "page_no": 8, "cluster": {"id": 9, "label": "list_item", "bbox": {"l": 49.86316680908203, "t": 439.81085205078125, "r": 286.36334, "b": 481.95904541015625, "coord_origin": "TOPLEFT"}, "confidence": 0.9274739027023315, "cells": [{"id": 62, "text": "[10]", "bbox": {"l": 50.112, "t": 440.3424999999999, "r": 65.399307, "b": 448.3584599999999, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "Yelin He, X. 
Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bing-", "bbox": {"l": 67.693321, "t": 440.3424999999999, "r": 286.3587, "b": 448.3584599999999, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "cong Li, Xin Tang, and Rong Xiao.", "bbox": {"l": 70.030998, "t": 451.30151, "r": 202.74268, "b": 459.31747, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "Pingan-vcgroup\u2019s so-", "bbox": {"l": 209.00122, "t": 451.30151, "r": 286.36331, "b": 459.31747, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "lution for icdar 2021 competition on scientific table image", "bbox": {"l": 70.030998, "t": 462.2605, "r": 286.36334, "b": 470.27646, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "recognition to latex.", "bbox": {"l": 70.030998, "t": 473.21948, "r": 141.86981, "b": 481.23544, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "ArXiv", "bbox": {"l": 145.097, "t": 473.30017, "r": 166.01561, "b": 481.02924, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": ", abs/2105.01846, 2021. 2", "bbox": {"l": 166.015, "t": 473.21948, "r": 259.90216, "b": 481.23544, "coord_origin": "TOPLEFT"}}]}, "text": "[10] Yelin He, X. Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, and Rong Xiao. Pingan-vcgroup\u2019s solution for icdar 2021 competition on scientific table image recognition to latex. ArXiv , abs/2105.01846, 2021. 2"}, {"label": "list_item", "id": 10, "page_no": 8, "cluster": {"id": 10, "label": "list_item", "bbox": {"l": 49.55924987792969, "t": 483.82781982421875, "r": 286.4127197265625, "b": 536.34238, "coord_origin": "TOPLEFT"}, "confidence": 0.9299948811531067, "cells": [{"id": 70, "text": "[11]", "bbox": {"l": 50.112, "t": 484.49048, "r": 66.033806, "b": 492.50644, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and", "bbox": {"l": 68.423035, "t": 484.49048, "r": 286.35873, "b": 492.50644, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "Gordon Wilfong. Medium-independent table detection. 
In", "bbox": {"l": 70.030998, "t": 495.44946, "r": 286.36331, "b": 503.46542, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "Document Recognition and Retrieval VII", "bbox": {"l": 70.030998, "t": 506.48914, "r": 227.40926, "b": 514.2182, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": ", volume 3967,", "bbox": {"l": 227.40500000000003, "t": 506.40845, "r": 286.35913, "b": 514.4244100000001, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "pages 291-302. International Society for Optics and Photon-", "bbox": {"l": 70.031006, "t": 517.36743, "r": 286.36328, "b": 525.38339, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "ics, 1999. 2", "bbox": {"l": 70.031006, "t": 528.32642, "r": 112.36138000000001, "b": 536.34238, "coord_origin": "TOPLEFT"}}]}, "text": "[11] Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and Gordon Wilfong. Medium-independent table detection. In Document Recognition and Retrieval VII , volume 3967, pages 291-302. International Society for Optics and Photonics, 1999. 2"}, {"label": "list_item", "id": 11, "page_no": 8, "cluster": {"id": 11, "label": "list_item", "bbox": {"l": 49.559425354003906, "t": 539.2149658203125, "r": 286.9141845703125, "b": 591.44937, "coord_origin": "TOPLEFT"}, "confidence": 0.9394100904464722, "cells": [{"id": 77, "text": "[12]", "bbox": {"l": 50.112007, "t": 539.59842, "r": 65.466705, "b": 547.61438, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Matthew Hurst. A constraint-based approach to table struc-", "bbox": {"l": 67.770828, "t": 539.59842, "r": 286.35873, "b": 547.61438, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "ture derivation. 
In", "bbox": {"l": 70.031006, "t": 550.55742, "r": 136.28374, "b": 558.57338, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Proceedings of the Seventh International", "bbox": {"l": 138.811, "t": 550.63812, "r": 286.36206, "b": 558.36716, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Conference on Document Analysis and Recognition - Volume", "bbox": {"l": 70.031006, "t": 561.5971199999999, "r": 286.36334, "b": 569.32616, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "2", "bbox": {"l": 70.031006, "t": 572.55612, "r": 74.514206, "b": 580.28516, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": ", ICDAR \u201903, page 911, USA, 2003. IEEE Computer Soci-", "bbox": {"l": 74.514008, "t": 572.47542, "r": 286.36313, "b": 580.4913799999999, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "ety. 2", "bbox": {"l": 70.031006, "t": 583.4334100000001, "r": 90.357834, "b": 591.44937, "coord_origin": "TOPLEFT"}}]}, "text": "[12] Matthew Hurst. A constraint-based approach to table structure derivation. In Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2 , ICDAR \u201903, page 911, USA, 2003. IEEE Computer Society. 2"}, {"label": "list_item", "id": 12, "page_no": 8, "cluster": {"id": 12, "label": "list_item", "bbox": {"l": 49.5648307800293, "t": 593.7275390625, "r": 286.607177734375, "b": 647.1483154296875, "coord_origin": "TOPLEFT"}, "confidence": 0.9298840761184692, "cells": [{"id": 85, "text": "[13]", "bbox": {"l": 50.112007, "t": 594.70541, "r": 66.270439, "b": 602.72137, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Thotreingam Kasar, Philippine Barlas, Sebastien Adam,", "bbox": {"l": 68.695168, "t": 594.70541, "r": 286.35873, "b": 602.72137, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "Cl\u00b4ement Chatelain, and Thierry Paquet. 
Learning to detect", "bbox": {"l": 70.031006, "t": 605.66441, "r": 286.3631, "b": 613.68037, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "tables in scanned document images using line information.", "bbox": {"l": 70.031006, "t": 616.62341, "r": 286.36331, "b": 624.63937, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "In", "bbox": {"l": 70.031006, "t": 627.58241, "r": 77.500015, "b": 635.5983699999999, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "2013 12th International Conference on Document Analy-", "bbox": {"l": 79.920006, "t": 627.6631199999999, "r": 286.3624, "b": 635.39215, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "sis and Recognition", "bbox": {"l": 70.031006, "t": 638.62212, "r": 140.67728, "b": 646.35115, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": ", pages 1185-1189. IEEE, 2013. 2", "bbox": {"l": 140.67599, "t": 638.54141, "r": 264.43921, "b": 646.55737, "coord_origin": "TOPLEFT"}}]}, "text": "[13] Thotreingam Kasar, Philippine Barlas, Sebastien Adam, Cl\u00b4ement Chatelain, and Thierry Paquet. Learning to detect tables in scanned document images using line information. In 2013 12th International Conference on Document Analysis and Recognition , pages 1185-1189. IEEE, 2013. 
2"}, {"label": "list_item", "id": 13, "page_no": 8, "cluster": {"id": 13, "label": "list_item", "bbox": {"l": 49.71070861816406, "t": 649.1871337890625, "r": 286.4481201171875, "b": 680.2498168945312, "coord_origin": "TOPLEFT"}, "confidence": 0.9115259051322937, "cells": [{"id": 93, "text": "[14]", "bbox": {"l": 50.111992, "t": 649.81342, "r": 66.534035, "b": 657.82938, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank", "bbox": {"l": 68.998329, "t": 649.81342, "r": 286.35873, "b": 657.82938, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Singh.", "bbox": {"l": 70.030991, "t": 660.77142, "r": 93.200165, "b": 668.78738, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Icdar 2021 competition on scientific table image", "bbox": {"l": 102.20243, "t": 660.77142, "r": 286.36334, "b": 668.78738, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "recognition to latex, 2021. 2", "bbox": {"l": 70.030991, "t": 671.73042, "r": 171.9969, "b": 679.74638, "coord_origin": "TOPLEFT"}}]}, "text": "[14] Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank Singh. Icdar 2021 competition on scientific table image recognition to latex, 2021. 2"}, {"label": "list_item", "id": 14, "page_no": 8, "cluster": {"id": 14, "label": "list_item", "bbox": {"l": 49.51462936401367, "t": 682.48046875, "r": 286.42413330078125, "b": 712.936386, "coord_origin": "TOPLEFT"}, "confidence": 0.9122310876846313, "cells": [{"id": 98, "text": "[15]", "bbox": {"l": 50.111992, "t": 683.00243, "r": 65.515968, "b": 691.01839, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Harold W Kuhn. 
The hungarian method for the assignment", "bbox": {"l": 67.827499, "t": 683.00243, "r": 286.3587, "b": 691.01839, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "problem.", "bbox": {"l": 70.030991, "t": 693.9614260000001, "r": 102.15761, "b": 701.977386, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Naval research logistics quarterly", "bbox": {"l": 107.54999, "t": 694.0421220000001, "r": 231.47461, "b": 701.771156, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": ", 2(1-2):83-97,", "bbox": {"l": 231.47598, "t": 693.9614260000001, "r": 286.35931, "b": 701.977386, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "1955. 6", "bbox": {"l": 70.030975, "t": 704.920425, "r": 97.916481, "b": 712.936386, "coord_origin": "TOPLEFT"}}]}, "text": "[15] Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly , 2(1-2):83-97, 1955. 6"}, {"label": "list_item", "id": 15, "page_no": 8, "cluster": {"id": 15, "label": "list_item", "bbox": {"l": 308.4394836425781, "t": 74.8146743774414, "r": 545.1665649414062, "b": 138.69335999999998, "coord_origin": "TOPLEFT"}, "confidence": 0.9389601349830627, "cells": [{"id": 104, "text": "[16]", "bbox": {"l": 308.86197, "t": 75.88342000000011, "r": 324.74973, "b": 83.89940999999999, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sag-", "bbox": {"l": 327.13382, "t": 75.88342000000011, "r": 545.1087, "b": 83.89940999999999, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "nik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and", "bbox": {"l": 328.78098, "t": 86.84142999999995, "r": 545.1134, "b": 94.85741999999993, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Tamara L. 
Berg.", "bbox": {"l": 328.78098, "t": 97.80042000000003, "r": 390.96295, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Babytalk:", "bbox": {"l": 400.27008, "t": 97.80042000000003, "r": 435.1404099999999, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Understanding and generat-", "bbox": {"l": 441.71277, "t": 97.80042000000003, "r": 545.11328, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "ing simple image descriptions.", "bbox": {"l": 328.78098, "t": 108.75940000000003, "r": 440.80719, "b": 116.7753899999999, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "IEEE Transactions on Pat-", "bbox": {"l": 446.63498, "t": 108.84009000000003, "r": 545.11304, "b": 116.56914999999992, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "tern Analysis and Machine Intelligence", "bbox": {"l": 328.78098, "t": 119.79907000000003, "r": 471.13153, "b": 127.52814000000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": ", 35(12):2891-2903,", "bbox": {"l": 471.13300000000004, "t": 119.71838000000002, "r": 545.11475, "b": 127.73437999999999, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "2013. 4", "bbox": {"l": 328.78101, "t": 130.67737, "r": 356.6665, "b": 138.69335999999998, "coord_origin": "TOPLEFT"}}]}, "text": "[16] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. Babytalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence , 35(12):2891-2903, 2013. 
4"}, {"label": "list_item", "id": 16, "page_no": 8, "cluster": {"id": 16, "label": "list_item", "bbox": {"l": 308.39459228515625, "t": 141.0391845703125, "r": 545.1134, "b": 172.29119873046875, "coord_origin": "TOPLEFT"}, "confidence": 0.9253131747245789, "cells": [{"id": 115, "text": "[17]", "bbox": {"l": 308.862, "t": 142.12334999999996, "r": 325.24371, "b": 150.13933999999995, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming", "bbox": {"l": 327.70197, "t": 142.12334999999996, "r": 545.10883, "b": 150.13933999999995, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Zhou, and Zhoujun Li.", "bbox": {"l": 328.78101, "t": 153.08136000000002, "r": 414.44598, "b": 161.09735, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Tablebank: A benchmark dataset", "bbox": {"l": 421.82532, "t": 153.08136000000002, "r": 545.1134, "b": 161.09735, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "for table detection and recognition, 2019. 2, 3", "bbox": {"l": 328.78101, "t": 164.04034000000001, "r": 493.62835999999993, "b": 172.05633999999998, "coord_origin": "TOPLEFT"}}]}, "text": "[17] Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, and Zhoujun Li. Tablebank: A benchmark dataset for table detection and recognition, 2019. 2, 3"}, {"label": "list_item", "id": 17, "page_no": 8, "cluster": {"id": 17, "label": "list_item", "bbox": {"l": 308.69390869140625, "t": 174.55084228515625, "r": 545.3489990234375, "b": 260.21423000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9299733638763428, "cells": [{"id": 120, "text": "[18]", "bbox": {"l": 308.862, "t": 175.48632999999995, "r": 324.26599, "b": 183.50232000000005, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and", "bbox": {"l": 326.57751, "t": 175.48632999999995, "r": 545.10876, "b": 183.50232000000005, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Xianhui Liu. 
Gfte: Graph-based financial table extraction.", "bbox": {"l": 328.78101, "t": 186.44530999999995, "r": 545.11334, "b": 194.46130000000005, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Gio-", "bbox": {"l": 328.78101, "t": 197.40430000000003, "r": 545.11346, "b": 205.42029000000002, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "vanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair", "bbox": {"l": 328.78101, "t": 208.36328000000003, "r": 545.11353, "b": 216.37927000000002, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "Escalante, and Roberto Vezzani, editors,", "bbox": {"l": 328.78101, "t": 219.32227, "r": 479.26413, "b": 227.33826, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Pattern Recogni-", "bbox": {"l": 483.11902, "t": 219.40295000000003, "r": 545.11273, "b": 227.13202, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "tion. ICPR International Workshops and Challenges", "bbox": {"l": 328.78101, "t": 230.36095999999998, "r": 519.39771, "b": 238.09002999999996, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": ", pages", "bbox": {"l": 519.401, "t": 230.28026999999997, "r": 545.10767, "b": 238.29625999999996, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "644-658, Cham, 2021. Springer International Publishing. 2,", "bbox": {"l": 328.78101, "t": 241.23925999999994, "r": 545.11328, "b": 249.25525000000005, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "3", "bbox": {"l": 328.78101, "t": 252.19824000000006, "r": 333.26422, "b": 260.21423000000004, "coord_origin": "TOPLEFT"}}]}, "text": "[18] Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and Xianhui Liu. Gfte: Graph-based financial table extraction. In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, and Roberto Vezzani, editors, Pattern Recognition. ICPR International Workshops and Challenges , pages 644-658, Cham, 2021. Springer International Publishing. 
2, 3"}, {"label": "list_item", "id": 18, "page_no": 8, "cluster": {"id": 18, "label": "list_item", "bbox": {"l": 308.6376647949219, "t": 263.07110595703125, "r": 545.2516479492188, "b": 326.943115234375, "coord_origin": "TOPLEFT"}, "confidence": 0.9373217821121216, "cells": [{"id": 131, "text": "[19]", "bbox": {"l": 308.862, "t": 263.64423, "r": 324.26477, "b": 271.66022, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Vik-", "bbox": {"l": 326.57611, "t": 263.64423, "r": 545.10883, "b": 271.66022, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "tor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele", "bbox": {"l": 328.78101, "t": 274.60321, "r": 545.1134, "b": 282.61917000000005, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Ro-", "bbox": {"l": 328.78101, "t": 285.56219, "r": 545.11328, "b": 293.57816, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "bust pdf document conversion using recurrent neural net-", "bbox": {"l": 328.78101, "t": 296.52118, "r": 545.11334, "b": 304.53714, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "works.", "bbox": {"l": 328.78101, "t": 307.47919, "r": 352.84683, "b": 315.49515, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "Proceedings of the AAAI Conference on Artificial", "bbox": {"l": 360.23599, "t": 307.55988, "r": 545.1142, "b": 315.28894, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "Intelligence", "bbox": {"l": 328.78101, "t": 318.51886, "r": 371.02173, "b": 326.24792, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": ", 35(17):15137-15145, May 2021. 1", "bbox": {"l": 371.021, "t": 318.43817, "r": 502.26227, "b": 326.45413, "coord_origin": "TOPLEFT"}}]}, "text": "[19] Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Robust pdf document conversion using recurrent neural networks. 
Proceedings of the AAAI Conference on Artificial Intelligence , 35(17):15137-15145, May 2021. 1"}, {"label": "list_item", "id": 19, "page_no": 8, "cluster": {"id": 19, "label": "list_item", "bbox": {"l": 308.6220703125, "t": 328.9691467285156, "r": 545.3649291992188, "b": 371.1004638671875, "coord_origin": "TOPLEFT"}, "confidence": 0.902729332447052, "cells": [{"id": 140, "text": "[20]", "bbox": {"l": 308.862, "t": 329.88419, "r": 323.82672, "b": 337.90015, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang,", "bbox": {"l": 326.07233, "t": 329.88419, "r": 545.10876, "b": 337.90015, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "Yongpan Wang, and Gui-Song Xia. Parsing table structures", "bbox": {"l": 328.78101, "t": 340.8432, "r": 545.11346, "b": 348.85916, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "in the wild. In", "bbox": {"l": 328.78101, "t": 351.80219000000005, "r": 382.7767, "b": 359.81815000000006, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "Proceedings of the IEEE/CVF International", "bbox": {"l": 385.54102, "t": 351.88287, "r": 545.11609, "b": 359.61194, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "Conference on Computer Vision", "bbox": {"l": 328.78101, "t": 362.84186, "r": 443.59579, "b": 370.57092, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": ", pages 944-952, 2021. 2", "bbox": {"l": 443.59399, "t": 362.76117, "r": 534.48645, "b": 370.77713, "coord_origin": "TOPLEFT"}}]}, "text": "[20] Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, and Gui-Song Xia. Parsing table structures in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 944-952, 2021. 
2"}, {"label": "list_item", "id": 20, "page_no": 8, "cluster": {"id": 20, "label": "list_item", "bbox": {"l": 308.6834716796875, "t": 373.1802978515625, "r": 545.1424560546875, "b": 437.55621337890625, "coord_origin": "TOPLEFT"}, "confidence": 0.8827157020568848, "cells": [{"id": 147, "text": "[21]", "bbox": {"l": 308.862, "t": 374.20618, "r": 324.60281, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "Shubham", "bbox": {"l": 326.96487, "t": 374.20618, "r": 362.6604, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "Singh", "bbox": {"l": 368.69479, "t": 374.20618, "r": 389.6134, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "Paliwal,", "bbox": {"l": 395.6478, "t": 374.20618, "r": 424.56445, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "D", "bbox": {"l": 431.5492899999999, "t": 374.20618, "r": 438.0230399999999, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Vishwanath,", "bbox": {"l": 444.05743, "t": 374.20618, "r": 488.5038799999999, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "Rohit", "bbox": {"l": 495.47974, "t": 374.20618, "r": 515.41205, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "Rahul,", "bbox": {"l": 521.44641, "t": 374.20618, "r": 545.10876, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "Monika Sharma, and Lovekesh Vig. 
Tablenet: Deep learn-", "bbox": {"l": 328.78101, "t": 385.16516, "r": 545.1134, "b": 393.18112, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "ing model for end-to-end table detection and tabular data ex-", "bbox": {"l": 328.78101, "t": 396.12415, "r": 545.11346, "b": 404.14011, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "traction from scanned document images.", "bbox": {"l": 328.78101, "t": 407.08313, "r": 478.00881999999996, "b": 415.09909, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "In", "bbox": {"l": 484.0701, "t": 407.08313, "r": 491.53912, "b": 415.09909, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "2019 Interna-", "bbox": {"l": 494.668, "t": 407.16382, "r": 545.11298, "b": 414.89288, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "tional Conference on Document Analysis and Recognition", "bbox": {"l": 328.78101, "t": 418.12280000000004, "r": 545.11334, "b": 425.85187, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "(ICDAR)", "bbox": {"l": 328.78101, "t": 429.08179, "r": 360.83591, "b": 436.8108500000001, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": ", pages 128-133. IEEE, 2019. 1", "bbox": {"l": 360.836, "t": 429.0011, "r": 475.63287, "b": 437.01706, "coord_origin": "TOPLEFT"}}]}, "text": "[21] Shubham Singh Paliwal, D Vishwanath, Rohit Rahul, Monika Sharma, and Lovekesh Vig. Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 128-133. IEEE, 2019. 
1"}, {"label": "list_item", "id": 21, "page_no": 8, "cluster": {"id": 21, "label": "list_item", "bbox": {"l": 308.78057861328125, "t": 439.68524169921875, "r": 545.1746215820312, "b": 558.05096, "coord_origin": "TOPLEFT"}, "confidence": 0.8896440863609314, "cells": [{"id": 163, "text": "[22]", "bbox": {"l": 308.862, "t": 440.44611, "r": 324.57407, "b": 448.46207, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer,", "bbox": {"l": 326.93179, "t": 440.44611, "r": 545.1087, "b": 448.46207, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "James Bradbury, Gregory Chanan, Trevor Killeen, Zeming", "bbox": {"l": 328.78101, "t": 451.40509, "r": 545.11346, "b": 459.42105, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison,", "bbox": {"l": 328.78101, "t": 462.36407, "r": 545.11328, "b": 470.38004, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "Andreas Kopf, Edward Yang, Zachary DeVito, Martin Rai-", "bbox": {"l": 328.78101, "t": 473.32306, "r": 545.11328, "b": 481.33902, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "son, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner,", "bbox": {"l": 328.78101, "t": 484.28204, "r": 545.11328, "b": 492.298, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An im-", "bbox": {"l": 328.78101, "t": 495.24103, "r": 545.1134, "b": 503.25699, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "perative style, high-performance deep learning library. In H.", "bbox": {"l": 328.78101, "t": 506.20001, "r": 545.1134, "b": 514.21597, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00b4e-Buc, E.", "bbox": {"l": 328.78101, "t": 517.159, "r": 545.1098, "b": 525.17496, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "Fox, and R. 
Garnett, editors,", "bbox": {"l": 328.78101, "t": 528.117, "r": 434.56659, "b": 536.13297, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "Advances in Neural Informa-", "bbox": {"l": 437.86401, "t": 528.19769, "r": 545.11115, "b": 535.9267600000001, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "tion Processing Systems 32", "bbox": {"l": 328.78101, "t": 539.15671, "r": 425.73471, "b": 546.8857399999999, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": ", pages 8024-8035. Curran Asso-", "bbox": {"l": 425.73602, "t": 539.076, "r": 545.11475, "b": 547.09196, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "ciates, Inc., 2019. 6", "bbox": {"l": 328.78101, "t": 550.035, "r": 399.74109, "b": 558.05096, "coord_origin": "TOPLEFT"}}]}, "text": "[22] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00b4e-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32 , pages 8024-8035. Curran Associates, Inc., 2019. 6"}, {"label": "list_item", "id": 22, "page_no": 8, "cluster": {"id": 22, "label": "list_item", "bbox": {"l": 308.73394775390625, "t": 560.5276489257812, "r": 545.4642944335938, "b": 625.1737060546875, "coord_origin": "TOPLEFT"}, "confidence": 0.9025442600250244, "cells": [{"id": 177, "text": "[23]", "bbox": {"l": 308.862, "t": 561.481, "r": 324.50351, "b": 569.49696, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish", "bbox": {"l": 326.85068, "t": 561.481, "r": 545.10876, "b": 569.49696, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "Visave, and Kavita Sultanpure. 
Cascadetabnet: An approach", "bbox": {"l": 328.78101, "t": 572.44, "r": 545.1134, "b": 580.45596, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "for end to end table detection and structure recognition from", "bbox": {"l": 328.78101, "t": 583.399, "r": 545.11334, "b": 591.4149600000001, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "image-based documents. In", "bbox": {"l": 328.78101, "t": 594.358, "r": 431.61667, "b": 602.37396, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "Proceedings of the IEEE/CVF", "bbox": {"l": 434.69101000000006, "t": 594.4387099999999, "r": 545.11224, "b": 602.16774, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "Conference on Computer Vision and Pattern Recognition", "bbox": {"l": 328.78101, "t": 605.39671, "r": 545.1134, "b": 613.12575, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "Workshops", "bbox": {"l": 328.78101, "t": 616.35571, "r": 367.8028, "b": 624.08475, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": ", pages 572-573, 2020. 1", "bbox": {"l": 367.802, "t": 616.2750100000001, "r": 458.69446000000005, "b": 624.29097, "coord_origin": "TOPLEFT"}}]}, "text": "[23] Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure. Cascadetabnet: An approach for end to end table detection and structure recognition from image-based documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops , pages 572-573, 2020. 
1"}, {"label": "list_item", "id": 23, "page_no": 8, "cluster": {"id": 23, "label": "list_item", "bbox": {"l": 308.49481201171875, "t": 627.0421752929688, "r": 545.31982421875, "b": 669.146484375, "coord_origin": "TOPLEFT"}, "confidence": 0.8777534365653992, "cells": [{"id": 186, "text": "[24]", "bbox": {"l": 308.862, "t": 627.72101, "r": 324.69476, "b": 635.73697, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait.", "bbox": {"l": 327.07065, "t": 627.72101, "r": 545.1087, "b": 635.73697, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "Rethinking table recognition using graph neural networks.", "bbox": {"l": 328.78101, "t": 638.68001, "r": 545.11328, "b": 646.69597, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "In", "bbox": {"l": 328.78101, "t": 649.63901, "r": 336.25003, "b": 657.65497, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 338.10001, "t": 649.71971, "r": 545.11621, "b": 657.44875, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "Recognition (ICDAR)", "bbox": {"l": 328.78101, "t": 660.67871, "r": 406.32245, "b": 668.40775, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": ", pages 142-147. IEEE, 2019. 3", "bbox": {"l": 406.32202, "t": 660.5980099999999, "r": 521.1189, "b": 668.61398, "coord_origin": "TOPLEFT"}}]}, "text": "[24] Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. Rethinking table recognition using graph neural networks. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 142-147. IEEE, 2019. 
3"}, {"label": "list_item", "id": 24, "page_no": 8, "cluster": {"id": 24, "label": "list_item", "bbox": {"l": 308.78839111328125, "t": 671.11767578125, "r": 545.2333374023438, "b": 713.0277709960938, "coord_origin": "TOPLEFT"}, "confidence": 0.8654534220695496, "cells": [{"id": 193, "text": "[25]", "bbox": {"l": 308.86203, "t": 672.04301, "r": 324.71329, "b": 680.05898, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir", "bbox": {"l": 327.09195, "t": 672.04301, "r": 545.10876, "b": 680.05898, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "Sadeghian, Ian Reid, and Silvio Savarese.", "bbox": {"l": 328.78104, "t": 683.0020099999999, "r": 482.81488, "b": 691.01797, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "Generalized in-", "bbox": {"l": 488.75064, "t": 683.0020099999999, "r": 545.1134, "b": 691.01797, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "tersection over union: A metric and a loss for bounding box", "bbox": {"l": 328.78104, "t": 693.961014, "r": 545.11334, "b": 701.976974, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "regression. In", "bbox": {"l": 328.78104, "t": 704.920013, "r": 379.1543, "b": 712.935974, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "Proceedings of the IEEE/CVF Conference on", "bbox": {"l": 381.61603, "t": 705.00071, "r": 545.10938, "b": 712.729744, "coord_origin": "TOPLEFT"}}]}, "text": "[25] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. 
In Proceedings of the IEEE/CVF Conference on"}, {"label": "page_footer", "id": 25, "page_no": 8, "cluster": {"id": 25, "label": "page_footer", "bbox": {"l": 294.42181396484375, "t": 733.50244140625, "r": 300.25152587890625, "b": 743.0391500000001, "coord_origin": "TOPLEFT"}, "confidence": 0.8797808885574341, "cells": [{"id": 200, "text": "9", "bbox": {"l": 295.12103, "t": 734.1325870000001, "r": 300.10233, "b": 743.0391500000001, "coord_origin": "TOPLEFT"}}]}, "text": "9"}], "body": [{"label": "list_item", "id": 0, "page_no": 8, "cluster": {"id": 0, "label": "list_item", "bbox": {"l": 69.20614624023438, "t": 74.8327865600586, "r": 286.36334, "b": 116.86038208007812, "coord_origin": "TOPLEFT"}, "confidence": 0.7310391664505005, "cells": [{"id": 0, "text": "end object detection with transformers. In Andrea Vedaldi,", "bbox": {"l": 70.030998, "t": 75.88378999999998, "r": 286.36334, "b": 83.89977999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Horst Bischof, Thomas Brox, and Jan-Michael Frahm, edi-", "bbox": {"l": 70.030998, "t": 86.84276999999997, "r": 286.36331, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "tors,", "bbox": {"l": 70.030998, "t": 97.80078000000003, "r": 85.722198, "b": 105.81677000000002, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Computer Vision - ECCV 2020", "bbox": {"l": 87.889, "t": 97.88147000000004, "r": 199.93315, "b": 105.61053000000004, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": ", pages 213-229, Cham,", "bbox": {"l": 199.936, "t": 97.80078000000003, "r": 286.36313, "b": 105.81677000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "2020. Springer International Publishing. 5", "bbox": {"l": 70.031006, "t": 108.75977, "r": 221.94871999999998, "b": 116.77575999999999, "coord_origin": "TOPLEFT"}}]}, "text": "end object detection with transformers. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision - ECCV 2020 , pages 213-229, Cham, 2020. 
Springer International Publishing. 5"}, {"label": "list_item", "id": 1, "page_no": 8, "cluster": {"id": 1, "label": "list_item", "bbox": {"l": 54.220462799072266, "t": 118.9831771850586, "r": 286.4865417480469, "b": 150.44512939453125, "coord_origin": "TOPLEFT"}, "confidence": 0.937275767326355, "cells": [{"id": 6, "text": "[2]", "bbox": {"l": 54.595005, "t": 120.03174000000013, "r": 65.206657, "b": 128.04773, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanx-", "bbox": {"l": 67.481873, "t": 120.03174000000013, "r": 286.35852, "b": 128.04773, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "uan Yin, and Xian-Ling Mao.", "bbox": {"l": 70.031006, "t": 130.99072, "r": 179.67215, "b": 139.00671, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Complicated table structure", "bbox": {"l": 185.58101, "t": 130.99072, "r": 286.36334, "b": 139.00671, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "recognition.", "bbox": {"l": 70.031006, "t": 141.94970999999998, "r": 113.11456, "b": 149.96569999999997, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "arXiv preprint arXiv:1908.04729", "bbox": {"l": 116.34200999999999, "t": 142.0304, "r": 235.3082, "b": 149.75946, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": ", 2019. 3", "bbox": {"l": 235.30701, "t": 141.94970999999998, "r": 267.67572, "b": 149.96569999999997, "coord_origin": "TOPLEFT"}}]}, "text": "[2] Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, and Xian-Ling Mao. Complicated table structure recognition. arXiv preprint arXiv:1908.04729 , 2019. 
3"}, {"label": "list_item", "id": 2, "page_no": 8, "cluster": {"id": 2, "label": "list_item", "bbox": {"l": 54.171104431152344, "t": 152.46226501464844, "r": 286.6875, "b": 183.15466000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9378376603126526, "cells": [{"id": 13, "text": "[3]", "bbox": {"l": 54.595001, "t": 153.22168, "r": 65.103195, "b": 161.23766999999998, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "Bertrand Couasnon and Aurelie Lemaitre.", "bbox": {"l": 67.356239, "t": 153.22168, "r": 218.77876, "b": 161.23766999999998, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "Recognition of Ta-", "bbox": {"l": 220.97999999999996, "t": 153.30237, "r": 286.36301, "b": 161.03143, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "bles and Forms", "bbox": {"l": 70.030991, "t": 164.26135, "r": 125.26401000000001, "b": 171.99041999999997, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": ", pages 647-677. Springer London, London,", "bbox": {"l": 125.26098999999999, "t": 164.18066, "r": 286.36029, "b": 172.19665999999995, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "2014. 2", "bbox": {"l": 70.030991, "t": 175.13867000000005, "r": 97.916496, "b": 183.15466000000004, "coord_origin": "TOPLEFT"}}]}, "text": "[3] Bertrand Couasnon and Aurelie Lemaitre. Recognition of Tables and Forms , pages 647-677. Springer London, London, 2014. 
2"}, {"label": "list_item", "id": 3, "page_no": 8, "cluster": {"id": 3, "label": "list_item", "bbox": {"l": 54.094303131103516, "t": 185.52200317382812, "r": 286.5216979980469, "b": 227.71087646484375, "coord_origin": "TOPLEFT"}, "confidence": 0.9648825526237488, "cells": [{"id": 19, "text": "[4]", "bbox": {"l": 54.59499, "t": 186.41063999999994, "r": 65.806984, "b": 194.42664000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Herv\u00b4e D\u00b4ejean, Jean-Luc Meunier, Liangcai Gao, Yilun", "bbox": {"l": 68.210922, "t": 186.41063999999994, "r": 286.36401, "b": 194.42664000000002, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. IC-", "bbox": {"l": 70.030983, "t": 197.36963000000003, "r": 286.36331, "b": 205.38562000000002, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "DAR 2019 Competition on Table Detection and Recognition", "bbox": {"l": 70.030983, "t": 208.32861000000003, "r": 286.36334, "b": 216.3446, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "(cTDaR), Apr. 2019. http://sac.founderit.com/. 2", "bbox": {"l": 70.030983, "t": 219.2876, "r": 245.83519, "b": 227.30358999999999, "coord_origin": "TOPLEFT"}}]}, "text": "[4] Herv\u00b4e D\u00b4ejean, Jean-Luc Meunier, Liangcai Gao, Yilun Huang, Yu Fang, Florian Kleber, and Eva-Maria Lang. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR), Apr. 2019. http://sac.founderit.com/. 
2"}, {"label": "list_item", "id": 4, "page_no": 8, "cluster": {"id": 4, "label": "list_item", "bbox": {"l": 54.09014129638672, "t": 229.7616424560547, "r": 286.36334, "b": 271.5709228515625, "coord_origin": "TOPLEFT"}, "confidence": 0.9620944261550903, "cells": [{"id": 24, "text": "[5]", "bbox": {"l": 54.594982, "t": 230.55957, "r": 65.381134, "b": 238.57556, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and", "bbox": {"l": 67.693779, "t": 230.55957, "r": 286.35849, "b": 238.57556, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "Stavros J Perantonis. Automatic table detection in document", "bbox": {"l": 70.030983, "t": 241.51855, "r": 286.36334, "b": 249.53454999999997, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "images. In", "bbox": {"l": 70.030983, "t": 252.47655999999995, "r": 108.39821, "b": 260.49255000000005, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "International Conference on Pattern Recognition", "bbox": {"l": 110.64498000000002, "t": 252.55724999999995, "r": 286.3595, "b": 260.28632000000005, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "and Image Analysis", "bbox": {"l": 70.030983, "t": 263.51624000000004, "r": 140.57861, "b": 271.24530000000004, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": ", pages 609-618. Springer, 2005. 2", "bbox": {"l": 140.57797, "t": 263.43555000000003, "r": 266.47522, "b": 271.45154, "coord_origin": "TOPLEFT"}}]}, "text": "[5] Basilios Gatos, Dimitrios Danatsas, Ioannis Pratikakis, and Stavros J Perantonis. Automatic table detection in document images. In International Conference on Pattern Recognition and Image Analysis , pages 609-618. Springer, 2005. 
2"}, {"label": "list_item", "id": 5, "page_no": 8, "cluster": {"id": 5, "label": "list_item", "bbox": {"l": 54.06441879272461, "t": 273.7647399902344, "r": 286.9118347167969, "b": 315.6004899999999, "coord_origin": "TOPLEFT"}, "confidence": 0.9555517435073853, "cells": [{"id": 31, "text": "[6]", "bbox": {"l": 54.594971, "t": 274.70758, "r": 64.848648, "b": 282.72351, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Max G\u00a8obel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi.", "bbox": {"l": 67.047119, "t": 274.70758, "r": 286.36676, "b": 282.72351, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Icdar 2013 table competition.", "bbox": {"l": 70.030975, "t": 285.66655999999995, "r": 179.57349, "b": 293.68253, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "In", "bbox": {"l": 187.01559, "t": 285.66655999999995, "r": 194.4846, "b": 293.68253, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "2013 12th International", "bbox": {"l": 198.04398, "t": 285.74725, "r": 286.36304, "b": 293.47632, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Conference on Document Analysis and Recognition", "bbox": {"l": 70.030975, "t": 296.70624, "r": 260.19937, "b": 304.43530000000004, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": ", pages", "bbox": {"l": 260.198, "t": 296.62555, "r": 286.36197, "b": 304.64151, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "1449-1453, 2013. 2", "bbox": {"l": 70.030991, "t": 307.5845299999999, "r": 142.74849, "b": 315.6004899999999, "coord_origin": "TOPLEFT"}}]}, "text": "[6] Max G\u00a8obel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. Icdar 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition , pages 1449-1453, 2013. 
2"}, {"label": "list_item", "id": 6, "page_no": 8, "cluster": {"id": 6, "label": "list_item", "bbox": {"l": 54.08487319946289, "t": 317.6698913574219, "r": 286.5190734863281, "b": 348.78952, "coord_origin": "TOPLEFT"}, "confidence": 0.9479843378067017, "cells": [{"id": 39, "text": "[7]", "bbox": {"l": 54.59499, "t": 318.85654, "r": 65.61586, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "EA Green and M Krishnamoorthy.", "bbox": {"l": 67.978821, "t": 318.85654, "r": 199.492, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Recognition of tables", "bbox": {"l": 206.98792, "t": 318.85654, "r": 286.35849, "b": 326.8725, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "using table grammars. procs.", "bbox": {"l": 70.030991, "t": 329.8145400000001, "r": 176.28284, "b": 337.83051, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "In", "bbox": {"l": 182.60416, "t": 329.8145400000001, "r": 190.07317, "b": 337.83051, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Symposium on Document", "bbox": {"l": 193.28299, "t": 329.89522999999997, "r": 286.36319, "b": 337.62429999999995, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Analysis and Recognition (SDAIR\u201995)", "bbox": {"l": 70.030991, "t": 340.85425, "r": 206.34717, "b": 348.58331, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": ", pages 261-277. 2", "bbox": {"l": 206.34599, "t": 340.77356, "r": 274.82239, "b": 348.78952, "coord_origin": "TOPLEFT"}}]}, "text": "[7] EA Green and M Krishnamoorthy. Recognition of tables using table grammars. procs. In Symposium on Document Analysis and Recognition (SDAIR\u201995) , pages 261-277. 
2"}, {"label": "list_item", "id": 7, "page_no": 8, "cluster": {"id": 7, "label": "list_item", "bbox": {"l": 54.01877212524414, "t": 351.1116027832031, "r": 286.37677001953125, "b": 404.01751708984375, "coord_origin": "TOPLEFT"}, "confidence": 0.948442816734314, "cells": [{"id": 47, "text": "[8]", "bbox": {"l": 54.594986000000006, "t": 352.0455600000001, "r": 65.04657, "b": 360.06152, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Di-", "bbox": {"l": 67.287483, "t": 352.0455600000001, "r": 286.35849, "b": 360.06152, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "dier Stricker, and Muhammad Zeshan Afzal.", "bbox": {"l": 70.030983, "t": 363.00458, "r": 234.12507999999997, "b": 371.02054, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Castabdetec-", "bbox": {"l": 240.05186, "t": 363.00458, "r": 286.36331, "b": 371.02054, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "tors: Cascade network for table detection in document im-", "bbox": {"l": 70.030983, "t": 373.96356, "r": 286.36331, "b": 381.97952, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "ages with recursive feature pyramid and switchable atrous", "bbox": {"l": 70.030983, "t": 384.92255, "r": 286.36331, "b": 392.93851, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "convolution.", "bbox": {"l": 70.030983, "t": 395.88153, "r": 114.57605, "b": 403.89749, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "Journal of Imaging", "bbox": {"l": 117.80399000000001, "t": 395.96222, "r": 186.7287, "b": 403.69128, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": ", 7(10), 2021. 1", "bbox": {"l": 186.728, "t": 395.88153, "r": 243.00113999999996, "b": 403.89749, "coord_origin": "TOPLEFT"}}]}, "text": "[8] Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. Castabdetectors: Cascade network for table detection in document images with recursive feature pyramid and switchable atrous convolution. Journal of Imaging , 7(10), 2021. 
1"}, {"label": "list_item", "id": 8, "page_no": 8, "cluster": {"id": 8, "label": "list_item", "bbox": {"l": 53.796630859375, "t": 406.2373352050781, "r": 286.63372802734375, "b": 437.5993957519531, "coord_origin": "TOPLEFT"}, "confidence": 0.9330759048461914, "cells": [{"id": 56, "text": "[9]", "bbox": {"l": 54.595001, "t": 407.15253000000007, "r": 65.334427, "b": 415.1684900000001, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Gir-", "bbox": {"l": 67.637054, "t": 407.15253000000007, "r": 286.35852, "b": 415.1684900000001, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "shick. Mask r-cnn. In", "bbox": {"l": 70.030998, "t": 418.11151, "r": 147.13306, "b": 426.12747, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Proceedings of the IEEE International", "bbox": {"l": 149.15601, "t": 418.1922, "r": 286.35989, "b": 425.92126, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "Conference on Computer Vision (ICCV)", "bbox": {"l": 70.031006, "t": 429.15118, "r": 213.48445, "b": 436.88025, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": ", Oct 2017. 1", "bbox": {"l": 213.483, "t": 429.07050000000004, "r": 261.04083, "b": 437.08646000000005, "coord_origin": "TOPLEFT"}}]}, "text": "[9] Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) , Oct 2017. 1"}, {"label": "list_item", "id": 9, "page_no": 8, "cluster": {"id": 9, "label": "list_item", "bbox": {"l": 49.86316680908203, "t": 439.81085205078125, "r": 286.36334, "b": 481.95904541015625, "coord_origin": "TOPLEFT"}, "confidence": 0.9274739027023315, "cells": [{"id": 62, "text": "[10]", "bbox": {"l": 50.112, "t": 440.3424999999999, "r": 65.399307, "b": 448.3584599999999, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "Yelin He, X. 
Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bing-", "bbox": {"l": 67.693321, "t": 440.3424999999999, "r": 286.3587, "b": 448.3584599999999, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "cong Li, Xin Tang, and Rong Xiao.", "bbox": {"l": 70.030998, "t": 451.30151, "r": 202.74268, "b": 459.31747, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "Pingan-vcgroup\u2019s so-", "bbox": {"l": 209.00122, "t": 451.30151, "r": 286.36331, "b": 459.31747, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "lution for icdar 2021 competition on scientific table image", "bbox": {"l": 70.030998, "t": 462.2605, "r": 286.36334, "b": 470.27646, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "recognition to latex.", "bbox": {"l": 70.030998, "t": 473.21948, "r": 141.86981, "b": 481.23544, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "ArXiv", "bbox": {"l": 145.097, "t": 473.30017, "r": 166.01561, "b": 481.02924, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": ", abs/2105.01846, 2021. 2", "bbox": {"l": 166.015, "t": 473.21948, "r": 259.90216, "b": 481.23544, "coord_origin": "TOPLEFT"}}]}, "text": "[10] Yelin He, X. Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, and Rong Xiao. Pingan-vcgroup\u2019s solution for icdar 2021 competition on scientific table image recognition to latex. ArXiv , abs/2105.01846, 2021. 2"}, {"label": "list_item", "id": 10, "page_no": 8, "cluster": {"id": 10, "label": "list_item", "bbox": {"l": 49.55924987792969, "t": 483.82781982421875, "r": 286.4127197265625, "b": 536.34238, "coord_origin": "TOPLEFT"}, "confidence": 0.9299948811531067, "cells": [{"id": 70, "text": "[11]", "bbox": {"l": 50.112, "t": 484.49048, "r": 66.033806, "b": 492.50644, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and", "bbox": {"l": 68.423035, "t": 484.49048, "r": 286.35873, "b": 492.50644, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "Gordon Wilfong. Medium-independent table detection. 
In", "bbox": {"l": 70.030998, "t": 495.44946, "r": 286.36331, "b": 503.46542, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "Document Recognition and Retrieval VII", "bbox": {"l": 70.030998, "t": 506.48914, "r": 227.40926, "b": 514.2182, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": ", volume 3967,", "bbox": {"l": 227.40500000000003, "t": 506.40845, "r": 286.35913, "b": 514.4244100000001, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "pages 291-302. International Society for Optics and Photon-", "bbox": {"l": 70.031006, "t": 517.36743, "r": 286.36328, "b": 525.38339, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "ics, 1999. 2", "bbox": {"l": 70.031006, "t": 528.32642, "r": 112.36138000000001, "b": 536.34238, "coord_origin": "TOPLEFT"}}]}, "text": "[11] Jianying Hu, Ramanujan S Kashi, Daniel P Lopresti, and Gordon Wilfong. Medium-independent table detection. In Document Recognition and Retrieval VII , volume 3967, pages 291-302. International Society for Optics and Photonics, 1999. 2"}, {"label": "list_item", "id": 11, "page_no": 8, "cluster": {"id": 11, "label": "list_item", "bbox": {"l": 49.559425354003906, "t": 539.2149658203125, "r": 286.9141845703125, "b": 591.44937, "coord_origin": "TOPLEFT"}, "confidence": 0.9394100904464722, "cells": [{"id": 77, "text": "[12]", "bbox": {"l": 50.112007, "t": 539.59842, "r": 65.466705, "b": 547.61438, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "Matthew Hurst. A constraint-based approach to table struc-", "bbox": {"l": 67.770828, "t": 539.59842, "r": 286.35873, "b": 547.61438, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "ture derivation. 
In", "bbox": {"l": 70.031006, "t": 550.55742, "r": 136.28374, "b": 558.57338, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Proceedings of the Seventh International", "bbox": {"l": 138.811, "t": 550.63812, "r": 286.36206, "b": 558.36716, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "Conference on Document Analysis and Recognition - Volume", "bbox": {"l": 70.031006, "t": 561.5971199999999, "r": 286.36334, "b": 569.32616, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "2", "bbox": {"l": 70.031006, "t": 572.55612, "r": 74.514206, "b": 580.28516, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": ", ICDAR \u201903, page 911, USA, 2003. IEEE Computer Soci-", "bbox": {"l": 74.514008, "t": 572.47542, "r": 286.36313, "b": 580.4913799999999, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "ety. 2", "bbox": {"l": 70.031006, "t": 583.4334100000001, "r": 90.357834, "b": 591.44937, "coord_origin": "TOPLEFT"}}]}, "text": "[12] Matthew Hurst. A constraint-based approach to table structure derivation. In Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2 , ICDAR \u201903, page 911, USA, 2003. IEEE Computer Society. 2"}, {"label": "list_item", "id": 12, "page_no": 8, "cluster": {"id": 12, "label": "list_item", "bbox": {"l": 49.5648307800293, "t": 593.7275390625, "r": 286.607177734375, "b": 647.1483154296875, "coord_origin": "TOPLEFT"}, "confidence": 0.9298840761184692, "cells": [{"id": 85, "text": "[13]", "bbox": {"l": 50.112007, "t": 594.70541, "r": 66.270439, "b": 602.72137, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Thotreingam Kasar, Philippine Barlas, Sebastien Adam,", "bbox": {"l": 68.695168, "t": 594.70541, "r": 286.35873, "b": 602.72137, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "Cl\u00b4ement Chatelain, and Thierry Paquet. 
Learning to detect", "bbox": {"l": 70.031006, "t": 605.66441, "r": 286.3631, "b": 613.68037, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "tables in scanned document images using line information.", "bbox": {"l": 70.031006, "t": 616.62341, "r": 286.36331, "b": 624.63937, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "In", "bbox": {"l": 70.031006, "t": 627.58241, "r": 77.500015, "b": 635.5983699999999, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "2013 12th International Conference on Document Analy-", "bbox": {"l": 79.920006, "t": 627.6631199999999, "r": 286.3624, "b": 635.39215, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "sis and Recognition", "bbox": {"l": 70.031006, "t": 638.62212, "r": 140.67728, "b": 646.35115, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": ", pages 1185-1189. IEEE, 2013. 2", "bbox": {"l": 140.67599, "t": 638.54141, "r": 264.43921, "b": 646.55737, "coord_origin": "TOPLEFT"}}]}, "text": "[13] Thotreingam Kasar, Philippine Barlas, Sebastien Adam, Cl\u00b4ement Chatelain, and Thierry Paquet. Learning to detect tables in scanned document images using line information. In 2013 12th International Conference on Document Analysis and Recognition , pages 1185-1189. IEEE, 2013. 
2"}, {"label": "list_item", "id": 13, "page_no": 8, "cluster": {"id": 13, "label": "list_item", "bbox": {"l": 49.71070861816406, "t": 649.1871337890625, "r": 286.4481201171875, "b": 680.2498168945312, "coord_origin": "TOPLEFT"}, "confidence": 0.9115259051322937, "cells": [{"id": 93, "text": "[14]", "bbox": {"l": 50.111992, "t": 649.81342, "r": 66.534035, "b": 657.82938, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank", "bbox": {"l": 68.998329, "t": 649.81342, "r": 286.35873, "b": 657.82938, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Singh.", "bbox": {"l": 70.030991, "t": 660.77142, "r": 93.200165, "b": 668.78738, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "Icdar 2021 competition on scientific table image", "bbox": {"l": 102.20243, "t": 660.77142, "r": 286.36334, "b": 668.78738, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "recognition to latex, 2021. 2", "bbox": {"l": 70.030991, "t": 671.73042, "r": 171.9969, "b": 679.74638, "coord_origin": "TOPLEFT"}}]}, "text": "[14] Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank Singh. Icdar 2021 competition on scientific table image recognition to latex, 2021. 2"}, {"label": "list_item", "id": 14, "page_no": 8, "cluster": {"id": 14, "label": "list_item", "bbox": {"l": 49.51462936401367, "t": 682.48046875, "r": 286.42413330078125, "b": 712.936386, "coord_origin": "TOPLEFT"}, "confidence": 0.9122310876846313, "cells": [{"id": 98, "text": "[15]", "bbox": {"l": 50.111992, "t": 683.00243, "r": 65.515968, "b": 691.01839, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Harold W Kuhn. 
The hungarian method for the assignment", "bbox": {"l": 67.827499, "t": 683.00243, "r": 286.3587, "b": 691.01839, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "problem.", "bbox": {"l": 70.030991, "t": 693.9614260000001, "r": 102.15761, "b": 701.977386, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "Naval research logistics quarterly", "bbox": {"l": 107.54999, "t": 694.0421220000001, "r": 231.47461, "b": 701.771156, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": ", 2(1-2):83-97,", "bbox": {"l": 231.47598, "t": 693.9614260000001, "r": 286.35931, "b": 701.977386, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "1955. 6", "bbox": {"l": 70.030975, "t": 704.920425, "r": 97.916481, "b": 712.936386, "coord_origin": "TOPLEFT"}}]}, "text": "[15] Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly , 2(1-2):83-97, 1955. 6"}, {"label": "list_item", "id": 15, "page_no": 8, "cluster": {"id": 15, "label": "list_item", "bbox": {"l": 308.4394836425781, "t": 74.8146743774414, "r": 545.1665649414062, "b": 138.69335999999998, "coord_origin": "TOPLEFT"}, "confidence": 0.9389601349830627, "cells": [{"id": 104, "text": "[16]", "bbox": {"l": 308.86197, "t": 75.88342000000011, "r": 324.74973, "b": 83.89940999999999, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sag-", "bbox": {"l": 327.13382, "t": 75.88342000000011, "r": 545.1087, "b": 83.89940999999999, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "nik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and", "bbox": {"l": 328.78098, "t": 86.84142999999995, "r": 545.1134, "b": 94.85741999999993, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Tamara L. 
Berg.", "bbox": {"l": 328.78098, "t": 97.80042000000003, "r": 390.96295, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Babytalk:", "bbox": {"l": 400.27008, "t": 97.80042000000003, "r": 435.1404099999999, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Understanding and generat-", "bbox": {"l": 441.71277, "t": 97.80042000000003, "r": 545.11328, "b": 105.81641000000002, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "ing simple image descriptions.", "bbox": {"l": 328.78098, "t": 108.75940000000003, "r": 440.80719, "b": 116.7753899999999, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "IEEE Transactions on Pat-", "bbox": {"l": 446.63498, "t": 108.84009000000003, "r": 545.11304, "b": 116.56914999999992, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "tern Analysis and Machine Intelligence", "bbox": {"l": 328.78098, "t": 119.79907000000003, "r": 471.13153, "b": 127.52814000000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": ", 35(12):2891-2903,", "bbox": {"l": 471.13300000000004, "t": 119.71838000000002, "r": 545.11475, "b": 127.73437999999999, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "2013. 4", "bbox": {"l": 328.78101, "t": 130.67737, "r": 356.6665, "b": 138.69335999999998, "coord_origin": "TOPLEFT"}}]}, "text": "[16] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. Babytalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence , 35(12):2891-2903, 2013. 
4"}, {"label": "list_item", "id": 16, "page_no": 8, "cluster": {"id": 16, "label": "list_item", "bbox": {"l": 308.39459228515625, "t": 141.0391845703125, "r": 545.1134, "b": 172.29119873046875, "coord_origin": "TOPLEFT"}, "confidence": 0.9253131747245789, "cells": [{"id": 115, "text": "[17]", "bbox": {"l": 308.862, "t": 142.12334999999996, "r": 325.24371, "b": 150.13933999999995, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming", "bbox": {"l": 327.70197, "t": 142.12334999999996, "r": 545.10883, "b": 150.13933999999995, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Zhou, and Zhoujun Li.", "bbox": {"l": 328.78101, "t": 153.08136000000002, "r": 414.44598, "b": 161.09735, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "Tablebank: A benchmark dataset", "bbox": {"l": 421.82532, "t": 153.08136000000002, "r": 545.1134, "b": 161.09735, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "for table detection and recognition, 2019. 2, 3", "bbox": {"l": 328.78101, "t": 164.04034000000001, "r": 493.62835999999993, "b": 172.05633999999998, "coord_origin": "TOPLEFT"}}]}, "text": "[17] Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, and Zhoujun Li. Tablebank: A benchmark dataset for table detection and recognition, 2019. 2, 3"}, {"label": "list_item", "id": 17, "page_no": 8, "cluster": {"id": 17, "label": "list_item", "bbox": {"l": 308.69390869140625, "t": 174.55084228515625, "r": 545.3489990234375, "b": 260.21423000000004, "coord_origin": "TOPLEFT"}, "confidence": 0.9299733638763428, "cells": [{"id": 120, "text": "[18]", "bbox": {"l": 308.862, "t": 175.48632999999995, "r": 324.26599, "b": 183.50232000000005, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and", "bbox": {"l": 326.57751, "t": 175.48632999999995, "r": 545.10876, "b": 183.50232000000005, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Xianhui Liu. 
Gfte: Graph-based financial table extraction.", "bbox": {"l": 328.78101, "t": 186.44530999999995, "r": 545.11334, "b": 194.46130000000005, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Gio-", "bbox": {"l": 328.78101, "t": 197.40430000000003, "r": 545.11346, "b": 205.42029000000002, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "vanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair", "bbox": {"l": 328.78101, "t": 208.36328000000003, "r": 545.11353, "b": 216.37927000000002, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "Escalante, and Roberto Vezzani, editors,", "bbox": {"l": 328.78101, "t": 219.32227, "r": 479.26413, "b": 227.33826, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Pattern Recogni-", "bbox": {"l": 483.11902, "t": 219.40295000000003, "r": 545.11273, "b": 227.13202, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "tion. ICPR International Workshops and Challenges", "bbox": {"l": 328.78101, "t": 230.36095999999998, "r": 519.39771, "b": 238.09002999999996, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": ", pages", "bbox": {"l": 519.401, "t": 230.28026999999997, "r": 545.10767, "b": 238.29625999999996, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "644-658, Cham, 2021. Springer International Publishing. 2,", "bbox": {"l": 328.78101, "t": 241.23925999999994, "r": 545.11328, "b": 249.25525000000005, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "3", "bbox": {"l": 328.78101, "t": 252.19824000000006, "r": 333.26422, "b": 260.21423000000004, "coord_origin": "TOPLEFT"}}]}, "text": "[18] Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and Xianhui Liu. Gfte: Graph-based financial table extraction. In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, and Roberto Vezzani, editors, Pattern Recognition. ICPR International Workshops and Challenges , pages 644-658, Cham, 2021. Springer International Publishing. 
2, 3"}, {"label": "list_item", "id": 18, "page_no": 8, "cluster": {"id": 18, "label": "list_item", "bbox": {"l": 308.6376647949219, "t": 263.07110595703125, "r": 545.2516479492188, "b": 326.943115234375, "coord_origin": "TOPLEFT"}, "confidence": 0.9373217821121216, "cells": [{"id": 131, "text": "[19]", "bbox": {"l": 308.862, "t": 263.64423, "r": 324.26477, "b": 271.66022, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Vik-", "bbox": {"l": 326.57611, "t": 263.64423, "r": 545.10883, "b": 271.66022, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "tor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele", "bbox": {"l": 328.78101, "t": 274.60321, "r": 545.1134, "b": 282.61917000000005, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Ro-", "bbox": {"l": 328.78101, "t": 285.56219, "r": 545.11328, "b": 293.57816, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "bust pdf document conversion using recurrent neural net-", "bbox": {"l": 328.78101, "t": 296.52118, "r": 545.11334, "b": 304.53714, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "works.", "bbox": {"l": 328.78101, "t": 307.47919, "r": 352.84683, "b": 315.49515, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "Proceedings of the AAAI Conference on Artificial", "bbox": {"l": 360.23599, "t": 307.55988, "r": 545.1142, "b": 315.28894, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "Intelligence", "bbox": {"l": 328.78101, "t": 318.51886, "r": 371.02173, "b": 326.24792, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": ", 35(17):15137-15145, May 2021. 1", "bbox": {"l": 371.021, "t": 318.43817, "r": 502.26227, "b": 326.45413, "coord_origin": "TOPLEFT"}}]}, "text": "[19] Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, and Peter Staar. Robust pdf document conversion using recurrent neural networks. 
Proceedings of the AAAI Conference on Artificial Intelligence , 35(17):15137-15145, May 2021. 1"}, {"label": "list_item", "id": 19, "page_no": 8, "cluster": {"id": 19, "label": "list_item", "bbox": {"l": 308.6220703125, "t": 328.9691467285156, "r": 545.3649291992188, "b": 371.1004638671875, "coord_origin": "TOPLEFT"}, "confidence": 0.902729332447052, "cells": [{"id": 140, "text": "[20]", "bbox": {"l": 308.862, "t": 329.88419, "r": 323.82672, "b": 337.90015, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang,", "bbox": {"l": 326.07233, "t": 329.88419, "r": 545.10876, "b": 337.90015, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "Yongpan Wang, and Gui-Song Xia. Parsing table structures", "bbox": {"l": 328.78101, "t": 340.8432, "r": 545.11346, "b": 348.85916, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "in the wild. In", "bbox": {"l": 328.78101, "t": 351.80219000000005, "r": 382.7767, "b": 359.81815000000006, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "Proceedings of the IEEE/CVF International", "bbox": {"l": 385.54102, "t": 351.88287, "r": 545.11609, "b": 359.61194, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "Conference on Computer Vision", "bbox": {"l": 328.78101, "t": 362.84186, "r": 443.59579, "b": 370.57092, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": ", pages 944-952, 2021. 2", "bbox": {"l": 443.59399, "t": 362.76117, "r": 534.48645, "b": 370.77713, "coord_origin": "TOPLEFT"}}]}, "text": "[20] Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, and Gui-Song Xia. Parsing table structures in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 944-952, 2021. 
2"}, {"label": "list_item", "id": 20, "page_no": 8, "cluster": {"id": 20, "label": "list_item", "bbox": {"l": 308.6834716796875, "t": 373.1802978515625, "r": 545.1424560546875, "b": 437.55621337890625, "coord_origin": "TOPLEFT"}, "confidence": 0.8827157020568848, "cells": [{"id": 147, "text": "[21]", "bbox": {"l": 308.862, "t": 374.20618, "r": 324.60281, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "Shubham", "bbox": {"l": 326.96487, "t": 374.20618, "r": 362.6604, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "Singh", "bbox": {"l": 368.69479, "t": 374.20618, "r": 389.6134, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "Paliwal,", "bbox": {"l": 395.6478, "t": 374.20618, "r": 424.56445, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "D", "bbox": {"l": 431.5492899999999, "t": 374.20618, "r": 438.0230399999999, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Vishwanath,", "bbox": {"l": 444.05743, "t": 374.20618, "r": 488.5038799999999, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "Rohit", "bbox": {"l": 495.47974, "t": 374.20618, "r": 515.41205, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "Rahul,", "bbox": {"l": 521.44641, "t": 374.20618, "r": 545.10876, "b": 382.22214, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "Monika Sharma, and Lovekesh Vig. 
Tablenet: Deep learn-", "bbox": {"l": 328.78101, "t": 385.16516, "r": 545.1134, "b": 393.18112, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "ing model for end-to-end table detection and tabular data ex-", "bbox": {"l": 328.78101, "t": 396.12415, "r": 545.11346, "b": 404.14011, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "traction from scanned document images.", "bbox": {"l": 328.78101, "t": 407.08313, "r": 478.00881999999996, "b": 415.09909, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "In", "bbox": {"l": 484.0701, "t": 407.08313, "r": 491.53912, "b": 415.09909, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "2019 Interna-", "bbox": {"l": 494.668, "t": 407.16382, "r": 545.11298, "b": 414.89288, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "tional Conference on Document Analysis and Recognition", "bbox": {"l": 328.78101, "t": 418.12280000000004, "r": 545.11334, "b": 425.85187, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "(ICDAR)", "bbox": {"l": 328.78101, "t": 429.08179, "r": 360.83591, "b": 436.8108500000001, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": ", pages 128-133. IEEE, 2019. 1", "bbox": {"l": 360.836, "t": 429.0011, "r": 475.63287, "b": 437.01706, "coord_origin": "TOPLEFT"}}]}, "text": "[21] Shubham Singh Paliwal, D Vishwanath, Rohit Rahul, Monika Sharma, and Lovekesh Vig. Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 128-133. IEEE, 2019. 
1"}, {"label": "list_item", "id": 21, "page_no": 8, "cluster": {"id": 21, "label": "list_item", "bbox": {"l": 308.78057861328125, "t": 439.68524169921875, "r": 545.1746215820312, "b": 558.05096, "coord_origin": "TOPLEFT"}, "confidence": 0.8896440863609314, "cells": [{"id": 163, "text": "[22]", "bbox": {"l": 308.862, "t": 440.44611, "r": 324.57407, "b": 448.46207, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer,", "bbox": {"l": 326.93179, "t": 440.44611, "r": 545.1087, "b": 448.46207, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "James Bradbury, Gregory Chanan, Trevor Killeen, Zeming", "bbox": {"l": 328.78101, "t": 451.40509, "r": 545.11346, "b": 459.42105, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison,", "bbox": {"l": 328.78101, "t": 462.36407, "r": 545.11328, "b": 470.38004, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "Andreas Kopf, Edward Yang, Zachary DeVito, Martin Rai-", "bbox": {"l": 328.78101, "t": 473.32306, "r": 545.11328, "b": 481.33902, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "son, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner,", "bbox": {"l": 328.78101, "t": 484.28204, "r": 545.11328, "b": 492.298, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An im-", "bbox": {"l": 328.78101, "t": 495.24103, "r": 545.1134, "b": 503.25699, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "perative style, high-performance deep learning library. In H.", "bbox": {"l": 328.78101, "t": 506.20001, "r": 545.1134, "b": 514.21597, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00b4e-Buc, E.", "bbox": {"l": 328.78101, "t": 517.159, "r": 545.1098, "b": 525.17496, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "Fox, and R. 
Garnett, editors,", "bbox": {"l": 328.78101, "t": 528.117, "r": 434.56659, "b": 536.13297, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "Advances in Neural Informa-", "bbox": {"l": 437.86401, "t": 528.19769, "r": 545.11115, "b": 535.9267600000001, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "tion Processing Systems 32", "bbox": {"l": 328.78101, "t": 539.15671, "r": 425.73471, "b": 546.8857399999999, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": ", pages 8024-8035. Curran Asso-", "bbox": {"l": 425.73602, "t": 539.076, "r": 545.11475, "b": 547.09196, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "ciates, Inc., 2019. 6", "bbox": {"l": 328.78101, "t": 550.035, "r": 399.74109, "b": 558.05096, "coord_origin": "TOPLEFT"}}]}, "text": "[22] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00b4e-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32 , pages 8024-8035. Curran Associates, Inc., 2019. 6"}, {"label": "list_item", "id": 22, "page_no": 8, "cluster": {"id": 22, "label": "list_item", "bbox": {"l": 308.73394775390625, "t": 560.5276489257812, "r": 545.4642944335938, "b": 625.1737060546875, "coord_origin": "TOPLEFT"}, "confidence": 0.9025442600250244, "cells": [{"id": 177, "text": "[23]", "bbox": {"l": 308.862, "t": 561.481, "r": 324.50351, "b": 569.49696, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish", "bbox": {"l": 326.85068, "t": 561.481, "r": 545.10876, "b": 569.49696, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "Visave, and Kavita Sultanpure. 
Cascadetabnet: An approach", "bbox": {"l": 328.78101, "t": 572.44, "r": 545.1134, "b": 580.45596, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "for end to end table detection and structure recognition from", "bbox": {"l": 328.78101, "t": 583.399, "r": 545.11334, "b": 591.4149600000001, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "image-based documents. In", "bbox": {"l": 328.78101, "t": 594.358, "r": 431.61667, "b": 602.37396, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "Proceedings of the IEEE/CVF", "bbox": {"l": 434.69101000000006, "t": 594.4387099999999, "r": 545.11224, "b": 602.16774, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "Conference on Computer Vision and Pattern Recognition", "bbox": {"l": 328.78101, "t": 605.39671, "r": 545.1134, "b": 613.12575, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "Workshops", "bbox": {"l": 328.78101, "t": 616.35571, "r": 367.8028, "b": 624.08475, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": ", pages 572-573, 2020. 1", "bbox": {"l": 367.802, "t": 616.2750100000001, "r": 458.69446000000005, "b": 624.29097, "coord_origin": "TOPLEFT"}}]}, "text": "[23] Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure. Cascadetabnet: An approach for end to end table detection and structure recognition from image-based documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops , pages 572-573, 2020. 
1"}, {"label": "list_item", "id": 23, "page_no": 8, "cluster": {"id": 23, "label": "list_item", "bbox": {"l": 308.49481201171875, "t": 627.0421752929688, "r": 545.31982421875, "b": 669.146484375, "coord_origin": "TOPLEFT"}, "confidence": 0.8777534365653992, "cells": [{"id": 186, "text": "[24]", "bbox": {"l": 308.862, "t": 627.72101, "r": 324.69476, "b": 635.73697, "coord_origin": "TOPLEFT"}}, {"id": 187, "text": "Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait.", "bbox": {"l": 327.07065, "t": 627.72101, "r": 545.1087, "b": 635.73697, "coord_origin": "TOPLEFT"}}, {"id": 188, "text": "Rethinking table recognition using graph neural networks.", "bbox": {"l": 328.78101, "t": 638.68001, "r": 545.11328, "b": 646.69597, "coord_origin": "TOPLEFT"}}, {"id": 189, "text": "In", "bbox": {"l": 328.78101, "t": 649.63901, "r": 336.25003, "b": 657.65497, "coord_origin": "TOPLEFT"}}, {"id": 190, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 338.10001, "t": 649.71971, "r": 545.11621, "b": 657.44875, "coord_origin": "TOPLEFT"}}, {"id": 191, "text": "Recognition (ICDAR)", "bbox": {"l": 328.78101, "t": 660.67871, "r": 406.32245, "b": 668.40775, "coord_origin": "TOPLEFT"}}, {"id": 192, "text": ", pages 142-147. IEEE, 2019. 3", "bbox": {"l": 406.32202, "t": 660.5980099999999, "r": 521.1189, "b": 668.61398, "coord_origin": "TOPLEFT"}}]}, "text": "[24] Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. Rethinking table recognition using graph neural networks. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 142-147. IEEE, 2019. 
3"}, {"label": "list_item", "id": 24, "page_no": 8, "cluster": {"id": 24, "label": "list_item", "bbox": {"l": 308.78839111328125, "t": 671.11767578125, "r": 545.2333374023438, "b": 713.0277709960938, "coord_origin": "TOPLEFT"}, "confidence": 0.8654534220695496, "cells": [{"id": 193, "text": "[25]", "bbox": {"l": 308.86203, "t": 672.04301, "r": 324.71329, "b": 680.05898, "coord_origin": "TOPLEFT"}}, {"id": 194, "text": "Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir", "bbox": {"l": 327.09195, "t": 672.04301, "r": 545.10876, "b": 680.05898, "coord_origin": "TOPLEFT"}}, {"id": 195, "text": "Sadeghian, Ian Reid, and Silvio Savarese.", "bbox": {"l": 328.78104, "t": 683.0020099999999, "r": 482.81488, "b": 691.01797, "coord_origin": "TOPLEFT"}}, {"id": 196, "text": "Generalized in-", "bbox": {"l": 488.75064, "t": 683.0020099999999, "r": 545.1134, "b": 691.01797, "coord_origin": "TOPLEFT"}}, {"id": 197, "text": "tersection over union: A metric and a loss for bounding box", "bbox": {"l": 328.78104, "t": 693.961014, "r": 545.11334, "b": 701.976974, "coord_origin": "TOPLEFT"}}, {"id": 198, "text": "regression. In", "bbox": {"l": 328.78104, "t": 704.920013, "r": 379.1543, "b": 712.935974, "coord_origin": "TOPLEFT"}}, {"id": 199, "text": "Proceedings of the IEEE/CVF Conference on", "bbox": {"l": 381.61603, "t": 705.00071, "r": 545.10938, "b": 712.729744, "coord_origin": "TOPLEFT"}}]}, "text": "[25] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. 
In Proceedings of the IEEE/CVF Conference on"}], "headers": [{"label": "page_footer", "id": 25, "page_no": 8, "cluster": {"id": 25, "label": "page_footer", "bbox": {"l": 294.42181396484375, "t": 733.50244140625, "r": 300.25152587890625, "b": 743.0391500000001, "coord_origin": "TOPLEFT"}, "confidence": 0.8797808885574341, "cells": [{"id": 200, "text": "9", "bbox": {"l": 295.12103, "t": 734.1325870000001, "r": 300.10233, "b": 743.0391500000001, "coord_origin": "TOPLEFT"}}]}, "text": "9"}]}}, {"page_no": 9, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Computer Vision and Pattern Recognition", "bbox": {"l": 70.030998, "t": 75.96447999999998, "r": 223.58061, "b": 83.69353999999998, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": ", pages 658-666,", "bbox": {"l": 223.57901, "t": 75.88378999999998, "r": 286.36176, "b": 83.89977999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "2019. 6", "bbox": {"l": 70.031006, "t": 86.84276999999997, "r": 97.916512, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "[26]", "bbox": {"l": 50.112007, "t": 98.16576999999995, "r": 65.534088, "b": 106.18176000000005, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Den-", "bbox": {"l": 67.84832, "t": 98.16576999999995, "r": 286.35867, "b": 106.18176000000005, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "gel, and Sheraz Ahmed. 
Deepdesrt: Deep learning for detec-", "bbox": {"l": 70.031006, "t": 109.12476000000004, "r": 286.36331, "b": 117.14075000000003, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "tion and structure recognition of tables in document images.", "bbox": {"l": 70.031006, "t": 120.08374000000003, "r": 286.36334, "b": 128.0997299999999, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "In", "bbox": {"l": 70.031006, "t": 131.04272000000003, "r": 77.500015, "b": 139.05872, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "2017 14th IAPR International Conference on Document", "bbox": {"l": 80.560005, "t": 131.12341000000004, "r": 286.36578, "b": 138.85248, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Analysis and Recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 142.0824, "r": 195.22885, "b": 149.81146, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": ", volume 01, pages 1162-", "bbox": {"l": 195.231, "t": 142.00171, "r": 286.36548, "b": 150.0177, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "1167, 2017. 1", "bbox": {"l": 70.031006, "t": 152.96069, "r": 120.33251, "b": 160.97668, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "[27]", "bbox": {"l": 50.112007, "t": 164.28467, "r": 65.534088, "b": 172.30066, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Den-", "bbox": {"l": 67.84832, "t": 164.28467, "r": 286.35867, "b": 172.30066, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "gel, and Sheraz Ahmed. Deepdesrt: Deep learning for de-", "bbox": {"l": 70.031006, "t": 175.24365, "r": 286.36337, "b": 183.25964, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "tection and structure recognition of tables in document im-", "bbox": {"l": 70.031006, "t": 186.20263999999997, "r": 286.36334, "b": 194.21862999999996, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "ages. 
In", "bbox": {"l": 70.031006, "t": 197.16161999999997, "r": 101.33271, "b": 205.17760999999996, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "2017 14th IAPR international conference on doc-", "bbox": {"l": 104.33101, "t": 197.24230999999997, "r": 286.35791, "b": 204.97136999999998, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "ument analysis and recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 208.20032000000003, "r": 220.48719999999997, "b": 215.92938000000004, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": ", volume 1, pages", "bbox": {"l": 220.48401000000004, "t": 208.11963000000003, "r": 286.36017, "b": 216.13562000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "1162-1167. IEEE, 2017. 3", "bbox": {"l": 70.031006, "t": 219.07861000000003, "r": 166.65294, "b": 227.0946, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "[28]", "bbox": {"l": 50.112007, "t": 230.40259000000003, "r": 65.650383, "b": 238.41858000000002, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "Faisal Shafait and Ray Smith. Table detection in heteroge-", "bbox": {"l": 67.982063, "t": 230.40259000000003, "r": 286.3587, "b": 238.41858000000002, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "neous documents. In", "bbox": {"l": 70.031006, "t": 241.36157000000003, "r": 147.16895, "b": 249.37756000000002, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "Proceedings of the 9th IAPR Interna-", "bbox": {"l": 149.93301, "t": 241.44226000000003, "r": 286.36578, "b": 249.17133, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "tional Workshop on Document Analysis Systems", "bbox": {"l": 70.031013, "t": 252.40125, "r": 244.6875, "b": 260.13031, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": ", pages 65-", "bbox": {"l": 244.69101, "t": 252.32056, "r": 286.35791, "b": 260.33655, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "72, 2010. 
2", "bbox": {"l": 70.031006, "t": 263.27954, "r": 111.36611, "b": 271.29553, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "[29]", "bbox": {"l": 50.112007, "t": 274.60357999999997, "r": 66.023834, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Shoaib", "bbox": {"l": 68.411568, "t": 274.60357999999997, "r": 94.944016, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "Ahmed", "bbox": {"l": 100.8708, "t": 274.60357999999997, "r": 127.26788000000002, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "Siddiqui,", "bbox": {"l": 133.19467, "t": 274.60357999999997, "r": 165.83237, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Imran", "bbox": {"l": 172.68269, "t": 274.60357999999997, "r": 194.09445, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Ali", "bbox": {"l": 200.02124, "t": 274.60357999999997, "r": 211.4803, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "Fateh,", "bbox": {"l": 217.40708999999998, "t": 274.60357999999997, "r": 239.43755, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "Syed", "bbox": {"l": 246.28787000000003, "t": 274.60357999999997, "r": 264.22067, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Tah-", "bbox": {"l": 270.14746, "t": 274.60357999999997, "r": 286.35873, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "seen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed.", "bbox": {"l": 70.031006, "t": 285.56256, "r": 286.36331, "b": 293.57852, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "Deeptabstr: Deep learning based table structure recognition.", "bbox": {"l": 70.031006, "t": 296.52155, "r": 286.36331, "b": 304.53751, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "In", "bbox": {"l": 70.031006, "t": 307.48053, "r": 77.500015, "b": 315.49649, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 79.350006, "t": 307.56122, "r": 
286.36627, "b": 315.29028, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 318.51923, "r": 147.57243, "b": 326.24829, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": ", pages 1403-1409. IEEE, 2019. 3", "bbox": {"l": 147.57201, "t": 318.43854, "r": 271.33521, "b": 326.4545, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "[30]", "bbox": {"l": 50.112007, "t": 329.76254, "r": 65.366135, "b": 337.7785, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas", "bbox": {"l": 67.655159, "t": 329.76254, "r": 286.3587, "b": 337.7785, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Bekas. Corpus conversion service: A machine learning plat-", "bbox": {"l": 70.031006, "t": 340.72156000000007, "r": 286.36334, "b": 348.7375200000001, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "form to ingest documents at scale.", "bbox": {"l": 70.031006, "t": 351.68054, "r": 198.82439, "b": 359.6965, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "In", "bbox": {"l": 206.06027, "t": 351.68054, "r": 213.52928, "b": 359.6965, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Proceedings of the", "bbox": {"l": 217.02101, "t": 351.76123, "r": 286.35815, "b": 359.4903, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "24th ACM SIGKDD", "bbox": {"l": 70.031006, "t": 362.72021, "r": 143.08028, "b": 370.44928, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": ", KDD \u201918, pages 774-782, New York,", "bbox": {"l": 143.078, "t": 362.63953000000004, "r": 286.36111, "b": 370.65549000000004, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "NY, USA, 2018. ACM. 
1", "bbox": {"l": 70.031006, "t": 373.59851, "r": 161.15652, "b": 381.61447, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "[31]", "bbox": {"l": 50.112007, "t": 384.92252, "r": 65.140724, "b": 392.93848, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko-", "bbox": {"l": 67.395927, "t": 384.92252, "r": 286.35876, "b": 392.93848, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "reit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Il-", "bbox": {"l": 70.031006, "t": 395.88153, "r": 286.36337, "b": 403.89749, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "lia Polosukhin.", "bbox": {"l": 70.031006, "t": 406.84052, "r": 125.47024999999998, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "Attention is all you need.", "bbox": {"l": 133.90764, "t": 406.84052, "r": 230.83444, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "In I. Guyon,", "bbox": {"l": 239.27182, "t": 406.84052, "r": 286.36334, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "U.", "bbox": {"l": 70.031006, "t": 417.7995, "r": 78.958366, "b": 425.81546, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vish-", "bbox": {"l": 81.254494, "t": 417.7995, "r": 286.36334, "b": 425.81546, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "wanathan, and R. Garnett, editors,", "bbox": {"l": 70.031006, "t": 428.75751, "r": 196.7621, "b": 436.7734699999999, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "Advances in Neural In-", "bbox": {"l": 200.20201, "t": 428.8381999999999, "r": 286.36017, "b": 436.56726, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "formation Processing Systems 30", "bbox": {"l": 70.031006, "t": 439.79717999999997, "r": 189.19447, "b": 447.52624999999995, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": ", pages 5998-6008. 
Curran", "bbox": {"l": 189.19501, "t": 439.71648999999996, "r": 286.36389, "b": 447.73245, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "Associates, Inc., 2017. 5", "bbox": {"l": 70.031006, "t": 450.67548, "r": 158.9239, "b": 458.69144, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "[32]", "bbox": {"l": 50.112007, "t": 461.99948, "r": 65.910469, "b": 470.01544, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Oriol Vinyals, Alexander Toshev, Samy Bengio, and Du-", "bbox": {"l": 68.281181, "t": 461.99948, "r": 286.35873, "b": 470.01544, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "mitru Erhan.", "bbox": {"l": 70.031006, "t": 472.9585, "r": 116.27969999999999, "b": 480.97446, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "Show and tell: A neural image caption gen-", "bbox": {"l": 122.48445, "t": 472.9585, "r": 286.36334, "b": 480.97446, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "erator. In", "bbox": {"l": 70.031006, "t": 483.91748, "r": 103.30532, "b": 491.93344, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "Proceedings of the IEEE Conference on Computer", "bbox": {"l": 105.51601, "t": 483.99817, "r": 286.35931, "b": 491.72723, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Vision and Pattern Recognition (CVPR)", "bbox": {"l": 70.031006, "t": 494.95715, "r": 212.51607, "b": 502.68622, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": ", June 2015. 
2", "bbox": {"l": 212.51401, "t": 494.87646, "r": 263.55975, "b": 502.89243, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "[33]", "bbox": {"l": 50.112015, "t": 506.20047, "r": 65.682777, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "Wenyuan Xue, Qingyong Li, and Dacheng Tao.", "bbox": {"l": 68.019325, "t": 506.20047, "r": 247.37280000000004, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "Res2tim:", "bbox": {"l": 253.97208000000003, "t": 506.20047, "r": 286.3587, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "reconstruct syntactic structures from table images. In", "bbox": {"l": 70.031013, "t": 517.15948, "r": 265.62408, "b": 525.17545, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "2019", "bbox": {"l": 268.42902, "t": 517.24017, "r": 286.36182, "b": 524.96924, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "International Conference on Document Analysis and Recog-", "bbox": {"l": 70.031021, "t": 528.19916, "r": 286.36337, "b": 535.92822, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "nition (ICDAR)", "bbox": {"l": 70.031021, "t": 539.15718, "r": 125.25507999999999, "b": 546.88622, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": ", pages 749-755. IEEE, 2019. 
3", "bbox": {"l": 125.25402, "t": 539.07648, "r": 240.05083, "b": 547.09244, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "[34]", "bbox": {"l": 50.112022, "t": 550.40048, "r": 66.037048, "b": 558.41644, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao,", "bbox": {"l": 68.426765, "t": 550.40048, "r": 286.3587, "b": 558.41644, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "and Qingyong Li.", "bbox": {"l": 70.031021, "t": 561.35948, "r": 137.08176, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Tgrnet:", "bbox": {"l": 145.9854, "t": 561.35948, "r": 172.38248, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "A table graph reconstruction", "bbox": {"l": 178.7038, "t": 561.35948, "r": 286.36337, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "network for table structure recognition.", "bbox": {"l": 70.031021, "t": 572.31848, "r": 221.00723, "b": 580.33444, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "arXiv preprint", "bbox": {"l": 232.54300999999998, "t": 572.39919, "r": 286.35938, "b": 580.12822, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "arXiv:2106.10598", "bbox": {"l": 70.031021, "t": 583.35818, "r": 135.53058, "b": 591.08722, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": ", 2021. 
3", "bbox": {"l": 135.53003, "t": 583.27748, "r": 167.89876, "b": 591.29344, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "[35]", "bbox": {"l": 50.11203, "t": 594.60149, "r": 65.23661, "b": 602.61745, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and", "bbox": {"l": 67.506203, "t": 594.60149, "r": 286.3587, "b": 602.61745, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Jiebo Luo.", "bbox": {"l": 70.031029, "t": 605.56049, "r": 109.1066, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "Image captioning with semantic attention.", "bbox": {"l": 116.22592, "t": 605.56049, "r": 271.76605, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "In", "bbox": {"l": 278.89435, "t": 605.56049, "r": 286.36337, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Proceedings of the IEEE conference on computer vision and", "bbox": {"l": 70.031029, "t": 616.60019, "r": 286.3634, "b": 624.32922, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "pattern recognition", "bbox": {"l": 70.031029, "t": 627.55919, "r": 139.09921, "b": 635.28822, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": ", pages 4651-4659, 2016. 4", "bbox": {"l": 139.09802, "t": 627.47849, "r": 238.95683, "b": 635.49445, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "[36]", "bbox": {"l": 50.112022, "t": 638.80249, "r": 65.203552, "b": 646.81845, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and", "bbox": {"l": 67.468193, "t": 638.80249, "r": 286.35873, "b": 646.81845, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Nancy Xin Ru Wang. 
Global table extractor (gte): A frame-", "bbox": {"l": 70.031021, "t": 649.7605, "r": 286.36337, "b": 657.77646, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "work for joint table identification and cell structure recogni-", "bbox": {"l": 70.031021, "t": 660.7195, "r": 286.36334, "b": 668.73547, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "tion using visual context.", "bbox": {"l": 70.031021, "t": 671.6785, "r": 158.45766, "b": 679.69447, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "Winter Conference for Applications", "bbox": {"l": 160.52802, "t": 671.7592, "r": 286.36249, "b": 679.48824, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "in Computer Vision (WACV)", "bbox": {"l": 70.031013, "t": 682.7182, "r": 171.42305, "b": 690.44724, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": ", 2021. 2, 3", "bbox": {"l": 171.42201, "t": 682.6375, "r": 212.75713, "b": 690.65347, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "[37]", "bbox": {"l": 50.112015, "t": 693.961502, "r": 66.506706, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Xu", "bbox": {"l": 68.966896, "t": 693.961502, "r": 80.992294, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Zhong,", "bbox": {"l": 89.062057, "t": 693.961502, "r": 114.71492999999998, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Elaheh", "bbox": {"l": 124.24621000000002, "t": 693.961502, "r": 149.1459, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "ShafieiBavani,", "bbox": {"l": 157.22462, "t": 693.961502, "r": 209.37321, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "and", "bbox": {"l": 218.9045, "t": 693.961502, "r": 231.85196999999997, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "Antonio", "bbox": {"l": 239.93069, "t": 693.961502, "r": 269.32254, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "Ji-", "bbox": {"l": 277.3923, "t": 693.961502, "r": 286.3587, "b": 701.977463, 
"coord_origin": "TOPLEFT"}}, {"id": 114, "text": "meno Yepes. Image-based table recognition: Data, model,", "bbox": {"l": 70.031013, "t": 704.920502, "r": 286.36334, "b": 712.936462, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "and evaluation. In Andrea Vedaldi, Horst Bischof, Thomas", "bbox": {"l": 328.78101, "t": 75.88347999999996, "r": 545.11346, "b": 83.89948000000015, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "Brox, and Jan-Michael Frahm, editors,", "bbox": {"l": 328.78101, "t": 86.84149000000002, "r": 472.30618, "b": 94.85748000000001, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Computer Vision -", "bbox": {"l": 475.88501, "t": 86.92218000000003, "r": 545.11456, "b": 94.65125, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "ECCV 2020", "bbox": {"l": 328.78101, "t": 97.88116000000002, "r": 371.92734, "b": 105.61023, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": ", pages 564-580, Cham, 2020. Springer Interna-", "bbox": {"l": 371.92599, "t": 97.80048, "r": 545.11206, "b": 105.81646999999987, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "tional Publishing. 2, 3, 7", "bbox": {"l": 328.78101, "t": 108.75945999999999, "r": 417.70087, "b": 116.77544999999998, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "[38]", "bbox": {"l": 308.862, "t": 120.71447999999998, "r": 324.33197, "b": 128.73046999999997, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Pub-", "bbox": {"l": 326.65341, "t": 120.71447999999998, "r": 545.10876, "b": 128.73046999999997, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "laynet: Largest dataset ever for document layout analysis. 
In", "bbox": {"l": 328.78101, "t": 131.67345999999998, "r": 545.11334, "b": 139.68944999999997, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 328.78101, "t": 142.71312999999998, "r": 545.11328, "b": 150.44219999999996, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "Recognition (ICDAR)", "bbox": {"l": 328.78101, "t": 153.67211999999995, "r": 406.32245, "b": 161.40117999999995, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": ", pages 1015-1022, 2019. 1", "bbox": {"l": 406.32202, "t": 153.59142999999995, "r": 506.18085, "b": 161.60742000000005, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "10", "bbox": {"l": 292.63, "t": 734.1329920000001, "r": 302.59259, "b": 743.039555, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 69.42779541015625, "t": 75.20453643798828, "r": 286.36176, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}, "confidence": 0.6886652708053589, "cells": [{"id": 0, "text": "Computer Vision and Pattern Recognition", "bbox": {"l": 70.030998, "t": 75.96447999999998, "r": 223.58061, "b": 83.69353999999998, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": ", pages 658-666,", "bbox": {"l": 223.57901, "t": 75.88378999999998, "r": 286.36176, "b": 83.89977999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "2019. 
6", "bbox": {"l": 70.031006, "t": 86.84276999999997, "r": 97.916512, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "list_item", "bbox": {"l": 49.9049186706543, "t": 96.92471313476562, "r": 286.5950927734375, "b": 160.97668, "coord_origin": "TOPLEFT"}, "confidence": 0.7722932696342468, "cells": [{"id": 3, "text": "[26]", "bbox": {"l": 50.112007, "t": 98.16576999999995, "r": 65.534088, "b": 106.18176000000005, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Den-", "bbox": {"l": 67.84832, "t": 98.16576999999995, "r": 286.35867, "b": 106.18176000000005, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "gel, and Sheraz Ahmed. Deepdesrt: Deep learning for detec-", "bbox": {"l": 70.031006, "t": 109.12476000000004, "r": 286.36331, "b": 117.14075000000003, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "tion and structure recognition of tables in document images.", "bbox": {"l": 70.031006, "t": 120.08374000000003, "r": 286.36334, "b": 128.0997299999999, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "In", "bbox": {"l": 70.031006, "t": 131.04272000000003, "r": 77.500015, "b": 139.05872, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "2017 14th IAPR International Conference on Document", "bbox": {"l": 80.560005, "t": 131.12341000000004, "r": 286.36578, "b": 138.85248, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Analysis and Recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 142.0824, "r": 195.22885, "b": 149.81146, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": ", volume 01, pages 1162-", "bbox": {"l": 195.231, "t": 142.00171, "r": 286.36548, "b": 150.0177, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "1167, 2017. 
1", "bbox": {"l": 70.031006, "t": 152.96069, "r": 120.33251, "b": 160.97668, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "list_item", "bbox": {"l": 49.60877990722656, "t": 163.67764282226562, "r": 286.36337, "b": 227.0946, "coord_origin": "TOPLEFT"}, "confidence": 0.858070969581604, "cells": [{"id": 12, "text": "[27]", "bbox": {"l": 50.112007, "t": 164.28467, "r": 65.534088, "b": 172.30066, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Den-", "bbox": {"l": 67.84832, "t": 164.28467, "r": 286.35867, "b": 172.30066, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "gel, and Sheraz Ahmed. Deepdesrt: Deep learning for de-", "bbox": {"l": 70.031006, "t": 175.24365, "r": 286.36337, "b": 183.25964, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "tection and structure recognition of tables in document im-", "bbox": {"l": 70.031006, "t": 186.20263999999997, "r": 286.36334, "b": 194.21862999999996, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "ages. In", "bbox": {"l": 70.031006, "t": 197.16161999999997, "r": 101.33271, "b": 205.17760999999996, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "2017 14th IAPR international conference on doc-", "bbox": {"l": 104.33101, "t": 197.24230999999997, "r": 286.35791, "b": 204.97136999999998, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "ument analysis and recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 208.20032000000003, "r": 220.48719999999997, "b": 215.92938000000004, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": ", volume 1, pages", "bbox": {"l": 220.48401000000004, "t": 208.11963000000003, "r": 286.36017, "b": 216.13562000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "1162-1167. IEEE, 2017. 
3", "bbox": {"l": 70.031006, "t": 219.07861000000003, "r": 166.65294, "b": 227.0946, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "list_item", "bbox": {"l": 49.50430679321289, "t": 229.77830505371094, "r": 286.36578, "b": 271.29553, "coord_origin": "TOPLEFT"}, "confidence": 0.85127854347229, "cells": [{"id": 21, "text": "[28]", "bbox": {"l": 50.112007, "t": 230.40259000000003, "r": 65.650383, "b": 238.41858000000002, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "Faisal Shafait and Ray Smith. Table detection in heteroge-", "bbox": {"l": 67.982063, "t": 230.40259000000003, "r": 286.3587, "b": 238.41858000000002, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "neous documents. In", "bbox": {"l": 70.031006, "t": 241.36157000000003, "r": 147.16895, "b": 249.37756000000002, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "Proceedings of the 9th IAPR Interna-", "bbox": {"l": 149.93301, "t": 241.44226000000003, "r": 286.36578, "b": 249.17133, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "tional Workshop on Document Analysis Systems", "bbox": {"l": 70.031013, "t": 252.40125, "r": 244.6875, "b": 260.13031, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": ", pages 65-", "bbox": {"l": 244.69101, "t": 252.32056, "r": 286.35791, "b": 260.33655, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "72, 2010. 
2", "bbox": {"l": 70.031006, "t": 263.27954, "r": 111.36611, "b": 271.29553, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "list_item", "bbox": {"l": 49.674346923828125, "t": 273.582763671875, "r": 286.8292541503906, "b": 326.8854675292969, "coord_origin": "TOPLEFT"}, "confidence": 0.849456250667572, "cells": [{"id": 28, "text": "[29]", "bbox": {"l": 50.112007, "t": 274.60357999999997, "r": 66.023834, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Shoaib", "bbox": {"l": 68.411568, "t": 274.60357999999997, "r": 94.944016, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "Ahmed", "bbox": {"l": 100.8708, "t": 274.60357999999997, "r": 127.26788000000002, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "Siddiqui,", "bbox": {"l": 133.19467, "t": 274.60357999999997, "r": 165.83237, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Imran", "bbox": {"l": 172.68269, "t": 274.60357999999997, "r": 194.09445, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Ali", "bbox": {"l": 200.02124, "t": 274.60357999999997, "r": 211.4803, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "Fateh,", "bbox": {"l": 217.40708999999998, "t": 274.60357999999997, "r": 239.43755, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "Syed", "bbox": {"l": 246.28787000000003, "t": 274.60357999999997, "r": 264.22067, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Tah-", "bbox": {"l": 270.14746, "t": 274.60357999999997, "r": 286.35873, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "seen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed.", "bbox": {"l": 70.031006, "t": 285.56256, "r": 286.36331, "b": 293.57852, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "Deeptabstr: Deep learning based table structure recognition.", "bbox": {"l": 70.031006, "t": 296.52155, "r": 286.36331, "b": 304.53751, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "In", "bbox": {"l": 
70.031006, "t": 307.48053, "r": 77.500015, "b": 315.49649, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 79.350006, "t": 307.56122, "r": 286.36627, "b": 315.29028, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 318.51923, "r": 147.57243, "b": 326.24829, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": ", pages 1403-1409. IEEE, 2019. 3", "bbox": {"l": 147.57201, "t": 318.43854, "r": 271.33521, "b": 326.4545, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "list_item", "bbox": {"l": 49.465091705322266, "t": 329.0513610839844, "r": 286.52130126953125, "b": 381.7455139160156, "coord_origin": "TOPLEFT"}, "confidence": 0.8567717671394348, "cells": [{"id": 43, "text": "[30]", "bbox": {"l": 50.112007, "t": 329.76254, "r": 65.366135, "b": 337.7785, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas", "bbox": {"l": 67.655159, "t": 329.76254, "r": 286.3587, "b": 337.7785, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Bekas. 
Corpus conversion service: A machine learning plat-", "bbox": {"l": 70.031006, "t": 340.72156000000007, "r": 286.36334, "b": 348.7375200000001, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "form to ingest documents at scale.", "bbox": {"l": 70.031006, "t": 351.68054, "r": 198.82439, "b": 359.6965, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "In", "bbox": {"l": 206.06027, "t": 351.68054, "r": 213.52928, "b": 359.6965, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Proceedings of the", "bbox": {"l": 217.02101, "t": 351.76123, "r": 286.35815, "b": 359.4903, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "24th ACM SIGKDD", "bbox": {"l": 70.031006, "t": 362.72021, "r": 143.08028, "b": 370.44928, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": ", KDD \u201918, pages 774-782, New York,", "bbox": {"l": 143.078, "t": 362.63953000000004, "r": 286.36111, "b": 370.65549000000004, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "NY, USA, 2018. ACM. 1", "bbox": {"l": 70.031006, "t": 373.59851, "r": 161.15652, "b": 381.61447, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "list_item", "bbox": {"l": 49.55904006958008, "t": 384.240234375, "r": 286.4693603515625, "b": 458.69144, "coord_origin": "TOPLEFT"}, "confidence": 0.88321852684021, "cells": [{"id": 52, "text": "[31]", "bbox": {"l": 50.112007, "t": 384.92252, "r": 65.140724, "b": 392.93848, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko-", "bbox": {"l": 67.395927, "t": 384.92252, "r": 286.35876, "b": 392.93848, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "reit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Il-", "bbox": {"l": 70.031006, "t": 395.88153, "r": 286.36337, "b": 403.89749, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "lia Polosukhin.", "bbox": {"l": 70.031006, "t": 406.84052, "r": 125.47024999999998, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "Attention is all you need.", "bbox": {"l": 133.90764, "t": 406.84052, 
"r": 230.83444, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "In I. Guyon,", "bbox": {"l": 239.27182, "t": 406.84052, "r": 286.36334, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "U.", "bbox": {"l": 70.031006, "t": 417.7995, "r": 78.958366, "b": 425.81546, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vish-", "bbox": {"l": 81.254494, "t": 417.7995, "r": 286.36334, "b": 425.81546, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "wanathan, and R. Garnett, editors,", "bbox": {"l": 70.031006, "t": 428.75751, "r": 196.7621, "b": 436.7734699999999, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "Advances in Neural In-", "bbox": {"l": 200.20201, "t": 428.8381999999999, "r": 286.36017, "b": 436.56726, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "formation Processing Systems 30", "bbox": {"l": 70.031006, "t": 439.79717999999997, "r": 189.19447, "b": 447.52624999999995, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": ", pages 5998-6008. Curran", "bbox": {"l": 189.19501, "t": 439.71648999999996, "r": 286.36389, "b": 447.73245, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "Associates, Inc., 2017. 
5", "bbox": {"l": 70.031006, "t": 450.67548, "r": 158.9239, "b": 458.69144, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "list_item", "bbox": {"l": 49.52634048461914, "t": 461.1815490722656, "r": 286.46636962890625, "b": 502.89243, "coord_origin": "TOPLEFT"}, "confidence": 0.8431898355484009, "cells": [{"id": 65, "text": "[32]", "bbox": {"l": 50.112007, "t": 461.99948, "r": 65.910469, "b": 470.01544, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Oriol Vinyals, Alexander Toshev, Samy Bengio, and Du-", "bbox": {"l": 68.281181, "t": 461.99948, "r": 286.35873, "b": 470.01544, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "mitru Erhan.", "bbox": {"l": 70.031006, "t": 472.9585, "r": 116.27969999999999, "b": 480.97446, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "Show and tell: A neural image caption gen-", "bbox": {"l": 122.48445, "t": 472.9585, "r": 286.36334, "b": 480.97446, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "erator. In", "bbox": {"l": 70.031006, "t": 483.91748, "r": 103.30532, "b": 491.93344, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "Proceedings of the IEEE Conference on Computer", "bbox": {"l": 105.51601, "t": 483.99817, "r": 286.35931, "b": 491.72723, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Vision and Pattern Recognition (CVPR)", "bbox": {"l": 70.031006, "t": 494.95715, "r": 212.51607, "b": 502.68622, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": ", June 2015. 
2", "bbox": {"l": 212.51401, "t": 494.87646, "r": 263.55975, "b": 502.89243, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "list_item", "bbox": {"l": 49.58083724975586, "t": 505.2271728515625, "r": 286.77581787109375, "b": 547.5089721679688, "coord_origin": "TOPLEFT"}, "confidence": 0.8405721187591553, "cells": [{"id": 73, "text": "[33]", "bbox": {"l": 50.112015, "t": 506.20047, "r": 65.682777, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "Wenyuan Xue, Qingyong Li, and Dacheng Tao.", "bbox": {"l": 68.019325, "t": 506.20047, "r": 247.37280000000004, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "Res2tim:", "bbox": {"l": 253.97208000000003, "t": 506.20047, "r": 286.3587, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "reconstruct syntactic structures from table images. In", "bbox": {"l": 70.031013, "t": 517.15948, "r": 265.62408, "b": 525.17545, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "2019", "bbox": {"l": 268.42902, "t": 517.24017, "r": 286.36182, "b": 524.96924, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "International Conference on Document Analysis and Recog-", "bbox": {"l": 70.031021, "t": 528.19916, "r": 286.36337, "b": 535.92822, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "nition (ICDAR)", "bbox": {"l": 70.031021, "t": 539.15718, "r": 125.25507999999999, "b": 546.88622, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": ", pages 749-755. IEEE, 2019. 
3", "bbox": {"l": 125.25402, "t": 539.07648, "r": 240.05083, "b": 547.09244, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "list_item", "bbox": {"l": 49.6944580078125, "t": 549.8140869140625, "r": 286.58221435546875, "b": 591.29344, "coord_origin": "TOPLEFT"}, "confidence": 0.8241643309593201, "cells": [{"id": 81, "text": "[34]", "bbox": {"l": 50.112022, "t": 550.40048, "r": 66.037048, "b": 558.41644, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao,", "bbox": {"l": 68.426765, "t": 550.40048, "r": 286.3587, "b": 558.41644, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "and Qingyong Li.", "bbox": {"l": 70.031021, "t": 561.35948, "r": 137.08176, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Tgrnet:", "bbox": {"l": 145.9854, "t": 561.35948, "r": 172.38248, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "A table graph reconstruction", "bbox": {"l": 178.7038, "t": 561.35948, "r": 286.36337, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "network for table structure recognition.", "bbox": {"l": 70.031021, "t": 572.31848, "r": 221.00723, "b": 580.33444, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "arXiv preprint", "bbox": {"l": 232.54300999999998, "t": 572.39919, "r": 286.35938, "b": 580.12822, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "arXiv:2106.10598", "bbox": {"l": 70.031021, "t": 583.35818, "r": 135.53058, "b": 591.08722, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": ", 2021. 
3", "bbox": {"l": 135.53003, "t": 583.27748, "r": 167.89876, "b": 591.29344, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "list_item", "bbox": {"l": 49.64909744262695, "t": 593.8485107421875, "r": 286.92523193359375, "b": 635.7434692382812, "coord_origin": "TOPLEFT"}, "confidence": 0.836098849773407, "cells": [{"id": 90, "text": "[35]", "bbox": {"l": 50.11203, "t": 594.60149, "r": 65.23661, "b": 602.61745, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and", "bbox": {"l": 67.506203, "t": 594.60149, "r": 286.3587, "b": 602.61745, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Jiebo Luo.", "bbox": {"l": 70.031029, "t": 605.56049, "r": 109.1066, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "Image captioning with semantic attention.", "bbox": {"l": 116.22592, "t": 605.56049, "r": 271.76605, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "In", "bbox": {"l": 278.89435, "t": 605.56049, "r": 286.36337, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Proceedings of the IEEE conference on computer vision and", "bbox": {"l": 70.031029, "t": 616.60019, "r": 286.3634, "b": 624.32922, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "pattern recognition", "bbox": {"l": 70.031029, "t": 627.55919, "r": 139.09921, "b": 635.28822, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": ", pages 4651-4659, 2016. 
4", "bbox": {"l": 139.09802, "t": 627.47849, "r": 238.95683, "b": 635.49445, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "list_item", "bbox": {"l": 49.76350021362305, "t": 637.6057739257812, "r": 286.3849182128906, "b": 691.11279296875, "coord_origin": "TOPLEFT"}, "confidence": 0.8080187439918518, "cells": [{"id": 98, "text": "[36]", "bbox": {"l": 50.112022, "t": 638.80249, "r": 65.203552, "b": 646.81845, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and", "bbox": {"l": 67.468193, "t": 638.80249, "r": 286.35873, "b": 646.81845, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Nancy Xin Ru Wang. Global table extractor (gte): A frame-", "bbox": {"l": 70.031021, "t": 649.7605, "r": 286.36337, "b": 657.77646, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "work for joint table identification and cell structure recogni-", "bbox": {"l": 70.031021, "t": 660.7195, "r": 286.36334, "b": 668.73547, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "tion using visual context.", "bbox": {"l": 70.031021, "t": 671.6785, "r": 158.45766, "b": 679.69447, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "Winter Conference for Applications", "bbox": {"l": 160.52802, "t": 671.7592, "r": 286.36249, "b": 679.48824, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "in Computer Vision (WACV)", "bbox": {"l": 70.031013, "t": 682.7182, "r": 171.42305, "b": 690.44724, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": ", 2021. 
2, 3", "bbox": {"l": 171.42201, "t": 682.6375, "r": 212.75713, "b": 690.65347, "coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "list_item", "bbox": {"l": 49.56057357788086, "t": 693.036865234375, "r": 286.3756408691406, "b": 713.1453247070312, "coord_origin": "TOPLEFT"}, "confidence": 0.8385704755783081, "cells": [{"id": 106, "text": "[37]", "bbox": {"l": 50.112015, "t": 693.961502, "r": 66.506706, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Xu", "bbox": {"l": 68.966896, "t": 693.961502, "r": 80.992294, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Zhong,", "bbox": {"l": 89.062057, "t": 693.961502, "r": 114.71492999999998, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Elaheh", "bbox": {"l": 124.24621000000002, "t": 693.961502, "r": 149.1459, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "ShafieiBavani,", "bbox": {"l": 157.22462, "t": 693.961502, "r": 209.37321, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "and", "bbox": {"l": 218.9045, "t": 693.961502, "r": 231.85196999999997, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "Antonio", "bbox": {"l": 239.93069, "t": 693.961502, "r": 269.32254, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "Ji-", "bbox": {"l": 277.3923, "t": 693.961502, "r": 286.3587, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "meno Yepes. Image-based table recognition: Data, model,", "bbox": {"l": 70.031013, "t": 704.920502, "r": 286.36334, "b": 712.936462, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "list_item", "bbox": {"l": 327.91717529296875, "t": 75.0067138671875, "r": 545.3355102539062, "b": 116.9290771484375, "coord_origin": "TOPLEFT"}, "confidence": 0.7556977868080139, "cells": [{"id": 115, "text": "and evaluation. 
In Andrea Vedaldi, Horst Bischof, Thomas", "bbox": {"l": 328.78101, "t": 75.88347999999996, "r": 545.11346, "b": 83.89948000000015, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "Brox, and Jan-Michael Frahm, editors,", "bbox": {"l": 328.78101, "t": 86.84149000000002, "r": 472.30618, "b": 94.85748000000001, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Computer Vision -", "bbox": {"l": 475.88501, "t": 86.92218000000003, "r": 545.11456, "b": 94.65125, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "ECCV 2020", "bbox": {"l": 328.78101, "t": 97.88116000000002, "r": 371.92734, "b": 105.61023, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": ", pages 564-580, Cham, 2020. Springer Interna-", "bbox": {"l": 371.92599, "t": 97.80048, "r": 545.11206, "b": 105.81646999999987, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "tional Publishing. 2, 3, 7", "bbox": {"l": 328.78101, "t": 108.75945999999999, "r": 417.70087, "b": 116.77544999999998, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "list_item", "bbox": {"l": 308.3899230957031, "t": 119.63013458251953, "r": 545.5173950195312, "b": 162.1349334716797, "coord_origin": "TOPLEFT"}, "confidence": 0.8375809788703918, "cells": [{"id": 121, "text": "[38]", "bbox": {"l": 308.862, "t": 120.71447999999998, "r": 324.33197, "b": 128.73046999999997, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Pub-", "bbox": {"l": 326.65341, "t": 120.71447999999998, "r": 545.10876, "b": 128.73046999999997, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "laynet: Largest dataset ever for document layout analysis. 
In", "bbox": {"l": 328.78101, "t": 131.67345999999998, "r": 545.11334, "b": 139.68944999999997, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 328.78101, "t": 142.71312999999998, "r": 545.11328, "b": 150.44219999999996, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "Recognition (ICDAR)", "bbox": {"l": 328.78101, "t": 153.67211999999995, "r": 406.32245, "b": 161.40117999999995, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": ", pages 1015-1022, 2019. 1", "bbox": {"l": 406.32202, "t": 153.59142999999995, "r": 506.18085, "b": 161.60742000000005, "coord_origin": "TOPLEFT"}}]}, {"id": 15, "label": "page_footer", "bbox": {"l": 292.63, "t": 733.3956909179688, "r": 302.69708251953125, "b": 743.039555, "coord_origin": "TOPLEFT"}, "confidence": 0.9069584012031555, "cells": [{"id": 127, "text": "10", "bbox": {"l": 292.63, "t": 734.1329920000001, "r": 302.59259, "b": 743.039555, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 9, "cluster": {"id": 0, "label": "text", "bbox": {"l": 69.42779541015625, "t": 75.20453643798828, "r": 286.36176, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}, "confidence": 0.6886652708053589, "cells": [{"id": 0, "text": "Computer Vision and Pattern Recognition", "bbox": {"l": 70.030998, "t": 75.96447999999998, "r": 223.58061, "b": 83.69353999999998, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": ", pages 658-666,", "bbox": {"l": 223.57901, "t": 75.88378999999998, "r": 286.36176, "b": 83.89977999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "2019. 6", "bbox": {"l": 70.031006, "t": 86.84276999999997, "r": 97.916512, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}}]}, "text": "Computer Vision and Pattern Recognition , pages 658-666, 2019. 
6"}, {"label": "list_item", "id": 1, "page_no": 9, "cluster": {"id": 1, "label": "list_item", "bbox": {"l": 49.9049186706543, "t": 96.92471313476562, "r": 286.5950927734375, "b": 160.97668, "coord_origin": "TOPLEFT"}, "confidence": 0.7722932696342468, "cells": [{"id": 3, "text": "[26]", "bbox": {"l": 50.112007, "t": 98.16576999999995, "r": 65.534088, "b": 106.18176000000005, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Den-", "bbox": {"l": 67.84832, "t": 98.16576999999995, "r": 286.35867, "b": 106.18176000000005, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "gel, and Sheraz Ahmed. Deepdesrt: Deep learning for detec-", "bbox": {"l": 70.031006, "t": 109.12476000000004, "r": 286.36331, "b": 117.14075000000003, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "tion and structure recognition of tables in document images.", "bbox": {"l": 70.031006, "t": 120.08374000000003, "r": 286.36334, "b": 128.0997299999999, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "In", "bbox": {"l": 70.031006, "t": 131.04272000000003, "r": 77.500015, "b": 139.05872, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "2017 14th IAPR International Conference on Document", "bbox": {"l": 80.560005, "t": 131.12341000000004, "r": 286.36578, "b": 138.85248, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Analysis and Recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 142.0824, "r": 195.22885, "b": 149.81146, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": ", volume 01, pages 1162-", "bbox": {"l": 195.231, "t": 142.00171, "r": 286.36548, "b": 150.0177, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "1167, 2017. 1", "bbox": {"l": 70.031006, "t": 152.96069, "r": 120.33251, "b": 160.97668, "coord_origin": "TOPLEFT"}}]}, "text": "[26] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. 
In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) , volume 01, pages 11621167, 2017. 1"}, {"label": "list_item", "id": 2, "page_no": 9, "cluster": {"id": 2, "label": "list_item", "bbox": {"l": 49.60877990722656, "t": 163.67764282226562, "r": 286.36337, "b": 227.0946, "coord_origin": "TOPLEFT"}, "confidence": 0.858070969581604, "cells": [{"id": 12, "text": "[27]", "bbox": {"l": 50.112007, "t": 164.28467, "r": 65.534088, "b": 172.30066, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Den-", "bbox": {"l": 67.84832, "t": 164.28467, "r": 286.35867, "b": 172.30066, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "gel, and Sheraz Ahmed. Deepdesrt: Deep learning for de-", "bbox": {"l": 70.031006, "t": 175.24365, "r": 286.36337, "b": 183.25964, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "tection and structure recognition of tables in document im-", "bbox": {"l": 70.031006, "t": 186.20263999999997, "r": 286.36334, "b": 194.21862999999996, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "ages. In", "bbox": {"l": 70.031006, "t": 197.16161999999997, "r": 101.33271, "b": 205.17760999999996, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "2017 14th IAPR international conference on doc-", "bbox": {"l": 104.33101, "t": 197.24230999999997, "r": 286.35791, "b": 204.97136999999998, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "ument analysis and recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 208.20032000000003, "r": 220.48719999999997, "b": 215.92938000000004, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": ", volume 1, pages", "bbox": {"l": 220.48401000000004, "t": 208.11963000000003, "r": 286.36017, "b": 216.13562000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "1162-1167. IEEE, 2017. 
3", "bbox": {"l": 70.031006, "t": 219.07861000000003, "r": 166.65294, "b": 227.0946, "coord_origin": "TOPLEFT"}}]}, "text": "[27] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) , volume 1, pages 1162-1167. IEEE, 2017. 3"}, {"label": "list_item", "id": 3, "page_no": 9, "cluster": {"id": 3, "label": "list_item", "bbox": {"l": 49.50430679321289, "t": 229.77830505371094, "r": 286.36578, "b": 271.29553, "coord_origin": "TOPLEFT"}, "confidence": 0.85127854347229, "cells": [{"id": 21, "text": "[28]", "bbox": {"l": 50.112007, "t": 230.40259000000003, "r": 65.650383, "b": 238.41858000000002, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "Faisal Shafait and Ray Smith. Table detection in heteroge-", "bbox": {"l": 67.982063, "t": 230.40259000000003, "r": 286.3587, "b": 238.41858000000002, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "neous documents. In", "bbox": {"l": 70.031006, "t": 241.36157000000003, "r": 147.16895, "b": 249.37756000000002, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "Proceedings of the 9th IAPR Interna-", "bbox": {"l": 149.93301, "t": 241.44226000000003, "r": 286.36578, "b": 249.17133, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "tional Workshop on Document Analysis Systems", "bbox": {"l": 70.031013, "t": 252.40125, "r": 244.6875, "b": 260.13031, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": ", pages 65-", "bbox": {"l": 244.69101, "t": 252.32056, "r": 286.35791, "b": 260.33655, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "72, 2010. 2", "bbox": {"l": 70.031006, "t": 263.27954, "r": 111.36611, "b": 271.29553, "coord_origin": "TOPLEFT"}}]}, "text": "[28] Faisal Shafait and Ray Smith. Table detection in heterogeneous documents. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems , pages 6572, 2010. 
2"}, {"label": "list_item", "id": 4, "page_no": 9, "cluster": {"id": 4, "label": "list_item", "bbox": {"l": 49.674346923828125, "t": 273.582763671875, "r": 286.8292541503906, "b": 326.8854675292969, "coord_origin": "TOPLEFT"}, "confidence": 0.849456250667572, "cells": [{"id": 28, "text": "[29]", "bbox": {"l": 50.112007, "t": 274.60357999999997, "r": 66.023834, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Shoaib", "bbox": {"l": 68.411568, "t": 274.60357999999997, "r": 94.944016, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "Ahmed", "bbox": {"l": 100.8708, "t": 274.60357999999997, "r": 127.26788000000002, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "Siddiqui,", "bbox": {"l": 133.19467, "t": 274.60357999999997, "r": 165.83237, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Imran", "bbox": {"l": 172.68269, "t": 274.60357999999997, "r": 194.09445, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Ali", "bbox": {"l": 200.02124, "t": 274.60357999999997, "r": 211.4803, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "Fateh,", "bbox": {"l": 217.40708999999998, "t": 274.60357999999997, "r": 239.43755, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "Syed", "bbox": {"l": 246.28787000000003, "t": 274.60357999999997, "r": 264.22067, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Tah-", "bbox": {"l": 270.14746, "t": 274.60357999999997, "r": 286.35873, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "seen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed.", "bbox": {"l": 70.031006, "t": 285.56256, "r": 286.36331, "b": 293.57852, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "Deeptabstr: Deep learning based table structure recognition.", "bbox": {"l": 70.031006, "t": 296.52155, "r": 286.36331, "b": 304.53751, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "In", "bbox": {"l": 70.031006, "t": 307.48053, "r": 77.500015, "b": 
315.49649, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 79.350006, "t": 307.56122, "r": 286.36627, "b": 315.29028, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 318.51923, "r": 147.57243, "b": 326.24829, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": ", pages 1403-1409. IEEE, 2019. 3", "bbox": {"l": 147.57201, "t": 318.43854, "r": 271.33521, "b": 326.4545, "coord_origin": "TOPLEFT"}}]}, "text": "[29] Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tahseen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed. Deeptabstr: Deep learning based table structure recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1403-1409. IEEE, 2019. 3"}, {"label": "list_item", "id": 5, "page_no": 9, "cluster": {"id": 5, "label": "list_item", "bbox": {"l": 49.465091705322266, "t": 329.0513610839844, "r": 286.52130126953125, "b": 381.7455139160156, "coord_origin": "TOPLEFT"}, "confidence": 0.8567717671394348, "cells": [{"id": 43, "text": "[30]", "bbox": {"l": 50.112007, "t": 329.76254, "r": 65.366135, "b": 337.7785, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas", "bbox": {"l": 67.655159, "t": 329.76254, "r": 286.3587, "b": 337.7785, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Bekas. 
Corpus conversion service: A machine learning plat-", "bbox": {"l": 70.031006, "t": 340.72156000000007, "r": 286.36334, "b": 348.7375200000001, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "form to ingest documents at scale.", "bbox": {"l": 70.031006, "t": 351.68054, "r": 198.82439, "b": 359.6965, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "In", "bbox": {"l": 206.06027, "t": 351.68054, "r": 213.52928, "b": 359.6965, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Proceedings of the", "bbox": {"l": 217.02101, "t": 351.76123, "r": 286.35815, "b": 359.4903, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "24th ACM SIGKDD", "bbox": {"l": 70.031006, "t": 362.72021, "r": 143.08028, "b": 370.44928, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": ", KDD \u201918, pages 774-782, New York,", "bbox": {"l": 143.078, "t": 362.63953000000004, "r": 286.36111, "b": 370.65549000000004, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "NY, USA, 2018. ACM. 1", "bbox": {"l": 70.031006, "t": 373.59851, "r": 161.15652, "b": 381.61447, "coord_origin": "TOPLEFT"}}]}, "text": "[30] Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas Bekas. Corpus conversion service: A machine learning platform to ingest documents at scale. In Proceedings of the 24th ACM SIGKDD , KDD \u201918, pages 774-782, New York, NY, USA, 2018. ACM. 
1"}, {"label": "list_item", "id": 6, "page_no": 9, "cluster": {"id": 6, "label": "list_item", "bbox": {"l": 49.55904006958008, "t": 384.240234375, "r": 286.4693603515625, "b": 458.69144, "coord_origin": "TOPLEFT"}, "confidence": 0.88321852684021, "cells": [{"id": 52, "text": "[31]", "bbox": {"l": 50.112007, "t": 384.92252, "r": 65.140724, "b": 392.93848, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko-", "bbox": {"l": 67.395927, "t": 384.92252, "r": 286.35876, "b": 392.93848, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "reit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Il-", "bbox": {"l": 70.031006, "t": 395.88153, "r": 286.36337, "b": 403.89749, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "lia Polosukhin.", "bbox": {"l": 70.031006, "t": 406.84052, "r": 125.47024999999998, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "Attention is all you need.", "bbox": {"l": 133.90764, "t": 406.84052, "r": 230.83444, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "In I. Guyon,", "bbox": {"l": 239.27182, "t": 406.84052, "r": 286.36334, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "U.", "bbox": {"l": 70.031006, "t": 417.7995, "r": 78.958366, "b": 425.81546, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vish-", "bbox": {"l": 81.254494, "t": 417.7995, "r": 286.36334, "b": 425.81546, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "wanathan, and R. 
Garnett, editors,", "bbox": {"l": 70.031006, "t": 428.75751, "r": 196.7621, "b": 436.7734699999999, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "Advances in Neural In-", "bbox": {"l": 200.20201, "t": 428.8381999999999, "r": 286.36017, "b": 436.56726, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "formation Processing Systems 30", "bbox": {"l": 70.031006, "t": 439.79717999999997, "r": 189.19447, "b": 447.52624999999995, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": ", pages 5998-6008. Curran", "bbox": {"l": 189.19501, "t": 439.71648999999996, "r": 286.36389, "b": 447.73245, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "Associates, Inc., 2017. 5", "bbox": {"l": 70.031006, "t": 450.67548, "r": 158.9239, "b": 458.69144, "coord_origin": "TOPLEFT"}}]}, "text": "[31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30 , pages 5998-6008. Curran Associates, Inc., 2017. 
5"}, {"label": "list_item", "id": 7, "page_no": 9, "cluster": {"id": 7, "label": "list_item", "bbox": {"l": 49.52634048461914, "t": 461.1815490722656, "r": 286.46636962890625, "b": 502.89243, "coord_origin": "TOPLEFT"}, "confidence": 0.8431898355484009, "cells": [{"id": 65, "text": "[32]", "bbox": {"l": 50.112007, "t": 461.99948, "r": 65.910469, "b": 470.01544, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Oriol Vinyals, Alexander Toshev, Samy Bengio, and Du-", "bbox": {"l": 68.281181, "t": 461.99948, "r": 286.35873, "b": 470.01544, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "mitru Erhan.", "bbox": {"l": 70.031006, "t": 472.9585, "r": 116.27969999999999, "b": 480.97446, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "Show and tell: A neural image caption gen-", "bbox": {"l": 122.48445, "t": 472.9585, "r": 286.36334, "b": 480.97446, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "erator. In", "bbox": {"l": 70.031006, "t": 483.91748, "r": 103.30532, "b": 491.93344, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "Proceedings of the IEEE Conference on Computer", "bbox": {"l": 105.51601, "t": 483.99817, "r": 286.35931, "b": 491.72723, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Vision and Pattern Recognition (CVPR)", "bbox": {"l": 70.031006, "t": 494.95715, "r": 212.51607, "b": 502.68622, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": ", June 2015. 2", "bbox": {"l": 212.51401, "t": 494.87646, "r": 263.55975, "b": 502.89243, "coord_origin": "TOPLEFT"}}]}, "text": "[32] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2015. 
2"}, {"label": "list_item", "id": 8, "page_no": 9, "cluster": {"id": 8, "label": "list_item", "bbox": {"l": 49.58083724975586, "t": 505.2271728515625, "r": 286.77581787109375, "b": 547.5089721679688, "coord_origin": "TOPLEFT"}, "confidence": 0.8405721187591553, "cells": [{"id": 73, "text": "[33]", "bbox": {"l": 50.112015, "t": 506.20047, "r": 65.682777, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "Wenyuan Xue, Qingyong Li, and Dacheng Tao.", "bbox": {"l": 68.019325, "t": 506.20047, "r": 247.37280000000004, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "Res2tim:", "bbox": {"l": 253.97208000000003, "t": 506.20047, "r": 286.3587, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "reconstruct syntactic structures from table images. In", "bbox": {"l": 70.031013, "t": 517.15948, "r": 265.62408, "b": 525.17545, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "2019", "bbox": {"l": 268.42902, "t": 517.24017, "r": 286.36182, "b": 524.96924, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "International Conference on Document Analysis and Recog-", "bbox": {"l": 70.031021, "t": 528.19916, "r": 286.36337, "b": 535.92822, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "nition (ICDAR)", "bbox": {"l": 70.031021, "t": 539.15718, "r": 125.25507999999999, "b": 546.88622, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": ", pages 749-755. IEEE, 2019. 3", "bbox": {"l": 125.25402, "t": 539.07648, "r": 240.05083, "b": 547.09244, "coord_origin": "TOPLEFT"}}]}, "text": "[33] Wenyuan Xue, Qingyong Li, and Dacheng Tao. Res2tim: reconstruct syntactic structures from table images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 749-755. IEEE, 2019. 
3"}, {"label": "list_item", "id": 9, "page_no": 9, "cluster": {"id": 9, "label": "list_item", "bbox": {"l": 49.6944580078125, "t": 549.8140869140625, "r": 286.58221435546875, "b": 591.29344, "coord_origin": "TOPLEFT"}, "confidence": 0.8241643309593201, "cells": [{"id": 81, "text": "[34]", "bbox": {"l": 50.112022, "t": 550.40048, "r": 66.037048, "b": 558.41644, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao,", "bbox": {"l": 68.426765, "t": 550.40048, "r": 286.3587, "b": 558.41644, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "and Qingyong Li.", "bbox": {"l": 70.031021, "t": 561.35948, "r": 137.08176, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Tgrnet:", "bbox": {"l": 145.9854, "t": 561.35948, "r": 172.38248, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "A table graph reconstruction", "bbox": {"l": 178.7038, "t": 561.35948, "r": 286.36337, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "network for table structure recognition.", "bbox": {"l": 70.031021, "t": 572.31848, "r": 221.00723, "b": 580.33444, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "arXiv preprint", "bbox": {"l": 232.54300999999998, "t": 572.39919, "r": 286.35938, "b": 580.12822, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "arXiv:2106.10598", "bbox": {"l": 70.031021, "t": 583.35818, "r": 135.53058, "b": 591.08722, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": ", 2021. 3", "bbox": {"l": 135.53003, "t": 583.27748, "r": 167.89876, "b": 591.29344, "coord_origin": "TOPLEFT"}}]}, "text": "[34] Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, and Qingyong Li. Tgrnet: A table graph reconstruction network for table structure recognition. arXiv preprint arXiv:2106.10598 , 2021. 
3"}, {"label": "list_item", "id": 10, "page_no": 9, "cluster": {"id": 10, "label": "list_item", "bbox": {"l": 49.64909744262695, "t": 593.8485107421875, "r": 286.92523193359375, "b": 635.7434692382812, "coord_origin": "TOPLEFT"}, "confidence": 0.836098849773407, "cells": [{"id": 90, "text": "[35]", "bbox": {"l": 50.11203, "t": 594.60149, "r": 65.23661, "b": 602.61745, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and", "bbox": {"l": 67.506203, "t": 594.60149, "r": 286.3587, "b": 602.61745, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Jiebo Luo.", "bbox": {"l": 70.031029, "t": 605.56049, "r": 109.1066, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "Image captioning with semantic attention.", "bbox": {"l": 116.22592, "t": 605.56049, "r": 271.76605, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "In", "bbox": {"l": 278.89435, "t": 605.56049, "r": 286.36337, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Proceedings of the IEEE conference on computer vision and", "bbox": {"l": 70.031029, "t": 616.60019, "r": 286.3634, "b": 624.32922, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "pattern recognition", "bbox": {"l": 70.031029, "t": 627.55919, "r": 139.09921, "b": 635.28822, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": ", pages 4651-4659, 2016. 4", "bbox": {"l": 139.09802, "t": 627.47849, "r": 238.95683, "b": 635.49445, "coord_origin": "TOPLEFT"}}]}, "text": "[35] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. Image captioning with semantic attention. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4651-4659, 2016. 
4"}, {"label": "list_item", "id": 11, "page_no": 9, "cluster": {"id": 11, "label": "list_item", "bbox": {"l": 49.76350021362305, "t": 637.6057739257812, "r": 286.3849182128906, "b": 691.11279296875, "coord_origin": "TOPLEFT"}, "confidence": 0.8080187439918518, "cells": [{"id": 98, "text": "[36]", "bbox": {"l": 50.112022, "t": 638.80249, "r": 65.203552, "b": 646.81845, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and", "bbox": {"l": 67.468193, "t": 638.80249, "r": 286.35873, "b": 646.81845, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Nancy Xin Ru Wang. Global table extractor (gte): A frame-", "bbox": {"l": 70.031021, "t": 649.7605, "r": 286.36337, "b": 657.77646, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "work for joint table identification and cell structure recogni-", "bbox": {"l": 70.031021, "t": 660.7195, "r": 286.36334, "b": 668.73547, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "tion using visual context.", "bbox": {"l": 70.031021, "t": 671.6785, "r": 158.45766, "b": 679.69447, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "Winter Conference for Applications", "bbox": {"l": 160.52802, "t": 671.7592, "r": 286.36249, "b": 679.48824, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "in Computer Vision (WACV)", "bbox": {"l": 70.031013, "t": 682.7182, "r": 171.42305, "b": 690.44724, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": ", 2021. 2, 3", "bbox": {"l": 171.42201, "t": 682.6375, "r": 212.75713, "b": 690.65347, "coord_origin": "TOPLEFT"}}]}, "text": "[36] Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and Nancy Xin Ru Wang. Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. Winter Conference for Applications in Computer Vision (WACV) , 2021. 
2, 3"}, {"label": "list_item", "id": 12, "page_no": 9, "cluster": {"id": 12, "label": "list_item", "bbox": {"l": 49.56057357788086, "t": 693.036865234375, "r": 286.3756408691406, "b": 713.1453247070312, "coord_origin": "TOPLEFT"}, "confidence": 0.8385704755783081, "cells": [{"id": 106, "text": "[37]", "bbox": {"l": 50.112015, "t": 693.961502, "r": 66.506706, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Xu", "bbox": {"l": 68.966896, "t": 693.961502, "r": 80.992294, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Zhong,", "bbox": {"l": 89.062057, "t": 693.961502, "r": 114.71492999999998, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Elaheh", "bbox": {"l": 124.24621000000002, "t": 693.961502, "r": 149.1459, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "ShafieiBavani,", "bbox": {"l": 157.22462, "t": 693.961502, "r": 209.37321, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "and", "bbox": {"l": 218.9045, "t": 693.961502, "r": 231.85196999999997, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "Antonio", "bbox": {"l": 239.93069, "t": 693.961502, "r": 269.32254, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "Ji-", "bbox": {"l": 277.3923, "t": 693.961502, "r": 286.3587, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "meno Yepes. Image-based table recognition: Data, model,", "bbox": {"l": 70.031013, "t": 704.920502, "r": 286.36334, "b": 712.936462, "coord_origin": "TOPLEFT"}}]}, "text": "[37] Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. Image-based table recognition: Data, model,"}, {"label": "list_item", "id": 13, "page_no": 9, "cluster": {"id": 13, "label": "list_item", "bbox": {"l": 327.91717529296875, "t": 75.0067138671875, "r": 545.3355102539062, "b": 116.9290771484375, "coord_origin": "TOPLEFT"}, "confidence": 0.7556977868080139, "cells": [{"id": 115, "text": "and evaluation. 
In Andrea Vedaldi, Horst Bischof, Thomas", "bbox": {"l": 328.78101, "t": 75.88347999999996, "r": 545.11346, "b": 83.89948000000015, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "Brox, and Jan-Michael Frahm, editors,", "bbox": {"l": 328.78101, "t": 86.84149000000002, "r": 472.30618, "b": 94.85748000000001, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Computer Vision -", "bbox": {"l": 475.88501, "t": 86.92218000000003, "r": 545.11456, "b": 94.65125, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "ECCV 2020", "bbox": {"l": 328.78101, "t": 97.88116000000002, "r": 371.92734, "b": 105.61023, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": ", pages 564-580, Cham, 2020. Springer Interna-", "bbox": {"l": 371.92599, "t": 97.80048, "r": 545.11206, "b": 105.81646999999987, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "tional Publishing. 2, 3, 7", "bbox": {"l": 328.78101, "t": 108.75945999999999, "r": 417.70087, "b": 116.77544999999998, "coord_origin": "TOPLEFT"}}]}, "text": "and evaluation. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision ECCV 2020 , pages 564-580, Cham, 2020. Springer International Publishing. 2, 3, 7"}, {"label": "list_item", "id": 14, "page_no": 9, "cluster": {"id": 14, "label": "list_item", "bbox": {"l": 308.3899230957031, "t": 119.63013458251953, "r": 545.5173950195312, "b": 162.1349334716797, "coord_origin": "TOPLEFT"}, "confidence": 0.8375809788703918, "cells": [{"id": 121, "text": "[38]", "bbox": {"l": 308.862, "t": 120.71447999999998, "r": 324.33197, "b": 128.73046999999997, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Pub-", "bbox": {"l": 326.65341, "t": 120.71447999999998, "r": 545.10876, "b": 128.73046999999997, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "laynet: Largest dataset ever for document layout analysis. 
In", "bbox": {"l": 328.78101, "t": 131.67345999999998, "r": 545.11334, "b": 139.68944999999997, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 328.78101, "t": 142.71312999999998, "r": 545.11328, "b": 150.44219999999996, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "Recognition (ICDAR)", "bbox": {"l": 328.78101, "t": 153.67211999999995, "r": 406.32245, "b": 161.40117999999995, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": ", pages 1015-1022, 2019. 1", "bbox": {"l": 406.32202, "t": 153.59142999999995, "r": 506.18085, "b": 161.60742000000005, "coord_origin": "TOPLEFT"}}]}, "text": "[38] Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Publaynet: Largest dataset ever for document layout analysis. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1015-1022, 2019. 1"}, {"label": "page_footer", "id": 15, "page_no": 9, "cluster": {"id": 15, "label": "page_footer", "bbox": {"l": 292.63, "t": 733.3956909179688, "r": 302.69708251953125, "b": 743.039555, "coord_origin": "TOPLEFT"}, "confidence": 0.9069584012031555, "cells": [{"id": 127, "text": "10", "bbox": {"l": 292.63, "t": 734.1329920000001, "r": 302.59259, "b": 743.039555, "coord_origin": "TOPLEFT"}}]}, "text": "10"}], "body": [{"label": "text", "id": 0, "page_no": 9, "cluster": {"id": 0, "label": "text", "bbox": {"l": 69.42779541015625, "t": 75.20453643798828, "r": 286.36176, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}, "confidence": 0.6886652708053589, "cells": [{"id": 0, "text": "Computer Vision and Pattern Recognition", "bbox": {"l": 70.030998, "t": 75.96447999999998, "r": 223.58061, "b": 83.69353999999998, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": ", pages 658-666,", "bbox": {"l": 223.57901, "t": 75.88378999999998, "r": 286.36176, "b": 83.89977999999996, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "2019. 
6", "bbox": {"l": 70.031006, "t": 86.84276999999997, "r": 97.916512, "b": 94.85875999999996, "coord_origin": "TOPLEFT"}}]}, "text": "Computer Vision and Pattern Recognition , pages 658-666, 2019. 6"}, {"label": "list_item", "id": 1, "page_no": 9, "cluster": {"id": 1, "label": "list_item", "bbox": {"l": 49.9049186706543, "t": 96.92471313476562, "r": 286.5950927734375, "b": 160.97668, "coord_origin": "TOPLEFT"}, "confidence": 0.7722932696342468, "cells": [{"id": 3, "text": "[26]", "bbox": {"l": 50.112007, "t": 98.16576999999995, "r": 65.534088, "b": 106.18176000000005, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Den-", "bbox": {"l": 67.84832, "t": 98.16576999999995, "r": 286.35867, "b": 106.18176000000005, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "gel, and Sheraz Ahmed. Deepdesrt: Deep learning for detec-", "bbox": {"l": 70.031006, "t": 109.12476000000004, "r": 286.36331, "b": 117.14075000000003, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "tion and structure recognition of tables in document images.", "bbox": {"l": 70.031006, "t": 120.08374000000003, "r": 286.36334, "b": 128.0997299999999, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "In", "bbox": {"l": 70.031006, "t": 131.04272000000003, "r": 77.500015, "b": 139.05872, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "2017 14th IAPR International Conference on Document", "bbox": {"l": 80.560005, "t": 131.12341000000004, "r": 286.36578, "b": 138.85248, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Analysis and Recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 142.0824, "r": 195.22885, "b": 149.81146, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": ", volume 01, pages 1162-", "bbox": {"l": 195.231, "t": 142.00171, "r": 286.36548, "b": 150.0177, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "1167, 2017. 
1", "bbox": {"l": 70.031006, "t": 152.96069, "r": 120.33251, "b": 160.97668, "coord_origin": "TOPLEFT"}}]}, "text": "[26] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) , volume 01, pages 11621167, 2017. 1"}, {"label": "list_item", "id": 2, "page_no": 9, "cluster": {"id": 2, "label": "list_item", "bbox": {"l": 49.60877990722656, "t": 163.67764282226562, "r": 286.36337, "b": 227.0946, "coord_origin": "TOPLEFT"}, "confidence": 0.858070969581604, "cells": [{"id": 12, "text": "[27]", "bbox": {"l": 50.112007, "t": 164.28467, "r": 65.534088, "b": 172.30066, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Den-", "bbox": {"l": 67.84832, "t": 164.28467, "r": 286.35867, "b": 172.30066, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "gel, and Sheraz Ahmed. Deepdesrt: Deep learning for de-", "bbox": {"l": 70.031006, "t": 175.24365, "r": 286.36337, "b": 183.25964, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "tection and structure recognition of tables in document im-", "bbox": {"l": 70.031006, "t": 186.20263999999997, "r": 286.36334, "b": 194.21862999999996, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "ages. 
In", "bbox": {"l": 70.031006, "t": 197.16161999999997, "r": 101.33271, "b": 205.17760999999996, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "2017 14th IAPR international conference on doc-", "bbox": {"l": 104.33101, "t": 197.24230999999997, "r": 286.35791, "b": 204.97136999999998, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "ument analysis and recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 208.20032000000003, "r": 220.48719999999997, "b": 215.92938000000004, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": ", volume 1, pages", "bbox": {"l": 220.48401000000004, "t": 208.11963000000003, "r": 286.36017, "b": 216.13562000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "1162-1167. IEEE, 2017. 3", "bbox": {"l": 70.031006, "t": 219.07861000000003, "r": 166.65294, "b": 227.0946, "coord_origin": "TOPLEFT"}}]}, "text": "[27] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) , volume 1, pages 1162-1167. IEEE, 2017. 3"}, {"label": "list_item", "id": 3, "page_no": 9, "cluster": {"id": 3, "label": "list_item", "bbox": {"l": 49.50430679321289, "t": 229.77830505371094, "r": 286.36578, "b": 271.29553, "coord_origin": "TOPLEFT"}, "confidence": 0.85127854347229, "cells": [{"id": 21, "text": "[28]", "bbox": {"l": 50.112007, "t": 230.40259000000003, "r": 65.650383, "b": 238.41858000000002, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "Faisal Shafait and Ray Smith. Table detection in heteroge-", "bbox": {"l": 67.982063, "t": 230.40259000000003, "r": 286.3587, "b": 238.41858000000002, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "neous documents. 
In", "bbox": {"l": 70.031006, "t": 241.36157000000003, "r": 147.16895, "b": 249.37756000000002, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "Proceedings of the 9th IAPR Interna-", "bbox": {"l": 149.93301, "t": 241.44226000000003, "r": 286.36578, "b": 249.17133, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "tional Workshop on Document Analysis Systems", "bbox": {"l": 70.031013, "t": 252.40125, "r": 244.6875, "b": 260.13031, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": ", pages 65-", "bbox": {"l": 244.69101, "t": 252.32056, "r": 286.35791, "b": 260.33655, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "72, 2010. 2", "bbox": {"l": 70.031006, "t": 263.27954, "r": 111.36611, "b": 271.29553, "coord_origin": "TOPLEFT"}}]}, "text": "[28] Faisal Shafait and Ray Smith. Table detection in heterogeneous documents. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems , pages 6572, 2010. 2"}, {"label": "list_item", "id": 4, "page_no": 9, "cluster": {"id": 4, "label": "list_item", "bbox": {"l": 49.674346923828125, "t": 273.582763671875, "r": 286.8292541503906, "b": 326.8854675292969, "coord_origin": "TOPLEFT"}, "confidence": 0.849456250667572, "cells": [{"id": 28, "text": "[29]", "bbox": {"l": 50.112007, "t": 274.60357999999997, "r": 66.023834, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Shoaib", "bbox": {"l": 68.411568, "t": 274.60357999999997, "r": 94.944016, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "Ahmed", "bbox": {"l": 100.8708, "t": 274.60357999999997, "r": 127.26788000000002, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "Siddiqui,", "bbox": {"l": 133.19467, "t": 274.60357999999997, "r": 165.83237, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Imran", "bbox": {"l": 172.68269, "t": 274.60357999999997, "r": 194.09445, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "Ali", "bbox": {"l": 200.02124, "t": 274.60357999999997, "r": 211.4803, 
"b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "Fateh,", "bbox": {"l": 217.40708999999998, "t": 274.60357999999997, "r": 239.43755, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "Syed", "bbox": {"l": 246.28787000000003, "t": 274.60357999999997, "r": 264.22067, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Tah-", "bbox": {"l": 270.14746, "t": 274.60357999999997, "r": 286.35873, "b": 282.61951, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "seen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed.", "bbox": {"l": 70.031006, "t": 285.56256, "r": 286.36331, "b": 293.57852, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "Deeptabstr: Deep learning based table structure recognition.", "bbox": {"l": 70.031006, "t": 296.52155, "r": 286.36331, "b": 304.53751, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "In", "bbox": {"l": 70.031006, "t": 307.48053, "r": 77.500015, "b": 315.49649, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 79.350006, "t": 307.56122, "r": 286.36627, "b": 315.29028, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Recognition (ICDAR)", "bbox": {"l": 70.031006, "t": 318.51923, "r": 147.57243, "b": 326.24829, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": ", pages 1403-1409. IEEE, 2019. 3", "bbox": {"l": 147.57201, "t": 318.43854, "r": 271.33521, "b": 326.4545, "coord_origin": "TOPLEFT"}}]}, "text": "[29] Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tahseen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed. Deeptabstr: Deep learning based table structure recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1403-1409. IEEE, 2019. 
3"}, {"label": "list_item", "id": 5, "page_no": 9, "cluster": {"id": 5, "label": "list_item", "bbox": {"l": 49.465091705322266, "t": 329.0513610839844, "r": 286.52130126953125, "b": 381.7455139160156, "coord_origin": "TOPLEFT"}, "confidence": 0.8567717671394348, "cells": [{"id": 43, "text": "[30]", "bbox": {"l": 50.112007, "t": 329.76254, "r": 65.366135, "b": 337.7785, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas", "bbox": {"l": 67.655159, "t": 329.76254, "r": 286.3587, "b": 337.7785, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Bekas. Corpus conversion service: A machine learning plat-", "bbox": {"l": 70.031006, "t": 340.72156000000007, "r": 286.36334, "b": 348.7375200000001, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "form to ingest documents at scale.", "bbox": {"l": 70.031006, "t": 351.68054, "r": 198.82439, "b": 359.6965, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "In", "bbox": {"l": 206.06027, "t": 351.68054, "r": 213.52928, "b": 359.6965, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Proceedings of the", "bbox": {"l": 217.02101, "t": 351.76123, "r": 286.35815, "b": 359.4903, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "24th ACM SIGKDD", "bbox": {"l": 70.031006, "t": 362.72021, "r": 143.08028, "b": 370.44928, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": ", KDD \u201918, pages 774-782, New York,", "bbox": {"l": 143.078, "t": 362.63953000000004, "r": 286.36111, "b": 370.65549000000004, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "NY, USA, 2018. ACM. 1", "bbox": {"l": 70.031006, "t": 373.59851, "r": 161.15652, "b": 381.61447, "coord_origin": "TOPLEFT"}}]}, "text": "[30] Peter W J Staar, Michele Dolfi, Christoph Auer, and Costas Bekas. Corpus conversion service: A machine learning platform to ingest documents at scale. In Proceedings of the 24th ACM SIGKDD , KDD \u201918, pages 774-782, New York, NY, USA, 2018. ACM. 
1"}, {"label": "list_item", "id": 6, "page_no": 9, "cluster": {"id": 6, "label": "list_item", "bbox": {"l": 49.55904006958008, "t": 384.240234375, "r": 286.4693603515625, "b": 458.69144, "coord_origin": "TOPLEFT"}, "confidence": 0.88321852684021, "cells": [{"id": 52, "text": "[31]", "bbox": {"l": 50.112007, "t": 384.92252, "r": 65.140724, "b": 392.93848, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko-", "bbox": {"l": 67.395927, "t": 384.92252, "r": 286.35876, "b": 392.93848, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "reit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Il-", "bbox": {"l": 70.031006, "t": 395.88153, "r": 286.36337, "b": 403.89749, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "lia Polosukhin.", "bbox": {"l": 70.031006, "t": 406.84052, "r": 125.47024999999998, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "Attention is all you need.", "bbox": {"l": 133.90764, "t": 406.84052, "r": 230.83444, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "In I. Guyon,", "bbox": {"l": 239.27182, "t": 406.84052, "r": 286.36334, "b": 414.85648, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "U.", "bbox": {"l": 70.031006, "t": 417.7995, "r": 78.958366, "b": 425.81546, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vish-", "bbox": {"l": 81.254494, "t": 417.7995, "r": 286.36334, "b": 425.81546, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "wanathan, and R. 
Garnett, editors,", "bbox": {"l": 70.031006, "t": 428.75751, "r": 196.7621, "b": 436.7734699999999, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "Advances in Neural In-", "bbox": {"l": 200.20201, "t": 428.8381999999999, "r": 286.36017, "b": 436.56726, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "formation Processing Systems 30", "bbox": {"l": 70.031006, "t": 439.79717999999997, "r": 189.19447, "b": 447.52624999999995, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": ", pages 5998-6008. Curran", "bbox": {"l": 189.19501, "t": 439.71648999999996, "r": 286.36389, "b": 447.73245, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "Associates, Inc., 2017. 5", "bbox": {"l": 70.031006, "t": 450.67548, "r": 158.9239, "b": 458.69144, "coord_origin": "TOPLEFT"}}]}, "text": "[31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30 , pages 5998-6008. Curran Associates, Inc., 2017. 
5"}, {"label": "list_item", "id": 7, "page_no": 9, "cluster": {"id": 7, "label": "list_item", "bbox": {"l": 49.52634048461914, "t": 461.1815490722656, "r": 286.46636962890625, "b": 502.89243, "coord_origin": "TOPLEFT"}, "confidence": 0.8431898355484009, "cells": [{"id": 65, "text": "[32]", "bbox": {"l": 50.112007, "t": 461.99948, "r": 65.910469, "b": 470.01544, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "Oriol Vinyals, Alexander Toshev, Samy Bengio, and Du-", "bbox": {"l": 68.281181, "t": 461.99948, "r": 286.35873, "b": 470.01544, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "mitru Erhan.", "bbox": {"l": 70.031006, "t": 472.9585, "r": 116.27969999999999, "b": 480.97446, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "Show and tell: A neural image caption gen-", "bbox": {"l": 122.48445, "t": 472.9585, "r": 286.36334, "b": 480.97446, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "erator. In", "bbox": {"l": 70.031006, "t": 483.91748, "r": 103.30532, "b": 491.93344, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "Proceedings of the IEEE Conference on Computer", "bbox": {"l": 105.51601, "t": 483.99817, "r": 286.35931, "b": 491.72723, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "Vision and Pattern Recognition (CVPR)", "bbox": {"l": 70.031006, "t": 494.95715, "r": 212.51607, "b": 502.68622, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": ", June 2015. 2", "bbox": {"l": 212.51401, "t": 494.87646, "r": 263.55975, "b": 502.89243, "coord_origin": "TOPLEFT"}}]}, "text": "[32] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2015. 
2"}, {"label": "list_item", "id": 8, "page_no": 9, "cluster": {"id": 8, "label": "list_item", "bbox": {"l": 49.58083724975586, "t": 505.2271728515625, "r": 286.77581787109375, "b": 547.5089721679688, "coord_origin": "TOPLEFT"}, "confidence": 0.8405721187591553, "cells": [{"id": 73, "text": "[33]", "bbox": {"l": 50.112015, "t": 506.20047, "r": 65.682777, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "Wenyuan Xue, Qingyong Li, and Dacheng Tao.", "bbox": {"l": 68.019325, "t": 506.20047, "r": 247.37280000000004, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "Res2tim:", "bbox": {"l": 253.97208000000003, "t": 506.20047, "r": 286.3587, "b": 514.21643, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "reconstruct syntactic structures from table images. In", "bbox": {"l": 70.031013, "t": 517.15948, "r": 265.62408, "b": 525.17545, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "2019", "bbox": {"l": 268.42902, "t": 517.24017, "r": 286.36182, "b": 524.96924, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "International Conference on Document Analysis and Recog-", "bbox": {"l": 70.031021, "t": 528.19916, "r": 286.36337, "b": 535.92822, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "nition (ICDAR)", "bbox": {"l": 70.031021, "t": 539.15718, "r": 125.25507999999999, "b": 546.88622, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": ", pages 749-755. IEEE, 2019. 3", "bbox": {"l": 125.25402, "t": 539.07648, "r": 240.05083, "b": 547.09244, "coord_origin": "TOPLEFT"}}]}, "text": "[33] Wenyuan Xue, Qingyong Li, and Dacheng Tao. Res2tim: reconstruct syntactic structures from table images. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 749-755. IEEE, 2019. 
3"}, {"label": "list_item", "id": 9, "page_no": 9, "cluster": {"id": 9, "label": "list_item", "bbox": {"l": 49.6944580078125, "t": 549.8140869140625, "r": 286.58221435546875, "b": 591.29344, "coord_origin": "TOPLEFT"}, "confidence": 0.8241643309593201, "cells": [{"id": 81, "text": "[34]", "bbox": {"l": 50.112022, "t": 550.40048, "r": 66.037048, "b": 558.41644, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao,", "bbox": {"l": 68.426765, "t": 550.40048, "r": 286.3587, "b": 558.41644, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "and Qingyong Li.", "bbox": {"l": 70.031021, "t": 561.35948, "r": 137.08176, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Tgrnet:", "bbox": {"l": 145.9854, "t": 561.35948, "r": 172.38248, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "A table graph reconstruction", "bbox": {"l": 178.7038, "t": 561.35948, "r": 286.36337, "b": 569.37544, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "network for table structure recognition.", "bbox": {"l": 70.031021, "t": 572.31848, "r": 221.00723, "b": 580.33444, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "arXiv preprint", "bbox": {"l": 232.54300999999998, "t": 572.39919, "r": 286.35938, "b": 580.12822, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "arXiv:2106.10598", "bbox": {"l": 70.031021, "t": 583.35818, "r": 135.53058, "b": 591.08722, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": ", 2021. 3", "bbox": {"l": 135.53003, "t": 583.27748, "r": 167.89876, "b": 591.29344, "coord_origin": "TOPLEFT"}}]}, "text": "[34] Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, and Qingyong Li. Tgrnet: A table graph reconstruction network for table structure recognition. arXiv preprint arXiv:2106.10598 , 2021. 
3"}, {"label": "list_item", "id": 10, "page_no": 9, "cluster": {"id": 10, "label": "list_item", "bbox": {"l": 49.64909744262695, "t": 593.8485107421875, "r": 286.92523193359375, "b": 635.7434692382812, "coord_origin": "TOPLEFT"}, "confidence": 0.836098849773407, "cells": [{"id": 90, "text": "[35]", "bbox": {"l": 50.11203, "t": 594.60149, "r": 65.23661, "b": 602.61745, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and", "bbox": {"l": 67.506203, "t": 594.60149, "r": 286.3587, "b": 602.61745, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Jiebo Luo.", "bbox": {"l": 70.031029, "t": 605.56049, "r": 109.1066, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "Image captioning with semantic attention.", "bbox": {"l": 116.22592, "t": 605.56049, "r": 271.76605, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "In", "bbox": {"l": 278.89435, "t": 605.56049, "r": 286.36337, "b": 613.57645, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Proceedings of the IEEE conference on computer vision and", "bbox": {"l": 70.031029, "t": 616.60019, "r": 286.3634, "b": 624.32922, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "pattern recognition", "bbox": {"l": 70.031029, "t": 627.55919, "r": 139.09921, "b": 635.28822, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": ", pages 4651-4659, 2016. 4", "bbox": {"l": 139.09802, "t": 627.47849, "r": 238.95683, "b": 635.49445, "coord_origin": "TOPLEFT"}}]}, "text": "[35] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. Image captioning with semantic attention. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4651-4659, 2016. 
4"}, {"label": "list_item", "id": 11, "page_no": 9, "cluster": {"id": 11, "label": "list_item", "bbox": {"l": 49.76350021362305, "t": 637.6057739257812, "r": 286.3849182128906, "b": 691.11279296875, "coord_origin": "TOPLEFT"}, "confidence": 0.8080187439918518, "cells": [{"id": 98, "text": "[36]", "bbox": {"l": 50.112022, "t": 638.80249, "r": 65.203552, "b": 646.81845, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and", "bbox": {"l": 67.468193, "t": 638.80249, "r": 286.35873, "b": 646.81845, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Nancy Xin Ru Wang. Global table extractor (gte): A frame-", "bbox": {"l": 70.031021, "t": 649.7605, "r": 286.36337, "b": 657.77646, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "work for joint table identification and cell structure recogni-", "bbox": {"l": 70.031021, "t": 660.7195, "r": 286.36334, "b": 668.73547, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "tion using visual context.", "bbox": {"l": 70.031021, "t": 671.6785, "r": 158.45766, "b": 679.69447, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "Winter Conference for Applications", "bbox": {"l": 160.52802, "t": 671.7592, "r": 286.36249, "b": 679.48824, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "in Computer Vision (WACV)", "bbox": {"l": 70.031013, "t": 682.7182, "r": 171.42305, "b": 690.44724, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": ", 2021. 2, 3", "bbox": {"l": 171.42201, "t": 682.6375, "r": 212.75713, "b": 690.65347, "coord_origin": "TOPLEFT"}}]}, "text": "[36] Xinyi Zheng, Doug Burdick, Lucian Popa, Peter Zhong, and Nancy Xin Ru Wang. Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. Winter Conference for Applications in Computer Vision (WACV) , 2021. 
2, 3"}, {"label": "list_item", "id": 12, "page_no": 9, "cluster": {"id": 12, "label": "list_item", "bbox": {"l": 49.56057357788086, "t": 693.036865234375, "r": 286.3756408691406, "b": 713.1453247070312, "coord_origin": "TOPLEFT"}, "confidence": 0.8385704755783081, "cells": [{"id": 106, "text": "[37]", "bbox": {"l": 50.112015, "t": 693.961502, "r": 66.506706, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "Xu", "bbox": {"l": 68.966896, "t": 693.961502, "r": 80.992294, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "Zhong,", "bbox": {"l": 89.062057, "t": 693.961502, "r": 114.71492999999998, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "Elaheh", "bbox": {"l": 124.24621000000002, "t": 693.961502, "r": 149.1459, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "ShafieiBavani,", "bbox": {"l": 157.22462, "t": 693.961502, "r": 209.37321, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "and", "bbox": {"l": 218.9045, "t": 693.961502, "r": 231.85196999999997, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "Antonio", "bbox": {"l": 239.93069, "t": 693.961502, "r": 269.32254, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "Ji-", "bbox": {"l": 277.3923, "t": 693.961502, "r": 286.3587, "b": 701.977463, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "meno Yepes. Image-based table recognition: Data, model,", "bbox": {"l": 70.031013, "t": 704.920502, "r": 286.36334, "b": 712.936462, "coord_origin": "TOPLEFT"}}]}, "text": "[37] Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. Image-based table recognition: Data, model,"}, {"label": "list_item", "id": 13, "page_no": 9, "cluster": {"id": 13, "label": "list_item", "bbox": {"l": 327.91717529296875, "t": 75.0067138671875, "r": 545.3355102539062, "b": 116.9290771484375, "coord_origin": "TOPLEFT"}, "confidence": 0.7556977868080139, "cells": [{"id": 115, "text": "and evaluation. 
In Andrea Vedaldi, Horst Bischof, Thomas", "bbox": {"l": 328.78101, "t": 75.88347999999996, "r": 545.11346, "b": 83.89948000000015, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "Brox, and Jan-Michael Frahm, editors,", "bbox": {"l": 328.78101, "t": 86.84149000000002, "r": 472.30618, "b": 94.85748000000001, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "Computer Vision -", "bbox": {"l": 475.88501, "t": 86.92218000000003, "r": 545.11456, "b": 94.65125, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "ECCV 2020", "bbox": {"l": 328.78101, "t": 97.88116000000002, "r": 371.92734, "b": 105.61023, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": ", pages 564-580, Cham, 2020. Springer Interna-", "bbox": {"l": 371.92599, "t": 97.80048, "r": 545.11206, "b": 105.81646999999987, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "tional Publishing. 2, 3, 7", "bbox": {"l": 328.78101, "t": 108.75945999999999, "r": 417.70087, "b": 116.77544999999998, "coord_origin": "TOPLEFT"}}]}, "text": "and evaluation. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision ECCV 2020 , pages 564-580, Cham, 2020. Springer International Publishing. 2, 3, 7"}, {"label": "list_item", "id": 14, "page_no": 9, "cluster": {"id": 14, "label": "list_item", "bbox": {"l": 308.3899230957031, "t": 119.63013458251953, "r": 545.5173950195312, "b": 162.1349334716797, "coord_origin": "TOPLEFT"}, "confidence": 0.8375809788703918, "cells": [{"id": 121, "text": "[38]", "bbox": {"l": 308.862, "t": 120.71447999999998, "r": 324.33197, "b": 128.73046999999997, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Pub-", "bbox": {"l": 326.65341, "t": 120.71447999999998, "r": 545.10876, "b": 128.73046999999997, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "laynet: Largest dataset ever for document layout analysis. 
In", "bbox": {"l": 328.78101, "t": 131.67345999999998, "r": 545.11334, "b": 139.68944999999997, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "2019 International Conference on Document Analysis and", "bbox": {"l": 328.78101, "t": 142.71312999999998, "r": 545.11328, "b": 150.44219999999996, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "Recognition (ICDAR)", "bbox": {"l": 328.78101, "t": 153.67211999999995, "r": 406.32245, "b": 161.40117999999995, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": ", pages 1015-1022, 2019. 1", "bbox": {"l": 406.32202, "t": 153.59142999999995, "r": 506.18085, "b": 161.60742000000005, "coord_origin": "TOPLEFT"}}]}, "text": "[38] Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. Publaynet: Largest dataset ever for document layout analysis. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1015-1022, 2019. 1"}], "headers": [{"label": "page_footer", "id": 15, "page_no": 9, "cluster": {"id": 15, "label": "page_footer", "bbox": {"l": 292.63, "t": 733.3956909179688, "r": 302.69708251953125, "b": 743.039555, "coord_origin": "TOPLEFT"}, "confidence": 0.9069584012031555, "cells": [{"id": 127, "text": "10", "bbox": {"l": 292.63, "t": 734.1329920000001, "r": 302.59259, "b": 743.039555, "coord_origin": "TOPLEFT"}}]}, "text": "10"}]}}, {"page_no": 10, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "TableFormer: Table Structure Understanding with Transformers", "bbox": {"l": 132.842, "t": 110.57488999999998, "r": 465.37591999999995, "b": 121.32263, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Supplementary Material", "bbox": {"l": 220.18399, "t": 122.25982999999997, "r": 375.04269, "b": 135.53008999999997, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "1.", "bbox": {"l": 50.111984, "t": 161.16089, "r": 57.089828, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Details on the datasets", "bbox": {"l": 66.393616, "t": 161.16089, "r": 175.96437, "b": 
171.90863000000002, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "1.1.", "bbox": {"l": 50.111984, "t": 180.97931000000005, "r": 64.210808, "b": 190.83136000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Data preparation", "bbox": {"l": 73.610023, "t": 180.97931000000005, "r": 150.36401, "b": 190.83136000000002, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "As a first step of our data preparation process, we have", "bbox": {"l": 62.06698600000001, "t": 199.92029000000002, "r": 286.36496, "b": 208.82683999999995, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "calculated statistics over the datasets across the following", "bbox": {"l": 50.111984, "t": 211.87627999999995, "r": 286.36505, "b": 220.78283999999996, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "dimensions: (1) table size measured in the number of rows", "bbox": {"l": 50.111984, "t": 223.83130000000006, "r": 286.36514, "b": 232.73784999999998, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "and columns, (2) complexity of the table, (3) strictness of", "bbox": {"l": 50.111984, "t": 235.78632000000005, "r": 286.36508, "b": 244.69286999999997, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "the provided HTML structure and (4) completeness (i.e. no", "bbox": {"l": 50.111984, "t": 247.74132999999995, "r": 286.36505, "b": 256.64788999999996, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "omitted bounding boxes). A table is considered to be simple", "bbox": {"l": 50.111984, "t": 259.69635000000005, "r": 286.36505, "b": 268.60290999999995, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "if it does not contain row spans or column spans. 
Addition-", "bbox": {"l": 50.111984, "t": 271.65137000000004, "r": 286.36505, "b": 280.55792, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "ally, a table has a strict HTML structure if every row has the", "bbox": {"l": 50.111984, "t": 283.60736, "r": 286.36502, "b": 292.5139199999999, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "same number of columns after taking into account any row", "bbox": {"l": 50.111984, "t": 295.56235, "r": 286.36505, "b": 304.4689, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "or column spans. Therefore a strict HTML structure looks", "bbox": {"l": 50.111984, "t": 307.5173300000001, "r": 286.36508, "b": 316.42389, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "always rectangular. However, HTML is a lenient encoding", "bbox": {"l": 50.111984, "t": 319.47232, "r": 286.36505, "b": 328.3788799999999, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "format, i.e. tables with rows of different sizes might still", "bbox": {"l": 50.111984, "t": 331.42731000000003, "r": 286.36502, "b": 340.33386, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "be regarded as correct due to implicit display rules. These", "bbox": {"l": 50.111984, "t": 343.3833, "r": 286.36508, "b": 352.28986, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "implicit rules leave room for ambiguity, which we want to", "bbox": {"l": 50.111984, "t": 355.33829, "r": 286.36505, "b": 364.24484000000007, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "avoid. As such, we prefer to have \u201dstrict\u201d tables, i.e. 
tables", "bbox": {"l": 50.111984, "t": 367.29327, "r": 286.36508, "b": 376.19983, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "where every row has exactly the same length.", "bbox": {"l": 50.111984, "t": 379.24826, "r": 230.80364999999998, "b": 388.15482000000003, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "We have developed a technique that tries to derive a", "bbox": {"l": 62.06698600000001, "t": 391.40527, "r": 286.36499, "b": 400.31183, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "missing bounding box out of its neighbors. As a first step,", "bbox": {"l": 50.111984, "t": 403.36026, "r": 286.36508, "b": 412.26681999999994, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "we use the annotation data to generate the most fine-grained", "bbox": {"l": 50.111984, "t": 415.31525, "r": 286.36505, "b": 424.22180000000003, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "grid that covers the table structure. In case of strict HTML", "bbox": {"l": 50.111984, "t": 427.2712399999999, "r": 286.36505, "b": 436.1778, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "tables, all grid squares are associated with some table cell", "bbox": {"l": 50.111984, "t": 439.22623, "r": 286.36508, "b": 448.1327800000001, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "and in the presence of table spans a cell extends across mul-", "bbox": {"l": 50.111984, "t": 451.18121, "r": 286.36511, "b": 460.08777, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "tiple grid squares. 
When enough bounding boxes are known", "bbox": {"l": 50.111984, "t": 463.1362, "r": 286.36505, "b": 472.04276, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "for a rectangular table, it is possible to compute the geo-", "bbox": {"l": 50.111984, "t": 475.09119, "r": 286.36508, "b": 483.99774, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "metrical border lines between the grid rows and columns.", "bbox": {"l": 50.111984, "t": 487.04617, "r": 286.36502, "b": 495.95273, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "Eventually this information is used to generate the missing", "bbox": {"l": 50.111984, "t": 499.00217, "r": 286.36511, "b": 507.90872, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "bounding boxes. Additionally, the existence of unused grid", "bbox": {"l": 50.111984, "t": 510.95715, "r": 286.36508, "b": 519.8637100000001, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "squares indicates that the table rows have unequal number", "bbox": {"l": 50.111984, "t": 522.91214, "r": 286.36508, "b": 531.8187, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "of columns and the overall structure is non-strict. The gen-", "bbox": {"l": 50.111984, "t": 534.86713, "r": 286.36505, "b": 543.7737, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "eration of missing bounding boxes for non-strict HTML ta-", "bbox": {"l": 50.111984, "t": 546.82214, "r": 286.36502, "b": 555.7287, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "bles is ambiguous and therefore quite challenging.", "bbox": {"l": 50.111984, "t": 558.77814, "r": 257.47351, "b": 567.68469, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Thus,", "bbox": {"l": 263.94919, "t": 558.77814, "r": 286.36505, "b": 567.68469, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "we have decided to simply discard those tables. 
In case of", "bbox": {"l": 50.111984, "t": 570.73314, "r": 286.36508, "b": 579.63969, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "PubTabNet we have computed missing bounding boxes for", "bbox": {"l": 50.111984, "t": 582.68814, "r": 286.36511, "b": 591.5947, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "48% of the simple and 69% of the complex tables. Regard-", "bbox": {"l": 50.111984, "t": 594.64314, "r": 286.36511, "b": 603.5497, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "ing FinTabNet, 68% of the simple and 98% of the complex", "bbox": {"l": 50.111984, "t": 606.5981400000001, "r": 286.36505, "b": 615.5047, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "tables require the generation of bounding boxes.", "bbox": {"l": 50.111984, "t": 618.55315, "r": 242.2606, "b": 627.4597, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "Figure 7 illustrates the distribution of the tables across", "bbox": {"l": 62.06698600000001, "t": 630.71014, "r": 286.36496, "b": 639.6167, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "different dimensions per dataset.", "bbox": {"l": 50.111984, "t": 642.66614, "r": 179.90472, "b": 651.57269, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "1.2.", "bbox": {"l": 50.111984, "t": 662.39014, "r": 64.297272, "b": 672.24219, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "Synthetic datasets", "bbox": {"l": 73.754135, "t": 662.39014, "r": 153.60785, "b": 672.24219, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "Aiming to train and evaluate our models in a broader", "bbox": {"l": 62.06698600000001, "t": 681.33113, "r": 286.36493, "b": 690.2377, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "spectrum of table data we have synthesized four types of", "bbox": {"l": 50.111984, "t": 693.2861330000001, "r": 286.36505, "b": 702.1927029999999, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "datasets.", "bbox": {"l": 50.111984, "t": 705.241135, "r": 84.144226, "b": 714.147705, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Each one 
contains tables with different appear-", "bbox": {"l": 91.237595, "t": 705.241135, "r": 286.36505, "b": 714.147705, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "ances in regard to their size, structure, style and content.", "bbox": {"l": 308.862, "t": 162.65515000000005, "r": 545.11511, "b": 171.56170999999995, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "Every synthetic dataset contains 150k examples, summing", "bbox": {"l": 308.862, "t": 174.61017000000004, "r": 545.11511, "b": 183.51671999999996, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "up to 600k synthetic examples. All datasets are divided into", "bbox": {"l": 308.862, "t": 186.56519000000003, "r": 545.11511, "b": 195.47173999999995, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "Train, Test and Val splits (80%, 10%, 10%).", "bbox": {"l": 308.862, "t": 198.52117999999996, "r": 484.07434, "b": 207.42773, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "The process of generating a synthetic dataset can be de-", "bbox": {"l": 320.81699, "t": 211.23517000000004, "r": 545.11505, "b": 220.14171999999996, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "composed into the following steps:", "bbox": {"l": 308.862, "t": 223.19019000000003, "r": 448.08939, "b": 232.09673999999995, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "1.", "bbox": {"l": 320.81699, "t": 235.90521, "r": 328.28894, "b": 244.81177000000002, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Prepare styling and content templates: The styling", "bbox": {"l": 335.38232, "t": 235.90521, "r": 545.11499, "b": 244.81177000000002, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "templates have been manually designed and organized into", "bbox": {"l": 308.862, "t": 247.86023, "r": 545.11511, "b": 256.76678000000004, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "groups of scope specific appearances (e.g. 
financial data,", "bbox": {"l": 308.862, "t": 259.81525, "r": 545.11511, "b": 268.72180000000003, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "marketing data, etc.)", "bbox": {"l": 308.862, "t": 271.77026, "r": 393.3847, "b": 280.67682, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "Additionally, we have prepared cu-", "bbox": {"l": 400.11942, "t": 271.77026, "r": 545.11511, "b": 280.67682, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "rated collections of content templates by extracting the most", "bbox": {"l": 308.862, "t": 283.72524999999996, "r": 545.11505, "b": 292.63181, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "frequently used terms out of non-synthetic datasets (e.g.", "bbox": {"l": 308.862, "t": 295.68124, "r": 545.11511, "b": 304.5878000000001, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PubTabNet, FinTabNet, etc.).", "bbox": {"l": 308.862, "t": 307.63623, "r": 425.69348, "b": 316.54279, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "2.", "bbox": {"l": 320.81699, "t": 320.35022, "r": 328.4949, "b": 329.25677, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "Generate table structures: The structure of each syn-", "bbox": {"l": 331.05423, "t": 320.35022, "r": 545.11499, "b": 329.25677, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "thetic dataset assumes a horizontal table header which po-", "bbox": {"l": 308.862, "t": 332.30521000000005, "r": 545.11517, "b": 341.21176, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "tentially spans over multiple rows and a table body that", "bbox": {"l": 308.862, "t": 344.26018999999997, "r": 545.11505, "b": 353.16675, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "may contain a combination of row spans and column spans.", "bbox": {"l": 308.862, "t": 356.21619, "r": 545.11511, "b": 365.12273999999996, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "However, spans are not allowed to cross the header - body", "bbox": {"l": 308.862, "t": 368.17117, "r": 545.11511, "b": 377.07773, "coord_origin": 
"TOPLEFT"}}, {"id": 72, "text": "boundary. The table structure is described by the parame-", "bbox": {"l": 308.862, "t": 380.12616, "r": 545.11499, "b": 389.03271, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "ters: Total number of table rows and columns, number of", "bbox": {"l": 308.862, "t": 392.08115, "r": 545.11517, "b": 400.98769999999996, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "header rows, type of spans (header only spans, row only", "bbox": {"l": 308.862, "t": 404.03613000000007, "r": 545.11511, "b": 412.94269, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "spans, column only spans, both row and column spans),", "bbox": {"l": 308.862, "t": 415.99112, "r": 545.11499, "b": 424.89767, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "maximum span size and the ratio of the table area covered", "bbox": {"l": 308.862, "t": 427.94711, "r": 545.11517, "b": 436.85367, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "by spans.", "bbox": {"l": 308.862, "t": 439.9021, "r": 345.94278, "b": 448.80865, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "3.", "bbox": {"l": 320.81699, "t": 452.61609, "r": 328.30341, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "Generate content: Based on the dataset", "bbox": {"l": 330.79889, "t": 452.61609, "r": 485.75772000000006, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "theme", "bbox": {"l": 488.073, "t": 452.70575, "r": 511.86368, "b": 461.29352, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": ", a set of", "bbox": {"l": 511.86301, "t": 452.61609, "r": 545.10815, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "suitable content templates is chosen first. 
Then, this content", "bbox": {"l": 308.862, "t": 464.57108, "r": 545.11505, "b": 473.47763, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "can be combined with purely random text to produce the", "bbox": {"l": 308.862, "t": 476.52707, "r": 545.11517, "b": 485.43362, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "synthetic content.", "bbox": {"l": 308.862, "t": 488.48206, "r": 379.14816, "b": 497.38861, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "4.", "bbox": {"l": 320.81699, "t": 501.19604, "r": 328.66177, "b": 510.1026, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Apply styling templates: Depending on the domain", "bbox": {"l": 331.2767, "t": 501.19604, "r": 545.11493, "b": 510.1026, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "of the synthetic dataset, a set of styling templates is first", "bbox": {"l": 308.862, "t": 513.15103, "r": 545.1153, "b": 522.05759, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "manually selected.", "bbox": {"l": 308.862, "t": 525.10703, "r": 384.29883, "b": 534.01358, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Then, a style is randomly selected to", "bbox": {"l": 391.25272, "t": 525.10703, "r": 545.11511, "b": 534.01358, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "format the appearance of the synthesized table.", "bbox": {"l": 308.862, "t": 537.06203, "r": 496.15897000000007, "b": 545.96858, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "5.", "bbox": {"l": 320.81699, "t": 549.77603, "r": 328.28894, "b": 558.68259, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Render the complete tables: The synthetic table is", "bbox": {"l": 335.40222, "t": 549.77603, "r": 545.11499, "b": 558.68259, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "finally rendered by a web browser engine to generate the", "bbox": {"l": 308.862, "t": 561.73103, "r": 545.11517, "b": 570.63759, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "bounding boxes for each table cell. 
A batching technique is", "bbox": {"l": 308.862, "t": 573.68604, "r": 545.11511, "b": 582.59259, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "utilized to optimize the runtime overhead of the rendering", "bbox": {"l": 308.862, "t": 585.64203, "r": 545.11505, "b": 594.54858, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "process.", "bbox": {"l": 308.862, "t": 597.59703, "r": 341.2305, "b": 606.50359, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "2.", "bbox": {"l": 308.862, "t": 622.2905900000001, "r": 316.76675, "b": 633.03831, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Prediction post-processing for PDF docu-", "bbox": {"l": 327.30643, "t": 622.2905900000001, "r": 545.10876, "b": 633.03831, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "ments", "bbox": {"l": 326.79501, "t": 636.2385899999999, "r": 357.34055, "b": 646.98631, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "Although TableFormer can predict the table structure and", "bbox": {"l": 320.81702, "t": 657.42104, "r": 545.11499, "b": 666.3276, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "the bounding boxes for tables recognized inside PDF docu-", "bbox": {"l": 308.86203, "t": 669.37604, "r": 545.11511, "b": 678.2826, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "ments, this is not enough when a full reconstruction of the", "bbox": {"l": 308.86203, "t": 681.33104, "r": 545.11517, "b": 690.2376, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "original table is required. 
This happens mainly due the fol-", "bbox": {"l": 308.86203, "t": 693.286041, "r": 545.11505, "b": 702.1926040000001, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "lowing reasons:", "bbox": {"l": 308.86203, "t": 705.242035, "r": 371.42719, "b": 714.148605, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "11", "bbox": {"l": 292.63104, "t": 734.1330379999999, "r": 302.59363, "b": 743.0396, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "section_header", "bbox": {"l": 132.30833435058594, "t": 109.52092742919922, "r": 465.6390075683594, "b": 135.91281127929688, "coord_origin": "TOPLEFT"}, "confidence": 0.6497087478637695, "cells": [{"id": 0, "text": "TableFormer: Table Structure Understanding with Transformers", "bbox": {"l": 132.842, "t": 110.57488999999998, "r": 465.37591999999995, "b": 121.32263, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Supplementary Material", "bbox": {"l": 220.18399, "t": 122.25982999999997, "r": 375.04269, "b": 135.53008999999997, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "section_header", "bbox": {"l": 50.058563232421875, "t": 160.34249877929688, "r": 175.96437, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}, "confidence": 0.9454860091209412, "cells": [{"id": 2, "text": "1.", "bbox": {"l": 50.111984, "t": 161.16089, "r": 57.089828, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Details on the datasets", "bbox": {"l": 66.393616, "t": 161.16089, "r": 175.96437, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "section_header", "bbox": {"l": 49.89580154418945, "t": 180.4131622314453, "r": 150.36401, "b": 191.2897491455078, "coord_origin": "TOPLEFT"}, "confidence": 0.9535645246505737, "cells": [{"id": 4, "text": "1.1.", "bbox": {"l": 50.111984, "t": 180.97931000000005, "r": 64.210808, "b": 190.83136000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Data preparation", "bbox": {"l": 73.610023, "t": 180.97931000000005, "r": 
150.36401, "b": 190.83136000000002, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "text", "bbox": {"l": 49.297401428222656, "t": 198.91737365722656, "r": 286.80126953125, "b": 388.41900634765625, "coord_origin": "TOPLEFT"}, "confidence": 0.9864971041679382, "cells": [{"id": 6, "text": "As a first step of our data preparation process, we have", "bbox": {"l": 62.06698600000001, "t": 199.92029000000002, "r": 286.36496, "b": 208.82683999999995, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "calculated statistics over the datasets across the following", "bbox": {"l": 50.111984, "t": 211.87627999999995, "r": 286.36505, "b": 220.78283999999996, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "dimensions: (1) table size measured in the number of rows", "bbox": {"l": 50.111984, "t": 223.83130000000006, "r": 286.36514, "b": 232.73784999999998, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "and columns, (2) complexity of the table, (3) strictness of", "bbox": {"l": 50.111984, "t": 235.78632000000005, "r": 286.36508, "b": 244.69286999999997, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "the provided HTML structure and (4) completeness (i.e. no", "bbox": {"l": 50.111984, "t": 247.74132999999995, "r": 286.36505, "b": 256.64788999999996, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "omitted bounding boxes). A table is considered to be simple", "bbox": {"l": 50.111984, "t": 259.69635000000005, "r": 286.36505, "b": 268.60290999999995, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "if it does not contain row spans or column spans. 
Addition-", "bbox": {"l": 50.111984, "t": 271.65137000000004, "r": 286.36505, "b": 280.55792, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "ally, a table has a strict HTML structure if every row has the", "bbox": {"l": 50.111984, "t": 283.60736, "r": 286.36502, "b": 292.5139199999999, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "same number of columns after taking into account any row", "bbox": {"l": 50.111984, "t": 295.56235, "r": 286.36505, "b": 304.4689, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "or column spans. Therefore a strict HTML structure looks", "bbox": {"l": 50.111984, "t": 307.5173300000001, "r": 286.36508, "b": 316.42389, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "always rectangular. However, HTML is a lenient encoding", "bbox": {"l": 50.111984, "t": 319.47232, "r": 286.36505, "b": 328.3788799999999, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "format, i.e. tables with rows of different sizes might still", "bbox": {"l": 50.111984, "t": 331.42731000000003, "r": 286.36502, "b": 340.33386, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "be regarded as correct due to implicit display rules. These", "bbox": {"l": 50.111984, "t": 343.3833, "r": 286.36508, "b": 352.28986, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "implicit rules leave room for ambiguity, which we want to", "bbox": {"l": 50.111984, "t": 355.33829, "r": 286.36505, "b": 364.24484000000007, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "avoid. As such, we prefer to have \u201dstrict\u201d tables, i.e. 
tables", "bbox": {"l": 50.111984, "t": 367.29327, "r": 286.36508, "b": 376.19983, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "where every row has exactly the same length.", "bbox": {"l": 50.111984, "t": 379.24826, "r": 230.80364999999998, "b": 388.15482000000003, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "text", "bbox": {"l": 49.358585357666016, "t": 390.24591064453125, "r": 286.73260498046875, "b": 628.0463256835938, "coord_origin": "TOPLEFT"}, "confidence": 0.9826022386550903, "cells": [{"id": 22, "text": "We have developed a technique that tries to derive a", "bbox": {"l": 62.06698600000001, "t": 391.40527, "r": 286.36499, "b": 400.31183, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "missing bounding box out of its neighbors. As a first step,", "bbox": {"l": 50.111984, "t": 403.36026, "r": 286.36508, "b": 412.26681999999994, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "we use the annotation data to generate the most fine-grained", "bbox": {"l": 50.111984, "t": 415.31525, "r": 286.36505, "b": 424.22180000000003, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "grid that covers the table structure. In case of strict HTML", "bbox": {"l": 50.111984, "t": 427.2712399999999, "r": 286.36505, "b": 436.1778, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "tables, all grid squares are associated with some table cell", "bbox": {"l": 50.111984, "t": 439.22623, "r": 286.36508, "b": 448.1327800000001, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "and in the presence of table spans a cell extends across mul-", "bbox": {"l": 50.111984, "t": 451.18121, "r": 286.36511, "b": 460.08777, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "tiple grid squares. 
When enough bounding boxes are known", "bbox": {"l": 50.111984, "t": 463.1362, "r": 286.36505, "b": 472.04276, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "for a rectangular table, it is possible to compute the geo-", "bbox": {"l": 50.111984, "t": 475.09119, "r": 286.36508, "b": 483.99774, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "metrical border lines between the grid rows and columns.", "bbox": {"l": 50.111984, "t": 487.04617, "r": 286.36502, "b": 495.95273, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "Eventually this information is used to generate the missing", "bbox": {"l": 50.111984, "t": 499.00217, "r": 286.36511, "b": 507.90872, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "bounding boxes. Additionally, the existence of unused grid", "bbox": {"l": 50.111984, "t": 510.95715, "r": 286.36508, "b": 519.8637100000001, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "squares indicates that the table rows have unequal number", "bbox": {"l": 50.111984, "t": 522.91214, "r": 286.36508, "b": 531.8187, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "of columns and the overall structure is non-strict. The gen-", "bbox": {"l": 50.111984, "t": 534.86713, "r": 286.36505, "b": 543.7737, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "eration of missing bounding boxes for non-strict HTML ta-", "bbox": {"l": 50.111984, "t": 546.82214, "r": 286.36502, "b": 555.7287, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "bles is ambiguous and therefore quite challenging.", "bbox": {"l": 50.111984, "t": 558.77814, "r": 257.47351, "b": 567.68469, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Thus,", "bbox": {"l": 263.94919, "t": 558.77814, "r": 286.36505, "b": 567.68469, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "we have decided to simply discard those tables. 
In case of", "bbox": {"l": 50.111984, "t": 570.73314, "r": 286.36508, "b": 579.63969, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "PubTabNet we have computed missing bounding boxes for", "bbox": {"l": 50.111984, "t": 582.68814, "r": 286.36511, "b": 591.5947, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "48% of the simple and 69% of the complex tables. Regard-", "bbox": {"l": 50.111984, "t": 594.64314, "r": 286.36511, "b": 603.5497, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "ing FinTabNet, 68% of the simple and 98% of the complex", "bbox": {"l": 50.111984, "t": 606.5981400000001, "r": 286.36505, "b": 615.5047, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "tables require the generation of bounding boxes.", "bbox": {"l": 50.111984, "t": 618.55315, "r": 242.2606, "b": 627.4597, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "text", "bbox": {"l": 49.51209259033203, "t": 629.6376342773438, "r": 286.36496, "b": 651.6802978515625, "coord_origin": "TOPLEFT"}, "confidence": 0.9335850477218628, "cells": [{"id": 43, "text": "Figure 7 illustrates the distribution of the tables across", "bbox": {"l": 62.06698600000001, "t": 630.71014, "r": 286.36496, "b": 639.6167, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "different dimensions per dataset.", "bbox": {"l": 50.111984, "t": 642.66614, "r": 179.90472, "b": 651.57269, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "section_header", "bbox": {"l": 50.04091262817383, "t": 661.6310424804688, "r": 153.79356384277344, "b": 672.540283203125, "coord_origin": "TOPLEFT"}, "confidence": 0.9515743255615234, "cells": [{"id": 45, "text": "1.2.", "bbox": {"l": 50.111984, "t": 662.39014, "r": 64.297272, "b": 672.24219, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "Synthetic datasets", "bbox": {"l": 73.754135, "t": 662.39014, "r": 153.60785, "b": 672.24219, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "text", "bbox": {"l": 49.59552001953125, "t": 680.3189086914062, "r": 287.0692443847656, "b": 714.6337890625, 
"coord_origin": "TOPLEFT"}, "confidence": 0.9783331751823425, "cells": [{"id": 47, "text": "Aiming to train and evaluate our models in a broader", "bbox": {"l": 62.06698600000001, "t": 681.33113, "r": 286.36493, "b": 690.2377, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "spectrum of table data we have synthesized four types of", "bbox": {"l": 50.111984, "t": 693.2861330000001, "r": 286.36505, "b": 702.1927029999999, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "datasets.", "bbox": {"l": 50.111984, "t": 705.241135, "r": 84.144226, "b": 714.147705, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Each one contains tables with different appear-", "bbox": {"l": 91.237595, "t": 705.241135, "r": 286.36505, "b": 714.147705, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "text", "bbox": {"l": 307.9798583984375, "t": 161.70252990722656, "r": 545.11511, "b": 207.59095764160156, "coord_origin": "TOPLEFT"}, "confidence": 0.9596063494682312, "cells": [{"id": 51, "text": "ances in regard to their size, structure, style and content.", "bbox": {"l": 308.862, "t": 162.65515000000005, "r": 545.11511, "b": 171.56170999999995, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "Every synthetic dataset contains 150k examples, summing", "bbox": {"l": 308.862, "t": 174.61017000000004, "r": 545.11511, "b": 183.51671999999996, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "up to 600k synthetic examples. 
All datasets are divided into", "bbox": {"l": 308.862, "t": 186.56519000000003, "r": 545.11511, "b": 195.47173999999995, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "Train, Test and Val splits (80%, 10%, 10%).", "bbox": {"l": 308.862, "t": 198.52117999999996, "r": 484.07434, "b": 207.42773, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "text", "bbox": {"l": 307.9639892578125, "t": 210.19483947753906, "r": 545.11505, "b": 232.41128540039062, "coord_origin": "TOPLEFT"}, "confidence": 0.9261796474456787, "cells": [{"id": 55, "text": "The process of generating a synthetic dataset can be de-", "bbox": {"l": 320.81699, "t": 211.23517000000004, "r": 545.11505, "b": 220.14171999999996, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "composed into the following steps:", "bbox": {"l": 308.862, "t": 223.19019000000003, "r": 448.08939, "b": 232.09673999999995, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "list_item", "bbox": {"l": 308.30316162109375, "t": 234.8782196044922, "r": 545.11511, "b": 316.54279, "coord_origin": "TOPLEFT"}, "confidence": 0.9642953872680664, "cells": [{"id": 57, "text": "1.", "bbox": {"l": 320.81699, "t": 235.90521, "r": 328.28894, "b": 244.81177000000002, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Prepare styling and content templates: The styling", "bbox": {"l": 335.38232, "t": 235.90521, "r": 545.11499, "b": 244.81177000000002, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "templates have been manually designed and organized into", "bbox": {"l": 308.862, "t": 247.86023, "r": 545.11511, "b": 256.76678000000004, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "groups of scope specific appearances (e.g. 
financial data,", "bbox": {"l": 308.862, "t": 259.81525, "r": 545.11511, "b": 268.72180000000003, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "marketing data, etc.)", "bbox": {"l": 308.862, "t": 271.77026, "r": 393.3847, "b": 280.67682, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "Additionally, we have prepared cu-", "bbox": {"l": 400.11942, "t": 271.77026, "r": 545.11511, "b": 280.67682, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "rated collections of content templates by extracting the most", "bbox": {"l": 308.862, "t": 283.72524999999996, "r": 545.11505, "b": 292.63181, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "frequently used terms out of non-synthetic datasets (e.g.", "bbox": {"l": 308.862, "t": 295.68124, "r": 545.11511, "b": 304.5878000000001, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PubTabNet, FinTabNet, etc.).", "bbox": {"l": 308.862, "t": 307.63623, "r": 425.69348, "b": 316.54279, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "list_item", "bbox": {"l": 307.91741943359375, "t": 319.1956481933594, "r": 545.44873046875, "b": 449.3292236328125, "coord_origin": "TOPLEFT"}, "confidence": 0.9699996113777161, "cells": [{"id": 66, "text": "2.", "bbox": {"l": 320.81699, "t": 320.35022, "r": 328.4949, "b": 329.25677, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "Generate table structures: The structure of each syn-", "bbox": {"l": 331.05423, "t": 320.35022, "r": 545.11499, "b": 329.25677, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "thetic dataset assumes a horizontal table header which po-", "bbox": {"l": 308.862, "t": 332.30521000000005, "r": 545.11517, "b": 341.21176, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "tentially spans over multiple rows and a table body that", "bbox": {"l": 308.862, "t": 344.26018999999997, "r": 545.11505, "b": 353.16675, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "may contain a combination of row spans and column spans.", "bbox": {"l": 308.862, "t": 356.21619, "r": 545.11511, "b": 
365.12273999999996, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "However, spans are not allowed to cross the header - body", "bbox": {"l": 308.862, "t": 368.17117, "r": 545.11511, "b": 377.07773, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "boundary. The table structure is described by the parame-", "bbox": {"l": 308.862, "t": 380.12616, "r": 545.11499, "b": 389.03271, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "ters: Total number of table rows and columns, number of", "bbox": {"l": 308.862, "t": 392.08115, "r": 545.11517, "b": 400.98769999999996, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "header rows, type of spans (header only spans, row only", "bbox": {"l": 308.862, "t": 404.03613000000007, "r": 545.11511, "b": 412.94269, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "spans, column only spans, both row and column spans),", "bbox": {"l": 308.862, "t": 415.99112, "r": 545.11499, "b": 424.89767, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "maximum span size and the ratio of the table area covered", "bbox": {"l": 308.862, "t": 427.94711, "r": 545.11517, "b": 436.85367, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "by spans.", "bbox": {"l": 308.862, "t": 439.9021, "r": 345.94278, "b": 448.80865, "coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "list_item", "bbox": {"l": 308.2115783691406, "t": 451.67333984375, "r": 545.591552734375, "b": 497.38861, "coord_origin": "TOPLEFT"}, "confidence": 0.9568929672241211, "cells": [{"id": 78, "text": "3.", "bbox": {"l": 320.81699, "t": 452.61609, "r": 328.30341, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "Generate content: Based on the dataset", "bbox": {"l": 330.79889, "t": 452.61609, "r": 485.75772000000006, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "theme", "bbox": {"l": 488.073, "t": 452.70575, "r": 511.86368, "b": 461.29352, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": ", a set of", "bbox": {"l": 511.86301, "t": 452.61609, "r": 545.10815, "b": 
461.52264, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "suitable content templates is chosen first. Then, this content", "bbox": {"l": 308.862, "t": 464.57108, "r": 545.11505, "b": 473.47763, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "can be combined with purely random text to produce the", "bbox": {"l": 308.862, "t": 476.52707, "r": 545.11517, "b": 485.43362, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "synthetic content.", "bbox": {"l": 308.862, "t": 488.48206, "r": 379.14816, "b": 497.38861, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "list_item", "bbox": {"l": 308.10528564453125, "t": 500.1474304199219, "r": 545.2609252929688, "b": 546.41064453125, "coord_origin": "TOPLEFT"}, "confidence": 0.9710659980773926, "cells": [{"id": 85, "text": "4.", "bbox": {"l": 320.81699, "t": 501.19604, "r": 328.66177, "b": 510.1026, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Apply styling templates: Depending on the domain", "bbox": {"l": 331.2767, "t": 501.19604, "r": 545.11493, "b": 510.1026, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "of the synthetic dataset, a set of styling templates is first", "bbox": {"l": 308.862, "t": 513.15103, "r": 545.1153, "b": 522.05759, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "manually selected.", "bbox": {"l": 308.862, "t": 525.10703, "r": 384.29883, "b": 534.01358, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Then, a style is randomly selected to", "bbox": {"l": 391.25272, "t": 525.10703, "r": 545.11511, "b": 534.01358, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "format the appearance of the synthesized table.", "bbox": {"l": 308.862, "t": 537.06203, "r": 496.15897000000007, "b": 545.96858, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "list_item", "bbox": {"l": 308.0560607910156, "t": 548.7080078125, "r": 545.3718872070312, "b": 607.4362182617188, "coord_origin": "TOPLEFT"}, "confidence": 0.9778757095336914, "cells": [{"id": 91, "text": "5.", "bbox": {"l": 320.81699, "t": 549.77603, "r": 
328.28894, "b": 558.68259, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Render the complete tables: The synthetic table is", "bbox": {"l": 335.40222, "t": 549.77603, "r": 545.11499, "b": 558.68259, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "finally rendered by a web browser engine to generate the", "bbox": {"l": 308.862, "t": 561.73103, "r": 545.11517, "b": 570.63759, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "bounding boxes for each table cell. A batching technique is", "bbox": {"l": 308.862, "t": 573.68604, "r": 545.11511, "b": 582.59259, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "utilized to optimize the runtime overhead of the rendering", "bbox": {"l": 308.862, "t": 585.64203, "r": 545.11505, "b": 594.54858, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "process.", "bbox": {"l": 308.862, "t": 597.59703, "r": 341.2305, "b": 606.50359, "coord_origin": "TOPLEFT"}}]}, {"id": 15, "label": "section_header", "bbox": {"l": 307.9035339355469, "t": 621.6393432617188, "r": 545.10876, "b": 646.98631, "coord_origin": "TOPLEFT"}, "confidence": 0.9549407362937927, "cells": [{"id": 97, "text": "2.", "bbox": {"l": 308.862, "t": 622.2905900000001, "r": 316.76675, "b": 633.03831, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Prediction post-processing for PDF docu-", "bbox": {"l": 327.30643, "t": 622.2905900000001, "r": 545.10876, "b": 633.03831, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "ments", "bbox": {"l": 326.79501, "t": 636.2385899999999, "r": 357.34055, "b": 646.98631, "coord_origin": "TOPLEFT"}}]}, {"id": 16, "label": "text", "bbox": {"l": 308.0598449707031, "t": 656.1874389648438, "r": 545.1201171875, "b": 714.419189453125, "coord_origin": "TOPLEFT"}, "confidence": 0.9829330444335938, "cells": [{"id": 100, "text": "Although TableFormer can predict the table structure and", "bbox": {"l": 320.81702, "t": 657.42104, "r": 545.11499, "b": 666.3276, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "the bounding boxes for tables recognized 
inside PDF docu-", "bbox": {"l": 308.86203, "t": 669.37604, "r": 545.11511, "b": 678.2826, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "ments, this is not enough when a full reconstruction of the", "bbox": {"l": 308.86203, "t": 681.33104, "r": 545.11517, "b": 690.2376, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "original table is required. This happens mainly due the fol-", "bbox": {"l": 308.86203, "t": 693.286041, "r": 545.11505, "b": 702.1926040000001, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "lowing reasons:", "bbox": {"l": 308.86203, "t": 705.242035, "r": 371.42719, "b": 714.148605, "coord_origin": "TOPLEFT"}}]}, {"id": 17, "label": "page_footer", "bbox": {"l": 292.63104, "t": 733.1739501953125, "r": 302.59363, "b": 743.0396, "coord_origin": "TOPLEFT"}, "confidence": 0.8986664414405823, "cells": [{"id": 105, "text": "11", "bbox": {"l": 292.63104, "t": 734.1330379999999, "r": 302.59363, "b": 743.0396, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "section_header", "id": 0, "page_no": 10, "cluster": {"id": 0, "label": "section_header", "bbox": {"l": 132.30833435058594, "t": 109.52092742919922, "r": 465.6390075683594, "b": 135.91281127929688, "coord_origin": "TOPLEFT"}, "confidence": 0.6497087478637695, "cells": [{"id": 0, "text": "TableFormer: Table Structure Understanding with Transformers", "bbox": {"l": 132.842, "t": 110.57488999999998, "r": 465.37591999999995, "b": 121.32263, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Supplementary Material", "bbox": {"l": 220.18399, "t": 122.25982999999997, "r": 375.04269, "b": 135.53008999999997, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer: Table Structure Understanding with Transformers Supplementary Material"}, {"label": "section_header", "id": 1, "page_no": 10, "cluster": {"id": 1, "label": "section_header", "bbox": {"l": 50.058563232421875, "t": 160.34249877929688, 
"r": 175.96437, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}, "confidence": 0.9454860091209412, "cells": [{"id": 2, "text": "1.", "bbox": {"l": 50.111984, "t": 161.16089, "r": 57.089828, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Details on the datasets", "bbox": {"l": 66.393616, "t": 161.16089, "r": 175.96437, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}}]}, "text": "1. Details on the datasets"}, {"label": "section_header", "id": 2, "page_no": 10, "cluster": {"id": 2, "label": "section_header", "bbox": {"l": 49.89580154418945, "t": 180.4131622314453, "r": 150.36401, "b": 191.2897491455078, "coord_origin": "TOPLEFT"}, "confidence": 0.9535645246505737, "cells": [{"id": 4, "text": "1.1.", "bbox": {"l": 50.111984, "t": 180.97931000000005, "r": 64.210808, "b": 190.83136000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Data preparation", "bbox": {"l": 73.610023, "t": 180.97931000000005, "r": 150.36401, "b": 190.83136000000002, "coord_origin": "TOPLEFT"}}]}, "text": "1.1. 
Data preparation"}, {"label": "text", "id": 3, "page_no": 10, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.297401428222656, "t": 198.91737365722656, "r": 286.80126953125, "b": 388.41900634765625, "coord_origin": "TOPLEFT"}, "confidence": 0.9864971041679382, "cells": [{"id": 6, "text": "As a first step of our data preparation process, we have", "bbox": {"l": 62.06698600000001, "t": 199.92029000000002, "r": 286.36496, "b": 208.82683999999995, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "calculated statistics over the datasets across the following", "bbox": {"l": 50.111984, "t": 211.87627999999995, "r": 286.36505, "b": 220.78283999999996, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "dimensions: (1) table size measured in the number of rows", "bbox": {"l": 50.111984, "t": 223.83130000000006, "r": 286.36514, "b": 232.73784999999998, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "and columns, (2) complexity of the table, (3) strictness of", "bbox": {"l": 50.111984, "t": 235.78632000000005, "r": 286.36508, "b": 244.69286999999997, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "the provided HTML structure and (4) completeness (i.e. no", "bbox": {"l": 50.111984, "t": 247.74132999999995, "r": 286.36505, "b": 256.64788999999996, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "omitted bounding boxes). A table is considered to be simple", "bbox": {"l": 50.111984, "t": 259.69635000000005, "r": 286.36505, "b": 268.60290999999995, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "if it does not contain row spans or column spans. 
Addition-", "bbox": {"l": 50.111984, "t": 271.65137000000004, "r": 286.36505, "b": 280.55792, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "ally, a table has a strict HTML structure if every row has the", "bbox": {"l": 50.111984, "t": 283.60736, "r": 286.36502, "b": 292.5139199999999, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "same number of columns after taking into account any row", "bbox": {"l": 50.111984, "t": 295.56235, "r": 286.36505, "b": 304.4689, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "or column spans. Therefore a strict HTML structure looks", "bbox": {"l": 50.111984, "t": 307.5173300000001, "r": 286.36508, "b": 316.42389, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "always rectangular. However, HTML is a lenient encoding", "bbox": {"l": 50.111984, "t": 319.47232, "r": 286.36505, "b": 328.3788799999999, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "format, i.e. tables with rows of different sizes might still", "bbox": {"l": 50.111984, "t": 331.42731000000003, "r": 286.36502, "b": 340.33386, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "be regarded as correct due to implicit display rules. These", "bbox": {"l": 50.111984, "t": 343.3833, "r": 286.36508, "b": 352.28986, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "implicit rules leave room for ambiguity, which we want to", "bbox": {"l": 50.111984, "t": 355.33829, "r": 286.36505, "b": 364.24484000000007, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "avoid. As such, we prefer to have \u201dstrict\u201d tables, i.e. 
tables", "bbox": {"l": 50.111984, "t": 367.29327, "r": 286.36508, "b": 376.19983, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "where every row has exactly the same length.", "bbox": {"l": 50.111984, "t": 379.24826, "r": 230.80364999999998, "b": 388.15482000000003, "coord_origin": "TOPLEFT"}}]}, "text": "As a first step of our data preparation process, we have calculated statistics over the datasets across the following dimensions: (1) table size measured in the number of rows and columns, (2) complexity of the table, (3) strictness of the provided HTML structure and (4) completeness (i.e. no omitted bounding boxes). A table is considered to be simple if it does not contain row spans or column spans. Additionally, a table has a strict HTML structure if every row has the same number of columns after taking into account any row or column spans. Therefore a strict HTML structure looks always rectangular. However, HTML is a lenient encoding format, i.e. tables with rows of different sizes might still be regarded as correct due to implicit display rules. These implicit rules leave room for ambiguity, which we want to avoid. As such, we prefer to have \u201dstrict\u201d tables, i.e. tables where every row has exactly the same length."}, {"label": "text", "id": 4, "page_no": 10, "cluster": {"id": 4, "label": "text", "bbox": {"l": 49.358585357666016, "t": 390.24591064453125, "r": 286.73260498046875, "b": 628.0463256835938, "coord_origin": "TOPLEFT"}, "confidence": 0.9826022386550903, "cells": [{"id": 22, "text": "We have developed a technique that tries to derive a", "bbox": {"l": 62.06698600000001, "t": 391.40527, "r": 286.36499, "b": 400.31183, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "missing bounding box out of its neighbors. 
As a first step,", "bbox": {"l": 50.111984, "t": 403.36026, "r": 286.36508, "b": 412.26681999999994, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "we use the annotation data to generate the most fine-grained", "bbox": {"l": 50.111984, "t": 415.31525, "r": 286.36505, "b": 424.22180000000003, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "grid that covers the table structure. In case of strict HTML", "bbox": {"l": 50.111984, "t": 427.2712399999999, "r": 286.36505, "b": 436.1778, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "tables, all grid squares are associated with some table cell", "bbox": {"l": 50.111984, "t": 439.22623, "r": 286.36508, "b": 448.1327800000001, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "and in the presence of table spans a cell extends across mul-", "bbox": {"l": 50.111984, "t": 451.18121, "r": 286.36511, "b": 460.08777, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "tiple grid squares. When enough bounding boxes are known", "bbox": {"l": 50.111984, "t": 463.1362, "r": 286.36505, "b": 472.04276, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "for a rectangular table, it is possible to compute the geo-", "bbox": {"l": 50.111984, "t": 475.09119, "r": 286.36508, "b": 483.99774, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "metrical border lines between the grid rows and columns.", "bbox": {"l": 50.111984, "t": 487.04617, "r": 286.36502, "b": 495.95273, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "Eventually this information is used to generate the missing", "bbox": {"l": 50.111984, "t": 499.00217, "r": 286.36511, "b": 507.90872, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "bounding boxes. 
Additionally, the existence of unused grid", "bbox": {"l": 50.111984, "t": 510.95715, "r": 286.36508, "b": 519.8637100000001, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "squares indicates that the table rows have unequal number", "bbox": {"l": 50.111984, "t": 522.91214, "r": 286.36508, "b": 531.8187, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "of columns and the overall structure is non-strict. The gen-", "bbox": {"l": 50.111984, "t": 534.86713, "r": 286.36505, "b": 543.7737, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "eration of missing bounding boxes for non-strict HTML ta-", "bbox": {"l": 50.111984, "t": 546.82214, "r": 286.36502, "b": 555.7287, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "bles is ambiguous and therefore quite challenging.", "bbox": {"l": 50.111984, "t": 558.77814, "r": 257.47351, "b": 567.68469, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Thus,", "bbox": {"l": 263.94919, "t": 558.77814, "r": 286.36505, "b": 567.68469, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "we have decided to simply discard those tables. In case of", "bbox": {"l": 50.111984, "t": 570.73314, "r": 286.36508, "b": 579.63969, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "PubTabNet we have computed missing bounding boxes for", "bbox": {"l": 50.111984, "t": 582.68814, "r": 286.36511, "b": 591.5947, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "48% of the simple and 69% of the complex tables. 
Regard-", "bbox": {"l": 50.111984, "t": 594.64314, "r": 286.36511, "b": 603.5497, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "ing FinTabNet, 68% of the simple and 98% of the complex", "bbox": {"l": 50.111984, "t": 606.5981400000001, "r": 286.36505, "b": 615.5047, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "tables require the generation of bounding boxes.", "bbox": {"l": 50.111984, "t": 618.55315, "r": 242.2606, "b": 627.4597, "coord_origin": "TOPLEFT"}}]}, "text": "We have developed a technique that tries to derive a missing bounding box out of its neighbors. As a first step, we use the annotation data to generate the most fine-grained grid that covers the table structure. In case of strict HTML tables, all grid squares are associated with some table cell and in the presence of table spans a cell extends across multiple grid squares. When enough bounding boxes are known for a rectangular table, it is possible to compute the geometrical border lines between the grid rows and columns. Eventually this information is used to generate the missing bounding boxes. Additionally, the existence of unused grid squares indicates that the table rows have unequal number of columns and the overall structure is non-strict. The generation of missing bounding boxes for non-strict HTML tables is ambiguous and therefore quite challenging. Thus, we have decided to simply discard those tables. In case of PubTabNet we have computed missing bounding boxes for 48% of the simple and 69% of the complex tables. 
Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes."}, {"label": "text", "id": 5, "page_no": 10, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.51209259033203, "t": 629.6376342773438, "r": 286.36496, "b": 651.6802978515625, "coord_origin": "TOPLEFT"}, "confidence": 0.9335850477218628, "cells": [{"id": 43, "text": "Figure 7 illustrates the distribution of the tables across", "bbox": {"l": 62.06698600000001, "t": 630.71014, "r": 286.36496, "b": 639.6167, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "different dimensions per dataset.", "bbox": {"l": 50.111984, "t": 642.66614, "r": 179.90472, "b": 651.57269, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 7 illustrates the distribution of the tables across different dimensions per dataset."}, {"label": "section_header", "id": 6, "page_no": 10, "cluster": {"id": 6, "label": "section_header", "bbox": {"l": 50.04091262817383, "t": 661.6310424804688, "r": 153.79356384277344, "b": 672.540283203125, "coord_origin": "TOPLEFT"}, "confidence": 0.9515743255615234, "cells": [{"id": 45, "text": "1.2.", "bbox": {"l": 50.111984, "t": 662.39014, "r": 64.297272, "b": 672.24219, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "Synthetic datasets", "bbox": {"l": 73.754135, "t": 662.39014, "r": 153.60785, "b": 672.24219, "coord_origin": "TOPLEFT"}}]}, "text": "1.2. 
Synthetic datasets"}, {"label": "text", "id": 7, "page_no": 10, "cluster": {"id": 7, "label": "text", "bbox": {"l": 49.59552001953125, "t": 680.3189086914062, "r": 287.0692443847656, "b": 714.6337890625, "coord_origin": "TOPLEFT"}, "confidence": 0.9783331751823425, "cells": [{"id": 47, "text": "Aiming to train and evaluate our models in a broader", "bbox": {"l": 62.06698600000001, "t": 681.33113, "r": 286.36493, "b": 690.2377, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "spectrum of table data we have synthesized four types of", "bbox": {"l": 50.111984, "t": 693.2861330000001, "r": 286.36505, "b": 702.1927029999999, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "datasets.", "bbox": {"l": 50.111984, "t": 705.241135, "r": 84.144226, "b": 714.147705, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Each one contains tables with different appear-", "bbox": {"l": 91.237595, "t": 705.241135, "r": 286.36505, "b": 714.147705, "coord_origin": "TOPLEFT"}}]}, "text": "Aiming to train and evaluate our models in a broader spectrum of table data we have synthesized four types of datasets. Each one contains tables with different appear-"}, {"label": "text", "id": 8, "page_no": 10, "cluster": {"id": 8, "label": "text", "bbox": {"l": 307.9798583984375, "t": 161.70252990722656, "r": 545.11511, "b": 207.59095764160156, "coord_origin": "TOPLEFT"}, "confidence": 0.9596063494682312, "cells": [{"id": 51, "text": "ances in regard to their size, structure, style and content.", "bbox": {"l": 308.862, "t": 162.65515000000005, "r": 545.11511, "b": 171.56170999999995, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "Every synthetic dataset contains 150k examples, summing", "bbox": {"l": 308.862, "t": 174.61017000000004, "r": 545.11511, "b": 183.51671999999996, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "up to 600k synthetic examples. 
All datasets are divided into", "bbox": {"l": 308.862, "t": 186.56519000000003, "r": 545.11511, "b": 195.47173999999995, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "Train, Test and Val splits (80%, 10%, 10%).", "bbox": {"l": 308.862, "t": 198.52117999999996, "r": 484.07434, "b": 207.42773, "coord_origin": "TOPLEFT"}}]}, "text": "ances in regard to their size, structure, style and content. Every synthetic dataset contains 150k examples, summing up to 600k synthetic examples. All datasets are divided into Train, Test and Val splits (80%, 10%, 10%)."}, {"label": "text", "id": 9, "page_no": 10, "cluster": {"id": 9, "label": "text", "bbox": {"l": 307.9639892578125, "t": 210.19483947753906, "r": 545.11505, "b": 232.41128540039062, "coord_origin": "TOPLEFT"}, "confidence": 0.9261796474456787, "cells": [{"id": 55, "text": "The process of generating a synthetic dataset can be de-", "bbox": {"l": 320.81699, "t": 211.23517000000004, "r": 545.11505, "b": 220.14171999999996, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "composed into the following steps:", "bbox": {"l": 308.862, "t": 223.19019000000003, "r": 448.08939, "b": 232.09673999999995, "coord_origin": "TOPLEFT"}}]}, "text": "The process of generating a synthetic dataset can be decomposed into the following steps:"}, {"label": "list_item", "id": 10, "page_no": 10, "cluster": {"id": 10, "label": "list_item", "bbox": {"l": 308.30316162109375, "t": 234.8782196044922, "r": 545.11511, "b": 316.54279, "coord_origin": "TOPLEFT"}, "confidence": 0.9642953872680664, "cells": [{"id": 57, "text": "1.", "bbox": {"l": 320.81699, "t": 235.90521, "r": 328.28894, "b": 244.81177000000002, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Prepare styling and content templates: The styling", "bbox": {"l": 335.38232, "t": 235.90521, "r": 545.11499, "b": 244.81177000000002, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "templates have been manually designed and organized into", "bbox": {"l": 308.862, "t": 247.86023, "r": 545.11511, 
"b": 256.76678000000004, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "groups of scope specific appearances (e.g. financial data,", "bbox": {"l": 308.862, "t": 259.81525, "r": 545.11511, "b": 268.72180000000003, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "marketing data, etc.)", "bbox": {"l": 308.862, "t": 271.77026, "r": 393.3847, "b": 280.67682, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "Additionally, we have prepared cu-", "bbox": {"l": 400.11942, "t": 271.77026, "r": 545.11511, "b": 280.67682, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "rated collections of content templates by extracting the most", "bbox": {"l": 308.862, "t": 283.72524999999996, "r": 545.11505, "b": 292.63181, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "frequently used terms out of non-synthetic datasets (e.g.", "bbox": {"l": 308.862, "t": 295.68124, "r": 545.11511, "b": 304.5878000000001, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PubTabNet, FinTabNet, etc.).", "bbox": {"l": 308.862, "t": 307.63623, "r": 425.69348, "b": 316.54279, "coord_origin": "TOPLEFT"}}]}, "text": "1. Prepare styling and content templates: The styling templates have been manually designed and organized into groups of scope specific appearances (e.g. financial data, marketing data, etc.) Additionally, we have prepared curated collections of content templates by extracting the most frequently used terms out of non-synthetic datasets (e.g. 
PubTabNet, FinTabNet, etc.)."}, {"label": "list_item", "id": 11, "page_no": 10, "cluster": {"id": 11, "label": "list_item", "bbox": {"l": 307.91741943359375, "t": 319.1956481933594, "r": 545.44873046875, "b": 449.3292236328125, "coord_origin": "TOPLEFT"}, "confidence": 0.9699996113777161, "cells": [{"id": 66, "text": "2.", "bbox": {"l": 320.81699, "t": 320.35022, "r": 328.4949, "b": 329.25677, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "Generate table structures: The structure of each syn-", "bbox": {"l": 331.05423, "t": 320.35022, "r": 545.11499, "b": 329.25677, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "thetic dataset assumes a horizontal table header which po-", "bbox": {"l": 308.862, "t": 332.30521000000005, "r": 545.11517, "b": 341.21176, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "tentially spans over multiple rows and a table body that", "bbox": {"l": 308.862, "t": 344.26018999999997, "r": 545.11505, "b": 353.16675, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "may contain a combination of row spans and column spans.", "bbox": {"l": 308.862, "t": 356.21619, "r": 545.11511, "b": 365.12273999999996, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "However, spans are not allowed to cross the header - body", "bbox": {"l": 308.862, "t": 368.17117, "r": 545.11511, "b": 377.07773, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "boundary. 
The table structure is described by the parame-", "bbox": {"l": 308.862, "t": 380.12616, "r": 545.11499, "b": 389.03271, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "ters: Total number of table rows and columns, number of", "bbox": {"l": 308.862, "t": 392.08115, "r": 545.11517, "b": 400.98769999999996, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "header rows, type of spans (header only spans, row only", "bbox": {"l": 308.862, "t": 404.03613000000007, "r": 545.11511, "b": 412.94269, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "spans, column only spans, both row and column spans),", "bbox": {"l": 308.862, "t": 415.99112, "r": 545.11499, "b": 424.89767, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "maximum span size and the ratio of the table area covered", "bbox": {"l": 308.862, "t": 427.94711, "r": 545.11517, "b": 436.85367, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "by spans.", "bbox": {"l": 308.862, "t": 439.9021, "r": 345.94278, "b": 448.80865, "coord_origin": "TOPLEFT"}}]}, "text": "2. Generate table structures: The structure of each synthetic dataset assumes a horizontal table header which potentially spans over multiple rows and a table body that may contain a combination of row spans and column spans. However, spans are not allowed to cross the header - body boundary. 
The table structure is described by the parameters: Total number of table rows and columns, number of header rows, type of spans (header only spans, row only spans, column only spans, both row and column spans), maximum span size and the ratio of the table area covered by spans."}, {"label": "list_item", "id": 12, "page_no": 10, "cluster": {"id": 12, "label": "list_item", "bbox": {"l": 308.2115783691406, "t": 451.67333984375, "r": 545.591552734375, "b": 497.38861, "coord_origin": "TOPLEFT"}, "confidence": 0.9568929672241211, "cells": [{"id": 78, "text": "3.", "bbox": {"l": 320.81699, "t": 452.61609, "r": 328.30341, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "Generate content: Based on the dataset", "bbox": {"l": 330.79889, "t": 452.61609, "r": 485.75772000000006, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "theme", "bbox": {"l": 488.073, "t": 452.70575, "r": 511.86368, "b": 461.29352, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": ", a set of", "bbox": {"l": 511.86301, "t": 452.61609, "r": 545.10815, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "suitable content templates is chosen first. Then, this content", "bbox": {"l": 308.862, "t": 464.57108, "r": 545.11505, "b": 473.47763, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "can be combined with purely random text to produce the", "bbox": {"l": 308.862, "t": 476.52707, "r": 545.11517, "b": 485.43362, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "synthetic content.", "bbox": {"l": 308.862, "t": 488.48206, "r": 379.14816, "b": 497.38861, "coord_origin": "TOPLEFT"}}]}, "text": "3. Generate content: Based on the dataset theme , a set of suitable content templates is chosen first. 
Then, this content can be combined with purely random text to produce the synthetic content."}, {"label": "list_item", "id": 13, "page_no": 10, "cluster": {"id": 13, "label": "list_item", "bbox": {"l": 308.10528564453125, "t": 500.1474304199219, "r": 545.2609252929688, "b": 546.41064453125, "coord_origin": "TOPLEFT"}, "confidence": 0.9710659980773926, "cells": [{"id": 85, "text": "4.", "bbox": {"l": 320.81699, "t": 501.19604, "r": 328.66177, "b": 510.1026, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Apply styling templates: Depending on the domain", "bbox": {"l": 331.2767, "t": 501.19604, "r": 545.11493, "b": 510.1026, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "of the synthetic dataset, a set of styling templates is first", "bbox": {"l": 308.862, "t": 513.15103, "r": 545.1153, "b": 522.05759, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "manually selected.", "bbox": {"l": 308.862, "t": 525.10703, "r": 384.29883, "b": 534.01358, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Then, a style is randomly selected to", "bbox": {"l": 391.25272, "t": 525.10703, "r": 545.11511, "b": 534.01358, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "format the appearance of the synthesized table.", "bbox": {"l": 308.862, "t": 537.06203, "r": 496.15897000000007, "b": 545.96858, "coord_origin": "TOPLEFT"}}]}, "text": "4. Apply styling templates: Depending on the domain of the synthetic dataset, a set of styling templates is first manually selected. 
Then, a style is randomly selected to format the appearance of the synthesized table."}, {"label": "list_item", "id": 14, "page_no": 10, "cluster": {"id": 14, "label": "list_item", "bbox": {"l": 308.0560607910156, "t": 548.7080078125, "r": 545.3718872070312, "b": 607.4362182617188, "coord_origin": "TOPLEFT"}, "confidence": 0.9778757095336914, "cells": [{"id": 91, "text": "5.", "bbox": {"l": 320.81699, "t": 549.77603, "r": 328.28894, "b": 558.68259, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Render the complete tables: The synthetic table is", "bbox": {"l": 335.40222, "t": 549.77603, "r": 545.11499, "b": 558.68259, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "finally rendered by a web browser engine to generate the", "bbox": {"l": 308.862, "t": 561.73103, "r": 545.11517, "b": 570.63759, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "bounding boxes for each table cell. A batching technique is", "bbox": {"l": 308.862, "t": 573.68604, "r": 545.11511, "b": 582.59259, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "utilized to optimize the runtime overhead of the rendering", "bbox": {"l": 308.862, "t": 585.64203, "r": 545.11505, "b": 594.54858, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "process.", "bbox": {"l": 308.862, "t": 597.59703, "r": 341.2305, "b": 606.50359, "coord_origin": "TOPLEFT"}}]}, "text": "5. Render the complete tables: The synthetic table is finally rendered by a web browser engine to generate the bounding boxes for each table cell. 
A batching technique is utilized to optimize the runtime overhead of the rendering process."}, {"label": "section_header", "id": 15, "page_no": 10, "cluster": {"id": 15, "label": "section_header", "bbox": {"l": 307.9035339355469, "t": 621.6393432617188, "r": 545.10876, "b": 646.98631, "coord_origin": "TOPLEFT"}, "confidence": 0.9549407362937927, "cells": [{"id": 97, "text": "2.", "bbox": {"l": 308.862, "t": 622.2905900000001, "r": 316.76675, "b": 633.03831, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Prediction post-processing for PDF docu-", "bbox": {"l": 327.30643, "t": 622.2905900000001, "r": 545.10876, "b": 633.03831, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "ments", "bbox": {"l": 326.79501, "t": 636.2385899999999, "r": 357.34055, "b": 646.98631, "coord_origin": "TOPLEFT"}}]}, "text": "2. Prediction post-processing for PDF documents"}, {"label": "text", "id": 16, "page_no": 10, "cluster": {"id": 16, "label": "text", "bbox": {"l": 308.0598449707031, "t": 656.1874389648438, "r": 545.1201171875, "b": 714.419189453125, "coord_origin": "TOPLEFT"}, "confidence": 0.9829330444335938, "cells": [{"id": 100, "text": "Although TableFormer can predict the table structure and", "bbox": {"l": 320.81702, "t": 657.42104, "r": 545.11499, "b": 666.3276, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "the bounding boxes for tables recognized inside PDF docu-", "bbox": {"l": 308.86203, "t": 669.37604, "r": 545.11511, "b": 678.2826, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "ments, this is not enough when a full reconstruction of the", "bbox": {"l": 308.86203, "t": 681.33104, "r": 545.11517, "b": 690.2376, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "original table is required. 
This happens mainly due the fol-", "bbox": {"l": 308.86203, "t": 693.286041, "r": 545.11505, "b": 702.1926040000001, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "lowing reasons:", "bbox": {"l": 308.86203, "t": 705.242035, "r": 371.42719, "b": 714.148605, "coord_origin": "TOPLEFT"}}]}, "text": "Although TableFormer can predict the table structure and the bounding boxes for tables recognized inside PDF documents, this is not enough when a full reconstruction of the original table is required. This happens mainly due the following reasons:"}, {"label": "page_footer", "id": 17, "page_no": 10, "cluster": {"id": 17, "label": "page_footer", "bbox": {"l": 292.63104, "t": 733.1739501953125, "r": 302.59363, "b": 743.0396, "coord_origin": "TOPLEFT"}, "confidence": 0.8986664414405823, "cells": [{"id": 105, "text": "11", "bbox": {"l": 292.63104, "t": 734.1330379999999, "r": 302.59363, "b": 743.0396, "coord_origin": "TOPLEFT"}}]}, "text": "11"}], "body": [{"label": "section_header", "id": 0, "page_no": 10, "cluster": {"id": 0, "label": "section_header", "bbox": {"l": 132.30833435058594, "t": 109.52092742919922, "r": 465.6390075683594, "b": 135.91281127929688, "coord_origin": "TOPLEFT"}, "confidence": 0.6497087478637695, "cells": [{"id": 0, "text": "TableFormer: Table Structure Understanding with Transformers", "bbox": {"l": 132.842, "t": 110.57488999999998, "r": 465.37591999999995, "b": 121.32263, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Supplementary Material", "bbox": {"l": 220.18399, "t": 122.25982999999997, "r": 375.04269, "b": 135.53008999999997, "coord_origin": "TOPLEFT"}}]}, "text": "TableFormer: Table Structure Understanding with Transformers Supplementary Material"}, {"label": "section_header", "id": 1, "page_no": 10, "cluster": {"id": 1, "label": "section_header", "bbox": {"l": 50.058563232421875, "t": 160.34249877929688, "r": 175.96437, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}, "confidence": 0.9454860091209412, "cells": [{"id": 2, "text": "1.", 
"bbox": {"l": 50.111984, "t": 161.16089, "r": 57.089828, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Details on the datasets", "bbox": {"l": 66.393616, "t": 161.16089, "r": 175.96437, "b": 171.90863000000002, "coord_origin": "TOPLEFT"}}]}, "text": "1. Details on the datasets"}, {"label": "section_header", "id": 2, "page_no": 10, "cluster": {"id": 2, "label": "section_header", "bbox": {"l": 49.89580154418945, "t": 180.4131622314453, "r": 150.36401, "b": 191.2897491455078, "coord_origin": "TOPLEFT"}, "confidence": 0.9535645246505737, "cells": [{"id": 4, "text": "1.1.", "bbox": {"l": 50.111984, "t": 180.97931000000005, "r": 64.210808, "b": 190.83136000000002, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Data preparation", "bbox": {"l": 73.610023, "t": 180.97931000000005, "r": 150.36401, "b": 190.83136000000002, "coord_origin": "TOPLEFT"}}]}, "text": "1.1. Data preparation"}, {"label": "text", "id": 3, "page_no": 10, "cluster": {"id": 3, "label": "text", "bbox": {"l": 49.297401428222656, "t": 198.91737365722656, "r": 286.80126953125, "b": 388.41900634765625, "coord_origin": "TOPLEFT"}, "confidence": 0.9864971041679382, "cells": [{"id": 6, "text": "As a first step of our data preparation process, we have", "bbox": {"l": 62.06698600000001, "t": 199.92029000000002, "r": 286.36496, "b": 208.82683999999995, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "calculated statistics over the datasets across the following", "bbox": {"l": 50.111984, "t": 211.87627999999995, "r": 286.36505, "b": 220.78283999999996, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "dimensions: (1) table size measured in the number of rows", "bbox": {"l": 50.111984, "t": 223.83130000000006, "r": 286.36514, "b": 232.73784999999998, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "and columns, (2) complexity of the table, (3) strictness of", "bbox": {"l": 50.111984, "t": 235.78632000000005, "r": 286.36508, "b": 244.69286999999997, "coord_origin": "TOPLEFT"}}, {"id": 10, 
"text": "the provided HTML structure and (4) completeness (i.e. no", "bbox": {"l": 50.111984, "t": 247.74132999999995, "r": 286.36505, "b": 256.64788999999996, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "omitted bounding boxes). A table is considered to be simple", "bbox": {"l": 50.111984, "t": 259.69635000000005, "r": 286.36505, "b": 268.60290999999995, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "if it does not contain row spans or column spans. Addition-", "bbox": {"l": 50.111984, "t": 271.65137000000004, "r": 286.36505, "b": 280.55792, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "ally, a table has a strict HTML structure if every row has the", "bbox": {"l": 50.111984, "t": 283.60736, "r": 286.36502, "b": 292.5139199999999, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "same number of columns after taking into account any row", "bbox": {"l": 50.111984, "t": 295.56235, "r": 286.36505, "b": 304.4689, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "or column spans. Therefore a strict HTML structure looks", "bbox": {"l": 50.111984, "t": 307.5173300000001, "r": 286.36508, "b": 316.42389, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "always rectangular. However, HTML is a lenient encoding", "bbox": {"l": 50.111984, "t": 319.47232, "r": 286.36505, "b": 328.3788799999999, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "format, i.e. tables with rows of different sizes might still", "bbox": {"l": 50.111984, "t": 331.42731000000003, "r": 286.36502, "b": 340.33386, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "be regarded as correct due to implicit display rules. These", "bbox": {"l": 50.111984, "t": 343.3833, "r": 286.36508, "b": 352.28986, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "implicit rules leave room for ambiguity, which we want to", "bbox": {"l": 50.111984, "t": 355.33829, "r": 286.36505, "b": 364.24484000000007, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "avoid. As such, we prefer to have \u201dstrict\u201d tables, i.e. 
tables", "bbox": {"l": 50.111984, "t": 367.29327, "r": 286.36508, "b": 376.19983, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "where every row has exactly the same length.", "bbox": {"l": 50.111984, "t": 379.24826, "r": 230.80364999999998, "b": 388.15482000000003, "coord_origin": "TOPLEFT"}}]}, "text": "As a first step of our data preparation process, we have calculated statistics over the datasets across the following dimensions: (1) table size measured in the number of rows and columns, (2) complexity of the table, (3) strictness of the provided HTML structure and (4) completeness (i.e. no omitted bounding boxes). A table is considered to be simple if it does not contain row spans or column spans. Additionally, a table has a strict HTML structure if every row has the same number of columns after taking into account any row or column spans. Therefore a strict HTML structure looks always rectangular. However, HTML is a lenient encoding format, i.e. tables with rows of different sizes might still be regarded as correct due to implicit display rules. These implicit rules leave room for ambiguity, which we want to avoid. As such, we prefer to have \u201dstrict\u201d tables, i.e. tables where every row has exactly the same length."}, {"label": "text", "id": 4, "page_no": 10, "cluster": {"id": 4, "label": "text", "bbox": {"l": 49.358585357666016, "t": 390.24591064453125, "r": 286.73260498046875, "b": 628.0463256835938, "coord_origin": "TOPLEFT"}, "confidence": 0.9826022386550903, "cells": [{"id": 22, "text": "We have developed a technique that tries to derive a", "bbox": {"l": 62.06698600000001, "t": 391.40527, "r": 286.36499, "b": 400.31183, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "missing bounding box out of its neighbors. 
As a first step,", "bbox": {"l": 50.111984, "t": 403.36026, "r": 286.36508, "b": 412.26681999999994, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "we use the annotation data to generate the most fine-grained", "bbox": {"l": 50.111984, "t": 415.31525, "r": 286.36505, "b": 424.22180000000003, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "grid that covers the table structure. In case of strict HTML", "bbox": {"l": 50.111984, "t": 427.2712399999999, "r": 286.36505, "b": 436.1778, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "tables, all grid squares are associated with some table cell", "bbox": {"l": 50.111984, "t": 439.22623, "r": 286.36508, "b": 448.1327800000001, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "and in the presence of table spans a cell extends across mul-", "bbox": {"l": 50.111984, "t": 451.18121, "r": 286.36511, "b": 460.08777, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "tiple grid squares. When enough bounding boxes are known", "bbox": {"l": 50.111984, "t": 463.1362, "r": 286.36505, "b": 472.04276, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "for a rectangular table, it is possible to compute the geo-", "bbox": {"l": 50.111984, "t": 475.09119, "r": 286.36508, "b": 483.99774, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "metrical border lines between the grid rows and columns.", "bbox": {"l": 50.111984, "t": 487.04617, "r": 286.36502, "b": 495.95273, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "Eventually this information is used to generate the missing", "bbox": {"l": 50.111984, "t": 499.00217, "r": 286.36511, "b": 507.90872, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "bounding boxes. 
Additionally, the existence of unused grid", "bbox": {"l": 50.111984, "t": 510.95715, "r": 286.36508, "b": 519.8637100000001, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "squares indicates that the table rows have unequal number", "bbox": {"l": 50.111984, "t": 522.91214, "r": 286.36508, "b": 531.8187, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "of columns and the overall structure is non-strict. The gen-", "bbox": {"l": 50.111984, "t": 534.86713, "r": 286.36505, "b": 543.7737, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "eration of missing bounding boxes for non-strict HTML ta-", "bbox": {"l": 50.111984, "t": 546.82214, "r": 286.36502, "b": 555.7287, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "bles is ambiguous and therefore quite challenging.", "bbox": {"l": 50.111984, "t": 558.77814, "r": 257.47351, "b": 567.68469, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Thus,", "bbox": {"l": 263.94919, "t": 558.77814, "r": 286.36505, "b": 567.68469, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "we have decided to simply discard those tables. In case of", "bbox": {"l": 50.111984, "t": 570.73314, "r": 286.36508, "b": 579.63969, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "PubTabNet we have computed missing bounding boxes for", "bbox": {"l": 50.111984, "t": 582.68814, "r": 286.36511, "b": 591.5947, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "48% of the simple and 69% of the complex tables. 
Regard-", "bbox": {"l": 50.111984, "t": 594.64314, "r": 286.36511, "b": 603.5497, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "ing FinTabNet, 68% of the simple and 98% of the complex", "bbox": {"l": 50.111984, "t": 606.5981400000001, "r": 286.36505, "b": 615.5047, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "tables require the generation of bounding boxes.", "bbox": {"l": 50.111984, "t": 618.55315, "r": 242.2606, "b": 627.4597, "coord_origin": "TOPLEFT"}}]}, "text": "We have developed a technique that tries to derive a missing bounding box out of its neighbors. As a first step, we use the annotation data to generate the most fine-grained grid that covers the table structure. In case of strict HTML tables, all grid squares are associated with some table cell and in the presence of table spans a cell extends across multiple grid squares. When enough bounding boxes are known for a rectangular table, it is possible to compute the geometrical border lines between the grid rows and columns. Eventually this information is used to generate the missing bounding boxes. Additionally, the existence of unused grid squares indicates that the table rows have unequal number of columns and the overall structure is non-strict. The generation of missing bounding boxes for non-strict HTML tables is ambiguous and therefore quite challenging. Thus, we have decided to simply discard those tables. In case of PubTabNet we have computed missing bounding boxes for 48% of the simple and 69% of the complex tables. 
Regarding FinTabNet, 68% of the simple and 98% of the complex tables require the generation of bounding boxes."}, {"label": "text", "id": 5, "page_no": 10, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.51209259033203, "t": 629.6376342773438, "r": 286.36496, "b": 651.6802978515625, "coord_origin": "TOPLEFT"}, "confidence": 0.9335850477218628, "cells": [{"id": 43, "text": "Figure 7 illustrates the distribution of the tables across", "bbox": {"l": 62.06698600000001, "t": 630.71014, "r": 286.36496, "b": 639.6167, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "different dimensions per dataset.", "bbox": {"l": 50.111984, "t": 642.66614, "r": 179.90472, "b": 651.57269, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 7 illustrates the distribution of the tables across different dimensions per dataset."}, {"label": "section_header", "id": 6, "page_no": 10, "cluster": {"id": 6, "label": "section_header", "bbox": {"l": 50.04091262817383, "t": 661.6310424804688, "r": 153.79356384277344, "b": 672.540283203125, "coord_origin": "TOPLEFT"}, "confidence": 0.9515743255615234, "cells": [{"id": 45, "text": "1.2.", "bbox": {"l": 50.111984, "t": 662.39014, "r": 64.297272, "b": 672.24219, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "Synthetic datasets", "bbox": {"l": 73.754135, "t": 662.39014, "r": 153.60785, "b": 672.24219, "coord_origin": "TOPLEFT"}}]}, "text": "1.2. 
Synthetic datasets"}, {"label": "text", "id": 7, "page_no": 10, "cluster": {"id": 7, "label": "text", "bbox": {"l": 49.59552001953125, "t": 680.3189086914062, "r": 287.0692443847656, "b": 714.6337890625, "coord_origin": "TOPLEFT"}, "confidence": 0.9783331751823425, "cells": [{"id": 47, "text": "Aiming to train and evaluate our models in a broader", "bbox": {"l": 62.06698600000001, "t": 681.33113, "r": 286.36493, "b": 690.2377, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "spectrum of table data we have synthesized four types of", "bbox": {"l": 50.111984, "t": 693.2861330000001, "r": 286.36505, "b": 702.1927029999999, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "datasets.", "bbox": {"l": 50.111984, "t": 705.241135, "r": 84.144226, "b": 714.147705, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "Each one contains tables with different appear-", "bbox": {"l": 91.237595, "t": 705.241135, "r": 286.36505, "b": 714.147705, "coord_origin": "TOPLEFT"}}]}, "text": "Aiming to train and evaluate our models in a broader spectrum of table data we have synthesized four types of datasets. Each one contains tables with different appear-"}, {"label": "text", "id": 8, "page_no": 10, "cluster": {"id": 8, "label": "text", "bbox": {"l": 307.9798583984375, "t": 161.70252990722656, "r": 545.11511, "b": 207.59095764160156, "coord_origin": "TOPLEFT"}, "confidence": 0.9596063494682312, "cells": [{"id": 51, "text": "ances in regard to their size, structure, style and content.", "bbox": {"l": 308.862, "t": 162.65515000000005, "r": 545.11511, "b": 171.56170999999995, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "Every synthetic dataset contains 150k examples, summing", "bbox": {"l": 308.862, "t": 174.61017000000004, "r": 545.11511, "b": 183.51671999999996, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "up to 600k synthetic examples. 
All datasets are divided into", "bbox": {"l": 308.862, "t": 186.56519000000003, "r": 545.11511, "b": 195.47173999999995, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "Train, Test and Val splits (80%, 10%, 10%).", "bbox": {"l": 308.862, "t": 198.52117999999996, "r": 484.07434, "b": 207.42773, "coord_origin": "TOPLEFT"}}]}, "text": "ances in regard to their size, structure, style and content. Every synthetic dataset contains 150k examples, summing up to 600k synthetic examples. All datasets are divided into Train, Test and Val splits (80%, 10%, 10%)."}, {"label": "text", "id": 9, "page_no": 10, "cluster": {"id": 9, "label": "text", "bbox": {"l": 307.9639892578125, "t": 210.19483947753906, "r": 545.11505, "b": 232.41128540039062, "coord_origin": "TOPLEFT"}, "confidence": 0.9261796474456787, "cells": [{"id": 55, "text": "The process of generating a synthetic dataset can be de-", "bbox": {"l": 320.81699, "t": 211.23517000000004, "r": 545.11505, "b": 220.14171999999996, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "composed into the following steps:", "bbox": {"l": 308.862, "t": 223.19019000000003, "r": 448.08939, "b": 232.09673999999995, "coord_origin": "TOPLEFT"}}]}, "text": "The process of generating a synthetic dataset can be decomposed into the following steps:"}, {"label": "list_item", "id": 10, "page_no": 10, "cluster": {"id": 10, "label": "list_item", "bbox": {"l": 308.30316162109375, "t": 234.8782196044922, "r": 545.11511, "b": 316.54279, "coord_origin": "TOPLEFT"}, "confidence": 0.9642953872680664, "cells": [{"id": 57, "text": "1.", "bbox": {"l": 320.81699, "t": 235.90521, "r": 328.28894, "b": 244.81177000000002, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "Prepare styling and content templates: The styling", "bbox": {"l": 335.38232, "t": 235.90521, "r": 545.11499, "b": 244.81177000000002, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "templates have been manually designed and organized into", "bbox": {"l": 308.862, "t": 247.86023, "r": 545.11511, 
"b": 256.76678000000004, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "groups of scope specific appearances (e.g. financial data,", "bbox": {"l": 308.862, "t": 259.81525, "r": 545.11511, "b": 268.72180000000003, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "marketing data, etc.)", "bbox": {"l": 308.862, "t": 271.77026, "r": 393.3847, "b": 280.67682, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "Additionally, we have prepared cu-", "bbox": {"l": 400.11942, "t": 271.77026, "r": 545.11511, "b": 280.67682, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "rated collections of content templates by extracting the most", "bbox": {"l": 308.862, "t": 283.72524999999996, "r": 545.11505, "b": 292.63181, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "frequently used terms out of non-synthetic datasets (e.g.", "bbox": {"l": 308.862, "t": 295.68124, "r": 545.11511, "b": 304.5878000000001, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "PubTabNet, FinTabNet, etc.).", "bbox": {"l": 308.862, "t": 307.63623, "r": 425.69348, "b": 316.54279, "coord_origin": "TOPLEFT"}}]}, "text": "1. Prepare styling and content templates: The styling templates have been manually designed and organized into groups of scope specific appearances (e.g. financial data, marketing data, etc.) Additionally, we have prepared curated collections of content templates by extracting the most frequently used terms out of non-synthetic datasets (e.g. 
PubTabNet, FinTabNet, etc.)."}, {"label": "list_item", "id": 11, "page_no": 10, "cluster": {"id": 11, "label": "list_item", "bbox": {"l": 307.91741943359375, "t": 319.1956481933594, "r": 545.44873046875, "b": 449.3292236328125, "coord_origin": "TOPLEFT"}, "confidence": 0.9699996113777161, "cells": [{"id": 66, "text": "2.", "bbox": {"l": 320.81699, "t": 320.35022, "r": 328.4949, "b": 329.25677, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "Generate table structures: The structure of each syn-", "bbox": {"l": 331.05423, "t": 320.35022, "r": 545.11499, "b": 329.25677, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "thetic dataset assumes a horizontal table header which po-", "bbox": {"l": 308.862, "t": 332.30521000000005, "r": 545.11517, "b": 341.21176, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "tentially spans over multiple rows and a table body that", "bbox": {"l": 308.862, "t": 344.26018999999997, "r": 545.11505, "b": 353.16675, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "may contain a combination of row spans and column spans.", "bbox": {"l": 308.862, "t": 356.21619, "r": 545.11511, "b": 365.12273999999996, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "However, spans are not allowed to cross the header - body", "bbox": {"l": 308.862, "t": 368.17117, "r": 545.11511, "b": 377.07773, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "boundary. 
The table structure is described by the parame-", "bbox": {"l": 308.862, "t": 380.12616, "r": 545.11499, "b": 389.03271, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "ters: Total number of table rows and columns, number of", "bbox": {"l": 308.862, "t": 392.08115, "r": 545.11517, "b": 400.98769999999996, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "header rows, type of spans (header only spans, row only", "bbox": {"l": 308.862, "t": 404.03613000000007, "r": 545.11511, "b": 412.94269, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "spans, column only spans, both row and column spans),", "bbox": {"l": 308.862, "t": 415.99112, "r": 545.11499, "b": 424.89767, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "maximum span size and the ratio of the table area covered", "bbox": {"l": 308.862, "t": 427.94711, "r": 545.11517, "b": 436.85367, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "by spans.", "bbox": {"l": 308.862, "t": 439.9021, "r": 345.94278, "b": 448.80865, "coord_origin": "TOPLEFT"}}]}, "text": "2. Generate table structures: The structure of each synthetic dataset assumes a horizontal table header which potentially spans over multiple rows and a table body that may contain a combination of row spans and column spans. However, spans are not allowed to cross the header - body boundary. 
The table structure is described by the parameters: Total number of table rows and columns, number of header rows, type of spans (header only spans, row only spans, column only spans, both row and column spans), maximum span size and the ratio of the table area covered by spans."}, {"label": "list_item", "id": 12, "page_no": 10, "cluster": {"id": 12, "label": "list_item", "bbox": {"l": 308.2115783691406, "t": 451.67333984375, "r": 545.591552734375, "b": 497.38861, "coord_origin": "TOPLEFT"}, "confidence": 0.9568929672241211, "cells": [{"id": 78, "text": "3.", "bbox": {"l": 320.81699, "t": 452.61609, "r": 328.30341, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "Generate content: Based on the dataset", "bbox": {"l": 330.79889, "t": 452.61609, "r": 485.75772000000006, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "theme", "bbox": {"l": 488.073, "t": 452.70575, "r": 511.86368, "b": 461.29352, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": ", a set of", "bbox": {"l": 511.86301, "t": 452.61609, "r": 545.10815, "b": 461.52264, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "suitable content templates is chosen first. Then, this content", "bbox": {"l": 308.862, "t": 464.57108, "r": 545.11505, "b": 473.47763, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "can be combined with purely random text to produce the", "bbox": {"l": 308.862, "t": 476.52707, "r": 545.11517, "b": 485.43362, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "synthetic content.", "bbox": {"l": 308.862, "t": 488.48206, "r": 379.14816, "b": 497.38861, "coord_origin": "TOPLEFT"}}]}, "text": "3. Generate content: Based on the dataset theme , a set of suitable content templates is chosen first. 
Then, this content can be combined with purely random text to produce the synthetic content."}, {"label": "list_item", "id": 13, "page_no": 10, "cluster": {"id": 13, "label": "list_item", "bbox": {"l": 308.10528564453125, "t": 500.1474304199219, "r": 545.2609252929688, "b": 546.41064453125, "coord_origin": "TOPLEFT"}, "confidence": 0.9710659980773926, "cells": [{"id": 85, "text": "4.", "bbox": {"l": 320.81699, "t": 501.19604, "r": 328.66177, "b": 510.1026, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "Apply styling templates: Depending on the domain", "bbox": {"l": 331.2767, "t": 501.19604, "r": 545.11493, "b": 510.1026, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "of the synthetic dataset, a set of styling templates is first", "bbox": {"l": 308.862, "t": 513.15103, "r": 545.1153, "b": 522.05759, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "manually selected.", "bbox": {"l": 308.862, "t": 525.10703, "r": 384.29883, "b": 534.01358, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Then, a style is randomly selected to", "bbox": {"l": 391.25272, "t": 525.10703, "r": 545.11511, "b": 534.01358, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "format the appearance of the synthesized table.", "bbox": {"l": 308.862, "t": 537.06203, "r": 496.15897000000007, "b": 545.96858, "coord_origin": "TOPLEFT"}}]}, "text": "4. Apply styling templates: Depending on the domain of the synthetic dataset, a set of styling templates is first manually selected. 
Then, a style is randomly selected to format the appearance of the synthesized table."}, {"label": "list_item", "id": 14, "page_no": 10, "cluster": {"id": 14, "label": "list_item", "bbox": {"l": 308.0560607910156, "t": 548.7080078125, "r": 545.3718872070312, "b": 607.4362182617188, "coord_origin": "TOPLEFT"}, "confidence": 0.9778757095336914, "cells": [{"id": 91, "text": "5.", "bbox": {"l": 320.81699, "t": 549.77603, "r": 328.28894, "b": 558.68259, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "Render the complete tables: The synthetic table is", "bbox": {"l": 335.40222, "t": 549.77603, "r": 545.11499, "b": 558.68259, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "finally rendered by a web browser engine to generate the", "bbox": {"l": 308.862, "t": 561.73103, "r": 545.11517, "b": 570.63759, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "bounding boxes for each table cell. A batching technique is", "bbox": {"l": 308.862, "t": 573.68604, "r": 545.11511, "b": 582.59259, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "utilized to optimize the runtime overhead of the rendering", "bbox": {"l": 308.862, "t": 585.64203, "r": 545.11505, "b": 594.54858, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "process.", "bbox": {"l": 308.862, "t": 597.59703, "r": 341.2305, "b": 606.50359, "coord_origin": "TOPLEFT"}}]}, "text": "5. Render the complete tables: The synthetic table is finally rendered by a web browser engine to generate the bounding boxes for each table cell. 
A batching technique is utilized to optimize the runtime overhead of the rendering process."}, {"label": "section_header", "id": 15, "page_no": 10, "cluster": {"id": 15, "label": "section_header", "bbox": {"l": 307.9035339355469, "t": 621.6393432617188, "r": 545.10876, "b": 646.98631, "coord_origin": "TOPLEFT"}, "confidence": 0.9549407362937927, "cells": [{"id": 97, "text": "2.", "bbox": {"l": 308.862, "t": 622.2905900000001, "r": 316.76675, "b": 633.03831, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "Prediction post-processing for PDF docu-", "bbox": {"l": 327.30643, "t": 622.2905900000001, "r": 545.10876, "b": 633.03831, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "ments", "bbox": {"l": 326.79501, "t": 636.2385899999999, "r": 357.34055, "b": 646.98631, "coord_origin": "TOPLEFT"}}]}, "text": "2. Prediction post-processing for PDF documents"}, {"label": "text", "id": 16, "page_no": 10, "cluster": {"id": 16, "label": "text", "bbox": {"l": 308.0598449707031, "t": 656.1874389648438, "r": 545.1201171875, "b": 714.419189453125, "coord_origin": "TOPLEFT"}, "confidence": 0.9829330444335938, "cells": [{"id": 100, "text": "Although TableFormer can predict the table structure and", "bbox": {"l": 320.81702, "t": 657.42104, "r": 545.11499, "b": 666.3276, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "the bounding boxes for tables recognized inside PDF docu-", "bbox": {"l": 308.86203, "t": 669.37604, "r": 545.11511, "b": 678.2826, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "ments, this is not enough when a full reconstruction of the", "bbox": {"l": 308.86203, "t": 681.33104, "r": 545.11517, "b": 690.2376, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "original table is required. 
This happens mainly due the fol-", "bbox": {"l": 308.86203, "t": 693.286041, "r": 545.11505, "b": 702.1926040000001, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "lowing reasons:", "bbox": {"l": 308.86203, "t": 705.242035, "r": 371.42719, "b": 714.148605, "coord_origin": "TOPLEFT"}}]}, "text": "Although TableFormer can predict the table structure and the bounding boxes for tables recognized inside PDF documents, this is not enough when a full reconstruction of the original table is required. This happens mainly due the following reasons:"}], "headers": [{"label": "page_footer", "id": 17, "page_no": 10, "cluster": {"id": 17, "label": "page_footer", "bbox": {"l": 292.63104, "t": 733.1739501953125, "r": 302.59363, "b": 743.0396, "coord_origin": "TOPLEFT"}, "confidence": 0.8986664414405823, "cells": [{"id": 105, "text": "11", "bbox": {"l": 292.63104, "t": 734.1330379999999, "r": 302.59363, "b": 743.0396, "coord_origin": "TOPLEFT"}}]}, "text": "11"}]}}, {"page_no": 11, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "PubTabNet", "bbox": {"l": 119.39108, "t": 77.31055000000003, "r": 151.94641, "b": 83.25922000000003, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "b.", "bbox": {"l": 53.345978, "t": 75.19152999999994, "r": 59.327053, "b": 81.14020000000005, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "FinTabNet", "bbox": {"l": 289.5791, "t": 77.45830999999998, "r": 319.8266, "b": 83.40698000000009, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Table Bank", "bbox": {"l": 448.37271, "t": 77.25396999999987, "r": 481.75916, "b": 83.20263999999997, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "Train", "bbox": {"l": 82.553436, "t": 141.27617999999995, "r": 94.976013, "b": 146.23339999999996, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Complex", "bbox": {"l": 63.03878399999999, "t": 101.10413000000005, "r": 85.290085, "b": 106.06133999999986, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "Simple", "bbox": {"l": 67.76786, "t": 
124.39531999999997, "r": 85.231277, "b": 129.35253999999998, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Complex", "bbox": {"l": 227.55121, "t": 102.53992000000005, "r": 249.80251, "b": 107.49712999999997, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "Simple", "bbox": {"l": 232.19898999999998, "t": 126.98577999999986, "r": 249.66241, "b": 131.94299, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Simple", "bbox": {"l": 396.2337, "t": 114.04522999999995, "r": 413.69711, "b": 119.00243999999998, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "Val", "bbox": {"l": 97.382202, "t": 141.27617999999995, "r": 105.08014, "b": 146.23339999999996, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "100%", "bbox": {"l": 60.93763400000001, "t": 85.73321999999996, "r": 76.151443, "b": 90.69042999999999, "coord_origin": "TOPLEFT"}}, {"id": 12, "text": "500K 10K", "bbox": {"l": 82.304901, "t": 86.22351000000003, "r": 106.99162, "b": 91.18073000000015, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Train Test Val", "bbox": {"l": 246.20530999999997, "t": 141.60608000000002, "r": 281.88013, "b": 146.56329000000005, "coord_origin": "TOPLEFT"}}, {"id": 14, "text": "100%", "bbox": {"l": 226.69780000000003, "t": 85.73321999999996, "r": 241.91161, "b": 90.69042999999999, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "91K 10K 10K", "bbox": {"l": 249.93848999999997, "t": 86.08801000000005, "r": 282.49384, "b": 91.04522999999995, "coord_origin": "TOPLEFT"}}, {"id": 16, "text": "Train Test Val", "bbox": {"l": 410.19409, "t": 141.27617999999995, "r": 444.68915, "b": 146.23339999999996, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "100% 130K 5K", "bbox": {"l": 391.37341, "t": 85.73321999999996, "r": 432.6716599999999, "b": 90.69042999999999, "coord_origin": "TOPLEFT"}}, {"id": 18, "text": "10K", "bbox": {"l": 435.60571000000004, "t": 86.26140999999996, "r": 445.62414999999993, "b": 91.21862999999996, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Complex", "bbox": {"l": 113.94921, 
"t": 141.28845, "r": 136.20052, "b": 146.24567000000002, "coord_origin": "TOPLEFT"}}, {"id": 20, "text": "Non", "bbox": {"l": 116.91554000000001, "t": 94.81853999999998, "r": 127.05433999999998, "b": 99.77575999999999, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Strict", "bbox": {"l": 113.3146, "t": 100.93853999999999, "r": 127.05298, "b": 105.89575000000002, "coord_origin": "TOPLEFT"}}, {"id": 22, "text": "HTML", "bbox": {"l": 112.94112, "t": 107.05853000000013, "r": 127.05537, "b": 112.01575000000003, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "Strict", "bbox": {"l": 113.22738999999999, "t": 122.61523, "r": 126.96577, "b": 127.57245, "coord_origin": "TOPLEFT"}}, {"id": 24, "text": "HTML", "bbox": {"l": 112.85390000000001, "t": 128.73523, "r": 126.96814999999998, "b": 133.69244000000003, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Simple", "bbox": {"l": 138.57864, "t": 141.43640000000005, "r": 156.04207, "b": 146.39362000000006, "coord_origin": "TOPLEFT"}}, {"id": 26, "text": "230K 280K", "bbox": {"l": 122.03101, "t": 86.2713, "r": 151.04185, "b": 91.22852, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "65K", "bbox": {"l": 311.65359, "t": 86.55498999999998, "r": 321.67203, "b": 91.5122100000001, "coord_origin": "TOPLEFT"}}, {"id": 28, "text": "Complex", "bbox": {"l": 287.89441, "t": 141.71063000000004, "r": 310.14572, "b": 146.66785000000004, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Non", "bbox": {"l": 289.23572, "t": 93.07977000000005, "r": 299.37451, "b": 98.03698999999995, "coord_origin": "TOPLEFT"}}, {"id": 30, "text": "Strict", "bbox": {"l": 285.63513, "t": 99.19976999999994, "r": 299.3735, "b": 104.15698000000009, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "HTML", "bbox": {"l": 285.26111, "t": 105.31975999999997, "r": 299.37537, "b": 110.27697999999998, "coord_origin": "TOPLEFT"}}, {"id": 32, "text": "Strict", "bbox": {"l": 285.43109, "t": 120.38995, "r": 299.16946, "b": 125.34717, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": 
"HTML", "bbox": {"l": 285.05713, "t": 126.50995, "r": 299.17139, "b": 131.46716000000004, "coord_origin": "TOPLEFT"}}, {"id": 34, "text": "Simple", "bbox": {"l": 311.34592, "t": 141.71063000000004, "r": 328.80933, "b": 146.66785000000004, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "47K", "bbox": {"l": 299.58362, "t": 86.69353999999998, "r": 309.60205, "b": 91.65075999999999, "coord_origin": "TOPLEFT"}}, {"id": 36, "text": "Simple", "bbox": {"l": 466.04077000000007, "t": 141.67169, "r": 483.50418, "b": 146.62891000000002, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Non", "bbox": {"l": 459.02151, "t": 93.76116999999999, "r": 469.16031000000004, "b": 98.71838000000002, "coord_origin": "TOPLEFT"}}, {"id": 38, "text": "Strict", "bbox": {"l": 455.4209, "t": 99.88116000000002, "r": 469.15927000000005, "b": 104.83838000000003, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "HTML", "bbox": {"l": 455.04691, "t": 106.00116000000014, "r": 469.16115999999994, "b": 110.95836999999995, "coord_origin": "TOPLEFT"}}, {"id": 40, "text": "145K", "bbox": {"l": 467.39401, "t": 85.57239000000004, "r": 480.6545100000001, "b": 90.52959999999996, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Complex", "bbox": {"l": 160.37672, "t": 141.58385999999996, "r": 182.62802, "b": 146.54107999999997, "coord_origin": "TOPLEFT"}}, {"id": 42, "text": "Contain", "bbox": {"l": 153.74265, "t": 94.86481000000003, "r": 173.32664, "b": 99.82201999999995, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "Missing", "bbox": {"l": 154.50967, "t": 100.98479999999995, "r": 173.3246, "b": 105.94202000000007, "coord_origin": "TOPLEFT"}}, {"id": 44, "text": "bboxes", "bbox": {"l": 155.27162, "t": 107.10479999999995, "r": 173.32664, "b": 112.06200999999987, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Contain", "bbox": {"l": 326.41302, "t": 107.23248000000001, "r": 345.99701, "b": 112.18970000000002, "coord_origin": "TOPLEFT"}}, {"id": 46, "text": "Missing", "bbox": {"l": 327.17972, "t": 
113.35248000000001, "r": 345.99463, "b": 118.30969000000005, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "bboxes", "bbox": {"l": 327.94131, "t": 119.47247000000004, "r": 345.99634, "b": 124.42969000000005, "coord_origin": "TOPLEFT"}}, {"id": 48, "text": "Dataset", "bbox": {"l": 488.9942, "t": 104.15374999999983, "r": 508.76384999999993, "b": 109.11095999999998, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "doesn't", "bbox": {"l": 490.1893, "t": 110.27373999999998, "r": 508.76349000000005, "b": 115.2309600000001, "coord_origin": "TOPLEFT"}}, {"id": 50, "text": "provide", "bbox": {"l": 489.72009, "t": 116.39373999999998, "r": 508.76758, "b": 121.35095000000013, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "bboxes", "bbox": {"l": 490.71121, "t": 122.51373000000001, "r": 508.76624, "b": 127.47095000000002, "coord_origin": "TOPLEFT"}}, {"id": 52, "text": "Simple", "bbox": {"l": 185.37759, "t": 141.71118, "r": 202.84102, "b": 146.66840000000002, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "230K 280K", "bbox": {"l": 168.50357, "t": 86.13611000000003, "r": 197.52699, "b": 91.09331999999995, "coord_origin": "TOPLEFT"}}, {"id": 54, "text": "65K", "bbox": {"l": 357.3768, "t": 85.99707000000001, "r": 367.39523, "b": 90.95428000000004, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "Complex Simple", "bbox": {"l": 333.73151, "t": 141.62323000000004, "r": 374.92862, "b": 146.58043999999995, "coord_origin": "TOPLEFT"}}, {"id": 56, "text": "47K", "bbox": {"l": 345.69101, "t": 86.05591000000004, "r": 355.70944, "b": 91.01312000000007, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Simple", "bbox": {"l": 508.54248, "t": 141.37683000000004, "r": 526.00592, "b": 146.33405000000005, "coord_origin": "TOPLEFT"}}, {"id": 58, "text": "145K", "bbox": {"l": 510.44653000000005, "t": 86.09258999999986, "r": 523.70703, "b": 91.0498, "coord_origin": "TOPLEFT"}}, {"id": 59, "text": "Figure 7: Distribution of the tables across different dimensions per dataset. 
Simple vs complex tables per dataset and split,", "bbox": {"l": 50.112, "t": 165.50238000000002, "r": 545.11371, "b": 174.40894000000003, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.", "bbox": {"l": 50.112, "t": 177.4574, "r": 513.52234, "b": 186.36395000000005, "coord_origin": "TOPLEFT"}}, {"id": 61, "text": "\u2022", "bbox": {"l": 61.569, "t": 210.93140000000005, "r": 71.14743, "b": 219.83794999999998, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "TableFormer output does not include the table cell con-", "bbox": {"l": 73.542038, "t": 210.93140000000005, "r": 286.36511, "b": 219.83794999999998, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "tent.", "bbox": {"l": 70.037003, "t": 222.88640999999996, "r": 87.47155, "b": 231.79296999999997, "coord_origin": "TOPLEFT"}}, {"id": 64, "text": "\u2022", "bbox": {"l": 61.569, "t": 244.07141000000001, "r": 71.345718, "b": 252.97797000000003, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "There are occasional inaccuracies in the predictions of", "bbox": {"l": 73.789902, "t": 244.07141000000001, "r": 286.36514, "b": 252.97797000000003, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "the bounding boxes.", "bbox": {"l": 70.037003, "t": 256.02643, "r": 150.41524, "b": 264.93298000000004, "coord_origin": "TOPLEFT"}}, {"id": 67, "text": "However, it is possible to mitigate those limitations by", "bbox": {"l": 62.067001, "t": 279.20343, "r": 286.36499, "b": 288.10999, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "combining the TableFormer predictions with the informa-", "bbox": {"l": 50.112, "t": 291.15842, "r": 286.36505, "b": 300.06497, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "tion already present inside a programmatic PDF document.", "bbox": {"l": 50.112, "t": 303.1134, "r": 286.36511, "b": 312.01996, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "More specifically, PDF documents can be 
seen as a se-", "bbox": {"l": 50.112, "t": 315.06839, "r": 286.36511, "b": 323.97495, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "quence of PDF cells where each cell is described by its con-", "bbox": {"l": 50.112, "t": 327.02438, "r": 286.36511, "b": 335.93093999999996, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "tent and bounding box. If we are able to associate the PDF", "bbox": {"l": 50.112, "t": 338.97937, "r": 286.36505, "b": 347.88593, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "cells with the predicted table cells, we can directly link the", "bbox": {"l": 50.112, "t": 350.93436, "r": 286.36508, "b": 359.84091, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "PDF cell content to the table cell structure and use the PDF", "bbox": {"l": 50.112, "t": 362.88934, "r": 286.36511, "b": 371.7959, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "bounding boxes to correct misalignments in the predicted", "bbox": {"l": 50.112, "t": 374.84433000000007, "r": 286.36508, "b": 383.75089, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "table cell bounding boxes.", "bbox": {"l": 50.112, "t": 386.80032, "r": 154.55988, "b": 395.70688, "coord_origin": "TOPLEFT"}}, {"id": 77, "text": "Here is a step-by-step description of the prediction post-", "bbox": {"l": 62.067001, "t": 399.06934, "r": 286.36496, "b": 407.97589, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "processing:", "bbox": {"l": 50.112, "t": 411.02533, "r": 95.491638, "b": 419.93188, "coord_origin": "TOPLEFT"}}, {"id": 79, "text": "1.", "bbox": {"l": 62.067001, "t": 423.29532, "r": 69.37281, "b": 432.20187, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Get the minimal grid dimensions - number of rows and", "bbox": {"l": 71.808075, "t": 423.29532, "r": 286.36502, "b": 432.20187, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "columns for the predicted table structure. 
This represents", "bbox": {"l": 50.112, "t": 435.25031, "r": 286.36508, "b": 444.15686, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "the most granular grid for the underlying table structure.", "bbox": {"l": 50.112, "t": 447.20529, "r": 274.50958, "b": 456.11185000000006, "coord_origin": "TOPLEFT"}}, {"id": 83, "text": "2.", "bbox": {"l": 62.067001, "t": 459.47528, "r": 69.538948, "b": 468.38184, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Generate pair-wise matches between the bounding", "bbox": {"l": 77.429329, "t": 459.47528, "r": 286.36499, "b": 468.38184, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "boxes of the PDF cells and the predicted cells. The Intersec-", "bbox": {"l": 50.112, "t": 471.43027, "r": 286.36505, "b": 480.33682, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "tion Over Union (IOU) metric is used to evaluate the quality", "bbox": {"l": 50.112, "t": 483.38525, "r": 286.36505, "b": 492.29181, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "of the matches.", "bbox": {"l": 50.112, "t": 495.34024, "r": 110.70452999999999, "b": 504.2468, "coord_origin": "TOPLEFT"}}, {"id": 88, "text": "3.", "bbox": {"l": 62.067001, "t": 507.61023, "r": 69.863068, "b": 516.5167799999999, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Use a carefully selected IOU threshold to designate", "bbox": {"l": 72.461754, "t": 507.61023, "r": 286.36493, "b": 516.5167799999999, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "the matches as \u201cgood\u201d ones and \u201cbad\u201d ones.", "bbox": {"l": 50.112, "t": 519.5662199999999, "r": 226.0714, "b": 528.4727800000001, "coord_origin": "TOPLEFT"}}, {"id": 91, "text": "3.a. 
If all IOU scores in a column are below the thresh-", "bbox": {"l": 62.067001, "t": 531.83521, "r": 286.36496, "b": 540.7417800000001, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "old, discard all predictions (structure and bounding boxes)", "bbox": {"l": 50.112, "t": 543.79121, "r": 286.36511, "b": 552.69777, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "for that column.", "bbox": {"l": 50.112, "t": 555.74622, "r": 114.03204, "b": 564.65277, "coord_origin": "TOPLEFT"}}, {"id": 94, "text": "4.", "bbox": {"l": 62.067001, "t": 568.01622, "r": 69.538948, "b": 576.92278, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Find the best-fitting content alignment for the pre-", "bbox": {"l": 76.731949, "t": 568.01622, "r": 286.36502, "b": 576.92278, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "dicted cells with good IOU per each column. The alignment", "bbox": {"l": 50.112, "t": 579.97122, "r": 286.36508, "b": 588.87778, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "of the column can be identified by the following formula:", "bbox": {"l": 50.112, "t": 591.9262200000001, "r": 278.70383, "b": 600.83278, "coord_origin": "TOPLEFT"}}, {"id": 98, "text": "alignment", "bbox": {"l": 112.02799999999999, "t": 623.99382, "r": 157.9516, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "= arg min", "bbox": {"l": 160.715, "t": 623.99382, "r": 203.4964, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "c", "bbox": {"l": 185.58499, "t": 633.98305, "r": 189.14511, "b": 640.17578, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "{", "bbox": {"l": 203.49899, "t": 623.43591, "r": 208.48029, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "D$_{c}$", "bbox": {"l": 208.48099, "t": 623.99382, "r": 220.28911, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "}", "bbox": {"l": 220.78699, "t": 623.43591, "r": 225.76828, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "D$_{c}$", "bbox": {"l": 110.70499, 
"t": 645.25882, "r": 122.51310999999998, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "=", "bbox": {"l": 125.77899000000001, "t": 645.25882, "r": 133.52791, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "max", "bbox": {"l": 136.295, "t": 645.25882, "r": 156.00201, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "{", "bbox": {"l": 156.00299, "t": 644.70091, "r": 160.98428, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "x$_{c}$", "bbox": {"l": 160.98399, "t": 645.25882, "r": 170.23811, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "} \u2212", "bbox": {"l": 170.73599, "t": 644.70091, "r": 185.6779, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "min", "bbox": {"l": 187.894, "t": 645.25882, "r": 206.05283, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "{", "bbox": {"l": 206.054, "t": 644.70091, "r": 211.03529, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "x$_{c}$", "bbox": {"l": 211.035, "t": 645.25882, "r": 220.28912, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "}", "bbox": {"l": 220.787, "t": 644.70091, "r": 225.76829999999998, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "(4)", "bbox": {"l": 274.746, "t": 634.88522, "r": 286.3624, "b": 643.79178, "coord_origin": "TOPLEFT"}}, {"id": 115, "text": "where", "bbox": {"l": 50.112, "t": 668.06522, "r": 74.45063, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "c", "bbox": {"l": 78.335999, "t": 667.90582, "r": 82.647812, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "is one of", "bbox": {"l": 86.532997, "t": 668.06522, "r": 123.63372, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "{", "bbox": {"l": 127.51899999999999, "t": 667.3479199999999, "r": 132.50029, "b": 676.75261, "coord_origin": "TOPLEFT"}}, 
{"id": 119, "text": "left, centroid, right", "bbox": {"l": 132.50099, "t": 668.06522, "r": 210.69743, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "}", "bbox": {"l": 210.69699, "t": 667.3479199999999, "r": 215.67828, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "and", "bbox": {"l": 219.56299, "t": 668.06522, "r": 233.94897000000003, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "x$_{c}$", "bbox": {"l": 237.83499000000003, "t": 667.90582, "r": 247.08911, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "is the x-", "bbox": {"l": 251.47299000000004, "t": 668.06522, "r": 286.362, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "coordinate for the corresponding point.", "bbox": {"l": 50.112, "t": 680.02022, "r": 205.88721, "b": 688.92679, "coord_origin": "TOPLEFT"}}, {"id": 125, "text": "5.", "bbox": {"l": 62.067001, "t": 692.290222, "r": 69.538948, "b": 701.196785, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Use the alignment computed in step 4, to compute", "bbox": {"l": 76.273666, "t": 692.290222, "r": 286.36496, "b": 701.196785, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "the median", "bbox": {"l": 50.112, "t": 704.245224, "r": 94.604973, "b": 713.151787, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "x", "bbox": {"l": 97.598999, "t": 704.085815, "r": 103.29263, "b": 712.93261, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "-coordinate for all table columns and the me-", "bbox": {"l": 103.292, "t": 704.245224, "r": 286.36481, "b": 713.151787, "coord_origin": "TOPLEFT"}}, {"id": 130, "text": "dian cell size for all table cells. 
The usage of median dur-", "bbox": {"l": 308.862, "t": 210.93120999999996, "r": 545.11517, "b": 219.83776999999998, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "ing the computations, helps to eliminate outliers caused by", "bbox": {"l": 308.862, "t": 222.88720999999998, "r": 545.11511, "b": 231.79376000000002, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "occasional column spans which are usually wider than the", "bbox": {"l": 308.862, "t": 234.84222, "r": 545.11511, "b": 243.74878, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "normal.", "bbox": {"l": 308.862, "t": 246.79724, "r": 339.57669, "b": 255.7038, "coord_origin": "TOPLEFT"}}, {"id": 134, "text": "6.", "bbox": {"l": 320.81699, "t": 259.10222999999996, "r": 328.28894, "b": 268.00879, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "Snap all cells with bad IOU to their corresponding", "bbox": {"l": 334.88419, "t": 259.10222999999996, "r": 545.11499, "b": 268.00879, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "median", "bbox": {"l": 308.862, "t": 271.05724999999995, "r": 338.19189, "b": 279.96380999999997, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "x", "bbox": {"l": 340.68201, "t": 270.89783, "r": 346.37564, "b": 279.74463000000003, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "-coordinates and cell sizes.", "bbox": {"l": 346.37601, "t": 271.05724999999995, "r": 453.72305000000006, "b": 279.96380999999997, "coord_origin": "TOPLEFT"}}, {"id": 139, "text": "7.", "bbox": {"l": 320.81702, "t": 283.36325000000005, "r": 328.38953, "b": 292.26981, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "Generate a new set of pair-wise matches between the", "bbox": {"l": 330.9137, "t": 283.36325000000005, "r": 545.11499, "b": 292.26981, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "corrected bounding boxes and PDF cells. 
This time use a", "bbox": {"l": 308.86203, "t": 295.31824, "r": 545.11511, "b": 304.22479, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "modified version of the IOU metric, where the area of the", "bbox": {"l": 308.86203, "t": 307.27322, "r": 545.11505, "b": 316.17978, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "intersection between the predicted and PDF cells is divided", "bbox": {"l": 308.86203, "t": 319.22821000000005, "r": 545.11511, "b": 328.13477, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "by the PDF cell area.", "bbox": {"l": 308.86203, "t": 331.1842, "r": 397.19043, "b": 340.09076000000005, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "In case there are multiple matches", "bbox": {"l": 403.65616, "t": 331.1842, "r": 545.11511, "b": 340.09076000000005, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "for the same PDF cell, the prediction with the higher score", "bbox": {"l": 308.86203, "t": 343.13919, "r": 545.11511, "b": 352.04575, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "is preferred. This covers the cases where the PDF cells are", "bbox": {"l": 308.86203, "t": 355.09418, "r": 545.11505, "b": 364.00073, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "smaller than the area of predicted or corrected prediction", "bbox": {"l": 308.86203, "t": 367.04916, "r": 545.11505, "b": 375.95572000000004, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "cells.", "bbox": {"l": 308.86203, "t": 379.00415, "r": 329.61414, "b": 387.91071, "coord_origin": "TOPLEFT"}}, {"id": 150, "text": "8.", "bbox": {"l": 320.81702, "t": 391.31015, "r": 328.55356, "b": 400.2167099999999, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "In some rare occasions, we have noticed that Table-", "bbox": {"l": 331.13242, "t": 391.31015, "r": 545.11505, "b": 400.2167099999999, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Former can confuse a single column as two. 
When the post-", "bbox": {"l": 308.86203, "t": 403.26514, "r": 545.11517, "b": 412.17169, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "processing steps are applied, this results with two predicted", "bbox": {"l": 308.86203, "t": 415.22012000000007, "r": 545.11511, "b": 424.12668, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "columns pointing to the same PDF column. In such case", "bbox": {"l": 308.86203, "t": 427.17511, "r": 545.11511, "b": 436.0816699999999, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "we must de-duplicate the columns according to highest to-", "bbox": {"l": 308.86203, "t": 439.1301, "r": 545.11505, "b": 448.03665, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "tal column intersection score.", "bbox": {"l": 308.86203, "t": 451.08507999999995, "r": 426.18161, "b": 459.99164, "coord_origin": "TOPLEFT"}}, {"id": 157, "text": "9.", "bbox": {"l": 320.81702, "t": 463.39108, "r": 328.67316, "b": 472.29764, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "Pick up the remaining orphan cells. 
There could be", "bbox": {"l": 331.29187, "t": 463.39108, "r": 545.11499, "b": 472.29764, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "cases, when after applying all the previous post-processing", "bbox": {"l": 308.86203, "t": 475.34607, "r": 545.11505, "b": 484.25262, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "steps, some PDF cells could still remain without any match", "bbox": {"l": 308.86203, "t": 487.30106, "r": 545.11517, "b": 496.20761, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "to predicted cells.", "bbox": {"l": 308.86203, "t": 499.25604, "r": 381.89786, "b": 508.1626, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "However, it is still possible to deduce", "bbox": {"l": 388.7023, "t": 499.25604, "r": 545.11517, "b": 508.1626, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "the correct matching for an orphan PDF cell by mapping its", "bbox": {"l": 308.86203, "t": 511.21204, "r": 545.11511, "b": 520.11859, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "bounding box on the geometry of the grid. This mapping", "bbox": {"l": 308.86203, "t": 523.16702, "r": 545.11505, "b": 532.07358, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "decides if the content of the orphan cell will be appended to", "bbox": {"l": 308.86203, "t": 535.12201, "r": 545.11499, "b": 544.02858, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "an already matched table cell, or a new table cell should be", "bbox": {"l": 308.86203, "t": 547.07703, "r": 545.11517, "b": 555.98358, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "created to match with the orphan.", "bbox": {"l": 308.86203, "t": 559.03203, "r": 442.22147000000007, "b": 567.93858, "coord_origin": "TOPLEFT"}}, {"id": 168, "text": "9a. 
Compute the top and bottom boundary of the hori-", "bbox": {"l": 320.81702, "t": 571.33803, "r": 545.11493, "b": 580.24458, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "zontal band for each grid row (min/max", "bbox": {"l": 308.86203, "t": 583.29303, "r": 471.64093, "b": 592.19958, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "y", "bbox": {"l": 474.83405, "t": 583.1336200000001, "r": 479.71872, "b": 591.98041, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "coordinates per", "bbox": {"l": 483.26903999999996, "t": 583.29303, "r": 545.11688, "b": 592.19958, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "row).", "bbox": {"l": 308.86206, "t": 595.24803, "r": 329.91306, "b": 604.15459, "coord_origin": "TOPLEFT"}}, {"id": 173, "text": "9b.", "bbox": {"l": 320.81705, "t": 607.55304, "r": 332.8718, "b": 616.4595899999999, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "Intersect the orphan\u2019s bounding box with the row", "bbox": {"l": 339.92532, "t": 607.55304, "r": 545.11505, "b": 616.4595899999999, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "bands, and map the cell to the closest grid row.", "bbox": {"l": 308.86206, "t": 619.50903, "r": 495.2923, "b": 628.4155900000001, "coord_origin": "TOPLEFT"}}, {"id": 176, "text": "9c. 
Compute the left and right boundary of the vertical", "bbox": {"l": 320.81705, "t": 631.81403, "r": 545.11505, "b": 640.72058, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "band for each grid column (min/max", "bbox": {"l": 308.86206, "t": 643.7690299999999, "r": 455.28238, "b": 652.67558, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "x", "bbox": {"l": 457.77704, "t": 643.60962, "r": 463.47067, "b": 652.45641, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "coordinates per col-", "bbox": {"l": 465.97104, "t": 643.7690299999999, "r": 545.11389, "b": 652.67558, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "umn).", "bbox": {"l": 308.86206, "t": 655.72403, "r": 332.38376, "b": 664.63059, "coord_origin": "TOPLEFT"}}, {"id": 181, "text": "9d. Intersect the orphan\u2019s bounding box with the column", "bbox": {"l": 320.81705, "t": 668.03003, "r": 545.11499, "b": 676.93659, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "bands, and map the cell to the closest grid column.", "bbox": {"l": 308.86206, "t": 679.98503, "r": 510.5848700000001, "b": 688.89159, "coord_origin": "TOPLEFT"}}, {"id": 183, "text": "9e. 
If the table cell under the identified row and column", "bbox": {"l": 320.81705, "t": 692.290024, "r": 545.11505, "b": 701.196594, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "is not empty, extend its content with the content of the or-", "bbox": {"l": 308.86206, "t": 704.245026, "r": 545.11517, "b": 713.151596, "coord_origin": "TOPLEFT"}}, {"id": 185, "text": "12", "bbox": {"l": 292.63107, "t": 734.13303, "r": 302.59366, "b": 743.039593, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "picture", "bbox": {"l": 53.345978, "t": 74.74854278564453, "r": 544.938232421875, "b": 147.59092712402344, "coord_origin": "TOPLEFT"}, "confidence": 0.6033818125724792, "cells": [{"id": 1, "text": "b.", "bbox": {"l": 53.345978, "t": 75.19152999999994, "r": 59.327053, "b": 81.14020000000005, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Table Bank", "bbox": {"l": 448.37271, "t": 77.25396999999987, "r": 481.75916, "b": 83.20263999999997, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Complex", "bbox": {"l": 63.03878399999999, "t": 101.10413000000005, "r": 85.290085, "b": 106.06133999999986, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Complex", "bbox": {"l": 227.55121, "t": 102.53992000000005, "r": 249.80251, "b": 107.49712999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Simple", "bbox": {"l": 396.2337, "t": 114.04522999999995, "r": 413.69711, "b": 119.00243999999998, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "100%", "bbox": {"l": 60.93763400000001, "t": 85.73321999999996, "r": 76.151443, "b": 90.69042999999999, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Train Test Val", "bbox": {"l": 246.20530999999997, "t": 141.60608000000002, "r": 281.88013, "b": 146.56329000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "91K 10K 10K", "bbox": {"l": 249.93848999999997, "t": 86.08801000000005, "r": 282.49384, "b": 91.04522999999995, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "100% 130K 5K", "bbox": {"l": 
391.37341, "t": 85.73321999999996, "r": 432.6716599999999, "b": 90.69042999999999, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Complex", "bbox": {"l": 113.94921, "t": 141.28845, "r": 136.20052, "b": 146.24567000000002, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Strict", "bbox": {"l": 113.3146, "t": 100.93853999999999, "r": 127.05298, "b": 105.89575000000002, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "Strict", "bbox": {"l": 113.22738999999999, "t": 122.61523, "r": 126.96577, "b": 127.57245, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Simple", "bbox": {"l": 138.57864, "t": 141.43640000000005, "r": 156.04207, "b": 146.39362000000006, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "65K", "bbox": {"l": 311.65359, "t": 86.55498999999998, "r": 321.67203, "b": 91.5122100000001, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Non", "bbox": {"l": 289.23572, "t": 93.07977000000005, "r": 299.37451, "b": 98.03698999999995, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "HTML", "bbox": {"l": 285.26111, "t": 105.31975999999997, "r": 299.37537, "b": 110.27697999999998, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "HTML", "bbox": {"l": 285.05713, "t": 126.50995, "r": 299.17139, "b": 131.46716000000004, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "47K", "bbox": {"l": 299.58362, "t": 86.69353999999998, "r": 309.60205, "b": 91.65075999999999, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Non", "bbox": {"l": 459.02151, "t": 93.76116999999999, "r": 469.16031000000004, "b": 98.71838000000002, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "HTML", "bbox": {"l": 455.04691, "t": 106.00116000000014, "r": 469.16115999999994, "b": 110.95836999999995, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Complex", "bbox": {"l": 160.37672, "t": 141.58385999999996, "r": 182.62802, "b": 146.54107999999997, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "Missing", "bbox": {"l": 154.50967, "t": 100.98479999999995, "r": 173.3246, "b": 105.94202000000007, 
"coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Contain", "bbox": {"l": 326.41302, "t": 107.23248000000001, "r": 345.99701, "b": 112.18970000000002, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "bboxes", "bbox": {"l": 327.94131, "t": 119.47247000000004, "r": 345.99634, "b": 124.42969000000005, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "doesn't", "bbox": {"l": 490.1893, "t": 110.27373999999998, "r": 508.76349000000005, "b": 115.2309600000001, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "bboxes", "bbox": {"l": 490.71121, "t": 122.51373000000001, "r": 508.76624, "b": 127.47095000000002, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "230K 280K", "bbox": {"l": 168.50357, "t": 86.13611000000003, "r": 197.52699, "b": 91.09331999999995, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "Complex Simple", "bbox": {"l": 333.73151, "t": 141.62323000000004, "r": 374.92862, "b": 146.58043999999995, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Simple", "bbox": {"l": 508.54248, "t": 141.37683000000004, "r": 526.00592, "b": 146.33405000000005, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "caption", "bbox": {"l": 49.23687744140625, "t": 164.2614288330078, "r": 545.11371, "b": 186.6978759765625, "coord_origin": "TOPLEFT"}, "confidence": 0.9661495685577393, "cells": [{"id": 59, "text": "Figure 7: Distribution of the tables across different dimensions per dataset. 
Simple vs complex tables per dataset and split,", "bbox": {"l": 50.112, "t": 165.50238000000002, "r": 545.11371, "b": 174.40894000000003, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.", "bbox": {"l": 50.112, "t": 177.4574, "r": 513.52234, "b": 186.36395000000005, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "list_item", "bbox": {"l": 60.74156951904297, "t": 210.26402282714844, "r": 286.36511, "b": 231.79296999999997, "coord_origin": "TOPLEFT"}, "confidence": 0.9621952772140503, "cells": [{"id": 61, "text": "\u2022", "bbox": {"l": 61.569, "t": 210.93140000000005, "r": 71.14743, "b": 219.83794999999998, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "TableFormer output does not include the table cell con-", "bbox": {"l": 73.542038, "t": 210.93140000000005, "r": 286.36511, "b": 219.83794999999998, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "tent.", "bbox": {"l": 70.037003, "t": 222.88640999999996, "r": 87.47155, "b": 231.79296999999997, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "list_item", "bbox": {"l": 60.844627380371094, "t": 243.00099182128906, "r": 286.83001708984375, "b": 265.0201110839844, "coord_origin": "TOPLEFT"}, "confidence": 0.96295565366745, "cells": [{"id": 64, "text": "\u2022", "bbox": {"l": 61.569, "t": 244.07141000000001, "r": 71.345718, "b": 252.97797000000003, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "There are occasional inaccuracies in the predictions of", "bbox": {"l": 73.789902, "t": 244.07141000000001, "r": 286.36514, "b": 252.97797000000003, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "the bounding boxes.", "bbox": {"l": 70.037003, "t": 256.02643, "r": 150.41524, "b": 264.93298000000004, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "text", "bbox": {"l": 49.38137435913086, "t": 278.38848876953125, "r": 286.5146789550781, "b": 395.70688, "coord_origin": "TOPLEFT"}, "confidence": 
0.9767084717750549, "cells": [{"id": 67, "text": "However, it is possible to mitigate those limitations by", "bbox": {"l": 62.067001, "t": 279.20343, "r": 286.36499, "b": 288.10999, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "combining the TableFormer predictions with the informa-", "bbox": {"l": 50.112, "t": 291.15842, "r": 286.36505, "b": 300.06497, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "tion already present inside a programmatic PDF document.", "bbox": {"l": 50.112, "t": 303.1134, "r": 286.36511, "b": 312.01996, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "More specifically, PDF documents can be seen as a se-", "bbox": {"l": 50.112, "t": 315.06839, "r": 286.36511, "b": 323.97495, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "quence of PDF cells where each cell is described by its con-", "bbox": {"l": 50.112, "t": 327.02438, "r": 286.36511, "b": 335.93093999999996, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "tent and bounding box. If we are able to associate the PDF", "bbox": {"l": 50.112, "t": 338.97937, "r": 286.36505, "b": 347.88593, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "cells with the predicted table cells, we can directly link the", "bbox": {"l": 50.112, "t": 350.93436, "r": 286.36508, "b": 359.84091, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "PDF cell content to the table cell structure and use the PDF", "bbox": {"l": 50.112, "t": 362.88934, "r": 286.36511, "b": 371.7959, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "bounding boxes to correct misalignments in the predicted", "bbox": {"l": 50.112, "t": 374.84433000000007, "r": 286.36508, "b": 383.75089, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "table cell bounding boxes.", "bbox": {"l": 50.112, "t": 386.80032, "r": 154.55988, "b": 395.70688, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "text", "bbox": {"l": 49.62057876586914, "t": 398.1251525878906, "r": 286.36496, "b": 420.4075622558594, "coord_origin": "TOPLEFT"}, "confidence": 0.9347663521766663, 
"cells": [{"id": 77, "text": "Here is a step-by-step description of the prediction post-", "bbox": {"l": 62.067001, "t": 399.06934, "r": 286.36496, "b": 407.97589, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "processing:", "bbox": {"l": 50.112, "t": 411.02533, "r": 95.491638, "b": 419.93188, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "list_item", "bbox": {"l": 49.60729217529297, "t": 422.3403625488281, "r": 286.36508, "b": 456.4351501464844, "coord_origin": "TOPLEFT"}, "confidence": 0.8280705213546753, "cells": [{"id": 79, "text": "1.", "bbox": {"l": 62.067001, "t": 423.29532, "r": 69.37281, "b": 432.20187, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Get the minimal grid dimensions - number of rows and", "bbox": {"l": 71.808075, "t": 423.29532, "r": 286.36502, "b": 432.20187, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "columns for the predicted table structure. This represents", "bbox": {"l": 50.112, "t": 435.25031, "r": 286.36508, "b": 444.15686, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "the most granular grid for the underlying table structure.", "bbox": {"l": 50.112, "t": 447.20529, "r": 274.50958, "b": 456.11185000000006, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "list_item", "bbox": {"l": 49.62120819091797, "t": 458.19854736328125, "r": 286.4130859375, "b": 504.2468, "coord_origin": "TOPLEFT"}, "confidence": 0.9646760821342468, "cells": [{"id": 83, "text": "2.", "bbox": {"l": 62.067001, "t": 459.47528, "r": 69.538948, "b": 468.38184, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Generate pair-wise matches between the bounding", "bbox": {"l": 77.429329, "t": 459.47528, "r": 286.36499, "b": 468.38184, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "boxes of the PDF cells and the predicted cells. 
The Intersec-", "bbox": {"l": 50.112, "t": 471.43027, "r": 286.36505, "b": 480.33682, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "tion Over Union (IOU) metric is used to evaluate the quality", "bbox": {"l": 50.112, "t": 483.38525, "r": 286.36505, "b": 492.29181, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "of the matches.", "bbox": {"l": 50.112, "t": 495.34024, "r": 110.70452999999999, "b": 504.2468, "coord_origin": "TOPLEFT"}}]}, {"id": 8, "label": "list_item", "bbox": {"l": 49.44905090332031, "t": 506.72406005859375, "r": 286.36493, "b": 528.4727800000001, "coord_origin": "TOPLEFT"}, "confidence": 0.9391399025917053, "cells": [{"id": 88, "text": "3.", "bbox": {"l": 62.067001, "t": 507.61023, "r": 69.863068, "b": 516.5167799999999, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Use a carefully selected IOU threshold to designate", "bbox": {"l": 72.461754, "t": 507.61023, "r": 286.36493, "b": 516.5167799999999, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "the matches as \u201cgood\u201d ones and \u201cbad\u201d ones.", "bbox": {"l": 50.112, "t": 519.5662199999999, "r": 226.0714, "b": 528.4727800000001, "coord_origin": "TOPLEFT"}}]}, {"id": 9, "label": "list_item", "bbox": {"l": 49.594417572021484, "t": 530.9173583984375, "r": 286.36511, "b": 564.65277, "coord_origin": "TOPLEFT"}, "confidence": 0.9491711854934692, "cells": [{"id": 91, "text": "3.a. 
If all IOU scores in a column are below the thresh-", "bbox": {"l": 62.067001, "t": 531.83521, "r": 286.36496, "b": 540.7417800000001, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "old, discard all predictions (structure and bounding boxes)", "bbox": {"l": 50.112, "t": 543.79121, "r": 286.36511, "b": 552.69777, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "for that column.", "bbox": {"l": 50.112, "t": 555.74622, "r": 114.03204, "b": 564.65277, "coord_origin": "TOPLEFT"}}]}, {"id": 10, "label": "list_item", "bbox": {"l": 49.575374603271484, "t": 566.9488525390625, "r": 286.539306640625, "b": 601.2492065429688, "coord_origin": "TOPLEFT"}, "confidence": 0.9587164521217346, "cells": [{"id": 94, "text": "4.", "bbox": {"l": 62.067001, "t": 568.01622, "r": 69.538948, "b": 576.92278, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Find the best-fitting content alignment for the pre-", "bbox": {"l": 76.731949, "t": 568.01622, "r": 286.36502, "b": 576.92278, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "dicted cells with good IOU per each column. 
The alignment", "bbox": {"l": 50.112, "t": 579.97122, "r": 286.36508, "b": 588.87778, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "of the column can be identified by the following formula:", "bbox": {"l": 50.112, "t": 591.9262200000001, "r": 278.70383, "b": 600.83278, "coord_origin": "TOPLEFT"}}]}, {"id": 11, "label": "formula", "bbox": {"l": 110.28009033203125, "t": 622.1171875, "r": 286.3624, "b": 655.0543823242188, "coord_origin": "TOPLEFT"}, "confidence": 0.9298408031463623, "cells": [{"id": 98, "text": "alignment", "bbox": {"l": 112.02799999999999, "t": 623.99382, "r": 157.9516, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "= arg min", "bbox": {"l": 160.715, "t": 623.99382, "r": 203.4964, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "c", "bbox": {"l": 185.58499, "t": 633.98305, "r": 189.14511, "b": 640.17578, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "{", "bbox": {"l": 203.49899, "t": 623.43591, "r": 208.48029, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "D$_{c}$", "bbox": {"l": 208.48099, "t": 623.99382, "r": 220.28911, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "}", "bbox": {"l": 220.78699, "t": 623.43591, "r": 225.76828, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "D$_{c}$", "bbox": {"l": 110.70499, "t": 645.25882, "r": 122.51310999999998, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "=", "bbox": {"l": 125.77899000000001, "t": 645.25882, "r": 133.52791, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "max", "bbox": {"l": 136.295, "t": 645.25882, "r": 156.00201, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "{", "bbox": {"l": 156.00299, "t": 644.70091, "r": 160.98428, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "x$_{c}$", "bbox": {"l": 160.98399, "t": 645.25882, "r": 170.23811, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 
109, "text": "} \u2212", "bbox": {"l": 170.73599, "t": 644.70091, "r": 185.6779, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "min", "bbox": {"l": 187.894, "t": 645.25882, "r": 206.05283, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "{", "bbox": {"l": 206.054, "t": 644.70091, "r": 211.03529, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "x$_{c}$", "bbox": {"l": 211.035, "t": 645.25882, "r": 220.28912, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "}", "bbox": {"l": 220.787, "t": 644.70091, "r": 225.76829999999998, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "(4)", "bbox": {"l": 274.746, "t": 634.88522, "r": 286.3624, "b": 643.79178, "coord_origin": "TOPLEFT"}}]}, {"id": 12, "label": "text", "bbox": {"l": 49.528846740722656, "t": 666.6375732421875, "r": 286.362, "b": 689.0491333007812, "coord_origin": "TOPLEFT"}, "confidence": 0.9545556306838989, "cells": [{"id": 115, "text": "where", "bbox": {"l": 50.112, "t": 668.06522, "r": 74.45063, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "c", "bbox": {"l": 78.335999, "t": 667.90582, "r": 82.647812, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "is one of", "bbox": {"l": 86.532997, "t": 668.06522, "r": 123.63372, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "{", "bbox": {"l": 127.51899999999999, "t": 667.3479199999999, "r": 132.50029, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "left, centroid, right", "bbox": {"l": 132.50099, "t": 668.06522, "r": 210.69743, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "}", "bbox": {"l": 210.69699, "t": 667.3479199999999, "r": 215.67828, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "and", "bbox": {"l": 219.56299, "t": 668.06522, "r": 233.94897000000003, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "x$_{c}$", "bbox": 
{"l": 237.83499000000003, "t": 667.90582, "r": 247.08911, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "is the x-", "bbox": {"l": 251.47299000000004, "t": 668.06522, "r": 286.362, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "coordinate for the corresponding point.", "bbox": {"l": 50.112, "t": 680.02022, "r": 205.88721, "b": 688.92679, "coord_origin": "TOPLEFT"}}]}, {"id": 13, "label": "list_item", "bbox": {"l": 49.529109954833984, "t": 691.4943237304688, "r": 286.41558837890625, "b": 713.151787, "coord_origin": "TOPLEFT"}, "confidence": 0.9260509014129639, "cells": [{"id": 125, "text": "5.", "bbox": {"l": 62.067001, "t": 692.290222, "r": 69.538948, "b": 701.196785, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Use the alignment computed in step 4, to compute", "bbox": {"l": 76.273666, "t": 692.290222, "r": 286.36496, "b": 701.196785, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "the median", "bbox": {"l": 50.112, "t": 704.245224, "r": 94.604973, "b": 713.151787, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "x", "bbox": {"l": 97.598999, "t": 704.085815, "r": 103.29263, "b": 712.93261, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "-coordinate for all table columns and the me-", "bbox": {"l": 103.292, "t": 704.245224, "r": 286.36481, "b": 713.151787, "coord_origin": "TOPLEFT"}}]}, {"id": 14, "label": "text", "bbox": {"l": 308.1027526855469, "t": 210.27102661132812, "r": 545.11517, "b": 255.7038, "coord_origin": "TOPLEFT"}, "confidence": 0.8942293524742126, "cells": [{"id": 130, "text": "dian cell size for all table cells. 
The usage of median dur-", "bbox": {"l": 308.862, "t": 210.93120999999996, "r": 545.11517, "b": 219.83776999999998, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "ing the computations, helps to eliminate outliers caused by", "bbox": {"l": 308.862, "t": 222.88720999999998, "r": 545.11511, "b": 231.79376000000002, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "occasional column spans which are usually wider than the", "bbox": {"l": 308.862, "t": 234.84222, "r": 545.11511, "b": 243.74878, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "normal.", "bbox": {"l": 308.862, "t": 246.79724, "r": 339.57669, "b": 255.7038, "coord_origin": "TOPLEFT"}}]}, {"id": 15, "label": "list_item", "bbox": {"l": 308.43133544921875, "t": 258.3742370605469, "r": 545.2000122070312, "b": 280.1125183105469, "coord_origin": "TOPLEFT"}, "confidence": 0.9423389434814453, "cells": [{"id": 134, "text": "6.", "bbox": {"l": 320.81699, "t": 259.10222999999996, "r": 328.28894, "b": 268.00879, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "Snap all cells with bad IOU to their corresponding", "bbox": {"l": 334.88419, "t": 259.10222999999996, "r": 545.11499, "b": 268.00879, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "median", "bbox": {"l": 308.862, "t": 271.05724999999995, "r": 338.19189, "b": 279.96380999999997, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "x", "bbox": {"l": 340.68201, "t": 270.89783, "r": 346.37564, "b": 279.74463000000003, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "-coordinates and cell sizes.", "bbox": {"l": 346.37601, "t": 271.05724999999995, "r": 453.72305000000006, "b": 279.96380999999997, "coord_origin": "TOPLEFT"}}]}, {"id": 16, "label": "list_item", "bbox": {"l": 308.0020751953125, "t": 282.2272644042969, "r": 545.3494262695312, "b": 387.91071, "coord_origin": "TOPLEFT"}, "confidence": 0.9562004804611206, "cells": [{"id": 139, "text": "7.", "bbox": {"l": 320.81702, "t": 283.36325000000005, "r": 328.38953, "b": 292.26981, "coord_origin": "TOPLEFT"}}, 
{"id": 140, "text": "Generate a new set of pair-wise matches between the", "bbox": {"l": 330.9137, "t": 283.36325000000005, "r": 545.11499, "b": 292.26981, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "corrected bounding boxes and PDF cells. This time use a", "bbox": {"l": 308.86203, "t": 295.31824, "r": 545.11511, "b": 304.22479, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "modified version of the IOU metric, where the area of the", "bbox": {"l": 308.86203, "t": 307.27322, "r": 545.11505, "b": 316.17978, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "intersection between the predicted and PDF cells is divided", "bbox": {"l": 308.86203, "t": 319.22821000000005, "r": 545.11511, "b": 328.13477, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "by the PDF cell area.", "bbox": {"l": 308.86203, "t": 331.1842, "r": 397.19043, "b": 340.09076000000005, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "In case there are multiple matches", "bbox": {"l": 403.65616, "t": 331.1842, "r": 545.11511, "b": 340.09076000000005, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "for the same PDF cell, the prediction with the higher score", "bbox": {"l": 308.86203, "t": 343.13919, "r": 545.11511, "b": 352.04575, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "is preferred. 
This covers the cases where the PDF cells are", "bbox": {"l": 308.86203, "t": 355.09418, "r": 545.11505, "b": 364.00073, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "smaller than the area of predicted or corrected prediction", "bbox": {"l": 308.86203, "t": 367.04916, "r": 545.11505, "b": 375.95572000000004, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "cells.", "bbox": {"l": 308.86203, "t": 379.00415, "r": 329.61414, "b": 387.91071, "coord_origin": "TOPLEFT"}}]}, {"id": 17, "label": "list_item", "bbox": {"l": 307.8562927246094, "t": 390.3468933105469, "r": 545.11517, "b": 459.99164, "coord_origin": "TOPLEFT"}, "confidence": 0.871994137763977, "cells": [{"id": 150, "text": "8.", "bbox": {"l": 320.81702, "t": 391.31015, "r": 328.55356, "b": 400.2167099999999, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "In some rare occasions, we have noticed that Table-", "bbox": {"l": 331.13242, "t": 391.31015, "r": 545.11505, "b": 400.2167099999999, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Former can confuse a single column as two. When the post-", "bbox": {"l": 308.86203, "t": 403.26514, "r": 545.11517, "b": 412.17169, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "processing steps are applied, this results with two predicted", "bbox": {"l": 308.86203, "t": 415.22012000000007, "r": 545.11511, "b": 424.12668, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "columns pointing to the same PDF column. 
In such case", "bbox": {"l": 308.86203, "t": 427.17511, "r": 545.11511, "b": 436.0816699999999, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "we must de-duplicate the columns according to highest to-", "bbox": {"l": 308.86203, "t": 439.1301, "r": 545.11505, "b": 448.03665, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "tal column intersection score.", "bbox": {"l": 308.86203, "t": 451.08507999999995, "r": 426.18161, "b": 459.99164, "coord_origin": "TOPLEFT"}}]}, {"id": 18, "label": "list_item", "bbox": {"l": 307.8297424316406, "t": 462.252197265625, "r": 545.303466796875, "b": 568.1339111328125, "coord_origin": "TOPLEFT"}, "confidence": 0.9126080870628357, "cells": [{"id": 157, "text": "9.", "bbox": {"l": 320.81702, "t": 463.39108, "r": 328.67316, "b": 472.29764, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "Pick up the remaining orphan cells. There could be", "bbox": {"l": 331.29187, "t": 463.39108, "r": 545.11499, "b": 472.29764, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "cases, when after applying all the previous post-processing", "bbox": {"l": 308.86203, "t": 475.34607, "r": 545.11505, "b": 484.25262, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "steps, some PDF cells could still remain without any match", "bbox": {"l": 308.86203, "t": 487.30106, "r": 545.11517, "b": 496.20761, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "to predicted cells.", "bbox": {"l": 308.86203, "t": 499.25604, "r": 381.89786, "b": 508.1626, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "However, it is still possible to deduce", "bbox": {"l": 388.7023, "t": 499.25604, "r": 545.11517, "b": 508.1626, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "the correct matching for an orphan PDF cell by mapping its", "bbox": {"l": 308.86203, "t": 511.21204, "r": 545.11511, "b": 520.11859, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "bounding box on the geometry of the grid. 
This mapping", "bbox": {"l": 308.86203, "t": 523.16702, "r": 545.11505, "b": 532.07358, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "decides if the content of the orphan cell will be appended to", "bbox": {"l": 308.86203, "t": 535.12201, "r": 545.11499, "b": 544.02858, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "an already matched table cell, or a new table cell should be", "bbox": {"l": 308.86203, "t": 547.07703, "r": 545.11517, "b": 555.98358, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "created to match with the orphan.", "bbox": {"l": 308.86203, "t": 559.03203, "r": 442.22147000000007, "b": 567.93858, "coord_origin": "TOPLEFT"}}]}, {"id": 19, "label": "text", "bbox": {"l": 308.12921142578125, "t": 570.203857421875, "r": 545.2933349609375, "b": 604.15459, "coord_origin": "TOPLEFT"}, "confidence": 0.8459469676017761, "cells": [{"id": 168, "text": "9a. Compute the top and bottom boundary of the hori-", "bbox": {"l": 320.81702, "t": 571.33803, "r": 545.11493, "b": 580.24458, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "zontal band for each grid row (min/max", "bbox": {"l": 308.86203, "t": 583.29303, "r": 471.64093, "b": 592.19958, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "y", "bbox": {"l": 474.83405, "t": 583.1336200000001, "r": 479.71872, "b": 591.98041, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "coordinates per", "bbox": {"l": 483.26903999999996, "t": 583.29303, "r": 545.11688, "b": 592.19958, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "row).", "bbox": {"l": 308.86206, "t": 595.24803, "r": 329.91306, "b": 604.15459, "coord_origin": "TOPLEFT"}}]}, {"id": 20, "label": "list_item", "bbox": {"l": 308.2972106933594, "t": 606.1244506835938, "r": 545.11505, "b": 628.4155900000001, "coord_origin": "TOPLEFT"}, "confidence": 0.7716894149780273, "cells": [{"id": 173, "text": "9b.", "bbox": {"l": 320.81705, "t": 607.55304, "r": 332.8718, "b": 616.4595899999999, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "Intersect the 
orphan\u2019s bounding box with the row", "bbox": {"l": 339.92532, "t": 607.55304, "r": 545.11505, "b": 616.4595899999999, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "bands, and map the cell to the closest grid row.", "bbox": {"l": 308.86206, "t": 619.50903, "r": 495.2923, "b": 628.4155900000001, "coord_origin": "TOPLEFT"}}]}, {"id": 21, "label": "list_item", "bbox": {"l": 308.36566162109375, "t": 630.9808959960938, "r": 545.11505, "b": 664.63059, "coord_origin": "TOPLEFT"}, "confidence": 0.8584903478622437, "cells": [{"id": 176, "text": "9c. Compute the left and right boundary of the vertical", "bbox": {"l": 320.81705, "t": 631.81403, "r": 545.11505, "b": 640.72058, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "band for each grid column (min/max", "bbox": {"l": 308.86206, "t": 643.7690299999999, "r": 455.28238, "b": 652.67558, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "x", "bbox": {"l": 457.77704, "t": 643.60962, "r": 463.47067, "b": 652.45641, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "coordinates per col-", "bbox": {"l": 465.97104, "t": 643.7690299999999, "r": 545.11389, "b": 652.67558, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "umn).", "bbox": {"l": 308.86206, "t": 655.72403, "r": 332.38376, "b": 664.63059, "coord_origin": "TOPLEFT"}}]}, {"id": 22, "label": "list_item", "bbox": {"l": 308.18072509765625, "t": 667.103271484375, "r": 545.11499, "b": 688.9920654296875, "coord_origin": "TOPLEFT"}, "confidence": 0.7590745687484741, "cells": [{"id": 181, "text": "9d. 
Intersect the orphan\u2019s bounding box with the column", "bbox": {"l": 320.81705, "t": 668.03003, "r": 545.11499, "b": 676.93659, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "bands, and map the cell to the closest grid column.", "bbox": {"l": 308.86206, "t": 679.98503, "r": 510.5848700000001, "b": 688.89159, "coord_origin": "TOPLEFT"}}]}, {"id": 23, "label": "list_item", "bbox": {"l": 308.44647216796875, "t": 691.6022338867188, "r": 545.11517, "b": 713.151596, "coord_origin": "TOPLEFT"}, "confidence": 0.6971184611320496, "cells": [{"id": 183, "text": "9e. If the table cell under the identified row and column", "bbox": {"l": 320.81705, "t": 692.290024, "r": 545.11505, "b": 701.196594, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "is not empty, extend its content with the content of the or-", "bbox": {"l": 308.86206, "t": 704.245026, "r": 545.11517, "b": 713.151596, "coord_origin": "TOPLEFT"}}]}, {"id": 24, "label": "page_footer", "bbox": {"l": 292.63107, "t": 733.1248168945312, "r": 302.59366, "b": 743.039593, "coord_origin": "TOPLEFT"}, "confidence": 0.9126596450805664, "cells": [{"id": 185, "text": "12", "bbox": {"l": 292.63107, "t": 734.13303, "r": 302.59366, "b": 743.039593, "coord_origin": "TOPLEFT"}}]}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "picture", "id": 0, "page_no": 11, "cluster": {"id": 0, "label": "picture", "bbox": {"l": 53.345978, "t": 74.74854278564453, "r": 544.938232421875, "b": 147.59092712402344, "coord_origin": "TOPLEFT"}, "confidence": 0.6033818125724792, "cells": [{"id": 1, "text": "b.", "bbox": {"l": 53.345978, "t": 75.19152999999994, "r": 59.327053, "b": 81.14020000000005, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Table Bank", "bbox": {"l": 448.37271, "t": 77.25396999999987, "r": 481.75916, "b": 83.20263999999997, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Complex", "bbox": {"l": 63.03878399999999, "t": 
101.10413000000005, "r": 85.290085, "b": 106.06133999999986, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Complex", "bbox": {"l": 227.55121, "t": 102.53992000000005, "r": 249.80251, "b": 107.49712999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Simple", "bbox": {"l": 396.2337, "t": 114.04522999999995, "r": 413.69711, "b": 119.00243999999998, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "100%", "bbox": {"l": 60.93763400000001, "t": 85.73321999999996, "r": 76.151443, "b": 90.69042999999999, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Train Test Val", "bbox": {"l": 246.20530999999997, "t": 141.60608000000002, "r": 281.88013, "b": 146.56329000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "91K 10K 10K", "bbox": {"l": 249.93848999999997, "t": 86.08801000000005, "r": 282.49384, "b": 91.04522999999995, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "100% 130K 5K", "bbox": {"l": 391.37341, "t": 85.73321999999996, "r": 432.6716599999999, "b": 90.69042999999999, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Complex", "bbox": {"l": 113.94921, "t": 141.28845, "r": 136.20052, "b": 146.24567000000002, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Strict", "bbox": {"l": 113.3146, "t": 100.93853999999999, "r": 127.05298, "b": 105.89575000000002, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "Strict", "bbox": {"l": 113.22738999999999, "t": 122.61523, "r": 126.96577, "b": 127.57245, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Simple", "bbox": {"l": 138.57864, "t": 141.43640000000005, "r": 156.04207, "b": 146.39362000000006, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "65K", "bbox": {"l": 311.65359, "t": 86.55498999999998, "r": 321.67203, "b": 91.5122100000001, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Non", "bbox": {"l": 289.23572, "t": 93.07977000000005, "r": 299.37451, "b": 98.03698999999995, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "HTML", "bbox": {"l": 285.26111, "t": 105.31975999999997, "r": 299.37537, "b": 
110.27697999999998, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "HTML", "bbox": {"l": 285.05713, "t": 126.50995, "r": 299.17139, "b": 131.46716000000004, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "47K", "bbox": {"l": 299.58362, "t": 86.69353999999998, "r": 309.60205, "b": 91.65075999999999, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Non", "bbox": {"l": 459.02151, "t": 93.76116999999999, "r": 469.16031000000004, "b": 98.71838000000002, "coord_origin": "TOPLEFT"}}, {"id": 39, "text": "HTML", "bbox": {"l": 455.04691, "t": 106.00116000000014, "r": 469.16115999999994, "b": 110.95836999999995, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Complex", "bbox": {"l": 160.37672, "t": 141.58385999999996, "r": 182.62802, "b": 146.54107999999997, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "Missing", "bbox": {"l": 154.50967, "t": 100.98479999999995, "r": 173.3246, "b": 105.94202000000007, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Contain", "bbox": {"l": 326.41302, "t": 107.23248000000001, "r": 345.99701, "b": 112.18970000000002, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "bboxes", "bbox": {"l": 327.94131, "t": 119.47247000000004, "r": 345.99634, "b": 124.42969000000005, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "doesn't", "bbox": {"l": 490.1893, "t": 110.27373999999998, "r": 508.76349000000005, "b": 115.2309600000001, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "bboxes", "bbox": {"l": 490.71121, "t": 122.51373000000001, "r": 508.76624, "b": 127.47095000000002, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "230K 280K", "bbox": {"l": 168.50357, "t": 86.13611000000003, "r": 197.52699, "b": 91.09331999999995, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "Complex Simple", "bbox": {"l": 333.73151, "t": 141.62323000000004, "r": 374.92862, "b": 146.58043999999995, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Simple", "bbox": {"l": 508.54248, "t": 141.37683000000004, "r": 526.00592, "b": 146.33405000000005, "coord_origin": 
"TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", "id": 1, "page_no": 11, "cluster": {"id": 1, "label": "caption", "bbox": {"l": 49.23687744140625, "t": 164.2614288330078, "r": 545.11371, "b": 186.6978759765625, "coord_origin": "TOPLEFT"}, "confidence": 0.9661495685577393, "cells": [{"id": 59, "text": "Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split,", "bbox": {"l": 50.112, "t": 165.50238000000002, "r": 545.11371, "b": 174.40894000000003, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.", "bbox": {"l": 50.112, "t": 177.4574, "r": 513.52234, "b": 186.36395000000005, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity."}, {"label": "list_item", "id": 2, "page_no": 11, "cluster": {"id": 2, "label": "list_item", "bbox": {"l": 60.74156951904297, "t": 210.26402282714844, "r": 286.36511, "b": 231.79296999999997, "coord_origin": "TOPLEFT"}, "confidence": 0.9621952772140503, "cells": [{"id": 61, "text": "\u2022", "bbox": {"l": 61.569, "t": 210.93140000000005, "r": 71.14743, "b": 219.83794999999998, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "TableFormer output does not include the table cell con-", "bbox": {"l": 73.542038, "t": 210.93140000000005, "r": 286.36511, "b": 219.83794999999998, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "tent.", "bbox": {"l": 70.037003, "t": 222.88640999999996, "r": 87.47155, "b": 231.79296999999997, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 TableFormer output does not include the table cell content."}, {"label": "list_item", "id": 3, 
"page_no": 11, "cluster": {"id": 3, "label": "list_item", "bbox": {"l": 60.844627380371094, "t": 243.00099182128906, "r": 286.83001708984375, "b": 265.0201110839844, "coord_origin": "TOPLEFT"}, "confidence": 0.96295565366745, "cells": [{"id": 64, "text": "\u2022", "bbox": {"l": 61.569, "t": 244.07141000000001, "r": 71.345718, "b": 252.97797000000003, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "There are occasional inaccuracies in the predictions of", "bbox": {"l": 73.789902, "t": 244.07141000000001, "r": 286.36514, "b": 252.97797000000003, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "the bounding boxes.", "bbox": {"l": 70.037003, "t": 256.02643, "r": 150.41524, "b": 264.93298000000004, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 There are occasional inaccuracies in the predictions of the bounding boxes."}, {"label": "text", "id": 4, "page_no": 11, "cluster": {"id": 4, "label": "text", "bbox": {"l": 49.38137435913086, "t": 278.38848876953125, "r": 286.5146789550781, "b": 395.70688, "coord_origin": "TOPLEFT"}, "confidence": 0.9767084717750549, "cells": [{"id": 67, "text": "However, it is possible to mitigate those limitations by", "bbox": {"l": 62.067001, "t": 279.20343, "r": 286.36499, "b": 288.10999, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "combining the TableFormer predictions with the informa-", "bbox": {"l": 50.112, "t": 291.15842, "r": 286.36505, "b": 300.06497, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "tion already present inside a programmatic PDF document.", "bbox": {"l": 50.112, "t": 303.1134, "r": 286.36511, "b": 312.01996, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "More specifically, PDF documents can be seen as a se-", "bbox": {"l": 50.112, "t": 315.06839, "r": 286.36511, "b": 323.97495, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "quence of PDF cells where each cell is described by its con-", "bbox": {"l": 50.112, "t": 327.02438, "r": 286.36511, "b": 335.93093999999996, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": 
"tent and bounding box. If we are able to associate the PDF", "bbox": {"l": 50.112, "t": 338.97937, "r": 286.36505, "b": 347.88593, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "cells with the predicted table cells, we can directly link the", "bbox": {"l": 50.112, "t": 350.93436, "r": 286.36508, "b": 359.84091, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "PDF cell content to the table cell structure and use the PDF", "bbox": {"l": 50.112, "t": 362.88934, "r": 286.36511, "b": 371.7959, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "bounding boxes to correct misalignments in the predicted", "bbox": {"l": 50.112, "t": 374.84433000000007, "r": 286.36508, "b": 383.75089, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "table cell bounding boxes.", "bbox": {"l": 50.112, "t": 386.80032, "r": 154.55988, "b": 395.70688, "coord_origin": "TOPLEFT"}}]}, "text": "However, it is possible to mitigate those limitations by combining the TableFormer predictions with the information already present inside a programmatic PDF document. More specifically, PDF documents can be seen as a sequence of PDF cells where each cell is described by its content and bounding box. 
If we are able to associate the PDF cells with the predicted table cells, we can directly link the PDF cell content to the table cell structure and use the PDF bounding boxes to correct misalignments in the predicted table cell bounding boxes."}, {"label": "text", "id": 5, "page_no": 11, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.62057876586914, "t": 398.1251525878906, "r": 286.36496, "b": 420.4075622558594, "coord_origin": "TOPLEFT"}, "confidence": 0.9347663521766663, "cells": [{"id": 77, "text": "Here is a step-by-step description of the prediction post-", "bbox": {"l": 62.067001, "t": 399.06934, "r": 286.36496, "b": 407.97589, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "processing:", "bbox": {"l": 50.112, "t": 411.02533, "r": 95.491638, "b": 419.93188, "coord_origin": "TOPLEFT"}}]}, "text": "Here is a step-by-step description of the prediction postprocessing:"}, {"label": "list_item", "id": 6, "page_no": 11, "cluster": {"id": 6, "label": "list_item", "bbox": {"l": 49.60729217529297, "t": 422.3403625488281, "r": 286.36508, "b": 456.4351501464844, "coord_origin": "TOPLEFT"}, "confidence": 0.8280705213546753, "cells": [{"id": 79, "text": "1.", "bbox": {"l": 62.067001, "t": 423.29532, "r": 69.37281, "b": 432.20187, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Get the minimal grid dimensions - number of rows and", "bbox": {"l": 71.808075, "t": 423.29532, "r": 286.36502, "b": 432.20187, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "columns for the predicted table structure. This represents", "bbox": {"l": 50.112, "t": 435.25031, "r": 286.36508, "b": 444.15686, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "the most granular grid for the underlying table structure.", "bbox": {"l": 50.112, "t": 447.20529, "r": 274.50958, "b": 456.11185000000006, "coord_origin": "TOPLEFT"}}]}, "text": "1. Get the minimal grid dimensions - number of rows and columns for the predicted table structure. 
This represents the most granular grid for the underlying table structure."}, {"label": "list_item", "id": 7, "page_no": 11, "cluster": {"id": 7, "label": "list_item", "bbox": {"l": 49.62120819091797, "t": 458.19854736328125, "r": 286.4130859375, "b": 504.2468, "coord_origin": "TOPLEFT"}, "confidence": 0.9646760821342468, "cells": [{"id": 83, "text": "2.", "bbox": {"l": 62.067001, "t": 459.47528, "r": 69.538948, "b": 468.38184, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Generate pair-wise matches between the bounding", "bbox": {"l": 77.429329, "t": 459.47528, "r": 286.36499, "b": 468.38184, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "boxes of the PDF cells and the predicted cells. The Intersec-", "bbox": {"l": 50.112, "t": 471.43027, "r": 286.36505, "b": 480.33682, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "tion Over Union (IOU) metric is used to evaluate the quality", "bbox": {"l": 50.112, "t": 483.38525, "r": 286.36505, "b": 492.29181, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "of the matches.", "bbox": {"l": 50.112, "t": 495.34024, "r": 110.70452999999999, "b": 504.2468, "coord_origin": "TOPLEFT"}}]}, "text": "2. Generate pair-wise matches between the bounding boxes of the PDF cells and the predicted cells. 
The Intersection Over Union (IOU) metric is used to evaluate the quality of the matches."}, {"label": "list_item", "id": 8, "page_no": 11, "cluster": {"id": 8, "label": "list_item", "bbox": {"l": 49.44905090332031, "t": 506.72406005859375, "r": 286.36493, "b": 528.4727800000001, "coord_origin": "TOPLEFT"}, "confidence": 0.9391399025917053, "cells": [{"id": 88, "text": "3.", "bbox": {"l": 62.067001, "t": 507.61023, "r": 69.863068, "b": 516.5167799999999, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Use a carefully selected IOU threshold to designate", "bbox": {"l": 72.461754, "t": 507.61023, "r": 286.36493, "b": 516.5167799999999, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "the matches as \u201cgood\u201d ones and \u201cbad\u201d ones.", "bbox": {"l": 50.112, "t": 519.5662199999999, "r": 226.0714, "b": 528.4727800000001, "coord_origin": "TOPLEFT"}}]}, "text": "3. Use a carefully selected IOU threshold to designate the matches as \u201cgood\u201d ones and \u201cbad\u201d ones."}, {"label": "list_item", "id": 9, "page_no": 11, "cluster": {"id": 9, "label": "list_item", "bbox": {"l": 49.594417572021484, "t": 530.9173583984375, "r": 286.36511, "b": 564.65277, "coord_origin": "TOPLEFT"}, "confidence": 0.9491711854934692, "cells": [{"id": 91, "text": "3.a. If all IOU scores in a column are below the thresh-", "bbox": {"l": 62.067001, "t": 531.83521, "r": 286.36496, "b": 540.7417800000001, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "old, discard all predictions (structure and bounding boxes)", "bbox": {"l": 50.112, "t": 543.79121, "r": 286.36511, "b": 552.69777, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "for that column.", "bbox": {"l": 50.112, "t": 555.74622, "r": 114.03204, "b": 564.65277, "coord_origin": "TOPLEFT"}}]}, "text": "3.a. 
If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column."}, {"label": "list_item", "id": 10, "page_no": 11, "cluster": {"id": 10, "label": "list_item", "bbox": {"l": 49.575374603271484, "t": 566.9488525390625, "r": 286.539306640625, "b": 601.2492065429688, "coord_origin": "TOPLEFT"}, "confidence": 0.9587164521217346, "cells": [{"id": 94, "text": "4.", "bbox": {"l": 62.067001, "t": 568.01622, "r": 69.538948, "b": 576.92278, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Find the best-fitting content alignment for the pre-", "bbox": {"l": 76.731949, "t": 568.01622, "r": 286.36502, "b": 576.92278, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "dicted cells with good IOU per each column. The alignment", "bbox": {"l": 50.112, "t": 579.97122, "r": 286.36508, "b": 588.87778, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "of the column can be identified by the following formula:", "bbox": {"l": 50.112, "t": 591.9262200000001, "r": 278.70383, "b": 600.83278, "coord_origin": "TOPLEFT"}}]}, "text": "4. Find the best-fitting content alignment for the predicted cells with good IOU per each column. 
The alignment of the column can be identified by the following formula:"}, {"label": "formula", "id": 11, "page_no": 11, "cluster": {"id": 11, "label": "formula", "bbox": {"l": 110.28009033203125, "t": 622.1171875, "r": 286.3624, "b": 655.0543823242188, "coord_origin": "TOPLEFT"}, "confidence": 0.9298408031463623, "cells": [{"id": 98, "text": "alignment", "bbox": {"l": 112.02799999999999, "t": 623.99382, "r": 157.9516, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "= arg min", "bbox": {"l": 160.715, "t": 623.99382, "r": 203.4964, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "c", "bbox": {"l": 185.58499, "t": 633.98305, "r": 189.14511, "b": 640.17578, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "{", "bbox": {"l": 203.49899, "t": 623.43591, "r": 208.48029, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "D$_{c}$", "bbox": {"l": 208.48099, "t": 623.99382, "r": 220.28911, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "}", "bbox": {"l": 220.78699, "t": 623.43591, "r": 225.76828, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "D$_{c}$", "bbox": {"l": 110.70499, "t": 645.25882, "r": 122.51310999999998, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "=", "bbox": {"l": 125.77899000000001, "t": 645.25882, "r": 133.52791, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "max", "bbox": {"l": 136.295, "t": 645.25882, "r": 156.00201, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "{", "bbox": {"l": 156.00299, "t": 644.70091, "r": 160.98428, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "x$_{c}$", "bbox": {"l": 160.98399, "t": 645.25882, "r": 170.23811, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "} \u2212", "bbox": {"l": 170.73599, "t": 644.70091, "r": 185.6779, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "min", "bbox": {"l": 
187.894, "t": 645.25882, "r": 206.05283, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "{", "bbox": {"l": 206.054, "t": 644.70091, "r": 211.03529, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "x$_{c}$", "bbox": {"l": 211.035, "t": 645.25882, "r": 220.28912, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "}", "bbox": {"l": 220.787, "t": 644.70091, "r": 225.76829999999998, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "(4)", "bbox": {"l": 274.746, "t": 634.88522, "r": 286.3624, "b": 643.79178, "coord_origin": "TOPLEFT"}}]}, "text": "alignment = arg min c { D$_{c}$ } D$_{c}$ = max { x$_{c}$ } \u2212 min { x$_{c}$ } (4)"}, {"label": "text", "id": 12, "page_no": 11, "cluster": {"id": 12, "label": "text", "bbox": {"l": 49.528846740722656, "t": 666.6375732421875, "r": 286.362, "b": 689.0491333007812, "coord_origin": "TOPLEFT"}, "confidence": 0.9545556306838989, "cells": [{"id": 115, "text": "where", "bbox": {"l": 50.112, "t": 668.06522, "r": 74.45063, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "c", "bbox": {"l": 78.335999, "t": 667.90582, "r": 82.647812, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "is one of", "bbox": {"l": 86.532997, "t": 668.06522, "r": 123.63372, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "{", "bbox": {"l": 127.51899999999999, "t": 667.3479199999999, "r": 132.50029, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "left, centroid, right", "bbox": {"l": 132.50099, "t": 668.06522, "r": 210.69743, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "}", "bbox": {"l": 210.69699, "t": 667.3479199999999, "r": 215.67828, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "and", "bbox": {"l": 219.56299, "t": 668.06522, "r": 233.94897000000003, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "x$_{c}$", "bbox": {"l": 
237.83499000000003, "t": 667.90582, "r": 247.08911, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "is the x-", "bbox": {"l": 251.47299000000004, "t": 668.06522, "r": 286.362, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "coordinate for the corresponding point.", "bbox": {"l": 50.112, "t": 680.02022, "r": 205.88721, "b": 688.92679, "coord_origin": "TOPLEFT"}}]}, "text": "where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point."}, {"label": "list_item", "id": 13, "page_no": 11, "cluster": {"id": 13, "label": "list_item", "bbox": {"l": 49.529109954833984, "t": 691.4943237304688, "r": 286.41558837890625, "b": 713.151787, "coord_origin": "TOPLEFT"}, "confidence": 0.9260509014129639, "cells": [{"id": 125, "text": "5.", "bbox": {"l": 62.067001, "t": 692.290222, "r": 69.538948, "b": 701.196785, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Use the alignment computed in step 4, to compute", "bbox": {"l": 76.273666, "t": 692.290222, "r": 286.36496, "b": 701.196785, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "the median", "bbox": {"l": 50.112, "t": 704.245224, "r": 94.604973, "b": 713.151787, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "x", "bbox": {"l": 97.598999, "t": 704.085815, "r": 103.29263, "b": 712.93261, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "-coordinate for all table columns and the me-", "bbox": {"l": 103.292, "t": 704.245224, "r": 286.36481, "b": 713.151787, "coord_origin": "TOPLEFT"}}]}, "text": "5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-"}, {"label": "text", "id": 14, "page_no": 11, "cluster": {"id": 14, "label": "text", "bbox": {"l": 308.1027526855469, "t": 210.27102661132812, "r": 545.11517, "b": 255.7038, "coord_origin": "TOPLEFT"}, "confidence": 0.8942293524742126, "cells": [{"id": 130, "text": "dian cell size for all table cells. 
The usage of median dur-", "bbox": {"l": 308.862, "t": 210.93120999999996, "r": 545.11517, "b": 219.83776999999998, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "ing the computations, helps to eliminate outliers caused by", "bbox": {"l": 308.862, "t": 222.88720999999998, "r": 545.11511, "b": 231.79376000000002, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "occasional column spans which are usually wider than the", "bbox": {"l": 308.862, "t": 234.84222, "r": 545.11511, "b": 243.74878, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "normal.", "bbox": {"l": 308.862, "t": 246.79724, "r": 339.57669, "b": 255.7038, "coord_origin": "TOPLEFT"}}]}, "text": "dian cell size for all table cells. The usage of median during the computations, helps to eliminate outliers caused by occasional column spans which are usually wider than the normal."}, {"label": "list_item", "id": 15, "page_no": 11, "cluster": {"id": 15, "label": "list_item", "bbox": {"l": 308.43133544921875, "t": 258.3742370605469, "r": 545.2000122070312, "b": 280.1125183105469, "coord_origin": "TOPLEFT"}, "confidence": 0.9423389434814453, "cells": [{"id": 134, "text": "6.", "bbox": {"l": 320.81699, "t": 259.10222999999996, "r": 328.28894, "b": 268.00879, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "Snap all cells with bad IOU to their corresponding", "bbox": {"l": 334.88419, "t": 259.10222999999996, "r": 545.11499, "b": 268.00879, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "median", "bbox": {"l": 308.862, "t": 271.05724999999995, "r": 338.19189, "b": 279.96380999999997, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "x", "bbox": {"l": 340.68201, "t": 270.89783, "r": 346.37564, "b": 279.74463000000003, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "-coordinates and cell sizes.", "bbox": {"l": 346.37601, "t": 271.05724999999995, "r": 453.72305000000006, "b": 279.96380999999997, "coord_origin": "TOPLEFT"}}]}, "text": "6. 
Snap all cells with bad IOU to their corresponding median x -coordinates and cell sizes."}, {"label": "list_item", "id": 16, "page_no": 11, "cluster": {"id": 16, "label": "list_item", "bbox": {"l": 308.0020751953125, "t": 282.2272644042969, "r": 545.3494262695312, "b": 387.91071, "coord_origin": "TOPLEFT"}, "confidence": 0.9562004804611206, "cells": [{"id": 139, "text": "7.", "bbox": {"l": 320.81702, "t": 283.36325000000005, "r": 328.38953, "b": 292.26981, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "Generate a new set of pair-wise matches between the", "bbox": {"l": 330.9137, "t": 283.36325000000005, "r": 545.11499, "b": 292.26981, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "corrected bounding boxes and PDF cells. This time use a", "bbox": {"l": 308.86203, "t": 295.31824, "r": 545.11511, "b": 304.22479, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "modified version of the IOU metric, where the area of the", "bbox": {"l": 308.86203, "t": 307.27322, "r": 545.11505, "b": 316.17978, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "intersection between the predicted and PDF cells is divided", "bbox": {"l": 308.86203, "t": 319.22821000000005, "r": 545.11511, "b": 328.13477, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "by the PDF cell area.", "bbox": {"l": 308.86203, "t": 331.1842, "r": 397.19043, "b": 340.09076000000005, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "In case there are multiple matches", "bbox": {"l": 403.65616, "t": 331.1842, "r": 545.11511, "b": 340.09076000000005, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "for the same PDF cell, the prediction with the higher score", "bbox": {"l": 308.86203, "t": 343.13919, "r": 545.11511, "b": 352.04575, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "is preferred. 
This covers the cases where the PDF cells are", "bbox": {"l": 308.86203, "t": 355.09418, "r": 545.11505, "b": 364.00073, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "smaller than the area of predicted or corrected prediction", "bbox": {"l": 308.86203, "t": 367.04916, "r": 545.11505, "b": 375.95572000000004, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "cells.", "bbox": {"l": 308.86203, "t": 379.00415, "r": 329.61414, "b": 387.91071, "coord_origin": "TOPLEFT"}}]}, "text": "7. Generate a new set of pair-wise matches between the corrected bounding boxes and PDF cells. This time use a modified version of the IOU metric, where the area of the intersection between the predicted and PDF cells is divided by the PDF cell area. In case there are multiple matches for the same PDF cell, the prediction with the higher score is preferred. This covers the cases where the PDF cells are smaller than the area of predicted or corrected prediction cells."}, {"label": "list_item", "id": 17, "page_no": 11, "cluster": {"id": 17, "label": "list_item", "bbox": {"l": 307.8562927246094, "t": 390.3468933105469, "r": 545.11517, "b": 459.99164, "coord_origin": "TOPLEFT"}, "confidence": 0.871994137763977, "cells": [{"id": 150, "text": "8.", "bbox": {"l": 320.81702, "t": 391.31015, "r": 328.55356, "b": 400.2167099999999, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "In some rare occasions, we have noticed that Table-", "bbox": {"l": 331.13242, "t": 391.31015, "r": 545.11505, "b": 400.2167099999999, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Former can confuse a single column as two. When the post-", "bbox": {"l": 308.86203, "t": 403.26514, "r": 545.11517, "b": 412.17169, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "processing steps are applied, this results with two predicted", "bbox": {"l": 308.86203, "t": 415.22012000000007, "r": 545.11511, "b": 424.12668, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "columns pointing to the same PDF column. 
In such case", "bbox": {"l": 308.86203, "t": 427.17511, "r": 545.11511, "b": 436.0816699999999, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "we must de-duplicate the columns according to highest to-", "bbox": {"l": 308.86203, "t": 439.1301, "r": 545.11505, "b": 448.03665, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "tal column intersection score.", "bbox": {"l": 308.86203, "t": 451.08507999999995, "r": 426.18161, "b": 459.99164, "coord_origin": "TOPLEFT"}}]}, "text": "8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score."}, {"label": "list_item", "id": 18, "page_no": 11, "cluster": {"id": 18, "label": "list_item", "bbox": {"l": 307.8297424316406, "t": 462.252197265625, "r": 545.303466796875, "b": 568.1339111328125, "coord_origin": "TOPLEFT"}, "confidence": 0.9126080870628357, "cells": [{"id": 157, "text": "9.", "bbox": {"l": 320.81702, "t": 463.39108, "r": 328.67316, "b": 472.29764, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "Pick up the remaining orphan cells. 
There could be", "bbox": {"l": 331.29187, "t": 463.39108, "r": 545.11499, "b": 472.29764, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "cases, when after applying all the previous post-processing", "bbox": {"l": 308.86203, "t": 475.34607, "r": 545.11505, "b": 484.25262, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "steps, some PDF cells could still remain without any match", "bbox": {"l": 308.86203, "t": 487.30106, "r": 545.11517, "b": 496.20761, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "to predicted cells.", "bbox": {"l": 308.86203, "t": 499.25604, "r": 381.89786, "b": 508.1626, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "However, it is still possible to deduce", "bbox": {"l": 388.7023, "t": 499.25604, "r": 545.11517, "b": 508.1626, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "the correct matching for an orphan PDF cell by mapping its", "bbox": {"l": 308.86203, "t": 511.21204, "r": 545.11511, "b": 520.11859, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "bounding box on the geometry of the grid. This mapping", "bbox": {"l": 308.86203, "t": 523.16702, "r": 545.11505, "b": 532.07358, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "decides if the content of the orphan cell will be appended to", "bbox": {"l": 308.86203, "t": 535.12201, "r": 545.11499, "b": 544.02858, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "an already matched table cell, or a new table cell should be", "bbox": {"l": 308.86203, "t": 547.07703, "r": 545.11517, "b": 555.98358, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "created to match with the orphan.", "bbox": {"l": 308.86203, "t": 559.03203, "r": 442.22147000000007, "b": 567.93858, "coord_origin": "TOPLEFT"}}]}, "text": "9. Pick up the remaining orphan cells. There could be cases, when after applying all the previous post-processing steps, some PDF cells could still remain without any match to predicted cells. 
However, it is still possible to deduce the correct matching for an orphan PDF cell by mapping its bounding box on the geometry of the grid. This mapping decides if the content of the orphan cell will be appended to an already matched table cell, or a new table cell should be created to match with the orphan."}, {"label": "text", "id": 19, "page_no": 11, "cluster": {"id": 19, "label": "text", "bbox": {"l": 308.12921142578125, "t": 570.203857421875, "r": 545.2933349609375, "b": 604.15459, "coord_origin": "TOPLEFT"}, "confidence": 0.8459469676017761, "cells": [{"id": 168, "text": "9a. Compute the top and bottom boundary of the hori-", "bbox": {"l": 320.81702, "t": 571.33803, "r": 545.11493, "b": 580.24458, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "zontal band for each grid row (min/max", "bbox": {"l": 308.86203, "t": 583.29303, "r": 471.64093, "b": 592.19958, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "y", "bbox": {"l": 474.83405, "t": 583.1336200000001, "r": 479.71872, "b": 591.98041, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "coordinates per", "bbox": {"l": 483.26903999999996, "t": 583.29303, "r": 545.11688, "b": 592.19958, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "row).", "bbox": {"l": 308.86206, "t": 595.24803, "r": 329.91306, "b": 604.15459, "coord_origin": "TOPLEFT"}}]}, "text": "9a. 
Compute the top and bottom boundary of the horizontal band for each grid row (min/max y coordinates per row)."}, {"label": "list_item", "id": 20, "page_no": 11, "cluster": {"id": 20, "label": "list_item", "bbox": {"l": 308.2972106933594, "t": 606.1244506835938, "r": 545.11505, "b": 628.4155900000001, "coord_origin": "TOPLEFT"}, "confidence": 0.7716894149780273, "cells": [{"id": 173, "text": "9b.", "bbox": {"l": 320.81705, "t": 607.55304, "r": 332.8718, "b": 616.4595899999999, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "Intersect the orphan\u2019s bounding box with the row", "bbox": {"l": 339.92532, "t": 607.55304, "r": 545.11505, "b": 616.4595899999999, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "bands, and map the cell to the closest grid row.", "bbox": {"l": 308.86206, "t": 619.50903, "r": 495.2923, "b": 628.4155900000001, "coord_origin": "TOPLEFT"}}]}, "text": "9b. Intersect the orphan\u2019s bounding box with the row bands, and map the cell to the closest grid row."}, {"label": "list_item", "id": 21, "page_no": 11, "cluster": {"id": 21, "label": "list_item", "bbox": {"l": 308.36566162109375, "t": 630.9808959960938, "r": 545.11505, "b": 664.63059, "coord_origin": "TOPLEFT"}, "confidence": 0.8584903478622437, "cells": [{"id": 176, "text": "9c. 
Compute the left and right boundary of the vertical", "bbox": {"l": 320.81705, "t": 631.81403, "r": 545.11505, "b": 640.72058, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "band for each grid column (min/max", "bbox": {"l": 308.86206, "t": 643.7690299999999, "r": 455.28238, "b": 652.67558, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "x", "bbox": {"l": 457.77704, "t": 643.60962, "r": 463.47067, "b": 652.45641, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "coordinates per col-", "bbox": {"l": 465.97104, "t": 643.7690299999999, "r": 545.11389, "b": 652.67558, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "umn).", "bbox": {"l": 308.86206, "t": 655.72403, "r": 332.38376, "b": 664.63059, "coord_origin": "TOPLEFT"}}]}, "text": "9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column)."}, {"label": "list_item", "id": 22, "page_no": 11, "cluster": {"id": 22, "label": "list_item", "bbox": {"l": 308.18072509765625, "t": 667.103271484375, "r": 545.11499, "b": 688.9920654296875, "coord_origin": "TOPLEFT"}, "confidence": 0.7590745687484741, "cells": [{"id": 181, "text": "9d. Intersect the orphan\u2019s bounding box with the column", "bbox": {"l": 320.81705, "t": 668.03003, "r": 545.11499, "b": 676.93659, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "bands, and map the cell to the closest grid column.", "bbox": {"l": 308.86206, "t": 679.98503, "r": 510.5848700000001, "b": 688.89159, "coord_origin": "TOPLEFT"}}]}, "text": "9d. Intersect the orphan\u2019s bounding box with the column bands, and map the cell to the closest grid column."}, {"label": "list_item", "id": 23, "page_no": 11, "cluster": {"id": 23, "label": "list_item", "bbox": {"l": 308.44647216796875, "t": 691.6022338867188, "r": 545.11517, "b": 713.151596, "coord_origin": "TOPLEFT"}, "confidence": 0.6971184611320496, "cells": [{"id": 183, "text": "9e. 
If the table cell under the identified row and column", "bbox": {"l": 320.81705, "t": 692.290024, "r": 545.11505, "b": 701.196594, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "is not empty, extend its content with the content of the or-", "bbox": {"l": 308.86206, "t": 704.245026, "r": 545.11517, "b": 713.151596, "coord_origin": "TOPLEFT"}}]}, "text": "9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-"}, {"label": "page_footer", "id": 24, "page_no": 11, "cluster": {"id": 24, "label": "page_footer", "bbox": {"l": 292.63107, "t": 733.1248168945312, "r": 302.59366, "b": 743.039593, "coord_origin": "TOPLEFT"}, "confidence": 0.9126596450805664, "cells": [{"id": 185, "text": "12", "bbox": {"l": 292.63107, "t": 734.13303, "r": 302.59366, "b": 743.039593, "coord_origin": "TOPLEFT"}}]}, "text": "12"}], "body": [{"label": "picture", "id": 0, "page_no": 11, "cluster": {"id": 0, "label": "picture", "bbox": {"l": 53.345978, "t": 74.74854278564453, "r": 544.938232421875, "b": 147.59092712402344, "coord_origin": "TOPLEFT"}, "confidence": 0.6033818125724792, "cells": [{"id": 1, "text": "b.", "bbox": {"l": 53.345978, "t": 75.19152999999994, "r": 59.327053, "b": 81.14020000000005, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Table Bank", "bbox": {"l": 448.37271, "t": 77.25396999999987, "r": 481.75916, "b": 83.20263999999997, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Complex", "bbox": {"l": 63.03878399999999, "t": 101.10413000000005, "r": 85.290085, "b": 106.06133999999986, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Complex", "bbox": {"l": 227.55121, "t": 102.53992000000005, "r": 249.80251, "b": 107.49712999999997, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Simple", "bbox": {"l": 396.2337, "t": 114.04522999999995, "r": 413.69711, "b": 119.00243999999998, "coord_origin": "TOPLEFT"}}, {"id": 11, "text": "100%", "bbox": {"l": 60.93763400000001, "t": 85.73321999999996, "r": 76.151443, "b": 
90.69042999999999, "coord_origin": "TOPLEFT"}}, {"id": 13, "text": "Train Test Val", "bbox": {"l": 246.20530999999997, "t": 141.60608000000002, "r": 281.88013, "b": 146.56329000000005, "coord_origin": "TOPLEFT"}}, {"id": 15, "text": "91K 10K 10K", "bbox": {"l": 249.93848999999997, "t": 86.08801000000005, "r": 282.49384, "b": 91.04522999999995, "coord_origin": "TOPLEFT"}}, {"id": 17, "text": "100% 130K 5K", "bbox": {"l": 391.37341, "t": 85.73321999999996, "r": 432.6716599999999, "b": 90.69042999999999, "coord_origin": "TOPLEFT"}}, {"id": 19, "text": "Complex", "bbox": {"l": 113.94921, "t": 141.28845, "r": 136.20052, "b": 146.24567000000002, "coord_origin": "TOPLEFT"}}, {"id": 21, "text": "Strict", "bbox": {"l": 113.3146, "t": 100.93853999999999, "r": 127.05298, "b": 105.89575000000002, "coord_origin": "TOPLEFT"}}, {"id": 23, "text": "Strict", "bbox": {"l": 113.22738999999999, "t": 122.61523, "r": 126.96577, "b": 127.57245, "coord_origin": "TOPLEFT"}}, {"id": 25, "text": "Simple", "bbox": {"l": 138.57864, "t": 141.43640000000005, "r": 156.04207, "b": 146.39362000000006, "coord_origin": "TOPLEFT"}}, {"id": 27, "text": "65K", "bbox": {"l": 311.65359, "t": 86.55498999999998, "r": 321.67203, "b": 91.5122100000001, "coord_origin": "TOPLEFT"}}, {"id": 29, "text": "Non", "bbox": {"l": 289.23572, "t": 93.07977000000005, "r": 299.37451, "b": 98.03698999999995, "coord_origin": "TOPLEFT"}}, {"id": 31, "text": "HTML", "bbox": {"l": 285.26111, "t": 105.31975999999997, "r": 299.37537, "b": 110.27697999999998, "coord_origin": "TOPLEFT"}}, {"id": 33, "text": "HTML", "bbox": {"l": 285.05713, "t": 126.50995, "r": 299.17139, "b": 131.46716000000004, "coord_origin": "TOPLEFT"}}, {"id": 35, "text": "47K", "bbox": {"l": 299.58362, "t": 86.69353999999998, "r": 309.60205, "b": 91.65075999999999, "coord_origin": "TOPLEFT"}}, {"id": 37, "text": "Non", "bbox": {"l": 459.02151, "t": 93.76116999999999, "r": 469.16031000000004, "b": 98.71838000000002, "coord_origin": "TOPLEFT"}}, {"id": 39, 
"text": "HTML", "bbox": {"l": 455.04691, "t": 106.00116000000014, "r": 469.16115999999994, "b": 110.95836999999995, "coord_origin": "TOPLEFT"}}, {"id": 41, "text": "Complex", "bbox": {"l": 160.37672, "t": 141.58385999999996, "r": 182.62802, "b": 146.54107999999997, "coord_origin": "TOPLEFT"}}, {"id": 43, "text": "Missing", "bbox": {"l": 154.50967, "t": 100.98479999999995, "r": 173.3246, "b": 105.94202000000007, "coord_origin": "TOPLEFT"}}, {"id": 45, "text": "Contain", "bbox": {"l": 326.41302, "t": 107.23248000000001, "r": 345.99701, "b": 112.18970000000002, "coord_origin": "TOPLEFT"}}, {"id": 47, "text": "bboxes", "bbox": {"l": 327.94131, "t": 119.47247000000004, "r": 345.99634, "b": 124.42969000000005, "coord_origin": "TOPLEFT"}}, {"id": 49, "text": "doesn't", "bbox": {"l": 490.1893, "t": 110.27373999999998, "r": 508.76349000000005, "b": 115.2309600000001, "coord_origin": "TOPLEFT"}}, {"id": 51, "text": "bboxes", "bbox": {"l": 490.71121, "t": 122.51373000000001, "r": 508.76624, "b": 127.47095000000002, "coord_origin": "TOPLEFT"}}, {"id": 53, "text": "230K 280K", "bbox": {"l": 168.50357, "t": 86.13611000000003, "r": 197.52699, "b": 91.09331999999995, "coord_origin": "TOPLEFT"}}, {"id": 55, "text": "Complex Simple", "bbox": {"l": 333.73151, "t": 141.62323000000004, "r": 374.92862, "b": 146.58043999999995, "coord_origin": "TOPLEFT"}}, {"id": 57, "text": "Simple", "bbox": {"l": 508.54248, "t": 141.37683000000004, "r": 526.00592, "b": 146.33405000000005, "coord_origin": "TOPLEFT"}}]}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "caption", "id": 1, "page_no": 11, "cluster": {"id": 1, "label": "caption", "bbox": {"l": 49.23687744140625, "t": 164.2614288330078, "r": 545.11371, "b": 186.6978759765625, "coord_origin": "TOPLEFT"}, "confidence": 0.9661495685577393, "cells": [{"id": 59, "text": "Figure 7: Distribution of the tables across different dimensions per dataset. 
Simple vs complex tables per dataset and split,", "bbox": {"l": 50.112, "t": 165.50238000000002, "r": 545.11371, "b": 174.40894000000003, "coord_origin": "TOPLEFT"}}, {"id": 60, "text": "strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.", "bbox": {"l": 50.112, "t": 177.4574, "r": 513.52234, "b": 186.36395000000005, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity."}, {"label": "list_item", "id": 2, "page_no": 11, "cluster": {"id": 2, "label": "list_item", "bbox": {"l": 60.74156951904297, "t": 210.26402282714844, "r": 286.36511, "b": 231.79296999999997, "coord_origin": "TOPLEFT"}, "confidence": 0.9621952772140503, "cells": [{"id": 61, "text": "\u2022", "bbox": {"l": 61.569, "t": 210.93140000000005, "r": 71.14743, "b": 219.83794999999998, "coord_origin": "TOPLEFT"}}, {"id": 62, "text": "TableFormer output does not include the table cell con-", "bbox": {"l": 73.542038, "t": 210.93140000000005, "r": 286.36511, "b": 219.83794999999998, "coord_origin": "TOPLEFT"}}, {"id": 63, "text": "tent.", "bbox": {"l": 70.037003, "t": 222.88640999999996, "r": 87.47155, "b": 231.79296999999997, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 TableFormer output does not include the table cell content."}, {"label": "list_item", "id": 3, "page_no": 11, "cluster": {"id": 3, "label": "list_item", "bbox": {"l": 60.844627380371094, "t": 243.00099182128906, "r": 286.83001708984375, "b": 265.0201110839844, "coord_origin": "TOPLEFT"}, "confidence": 0.96295565366745, "cells": [{"id": 64, "text": "\u2022", "bbox": {"l": 61.569, "t": 244.07141000000001, "r": 71.345718, "b": 252.97797000000003, "coord_origin": "TOPLEFT"}}, {"id": 65, "text": "There are occasional inaccuracies in the predictions of", 
"bbox": {"l": 73.789902, "t": 244.07141000000001, "r": 286.36514, "b": 252.97797000000003, "coord_origin": "TOPLEFT"}}, {"id": 66, "text": "the bounding boxes.", "bbox": {"l": 70.037003, "t": 256.02643, "r": 150.41524, "b": 264.93298000000004, "coord_origin": "TOPLEFT"}}]}, "text": "\u2022 There are occasional inaccuracies in the predictions of the bounding boxes."}, {"label": "text", "id": 4, "page_no": 11, "cluster": {"id": 4, "label": "text", "bbox": {"l": 49.38137435913086, "t": 278.38848876953125, "r": 286.5146789550781, "b": 395.70688, "coord_origin": "TOPLEFT"}, "confidence": 0.9767084717750549, "cells": [{"id": 67, "text": "However, it is possible to mitigate those limitations by", "bbox": {"l": 62.067001, "t": 279.20343, "r": 286.36499, "b": 288.10999, "coord_origin": "TOPLEFT"}}, {"id": 68, "text": "combining the TableFormer predictions with the informa-", "bbox": {"l": 50.112, "t": 291.15842, "r": 286.36505, "b": 300.06497, "coord_origin": "TOPLEFT"}}, {"id": 69, "text": "tion already present inside a programmatic PDF document.", "bbox": {"l": 50.112, "t": 303.1134, "r": 286.36511, "b": 312.01996, "coord_origin": "TOPLEFT"}}, {"id": 70, "text": "More specifically, PDF documents can be seen as a se-", "bbox": {"l": 50.112, "t": 315.06839, "r": 286.36511, "b": 323.97495, "coord_origin": "TOPLEFT"}}, {"id": 71, "text": "quence of PDF cells where each cell is described by its con-", "bbox": {"l": 50.112, "t": 327.02438, "r": 286.36511, "b": 335.93093999999996, "coord_origin": "TOPLEFT"}}, {"id": 72, "text": "tent and bounding box. 
If we are able to associate the PDF", "bbox": {"l": 50.112, "t": 338.97937, "r": 286.36505, "b": 347.88593, "coord_origin": "TOPLEFT"}}, {"id": 73, "text": "cells with the predicted table cells, we can directly link the", "bbox": {"l": 50.112, "t": 350.93436, "r": 286.36508, "b": 359.84091, "coord_origin": "TOPLEFT"}}, {"id": 74, "text": "PDF cell content to the table cell structure and use the PDF", "bbox": {"l": 50.112, "t": 362.88934, "r": 286.36511, "b": 371.7959, "coord_origin": "TOPLEFT"}}, {"id": 75, "text": "bounding boxes to correct misalignments in the predicted", "bbox": {"l": 50.112, "t": 374.84433000000007, "r": 286.36508, "b": 383.75089, "coord_origin": "TOPLEFT"}}, {"id": 76, "text": "table cell bounding boxes.", "bbox": {"l": 50.112, "t": 386.80032, "r": 154.55988, "b": 395.70688, "coord_origin": "TOPLEFT"}}]}, "text": "However, it is possible to mitigate those limitations by combining the TableFormer predictions with the information already present inside a programmatic PDF document. More specifically, PDF documents can be seen as a sequence of PDF cells where each cell is described by its content and bounding box. 
If we are able to associate the PDF cells with the predicted table cells, we can directly link the PDF cell content to the table cell structure and use the PDF bounding boxes to correct misalignments in the predicted table cell bounding boxes."}, {"label": "text", "id": 5, "page_no": 11, "cluster": {"id": 5, "label": "text", "bbox": {"l": 49.62057876586914, "t": 398.1251525878906, "r": 286.36496, "b": 420.4075622558594, "coord_origin": "TOPLEFT"}, "confidence": 0.9347663521766663, "cells": [{"id": 77, "text": "Here is a step-by-step description of the prediction post-", "bbox": {"l": 62.067001, "t": 399.06934, "r": 286.36496, "b": 407.97589, "coord_origin": "TOPLEFT"}}, {"id": 78, "text": "processing:", "bbox": {"l": 50.112, "t": 411.02533, "r": 95.491638, "b": 419.93188, "coord_origin": "TOPLEFT"}}]}, "text": "Here is a step-by-step description of the prediction postprocessing:"}, {"label": "list_item", "id": 6, "page_no": 11, "cluster": {"id": 6, "label": "list_item", "bbox": {"l": 49.60729217529297, "t": 422.3403625488281, "r": 286.36508, "b": 456.4351501464844, "coord_origin": "TOPLEFT"}, "confidence": 0.8280705213546753, "cells": [{"id": 79, "text": "1.", "bbox": {"l": 62.067001, "t": 423.29532, "r": 69.37281, "b": 432.20187, "coord_origin": "TOPLEFT"}}, {"id": 80, "text": "Get the minimal grid dimensions - number of rows and", "bbox": {"l": 71.808075, "t": 423.29532, "r": 286.36502, "b": 432.20187, "coord_origin": "TOPLEFT"}}, {"id": 81, "text": "columns for the predicted table structure. This represents", "bbox": {"l": 50.112, "t": 435.25031, "r": 286.36508, "b": 444.15686, "coord_origin": "TOPLEFT"}}, {"id": 82, "text": "the most granular grid for the underlying table structure.", "bbox": {"l": 50.112, "t": 447.20529, "r": 274.50958, "b": 456.11185000000006, "coord_origin": "TOPLEFT"}}]}, "text": "1. Get the minimal grid dimensions - number of rows and columns for the predicted table structure. 
This represents the most granular grid for the underlying table structure."}, {"label": "list_item", "id": 7, "page_no": 11, "cluster": {"id": 7, "label": "list_item", "bbox": {"l": 49.62120819091797, "t": 458.19854736328125, "r": 286.4130859375, "b": 504.2468, "coord_origin": "TOPLEFT"}, "confidence": 0.9646760821342468, "cells": [{"id": 83, "text": "2.", "bbox": {"l": 62.067001, "t": 459.47528, "r": 69.538948, "b": 468.38184, "coord_origin": "TOPLEFT"}}, {"id": 84, "text": "Generate pair-wise matches between the bounding", "bbox": {"l": 77.429329, "t": 459.47528, "r": 286.36499, "b": 468.38184, "coord_origin": "TOPLEFT"}}, {"id": 85, "text": "boxes of the PDF cells and the predicted cells. The Intersec-", "bbox": {"l": 50.112, "t": 471.43027, "r": 286.36505, "b": 480.33682, "coord_origin": "TOPLEFT"}}, {"id": 86, "text": "tion Over Union (IOU) metric is used to evaluate the quality", "bbox": {"l": 50.112, "t": 483.38525, "r": 286.36505, "b": 492.29181, "coord_origin": "TOPLEFT"}}, {"id": 87, "text": "of the matches.", "bbox": {"l": 50.112, "t": 495.34024, "r": 110.70452999999999, "b": 504.2468, "coord_origin": "TOPLEFT"}}]}, "text": "2. Generate pair-wise matches between the bounding boxes of the PDF cells and the predicted cells. 
The Intersection Over Union (IOU) metric is used to evaluate the quality of the matches."}, {"label": "list_item", "id": 8, "page_no": 11, "cluster": {"id": 8, "label": "list_item", "bbox": {"l": 49.44905090332031, "t": 506.72406005859375, "r": 286.36493, "b": 528.4727800000001, "coord_origin": "TOPLEFT"}, "confidence": 0.9391399025917053, "cells": [{"id": 88, "text": "3.", "bbox": {"l": 62.067001, "t": 507.61023, "r": 69.863068, "b": 516.5167799999999, "coord_origin": "TOPLEFT"}}, {"id": 89, "text": "Use a carefully selected IOU threshold to designate", "bbox": {"l": 72.461754, "t": 507.61023, "r": 286.36493, "b": 516.5167799999999, "coord_origin": "TOPLEFT"}}, {"id": 90, "text": "the matches as \u201cgood\u201d ones and \u201cbad\u201d ones.", "bbox": {"l": 50.112, "t": 519.5662199999999, "r": 226.0714, "b": 528.4727800000001, "coord_origin": "TOPLEFT"}}]}, "text": "3. Use a carefully selected IOU threshold to designate the matches as \u201cgood\u201d ones and \u201cbad\u201d ones."}, {"label": "list_item", "id": 9, "page_no": 11, "cluster": {"id": 9, "label": "list_item", "bbox": {"l": 49.594417572021484, "t": 530.9173583984375, "r": 286.36511, "b": 564.65277, "coord_origin": "TOPLEFT"}, "confidence": 0.9491711854934692, "cells": [{"id": 91, "text": "3.a. If all IOU scores in a column are below the thresh-", "bbox": {"l": 62.067001, "t": 531.83521, "r": 286.36496, "b": 540.7417800000001, "coord_origin": "TOPLEFT"}}, {"id": 92, "text": "old, discard all predictions (structure and bounding boxes)", "bbox": {"l": 50.112, "t": 543.79121, "r": 286.36511, "b": 552.69777, "coord_origin": "TOPLEFT"}}, {"id": 93, "text": "for that column.", "bbox": {"l": 50.112, "t": 555.74622, "r": 114.03204, "b": 564.65277, "coord_origin": "TOPLEFT"}}]}, "text": "3.a. 
If all IOU scores in a column are below the threshold, discard all predictions (structure and bounding boxes) for that column."}, {"label": "list_item", "id": 10, "page_no": 11, "cluster": {"id": 10, "label": "list_item", "bbox": {"l": 49.575374603271484, "t": 566.9488525390625, "r": 286.539306640625, "b": 601.2492065429688, "coord_origin": "TOPLEFT"}, "confidence": 0.9587164521217346, "cells": [{"id": 94, "text": "4.", "bbox": {"l": 62.067001, "t": 568.01622, "r": 69.538948, "b": 576.92278, "coord_origin": "TOPLEFT"}}, {"id": 95, "text": "Find the best-fitting content alignment for the pre-", "bbox": {"l": 76.731949, "t": 568.01622, "r": 286.36502, "b": 576.92278, "coord_origin": "TOPLEFT"}}, {"id": 96, "text": "dicted cells with good IOU per each column. The alignment", "bbox": {"l": 50.112, "t": 579.97122, "r": 286.36508, "b": 588.87778, "coord_origin": "TOPLEFT"}}, {"id": 97, "text": "of the column can be identified by the following formula:", "bbox": {"l": 50.112, "t": 591.9262200000001, "r": 278.70383, "b": 600.83278, "coord_origin": "TOPLEFT"}}]}, "text": "4. Find the best-fitting content alignment for the predicted cells with good IOU per each column. 
The alignment of the column can be identified by the following formula:"}, {"label": "formula", "id": 11, "page_no": 11, "cluster": {"id": 11, "label": "formula", "bbox": {"l": 110.28009033203125, "t": 622.1171875, "r": 286.3624, "b": 655.0543823242188, "coord_origin": "TOPLEFT"}, "confidence": 0.9298408031463623, "cells": [{"id": 98, "text": "alignment", "bbox": {"l": 112.02799999999999, "t": 623.99382, "r": 157.9516, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 99, "text": "= arg min", "bbox": {"l": 160.715, "t": 623.99382, "r": 203.4964, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 100, "text": "c", "bbox": {"l": 185.58499, "t": 633.98305, "r": 189.14511, "b": 640.17578, "coord_origin": "TOPLEFT"}}, {"id": 101, "text": "{", "bbox": {"l": 203.49899, "t": 623.43591, "r": 208.48029, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 102, "text": "D$_{c}$", "bbox": {"l": 208.48099, "t": 623.99382, "r": 220.28911, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 103, "text": "}", "bbox": {"l": 220.78699, "t": 623.43591, "r": 225.76828, "b": 632.84061, "coord_origin": "TOPLEFT"}}, {"id": 104, "text": "D$_{c}$", "bbox": {"l": 110.70499, "t": 645.25882, "r": 122.51310999999998, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 105, "text": "=", "bbox": {"l": 125.77899000000001, "t": 645.25882, "r": 133.52791, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 106, "text": "max", "bbox": {"l": 136.295, "t": 645.25882, "r": 156.00201, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 107, "text": "{", "bbox": {"l": 156.00299, "t": 644.70091, "r": 160.98428, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 108, "text": "x$_{c}$", "bbox": {"l": 160.98399, "t": 645.25882, "r": 170.23811, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 109, "text": "} \u2212", "bbox": {"l": 170.73599, "t": 644.70091, "r": 185.6779, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 110, "text": "min", "bbox": {"l": 
187.894, "t": 645.25882, "r": 206.05283, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 111, "text": "{", "bbox": {"l": 206.054, "t": 644.70091, "r": 211.03529, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 112, "text": "x$_{c}$", "bbox": {"l": 211.035, "t": 645.25882, "r": 220.28912, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 113, "text": "}", "bbox": {"l": 220.787, "t": 644.70091, "r": 225.76829999999998, "b": 654.1056100000001, "coord_origin": "TOPLEFT"}}, {"id": 114, "text": "(4)", "bbox": {"l": 274.746, "t": 634.88522, "r": 286.3624, "b": 643.79178, "coord_origin": "TOPLEFT"}}]}, "text": "alignment = arg min c { D$_{c}$ } D$_{c}$ = max { x$_{c}$ } \u2212 min { x$_{c}$ } (4)"}, {"label": "text", "id": 12, "page_no": 11, "cluster": {"id": 12, "label": "text", "bbox": {"l": 49.528846740722656, "t": 666.6375732421875, "r": 286.362, "b": 689.0491333007812, "coord_origin": "TOPLEFT"}, "confidence": 0.9545556306838989, "cells": [{"id": 115, "text": "where", "bbox": {"l": 50.112, "t": 668.06522, "r": 74.45063, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 116, "text": "c", "bbox": {"l": 78.335999, "t": 667.90582, "r": 82.647812, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 117, "text": "is one of", "bbox": {"l": 86.532997, "t": 668.06522, "r": 123.63372, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 118, "text": "{", "bbox": {"l": 127.51899999999999, "t": 667.3479199999999, "r": 132.50029, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 119, "text": "left, centroid, right", "bbox": {"l": 132.50099, "t": 668.06522, "r": 210.69743, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 120, "text": "}", "bbox": {"l": 210.69699, "t": 667.3479199999999, "r": 215.67828, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 121, "text": "and", "bbox": {"l": 219.56299, "t": 668.06522, "r": 233.94897000000003, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 122, "text": "x$_{c}$", "bbox": {"l": 
237.83499000000003, "t": 667.90582, "r": 247.08911, "b": 676.75261, "coord_origin": "TOPLEFT"}}, {"id": 123, "text": "is the x-", "bbox": {"l": 251.47299000000004, "t": 668.06522, "r": 286.362, "b": 676.97179, "coord_origin": "TOPLEFT"}}, {"id": 124, "text": "coordinate for the corresponding point.", "bbox": {"l": 50.112, "t": 680.02022, "r": 205.88721, "b": 688.92679, "coord_origin": "TOPLEFT"}}]}, "text": "where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point."}, {"label": "list_item", "id": 13, "page_no": 11, "cluster": {"id": 13, "label": "list_item", "bbox": {"l": 49.529109954833984, "t": 691.4943237304688, "r": 286.41558837890625, "b": 713.151787, "coord_origin": "TOPLEFT"}, "confidence": 0.9260509014129639, "cells": [{"id": 125, "text": "5.", "bbox": {"l": 62.067001, "t": 692.290222, "r": 69.538948, "b": 701.196785, "coord_origin": "TOPLEFT"}}, {"id": 126, "text": "Use the alignment computed in step 4, to compute", "bbox": {"l": 76.273666, "t": 692.290222, "r": 286.36496, "b": 701.196785, "coord_origin": "TOPLEFT"}}, {"id": 127, "text": "the median", "bbox": {"l": 50.112, "t": 704.245224, "r": 94.604973, "b": 713.151787, "coord_origin": "TOPLEFT"}}, {"id": 128, "text": "x", "bbox": {"l": 97.598999, "t": 704.085815, "r": 103.29263, "b": 712.93261, "coord_origin": "TOPLEFT"}}, {"id": 129, "text": "-coordinate for all table columns and the me-", "bbox": {"l": 103.292, "t": 704.245224, "r": 286.36481, "b": 713.151787, "coord_origin": "TOPLEFT"}}]}, "text": "5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-"}, {"label": "text", "id": 14, "page_no": 11, "cluster": {"id": 14, "label": "text", "bbox": {"l": 308.1027526855469, "t": 210.27102661132812, "r": 545.11517, "b": 255.7038, "coord_origin": "TOPLEFT"}, "confidence": 0.8942293524742126, "cells": [{"id": 130, "text": "dian cell size for all table cells. 
The usage of median dur-", "bbox": {"l": 308.862, "t": 210.93120999999996, "r": 545.11517, "b": 219.83776999999998, "coord_origin": "TOPLEFT"}}, {"id": 131, "text": "ing the computations, helps to eliminate outliers caused by", "bbox": {"l": 308.862, "t": 222.88720999999998, "r": 545.11511, "b": 231.79376000000002, "coord_origin": "TOPLEFT"}}, {"id": 132, "text": "occasional column spans which are usually wider than the", "bbox": {"l": 308.862, "t": 234.84222, "r": 545.11511, "b": 243.74878, "coord_origin": "TOPLEFT"}}, {"id": 133, "text": "normal.", "bbox": {"l": 308.862, "t": 246.79724, "r": 339.57669, "b": 255.7038, "coord_origin": "TOPLEFT"}}]}, "text": "dian cell size for all table cells. The usage of median during the computations, helps to eliminate outliers caused by occasional column spans which are usually wider than the normal."}, {"label": "list_item", "id": 15, "page_no": 11, "cluster": {"id": 15, "label": "list_item", "bbox": {"l": 308.43133544921875, "t": 258.3742370605469, "r": 545.2000122070312, "b": 280.1125183105469, "coord_origin": "TOPLEFT"}, "confidence": 0.9423389434814453, "cells": [{"id": 134, "text": "6.", "bbox": {"l": 320.81699, "t": 259.10222999999996, "r": 328.28894, "b": 268.00879, "coord_origin": "TOPLEFT"}}, {"id": 135, "text": "Snap all cells with bad IOU to their corresponding", "bbox": {"l": 334.88419, "t": 259.10222999999996, "r": 545.11499, "b": 268.00879, "coord_origin": "TOPLEFT"}}, {"id": 136, "text": "median", "bbox": {"l": 308.862, "t": 271.05724999999995, "r": 338.19189, "b": 279.96380999999997, "coord_origin": "TOPLEFT"}}, {"id": 137, "text": "x", "bbox": {"l": 340.68201, "t": 270.89783, "r": 346.37564, "b": 279.74463000000003, "coord_origin": "TOPLEFT"}}, {"id": 138, "text": "-coordinates and cell sizes.", "bbox": {"l": 346.37601, "t": 271.05724999999995, "r": 453.72305000000006, "b": 279.96380999999997, "coord_origin": "TOPLEFT"}}]}, "text": "6. 
Snap all cells with bad IOU to their corresponding median x -coordinates and cell sizes."}, {"label": "list_item", "id": 16, "page_no": 11, "cluster": {"id": 16, "label": "list_item", "bbox": {"l": 308.0020751953125, "t": 282.2272644042969, "r": 545.3494262695312, "b": 387.91071, "coord_origin": "TOPLEFT"}, "confidence": 0.9562004804611206, "cells": [{"id": 139, "text": "7.", "bbox": {"l": 320.81702, "t": 283.36325000000005, "r": 328.38953, "b": 292.26981, "coord_origin": "TOPLEFT"}}, {"id": 140, "text": "Generate a new set of pair-wise matches between the", "bbox": {"l": 330.9137, "t": 283.36325000000005, "r": 545.11499, "b": 292.26981, "coord_origin": "TOPLEFT"}}, {"id": 141, "text": "corrected bounding boxes and PDF cells. This time use a", "bbox": {"l": 308.86203, "t": 295.31824, "r": 545.11511, "b": 304.22479, "coord_origin": "TOPLEFT"}}, {"id": 142, "text": "modified version of the IOU metric, where the area of the", "bbox": {"l": 308.86203, "t": 307.27322, "r": 545.11505, "b": 316.17978, "coord_origin": "TOPLEFT"}}, {"id": 143, "text": "intersection between the predicted and PDF cells is divided", "bbox": {"l": 308.86203, "t": 319.22821000000005, "r": 545.11511, "b": 328.13477, "coord_origin": "TOPLEFT"}}, {"id": 144, "text": "by the PDF cell area.", "bbox": {"l": 308.86203, "t": 331.1842, "r": 397.19043, "b": 340.09076000000005, "coord_origin": "TOPLEFT"}}, {"id": 145, "text": "In case there are multiple matches", "bbox": {"l": 403.65616, "t": 331.1842, "r": 545.11511, "b": 340.09076000000005, "coord_origin": "TOPLEFT"}}, {"id": 146, "text": "for the same PDF cell, the prediction with the higher score", "bbox": {"l": 308.86203, "t": 343.13919, "r": 545.11511, "b": 352.04575, "coord_origin": "TOPLEFT"}}, {"id": 147, "text": "is preferred. 
This covers the cases where the PDF cells are", "bbox": {"l": 308.86203, "t": 355.09418, "r": 545.11505, "b": 364.00073, "coord_origin": "TOPLEFT"}}, {"id": 148, "text": "smaller than the area of predicted or corrected prediction", "bbox": {"l": 308.86203, "t": 367.04916, "r": 545.11505, "b": 375.95572000000004, "coord_origin": "TOPLEFT"}}, {"id": 149, "text": "cells.", "bbox": {"l": 308.86203, "t": 379.00415, "r": 329.61414, "b": 387.91071, "coord_origin": "TOPLEFT"}}]}, "text": "7. Generate a new set of pair-wise matches between the corrected bounding boxes and PDF cells. This time use a modified version of the IOU metric, where the area of the intersection between the predicted and PDF cells is divided by the PDF cell area. In case there are multiple matches for the same PDF cell, the prediction with the higher score is preferred. This covers the cases where the PDF cells are smaller than the area of predicted or corrected prediction cells."}, {"label": "list_item", "id": 17, "page_no": 11, "cluster": {"id": 17, "label": "list_item", "bbox": {"l": 307.8562927246094, "t": 390.3468933105469, "r": 545.11517, "b": 459.99164, "coord_origin": "TOPLEFT"}, "confidence": 0.871994137763977, "cells": [{"id": 150, "text": "8.", "bbox": {"l": 320.81702, "t": 391.31015, "r": 328.55356, "b": 400.2167099999999, "coord_origin": "TOPLEFT"}}, {"id": 151, "text": "In some rare occasions, we have noticed that Table-", "bbox": {"l": 331.13242, "t": 391.31015, "r": 545.11505, "b": 400.2167099999999, "coord_origin": "TOPLEFT"}}, {"id": 152, "text": "Former can confuse a single column as two. When the post-", "bbox": {"l": 308.86203, "t": 403.26514, "r": 545.11517, "b": 412.17169, "coord_origin": "TOPLEFT"}}, {"id": 153, "text": "processing steps are applied, this results with two predicted", "bbox": {"l": 308.86203, "t": 415.22012000000007, "r": 545.11511, "b": 424.12668, "coord_origin": "TOPLEFT"}}, {"id": 154, "text": "columns pointing to the same PDF column. 
In such case", "bbox": {"l": 308.86203, "t": 427.17511, "r": 545.11511, "b": 436.0816699999999, "coord_origin": "TOPLEFT"}}, {"id": 155, "text": "we must de-duplicate the columns according to highest to-", "bbox": {"l": 308.86203, "t": 439.1301, "r": 545.11505, "b": 448.03665, "coord_origin": "TOPLEFT"}}, {"id": 156, "text": "tal column intersection score.", "bbox": {"l": 308.86203, "t": 451.08507999999995, "r": 426.18161, "b": 459.99164, "coord_origin": "TOPLEFT"}}]}, "text": "8. In some rare occasions, we have noticed that TableFormer can confuse a single column as two. When the postprocessing steps are applied, this results with two predicted columns pointing to the same PDF column. In such case we must de-duplicate the columns according to highest total column intersection score."}, {"label": "list_item", "id": 18, "page_no": 11, "cluster": {"id": 18, "label": "list_item", "bbox": {"l": 307.8297424316406, "t": 462.252197265625, "r": 545.303466796875, "b": 568.1339111328125, "coord_origin": "TOPLEFT"}, "confidence": 0.9126080870628357, "cells": [{"id": 157, "text": "9.", "bbox": {"l": 320.81702, "t": 463.39108, "r": 328.67316, "b": 472.29764, "coord_origin": "TOPLEFT"}}, {"id": 158, "text": "Pick up the remaining orphan cells. 
There could be", "bbox": {"l": 331.29187, "t": 463.39108, "r": 545.11499, "b": 472.29764, "coord_origin": "TOPLEFT"}}, {"id": 159, "text": "cases, when after applying all the previous post-processing", "bbox": {"l": 308.86203, "t": 475.34607, "r": 545.11505, "b": 484.25262, "coord_origin": "TOPLEFT"}}, {"id": 160, "text": "steps, some PDF cells could still remain without any match", "bbox": {"l": 308.86203, "t": 487.30106, "r": 545.11517, "b": 496.20761, "coord_origin": "TOPLEFT"}}, {"id": 161, "text": "to predicted cells.", "bbox": {"l": 308.86203, "t": 499.25604, "r": 381.89786, "b": 508.1626, "coord_origin": "TOPLEFT"}}, {"id": 162, "text": "However, it is still possible to deduce", "bbox": {"l": 388.7023, "t": 499.25604, "r": 545.11517, "b": 508.1626, "coord_origin": "TOPLEFT"}}, {"id": 163, "text": "the correct matching for an orphan PDF cell by mapping its", "bbox": {"l": 308.86203, "t": 511.21204, "r": 545.11511, "b": 520.11859, "coord_origin": "TOPLEFT"}}, {"id": 164, "text": "bounding box on the geometry of the grid. This mapping", "bbox": {"l": 308.86203, "t": 523.16702, "r": 545.11505, "b": 532.07358, "coord_origin": "TOPLEFT"}}, {"id": 165, "text": "decides if the content of the orphan cell will be appended to", "bbox": {"l": 308.86203, "t": 535.12201, "r": 545.11499, "b": 544.02858, "coord_origin": "TOPLEFT"}}, {"id": 166, "text": "an already matched table cell, or a new table cell should be", "bbox": {"l": 308.86203, "t": 547.07703, "r": 545.11517, "b": 555.98358, "coord_origin": "TOPLEFT"}}, {"id": 167, "text": "created to match with the orphan.", "bbox": {"l": 308.86203, "t": 559.03203, "r": 442.22147000000007, "b": 567.93858, "coord_origin": "TOPLEFT"}}]}, "text": "9. Pick up the remaining orphan cells. There could be cases, when after applying all the previous post-processing steps, some PDF cells could still remain without any match to predicted cells. 
However, it is still possible to deduce the correct matching for an orphan PDF cell by mapping its bounding box on the geometry of the grid. This mapping decides if the content of the orphan cell will be appended to an already matched table cell, or a new table cell should be created to match with the orphan."}, {"label": "text", "id": 19, "page_no": 11, "cluster": {"id": 19, "label": "text", "bbox": {"l": 308.12921142578125, "t": 570.203857421875, "r": 545.2933349609375, "b": 604.15459, "coord_origin": "TOPLEFT"}, "confidence": 0.8459469676017761, "cells": [{"id": 168, "text": "9a. Compute the top and bottom boundary of the hori-", "bbox": {"l": 320.81702, "t": 571.33803, "r": 545.11493, "b": 580.24458, "coord_origin": "TOPLEFT"}}, {"id": 169, "text": "zontal band for each grid row (min/max", "bbox": {"l": 308.86203, "t": 583.29303, "r": 471.64093, "b": 592.19958, "coord_origin": "TOPLEFT"}}, {"id": 170, "text": "y", "bbox": {"l": 474.83405, "t": 583.1336200000001, "r": 479.71872, "b": 591.98041, "coord_origin": "TOPLEFT"}}, {"id": 171, "text": "coordinates per", "bbox": {"l": 483.26903999999996, "t": 583.29303, "r": 545.11688, "b": 592.19958, "coord_origin": "TOPLEFT"}}, {"id": 172, "text": "row).", "bbox": {"l": 308.86206, "t": 595.24803, "r": 329.91306, "b": 604.15459, "coord_origin": "TOPLEFT"}}]}, "text": "9a. 
Compute the top and bottom boundary of the horizontal band for each grid row (min/max y coordinates per row)."}, {"label": "list_item", "id": 20, "page_no": 11, "cluster": {"id": 20, "label": "list_item", "bbox": {"l": 308.2972106933594, "t": 606.1244506835938, "r": 545.11505, "b": 628.4155900000001, "coord_origin": "TOPLEFT"}, "confidence": 0.7716894149780273, "cells": [{"id": 173, "text": "9b.", "bbox": {"l": 320.81705, "t": 607.55304, "r": 332.8718, "b": 616.4595899999999, "coord_origin": "TOPLEFT"}}, {"id": 174, "text": "Intersect the orphan\u2019s bounding box with the row", "bbox": {"l": 339.92532, "t": 607.55304, "r": 545.11505, "b": 616.4595899999999, "coord_origin": "TOPLEFT"}}, {"id": 175, "text": "bands, and map the cell to the closest grid row.", "bbox": {"l": 308.86206, "t": 619.50903, "r": 495.2923, "b": 628.4155900000001, "coord_origin": "TOPLEFT"}}]}, "text": "9b. Intersect the orphan\u2019s bounding box with the row bands, and map the cell to the closest grid row."}, {"label": "list_item", "id": 21, "page_no": 11, "cluster": {"id": 21, "label": "list_item", "bbox": {"l": 308.36566162109375, "t": 630.9808959960938, "r": 545.11505, "b": 664.63059, "coord_origin": "TOPLEFT"}, "confidence": 0.8584903478622437, "cells": [{"id": 176, "text": "9c. 
Compute the left and right boundary of the vertical", "bbox": {"l": 320.81705, "t": 631.81403, "r": 545.11505, "b": 640.72058, "coord_origin": "TOPLEFT"}}, {"id": 177, "text": "band for each grid column (min/max", "bbox": {"l": 308.86206, "t": 643.7690299999999, "r": 455.28238, "b": 652.67558, "coord_origin": "TOPLEFT"}}, {"id": 178, "text": "x", "bbox": {"l": 457.77704, "t": 643.60962, "r": 463.47067, "b": 652.45641, "coord_origin": "TOPLEFT"}}, {"id": 179, "text": "coordinates per col-", "bbox": {"l": 465.97104, "t": 643.7690299999999, "r": 545.11389, "b": 652.67558, "coord_origin": "TOPLEFT"}}, {"id": 180, "text": "umn).", "bbox": {"l": 308.86206, "t": 655.72403, "r": 332.38376, "b": 664.63059, "coord_origin": "TOPLEFT"}}]}, "text": "9c. Compute the left and right boundary of the vertical band for each grid column (min/max x coordinates per column)."}, {"label": "list_item", "id": 22, "page_no": 11, "cluster": {"id": 22, "label": "list_item", "bbox": {"l": 308.18072509765625, "t": 667.103271484375, "r": 545.11499, "b": 688.9920654296875, "coord_origin": "TOPLEFT"}, "confidence": 0.7590745687484741, "cells": [{"id": 181, "text": "9d. Intersect the orphan\u2019s bounding box with the column", "bbox": {"l": 320.81705, "t": 668.03003, "r": 545.11499, "b": 676.93659, "coord_origin": "TOPLEFT"}}, {"id": 182, "text": "bands, and map the cell to the closest grid column.", "bbox": {"l": 308.86206, "t": 679.98503, "r": 510.5848700000001, "b": 688.89159, "coord_origin": "TOPLEFT"}}]}, "text": "9d. Intersect the orphan\u2019s bounding box with the column bands, and map the cell to the closest grid column."}, {"label": "list_item", "id": 23, "page_no": 11, "cluster": {"id": 23, "label": "list_item", "bbox": {"l": 308.44647216796875, "t": 691.6022338867188, "r": 545.11517, "b": 713.151596, "coord_origin": "TOPLEFT"}, "confidence": 0.6971184611320496, "cells": [{"id": 183, "text": "9e. 
If the table cell under the identified row and column", "bbox": {"l": 320.81705, "t": 692.290024, "r": 545.11505, "b": 701.196594, "coord_origin": "TOPLEFT"}}, {"id": 184, "text": "is not empty, extend its content with the content of the or-", "bbox": {"l": 308.86206, "t": 704.245026, "r": 545.11517, "b": 713.151596, "coord_origin": "TOPLEFT"}}]}, "text": "9e. If the table cell under the identified row and column is not empty, extend its content with the content of the or-"}], "headers": [{"label": "page_footer", "id": 24, "page_no": 11, "cluster": {"id": 24, "label": "page_footer", "bbox": {"l": 292.63107, "t": 733.1248168945312, "r": 302.59366, "b": 743.039593, "coord_origin": "TOPLEFT"}, "confidence": 0.9126596450805664, "cells": [{"id": 185, "text": "12", "bbox": {"l": 292.63107, "t": 734.13303, "r": 302.59366, "b": 743.039593, "coord_origin": "TOPLEFT"}}]}, "text": "12"}]}}, {"page_no": 12, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "phan cell.", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 88.846588, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "9f. 
Otherwise create a new structural cell and match it", "bbox": {"l": 62.067001, "t": 87.16339000000005, "r": 286.36496, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "wit the orphan cell.", "bbox": {"l": 50.112, "t": 99.11841000000004, "r": 127.03322, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Aditional images with examples of TableFormer predic-", "bbox": {"l": 62.067001, "t": 111.16309000000001, "r": 286.36499, "b": 119.7508499999999, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "tions and post-processing can be found below.", "bbox": {"l": 50.112, "t": 123.11810000000003, "r": 234.06139999999996, "b": 131.70587, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Figure 8: Example of a table with multi-line header.", "bbox": {"l": 63.341, "t": 502.05637, "r": 273.13342, "b": 510.96292, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "Figure 9:", "bbox": {"l": 308.862, "t": 306.59836, "r": 345.63397, "b": 315.50491, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Example of a table with big empty distance be-", "bbox": {"l": 352.78711, "t": 306.59836, "r": 545.11511, "b": 315.50491, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "tween cells.", "bbox": {"l": 308.862, "t": 318.55334, "r": 355.89545, "b": 327.45990000000006, "coord_origin": "TOPLEFT"}}, {"id": 9, "text": "Figure 10: Example of a complex table with empty cells.", "bbox": {"l": 312.34299, "t": 680.4933599999999, "r": 541.63232, "b": 689.39993, "coord_origin": "TOPLEFT"}}, {"id": 10, "text": "13", "bbox": {"l": 292.63098, "t": 734.133358, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "text", "bbox": {"l": 49.737815856933594, "t": 74.53302001953125, "r": 88.846588, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}, "confidence": 0.7545598745346069, "cells": [{"id": 0, "text": "phan cell.", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 88.846588, "b": 84.11492999999996, 
"coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "text", "bbox": {"l": 49.445377349853516, "t": 86.59966278076172, "r": 286.36496, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}, "confidence": 0.9170761108398438, "cells": [{"id": 1, "text": "9f. Otherwise create a new structural cell and match it", "bbox": {"l": 62.067001, "t": 87.16339000000005, "r": 286.36496, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "wit the orphan cell.", "bbox": {"l": 50.112, "t": 99.11841000000004, "r": 127.03322, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "text", "bbox": {"l": 49.376834869384766, "t": 110.35437774658203, "r": 286.36499, "b": 132.31216430664062, "coord_origin": "TOPLEFT"}, "confidence": 0.945468544960022, "cells": [{"id": 3, "text": "Aditional images with examples of TableFormer predic-", "bbox": {"l": 62.067001, "t": 111.16309000000001, "r": 286.36499, "b": 119.7508499999999, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "tions and post-processing can be found below.", "bbox": {"l": 50.112, "t": 123.11810000000003, "r": 234.06139999999996, "b": 131.70587, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "caption", "bbox": {"l": 62.8093147277832, "t": 501.3182678222656, "r": 273.13342, "b": 511.12255859375, "coord_origin": "TOPLEFT"}, "confidence": 0.897108793258667, "cells": [{"id": 5, "text": "Figure 8: Example of a table with multi-line header.", "bbox": {"l": 63.341, "t": 502.05637, "r": 273.13342, "b": 510.96292, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "caption", "bbox": {"l": 308.3953552246094, "t": 305.9048156738281, "r": 545.11511, "b": 327.45990000000006, "coord_origin": "TOPLEFT"}, "confidence": 0.87733393907547, "cells": [{"id": 6, "text": "Figure 9:", "bbox": {"l": 308.862, "t": 306.59836, "r": 345.63397, "b": 315.50491, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Example of a table with big empty distance be-", "bbox": {"l": 352.78711, "t": 306.59836, "r": 545.11511, "b": 
315.50491, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "tween cells.", "bbox": {"l": 308.862, "t": 318.55334, "r": 355.89545, "b": 327.45990000000006, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "caption", "bbox": {"l": 311.76544189453125, "t": 679.7958374023438, "r": 541.63232, "b": 689.9346313476562, "coord_origin": "TOPLEFT"}, "confidence": 0.9188511967658997, "cells": [{"id": 9, "text": "Figure 10: Example of a complex table with empty cells.", "bbox": {"l": 312.34299, "t": 680.4933599999999, "r": 541.63232, "b": 689.39993, "coord_origin": "TOPLEFT"}}]}, {"id": 6, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.4386596679688, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}, "confidence": 0.9020506143569946, "cells": [{"id": 10, "text": "13", "bbox": {"l": 292.63098, "t": 734.133358, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}}]}, {"id": 7, "label": "picture", "bbox": {"l": 333.9573669433594, "t": 593.1134033203125, "r": 518.4768676757812, "b": 665.4903564453125, "coord_origin": "TOPLEFT"}, "confidence": 0.8023570775985718, "cells": []}, {"id": 8, "label": "picture", "bbox": {"l": 309.79150390625, "t": 253.90536499023438, "r": 425.9603271484375, "b": 292.39398193359375, "coord_origin": "TOPLEFT"}, "confidence": 0.6956619620323181, "cells": []}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "text", "id": 0, "page_no": 12, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.737815856933594, "t": 74.53302001953125, "r": 88.846588, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}, "confidence": 0.7545598745346069, "cells": [{"id": 0, "text": "phan cell.", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 88.846588, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}]}, "text": "phan cell."}, {"label": "text", "id": 1, "page_no": 12, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.445377349853516, "t": 
86.59966278076172, "r": 286.36496, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}, "confidence": 0.9170761108398438, "cells": [{"id": 1, "text": "9f. Otherwise create a new structural cell and match it", "bbox": {"l": 62.067001, "t": 87.16339000000005, "r": 286.36496, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "wit the orphan cell.", "bbox": {"l": 50.112, "t": 99.11841000000004, "r": 127.03322, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}]}, "text": "9f. Otherwise create a new structural cell and match it wit the orphan cell."}, {"label": "text", "id": 2, "page_no": 12, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.376834869384766, "t": 110.35437774658203, "r": 286.36499, "b": 132.31216430664062, "coord_origin": "TOPLEFT"}, "confidence": 0.945468544960022, "cells": [{"id": 3, "text": "Aditional images with examples of TableFormer predic-", "bbox": {"l": 62.067001, "t": 111.16309000000001, "r": 286.36499, "b": 119.7508499999999, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "tions and post-processing can be found below.", "bbox": {"l": 50.112, "t": 123.11810000000003, "r": 234.06139999999996, "b": 131.70587, "coord_origin": "TOPLEFT"}}]}, "text": "Aditional images with examples of TableFormer predictions and post-processing can be found below."}, {"label": "caption", "id": 3, "page_no": 12, "cluster": {"id": 3, "label": "caption", "bbox": {"l": 62.8093147277832, "t": 501.3182678222656, "r": 273.13342, "b": 511.12255859375, "coord_origin": "TOPLEFT"}, "confidence": 0.897108793258667, "cells": [{"id": 5, "text": "Figure 8: Example of a table with multi-line header.", "bbox": {"l": 63.341, "t": 502.05637, "r": 273.13342, "b": 510.96292, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 8: Example of a table with multi-line header."}, {"label": "caption", "id": 4, "page_no": 12, "cluster": {"id": 4, "label": "caption", "bbox": {"l": 308.3953552246094, "t": 305.9048156738281, "r": 545.11511, "b": 327.45990000000006, 
"coord_origin": "TOPLEFT"}, "confidence": 0.87733393907547, "cells": [{"id": 6, "text": "Figure 9:", "bbox": {"l": 308.862, "t": 306.59836, "r": 345.63397, "b": 315.50491, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Example of a table with big empty distance be-", "bbox": {"l": 352.78711, "t": 306.59836, "r": 545.11511, "b": 315.50491, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "tween cells.", "bbox": {"l": 308.862, "t": 318.55334, "r": 355.89545, "b": 327.45990000000006, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 9: Example of a table with big empty distance between cells."}, {"label": "caption", "id": 5, "page_no": 12, "cluster": {"id": 5, "label": "caption", "bbox": {"l": 311.76544189453125, "t": 679.7958374023438, "r": 541.63232, "b": 689.9346313476562, "coord_origin": "TOPLEFT"}, "confidence": 0.9188511967658997, "cells": [{"id": 9, "text": "Figure 10: Example of a complex table with empty cells.", "bbox": {"l": 312.34299, "t": 680.4933599999999, "r": 541.63232, "b": 689.39993, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 10: Example of a complex table with empty cells."}, {"label": "page_footer", "id": 6, "page_no": 12, "cluster": {"id": 6, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.4386596679688, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}, "confidence": 0.9020506143569946, "cells": [{"id": 10, "text": "13", "bbox": {"l": 292.63098, "t": 734.133358, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}}]}, "text": "13"}, {"label": "picture", "id": 7, "page_no": 12, "cluster": {"id": 7, "label": "picture", "bbox": {"l": 333.9573669433594, "t": 593.1134033203125, "r": 518.4768676757812, "b": 665.4903564453125, "coord_origin": "TOPLEFT"}, "confidence": 0.8023570775985718, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 8, "page_no": 12, "cluster": {"id": 8, "label": "picture", "bbox": {"l": 309.79150390625, "t": 
253.90536499023438, "r": 425.9603271484375, "b": 292.39398193359375, "coord_origin": "TOPLEFT"}, "confidence": 0.6956619620323181, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "body": [{"label": "text", "id": 0, "page_no": 12, "cluster": {"id": 0, "label": "text", "bbox": {"l": 49.737815856933594, "t": 74.53302001953125, "r": 88.846588, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}, "confidence": 0.7545598745346069, "cells": [{"id": 0, "text": "phan cell.", "bbox": {"l": 50.112, "t": 75.20836999999995, "r": 88.846588, "b": 84.11492999999996, "coord_origin": "TOPLEFT"}}]}, "text": "phan cell."}, {"label": "text", "id": 1, "page_no": 12, "cluster": {"id": 1, "label": "text", "bbox": {"l": 49.445377349853516, "t": 86.59966278076172, "r": 286.36496, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}, "confidence": 0.9170761108398438, "cells": [{"id": 1, "text": "9f. Otherwise create a new structural cell and match it", "bbox": {"l": 62.067001, "t": 87.16339000000005, "r": 286.36496, "b": 96.06994999999995, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "wit the orphan cell.", "bbox": {"l": 50.112, "t": 99.11841000000004, "r": 127.03322, "b": 108.02495999999985, "coord_origin": "TOPLEFT"}}]}, "text": "9f. 
Otherwise create a new structural cell and match it wit the orphan cell."}, {"label": "text", "id": 2, "page_no": 12, "cluster": {"id": 2, "label": "text", "bbox": {"l": 49.376834869384766, "t": 110.35437774658203, "r": 286.36499, "b": 132.31216430664062, "coord_origin": "TOPLEFT"}, "confidence": 0.945468544960022, "cells": [{"id": 3, "text": "Aditional images with examples of TableFormer predic-", "bbox": {"l": 62.067001, "t": 111.16309000000001, "r": 286.36499, "b": 119.7508499999999, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "tions and post-processing can be found below.", "bbox": {"l": 50.112, "t": 123.11810000000003, "r": 234.06139999999996, "b": 131.70587, "coord_origin": "TOPLEFT"}}]}, "text": "Aditional images with examples of TableFormer predictions and post-processing can be found below."}, {"label": "caption", "id": 3, "page_no": 12, "cluster": {"id": 3, "label": "caption", "bbox": {"l": 62.8093147277832, "t": 501.3182678222656, "r": 273.13342, "b": 511.12255859375, "coord_origin": "TOPLEFT"}, "confidence": 0.897108793258667, "cells": [{"id": 5, "text": "Figure 8: Example of a table with multi-line header.", "bbox": {"l": 63.341, "t": 502.05637, "r": 273.13342, "b": 510.96292, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 8: Example of a table with multi-line header."}, {"label": "caption", "id": 4, "page_no": 12, "cluster": {"id": 4, "label": "caption", "bbox": {"l": 308.3953552246094, "t": 305.9048156738281, "r": 545.11511, "b": 327.45990000000006, "coord_origin": "TOPLEFT"}, "confidence": 0.87733393907547, "cells": [{"id": 6, "text": "Figure 9:", "bbox": {"l": 308.862, "t": 306.59836, "r": 345.63397, "b": 315.50491, "coord_origin": "TOPLEFT"}}, {"id": 7, "text": "Example of a table with big empty distance be-", "bbox": {"l": 352.78711, "t": 306.59836, "r": 545.11511, "b": 315.50491, "coord_origin": "TOPLEFT"}}, {"id": 8, "text": "tween cells.", "bbox": {"l": 308.862, "t": 318.55334, "r": 355.89545, "b": 327.45990000000006, "coord_origin": 
"TOPLEFT"}}]}, "text": "Figure 9: Example of a table with big empty distance between cells."}, {"label": "caption", "id": 5, "page_no": 12, "cluster": {"id": 5, "label": "caption", "bbox": {"l": 311.76544189453125, "t": 679.7958374023438, "r": 541.63232, "b": 689.9346313476562, "coord_origin": "TOPLEFT"}, "confidence": 0.9188511967658997, "cells": [{"id": 9, "text": "Figure 10: Example of a complex table with empty cells.", "bbox": {"l": 312.34299, "t": 680.4933599999999, "r": 541.63232, "b": 689.39993, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 10: Example of a complex table with empty cells."}, {"label": "picture", "id": 7, "page_no": 12, "cluster": {"id": 7, "label": "picture", "bbox": {"l": 333.9573669433594, "t": 593.1134033203125, "r": 518.4768676757812, "b": 665.4903564453125, "coord_origin": "TOPLEFT"}, "confidence": 0.8023570775985718, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 8, "page_no": 12, "cluster": {"id": 8, "label": "picture", "bbox": {"l": 309.79150390625, "t": 253.90536499023438, "r": 425.9603271484375, "b": 292.39398193359375, "coord_origin": "TOPLEFT"}, "confidence": 0.6956619620323181, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "headers": [{"label": "page_footer", "id": 6, "page_no": 12, "cluster": {"id": 6, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.4386596679688, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}, "confidence": 0.9020506143569946, "cells": [{"id": 10, "text": "13", "bbox": {"l": 292.63098, "t": 734.133358, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}}]}, "text": "13"}]}}, {"page_no": 13, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Figure 11:", "bbox": {"l": 50.112, "t": 356.77036, "r": 93.050797, "b": 365.67691, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Simple table with different style and 
empty", "bbox": {"l": 103.73071, "t": 356.77036, "r": 286.36508, "b": 365.67691, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "cells.", "bbox": {"l": 50.112, "t": 368.72534, "r": 70.864098, "b": 377.6319, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "Figure 12: Simple table predictions and post processing.", "bbox": {"l": 54.618998999999995, "t": 671.81836, "r": 281.8559, "b": 680.72492, "coord_origin": "TOPLEFT"}}, {"id": 4, "text": "Figure 13: Table predictions example on colorful table.", "bbox": {"l": 315.79001, "t": 371.68436, "r": 538.18524, "b": 380.59091, "coord_origin": "TOPLEFT"}}, {"id": 5, "text": "Figure 14: Example with multi-line text.", "bbox": {"l": 344.98499, "t": 683.54636, "r": 508.98935000000006, "b": 692.452927, "coord_origin": "TOPLEFT"}}, {"id": 6, "text": "14", "bbox": {"l": 292.63098, "t": 734.133362, "r": 302.59357, "b": 743.039925, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "caption", "bbox": {"l": 49.527069091796875, "t": 356.1941223144531, "r": 286.36508, "b": 377.6319, "coord_origin": "TOPLEFT"}, "confidence": 0.951069176197052, "cells": [{"id": 0, "text": "Figure 11:", "bbox": {"l": 50.112, "t": 356.77036, "r": 93.050797, "b": 365.67691, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Simple table with different style and empty", "bbox": {"l": 103.73071, "t": 356.77036, "r": 286.36508, "b": 365.67691, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "cells.", "bbox": {"l": 50.112, "t": 368.72534, "r": 70.864098, "b": 377.6319, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "caption", "bbox": {"l": 53.980308532714844, "t": 671.2754516601562, "r": 281.8559, "b": 681.1620483398438, "coord_origin": "TOPLEFT"}, "confidence": 0.926384687423706, "cells": [{"id": 3, "text": "Figure 12: Simple table predictions and post processing.", "bbox": {"l": 54.618998999999995, "t": 671.81836, "r": 281.8559, "b": 680.72492, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "caption", "bbox": {"l": 
315.4080505371094, "t": 371.25640869140625, "r": 538.18524, "b": 380.9472961425781, "coord_origin": "TOPLEFT"}, "confidence": 0.9178511500358582, "cells": [{"id": 4, "text": "Figure 13: Table predictions example on colorful table.", "bbox": {"l": 315.79001, "t": 371.68436, "r": 538.18524, "b": 380.59091, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "caption", "bbox": {"l": 344.41790771484375, "t": 682.5609741210938, "r": 508.98935000000006, "b": 692.452927, "coord_origin": "TOPLEFT"}, "confidence": 0.9191021919250488, "cells": [{"id": 5, "text": "Figure 14: Example with multi-line text.", "bbox": {"l": 344.98499, "t": 683.54636, "r": 508.98935000000006, "b": 692.452927, "coord_origin": "TOPLEFT"}}]}, {"id": 4, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.5170288085938, "r": 302.59357, "b": 743.039925, "coord_origin": "TOPLEFT"}, "confidence": 0.8877151608467102, "cells": [{"id": 6, "text": "14", "bbox": {"l": 292.63098, "t": 734.133362, "r": 302.59357, "b": 743.039925, "coord_origin": "TOPLEFT"}}]}, {"id": 5, "label": "picture", "bbox": {"l": 50.40477752685547, "t": 611.0038452148438, "r": 177.0564422607422, "b": 656.1609497070312, "coord_origin": "TOPLEFT"}, "confidence": 0.7871121168136597, "cells": []}, {"id": 6, "label": "picture", "bbox": {"l": 318.8900146484375, "t": 96.33707427978516, "r": 534.3455200195312, "b": 149.86273193359375, "coord_origin": "TOPLEFT"}, "confidence": 0.7031845450401306, "cells": []}, {"id": 7, "label": "picture", "bbox": {"l": 319.0057678222656, "t": 226.10633850097656, "r": 534.408935546875, "b": 279.8576965332031, "coord_origin": "TOPLEFT"}, "confidence": 0.6806504130363464, "cells": []}, {"id": 8, "label": "picture", "bbox": {"l": 328.1381530761719, "t": 288.6817932128906, "r": 523.8916015625, "b": 358.2724304199219, "coord_origin": "TOPLEFT"}, "confidence": 0.6624093651771545, "cells": []}, {"id": 9, "label": "picture", "bbox": {"l": 52.22420883178711, "t": 214.05335998535156, "r": 167.55191040039062, "b": 
254.2655487060547, "coord_origin": "TOPLEFT"}, "confidence": 0.6446252465248108, "cells": []}, {"id": 10, "label": "picture", "bbox": {"l": 52.18278884887695, "t": 110.0562744140625, "r": 167.67349243164062, "b": 149.3771514892578, "coord_origin": "TOPLEFT"}, "confidence": 0.6424618363380432, "cells": []}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "caption", "id": 0, "page_no": 13, "cluster": {"id": 0, "label": "caption", "bbox": {"l": 49.527069091796875, "t": 356.1941223144531, "r": 286.36508, "b": 377.6319, "coord_origin": "TOPLEFT"}, "confidence": 0.951069176197052, "cells": [{"id": 0, "text": "Figure 11:", "bbox": {"l": 50.112, "t": 356.77036, "r": 93.050797, "b": 365.67691, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Simple table with different style and empty", "bbox": {"l": 103.73071, "t": 356.77036, "r": 286.36508, "b": 365.67691, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "cells.", "bbox": {"l": 50.112, "t": 368.72534, "r": 70.864098, "b": 377.6319, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 11: Simple table with different style and empty cells."}, {"label": "caption", "id": 1, "page_no": 13, "cluster": {"id": 1, "label": "caption", "bbox": {"l": 53.980308532714844, "t": 671.2754516601562, "r": 281.8559, "b": 681.1620483398438, "coord_origin": "TOPLEFT"}, "confidence": 0.926384687423706, "cells": [{"id": 3, "text": "Figure 12: Simple table predictions and post processing.", "bbox": {"l": 54.618998999999995, "t": 671.81836, "r": 281.8559, "b": 680.72492, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 12: Simple table predictions and post processing."}, {"label": "caption", "id": 2, "page_no": 13, "cluster": {"id": 2, "label": "caption", "bbox": {"l": 315.4080505371094, "t": 371.25640869140625, "r": 538.18524, "b": 380.9472961425781, "coord_origin": "TOPLEFT"}, "confidence": 0.9178511500358582, "cells": [{"id": 4, "text": "Figure 13: Table predictions 
example on colorful table.", "bbox": {"l": 315.79001, "t": 371.68436, "r": 538.18524, "b": 380.59091, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 13: Table predictions example on colorful table."}, {"label": "caption", "id": 3, "page_no": 13, "cluster": {"id": 3, "label": "caption", "bbox": {"l": 344.41790771484375, "t": 682.5609741210938, "r": 508.98935000000006, "b": 692.452927, "coord_origin": "TOPLEFT"}, "confidence": 0.9191021919250488, "cells": [{"id": 5, "text": "Figure 14: Example with multi-line text.", "bbox": {"l": 344.98499, "t": 683.54636, "r": 508.98935000000006, "b": 692.452927, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 14: Example with multi-line text."}, {"label": "page_footer", "id": 4, "page_no": 13, "cluster": {"id": 4, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.5170288085938, "r": 302.59357, "b": 743.039925, "coord_origin": "TOPLEFT"}, "confidence": 0.8877151608467102, "cells": [{"id": 6, "text": "14", "bbox": {"l": 292.63098, "t": 734.133362, "r": 302.59357, "b": 743.039925, "coord_origin": "TOPLEFT"}}]}, "text": "14"}, {"label": "picture", "id": 5, "page_no": 13, "cluster": {"id": 5, "label": "picture", "bbox": {"l": 50.40477752685547, "t": 611.0038452148438, "r": 177.0564422607422, "b": 656.1609497070312, "coord_origin": "TOPLEFT"}, "confidence": 0.7871121168136597, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 6, "page_no": 13, "cluster": {"id": 6, "label": "picture", "bbox": {"l": 318.8900146484375, "t": 96.33707427978516, "r": 534.3455200195312, "b": 149.86273193359375, "coord_origin": "TOPLEFT"}, "confidence": 0.7031845450401306, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 7, "page_no": 13, "cluster": {"id": 7, "label": "picture", "bbox": {"l": 319.0057678222656, "t": 226.10633850097656, "r": 534.408935546875, "b": 279.8576965332031, 
"coord_origin": "TOPLEFT"}, "confidence": 0.6806504130363464, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 8, "page_no": 13, "cluster": {"id": 8, "label": "picture", "bbox": {"l": 328.1381530761719, "t": 288.6817932128906, "r": 523.8916015625, "b": 358.2724304199219, "coord_origin": "TOPLEFT"}, "confidence": 0.6624093651771545, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 9, "page_no": 13, "cluster": {"id": 9, "label": "picture", "bbox": {"l": 52.22420883178711, "t": 214.05335998535156, "r": 167.55191040039062, "b": 254.2655487060547, "coord_origin": "TOPLEFT"}, "confidence": 0.6446252465248108, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 10, "page_no": 13, "cluster": {"id": 10, "label": "picture", "bbox": {"l": 52.18278884887695, "t": 110.0562744140625, "r": 167.67349243164062, "b": 149.3771514892578, "coord_origin": "TOPLEFT"}, "confidence": 0.6424618363380432, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "body": [{"label": "caption", "id": 0, "page_no": 13, "cluster": {"id": 0, "label": "caption", "bbox": {"l": 49.527069091796875, "t": 356.1941223144531, "r": 286.36508, "b": 377.6319, "coord_origin": "TOPLEFT"}, "confidence": 0.951069176197052, "cells": [{"id": 0, "text": "Figure 11:", "bbox": {"l": 50.112, "t": 356.77036, "r": 93.050797, "b": 365.67691, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Simple table with different style and empty", "bbox": {"l": 103.73071, "t": 356.77036, "r": 286.36508, "b": 365.67691, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "cells.", "bbox": {"l": 50.112, "t": 368.72534, "r": 70.864098, "b": 377.6319, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 11: Simple table with different style and empty 
cells."}, {"label": "caption", "id": 1, "page_no": 13, "cluster": {"id": 1, "label": "caption", "bbox": {"l": 53.980308532714844, "t": 671.2754516601562, "r": 281.8559, "b": 681.1620483398438, "coord_origin": "TOPLEFT"}, "confidence": 0.926384687423706, "cells": [{"id": 3, "text": "Figure 12: Simple table predictions and post processing.", "bbox": {"l": 54.618998999999995, "t": 671.81836, "r": 281.8559, "b": 680.72492, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 12: Simple table predictions and post processing."}, {"label": "caption", "id": 2, "page_no": 13, "cluster": {"id": 2, "label": "caption", "bbox": {"l": 315.4080505371094, "t": 371.25640869140625, "r": 538.18524, "b": 380.9472961425781, "coord_origin": "TOPLEFT"}, "confidence": 0.9178511500358582, "cells": [{"id": 4, "text": "Figure 13: Table predictions example on colorful table.", "bbox": {"l": 315.79001, "t": 371.68436, "r": 538.18524, "b": 380.59091, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 13: Table predictions example on colorful table."}, {"label": "caption", "id": 3, "page_no": 13, "cluster": {"id": 3, "label": "caption", "bbox": {"l": 344.41790771484375, "t": 682.5609741210938, "r": 508.98935000000006, "b": 692.452927, "coord_origin": "TOPLEFT"}, "confidence": 0.9191021919250488, "cells": [{"id": 5, "text": "Figure 14: Example with multi-line text.", "bbox": {"l": 344.98499, "t": 683.54636, "r": 508.98935000000006, "b": 692.452927, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 14: Example with multi-line text."}, {"label": "picture", "id": 5, "page_no": 13, "cluster": {"id": 5, "label": "picture", "bbox": {"l": 50.40477752685547, "t": 611.0038452148438, "r": 177.0564422607422, "b": 656.1609497070312, "coord_origin": "TOPLEFT"}, "confidence": 0.7871121168136597, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 6, "page_no": 13, "cluster": {"id": 6, "label": "picture", "bbox": {"l": 318.8900146484375, "t": 
96.33707427978516, "r": 534.3455200195312, "b": 149.86273193359375, "coord_origin": "TOPLEFT"}, "confidence": 0.7031845450401306, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 7, "page_no": 13, "cluster": {"id": 7, "label": "picture", "bbox": {"l": 319.0057678222656, "t": 226.10633850097656, "r": 534.408935546875, "b": 279.8576965332031, "coord_origin": "TOPLEFT"}, "confidence": 0.6806504130363464, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 8, "page_no": 13, "cluster": {"id": 8, "label": "picture", "bbox": {"l": 328.1381530761719, "t": 288.6817932128906, "r": 523.8916015625, "b": 358.2724304199219, "coord_origin": "TOPLEFT"}, "confidence": 0.6624093651771545, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 9, "page_no": 13, "cluster": {"id": 9, "label": "picture", "bbox": {"l": 52.22420883178711, "t": 214.05335998535156, "r": 167.55191040039062, "b": 254.2655487060547, "coord_origin": "TOPLEFT"}, "confidence": 0.6446252465248108, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 10, "page_no": 13, "cluster": {"id": 10, "label": "picture", "bbox": {"l": 52.18278884887695, "t": 110.0562744140625, "r": 167.67349243164062, "b": 149.3771514892578, "coord_origin": "TOPLEFT"}, "confidence": 0.6424618363380432, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "headers": [{"label": "page_footer", "id": 4, "page_no": 13, "cluster": {"id": 4, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.5170288085938, "r": 302.59357, "b": 743.039925, "coord_origin": "TOPLEFT"}, "confidence": 0.8877151608467102, "cells": [{"id": 6, "text": "14", "bbox": {"l": 292.63098, "t": 
734.133362, "r": 302.59357, "b": 743.039925, "coord_origin": "TOPLEFT"}}]}, "text": "14"}]}}, {"page_no": 14, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Figure 15: Example with triangular table.", "bbox": {"l": 84.233002, "t": 644.3513800000001, "r": 252.24225, "b": 653.25793, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "Figure 16: Example of how post-processing helps to restore", "bbox": {"l": 308.86197, "t": 652.93535, "r": 545.11511, "b": 661.8419, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "mis-aligned bounding boxes prediction artifact.", "bbox": {"l": 308.86197, "t": 664.89035, "r": 497.60349, "b": 673.79691, "coord_origin": "TOPLEFT"}}, {"id": 3, "text": "15", "bbox": {"l": 292.63098, "t": 734.133343, "r": 302.59357, "b": 743.039906, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "caption", "bbox": {"l": 83.66071319580078, "t": 643.9850463867188, "r": 252.24225, "b": 653.6401977539062, "coord_origin": "TOPLEFT"}, "confidence": 0.9257224202156067, "cells": [{"id": 0, "text": "Figure 15: Example with triangular table.", "bbox": {"l": 84.233002, "t": 644.3513800000001, "r": 252.24225, "b": 653.25793, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "caption", "bbox": {"l": 308.33172607421875, "t": 652.3787841796875, "r": 545.148681640625, "b": 673.79691, "coord_origin": "TOPLEFT"}, "confidence": 0.9134756922721863, "cells": [{"id": 1, "text": "Figure 16: Example of how post-processing helps to restore", "bbox": {"l": 308.86197, "t": 652.93535, "r": 545.11511, "b": 661.8419, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "mis-aligned bounding boxes prediction artifact.", "bbox": {"l": 308.86197, "t": 664.89035, "r": 497.60349, "b": 673.79691, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.3248901367188, "r": 302.59357, "b": 743.039906, "coord_origin": "TOPLEFT"}, "confidence": 0.9037021398544312, "cells": [{"id": 3, "text": "15", 
"bbox": {"l": 292.63098, "t": 734.133343, "r": 302.59357, "b": 743.039906, "coord_origin": "TOPLEFT"}}]}, {"id": 3, "label": "picture", "bbox": {"l": 55.423927307128906, "t": 384.5550537109375, "r": 280.2311096191406, "b": 497.563720703125, "coord_origin": "TOPLEFT"}, "confidence": 0.8410321474075317, "cells": []}, {"id": 4, "label": "picture", "bbox": {"l": 55.1163330078125, "t": 136.25503540039062, "r": 279.370849609375, "b": 249.33453369140625, "coord_origin": "TOPLEFT"}, "confidence": 0.8099270462989807, "cells": []}, {"id": 5, "label": "picture", "bbox": {"l": 50.648193359375, "t": 505.98046875, "r": 319.9103088378906, "b": 631.263671875, "coord_origin": "TOPLEFT"}, "confidence": 0.8057584762573242, "cells": []}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "caption", "id": 0, "page_no": 14, "cluster": {"id": 0, "label": "caption", "bbox": {"l": 83.66071319580078, "t": 643.9850463867188, "r": 252.24225, "b": 653.6401977539062, "coord_origin": "TOPLEFT"}, "confidence": 0.9257224202156067, "cells": [{"id": 0, "text": "Figure 15: Example with triangular table.", "bbox": {"l": 84.233002, "t": 644.3513800000001, "r": 252.24225, "b": 653.25793, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 15: Example with triangular table."}, {"label": "caption", "id": 1, "page_no": 14, "cluster": {"id": 1, "label": "caption", "bbox": {"l": 308.33172607421875, "t": 652.3787841796875, "r": 545.148681640625, "b": 673.79691, "coord_origin": "TOPLEFT"}, "confidence": 0.9134756922721863, "cells": [{"id": 1, "text": "Figure 16: Example of how post-processing helps to restore", "bbox": {"l": 308.86197, "t": 652.93535, "r": 545.11511, "b": 661.8419, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "mis-aligned bounding boxes prediction artifact.", "bbox": {"l": 308.86197, "t": 664.89035, "r": 497.60349, "b": 673.79691, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 16: Example of how 
post-processing helps to restore mis-aligned bounding boxes prediction artifact."}, {"label": "page_footer", "id": 2, "page_no": 14, "cluster": {"id": 2, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.3248901367188, "r": 302.59357, "b": 743.039906, "coord_origin": "TOPLEFT"}, "confidence": 0.9037021398544312, "cells": [{"id": 3, "text": "15", "bbox": {"l": 292.63098, "t": 734.133343, "r": 302.59357, "b": 743.039906, "coord_origin": "TOPLEFT"}}]}, "text": "15"}, {"label": "picture", "id": 3, "page_no": 14, "cluster": {"id": 3, "label": "picture", "bbox": {"l": 55.423927307128906, "t": 384.5550537109375, "r": 280.2311096191406, "b": 497.563720703125, "coord_origin": "TOPLEFT"}, "confidence": 0.8410321474075317, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 4, "page_no": 14, "cluster": {"id": 4, "label": "picture", "bbox": {"l": 55.1163330078125, "t": 136.25503540039062, "r": 279.370849609375, "b": 249.33453369140625, "coord_origin": "TOPLEFT"}, "confidence": 0.8099270462989807, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 5, "page_no": 14, "cluster": {"id": 5, "label": "picture", "bbox": {"l": 50.648193359375, "t": 505.98046875, "r": 319.9103088378906, "b": 631.263671875, "coord_origin": "TOPLEFT"}, "confidence": 0.8057584762573242, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "body": [{"label": "caption", "id": 0, "page_no": 14, "cluster": {"id": 0, "label": "caption", "bbox": {"l": 83.66071319580078, "t": 643.9850463867188, "r": 252.24225, "b": 653.6401977539062, "coord_origin": "TOPLEFT"}, "confidence": 0.9257224202156067, "cells": [{"id": 0, "text": "Figure 15: Example with triangular table.", "bbox": {"l": 84.233002, "t": 644.3513800000001, "r": 252.24225, "b": 653.25793, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 
15: Example with triangular table."}, {"label": "caption", "id": 1, "page_no": 14, "cluster": {"id": 1, "label": "caption", "bbox": {"l": 308.33172607421875, "t": 652.3787841796875, "r": 545.148681640625, "b": 673.79691, "coord_origin": "TOPLEFT"}, "confidence": 0.9134756922721863, "cells": [{"id": 1, "text": "Figure 16: Example of how post-processing helps to restore", "bbox": {"l": 308.86197, "t": 652.93535, "r": 545.11511, "b": 661.8419, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "mis-aligned bounding boxes prediction artifact.", "bbox": {"l": 308.86197, "t": 664.89035, "r": 497.60349, "b": 673.79691, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact."}, {"label": "picture", "id": 3, "page_no": 14, "cluster": {"id": 3, "label": "picture", "bbox": {"l": 55.423927307128906, "t": 384.5550537109375, "r": 280.2311096191406, "b": 497.563720703125, "coord_origin": "TOPLEFT"}, "confidence": 0.8410321474075317, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 4, "page_no": 14, "cluster": {"id": 4, "label": "picture", "bbox": {"l": 55.1163330078125, "t": 136.25503540039062, "r": 279.370849609375, "b": 249.33453369140625, "coord_origin": "TOPLEFT"}, "confidence": 0.8099270462989807, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}, {"label": "picture", "id": 5, "page_no": 14, "cluster": {"id": 5, "label": "picture", "bbox": {"l": 50.648193359375, "t": 505.98046875, "r": 319.9103088378906, "b": 631.263671875, "coord_origin": "TOPLEFT"}, "confidence": 0.8057584762573242, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "headers": [{"label": "page_footer", "id": 2, "page_no": 14, "cluster": {"id": 2, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.3248901367188, "r": 
302.59357, "b": 743.039906, "coord_origin": "TOPLEFT"}, "confidence": 0.9037021398544312, "cells": [{"id": 3, "text": "15", "bbox": {"l": 292.63098, "t": 734.133343, "r": 302.59357, "b": 743.039906, "coord_origin": "TOPLEFT"}}]}, "text": "15"}]}}, {"page_no": 15, "size": {"width": 612.0, "height": 792.0}, "cells": [{"id": 0, "text": "Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post process-", "bbox": {"l": 50.112, "t": 508.33737, "r": 545.11383, "b": 517.24393, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "ing and prediction of structure.", "bbox": {"l": 50.112, "t": 520.2923599999999, "r": 173.23975, "b": 529.1989100000001, "coord_origin": "TOPLEFT"}}, {"id": 2, "text": "16", "bbox": {"l": 292.63098, "t": 734.133358, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}}], "predictions": {"layout": {"clusters": [{"id": 0, "label": "caption", "bbox": {"l": 49.31798553466797, "t": 507.8238830566406, "r": 545.11383, "b": 529.7783813476562, "coord_origin": "TOPLEFT"}, "confidence": 0.9624592661857605, "cells": [{"id": 0, "text": "Figure 17: Example of long table. 
End-to-end example from initial PDF cells to prediction of bounding boxes, post process-", "bbox": {"l": 50.112, "t": 508.33737, "r": 545.11383, "b": 517.24393, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "ing and prediction of structure.", "bbox": {"l": 50.112, "t": 520.2923599999999, "r": 173.23975, "b": 529.1989100000001, "coord_origin": "TOPLEFT"}}]}, {"id": 1, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.3433837890625, "r": 302.74853515625, "b": 743.039921, "coord_origin": "TOPLEFT"}, "confidence": 0.9134098291397095, "cells": [{"id": 2, "text": "16", "bbox": {"l": 292.63098, "t": 734.133358, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}}]}, {"id": 2, "label": "picture", "bbox": {"l": 66.79948425292969, "t": 253.61627197265625, "r": 528.5565795898438, "b": 498.1384582519531, "coord_origin": "TOPLEFT"}, "confidence": 0.6913456916809082, "cells": []}]}, "tablestructure": {"table_map": {}}, "figures_classification": null, "equations_prediction": null}, "assembled": {"elements": [{"label": "caption", "id": 0, "page_no": 15, "cluster": {"id": 0, "label": "caption", "bbox": {"l": 49.31798553466797, "t": 507.8238830566406, "r": 545.11383, "b": 529.7783813476562, "coord_origin": "TOPLEFT"}, "confidence": 0.9624592661857605, "cells": [{"id": 0, "text": "Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post process-", "bbox": {"l": 50.112, "t": 508.33737, "r": 545.11383, "b": 517.24393, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "ing and prediction of structure.", "bbox": {"l": 50.112, "t": 520.2923599999999, "r": 173.23975, "b": 529.1989100000001, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 17: Example of long table. 
End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure."}, {"label": "page_footer", "id": 1, "page_no": 15, "cluster": {"id": 1, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.3433837890625, "r": 302.74853515625, "b": 743.039921, "coord_origin": "TOPLEFT"}, "confidence": 0.9134098291397095, "cells": [{"id": 2, "text": "16", "bbox": {"l": 292.63098, "t": 734.133358, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}}]}, "text": "16"}, {"label": "picture", "id": 2, "page_no": 15, "cluster": {"id": 2, "label": "picture", "bbox": {"l": 66.79948425292969, "t": 253.61627197265625, "r": 528.5565795898438, "b": 498.1384582519531, "coord_origin": "TOPLEFT"}, "confidence": 0.6913456916809082, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "body": [{"label": "caption", "id": 0, "page_no": 15, "cluster": {"id": 0, "label": "caption", "bbox": {"l": 49.31798553466797, "t": 507.8238830566406, "r": 545.11383, "b": 529.7783813476562, "coord_origin": "TOPLEFT"}, "confidence": 0.9624592661857605, "cells": [{"id": 0, "text": "Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post process-", "bbox": {"l": 50.112, "t": 508.33737, "r": 545.11383, "b": 517.24393, "coord_origin": "TOPLEFT"}}, {"id": 1, "text": "ing and prediction of structure.", "bbox": {"l": 50.112, "t": 520.2923599999999, "r": 173.23975, "b": 529.1989100000001, "coord_origin": "TOPLEFT"}}]}, "text": "Figure 17: Example of long table. 
End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure."}, {"label": "picture", "id": 2, "page_no": 15, "cluster": {"id": 2, "label": "picture", "bbox": {"l": 66.79948425292969, "t": 253.61627197265625, "r": 528.5565795898438, "b": 498.1384582519531, "coord_origin": "TOPLEFT"}, "confidence": 0.6913456916809082, "cells": []}, "text": "", "data": null, "provenance": null, "predicted_class": null, "confidence": null}], "headers": [{"label": "page_footer", "id": 1, "page_no": 15, "cluster": {"id": 1, "label": "page_footer", "bbox": {"l": 292.63098, "t": 733.3433837890625, "r": 302.74853515625, "b": 743.039921, "coord_origin": "TOPLEFT"}, "confidence": 0.9134098291397095, "cells": [{"id": 2, "text": "16", "bbox": {"l": 292.63098, "t": 734.133358, "r": 302.59357, "b": 743.039921, "coord_origin": "TOPLEFT"}}]}, "text": "16"}]}}]