feat: add figure in markdown (#98)

* feat: add figures in markdown

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update to new docling-core and update test results with figures

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update with improved docling-core

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Michele Dolfi 2024-09-24 17:28:23 +02:00 committed by GitHub
parent 001d214a13
commit 6a03c208ec
9 changed files with 284 additions and 58 deletions


@ -324,8 +324,10 @@ class ConvertedDocument(BaseModel):
"paragraph",
"caption",
"table",
"figure",
],
strict_text: bool = False,
image_placeholder: str = "<!-- image -->",
):
return self.output.export_to_markdown(
delim=delim,
@ -333,6 +335,7 @@ class ConvertedDocument(BaseModel):
main_text_stop=main_text_stop,
main_text_labels=main_text_labels,
strict_text=strict_text,
image_placeholder=image_placeholder,
)
def render_as_text(
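
The hunk above extends the Markdown export with a "figure" label and a configurable image placeholder. A minimal, hypothetical usage sketch follows: how `doc` (a `ConvertedDocument`) is obtained is elided, the exporting method's name is truncated in the hunk and assumed here to be `render_as_markdown`, and only the keyword arguments and their defaults come from the diff itself.

```python
# Sketch only: `doc` is assumed to be a docling ConvertedDocument.
md = doc.render_as_markdown(
    # "figure" is newly included among the exported main-text labels
    main_text_labels=["paragraph", "caption", "table", "figure"],
    # emitted once per figure; "<!-- image -->" is the default from the diff
    image_placeholder="<!-- image -->",
)
print(md)
```

With the default placeholder, every figure in the converted document shows up in the Markdown output as a literal `<!-- image -->` comment, which is exactly what the updated test results below exercise.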

poetry.lock generated

@ -957,13 +957,13 @@ files = [
[[package]]
name = "docling-core"
version = "1.5.0"
version = "1.6.2"
description = "A python library to define and validate data types in Docling."
optional = false
python-versions = "<4.0,>=3.9"
files = [
{file = "docling_core-1.5.0-py3-none-any.whl", hash = "sha256:1a8bb4940ecbf98c6381298f3ad121d95aa8895883150a5dd113a348a0987d09"},
{file = "docling_core-1.5.0.tar.gz", hash = "sha256:bc8ddbae16e2b740225f37758125eb95b9fcd4202542c4547a9683a7ad423e10"},
{file = "docling_core-1.6.2-py3-none-any.whl", hash = "sha256:1473ab13910d76552015c10fe351b90079a00c225f76ada3cd4fc7442183ffd0"},
{file = "docling_core-1.6.2.tar.gz", hash = "sha256:63f2b8a683dec56568ee1cd7d25cea419c0291211a88a11f74079ff2d62ccd5e"},
]
[package.dependencies]
@ -972,7 +972,6 @@ jsonref = ">=1.1.0,<2.0.0"
jsonschema = ">=4.16.0,<5.0.0"
pandas = ">=2.2.2,<3.0.0"
pydantic = ">=2.6.0,<3.0.0"
pyproject-toml = ">=0.0.10,<0.0.11"
tabulate = ">=0.9.0,<0.10.0"
[[package]]
@ -4481,23 +4480,6 @@ files = [
flake8 = "6.1.0"
tomli = {version = "*", markers = "python_version < \"3.11\""}
[[package]]
name = "pyproject-toml"
version = "0.0.10"
description = "Project intend to implement PEP 517, 518, 621, 631 and so on."
optional = false
python-versions = "*"
files = [
{file = "pyproject-toml-0.0.10.tar.gz", hash = "sha256:f0ce0e9934ecb00c0e529b4a1c380edd3034c4be65516769c5f080bdb23dfcb3"},
{file = "pyproject_toml-0.0.10-py3-none-any.whl", hash = "sha256:257a7070617e1a0bcfd8f790817b30bd9193876023a9b9e7a6b4fc976acf4c3e"},
]
[package.dependencies]
jsonschema = "*"
setuptools = ">=42"
toml = "*"
wheel = "*"
[[package]]
name = "pyreadline3"
version = "3.5.3"
@ -6256,17 +6238,6 @@ dev = ["tokenizers[testing]"]
docs = ["setuptools-rust", "sphinx", "sphinx-rtd-theme"]
testing = ["black (==22.3)", "datasets", "numpy", "pytest", "requests", "ruff"]
[[package]]
name = "toml"
version = "0.10.2"
description = "Python Library for Tom's Obvious, Minimal Language"
optional = false
python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*"
files = [
{file = "toml-0.10.2-py2.py3-none-any.whl", hash = "sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b"},
{file = "toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f"},
]
[[package]]
name = "tomli"
version = "2.0.1"
@ -7257,4 +7228,4 @@ examples = ["langchain-huggingface", "langchain-milvus", "langchain-text-splitte
[metadata]
lock-version = "2.0"
python-versions = "^3.10"
content-hash = "7ee1e9e99c23e075fb1f8722e4fc9e6c0b02a4282f4e67ebbcd75598720536b7"
content-hash = "d6ede0493d8d2d0e250ba391d9ad32ced98541fbd4795b2b955d6f640736b3bc"


@ -23,7 +23,7 @@ packages = [{include = "docling"}]
[tool.poetry.dependencies]
python = "^3.10"
pydantic = "^2.0.0"
docling-core = "^1.5.0"
docling-core = "^1.6.2"
docling-ibm-models = "^1.2.0"
deepsearch-glm = "^0.21.1"
filetype = "^1.2.0"


@ -14,16 +14,19 @@ The occurrence of tables in documents is ubiquitous. They often summarise quanti
Tables organize valuable content in a concise and compact representation. This content is extremely valuable for systems such as search engines and Knowledge Graphs, since it enhances their predictive capabilities. Unfortunately, tables come in a large variety of shapes and sizes. Furthermore, they can have complex column/row-header configurations, multiline rows, different varieties of separation lines, missing entries, etc. As such, the correct identification of the table structure from an image is a non-trivial task. In this paper, we present a new table-structure identification model. The latter improves the latest end-to-end deep learning model (i.e. the encoder-dual-decoder from PubTabNet) in two significant ways. First, we introduce a new object detection decoder for table cells. In this way, we can obtain the content of the table cells from programmatic PDFs directly from the PDF source and avoid training custom OCR decoders. This architectural change leads to more accurate table-content extraction and allows us to tackle non-English tables. Second, we replace the LSTM decoders with transformer-based decoders. This upgrade significantly improves the previous state-of-the-art tree-editing-distance score (TEDS) from 91% to 98.5% on simple tables and from 88.7% to 95% on complex tables.
| | 3 | 1 |
|----|-----|-----|
| 2 | | |
b. Red-annotation of bounding boxes, Blue-predictions by TableFormer
<!-- image -->
c. Structure predicted by TableFormer:
Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238_004_02'.
| 0 | 1 | 1 | 2 1 | 2 1 | |
|-----|-----|-----|-------|-------|----|
| 3 | 4 | 5 3 | 6 | 7 | |
@ -76,6 +79,7 @@ Hybrid Deep Learning-Rule-Based approach : A popular current model for table-str
We rely on large-scale datasets such as PubTabNet [37], FinTabNet [36], and TableBank [17] datasets to train and evaluate our models. These datasets span over various appearance styles and content. We also introduce our own synthetically generated SynthTabNet dataset to fix an im-
Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets
<!-- image -->
balance in the previous datasets.
@ -94,7 +98,6 @@ Motivated by those observations we aimed at generating a synthetic table dataset
In this regard, we have prepared four synthetic datasets, each one containing 150k examples. The corpora to generate the table text consists of the most frequent terms appearing in PubTabNet and FinTabNet together with randomly generated text. The first two synthetic datasets have been fine-tuned to mimic the appearance of the original datasets but encompass more complicated table structures. The third
Table 1: Both "Combined-Tabnet" and "CombinedTabnet" are variations of the following: (*) The CombinedTabnet dataset is the processed combination of PubTabNet and Fintabnet. (**) The combined dataset is the processed combination of PubTabNet, Fintabnet and TableBank.
| | Tags | Bbox | Size | Format |
|--------------------|--------|--------|--------|----------|
| PubTabNet | 3 | 3 | 509k | PNG |
@ -119,8 +122,10 @@ We now describe in detail the proposed method, which is composed of three main c
CNN Backbone Network. A ResNet-18 CNN is the backbone that receives the table image and encodes it as a vector of predefined length. The network has been modified by removing the linear and pooling layer, as we are not per-
Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grab the content from the PDF and insert it into the structure.
<!-- image -->
Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes.
<!-- image -->
forming classification, and adding an adaptive pooling layer of size 28*28. ResNet by default downsamples the image resolution by 32 and then the encoded image is provided to both the Structure Decoder , and Cell BBox Decoder .
@ -175,7 +180,6 @@ where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDi
Structure. As shown in Tab. 2, TableFormer outperforms all SOTA methods across different datasets by a large margin for predicting the table structure from an image. All the more, our model outperforms pre-trained methods. During the evaluation we do not apply any table filtering. We also provide our baseline results on the SynthTabNet dataset. It has been observed that large tables (e.g. tables that occupy half of the page or more) yield poor predictions. We attribute this issue to the image resizing during the preprocessing step, that produces downsampled images with indistinguishable features. This problem can be addressed by treating such big tables with a separate model which accepts a large input image size.
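For reference, the tree-editing-distance score used throughout this section follows the standard TEDS definition from the PubTabNet work; it is restated here for convenience and is not taken verbatim from this diff:

$$\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}$$

where $T_a$ and $T_b$ are the tables as HTML trees and $|T|$ counts the nodes of $T$.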
Table 2: Structure results on PubTabNet (PTN), FinTabNet (FTN), TableBank (TB) and SynthTabNet (STN).
| Model | Dataset | Simple | TEDS Complex | All |
|-------------|-----------|----------|----------------|-------|
| EDD | PTN | 91.1 | 88.7 | 89.9 |
@ -196,7 +200,6 @@ Cell Detection. Like any object detector, our Cell BBox Detector provides boundi
our Cell BBox Decoder accuracy for cells with a class label of 'content' only, using the PASCAL VOC mAP metric for pre-processing and post-processing. Note that we do not have post-processing results for SynthTabNet as only images are provided. To compare the performance of our proposed approach, we have integrated TableFormer's Cell BBox Decoder into the EDD architecture. As mentioned previously, the Structure Decoder provides the Cell BBox Decoder with the features needed to predict the bounding boxes. Therefore, the accuracy of the Structure Decoder directly influences the accuracy of the Cell BBox Decoder. If the Structure Decoder predicts an extra column, this will result in an extra column of predicted bounding boxes.
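As a reminder (standard PASCAL VOC convention, not specific to this paper), a predicted box $A$ counts as a match for a ground-truth box $B$ when their intersection-over-union clears a threshold, and mAP averages the resulting per-class average precision:

$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$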
Table 3: Cell Bounding Box detection results on PubTabNet, and FinTabNet. PP: Post-processing.
| Model | Dataset | mAP | mAP (PP) |
|-------------|-------------|-------|------------|
| EDD+BBox | PubTabNet | 79.2 | 82.7 |
@ -206,7 +209,6 @@ Table 3: Cell Bounding Box detection results on PubTabNet, and FinTabNet. PP: Po
Cell Content. In this section, we evaluate the entire pipeline of recovering a table with content. Here we put our approach to the test by capitalizing on extracting content from the PDF cells rather than decoding from images. Tab. 4 shows the TEDS score of the HTML code representing the structure of the table, along with the content inserted in the data cells, compared against the ground truth. Our method achieved a 5.3% increase over the state-of-the-art and commercial solutions. We believe our scores would be higher if the HTML ground truth matched the extracted PDF cell content. Unfortunately, there are small discrepancies such as spacings around words or special characters with various unicode representations.
Table 4: Results of structure with content retrieved using cell detection on PubTabNet. In all cases the input is PDF documents with cropped tables.
| Model | Simple | TEDS Complex | All |
|-------------|----------|----------------|-------|
| Tabula | 78 | 57.8 | 67.9 |
@ -222,8 +224,15 @@ Japanese language (previously unseen by TableFormer):
Example table from FinTabNet:
<!-- image -->
<!-- image -->
b. Structure predicted by TableFormer, with superimposed matched PDF cell text:
| | | 論文ファイル | 論文ファイル | 参考文献 | 参考文献 |
|----------------------------------------------------|-------------|----------------|----------------|------------|------------|
| 出典 | ファイル 数 | 英語 | 日本語 | 英語 | 日本語 |
@ -237,7 +246,6 @@ b. Structure predicted by TableFormer, with superimposed matched PDF cell text:
| | 945 | 294 | 651 | 1122 | 955 |
Text is aligned to match original for ease of viewing
| | Shares (in millions) | Shares (in millions) | Weighted Average Grant Date Fair Value | Weighted Average Grant Date Fair Value |
|--------------------------|------------------------|------------------------|------------------------------------------|------------------------------------------|
| | RS U s | PSUs | RSUs | PSUs |
@ -248,8 +256,13 @@ Text is aligned to match original for ease of viewing
| Nonvested on December 31 | 1.0 | 0.3 | 104.85 $ | $ 104.51 |
Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset.
<!-- image -->
<!-- image -->
Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table.
<!-- image -->
## 5.5. Qualitative Analysis
@ -380,6 +393,7 @@ The process of generating a synthetic dataset can be decomposed into the followi
Although TableFormer can predict the table structure and the bounding boxes for tables recognized inside PDF documents, this is not enough when a full reconstruction of the original table is required. This happens mainly due to the following reasons:
Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.
<!-- image -->
· TableFormer output does not include the table cell content.
@ -432,19 +446,33 @@ Aditional images with examples of TableFormer predictions and post-processing ca
Figure 8: Example of a table with multi-line header.
Figure 9: Example of a table with big empty distance between cells.
<!-- image -->
Figure 10: Example of a complex table with empty cells.
<!-- image -->
<!-- image -->
Figure 11: Simple table with different style and empty cells.
<!-- image -->
Figure 12: Simple table predictions and post processing.
<!-- image -->
Figure 13: Table predictions example on colorful table.
Figure 14: Example with multi-line text.
<!-- image -->
Figure 16: Example of how post-processing helps to restore mis-aligned bounding boxes prediction artifact.
<!-- image -->
<!-- image -->
Figure 15: Example with triangular table.
<!-- image -->
Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure.
<!-- image -->


@ -23,6 +23,7 @@ Permission to make digital or hard copies of part or all of this work for person
KDD '22, August 14-18, 2022, Washington, DC, USA © 2022 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-9385-0/22/08. https://doi.org/10.1145/3534678.3539043
Figure 1: Four examples of complex page layouts across different document categories
<!-- image -->
## KEYWORDS
@ -69,6 +70,7 @@ DocLayNet contains 80863 PDF pages. Among these, 7059 carry two instances of hum
In addition to open intellectual property constraints for the source documents, we required that the documents in DocLayNet adhere to a few conditions. Firstly, we kept scanned documents
Figure 2: Distribution of DocLayNet pages across document categories.
<!-- image -->
to a minimum, since they introduce difficulties in annotation (see Section 4). As a second condition, we focussed on medium to large documents ( > 10 pages) with technical content, dense in complex tables, figures, plots and captions. Such documents carry a lot of information value, but are often hard to analyse with high accuracy due to their challenging layouts. Counterexamples of documents not included in the dataset are receipts, invoices, hand-written documents or photographs showing "text in the wild".
@ -89,7 +91,6 @@ Despite being cost-intense and far less scalable than automation, human annotati
The annotation campaign was carried out in four phases. In phase one, we identified and prepared the data sources for annotation. In phase two, we determined the class labels and how annotations should be done on the documents in order to obtain maximum consistency. The latter was guided by a detailed requirement analysis and exhaustive experiments. In phase three, we trained the annotation staff and performed exams for quality assurance. In phase four,
Table 1: DocLayNet dataset overview. Along with the frequency of each class label, we present the relative occurrence (as % of row "Total") in the train, test and validation sets. The inter-annotator agreement is computed as the mAP@0.5-0.95 metric between pairwise annotations from the triple-annotated pages, from which we obtain accuracy ranges.
| | | % of Total | % of Total | % of Total | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) |
|----------------|---------|--------------|--------------|--------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|
| class label | Count | Train | Test | Val | All | Fin | Man | Sci | Law | Pat | Ten |
@ -107,6 +108,7 @@ Table 1: DocLayNet dataset overview. Along with the frequency of each class labe
| Total | 1107470 | 941123 | 99816 | 66531 | 82-83 | 71-74 | 79-81 | 89-94 | 86-91 | 71-76 | 68-85 |
Figure 3: Corpus Conversion Service annotation user interface. The PDF page is shown in the background, with overlaid text-cells (in darker shades). The annotation boxes can be drawn by dragging a rectangle over each segment with the respective label from the palette on the right.
<!-- image -->
we distributed the annotation workload and performed continuous quality controls. Phase one and two required a small team of experts only. For phases three and four, a group of 40 dedicated annotators were assembled and supervised.
@ -141,6 +143,7 @@ The complete annotation guideline is over 100 pages long and a detailed descript
Phase 3: Training. After a first trial with a small group of people, we realised that providing the annotation guideline and a set of random practice pages did not yield the desired quality level for layout annotation. Therefore we prepared a subset of pages with two different complexity levels, each with a practice and an exam part. 974 pages were reference-annotated by one proficient core team member. Annotation staff were then given the task to annotate the same subsets (blinded from the reference). By comparing the annotations of each staff member with the reference annotations, we could quantify how closely their annotations matched the reference. Only after passing two exam levels with high annotation quality, staff were admitted into the production phase. Practice iterations
Figure 4: Examples of plausible annotation alternatives for the same page. Criteria in our annotation guideline can resolve cases A to C, while the case D remains ambiguous.
<!-- image -->
were carried out over a timeframe of 12 weeks, after which 8 of the 40 initially allocated annotators did not pass the bar.
@ -148,6 +151,7 @@ Phase 4: Production annotation. The previously selected 80K pages were annotated
Table 2: Prediction performance (mAP@0.5-0.95) of object detection networks on DocLayNet test set. The MRCNN (Mask R-CNN) and FRCNN (Faster R-CNN) models with ResNet-50 or ResNet-101 backbone were trained based on the network architectures from the detectron2 model zoo (Mask R-CNN R50, R101-FPN 3x, Faster R-CNN R101-FPN 3x), with default configurations. The YOLO implementation utilized was YOLOv5x6 [13]. All models were initialised using pre-trained weights from the COCO 2017 dataset.
| | human | MRCNN | MRCNN | FRCNN | YOLO |
|----------------|---------|---------|---------|---------|--------|
| | human | R50 | R101 | R101 | v5x6 |
@ -171,6 +175,7 @@ to avoid this at any cost in order to have clear, unbiased baseline numbers for
The primary goal of DocLayNet is to obtain high-quality ML models capable of accurate document-layout analysis on a wide variety of challenging layouts. As discussed in Section 2, object detection models are currently the easiest to use, due to the standardisation of ground-truth data in COCO format [16] and the availability of general frameworks such as detectron2 [17]. Furthermore, baseline numbers in PubLayNet and DocBank were obtained using standard object detection models such as Mask R-CNN and Faster R-CNN. As such, we will relate to these object detection methods in this
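As a sketch of the workflow alluded to above, a COCO-format layout dataset can be registered with detectron2 in a few lines; the dataset name and file paths below are hypothetical:

```python
# Hypothetical sketch: making a COCO-style layout dataset visible to
# detectron2 training configs. Paths and the dataset name are assumptions.
from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "doclaynet_train",             # name referenced from cfg.DATASETS.TRAIN
    {},                            # extra metadata (none needed here)
    "DocLayNet/COCO/train.json",   # COCO-format annotation file (assumed path)
    "DocLayNet/PNG",               # image root directory (assumed path)
)
```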
Figure 5: Prediction performance (mAP@0.5-0.95) of a Mask R-CNN network with ResNet50 backbone trained on increasing fractions of the DocLayNet dataset. The learning curve flattens around the 80% mark, indicating that increasing the size of the DocLayNet dataset with similar data will not yield significantly better predictions.
<!-- image -->
paper and leave the detailed evaluation of more recent methods mentioned in Section 2 for future work.
@ -181,7 +186,6 @@ In this section, we will present several aspects related to the performance of o
In Table 2, we present baseline experiments (given in mAP) on Mask R-CNN [12], Faster R-CNN [11], and YOLOv5 [13]. Both training and evaluation were performed on RGB images with dimensions of 1025 × 1025 pixels. For training, we only used one annotation in case of redundantly annotated pages. As one can observe, the variation in mAP between the models is rather low, but overall between 6 and 10% lower than the mAP computed from the pairwise human annotations on triple-annotated pages. This gives a good indication that the DocLayNet dataset poses a worthwhile challenge for the research community to close the gap between human recognition and ML approaches. It is interesting to see that Mask R-CNN and Faster R-CNN produce very comparable mAP scores, indicating that pixel-based image segmentation derived from bounding-boxes does not help to obtain better predictions. On the other hand, the more recent Yolov5x model does very well and even out-performs humans on selected labels such as Text , Table and Picture . This is not entirely surprising, as Text , Table and Picture are abundant and the most visually distinctive in a document.
Table 3: Performance of a Mask R-CNN R50 network in mAP@0.5-0.95 scores trained on DocLayNet with different class label sets. The reduced label sets were obtained by either down-mapping or dropping labels.
| Class-count | 11 | 6 | 5 | 4 |
|----------------|------|---------|---------|---------|
| Caption | 68 | Text | Text | Text |
@ -206,7 +210,6 @@ One of the fundamental questions related to any dataset is if it is "large enoug
The choice and number of labels can have a significant effect on the overall model performance. Since PubLayNet, DocBank and DocLayNet all have different label sets, it is of particular interest to understand and quantify this influence of the label set on the model performance. We investigate this by either down-mapping labels into more common ones (e.g. Caption → Text ) or excluding them from the annotations entirely. Furthermore, it must be stressed that all mappings and exclusions were performed on the data before model training. In Table 3, we present the mAP scores for a Mask R-CNN R50 network on different label sets. Where a label is down-mapped, we show its corresponding label, otherwise it was excluded. We present three different label sets, with 6, 5 and 4 different labels respectively. The set of 5 labels contains the same labels as PubLayNet. However, due to the different definition of
Table 4: Performance of a Mask R-CNN R50 network with document-wise and page-wise split for different label sets. Naive page-wise split will result in ~10% point improvement.
| Class-count | 11 | 11 | 5 | 5 |
|----------------|------|------|-----|------|
| Split | Doc | Page | Doc | Page |
@ -235,6 +238,7 @@ Throughout this paper, we claim that DocLayNet's wider variety of document layou
Table 5: Prediction Performance (mAP@0.5-0.95) of a Mask R-CNN R50 network across the PubLayNet, DocBank & DocLayNet data-sets. By evaluating on common label classes of each dataset, we observe that the DocLayNet-trained model has much less pronounced variations in performance across all datasets.
| | Testing on | Testing on | Testing on |
|------------|--------------|--------------|--------------|
| labels | PLN | DB | DLN |
@ -297,6 +301,7 @@ To date, there is still a significant gap between human and ML accuracy on the l
[13] Glenn Jocher, Alex Stoken, Ayush Chaurasia, Jirka Borovec, NanoCode012, TaoXie, Yonghye Kwon, Kalen Michael, Liu Changyu, Jiacong Fang, Abhiram V, Laughing, tkianai, yxNONG, Piotr Skalski, Adam Hogan, Jebastin Nadar, imyhxy, Lorenzo Mammana, Alex Wang, Cristi Fati, Diego Montes, Jan Hajek, Laurentiu
Figure 6: Example layout predictions on selected pages from the DocLayNet test-set. (A, D) exhibit favourable results on coloured backgrounds. (B, C) show accurate list-item and paragraph differentiation despite densely-spaced lines. (E) demonstrates good table and figure distinction. (F) shows predictions on a Chinese patent with multiple overlaps, label confusion and missing boxes.
<!-- image -->
Diaconu, Mai Thanh Minh, Marc, albinxavi, fatih, oleg, and wanghao yang. ultralytics/yolov5: v6.0 - yolov5n nano models, roboflow integration, tensorflow export, opencv dnn support, October 2021.


@ -5,7 +5,6 @@ order to compute the TED score. Inference timing results for all experiments wer
We have chosen the PubTabNet data set to perform HPO, since it includes a highly diverse set of tables. We also report TED scores separately for simple and complex tables (tables with cell spans). Results are presented in Table 1. It is evident that with OTSL, our model achieves the same TED score and slightly better mAP scores in comparison to HTML. However, OTSL yields a 2x speed-up in inference runtime over HTML.
Table 1. HPO performed in OTSL and HTML representation on the same transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. Effects of reducing the # of layers in encoder and decoder stages of the model show that smaller models trained on OTSL perform better, especially in recognizing complex table structures, and maintain a much higher mAP score than the HTML counterpart.
| # | # | Language | TEDs | TEDs | TEDs | mAP | Inference |
|------------|------------|------------|-------------|-------------------|-------------|-------------|-------------|
| enc-layers | dec-layers | Language | simple | complex | all | (0.75) | time (secs) |


@ -16,6 +16,9 @@ In modern document understanding systems [1,15], table extraction is typically a
Fig. 1. Comparison between HTML and OTSL table structure representation: (A) table example with complex row and column headers, including a 2D empty span, (B) minimal graphical representation of table structure using rectangular layout, (C) HTML representation, (D) OTSL representation. This example demonstrates many of the key features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case), its reduced sequence length (55 versus 30) and an enhanced internal structure (variable token sequence length per row in HTML versus a fixed length of rows in OTSL).
<!-- image -->
today, table detection in documents is a well understood problem, and the latest state-of-the-art (SOTA) object detection methods provide an accuracy comparable to human observers [7,8,10,14,23]. On the other hand, the problem of table structure recognition (TSR) is a lot more challenging and remains a very active area of research, in which many novel machine learning algorithms are being explored [3,4,5,9,11,12,13,14,17,18,21,22].
Recently emerging SOTA methods for table structure recognition employ transformer-based models, in which an image of the table is provided to the network in order to predict the structure of the table as a sequence of tokens. These image-to-sequence (Im2Seq) models are extremely powerful, since they allow for a purely data-driven solution. The tokens of the sequence typically belong to a markup language such as HTML, LaTeX or Markdown, which allows table structure to be described as rows, columns and spanning cells in various configurations. In Figure 1, we illustrate how HTML is used to represent the table structure of a particular example table. Public table-structure data sets such as PubTabNet [22] and FinTabNet [21], which were created in a semi-automated way from paired PDF and HTML sources (e.g. PubMed Central), primarily popularized the use of HTML as the ground-truth representation format for TSR.
@ -43,6 +46,7 @@ All known Im2Seq based models for TSR fundamentally work in similar ways. Given
ulary and can be interpreted as a table structure. For example, with the HTML tokens <table> , </table> , <tr> , </tr> , <td> and </td> , one can construct simple table structures without any spanning cells. In reality though, one needs at least 28 HTML tokens to describe the most common complex tables observed in real-world documents [21,22], due to a variety of spanning cells definitions in the HTML token vocabulary.
Fig. 2. Frequency of tokens in HTML and OTSL as they appear in PubTabNet.
<!-- image -->
Obviously, HTML and other general-purpose markup languages were not designed for Im2Seq models. As such, they have some serious drawbacks. First, the token vocabulary needs to be artificially large in order to describe all plausible tabular structures. Since most Im2Seq models use an autoregressive approach, they generate the sequence token by token. Therefore, to reduce inference time, a shorter sequence length is critical. Every table-cell is represented by at least two tokens ( <td> and </td> ). Furthermore, when tokenizing the HTML structure, one needs to explicitly enumerate possible column-spans and row-spans as words. In practice, this ends up requiring 28 different HTML tokens (when including column- and row-spans up to 10 cells) just to describe every table in the PubTabNet dataset. Clearly, not every token is equally represented, as is depicted in Figure 2. This skewed distribution of tokens in combination with variable token row-length makes it challenging for models to learn the HTML structure.
@ -77,6 +81,7 @@ The OTSL vocabulary is comprised of the following tokens:
A notable attribute of OTSL is that it has the capability of achieving lossless conversion to HTML.
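To make the lossless-conversion claim concrete, here is a minimal, hypothetical sketch that renders a span-free OTSL sequence as HTML. The single-character token names follow the OTSL paper; a real converter must additionally handle the span tokens (left-looking, up-looking and cross cells).

```python
# Minimal sketch: OTSL -> HTML for tables without spanning cells.
# "C" starts a new cell, "NL" ends a row; span tokens are out of scope here.
def otsl_to_html(tokens: list[str]) -> str:
    rows, row = [], []
    for tok in tokens:
        if tok == "C":
            row.append("<td></td>")
        elif tok == "NL":
            rows.append("<tr>" + "".join(row) + "</tr>")
            row = []
    return "<table>" + "".join(rows) + "</table>"

# A 2x2 table: two cells per row, two rows.
print(otsl_to_html(["C", "C", "NL", "C", "C", "NL"]))
```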
Fig. 3. OTSL description of table structure: A - table example; B - graphical representation of table structure; C - mapping structure on a grid; D - OTSL structure encoding; E - explanation on cell encoding
<!-- image -->
## 4.2 Language Syntax
@ -111,6 +116,7 @@ The design of OTSL allows to validate a table structure easily on an unfinished
To evaluate the impact of OTSL on prediction accuracy and inference times, we conducted a series of experiments based on the TableFormer model (Figure 4) with two objectives: Firstly we evaluate the prediction quality and performance of OTSL vs. HTML after performing Hyper Parameter Optimization (HPO) on the canonical PubTabNet data set. Secondly we pick the best hyper-parameters found in the first step and evaluate how OTSL impacts the performance of TableFormer after training on other publicly available data sets (FinTabNet, PubTables-1M [14]). The ground truth (GT) from all data sets has been converted into OTSL format for this purpose, and will be made publicly available.
Fig. 4. Architecture sketch of the TableFormer model, which is a representative for the Im2Seq approach.
<!-- image -->
We rely on standard metrics such as Tree Edit Distance score (TEDs) for table structure prediction, and Mean Average Precision (mAP) with 0.75 Intersection Over Union (IOU) threshold for the bounding-box predictions of table cells. The predicted OTSL structures were converted back to HTML format in
@ -121,7 +127,6 @@ order to compute the TED score. Inference timing results for all experiments wer
We have chosen the PubTabNet data set to perform HPO, since it includes a highly diverse set of tables. We also report TED scores separately for simple and complex tables (tables with cell spans). Results are presented in Table 1. It is evident that with OTSL, our model achieves the same TED score and slightly better mAP scores in comparison to HTML. However, OTSL yields a 2x speed-up in inference runtime over HTML.
Table 1. HPO performed in OTSL and HTML representation on the same transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. Effects of reducing the # of layers in encoder and decoder stages of the model show that smaller models trained on OTSL perform better, especially in recognizing complex table structures, and maintain a much higher mAP score than the HTML counterpart.
| # | # | Language | TEDs | TEDs | TEDs | mAP | Inference |
|------------|------------|------------|-------------|-------------|-------------|-------------|-------------|
| enc-layers | dec-layers | Language | simple | complex | all | (0.75) | time (secs) |
@ -138,7 +143,6 @@ We picked the model parameter configuration that produced the best prediction qu
Additionally, the results show that OTSL has an advantage over HTML when applied on a bigger data set like PubTables-1M and achieves significantly improved scores. Finally, OTSL achieves faster inference due to fewer decoding steps which is a result of the reduced sequence representation.
Table 2. TSR and cell detection results compared between OTSL and HTML on the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using TableFormer [9] (with enc=6, dec=6, heads=8).
| | Language | TEDs | TEDs | TEDs | mAP(0.75) | Inference time (secs) |
|--------------|------------|--------|---------|--------|-------------|-------------------------|
| | Language | simple | complex | all | mAP(0.75) | Inference time (secs) |
@ -154,12 +158,14 @@ Table 2. TSR and cell detection results compared between OTSL and HTML on the Pu
To illustrate the qualitative differences between OTSL and HTML, Figure 5 demonstrates less overlap and more accurate bounding boxes with OTSL. In Figure 6, OTSL proves to be more effective in handling tables with longer token sequences, resulting in even more precise structure prediction and bounding boxes.
Fig. 5. The OTSL model produces more accurate bounding boxes with less overlap (E) than the HTML model (D), when predicting the structure of a sparse table (A), at twice the inference speed because of shorter sequence length (B),(C). "PMC2807444_006_00.png", PubTabNet.
<!-- image -->
Fig. 6. Visualization of predicted structure and detected bounding boxes on a complex table with many rows. The OTSL model (B) captured the repeating pattern of horizontally merged cells from the GT (A), unlike the HTML model (C). The HTML model also did not complete the HTML sequence correctly and displayed a lot more drift and overlap of bounding boxes. "PMC5406406_003_01.png", PubTabNet.
<!-- image -->
## 6 Conclusion


@ -1,7 +1,19 @@
Front cover
<!-- image -->
## Row and Column Access Control Support in IBM DB2 for i
<!-- image -->
<!-- image -->
<!-- image -->
International Technical Support Organization
## Row and Column Access Control Support in IBM DB2 for i
@ -20,6 +32,7 @@ Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosur
## Contents
| Notices | . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii |
|------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| Trademarks | . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii |
@ -65,6 +78,7 @@ Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosur
| 3.6.7 Demonstrating data access with RCAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 29 |
| 3.6.8 Demonstrating data access with a view and RCAC . . . . . . . . . . . . . . . . . . . . . . . | 32 |
| Chapter 4. Implementing Row and Column Access Control: Banking example . . . . . | 37 |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 4.1 Business requirements for the RCAC banking scenario . . . . . . . . . . . . . . . . . . . . . . . . | 38 |
@ -116,6 +130,7 @@ Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosur
| | 111 |
| Chapter 7. Row and Column Access Control management . . . . . . . . . . . . . . . . . . . . | Chapter 7. Row and Column Access Control management . . . . . . . . . . . . . . . . . . . . |
| 7.1 Managing row permissions and column masks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 114 |
|---------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| 7.1.1 Source management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 114 |
@ -169,6 +184,7 @@ IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of Intern
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
| AS/400® | IBM® | Redpaper™ |
|------------|----------------|----------------------------|
| DB2® | Power Systems™ | Redbooks (logo)® System |
@ -194,6 +210,9 @@ GLYPH<g115>GLYPH<g3> GLYPH<g53>GLYPH<g72>GLYPH<g79>GLYPH<g92>GLYPH<g3> GLYPH<g82
- Take advantage of access to a worldwide source of expertise
<!-- image -->
Power Services
## DB2 for i Center of Excellence
@ -252,6 +271,9 @@ Pricing depends on the scope of work. Learn more about the DB2 for i Center of E
ibm.com/systems/services/labservices
<!-- image -->
© Copyright IBM Corporation 2013
IBM Corporation
@ -268,6 +290,9 @@ This document is current as of the initial date of publication and may be change
Not all offerings are available in every country in which IBM operates.
<!-- image -->
Please Recycle
## Preface
@ -278,12 +303,33 @@ This paper is intended for database engineers, data-centric application develope
This paper was produced by the IBM DB2 for i Center of Excellence team in partnership with the International Technical Support Organization (ITSO), Rochester, Minnesota US.
<!-- image -->
<!-- image -->
Jim Bainbridge is a senior DB2 consultant on the DB2 for i Center of Excellence team in the IBM Lab Services and Training organization. His primary role is training and implementation services for IBM DB2 Web Query for i and business analytics. Jim began his career with IBM 30 years ago in the IBM Rochester Development Lab, where he developed cooperative processing products that paired IBM PCs with IBM S/36 and AS/400 systems. In the years since, Jim has held numerous technical roles, including independent software vendor technical support on a broad range of IBM technologies and products, and supporting customers in the IBM Executive Briefing Center and IBM Project Office.
Hernando Bedoya is a Senior IT Specialist at STG Lab Services and Training in Rochester, Minnesota. He writes extensively and teaches IBM classes worldwide in all areas of DB2 for i. Before joining STG Lab Services, he worked in the ITSO for nine years writing multiple IBM Redbooks® publications. He also worked for IBM Colombia as an IBM AS/400® IT Specialist doing presales support for the Andean countries. He has 28 years of experience in the computing field and has taught database classes in Colombian universities. He holds a Master's degree in Computer Science from EAFIT, Colombia. His areas of expertise are database technology, performance, and data warehousing. Hernando can be contacted at hbedoya@us.ibm.com.
## Authors
<!-- image -->
<!-- image -->
<!-- image -->
<!-- image -->
<!-- image -->
Rob Bestgen is a member of the DB2 for i Center of Excellence team helping customers use the capabilities of DB2 for i. In addition, Rob is the chief architect of the DB2 SQL Query Engine (SQE) for DB2 for i and is the product development manager for DB2 Web Query for i.
Mike Cain is a Senior Technical Staff Member within the IBM Systems and Technology Group. He is also the founder and team leader of the DB2 for i Center of Excellence in Rochester, Minnesota US. Before his current position, he worked as an IBM AS/400 Systems Engineer and technical consultant. Before joining IBM in 1988, Mike worked as a System/38 programmer and data processing manager for a property and casualty insurance company. Mike has 26 years of experience with IBM, engaging clients and Business Partners around the world. In addition to assisting clients, he uses his knowledge and experience to influence the IBM solution, development, and support processes.
@ -294,6 +340,12 @@ Jim Denton is a senior consultant at the IBM DB2 for i Center of Excellence, whe
Doug Mack is a DB2 for i and Business Intelligence Consultant in the IBM Power Systems™ Lab Services organization. Doug's 30+ year career with IBM spans many roles, including product development, technical sales support, Business Intelligence Sales Specialist, and DB2 for i Product Marketing Manager. Doug is a featured speaker at User Group conferences and meetings, IBM Technical Conferences, and Executive Briefings.
<!-- image -->
<!-- image -->
Tom McKinley is an IBM Lab Services Consultant working on DB2 for IBM i in Rochester MN. His main focus is complex query performance that is associated with Business Intelligence running on Very Large Databases. He worked as a developer or performance analyst in the DB area from 1986 until 2006. Some of his major pieces of work include the Symmetric Multiple processing capabilities of DB2 for IBM i and Large Object Data types. In addition, he was on the original team that designed and built the SQL Query Engine. Before his database work, he worked on Licensed Internal Code for System 34 and System 36.
Kent Milligan is a senior DB2 consultant on the DB2 for i Center of Excellence team within the IBM Lab Services and Training organization. His primary responsibility is helping software developers use the latest DB2 technologies and port applications from other databases to DB2 for i. After graduating from the University of Iowa, Kent spent the first eight years of his IBM career as a member of the DB2 development team in Rochester.
@ -350,6 +402,9 @@ GLYPH<SM590000> Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
<!-- image -->
Chapter 1.
## Securing and protecting IBM DB2 data
@ -401,6 +456,7 @@ As described in 1.2, "Current state of IBM i security" on page 2, object-level c
As shown in Figure 1-1, it is an all-or-nothing access to the rows of a table.
Figure 1-1 All-or-nothing access to the rows of a table
<!-- image -->
Many businesses are trying to limit data access to a need-to-know basis. This security goal means that users should be given access only to the minimum set of data that is required to perform their job. Often, users with object-level access are given access to row and column values that are beyond what their business task requires because that object-level security provides an all-or-nothing solution. For example, object-level controls allow a manager to access data about all employees. Most security policies limit a manager to accessing data only for the employees that they manage.
@ -413,6 +469,7 @@ Using SQL views to limit access to a subset of the data in a table also has its
Even if you are willing to live with these performance and management issues, a user with *ALLOBJ access still can directly access all of the data in the underlying DB2 table and easily bypass the security controls that are built into an SQL view.
Figure 1-2 Existing row and column controls
<!-- image -->
## 1.3.2 New controls: Row and Column Access Control
@ -420,6 +477,9 @@ Based on the challenges that are associated with the existing technology availab
The new DB2 RCAC support provides a method for controlling data access across all interfaces and all types of users with a data-centric solution. Moving security processing to the database layer makes it easier to build controls that meet your compliance policies. The RCAC support provides an additional layer of security that complements object-level authorizations to limit data access to a need-to-know basis. Therefore, it is critical that you first have a sound object-level security implementation in place.
<!-- image -->
Chapter 2.
## Roles and separation of duties
@ -517,7 +577,6 @@ CHGFCNUSG FCNID(QIBM_DB_SECADM) USER(HBEDOYA) USAGE(*ALLOWED)
The FUNCTION_USAGE view contains function usage configuration details. Table 2-1 describes the columns in the FUNCTION_USAGE view.
Table 2-1 FUNCTION_USAGE view
| Column name | Data type | Description |
|---------------|-------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| FUNCTION_ID | VARCHAR(30) | ID of the function. |
@ -528,7 +587,6 @@ Table 2-1 FUNCTION_USAGE view
To discover who has authorization to define and manage RCAC, you can use the query that is shown in Example 2-1.
Example 2-1 Query to determine who has authority to define and manage RCAC
| SELECT | function_id, user_name, |
|----------|------------------------------|
| | usage, |
@ -554,7 +612,6 @@ A preferred practice is that the RCAC administrator has the QIBM_DB_SECADM funct
Table 2-2 shows a comparison of the different function usage IDs and *JOBCTL authority to the different CL commands and DB2 for i tools.
Table 2-2 Comparison of the different function usage IDs and *JOBCTL authority
| User action | *JOBCTL | QIBM_DB_SECADM | QIBM_DB_SQLADM | QIBM_DB_SYSMON | No Authority |
|--------------------------------------------------------------------------------|-----------|------------------|------------------|------------------|----------------|
| SET CURRENT DEGREE (SQL statement) | X | | X | | |
@ -571,6 +628,7 @@ Table 2-2 Comparison of the different function usage IDs and *JOBCTL authority
| | X | | X | | |
| CHANGE PLAN CACHE SIZE procedure (currently does not check authority) | X | | X | | |
| User action | *JOBCTL | QIBM_DB_SECADM | QIBM_DB_SQLADM | QIBM_DB_SYSMON | No Authority |
|--------------------------------------------------------------|-----------|------------------|------------------|------------------|----------------|
| START PLAN CACHE EVENT MONITOR procedure | X | | X | | |
@ -595,6 +653,9 @@ Table 2-2 Comparison of the different function usage IDs and *JOBCTL authority
| Edit Authorization List ( EDTAUTL ) CL command | | X | | | |
| Work with Authorization Lists ( WRKAUTL ) CL command | | X | | | |
<!-- image -->
Chapter 3.
@ -646,6 +707,7 @@ A row permission is a database object that manifests a row access control rule f
The SQL CREATE PERMISSION statement that is shown in Figure 3-1 is used to define and initially enable or disable the row access rules.
Figure 3-1 CREATE PERMISSION SQL statement
<!-- image -->
## Column mask
@ -654,6 +716,7 @@ A column mask is a database object that manifests a column value access control
Column masks replace the need to create and use views to implement access control. The SQL CREATE MASK statement that is shown in Figure 3-2 is used to define and initially enable or disable the column value access rules.
Figure 3-2 CREATE MASK SQL statement
<!-- image -->
## 3.1.2 Enabling and activating RCAC
@ -664,12 +727,14 @@ Enabling and disabling effectively turns on or off the logic that is contained i
Note: An exclusive lock is required on the table object to perform the alter operation. All open cursors must be closed.
Figure 3-3 ALTER PERMISSION and ALTER MASK SQL statements
<!-- image -->
You can activate and deactivate RCAC for new or existing tables by using the SQL ALTER TABLE statement (Figure 3-4). The ACTIVATE or DEACTIVATE clause must be the option that is specified in the statement. No other alterations are permitted at the same time. The activating and deactivating effectively turns on or off all RCAC processing for the table. Only enabled row permissions and column masks take effect when activating RCAC.
Note: An exclusive lock is required on the table object to perform the alter operation. All open cursors must be closed.
Figure 3-4 ALTER TABLE SQL statement
<!-- image -->
When row access control is activated on a table, a default permission is established for that table. The name of this permission is QIBM_DEFAULT_<table-name>_<schema-name>. This default permission contains a simple piece of logic (0=1), which is never true. The default permission effectively denies access to every user unless there is a permission defined that allows access explicitly. If row access control is activated on a table and there is no permission defined, no one has access to any rows. All queries against the table produce an empty set.
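A hypothetical sketch of the activation step described above, issued from Python through the ibm_db_dbi driver: the connection string, schema, and table names are illustrative, and the ALTER TABLE clauses follow the RCAC syntax shown in Figure 3-4.

```python
# Sketch only: activating RCAC on a table from Python. The DSN below is a
# placeholder; ibm_db_dbi is IBM's DB-API driver for DB2.
import ibm_db_dbi

conn = ibm_db_dbi.connect("DATABASE=*LOCAL;UID=HBEDOYA;PWD=...", "", "")
cur = conn.cursor()
cur.execute(
    "ALTER TABLE HR_SCHEMA.EMPLOYEES "
    "ACTIVATE ROW ACCESS CONTROL "
    "ACTIVATE COLUMN ACCESS CONTROL"
)
conn.commit()
```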
@ -700,7 +765,6 @@ In addition to these four special registers, any of the DB2 special registers ca
Table 3-1 summarizes these special registers and their values.
Table 3-1 Special registers and their corresponding values
| Special register | Corresponding value |
|----------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| USER or SESSION_USER | The effective user of the thread excluding adopted authority. |
@ -720,6 +784,7 @@ GLYPH<SM590000> While the procedure is running, the special register USER still
- When proc1 ends, the session reverts to its original state with both USER and CURRENT USER having the value of ALICE.
Figure 3-5 Special registers and adopted authority
<!-- image -->
## 3.2.2 Built-in global variables
@ -730,7 +795,6 @@ IBM DB2 for i supports nine different built-in global variables that are read on
Table 3-2 lists the nine built-in global variables.
Table 3-2 Built-in global variables
| Global variable | Type | Description |
|-----------------------|--------------|----------------------------------------------------------------|
| CLIENT_HOST | VARCHAR(255) | Host name of the current client as returned by the system |
@ -782,7 +846,6 @@ One of the first tasks in either the row permission or the column mask logic is
More sophisticated methods can employ existential, day of year / time of day, and relational comparisons with set operations. For example, you can use a date master or date dimension table to determine whether the current date is a normal business day. If the current date is a valid business day, then access is allowed. If the current date is not a business day (for example a weekend day or holiday), access is denied. This test can be accomplished by performing a lookup using a subquery, such as the one that is shown in Example 3-1.
Example 3-1 Subquery that is used as part of the rule
CURRENT_DATE IN (SELECT D.DATE_KEY FROM DATE_MASTER D WHERE D.BUSINESS_DAY = 'Y')
@ -874,6 +937,7 @@ SELECT COUNT(*) as ROW_COUNT FROM HR_SCHEMA.EMPLOYEES;
The result of this query is shown in Figure 3-7, which is the total number of employees of the company.
Figure 3-7 Number of employees
<!-- image -->
2. Run a second SQL statement (shown in Example 3-6) that lists the employees. If you have read access to the table, you see all the rows no matter who you are.
@ -908,6 +972,7 @@ CREATE PERMISSION HR_SCHEMA.PERMISSION1_ON_EMPLOYEES ON HR_SCHEMA.EMPLOYEES AS E
2. Look at the definition of the table and see the permissions, as shown in Figure 3-9. QIBM_DEFAULT_EMPLOYEE_HR_SCHEMA is the default permission, as described in 3.1.2, "Enabling and activating RCAC" on page 16.
Figure 3-9 Row permissions that are shown in System i Navigator
<!-- image -->
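The CREATE PERMISSION statement shown above is cut off by the excerpt; presumably it continues with a search condition plus the enforcement clauses. A sketch of the general shape, where the predicates are assumptions for illustration (using the VERIFY_GROUP_FOR_USER function that RCAC rules commonly rely on):

```sql
CREATE PERMISSION HR_SCHEMA.PERMISSION1_ON_EMPLOYEES
ON HR_SCHEMA.EMPLOYEES AS EMPLOYEES
FOR ROWS WHERE QSYS2.VERIFY_GROUP_FOR_USER(SESSION_USER, 'HR') = 1
            OR EMPLOYEES.USER_ID = SESSION_USER   -- assumed column name
ENFORCED FOR ALL ACCESS
ENABLE;
```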
## 3.6.5 Defining and creating column masks
@ -924,7 +989,6 @@ Define the different masks for the columns that are sensitive by completing the
To implement this column mask, run the SQL statement that is shown in Example 3-8.
Example 3-8 Creation of a mask on the DATE_OF_BIRTH column
```sql
CREATE MASK HR_SCHEMA.MASK_DATE_OF_BIRTH_ON_EMPLOYEES
ON HR_SCHEMA.EMPLOYEES AS EMPLOYEES
```
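The rest of the statement is cut off by the excerpt; a hypothetical completion that reveals the value only to the HR group might look like this (the CASE logic and the 1900-01-01 fallback are assumptions, not the published Example 3-8):

```sql
CREATE MASK HR_SCHEMA.MASK_DATE_OF_BIRTH_ON_EMPLOYEES
ON HR_SCHEMA.EMPLOYEES AS EMPLOYEES
FOR COLUMN DATE_OF_BIRTH
RETURN CASE
    WHEN QSYS2.VERIFY_GROUP_FOR_USER(SESSION_USER, 'HR') = 1
        THEN DATE_OF_BIRTH
    ELSE DATE('1900-01-01')   -- masked value; illustrative choice
END
ENABLE;
```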
@ -951,6 +1015,7 @@ CREATE MASK HR_SCHEMA.MASK_TAX_ID_ON_EMPLOYEES ON HR_SCHEMA.EMPLOYEES AS EMPLOYE
3. Figure 3-10 shows the masks that are created in the HR_SCHEMA.
Figure 3-10 Column masks shown in System i Navigator
<!-- image -->
## 3.6.6 Activating RCAC
@ -965,10 +1030,12 @@ Example 3-10 Activating RCAC on the EMPLOYEES table
2. Look at the definition of the EMPLOYEES table, as shown in Figure 3-11. To do this, from the main navigation pane of System i Navigator, click Schemas → HR_SCHEMA → Tables, right-click the EMPLOYEES table, and click Definition.
Figure 3-11 Selecting the EMPLOYEES table from System i Navigator
<!-- image -->
3. The EMPLOYEES table definition is displayed, as shown in Figure 3-12. Note that the Row access control and Column access control options are checked.
Figure 3-12 RCAC enabled on the EMPLOYEES table
<!-- image -->
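Example 3-10 itself is not reproduced in this excerpt; presumably it is a single ALTER TABLE statement that switches on both controls, along these lines:

```sql
ALTER TABLE HR_SCHEMA.EMPLOYEES
    ACTIVATE ROW ACCESS CONTROL
    ACTIVATE COLUMN ACCESS CONTROL;
```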
## 3.6.7 Demonstrating data access with RCAC
@ -984,17 +1051,25 @@ SELECT COUNT(*) as ROW_COUNT FROM HR_SCHEMA.EMPLOYEES;
Figure 3-13 Count of EMPLOYEES by HR
<!-- image -->
3. The result of the same query for a user who is logged on as TQSPENSER (Manager) is shown in Figure 3-14. TQSPENSER has five employees in his department and he can also see his own row, which is why the count is 6.
Figure 3-14 Count of EMPLOYEES by a manager
<!-- image -->
4. The result of the same query that is run by an employee (DSSMITH) gives the result that is shown in Figure 3-15. Each employee can see only his or her own data (row).
Figure 3-15 Count of EMPLOYEES by an employee
<!-- image -->
5. The result of the same query that is run by the Consultant/DBE gives the result that is shown in Figure 3-16. The consultants/DBE can manage and implement RCAC, but they do not see any rows at all.
Figure 3-16 Count of EMPLOYEES by a consultant
<!-- image -->
Does the result make sense? Yes, it does because RCAC is enabled.
@ -1058,6 +1133,9 @@ Figure 3-23 Employee on leave - Manager of Field Reps user
Figure 3-24 Employees on leave - employee user
<!-- image -->
Chapter 4.
@ -1103,6 +1181,7 @@ - The row permission and column mask for the ACCOUNTS table are ba
- The row permission for the TRANSACTIONS table is based on the ACCOUNTS table permission rules and the CUSTOMERS table permission rules. A subquery is used to connect the transactions (child) with the account (parent) and the account (child) with the customer (parent).
Figure 4-1 Internet banking example
<!-- image -->
## 4.2 Description of the users roles and responsibilities
@ -1123,7 +1202,6 @@ - PUBLIC: Anyone not already in a group
Based on their respective roles and responsibilities, the users (that is, a group) are controlled by row permissions and column masks. The chart that is shown in Figure 4-2 shows the rules for row and column access in this example.
Figure 4-2 Rules for row and column access
| | CUSTOMERS | CUSTOMERS | ACCOUNTS | ACCOUNTS | TRANSACTIONS | TRANSACTIONS |
|----------|-------------|-------------|------------|------------|----------------|----------------|
| SECURITY | No Rows | Yes | No Rows | Yes | No Rows | No |
@ -1136,7 +1214,6 @@ Figure 4-2 Rules for row and column access
The chart that is shown in Figure 4-3 shows the column access that is allowed by group and lists the column masks by table.
Figure 4-3 Column masks
| | | CUSTOMERS | ACCOUNTS |
|----------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
| SECURITY | No Rows | CUSTOMER_DRIVERS_LICENSE_NUMBER CUSTOMER_EMAIL CUSTOMER_LOGIN_ID CUSTOMER_SECURITY_QUESTION CUSTOMER_SECURITY_QUESTION_ANSWER CUSTOMER_TAX_ID | ACCOUNT_NUMBER |
@ -1167,6 +1244,7 @@ - Adam O. Olsen is a bank customer with a web application login ID
Figure 4-4 shows the data model of the banking scenario that is used in this example.
Figure 4-4 Data model of the banking scenario
<!-- image -->
This section covers the following steps:
@ -1207,6 +1285,7 @@ To review the attributes of each table that is used in this banking example, com
2. Right-click the CUSTOMERS table and select Definition . Figure 4-6 shows the attributes for the CUSTOMERS table. The Row access control and Column access control options are not selected, which indicates that the table does not have RCAC implemented.
Figure 4-6 CUSTOMERS table attributes
<!-- image -->
3. Click the Columns tab to see the columns of the CUSTOMERS table, as shown in Figure 4-7.
@ -1215,10 +1294,12 @@ Figure 4-7 Column definitions of the CUSTOMERS table
4. Click the Key Constraints , Foreign Key Constraints , and Check Constraints tabs to review the key, foreign, and check constraints on the CUSTOMERS table, as shown in Figure 4-8. There are no Foreign Key Constraints or Check Constraints on the CUSTOMERS table.
Figure 4-8 Reviewing the constraints on the CUSTOMERS table
<!-- image -->
5. Review the definition of the ACCOUNTS table. The definition of the ACCOUNTS table is shown in Figure 4-9. RCAC has not been defined for this table yet.
Figure 4-9 ACCOUNTS table attributes
<!-- image -->
6. Click the Columns tab to see the columns of the ACCOUNTS table, as shown in Figure 4-10.
@ -1231,6 +1312,7 @@ Figure 4-11 Reviewing the constraints on the ACCOUNTS table
8. Review the definition of the TRANSACTIONS table. The definition of the TRANSACTIONS table is shown in Figure 4-12. RCAC is not defined for this table yet.
Figure 4-12 TRANSACTIONS table attributes
<!-- image -->
9. Click the Columns tab to see the columns of the TRANSACTIONS table, as shown in Figure 4-13.
@ -1251,22 +1333,27 @@ Complete the following steps:
1. Right-click the database connection and select Application Administration , as shown in Figure 4-15.
Figure 4-15 Application administration
<!-- image -->
2. The Application Administration window opens, as shown in Figure 4-16. Click IBM i → Database and select the function usage ID of Database Security Administrator.
Figure 4-16 Application administration for IBM i
<!-- image -->
3. Click Customize for the function usage ID of Database Security Administrator, as shown in Figure 4-17.
Figure 4-17 Customizing the Database Security Administrator function usage ID
<!-- image -->
4. The Customize Access window opens, as shown in Figure 4-18. Click the users that need to implement RCAC. For this example, HBEDOYA and MCAIN are selected. Click Add and then click OK .
Figure 4-18 Customize Access window
<!-- image -->
5. The Application Administrator window opens again. The function usage ID of Database Security Administrator now has an X in the Customized Access column, as shown in Figure 4-19.
Figure 4-19 Function usage ID Database Security Administrator customized
<!-- image -->
6. Run an SQL query that shows which user profiles are enabled to define RCAC. The SQL query is shown in Figure 4-20.
@ -1281,16 +1368,19 @@ Complete the following steps:
1. On the main navigation pane of System i Navigator, right-click Groups and select New Group , as shown in Figure 4-21.
Figure 4-21 Creating group profiles
<!-- image -->
2. The New Group window opens, as shown in Figure 4-22. For each new group, enter the Group name (ADMIN, CUSTOMER, TELLER, and DBE) and add the user profiles that are associated to this group by selecting the user profile and clicking Add .
Figure 4-22 shows adding user TQSPENCER to the TELLER group profile.
Figure 4-22 Creating group profiles and adding users
<!-- image -->
3. After you create all the group profiles, you should see them listed in System i Navigator under Users and Groups → Groups, as shown in Figure 4-23.
Figure 4-23 Newly created group profiles
<!-- image -->
## 4.3.4 Creating the CUSTOMER_LOGIN_ID global variable
@ -1301,18 +1391,22 @@ Complete the following steps:
1. From System i Navigator, under the schema BANK_SCHEMA, right-click Global Variable and select New → Global Variable, as shown in Figure 4-24.
Figure 4-24 Creating a global variable
<!-- image -->
2. The New Global Variable window opens, as shown in Figure 4-25. Enter the global variable name of CUSTOMER_LOGIN_ID, select the data type of VARCHAR, and leave the default value of NULL. This default value ensures that users that do not use the web interface do not have permission to access the data. Click OK .
Figure 4-25 Creating a global variable called CUSTOMER_LOGIN_ID
<!-- image -->
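The equivalent DDL for this step is short; only the VARCHAR length is an assumption here, since the window in Figure 4-25 does not show it:

```sql
-- Session-scoped global variable; the NULL default locks out non-web users
CREATE VARIABLE BANK_SCHEMA.CUSTOMER_LOGIN_ID VARCHAR(30)
    DEFAULT NULL;
```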
3. Now that the global variable is created, assign permissions to the variable so that it can be set by the program. Right-click the CUSTOMER_LOGIN_ID global variable and select Permissions , as shown in Figure 4-26.
Figure 4-26 Setting permissions on the CUSTOMER_LOGIN_ID global variable
<!-- image -->
4. The Permissions window opens, as shown in Figure 4-27. Select Change authority for Webuser so that the application can set this global variable.
Figure 4-27 Setting change permissions for Webuser on the CUSTOMER_LOGIN_ID global variable
<!-- image -->
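In SQL terms, the Change authority granted in Figure 4-27 corresponds roughly to write access on the variable; a sketch, with the exact privilege mapping being an assumption:

```sql
GRANT WRITE ON VARIABLE BANK_SCHEMA.CUSTOMER_LOGIN_ID TO WEBUSER;
```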
## 4.3.5 Defining and creating row permissions
@ -1321,6 +1415,7 @@ You are now ready to define the row permissions of the tables. Complete the followin
1. From the navigation pane of System i Navigator, click Schemas → BANK_SCHEMA, right-click Row Permissions, and select New → Row Permission, as shown in Figure 4-28.
Figure 4-28 Selecting new row permissions
<!-- image -->
2. The New Row Permission window opens, as shown in Figure 4-29. Enter the information regarding the row permissions on the CUSTOMERS table. This row permission defines what is established in the following policy:
@ -1333,6 +1428,7 @@ Figure 4-28 Selecting new row permissions
Select the Enabled option. Click OK .
Figure 4-29 New row permissions on the CUSTOMERS table
<!-- image -->
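The window in Figure 4-29 hides the full rule text, but a permission of this shape would implement a "staff see everything, customers see only their own row" policy. The predicates below are illustrative assumptions, not the statement behind the figure:

```sql
CREATE PERMISSION BANK_SCHEMA.PERMISSION_ON_CUSTOMERS
ON BANK_SCHEMA.CUSTOMERS AS C
FOR ROWS WHERE QSYS2.VERIFY_GROUP_FOR_USER(SESSION_USER, 'ADMIN') = 1
            OR QSYS2.VERIFY_GROUP_FOR_USER(SESSION_USER, 'TELLER') = 1
            OR C.CUSTOMER_LOGIN_ID = BANK_SCHEMA.CUSTOMER_LOGIN_ID
ENFORCED FOR ALL ACCESS
ENABLE;
```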
3. Define the row permissions for the ACCOUNTS table. The New Row Permission window opens, as shown in Figure 4-30. Enter the information regarding the row permissions on the ACCOUNTS table. This row permission defines what is established in the following policy:
@ -1345,6 +1441,7 @@ Figure 4-29 New row permissions on the CUSTOMERS table
Select the Enabled option. Click OK .
Figure 4-30 New row permissions on the ACCOUNTS table
<!-- image -->
4. Define the row permissions on the TRANSACTIONS table. The New Row Permission window opens, as shown in Figure 4-31. Enter the information regarding the row permissions on the TRANSACTIONS table. This row permission defines what is established in the following policy:
@ -1357,10 +1454,12 @@ Note: You must join back to ACCOUNTS and then to CUSTOMERS by using a subquery t
- Any other user profile cannot see any rows at all. Select the Enabled option. Click OK .
Figure 4-31 New row permissions on the TRANSACTIONS table
<!-- image -->
5. To verify that the row permissions are enabled, from System i Navigator, click Row Permissions , as shown in Figure 4-32. The three row permissions are created and enabled.
Figure 4-32 List of row permissions on BANK_SCHEMA
<!-- image -->
## 4.3.6 Defining and creating column masks
@ -1369,6 +1468,7 @@ This section defines the masks on the columns. Complete the following steps:
1. From the main navigation pane of System i Navigator, click Schemas → BANK_SCHEMA, right-click Column Masks, and select New → Column Mask, as shown in Figure 4-33.
Figure 4-33 Creating a column mask
<!-- image -->
2. In the New Column Mask window, which is shown in Figure 4-34, enter the following information:
@ -1381,6 +1481,7 @@ Figure 4-33 Creating a column mask
Select the Enabled option. Click OK .
Figure 4-34 Defining a column mask on the CUSTOMERS table
<!-- image -->
3. Repeat steps 1 on page 58 and 2 to create column masks for the following columns:
@ -1399,6 +1500,7 @@ Figure 4-34 Defining a column mask on the CUSTOMERS table
4. To verify that the column masks are enabled, from System i Navigator, click Column Masks , as shown in Figure 4-35. The seven column masks are created and enabled.
Figure 4-35 List of column masks on BANK_SCHEMA
<!-- image -->
## 4.3.7 Restricting the inserting and updating of masked data
@ -1409,10 +1511,12 @@ This step defines the check constraints that support the column masks to make su
1. Create a check constraint on the column CUSTOMER_EMAIL in the CUSTOMERS table. From the navigation pane of System i Navigator, right-click the CUSTOMERS table and select Definition , as shown in Figure 4-36.
Figure 4-36 Definition of the CUSTOMERS table
<!-- image -->
2. From the CUSTOMERS definition window, click the Check Constraints tab and click Add , as shown in Figure 4-37.
Figure 4-37 Adding a check constraint
<!-- image -->
3. The New Check Constraint window opens, as shown in Figure 4-38. Complete the following steps:
@ -1423,14 +1527,17 @@ b. Enter the check constraint condition. In this example, specify CUSTOMER_EMAIL
c. Select the On update violation, preserve column value option and click OK .
Figure 4-38 Specifying a new check constraint on the CUSTOMERS table
<!-- image -->
4. Figure 4-39 shows that there is now a check constraint on the CUSTOMERS table that prevents any masked data from being updated to the CUSTOMER_EMAIL column.
Figure 4-39 Check constraint on the CUSTOMERS table
<!-- image -->
5. Create all the other check constraints that are associated to each of the masks on the CUSTOMERS table. After this is done, these constraints should look like the ones that are shown in Figure 4-40.
Figure 4-40 List of check constraints on the CUSTOMERS table
<!-- image -->
## 4.3.8 Activating row and column access control
@ -1439,14 +1546,17 @@ You are now ready to activate RCAC on all three tables in this example. Complete
1. Start by enabling RCAC on the CUSTOMERS table. From System i Navigator, right-click the CUSTOMERS table and select Definition . As shown in Figure 4-41, make sure that you select Row access control and Column access control . Click OK .
Figure 4-41 Enabling RCAC on the CUSTOMERS table
<!-- image -->
2. Enable RCAC on the ACCOUNTS table. Right-click the ACCOUNTS table and select Definition . As shown in Figure 4-42, make sure that you select Row access control and Column access control . Click OK .
Figure 4-42 Enabling RCAC on ACCOUNTS
<!-- image -->
3. Enable RCAC on the TRANSACTIONS table. Right-click the TRANSACTIONS table and select Definition . As shown in Figure 4-43, make sure that you select Row access control . Click OK .
Figure 4-43 Enabling RCAC on TRANSACTIONS
<!-- image -->
## 4.3.9 Reviewing row permissions
@ -1455,14 +1565,17 @@ This section displays all the row permissions after enabling RCAC. Complete the
1. From System i Navigator, click Row Permissions , as shown in Figure 4-44. Three additional row permissions are added (QIBM_DEFAULT*), one for each row permission.
Figure 4-44 Row permissions after enabling RCAC
<!-- image -->
2. Look at one of the row permission definitions by right-clicking it and selecting Definition , as shown in Figure 4-45.
Figure 4-45 Selecting row permission definition
<!-- image -->
3. A window opens, as shown in Figure 4-46. Note the nonsensical search condition (0=1) of the QIBM_DEFAULT row permission. This permission is ORed with all of the others, so if a user meets none of the criteria in the other row permissions, this condition is tested, and because it is always false, access is denied.
Figure 4-46 Search condition of the QIBM_DEFAULT row permission
<!-- image -->
## 4.3.10 Demonstrating data access with RCAC
@ -1507,6 +1620,7 @@ To test a SECURITY user, complete the following steps:
1. Confirm that the user is the user of the session by running the first SQL statement, as shown in Figure 4-50. In this example, SECURITY is the security officer.
Figure 4-50 SECURITY session user
<!-- image -->
2. The number of rows in the CUSTOMERS table that the security officer can see is shown in Figure 4-51. The security officer cannot see any data at all.
@ -1515,6 +1629,7 @@ Figure 4-51 Number of rows that the security officer can see in the CUSTOMERS ta
3. The result of the third SQL statement is shown in Figure 4-52. Note the empty set that is returned to the security officer.
Figure 4-52 SQL statement that is run by the SECURITY user - no results
<!-- image -->
## Data access for TELLER user with RCAC
@ -1527,6 +1642,7 @@ Figure 4-53 TELLER session user
2. The number of rows in the CUSTOMERS table that the TELLER user can see is shown in Figure 4-54. The TELLER user can see all the rows.
Figure 4-54 Number of rows that the TELLER user can see in the CUSTOMERS table
<!-- image -->
3. The result of the third SQL statement is shown in Figure 4-55. Note the masked columns. The TELLER user, TQSPENSER, can see all the rows, but there are some columns where the result is masked.
@ -1539,6 +1655,7 @@ To test an ADMIN (VGLUCCHESS) user, complete the following steps:
1. Confirm that the ADMIN user is the user of the session by running the first SQL statement, as shown in Figure 4-56. In this example, VGLUCCHESS is an ADMIN user.
Figure 4-56 ADMIN session user
<!-- image -->
2. The number of rows that the ADMIN user can see is shown in Figure 4-57. The ADMIN user can see all the rows.
@ -1555,18 +1672,22 @@ To test a CUSTOMERS (WEBUSER) user that accesses the database by using the web a
1. Confirm that the user is the user of the session by running the first SQL statement, as shown in Figure 4-59. In this example, WEBUSER is a CUSTOMER user.
Figure 4-59 WEBUSER session user
<!-- image -->
2. A global variable (CUSTOMER_LOGIN_ID) is set by the web application and then is used to check the row permissions. Figure 4-60 shows setting the global variable by using the customer login ID.
Figure 4-60 Setting the global variable CUSTOMER_LOGIN_ID
<!-- image -->
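Setting the variable is a single SQL SET statement; the login ID value here is a made-up placeholder:

```sql
SET BANK_SCHEMA.CUSTOMER_LOGIN_ID = 'WEB_LOGIN_ID';  -- hypothetical web login ID
```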
3. Verify that the global variable was set with the correct value by clicking the Global Variable tab, as shown in Figure 4-61.
Figure 4-61 Viewing the global variable value
<!-- image -->
4. The number of rows that the WEBUSER can see is shown in Figure 4-62. This user can see only the one row that belongs to his web-based user ID.
Figure 4-62 Number of rows that the WEBUSER can see in the CUSTOMERS table
<!-- image -->
5. The result of the third SQL statement is shown in Figure 4-63. There are no masked columns, and the user can see only one row, which is the user's own row.
@ -1597,18 +1718,25 @@ This section looks at some other interesting information that is related to RCAC
1. Figure 4-67 shows the SQL statement in Visual Explain, run with no RCAC. The implementation of the SQL statement is a two-way join, which is exactly what the SQL statement specifies.
Figure 4-67 Visual Explain with no RCAC enabled
<!-- image -->
2. Figure 4-68 shows the Visual Explain of the same SQL statement, but with RCAC enabled. It is clear that the implementation of the SQL statement is more complex because the row permission rule becomes part of the WHERE clause.
Figure 4-68 Visual Explain with RCAC enabled
<!-- image -->
3. Compare the advised indexes that are provided by the Optimizer without RCAC and with RCAC enabled. Figure 4-69 shows the index advice for the SQL statement without RCAC enabled. The index being advised is for the ORDER BY clause.
Figure 4-69 Index advice with no RCAC
<!-- image -->
4. Now, look at the advised indexes with RCAC enabled. As shown in Figure 4-70, there is an additional index being advised, which is for the row permission rule. For more information, see 6.4.2, "Index advisor" on page 99.
Figure 4-70 Index advice with RCAC enabled
<!-- image -->
<!-- image -->
Chapter 5.
@ -1695,6 +1823,7 @@ For example, consider a table containing three columns of first name, last name,
In this example, the application reads the data for an update to correct the misspelling of the last name. The last name value is changed to Smith in the buffer. Now, a WRITE request is issued by the program, which uses the contents of the record buffer to update the row in the underlying DB2 table. Unfortunately, the record buffer still contains a masked value for the tax ID, so the tax ID value in the table is accidentally set to the masked value.
Figure 5-1 Accidental update with masked values scenario
<!-- image -->
Obviously, careful planning and testing should be exercised to avoid accidental updates with masked values.
@ -1724,6 +1853,9 @@ If the target table has RCAC controls defined and activated, then the CPYF comma
The CPYLIB command is enhanced with the same Access Control ( ACCCTL ) parameter as the CRTDUPOBJ command in the IBM i 7.2 release (see 5.4.1, "Create Duplicate Object (CRTDUPOBJ) command" on page 82). Row permissions and column masks are copied to the new object in the new library by default because the default value for the ACCCTL parameter is *ALL .
<!-- image -->
Chapter 6.
## Additional considerations
@ -1784,6 +1916,7 @@ FROM GROUP BY ORDER BY
## With RCAC Masking
| CREDIT CARD NUMBER | TOTAL |
|--------------------------|---------------|
| 3785 0000 0000 1234 | 233.50 |
@ -1798,7 +1931,6 @@ FROM GROUP BY ORDER BY
| 6011 9999 9999 0001 | 10.00 |
Figure 6-1 Timing of column masking
| CREDIT CARD NUMBER | TOTAL |
|---------------------------|---------------|
| **** **** **** 1234 | 233.50 |
@ -1823,12 +1955,14 @@ Conversely, field procedure masking causes the column values to be changed (that
Note: Column masks can influence an SQL INSERT or UPDATE . For example, you cannot insert or update a table with column access control activated with masked data generated from an expression within the same statement that is based on a column with a column mask.
Figure 6-2 Masking differences between Fieldproc and RCAC
<!-- image -->
## 6.2 RCAC effects on data movement
As described earlier and shown in Figure 6-3, RCAC is applied pervasively regardless of the data access programming interface, SQL statement, or IBM i command. The effects of RCAC on data movement scenarios can be profound and possibly problematic. It is important to understand these effects and make the appropriate adjustments to avoid incorrect results or data loss.
Figure 6-3 RCAC and data movement
<!-- image -->
The "user" that is running the data movement application or process, whether it be a high availability (HA) scenario, an extract, transform, load (ETL) scenario, or just copying data from one file or table to another one, must have permission to all the source rows without masking, and not be restricted from putting rows into the target. Allowing the data movement application or process to bypass the RCAC rules must be based on a clear and concise understanding of the organization's object security and data access policy. Proper design, implementation, and testing are critical success factors when applying RCAC.
@ -1853,6 +1987,7 @@ INSERT INTO TARGET (SELECT * FROM SOURCE);
For example, given a "source" table with a row permission defined as NAME <> 'CAIN' and a column mask that is defined to project the value 999.99 for AMOUNT, the SELECT statement produces a result set that has the RCAC rules applied. This reduced and modified result set is inserted into the "target" table even though the query is defined as returning all rows and all columns. Instead of seven rows that are selected from the source, only three rows are returned and placed into the target, as shown in Figure 6-4.
Figure 6-4 RCAC effects on data movement from SOURCE
<!-- image -->
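Restated as code, the asymmetry is easy to see; SOURCE and TARGET are the illustrative table names used throughout this section:

```sql
-- RCAC on SOURCE filters and masks before any row reaches TARGET:
-- only the three permitted rows arrive, with AMOUNT already masked
INSERT INTO TARGET
    SELECT * FROM SOURCE;
```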
## 6.2.2 Effects when RCAC is defined on the target table
@ -1865,6 +2000,7 @@ INSERT INTO TARGET (SELECT * FROM SOURCE);
Given a "target" table with a row permission defined as NAME <> 'CAIN' and a column mask that is defined to project the value 999.99 for AMOUNT, the SELECT statement produces a result set that represents all the rows and columns. The seven row result set is inserted into the "target", and the RCAC row permission causes an error to be returned, as shown in Figure 6-5. The source rows where NAME = 'CAIN' do not satisfy the target table's permission, and therefore cannot be inserted. In other words, you are inserting data that you cannot read.
Figure 6-5 RCAC effects on data movement on TARGET
<!-- image -->
## 6.2.3 Effects when RCAC is defined on both source and target tables
@ -1879,6 +2015,7 @@ Given a "source" table and a "target" table with a row permission defined as NAM
Although the source rows where NAME <> 'CAIN' do satisfy the target table's permission, the AMOUNT column value of 999.99 represents masked data and therefore cannot be inserted. An error is returned indicating the failure, as shown in Figure 6-6. In this scenario, DB2 is protecting against an overt attempt to insert masked data.
Figure 6-6 RCAC effects on data movement on SOURCE and TARGET
<!-- image -->
## 6.3 RCAC effects on joins
@ -1889,40 +2026,47 @@ Note: Thinking of the row permission as defining a virtual set of rows that can
As shown in Figure 6-7, there are two different sets, set A and set B. However, set B has a row permission that subsets the rows that a user can see.
Figure 6-7 Set A and set B with row permissions
<!-- image -->
## 6.3.1 Inner joins
Inner join defines the intersection of two data sets. For a row to be returned from the inner join query, it must appear in both sets, as shown in Figure 6-8.
Figure 6-8 Inner join without RCAC permission
<!-- image -->
Given that row permission serves to logically eliminate rows from one or more sets, the result set from an inner join (and a subquery) can be different when RCAC is applied. RCAC can reduce the number of rows that are permitted to be accessed by the join, as shown in Figure 6-9.
Effect of column masks on inner joins: Because column masks are applied after the query final results are determined, the masked value has no effect on the join processing and corresponding query result set.
Figure 6-9 Inner join with RCAC permission
<!-- image -->
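A minimal sketch of the effect, using hypothetical tables where SET_B carries the row permission:

```sql
-- Rows of SET_B that its row permission hides can never match,
-- so this inner join returns fewer rows when RCAC is active
SELECT A.ID, B.VAL
FROM SET_A A
INNER JOIN SET_B B
    ON A.ID = B.ID;
```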
## 6.3.2 Outer joins
Outer joins preserve one or both sides of two data sets. A row can be returned from the outer join query if it appears in the primary set (LEFT, RIGHT, or both in the case of FULL), as shown in Figure 6-10. Column values from the secondary set are returned if the row has a match in the primary set. Otherwise, NULL is returned for the column value by default.
Figure 6-10 Outer join without RCAC permission
<!-- image -->
Given that row permission serves to logically eliminate rows from one or more sets, more column values that are returned from the secondary table in an outer join can be NULL when RCAC is applied, as shown in Figure 6-11.
Effect of column masks on outer joins: Because column masks are applied after the query final results are determined, the masked value has no effect on the join processing and corresponding query result set.
Figure 6-11 Outer join with RCAC permission
<!-- image -->
## 6.3.3 Exception joins
Exception joins preserve one side of two data sets. A row can be returned from the exception join query if it appears in the primary set (LEFT or RIGHT) and the row does not appear in the secondary set, as shown in Figure 6-12. Column values from the secondary set are returned as NULL by default.
Figure 6-12 Exception join without RCAC permission
<!-- image -->
Given that row permission serves to logically eliminate rows from one or more sets, more rows can appear to be exceptions when RCAC is applied, as shown in Figure 6-13. Also, because column masks are applied after the query final results are determined, the masked value has no effect on the join processing and corresponding query result set.
Figure 6-13 Exception join with RCAC permission
<!-- image -->
## 6.4 Monitoring, analyzing, and debugging with RCAC
@ -1953,18 +2097,22 @@ When monitoring and collecting metrics on database requests, DB2 for i provides
Figure 6-14 shows how Visual Explain externalizes RCAC.
Figure 6-14 Visual Explain indicating that RCAC is applied
<!-- image -->
Figure 6-15 shows the main dashboard of an SQL Performance Monitor. Click Summary .
Figure 6-15 SQL Performance Monitor
<!-- image -->
Figure 6-16 shows the summary of an SQL Performance Monitor with an indication that RCAC is applied.
Figure 6-16 SQL Performance Monitor indicating that RCAC is applied
<!-- image -->
Figure 6-17 shows the statements of an SQL Performance Monitor and how RCAC is externalized.
Figure 6-17 SQL Performance Monitor showing statements and RCAC
<!-- image -->
When implementing RCAC as part of a comprehensive and pervasive data access control initiative, consider that the database monitoring and analysis tools can collect literal values that are passed as part of SQL statements. These literal values can be viewed as part of the information collected. If any of the literals are based on or are used with masked columns, it is important to review the database engineer's policy for viewing these data elements. For example, suppose that the column CUSTOMER_TAX_ID is masked for the database engineer and the CUSTOMER_TAX_ID column is used in a predicate as follows:
@ -1983,10 +2131,12 @@ The index advisor is not specifically enhanced for RCAC, but because the rule te
For example, the query that is shown in Figure 6-18 produces index advice for the user's predicate and the RCAC predicate.
Figure 6-18 Index advice and RCAC
<!-- image -->
In Figure 6-19, the index advisor shows an index for the ACCOUNTS and CUSTOMERS tables that is based on the RCAC rule text.
Figure 6-19 Index advisor based on the RCAC rule
<!-- image -->
For more information about creating and using indexes, see IBM DB2 for i indexing methods and strategies , found at:
@ -2079,10 +2229,12 @@ This section covers the implications to views, materialized query tables (MQTs),
Any access to an SQL view that is over one or more tables that have RCAC also has those row permissions and column masking rules applied. If an SQL view has predicates, those are logically ANDed with any search condition that is specified in the permissions that are defined on the underlying tables. The view does not have to project the columns that are referenced by the permissions. Figure 6-21 shows an example of a view definition and user query.
Figure 6-21 View definition and user query
<!-- image -->
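As a sketch of the pattern (the view name, columns, and predicate are illustrative, not the ones in Figure 6-21):

```sql
-- The view's own predicate is logically ANDed with the
-- permissions defined on the underlying ACCOUNTS table
CREATE VIEW BANK_SCHEMA.OPEN_ACCOUNTS AS
    SELECT ACCOUNT_ID, CUSTOMER_ID, ACCOUNT_TYPE
    FROM BANK_SCHEMA.ACCOUNTS
    WHERE STATUS = 'OPEN';   -- assumed column and value
```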
What the query optimizer plans for and what the database engine runs is shown in the Figure 6-22.
Figure 6-22 Query rewrite with RCAC
<!-- image -->
## 6.5.2 Materialized query tables
@ -2159,10 +2311,12 @@ A simple example to illustrate this concept is a random read using a keyed logic
For programs that access records sequentially, in or out of key order, the added RCAC logic can have a profound effect on the performance and scalability. Reading the "next record" in order is no longer a simple matter of positioning to the next available key, as shown in Figure 6-23.
Figure 6-23 Native record access with no RCAC
<!-- image -->
Before the record, as identified by the key, is considered available, the RCAC logic must be run. If the record is rejected by RCAC, the next record in sequence that is permissible must be identified. This spinning through the records can take a long time and use many resources, as shown in Figure 6-24.
Figure 6-24 Native record level access with RCAC
<!-- image -->
After the row permissions and column masks are designed and implemented, adequate performance and scalability testing are recommended.
@ -2265,6 +2419,7 @@ When designing and implementing RCAC row permissions, special attention should b
Figure 6-25 illustrates that object level security is the first check and that RCAC permissions provide control only on tables and physical files.
Figure 6-25 Object-level security and RCAC permissions
<!-- image -->
To get access to the table and the rows, the user must pass the object level authority test and the RCAC permission test.
@ -2274,6 +2429,9 @@ Although the SQL Plan Cache data, the SQL Plan Cache Snapshot data, and the SQL
The ability to monitor, analyze, debug, and tune data-centric applications effectively and efficiently requires some understanding of the underlying data, or at least the attributes of the data. The organization must be willing to reconcile the conflicting requirements of "restricting access to data" and "needing access to data".
<!-- image -->
Chapter 7.
@ -2333,6 +2491,7 @@ Save and restore processing works fine with RCAC if the RCAC definition does not
For example, assume that the BANKSCHEMA library (which is the system name or short name for the schema long name of BANK_SCHEMA) is saved and restored into a library named BANK_TEST. Recall from the example in 7.1.4, "Regenerating" on page 114 that the row permission on the ACCOUNTS table references the CUSTOMERS table (… SELECT C.CUSTOMER_ID FROM CUSTOMERS C …). After the restore operation, the ACCOUNTS row permission still references the CUSTOMERS table in BANK_SCHEMA because DB2 explicitly qualifies all object references when the row permission or column mask is created. The restore processing does not change the explicit qualification from BANK_SCHEMA to BANK_TEST. As a result, the restored ACCOUNTS row permission now depends on DB2 objects residing in a different schema, even though it was not created that way originally. For more details, see Figure 7-1.
Figure 7-1 Restoring tables to different schemas
<!-- image -->
The only way to fix this issue is to re-create the row permission or column mask after the restore operation. Re-creation of the row permission or column mask is required only for definitions that reference other DB2 objects, but it is simpler to re-create all of the RCAC definitions instead of a subset. For example, generate the SQL by using System i Navigator, clear the "Schema qualify names for objects" option, select the "OR REPLACE clause" option, and then run the generated script.
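A sketch of the regenerated DDL after such a restore; the permission name and predicate are illustrative, and the key points are the OR REPLACE clause and the unqualified CUSTOMERS reference, which now resolves within BANK_TEST:

```sql
CREATE OR REPLACE PERMISSION ACCOUNTS_ROW_ACCESS
ON ACCOUNTS AS A
FOR ROWS WHERE A.CUSTOMER_ID IN (SELECT C.CUSTOMER_ID
                                 FROM CUSTOMERS C)  -- unqualified on purpose
ENFORCED FOR ALL ACCESS
ENABLE;
```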
@ -2360,6 +2519,9 @@ - IBM i Version 7.2 Security Reference Guide, found at:
http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzarl/rzarlkickoff.htm?lang=en
<!-- image -->
Chapter 8.
## Designing and planning for success
@ -2406,8 +2568,14 @@ To further assist you with understanding and implementing RCAC, the DB2 for i Ce
If you are interested in engaging with the DB2 for i Center of Excellence, contact Mike Cain at mcain@us.ibm.com .
<!-- image -->
Appendix A.
<!-- image -->
## Database definitions for the RCAC banking example
This appendix provides the database definitions or DDLs to re-create the Row and Column Access Control (RCAC) scenario that is described in Chapter 4, "Implementing Row and Column Access Control: Banking example" on page 37. The script that is shown in Example A-1 is the DDL script that is used to implement this example.
@ -2486,6 +2654,12 @@ This paper is intended for database engineers, data-centric application develope
REDP-5110-00
<!-- image -->
<!-- image -->
INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
## BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE
@ -1,7 +1,16 @@
Front cover
<!-- image -->
## IBM Cloud Pak for Data on IBM Z
<!-- image -->
<!-- image -->
## Executive overview
Most industries are susceptible to fraud, which poses a risk to both businesses and consumers. According to The National Health Care Anti-Fraud Association, health care fraud alone costs the nation around $68 billion annually.$^{1}$ This statistic does not include the numerous other industries where fraudulent activities occur daily. In addition, the growing amount of data that enterprises own makes it difficult for them to detect fraud. Businesses can benefit by using an analytical platform to fully integrate their data with artificial intelligence (AI) technology.
@ -37,6 +46,7 @@ To learn more about these features, see the IBM z16 product page.
Figure 1 on page 3 shows a picture of the IBM z16 mainframe.
Figure 1 IBM z16
<!-- image -->
## IBM z16 and IBM LinuxONE Emperor 4 features
@ -45,12 +55,14 @@ IBM Z are based on enterprise mainframe technology. Starting with transaction-ba
Figure 2 provides a snapshot of the IBM Z processor roadmap, which depicts the journey of transformation and improvement.
Figure 2 IBM Z: Processor roadmap
<!-- image -->
The IBM z16 and IBM LinuxONE Emperor 4 are the latest of the IBM Z, and they are developed with a 'built to build' focus to provide a powerful, cyberresilient, open, and secure platform for business with an extra focus on sustainability to help build sustainable data centers. Although the z16 server can host both IBM z/OS® and Linux workloads, LinuxONE Emperor 4 is built to host Linux-only workloads with a focus on consolidation and resiliency. Depending on the workload, consolidation from numerous x86 servers into a LinuxONE Emperor 4 can help reduce energy consumption by 75% and data center floor space by 50%, which helps to achieve the sustainability goals of the organization.
Figure 3 on page 5 shows a summary of the system design of IBM LinuxONE Emperor 4 with the IBM Telum™ processor. The IBM Telum processor chip is designed to run enterprise applications efficiently where their data resides to embed AI with super low latency. Higher bandwidth and I/O rates are supported through FCP Express cards with an endpoint security solution. The memory subsystem supports up to 40 TB of memory.
Figure 3 System design of IBM z16 LinuxONE Emperor 4
<!-- image -->
The IBM z16 and IBM LinuxONE Emperor 4 servers are built with 7-nm technology at a 5.2 GHz speed. They consist of four dual-chip modules (DCMs) per central processor complex (CPC) drawer, each of which is built with two 8-core Telum processor chips that have "first in the industry" on-chip acceleration for mid-transaction, real-time AI inferencing, which supports many different use cases, including fraud detection.
@ -59,12 +71,14 @@ Each core has access to a huge private 32 MB L2 cache where up to 16 MB of the L
Figure 4 provides more information about the features of AI Accelerator integration with the IBM Z processor cores.
Figure 4 IBM z16 on-chip AI Accelerator integration with IBM Z processor cores
<!-- image -->
The IBM z16 and IBM LinuxONE Emperor 4 server platforms are built with the hardware features that are shown in Figure 4 with addressing data and AI workloads in mind. Regardless of where the ML and deep learning (DL) frameworks are used to build and train data and AI models, the inferencing on existing enterprise application data can happen alongside currently running enterprise business applications. CP4D 4.6 supports TensorFlow and IBM Snap ML frameworks, which are optimized to use the on-chip AI Accelerator during inferencing. Support for various other frameworks is planned for future releases.
Figure 5 on page 7 shows the seamless integration of AI into existing enterprises workloads on the IBM z16 while leveraging the underlying hardware capabilities.
Figure 5 Seamless integration
<!-- image -->
## What is Cloud Pak for Data on IBM Z
@ -75,6 +89,7 @@ CP4D on IBM Z provides enterprises with a resilient and secure private cloud pla
Figure 6 shows a solution overview of CP4D. The infrastructure alternatives are shown at the bottom, and they include IBM Z and LinuxONE. They all leverage Red Hat OpenShift. Common Foundational Services come next, which offer clarity throughout the data and AI lifecycle, that is, from user access management to monitoring and service provisioning. A high-level view of the services is shown in the middle section. The services have several different capabilities that span the AI hierarchy. The platform can be expanded, and it offers a seamless user experience for all distinct personas across the AI lifecycle, from data gathering through AI infusion.
Figure 6 Solution overview of Cloud Pak for Data
<!-- image -->
We highlight the four main pillars that make IBM Z the correct infrastructure for CP4D:
@ -135,6 +150,7 @@ Traditional ML models power most of today's ML applications in business and amo
Figure 7 on page 11 provides an overview of the components that are supported on CP4D on IBM Z. You can leverage Watson Studio for model building, training, and validation, and WML for deployment of the model. Eventually, applications can use the AI inference endpoint to score the model.
Figure 7 Developing, training, and deploying an AI model on Cloud Pak for Data on IBM Z and IBM LinuxONE
<!-- image -->
In summary, here are some of the reasons why you should choose AI on IBM Z:
@ -227,6 +243,7 @@ The key point here is that risk exists throughout the entire AI lifecycle starti
For example, a business can start testing a model before production for fairness metrics. For this task, enterprises need an end-to-end workflow with approvals to mitigate these risks and increase the scale of AI investments, as shown in Figure 8, which presents a typical AI model lifecycle in an enterprise.
Figure 8 Typical AI model lifecycle
<!-- image -->
Due to regulations, more stakeholders adopt the typical AI model lifecycle to protect their brand from new end-to-end risks. To ensure various aspects of both regulatory compliance and security, the personas that must be involved include the chief financial officer (CFO), chief marketing officer (CMO), chief data officer (CDO), HR, and chief regulatory officer (CRO), along with the data engineers, data scientists, and business analysts, who build AI workflows.
@ -279,44 +296,54 @@ - Enterprises can develop AI models by creating and training model
Figure 9 on page 16 shows the end-to-end flow for a remote AI governance solution.
Figure 9 Remote AI governance solution end-to-end flow
<!-- image -->
To achieve end-to-end AI governance, complete the following steps:
1. Create a model entry in IBM OpenPages by using CP4D on an x86 platform, as shown in Figure 10.
Figure 10 Creating a model entry in IBM OpenPages
<!-- image -->
2. Train a model by using Watson Studio and by using development tools such as Jupyter Notebook or JupyterLab on CP4D on Red Hat OpenShift on a virtual machine on IBM Z, as shown in Figure 11.
Figure 11 Training an AI model by using Watson Studio
<!-- image -->
3. Deploy the model by using WML on CP4D on Red Hat OpenShift on a virtual machine on IBM Z, as shown in Figure 12.
Figure 12 Deploying an AI model by using WML on Cloud Pak for Data
<!-- image -->
4. Track the external model lifecycle by browsing through the Catalogs/Platform assets catalog by using AI Factsheets and OpenPages while using CP4D on an x86 platform, as shown in Figure 13. The external model (deployed on CP4D on Red Hat OpenShift on a virtual machine on IBM Z) is saved as a platform asset catalog on the x86 platform.
Figure 13 External model
<!-- image -->
You can track the model through each stage of the model lifecycle, as shown in Figure 14, by using AI Factsheets and OpenPages.
Figure 14 Tracking the model
<!-- image -->
You can see that the model facts are tracked and synchronized to IBM OpenPages for risk management, as shown in Figure 15.
Figure 15 Model facts that are tracked and synchronized to IBM OpenPages on an x86 platform
<!-- image -->
5. Create an external model by using IBM OpenScale on the x86 platform, as shown in Figure 16.
Figure 16 Creating an external model on an x86 platform
<!-- image -->
IBM OpenScale provides a comprehensive dashboard that tracks fairness, quality monitoring, drift, and explainability of a model. Fairness determines whether your model produces biased outcomes. Quality determines how well your model predicts outcomes. Drift is the degradation of predictive performance over time. A sample is shown in Figure 17 on page 21.
Figure 17 IBM OpenScale dashboard that is used to monitor the external model
<!-- image -->
You developed and deployed the AI model by using Watson Studio, WML on CP4D on Red Hat OpenShift on a virtual machine on IBM Z, and end-to-end AI model governance by leveraging AI Factsheets, OpenScale, and OpenPages on CP4D on an x86 platform. Figure 18 shows end-to-end AI governance when using IBM OpenPages, AI Factsheets, and OpenScale.
Figure 18 Final result: End-to-end AI governance when using IBM OpenPages, AI Factsheets, and OpenScale
<!-- image -->
## Use case 2: Credit default risk assessment
@ -335,6 +362,7 @@ Financial institutions can leverage AI solutions by using ML techniques to predi
Figure 19 on page 23 shows a sample architecture for designing and developing an AI model for credit risk assessment on IBM Z. An IBM WebSphere® Application Server is used for handling in-bound transactions, and CP4D is used for AI model lifecycle management that includes building, training, and deploying the model.
Figure 19 Architecture for credit risk prediction by using an ML AI model on IBM Z
<!-- image -->
A data scientist can leverage Watson Studio to develop and train an AI model and WML to deploy and score the model. In this sample architecture, the WML Python runtime leverages the ML framework IBM Snap Machine Learning (Snap ML) for scoring, which can leverage an integrated AI accelerator at the time of model import.
@ -349,6 +377,7 @@ We showed how IBM Z enable customers to use AI frameworks to detect credit risk.
Figure 20 shows an architecture for predicting credit risk by using DL on IBM Z.
Figure 20 Architecture for credit risk prediction by using DL on IBM Z
<!-- image -->
Data scientists can start creating and training a DL AI model by using a Jupyter Notebook instance and Watson Studio. Then, they can deploy the model by using WML on CP4D running on IBM Z, which provides an endpoint. Other applications, including the IBM WebSphere server, can produce credit risk results by using the model's endpoint.
@ -385,6 +414,7 @@ One possible solution is to build and train a TensorFlow based DL model that lea
Figure 21 provides a high-level diagram of a clearing and settlement use case for financial transactions that uses CP4D on IBM Z and IBM LinuxONE.
Figure 21 Clearing and settlement use case for financial transactions by using Cloud Pak for Data
<!-- image -->
Here are the steps of the high-level process flow:
@ -441,6 +471,7 @@ Remaining Useful Life (RUL) is the remaining time or cycles that an aircraft eng
Figure 22 provides an overview of the inferencing architecture for the RUL of an aircraft engine when using IBM Z.
Figure 22 Inferencing architecture on IBM Z
<!-- image -->
Because we are looking into data-driven model development, our target data set is the run-to-failure data of the engine. We are looking into a supervised learning problem, and we use regression techniques to learn from the data. DL techniques such as Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRU) are our choice because we are looking into a time series data set. TensorFlow or PyTorch frameworks are leveraged to create models. AI governance monitors the data and model drift to maintain the model quality throughout the model's life.
@ -461,6 +492,7 @@ Client-side applications can invoke a REST API server that handles some preproces
Figure 23 on page 29 provides a more in-depth view of the architecture of an AI-based predictive maintenance application.
Figure 23 In-depth architectural view
<!-- image -->
In summary, consider the following points while developing an AI-based predictive maintenance application:
@ -501,6 +533,7 @@ AI is the current "market trend evolution" in video analytics and advancing the
Figure 24 Architecture for AI-powered video analytics
<!-- image -->
Live camera feeds or recorded videos of an infant's movement are the inputs for a pose detection model. This video streaming data was stored in IBM Cloud® Object Storage for image processing. Video data must be transformed into frames so that the infant's body poses can be detected. These pose-estimation components of the pipeline predict the location of all 17 person key points with 3 degrees of freedom each (x, y location and visibility) plus two virtual alignment key points. This approach also embraces a compute-intensive heat map prediction of infant body posture.
@ -620,6 +653,7 @@ IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of Intern
The following terms are trademarks or registered trademarks of International Business Machines Corporation, and might also be trademarks or registered trademarks in other countries.
| Db2® | IBM Watson® | Redbooks (logo)® |
|------|-------------|------------------|
| IBM® | IBM z16™ | Turbonomic® |
@ -640,10 +674,16 @@ UNIX is a registered trademark of The Open Group in the United States and other
Other company, product, or service names may be trademarks or service marks of others.
<!-- image -->
Back cover
REDP-5695-00
ISBN 0738461067
Printed in U.S.A.
<!-- image -->