SIP - icassp-2019

Clonability of anti-counterfeiting printable graphical codes: a machine learning approach

O. Taran, S. Bonev and S. Voloshynovskiy

Citation

O. Taran, S. Bonev, and S. Voloshynovskiy, "Clonability of anti-counterfeiting printable graphical codes: a machine learning approach," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019.

Code: PyTorch
If you have questions about our PyTorch code, please contact us.

The research was supported by the SNF project No. 200021_182063.

Abstract

In recent years, printable graphical codes have attracted a lot of attention because they link the physical and digital worlds, which is of great interest for IoT and brand protection applications. The security of printable codes in terms of their reproducibility by unauthorized parties, i.e., their clonability, is largely unexplored. In this paper, we investigate the clonability of printable graphical codes from a machine learning perspective. The proposed framework is based on a simple system composed of fully connected neural network layers. The results obtained on real codes printed by several printers demonstrate that, in certain cases, digital codes can be accurately estimated from their printed counterparts. This provides new insight into scenarios where printable graphical codes can be accurately cloned.

Fig. 1: Training procedure based on training samples.

Attacks against PGC

The main goal of this work is to investigate the resistance of printable graphical codes (PGC) to clonability attacks. The overwhelming majority of such attacks can be split into two main groups: (a) handcrafted attacks, which are based on the experience and know-how of the attackers, and (b) machine learning based attacks, which use training data to create clones of the original codes.

In our work, we focus on machine learning based attacks because of recent advances in the theory and practice of machine learning tools. The growing popularity and remarkable results of deep neural network (DNN) architectures in computer vision applications motivated us to investigate the clonability of PGC using these architectures trained for different classes of printers.

The main contributions are:

we investigate the clonability of printable graphical codes using machine learning based attacks;
we examine the proposed framework on real printed codes reproduced with 4 printers;
we empirically demonstrate that, in certain cases, PGC can be cloned with sufficient accuracy from their printed counterparts.

Digital printers. To evaluate the clonability of PGC based on DataMatrix modulation and to investigate the influence of the printing technology, we use 4 digital printers: 2 inkjet printers, HP OfficeJet Pro 8210 (HP) and Canon PIXMA iP7200 (CA), and 2 laser printers, Lexmark CS310 (LX) and Samsung Xpress 430 (SA).

Table 1: Regeneration accuracy with respect to the original codes.

DNN architectures. In our experiments, we use two types of DNN architectures with the same input size of 576:

FC: a fully connected DNN with 2, 3 and 4 hidden layers (hereafter referred to as FC 2, FC 3 and FC 4). The size of each hidden layer equals the input size.
BN: a "bottleneck" model with 2 fully connected hidden layers of sizes 256 and 128 in both the encoder and decoder parts and a latent representation of size 36.
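Both architecture families can be sketched as plain multilayer perceptrons. The block below is a minimal NumPy forward pass for illustration, not the authors' PyTorch code. The output size (576, equal to the input), the mirrored decoder widths of BN (128, then 256), the ReLU hidden activations, the sigmoid output and the He initialisation are all assumptions.

```python
import numpy as np

INPUT_SIZE = 576  # input size stated in the text


def fc_widths(num_hidden):
    """FC k: k hidden layers as wide as the input; output size assumed equal to input size."""
    return [INPUT_SIZE] * (num_hidden + 2)


# BN: encoder 576 -> 256 -> 128 -> latent 36; decoder assumed to mirror the encoder.
BN_WIDTHS = [INPUT_SIZE, 256, 128, 36, 128, 256, INPUT_SIZE]


def init_mlp(widths, rng):
    """He-initialised weight matrices and zero biases for each pair of consecutive layers."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]


def forward(params, x):
    """ReLU on hidden layers, sigmoid on the output layer (both activations are assumptions)."""
    h = x
    for i, (w, b) in enumerate(params):
        z = h @ w + b
        h = np.maximum(z, 0.0) if i < len(params) - 1 else 1.0 / (1.0 + np.exp(-z))
    return h


def count_params(widths):
    """Number of weights plus biases in a fully connected stack with the given widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))


rng = np.random.default_rng(0)
batch = rng.random((8, INPUT_SIZE))  # stand-in for 8 flattened printed-code blocks
bn = init_mlp(BN_WIDTHS, rng)
print(forward(bn, batch).shape)    # (8, 576)
print(count_params(BN_WIDTHS))     # 371044
print(count_params(fc_widths(2)))  # 997056
```

Under these assumptions, the bottleneck model has well under half the parameters of even the smallest FC network, since its 36-dimensional latent layer forces a compact representation of each block.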

It should be pointed out that the training procedure is blind in the sense that we do not use any information about the principles of DataMatrix code generation. To evaluate how accurately the "regenerated" codes are predicted, we use the Pearson correlation and the normalized Hamming distance between the original digital codes and the corresponding regenerated ones. The obtained results are presented in Table 1. In addition to the DNN models, we estimate the digital codes from the printed ones via simple thresholding (without DNN processing), similarly to [4, 5, 2]. These results correspond to the Thr method in Table 1 and serve as the baseline. The results show that the BN architecture performs best. To illustrate what the codes look like, Fig. 2 shows 84 × 84 sub-blocks of several codes for each printer together with their estimates produced by BN, the best-performing estimator.
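The two evaluation statistics and the Thr baseline are straightforward to express. The sketch below assumes that the digital code and the scan use the same polarity (e.g. 1 = white) and that scans are scaled to [0, 1]. The global threshold of 0.5, the 24 × 24 block shape (576 values, matching the input size) and the synthetic degradation are hypothetical choices for illustration, not the settings used in [4, 5, 2] or in the paper.

```python
import numpy as np


def threshold_estimate(printed, thr=0.5):
    """Thr-style baseline: binarise a grey-level scan in [0, 1] with a single global threshold.
    thr=0.5 is a hypothetical default, not a value taken from the paper."""
    return (np.asarray(printed) >= thr).astype(np.uint8)


def pearson(x, y):
    """Pearson correlation between two codes over all flattened pixels."""
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])


def hamming_normalized(x, y):
    """Fraction of positions where two binary codes differ (0 = identical, 1 = all differ)."""
    return float(np.mean(np.ravel(x) != np.ravel(y)))


# Synthetic stand-in: a random binary "digital code" and a noisy, contrast-reduced "scan".
# The degradation model is purely illustrative and does not model any real printer.
rng = np.random.default_rng(1)
original = rng.integers(0, 2, size=(24, 24))
scan = np.clip(0.15 + 0.7 * original + rng.normal(0.0, 0.2, size=original.shape), 0.0, 1.0)
estimate = threshold_estimate(scan)
print("Pearson:", pearson(original, estimate))
print("Hamming:", hamming_normalized(original, estimate))
```

A DNN estimator would replace threshold_estimate with a learned mapping from the scan to the code; the same two metrics then compare its output with the original.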

Fig. 2: Examples of attacks against PGC: two samples of scanned codes, the estimates produced by the BN model and the difference between the original and estimated codes.

To determine whether the defender can notice the errors in the BN-regenerated codes, and how the BN results differ from the baseline estimates obtained with the Thr method, we printed the estimated codes for both cases on the same printers with the same settings as the original codes and then scanned them with the same scanner.

In our evaluation, we use the Pearson correlation between the originals and the grey-level printed codes. Additionally, we use the normalized Hamming distance between the originals and the binarized printed codes to measure the accuracy of the logical symbol estimation. Using these statistics, we compute the ROC curves based on the probability of correct detection and the probability of false acceptance.
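As a sketch of how such ROC curves can be obtained, assume the standard definitions P_d(γ) = Pr{s ≥ γ | original} and P_fa(γ) = Pr{s ≥ γ | fake}, where s is a similarity score (the Pearson correlation, or the negated Hamming distance) and γ is a decision threshold. Reading "correct detection" as accepting a genuine code is our assumption, and the scores in the example are synthetic, not the paper's data.

```python
import numpy as np


def roc_curve(genuine_scores, fake_scores, higher_is_genuine=True):
    """Sweep a threshold gamma and return (p_fa, p_d).

    A code is accepted as genuine when its score >= gamma.
    p_d  = fraction of genuine (original printed) codes accepted,
    p_fa = fraction of fake (printed estimate) codes accepted.
    """
    g = np.asarray(genuine_scores, dtype=float)
    f = np.asarray(fake_scores, dtype=float)
    if not higher_is_genuine:
        # Hamming distance: smaller means closer to the original, so flip the sign.
        g, f = -g, -f
    # Start at +inf (accept nothing), then go down through every observed score.
    thresholds = np.concatenate(([np.inf], np.unique(np.concatenate((g, f)))[::-1]))
    p_d = np.array([np.mean(g >= t) for t in thresholds])
    p_fa = np.array([np.mean(f >= t) for t in thresholds])
    return p_fa, p_d


def auc(p_fa, p_d):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(p_fa) * (p_d[1:] + p_d[:-1]) / 2.0))


# Synthetic Pearson scores for originals and fakes (illustrative only).
rng = np.random.default_rng(2)
genuine = rng.normal(0.80, 0.05, size=200)
fakes = rng.normal(0.70, 0.05, size=200)
p_fa, p_d = roc_curve(genuine, fakes)
print("AUC:", auc(p_fa, p_d))
```

An AUC close to 0.5 means the defender cannot separate fakes from originals with that statistic, which is what low resistance to cloning looks like on such a curve.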

As the ROC curves in Fig. 3 show, PGC based on DataMatrix modulation and similar codes have low resistance to machine learning based clonability attacks.

Fig. 3: ROC curves for the Pearson correlation and the Hamming distance between the original and fake printed codes estimated via the BN and Thr methods.

Conclusions

In our work, we investigated the clonability of printable graphical codes based on DataMatrix modulation, which is typical of many PGC designs, under machine learning based attacks. We tested the proposed framework with two different DNN architectures on real printed data. We empirically demonstrated that printable codes can be accurately estimated for high-quality printers even from relatively small training datasets. Based on the performed experiments and the obtained results, we identify three main criteria for successful fake detection: (a) the printing quality, (b) the amount of errors in the estimated codes and (c) the regularity of the estimation errors. Defenders should prefer average-quality printers whose dot gain is sufficient to cause regular errors in the estimation of the originals. Overall, the results show that modern machine learning techniques make printable graphical codes vulnerable to clonability attacks.

In future work, we aim to examine other types of graphical codes, to investigate the use of mobile phones for detecting fake codes and to compare the capabilities of machine learning based and hand-crafted attacks. Finally, we plan to consider GAN-like architectures to produce even more accurate fakes. The impact of the number of training examples and training from the original digital templates are also among our future priorities.