
IEIE Transactions on Smart Processing and Computing, vol. 9, no. 3, June 2020. https://doi.org/10.5573/IEIESPC.2020.9.3.203


Face Anti-spoofing Using Deep Dual Network

Yongjae Gwak1, Chanho Jeong1, Jong-hyuk Roh2, Sangrae Cho2, and Wonjun Kim1,*

1 Department of Electrical and Electronics Engineering, Konkuk University / Seoul, Korea {trevor, jch828, wonjkim}@konkuk.ac.kr

2 Electronics and Telecommunications Research Institute / Daejeon, Korea {jhroh, sangrae}@etri.re.kr

* Corresponding Author: Wonjun Kim

Received December 17, 2019; Revised February 26, 2020; Accepted March 24, 2020; Published June 30, 2020

* Regular Paper

Abstract: Face recognition has been adopted widely for real-world applications because of its convenience and contactless nature. On the other hand, forged faces for spoofing attacks can be fabricated easily using a variety of materials, such as pictures, high-resolution videos, and printed masks, which pose a great threat to face-based recognition systems. Therefore, face anti-spoofing has become an essential technique for achieving high-level security. Although many studies have explored effective features to discriminate live faces from fake ones, even with deep neural networks, they still struggle to grasp meaningful differences from a single image because of sophisticated spoofing attacks with various media. This paper proposes a novel method for face anti-spoofing based on stereo facial images. Because the three-dimensional structure of a live face clearly yields a structural difference in the image pair taken by a stereo camera, whereas no significant difference occurs for fake faces on two-dimensional planes, this paper proposes to learn the differences of left-right image pairs in the latent space of a deep neural network. One important advantage of the proposed method is that the structural difference is encoded implicitly in a nonlinear manner through the deep architecture without explicitly computing the disparity. The experimental results on a constructed dataset show that the proposed method is effective against diverse spoofing attacks.

Keywords: Face anti-spoofing, Stereo facial images, Deep neural network, Latent space

1. Introduction

Face recognition has been widely deployed on mobile devices owing to its ease of use and contactless nature. Moreover, with the improved accuracy achieved by deep learning techniques, face recognition has recently expanded to electronic transactions. Despite its widespread industrial adoption, however, face recognition still suffers from security challenges. Many sophisticated spoofing attacks using high-resolution pictures, videos, and printed masks can easily threaten face-based verification systems, leading to the leakage of personal information and even serious crimes related to financial services. Therefore, face anti-spoofing has become indispensable when constructing high-level security systems.

Several approaches for face anti-spoofing have been developed in the field of computer vision and can be categorized into two main groups: handcrafted feature-based and learned feature-based methods. In the former, researchers attempted to precisely design discriminative features for face anti-spoofing based on motion, image quality, and textural information, while the latter aims to learn the subtle differences between live and fake faces through a deep neural network (DNN). In particular, the local binary pattern (LBP) [1] and its variants have been widely adopted as good descriptors for revealing the differences caused by the fabrication process because they can encode micro-textural features efficiently in a local area of a face image. Furthermore, such methods are easy to implement and work in real time without any parallel computation, which is suitable for mobile applications. On the other hand, such simple patterns of the textural distribution are insufficient to identify sophisticated spoofing attacks using various materials, e.g., warped and deformable masks.

To detect face forgery more reliably, deep learning techniques, particularly those based on the convolutional neural network (CNN), have been applied in recent studies. Because diverse intra-class and inter-class variations can be learned efficiently through deep-layered architectures, the performance of face anti-spoofing under diverse spoofing attacks has been improved considerably [2]. Although DNN-based approaches have boosted the accuracy of facial forgery detection, as shown in other fields of computer vision, their performance is limited to the trained model. That is, most of them still suffer from "unseen" spoofing attacks, which occur frequently in real-world scenarios.

This paper reports a novel yet simple method for face anti-spoofing based on a deep dual network. The key observation behind the proposed method is that the principal angle of a live face differs noticeably between the left and right images because of its three-dimensional structure, whereas fake faces show the same direction regardless of the spoofing material, e.g., a printed mask or a high-resolution display, as shown in Fig. 1. Based on this observation, this paper proposes to learn such directional differences for face anti-spoofing by exploiting a dual network and its residual branch. Moreover, the proposed method implicitly learns the structural difference between the stereo images through a deep architecture instead of explicitly calculating three-dimensional features in a handcrafted manner [3]. To the best of the authors' knowledge, this is the first attempt to apply a stereo image-based deep neural network to the problem of face anti-spoofing. To achieve this, a new dataset was also constructed using a stereo camera. One important advantage of the proposed method is that this observation is valid for any type of two-dimensional spoofing attack; consequently, the method was robust to unseen spoofing attacks, which posed great difficulty for previous DNN-based approaches.

The remainder of this paper is organized as follows. Section 2 provides a brief review of previous methods for face anti-spoofing. Section 3 explains the technical details of the proposed method, and Section 4 reports the experimental results on the newly constructed dataset. The conclusions are summarized in Section 5.

2. Related Work

This section provides a brief review of face anti-spoofing methods. First, in the category of handcrafted feature-based methods, most studies focused on modeling the difference in visual quality between live and fake faces based on textural properties. In particular, micro-patterns of the intensity values in a small local region, e.g., local binary patterns (LBP) [1], have been used widely because of their encoding capability and simplicity. Maatta et al. [4] first adopted LBPs extracted from a face image at multiple scales as local descriptors for face liveness detection. Similarly, Chingovska et al. [5] combined the scores calculated from the LBP features of the global area (i.e., the whole face) and of local patches defined by dividing a given face without overlap. Yang et al. [6] further attempted to extract LBPs from facial components, e.g., the eyes and nose, to improve the performance of liveness detection. In [7], the authors proposed to encode micro-patterns of the diffusion speed, rather than of the intensity values, which efficiently reveal the reflectance differences caused by facial structures. Although these methods are simple to implement and work in real time, they are still vulnerable to sophisticated forgery using deformable materials.

Inspired by the success of deep neural networks (DNNs) in image classification, several researchers have started to adopt DNNs for face anti-spoofing. Many DNN architectures for image classification are also expected to yield good results for face anti-spoofing because the goal of face anti-spoofing can be defined as a binary classification problem: determining whether a given face is live or not. Atoum et al. [2] used two-stream convolutional neural networks (CNNs) to extract local features as well as holistic depth information, but their method was not trained in an end-to-end manner. Jourabloo et al. [8] proposed to decompose a spoofing image into spoof noise and a live face, and attempted to learn the decomposition process based on a generative model to detect fake faces in terms of the estimated spoof noise. Recently, Liu et al. [9] trained a deep model with auxiliary supervision to cope with the problem of poor generalization, i.e., overfitting to the training dataset. To incorporate both spatial and temporal auxiliary information into the network architecture, they employed a depth map and an rPPG signal estimated explicitly by additional methods. Although such learned feature-based approaches improve the performance of face anti-spoofing significantly, they often fail to grasp adequate spoofing cues for unseen spoofing attacks that are not included in the training samples.

Fig. 1. (a) Live pairs, (b) Pairs from a spoofing attack using a tablet, (c) Pairs from a spoofing attack using printed paper. Note that the principal angles of the faces in live pairs clearly differ due to the three-dimensional structure, while those of the fake pairs are photographed almost identically.


On the other hand, efforts to extract discriminative features from stereo images have also been made [3]. In [3], the authors constructed a new dataset composed of stereo images of 35 subjects. They fused two- and three-dimensional features, which were extracted from a Gabor wavelet-based similarity map and a histogram of the 3D point cloud, respectively.

This paper proposes a novel method for face anti-spoofing based on stereo facial images. To achieve this, a new dataset was constructed with 50 subjects, and the corresponding deep neural network was developed. The technical details will be explained in Section 3.

3. Proposed Method

The key idea of the proposed method is to reveal the directional differences between stereo images, which provide useful clues for discriminating live faces from fake ones, through the dual architecture of a DNN. The proposed deep dual network implicitly encodes the properties of three-dimensional structures by concatenating the differences between the feature maps estimated from each branch of a weight-sharing convolutional neural network, as well as the features extracted from the left and right facial images. Fig. 2 presents the overall architecture of the proposed deep dual network. The technical details of the proposed method are explained in the following subsections.

3.1 Architecture

The proposed dual network consists of three subnetworks, as shown in Fig. 2. First, the discriminative features of a given face are extracted through stacked convolution layers for the left and right images, which share their weights, in the first subnetwork. During the feed-forward process in the first subnetwork, the differences between the activation results generated from each convolution layer of the left and right branch networks are exploited. By concatenating them in the second subnetwork, the distinctive properties of three-dimensional structures are expected to be encoded efficiently, which plays an important role in detecting spoofing attacks. Note that two or three additional convolution layers were used to fit the size of these differential results (called difference maps), computed at all layers, to 8 × 8 × 64. In the third subnetwork, the outputs of the first and second subnetworks are encoded respectively via fully connected layers as well as additional convolution layers to estimate the probability that the given input is fake. Note that all the convolution layers are followed by instance normalization [10]. For the convolution layers contained in the first subnetwork, the ReLU operation is then applied to the result of instance normalization. Table 1 lists the architecture details of the proposed deep dual network.

In particular, inspired by the concept of the Siamese network [11], which estimates the similarity of two different inputs via shared weights, this architecture was modified to take stereo images as the two inputs and reveal their difference efficiently. The textural features of the left and right images were extracted precisely in a multiscale manner by stacking six convolution layers and three dilated convolution layers, which are suitable for grasping wider spatial areas. The difference maps, i.e., the differences between the activation maps from the left and right branch networks, were generated at all layers.

Fig. 2. Proposed deep dual network for face anti-spoofing.


Allowing for edge-like structures and global shapes simultaneously is desirable. To combine them efficiently, each difference map was fed into additional convolution layers with a 4 × 4 kernel. That is, convolution was conducted repeatedly until the spatial size and channel number became 8 × 8 and 64, respectively, as shown in Fig. 2. Note that the number of convolution layers was determined adaptively according to the size of the difference map. Subsequently, all the encoded difference maps were concatenated and fed into the third subnetwork. Finally, the textural features concatenated from the results of the left and right branch networks (i.e., the output of the first subnetwork), as well as their concatenated difference maps (i.e., the output of the second subnetwork), were compressed independently through several convolution layers in the third subnetwork. They were then flattened into a single vector of dimension 1 × 1 × 8,192 to calculate the score based on the cross-entropy for face spoofing detection. The deep network can not only capture the textural details of each stereo facial image but also reveal their difference efficiently, leading to a significant improvement in face anti-spoofing.
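To make the dual-branch idea concrete, the following is a minimal PyTorch sketch, assuming simplified channel widths and omitting the dilated convolutions listed in Table 1; the class name DeepDualNet and all helper functions are illustrative and do not correspond to the authors' released code.

```python
import torch
import torch.nn as nn

class DeepDualNet(nn.Module):
    """Sketch of the dual network: a shared-weight branch is applied to the left
    and right face crops, per-layer difference maps are squeezed to 8 x 8 x 64,
    and two heads score the concatenated textures and the difference maps."""
    def __init__(self):
        super().__init__()
        # Shared branch (Subnetwork 1), simplified from Table 1.
        self.blocks = nn.ModuleList([
            self._conv(3, 64, 4, stride=2, padding=1),     # 128 -> 64
            self._conv(64, 128, 4, stride=2, padding=1),   # 64 -> 32
            self._conv(128, 256, 4, stride=2, padding=1),  # 32 -> 16
            self._conv(256, 512, 4, stride=2, padding=1),  # 16 -> 8
            self._conv(512, 256, 3, stride=1, padding=1),  # 8 -> 8
        ])
        # Subnetwork 2: squeeze each difference map to 8 x 8 x 64.
        self.diff_proj = nn.ModuleList([
            self._to_8x8x64(c, s)
            for c, s in [(64, 64), (128, 32), (256, 16), (512, 8), (256, 8)]
        ])
        # Subnetwork 3: one head for concatenated textures, one for difference maps.
        self.head_tex = self._head(in_ch=256 * 2)
        self.head_diff = self._head(in_ch=64 * 5)

    def _conv(self, cin, cout, k, stride, padding):
        return nn.Sequential(nn.Conv2d(cin, cout, k, stride, padding),
                             nn.InstanceNorm2d(cout), nn.ReLU(inplace=True))

    def _to_8x8x64(self, cin, size):
        layers, c = [], cin
        while size > 8:                       # repeat stride-2 convs until 8 x 8
            layers.append(self._conv(c, 64, 4, stride=2, padding=1))
            c, size = 64, size // 2
        if c != 64:                           # match the 64-channel target
            layers.append(self._conv(c, 64, 3, stride=1, padding=1))
        return nn.Sequential(*layers)

    def _head(self, in_ch):
        return nn.Sequential(self._conv(in_ch, 128, 3, 1, 1),
                             nn.Flatten(), nn.Linear(128 * 8 * 8, 1))

    def forward(self, left, right):
        diffs, fl, fr = [], left, right
        for block, proj in zip(self.blocks, self.diff_proj):
            fl, fr = block(fl), block(fr)     # shared weights on both views
            diffs.append(proj(fl - fr))       # per-layer difference map
        logit_tex = self.head_tex(torch.cat([fl, fr], dim=1))
        logit_diff = self.head_diff(torch.cat(diffs, dim=1))
        return logit_tex, logit_diff          # the two branch outputs of Subnetwork 3
```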

3.2 Loss Function

To optimize the weights of the three subnetworks, the proposed model was trained based on the cross-entropy loss function, which is defined as follows:

$$\mathcal{L}_{class}^{i} = -\frac{1}{N}\sum_{n=1}^{N}\left\{ y_{n}\log x_{n}^{i} + \left(1 - y_{n}\right)\log\left(1 - x_{n}^{i}\right) \right\}, \qquad (1)$$

where $i \in \{1, 2\}$ denotes the index of each branch in the third subnetwork (see Fig. 2), and $n$ and $N$ denote the sample index in the mini-batch and the total number of samples in each mini-batch, respectively. $y_{n}$ indicates the class label, i.e., live ($=1$) or fake ($=0$), and $x_{n}^{i}$ represents the final output of each branch in the third subnetwork. The final loss is calculated simply by averaging these loss values as follows:

$$\mathcal{L}_{class} = \frac{1}{2}\sum_{i=1}^{2}\mathcal{L}_{class}^{i}. \qquad (2)$$

Based on this loss function, the probability of a given image belonging to the live or fake class was estimated efficiently.
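A minimal sketch of this two-branch loss in PyTorch is shown below; it assumes the branch outputs are raw logits (so the sigmoid implied by Eqs. (1)-(2) is applied inside BCEWithLogitsLoss), and the function name dual_branch_loss is illustrative.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # numerically stable y*log(x) + (1-y)*log(1-x)

def dual_branch_loss(logit_tex, logit_diff, labels):
    """Eq. (1) applied to each branch of the third subnetwork and
    averaged as in Eq. (2); labels are 1 for live and 0 for fake."""
    labels = labels.float().view(-1, 1)
    loss_1 = bce(logit_tex, labels)   # L_class^1
    loss_2 = bce(logit_diff, labels)  # L_class^2
    return 0.5 * (loss_1 + loss_2)    # L_class
```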

3.3 Training

The proposed deep dual network was trained using the constructed dataset, which contains 20,250 left-right image pairs taken by the stereo camera from 50 subjects (details are given in the following section). The DNN-based detector [12], which shows reliable performance for face detection even under diverse variations (e.g., very small size, rotation, and paper folding), was adopted to extract the facial regions from the captured images. The detected regions were resized directly to 128 × 128 pixels for training and testing. For the performance evaluation, 50-fold cross-validation was adopted, as in most previous methods: the image pairs of 49 subjects were used for training, while those of the remaining subject were used for testing. This process was repeated 50 times, and the average of all the resulting probabilities was taken as the final score.
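This protocol amounts to leave-one-subject-out splitting; a small sketch is given below, assuming the pairs are grouped per subject in a dictionary (the data structure is an assumption, not the authors' format).

```python
from typing import Dict, List, Tuple

def subject_folds(pairs_by_subject: Dict[str, List[Tuple[str, str, int]]]):
    """Leave-one-subject-out protocol: for each of the 50 subjects, its
    (left_path, right_path, label) pairs form the test set and the
    remaining 49 subjects form the training set."""
    subjects = sorted(pairs_by_subject)
    for test_subject in subjects:
        train = [p for s in subjects if s != test_subject
                 for p in pairs_by_subject[s]]
        test = pairs_by_subject[test_subject]
        yield test_subject, train, test
```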


Table 1. Detailed architecture of the proposed method.

| Module | Layer | Output dimension (H × W × C) | Kernel | Padding | Stride | Dilation |
|---|---|---|---|---|---|---|
| Subnetwork 1 | Conv. | 64 × 64 × 64 | 4 | 2 | 1 | - |
| | Conv. | 32 × 32 × 128 | 4 | 2 | 1 | - |
| | Conv. | 16 × 16 × 256 | 4 | 2 | 1 | - |
| | Conv. | 16 × 16 × 384 | 3 | 1 | 1 | - |
| | Conv. | 8 × 8 × 512 | 4 | 2 | 1 | - |
| | Dilated Conv. | 8 × 8 × 512 | 3 | 2 | 1 | 2 |
| | Dilated Conv. | 8 × 8 × 512 | 3 | 4 | 1 | 4 |
| | Dilated Conv. | 8 × 8 × 512 | 3 | 8 | 1 | 8 |
| | Conv. | 8 × 8 × 256 | 3 | 1 | 1 | - |
| Subnetwork 2 | Conv. | 8 × 8 × 64 | Convolution is conducted repeatedly until the spatial size and channel number become 8 × 8 and 64, respectively | | | |
| Subnetwork 3_1 | Conv. | 8 × 8 × 256 | 3 | 1 | 1 | - |
| | Conv. | 8 × 8 × 128 | 3 | 1 | 1 | - |
| Subnetwork 3_2 | Conv. | 8 × 8 × 512 | 3 | 1 | 1 | - |
| | Conv. | 8 × 8 × 256 | 3 | 1 | 1 | - |
| | Conv. | 8 × 8 × 128 | 3 | 1 | 1 | - |


All the weights in the convolution layers were initialized randomly, in the same manner as reported in [13]. For training, stochastic gradient descent (SGD) [14] was employed with a batch size of 1,200. The momentum and the weight-decay coefficient were set to 0.9 and 0.0005, respectively. The learning rate was initially set to 0.01 and decreased with a multiplicative learning-rate decay of 13% at every epoch. Training ran for 100 epochs and took approximately 40 hours in total using four NVIDIA GeForce Titan Xp GPUs.
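A sketch of this training setup in PyTorch is shown below; DeepDualNet and dual_branch_loss refer to the earlier sketches, train_loader is an assumed data loader yielding (left, right, label) batches, and the 13% decay is interpreted here as multiplying the learning rate by 0.87 each epoch.

```python
import torch

model = DeepDualNet()  # network sketch from Section 3.1
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)
# "decreased by 13% at every epoch" interpreted as lr <- 0.87 * lr per epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.87)

# train_loader is assumed to yield (left, right, label) batches of size 1,200.
for epoch in range(100):
    for left, right, labels in train_loader:
        optimizer.zero_grad()
        logit_tex, logit_diff = model(left, right)
        loss = dual_branch_loss(logit_tex, logit_diff, labels)  # Eq. (2)
        loss.backward()
        optimizer.step()
    scheduler.step()  # per-epoch learning-rate decay
```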

4. Experimental Results

This section analyzes the performance of the proposed method in detail based on the newly constructed dataset of stereo facial images. The proposed deep dual network was evaluated for face anti-spoofing both qualitatively and quantitatively, and examples from the real-time implementation are also presented.

4.1 Dataset

Because no stereo facial dataset was publicly available to the best of the authors' knowledge, a dataset was constructed using a stereo camera (model: oCamS-1CGN-U), as shown in Fig. 3. Note that the baseline of the stereo camera is 12 cm, and the camera was mounted 80 cm from the subject. Under this environment, 50 subjects (29 males and 21 females) participated in constructing the dataset. The stereo images were captured while subjects were free to move and rotate their heads slightly, which helps alleviate the overfitting problem. To produce fake faces, two different types of spoofing attacks were used, i.e., screen and print attacks. The resolution of each source image was 1,920 × 1,080 pixels for the screen attack (displayed on a tablet) and 3,024 × 4,032 pixels for the print attack, as shown in Fig. 4. In particular, the fake faces printed on paper were warped freely and rotated slightly to increase the diversity of spoofing attacks. The number of image pairs per category was 135 per subject; hence, the total number of pairs for each category was 135 × 50 = 6,750, as shown in Table 2. Fig. 5 gives some examples of image pairs from the dataset. As explained in the previous subsection, the facial regions in the image pairs were cropped using the DNN-based detector [12] and resized to 128 × 128 pixels for training and testing.
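A possible PyTorch Dataset wrapper for such stereo pairs is sketched below; the (left path, right path, label) tuple format and the class name StereoFacePairs are assumptions for illustration only.

```python
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image
from torchvision.transforms.functional import resize

class StereoFacePairs(Dataset):
    """Sketch of a dataset of (left, right, label) face crops; the paths and
    labels would come from the constructed stereo dataset (1 = live, 0 = fake)."""
    def __init__(self, samples):
        self.samples = samples  # list of (left_path, right_path, label)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        left_path, right_path, label = self.samples[idx]
        left = resize(read_image(left_path).float() / 255.0, [128, 128])
        right = resize(read_image(right_path).float() / 255.0, [128, 128])
        return left, right, torch.tensor(label, dtype=torch.float32)
```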

4.2 Performance Evaluation

This subsection reports various experimental results on the stereo facial dataset. To evaluate the performance quantitatively, the equal error rate (EER) was employed as the metric, which has been used widely in the field of face anti-spoofing [2-7]. All the evaluation results were averaged over the 50-fold cross-validation (see subsection 3.3). To show the efficiency and robustness of the proposed method, the results were compared with those of the stereo camera-based algorithm [3], which employs textural similarity as well as a histogram of 3D point clouds as the discriminator between live and fake faces. Compared to this method, whose EER was 0.68 on a dataset with 35 participating subjects, the proposed deep dual network yielded more reliable performance, i.e., EER = 0.48, as listed in Table 3. For a fair comparison in terms of scale, the performance of the proposed method was also evaluated using the same number of subjects as [3] (i.e., 35 subjects), and the corresponding EER was 0.51. This shows that the model trained with the larger dataset generalizes better because it sees more diverse samples for face anti-spoofing. Therefore, it is believed that the difference maps implicitly encoding the directional properties of live and fake faces outperform the handcrafted features. Fig. 6 presents some pictorial examples of correctly and incorrectly detected cases.
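For reference, a simple EER computation over per-pair liveness scores might look as follows; this follows the standard definition (the operating point where the false acceptance and false rejection rates meet) and is not the authors' exact evaluation script.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: the error rate at the threshold where the false acceptance rate
    (fakes accepted as live) equals the false rejection rate (live rejected).
    scores: higher means more likely live; labels: 1 = live, 0 = fake."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    best_gap, eer = np.inf, 0.5
    for t in np.unique(scores):
        far = np.mean(scores[labels == 0] >= t)  # fake accepted as live
        frr = np.mean(scores[labels == 1] < t)   # live rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```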


Fig. 3. (a) Example of capturing a live image using the stereo camera, (b) The stereo camera (model: oCamS-1CGN-U), (c) The stereo camera fixed in place to construct the dataset.

Fig. 4. Examples of photos used for (a) print attack, (b) screen attack.

Table 2. Our new dataset for stereo facial images.

| Category | Number of pairs |
|---|---|
| Live image | 6,750 |
| Photo spoofing image | 6,750 |
| Video spoofing image | 6,750 |
| Total | 20,250 |


The performance variations according to the parameter settings were also evaluated. First, to verify the efficiency of the difference map, the performance of the network was tested without difference maps and with difference maps computed from only a few layers of the left and right branch networks. Specifically, activation maps were selected at two-layer and three-layer intervals, which yielded five and three difference maps, respectively, whereas the default setting uses all the activation maps (i.e., nine difference maps). Table 4 lists the corresponding results. The performance of face anti-spoofing decreased significantly without difference maps and improved progressively as the number of difference maps increased. Therefore, the difference map proposed in this paper helps grasp the subtle differences between live and fake faces based on stereo facial images. Moreover, the effect of the dilated convolutions employed in the first subnetwork was checked. As shown in Table 5, the dilated convolution was useful for extracting discriminative features for face anti-spoofing in a multiscale manner without increasing the computational burden or the number of parameters.
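The parameter claim is easy to verify: a dilated 3 × 3 convolution has exactly as many weights as a plain 3 × 3 convolution while covering a wider receptive field, as the small check below illustrates.

```python
import torch.nn as nn

# A 3x3 convolution with dilation 4 has the same number of parameters as a
# plain 3x3 convolution but covers a 9x9 receptive field (cf. Subnetwork 1).
plain = nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)
dilated = nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=4, dilation=4)

assert sum(p.numel() for p in plain.parameters()) == \
       sum(p.numel() for p in dilated.parameters())
# Both keep an 8x8 feature map for an 8x8 input: (8 + 2*4 - 4*(3-1) - 1)/1 + 1 = 8.
```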

Fig. 5. Pairs of live and fake images taken by the stereo camera and the face images cropped by [12]. First row: live pairs and the cropped faces from the pairs. Second row: spoofing attacks via printed paper and the cropped faces. Third row: spoofing attacks using the tablet and the cropped faces.

Fig. 6. Pairs of correctly and incorrectly detected cases. (a) Pairs of live faces, (b) Pairs of spoofing attacks (the blue box denotes live faces while the red box represents spoof faces). Note that the proposed network is sometimes fooled by live input pairs in which the two views appear very similar.

Table 3. Quantitative evaluations.

| Method | Number of subjects | EER |
|---|---|---|
| X. Sun et al. [3] | 35 | 0.68 |
| Proposed method | 35 | 0.51 |
| Proposed method | 50 | 0.48 |


4.3 Implementation

The proposed method was implemented on the PyTorch framework [15]. The deep dual network was trained using a single PC with an Intel Xeon CPU and four NVIDIA GTX Titan Xp GPUs. Specifically, the facial regions were extracted from the given stereo pairs using the multiscale DNN-based face detector [12] and subsequently resized to 128 × 128 pixels for training. On the other hand, the real-time demonstration was conducted on a single laptop PC with an NVIDIA GeForce GTX 1060 GPU, as shown in Fig. 7. A Viola-Jones face detector [16] was employed for the demonstration instead of the DNN-based approach [12] because of the computational burden of the latter. In this environment, the average processing speed was approximately 7 fps while maintaining the face anti-spoofing performance obtained with the training framework. Fig. 7 presents some examples of the demonstration. The proposed deep dual network could detect various spoofing attacks under real-world scenarios.
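A rough sketch of such a demonstration loop with OpenCV's Haar-cascade (Viola-Jones) detector is given below; the camera indices, the reuse of the left-view detection box for both views, and the averaging of the two branch logits are assumptions for illustration, and model refers to the earlier network sketch rather than the released code.

```python
import cv2
import torch

# Haar-cascade (Viola-Jones) face detector applied to the left view;
# the same box is used to crop both views. `model` is a trained DeepDualNet.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap_left, cap_right = cv2.VideoCapture(0), cv2.VideoCapture(1)  # assumed indices
model.eval()

while True:
    ok_l, frame_l = cap_left.read()
    ok_r, frame_r = cap_right.read()
    if not (ok_l and ok_r):
        break
    gray = cv2.cvtColor(frame_l, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        crop_l = cv2.resize(frame_l[y:y + h, x:x + w], (128, 128))
        crop_r = cv2.resize(frame_r[y:y + h, x:x + w], (128, 128))
        to_tensor = lambda im: (torch.from_numpy(im).permute(2, 0, 1)
                                .float().unsqueeze(0) / 255.0)
        with torch.no_grad():
            logit_tex, logit_diff = model(to_tensor(crop_l), to_tensor(crop_r))
            live_prob = torch.sigmoid(0.5 * (logit_tex + logit_diff)).item()
        color = (255, 0, 0) if live_prob > 0.5 else (0, 0, 255)  # blue live, red fake
        cv2.rectangle(frame_l, (x, y), (x + w, y + h), color, 2)
    cv2.imshow("face anti-spoofing demo", frame_l)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
```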

The proposed model and the code are available at https://github.com/dcvl18/DeepDualNet-Pytorch.

5. Conclusion

A novel yet simple method for face anti-spoofing based on a deep dual network was proposed in this paper. The key idea of the proposed method is to infer the structural difference between stereo pairs implicitly via the differences between the dual branches. By adopting these difference maps, there is no need to explicitly compute the disparity between the left and right facial images, which would increase the computation time and GPU memory usage, and the performance of the proposed method is improved significantly. Moreover, a new dataset of 20,250 stereo facial image pairs was constructed, composed of live faces and two types of fake ones (i.e., print attack and screen attack). Various experimental results on this dataset show that the proposed method can determine whether a given stereo facial pair is live or fake, and the implementation results show that the proposed method can be adopted readily in high-level security systems.

Acknowledgment

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2018-0-00189, Security Technology for Portal Device that connects Human-Infrastructure-Service in highly trust intelligent information service).

References

[1] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971-987, Jul. 2002.

[2] Y. Atoum, Y. Liu, A. Jourabloo, and X. Liu, "Face anti-spoofing using patch and depth-based CNNs," in Proc. IEEE Int. Joint Conf. Biometrics, pp. 319-328, Oct. 2017.

[3] X. Sun, L. Huang, and C. Liu, "Dual camera based feature for face spoofing detection," in Proc. Chinese Conf. Pattern Recognit., pp. 332-344, Nov. 2016.

[4] J. Maatta, A. Hadid, and M. Pietikainen, "Face spoofing detection from single images using micro-texture analysis," in Proc. IEEE Int. Joint Conf. Biometrics, pp. 1-7, Oct. 2011.

[5] I. Chingovska, A. Anjos, and S. Marcel, "On the effectiveness of local binary patterns in face anti-spoofing," in Proc. IEEE Int. Conf. Biometrics Special Interest Group, pp. 1-7, Sep. 2012.

Table 4. Performance variations according to settings of difference maps.

| Setting | EER |
|---|---|
| w/o difference maps | 1.31 |
| w/ three-layer intervals in difference maps | 1.10 |
| w/ two-layer intervals in difference maps | 0.75 |
| w/ all difference maps, w/ dilated conv. | 0.48 |

Table 5. Performance variations according to settings of dilated convolutions.

| Setting | EER |
|---|---|
| w/ all difference maps, w/o dilated conv. | 0.80 |
| w/ all difference maps, w/ dilated conv. | 0.48 |

Fig. 7. Examples of face anti-spoofing on our demonstration system. (a) Results of detecting live faces, (b) Results of detecting fake faces (screen and paper attacks, respectively).


[6] J. Yang, Z. Lei, S. Liao, and S. Z. Li, "Face liveness detection with component dependent descriptor," in Proc. IEEE Int. Conf. Biometrics, pp. 1-6, Jun. 2013.

[7] W. Kim, S. Suh, and J.-J. Han, "Face liveness detection from a single image via diffusion speed model," IEEE Trans. Image Process., vol. 24, no. 8, pp. 2456-2465, Aug. 2015.

[8] A. Jourabloo, Y. Liu, and X. Liu, "Face de-spoofing: anti-spoofing via noise modeling," in Proc. Eur. Conf. Comput. Vis., pp. 297-315, Sep. 2018.

[9] Y. Liu, A. Jourabloo, and X. Liu, "Learning deep models for face anti-spoofing: binary or auxiliary supervision," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 389-398, Jun. 2018.

[10] D. Ulyanov, A. Vedaldi, and V. Lempitsky, "Instance normalization: The missing ingredient for fast stylization," arXiv preprint arXiv:1607.08022, 2016.

[11] G. Koch, R. Zemel, and R. Salakhutdinov, "Siamese neural networks for one-shot image recognition," in Proc. ICML Deep Learning Workshop, vol. 2, 2015.

[12] P. Hu and D. Ramanan, "Finding tiny faces," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1522-1530, Jul. 2017.

[13] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. Int. Conf. Artif. Intell. Statist., May 2010, pp. 249-256.

[14] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proc. 19th Int. Conf. Comput. Statist., Aug. 2010, pp. 177-186.

[15] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in Proc. Neural Inf. Process. Syst. Workshops, Dec. 2017, pp. 1-4.

[16] P. Viola and M. J. Jones, "Robust real-time face detection," Int. J. Comput. Vis., vol. 57, no. 2, pp. 137-154, 2004.

Yongjae Gwak received his B.S. degree in the Department of Electronics Engineering from Konkuk University, Seoul, Korea. Currently, he is a graduate student in the Department of Electronic, Information, and Communication Engineering at Konkuk University, Seoul, Korea. His research interests include image understanding, computer vision, and biometrics.

Chanho Jeong received his B.S. degree in the Department of Electronics Engineering from Konkuk University, Seoul, Korea. Currently, he is a graduate student in the Department of Electronic, Information, and Communication Engineering at Konkuk University, Seoul, Korea. His research interests include image understanding, computer vision, and biometrics.

Jong-hyuk Roh received the B.S., M.S., and Ph.D. degrees in Computer Engineering from Inha University, Korea. He is currently a principal researcher at the Electronics and Telecommunications Research Institute (ETRI), Korea. His research interests include machine learning, pattern analysis, behavior-based authentication, and computer security.

Sangrae Cho is a senior researcher of the Authentication Research Team at ETRI, South Korea. He graduated from Imperial College London in 1996 with a BEng degree in Computing and obtained an MSc in Information Security from Royal Holloway, University of London, in 1997. He started his career as a researcher at the LG Corporate Technology Institute in 1997 and has worked at ETRI as a security researcher for more than 15 years. During that time, he was actively involved in constructing a national PKI infrastructure project until 2001. Since 2004, he has worked on several projects relating to digital identity management, including SAML v2.0 and authentication technology based on the FIDO (Fast Identity Online) specifications.


Wonjun Kim received the B.S. degree in Electronic Engineering from Sogang University, Seoul, Korea, the M.S. degree from the Department of Information and Communications, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, and the Ph.D. degree from the Department of Electrical Engineering, KAIST, in 2006, 2008, and 2012, respectively. From September 2012 to February 2016, he was a Research Staff Member at the Samsung Advanced Institute of Technology (SAIT), Gyeonggi-do, Korea. Since March 2016, he has been with the Department of Electrical and Electronics Engineering, Konkuk University, Seoul, Korea, where he is currently an Associate Professor. His research interests include image and video understanding, computer vision, pattern recognition, and biometrics, with an emphasis on background subtraction, saliency detection, face recognition, and action recognition. He has served as a regular reviewer for more than 30 international journal papers, including the IEEE Transactions on Image Processing, IEEE Access, the IEEE Transactions on Circuits and Systems for Video Technology, the IEEE Transactions on Multimedia, the IEEE Transactions on Cybernetics, IEEE Signal Processing Letters, and Pattern Recognition.

Copyrights © 2020 The Institute of Electronics and Information Engineers