A Trade-off Between Number of Impressions and Number of Interaction Attempts

Jacob A. Hasselgren, Stephen J. Elliott, and Jue Gue, Member, IEEE

Published in ICITA 2013, the 8th International Conference on Information Technology and Applications, Sydney, Australia, 1-4 July 2013. ISBN: 978-0-9803267-5-8.

Abstract--The amount of time taken to enroll or collect data from a subject in a fingerprint recognition system is of paramount importance, because time taken directly affects cost. A trade-off must therefore be struck between the number of impressions collected and the number of interaction attempts allowed to submit those impressions. In this experiment, data were collected using an optical fingerprint sensor. Each subject submitted six successful impressions with a maximum of 18 interaction attempts. The resulting images were analyzed in three ways: the number of interaction attempts per finger, quality differences from the first three impressions to the last three impressions, and matching performance from the first three impressions to the last three impressions. The right middle finger was the most difficult to collect, requiring the most interaction attempts. Statistical analysis showed no significant differences in image quality or matching performance; however, further analysis revealed a steady improvement from Group A to Group B in both image quality and matching performance.

Index Terms--Biometrics, image quality, impression, interaction, matching performance

I. INTRODUCTION

There are many factors that impact the performance of a biometric system: poor quality data, including degraded ridge-valley structure [1] and skin conditions [2]; human interaction with the sensor [3]; and the metadata attached to biometric data [4]. Poor quality data, in this case fingerprint images, affect the performance of a biometric system regardless of their source [5–8], and can impact the operations of the system. Test protocol designers are faced with a series of challenges when collecting data and minimizing error, regardless of the cause. In [9], the development of the Human Biometric Sensor Interaction model is discussed, which examined four fundamental issues: how do users interact with the biometric device; what errors do users make; are there any commonalities within these errors; and what level of training should one expect to give the subject (if any at all) to use a biometric device successfully?

Test protocol designers can reference documents describing best practices for designing a test protocol (for example, [10]). While minimizing error is paramount in a test, so too are the decisions relating to the number of test subjects and the time they spend in the test center. Determining the number of test subjects is an important task in developing the test protocol. Mansfield and Wayman note that the ideal test would have as many volunteers as practically possible, each making a single transaction. They provide an example whereby an evaluation may have 200 subjects, each enrolling and making three genuine transactions, with two further revisits, providing 1200 genuine attempts [10].

Test crews and the number of attempts vary depending on the nature of the test, as well as on the allowable expense of test subject recruitment and test administration. In their guidance, [10] state that the test population should be "as large as practically possible". Test protocols in the literature vary in the number of samples collected. One study examined image quality and performance on a single fingerprint sensor: fifty subjects participated, providing three samples each of the index, middle, ring, and little fingers of both hands, resulting in 1200 images [11]. Another study examined the effects of scanner height on fingerprint capture, collecting fingerprints from 75 subjects at four different heights, with five attempts each [12]. As another example, FVC 2000 collected 880 fingerprints in total, with eight impressions per finger [13]. Each of these studies examined very different topics within fingerprint performance, but in each case the test protocol designer determined the number of fingerprints to collect and the number of attempts each subject would complete.

J. Hasselgren is with the Technology, Leadership, and Innovation Department of Purdue University, West Lafayette, IN 47907 USA (telephone: 765-494-2311, e-mail: [email protected]).

S. Elliott is with the Technology, Leadership, and Innovation Department of Purdue University, West Lafayette, IN 47907 USA (telephone: 765-494-2311, e-mail: [email protected]).

J. Gue is a student in the Technology, Leadership, and Innovation Department of Purdue University, West Lafayette, IN 47907 USA (telephone: 765-494-2311, e-mail: [email protected]).

II. MOTIVATION

In an operational setting, there is an inherent trade-off between the number of samples collected, the number of interaction attempts needed to collect those samples, and the cost of the collection. For example, should the test personnel keep trying to collect from an individual with poor image quality, in the hope that the individual will provide better images as they become accustomed to the device and improve their presentation? Or is it better to stop after the first three attempts because the time taken to acquire additional images does not provide any additional value? The research questions are as follows: does quality improve with experience or familiarity with the device? Does performance change across different groups, such as the first three successfully acquired samples, the last three, the three highest image quality samples, and, for reference, the three lowest? All of these questions are applicable in determining the best enrollment policy and will impact the time that the subject spends at the enrollment station.

III. METHODOLOGY

For the purposes of this study and the subsequent analysis, the following definitions are used. A successfully acquired sample (SAS) occurs when the fingerprint sensor acquires a sample. In these experiments, the sensor applied a modest image quality threshold, which required a minimum number of minutiae. The following fingers were collected from each subject: right index, right middle, left index, and left middle. Fig. 1 shows the hands used during this collection.

Fig. 1. Representation of fingers used for collection

Six impressions determined to be SASs were taken on each finger. Each SAS was given an impression number, which in this case was always a value between one and six. Whenever a subject presented to the sensor, regardless of whether a SAS occurred or whether the presentation was good or bad, the subject was considered to have committed an interaction attempt. Each subject was allowed a maximum of 18 interaction attempts. The sensor used was the commercially available Digital Persona U.are.U 4500. The data used in these analyses were taken from an ongoing aging study in the BSPA Labs at Purdue University. Four fingerprint sensors were used in the overall data collection, along with other modalities; this particular sensor was the last sensor used in the fingerprint station.
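To make the acquisition rule concrete, the following is a minimal sketch of the collection loop described above. The `acquire_sample` callable is a hypothetical stand-in for the sensor's capture-and-threshold logic; this illustrates the protocol, not the actual collection software.

```python
# Minimal sketch of the acquisition protocol: a subject has at most
# 18 interaction attempts in which to provide 6 successfully
# acquired samples (SAS) for a given finger.

MAX_ATTEMPTS = 18
REQUIRED_SAS = 6

def collect_finger(acquire_sample):
    """Collect up to six SAS for one finger.

    `acquire_sample` is a hypothetical callable standing in for the
    sensor: it returns an image when the capture passes the
    minutiae-based quality threshold (a SAS), or None when the
    interaction attempt fails.
    """
    sas = []          # (impression_number, attempt_number, image)
    attempts = 0
    while attempts < MAX_ATTEMPTS and len(sas) < REQUIRED_SAS:
        attempts += 1                # every presentation is an attempt
        image = acquire_sample()
        if image is not None:        # the sensor accepted this SAS
            sas.append((len(sas) + 1, attempts, image))
    return sas, attempts
```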

The test protocol and subsequent definitions are consistent with the Human Biometric Sensor Interaction model as outlined in [3]. The schematic of interaction attempts and impressions is shown in Fig. 2. Fig. 2 is only one example of the relationship between impression numbers and interaction attempt numbers; Group A could consist of attempts later in the order, and Group B could consist of attempts 7, 8, and 9, or even 7, 11, and 16.

Fig. 2. Schematic of interaction attempts and impressions

Four different groups were established for these analyses. Group A consisted of the first three successfully acquired samples for each subject for each finger. Group B consisted of the last three successfully acquired samples for each subject for each finger. Group C included the three images with the lowest quality scores, while Group D consisted of the three with the highest quality scores. Not all groups were used in every analysis.
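As an illustration of how the four groups partition a finger's samples, here is a minimal sketch; the record layout (impression number, quality score, image) is an assumption for illustration, not the study's actual data format.

```python
def make_groups(sas_records):
    """Split one finger's six SAS into the four analysis groups.

    `sas_records`: list of (impression_number, quality_score, image)
    tuples; this layout is assumed for illustration only.
    """
    ordered = sorted(sas_records, key=lambda r: r[0])     # acquisition order
    by_quality = sorted(sas_records, key=lambda r: r[1])  # ascending quality
    return {
        "A": ordered[:3],      # first three SAS
        "B": ordered[-3:],     # last three SAS
        "C": by_quality[:3],   # three lowest quality scores
        "D": by_quality[-3:],  # three highest quality scores
    }
```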

Four commercially available software packages were used. Neurotechnology MegaMatcher v4.3 was used for matching performance, and the Aware WSQ1000 quality tool was used for image quality analysis. Oxford Wave graphing software was used to plot and calculate the equal error rates, and Minitab 14 was used for statistical measures and results.

IV. RESULTS

The results of the experiment are divided into three sections. Table 1 provides a description of each analysis.

Table 1. Framework

Analysis | Description | Groupings
Number of interaction attempts | Differences in the number of interaction attempts based on finger location | Groups A and B
Image quality | Differences in image quality from the first three SAS to the last three SAS | Group A vs. Group B
Image quality | Differences in image quality from the lowest three quality scoring SAS to the highest three quality scoring SAS | Group C vs. Group D
Matching performance | Differences in matching performance from the first three SAS to the last three SAS | Group A vs. Group B
Matching performance | Differences in matching performance from the lowest three quality scoring SAS to the highest three quality scoring SAS | Group C vs. Group D

The test subject population consisted of 49 males, 53 females, and four subjects who did not disclose their demographic information.


A. Number of interaction attempts

The results consist of those subjects who presented six successfully acquired samples in 18 or fewer interaction attempts. The number of attempts is reported below for each finger collected (right index, left index, right middle, and left middle). There was no significant difference in interaction attempts between Group A and Group B for any finger.

In an ideal data collection scenario, the impression numbers would match the interaction attempt numbers, as no additional attempts would be necessary. Group A's impression numbers were always one through three, but some subjects, particularly for the right middle finger, needed as many as 12 attempts just to submit three SAS.

The majority of individuals acquired their samples in six interaction attempts across all finger locations. However, for some fingers, notably the right middle, the distribution is more spread out, as shown in Table 2.

Table 2. Variance of attempts per group and finger

Finger Location | Group | Variance
LI | A | 1.0830
LI | B | 2.0962
LM | A | 1.0320
LM | B | 1.3039
RI | A | 1.3047
RI | B | 2.2292
RM | A | 2.0180
RM | B | 2.8247

The right middle (RM) and right index (RI) have a greater variance in Groups A and B than the other fingers. This difference in variance may be explained by the order in which the fingers were collected: right index, right middle, left index, and then left middle. The higher variation for the right index and right middle fingers could be a result of the subject becoming comfortable with the sensor. Since the right index and right middle are the first two fingers presented to the sensor, perhaps a habituation factor is affecting the number of interaction attempts and their variance. This could also simply be a case of hand dominance; however, hand dominance data were not available for this paper.

B. Image Quality

It is well understood that image quality impacts performance. In this section, we evaluate image quality across four groups: Groups A and B (the first three SAS and the last three SAS, respectively), and additionally Groups C and D (the lowest three and highest three image quality SAS, respectively). The images were processed using a commercial quality scoring algorithm, the Aware WSQ1000, which provided an aggregate quality score from 0 to 100. The breakdown of these quality scores is as follows: good ranges from 85-100, adequate from 75-84, marginal from 60-74, and poor from 0-59. The distribution of image quality scores for Groups A and B is shown in Fig. 3.

Fig. 3. Distribution of quality across Groups A and B and finger location.

Referring to Fig. 3, each finger's mean quality is between 70 and 76, i.e., marginal quality.
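The score bands reported above can be expressed directly; the following small sketch maps an aggregate 0-100 score to its band (the function is ours for illustration, not part of the Aware tool):

```python
def quality_band(score):
    """Map an aggregate 0-100 quality score to its reported band."""
    if score >= 85:
        return "good"       # 85-100
    if score >= 75:
        return "adequate"   # 75-84
    if score >= 60:
        return "marginal"   # 60-74
    return "poor"           # 0-59
```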


Fig. 4. Distribution of quality across Groups C and D and finger location.

Fig. 4 shows the quality distribution for Groups C and D, the lowest three quality scoring SAS and the highest three quality scoring SAS, respectively.

Table 3. Basic quality statistics per group and finger

Finger Location | Group | Mean | Std. Dev. | Variance
LI | A | 71.604 | 9.397 | 88.313
LI | B | 72.911 | 9.545 | 91.115
LI | C | 68.785 | 9.442 | 89.149
LI | D | 75.729 | 8.182 | 66.946
LM | A | 73.327 | 9.992 | 99.843
LM | B | 74.871 | 9.322 | 86.907
LM | C | 70.459 | 10.059 | 101.176
LM | D | 77.739 | 7.758 | 60.180
RI | A | 72.139 | 10.253 | 105.131
RI | B | 73.289 | 9.327 | 86.984
RI | C | 69.014 | 10.104 | 102.082
RI | D | 76.415 | 7.951 | 63.213
RM | A | 74.683 | 9.296 | 86.418
RM | B | 75.237 | 9.042 | 81.767
RM | C | 71.720 | 9.501 | 90.276
RM | D | 78.200 | 7.550 | 56.997

The variances of Group A were larger than those of Group B for all but the left index finger. The mean quality scores of Group A and Group B for each finger were compared using a one-way ANOVA; there was no significant difference between Group A and Group B for any finger.

The mean quality scores of Group C and Group D for each finger were also compared using a one-way ANOVA; there was a significant difference for all fingers (p < .001).
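For readers who wish to reproduce this kind of comparison, the one-way ANOVA can be run with standard statistical tooling; the sketch below uses SciPy with made-up score arrays for a single finger (the study itself used Minitab 14, and the actual data are not reproduced here):

```python
from scipy.stats import f_oneway

# Hypothetical per-image quality scores for one finger.
group_c_scores = [68.2, 71.5, 66.9, 70.1, 69.4]   # lowest-quality group
group_d_scores = [76.3, 78.0, 74.9, 77.2, 75.8]   # highest-quality group

f_stat, p_value = f_oneway(group_c_scores, group_d_scores)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A p-value below 0.001 would mirror the significant C vs. D
# difference reported above.
```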

C. Performance

To observe the differences in matching performance, the SAS, in their respective groups, were enrolled into minutiae-based matching software, MegaMatcher 4.3. The resulting equal error rates for these matching sequences are presented in Table 4.

Table 4. EER for Group A (first three SAS) vs. Group B (last three SAS)

Finger | Group A vs. Group A | Group B vs. Group B | Group A vs. Group B
LI | 0.0000 | 0.0000 | 0.0006
LM | 0.3322 | 0.0000 | 0.1282
RI | 0.0000 | 0.0000 | 0.0000
RM | 0.0000 | 0.0000 | 0.0000

No improvements in performance were noticed for any finger except the left middle. When Group A of the left middle finger was matched to itself, an Equal Error Rate (EER) of 0.3322 was observed; when Group B of the same finger was matched to itself, the performance improved to 0.0000. The third matching procedure was an interoperable match, with Group A matched to Group B; this also produced an improvement over Group A matched to itself, at an EER of 0.1282.
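The EER values in these tables can be estimated from genuine and impostor score distributions with a simple threshold sweep; the sketch below is a generic illustration (not the Oxford Wave software used in the study), assuming higher similarity scores indicate better matches:

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a decision threshold.

    Assumes similarity scores where higher means a better match.
    Returns the rate at the threshold where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine_scores)
    impostor = np.asarray(impostor_scores)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuines wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```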

To also observe the effect quality has on performance, Groups C and D were matched to themselves and to each other. The matching rates of Groups C and D (the lowest three and highest three image quality scoring SAS, respectively) are shown in Table 5.

Table 5. EER for Group C (lowest three quality SAS) vs. Group D (highest three quality SAS)

Finger | Group C vs. Group C | Group D vs. Group D | Group C vs. Group D
LI | 0.0000 | 0.0000 | 0.0000
LM | 0.2816 | 0.0506 | 0.1766
RI | 0.0000 | 0.0000 | 0.0000
RM | 0.0000 | 0.0000 | 0.0000

The left middle finger was the only finger that produced an EER above 0.0000. When Group C of the left middle finger was matched to itself, an EER of 0.2816 was observed. In the second matching run, Group D was matched to itself and the performance improved to 0.0506; the three highest quality scoring SAS thus improved the EER by 0.2310. The third matching run was an interoperable match, with Group C matched to Group D; this also produced an improvement over Group C matched to itself, at an EER of 0.1766. These results point to the conclusion that quality does affect performance.

V. CONCLUSIONS AND RECOMMENDATIONS

It should be noted that the distribution of SAS does differ from finger to finger. Subsequent work would examine other sensors and draw conclusions across devices. Furthermore, additional work is being conducted by O'Connor on the development of a metric to determine whether a subject is stable in their presentation; that is, it addresses the question of whether to take additional measurements given prior knowledge of the individual's performance within a given dataset [14].

Further work could also identify test administrator error and provide an error-checking methodology for test administrators with respect to the number of interaction attempts and impressions conducted.

While controlled laboratory-style testing may not be impacted by this preliminary work, these results provide guidance for operational data collections by answering the initial motivation of the study. We can conclude that test personnel would not benefit from collecting the additional impressions (4, 5, and 6) of the LI, RI, and RM fingers, but would benefit marginally from collecting all six images of the LM. Furthermore, the quality metric may provide an additional tool in answering this question. Recall that the LM had the lowest-quality group of images. Upon further analysis, these impressions came from subjects 60, 77, and 88. Perhaps these poor image quality scores were caused by poor placement or age; the subjects' ages were 60, 66, and 23, respectively.

It should also be noted that, overall, the right index required more attempts to submit all six SAS. This is interesting, as it is assumed that the right index would be the more controllable finger for those with right-hand dominance; this needs additional research.

Additionally, this study will be furthered by observing these metrics over multiple visits in an attempt to measure habituation. Recall that both quality and performance improved from the first three impressions collected to the last three. This improvement could be an effect of using the device multiple times and becoming comfortable with it. The study from which these data were drawn is a multiple-visit study, so data will be available to observe this effect over multiple visits as well as over multiple uses per visit.

REFERENCES

[1] T. P. Chen, X. Jiang, and W. Y. Yau, "Fingerprint image quality analysis," in 2004 International Conference on Image Processing (ICIP '04), 2004, pp. 1253–1256.

[2] K. Ito, A. Morita, T. Aoki, T. Higuchi, H. Nakajima, and K. Kobayashi, “A fingerprint recognition algorithm using phase-based image matching for low-quality fingerprints,” in IEEE International Conference on Image Processing 2005, 2005, pp. 33–36.

[3] E. Kukula, S. Elliott, and V. Duffy, “The effects of human interaction on biometric system performance,” in First International Conference on Digital Human Modeling (ICDHM 2007), Held as Part of HCI International, 2007, pp. 904–914.

[4] A. Hicklin and R. Khanna, "The role of data quality in biometric systems," White Paper, Mitretek Systems, Feb. 2006.

[5] J. Fierrez-Aguilar, L. Munoz-Serrano, F. Alonso-Fernandez, and J. Ortega-Garcia, “On the effects of image quality degradation on minutiae- and ridge-based automatic fingerprint recognition,” in Proceedings 39th Annual 2005 International Carnahan Conference on Security Technology, 2005, pp. 79–82.

[6] S. K. Modi, S. J. Elliott, and H. Kim, “Statistical analysis of fingerprint sensor interoperability performance,” in 2009 IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems, 2009, pp. 1–6.

[7] C. Jin, H. Kim, X. Cui, E. Park, J. Kim, J. Hwang, and S. Elliott, “Comparative Assessment of Fingerprint Sample Quality Measures Based on Minutiae-Based Matching Performance,” in 2009 Second International Symposium on Electronic Commerce and Security, 2009, vol. 2, pp. 309–313.

[8] P. Grother and E. Tabassi, "Performance of biometric quality measures," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 531–543, Apr. 2007.

[9] S. J. Elliott and E. P. Kukula, "A definitional framework for the human/biometric sensor interaction model," in Biometric Technology for Human Identification VII, 2010, vol. 7667, p. 76670H.

[10] A. J. Mansfield and J. L. Wayman, "Best Practices in Testing and Reporting Performance of Biometric Devices, Version 2.01," National Physical Laboratory, Teddington, U.K., 2002.

[11] M. R. Young and S. J. Elliott, “Image Quality and Performance Based on Henry Classification and Finger Location,” in 2007 IEEE Workshop on Automatic Identification Advanced Technologies, 2007, pp. 51–56.

[12] M. Theofanos, S. Orandi, R. Micheals, B. Stanton, and N. Zhang, "Effects of Scanner Height on Fingerprint Capture," National Institute of Standards and Technology, Gaithersburg, MD, 2006.

[13] R. Cappelli, D. Maio, D. Maltoni, J. L. Wayman, and A. K. Jain, "Performance evaluation of fingerprint verification systems," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 1, pp. 3–18, Jan. 2006.

[14] K. J. O'Connor, "Examination of stability in fingerprint recognition across force levels," M.S. thesis, Dept. of Technology, Leadership, and Innovation, Purdue Univ., West Lafayette, IN, 2013.