Strategies for Managing Missing or Incomplete Data in Biometric and Business Applications
1
Strategies for Managing Missing or Incomplete Data in Biometric and
Business Applications
Mark Ritzmann
Pace University
March 17, 2007
2
Contents
• Overview
• Essence and Significance of Work
• Experiment Design
• Outcomes
• Analysis
• Future Work
4
Overview: Essence of this work
• Address the problem of missing or incomplete data and put forth strategies to overcome it
• Improve the accuracy of the existing Keystroke Biometric Recognition System
• Apply the findings to other application areas
5
Overview: The Impact of Missing Data
• <1%: considered trivial
• 1-5%: considered manageable
• 5-15%: requires sophisticated methods
• >15%: may severely impact any interpretation
P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006
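A minimal sketch of how the four bands could be applied to a column of readings. The function names, the use of None to mark a missing value, and the handling of the exact 5% and 15% boundaries are assumptions made for illustration; the slide does not say which band a boundary value belongs to.

```python
def missing_fraction(values):
    """Return the fraction of entries that are missing (None)."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(v is None for v in values) / len(values)


def severity(fraction):
    """Map a missing fraction (0.0 to 1.0) to one of the four bands."""
    pct = fraction * 100
    if pct < 1:
        return "trivial"
    if pct <= 5:       # assumption: exactly 5% counts as manageable
        return "manageable"
    if pct <= 15:      # assumption: exactly 15% still counts here
        return "requires sophisticated methods"
    return "may severely impact interpretation"


# Hypothetical key-press durations with two of ten readings missing.
durations = [112, None, 98, 105, None, 101, 97, 110, 103, 99]
print(severity(missing_fraction(durations)))
```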
6
Overview: Missing Data Mechanisms
• MCAR – Missing Completely At Random
• MAR – Missing At Random
• NMAR – Not Missing At Random
Most missing data treatment methods assume the data are MAR
P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006
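The three mechanisms can be illustrated on toy data. This is a sketch only: each row is a hypothetical pair (x, y) in which x is always observed and y may go missing, and the deterministic cutoffs used for MAR and NMAR are simplified, extreme cases chosen so the difference is easy to see.

```python
import random


def mcar(rows, p, rng):
    """MCAR: y is dropped with the same probability p in every row."""
    return [(x, None if rng.random() < p else y) for x, y in rows]


def mar(rows, cutoff):
    """MAR: whether y is dropped depends only on the observed x."""
    return [(x, None if x > cutoff else y) for x, y in rows]


def nmar(rows, cutoff):
    """NMAR: whether y is dropped depends on the unobserved y itself."""
    return [(x, None if y > cutoff else y) for x, y in rows]


def count_missing(rows):
    return sum(y is None for _, y in rows)


rows = [(i, i * 10) for i in range(10)]
rng = random.Random(0)  # seeded so the MCAR example is repeatable
print(count_missing(mcar(rows, 0.3, rng)))
print(count_missing(mar(rows, 6)))
print(count_missing(nmar(rows, 50)))
```

Under NMAR the surviving y values are all small, so any mean computed from them is biased; that is why methods that assume MAR can mislead when the assumption does not hold.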
7
Overview: Missing Data Treatment, High Level
Heuristic
• Based on established rules and guidelines
• Similar to an expert system
• Association is the prime example
Statistical
• Existing data used to calculate missing data
• Care needs to be taken not to overfit
• Mean/mode is the prime example
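Mean/mode imputation, the statistical example named above, can be sketched in a few lines. The helper names and the use of None for a missing value are assumptions for illustration: the mean fills numeric gaps and the mode fills categorical ones.

```python
from statistics import mean, mode


def impute_mean(values):
    """Replace each None with the mean of the observed numeric values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace each None with the most common observed value."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


print(impute_mean([100, None, 120]))
print(impute_mode(["desktop", "laptop", None, "desktop"]))
```

Every gap receives the same value, so the filled column has less spread than the true data; this is one reason the slide warns about overfitting.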
8
Overview: Missing Data Treatment Methods
• Case Deletion
• Parameter Estimation
• Mean/Mode Imputation
• Method of Assigning All Possible Values of the Attribute
• Regression Imputation
• Hot Deck Imputation and Cold Deck Imputation
• Multiple Imputation
• K-Nearest Neighbor Imputation
• Internal Treatment Method
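Two of the listed methods, case deletion and k-nearest neighbor imputation, are contrasted in the sketch below. It assumes rows are lists of numbers with None for a missing entry, Euclidean distance over the columns that are present, and a plain average of the k neighbors; none of these choices come from the presentation.

```python
import math


def case_deletion(rows):
    """Keep only the rows that have no missing (None) entries."""
    return [list(r) for r in rows if None not in r]


def knn_impute(rows, k=2):
    """Fill each None with that column's mean over the k nearest complete rows."""
    complete = case_deletion(rows)
    if not complete:
        raise ValueError("need at least one complete row")
    result = []
    for row in rows:
        known = [i for i, v in enumerate(row) if v is not None]

        def distance(other):
            # Distance uses only the columns observed in this row.
            return math.dist([row[i] for i in known], [other[i] for i in known])

        nearest = sorted(complete, key=distance)[:k]
        result.append([
            v if v is not None else sum(n[i] for n in nearest) / len(nearest)
            for i, v in enumerate(row)
        ])
    return result


rows = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.5, None]]
print(len(case_deletion(rows)))   # the incomplete row is discarded
print(knn_impute(rows, k=2)[3])   # the incomplete row is kept and filled
```

Case deletion shrinks the sample, while imputation keeps every row at the cost of introducing estimated values.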
9
Overview: Biometric background
• Roots in CIA & Dept of Defense work
• Early issues – technology, cost, lack of standards
• Basic uses
– Verification (easier of the two; yes/no)
– Identification (harder of the two; 1 of n)
• Basic types
– Physiological – generally do not change
– Behavioral – can change, easier to mimic
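The difference between the two basic uses can be shown in a short sketch. It assumes every typing sample has already been reduced to a numeric feature vector and that closeness is measured by Euclidean distance; the function names, the threshold and the distance measure are illustrative assumptions, not details of the system described in this presentation.

```python
import math


def verify(sample, template, threshold):
    """Verification: a yes/no answer against one claimed identity."""
    return math.dist(sample, template) <= threshold


def identify(sample, templates):
    """Identification: pick the closest of n enrolled identities."""
    return min(templates, key=lambda name: math.dist(sample, templates[name]))


# Hypothetical enrolled templates (mean key duration, mean transition time).
templates = {"alice": [100.0, 40.0], "bob": [140.0, 75.0], "carol": [90.0, 60.0]}
sample = [102.0, 43.0]

print(verify(sample, templates["alice"], threshold=10.0))
print(identify(sample, templates))
```

Verification compares against a single template, while identification must separate the sample from every enrolled user, which is why it is the harder of the two.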
10
Overview: Biometric Issues
Biometrics: Challenges & Caveats
Operational
• Lab vs Field
• Scalability
• Continuous Authentication
• Security
System
• Business Process
• Design
• Control
• Enrollment Challenge
• System Downtime
• Availability of template database
• Effects of malicious code
Business
• Financial feasibility
• Interaction with traditional controls
• Application not subject to rigor
• Incompatibility with business partners
• Transition to e-business
• Control locus
People
• User confidence
• Privacy issues
• User preferences
• User acceptance
• User profile
• Trust
Legal & Regulatory
• Lack of precedence
• Ambiguous process
• Imprecise definition
• Logistics of proof of defense
Technical
• Adaptation
• Hardware
• Evolving nature of technology
• Scattered proliferation & polarization
• Uniqueness of biometric
• Scalability
A. Chandra & T. Calderon, Challenges and Constraints to the Diffusion of Biometrics in Information Systems, Communications of the ACM, December 2005, Vol 48, No 12
11
Overview: Privacy Issues – special mention
• Opt in/Opt out
– Any application or web site that used this system would need to do so with full disclosure. The user could then knowingly decide.
• Dictated environment
– Any corporate or instructional e-mail system where the ultimate ownership of the keystrokes resides with that entity
• Capture results, not text itself
– Use keystrokes to authenticate/identify, not the words themselves or the intact messages
12
Overview: Keyboard Biometric Studies in the Literature
• Key Concepts
– Copy vs Free
– Authentication vs Identification
• Classic Studies
– Gaines, 1980
– Umphress & Williams, 1985
– Leggett & Williams, 1988
– Joyce & Gupta, 1990
– Bleha et al, 1990
– Brown & Rogers, 1993
• Recent Studies
– University of Torino
– Pace University contributions
14
Essence and Significance of Work: High Level Objectives
1. Improve the accuracy of the current Keystroke Biometric Recognition System
2. Develop strategies to manage the significant problem of missing or incomplete data
3. Apply findings to other areas
15
Essence and Significance of Work: Detailed Objectives
First Objective:
• Improve the accuracy of the current Keystroke Biometric Recognition System by improving the FALLBACK model invoked when a sample is of insufficient size
Second Objective:
• Gain insight into the effectiveness and application of MISSING DATA strategies and decision making with incomplete information
Third Objective:
• Identify a potential application for a Keystroke Biometric Recognition System
• Project the findings to other potential areas
17
Experiment Design: Re-use of assets from previous Pace work
• Data set
• Features/feature extraction
• Tests
• Optimal settings
18
Experiment Design: Future inclusion?
19
Experiment Design: 6 Test scenarios
Dr. Mary Vilani, Spring 2006. Used with permission.
20
Experiment Design: Feature set
Dr. Mary Vilani, Spring 2006. Used with permission.
21
Experiment Design: Summary of Subject Participation
Subjects by Experiment
36 subjects   All four quadrants
52 subjects   1. Copy Task
40 subjects   2. Free Text
93 subjects   3. Desktop
47 subjects   4. Laptop
41 subjects   5. Desk Copy / Lap Free
40 subjects   6. Lap Copy / Desk Free
Dr. Mary Vilani, Spring 2006. Used with permission.
22
Experiment Design: Data/Sample Capture Application
Dr. Mary Vilani, Spring 2006. Used with permission.
23
Experiment Design: Application Version 2.0 - developed Fall 2006
• Development and implementation of 2 additional Fallback Models
• Tremendously enhanced testing functionality
• Development and implementation of Trace Mechanism
24
Experiment Design: New Bio Feature Extractor Interface
25
Experiment Design: New Classifier Interface
26
Experiment Design: High Level Overview of Fallback Models
[Diagram labels: Heuristic, Statistical, Linguistic Model, Touch Type Model, Statistical Model, New Models]
27
Experiment Design: Overview of Models
[Diagram labels: Linguistic, Touch Type, Statistical]
28
Experiment Design: Linguistic Fallback Model - Duration
29
Experiment Design: Linguistic Fallback Model - Transition
30
Experiment Design: Touch Type Fallback Model - Background
• Touch Type approach invented by Frank Edgar McGurrin in the late 1800s
– Won speed contest on July 25, 1888
– Was front page news
• Touch Type idea - use sense of touch rather than sight (looking at key label)
• Most keyboards still have a raised indicator on “f” and “j” to indicate home position
31
Experiment Design: Touch Type Fallback Model
32
Experiment Design: Touch Type Fallback Model - Duration
All Keys
• All Left Hand
– Left Little: A Q Z 1
– Left Ring: S W X 2
– Left Middle: D E C 3
– Left Index: F G R T V B 4 5
• All Right Hand
– Right Index: H J Y U N M 6 7
– Right Middle: K I , 8
– Right Ring: L O . 9
– Right Little: ; P / 0
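The key, finger, hand and all-keys levels above suggest a lookup that widens its pool of samples until enough data is found. The sketch below assumes that a key with too few samples borrows from the next level up; the min_samples threshold, the function names and the use of a simple mean are hypothetical and are not taken from the presentation.

```python
from statistics import mean

# Key-to-finger assignment as listed on the slide.
FINGERS = {
    "left_little": "AQZ1", "left_ring": "SWX2", "left_middle": "DEC3",
    "left_index": "FGRTVB45", "right_index": "HJYUNM67",
    "right_middle": "KI,8", "right_ring": "LO.9", "right_little": ";P/0",
}


def fallback_levels(key):
    """Return the key sets to try, narrowest first: key, finger, hand, all."""
    finger = next((name for name, keys in FINGERS.items() if key in keys), None)
    if finger is None:
        raise KeyError(key)
    hand = finger.split("_")[0]
    hand_keys = {k for name, keys in FINGERS.items()
                 if name.startswith(hand) for k in keys}
    all_keys = {k for keys in FINGERS.values() for k in keys}
    return [{key}, set(FINGERS[finger]), hand_keys, all_keys]


def duration_estimate(key, samples, min_samples=5):
    """Mean duration from the first level holding at least min_samples values."""
    for level in fallback_levels(key):
        pooled = [d for k in level for d in samples.get(k, [])]
        if len(pooled) >= min_samples:
            return mean(pooled)
    return None  # not enough data even across all keys


# Hypothetical durations in milliseconds, keyed by single uppercase character.
samples = {"A": [100], "Q": [110, 90], "S": [80, 80, 80]}
print(duration_estimate("A", samples, min_samples=3))  # pooled over the left little finger
print(duration_estimate("H", samples, min_samples=3))  # falls back to all keys
```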
33
Experiment Design: Touch Type Fallback Model - Transition
Letter/letter
• Left/left: E/A, R/E, A/T, E/S, S/T, E/R
• Right/right: O/N, I/N
• Left/right: T/I, E/N, A/N, T/H
• Right/left: O/R, N/D, H/E
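A transition between two keys can be handled the same way, falling back from the specific letter pair to its hand pair and then to any pair. This is a sketch under assumptions: the left and right key sets follow the standard touch-typing layout, and the min_samples threshold, the function names and the pooled mean are hypothetical.

```python
from statistics import mean

LEFT = set("AQZ1SWX2DEC3FGRTVB45")
RIGHT = set("HJYUNM67KI,8LO.9;P/0")


def hand(key):
    """Return 'left' or 'right' for a single key."""
    key = key.upper()
    if key in LEFT:
        return "left"
    if key in RIGHT:
        return "right"
    raise KeyError(key)


def transition_estimate(first, second, samples, min_samples=5):
    """Mean transition time from the narrowest level with enough samples."""
    pair = (first.upper(), second.upper())
    hands = (hand(first), hand(second))
    levels = [
        [pair],                                                       # e.g. T/H
        [p for p in samples if (hand(p[0]), hand(p[1])) == hands],    # e.g. left/right
        list(samples),                                                # letter/letter
    ]
    for level in levels:
        pooled = [t for p in level for t in samples.get(p, [])]
        if len(pooled) >= min_samples:
            return mean(pooled)
    return None


# Hypothetical transition times in milliseconds, keyed by uppercase pairs.
samples = {("T", "H"): [50], ("E", "N"): [60, 70], ("O", "N"): [200, 220]}
print(transition_estimate("T", "H", samples, min_samples=3))  # pooled over left/right pairs
print(transition_estimate("I", "N", samples, min_samples=3))  # falls back to all pairs
```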
34
Experiment Design: Statistical Fallback Model
• For Duration – Mean Imputation
• For Transition – Multiple Imputation
– Mean and standard deviation calculated on the full transition data set
– Any value more than 1 standard deviation from the mean was removed
– New mean and standard deviation calculated on the remaining data
– Process repeated 3 times
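The outlier removal steps for transitions can be sketched as a short loop. Two points are assumptions because the slide does not specify them: the sample standard deviation is used, and "repeated 3 times" is read as three trimming passes in total. The function name and test data are hypothetical.

```python
from statistics import mean, stdev


def trimmed_stats(values, passes=3, width=1.0):
    """Repeatedly drop values more than `width` standard deviations from the mean.

    Returns (mean, standard deviation, number of values kept).
    """
    data = list(values)
    for _ in range(passes):
        if len(data) < 2:
            break  # a standard deviation needs at least two values
        m, s = mean(data), stdev(data)
        data = [v for v in data if abs(v - m) <= width * s]
    spread = stdev(data) if len(data) > 1 else 0.0
    return mean(data), spread, len(data)


# Hypothetical transition times in milliseconds with one long pause.
times = [100, 104, 96, 101, 100, 300]
print(trimmed_stats(times, passes=1))
print(trimmed_stats(times, passes=3))
```

With a 1 standard deviation band each pass removes a sizable share of the data, so the sample shrinks quickly and the number of values kept is worth tracking alongside the mean.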
35
Experiment Design: Statistical Fallback Model – Duration Clusters
36
Experiment Design: Statistical Fallback Model - Duration
[Diagram: the keys A, S, W, D, E, C, F, G, R, T, B, H, Y, U, N, M, I, ",", L, O, ".", "‘", P and "-" are grouped into CLUSTER 1 through CLUSTER 9; the remaining labels are NODE A, NODE B, UNDER 100, OVER 100 and All Keys]
37
Experiment Design: Statistical Fallback Model – Transition development
[Chart: data compacting process; labels: sample size, % of sample left after outlier wash, 100%]
38
Experiment Design: Statistical Fallback Model – Transition, Raw Order
39
Experiment Design: Statistical Fallback Model – Transition, Cluster Development
40
Experiment Design: Statistical Fallback Model - Transition
[Diagram: the transitions E-R, R-E, T-H, O-R, O-N, E-N, A-T, T-I, E-S, A-N, H-E, N-D, E-A, S-T and I-N appear with the labels Node 1 to Node 4, Node A to Node D, Under 50, Over 50 and Any/Any]
42
Outcomes: Results Comparison
44
Analysis: Fallback Trace
45
Experiment Design: Linguistic Fallback Model – Duration (repeat of previous)
46
Analysis: Proposed Second Generation Touch Type Fallback Model - Duration
All Keys
• All Left Hand
– Left Little: A Q Z
– Left Ring: S W X
– Left Middle: D E C
– Left Index: F G R T V B
• All Right Hand
– Right Index: H J Y U N M
– Right Middle: K I
– Right Ring: L O
– Right Little: P
Red circles remain as leaves; all else falls back to the next level
48
Future Work: Two Main Areas
• Academic
– Hybrid system development – keystroke, mouse movement, stylistic
– Principal Components
– Eigenvalues
• Application
– For Keystroke Biometric system: academic online testing; Biometric Marketing
– For general missing data, analytical applications
49
Future Work: Key success factors to system acceptance
• Robustness – level of trust
• Acceptance Level – support by third party processes
• Cost – hardware/software, communications and support
• Ease of Use/Portability – extent of support across client machines
• Security – privacy, integrity, and non-repudiation
“future research into the use of biometric technology in online marketing applications must consider not only technical feasibility, but also social and legal acceptability.”
50
Future Work: Biometric Marketing
• Use of biometric technology to identify and segment users/consumers
• What you have to believe:
– Segmentation is better
– Short + short + short = long for sampling
• Chat rooms, e-mails, etc.
51
Future Work: Analytical Applications
• Currently growing in use and acceptance
• It can be assumed that the missing data problem is present
– SAP considers <10% a non-factor
– IBM identifies missing data, but does not manage it
– Case deletion most prevalent
– Advanced strategies not identified
52
Future Work: Analytical Applications - Examples
53
Future Work: Analytical Applications - Examples
54
Future Work: Analytical Applications - Examples
Store Operations Management
• Activity Based Costing Analysis
• Location Exposure
• Location Profitability
• Loss Prevention Analysis
• Store Location Analysis
• Store Optimization Analysis
• Suspicious Activity Analysis
Corporate Finance Management
• Capital Allocation Analysis
• Credit Risk Analysis
• Financial Management Accounting
• Income Analysis
• Non Performing Loan Analysis
• Organization Unit Profitability
• Performance Measurement
• Staffing Analysis
Customer Management
• Campaign & Promotion Analysis
• Cross Purchase Behavior
• Cross Sell Analysis
• Customer Attrition Analysis
• Customer Complaints Analysis
• Customer Credit Risk Profile
• Customer Delinquency Analysis
• Customer Interaction Analysis
• Customer Lifetime Value Analysis
• Customer Loyalty
• Customer Movement Dynamics
• Customer Profile Analysis
• Customer Profitability
• Involved Party Exposure
• Lead Analysis
• Market Analysis
Products & Services Management
• Business Performance Analysis
• Planning and Forecasting Analysis
• Product Analysis
• Product Profitability
• Service Delivery Analysis
• Transaction Profitability Analysis
• Vendor Performance Analysis
Merchandising Management
• Assortment and Allocation Analysis
• Inventory Analysis
• Physical Merchandising / Space Management Analysis
• Pricing Analysis
• Promotion Analysis