Missing Data Ppt

Missing data & how to handle it, Arooj Arshad, PhD Scholar

description

This presentation is about missing data and the appropriate techniques for handling it.

Transcript of Missing Data Ppt

Page 1: Missing Data Ppt

7/18/2019 Missing Data Ppt

http://slidepdf.com/reader/full/missing-data-ppt 1/35

Missing data & how to handle it

Arooj Arshad

PhD Scholar

Page 2: Missing Data Ppt


• Carol Dweck, based on research on belief systems and their role in motivation and achievement, has made a key contribution in originating and explaining implicit theories of intelligence/ability.

Missing data and how to handle it

Goals

• Discuss ways to evaluate and understand missing data
• Discuss common missing data methods
• Know the advantages and disadvantages of common methods
• Treatment of the missing data
• Efficient ways of missing data handling

Page 3: Missing Data Ppt


Reasons for Missing Data

Missing data can occur for many reasons:

• Participants can fail to respond to questions (legitimately or illegitimately; more on that later),
• Equipment and data collecting or recording mechanisms can malfunction,
• Subjects can withdraw from studies before they are completed,
• Data entry errors can occur.

Page 4: Missing Data Ppt


Difference between missing and legitimate missing data

Missing Data

If any data on any variable from any participant is not present, the researcher is dealing with missing or incomplete data.

Example: A missing response on a particular item that assesses a particular construct.

Legitimate Missing Data

Legitimate missing data is an absence of data when it is appropriate for there to be an absence.

Example: A survey asks whether you are married and, if so, how long you have been married. If you say you are not married, it is legitimate for you to skip the follow-up question.

(Cole, 2008)
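The marriage example can be sketched in code. This is only an illustration: the table, the column names `married` and `years_married`, and the responses are hypothetical, and pandas is assumed as the tool. It flags a blank follow-up answer as a legitimate skip when the respondent is not married, and as truly missing otherwise.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses; column names are illustrative only.
survey = pd.DataFrame({
    "married": ["yes", "no", "yes", "no", "yes"],
    "years_married": [12.0, np.nan, np.nan, np.nan, 3.0],
})

# A blank follow-up answer is legitimate when the respondent is not married.
blank = survey["years_married"].isna()
survey["legitimate_skip"] = blank & (survey["married"] == "no")
survey["truly_missing"] = blank & (survey["married"] == "yes")

n_legitimate = int(survey["legitimate_skip"].sum())
n_truly_missing = int(survey["truly_missing"].sum())
print(n_legitimate, n_truly_missing)
```

Only the rows flagged as truly missing would need one of the handling methods discussed later.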

Page 5: Missing Data Ppt


• Methods for analyzing missing data require assumptions about the nature of the data and about the reasons for the missing observations that are often not acknowledged.

• Reviewing the stages of data collection, data preparation, data analysis, and interpretation of results will highlight the issues that researchers must consider in making a decision about how to handle missing data in their work.

Page 6: Missing Data Ppt


• Key Elements of missingness

• The number of cases missing per variable
• The number of variables missing per case.
• The pattern of correlation among variables.
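These three quantities can be computed directly. The sketch below assumes pandas and a small made-up data set. Reading the "pattern of correlation" as the correlation between missingness indicators is an assumption made here for illustration, not something the slide specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical data set with three variables and five cases.
data = pd.DataFrame({
    "age": [25, np.nan, 31, 40, np.nan],
    "income": [50.0, 42.0, np.nan, 61.0, np.nan],
    "score": [3.0, 4.0, 5.0, 2.0, 1.0],
})

missing = data.isna()
missing_per_variable = missing.sum(axis=0)   # number of cases missing per variable
missing_per_case = missing.sum(axis=1)       # number of variables missing per case

# Correlation between the missingness indicators of the incomplete variables.
# Values far from zero suggest that variables tend to be missing together.
incomplete = missing.loc[:, missing.any(axis=0)].astype(int)
indicator_corr = incomplete.corr()

print(missing_per_variable, missing_per_case, indicator_corr, sep="\n\n")
```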

Page 7: Missing Data Ppt


• Point to be remembered

• All researchers should examine their data for missingness, and researchers wanting the best (i.e., the most replicable and generalizable) results from their research need to be prepared to deal with missing data in the most appropriate and desirable way possible.

If the proportion of cases with missing data is small, say five percent or less, listwise deletion may be acceptable (Roth, 1994). If 5% (or fewer) cases are not missing completely at random, inconsistent parameter estimates can result. Otherwise, missing data experts (Little & Rubin, 1987) recommend using a ML method for analysis, a method that makes use of all available data points.

Page 8: Missing Data Ppt


Nature of Missingness

• Missing Completely at Random (MCAR)
• Probability of the missing data on Y is unrelated to Y and X. Missingness is random and does not depend on anything.
• Example: the reporting of income by the respondents.
• Checked with the help of Little's MCAR test. The test is based on mean differences across groups of subjects with the same missing data pattern. Readers interested in more detail should read this article (Shenoi et al., 2012).

• Missing at Random (MAR)
• Probability of missing data on Y is related to X but, given X, not to Y itself.
• Example: for really sick patients, clinicians may not draw blood for routine labs.

• Missing Not at Random (MNAR)
• Probability of missing data on Y is dependent on the value of Y.
• Example: Respondents with high income are less likely to report income.
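A small simulation can make the three mechanisms concrete. Everything in this sketch is an assumption chosen for illustration (the data-generating model, the logistic missingness probabilities, the sample size). It shows how the mean of the observed Y values behaves when Y is deleted under each mechanism. The true mean of Y is 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)            # fully observed covariate X
y = x + rng.normal(size=n)        # variable Y that will have missing values


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Probability that Y is missing under each mechanism (illustrative choices).
p_mcar = np.full(n, 0.3)          # depends on nothing
p_mar = sigmoid(2.0 * x)          # depends only on the observed X
p_mnar = sigmoid(2.0 * y)         # depends on the value of Y itself

observed_means = {}
for name, p in [("MCAR", p_mcar), ("MAR", p_mar), ("MNAR", p_mnar)]:
    is_missing = rng.random(n) < p
    observed_means[name] = y[~is_missing].mean()

print(observed_means)
```

Under MCAR the observed mean stays close to 0. Under MAR and MNAR the naive observed mean is pulled downward. In the MAR case the information needed to correct it is contained in X, whereas in the MNAR case it is not.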

Page 9: Missing Data Ppt


Missing Data Consequences

Bias
• Estimate systematically deviates from the quantity of interest.
• No bias if the data is MCAR, but bias can occur if it is not MCAR.
• Lost data decreases statistical power.

Variance
• Missing data can sometimes lead to wrong standard errors.
• Wrong study conclusions about the relationship of variables to outcomes.

(Roth, 2001)

Page 10: Missing Data Ppt


Commonly Used Missing Data Handling Methods

Page 11: Missing Data Ppt


Commonly Used Missing Data Methods

Deletion Methods
• Listwise/complete case deletion, pairwise deletion

Single Imputation Methods
• Mean/mode substitution, dummy variable method, single regression, Hot Deck Imputation

Model Based Methods
• Maximum Likelihood, Multiple imputation

Page 12: Missing Data Ppt


• Deletion Method

Page 13: Missing Data Ppt


Listwise Deletion (Complete Case Analysis)

• Only analyze cases with complete data, dropping every case that has a missing value on any variable.
• When a researcher is estimating a model, such as a linear regression, most statistical packages use listwise deletion by default.

(Cole, 2008)

Page 14: Missing Data Ppt


Listwise Deletion (Complete Case Analysis)

• Advantages
• Ease of implementation.
• Comparability across analyses

• Disadvantages
• Reduces statistical power (because it lowers n, a researcher cannot anticipate if an adequate amount of data will remain for the analysis).
• Doesn't use all information
• Estimates may be biased if data isn't MCAR (complete case analysis assumes that the observed complete cases are a random sample of the originally targeted sample, or in Rubin's (1976) terminology, that the missing data are MCAR)
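Listwise deletion is short to express in code, which also makes its cost visible. The sketch below assumes pandas and a made-up data set with hypothetical column names. Each of three cases is missing a different variable, so half of the sample is discarded.

```python
import numpy as np
import pandas as pd

# Hypothetical data; any row with at least one missing value is dropped.
data = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "x2": [2.0, np.nan, 3.0, 5.0, 4.0, 7.0],
    "y": [1.5, 2.5, 3.5, np.nan, 5.5, 6.5],
})

complete_cases = data.dropna()            # listwise (complete case) deletion
n_before, n_after = len(data), len(complete_cases)
share_dropped = 1 - n_after / n_before

print(n_before, n_after, share_dropped)
```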

Page 15: Missing Data Ppt


• Pairwise deletion (Available Case Analysis)

• Analysis with all cases in which the variables of interest are present.

• Advantages:
• Keeps as many cases as possible for each analysis.
• Uses all information possible with each analysis.

• Disadvantage:
• Can't compare analyses because the sample is different each time.

(Cole, 2008)
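As an illustration of pairwise deletion, the sketch below uses pandas, whose `DataFrame.corr` excludes missing values pair by pair. The data set and column names are made up. The second table counts how many cases contribute to each pair of variables.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each of three variables.
data = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "x2": [2.0, np.nan, 3.0, 5.0, 4.0, 7.0],
    "y": [1.5, 2.5, 3.5, np.nan, 5.5, 6.5],
})

# Each correlation uses every case where both variables are observed.
pairwise_corr = data.corr()

# Number of cases available for each pair of variables.
observed = data.notna().astype(int)
pairwise_n = observed.T @ observed

print(pairwise_corr, pairwise_n, sep="\n\n")
```

Each pair here is based on four cases, but not the same four, while listwise deletion would keep only three cases for every analysis.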

Page 16: Missing Data Ppt


Hot Deck Imputation

• The researcher replaces a missing value with the actual score from a similar case in the current data set.
• The imputed score is termed "hot" because it is used by the computer.

Advantages
• Tends to increase accuracy because missing data values are replaced by realistic values.
• Particularly helpful when data are missing in certain patterns

Disadvantages
• The number of classification variables may become unmanageable in large surveys.
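A very simple form of hot deck imputation can be sketched as follows. The helper `hot_deck`, the data, and the choice to define "similar case" as "same value of one classification variable" are all assumptions made for illustration. Practical hot deck procedures usually match on several variables.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical survey; "group" plays the role of the classification variable.
data = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "score": [10.0, np.nan, 12.0, 30.0, 31.0, np.nan],
})


def hot_deck(df, value_col, class_col, rng):
    """Fill each missing value with an observed value from a donor in the same class."""
    out = df.copy()
    for _, part in df.groupby(class_col):
        values = part[value_col]
        donors = values.dropna().to_numpy()
        recipients = values.index[values.isna().to_numpy()]
        if len(donors) and len(recipients):
            out.loc[recipients, value_col] = rng.choice(donors, size=len(recipients))
    return out


imputed = hot_deck(data, "score", "group", rng)
print(imputed)
```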

Page 17: Missing Data Ppt


Single Imputation Methods

Page 18: Missing Data Ppt


Single Imputation Methods

• Mean/Mode substitution
• Dummy variable control
• Conditional mean substitution

Page 19: Missing Data Ppt


Mean/Mode Substitution

• Replace missing value with sample mean or mode
• Run analyses as if all cases were complete

Advantages
• Can use complete case analysis methods

Disadvantages
• Reduces variability (underestimates standard errors).
• Weakens covariance and correlation estimates in the data (because it ignores the relationship between variables)

Page 20: Missing Data Ppt


• Computed variance estimates decrease as more means are added to the calculations.

• For example, a researcher might have 30 subjects, but 5 have missing data. Through mean substitution we add 5 means to the 25 scores. This would increase the N in the calculation of the variance but would not increase the deviations around the mean.
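The 30-subject example can be checked numerically. The scores below are simulated and purely illustrative. The point is that the sum of squared deviations is unchanged by adding 5 copies of the mean, while the divisor grows from 24 to 29, so the variance shrinks by the factor 24/29.

```python
import numpy as np

rng = np.random.default_rng(2)
scores = rng.normal(loc=50, scale=10, size=30)   # hypothetical scores for 30 subjects
observed = scores[:25]                           # pretend the last 5 are missing

# Mean substitution: add 5 copies of the observed mean to the 25 scores.
filled = np.concatenate([observed, np.full(5, observed.mean())])

var_observed = observed.var(ddof=1)   # sum of squares / (25 - 1)
var_filled = filled.var(ddof=1)       # same sum of squares / (30 - 1)

print(var_observed, var_filled, var_filled / var_observed)
```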

Page 21: Missing Data Ppt


• Mean substitution is worth considering when correlations between variables in the data are low and less than 10% of the data are missing (Donner, 1982).

Page 22: Missing Data Ppt


Dummy Variable Adjustment

• Create an indicator for missing value (1 = value is missing for observation; 0 = value is observed for observation)
• Impute missing values to a constant (such as the mean)

Advantage
• Uses all available information about missing observation

Disadvantages
• Results in biased estimates
• Not theoretically driven
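The two steps of dummy variable adjustment can be sketched as follows, using pandas and NumPy on a made-up data set. The column names and the least-squares fit at the end are illustrative assumptions. The fit only shows how the indicator enters a regression alongside the filled-in predictor.

```python
import numpy as np
import pandas as pd

# Hypothetical data: predictor x has two missing values, outcome y is complete.
data = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, np.nan, 5.0],
    "y": [2.0, 6.5, 6.0, 7.5, 10.0],
})

# Step 1: indicator, 1 = value is missing, 0 = value is observed.
data["x_missing"] = data["x"].isna().astype(int)

# Step 2: impute the missing values to a constant, here the observed mean.
data["x_filled"] = data["x"].fillna(data["x"].mean())

# Regress y on the filled predictor and the indicator (ordinary least squares).
X = np.column_stack([np.ones(len(data)), data["x_filled"], data["x_missing"]])
coef, *_ = np.linalg.lstsq(X, data["y"].to_numpy(), rcond=None)

print(data)
print(coef)   # intercept, slope for x_filled, coefficient of the indicator
```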

Page 23: Missing Data Ppt


Regression Imputation

• Replaces missing values with predicted score from a regression equation.

Advantage:
• Uses information from observed data

Disadvantages:
• Overestimates model fit and correlation estimates
• Weakens variance
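A minimal sketch of regression imputation, assuming a single predictor, simulated data, and NumPy's `polyfit` for the regression. It also illustrates the first disadvantage: because every imputed value lies exactly on the fitted line, the correlation after imputation is at least as large as the correlation among the complete cases.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

y_missing = np.zeros(n, dtype=bool)
y_missing[:60] = True                    # pretend the first 60 y values are missing

# Fit the regression of y on x using the complete cases only.
slope, intercept = np.polyfit(x[~y_missing], y[~y_missing], deg=1)

# Replace each missing y with its predicted score.
y_imputed = y.copy()
y_imputed[y_missing] = intercept + slope * x[y_missing]

corr_complete = np.corrcoef(x[~y_missing], y[~y_missing])[0, 1]
corr_imputed = np.corrcoef(x, y_imputed)[0, 1]

print(corr_complete, corr_imputed)
```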

Page 24: Missing Data Ppt


Model Based Methods

Page 25: Missing Data Ppt


Model Based Methods

• Maximum Likelihood using the EM algorithm
• Multiple imputation
• These methods share two assumptions: that the joint distribution of the data is multivariate normal, and that the missing data mechanism is ignorable.

Page 26: Missing Data Ppt


Maximum Likelihood Using EM Algorithm

• Identifies the set of parameter values that produces the highest log-likelihood.
• ML estimate: value that is most likely to have resulted in the observed data
• Conceptually, the process is the same with or without missing data

Advantages:
• Uses full information (both complete cases and incomplete cases) to calculate the log likelihood
• Unbiased parameter estimates with MCAR/MAR data

Disadvantages
• SEs biased downward; can be adjusted by using the observed information matrix

Page 27: Missing Data Ppt


• We can base estimation on the likelihood of the observed data.
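To make the idea of maximizing the observed-data likelihood concrete, here is a minimal EM sketch for the simplest case: two jointly normal variables, X fully observed and Y partly missing. The function `em_bivariate_normal`, the simulated data, and the MAR missingness model are assumptions for illustration only. General-purpose software handles arbitrary patterns and more variables.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.normal(loc=1.0, scale=1.0, size=n)
y = 0.5 + 2.0 * x + rng.normal(size=n)

# MAR: the chance that y is missing depends only on the observed x.
is_missing = rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 1.0)))
y_obs = np.where(is_missing, np.nan, y)


def em_bivariate_normal(x, y, n_iter=300):
    """EM estimates of the mean and covariance of (x, y) when some y are NaN."""
    miss = np.isnan(y)
    mu = np.array([x.mean(), np.nanmean(y)])
    cov = np.cov(x[~miss], y[~miss])          # starting values from complete cases
    for _ in range(n_iter):
        # E-step: expected y and y^2 for incomplete cases, given x and current parameters.
        beta = cov[0, 1] / cov[0, 0]
        resid_var = cov[1, 1] - beta * cov[0, 1]
        ey = np.where(miss, mu[1] + beta * (x - mu[0]), y)
        ey2 = np.where(miss, ey**2 + resid_var, y**2)
        # M-step: update the parameters from the expected sufficient statistics.
        mu = np.array([x.mean(), ey.mean()])
        sxx = np.mean(x**2) - mu[0] ** 2
        sxy = np.mean(x * ey) - mu[0] * mu[1]
        syy = np.mean(ey2) - mu[1] ** 2
        cov = np.array([[sxx, sxy], [sxy, syy]])
    return mu, cov


mu_em, cov_em = em_bivariate_normal(x, y_obs)
print(mu_em)
print(cov_em)
```

The E-step fills in expected values rather than single guesses, and the extra `resid_var` term keeps the variance of Y from being understated the way it is under plain regression imputation.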

Page 28: Missing Data Ppt


Multiple Imputation

• Impute: Data is "filled in" with imputed values using a specified regression model
• This step is repeated m times, resulting in a separate dataset each time.
• Analyze: Analyses performed within each dataset
• Pool: Results pooled into one estimate
• The variance of the pooled estimate is computed with Donald Rubin's formula:
• V = W + (1 + 1/m) B
• W and B are the within-imputation and between-imputation variances.
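The pooling step can be written as a short function. The name `pool_rubin` and the five example estimates are hypothetical. The formula is the one given above, with W taken as the average of the squared standard errors and B as the sample variance of the m point estimates.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's formula."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()            # pooled point estimate
    W = variances.mean()                 # within-imputation variance
    B = estimates.var(ddof=1)            # between-imputation variance
    V = W + (1 + 1 / m) * B              # total variance
    return pooled, V


# Hypothetical results from m = 5 imputed data sets.
pooled, V = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04, 0.05, 0.04, 0.05, 0.04])
print(pooled, V, np.sqrt(V))
```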

Page 29: Missing Data Ppt


Multiple Imputation

• Advantages:
• Variability more accurate with multiple imputations for each missing value
• Considers variability due to sampling AND variability due to imputation

• Disadvantages:
• Cumbersome coding
• Room for error when specifying models

Page 30: Missing Data Ppt


Multiple Imputation

Using this likelihood function the ML procedure provides parameter estimates based on all available data, including the incomplete cases. However, simulation studies show that ML is an inadequate estimation technique for some small sample problems and results in biased estimates (Little and Rubin, 1989). For large samples ML is a preferred method for dealing with missing data (Schafer and Graham, 2002).

Page 31: Missing Data Ppt


Difference between EM algorithm and MI

• For the EM algorithm we substituted a predicted value on the basis of the variables that were available for each case. In multiple imputation we will do something similar, but will add error components to counteract the tendency of EM and Maximum Likelihood to underestimate standard errors.
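The role of the added error component can be illustrated with a simplified sketch. The data are simulated and the setup is an assumption. A proper multiple imputation procedure would also draw the regression parameters themselves, which is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

miss = np.zeros(n, dtype=bool)
miss[:100] = True                        # pretend these y values are missing

# Regression of y on x from the complete cases.
slope, intercept = np.polyfit(x[~miss], y[~miss], deg=1)
predicted = intercept + slope * x[miss]
resid_sd = np.std(y[~miss] - (intercept + slope * x[~miss]), ddof=2)

# Predicted-value imputation: every imputed value sits exactly on the line.
deterministic = predicted.copy()

# Multiple-imputation-style draws: add a random error component, m times.
m = 5
draws = [predicted + rng.normal(scale=resid_sd, size=predicted.size) for _ in range(m)]

spread_deterministic = np.std(deterministic - predicted)    # exactly 0
spread_draws = [np.std(d - predicted) for d in draws]       # roughly resid_sd

print(spread_deterministic, spread_draws)
```

The m data sets differ from one another only through these error draws, and that between-data-set variation is what feeds the between-imputation variance when results are pooled.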

Page 32: Missing Data Ppt


Roth, 1994

Page 33: Missing Data Ppt


Page 34: Missing Data Ppt


• References

Allison, P. D. (2001). Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences. Thousand Oaks: Sage.

Cole, J. C. (2008). How to deal with missing data. In J. W. Osborne (Ed.), Best practices in quantitative methods (214-238). Thousand Oaks, CA: Sage.

Enders, C. (2010). Applied Missing Data Analysis. Guilford Press: New York.

Little, R. J., & Rubin, D. B. (2002). Statistical Analysis with Missing Data. John Wiley & Sons, Inc: Hoboken.

Roth, P. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47, 537-560.

Schafer, J. L., & Graham, J. W. (2002). Missing Data: Our View of the State of the Art. Psychological Methods, 7(2), 147-177.

Page 35: Missing Data Ppt
