Copy of Data Analysis 08
Transcript of Copy of Data Analysis 08
-
8/14/2019 Copy of Data Analysis 08
Data Analysis
Florenda F. Cabatit, RN, MA
Facilitator
-
DATA ANALYSIS
Data analysis is the process by which information is rendered meaningful and intelligible (Polit and Hungler, 1995).
It is the systematic organization and synthesis of research data and the testing of research hypotheses using those data (2004).
-
Statistical Analysis
Quantitative analysis deals with numerical analysis of information.
It is the manipulation of numeric data through statistical procedures for the purpose of describing phenomena or assessing the magnitude and reliability of relationships among them.
Statistics is the scientific method used in quantitative analysis.
-
Statistics
Statistics helps to:
- Organize data
- Summarize data
- Evaluate data
- Present data in an easily understood form
-
Statistics
Two branches of statistics:
- Descriptive statistics: statistics used to describe and summarize data
- Inferential statistics: statistics that permit inferences on whether relationships observed in a sample are likely to occur in the larger population
-
Considerations in the choice of appropriate statistical methods
- The purpose of the research
- The level of measurement of the variables
- The number of groups/variables involved
- The type of groups being studied
-
Levels of Measurement
Nominal - the lowest level
- involves assigning numbers to classify characteristics into categories
- numeric codes assigned in nominal measurement do not convey quantitative information.
- the numbers are merely symbols that represent different values.
- categories must be mutually exclusive and collectively exhaustive.
-
Ordinal Measurement
This involves sorting objects on the basis of their relative standing or ranking on an attribute.
The numbers are not arbitrary: they signify incremental values. They do not, however, tell anything about how much greater one level is than another.
-
Interval Measurement
A measurement in which an attribute of a variable is rank ordered on a scale that has equal distances between points on that scale.
-
Ratio Scale
A quantitative measurement in which intervals are equal and there is a true zero point.
The highest level of measurement.
All arithmetic operations are permissible with this measurement (add, subtract, multiply, and divide numbers on this scale).
-
Descriptive Statistics
Three characteristics to fully describe a set of data:
- Shape of the distribution of values
- Central tendency
- Variability
-
Review of Descriptive Stats.
Descriptive statistics are used to present quantitative descriptions in a manageable form.
This method works by reducing lots of data into a simpler summary.
Examples:
- 37° Centigrade as average adult body temperature
- SU's quality-point system
-
Univariate Analysis
This is the examination across cases of one variable at a time.
Frequency distributions are used to group data.
One may set up margins that allow us to group cases into categories.
Examples include:
- Age categories
- Price categories
- Temperature categories
-
Distributions
Two ways to describe a univariate distribution:
- A table
- A graph (histogram, bar chart)
-
Distributions (cont.)
Distributions may also be displayed using percentages.
For example, one could use percentages to describe the following:
- Percentage of people under the poverty level
- Over a certain age
- Over a certain score on a standardized test
-
Distributions (cont.)
Category    Percent
Under 35    9%
36-45       21%
46-55       45%
56-65       19%
66+         6%
A Frequency Distribution Table
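A frequency distribution of this kind can be built by sorting raw values into categories and counting them. The sketch below is only an illustration: the list of ages is made up (it is not the data behind the table), and the category boundaries and the name `age_category` are assumptions.

```python
from collections import Counter

# Hypothetical raw ages; illustrative values only, not data from the slides.
ages = [29, 34, 38, 41, 47, 48, 50, 52, 55, 58,
        61, 63, 66, 70, 44, 49, 51, 53, 46, 54]

def age_category(age):
    """Place an age in exactly one category (boundaries are assumptions)."""
    if age <= 35:
        return "35 and under"
    if age <= 45:
        return "36-45"
    if age <= 55:
        return "46-55"
    if age <= 65:
        return "56-65"
    return "66+"

order = ["35 and under", "36-45", "46-55", "56-65", "66+"]
counts = Counter(age_category(a) for a in ages)

# Convert each count to a percentage of all cases.
percents = {cat: 100 * counts[cat] / len(ages) for cat in order}

for cat in order:
    print(f"{cat:<14}{counts[cat]:>4}{percents[cat]:>8.1f}%")
```

Because every age falls into exactly one category, the counts add up to the number of cases and the percentages add up to 100.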
-
Distributions (cont.)
[Figure: A Histogram. Vertical axis: Percent, 0 to 45; horizontal axis: categories Under 35, 36-45, 46-55, 56-65, 66+.]
-
Central Tendency
An estimate of the center of a distribution
Three different types of estimates:
- Mean
- Median
- Mode
-
Mean
The most commonly used method of describing central tendency.
One totals all the results and then divides by the number of units, or n, of the sample.
Example: The NCM 104 Quiz mean was determined by the sum of all the scores divided by the number of students taking the exam.
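As a minimal sketch of the calculation, the code below totals a set of scores and divides by n. The scores are the set used in the Mode and Range slides; treating them as quiz scores here is an assumption for illustration.

```python
# Score set borrowed from the Mode and Range slides.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

total = sum(scores)   # sum of all the scores
n = len(scores)       # number of units in the sample
mean = total / n

print(total, n, mean)
```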
-
Median
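The median is the score found at the exact middle of the set.
One must list all scores in numerical order and then locate the score in the center of the sample.
Example: If there are 500 scores in the list, the median falls midway between score #250 and score #251 (with 501 scores, it would be score #251).
Because it depends only on the middle of the ordered list, the median is useful for limiting the influence of outliers.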
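The sketch below locates the median by hand for the score set used in the Mode and Range slides, then adds one made-up extreme score (500, an assumption for illustration) to show that the median barely moves while the mean shifts a great deal.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

ordered = sorted(scores)   # list the scores in numerical order
n = len(ordered)
mid = n // 2
if n % 2 == 1:
    median = ordered[mid]                           # odd n: the single middle score
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle scores

# Hypothetical outlier added to the same scores.
with_outlier = scores + [500]

print(median, statistics.median(with_outlier))
print(statistics.mean(scores), statistics.mean(with_outlier))
```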
-
Mode
The mode is the most repeated score in the set of results.
Let's take the set of scores: 15, 20, 21, 20, 36, 15, 25, 15
Again, we first line up the scores: 15, 15, 15, 20, 20, 21, 25, 36
15 is the most repeated score and is therefore labeled the mode.
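The same count can be sketched in code. `Counter` tallies how often each score occurs; the variable names are illustrative.

```python
from collections import Counter

scores = [15, 20, 21, 20, 36, 15, 25, 15]

counts = Counter(scores)                     # score -> number of occurrences
mode, frequency = counts.most_common(1)[0]   # the most repeated score and its count

print(mode, frequency)
```

If two scores tie for most repeated, `most_common(1)` reports only one of them; `statistics.multimode` returns all of them.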
-
Central Tendency
If the distribution is normal (i.e., bell-shaped), the mean, median and mode are all equal.
In our analyses, we'll use the mean.
-
Dispersion
Two types of estimates:
- Range
- Standard deviation
Standard deviation is more accurate/detailed because an outlier can greatly extend the range.
-
Range
The range is used to identify the highest and lowest scores.
Let's take the set of scores: 15, 20, 21, 20, 36, 15, 25, 15.
The scores run from 15 to 36, so 21 points separate the highest score from the lowest.
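A minimal sketch of the same calculation, using the scores from this slide:

```python
scores = [15, 20, 21, 20, 36, 15, 25, 15]

lowest = min(scores)
highest = max(scores)
score_range = highest - lowest   # distance between the two extreme scores

print(lowest, highest, score_range)
```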
-
Standard Deviation
The standard deviation is a value that shows the relation that individual scores have to the mean of the sample.
If scores are said to be standardized to a normal curve, there are several statistical manipulations that can be performed to analyze the data set.
-
Standard Dev. (cont.)
Assumptions may be made about the percentage of scores as they deviate from the mean.
If scores are normally distributed, one can assume that approximately 68% of the scores in the sample fall within one standard deviation of the mean.
Approximately 95% of the scores would then fall within two standard deviations of the mean.
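These percentages can be checked against the standard normal curve. The sketch below uses Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Area under the curve within one and two standard deviations of the mean.
within_one = z.cdf(1) - z.cdf(-1)
within_two = z.cdf(2) - z.cdf(-2)

print(f"within 1 SD: {within_one:.1%}")
print(f"within 2 SD: {within_two:.1%}")
```

The areas come out at roughly 68% and 95%, which is where the rule of thumb comes from.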
-
Standard Dev. (cont.)
The standard deviation is calculated by summing the squared deviations of all the scores from the mean, dividing by the number of scores, and taking the square root of the result.
Squaring the deviations accounts for both positive and negative deviations from the mean.
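The calculation can be sketched step by step, using the score set from the Mode and Range slides. This follows the slide's definition and divides by the number of scores (the population form); many textbooks divide by n - 1 when estimating from a sample, which is what `statistics.stdev` does.

```python
import math
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

n = len(scores)
mean = sum(scores) / n

# Squaring makes positive and negative deviations both count as distance from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]

variance = sum(squared_deviations) / n   # average squared deviation
std_dev = math.sqrt(variance)            # back to the original score units

print(round(std_dev, 2))
print(round(statistics.pstdev(scores), 2))   # library version of the same formula
```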
-
RESEARCH QUESTION: DESCRIBE
LEVEL            TYPE OF DESCRIPTION   STATISTICAL TOOL
NOMINAL          Distribution          Frequency distribution, contingency table
                 Central tendency      Mode
ORDINAL          Distribution          Frequency distribution, contingency table, scatterplot
                 Central tendency      Mode, median
RATIO/INTERVAL   Distribution          Frequency distribution, contingency table, scatterplot
                 Central tendency      Mode, median, mean
                 Variability           Range, variance, standard deviation
-
Inferential Statistics
- Based on the laws of probability
- It provides a means for drawing conclusions about a population, given data from a sample
- It estimates population parameters from sample statistics
-
Inferential Statistics
Statistical inference consists of two techniques:
1. Estimation of parameters
2. Hypothesis testing
-
Hypothesis Testing
Statistical hypothesis testing provides objective criteria for deciding whether hypotheses are supported by empirical evidence.
It is a process of disproof or rejection.
Researchers seek to reject the null hypothesis through various statistical tests.
Hypothesis testing uses samples to draw conclusions about relationships within the population.
-
Type I and Type II Errors
Type I Error - researchers make a type I error when a true null hypothesis is rejected.
Type II Error - researchers make a type II error when a false null hypothesis is accepted.
-
Level of Significance
This refers to the risk of making a type I error in a statistical analysis.
The value selected beforehand signifies the risk, or the probability, of rejecting a true null hypothesis.
The two most frequently used significance levels (referred to as alpha or α) are:
- .05
- .01
-
Level of Significance
With a .05 significance level, we are accepting the risk that out of 100 samples drawn from a population, a true null hypothesis would be rejected only 5 times.
With a .01 level of significance, the risk of a type I error is lower: in only 1 sample out of 100 would we erroneously reject the null hypothesis.
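The "5 times out of 100" idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is normal with mean 0 and standard deviation 1, the null hypothesis (mean = 0) is true, and a two-tailed z test is used. The sample size, number of trials, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist

random.seed(0)     # fixed seed so the sketch is repeatable

ALPHA = 0.05
N = 30             # hypothetical sample size
TRIALS = 10_000    # number of simulated samples

# Two-tailed critical value for a z test at the chosen alpha.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

rejections = 0
for _ in range(TRIALS):
    # The null hypothesis is true here: the population mean really is 0.
    sample = [random.gauss(0, 1) for _ in range(N)]
    sample_mean = sum(sample) / N
    z = sample_mean * N ** 0.5     # (sample mean - 0) / (1 / sqrt(N))
    if abs(z) > z_crit:
        rejections += 1            # a true null rejected: a type I error

type_i_rate = rejections / TRIALS
print(type_i_rate)
```

Over many samples, the share of wrongly rejected true nulls should land close to the chosen alpha.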
-
Critical Region
This refers to the area in the sampling distribution representing values that are improbable if the null hypothesis is true.
It is defined by the level of significance.
-
Statistical Tests
Two-tailed test - this means that both ends or tails of the sampling distribution are used to determine improbable values.
One-tailed test - the critical region of improbable values is entirely in one tail of the distribution: the tail corresponding to the direction of the hypothesis.
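As an illustration, the sketch below finds where the critical region starts for each kind of test. It assumes a test statistic that follows the standard normal curve (a z test) and a .05 level of significance.

```python
from statistics import NormalDist

ALPHA = 0.05
z = NormalDist()  # standard normal sampling distribution (z test assumed)

# Two-tailed: alpha is split between both tails, so each tail holds alpha / 2.
two_tailed_cutoff = z.inv_cdf(1 - ALPHA / 2)

# One-tailed: all of alpha sits in the tail that matches the hypothesis.
one_tailed_cutoff = z.inv_cdf(1 - ALPHA)

print(f"two-tailed: reject if |z| > {two_tailed_cutoff:.3f}")
print(f"one-tailed: reject if  z  > {one_tailed_cutoff:.3f}")
```

The one-tailed cutoff is closer to the center because the whole critical region is placed in a single tail.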
-
An example of Critical Regions of a two-tailed test
-
Types of Statistical Tests
Parametric tests - a class of inferential statistical tests that involve:
a. Assumptions about the distribution of the variables
b. The estimation of a parameter
c. The use of interval or ratio measures
-
Statistical Tests
Non-parametric tests - statistical tests that do not estimate parameters
- also called distribution-free statistics.
-
Steps in Hypothesis Testing
1. State the alternative hypothesis
2. State the null hypothesis
3. Establish the level of significance
4. Select a one-tailed or two-tailed test
5. Compute a test statistic
6. Calculate the degrees of freedom
7. Obtain a tabled value for the statistical test
8. Compare the test statistic with the tabled value.
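The steps can be walked through with a one-sample t test. This is a sketch under stated assumptions, not a procedure taken from the slides: the scores are the set from the Mode and Range slides, the hypothesized population mean of 20 is made up, and the critical value 2.365 is the usual tabled two-tailed t for 7 degrees of freedom at the .05 level.

```python
import math
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
HYPOTHESIZED_MEAN = 20   # hypothetical value, for illustration only

# Step 1: alternative hypothesis: the population mean differs from 20.
# Step 2: null hypothesis: the population mean equals 20.
# Step 3: level of significance.
ALPHA = 0.05
# Step 4: two-tailed test, because the alternative states no direction.

# Step 5: compute the test statistic (one-sample t).
n = len(scores)
sample_mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)   # sample form, divides by n - 1
t_stat = (sample_mean - HYPOTHESIZED_MEAN) / (sample_sd / math.sqrt(n))

# Step 6: degrees of freedom.
df = n - 1

# Step 7: tabled two-tailed critical t for df = 7, alpha = .05.
T_CRITICAL = 2.365

# Step 8: compare the test statistic with the tabled value.
reject_null = abs(t_stat) > T_CRITICAL

print(round(t_stat, 2), df, reject_null)
```

Here the test statistic is well inside the tabled value, so the null hypothesis is not rejected.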
-
The Decision Matrix
In reality (columns):
- Null true, alternative false. In reality: there is no real program effect; there is no difference, gain; our theory is wrong.
- Null false, alternative true. In reality: there is a real program effect; there is a difference, gain; our theory is correct.
What we conclude (rows):
- Accept null, reject alternative. We say: there is no real program effect; there is no difference, gain; our theory is wrong.
- Reject null, accept alternative. We say: there is a real program effect; there is a difference, gain; our theory is correct.
Accept null when the null is true: 1-α, THE CONFIDENCE LEVEL
- The odds of saying there is no effect or gain when in fact there is none
- # of times out of 100 when there is no effect, we'll say there is none
Accept null when the null is false: β, TYPE II ERROR
- The odds of saying there is no effect or gain when in fact there is one
- # of times out of 100 when there is an effect, we'll say there is none
Reject null when the null is true: α, TYPE I ERROR
- The odds of saying there is an effect or gain when in fact there is none
- # of times out of 100 when there is no effect, we'll say there is one
Reject null when the null is false: 1-β, POWER
- The odds of saying there is an effect or gain when in fact there is one
- # of times out of 100 when there is an effect, we'll say there is one
-
Decision Matrix
If you try to increase power, you increase the chance of winding up in the bottom row and of Type I error.
If you try to decrease Type I errors, you increase the chance of winding up in the top row and of Type II error.
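The trade-off can be put into numbers with a small sketch. It assumes a one-tailed z test, a hypothetical true effect of 0.5 standard deviations, and a hypothetical sample of 25; `power` is an illustrative helper, not something from the slides.

```python
from statistics import NormalDist

z = NormalDist()
EFFECT = 0.5   # hypothetical true effect, in standard deviation units
N = 25         # hypothetical sample size

def power(alpha):
    """Power of a one-tailed z test for the assumed effect and sample size."""
    z_crit = z.inv_cdf(1 - alpha)
    # When the effect is real, the test statistic is centered at EFFECT * sqrt(N).
    return 1 - z.cdf(z_crit - EFFECT * N ** 0.5)

for alpha in (0.10, 0.05, 0.01):
    p = power(alpha)
    print(f"alpha={alpha:.2f}  power={p:.3f}  beta={1 - p:.3f}")
```

With the effect and sample size held fixed, lowering alpha (fewer Type I errors) lowers power and raises beta (more Type II errors), and the reverse.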