LEARNING PROGRAMME Types of data and descriptive statistic Intermediate Training in Quantitative...
-
Upload
mae-malone -
Category
Documents
-
view
218 -
download
1
Transcript of LEARNING PROGRAMME Types of data and descriptive statistic Intermediate Training in Quantitative...
LEARNING PROGRAMME
Types of data and Types of data and descriptive statisticdescriptive statistic
Intermediate Training in Quantitative Intermediate Training in Quantitative Analysis Analysis
Bangkok 19-23 November 2007Bangkok 19-23 November 2007
LEARNING PROGRAMME - 2
Topics to be covered in this presentation
Data collection and variable types
Basic descriptive analysis Review of mean, median, mode, range,
frequencies, crosstabs Multiple response
LEARNING PROGRAMME - 3
Learning objectives
By the end of this session, the participant should be able to:Define variable typesConduct basic descriptive statistics for continuous and categorical variablesConduct multiple response tests
LEARNING PROGRAMME - 4
In the social sciences we are usually interested in discovering something about a phenomenon.
Whatever the phenomenon we desire to explain, we seek to explain it by collecting data from the real world and then using these data to draw conclusions about what is being studied.
LEARNING PROGRAMME - 5
When collecting data…
We rely on two types of variables
1. Continuous2. Categorical
variables provide us information on individuals, households, administrative areas, or meteorological stations, etc
LEARNING PROGRAMME - 6
Continuous variableThey assume numeric values, expressed in a given unit of measurement
Income, mm. of rainfall, amount of agricultural production, percent of food insecure hhs, Weight-for-height z-score for children etc.
Most importantly, each number (within a variable) has a meaning in relation to the other numbers, allowing arithmetic comparisons to be drawn
LEARNING PROGRAMME - 7
Type of variables
CATEGORICAL CONTINUOUS
Nominal
Ordinal
Interval
Ratio
LEARNING PROGRAMME - 8
Types of variables:different levels of measurement
IntervalAn interval scale differs from an ordinal one in that the differences between adjacent categories are equal. Examples include the Fahrenheit and Celsius temperature scale.
Ratio A ratio scale differs from an interval one in that there is a true zero point (% of HHs food insecure).
LEARNING PROGRAMME - 9
Distribution of continuous variables
Continuous variables can either be symmetrically distributed (ie normally distributed) or asymmetrically distributed (otherwise known as skewed)Note: To assess a variables distribution in spss, use the “histogram” option under graphs (spss
help can provide more details)
A normal distribution looks
like this…
LEARNING PROGRAMME - 10
Distribution of continuous variables…
A skewed distribution looks like this…as this shows, distributions can be both positively and negatively skewed
LEARNING PROGRAMME - 11
Lets take a look at two examples…
Income
Weight for height z-scores
LEARNING PROGRAMME - 12
Categorical variables
Values are categories, taking a limited set of values
Ex. Child age groups– 0-11, 12-23, 24-35, 36-47, 48-59 months; Sex of respondent- male/female
Categories can be denoted by numbers/ alphabeticallyCategorical variables take 2 forms-- nominal and ordinal variables.
LEARNING PROGRAMME - 13
Type of variables
CATEGORICAL CONTINUOUS
Nominal
Ordinal
Interval
Ratio
LEARNING PROGRAMME - 14
Nominal variables
A nominal measurement scale is a set of mutually exclusive categories that varies qualitatively but not quantitatively, for example gender, provinces, income sources, etc.
Codes are labels representing different behaviours/ characteristics and they do not imply any underlying order
Variables with a "yes-no“ answer, “male” or “female”
LEARNING PROGRAMME - 15
Ordinal categorical variablesAn ordinal measurement scale differs from a nominal one in that the order among the original categories is preserved in the analysis. However differences between adjacent categories are not equal.
Examples social class and perception (bad – medium – good).
LEARNING PROGRAMME - 16
Descriptive statistics
Descriptive statistics are the most basic from of statistics
They include: Summaries of one variable Comparisons of two or more variables
These tests are the foundation for more advanced statistical techniques
LEARNING PROGRAMME - 17
Descriptives
Continuous variables Range Mean Median Mode
Categorical variables Frequencies Crosstabs
LEARNING PROGRAMME - 18
First lets discuss descriptives for continuous variables…
LEARNING PROGRAMME - 19
Range
It is the spread between the smallest and the largest values in a distribution
LEARNING PROGRAMME - 20
MeanThe mean is a measure of the variable’s central tendency
The (arithmetic) MEAN is the sum of all the values divided by the numbers of cases
Statistics such as mean assume normal distributions
LEARNING PROGRAMME - 21
Median
The MEDIAN is the value above and below which half the cases fall, the 50th percentile, i.e. the middle value of a set of observations ranked in order.
The median is a measure of central tendency not sensitive to outlying values--unlike the mean, which can be affected by a few extremely high or low values.
A median does not necessarily assume an normal distribution
LEARNING PROGRAMME - 22
Mode
The MODE of a distribution is the value of the observation occurring most frequently. It can be used with all measurement scales.
If several values share the greatest frequency of occurrence, each of them is a mode.
LEARNING PROGRAMME - 23
To illustrate these concepts…
Looking at age data from 10 individuals…
What is the range? 12 to 38 What is the mean? 27.2 What is the median? 28 What is the mode? 28
1 2 3 4 5 6 7 8 9 10
12 19 23 26 28 28 28 34 36 38
LEARNING PROGRAMME - 24
Other basic concepts that must be understood…
Variance
Standard deviation
LEARNING PROGRAMME - 25
Standard deviation and variance
The standard deviation is the average error between the mean and the observations made (and so is a measure of how well the mean describes the actual data).
The variance is square of the standard deviation
1
LEARNING PROGRAMME - 26
Variance and standard deviation
1 2 3 4 5 6 7 8 9 10
12 19 23 26 28 28 28 34 36 38
What is the standard deviation of age??
What is the variance??
LEARNING PROGRAMME - 27
Standard deviation
In a normal distribution, 68.27% of cases fall within ± one standard deviation of the mean, 95.45% of cases fall within ± two standard deviations and 99.73% fall within ± three standard deviations.
For example, if the mean age is 45, with a standard deviation of 10, 95% of the cases would be between 25 and 65 in a normal distribution.
LEARNING PROGRAMME - 28
Standard deviation
LEARNING PROGRAMME - 29
Now lets discuss descriptives for categorical variables…
LEARNING PROGRAMME - 30
Analysing categorical data
If you want to look at the relationship between two categorical variables:
Cannot use the mean and median
The mean of a categorical variable is completely meaningless because the numeric values we attach to different categories are arbitrary
LEARNING PROGRAMME - 31
Descriptives for categorical data
The most basic descriptive for categorical variables are frequencies
which shows the number (or percent) of cases in each category
WAZPREV
4290 11.2 80.2 80.2
1058 2.8 19.8 100.0
5348 14.0 100.0
32840 86.0
38188 100.0
Not malnourished
Malnourished
Total
Valid
SystemMissing
Total
Frequency Percent Valid PercentCumulative
Percent
LEARNING PROGRAMME - 32
Descriptives for categorical data
Second, we can also cross tabulate categories from one variable with categories from a second variable, this is known as a contingency table
LEARNING PROGRAMME - 33
Contingency TablesChild Gender * underweight Crosstabulation
underweight Total
no yes
Child Gender Male Count 793 280 1073
% within Child Gender 74% 26% 100%
% within underweight 51% 53% 51%
% of Total 38% 13% 51%
Female Count 772 253 1025
% within Child Gender 75% 25% 100%
% within underweight 49% 47% 49%
% of Total 37% 12% 49%
Total Count 1565 533 2098
% within Child Gender 75% 25% 100%
% within underweight 100% 100% 100%
% of Total 75% 25% 100%
LEARNING PROGRAMME - 34
Contingency Tables
Orphan
TotalNo Yes
Residence Capital, large city Count 1,071 125 1,196
% within residence 89.5% 10.5% 100.0%
% within orphan 7.4% 6.8% 7.3%
Small city Count 395 37 432
% within residence 91.4% 8.6% 100.0%
% within orphan 2.7% 2.0% 2.6%
Town Count 824 131 955
% within residence 86.3% 13.7% 100.0%
% within orphan 5.7% 7.1% 5.8%
Countryside Count 12,252 1,558 13,810
% within residence 88.7% 11.3% 100.0%
% within orphan 84.3% 84.2% 84.2%
Total Count 14,542 1,851 16,393
% within residence 88.7% 11.3% 100.0%
% within orphan 100.0% 100.0% 100.0%
LEARNING PROGRAMME - 35
Example…
What percentage of orphans are malnourished?
What percentage of non orphans are malnourished?
What percentage of malnourished children are orphans?
WAZPREV * ORPHAN Crosstabulation
3995 207 4202
95.1% 4.9% 100.0%
80.2% 80.9% 80.2%
987 49 1036
95.3% 4.7% 100.0%
19.8% 19.1% 19.8%
4982 256 5238
95.1% 4.9% 100.0%
100.0% 100.0% 100.0%
Count
% within WAZPREV
% within ORPHAN
Count
% within WAZPREV
% within ORPHAN
Count
% within WAZPREV
% within ORPHAN
Not malnourished
Malnourished
WAZPREV
Total
Non orphan Orphan
ORPHAN
Total
LEARNING PROGRAMME - 36
Multiple response analysis
Sometimes we have to analyze categorical data, where households are able to give more than one response to a question (ex. livelihoods, coping strategies, etc)
Analyzing such data requires a multiple response analysis
LEARNING PROGRAMME - 37
After completing question SCM1, complete one incident at a time (line by line), each time repeating the questions above
SCM 2.By order of importance, what incidents did your household experience in the last 1 YEARIncidents code: 1 = Insecurity, Violence2 = Increased price for food4 = Drop in farm gate price5 = Floods6 = drought/dry spell7 = crop pest and disease8 = Livestock disease9 = Sickness of household Member10 = Death of household member11 = Increased household size (IDPs)12 = Loss / lack of employment
SCM 3. What is the main action your household took to compensate the effect of that incident?Coping code: 0 = Nothing1 = Eat less preferred foods2 = Eat fewer or smaller meals per day3 = Go one entire day without meals4 = collect wild foods, hunt or harvest immature crops5 = Distress sale / slaughter of livestock6 = Distress sale of other assets7 = Purchase food on credit8 = Borrow food from families and friends, kinship support9 = Worked for money10 = Worked for food only11 = Reduced expenditures on health or education12 = spent savings13 = Some household members migrated14 = Other (specify)_____________
SCM 4.How often did you do this in the last 1 YEAR?
SCM 5.Did your Household Recover from that incident?
MAIN : (Code) ___ (Code) ___ ___ times
1. Yes
2. No
SECOND : (Code) ___ (Code) ___ ___ times
1. Yes
2. No
THIRD : (Code) ___ (Code) ___ ___ times
1. Yes
2. No
FOURTH : (Code) ___ (Code) ___ ___ times
1. Yes
2. No
FIFTH : (Code) ___ (Code) ___ ___ times
1. Yes
2. No
LEARNING PROGRAMME - 38
Multiple response frequencies$shocks Frequencies
2418 12.5% 36.0%
2115 11.0% 31.5%
881 4.6% 13.1%
1410 7.3% 21.0%
2695 14.0% 40.1%
2031 10.5% 30.3%
1479 7.7% 22.0%
2950 15.3% 43.9%
1229 6.4% 18.3%
689 3.6% 10.3%
1398 7.2% 20.8%
19295 100.0% 287.4%
Insecurity, violence
Higher prices
Drop in farmgate price
Floods
Drought
Crop pest/disease
Livestock disease
Sickness in HH
Death in HH
Increased HH size (IDPs)
Loss/lack of employment
$shocksa
Total
N Percent
Responses Percent ofCases
Groupa.
LEARNING PROGRAMME - 39
N is the number of households that reported a shock
The Percent column reports the percentage of total responses represented by each shock. This is not easily available from individual frequency tables.
The Percent of Cases column is the percentage of valid cases represented by each shock.
LEARNING PROGRAMME - 40
Multiple response crosstabs$shocks*fcgbivariate Crosstabulation
726 1516 2242
32.4% 67.6%
710 1269 1979
35.9% 64.1%
275 536 811
33.9% 66.1%
494 787 1281
38.6% 61.4%
911 1607 2518
36.2% 63.8%
646 1227 1873
34.5% 65.5%
414 946 1360
30.4% 69.6%
764 2050 2814
27.1% 72.9%
341 822 1163
29.3% 70.7%
187 451 638
29.3% 70.7%
431 869 1300
33.2% 66.8%
1887 4467 6354
Count
% within $shocks
Count
% within $shocks
Count
% within $shocks
Count
% within $shocks
Count
% within $shocks
Count
% within $shocks
Count
% within $shocks
Count
% within $shocks
Count
% within $shocks
Count
% within $shocks
Count
% within $shocks
Count
Insecurity, violence
Higher prices
Drop in farmgate pri
Floods
Drought
Crop pest/disease
Livestock disease
Sickness in HH
Death in HH
Increased HH size (I
Loss/lack of employm
$shocks
Total
poor/borderline food
acceptablefood cons
fcgbivariate
Total
Percentages and totals are based on respondents.
Groupa.
LEARNING PROGRAMME - 41
Multiple response… To set up a multiple response in spss…
Click on “Analyze” Click on “Multiple response” Click on “Define sets…” Move the variables of interest into the box in box on
the right Then define the range of the variable Then name the variable Then click on “Add” Then click on “Close”
LEARNING PROGRAMME - 42
Multiple response…To run a multiple response in spss…
Click on “Analyze”Click on “Multiple response”Click on “frequencies…” or “crosstabs…”
(whichever descriptive test you would like to conduct)
Move the variables into the proper boxesThen click “Okay”
LEARNING PROGRAMME - 43
now practical exercises…..