Data Analysis and Presentation
Transcript of Data Analysis and Presentation
![Page 2: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/2.jpg)
Data Type
• Quantitative data are classified as either categorical or numerical
• Categorical data are data whose values cannot be measured numerically but can be classified into sets (categories) such as sex (male and female), religion, or department
• Numerical data, sometimes termed ‘quantifiable’, are data whose values are measured or counted numerically as quantities
• The two types are analyzed by different techniques
![Page 3: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/3.jpg)
Quantitative Data Analysis
• Two common types of analysis
1. Descriptive statistics
– to describe, summarize, or explain a given set of data
2. Inferential statistics
– use statistics computed from a sample to infer about the population
– concerned with making inferences from samples about the populations from which they have been drawn
![Page 4: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/4.jpg)
Common data analysis techniques
1. Frequency distribution
2. Measures of central tendency
3. Measures of dispersion
4. Correlation
5. Regression
6. And more
![Page 5: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/5.jpg)
Frequency distribution
• It is simply a table in which the data are grouped into classes and the number of cases that fall in each class is recorded.
• It shows the frequency of occurrence of different values of a single phenomenon.
• Main purposes
1. To facilitate the analysis of data.
2. To estimate frequencies of the unknown population distribution from the distribution of sample data, and
3. To facilitate the computation of various statistical measures
![Page 6: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/6.jpg)
Example – Frequency Distribution
• In a survey of 30 organizations, the number of computers registered in each organization is given in the following table
• This data has no meaning unless it is summarized in some form
![Page 7: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/7.jpg)
Example The following table shows frequency distribution
Number of computers
![Page 8: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/8.jpg)
Example …
• The above table can give us meaningful information such as
– How many computers do most organizations have?
– How many organizations do not have computers?
– How many organizations have more than five computers?
– Why is the computer distribution not the same in all organizations?
– And other questions
![Page 9: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/9.jpg)
Continuous frequency distribution
• A continuous frequency distribution is constructed when the variable does not take discrete values like the number of computers
• For example, variables such as age and salary have continuous values
![Page 10: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/10.jpg)
Constructing frequency table
• The number of classes should preferably be between 5 and 20. However there is no rigidity about it.
• As far as possible, one should avoid class intervals such as 3, 7, 11, 26, etc. Preferably, one should use class intervals of five or multiples of 5, such as 10, 20, 25, 100, etc.
• The starting point, i.e. the lower limit of the first class, should be zero, 5, or a multiple of 5.
• To ensure continuity and to get correct class intervals, we should adopt the “exclusive” method.
• Wherever possible, it is desirable to use class interval of equal sizes.
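The guidelines above can be sketched in code. This is a minimal illustration, assuming the "exclusive" method means each class includes its lower limit and excludes its upper limit; the `ages` values and the function name `frequency_table` are hypothetical and not taken from the slides.

```python
from collections import Counter

def frequency_table(values, start, width):
    """Group values into equal-width classes [lower, upper) (exclusive method).

    Assumes every value is >= start.
    """
    # Index of the class each value falls into: 0 for the first class, and so on.
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        table.append((lower, lower + width, counts.get(k, 0)))
    return table

# Hypothetical ages of 12 staff members (illustration only).
ages = [23, 35, 41, 29, 30, 52, 47, 38, 20, 59, 33, 44]

# Class width 5, first lower limit a multiple of 5, as the guidelines suggest.
for lower, upper, freq in frequency_table(ages, start=20, width=5):
    print(f"{lower} to under {upper}: {freq}")
```

A value of exactly 30 is counted in the class "30 to under 35", not in "25 to under 30", which is what keeps the classes continuous.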
![Page 11: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/11.jpg)
Constructing …
You can create a frequency table with two variables. This is called a bivariate frequency table.

| Type of organization | IT staff <10 | IT staff 10-50 | IT staff >50 |
|---|---|---|---|
| Private | 15 | 5 | 0 |
| Government | 0 | 10 | 50 |
| Non-government | 0 | 30 | 5 |
![Page 12: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/12.jpg)
Graphs
• You can plot your frequency distribution using a bar graph, pie chart, frequency polygon, and other types of charts
• Computer Import in Ethiopia in 2010

| Country of origin | Computer import |
|---|---|
| China | 62 |
| Japan | 47 |
| Germany | 35 |
| India | 16 |
| USA | 6 |
![Page 13: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/13.jpg)
Bar graph
[Figure: bar graph of computer import by country of origin (China, Japan, Germany, India); vertical axis from 0 to 70]
![Page 14: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/14.jpg)
Measures of central tendency
• Mode shows the value that occurs most frequently
• For categorical data, it is the only measure of central tendency that can be interpreted sensibly
• Median is used to identify the mid-point of the data
![Page 15: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/15.jpg)
Central Tendency …
• Mean is a measure of central tendency
• It includes all data values in its calculation
Mean = sum of observations (sum) / total number of observations (frequency)
• The mean for grouped data is obtained from the following formula:

$$\bar{x} = \frac{\sum f x}{N}$$

• where x = the mid-point of an individual class
• f = the frequency of an individual class
• N = the sum of the frequencies, or total frequency.
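As a rough illustration of the grouped-data mean, the sketch below multiplies each class mid-point by its frequency and divides by the total frequency. The class limits and frequencies are hypothetical, and `grouped_mean` is a name chosen for this example.

```python
def grouped_mean(classes):
    """Mean of grouped data; classes is a list of (lower, upper, frequency)."""
    total_f = sum(f for _, _, f in classes)                      # N
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # sum of f * x
    return total_fx / total_f

# Hypothetical grouped data: class limits and frequencies (illustration only).
classes = [(0, 10, 4), (10, 20, 6), (20, 30, 10)]

# (4*5 + 6*15 + 10*25) / 20 = 360 / 20
print(grouped_mean(classes))  # 18.0
```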
![Page 16: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/16.jpg)
Advantages of Mean
• It should be rigidly defined.
• It should be easy to understand and compute.
• It should be based on all items in the data.
• Its definition shall be in the form of a mathematical formula.
• It should be capable of further algebraic treatment.
• It should have sampling stability.
• It should be capable of being used in further statistical computations or processing
• However, it is affected by extreme data values in skewed distributions
• For skewed distributions, use the median rather than the mean
![Page 17: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/17.jpg)
Exercise
• Do the following exercise using the IT staff data for 13 organizations, named O1 to O13
• 25, 18, 20, 10, 8, 30, 42, 20, 53, 25, 10, 20, 42
• What is the mode?
• What is the median?
• What is the mean?
• Change the data into a frequency table
• Plot it on a bar graph and a pie chart
• What do you interpret from the data?
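One way to check answers to the exercise is a short script like the one below. It is only a sketch using Python's standard `statistics` module; the variable name `it_staff` is chosen for this example, and the plotting and interpretation parts of the exercise are left to the reader.

```python
from collections import Counter
from statistics import mean, median, mode

# IT staff counts for organizations O1 to O13, as given in the exercise.
it_staff = [25, 18, 20, 10, 8, 30, 42, 20, 53, 25, 10, 20, 42]

print("mode:", mode(it_staff))            # most frequent value
print("median:", median(it_staff))        # middle value of the sorted data
print("mean:", round(mean(it_staff), 2))  # sum of values / number of values

# Frequency table: each distinct value and how often it occurs.
freq = Counter(it_staff)
for value in sorted(freq):
    print(value, freq[value])
```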
![Page 18: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/18.jpg)
Measures of Dispersion
• Measures of central tendency serve to locate the center of the distribution,
• but they do not measure how the items are spread out on either side of the center.
• This characteristic of a frequency distribution is commonly referred to as dispersion.
• Small dispersion indicates high uniformity of the items,
• Large dispersion indicates less uniformity.
• Less variation, or uniformity, is a desirable characteristic
![Page 19: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/19.jpg)
Types of measures of dispersion
• There are two types
1. Absolute measures of dispersion
2. Relative measures of dispersion
• An absolute measure of dispersion indicates the amount of variation in a set of values in terms of the units of the observations. For example, if computers are measured by number, it shows the dispersion as a number of computers
• Relative measures of dispersion are free from the units of measurement of the observations. For example, you may measure dispersion as a percentage
• Range is an absolute measure while the coefficient of variation is a relative measure
![Page 20: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/20.jpg)
Dispersion …
• There are different types of dispersion measures
• We look at Standard Deviation and Coefficient of variation
• Karl Pearson introduced the concept of standard deviation in 1893
• Standard deviation is the most frequently used one
• The reason is that it is the square root of the mean of the squared deviations
• The square of the standard deviation is called the variance
![Page 21: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/21.jpg)
Standard Deviation
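• It is given by the formula

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad\text{or}\quad \sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$

• Calculate the standard deviation from the following data.
• 14, 22, 9, 15, 20, 17, 12, 11
• The answer is 4.18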
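The worked answer can be reproduced with a few lines of code. This sketch assumes the population form of the standard deviation (dividing by n rather than n − 1), which is the form that yields 4.18 for this data; the function name `std_dev` is chosen for this example.

```python
from math import sqrt

def std_dev(values):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)

data = [14, 22, 9, 15, 20, 17, 12, 11]
sd = std_dev(data)

print(round(sd, 2))       # 4.18
print(round(sd ** 2, 2))  # variance: 17.5
```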
![Page 22: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/22.jpg)
Interpretation
• We expect about two-thirds of the scores in a sample to lie within one standard deviation of the mean.
• Generally, most of the scores in a normal distribution cluster fairly close to the mean,
• There are fewer and fewer scores as you move away from the mean in either direction.
• In a normal distribution, 68.26% of the scores fall within one standard deviation of the mean,
• 95.44% fall within two standard deviations, and
• 99.73% fall within three standard deviations.
![Page 23: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/23.jpg)
Advantage of SD
• Assume the mean is 10.0 and the standard deviation is 3.36.
• One standard deviation above the mean is 13.36 and one standard deviation below the mean is 6.64.
• The standard deviation takes account of all of the scores and provides a sensitive measure of dispersion.
• It also has the advantage that it describes the spread of scores in a normal distribution with great precision.
• The most obvious disadvantage of the standard deviation is that it is much harder to work out than other measures of dispersion such as the range and percentiles
![Page 24: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/24.jpg)
Coefficient of Variation
• The standard deviation is an absolute measure of dispersion.
• However, it may not always be applicable
• The standard deviation of the number of computers cannot be compared with the standard deviation of computer use of students, as the two are expressed in different units,
• so the standard deviation must be converted into a relative measure of dispersion for the purpose of comparison: the coefficient of variation
• The coefficient of variation is obtained by dividing the standard deviation by the mean and multiplying by 100

$$\text{coefficient of variation} = \frac{\sigma}{\bar{x}} \times 100$$
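A minimal sketch of the calculation follows. It reuses the eight values from the standard deviation example as sample data and assumes the population standard deviation; `coefficient_of_variation` is a name chosen for this example.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100

# Values from the standard deviation example (mean 15, SD about 4.18).
data = [14, 22, 9, 15, 20, 17, 12, 11]
cv = coefficient_of_variation(data)

print(f"{cv:.2f}%")
```

Because the result is a percentage, it can be compared across variables measured in different units.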
![Page 25: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/25.jpg)
Skewness
• Skewness means ‘lack of symmetry’.
• We study skewness to have an idea about the shape of the curve which we can draw with the help of the given data.
• If in a distribution mean = median =mode, then that distribution is known as symmetrical distribution.
• The spread of the frequencies is the same on both sides of the center point of the curve.
![Page 26: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/26.jpg)
Symmetrical distribution
• Mean = Median = Mode
![Page 27: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/27.jpg)
Negatively skewed distribution
Positively skewed distribution
![Page 28: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/28.jpg)
Measures of Skewness
1. Karl Pearson’s coefficient of skewness
2. Bowley’s coefficient of skewness
3. Measure of skewness based on moments
We look at Karl Pearson’s measure; read about the others in the textbook
• Karl Pearson’s absolute measure of skewness = mean – mode.
• It is not suitable for comparing data in different units of measurement
• Use the relative measure of skewness instead, Karl Pearson’s coefficient of skewness, i.e.
(Mean – Mode)/standard deviation
In case of an ill-defined mode, we use
3(Mean – Median)/standard deviation
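Both versions of Karl Pearson's coefficient can be sketched as below. The data is the IT staff exercise from earlier in these slides, the population standard deviation is assumed, and the variable names are chosen for this example.

```python
from statistics import mean, median, mode, pstdev

# IT staff data from the earlier exercise (13 organizations).
it_staff = [25, 18, 20, 10, 8, 30, 42, 20, 53, 25, 10, 20, 42]

sd = pstdev(it_staff)  # population standard deviation (an assumption of this sketch)

# Coefficient of skewness based on the mode.
sk_mode = (mean(it_staff) - mode(it_staff)) / sd

# Alternative based on the median, for use when the mode is ill-defined.
sk_median = 3 * (mean(it_staff) - median(it_staff)) / sd

print(round(sk_mode, 3), round(sk_median, 3))
```

Both values are positive here because the mean lies above the mode and the median, which points to a positively skewed distribution.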
![Page 29: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/29.jpg)
Kurtosis
• All frequency curves show different degrees of flatness or peakedness, called kurtosis
• A measure of kurtosis tells us the extent to which a distribution is more peaked or more flat-topped than the normal curve. The normal curve, which is symmetrical and bell-shaped, is designated as mesokurtic.
• If a curve is relatively more narrow and peaked at the top, it is designated as Leptokurtic.
• If the frequency curve is more flat than normal curve, it is designated as platykurtic.
![Page 30: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/30.jpg)
Interpretation
• Real-world phenomena usually have a normal distribution pattern – bell shape
![Page 31: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/31.jpg)
Normal dist…
• This implies that
• about 68% of the population is within 1 standard deviation of the mean
• about 95% of the population is within 2 standard deviations
• about 99.7% of the population is within 3 standard deviations
• So you need to select a confidence limit to decide whether your result is statistically significant or not
• For example, if more than 5% of the population falls outside 2 standard deviations, the difference between two groups of the population is not statistically significant
![Page 32: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/32.jpg)
Correlation
• Correlation is used to measure the linear association between two variables
• For example, assume X is IT skill and Y is IT use. Is there an association between these two variables?

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
![Page 33: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/33.jpg)
Correlation …
• Correlation expresses the inter-dependence of two sets of variables upon each other.
• One variable may be called the independent variable (IV) and the other the dependent variable (DV)
• A change in the IV has an influence on the value of the DV
• For example, IT use will increase organizational productivity because staff have better information access and improve their skills and knowledge
![Page 34: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/34.jpg)
Correlation Lines
![Page 35: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/35.jpg)
Correlation Lines
Perfect Correlation
No Correlation
![Page 36: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/36.jpg)
Types of Correlation
1. Simple correlation
2. Multiple correlation
3. Partial correlation
• In simple correlation, we study only two variables.
• For example, number of computers and organization efficiency
• In multiple correlation, we study more than two variables simultaneously.
• For example, usefulness, ease of use, and IT adoption
• Partial correlation refers to the study of two variables while excluding the influence of some other variable
![Page 37: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/37.jpg)
Karl Pearson’s coefficient of correlation
• Karl Pearson, a great biometrician and statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables
• Karl Pearson’s coefficient of correlation is the most widely used measure of correlation

$$r = \frac{\sum XY}{n\,\sigma_x \sigma_y} \quad\text{or}\quad r = \frac{\sum XY}{\sqrt{\sum X^2 \cdot \sum Y^2}}$$

where $X = x - \bar{x}$ and $Y = y - \bar{y}$
![Page 38: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/38.jpg)
Exercise
Calculate the correlation for the following data
X 56 78 65 89 93 24 87 44 74
Y 34 65 67 90 86 30 80 50 70
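A sketch of how the exercise could be checked in code is given below. It implements Karl Pearson's coefficient from deviations about the means; the function name `pearson_r` is chosen for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Data from the exercise.
x = [56, 78, 65, 89, 93, 24, 87, 44, 74]
y = [34, 65, 67, 90, 86, 30, 80, 50, 70]

r = pearson_r(x, y)
print(round(r, 2))  # a strong positive correlation, a little above 0.9
```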
![Page 39: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/39.jpg)
Spearman Rank Correlation
• Developed by Charles Edward Spearman in 1904
• It is used when no assumption about the parameters of the population is made.
• This method is based on ranks
• It is useful for studying qualitative attributes like honesty, colour, beauty, intelligence, character, morality etc.
• The individuals in the group can be arranged in order, thereby obtaining for each individual a number showing his/her rank in the group
![Page 40: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/40.jpg)
Formula
$$r = 1 - \frac{6 \sum D^2}{n^3 - n}$$

• where $\sum D^2$ = sum of squares of differences between the pairs of ranks.
• n = number of pairs of observations.
• The value of r lies between –1 and +1. If r = +1, there is complete agreement in the order of ranks and the direction of ranks is also the same. If r = –1, there is complete disagreement in the order of ranks and they are in opposite directions.
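The rank formula can be sketched as follows. The two rank lists are hypothetical (two judges ranking the same five candidates), the sketch assumes there are no tied ranks, and `spearman_r` is a name chosen for this example.

```python
def spearman_r(ranks_a, ranks_b):
    """Spearman rank correlation: 1 - 6 * sum(D^2) / (n^3 - n), assuming no ties."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Hypothetical ranks given by two judges to five candidates (illustration only).
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

# D = -1, 1, -1, 1, 0, so sum(D^2) = 4 and r = 1 - 24/120
print(spearman_r(judge_1, judge_2))  # 0.8
```

Identical rankings give +1 and exactly reversed rankings give –1, matching the interpretation above.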
![Page 41: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/41.jpg)
Advantage of Correlation
• It is a simple and attractive method of finding the nature of correlation between the two variables.
• It is a non-mathematical method of studying correlation. It is easy to understand.
• It is not affected by extreme items.
• It is the first step in finding out the relation between the two variables.
• We can have a rough idea at a glance whether it is a positive correlation or negative correlation.
• But we cannot get the exact degree of correlation between the two variables
![Page 42: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/42.jpg)
The Pearson Chi-square
• It is the most common coefficient of association, which is calculated to assess the significance of the relationship between categorical variables.
• It is used to test the null hypothesis that observations are independent of each other.
• It is computed from the differences between the observed frequencies shown in the cells of a cross-tabulation and the expected frequencies that would be obtained if the variables were truly independent.
![Page 43: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/43.jpg)
Chi-square …
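| | Observed (O) | Expected (E) | Difference (O − E) |
|---|---|---|---|
| M | 3 | 6 | −3 |
| T | 5 | 6 | −1 |
| W | 7 | 6 | 1 |
| Th | 6 | 6 | 0 |
| F | 9 | 6 | 3 |
| Total | 30 | 30 | 0 |

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed value, E is the expected value, and $\chi^2$ is the measure of association.
The $\chi^2$ value and its significance level depend on the total number of observations and the number of cells in the table.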
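The statistic for the table above can be computed with a short sketch like this one. The observed and expected counts are those shown on the slide; the dictionary layout and variable names are chosen for this example, and looking up the significance level in a chi-square table is left out.

```python
# Observed counts per row of the slide's table, and the expected count of 6 for each row.
observed = {"M": 3, "T": 5, "W": 7, "Th": 6, "F": 9}
expected = {day: 6 for day in observed}

# Sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (observed[day] - expected[day]) ** 2 / expected[day] for day in observed
)

# (9 + 1 + 1 + 0 + 9) / 6 = 20 / 6
print(round(chi_square, 2))  # 3.33
```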
![Page 44: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/44.jpg)
Regression
• Regression is used to estimate (predict) the value of one variable given the value of another.
• The variable predicted on the basis of other variables is called the “dependent” or the ‘ explained’ variable and the other the ‘ independent’ or the ‘ predicting’ variable.
• The prediction is based on average relationship derived statistically by regression analysis.
• For example, if we know that advertising and sales are correlated, we may find the expected amount of sales for a given advertising expenditure, or the amount of expenditure required to attain a given amount of sales.
![Page 45: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/45.jpg)
Regression
• Regression is the measure of the average relationship between two or more variables in terms of the original units of the data.
• Types of regression
1. Simple and Multiple
2. Linear and Non-linear
3. Total and Partial
![Page 46: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/46.jpg)
Simple and Multiple:
• In case of simple relationship only two variables are considered, for example, the influence of advertising expenditure on sales turnover.
• In the case of a multiple relationship, more than two variables are involved. Here, one variable is the dependent variable while the remaining variables are independent ones.
• For example, the turnover (y) may depend on advertising expenditure (x) and the income of the people (z).
• Then the functional relationship can be expressed as y = f (x,z).
![Page 47: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/47.jpg)
Linear and Non-linear
• Linear relationships are based on a straight-line trend, the equation of which has no power higher than one. But remember, a linear relationship can be either simple or multiple.
• Normally a linear relationship is taken into account because, besides its simplicity, it has better predictive value: a linear trend can be easily projected into the future.
• In the case of non-linear relationship curved trend lines are derived. The equations of these are parabolic.
![Page 48: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/48.jpg)
Total and Partial
• In the case of total relationships all the important variables are considered.
• Normally, they take the form of multiple relationships because most economic and business phenomena are affected by a multiplicity of causes.
• In the case of partial relationship one or more variables are considered, but not all, thus excluding the influence of those not found relevant for a given purpose.
![Page 49: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/49.jpg)
Regression analysis
• The goal of regression analysis is to develop a regression equation from which we can predict one score on the basis of one or more other scores.
• For example, it can be used to predict a job applicant's potential job performance on the basis of test scores and other factors
![Page 50: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/50.jpg)
Linear regression equation
• The linear regression equation of Y on X is
Y = a + bX ……. (1)
• and of X on Y is
X = a + bY ……. (2)
where a and b are constants (they generally take different values in the two equations).
• In a regression equation, Y is the dependent variable (also called the criterion or outcome variable) that we would like to predict.
• X represents the variable we are using to predict Y; X is called the predictor variable.
![Page 51: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/51.jpg)
• a is called the regression constant (or beta-zero), and is the y-intercept of the line that best fits the data in the scatter plot;
• b is the regression coefficient,
• it is the slope of the line that best represents the relationship between the predictor variable (x) and the criterion variable (y).
• You can use multiple regression: $Y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
![Page 52: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/52.jpg)
Example
• Find the two regression equations for the following data
X 6 2 10 4 8
Y 9 11 5 8 7
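The example can be worked through with a least-squares fit, sketched below. The assumption here is that the regression lines are fitted by ordinary least squares, with Y = a + bX as the form of each line; `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Least-squares line ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of the predictor.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x  # intercept: the line passes through the two means
    return a, b

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

a_yx, b_yx = fit_line(X, Y)  # regression of Y on X
a_xy, b_xy = fit_line(Y, X)  # regression of X on Y

print(f"Y = {a_yx:.2f} + ({b_yx:.2f})X")  # Y = 11.90 + (-0.65)X
print(f"X = {a_xy:.2f} + ({b_xy:.2f})Y")  # X = 16.40 + (-1.30)Y
```

The two equations have different constants, which is why the regression of Y on X and the regression of X on Y are treated as separate lines.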
![Page 53: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/53.jpg)
![Page 54: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/54.jpg)
Statistical significance
• Researchers are interested not only in the correlation between two variables, but also in whether the value of r they obtain is statistically significant.
• Statistical significance exists when a correlation coefficient calculated on a sample has a very low probability of being zero in the population.
• Assume we get a correlation between X and Y of 0.4 in our sample.
• How do we know that this r would not be zero (r = 0.0) if we took a census of the entire population?
• When the probability that our correlation is truly zero in the population is sufficiently low (usually less than .05),
• we refer to the correlation as statistically significant
![Page 55: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/55.jpg)
Factors affecting statistical significance
• Sample size
• Assume that, unbeknown to each other, you and I independently calculated the correlation between shyness and self-esteem and that we both obtained a correlation of -.50.
• However, your calculation was based on data from 300 participants, whereas my calculation was based on data from 30 participants.
• Which of us should feel more confident that the true correlation between shyness and self-esteem in the population is not .00?
• You can probably guess that your sample of 300 should give you more confidence in the value of r you obtained than my sample of 30.
• Thus, all other things being equal, we are more likely to conclude that a particular correlation is statistically significant the larger our sample is.
![Page 56: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/56.jpg)
Factors …
• Magnitude of the correlation. For a given sample size, the larger the value of r we obtain, the less likely it is to be .00 in the population.
• Imagine you and I both calculated a correlation coefficient based on data from 300 participants;
• your calculated value of r was .75, whereas my value of r was .20. You would be more confident that your correlation was not truly .00 in the population than I would be.
![Page 57: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/57.jpg)
Factors ..
• Level of confidence
• It indicates how careful we want to be not to draw an incorrect conclusion about whether the correlation we obtain could be zero in the population.
• Typically, researchers decide that they will consider a correlation to be significantly different from zero if there is less than a 5% chance (that is, less than 5 chances out of 100) that a correlation as large as the one they obtained could have come from a population with a true correlation of zero.
![Page 58: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/58.jpg)
Techniques
• Formulas and tables for testing the statistical significance of correlation coefficients can be found in many statistics books
• Imagine that you obtained a value of r = .32 based on a sample of 100 participants.
• Looking down the left-hand column, find the number of participants (100).
• Looking at the other column, we see that the minimum value of r that is significant with 100 participants is .16.
• Because our correlation coefficient (.32) exceeds .16, we conclude that the population correlation is very unlikely to be zero (in fact, there is less than a 5% chance that the population correlation is zero).
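The table lookup described above can be approximated in code. This sketch assumes the table is based on the usual t-test for a correlation coefficient, where t = r·sqrt(n − 2) / sqrt(1 − r²), and it hard-codes a one-tailed 5% critical t value of about 1.661 for 98 degrees of freedom, taken from a standard t-table. Neither assumption comes from the slides, and `critical_r` is a name chosen for this example.

```python
from math import sqrt

def critical_r(t_crit, n):
    """Smallest |r| that reaches the critical t value for a sample of size n."""
    df = n - 2
    # Rearranged from t = r * sqrt(df) / sqrt(1 - r^2).
    return t_crit / sqrt(df + t_crit ** 2)

T_CRIT = 1.661  # assumed: one-tailed 5% critical t for 98 degrees of freedom

r_obtained = 0.32
r_min = critical_r(T_CRIT, n=100)

print(round(r_min, 3))     # roughly .165, close to the .16 quoted above
print(r_obtained > r_min)  # True: r = .32 is significant at this level
```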
![Page 59: Data Analysis and Presentation](https://reader031.fdocuments.in/reader031/viewer/2022013004/568147ee550346895db5259b/html5/thumbnails/59.jpg)
Techniques …
• Keep in mind that, with large samples, even very small correlations are statistically significant
• Thus, finding that a particular r is significant tells us only that it is very unlikely to be .00 in the population; it does not tell us whether the relationship between the two variables is a strong or an important one.
• The strength of a correlation is assessed only by its magnitude, not whether it is statistically significant.
• As a rule of thumb, behavioral researchers tend to regard correlations at or below about .10 as weak in magnitude (they account for only 1% of the variance), correlations around .30 as moderate in magnitude, and correlations over .50 as strong in magnitude