Introduction to Statistics

157
STATISTICS Anjan Mahanta [email protected] International Education Institute

Transcript of Introduction to Statistics

Page 1: Introduction to Statistics

STATISTICS

Anjan Mahanta [email protected]

International Education Institute

Page 2: Introduction to Statistics

Course Outline

Definitions, Scope and Limitations

Introduction to sampling methods

Collection of data, Classification and Tabulation

Frequency Distribution

Diagrammatic and Graphical Representation

Measures of Central Tendency

2

Page 3: Introduction to Statistics

Chapter 1

Definitions, Scope and Limitations

3

Page 4: Introduction to Statistics

Introduction In the modern world of computers and information

technology, the importance of statistics is very well recognized by all the disciplines.

Statistics has originated as a science of statehood and found applications slowly and steadily in ◦ Agriculture, ◦ Economics, ◦ Commerce, ◦ Biology, ◦ Medicine, ◦ Industry, ◦ Planning, education and so on.

As on date there is no other human walk of life, where statistics cannot be applied.

4

Page 5: Introduction to Statistics

Origin and Growth of Statistics

5

Page 6: Introduction to Statistics

Origin and Growth of Statistics The word ‘ Statistics’ and ‘ Statistical’ are all derived from the

Latin word Status, means a political state.

Statistics is concerned with scientific methods for ◦ collecting, ◦ organising, ◦ summarising, ◦ presenting and analysing data ◦ as well as deriving valid conclusions and making

reasonable decisions on the basis of this analysis.

6

Page 7: Introduction to Statistics

Meaning of Statistics The word ‘ statistic’ is used to refer to ◦ Numerical facts, such as the number of people living

in particular area. ◦ The study of ways of collecting, analysing and

interpreting the facts.

7

Page 8: Introduction to Statistics

Definition by A.L.Bowley “Statistics are numerical statement of facts in any

department of enquiry placed in relation to each other.” “Statistics may be rightly called the scheme of averages.”

8

Page 9: Introduction to Statistics

Definition by Croxton and Cowden “Statistics may be defined as the science of collection,

presentation analysis and interpretation of numerical data from the logical analysis.” ◦ 1. Collection of Data ◦ 2. Presentation of data ◦ 3. Analysis of data ◦ 4. Interpretation of data

9

Page 10: Introduction to Statistics

Functions of Statistics ◦ 1. Condensation ◦ 2. Comparison ◦ 3. Forecasting ◦ 4. Estimation

10

Page 11: Introduction to Statistics

Scope of Statistics

◦ 1. Statistics and Industry ◦ 2. Statistics and Commerce ◦ 3. Statistics and Agriculture ◦ 4. Statistics and Economics ◦ 5. Statistics and Education ◦ 6. Statistics and Planning ◦ 7. Statistics and Medicine ◦ 8. Statistics and Modern applications

11

Page 12: Introduction to Statistics

Limitations of Statistics

◦ Statistics is not suitable to the study of qualitative phenomenon

◦ Statistics does not study individuals ◦ Statistics laws are not exact ◦ Statistics table may be misused ◦ Statistics is only, one of the methods of studying a

problem ◦

12

Page 13: Introduction to Statistics

Exercises ◦

13

Page 14: Introduction to Statistics

Exercises ◦

14

Page 15: Introduction to Statistics

Assignments ◦

15

Page 16: Introduction to Statistics

Assignments ◦

16

Page 17: Introduction to Statistics

Assignments ◦

17

Page 18: Introduction to Statistics

Project Example

PROJECT TITLE - WIMBLEDON

18

Page 19: Introduction to Statistics

Information from Magazines

PROJECT TITLE - WIMBLEDON

19

Page 20: Introduction to Statistics

Information from Newspaper

20

Page 21: Introduction to Statistics

Information from Television

21

Page 22: Introduction to Statistics

Information from Television

22

Page 23: Introduction to Statistics

Information from Internet

23

Page 24: Introduction to Statistics

Information from Internet

24

Page 25: Introduction to Statistics

Examples ◦

25

Page 26: Introduction to Statistics

Examples ◦

26

Page 27: Introduction to Statistics

Chapter 2

Introduction to Sampling Methods

27

Page 28: Introduction to Statistics

Introduction Sampling is very often used in our daily life. For example: While purchasing food grains from a shop we usually examine a

handful from the bag to assess the quality of the commodity. A doctor examines a few drops of blood as sample and draws

conclusion about the blood constitution of the whole body.

28

Page 29: Introduction to Statistics

Population

In a statistical enquiry, all the items, which fall within the purview of enquiry, are known as Population or Universe.

Examples: Total number of students studying in a school or college, total number of books in a library, etc.

29

Page 30: Introduction to Statistics

Population

30

Page 31: Introduction to Statistics

Population Sometimes it is possible and practical to examine

every person or item in the population we wish to describe. We call this a Complete enumeration, or census.

We use sampling when it is not possible to measure every item in the population.

31

Page 32: Introduction to Statistics

Sampling Statisticians use the word sample to describe a

portion chosen from the population. A finite subset of statistical individuals defined in a

population is called a sample. The number of units in a sample is called the sample

size.

32

Page 33: Introduction to Statistics

Reasons for selecting a sample Complete enumerations are practically impossible when

the population is infinite. When the results are required in a short time. When the area of survey is wide. When resources for survey are limited particularly in

respect of money and trained persons. When the item or unit is destroyed under investigation.

33

Page 34: Introduction to Statistics

Sampling Representing Size Mathematically, A capital N is used to represent the size of a population. A lowercase n is used to represent the size of a sample. Let’s say you want to find the average GPA of a student at your university. Your university has 20,000 students, and you select 100 students and ask

them their GPAs. What are N and n in this example? N, the size of your population, is 20,000 n, the size of your sample, is 100

34

Page 35: Introduction to Statistics

Principles of Sampling Principle of statistical regularity Principle of Inertia of large numbers Principle of Validity Principle of Optimization

35

Page 36: Introduction to Statistics

Methods of selection of samples Simple Random Sampling

Every member of the population(N) has an equal

chance of being selected for your sample(n).

This is arguably the best sampling method, as your sample is almost guaranteed to be representative of your population. However, it is rarely ever used due to being too impractical.

36

Page 37: Introduction to Statistics

Methods of selection of samples

Simple Random Sampling

37

Page 38: Introduction to Statistics

Methods of selection of samples Stratified Random Sampling

With this method, the population(N) is split into non-

overlapping groups ("strata"), then simple random sampling is done on each group to form a sample(n).

One example of this would be splitting a population of students into men and women, then sampling from

each of the two groups. This may allow us to collect the same amount of information as simple random sampling, but use less people.

38

Page 39: Introduction to Statistics

Methods of selection of samples

Stratified Random Sampling

39

Page 40: Introduction to Statistics

Methods of selection of samples

Systematic Random Sampling

In this method, every nth individual from the population(N) is placed in the sample(n).

For example, if you add every 7th individual to

walk out of a supermarket to your sample, you are performing systematic sampling.

40

Page 41: Introduction to Statistics

Methods of selection of samples

Systematic Random Sampling

41

Page 42: Introduction to Statistics

Lottery Method This is the most popular and simplest method.

In this method all the items of the population are

numbered on separate slips of paper of same size, shape and colour.

They are folded and mixed up in a container.

The required numbers of slips are selected at random for the desire sample size.

If the universe is infinite this method is inapplicable.

42

Page 43: Introduction to Statistics

Lottery Method

43

Page 44: Introduction to Statistics

Table of Random Numbers A random number table is so constructed that all

digits 0 to 9 appear independent of each other with equal frequency.

If we have to select a sample from population of size N= 100, then the numbers can be combined three by three to give the numbers from 001 to 100.

When the size of the population is less than thousand, three digit number 000,001,002,….. 999 are assigned.

If any random number is greater than the population size N, then N can be subtracted from the random number drawn.

44

Page 45: Introduction to Statistics

Table of Random Numbers

45

Page 46: Introduction to Statistics

Table of Random Numbers

46

Page 47: Introduction to Statistics

Random Number selection using calculators and computers

Random number can be generated through scientific calculator or computers.

For each press of the key get a new random numbers.

The ways of selection of sample is similar to that of using random number table.

47

Page 48: Introduction to Statistics

Merits of using random numbers

Personal bias is eliminated as a selection depends solely on chance.

A random sample is in general a representative sample for a homogenous population.

There is no need for the thorough knowledge of the units of the population.

The accuracy of a sample can be tested by examining another sample from the same universe when the universe is unknown.

This method is also used in other methods of sampling.

48

Page 49: Introduction to Statistics

Limitations of using random numbers

Preparing lots or using random number tables is tedious when the population is large.

When there is large difference between the units of population, the simple random sampling may not be a representative sample.

The size of the sample required under this method is more than that required by stratified random sampling.

It is generally seen that the units of a simple random sample lie apart geographically. The cost and time of collection of data are more.

49

Page 50: Introduction to Statistics

Stratified Random Sampling

There are two types of stratified sampling. They are proportional and non-proportional.

In the proportional sampling 20 equal and proportionate representation is given to subgroups

The population size is denoted by N and the sample size is denoted by ‘ n’

The sample fractions is a constant for each stratum. That is given by n/N = c.

50

Page 51: Introduction to Statistics

Example

51

Page 52: Introduction to Statistics

Stratified Random Sampling Merits

It is more representative.

It ensures greater accuracy.

It is easy to administer as the universe is sub - divided.

Greater geographical concentration reduces time and expenses.

When the original population is badly skewed, this method is appropriate.

For non – homogeneous population, it may field good results.

52

Page 53: Introduction to Statistics

Stratified Random Sampling Limitations

To divide the population into homogeneous strata, it requires more money, time and statistical experience which is a difficult one.

Improper stratification leads to bias, if the different strata overlap such a sample will not be a representative one.

53

Page 54: Introduction to Statistics

Systematic Sampling

This method is widely employed because of its ease and convenience.

A frequently used method of sampling when a complete list of the population is available is systematic sampling.

It is also called Quasi-random sampling.

54

Page 55: Introduction to Statistics

Systematic Sampling Selection Procedure

The whole sample selection is based on just a random start.

The first unit is selected with the help of random numbers and the rest get selected automatically according to some pre designed pattern is known as systematic sampling.

With systematic random sampling every Kth element in the frame is selected for the sample, with the starting point among the first K elements determined at random.

55

Page 56: Introduction to Statistics

56

Page 57: Introduction to Statistics

Systematic Sampling Merits

This method is simple and convenient.

Time and work is reduced much.

If proper care is taken result will be accurate.

It can be used in infinite population.

57

Page 58: Introduction to Statistics

Systematic Sampling Limitations

Systematic sampling may not represent the whole population.

There is a chance of personal bias of the investigators.

58

Page 59: Introduction to Statistics

Exercise - 2

59

Page 60: Introduction to Statistics

Exercise - 2

60

Page 61: Introduction to Statistics

Exercise - 2

61

Page 62: Introduction to Statistics

Chapter 3

Collection of Data, Classification and Tabulation

62

Page 63: Introduction to Statistics

Collection of Data, Classification and Tabulation

Introduction Everybody collects, interprets and uses information, much

of it in a numerical or statistical forms in day-to-day life.

In everyday life, in business and industry, certain statistical information is necessary and it is independent to know where to find it how to collect it.

As employees of any firm, people want to compare their salaries and working conditions, promotion opportunities and so on.

63

Page 64: Introduction to Statistics

Collection of Data, Classification and Tabulation

Nature of Data Time Series Data

Spatial Data (place)

Spacio-temporal data (time & place)

64

Page 65: Introduction to Statistics

Nature of Data Time Series Data It is a collection of a set of numerical values, collected

over a period of time.

65

Page 66: Introduction to Statistics

Nature of Data Spatial Data If the data collected is connected with that of a place,

then it is termed as spatial data.

66

Page 67: Introduction to Statistics

Nature of Data Spacio Temporal Data If the data collected is connected to the time as well as

place then it is known as spacio temporal data.

67

Page 68: Introduction to Statistics

Categories of Data Primary Data

68

Page 69: Introduction to Statistics

Categories of Data

Secondary Data

69

Page 70: Introduction to Statistics

Classification of Data Objectives of Classification

It condenses the mass of data in an easily assimilable form.

It eliminates unnecessary details. It facilitates comparison and highlights the significant

aspect of data. It enables one to get a mental picture of the information

and helps in drawing inferences. It helps in the statistical treatment of the information

collected.

70

Page 71: Introduction to Statistics

Types of Classification

Chronological Classification In chronological classification the collected data are arranged

according to the order of time expressed in years, months, weeks, etc.,

71

Page 72: Introduction to Statistics

Chronological Classification

72

Page 73: Introduction to Statistics

Types of Classification

Geographical Classification In this type of classification the data are classified according to

geographical region or place.

73

Page 74: Introduction to Statistics

Types of Classification Qualitative Classification

In this type of classification data are classified on the basis of same attributes or quality like sex, literacy, religion, employment etc., Such attributes cannot be measured along with a scale.

Examples: ◦ He is brown and black ◦ He has long hair ◦ He has lots of energy ◦ He is clever

74

Page 75: Introduction to Statistics

Types of Classification Quantitative Classification

Quantitative classification refers to the classification of data according to some characteristics that can be measured such as height, weight, etc.

Examples:

◦ He has 4 legs ◦ He has 2 brothers ◦ He weighs 25.5 kg ◦ He is 170 cm tall

75

Page 76: Introduction to Statistics

Types of Classification

76

Page 77: Introduction to Statistics

Tabulation

Tabulation is the process of summarizing classified or grouped data in the form of a table so that it is easily understood and an investigator is quickly able to locate the desired information.

77

Page 78: Introduction to Statistics

Preparing a Table An ideal table should consist of the following main parts: ◦ 1. Table number ◦ 2. Title of the table ◦ 3. Captions or column headings ◦ 4. Stubs or row designation ◦ 5. Body of the table ◦ 6. Footnotes ◦ 7. Sources of data

78

Page 79: Introduction to Statistics

A model structure

79

Page 80: Introduction to Statistics

Types of Table

Simple or one-way table

◦ A simple or one-way table is the simplest table which contains data of one characteristic only.

80

Page 81: Introduction to Statistics

Types of Table

Two way table

◦ A table, which contains data on two characteristics, is called a two way table.

81

Page 82: Introduction to Statistics

Types of Table

Manifold table

◦ A table ,which has more than two characteristics of data is considered as a manifold table.

82

Page 83: Introduction to Statistics

Exercise - 3

83

Page 84: Introduction to Statistics

Exercise - 3

84

Page 85: Introduction to Statistics

Data Collection Project Prepare a questionnaire Collect the data Separate the data into qualitative and quantitative

types Display data in graphs Project Examples –

◦ 24 hour activities of our students ◦ Money spending habits of our students

85

Page 86: Introduction to Statistics

Chapter 4

Frequency Distribution

86

Page 87: Introduction to Statistics

Frequency Distribution

Introduction Frequency is how often something occurs.

87

Page 88: Introduction to Statistics

Frequency Distribution Introduction By counting

frequencies we can make a Frequency Distribution table.

88

Page 89: Introduction to Statistics

Frequency Distribution A frequency distribution is constructed for three main reasons:

1. To facilitate the analysis of data.

2. To estimate frequencies of the unknown population distribution from the distribution of sample data and

3. To facilitate the computation of various statistical measures

89

Page 90: Introduction to Statistics

Frequency Distribution

90

Arrangement of data in ascending order

Page 91: Introduction to Statistics

Frequency Distribution

91

Discrete (or) Ungrouped frequency distribution:

Page 92: Introduction to Statistics

Frequency Distribution

92

Discrete (or) Ungrouped frequency distribution:

Page 93: Introduction to Statistics

Frequency Distribution

93

Exercise - I

Page 94: Introduction to Statistics

Frequency Distribution Continuous frequency distribution: In this form of distribution

refers to groups of values.

Wage distribution of 100 employees

94

Page 95: Introduction to Statistics

Nature of Class Class Limit

◦ The class limits are the lowest and the highest values that can be included in the class. For example, take the class 30-40. The lowest value of the class is 30 and highest class is 40.

◦ In statistical calculations, lower class limit is denoted by L and upper class limit by U.

95

Page 96: Introduction to Statistics

Nature of Class Class Limit

96

Page 97: Introduction to Statistics

Nature of Class Class Interval

◦ The class interval may be defined as the size of each grouping of data. For example, 50-75, 75-100, 100-125… are class intervals.

97

Page 98: Introduction to Statistics

Nature of Class Width or size of the class interval

◦ The difference between the lower and upper class limits is called Width or size of class interval and is denoted by ‘ C’ .

98

Page 99: Introduction to Statistics

Nature of Class Range

◦ The difference between largest and smallest value of the observation is called The Range

◦ It is denoted by ‘ R’ R = Largest value – Smallest value R = L - S

99

Page 100: Introduction to Statistics

Nature of Class Mid-value or Middle Point

◦ The central point of a class interval is called the mid value or mid-point.

◦ It is found out by adding the upper and lower limits of a class and dividing the sum by 2.

◦ Mid-Value = L + U / 2

◦ For example, if the class interval is 20-30 then the mid-value is 20+30 / 2 = 25

100

Page 101: Introduction to Statistics

Nature of Class

Frequency ◦ Number of observations falling within a particular class

interval is called frequency of that class.

101

Page 102: Introduction to Statistics

Nature of Class No of class intervals ◦ The number of class interval in a frequency is matter of

importance. ◦ The number of class interval should not be too many. ◦ For an ideal frequency distribution, the number of class

intervals can vary from 5 to 15.

102

Page 103: Introduction to Statistics

Nature of Class Method of determining class intervals

103

Page 104: Introduction to Statistics

Nature of Class Size of Class Interval

104

Page 105: Introduction to Statistics

Types of Class Intervals 1.Exclusive Method When the class intervals are so fixed that the upper limit of

one class is the lower limit of the next class; it is known as the exclusive method of classification.

105

Page 106: Introduction to Statistics

Exclusive Method

106

Page 107: Introduction to Statistics

Types of Class Intervals 2.Inclusive Method In this method, the overlapping of the class

intervals is avoided. Both the lower and upper limits are included in

the class interval.

107

Page 108: Introduction to Statistics

Inclusive Method

108

Page 109: Introduction to Statistics

Types of Class Intervals

3. Open end classes A class limit is missing either at the lower end of

the first class interval or at the upper end of the last class interval or both are not specified.

109

Page 110: Introduction to Statistics

Open end classes

110

Page 111: Introduction to Statistics

Constructing the frequency table

111

Page 112: Introduction to Statistics

Preparation of frequency table Example 1

112

Maximum Weight = 64 Kgs

Minimum Weight = 32 Kgs

Page 113: Introduction to Statistics

Preparation of frequency table Example 1

113

Page 114: Introduction to Statistics

Preparation of frequency table Example 1

114

Page 115: Introduction to Statistics

Preparation of frequency table Example 2

115

Page 116: Introduction to Statistics

Preparation of frequency table Example 2

116

Page 117: Introduction to Statistics

Preparation of frequency table Example 2

117

Page 118: Introduction to Statistics

Percentage frequency table

118

Page 119: Introduction to Statistics

Cumulative frequency table Cumulative frequency distribution has a running total of the values. It

is constructed by adding the frequency of the first class interval to the frequency of the second class interval.

119

Page 120: Introduction to Statistics

Exercise

120

Page 121: Introduction to Statistics

Exercise

121

Page 122: Introduction to Statistics

Quiz

122

https://www.mathsisfun.com/data/frequency-distribution.html

Page 123: Introduction to Statistics

Chapter 5: Graphs What is a graph? It is a diagram that exhibits a relationship, often functional,

between two sets of numbers as a set of points having coordinates determined by the relationship.

123

Page 124: Introduction to Statistics

Graphs

Origin (0,0)

124

Example

Page 125: Introduction to Statistics

Types of Graphs Line Graph A line graph is useful in displaying data or

information that changes continuously over time. The points on a line graph are connected by a line. Another name for a line graph is a line chart.

125

Page 126: Introduction to Statistics

Types of Graphs Bar Graph A bar graph is a chart that uses either horizontal

or vertical bars to show comparisons among categories.

126

Page 127: Introduction to Statistics

Types of Graphs Pie Chart A pie chart (or a circle chart) is a circular

statistical graphic, which is divided into slices to illustrate numerical proportion.

127

Page 128: Introduction to Statistics

Review

128

Page 129: Introduction to Statistics

Creating Graph in Excel

Enter the following information in Excel Worksheet to create a Line Graph

129

Page 130: Introduction to Statistics

Creating Graph in Excel

Enter the following information in Excel Worksheet to create a Bar Graph

130

Page 131: Introduction to Statistics

Creating Graph in Excel

Enter the following information in Excel Worksheet to create a Pie Chart

131

Page 132: Introduction to Statistics

Chapter 6

132

Measures of Central Tendency

•In the study of a population with respect to one in which we are interested we may get a large number of observations.

Page 133: Introduction to Statistics

Measures of Central Tendency

133

•It is not possible to grasp any idea about the characteristic when we look at all the observations. •So it is better to get one number for one group.

Page 134: Introduction to Statistics

Measures of Central Tendency

134

• That number must be a good representative one for all the observations to give a clear picture of that characteristic. • Such representative number can be a central value for all these observations. • This central value is called a measure of central tendency.

Page 135: Introduction to Statistics

Arithmetic Mean

135

Page 136: Introduction to Statistics

Arithmetic Mean

136

Page 137: Introduction to Statistics

Arithmetic Mean

137

Page 138: Introduction to Statistics

Arithmetic Mean

138

Page 139: Introduction to Statistics

Arithmetic Mean

139

Page 140: Introduction to Statistics

Arithmetic Mean

140

Page 141: Introduction to Statistics

Arithmetic Mean

141

Page 142: Introduction to Statistics

Arithmetic Mean

142

Page 143: Introduction to Statistics

Arithmetic Mean

143

Page 144: Introduction to Statistics

Median The median is that value of the variant which divides the

group into two equal parts, one part comprising all values greater, and the other, all values less than median.

Arrange the given values in the increasing or decreasing order.

If the number of values are odd, median is the middle value.

If the number of values are even, median is the mean of middle two values.

144

Page 145: Introduction to Statistics

Median Example of ODD Number

145

Page 146: Introduction to Statistics

Median

146

Page 147: Introduction to Statistics

Median Even Number

147

Page 148: Introduction to Statistics

Median

148

Page 149: Introduction to Statistics

Median

149

Page 150: Introduction to Statistics

Mode The mode refers to that value in a distribution, which

occur most frequently. It is an actual value, which has the highest concentration of items in and around it.

150

Page 151: Introduction to Statistics

Mode

151

Page 152: Introduction to Statistics

Mode

152

Page 153: Introduction to Statistics

Mode

153

Page 154: Introduction to Statistics

Mode

154

Page 155: Introduction to Statistics

Mode

155

Page 156: Introduction to Statistics

Exercises

156

Page 157: Introduction to Statistics

Answers

157