Unit 1 Introduction


Transcript of Unit 1 Introduction

Page 1: Unit 1 Introduction

Unit-1

Statistics

Definition 1 :-

Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be as diverse as "all persons living in a country" or "every atom composing a crystal". Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments.

Definition 2 :-

Statistics is a science of facts and figures and nothing beyond that. It is the measurement of data and the expression of those measurements in numerical form.

Uses of statistics:

1. It is more quantitative than qualitative
2. Statistical method deals with two fundamental principles
3. Statistical unit
4. Statistical data must be manipulated
5. Presentation of statistical data with the help of line-diagram

1. It is more quantitative than qualitative: Social statistics that present the data of an area must be numerical in nature, so that the tendency of a project can be measured. A figure such as a percentage is understood quickly by everyone who hears it, so numerical data is easy to record and easy to understand.

2. Statistical method deals with two fundamental principles:

Fundamental regularity based on mathematical probability
It reflects the capacity of the researcher

Fundamental regularity based on mathematical probability: It states that every social phenomenon is influenced by a large number of variables, which are correlated and interrelated, and the task of statistics is to study this correlation. Therefore the theory of probability, linear programming and shadow prices are used to find out the reality.

It reflects the capacity of the researcher: To substantiate findings and conclusions, statistical terminology is necessary, and it protects the researcher or scholar from dangers and challenges. It is the data, facts and figures that demonstrate the capacity of the researcher. The skills and resources used by the researcher must be reflected in the research findings.

3. Statistical Units:

A statistical unit has four characteristics:

Appropriateness
Clarity
Measurability
Comparability

4. Statistical data must be manipulated: The statistical data must be manipulated, divided and totaled to formulate some conclusions.

5. Presentation of statistical data with the help of line-diagram: Statistical data is presented with the help of line diagrams, graphs, charts, histograms, frequency distributions, pie diagrams, etc.

Limitations of statistics:

Statistics is indispensable to almost all sciences - social, physical and natural. It is very often used in most of the spheres of human activity. In spite of the wide scope of the subject it has certain limitations. Some important limitations of statistics are the following:

1. Statistics does not study qualitative phenomena:

Statistics deals with facts and figures. So the quality aspect of a variable or the subjective phenomenon falls out of the scope of statistics. For example, qualities like beauty, honesty, intelligence etc. cannot be numerically expressed. So these characteristics cannot be examined statistically. This limits the scope of the subject.

2. Statistical laws are not exact:

Statistical laws are not exact, as they are in the case of the natural sciences. These laws are true only on average. They hold good under certain conditions and cannot be universally applied. So statistics has less practical utility where exact results are required.

3. Statistics does not study individuals:

Statistics deals with aggregate of facts. Single or isolated figures are not statistics. This is considered to be a major handicap of statistics.

4. Statistics can be misused:

Statistics is mostly a tool of analysis. Statistical techniques are used to analyze and interpret the collected information in an enquiry. As it is, statistics does not prove or disprove anything. It is just a means to an end. Statements supported by statistics are more appealing and are commonly believed. For this reason, statistics is often misused. Statistical methods rightly used are beneficial, but if misused they become harmful. Statistical methods used by less expert hands will lead to inaccurate results. Here the fault does not lie with the subject of statistics but with the person who makes wrong use of it.

Frequency Distribution

Frequency:- Frequency is how often something occurs.

Example: Sam played football on Saturday morning, Saturday afternoon and Thursday afternoon. The frequency was 2 on Saturday, 1 on Thursday and 3 for the whole week.

By counting frequencies we can make a frequency distribution table.

Example: Goals

Sam's team has scored the following numbers of goals in recent games:

2, 3, 1, 2, 1, 3, 2, 3, 4, 5, 4, 2, 2, 3

Sam put the numbers in order, then counted how often 1 occurs (2 times), how often 2 occurs (5 times), and so on, and wrote the counts down as a frequency distribution table:

Goals Frequency
1 2
2 5
3 4
4 2
5 1

From the table we can see interesting things such as:

getting 2 goals happens most often
only once did they get 5 goals

Frequency Distribution:- values and their frequency (how often each value occurs).

Example: Newspapers

These are the numbers of newspapers sold at a local shop over the last 10 days:

22, 20, 18, 23, 20, 25, 22, 20, 18, 20

Let us count how many of each number there are:

Papers Sold Frequency
18 2
19 0
20 4
21 0
22 2
23 1
24 0
25 1

It is also possible to group the values. Here they are grouped in 5s:

Papers Sold Frequency
15-19 2
20-24 7
25-29 1
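The counting in the newspaper example can be sketched in Python. This is a minimal illustration, not part of the original notes; the variable names are assumptions chosen for readability, and the grouping rule (integer division by the class width) is one simple way to assign each value to its class.

```python
from collections import Counter

# Newspapers sold over the last 10 days (data from the example above).
papers_sold = [22, 20, 18, 23, 20, 25, 22, 20, 18, 20]

# Ungrouped frequency distribution: how often each value occurs.
frequency = Counter(papers_sold)

# Grouped in 5s: map each value to the lower limit of its class (15, 20, 25).
class_width = 5
grouped = Counter((value // class_width) * class_width for value in papers_sold)

print("Papers Sold  Frequency")
for value in range(min(papers_sold), max(papers_sold) + 1):
    # A Counter returns 0 for values that never occur (19, 21, 24).
    print(value, frequency[value])

print("Papers Sold  Frequency")
for lower in sorted(grouped):
    print(f"{lower}-{lower + class_width - 1}", grouped[lower])
```

Using a `Counter` keeps the zero-frequency rows (19, 21, 24) without any special handling, which matches the ungrouped table.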

Frequency Curve

A frequency curve is a smooth curve that corresponds to the limiting case of a histogram computed for a frequency distribution of a continuous distribution as the number of data points becomes very large.


Measures of Central Tendency

Introduction

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. Measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency.

Mean (Arithmetic)

The mean (or average) is the most popular and well known measure of central tendency. It can be used with both discrete and continuous data, although its use is most often with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if we have n values in a data set and they have values $x_1, x_2, x_3, \ldots, x_n$, the sample mean, usually denoted by $\bar{x}$ (pronounced "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital letter $\Sigma$, pronounced "sigma", which means "sum of...":

$$\bar{x} = \frac{\sum x}{n}$$

When not to use the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set by being especially small or large in numerical value. For example, consider the wages of staff at a factory below:

Staff    1    2    3    4    5    6    7    8    9    10
Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean value might not be the best way to accurately reflect the typical salary of a worker, as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two large salaries. Therefore, in this situation, we would like to have a better measure of central tendency.
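The effect of the two large salaries can be checked with a short Python sketch. It uses the standard `statistics` module; the variable names are illustrative assumptions, and the median it computes is the measure introduced in the next section.

```python
import statistics

# Salaries of the ten staff, in thousands of dollars (from the table above).
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = statistics.mean(salaries_k)      # pulled upward by 90k and 95k
median_salary = statistics.median(salaries_k)  # middle of the ordered values

# Mean of the eight salaries that remain after dropping the two largest.
mean_without_top_two = statistics.mean(sorted(salaries_k)[:-2])

print(mean_salary, median_salary, mean_without_top_two)
```

Comparing the three numbers shows how far two outliers move the mean, while the median stays within the $12k to $18k range where most salaries lie.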

Median

The median is the middle score for a set of data that has been arranged in order of magnitude.

If the number of values is even, the average of the two middle values is taken.

Because the median is not pulled toward extreme values, it is often better than the mean for describing the typical value.

Example:-

In order to calculate the median, suppose we have the data below:

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark, the sixth of the eleven ordered values: in this case, 56.

Mode

The mode is the most frequent score in our data set.

What will happen to the measures of central tendency if we add the same amount to all data values, or multiply each data value by the same amount?

Data                               Values                                    Mean   Mode   Median
Original data set                  6, 7, 8, 10, 12, 14, 14, 15, 16, 20       12.2   14     13
Add 3 to each data value           9, 10, 11, 13, 15, 17, 17, 18, 19, 23     15.2   17     16
Multiply each data value by 2      12, 14, 16, 20, 24, 28, 28, 30, 32, 40    24.4   28     26

When added: Since all values are shifted by the same amount, the measures of central tendency are all shifted by the same amount. If you add 3 to each data value, you will add 3 to the mean, mode and median.

When multiplied: Since all values are multiplied by the same factor, the measures of central tendency will feel the same effect. If you multiply each data value by 2, you will multiply the mean, mode and median by 2.
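The shift and scale behaviour described above can be verified with a few lines of Python. This is only a sketch using the standard `statistics` module; the helper name `summary` is an assumption introduced for illustration.

```python
import statistics

data = [6, 7, 8, 10, 12, 14, 14, 15, 16, 20]

def summary(values):
    """Return (mean, mode, median) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.mode(values),
        statistics.median(values),
    )

original = summary(data)
shifted = summary([x + 3 for x in data])  # add 3 to each data value
scaled = summary([x * 2 for x in data])   # multiply each data value by 2

print(original)
print(shifted)
print(scaled)
```

Each entry of `shifted` is the matching entry of `original` plus 3, and each entry of `scaled` is the matching entry of `original` times 2.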

Example :-1

Find the mean, median and mode for the following data: 5, 15, 10, 15, 5, 10, 10, 20, 25, 15.

Answer:- (You will need to organize the data.) 5, 5, 10, 10, 10, 15, 15, 15, 20, 25

Mean: $\dfrac{\text{Sum of data}}{\text{Number of data}} = \dfrac{130}{10} = 13$

Median: 5, 5, 10, 10, 10, 15, 15, 15, 20, 25. Listing the data in order is the easiest way to find the median. There are ten values, so the fifth and sixth values, 10 and 15, both fall in the middle. Average these two numbers to get the median: $\dfrac{10 + 15}{2} = 12.5$

Mode: Two numbers appear most often: 10 and 15. There are three 10's and three 15's. In this example there are two answers for the mode.

Example :- 2 For what value of x will 8 and x have the same mean (average) as 27 and 5?

Answer:-

First, find the mean of 27 and 5:

$$\frac{27 + 5}{2} = 16$$

Now, find the x value, knowing that the average of x and 8 must be 16:

$$\frac{x + 8}{2} = 16$$

$$\Rightarrow 32 = x + 8 \quad \text{(multiply both sides by 2)}$$

$$\Rightarrow x = 32 - 8 = 24$$

Example :- 3 On his first 5 biology tests, Bob received the following scores: 72, 86, 92, 63, and 77. What test score must Bob earn on his sixth test so that his average (mean score) for all six tests will be 80? Show how you arrived at your answer.

Answer:- Possible solution: Set up an equation to represent the situation. Remember to use all 6 test scores:

$$\frac{72 + 86 + 92 + 63 + 77 + x}{6} = 80$$

Multiply both sides by 6 and solve:

$$(80)(6) = 390 + x \Rightarrow 480 = 390 + x \Rightarrow x = 480 - 390 = 90$$

Example:- 4 The mean (average) weight of three dogs is 38 pounds. One of the dogs, Sparky, weighs 46 pounds. The other two dogs, Eddie and Sandy, have the same weight. Find Eddie's weight.

Answer:- Let x = Eddie's weight and let x = Sandy's weight (they weigh the same, so they are both represented by "x").

Average: sum of the data divided by the number of data.

$$\frac{x + x + 46}{3\ \text{(dogs)}} = 38$$

Multiply both sides by 3 and solve:

$$(38)(3) = 2x + 46 \Rightarrow 114 = 2x + 46$$

$$2x = 114 - 46 \Rightarrow x = \frac{68}{2} = 34$$

$\therefore$ Eddie weighs 34 pounds.
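Examples 2 to 4 all follow the same pattern: the target mean times the total number of values gives the required total, and the unknown is whatever is left after subtracting the known values. A minimal Python sketch of that pattern follows; the function `required_value` and its parameters are hypothetical names introduced here, not part of the original notes.

```python
def required_value(known_values, target_mean, missing_count=1):
    """Value each of `missing_count` equal unknowns must take so that
    the mean of all values equals `target_mean`."""
    total_count = len(known_values) + missing_count
    required_total = target_mean * total_count
    return (required_total - sum(known_values)) / missing_count

# Example 2: 8 and x must have the same mean as 27 and 5, which is 16.
print(required_value([8], 16))

# Example 3: sixth test score needed for a mean of 80.
print(required_value([72, 86, 92, 63, 77], 80))

# Example 4: two dogs of equal weight, Sparky weighs 46, mean is 38.
print(required_value([46], 38, missing_count=2))
```

The `missing_count` parameter covers Example 4, where two unknowns share the same value and the remaining total is split equally between them.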

For class intervals (grouped data):

$$\text{Median} = L + \left(\frac{\frac{N}{2} - cf}{f}\right) \times i$$

Where
$L$ = lower limit of the median class
$N$ = total number of data items
$cf$ = cumulative frequency of the class preceding the median class
$f$ = frequency of the median class
$i$ = the class interval (width) of the median class

$$\text{Mode} = a + \frac{C\,(f_i - f_{i-1})}{2f_i - f_{i-1} - f_{i+1}}$$

Where
$a$ = lower limit of the modal class
$f_i$ = maximum frequency (the frequency of the modal class)
$f_{i-1}$, $f_{i+1}$ = frequencies of the classes immediately before and after the modal class
$C$ = constant difference (width) for each class

Question:- Find the median of the following data.

Cost 10-20 20-30 30-40 40-50 50-60

Items in a group 4 5 3 6 3

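Solution:-

Cost     Number of items in the group   Cumulative frequency
10-20    4                              4
20-30    5                              9
30-40    3                              12
40-50    6                              18
50-60    3                              21

Here $N = 21 \Rightarrow \dfrac{N}{2} = 10.5$

The median class is 30-40, the first class whose cumulative frequency (12) reaches 10.5.

From the formula,

$$\text{Median} = L + \left(\frac{\frac{N}{2} - cf}{f}\right) \times i$$

$L = 30$, $i = 10$, $cf = 9$ (the cumulative frequency of the class before the median class), $f = 3$ (the frequency of the median class, not its cumulative frequency 12)

$$\text{Median} = 30 + \frac{10.5 - 9}{3} \times 10 = 30 + 5 = 35$$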
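The grouped-median formula can be sketched as a small Python function. This is an illustrative implementation under stated assumptions: all classes have the same width, `f` is taken as the frequency of the median class itself (3 for the class 30-40), and `cf` as the cumulative frequency of the classes before it (9). The function name and parameters are hypothetical.

```python
def grouped_median(lower_limits, class_width, frequencies):
    """Median = L + ((N/2 - cf) / f) * i for equal-width classes."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    half = total / 2
    cumulative = 0  # cf: cumulative frequency before the current class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:
            # First class whose cumulative frequency reaches N/2.
            return lower + (half - cumulative) / f * class_width
        cumulative += f

# Cost classes 10-20, 20-30, 30-40, 40-50, 50-60 with their item counts.
median_cost = grouped_median([10, 20, 30, 40, 50], 10, [4, 5, 3, 6, 3])
print(median_cost)
```

With N = 21, the loop stops at the class 30-40, so the function evaluates 30 + (10.5 - 9) / 3 × 10.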

Question:- Find the Mode of the following distribution:

Class Interval   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
Frequency        5      9       8       12      28      20      12      11

Solution:-

Maximum frequency = 28, modal class = 40-50

From the formula,

$$\text{Mode} = a + \frac{C\,(f_i - f_{i-1})}{2f_i - f_{i-1} - f_{i+1}}$$

$a = 40$, $C = 10$, $f_i = 28$, $f_{i-1} = 12$, $f_{i+1} = 20$

$$\text{Mode} = 40 + \frac{10\,(28 - 12)}{(2 \times 28) - 12 - 20} = 40 + 6.666 = 46.666$$
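The grouped-mode calculation can be sketched in Python as follows. The function name and parameters are hypothetical, the classes are assumed to have equal width, and treating a missing neighbouring class as frequency 0 (when the modal class is first or last) is an assumption of this sketch rather than something stated in the notes.

```python
def grouped_mode(lower_limits, class_width, frequencies):
    """Mode = a + C * (f_i - f_prev) / (2 * f_i - f_prev - f_next)."""
    # Index of the modal class: the class with the maximum frequency.
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    f_i = frequencies[i]
    # Assumption: a missing neighbour counts as frequency 0.
    f_prev = frequencies[i - 1] if i > 0 else 0
    f_next = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    return lower_limits[i] + class_width * (f_i - f_prev) / (2 * f_i - f_prev - f_next)

lower_limits = [0, 10, 20, 30, 40, 50, 60, 70]
frequencies = [5, 9, 8, 12, 28, 20, 12, 11]
mode_value = grouped_mode(lower_limits, 10, frequencies)
print(round(mode_value, 3))
```

The formula is undefined when the denominator is zero, for example when the modal class and both neighbours share the same frequency, so real use would need a guard for that case.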


FAQs - Measures of Central Tendency

What is the best measure of central tendency?

There can often be a "best" measure of central tendency with regard to the data you are analyzing, but there is no one "best" measure of central tendency. This is because whether you use the median, mean or mode will depend on the type of data you have, such as nominal or continuous data; whether your data has outliers and/or is skewed; and what you are trying to show from your data.

In a strongly skewed distribution, what is the best indicator of central tendency?

It is usually inappropriate to use the mean in situations where your data is skewed. You would normally choose the median or mode, with the median usually preferred. This is discussed above under the subtitle "When not to use the mean".

Does all data have a median, mode and mean?

Yes and no. All continuous data has a median, mode and mean. However, strictly speaking, ordinal data has a median and mode only, and nominal data has only a mode. However, a consensus has not been reached among statisticians about whether the mean can be used with ordinal data, and you can often see a mean reported for Likert data in research.

When is the mean the best measure of central tendency?

The mean is usually the best measure of central tendency to use when your data distribution is continuous and symmetrical, such as when your data is normally distributed. However, it all depends on what you are trying to show from your data.

When is the mode the best measure of central tendency?

The mode is the least used of the measures of central tendency, but it is the only one that can be used with nominal data. For this reason, the mode will be the best measure of central tendency (as it is the only one appropriate to use) when dealing with nominal data. The mean and/or median are usually preferred when dealing with all other types of data, but this does not mean the mode is never used with these data types.

When is the median the best measure of central tendency?

The median is usually preferred to other measures of central tendency when your data set is skewed (i.e., forms a skewed distribution) or you are dealing with ordinal data. However, the mode can also be appropriate in these situations, but is not as commonly used as the median.

What is the most appropriate measure of central tendency when the data has outliers?

The median is usually preferred in these situations because the value of the mean can be distorted by the outliers. However, it will depend on how influential the outliers are. If they do not significantly distort the mean, using the mean as the measure of central tendency will usually be preferred.

In a normally distributed data set, which is greatest: mode, median or mean?

If the data set is perfectly normal, the mean, median and mode are equal to each other (i.e., the same value).

For any data set, which measures of central tendency have only one value?

The median and mean can only have one value for a given data set. The mode can have more than one value.

MERITS AND DEMERITS OF MEAN, MEDIAN AND MODE

MEAN

The arithmetic mean (or simply "mean") of a sample is the sum of the sampled values divided by the number of items in the sample.

MERITS OF ARITHMETIC MEAN

1. The arithmetic mean is rigidly defined by an algebraic formula.
2. It is easy to calculate and simple to understand.
3. It is based on all observations and can be regarded as representative of the given data.
4. It is capable of being treated mathematically and hence it is widely used in statistical analysis.
5. The arithmetic mean can be computed even if the detailed distribution is not known, provided the sum of the observations and the number of observations are known.
6. It is the least affected by fluctuations of sampling.

DEMERITS OF ARITHMETIC MEAN

1. It cannot be determined by inspection or located graphically.
2. The arithmetic mean cannot be computed for qualitative data, such as data on intelligence, honesty or smoking habits.
3. It is strongly affected by extreme observations and hence does not adequately represent data containing some extreme points.
4. The arithmetic mean cannot be computed when class intervals have open ends.

MEDIAN

The median is that value of the series which divides the group into two equal parts, one part comprising all values greater than the median value and the other part comprising all the values smaller than the median value.

MERITS OF MEDIAN

(1) Simplicity:- It is a very simple measure of the central tendency of the series. In the case of a simple statistical series, just a glance at the data is enough to locate the median value.
(2) Free from the effect of extreme values:- Unlike the arithmetic mean, the median value is not distorted by the extreme values of the series.
(3) Certainty:- Certainty is another merit of the median. The median is always a certain, specific value in the series.
(4) Real value:- The median is a real value and is a better representative value of the series than the arithmetic mean, the value of which may not exist in the series at all.
(5) Graphic presentation:- Besides the algebraic approach, the median value can also be estimated through the graphic presentation of data.
(6) Possible even when data is incomplete:- The median can be estimated even in the case of certain incomplete series. It is enough if one knows the number of items and the middle item of the series.

DEMERITS OF MEDIAN

Following are the various demerits of median:

(1) Lack of representative character:- The median fails to be a representative measure for series whose values are wide apart from each other. Also, the median is of limited representative character as it is not based on all the items in the series.
(2) Unrealistic:- When the median is located somewhere between the two middle values, it remains only an approximate measure, not a precise value.
(3) Lack of algebraic treatment:- The arithmetic mean is capable of further algebraic treatment, but the median is not. For example, multiplying the median by the number of items in the series will not give us the sum total of the values of the series.

However, the median is quite a simple method of finding an average of a series. It is quite a commonly used measure for series related to qualitative observations, such as the health of students.


MODE

The value of the variable which occurs most frequently in a distribution is called the mode.

MERITS OF MODE

Following are the various merits of mode:

(1) Simple and popular:- The mode is a very simple measure of central tendency. Sometimes, just a glance at the series is enough to locate the modal value. Because of its simplicity, it is a very popular measure of the central tendency.
(2) Less effect of marginal values:- Compared to the mean, the mode is less affected by marginal values in the series. The mode is determined only by the values with the highest frequencies.
(3) Graphic presentation:- The mode can be located graphically, with the help of a histogram.
(4) Best representative:- The mode is that value which occurs most frequently in the series. Accordingly, the mode is the best representative value of the series.
(5) No need of knowing all the items or frequencies:- The calculation of the mode does not require knowledge of all the items and frequencies of a distribution. In a simple series, it is enough if one knows the items with the highest frequencies in the distribution.

DEMERITS OF MODE

Following are the various demerits of mode:

(1) Uncertain and vague:- The mode is an uncertain and vague measure of the central tendency.
(2) Not capable of algebraic treatment:- Unlike the mean, the mode is not capable of further algebraic treatment.
(3) Difficult:- When the frequencies of all items are identical, it is difficult to identify the modal value.
(4) Complex procedure of grouping:- Calculation of the mode involves a cumbersome procedure of grouping the data. If the extent of grouping changes, there will be a change in the modal value.
(5) Ignores extreme marginal frequencies:- It ignores extreme marginal frequencies. To that extent the modal value is not a representative value of all the items in a series. Besides, one can question the representative character of the modal value as its calculation does not involve all items of the series.

Dispersion

In statistics, dispersion (also called variability, scatter, or spread) denotes how stretched or squeezed a distribution is (whether theoretical or underlying a statistical sample). Common examples of measures of statistical dispersion are the variance, standard deviation and interquartile range.

Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.

Measures of dispersion

The set of constants which explain, in a concise way, the "variability" or "scatter" in data is called "measures of dispersion or variability".

The average for two groups of the same number of measurements may be equal, but one group may be more variable than the other.

e.g. the set of five values 5, 6, 7, 8, 9 has mean 7, while another set of five values 1, 6, 4, 10, 14 also has the same mean 7. The second set has more variability than the first.

Usually four measures of dispersion or variability are defined.

Range:-

The Range is the difference between the two extreme values.

In a frequency distribution, $R = (\text{largest } x \text{ value}) - (\text{smallest } x \text{ value})$

Example: In {4, 6, 9, 3, 7} the lowest value is 3 and the highest is 9, so the range is 9 - 3 = 6.

Quartile deviation:-

The median bisects the distribution. If the distribution is divided into four equal parts, the quartiles are obtained. The first quartile is $Q_1$ and the third quartile is $Q_3$.

$$Q_1 = l + \frac{\frac{N}{4} - f_{Q_1}}{f} \times C \qquad Q_3 = l + \frac{\frac{3N}{4} - f_{Q_3}}{f} \times C$$

Where $l$ = lower limit of the quartile class and $C$ = common factor (the class width). By analogy with the median formula, $f_{Q_1}$ and $f_{Q_3}$ are the cumulative frequencies of the classes preceding the respective quartile class, and $f$ is the frequency of the quartile class.

Quartile deviation is defined as

$$Q.D. = \frac{1}{2}(Q_3 - Q_1)$$

Average Deviation:- If the chosen average is A, then the average of the absolute deviations about A is the average deviation.

$$A.D.(A) = \frac{1}{n}\sum |x_i - A| \quad \text{for discrete data}$$

$$A.D.(A) = \frac{1}{N}\sum f_i\,|x_i - A| \quad \text{for a frequency distribution, where } N = \sum f_i$$

Standard deviation:-

$$\text{Standard deviation } (\sigma) = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2} \quad \text{for a discrete data distribution}$$

$$\sigma = \sqrt{\frac{1}{N}\sum f_i\,(x_i - \bar{x})^2} \quad \text{for a frequency distribution}$$

The square of the standard deviation, $\sigma^2$, is defined as the variance ($V$).

$$V = \sigma^2 = \frac{1}{N}\sum f_i\,(x_i - \bar{x})^2$$

Coefficient of variation

In probability theory and statistics, the coefficient of variation (CV) is a standardized measure of dispersion of a probability distribution or frequency distribution. It is defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$. It is also known as unitized risk or the variation coefficient. The absolute value of the CV is sometimes known as relative standard deviation (RSD), which is expressed as a percentage.


Definition

The coefficient of variation (CV) is defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$:

$$C_v = \frac{\sigma}{\mu}$$

It shows the extent of variability in relation to mean of the population.

Example:- The owner of a restaurant is interested in how much people spend at the restaurant. He examines 10 randomly selected receipts for parties of four and writes down the following data: 44, 50, 38, 96, 42, 47, 40, 39, 46, 50

Find the mean, standard deviation and variance.

Solution:-

The mean is calculated by adding the values and dividing by 10.

Mean = $\bar{x} = \dfrac{492}{10} = 49.2$

The following table is used to find the standard deviation.

x       x - 49.2    (x - 49.2)^2
44      -5.2        27.04
50      0.8         0.64
38      -11.2       125.44
96      46.8        2190.24
42      -7.2        51.84
47      -2.2        4.84
40      -9.2        84.64
39      -10.2       104.04
46      -3.2        10.24
50      0.8         0.64
Total               2599.60

Because the 10 receipts are a random sample, the sum of squared deviations is divided by $n - 1 = 9$ (the sample standard deviation) rather than by $n$. The standard deviation is never negative, so only the positive root is taken.

$$\sigma = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2} = \sqrt{\frac{2599.6}{10 - 1}} = \sqrt{\frac{2599.6}{9}} = \sqrt{288.84} \approx 16.995 \approx 17$$

Variance $= \sigma^2 \approx 288.84$

Coefficient of variation (C.V.) $= C_v = \dfrac{\sigma}{\mu} = \dfrac{16.995}{49.2} \approx 0.3454$
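The restaurant calculation can be checked with Python's standard `statistics` module. This is a verification sketch, not part of the original notes; the variable names are assumptions. Note that `statistics.variance` and `statistics.stdev` divide by n - 1 (sample formulas), while `statistics.pstdev` divides by n (population formula), so both conventions can be compared.

```python
import statistics

# Amounts on the 10 randomly selected receipts (from the example above).
receipts = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]

mean = statistics.mean(receipts)

# Sum of squared deviations from the mean (the "Total" row of the table).
sum_of_squares = sum((x - mean) ** 2 for x in receipts)

sample_variance = statistics.variance(receipts)  # divides by n - 1
sample_sd = statistics.stdev(receipts)           # square root of the above
population_sd = statistics.pstdev(receipts)      # divides by n instead

coefficient_of_variation = sample_sd / mean

print(round(mean, 1), round(sum_of_squares, 2))
print(round(sample_variance, 2), round(sample_sd, 3), round(population_sd, 3))
print(round(coefficient_of_variation, 4))
```

Recomputing the sum of squared deviations directly from the data is a useful cross-check on a hand-built table, since a single sign or addition slip in one row changes every later result.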