Quantitative Technique New

8/11/2019 Quantitative Technique New


QM

Brajaballav Kar
B. Tech (Electrical, CET)
PGDM (XIMB)


DESCRIPTIVE STATISTICS

Descriptive statistics quantitatively describe the main features of a collection of data. Unlike inferential (or inductive) statistics, they aim to summarize a sample rather than to learn about the population that the sample represents. This generally means that descriptive statistics, unlike inferential statistics, are not developed on the basis of probability theory.

Descriptive statistics include measures of central tendency (mean, median and mode) and measures of variability or dispersion (standard deviation or variance, the minimum and maximum values of the variables, kurtosis and skewness). Entries in an analysis of variance table can also be regarded as summary statistics.



Data array: Arrange values in ascending or descending order. Advantages: we can notice the largest and smallest values, divide the data into sections, see whether a value appears more than once, and observe the distance between succeeding values.
Frequency distribution: A table that organizes data into classes (groups); it shows the number of observations from the data set that fall into each class.
Relative frequency distribution: Expresses each class frequency as a fraction or percentage of the total number of observations: relative frequency = class frequency / total number of observations.
Mutually exclusive: No data point falls into more than one category.
All inclusive: Every data point falls into some category, so the sum of all the relative frequencies equals 1.



Open-ended class: A class that allows the upper or the lower end of a quantitative classification to be limitless (Age: 11-20, 21-30, 31-40, 41-50, 51-60, 61 and older).
Discrete classes: Separate entities that do not progress from one class to the next without a break.
Continuous classes: Progress from one class to the next without a break (e.g. weights of cans of tomatoes).
The range must be divided into equal classes; that is, the width of the interval from the beginning of one class to the beginning of the next must be the same for every class.
Number of classes: rule of thumb, 6 to 15 classes.
Width of class interval = (next unit value after the largest value − smallest value) / number of classes
Ogives: A cumulative frequency distribution lets us see how many observations lie above or below certain values, rather than only recording the number of items within each interval. The graph of a cumulative frequency distribution is called an ogive.
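The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and its parameters are hypothetical, and the data set is the 20-value example that the deck uses later on the mode slide, grouped into 5 classes of width 4.

```python
from collections import Counter

def frequency_table(values, low, width, n_classes):
    """Return rows of (class label, frequency, relative frequency, cumulative frequency).

    Assumes integer data and equal-width classes starting at `low`.
    """
    counts = Counter((v - low) // width for v in values)
    total = len(values)
    rows, cumulative = [], 0
    for i in range(n_classes):
        lo = low + i * width
        hi = lo + width - 1
        freq = counts.get(i, 0)
        cumulative += freq  # running total: the basis of an ogive
        rows.append((f"{lo}-{hi}", freq, freq / total, cumulative))
    return rows

# Data set taken from the deck's mode slide
data = [0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 12, 15, 15, 15, 19]
table = frequency_table(data, low=0, width=4, n_classes=5)
for label, freq, rel, cum in table:
    print(f"{label:>6}  f={freq:2d}  rel={rel:.2f}  cum={cum:2d}")
```

Plotting the cumulative column against the upper class limits would give the "less than" ogive.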



CHAPTER 2: MEASURES OF CENTRAL TENDENCY

Summary statistics: e.g. central tendency and dispersion, which describe the characteristics of the data set.
Central tendency
Dispersion
Skewness: The opposite of symmetry. A distribution is skewed when its frequencies are lopsided rather than concentrated at the middle. Positively skewed: frequencies are higher at the beginning (the tail extends to the right). Negatively skewed: frequencies are higher at the end (the tail extends to the left).
Kurtosis (peakedness)


CENTRAL TENDENCY:

Arithmetic mean:
(Characteristics of a sample are called statistics; those of a population are called parameters.)

Population mean: μ = Σx / N
Sample mean: x̄ = Σx / n
Grouped data: x̄ = Σ(f × x) / n, where n = Σf and x is the class midpoint.
(For class intervals x1-x2, x3-x4, the midpoint of the first class is taken as (x1 + x3) / 2. This is an assumption and an approximation.)

Disadvantages: the mean is affected by extreme values; it cannot be computed if a class is open ended; every data point enters the calculation (except in the case of grouped data).
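A minimal sketch of both formulas, assuming the account balance frequency distribution that the deck uses in its later median example (classes of width 50 starting at 0). The function names are illustrative, and the midpoints follow the slide's convention of averaging consecutive lower limits.

```python
def mean(values):
    """Arithmetic mean of ungrouped data: sum(x) / n."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Approximate mean of grouped data: sum(f * x) / sum(f), x = class midpoint."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

# Account balance classes from the deck's example table
lower_limits = [0, 50, 100, 150, 200, 250, 300, 350, 400, 450]
freqs = [78, 123, 187, 82, 51, 47, 13, 9, 6, 4]
# Midpoint = (lower limit of this class + lower limit of the next) / 2
midpoints = [lo + 25 for lo in lower_limits]

print(mean([24, 28]))                  # ungrouped: two ages
print(grouped_mean(midpoints, freqs))  # grouped approximation
```

The grouped result is an approximation because every observation in a class is treated as if it sat at the class midpoint.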



CENTRAL TENDENCY

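Weighted mean: x̄w = Σ(w × x) / Σw
Geometric mean: GM = n-th root of (product of all x values)
Where to use: averaging growth factors, where the geometric mean is the n-th root of the product of the n growth factors. Example: cube root of (1.1 × 1.15 × 1.2).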
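As a rough illustration, both means fit in a few lines of Python. The growth factors are the ones on the slide; the values and weights passed to `weighted_mean` are hypothetical, and the function names are not from the source.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

def geometric_mean(factors):
    """n-th root of the product of n growth factors."""
    return math.prod(factors) ** (1 / len(factors))

growth = [1.10, 1.15, 1.20]           # growth factors from the slide
gm = geometric_mean(growth)
print(round(gm, 4))                   # average growth factor per period

wm = weighted_mean([10, 20], [1, 3])  # hypothetical values and weights
print(wm)
```

The geometric mean is the right average here because growth compounds: applying `gm` three times gives the same total growth as applying the three factors in turn.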



CENTRAL TENDENCY-MEDIAN

Median: the middle-most or most central item.

Median, ungrouped data:
Array the data in ascending or descending order; the ((n + 1) / 2)-th item is the median in both the odd and the even case.
If the data set has an odd number of items, the middle item is the median.
If the data set has an even number of items, the median is the average of the two middle items.

Median, grouped data:
Median class: the class in which the cumulative frequency first reaches (n + 1) / 2.
The assumption is then that the data points are evenly spread over the entire class interval.



EXAMPLE

Account Balance    Frequency
0-49.99                   78
50.00-99.99              123
100.00-149.99            187
150.00-199.99             82
200.00-249.99             51
250.00-299.99             47
300.00-349.99             13
350.00-399.99              9
400.00-449.99              6
450.00-499.99              4
Total                    600



MEDIAN EXAMPLE

Median class: 100.00-149.99
The median is item (600 + 1) / 2 = 300.5, i.e. the average of the 300th and 301st items.
The 300th item is the 99th item of the median class: 300 − (78 + 123) = 99.
Step between items in the median class: (150.00 − 100.00) / 187 = 0.267.
The 1st item is at 100.00, so the 99th is at 100.00 + 98 × 0.267 = 126.17.
The 100th item of the class (the 301st overall) is at 126.17 + 0.267 = 126.44.
Median = (126.17 + 126.44) / 2 = 126.30


MEDIAN FORMULA

Median = [ ((n + 1) / 2 − (F + 1)) / fm ] × w + Lm
n = total number of items
F = sum of all the class frequencies up to, BUT not including, the median class
fm = frequency of the median class
w = class interval width
Lm = lower limit of the median class interval
For the example above: [ (300.5 − 202) / 187 ] × 50 + 100 ≈ 126.34; the small difference from 126.30 is due to rounding.
Advantages of the median: extreme values do not affect it; it can be calculated for open-ended grouped data, unless the median falls in the open-ended class; it can be calculated for qualitative data (excellent, very good, good, average, bad: find the frequencies and then the median).
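The median formula can be checked with a short sketch. The function name `grouped_median` is illustrative, and equal class widths are assumed, as in the account balance table.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of grouped data: [((n + 1)/2 - (F + 1)) / fm] * w + Lm."""
    n = sum(freqs)
    target = (n + 1) / 2
    cumulative = 0  # F: total frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:  # this is the median class
            return ((target - (cumulative + 1)) / f) * width + lower
        cumulative += f
    raise ValueError("empty distribution")

# Account balance example from the deck
lower_limits = [0, 50, 100, 150, 200, 250, 300, 350, 400, 450]
freqs = [78, 123, 187, 82, 51, 47, 13, 9, 6, 4]
median = grouped_median(lower_limits, freqs, width=50)
print(round(median, 2))
```

Unrounded arithmetic gives a value close to the hand calculation of 126.30; the gap comes from rounding the step width to 0.267 in the manual method.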


CENTRAL TENDENCY-MODE

Mode: The value that is most often repeated in the data set.
The mode of ungrouped data is rarely used, because chance can cause an unrepresentative value to be the most frequent one.
Data set: 0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 12, 15, 15, 15, 19 => the mode is 15, but it is unrepresentative of the data set, since most of the values are below 10.

Number of data points = 20
Class interval = (20 − 0) / 6 = 3.3 => 4; number of classes = 20 / 4 = 5

Class        0-3   4-7   8-11   12-15   16-19
Frequency      6     8      1       4       1
=> The modal class is 4-7.

Mo = L_MO + [ d1 / (d1 + d2) ] × w
L_MO = lower limit of the modal class
d1 = frequency of the modal class − frequency of the class directly below it
d2 = frequency of the modal class − frequency of the class directly above it
w = width of the modal class interval



MODE EXAMPLE

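Account Balance    Frequency
0-49.99                   78
50.00-99.99              123
100.00-149.99            187
150.00-199.99             82
200.00-249.99             51
250.00-299.99             47
300.00-349.99             13
350.00-399.99              9
400.00-449.99              6
450.00-499.99              4
Total                    600

Modal class: 100.00-149.99
L_MO = 100, d1 = 187 − 123 = 64, d2 = 187 − 82 = 105, w = 50
=> Mo = 100 + [ 64 / (64 + 105) ] × 50 ≈ 118.93, i.e. about 119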
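A small sketch of the grouped-mode formula, applied to the account balance table and to the five-class example. The function name is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption made here, not something the slides state.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: L_MO + d1 / (d1 + d2) * w."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    below = freqs[i - 1] if i > 0 else 0               # assumption: 0 if no class below
    above = freqs[i + 1] if i + 1 < len(freqs) else 0  # assumption: 0 if no class above
    d1 = freqs[i] - below
    d2 = freqs[i] - above
    return lower_limits[i] + d1 / (d1 + d2) * width

# Account balance example: modal class 100.00-149.99
balances = grouped_mode(
    [0, 50, 100, 150, 200, 250, 300, 350, 400, 450],
    [78, 123, 187, 82, 51, 47, 13, 9, 6, 4],
    width=50,
)
# Five-class example: modal class 4-7
small = grouped_mode([0, 4, 8, 12, 16], [6, 8, 1, 4, 1], width=4)
print(round(balances, 2), round(small, 2))
```

The formula pulls the mode toward whichever neighbouring class has the higher frequency, which is why the account balance mode lands in the lower half of its class.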


ADVANTAGE MODE

Advantages: like the median, it can be used as a central location for qualitative as well as quantitative data; the mode is not affected by extreme values; it can also be used for open-ended classes.
Disadvantages: if all values occur with the same frequency, it cannot be used; when there are multiple modes, it is difficult to compare.


MEAN MEDIAN-MODE

The mean, median and mode are identical in a symmetrical distribution.
In a positively skewed distribution (skewed to the right), the mode is at the highest point of the distribution, the median is to the right of that, and the mean is to the right of both the median and the mode.
In a negatively skewed distribution (skewed to the left), the mode is at the highest point of the distribution, the median is to the left of that, and the mean is to the left of both the median and the mode.
When the population is skewed positively or negatively, the median is often the best measure of location because it is always between the mean and the mode. The median is not as highly influenced by the frequency of occurrence of a single value as the mode is, nor is it pulled by extreme values as the mean is.



DISPERSION:

Dispersion = variability.
Why dispersion?
It gives additional information that enables us to judge the reliability of our measure of central tendency. Example: mean age 26 (case 1: age 1 = 2, age 2 = 50; case 2: age 1 = 24, age 2 = 28). If the data are widely spread, the mean is less representative.
It lets us compare the dispersion of different samples.
Usage: financial earnings (more dispersed => more risk), quality parameters, drug purity.


MEASURE- OF DISPERSION

Range: the difference between the highest and lowest observed values. Easy to understand and find, but its usefulness is limited. It is heavily influenced by extremes; open-ended distributions don't have a range.

Interfractile range: In a frequency distribution, a given fraction or proportion of the data lies at or below a fractile. The median, for example, is the 0.5 fractile, because half the data set is less than or equal to this value.
The interfractile range is a measure of the spread between two fractiles in a frequency distribution, i.e. the difference between the values of the two fractiles.
Fractiles: if they divide the data into 10 equal parts they are called deciles; if 4, quartiles; if 100, percentiles.
The interquartile range is the difference between the values of the third and first quartiles (Q3 − Q1).

Other measures: variance and standard deviation; both indicate the average distance of any observation in the data set from the mean of the distribution.



VARIANCE

Variance: σ²

Population variance: σ² = Σ(x − μ)² / N, which is equivalent to (Σx² / N) − μ² {used when the x values are large and the x − μ values are small}
(The square of a unit of measure is not intuitive.)

Variance of grouped data: σ² = Σf(x − μ)² / N = (Σf x² / N) − μ²
Sample variance: s² = Σ(x − x̄)² / (n − 1) = Σx² / (n − 1) − n x̄² / (n − 1)

Standard deviation: the square root of the variance; only the positive root is considered.
Population standard deviation: σ = √σ²
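The equivalence of the definitional and shortcut forms can be checked numerically. This is a sketch with illustrative function names; the two age pairs are hypothetical data chosen so that both have a mean of 26 but very different spreads.

```python
import math

def population_variance(xs):
    """Definitional form: sum((x - mu)^2) / N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def population_variance_shortcut(xs):
    """Shortcut form: sum(x^2) / N - mu^2."""
    mu = sum(xs) / len(xs)
    return sum(x * x for x in xs) / len(xs) - mu ** 2

def sample_variance(xs):
    """Sample form: sum((x - xbar)^2) / (n - 1)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

narrow, wide = [24, 28], [2, 50]  # hypothetical ages, both with mean 26
for ages in (narrow, wide):
    var = population_variance(ages)
    print(ages, var, math.sqrt(var))  # variance and standard deviation
```

Both pairs have the same mean, but the standard deviation of the second is twelve times larger, which is exactly the information the mean alone hides.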


CHEBYSHEV'S THEOREM:

Chebyshev's theorem says that NO MATTER what the shape of the distribution, at least 75% of the values will fall within ±2 standard deviations of the mean of the distribution, and at least 89% of the values will lie within ±3 standard deviations of the mean.

For a symmetrical, bell-shaped distribution we can be more precise:
About 68% of the values lie within ±1 standard deviation
About 95% of the values lie within ±2 standard deviations
About 99.7% of the values lie within ±3 standard deviations
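The 75% and 89% figures are the k = 2 and k = 3 cases of the general Chebyshev bound 1 − 1/k². The sketch below computes the bound and compares it with the observed fraction for the skewed 20-value data set from the mode slide; the function names are illustrative.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values within k population standard deviations of the mean."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))
    return sum(1 for x in values if abs(x - mu) <= k * sigma) / len(values)

# Skewed data set from the deck's mode slide
data = [0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 12, 15, 15, 15, 19]
for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), fraction_within(data, k))
```

The observed fractions sit above the bounds, as the theorem guarantees; the bound is a floor that holds for any shape, not an estimate of the actual fraction.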


STANDARD SCORE: COEFFICIENT OF VARIATION

Standard score:
The standard score gives the number of standard deviations a particular observation lies below or above the mean.
Population standard score = (x − μ) / σ

Relative dispersion: the coefficient of variation
1. The standard deviation is an absolute measure of dispersion that expresses the variation in the same units as the original data.
2. Standard deviations alone cannot be compared. To compare, we need to know (a) the mean, (b) the standard deviation, and (c) how the standard deviation compares with the mean.
So to compare we need a relative measure, the coefficient of variation = σ / μ, usually expressed as a percentage: (σ / μ) × 100.
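A brief sketch of both measures. All numbers below are hypothetical and chosen only to show that the series with the larger standard deviation can still have the smaller relative dispersion.

```python
def standard_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def coefficient_of_variation(mu, sigma):
    """Relative dispersion, as a percentage of the mean."""
    return sigma / mu * 100

z = standard_score(50, mu=40, sigma=5)               # hypothetical observation
cv_a = coefficient_of_variation(mu=40, sigma=5)      # hypothetical series A
cv_b = coefficient_of_variation(mu=160, sigma=15)    # hypothetical series B
print(z, cv_a, cv_b)
```

Series B has three times the standard deviation of series A, yet its coefficient of variation is lower, so relative to its mean it is the less variable of the two.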


PROBABILITY

Probability is the chance that something will happen, expressed as a fraction or a percentage.
Event: one or more of the possible outcomes of doing something.
Experiment: an activity that produces the events.
Sample space: the set of all possible outcomes of an experiment.
Mutually exclusive: events are mutually exclusive if one and only one of them can take place at a time.
Collectively exhaustive: when a list of the possible events that can result from an experiment includes every possible outcome, the list is called collectively exhaustive.



TYPES OF PROBABILITY:

Classical approach
Relative frequency approach
Subjective approach (not discussed here)

Classical approach: a priori, symmetrical, assumed (fair coin, unbiased dice); we can know the probability beforehand.

Relative frequency approach: the probability is defined as either
1. the observed relative frequency of an event in a very large number of trials (e.g. CA pass percentage), or
2. the proportion of times that an event occurs in the long run when conditions are stable.
This method uses the relative frequencies of past occurrences as probabilities. Relative frequencies become stable as the number of trials (e.g. coin tosses) becomes large, under uniform conditions.


RULES:

Single = marginal = unconditional probability => only one event can take place.
Mutually exclusive events: add the probabilities (either/or events): P(A or B) = P(A) + P(B)

Proportion of families having this many children:
Number of children   0      1      2      3      4      5      6 or more
Proportion           0.05   0.10   0.30   0.25   0.15   0.10   0.05

What is P(4 or more children)? 0.15 + 0.10 + 0.05 = 0.30


NOT MUTUALLY EXCLUSIVE EVENT

Not mutually exclusive events; addition rule: P(A or B) = P(A) + P(B) − P(A and B)

Sex      Age
Male     30
Male     32
Female   45
Female   20
Male     40

Choose one person who is either female or over 35:
P(female or over 35) = P(female) + P(over 35) − P(female and over 35) = 2/5 + 2/5 − 1/5 = 3/5
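Both addition rules can be verified by direct counting. The sketch below uses the five people and the family-size proportions from the slides; the helper names (`prob`, `is_female`, `over_35`) are illustrative.

```python
from fractions import Fraction

# The five people from the slide: (sex, age)
people = [("male", 30), ("male", 32), ("female", 45), ("female", 20), ("male", 40)]

def prob(event):
    """Probability of an event when one person is chosen at random."""
    return Fraction(sum(1 for p in people if event(p)), len(people))

def is_female(person):
    return person[0] == "female"

def over_35(person):
    return person[1] > 35

# Events overlap, so subtract the joint probability to avoid double counting
by_rule = prob(is_female) + prob(over_35) - prob(lambda p: is_female(p) and over_35(p))
by_count = prob(lambda p: is_female(p) or over_35(p))
print(by_rule, by_count)

# Mutually exclusive case: family sizes of 4, 5, and 6 or more children
p_four_or_more = Fraction("0.15") + Fraction("0.10") + Fraction("0.05")
print(p_four_or_more)
```

Without the subtraction, the 45-year-old woman would be counted twice, once as female and once as over 35.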


PROBABILITIES UNDER STATISTICAL INDEPENDENCE:

Statistical independence: the occurrence of one event has no effect on the probability of occurrence of any other event.
Rolling a die: getting a 6 the first time and getting a 6 the second time are independent.
But getting a 6 the first time a die is rolled and the event that the sum of the numbers seen on the first and second trials is 8 are not independent.
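The die example can be checked by enumerating all 36 equally likely outcomes of two rolls and testing whether P(A and B) equals P(A) × P(B). This is a sketch assuming a fair die; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first roll, second roll) of a fair die
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(a, b):
    """True when P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_six(o):
    return o[0] == 6

def second_six(o):
    return o[1] == 6

def sum_eight(o):
    return o[0] + o[1] == 8

print(independent(first_six, second_six))  # two separate rolls
print(independent(first_six, sum_eight))   # the sum depends on the first roll
```

The second pair fails the test because knowing the first roll is a 6 changes the chance that the sum is 8: only a 2 on the second roll will do.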


3 TYPES OF PROBABILITIES UNDER STATISTICAL INDEPENDENCE:

1. Marginal
2. Joint
3. Conditional

Marginal probability of independent events: the simple probability of an event (e.g. fair coin toss, P(H) = 0.5; if the coin is unfair with P(H) = 0.8, then it is 0.8 every time).
Joint probability of two independent events: P(AB) = P(A) × P(B)
P(AB) = probability of events A and B occurring together or in succession (the joint probability)
P(A) = marginal probability of event A occurring
P(B) = marginal probability of event B occurring
(Examples: two heads in succession; dice: first a 1 and then a 6.)
P(H1) = P(H2) = P(H3) = 0.5 (marginal or absolute probability), but P(H1H2H3) = 0.5 × 0.5 × 0.5 = 0.125



CONDITIONAL PROBABILITY UNDER STATISTICAL INDEPENDENCE:

Conditional probability of independent events: the conditional probability of event B, given that event A has occurred, is simply the probability of B (because the events are independent, by definition).
P(B|A) = P(B)

Example: the probability of a head on the second toss, given that the first toss resulted in a head, is 0.5.


PROBABILITY UNDER STATISTICAL INDEPENDENCE:

Type of probability   Symbol    Formula
Marginal              P(A)      P(A)
Joint                 P(AB)     P(A) × P(B)
Conditional           P(B|A)    P(B)


SIMPLE REGRESSION

Regression and correlation => the nature and strength of the relationship between two variables.
Regress: to go back to the mean.
Regression analysis: finding the estimating equation (mathematically relating the variables).
Types of relationship
Dependent and independent variables: one dependent variable, possibly multiple independent variables.
Direct relationship: X increases, Y increases; slope positive.
Inverse relationship: X increases, Y decreases; slope negative.



REGRESSION: CAUSE AND EFFECT?

Differentiate cause and effect from dependent and independent variables.
Not all relationships are cause and effect: a relationship found by regression is one of association, not necessarily of cause and effect.
Cause and effect:
The cause should precede the effect in time.
Presence of the cause indicates presence of the effect.
Presence of the effect indicates presence of the cause.



SCATTER DIAGRAM:

Transform tabular information into a graph.
Observe visually.
Draw a fit. How? The line need not touch each point; roughly equal numbers of points should lie on either side of the line.
Relationships can be linear or curvilinear.


TOTAL COST

It is known that total cost is the sum of variable cost and fixed cost. A businessman knows that for a raw material (RM) cost of 5 crore the total cost comes to 8.5 crore, and for an RM cost of 8 crore the total cost is 10.6 crore. He assumes a linear relationship between the costs involved.

If he plans his raw material cost to be 10 crore, what total cost should he be ready to incur?
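One way to work the problem is to fit the straight line through the two known points and evaluate it at an RM cost of 10 crore. This sketch assumes, as the businessman does, that the relationship is exactly linear; the function name `line_through` is illustrative.

```python
def line_through(p1, p2):
    """Intercept a and slope b of the line Y = a + b * X through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)  # slope: change in total cost per crore of RM cost
    a = y1 - b * x1            # intercept: total cost when RM cost is zero
    return a, b

# (RM cost, total cost) in crore, from the problem statement
a, b = line_through((5, 8.5), (8, 10.6))
total_at_10 = a + b * 10
print(round(a, 2), round(b, 2), round(total_at_10, 2))
```

Under this assumption the intercept of 5 crore plays the role of the fixed cost, the total cost rises by 0.7 crore for each crore of raw material cost, and the planned total cost comes to 12 crore.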


REGRESSION LINE

We will only examine linear relationships.
Y (dependent variable) = a (Y intercept) + b (slope) × X (independent variable)
b = (Y2 − Y1) / (X2 − X1)
Estimating equation: Ŷ = a + bX

How to choose the best-fitting line:
Add the errors (take the lowest sum)? Individual differences may be positive or negative and will cancel.
Add the absolute values of the errors (take the lowest sum)? This does not account for a large single deviation, so it does not stress the magnitude of the error.
So square the errors, which penalizes large absolute deviations, and take the least sum of squares.
Mathematically: b = (ΣXY − n X̄ Ȳ) / (ΣX² − n X̄²)
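A minimal least-squares sketch using the slope formula above. The intercept formula a = Ȳ − bX̄ is the standard companion to that slope and is an addition here, since it does not appear on the slide; the five data points are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    a = y_bar - b * x_bar  # standard intercept formula (not shown on the slide)
    return a, b

# Hypothetical data points
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
errors = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(a, 2), round(b, 2))
print(round(sum(errors), 10))                 # signed errors cancel out
print(round(sum(e * e for e in errors), 2))   # the quantity least squares minimizes
```

The signed errors sum to zero even though no point lies exactly on the line, which is why the plain sum of errors cannot be used to choose between candidate lines.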
