Posted on 12-Sep-2015
Agenda
Contents
The Purpose of Statistics
Common Terminology
Regression Analysis
Common Distributions
Stochastic Processes
Queuing Theory
The Purpose of Statistics
Statistics: ways of interpreting a collection of data
Applications of statistics:
- Drawing conclusions from a limited amount of data
- Exploratory data analysis using descriptive statistics
- Statistical inference using probabilistic assumptions
- Making predictions about the future
- Testing the validity of assumptions
Common Terminology
Data & Summary
Measures of Central Tendency
Measures of Dispersion
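As a minimal sketch, the usual measures of central tendency (mean, median, mode) and of dispersion (range, variance, standard deviation, interquartile range) can be computed with Python's standard `statistics` module. The sample values are made up for illustration.

```python
import statistics

# Hypothetical sample data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Measures of dispersion
data_range = max(data) - min(data)
pop_var = statistics.pvariance(data)    # population variance: divides by n
sample_var = statistics.variance(data)  # sample variance: divides by n - 1
pop_sd = statistics.pstdev(data)        # population standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                           # interquartile range

print(mean, median, mode, data_range, pop_var, pop_sd, iqr)
```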
Skewness: describes the asymmetry of a distribution relative to the (symmetric) normal distribution. Skewness can be negative or positive, depending on whether the longer tail of the data lies to the left of the mean (negative skew) or to the right of it (positive skew).
Kurtosis: a measure of the peakedness (more precisely, the tail heaviness) of a distribution compared with the normal distribution, whose kurtosis is 3 (excess kurtosis 0).
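A minimal sketch of moment-based skewness and kurtosis, assuming the population (biased) moment estimators; statistical packages often apply small-sample corrections, so their values can differ slightly. The function names and data are illustrative.

```python
def central_moment(xs, k):
    """k-th central moment, dividing by n (population form)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Third standardized moment: > 0 means a longer right tail, < 0 a longer left tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """Fourth standardized moment; a normal distribution has kurtosis 3."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

def excess_kurtosis(xs):
    """Kurtosis minus 3, so that the normal distribution scores 0."""
    return kurtosis(xs) - 3

right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 8, 15]  # made-up data with a long right tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed), skewness(symmetric), excess_kurtosis(symmetric))
```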
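Empirical Rule: assumes the data have a bell-shaped (approximately normal) distribution. About 68% of observations lie within 1 SD of the mean, about 95% within 2 SDs, and about 99.7% within 3 SDs.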
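The empirical-rule percentages come from the normal distribution: the probability of lying within k SDs of the mean is $\mathrm{erf}(k/\sqrt{2})$. A short check using only the standard library's `math.erf` (the function name is illustrative):

```python
import math

def normal_within_k_sd(k):
    """Probability that a normal variable lies within k SDs of its mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within_k_sd(k), 4))
```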
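Chebyshev's Rule: for any data set, at least $1 - 1/k^2$ of the observations lie within $k$ SDs of the mean ($k > 1$); for example, at least 75% lie within 2 SDs and at least 88.9% within 3 SDs.
Observations lying more than 3 SDs from the mean are called outliers. Outliers need to be dealt with properly before analyzing the data.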
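A small sketch of Chebyshev's bound together with the 3-SD outlier convention above; the function names are illustrative, and the population standard deviation is assumed for the outlier check.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of ANY data set within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def flag_outliers(xs, k=3):
    """Return the values lying more than k SDs from the mean."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return [x for x in xs if abs(x - m) > k * s]

print(chebyshev_lower_bound(2), chebyshev_lower_bound(3))
print(flag_outliers([10] * 19 + [100]))
```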
Covariance: Cov(X,Y)
Measure of how two variables move together: $\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Positive (+ve) covariance means X and Y move in the same direction; negative (-ve) means they move in opposite directions.
Measures the linear relationship between the two variables: the sign determines the nature of the relationship.
It does not capture quadratic or other non-linear relationships, such as $Y = X^2$; it reflects ONLY linear relationships.
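A minimal sketch of the sample covariance (dividing by n - 1), showing the sign for linear relationships and the zero covariance of an exact but non-linear dependence $Y = X^2$ on symmetric data. The function name and data are illustrative.

```python
def covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

x = [-2, -1, 0, 1, 2]
print(covariance(x, [2 * v + 1 for v in x]))  # linear, same direction: positive (5.0)
print(covariance(x, [-v for v in x]))         # linear, opposite direction: negative (-2.5)
print(covariance(x, [v ** 2 for v in x]))     # Y = X^2: exact dependence, yet 0.0
```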
Correlation
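Measure of how two variables move together; a standardized version of covariance:
$r = \dfrac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$, which lies between -1 and +1.
Interpretation of values: r = +1 perfect positive linear relationship; r = -1 perfect negative linear relationship; r = 0 no linear relationship.
The magnitude indicates the level of association; like covariance, r captures only linear relationships.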
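A minimal sketch of the Pearson correlation coefficient computed from the formula above (the n - 1 factors in covariance and variances cancel). The function name and data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """r = Cov(X, Y) / sqrt(Var(X) * Var(Y)), computed from sums of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * v - 2 for v in x]))  # perfect positive linear relationship
print(pearson_r(x, [10 - v for v in x]))     # perfect negative linear relationship
print(pearson_r(x, [2, 4, 5, 4, 5]))         # partial association, between -1 and +1
```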
Regression Analysis
Regression
Estimation of Parameters
Logistic Regression
Multicollinearity in Multiple Regression
Assumptions of Regression
Residual Diagnostics
Estimation of Parameters in Regression
Coefficients in Equations
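For simple linear regression $y = \beta_0 + \beta_1 x + \varepsilon$, the least-squares estimates are $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$, where $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum (x_i - \bar{x})^2$. A minimal sketch with made-up data; the function name is illustrative.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx       # slope
    b0 = my - b1 * mx    # the fitted line passes through (x_bar, y_bar)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)
```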
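Variations in Predictor Variable
Coefficient of determination: $R^2 = SSR/SST$, the ratio of the sum of squares due to regression (SSR) to the total sum of squares (SST); equivalently, $R^2 = 1 - SSE/SST$, where SSE is the error sum of squares.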
Standard Error
Standard error is the estimate of the standard deviation of the regression errors.
Standard error of estimate, $S_e$, measures the variability or scatter of the observed values around the regression line; for simple linear regression, $S_e = \sqrt{SSE/(n-2)}$.
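A minimal sketch computing $R^2 = SSR/SST$ and $S_e = \sqrt{SSE/(n-2)}$ for a simple least-squares line; the n - 2 reflects the two estimated parameters (with k predictors it becomes n - k - 1). The function name and data are illustrative.

```python
import math

def regression_fit_stats(xs, ys):
    """Return (R^2, Se) for a simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # error (residual) SS
    sst = sum((y - my) ** 2 for y in ys)                          # total SS
    ssr = sst - sse                                               # SS due to regression
    r_squared = ssr / sst
    se = math.sqrt(sse / (n - 2))  # two estimated parameters -> n - 2 degrees of freedom
    return r_squared, se

print(regression_fit_stats([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```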
Multiple Linear Regression
Estimation of parameters in Multiple Linear Regression
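In matrix form, the least-squares estimates solve the normal equations $(X^\top X)\,b = X^\top y$, where X has a leading column of ones for the intercept. A minimal pure-Python sketch (kept free of NumPy so it is self-contained; the function names are illustrative) solves them by Gaussian elimination. In practice a QR- or SVD-based least-squares routine is numerically safer, because forming $X^\top X$ squares the condition number; perfectly collinear predictors make $X^\top X$ singular.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (e.g. perfectly collinear predictors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple_ols(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2
rows = [(0, 1), (1, 0), (2, 1), (3, 0), (4, 2)]
y = [4, 3, 8, 7, 15]
print(fit_multiple_ols(rows, y))
```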
Partial Correlation and Semi Partial Correlation
Partial correlation coefficient measures the relationship between two variables (say Y and X1) when the influence of all other variables (say X2, X3, ..., Xn) connected with these two variables (Y and X1) is removed.
Partial correlation represents the correlation between the response variable and a predictor after the variance they share with the other variables has been removed from both the response and the predictor.
Partial correlation is the correlation between the residualized response and the residualized predictor.
Semi-partial (or part) correlation coefficient measures the relationship between two variables (say Y and X1) when the influence of all other variables (say X2, X3, ..., Xn) connected with these two variables is removed from only one of the variables (X1).
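Following the residual definitions above, a minimal sketch with a single control variable X2 (the slides allow X2, ..., Xn; one control keeps the regression simple). Partial correlation residualizes both Y and X1 on X2; semi-partial residualizes only X1. The function names and data are illustrative.

```python
import math

def _mean(v):
    return sum(v) / len(v)

def pearson_r(xs, ys):
    mx, my = _mean(xs), _mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residualize(target, control):
    """Residuals of target after a simple least-squares regression on control."""
    mc, mt = _mean(control), _mean(target)
    sct = sum((c - mc) * (t - mt) for c, t in zip(control, target))
    scc = sum((c - mc) ** 2 for c in control)
    b1 = sct / scc
    b0 = mt - b1 * mc
    return [t - (b0 + b1 * c) for c, t in zip(control, target)]

def partial_corr(y, x1, x2):
    """Correlation of Y and X1 with X2 removed from BOTH variables."""
    return pearson_r(residualize(y, x2), residualize(x1, x2))

def semipartial_corr(y, x1, x2):
    """Correlation of Y and X1 with X2 removed from X1 only."""
    return pearson_r(y, residualize(x1, x2))

# Made-up data, for illustration only
y = [3.0, 5.0, 4.0, 7.0, 8.0, 9.0]
x1 = [1.0, 2.0, 2.0, 4.0, 5.0, 5.0]
x2 = [2.0, 1.0, 3.0, 3.0, 2.0, 4.0]
print(partial_corr(y, x1, x2), semipartial_corr(y, x1, x2))
```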
Common Distributions
Uniform Distribution
Normal Distribution
Normal Distribution contd
Binomial Distribution
Poisson Distribution
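A minimal sketch of the binomial and Poisson probability mass functions using the standard library (`math.comb` needs Python 3.8+). It also illustrates the familiar fact that Binomial(n, p) with large n and small p is close to Poisson(np); the values n = 1000 and p = 0.003 are arbitrary.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): e^(-lam) lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Binomial(n, p) with large n and small p is close to Poisson(n * p)
n, p = 1000, 0.003
for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```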
A few more distributions
Stochastic Processes
What is a Stochastic Process?
Stochastic process: a family of random variables $\{X(t) \mid t \in T\}$ defined on a given probability space. T is an index set; it may be discrete or continuous. The values assumed by X(t) are called states. State space (I): the set of all possible states. A stochastic process is also known as a random process or a chance process.
Examples
Continuous-time, discrete state space: number of job requests processed by a server in a system during the interval (0, t); $X(t) \in \{0, 1, \dots\}$.
Discrete-time, discrete state space: number of requests served during the n-th hour of the day, $X_n$, $n \in \{1, 2, \dots, 24\}$, $X_n \in \{0, 1, \dots\}$.
Continuous-time, continuous state space: response time of a request to a server, given that it arrives at time t; $\{X(t), t > 0\}$.
Discrete-time, continuous state space: waiting time of the n-th customer in the system before receiving service, $X_n$, $n \in \{1, 2, \dots\}$; stock market price at the end of each day.
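Stationary Process
A stochastic process is wide-sense (covariance) stationary if $\mu(t) = E[X(t)]$ is independent of t, $E[X^2(t)] < \infty$, and $\mathrm{Cov}[X(t), X(s)]$ depends only on the time difference $|t - s|$, for all t, s.
If, for any $t_1 < t_2 < \dots < t_n$ and any $h > 0$, the joint distribution of $(X(t_1 + h), \dots, X(t_n + h))$ is the same as that of $(X(t_1), \dots, X(t_n))$, then $\{X(t), t \in T\}$ is said to be stationary of order n. It is strictly stationary if it is stationary of order n for every integer n, i.e., its distributions are invariant under shifts of the time origin.
Strict-sense stationarity does not imply wide-sense stationarity (the second moments may be infinite), and wide-sense stationarity does not imply strict-sense stationarity either.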
Markov Process
A Markov process is a stochastic process $\{X(t) : t \in T\}$ such that, for any $t_0 < t_1 < \dots < t_n < t$,
$P[X(t) \le x \mid X(t_n) = x_n, X(t_{n-1}) = x_{n-1}, \dots, X(t_0) = x_0] = P[X(t) \le x \mid X(t_n) = x_n]$.
Types: discrete-time Markov chain, continuous-time Markov chain.
Bernoulli Process
For each trial $Y_i$, let $P(Y_i = 1) = p$ if the outcome is a success and $P(Y_i = 0) = q = 1 - p$ if the outcome is a failure. If the trials are independent, $\{Y_i \mid i = 1, 2, \dots\}$ is a Bernoulli process, with $E[Y_i] = p$ and $\mathrm{Var}[Y_i] = p(1 - p)$.
$\{Y_i \mid i = 1, 2, \dots\}$ is a discrete-time, discrete-state stochastic process; it is strict-sense as well as wide-sense stationary.
Example: packet transmission in a communication channel.
Let $S_n = Y_1 + Y_2 + \dots + Y_n$, where the $Y_i$ are a sequence of Bernoulli trials. Since $S_n = S_{n-1} + Y_n$, $\{S_n\}$ is a discrete-time, discrete-state Markov process with
$P[S_n = k \mid S_{n-1} = k] = P[Y_n = 0] = 1 - p$,
$P[S_n = k \mid S_{n-1} = k - 1] = P[Y_n = 1] = p$.
Here $S_n$ is a binomial random variable, and $\{S_n, n = 1, 2, \dots\}$ is the binomial process.
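A short simulation sketch of a Bernoulli process and its partial-sum (binomial) process, using Python's `random` module with a fixed seed so runs are repeatable. The value p = 0.3 and the function names are illustrative assumptions.

```python
import random

def bernoulli_process(n, p, rng):
    """n independent trials Y_i with P(Y_i = 1) = p and P(Y_i = 0) = 1 - p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def binomial_process(ys):
    """Partial sums S_n = S_{n-1} + Y_n, starting from S_0 = 0."""
    s, path = 0, []
    for y in ys:
        s += y
        path.append(s)
    return path

rng = random.Random(42)  # fixed seed so runs are repeatable
p = 0.3
ys = bernoulli_process(100_000, p, rng)
sn = binomial_process(ys)
print(sum(ys) / len(ys))  # sample mean, close to E[Y_i] = p
```

Each step of $S_n$ either stays put (probability 1 - p) or moves up by one (probability p), matching the transition probabilities above.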
Binomial Process
Random Walk
If each trial has more than two possible outcomes, let $Y_i$, $i = 1, 2, \dots, n$, be a sequence of independent discrete random variables and $S_n = \sum_{i=1}^{n} Y_i$. The sum process $\{S_n, n = 1, 2, \dots\}$ is a Markov chain known as a random walk: $S_n = S_{n-1} + Y_n$.
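A sketch of a random walk whose steps have three possible outcomes. The step values -1, 0, +1 and their probabilities are hypothetical, chosen only to illustrate $S_n = S_{n-1} + Y_n$; `Random.choices` draws one step per iteration.

```python
import random

def random_walk(n, step_values, step_probs, rng):
    """Path S_0 = 0, S_k = S_{k-1} + Y_k with i.i.d. discrete steps Y_k."""
    s, path = 0, [0]
    for _ in range(n):
        s += rng.choices(step_values, weights=step_probs)[0]
        path.append(s)
    return path

rng = random.Random(7)  # fixed seed so runs are repeatable
# Hypothetical step distribution with three outcomes: -1, 0, +1
walk = random_walk(20, [-1, 0, 1], [0.25, 0.5, 0.25], rng)
print(walk)
```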
Discrete Time Markov Chain (DTMC)
If $\{X(t), t \in T\}$ satisfies the Markov property, then $\{X(t), t \in T\}$ is a Markov process.
If the state space I is discrete, then $\{X(t), t \in T\}$ is a Markov chain.
If, in addition, the parameter space T is discrete, then $\{X(t), t \in T\}$ is a DTMC: the $t_i$ are discrete time points, $T = \{0, 1, 2, \dots\}$.
Equivalently, $\{X_n, n = 0, 1, 2, \dots\}$ is a DTMC if
$P[X_n = i_n \mid X_0 = i_0, X_1 = i_1, \dots, X_{n-1} = i_{n-1}] = P[X_n = i_n \mid X_{n-1} = i_{n-1}]$.
We observe the system at a discrete set of time points; $X_i$ is the state of the system at the i-th time step.
Discrete Time Markov Chain (DTMC)
$X_i$: state of the system at the i-th time step.
$X_0$: initial state of the system.
Distribution of $X_n$: $P_j(n) = P(X_n = j)$, $j = 0, 1, 2, \dots$
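Conditional Distribution
$P_{j,k}(m, n) = P[X_n = k \mid X_m = j]$, $0 \le m \le n$.
When the Markov chain is time-homogeneous, $P_{j,k}(n) = P[X_{n+m} = k \mid X_m = j]$ is the n-step transition probability.
$P_{jk} = P_{jk}(1) = P[X_n = k \mid X_{n-1} = j]$, $n \ge 1$, is the one-step transition probability.
0-step transition probability: $P_{jk}(0) = 1$ if $j = k$, and 0 otherwise.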
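For a time-homogeneous DTMC with one-step matrix $P = [P_{jk}]$, the n-step probabilities are the entries of $P^n$ (the Chapman-Kolmogorov equations), and $P_j(n) = \sum_i P_i(0)\,P_{ij}(n)$. A minimal pure-Python sketch with a hypothetical two-state chain; the function names are illustrative.

```python
def mat_mult(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def n_step_matrix(P, n):
    """P(n) = P^n for a time-homogeneous DTMC; P(0) is the identity matrix."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, P)
    return result

def distribution_at(p0, P, n):
    """P_j(n) = sum_i P_i(0) * P_ij(n): distribution of X_n from the initial one."""
    Pn = n_step_matrix(P, n)
    return [sum(p0[i] * Pn[i][j] for i in range(len(p0))) for j in range(len(p0))]

# Hypothetical two-state chain (rows: current state, columns: next state)
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(n_step_matrix(P, 2))
print(distribution_at([1.0, 0.0], P, 2))
```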
Queuing Theory
Queueing Systems
Model processes in which customers arrive, wait for their turn to receive service, are serviced, and then leave.
Examples:
- Supermarket checkouts
- Railway reservation counters
- Computer service center
- Call allocation in telecommunication systems
Pictorial Representation: customer arrivals → queue (waiting line) → server → customer departures
Queuing Characteristics
- Arrival pattern of customers
- Service pattern of customers
- Number of service channels
- System capacity
- Queue discipline
Kendall notation (A/B/X/Y/Z) is used to represent a queueing system.
Kendall Notation: A/B/X/Y/Z
- A: distribution of interarrival times
- B: distribution of service times
- X: number of servers
- Y: maximum number of customers in the system
- Z: queue discipline
M/M/1 Queue
- Arrival process: Poisson with rate $\lambda$
- Service times: exponential with parameter $\mu$
- Service times and interarrival times are independent
- Single server
- Infinite capacity in the system
N(t): number of customers in the system at time t (the state)
State transition diagram
M/M/1 Queue: Markov Chain Formulation
Transitions occur due to the arrival or departure of customers; only nearest-neighbor transitions are allowed.
State of the process at time t: $N(t) = i$ ($i \ge 0$). $\{N(t): t \ge 0\}$ is a continuous-time Markov chain with birth rates $\lambda_i = \lambda$ ($i \ge 0$) and death rates $\mu_i = \mu$ ($i \ge 1$).
Stationary Distribution: birth-death process
Normalization constant
Stationary distribution
Performance Measures (1)
Performance Measures (2)
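For reference, the standard M/M/1 results (valid when $\rho = \lambda/\mu < 1$) are $p_n = (1 - \rho)\rho^n$, $L = \rho/(1 - \rho)$, $L_q = \rho^2/(1 - \rho)$, $W = 1/(\mu - \lambda)$ and $W_q = \rho/(\mu - \lambda)$, tied together by Little's law $L = \lambda W$. A small sketch that evaluates them; the function names and sample rates are illustrative.

```python
def mm1_measures(lam, mu):
    """Steady-state measures of an M/M/1 queue; requires lam < mu."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu")
    rho = lam / mu               # server utilization
    L = rho / (1 - rho)          # mean number in system
    Lq = rho ** 2 / (1 - rho)    # mean number waiting in queue
    W = 1 / (mu - lam)           # mean time in system
    Wq = rho / (mu - lam)        # mean waiting time in queue
    return {"rho": rho, "p0": 1 - rho, "L": L, "Lq": Lq, "W": W, "Wq": Wq}

def mm1_pn(n, lam, mu):
    """Stationary probability of n customers in the system: (1 - rho) * rho**n."""
    rho = lam / mu
    return (1 - rho) * rho ** n

m = mm1_measures(2.0, 3.0)
print(m)
```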
M/M/c Queue
- Poisson arrivals with rate $\lambda$
- Exponential service times with parameter $\mu$
- c servers
An arriving customer who finds n customers in the system:
- if $n < c$, is routed to any idle server;
- if $n \ge c$, joins the waiting queue, since all servers are busy.
Birth-death process with state-dependent death rates
M/M/c Queue: steady-state solutions
Normalizing
Performance Measures (1)
Performance Measures (2)
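A sketch of the standard M/M/c (Erlang C) steady-state formulas with offered load $a = \lambda/\mu$ and per-server utilization $\rho = a/c < 1$: $p_0 = \left[\sum_{n=0}^{c-1} \frac{a^n}{n!} + \frac{a^c}{c!\,(1 - \rho)}\right]^{-1}$, probability of waiting $P_{wait} = \frac{a^c}{c!\,(1 - \rho)}\,p_0$, $L_q = P_{wait}\,\rho/(1 - \rho)$, then Little's law for $W_q$, $W$ and $L$. The function name and sample values are illustrative; with c = 1 it reduces to the M/M/1 results.

```python
import math

def mmc_measures(lam, mu, c):
    """Steady-state measures of an M/M/c queue (Erlang C); requires lam < c * mu."""
    a = lam / mu    # offered load
    rho = a / c     # per-server utilization
    if rho >= 1:
        raise ValueError("unstable queue: need lam < c * mu")
    tail = a ** c / (math.factorial(c) * (1 - rho))
    p0 = 1 / (sum(a ** n / math.factorial(n) for n in range(c)) + tail)
    p_wait = tail * p0              # probability an arrival must wait (Erlang C)
    Lq = p_wait * rho / (1 - rho)   # mean number waiting
    Wq = Lq / lam                   # Little's law applied to the queue
    W = Wq + 1 / mu                 # add the mean service time
    L = lam * W                     # Little's law for the whole system
    return {"p0": p0, "p_wait": p_wait, "Lq": Lq, "Wq": Wq, "W": W, "L": L}

print(mmc_measures(lam=4.0, mu=3.0, c=2))
```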
M/M/c/K Queue
- Poisson arrivals with rate $\lambda$
- Exponential service times with parameter $\mu$
- c servers with system capacity K
An arriving customer who finds n customers already in the system:
- if $n < c$, is routed to an idle server;
- if $c \le n < K$, joins the waiting queue, since all servers are busy;
- if $n = K$, is forced to leave, since the system is full.
Birth-death process with state-dependent death rates
Steady state solutions
Performance Measures (1)
Performance Measures (2)
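A sketch for M/M/c/K that builds the birth-death stationary distribution numerically: $p_n \propto a^n/n!$ for $n \le c$ and $p_n \propto a^n/(c!\,c^{n-c})$ for $c < n \le K$, with $a = \lambda/\mu$, normalized to sum to 1. Because arrivals are lost when the system is full, Little's law is applied with the admitted rate $\lambda_{eff} = \lambda(1 - p_K)$. The function names and sample rates are illustrative; no stability condition is needed since the state space is finite.

```python
import math

def mmck_distribution(lam, mu, c, K):
    """Stationary probabilities p_0..p_K of an M/M/c/K queue (1 <= c <= K)."""
    if not 1 <= c <= K:
        raise ValueError("need 1 <= c <= K")
    a = lam / mu
    weights = []
    for n in range(K + 1):
        if n <= c:
            weights.append(a ** n / math.factorial(n))                   # n busy servers
        else:
            weights.append(a ** n / (math.factorial(c) * c ** (n - c)))  # c busy, n - c waiting
    total = sum(weights)  # normalization constant
    return [w / total for w in weights]

def mmck_measures(lam, mu, c, K):
    p = mmck_distribution(lam, mu, c, K)
    blocking = p[K]                 # arrivals finding K customers are turned away
    lam_eff = lam * (1 - blocking)  # rate of customers actually admitted
    L = sum(n * pn for n, pn in enumerate(p))
    Lq = sum((n - c) * p[n] for n in range(c + 1, K + 1))
    W = L / lam_eff                 # Little's law with the admitted rate
    Wq = Lq / lam_eff
    return {"p": p, "blocking": blocking, "lam_eff": lam_eff,
            "L": L, "Lq": Lq, "W": W, "Wq": Wq}

print(mmck_measures(lam=5.0, mu=2.0, c=2, K=5)["blocking"])
```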
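M/M/c/c Queue
c servers, no waiting room. An arriving customer who finds all servers busy is blocked (lost).
Stationary distribution, with $a = \lambda/\mu$:
$p_n = \dfrac{a^n/n!}{\sum_{k=0}^{c} a^k/k!}$, $n = 0, 1, \dots, c$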
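The probability $p_c$ that all c servers are busy (the blocking probability) is the Erlang B formula. A sketch comparing the direct formula with the usual recursion $B(0) = 1$, $B(k) = aB(k-1)/(k + aB(k-1))$, which avoids large powers and factorials; the function names are illustrative.

```python
import math

def erlang_b_direct(c, a):
    """Blocking probability p_c from the stationary distribution (may overflow for large c)."""
    return (a ** c / math.factorial(c)) / sum(a ** k / math.factorial(k) for k in range(c + 1))

def erlang_b(c, a):
    """Blocking probability p_c via the recursion B(k) = a B(k-1) / (k + a B(k-1))."""
    b = 1.0  # B(0): with no servers every arrival is blocked
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b

print(erlang_b(2, 1.0), erlang_b_direct(2, 1.0))
```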