Statistical analysis training course

STATISTICAL ANALYSIS TRAINING COURSE This course was lectured in Municipality of Dubai By Data Analyst: Marwa Abo-Amra On November 2014


Page 1: Statistical analysis training course

STATISTICAL ANALYSIS TRAINING COURSE

This course was delivered at the Municipality of Dubai

By Data Analyst: Marwa Abo-Amra

November 2014

Page 2: Statistical analysis training course

WHY WE USE STATISTICS & STATISTICAL ANALYSIS

When raw data don’t make any sense on their own, we have to resort to statistics.

Page 3: Statistical analysis training course

IN TODAY’S WORLD…

Customer Surveys

Medical News

Demographics

Political Polls

Economic Predictions

Marketing Information

Sales Forecasts

Stock Market Projections

Consumer Price Index

Sports Statistics

Page 4: Statistical analysis training course

COURSE CONTENTS

• Introduction to Statistics

• Statistical Concepts

• Data Collection Methods

• Statistical Analysis

• Introduction to Statistical Software: SPSS and Minitab

• Data Preparation and Analysis using MS Excel, SPSS, & Minitab

• Statistical Report writing

• Practical Examples and Applications

Page 5: Statistical analysis training course

COURSE LEARNING OBJECTIVES

People working in any field should understand why data matter, how to handle data, and which scientific methods extract the indicators needed for decision-making. These skills help decision makers identify the available alternatives and evaluate them.

Page 6: Statistical analysis training course

INTRODUCTION TO STATISTICS

Page 7: Statistical analysis training course

DEFINITION OF STATISTICS

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.

Page 8: Statistical analysis training course

STAGES OF STATISTICS

“Statistics is a way to get information from data.”

Collect → Organize → Present → Analyze → Interpret

Data Preparation → Descriptive Analysis → Inferential Analysis

Make Valid Conclusions & Take Correct Decisions

Page 9: Statistical analysis training course

IMPORTANCE OF STATISTICS

Simplifies complex data

Provides a technique of comparison

Helps in formulating policies

Helps to test & formulate theories

Presents the facts in a definite form

Studies relationships

Helps in forecasting

Page 10: Statistical analysis training course

STATISTICAL CONCEPTS

Page 11: Statistical analysis training course

SOME BASIC CONCEPTS

Population

Sample & Sampling

Parameter & Statistic

Elementary Units & Variables

Page 12: Statistical analysis training course

VARIABLES

Independent (Cause)

Dependent (Effect)

Page 13: Statistical analysis training course

Variables

Qualitative (Descriptive Information)
• Sex, Nationality, Type of crime, Occupation, Religion, Marital status, Literacy, etc.

Quantitative: Discrete (Counted)
• Number of children in a family
• Number of accidents per week
• Students' marks, etc.

Quantitative: Continuous (Measured)
• Age, Distance, Height, Weight, etc.

Page 14: Statistical analysis training course

VARIABLES & UNITS

Selected characteristics of all the full-time employees of Mountain Aviation, Inc.: July 1, 1994

Employees  Race       Sex     Job Title       Years of Service  Annual Salary
Abel       Caucasian  Male    Pilot           2                 $34,000
Cruz       Caucasian  Male    Chief mechanic  10                $60,000
Dunn       Western    Male    Chief pilot     23                $70,000
Hill       Western    Female  Secretary       5                 $14,000
King       Caucasian  Male    Janitor         8                 $17,000
Otis       Caucasian  Male    Grounds keeper  10                $20,000
West       Western    Male    Mechanic        2                 $36,000
Wolf       Caucasian  Female  Pilot           7                 $36,000
Zorn       Caucasian  Female  Mechanic        7                 $40,000

Labels on the figure: Elementary unit; Frame; Qualitative variable; Datum; Population of employee sexes; Population of employee salaries; Sample of employee salaries

Page 15: Statistical analysis training course

QUANTITATIVE VARIABLES

Measurement Level (from weakest to strongest)

Nominal (Weakest)
• Attributes are only named
• No order

Ordinal
• Attributes can be rank-ordered
• Distance is not meaningful

Interval
• Distance is meaningful

Ratio (Strongest)
• Absolute zero

Discrete, Continuous

Page 16: Statistical analysis training course

POPULATION VS. SAMPLING

Population
• Parameter: Something that describes the population as a whole

Sample
• Statistic: Something that describes the sample

Labels on the figure: Target Population; Elementary Unit; Sampling; Inference

Page 17: Statistical analysis training course

WHY SAMPLING?

Reduced cost

Time factor

Very large populations

Partly accessible populations

Destructive units

Accuracy

Page 18: Statistical analysis training course

SIMPLE RANDOM SAMPLING

For the sampling plan to be statistically valid, the sample must be randomly selected from the population.
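Random selection can be sketched in a few lines of Python. This is a minimal illustration, not part of the original course material: the function name `simple_random_sample`, the frame of 100 numbered units, and the seed are all assumptions made for the example.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Return n distinct units drawn from the frame without replacement."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(list(frame), n)  # every subset of size n is equally likely


# Hypothetical sampling frame: 100 elementary units identified by number.
frame = range(1, 101)
sample = simple_random_sample(frame, n=10, seed=2014)
print(sorted(sample))
```

Drawing without replacement means no unit can be selected twice, which is the usual meaning of a simple random sample from a finite frame.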

Page 19: Statistical analysis training course

DATA COLLECTION & PREPARATION

Page 20: Statistical analysis training course

WHAT IS DATA?

Data is a collection of facts, such as values or measurements. Examples of data include prices, exam scores, exports and imports, labor figures, and so on.

Page 21: Statistical analysis training course

DATA VS. INFORMATION VS. STATISTICS

Data                Information                              Statistics
20 kg, 25 kg        5 individuals in the 20-to-25-kg range   Mean weight = 22.5 kg
28 kg, 30 kg, etc.  15 individuals in the 26-to-30-kg range  Median weight = 28 kg

Page 22: Statistical analysis training course

SOURCES OF DATA

Internal
• Organization internal records

External
• Published and unpublished external sources

Page 23: Statistical analysis training course

DATA COLLECTION METHODS

Census

Sample Survey

Experiment (Control)

Observational Study (No Control)

Page 24: Statistical analysis training course

OTHER DATA COLLECTION METHODS

• Interviews (Face-to-face, Telephone)

• Focus Groups

• Ethnographies, Oral History, & Case Studies

• Schedule through enumerators

• Documents & Records

Page 25: Statistical analysis training course

Comparison: Questionnaire vs. Schedule

• Questionnaire: It cannot be confirmed that the expected respondent filled in the answers. Schedule: Respondent identity is known.

• Questionnaire: Very slow. Schedule: Information is collected in time.

• Questionnaire: No personal contact. Schedule: Direct personal contact.

• Questionnaire: Used only when the respondent is educated & cooperative. Schedule: Information can be collected from illiterate people.

• Questionnaire: Wider distribution of samples is possible. Schedule: Difficult for wider distribution.

• Questionnaire: More incomplete and false information. Schedule: Relatively more complete and correct information.

• Questionnaire: Information validity depends on the quality of the questionnaire. Schedule: Information validity depends on the honesty & competence of the enumerator.

• Questionnaire: Physical appearance should be attractive. Schedule: Physical appearance is not important.

• Questionnaire: Observation cannot be used. Schedule: Observation can be used by the enumerator.

Page 26: Statistical analysis training course

FRAMING A QUESTIONNAIRE OR A SCHEDULE

Cover Letter

Number of Questions

Nature of Questions

Questions should be simple

Arrangement of Questions

Information collected is usable

Avoid mathematical questions

Page 27: Statistical analysis training course

TYPES OF QUESTIONS

Closed-ended questions
• Yes/No
• Multiple Choice
• Scaled (e.g., Likert)

Open-ended questions
• No predefined options or categories

Matrix questions
• Closed-ended but arranged one under the other

Contingency questions
• Answered only when the respondent provides a particular response

Page 28: Statistical analysis training course

LIKERT SCALE

9-point Scale

1 2 3 4 5 6 7 8 9 (from Disagree to Agree)

5-point Scale

1 = Not at all Satisfied, 2 = Not Satisfied, 3 = Neutral, 4 = Satisfied, 5 = Very Satisfied

7-point Scale

1 = Entirely Disagree, 2 = Mostly Disagree, 3 = Somewhat Disagree, 4 = Neither Agree nor Disagree, 5 = Somewhat Agree, 6 = Mostly Agree, 7 = Entirely Agree
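Before analysis, Likert answers are usually entered as numeric codes. The sketch below shows one possible coding of the 5-point satisfaction scale; the dictionary name, the function name, and the five sample responses are hypothetical and only serve the illustration.

```python
from collections import Counter
from statistics import median

# Codes for the 5-point satisfaction scale shown on the slide.
SATISFACTION_SCALE = {
    "Not at all Satisfied": 1,
    "Not Satisfied": 2,
    "Neutral": 3,
    "Satisfied": 4,
    "Very Satisfied": 5,
}


def code_responses(responses):
    """Translate answer labels into their numeric codes."""
    return [SATISFACTION_SCALE[answer] for answer in responses]


# Hypothetical answers from five respondents.
responses = ["Satisfied", "Neutral", "Very Satisfied", "Satisfied", "Not Satisfied"]
codes = code_responses(responses)
most_common_answer = Counter(responses).most_common(1)[0][0]

print(codes)                # [4, 3, 5, 4, 2]
print(median(codes))        # 4
print(most_common_answer)   # Satisfied
```

The example reports the median and the mode because Likert codes are ordinal: the order of the codes is meaningful, but the distance between them is not.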

Page 29: Statistical analysis training course

SEMANTIC DIFFERENTIAL SCALE

Page 30: Statistical analysis training course

RANK-ORDER SCALE

Page 31: Statistical analysis training course

STATISTICAL ANALYSIS

Page 32: Statistical analysis training course

STATISTICAL ANALYSIS METHODS & TECHNIQUES

Descriptive Statistics

Tables & Graphs

Summary Measures

Inferential Statistics

Confidence Intervals

Hypothesis Testing

Page 33: Statistical analysis training course

DATA ANALYSIS

Univariate
- Performed to study a single variable
- Descriptive statistics
• Tables & Graphs
• Summary Measures
• One sample tests
• Normality Tests
• Normal Probability Plot

Bivariate
- Performed to study the relationship between two variables
- Statistical Analysis
• Comparisons
• Contingency Tables
• Scatter plot
• Correlation
• Regression

Page 34: Statistical analysis training course

DESCRIPTIVE STATISTICS

Tables & Graphs
• Frequency Table
• Frequency Histogram
• Bar & Column Chart
• Time-series Line Graph
• Pie Chart
• Stem-&-leaf Diagram
• Boxplot

Summary Measures

Measures of Location

Measures of Variability

Measures of Shape

Proportion

Page 35: Statistical analysis training course

SUMMARY MEASURES

Measures of Central Tendency (Location)
• Mean (Average)
• Median (Middle Value)
• Mode (Most Frequent Value)

Measures of Dispersion (Spread)
• Range (Highest Value - Lowest Value)
• Variance
• Standard Deviation
• Coefficient of Variation

Measures of Shape
• Skewness
• Kurtosis

Proportion
• Frequency of observations in a particular category as a fraction of all observations
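The measures above can be computed with Python's standard `statistics` module. The following is a sketch rather than the course's own worked example: it reuses the salaries and sexes from the Mountain Aviation table shown earlier and treats them as a complete population (hence `pvariance`). The function names and the moment-based formulas for skewness and kurtosis are choices made for this illustration.

```python
import statistics as st

# Annual salaries and sexes from the Mountain Aviation table earlier in the course.
salaries = [34000, 60000, 70000, 14000, 17000, 20000, 36000, 36000, 40000]
sexes = ["Male", "Male", "Male", "Female", "Male", "Male", "Male", "Female", "Female"]


def summary_measures(values):
    """Location, dispersion, and shape measures for a complete population."""
    n = len(values)
    mean = st.mean(values)
    variance = st.pvariance(values)                  # population variance (divides by n)
    std_dev = variance ** 0.5
    m3 = sum((x - mean) ** 3 for x in values) / n    # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n    # fourth central moment
    return {
        "mean": mean,
        "median": st.median(values),
        "mode": st.mode(values),
        "range": max(values) - min(values),
        "variance": variance,
        "std_dev": std_dev,
        "coeff_of_variation": std_dev / mean,
        "skewness": m3 / std_dev ** 3,               # 0 for a symmetric distribution
        "kurtosis": m4 / std_dev ** 4,               # 3 for a normal distribution
    }


def proportion(values, category):
    """Share of observations that fall in one category."""
    return values.count(category) / len(values)


measures = summary_measures(salaries)
print(measures["median"], measures["mode"], measures["range"])  # 36000 36000 56000
print(round(proportion(sexes, "Female"), 3))                    # 0.333
```

For a sample rather than a full population, `st.variance` and `st.stdev` (which divide by n - 1) would be the usual replacements.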

Page 36: Statistical analysis training course

TABLES & GRAPHS

Frequency Table, Frequency Histogram, Time-series Line Graph, Pie Chart, Stem-&-Leaf Diagram, Boxplot

Page 37: Statistical analysis training course

FREQUENCY TABLE

Page 38: Statistical analysis training course

HISTOGRAM VS. BAR CHART

Page 39: Statistical analysis training course

TIME-SERIES LINE GRAPH

Page 40: Statistical analysis training course

PIE CHART

Page 41: Statistical analysis training course

STEM-AND-LEAF DIAGRAM

Page 42: Statistical analysis training course

BOXPLOT

Page 43: Statistical analysis training course

DATA REPRESENTATION SUMMARY

Qualitative (Categorical) variable, Nominal data
• Graphically represented by: Pie Chart, Column or Bar Chart
• Central Tendency: Mode
• Dispersion: Range

Qualitative (Categorical) variable, Ordinal data
• Graphically represented by: Column or Bar Chart
• Central Tendency: Mode, Median
• Dispersion: Range

Quantitative (Scale) variable, Interval & Ratio data
• Graphically represented by: Frequency Histogram, Boxplot, Stem-&-leaf Diagram, Time-series Line Graph
• Central Tendency: Mode, Median, Mean
• Dispersion: Range, Variance, Standard Deviation, Coefficient of Variation

Effective representation of data depends on choosing proper graphical tool and summary measures.

Page 44: Statistical analysis training course

NORMAL DISTRIBUTION

Page 45: Statistical analysis training course

CENTRAL LIMIT THEOREM

The central limit theorem in its shortest form states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution.
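A small simulation can make the theorem visible. The sketch below is an illustration under stated assumptions: the population is taken to be exponential (strongly right-skewed), and the function names, the number of replicates, and the seed are arbitrary choices. As the sample size grows, the skewness of the sample means should move toward 0, the value for a normal distribution.

```python
import random
import statistics as st


def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric distribution."""
    mean = st.fmean(values)
    sd = st.pstdev(values)
    return sum((x - mean) ** 3 for x in values) / (len(values) * sd ** 3)


def sample_means(sample_size, replicates=2000, seed=0):
    """Means of repeated samples from a right-skewed exponential population."""
    rng = random.Random(seed)
    return [
        st.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(replicates)
    ]


# Larger samples give sample means that are centered near 1 and less skewed.
for n in (2, 10, 50):
    means = sample_means(n)
    print(n, round(st.fmean(means), 2), round(skewness(means), 2))
```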

Page 46: Statistical analysis training course

TEST OF NORMALITY

Since a number of the most common statistical tests rely on the normality of a sample or population, it is often useful to test whether the underlying distribution is normal, or at least symmetric. This can be done via the following approaches:

• Review the distribution graphically (histograms & normal probability plots)

• Analyze the skewness & kurtosis

• Employ statistical tests (Chi-square, Kolmogorov-Smirnov, & Shapiro-Wilk)
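As a rough illustration of the third approach, the Kolmogorov-Smirnov statistic can be computed by hand with the standard library. This is a sketch, not what SPSS or Minitab report: the function names are hypothetical, the normal curve is fitted to the same data, and the cutoff 1.36 / sqrt(n) is the large-sample 5% value for a fully specified distribution, so the screen is conservative when the mean and standard deviation are estimated from the sample.

```python
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(fmean(data), stdev(data))
    gaps = []
    for i, x in enumerate(data, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        gaps.append(max(i / n - cdf, cdf - (i - 1) / n))
    return max(gaps)


def looks_normal(values, coefficient=1.36):
    """Rough screen: compare the statistic with coefficient / sqrt(n)."""
    return ks_statistic_normal(values) < coefficient / len(values) ** 0.5


# Hypothetical data: one normal sample and one right-skewed sample.
rng = random.Random(1)
normal_data = [rng.gauss(50, 10) for _ in range(500)]
skewed_data = [rng.expovariate(1.0) for _ in range(500)]
print(looks_normal(normal_data), looks_normal(skewed_data))
```

A statistical package would also report a p-value; this sketch only compares the statistic with a fixed cutoff.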

Page 47: Statistical analysis training course

INFERENTIAL STATISTICS

Confidence Interval

Hypothesis Testing

Page 48: Statistical analysis training course

DATA ANALYSIS

Univariate
- Performed to study a single variable
- Descriptive statistics
• Tables & Graphs
• Summary Measures
• One sample tests
• Normality Tests
• Normal Probability Plot

Bivariate
- Performed to study the relationship between two variables
- Statistical Analysis
• Comparisons
• Contingency Tables
• Scatter plot
• Correlation
• Regression

Page 49: Statistical analysis training course

STATISTICAL TESTS

Independent Samples, Parametric Tests
• 1 Sample: t-test
• 2 Samples: t-test
• K Samples: ANOVA

Independent Samples, Nonparametric Tests
• 1 Sample: Binomial, Chi-square, Kolmogorov-Smirnov
• 2 Samples: Mann-Whitney
• K Samples: Kruskal-Wallis

Related Samples, Parametric Tests
• 2 Samples: t-test
• K Samples: Repeated Measures ANOVA

Related Samples, Nonparametric Tests
• 2 Samples: Wilcoxon
• K Samples: Friedman
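The classification on this slide works like a lookup table: the sample design, the number of samples, and whether parametric assumptions hold together point to a test. The sketch below encodes the slide as a Python dictionary; the names `TESTS` and `choose_tests` are hypothetical, and the code only names the tests without running them.

```python
# Keys: (sample design, number of samples, parametric?) -> tests named on the slide.
TESTS = {
    ("independent", 1, True): ["t-test"],
    ("independent", 2, True): ["t-test"],
    ("independent", "k", True): ["ANOVA"],
    ("independent", 1, False): ["Binomial", "Chi-square", "Kolmogorov-Smirnov"],
    ("independent", 2, False): ["Mann-Whitney"],
    ("independent", "k", False): ["Kruskal-Wallis"],
    ("related", 2, True): ["t-test"],
    ("related", "k", True): ["Repeated Measures ANOVA"],
    ("related", 2, False): ["Wilcoxon"],
    ("related", "k", False): ["Friedman"],
}


def choose_tests(design, samples, parametric):
    """Return the tests listed for a design, sample count, and test family."""
    if samples < 1:
        raise ValueError("the number of samples must be at least 1")
    size = samples if samples in (1, 2) else "k"   # three or more samples count as K
    try:
        return TESTS[(design, size, parametric)]
    except KeyError:
        raise ValueError("combination not covered by the course table") from None


print(choose_tests("independent", 2, parametric=False))  # ['Mann-Whitney']
print(choose_tests("related", 4, parametric=True))       # ['Repeated Measures ANOVA']
```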

Page 50: Statistical analysis training course

CONTINGENCY TABLES

A two-way table is a useful tool for examining relationships between categorical variables. The entries in the cells of a two-way table can be frequency counts or relative frequencies.
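A two-way table can be built by counting pairs of categories. The sketch below cross-tabulates Race and Sex for the nine Mountain Aviation employees listed earlier in the course; the function name `two_way_table` is an assumption made for this example.

```python
from collections import Counter

# (Race, Sex) for each employee in the Mountain Aviation table.
employees = [
    ("Caucasian", "Male"), ("Caucasian", "Male"), ("Western", "Male"),
    ("Western", "Female"), ("Caucasian", "Male"), ("Caucasian", "Male"),
    ("Western", "Male"), ("Caucasian", "Female"), ("Caucasian", "Female"),
]


def two_way_table(pairs):
    """Return cell frequency counts and cell relative frequencies."""
    counts = Counter(pairs)
    total = sum(counts.values())
    relative = {cell: count / total for cell, count in counts.items()}
    return counts, relative


counts, relative = two_way_table(employees)
for (race, sex), count in sorted(counts.items()):
    print(race, sex, count, round(relative[(race, sex)], 3))
```

Each cell holds either a count (for example, 4 Caucasian males) or the same count divided by the total of 9.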

Page 51: Statistical analysis training course

SCATTER PLOT

A Scatter (XY) Plot has points that show the relationship between two sets of data.

In this example, each dot shows one person's weight versus their height.

Positive Association

Negative Association

No Association

Page 52: Statistical analysis training course

CORRELATION

When two sets of data are strongly linked together we say they have a High Correlation.

• Correlation is Positive when the values increase together, and

• Correlation is Negative when one value decreases as the other increases

Correlation can have a value:

• 1 is a perfect positive correlation

• 0 is no correlation (the values don't seem linked at all)

• -1 is a perfect negative correlation

The value shows how good the correlation is (not how steep the line is), and if it is positive or negative.
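The point about steepness can be checked numerically. The sketch below computes the Pearson correlation coefficient by hand; the function name `pearson_r` and the height values are hypothetical. Two lines with very different slopes both give a correlation of 1, and a falling line gives -1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical heights (cm) and three made-up variables that depend on them exactly.
heights = [150, 160, 170, 180, 190]
gentle = [h * 0.5 for h in heights]    # shallow rising line
steep = [h * 5 for h in heights]       # steep rising line
falling = [-h for h in heights]        # falling line

print(pearson_r(heights, gentle), pearson_r(heights, steep), pearson_r(heights, falling))
```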

Page 53: Statistical analysis training course

LINEAR REGRESSION

In a cause and effect relationship, the independent variable is the cause, and the dependent variable is the effect.
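A least-squares line through the data can be fitted with a short function. In the sketch below, treating years of service as the independent variable and annual salary as the dependent variable is an assumption made only for illustration, using the Mountain Aviation figures from earlier in the course; the function name `fit_line` is hypothetical. A fitted line describes the association in the data and does not by itself prove cause and effect.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Independent variable (assumed): years of service. Dependent variable: annual salary.
years_of_service = [2, 10, 23, 5, 8, 10, 2, 7, 7]
annual_salary = [34000, 60000, 70000, 14000, 17000, 20000, 36000, 36000, 40000]

slope, intercept = fit_line(years_of_service, annual_salary)
predicted = intercept + slope * 12   # predicted salary after 12 years of service
print(round(slope), round(intercept), round(predicted))
```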

Page 54: Statistical analysis training course

SPSS AND MINITAB

Page 55: Statistical analysis training course

MS EXCEL

Page 56: Statistical analysis training course

STATISTICAL PACKAGE FOR SOCIAL SCIENCES (SPSS)

Page 57: Statistical analysis training course

MINITAB

Page 58: Statistical analysis training course

DATA PREPARATION AND ANALYSIS USING MS EXCEL, SPSS, & MINITAB

Page 59: Statistical analysis training course

DATA PREPARATION

Collect Data

Enter the Data into Computer

Check for Data Accuracy

Develop a Database Structure

Page 60: Statistical analysis training course

WHAT NEXT…?

Data ready in Excel

Import Data to SPSS or Minitab

Explore data using Descriptive Statistics

Perform Hypothesis Testing if needed

Write your report

Page 61: Statistical analysis training course

STATISTICAL REPORT WRITING

Page 62: Statistical analysis training course

HOW TO WRITE A RESEARCH REPORT…?

Cover Page
• Title
• Date
• Researcher Name
• Supervisor Name

Contents & Executive Summary

Introduction
• Data Source
• Data Analysis Technique
• Main Research Aim

Data Analysis
• Descriptive
• Research Questions

Conclusion

Page 63: Statistical analysis training course

TAILORED REPORTS

Sales

Real Estate

Marketing

NPS

Other

Page 64: Statistical analysis training course

SOURCES

• https://cyfernetsearch.org/ilm_4_4
• http://stattrek.com/statistics/data-collection-methods.aspx
• http://onlinestatbook.com/2/introduction/descriptive.html
• http://books.google.com.eg/books?id=JbzzoRyOiXcC&pg=PA19&lpg=PA19&dq=introduction+to+statistics+internal+and+external+data&source=bl&ots=hKFxFq_DHG&sig=EHQL0LJ8JxvGQ1dRN5Et8BX6VLk&hl=en&sa=X&ei=2ohrVKC7FoLksASUzYGQCQ&ved=0CCgQ6AEwAg#v=onepage&q=introduction%20to%20statistics%20internal%20and%20external%20data&f=false
• http://www.sagepub.com/upm-data/10985_Chapter_4.pdf
• http://sociology.about.com/od/Statistics/a/Descriptive-inferential-statistics.htm
• http://www.statcan.gc.ca/edu/power-pouvoir/ch2/methods-methodes/5214773-eng.htm
• http://www.stats.gla.ac.uk/steps/glossary/index.html
• http://www.psychstat.missouristate.edu/introbook/sbk28m.htm

Page 65: Statistical analysis training course

THANK YOU

Have a successful career and a wonderful life full of joy, productivity, & happiness

Marwa Abo-Amra

Email: [email protected]

analystamra.blogspot.com