Statistical analysis training course

Post on 15-Apr-2017


STATISTICAL ANALYSIS TRAINING COURSE

This course was delivered at the Municipality of Dubai by data analyst Marwa Abo-Amra in November 2014.

WHY WE USE STATISTICS & STATISTICAL ANALYSIS

Raw data do not make sense on their own; to understand them we have to resort to statistics.

IN TODAY’S WORLD…

• Customer Surveys
• Medical News
• Demographics
• Political Polls
• Economic Predictions
• Marketing Information
• Sales Forecasts
• Stock Market Projections
• Consumer Price Index
• Sports Statistics

COURSE CONTENTS

• Introduction to Statistics

• Statistical Concepts

• Data Collection Methods

• Statistical Analysis

• Introduction to Statistical Software: SPSS and Minitab

• Data Preparation and Analysis using MS Excel, SPSS, & Minitab

• Statistical Report writing

• Practical Examples and Applications

COURSE LEARNING OBJECTIVES

People working in any field should understand the importance of data, how to handle it, and the scientific methods for extracting the indicators needed for decision-making. These methods help decision makers identify the available alternatives and evaluate them.

INTRODUCTION TO STATISTICS

DEFINITION OF STATISTICS

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.

STAGES OF STATISTICS

“Statistics is a way to get information from data.”

Stages: Collect → Organize → Present → Analyze → Interpret

Phases shown in the diagram: Data Preparation, Descriptive Analysis, Inferential Analysis

Outcome: make valid conclusions and take correct decisions

IMPORTANCE OF STATISTICS

• Simplifies complex data
• Provides a technique of comparison
• Helps in formulating policies
• Helps to test & formulate theories
• Presents the facts in a definite form
• Studies relationships
• Helps in forecasting

STATISTICAL CONCEPTS

SOME BASIC CONCEPTS

• Population
• Sample & Sampling
• Parameter & Statistic
• Elementary Units & Variables

VARIABLES

By role:
• Independent (cause)
• Dependent (effect)

By type:
• Qualitative (descriptive information): sex, nationality, type of crime, occupation, religion, marital status, literacy, etc.
• Quantitative
  - Discrete (counted): number of children in a family, number of accidents per week, students' marks, etc.
  - Continuous (measured): age, distance, height, weight, etc.

VARIABLES & UNITS

Employee | Race | Sex | Job Title | Years of Service | Annual Salary
Abel | Caucasian | Male | Pilot | 2 | $34,000
Cruz | Caucasian | Male | Chief mechanic | 10 | $60,000
Dunn | Western | Male | Chief pilot | 23 | $70,000
Hill | Western | Female | Secretary | 5 | $14,000
King | Caucasian | Male | Janitor | 8 | $17,000
Otis | Caucasian | Male | Grounds keeper | 10 | $20,000
West | Western | Male | Mechanic | 2 | $36,000
Wolf | Caucasian | Female | Pilot | 7 | $36,000
Zorn | Caucasian | Female | Mechanic | 7 | $40,000

Figure annotations: elementary unit, frame, datum, qualitative variable, population of employee sexes, population of employee salaries, sample of employee salaries.

Selected characteristics of all the full-time employees of Mountain Aviation, Inc., July 1, 1994

QUANTITATIVE VARIABLES

Measurement level, from weakest to strongest:
• Nominal: attributes are only named; no order
• Ordinal: attributes can be rank-ordered; distance is not meaningful
• Interval: distance is meaningful
• Ratio: absolute zero

Quantitative variables are either discrete or continuous.

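POPULATION VS. SAMPLING

• Population (target population): made up of elementary units
• Sample: the part of the population selected by sampling
• Parameter: something that describes the population as a whole
• Statistic: something that describes the sample
• Sampling goes from the population to the sample; inference goes from the sample back to the population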
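To make the parameter versus statistic distinction concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the nine annual salaries in the Mountain Aviation table are the whole population; the sample size of 4 and the random seed are arbitrary choices, not part of the course material.

```python
import random
import statistics

# Annual salaries of the nine employees in the Mountain Aviation table,
# treated here as the complete population (an assumption for illustration).
population = [34000, 60000, 70000, 14000, 17000, 20000, 36000, 36000, 40000]

# Parameter: describes the population as a whole.
population_mean = statistics.mean(population)

# Statistic: describes a sample drawn from that population.
rng = random.Random(2014)            # fixed seed so the run is repeatable
sample = rng.sample(population, 4)   # sample size 4 is an arbitrary choice
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", round(population_mean, 2))
print("Statistic (sample mean):", round(sample_mean, 2))
```

The sample mean will usually differ from the population mean; inference is about how much can be said about the parameter from the statistic alone.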

WHY SAMPLING?

• Reduced cost
• Time factor
• Very large populations
• Partly accessible populations
• Destructive units
• Accuracy

SIMPLE RANDOM SAMPLING

For the sampling plan to be statistically valid, the sample must be randomly selected from the population.
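A simple random sample can be sketched in a few lines of Python. The frame below reuses the employee names from the Mountain Aviation table; the helper name `simple_random_sample`, the sample size, and the seed are hypothetical choices made only for this example.

```python
import random

# Sampling frame: the employee names from the Mountain Aviation table.
frame = ["Abel", "Cruz", "Dunn", "Hill", "King", "Otis", "West", "Wolf", "Zorn"]

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(units, n)

# Hypothetical draw of three employees; the seed only makes the run repeatable.
chosen = simple_random_sample(frame, 3, seed=1)
print(chosen)
```

Because selection is left to the random number generator rather than to the analyst, no employee is favored over another.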

DATA COLLECTION & PREPARATION

WHAT IS DATA?

Data is a collection of facts, such as values or measurements. Examples of data include price, exam scores, export and import, labor and so on.

DATA VS. INFORMATION VS. STATISTICS

Data | Information | Statistics
20 kg, 25 kg | 5 individuals in the 20-to-25-kg range | Mean weight = 22.5 kg
28 kg, 30 kg, etc. | 15 individuals in the 26-to-30-kg range | Median weight = 28 kg

SOURCES OF DATA

Internal
• Organization internal records

External
• Published and unpublished external sources

DATA COLLECTION METHODS

• Census
• Sample Survey
• Experiment (control)
• Observational Study (no control)

OTHER DATA COLLECTION METHODS

• Interviews (face-to-face, telephone)
• Focus Groups
• Ethnographies, Oral History, & Case Studies
• Schedule through enumerators
• Documents & Records

COMPARISON: QUESTIONNAIRE VS. SCHEDULE

Questionnaire | Schedule
It cannot be confirmed that the intended respondent filled in the answers | Respondent identity is known
Very slow | Information is collected on time
No personal contact | Direct personal contact
Used only when the respondent is educated & cooperative | Information can be collected from illiterate people
Wider distribution of samples possible | Difficult for wider distribution
More incomplete and false information | Relatively more complete and correct information
Information validity depends on quality of questionnaire | Information validity depends on honesty & competence of enumerator
Questionnaire physical appearance should be attractive | Schedule physical appearance not necessary
Observation cannot be used | Observation can be used by enumerator

FRAMING A QUESTIONNAIRE OR A SCHEDULE

• Cover Letter
• Number of Questions
• Nature of Questions
• Questions should be simple
• Arrangement of Questions
• Information collected is usable
• Avoid mathematical questions

TYPES OF QUESTIONS

• Closed-ended questions: Yes/No, Multiple Choice, Scaled (e.g., Likert)
• Open-ended questions: no predefined options or categories
• Matrix questions: closed-ended but arranged one under the other
• Contingency questions: answered only when the respondent provides a particular response

LIKERT SCALE

9-point Scale: 1 2 3 4 5 6 7 8 9, running from Disagree (1) to Agree (9)

5-point Scale:
1 Not at all Satisfied
2 Not Satisfied
3 Neutral
4 Satisfied
5 Very Satisfied

7-point Scale:
1 Entirely Disagree
2 Mostly Disagree
3 Somewhat Disagree
4 Neither Agree nor Disagree
5 Somewhat Agree
6 Mostly Agree
7 Entirely Agree

SEMANTIC DIFFERENTIAL SCALE

RANK-ORDER SCALE

STATISTICAL ANALYSIS

STATISTICAL ANALYSIS METHODS & TECHNIQUES

Descriptive Statistics
• Tables & Graphs
• Summary Measures

Inferential Statistics
• Confidence Intervals
• Hypothesis Testing

DATA ANALYSIS

Univariate
- Performed to study a single variable
- Descriptive statistics
• Tables & Graphs
• Summary Measures
• One sample tests
• Normality Tests
• Normal Probability Plot

Bivariate
- Performed to study the relationship between two variables
- Statistical Analysis
• Comparisons
• Contingency Tables
• Scatter plot
• Correlation
• Regression

DESCRIPTIVE STATISTICS

Tables & Graphs
• Frequency Table
• Frequency Histogram
• Bar & Column Chart
• Time-series Line Graph
• Pie Chart
• Stem-&-leaf Diagram
• Boxplot

Summary Measures
• Measures of Location
• Measures of Variability
• Measures of Shape
• Proportion

SUMMARY MEASURES

Measures of Central Tendency (Location)
• Mean (average)
• Median (middle value)
• Mode (most frequent value)

Measures of Dispersion (Spread)
• Range (highest value minus lowest value)
• Variance
• Standard Deviation
• Coefficient of Variation

Measures of Shape
• Skewness
• Kurtosis

Proportion
• Frequency of observations in a particular category as a fraction of all observations
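The location and dispersion measures listed above can be computed with Python's standard `statistics` module. This is a sketch only: it uses the annual salaries from the Mountain Aviation table as example data and assumes the sample (n - 1) versions of variance and standard deviation, which the slides do not specify.

```python
import statistics

# Example data: annual salaries from the Mountain Aviation table.
salaries = [34000, 60000, 70000, 14000, 17000, 20000, 36000, 36000, 40000]

# Measures of central tendency (location)
mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)

# Measures of dispersion (spread)
value_range = max(salaries) - min(salaries)
variance = statistics.variance(salaries)   # sample variance, n - 1 denominator
std_dev = statistics.stdev(salaries)       # square root of the sample variance
cv = std_dev / mean                        # coefficient of variation (unitless)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={value_range} variance={variance:.2f} "
      f"std_dev={std_dev:.2f} cv={cv:.3f}")
```

The coefficient of variation divides the standard deviation by the mean, which makes spread comparable between variables measured in different units.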

TABLES & GRAPHS

• Frequency Table
• Frequency Histogram
• Time-series Line Graph
• Pie Chart
• Stem-&-Leaf Diagram
• Boxplot

FREQUENCY TABLE
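As an illustration of a frequency table, the sketch below counts the job titles from the Mountain Aviation table and adds each category's proportion. The column layout is an arbitrary choice for this example, not the format used on the original slide.

```python
from collections import Counter

# Job titles from the Mountain Aviation table (one entry per employee).
job_titles = ["Pilot", "Chief mechanic", "Chief pilot", "Secretary", "Janitor",
              "Grounds keeper", "Mechanic", "Pilot", "Mechanic"]

counts = Counter(job_titles)     # frequency of each category
total = sum(counts.values())     # number of observations

print(f"{'Job title':<16}{'Frequency':>10}{'Proportion':>12}")
for title, freq in counts.most_common():
    # Proportion = category frequency as a fraction of all observations.
    print(f"{title:<16}{freq:>10}{freq / total:>12.2f}")
```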

HISTOGRAM VS. BAR CHART

TIME-SERIES LINE GRAPH

PIE CHART

STEM-AND-LEAF DIAGRAM

BOXPLOT

DATA REPRESENTATION SUMMARY

Variable | Associated data type | Graphically represented by | Central tendency | Dispersion
Qualitative (Categorical) | Nominal | Pie Chart; Column or Bar Chart | Mode | Range
Qualitative (Categorical) | Ordinal | Column or Bar Chart | Mode, Median | Range
Quantitative (Scale) | Interval & Ratio | Frequency Histogram; Boxplot; Stem-&-leaf Diagram; Time-series Line Graph | Mode, Median, Mean | Range, Variance, Standard Deviation, Coefficient of Variation

Effective representation of data depends on choosing the proper graphical tool and summary measures.

NORMAL DISTRIBUTION

CENTRAL LIMIT THEOREM

The central limit theorem in its shortest form states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution.
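The theorem can be illustrated with a small simulation. The sketch below assumes an exponential population (strongly right-skewed) and compares the skewness of the sample means for sample sizes 2 and 30; the population, the sample sizes, the number of repetitions, and the seed are all arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the simulation is repeatable

def sample_means(sample_size, n_samples=2000):
    """Return the means of n_samples random samples from a skewed population."""
    means = []
    for _ in range(n_samples):
        # Exponential population with mean 1: right-skewed, clearly not normal.
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Skewness of the sampling distribution of the mean for a small and a larger n.
skew_by_size = {n: skewness(sample_means(n)) for n in (2, 30)}
for n, skew in skew_by_size.items():
    print(f"sample size {n:>2}: skewness of sample means = {skew:.2f}")
```

The skewness of the sample means moves toward 0, the value for a normal distribution, as the sample size grows, even though the population itself stays skewed.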

TEST OF NORMALITY

Since a number of the most common statistical tests rely on the normality of a sample or population, it is often useful to test whether the underlying distribution is normal, or at least symmetric. This can be done via the following approaches:

• Review the distribution graphically (histograms & normal probability plots)

• Analyze the skewness & kurtosis

• Employ statistical tests (Chi-square, Kolmogorov-Smirnov, & Shapiro-Wilk)
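The skewness and kurtosis check can be sketched in plain Python using moment-based formulas (one of several common definitions, assumed here for simplicity). The salaries from the Mountain Aviation table serve as example data; the formal tests listed above are best left to a statistical package and are not shown.

```python
import statistics

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: 0 for a normal distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 4 for v in values) / len(values) - 3

# Example data: annual salaries from the Mountain Aviation table.
salaries = [34000, 60000, 70000, 14000, 17000, 20000, 36000, 36000, 40000]

salary_skew = skewness(salaries)
salary_kurt = excess_kurtosis(salaries)
print(f"skewness = {salary_skew:.2f}, excess kurtosis = {salary_kurt:.2f}")
```

Values far from 0 suggest a departure from normality; with only nine observations, as here, such a check is indicative at best.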

INFERENTIAL STATISTICS

• Confidence Interval
• Hypothesis Testing


STATISTICAL TESTS

Independent Samples, Parametric Tests
• 1 Sample (t-test)
• 2 Samples (t-test)
• K Samples (ANOVA)

Independent Samples, Nonparametric Tests
• 1 Sample (Binomial, Chi-square, Kolmogorov-Smirnov)
• 2 Samples (Mann-Whitney)
• K Samples (Kruskal-Wallis)

Related Samples, Parametric Tests
• 2 Samples (t-test)
• K Samples (Repeated Measures ANOVA)

Related Samples, Nonparametric Tests
• 2 Samples (Wilcoxon)
• K Samples (Friedman)
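The classification above can be encoded as a small lookup table. The sketch below is only a reading aid: the function name `choose_test`, the key labels, and the treatment of "K samples" as three or more groups are assumptions made for this example.

```python
# Keys: (sample design, test family, number of samples).
# "k" stands for K samples, assumed here to mean three or more groups.
STATISTICAL_TESTS = {
    ("independent", "parametric", "1"): "t-test",
    ("independent", "parametric", "2"): "t-test",
    ("independent", "parametric", "k"): "ANOVA",
    ("independent", "nonparametric", "1"): "Binomial, Chi-square, or Kolmogorov-Smirnov",
    ("independent", "nonparametric", "2"): "Mann-Whitney",
    ("independent", "nonparametric", "k"): "Kruskal-Wallis",
    ("related", "parametric", "2"): "t-test",
    ("related", "parametric", "k"): "Repeated Measures ANOVA",
    ("related", "nonparametric", "2"): "Wilcoxon",
    ("related", "nonparametric", "k"): "Friedman",
}

def choose_test(design, family, n_samples):
    """Look up the test named on the slide for a given study design."""
    key = (design, family, "k" if n_samples > 2 else str(n_samples))
    if key not in STATISTICAL_TESTS:
        raise ValueError(f"No test listed for {design}, {family}, {n_samples} sample(s)")
    return STATISTICAL_TESTS[key]

print(choose_test("independent", "parametric", 3))
print(choose_test("related", "nonparametric", 2))
```

A lookup like this only names a candidate test; whether its assumptions (for example normality for the parametric tests) hold still has to be checked on the data.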

CONTINGENCY TABLES

A two-way table is a useful tool for examining relationships between categorical variables. The entries in the cells of a two-way table can be frequency counts or relative frequencies.
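A two-way table can be built by counting pairs of categories. The sketch below cross-tabulates the Race and Sex columns of the Mountain Aviation table and prints both frequency counts and relative frequencies; the print layout is an arbitrary choice for this example.

```python
from collections import Counter

# (race, sex) for each employee in the Mountain Aviation table.
employees = [
    ("Caucasian", "Male"), ("Caucasian", "Male"), ("Western", "Male"),
    ("Western", "Female"), ("Caucasian", "Male"), ("Caucasian", "Male"),
    ("Western", "Male"), ("Caucasian", "Female"), ("Caucasian", "Female"),
]

cell_counts = Counter(employees)                 # one count per table cell
races = sorted({race for race, _ in employees})  # row labels
sexes = sorted({sex for _, sex in employees})    # column labels
total = len(employees)

print(f"{'':<12}" + "".join(f"{sex:>16}" for sex in sexes))
for race in races:
    cells = []
    for sex in sexes:
        count = cell_counts[(race, sex)]
        # Each cell shows the frequency count and its relative frequency.
        cells.append(f"{count} ({count / total:.2f})".rjust(16))
    print(f"{race:<12}" + "".join(cells))
```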

SCATTER PLOT

A Scatter (XY) Plot has points that show the relationship between two sets of data.

In this example, each dot shows one person's weight versus their height.

Positive Association, Negative Association, No Association

CORRELATION

When two sets of data are strongly linked together we say they have a High Correlation.

• Correlation is Positive when the values increase together, and

• Correlation is Negative when one value decreases as the other increases

Correlation can have a value between -1 and 1:
• 1 is a perfect positive correlation
• 0 is no correlation (the values don't seem linked at all)
• -1 is a perfect negative correlation

The value shows how good the correlation is (not how steep the line is), and if it is positive or negative.
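A minimal sketch of the Pearson correlation coefficient, the most common correlation measure (the slides do not name a specific one, so this choice is an assumption). The example pairs years of service with annual salary from the Mountain Aviation table.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Years of service and annual salary from the Mountain Aviation table.
years = [2, 10, 23, 5, 8, 10, 2, 7, 7]
salaries = [34000, 60000, 70000, 14000, 17000, 20000, 36000, 36000, 40000]

r = pearson_r(years, salaries)
print(f"r = {r:.2f}")

# Two perfectly linear relationships with very different slopes both give
# r = 1 (up to rounding): the value reflects strength, not steepness.
steep = pearson_r([1, 2, 3], [10, 20, 30])
shallow = pearson_r([1, 2, 3], [0.1, 0.2, 0.3])
print(round(steep, 6), round(shallow, 6))
```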

LINEAR REGRESSION

In a cause and effect relationship, the independent variable is the cause, and the dependent variable is the effect.
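A simple least-squares line with one independent variable can be sketched as follows. The example treats years of service as the independent variable and annual salary as the dependent variable, using the Mountain Aviation table; that pairing is an assumption for illustration and does not establish a causal link in these data. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Independent variable: years of service. Dependent variable: annual salary.
years = [2, 10, 23, 5, 8, 10, 2, 7, 7]
salaries = [34000, 60000, 70000, 14000, 17000, 20000, 36000, 36000, 40000]

slope, intercept = fit_line(years, salaries)
print(f"salary ≈ {intercept:.0f} + {slope:.0f} × years of service")

# Predicted salary for a hypothetical employee with 12 years of service.
predicted = intercept + slope * 12
print(f"predicted salary at 12 years: {predicted:.0f}")
```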

SPSS AND MINITAB

MS EXCEL

STATISTICAL PACKAGE FOR SOCIAL SCIENCES (SPSS)

MINITAB

DATA PREPARATION AND ANALYSIS USING MS EXCEL, SPSS, & MINITAB

DATA PREPARATION

• Collect Data
• Enter the Data into Computer
• Check for Data Accuracy
• Develop a Database Structure

WHAT NEXT…?

• Data ready in Excel
• Import Data to SPSS or Minitab
• Explore data using Descriptive Statistics
• Perform Hypothesis Testing if needed
• Write your report

STATISTICAL REPORT WRITING

HOW TO WRITE A RESEARCH REPORT…?

Cover Page
• Title
• Date
• Researcher Name
• Supervisor Name

Contents & Executive Summary

Introduction
• Data Source
• Data Analysis Technique
• Main Research Aim

Data Analysis
• Descriptive
• Research Questions

Conclusion

TAILORED REPORTS

• Sales
• Real Estate
• Marketing
• NPS
• Other


THANK YOU

Have a successful career and a wonderful life full of joy, productivity, & happiness

Marwa Abo-Amra
Email: analyst.amra@gmail.com
Blog: analystamra.blogspot.com