Mastering Data Analysis Tools.pptx


  • 7/30/2019 Mastering Data Analysis Tools.pptx

    1/67

    Data Analysis

    Using SPSS

    Know where to find information and

    how to use it, that's the secret of success: Albert Einstein


    Workshop Objectives:

    To develop Data Analysis Skills

    Use of appropriate Statistical Techniques

    Use of SPSS to perform Statistical Analysis

    Understanding and Interpretation of Results


    Types of Analysis:

    Descriptive Analysis

    Inferential Analysis

    Model Building Techniques

    Multivariate Analysis

    IBM SPSS


    Before we start analyzing data, let me introduce SPSS and its basic structure.


    Descriptive Analysis:

    Descriptive Analysis for Qualitative Variables

    Descriptive Analysis for Quantitative Variables


    Descriptive Analysis of Qualitative Data

    Qualitative Data (Categorical Data)

    Tables: One-Way Table, Two-Way Table, ..., N-Way Table

    Graphs: Bar Chart, Pie Chart, Clustered Bar Chart

    Numbers: Percentages
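As a rough illustration of a one-way table with percentages, here is a minimal Python sketch. It is not part of the SPSS workflow used in the workshop; the function name `one_way_table` and the `own_home` responses are hypothetical and made up for this example.

```python
from collections import Counter


def one_way_table(values):
    """Return {category: (count, percentage)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


# Hypothetical responses for a categorical variable such as OwnHome.
own_home = ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]
table = one_way_table(own_home)

# Print one row per category: label, count, percentage.
for category, (count, percent) in sorted(table.items()):
    print(f"{category:<5}{count:>4}{percent:>8.1f}%")
```

A two-way table follows the same idea with pairs of values, for example `Counter(zip(sex, own_home))`.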


    Descriptive Analysis for Quantitative Data

    Quantitative Data (Numerical Data)

    Tables: Frequency Distribution

    Graphs: Stem and Leaf, Histogram, Box-Plot

    Numbers:

        Center: Mean, Median, Mode, Geometric Mean, Harmonic Mean, Trimmed Mean

        Important Points: Median, Quartiles, Percentiles

        Variation: Range, Inter Quartile Range, Variance, Standard Deviation

        Distribution: Skewness, Kurtosis
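Several of the numerical measures listed above can be sketched with Python's standard `statistics` module. The `purchases` values are hypothetical and are not taken from the workshop data file. Quartile conventions differ between packages, so the quartiles here may not match SPSS output exactly.

```python
import statistics

# Hypothetical purchase amounts (illustrative values only).
purchases = [12, 15, 15, 18, 20, 22, 25, 30, 41]

# Measures of center
mean = statistics.mean(purchases)
median = statistics.median(purchases)
mode = statistics.mode(purchases)

# Measures of variation
data_range = max(purchases) - min(purchases)
variance = statistics.variance(purchases)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(purchases)      # sample standard deviation

# Important points: quartiles (Python's default "exclusive" method)
q1, q2, q3 = statistics.quantiles(purchases, n=4)
iqr = q3 - q1

print(mean, median, mode, data_range, variance, std_dev, iqr)
```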


    Tabular Methods


    Graphical Methods

    [Figure: Histogram. Vertical axis: Frequency (0 to 250). Horizontal axis: class midpoints from 15750 to 115750.]


    Numerical Methods


    Practice Session for Descriptive Analysis

    Import Customers Databas.xls into SPSS.

    Label the data properly.

    Make one-way tables for the variables Age, Sex, OwnHome and Married. Also make a pie chart and a bar chart for each of these variables.

    Make two-way tables (Sex by OwnHome and Married by OwnHome). Also make a clustered bar chart for each table.

    Produce detailed numerical descriptive statistics for the variable Purchases (mean, median, ...). Also make a histogram, a stem-and-leaf plot and a box plot for Purchases.

    Repeat the previous step by gender.


    Inferential Analysis: Comparing Groups


    Comparing Groups: Parametric & Non-Parametric Inference

    One Group
        Measured Once
            Normality Assumption: Fulfilled / Not Fulfilled
        Measured Twice
            Normality Assumption: Fulfilled / Not Fulfilled

    Two Groups
        Normality Assumption: Fulfilled / Not Fulfilled
        Homogeneity of Variances Assumption:
            Fulfilled (Normality + Equal Variances)
            Not Fulfilled (Normality + Unequal Variances)

    More than Two Groups
        Normality Assumption: Fulfilled / Not Fulfilled
        Homogeneity of Variances Assumption:
            Fulfilled (Normality + Equal Variances)
            Not Fulfilled (Normality + Unequal Variances)


    Comparing One Group

    Kinds of Research Questions

    For the one-sample situation, the prime concern in research is examining a measure of central tendency (location) for the population of interest. The best-known measures of location are the mean and median. For a one-sample situation, we might want to know if the average waiting time in a doctor's office is greater than one hour, or if the average growth of roses is 4 inches or more with a certain fertilizer, or if the annual return is 10.2% for the banks that exercised comprehensive planning.


    Comparing Two Groups

    Kinds of Research Questions

    One of the most common tasks in research is to compare two populations (groups). We might want to compare the income level of two regions, the nitrogen content of two lakes, or the effectiveness of two drugs.

    The first question that arises is which aspects (parameters) of the populations we should compare. We might consider comparing the averages, the medians, the standard deviations, the distributional shapes (histograms), or the maximum values. We choose the comparison parameter according to the particular problem.

    Perhaps the simplest comparison that we can make is between the means of the two populations.


    Comparing more than two Groups

    Kinds of Research Questions

    One of the most common tasks in research is to compare several populations (groups). We might want to compare the income level of three regions, the nitrogen content of four lakes, or the effectiveness of four drugs.

    The first question that arises concerns which aspects (parameters) of the populations we should compare. We might consider comparing the means, medians, standard deviations, distributional shapes (histograms), or maximum values. We choose the comparison parameter according to the particular problem.

    Perhaps the simplest comparison that we can make is to compare the means of several populations.


    One Sample t-test

    One Sample t-test is used to compare one group to a

    given standard on the basis of Arithmetic Average

    (Mean).


    Assumptions of the One-sample t-test

    The data are continuous.

    The data follow the Normal distribution.

    The sample is a simple random sample from the

    population.


    Hypotheses and Formulas

    $$H_0: \mu = \mu_0, \qquad H_A: \mu \neq \mu_0$$

    $$t = \frac{\bar{X} - \mu_0}{\sqrt{s^2 / n}}$$

    With

    $$df = n - 1$$
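The one-sample t statistic can be computed by hand as a check on software output. The following is a minimal Python sketch of the formula, not the SPSS procedure itself; the function name `one_sample_t` and the disc diameters are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: mu = mu0, using t = (xbar - mu0) / sqrt(s^2 / n)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s2 = statistics.variance(sample)  # sample variance, n - 1 denominator
    t = (xbar - mu0) / math.sqrt(s2 / n)
    return t, n - 1


# Hypothetical disc diameters in millimeters (illustrative values only).
diameters = [322.1, 321.9, 322.3, 322.0, 322.2]
t_stat, df = one_sample_t(diameters, mu0=322.0)
print(f"t = {t_stat:.3f} with df = {df}")
```

The resulting t value is then compared with the t distribution on `df` degrees of freedom to obtain a p-value.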


    Case Study

    A manufacturer of high-performance automobiles

    produces disc brakes that must measure 322 millimeters

    in diameter. The quality control manager randomly selects

    128 discs and measures their diameters.

    We can use the One Sample T Test to determine whether or not the mean diameter of the brakes in the sample differs significantly from 322 millimeters.


    SPSS Analytic Procedure


    The Sign Test

    The sign test is perhaps the oldest of all the nonparametric procedures. This nonparametric test is based on the binomial distribution. It assumes two mutually exclusive outcomes, a constant or stable probability of success or failure, and n independent trials.

    The terminology, sign test, reinforces the point that the data are converted to a series of plus and minus signs. The test is based on the number of plus signs that occur. Zero differences are thrown out, and the sample size is reduced accordingly.
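The description above (convert to signs, drop zero differences, count the plus signs, refer to the binomial distribution) can be sketched in a few lines of Python. This is an illustrative exact two-sided version, assuming a success probability of 0.5 under the null hypothesis; the function name `sign_test` and the salary figures are hypothetical.

```python
from math import comb


def sign_test(sample, median0):
    """Exact two-sided sign test of H0: median = median0.

    Returns (plus_count, n_used, p_value). Zero differences are dropped,
    which reduces the sample size accordingly.
    """
    diffs = [x - median0 for x in sample if x != median0]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    # Under H0 the number of plus signs follows Binomial(n, 0.5).
    k = min(plus, n - plus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return plus, n, p_value


# Hypothetical salaries in thousands (illustrative values only).
salaries = [52, 48, 55, 50, 61, 47, 58, 53, 50, 56]
plus, n_used, p = sign_test(salaries, median0=50)
print(plus, n_used, p)
```

For large samples, a normal approximation to the binomial is usually used instead of the exact tail sum.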


    Assumptions of the Sign Test

    The data are continuous

    The distribution of these data is symmetric.

    The measurement scale is at least interval.


    Hypotheses and Formulas

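    $$H_0: M = M_0$$

    $$H_A: M \neq M_0, \quad M > M_0, \quad M < M_0$$

    (with $M$ denoting the population median)

    $$Z = \frac{w - \mu_w}{\sigma_w}$$

    $$\mu_w = \frac{n(n+1)}{4}, \qquad \sigma_w = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

    $$w = \sum R^{+}$$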


    Case Study

    A researcher believes that the median salary of HR managers is 50 thousand. To confirm this hypothesis, he selects a random sample of 1207 HR managers from different companies.

    We can use the Sign Test to determine whether or not the median salary is significantly different from 50 thousand.


    SPSS Analytic Procedure


    Paired Samples t-test

    Kinds of Research Questions

    In the paired case, we take two measurements on the same individual at different times, or we have one measurement on each individual of a pair.

    Examples of the first case are two insurance-claim adjusters assessing the damage for the same 15 cases, or the evaluation of the improvement in aerobic fitness for 15 subjects, where measurements are made at the beginning of the fitness program and at the end of it.

    An example of the second paired situation is the testing of the effectiveness of two drugs, A and B, on 20 pairs of patients who have been matched on physiological and psychological variables. One patient in the pair receives drug A, and the other patient gets drug B.


    Assumptions of the paired-sample t-test

    The data are continuous.

    The data, i.e., the differences for the matched-pairs,

    follow a Normal distribution.

    The sample of pairs is a simple random sample from

    its population.


    Hypotheses and Formulas

    $$H_0: \mu_d = 0, \qquad H_A: \mu_d \neq 0$$

    $$t = \frac{\bar{X}_d - \mu_d}{\sqrt{s_d^2 / n}}$$

    With

    $$df = n - 1$$
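The paired t statistic is the one-sample t statistic applied to the within-pair differences. The sketch below illustrates this in Python; the function name `paired_t` and the before/after dose counts are hypothetical and are not the textbook data cited in the case study.

```python
import math
import statistics


def paired_t(before, after, mu_d=0.0):
    """Return (t, df) for H0: mu_d = 0 from paired measurements."""
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s2_d = statistics.variance(diffs)  # sample variance of the differences
    t = (d_bar - mu_d) / math.sqrt(s2_d / n)
    return t, n - 1


# Hypothetical weekly dose counts for 5 patients (illustrative values only).
doses_before = [8, 5, 6, 4, 7]
doses_after = [5, 3, 6, 1, 4]
t_stat, df = paired_t(doses_before, doses_after)
print(f"t = {t_stat:.3f} with df = {df}")
```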


    Case Study

    A researcher in behavioral medicine believes that stress often makes asthma symptoms worse for people who suffer from this respiratory disorder. Therefore, the researcher decides to study the effect of relaxation training on the severity of their symptoms.

    A sample of 5 patients is selected. During the week before treatment, the investigator records the severity of their symptoms by measuring how many doses of medication are needed for asthma attacks. Then the patients receive relaxation training. For the week following the training, the researcher once again records the number of doses used by each patient.

    Data from Gravetter and Wallnau (4th Ed.) p. 319.


    SPSS Analytic Procedure


    Wilcoxon Signed Rank test

    The Wilcoxon Signed Rank test is used to test whether the median difference is zero when the populations are not normal.


    Assumptions of the Wilcoxon Signed Rank test

    The differences are continuous.

    The distribution of these differences is symmetric.

    The differences are mutually independent.

    The measurement scale is at least interval.


    Hypotheses and Formulas

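    $$H_0: M_1 = M_2$$

    $$H_1: M_1 \neq M_2$$

    (with $M_1$ and $M_2$ denoting the two population medians)

    $$Z = \frac{w - \mu_w}{\sigma_w}$$

    $$\mu_w = \frac{n(n+1)}{4}, \qquad \sigma_w = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

    $$w = \sum R^{+}$$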
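The normal approximation for the signed-rank statistic can be sketched as follows. This is an illustrative Python version under two assumptions that the slides do not spell out: zero differences are dropped, and tied absolute differences receive average ranks (no tie correction is applied to the variance). The names `average_ranks` and `wilcoxon_signed_rank_z` and the scores are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank_z(before, after):
    """Return (w, z), where w is the sum of ranks of the positive differences."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zeros
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu_w = n * (n + 1) / 4
    sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w, (w - mu_w) / sigma_w


# Hypothetical test scores before and after a new teaching method.
scores_before = [10, 12, 9, 14, 11, 13]
scores_after = [12, 15, 8, 18, 16, 19]
w, z = wilcoxon_signed_rank_z(scores_before, scores_after)
print(w, round(z, 3))
```

The sample here is kept tiny for readability; the normal approximation is intended for larger samples such as the 600 students in the case study.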


    Case Study

    An educationist wants to assess the effectiveness of a new teaching method. To do so, she selects 600 students and records their scores on a test of 150 marks. The scores are recorded before and after the new teaching method is introduced.

    The Wilcoxon Signed Rank test can be used to test the effectiveness of the new teaching method.


    SPSS Analytic Procedure


    Independent Samples t-test

    Equal Variances

    The independent-samples t-test is used to compare two groups on the basis of their averages.


    Assumptions of the two-sample t-test

    The data are continuous

    The data follow the Normal distribution.

    The variances of the two populations are equal

    The two samples are independent

    Both samples are simple random samples from their

    respective populations.


    Hypotheses and Formulas

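    $$H_0: \mu_1 = \mu_2, \qquad H_A: \mu_1 \neq \mu_2$$

    $$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

    With

    $$df = n_1 + n_2 - 2$$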
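The pooled-variance t statistic can be sketched in Python as below, taking the hypothesized difference in means to be zero. The function name `pooled_t` and the spending figures are hypothetical.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Return (t, df) for H0: mu1 = mu2, assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    s2_1 = statistics.variance(sample1)
    s2_2 = statistics.variance(sample2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, df


# Hypothetical spending for the two ad groups (illustrative values only).
standard_ad = [1, 2, 3, 4, 5]
promo_ad = [3, 4, 5, 6, 7]
t_stat, df = pooled_t(standard_ad, promo_ad)
print(f"t = {t_stat:.3f} with df = {df}")
```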


    Case Study

    An analyst at a department store wants to evaluate a

    recent credit card promotion. To this end, 500

    cardholders were randomly selected. Half received

    an ad promoting a reduced interest rate on purchases made over the next three months, and

    half received a standard seasonal ad.

    We can use Independent-Samples T Test to compare

    the spending of the two groups.


    SPSS Analytic Procedure


    Independent Samples t-test

    Unequal Variances

    Independent Samples t-test is used to compare two

    independent groups on the basis of average. This test

    does not require homogeneity of the variances.


    Hypotheses and Formulas

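    $$H_0: \mu_1 = \mu_2, \qquad H_A: \mu_1 \neq \mu_2$$

    $$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

    With

    $$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$$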
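A minimal Python sketch of the unequal-variances t statistic and its approximate degrees of freedom follows, again taking the hypothesized difference in means to be zero. The function name `welch_t` and the expenditure values are hypothetical.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Return (t, df) for H0: mu1 = mu2 without assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    # Approximate degrees of freedom; generally not an integer.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical expenditures for two groups of students (illustrative only).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 6, 10]
t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f} with df = {df:.2f}")
```

When the two samples have the same size and the same variance, this df reduces to the pooled value n1 + n2 - 2.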


    Case Study

    A researcher wishes to study the expenditure behavior of students. One of the research questions is whether expenditures differ by gender.


    SPSS Analytic Procedure


    Mann-Whitney Test

    The Mann-Whitney test is used to compare two independent groups on the basis of their medians. This test does not require the assumption of normality.


    Mann-Whitney U Test Assumptions

    The variable of interest is continuous. The measurement scale

    is at least ordinal.

    The probability distributions of the two populations are

    identical, except for location.

    The two samples are independent.

    Both samples are simple random samples from their

    respective populations.


    Hypotheses and Formulas

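    $$H_0: M_1 = M_2$$

    $$H_A: M_1 \neq M_2$$

    (with $M_1$ and $M_2$ denoting the two population medians)

    $$u = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - w$$

    $$\mu_u = \frac{n_1 n_2}{2}, \qquad \sigma_u = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

    $$z = \frac{u - \mu_u}{\sigma_u}$$

    W is the sum of ranks of the smaller sample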
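The U statistic and its normal approximation can be sketched in Python as follows. The sketch assumes that `sample1` is the smaller sample (so that `w` is its rank sum), assigns average ranks to ties, and applies no tie correction to the variance. The function name `mann_whitney_z` and the data are hypothetical.

```python
import math


def mann_whitney_z(sample1, sample2):
    """Return (u, z); sample1 is assumed to be the smaller sample."""
    n1, n2 = len(sample1), len(sample2)
    combined = sorted(list(sample1) + list(sample2))
    # Map each distinct value to its average rank in the combined sample.
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + j) / 2 + 1
        i = j + 1
    w = sum(rank_of[x] for x in sample1)  # rank sum of the smaller sample
    u = n1 * n2 + n1 * (n1 + 1) / 2 - w
    mu_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mu_u) / sigma_u


# Hypothetical measurements for two independent groups (illustrative only).
group_small = [1, 2, 3]
group_large = [4, 5, 6, 7]
u, z = mann_whitney_z(group_small, group_large)
print(u, round(z, 3))
```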


    Case Study

    Data on the birth weight of infants born to mothers with different levels of prenatal care. These are two independent samples for univariate analysis, used as test data for the Mann-Whitney U test. Obtained from Howell, David D., Fundamental Statistics for the Behavioral Sciences, 3rd Edition, p. 385.


    SPSS Analytic Procedure


    One-Way Analysis of Variance

    Equal Variances

    One Way Analysis of Variance is used to

    compare more than two groups on the basis

    of their averages.


    One-Way Analysis of Variance

    Assumptions

    The data are continuous.

    The data follow the Normal distribution; each group is normally distributed.

    The variances of the populations are equal.

    The groups are independent.

    Each group is a simple random sample from its population.


    Hypotheses and Formulas

    $$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$

    $$H_A: \text{at least one pair is significantly different}$$

    $$F = \frac{MSG}{MSE}$$

    MSG is the Mean Square of Group and MSE is the Mean Square Error
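The slide does not spell out how MSG and MSE are obtained. The Python sketch below uses the standard definitions: MSG is the between-group sum of squares divided by k - 1, and MSE is the within-group sum of squares divided by N - k. The function name `one_way_anova_f` and the ratings are hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares (group means around the grand mean).
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (observations around their group mean).
    sse = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    msg = ssg / (k - 1)
    mse = sse / (n_total - k)
    return msg / mse, k - 1, n_total - k


# Hypothetical popularity ratings for three age groups (illustrative only).
ratings = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_anova_f(ratings)
print(f"F = {f_stat:.3f} with df = ({df1}, {df2})")
```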


    Example

    This is a hypothetical data file that concerns the

    popularity of a TV channel. Using a prototype, the

    marketing team has collected focus group data. One of the questions of interest is whether the popularity of the TV channel differs across age groups.

    This hypothesis can be tested using One Way ANOVA.


    SPSS Analytic Procedure


    One-Way Analysis of Variance

    Unequal Variances

    Welch ANOVA is used to compare more than two

    groups on the basis of averages. This test does not

    require the homogeneity of variances.


    Welch Analysis of Variance

    Assumptions

    The data are continuous

    The data follow the Normal distribution, each group is

    normally distributed.

    The groups are independent.

    Each group is a simple random sample from its population.


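    $$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$

    $$H_A: \text{at least one pair is significantly different}$$

    $$F = \frac{\dfrac{1}{k-1}\displaystyle\sum_{i=1}^{k} \frac{n_i}{s_i^2}\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2}{1 + \dfrac{2(k-2)}{k^2-1}\displaystyle\sum_{i=1}^{k} \frac{1}{n_i - 1}\left(1 - \frac{n_i / s_i^2}{\sum_{j=1}^{k} n_j / s_j^2}\right)^2}$$

    where $\bar{X}_{..} = \dfrac{\sum_{i=1}^{k} (n_i / s_i^2)\,\bar{X}_{i.}}{\sum_{i=1}^{k} n_i / s_i^2}$ is the weighted grand mean.

    With

    $$df_1 = k - 1, \qquad df_2 = \frac{k^2 - 1}{3\displaystyle\sum_{i=1}^{k} \frac{1}{n_i - 1}\left(1 - \frac{n_i / s_i^2}{\sum_{j=1}^{k} n_j / s_j^2}\right)^2}$$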
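Welch's F statistic is easier to follow in code than in a single formula. The Python sketch below uses the standard form of the test, with weights n_i / s_i^2 and a weighted grand mean; the function name `welch_anova` and the scores are hypothetical.

```python
import statistics


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    # Weight of each group: n_i / s_i^2.
    weights = [n / statistics.variance(g) for n, g in zip(sizes, groups)]
    w_sum = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / w_sum  # weighted grand mean
    # Shared term in the denominator of F and in df2.
    lam = sum((1 - w / w_sum) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    numerator = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, k - 1, df2


# Hypothetical test scores for three training groups (illustrative only).
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = welch_anova(scores)
print(f"F = {f_stat:.3f} with df = ({df1}, {df2:.2f})")
```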


    Case Study

    A sales manager evaluates two new training courses.

    Sixty employees, divided into three groups, all

    receive standard training. In addition, group 2

    receives technical training, and group 3 receives a hands-on tutorial. Each employee was tested at the

    end of the training course and their score recorded.


    SPSS Analytic Procedure


    Kruskal-Wallis Test

    Kruskal-Wallis H-test is used to compare more

    than two groups on the basis of their medians.


    Kruskal-Wallis Test

    Assumptions

    The variable of interest is continuous; the measurement scale is at least ordinal.

    The probability distributions of the populations are identical, except for location.

    The groups are independent.

    All groups are simple random samples from their respective populations.


    Hypotheses and Formulas

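    $$H_0: M_1 = M_2 = \cdots = M_k$$

    $$H_A: \text{at least one pair of medians is significantly different}$$

    (with $M_i$ denoting the median of population $i$)

    $$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$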
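The H statistic can be sketched in Python as below, where R_i is the rank sum of group i in the pooled sample and N is the total sample size. Ties receive average ranks and no tie correction is applied. The function name `kruskal_wallis_h` and the survival times are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic for a list of groups."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    # Map each distinct value to its average rank in the pooled sample.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    # Sum of R_i^2 / n_i over the groups.
    total = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical survival times for three tumor size categories (illustrative only).
survival = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(survival)
print(round(h_stat, 3))
```

H is then compared with a chi-square distribution on k - 1 degrees of freedom.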


    Case Study

    A health scientist wishes to compare the

    survival experiences after breast cancer with

    different Pathological Tumor Size (Categories).

    We can use the Kruskal-Wallis H-Test to determine whether or not the median survival time of the patients differs significantly across pathological tumor size categories.


    SPSS Analytic Procedure


    Model Building Techniques