ISDS361 Notes - Chapter 1


  • 8/14/2019 ISDS361 Notes - Chapter 1


    Graphical Descriptive Techniques

    Chapter 2

    Learning Objectives

    Understand different types of data:

    Interval

    Nominal

    Ordinal

    Learn how to describe a set of Nominal data.

    Learn how to describe the relationships between two Nominal variables.


    Populations & Samples

    [Diagram: a Sample shown as a subset of a Population]

    The graphical & tabular methods presented here apply to both entire populations and samples drawn from populations.

    Definitions

    Variable: some characteristic of a population or sample.

    E.g. student grades.

    Typically denoted with a capital letter: X, Y, Z

    Values: range of possible values for a variable.

    E.g. student grades (0..100)


    Data: observed values of a variable.

    E.g. student grades: {67, 74, 71, 83, 93, 55, 48}

    Variable: what you want to measure

    Values example: range of gas prices, 2.90 to 5.00

    Data: actual gas prices


    Types of Data & Information

    Interval Data

    e.g. heights, weights, prices, etc.

    Nominal Data

    e.g. Marital status:

    Single = 1, Married = 2, Divorced = 3, Widowed = 4

    Ordinal Data

    e.g. College course rating system:

    poor = 1, fair = 2, good = 3, very good = 4, excellent = 5

    We can say things like:

    excellent > poor or fair < very good

    Interval: numerical/quantitative; if arithmetic can be applied to the values, the data are interval.

    Ex: avg price, avg qty, age, distance traveled

    Nominal & Ordinal: both qualitative/categorical

    Nominal ex: single vs. married; can't say one is better than the other; gender, race

    Ordinal Data: order matters (different from nominal)

    Ex: grades

    Hierarchy of Data & Types of Calculations

    Interval

    - Values are real numbers.

    - All calculations are valid.

    - Data may be treated as ordinal or nominal.

    Ordinal

    - Values must represent the ranked order of the data.

    - Calculations based on a ranking process are valid.

    - Data may be treated as nominal but not as interval.

    Nominal

    - Values are the arbitrary numbers that represent categories.

    - Only calculations based on the count/frequencies of occurrence are valid.

    - Data may not be treated as ordinal or interval.
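    The hierarchy can be illustrated with a short Python sketch that applies only the calculation each data type allows. The grades are the ones listed under Definitions; the course ratings and marital-status codes are made-up samples, and the variable names are assumptions for illustration.

```python
from collections import Counter
from statistics import mean, median

# Interval data: real numbers, so arithmetic such as the mean is meaningful.
grades = [67, 74, 71, 83, 93, 55, 48]  # student grades from the Definitions slide
grade_mean = mean(grades)

# Ordinal data: the codes carry rank only, so use a rank-based summary (median).
# Hypothetical ratings: poor=1, fair=2, good=3, very good=4, excellent=5.
ratings = [3, 5, 4, 2, 4, 1, 5]
rating_median = median(ratings)

# Nominal data: the codes are arbitrary labels, so only counting is valid.
# Hypothetical codes: single=1, married=2, divorced=3, widowed=4.
marital = [1, 2, 2, 1, 3, 2, 4, 1]
marital_counts = Counter(marital)

print(f"mean grade: {grade_mean:.2f}")
print(f"median rating: {rating_median}")
print(f"marital status counts: {dict(marital_counts)}")
```

    Averaging the marital-status codes would run without error, but the result would be meaningless because the numbers only name categories.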


    Your Turn

    In-class team exercises (pages 17-18)

    .

    2.3

    2.4

    2.5

    2.6


    Graphical & Tabular Techniques for Nominal Data

    The only allowable calculation on nominal data is to count the frequency of each value of the variable.

    We can summarize the data in a table that presents the categories and their counts, called a frequency distribution.

    A relative frequency distribution lists the categories and the proportion with which each occurs.

    Tabular description = Table

    Frequency Distribution: Looks at counts only (own versus rent)

    Relative Frequency Distribution: Proportion / percentage (% of rent vs % buying)
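    A minimal Python sketch of the two tables follows. The own/rent responses are invented for illustration, and the function names are assumptions rather than anything from the course materials.

```python
from collections import Counter


def frequency_distribution(data):
    """Return {category: count} for a list of nominal values."""
    return dict(Counter(data))


def relative_frequency_distribution(data):
    """Return {category: proportion}; the proportions sum to 1."""
    n = len(data)
    return {category: count / n for category, count in Counter(data).items()}


# Hypothetical housing responses, made up for illustration only.
housing = ["own", "rent", "own", "own", "rent", "own", "rent", "own"]

freq = frequency_distribution(housing)
rel_freq = relative_frequency_distribution(housing)

for category in sorted(freq):
    print(f"{category:<5} {freq[category]:>3} {rel_freq[category]:>6.1%}")
```

    Both tables come from the same counts; the relative version just divides each count by the number of observations.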


    Work Status in the General Social Survey 2008

    Survey respondents were asked the following:

    Last week were you working full time, part time, going to school, keeping house, or what?

    The responses were:

    1. Working full time

    2. Working part time

    3. Temporarily not working

    4. Unemployed, laid off

    5. Retired

    6. School

    7. Keeping house

    8. Other

    Generally, a variable has a short name.

    Variable = Work Status

    Values: 1-8

    The responses were recorded using the codes 1, 2, 3, 4, 5, 6, 7, and 8.

    2023 responses.

    Our task is to construct a frequency and relative frequency distribution for these data and graphically summarize the data by producing a bar chart and a pie chart.


    Survey Data (150 observations)

    1 1 1 1 2 4 3 5 1 3 1 3 7 5 1

    1 5 2 1 5 1 3 3 3 1 1 5 3 1 5

    5 1 1 3 3 5 5 6 3 5 3 5 5 5 1

    1 2 1 1 5 5 3 2 1 6 1 1 4 5 1

    3 3 1 3 5 3 3 7 3 7 2 1 5 7

    3 6 2 6 3 6 6 6 5 6 1 1 6 3

    7 1 1 1 5 1 3 1 3 7 7 2 1 1

    2 5 3 1 1 3 1 1 7 5 3 2 1 1

    6 5 7 1 3 2 1 3 1 1 7 5 5 6

    1 4 6 1 3 1 1 5 5 5 5 1 5 5

    6 1 3 3 1 3 7 1 1 1 2 4 1 1

    3 3 7 5 5 1 1 3 5 1 5 4 5 3

    4 1 4 5 3 1 5 3 3 3 1 1 5 3


    5 6 4 3 5 6 4 6 5 5 5 5 3 1

    2 3 2 7 5 1 6 6 2 3 3 3 1 1

    5 1 4 6 3 5 1 1 2 1 5 6 1 1

    5 1 3 5 1 1 1 3 7 3 1 6 3 1

    2 2 5 1 3 5 5 2 3 1 1 3 6 1

    1 1 1 7 3 1 5 3 3 3 5 3 1 7

    Frequency & Relative Frequency Distributions


    Frequency Distribution: =countif(A1:A2023,1) = 1003 (BAR CHARTS)

    Relative frequency Distribution with Pivot tables (PIE CHARTS)
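    For readers working outside Excel, the COUNTIF step can be sketched in Python. The helper name countif is an assumption chosen to mirror the spreadsheet function, and the sample is only the first 15 values listed under Survey Data, not the full 2023 responses.

```python
# First 15 values listed under Survey Data (a small slice of the responses).
responses = [1, 1, 1, 1, 2, 4, 3, 5, 1, 3, 1, 3, 7, 5, 1]

# Code labels taken from the survey description.
LABELS = {
    1: "Working full time",
    2: "Working part time",
    3: "Temporarily not working",
    4: "Unemployed, laid off",
    5: "Retired",
    6: "School",
    7: "Keeping house",
    8: "Other",
}


def countif(values, criterion):
    """Count entries equal to criterion, like Excel's COUNTIF(range, criterion)."""
    return sum(1 for value in values if value == criterion)


# One COUNTIF per code gives the frequency distribution.
table = {code: countif(responses, code) for code in LABELS}
n = len(responses)

# Dividing each count by n gives the relative frequency distribution.
for code, label in LABELS.items():
    print(f"{code} {label:<24} {table[code]:>3} {table[code] / n:>6.2f}")
```

    Codes that never occur in the sample still appear in the table with a count of zero, which keeps all eight categories visible.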


    Nominal Data (Frequency)

    Bar Charts are often used to display frequencies.

    Nominal Data (Relative Frequency)

    Pie Charts show relative frequencies.


    Nominal Data

    It's all the same information (based on the same data), just a different presentation.

    Your Turn

    In-class team exercises using MS Excel (pages 29-31):

    2.21

    2.28

    2.32


    Describing the Relationship between Two Nominal Variables

    Newspaper Readership Survey

    In a major North American city there are four competing newspapers: the Post, Globe, Sun, and Star.

    To help design advertising campaigns, the advertising managers of the newspapers need to know which segments of the newspaper market are reading their papers.

    A survey was conducted to analyze the relationship between newspapers and occupation.

    A sample of newspaper readers was asked to report which newspaper they read:

    Globe (1)

    Post (2)

    Star (3)

    Sun (4)

    The readers were also asked to indicate whether they were blue-collar worker (1), white-collar worker (2), or professional (3).

    How many possible combinations of these two variables are there?


    Cross-classification table of Frequencies

    As a first step we need to produce a cross-classification table, which lists the frequency of each combination of the values of the two variables.

    By counting the number of times each of the 12 possible combinations occurs, we can produce the following cross-tabulation (cross-classification):

    Newspaper   Blue Collar   White Collar   Professional   Total
    Globe                27             29             33      89
    Post                 18             43             51     112
    Star                 38             21             22      81
    Sun                  37             15             20      72
    Total               120            108            126     354
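    The counting step can be sketched in Python as below. The ten (newspaper, occupation) pairs are invented for illustration, since the raw survey responses are not reproduced in these notes; only the code labels come from the survey description.

```python
from collections import Counter
from itertools import product

NEWSPAPERS = {1: "Globe", 2: "Post", 3: "Star", 4: "Sun"}
OCCUPATIONS = {1: "Blue Collar", 2: "White Collar", 3: "Professional"}

# Every (newspaper, occupation) pair of codes: 4 x 3 = 12 combinations.
code_pairs = list(product(NEWSPAPERS, OCCUPATIONS))

# Hypothetical raw responses as (newspaper code, occupation code) pairs.
# These ten observations are invented; the real survey has 354.
observations = [(1, 1), (2, 3), (2, 2), (3, 1), (4, 1),
                (2, 3), (1, 2), (3, 1), (2, 2), (4, 3)]

counts = Counter(observations)

# A Counter returns 0 for combinations that never occur.
crosstab = {paper: {occ: counts[(paper, occ)] for occ in OCCUPATIONS}
            for paper in NEWSPAPERS}

for paper, row in crosstab.items():
    cells = "  ".join(f"{row[occ]:>3}" for occ in OCCUPATIONS)
    print(f"{NEWSPAPERS[paper]:<6} {cells}  total {sum(row.values()):>3}")
```

    With the real data, the same counting would fill the 12 cells of the table and the row sums would give the Total column.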

    Relative Frequencies

    If occupation and newspaper are related, then there will be notable differences in the newspapers read among the occupations. An easy way to see this is to convert the frequencies in each column to relative frequencies.

    Newspaper   Blue Collar      White Collar     Professional
    Globe       27/120 = 0.23    29/108 = 0.27    33/126 = 0.26
    Post        18/120 = 0.15    43/108 = 0.40    51/126 = 0.40
    Star        38/120 = 0.32    21/108 = 0.19    22/126 = 0.17
    Sun         37/120 = 0.31    15/108 = 0.14    20/126 = 0.16
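    The column conversion can be reproduced with a few lines of Python. The frequencies are the ones in the cross-classification table; the variable names are assumptions for illustration.

```python
OCCUPATIONS = ["Blue Collar", "White Collar", "Professional"]

# Frequencies from the cross-classification table (one row per newspaper).
FREQUENCIES = {
    "Globe": [27, 29, 33],
    "Post": [18, 43, 51],
    "Star": [38, 21, 22],
    "Sun": [37, 15, 20],
}

# Column totals, one per occupation: 120, 108, 126.
column_totals = [sum(column) for column in zip(*FREQUENCIES.values())]

# Divide each cell by the total of its column.
relative = {
    paper: [count / total for count, total in zip(row, column_totals)]
    for paper, row in FREQUENCIES.items()
}

print(f"{'Newspaper':<10}" + "".join(f"{name:>14}" for name in OCCUPATIONS))
for paper, row in relative.items():
    print(f"{paper:<10}" + "".join(f"{value:>14.2f}" for value in row))
```

    Each column of relative frequencies sums to 1, so the three occupations can be compared even though their sample sizes differ.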



    Interpretation

    The relative frequencies in columns 2 and 3 are similar, but there are large differences between columns 1 and 2 and between columns 1 and 3.

    This tells us that blue collar workers tend to read different newspapers from both white collar workers and professionals, and that white collar workers and professionals are quite similar in their newspaper choice.
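    One informal way to check this reading is to find, for each pair of occupations, the largest gap between their relative frequencies. This is only an illustrative sketch and not a method from the course; the helper name largest_gap is an assumption.

```python
from itertools import combinations

# Column relative frequencies from the cross-classification table
# (order within each list: Globe, Post, Star, Sun).
PROFILES = {
    "Blue Collar": [27 / 120, 18 / 120, 38 / 120, 37 / 120],
    "White Collar": [29 / 108, 43 / 108, 21 / 108, 15 / 108],
    "Professional": [33 / 126, 51 / 126, 22 / 126, 20 / 126],
}


def largest_gap(a, b):
    """Largest absolute difference between two columns of relative frequencies."""
    return max(abs(x - y) for x, y in zip(a, b))


gaps = {
    (first, second): largest_gap(PROFILES[first], PROFILES[second])
    for first, second in combinations(PROFILES, 2)
}

for (first, second), gap in gaps.items():
    print(f"{first} vs {second}: {gap:.2f}")
```

    The gap between White Collar and Professional stays small, while both gaps involving Blue Collar are roughly a quarter of the readers, which matches the interpretation above.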


    Graphing the Relationship between 2 Nominal Variables

    [Figure: bar charts of newspaper counts (G&M, Post, Star, Sun) for each occupation (Blue collar, White collar, Professional); vertical axis from 0 to 60]

    Use the data from the cross-classification table to create bar charts.
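    The class exercise uses Excel, but a rough text version of the charts can be sketched in Python with no plotting library. The frequencies are from the cross-classification table; the function names and the scale of one '#' per two readers are assumptions for illustration.

```python
OCCUPATIONS = ["Blue Collar", "White Collar", "Professional"]

# Frequencies from the cross-classification table (one row per newspaper).
FREQUENCIES = {
    "Globe": [27, 29, 33],
    "Post": [18, 43, 51],
    "Star": [38, 21, 22],
    "Sun": [37, 15, 20],
}


def bar(count, scale=2):
    """Draw one '#' per `scale` readers (rounded down)."""
    return "#" * (count // scale)


def render_charts():
    """Return one text bar chart per occupation as a list of lines."""
    lines = []
    for column, occupation in enumerate(OCCUPATIONS):
        lines.append(occupation)
        for paper, row in FREQUENCIES.items():
            lines.append(f"  {paper:<6}{bar(row[column])} {row[column]}")
        lines.append("")  # blank line between charts
    return lines


print("\n".join(render_charts()))
```

    Printing the three charts one under another makes it easy to compare their shapes by eye.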


    Interpretation

    If the two variables are unrelated, the patterns exhibited in the bar charts should be approximately the same.

    If some relationship exists, then some bar charts will differ from others.

    The graphs tell us the same story as did the table.

    The shapes of the bar charts for occupations 2 and 3 (White-collar and Professional) are very similar.

    Both differ considerably from the bar chart for occupation 1 (Blue-collar).

    Your Turn

    In-class team exercises using MS Excel (pages 39-40):

    2.44


    Homework

    Pages 41-42:

    .

    2.52

    2.54
