Chapter 2 part1-Scatterplots

INTRODUCTION TO STATISTICS & PROBABILITY, Chapter 2: Looking at Data–Relationships (Part 1), Dr. Nahid Sultana


Transcript of Chapter 2 part1-Scatterplots

Page 1: Chapter 2 part1-Scatterplots

INTRODUCTION TO STATISTICS & PROBABILITY

Chapter 2: Looking at Data–Relationships (Part 1)

Dr. Nahid Sultana

Page 2: Chapter 2 part1-Scatterplots

Chapter 2: Looking at Data–Relationships


2.1: Scatterplots

2.2: Correlation

2.3: Least-Squares Regression

2.5: Data Analysis for Two-Way Tables

Page 3: Chapter 2 part1-Scatterplots

2.1: Scatterplots

Objectives

Bivariate data

Explanatory and response variables

Scatterplots

Interpreting scatterplots

Outliers

Categorical variables in scatterplots

Page 4: Chapter 2 part1-Scatterplots

Bivariate data

For each individual studied, we record data on two variables.

We then examine whether there is a relationship between these two variables: Do changes in one variable tend to be associated with specific changes in the other variable?

Student ID   Number of Beers   Blood Alcohol Content
1            5                 0.1
2            2                 0.03
3            9                 0.19
4            8                 0.12
5            3                 0.04
6            7                 0.095
7            3                 0.07
8            5                 0.06
9            3                 0.02
10           5                 0.05
11           4                 0.07
12           6                 0.1
13           5                 0.085
14           7                 0.09
15           1                 0.01
16           4                 0.05

Here we have two quantitative variables recorded for each of 16 students:

1. how many beers they drank

2. their resulting blood alcohol content (BAC)

Page 5: Chapter 2 part1-Scatterplots

Associations Between Variables

Many interesting examples of the use of statistics involve relationships between pairs of variables.

Two variables measured on the same cases are associated if knowing the value of one of the variables tells you something about the values of the other variable that you would not know without this information.

A response (dependent) variable measures an outcome of a study.

An explanatory (independent) variable explains changes in the response variable.
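One rough way to see what "associated" means is to check whether knowing the explanatory variable changes what we expect for the response. The sketch below uses the beer and BAC values from the table on the earlier slide, treating number of beers as explanatory and BAC as the response. The cutoff between "fewer" and "more" beers (4 versus 5) is an arbitrary choice made for illustration, not something the slides specify.

```python
from statistics import mean

# (number of beers, blood alcohol content) for students 1 to 16,
# copied from the table on the bivariate data slide.
beers_bac = [
    (5, 0.10), (2, 0.03), (9, 0.19), (8, 0.12),
    (3, 0.04), (7, 0.095), (3, 0.07), (5, 0.06),
    (3, 0.02), (5, 0.05), (4, 0.07), (6, 0.10),
    (5, 0.085), (7, 0.09), (1, 0.01), (4, 0.05),
]

# Without any information about beers, our best single guess is the overall mean.
overall_mean = mean(bac for _, bac in beers_bac)

# Assumed cutoff (illustrative only): 4 or fewer beers versus 5 or more.
fewer = [bac for beers, bac in beers_bac if beers <= 4]
more = [bac for beers, bac in beers_bac if beers >= 5]

print(f"overall mean BAC:     {overall_mean:.4f}")
print(f"mean BAC, <= 4 beers: {mean(fewer):.4f}")
print(f"mean BAC, >= 5 beers: {mean(more):.4f}")
```

If the two group means differ noticeably from the overall mean, then knowing the number of beers tells us something about BAC that we would not know otherwise, which is the sense of association defined above.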

Page 6: Chapter 2 part1-Scatterplots

Scatterplot

The most useful graph for displaying the relationship between two quantitative variables on the same individuals is a scatterplot.

How to Make a Scatterplot

1. Decide which variable should go on which axis.

2. Typically, the explanatory or independent variable is plotted on the x-axis, and the response or dependent variable is plotted on the y-axis.

3. Label and scale your axes.

4. Plot individual data values.

Page 7: Chapter 2 part1-Scatterplots

Scatterplot (Cont…)

Example: Make a scatterplot of the relationship between body weight and backpack weight for a group of hikers.

Body weight (lb)       120  187  109  103  131  165  158  116

Backpack weight (lb)    26   30   26   24   29   35   31   28
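The hiker example can be sketched in a few lines of Python. This is a minimal text-based version that needs no plotting library: it puts body weight (treated here as the explanatory variable) on the horizontal axis and backpack weight on the vertical axis, scales both to a character grid, and marks each hiker with `*`. The function name `text_scatterplot` and the grid size are illustrative choices, and two hikers that land in the same grid cell would share one marker.

```python
def text_scatterplot(x, y, width=40, height=12):
    """Return text rows plotting y (vertical) against x (horizontal)."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    # Guard against a zero range if all values of a variable are identical.
    x_range = (x_max - x_min) or 1
    y_range = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / x_range * (width - 1))
        row = round((yi - y_min) / y_range * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]


body_weight = [120, 187, 109, 103, 131, 165, 158, 116]   # explanatory, x-axis
backpack_weight = [26, 30, 26, 24, 29, 35, 31, 28]       # response, y-axis

rows = text_scatterplot(body_weight, backpack_weight)
print("Backpack weight (lb)")
for row in rows:
    print("|" + row)
print("+" + "-" * 40)
print(" Body weight (lb)")
```

In practice a plotting library would draw this more cleanly, but the steps are the same: choose the axes, scale them to the data, and plot one point per individual.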

Page 8: Chapter 2 part1-Scatterplots

Interpreting Scatterplots

How to Examine a Scatterplot

After plotting two variables on a scatterplot, we describe the overall pattern of the relationship. Specifically, we look for form, direction, and strength, and for clear deviations from that pattern.

Form: linear, curved, clusters, no pattern

Direction: positive, negative, no direction

Strength: how closely the points fit the form

Outliers: individual values that fall outside the overall pattern of the relationship

Page 9: Chapter 2 part1-Scatterplots

Interpreting Scatterplots (Cont…) (Form)

Figure panels: Linear, Nonlinear, No relationship

Page 10: Chapter 2 part1-Scatterplots


Interpreting Scatterplots (Cont…) (Direction)

Positive association: High values of one variable tend to occur together with high values of the other variable.

Negative association: High values of one variable tend to occur together with low values of the other variable.
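These two definitions can be turned into a simple count. The sketch below labels each value as "high" or "low" relative to the mean of its variable, then counts how many points pair high with high (or low with low) versus high with low. This counting rule and the function name `direction` are illustrative assumptions, not a method given in the slides, and it says nothing about form or strength.

```python
from statistics import mean


def direction(x, y):
    """Classify direction by comparing each point with the two means."""
    x_bar, y_bar = mean(x), mean(y)
    same = opposite = 0
    for xi, yi in zip(x, y):
        product = (xi - x_bar) * (yi - y_bar)
        if product > 0:      # high with high, or low with low
            same += 1
        elif product < 0:    # high with low
            opposite += 1
    if same > opposite:
        return "positive"
    if opposite > same:
        return "negative"
    return "no clear direction"


# Hiker data from the scatterplot example.
body_weight = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight = [26, 30, 26, 24, 29, 35, 31, 28]

hiker_direction = direction(body_weight, backpack_weight)
print(hiker_direction)
```

For the hikers, seven of the eight points pair a below-average body weight with a below-average pack weight, or above-average with above-average, so the count agrees with reading the plot as a positive association.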

Page 11: Chapter 2 part1-Scatterplots


Interpreting Scatterplots (Cont…)

No relationship: X and Y vary independently. Knowing X tells you nothing about Y.

Page 12: Chapter 2 part1-Scatterplots


Interpreting Scatterplots (Cont…) (Strength)

The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.

Page 13: Chapter 2 part1-Scatterplots


Interpreting Scatterplots (Cont…) (Outliers)

In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.

Page 14: Chapter 2 part1-Scatterplots

Interpreting Scatterplots (Cont…)

There is a moderately strong (strength), positive (direction), linear (form) relationship between body weight and backpack weight.

It appears that lighter hikers are carrying lighter backpacks.

There is one possible outlier: the hiker with a body weight of 187 pounds seems to be carrying relatively less weight than the other group members.
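One way to make "relatively less weight" concrete is to express each backpack as a fraction of the hiker's body weight. This ratio is an assumption about what "relatively" means here; the slides identify the outlier by eye from the scatterplot.

```python
# Hiker data from the scatterplot example.
body_weight = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight = [26, 30, 26, 24, 29, 35, 31, 28]

# Backpack weight as a fraction of body weight, one entry per hiker.
ratios = [(body, pack / body) for body, pack in zip(body_weight, backpack_weight)]

for body, ratio in sorted(ratios, key=lambda item: item[1]):
    print(f"body weight {body} lb: pack is {ratio:.1%} of body weight")

# The hiker with the smallest ratio carries the least relative to body weight.
lightest_load = min(ratios, key=lambda item: item[1])
```

The 187-pound hiker has the smallest ratio in the group by a clear margin, which matches the point that sits below the overall pattern in the plot.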

Page 15: Chapter 2 part1-Scatterplots

How to scale a scatterplot

Using an inappropriate scale for a scatterplot can give an incorrect impression. Both variables should be given a similar amount of space:

• Plot roughly square

• Points should occupy all the plot space (no blank space)

Same data in all four plots
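A simple way to make the points fill the plot is to set each axis to run from just below the smallest value to just above the largest. The helper below is a sketch of that idea; the name `axis_limits` and the 5% padding are illustrative assumptions, not values from the slides.

```python
def axis_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits that hug the data with a small margin."""
    low, high = min(values), max(values)
    pad = (high - low) * pad_fraction
    return low - pad, high + pad


# Hiker data from the scatterplot example.
body_weight = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight = [26, 30, 26, 24, 29, 35, 31, 28]

x_limits = axis_limits(body_weight)
y_limits = axis_limits(backpack_weight)
print("x-axis limits:", x_limits)
print("y-axis limits:", y_limits)
```

Starting both axes at zero instead would squeeze these points into one corner of the plot, which is the kind of scaling the slide warns against.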

Page 16: Chapter 2 part1-Scatterplots

Categorical variables in scatterplots

What may look like a positive linear relationship is in fact a series of negative linear associations. Plotting different habitats in different colors allows us to make that important distinction.

To add a categorical variable, use a different plot color or symbol for each category.
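The effect described above can be reproduced with a small made-up data set. The points, category names, and symbols below are invented for illustration and are not the habitat data behind the slide's figure. The sign of the sum of products of deviations from the means is used here as a rough indicator of direction; that choice is an assumption of this sketch.

```python
from statistics import mean

# Made-up (x, y, category) points, for illustration only.
points = [
    (1, 3, "A"), (2, 2, "A"), (3, 1, "A"),
    (4, 6, "B"), (5, 5, "B"), (6, 4, "B"),
    (7, 9, "C"), (8, 8, "C"), (9, 7, "C"),
]

# One plotting symbol per category, as the slide suggests.
symbols = {"A": "o", "B": "x", "C": "+"}


def sum_of_products(pairs):
    """Sum of (x - mean x) * (y - mean y); positive suggests a positive direction."""
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    return sum((x - x_bar) * (y - y_bar) for x, y in pairs)


# Direction when the categories are ignored.
overall = sum_of_products([(x, y) for x, y, _ in points])

# Direction within each category.
by_category = {}
for x, y, category in points:
    by_category.setdefault(category, []).append((x, y))
within = {category: sum_of_products(pairs) for category, pairs in by_category.items()}

for x, y, category in points:
    print(f"({x}, {y}) plotted with symbol {symbols[category]}")
print("overall:", overall)
print("within categories:", within)
```

Ignoring the categories gives a positive value, while every category on its own gives a negative one, so the distinction only becomes visible once each category has its own symbol or color.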

Page 17: Chapter 2 part1-Scatterplots


Categorical variables in scatterplots (Cont…)

Comparison of men's and women's racing records over time. Each group shows a very strong negative linear relationship that would not be apparent without the gender categorization.

Relationship between lean body mass and metabolic rate in men and women. Both men and women follow the same positive linear trend, but women show a stronger association.

Page 18: Chapter 2 part1-Scatterplots

Categorical explanatory variables

When the explanatory variable is categorical, you cannot make a scatterplot, but you can compare the different categories side by side on the same graph (boxplots, or mean +/− standard deviation).

Comparison of income (quantitative response variable) for different education levels (five categories).

But be careful in your interpretation: This is NOT a positive association, because education is not quantitative.
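A side-by-side comparison using mean and standard deviation can be sketched as follows. The income figures and the level labels are hypothetical, invented only to show the computation; the slide's actual data are not reproduced in this transcript.

```python
from statistics import mean, stdev

# Hypothetical incomes (in thousands) for five made-up education levels.
income_by_level = {
    "level 1": [18, 22, 25, 30],
    "level 2": [24, 28, 33, 41],
    "level 3": [30, 36, 45, 52],
    "level 4": [38, 47, 58, 70],
    "level 5": [45, 60, 75, 98],
}

# One (mean, standard deviation) pair per category, compared side by side.
summary = {
    level: (mean(incomes), stdev(incomes))
    for level, incomes in income_by_level.items()
}

for level, (average, spread) in summary.items():
    print(f"{level}: mean = {average:.1f}, SD = {spread:.1f}")
```

The categories are labels, so the output compares groups; it does not describe a direction of association in the scatterplot sense.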