Uploaded by mabel-merritt
1.4 Data in 2 Variables
Definitions
5.3 Data in 2 Variables: Visualizing Trends
• When data is collected over a long period of time, it may show trends
• Trends allow you to make predictions about future events
• Trends can occur over time, or as some other variable (e.g., mass) changes
• One effective way to visualize a trend: a scatterplot
  – Shows the joint distribution of 2 variables
Scatterplots
[Figure: scatterplot of Distance (m) vs. Time (s)]
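Independent variable
• Variable whose values are chosen or controlled (e.g., time), usually plotted on the horizontal axis
Dependent variable
• Variable whose values depend on the independent variable (e.g., distance), usually plotted on the vertical axis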
• Scatterplots can help determine whether there is a relationship in the data
  – Is there a pattern in the data?
[Figure: two scatterplots of y vs. x]
• First plot: there is a relationship
  – As x increases, y increases
• Second plot: there is no relationship
  – As x increases, y stays pretty much the same
[Figure: scatterplot of Distance (m) vs. Time (s)]
We can show the relationship using a line of best fit
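The slides do not show how a line of best fit is calculated. A minimal sketch of the standard least-squares method follows; the function names `line_of_best_fit` and `sum_of_squares` and the time/distance values are hypothetical, chosen only for illustration. The sum of squared residuals it reports is presumably the quantity labelled "Sum of squares" in the Go for the Gold! results, though the slides do not say so.

```python
# Least-squares line of best fit y = m*x + b.
# Function names and sample data are hypothetical (not from the slides).


def line_of_best_fit(xs, ys):
    """Return (m, b) that minimize the sum of squared vertical residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Spread of x, and how x and y vary together
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    m = sxy / sxx
    b = y_mean - m * x_mean  # the line passes through (x_mean, y_mean)
    return m, b


def sum_of_squares(xs, ys, m, b):
    """Sum of squared residuals (observed y minus predicted y) for y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical distance-time data
time_s = [0, 2, 4, 6, 8, 10, 12]
distance_m = [0.5, 4.8, 9.9, 14.6, 20.3, 24.7, 29.8]

m, b = line_of_best_fit(time_s, distance_m)
print(f"distance = {m:.3f} * time + {b:.3f}")
print(f"sum of squares = {sum_of_squares(time_s, distance_m, m, b):.4f}")
```

For a given data set, no other straight line has a smaller sum of squares than the least-squares line, which is what "best fit" means here.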
[Figure: scatterplot of Distance (m) vs. Time (s)]
If the data is nonlinear, we use a curve of best fit
[Figure: scatterplot of Distance (m) vs. Time (s)]
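The slides do not say which kind of curve is fitted. As one common choice (an assumption here), a quadratic y = a*x^2 + b*x + c can be fitted by least squares by solving three "normal equations". The sketch below does this in plain Python; the function name `quadratic_best_fit` and the sample data are hypothetical.

```python
# Least-squares quadratic curve of best fit y = a*x^2 + b*x + c.
# Function name and sample data are hypothetical (not from the slides).


def quadratic_best_fit(xs, ys):
    """Fit y = a*x**2 + b*x + c by least squares; return (a, b, c)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    # Power sums: s[k] = sum of x**k, t[k] = sum of y * x**k
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix of the normal equations for the unknowns (c, b, a)
    rows = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, 4):
                rows[r][k] -= factor * rows[col][k]
    # Back substitution
    coeffs = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        rest = sum(rows[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (rows[i][3] - rest) / rows[i][i]
    c, b, a = coeffs
    return a, b, c


# Hypothetical nonlinear distance-time data
time_s = [0, 2, 4, 6, 8, 10, 12]
distance_m = [0.0, 7.1, 27.5, 61.8, 108.9, 170.3, 244.6]

a, b, c = quadratic_best_fit(time_s, distance_m)
print(f"distance = {a:.3f} t^2 + {b:.3f} t + {c:.3f}")
```

Other curve shapes (exponential, higher-degree polynomial) are fitted by the same least-squares idea with different equations.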
Correlation
• A measure of the direction and strength of the apparent relationship between two variables
• Look at the trend (upward/downward/horizontal)
  – Positive/negative/no correlation
• Look at how closely the points fit the curve of best fit
  – Strong/moderate/weak correlation
• Note: direction (trend) and strength (fit) are independent; e.g., a correlation can be strong and negative, or weak and positive
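The slides judge correlation by eye. A common numerical measure for linear relationships, not named on the slides, is the Pearson correlation coefficient r: its sign gives the direction (trend) and its absolute value, from 0 to 1, gives the strength (fit). Because it measures only how well a straight line fits, it is not suited to a curve of best fit. A minimal sketch; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data (between -1 and 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined if all x values or all y values are equal")
    return sxy / math.sqrt(sxx * syy)


# Sign of r gives the direction; the size of |r| gives the strength.
print(pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))  # close to +1
print(pearson_r([1, 2, 3, 4, 5], [9.8, 8.1, 5.9, 4.2, 2.0]))   # close to -1
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.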
Classifying Linear Relationships
[Figures: one scatterplot of y vs. x for each case below]
• Strong positive correlation
  – Positive slope
  – Tightly clustered to line of best fit
• Strong negative correlation
  – Negative slope
  – Tightly clustered to line of best fit
• No correlation
  – Slope of about 0
  – Randomly scattered
• Moderate positive correlation
• Weak positive correlation
• Weak negative correlation
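The categories above can be attached to a computed correlation coefficient r. The cut-offs in this sketch (|r| of at least 0.67 for strong, at least 0.33 for moderate, and below 0.1 treated as no correlation) are assumptions for illustration only: the slides give no numbers, and textbooks draw these boundaries in different places. The function name `classify_correlation` is hypothetical.

```python
def classify_correlation(r, strong=0.67, moderate=0.33, none_below=0.1):
    """Describe r in words; the default cut-offs are assumptions, not a standard."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < none_below:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


for r in (0.92, -0.85, 0.5, 0.2, -0.2, 0.03):
    print(r, classify_correlation(r))
```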
Warning!!!
• Correlation does not necessarily mean causation
• Just because there is a relationship between A and B does not mean A causes B
  – More on this next day
Using trends for predictions
• Use the equation of the line of best fit
• Extrapolation
  – Estimation of a value outside the range of the known data
• Interpolation
  – Estimation of a value between two known values
[Figure: line of best fit y = mx + b on x and y axes, with regions labelled Extrapolation and Interpolation]
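Both kinds of estimate use the same equation y = mx + b; what differs is whether x lies inside or outside the range of the data used to build the line. A minimal sketch; the function names and the line y = 2.5x + 0.3 are hypothetical, and treating the endpoints of the data range as interpolation is a convention chosen for this sketch.

```python
def predict(m, b, x):
    """Evaluate the line of best fit y = m*x + b at x."""
    return m * x + b


def estimate_type(x, known_xs):
    """'interpolation' if x lies within the range of the known x values, else 'extrapolation'."""
    low, high = min(known_xs), max(known_xs)
    return "interpolation" if low <= x <= high else "extrapolation"


# Hypothetical line of best fit and data range (not from the slides)
m, b = 2.5, 0.3
known_times = [0, 2, 4, 6, 8, 10, 12]

for t in (5, 20):
    print(f"t = {t}: y = {predict(m, b, t):.1f} ({estimate_type(t, known_times)})")
```

Extrapolated values assume the trend continues beyond the data, so they are generally less reliable than interpolated ones.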
Go to “Go For the Gold!”
Go for the Gold! Line of Best Fit: Men
• Mensdistance = 0.016 Year - 24.04
• Sum of squares = 0.8308
• Slope is 0.016: change in distance over time (in years)
  – Every year, the distance should increase by 1.6 cm (0.016 m)
• Y-intercept is -24.04
  – In year zero, they jumped backwards!?
  – Meaningless
Go for the Gold! Line of Best Fit: Women
• Womensdistance = 0.021 Year - 35
• Sum of squares = 0.7447
• Slope is 0.021
  – Every year, the winning women’s distance should increase by 2.1 cm
• Y-intercept is -35
  – Meaningless for this case
• In 2008, the winning men’s distance should be 8.85 m and the winning women’s distance 7.17 m (actual winning distances: 8.34 m and 7.04 m)
• In 2012?
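The predictions come from substituting the year into each equation. The sketch below does this with the coefficients exactly as printed on the slides; the names `MEN`, `WOMEN` and `predicted_distance` are hypothetical. With these rounded values, the women's 2008 prediction reproduces the slide's 7.17 m (0.021 × 2008 - 35 = 7.168), but the men's comes out at about 8.09 m (0.016 × 2008 - 24.04 = 8.088) rather than 8.85 m. Because the slope is multiplied by a year near 2000, rounding it to three decimals can shift a prediction by up to about 1 m (0.0005 × 2008 ≈ 1.0), so the slide's 8.85 m was presumably computed from unrounded coefficients; that is an inference, not something the slides state.

```python
# Lines of best fit from the slides, coefficients as printed (rounded).
# Distances are in metres; "year" is the calendar year.
MEN = (0.016, -24.04)   # (slope, intercept)
WOMEN = (0.021, -35.0)


def predicted_distance(line, year):
    """Evaluate distance = slope * year + intercept."""
    slope, intercept = line
    return slope * year + intercept


for year in (2008, 2012):
    men = predicted_distance(MEN, year)
    women = predicted_distance(WOMEN, year)
    print(f"{year}: men {men:.2f} m, women {women:.2f} m")
```

Both 2008 and 2012 lie after the data used to fit the lines, so these are extrapolations.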