Re-Expressing Data. Scatter Plot of: Weight of Vehicle vs. Fuel Efficiency Residual Plot of: Weight...

Re-Expressing Data


Page 1: Re-Expressing Data. Scatter Plot of: Weight of Vehicle vs. Fuel Efficiency Residual Plot of: Weight of Vehicle vs. Fuel Efficiency.

Re-Expressing Data

Page 2:

Scatter Plot of: Weight of Vehicle vs. Fuel Efficiency

Residual Plot of: Weight of Vehicle vs. Fuel Efficiency

Page 3:

Re-Expression!!

Page 4:

Goal 1: Make the distribution more symmetric.

Allows use of the 68-95-99.7 Rule when the re-expressed distribution is also unimodal and roughly Normal.

Example: take the log of the x variable.
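A minimal sketch of Goal 1, using made-up right-skewed values (not the vehicle data from the slides). The `skewness` helper is a hypothetical name for the usual moment-based skewness; a value near 0 suggests a symmetric distribution.

```python
import math

def skewness(values):
    """Moment-based skewness: about 0 for symmetric data, positive for a right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up values for illustration: each one is double the previous one.
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]
logged = [math.log10(v) for v in raw]

print("skewness of raw values:   ", skewness(raw))
print("skewness of logged values:", skewness(logged))
```

The raw values have a long right tail, while their logs are evenly spaced, so the second skewness should come out essentially zero.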

Page 5:

Goal 2: Make the spread of several groups more alike, even if their centers are different.

Groups with a common spread are easier to compare.

Example: take the log of x.
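A small sketch of Goal 2 under stated assumptions: the three groups below are invented so that each is ten times the scale of the previous one. The standard deviation is used here as the measure of spread; that choice is an assumption, not something the slides specify.

```python
import math
import statistics

# Made-up groups for illustration: same shape, very different scales.
groups = {
    "A": [1, 2, 4],
    "B": [10, 20, 40],
    "C": [100, 200, 400],
}

# Spread of each group before and after taking logs.
raw_spread = {name: statistics.pstdev(vals) for name, vals in groups.items()}
log_spread = {
    name: statistics.pstdev([math.log10(v) for v in vals])
    for name, vals in groups.items()
}

for name in groups:
    print(name, "raw spread:", raw_spread[name], "log spread:", log_spread[name])
```

On the raw scale the spreads differ by a factor of about 100 between groups A and C. After the log re-expression the centers still differ, but the spreads are essentially the same.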

Page 6:

Goal 3: Make the scatterplot more nearly linear.

Linear relationships are easier to model.

Example: take the log of x.
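One way to check Goal 3 is to compare the correlation before and after re-expression. The sketch below uses made-up data in which y rises by a fixed amount each time x is multiplied by 10; `correlation` is a hypothetical helper implementing the usual Pearson formula.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration: a curved relationship on the raw scale.
x = [1, 10, 100, 1000, 10000]
y = [3, 5, 7, 9, 11]

log_x = [math.log10(v) for v in x]

r_raw = correlation(x, y)      # curved form, so well below 1
r_log = correlation(log_x, y)  # straight after taking log of x

print("r using x:     ", r_raw)
print("r using log(x):", r_log)
```

A correlation close to 1 does not prove the form is straight, so a scatterplot and a residual plot of the re-expressed data are still worth checking.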

Page 7:

Goal 4: Make the scatter in the scatterplot spread out evenly rather than thick at one end or the other.

Example: take the log of x.
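A rough sketch of Goal 4 with invented data. It assumes the scatter is proportional to the size of the value (each y is x times 0.8 or 1.25) and, as a further assumption beyond the slide, takes logs of both variables. `half_ratio` is a hypothetical helper that compares the average deviation in the upper half of the data with the lower half.

```python
import math

# Made-up data: y wobbles around x by a fixed percentage, not a fixed amount.
x = [1, 2, 4, 8, 16, 32, 64, 128]
factors = [0.8, 1.25] * 4
y = [xi * f for xi, f in zip(x, factors)]

# Size of each deviation from the assumed trend y = x, on both scales.
raw_dev = [abs(yi - xi) for xi, yi in zip(x, y)]
log_dev = [abs(math.log10(yi) - math.log10(xi)) for xi, yi in zip(x, y)]

def half_ratio(devs):
    """Average deviation in the upper half divided by that in the lower half."""
    half = len(devs) // 2
    low = sum(devs[:half]) / half
    high = sum(devs[half:]) / (len(devs) - half)
    return high / low

print("raw scale, upper/lower scatter:", half_ratio(raw_dev))
print("log scale, upper/lower scatter:", half_ratio(log_dev))
```

A ratio far above 1 means the plot thickens at the high end; a ratio near 1 means the scatter is about even across the plot.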

Page 8:

Ladder of Powers

There is a family of simple re-expressions that move data toward our goals in a consistent way. This collection of re-expressions is called the Ladder of Powers.

The Ladder of Powers orders the effects that the re-expressions have on data.

Page 9:

Ladder of Powers

Power | Name | Comment
2 | Square of data values | Try with unimodal distributions that are skewed to the left.
1 | Raw data | Data with positive and negative values and no bounds are less likely to benefit from re-expression.
½ | Square root of data values | Counts often benefit from a square root re-expression.
"0" | We'll use logarithms here | Measurements that cannot be negative often benefit from a log re-expression.
-1/2 | Reciprocal square root | An uncommon re-expression, but sometimes useful.
-1 | The reciprocal of the data | Ratios of two quantities (e.g., mph) often benefit from a reciprocal.
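The ladder can be sketched as a single function that takes a power, with the "0" rung standing in for the logarithm. The names `re_express`, `skewness`, and `LADDER` are hypothetical, the data are made up, and picking the rung with skewness closest to zero is just one possible way to compare rungs, not a rule from the slides.

```python
import math

def re_express(value, power):
    """Apply one rung of the ladder to a positive value (power 0 means log)."""
    if power == 0:
        return math.log10(value)
    return value ** power

def skewness(values):
    """Moment-based skewness: about 0 for symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Rungs of the ladder, from the top (square) down to the reciprocal.
LADDER = [2, 1, 0.5, 0, -0.5, -1]

# Made-up right-skewed values for illustration.
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]

skew_by_power = {p: skewness([re_express(v, p) for v in raw]) for p in LADDER}
best_power = min(LADDER, key=lambda p: abs(skew_by_power[p]))

for p in LADDER:
    print("power", p, "skewness", skew_by_power[p])
print("most symmetric rung:", best_power)
```

For these particular values the log rung gives the most symmetric result. With real data it is worth looking at a histogram for each rung rather than relying on one summary number.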

Page 10:

Plan B: Attack of the Logarithms

We seek a “useful” model, not perfection!

Page 11:

Why Not Just Use a Curve?

If there’s a curve in the scatterplot, why not just fit a curve to the data?

Page 12:

Why Not Just Use a Curve? (cont.)

The mathematics and calculations for “curves of best fit” are considerably more difficult than “lines of best fit.”

Besides, straight lines are easy to understand. We know how to think about the slope and the y-intercept.
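To illustrate why a line on re-expressed data is convenient, here is a minimal least-squares sketch. The data are made up, and `fit_line` and `predict` are hypothetical helper names; the fit uses the standard formulas for the slope and intercept of a line of best fit.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data for illustration: y drops by 10 each time x is multiplied by 10.
x = [1, 10, 100, 1000]
y = [50, 40, 30, 20]

# Fit a straight line to y against log10(x) instead of fitting a curve to x.
log_x = [math.log10(v) for v in x]
slope, intercept = fit_line(log_x, y)

def predict(x_value):
    """Predict y for an original-scale x by re-expressing it first."""
    return intercept + slope * math.log10(x_value)

print("slope:", slope, "intercept:", intercept)
print("predicted y at x = 100:", predict(100))
```

The slope and intercept keep their familiar reading, just on the re-expressed scale: the slope is the change in y per unit of log10(x), that is, per tenfold increase in x, and the intercept is the predicted y where log10(x) is 0, that is, at x = 1.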