1 A Rorschach Test. S. Stanley Young, NISS Jessie Q. Xia, NISS Banff, Canada Dec 15, 2011 Variable...


Page 1: 1 A Rorschach Test. S. Stanley Young, NISS Jessie Q. Xia, NISS Banff, Canada Dec 15, 2011 Variable Importance in Environmental Studies.


A Rorschach Test


S. Stanley Young, NISS
Jessie Q. Xia, NISS

Banff, Canada
Dec 15, 2011

Variable Importance in Environmental Studies


Current Challenges in Statistical Learning

1. Statistical methods

2. Data quality

3. Invalid claims

a. Multiple testing

b. Multiple modeling

c. Bias


Great Smog of '52, or the Big Smoke: 12,000 estimated deaths


Pope et al. 2009


Studied Variables

Life expectancy (life-table methods)

Per capita income (in thousands of $)

Lung Cancer (Age standardized death rate)

COPD (Age standardized death rate)

High-school graduates (proportion of population)

PM2.5 (μg/m³)

Black population (proportion of population)

Population (in hundreds of thousands)

5-Year in-migration (proportion of population)

Hispanic population (proportion of population)

Urban residence (proportion of population)


First Analysis, Regression

Variable       SS First   SS Last
Income             31.8      15.6
Lung Cancer        22.4       5.1
COPD               21.5       4.1
High School        15.9       0.0
Population          9.4       5.2
PM2.5               9.4       5.8
Hispanic            4.3       2.4
Black               3.1       1.7
Urban               1.4       0.8
Migration           0.0       0.8
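The slide does not define its two columns. One common reading, assumed here, is that "SS First" is the sum of squares a variable explains when it is the only predictor, and "SS Last" is what it adds once every other variable is already in the model. The sketch below illustrates that reading on synthetic data; the names `income`, `pm25` and `life` are invented for the example and are not the study data.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def residual_ss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(design, y))

def ss_first_last(columns, y, j):
    """Sum of squares explained by column j entered first, and entered last."""
    total = residual_ss([], y)  # intercept-only model
    ss_first = total - residual_ss([columns[j]], y)
    others = [c for k, c in enumerate(columns) if k != j]
    ss_last = residual_ss(others, y) - residual_ss(columns, y)
    return ss_first, ss_last

# Synthetic, correlated predictors (assumption: not the study data).
random.seed(0)
n = 200
income = [random.gauss(0, 1) for _ in range(n)]
pm25 = [-0.6 * x + 0.8 * random.gauss(0, 1) for x in income]
life = [2.0 * x - 0.3 * z + random.gauss(0, 1) for x, z in zip(income, pm25)]

for name, j in (("income", 0), ("pm25", 1)):
    first, last = ss_first_last([income, pm25], life, j)
    print(f"{name}: SS first = {first:.1f}, SS last = {last:.1f}")
```

When predictors are correlated, the two numbers can differ sharply for the same variable, which is why the order of entry matters when reading a table like this one.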


Recursive Partitioning
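The slide's figure did not survive in this transcript. As a reminder of the general idea only, and not the authors' software or settings, recursive partitioning repeatedly picks the variable and cut point that most reduce the within-group sum of squares of the response, then repeats the search inside each resulting group. The sketch below shows one such step on a tiny invented data set.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, y):
    """Return (reduction, variable index, threshold) for the best single split."""
    parent = sse(y)
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        # Try every observed value except the largest as a cut point.
        for threshold in sorted({r[j] for r in rows})[:-1]:
            left = [yi for r, yi in zip(rows, y) if r[j] <= threshold]
            right = [yi for r, yi in zip(rows, y) if r[j] > threshold]
            reduction = parent - sse(left) - sse(right)
            if reduction > best[0]:
                best = (reduction, j, threshold)
    return best

# Toy data, invented for illustration: columns are (income, pm25).
rows = [(20, 15), (22, 9), (25, 14), (30, 10), (35, 16), (40, 11)]
life = [74.0, 74.5, 75.0, 77.0, 77.5, 78.5]

reduction, variable, threshold = best_split(rows, life)
print(variable, threshold, round(reduction, 2))
```

Summing each variable's reductions over all the splits of a grown tree is one simple way to score variable importance; other definitions exist.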


Variable Importance

Variable       Regression   RP
Income         0.3390       0.2108
COPD           0.1621       0.1199
Lung Cancer    0.1768       0.1467
PM2.5          0.0732       0.1302
High School    0.0997       0.1066
%Black         0.0537       0.0319
Pop Density    0.0418       0.0793
%Hispanic      0.0177       0.0136
Migration      0.0228       0.0202
Urban          0.0133       0.0105


East versus West

Krewski et al. 2000 Health Effects In.

Enstrom 2005 Inhalation Toxicology

Bell et al. 2007 Env Health Pers

Smith et al. 2009 Inhalation Toxicology

Jerrett 2010 CARB workshop


Fine particles and Mortality

Pope co-author, 2000.


Ozone and Mortality


Variable Importance

Regression Recursive Partitioning


Longevity versus PM2.5

East: gray; West: red


Longevity versus Income


Hans Rosling's 200 Countries, 200 Years

http://www.youtube.com/watch?v=jbkSRLYSojo


Summary to this point

Income is very important.

PM2.5 is 4th or 5th in importance.

PM2.5 is not important in the West.

Pope knew or should have known about the East/West heterogeneity.


E1: Breakfast cereal and boy babies


P-value plot
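The plot itself is missing from this transcript. In the usual construction, assumed here, the p-values from all the tests are sorted and plotted against their rank; if nothing is going on they are roughly uniform and fall near a 45-degree line, while real effects show up as a cluster of small p-values below it. A minimal sketch on simulated null p-values, printing a summary instead of drawing:

```python
import random

def p_value_plot_points(p_values):
    """Pair each sorted p-value with its expected value under a uniform null, rank / (n + 1)."""
    ordered = sorted(p_values)
    n = len(ordered)
    return [(rank / (n + 1), p) for rank, p in enumerate(ordered, start=1)]

# Simulated p-values from 50 tests with no real effect (assumption: illustration only).
random.seed(1)
null_p = [random.random() for _ in range(50)]

points = p_value_plot_points(null_p)
max_gap = max(abs(expected - observed) for expected, observed in points)
print(f"largest departure from the 45-degree line: {max_gap:.3f}")
```

Passing `points` to any plotting tool as (x, y) pairs gives the picture; the function name and the `rank / (n + 1)` plotting position are choices made for this sketch.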


E2: Peto, NEJM, statins and cancer

Hypothesis: The SEAS trial has raised the hypothesis that adding ezetimibe to statin therapy might increase the incidence of cancer.


The claim fails to replicate.

The confidence interval for the relative risk is wide (95% CI, 1.13 to 2.12; 99% CI, 1.02 to 2.33; uncorrected P = 0.006, before any allowance is made for this being the hypothesis-generating result). NB: 16 x 0.006 = 0.096.
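The NB is a Bonferroni-style allowance for multiplicity: the uncorrected p-value is multiplied by the number of comparisons. The count of 16 is taken from the slide, which does not say what the 16 comparisons were. A small sketch of the arithmetic (the function name is ours):

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests and cap at 1."""
    return min(1.0, p_value * n_tests)

adjusted = bonferroni(0.006, 16)
print(f"adjusted p = {adjusted:.3f}")
print("significant at 0.05:", adjusted < 0.05)
```

After the adjustment the result no longer clears the conventional 0.05 threshold, which is consistent with the claim failing to replicate.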

SEAS New Studies


E3: A multiple testing and modeling train wreck

1. 275 chemicals

2. 32 medical outcomes

3. 10 demographic covariates

275 x 32 = 8,800 chemical-outcome pairs; 8,800 x 2^10 = 9,011,200, or ~9 million possible models
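The count can be checked directly. The sketch assumes, as the 2^10 factor suggests, that each of the 10 covariates can independently be included in or left out of the adjustment set; the variable names are ours.

```python
chemicals = 275
outcomes = 32
covariates = 10

questions = chemicals * outcomes      # one question per chemical-outcome pair
adjustment_sets = 2 ** covariates     # each covariate is either in or out of the model
models = questions * adjustment_sets

print(f"{questions:,} questions x {adjustment_sets:,} covariate subsets = {models:,} models")
```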

A CDC “systems” train wreck in progress!

JAMA


E4: Bias Example: Lancet DAD study

Author Interpretation

There exists an increased risk of myocardial infarction in patients exposed to abacavir and didanosine within the preceding 6 months.


First drug use (Text, page 1422, and Table 3)


E4 : BMJ versus JAMA (1)

Conclusion: The risk of oesophageal cancer increased with 10 or more prescriptions for oral bisphosphonates and with prescriptions over about a five year period.

BMJ 2010; 341:c4444


E4: BMJ versus JAMA (2)

Conclusion: The use of oral bisphosphonates was not significantly associated with incident esophageal or gastric cancer.

JAMA 2010; 304(6): 657-663


A Rorschach Test

With large, complex data sets, there is enough flexibility to get what you want/need.


Consumer Wishes

Honest science

Valid claims

Claims in context

+ and – of data and methods


What do we have? (Deming)

A systems failure.

Essentially no process control.

Journals operating by “quality by inspection”.

Workers are happy.

Management failure.


What to do?

Funding agencies need to require data access on publication.

Editors need to give up quality by inspection, require a split-sample strategy, and require a statement of the number of claims at issue.
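A split-sample strategy, in its simplest form, sets part of the data aside before any analysis: hypotheses are generated freely on one part, and the few claims that emerge are tested once on the untouched part. The sketch below shows only the splitting step; the function name, the 50/50 default and the fixed seed are illustrative assumptions, not a prescription from the talk.

```python
import random

def split_sample(records, holdout_fraction=0.5, seed=0):
    """Randomly divide records into a training part and a holdout part."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

# Placeholder records standing in for rows of a real data set.
records = list(range(100))
train, holdout = split_sample(records)
print(len(train), len(holdout))
```

Fixing the seed and recording it makes the split reproducible, so a reader with data access can confirm that the holdout part was not used for exploration.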


Statisticians

Eventually society will figure it out;

Scientific claims are (most) often wrong.

Essentially all claims are supported by statistics.

Society will ask, “Where were the statisticians?”


Contact

Stan Young

www.niss.org


919 685 9328