CENDI wilbanks

description

Talk given to the meeting of the CENDI group in early November 2013. CENDI is a volunteer-powered membership organization that serves the federal information community - that is, all those who create, manage, aggregate, organize, and provide access to federally-funded data and publications resulting from the nation’s $150 billion annual investment in federal R&D. Member organizations represent a cross-section of federal data and publication providers, including libraries, data centers, aggregators, information technology developers, and content management providers.

Transcript of CENDI wilbanks

Page 1: CENDI wilbanks
Page 2: CENDI wilbanks

1. the policy environment. it is not sufficient.

Page 5: CENDI wilbanks

http://www.systemswiki.org/images/8/8a/Wisdom.png

Page 13: CENDI wilbanks

“is it open?” is perhaps not the right frame.

Page 14: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

Page 16: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

EASY TO USE

NO OPEN LICENSE

Page 20: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

NO OPEN LICENSE

DOWNLOAD AVAILABLE

DOCUMENTATION IN PDF

Page 21: CENDI wilbanks

2. doing research in the open: early returns. it is not sufficient.

Page 22: CENDI wilbanks

“how accurately can we predict if a female breast cancer survivor will develop a second tumor?”

Page 23: CENDI wilbanks

may the best (statistical) model win

Page 24: CENDI wilbanks

code sharing a prerequisite.

Page 26: CENDI wilbanks

accuracy of model jumped three orders of magnitude in nine days.

Page 27: CENDI wilbanks

76% accurate.

Page 28: CENDI wilbanks

(not a biologist)

Page 29: CENDI wilbanks

21 february 2013

17 april 2013

ongoing...

Page 33: CENDI wilbanks

SHOW ME THE CODE!

Page 35: CENDI wilbanks

...

Page 36: CENDI wilbanks

...

Page 37: CENDI wilbanks

...

Page 38: CENDI wilbanks

...

Page 39: CENDI wilbanks

...

Page 43: CENDI wilbanks

if we don’t have the article in machinable form with rights to transform? doesn’t happen.

Page 44: CENDI wilbanks

can we predict clinical utility from genetics of arthritis?

Page 45: CENDI wilbanks

can we predict scores on Alzheimer’s cognitive tests from existing data?

Page 48: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

THREE OPTIONS TO DOWNLOAD

NO CLEAR LICENSE

PRIVACY RESTRICTIONS

METADATA

Page 49: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

IMPACT OF PRIVATE INTERVENTION

Page 50: CENDI wilbanks

68 core projects

Page 51: CENDI wilbanks

248 researchers

Page 52: CENDI wilbanks

28 institutions

Page 53: CENDI wilbanks

1070 datasets

Page 54: CENDI wilbanks

1723 results

Page 55: CENDI wilbanks

Omberg et al., Nature Genetics

Page 56: CENDI wilbanks

colorectal cancer subtyping

Page 57: CENDI wilbanks

[figure: diagram with nodes A-F and 1-6; labels: datasets, subtypes, analysis groups]

Page 58: CENDI wilbanks

[figure: diagram with nodes A-F and 1-6, plus "G ..."; labels: datasets, analysis groups, subtypes]

Page 59: CENDI wilbanks

analysis groups

G

Page 60: CENDI wilbanks

[figure: diagram with nodes A-F and 1-6, plus "G ..."; labels: datasets, analysis groups, subtypes]

Page 61: CENDI wilbanks

3. research and culture are on a collision course, driven by data.

Page 62: CENDI wilbanks

tension between anonymity and utility.

Page 63: CENDI wilbanks

“more like plutonium than gold”

Page 64: CENDI wilbanks

tension between expectation and reuse.

Page 65: CENDI wilbanks

68% want their data shared for science

Page 66: CENDI wilbanks

tension between value of individual and value of aggregate.

Page 68: CENDI wilbanks

$.50 to $2.50 for SSN, birthdate, etc.

Page 69: CENDI wilbanks

$5 to $15 for credit, background checks.

Page 70: CENDI wilbanks

~40 records for $2100

Page 71: CENDI wilbanks

tension between “research” data and “consumer” data.

Page 74: CENDI wilbanks

https://www.scienceexchange.com/

Page 79: CENDI wilbanks

it’s likely that we will end up with a data network effect of some sort.

Page 80: CENDI wilbanks

a. the incremental institution.

Page 81: CENDI wilbanks

b. the walled garden.

Page 82: CENDI wilbanks

c. big networks of small things.

Page 84: CENDI wilbanks

thank you !

@wilbanks