Keep your Data Science Efforts from Derailing

Sean Murphy - @sayhitosean, Marck Vaisman - @wahalulu
Data Community DC - @DataCommunityDC
Additional thanks to Harlan Harris - @HarlanH


Transcript of Keep your Data Science Efforts from Derailing

Page 1: Keep your  Data Science Efforts from Derailing

Keep your Data Science Efforts from Derailing

Sean Murphy - @sayhitosean

Marck Vaisman - @wahalulu

Data Community DC - @DataCommunityDC

Additional thanks to Harlan Harris - @HarlanH

Page 2: Keep your  Data Science Efforts from Derailing

Background and Motivations

Writing the chapter for The Bad Data Handbook

Lack of clarity in the field on goals, skills, roles, career paths

Starting Data Community DC, understanding our membership base

Page 3: Keep your  Data Science Efforts from Derailing

I) Know nothing about thy data

Know your data

Time spent up front is time well spent

Over 80% of time is spent cleaning data

Understand your data assets:

- How was it collected/generated?

- Where does it live?

- How is it formatted? Is formatting consistent?

- How is it stored?

- Are there missing values? If so, which ones, why?

- Where/how can you process it?

- Are there duplicated values, codes?
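A few of the questions above (missing values, duplicates, inconsistent formatting) can be checked with a short script before any analysis starts. The sketch below is one possible way to do it using only the Python standard library. The sample data, the column names, and the helper names `profile_rows`, `format_signature`, and `column_formats` are all hypothetical and exist only for illustration.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample data; column names and values are illustrative only.
SAMPLE_CSV = """customer_id,signup_date,state
101,2013-01-15,DC
102,01/20/2013,VA
103,,MD
101,2013-01-15,DC
104,2013-02-02,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def profile_rows(rows):
    """Count rows, empty cells per column, and exact duplicate rows."""
    missing = Counter()  # column name -> number of empty cells
    seen = Counter()     # full row as a tuple -> number of occurrences
    for row in rows:
        for column, value in row.items():
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[column] += 1
        seen[tuple(row.items())] += 1
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicate_rows": duplicates,
    }


def format_signature(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def column_formats(rows, column):
    """Count the distinct shapes seen in one column, skipping empty cells."""
    return Counter(format_signature(row[column]) for row in rows if row[column])


rows = load_rows(SAMPLE_CSV)
report = profile_rows(rows)
formats = column_formats(rows, "signup_date")
print(report)
print(formats)
```

More than one shape in `formats` suggests the column mixes formats, which is worth knowing before parsing dates. Real data sets would likely need per-column rules for what counts as missing (for example "N/A" or -1), which this sketch does not attempt.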

Page 4: Keep your  Data Science Efforts from Derailing

II) Thou shalt provide data scientists with one tool for all tasks

Provide and configure the right tools for the job

This is not a one-size-fits-all process

Production or R&D/ad-hoc?

Many tools, sources

- Databases (traditional, NoSQL)

- Legacy systems, Data Warehouses

- Flat files

- Analytics machine(s)

- Distributed/cloud computing (HDFS, S3)

- Open Source Software, libraries

Provide access and certain liberties (at least within R&D)

Consider security and privacy issues

Find a partner within your IT organization

Page 5: Keep your  Data Science Efforts from Derailing

III) Thou shalt analyze for analysis’ sake only

Begin with the end in mind

Analysis for analysis’ sake is pointless

Lots of data or big data != Data Science or Value

Open-ended exploration or solving a specific problem?

Focus on what is actionable

Avoid analysis paralysis

How prepared are you?

- You don’t even know where to begin

- You have an idea of what you have, no previous analysis

- You know what you have, no previous analysis

- You know what you have, tried solving specific problems

Think broad: marketing, finance, operations, HR, product, etc.

Page 6: Keep your  Data Science Efforts from Derailing

IV) Thou shalt compartmentalize learnings

Share your learnings

Share

Break down silos

Doesn’t have to be complicated

Avoid duplicated efforts

Page 7: Keep your  Data Science Efforts from Derailing

V) Thou shalt expect omnipotence from data scientists

Get the right people for the job, and value their specific skills

Miscommunication leads to lost opportunities:

- excessive hype leads people to expect miracles, and miracle-workers

- a lack of awareness of the variety of data scientists leads organizations to wasted effort when trying to find talent

Page 8: Keep your  Data Science Efforts from Derailing

www.DataCommunityDC.org

1. Data Science DC (1808 members)

2. Data Business DC (369 members)

3. Data Visualization DC (329 members)

4. R Users DC (1133 members)

Page 9: Keep your  Data Science Efforts from Derailing

Greater than 250 completed surveys …

Page 10: Keep your  Data Science Efforts from Derailing
Page 11: Keep your  Data Science Efforts from Derailing

Skills

Self-Identification

Experiences

Education

Web Presence

Page 12: Keep your  Data Science Efforts from Derailing

On a scale of 1 to 10, how good are you at Math?

Page 13: Keep your  Data Science Efforts from Derailing

Self Ranked Skills

Page 14: Keep your  Data Science Efforts from Derailing

Self Ranked Skills

Page 15: Keep your  Data Science Efforts from Derailing

Self Identification

Page 16: Keep your  Data Science Efforts from Derailing

Self Identification

Page 17: Keep your  Data Science Efforts from Derailing
Page 18: Keep your  Data Science Efforts from Derailing

Data Business Person

Page 19: Keep your  Data Science Efforts from Derailing

Data Creative

Page 20: Keep your  Data Science Efforts from Derailing

Data Developer

Page 21: Keep your  Data Science Efforts from Derailing

Data Researcher

Page 22: Keep your  Data Science Efforts from Derailing

Why bother?

Page 23: Keep your  Data Science Efforts from Derailing

Awareness

[Figure: Number of Subspecialty Certificates Issued by ABMS Member Boards. x-axis: year, 1940 to 2020; y-axis: 0 to 160]

Page 24: Keep your  Data Science Efforts from Derailing

Common Language

Data Business Person

Data Creative

Data Developer

Data Researcher

Page 25: Keep your  Data Science Efforts from Derailing

Efficiency

• Do you write code that is deployed in operational systems?

• Have you ever contributed to an open source project or open data initiative?

• Why are frequentists wrong?

• What does SWOT stand for?

Page 26: Keep your  Data Science Efforts from Derailing

survey.datacommunitydc.org


Thank You!