Data Science from 3,209 Feet John Chandler University of Montana and Ars Quanta.

Data Science from 3,209 Feet

John Chandler, University of Montana and Ars Quanta

A Data Scientist Toolkit

• A scripting language (Python, C#, Java, Perl)
• A statistical computing language (R, SAS, SPSS)
• Database languages/environments (MSSQL, Oracle, Postgres, sqlite)
• Distributed computing environment (MapReduce, in many flavors)

Fundamentally we are flipping bits, but this isn’t software development.

CRISP-DM, Shearer, 2000

Tools for data preparation

• A scripting language (Python, C#, Java)
• A statistical computing language (R, SAS, SPSS)
• Database languages/environments (MSSQL, Oracle, Postgres, sqlite)
• Distributed computing environment (MapReduce)
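
To make the pairing of a scripting language with a database environment concrete, here is a minimal sketch of a data preparation step using Python and an in-memory sqlite database. The records, the table name `sales`, and the helper names `parse_amount` and `prepare` are all hypothetical, chosen only for illustration; they do not come from the talk.

```python
import sqlite3

# Hypothetical raw records: (customer, amount as text, region).
RAW_ROWS = [
    ("alice", " 12.50", "West"),
    ("bob", "7.25 ", "west"),
    ("alice", "n/a", "WEST"),
    ("carol", "30", "East"),
]


def parse_amount(text):
    """Return the amount as a float, or None when it cannot be parsed."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def prepare(rows):
    """Clean rows in Python, then aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, region TEXT)")
    # Scripting-language step: trim whitespace, normalize case, parse numbers.
    cleaned = [
        (customer, parse_amount(amount), region.strip().lower())
        for customer, amount, region in rows
    ]
    # Drop rows whose amount could not be parsed.
    usable = [row for row in cleaned if row[1] is not None]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", usable)
    # Database step: count and total the cleaned rows per region.
    totals = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return totals


if __name__ == "__main__":
    for region, count, total in prepare(RAW_ROWS):
        print(region, count, total)
```

The split is a judgment call: the messy, row-by-row cleanup happens in the scripting language, and the grouping is left to SQL, where it is short and easy to check.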

CRISP-DM, Shearer, 2000

Advice

• What is the simplest thing that could possibly work?
• Start small and expand scope.
• Use general tools.
• Bring uncertainty into the spotlight.
• Expect iteration.
• Clear-eyed evaluation of not competing on data.
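
One possible reading of "bring uncertainty into the spotlight" is to report an interval alongside a simple estimate rather than a single number. The sketch below uses a percentile bootstrap around a sample mean; the bootstrap is my choice of illustration, not a method named in the talk, and the function name, defaults, and sample values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Return a (low, high) percentile-bootstrap interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Cut off an equal share of resampled means in each tail.
    tail = (1.0 - level) / 2.0
    low_index = int(tail * n_resamples)
    high_index = int((1.0 - tail) * n_resamples) - 1
    return means[low_index], means[high_index]


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    sample = [4.0, 5.5, 6.0, 6.5, 7.0, 8.5, 9.0, 12.0]
    low, high = bootstrap_mean_interval(sample)
    print(f"mean = {statistics.fmean(sample):.2f}, "
          f"95% interval = ({low:.2f}, {high:.2f})")
```

It also fits the first bullet: the estimate is just a mean, and the interval comes from general tools in the standard library.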