How to Become a Data Scientist – By Ryan Orban, VP of Operations and Expansion, Galvanize
HOW TO BECOME A DATA SCIENTIST
Wednesday, April 15th
6:15-8:30p 1062 Delaware St.
Ryan Orban @ryanorban
VP of Operations & Expansion, Galvanize
Co-Founder & CEO, Zipfian Academy
Why are we talking about data science?
Perfect Storm
Cheap Storage
Competitive Advantages
Abundant Data
Technology
Year Capacity (GB) Cost per GB (USD)
1992 0.08 $3,827.20
1997 2.1 $157.00
2002 80 $3.74
2007 750 $0.35
2012 3,000 $0.05
Source: http://www.jcmit.com/diskprice.htm
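The figures in the table can be turned into a rough rate of decline. The following is a minimal sketch, not part of the talk: the helper name `annualized_change` is an assumption made for illustration, and the numbers are copied from the table above.

```python
# Figures copied from the table above: (year, capacity in GB, cost per GB in USD).
disk_prices = [
    (1992, 0.08, 3827.20),
    (1997, 2.1, 157.00),
    (2002, 80, 3.74),
    (2007, 750, 0.35),
    (2012, 3000, 0.05),
]


def annualized_change(start_value, end_value, years):
    """Compound annual rate of change between two values (hypothetical helper)."""
    return (end_value / start_value) ** (1 / years) - 1


first_year, _, first_cost = disk_prices[0]
last_year, _, last_cost = disk_prices[-1]

# How many times cheaper a gigabyte became over the whole period.
fold_drop = first_cost / last_cost
# Average yearly change in cost per GB, assuming a constant compound rate.
rate = annualized_change(first_cost, last_cost, last_year - first_year)

print(f"Cost per GB fell about {fold_drop:,.0f}x from {first_year} to {last_year}")
print(f"Average change per year: {rate:.1%}")
```

The constant compound rate is a simplification; the table only gives five data points, so the yearly figure is an average rather than a measured trend.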
[Chart: Capacity (GB) and Cost per GB (USD) by year, 1992 to 2012. Source: http://www.jcmit.com/diskprice.htm]
Unprecedented Data Growth
Data Science Job Growth
Source: LinkedIn Analytics
Enter the Data Scientist
Math & Stats
Computer Science
Domain Expertise
Machine Learning
Software Engineering
Research
Unicorn
Data Science
Data Science Unicorn
• Ask good questions. What do we not know? What would we like to know?
• Define and test a hypothesis. Run experiments with data.
• Scrape, munge, transform, and clean data. Instrument.
• Explore Data, Discover Unknowns
• Model Data, Understand Data Relationships
• Create Data Products. Tell Relevant Business Stories.
10 Things [most] Data Scientists Do
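As one way to picture "define and test a hypothesis", here is a minimal sketch of a permutation test using only the Python standard library. The data, the variable names, and the function `permutation_test` are all hypothetical and chosen for illustration; the talk does not prescribe a particular test.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean of b minus mean of a) and an
    estimated p-value under the null hypothesis of no group effect.
    """
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical experiment data: seconds on page for two page designs.
control = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.9, 11.7]
variant = [13.4, 12.9, 14.1, 11.8, 15.2, 12.2, 13.7, 12.6]

observed_diff, p_value = permutation_test(control, variant)
print(f"Observed difference in means: {observed_diff:.2f}")
print(f"Estimated p-value: {p_value:.4f}")
```

A permutation test is used here because it needs few distributional assumptions and no third-party libraries; a t-test or a Bayesian comparison would be equally reasonable choices.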
What do people look for in a data scientist?
Broad-range generalist
Deep expertise
T-Shaped Skillset
T-Shaped Skillset
Machine Learning, Statistics, Domain Knowledge
Software Engineering
Business Acumen
Distributed Computing
Communication
Data Science Roles
How do I become a data scientist?
1. Master the fundamentals.
Mathematics & Statistics
• Statistical Distributions
• (Non) Parametric Tests
• Significance & Hypothesis Testing
• Bayesian Methods
• Linear Algebra
• Multivariable Calculus
• Graph Theory
• Probability
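To make one of these topics concrete, the sketch below shows a small Bayesian update with a Beta prior and binomial data. The function names and the sign-up counts are hypothetical, chosen only to illustrate the arithmetic; nothing here comes from the talk itself.

```python
def beta_binomial_update(prior_alpha, prior_beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 7 sign-ups out of 20 visitors, with a uniform Beta(1, 1) prior.
alpha, beta = beta_binomial_update(1, 1, successes=7, failures=13)
posterior_mean = beta_mean(alpha, beta)

print(f"Posterior: Beta({alpha}, {beta}), mean = {posterior_mean:.3f}")
# prints "Posterior: Beta(8, 14), mean = 0.364"
```

The Beta prior is conjugate to the binomial likelihood, which is why the update reduces to adding counts.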
2. Learn the tools of the trade.
Software Engineering
Prototyping: Python, R, Julia
Production: Java, C++, Go, Scala
Learn to Code
Machine Learning & Software Engineering
• Supervised (SVM, Random Forest)
• Unsupervised (Clustering, Topic Modeling)
• Validation, Model Comparison
• NLP / Information Retrieval
• Distributed Computing
• Algorithms & Data Structures
• Data Visualization
• Data Munging
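Validation and model comparison can be sketched with k-fold cross-validation. The example below is an illustration under stated assumptions, not the curriculum's method: it uses synthetic data, and it swaps in a mean baseline and a simple least-squares line in place of the SVM and random forest models named above so that it runs on the standard library alone. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mu = mean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def cross_val_mse(fit, xs, ys, k=5, seed=0):
    """Mean squared error averaged over k held-out folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in indices if i not in held_out_set]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

baseline_mse = cross_val_mse(fit_mean, xs, ys)
line_mse = cross_val_mse(fit_line, xs, ys)
print(f"Baseline CV MSE: {baseline_mse:.3f}")
print(f"Linear CV MSE:   {line_mse:.3f}")
```

Both models are scored on the same folds (same seed), so the comparison reflects the models rather than the split. In practice a library such as scikit-learn would typically handle the splitting and scoring.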
3. Demonstrate expertise.
DataTau, GitHub
DATA SCIENCE HIERARCHY OF NEEDS
FOUNDATIONAL: Mathematics, Statistics, Probability
FUNCTIONAL: Software Engineering, Machine Learning, ETL, Data Transformation, SQL
SCALABLE: Distributed Computing (Hadoop, Cascading, Spark, GraphLab)
EMOTIONAL: Make an impact, communicate results, identify business objectives
Learning data science on your own can be a long and winding road.
Open-Source Data Science Masters
SlideRule
Building the modern, educational campus for digital innovators and entrepreneurs
ISSUE #1: TRADITIONAL UNIVERSITIES ARE DISCONNECTED
ISSUE #2: ONLINE SCHOOLS LACK COMMUNITY + MOTIVATION
ISSUE #3: BOOT CAMPS LACK A LARGER ECOSYSTEM
AN URBAN CAMPUS & EDUCATION PLATFORM
Galvanize provides immersive educational programs in web development and data science.
IMMERSIVE EDUCATIONAL PROGRAMS
Full-stack development curriculum delivered as an immersive, six-month, full-time program
• Strong Test-Driven Development (TDD)/agile software development environment
• No prior coding experience required
FULL STACK
Data science program (previously Zipfian Academy) taught over a 12-week immersive schedule
• One-year accredited program resulting in a Master's degree (Engineering/Big Data)
DATA SCIENCE PROGRAM
Program Timeline
[Timeline figure, weeks 1 to 12 (markers at weeks 1, 8, 11, 12): Structured Curriculum, Capstone Project, Interview Prep, Hiring Day, Graduation]
The Program
• Project-based curriculum with real datasets, solving actual problems
• Guest lectures from leaders in the field
• Personal mentorship to help students grow
• Capstone project with company partners
• Vast network of hiring partners and events
Learning Techniques
Rapid Iteration
Built for Collaboration
Continuous Feedback
Curriculum as Product
Backgrounds: Educational Background
[Bar chart: BS, MS, PhD; axis 0 to 16]
Backgrounds: Disciplines
[Bar chart, axis 0 to 8: Software Engineering, Analysts, Finance/Economics, Engineering, Physics, Physical Sciences, Mathematics, Statistics, Astronomy, Linguistics, Professional Poker]
94% Placement Rate
$115k avg. salary
Hiring Partners
• Working knowledge of programming
• Background in a quantitative discipline, strong fundamentals
• Comfortable with mathematics and statistics
• Child-like curiosity
What We Look For
Data Science Immersive Denver, July 6th
• Present a guest lecture or share a data story
• Donate datasets and propose projects
• Sponsor a scholarship
• Attend one of our Hiring Days
Get Involved
• Instructors (Full-time & Part-Time)
• Outcomes Manager
• Student Services
• Community Coordinator
• Membership Manager
We’re Hiring!
Data Science Immersive
GalvanizeU
Full-stack Web Development
Weekend Workshops
A Practical Intro to Data Science
http://bit.ly/learndatascience
THANK YOU
RYAN ORBAN | VP OF OPERATIONS & EXPANSION | @RYANORBAN
www.galvanize.com