Data Scientists: Your Must-Have Business Investment
Transcript of Data Scientists: Your Must-Have Business Investment
© Kalido | Kalido Confidential | April 10, 2023
Data Scientist: Your Must-Have Business Investment NOW
Gregory Piatetsky
Editor, Kdnuggets
co-founder KDD and ACM SIGKDD
David Smith
Data Scientist
Revolution Analytics
Carla Gentry
Data Scientist
Analytical Solution
Darren Peirce
CTO
Kalido
Eric Kavanagh
DM Radio Host
Information Management Magazine’s DM Radio
Today’s Speakers #DataScienceNow
Revolution Confidential
What is a data scientist?
© Dov Harrington, CC By-2.0
http://www.flickr.com/photos/idovermani/4110546683/
David Smith, Revolution Analytics
@revodavid
          | Statistician             | Data Scientist
Image     | Baseball (Cricket)       | HBR Sexiest Job of 21st Century
Mode      | Reactive                 | Consultative
Works     | Solo                     | In a team
Inputs    | Data File, Hypothesis    | A Business Problem
Data      | Pre-prepared, clean      | Distributed, messy, unstructured
Data Size | Kilobytes                | Gigabytes
Tools     | SAS, Mainframe           | R, Python, awk, Hadoop, Linux, …
Nouns     | Tables                   | Data Visualizations
Focus     | Inference (why)          | Prediction (what)
Output    | Report                   | Data App / Data Product
Latency   | Weeks                    | Seconds
Stars     | G.E.P. Box, Trevor Hastie | Hilary Mason, Nate Silver
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
Three Essential Skills of Data Scientists
Drew Conway
http://www.dataists.com/2010/09/the-data-science-venn-diagram/
[Diagram labels: Data Integration, Mashups, Applications, Models, Visualization, Predictions, Uncertainty, Problems, Data Sources, Credibility, Effective Data Applications]
Data Science to the Rescue!
              | Business Intelligence                                          | Data Science
Perspective   | Looking backwards                                              | Looking forwards
Actions       | Slice and Dice                                                 | Interact
Expertise     | Business User                                                  | Data Scientist
Data          | Warehoused, Siloed                                             | Distributed, real-time
Scope         | Unlimited                                                      | Specific business question
Questions     | What happened?                                                 | What will happen? What if?
Output        | Table                                                          | Answer
Applicability | Historic, possible confounding factors                         | Future, correcting for influences
Tools         | SAP, Cognos, Microstrategy, SAS, QlikView, Tableau, Jaspersoft | Revolution R Enterprise
Hot or not?   | So 1997                                                        | Transformational
What is Data Science?
By Carla Gentry, Data Scientist
Analytical-Solution
Data Science is….
• The term "data science" has existed for over thirty years. It was first mentioned by Peter Naur in 1960, but more recently it has gained a lot of attention!
Data Science can be broken down into 4 main areas of expertise.
• Data knowledge – design & structure
• Programming – SAS, R, SQL, NoSQL
• Analytics – Insight
• Communication – Tell the story
Data Knowledge: Part analyst - part IT
• What kind of servers do you own? - Servers vs. Mainframe
• What kind of load can the server handle? - Iterations matter
– Why ask this?
Programming – Pick a language and use it wisely
• Efficiency is KING! - Why?
• Number of iterations & complex algorithms or scripts. Snowflake vs. star schema? - Design is important, but why?
• Key things: normalize and index. There is more to Data Science than just analytics.
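To make the "index" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sales`, its columns, and the index name `idx_sales_customer` are made up for illustration and are not from the talk. It asks SQLite for its query plan before and after an index is created; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


# Hypothetical table, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)

sql = "SELECT SUM(amount) FROM sales WHERE customer_id = ?"

# Without an index, the filter has to read every row of the table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filter column, SQLite can look up matching rows directly.
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

When a script repeats the same lookup over many iterations, the gap between a full table scan and an index lookup is paid on every pass, which is one reason design choices matter before any analytics begin.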
How can I learn about Data Science?
• For those who want to invest their time and talent there are resources.
• College Courses
• Online
• Webinars
• Blogs
© KDnuggets 2013
Data Science and Data Scientists Now
Gregory Piatetsky, @kdnuggets
Analytics, Big Data, Data Mining, and Data Science Resources
• Statistics, 1830-
• Data mining, 1980-
• Knowledge Discovery in Data (KDD), 1989-
• Business Analytics, 1997-
• Predictive Analytics, 2002-
• Data Analytics, 2011-
• Data Science, 2011-
• …?
Same Core Idea:
Finding Useful Patterns in Data
Different Emphasis
Trends from Google Ngrams (1800-2008)
and Google Trends (2005-2013)
Big Data > Data Mining > Business Analytics > Predictive Analytics > Data Science
[Chart: Google Trends search, Jan 2008 - Apr 2013, Worldwide; labeled series: Big Data, Data mining]
Data Scientist – sexiest job of the 21st Century (???)
say Thomas H. Davenport and D.J. Patil, (HBR, Oct 2012)
“Data Scientist”
Fastest growing term on
www.kdnuggets.com/jobs
1% of jobs in 2010
4% of jobs in 2011
19% of jobs in 2012
23% of jobs in 2013
[Chart of job trends; labeled series: Data Mining, Big Data, Data Scientist, Statistician]
“Data mining” jobs are more common, but
“Big Data” jobs are surging much faster than “Data Scientist”
“Statistician” jobs are steady, but not growing
• Big Data can produce better predictions, but expect limited improvement
• Example: the Netflix Prize took 3 years to reduce the error in predicted movie ratings from 0.95 stars to 0.86
• Inherent randomness in human behavior
• Data Science should help separate hype from reality
• Biggest effects from Big Data are from new platforms, like Google, Facebook, LinkedIn; Personalized medicine
• However, Big Data makes privacy online almost impossible
Gregory Piatetsky-Shapiro, Big Data Hype and Reality, Harvard Business Review blog, Oct 2012
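The Netflix figures above can be turned into a relative improvement with a few lines of arithmetic. This is a minimal sketch that assumes the two "stars" values are root-mean-square errors (the metric used for the Netflix Prize); the function names and the tiny sample ratings are illustrative only and are not from the talk.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_improvement(baseline_error, improved_error):
    """Fraction by which the error shrank relative to the baseline."""
    return (baseline_error - improved_error) / baseline_error


# Error figures quoted on the slide, in stars.
gain = relative_improvement(0.95, 0.86)
print(f"Relative improvement: {gain:.1%}")  # prints "Relative improvement: 9.5%"

# Illustrative, made-up ratings to show how an RMSE value is obtained.
sample_error = rmse([4, 3, 5], [5, 3, 3])
print(f"Sample RMSE: {sample_error:.3f}")
```

Under that assumption, three years of work bought an error reduction of a little under ten percent, which is the "limited improvement" the slide warns about.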
Gartner Hype Cycle
Big Data
Gartner VP says Big Data is Falling into the Trough of Disillusionment, Jan 2013
Q&A
Gregory Piatetsky, Editor, Kdnuggets; co-founder KDD and ACM SIGKDD (@kdnuggets)
David Smith, Data Scientist, Revolution Analytics (@revodavid)
Carla Gentry, Data Scientist, Analytical Solution (@data_nerd)
Darren Peirce, CTO, Kalido (@DarrenPeirce)
Eric Kavanagh, DM Radio Host, Information Management Magazine’s DM Radio (@eric_kavanagh)
Summer Sessions: Two Tracks For YOU
Series Kickoff
May 14: Data Scientist: Your must-have business investment now.
(30 Minute Learning Sessions)
May 28 Rapid Data Integration tools and methods
June 4 Harmonizing Data for the Warehouse
June 11 Rapid Iteration Methodology Using Information Models
Series Kickoff
June 25: Find your data warehouse’s hidden costs before they find you.
(30 Minute Learning Sessions)
July 2 The real cost per release cycle
July 9 Automate to reduce operating costs
July 16 Reduce tool cost
July 23 Scale drives cost reductions
Agile Information Foundation for the Data Scientist
TCO: Find data warehouse costs before they find you.
Visit get.kalido.com/summer-series to register