How to crack down big data?


Page 1

How to crack down BIG DATA? Required Skills for Data Scientists

Page 2

Hello! I am David Huang

I am here because I want to find more lovers of data science. You can find me at:

tawei.huang1@gmail.com

My Experience
• Data Scientist Intern, Yoctol
• Data & Strategy Intern, Chocolabs
• Summer Intern Student, Institute of Mathematics, Academia Sinica

My Education Background
• Master's in Statistics, NTU
• BSc in Quantitative Finance, NTHU
• Research Student, PKU

Page 3

“Big data is a big trend, but it is very difficult to hire a data scientist.”

It’s also hard to find a job in Taiwan XD

Page 4

1. Who is a Data Scientist?
The skill sets you need to be a data scientist.

Page 5

In a big data project, we need these people!

• Data Backend Engineer: Develops and operates backend systems for data access, collection, processing, and storage.
• Database Architect: Architects and designs database solutions for the enterprise, and leads the effort on database performance and optimization.
• Data Analyst / Data Scientist: Uses advanced quantitative analysis, data mining techniques, and strong industry acumen to interpret, connect, and predict data, delivering insights and recommendations for decisions.
• Domain Expert: Helps the data team understand the domain problem and its knowledge.

Page 6

Data analyst / Machine Learning

Lots of people say that they are different, but I think “every data analyst should be a data scientist, and the converse holds!”

Explanatory Analytics (Data Analyst)
• Theory-based statistical testing of causal hypotheses (commonly seen in economics)
• Strength of relationship in the statistical model

Predictive Analytics (Data Scientist)
• Empirical methods for predicting new observations (statistical / mathematical / CS approaches)
• Ability to accurately predict new individuals

Both fields are important for discovering knowledge.
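To make the contrast concrete, here is a minimal Python sketch on synthetic data. The data-generating line (y = 2 + 3x plus noise) and the 150/50 split are assumptions for illustration only. The explanatory view fits once and looks at the slope and in-sample R^2. The predictive view fits on a training split and scores held-out observations.

```python
import random

def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(xs, ys, a, b):
    """Share of the variance in y explained by the fitted line (in-sample)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def mse(xs, ys, a, b):
    """Mean squared prediction error of the line a + b*x on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (assumption): y = 2 + 3x + Gaussian noise with sd 2
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 2) for x in xs]

# Explanatory view: fit on all data, look at the slope and in-sample R^2
a, b = fit_ols(xs, ys)
print(f"slope = {b:.2f}, R^2 = {r_squared(xs, ys, a, b):.3f}")

# Predictive view: fit on the first 150 points, score the 50 held-out points
a_tr, b_tr = fit_ols(xs[:150], ys[:150])
test_mse = mse(xs[150:], ys[150:], a_tr, b_tr)
print(f"held-out MSE = {test_mse:.2f}")
```

The same fitted line serves both purposes. What differs is the question asked of it: how strong the relationship is, or how well it predicts data it has not seen.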

Page 7

Data Unicorn
A data unicorn has expertise in all fields… Mission impossible?

Page 8

The Data Scientist Venn Diagram

(Diagram labels: Math & Statistics, Hacking Skills, Domain Expertise, Machine Learning, Research, Program, Unicorn)

First become a (1) researcher, (2) machine learner, (3) programmer, and then find your own way to be a data scientist.

Page 9

Skill Sets for Data Scientist – Math & Stat

Mathematics & Statistics
• Basic Knowledge: Multivariable Calculus, Linear Algebra, Probability Theory, Statistics / Math Statistics, Convex Optimization, Discrete Analysis
• Data Mining: Regression Analysis / GLM, Experimental Design, Causal Inference, Multivariate Analysis, Biz Analytics & Data Mining
• Machine Learning: Machine Learning, Deep Learning (ANN/CNN)
• Forecasting: Time Series Analysis

1

Page 10

Skill Sets for Data Scientist – Programming

Programming Skills
• Programming Skill: Python (scripting language), R (statistical software), Matlab (super fast but expensive)
• Database Querying: SQL & Relational Algebra, NoSQL / Cassandra / etc., HDFS / MapReduce, Hadoop and Hive / Pig, Spark & Scala
• Software Engineering: a little bit of Java, Data Structures & Algorithms, Data Munging (Python!), Data Viz (d3.js / Tableau)

2

1. D3.js visualization: http://goo.gl/cVlTX7
2. Spark MLlib: http://goo.gl/VNMQ97
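For a taste of SQL and relational algebra, the sketch below uses Python's built-in sqlite3 module with two hypothetical tables, users and views, filled with toy data. The query performs a join followed by grouping and aggregation. Those are the same operations you would write in relational algebra as a join and an aggregate.

```python
import sqlite3

# In-memory database with two hypothetical tables (toy data for illustration)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT)")
cur.execute("CREATE TABLE views (user_id INTEGER, video_id TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "TW"), (2, "TW"), (3, "JP")])
cur.executemany(
    "INSERT INTO views VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (3, "c"), (3, "a")],
)

# Join users with views on user_id, then group by country and count the views
rows = cur.execute(
    """
    SELECT u.country, COUNT(*) AS n_views
    FROM users AS u
    JOIN views AS v ON u.user_id = v.user_id
    GROUP BY u.country
    ORDER BY u.country
    """
).fetchall()
print(rows)  # [('JP', 2), ('TW', 3)]
conn.close()
```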

Page 11

Skill Sets for Data Scientist – Business Sense

Business Professionalism
• Logical Thinking: Hypothesis Thinking, Pyramid Principle (BizPro is a good choice!)
• Communication Skills: Presentation & Presence, Communication, Upward Management

To be honest, the crucial truth is that “this part is very important, even though it looks like the least important skill set!”

I think this is the niche for business school students. Specific knowledge of marketing, financial analysis, etc. helps a lot.

3

Page 12

My Learning Path for you – Math matters!

Freshman - Junior
(1) Math & Stat: Calculus, Linear Algebra, Probability Theory, Math Statistics
(2) Programming: C / Java / R
(3) Business Sense: Financial Markets, Marketing, Management

Senior
(1) Math & Stat: Advanced Statistics, Data Mining, Econometrics
(2) Programming: R Programming, Matlab (Basic)
(3) Business Sense: Competitions, Advanced Finance, Macroeconomics

Current
(1) Math & Stat: Statistical Learning, Compressed Sensing
(2) Programming: Python & SQL, Hadoop & Spark
(3) Business Sense: BizPro Training, Logical Thinking, Marketing Analytics

Page 13

2. Master in Data Science for Free!
How to become the data unicorn without any tuition fee

Page 14

Data Scientist 101: Johns Hopkins MOOC

The Coursera Specializations offered by Johns Hopkins University give a very good general exposure to the world of data science.

Executive Data Science

I think this specialization is designed for those who don’t want to become a data scientist but may work in a data-driven company.
URL: https://goo.gl/ZNBF7N

Data Science

I think this specialization is designed for those who don’t have a very strong academic background but want to become a data scientist.
URL: https://goo.gl/8OzBhe


Page 15

Basic Math: Calculus & Linear Algebra

Calculus and linear algebra are fundamental tools for data scientists and statisticians. Having a solid foundation will help a lot.

Calculus I & II, NTHU

This course gives you a solid foundation in Euclidean space and multivariable calculus, which is very important for a data scientist.
URL: http://ocw.nthu.edu.tw/ocw/index.php?page=course&cid=7&
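A core multivariable calculus idea in data science is the gradient, the vector of partial derivatives that optimizers follow. This minimal sketch uses a hypothetical function f(x, y) = x^2 * y + 3y. It checks the analytic gradient against a central finite-difference approximation, which is a common sanity check when coding up a model by hand.

```python
def f(x, y):
    """Hypothetical example function f(x, y) = x^2 * y + 3y."""
    return x ** 2 * y + 3 * y

def grad_f(x, y):
    """Analytic gradient: (df/dx, df/dy) = (2xy, x^2 + 3)."""
    return (2 * x * y, x ** 2 + 3)

def numerical_grad(func, x, y, h=1e-6):
    """Approximate each partial derivative with a central finite difference."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

print(grad_f(1.0, 2.0))  # (4.0, 4.0)
print(numerical_grad(f, 1.0, 2.0))
```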

Linear Algebra, NCTU

A data scientist usually thinks of data in matrix form. The concept of vector algebra helps a lot in high-dimensional data analysis.
URL: http://goo.gl/KFdJTT

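Here is one example of treating data as a matrix. Put n observations in the rows and p features in the columns. The sample covariance matrix is then Xc^T Xc / (n - 1), where Xc is the column-centered data. The sketch below uses plain Python lists and toy numbers, chosen only for illustration. In practice a library such as NumPy would handle this.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def covariance_matrix(X):
    """Sample covariance of an n x p data matrix X: Xc^T Xc / (n - 1)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]  # center each column
    S = matmul(transpose(Xc), Xc)
    return [[v / (n - 1) for v in row] for row in S]

# Rows = observations, columns = features (toy numbers)
X = [[1.0, 2.0],
     [2.0, 4.0],
     [3.0, 6.0]]
print(covariance_matrix(X))  # [[1.0, 2.0], [2.0, 4.0]]
```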

Page 16

Advanced Math: Convex Optimization

This is a very advanced topic that we use when doing machine learning. However, I don’t think every data scientist needs to master this field.

Convex Optimization, Stanford

This course should benefit anyone who uses or will use scientific computing or optimization in engineering or related work (e.g., machine learning, finance, operational research).
URL: http://stanford.edu/class/ee364a/
MOOC: https://goo.gl/KBQ473

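Here is a small example of where convex optimization shows up in machine learning. The least-squares loss is convex in the intercept and slope. With a small enough step size, plain gradient descent therefore converges to the global minimum. The learning rate, step count, and toy data below are assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize the convex loss L(a, b) = mean((a + b*x - y)^2) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        grad_a = 2 * sum(residuals) / n                             # dL/da
        grad_b = 2 * sum(r * x for r, x in zip(residuals, xs)) / n  # dL/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Toy data lying exactly on y = 1 + 2x, so the global minimum is a = 1, b = 2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = gradient_descent(xs, ys)
print(round(a, 3), round(b, 3))
```

Because the loss is convex, there are no bad local minima to get stuck in. Only the step size matters: too large a step makes the iterates diverge.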

Page 17

Basic Stat: Probability & Math Statistics

If you don’t have a background in probability and mathematical statistics, you can’t learn any advanced data analytics methods. Please learn them!

Probability, NTHU

This course gives you a solid foundation in probability theory, which is very important for a data scientist.
URL: http://goo.gl/G4MhIj

Math Statistics, NTHU

This course gives you a solid foundation in mathematical statistics, such as estimation and hypothesis testing.
URL: http://goo.gl/nQ2cE2

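Here is one basic result from mathematical statistics that you will use constantly. For yes/no data (e.g., click or no click), the maximum likelihood estimate of the success probability is the sample proportion. A normal approximation then gives a rough 95% confidence interval (the Wald interval). The click data below is hypothetical.

```python
import math

def bernoulli_mle_ci(outcomes, z=1.96):
    """MLE of a Bernoulli success probability plus a normal-approximation 95% CI."""
    n = len(outcomes)
    p_hat = sum(outcomes) / n                # MLE = sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    return p_hat, (p_hat - z * se, p_hat + z * se)

# Hypothetical data: 1 = user clicked, 0 = did not (30 clicks out of 100)
clicks = [1] * 30 + [0] * 70
p_hat, (lo, hi) = bernoulli_mle_ci(clicks)
print(p_hat, round(lo, 3), round(hi, 3))
```

The Wald interval is the simplest choice. It behaves poorly for small samples or proportions near 0 or 1, where other intervals are usually preferred.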

Page 18

Stat Method: Advanced Methods

These three fields are core data analytics methods. You will find them everywhere, like in econometrics, machine learning, and so on.

Regression Analysis, NTHU

URL: http://goo.gl/YQBAla


Multivariate Analysis, NTHU

URL: http://goo.gl/934GKd


Experimental Design, NTHU

URL: http://goo.gl/ED9HMr

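Experimental design is what makes a simple comparison of group means meaningful. If units are randomly assigned to control and treatment, a permutation test can check whether the observed difference could plausibly come from the random assignment alone. This is a minimal sketch with hypothetical outcome values. The number of permutations and the seed are arbitrary choices.

```python
import random

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means (b minus a)."""
    rng = random.Random(seed)
    observed = sum(b) / len(b) - sum(a) / len(a)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-randomize group labels
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = sum(pb) / len(pb) - sum(pa) / len(pa)
        # small tolerance so float rounding does not hide ties with the observed value
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical outcomes from a randomized experiment (control vs. treatment)
control = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]
treatment = [5.9, 6.1, 5.7, 6.0, 6.2, 5.8]
diff, p_value = permutation_test(control, treatment)
print(round(diff, 2), p_value)
```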

Page 19

Data Mining: Illinois & Stanford MOOC

Data mining is the most powerful tool for business analytics. It can be applied to user behavior data, questionnaire design, and financial markets.

Data Mining, UIUC

The Data Mining Specialization teaches data mining techniques for both structured data which conform to a clearly defined schema, and unstructured data which exist in the form of natural language text.

URL: https://goo.gl/Tyzm6Z


Mining Massive Datasets, Stanford

Introduces participants to modern distributed file systems and MapReduce, including what distinguishes good MapReduce algorithms from good algorithms in general. The rest of the course is devoted to algorithms for extracting models and information from large datasets.

URL: https://goo.gl/NYyxy9

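The MapReduce model covered in the Stanford course can be simulated in a single Python process with the classic word-count example. This is only a sketch of the programming model. On a real cluster (e.g., Hadoop), map tasks run in parallel across machines, and the framework performs the shuffle that groups values by key. The input splits here are made up.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data is big", "data science is fun"]
mapped = chain.from_iterable(map_phase(doc) for doc in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```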


Page 21

Machine Learning: Stanford / NTU MOOC

Machine learning is the science of getting computers to act without being explicitly programmed.

Machine Learning, Stanford

This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition.
URL: https://www.coursera.org/learn/machine-learning


Machine Learning, NTU

The students shall enjoy a story-like flow moving from "When Can Machines Learn" to "Why", "How" and beyond. (Very tough course!)
URL: https://www.coursera.org/course/ntumlone

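One of the simplest examples of a machine learning from data, rather than being explicitly programmed, is the perceptron. It nudges a weight vector every time it misclassifies a point. Below is a minimal sketch on a tiny, linearly separable toy dataset, which is an assumption. On non-separable data the loop simply stops after max_epochs without a perfect fit.

```python
def perceptron(data, max_epochs=100):
    """Perceptron learning: data is a list of (features, label) with label in {+1, -1}."""
    dim = len(data[0][0])
    w = [0.0] * (dim + 1)  # w[0] is the bias term
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            xb = [1.0] + list(x)  # prepend constant 1 for the bias
            score = sum(wi * xi for wi, xi in zip(w, xb))
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: converged
            break
    return w

def predict(w, x):
    """Classify x as +1 or -1 with the learned weights."""
    score = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if score > 0 else -1

# Linearly separable toy data: label +1 when x1 + x2 > 1
data = [((0.0, 0.0), -1), ((1.0, 1.0), 1), ((0.2, 0.3), -1), ((0.9, 0.8), 1)]
w = perceptron(data)
print([predict(w, x) == y for x, y in data])
```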

Page 22

3. What I’ve done in practice!
How to become the data unicorn without any tuition fee

Page 23

SOP for a Data Analytics Project

Data Task Formulation

Data Collection

Data Cleaning

Data Exploration

Data Modeling

Define Purpose

Model Selection

Performance Evaluation

Model Deployment

Initial Phase: 90% effort

Middle Phase: 90% professional expertise

Final Phase: 90% domain knowledge
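For the Model Selection and Performance Evaluation steps, k-fold cross-validation is one common approach, though not necessarily the one the author used. This sketch compares two hypothetical candidate models, a constant mean and a straight line, on synthetic data. It keeps the model with the lower average held-out error.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: simple linear regression by least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def cv_mse(fit, xs, ys, k=5, seed=0):
    """k-fold cross-validation: average held-out MSE over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for fold in folds:
        test = set(fold)
        tr_x = [xs[i] for i in idx if i not in test]
        tr_y = [ys[i] for i in idx if i not in test]
        model = fit(tr_x, tr_y)
        errors.append(sum((ys[i] - model(xs[i])) ** 2 for i in fold) / len(fold))
    return sum(errors) / k

# Synthetic data (assumption): y = 1 + 0.5x + Gaussian noise
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1 + 0.5 * x + rng.gauss(0, 1) for x in xs]
scores = {"mean": cv_mse(fit_mean, xs, ys), "line": cv_mse(fit_line, xs, ys)}
best = min(scores, key=scores.get)
print(scores, best)
```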

Page 24

Monthly View Counts: 25,054,386

751,631,580 values: lots of user behavior!

Monthly Active Users: 1,785,244
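Here is a quick back-of-the-envelope check on these numbers, assuming all three figures cover the same month. Dividing by the monthly active users gives about 14.03 views and about 421.02 recorded values per active user.

```python
monthly_views = 25_054_386
monthly_active_users = 1_785_244
values = 751_631_580

# Per-user ratios (assumes the three counts refer to the same period)
views_per_user = monthly_views / monthly_active_users
values_per_user = values / monthly_active_users
print(f"{views_per_user:.2f} views per active user")
print(f"{values_per_user:.2f} recorded values per active user")
```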

Page 25

My workspace

R, Google Analytics, Spark

Page 26

Big Data
All about math, statistics, and coding. But what about business knowledge?

Page 27

Thanks! Any questions?

You can find me at:

tawei.huang1@gmail.com