Data X About Me · 01.12.2016 · Data X CS Tools in Data-X Common Tools • Python, Numpy, SciPy...


Data X

Ikhlaq Sidhu
Chief Scientist & Founding Director, Sutardja Center for Entrepreneurship & Technology
IEOR Emerging Area Professor Award, UC Berkeley

About Me:

Introduction
Data-X: A Course on Data, Signals, and Systems

What is Data-X?

•  A Course and Lab
•  Customer Driven
•  Applied Project
•  Industry Perspective

CS Tools
Math Models
Real Life Problems

CS Tools in Data-X

Common Tools
•  Python, NumPy, SciPy
•  Pandas
•  TensorFlow, Sklearn
•  SQL
•  NLP/NLTK
•  Matplotlib
•  Tableau

Working with Data: Collect, Combine, Store, Use & Compute, Analyze, Visualize

Quant Models: Signals, Systems, Networks in Data-X

Quantitative Tool Box
•  Markov Processes
•  Bayesian Decisions
•  LTI Systems: Fourier, Filters
•  Prediction: Linear, Max. Likelihood, Regression, Correlations
•  Control Models
•  Stochastic, ML Classification, K-Means, KNN
•  Deep Learning
•  Network Models, Graphs, Paths

Real Life Application

Application Areas
•  Industry 4.0
•  Smart Cities
•  Data and Health
•  Finance and Fintech
•  Transportation
•  Networks and Communication
•  Retail
•  Security

What is this class?

Make the Tools          Use the Tools (Optimally)
Architect the System    Why and how you build

Most CS    Sutardja Center    This IEOR Course

What we cover and what we won't

Yes
•  Tools for Data and Math
•  Open source toolsets
•  Use of ML and Natural Language tools
•  Cookbook Applications for common systems
•  A real life integrated project
•  Operational scalable code for any field or application
•  Enough to be dangerous
•  A system's viewpoint

No
•  Very large data sets
•  Hadoop
•  Spark Pipelines
•  BDAS Stack
•  Ability to write an ML or Natural Language framework
•  Detailed data science
•  Sparse data techniques
•  A purely CS view
•  A purely statistician's view
•  In-depth mathematics

How the Data-X Course Works:

Insightful Story, Solution

Propose Low Tech Solution (1)
Brainstorm, Challenge and Validate (4)
Demo or Die (1)
Execute * Iterate, BMoE Reflections, Agile Sprint (8)

CS Tool and Industry Lectures – Video Flip

Team: Tech Lead, Product Lead + 2-4 Experts

Basic Tools to Get Started

•  Available with the Anaconda Environment (available for free):
   –  Python, we will use version 2.7, pre-requisite to class
   –  NumPy, array processing for numbers, strings, records, and objects
   –  Pandas, powerful data structures and data analysis tools
   –  SciPy, Scientific Library for Python
   –  Matplotlib, Python 2D plotting library
   –  IPython, Productive Interactive Computing
•  Environment includes:
   –  Jupyter – interactive web-based Python
   –  Spyder – code development environment with editor
•  Data-X: This class will help you combine math and data concepts
•  Not Data-X: This class is not a big data class.

A High Level Overview of Data

Basic Concept of Working with Data

•  Data Wrangling
•  In Production

Human Interpretation of Data

Human
Machines
Large Sets of Data
Insight

How did data become such a big deal?

Model Examples for things that normally require human judgment

Scoring Wine

Wine quality = 12.145 + 0.00117 x (winter rainfall) + 0.0614 x (average growing season temperature) - 0.00386 x (harvest rainfall)

Orley Ashenfelter, Princeton. Now used by Christie's Auction House

Competitive Advantage in Sports

MoneyBall: How to measure and predict baseball performance
Oakland Athletics baseball team and its general manager Billy Beane
A: Watch and talk with hundreds of players
B: Runs created = (hits + walks) x Total Bases / (At Bats + Walks)
Now: Basketball, Football, and so on for every other sport
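The two formulas on this slide are simple enough to evaluate directly. The sketch below is only an illustration: the function names are made up here, the slide does not state units, and the input values are hypothetical numbers chosen to exercise the arithmetic, not real vintage or player data.

```python
def wine_quality(winter_rainfall, growing_season_temp, harvest_rainfall):
    """Wine-quality score using the coefficients quoted on the slide."""
    return (12.145
            + 0.00117 * winter_rainfall
            + 0.0614 * growing_season_temp
            - 0.00386 * harvest_rainfall)


def runs_created(hits, walks, total_bases, at_bats):
    """Runs created = (hits + walks) x total bases / (at bats + walks)."""
    return (hits + walks) * total_bases / float(at_bats + walks)


# Hypothetical inputs (assumptions, not data from the slide).
wine_score = wine_quality(winter_rainfall=600,
                          growing_season_temp=17,
                          harvest_rainfall=160)
player_rc = runs_created(hits=150, walks=50, total_bases=250, at_bats=500)

print("wine quality score: %.4f" % wine_score)
print("runs created: %.2f" % player_rc)
```

In both cases a short formula over a few measured features stands in for a judgment that would otherwise need an expert taster or a scout.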

Recommendations based on Algorithms

Harrah’s Casino: Knowing your customer

•  Service provider of Gambling and Casinos
•  Entry Card
•  Pain points
•  Intervention

Reference: Super Crunchers

What has been happening

           1995            2005           1015-era column below

An ML High Level Framework

In Real Life
• Objects
• Events/Experiments
• People/Customers
• Products
• Stocks
• …

Features, but also loss of information

In Sample: Person 1, Person 2, Person 3, ..., Person N
Out of Sample

- Characteristics
- Patterns
- Models

- Predictions
- Similarities
- Differences
- Distance

Some data has observed results

An ML High Level Framework

CS: Table
Math: Matrix X, with N rows (each person) and m columns (each feature: age, salary, ..)
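As a rough illustration of the table-to-matrix idea, the sketch below turns a small list of records into a matrix X with one row per person and one column per feature. The three people and their values are hypothetical; only the feature names "age" and "salary" come from the slide.

```python
# Hypothetical records (assumption): the slide names only the features.
people = [
    {"name": "Person 1", "age": 34, "salary": 72000},
    {"name": "Person 2", "age": 51, "salary": 98000},
    {"name": "Person 3", "age": 27, "salary": 55000},
]
feature_names = ["age", "salary"]

# Build X: one row per person, one column per feature.
X = [[person[f] for f in feature_names] for person in people]

n_rows = len(X)      # N, the number of people
m_cols = len(X[0])   # m, the number of features

print("X =", X)
print("N = %d rows, m = %d columns" % (n_rows, m_cols))
```

Note that the name column is dropped on the way to X, a small instance of the "loss of information" that comes with choosing features.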

A Fundamental Idea: From Table to N-Dimensional Space

X =

Element   F1    F2    F3
A         4     2     2
B         4.5   1.5   3
C         3     3     5
D         1     2     2
E         3     1.5   5
F         3.5   3.5   1
..        ..    ..    ..

[Figure: users A through H plotted as points; axes Movie 1 and Movie 2, each scaled 1 to 5]

Which user is closest to user A?
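One way to answer "which user is closest to user A?" is to treat each table row as a point and compare distances. The sketch below uses rows A through F of the table and Euclidean distance; the choice of Euclidean distance is an assumption, since the slide does not name a distance function, and the helper names are illustrative.

```python
import math

# Rows A through F of the table X; columns are F1, F2, F3.
X = {
    "A": (4.0, 2.0, 2.0),
    "B": (4.5, 1.5, 3.0),
    "C": (3.0, 3.0, 5.0),
    "D": (1.0, 2.0, 2.0),
    "E": (3.0, 1.5, 5.0),
    "F": (3.5, 3.5, 1.0),
}


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closest_to(target, table):
    """Return (distance, name) of the row nearest to the target row."""
    others = [(euclidean(table[target], row), name)
              for name, row in table.items() if name != target]
    return min(others)


distance, nearest = closest_to("A", X)
print("closest to A: %s (distance %.3f)" % (nearest, distance))
```

With these six rows and this distance, B comes out nearest to A; a different distance function or extra rows could change the answer.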

Clustering by Measuring Distance (Unsupervised)

[Figure: users A through H plotted as points; axes Movie 1 and Movie 2, each scaled 1 to 5]

Distance functions
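The slide does not say which distance functions it has in mind, so the sketch below shows three common choices as an assumption: Euclidean, Manhattan, and cosine distance. The two points are rows A and B from the deck's example table.

```python
import math


def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


a = (4.0, 2.0, 2.0)   # row A of the example table
b = (4.5, 1.5, 3.0)   # row B of the example table

print("euclidean: %.4f" % euclidean(a, b))
print("manhattan: %.4f" % manhattan(a, b))
print("cosine distance: %.4f" % cosine_distance(a, b))
```

Which function is appropriate depends on the features: cosine distance ignores overall magnitude, while Euclidean and Manhattan distance are sensitive to the scale of each column.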

Clustering to Classification

[Figure: points A through H; axes Feature 1 and Feature 2, each scaled 1 to 5]

Actually: 70K -> 200K titles (dimensions), 10M plus users (points)

•  Target customers?
•  Pictures of Cats and Dogs
•  Speech recognition
•  Recognize Letters: A, B, C..
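Once some points carry known labels, distance can be used to classify a new point. The sketch below is a minimal k-nearest-neighbours classifier, one of the methods listed in the course toolbox. The feature rows follow the deck's example table, but the class labels and the new point are hypothetical and exist only to make the example runnable.

```python
import math
from collections import Counter

# Feature rows from the example table; labels are hypothetical (assumption).
training = [
    ((4.0, 2.0, 2.0), "target"),
    ((4.5, 1.5, 3.0), "target"),
    ((3.0, 3.0, 5.0), "non-target"),
    ((1.0, 2.0, 2.0), "non-target"),
    ((3.0, 1.5, 5.0), "non-target"),
    ((3.5, 3.5, 1.0), "target"),
]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_classify(point, labeled_points, k=3):
    """Majority label among the k labeled points nearest to `point`."""
    by_distance = sorted(labeled_points,
                         key=lambda item: euclidean(point, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


new_customer = (4.0, 2.5, 2.0)  # hypothetical unlabeled point
predicted = knn_classify(new_customer, training, k=3)
print("predicted class:", predicted)
```

The same pattern applies to the other bullets, with pixels or audio features in place of F1 to F3, although at the scale quoted on the slide a brute-force search like this one would be too slow.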

A Fundamental Idea: From Table to Score

X =

Cust   F1    F2    F3
A      4     2     2
B      4.5   1.5   3
C      3     3     5
D      1     2     2
E      3     1.5   5
F      3.5   3.5   1
..     ..    ..    ..

F(X)

Y =

Cust   Credit Score
A      552
B      381
C      760
D      330
E      452
F      678
..     ..

Machine Learning: Learning from Data

Input Data = Matrix X
Customer 1: [Name, income, x, y, .. Features .. z]
Customer 2: [Name, income, x, y, .. Features .. z]
Customer N: [Name, income, x, y, .. Features .. z]

Output Data = Column Vector Y
Customer 1: [20]
Customer 2: [60]
Customer N: [05]
Purchases/year, repaid loan, …

Target: What is F(X) = Y    (a formula that we don't know)
Sample data (training): (x1, y1) (x2, y2) … (xm, ym)    (we have this)
Algorithm A from H
H: Hypothesis Set: All possible algorithms or formulas
Find G(x) which is approx. F(x)

a) Supervised ML – as shown
b) Unsupervised – no labeled training data
c) Reinforcement learning – done by simulation
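A small worked case may make the roles of F, H, A, and G concrete. In the sketch below the hypothesis set H is assumed to be all straight lines y = a*x + b, the algorithm A is ordinary least squares, and G is the fitted line. The "unknown" F and the four training points are hypothetical stand-ins so the result can be checked.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (algorithm A)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def g(x):
        # G(x): the learned approximation of F(x).
        return slope * x + intercept

    return g, slope, intercept


def hidden_f(x):
    """Hypothetical stand-in for the unknown target F (assumption)."""
    return 2.0 * x + 1.0


# Training sample (x1, y1) ... (xm, ym): this is the part "we have".
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [hidden_f(x) for x in train_x]

g, slope, intercept = fit_line(train_x, train_y)
print("G(x) = %.2f * x + %.2f" % (slope, intercept))
print("G(5) = %.2f" % g(5.0))
```

Because the stand-in F is itself a line and the sample has no noise, G recovers it exactly; with real data G would only approximate F, and how well it does on out-of-sample points is the quantity of interest.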


The Key is multi-layer learning algorithms such as Deep Convolutional Neural Networks!

Neural net results are close to human results

This means Accuracy

Top 8 Business Models Using Data

1.  Knowing your customer, better targeting and relationship. E.g. Target, Disney, Netflix
2.  Improving physical product or service with complementary information. E.g. UPS, FedEx
3.  Data-driven reliability or security. E.g. GE, BMW, Siemens
4.  Information Brokers, Arbitrage, and Trading Opportunities. E.g. Investment funds.
5.  Improving the customer journey/experience. E.g. Harrah’s
6.  Functional Applications: HR/Hiring, Operations etc. E.g. Walmart, Baseball, Sports
7.  Efficiency or better performance per dollar cost. E.g. General IT, SAP, etc
8.  Risk Management, regulation, and compliance. E.g. Compliance 360


The Data-X System View

Possible Inputs Code Blocks
•  Web or Poll
•  Download
•  Crawl
•  Stream
•  Social Net

Algorithm Options w/ Tables/Matrix
•  Prediction/Classification
•  Natural Language, State Space
•  Feature Extraction

Compute including test, train, split

Pandas: Short Term Storage
Long Term Storage: SQL and File Formats (JSON, CSV, Excel)

Possible Output Code Blocks
•  Web
•  Email
•  Control Decision
•  … Chatbot

Feedback from External System (World)
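The "test, train, split" step in the compute block can be sketched in a few lines. The version below is a hand-written stand-in using only the standard library; it borrows the name train_test_split from scikit-learn for familiarity but is not that library's function, and the rows and the 25% test fraction are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded for repeatability
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (feature, result) rows standing in for collected data.
rows = [(i, i * 10) for i in range(8)]
train, test = train_test_split(rows, test_fraction=0.25, seed=0)

print("train rows:", len(train))
print("test rows:", len(test))
```

A model is then fitted on the train rows only and scored on the test rows, which plays the role of the out-of-sample data in the framework slides.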

A High Level Framework

In this class, we will learn ways to:
* Collect the data about objects
* Combine data sources when needed
* Use tables and databases to store
* Practice making good “features”
* Learn to Analyze; Compute, Classify, Predict
* Visualize some results

Use cookbook applications to get you started on your own applied project in a group.
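The combine, store, and analyze steps in this list can be sketched end to end with Python's built-in sqlite3 module. Everything below is illustrative: the two data sources, the table name, and the column names are hypothetical, and an in-memory database stands in for long-term storage.

```python
import sqlite3

# Two hypothetical data sources keyed by customer id (assumption).
profiles = {1: {"age": 34}, 2: {"age": 51}, 3: {"age": 27}}
purchases = {1: 20, 2: 60, 3: 5}

# Combine: one row per customer with a feature and an observed result.
rows = [(cid, profiles[cid]["age"], purchases[cid])
        for cid in sorted(profiles)]

# Store: an in-memory SQLite table stands in for a real database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers ("
             "id INTEGER PRIMARY KEY, age INTEGER, "
             "purchases_per_year INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
conn.commit()

# Analyze: simple queries over the stored table.
(avg_purchases,) = conn.execute(
    "SELECT AVG(purchases_per_year) FROM customers").fetchone()
older = conn.execute(
    "SELECT id FROM customers WHERE age > 30 ORDER BY id").fetchall()
conn.close()

print("average purchases per year: %.2f" % avg_purchases)
print("customers over 30:", [cid for (cid,) in older])
```

Passing a file path instead of ":memory:" would keep the table between runs, which is closer to the long-term storage role described in the system view.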

End of Section