Workshop: Make Predictions From Your Data Using R. UTM, 15 July 2019
Workshop Instructors: Dr. Norhaiza Ahmad, Dr. Noraslinda M. Ismail, Dr. Shariffah Suhaila Syed Jamaludin
Make Predictions from your Data using R
PART A: R INTRO
Dr. Norhaiza Ahmad
Department of Mathematical Sciences, Faculty of Science
Universiti Teknologi Malaysia
http://people.utm.my/norhaiza/
Start your R Session on the PC in this Lab!
1. Go to Desktop.
2. Click folder: Mathematics Software (or Math Software).
3. Click folder: R.
4. There are three R applications:
   i. R i386 3.4.0
   ii. R x64 3.4.0
   iii. RStudio
   Choose RStudio.
Otherwise, go to the START button and search for RStudio.
Workshop Schedule: Make Predictions from your Data using R
PART A: Intro to Predictive Linear Model
  • Introduction to Modelling: theory & terminology. Data structure.
  • R Quickie: About R. R base. Basic R syntax. RStudio interface. Packages. Help.
PART B: Predictive Linear Model for Continuous Data (Response)
  (i) Basic regression model, single predictor variable
  (ii) Multiple linear regression, model assessment
PART C: Predictive Linear Model for Counts Data (Response)
  • Intro to Generalised Linear Model
Data: Big Data vs Small Data
Extract Information From Data
• Classification
• Discrimination
• Comparison
• Relationship
• Prediction
Example: Predicting Body Mass Index
Database of individuals. Variables measured: height, weight, body fat.
How can BMI be used to predict body fat percentage?
Predictive Modelling
Predictive modeling is a technique that uses mathematical and computational methods to predict or forecast an event or outcome.
Types of Predictive Models
Parametric Models
• models use a fixed number of parameters
• based on an underlying probabilistic model
Non-Parametric Models
• models do not have a fixed number of parameters
• not tied to a fixed, assumed form of the underlying probabilistic model
• e.g. Decision Trees, etc.
This workshop focuses on: Regression Model, i.e. the linear regression model.
General modeling framework
We will be using a dataset called house_prices:
• House sale prices for King County, Washington, USA, which includes Seattle.
• It includes homes sold between May 2014 and May 2015.
• 21,613 observations, 21 variables.

id             1         2         3         4         5         6         ...  21613
date           20141013  20141209  20150225  20141209  20150218  20140512  ...  20141015
price          221900    538000    180000    604000    510000    1225000   ...  325000
bedrooms       3         3         2         4         3         4         ...  2
bathrooms      1         2.25      1         3         2         4.5       ...  0.75
sqft_living    1180      2570      770       1960      1680      5420      ...  1020
sqft_lot       5650      7242      10000     5000      8080      101930    ...  1076
floors         1         2         1         1         1         1         ...  2
waterfront     0         0         0         0         0         0         ...  0
view           0         0         0         0         0         0         ...  0
condition      3         3         3         5         3         3         ...  3
grade          7         7         6         7         8         11        ...  7
sqft_above     1180      2170      770       1050      1680      3890      ...  1020
sqft_basement  0         400       0         910       0         1530      ...  0
yr_built       1955      1951      1933      1965      1987      2001      ...  2008
yr_renovated   0         1991      0         0         0         0         ...  0
zipcode        98178     98125     98028     98136     98074     98053     ...  98144
lat            47.5112   47.721    47.7379   47.5208   47.6168   47.6561   ...  47.5941
long           -122.257  -122.319  -122.233  -122.393  -122.045  -122.005  ...  -122.299
sqft_living15  1340      1690      2720      1360      1800      4760      ...  1020
sqft_lot15     5650      7639      8062      5000      7503      101930    ...  1357

Question: Can we predict the sale price of houses based on their features?
Consider only certain features of the houses: sqft_living, condition, bedrooms, yr_built, waterfront.

id             1         2         3         4         5         6         ...  21613
price          221900    538000    180000    604000    510000    1225000   ...  325000
bedrooms       3         3         2         4         3         4         ...  2
sqft_living    1180      2570      770       1960      1680      5420      ...  1020
waterfront     0         0         0         0         0         0         ...  0
condition      3         3         3         5         3         3         ...  3
yr_built       1955      1951      1933      1965      1987      2001      ...  2008

Can we predict the sale price of houses based on the features of houses?
$y = f(x) + \varepsilon$
• $y$: house sale prices (in $), the response/outcome variable of interest
• $x$: square feet, condition, bedrooms, year house was built, waterfront, the explanatory/predictor variables
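The table above can be typed straight into R. The sketch below keeps only the first six rows as printed on the slide, with the response plus the five selected features; the object names houses, y and x are hypothetical choices. The full 21,613-row table appears to be the house_prices data shipped with the moderndive package listed in the Task later on (an assumption worth checking with ?house_prices once moderndive is loaded).

```r
# First six rows of house_prices as printed above (hypothetical object names)
houses <- data.frame(
  price       = c(221900, 538000, 180000, 604000, 510000, 1225000),
  bedrooms    = c(3, 3, 2, 4, 3, 4),
  sqft_living = c(1180, 2570, 770, 1960, 1680, 5420),
  waterfront  = c(0, 0, 0, 0, 0, 0),
  condition   = c(3, 3, 3, 5, 3, 3),
  yr_built    = c(1955, 1951, 1933, 1965, 1987, 2001)
)

str(houses)   # 6 observations of 6 variables

# Response/outcome variable of interest: sale price in $
y <- houses$price

# Explanatory/predictor variables: the five selected features
x <- houses[, c("sqft_living", "condition", "bedrooms", "yr_built", "waterfront")]
dim(x)        # [1] 6 5
```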
Background: General modeling framework formula
$y = f(x) + \varepsilon$
where
• $y$: outcome (response) variable of interest
• $x$: explanatory/predictor variable(s)
• $f$: function of the relationship between $y$ and $x$
• $\varepsilon$: unsystematic error component, i.e. noise
The modelling problem
Consider $y = f(x) + \varepsilon$:
• $y$ (response/outcome variable of interest): known, given by the data ($n$ observations)
• $x$ (explanatory/predictor variables): known, given by the data ($n$ observations)
• $f$ (function of the relationship between $y$ and $x$): unknown
• $\varepsilon$ (noise): unknown
Aim:
1. Fit a model $\hat{f}(\cdot)$ that approximates $f(\cdot)$ while ignoring $\varepsilon$ → separate signal from noise.
2. Generate fitted/predicted values $\hat{y} = \hat{f}(x)$.
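To make "separate signal from noise" concrete, here is a minimal sketch that simulates data from a known $f$ plus random noise and lets lm() estimate $\hat{f}$ and return the fitted values $\hat{y}$. The straight line f(x) = 2 + 3x, the sample size and the noise level are arbitrary, hypothetical choices; with real data $f$ is unknown, and the simulation only lets us check that the estimate lands near the truth.

```r
set.seed(1)                          # make the random noise reproducible
n   <- 100
x   <- runif(n, min = 0, max = 10)   # explanatory variable (known, from data)
f   <- function(x) 2 + 3 * x         # hypothetical "true" f(), unknown in practice
eps <- rnorm(n, mean = 0, sd = 1)    # unsystematic error, i.e. noise
y   <- f(x) + eps                    # observed response: y = f(x) + eps

fit   <- lm(y ~ x)                   # fitted model f-hat (a straight line)
coef(fit)                            # estimates should be close to 2 and 3
y_hat <- fitted(fit)                 # fitted/predicted values y-hat = f-hat(x)
head(cbind(y, y_hat))                # observed vs fitted, first rows
```

Every observed value splits into a fitted part and a leftover residual, so y_hat + residuals(fit) reproduces y exactly.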
Fitted (Predicted) Linear Regression Model
General modelling: the observed value of $y$ is $y = f(x) + \varepsilon$.
To fit a linear regression model, we estimate $f(\cdot)$ by $\hat{f}(\cdot)$.
Thus, the fitted (predicted) linear regression model is given by $\hat{y} = \hat{f}(x)$.
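For a linear regression model with a single predictor (the case taken up in Part B), the fitted model can be written as below. Here $\hat{\beta}_0$ and $\hat{\beta}_1$ denote the estimated intercept and slope, notation introduced for illustration rather than taken from the slides, and $e$ is the residual left over for each observation:

```latex
\hat{y} = \hat{f}(x) = \hat{\beta}_0 + \hat{\beta}_1 x,
\qquad
e = y - \hat{y}
```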
Recall: Types of Data
Data
• Qualitative (categorical)
  § Nominal
  § Ordinal (rank)
• Quantitative
  § Discrete (counts, frequency)
  § Continuous (interval)
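In R, these data types map onto different object types, and the type matters later when choosing a model for the response. The sketch below encodes some of the house variables from the first six rows; treating waterfront as nominal and condition as an ordinal rating on a 1 to 5 scale are assumptions made for illustration.

```r
# Hypothetical R encodings of selected house variables (first six rows)
waterfront <- factor(c(0, 0, 0, 0, 0, 0), levels = c(0, 1),
                     labels = c("no", "yes"))      # nominal: categories, no order
condition  <- factor(c(3, 3, 3, 5, 3, 3), levels = 1:5,
                     ordered = TRUE)               # ordinal: ordered categories
bedrooms   <- as.integer(c(3, 3, 2, 4, 3, 4))      # discrete: counts
price      <- c(221900, 538000, 180000, 604000, 510000, 1225000)  # continuous

is.ordered(waterfront)        # [1] FALSE
is.ordered(condition)         # [1] TRUE
condition[4] > condition[1]   # [1] TRUE: ordinal values can be compared
table(bedrooms)               # frequency of each count value
```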
SCOPE OF WORKSHOP
Data → Prediction using Linear Regression Model, using R software
• Continuous data (response variable)
  § Simple Linear Regression
  § Multiple Linear Regression
• Discrete data: counts data (response variable)
  § Generalized Linear Model
About R: a quickie
1. About R
2. Interface options for R
3. RStudio
4. Anatomy of R: Functions and Packages
5. Saving & Quitting
About R
• A computer language, with orientation towards statistical applications
• Open-source software: non-commercial, FREE
  § open exchange, publicly accessible, community-oriented software
• Origin in academia:
  § solid foundation of core statistical and numerical algorithms, and continues to grow to this end
Why R?
• FREE
• Large community of R users
• Grows in tandem with development in statistics
• Economic sustainability
• Application codes: the latest research work is likely to be available to use
• Help available:
  § Extensive help documentation in the system
  § Just ask! Or browse the archived Q&A
How to download R?
R base: http://www.r-project.org/
Points to Note
• When you download and install R, you are downloading and installing base R and selected packages.
• Once the R download is complete:
  § an R icon will appear on your desktop.
Main Interface Options -R Users
• R base
• R Commander
• RStudio
How to download RStudio?
Procedure to download RStudio on your own PC:
(1) Make sure you have already downloaded R base.
(2) Download the free version of RStudio at https://www.rstudio.com/products/rstudio/download/
R Interface: R base vs RStudio
Start your RStudio Session!
1. Go to Desktop.
2. Click folder: Mathematics Software (or Math Software).
3. Click folder: R.
4. Choose RStudio.
Otherwise, go to the START button and search for RStudio.
Procedure to download RStudio on your own PC: (1) make sure you have already downloaded R base; (2) download the free version of RStudio at https://www.rstudio.com/products/rstudio/download/
About R: a quickie
1. About R
2. Interface options for R
3. RStudio
   • Navigate panels
   • Entering & executing commands: R console, R script
4. Anatomy of R: Functions and Packages
   • Functions in R & Help
   • R packages
5. Saving & Quitting
Navigating RStudio: The Panels
4 panels are shown:
• Panels on the left: run codes/commands
  § Source Panel: write your commands here (like a notepad)
  § Console Panel: write your commands after the prompt >
• Panels on the right: maintain the working environment
  § Environment Panel
  § Viewer Panel
ASIDE: Navigating RStudio
At the Ribbon, choose Tab View → Panes.
You can specify which panel to display, or display a particular pane, e.g. to display the Console only:
At the Ribbon, choose Tab View → Panes → Zoom Console.
To return to the display of all panels:
At the Ribbon, choose Tab View → Panes → Show All Panes.
How to Run & Execute R commands?
Method 1: R console
- Commands are executed
- Like a rough paper
- Not convenient for code storage
Method 2: R script
How to Run commands in R: Method 1: R Console
Type the following on the R Console Panel: write 6+3 and hit Enter.
> 6+3
[1] 9
• Write your R commands after each R prompt (>).
• Hit Enter to execute the command.
• Other operators: +, -, *, /
The output appears below the command; the [1] marks the index of the first entry shown (entry index 1).
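Typing the other operators at the prompt gives the results shown in the comments below; the ^ (power) operator is an extra that is not listed on the slide.

```r
6 + 3    # addition        [1] 9
6 - 3    # subtraction     [1] 3
6 * 3    # multiplication  [1] 18
6 / 3    # division        [1] 2
6 ^ 2    # power           [1] 36
```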
Handy Tip
• Use the up and down arrow keys to recall previous R command lines in the console.
Making Comments in R
• Comments in R are denoted by "#".
• Anything written after "#" on a line will not be read as an R command.
• Comment lines are useful for making notes in your program.
> 31 %% 7 #remainder after division of 31 by 7
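A comment can fill a whole line or follow a command on the same line. The sketch below runs the example above alongside %/% (integer division), an extra operator not on the slide, to show how quotient and remainder fit together.

```r
# A full-line comment: R ignores everything after the #
31 %% 7                   # remainder after division of 31 by 7:  [1] 3
31 %/% 7                  # integer quotient of 31 by 7:          [1] 4
7 * (31 %/% 7) + 31 %% 7  # quotient and remainder recover 31:    [1] 31
```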
R Object Assignment
• R is an object-oriented programming language. Each input/output can be assigned (stored) to an object. Object names are case-sensitive.
• Once an R object is assigned, it can be called upon at any time as long as it is saved. Here the number 2 is stored in an object called "x".
# use symbol '=' or '<-' for assignment
> x = 2
> x
> y <- 2
> y
# CALL UP the R object to display results (if required)
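A short sketch of both assignment symbols and of case sensitivity; the extra object X is a hypothetical addition used only to show that x and X are different objects. At the prompt, = and <- behave the same; they differ only in special places such as inside function-call arguments.

```r
x = 2          # assignment with =
y <- 2         # assignment with <- (same effect at the prompt)
X <- 10        # a different object: names are case-sensitive

x + y          # [1] 4
x == X         # [1] FALSE
ls()           # lists the objects stored in the global environment
```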
R Object Assignment
• In general, R objects are stored in an R workspace, also known as the global environment.
# Use ';' to combine commands on one line
> x = 2; x
> len = 2; len
> x=2; len=2; x+2
Handy Tip 2
• Use brackets around an assignment to automatically call up (display) the stored object.
# Use '()' to auto call up assignment
> (x=2)
> (len=2)
Important Rules on object assignment
• Variable names are case-sensitive.
• No blanks in names (can use _ or . to join words, but not -).
• Start with a letter (capital or lower-case).
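The rules can be checked in R itself: base R's make.names() converts a candidate name into a syntactically valid one, so any name it changes breaks a rule. The object names below are hypothetical examples; the invalid assignments are left as comments because running them would stop with an error.

```r
house_price <- 221900      # valid: underscore joins words
house.price <- 221900      # valid: dot joins words
# house-price <- 221900    # invalid: R reads this as house minus price
# 2price <- 221900         # invalid: names cannot start with a digit

# make.names() returns the syntactically valid version of each name
make.names(c("house_price", "house.price", "house-price", "house price", "2price"))
# [1] "house_price" "house.price" "house.price" "house.price" "X2price"
```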
How to Run & Execute R commands? Method 2: R script
A script file is a file where you can type your commands and run them on the console at your own convenience. It is similar to a notepad/text file!
How to Run commands in R: Method 2: R Script
Now open a script file.
At the Ribbon, choose Tab File → New File → R Script.
This is a new script, "Untitled1". You can name it and save it like a notepad.
Writing and executing commands from an R Script
Step 1: Type these in your R script file. Note: there is no ">" prompt in a script file.
# This is my first R script
6+3
6-3
Step 2: To run commands in a script file, place the cursor on the line you want to execute and press the Run button on the tab. → The results of executing these lines will appear in the Console.
Step 3: Save your script file in your preferred drive/folder. → This is similar to saving a notepad/text file.
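A saved script can also be run in one go with base R's source(). As far as we know, RStudio's Source button (next to Run) does essentially this. The sketch writes the three lines above to a temporary file (a hypothetical stand-in for your own saved script) and runs it.

```r
# Hypothetical script: write the example lines to a temporary .R file
script <- tempfile(fileext = ".R")
writeLines(c("# This is my first R script", "6+3", "6-3"), script)

# Run every line; echo = TRUE prints each command and its result
res <- source(script, echo = TRUE)
res$value   # value of the last expression in the script: [1] 3
```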
Handy Tip
• You can also type the R code in the script file (Source panel) and execute each line by pressing Ctrl + Enter (Windows) or Cmd + Enter (macOS), as an alternative to pressing the Run icon on the Source panel.
• It is advisable to write your code in the R script file (Source panel), so that you will be able to save your work at the end of your coding session.
How R works: Anatomy of R
• R has many codes for many inbuilt functions, datasets & Help documentation.
• These are contained in 'Packages' developed by the R team and the community.
Packages in R = the BRAIN of R.
When you download and install R for the first time, you will automatically be downloading and installing the base package and selected packages (from CRAN):
• Base package
• Selected packages
• Add-on packages: other packages can be added on too when required!
How R works: Anatomy of R. Examples of 'Packages' in R
• Package 'base': core package, automatically installed when you download R
• Package 'stats': general statistical applications
• Package 'ggplot2': fancy data visualization
• ... Many more packages with different capabilities!
Each package contains various functions, datasets and help documentation.
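Besides the Packages tab in RStudio, base R can report which packages are attached and installed. A minimal sketch, using stats as the example because it ships with R:

```r
# Packages currently attached (loaded) in this session, base included
search()

# Is a package installed? TRUE/FALSE, without attaching it
requireNamespace("stats", quietly = TRUE)   # [1] TRUE: stats ships with R

# How many packages are installed on this PC?
nrow(installed.packages())
```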
How R works: Anatomy of R
• R packages are stored in certain repositories:
  § CRAN: the official/default R repository (e.g. package 'base' and many other packages)
  § Bioconductor: R packages specific to bioinformatics
  § GitHub: not specific to R, but a repository for many open-sourced projects
  § R-Forge: includes development versions of packages
• Each package contains various functions, datasets and help documentation.
Finding Packages in RStudio
To see all the packages already installed on your PC: go to the Viewer Panel and press the Packages tab. The list of packages already installed on your PC appears; these are all the packages available in your R. Use the search finder to find a specific package name.
Installing Packages in RStudio
To install packages not available in your R session, go to the Viewer Panel and press the Packages tab.
1. Click Install.
2. A pop-up window appears.
3. Write the name of the package in the pop-up box. Example: abind
4. Press Install.
Notice that the package you have installed is now contained in the list of packages of your R session.
Loading Packages in RStudio
To load (i.e. use) a package in RStudio, go to the Viewer Panel and press the Packages tab.
1. Find the package that you want to use.
2. Tick the box next to the name of the package, e.g. abind.
Notice that > library(abind) is shown in the R console.
(Alternatively, manually write at the R prompt > library("name of package"), e.g. > library(abind))
Task
In the following part of the workshop, we will use specific packages in R. Check if these packages are available in the RStudio of your PC:
• moderndive • dplyr • ggplot2 • tidyverse • MASS • glm2
If they are not in the list of packages → install the package(s).
If they are already in the list of packages → check if they are loaded in the system; otherwise, load them.
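The same task can be done with a few lines of code instead of clicking through the Packages tab. This is a sketch, not the workshop's prescribed method: it lists which of the six packages are missing, installs them from CRAN only in an interactive session such as RStudio (installation needs internet access and a CRAN mirror), and then loads every package that is available.

```r
pkgs <- c("moderndive", "dplyr", "ggplot2", "tidyverse", "MASS", "glm2")

# Which workshop packages are not yet installed?
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install   <- pkgs[!is_installed]
to_install

# Install the missing ones (only when working interactively, e.g. in RStudio)
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}

# Load (attach) every workshop package that is now installed
available <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
for (p in available) {
  library(p, character.only = TRUE)
}
```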
R functions
• R commands are executed by running a function.
• A function is a name, which is typed followed by a pair of brackets. Arguments are added inside the brackets.
> sqrt(2)
> sin(pi)
• Sometimes functions in R have extra arguments.
> sum(2,3,5)
> log(10,10)
• R has many in-built functions.
• These functions are contained in specific packages.
• These functions can also be combined and programmed manually.
• As functions are being built and contributed all the time → use HELP in R to (a) know which package they are in and (b) know how to use them.
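The examples above, with their printed results, plus a small user-defined function that combines built-in ones; hypotenuse() is a hypothetical name chosen for illustration. Note that sin(pi) prints a tiny number instead of exactly 0 because pi is stored to finite precision, and that arguments can be passed by name.

```r
sqrt(2)              # [1] 1.414214
sum(2, 3, 5)         # [1] 10
log(100, base = 10)  # named argument: [1] 2
sin(pi)              # [1] 1.224647e-16, zero up to rounding error

# A hypothetical user-defined function built from in-built ones
hypotenuse <- function(a, b) {
  sqrt(a^2 + b^2)    # Pythagoras: length of the longest side
}
hypotenuse(3, 4)     # [1] 5
```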
Getting help for R functions in RStudio: go to the Viewer Panel and press the Help tab.
Help functions in RStudio
Getting help for R functions in RStudio: at the Viewer Panel (Help tab), use keyword search to find the function & related package.
Task: Search for linear model
(e.g. one of the results listed: ggplot2::fortify.lm)
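The Help tab search also has console equivalents in base R. In this sketch the two help calls are guarded by interactive() so the block can also run as a plain script; find() answers "which package is this function in?" for packages already attached.

```r
# Open the help page of a function (same as typing ?lm at the prompt)
if (interactive()) help("lm")

# Keyword search across installed packages (same as ??"linear model")
if (interactive()) help.search("linear model")

# Which attached package provides a function?
find("lm")   # [1] "package:stats"
```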
Saving Stuff & Exiting
SAVING
• Make it a practice to save your work in a script file.
• The workspace is your current working environment. This includes all the functions, objects, etc. that you have created in that session.
• Unless you need to recall certain objects regularly, you do not need to save the workspace.
EXITING
• To exit, hit X (top right corner) or, at the R prompt, type:
> quit()
• Exiting brings up a screen asking whether to save the workspace.
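If you do need to recall a particular object regularly, one option is to save just that object rather than the whole workspace. A minimal sketch with base R's saveRDS()/readRDS(); the object and the file location are hypothetical, and quit() is left commented out so the example runs to the end.

```r
x   <- c(221900, 538000, 180000)       # an object worth keeping
rds <- file.path(tempdir(), "x.rds")   # hypothetical file location
saveRDS(x, rds)                        # save this one object to disk
x2  <- readRDS(rds)                    # read it back, e.g. in a later session
identical(x, x2)                       # [1] TRUE

# Exit without saving the whole workspace image
# quit(save = "no")
```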
NEXT
• PART B(i): Predictive Linear Model for Continuous Data (Response): single predictor variable