Microsoft R Server for Data Science
Uploaded by: data-science-thailand
Category: Data & Analytics
![Page 1: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/1.jpg)
![Page 2: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/2.jpg)
Data Science Team
Data Engineering
Data Science
Application Development
Business Acumen
Data Management
Data Dividend
![Page 3: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/3.jpg)
Typical advanced analytics lifecycle
Preparation: Ingest, Transform, Explore
Modeling: Model
Operationalization: Deploy, Score, Visualize, Measure
![Page 4: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/4.jpg)
Data scientists should be creating and testing models
Data scientists are rare and expensive
[Figure: the analytics lifecycle diagram, repeated from the previous slide]
![Page 5: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/5.jpg)
But the reality is different …
Data scientist focus time
[Figure: the same lifecycle diagram (Preparation, Modeling, Operationalization), annotated with 80%, 5% and 15%]
![Page 6: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/6.jpg)
Decisions
Operationalize
Preparation
Model
![Page 7: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/7.jpg)
• Embrace Open Source
• Evolutionary Path to Cloud
• Democratize Data Science
• Skill Re-Use
• Transparent Scaling
• Facilitate Collaboration
• Decouple Data Science from Platforms
• Leverage Hybrid Cloud Architecture
• Accelerate Experimentation
• Streamline Deployment
Broaden the Talent Pool
Increase Productivity
Modernize Infrastructure
Maximize Innovation
Drive Down TCO
![Page 8: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/8.jpg)
From Data to Action On Premises
Data: data sources, apps, sensors and devices
Intelligence: Microsoft R Server & SQL R Services
Action: people, apps, automated systems
Cortana Intelligence
![Page 9: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/9.jpg)
Challenges posed by open source R
• Lack of commercial support
• Inadequate modeling performance
• Complex deployment processes
• Limited data scale
![Page 10: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/10.jpg)
R from Microsoft brings
Peace of mind
Efficiency
Speed and scalability
Flexibility and agility
![Page 11: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/11.jpg)
High-performance, Scalable R
Linux, Windows, Hadoop & Teradata
R Server Technology
![Page 12: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/12.jpg)
Open (community): Revolution R Open is now R Open
Commercial: Revolution R Enterprise is now R Server
![Page 13: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/13.jpg)
Escapes R’s traditional memory limits
Scales predictive modeling using parallelization
Distributes computation across cores and nodes
Minimizes data movement using in-database, in-MapReduce and in-Apache Spark execution
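
The claims above rest on block-wise (external memory) algorithms: each block of rows is reduced to a small summary, and the summaries are combined at the end. The following is a minimal sketch of that idea in base R for a least-squares fit. It is not the ScaleR implementation, and the function names (chunk_stats, combine_stats, fit_chunked_lm) are hypothetical.

```r
# Illustrative only: a chunked least-squares fit in base R.
# This is NOT the ScaleR implementation; it only shows why block-wise
# processing needs memory proportional to the block size, not the data size.

# Per-block sufficient statistics for least squares: X'X and X'y.
chunk_stats <- function(chunk, formula) {
  mf <- model.frame(formula, chunk)
  X <- model.matrix(formula, mf)
  y <- model.response(mf)
  list(xtx = crossprod(X), xty = crossprod(X, y))
}

# Statistics from two blocks combine by simple addition, so blocks can be
# processed one after another or in parallel on different cores or nodes.
combine_stats <- function(a, b) {
  list(xtx = a$xtx + b$xtx, xty = a$xty + b$xty)
}

# Solve the normal equations from the combined statistics.
fit_chunked_lm <- function(chunks, formula) {
  stats <- Reduce(combine_stats, lapply(chunks, chunk_stats, formula = formula))
  drop(solve(stats$xtx, stats$xty))
}

# Usage on a built-in data set, split into blocks of 8 rows.
blocks <- split(mtcars, ceiling(seq_len(nrow(mtcars)) / 8))
coef_chunked <- fit_chunked_lm(blocks, mpg ~ wt + hp)
coef_full <- coef(lm(mpg ~ wt + hp, data = mtcars))
```

Because the per-block statistics are additive, the chunked coefficients agree with an ordinary in-memory fit up to floating-point error. The sketch assumes numeric predictors; factor columns would need consistent levels across blocks.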
![Page 14: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/14.jpg)
![Page 15: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/15.jpg)
• Remote Execution
• Transparent Parallelization
• Shared Resource Management
[Diagram labels: Corporate Applications; Desktops & Servers; direct web services; Microsoft R Server; Hadoop; Data Nodes]
![Page 16: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/16.jpg)
Distributed R: How Does a Remote Compute Context Work?
[Diagram labels: Algorithm Master; Predictive Algorithm; Big Data; Analyze Blocks in Parallel; Load Block at a Time; Distribute Work, Compile Results; “Pack and Ship” Requests to Remote Environments; Results]
Microsoft R Server functions
• A compute context defines where processing takes place.
• Example: a remote context such as Hadoop MapReduce
• Microsoft R Server functions are prefixed with rx
• The compute context that is currently set determines the processing location
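
The compute-context idea can be sketched in base R as follows: the analysis function never states where it runs and only asks the currently selected context to apply the per-block work. This is an illustration under stated assumptions, not RevoScaleR code. All names (contexts, set_context, chunked_mean, the "sequential" and "packed" contexts) are hypothetical, and the "packed" context only imitates the "pack and ship" step by serializing each request locally.

```r
# Illustrative only: the compute-context pattern in base R.
# None of these names belong to RevoScaleR.

context_state <- new.env()

contexts <- list(
  # Process blocks one after another in the current R session.
  sequential = function(chunks, fn) lapply(chunks, fn),
  # Stand-in for a remote context: each request (function plus data) is
  # serialized, "shipped", and evaluated from the deserialized copy.
  packed = function(chunks, fn) {
    lapply(chunks, function(ch) {
      request <- serialize(list(fn = fn, data = ch), connection = NULL)
      received <- unserialize(request)
      received$fn(received$data)
    })
  }
)

# Select the context that later calls will use.
set_context <- function(name) {
  stopifnot(name %in% names(contexts))
  assign("current", name, envir = context_state)
  invisible(name)
}

# The analysis code is identical under every context.
chunked_mean <- function(chunks, column) {
  force(column)
  run <- contexts[[get("current", envir = context_state)]]
  parts <- run(chunks, function(ch) c(sum = sum(ch[[column]]), n = nrow(ch)))
  totals <- Reduce(`+`, parts)
  unname(totals["sum"] / totals["n"])
}

blocks <- split(mtcars, ceiling(seq_len(nrow(mtcars)) / 8))

set_context("sequential")
mean_seq <- chunked_mean(blocks, "mpg")

set_context("packed")   # only the context changes, not the analysis call
mean_packed <- chunked_mean(blocks, "mpg")
```

Both calls return the same value; only the line that selects the context differs between them.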
Copyright Microsoft Corporation. All rights reserved.
Microsoft R Server “Client”: console (R IDE or command line)
Microsoft R Server “Server”: remote context
![Page 17: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/17.jpg)
ScaleR models can be deployed from a server or edge node to run in Hadoop without re-coding the functional R model for MapReduce.

Compute context R script (sets where the model will run)

Local parallel processing, Linux or Windows:
### SETUP LOCAL ENVIRONMENT VARIABLES ###
myLocalCC <- "localpar"
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext(myLocalCC)
### CREATE LINUX, DIRECTORY AND FILE OBJECTS ###
localFS <- RxNativeFileSystem()
AirlineDataSet <- RxXdfData("AirlineDemoSmall.xdf", fileSystem = localFS)

In Hadoop:
### SETUP HADOOP ENVIRONMENT VARIABLES ###
myHadoopCC <- RxHadoopMR()
### HADOOP COMPUTE CONTEXT ###
rxSetComputeContext(myHadoopCC)
### CREATE HDFS, DIRECTORY AND FILE OBJECTS ###
hdfsFS <- RxHdfsFileSystem()
hdfsFS
# AirlineDataSet is assumed to be an RxXdfData source on hdfsFS (HDFS path not given)

Functional model R script (does not need to change to run in Hadoop)
### ANALYTICAL PROCESSING ###
### Statistical summary of the data
rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### Cross-tabulate the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
### Linear model and plot
hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsXdfArrLateLinMod$coefficients)
![Page 18: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/18.jpg)
DeployR
Integrates R into application infrastructures
• Web services software development kit for integrating analytics via APIs for Java, JavaScript and .NET
Capabilities:
• Enterprise authentication & security
• Horizontal scaling
• Invokes R Scripts from web services calls
• RESTful interface for easy integration
• Works with:
• Web & mobile apps
• Leading BI & Visualization tools
• Business rules and streaming engines
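
To make "invokes R scripts from web service calls" concrete, the sketch below shows the general shape of an R entry point that a web-service layer could call: named inputs in, named outputs out. It does not use the DeployR API. The function score_request, its field names and the example model are hypothetical.

```r
# Illustrative only: the shape of an R script that a web-service layer could
# invoke. No DeployR API is used here; score_request and its field names
# are hypothetical.

# Fit once for the example (in practice the model would be loaded from storage).
model <- lm(mpg ~ wt + hp, data = mtcars)

# Entry point: named inputs in, named outputs out. Errors are reported as
# data so the calling application can handle them without parsing R messages.
score_request <- function(inputs) {
  required <- c("wt", "hp")
  missing_fields <- setdiff(required, names(inputs))
  if (length(missing_fields) > 0) {
    return(list(
      success = FALSE,
      error = paste("missing input:", paste(missing_fields, collapse = ", "))
    ))
  }
  newdata <- data.frame(wt = as.numeric(inputs$wt), hp = as.numeric(inputs$hp))
  list(success = TRUE, mpg = unname(predict(model, newdata)))
}

ok <- score_request(list(wt = 2.5, hp = 110))   # complete request
bad <- score_request(list(wt = 2.5))            # hp is missing
```

Keeping the entry point free of any transport detail is one way to let the same script serve web, mobile and BI callers; the service layer would handle authentication and the JSON or XML encoding.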
DeployR DevelopR
![Page 19: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/19.jpg)
On-demand sales forecasting
Real-time social media analysis
Leveraging the power of Office365
![Page 20: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/20.jpg)
Microsoft R Server provides a unique opportunity to deliver advanced analytics capabilities to customers who have already invested in storing their data on non-Microsoft platforms such as Hadoop, Teradata and Linux.
Hadoop
- Cloudera CDH, Hortonworks HDP, and HDInsight
![Page 21: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/21.jpg)
![Page 22: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/22.jpg)
![Page 23: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/23.jpg)
Write Once – Deploy Anywhere
R Server portfolio
Cloud
RDBMS
Desktops & Servers
Hadoop & Spark
EDW
R Server Technology
![Page 24: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/24.jpg)
Included in SQL Server 2016
Reuse and optimize existing R code
Eliminate data movement
In-database deployment
Memory and disk scalability
No R memory limits
Write once, deploy anywhere
Enterprise speed and scale
Near-DB analytics
Parallel threading and processing
Reuse SQL skills for data engineering
Cost effectiveness
Scalability and choice
Simplicity and agility
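
As a rough illustration of in-database execution: in SQL Server 2016 R Services, an R script is run through the sp_execute_external_script stored procedure. By default the rows of the input query arrive in R as a data frame named InputDataSet, and the data frame assigned to OutputDataSet is returned to the caller. The sketch below shows only the R portion and fakes InputDataSet from a built-in data set so that it runs outside SQL Server; the column names are assumptions made for the example.

```r
# Illustrative only: the R portion of an in-database script.
# Inside SQL Server, InputDataSet would be filled from the input query;
# here it is faked from a built-in data set so the snippet runs anywhere.
InputDataSet <- mtcars[, c("cyl", "mpg")]

# Aggregate inside the script so that only the small result leaves the database.
OutputDataSet <- aggregate(mpg ~ cyl, data = InputDataSet, FUN = mean)
names(OutputDataSet) <- c("cyl", "avg_mpg")
```

Only the three-row summary is returned, which is the sense in which in-database deployment avoids moving the underlying data.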
![Page 25: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/25.jpg)
• The industry’s broadest R-based platform
• Enterprise scale atop Spark, Hadoop, RDBMSs & EDWs
• Freedom from memory limits
• Choice of Windows and Linux IDEs
• Stable deployment
• Write-once-deploy-anywhere portability
• Investment protection
• Hybrid cloud evolution
![Page 26: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/26.jpg)
![Page 27: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/27.jpg)
Introduces the following topics:
1. Creating an R Server on Spark HDInsight cluster
2. Installing RStudio for the cluster
3. Running R using RStudio on the web
Reference: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-r-server-get-started/
![Page 28: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/28.jpg)
Get Essentials Microsoft Developer Resources and R Server Developer Edition: aka.ms/ch9.th
Microsoft R Server on-premises: www.microsoft.com/R-Server
Microsoft R Server on Azure (Cloud): https://azure.microsoft.com/en-us/marketplace/partners/microsoft-r-products/microsoft-r-server/
![Page 29: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/29.jpg)
![Page 30: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/30.jpg)
![Page 31: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/31.jpg)
What is R?
• A statistics programming language
• A data visualization tool
• Open source
• 2.5+M users
• Taught in most universities
• Thriving user groups worldwide
• 7000+ free algorithms in CRAN
• Scalable to big data
• New and recent graduates use it
Language
Platform
Community
Ecosystem
• Rich application & platform integration
![Page 32: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/32.jpg)
Convergence with Flexibility
Scalable Algorithms
R: Write Once Deploy Anywhere
Templates & Samples
Microsoft R Server Family
R & Python to AML Interop.
Cortana Intelligence
![Page 33: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/33.jpg)
DistributedR
ScaleR
ConnectR
DevelopR
Code Portability Across Platforms
In the Cloud: Azure HDI / Spark
Workstations & Servers: Linux, Windows
Clustered Systems: Linux clusters (LSF for now), Microsoft HPC
EDW: Teradata
Hadoop: Hortonworks, Cloudera, MapR & HDInsight
![Page 34: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/34.jpg)
R + CRAN
Microsoft R
DistributedR
DeployR DevelopR
ScaleR
ConnectR
Delivers High Performance Parallel Distributed Analytics Across Individual and Clustered Systems
• Cloudera
• Hortonworks
• MapR
• Apache Spark
• IBM Platform LSF
• Microsoft HPC Clusters
• Teradata Database
• Red Hat
• SuSE Servers
• Windows
DistributedR
![Page 35: Microsoft R Server for Data Sciencea](https://reader031.fdocuments.in/reader031/viewer/2022030317/5871295a1a28abe4448b6c6d/html5/thumbnails/35.jpg)
RevoDeployR Web Services
Client libraries (JavaScript, Java, .NET)
Clients: desktop applications (e.g. Excel), business intelligence (PowerBI), interactive web or mobile applications
Transport: HTTP/HTTPS, JSON/XML
Services: session management, authentication, data/script management, administration
Grid nodes: R, R scripts
Roles: end user, application developer, admin, data scientist