Data Culture Series - Keynote & Panel - Reading - 12th May 2015
Transcript of Data Culture Series - Keynote & Panel - Reading - 12th May 2015
UK Business Lead for BI & Advanced Analytics
Jon Woodward : Connect & Follow
@JLWoodward
www.linkedin.com/in/jonathanwoodward
#DataCulture
SQL Server
PowerBI
AzureML
Hadoop
DataFactory
DocumentDB
Search
EventHub
Stream Analytics
Revolution R
Azure DW
Azure Data Lake
2015… We have reached a Tipping Point
50% of organizations will consider cloud deployment
50% of new licence spend will be for Data Discovery & Analytics
50% of BI & Analytics spend will be driven by the Business
50% of users will be touched by BI and Analytics
The Microsoft data platform
Mobile, Reports, Natural language query, Dashboards, Applications
Streaming, Relational, Internal & external, Non-relational, NoSQL
Orchestration, Machine learning, Modeling, Information management, Complex event processing
Data Culture Series
Data Culture Exec Session: 4 events – final event 14th May, London. CXO Level – Invite only
Data Culture Summit: 10 events; 1000 customers. Power User, Analyst, Architect, Developer, DBA, Data Scientist
Final 2 events this fiscal (Reading, London)
Date Location
12 May READING Data Culture series
19 May LONDON Data Culture series
Summer Break
Date Location
September 16th/17th London 2 Day Data Culture Event
Nov London Future Decoded
Jan TBC 2 Day Data Culture Event
Value of Data
IoT
Business Apps
CMO, CFO, Sales
Business Case For Data
CDO, CIO, CTO
Architect Level
Data Platform Workshop
Modernising your Data Platform
Data Developer
Multi-track - Hands-on
BI, Advanced Analytics, IoT, Data Services, Big Data
Dashboard in a Day
Analyst
Hands on BI
Time
10:00 – 10:30 Intro – Jon Woodward
10:30 – 11:30 Keynote
Ric Howe – Data Platform Update (Build & Ignite)
Niel Miller – Revolution Analytics Overview
11:30 – 12:30 Immersion Tracks - Overview
12:30 – 13:15 Lunch & Expo
13:15 – 15:00 Immersion Hands on
15:00 – 15:15 Break & Expo
15:15 – 16:30 Immersion Hands on
16:30 – 17:00 Panel and Close
Microsoft, HortonWorks, KPMG, Revolution
Microsoft Azure Data Lake
HDFS compatible
Unlimited Storage,
Petabyte files
Optimised for massive throughput
High-frequency, low-latency,
near-real-time
Native format
Azure Data Warehouse
A mashup of Azure SQL v12 and PDW
Uses Azure Data Lake for storage*
Features PolyBase
Can connect to Azure HDInsight
But also to Cloudera and Hortonworks clusters, in cloud or on-prem
Compute and storage scale separately
Compare to Amazon Redshift
Integrations
Azure Data Factory
Power BI
Azure Machine Learning – new APIs
Face, Speech, Vision, Text, Recommendations, Churn, and more
Example: Face API (How-Old.net)
Pattern recognition
Give it a photo, it will guess gender and age
Integrated with Bing images to make finding photos simple
Always Encrypted
Help protect data at rest and in motion, on-premises & cloud
Components shown: Trusted Apps, Enhanced ADO.NET Library, Column Master Key (client side), Column Encryption Key, SQL Server (ciphertext)

Query as issued by the trusted app (client side)
SELECT Name FROM Patients WHERE SSN=@SSN
@SSN='198-33-0987'

Query as received by SQL Server
SELECT Name FROM Patients WHERE SSN=@SSN
@SSN=0x7ff654ae6d

dbo.Patients (plaintext values)
Name          SSN            Country
Jane Doe      243-24-9812    USA
Jim Gray      198-33-0987    USA
John Smith    123-82-1095    USA

dbo.Patients (as held in SQL Server, SSN as ciphertext)
Name          SSN             Country
Jane Doe      1x7fg655se2e    USA
Jim Gray      0x7ff654ae6d    USA
John Smith    0y8fj754ea2c    USA

Result Set
Name
Jim Gray
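
The Always Encrypted flow above can be imitated in a few lines of Python. This is only a sketch to show the idea that the client encrypts the parameter before the query leaves the app, so the server matches ciphertext against ciphertext and never sees the plaintext SSN. Everything here is an assumption for illustration: `toy_encrypt`, `toy_decrypt`, `COLUMN_KEY`, `build_server` and `find_name_by_ssn` are hypothetical names, SQLite stands in for SQL Server, and the keystream cipher is a toy that is not secure and is not the algorithm SQL Server uses.

```python
import hashlib
import sqlite3

# Hypothetical key for illustration only; it plays the role of the
# column encryption key that only the trusted client can use.
COLUMN_KEY = b"hypothetical-column-encryption-key"


def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` bytes from the key (toy construction, NOT secure)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return stream[:length]


def toy_encrypt(key: bytes, plaintext: str) -> str:
    """Deterministic toy cipher: the same input always gives the same output."""
    data = plaintext.encode("utf-8")
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data)))).hex()


def toy_decrypt(key: bytes, ciphertext_hex: str) -> str:
    """Reverse toy_encrypt; only a holder of the key can do this."""
    data = bytes.fromhex(ciphertext_hex)
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data)))).decode("utf-8")


def build_server(rows):
    """Create the 'server' table; the SSN column only ever holds ciphertext."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Patients (Name TEXT, SSN TEXT, Country TEXT)")
    conn.executemany(
        "INSERT INTO Patients VALUES (?, ?, ?)",
        [(name, toy_encrypt(COLUMN_KEY, ssn), country) for name, ssn, country in rows],
    )
    return conn


def find_name_by_ssn(conn, ssn: str):
    """Client side: encrypt the parameter, then run the same query text."""
    encrypted_ssn = toy_encrypt(COLUMN_KEY, ssn)
    row = conn.execute(
        "SELECT Name FROM Patients WHERE SSN = ?", (encrypted_ssn,)
    ).fetchone()
    return row[0] if row else None


# Sample rows taken from the slide.
PATIENTS = [
    ("Jane Doe", "243-24-9812", "USA"),
    ("Jim Gray", "198-33-0987", "USA"),
    ("John Smith", "123-82-1095", "USA"),
]
server = build_server(PATIENTS)
```

Calling `find_name_by_ssn(server, "198-33-0987")` returns `"Jim Gray"`, while a direct `SELECT SSN FROM Patients` on the server side yields only hex ciphertext. The equality lookup works because the toy cipher is deterministic, which is the property this kind of search depends on.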
PolyBase
Query relational and non-relational data with T-SQL
T-SQL query: SQL Server, Hadoop
Quote: $658.39
Name         DOB         State
Jim Gray     11/13/58    WA
Ann Smith    04/29/76    ME
Built-in advanced analytics
In-database analytics at massive scale
Data Scientist: Interact directly with data
Data Developer/DBA: Manage data and analytics together
Built-in to SQL Server: Relational Data, Analytic Library, T-SQL Interface, Extensibility, R Integration
Microsoft Azure Marketplace: New R scripts
Example Solutions
• Sales forecasting
• Warehouse efficiency
• Predictive maintenance
• Credit risk protection
Enhanced Analysis & Reporting Services
Scalable on-premises BI solutions & new modern reports
Internet Explorer, Firefox, Safari, Chrome, Edge
Stretch SQL Server into Microsoft Azure
Securely stretch cold tables to Azure with remote query processing
App, Query
Customer data, Product data, Order History
Stretch to cloud: Microsoft Azure
Order history (Name, Date, Item), shown as ciphertext rows such as 0x21ba906fdb52 1ba906fd 2ba906f
Always Encrypted
Data platform
AZURE STREAM ANALYTICS
HDINSIGHT
POWER BI 2.0
AZURE SQL DATABASE
AZURE MACHINE LEARNING
SQL SERVER 2014
DOCUMENT DB
AZURE SQL DATA WAREHOUSE
REVOLUTION R
AZURE DATA FACTORY
AZURE DATA LAKE
(A Zettabyte has 21 zeros)
(40,000,000,000,000,000,000,000)
(= 3 million books per person)
Volume
Variety
Velocity
Revolution Analytics Proprietary
But…
• Wider data sets (many more variables / features)
• Real time scoring (streaming data in fast…)
THE PERFECT STORM
+ Computing Power
+ Bigger Data
+ Pace of Business
+ Customer Expectations
+ Data Science
+ Computer Science
+ Management Science
Better Business Decisions
Better Business Outcomes
- Robert Gentleman & Ross Ihaka, 1993
- Version 1.0 in 2000
- 3.0+ Million Global Users
- 6200+ “Packages”
- R in Universities = New Talent
- Open Source = Access To Innovation
- Programming Agility
- Huge range of predictive analytics
We love R!
OUR COMPANY
The leading provider of advanced analytics software and services based on open source R, since 2007
OUR PRODUCTS
REVOLUTION R: The enterprise-grade predictive analytics application platform based on the R language
Revolution R Enterprise: Big Data Big Analytics Ready
Big Data Distributed Execution Platform: DistributedR, ConnectR, ScaleR
Development & Deployment Tooling: DevelopR, DeployR
Language Interpreter and Standard R Algorithm Suites: R + CRAN, RevoR
– Enterprise readiness
– High performance analytics
– Multi-platform architecture
– Data source integration
– Development tools and Integration tools
Enterprise Technical Support
File Name     Compressed File Size (MB)    No. Rows       Open Source R (secs)    Revolution R (secs)
Tiny          0.3                          1,235          0.001                   0.05
V. Small      0.4                          12,353         0.21                    0.05
Small         1.3                          123,534        0.03                    0.03
Medium        10.7                         1,235,349      1.94                    0.08
Large         104.5                        12,353,496     60.69                   0.42
Big (full)    12,960.0                     123,534,969    Memory!                 4.89
V.Big         25,919.7                     247,069,938    Memory!                 9.49
Huge          51,840.2                     494,139,876    Memory!                 18.92
22 years of US flight data: 124m rows, 29 variables
Linear Regression model - arrival delay as function of day-of-week
Tests run on 4 core machine, 16GB RAM and 500GB SSD
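
The "Memory!" entries mark the runs where the data no longer fit in RAM. One general way around that limit is to stream the data in chunks and keep only small running totals. The sketch below shows this for the benchmark's model shape, and it is not presented as how Revolution R is implemented. It relies on the fact that a least-squares regression of arrival delay on a single categorical factor (day-of-week) gives fitted values equal to the per-day mean delay. The function name `fit_delay_by_day` and the sample numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One observation: (day_of_week, arrival_delay_in_minutes)
Row = Tuple[str, float]


def fit_delay_by_day(chunks: Iterable[List[Row]]) -> Dict[str, float]:
    """Fit delay ~ day-of-week by streaming over chunks.

    Only two small dictionaries are kept in memory, however many rows
    pass through, so the full data set never has to be loaded at once.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for chunk in chunks:
        for day, delay in chunk:
            totals[day] += delay
            counts[day] += 1
    # For a single categorical predictor, the least-squares fitted value
    # for each level is the mean of the response within that level.
    return {day: totals[day] / counts[day] for day in totals}


# Hypothetical sample data, split into two chunks as if read from disk.
sample_chunks = [
    [("Mon", 10.0), ("Tue", 0.0)],
    [("Mon", 20.0), ("Tue", 4.0)],
]
fitted = fit_delay_by_day(sample_chunks)
```

Here `fitted` is `{"Mon": 15.0, "Tue": 2.0}`. Because `chunks` can be any iterable, a generator that reads a file block by block would work the same way.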
DistributedR, ScaleR, ConnectR, DeployR
In the Cloud
Workstations & Servers: Desktops, Server
Clustered Systems: Microsoft HPC, Linux
EDW: Teradata
Hadoop: Hortonworks, Cloudera, MapR
+ HDInsight
+ SQL Server vNext
+ Azure ML
+ Power BI
In-database analytics at massive scale
Data Scientist: Interact directly with data
Data Developer/DBA: Manage data and analytics together
SQL Server: Relational Data, Analytic Library (native functions), T-SQL Interface, Extensibility
R Integration coming!
Example Solutions
• Fraud detection
• Sales forecasting
• Warehouse efficiency
• Predictive maintenance
Benefits
Faster deployment of ML models
Faster performance (Move compute to the data)
Improved scalability
In-DB Analytic Scenarios
Real-time fraud detection
Customer churn analysis
Product recommendations
Microsoft Confidential. Preliminary Information. Dates and capabilities subject to change. Microsoft makes no warranties, express or implied.
Assemble and standardize all of a marketer’s data into a Hadoop cluster
Apply the rigor of a medical researcher with patented methodology
Know whom to reach
Identify and attribute the revenue drivers
More info at:
http://www.revolutionanalytics.com/content/datasong%E2%80%99s-big-data-analytics-platform-marketing-optimization-helps-clients-understand
Pre-processing: crop and align to fixed size.
Feature extraction.
Features
Ensemble of models used: SVM, Random Forests, and Neural Networks.
Then Logistic Regression used to assess pass / fail
Pass or Fail
More info at:
http://info.revolutionanalytics.com/30apr15-iot-and-the-manufacturing-floor.html
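
The last stage of that pipeline, a logistic regression that turns the ensemble's outputs into a pass / fail call, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the three base models are not trained here, their scores are hypothetical numbers in the range 0 to 1 (one column each for the SVM, Random Forest and Neural Network), and the logistic regression is fitted with plain gradient descent. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 2000,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def pass_or_fail(scores: Sequence[float], weights: Sequence[float], bias: float) -> str:
    """Combine the base-model scores into the final decision."""
    probability = sigmoid(sum(w * s for w, s in zip(weights, scores)) + bias)
    return "Pass" if probability >= 0.5 else "Fail"


# Hypothetical base-model scores per item: (SVM, Random Forest, Neural Network).
BASE_SCORES = [
    (0.90, 0.80, 0.95), (0.70, 0.90, 0.80), (0.85, 0.60, 0.90),  # known passes
    (0.20, 0.10, 0.30), (0.40, 0.20, 0.10), (0.10, 0.35, 0.20),  # known fails
]
LABELS = [1, 1, 1, 0, 0, 0]

WEIGHTS, BIAS = fit_logistic(BASE_SCORES, LABELS)
```

With the fitted `WEIGHTS` and `BIAS`, `pass_or_fail((0.95, 0.95, 0.95), WEIGHTS, BIAS)` gives `"Pass"` and `pass_or_fail((0.05, 0.05, 0.05), WEIGHTS, BIAS)` gives `"Fail"`. Fitting the combiner on the base models' outputs rather than on raw pixels is what makes this a two-stage (stacked) design.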
Hadoop: Edge & Data Node, 2 x Data Node
• Use new 3rd party data-sources of categorical data to automatically
create new variables (features), e.g. consumer spend across various
categories, locations etc.
• Split and analyze features in parallel to measure predictive quality for
credit-risk and default
• Champion / Challenger: Select top ‘n’ new features and compare
against existing features in credit risk models.
• Introduce new “Golden” features once proven to enhance model
• Legacy solution took several months to code with 6 week run process
(with manual intervention). Unsuitable for production runs!
• Revolution code implemented in Hadoop using massive parallel
processing machine-learning to automate feature selection.
• 6 weeks processing reduced to a < 24 hour automated execution
process
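
The Champion / Challenger step in the bullets above can be sketched in Python. This is an illustration under assumptions, not the production code: predictive quality is approximated here by the absolute Pearson correlation between a feature and the default flag, which is a stand-in for whatever measure the project used, and the feature names and values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def predictive_quality(values: Sequence[float], defaults: Sequence[int]) -> float:
    """Assumed quality measure: strength of association with default."""
    return abs(pearson(values, defaults))


def select_challengers(
    candidates: Dict[str, Sequence[float]],
    champions: Dict[str, Sequence[float]],
    defaults: Sequence[int],
    top_n: int,
) -> List[str]:
    """Keep the top_n new features that beat the weakest existing feature."""
    ranked = sorted(
        candidates,
        key=lambda name: predictive_quality(candidates[name], defaults),
        reverse=True,
    )[:top_n]
    weakest_champion = min(
        predictive_quality(values, defaults) for values in champions.values()
    )
    return [
        name for name in ranked
        if predictive_quality(candidates[name], defaults) > weakest_champion
    ]


# Hypothetical data: four customers, 1 = defaulted.
DEFAULTS = [0, 0, 1, 1]
NEW_FEATURES = {"spend_travel": [1, 2, 3, 4], "noise": [1, -1, -1, 1]}
EXISTING_FEATURES = {"existing_score": [0, 1, 1, 1]}

golden = select_challengers(NEW_FEATURES, EXISTING_FEATURES, DEFAULTS, top_n=2)
```

Here `golden` is `["spend_travel"]`: it scores about 0.894 against 0.577 for the existing feature, while `noise` scores 0 and is dropped. Each feature is scored independently of the others, which is what lets the scoring be split across a cluster and run in parallel.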
External 3rd Party
Data Sources
Customer Credit
Process
What next?
• If you are using R (or SAS, SPSS, Matlab…) today, need scale,
speed and support, and want to get on the road to Microsoft Advanced
Analytics, come and talk to us!
• More info? www.revolutionanalytics.com
Panel
Ric Howe Microsoft
Tim Marston HortonWorks
Andrew Morgan KPMG
Simon Field Revolution Analytics
Ric H (Microsoft) Q: With all the recent announcements at //Build and Ignite, what aspects should we be most excited about?
Tim M (Hortonworks) Q: If Hadoop is the answer to Big Data, where is it heading? What is the future vision?
Simon F (Revolution) Q: With the decades of investment in SAS, why are companies moving to R and why should we?
Get Hands on…
Trial : Revolution R Open
Trial : Hadoop (HDInsight, HDP)
Trial : SQL Server 2014
Trial : PowerBI
Trial : Machine Learning (AzureML)
Community Events
SQL Saturday – Manchester, July 25th
SQL Relay, October 12-22nd – 8 Locations
PASS BA*, London, November
PASS Summit* – US, October 27-30th
SQL Saturday – Edinburgh, June 13th
* See Jen Stirrup for Discount
IoT Next Steps
• Contact our Azure advisory team at [email protected]
• Developer showcases, demos and deep technical learning with MS experts
• http://aka.ms/tdoazure
• Connect with Microsoft and others in IoT.
• Contact [email protected]
http://aka.ms/iotworkshop
• Download the Hands On Lab
• June 11th Reading: 1-Day IoT & Data “Hackathon” hands on learning
• Waitlist registrants receive 1st priority for next event
Come Back for more…
Date Location
16 September READING
10 November LONDON
27 November READING
3 December LONDON
27 January LONDON
24 February LEEDS
24 March EDINBURGH
8 April BIRMINGHAM
12 May READING
19 May LONDON
Get Ready for Sept…
UK Business Lead for BI & Advanced Analytics
Jon Woodward : Connect & Follow
@JLWoodward
www.linkedin.com/in/jonathanwoodward
#DataCulture