Population Health Management using Azure Data Lake Analytics and Power...
Uploaded by duongkhuong
From data to decisions and actions
What happened? Reports
Why did it happen? Interactive dashboards
What will happen? Predictive models
What should I do? Recommendations & automation
Insight
Machine Learning, Cloud, Big Data & IoT
Population Health Management is shifting…
Identifying optimal LASIK treatment plans
Predicting Post-Operative Visual Acuity for LASIK Surgeries
“We have touched more than 20 million people in this
journey. This job has become even easier because of the
many technology platforms… applying those for the
betterment of health”
Gullapalli N Rao, MD, D.Med
Founder-Chair, LV Prasad Eye Institute
Scenario: While 95% of such surgeries are successful, the commonest
side effect is a residual refractive error and poor uncorrected
visual acuity (UCVA).
Solution: Utilizing surgery data from previous patients, models are built
to predict the UCVA.
Result
• Help determine whether a patient should undergo LASIK
surgery or not
• Identifying the most promising type of laser surgery for
patients and optimal surgery parameters (such as suction
time, flap, and hinge details)
Early screening for dyslexia with eye-tracking
Optolexia helps schools identify students at risk for dyslexia earlier with a cloud tool
Scenario: Many students with dyslexia go untreated or are identified late.
Solution
As a student reads text on the screen of a laptop, tablet, or
desktop computer with an eye tracker mounted at the
bottom, the tool captures the student’s eye movements.
The data is used to calculate a numerical result that
identifies the likelihood that the student has dyslexia
Result
• The sooner children are diagnosed with dyslexia, the
sooner they can begin learning coping strategies to
improve their reading skills and academic performance.
• By taking advantage of cloud computing and machine
learning, Optolexia aims to help schools identify students
at risk for dyslexia significantly earlier than current
screening tests.
“Because Optolexia screening is such an easy
process, we may be able to reliably identify
dyslexic students earlier in school and provide the
educational support that they need for academic
success”
Karin Tosteberg, Literacy Development
Coordinator, Municipality of Järfälla
Speeding up research & results
University transforms Life Sciences Research with Cloud Big Data Solution
Scenario: DNA sequencing analysis requires supercomputing
resources and Big Data storage that many researchers lack.
Solution
SeqInCloud (short for "sequencing in the cloud") is built for
analyzing next-generation sequencing data, with the main
focus on variant discovery and genotyping.
SeqInCloud seamlessly generalizes the GATK pipeline,
allowing it to run in the cloud using HDInsight and
Microsoft Azure in order to maximize portability.
Result
• Easier, more cost-effective access to DNA sequencing
tools and resources
• Even faster, more exciting advancements in medical
research
• Supports collaborative analysis from anywhere, anytime
"Windows Azure is enabling us to keep up with
the data deluge in the DNA sequencing space.
We’re not only analyzing data faster, but
analyzing it more intelligently."
Wu Feng
Professor of Computer Science, Virginia Tech
Fix problems proactively before they start
Weka Health Solutions building smart fridges for
global clinicians and vaccine management
“Clinicians in areas of Africa and other regions where
power is unstable or inaccessible can use our Smart
Fridge to store and dispense vaccines. And the Fridge is
small enough that you can put it in a van. So if you can’t
bring the people to the vaccine, you can bring the
vaccine to the people.”
Alan Lowenstein, COO of Weka Health Solutions
Scenario: To inoculate more people against diseases, clinicians need an
easier, more effective way to safely store vaccines and
manage inventory.
Solution: The Vaccine Smart Fridge uses an IoT platform that collects
real-time data from numerous sensors on every unit to enable
24×7 monitoring and analysis.
Result
• The Fridge automates vaccine storage and dose dispensing
to save time and enhance patient care. It includes remote
monitoring services to ensure vaccines are stored at the
right temperature, while automatic inventory tracing saves
staff time and ensures a reliable vaccine supply.
• Controlled refrigeration and monitoring also help reduce
financial losses
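The remote temperature monitoring described above can be sketched roughly as follows. This is a minimal illustration, not the Weka implementation: the `Reading` record, the unit names, the 2 to 8 °C safe band, and the rule of flagging three consecutive out-of-range readings are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumed safe storage band in degrees Celsius (not stated on the slide).
SAFE_MIN_C = 2.0
SAFE_MAX_C = 8.0


@dataclass(frozen=True)
class Reading:
    unit_id: str   # hypothetical fridge identifier
    minute: int    # minutes since monitoring started
    temp_c: float  # sensor temperature in degrees Celsius


def find_excursions(
    readings: Iterable[Reading], min_run: int = 3
) -> List[Tuple[str, int, int]]:
    """Return (unit_id, first_minute, last_minute) for every run of at least
    `min_run` consecutive out-of-range readings on the same unit."""
    by_unit = {}
    for reading in readings:
        by_unit.setdefault(reading.unit_id, []).append(reading)

    excursions = []
    for unit_id, unit_readings in by_unit.items():
        run = []  # current streak of out-of-range readings
        for reading in sorted(unit_readings, key=lambda r: r.minute):
            if reading.temp_c < SAFE_MIN_C or reading.temp_c > SAFE_MAX_C:
                run.append(reading)
                continue
            # Back in range: report the streak if it was long enough.
            if len(run) >= min_run:
                excursions.append((unit_id, run[0].minute, run[-1].minute))
            run = []
        # A streak may still be open when the data ends.
        if len(run) >= min_run:
            excursions.append((unit_id, run[0].minute, run[-1].minute))
    return excursions


# Made-up sample data: one reading per minute for two units.
sample = [
    Reading("fridge-a", m, t)
    for m, t in enumerate([5.0, 9.0, 9.5, 10.0, 6.0, 9.0, 5.0])
] + [
    Reading("fridge-b", m, t)
    for m, t in enumerate([4.0, 4.0, 1.0, 1.0, 1.0])
]
alerts = find_excursions(sample)
print(alerts)
```

Requiring a short run of bad readings rather than a single one is one way to avoid alerting on a door briefly opened; a real deployment would pick that rule from its own requirements.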
Cortana Intelligence Suite
Data Sources: Apps; Sensors and devices; Data
Information Management: Event Hubs; Data Catalog; Data Factory
Big Data Stores: SQL Data Warehouse; Data Lake Store; DocumentDB
Machine Learning and Analytics: HDInsight (Hadoop and Spark); Stream Analytics; Data Lake Analytics; Machine Learning; Azure Analysis Services
Intelligence: Cortana; Bot Service; Cognitive Services
Dashboards & Visualizations: Power BI
Action: People; Automated Systems; Apps (Web, Mobile, Bots)
Big Data as a cornerstone of Cortana Intelligence
Data Sources: Apps; Sensors and devices; Data
Information Management: Event Hubs; Data Catalog; Data Factory
Big Data Stores: SQL Data Warehouse; Data Lake Store
Machine Learning and Analytics: HDInsight (Hadoop and Spark); Stream Analytics; Data Lake Analytics; Machine Learning
Intelligence: Cortana; Bot Framework; Cognitive Services
Dashboards & Visualizations: Power BI
Action: People; Automated Systems; Apps (Web, Mobile, Bots)
Big Data is driving transformative changes
Data characteristics
• Traditional: relational data with highly modeled schema
• Big Data: all data with schema agility
Costs
• Traditional: specialized HW
• Big Data: commodity HW
Culture
• Traditional: operational reporting; focus on rear-view analysis
• Big Data: experimentation leading to intelligent action, with machine learning, graph, A/B testing
However, there are challenges to Big Data…
• Obtaining skills and capabilities
• Determining how to get value
• Integrating with existing IT investments
*Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)
But, Microsoft has done it before
We needed to better leverage data and analytics to do more experimentation
So we:
• Designed a data lake for everyone to put their data into
• Built tools approachable by any developer
• Created machine learning tools for collaborating across large experiment models
Result:
• Across Microsoft, ten thousand developers doing experimentation leading to better insights
• Leading to growth in our Microsoft businesses:
• Office productivity revenue (45% YoY)*
• Intelligent Cloud (100% YoY)*
• Bing search share doubles
[Chart: Growth of data @ Microsoft, 2010 to 2015, on a scale from petabytes to exabytes. Sources shown: Windows, SMSG, Live, Bing, CRM/Dynamics, Xbox Live, Office365, Malware Protection, Microsoft Stores, Commerce Risk, Skype, LCA, Exchange, Yammer]
* Microsoft. FY16 Q4 Results, URL: http://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast
Microsoft is now taking everything we’ve learned on this journey
and bringing it to our customers
Technology. Cost. Culture.
Azure Data Lake Store
A No limits Data Lake that powers Big Data Analytics
Petabyte size files and Trillions of objects
Scalable throughput for massively parallel
analytics
HDFS for the cloud
Always encrypted, role-based security &
auditing
Enterprise-grade support
Azure Data Lake Analytics
A No limits Analytics Job Service to power intelligent action
Start in seconds, scale instantly, pay per job
Develop massively parallel programs with
simplicity
Debug and optimize your big data programs
with ease
Virtualize your analytics
Enterprise-grade security, auditing and
support
Azure HDInsight
A Cloud Spark and Hadoop service for the Enterprise
Reliable with an industry leading SLA
Enterprise-grade security and monitoring
Productive platform for developers and
scientists
Cost effective cloud scale
Integration with leading ISV applications
Easy for administrators to manage
63% lower TCO than deploying your own
Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Azure Data Lake
Analytics: U-SQL
HDInsight: Hive, R Server
YARN
Store: HDFS
Store and analyze data of any kind and size
Develop faster, debug and optimize smarter
Interactively explore patterns in your data
No learning curve
Managed and supported
Dynamically scales to match your business
priorities
Enterprise-grade security
Built on YARN, designed for the cloud
Develop massively parallel programs with simplicity
• U-SQL: a simple and powerful language that’s familiar and easily extensible
• Unifies the declarative nature of SQL with expressive power of C#
• Leverage existing libraries in .NET languages, R and Python
• Massively parallelize code on diverse workloads (ETL, ML, image tagging, facial detection)
Debug and Optimize your Big Data programs with ease
• Deep integration with Visual Studio and Visual Studio Code
• Easy for novices to write simple queries
• Integrated with U-SQL
• Actively offers recommendations to improve performance and reduce cost
• Playback visually displays job run
Highest availability guarantee in the industry for peace of mind
• Managed, monitored and supported by Microsoft
• Enterprise-leading SLA—99.9% uptime
• No IT resources needed for upgrades and patching
• Microsoft monitors your deployment so you don’t have to
99.9% SLA
Lower total cost of ownership
• No hardware
• Pay only for the processing used per job
• No paying for unused cluster capacity
• Independently scale storage and compute
• No need to hire specialized operations team
Big Data Stores
A hyper-scale repository for big data analytics workloads
• A Hadoop Distributed File System for the cloud
• No fixed limits on file size
• No fixed limits on account size
• Unstructured and structured data in their native format
• Massive throughput to increase analytic performance
• High durability, availability, and reliability
• Azure Active Directory access control
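Keeping data in its native format means the schema is applied when the data is read rather than when it is stored. The sketch below illustrates that idea in plain Python; it does not use the Data Lake Store API, and the file names, field names, and sample values are made up for the example.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Optional

# Hypothetical raw files, kept exactly as they arrived.
RAW_FILES = {
    "visits.csv": "patient_id,visit_date,systolic\n17,2017-01-03,128\n42,2017-01-04,141\n",
    "devices.jsonl": '{"patient_id": 17, "steps": 5400}\n'
                     '{"patient_id": 42, "steps": 2100, "hr": 77}\n',
}


def read_records(name: str, text: str) -> List[dict]:
    """Parse a raw file into dictionaries based on its extension."""
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    if name.endswith(".jsonl"):
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {name}")


def project(records: List[dict], schema: Dict[str, Callable]) -> List[Dict[str, Optional[object]]]:
    """Apply a schema at read time: keep the requested fields, cast them,
    and use None where a record does not carry the field."""
    rows = []
    for record in records:
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value not in (None, "") else None
        rows.append(row)
    return rows


# Two different questions, two different schemas, same stored files.
visits = project(
    read_records("visits.csv", RAW_FILES["visits.csv"]),
    {"patient_id": int, "systolic": int},
)
heart_rates = project(
    read_records("devices.jsonl", RAW_FILES["devices.jsonl"]),
    {"patient_id": int, "hr": int},
)
print(visits)
print(heart_rates)
```

The second query asks for a field (`hr`) that only some records carry; nothing had to be remodeled or reloaded to ask it.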
[Diagram: data from LOB applications, social, devices, clickstream, sensors, video, web, and relational sources lands in ADL Store and is analyzed with HDInsight, ADL Analytics, Machine Learning, Spark, and R]
Big Data Stores: SQL Data Warehouse; Data Lake Store; DocumentDB
Big data analytics made easy
• Analyze data of any kind and size
• Develop faster, debug and optimize smarter
• Interactively explore patterns in your data
• No learning curve—use U-SQL, Spark, Hive, HBase and Storm
• Managed and supported with an enterprise-grade SLA
• Dynamically scales to match your business priorities
• Enterprise-grade security with Azure Active Directory
• Built on YARN, designed for the cloud
[Diagram: Data Lake Analytics working across SQL Data Warehouse, SQL Database, Storage Blobs, Data Lake Store, and SQL Database in a VM]
Machine Learning and Analytics: HDInsight (Hadoop and Spark); Stream Analytics; Data Lake Analytics; Machine Learning; Azure Analysis Services
https://www.youtube.com/watch?v=hDfaHeza1fU
R Programming Language
R Usage Growth (Rexer Data Miner Survey, 2007-2015)
Language Popularity (IEEE Spectrum Top Programming Languages, 2016)
76% of analytic professionals report using R
36% select R as their primary tool
Popularity
• Free software
• Runs on almost any computing platform/OS
• Quite lean, functionality is modular and contained in packages
• Active and vibrant user community
• Graphics capabilities very powerful and customizable
• Objects generally have to fit into physical memory
• Recent advances
Statistical Software
Data Visualization Toolkit
Literate Programming Framework
Programming Language
CRAN Package System
The R Language
CRAN: 10,000+ add-on packages for R
CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/
https://gallery.cortanaintelligence.com/
Determine whether a patient has cancer
Kaggle Data Science Bowl 2017
Jumpstart in less than 1h
Top 10% (Jan 19th) Based on work by Miguel Fierro, Ye Xing, Tao Wu
Data Science VM (GPU)
Microsoft Cognitive Toolkit (CNTK)
LightGBM for classification
Deep Neural Nets
Solution Pieces
AlexNet, 8 layers (ILSVRC 2012): 11x11 conv, 96, /4, pool/2; 5x5 conv, 256, pool/2; 3x3 conv, 384; 3x3 conv, 384; 3x3 conv, 256, pool/2; fc, 4096; fc, 4096; fc, 1000
VGG, 19 layers (ILSVRC 2014): 2 × (3x3 conv, 64), pool/2; 2 × (3x3 conv, 128), pool/2; 4 × (3x3 conv, 256), pool/2; 4 × (3x3 conv, 512), pool/2; 4 × (3x3 conv, 512), pool/2; fc, 4096; fc, 4096; fc, 1000
GoogleNet, 22 layers (ILSVRC 2014): [Figure: stacked modules of parallel 1x1, 3x3, and 5x5 convolutions and max pooling joined by DepthConcat, with three softmax outputs (softmax0, softmax1, softmax2)]
ResNet, 152 layers (Microsoft): 7x7 conv, 64, /2, pool/2; 3 × (1x1 conv, 64; 3x3 conv, 64; 1x1 conv, 256); 8 × (1x1 conv, 128; 3x3 conv, 128; 1x1 conv, 512), first with /2; 36 × (1x1 conv, 256; 3x3 conv, 256; 1x1 conv, 1024), first with /2; 3 × (1x1 conv, 512; 3x3 conv, 512; 1x1 conv, 2048), first with /2; ave pool, fc 1000
ILSVRC: ImageNet Large Scale Visual Recognition Challenge
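The "152 layers" figure for ResNet can be checked from the listing above, which repeats a group of three convolutions in four runs of 3, 8, 36, and 3. The sketch below rebuilds the listing and counts it, assuming the usual convention of counting only convolution and fully connected layers (pooling is not counted). The names `STAGES` and `weighted_layers` are introduced for this example.

```python
from typing import List, Tuple

# (number of three-convolution blocks, inner width, output width) per stage,
# read off the ResNet listing.
STAGES: List[Tuple[int, int, int]] = [
    (3, 64, 256),
    (8, 128, 512),
    (36, 256, 1024),
    (3, 512, 2048),
]


def block(width: int, out_width: int) -> List[str]:
    """One repeated group: 1x1 conv, 3x3 conv, 1x1 conv."""
    return [
        f"1x1 conv, {width}",
        f"3x3 conv, {width}",
        f"1x1 conv, {out_width}",
    ]


def weighted_layers(stages: List[Tuple[int, int, int]]) -> List[str]:
    """List every convolution and fully connected layer in order."""
    layers = ["7x7 conv, 64"]          # the single stem convolution
    for blocks, width, out_width in stages:
        for _ in range(blocks):
            layers.extend(block(width, out_width))
    layers.append("fc 1000")           # the final classifier layer
    return layers


layers = weighted_layers(STAGES)
print(len(layers))  # 152 = 1 stem conv + 50 blocks * 3 convs + 1 fc
```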
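[Diagram: a 3 x 224 x 224 input image passes through an ImageNet-trained ResNet of N layers; the penultimate layer feeds the last layer, which outputs a class label such as "tabby cat"]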
Solution: Transfer Learning
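[Diagram: each 3 x 224 x 224 image passes through the first N-1 layers of the ResNet in CNTK (53min); the penultimate-layer features for a batch of k images (= 1 patient) feed a LightGBM boosted tree (2min) that outputs a label such as "no cancer"]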
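The data flow of this transfer-learning setup can be sketched in plain Python. This is an illustration under stated assumptions, not the Kaggle solution itself: `toy_extractor` is a hypothetical stand-in for the pretrained ResNet with its last layer removed, averaging the per-image features into one vector per patient is an assumed pooling choice, and the images and labels are made up.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[float]   # stand-in for a flattened 3 x 224 x 224 image
Vector = List[float]


def toy_extractor(image: Image) -> Vector:
    """Hypothetical stand-in for the frozen pretrained network up to its
    penultimate layer. A real extractor would return a long feature vector."""
    return [sum(image) / len(image), max(image)]


def patient_features(
    images: Sequence[Image], extract: Callable[[Image], Vector]
) -> Vector:
    """Extract features for each image of one patient, then average them
    element-wise into a single vector (assumed pooling choice)."""
    per_image = [extract(image) for image in images]
    return [sum(column) / len(column) for column in zip(*per_image)]


def build_training_table(
    patients: Dict[str, Sequence[Image]],
    labels: Dict[str, int],
    extract: Callable[[Image], Vector],
) -> Tuple[List[str], List[Vector], List[int]]:
    """One row of features and one label per patient."""
    ids = sorted(patients)
    X = [patient_features(patients[pid], extract) for pid in ids]
    y = [labels[pid] for pid in ids]
    return ids, X, y


# Made-up data: a batch of k images per patient, 1 = cancer, 0 = no cancer.
scans = {
    "patient-1": [[0.0, 2.0], [4.0, 6.0]],
    "patient-2": [[1.0, 1.0]],
}
labels = {"patient-1": 0, "patient-2": 1}

ids, X, y = build_training_table(scans, labels, toy_extractor)
print(ids, X, y)
# X and y would then be handed to a boosted-tree classifier such as LightGBM.
```

Only the small classifier at the end is trained; the network that produces the features stays frozen, which is consistent with the short boosted-tree training time shown on the slide relative to feature extraction.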
Solution: Transfer Learning
Kaggle script: https://aka.ms/dsb2017-cntk-script
Cortana Gallery notebook: https://aka.ms/dsb2017-cntk-notebook
Blog in TechNet: https://aka.ms/dsb2017-cntk-blog
How do I get started?
Get started now
Watch videos on Azure Data Lake: https://channel9.msdn.com/Series/AzureDataLake
Take courses and read documentation on Azure Data Lake:
http://aka.ms/hditraining
http://aka.ms/adlanalytics
http://aka.ms/adlstore
Learn more on the Data Lake website:
http://azure.com/datalake
http://aka.ms/datalake
https://gallery.cortanaintelligence.com/
http://aka.ms/summitprize
https://aka.ms/mdis17schedule