Future Forward with Data Sciencekirkborne.net/phuse2018/KirkBorne-PhUSE-Nov2018.pdf · Feature...
Transcript of Future Forward with Data Sciencekirkborne.net/phuse2018/KirkBorne-PhUSE-Nov2018.pdf · Feature...
Principal Data Scientist
Booz Allen Hamilton
http://www.boozallen.com/datascience
Kirk Borne@KirkDBorne
Future Forward with Data Science:How to Predict (and to Change) the Future
Principal Data Scientist
Booz Allen Hamilton
http://www.boozallen.com/datascience
Kirk Borne@KirkDBorne
Future Forward with Data Science:How to Predict (and to Change) the Future
Ever since we first explored our world…
http://www.livescience.com/27663-seven-seas.html
3
…We have asked questions about everything around us.
https://jefflynchdev.wordpress.com/tag/adobe-photoshop-lightroom-3/page/5/
4
So, we have collected evidence (data) to answer our questions,
which leads to more questions, which leads to more data collection,
which leads to more questions, which leads to BIG DATA!
y ~ 2 * x (linear growth)
y ~ 2 ^ x (exponential growth)
https://www.linkedin.com/pulse/exponential-growth-isnt-cool-combinatorial-tor-bair
y ~ x! ≈ x ^ x→ Combinatorial Growth!(all possible interconnections,linkages, and interactions)
5
DefiningBig Data
• The 3 V’s of Big Data are not just hype…
• They represent really big challenges:
1. Volume
2. Velocity
3. Variety
Source for graphic: http://www.vitria.com/blog/Big-Data-Analytics-Challenges-Facing-All-Communications-Service-Providers/ 6
DefiningBig Data
• The 3 V’s of Big Data are not just hype…
• They represent really big challenges:
1. Volume
2. Velocity
3. Variety
✓VALUE!
7
Our mission as data scientists:is to discover Value in Big Data
(especially in high-Variety data) throughData Science and Machine Learning
8
What is the Big Data Variety Challenge?
9
What is the Big Data Variety Challenge?
Source for graphic: http://www.vitria.com/blog/Big-Data-Analytics-Challenges-Facing-All-Communications-Service-Providers/
1. We collect many different sources of data.
2. But we usually store diverse data in separate silos.
3. Therefore, we cannot easily integrate the data to
combine them for unified insight.
Consider the Blind Men
and the Elephant…
10
11
Adding more data doesn’t necessarily help…
https://paulmead.com.au/blog/understand-perceptions/
Unless we can combine and integrate the different signals
into a “single view” of the thing, there will continue to be
many possible interpretations of what the source is!
Combining, connecting, and linking diverse data makes data “smart”!
Think of data not as information, but as measurements that encode knowledge.
Feature Selection is important in order to disambiguate different classes.More importantly,Class Discovery depends on choosing the right projection and selecting the right features!
Feature Selection and Projection
12
Your chosen data attributes represent a low-dimension projection of the full truth – the feature space (dimensions) in which you explore your data is a form of bias – it matters!
Projection Matters
13
Feature Selection and Model Bias:choosing features in the dark
I picked out two socks from my sock drawer this morning!
It was still dark, but that shouldn’t matter, right? After all, they are the same size … THE SAME ?!?
The Era of Big Data represents the END OF DEMOGRAPHICS (i.e., our models should no longer be based on and biased by a limited selection of attributes and features)
14
An “Easy Button” for Extracting Value from Data through Machine Learning• Pattern Discovery (Detection)
– D2D: data-to-discovery
• Pattern Recognition– D2D: data-to-decisions
• Pattern Exploration– D2D: data-to-dollars (innovation)
• Pattern Exploitation– D2V: Data-to-Value (action)
– D2A: Data-to-Action (value)
15
Pattern Discovery is easy, but Pattern Exploitation requires more data science…
Source for graphic: http://www.holehouse.org/mlclass/10_Advice_for_applying_machine_learning.html
16
Generalization is key!
(The Goldilocks model)
The most generally useful model captures the fundamental pattern in the data and takes into account the natural variance in the data.
The Goal of Machine Learning
“…is to use algorithms to learn from data,
in order to build generalizable models that
give accurate classifications or predictions,
or to find (useful) patterns, particularly
with new and previously unseen data.”
(the key is GENERALIZATION!)
https://www.innoarchitech.com/machine-learning-an-in-depth-non-technical-guide/
17
18
4 Flavors of Machine Learning
for Pattern Detection and Discovery1) Class Discovery (Clustering): Find the
categories of objects (population segments), events, and behaviors in your data. + Learn the rules that constrain the class boundaries (that uniquely distinguish them).
2) Correlation (Predictive and Prescriptive Power) Discovery: Find trends, patterns, and
dependencies in data that reveal new governing principles or behavioral patterns (the object’s “DNA”).
3) Novelty (Surprise!) Discovery: Find the new,
surprising, unexpected one-in-a-[million / billion / trillion] object, event, or behavior.
4) Association (or Link) Discovery: (Graph and
Network Analytics) – Find the unusual (interesting) data associations / links / connections across entities in your domain.
5 Levels of Analytics Maturity
in Data-Driven Applications1) Descriptive Analytics
– Hindsight (What happened?)
2) Diagnostic Analytics
– Oversight (real-time / What is
happening? Why did it happen?)
3) Predictive Analytics
– Foresight (What will happen?)
19
5 Levels of Analytics Maturity
in Data-Driven Applications1) Descriptive Analytics
– Hindsight (What happened?)
2) Diagnostic Analytics
– Oversight (real-time / What is
happening? Why did it happen?)
3) Predictive Analytics
– Foresight (What will happen?)
4) Prescriptive Analytics
– Insight (How can we optimize what
happens?) (Follow the dots / connections in
the graph!)
5) Cognitive Analytics– Right Sight (the 360 view , what is the right
question to ask for this set of data in this
context = Game of Jeopardy)
– Finds the right insight, the right action, the
right decision,… right now!
– Moves beyond simply providing answers, to
generating new questions and hypotheses.
20
PREDICTIVE
Find a function (i.e., the model) f(d,t)
that predicts the value of some
predictive variable y = f(d,t) at a future
time t, given the set of conditions found
in the training data {d}.
=> Given {d}, find y.
PRESCRIPTIVEAnalytics
Find the conditions {d’} that will produce a
prescribed (desired, optimum) value y at a
future time t, using the previously learned
conditional dependencies among the variables
in the predictive function f(d,t).
=> Given y, find {d’}.
Predictive vs Prescriptive:What’s the Difference?
21
Analytics
Analytics
Find a function (i.e., the model) f(d,t)
that predicts the value of some
predictive variable y = f(d,t) at a future
time t, given the set of conditions found
in the training data {d}.
=> Given {d}, find y.
Analytics
Find the conditions {d’} that will produce a
prescribed (desired, optimum) value y at a
future time t, using the previously learned
conditional dependencies among the variables
in the predictive function f(d,t).
=> Given y, find {d’}.
Predictive vs Prescriptive:What’s the Difference?
22
Confucius says…
“Study your past to know
your future”
PREDICTIVE PRESCRIPTIVE
PREDICTIVEAnalytics
Find a function (i.e., the model) f(d,t)
that predicts the value of some
predictive variable y = f(d,t) at a future
time t, given the set of conditions found
in the training data {d}.
=> Given {d}, find y.
PRESCRIPTIVEAnalytics
Find the conditions {d’} that will produce a
prescribed (desired, optimum) value y at a
future time t, using the previously learned
conditional dependencies among the variables
in the predictive function f(d,t).
=> Given y, find {d’}.
Predictive vs Prescriptive:What’s the Difference?
23
Confucius says…
“Study your past to know
your future”
Baseball philosopher Yogi Berra says…
“The future ain’t what it
used to be.”
© Copyright 2016 Booz Allen Hamilton – http://www.boozallen.com/datascience
Data Analytics in Medicine & Health Administration1. Benefits Administration improvement (“ACO = HIE + Analytics”: process mining,
best practices, cost-efficiency, success metrics validation)2. Do Not Pay initiatives (payment error / fraud analytics)3. Beneficiary Recommendations ("Amazon-style" predictive analytics, prescriptive
modeling)4. Consumer Engagement (personalized online web experience, "marketing
analytics")5. Health Information Exchange (HIE) Exploitation (population health discovery, link
analysis, ICD-10 mining)6. Personalized Healthcare and Patient Wellness (wearables data-sharing/mining,
health baselining)7. Personalized/Precision Medicine and Care Coordination (EHR, HIE monitoring /
mining)8. Predictive Medicine (readmissions, complications, adverse interactions)9. At-Risk Precursor Analytics (early warning signals of cancer, diabetes, heart
disease, suicidal / mental health issues, ...)10. Patient Trajectories Analysis (mining / segmentation of whole population EHR
histories, pathways, outcomes, outliers)11. Learning Health System Decision Support (advanced analytics embedded in health
system data feeds)12. What Question Should I Be Asking of My Data? (Cognitive Analytics)
24
25Source for graphic: https://data-flair.training/blogs/machine-learning-applications/
Predictive Analytics is currently the most significant application of Machine Learning (*)
(*) The set of mathematical algorithms that learn (patterns) from experience (data)
26Source for graphic: https://www.altexsoft.com/blog/datascience/machine-learning-strategy-7-steps/
Predictive Analytics is everywhere in Business Data and Machine Learning (AI) Strategy Discussions
Traditional Time Series Forecasting:Prediction based on historical patterns
Source: https://medium.com/99xtechnology/time-series-forecasting-in-machine-learning-3972f7a7a467
27
Traditional Time Series Forecasting:Autoregressive (uncertainty in prediction can be large)
Source: https://peltiertech.com/excel-fan-chart-showing-uncertainty-in-projections/
Un
cert
ain
ty!
28
Traditional Time Series Forecasting:Autoregressive (assumes future time series values
depend on the past values from the same series)
Source: http://ucanalytics.com/blogs/step-by-step-graphic-guide-to-forecasting-through-arima-modeling-in-r-manufacturing-case-study-example/
29
Traditional Time Series Forecasting:Even with very high-fidelity physics-based models,
uncertainty in prediction can be large!
Source: https://www.reddit.com/r/weather/comments/6xecax/tracking_hurricane_irma/ 30
31
Data Science provides insights into the future: to predict it and to change it!
32
Source for image: https://www.hausmanmarketingletter.com/translating-analytics-to-action/
Advances in Predictive, Prescriptive,and Cognitive Analytics provide us with
More Ways to See Around Corners
Examples of Forecasting(seeing around corners)
1) Cognitive
2) Associations
3) Graphs
4) Clustering
33
Examples of Forecasting(seeing around corners)
1) Cognitive
2) Associations
3) Graphs
4) Clustering
34
“You can see a lot by just looking”
(and you can see around corners!)
Cognitive, Contextual, Insightful, Forecastful
35https://www.speedcafe.com/2017/07/12/f1-demo-take-place-london-streets/
Examples of Forecasting(seeing around corners)
1) Cognitive
2) Associations
3) Graphs
4) Clustering
36
◼ Classic Textbook Example of Data Mining (Legend?): Data
mining of grocery store logs indicated that men who buy
diapers also tend to buy beer at the same time.
Association Discovery Example #1
37
◼ Amazon.com mines its customers’ purchase logs to
recommend books to you: “People who bought this book also
bought this other one.”
Association Discovery Example #2
38
◼ Netflix mines its video rental history database to recommend
rentals to you based upon other customers who rented similar
movies as you.
Association Discovery Example #3
39
◼ Wal-Mart studied product sales in their Florida stores in 2004
when several hurricanes passed through Florida.
◼ Wal-Mart found that, before the hurricanes arrived, people
purchased 7 times as many of {one particular product}
compared to everything else.
Association Discovery Example #4
40
◼ Wal-Mart studied product sales in their Florida stores in 2004
when several hurricanes passed through Florida.
◼ Wal-Mart found that, before the hurricanes arrived, people
purchased 7 times as many strawberry pop tarts compared
to everything else.
Association Discovery Example #4
41
Strawberry pop tarts???
http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.htmlhttp://www.hurricaneville.com/pop_tarts.html
http://bit.ly/1gHZddA42
Association Rule Discovery forHurricane Intensification Forecasting
• Research by GMU geoscientists
• Predict the final strength of hurricane at landfall.
• Find co-occurrence of final hurricane strength with specific values of measured physical properties of the hurricane while it is still over the ocean.
• Result: the association rule discovery prediction is better than National Hurricane Center prediction!
• Research Paper by GMU scientists: https://ams.confex.com/ams/pdfpapers/84949.pdf
43
Examples of Forecasting(seeing around corners)
1) Cognitive
2) Associations
3) Graphs
4) Clustering
44
(Graphic by Cray, for Cray Graph Engine CGE)
http://www.cray.com/products/analytics/cray-graph-engine
“All the World is a Graph” – Shakespeare?The natural data structure of the world is not
rows and columns, but a Graph!
45
Simple Example of the Power of Graph:Semi-Metric Space
• Entity {1} is linked to Entity {2} (small distance A)
• Entity {2} is linked to Entity {3} (small distance B)
• Entity {1} is *not* linked directly to Entity {3} (Similarity Distance C = infinite)
• Similarity Distances between A, B, and C violate the triangle inequality!
{1} {3}{2}
46
• Entity {1} is linked to Entity {2} (small distance A)
• Entity {2} is linked to Entity {3} (small distance B)
• Entity {1} is *not* linked directly to Entity {3} (Similarity Distance C = infinite)
• Similarity Distances between A, B, and C violate the triangle inequality!
• The connection between black hat entities {1} and {3} never appears explicitly
within a transactional database.
• Examples: (a) Medical Research Discoveries across disconnected journals,
through linked semantic assertions; (b) Customer Journey modeling; (c) Safety
Incident Causal Factor Analysis; (d) Marketing Attribution Analysis; (e) Fraud
networks, Illegal goods trafficking networks, Money-Laundering networks.
{1} {3}{2}
Simple Example of the Power of Graph:Semi-Metric Space
47
Customer Journey Science by Clickfox.com –The Journey Graph predicts Customer outcomes with high accuracy!
48https://www.slideshare.net/Qualtrics/how-to-leverage-analytics-design-and-development-to-transform-customer-journeys
Examples of Forecasting(seeing around corners)
1) Cognitive
2) Associations
3) Graphs
4) Clustering
49
Clustering = the process of partitioning a set of data into subsets
(segments or clusters) such that a data element belonging to any
chosen cluster is more similar to data elements belonging to
that cluster than to data elements belonging to other clusters.
= Group together similar items + separate the dissimilar items
= Identify similar characteristics, patterns, or behaviors among
subsets of the data elements.
Challenge #1) No prior knowledge of the number of clusters.
#2) No prior knowledge of semantic meaning of the clusters.
#3) Different clusters are possible from the same data set!
#4) Different clusters are possible using different similarity metrics.50
51
How to know if your clusters are good enough
Reference: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-S2-S5
R code for validation algorithms: https://cran.r-project.org/web/packages/clValid/clValid.pdf
◼ You know the clusters are good …
◼ … if the clusters are compact relative to their separation
◼ … if the clusters are well separated from one another
◼ … the “within cluster” errors are small (low variance within)
◼ … if the number of clusters is small relative to the number of data points
◼ Various measures of cluster compactness exist, including the Dunn index , the C-index, Silhouette analysis, and the DBI (Davies-Bouldin Index)
51
Application of Davies-Bouldin Index
◼ Assume K (the number of clusters) and assume other things (choice of clustering algorithm; the choice of clustering feature attributes; etc.)
◼ Measure DBI
◼ Test another set of values for the cluster input parameters (K, feature attributes, etc.)
◼ Measure DBI
◼ … continue iterating like this until you find the set of cluster input parameters that yields the best (minimum) value for DBI.
52
Scientific Discovery from
Cluster Analysis of data
parameters from events on
the Sun and around the Earth
Cluster Analysis:Find the clusters, then Evaluate them
D-
B
Ind
ex
Delay (hr) of Dst from Vsw and Bz
DBI for Dst_Vsw_Bz
0.8
0.85
0.9
0.95
1
1.05
1.1
1.15
1.2
0 1 2 3 4 5 6 7 8 9 10 11 12
Time Shift
DB
I
2C DBI
3C DBI
4C DBI
Average
Figure 10. Davies-Bouldin index for various time delays of Dst from Vsw and Bz for cases of 2 (blue), 3 (red), 4 (yellow) clusters, and the overall average (purple), indicating an optimal delay of ~2-3 hours for Dst.
Good Clusters =
Small Size relative to
Cluster Separation.
DISCOVERY! ...
Solar wind events
have the strongest
association (i.e., the
tightest clusters) with
the space plasma
events within the
Earth’s magnetosphere
about 2-4 hours after
a major plasma outburst
occurs on the Sun.
54
Next Steps…
55
Welcome to the new Hype 2018!
56https://marketoonist.com/2018/01/blockchain.html
https://datasciencebowl.com
Harness your Data Science Passion.Unleash your Curiosity.
Focus on a larger Purpose using #Data4Good and #AI4socialgood in #DataSciBowl.
57
75% of rare diseases affect children.
** https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3150084/
**
Data Science Bowl – largest global competition in DS(summary statistics for Data Science Bowls 2015-2018)
58
Thank you!Contact information, for further questions or inquiries:
Dr. Kirk Borne, Principal Data Scientist, Booz Allen Hamilton
Twitter: @KirkDBorne or Email: [email protected]
Get slides here: http://www.kirkborne.net/phuse2018/
59Booz | Allen | Hamilton