Big Data Analytics for BI, BA and QA
Uploaded by: dmitry-tolpeko
Category: Data & Analytics
2
BIG DATA
Why was it invented?
How is it used now?
How will it be used in the near future?
What do we need to do to stay competitive?
4
IN REALITY
It is very easy to start using Hadoop
and Cloud now.
So it is true that most people are now doing
traditional things, just with larger data
sets.
And at much lower cost, of course.
So it looks like only size matters, and this
is just another technology
5
BUT IT IS …
Completely new mindset and
approach to analytics
Solution to satisfy new, “mass
market” analytics
And you cannot skip it
6
YOU CAN FEEL THIS AS …
Developers (Java, .NET etc.), non-
BI and even non-IT people talk and
work with analytics today.
That was not the case before.
So what happens?
7
TRADITIONAL ANALYTICS
Expensive
Separate and isolated BI world
Analyzing transactions (data you
cannot afford to lose or calculate
with errors)
Historical data and strategic decisions
9
EVERYTHING IS ABOUT DATA
Mindset: Data Analysis
not OLTP, DWH, ETL
Kimball/Inmon
Any application: UX+Analytics
(e.g. Machine Learning)
Competing on analytics, not just
product and service
Analytics becomes operational,
mass market
10
THE NEXT BIG SHIFT?
Digital Transformation of Economy
IoT, VR, AR, Machine Learning, AI
Personalized UX
Heavily relies on analytics
11
ANALYTICS TODAY
Fast, Advanced and Predictive
Analytics
o Personalization and customization: from
summary reports to a lot of tailored
data-driven actions (in near real time)
o Fast prototyping, implementation,
deployment and fast performance
o Data lakes
12
EXAMPLE - YESTERDAY
Company sends a promo by email to
1M users, paying $1 for each email;
50,000 users purchase goods at
$25
Profit: 50,000 * $25 - $1M =
$250,000
This is what traditional analytics
does.
13
EXAMPLE - TODAY
Company uses analytics to send the promo
email to just 100,000 users; now
30,000 users purchase goods at $25
Profit: 30,000 * $25 - $100K =
$650,000
No new customers, no new
contracts – just algorithms and more
data
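The arithmetic behind the two promo examples can be checked with a short Python sketch. The function name and parameters are illustrative, and treating $25 as the amount earned per buyer follows the slides' own profit formula.

```python
def campaign_profit(emails_sent, buyers, amount_per_buyer=25.0, cost_per_email=1.0):
    """Profit as computed on the slides: buyers * $25 minus $1 per email sent."""
    return buyers * amount_per_buyer - emails_sent * cost_per_email

# Yesterday: promo sent to everyone (1M emails, 50,000 buyers).
yesterday = campaign_profit(emails_sent=1_000_000, buyers=50_000)

# Today: promo sent only to a targeted segment (100K emails, 30,000 buyers).
today = campaign_profit(emails_sent=100_000, buyers=30_000)

print(f"Yesterday: ${yesterday:,.0f}")
print(f"Today:     ${today:,.0f}")
print(f"Response rate: {50_000 / 1_000_000:.0%} -> {30_000 / 100_000:.0%}")
```

Fewer buyers in total, but the response rate rises from 5% to 30% and the email cost drops tenfold, which is where the extra profit comes from.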
14
USE CASES
o Anomaly Detection
o Recommendation Systems
o Loyalty and Retention Programs
o Optimization
o A/B Testing
o Alarms, Scoring, Diagnosis
o Demand Forecasting and so on.
15
NEW CORE SKILLS
Distributed Data Processing and
Streaming Analytics
Programming (Python, R, Spark)
Math, Statistics
Machine Learning
Deep Learning
16
MACHINE LEARNING
Automation of discovery
Automatically adapt to new
circumstances
Detect patterns
In wide use now. “Self-testing”.
Few lines of code
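As a rough illustration of "few lines of code", here is a minimal sketch of a least-squares line fit used as a demand forecast. The weekly sales figures are made up for the example, and `fit_line` is a hypothetical helper, not something from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up history: units sold in weeks 1..5.
weeks = [1, 2, 3, 4, 5]
units_sold = [110, 120, 130, 140, 150]

slope, intercept = fit_line(weeks, units_sold)
forecast_week_6 = slope * 6 + intercept
print(f"Forecast for week 6: {forecast_week_6:.0f} units")
```

The model is learned from the data, so refitting on new weeks adapts the forecast without changing the code.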
17
BUILDING BLOCKS
Enriching analysis, development and
quality in software development
o Generic algorithms vs hardcoding
endless IF-ELSE
o Discovering hidden, not obvious
patterns
o Finding anomalies, outliers vs test
cases
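One way to picture "generic algorithms vs hardcoding endless IF-ELSE" is an outlier check whose threshold is derived from the data itself. The sketch below uses the modified z-score (median and median absolute deviation); the choice of this particular technique, the 3.5 cutoff, and the sample response times are assumptions for illustration.

```python
import statistics

def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score (median/MAD based) exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread in the data, nothing can be scored
    return [v for v in values if abs(0.6745 * (v - median) / mad) > threshold]

# Made-up response times in milliseconds; one value is clearly off.
response_times_ms = [102, 98, 101, 97, 103, 99, 100, 250, 101, 96]

outliers = find_outliers(response_times_ms)
print(outliers)
```

No expected value is hardcoded per test case: the same function works for any numeric series, and the notion of "normal" moves with the data.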
18
BI TOOLS NOW
Self-service (fewer jobs?)
Advanced analytics (requires
understanding stats and machine
learning fundamentals)
19
SOURCE DATA
Non-transactional systems, weak or
no data model
Calculations with probability
Raw, unstructured data from
diverse data sources
Extracting small relevant pieces of
data from huge data sets
21
GOOD NEWS
BI people are still a good match as they
love crunching data
But a significant shift in skills is
required
22
WHY GET INVOLVED
o Cutting edge
o Challenges
o Cool stuff (predictions, AI
etc.)
o Growth, margin and revenue
25
TRADITIONAL EDW PLATFORMS
o Too expensive ($10,000 per TB and more)
o Large upfront cost
o Not easy procurement, setup and
maintenance
o Designed for relational data, SQL interface
only, limited schema flexibility
o Data must be loaded first (modeled,
prepared and moved)
o Marketing limitations for Appliances
26
TRADITIONAL OPEN SOURCE PLATFORMS
• Designed for relational data, SQL interface
only, limited schema flexibility
• Data must be loaded first (modeled,
prepared and moved)
• Not easily scalable (scale up and down)
27
TRADITIONAL DATA MINING TOOLS
• Expensive
• Smaller community (one more isolated
world)
• Targeted for enterprise users
• Longer release cycles, no way to mix tools
and try fresh new stuff etc.
• Scalability and integration issues
28
WHY BIG DATA AND CLOUD
o Extremely economically attractive
o Scalable and elastic
o Self service
o Rich and diverse data tools
o Good enough quality (and
constantly improving)
29
BIG DATA AND CLOUD DESIGN PRINCIPLES
Decoupling Data Storage and Computing
o Database engine does not own data anymore
o Simplified load/extract
o Schema on read
o Not just SQL interface
o Any computing engines on top of data
Commodity Hardware
o Fault tolerant
Scale up and down
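"Schema on read" can be sketched in plain Python: raw records are stored as-is, and each consumer applies its own schema only when reading. The record contents and the `read_with_schema` helper are hypothetical; real engines on top of Hadoop or cloud storage do this at much larger scale.

```python
import json

# Raw data is stored untouched: no upfront modeling, fields may be missing.
raw_events = [
    '{"user": "a1", "amount": "25.00", "ts": "2017-03-01"}',
    '{"user": "b2", "amount": "12.50", "channel": "email"}',
    '{"user": "c3"}',
]

def read_with_schema(lines, schema):
    """Apply a schema (field -> converter) at read time; missing fields become None."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }

# Two consumers read the same raw data with different schemas.
purchases = list(read_with_schema(raw_events, {"user": str, "amount": float}))
channels = list(read_with_schema(raw_events, {"user": str, "channel": str}))

print(purchases)
print(channels)
```

The storage layer does not own or enforce a model, so a new computing engine or a new question only needs a new schema, not a reload of the data.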
30
GROWTH PATH
From monolithic suites to a diverse and rich tool set
SQL tools on Hadoop, Cloud
Advanced Data Analysis and Analytics
o Spark, MapReduce, NoSQL
o Python, R, Java, Scala
o Statistics
o Batch, Streaming, Real-time
Machine Learning and Deep Learning
o Understand use cases
o Understand specific algorithms and their
application
o Implementation
32
LET’S WIN THIS CAR
Suppose you're on a game show, and
you're given the choice of three
doors:
Behind one door is a car; behind the
others, goats.
You pick a door, say No. 3