The modern analytics architecture
![Page 1: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/1.jpg)
The Modern Analytics Architecture
Making Big Data Useful
Joseph D’Antoni, Solutions Architect, Anexinet
May 7-9, 2014 | San Jose, CA
![Page 2: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/2.jpg)
Please silence cell phones
![Page 3: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/3.jpg)
Joey D’Antoni
• Joey has over 15 years of experience with a wide variety of data platforms, in both Fortune 50 companies and smaller organizations
• He is a frequent speaker on database administration, big data, and career management
• He is the co-president of the Philadelphia SQL Server User’s Group
• He wants you to make sure you can restore your data
![Page 4: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/4.jpg)
Agenda
• Data Warehouses: how did we get here?
• Big Data: Hadoop and more
• Modern Analytic Tools
• Building Our New Architecture
![Page 5: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/5.jpg)
Data Warehouses—A History
• Data warehousing had its origins in the 1970s, when A.C. Nielsen provided clients with data marts
• In 1988, Barry Devlin and Paul Murphy (IBM) published “An Architecture for a Business and Information System”; Bill Inmon followed in 1992 with “Building the Data Warehouse”
• In 1996, Ralph Kimball published “The Data Warehouse Toolkit”, which popularized dimensional modeling for OLAP-style analysis
![Page 6: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/6.jpg)
Data Warehouse Models
Dimensional Model
• Star schema
• The advantage is that the DW is easier to use
• Facts and dimensions make queries run faster
• Loading and ETL become more complicated
• Structural changes are very expensive
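To make the star schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_date, dim_product) and the sample rows are hypothetical, chosen only for illustration: the fact table holds the measures plus one foreign key per dimension, so every dimension is a single join away.

```python
import sqlite3

# In-memory database for illustration; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
-- The fact table stores measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20140501, 2014, 5), (20140601, 2014, 6)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Bikes"), (2, "Helmets")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(20140501, 1, 500.0), (20140501, 2, 40.0), (20140601, 1, 750.0)])

# A typical star join: each dimension is one hop from the fact table.
rows = conn.execute("""
SELECT d.month, p.category, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_date AS d ON d.date_key = f.date_key
JOIN dim_product AS p ON p.product_key = f.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()
print(rows)
```

The trade-off on the slide shows up here as well: queries stay simple, but every load has to resolve the dimension keys before a fact row can be inserted.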
![Page 7: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/7.jpg)
Data Warehouse Model
Normalization
• Tables are grouped by subject area (consumer, finance, products)
• Tables are linked by joins
• Adding information to the database is very easy
• Queries are harder to write, and joins can be very expensive for performance
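For contrast with the star schema, the sketch below models the same kind of sales data in normalized form, again with sqlite3. The tables (customer, orders, order_line, product) and rows are hypothetical. Adding a new attribute only touches one small table, but a simple business question such as revenue per customer has to walk a chain of joins.

```python
import sqlite3

# Hypothetical normalized schema, one table per entity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(customer_id));
CREATE TABLE order_line (order_id INTEGER REFERENCES orders(order_id),
                         product_id INTEGER REFERENCES product(product_id),
                         quantity INTEGER);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(10, "Bike", 500.0), (11, "Helmet", 40.0)])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(100, 1), (101, 2)])
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                 [(100, 10, 1), (100, 11, 2), (101, 10, 3)])

# Revenue per customer requires three joins across subject areas.
rows = conn.execute("""
SELECT c.name, SUM(ol.quantity * p.price) AS revenue
FROM customer AS c
JOIN orders AS o ON o.customer_id = c.customer_id
JOIN order_line AS ol ON ol.order_id = o.order_id
JOIN product AS p ON p.product_id = ol.product_id
GROUP BY c.name
ORDER BY c.name
""").fetchall()
print(rows)
```

On warehouse-scale tables, each of those joins is where the performance cost mentioned on the slide comes from.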
![Page 8: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/8.jpg)
Data Warehousing Challenges
• Data Quality
• ETL
• Performance and Scalability
• Costs: Licensing and Hardware
![Page 9: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/9.jpg)
Data Quality
![Page 10: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/10.jpg)
Extract, Transform, Load (ETL) Process
[Diagram: ETL flow labeled “Some Database”, “Your Process”, “Some”, and “Business Doesn’t Care About”]
Credit: Buck Woody, Microsoft
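The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the CSV extract, the sales table, and the "drop rows with a missing amount" rule are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an OLTP database or file.
SOURCE_CSV = """order_id,order_date,amount
1,2014-05-07,19.99
2,2014-05-08,
3,2014-05-08,5.00
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: enforce types and drop rows that fail a basic quality rule."""
    clean = []
    for row in rows:
        if not row["amount"]:  # reject rows with a missing amount
            continue
        clean.append((int(row["order_id"]), row["order_date"], float(row["amount"])))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, order_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
print(warehouse.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone())
```

Note that the quality rule lives in the transform step; every such rule is extra logic the ETL process has to own and maintain.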
![Page 11: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/11.jpg)
Performance and Scalability
• Given the volume of data, DW queries can be very slow
• We use techniques like data compression to make them faster
• CPU used to be the bottleneck; now it tends to be storage
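Why does compression make queries faster? When storage rather than CPU is the bottleneck, reading fewer bytes matters more than the CPU spent decompressing them. The sketch below uses Python's general-purpose zlib on a hypothetical, highly repetitive column; real warehouse engines use their own page or columnstore compression, so the ratio here only illustrates the idea.

```python
import zlib

# Hypothetical column of values: warehouse columns such as region or status
# codes often repeat, which is what makes them compress well.
column = ("EAST," * 50_000 + "WEST," * 50_000).encode("ascii")

compressed = zlib.compress(column, 6)
ratio = len(column) / len(compressed)
print(f"raw={len(column):,} bytes, compressed={len(compressed):,} bytes, ratio={ratio:.0f}x")

# Fewer bytes have to be read from storage; the price is some CPU to decompress.
assert zlib.decompress(compressed) == column
```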
![Page 12: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/12.jpg)
Costs
• Data warehouses need large servers
• Database systems are licensed by the size of the server (per core)
• Data warehouses need a lot of fast storage
• Large volumes of fast storage (SANs) are expensive
![Page 13: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/13.jpg)
Traditional Solutions
![Page 14: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/14.jpg)
Classic Data Analysis… Uses Just a Subset
[Diagram: ETL feeding Data Warehouse & BI Solutions]
![Page 15: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/15.jpg)
Common Technical Themes
There are many “big data” solutions, but most of them have a lot in common:
• Built-in HA/DR through multiple copies of the data
• Designed for analytics processing more than OLTP
• Derived from open-source solutions
• Designed around local storage and commodity hardware
![Page 16: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/16.jpg)
Components of Modern Architecture
• Hadoop (and its ecosystem)
• EDW
• Analytics Engine
• Visualization Engine
![Page 17: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/17.jpg)
Big Data Workflow for Combined Data and Analytics
[Diagram: workflow stages Data, Acquire, Organize, Analyze, and Decide, spanning structured, semi-structured, and unstructured data]
• Data sources: master and reference data, transactions, machine-generated data (logs), web, and text, image, audio, and video
• Components: DBMS (OLTP), files, NoSQL (key-value data store), HDFS, ETL/ELT, change data capture, real-time, message-based, Hadoop MapReduce, ODS, data warehouse, streaming (CEP engine), and in-database analytics
Analytics
• Reporting and dashboards
• Alerting and recommendations
• EPM, social apps
• Text analytics and search
• Advanced analytics
• Interactive discovery
Hardware: big data cluster, high-speed network, RDBMS cluster, in-memory analytics
Source: Gartner, Credit Suisse, 8/12
![Page 18: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/18.jpg)
Are We Leaving the RDBMS?
![Page 19: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/19.jpg)
[Chart: CPUs over time, annotated “Hadoop Project Starts” and “Exadata Launched”]
![Page 20: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/20.jpg)
Costs—Big Data versus Data Warehouse
[Chart: Hadoop and Data Warehouse Costs, comparing Hadoop and Data Warehouse on server, storage, licensing, and total cost; axis from $0 to $350,000]
• For the same cost, you could build a 15-node Hadoop cluster
• The Hadoop cluster would have 3,840 GB of RAM versus 1,024 GB in the DW server
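The RAM comparison is easy to sanity-check. The node count and both RAM totals below come from the slide; the per-node figure is derived from them rather than stated on the slide.

```python
# Figures from the slide; per-node RAM is derived, not stated on the slide.
HADOOP_NODES = 15
HADOOP_TOTAL_RAM_GB = 3840
DW_SERVER_RAM_GB = 1024

ram_per_node_gb = HADOOP_TOTAL_RAM_GB / HADOOP_NODES  # 256.0
ram_ratio = HADOOP_TOTAL_RAM_GB / DW_SERVER_RAM_GB    # 3.75

print(f"{ram_per_node_gb:.0f} GB per node, {ram_ratio:.2f}x the DW server's RAM")
```

In other words, each node needs 256 GB, and the cluster as a whole has 3.75 times the memory of the single DW server for the same spend.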
![Page 21: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/21.jpg)
Enter the Yellow Elephant
![Page 22: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/22.jpg)
Hadoop
Hadoop is the leading Big Data platform (ecosystem)
• Created by Doug Cutting and Mike Cafarella; developed largely at Yahoo
• Scales horizontally (2-socket x86 servers in massive clusters)
• Uses big, slow, local storage
• Extremely fault-tolerant
• In a nutshell, it’s a distributed file system (by default, 3 copies of the data in the cluster) plus a programming framework called MapReduce
![Page 23: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/23.jpg)
Introducing Hadoop
• Host 1: Name Node
• Host 2: Secondary Name Node
• Host 3: Data Node
• Host 4: Data Node
• Host 5: Data Node
• Host 6: Data Node
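In this layout the Name Node keeps the metadata (which blocks make up each file and where they live), while the Data Nodes store the blocks themselves. The sketch below plans that mapping for a hypothetical 300 MB file. It assumes a 128 MB block size (the Hadoop 2.x default) and the default replication factor of 3, and it uses a simplified round-robin placement, not the rack-aware policy HDFS actually applies.

```python
# Assumed values: 128 MB is the Hadoop 2.x default block size and 3 is the
# default replication factor. The placement below is a simplified round-robin,
# NOT the rack-aware policy the real Name Node uses.
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3
DATA_NODES = ["Host 3", "Host 4", "Host 5", "Host 6"]  # the Data Nodes on the slide


def place_replicas(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to distinct Data Nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many Data Nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    plan = {}
    for block in range(num_blocks):
        # Shift the starting node per block so storage is spread across hosts.
        plan[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return plan


plan = place_replicas(300 * 1024 * 1024, DATA_NODES)  # a hypothetical 300 MB file
for block, hosts in plan.items():
    print(block, hosts)

# Fault tolerance: whichever single Data Node fails, every block keeps a live copy.
for failed in DATA_NODES:
    assert all(any(h != failed for h in hosts) for hosts in plan.values())
```

The final loop is the point of the three copies: losing any one host never makes a block unreadable.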
![Page 24: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/24.jpg)
How MapReduce Works
• Automatic parallelism
• Fault tolerance
![Page 25: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/25.jpg)
Map Phase
Input file: foo.log, stored as HDFS blocks 1, 19, and 105
1) Read: splits into records
• Split 1: K:0 V…
• Split 2: K:123 V…
• Split 3: K:332 V…, K:368 V…
2) Run Map
• Map Task 1: K:INFO V…
• Map Task 2: K:INFO V:1, K:WARN V:1
• Map Task 3: K:Debug V:1, K:INFO V:1
3) Write and Sort Output
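The three map-phase steps, plus the reduce that follows, can be simulated in a single Python process. This is only a sketch: the log lines are hypothetical stand-ins for foo.log, and real Hadoop runs many map tasks in parallel on the nodes that hold each block. As on the slide, the input key is the byte offset of each line and the mapper emits (log level, 1).

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical log lines standing in for foo.log.
LOG = """2014-05-07 10:00:01 INFO service started
2014-05-07 10:00:02 WARN disk 80% full
2014-05-07 10:00:03 INFO request served
2014-05-07 10:00:04 DEBUG cache hit
2014-05-07 10:00:05 INFO request served
"""


def read_records(text):
    """Step 1: turn the input into (byte offset, line) records."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))


def map_fn(offset, line):
    """Step 2: emit (log level, 1) for each record; the offset key is ignored."""
    level = line.split()[2]
    yield level, 1


def reduce_fn(level, counts):
    """Sum the 1s emitted for each log level."""
    return level, sum(counts)


def run_job(text):
    # Map phase over every record.
    intermediate = [pair for off, line in read_records(text) for pair in map_fn(off, line)]
    # Step 3: sort by key so equal keys are adjacent (the shuffle/sort).
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: one call per distinct key.
    return [reduce_fn(key, (v for _, v in group))
            for key, group in groupby(intermediate, key=itemgetter(0))]


print(run_job(LOG))
```

Because each record is mapped independently and each key is reduced independently, the framework can split both phases across machines, which is where the automatic parallelism and fault tolerance come from.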
![Page 26: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/26.jpg)
Hadoop Ecosystem
HDFS
MapReduce
Note: This is only a subset of the ecosystem!
![Page 27: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/27.jpg)
YARN
![Page 28: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/28.jpg)
Spark and Shark
• Hadoop 2 Enhancements
• Spark is in-memory
• Shark integrates Spark with Hive
![Page 29: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/29.jpg)
Hadoop Architectural Decisions
• Distribution
• Components
• Support
• Cloud vs On-Premises
![Page 30: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/30.jpg)
Choosing Your Hadoop Distribution
![Page 31: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/31.jpg)
Hadoop Vendors
| Technology | Vendor | Description |
|---|---|---|
| Hadoop Distributions | Apache | Completely open-source software for distributed clusters and MapReduce |
| | Cloudera | Industry-leading commercial distribution with good management tools |
| | Hortonworks | Open-source distribution, Apache-compatible |
| | MapR | Multiple enhancements to Apache Hadoop (rewrite of HDFS); high performance, enterprise-ready |
| | Pivotal HD | EMC spinoff with strong financial backing; a full high-performance RDBMS (with BI connectors) on top of Hadoop |
![Page 32: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/32.jpg)
Cloud vs On-Premises
Cloud
• Short-term use
• Rapid scale
• Test use cases
• Pay as you go
• Internet data sources
On-Premises
• Large, long-term implementations
• Well-known workloads
• Shared clusters
• Large initial investment
![Page 33: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/33.jpg)
Analytics Engine
![Page 34: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/34.jpg)
Analytics
• Hadoop was not fast
• Queries do full scans of files
So how do we rapidly analyze data?
![Page 35: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/35.jpg)
Columnar Databases
• Microsoft SQL Server (2012 & 2014)
• PDW
• HP Vertica
• HBase
• ParAccel
• InfiniDB
• EMC Greenplum
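The idea behind columnar databases can be shown with plain Python data structures. The sketch below is illustrative only: the sales rows are hypothetical, and run-length encoding stands in for the kinds of compression columnar engines apply. A row store keeps each record together, so summing one column still walks every full record; a column store keeps each column as its own array, so a query reads only the columns it needs, and sorted, repetitive columns compress very well.

```python
# Hypothetical sales rows: (order_id, region, amount)
rows = [
    (1, "EAST", 19.99),
    (2, "WEST", 5.00),
    (3, "EAST", 42.50),
]

# Row store: summing one column still touches every full record.
row_total = sum(record[2] for record in rows)

# Column store: each column is a separate array; the query reads only "amount".
columns = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}
col_total = sum(columns["amount"])
assert row_total == col_total


def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded


# Storing a column sorted lets long runs of identical values collapse.
print(run_length_encode(sorted(columns["region"])))
```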
![Page 36: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/36.jpg)
In-Memory Databases
• SQL Server 2014
• SAP HANA
• Oracle TimesTen
• VoltDB
• Apache Spark
![Page 37: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/37.jpg)
Analytics Tools Past and Present
![Page 38: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/38.jpg)
Data Visualization
![Page 39: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/39.jpg)
Tools for Data Visualization
• Excel (Power View and Power Map)
• Tableau
• Qlik
• Platfora
• Pentaho
![Page 40: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/40.jpg)
Bringing This All Together
Power Query (Excel)
[Diagram: ETL flow labeled “Some Database”, “Your Process”, “Some”, and “Business Doesn’t Care About”]
![Page 41: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/41.jpg)
Q & A ?
![Page 42: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/42.jpg)
Session Evaluations
Submit by 5 pm Friday, May 9 to WIN prizes
Your feedback is important and valuable.
Ways to access:
Go to passbac2014/evals
Download the PASS EVENT App from your App Store and search: PASS BAC 2014
Follow the QR code link displayed on session signage throughout the conference venue and in the program guide
![Page 43: The modern analytics architecture](https://reader036.fdocuments.in/reader036/viewer/2022062512/554a3571b4c90542548b5994/html5/thumbnails/43.jpg)
Thank you for attending this session and the PASS Business Analytics Conference 2014