hadoop reference architecturehortonworks.com/wp-content/uploads/2012/02/ref-architecture-ubs.pdf ·...
Transcript of hadoop reference architecturehortonworks.com/wp-content/uploads/2012/02/ref-architecture-ubs.pdf ·...
hadoop reference architecture
financial services webinar series
all informa6on in this document is confiden6al & proprietary
2
financial services industry issues
© tresata inc. 2011
A. Huge Financial Services industry
ver6cal…
B. …u6lized very limited data
capabili6es which…
C. …drove poor consumer
underwri6ng decisions…
D. …that yielded severely nega6ve
financial results
Figure 1: Consumer Credit Loans vs.
Delinquency Rates Trends
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
4.5%
5.0%
‐
0.5
1.0
1.5
2.0
2.5
3.0
1990
1992
1994
1996
1998
2000
2002
2004
2006
2008
2010
Consumer Loans ($T)
Delinquency Rate (%)
Figure 2: Mortgage 1‐4 Loans vs.
Delinquency Rates Trends
0.0%
2.0%
4.0%
6.0%
8.0%
10.0%
12.0%
‐
2.0
4.0
6.0
8.0
10.0
12.0
1990
1992
1994
1996
1998
2000
2002
2004
2006
2008
2010
Mortgage 1‐4 Loans ($T)
Delinquency Rate (%)
Source: Federal Reserve 2011
3
big impact on consumer behavior
© tresata inc. 2011
• Debt Overhang
• ‘Nega6ve’ Equity Homes
• Damaged Credit Scores
• Zip Code Myth
Source: Tresata AnalyMcs 2012
4
required big data capabili6es
© tresata inc. 2011
A. Current Capabili6es 1. Sub Samples
2. Limited variables
3. Limited Mme series
4. Very high cost
B. Future Capabili6es 1. All customers
2. All variables
3. Full Mme series
4. BeSer AnalyMcs
Major Data Analy6cs &
Technology Challenge!
Time (Daily)
Variables
Individual Customers Current Industry
CapabiliMes
Future Industry CapabiliMes
Source: Tresata AnalyMcs 2011
5
where are we in the hype cycle?
© tresata inc. 2011
6
delivery & comprehension paradigm
© tresata inc. 2011
Source: Cap Gemini 2011
7
a new big data architecture
© tresata inc. 2011
Enterprise Data Systems / Enterprise File Systems (HDFS, NFS, etc.)
Hadoop Environment
Data & Analy6cs Pla\orm
• Broader leverage of EDS/EFS
• Abstracted modularity
• As‐a–service Infrastructure
• Automated Provision, Spin‐Up & Scale IaaS
• Auto‐deploy, auto‐scale Hadoop clusters (don't
need heavily staffed SME/support, instead
use commodity model)
• Unified, converging story over
Mme for indexing & algorithms
• Grow machine & senMment feedbase
Enterprise Distributed
Compute (hpc / grid)
cpu gpu fpga
TradiMonal BI / DW / OLAP
Pvt or Public Cloud (Kit & Opera6ng Model)
Smart Data AnalyMcs louds
• Broader leverage of compute
• Abstracted modularity
storage & access
paradigm
delivery & comprehension
paradigm
Batch Real‐Mme Hybrid
• Unified, near real‐Mme "big charts"
• Synergy with tradiMonal "quants/algs"
8
enterprise ready hadoop stack
© tresata inc. 2011
Tresata
Analy6cs Engine
Security & Monitoring
Hadoop Opera6ng System
Infra
structu
re
Data Pla\orm
Server Hardware & Opera6on System
Apache
• High bandwidth backplane • Industry approved & cerMfied security
• Fully managed plaiorm, e2e
• Private or public cloud
• Hadoop distribuMon (like Hortonworks) • DistribuMon support
Tresata Cer6fied & Packaged BDaaS Stack
• Extensible AnalyMcs Plaiorm • Total View of Customer Engine
• Proprietary Algorithms & AnalyMcs
9
do not add complexity
© tresata inc. 2011
Example Enterprise Architecture
10
make the start easy
© tresata inc. 2011
Data In
Info Out
Feed IN data from
mulMple warehouses
or SOR’s
Feed BACK info to Business Processes
11
if its data…you can move it
© tresata inc. 2011 Courtesy the genius of xkcd
12
build vs. buy
© tresata inc. 2011
EDW EDW EDW
Commodity Hardware (HP, Dell, Super Micro etc)
Server Opera6ng System (Linux)
Data Storage (HDFS, NFS)
Data Processing (Map Reduce)
Extract
Data Access Data Pipeline
Sqoop Oozie Avro PIG Hive HBase
Data Security (Kerberos,Custom)
Business ApplicaMons
Viz EV Report Analyze Model
Hadoop Big Data Pla\orm
Tresata Pla\orm
Source: Tresata IllustraMon 2012
13
focus on use cases
Structured & Unstructured
Data Sources
Store & Process
• Clean
• Parse
• De‐dupe
• Match
• Join
Compute
• Clustering
• Graph
• Classifiers
• SenMment
• Behavior
Visualize
• Heatmaps
• Benchmark
• Geotag
• Outliers
• Networks
Tresata Data Engine (fully engineered on Hadoop)
Client Data
Mul6ple
Use Cases
Marke6ng
© tresata inc. 2011
Social Data
Market Data Risk
Trading
Source: Tresata IllustraMon 2012
14
how hortonworks can help
• Training and Cer6fica6on – www.hortonworks.com/training/
• Hortonworks Data Pla\orm and Support
– www.hortonworks.com/hortonworksdataplaiorm/
– www.hortonworks.com/technology/techpreview/
– www.hortonworks.com/support/
• Educa6onal Webinars – www.hortonworks.com/webinars/