West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model...
![Page 1: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/1.jpg)
Enterprise Architecture Patterns For Big Data
Phill Radley,
Chief Data Architect
20 / October / 2016
West
Yorkshire
1/25
![Page 2: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/2.jpg)
What I’m going to talk about….
• The organisation of BT and its IT.
• Early stages of big data in the industry & BT
• The BT production big data platform (HaaS)
• Sample Adoption Patterns
  • Data Archive, DW Extension, Re-Platform old batch apps, Self-Service Analytics
• Example Use Cases
  • Copper Line Performance Model – Broadband Speed Prediction
  • Nuisance Call Prediction
• Governance
• Where next
2/25
![Page 3: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/3.jpg)
BT Group Structure 1/Apr/2016
Customers
Chief Architects Office Enterprise Architecture
Data Architecture
For BT Group
~ 90K FTE in 61 countries, serving 180 countries
Research & Innovation
3/25
![Page 4: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/4.jpg)
Legacy Systems Architecture in each BT Business Unit
Analytics
Data
Warehouse
ESB
CRM
Service Management
Network Management
Networks
& IT
Customers
• Hundreds of systems in each business unit grouped into 3 operational areas (CRM/Service Mgt/Network Mgt)
• Data Warehouse per business unit
• Client-server applications running on servers in BT Data Centres (~ 35K hosts)
• Mainframe applications (in Openreach)
• Total Storage ~ 25PB
• Lots of event / time series data – Network Alarms & Telemetry
– Netflow Traffic Events, Security events
– Call Detail Records, web clicks,
– mobile handset data (GPS, Apps, browsing..)
• Business Unit CIOs manage IT investment roadmap, each business unit deploys a “stack release” quarterly
Field Engineers
4/25
![Page 5: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/5.jpg)
Research & Innovation (R&I)
• R&I is a unit within IT (TSO) located in Adastral Park Campus
• ~ 10 practices progressing research topics annually agreed with business units
• External innovation team ( based in Silicon Valley + Boston(MIT) )
• Big Data & Customer Experience (incl. social media) Practice
  • established ~ 2008
  • 20 people
  • First Hadoop clusters on AWS ~ 2009
  • Migrated to on-premise 2011 (Cluster 1)
Research Cluster 1 closed after 4 years / 250K map/reduce jobs
![Page 6: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/6.jpg)
The Long View of Big Data
[Figure: timeline of data “bigness” (Volume, Velocity, Variety) across three platform generations, with marks at 1960, 1990, Y2K, 06, 09, 14 and 16]
• Mainframe (1st Platform): proprietary, monolithic; batch, interactive; COBOL/ISAM/IDMS; linked record sets
• Client-Server Applications + RDBMS (2nd Platform): OPEN! 3GL, 4GL; PCs & servers on premise; RELATIONAL
• Scale-out infrastructure (3rd Platform): clusters, data hub, pipelines; mobile, social, big data; cloud?
• Cost/performance “VVV crunch”
• First research cluster (09), production cluster (14)
• HAAS = Hadoop as a Service
6/25
![Page 7: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/7.jpg)
Early Big Data in BT - 2011-14
What’s causing high fault rates? Why is early life TV usage low? What does social media think of new product X? …
BT Head Office
• “Head Office” gave R&I big data practice “fuzzy business problems” to analyse
• Data Science team manually assembled relevant data sets and worked on them to produce correlations, joins and predictive models
• Being outside core IT simultaneously constrains and liberates possibilities
• By 2013 the business units were starting to rely on R&I Hadoop and a production capability was needed
![Page 8: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/8.jpg)
Feb 2014 Production Launch “Hadoop as a Service”
• Following a presentation to the TSO Leadership team in Dec 2013, an initial investment in a production cluster was agreed, backed by a plan to launch in Feb 2014
• 60 nodes optimised for Hadoop map/reduce deployed in BT Data Centre in Sheffield (6TB local disks, 1:1 core:spindle ratio, 8GB for JVM per map/reduce slot)
• Existing Linux 3rd line team tasked with running a basic (Minimum Viable Product) Hadoop cluster as a shared service platform
BT HaaS Release 1: 60 Nodes ~ 2 PB Feb 2014 Linux 3rd Line Hadoop Admin
![Page 9: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/9.jpg)
What is a “Shared Service Production Big Data Platform”
• User Authentication & Authorisation handled by Active Directory (integrated with Kerberos)
• Hadoop Distributed File System (HDFS)
• Map/Reduce – Parallel batch processing framework + command utilities (Hive, Pig, Sqoop, Oozie)
• User Access via telnet/Linux Command Line + Browser Based front end (HUE)
• Data Transfer (batch files via HttpFS, Oracle Tables via Sqoop (now GoldenGate), Flume for telemetry)
• Single Page Intranet Order Form
• Standard IT Helpdesk (similar to infrastructure services: Linux, Oracle, WAN)
• Two categories of platform users
  1. Developers/testers/support working on applications using Hadoop to store or process data
  2. End users consuming data, in 3 broad groups:
     i. Handful of Data Scientists using tools like R-Studio
     ii. Tens of business analysts using SQL + Hive (data warehouse sandbox users)
     iii. Tens of simple users working on specific business problems/questions (using Datameer, or in a team with (i) & (ii))
9/25
![Page 10: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/10.jpg)
Multi-Tenant Hadoop as a Service
HAAS Platform
Hadoop Cluster B (Openreach only)
Order form(SharePoint)
script
Active
Directory
Tenant “Project Owner”
User
admin
Standard User Admin
Process
Hadoop
Cluster A HAASA AP 00307_12126
HIVE
HDFS
sentry
Job queue
HUE Impala
Flume
BI Server
Create
Hadoop
Features
“HAASA AP 00307_12126
Is ready for you to use”
Existing Business APP 12126
Oracle DB
APP extends footprint in HaaS
http FS
Kerberos
Datameer
Analytics
Review
Board
Platform
Admin
ARB
User Access
Systems Access
Sqoop
Create
Security
Group
![Page 11: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/11.jpg)
Service identifiers (link to Architectural Repository)
HAASA AP 00307_12126
Application(Dev/Test/Prod)
End User
Service Instance
No.
( order number )
Cluster (A
or B)
Suffix
“Application ID” Link to legacy application
(you do have an Application register ?)
OR..ARB board approval number R_0030
Prefix Identifies
Hadoop Groups
In Active Directory
Service Type
Service ID = Active Directory Group Name
11/24
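The identifier anatomy on this slide can be sketched as a small parser. This is only an illustration, assuming the layout seen in the example `HAASA AP 00307_12126` (prefix, cluster letter, a short class code, instance number, underscore, suffix). The function and field names are hypothetical, and the only class code shown on the slide is `AP`.

```python
import re
from typing import NamedTuple, Optional


class ServiceId(NamedTuple):
    service_type: str  # prefix identifying Hadoop groups in Active Directory, e.g. "HAAS"
    cluster: str       # "A" or "B"
    user_class: str    # e.g. "AP"; other codes are not shown on the slide
    instance_no: str   # order number, kept as text to preserve leading zeros
    suffix: str        # application ID, or an ARB approval number


# Assumed layout: <HAAS><cluster> <class> <instance>_<suffix>
_PATTERN = re.compile(r"^(HAAS)([AB])\s+([A-Z]+)\s+(\d+)_(\w+)$")


def parse_service_id(text: str) -> Optional[ServiceId]:
    """Split a service ID into its parts, or return None if it does not fit."""
    match = _PATTERN.match(text.strip())
    return ServiceId(*match.groups()) if match else None


print(parse_service_id("HAASA AP 00307_12126"))
```

Because the service ID doubles as the Active Directory group name, a check like this could run in the order-form script before a group is created.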
![Page 12: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/12.jpg)
A Modern Enterprise Data Architecture ( V1.0)
CRM
HiveMetaStore
RDBMS
Web/APPServer
MapReduce
code
BI Tools: Tableau, Zoomdata… (HIVE TABLE ACCESS)
HDFS
Impala + Sentry
Wrangling & Discovery / Data Science: Datameer, HUE… (HDFS FILE ACCESS)
Flume
GoldenGate
ERP
RDBMS
Web/APPServer
MapReduce
code
sqoop
DW
RDBMS
Web/APPServer
MapReduce
code
sqoop
1. Event ingestion from Networks/IT/Web servers: collection with Flume agents, landing in HDFS files
2. DB table transfer using Sqoop (map/reduce) jobs, landing in HDFS files
Active Directory
FILES
TABLES
snapshotCDC snapshot
Data Scientists
SQL analysts
business users
12/25
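As a rough illustration of the second ingestion route (DB table transfer with Sqoop), the sketch below builds the argument list for a Sqoop 1 table import. The flags are standard Sqoop import options. The JDBC URL, user, paths and table name are made-up placeholders, not values from the talk.

```python
from typing import List


def sqoop_import_args(jdbc_url: str, username: str, password_file: str,
                      table: str, target_dir: str, mappers: int = 4) -> List[str]:
    """Build the command line for a Sqoop 1 import of one table into HDFS files."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # avoids putting the password on the command line
        "--table", table,
        "--target-dir", target_dir,        # HDFS directory where the files land
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


# Hypothetical values for illustration only.
args = sqoop_import_args(
    jdbc_url="jdbc:oracle:thin:@//crm-db.example:1521/CRM",
    username="haas_loader",
    password_file="/user/haas_loader/.password",
    table="CUSTOMER",
    target_dir="/data/crm/customer",
)
print(" ".join(args))
```

The list form can be passed straight to `subprocess.run` on a host where the Sqoop client is installed.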
![Page 13: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/13.jpg)
“Active Data Archive” Pattern
CRM
RDBMS OLTP Schema
MapReduce
code
Sqoop
BI Tools: Tableau, OBI… (HIVE TABLE ACCESS)
TABLES: HIVE TABLE, Reporting Schema
FILES
• Simple starter pattern to help application designers solve a common problem (long-term archiving & managing data retention)
• Brings useful data sets to the cluster
• Can be used to provide a central archive of a particular data set, e.g. VAT Transaction archive ( saves clogging data warehouses)
Auditors+ analysts
13/25
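Managing data retention in an archive often comes down to finding which dated partitions have aged out. A minimal sketch, assuming the archive is partitioned by date and that a retention period in years is known. Both are assumptions, as the slide does not say how retention is implemented, and the function name is hypothetical.

```python
from datetime import date
from typing import Iterable, List


def expired_partitions(partition_dates: Iterable[date], today: date,
                       retention_years: int) -> List[date]:
    """Return the partition dates that are older than the retention cutoff."""
    try:
        cutoff = today.replace(year=today.year - retention_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to the 28th.
        cutoff = today.replace(year=today.year - retention_years, day=28)
    return sorted(d for d in partition_dates if d < cutoff)


partitions = [date(2009, 1, 1), date(2012, 3, 1), date(2015, 6, 1)]
print(expired_partitions(partitions, today=date(2016, 10, 20), retention_years=6))
```

The returned dates would then feed whatever drop or purge step the archive uses.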
![Page 14: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/14.jpg)
Data Warehouse Extension
Data Warehouse
RDBMS BI Schema
MapReduce
Transform
Power user Sandbox offload to Hadoop
BI Tools: Tableau, OBI…
TABLES
Sandbox Hive Database
FILES
• DW Sandbox offload to Hadoop, esp. heavy ad-hoc DW users
• ETL offload from ETL/ELT servers to Hadoop (Data Reception Area)
• Faster integration of new sources (schema on read, Table oriented)
• Archive Data from DW
Analysts
CRM
RDBMS OLTP Schema
Sqoop
RAW OLTP Schema
ETL server
HIVE VIEW
14/25
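“Schema on read” means the raw files are stored untouched and a schema is applied only when they are read, which is why new sources can be integrated faster. The sketch below shows the idea in plain Python rather than Hive. The column names and sample data are invented for illustration.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Tuple

# A schema is just an ordered list of (column name, type converter) pairs.
Schema = List[Tuple[str, Callable[[str], object]]]


def read_with_schema(raw: str, schema: Schema) -> Iterator[Dict[str, object]]:
    """Apply a schema to raw delimited text at read time; the raw text is never changed."""
    for row in csv.reader(io.StringIO(raw)):
        # zip stops at the shorter input, so columns not yet in the schema are ignored.
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


raw_file = "1001,ACME Ltd,2016-10-20\n1002,Widgets plc,2016-10-21\n"

schema_v1: Schema = [("customer_id", int), ("name", str)]
schema_v2: Schema = schema_v1 + [("created", str)]  # a later reader adds a column

print(list(read_with_schema(raw_file, schema_v1)))
print(list(read_with_schema(raw_file, schema_v2)))
```

Both readers work against the same stored file; only the schema supplied at read time differs.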
![Page 15: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/15.jpg)
Re-Platform old Batch Applications: Customer Data MDM Hub (CMF)
MDM hub for Business Customer Master File (CMF)
• 10 years old and needed re-platforming (2014)
• 12 source systems with local customer table
• D&B Legal Entities used as Reference Data
• Existing modules ported to Hadoop/Hive
Benefits
• Business able to do multiple runs in a day (Hadoop 15x faster)
• Cost saving over standalone re-platform
• Data volumes increased 3x (multiple i/p files)
• Adding new sources is quicker (schema on read)
• Data available for Self-Service Teams (DQ/Data Science)
• Using this “ETL Offload” pattern, the Master Address data is being converted to a hybrid Hadoop application, Data migration
Hybrid apps are a low-risk entry route to big data
1 Pre-Load
CSSCOSMOSS
DISEBTCC2B
AntilliaGlossi
CyclonePhoenixRadianz
Siebel OVSiebel OS
OLD CMFDBStaging
Source Systems
2 Load
3 Match / De-Dupe
4 Key Gen
5 Business Rule
6 Publish
7 Post Load
CMF
Reference Data
15/25
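Steps 3 and 4 of the pipeline (Match / De-Dupe, then Key Gen) can be pictured with a toy example. This is not the CMF matching logic, which the slides do not describe. It assumes exact matching on a normalised name and postcode, and the source names and records are invented.

```python
import re
from typing import Dict, Iterable, List, Tuple


def match_key(name: str, postcode: str) -> Tuple[str, str]:
    """Normalise a record to a crude match key (case, punctuation, spacing)."""
    clean_name = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    clean_postcode = re.sub(r"\s+", "", postcode.upper())
    return clean_name, clean_postcode


def dedupe(records: Iterable[Tuple[str, str, str]]) -> Dict[int, List[str]]:
    """Group (source_system, name, postcode) records and assign master keys."""
    groups: Dict[Tuple[str, str], List[str]] = {}
    for source, name, postcode in records:
        groups.setdefault(match_key(name, postcode), []).append(source)
    # Key generation: sequential integers in first-seen order.
    return {key: sources for key, sources in enumerate(groups.values(), start=1)}


records = [
    ("CRM", "Acme Ltd.", "ls1 4ap"),
    ("Billing", "ACME LTD", "LS1 4AP"),
    ("CRM", "Widgets plc", "HD1 1AA"),
]
print(dedupe(records))
```

A real MDM hub would typically add fuzzy matching and reference data (such as the D&B legal entities mentioned above) before generating keys.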
![Page 16: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/16.jpg)
Copper Line Testing & Performance done as Self Service Analytics
LINE
TESTS
NETWORKINVENTORY
Data
Warehouse
ADDRESS
DSL LINE
SPEED
Calculate 25 million line lengths
Exchange Engineers
Join with line tests & DSL line speed data
TABLES
FILES
Reporting Server
(Shiny)
Self-Service
Data Analysis
Team
Anomalous
Test Results
16/25
![Page 17: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/17.jpg)
Self Service Analysis – Data Wrangling
Distribution
Row Count
Unique Values
Min
Max
Mean
Data
Data Profile
17/25
Only needs Excel-style skills + knowledge of business data
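The data profile shown on this slide (row count, unique values, min, max, mean, distribution) is simple to compute. A minimal sketch in plain Python, assuming a single numeric column held in memory. The wrangling tool on the platform does this at cluster scale, and the sample values are invented.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Sequence


def profile_column(values: Sequence[float]) -> Dict[str, object]:
    """Compute the basic profile statistics for one non-empty numeric column."""
    return {
        "row_count": len(values),
        "unique_values": len(set(values)),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "distribution": dict(Counter(values)),  # value -> number of occurrences
    }


print(profile_column([1, 2, 2, 5]))
```

An empty column raises an error here, which is usually worth surfacing rather than hiding in a profile.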
![Page 18: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/18.jpg)
Broadband Speed Prediction
(data from old systems loaded onto HaaS & analysed in new ways)
Computed daily, based on 5M daily line tests
Proof the model works (blue dots = wasted truck roll = £M in savings)
New Copper Line Performance Model
18/25
![Page 19: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/19.jpg)
Data Visualisation for Exchange Engineers (R-Shiny Server)
![Page 20: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/20.jpg)
Nuisance Call Scoring Model (BT Saturn)
Provision, Load & Model with 2BN CDRs in ~ 2 weeks
19/24
![Page 21: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/21.jpg)
HaaS Platform Growth & adoption
82 Systems “connected”
to the data hub
400 services
provisioned
Feb 2014
200 @
Feb 2016
Sept 2016
21/25
![Page 22: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/22.jpg)
Governing Use of the Platform (Project/System On boarding)
IF {project name} is following standard IT Operating Model* :
    Document use of HaaS in the design, use the approved patterns
    Order HaaS from Data Centre team
ELSE :
    Register {project name} with Analytics Review Board
    IF Approved : Order HaaS from Data Centre team
Documented in “HaaS Cookbook”
Note: This is just governing use of the platform; Data Governance and Compliance is a whole separate topic! EU GDPR, PCI-DSS, MDM etc.
22/25
Service Request Form
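The on-boarding rule on this slide is a two-branch decision, sketched below as a small function. The function and parameter names are hypothetical; the step wording follows the slide.

```python
from typing import List


def onboarding_steps(follows_standard_operating_model: bool,
                     arb_approved: bool = False) -> List[str]:
    """Return the steps a project takes to get onto the platform."""
    if follows_standard_operating_model:
        return [
            "Document use of HaaS in the design, using the approved patterns",
            "Order HaaS from Data Centre team",
        ]
    steps = ["Register project with Analytics Review Board"]
    if arb_approved:
        steps.append("Order HaaS from Data Centre team")
    return steps


print(onboarding_steps(follows_standard_operating_model=False, arb_approved=True))
```

A project outside the standard operating model that has not been approved stops after registration.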
![Page 23: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/23.jpg)
Three existing business applications (CRM, Orders, Faults) extended into HaaS
• CRM (application ID 2029): RDBMS Customer table → sqoop → Hive DB HAASA AP 00101_2029 (T_Customer, V_Customer)
• Orders (application ID 3531): RDBMS Orders table → sqoop → Hive DB HAASA AP 00202_3531 (T_Orders, V_Orders)
• Faults (application ID 4369): RDBMS Faults table → sqoop → Hive DB HAASA AP 00303_4369 (T_Faults, V_Faults)
Business
Data
Stewards
Business Analysts / Data Scientists
CRM
Orders
Faults
Governing Access to Data on the Platform ** WIP **
1. Browse & select data
2. Get Steward Approval
3. Create VIEWs & GRANTs
4. Recommend joins/ Views
Data Catalogue
(Million Table Meta-store)
23/25
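Step 3 (Create VIEWs & GRANTs) could look roughly like the statements generated below. This is a sketch, assuming HiveQL with Sentry-style role grants. The database name, column names and role are hypothetical, and the exact GRANT syntax should be checked against the Hive and Sentry versions in use.

```python
from typing import List, Sequence


def view_and_grant_sql(database: str, table: str, columns: Sequence[str],
                       role: str) -> List[str]:
    """Generate statements exposing selected columns of a table through a view."""
    # Follow the T_/V_ naming seen on the slide: T_Customer -> V_Customer.
    view = "V_" + (table[2:] if table.startswith("T_") else table)
    cols = ", ".join(columns)
    return [
        f"USE {database}",
        f"CREATE VIEW {view} AS SELECT {cols} FROM {table}",
        f"GRANT SELECT ON TABLE {view} TO ROLE {role}",
    ]


for statement in view_and_grant_sql(
    database="haasa_ap_00101_2029",        # hypothetical Hive database name
    table="T_Customer",
    columns=["customer_id", "segment"],    # hypothetical steward-approved columns
    role="crm_analysts",                   # hypothetical role
):
    print(statement + ";")
```

Granting on the view rather than the base table lets the data steward approve exactly the columns that analysts see.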
![Page 24: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/24.jpg)
What’s coming next ? Event Streaming for Time Series Data
Data streams replace batch mode
“The Dataflow Model”, Proceedings of the VLDB Endowment, vol. 8 (2015), pp. 1792-1803
Most analytics done on time series data (events as they arrive)
…web clicks, call detail records, IoT alarms & usage, vehicle telemetry, mobile GPS..
Beam, Flink
24/25
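One core idea in event streaming is grouping events by the time they occurred rather than the time they arrived. The sketch below counts events in fixed (tumbling) event-time windows in plain Python. It does not use the Beam or Flink APIs and leaves out watermarks and triggers. The event tuples are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[int, str]],
                           window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key), using each event's own timestamp."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time, key in events:
        # Assign the event to the window that contains its event time.
        window_start = event_time - (event_time % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# (event time in seconds, key); the last event arrives late but still lands in window 0.
events = [(0, "alarm"), (59, "alarm"), (61, "alarm"), (5, "click"), (30, "alarm")]
print(tumbling_window_counts(events, window_seconds=60))
```

Because windows are assigned from the event timestamp, out-of-order arrival does not change the result, which is the property a streaming engine has to preserve while data is still arriving.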
![Page 25: West Yorkshire Enterprise Architecture Patterns For Big Data · •Copper Line Performance Model –Broadband Speed Prediction ... • D&B Legal Entities used as Reference Data ...](https://reader031.fdocuments.in/reader031/viewer/2022022012/5b1f96127f8b9a60128b5926/html5/thumbnails/25.jpg)
Q & A
Phill Radley
Chief Data Architect
Enterprise Architecture Patterns For Big Data