Post on 22-Apr-2022
Strata Data 2018 - London
Audi's journey to an enterprise big data platform
Matthias Graunitz (AUDI AG, Germany)
Carsten Herbe (Audi Business Innovation GmbH, Germany)
AUDI AG Strata Data London 2018 - Audi's journey to an enterprise big data platform2
WHO ARE WE?
Audi Group
Audi, Lamborghini, Ducati and Italdesign
Vorsprung is our promise
Strategy 2025
Audi Business Innovation GmbH
...is the development, establishment, sales and operation of innovative concepts, products and services, as well as the holding of shares in the field of future mobility.
Audi mobility innovations
Audi on demand
Audi balanced technologies
Audi e-gas
Audi customer IT solutions
About us
Matthias Graunitz, AUDI AG
» Center of Competence Big Data & BI
» Big Data Architect
» 10+ years Data Warehousing & BI
Carsten Herbe, Audi Business Innovation GmbH
» Data Platform & Solution Architecture
» Hadoop since 2013
» 10+ years Data Warehousing & BI
2 YEARS AGO…
STARTING BIG DATA AT AUDI
Analytical Capabilities by 2015
AAP – AUDI ANALYTIC PLATFORM
Data Domains: Finance, Purchase, Production, Quality, Sales, Car Data
Programs, Projects, Data Scientists
Capability areas: Embed Analytics; Analyze Data; Store, Distribute and Process Data; Deliver Information; Secure Data; Infrastructure & Services; Provision Data; Deliver Service; Manage Information; Design & Maintain Solutions
Capabilities: Authentication; Data Encryption; Auditing; Complex Event Processing; Analytical APIs; Dashboarding; Planning & Simulation; Visual Analytics; BI Report & OLAP; Statistical Methods; Analytical Script; Data Warehouse; Analytical Databases; ETL Framework; Batch Processing; Data Access / APIs; On-Prem Platform; Application Deployment; Hardware, Network, OS; Monitoring; Lifecycle Mgmt; Development Process & Methods; Master Data Mgmt; Data Lineage
Analytical Capabilities by 2015
AAP – AUDI ANALYTIC PLATFORM
Same capability map, extended with: Cloud Platform; File Systems (HDFS); Stream Processing; Machine Learning
Our first Hadoop Cluster 2015
DEV

Hadoop          per node    Sum
# data nodes    1           4
RAM             128 GB      0.5 TB
Cores           24          96
HDD*            40 TB       160 TB

* Raw capacity without replication and FS overhead!
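
The footnote matters for sizing, because raw capacity overstates what is actually usable. A rough sketch of the arithmetic follows. It assumes the HDFS default replication factor of 3 (the slides do not state which factor was used) and ignores file system overhead; the function name `cluster_totals` is purely illustrative.

```python
def cluster_totals(nodes, ram_gb_per_node, cores_per_node, hdd_tb_per_node,
                   replication_factor=3):
    """Sum per-node resources and estimate usable HDFS capacity.

    replication_factor=3 is the HDFS default and an assumption here;
    file system overhead is not modelled.
    """
    raw_tb = nodes * hdd_tb_per_node
    return {
        "ram_gb": nodes * ram_gb_per_node,
        "cores": nodes * cores_per_node,
        "raw_tb": raw_tb,
        "usable_tb": raw_tb / replication_factor,
    }


# Per-node figures of the 2015 DEV cluster from the table above.
dev = cluster_totals(nodes=4, ram_gb_per_node=128, cores_per_node=24,
                     hdd_tb_per_node=40)
print(dev)
```

Under that assumption, the 160 TB of raw disk in the DEV cluster would hold roughly a third of that in replicated data.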
Our first attempt to walk with Big Data Technologies
SCREWDRIVER ANALYSIS
COMPANY CAR ANALYSIS
ENTERPRISE INTEGRATION VS SPEED OF DELIVERY
Securing the Cluster as a multi-tenant environment
Step by step by step towards our target architecture …
» Access Control: ACLs
» User Management: Local OS users
» Basic Security: iptables + ssh tunneling
» Authentication: LDAP for Hive
» Protection from outside: Knox
» Protection from inside: Kerberos
» Access Control: File Attributes
» Dedicated network: BI Zone
» Access Control & Audit: Ranger
» User Management: LDAP
Password Hell
Legend: password required / no password required / next step
Services: Hive, WebHDFS, Spark UI, HDFS/YARN, Knox
Nodes: Edge Node 1 - 2, Name Node 1 - 2, Data Node 1 - X
Audi Active Directory [ AD User ]: Named User, Technical Hive User
OS Level [ Local User ]: OS Named User, Technical Hive User, Technical Project User, Hadoop User
Hadoop KDC [ Kerberos Principal ]: Named User, Technical Hive User, Technical Project User, Hadoop User
Steps: SSH to edge node, kinit
DATA INGESTION
Data ingestion: technical requirements from projects, security and ops
INGESTION
» Streaming data
» Batch data
» Easy writing to HDFS/DWH
DECOUPLING
» Data sources should not be coupled directly to analytical backend jobs
» This allows adding new analytical jobs without changing the source
HA & BUFFERING
» Data ingestion must be available 24x7
» Data must be buffered (persisted) in case the backend or a backend job is not available
SECURITY
» Source systems must not connect directly to the data zone (Hadoop, DWH), as required by IT Security
» Authentication + data in motion encryption (multi-tenancy)
» Protocol must be auditable
» Some data sources run in the cloud
SCALABILITY
» Amount of data will increase over time for most projects
» Number of projects will increase
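
The decoupling and buffering requirements can be illustrated with a toy in-memory log. This is only a sketch of the idea, not of any real broker: the class `TopicLog` and the consumer group names are hypothetical, and a real system would persist the records to disk and replicate them.

```python
class TopicLog:
    """Toy in-memory stand-in for a persistent, append-only message log."""

    def __init__(self):
        self._records = []   # buffered records; a real broker persists these
        self._offsets = {}   # next offset to read, tracked per consumer group

    def append(self, record):
        """Called by the data source; it knows nothing about the consumers."""
        self._records.append(record)
        return len(self._records) - 1  # offset of the appended record

    def poll(self, group, max_records=100):
        """Return unread records for one consumer group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for i in range(3):
    log.append({"event": i})

hdfs_batch = log.poll("hdfs-writer")         # existing backend job
log.append({"event": 3})                     # arrives while no job is polling
late_batch = log.poll("new-analytics-job")   # job added later, source unchanged
catch_up = log.poll("hdfs-writer")           # first job catches up on the buffer
```

Because each consumer group keeps its own offset, a new analytical job reads the full buffered history without any change on the producing side, and a job that was unavailable simply resumes where it stopped.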
Solution: Kerberized Confluent Kafka Platform
Legend: authentication; encrypted (SSL) / not encrypted; protocol / direction; firewall pain point
Network zones: Data Source network #1 to #n, AAP Messaging Zone, BI Data Zone
Firewalls: FW SRC#1 to FW SRC#n, FW MSG, FW BI
Kafka Client: Kerberos, BIN / push
Kafka Broker: Kerberos, BIN
Zookeeper: Kerberos
Schema Registry: no authentication, HTTP
HDFS Connector: Kerberos, BIN / pull
Spark Streaming: Kerberos, BIN / pull
HDFS: Kerberos
KDCs: Data Proxy KDC, Hadoop KDC
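
The diagram labels the Kafka clients with Kerberos authentication. A minimal sketch of what such a client configuration could look like for the Java Kafka client, rendered here from Python, is given below. The host name, principal and keytab path are hypothetical placeholders, and `SASL_SSL` is an assumption derived from the "encrypted (SSL)" legend entry rather than a setting confirmed by the slides.

```python
def kerberos_client_properties(bootstrap_servers, principal, keytab_path,
                               service_name="kafka"):
    """Render Java Kafka client properties for Kerberos (GSSAPI) over TLS."""
    jaas = (
        "com.sun.security.auth.module.Krb5LoginModule required "
        "useKeyTab=true storeKey=true "
        f'keyTab="{keytab_path}" principal="{principal}";'
    )
    props = {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",             # assumed: SASL auth + TLS
        "sasl.mechanism": "GSSAPI",                  # GSSAPI = Kerberos
        "sasl.kerberos.service.name": service_name,  # broker service principal name
        "sasl.jaas.config": jaas,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# All values below are illustrative placeholders.
client_properties = kerberos_client_properties(
    bootstrap_servers="broker1.example.com:9093",
    principal="ingest-client@EXAMPLE.COM",
    keytab_path="/etc/security/keytabs/ingest-client.keytab",
)
print(client_properties)
```

Using a keytab instead of a password lets a technical client authenticate without interactive login, which suits unattended ingestion jobs.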
Kafka Distributed Connector: unsecured REST API
Legend: evil connection / good connection
Edge Node: Connector Java Process with Bob's Kafka keytab and Bob's HDFS keytab
User Bob (HTTP): sink config Bob, HDFS Sink Bob, topic Bob, Bob's data
User Eve (HTTP): sink config Eve, File Sink Eve, source config Eve, HDFS Source Eve, Bob's data
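
The problem on this slide can be reduced to a toy model: a shared connector process holds one user's keytabs, while its REST endpoint accepts configurations from anyone. The sketch below is not Kafka Connect code. The class `SharedConnectWorker` and the `authenticate_callers` switch are hypothetical and only illustrate why the missing caller check matters.

```python
class SharedConnectWorker:
    """Toy model of a connector process shared by several users.

    The process holds credentials (keytabs) and uses them for every
    connector, while its REST endpoint accepts configs from anyone.
    """

    def __init__(self, credentials_owner, authenticate_callers=False):
        self.credentials_owner = credentials_owner
        self.authenticate_callers = authenticate_callers  # hypothetical switch
        self.connectors = []

    def submit(self, caller, connector_name, reads_data_of):
        """Mimics posting a connector config to the worker's REST API."""
        if self.authenticate_callers and caller != reads_data_of:
            raise PermissionError(f"{caller} may not read data of {reads_data_of}")
        # The data access itself is always authorised as the keytab owner.
        allowed = reads_data_of == self.credentials_owner
        if allowed:
            self.connectors.append((caller, connector_name, reads_data_of))
        return allowed


worker = SharedConnectWorker(credentials_owner="bob")
bob_ok = worker.submit("bob", "hdfs-sink-bob", reads_data_of="bob")
eve_ok = worker.submit("eve", "file-sink-eve", reads_data_of="bob")
```

In this model Eve's connector is accepted just like Bob's, because the only identity the backend ever sees is the one in the keytab.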
TODAY
CURRENT STATE
Architecture & Network Zones – Data Ingestion
Legend: encrypted (SSL) / not encrypted
Zones: Data Proxy, Messaging Zone, BI Data Zone
Components: System A, System, Cloud App, HDFS Connector, Spark Streaming, Data Warehouse, S3 Backup
Architecture & Network Zones – User & Developer Access
Legend: encrypted (SSL) / not encrypted
Zones: Audi Office LAN, Deployment Zone, BI Application Zone, BI Data Zone
Components: Audi Laptop, AAP Remote Desktop, PIPE, Data Mining, Dashboarding, AAP Data Warehouse
Hadoop Cluster Sizing Production 2017
PROD

Hadoop          per node    Sum
# data nodes    1           12
RAM             512 GB      6 TB
Cores           24          288
HDD*            96 TB       9.216 TB

Kafka           per node    Sum
# broker nodes  1           4
RAM             32 GB       128 GB
Cores           6           24
HDD*            4 TB        16 TB

* Raw capacity without replication and FS overhead!
Current state
Organisational Tasks
» Data Ownership & Data Governance (Data Domain Model with clear responsibility in each domain)
» Lifecycle Management for each Shared Service in strong collaboration with the projects and programs
» Defined SLAs for each Shared Service based on general availability, data loss, confidentiality and verifiability
» Different development lifecycles for car and backend systems
» Use of Open Source Software and support requirements from IT continuity
» Balance between multi-tenant environment and flexibility
» Very long lifecycle of cars (> 10 years) with various built-in software versions
TOMORROW
WHAT’S UP NEXT
Hybrid Approach for the AAP
Environments: Public Cloud; On Premise / private Cloud
Zones: Entry Zone, Application Zone, Data Zone, Messaging Zone
Cloud networks: Swarm VPC, Analytical VPC
Components: Web Gateway, RDP Gateway, Full Client (Tableau, BO, etc.), Web Client (Tableau, BO, etc.), HDP, Knox, Data Warehouse, Kafka, Data Inventory, Ingestor, Ingestor 1*, Repositories
Connectivity: Internet, Direct Cloud Connect
Users: Business User
WE ARE HIRING
https://www.audi.com/corporate/de/karriere/einstieg-bei-audi.html
https://karriere.audibusinessinnovation.com/