Big Data BluePrint
Uploaded by: daan-gerits
Category: Data & Analytics
Transcript of Big Data BluePrint
![Page 1: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/1.jpg)
Big Data BluePrint
Architect for change
@daangerits #bdbp
![Page 3: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/3.jpg)
Agenda
Concepts
Architecture
Examples
![Page 4: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/4.jpg)
Concepts
![Page 5: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/5.jpg)
TransCo
Meet TransCo - Parcel delivery service
![Page 6: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/6.jpg)
Common interactions
A customer requesting a quote
A website visitor clicking on a link
Booking a financial transaction
A delivery truck pinging its GPS coördinates
![Page 7: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/7.jpg)
TransCo
All of these have one thing in common:
Events
IT, Finance, Legal, Logistics, Sales, Communications, ...
![Page 8: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/8.jpg)
Events
In the past, events were used to manipulate our master data
![Page 9: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/9.jpg)
Events
Today, events ARE our master data
![Page 10: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/10.jpg)
Anatomy of an event
- Timestamp: When did it happen?
- Origin: Where did it come from?
- Actor: Who did it?
- Subject: Who was affected?
- Facts: What changed?
![Page 11: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/11.jpg)
Anatomy of an event - example
- timestamp: 2014-05-03 13:40:51
- origin: CRM Application
- actor: Daan Gerits
- subject: Alfred Hitchcock
- facts: street=“...” vat=“...”
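The five parts of an event map naturally onto a small record type. The following is a minimal sketch in Python; the `Event` class name and the field types are assumptions for illustration and do not come from the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names follow the slide."""
    timestamp: datetime   # when did it happen?
    origin: str           # where did it come from?
    actor: str            # who did it?
    subject: str          # who was affected?
    facts: dict = field(default_factory=dict)  # what changed?


# The example from the slide; the fact values are elided there too.
example = Event(
    timestamp=datetime(2014, 5, 3, 13, 40, 51),
    origin="CRM Application",
    actor="Daan Gerits",
    subject="Alfred Hitchcock",
    facts={"street": "...", "vat": "..."},
)
```

Marking the record as frozen reflects the idea that an event, once recorded, is not edited in place.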
![Page 12: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/12.jpg)
Architecture
![Page 13: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/13.jpg)
Overview
- Ingest
- Translator: Translate entities into events and facts.
- Linker: Resolve values to ids, especially subject, actor and origin.
- Detonator: Explode a single fact to multiple rollup levels. Only explode if applicable.
- Store: Store the raw events so we can replay whenever we want.
- View Generator: View generators can perform analytical tasks on the incoming events. The generated view can be stored in a storage system of choice.
[Diagram legend: S = Store, I = Ingest, T = Translator, L = Linker, D = Detonator, V = View Generator (shown twice)]
![Page 14: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/14.jpg)
Ingest
Get records in from other systems
- Event Bus/Broker
- Ingestion System like Flume / Sqoop / …
- ETL processes (not recommended)
- Backups
- Nagios / Statsd / Ganglia / ...
![Page 15: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/15.jpg)
Translator
Convert records into events
- 1 record field = 1 fact
- record timestamp vs generated timestamp
Only store changed facts
- What changed?
- Compare with existing views
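The translator step can be sketched as follows. This is a hedged illustration, not the author's implementation: the `translate` function, its parameters, and the `current_view` dictionary (latest known value per subject and fact) are all assumptions standing in for "compare with existing views".

```python
def translate(record, origin, actor, subject, timestamp, current_view):
    """Turn one record into fact rows, keeping only fields whose value changed.

    `current_view` is a hypothetical dict mapping (subject, fact) to the
    latest known value.
    """
    rows = []
    for fact, value in record.items():
        if current_view.get((subject, fact)) == value:
            continue  # unchanged: do not store it again
        rows.append({
            "origin": origin, "actor": actor, "subject": subject,
            "timestamp": timestamp, "fact": fact, "value": value,
        })
    return rows


view = {
    ("9332-DG", "name"): "Daan Gerits",
    ("9332-DG", "address"): "container 9",
}
rows = translate(
    {"name": "Daan Gerits", "address": "container 24"},
    origin="erp", actor="wvl", subject="9332-DG",
    timestamp="20141109", current_view=view,
)
```

Here only the address produces a row, because the name matches what the view already holds.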
![Page 16: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/16.jpg)
Store
Persist the events as they are
Raw Data
- Source of truth
- Recovery
Optimize Storage
- Parquet, Avro, Thrift, ...
![Page 17: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/17.jpg)
Linker
Resolve event fields
- “Daan Gerits” == id 44543-45436-9928
Optimize for speed
- Use lookup tables
- Group data if needed
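A lookup-table linker could look roughly like this. The `link` function and the `ACTOR_IDS` table are hypothetical names; only the id value is taken from the slide. A plain in-memory dict is assumed here, whereas a real deployment would likely use a faster or shared lookup store.

```python
# Hypothetical lookup table; the id is the one shown on the slide.
ACTOR_IDS = {"Daan Gerits": "44543-45436-9928"}


def link(event, lookup, fields=("actor", "subject", "origin")):
    """Return a copy of `event` with known values replaced by their ids."""
    linked = dict(event)
    for name in fields:
        value = linked.get(name)
        if value in lookup:
            linked[name] = lookup[value]
    return linked


linked = link(
    {"actor": "Daan Gerits", "subject": "Alfred Hitchcock", "origin": "CRM Application"},
    ACTOR_IDS,
)
```

Values without an entry in the table are left untouched, so unresolved fields remain visible downstream.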
![Page 18: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/18.jpg)
Detonator
Explode a fact to multiple rollup levels
Why?
- Real-time rollups
- Running analytics
When?
- if there is a hierarchy in actor or actee
- if there is a hierarchy in timestamp
IN:
{ts: 2014-05-19, fact: …}
OUT:
{ts: 2014-05-19, fact: …}
{ts: 2014-05, fact: …}
{ts: 2014, fact: …}
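The IN/OUT example above, for the timestamp hierarchy, can be sketched in a few lines. The `detonate` function name and the dict-based event shape are assumptions; only the day, month and year levels come from the slide.

```python
from datetime import date


def detonate(event):
    """Explode one event into day, month and year rollup levels."""
    ts = event["ts"]
    levels = [ts.strftime("%Y-%m-%d"), ts.strftime("%Y-%m"), ts.strftime("%Y")]
    # Each output event keeps the original fact and differs only in `ts`.
    return [{**event, "ts": level} for level in levels]


out = detonate({"ts": date(2014, 5, 19), "fact": "..."})
```

Writing one row per level up front is what makes real-time rollups cheap: a monthly or yearly total becomes a lookup instead of a scan.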
![Page 19: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/19.jpg)
View Generator
Use facts to generate a view
A view is
- not a database view
- read-only
- an optimised data model for a single purpose
- disposable
- based on all facts (facts depth & width)
A view generator manipulates
- RDBMSs, graphs, search indexes, ...
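As an illustration of a single-purpose, disposable view, the sketch below derives a small count per actor from a list of fact rows. The function name `changes_per_actor` and the in-memory `Counter` target are assumptions; an actual view generator would write to a storage system of choice.

```python
from collections import Counter

# Fact rows in the shape used by the example slides of this deck.
facts = [
    {"actor": "fbaker", "subject": "9332-DG", "fact": "name", "value": "Daan Gerits"},
    {"actor": "fbaker", "subject": "9332-DG", "fact": "address", "value": "container 9"},
    {"actor": "wvl", "subject": "9332-DG", "fact": "address", "value": "container 24"},
]


def changes_per_actor(rows):
    """Single-purpose, read-only view: how many facts each actor recorded."""
    return Counter(row["actor"] for row in rows)


# Disposable: the view can be thrown away and rebuilt from the facts at any time.
view = changes_per_actor(facts)
```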
![Page 20: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/20.jpg)
Rules of the game
Only add and remove are allowed
Events are re-playable
Removal may only be done by BDAs (Big Data Administrators)
![Page 21: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/21.jpg)
Example
![Page 22: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/22.jpg)
Add Customer
IN:
processing system: CRM
user: “fbaker”
data: { id: “9332-DG”, name: “Daan Gerits”, address: “container 9” }

DATA:

| event ID | origin | actor | subject | timestamp | fact | value |
|---|---|---|---|---|---|---|
| 1 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits |
| 1 | crm | fbaker | 9332-DG | 20140514 | address | container 9 |
![Page 23: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/23.jpg)
Update Customer
IN:
processing system: ERP
user: “wvl”
data: { id: “9332-DG”, address: “container 24” }

DATA:

| event ID | origin | actor | subject | timestamp | fact | value |
|---|---|---|---|---|---|---|
| 1 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits |
| 1 | crm | fbaker | 9332-DG | 20140514 | address | container 9 |
| 39 | erp | wvl | 9332-DG | 20141109 | address | container 24 |
![Page 24: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/24.jpg)
Delete Customer
IN:
processing system: ERP
user: “fbaker”
data: { id: “9332-DG” }

DATA:

| event ID | origin | actor | subject | timestamp | fact | value |
|---|---|---|---|---|---|---|
| 1 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits |
| 1 | crm | fbaker | 9332-DG | 20140514 | address | container 9 |
| 39 | erp | wvl | 9332-DG | 20141109 | address | container 24 |
| 63 | erp | fbaker | 9332-DG | 20141201 | address | |
| 63 | erp | fbaker | 9332-DG | 20141201 | name | |
![Page 25: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/25.jpg)
Aaaarrgghhh!!
IN:
processing system: ERP
user: “fbaker”
data: { id: “9332-DG” }

DATA:

| event ID | origin | actor | subject | timestamp | fact | value |
|---|---|---|---|---|---|---|
| 1 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits |
| 1 | crm | fbaker | 9332-DG | 20140514 | address | container 9 |
| 39 | erp | wvl | 9332-DG | 20141109 | address | container 24 |
| 63 | erp | fbaker | 9332-DG | 20141201 | address | |
| 63 | erp | fbaker | 9332-DG | 20141201 | name | |
| 64 | erp | wvl | 9332-DG | 20141109 | address | container 24 |
| 64 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits |
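The customer example can be replayed in code to see how add, update, delete and the restore in event 64 play out. This is a sketch under two assumptions that the slides do not state explicitly: an empty value means "fact removed", and rows are applied in event-ID order (the restored rows of event 64 carry their original timestamps, so ordering by timestamp would not restore them). The `replay` function name is hypothetical.

```python
# Rows copied from the slide:
# (event_id, origin, actor, subject, timestamp, fact, value).
# An empty value is read here as "fact removed"; that reading is an assumption.
ROWS = [
    (1, "crm", "fbaker", "9332-DG", "20140514", "name", "Daan Gerits"),
    (1, "crm", "fbaker", "9332-DG", "20140514", "address", "container 9"),
    (39, "erp", "wvl", "9332-DG", "20141109", "address", "container 24"),
    (63, "erp", "fbaker", "9332-DG", "20141201", "address", ""),
    (63, "erp", "fbaker", "9332-DG", "20141201", "name", ""),
    (64, "erp", "wvl", "9332-DG", "20141109", "address", "container 24"),
    (64, "crm", "fbaker", "9332-DG", "20140514", "name", "Daan Gerits"),
]


def replay(rows):
    """Fold rows in event-ID order into {subject: {fact: value}}."""
    state = {}
    for _id, _origin, _actor, subject, _ts, fact, value in sorted(rows, key=lambda r: r[0]):
        facts = state.setdefault(subject, {})
        if value == "":
            facts.pop(fact, None)  # empty value: the fact no longer holds
        else:
            facts[fact] = value
    return state


after_delete = replay([r for r in ROWS if r[0] <= 63])
after_restore = replay(ROWS)
```

After event 63 the customer has no facts left; after event 64 the name and the latest address are back, and the full history of what happened is still in the table.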
![Page 26: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/26.jpg)
Conclusion
- Allows fact trending: driver statistics for his whole career
- Allows state regeneration: the state of all facts on February 12, 2005
- Is human-error-proof: remove the facts with eventId #
- Scales very well
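State regeneration and undoing a mistake by event ID can both be sketched as filters over the stored rows. The function names `state_as_of` and `without_event` are hypothetical, the rows are a trimmed version of the customer example, and the `YYYYMMDD` string timestamps are compared as plain strings, which works only because that format sorts chronologically.

```python
ROWS = [
    # (event_id, timestamp, subject, fact, value), taken from the customer example
    (1, "20140514", "9332-DG", "name", "Daan Gerits"),
    (1, "20140514", "9332-DG", "address", "container 9"),
    (39, "20141109", "9332-DG", "address", "container 24"),
]


def state_as_of(rows, day):
    """State regeneration: replay only the rows recorded on or before `day`."""
    state = {}
    for _id, ts, subject, fact, value in sorted(rows, key=lambda r: r[1]):
        if ts <= day:
            state[(subject, fact)] = value
    return state


def without_event(rows, event_id):
    """Undo a mistake by dropping every row that belongs to one event."""
    return [r for r in rows if r[0] != event_id]


june = state_as_of(ROWS, "20140601")
latest = state_as_of(ROWS, "20141231")
undone = state_as_of(without_event(ROWS, 39), "20141231")
```

Removing event 39 and replaying yields the same state as asking for the state in June 2014, before that event was recorded.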
![Page 27: Big Data BluePrint](https://reader033.fdocuments.in/reader033/viewer/2022042817/55a200fc1a28ab47268b4570/html5/thumbnails/27.jpg)
Shameless Plug
We don’t hire data scientists, architects, developers, UX designers or engineers. We hire individuals.
Thank You!