Designing a Data Warehouse - what would a BI solution recommend?
Segah Meer, Sr. Data Consultant, Professional Services
Connect. Describe. Explore.
4 Rules of Thumb
▪ Transparent E(T)L process
▪ Single copy of data
▪ Performance
▪ Shortest path
Transparent E(T)L process
Perform transformations to optimize for performance and shortest path, but avoid making broad assumptions about the final use case. Ex: how revenue is calculated.

You see:
account  profit
1        1000

Actual data:
account  value
1        {revenue: 2000, expenses: 500, account_payable: 500, is_current: true}
2        {revenue: 2000, expenses: 100, is_current: false}
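A minimal sketch of the idea above, using Python's standard-library sqlite3 module: the warehouse keeps the raw components, and profit is exposed as a view so the calculation stays visible to whoever models the data. The names account_facts and account_profit, and the formula revenue - expenses - account_payable, are assumptions inferred from the sample numbers; the slide does not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Keep the raw components rather than only a precomputed profit figure.
conn.execute(
    "CREATE TABLE account_facts ("
    " account INTEGER PRIMARY KEY, revenue REAL, expenses REAL,"
    " account_payable REAL, is_current INTEGER)"
)
conn.executemany(
    "INSERT INTO account_facts VALUES (?, ?, ?, ?, ?)",
    [(1, 2000, 500, 500, 1), (2, 2000, 100, None, 0)],
)

# Assumed formula: the definition of profit lives in one readable place.
conn.execute(
    "CREATE VIEW account_profit AS"
    " SELECT account,"
    "        revenue - expenses - COALESCE(account_payable, 0) AS profit"
    " FROM account_facts"
)

profits = dict(conn.execute("SELECT account, profit FROM account_profit"))
print(profits)
```

If the business later redefines profit, only the view changes; the underlying rows remain available for any other calculation.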
Single Copy of Data
If data can change, store it in a single row. Avoid redundant tables. Ex: customer information

Duplicate rows:
name        phone_number
Segah Meer  650-575-5410
...         ...
Segah Meer  650-575-5411

OR

Redundant tables:
account  profit
1        1000

account  revenue  cost
1        1500     500
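One way to avoid the duplicate-row case is to key each customer and overwrite on change. The sketch below uses Python's sqlite3 module; the customers table, the customer_id key, and the save_customer helper are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY, name TEXT, phone_number TEXT)"
)

def save_customer(customer_id, name, phone_number):
    # INSERT OR REPLACE overwrites the row with the same primary key,
    # so a changed phone number never produces a second row.
    conn.execute(
        "INSERT OR REPLACE INTO customers VALUES (?, ?, ?)",
        (customer_id, name, phone_number),
    )

save_customer(1, "Segah Meer", "650-575-5410")
save_customer(1, "Segah Meer", "650-575-5411")  # phone number changed

rows = conn.execute("SELECT name, phone_number FROM customers").fetchall()
print(rows)
```

This keeps only the latest value. If the history of changes matters for analysis, that is a separate design decision and would need its own table.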
Performance
▪ databases focused on large data volume reads behave differently from those focused on frequent and “easy” inserts
▪ slow queries are a function of 1) the LookML model, 2) db resources, and 3) how the data is stored
Use flatter (wider) tables and don’t be afraid of redundant date columns
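One reading of "redundant date columns" is to derive the date parts once, during the ETL step, so that queries filter and group on stored columns instead of recomputing them. A small Python sketch under that assumption; add_date_columns and the created_date, created_month, and created_year column names are hypothetical.

```python
from datetime import datetime

def add_date_columns(row):
    """Return a wider copy of the row with precomputed date parts."""
    created = datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S")
    wide = dict(row)  # copy, so the source row is left untouched
    wide["created_date"] = created.strftime("%Y-%m-%d")
    wide["created_month"] = created.strftime("%Y-%m")
    wide["created_year"] = created.year
    return wide

event = {"id": 100001, "created_at": "2016-01-01 08:30:00"}
wide_event = add_date_columns(event)
print(wide_event)
```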
Shortest Path
There is very little analytical value derived from modeling “long path” designs with Looker
Each extra join adds modeling complexity.
Imagine a ride-sharing app
App Events
id      created_at  attribute_id
100001  2016-01-01  1
100002  2016-01-01  2

Attributes (example values)
id  value
1   {json...}
One Intuitive Solution
- explore: events
  joins:
    - join: attributes
      sql_on: ${events.attribute_id} = ${attributes.id}
- explore: users
  joins:
    - join: attributes
      relationship: one_to_many
      sql_on: ${users.id} = ${attributes.user_id}
    - join: events
      relationship: one_to_many
      sql_on: ${attributes.id} = ${events.attribute_id}
- view: attributes
  fields:
    - dimension: user_id
      sql: JSON_EXTRACT(${value}, 'user_id')
    - dimension: service_charge
      sql: JSON_EXTRACT(${value}, 'service_charge')
    - dimension: amount
      sql: ${service_charge} + ${wait_charge} + ${tax}
Let’s see how we did
Bad Bad Bad... Sure O.K.
Shortest Path ✗
Performance ✗
Single Source of Truth ✓
Transparency ✓
Can we do better?
Data Warehouse table (loaded from Production by ETL):
id      created_at  event_type    amount  location
100001  2016-01-01  transaction   14.3
100002  2016-01-01  ride_started          37.7833° N, 122.4167° W

[Diagram: Production tables → ETL → Data Warehouse table]
Pre-flattening the table
ETL:
    SELECT events.id
         , events.created_at
         , JSON_EXTRACT(attributes.value, 'type') AS event_type
         , JSON_EXTRACT(attributes.value, 'service_charge')
           + JSON_EXTRACT(attributes.value, 'wait_charge')
           + JSON_EXTRACT(attributes.value, 'tax') AS amount
         , JSON_EXTRACT(attributes.value, 'location') AS location
    FROM events
    LEFT JOIN attributes ON events.attribute_id = attributes.id
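The same flattening step can be sketched outside the database. The Python below is an illustration only: the JSON payloads are made-up sample values (the slides show only `{json...}`), chosen so the charges add up to the 14.3 shown in the flattened table, and flatten is a hypothetical helper.

```python
import json

events = [
    {"id": 100001, "created_at": "2016-01-01", "attribute_id": 1},
    {"id": 100002, "created_at": "2016-01-01", "attribute_id": 2},
]
# Assumed payloads keyed by attribute id.
attributes = {
    1: json.dumps({"type": "transaction", "service_charge": 10,
                   "wait_charge": 3, "tax": 1.3}),
    2: json.dumps({"type": "ride_started",
                   "location": "37.7833° N, 122.4167° W"}),
}

def flatten(event):
    # Mirror the LEFT JOIN: an event without attributes still yields a row.
    value = json.loads(attributes.get(event["attribute_id"], "{}"))
    charges = [value.get(k) for k in ("service_charge", "wait_charge", "tax")]
    # Like SQL addition, the amount is missing if any charge is missing.
    amount = None if None in charges else round(sum(charges), 2)
    return {
        "id": event["id"],
        "created_at": event["created_at"],
        "event_type": value.get("type"),
        "amount": amount,
        "location": value.get("location"),
    }

flat = [flatten(e) for e in events]
for row in flat:
    print(row)
```

Note that the amount formula is now baked into the ETL step, which is the transparency cost the next scorecard points out.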
Let’s see how we did #2
Bad Sure O.K.
Shortest Path ✓
Performance ✓
Single Source of Truth ✗
Transparency ✗
- explore: users
  joins:
    - join: event_attributes
      relationship: one_to_many
      sql_on: ${users.id} = ${event_attributes.user_id}
- view: event_attributes
  fields:
    - dimension: user_id
      sql: ${TABLE}.user_id
    - dimension: amount
      sql: ${TABLE}.amount
    ...
Let’s try another improvement
Data Warehouse

Transaction Events
id      created_at  user_id  service_charge  wait_charge  tax
100001  2016-01-01  1        10              3            1.3

Ride_started Events
id      created_at  user_id  location
100002  2016-01-01  1        37.7833° N, 122.4167° W
Let’s try another improvement: Model
- explore: events
  joins:
    - join: transaction_events
      view_label: 'Events'
      relationship: one_to_one
      sql_on: ${events.id} = ${transaction_events.id}
- explore: users
  joins:
    - join: events
      relationship: one_to_many
      sql_on: ${users.id} = ${events.user_id}
- view: events
  derived_table:
    sql: |
      SELECT id, created_at, user_id FROM transaction_events
      UNION ALL
      SELECT id, created_at, user_id FROM ride_started_events
- view: transaction_events...
- view: ride_started_events...
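To see what this model asks of the database, here is a small sqlite3 sketch in Python. A plain SQL view stands in for the LookML derived table, and the amount expression is an assumption carried over from the earlier slides; only the table names and sample rows come from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_events (
    id INTEGER PRIMARY KEY, created_at TEXT, user_id INTEGER,
    service_charge REAL, wait_charge REAL, tax REAL);
CREATE TABLE ride_started_events (
    id INTEGER PRIMARY KEY, created_at TEXT, user_id INTEGER, location TEXT);
INSERT INTO transaction_events VALUES (100001, '2016-01-01', 1, 10, 3, 1.3);
INSERT INTO ride_started_events
    VALUES (100002, '2016-01-01', 1, '37.7833° N, 122.4167° W');
-- Stand-in for the derived table: only the columns shared by all events.
CREATE VIEW events AS
    SELECT id, created_at, user_id FROM transaction_events
    UNION ALL
    SELECT id, created_at, user_id FROM ride_started_events;
""")

# One-to-one join back to the typed table; the raw charges stay visible.
rows = conn.execute("""
    SELECT events.id,
           t.service_charge + t.wait_charge + t.tax AS amount
    FROM events
    LEFT JOIN transaction_events AS t ON events.id = t.id
    ORDER BY events.id
""").fetchall()
print(rows)
```

Each event type keeps its own typed columns, and the events view is only one join away from any of them.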
Let’s see how we did #3
Bad Sure O.K.
Shortest Path ✓
Performance ✓
Single Source of Truth ✓
Transparency ✓