OLAP Theory (English version): On-Line Analytical Processing (Business Intelligence)
Data-Intensive Distributed Computingcs451/slides/big... · 2020-02-13 · OLAP (online analytical...
Data-Intensive Distributed Computing
Part 5: Analyzing Relational Data (1/3)
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
CS 431/631 451/651 (Winter 2020)
Ali Abedi
These slides are available at https://www.student.cs.uwaterloo.ca/~cs451
1
Structure of the Course
“Core” framework features and algorithm design
Analyzing Text
Analyzing Graphs
Analyzing Relational Data
Data Mining
2
Evolution of Enterprise Architectures
Next two sessions: techniques, algorithms, and optimizations for relational processing
3
[Diagram: users, Monolithic Application]
4
[Diagram: users, Frontend, Backend]
5
Edgar F. Codd
• Inventor of the relational model for databases
• SQL was created based on his work
• Turing Award winner in 1981
6
[Diagram: users, Frontend, Backend, database]
7
Business Intelligence
An organization should retain the data that result from carrying out its mission and exploit those data to generate insights that benefit the organization, for example in market analysis, strategic planning, and decision making.
8
[Diagram: users, Frontend, Backend, database, BI tools, analysts]
9
[Diagram: users, Frontend, Backend, database, BI tools, analysts]
Why is my application so slow?
Why does my analysis take so long?
10
Database Workloads
OLTP (online transaction processing)
• Typical applications: e-commerce, banking, airline reservations
• User facing: real-time, low latency, highly concurrent
• Tasks: relatively small set of “standard” transactional queries
• Data access pattern: random reads, updates, writes (small amounts of data)
OLAP (online analytical processing)
• Typical applications: business intelligence, data mining
• Back-end processing: batch workloads, less concurrency
• Tasks: complex analytical queries, often ad hoc
• Data access pattern: table scans, large amounts of data per query
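The two access patterns can be contrasted in a few lines. This is only a sketch using Python's built-in `sqlite3` module; the `orders` table, its columns, and its values are hypothetical and not from the slides.

```python
import sqlite3

# Hypothetical in-memory table; schema and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "alice", 20.0), (2, "bob", 35.0), (3, "alice", 15.0), (4, "carol", 50.0)],
)

# OLTP-style access: update and read back one row by key (small, random access).
conn.execute("UPDATE orders SET amount = amount + 5.0 WHERE id = ?", (2,))
oltp_row = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style access: scan the whole table and aggregate.
olap_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The first query touches a single row through the primary key, while the second has to read every row, which is the pattern that scales with table size.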
11
OLTP and OLAP Together?
Downsides of co-existing OLTP and OLAP workloads
• Poor memory management
• Conflicting data access patterns
• Variable latency
Solution?
users and analysts
12
Source: Wikipedia (Warehouse)
Build a data warehouse!
13
[Diagram: users, Frontend, Backend, OLTP database, ETL (Extract, Transform, and Load), Data Warehouse, BI tools, analysts]
OLTP database for user-facing transactions
OLAP database for data warehousing
14
A Simple OLTP Schema
[Schema diagram: Customer, Billing, Order, Inventory, OrderLine]
15
A Simple OLAP Schema
[Schema diagram: Fact_Sales, Dim_Customer, Dim_Date, Dim_Product, Dim_Store]
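A fact table surrounded by dimension tables is typically queried by joining the facts to the dimensions and aggregating. The sketch below reuses the table names from the slide, but every column name and value is an assumption, and `Dim_Customer` and `Dim_Date` are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Column names are hypothetical; only the table names come from the slide.
CREATE TABLE Dim_Product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Dim_Store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE Fact_Sales  (
    product_id INTEGER REFERENCES Dim_Product(product_id),
    store_id   INTEGER REFERENCES Dim_Store(store_id),
    units      INTEGER
);
INSERT INTO Dim_Product VALUES (1, 'books'), (2, 'games');
INSERT INTO Dim_Store   VALUES (10, 'Waterloo'), (20, 'Toronto');
INSERT INTO Fact_Sales  VALUES (1, 10, 3), (1, 20, 4), (2, 10, 5), (2, 20, 1), (1, 10, 2);
""")

# Join the fact table to two dimensions, then group and aggregate.
units_by_category_city = conn.execute("""
    SELECT p.category, s.city, SUM(f.units)
    FROM Fact_Sales AS f
    JOIN Dim_Product AS p ON p.product_id = f.product_id
    JOIN Dim_Store   AS s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()

print(units_by_category_city)
```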
16
ETL
Extract
Transform
• Data cleaning and integrity checking
• Schema conversion
• Field transformations
Load
When does ETL happen?
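The three ETL stages can be sketched as three small functions. This is a toy illustration, not a real pipeline: the input rows, field names, and the rule that negative amounts are invalid are all assumptions made for the example.

```python
from datetime import date

# Hypothetical rows as they might be extracted from an OLTP system.
raw_orders = [
    {"order_id": "1", "customer": " Alice ", "amount": "20.50", "date": "2020-02-13"},
    {"order_id": "2", "customer": "bob", "amount": "-3.00", "date": "2020-02-13"},
    {"order_id": "3", "customer": "Carol", "amount": "7.25", "date": "2020-02-14"},
]

def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(raw_orders)

def transform(rows):
    """Transform: clean, check integrity, and convert to the target schema."""
    out = []
    for row in rows:
        amount = float(row["amount"])           # field transformation: text to number
        if amount < 0:                          # integrity check (assumed rule): skip invalid rows
            continue
        out.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().lower(),     # data cleaning
            "amount": amount,
            "order_date": date.fromisoformat(row["date"]),   # schema conversion
        })
    return out

def load(rows, warehouse):
    """Load: append the transformed rows to the warehouse table."""
    warehouse.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract()), warehouse)
print(loaded, warehouse)
```

Running the three stages as one batch, for instance once per night, is one possible answer to when ETL happens.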
17
[Diagram: users, Frontend, Backend, OLTP database, ETL (Extract, Transform, and Load), Data Warehouse, BI tools, analysts]
My data is a day old… Meh.
18
[Diagram: three Frontend/Backend pairs (two with users, one with external APIs), three OLTP databases, ETL (Extract, Transform, and Load), Data Warehouse, BI tools, analysts]
19
What do you actually do?
Dashboards
Report generation
Ad hoc analyses
20
OLAP Cubes
Common operations
• slice and dice
• roll up/drill down
• pivot
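The cube operations named here can be illustrated on a tiny set of fact rows. The data and the three dimensions (year, city, product) are hypothetical, and plain Python stands in for an OLAP engine.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, city, product, units).
facts = [
    (2019, "Waterloo", "books", 3),
    (2019, "Toronto", "books", 4),
    (2020, "Waterloo", "games", 5),
    (2020, "Toronto", "books", 1),
    (2020, "Waterloo", "books", 2),
]
DIMS = ("year", "city", "product")

def aggregate(rows, keep):
    """Sum units grouped by the dimensions named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)

# Slice: fix one dimension to a single value.
slice_2020 = [r for r in facts if r[0] == 2020]

# Dice: restrict several dimensions at once.
dice = [r for r in facts if r[0] == 2020 and r[1] == "Waterloo"]

# Roll up: aggregate away dimensions (down to year only).
by_year = aggregate(facts, ("year",))

# Drill down: return to a finer grouping (year and city).
by_year_city = aggregate(facts, ("year", "city"))

# Pivot: show the same totals with cities as rows and years as columns.
pivot = defaultdict(dict)
for (year, city), units in by_year_city.items():
    pivot[city][year] = units

print(by_year, dict(pivot))
```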
21
OLAP Cubes: Challenges
Fundamentally, lots of joins, group-bys, and aggregations
• How to take advantage of schema structure to avoid repeated work?
Cube materialization
• Realistic to materialize the entire cube?
• If not, how/when/what to materialize?
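One way to see both the cost of full materialization and the chance to avoid repeated work: a cube over n dimensions has 2^n group-bys, and for a distributive aggregate such as SUM each coarser group-by can be derived from a finer one that is already materialized. The sketch below assumes three hypothetical dimensions and made-up fact rows.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("year", "city", "product")

# Hypothetical fact rows: (year, city, product, units).
facts = [
    (2019, "Waterloo", "books", 3),
    (2019, "Toronto", "books", 4),
    (2020, "Waterloo", "games", 5),
    (2020, "Toronto", "books", 1),
    (2020, "Waterloo", "books", 2),
]

# One group-by ("cuboid") per subset of dimensions: 2**len(DIMS) in total.
cuboids = [subset for k in range(len(DIMS) + 1) for subset in combinations(DIMS, k)]

# Materialize the finest cuboid once by scanning the fact rows.
base = defaultdict(int)
for year, city, product, units in facts:
    base[(year, city, product)] += units

def roll_up(cuboid, from_dims, to_dims):
    """Derive a coarser cuboid from an already materialized finer one."""
    idx = [from_dims.index(d) for d in to_dims]
    totals = defaultdict(int)
    for key, units in cuboid.items():
        totals[tuple(key[i] for i in idx)] += units
    return dict(totals)

# Every other group-by is answered from `base` instead of rescanning the facts.
cube = {dims: roll_up(base, DIMS, dims) for dims in cuboids}

print(len(cuboids), cube[("year",)], cube[()])
```

With many dimensions the number of cuboids grows exponentially, which is why partial materialization becomes a design question.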
22
[Diagram: three Frontend/Backend pairs (two with users, one with external APIs), three OLTP databases, ETL (Extract, Transform, and Load), Data Warehouse, BI tools, analysts]
23
Fast forward…
24
“On the first day of logging the Facebook clickstream, more than 400 gigabytes of data was collected. The load, index, and aggregation processes for this data set really taxed the Oracle data warehouse. Even after significant tuning, we were unable to aggregate a day of clickstream data in less than 24 hours.”
Jeff Hammerbacher, “Information Platforms and the Rise of the Data Scientist.” In Beautiful Data, O’Reilly, 2009.
25
[Diagram: users, Frontend, Backend, OLTP database, ETL (Extract, Transform, and Load), Data Warehouse, BI tools, analysts]
Facebook context?
26
[Diagram: users, Frontend, Backend, “OLTP”, ETL (Extract, Transform, and Load), Data Warehouse, BI tools, analysts]
• Adding friends, updating profiles, likes, comments…
• Feed ranking, friend recommendation, demographic analysis…
27
[Diagram: users, Frontend, Backend, “OLTP” PHP/MySQL, ETL (Extract, Transform, and Load), Hadoop, analysts, data scientists; one element is crossed out (✗)]
or ELT?
28
What’s changed?
Dropping cost of disks
• Cheaper to store everything than to figure out what to throw away
29
What’s changed?
Dropping cost of disks
• Cheaper to store everything than to figure out what to throw away
Rise of social media and user-generated content
• Large increase in data volume
Growing maturity of data mining techniques
• Demonstrates value of data analytics
Types of data collected
• From data that’s obviously valuable to data whose value is less apparent
30
Virtuous Product Cycle
• a useful service
• analyze user behavior to extract insights
• transform insights into action
• $ (hopefully)
Google. Facebook. Twitter. Amazon. Uber.
31
What do you actually do?
Dashboards
Report generation
Ad hoc analyses
“Descriptive”
“Predictive”
Data products
32
Virtuous Product Cycle
• a useful service
• analyze user behavior to extract insights
• transform insights into action
• $ (hopefully)
Google. Facebook. Twitter. Amazon. Uber.
data science
data products
33
“On the first day of logging the Facebook clickstream, more than 400 gigabytes of data was collected. The load, index, and aggregation processes for this data set really taxed the Oracle data warehouse. Even after significant tuning, we were unable to aggregate a day of clickstream data in less than 24 hours.”
Jeff Hammerbacher, “Information Platforms and the Rise of the Data Scientist.” In Beautiful Data, O’Reilly, 2009.
34
[Diagram: users, Frontend, Backend, “OLTP”, ETL (Extract, Transform, and Load), Hadoop, data scientists]
35
The Irony…
[Diagram: users, Frontend, Backend, “OLTP”, ETL (Extract, Transform, and Load), Hadoop, data scientists]
Wait, so why not use a database to begin with?
36
Why not just use a database?
Scalability. Cost.
SQL is awesome
37
Databases are great…
• If your data has structure (and you know what the structure is)
• If you know what queries you’re going to run ahead of time
• If your data is reasonably clean
Databases are not so great…
• If your data has little structure (or you don’t know the structure)
• If you don’t know what you’re looking for
• If your data is messy and noisy
38
“there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are unknown unknowns – the ones we don't know we don't know…” – Donald Rumsfeld
Source: Wikipedia
39
One who knows and knows that he knows
His horse of wisdom will reach the skies
One who doesn't know, but knows that he doesn't know
His limping mule will eventually get him home
One who doesn't know and doesn't know that he doesn't know
He will be eternally lost in his hopeless ignorance!
Ibn Yamin (1286-1368)
Databases are great…
• If your data has structure (and you know what the structure is)
• If you know what queries you’re going to run ahead of time
• If your data is reasonably clean
Databases are not so great…
• If your data has little structure (or you don’t know the structure)
• If you don’t know what you’re looking for
• If your data is messy and noisy
41
Advantages of Hadoop dataflow languages
• Don’t need to know the schema ahead of time
• Many analyses are better formulated imperatively
• Raw scans are the most common operations
• Much faster data ingest rate
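The idea of scanning raw records without a schema declared up front can be illustrated in plain Python. This is not a Hadoop dataflow language, only a single-machine sketch of the same style; the log lines and field names are hypothetical.

```python
import json
from collections import Counter

# Hypothetical raw log lines: no shared schema, and one line is malformed.
raw_log = [
    '{"user": "u1", "action": "click", "page": "/home"}',
    '{"user": "u2", "action": "like", "target": "post42"}',
    'not json at all',
    '{"user": "u1", "action": "click", "page": "/feed", "ms": 12}',
]

clicks_per_user = Counter()
skipped = 0
for line in raw_log:                     # raw scan over every record
    try:
        record = json.loads(line)        # structure is interpreted at read time
    except json.JSONDecodeError:
        skipped += 1                     # messy input is counted and skipped
        continue
    if record.get("action") == "click":  # imperative filter, written as ordinary code
        clicks_per_user[record["user"]] += 1

print(clicks_per_user, skipped)
```

Nothing had to be declared before ingesting the lines, and records with extra or missing fields do not need a schema change.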
42
What do you actually do?
Dashboards
Report generation
Ad hoc analyses
“Descriptive”
“Predictive”
Data products
43
[Diagram: three Frontend/Backend pairs (two with users, one with external APIs), three OLTP databases, ETL (Extract, Transform, and Load), Data Warehouse, BI tools, analysts]
44
[Diagram: three Frontend/Backend pairs (two with users, one with external APIs), three OLTP databases, ETL (Extract, Transform, and Load), Data Warehouse, “Data Lake”, “Traditional” BI tools, SQL on Hadoop, Other tools, data scientists]
45
What’s Next?
46
[Diagram: users, Frontend, Backend, database, BI tools, analysts]
47
[Diagram: three Frontend/Backend pairs (two with users, one with external APIs), three OLTP databases, ETL (Extract, Transform, and Load), Data Warehouse, BI tools, analysts]
48
[Diagram: three Frontend/Backend pairs (two with users, one with external APIs), three OLTP databases, ETL (Extract, Transform, and Load), Data Warehouse, “Data Lake”, “Traditional” BI tools, SQL on Hadoop, Other tools, data scientists]
My data is a day old… I refuse to accept that!
49
[Diagram: OLTP, ETL, OLAP]
What if you didn’t have to do this?
50
HTAP
Hybrid Transactional/Analytical Processing (HTAP)
51
[Diagram: three Frontend/Backend pairs (two with users, one with external APIs), three OLTP databases, ETL (Extract, Transform, and Load), Data Warehouse, “Data Lake”, “Traditional” BI tools, SQL on Hadoop, Other tools, data scientists]
52
[Diagram: three Frontend/Backend pairs (two with users, one with external APIs), three HTAP databases, two sets of Analytics tools with data scientists, ETL (Extract, Transform, and Load), Data Warehouse, “Data Lake”, “Traditional” BI tools, SQL on Hadoop, Other tools, data scientists]
53
Everything in the cloud!
[Diagram: three Frontend/Backend pairs (two with users, one with external APIs), three OLTP databases, ETL (Extract, Transform, and Load), Data Warehouse, “Data Lake”, “Traditional” BI tools, SQL on Hadoop, Other tools, data scientists; cloud annotations: IaaS / Load balance aaS, DBaaS (e.g., RDS), DBaaS (e.g., Redshift), S3, “Cloudified” tools, ELT aaS]
54