Vertica architecture
zvika-gutkin
![Page 2: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/2.jpg)
Agenda
• Vertica VS the world
• What is Vertica
• How does it work
• How To Use Vertica … (The Right Way)
• Where It Falls Short
• Drill Down to SQL … (Group by & Joins)
![Page 3: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/3.jpg)
Close Your Eyes
Imagine Your System
It Needs To support:
![Page 4: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/4.jpg)
• 1,000,000 concurrent users
• 1,000,000 operations/s
• Microsecond read & write latency
• Complex analytics queries with latency of seconds
• ACID
• Highly available
• Scalable
![Page 5: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/5.jpg)
Open Your Eyes
What Do You See ?
![Page 6: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/6.jpg)
Vertica
Oracle
Couchbase
Cassandra
Mongo
MySql
Exadata
![Page 7: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/7.jpg)
Vertica VS the World

| | Vertica | Oracle | Cassandra | Couchbase |
|---|---|---|---|---|
| Scale | MPP | Single server* | MPP | MPP |
| Data model | Relational, structured | Relational, structured | Column store, schema-less | Document, schema-less |
| Transaction model | ACID | ACID | Eventually consistent | Consistent |
| DR | Application solution | Standby, read only | Active-active | Active-active |
| Development | SQL… | SQL… | Python, Java, CQL… | Python, Java, PHP… |
| Best for | Analytics | Generic, OLTP | Write-intensive key-value | Read- and write-intensive JSON documents |
| CAP | CP | N/A | AP | CP |
![Page 8: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/8.jpg)
Use Cases
• Real time dashboarding (5,000 concurrent users, heavy writes and simple fetches).
• Real time complex analytics
• Billing
• Blog Site
Cassandra
Vertica
Oracle
Couchbase
![Page 9: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/9.jpg)
MPP-Columnar DBMS
• 10x –100x performance of classic RDBMS
• Linear Scale
• SQL
• Commodity Hardware
• Built-in fault tolerance
![Page 10: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/10.jpg)
10x –100x performance of classic RDBMS
Column store architecture
• High compression rates.
• Sorted columns.
• Object segmentation/replication.
![Page 11: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/11.jpg)
Regular table
| Continent | Country | City | Size | Size type | Population |
|---|---|---|---|---|---|
| Asia | Israel | Tel Aviv | 52000 | Acres | 450000 |
| N.America | USA | Dallas | 385 | Sq. miles | 1200000 |
Create Table …..
![Page 12: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/12.jpg)
Rows Vs Column
Continent
• Asia
• Asia
• Asia
• N.America
• N.America
• N.America
Country
• Israel
• Israel
• Israel
• Usa
• Usa
• Usa
Size Type
• Sq. miles
• Sq. miles
• Sq. miles
• Sq. miles
• Sq. miles
• Sq. miles
City size
• 52000
• 78000
• 63000
• 385
• 468
• 8700
City Name
• Tel Aviv
• Jerusalem
• Haifa
• Dallas
• New York
• New Jersey
Population
• 450000
• 800000
• 268000
• 1200000
• 8200000
• 8800000
Block 1
• Asia • Israel • Sq. miles • Tel Aviv
Block 2
• 52000 • 450000 • Asia
Block 3
• Israel • Sq. miles • Jerusalem
Block 4
• 78000 • 800000 • N.America
Block 5
• Usa • Dallas • Sq. miles • 385
Block 6
• 1200000 • Asia • Israel
Block 7
• Haifa • Sq. miles • 63000
Block 8
• 268000 • N.America • Usa
Block 9
• New York • Sq. miles • 468 • 8200000
Continent
• Asia,3 • N.America,3
RLE Encoding
Country
• Israel,3 • Usa,3
RLE Encoding
Size Type
• Dunam,3 • Sq. miles,3
RLE Encoding
City size
• 52000 • 78000 • 63000 • 385 • 468 • 8700
DeltaVal Encoding
City Name
• Tel Aviv • Jerusalem • Haifa • Dallas • New York • New Jersey
RLE Encoding
Population
• 450000 • 800000 • 268000 • 1200000 • 8200000 • 8800000
LZO Encoding
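The run-length encoding shown for the sorted Continent and Country columns can be sketched in a few lines of Python. This only illustrates the idea and is not Vertica's storage format; the function names `rle_encode` and `rle_decode` are made up for this example.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# Column values taken from the slide.
continent = ["Asia", "Asia", "Asia", "N.America", "N.America", "N.America"]
city = ["Tel Aviv", "Jerusalem", "Haifa", "Dallas", "New York", "New Jersey"]

continent_rle = rle_encode(continent)  # two runs stand in for six values
city_rle = rle_encode(city)            # six runs: no repeats, so nothing is saved
```

The sketch suggests why sort order and cardinality matter: a sorted, low-cardinality column collapses into a handful of runs, while a column of unique values gains nothing from run-length encoding.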
![Page 13: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/13.jpg)
Rows VS Columns
• Conversion Table (~2 billion rows a month)
– Oracle
  • Uncompressed => 418 GB
  • Compressed (manual) => 147 GB
– Vertica
  • 21 GB
Saving: 71%
![Page 14: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/14.jpg)
How Does It Work ?
![Page 15: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/15.jpg)
Tuple Mover
![Page 16: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/16.jpg)
Tuple Mover Flow
ROS
Asia,23 N.America,13
Israel,23 Usa,13
Natanya,1 Zoran,1 … seattle,1 Chicago,1 Austin,1 …

Asia,2 N.America,3
Israel,2 Usa,1
Jerusalem,1 Tel aviv,1 … Dallas,1 New Jersey,1 New York,1 …
WOS
N.America Usa Dallas Sq. miles 385 1200000
Asia Israel Tel Aviv Sq. miles 52000 450000
N.America Usa New York Sq. miles 462 8200000
N.America Usa New Jersey Sq. miles 468 8800000
Asia Israel Jerusalem Sq. miles 78000 800000

Asia,25 N.America,16
Israel,25 Usa,16
Jerusalem,1 Natanya,1 Tel Aviv,1 Zoran,1 … Austin,1 Chicago,1 Dallas,1 New Jersey,1 New York,1 seattle,1 …
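One reading of the figures on this slide is that the Tuple Mover first sorts and encodes the rows buffered in the WOS into a new ROS container (moveout), then combines ROS containers into one (mergeout). The Python below is a rough sketch under that assumption; `moveout` and `mergeout` are hypothetical helper names, and each ROS column is modelled as a list of (value, count) runs.

```python
from collections import Counter

def moveout(wos_rows, column):
    """Encode one column of the buffered WOS rows as sorted (value, count) runs."""
    return sorted(Counter(row[column] for row in wos_rows).items())

def mergeout(*containers):
    """Combine the runs of several ROS containers into a single sorted container."""
    merged = Counter()
    for runs in containers:
        for value, count in runs:
            merged[value] += count
    return sorted(merged.items())

# The five WOS rows from the slide (only two columns kept for brevity).
wos = [
    {"continent": "N.America", "city": "Dallas"},
    {"continent": "Asia", "city": "Tel Aviv"},
    {"continent": "N.America", "city": "New York"},
    {"continent": "N.America", "city": "New Jersey"},
    {"continent": "Asia", "city": "Jerusalem"},
]
existing_ros = [("Asia", 23), ("N.America", 13)]  # Continent column already in ROS

new_ros = moveout(wos, "continent")
merged_ros = mergeout(existing_ros, new_ros)
```

With the slide's numbers, the existing Continent runs (23 and 13) plus the new container (2 and 3) give the merged runs (25 and 16).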
![Page 17: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/17.jpg)
Projections
• Physical structure of the table (logical)
• Stored sorted and compressed
• Internal maintenance
• At least one (super) projection
• Projection Types:
– Super projection
– Query specific projection
– Pre join projection
– Buddy projection
![Page 18: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/18.jpg)
Projections
![Page 19: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/19.jpg)
How to build my projections ?
• Use DBD
• Choose the right columns (General Vs Specific)
• Choose the right sort order
• Choose the right encoding
• Choose the right column to partition by
• Choose the right column to segment by
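To see why the segmentation column matters, here is a simplified Python sketch of spreading rows across nodes by hashing a key. It is an illustration only: `zlib.crc32` with a modulo stands in for Vertica's own hash function and node mapping, and `node_for`, `segment`, and the sample columns are hypothetical names.

```python
import zlib

def node_for(key, node_count):
    """Pick a node from a stable hash of the segmentation key (stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count

def segment(rows, key_column, node_count):
    """Distribute rows into per-node lists according to the key's hash."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key_column], node_count)].append(row)
    return nodes

# Made-up rows: a high-cardinality key and a two-valued column.
rows = [
    {"account_id": i, "continent": "Asia" if i % 2 else "N.America"}
    for i in range(1000)
]
by_account = segment(rows, "account_id", 4)
by_continent = segment(rows, "continent", 4)

# Rows per node under each choice of segmentation column.
account_sizes = [len(node) for node in by_account]
continent_sizes = [len(node) for node in by_continent]
```

A column with only two distinct values can occupy at most two of the four nodes, so the remaining nodes sit idle; a high-cardinality key has a much better chance of spreading work evenly.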
![Page 20: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/20.jpg)
Rules of thumb (Don’t tell Tom Kyte)
• Avoid “select * …”
• Denormalize
• Use bulk operations for DML
• Use merge join for large joins
• Understand Vertica architecture & your data
![Page 21: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/21.jpg)
Delete/Update
• Deleted rows are only marked as deleted
• Stored in a delete vector on disk
• Queries merge the ROS and the delete vector to remove deleted records
• Data is removed asynchronously during mergeout
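The delete behaviour described on this slide can be sketched as follows. This is a toy model, not Vertica's implementation: the ROS is a plain list, the delete vector is a set of row positions, and `scan` and `purge_on_mergeout` are hypothetical names.

```python
def scan(ros_rows, delete_vector):
    """Return only the rows whose positions are not marked as deleted."""
    return [row for pos, row in enumerate(ros_rows) if pos not in delete_vector]

def purge_on_mergeout(ros_rows, delete_vector):
    """Rewrite the container without deleted rows and start an empty delete vector."""
    return scan(ros_rows, delete_vector), set()

ros = ["Tel Aviv", "Jerusalem", "Haifa", "Dallas"]
delete_vector = set()

delete_vector.add(1)                 # a DELETE only marks position 1
visible = scan(ros, delete_vector)   # every query pays for this filtering
ros_after, dv_after = purge_on_mergeout(ros, delete_vector)
```

In this model the stored data stays the same size until the purge, which is one way to see why many small deletes or updates add query-time cost until mergeout catches up.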
![Page 22: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/22.jpg)
Delete/Update
Strata issue
Merge Out
Too Many ROS
500MB
2GB
4GB
![Page 23: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/23.jpg)
Where It Falls Short …
• Lack of features
• Documentation
• Good for specific types of queries
![Page 24: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/24.jpg)
Let’s Dive into Sql Examples
1. Sort Optimization
2. Join Optimization
![Page 25: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/25.jpg)
Choose the Right sort order Example
select a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
       count(distinct a11.VS_LP_SESSION_ID) AS Visits,
       (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from lp_15744040.FACT_VISIT_ROOM a11
group by a11.LP_ACCOUNT_ID;
![Page 26: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/26.jpg)
First projection …

| table_name | projection_name | projection_column_name | column_position | sort_position |
|---|---|---|---|---|
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_bad | VS_LP_SESSION_ID | 0 | 0 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_bad | LP_ACCOUNT_ID | 1 | 1 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_bad | VS_LP_VISITOR_ID | 2 | 2 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_bad | VISIT_FROM_DT_TRUNC | 3 | 3 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_bad | ACCOUNT_ID | 4 | 4 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_bad | ROOM_ID | 5 | 5 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_bad | VISIT_FROM_DT_ACTUAL | 6 | 6 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_bad | VISIT_TO_DT_ACTUAL | 7 | 7 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_bad | HOT_LEAD_IND | 8 | 8 |

Access Path:
+-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a11.LP_ACCOUNT_ID
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 7M, Rows: 10K] (PATH ID: 2)
| | Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
| | | Projection: lp_15744040.FACT_VISIT_ROOM_bad
| | | Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
![Page 27: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/27.jpg)
Second projection …

| table_name | projection_name | projection_column_name | column_position | sort_position |
|---|---|---|---|---|
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_fix1 | LP_ACCOUNT_ID | 0 | 0 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_fix1 | VS_LP_SESSION_ID | 1 | 1 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_fix1 | VS_LP_VISITOR_ID | 2 | 2 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_fix1 | VISIT_FROM_DT_TRUNC | 3 | 3 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_fix1 | ACCOUNT_ID | 4 | 4 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_fix1 | ROOM_ID | 5 | 5 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_fix1 | VISIT_FROM_DT_ACTUAL | 6 | 6 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_fix1 | VISIT_TO_DT_ACTUAL | 7 | 7 |
| FACT_VISIT_ROOM | FACT_VISIT_ROOM_fix1 | HOT_LEAD_IND | 8 | 8 |

Access Path:
+-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a11.LP_ACCOUNT_ID
| +---> GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 2)
| | Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
| | | Projection: lp_15744040.FACT_VISIT_ROOM_fix1
| | | Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
![Page 28: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/28.jpg)
Results …
Elapsed Time, First projection: GROUPBY HASH (SORT OUTPUT)
Time: First fetch (7 rows): 264527.916 ms. All rows formatted: 264527.978 ms
Elapsed Time, Second projection: GROUPBY PIPELINED
Time: First fetch (7 rows): 38913.909 ms. All rows formatted: 38913.965 ms
![Page 29: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/29.jpg)
Group by Hash: Not Sorted
(Figure: hash table with Value and Count columns)
![Page 30: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/30.jpg)
Group By Pipe Operator: Sorted
(Figure: running Count over sorted values)
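The difference between the two group-by operators can be sketched in Python. This is a conceptual illustration, not Vertica's executor, and it counts rows per key rather than distinct sessions; `groupby_hash` and `groupby_pipelined` are names chosen for this example.

```python
from itertools import groupby

def groupby_hash(rows):
    """Count rows per key with a hash table; works on input in any order."""
    counts = {}
    for key in rows:
        counts[key] = counts.get(key, 0) + 1
    return counts

def groupby_pipelined(sorted_rows):
    """Stream over sorted input, emitting each group as soon as its run ends."""
    for key, run in groupby(sorted_rows):
        yield key, sum(1 for _ in run)

values = ["C", "B", "A", "D", "A", "B"]  # arbitrary order

hash_counts = groupby_hash(values)
pipelined_counts = dict(groupby_pipelined(sorted(values)))

# Without sorted input the pipelined operator splits a group into several runs.
split_runs = list(groupby_pipelined(["A", "B", "A"]))
```

The hash variant must keep every group in memory until the input ends, while the pipelined variant holds one group at a time. That only works if the data already arrives in group-by order, which is what the second projection's sort order provides.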
![Page 31: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/31.jpg)
Join Example
select a12.DT_WEEK AS DT_WEEK,
       a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
       count(distinct a11.VS_LP_SESSION_ID) AS Visits,
       (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from zzz.FACT_VISIT a11
join zzz.DIM_DATE_TIME a12 on (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
where (a11.LP_ACCOUNT_ID in ('57386690')
  and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
group by a12.DT_WEEK, a11.LP_ACCOUNT_ID

Filter: LP_ACCOUNT_ID, VISIT_FROM_DT_TRUNC
Group By: DT_WEEK, LP_ACCOUNT_ID
Join: VISIT_FROM_DT_TRUNC, DATE_TIME_ID
Select: DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID
![Page 32: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/32.jpg)
Full Explain Plan…
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2)
| | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | Execute on: All Nodes
| | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
| | | Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
| | | Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
| | | Execute on: All Nodes
| | | +-- Outer -> STORAGE ACCESS for a11 [Cost: 421K, Rows: 372M (NO STATISTICS)] (PATH ID: 4)
| | | | Projection: zzz.FACT_VISIT_b0
| | | | Materialize: a11.VISIT_FROM_DT_TRUNC
| | | | Filter: (a11.LP_ACCOUNT_ID = '57386690')
| | | | Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp) AND (a11.VISIT_FROM_DT_TRUNC <= '2011-12-31 12:52:50'::timestamp))
| | | | Execute on: All Nodes
| | | +-- Inner -> STORAGE ACCESS for a12 [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 5)
| | | | Projection: zzz.DIM_DATE_TIME_node0004
| | | | Materialize: a12.DATE_TIME_ID, a12.DT_WEEK
| | | | Filter: ((a12.DATE_TIME_ID >= '2011-09-01 15:28:00'::timestamp) AND (a12.DATE_TIME_ID <= '2011-12-31 12:52:50'::timestamp))
| | | | Execute on: All Nodes
![Page 33: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/33.jpg)
Explain Plan (excerpt)…
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2)
| | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | Execute on: All Nodes
| | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
| | | Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
| | | Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
| | | Execute on: All Nodes
Time: First fetch (6 rows): 56654.894 ms. All rows formatted: 56654.988 ms
![Page 34: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/34.jpg)
Solution One - Functions
select week(a11.VISIT_FROM_DT_TRUNC) AS DT_WEEK,
       a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
       count(distinct a11.VS_LP_SESSION_ID) AS Visits,
       (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from zzz.FACT_VISIT a11
where (a11.LP_ACCOUNT_ID in ('57386690')
  and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
group by week(a11.VISIT_FROM_DT_TRUNC), a11.LP_ACCOUNT_ID;

Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 127, Rows: 1 (STALE STATISTICS)] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: <SVAR>, a11.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 126, Rows: 1 (STALE STATISTICS)] (PATH ID: 2)
| | Group By: (date_part('week', a11.VISIT_FROM_DT_TRUNC))::int, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for a11 [Cost: 125, Rows: 1 (STALE STATISTICS)] (PATH ID: 3)
| | | Projection: zzz.FACT_VISIT_b0
Time: First fetch (6 rows): 33453.997 ms. All rows formatted: 33454.154 ms
Saved the Join Time
![Page 35: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/35.jpg)
Solution Two- PreJoin Projection
Pros
• Eliminates join overhead
• Maintained by Vertica
Cons
• Not flexible
• Causes overhead on load
• Needs primary/foreign key
• Maintenance restrictions
![Page 36: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/36.jpg)
Solution Two - PreJoin Projection
order by LP_ACCOUNT_ID, VISIT_FROM_DT_TRUNC, DT_WEEK, HOT_LEAD_IND, DATE_TIME_ID, VS_LP_SESSION_ID
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 12K, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT visit_date_time_prejoin8_b0.VS_LP_SESSION_ID)
| Group By: visit_date_time_prejoin8_b0.DT_WEEK, visit_date_time_prejoin8_b0.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 11K, Rows: 10K] (PATH ID: 2)
| | Group By: visit_date_time_prejoin8_b0.DT_WEEK, visit_date_time_prejoin8_b0.LP_ACCOUNT_ID, visit_date_time_prejoin8_b0.VS_LP_SESSION_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for <No Alias> [Cost: 8K, Rows: 1M] (PATH ID: 3)
| | | Projection: lp_15744040.visit_date_time_prejoin8_b0
Time: First fetch (6 rows): 35312.331 ms. All rows formatted: 35312.421 ms
Saved the Join Time
![Page 37: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/37.jpg)
Solution Two - PreJoin Projection
Sorted By DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 542K, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT visit_date_time_prejoin_z6.VS_LP_SESSION_ID)
| Group By: visit_date_time_prejoin_z6.DT_WEEK, visit_date_time_prejoin_z6.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY PIPELINED [Cost: 542K, Rows: 10K] (PATH ID: 2)
| | Group By: visit_date_time_prejoin_z6.DT_WEEK, visit_date_time_prejoin_z6.VS_LP_SESSION_ID, visit_date_time_prejoin_z6.LP_ACCOUNT_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for <No Alias> [Cost: 501K, Rows: 15M] (PATH ID: 3)
| | | Projection: lp_15744040.visit_date_time_prejoin_z6
Time: First fetch (6 rows): 3680.853 ms. All rows formatted: 3680.969 ms
Saved the Join Time and Group by hash Time
![Page 38: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/38.jpg)
Solution Three - Denormalize
select DT_WEEK,
       a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
       count(distinct a11.VS_LP_SESSION_ID) AS Visits,
       (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
from zzz.FACT_VISIT_Z1 a11
where (a11.LP_ACCOUNT_ID in ('57386690')
  and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
group by DT_WEEK, a11.LP_ACCOUNT_ID;

Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for a11 [Cost: 2M, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
| | | Projection: zzz.FACT_VISIT_Z1_super
Time: First fetch (6 rows): 33885.178 ms. All rows formatted: 33885.253 ms
Saved the Join Time
![Page 39: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/39.jpg)
Solution Three - Denormalize
• Changing the projection sort order
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 588K, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY PIPELINED [Cost: 587K, Rows: 10K] (PATH ID: 2)
| | Group By: a11.DT_WEEK, a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for a11 [Cost: 531K, Rows: 20M] (PATH ID: 3)
| | | Projection: zzz.fact_visit_z1_pipe
| | | Materialize: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | | Filter: (a11.LP_ACCOUNT_ID = '57386690')
| | | Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp) AND (a11.VISIT_FROM_DT_TRUNC <= '2011-12-31 12:52:50'::timestamp))
| | | Execute on: All Nodes
Time: First fetch (6 rows): 4313.497 ms. All rows formatted: 4313.600 ms
Saved the Join Time and Group by hash Time
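To compare the join variants side by side, the first-fetch times reported on the slides can be turned into speedup factors against the original join query. The timings below are copied from the slides; the labels are informal descriptions added for this example.

```python
# First-fetch times in milliseconds, as reported on the slides.
timings_ms = {
    "join with DIM_DATE_TIME (baseline)": 56654.894,
    "week() function, no join": 33453.997,
    "prejoin projection, hash group by": 35312.331,
    "prejoin projection, pipelined group by": 3680.853,
    "denormalized table, hash group by": 33885.178,
    "denormalized table, pipelined group by": 4313.497,
}

baseline = timings_ms["join with DIM_DATE_TIME (baseline)"]
speedups = {name: baseline / ms for name, ms in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

On these numbers, removing the join alone gives well under a 2x improvement, while also getting a pipelined group by through the sort order brings the gain to roughly 13x to 15x.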
![Page 40: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/40.jpg)
Let’s sum it up…
Keep it simple.
Keep it sorted.
Keep it joinless.
![Page 41: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/41.jpg)
Questions ?
![Page 42: Vertica architecture](https://reader035.fdocuments.in/reader035/viewer/2022062308/558bdedbd8b42aee458b4578/html5/thumbnails/42.jpg)
Thank You