Introduction to Teradata
How Teradata Works
How Does Teradata Store Rows?
Teradata uses a hashing algorithm to distribute data randomly and evenly across all AMPs.
The rows of every table are distributed among all AMPs, ideally evenly.
Each AMP is responsible for a subset of the rows of each table. Evenly distributed tables result in evenly distributed workloads. The data is not placed in any particular order.
The benefits of unordered data include:
· No maintenance is needed to preserve order.
· It is independent of any query being submitted.
The benefits of automatic data placement include:
· Distribution is the same regardless of data volume.
· Distribution is based on row content, not data demographics.
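As a rough illustration of hash-based placement, the Python sketch below maps each primary index value to one of four AMPs and counts the rows each AMP receives. The md5-modulo scheme, the names `amp_for` and `distribute`, and the four-AMP system size are assumptions made for the example; Teradata's own hashing algorithm and hash maps are not reproduced here.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size, chosen only for this sketch


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number (stand-in hash, not Teradata's)."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")  # keep 32 bits of the digest
    return row_hash % num_amps


def distribute(pi_values, num_amps=NUM_AMPS):
    """Return the number of rows each AMP receives, indexed by AMP number."""
    counts = Counter(amp_for(value, num_amps) for value in pi_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


if __name__ == "__main__":
    # Unique values spread out in roughly equal shares at any volume.
    for volume in (1_000, 100_000):
        print(volume, distribute(range(volume)))
```

Because placement depends only on the value being hashed, the same value always lands on the same AMP, whatever else the table contains.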
Primary Indexes
· The mechanism used to assign a row to an AMP
· A table must have a Primary Index
· The Primary Index cannot be changed
UPI
· If the index choice of column(s) is unique, we call this a UPI (Unique Primary Index).
· A UPI choice will result in even distribution of the rows of the table across all AMPs.
NUPI
· If the index choice of column(s) isn’t unique, we call this a NUPI (Non-Unique Primary Index).
· A NUPI choice will distribute the rows of the table evenly only in proportion to the degree of uniqueness of the index.
UPIs guarantee even data distribution and eliminate duplicate row checking.
Why would you choose an Index that is different from the Primary Key?
• Join performance
• Known access paths
Data Storage based on Primary Index
· The value of the Primary Index for a specific row determines its AMP assignment.
· This is done using the hashing algorithm.
[Figure: the PE passes the PI value through the hashing algorithm, which selects one AMP; the same path serves row assignment and row access.]
Accessing the row by its Primary Index value is:
· Always a one-AMP operation
· The most efficient way to access a row
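The one-AMP property can be sketched in a few lines of Python. The class `ToySystem`, its method names, and the md5-based `row_hash` are hypothetical stand-ins for illustration; they model the idea that the hash used to place a row is the same hash used to find it, and nothing more.

```python
import hashlib


def row_hash(value):
    """Stand-in 32-bit hash; Teradata's real hashing algorithm is not reproduced."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ToySystem:
    """A handful of AMPs, each holding the rows whose PI value hashes to it."""

    def __init__(self, num_amps=4):
        # One dict per AMP: PI value -> list of rows (a NUPI may repeat a value).
        self.amps = [{} for _ in range(num_amps)]

    def amp_for(self, pi_value):
        return row_hash(pi_value) % len(self.amps)

    def insert(self, pi_value, row):
        """Row assignment: hash the PI value and store the row on that AMP."""
        self.amps[self.amp_for(pi_value)].setdefault(pi_value, []).append(row)

    def select_by_pi(self, pi_value):
        """Row access: the same hash names the single AMP to ask."""
        amp = self.amp_for(pi_value)
        return list(self.amps[amp].get(pi_value, [])), [amp]

    def full_table_scan(self, predicate):
        """Without an index value to hash, every AMP has to be searched."""
        rows = [row for amp in self.amps
                for bucket in amp.values()
                for row in bucket if predicate(row)]
        return rows, list(range(len(self.amps)))


if __name__ == "__main__":
    system = ToySystem()
    for order_number, customer in [(7325, 2), (7324, 3), (7415, 1)]:
        system.insert(order_number, {"order": order_number, "customer": customer})
    print(system.select_by_pi(7324))                            # one AMP visited
    print(system.full_table_scan(lambda r: r["customer"] == 1))  # all AMPs visited
```

Each call returns the matching rows together with the list of AMPs it had to visit, which makes the cost difference between the two access paths visible.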
Row Distribution Using a UPI
Order table:
Order Number (PK, UPI)   Customer Number   Order Date   Order Status
7325                     2                 4/13         O
7324                     3                 4/13         O
7415                     1                 4/13         C
7103                     1                 4/10         O
7225                     2                 4/15         C
7384                     1                 4/12         C
7402                     3                 4/16         C
7188                     1                 4/13         C
7202                     2                 4/09         C
The PK column(s) will often be used as a UPI.
PI values for Order_Number are known to be unique (it’s a PK).
Teradata will distribute different index values evenly across AMPs.
Resulting row distribution among AMPs is uniform.
Rows as stored across AMP 1, AMP 2, AMP 3 and AMP 4 (Order Number, Customer Number, Order Date, Order Status):
7202 2 4/09 C
7402 3 4/16 C
7325 2 4/13 O
7225 2 4/15 C
7188 1 4/13 C
7384 1 4/12 C
7324 3 4/13 O
7103 1 4/10 O
7415 1 4/13 C
Row Distribution Using a NUPI
Order table:
Order Number (PK)   Customer Number (NUPI)   Order Date   Order Status
7325                2                        4/13         O
7324                3                        4/13         O
7415                1                        4/13         C
7103                1                        4/10         O
7225                2                        4/15         C
7384                1                        4/12         C
7402                3                        4/16         C
7188                1                        4/13         C
7202                2                        4/09         C
Customer_Number may be the preferred access column for the ORDER table, thus a good index candidate.
Values for Customer_Number are non-unique and therefore a NUPI.
Rows with the same PI value distribute to the same AMP, causing row distribution to be less uniform, or skewed.
Rows as stored across AMP 1 to AMP 4, grouped by Customer_Number (rows that share a PI value share an AMP, so three distinct values can occupy at most three of the four AMPs):
7415 1 4/13 C
7103 1 4/10 O
7384 1 4/12 C
7188 1 4/13 C

7225 2 4/15 C
7325 2 4/13 O
7202 2 4/09 C

7324 3 4/13 O
7402 3 4/16 C
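The contrast between a UPI and a NUPI can be tried out in Python with the nine Order rows from the slides. The md5-modulo placement and the helper names `amp_for` and `place` are assumptions for the sketch, so the AMP numbers it prints will not match the slide figures; only the pattern (rows sharing a PI value share an AMP) carries over.

```python
import hashlib

# (order_number, customer_number, order_date, order_status) from the Order table
ORDERS = [
    (7325, 2, "4/13", "O"), (7324, 3, "4/13", "O"), (7415, 1, "4/13", "C"),
    (7103, 1, "4/10", "O"), (7225, 2, "4/15", "C"), (7384, 1, "4/12", "C"),
    (7402, 3, "4/16", "C"), (7188, 1, "4/13", "C"), (7202, 2, "4/09", "C"),
]


def amp_for(value, num_amps=4):
    """Stand-in placement: hash the PI value, then take it modulo the AMP count."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def place(rows, pi_column, num_amps=4):
    """Group rows by the AMP chosen from the column at index pi_column."""
    amps = {amp: [] for amp in range(num_amps)}
    for row in rows:
        amps[amp_for(row[pi_column], num_amps)].append(row)
    return amps


if __name__ == "__main__":
    for label, column in (("UPI on order_number", 0), ("NUPI on customer_number", 1)):
        print(label)
        for amp, rows in place(ORDERS, column).items():
            print("  AMP", amp, [row[0] for row in rows])
```

With only three distinct customer numbers, the NUPI run can never use more than three AMPs, and the customer with four orders makes its AMP the busiest one.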
Secondary Indexes
Three general ways to access a table:
· Primary index access (one-AMP access)
· Secondary index access (two- or all-AMP access)
· Full Table Scan (all-AMP access)
A secondary index is an alternate path to the rows of a table. A table can have from 0 to 32 secondary indexes.
Secondary indexes:
• Do not affect table distribution.
• Add overhead, both in terms of disk space and maintenance.
• May be added or dropped dynamically as needed.
• Are chosen to improve table performance.
Unique Secondary Index (USI) Access
Create USI:
CREATE UNIQUE INDEX (cust) ON customer;
Access via USI:
SELECT *
FROM customer
WHERE cust = 56;
[Figure: USI access across AMP 1 to AMP 4. Each AMP holds a USI subtable (RowID, Cust, base-table RowID) and base-table rows (RowID, Cust, Name, Phone; USI on Cust, NUPI on Phone). The PE hashes USI value 56 for customer table ID 100 to row hash 602. The request crosses the BYNET to AMP 2, whose USI subtable entry 602,1 for Cust 56 points to base-table RowID 778,7. A second BYNET message (table ID 100, row hash 778, uniqueness value 7) reaches the AMP that stores that base-table row.]
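The two hops of a USI lookup can be modeled in Python as follows. Everything named here (`UsiDemoTable`, `row_hash`, the md5-based hash, the sample customers) is a hypothetical stand-in; the sketch assumes a base table whose primary index is the phone column and a USI on cust, and it only mimics the routing, not Teradata's storage format.

```python
import hashlib


def row_hash(value):
    """Stand-in 32-bit hash; not Teradata's hashing algorithm."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class UsiDemoTable:
    def __init__(self, num_amps=4):
        self.num_amps = num_amps
        self.base = [{} for _ in range(num_amps)]  # per AMP: RowID -> base row
        self.usi = [{} for _ in range(num_amps)]   # per AMP: cust -> base RowID

    def _amp(self, hash_value):
        return hash_value % self.num_amps

    def insert(self, cust, name, phone):
        # The index row lives on the AMP chosen by hashing the USI value.
        usi_amp = self._amp(row_hash(cust))
        if cust in self.usi[usi_amp]:
            raise ValueError(f"duplicate USI value {cust}")
        # The base row lives on the AMP chosen by hashing the primary index.
        pi_hash = row_hash(phone)
        base_amp = self._amp(pi_hash)
        uniqueness = 1 + sum(1 for h, _ in self.base[base_amp] if h == pi_hash)
        row_id = (pi_hash, uniqueness)  # RowID = row hash + uniqueness value
        self.base[base_amp][row_id] = {"cust": cust, "name": name, "phone": phone}
        self.usi[usi_amp][cust] = row_id

    def select_by_usi(self, cust):
        """Return (row or None, AMPs visited); at most two AMPs are involved."""
        usi_amp = self._amp(row_hash(cust))
        visited = [usi_amp]                  # hop 1: the USI subtable
        row_id = self.usi[usi_amp].get(cust)
        if row_id is None:
            return None, visited
        base_amp = self._amp(row_id[0])
        visited.append(base_amp)             # hop 2: the base table row
        return self.base[base_amp][row_id], visited


if __name__ == "__main__":
    table = UsiDemoTable()
    table.insert(56, "Smith", "555-7777")    # illustrative sample rows
    table.insert(77, "Jones", "777-6666")
    print(table.select_by_usi(56))
```

The subtable entry stores the base row's RowID, so the second hop is a direct fetch rather than a search. The two hops may happen to land on the same AMP, which is why the operation is described as touching at most two.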
Non-Unique Secondary Index (NUSI) Access
Create NUSI:
CREATE INDEX (name) ON customer;
Access via NUSI:
SELECT *
FROM customer
WHERE name = 'Adams';
[Figure: NUSI access across AMP 1 to AMP 4. Each AMP holds a NUSI subtable (RowID, Name, base-table RowID) covering only its own base-table rows (RowID, Cust, Name, Phone; NUSI on Name, NUPI on Phone). The PE hashes NUSI value 'Adams' for customer table ID 100 to row hash 567 and sends the request over the BYNET to every AMP. Each AMP searches its own NUSI subtable for that row hash and reads the matching base-table rows stored on the same AMP.]
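A NUSI lookup differs from a USI lookup in that each AMP indexes only its own rows, so the request has to go to all of them. The Python sketch below models that under stated assumptions: `NusiDemoTable`, `row_hash`, the md5-based placement, and the sample customers are all made up for illustration, with the primary index on phone and the NUSI on name.

```python
import hashlib


def row_hash(value):
    """Stand-in 32-bit hash; not Teradata's hashing algorithm."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class NusiDemoTable:
    def __init__(self, num_amps=4):
        self.base = [[] for _ in range(num_amps)]  # per AMP: list of base rows
        self.nusi = [{} for _ in range(num_amps)]  # per AMP: name -> local positions

    def insert(self, cust, name, phone):
        # The base row is placed by hashing the primary index (phone).
        amp = row_hash(phone) % len(self.base)
        self.base[amp].append({"cust": cust, "name": name, "phone": phone})
        # The NUSI entry is written on the same AMP as the row it points to.
        self.nusi[amp].setdefault(name, []).append(len(self.base[amp]) - 1)

    def select_by_nusi(self, name):
        """Return (rows, AMPs visited); every AMP checks its local subtable."""
        rows, visited = [], []
        for amp in range(len(self.base)):
            visited.append(amp)
            for position in self.nusi[amp].get(name, []):
                rows.append(self.base[amp][position])
        return rows, visited


if __name__ == "__main__":
    table = NusiDemoTable()
    table.insert(31, "Adams", "555-4444")    # illustrative sample rows
    table.insert(45, "Adams", "444-6666")
    table.insert(37, "Smith", "111-2222")
    print(table.select_by_nusi("Adams"))
```

The lookup is still cheaper than a full table scan on each AMP, because the local subtable points straight at the matching rows, but it is an all-AMP operation regardless of how many rows qualify.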
Comparison of Primary and Secondary Indexes
Index Feature                   Primary   Secondary
Required?                       Yes       No
Number per Table                1         0-32
Max Number of Columns           16        16
Unique or Non-Unique?           Both      Both
Affects Row Distribution        Yes       No
Created/Dropped Dynamically     No        Yes
Improves Access                 Yes       Yes
Multiple Data Types             Yes       Yes
Separate Physical Structure     None      Sub-table
Extra Processing Overhead       No        Yes
Primary Keys and Primary Indexes
Primary Key                                           Primary Index
Logical concept of data modeling                      Physical mechanism for access and storage
Teradata doesn’t need to recognize                    Each table must have exactly one
No limit on column numbers                            16-column limit
Documented in data model (optional in CREATE TABLE)   Defined in CREATE TABLE statement
Must be unique                                        May be unique or non-unique
Uniquely identifies each row                          Used to place and locate each row on an AMP
Values should not change                              Values may be changed (Del + Ins)
May not be NULL (requires a value)                    May be NULL
Does not imply an access path                         Defines most efficient access path
Chosen for logical correctness                        Chosen for physical performance
Indexes are conceptually different from keys:
· A PK is a relational modeling convention which uniquely identifies each row.
· A PI is a Teradata convention which determines how the rows are stored and accessed.
A significant percentage of tables may use the same columns for both the PK and PI.
A well-designed database will use a PI that is different from the PK for some tables.
http://bigclasses.com/teradata-online-training.html