Introduction to Teradata And How Teradata Works

14
1 Introduction to Teradata

Transcript of Introduction to Teradata And How Teradata Works

Page 1: Introduction to Teradata And How Teradata Works

1

Introduction to Teradata

Page 2: Introduction to Teradata And How Teradata Works

2

How Teradata Works

Page 3: Introduction to Teradata And How Teradata Works

3

How Does Teradata Store Rows?

Teradata uses hashing algorithm to randomly and evenly distribute data across all AMPs.

The rows of every table are distributed among all AMPs - and ideally will be evenly distributed among all AMPs.

Each AMP is responsible for a subset of the rows of each table.Evenly distributed tables result in evenly distributed workloads.The data is not placed in any particular order

The benefits of unordered data include:No maintenance needed to preserve order, and It is independent of any query being submitted.

The benefits of automatic data placement include:Distribution is the same regardless of data volumeDistribution is based on row content, not data demographics

Page 4: Introduction to Teradata And How Teradata Works

4

Primary Indexes· The mechanism used to assign a row to an

AMP· A table must have a Primary Index· The Primary Index cannot be changedUPI · If the index choice of column(s) is unique, we call

this a UPI (Unique Primary Index).· A UPI choice will result in even distribution of the

rows of the table across all AMPs.

NUPI · If the index choice of column(s) isn’t unique, we call this a NUPI (Non-Unique Primary Index).

· A NUPI choice will result in even distribution of the rows of the table proportional to the degree of uniqueness of the index.

UPI’s guarantee even data distribution and eliminate duplicate row checking.

Why would you choose an Index that is different from the Primary Key? • Join performance• Known access paths

Page 5: Introduction to Teradata And How Teradata Works

5

Data Storage based on Primary Index· The value of the Primary Index for a specific row determines its AMP

assignment.· This is done using the hashing algorithm.

PI Value

AMP AMP AMP

PE

Row assignmentRow access

Hashingalgorithm

Accessing the row by its Primary Index value is:· Always a one-AMP operation · The most efficient way to access a row

Page 6: Introduction to Teradata And How Teradata Works

6

Row Distribution Using a UPIOrder

NumberCustomerNumber

OrderDate

OrderStatus

PKUPI732573247415710372257384740271887202

231121312

4/134/134/134/104/154/124/164/134/09

OOCOCCCCC

The PK column(s) willoften be used as a UPI.PI values for Order_Number are known to be unique (it’s a PK).Teradata will distribute different index values evenly across AMPs.Resulting row distribution among AMPs is uniform.

AMP 1 AMP 2 AMP 3 AMP 4

7202 2 4/09 C

7402 3 4/16 C

7325 2 4/13 C

7225 2 4/15 C

7188 1 4/13 C

7384 1 4/12 C

7324 3 4/13 C7103 1 4/10 C7415 1 4/13 C

Order

Page 7: Introduction to Teradata And How Teradata Works

7

Row Distribution Using a NUPI

OrderNumber

CustomerNumber

OrderDate

OrderStatus

PKNUPI

732573247415710372257384740271887202

231121312

4/134/134/134/104/154/124/164/134/09

OOCOCCCCC

Order

Customer_Number may be the referred access column for ORDER table, thus a good index candidate.Values for Customer_Number are non-unique and therefore a NUPI.Rows with the same PI value distribute to the same AMP causing row distribution to be less uniform or skewed.

7225 2 4/15 C

7325 2 4/13 0

7415 1 4/13 C

7384 1 4/12 C

7324 3 4/13 0

7402 3 4/16 C7103 1 4/10 C

AMP 1 AMP 2 AMP 4

7202 2 4/09 C

7188 1 4/13 C

AMP 3

Page 8: Introduction to Teradata And How Teradata Works

8

Secondary IndexesThree general ways to access a table:· Primary index access (one-AMP access)· Secondary index access (two-or all-AMP access)· Full Table Scan (all-AMP access)

A secondary index is an alternate path to the rows of a table.A table can have from 0 to 32 secondary indexes.Secondary indexes:• Do not affect table distribution.• Add overhead, both in terms of disk space and maintenance.• May be added or dropped dynamically as needed.• Are chosen to improve table performance.

Page 9: Introduction to Teradata And How Teradata Works

9

Customer table Id =

100USI Value =

56

Table IDRow HashUSI Value100 602 56

Hashing Algorith

m

PE

CREATE UNIQUE INDEX (cust) on customer;

SELECT *FROM customerWHERE cust = 56;

Create USI

Access via USI

- * -

AMP 1 AMP 2 AMP 3 AMP 4

RowIDCustRowID RowIDCustRowID RowIDCustRowID RowIDCustRowID

BYNET

AMP 2

Table ID100

Row Hash778

Unique Val7

USI Subtable USI Subtable USI SubtableUSI Subtable

BYNET

AMP 1 AMP 3 AMP 4

74775127

884, 1639, 1915, 9388, 1

244, 1505, 1744, 4757, 1

8498

5649

536, 5555, 6

778, 7147, 1

296, 1135, 1

602, 1969, 1

31404595

638, 1640, 1471, 1778, 3

288, 1339, 1372, 2588, 1

175, 1 37 107, 1489, 1 72 717, 2838, 1 12 147, 2919, 1 62 822, 1

AdamsSmith

RiceWhite555-4444

111-2222222-3333

666-555531

37

40

84107, 1536, 5638, 1640, 1

RowIDCustNamePhoneNUPI

Base Table

USI

RowIDCustNamePhoneNUPI

Base Table

USI

Base Table Base Table

AdamsSmith

BrownAdams444-6666

666-7777555-6666

333-999972

45

74

98471, 1555, 6717, 2884, 1

JonesBlack

YoungSmith111-6666

222-8888444-5555

777-444427

49

62

12147, 1147, 2388, 1822, 1

RowIDCustNamePhoneNUPIU

SI

SmithMarsh

PetersJones777-6666

555-7777888-2222

555-777756

77

51

95639, 1778, 3778, 7915, 9

RowIDCustNamePhoneNUPIU

SI

Unique Secondary Index (USI) Access

Page 10: Introduction to Teradata And How Teradata Works

10

Non-Unique Secondary Index (NUSI) Access

Table ID100

Row Hash567

NUSI Value‘Adams’

Hashing Algorith

m

Customer table Id =

100

BYNET

AMP 2

NUSI Value = ‘Adams’

PE

CREATE INDEX (name) on customer;SELECT *FROM customerWHERE name = ‘Adams’;

Create NUSI

Access via

NUSI

AMP 1

BrownAdams

Smith555, 6

471, 1 717, 2

884, 1852, 1567, 2

432, 3

RowIDNameRowIDWhiteRiceAdamsSmith

107, 1536, 5638, 1640, 1

448, 1656, 1567, 3432, 8

RowID NameRowID

NUSI Subtable NUSI Subtable

SmithYoungJonesBlack

147, 1147, 2338, 1822, 1

432, 1770, 1567, 6448, 4

RowID NameRowID

NUSI Subtable

JonesPetersSmithMarsh

639, 1778, 3778, 7915, 9

262, 1396, 1432, 5155, 1

RowID NameRowID

NUSI Subtable

AMP 4AMP 3

AdamsSmith

RiceWhite555-4444

111-2222222-3333

666-555531

37

40

84107, 1536, 5638, 1640, 1

RowIDCustNamePhoneNUPI

Base Table

RowIDCustNamePhoneNUPI

Base Table Base Table Base Table

AdamsSmith

BrownAdams444-6666

666-7777555-6666

333-999972

45

74

98471, 1555, 6717, 2884, 1

JonesBlack

YoungSmith111-6666

222-8888444-5555

777-444427

49

62

12147, 1147, 2388, 1822, 1

RowIDCustNamePhoneNUPI

SmithMarsh

PetersJones777-6666

555-7777888-2222

555-777756

77

51

95639, 1778, 3778, 7915, 9

RowIDCustNamePhoneNUPINUSI NUSI NUSI NUSI

Page 11: Introduction to Teradata And How Teradata Works

11

Comparison of Primary and Secondary Indexes

Index Feature Primary Secondary

Required? Yes No

Number per Table 1 0-32

Max Number of Columns 16 16

Unique or Non-Unique? Both Both

Affects Row Distribution Yes No

Created/Dropped Dynamically No Yes

Improves Access Yes Yes

Multiple Data Types Yes Yes

Separate Physical Structure None Sub-table

Extra Processing Overhead No Yes

Page 12: Introduction to Teradata And How Teradata Works

12

Primary Keys and Primary Indexes

Primary Key

Logical concept of data modeling

Teradata doesn’t need to recognizeNo limit on column numbersDocumented in data model(Optional in CREATE TABLE)Must be uniqueUniquely identifies each row

Values should not changeMay not be NULL—requires a valueDoes not imply an access pathChosen for logical correctness

Primary Index

Physical mechanism for access andstorageEach table must have exactly one16-column limitDefined in CREATE TABLE statement

May be unique or non-uniqueUsed to place and locate each rowon an AMPValues may be changed (Del+ Ins)May be NULLDefines most efficient access pathChosen for physical performance

Indexes are conceptually different from keys:A PK is a relational modeling convention which uniquely identified each row.A PI is a Teradata convention which determines how the rows are stored and accessed.

A significant percentage of tables may use the same columns for both the PK and PI.A well-designed database will use a PI that is different from the PK for some tables.

Page 13: Introduction to Teradata And How Teradata Works

Learn Teradata Online Contact: USA: +1 732 325 1626 India: +91 800 811 4040Mail: [email protected]

/bigclasses /bigclasses /bigclasses

http://bigclasses.com/teradata-online-training.html

Page 14: Introduction to Teradata And How Teradata Works

Thank you

14

Watch Teradata DEMO Video On YouTube www.youtube.com/user/bigclassescom