B210 Sizing

download B210 Sizing

of 51

Transcript of B210 Sizing

  • 7/27/2019 B210 Sizing

    1/51

    Module 10: Sizing

    After completing this module, you will be able to:

    Determine column sizing requirements based on chosen

    data type.

    Determine physical table row size.

    Determine table sizing requirements via estimates and

    empirical evidence.

    Determine index sizing requirements via estimates andempirical evidence.

  • 7/27/2019 B210 Sizing

    2/51

    Row Format

    Only if Variable Length

    Columns are declared.

    R

    o

    w

    Le

    n

    g

    t

    h

    Row ID

    Row

    HashUniq.

    Value

    S

    pa

    r

    e

    P

    r

    e

    s

    e

    nc

    e

    B

    i

    t

    s

    VAR

    CHAR

    Offsets

    Fixed

    Length

    Columns

    Uncom-

    pressed

    Columns

    VAR

    CHAR

    Columns

    R

    e

    f.

    A

    rr

    a

    y

    P

    t

    r.

    2 4 4 1 2 2 2 2 0-n Bytes 2

    1 - n Bytes Data for Compressible

    Columns that are neither

    Compressed nor NULL.

    Part.

    #

    Only for a V2R5 Table

    with a PPI.

    SELECT * returns columns in the order declared in the CREATE TABLE.

  • 7/27/2019 B210 Sizing

    3/51

  • 7/27/2019 B210 Sizing

    4/51

    NULL and COMPRESS

    The default for all columns is NULLable and NOT compressible.

    NULL column values are NOT automatically compressed.

    COMPRESS compresses NULL column values to zero space.

    COMPRESS compresses NULL and occurrences of to

    zero space.

    CREATE TABLE Employee

    ( emp_num INTEGER NOT NULL,dept_num INTEGER COMPRESS,

    country CHAR(20) COMPRESS 'Australia',:

    COMPRESS all fixed length columns where at least 10% of the rows

    participate. COMPRESS creates smaller rows, therefore more rows/block and fewer

    blocks.

    You cannot use the COMPRESS option with:

    Primary Index

    Referenced and referencing data columns (PK-FK)

    VARCHAR, VARBYTE, and VARGRAPHIC data types CHAR, BYTE, and GRAPHIC data types > 255 characters

    How many bits are allocated?

    01

    2

  • 7/27/2019 B210 Sizing

    5/51

    NOT NULL Clause

    In relational systems, NULL means missing or unknown.

    The default for all columns is NULLable.

    Specify NOT NULL on the CREATE TABLE as appropriate:

    All PRIMARY KEY columns

    All UNIQUE columns

    All columns with NN (No NULLS) constraints

    Any index may contain NULLs unless explicitly prohibited.

    NOT NULL permits storage of zero and blank values.

  • 7/27/2019 B210 Sizing

    6/51

    Multi-Value Compression (V2R5 Feature)

    What is Multi-Value Compression?

    A V2R5 enhancement that allows up to 255 distinct values (plus NULL)to be

    compressed per fixed width column a.k.a., Value List Compression.

    Enhances system cost and performance of high data volume applications.

    Provides significant performance improvement for general ad-hocworkloads and full-table scan applications.

    Reduces storage cost by storing more logical data per unit of physical

    capacity. Performance is improved because there is less physical data to

    retrieve during scan-oriented queries.

    Best candidate for compression - fixed width column with a small number

    of frequently occurring values in large table used frequently for FTSs.

    Compression in V2R4 provided up to 25% capacity savings and scan

    performance increases up to 35%.

    V2R5 Compression is expected to yield even greater savings.

  • 7/27/2019 B210 Sizing

    7/51

    Rules for Compression

    Up to 255 values (plus NULL) can be compressed per column.

    The maximum size of a compress value remains 255 bytes.

    Only fixed-width columns can be compressed.

    Primary index columns cannot be compressed.

    ALTER TABLE does not support compression on existing columns.

    You can not add a compress value to a column if doing so would cause

    the table header row to exceed its maximum length (64 KB).

    Number of Presence Bits needed for Multi-Value Compression:

    Compressed Values # of Bits

    1 12 - 3 2

    4 - 7 3

    8 - 15 4

    16 - 31 5

    32 - 63 6

    64 - 127 7

    128 - 255 8

    Note:

    If column is "nullable",

    there will be 1 additional

    presence bit.

  • 7/27/2019 B210 Sizing

    8/51

    Implementing Multi-Value Compression

    ALTER TABLE accepts a list of values

    for field attribute 'COMPRESS'.

    Add a Education column.

    ALTER TABLE People

    ADD Education CHAR(10)COMPRESS

    ('ELEMENTARY',

    'MIDDLE',

    'HIGH',

    'COLLEGE',

    'POST GRAD') ;

    CREATE TABLE accepts a list of values

    for field attribute 'COMPRESS'.Compress the top 15 most populated

    countries. Compress the Sex field.

    CREATE TABLE People

    ( Name VARCHAR(100),Address VARCHAR(100),

    Country CHAR(100) COMPRESS

    ( 'Australia', 'Bangladesh',

    'Brazil', 'China', 'England',

    'France', 'Germany', 'India',

    'Indonesia', 'Japan', 'Mexico',

    'Nigeria', 'Pakistan',

    'Russian Federation',

    'United States of America'

    ),

    Sex CHAR(1) COMPRESS ('F', 'M')

    );

  • 7/27/2019 B210 Sizing

    9/51

  • 7/27/2019 B210 Sizing

    10/51

    V2R5 Compression

    Compression is not supported on the following data types:

    INTERVAL TIME TIMESTAMP

    VARCHAR VARBYTE VARGRAPHIC

    Compression is supported on the following data types:

    DATE (4) CHAR (N) (N

  • 7/27/2019 B210 Sizing

    11/51

    ANSI and Teradata Data Types

    Teradata ANSI (Core) equivalent

    INTEGER INTEGER

    SMALLINT SMALLINT

    BYTEINT

    DATE

    TIME, TIME WITH TIME ZONE

    TIMESTAMP, TIMESTAMP WITH TIME ZONE

    FLOAT FLOAT

    FLOAT REAL

    FLOAT DOUBLE PRECISION

    DECIMAL(n,m) DECIMAL(n,m)

    DECIMAL(n,m) NUMERIC(n,m)

    CHAR(n) CHAR(n)

    VARCHAR(n), CHAR VARYING(n)

    LONG VARCHAR

    BYTE(n), VARBYTE(n)

    GRAPHIC(n)VARGRAPHIC(n), LONG VARGRAPHIC

  • 7/27/2019 B210 Sizing

    12/51

    Data Types

    -128 to +127 Non-ANSI

    S

    I

    G

    N

    BYTEINT

    -32,768 to +32,767SMALLINT

    S

    I

    G

    N

    -2,147,483,648 to +2,147,483,647INTEGER

    S

    I

    G

    N

  • 7/27/2019 B210 Sizing

    13/51

    Data Types (Date and Time)

    ((YYYY - 1900)) * 10000 + (MM * 100) + DDDATE (4 Bytes)

    TIME (6 Bytes) hh:mm:ss.ssssss

    TIME WITH

    TIME ZONE (8 Bytes) hh:mm:ss.ssssss +/- hh.mm

    TIMESTAMP (10 Bytes) Date + Time

    TIMESTAMP WITH

    TIME ZONE (12 Bytes)

    Date + Time + Zone

    DATE, TIME, and TIMESTAMP are also SQL functions.

    CURRENT_DATE, CURRENT_TIME, and CURRENT_TIMESTAMP represent

    values.

  • 7/27/2019 B210 Sizing

    14/51

    Data Types (Decimal, Numeric, Float, Real,

    Double Precision)

    Decimal and Numeric

    DECIMAL [ ( n [ , m ] ) ]

    n = 1 - 18

    m = 0 - n

    Default = (5, 0)

    Stored in scaled binary

    Number of Number of

    Digits Bytes

    1 to 2 1 byte

    3 to 4 2 bytes

    5 to 9 4 bytes

    10 to 18 8 bytes

    FLOAT, REAL, and DOUBLE PRECISION

    Notes:

    Range is 2 * 10 -307 to 2 * 10 +308

    15 significant decimal digit precision.

    Manipulated in IEEE floating point format.

    Corresponds to, but is not identical to, IBM

    normalized 64 bit floating point.

    S

    I

    G

    N

    Exponent

    11 Bits

    Fraction/Mantissa

    52 Bits

    8 Bytes

  • 7/27/2019 B210 Sizing

    15/51

  • 7/27/2019 B210 Sizing

    16/51

    Data Types (BYTE and VARBYTE)

    BYTE

    Stored in host format.

    Never translated by the Teradata Database

    Handled as if they were n-byte, unsigned binary

    integers.

    Suitable for digitized image information

    VARBYTE ( n ) n = 1 - 64000

    Variable length binary string

    2 byte column offset identifies location in row

    2

    BYTE ( n ) n = 1 - 64000

    Fixed length binary string

  • 7/27/2019 B210 Sizing

    17/51

    Data Types (GRAPHIC, VARGRAPHIC,

    LONG VARGRAPHIC)

    2 bytes per character.

    Used for double-byte Kanji and Hiragana, and

    Chinese double-byte Hanzi values.

    VARGRAPHIC

    GRAPHIC ( n ) n = 1 - 32000Fixed length multi-byte character string;

    n is the length in logical characters.

    VARGRAPHIC ( n ) n = 1 - 32000

    Variable length multi-byte character

    string; n is the length in logicalcharacters.

    2 byte column offset identifies location in row

    2

    2LONG VARGRAPHIC

    Equivalent to VARGRAPHIC (32000)

    2 byte column offset identifies location in row

  • 7/27/2019 B210 Sizing

    18/51

    Variable Column Offsets

    50 75 75 100 c1 data c3 data

    c1 c2 c3 x 5 7 10 5 0

    0

    Offset Array Variable Length Data

    Offset values tell the starting location of a variable column.

    Determine the column length by subtracting its starting location

    from the next columns starting location.

    The definition of variable length columns requires one additional 2-byte offset that locates the end of the final variable column.

  • 7/27/2019 B210 Sizing

    19/51

    Sizing Considerations

    Compress only columns where at least 10% to 20% of the rows participate.

    COMPRESS will create smaller rows, and smaller rows are generally more

    efficient.

    Compress columns whose NULL values are not subject to changes.

    Compression saves space but costs computational overhead.

    Adding a column that is not compressible expands all rows.

    Adding a column that is compressible and there are no spare presence bits

    expands all rows.

    Dropping a column changes all row sizes where data is present.

    Use VARCHAR whenever the space savings offset the overhead.

  • 7/27/2019 B210 Sizing

    20/51

  • 7/27/2019 B210 Sizing

    21/51

    Example: Sizing a Row

    SUPV JOB LAST FIRST HIRE BIRTH SALARY

    EMP # EMP # DEPT # CODE NAME NAME DATE DATE AMOUNT

    PK,SA FK FK FK NN NN NN NN NN

    INT INT INT SMALL Char Char DATE DATE DECINT Fix 20 Var 30 (10,2)

    EMPLOYEE

    Using this logical row layout, the next page will size a typical row of the Employee table.

  • 7/27/2019 B210 Sizing

    22/51

    Example: Completing the Row Size Calculation Form

    Data Type # of Columns Size TOTAL

    BYTEINT * 1 =SMALLINT 1 * 2 = 2

    INTEGER 3 * 4 = 12

    DATE 2 * 4 = 8

    TIME(ANSI) * 6 =

    TIME with ZONE * 8 =

    TIMESTAMP * 10 =

    TIMESTAMP/ ZONE * 12 =DECIMAL 1-2 * 1 =

    3-4 * 2 =

    5-9 * 4 =

    10-18 1 * 8 = 8

    FLOAT * 8 =

    Fixed 1 SUM(n) = 20

    Variable 1 SUM(a) = 14LOGICAL SIZE = 64

    Overhead = 14

    Partitioned Primary Index Overhead (2) = 0

    Variable Column Offsets(_1_ * 2 ) + 2 ; zero if no variable columns = 4

    ___0__ Bits for Compressible Columns

    ___3__ Nullable Columns

    ___3__ / 8 (Quotient only) = 0PHYSICAL ROW SIZE = 82

    Table Name EMPLOYEE

    Column Name Type Max Average

    First Name CV 30 14

    SUM(a) = 14

    Variable Column Data Detail

    SUM(a) = SUM of the AVERAGE number of bytes

    expected for the variable column.

    SUM(n) = SUM of the CHAR and GRAPHIC column

    bytes.

    ** For V2, round up to an even number of bytes.

  • 7/27/2019 B210 Sizing

    23/51

    Row Size Calculation Form

    Data Type # of Columns Size TOTAL

    BYTEINT * 1 =SMALLINT * 2 =

    INTEGER * 4 =

    DATE * 4 =

    TIME(ANSI) * 6 =

    TIME with ZONE * 8 =

    TIMESTAMP * 10 =

    TIMESTAMP/ ZONE * 12 =DECIMAL 1-2 * 1 =

    3-4 * 2 =

    5-9 * 4 =

    10-18 * 8 =

    FLOAT * 8 =

    Fixed SUM(n) =

    Variable SUM(a) =

    LOGICAL SIZE =

    Overhead = 14

    Partitioned Primary Index Overhead (2) =

    Variable Column Offsets(__ * 2 ) + 2 ; zero if no variable columns =

    _____ Bits for Compressible Columns

    _____ Nullable Columns

    _____ / 8 (Quotient only) =PHYSICAL ROW SIZE =

    Table Name ____________________

    Column Name Type Max Average

    SUM(a) =

    Variable Column Data Detail

    SUM(a) = SUM of the AVERAGE number of bytes

    expected for the variable column.

    SUM(n) = SUM of the CHAR and GRAPHIC column

    bytes.

    ** For V2, round up to an even number of bytes.

  • 7/27/2019 B210 Sizing

    24/51

    Row Size Exercise

    CALL

    CALL

    NUMBER

    PK, SA

    TAKEN BY

    EMPLOYEE

    NUMBER

    CUSTOMER

    NUMBER

    FK, NN FK

    PLACED BY

    CONTACT

    NUMBER

    FK

    PLACED BY

    EMPLOYEE

    NUMBER

    FK

    ORIGINAL

    CALL

    NUMBER

    FK

    UPI NUSI

    CALL

    DATE

    NN

    NUSI

    SYSTEM

    CATEGORY

    PART

    NUMBER

    CALL

    TIME

    NN

    CALL

    STATUS

    CODE

    FK, NN

    CALL

    TYPE

    CODE

    FK, NN

    CALL

    PRIORITY

    CODE

    FK, NN

    NUSINUSI

    FK FK

    AREA

    CODE PHONE EXTENSION

    100 100

    DOMAIN

    NAME

    Call_Number

    Employee_Number

    Customer_Number

    Contact_Number

    Date

    Time

    Call_Status_Code

    Call_Type_Code

    Call_Priority_Code

    Area_Code

    Phone

    Extension

    System_Number

    Part_Category

    MAX

    BYTES

    4

    4

    4

    4

    4

    6

    2

    2

    2

    2

    4

    4

    4

    4

    DATA

    TYPE

    INT

    INT

    INT

    INT

    D

    TIME

    CF

    SMALL INT

    INT

    INT

    INT

    INT

    SMALL INT

    SMALL INT

    Example ofConstraint Number

    This table will be partitioned via RANGE_N on Call_Date with Monthly intervals.

  • 7/27/2019 B210 Sizing

    25/51

    Row Size Calculation Form

    Data Type # of Columns Size TOTAL

    BYTEINT * 1 =SMALLINT * 2 =

    INTEGER * 4 =

    DATE * 4 =

    TIME(ANSI) * 6 =

    TIME with ZONE * 8 =

    TIMESTAMP * 10 =

    TIMESTAMP/ ZONE * 12 =DECIMAL 1-2 * 1 =

    3-4 * 2 =

    5-9 * 4 =

    10-18 * 8 =

    FLOAT * 8 =

    Fixed SUM(n) =

    Variable SUM(a) =

    LOGICAL SIZE =

    Overhead = 14

    Partitioned Primary Index Overhead (2) =

    Variable Column Offsets(___ * 2 ) + 2 ; zero if no variable columns =

    ______ Bits for Compressible Columns

    ______ Nullable Columns

    ______ / 8 (Quotient only) =PHYSICAL ROW SIZE =

    Table Name CALL

    Column Name Type Max Average

    SUM(a) =

    Variable Column Data Detail

    SUM(a) = SUM of the AVERAGE number of bytes

    expected for the variable column.

    SUM(n) = SUM of the CHAR and GRAPHIC column

    bytes.

    ** For V2, round up to an even number of bytes.

    3 6

    10 40

    1 4

    1 6

    1 2

    0

    58

    2

    9

    9 +175 76

  • 7/27/2019 B210 Sizing

    26/51

  • 7/27/2019 B210 Sizing

    27/51

    Table Headers

    One row per AMP per Table.

    Table headers are a separate subtable.

    Minimum table header block size is 512

    bytes (1 sector) per AMP.

    Typically, a table header will be at least

    1024 bytes. For example:

    Tables with 4 or more columns

    Tables with 2 columns and a NUSI

    Compressed values are maintained

    within the table header.

    Multiple values that are compressedfor a column are maintained in an

    array in the table header.

    Maximum value of approximately 64K.

    The base table header covers all of its

    secondary index subtables.

    STANDARD ROW HEADERLENGTH, ROW ID, PRESENCE/SPARE BYTES

    FIELD 2 OFFSETFIELD 3 OFFSET

    FIELD 4 OFFSET

    FIELD 5 OFFSET

    FIELD 6 OFFSET

    FIELD 7 OFFSET

    FIELD 8 OFFSET

    FIELD 9 OFFSET

    EXTRA OFFSET

    DATABASE AND TABLENAMES

    DATABASE ID

    OTHER INTERNAL INFOCREATION DATE, PROTECTION, TYPE OFJOURNALING, JOURNAL ID, STRUCTVERSION, etc.

    FIELD

    1

    INDEX DESCRIPTORS(36 BYTES * # INDEXES) PLUS 20 BYTESPER INDEX COLUMN

    F2

    ALWAYS NULL 3

    FASTLOAD & RESTOREINFORMATION

    USUALLY NULL

    F4

    BASE COLUMN INFOCOUNT OF COLUMNS, LOCATION OF FIRSTFIXED FIELD, NUMBER OF PRESENCE BITS,etc. 24 BYTES

    COLUMN INFORMATION FOREACH COLUMN

    20 BYTES PER COLUMN(+ COMPRESS VALUE)DATA TYPE, OFFSET WITHIN ROWNULLIBLE/NOT NULLIBLE, COMPRESS/NOCOMPRESS, PRESENCE BIT LOCATION, etc.

    FIEL

    D

    5

    RESTARTABLE SORTINFORMATION

    USUALLY NULL

    F

    6

    ALWAYS NULL 7

    ALWAYS NULL 8

    ALWAYS NULL 9

    ROW LENGTH orREF. ARRAY POINTER

    V2R4 Example of Table Header

  • 7/27/2019 B210 Sizing

    28/51

    Sizing a Data Table

    Block sizes vary within a table, so compute a range.

    Typically, the maximum block size is 63.5 KB bytes, and a typical block size is 48 KB.

    Formula:

    (BlockSize - 38) / RowSize = RowsPerBlock (rounded down)

    RowCount / RowsPerBlock = Blocks (rounded up)

    NumAmps * 1024 = Header(Blocks * BlockSize) + Header = NO FALLBACK (BlockSize = Typical Block Size)

    (Blocks * BlockSize) * 2 + Header = FALLBACK (BlockSize = Typical Block Size)

    Parameters:

    38 = Block Header + Block Trailer BlockSize = Typical block size in bytes

    1024 = Typical table header size NumAmps = Number of AMPs in the systemRowCount = Number of table rows expected

    RowSize = Physical row size

    Note:

    For large tables, table headers and block overhead (38 bytes) add a minimal

    amount of size to the table. Therefore, multiply row size by number of rows

    and double for Fallback.

  • 7/27/2019 B210 Sizing

    29/51

    Table Sizing Exercise

    Given this data, estimate the size of a table with Fallback and a typical block size of 48K.

    BlockSize = 49,152 bytes (48K) NumAmps = 20

    RowCount = 501,000,000

    RowSize = 98 bytes (includes overhead)

    Formula: (BlockSize - 38) / RowSize = RowsPerBlock (round down)

    RowCount / RowsPerBlock = Blocks (round up)

    NumAmps * 1024 = Header

    (Blocks * BlockSize) + Header = No Fallback

    (Blocks * BlockSize) * 2 + Header = Fallback

    Calculation:

    (49,152 - 38) / 98 = 501 rows per block501,000,000 / 501 = 1,000,000 blocks

    20 * 1024 = 20,480 for table headers

    (1,000,000 * 49,152) + 20,480 = 49,152,020,480 (No Fallback)

    (1,000,000 * 49,152) * 2 + 20,480 = 98,304,020,480 (Fallback)

    An easier way to estimate this table size:501,000,000 x 98 bytes x 2 (Fallback) = 98,196,000,000

  • 7/27/2019 B210 Sizing

    30/51

    Estimating the Size of a USI Subtable

    Spare &

    Presence

    Row

    Length

    Row ID of USI

    Row

    Hash

    Uniq.

    Value

    Secondary

    IndexValue

    Base Table RowIdentifier Ref.

    ArrayPointer

    2 4 4 2 7 Variable 2 (opt.) 4 4 2Bytes

    Row Offsets

    & Misc. (>=7)

    Row

    Hash

    Uniq.

    ValuePart.

    #

    There is one Index row for each Base Table row.USI subtable row size = (Index value size + 29 or 31)

    Where 31 = 4 (Row Header and Row Ref. Array pointer)

    + 8 (This row's Row ID)

    + 9 (Spare, presence, and offset bytes)

    + 2 (Optional Partition # for PPI tables)

    + 8 (Base table Row ID)

    To estimate the amount of space needed for a USI subtable, you can use the

    following formulas.

    For tables with NPPI, USI Subtable Size = (Row count) * (index value size + 29)

    For tables with PPI, USI Subtable Size = (Row count) * (index value size + 31)

    Note: Double this figure for Fallback.

  • 7/27/2019 B210 Sizing

    31/51

    Estimating the Size of a NUSI Subtable

    Row Offsets

    & Misc. (>=7)

    Spare &

    Presence

    Row

    Length

    Row ID of NUSI

    RowHash

    Uniq.Value

    Secondary

    Index

    Value

    Ref.

    Array

    Pointer

    2 4 4 2 7 Variable 8/10 8/10 2Bytes

    Table Row ID List

    P RH U

    2 4 4

    P RH U

    2 4 4

    There is at least one index row per AMP for each distinct index value that is in the base table

    on that AMP.

    To estimate the size of a NUSI subtable,

    Size = (Row count) * 8 (or 10 for PPI tables)

    + ( (#distinct values) * (Index value size + 21) * MIN ( #AMPs, Rows per value ) )

    MIN( ___ , ___ ) use the smaller of the two values.

    Double this figure for Fallback.

    More AMPs than typical rows/value:

    (10 AMPS, 5 rows/value)

    NOT Every AMP has every value.

    NOT Every AMP has a subtable row for every value.

    More strongly selective. Fewer rows returned from an equality search.

    Example:

    More typical rows/value than AMPS:

    (50 rows/value, 10 AMPS)

    Every AMP probably has every value.

    Every AMP has a subtable row for every value.

    More weakly selective. More rows returned from an equality search.

  • 7/27/2019 B210 Sizing

    32/51

    Estimating the Size

    of a Reference Index Subtable

    Row OffsetsOverhead

    Row

    Length

    Row ID of RI

    Row

    Hash

    Uniq.

    Value

    Foreign

    KeyValue

    Ref.

    ArrayPointer

    2 4 4 2 6 1 0+ Variable 4 2Bytes

    Valid

    Flag Count

    Optional Presence and

    variable length indicators

    There is one reference index row for each distinct foreign key value.

    RI subtable row size = (Index value size + 25)

    Where 25 = 4 (Row Header and Row Ref. Array pointer)

    + 8 (This row's Row ID)

    + 8 (Overhead and row offset bytes)

    + 1 (Validity flag)

    + 4 (Count)

    To estimate the size of a Reference Index (RI) subtable, you can use the

    following formula.

    RI Subtable Size = (Distinct count) * (index size + 25)

    Double this figure for Fallback.

  • 7/27/2019 B210 Sizing

    33/51

    Index Sizing Exercise

    A 1,000,000 row Fallback table, on a 20-AMP system, has an Integer USI and a

    50 Row per Value CHAR(29) NUSI with 20,000 distinct values.

    Assume the table is partitioned. Estimate the space for each secondary index.

    Formulas:

    USI Size = Row Count * (IndexValueSize + 31)

    NUSI Size = (Row Count * 10)

    +

    (#distinct values) * (Index value size + 21) * MIN ( #AMPs , Rows per value )

    1,000,000 * (4 + 31 + 1) * 2 = 72,000,000

    (10,000,000 + (20,000 * (29 + 21) * 20)) * 2 = 60,000,000

    Note: +1 - Rows are allocated on even offsets within the data block.

  • 7/27/2019 B210 Sizing

    34/51

    Empirical Sizing

    The best way to size a production table, including indexes is:

    1. Load a known percentage of rows onto the system.

    2. Query the DD through the view DBC.TableSize.

    3. Create one index.

    4. Query the DD through the view DBC.TableSize.

    5. Repeat steps 3 and 4 as necessary.

    6. Multiply the results to determine the production size.

    Example:

    Step 1 Load 1% of a table onto a system.

    Step 2 SELECT SUM(CurrentPerm) FROM DBC.Tablesize

    WHERE DatabaseName = DATABASE

    AND TableName = 'Daily_Sales' ;

    Sum(CurrentPerm)

    671,744Step 3 CREATE INDEX (sales_date) ON Daily_Sales;

    Step 4 SELECT SUM(CurrentPerm) FROM DBC.Tablesize

    WHERE DatabaseName = DATABASE

    AND TableName = 'Daily_Sales' ;

    Sum(CurrentPerm)914,944

    Therefore, index size is:

    914,944

    671,744243,200

    Note: The same query

    without the SUM

    keyword returns

    per/AMP figures which

    reveal distribution

    efficiency.

  • 7/27/2019 B210 Sizing

    35/51

    Spool Space

    Maximum spool space needs vary with table size, use (type of application),

    and frequency of use. Large systems use more spool to duplicate tables on each AMP.

    Cylinders not currently used for data may be used for spool.

    The user's spool amount may be changed dynamically.

    Avoid unnecessary copying or redistribution of entire tables to spool.

    As user con currency and/or SQL complexi ty inc reases, add mo re SPOOL.

    Running out of Spool Space

    If a user exceeds their Spool space limit, they will receive the following errormessage.

    2646 No more spool space in username

    If the AMP runs out of Spool space (insufficient available cylinders for Spool),

    the following message will be displayed.

    2507 - Out of spool space on disk

  • 7/27/2019 B210 Sizing

    36/51

    Release of Spool

    Intermediate Spool

    Intermediate Spool results are held until the (LastUse) Explain step.

    Output Spool

    Output Spool results are held until:

    Last spool Response - BTEQ

    CLOSE cursor - PreProcessor

    ERQ, Terminate function - CLI

    Session ends (Job Abort, timeout, logoff, etc.)

    System is restarted

    System Restart - each AMP rebuilds its Master Index from its Cylinder

    Indexes.

    The AMPs delete all spool files by moving them to the Free Cylinder List.

    This costs only one I/O per spool cylinder, and saves maintaining the

    Master Index on disk.

  • 7/27/2019 B210 Sizing

    37/51

    System Sizing Exercise

    Some general guidelines for estimating system size:

    20% (or more) of total space for Spool5% of total space for PJs/development/staging

    5% of total space for DBC & Transient Journal

    20 - 50% of data size for indexes

    If not using Fallback, multiply the amount of raw data by a factor of 2 or 3.

    If using Fallback, multiply the amount of raw data by a factor of 4 or 5.

    Example:

    User raw data 400 GB

    Estimate of Vdisk space needed 1600 - 2000 GB

    Proof:

    Estimate of Vdisk space 1600 - 2000 GB- Spool (20%) - 400 GB

    - PJs/development/staging (5%) - 100 GB

    - DBC & Transient Journal (5%) - 100 GB

    1000 - 1400 GB

    User raw data 400 GB

    20 - 50% for indexes 80 - 200 GBFallback (for data and indexes) 960 - 1200 GB

    A 4 node system, each

    with 7 AMPs, and each

    AMP with 72 GB of Vdiskspace would meet this

    requirement.

    4 x 7 x 72 GB = 2016 GB

    of available space

  • 7/27/2019 B210 Sizing

    38/51

    Sizing Summary

    Accurate row counts and sizes are needed to get good space estimates.

    Database sizing includes:

    Tables + Fallback +

    Secondary Indexes + Fallback +

    Reference Indexes + Fallback +

    Join Indexes + Fallback +

    Hash Indexes + Fallback +

    Permanent Journal (dual or single) +

    Stored Procedure space +Spool space +

    Temporary space

  • 7/27/2019 B210 Sizing

    39/51

    Review Questions

    1. Which of the following can be used with the COMPRESS option?

    a. Referencing columns - Foreign Key

    b. Referenced column - Primary Key as a USIc. Unique Primary Index

    d. Non-unique Secondary Index

    2. Which section of a row identifies the starting location of variable length data column data and is

    present only if variable length columns are declared?

    a. Uncompressed Columns

    b. VARCHAR Columns

    c. Presence Bits

    d. Column Offsets

    3. How can you override the default that a column with a NULL value will require row space?

    a. Use the COMPRESS option on the column as part of the CREATE TABLE statement.

    b. When creating the user, set the default so that columns will default to COMPRESS when creating

    a table.

    c. Use the NOT NULL option on the column as part of the CREATE TABLE statementd. Use the DEFAULT NULL option on the column as part of the CREATE TABLE statement.

    4. What is the minimum space the table headers will take for a 6-column table on a 10 AMP system?

    a. 10240 bytes

    b. 4096 bytes

    c. 5120 bytes

    d. 1024 bytes

    5. What DD view can you query to get sizing information about tables? _____________________

  • 7/27/2019 B210 Sizing

    40/51

    Module 10: Review Question Answers

    1. Which of the following can be used with the COMPRESS option?

    a. Referencing columns - Foreign Key

    b. Referenced column - Primary Key as a USIc. Unique Primary Index

    d. Non-unique Secondary Index

    2. Which section of a row identifies the starting location of variable length data column data and is

    present only if variable length columns are declared?

    a. Uncompressed Columns

    b. VARCHAR Columns

    c. Presence Bits

    d. Column Offsets

    3. How can you override the default that a column with a NULL value will require row space?

    a. Use the COMPRESS option on the column as part of the CREATE TABLE statement.

    b. When creating the user, set the default so that columns will default to COMPRESS when creating

    a table.

    c. Use the NOT NULL option on the column as part of the CREATE TABLE statementd. Use the DEFAULT NULL option on the column as part of the CREATE TABLE statement.

    4. What is the minimum space the table headers will take for a 6-column table on a 10 AMP system?

    a. 10240 bytes

    b. 4096 bytes

    c. 5120 bytes

    d. 1024 bytes

    5. What DD view can you query to get sizing information about tables? DBC.Tablesize

  • 7/27/2019 B210 Sizing

    41/51

    Lab Exercises

    Lab Exercise 10-1

    PurposeIn this lab, you will compress multiple values for a column in order to reduce Perm space.

    What you need

    Populated AU.Accounts table and an empty table in your database

    Tasks

    1. Populate your Accounts table from the AU.Accounts table using the INSERT/SELECT statement:

    INSERT INTO Accounts SELECT * FROM AU.Accounts;

    Using the DBC.TableSize view, what is the amount of Perm space used. Accounts =___________

    2. Create a new table, named "Accounts_MVC", based on the Accounts table except compress the

    following city names:

    Culver City, Hermosa Beach, Los Angeles, and Santa Monica

    Populate your Accounts_MVC table from the AU.Accounts table using INSERT/SELECT.

    Using the DBC.TableSize view, what is the amount of Perm space used. Accounts_MVC =___________

  • 7/27/2019 B210 Sizing

    42/51

    Lab Exercises

    Lab Exercise 10-2

    PurposeIn this lab, you will use populate tables, determine tables sizes, and create secondary indexes.

    What you need

    Populated AU.Trans table and an empty table in your database

    Tasks

    1. Determine the size of your empty Trans table using the DBC.TableSize view (SELECT with and withoutthe SUM aggregate function).

    Size of empty Trans = _______________

    What size are the table headers on each AMP? _______________

    2. Using SHOW TABLE, the Row Size Calculation form and the Sizing a Data Table Formula, estimate

    the size of this table; assume 15,000 rows.

    Estimated size of Trans = _______________

    3. Populate your Trans table from the AU.Trans table using the following INSERT/SELECT statement:

    INSERT INTO Trans SELECT * FROM AU.TRANS;

    Use the SELECT COUNT(*) function to verify the number of rows. ___________

  • 7/27/2019 B210 Sizing

    43/51

    Lab Exercises

    Lab Exercise 10-2 (cont.)

    Tasks

    4. Using the DBC.TableSize view, determine the actual size of the Trans table by using the SUM aggregate

    function.

    Size of populated Trans = _______________

    5. Create a USI on the Trans_Number column.

    Estimate the size of the USI = _______________

    Actual size of the USI = _______________ (use the empirical sizing technique)

    6. Create a NUSI on the Trans_ID column.

    Estimate the size of the NUSI = ______________ (Hint: use DISTINCT function)Actual size of the NUS I= ______________ (use the empirical sizing technique)

  • 7/27/2019 B210 Sizing

    44/51

    Lab Exercises

    Lab Exercise 10-3

    PurposeIn this lab, you will determine tables sizes and establish referential integrity between two tables.

    What you need

    Populated PD tables and empty tables in your database

    Tasks

    1. Populate your Employee and Emp_Phone tables from the PD.Employee and PD.Emp_Phone tablesusing the following INSERT/SELECT statements.

    INSERT INTO Employee SELECT * FROM PD.Employee;

    INSERT INTO Emp_Phone SELECT * FROM PD.Emp_Phone;

    2. Using the DBC.TableSize view, determine the actual size of the Emp_Phone table by using the SUM

    aggregate function.

    Size of populated Emp_Phone table = _______________

  • 7/27/2019 B210 Sizing

    45/51

    Lab Exercises

    Lab Exercise 10-3 (cont.)

    Tasks

    3. The Foreign key is Employee_Number in PD.Emp_Phone and the Primary Key is the

    Employee_Number in PD.Employee.

    Create a References constraint on Employee_Number using the following SQL statements.

    ALTER TABLE Emp_Phone ADD CONSTRAINT fk1

    FOREIGN KEY (Employee_Number)

    REFERENCES Employee (Employee_Number);

    (use the HELP CONSTRAINT Emp_Phone.fk1; to view constraint information.

    4. Using the DBC.TableSize view, determine the actual size of the Emp_Phone table by using the SUM

    aggregate function.

    Estimate the size of the Reference Index = _______________

    Size of populated Emp_Phone with references index = _______________

    Size of references index = _______________

    5. Drop the Foreign Key constraint by executing the following SQL command.

    ALTER TABLE Emp_Phone DROP CONSTRAINT fk1;

  • 7/27/2019 B210 Sizing

    46/51

    Lab Solutions for Lab 10-1

    Lab Exercise 10-1

    1. Populate your Accounts table from the AU.Accounts table using the INSERT/SELECT statement:

    INSERT INTO Accounts SELECT * FROM AU.Accounts;

    Using the DBC.TableSize view, what is the amount of Perm space used. Accounts = 1,804,288

    2. Create a new table, named "Accounts_MVC", based on the Accounts table except compress the

    following city names: Culver City, Hermosa Beach, Los Angeles, and Santa Monica

    CREATE SET TABLE Accounts_MVC, FALLBACK,

    NO BEFORE JOURNAL, NO AFTER JOURNAL

    (ACCOUNT_NUMBER INTEGER NOT NULL,

    NUMBER INTEGER,

    STREET CHAR(25),

    CITY CHAR(20) COMPRESS ('Hermosa Beach', 'Culver City', 'Los Angeles','Santa Monica'),

    STATE CHAR(2),ZIP_CODE INTEGER,

    BALANCE_FORWARD DECIMAL(10,2),

    BALANCE_CURRENT DECIMAL(10,2))

    PRIMARY INDEX ( ACCOUNT_NUMBER );

    Populate your Accounts_MVC table from the AU.Accounts table using INSERT/SELECT.

    Using the DBC.TableSize view, what is the amount of Perm space used. Accounts_MVC = 1,404,828

  • 7/27/2019 B210 Sizing

    47/51

    Lab Solutions for Lab 10-2

    Lab Exercise 10-2

    1. Determine the size of your empty Trans table using the DBC.Tablesize view (SELECT with and without the SUMaggregate function).

    SELECT SUM(CurrentPerm) FROM DBC.Tablesize

    WHERE DatabaseName = DATABASE AND TableName = 'Trans' ;

    Sum(CurrentPerm)

    8192 Size of empty Trans = 8192 (Captured from an 8 AMP system)

    What size are the table headers on each AMP? 1024

    2. Using SHOW TABLE, the Row Size Calculation form and the Sizing a Data Table Formula, estimate the size of this

    table; assume 15,000 rows.

    Each row is 24 bytes long plus 14 bytes for overhead = 38 bytes

    38 x 15,000 = 570,000 x 2 (Fallback) = 1,140,000 bytes approx.

    Estimated size of Trans = 1,140,000

    3. Populate your Trans table from the AU.Trans table using the following INSERT/SELECT statement:

    INSERT INTO Trans SELECT * FROM AU.TRANS;

    Use the SELECT COUNT(*) function to verify the number of rows. 15,000

    SELECT COUNT(*) FROM Trans; Count(*)15000

  • 7/27/2019 B210 Sizing

    48/51

    Lab Solutions for Lab 10-2 (cont.)

    Lab Exercise 10-2 (cont.)4. Using the DBC.Tablesize view, determine the actual size of the Trans table by using the SUM aggregate function.

    Size of populated Trans = 1,153,024

    SELECT SUM(CurrentPerm) FROM DBC.Tablesize

    WHERE DatabaseName = DATABASE

    AND TableName = 'Trans' ;

    Sum(CurrentPerm)

    1153024 (Estimated size was 1,140,000)

    5. Create a USI on the Trans_Number column.

    CREATE UNIQUE INDEX (Trans_Number) on Trans;

    Estimate the size of the USI = 960,000

    (4 + 29) x 15,000 = 480,000 x 2 (Fallback) = 990,000 bytes approx.

    Actual size of the USI = 1,026,048 (use the empirical sizing technique)

    SELECT SUM(CurrentPerm) FROM DBC.Tablesize

    WHERE DatabaseName = DATABASE

    AND TableName = 'Trans' ;

    Sum(CurrentPerm)2179072 2,179,072 - 1,153,024 = 1,026,048

  • 7/27/2019 B210 Sizing

    49/51

    Lab Solutions for Lab 10-2 (cont.)

    Lab Exercise 10-2 (cont.)6. Create a NUSI on the Trans_ID column.

    CREATE INDEX (Trans_ID) on Trans;

    Estimate the size of the NUSI = _______________

    SELECT COUNT(DISTINCT(Trans_ID)) FROM Trans;

    Count(Distinct(TRANS_ID))975

    (15,000 x 8) + ( 975 x (4 + 21) x 8 ) = 315,000 bytes approx.

    315,000 x 2 (Fallback) = 630,400 bytes approx.

    Actual size of the NUSI = 523,264 (use the empirical sizing technique)

    SELECT SUM(CurrentPerm) FROM DBC.Tablesize

    WHERE DatabaseName = DATABASE

    AND TableName = 'Trans' ;

    Sum(CurrentPerm)

    2702336 2,702,336 - 2,179,072 = 523,264

  • 7/27/2019 B210 Sizing

    50/51

    Lab Solutions for Lab 10-3

    Lab Exercise 10-31. Populate your Employee and Emp_Phone tables from the PD.Employee and PD.Emp_Phone tables using the following

    INSERT/SELECT statement:

    INSERT INTO Employee SELECT * FROM PD.Employee;

    INSERT INTO Emp_Phone SELECT * FROM PD.Emp_Phone;

    2. Using the DBC.Tablesize view, determine the actual size of the Emp_Phone table by using the SUM aggregate function.

    Size of populated Emp_Phone = 124,928

    SELECT SUM (CurrentPerm) FROM DBC.Tablesize

    WHERE DatabaseName = DATABASE

    AND TableName = 'Emp_Phone' ; Sum(CurrentPerm)

    124,928

    3. The Foreign key is Employee_Number in the Emp_Phone table and the Primary Key is the Employee_Number in the

    Employee table.

    Create a References constraint on Employee_Number using the following SQL statements.

    ALTER TABLE Emp_Phone ADD CONSTRAINT fk1 FOREIGN KEY (Employee_Number)

    REFERENCES Employee (Employee_Number);

    (use the HELP CONSTRAINT Emp_Phone.fk1; to view constraint information.

    HELP CONSTRAINT Emp_Phone.fk1;

    Name Type State Index ID Foreign Key Columns ...FK1 REFERENCE VALID 0 EMPLOYEE_NUMBER ...

  • 7/27/2019 B210 Sizing

    51/51

    Lab Solutions for Lab 10-3 (cont.)

    Lab Exercise 10-3 (cont.)

    4. Using the DBC.Tablesize view, determine the actual size of the Emp_Phone table by using the SUM aggregate function.

    SELECT COUNT(DISTINCT(Employee_Number)) AS "Count"

    FROM Emp_Phone;

    Count

    1000

    (4 + 25) x 1,000 = 29,000 x 2 (Fallback) = 58,000 bytes approx.

    Estimate the size of the Reference Index = 58,000

    Size of populated Emp_Phone with references index = 190,464

    SELECT SUM (CurrentPerm) FROM DBC.Tablesize

    WHERE DatabaseName = DATABASE

    AND TableName = 'Emp_Phone' ;

    Sum(CurrentPerm)

    190464 190,464 - 124,928 = 65,536

    Size of references index = 65,536

    5. Drop the Foreign Key constraint by executing the following SQL command.