MELJUN CORTES BAFEDM2 - Week 09, 11, Presentation Deck

  • 7/27/2019 MELJUN CORTES BAFEDM2 - Week 09, 11, Presentation Deck

    1/55

    2013 IBM Corporation

    IBM Global Center for Smarter Analytics

    BAFEDM2: Fundamentals of Enterprise Data Management

    Week 09, 11

    Copyright IBM Corporation 2013. All rights reserved.

    THE INFORMATION CONTAINED IN THIS PRESENTATION IS FOR INFORMATIONAL PURPOSES ONLY. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

    IBM, the IBM logo, ibm.com, Cognos, SPSS and iLog are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are U.S. registered or common law trademarks owned by IBM at the time this information was published. Trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at http://www.ibm.com/legal/copytrade.html .

    The IBM logo must not be moved, added to or altered in any way.

    Other company, product, or service names may be trademarks or service marks of others.

    Agenda

    Module 4: Extract, Transform and Loading Process (continued)

    The 34 Subsystems of ETL

    Group Project

    Project Development (ETL Process)

    Readings

    Mohanty, S. (2012). Data Warehousing: Design, Development and Best Practices. Tata McGraw-Hill Publishing Company, India.

    Kimball, R. (2008). The Data Warehouse Lifecycle Toolkit, Second Edition. John Wiley & Sons.

    Module 4: Extract, Transform and Loading Process (continued)

    Overview[1]

    The extract, transformation, and load (ETL) system is often estimated to consume 70 percent of the time and effort of building a DW/BI environment.

    [Figure labels: Different Sources, ETL, Data Quality, Common Meta Data, Warehouse, Cubing, Cognos, Users accessing cubes]

    Ten Major Requirements for ETL[1]

    The 34 Subsystems of ETL[1]

    Extracting (1-3).
    Cleaning and Conforming (4-8).
    Delivering (9-21).
    Managing (22-34).

    The 34 Subsystems of ETL: Extracting

    The 34 Subsystems of ETL: Extracting[1]

    Subsystem 1: Data Profiling

    Unsorted files are profiled and then sorted.

    [Figure labels: Data, Unsorted Files (three), Sorted Files]
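
As a rough illustration of what profiling a source file can involve, the sketch below counts missing and distinct values for one column. The function name `profile_column` and the sample rows are assumptions made for this example; they are not from the deck or from any particular profiling tool.

```python
from collections import Counter


def profile_column(rows, column):
    """Return simple profile statistics for one column of a list of dicts."""
    values = [row.get(column) for row in rows]
    # Treat None and the empty string as missing values (an assumption).
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }


# Hypothetical source rows, invented for illustration.
rows = [
    {"emp_id": "1111", "city": "Antipolo"},
    {"emp_id": "2222", "city": "Manila"},
    {"emp_id": "3333", "city": ""},
    {"emp_id": "4444", "city": "Manila"},
]
city_profile = profile_column(rows, "city")
print(city_profile)
```

A profile like this is usually read before any cleansing rules are written, since it shows which columns have gaps or unexpected values.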

    The 34 Subsystems of ETL: Extracting[1]

    Subsystem 2: Change Data Capture

    The key goals for the change data capture subsystem are:

    The 34 Subsystems of ETL: Extracting[1]

    Subsystem 3: Extract System

    Getting data from a source.

    [Figure labels: Sorted Files (three), Data]

    Name (Last Name, First Name, Middle Name)
    Address (Street No., Phase, Village, City)
    Contact Numbers (Tel. No., Cell Phone No.)

    The 34 Subsystems of ETL: Cleaning and Conforming

    The 34 Subsystems of ETL: Cleaning and Conforming Data[1]

    Subsystem 4: Data Cleansing System

    Determine the dirty data to be fixed.

    Emp ID  Name               Age  Salary
    1111    Juan Dela Cruz     28   30000
    2222    Pedro Gil $anchez  25   15000

    Emp ID  Name               Manager  Report To
    1111    Juan Dela Cruz     Y        N/A
    2222    Pedro Gil $anchez  N        1111

    Emp ID  Name               City      Contact No
    1111    Juan Dela Cruz     Antipolo  123456
    2222    Pedro Gil Sanche$  Manila    654321

    [Figure labels: Data, Reject Data, Data to be fixed]
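
The tables above contain names with a stray `$` character. A minimal sketch of a cleansing screen that separates clean rows from rows to be fixed might look like the following. The rule for what counts as a valid name (letters, spaces, periods, apostrophes and hyphens) is an assumption chosen for this example, and `screen_names` is a hypothetical helper.

```python
import re

# Assumed rule: a valid name holds only letters, spaces, periods,
# apostrophes and hyphens. Real screens are defined by the business.
VALID_NAME = re.compile(r"[A-Za-z .'-]+")


def screen_names(rows):
    """Split rows into (clean, to_fix) according to the name rule."""
    clean, to_fix = [], []
    for row in rows:
        if VALID_NAME.fullmatch(row["name"]):
            clean.append(row)
        else:
            to_fix.append(row)
    return clean, to_fix


rows = [
    {"emp_id": "1111", "name": "Juan Dela Cruz"},
    {"emp_id": "2222", "name": "Pedro Gil $anchez"},
]
clean, to_fix = screen_names(rows)
print("clean:", clean)
print("to fix:", to_fix)
```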

    The 34 Subsystems of ETL: Cleaning and Conforming Data[1]

    Subsystem 5: Error-Event Schema

    A centralized dimensional schema. Its purpose is to record every error event thrown by a quality screen anywhere in the ETL pipeline.

    Sample financial data / journal records (transactional data seen by the support / developer team):

    REC ID          RC01PP234
    REC DATE        31-Aug-11
    BUS UNIT ID     2
    BUS UNIT        PPXRL
    SOURCE DETAILS  FIN TRX
    SRC CNT         3
    TARGET DEBIT    -62,027.13
    TARGET CREDITS  102,078,553.52
    TGT CNT         3.00
    BUS UNIT SRC    PPXL
    SRC DEBIT       72,027.13
    SRC CREDITS     102,078,553.52

    Error seen by users (error event schema used):

    [FIN] - ERROR: 1130 - FIN TARGET DEBIT is not equal to SRC DEBIT
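
To make the idea concrete, here is a small sketch of a quality screen that writes to a shared list of error events when the target debit differs from the source debit. The `ErrorEvent` fields, the screen name `debit_match` and the dictionary keys are assumptions for illustration; the error message is the one shown on the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass
class ErrorEvent:
    """One row of a (hypothetical) error-event table."""
    event_time: datetime
    screen: str
    record_id: str
    message: str


def debit_screen(record, error_events):
    """Quality screen: the target debit must equal the source debit."""
    if record["target_debit"] != record["src_debit"]:
        error_events.append(ErrorEvent(
            event_time=datetime.now(timezone.utc),
            screen="debit_match",
            record_id=record["rec_id"],
            message="[FIN] - ERROR: 1130 - FIN TARGET DEBIT "
                    "is not equal to SRC DEBIT",
        ))
        return False
    return True


error_events = []
journal = {
    "rec_id": "RC01PP234",
    "target_debit": Decimal("-62027.13"),
    "src_debit": Decimal("72027.13"),
}
passed = debit_screen(journal, error_events)
print(passed, error_events)
```

Keeping every screen's failures in one structure is what lets support staff query errors across the whole pipeline instead of reading separate logs.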

    The 34 Subsystems of ETL: Cleaning and Conforming Data[1]

    Subsystem 7: De-duplication System

    Responsible for determining duplicate data.

    Emp ID  Name            Age  Salary
    1111    Juan Dela Cruz  28   30000
    2222    Pedro Gil       25   15000

    Emp ID  Name            Manager  Report To
    1111    Juan Dela Cruz  Y        N/A
    2222    Pedro Gil       N        1111

    Emp ID  Name            City      Contact No
    1111    Juan Dela Cruz  Antipolo  123456
    2222    Pedro Gil       Manila    654321

    Emp ID  Name            Age  Salary  Manager  Report To  City      Contact No
    1111    Juan Dela Cruz  28   30000
    2222    Pedro Gil       25   15000
    1111    Juan Dela Cruz               Y        N/A
    2222    Pedro Gil                    N        1111
    1111    Juan Dela Cruz                                   Antipolo  123456
    2222    Pedro Gil                                        Manila    654321

    Deduplication happens when the sources are merged.
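
One simple way to collapse the stacked rows above into a single record per employee is to merge on the shared key. The sketch below assumes `emp_id` identifies an employee and keeps the first non-empty value seen for each column; real de-duplication usually needs fuzzy matching and survivorship rules, which are not shown. `merge_by_key` is a hypothetical helper.

```python
def merge_by_key(rows, key):
    """Collapse rows that share a key into one record.

    The first non-empty value seen for each column wins (an assumed
    survivorship rule chosen for simplicity).
    """
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {})
        for column, value in row.items():
            if value not in (None, "") and column not in target:
                target[column] = value
    return list(merged.values())


stacked = [
    {"emp_id": "1111", "name": "Juan Dela Cruz", "age": 28, "salary": 30000},
    {"emp_id": "2222", "name": "Pedro Gil", "age": 25, "salary": 15000},
    {"emp_id": "1111", "name": "Juan Dela Cruz", "manager": "Y", "report_to": "N/A"},
    {"emp_id": "2222", "name": "Pedro Gil", "manager": "N", "report_to": "1111"},
    {"emp_id": "1111", "name": "Juan Dela Cruz", "city": "Antipolo", "contact_no": "123456"},
    {"emp_id": "2222", "name": "Pedro Gil", "city": "Manila", "contact_no": "654321"},
]
employees = merge_by_key(stacked, "emp_id")
for employee in employees:
    print(employee)
```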

    The 34 Subsystems of ETL: Cleaning and Conforming Data[1]

    Subsystem 8: Conforming System

    Define keys that can be used for conforming data.

    Emp ID  Name            Age  Salary  Manager  Report To  City      Contact No
    1111    Juan Dela Cruz  28   30000
    2222    Pedro Gil       25   15000
    1111    Juan Dela Cruz               Y        N/A
    2222    Pedro Gil                    N        1111
    1111    Juan Dela Cruz                                   Antipolo  123456
    2222    Pedro Gil                                        Manila    654321

    Emp ID
    1111
    2222

    We can use this key to conform and describe customer data.

    The 34 Subsystems of ETL: Delivering

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 9: Slowly Changing Dimension Manager

    A history-keeping dimension.

    Emp ID  Name            Age  Salary  Update Flag  Expiry Date, Begin Date, End Date
    1111    Juan Dela Cruz  28   30000   N            1/1/2012, 1/1/2011, 1/1/2012
    2222    Pedro Gil       25   15000   Y            1/1/2013
    1111    Juan Dela Cruz  29   40000   Y            1/1/2012, 1/1/2013

    The update flag and date columns are used for the slowly changing dimension.
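
The table above keeps both the old and the new row for employee 1111. A minimal sketch of that history-keeping behaviour (commonly called a Type 2 change) is shown below. The column names `current`, `begin_date` and `end_date` are assumptions for this example and do not exactly mirror the slide's columns; `apply_scd2` is a hypothetical helper.

```python
from datetime import date


def apply_scd2(dimension, emp_id, changes, effective):
    """Expire the current row for emp_id and append a new current row.

    Raises StopIteration if emp_id has no current row; a production
    version would handle that case explicitly.
    """
    current = next(
        r for r in dimension if r["emp_id"] == emp_id and r["current"]
    )
    current["current"] = False
    current["end_date"] = effective
    new_row = {
        **current,
        **changes,
        "current": True,
        "begin_date": effective,
        "end_date": None,
    }
    dimension.append(new_row)
    return new_row


dimension = [
    {"emp_id": "1111", "name": "Juan Dela Cruz", "age": 28, "salary": 30000,
     "current": True, "begin_date": date(2011, 1, 1), "end_date": None},
]
apply_scd2(dimension, "1111", {"age": 29, "salary": 40000}, date(2012, 1, 1))
for row in dimension:
    print(row)
```

The old row is never overwritten, so facts loaded while the salary was 30000 can still be joined to the version of the employee that was valid at that time.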

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 10: Surrogate Key Generator

    The use of surrogate keys for all dimension tables is strongly recommended.

    This implies that you need a robust mechanism for producing surrogate keys in your ETL system.

    SEQ_ID  EMP ID  EMP NAME           CONTACT NO
    1       1123    Rhia Trogo         1123899
    2       3321    Aurea Muncal       1123456
    3       1234    Apple Bulao        9800112
    4       1111    Joseph Lim         4561188
    5       2344    Vincent Inocentes  4561178
    6       1122    Christian Cequena  9701102
    7       2211    Juan Dela Cruz     9701243
    8       6657    Pedro Gil Puyat    8610092
    9       1125    Steve John         1870092
    10      1124    Billy Joe          14501828

    The SEQ_ID column holds the surrogate keys.
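
A surrogate key generator can be as simple as a counter that remembers which natural key received which number. The class below is a sketch under that assumption; in practice the sequence often lives in the database, and `SurrogateKeyGenerator` is a name made up for this example.

```python
import itertools


class SurrogateKeyGenerator:
    """Hands out one stable integer key per natural key."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._assigned = {}

    def key_for(self, natural_key):
        # Reuse the existing key so that reloading a row does not
        # create a second surrogate for the same natural key.
        if natural_key not in self._assigned:
            self._assigned[natural_key] = next(self._counter)
        return self._assigned[natural_key]


keys = SurrogateKeyGenerator()
seq_ids = [keys.key_for(emp_id) for emp_id in ["1123", "3321", "1234", "1123"]]
print(seq_ids)
```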

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 11: Hierarchy Manager

    [Figure: hierarchy diagram. Student Data / Student Information branches into Student Name (First Name, Last Name, Middle Name), Student Contact No (Area, Number) and Student Address (Street, City, Town). The diagram is annotated with Fixed Hierarchy, Ragged Hierarchy and No Hierarchy.]

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 12: Special Dimensions Manager

    Date/Time Dimensions
    Mini Dimensions
    Junk Dimensions
    User Maintained Dimensions

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 13: Fact Table Builders

    Transaction grain fact tables: the pure addition of the most current records is the easiest case, simply bulk loading new rows into the fact table.

    Periodic snapshots: these have similar loading characteristics to those of the transaction grain fact tables. The same processing applies for inserts and updates.

    Accumulating snapshots: the design and administration of the accumulating snapshot is quite different from the first two fact table types. All accumulating snapshot fact tables have a set of dates, usually four to eight, which describe the typical process workflow.

    [Figure labels: Transactional (Bulk Loading); Periodic (Insert then Update); Accumulating (Keeping History, Update then Insert)]

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 14: Surrogate Key Pipeline

    Replace natural keys with the appropriate dimension surrogate keys.

    Emp ID  Name            Age  Salary  Update Flag  Expiry Date, Begin Date, End Date
    1111    Juan Dela Cruz  28   30000   N            1/1/2012, 1/1/2011, 1/1/2012
    2222    Pedro Gil       25   15000   Y            1/1/2013
    1111    Juan Dela Cruz  29   40000   Y            1/1/2012, 1/1/2013

    Emp ID  Manager ID  Name            Manager
    1111    1           Juan Dela Cruz  Y
    2222    2           Pedro Gil       N
    02      3           Steve Gates     Y

    SEQ ID  Emp ID  Reports to  Manager ID
    1       1111    02          1
    2       2222    1111        2

    [Figure labels: Surrogate Key, Foreign Key, Primary Key]
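
The key replacement step can be sketched as a lookup from natural key to surrogate key, applied to every key column of an incoming fact row. The mapping below treats the Manager ID values in the second table as the surrogate keys for each Emp ID, which is an interpretation made for this example. The row with Emp ID 9999 is invented to show what happens when a lookup fails, and `surrogate_key_pipeline` is a hypothetical helper.

```python
def surrogate_key_pipeline(fact_rows, lookups):
    """Swap natural key columns for surrogate keys.

    lookups maps a column name to a {natural_key: surrogate_key} dict.
    Rows with a key that is missing from its lookup are set aside.
    """
    loaded, rejected = [], []
    for row in fact_rows:
        out = dict(row)
        try:
            for column, mapping in lookups.items():
                out[column] = mapping[row[column]]
        except KeyError:
            rejected.append(row)
            continue
        loaded.append(out)
    return loaded, rejected


# Assumed mapping: natural Emp ID -> surrogate key.
employee_keys = {"1111": 1, "2222": 2, "02": 3}
facts = [
    {"emp_id": "1111", "reports_to": "02"},
    {"emp_id": "2222", "reports_to": "1111"},
    {"emp_id": "9999", "reports_to": "1111"},  # unknown employee
]
loaded, rejected = surrogate_key_pipeline(
    facts, {"emp_id": employee_keys, "reports_to": employee_keys}
)
print("loaded:", loaded)
print("rejected:", rejected)
```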

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 15: Multi-Valued Dimension Bridge Table Builder

    Metadata of different dimensions.

    [Figure labels: Fact Table, Dimension Tables, Cardinalities (M:M Relationship)]

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 16: Late Arriving Data Handler

    Handles the processing of late-arriving fact or dimension data.

    [Figure labels: Source Data, Data Warehouse, 30 mins, Delayed Processing / Loading]

    Logs all the journal entries in a given sale, indicating that the journal amount is not equal to the account amount. The process is still running and the data is not yet updated.
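
One common way to cope with a fact that arrives before its dimension row is to create a placeholder dimension row and fill it in later. The sketch below assumes that approach; the function names, the `placeholder` flag and the customer names are all made up for illustration and are not taken from the deck.

```python
def resolve_dimension_key(dimension, natural_key):
    """Return the surrogate key, creating a placeholder row if needed."""
    if natural_key not in dimension:
        dimension[natural_key] = {
            "surrogate_key": len(dimension) + 1,
            "name": "Unknown (late arriving)",
            "placeholder": True,
        }
    return dimension[natural_key]["surrogate_key"]


def apply_late_dimension_row(dimension, natural_key, attributes):
    """Fill in a placeholder once the real data arrives.

    The surrogate key is left unchanged, so facts that were already
    loaded still point at the right row.
    """
    row = dimension[natural_key]
    row.update(attributes)
    row["placeholder"] = False


# Hypothetical customer dimension keyed by natural key.
customers = {"C1": {"surrogate_key": 1, "name": "Acme", "placeholder": False}}

# The fact for customer C2 arrives before the customer record does.
fact = {"customer_key": resolve_dimension_key(customers, "C2"), "amount": 100}

# Later the customer record arrives and replaces the placeholder values.
apply_late_dimension_row(customers, "C2", {"name": "Globex"})
print(fact, customers["C2"])
```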

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 17: Dimension Manager System

    Prepares and publishes conformed dimensions.

    [Figure labels: Data Warehouse, Data Mart, Dimension Manager, Published Cube]

    Table 3.4: Demonstration Sales Report for Used Car Dealers After the Slice in the Dimension Date

    Sales quantity for Date = 1997, by Region

    Product  Warsaw  Cracow  Poznan
    BMW      1000    150     300
    Audi     500     250     300
    Ford     500     100     200

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 18: Fact Provider System

    Administration of one or more fact tables.

    Emp ID  Salary ID  Manager ID  Customer ID  Location ID
    11111   1212       2211        1            2
    22222   1213       2212        2            2
    33333   1214       2211        3            3
    44444   1215       2211        4            4
    55555   1216       2212        5            1

    [Figure: tables marked F and d. One holds Salary ID, Customer ID, Salary. One holds Emp ID, Manager ID, Time in, Time out. One holds Location ID, Building, Branch, Street, City, Town, Zip, Region.]

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 19: Aggregate Builder

    Data structures created to improve performance.

    [Figure: several Data sources; aggregation is used to filter the desired output into Data]
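
An aggregate is a pre-summed copy of the fact data at a coarser grain, built once so that queries do not have to scan every detail row. The sketch below sums a measure over a chosen set of columns. The detail rows are invented for illustration and `build_aggregate` is a hypothetical helper.

```python
from collections import defaultdict


def build_aggregate(fact_rows, group_by, measure):
    """Sum one measure for every combination of the group_by columns."""
    totals = defaultdict(int)
    for row in fact_rows:
        totals[tuple(row[c] for c in group_by)] += row[measure]
    return dict(totals)


# Invented detail rows at the grain of one sale per row.
sales = [
    {"product": "BMW", "region": "Warsaw", "year": 1997, "quantity": 600},
    {"product": "BMW", "region": "Warsaw", "year": 1997, "quantity": 400},
    {"product": "Audi", "region": "Cracow", "year": 1997, "quantity": 250},
]
by_product_region = build_aggregate(sales, ["product", "region"], "quantity")
print(by_product_region)
```

Reports that only need totals per product and region can then read the small aggregate instead of the full fact table.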

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 20: OLAP Cube Builder

    Enables analytic users to slice and dice data.

    [Figure labels: Cube Builder, Users]

    Table 3.4: Demonstration Sales Report for Used Car Dealers After the Slice in the Dimension Date

    Sales quantity for Date = 1997, by Region (after slicing and dicing)

    Product  Warsaw  Cracow  Poznan
    BMW      1000    150     300
    Audi     500     250     300
    Ford     500     100     200

    The 34 Subsystems of ETL: Delivering Data for Presentation[1]

    Subsystem 21: Data Propagation Manager

    Responsible for integrating enterprise data from the data warehouse to be used by multiple users.

    The 34 Subsystems of ETL: Managing the ETL Environment[1]

    Subsystem 24: Recovery and Restart System

    Provides recovery and restart of the system.

    [Figure labels: Sources, ETL, Back Up, Cubes, User Cubes, Recovery Process, Failed. A backup is created during the process / schedule.]
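
A restartable job needs to remember which steps already finished so that a rerun does not repeat them. The sketch below records completed step names in a JSON checkpoint file. The file layout, step names and the simulated failure are assumptions for this example, and `run_with_checkpoint` is a hypothetical helper rather than a feature of any specific ETL tool.

```python
import json
import os
import tempfile


def run_with_checkpoint(steps, checkpoint_path):
    """Run named steps in order, skipping steps a previous run completed."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    for name, step in steps:
        if name in done:
            continue
        step()  # if this raises, earlier steps stay recorded on disk
        done.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
    return done


log = []
state = {"fail_load": True}


def load():
    if state["fail_load"]:
        raise RuntimeError("load failed")
    log.append("load")


steps = [
    ("extract", lambda: log.append("extract")),
    ("transform", lambda: log.append("transform")),
    ("load", load),
]

checkpoint = os.path.join(tempfile.mkdtemp(), "etl_checkpoint.json")
try:
    run_with_checkpoint(steps, checkpoint)
except RuntimeError:
    state["fail_load"] = False  # simulate fixing the problem

# The restart resumes at "load" without repeating the earlier steps.
completed = run_with_checkpoint(steps, checkpoint)
print(log, completed)
```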

    The 34 Subsystems of ETL: Managing the ETL Environment[1]

    Subsystem 27: Workflow Monitor

    Provides detailed steps of how the workflow runs.

    [Figure: Sample Workflow Logs]

    The 34 Subsystems of ETL: Managing the ETL Environment[1]

    Subsystem 28: Sorting System

    Used for sorting data.

    [Figure: query used to sort dates in ascending order]

    The 34 Subsystems of ETL: Managing the ETL Environment[1]

    Subsystem 30: Problem Escalation System

    Responsible for reporting errors and audit logs.

    [Figure labels: Sources, ETL, Failed, Testers / Monitors / Support Team]

    If production fails, the failure is automatically reported to the designated groups.

    The 34 Subsystems of ETL: Managing the ETL Environment[1]

    Subsystem 31: Parallelizing/Pipelining System

    Able to run one ETL process from different sources.

    [Figure: three Sources feeding one ETL. Different sources use one ETL pipeline.]

    The 34 Subsystems of ETL: Managing the ETL Environment[1]

    Subsystem 32: Security System

    Ensures that the business data and information sent are secured and encrypted.

    The 34 Subsystems of ETL: Managing the ETL Environment[1]

    Subsystem 33: Compliance Manager

    Responsible for assuring that data is factual and precise when users access the cubes.

    [Figure labels: Sources, ETL, Data Bank, Error Data, Junk Data, Audits (three). Make sure the data is correct and accurate.]

    The 34 Subsystems of ETL: Managing the ETL Environment[1]

    Subsystem 34: Metadata Repository Manager

    SEQ_ID  Error ID        Error Desc
    1       mp_TRXS_ERR_01  Load Error: Process didn't complete
    2       mp_TRXS_ERR_02  Data Error: Unknown member
    3       mp_TRXS_ERR_03  Configuration Error: Limit Reach for non characters subsets

    SEQ_ID  Version ID  Mapping Name   Mapping Desc                       Date
    1       1           mp_TRNX_SALES  Transaction Process in Sales       1/1/2010
    2       1           mp_TRNX_MKTG   Transaction Process in Marketing   1/1/2010
    3       1           mp_TRNX_PROD   Transaction Process in Production  1/1/2010

    SEQ_ID  Version ID  Mapping Name   Error ID
    1       1           mp_TRNX_SALES  1
    2       1           mp_TRNX_SALES  2
    3       1           mp_TRNX_SALES  3

    [Figure labels: Sources, ETL, Failed: ERR_02. Error 2 is generated and reported.]

    Real Time Implication

    Real Time Implication[2]

    Business users expect the data warehouse to be continuously updated throughout the day.

    [Figure labels: Sources, ETL]

    While the ETL process is running, users can refresh the cubes to get recent, up-to-date data.

    Designing and Developing ETL System[1]

    IMPORTANCE OF GOOD SYSTEM DEVELOPMENT PRACTICES

    ETL development may follow an iterative, interactive process, but the fundamental systems development practices still apply.

    Set up a header format and comment fields for your code.
    Hold structured design reviews early enough to allow changes.
    Write clean, well-commented code.
    Stick to the naming standards.
    Use the code library and management system.
    Test everything, with both unit testing and system testing.
    Document everything!

    For the Next Session

    For the Next Sessions

    Agenda

    Module 5: Advanced Topics
    Master Data Management
    Measuring the Effectiveness of a Data Warehouse
    10 Signs of a Data Warehousing Project in Trouble
    Ethical Dilemmas in Data Mining and Warehousing
    Big Data

    References

    [1] Kimball, R. (2008). The Data Warehouse Lifecycle Toolkit, Second Edition. John Wiley & Sons.

    [2] Mohanty, S. (2006). Data Warehousing: Design, Development and Best Practices. Tata McGraw-Hill Publishing Company, India.