datawhs


  • 8/9/2019 datawhs


    Objective

    In this presentation we will focus on:

    Data and information

    The need for information

    Database and data warehousing concepts

    Differences between a database and a data warehouse

    Architecture of data warehousing

    Applications of data warehousing


    Data and Information

    They may sound synonymous but are very different from each other.

    Data are plain facts. When data are processed, organized, structured or presented in a given context so as to make them useful, they are called information.
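
As a small illustration of the distinction, the sketch below turns a few raw sales facts into information by organizing them per product. The records and the field layout are made up for this example and are not part of the presentation.

```python
from collections import defaultdict

# Raw data: plain facts with no context (hypothetical sample records).
raw_sales = [
    ("laptop", 1200.0),
    ("mouse", 25.0),
    ("laptop", 1150.0),
    ("mouse", 30.0),
]


def summarize(sales):
    """Organize raw (product, amount) facts into total revenue per product."""
    totals = defaultdict(float)
    for product, amount in sales:
        totals[product] += amount
    return dict(totals)


# Information: the same facts, processed and presented in a useful context.
revenue_by_product = summarize(raw_sales)
print(revenue_by_product)
```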


    Database

    A database is a place where data is taken as a base and managed so that it is available for fast and efficient access.

    In general, it is a reservoir of the operational data needed for daily business.


    Data Warehouse

    A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.


    Similarities and differences

    Both a database and a data warehouse support storage, access and retrieval of application data, but only a data warehouse supports decision making and business intelligence.

    A data warehouse holds historical data as well as current data, so comparisons for business analysis become very easy.


    Major Distinction

    A database is used for running the business.

    A data warehouse is used to know how to run the business.


    A situation!

    You are an analyst working for a computer firm.

    As an analyst, you want to know:

    1- Who are your customers?

    2- What kind of products do they want to buy?

    3- What is the most effective distribution channel?

    4- Which product promotions have the biggest impact on revenue?

    5- How do you decide upon segmentation?


    What do I do?

    Data is scattered all over network-based databases.

    Data is available, but I can't understand it.

    Data needs to be collected and summarized to give it some meaning.

    Data as collected cannot be used for analysis.

    All you need is a data warehouse in your company.
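
To make "collected and summarized" concrete, here is a minimal sketch that gathers order rows from two separate systems into one table and summarizes revenue per distribution channel. The source lists, the table name sales and its columns are assumptions for illustration only.

```python
import sqlite3

# Hypothetical order records scattered across two operational systems.
web_orders = [("web", 120.0), ("web", 80.0)]
store_orders = [("retail", 200.0), ("retail", 50.0), ("partner", 90.0)]


def revenue_by_channel(*sources):
    """Collect rows from every source into one table and summarize them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (channel TEXT, dollars REAL)")
    for rows in sources:
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT channel, SUM(dollars) FROM sales "
        "GROUP BY channel ORDER BY SUM(dollars) DESC"
    ).fetchall()
    conn.close()
    return result


summary = revenue_by_channel(web_orders, store_orders)
print(summary)
```

Once the rows sit in one place, a question such as "what is the most effective distribution channel?" becomes a single grouped query.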


    Warehouses are Very Large Databases

    [Figure: bar chart of the percentage of respondents (0% to 35%) by warehouse size, comparing Initial and Projected 2Q96 sizes. Size buckets: 5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB. Source: META Group, Inc.]


    Very Large Data Bases

    Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes

    Petabytes -- 10^15 bytes: Geographic Information Systems

    Exabytes -- 10^18 bytes: National Medical Records

    Zettabytes -- 10^21 bytes: Weather images

    Yottabytes -- 10^24 bytes: Intelligence Agency Videos


    Data Warehouse Architecture

    [Figure: architecture diagram. Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems. Stages: Extraction, Cleansing, Optimized Loader. Core: Data Warehouse Engine and Metadata Repository. Access: Analyze, Query.]


    Components of the Warehouse

    Data Extraction and Loading

    The Warehouse

    Analyze and Query -- OLAP Tools

    Metadata

    Data Mining tools


    Loading the Warehouse

    Cleaning the data before it is loaded


    Why is LOADING required?

    Warehouse data comes from disparate, questionable sources

    Outside sources with questionable quality procedures
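
A minimal sketch of cleaning records before they are loaded, given that sources may be of questionable quality. The field names (id, name, city) and the rejection rules are assumptions chosen for the example; real cleansing rules depend on the sources.

```python
def clean_record(record):
    """Return a cleaned copy of a customer record, or None if unusable."""
    name = (record.get("name") or "").strip().title()
    city = (record.get("city") or "").strip().title()
    customer_id = record.get("id")
    # Reject records without a usable key or name rather than loading them.
    if customer_id is None or not name:
        return None
    return {"id": int(customer_id), "name": name, "city": city}


def clean_batch(records):
    """Clean every record, dropping rejects and repeated ids."""
    seen, cleaned = set(), []
    for record in records:
        row = clean_record(record)
        if row is not None and row["id"] not in seen:
            seen.add(row["id"])
            cleaned.append(row)
    return cleaned


# Hypothetical incoming batch with messy, missing and duplicate entries.
incoming = [
    {"id": "1", "name": "  alice SMITH ", "city": "san francisco"},
    {"id": None, "name": "Bob", "city": "Pune"},
    {"id": "1", "name": "Alice Smith", "city": "San Francisco"},
    {"id": "2", "name": "", "city": "Delhi"},
]
cleaned = clean_batch(incoming)
print(cleaned)
```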


    Data Integration Across Sources

    Sources: Trust, Credit card, Savings, Loans

    Same data, different name

    Different data, same name

    Data found here, nowhere else

    Different keys, same data
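
Two of these problems, "same data, different name" and "different keys, same data", can be sketched as a field mapping plus a key crosswalk. Every field name, key value and mapping below is hypothetical and only illustrates the idea.

```python
# Hypothetical per-source field mappings onto one warehouse schema
# ("same data, different name").
FIELD_MAPS = {
    "savings": {"cust_no": "customer_id", "bal": "balance"},
    "loans": {"client_id": "customer_id", "outstanding": "balance"},
}

# Hypothetical key crosswalk: the loans system identifies the same
# customers with different keys ("different keys, same data").
KEY_CROSSWALK = {"loans": {"L-77": "C001"}}


def to_warehouse_row(source, record):
    """Rename source fields and translate source keys to warehouse keys."""
    mapping = FIELD_MAPS[source]
    row = {mapping[field]: value for field, value in record.items()}
    keys = KEY_CROSSWALK.get(source, {})
    row["customer_id"] = keys.get(row["customer_id"], row["customer_id"])
    row["source"] = source
    return row


rows = [
    to_warehouse_row("savings", {"cust_no": "C001", "bal": 500.0}),
    to_warehouse_row("loans", {"client_id": "L-77", "outstanding": 1200.0}),
]
print(rows)
```

After the mapping, both rows carry the same customer_id and can be analyzed together.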


    Data Transformation Terms

    Extracting

    Conditioning

    Scrubbing

    Merging

    Householding

    Enrichment

    Scoring

    Loading

    Validating

    Delta Updating
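
As one example from this list, householding is commonly understood as grouping customers who belong to the same household. The sketch below assumes a very simple matching rule (same surname and same normalized address); the rule and the sample customers are illustrative assumptions, not a definition from the slides.

```python
from collections import defaultdict


def household_key(customer):
    """Build a simple matching key from surname and normalized address."""
    surname = customer["name"].split()[-1].lower()
    address = " ".join(customer["address"].lower().split())
    return (surname, address)


def group_households(customers):
    """Group customer names that share the same household key."""
    households = defaultdict(list)
    for customer in customers:
        households[household_key(customer)].append(customer["name"])
    return dict(households)


# Hypothetical customers; two of them share a surname and an address.
customers = [
    {"name": "Asha Rao", "address": "12 Lake Road"},
    {"name": "Vikram Rao", "address": "12  lake road"},
    {"name": "John Lee", "address": "4 Hill Street"},
]
households = group_households(customers)
print(households)
```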


    Refresh

    Propagate updates on source data to the warehouse

    Issues:

    when to refresh

    how to refresh -- refresh techniques
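
One possible refresh technique is an incremental (delta) refresh: only source rows changed since the last refresh are propagated. The sketch below assumes each source row carries an updated_at counter and that the warehouse table is keyed by product; both are assumptions made for the example.

```python
import sqlite3


def refresh(warehouse, source_rows, last_refresh):
    """Apply source rows changed after last_refresh; return the new watermark."""
    changed = [row for row in source_rows if row[2] > last_refresh]
    for product, dollars, updated_at in changed:
        # Update the existing row, or insert it if the key is new.
        cursor = warehouse.execute(
            "UPDATE sales SET dollars = ?, updated_at = ? WHERE product = ?",
            (dollars, updated_at, product),
        )
        if cursor.rowcount == 0:
            warehouse.execute(
                "INSERT INTO sales VALUES (?, ?, ?)",
                (product, dollars, updated_at),
            )
    warehouse.commit()
    return max([last_refresh] + [row[2] for row in changed])


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales "
    "(product TEXT PRIMARY KEY, dollars REAL, updated_at INTEGER)"
)
warehouse.execute("INSERT INTO sales VALUES ('coffee', 100.0, 1)")

# Hypothetical source rows: (product, dollars, updated_at).
source = [("coffee", 120.0, 5), ("tea", 40.0, 6), ("cocoa", 10.0, 1)]
watermark = refresh(warehouse, source, last_refresh=2)
loaded = warehouse.execute(
    "SELECT product, dollars FROM sales ORDER BY product"
).fetchall()
print(watermark, loaded)
```

The returned watermark would be stored and passed as last_refresh on the next run, which is one way of answering "when to refresh" on a schedule.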


    De-normalization

    Normalization in a data warehouse may lead to lots of small tables

    Can lead to excessive I/Os since many tables have to be accessed

    De-normalization is the answer, especially since updates are rare
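
A minimal sketch of the idea, using an in-memory SQLite database: three small normalized tables are joined once at load time into a single wide table, so later queries touch one table only. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized form: several small tables.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE market (market_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales (product_id INTEGER, market_id INTEGER, dollars REAL);
    INSERT INTO product VALUES (1, 'Coffee'), (2, 'Tea');
    INSERT INTO market VALUES (10, 'San Francisco'), (20, 'Pune');
    INSERT INTO sales VALUES (1, 10, 50.0), (2, 10, 20.0), (1, 20, 30.0);

    -- De-normalized copy: the joins are paid once, at load time.
    CREATE TABLE sales_wide AS
        SELECT p.name AS product, m.city AS city, s.dollars AS dollars
        FROM sales s
        JOIN product p ON p.product_id = s.product_id
        JOIN market m ON m.market_id = s.market_id;
""")

# Queries now read a single table instead of joining three.
coffee_total = conn.execute(
    "SELECT SUM(dollars) FROM sales_wide WHERE product = 'Coffee'"
).fetchone()[0]
wide_rows = conn.execute("SELECT COUNT(*) FROM sales_wide").fetchone()[0]
print(coffee_total, wide_rows)
```

The trade-off is redundancy: product and city names are repeated on every row, which is acceptable when updates are rare.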


    True Warehouse

    [Figure: diagram with three labeled layers: Data Sources, Data Warehouse, Data Marts.]


    A Sample Query

    select month, dollars, cume(dollars) as run_dollars,
           weight, cume(weight) as run_weights
    from sales, market, product, period t
    where year = 1993
      and product like 'Columbian%'
      and city like 'San Fr%'
    order by t.perkey
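
Assuming cume() computes a running (cumulative) total in period order, as the run_dollars and run_weights aliases suggest, the same calculation can be sketched in plain Python. The monthly figures are invented for the example.

```python
from itertools import accumulate

# Hypothetical monthly figures (month, dollars, weight), ordered by period.
monthly = [("Jan", 100.0, 10), ("Feb", 150.0, 12), ("Mar", 50.0, 8)]


def with_running_totals(rows):
    """Append running totals of dollars and weight to each row."""
    run_dollars = accumulate(dollars for _, dollars, _ in rows)
    run_weights = accumulate(weight for _, _, weight in rows)
    return [
        (month, dollars, rd, weight, rw)
        for (month, dollars, weight), rd, rw in zip(rows, run_dollars, run_weights)
    ]


report = with_running_totals(monthly)
for line in report:
    print(line)
```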


    Automated processes in data warehouses

    select product, dollars as jun97_sales,
           (select sum(si.dollars)
            from market mi, product pi, period ti, sales si
            where pi.product = product.product
              and ti.year = period.year
              and mi.city = market.city) as total97_sales,
           100 * dollars /
           (select sum(si.dollars)
            from market mi, product pi, period ti, sales si
            where pi.product = product.product
              and ti.year = period.year
              and mi.city = market.city) as percent_of_yr
    from market, product, period, sales
    where year = 1997
      and month = 'June'
      and city like 'Ahmed%'
    order by product;
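
The query above reports each product's June sales, its total for the year, and the June share of that total. A minimal Python sketch of the same calculation follows; it assumes one row per product per month, and the sample figures are made up.

```python
def percent_of_year(monthly_sales, month):
    """Return {product: (month_sales, year_total, percent)} for one month."""
    year_totals = {}
    for product, _, dollars in monthly_sales:
        year_totals[product] = year_totals.get(product, 0.0) + dollars
    report = {}
    for product, sale_month, dollars in monthly_sales:
        if sale_month == month:
            total = year_totals[product]
            report[product] = (dollars, total, 100 * dollars / total)
    return report


# Hypothetical 1997 sales rows: (product, month, dollars).
sales_1997 = [
    ("Coffee", "May", 300.0),
    ("Coffee", "June", 100.0),
    ("Tea", "June", 50.0),
    ("Tea", "July", 150.0),
]
june_report = percent_of_year(sales_1997, "June")
print(june_report)
```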


    Applications

    Industry                 Application
    Finance                  Credit Card Analysis
    Insurance                Claims, Fraud Analysis
    Telecommunication        Call record analysis
    Transport                Logistics management
    Consumer goods           Promotion analysis
    Data Service providers   Value added data
    Utilities                Power usage analysis
