Day 1-2 - DW concepts

download Day 1-2 - DW concepts

of 80

description

DW concepts

Transcript of Day 1-2 - DW concepts

  • Introduction to Data WarehousingCopyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • AgendaOperational SystemsOverview of Data WarehousingData Warehouse ArchitecturesUnderstand the ETL processInformatica Introduction

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Operational Systems Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • What is an Operational System?Operational systems are just what their name implies; they are the systems that help us run the day-to-day enterprise operations.

    These are the backbone systems of any enterprise, such as order entry inventory etc.

    The classic examples are airline reservations, credit-cardauthorizations, and ATM withdrawals etc.,

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Characteristics of Operational SystemsContinuous availabilityPredefined access pathsTransaction integrityVolume of transaction - HighData volume per query - LowSupports day to day control operationsLarge number of users

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Data Warehousing Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Data Warehouse DefinitionSubject OrientedIntegratedTime variantNon-volatile collection of data in support of management decision processesCopyright@ Pawan LikhiThe Data Warehouse is

    Copyright@ Pawan Likhi

  • Data Warehouse- Differences from Operational SystemsCopyright@ Pawan LikhiAccountingOrder EntryBillingCustomerUsageRevenueOperational data is organized by specific processes or tasks and is maintained by separate systemsWarehoused data is organized by subject area and is populated from many operational systemsOperationalSystemsDataWarehouse

    Copyright@ Pawan Likhi

  • Data Warehouse- Differences from Operational SystemsCopyright@ Pawan LikhiApplication SpecificIntegratedApplications and their databases were designed and built separatelyEvolved over long periods of timeIntegrated from the start

    Designed (or Architected) at one time, implemented iteratively over short periods of timeOperationalSystemsData Warehouse

    Copyright@ Pawan Likhi

  • Data Warehouse- Differences from Operational SystemsCopyright@ Pawan LikhiPrimarily concerned with current dataGenerally concerned with historical dataOperationalSystemsDataWarehouse

    Copyright@ Pawan Likhi

  • Datawarehouse- Differences from Operational SystemsCopyright@ Pawan LikhiLoad/ UpdateConsistent Points in TimeUpdated constantly

    Data changes according to need, not a fixed scheduleAdded to regularly, but loaded data is rarely directly changedDoes NOT mean the Data warehouse is never updated or never changes!!Constant ChangeOperational systems DatabaseData warehouse InsertInsertUpdateInitial LoadIncremental LoadIncremental LoadUpdateDelete

    Copyright@ Pawan Likhi

  • Data in a Data WarehouseWhat about the data in the Datawarehouse?

    Separate DSS data baseStorage of data only, no data is createdIntegrated and Scrubbed dataHistorical dataRead only (no recasting of history)Various levels of summarizationSubject orientedEasily accessible

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Life cycle of the DWOperational DatabasesFirst time loadRefreshRefreshRefreshPurge or Archive

    Copyright@ Pawan Likhi

  • Data Warehouse- Application AreasFollowing are some Business Applications of a data warehouse: Risk management Financial analysis Marketing programs Profit trends Procurement analysis Inventory analysis Statistical analysis Claims analysis Manufacturing optimization Customer relationship managementCopyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • RelationaltoolsApplications/ WebAny DataAny AccessAny SourceExternaldataOperationaldataOLAPtoolsText, imageRelational / MultidimensionalSpatialAudio, videoWeb

    Copyright@ Pawan Likhi

  • Types of datawarehousesEnterprise Data Warehouse - An enterprise data warehouse provides a central database for decision support throughout the enterprise. ODS (Operational Data Store) - This has a broad enterprise wide scope, but unlike the real entertprise data warehouse, data is refreshed in near real time and used for routine business activity. example finding the status of a customer order

    Data Mart - Datamart is a subset of data warehouse and it supports a particular region, business unit or business function. Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • CS 336Generic Warehouse ArchitectureExtractor/MonitorExtractor/MonitorExtractor/MonitorIntegratorWarehouseClientClientDesign PhaseMaintenanceLoading...MetadataOptimizationQuery & Analysis

    Copyright@ Pawan Likhi

  • Issues in Data WarehousingWarehouse DesignExtractionWrappers, monitors (change detectors)IntegrationCleansing & mergingWarehousing specification & MaintenanceOptimizationsMiscellaneous (e.g., evolution)

    Copyright@ Pawan Likhi

  • OLTP: On Line Transaction ProcessingDescribes processing at operational sitesOLAP: On Line Analytical ProcessingDescribes processing at warehouseOLTP vs. OLAP

    Copyright@ Pawan Likhi

  • Warehouse is a Specialized DBStandard DB (OLTP)Mostly updatesMany small transactionsMb - Gb of dataCurrent snapshotIndex/hash on p.k.Raw dataThousands of users (e.g., clerical users)Warehouse (OLAP)Mostly readsQueries are long and complexGb - Tb of dataHistoryLots of scansSummarized, reconciled dataHundreds of users (e.g., decision-makers, analysts)

    Copyright@ Pawan Likhi

  • Data Marts Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • What is a Data mart?Data mart is a decentralized subset of data found either in a data warehouse or as a standalone subset designed to support the unique business unit requirements of a specific decision-support system. Data marts have specific business-related purposes such as measuring the impact of marketing promotions, or measuring and forecasting sales performance etc,.

    Copyright@ Pawan LikhiEnterpriseData Warehouse

    Copyright@ Pawan Likhi

  • Data marts - Main FeaturesMain Features:Low cost Controlled locally rather than centrally, conferring power on the user group.Contain less information than the warehouse Rapid response Easily understood and navigated than an enterprise data warehouse.Within the range of divisional or departmental budgets

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Advantages of Datamart over DatawarehouseDatamart Advantages :Typically single subject area and fewer dimensionsLimited feedsVery quick time to market (30-120 days to pilot)Quick impact on bottom line problemsFocused user needsLimited scope Optimum model for DW constructionDemonstrates ROIAllows prototyping

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Disadvantages of Data Mart Data Mart disadvantages :Does not provide integrated view of business information.Uncontrolled proliferation of data marts results in redundancyMore number of data marts complex to maintainScalability issues for large number of users and increased data volume

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Basic Data Warehouse Architecture

    Copyright@ Pawan LikhiMeta Data ManagementAdministrationMiningOperational & External data

    Copyright@ Pawan Likhi

  • Basic Data Warehouse ArchitectureCopyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Operational & External Data layer Copyright@ Pawan LikhiThe database-of-recordConsists of system specific reference data and event data Source of data for the data warehouse. Contains detailed data Continually changes due to updates Stores data up to the last transaction. Operational & ExternalDataLayer

    Copyright@ Pawan Likhi

  • Basic Data Warehouse Architecture

    Meta Data Management

    Administration

    Mining

    Operational & External data

    Data Warhouse Training - Series 1

  • Data Staging layer Copyright@ Pawan LikhiExtracts data from operational and external databases.

    Transforms the data and loads into the data warehouse.

    This includes decoding production data and merging of records from multiple DBMS formats.

    Data Staginglayer

    Copyright@ Pawan Likhi

  • Basic Data Warehouse Architecture

    Meta Data Management

    Administration

    Mining

    Operational & External data

    Data Warhouse Training - Series 1

  • Data Warehouse layer Copyright@ Pawan LikhiStores data used for informational analysisPresent summarized data to the end-user for analysisThe nature of the operational data, the end-user requirements and the business objectives of the enterprise determine the structure Data ware houseLayer

    Copyright@ Pawan Likhi

  • Basic Data Warehouse Architecture

    Meta Data Management

    Administration

    Mining

    Operational & External data

    Data Warhouse Training - Series 1

  • Meta Data layer Copyright@ Pawan LikhiMetadata is data about data. Stored in a repository. Contains all corporate Metadata resources: database catalogs, data dictionaries Meta Data Layer

    Copyright@ Pawan Likhi

  • Basic Data Warehouse Architecture

    Meta Data Management

    Administration

    Mining

    Operational & External data

    Data Warhouse Training - Series 1

  • Process Management layer Copyright@ Pawan Likhi Process Management LayerScheduler or the high-level job control

    To build and maintain the data warehouse and data directory information

    To keep theData warehouse up-to-date.

    Copyright@ Pawan Likhi

  • Basic Data Warehouse Architecture

    Meta Data Management

    Administration

    Mining

    Operational & External data

    Data Warhouse Training - Series 1

  • Information Access layer Copyright@ Pawan Likhi Information Access LayerInterfaced with the data warehouse through an OLAP server.Performs analytical operations and presents data for analysis.End-users generates ad-hoc reports and perform multidimensional analysis using OLAP tools

    Copyright@ Pawan Likhi

  • Basic Data Warehouse Architecture

    Meta Data Management

    Administration

    Mining

    Operational & External data

    Data Warhouse Training - Series 1

  • Dimensional Modeling Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Facts or Measures are the Key Performance Indicators of an enterpriseFactual data about the subject areaNumeric, summarizedCopyright@ Pawan LikhiNet ProfitSales RevenueGross MarginProfitabilityCostFacts and Measures

    Copyright@ Pawan Likhi

  • DimensionDimensions put measures in perspectiveWhat, when and where qualifiers to the measuresDimensions could be products, customers, time, geography etc.Copyright@ Pawan LikhiSales Revenue(Measure) What was sold ? Whom was it sold to ? When was it sold ? Where was it sold ?

    Copyright@ Pawan Likhi

  • Some Examples of Data warehousing DimensionsThe following Dimensions are common in all Data warehouses invarious forms

    Product Dimension Customer DimensionGeographic Dimension Time dimension

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • ModelingWarehouses differ from operational structures: Analytical requirementsSubject orientationData must map to subject oriented information:Identify business subjectsDefine relationships between subjectsName the attributes of each subjectModeling is iterative

    Copyright@ Pawan Likhi

  • Defining the business model Creating the dimensional model Modeling summaries4.Creating the physical model12, 34Modeling the Data Warehouse

    Copyright@ Pawan Likhi

  • Identifying Business RulesProductType Monitor Status PC15 inchNewServer17 inchRebuilt19 inchCustomNoneLocationGeographic proximity0 - 1 miles1 - 5 miles > 5 milesStoreStore > District > RegionTimeMonth > Quarter > Year

    Copyright@ Pawan Likhi

  • Creating the Dimensional Model

    Identify fact tablesTranslate business measures into fact tablesAnalyze source system information for additional measuresIdentify base and derived measuresDocument additivity of measuresIdentify dimension tablesLink fact tables to the dimension tablesCreate views for users

    Copyright@ Pawan Likhi

  • Dimension TablesDimension tables have the following characteristics: Contain textual information that represents the attributes of the businessContain relatively static dataAre joined to a fact table through a foreign key reference

    Copyright@ Pawan Likhi

  • Fact TablesFact tables have the following characteristics:Contain numeric measures (metrics) of the businessMay contain summarized (aggregated) dataMay contain date-stamped dataHave key value that is typically a concatenated key composed of the primary keys of the dimensionsJoined to dimension tables through foreign keys that reference primary keys in the dimension tables

    Copyright@ Pawan Likhi

  • Dimensional Model (Star Schema)Dimension tablesFact table

    Copyright@ Pawan Likhi

  • Star Schema ModelCentral fact tableRadiating dimensionsDenormalized modelStore TableStore_idDistrict_id...Item TableItem_idItem_desc...

    Time TableDay_idMonth_idPeriod_idYear_idProduct TableProduct_idProduct_desc Sales Fact TableProduct_idStore_idItem_idDay_idSales_dollarsSales_units...

    Copyright@ Pawan Likhi

  • Star Schema ModelEasy for users to understandFast response to queriesSimple metadataSupported by many front end toolsLess robust to changeSlower to build

    Copyright@ Pawan Likhi

  • Snowflake Schema ModelTime TableWeek_idPeriod_idYear_idDept TableDept_idDept_descMgr_idMgr TableDept_idMgr_idMgr_nameProduct TableProduct_idProduct_desc

    Item TableItem_idItem_descDept_idSales Fact TableItem_idStore_idSales_dollarsSales_unitsStore TableStore_idStore_descDistrict_idDistrict TableDistrict_idDistrict_desc

    Copyright@ Pawan Likhi

  • Snowflake Schema ModelDirect use by some toolsMore flexible to changeProvides for speedier data loadingMay become large and unmanageableDegrades query performanceMore complex metadata

    Copyright@ Pawan Likhi

  • Snowflake Schema ModelDirect use by some toolsMore flexible to changeProvides for speedier data loadingMay become large and unmanageableDegrades query performanceMore complex metadata

    Copyright@ Pawan Likhi

  • Four steps in dimensional modelingIdentify the process being modeled.Determine the grain at which facts will be stored.Choose the dimensions.Identify the numeric measures for the facts.

    Copyright@ Pawan Likhi

  • Retail Sales QuestionsWhat is the lift due to a promotion?Lift = gain in sales in a product because its being promotedRequires estimated baseline sales valueCould be calculated based on historical sales figuresDetect time shifting Customers stock up on the product thats on saleThen they dont buy more of it for a long timeDetect cannibalizationCustomers buy the promoted product instead of competing productsPromoting Brand A reduces sales of Brand BDetect cross-sell of complementary productsPromoting charcoal increases sales of lighter fluidPromoting hamburger meat increases sales of hamburger bunsWhat is the profitability of a promotion?Considering promotional costs, discounts, lift, time shifting, cannibalization, and cross-sell

    Copyright@ Pawan Likhi

  • Grain of a Fact TableGrain of a fact table = the meaning of one fact table rowDetermines the maximum level of detail of the warehouseExample grain statements: (one fact row represents a)Line item from a cash register receiptBoarding pass to get on a flightDaily snapshot of inventory level for a product in a warehouseSensor reading per minute for a sensorStudent enrolled in a courseFiner-grained fact tables:are more expressivehave more rowsTrade-off between performance and expressivenessRule of thumb: Err in favor of expressivenessPre-computed aggregates can solve performance problems

    Copyright@ Pawan Likhi

  • Surrogate KeysPrimary keys of dimension tables should be surrogate keys, not natural keysNatural key: A key that is meaningful to usersSurrogate key: A meaningless integer key that is assigned by the data warehouseKeys or codes generated by operational systems = natural keys (avoid using these as keys!)E.g. Account number, UPC code, Social Security Number

    Copyright@ Pawan Likhi

  • Benefits of Surrogate KeysData warehouse insulated from changes to operational systemsEasy to integrate data from multiple systems What if theres a merger or acquisition?Narrow dimension keys Thinner fact table Better performanceThis can actually make a big performance difference.Better handling of exceptional cases For example, what if the value is unknown or TBD?Using NULL is a poor optionThree-valued logic is not intuitive to usersThey will get their queries wrongJoin performance will sufferBetter: Explicit dimension rows for Unknown, TBD, N/A, etc.Avoids tempting query writers to assume implicit semanticsExample: WHERE date_key < '01/01/2004'Will facts with unknown date be included?

    Copyright@ Pawan Likhi

  • More Dimension TablesProductMerchandise hierarchySKU Brand Category DepartmentOther attributesProduct name, Size, Weight, Package Type, etc.StoreGeography hierarchyStore ZIP Code County StateAdministrative hierarchyStore District RegionOther attributesAddress, Store name, Store Manager, Square Footage, etc.HierarchiesCommon in dimension tablesMultiple hierarchies can appear in the same dimensionDont need to be strict hierarchies e.g. ZIP code that spans 2 counties

    Copyright@ Pawan Likhi

  • Factless Fact TablesFactless fact tableA fact table without numeric fact columnsUsed to capture relationships between dimensionsExamples:Student/department mapping fact tableWhat is the major field of study for each student?Even for students who didnt enroll in any courses

    Copyright@ Pawan Likhi

  • SCDThe usual changes to dimension tables are classified into three typesType 1Type 2Type 3

    We will consider the points discussed earlier when deciding which type to use

    Copyright@ Pawan Likhi

  • *Type 1 ChangesUsually relate to corrections of errors in the source systemFor example, the customer dimension: Mickey Schreiber -> Miky Schreiber

    What will happen when number of children is changed?

    Copyright@ Pawan Likhi

  • *Type 1 Changes, cont.General Principles for Type 1 changes:Usually, the changes relate to correction of errors in the source systemSometimes the change in the source system has no significanceThe old value in the source system needs to be discardedThe change in the source system need not be preserved in the DWHWhat will happen when only the last value before the change is needed?

    Copyright@ Pawan Likhi

  • *Type 2 ChangesLets look at the martial status of Miky SchreiberOne the DWHs requirements is to track orders by martial status (in addition to other attributes)All changes before 11/10/2004 will be under Martial Status = Single, and all changes after that date will be under Martial Status = MarriedWe need to aggregate the orders before and after the marriage separately

    Lets make life harder:Miky is living in Negba st., but on 30/8/2009 he moves to Avivim st.

    Copyright@ Pawan Likhi

  • *Type 2 Changes, cont.General Principles for Type 2 changes:They usually relate to true changes in source systemsThere is a need to preserve history in the DWHThis type of change partitions the history in the DWHEvery change for the same attributes must be preserved Must we track changes for all the attributes? For which attributes will we track changes? What are the considerations?

    Copyright@ Pawan Likhi

  • *Type 3 ChangesNot common at allComplex queries on type 2 changes may beHard to implementTime-consumingHard to maintain

    We want to track history without lifting heavy burdenThere are many soft changes and we dont care for the far history

    Copyright@ Pawan Likhi

  • *Type 3 ChangesGeneral Principles:They usually relate to soft or tentative changes in the source systemsThere is a need to keep track of history with old and new values of the changes attributeThey are used to compare performances across the transitionThey provide the ability to track forward and backward

    Copyright@ Pawan Likhi

  • ETL ConceptsExtraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database.

    The first step in ETL process is mapping the data between source systems and target database (data warehouse or data mart). The second step is cleansing of source data in staging area. The third step is transforming cleansed source data and then loading into the target system.

    ETL (Extraction, Transformation and Loading) is a process by which data is integrated and transformed from the operational systems into the data warehouse environment

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • ETL GlossarySource System A database, application, file, or other storage facility from which the data in a data warehouse is derived.

    Mapping The definition of the relationship and data flow between source and target objects.

    Metadata Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.

    Staging Area A place where data is processed before entering the warehouse. Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • ETL GlossaryCleansing The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

    Transformation The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

    Transportation The process of moving copied or transformed data from a source to a data warehouse.

    Target System A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse. Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Sample ETL Process FlowCopyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Detailed ETL Process FlowCopyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • ExtractionCopyright@ Pawan LikhiOracleSybaseText filesTarget80 tables50 tables

    Copyright@ Pawan Likhi

  • TransformationCopyright@ Pawan LikhiName = Concat(First Name, Last Name)Indiana JonesSherlock Homes

    Staging AreaSource

    Emp idLastNameFirstName10001JonesIndiana10002HolmesSherlock

    Copyright@ Pawan Likhi

  • LoadingCopyright@ Pawan LikhiStaging AreaSourceData WarehouseDirect LoadCleaning,Transformation& Integration of Raw dataClean,Transformed & integrated data load

    Copyright@ Pawan Likhi

  • Popular ETL Tools Copyright@ Pawan Likhi

    Tool NameCompany NameInformaticaInformatica CorporationDT/StudioEmbarcadero Technologies Data StageIBMAb InitioAb Initio Software Corporation Data Junction Pervasive Software Oracle Warehouse Builder Oracle Corporation Microsoft SQL Server Integration Microsoft TransformOnDemand Solonde Transformation Manager ETL Solutions

    Copyright@ Pawan Likhi

  • Informatica CorporationA market leading provider of e-business infrastructure and analytic software which enables customers to automate the integration, analysis and real time delivery of critical corporate information via web,wireless and voiceInformatica applications include eCRM applicationeBusiness Operations applicationeProcurement More than 1,370 customers, including 60 percent of the Fortune 100 companies are using Informaticas analytic solutionsMore than 900 companies are using Informatica products

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Informatica HeadquartersFounded in 1993HQ : Redwood City, CA

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Informatica Powercenter - GlossaryRepository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the Repository Server would access this repository to retrieve, store and manage metadata.

    Power Center Client: Informatica client is used for managing users, identifiying source and target systems definitions, creating mapping and mapplets, creating sessions and run workflows etc.

    Repository Server: This repository server takes care of all the connections between the repository and the Power Center Client.

    Power Center Server: Power Center server does the extraction from source and then loading data into targets. Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Informatica Powercenter - GlossaryDesigner: Source Analyzer, Mapping Designer and Warehouse Designer are tools reside within the Designer wizard. Source Analyzer is used for extracting metadata from source systems. Mapping Designer is used to create mapping between sources and targets. Mapping is a pictorial representation about the flow of data from source to target. Warehouse Designer is used for extracting metadata from target systems or metadata can be created in the Designer itself.

    Data Cleansing: The Power Center's data cleansing technology improves data quality by validating, correctly naming and standardization of address data. A person's address may not be same in all source systems because of typos and postal code, city name may not match with address. These errors can be corrected by using data cleansing process and standardized data can be loaded in target systems (data warehouse). Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Informatica Powercenter - GlossaryTransformation: Transformations help to transform the source data according to the requirements of target system. Sorting, Filtering, Aggregation, Joining are some of the examples of transformation. Transformations ensure the quality of the data being loaded into target and this is done during the mapping process from source to target.

    Workflow Manager: Workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, then the target system will pop up an error message since the fact table is violating the foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.

    Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server. Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Informatica ArchitectureCopyright@ Pawan Likhi

    Copyright@ Pawan Likhi

  • Informatica ComponentsCopyright@ Pawan LikhiServer ComponentsInformatica ServerRepository Server

    Client ComponentsRepository Server Administration ConsoleRepository ManagerDesignerWorkflow ManagerWorkflow Monitor

    Copyright@ Pawan Likhi

  • Thank You!

    Copyright@ Pawan Likhi

    Copyright@ Pawan Likhi

    Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsOperational systems are just what their name implies; they are the systems that help us run the day-to-day enterprise operations. These are the backbone systems of any enterprise, such as order entry inventory, manufacturing, payroll and accounting systems.

    Because of their importance to the organization, operational systems were almost always the first parts of the enterprise to be computerized. Over the years, these operational systems have been extended and rewritten, enhanced and maintained to the point that they are completely integrated into the organization.

    Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsNarrow focus : It is typically concerned with the operations of a singledepartment so as to optimize its performance

    Process-oriented - focussed on transaction processing. it has limited flexibility and querying possibilities.

    Continuous availability: round-the-clock access with very little downtime

    Predictability: major transaction response times do not vary significantly with time of day, with season of the year, or during any heavy, peak processing period

    Transaction integrity: the transaction processed by the system is immediately reflected in reality; when you make airline reservations and request a certain seat, you will get that seat.

    Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsWhereas operational systems are organized in terms of the application areas or processes such as billing, payroll, sales and distribution, order processing etc., decision makers and business analysts look at the enterprise information in terms of major subjects of the enterprise.

    A consequence of this is that, a data warehouse is organized in a subject oriented manner rather than an application or process oriented manner. Sample subject areas could be customer, products, sales, distribution, credit etc.

    Subject orientation is the fundamental difference between a data warehouse and an operational system.

    Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData for the data warehouse originates from several heterogeneous, operational systems. There can be several differences across the source systems such as differences in coding schemes, structures, unit of measurement, naming conventions, date formats and physical characteristics.As an example, one application may encode the field GENDER as an "M" and an "F." Another application may represent GENDER as a "1" and a 0. Similarly, one application may measure length in inches while the other in centimeters. What matters is that whatever source the data comes from, it must arrive in the data warehouse in a consistent integrated state. Irrespective of how the source systems store the data, the issue is that, the data needs to be stored in the data warehouse in a singular, globally acceptable fashion. When the end user looks at the data warehouse, the focus of the user should be on using the data, rather than on wondering about the credibility or consistency of the data that is in the warehouse.

    Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsTime variance means that all data in the data warehouse is accurate as of some moment in time. In the operational environment data is accurate as of the moment of access. Another way of looking at time variance is that, in operational systems, data is kept for a smaller time horizon such as 90 days, 180 days etc. A data warehouse in the contrary stores data spanning several years such as 5 years, 10 years etc.The data warehouse schema represents the time variant nature by having a time element in the key structures. This usually is not a practice in most applications as it is not necessary.

    The time variant nature of a data warehouse stems from the requirement that the data warehouse should provide historical snapshots of data about an organization which is accurate as of some moment in time.Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsA database with large number of inserts, updates and deletes is said to be volatile. OLTP databases are highly volatile with thousands of insert, update and delete requests operating on a record by record basis.

    A data warehouse is characterized by only two types of writes, such as initial load and incremental load. An update, in its real sense, as part of its normal operations, does not happen in a data warehouse.

    In other words, OLTP databases are highly volatile and DW data is non-volatile

    Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling *Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling * Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData marts can reduce network bottlenecks.

    Data marts allow you to utilize lower-cost hardware for processinginteractive information requests.

    Since the data warehouse feeds the data marts there is consistent source for all data in the data marts.

    Data marts can increase productivity by providing specific groups and departments with their own subset of the data warehouse to process informational requests when they wish. Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsA data mart does not provide one of the key benefits of data warehousing an integrated view of business information spanning multiple business areas.

    Uncontrolled proliferation of data marts results in redundant and inconsistent data, and will create significant migration problems and costs if an EDW is required in the future.

    As data marts proliferate, users will want to access data marts belonging to other departments for cross-business function analysis. Such a distributed data mart environment is complex to administer and manage, and will perform poorly.

    Data mart technology at present is often unable to scale up to support the increased data volumes and numbers of users that will exist in a distributed data mart environment.

    Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsA fact is a performance indicator for a given subject area.

    Facts are the numerical measurements of the business. They are continuously valued and additive.

    Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsDimensions are the basis on which a fact is analyzed.

    For example, the facts would be analyzed using the dimensions of Time (to provide time-based analysis), Geography (To provide performance analysis across regions),Product (To provide analysis across products).

    Dimensions will store the textual descriptions of the business. Examples are product definition attributes like color, size etc.

    Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling *Data Warehousing Fundamentals Dimensional Modeling *Data Warehouse Database Design PhasesIn the past several years, a number of methods for designing a data warehouse have been published. Although various sources may use alternate syntax or define certain terms differently, all generally include the same tasks required to produce a sound data warehouse database design.This course attempts to focus on the major tasks associated with the data warehouse database design process. These tasks have been group into five phases that include:Defining the Business ModelCreating Entity Relationship Diagrams (ERD)Creating the Base Dimensional (Star) ModelModeling SummariesCreating the Physical ModelInstructor NoteThese bulleted phases map directly to Lessons 3 - 7. This lesson focuses on the first phase: Defining the Business Model.Data Warehousing Fundamentals Dimensional Modeling *Data Warehousing Fundamentals Dimensional Modeling *Data Warehousing Fundamentals Dimensional Modeling *Dimensions are the textual descriptions of the business. Dimension tables are typically smaller than fact tables and the data changes much less frequently. Dimension tables give perspective regarding the whys and hows of the business and element transactions.While dimensions generally contain relatively static data, customer dimensions are updated more frequently.Dimensions Are Essential for AnalysisThe key to a powerful dimensional model lies in the richness of the dimension attributes because they determine how facts can be analyzed. Dimensions can be considered as the entry point into fact space.Always name attributes in the users vocabulary. That way, the dimension will document itself, and its expressive power will be apparent.Unlike other database structures, in a star schema, the dimensions are denormalized.

    Data Warehousing Fundamentals Dimensional Modeling *Facts are the numerical measures of the business. The fact table is the largest table in the star schema and is composed of large volumes of data.Although a star schema typically contains one fact table, other DSS schemas can contain multiple fact tables. Raw facts such as dollar sales can be combined or calculated with other facts to create measures. Measures can be stored in the fact table or created when necessary for reporting purposes. Data Warehousing Fundamentals Dimensional Modeling *A schema is a collection of database objects, such as tables, views, indexes, and synonyms. The star schema has a single fact table and one or more lookup or dimension tables. The star schema is the simplest form of a dimensional model.There are a variety of models designed for decision-support systems (DSS). In this course, you will use the star as a vehicle for understanding database design and implementing the Global Computing Company data mart.Before you can load the data into the data mart, you must design the data mart database. This design is based on the type of information that users need to answer their questions and make business decisions. Then you create the dimension and fact tables: The time dimension table should include a datetime type column, an integer column for a Julian day value, and columns for all the output columns that need to be generated for this table. The other dimension tables should include an integer column for the primary key and columns for all the output columns that need to be generated for this table.The fact table requires a data column for each fact it stores and data columns for the generated foreign keys that reference dimension table keys.All the fact columns for this fact table need to be created. However, primary and foreign keys do not have to be created because they will be generated as part of the transformation process.

    Data Warehousing Fundamentals Dimensional Modeling *Data Warehousing Fundamentals Dimensional Modeling *Data Warehousing Fundamentals Dimensional Modeling *Data Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals Dimensional Modeling Dimensional Modeling 5 -*Data Warehousing FundamentalsData Warehousing Fundamentals