Data Warehousing
Uploaded by: rahul-dhande
UNIT-I
1. What is the need for a data warehouse?
Answer:
In a traditional database system:
(i) Non-intuitive and complex database structure
(ii) No insight from other sources
(iii) Historical changes are lost
Benefits of using a data warehouse:
(i) A data warehouse delivers enhanced business intelligence
(ii) A data warehouse saves time
(iii) A data warehouse enhances data quality and consistency
(iv) A data warehouse provides historical intelligence
(v) Knowledge discovery and decision support
2. List and explain the characteristics of a data warehouse.
Answer:
a. Subject-Oriented Data
b. Integrated Data
c. Time-Referenced Data
d. Non-Volatile Data
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
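The "Integrated" characteristic above can be illustrated with a minimal sketch. The two source systems, their field names (cust_nm, wt_lb, customer_name, weight_kg) and the sample values are hypothetical and chosen only to show a naming conflict and a unit-of-measure conflict being resolved into one format.

```python
# Hypothetical records from two source systems that name fields
# differently and use different units of measure.
LB_TO_KG = 0.45359237  # one international pound in kilograms

orders_system_a = [{"cust_nm": "Acme", "wt_lb": 10.0}]
orders_system_b = [{"customer_name": "Bolt", "weight_kg": 2.5}]


def from_system_a(rec):
    """Rename system A's fields and convert pounds to kilograms."""
    return {"customer_name": rec["cust_nm"],
            "weight_kg": round(rec["wt_lb"] * LB_TO_KG, 3)}


def from_system_b(rec):
    """System B already uses the warehouse names and units; copy it."""
    return {"customer_name": rec["customer_name"],
            "weight_kg": rec["weight_kg"]}


def integrate(a_records, b_records):
    """Return all records in one consistent warehouse format."""
    return ([from_system_a(r) for r in a_records] +
            [from_system_b(r) for r in b_records])


integrated = integrate(orders_system_a, orders_system_b)
```

After this step every record carries the same field names and the same unit, which is what makes analysis across sources possible.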
3. Differentiate Operational Systems and Informational Systems.
Operational Systems: Systems that help the everyday operations of the enterprise.
Informational Systems: Knowledge-based systems. Informational systems deal with analyzing data and making decisions, such as how the enterprise will operate now and in the future.
4. Draw and explain data warehouse architecture.
The architecture is made up of a number of interconnected parts:
(i) Source system
(ii) Source data transport layer
(iii) Data quality control and data profiling layer
(iv) Metadata management layer
(v) Data integration layer
(vi) Data processing layer
(vii) End user reporting layer
5. Short note on: Source System in data warehouse architecture.
6. Explain the role of the Source Data Transport Layer.
7. What is the role of the Data Quality Control and Data Profiling Layer?
8. What is metadata? Explain the Metadata Management Layer.
9. What is integrated data? Explain the importance of the Data Integration Layer.
Various tasks must be accomplished to integrate data acquired from various source systems.
A lot of formatting and cleansing activities are carried out here so that the data is consistent across the enterprise.
High-level job control is needed for the various processes (procedures) that must run to keep the data warehouse up to date.
10. Explain the Data Processing Layer.
The warehouse (core) is where the dimensionally modeled data resides.
Data must be stored in a form that is easy to access and highly flexible.
This layer consists of the staging area and the enterprise warehouse.
Data staging often involves complex programming, data quality analysis programs, and filters that identify patterns and structures within existing operational data.
Staging area:
A staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We basically need a staging area to hold the data and to perform data cleansing and merging before loading the data into the warehouse.
In the absence of a staging area, the data load will have to go from the OLTP system to the OLAP system directly, which in fact will severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area. In addition, it also offers a platform for carrying out data cleansing.
The staging area is a temporary schema used to:
1. Do flat mapping, i.e. dump all the OLTP data into it without applying any business rules. Pushing data into staging takes less time because no business rules or transformations are applied to it.
2. Perform data cleansing and validation using First Logic.
A staging area is like a large table with data separated from their sources to be loaded into a data warehouse in the required format. If we attempt to load data directly from OLTP, it might mess up the OLTP because of format changes between a warehouse and OLTP. Keeping the OLTP data intact is very important for both the OLTP and the warehouse.
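The staging flow described above (flat load first, cleansing and validation next, warehouse load last) can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module with an in-memory database; the table names stg_sales and wh_sales, the sample rows and the cleansing rules are assumptions, not part of any particular warehouse product.

```python
import sqlite3

# In-memory database standing in for the warehouse server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales (customer TEXT, amount TEXT);
    CREATE TABLE wh_sales  (customer TEXT NOT NULL, amount REAL NOT NULL);
""")

# Step 1: flat mapping. Dump the extracted OLTP rows as-is, no rules applied.
extracted_rows = [(" acme ", "100.50"), ("Bolt", "n/a"), (None, "20"), ("Bolt", "30")]
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)", extracted_rows)

# Step 2: cleanse and validate the staged rows, then load the warehouse table.
staged = conn.execute("SELECT customer, amount FROM stg_sales").fetchall()
for customer, amount in staged:
    if customer is None:
        continue  # reject rows without a customer
    try:
        value = float(amount)
    except ValueError:
        continue  # reject rows whose amount is not numeric
    conn.execute("INSERT INTO wh_sales VALUES (?, ?)",
                 (customer.strip().title(), value))

# Step 3: staging tables are temporary, so clear them after the load.
conn.execute("DELETE FROM stg_sales")

loaded = conn.execute(
    "SELECT customer, amount FROM wh_sales ORDER BY amount").fetchall()
```

Because the cleansing runs against the staging copy, the OLTP source is only touched once, during the extract.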
11. What important role does the data warehouse play? Explain the End User Reporting Layer.
12. What are the factors to be considered while designing a data warehouse?
13. Explain the scope of a data warehouse.
14. What are the various levels of data redundancy in a data warehouse?
15. List and explain the various types of end users of a data warehouse.
(i) Executives and managers
(ii) Power users (business and functional analysts, engineers)
(iii) Support users (clerks, administrators)
16. Explain the goals of a data warehouse.
(i) Provide easy access to corporate data.
(ii) Provide clean and reliable data for analysis.
17. Differentiate between OLTP and data warehouse databases.
18. Why is dimensional modeling essential?
19. ……………………………………….
20. Explain star schema with a suitable example.
21. What is the relationship between fact table and dimension table? Explain with a suitable example.
22. Differentiate between star schema and snowflake schema.
| Aspect | Star Schema | Snowflake Schema |
|---|---|---|
| Ease of maintenance / change | Has redundant data and hence is less easy to maintain/change | No redundancy and hence is easier to maintain and change |
| Ease of use | Less complex queries that are easy to understand | More complex queries that are less easy to understand |
| Query performance | Fewer foreign keys and hence shorter query execution time | More foreign keys and hence longer query execution time |
| Type of data warehouse | Good for data marts with simple relationships (1:1 or 1:many) | Good to use for the data warehouse core to simplify complex relationships (many:many) |
| Joins | Fewer joins | Higher number of joins |
| Dimension table | Contains only a single dimension table for each dimension | May have more than one dimension table for each dimension |
| When to use | When the dimension table contains a small number of rows, we can go for star schema. | When the dimension table is relatively big in size, snowflaking is better as it reduces space. |
| Normalization / De-normalization | Both dimension and fact tables are in de-normalized form | Dimension tables are in normalized form but the fact table is still in de-normalized form |
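The difference in joins and redundancy listed in the comparison can be seen in a small sketch. It uses Python's built-in sqlite3 module; all table names, column names and sample rows are hypothetical. The star version repeats the category name in the product dimension, while the snowflake version moves it into its own table and therefore needs one more join for the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star: one de-normalized product dimension (category name repeated).
    CREATE TABLE star_dim_product (
        product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Snowflake: category is normalized out into its own table.
    CREATE TABLE snow_dim_category (
        category_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE snow_dim_product (
        product_key INTEGER PRIMARY KEY, name TEXT,
        category_key INTEGER REFERENCES snow_dim_category(category_key));

    CREATE TABLE fact_sales (product_key INTEGER, quantity INTEGER);

    INSERT INTO star_dim_product  VALUES (1, 'Widget', 'Tools'), (2, 'Gadget', 'Tools');
    INSERT INTO snow_dim_category VALUES (100, 'Tools');
    INSERT INTO snow_dim_product  VALUES (1, 'Widget', 100), (2, 'Gadget', 100);
    INSERT INTO fact_sales        VALUES (1, 5), (2, 7);
""")

# Star schema: a single join reaches the category.
star_result = conn.execute("""
    SELECT d.category, SUM(f.quantity)
    FROM fact_sales f
    JOIN star_dim_product d ON d.product_key = f.product_key
    GROUP BY d.category
""").fetchall()

# Snowflake schema: two joins are needed for the same question.
snowflake_result = conn.execute("""
    SELECT c.category, SUM(f.quantity)
    FROM fact_sales f
    JOIN snow_dim_product  p ON p.product_key  = f.product_key
    JOIN snow_dim_category c ON c.category_key = p.category_key
    GROUP BY c.category
""").fetchall()
```

Both queries return the same total per category; only the number of joins and the amount of repeated text in the dimension differ.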
23. Describe the foreign key column in fact table and dimension table.
Answer:
Foreign keys of dimension tables are primary keys of entity tables.
Foreign keys of fact tables are primary keys of dimension tables.
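The key relationship between a fact table and its dimension tables can be sketched with Python's built-in sqlite3 module. The schema below (dim_customer, dim_product, fact_sales) and its rows are hypothetical; the query mirrors the "best customer for this item" question used earlier in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);

    -- Each foreign key in the fact table is a primary key of a dimension table.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        quantity     INTEGER
    );

    INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO dim_product  VALUES (10, 'Widget');
    INSERT INTO fact_sales   VALUES (1, 10, 5), (2, 10, 8), (1, 10, 4);
""")

# Who was the best customer for the product 'Widget'?
best = conn.execute("""
    SELECT c.name, SUM(f.quantity) AS total
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_product  p ON p.product_key  = f.product_key
    WHERE p.name = 'Widget'
    GROUP BY c.name
    ORDER BY total DESC
    LIMIT 1
""").fetchone()

# A fact row that points at a missing dimension row is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 10, 1)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

The fact table holds only keys and measures; descriptive attributes such as the customer name are reached through the joins.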
24. Differentiate between star schema and snowflake schema.
25. What is the level of granularity of a fact table?
Answer:
Level of granularity means the level of detail that you put into the fact table in a data warehouse, that is, how much detail you are willing to store for each transactional fact.
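A minimal sketch of what granularity means in practice, using hypothetical sales rows: the same facts are held once at transaction grain and once rolled up to daily grain. The function name roll_up_to_daily and the data are assumptions for illustration.

```python
from collections import defaultdict

# Finest grain: one fact row per sales transaction (hypothetical data).
transaction_grain = [
    {"date": "2024-01-01", "product": "Widget", "quantity": 2},
    {"date": "2024-01-01", "product": "Widget", "quantity": 3},
    {"date": "2024-01-02", "product": "Widget", "quantity": 1},
]


def roll_up_to_daily(rows):
    """Aggregate transaction-grain rows to one row per date and product."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["date"], row["product"])] += row["quantity"]
    return [{"date": d, "product": p, "quantity": q}
            for (d, p), q in sorted(totals.items())]


# Coarser grain: one fact row per product per day.
daily_grain = roll_up_to_daily(transaction_grain)
```

The daily table is smaller, but it can no longer answer questions about individual transactions; a fine grain can always be rolled up, while a coarse grain cannot be drilled back down.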
26. What are additive, semi-additive and non-additive measures?
Answer:
Non-additive Measures
Non-additive measures are those which cannot be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of a non-additive fact is any kind of ratio or percentage, for example a 5% profit margin or a revenue-to-asset ratio. Non-numerical data can also be a non-additive measure when it is stored in fact tables, e.g. some kind of varchar flag in the fact table.
Semi-additive Measures
Semi-additive measures are those where only a subset of aggregation functions can be applied. Take account balance: a sum() function on balance does not give a useful result, but max() or min() balance might be useful. Consider also price rate or currency rate: sum is meaningless on a rate; however, the average function might be useful.
Additive Measures
Additive measures can be used with any aggregation function like Sum(), Avg() etc. An example is sales quantity.
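The three kinds of measures can be compared on a few hypothetical fact rows. The field names and values below are illustrative assumptions; the margin is computed as (revenue - cost) / revenue.

```python
# Hypothetical daily fact rows for a single account (illustrative values).
fact_rows = [
    {"day": 1, "sales_qty": 10, "balance": 100.0, "revenue": 50.0, "cost": 40.0},
    {"day": 2, "sales_qty": 5, "balance": 120.0, "revenue": 30.0, "cost": 15.0},
]

# Additive: sales quantity can simply be summed.
total_qty = sum(r["sales_qty"] for r in fact_rows)

# Semi-additive: summing a balance over days is not a real balance,
# but max(), min() or the closing (latest) balance are meaningful.
max_balance = max(r["balance"] for r in fact_rows)
closing_balance = max(fact_rows, key=lambda r: r["day"])["balance"]

# Non-additive: a margin ratio should not be aggregated directly.
row_margins = [(r["revenue"] - r["cost"]) / r["revenue"] for r in fact_rows]
naive_avg_margin = sum(row_margins) / len(row_margins)

# Recompute the ratio from its additive components instead.
total_revenue = sum(r["revenue"] for r in fact_rows)
total_cost = sum(r["cost"] for r in fact_rows)
overall_margin = (total_revenue - total_cost) / total_revenue
```

Here the naive average of the two row margins differs from the margin recomputed from total revenue and total cost, which is why ratios are usually stored as their additive components.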
27. Short note on additivity of facts.
28. What is the role of a helper table for multi-valued dimensions?
OR
Explain the role of an associative entity.
A many-to-many relationship between entities creates complexity and should be resolved. Many-to-many relationships between entities in a relational database management system are resolved by placing a third entity (a helper table), called an associative entity, between the entities involved.
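A helper (associative) table can be sketched with Python's built-in sqlite3 module. The student/course example, the table names and the rows are hypothetical; the point is only that the many-to-many relationship becomes two one-to-many relationships through the third table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);

    -- Helper (associative) table: one row per student/course pair.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO course  VALUES (10, 'Data Warehousing'), (20, 'Databases');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many courses.
asha_courses = [row[0] for row in conn.execute("""
    SELECT c.title
    FROM enrollment e
    JOIN course  c ON c.course_id  = e.course_id
    JOIN student s ON s.student_id = e.student_id
    WHERE s.name = 'Asha'
    ORDER BY c.title
""")]

# One course, many students.
dw_students = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id
    WHERE c.title = 'Data Warehousing'
    ORDER BY s.name
""")]
```

Neither the student table nor the course table has to store a list of the other; every pairing lives in exactly one row of the helper table.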
==============================================
UNIT-II
1. Differentiate between OLTP database and data warehouse database.
Differences
| Data warehouse database | OLTP database |
|---|---|
| Designed for analysis of business measures by subject area, category and attributes. | Designed for real-time business transactions and processes (real-time business operations). |
| Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. | Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table. |
| Loaded with consistent, valid data; requires no real-time validation. | Optimized for validation of incoming data during transactions; uses validation data tables. |
| Supports few concurrent users relative to OLTP. | Supports thousands of concurrent users. |
2. What is a data warehouse?
3. Write the steps to install Oracle 11g.
4. What is LISTENER? Write the steps to create a new Listener. (Page 18)
5. Write the steps to create a database using ‘Database Configuration Assistant’. (Page 20)
6. Draw and explain the OWB architecture.
7. List and explain OWB components.
(i) Design Center
(ii) Repository Browser
(iii) Control Center Service
(iv) Repository
(v) Target Schema
8. What is a workspace?
9. How to configure the repository and workspace?