Henry, Bill

6

Click here to load reader

Transcript of Henry, Bill

Page 1: Henry, Bill

Overview of Data Warehousing with Materialized ViewsTypically, data flows from one or more online transaction processing (OLTP) databases into a data warehouse on a monthly, weekly, or daily basis. The data is usually processed in a staging file before being added to the data warehouse. Data warehouses typically range in size from tens of gigabytes to a few terabytes, usually with the vast majority of the data stored in a few very large fact tables. One technique employed in data warehouses to improve performance is the creation of summaries, or aggregates. They are a special kind of aggregate view that improves query execution times by precalculating expensive joins and aggregation operations prior to execution and storing the results in a table in the database. For example, a table could be created to contain the sum of sales by region and by product. Prior to Oracle8i, organizations using summaries spent a significant amount of time manually creating summaries, identifying which ones to create, indexing the summaries, updating them, and advising their users on which ones to use. The introduction of summary management in the Oracle server eases the workload of the DBA and means the end user no longer has to be aware of which summaries have been defined. The DBA creates one or more materialized views, which are the equivalent of a summary. The end user queries the tables and views in the database and the query rewrite mechanism in the Oracle server automatically rewrites the SQL query to use the summary tables. This mechanism significantly improves the response time for returning results from the query and eliminates the need for the end user or database application to be aware of the materialized views that exist within the data warehouse. The summaries or aggregates that are referred to in this book and in literature on data warehousing are created in Oracle using a schema object called a materialized view. Materialized views can perform a number of roles, such as improving query performance or providing replicated data, as described below. Although materialized views are usually accessed via the query rewrite mechanism, an end-user or database application can construct queries that directly access the summaries. However, serious consideration should be given to whether users should be allowed to do this, because, once the summaries are directly referenced in queries, the DBA will not be free to drop and create summaries without affecting applications.

Oracle8i Data Warehousing Guide

It is highly recommended that you read Oracle’s Data Warehouse guide for Materialized Views.It is an excellent overview of all the capabilities and requirements.

Page 2: Henry, Bill

Sales Fact Table

Units_SoldPrice_Per_UnitDollars_Sold

Time PeriodsDimensionTable

Month_descQuarter_descYear_desc

ProductsDimensionTable

Prod_descProd_categoryProd_sizeProd_colorProd_list_Price

PromotionsDimensionTable

Promo_desc

GeographiesDimensionTable

Country_descRegion_descCity_desc

CustomersDimensionTable

Cust_nameCust_cityCust_genderCust_ageCust_income

Page 3: Henry, Bill

Select a.Year_desc, b.Category_desc, c.Promo_desc, d.Region_desc, e.Cust_city, sum(f.Dollars_SoldFrom Dim_time a, Dim_prod b, Dim_promo c, Dim_geo d, Dim_cust eWhere a.time_key = f.time_keyand b.prod_key = f.prod_keyand c.promo_key = f.promo_keyand d.geo_key = f.geo_keyand e.cust_key = f.cust_keyand a.Year_desc in (‘2002’,’2001’,’2000’)and b.Category_desc = “Shirts”and d.Region_desc = “USA East”and e.Cust_city in (“NY”,”BOSTON”,”MIAMI”)Group By a.Year_desc, b.Category_desc, c.Promo_desc, d.Region_desc, e.Cust_city

Expensive sort

Expensive join

Resubmitted over and overwith different filters

Page 4: Henry, Bill

CREATE MATERIALIZED VIEW Star_Sales_mv PCTFREE 0 TABLESPACE mviews STORAGE (initial 1M next 10M pctincrease 0)BUILD DEFERREDREFRESH COMPLETE ON DEMAND ENABLE QUERY REWRITE AS Select a.Year_desc, b.Category_desc, c.Promo_desc, d.Region_desc, e.Cust_city, sum(f.Dollars_SoldFrom Dim_time a, Dim_prod b, Dim_promo c, Dim_geo d, Dim_cust eWhere a.time_key = f.time_keyand b.prod_key = f.prod_keyand c.promo_key = f.promo_keyand d.geo_key = f.geo_keyand e.cust_key = f.cust_keyGroup By a.Year_desc, b.Category_desc, c.Promo_desc, d.Region_desc, e.Cust_city

Select a.Year_desc, b.Category_desc, c.Promo_desc, d.Region_desc, e.Cust_city, sum(f.Dollars_SoldFrom Dim_time a, Dim_prod b, Dim_promo c, Dim_geo d, Dim_cust eWhere a.time_key = f.time_keyand b.prod_key = f.prod_keyand c.promo_key = f.promo_keyand d.geo_key = f.geo_keyand e.cust_key = f.cust_keyand a.Year_desc in (‘2002’,’2001’,’2000’)and b.Category_desc = “Shirts”and d.Region_desc = “USA East”and e.Cust_city in (“NY”,”BOSTON”,”MIAMI”)Group By a.Year_desc, b.Category_desc, c.Promo_desc, d.Region_desc, e.Cust_city

Page 5: Henry, Bill

Select a.Year_desc, b.Category_desc, c.Promo_desc, d.Region_desc, e.Cust_city, sum(f.Dollars_SoldFrom Dim_time a, Dim_prod b, Dim_promo c, Dim_geo d, Dim_cust eWhere a.time_key = f.time_keyand b.prod_key = f.prod_keyand c.promo_key = f.promo_keyand d.geo_key = f.geo_keyand e.cust_key = f.cust_keyand b.Category_desc in (“Shirts”, “Pants”)and d.Country_desc = “USA”and e.Cust_city in (“NY”,”LA”,”CHICAGO”,”MIAMI”,”DALLAS”)Group By a.Year_desc, b.Category_desc, c.Promo_desc, d.Region_desc, e.Cust_city

New filter conditions that query rewritewill intercept and use materialized view.

Page 6: Henry, Bill

Security IssuesTo create a materialized view, the privilege CREATE MATERIALIZED VIEW is required, and to create a materialized view that references a table in another schema, you must have SELECT privileges on that table. Moreover, if you enable query rewrite, you must have the QUERY REWRITE or GLOBAL QUERY REWRITE privilege to reference tables in your own schema. To enable query rewrite on a materialized view that references tables outside your schema, you must have the GLOBAL QUERY REWRITE privilege.

Overview of Materialized View Management TasksThe motivation for using materialized views is to improve performance, but the overhead associated with materialized view management can become a significant system management problem. Materialized view management activities include:

•Identifying what materialized views to create initially •Indexing the materialized views •Ensuring that all materialized views and materialized view indexes are refreshed properly each time the database is updated •Checking which materialized views have been used •Determining how effective each materialized view has been on workload performance  •Measuring the space being used by materialized views •Determining which new materialized views should be created •Determining which existing materialized views should be dropped •Archiving old detail and materialized view data that is no longer useful

First Time ApproachThe first time approach should be a small step with a reoccurring long running join.:

•Identify reoccurring long running joins with filter conditions, probably generated by a third party product • Identify base joins and selectable attributes they have in common•Build a common Materialized View based on common join and attribute structure •Build the Materialized View, run Analyze on it, and build a bitmap index.•Determine how effective the materialized view has been on workload performance