Why Atrion
Transcript of Why Atrion
Challenges to designing financial warehouses, lessons learnt
Steve Simon, MVP SQL Server BI
http://www.infogoldusa.com
PASS Data Architecture Virtual Chapter
Steve Simon is a SQL Server MVP and a Senior Business Intelligence Development Engineer with Atrion Networking Corporation, Providence, RI, USA. He has been involved with database design and analysis for over 25 years. Steve has presented numerous papers at PASS summits over the years, including PASS Europe, as well as numerous presentations at SQL Saturday events. He is the chairperson of the Oracle/SQL Server virtual chapter and is a PASS regional mentor.
Steve Simon
Warehouse design will change with time
Two practical examples are used in this presentation.
Cold facts
The FDR (Financial Data Warehouse)
or How things can run amok!!
The Michael Jackson Design Technique
So what is wrong with all of this ?
mf_job: job_name, job_category, job_status, job_owner, monitor_flag, ffc_enabled, last_start_date, last_end_date, avg_start_time, avg_end_time, avg_cpu_time, average_job_cost, total_job_cost, schd_frequency, job_description
refclient: Fund, Client_ID, CLIENT_LONG_NM
mf_oncall_list: job_name, frequency, cics_critical_ind, bus_critical_ind, tso_caller_id_1, tso_caller_id_2, tso_manager_id, dr_critical_ind, job_description
mf_client_job: client_short_name, job_name, gen_comment
tmp_all_successor_jobs: extract_job, successor_job
mf_job_successor: application_id, job_name, mf_sys_name, succ_appl_id, succ_job_name, succ_sys_name
mf_job_dataset_fund *: dataset_name, fund, job_name
Relational Spaghetti
SELECT DISTINCT t.ip_address, f.fund
FROM mf_job_dataset_fund f
INNER JOIN mf_job j ON f.job_name = j.job_name
INNER JOIN mf_oncall_list o ON j.job_name = o.job_name
LEFT JOIN mf_client_job c ON c.job_name = j.job_name
LEFT JOIN refclient RC ON f.fund = RC.Fund
LEFT JOIN tmp_all_successor_jobs s ON s.extract_job = j.job_name
LEFT JOIN mf_transmission t ON s.successor_job = t.job_name AND t.from_dataset NOT LIKE 'KKKK'
LEFT JOIN mf_job j2 ON t.job_name = j2.job_name
WHERE f.fund IN ('AAAA') AND j.job_name = f.job_name
inventory file: client_long_nm, client_id, receiving_party, standard_or_custom, extract_reformatter, [Direction Flow], [SSC / Extract File Description], [SSC Transmission Job], [Transmission job desciption], frequency, extract_time, [From Transmission file], [To:Output file], transmission_type, ip_address, fund
1500 Queries and extracts
Users expect reports to be rendered in under 30 seconds
Users resubmit reports when no results come back, causing middleware failures and tie-ups.
So what is the solution?
The challenges were
Tables based upon subject areas AND REPORT TYPE (de-normalized)
Well indexed.
Easy to populate with what is required.
Back to Michael Jackson
How do we do this?
Data Access Layers (DAL)
Before we start
A DAL function ‘joins’ 2 or more tables and returns a table result set containing a myriad of data fields.
Parameters in
A ‘DAL’ is like a bowling ball
Results out
Process the ‘goodies’
We have a data warehouse.
Users accessed data via views (prior to DAL).
Users created their own SQL to extract their data.
Queries were not structured in an optimal manner.
Joins that you would never expect
[Diagram: GLDAL, GL, and ObscureGL View1, View2, View3; POSDAL, POS, and ObscureREF View1, View2, View3]
In short….
Joins were being made ‘willy-nilly’.
CPU clocking went through the ceiling.
Few understood execution plans.
Those queries sent to us were optimized.
In short
From 10 sites we found a lot of commonality.
Looked for ways to pull data with the most efficient execution plans (across the board).
Millions of records in most tables.
M.J. to the rescue.
In short
Pull from tables BUT with an optimal execution plan.
Take advantage of the TABLE indices.
Avoid pulling one lone field from a view.
Hence the ‘Birth of the DAL’.
Example of view hell
Column Name           ID    Data Type      Null?
FUND_ID               1     VARCHAR(8)     N
AGNT_BANK_FINS_NUM    2     VARCHAR(5)     Y
ASOF_CLIENT_SW        3     VARCHAR(1)     Y
BASE_CNTRY_CD         4     VARCHAR(2)     Y
BNFCY_TAX_ID          5     VARCHAR(9)     Y
BOND_SRC_CD           6     VARCHAR(2)     Y
CASH_COST_MTHD_IND    7     VARCHAR(1)     Y
CASH_SELL_TRANS_CD    8     VARCHAR(2)     Y
CLIENT_ACCT_NUM       9     VARCHAR(12)    Y
…
CLIENT_FUND_NUM       100   VARCHAR(4)     Y
CLIENT_ID             101   VARCHAR(8)     Y
Our plan of action
Architecture changes
[Diagram: Architecture prior to DAL, showing TABLE, TABLE, VIEW, VIEW, DATABASE]
[Diagram: Architecture with DAL, showing DAL, TABLE, TABLE, DATABASE]
Boils down to efficient use of TABLE indices
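To make the point about table indices concrete, here is a minimal sketch that asks a query planner whether a filter can use an index. It uses Python's built-in sqlite3 purely as a runnable stand-in for SQL Server, so the plan text differs from a SQL Server execution plan. The positions table, its columns and the index name are hypothetical, not the schema from the talk.

```python
import sqlite3

# Hypothetical table and index, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE positions (fund_id TEXT, asset_id TEXT, calen_dt TEXT, mkt_val REAL);
    CREATE INDEX ix_positions_fund_dt ON positions (fund_id, calen_dt);
""")

def plan(sql, params=()):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Filter on the indexed columns: the planner can seek through the index.
seek_plan = plan(
    "SELECT asset_id, mkt_val FROM positions WHERE fund_id = ? AND calen_dt >= ?",
    ("fdr1", "2006-01-01"),
)

# Filter on a column with no index: the planner has to scan the whole table.
scan_plan = plan(
    "SELECT fund_id, mkt_val FROM positions WHERE asset_id = ?",
    ("IEP",),
)

print(seek_plan)
print(scan_plan)
```

The first plan names ix_positions_fund_dt and the second does not. On tables with millions of rows, that difference is the gap between a seek and a full scan.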
Positions
General Ledger
Transactions
Lot level data
DAL Coverage
Sample user defined function
USE DAL
GO
SELECT fund_id, asset_id, calen_dt
FROM Get_Pos_Sum('m1te|fdr1|pat2', '203900105|IEP', '2006-01-01', '2008-12-31')
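The 'parameters in, results out' shape of a call like the one above can be sketched as follows. This is not the actual Get_Pos_Sum: it is a Python/sqlite3 stand-in for a SQL Server table-valued function. It assumes the four arguments are a pipe-delimited fund list, a pipe-delimited asset list, a start date and an end date. The positions and asset tables, the asset_nm and mkt_val columns, and the sample rows are all hypothetical.

```python
import sqlite3

def get_pos_sum(conn, funds, assets, start_dt, end_dt):
    """DAL-style access function: parameters in, one joined result set out.

    Assumed meaning of the arguments (not confirmed by the source):
    pipe-delimited fund ids, pipe-delimited asset ids, inclusive date range.
    """
    fund_list = funds.split("|")
    asset_list = assets.split("|")
    fund_marks = ",".join("?" * len(fund_list))    # e.g. "?,?,?"
    asset_marks = ",".join("?" * len(asset_list))
    # The join lives here, written once, instead of in every user's query.
    sql = f"""
        SELECT p.fund_id, p.asset_id, p.calen_dt, a.asset_nm, p.mkt_val
        FROM positions AS p
        INNER JOIN asset AS a ON a.asset_id = p.asset_id
        WHERE p.fund_id IN ({fund_marks})
          AND p.asset_id IN ({asset_marks})
          AND p.calen_dt BETWEEN ? AND ?
        ORDER BY p.fund_id, p.asset_id, p.calen_dt
    """
    return conn.execute(sql, [*fund_list, *asset_list, start_dt, end_dt]).fetchall()

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset (asset_id TEXT PRIMARY KEY, asset_nm TEXT);
    CREATE TABLE positions (fund_id TEXT, asset_id TEXT, calen_dt TEXT, mkt_val REAL);
    CREATE INDEX ix_positions_fund_dt ON positions (fund_id, calen_dt);
    INSERT INTO asset VALUES
        ('203900105', 'Sample bond'), ('IEP', 'Sample equity'), ('ZZZ', 'Other');
    INSERT INTO positions VALUES
        ('m1te', '203900105', '2007-06-30', 100.0),
        ('fdr1', 'IEP',       '2008-03-31', 250.0),
        ('fdr1', 'IEP',       '2009-03-31', 300.0),
        ('zzzz', 'IEP',       '2007-01-31',  50.0),
        ('pat2', 'ZZZ',       '2007-01-31',  75.0);
""")

rows = get_pos_sum(conn, "m1te|fdr1|pat2", "203900105|IEP", "2006-01-01", "2008-12-31")
for fund_id, asset_id, calen_dt, *_ in rows:
    print(fund_id, asset_id, calen_dt)
```

Callers pick the columns they need from the result and never write the join themselves, so the join and the index-friendly filters are written and tuned in one place.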
Demo 1
The Michael Jackson Design Technique
Meanwhile, back in the grocery business
Never less than 900 million rows
partitioned & event data
2.3 billion
GUIDs ain’t so great!!
CustomerID (int)   Advert GUID      Customer GUID
1                  NULL             NULL
2                  1KWW-9POIU-R2    NULL
3                  NULL             NULL

CustomerKey        Advert GUID
2                  1KWW-9POIU-R2
Customer GUID $$$$ Required for the report
12345
Session Data
Warehouse Data
End Client Data
...even tried CTEs
;with customerKeys as (
    select customerKey, customerID
    from [DataWarehouse].[dbo].[CustomerHelper_MWG] ch
    join AcmePath.dbo.tempAcmeCoupontCustomers_1plusSessions t on t.fkCustomerID = ch.CustomerID
)
SELECT sum(sales) as sales, basketID, k.CustomerID
FROM [DataWarehouse].[dbo].[FactDailySales] fds
join customerKeys k on fds.customerKey = k.customerKey
where DateKey between 20120619 and 20120715
Group by basketID, k.CustomerID
3 hr 21 minutes
Indices and the super warehouse
;with customerKeys as (
    select customerKey, customerID
    from [DataWarehouse].[dbo].[CustomerHelper_MWG] ch
    join AcmePath.dbo.tempAcmeCoupontCustomers_1plusSessions t on t.fkCustomerID = ch.CustomerID
)
SELECT sum(sales) as sales, basketID, k.CustomerID
FROM [SuperWarehouse].[dbo].[FactDailySales] fds
join customerKeys k on fds.customerKey = k.customerKey
where DateKey between 20120619 and 20120715
Group by basketID, k.CustomerID
Metrics avoid insanity
Monitoring performance issues
using Reporting Services
Green text box
Red text box
Queries with aggregations
SELECT fc.CustomerID, fc.OrderID, fc.DateKey,
       pr.Level1CategoryID, pr.Level2CategoryID, pr.Level3CategoryID, pr.Level4CategoryID,
       SUM(fc.Dollars) as Dollars,
       SUM(fc.Units) as Units,
       SUM(fc.TotalWeight) as [Weight],
       SUM(fc.Units + (case fc.TotalWeight when 0 then 0 else 1 end)) as TotalUnits
INTO rpt.Acme_Order
FROM dwh.SalesOrderDetail fc
INNER JOIN dwh.DimProduct pr on fc.ProductKey = pr.ProductKey
GROUP BY fc.CustomerID, fc.OrderID, fc.DateKey,
         pr.Level1CategoryID, pr.Level2CategoryID, pr.Level3CategoryID, pr.Level4CategoryID
3:41:00 to complete
SSIS is an answer
DMVs as a monitoring tool
Demo 2
DW table structure should be similar to reporting patterns.
Data must be cleansed and complete across reporting areas.
The takeaways
DALs may be a solution to your problem.
Reporting Services is a great tool to ‘show’ problematic areas.
Finally, revisit your overall architecture regularly.
The takeaways
DMVs.
Which at the end of the day
Resulting in a better understanding of the