Barclays' use of TASM to manage Ad Hoc Business Users & Operational/Batch workloads
Ian Russell
Teradata Certified Master V2R5
Barclays Bank
(C) Teradata Universe Seoul 2008 All Rights Reserved.
Agenda
• Brief overview
> Barclays
> Teradata data warehouse environment
> Teradata data warehouse workload
• Workload management prior to TASM and issues
• How TASM was implemented
• Workload management under TASM
• Outstanding issues & Successes
Brief overview of Barclays
• Barclays is one of the world’s largest financial services providers offering banking, investment banking and investment management services to more than 27 million customers in over 50 countries, and employing more than 123,000 people worldwide.
• 2006 pre-tax profits £7,136m
> 2007 half-year profit of £4,101m
• Our ambition is to become one of the handful of universal banks leading the global financial services industry.
• Our strategy follows a simple premise: anticipate the needs of our customers and clients, then serve them by helping them achieve their goals.
Overview of the Teradata data warehouse environment
• Barclays has 5 Teradata servers with a total of 62TB storage, running V2R6.2
• These are connected to 5 Mainframe LPARs and many mid-tier servers
• Batch is executed primarily from the Mainframe and, to a lesser extent, from UNIX
• Ab Initio is being introduced for batch work
• Archives of source system data going back over 10 years are held
• Users have access to a wide variety of tools – OpenText BI, SQL Assistant, SAS, Business Objects, White Light, MS Access/Excel, Informatica, KXEN, etc. – with access over the Wide Area Network
Overview of the Teradata data warehouse workload
• 1,500 business users across Barclays organisation – not all UK based, off-shore development/test
• Up to 1,500,000 ad hoc queries submitted daily
• Users have access to sand-pits and ability to create/drop database objects and load/unload data
• Users have access to a variety of tools to access the DW
• The dictionary is updated regularly
• Batch/operational database loads or builds 700 databases on either a daily, weekly or monthly basis
• No integrated data model
• Priority is given to batch overnight and users during the day
• Data is restored frequently for analysis
• TCRM application
Pre-TASM and Issues
• Response times to user queries vary widely
• PSF split between BAU (User) and Operational work
• Limited ability to push critical path of batch through
• The main Teradata server was at CPU capacity
• Poorly written queries could take huge amounts of system resources and required manual identification and action. This manual intervention was not available overnight, so batch could be impacted.
• Well-written short queries were being delayed along with the bad ones within TDQM
• Certain business areas felt they owned percentages of the platform owing to how it was funded.
Answer
• To overcome this, Barclays engaged Teradata to provide world-class workload management: TASM
• Teradata Professional Services provided the following consultants
> TASM Proof of Concept: March 2006 Paul Davies
> TASM Implementation: September 2006 Brian Middleton
Proof of Concept - Lessons learnt
• TASM was the direction Barclays wished to go
• Teradata Workload Analyser was immature
• Barclays specifics
> Attempts to apportion % of machine to specific businesses by financial contribution can lead to inefficient use of the system
> Account Strings needed cleaning up
> One central batch user with many account strings needed changing
> Profiles should be introduced
> Need to decide on workloads
> DBQL data needs to be in DBC format for analysis
> DBQL StepInfo needs activating
> DBQL TDWM columns need holding in DBQL history
> Colloquial knowledge of workloads does not match actual
• Further assistance from Teradata was required
Define Workloads
• Barclays had already made the split between Operational workload and User queries – BAU
• Further workloads were identified
> Tactical – guaranteed response times
> High, Standard & Low Level Service
> Batch
> DBA work, including BAR
> System
• Decide on policy of rewarding short queries and penalising long queries
• Time periods for when specific workloads have priority were also decided upon
Workload Classifications (1/2)
Workload | Purpose | Classification
OP_Support_Instant | Fast Operations Support queries | Account String = ‘$OPS$&D&H’, Estimated Processing <= 0.1 Second
OP_Support | Operations Support queries | Account String = ‘$OPS$&D&H’
BAU_Utility | Utility BAU queries | Account String = ‘$BAU$&D&H’, Load Utility Type = ALL
BAU_Instant, BAU_Short, BAU_Medium, BAU_Long | Fast, short, medium and long-running BAU queries | Account String = ‘$BAU$&D&H’, Estimated Processing <= 0.1 Second, <= 30 Seconds, <= 20 Minutes & > 20 Minutes
HSL_Utility | Utility HSL queries | Account String = ‘$HSL$&D&H’, Load Utility Type = ALL
HSL_Instant, HSL_Short, HSL_Medium, HSL_Long | Fast, short, medium and long-running HSL queries | Account String = ‘$HSL$&D&H’, Estimated Processing <= 0.1 Second, <= 30 Seconds, <= 25 Minutes & > 25 Minutes
OP_Support_Utility | Utility BAU Support queries | Account String = ‘$OPS$&D&H’, Load Utility Type = ALL
DBA_Low | Low priority DBA work - housekeeping | User = Crashdumps, DBADMIN, USERADMIN, DBAXK, APPLADMIN
Default_High | DBC, DBCMANAGER, TDWM | User = DBC, DBCMANAGER, TDWM
Tactical_AllAmp | All-Amp Tactical Queries | Account String = ‘$TAC$&D&H’
DBA_High | DBADMIN, USERADMIN | User = DBADMIN, USERADMIN, Estimated Processing <= 10 Seconds
Tactical_Single | Single-Amp Tactical Queries | Account String = ‘$TAC$&D&H’, Single or few AMPs
Workload Classifications (2/2)
Workload | Purpose | Classification
Default | Catch all | All queries not defined above
LSL_Long | Long running LSL queries | Account String = ‘$LSL$&D&H’
OP_Archives | Routine archives | Account String = ‘$ARC$&D&H’
OP_Restores | User-initiated restores | Account String = ‘$RST$&D&H’
OP_High_Utility, OP_Medium_Utility, OP_Low_Utility | High, Medium & Low Priority Batch Utility Jobs | Account String = ‘$OPH$&D&H’, Load Utility Type = ALL; Account String = ‘$OPM$&D&H’, Load Utility Type = ALL; Account String = ‘$OPL$&D&H’, Load Utility Type = ALL
LSL_Instant | Fast LSL queries | Account String = ‘$LSL$&D&H’, Estimated Processing <= 0.1 Second
ConsoleR, ConsoleH, ConsoleM, ConsoleL | System workloads | None
LSL_Utility | Utility BAU queries | Account String = ‘$LSL$&D&H’, Load Utility Type = ALL
OP_Low | Low Priority Batch Jobs | Account String = ‘$OPL$&D&H’
OP_Medium | Medium Priority Batch Jobs | Account String = ‘$OPM$&D&H’
OP_High | High Priority Batch Jobs | Account String = ‘$OPH$&D&H’
OP_Instant | Fast OP queries | Account String In ‘$OPH$&D&H’, ‘$OPM$&D&H’, Estimated Processing <= 0.1 Second
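As an illustration of how the classification criteria combine, the sketch below applies only the BAU rules from the tables (account string prefix, load utility flag, estimated processing time). The function name `classify_bau`, the order in which the checks run, and the fall-through to Default for every other account string are assumptions made for the example, not TASM behaviour confirmed by the slides.

```python
# Estimated-processing-time tiers for BAU queries, taken from the table:
# <= 0.1 second, <= 30 seconds, <= 20 minutes, otherwise long-running.
BAU_TIERS = [
    (0.1, "BAU_Instant"),
    (30.0, "BAU_Short"),
    (20 * 60.0, "BAU_Medium"),
]


def classify_bau(account_string: str, est_seconds: float,
                 is_load_utility: bool = False) -> str:
    """Hypothetical helper: pick a workload for a query using the BAU rules only."""
    if not account_string.startswith("$BAU$"):
        # Assumption: all other account strings are left out of this sketch.
        return "Default"
    if is_load_utility:
        # Assumption: the utility check is evaluated before the time tiers.
        return "BAU_Utility"
    for limit_seconds, workload in BAU_TIERS:
        if est_seconds <= limit_seconds:
            return workload
    return "BAU_Long"


print(classify_bau("$BAU$&D&H", 12.0))
```

The same tier structure would apply to the HSL workloads with a 25-minute upper bound for the medium tier.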
Workload Exceptions (1/2)
Workload | Exception Criteria | Exception Action
BAU_Long | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
BAU_Medium | CPU Time >= 2000 Seconds | Change to BAU_Long
BAU_Short | CPU Time >= 200 Seconds | Change to BAU_Medium
HSL_Medium | CPU Time >= 2000 Seconds | Change to HSL_Long
HSL_Long | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
HSL_Short | CPU Time >= 200 Seconds | Change to HSL_Medium
OP_Support_Instant | CPU Time >= 10 Seconds | Change to OP_Support
OP_Support | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
BAU_Utility | | No Exception Monitoring
BAU_Instant | CPU Time >= 10 Seconds | Change to BAU_Short
HSL_Utility | | No Exception Monitoring
HSL_Instant | CPU Time >= 10 Seconds | Change to HSL_Short
OP_Support_Utility | | No Exception Monitoring
DBA_Low | | No Exception Monitoring
Default_High | | No Exception Monitoring
Tactical_AllAmp | CPU Time >= 10 Seconds | Continue and Log
DBA_High | CPU Time >= 100 Seconds | Change to DBA_Low
Tactical_Single | CPU Time >= 2 Seconds | Continue and Log
Workload Exceptions (2/2)
Workload | Exception Criteria | Exception Action
OP_Medium_Utility | | No Exception Monitoring
OP_Low_Utility | | No Exception Monitoring
Default | | No Exception Monitoring
LSL_Long | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
OP_Archives | | No Exception Monitoring
OP_Restores | | No Exception Monitoring
OP_High_Utility | | No Exception Monitoring
LSL_Instant | CPU Time >= 10 Seconds | Change to LSL_Long
ConsoleR, ConsoleH, ConsoleM, ConsoleL | | No Exception Monitoring
LSL_Utility | | No Exception Monitoring
OP_Low | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
OP_Medium | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
OP_High | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
OP_Instant | CPU Time >= 10 Seconds | Change to OP_Medium
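The "Change to" exceptions form demotion chains: a query that was classified as instant but keeps consuming CPU is moved step by step into longer-running workloads. The sketch below follows those chains using the CPU thresholds from the two exception tables. The function name and the assumption that the CPU time is the query's cumulative total (so one call can walk several steps) are illustrative, not taken from the slides.

```python
# workload -> (CPU-seconds threshold, workload the query is changed to),
# copied from the "Change to ..." rows of the exception tables.
DEMOTIONS = {
    "BAU_Instant": (10, "BAU_Short"),
    "BAU_Short": (200, "BAU_Medium"),
    "BAU_Medium": (2000, "BAU_Long"),
    "HSL_Instant": (10, "HSL_Short"),
    "HSL_Short": (200, "HSL_Medium"),
    "HSL_Medium": (2000, "HSL_Long"),
    "OP_Support_Instant": (10, "OP_Support"),
    "LSL_Instant": (10, "LSL_Long"),
    "OP_Instant": (10, "OP_Medium"),
    "DBA_High": (100, "DBA_Low"),
}


def workload_after_exceptions(start: str, cpu_seconds: float) -> str:
    """Hypothetical helper: follow the demotion chain for a given CPU total."""
    workload = start
    while workload in DEMOTIONS:
        threshold, target = DEMOTIONS[workload]
        if cpu_seconds < threshold:
            break  # exception criteria not met, query stays where it is
        workload = target
    return workload


print(workload_after_exceptions("BAU_Instant", 250))
```

Workloads with "Continue and Log" or "No Exception Monitoring" are absent from the mapping, so a query in one of them is never moved.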
Periods
Monday, Tuesday, Wednesday, Thursday, Friday:
• Weekday Daytime: 09:00 to 18:00
• Weekday Evening: 18:00 to 20:00
• Weekday Night: 20:00 to 08:00 (Wrap Around Midnight)
Saturday, Sunday:
• Default
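A minimal sketch of how a timestamp could be mapped to one of these periods. Two points are assumptions, because the slide does not settle them: the night period is treated as covering 20:00 to 24:00 and 00:00 to 08:00 on each weekday (so Saturday morning falls into Default), and the 08:00 to 09:00 gap on weekdays is treated as Default. The function name is hypothetical.

```python
from datetime import datetime, time


def period_for(ts: datetime) -> str:
    """Hypothetical helper: name of the period in force at timestamp ts."""
    if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return "Default"
    t = ts.time()
    if time(9, 0) <= t < time(18, 0):
        return "Weekday Daytime"
    if time(18, 0) <= t < time(20, 0):
        return "Weekday Evening"
    # The night period wraps around midnight: late evening or early morning.
    if t >= time(20, 0) or t < time(8, 0):
        return "Weekday Night"
    # Assumption: 08:00 to 09:00 is not covered by any named period.
    return "Default"


print(period_for(datetime(2008, 3, 3, 23, 30)))
```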
Allocation Group Weights
AG | Resource Partition | Default | Weekday Daytime (M-F, 09:00–18:00) | Weekday Evening (M-F, 18:00–20:00) | Weekday Night (M-F, 20:00–08:00)
UserLow | Standard | 5 | 20 | 20 | 1
UserHigh | Standard | 10 | 30 | 40 | 8
OpLow | Standard | 20 | 1 | 20 | 20
OpHigh | Standard | 35 | 8 | 40 | 40
Priority | Standard | 50 | 50 | 80 | 50
L | Default | 5 | 5 | 5 | 5
M | Default | 10 | 10 | 10 | 10
H | Default | 30 | 30 | 30 | 30
R | Default | 40 | 40 | 40 | 40
Tactical | Tactical | Tactical | Tactical | Tactical | Tactical
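To see what a set of weights means in practice, the sketch below converts allocation group weights into relative shares within one resource partition. It assumes the usual Priority Scheduler convention that a group's share is its weight divided by the sum of the weights of the groups that are currently active. The function name is hypothetical, and the example weights are the Standard partition's Weekday Daytime values as best read from the slide, so treat them as illustrative.

```python
def relative_shares(weights, active=None):
    """Hypothetical helper: share of the partition for each active allocation group.

    weights maps allocation group name -> weight; active is an optional
    collection of group names that currently have work (default: all).
    """
    active_set = set(weights) if active is None else set(active)
    total = sum(w for ag, w in weights.items() if ag in active_set)
    if total == 0:
        return {}
    return {ag: w / total for ag, w in weights.items() if ag in active_set}


# Assumption: Weekday Daytime weights for the Standard resource partition.
STANDARD_DAYTIME = {"Priority": 50, "OpHigh": 8, "OpLow": 1,
                    "UserHigh": 30, "UserLow": 20}

for ag, share in relative_shares(STANDARD_DAYTIME).items():
    print(f"{ag}: {share:.1%}")
```

Because only active groups are counted, a group with a modest weight can still receive most of the partition when the others are idle; with just UserHigh and UserLow active, for example, UserHigh would get 30 / (30 + 20) of the partition.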
Workload to Allocation Group Mapping
Allocation Group | Resource Partition | Workload
UserLow | Standard | DBA_Low, BAU_Long, LSL_Long, BAU_Medium, BAU_Short, LSL_Utility, BAU_Utility
UserHigh | Standard | HSL_Utility, HSL_Long, HSL_Medium, HSL_Short
OpLow | Standard | OP_Medium_Utility, OP_Archives, OP_Low, OP_Low_Utility, OP_Medium, OP_Restores
OpHigh | Standard | OP_Support_Utility, OP_High, OP_Support, OP_High_Utility
Priority | Standard | LSL_Instant, OP_Instant, BAU_Instant, HSL_Instant, OP_Support_Instant
L | Default | ConsoleL
M | Default | Default, ConsoleM
H | Default | Default_High, ConsoleH, DBA_High
R | Default | ConsoleR
Tactical | Tactical | Tactical_Single, Tactical_AllAmp
Workload Throttles (Monitor and Tune)
Workload | Default | Weekday Daytime (M-F, 09:00–18:00) | Weekday Evening (M-F, 18:00–20:00) | Weekday Night (M-F, 20:00–08:00)
OP_Support_Utility | None | None | None | None
OP_Support_Instant | None | None | None | None
OP_Support | 2 | 2 | 2 | 2
BAU_Medium | 3 | 6 | 3 | 2
BAU_Long | 1 | 1 | 1 | 1
BAU_Utility | None | None | None | None
BAU_Instant | None | None | None | None
BAU_Short | 12 | 20 | 12 | 6
HSL_Medium | 2 | 2 | 2 | 2
HSL_Long | 1 | 1 | 1 | 1
HSL_Short | 3 | 4 | 3 | 3
HSL_Instant | None | None | None | None
HSL_Utility | None | None | None | None
Default_High | None | None | None | None
DBA_Low | None | None | None | None
Tactical_AllAmp | None | None | None | None
DBA_High | None | None | None | None
Tactical_Single | None | None | None | None
Workload Throttles (Monitor and Tune)
Workload | Default | Weekday Daytime (M-F, 09:00–18:00) | Weekday Evening (M-F, 18:00–20:00) | Weekday Night (M-F, 20:00–08:00)
OP_High | 4 | 3 | 4 | 4
OP_Low_Utility | None | None | None | None
OP_Instant | None | None | None | None
OP_Medium_Utility | None | None | None | None
OP_High_Utility | None | None | None | None
OP_Medium | 20 | 2 | 8 | 20
OP_Low | 4 | 1 | 2 | 4
OP_Archives | None | None | None | None
OP_Restores | None | None | None | None
Default | None | None | None | None
LSL_Long | 1 | 1 | 1 | 1
LSL_Instant | None | None | None | None
LSL_Utility | None | None | None | None
ConsoleL | None | None | None | None
ConsoleM | None | None | None | None
ConsoleH | None | None | None | None
ConsoleR | None | None | None | None
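A throttle caps how many queries of a workload run at once; anything above the limit waits in a delay queue until a running query finishes. The sketch below models that behaviour for a single workload. The class name, the first-in-first-out release order and the return values are assumptions for illustration only; a limit of `None` stands for the "None" entries in the tables.

```python
from collections import deque


class WorkloadThrottle:
    """Hypothetical model of a per-workload concurrency limit with a delay queue."""

    def __init__(self, limit=None):
        self.limit = limit      # None means the workload is not throttled
        self.running = set()    # query ids currently executing
        self.delayed = deque()  # query ids waiting, oldest first

    def _has_capacity(self):
        return self.limit is None or len(self.running) < self.limit

    def submit(self, query_id):
        """Start the query if there is room, otherwise put it in the delay queue."""
        if self._has_capacity():
            self.running.add(query_id)
            return "running"
        self.delayed.append(query_id)
        return "delayed"

    def finish(self, query_id):
        """Mark a query complete and release the oldest delayed query, if any."""
        self.running.discard(query_id)
        if self.delayed and self._has_capacity():
            released = self.delayed.popleft()
            self.running.add(released)
            return released
        return None


# Example: a workload limited to one concurrent query in every period.
throttle = WorkloadThrottle(limit=1)
print(throttle.submit("q1"), throttle.submit("q2"), throttle.finish("q1"))
```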
Service Level Goals
Columns: Workload; Arrival Rate (QpH), Throughput Goal (QpH) and Response Time Goal (95%), each for Daytime and Night-time.

Workload | Arrival Rate / Throughput Goal (QpH) | Response Time Goal (95%)
BAU_Long | 100 | 900 Seconds
BAU_Utility | 600 | 360 Seconds
BAU_Instant | 20,000 | 51 Seconds
BAU_Short | 3,000 | 90 Seconds
BAU_Medium | 300 | 360 Seconds
OP_Support_Utility | 10 | 1,080 Seconds

Workload (remaining rows): Tactical_Single, Tactical_AllAmp, Default_High, DBA_High, DBA_Low, HSL_Utility, HSL_Instant, OP_Support_Instant, OP_Support, HSL_Long, HSL_Medium, HSL_Short
Arrival Rate / Throughput Goal (QpH): 10, 80, 135, 100, 500, 5,000, 600, 500, 350, 800, 0, 0
Response Time Goal (95%): 1,080 Seconds, 1,680 Seconds, 660 Seconds, 300 Seconds, 50 Seconds, 15 Seconds, 360 Seconds, 120 Seconds, 300 Seconds, 1 Seconds, 0, 0

Second set of columns (18 rows):
Arrival Rate (QpH): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 500, 350, 800, 0, 0
Response Time Goal (95%): 1,080 Seconds, 1,680 Seconds, 1,080 Seconds, 1,800 Seconds, 240 Seconds, 40 Seconds, 2 Seconds, 360 Seconds, 600 Seconds, 120 Seconds, 20 Seconds, 1 Seconds, 360 Seconds, 120 Seconds, 300 Seconds, 1 Seconds, 0, 0
Service Level Goals
Columns: Workload; Arrival Rate (QpH), Throughput Goal (QpH) and Response Time Goal (95%), each for Daytime and Night-time.

Workload | Arrival Rate / Throughput Goal (QpH) | Response Time Goal (95%)
LSL_Long | 25 | 660 Seconds
ConsoleM | 0 | 0
ConsoleL | 0 | 0
LSL_Utility | 600 | 360 Seconds
LSL_Instant | 5,000 | 15 Seconds
Default | 0 | 0

Workload (remaining rows): OP_High_Utility, OP_Medium_Utility, OP_Low_Utility, OP_Instant, OP_High, OP_Medium, OP_Low, ConsoleH, ConsoleR, OP_Restores, OP_Archives
Arrival Rate / Throughput Goal (QpH): 0, 0, 0, 0, 0, 0, 0, 1,500, 0, 0, 0
Response Time Goal (95%): 0, 0, 0, 0, 120 Seconds, 540 Seconds, 120 Seconds, 600 Seconds, 120 Seconds, 540 Seconds, 120 Seconds

Second set of columns (17 rows):
Arrival Rate (QpH): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100, 200, 3, 1,500, 100, 200, 3
Response Time Goal (95%): 0, 660 Seconds, 15 Seconds, 360 Seconds, 0, 0, 0, 0, 0, 0, 120 Seconds, 540 Seconds, 120 Seconds, 600 Seconds, 120 Seconds, 540 Seconds, 120 Seconds
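One way to read a "Response Time Goal (95%)" entry is that at least 95% of a workload's queries should complete within the stated number of seconds. The sketch below checks a list of observed response times against such a goal. The function name, that reading of the percentage, and the choice to treat an empty sample as meeting the goal are all assumptions for the example.

```python
def meets_response_time_goal(response_times, goal_seconds, percent=95.0):
    """Hypothetical check: did at least `percent` % of queries finish within the goal?"""
    if not response_times:
        return True  # assumption: no queries means nothing missed the goal
    within = sum(1 for t in response_times if t <= goal_seconds)
    # Compare as integers/products to avoid rounding issues at the boundary.
    return within * 100 >= percent * len(response_times)


# Example: 19 of 20 queries finish inside a 90-second goal, which is exactly 95%.
observed = [10.0] * 19 + [500.0]
print(meets_response_time_goal(observed, goal_seconds=90))
```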
Routine Monitoring
• Delay Queue Statistics
• Service Level Goals
• TDWM Events
• TDWM Exceptions
• TDWM Summary
• DBQL
• In-house MS Excel spreadsheets
DBA work
• Dynamic change of priority
• Release of delayed queries
• Catch-up rule set for delayed batch
• Application of Throttles to Users
• Query investigation and allocation to workload
Outstanding Issues
• Business
> Matching workflow between Dev, OAT & Production. What works well in Dev/OAT can be impacted by TASM in Production.
> Further time periods – first 5 working days of the month, first day of the month
> Users demanding access to HSL
> No take up of the LSL
• TASM
> Zero estimated processing time
> SELECT DISTINCT
> Delay Time is not shown in Sessions Delayed
> Cannot see the SQL of queries held in Delay
> Have to log on to TDWM as tdwm
> TASM Modeller
Pre-TASM and Issues
• Response times to user queries vary widely
• Limited ability to push batch critical path through
• The main Teradata server was at CPU capacity
• Poorly written queries could take huge amounts of system resources and required manual identification and action. This manual intervention was not available overnight, so batch could be impacted.
• Well-written short queries were being delayed by the bad ones
Barclays TASM implementation
• Questions
• Contact [email protected]