INSURANCE INVESTMENTS LOANS MORTGAGES PENSIONS SAVINGS BANKING CREDIT CARDS

DSI Investigation & Practical Health Modelling

Kev Robinson & Rob Morgan

MICROSOFT INFRASTRUCTURE ARCHITECT FORUM : October 2005

Agenda

• Introduction & Background

• Elements of DSI

• Aims

• Build & Deployment

• Monitoring & Creating a Health Model

• Dynamic Systems Initiative

– Availability & Resource on demand

• Conclusions

Background

Nationwide Building Society

Nationwide

• The World’s largest Building Society

• Seventh largest financial organisation in the UK

• 9% par share of the UK retail savings balances – 2nd largest

• 11.8% par share of the UK residential mortgage lending - 4th largest

• 11 million customers

• 1st On-line banking offering in UK

• 1 in 4 UK households have a relationship with Nationwide

• 16,000 employees

• Around 880 Retail Outlets

• Over 2,350 ATMs

• £112 billion assets

Nationwide – Technology

• Technology Division

– Approx. 1350 employees

• www.nationwide.co.uk

• Business Systems & Services include:

– Online Banking

– Payments processing

– Mortgage and loans systems

– Customer Relationship Management

– Point of sales systems

– Call centre technologies

– Regulatory systems, e.g. BASEL II

– … Total 130+ systems

Background

Why Investigate DSI?

Background to investigating DSI

• Why were we interested in doing this?

• History

– A service was a server or group of servers

– Shared environments

– More complex relationships between systems

• Improved delivery of infrastructure

• Improved Enterprise Systems Management

• Service monitoring was previously “Server Monitoring”

• Options to reduce the TCO of Technology Infrastructure

Aims

How to best deploy and manage new systems and technologies?

Challenges

• Builds

– Manual builds aren’t sustainable

– Lack of consistency

• Monitoring

– Eventlog monitoring & performance counters aren’t sufficient

– Systems Management needs to be part of the development process

• Load Balancing & Clustering

– So what is running where?

– When is a service reduced or unavailable?

• Adapting to a changing business

– Additional processing required at key times of the year

Proof points

• Provisioning and de-provisioning

• Automate wherever possible

• Building an effective health-model

• End to end systems management

• Prove high-availability technologies

• Review published best practices

• Demonstrate DSI principles

Best position Nationwide with technology and skills to run the core infrastructure of future application delivery.

With technology that is available today.

Technologies Involved

• Automating the build of servers

– Automated Deployment Services (ADS)

• Automating the deployment of standard components

– Automated Purposing Framework (APF)

• Deploy new releases of applications

– Standard approach to application install (MSI technology)

• Monitor & manage systems – reporting of problems

– Microsoft Operations Manager 2005 (MOM)

• Review how to improve infrastructure availability

– Database Clustering, Load-balancing, etc.

Automated Builds

Why automate builds?

• Automated Builds

– Manually building servers and associated installs can take hours

– Consistency of builds and between environments

– Reduce the time taken to resolve problems

– Improve configuration management

– Knowledge management – held in scripts rather than in people’s heads

• Provision on demand & provision on failure

• Availability: Rapid deployment and reaction to events in a controlled and managed way

• Business: Ability to deploy systems when business process or demand changes

Server build

[Diagram: layers of a server build and the tool used to deploy each]

• Windows O/S (Server) – ADS

• Standard Components (AntiVirus, Backup, MOM, Custom settings, Security settings, Scheduling agent) – APF

• System Products, by role (BizTalk, SQL Server, Web Server, Web Services, Application Server) – APF

• Business Applications – APF & MSI

Automated Builds

• What do you do?

– Manual builds

– Deploy just the base Operating System

– Operating System & Standard Components

– Automate complete system builds

Release Deployment

• Consistent delivery from development into production

• Scheduled release rather than manually intensive

• Capability for Operations to release changes

– Enabling the ability to regress.

• Help reduce the deployment manpower

– Improve speed to deliver

– Reduce complexity

– Provide consistency

– Better configuration & change management

• Availability: Slick release and regression

• Business: Improved reliability of releases

System Monitoring

[Diagram: MOM Server, DB, Agents, Ops Console, Admin Console, Web Console, System Center Data Warehouse, Reporting]

System Monitoring & Management

• Traditional Alert Management

– Monitor event logs

– Monitor performance statistics

– Event correlation

• But…

– What about load-balanced systems?

– What about differentiating between degraded service and service outage?

– Is all OK when there are no alerts?

– Is the service down when there are dozens of events?
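One way to approach the load-balancing questions above is to derive the service state from how many pool members are healthy, rather than from individual server alerts. The following is a minimal sketch only: the function name `service_state`, the `min_healthy` policy parameter and the server names are assumptions for illustration, not part of MOM or of the implementation described in this talk.

```python
from typing import Dict


def service_state(server_healthy: Dict[str, bool], min_healthy: int = 1) -> str:
    """Derive a service-level state from a load-balanced pool.

    server_healthy maps a (hypothetical) server name to True/False.
    min_healthy is an assumed policy: the fewest healthy servers that
    can still deliver the service at all.
    """
    healthy = sum(1 for ok in server_healthy.values() if ok)
    if healthy > 0 and healthy == len(server_healthy):
        return "Up"        # every pool member is serving
    if healthy >= min_healthy:
        return "Degraded"  # service still available, capacity reduced
    return "Down"          # too few members left to deliver the service


# Hypothetical two-node web pool with one node lost.
pool = {"web01": True, "web02": False}
print(service_state(pool))
```

With this view, a single failed node in a pool reports a reduced service rather than an outage, which is the distinction the slide asks for.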

Health Modelling

Rob Morgan

• Stay Aware

• Effectively Respond

• Be Accountable

Health/Task Model – Background

• During development, applications and services are coded to produce numerous alerts and messages.

– No consistent understanding of how these map to service availability

– No view on application coverage

• Task Model

– No consistent delivery of required processes

• To recover service should an outage occur

• To administer the service

Service Composition

• Server estate contains hundreds of interconnecting:

– Servers

– Services

– Components

• Elements are complex and not consistently recorded

• Entity Hierarchy

– Host

• The location on which components are hosted or accessed

– Component

• Applications or services installed on a logical host

– Service

• Highest level description of the end to end solution
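The entity hierarchy above can be sketched as three small data types. This is an illustrative model only: the class and method names are assumptions, and the placement of components on particular servers is invented for the example rather than taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    """Location on which components are hosted or accessed."""
    name: str


@dataclass
class Component:
    """Application or service installed on a logical host."""
    name: str
    host: Host


@dataclass
class Service:
    """Highest-level description of the end-to-end solution."""
    name: str
    components: List[Component] = field(default_factory=list)

    def hosts(self) -> List[str]:
        """Names of the distinct hosts the service depends on, sorted."""
        return sorted({c.host.name for c in self.components})


# Hypothetical example data; which component runs on which server is assumed.
server01 = Host("Server01")
server02 = Host("Server02")
payments = Service(
    "Payment System",
    [Component("Payment Submit", server01),
     Component("Payment Batch Status", server02)],
)
print(payments.hosts())
```

Recording the hierarchy explicitly like this is what makes it possible to answer "what is running where?" from data rather than from memory.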

Component Relationships

• Relationships

– Consumes

• The component that receives information

– Consumed

• The component that provides information

• Impact

– Green

• Outage of the component has no impact

– Amber

• Outage of the component will degrade the operation of the dependent components

– Red

• Outage of the component will stop the operation of the dependent components
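The relationship and impact rules above can be sketched as a small propagation routine over "consumes" edges. This is a sketch under stated assumptions: the edge list is hypothetical, and the rule that a degraded provider can at most degrade its consumers is an assumption added to make chains well defined; the slides do not specify it.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class State(IntEnum):
    UP = 0
    DEGRADED = 1
    DOWN = 2


# Impact colour of an edge -> worst state it imposes on the consumer
# when the consumed component is fully down.
IMPACT = {"green": State.UP, "amber": State.DEGRADED, "red": State.DOWN}

# (consumer, consumed, impact) -- hypothetical edges for illustration.
Edge = Tuple[str, str, str]


def propagate(edges: List[Edge], failed: List[str]) -> Dict[str, State]:
    """Return the derived state of every component named in edges/failed."""
    names = {n for c, p, _ in edges for n in (c, p)} | set(failed)
    state = {n: (State.DOWN if n in failed else State.UP) for n in names}
    changed = True
    while changed:  # repeat until no state gets worse (handles chains)
        changed = False
        for consumer, consumed, impact in edges:
            if state[consumed] == State.DOWN:
                imposed = IMPACT[impact]
            elif state[consumed] == State.DEGRADED:
                # Assumption: a degraded provider can at most degrade consumers.
                imposed = min(IMPACT[impact], State.DEGRADED)
            else:
                imposed = State.UP
            if imposed > state[consumer]:
                state[consumer] = imposed
                changed = True
    return state


edges = [
    ("Payment Workflow", "Payment Submit", "red"),
    ("Work Allocation", "Payment Workflow", "amber"),
    ("Payment Batch Status", "Payment Submit", "green"),
]
result = propagate(edges, failed=["Payment Submit"])
for name in sorted(result):
    print(name, result[name].name)
```

Because states only ever get worse during the loop, the routine terminates even if the dependency graph contains a cycle.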

Developing the model

An Example Component Relationship Mapping

[Diagram labels: Server01, Server02, Server03; Payment Submit, Payment Workflow, Work Allocation, Payment Batch Status; Workflow System, Payment System]

Health Model Hierarchy

[Diagram labels: Payments System, Workflow System; Payment Workflow, Work Allocation, Payment Batch Status, Payment Submit]

Operational Conditions

Managed Entity – Payment Submit

Aspect – Queue Lengths

Up

Down

Degraded

• High level condition

• Impact to the end user

Method of Collection

Managed Entity – Payment Submit

Aspect – Queue Lengths

Up

Down

Degraded

T3

T2

T1

Example items:

• Eventlog

• Standard Info

• Component

• Blame component

• State before

• State afterwards
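The example items above suggest that each state change is captured as a record. The sketch below reads T1 to T3 as transitions between the three states, which is an interpretation of the diagram rather than something the slide states; which pair of states each label joins, and all field and function names, are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Allowed transitions, keyed by an id. The ids mirror the T1-T3 labels on
# the slide, but which pair of states each label joins is an assumption.
TRANSITIONS: Dict[str, Tuple[str, str]] = {
    "T1": ("Up", "Degraded"),
    "T2": ("Degraded", "Down"),
    "T3": ("Down", "Up"),
}


@dataclass(frozen=True)
class TransitionRecord:
    """What gets captured when a managed entity changes state."""
    transition: str
    component: str
    blame_component: str
    state_before: str
    state_after: str
    event_source: str  # e.g. the event log the evidence came from


def apply_transition(current: str, transition: str, component: str,
                     blame_component: str, event_source: str) -> TransitionRecord:
    """Validate a transition against the model and return its record."""
    before, after = TRANSITIONS[transition]
    if before != current:
        raise ValueError(f"{transition} is not valid from state {current!r}")
    return TransitionRecord(transition, component, blame_component,
                            before, after, event_source)


# Hypothetical example: Payment Submit degrades, blamed on Payment Workflow.
record = apply_transition("Up", "T1", "Payment Submit",
                          blame_component="Payment Workflow",
                          event_source="Eventlog")
print(record.state_before, "->", record.state_after)
```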

Managed Entity – Payment Submit

Aspect – Queue Lengths

State Detector 1

Provider

NT event log

Perfmon data

WMI

SNMP

Log files

Syslog

Criteria

Where source=DCOM and Event ID=1006

Detector – What has gone wrong
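A detector of this kind pairs a provider with a matching criterion. The following minimal sketch expresses the slide's example criterion (source=DCOM and Event ID=1006) as a predicate over simplified event records. The `Event` fields and the helper names are assumptions for illustration; this is not MOM's rule format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Event:
    """A simplified event record; the field names are assumptions."""
    provider: str   # e.g. "NT event log"
    source: str
    event_id: int


def make_detector(provider: str, source: str, event_id: int) -> Callable[[Event], bool]:
    """Build a criteria predicate such as: source=DCOM and Event ID=1006."""
    def matches(event: Event) -> bool:
        return (event.provider == provider
                and event.source == source
                and event.event_id == event_id)
    return matches


def detect(events: Iterable[Event], detector: Callable[[Event], bool]) -> List[Event]:
    """Return the events that satisfy the detector's criteria."""
    return [e for e in events if detector(e)]


dcom_1006 = make_detector("NT event log", "DCOM", 1006)
sample = [
    Event("NT event log", "DCOM", 1006),   # matches provider and criteria
    Event("NT event log", "DCOM", 10016),  # same source, different event ID
    Event("Syslog", "DCOM", 1006),         # same criteria, different provider
]
print(len(detect(sample, dcom_1006)))
```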

Managed Entity – Payment Submit

Aspect – Queue Lengths

State Diagnoser

Response

Alert

Script

SNMP trap

Pager

E-Mail

Task

Managed Code

File Transfer

Diagnose – Identify Root Cause

Diagnoser

Resolver

Response

Alert

Script

SNMP trap

Pager

E-Mail

Task

Managed Code

File Transfer

Resolver – How do we fix it?

Managed Entity – Payment Submit

Aspect – Queue Lengths

State Verifier

Provider

NT event log

Perfmon data

WMI

SNMP

Log files

Syslog

Criteria

Where source=DCOM and Event ID=1006

Verifier – Has the problem been resolved?
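Taken together, the detector, diagnoser, resolver and verifier form one cycle per aspect. The sketch below walks a toy "Payment Submit" entity through that cycle. The queue threshold, the diagnosis rule and the simulated fix are hypothetical placeholders, not the rules used in the real health model.

```python
from typing import Dict, List

# A toy managed entity: Payment Submit with a queue-length aspect.
# The threshold and the "fix" are hypothetical placeholders.
QUEUE_LIMIT = 100


def detector(entity: Dict[str, int]) -> bool:
    """What has gone wrong? True when the queue is over the limit."""
    return entity["queue_length"] > QUEUE_LIMIT


def diagnoser(entity: Dict[str, int]) -> str:
    """Identify a root cause (here: a trivially simple rule)."""
    return "consumer stopped" if entity["consumers"] == 0 else "backlog"


def resolver(entity: Dict[str, int], cause: str) -> None:
    """How do we fix it? Simulate a response task for each cause."""
    if cause == "consumer stopped":
        entity["consumers"] = 1   # stand-in for restarting the consumer
    entity["queue_length"] = 0    # stand-in for the queue draining


def verifier(entity: Dict[str, int]) -> bool:
    """Has the problem been resolved? Re-check the detector's criteria."""
    return not detector(entity)


def run_health_cycle(entity: Dict[str, int]) -> List[str]:
    """Run detect -> diagnose -> resolve -> verify and log each step."""
    if not detector(entity):
        return ["healthy"]
    log: List[str] = ["detected"]
    cause = diagnoser(entity)
    log.append(f"diagnosed: {cause}")
    resolver(entity, cause)
    log.append("resolved")
    log.append("verified" if verifier(entity) else "escalate")
    return log


payment_submit = {"queue_length": 250, "consumers": 0}
steps = run_health_cycle(payment_submit)
print(steps)
```

Reusing the detector's criteria inside the verifier mirrors the slides, where the Detector and Verifier share the same provider and criteria.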

Method of collection – Task Model

Managed Entity – Payment Submit

Aspect – Queue Lengths

Up

Down

Degraded

T3

T2

T1

Example items:

• Description

• Operator Instructions & procedures

• Roles

• Frequency

Health Model

[Diagram: Health Model & Components. A Service View with four Business Application boxes and three Component boxes]

Limitations

• How to implement within MOM 2005

• Confined to physical servers

• No service view

• Server role versus component

System Monitoring – Summary

Business & Information Monitoring

Kev Robinson

Additional Monitoring & Reporting

• Business reporting

– Reporting business state rather than technical

• Business events

– e.g. suspicious payment data being processed – data driven alerts

• Operator tasks

– Restore service using common tasks

– Improve 1st level support

• Availability: Full proactive management. Automated responses. Visible “health-state”

• Business: Reduced downtime. Informed of what is happening. Reporting on service level exceptions

DSI

Dynamic Systems Initiative

DSI – Building Blocks

• Model based development tools

– System Definition Model

– Coming with Visual Studio 2005

• Operationally aware applications

– Management packs for MOM

– Application instrumentation

• Model-based Management

– State view & Service availability reporting

– Health models

• Dynamic Resource Availability

– Automated builds and deployment

– Performance governed deployment

– Improved hardware utilisation

DSI

Resilience & Dynamic Resource

Why Dynamic Resource?

• Meeting performance needs

– Year ends

– Business growth

– Internet adoption

– “Hidden internet threats”

• Biggest “hidden threats” to performance?

– Is it security?

– Is it hacking?

– Is it phishing?

– Clue: Insider threat…

[Illustration: two MARKETING adverts. Safe advert: "XXX". Unsafe advert: "£10 voucher for every 10,000th eBanking sign-on".]

Resilience

[Diagram: Users connect to a Technology Service built from paired Web, BizTalk and Database servers, with the databases in a SQL Cluster. The panels show: 1 server unavailable; 1 database taken down for maintenance; both servers unavailable.]

Dynamic Resourcing

[Diagram: Pool of Servers]

Conclusions

• Automate builds and deployments where possible

• Develop a health-model for business systems

– Needs to be factored in with the application development

• Improved monitoring & reporting

• Automated recovery and pro-active responses

• Make better use of hardware, reducing TCO

• Thank you

• Kev Robinson kev@nbs.co.uk

• Rob Morgan robj.morgan@nationwide.co.uk

Any Questions?

© Nationwide Building Society, 2005. Some images © Microsoft Ltd.