S068

17
Autonomic Computing © 2005 IBM Corporation The Role of Predictive Methods in Autonomic Computing April 27, 2005 Ric Telford Director of Architecture and Development, Autonomic Computing

Transcript of S068

Page 1: S068

Autonomic Computing

© 2005 IBM Corporation

The Role of Predictive Methods in Autonomic Computing
April 27, 2005

Ric Telford
Director of Architecture and Development, Autonomic Computing

Page 2: S068

Agenda

Autonomic Computing overview
AC Problem Determination Technologies
Customer Results
The Self-Healing Vision
Summary

Page 3: S068

Today’s Complex Infrastructure

Management of complex, heterogeneous environments is too difficult

IT asset utilisation is too low

Operational speed too slow; IT flexibility too limited

Privacy, security and business continuity

Inability to manage the infrastructure seamlessly

Swamped by the proliferation of technology and platforms to support

Page 4: S068

“IBM’s autonomic computing initiative will become its most important cross-product initiative (as the foundation of On Demand Business).”

— Thomas Bittman, Gartner

Providing customer value

Increased return on IT investment

Improved flexibility, resiliency and quality of service

Accelerated time to value

Focus on business value, not infrastructure

Autonomic Computing delivers intelligent open systems that:

Sense and respond to ever-changing environments
Adapt to unpredictable conditions
Continuously tune themselves
Prevent and recover from failures
Provide a safe environment

Page 5: S068

IBM Autonomic Computing Structure

Technology areas: Problem Determination, Provisioning, Admin Console, Workload Mgt

Open Standards: Common log format; Solution installation schema

Autonomic Computing Architecture: Autonomic Computing Control Loop; Autonomic Computing Architecture Blueprint

Autonomic Computing Common Components: Log/Trace Analyzer; Generic Log Adapter; Solution installation & dependency checking; Installation Management Engine; Common Console; Autonomic Management Engine

Products delivering autonomic features: 50 products with 415+ features; Partner solutions

Page 6: S068

Autonomic Computing: Problem Determination Technologies

Page 7: S068

The Pain Point…

[Diagram: a typical multi-tier infrastructure with "You" at the center, surrounded by HTTP servers, application servers, data servers, backup servers, edge servers, security servers, policy servers, managing servers, LDAP registries, load balancers, firewalls, and network routers/switches.]

Page 8: S068

Today’s Approach… Internal SWAT Team – The Manual Process

Requires:

Key resources across the IT staff to get the breadth of skills to understand the end-to-end problem

Deep understanding of log file formats

Deep understanding of system components

Result:

Multiple man-hours/days/weeks of effort

Political issues – passing the blame

Insufficient / inadequate data can cause this approach to fail

Customers are repeating this step today for every major IT outage

Page 9: S068

Problem determination: log format today vs. log format tomorrow

Log format today: disparate pieces and parts
Tools focused on individual products
No common interfaces among tools
No synergies in building tools OR in creating log entries

Log format tomorrow: Common Base Event, an OASIS standard
Generic log adapter
Common format for log files
Common set of tools
Common interfaces among tools

[Diagram: adapters convert logs from databases, networks, application servers, servers, storage devices and applications into Common Base Events.]
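The adapter idea above can be sketched in a few lines of Python. This is a minimal sketch, not the Generic Log Adapter itself: the native line format, the regular expression, and the level-to-severity table are assumptions made for illustration. Only the output field names (creationTime, severity, msg, sourceComponentId, component, location) come from the Common Base Event format.

```python
import re
from typing import Optional

# Hypothetical native format: "YYYY-MM-DD HH:MM:SS LEVEL [component] message".
# A real adapter would be driven by per-format parsing rules; this one regex
# stands in for a single such rule.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<component>[^\]]+)\] (?P<msg>.*)$"
)

# Assumed mapping from native level names to CBE-style numeric severities.
SEVERITY = {"FATAL": 60, "ERROR": 50, "WARN": 30, "INFO": 10}


def to_common_base_event(line: str, location: str) -> Optional[dict]:
    """Normalize one native log line into a CBE-like dictionary."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # not this adapter's format; another adapter may handle it
    return {
        "creationTime": f"{m['date']}T{m['time']}",
        "severity": SEVERITY.get(m["level"], 0),  # 0 = unknown level
        "msg": m["msg"],
        "sourceComponentId": {"component": m["component"], "location": location},
    }


event = to_common_base_event("2005-04-27 10:15:30 ERROR [db2] Connection refused", "host01")
print(event["severity"], event["creationTime"])  # 50 2005-04-27T10:15:30
```

Returning None for unrecognized lines lets several format-specific adapters be tried in turn, which is the point of putting a common format behind many adapters.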

Page 10: S068

Common Base Event Format

CommonBaseEvent
  extensionName : String
  localInstanceId : String
  globalInstanceId : String
  creationTime : String
  severity : short
  priority : short
  situationType : String
  msg : String
  repeatCount : short
  elapsedTime : long
  sequenceNumber : long
  version : String = commonbaseevent1_0
  +msgDataElement : MsgDataElement [0..1]
  +sourceComponentId : ComponentIdentification [1]
  +reporterComponentId : ComponentIdentification [0..1]
  +contextDataElements : ContextDataElement [0..n]
  +extendedDataElements : ExtendedDataElement [0..n]
  +associatedEvents : AssociatedEvent [0..n]

MsgDataElement
  msgId : String
  msgIdType : String
  msgCatalogId : String
  msgCatalogTokens : String[]
  msgCatalog : String
  msgLocale : String
  msgCatalogType : String

ComponentIdentification
  location : String
  locationType : String
  application : String
  executionEnvironment : String
  component : String
  subComponent : String
  componentIdType : String
  instanceId : String
  processId : String
  threadId : String

ContextDataElement
  contextId : String
  type : String
  name : String
  contextValue : String

ExtendedDataElement
  name : String
  type : String
  values : String[]
  hexValue : byte[]
  +children : ExtendedDataElement [0..n]

AssociatedEvent
  +associationEngine : AssociationEngine [0..1]
  +resolvedEvents : CommonBaseEvent [0..n]

AssociationEngine
  id : String
  name : String
  type : String
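To make the containment and multiplicity rules concrete, here is a hedged Python sketch of a small subset of the model using dataclasses: exactly one sourceComponentId, an optional reporterComponentId, and extended data elements that nest through children. Only a few attributes are included; the sample values ("Hostname", "ProductName", the "sql"/"sqlcode" data) and the helper find_extended are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComponentIdentification:
    # Subset of the ComponentIdentification attributes listed above.
    component: str
    subComponent: str
    location: str
    locationType: str
    componentIdType: str


@dataclass
class ExtendedDataElement:
    name: str
    type: str
    values: List[str] = field(default_factory=list)
    children: List["ExtendedDataElement"] = field(default_factory=list)  # 0..n


@dataclass
class CommonBaseEvent:
    creationTime: str
    sourceComponentId: ComponentIdentification  # exactly one
    severity: int = 0
    msg: str = ""
    reporterComponentId: Optional[ComponentIdentification] = None  # 0..1
    extendedDataElements: List[ExtendedDataElement] = field(default_factory=list)  # 0..n


def find_extended(event: CommonBaseEvent, path: str) -> Optional[ExtendedDataElement]:
    """Hypothetical helper: follow a '/'-separated name path through nested children."""
    candidates = event.extendedDataElements
    found: Optional[ExtendedDataElement] = None
    for name in path.split("/"):
        found = next((e for e in candidates if e.name == name), None)
        if found is None:
            return None
        candidates = found.children
    return found


# Illustrative values only.
evt = CommonBaseEvent(
    creationTime="2005-04-27T10:15:30",
    sourceComponentId=ComponentIdentification(
        "DB2", "engine", "host01", "Hostname", "ProductName"),
    severity=50,
    extendedDataElements=[
        ExtendedDataElement("sql", "noValue", children=[
            ExtendedDataElement("sqlcode", "int", values=["-911"]),
        ]),
    ],
)
print(find_extended(evt, "sql/sqlcode").values)  # ['-911']
```

Making sourceComponentId a required constructor argument, while reporterComponentId defaults to None, mirrors the [1] versus [0..1] multiplicities in the model.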

Page 11: S068

Supported Log Formats (Feb 2005)

AIX errpt log
AIX syslog
Apache HTTP Server access log
Apache HTTP Server error log
CICS Transaction Server for z/OS System message log
Common Base Event XML log
ESS (Shark) Problem log
IBM Communications Server log
IBM DB2 Express diagnostic log
IBM DB2 Universal Database CLI Trace log
IBM DB2 Universal Database JDBC trace log
IBM DB2 Universal Database SVC Dump on z/OS
IBM DB2 Universal Database Trace log
IBM DB2 Universal Database diagnostic log
IBM HTTP Server access log
IBM HTTP Server error log
IBM WebSphere Application Server activity log
IBM WebSphere Application Server for z/OS error log
IBM WebSphere Application Server plugin log
IBM WebSphere Application Server trace log
IBM WebSphere Commerce Server ecmsg log
IBM WebSphere Commerce Server ecmsg, stdout, stderr log
IBM WebSphere InterChange Server log
IBM WebSphere MQ FDC log
IBM WebSphere MQ error log
IBM WebSphere MQ for z/OS Joblog
IBM WebSphere Portal Server appserver_err log
IBM WebSphere Portal Server appserverout log
IBM WebSphere Portal Server run-time information log
IBM WebSphere Portal Server systemerr log
IBM WebSphere Portal Server systemout log
IBM WebSphere Edge Server log
Javacore log
Logging Utilities XML log
Microsoft Windows Application log
Microsoft Windows Security log
Microsoft Windows System log
Oracle JDBC trace log
Oracle alert log
Oracle listener log
Oracle server log
Rational TestManager log
RedHat syslog
SAN File System log
SAN Volume Controller error log
SAP system log
Squadrons-S Problem log
SunOS syslog
SunOS vold log
TXSeries CICS Console/CSMT log
z/OS Component trace
z/OS GTF trace
z/OS Joblog
z/OS Logrec
z/OS System log (SYSLOG)
z/OS System trace
z/OS master trace

Page 12: S068

Log Correlation – Generating the End-to-End View

Transition from trying to understand log formats to identifying ways to analyze the overall data and the end-to-end view

Move the Mindset from Monitoring to Analysis

With Correlation IDs in place, or Correlation methods identified: implement a Correlation Engine in the Log Analyzer

Generate a sequence diagram showing the log interactions and sequence of events

Help the IT staff home in on where the problem occurred: identify quickly where to concentrate efforts
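The correlation step above amounts to grouping events that share a correlation ID and ordering each group in time, which already yields a textual sequence of interactions. A minimal sketch follows, assuming CBE-like dictionaries whose correlator travels in a context data element of type "correlationId"; that context type, the sample events, and the function names are hypothetical, and a production correlation engine would do considerably more (clock-skew handling, inferred correlations when no ID exists).

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional


def correlation_id(event: dict, context_type: str = "correlationId") -> Optional[str]:
    # Assumption: the correlator is carried as a context data element of this type.
    for ctx in event.get("contextDataElements", []):
        if ctx.get("type") == context_type:
            return ctx.get("contextValue")
    return None


def correlate(events: List[dict]) -> Dict[str, List[dict]]:
    """Group events by correlation ID and sort each group by creationTime."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        cid = correlation_id(event)
        if cid is not None:
            groups[cid].append(event)
    for group in groups.values():
        group.sort(key=lambda e: datetime.fromisoformat(e["creationTime"]))
    return dict(groups)


def sequence(group: List[dict]) -> List[str]:
    """Render one correlated transaction as a simple text sequence."""
    return [f"{e['creationTime']}  {e['sourceComponentId']['component']}: {e['msg']}"
            for e in group]


def ev(t: str, comp: str, msg: str, cid: str) -> dict:
    # Helper to build illustrative CBE-like events.
    return {"creationTime": t, "sourceComponentId": {"component": comp}, "msg": msg,
            "contextDataElements": [{"type": "correlationId", "contextValue": cid}]}


events = [
    ev("2005-04-27T10:15:32", "DB2", "Deadlock detected", "tx-42"),
    ev("2005-04-27T10:15:30", "HTTP Server", "GET /order", "tx-42"),
    ev("2005-04-27T10:15:31", "WebSphere", "Calling order service", "tx-42"),
    ev("2005-04-27T10:15:31", "HTTP Server", "GET /home", "tx-7"),
]
groups = correlate(events)
for line in sequence(groups["tx-42"]):
    print(line)
```

Reading the tx-42 sequence top to bottom shows the request entering at the HTTP server and the failure surfacing in DB2, which is where the text suggests the IT staff should concentrate.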

Page 13: S068

End Results…

From multiple IT-skilled resources to a single PD-skilled (problem determination) resource

From multiple man-hours / days / weeks of analysis to root cause identification in hours / minutes

From an unstructured SWAT team approach with success unknown to a repeatable process with a reusable set of tools

Page 14: S068

Self-Healing - Customer Results
From several hours/days to less than one hour

85% Improvement

70% Improvement

50% Improvement

10 to 30% Savings in IT Support Costs

50% Improvement – IBM’s SAP Deployment

60% Improvement

60% Improvement

20 to 30% Improvement

10 to 20% improvement in operational staff productivity – IBM Software Delivery and Fulfillment

From 3 people 2 hours to 1 person 15 min

40% Improvement

75% Improvement

New in 2005
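The one result stated in absolute terms ("from 3 people 2 hours to 1 person 15 min") can be restated as effort in person-hours:

$$3 \times 2\,\text{h} = 6\ \text{person-hours} \;\longrightarrow\; 1 \times 0.25\,\text{h} = 0.25\ \text{person-hours}$$

$$\text{reduction} = 1 - \frac{0.25}{6} \approx 0.958 \approx 96\%$$

That is, roughly a 96% reduction in staff time for that case.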

Page 15: S068

Self-Healing Roadmap

Capture (2004): Event Representation, Adapters, IBM Deployers
Standard data model for common situation and event reporting
Tooling for easy adoption of standard
Commitments from IBM brands and IBM Partners to support the data model

Analysis (2004-2005): Knowledge Representation, Event Correlation and Analysis, Partner Deployers
Standardize data model for symptom analysis
Transport & correlate events from all components in IT infrastructure
Predictive Analysis Constructs
ARM Correlation

Remediation (2006): Action Representation, Knowledge Accumulation, Customer Pull
Standardize data model for change requests, change plans
Standardize grammar to describe change requests and constraints
Allow analysis and planning when uncertainty is present
Allow human to determine recovery action
High-profile customer deployments and references

Self Healing (2007): Business Policy, Continuous Availability, Knowledge Sharing
Business policies guide self-healing system
Preemptive diagnostics automatically recognize and resolve problems
Call home facilities are integrated as part of self-healing solutions
Symptom data made available to customers, ISVs, partners
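The roadmap's call for a standard data model for symptom analysis rests on a simple idea: a symptom pairs a recognizable pattern in events with knowledge about what that pattern means. A minimal sketch of the matching step follows, assuming CBE-like dictionaries as input; the Symptom class, the two sample rules, and their recommendations are hypothetical and do not reflect any particular IBM symptom format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Symptom:
    """Hypothetical symptom: a name, a predicate over a CBE-like event, and advice."""
    name: str
    matches: Callable[[dict], bool]
    recommendation: str


# Illustrative symptom knowledge; the rules and advice are invented examples.
SYMPTOMS = [
    Symptom(
        name="db-connection-refused",
        matches=lambda e: (
            e["sourceComponentId"]["component"] == "db2"
            and "connection refused" in e["msg"].lower()
        ),
        recommendation="Check that the database instance is started",
    ),
    Symptom(
        name="high-severity",
        matches=lambda e: e["severity"] >= 50,
        recommendation="Escalate to an operator",
    ),
]


def diagnose(event: dict, symptoms: List[Symptom] = SYMPTOMS) -> List[Symptom]:
    """Return every symptom whose predicate matches the event, in rule order."""
    return [s for s in symptoms if s.matches(event)]


event = {
    "creationTime": "2005-04-27T10:15:30",
    "severity": 50,
    "msg": "Connection refused",
    "sourceComponentId": {"component": "db2", "location": "host01"},
}
for symptom in diagnose(event):
    print(symptom.name, "->", symptom.recommendation)
```

Keeping the rules as data rather than code paths is what would allow symptom knowledge to be accumulated and shared, as the later roadmap stages describe.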

Page 16: S068

Self-Healing Vision

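[Diagram: increased embedded self-management function. Managed resources (WinSS; AIX with DB2 and MQ; z/OS with DB2 and MQ; CallHome), each with an MA/EP, emit CBEs through adapters. An autonomic control loop (Monitor, Analyze, Plan, Execute, sharing Knowledge) connects to the resources through a Sensor and an Effector: CBEs and symptoms flow in for analysis, while change types and change plans flow out as actions. IT professionals work through tooling for symptoms, policy and configuration, and through human-based MAs and associated tooling for correlation, analysis and viewing.]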
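The control loop in the vision (Monitor, Analyze, Plan, Execute around shared Knowledge, with a Sensor and an Effector at the managed resource) can be sketched as a small class. This is an illustration under stated assumptions, not IBM's Autonomic Management Engine: the sensor and effector are plain callables, the knowledge base is a dictionary from situationType values to actions, and the action strings are hypothetical.

```python
from typing import Callable, Dict, List


class AutonomicManager:
    """A minimal Monitor-Analyze-Plan-Execute loop over shared Knowledge (illustrative)."""

    def __init__(self, sensor: Callable[[], List[dict]],
                 effector: Callable[[str], None],
                 knowledge: Dict[str, str]):
        self.sensor = sensor        # returns CBE-like events from the resource
        self.effector = effector    # applies one action to the resource
        self.knowledge = knowledge  # hypothetical map: situationType -> action

    def monitor(self) -> List[dict]:
        return self.sensor()

    def analyze(self, events: List[dict]) -> List[str]:
        # A symptom is recognized when an event carries a known situationType.
        return [e["situationType"] for e in events
                if e.get("situationType") in self.knowledge]

    def plan(self, symptoms: List[str]) -> List[str]:
        # Change plan: one action per distinct symptom, in first-seen order.
        return [self.knowledge[s] for s in dict.fromkeys(symptoms)]

    def execute(self, change_plan: List[str]) -> None:
        for action in change_plan:
            self.effector(action)

    def run_once(self) -> List[str]:
        change_plan = self.plan(self.analyze(self.monitor()))
        self.execute(change_plan)
        return change_plan


applied: List[str] = []
knowledge = {
    "StopSituation": "restart component",
    "ConnectSituation": "reset connection pool",
}
events = [
    {"situationType": "ConnectSituation"},
    {"situationType": "ReportSituation"},
    {"situationType": "ConnectSituation"},
]
manager = AutonomicManager(lambda: events, applied.append, knowledge)
plan = manager.run_once()
print(plan)  # ['reset connection pool']
```

The "Allow human to determine recovery action" stage of the roadmap would correspond to replacing the effector with a callable that presents the change plan to an operator instead of applying it.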

Page 17: S068

Summary

IBM’s Autonomic Computing initiative has helped deliver the right “hygiene” to enable better Problem Determination across the industry

Predictive technologies can capitalize on this hygiene to help automate the “Problem Determination” process

We need continued research and cooperation across IBM and the industry at large to make the vision of Self-Healing systems a reality!