IBM Summer Internship Slides

  • 8/8/2019 Ibm Summer Internship Slides

    1/35

    Autonomic Computing Framework for Error Recovery in IBM WebSphere MQ

    A Proof of Concept

    Neeraj Bisht, Pawan HN & Vikram Subramanya
    Summer Interns of 2007

    IBM India Software Lab, Bangalore
    Manager: Arun Shivaswamy

    WebSphere MQ Group

    2/35

    Profile: IBM ISL Software Group

    IBM Software Group - largest middleware company in the world

    Brands: WebSphere, Information Mgmt., Lotus, Tivoli, and Rational

    Technology Areas: SOA, XML, Web 2.0, Application Servers, Databases, Autonomic Computing

    3/35

    Motivation for our PoC

    Current Scene: WebSphere MQ cannot come out of erroneous situations by itself
        Needs manual intervention

    Objective: To make MQ self-reliant
        Automatic monitoring/analysis of error
        Recovery action

    Gist: Expose MQ to AC

    4/35

    Autonomic Computing

    5/35

    What's Autonomic Computing?

    Aim: To create self-managing systems
        Overcome complexity by automating maintenance

    AC makes the system:
        Self-Configuring: adapt to changes, use policies
        Self-Healing: diagnose H/W or S/W disruptions
        Self-Optimizing: maximize IT resource usage
        Self-Protecting: defend from threats/attacks

    6/35

    MAPE-K Loop Architecture

    7/35

    The MAPE-K Loop in AC

    Monitor: Collect, filter details from managed resource

    Analyze: Learn IT envt., predict future

    Plan: Policy actions to achieve goals

    Execute: Run the plan

    Knowledge: Data shared among MAPE, like symptoms & policies
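
The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: the `Knowledge` class, the dictionary-based resource, and the event and action names are all hypothetical, and the Analyze step is reduced to a symptom lookup rather than any learning or prediction.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Data shared by all four phases (hypothetical structure)."""
    symptoms: dict   # observed error text -> symptom name
    policies: dict   # symptom name -> action name
    history: list = field(default_factory=list)


def monitor(resource):
    # Collect events from the managed resource and filter out non-errors.
    return [e for e in resource["events"] if e["severity"] == "error"]


def analyze(events, knowledge):
    # Map each filtered event to a known symptom, if there is one.
    return [knowledge.symptoms[e["text"]]
            for e in events if e["text"] in knowledge.symptoms]


def plan(symptoms, knowledge):
    # Look up the policy action for each detected symptom.
    return [knowledge.policies[s] for s in symptoms if s in knowledge.policies]


def execute(actions, resource, knowledge):
    # Run the plan; "restart" is the only action modeled in this sketch.
    for action in actions:
        if action == "restart":
            resource["running"] = True
        knowledge.history.append(action)


def mape_k_cycle(resource, knowledge):
    """One pass through Monitor, Analyze, Plan, Execute."""
    events = monitor(resource)
    symptoms = analyze(events, knowledge)
    actions = plan(symptoms, knowledge)
    execute(actions, resource, knowledge)
    resource["events"].clear()  # events have been handled
    return actions
```

Calling `mape_k_cycle` repeatedly on a resource whose `events` list receives an error entry would trigger the mapped action once per reported error; the `history` list in `Knowledge` records what was done.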

    8/35

    IBM WebSphere MQ

    9/35

    What's IBM WebSphere MQ?

    IBM's middleware for messaging & queuing

    Communication among programs across a heterogeneous network
        API calls

    10/35

    Messaging & Queuing

    MQ analogous to email, not phone!

    11/35

    Queue Manager Objects

    Queue: To store msg sent by programs; local or remote

    Channel: Logical communication link
        Message Channel: connects 2 QMgrs
        MQI Channel: connects client to QMgr

    12/35

    MQ Error Scenarios

    13/35

    QMgr Crash Error Scenarios

    QMgr crash by killing the OAM process amqzfuma.exe
        Recovery: Close connection, restart QMgr

    QMgr crash due to access violation in the agent process
        Recovery: Close connection, restart QMgr

    14/35

    More MQ Error Scenarios

    Older-version DLLs placed on the machine
        Recovery: Find installation path from registry; delete/rename

    DCOM user ID configured incorrectly
        Recovery: Run amqmjpse -s r

    15/35

    What Did We Do?

    These error scenarios were manually induced into MQ

    Populated Symptom catalog with possible errors. Parsed the generated error logs to detect them

    Developed AC framework (MAPE-K loop) to call recovery procedure automatically

    16/35

    IBM Tools Used

    17/35

    Error Log Analysis

    Eclipse-based tool, IBM Log & Trace Analyzer (LTA)

    Converts textual log records into Common Base Event (CBE) format by parsing

    Log View of LTA:

    18/35

    Symptom Database

    Knowledge base of problems & solutions for a software product
        Symptom description: Why does the problem occur?
        Rules to identify a problem: XPath expressions
        Recommended action

    LTA provides a symptom editor

    Can also be used for correlation of events
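
To make the idea of XPath rules concrete, here is a minimal Python sketch of a symptom catalog matched against an event. It is an assumption-laden illustration: the event is a simplified, CBE-like XML element rather than the full Common Base Event schema, the `msgId` values `DEMO0001` and `DEMO0002` are placeholders and not real MQ message codes, the action names are hypothetical, and the rules use only the XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Symptom:
    description: str   # why the problem occurs
    rule: str          # XPath rule that identifies the problem
    action: str        # recommended action (hypothetical name)


# Hypothetical catalog entries; the message IDs are placeholders.
CATALOG = [
    Symptom("Queue manager ended unexpectedly",
            "./CommonBaseEvent[@msgId='DEMO0001']", "restart_qmgr"),
    Symptom("Older-version DLL found on the machine",
            "./CommonBaseEvent[@msgId='DEMO0002']", "rename_old_dlls"),
]


def match_symptoms(event_xml, catalog=CATALOG):
    """Return every catalog entry whose XPath rule matches the event."""
    # Wrap the event so that rules can address it as a child element.
    wrapper = ET.Element("events")
    wrapper.append(ET.fromstring(event_xml))
    return [s for s in catalog if wrapper.find(s.rule) is not None]


event = '<CommonBaseEvent msgId="DEMO0001" severity="50" msg="queue manager QM1 ended"/>'
print([s.action for s in match_symptoms(event)])  # ['restart_qmgr']
```

Keeping description, rule, and action together in one record mirrors the three parts of a symptom listed above, so adding a new error scenario means adding one catalog entry.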

    19/35

    Symptom Editor in LTA

    20/35

    Closing MAPE-K Loop

    21/35

    IBM Problem Determination Assistant (PDA)

    Tool to achieve closed AC loop

    Components:
        Generic Log Adapter (GLA)
        Symptom Catalog
        Analysis Engine
        Action Processor
        Manager: Notification, configuration, auto-update

    22/35

    Our Project: GUI and Source Code Explained

    23/35

    Use Case Realization of QMgr Crash

    Management Application based on AC framework

    [Diagram. Labels: WebSphere MQ; Error Logs; GLA for WMQ; CBE for WMQ erroneous situation; Analysis Engine; Symptom Database for WMQ (Load rules); Notification Router; Correlation Engine (if needed); Action Processor; Management Data; Queue Manager Context; Action: Change; Action: Save CBE; Restart; Action APIs.]

    AC Centric Technologies: Generic Log Adapter (GLA) / Log Trace Analyzer (LTA) for WMQ; Runtime platform TPTP; XPath Correlation Engine (if needed)

    24/35

    Project GUI

    25/35

    List of MQ Processes

    26/35

    Putter Application

    Puts msg in a non-full queue
        While (Q.Connection not closed)
            If (Q.CurrDepth

    27/35

    Getter Application

    Receives msg in a non-empty queue
        While (Q.Connection not closed)
            If (Q.CurrDepth > 0)
                Q.Get(msg);
            Else wait();
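
The Putter and Getter loops can be tried out without MQ using Python's standard `queue.Queue` as a stand-in. This is a sketch under assumptions, not the project's source: the bounded in-process queue replaces a real MQ queue, a `threading.Event` named `closed` stands in for the connection state, and blocking `put`/`get` calls with a timeout replace the slides' explicit depth check followed by `wait()`.

```python
import queue
import threading


def putter(q, messages, closed):
    """Put each message while the connection is open, waiting if the queue is full."""
    for msg in messages:
        while not closed.is_set():
            try:
                q.put(msg, timeout=0.05)   # succeeds only when the queue is not full
                break
            except queue.Full:
                continue                   # queue full: wait and retry


def getter(q, received, expected, closed):
    """Get messages while the connection is open, waiting if the queue is empty."""
    while not closed.is_set() and len(received) < expected:
        try:
            received.append(q.get(timeout=0.05))  # succeeds only when non-empty
        except queue.Empty:
            continue                               # queue empty: wait and retry


def run_demo(count=10, max_depth=3):
    q = queue.Queue(maxsize=max_depth)     # max_depth plays the role of the queue's depth limit
    closed = threading.Event()             # set() would model a closed connection
    received = []
    messages = [f"msg-{i}" for i in range(count)]
    threads = [
        threading.Thread(target=putter, args=(q, messages, closed)),
        threading.Thread(target=getter, args=(q, received, count, closed)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    closed.set()
    return received
```

With one putter and one getter, `run_demo()` returns the messages in the order they were put, even though the queue never holds more than `max_depth` of them at once.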

    28/35

    Induce QMgr Crash: Kill OAM Process (!)

    Manually issue the command taskkill /f /im amqzfuma.exe
        forcefully kills the Fuma image

    QMgr crashes! All processes are killed

    29/35

    Putter & Getter Stop

    30/35

    Log File Monitor

    Call PDA to continuously ping in the background

    When generated, log file is parsed into CBE format

    Error is matched with the symptom catalog

    User is alerted

    Recovery action is called
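
The monitoring steps above can be sketched as a single polling function in Python. Everything specific here is an assumption made for illustration: the `timestamp|severity|message` record layout is invented and is not the MQ error log format, the parsed event is a plain dictionary rather than a real CBE, and the keyword catalog and the `alert`/`recover` callbacks are hypothetical stand-ins for PDA's components.

```python
import os
import tempfile

# Hypothetical catalog: keyword found in the message -> recovery action name.
CATALOG = {"ended unexpectedly": "restart_qmgr"}


def read_new_lines(path, offset):
    """Return the lines appended to the log since `offset`, plus the new offset."""
    if not os.path.exists(path):
        return [], offset                  # log file not generated yet
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read().decode("utf-8")
        return data.splitlines(), f.tell()


def parse_record(line):
    """Parse one 'timestamp|severity|message' record (invented format) into an event."""
    timestamp, severity, message = line.split("|", 2)
    return {"creationTime": timestamp, "severity": severity, "msg": message}


def poll_once(path, offset, catalog, alert, recover):
    """One monitoring pass: parse new records, match symptoms, alert, recover."""
    lines, offset = read_new_lines(path, offset)
    for line in lines:
        if not line.strip():
            continue
        event = parse_record(line)
        for keyword, action in catalog.items():
            if keyword in event["msg"]:
                alert(f"symptom matched: {keyword}")
                recover(action)
    return offset


def demo():
    alerts, actions = [], []
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "errors.log")
        offset = poll_once(path, 0, CATALOG, alerts.append, actions.append)
        with open(path, "w", encoding="utf-8") as f:
            f.write("2007-07-01T10:00:00|ERROR|queue manager QM1 ended unexpectedly\n")
        offset = poll_once(path, offset, CATALOG, alerts.append, actions.append)
        poll_once(path, offset, CATALOG, alerts.append, actions.append)  # nothing new
    return alerts, actions
```

A background monitor would call `poll_once` in a loop with a short sleep between passes, carrying the returned offset forward so that each record is processed only once.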

    31/35

    QMgr Restart Action API

    After detection of QMgr crash,
        Close all existing connections to the QMgr
        Restart using the command STRMQM
        If restart fails, wait and output error code
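
A restart action along these lines might look like the following Python sketch. It is not the project's Action API. The assumptions are: connection objects expose a `close()` method, a zero exit code from `strmqm` is treated as success, and the bounded retry count and the injectable `runner` parameter are additions made here so the logic can be exercised without an MQ installation.

```python
import subprocess
import time


def restart_qmgr(name, connections, runner=subprocess.run,
                 retries=3, wait_seconds=5.0):
    """Close connections, then try to restart the queue manager with strmqm.

    Returns the exit code of the last strmqm attempt (0 is treated as success).
    """
    # Step 1: close all existing connections to the queue manager.
    for conn in connections:
        conn.close()

    returncode = None
    for attempt in range(1, retries + 1):
        # Step 2: restart using the strmqm command.
        # With the default runner this raises FileNotFoundError if strmqm
        # is not on the PATH.
        returncode = runner(["strmqm", name]).returncode
        if returncode == 0:
            return 0
        # Step 3: on failure, output the error code and wait before retrying.
        print(f"strmqm attempt {attempt} failed with code {returncode}")
        if attempt < retries:
            time.sleep(wait_seconds)
    return returncode
```

Passing a fake `runner` that returns an object with a `returncode` attribute lets the close, retry, and reporting logic be checked in isolation before pointing it at a real queue manager.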

    32/35

    Putter & Getter Restart

    33/35

    What Have We Achieved?

    For the first time, benefits of autonomic computing are realized on WebSphere MQ

    Common MQ errors are successfully overcome in our demonstration

    Feasibility is high, since time & space cost is minimal

    Value addition to MQ as a self-managing resource

    34/35

    Future As We See

    AC framework extends to all MQ errors; Makes MQ completely Self-Reliant

    Manual intervention drastically reduces, cutting labor costs to IBM; Productivity increases

    We predict a Paradigm Shift in the MQ product & maintenance team

    35/35

    Thank You!

    Managers, Mr. Arun Shivaswamy of WebSphere MQ group & Mr. M R Ananda, AC team at IBM ISL, Bangalore

    Team-mates: Neeraj Bisht, IITB, Pawan HN, NITK and Vikram Subramanya, NITK

    MQ team, AC team at IBM

    IBM, for giving us valuable exposure to industrial research, with some cash coming our way too (!)