Service Management for Service Providers
Sanil Nambiar, Senior IT Specialist, Tivoli Netcool Solutions
Agenda
• Service Assurance Scope: What is there to be assured?
• Getting there from here
  – Get all the events
  – Get the performance
  – Discover the components and their relationships
  – Process automations
  – Marry events to the service model
• Reference Architectures
• Conclusion
Addressing the Assurance Challenge

Network & Service Assurance Challenges
1. Inability to offer guaranteed SLAs and differentiated service to business customers
2. Inability to link resource-level issues to service- and customer-level impacts
3. Lack of standardized operations processes across the various silos
4. High degree of manual processes, lack of proactive management, and lack of automated testing
5. Lack of accurate network topology for troubleshooting
6. Limitations of incumbent service assurance systems in handling next-generation complexity

Services: MMS, Location-based Service, TV, Gaming, Content Services, Unified Communications, Fixed Voice, VPN, High-speed Internet, Centrex, Mobile Voice
User Experience
Next Generation Network: IP, MPLS, Ethernet, DWDM, xDSL, WiMax, 3G

Customer Expectations
• Service ALWAYS ON & EASY to USE
• FAST ACCESS to services
• Get a dial tone EVERY TIME
• NEVER miss a call
• GREAT video and CRYSTAL CLEAR voice quality
• FAST channel selection
• RICH content available on-demand
• EFFICIENT customer service
Scope of Assurance
Telco Desired End Goal: An automated Service Assurance process in which operators manage only the exceptions
Key Assurance Processes
• Continuous resource status and performance monitoring to proactively detect possible failures (Fault Management)
• Collection and analysis of performance data to identify potential problems and resolve them without impact to the customer (Performance Management/QoS)
• Resource and service testing (Test Management, shared with Fulfillment)
• Management of SLAs and reporting of service performance to the customer (Service Management)
• Receiving trouble reports from the customer, informing the customer of the trouble status, and ensuring restoration and repair, as well as a delighted customer (Trouble Management)
• Effective management of network change (Change Management)
[Figure: TMF eTOM Process Map, Assurance processes: Customer Interface Management; Problem Handling; Customer QoS/SLA Management; Retention & Loyalty; Service Problem Management; Service Quality Management; Resource Trouble Management; Resource Performance Management; Resource Data Collection & Processing; Supplier/Partner Interface Management; S/P Problem Reporting & Management; S/P Performance Management]
[Figure: TMF Telecommunications Application Map, Assurance applications: Customer Self Management; Customer Contact, Retention & Loyalty; Customer Service/Account Problem Resolution; Customer QoS/SLA Management; SLA Management; Service Problem Management; Service Performance Management; Service Quality Monitoring & Impact Analysis; Resource Performance Monitoring/Management; Resource Problem/Fault Management; Correlation & Root Cause Analysis; Resource Status Monitoring; Resource Testing Management; Resource Data Mediation; Resource Domain Management (Network, IT Computing, IT Applications)]
Service Assurance: What Is Out There to Be Assured?
Domains: Services, Access, Aggregation, Core, Content (plus OSS/BSS and Client)
• Fixed
  – Triple Play: Voice, Broadband, IPTV/VoD; Other: Gaming, etc.
  – Physical: Copper Wire, Fiber, PSTN Switch; Broadband: DSLAM, xDSL
• Mobile
  – Quad Play: Mobility, Voice, Broadband, IPTV/VoD; Other: Presence, etc.
  – Physical: Radio Area Network, BTS/BCS; Broadband: GPRS, EDGE, 3G
• Cable
  – Triple Play: TV, Broadband, VoIP; Other: PPV, etc.
  – Physical: HFC Coax; Broadband: CMTS, PacketCable, CableLabs
• Network: NGN, MPLS, Optical, Metro Ethernet
• Content: VoIP, VoD, Game, P2P, TV
• ISP, Third Parties, PSTN, Internet
How to Assure? – Best Practices
• Assure from the User Point-of-View
  – Measure User Experience and Service Level
  – Build User/Service-oriented views
• Assure to Add Business Value
  – Prioritize issues based on $ value
  – Integrate with high-business-value databases
• Assure where High Costs are Involved
  – Focus on high-$-spending areas
  – Avoid Field Services overhead
  – Use Assurance to manage SLAs/SLOs
• Assure by Integrating with Processes
  – Feed back important information to managers
• Assure with End-to-End Convergent Service Views
  – IT + Network
  – e.g.: Is the service up and being billed?
Assure from the User Point-of-View
• Service/User Views
Assure with End-to-End Convergent Service Views
• End-to-End Views
Assure to Add Business Value
• Focus on Higher Costs (Labor / Infrastructure)
• Reduce OPEX, Increase Revenue ($)
• Prioritize Issues based on Highest Business Impact (Business Priorities)
• Focus on Highest Revenue Customers
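"Prioritize issues based on highest business impact" can be made concrete with a ranking rule. The sketch below is one possible rule, not a method from this deck: it assumes each incident already carries the list of customers it impacts (for example, from a service model), and it ranks by the summed monthly revenue of those customers, breaking ties by severity. The `Incident` type, the revenue table, and the scoring rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    incident_id: str
    severity: int                  # higher means worse, e.g. 1..5
    impacted_customers: List[str]


def business_impact(incident: Incident, monthly_revenue: Dict[str, float]) -> float:
    """Sum the monthly revenue of each distinct impacted customer (assumed rule)."""
    return sum(monthly_revenue.get(c, 0.0) for c in set(incident.impacted_customers))


def prioritize(incidents: List[Incident],
               monthly_revenue: Dict[str, float]) -> List[Incident]:
    """Highest revenue at risk first; severity breaks ties."""
    return sorted(
        incidents,
        key=lambda i: (business_impact(i, monthly_revenue), i.severity),
        reverse=True,
    )
```

With this rule, a severity-3 fault affecting a large enterprise customer outranks a severity-5 fault affecting a small one, which is the intent of "Prioritize issues based on $ value".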
Assure where High Costs are Involved
• Workforce Automation; Chain Management: Optimize for Efficiency
• Integrate with Trouble Ticketing (TT) to Reduce Customer Visits
• Increase Efficiency in High-Cost Areas
• Manage Partner SLAs/SLOs
Assure by Integrating with Processes
• Workforce Automation; Manager Data; Business DB Feedback
• Integrate with TT to Optimize Customer Care
• Report DB Issues for Improved Data Accuracy
• Keep Managers Informed with Customized Reports
Before and After
BEFORE: Security, Performance, and Fault are watched by separate groups (Security OPS, Fault OPS, Perf OPS), with manual correlation across groups to reach the isolated fault. Correlation between groups takes place on the phone, and this is expensive.
AFTER: With a resource model, a service model, and a service path model, the cross-silo correlation gets done first by leveraging the service model, path awareness, and service path modeling, and the isolated fault is sent to the correct group for further action.
Event Management and Enrichment
• Centralize Events
  – Simple return-on-investment case: it's cheaper to have 5 operators watching the same console than to have 5 operators watching 5 different consoles
• Why Events?
  – Free: they don't have to be polled for
    » No load on device
    » No load on network
    » No load on poller
  – Asynchronous: no waiting for the polling cycle to catch up to the fault
• Enrich events with resource, service, and customer information
• Create the fewest and best trouble tickets possible
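A minimal sketch of "centralize, enrich, and create the fewest tickets" follows. It assumes events are keyed by node plus a hypothetical `alert_key`; repeats of the same key collapse into one console row with a counter, and each new row is enriched once from a hypothetical inventory lookup that maps a resource to its service and customer. The record shapes and field names are illustrative, not any product's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Event:
    """A raw event from a probe or trap receiver (hypothetical shape)."""
    node: str
    alert_key: str
    severity: int
    summary: str


@dataclass
class Alert:
    """One console row; repeats of the same event collapse into it."""
    node: str
    alert_key: str
    severity: int
    summary: str
    tally: int = 1
    context: Dict[str, str] = field(default_factory=dict)


def process_events(events: Iterable[Event],
                   inventory: Dict[str, Dict[str, str]]) -> List[Alert]:
    """Deduplicate events into alerts and enrich each alert from inventory."""
    alerts: Dict[Tuple[str, str], Alert] = {}
    for ev in events:
        key = (ev.node, ev.alert_key)
        existing = alerts.get(key)
        if existing is not None:
            # Deduplicate: count the repeat and keep the worst severity seen.
            existing.tally += 1
            existing.severity = max(existing.severity, ev.severity)
            existing.summary = ev.summary
        else:
            # Enrich once, when the alert is first created.
            alerts[key] = Alert(ev.node, ev.alert_key, ev.severity, ev.summary,
                                context=dict(inventory.get(ev.node, {})))
    return list(alerts.values())
```

Two identical link-down traps from one router therefore produce one enriched row with a tally of 2, which is one row an operator (or a ticketing rule) has to act on instead of two.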
Resource Performance Management
• Performance Management: Capacity Planning, Operations, Customers
• Point performance
• Point-to-point performance
  – Transaction monitors
  – Synthetic transactions
• Performance along a path
  – Application path, service path
  – SOA
• Real-time transaction reporting: managing to goals
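"Synthetic transactions" and "managing to goals" combine naturally: time a scripted transaction repeatedly, then report what fraction met a response-time goal. The sketch below assumes a goal in milliseconds and a hypothetical attainment target of 95%; the function names are illustrative, and the transaction itself is any callable you supply.

```python
import time
from typing import Callable, Sequence


def time_synthetic_transaction(txn: Callable[[], object]) -> float:
    """Run one synthetic transaction and return its response time in ms."""
    start = time.perf_counter()
    txn()
    return (time.perf_counter() - start) * 1000.0


def goal_attainment(samples_ms: Sequence[float], goal_ms: float) -> float:
    """Fraction of transactions whose response time met the goal."""
    if not samples_ms:
        raise ValueError("no samples")
    met = sum(1 for s in samples_ms if s <= goal_ms)
    return met / len(samples_ms)


def missing_goal(samples_ms: Sequence[float], goal_ms: float,
                 target: float = 0.95) -> bool:
    """True when fewer than `target` of the transactions met the goal (assumed policy)."""
    return goal_attainment(samples_ms, goal_ms) < target
```

Reporting attainment against a goal, rather than raw averages, keeps the report aligned with how an SLA or SLO is usually phrased.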
Service Visibility is Key
[Figure: End-to-end service flow across Security, Application Server, CICS, Container, Web Services Container, WebSphere Message Broker, zSeries]
• Visibility into the complete path of the transaction is key to maintaining high service levels
• Understanding the performance of the services AND the health of the underlying resources is key to quickly isolating and correcting problems
• Relating services to business processes completes the picture
Discovery of Resources and Dependencies
• What applications are on what servers, and how they connect to and depend on each other
• How do they connect to the network?
  – Where everything connects to everything
  – What is the network topology? Layers 1, 2, and 3
  – Map events to network topology; topology-based RCA
  – Optimal network inventory: topology reconciliation
• Build your service model on top of the resource model you discovered

Discovery & Dependencies
• Cross-tier application maps
• Configuration changes
• Launch in context to configuration details panels
• Discovered resources to service models: automated service model building
• Providing visibility into application topology and availability to end users and business users is always cumbersome due to complex dependencies between the components making up the application.
• Problems with availability are frequently caused by changes made to components supporting delivery of the application; identifying and tracking this change is essential to service quality.
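"Build your service model on top of the resource model" means that once discovery has produced "X depends on Y" relationships, the impact of a failed resource can be computed by walking those relationships upward. The sketch below assumes the discovered model is a plain dictionary from each component to the components it depends on; the component names in the usage example are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def build_dependents(depends_on: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert 'X depends on Y' edges into 'Y supports X'."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for component, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(component)
    return dependents


def impacted_by(failed: str, depends_on: Dict[str, Iterable[str]]) -> Set[str]:
    """Everything that directly or transitively depends on `failed` (breadth-first)."""
    dependents = build_dependents(depends_on)
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

For a model such as `{"Online Banking": ["web-app"], "web-app": ["app-server-1", "db-1"], "app-server-1": ["switch-a"]}`, a fault on `switch-a` marks `app-server-1`, `web-app`, and `Online Banking` as impacted. The same traversal can answer "which configuration items does this change touch?" before a change is approved.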
Process Automation Architecture
[Figure: Service Assurance Platform. Users: Tier 1/2 techs, Tier 3 techs, field techs, command & control centers (NOC, SOC), account managers, and CSRs. A Process Server / Enterprise Service Bus (eTOM & ITIL processes, SID entities, OSS/J, MTOSI) connects through SOA adaptors and web services (WS) to Inventory Management, Trouble Management, Change Management, Fault Management, Service Quality Management, Performance Management, Rules Management, and Test Management (Brix, Empirix). Operations functions: service models, dashboards, TCAs, topology lookups.]
Process Automation Use-Cases
• Change Management process: Create Change Request (CR) -> Approve & Schedule CR -> Implement CR -> Verify CR is successful -> Close CR
• Rule-based Ticket Enrichment: Identify impacted Configuration Items
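The change management use-case is a linear lifecycle, and rule-based ticket enrichment can use it: when a trouble ticket arrives for a configuration item, attach any change request that touches that item and is not yet closed. The sketch below is a hypothetical model of both; the state names mirror the five steps listed above, while the classes, the enrichment rule, and the identifiers are illustrative assumptions rather than a product workflow.

```python
from enum import Enum
from typing import Iterable, List


class CRState(Enum):
    CREATED = "Created"
    APPROVED_SCHEDULED = "Approved & Scheduled"
    IMPLEMENTED = "Implemented"
    VERIFIED = "Verified"
    CLOSED = "Closed"


# Each state may only move to the next step of the flow.
NEXT_STATE = {
    CRState.CREATED: CRState.APPROVED_SCHEDULED,
    CRState.APPROVED_SCHEDULED: CRState.IMPLEMENTED,
    CRState.IMPLEMENTED: CRState.VERIFIED,
    CRState.VERIFIED: CRState.CLOSED,
}


class ChangeRequest:
    """Hypothetical CR record holding the configuration items it impacts."""

    def __init__(self, cr_id: str, impacted_cis: Iterable[str]) -> None:
        self.cr_id = cr_id
        self.impacted_cis = set(impacted_cis)
        self.state = CRState.CREATED

    def advance(self) -> CRState:
        """Move to the next lifecycle step; a closed CR cannot advance."""
        if self.state not in NEXT_STATE:
            raise ValueError(f"{self.cr_id} is already {self.state.value}")
        self.state = NEXT_STATE[self.state]
        return self.state


def open_changes_touching(ci: str, changes: Iterable[ChangeRequest]) -> List[str]:
    """Enrichment rule (assumed): list not-yet-closed CRs that include the ticket's CI."""
    return [cr.cr_id for cr in changes
            if ci in cr.impacted_cis and cr.state is not CRState.CLOSED]
```

Attaching in-flight changes to a ticket addresses the point made under Discovery & Dependencies that availability problems are frequently caused by changes to supporting components.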
Network Failure RCA & Service Level Correlation
• Network SNMP event
• Topology-based root cause analysis
• Dynamic event enrichment from an external repository
• Service level correlation
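Topology-based root cause analysis can be illustrated with a simplified rule: a down device is a root cause only if nothing upstream of it (toward the poller) is also down; otherwise it is a symptom masked by the nearest down upstream device. The sketch assumes each device has a single upstream parent, which is a simplification (real networks have redundant paths), and the device names in the test are hypothetical.

```python
from typing import Dict, Iterator, Optional, Set, Tuple


def upstream_chain(device: str, parent: Dict[str, Optional[str]]) -> Iterator[str]:
    """Yield devices between `device` and the poller, nearest first (cycle-safe)."""
    seen = {device}
    node = parent.get(device)
    while node is not None and node not in seen:
        seen.add(node)
        yield node
        node = parent.get(node)


def classify(down: Set[str],
             parent: Dict[str, Optional[str]]) -> Tuple[Set[str], Dict[str, str]]:
    """Split down devices into root causes and symptoms (symptom -> nearest down upstream)."""
    roots: Set[str] = set()
    symptoms: Dict[str, str] = {}
    for device in down:
        blocker = next((u for u in upstream_chain(device, parent) if u in down), None)
        if blocker is None:
            roots.add(device)
        else:
            symptoms[device] = blocker
    return roots, symptoms
```

When an aggregation switch fails, the access switches behind it also become unreachable; this rule reports only the aggregation switch as the root cause, so one ticket is raised instead of several.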
Network Failure RCA & Service Level Correlation (part 2)
• Problem is service affecting (the + icon)
• Parent service is Critical due to a ‘Worst Child’ dependency (i.e., if any child is critical, this service is also critical)
• Cache Servers instance is marginal (yellow) due to a ‘Percentage-of-children status’ dependency (i.e., if > 30% of children are in Critical state, the status is marginal; if > 70% of children are in Critical state, the status is critical)
• Drill down to service-affecting events
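The two status-propagation rules on this slide can be written down directly. The sketch below uses the slide's thresholds (more than 30% of children critical gives marginal, more than 70% gives critical) as default parameters; the three-level `Status` scale and the function names are assumptions made for illustration, not a product API.

```python
from enum import IntEnum
from typing import Sequence


class Status(IntEnum):
    OK = 0
    MARGINAL = 1
    CRITICAL = 2


def worst_child(children: Sequence[Status]) -> Status:
    """'Worst Child': the parent takes the worst status among its children."""
    return max(children, default=Status.OK)


def percentage_of_children(children: Sequence[Status],
                           marginal_pct: float = 30.0,
                           critical_pct: float = 70.0) -> Status:
    """'Percentage-of-children': rate the parent by the share of critical children."""
    if not children:
        return Status.OK
    pct_critical = 100.0 * sum(c == Status.CRITICAL for c in children) / len(children)
    if pct_critical > critical_pct:
        return Status.CRITICAL
    if pct_critical > marginal_pct:
        return Status.MARGINAL
    return Status.OK
```

The choice matters: one failed cache server out of four makes a worst-child parent critical, but leaves a percentage-of-children parent OK (25% critical), which suits pools of redundant servers.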
IBM’s Interpretation of TAM for Service Management
Service Assurance – Reference Architecture – Product Overlays
[Figure: Reference architecture spanning Fulfillment (Ordering, Provisioning, Activation), Service Assurance, Billing (Usage, Promos/Credits, Rebates), and OSS/BSS Integration. Components: Infrastructure Data Probes and Session Data Probes; Fault Management; Performance Management; Signaling Analysis; Transaction Analysis; Discovery; Physical/Logical Inventory; Change Management; Security Management; Problem Management (Incident, Trouble Ticket, Root Cause); Service Desk; Customer Service Portal; Service Dashboards (Real-time Dashboards, Service Navigation).]
• Service Modeling: defines dependencies between service components, defines the mapping to resources, and discovers dependencies dynamically
• Service Level Management: monitors service quality commitments and tracks service quality violations
• Service Impact: processes infrastructure events to understand the impact on end users and service delivery
• Service Quality Monitoring: service availability and usability, voice & video signalling, voice & video stream quality, service usage
Product overlays shown: ITNM, TADDM; ITNM, ITM w/TDW, OMNIbus, Impact; Proviso, Network Assure, ITM w/TDW, ITCAM, OMNIbus, Impact; ISS, TIM, TFIM, TAM, TSOM; ITCAM + IBM partners; TNSQMS; TNSQMS, Proviso; Proviso, Service Assure; TNSQMS, TEP, WBM; Tivoli Service Desk, CCMDB + IBM partners; TPM, WESB, WBS Fabric & WebSphere Process Server + IBM partners; ITUAM, TPM, WESB, WBS Fabric and WebSphere Process Server + IBM partners; CMDB, Maximo Asset + IBM partners; TNSQMS, CMDB, Proviso; IBM partners
WebSphere Portal Server based -> Launch into TEP, TBSM
Tivoli End-to-End Management for Service Providers
IBM Tivoli’s integrated portfolio covering wireless, wireline, IP, and IT domains enables end-to-end management of all elements supporting legacy and next-generation service delivery (e.g., IMS, fixed-mobile convergence).
• Vallent acquisition adds complementary service quality and network performance management
• Broad coverage for visibility into all network layers
• Scalability to handle growing complexity
• Modular deployment to provide immediate, incremental value
• Integration capabilities to leverage existing investments
[Figure: Coverage by domain: TNPMW (Wireless Performance Management), TNPM (Wireline Performance Management), Netcool (Availability Management), TNSQM (Service Quality Management), Business Correlation, Tivoli (IT Automation, Security & Storage Management); across the Service Delivery Platform (SDP), IP Network, Wireless Access Network (RAN), Core Network, and Core IP Network]
Tivoli Incremental Progression to Integrated Service Assurance
Path 1 (Start Here!)
1. Discovery: Discover the entire infrastructure, including topology
2. Event Correlation & RCA: Enable root cause analysis linking the discovered network to live faults
3. Incident and Problem Mgmt: Add automated linkage to trouble tickets from/to the event infrastructure
4. Add customer and business context, allowing better prioritization
Path 2 (Start Here!)
1. Network Performance Management: Analyse utilisation and forward capability of the infrastructure
2. Event Correlation: Alert on threshold violations for operator action
3. Incident and Problem Mgmt: Use the trouble ticket to drive the operational workflow to resolution
4. SLA Management: Visualize and report on conformance to customer SLAs
Path 3 (Start Here!)
1. Network Performance Management: Detects degradation in service to a particular geography
2. Business Correlation: Identifies enterprise customers impacted by the degradation
3. Real-Time Status Monitoring: Narrows the issue to the particular service impacted by oversubscription
4. Customer Experience Management: Further narrows problems to individuals with high-capacity services
Example products shown: SRM, OMNIbus, TBSM, ITNM, TADDM, Proviso, TNPMW, Impact, TNSQM
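The "Alert on threshold violations for operator action" step is commonly implemented with two thresholds, so that a metric hovering around a single limit does not raise and clear alerts on every sample. The sketch below shows that hysteresis technique; the class, its parameters, and the example utilisation figures are hypothetical, not taken from any product mentioned here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ThresholdMonitor:
    """Raise above `raise_at`; clear only after dropping below `clear_at` (hysteresis)."""
    raise_at: float
    clear_at: float
    active: bool = False

    def __post_init__(self) -> None:
        if self.clear_at >= self.raise_at:
            raise ValueError("clear_at must be below raise_at")

    def update(self, value: float) -> Optional[str]:
        """Return 'RAISE' or 'CLEAR' when the alert state changes, else None."""
        if not self.active and value > self.raise_at:
            self.active = True
            return "RAISE"
        if self.active and value < self.clear_at:
            self.active = False
            return "CLEAR"
        return None


def scan(samples: Sequence[float], monitor: ThresholdMonitor) -> List[Tuple[int, str]]:
    """Feed samples in order and collect (index, transition) pairs."""
    transitions: List[Tuple[int, str]] = []
    for i, value in enumerate(samples):
        change = monitor.update(value)
        if change is not None:
            transitions.append((i, change))
    return transitions
```

For utilisation samples of 70, 95, 92, 85, 79, 91 with a raise threshold of 90 and a clear threshold of 80, the operator sees a raise at the second sample, a clear at the fifth, and a new raise at the sixth; the dip to 85 does not clear the alert.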
Everybody Has Plenty of Management Already
• Customers call and complain
• Partners call and complain
• Internal customers call and complain
• Really smart people take a long time to diagnose and troubleshoot when they should be building stuff
• If your operation is small enough, you can rely on really smart people with pagers until you burn them out
• This results in two things:
  – reliance on oral tradition
  – cowboy change management
• As the organization gets bigger:
  – some things are always broken
  – other things take a really long time to fix

Conclusion
• Service management is a bottom-up thing: skip a step and regret it
  – Get all the events
  – Get all the performance
  – Discover how things connect to and depend on each other
  – Instantiate the service model
  – Process automation
  – Add the business significance
• Manage with big-picture knowledge
• Open fewer, smarter trouble tickets
• Use fewer, lower-end people to open them and work them