Case Study: Automating Outage Monitoring & Communication
WEST VIRGINIA UNIVERSITY
UNIVERSITY RELATIONS - DIGITAL SERVICES
CASE STUDY: AUTOMATING OUTAGE MONITORING & COMMUNICATION
Dave Olsen, Professional Technologist
University Relations - Digital Services
AGENDA
• The Problem: Lack of Monitoring and a Manual Communication Workflow
• The Solution: Automating All The Things
• Quick Overview of Our Connected Services
• In-depth Feature Tour of StatusPage.io
THE PROBLEM
• Gradually lost central monitoring of our services
• Received only email notifications, and only for three services monitored by a third party that UR - Digital Services had set up
• The communication workflow wasn't standardized
• Had no way to communicate during a complete system failure
THE SOLUTION
• Took responsibility for monitoring our own services
• Re-engaged Systems to find additional monitoring opportunities
• Evaluated solutions that could send voice, text, and push notifications in addition to email
• Documented a tiered communication workflow
• Identified an off-site hosting solution for communications
THE SOLUTION
MONITORING (automated, internal)
• App performance/uptime
• Uptime
• Hardware (planned)
1ST LEVEL COMM (automated, internal/external)
• Voice, text, push notifications
• Off-site website
• Internal comm, logging
2ND LEVEL COMM (manual, external, WOPE)
• @wvuurdigital
• Listservs: outages, prof tech cohort
• Individual emails: statuspage.io subscribers
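The tiered workflow can be written down as data, which makes it easy to check which channels fire on their own and which wait for a person. This is a minimal Python sketch; the `Tier` class, its field names, and the `manual_channels` helper are hypothetical, and the channel labels are copied from the slide rather than from any tool's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One column of the tiered workflow (hypothetical model)."""
    name: str
    automated: bool
    audience: str              # "internal", "internal/external", or "external"
    channels: tuple[str, ...]


# Tiers as listed on the slide; labels are illustrative, not a real config.
TIERS = (
    Tier("Monitoring", True, "internal",
         ("app performance/uptime", "uptime", "hardware (planned)")),
    Tier("1st level comm", True, "internal/external",
         ("voice, text, push notifications", "off-site website",
          "internal comm, logging")),
    Tier("2nd level comm", False, "external",
         ("@wvuurdigital", "listservs", "individual emails")),
)


def manual_channels(tiers: tuple[Tier, ...]) -> list[str]:
    """Return every channel that needs a person to act before it sends anything."""
    return [c for t in tiers if not t.automated for c in t.channels]


if __name__ == "__main__":
    print(manual_channels(TIERS))
```

Writing the tiers this way makes the slide's main point checkable: only the 2nd level comm tier depends on a person.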
THE LINCHPIN
StatusPage.io is the bridge between the automated monitoring and the manual second-level communication channels.
IMPORTANT!
Incidents are the only things in StatusPage.io that generate tweets and emails; component updates do not.
Incidents are created manually.
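The rule on this slide can be expressed as a small function: an incident, which someone opens by hand, fans out to Twitter and email subscribers, while a component status change only updates the page. This is a hypothetical Python model of the rule as stated on the slide, not StatusPage.io's code or API; the `Event` type and the channel strings are illustrative.

```python
from enum import Enum


class Event(Enum):
    INCIDENT = "incident"                  # opened manually by a person
    COMPONENT_UPDATE = "component_update"  # may be set by automation


def outbound_notifications(event: Event) -> list[str]:
    """Channels that receive a message for this event, per the slide's rule."""
    if event is Event.INCIDENT:
        return ["tweet", "email"]
    # Component updates change what the page displays; nothing is sent out.
    return []


if __name__ == "__main__":
    for e in Event:
        print(e.value, outbound_notifications(e))
```

The practical consequence: if automation flips a component to major outage, subscribers hear nothing until someone opens an incident.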
IMPORTANT!
Component statuses can be automated, so your page can show the real status of your system without manual intervention.
Automated updates can only toggle between "major outage" and "operational."
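The slide does not say how the automation is wired. One option is to have your own monitor call StatusPage.io's REST API whenever a check changes state. The sketch below assumes the public API's component endpoint (`PATCH /v1/pages/{page_id}/components/{component_id}` on `api.statuspage.io`, `OAuth` authorization header, `major_outage`/`operational` status values); verify the path, header, and field names against the current API reference before relying on them. The helper names are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

# Assumed base URL of the StatusPage.io REST API; check the current API reference.
API_BASE = "https://api.statuspage.io/v1"


def component_status(monitor_is_up: bool) -> str:
    """Mirror the slide: automated updates toggle between two states only."""
    return "operational" if monitor_is_up else "major_outage"


def build_status_request(page_id: str, component_id: str, api_key: str,
                         monitor_is_up: bool) -> urllib.request.Request:
    """Build (but do not send) a PATCH that sets one component's status."""
    body = json.dumps({"component": {"status": component_status(monitor_is_up)}})
    return urllib.request.Request(
        url=f"{API_BASE}/pages/{page_id}/components/{component_id}",
        data=body.encode("utf-8"),
        headers={"Authorization": f"OAuth {api_key}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )


if __name__ == "__main__":
    # Placeholder IDs and key; a real call would pass req to urllib.request.urlopen.
    req = build_status_request("PAGE_ID", "COMPONENT_ID", "API_KEY",
                               monitor_is_up=False)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Restricting the helper to two states keeps it consistent with what the slide says automation can do; any other status would still be set by hand.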
HIGHER ED CLIENTS
• Georgia State University
• University of Derby
• University of California - Davis
• Macquarie University
• San Jose State University
• University of Michigan
WRAPPING IT UP
StatusPage.io is now the hub for our automated and manual outage communication efforts.