NHS Choices: Managing complex infrastructure to deliver critical online services
Copyright © 2015 Splunk Inc.
NHS Choices: Managing complex infrastructure to deliver critical online services
Neil Moran, NHS Choices
Introduction
Matt Davies, Splunk
About Splunk
Company (NASDAQ: SPLK)
• Founded 2004, first software release in 2006
• HQ: San Francisco / Regional HQ: London, Hong Kong
• Over 1,800 employees, based in 12 countries
Business Model / Products
• Free download to massive scale
• Splunk Enterprise, Splunk Cloud, Splunk Light
• Hunk: Splunk Analytics for Hadoop
10,000+ Customers
• Customers in 100 countries
• 80+ of the Fortune 100
• Largest license: Over 400 Terabytes per day
MAKE MACHINE DATA ACCESSIBLE, USABLE, AND VALUABLE TO EVERYONE
Platform for Machine Data
Industrial Data and Internet of Things
Security, Compliance and Fraud
Application Delivery
IT Operations
Business Analytics
OPERATIONAL INTELLIGENCE
European organisations transforming business with Splunk
Splunk for IT Operations and App Delivery
Manage Operations Through a Single Pane of Glass
Infrastructure Monitoring
Service Intelligence
Application Delivery
Continuous Delivery
IT Troubleshooting
Platform for Machine Data
Splunk IT Service Intelligence
Data-Driven Service Monitoring and Analytics
At-a-Glance Problem Analysis
Early Warning on Deviations
Dynamic Service Models
Seamless Workflow Integrations
Time Series Index
Schema-on-the-Fly
Data Model
Common Information Model
Platform for Machine Data
Splunk IT Service Intelligence
HOW VODAFONE GOT END-TO-END INSIGHT USING SPLUNK ITSI
Glass table visualizations enable rapid and proactive issue resolution
Custom KPIs empower teams across the business, operations & security
Actionable service insights in two days, not months
About HSCIC
The Health and Social Care Information Centre (HSCIC) was formed in April 2013 as an executive non-departmental public body and the national provider of information, data and IT systems for patients, service users, clinicians, commissioners, analysts and researchers in health and social care.
We have responsibility for a number of national systems, including Spine, NHSMail, Electronic Referrals, GP Systems of Choice and NHS Choices, to name just a few.
HSCIC – What we do
We provide information, data and IT systems for commissioners, analysts and clinicians in health and social care. Our work includes…
• Setting standards that protect patients’ confidential information, reduce bureaucracy and improve data quality
• Operating essential technology services that support the health and care system
• Collecting, analysing and publishing national data and statistical information that helps inform decision making
• Developing the next generation of national data and information systems
About NHS Choices
● Online 'front door' to the NHS – the UK’s biggest health website
● More than 40 million visitors a month
● More than 20,000 regularly updated articles and hundreds of videos and interactive tools
● Compare health services in England
Our Original Challenge
We want to continually improve the services we provide for the public at the lowest possible cost.
We used an external service to monitor site availability. When problems occurred we had to log on to each web server in turn to identify the root cause. We needed to identify availability issues faster.
More than 40 million visits a month generate a LOT of log data.
Our Splunk Journey
Nov 2010
• Tried out free version
• Attended online demo
• Minds blown!
June 2011
• First purchase: 5GB/day license
• Reporting revolutionised
Today
• Multiple use cases
• Real-time stats on display via big screens in the office
Looking ahead
• More custom dashboards/reports
• Plans to expand into other areas of the business
Live Performance Stats on Display
● Main live operations dashboard
● Public Health England campaigns dashboard
● Other views (Google Maps, Particles, Sideview utils, XML, forms)
NHS Choices: Complex Infrastructure
● 3 discrete technology stacks
– SharePoint, SQL, Windows, .NET
– Ruby, Nginx, Ubuntu Linux
– Azure PaaS (Web Apps, SQL Azure)
● Multiple route-to-live environments per stack
● Automated creation of environments (introduction of DevOps principles)
● Continuous Integration – automated nightly builds/deployments
● External 3rd party integrations (search, tools, maps etc)
Splunk Topology
● Single search head: searching, reporting, alerting, dashboards
● Indexers
● NHS Choices data segmented by user
Our Data Sources
Online Services
Web Server Logs
CDN (Akamai)
Windows Event Logs
RSS Logs
Mail Server Logs
Performance Counters from Windows and Linux
Our Use Cases
Troubleshooting
Monitoring Impact of Changes to the Website
Managing Unpredictable Traffic
Reporting to Management
Troubleshooting: Alerts
● Correlation of events when something goes wrong to identify things that regularly cause issues
● Set up alerts to identify when these things happen in the future, e.g.
– GP surgery comments: alert when message queue length exceeds a certain threshold, which indicates that there is an issue in the backend causing the comments to get stuck
– Alert to ensure there’s plenty of cache on the servers
– Errors when AV scanning finds issues with files uploaded by users
– Regular scheduled reports to developers for URLs generating unhandled exception errors – “closing the loop”
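The queue-length alert described above can be sketched in Python. This is only an illustration of the idea, not the team's actual Splunk alert: the sample format, the threshold of 100 and the rule of three consecutive breaches are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """One reading of the comments message queue (hypothetical format)."""
    timestamp: str
    queue_length: int


def should_alert(samples, threshold=100, consecutive=3):
    """Return True when the last `consecutive` samples all exceed `threshold`.

    Requiring several breaches in a row avoids alerting on a brief spike.
    Both default values are illustrative assumptions.
    """
    if len(samples) < consecutive:
        return False
    recent = samples[-consecutive:]
    return all(s.queue_length > threshold for s in recent)


# Made-up readings: the queue keeps growing, suggesting comments are stuck.
samples = [
    QueueSample("09:00", 12),
    QueueSample("09:05", 140),
    QueueSample("09:10", 180),
    QueueSample("09:15", 230),
]
print(should_alert(samples))  # True
```

A sustained breach rather than a single high reading is one reasonable way to separate a stuck backend from a short burst of comments.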
Troubleshooting: Root Cause Analysis
Example: DDoS Attack
● DDoS attack blocked by CDN
● Reviewed data for previous year in Splunk to identify if there had ever been a similar attack that went undetected
● Splunk investigation revealed no previous attack
● We had the answer the same day
Example: Spam page ratings
● Content editors noticed user page rating averages were dropping
● Used Splunk to investigate
● Identified a small set of IP addresses in Germany that were bombarding pages with 1 and 2 star ratings
● Blocked those IPs and now have a dashboard to show average ratings over time to identify spam attacks
Troubleshooting: Root Cause Analysis
● Page rating dashboard
– Count of ratings over time
– Breakdown of different ratings (star rating between 1 and 5)
– Detailed drilldown for source of ratings and correlation between URLs and source IPs
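The correlation between ratings and source IPs can be illustrated with a short Python sketch. It is not the dashboard's actual search: the record layout, the function name and both cut-off values are assumptions, and the IP addresses come from reserved documentation ranges.

```python
from collections import defaultdict


def suspicious_rating_sources(ratings, min_ratings=20, low_share_cutoff=0.9):
    """Return source IPs that submit many ratings, almost all of them 1 or 2 stars.

    `ratings` is an iterable of (source_ip, url, stars) tuples (assumed layout).
    """
    totals = defaultdict(int)
    lows = defaultdict(int)
    for ip, _url, stars in ratings:
        totals[ip] += 1
        if stars <= 2:
            lows[ip] += 1
    return sorted(
        ip for ip, total in totals.items()
        if total >= min_ratings and lows[ip] / total >= low_share_cutoff
    )


# Made-up data: one address floods a page with 1-star ratings.
ratings = [("203.0.113.7", "/conditions/a", 1)] * 25 + [
    ("198.51.100.4", "/conditions/a", 5),
    ("198.51.100.4", "/conditions/b", 2),
    ("198.51.100.9", "/services/c", 4),
]
print(suspicious_rating_sources(ratings))  # ['203.0.113.7']
```

Combining a minimum volume with a high share of low ratings keeps ordinary visitors who leave one poor rating out of the result.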
Monitoring Impact of Changes
● Impact on web server response time of recent change to database storage solution:
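A before-and-after comparison of this kind can be sketched in Python as follows. The record layout, the function name and the timestamps are assumptions for illustration; the figures are invented and are not NHS Choices measurements.

```python
from datetime import datetime
from statistics import mean, median


def compare_response_times(records, change_time):
    """Split (timestamp, response_ms) records at `change_time` and summarise each side."""
    before = [ms for ts, ms in records if ts < change_time]
    after = [ms for ts, ms in records if ts >= change_time]

    def summarise(values):
        # No data on one side of the change yields None rather than an error.
        if not values:
            return None
        return {
            "count": len(values),
            "mean_ms": mean(values),
            "median_ms": median(values),
        }

    return {"before": summarise(before), "after": summarise(after)}


# Invented sample: response times drop after a change deployed at 12:00.
change = datetime(2015, 1, 1, 12, 0)
records = [
    (datetime(2015, 1, 1, 11, 0), 420),
    (datetime(2015, 1, 1, 11, 30), 380),
    (datetime(2015, 1, 1, 12, 30), 210),
    (datetime(2015, 1, 1, 13, 0), 190),
]
result = compare_response_times(records, change)
print(result["before"]["mean_ms"], result["after"]["mean_ms"])  # 400 200
```

Reporting the median alongside the mean helps when a few slow requests would otherwise distort the comparison.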
Managing Unpredictable Traffic
● Traffic levels – 10 Minute Shake Up campaign launch with TV ads!
Reporting to Management
● Uptime/availability reporting from external monitoring service for SLAs
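As a rough illustration of availability reporting, the Python sketch below turns a series of pass/fail checks into an uptime percentage and compares it with a target. The check format and the 99.9% target are assumptions; the slides do not state the actual SLA figure or how the external service reports results.

```python
def availability_percent(check_results):
    """Percentage of successful checks; `check_results` is a list of booleans."""
    if not check_results:
        raise ValueError("no checks recorded")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(check_results) / len(check_results)


def meets_sla(check_results, target_percent=99.9):
    """Compare measured availability with an assumed SLA target."""
    return availability_percent(check_results) >= target_percent


# Invented month of checks: 997 successes and 3 failures.
checks = [True] * 997 + [False] * 3
print(round(availability_percent(checks), 2))  # 99.7
print(meets_sla(checks))  # False
```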
Future Plans
Make data available to the business
Analytics team: Monitor real-time searches – which browsers are they coming from, what search term took them to NHS Choices, journey through the website.
Partner team: Track usage of content provided to over 600 syndication partners – usage rates etc.
Product Owners: Build product-specific dashboards showing detailed performance and transaction status for key areas of the site
DevOps team: Ingest data from other popular tools, such as Git, Puppet or TeamCity, for end-to-end view of builds/deployments
Value from Splunk
More informed decisions
Identifying business value in the data we already have
Alerts and reports free up time to be more proactive and deliver more value
Ability to correlate previously unconnected events
Real-time and historical analysis
Thank You