NHS Choices: Managing complex infrastructure to deliver critical online services

Transcript of NHS Choices: Managing complex infrastructure to deliver critical online services

Page 1: NHS Choices: Managing complex infrastructure to deliver critical online services

Copyright © 2015 Splunk Inc.

NHS Choices: Managing complex infrastructure to deliver critical online services

Neil Moran, NHS Choices

Page 2: NHS Choices: Managing complex infrastructure to deliver critical online services

Introduction

Matt Davies, Splunk

Page 3: NHS Choices: Managing complex infrastructure to deliver critical online services

About Splunk

Company (NASDAQ: SPLK)
• Founded 2004, first software release in 2006
• HQ: San Francisco / Regional HQ: London, Hong Kong
• Over 1,800 employees, based in 12 countries

Business Model / Products
• Free download to massive scale
• Splunk Enterprise, Splunk Cloud, Splunk Light
• Hunk: Splunk Analytics for Hadoop

10,000+ Customers
• Customers in 100 countries
• 80+ of the Fortune 100
• Largest license: Over 400 Terabytes per day

Page 4: NHS Choices: Managing complex infrastructure to deliver critical online services

MAKE MACHINE DATA ACCESSIBLE, USABLE, AND VALUABLE TO EVERYONE

Page 5: NHS Choices: Managing complex infrastructure to deliver critical online services

Platform for Machine Data

Industrial Data and Internet of Things

Security, Compliance and Fraud

Application Delivery

IT Operations

Business Analytics

OPERATIONAL INTELLIGENCE

Page 6: NHS Choices: Managing complex infrastructure to deliver critical online services

European organisations transforming business with Splunk

Page 7: NHS Choices: Managing complex infrastructure to deliver critical online services

Splunk for IT Operations and App Delivery

Manage Operations Through a Single Pane of Glass

Infrastructure Monitoring

Service Intelligence

Application Delivery

Continuous Delivery

IT Troubleshooting

Platform for Machine Data

Page 8: NHS Choices: Managing complex infrastructure to deliver critical online services

Splunk IT Service Intelligence

Data-Driven Service Monitoring and Analytics

At-a-Glance Problem Analysis

Early Warning on Deviations

Dynamic Service Models

Seamless Workflow Integrations

Time Series Index

Schema-on-the-Fly

Data Model

Common Information Model

Platform for Machine Data

Splunk IT Service Intelligence

Page 9: NHS Choices: Managing complex infrastructure to deliver critical online services

HOW VODAFONE GOT END-TO-END INSIGHT USING SPLUNK ITSI

Glass table visualizations enable rapid and proactive issue resolution

Custom KPIs empower teams across the business, operations & security

Actionable service insights in two days, not months

Page 10: NHS Choices: Managing complex infrastructure to deliver critical online services

NHS Choices: Managing complex infrastructure to deliver critical online services

Neil Moran, NHS Choices

Page 11: NHS Choices: Managing complex infrastructure to deliver critical online services

About HSCIC

The Health and Social Care Information Centre (HSCIC) was formed in April 2013 as an executive non-departmental public body and the national provider of information, data and IT systems for patients, service users, clinicians, commissioners, analysts and researchers in health and social care.

We have responsibility for a number of national systems, including Spine, NHSmail, Electronic Referrals, GP Systems of Choice and NHS Choices, to name just a few.

Page 12: NHS Choices: Managing complex infrastructure to deliver critical online services

HSCIC – What we do

We provide information, data and IT systems for commissioners, analysts and clinicians in health and social care. Our work includes…

• Setting standards that protect patients’ confidential information, reduce bureaucracy and improve data quality

• Operating essential technology services that support the health and care system

• Collecting, analysing and publishing national data and statistical information that helps inform decision making

• Developing the next generation of national data and information systems

Page 13: NHS Choices: Managing complex infrastructure to deliver critical online services

About NHS Choices

● Online 'front door' to the NHS – the UK’s biggest health website

● More than 40 million visitors a month

● More than 20,000 regularly updated articles and hundreds of videos and interactive tools

● Compare health services in England

Page 14: NHS Choices: Managing complex infrastructure to deliver critical online services

About NHS Choices

Page 15: NHS Choices: Managing complex infrastructure to deliver critical online services

About NHS Choices

Page 16: NHS Choices: Managing complex infrastructure to deliver critical online services

Our Original Challenge

We want to continually improve the services we provide for the public at the lowest possible cost.

Page 17: NHS Choices: Managing complex infrastructure to deliver critical online services

Our Original Challenge

We used an external service to monitor site availability. When problems occurred, we had to log on to each web server in turn to identify the root cause. We needed to identify availability issues faster.

Page 18: NHS Choices: Managing complex infrastructure to deliver critical online services

Our Original Challenge

More than 40 million visits a month generates a LOT of log data.

Page 19: NHS Choices: Managing complex infrastructure to deliver critical online services

Our Splunk Journey

Nov 2010
• Tried out free version
• Attended online demo
• Minds blown!

June 2011
• First purchase: 5GB/day license
• Reporting revolutionised

Today
• Multiple use cases
• Real-time stats on display via big screens in the office

Looking ahead
• More custom dashboards/reports
• Plans to expand into other areas of the business

Page 20: NHS Choices: Managing complex infrastructure to deliver critical online services

Live Performance Stats on Display

● Main live operations dashboard:

Page 21: NHS Choices: Managing complex infrastructure to deliver critical online services

Live Performance Stats on Display

● Public Health England campaigns dashboard:

Page 22: NHS Choices: Managing complex infrastructure to deliver critical online services

Live Performance Stats on Display

● Other views (Google Maps, Particles, Sideview utils, XML, forms):

Page 23: NHS Choices: Managing complex infrastructure to deliver critical online services

NHS Choices: Complex Infrastructure

● 3 discrete technology stacks

– SharePoint, SQL, Windows, .NET
– Ruby, Nginx, Ubuntu Linux
– Azure PaaS (Web Apps, SQL Azure)

● Multiple route-to-live environments per stack

● Automated creation of environments (introduction of DevOps principles)

● Continuous Integration – automated nightly builds/deployments

● External third-party integrations (search, tools, maps etc.)

Page 24: NHS Choices: Managing complex infrastructure to deliver critical online services

Splunk Topology

Indexers

Single Search Head

Searching, Reporting, Alerting, Dashboards

NHS Choices

Data segmented by user

Page 25: NHS Choices: Managing complex infrastructure to deliver critical online services

Our Data Sources

Online Services

Web Server Logs

CDN (Akamai)

Windows Event Logs

RSS Logs

Mail Server Logs

Performance Counters from Windows and Linux

Page 26: NHS Choices: Managing complex infrastructure to deliver critical online services

Our Use Cases

Troubleshooting

Monitoring Impact of Changes to the Website

Managing Unpredictable Traffic

Reporting to Management

Page 27: NHS Choices: Managing complex infrastructure to deliver critical online services

Troubleshooting: Alerts

● Correlation of events when something goes wrong to identify things that regularly cause issues

● Set up alerts to identify when these things happen in the future, e.g.

– GP surgery comments: alert when message queue length exceeds a certain threshold, which indicates that there is an issue in the backend causing the comments to get stuck
– Alert to ensure there’s plenty of cache on the servers
– Errors when AV scanning finds issues with files uploaded by users
– Regular scheduled reports to developers for URLs generating unhandled exception errors – “closing the loop”
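
The queue-length alert described above can be sketched in a few lines. This is only an illustration of the thresholding idea, not how the alert is configured in Splunk: the function name, the `sustain` parameter (how many consecutive samples must exceed the threshold before alerting) and the sample values are all assumptions made for the example.

```python
from typing import Iterable, List


def queue_alerts(lengths: Iterable[int], threshold: int, sustain: int = 3) -> List[int]:
    """Return the sample indices at which an alert would fire.

    An alert fires once `sustain` consecutive samples exceed `threshold`,
    and cannot fire again until the queue drops back to the threshold or below.
    """
    alerts: List[int] = []
    run = 0          # consecutive samples above the threshold so far
    firing = False   # True while an alert is already active
    for i, length in enumerate(lengths):
        if length > threshold:
            run += 1
            if run >= sustain and not firing:
                alerts.append(i)
                firing = True
        else:
            run = 0
            firing = False
    return alerts


# Hypothetical queue-length samples, one per polling interval.
samples = [2, 4, 120, 130, 150, 160, 3, 200, 1]
print(queue_alerts(samples, threshold=100, sustain=3))  # prints [4]
```

Requiring several consecutive samples above the threshold is one way to avoid alerting on a single brief spike; a stuck queue stays high, which is the condition the slide describes.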

Page 28: NHS Choices: Managing complex infrastructure to deliver critical online services

Troubleshooting: Root Cause Analysis

Example: DDoS Attack

● DDoS attack blocked by CDN
● Reviewed data for previous year in Splunk to identify if there had ever been a similar attack that went undetected
● Splunk investigation revealed no previous attack
● We had the answer the same day

Example: Spam page ratings

● Content editors noticed user page rating averages were dropping
● Used Splunk to investigate
● Identified a small set of IP addresses in Germany that were bombarding pages with 1 and 2 star ratings
● Blocked those IPs and now have a dashboard to show average ratings over time to identify spam attacks

Page 29: NHS Choices: Managing complex infrastructure to deliver critical online services

Troubleshooting: Root Cause Analysis

● Page rating dashboard

– Count of ratings over time
– Breakdown of different ratings (star rating between 1 and 5)
– Detailed drilldown for source of ratings and correlation between URLs and source IPs
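
The drilldown by rating source can be illustrated with a small sketch that groups ratings by source IP and flags sources dominated by 1 and 2 star ratings. The record layout `(source_ip, url, stars)`, the function name and both cut-off parameters are assumptions for the example, and the addresses are documentation-range placeholders rather than real data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Rating = Tuple[str, str, int]  # (source_ip, url, stars), an assumed record layout


def suspicious_sources(ratings: Iterable[Rating], min_count: int = 5,
                       low_share: float = 0.8) -> List[str]:
    """Return source IPs with at least `min_count` ratings, of which at
    least a `low_share` fraction are 1 or 2 stars."""
    totals: Dict[str, int] = defaultdict(int)
    lows: Dict[str, int] = defaultdict(int)
    for ip, _url, stars in ratings:
        totals[ip] += 1
        if stars <= 2:
            lows[ip] += 1
    return sorted(ip for ip, n in totals.items()
                  if n >= min_count and lows[ip] / n >= low_share)


# Hypothetical ratings: one source submits six low ratings, another only two ratings.
ratings = (
    [("192.0.2.10", "/conditions/a", 1)] * 4
    + [("192.0.2.10", "/conditions/b", 2)] * 2
    + [("198.51.100.7", "/conditions/a", 5), ("198.51.100.7", "/conditions/a", 1)]
)
print(suspicious_sources(ratings))  # prints ['192.0.2.10']
```

The minimum-count check keeps an occasional genuine low rating from being flagged; only sources that rate often and almost always low stand out.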

Page 30: NHS Choices: Managing complex infrastructure to deliver critical online services

Monitoring Impact of Changes

● Impact on web server response time of recent change to database storage solution:
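
As a rough sketch of this kind of before/after comparison, the snippet below averages response times on either side of a change timestamp. The sample format `(timestamp, response_ms)`, the function name and the numbers are assumptions for illustration; they are not taken from the NHS Choices data.

```python
from statistics import mean
from typing import Iterable, Tuple


def response_time_change(samples: Iterable[Tuple[float, float]],
                         change_time: float) -> Tuple[float, float, float]:
    """Return (mean_before, mean_after, percent_change) for response times
    recorded before and at/after `change_time`."""
    data = list(samples)
    before = [ms for t, ms in data if t < change_time]
    after = [ms for t, ms in data if t >= change_time]
    if not before or not after:
        raise ValueError("need samples on both sides of the change")
    mean_before, mean_after = mean(before), mean(after)
    if mean_before == 0:
        raise ValueError("mean response time before the change is zero")
    percent = (mean_after - mean_before) / mean_before * 100
    return mean_before, mean_after, percent


# Hypothetical samples: (timestamp, response time in ms), change deployed at t=4.
samples = [(1, 200.0), (2, 220.0), (3, 180.0), (4, 150.0), (5, 150.0)]
print(response_time_change(samples, change_time=4))  # prints (200.0, 150.0, -25.0)
```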

Page 31: NHS Choices: Managing complex infrastructure to deliver critical online services

Managing Unpredictable Traffic

Page 32: NHS Choices: Managing complex infrastructure to deliver critical online services

Managing Unpredictable Traffic

● Traffic levels – 10 Minute Shake Up campaign launch with TV ads!

Page 33: NHS Choices: Managing complex infrastructure to deliver critical online services

Reporting to Management

● Uptime/availability reporting from external monitoring service for SLAs
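
The arithmetic behind an availability figure is simple enough to sketch. This assumes the external monitoring service yields one pass/fail result per check; the function names, the check counts and the SLA targets are hypothetical.

```python
from typing import Iterable, List


def availability_percent(check_results: Iterable[bool]) -> float:
    """Percentage of monitoring checks that succeeded."""
    results: List[bool] = list(check_results)
    if not results:
        raise ValueError("no checks to report on")
    return 100.0 * sum(results) / len(results)


def meets_sla(check_results: Iterable[bool], target_percent: float) -> bool:
    """True if measured availability is at or above the SLA target."""
    return availability_percent(check_results) >= target_percent


# Hypothetical day of one-minute checks with three failures.
day = [True] * 1437 + [False] * 3
print(round(availability_percent(day), 2))
print(meets_sla(day, target_percent=99.5))
```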

Page 34: NHS Choices: Managing complex infrastructure to deliver critical online services

Future Plans

Make data available to the business

Analytics team: Monitor real-time searches – which browsers they are coming from, what search term took them to NHS Choices, journey through the website.

Partner team: Track usage of content provided to over 600 syndication partners – usage rates etc.

Product Owners: Build product-specific dashboards showing detailed performance and transaction status for key areas of the site

DevOps team: Ingest data from other popular tools, such as Git, Puppet or TeamCity, for end-to-end view of builds/deployments

Page 35: NHS Choices: Managing complex infrastructure to deliver critical online services

Value from Splunk

More informed decisions

Identifying business value in the data we already have

Alerts and reports free up time to be more proactive and deliver more value

Ability to correlate previously unconnected events

Real-time and historical analysis

Page 36: NHS Choices: Managing complex infrastructure to deliver critical online services

Thank You