Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to MongoDB

Post on 18-Jul-2015


Michael Poremba

Director, Data Architecture

Practice Fusion

May 2015

+ 20 years software engineering

+ Data architect / application architect

+ High-volume OLTP relational databases

+ Application performance and scalability

+ Domain experience: health care; financial services; IT management; content management and distribution; targeted advertising; telecom billing; manufacturing; insurance

Michael Poremba @ Practice Fusion

2

Project Background

Getting started

3

+ Cloud-based electronic health records service (EHR)

+ Over 100,000 health care providers in US

+ Over 100,000,000 patient medical records

+ SQL Server OLTP database

Weekday peak ~ 60,000 transactions per second

+ Primary database = 8 TB

+ 50% of primary database is security audit records + indexes

Practice Fusion

4

+ HIPAA: Health Insurance Portability and Accountability Act of 1996

+ Who did what to which patient’s medical record when?

+ Regulatory requirement—audit log must be kept and reviewed

+ Law enforcement and evidence in legal discovery

+ Save the audit log forever

+ Primary use cases:

Audit report in EHR: Security audit log viewer

Physician data analytics: Clinical quality measures (CQM)

HIPAA Security Audit Log

5

6

HIPAA Security Auditing on MongoDB

Project anatomy & lessons learned

7

+ Latency on SAN increased

+ Database writes slowed down

+ Database connections held longer

+ Connection pool expanded

+ User interface locked up—waiting

+ Users tried to log in again

+ Login is heaviest user operation

+ [Repeat]

The Log Jam


8

Security Auditing – Legacy Architecture

[Diagram components: Public; Load Balancer; App 1, App 2, ... App n; EHR (OLTP DB) containing ActivityFeed and ActivityFeedParameter (2..10); ETL; CQM Reporting; Audit Report]

9

Audit Service – New Architecture

[Diagram components: Public; Load Balancer; App 1, App 2, ... App n; Audit Service; AMQ Queue; Listener; MongoDB Audit Log; Audit Report; ETL; CQM Reporting]

10

+ Isolate auditing system from EHR OLTP database

+ Move audit IO off of EHR SAN to AWS

+ New service interface for audit events using .NET

+ Scale out audit service interface on IIS farm

+ Scale out audit data store using MongoDB

Technical Benefits of New Architecture

11

+ Transaction volume: Sustain 1,000 new documents per second

+ Data volume: Scale to tens of billions of audit event records

+ High availability and disaster recovery—higher SLA than EHR

+ Quick UI response time for interactive audit report

+ Tamper prevention and detection

No updates or deletes permitted on audit log

Security alerts when audit log is altered

+ Leverage industry standards for health care security audit logging

~300 distinct auditable user actions

Required and varying data elements

Security Auditing – Application Requirements

12

Project Objectives

+ New infrastructure for MongoDB and AMQ

+ Modernize audit service API

+ Convert ~200 audit events to new audit service interface

+ Data warehouse ETL from MongoDB

+ Modernize audit report UI

+ Migrate 4 billion existing audit records

Project: Audit 2.0

Colette: program management

Ernest: services expert

Bhavik: test engineering

Jay: MongoDB expert

Jeff: cluster architecture

Michael: data architecture

Brett: AMQ expert

Bryan: infrastructure coordination

Rajani: data warehouse ETL

13

[Diagram: AuditEvent linked to ParticipantObject, AuditSystem, and User; cardinality labels 0..n, 1..1, 1..2]

Health Care Industry Standards for Audit Logging

+ ISO 27789:2013: Health Informatics – Audit trails for electronic health records

+ ASTM E2147-01(2013): Standard Specification for Audit and Disclosure Logs for Use in Health Information Systems

+ FHIR SecurityEvent – resource definition for auditing

14

{
    "_id" : <BinaryData(4)>,                        // The audit event GUID
    "docHash" : <String; Required>,                 // Tamper detection
    "audOrgGuid" : <BinaryData(4); Required>,       // Shard key
    "crtdDttmUtc" : <Date; Required>,               // Datetime record was inserted
    "evnt" : {                                      // Required subdocument
        "dttmUtc" : <Date; Required>,               // Date/time that event occurred
        "typ" : <String; Required>,                 // Event record type; ~ 300 types
        "ptDataTyp" : <String; Required>,           // Standard set of patient data types
        "actn" : <String; Required>,                // Standard set of actions
        "sys" : <String; Required>                  // Source system for audit event
    },
    "usr" : {                                       // Required subdocument
        "usrId" : <String; Required>,               // Human-readable ID
        "usrGuid" : <BinaryData(4); Required>,      // Machine-readable ID
        "dispNm" : <String; Required>,              // Display name for user
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "altUsr" : {                                    // Optional subdocument for second user
        ...                                         // Subdocument contains same properties as "usr"
    },
    "pt" : {                                        // Optional subdocument
        "ptId" : <String; Required>,                // Human-readable ID for patient
        "ptPracGuid" : <BinaryData(4); Required>,   // Machine-readable ID for patient
        "dispNm" : <String; Required>,              // Display name for patient
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "body" : {                                      // Optional subdocument
        ...                                         // Flattened list of attributes, specific to audit event subtype
    }
}

JSON Document Schema for Audit Events
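The "Required" markers in the schema could be checked by the application before each insert. The following is a minimal sketch of such a check, not the Practice Fusion implementation: the `missingFields` helper and all sample values are hypothetical, and only the property names come from the schema.

```javascript
// Required properties per (sub)document, taken from the schema on this slide.
// The key "" stands for the document root.
const USER_FIELDS = ["usrId", "usrGuid", "dispNm", "orgId", "orgNm"];
const REQUIRED = {
  "": ["docHash", "audOrgGuid", "crtdDttmUtc", "evnt", "usr"],
  evnt: ["dttmUtc", "typ", "ptDataTyp", "actn", "sys"],
  usr: USER_FIELDS,
  altUsr: USER_FIELDS, // optional subdocument, same properties as "usr"
  pt: ["ptId", "ptPracGuid", "dispNm", "orgId", "orgNm"], // optional subdocument
};

// Hypothetical helper: returns dotted paths of required properties that are absent.
function missingFields(doc) {
  const missing = [];
  for (const [section, fields] of Object.entries(REQUIRED)) {
    const target = section === "" ? doc : doc[section];
    // Skip absent subdocuments; required ones are already reported by the root list.
    if (target === undefined) continue;
    for (const field of fields) {
      if (target[field] === undefined || target[field] === null) {
        missing.push(section === "" ? field : section + "." + field);
      }
    }
  }
  return missing;
}

// Sample event with made-up values; "usr.orgNm" is deliberately left out.
const sampleEvent = {
  docHash: "0f3a",
  audOrgGuid: Buffer.alloc(16),
  crtdDttmUtc: new Date("2015-05-01T12:00:01Z"),
  evnt: {
    dttmUtc: new Date("2015-05-01T12:00:00Z"),
    typ: "ChartViewed",
    ptDataTyp: "Chart",
    actn: "Read",
    sys: "EHR",
  },
  usr: { usrId: "jdoe", usrGuid: Buffer.alloc(16), dispNm: "J. Doe", orgId: "P-1" },
};

console.log(missingFields(sampleEvent)); // reports the omitted usr.orgNm
```

An event would only be written when the returned list is empty.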


15

Schema Design – Lessons Learned

+ Prop nms strd per doc (property names are stored in every document)

Long names add up for large collections (ours: 1 TB)

Consider using abbreviated property names

Up-vote this feature request:

https://jira.mongodb.org/browse/SERVER-863

+ Know your application read/write patterns

+ Application responsible for data integrity

+ Be aware of data type behaviors

Indexed string search is case sensitive. Upvote:

https://jira.mongodb.org/browse/SERVER-90

Several binary data types for UUID: use type 4 (the default type is specific to the database driver)
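The cost of long property names can be estimated with simple arithmetic. This sketch assumes uncompressed storage and uses the 4 billion document count from the migration objective; `createdDateTimeUtc` is a hypothetical unabbreviated name, not one taken from the talk.

```javascript
// Bytes saved across a collection by shortening one property name.
// BSON stores each property name as a string inside every document.
function bytesSaved(longName, shortName, documentCount) {
  const perDocument =
    Buffer.byteLength(longName, "utf8") - Buffer.byteLength(shortName, "utf8");
  return perDocument * documentCount;
}

// "createdDateTimeUtc" is a hypothetical long form of the schema's "crtdDttmUtc".
const saved = bytesSaved("createdDateTimeUtc", "crtdDttmUtc", 4e9);

// 7 bytes per document x 4 billion documents
console.log(saved / 1e9 + " GB saved for one property"); // prints "28 GB saved for one property"
```

The same calculation repeated over every property in the document gives a rough upper bound on what abbreviation can save.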

16

Schema Design – Lessons Learned

Leverage native data types:

+ Date

+ Boolean

+ Numeric

"1" + "1" → "11"

"11" + "1" → "111"

+ UUID

"8c290139-f4e3-49c1-9ba2-a883defc6a15"

"8C290139-F4E3-49C1-9BA2-A883DEFC6A15"

"8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15"

"8c290139f4e349c19ba2a883defc6a15"

"{8c290139-f4e3-49c1-9ba2-a883defc6a15}"

"{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}"
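The point of the string examples on this slide can be demonstrated in a few lines. This is an illustrative sketch only; `uuidToBytes` is a hypothetical helper, and a real application would rely on its driver's UUID (binary subtype 4) support.

```javascript
// Numbers stored as strings concatenate instead of adding.
console.log("1" + "1", 1 + 1); // prints "11 2"

// Hypothetical helper: reduce any of the textual UUID spellings to 16 raw bytes.
function uuidToBytes(text) {
  const hex = text.replace(/[{}-]/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error("not a UUID: " + text);
  }
  return Buffer.from(hex, "hex");
}

// The six spellings from the slide, all denoting the same UUID.
const variants = [
  "8c290139-f4e3-49c1-9ba2-a883defc6a15",
  "8C290139-F4E3-49C1-9BA2-A883DEFC6A15",
  "8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15",
  "8c290139f4e349c19ba2a883defc6a15",
  "{8c290139-f4e3-49c1-9ba2-a883defc6a15}",
  "{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}",
];

// As strings they are six different values; as binary they are one value.
const distinctStrings = new Set(variants).size;
const distinctBinary = new Set(variants.map((v) => uuidToBytes(v).toString("hex"))).size;
console.log(distinctStrings, distinctBinary); // prints "6 1"
```

Storing the UUID as a native binary value means an equality match cannot miss because of casing, braces, or hyphen placement.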

17

[Diagram: legacy relational tables with approximate row counts]

ActivityFeed (~4 billion)

ActivityFeedParameter (~30 billion)

Audit Event Type (~300)

Action Type (10)

Patient Data Type (18)

User (~100,000)

Patient (~100 million)

Practice (~50,000)

Legacy Auditing System – Relational Schema

Issues around data normalization

+ New requirements introduced

+ Filter criteria and sort criteria stored in five different tables

+ Audit events must be read into memory for filtering and sorting

Join and expand data set by practice

Sort and filter expanded data set

+ Response time suffers for large practices with many audit events

18

Schema Design – Lessons Learned

[Diagram: the legacy relational tables (ActivityFeed, ActivityFeedParameter, Audit Event Type, Action Type, Patient Data Type, User, Patient, Practice)]

Denormalize with care:

{
    "_id" : <BinaryData(4)>,
    "docHash" : <String; Required>,
    "audOrgGuid" : <BinaryData(4); Required>,
    "crtdDttmUtc" : <Date; Required>,
    "evnt" : {
        "dttmUtc" : <Date; Required>,
        "typ" : <String; Required>,
        "ptDataTyp" : <String; Required>,
        "actn" : <String; Required>,
        "sys" : <String; Required>
    },
    "usr" : {
        "usrId" : <String; Required>,
        "usrGuid" : <BinaryData(4); Required>,
        "dispNm" : <String; Required>,
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "pt" : {
        "ptId" : <String; Required>,
        "ptPracGuid" : <BinaryData(4); Required>,
        "dispNm" : <String; Required>,
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "body" : { ... }
}

19

+ Millions of audit events per medical practice

+ Require fast response time for interactive audit report UI

+ Audit report UI allows events to be sorted/filtered five different ways

+ UI allows paging through audit events

+ Create a secondary index for each sort method

Index Design

20

+ Organization, event date DESC

db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.dttmUtc": -1} );

+ Organization, patient, event date DESC

db.auditEvent.ensureIndex( {"audOrgGuid": 1, "pt.ptId": 1, "evnt.dttmUtc": -1} );

+ Organization, user, event date DESC

db.auditEvent.ensureIndex( {"audOrgGuid": 1, "usr.usrId": 1, "evnt.dttmUtc": -1} );

+ Organization, patient data type, event date DESC

db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.ptDataTyp": 1, "evnt.dttmUtc": -1} );

+ Organization, user action type, event date DESC

db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.actn": 1, "evnt.dttmUtc": -1} );

+ Document created date DESC

db.auditEvent.ensureIndex( {"crtdDttmUtc": -1} );

Index Definitions
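To show how the audit report's filters line up with these indexes, here is a hedged sketch of a query builder. The `buildAuditPageQuery` helper, the filter names, and the range-based paging via `olderThan` are all assumptions made for illustration; the talk does not say how paging was implemented.

```javascript
// Field that each report filter adds between the organization and the event date,
// mirroring the compound index definitions on this slide.
const FILTER_FIELDS = {
  patient: "pt.ptId",
  user: "usr.usrId",
  dataType: "evnt.ptDataTyp",
  action: "evnt.actn",
};

// Hypothetical helper: builds the find/sort/limit arguments for one report page.
// `olderThan` (optional Date) pages by event date range, which is an assumption here.
function buildAuditPageQuery(orgGuid, filter, olderThan, pageSize) {
  const query = { audOrgGuid: orgGuid };
  if (filter) {
    const field = FILTER_FIELDS[filter.kind];
    if (!field) throw new Error("unknown filter: " + filter.kind);
    query[field] = filter.value;
  }
  if (olderThan) query["evnt.dttmUtc"] = { $lt: olderThan };
  return { query, sort: { "evnt.dttmUtc": -1 }, limit: pageSize };
}

// "ORG-GUID" and "PT-42" are placeholder values.
const page = buildAuditPageQuery("ORG-GUID", { kind: "patient", value: "PT-42" }, null, 20);

// In the mongo shell the result would be used as:
//   db.auditEvent.find(page.query).sort(page.sort).limit(page.limit)
console.log(JSON.stringify(page));
```

Every query starts with `audOrgGuid`, so each one can use the matching compound index and return events already ordered by event date. Paging on the date alone can skip events that share a timestamp across a page boundary, so a production version would need a tie-breaker.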

21

+ Filter by practice GUID

+ Sort by event created date time, descending order

+ Limit to 20 documents

db.auditEvent.find( {"audOrgGuid": BinData(4,"ABrlAG57Rx6gY3zyHzFK3Q==")} )
    .sort( {"evnt.dttmUtc" : -1} ).limit(20).explain();

{
    "clusteredType" : "ParallelSort",
    "shards" : {
        "RepSet02/MNGODDB03-SHRD02:27018, MNGODDB04-SHRD02:27018" : [
            {
                "cursor" : "BtreeCursor auditEvent_audOrgGuid_dttmUtc",
                ...
            } ] }
    ...
    "numshards" : 1,
    ...

Query Plan

22

Indexing Strategy – Lessons Learned

+ As with relational databases, indexes are essential for efficient queries

+ Learn how to use .explain() to read query plans

+ Avoid collection scans

"cursor" : "BasicCursor"

+ For compound indexes, query sort order must match index sort order (or be its exact reverse)

+ Enable mongod --notablescan option in test / staging environments
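The sort-order rule for compound indexes can be sketched as a small check. This is a simplified model written for illustration, not MongoDB's query planner: `indexSupportsSort` is a hypothetical function that assumes the query pins the leading index fields with equality and sorts on the fields that follow.

```javascript
// Returns true when `sort` can be served by walking `index` forward or backward,
// given the fields the query matches by equality (`equalityFields`).
function indexSupportsSort(index, equalityFields, sort) {
  const indexKeys = Object.entries(index);
  // Skip the leading index fields that the query pins to a single value.
  let start = 0;
  while (start < indexKeys.length && equalityFields.includes(indexKeys[start][0])) {
    start++;
  }
  const sortKeys = Object.entries(sort);
  if (sortKeys.length > indexKeys.length - start) return false;

  let sameDirection = true;
  let reverseDirection = true;
  for (let i = 0; i < sortKeys.length; i++) {
    const [indexField, indexDir] = indexKeys[start + i];
    const [sortField, sortDir] = sortKeys[i];
    if (indexField !== sortField) return false; // fields must appear in index order
    if (indexDir !== sortDir) sameDirection = false;
    if (indexDir !== -sortDir) reverseDirection = false;
  }
  return sameDirection || reverseDirection;
}

const patientIndex = { audOrgGuid: 1, "pt.ptId": 1, "evnt.dttmUtc": -1 };

// Matches the index order: supported.
console.log(indexSupportsSort(patientIndex, ["audOrgGuid"], { "pt.ptId": 1, "evnt.dttmUtc": -1 }));
// Mixed directions that are neither the index order nor its reverse: not supported.
console.log(indexSupportsSort(patientIndex, ["audOrgGuid"], { "pt.ptId": 1, "evnt.dttmUtc": 1 }));
```

When the check fails, the server has to sort in memory, which is the situation `.explain()` helps to catch.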

23

Principle of least privilege

+ MongoDB cluster not accessible from public Internet

+ Security enabled on cluster

+ Application users granted minimum permissions required

Signed audit events

+ Audit events signed with hash of audit event contents

+ Recompute hash on reads—test the data against hash value

+ Send security alert when hash does not match
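The sign-then-verify idea can be sketched as follows. The talk does not name the hash algorithm or the serialization used, so SHA-256 and the sorted-key `canonicalize` function are assumptions, and the sample event values are made up.

```javascript
const crypto = require("crypto");

// Serialize with object keys sorted so equal content always yields the same string.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object" && !(value instanceof Date)) {
    const parts = Object.keys(value)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonicalize(value[key]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value); // strings, numbers, booleans, null, Date (as ISO string)
}

// Hash everything except the stored docHash itself.
function computeDocHash(doc) {
  const { docHash, ...contents } = doc;
  return crypto.createHash("sha256").update(canonicalize(contents)).digest("hex");
}

// Called before insert: attach the hash to the event.
function signEvent(doc) {
  return { ...doc, docHash: computeDocHash(doc) };
}

// Called on read: a mismatch would trigger the security alert.
function verifyEvent(doc) {
  return doc.docHash === computeDocHash(doc);
}

const signed = signEvent({
  crtdDttmUtc: new Date("2015-05-01T12:00:01Z"),
  evnt: { typ: "ChartViewed", actn: "Read" }, // made-up values
  usr: { usrId: "jdoe" },
});
const tampered = { ...signed, usr: { usrId: "someone-else" } };

console.log(verifyEvent(signed), verifyEvent(tampered)); // prints "true false"
```

A deployment might prefer a keyed hash (HMAC) so that someone with write access to the collection cannot simply recompute `docHash`; the talk does not say which approach was used.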

Oplog monitoring

+ Use mongo-connector Python scripts to monitor oplog

+ Watch for .update() and .delete() operations on collection

+ Send security alert when data changes are detected

Tamper Prevention and Detection
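The oplog check amounts to filtering for update and delete entries on the audit collection. The talk used mongo-connector (Python) for this; the sketch below only illustrates the filter in JavaScript. `findTamperingOps` is a hypothetical function, the sample entries are made up, and the namespace `AuditLog.auditEvent` is taken from the sharding command elsewhere in the deck.

```javascript
// Returns oplog entries that update or delete documents in the audited namespace.
// Oplog entries carry `op` ("i" insert, "u" update, "d" delete) and `ns` ("db.collection").
function findTamperingOps(oplogEntries, namespace) {
  return oplogEntries.filter(
    (entry) => entry.ns === namespace && (entry.op === "u" || entry.op === "d")
  );
}

// Made-up sample entries in the general shape of oplog documents.
const sampleOplog = [
  { op: "i", ns: "AuditLog.auditEvent", o: { _id: 1 } },
  { op: "u", ns: "AuditLog.auditEvent", o2: { _id: 1 }, o: { $set: { "usr.usrId": "x" } } },
  { op: "d", ns: "AuditLog.auditEvent", o: { _id: 1 } },
  { op: "u", ns: "Other.collection", o2: { _id: 9 }, o: { $set: { a: 1 } } },
];

const alerts = findTamperingOps(sampleOplog, "AuditLog.auditEvent");
console.log(alerts.map((entry) => entry.op)); // prints [ 'u', 'd' ]
```

Inserts pass through untouched, since appending is the only write the audit log permits.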


24

Security – Lessons Learned

+ Minimize network access to MongoDB cluster

+ Enable authentication

+ Leverage role-based authorization

+ Use SSL (MongoDB Enterprise)

+ Disable REST interface and HTTP status interface

25

+ Shard the database to scale out

+ Begin with small number of shards (2 or 3)

+ Group all audit events from the same medical practice

Every audit event is “owned” by some practice

Audit report UI always queries events by medical practice

+ Composite shard key on { PracticeGuid, _id }

db.runCommand({
    shardcollection : "AuditLog.auditEvent",
    key : { audOrgGuid: 1, _id: 1 }
});

Transaction Volume: 1,000 New Documents per Second

26

Sharding the Database – Lessons Learned

+ At the onset of development determine whether to shard

+ Specify shard key in queries

Allows mongos to route query

Minimize distributed “scatter/gather” queries

Queries spanning chunks likely span shards

+ Choose a key that allows even balancing

Balancing is performed in 32 MB chunks

Design shard key to ensure chunks will not exceed 32 MB

27
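The difference between a routed query and a scatter/gather query can be illustrated with a toy model of mongos routing. Everything here is a simplification for illustration: the chunk boundaries and shard names are hypothetical, the shard key is shown as hex text rather than binary, and real chunk ranges are bounded by MinKey and MaxKey.

```javascript
// Hypothetical chunk map on the leading shard key field (audOrgGuid as hex text).
// Each chunk covers the half-open range [min, max).
const chunks = [
  { min: "0", max: "5", shard: "shard1" },
  { min: "5", max: "a", shard: "shard2" },
  { min: "a", max: "g", shard: "shard3" }, // "g" sorts after every hex digit
];

// Simplified routing: equality on the shard key prefix targets one shard;
// a query without it is broadcast to every shard (scatter/gather).
function targetShards(query) {
  const orgGuid = query.audOrgGuid;
  if (typeof orgGuid !== "string") {
    return [...new Set(chunks.map((chunk) => chunk.shard))];
  }
  return chunks
    .filter((chunk) => orgGuid >= chunk.min && orgGuid < chunk.max)
    .map((chunk) => chunk.shard);
}

// Shard key present: one shard answers.
console.log(targetShards({ audOrgGuid: "8c290139f4e349c19ba2a883defc6a15" })); // prints [ 'shard2' ]
// Shard key absent: all shards are queried.
console.log(targetShards({ "usr.usrId": "jdoe" }).length); // prints 3
```

Because the audit report always queries by medical practice, every report query falls into the first case.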

High Availability and Disaster Recovery – Replica Sets

+ If audit log is down, then 100,000 health care providers are idle

+ Audit logging subsystem must be more reliable than customer EHR

+ Node failover must be automatic

+ Protect against network and data center failure scenarios

28

Disaster Recovery

Sharded Cluster Replicated Across Multiple Data Centers

[Diagram: shards 1, 2, and 3 (each shown in three locations), mongos routers, config servers, arbiters, and AMQ brokers distributed across the sites labeled Primary DC, DC2 AZ1, DC2 AZ2, and DC3 AZ1]

29

Performance and Stress Testing – Lessons Learned

+ Acquire or build load testing tools

+ Test using a realistic, unbiased data set

+ Test database cluster to ensure write throughput

+ Ensure read & write performance meets load requirements

+ Find the performance ceiling

+ Find and resolve bottlenecks

+ Tune IO and memory

30

Data Migration – Lessons Learned

+ Parallelize data migration process

+ Identify and remove bottlenecks

+ Scale out MongoDB cluster to handle heavy write load

+ Determine whether best to add indexes before or after migration

+ It takes a while to extract, transform, and load billions of documents

31

Data Repair – Lessons Learned

Bulk update on collections

+ Use Bulk() operation builder

bulk.find.update()

Simple, unordered, parallelized

> 200,000 updates/minute

+ Regular update operation

~ 2,000 updates/minute

32
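The Bulk() builder pattern from the data repair slide can be sketched as below. The code follows the synchronous mongo shell Bulk API (`initializeUnorderedBulkOp`, `find`, `updateOne`, `execute`). The `applyRepairs` helper, the batch size, and the shape of the `repairs` list are hypothetical and not taken from the talk.

```javascript
// Queue repairs with the Bulk() builder and flush every `batchSize` operations.
// `collection` must expose the mongo shell Bulk API, e.g. db.auditEvent.
// `repairs` is a list of { filter, set } pairs (a hypothetical shape for this sketch).
// Returns the number of batches executed.
function applyRepairs(collection, repairs, batchSize) {
  let bulk = collection.initializeUnorderedBulkOp();
  let queued = 0;
  let batches = 0;

  for (const { filter, set } of repairs) {
    bulk.find(filter).updateOne({ $set: set });
    queued++;
    if (queued === batchSize) {
      bulk.execute(); // unordered: the server may apply the operations in any order
      batches++;
      bulk = collection.initializeUnorderedBulkOp();
      queued = 0;
    }
  }
  if (queued > 0) {
    // Flush the remainder; executing an empty bulk would raise an error.
    bulk.execute();
    batches++;
  }
  return batches;
}

// Example call in the mongo shell (values are placeholders):
//   applyRepairs(db.auditEvent, [{ filter: { _id: someId }, set: { "usr.orgNm": "Fixed Name" } }], 1000);
```

Grouping the updates into batches replaces one round trip per document with one per batch, which is consistent with the large throughput difference reported on the slide.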

Choosing the Appropriate Data Store

MongoDB over relational?

+ Scale out for transaction volume and data volume

+ Developer productivity

Easy map between application and data store

+ Highly varying document structure

+ Offload read activity in optimized format different from data writes (a.k.a. CQRS pattern)

33

Choosing the Appropriate Data Store

Relational over MongoDB?

+ Complex normalized data model

+ Diverse read patterns requiring joins

+ Ad hoc reporting and analysis

+ Data integrity difficult to manage in application layer

34

MongoDB @ Practice Fusion

Upcoming MongoDB projects

+ Observations data store

Scale-out data store for patient vital signs, etc.

+ Clinical data repository

Read cache for patient medical records (CQRS pattern)

+ Upgrades for Audit 2.0

WiredTiger + compression

35

Q&A

Michael Poremba

mporemba@practicefusion.com

linkedin.com/in/michaelporemba

@mporemba

36