C* Summit 2013: Crossing the Chasm - SQL to NoSQL by Isaac Rieksts


Description

Over the past few years, Health Market Science has transitioned from traditional relational databases and enterprise systems to a massively scalable Big Data platform that combines Cassandra and Storm to ingest thousands of data feeds from the healthcare industry and produce a single, high-quality master file. Come hear the "Why?", "What for?" and "How?" of that evolution.

Transcript of C* Summit 2013: Crossing the Chasm - SQL to NoSQL by Isaac Rieksts

Page 1: C* Summit 2013: Crossing the Chasm - SQL to NoSQL by Isaac Rieksts

© Health Market Science 2013, All Rights Reserved

Isaac Rieksts

Software Developer

@IsaacRieksts, [email protected]

CROSSING THE CHASM

SQL to NOSQL

#Cassandra13

Page 2

Our Mission

§ Deliver the most current information on the U.S. healthcare provider universe through integrated solutions that enable customers to: ›  Prevent fraud, waste and abuse across the healthcare system ›  Comply with evolving state and federal regulations ›  Improve market opportunity for non-retail drugs and devices

Page 3

The Business

Business Solutions

Master Data Solutions: Health Care Provider & Facilities
•  Variety/Velocity: >2,000 sources; 6 million unique HCPs; 10+ years of history
•  Data challenges: constant change in real-world data; conflicting and partial information; frequent changes to source structure; authoritative sources vs. crowdsourcing; predicting source quality

Medical Claims Data: Medical Procedures & Diagnosis
•  Volume/Velocity: ~1B claims annually; 5B+ records annually; 5+ years of history
•  Data challenges: sources have incomplete capture; overlapping source data; statistical projections and biases; social-media-type relationships

Batch (CompleteView, Expense Manager, CompleteSpend)

Transactional (PRS/PE)

Big Data, Relational DB & Analytics (Claims)

Page 4

Master Data Management

§ Visualization: Dashboard / Reports

§ Structured Storage: Relational, Indexing

§ Flexible Storage: NoSQL, Graph(s)

§ Interfacing: Web Services

§ Distributed Processing: Standardize, Validate, Match, Consolidate, Analytics

§ Data Sources: Government, Web, Customer

§ User Interface

Page 5

Consolidation

First Name: John Middle Name: David Last Name: Smith

First Name: Mike Middle Name: Steve Last Name: Smith

First Name: Mike Middle Name: David Last Name: Smith
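The slide does not say how matched records are merged into one entity. One common survivorship rule is a per-field majority vote; the sketch below assumes that rule (it is not necessarily the one HMS uses) and assumes the three records have already been matched to the same provider. Applied to the records above, it yields Mike David Smith, which happens to equal the third record. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MajorityConsolidation {
    // Hypothetical survivorship rule: for each field, keep the value most matched records agree on.
    public static Map<String, String> consolidate(List<Map<String, String>> matchedRecords) {
        // field name -> (candidate value -> number of records carrying it), in first-seen order
        Map<String, Map<String, Integer>> votes = new LinkedHashMap<>();
        for (Map<String, String> record : matchedRecords) {
            for (Map.Entry<String, String> field : record.entrySet()) {
                votes.computeIfAbsent(field.getKey(), k -> new LinkedHashMap<>())
                     .merge(field.getValue(), 1, Integer::sum);
            }
        }
        Map<String, String> golden = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> field : votes.entrySet()) {
            String best = null;
            int bestCount = 0;
            // Strict '>' means a tie keeps the value that was seen first.
            for (Map.Entry<String, Integer> candidate : field.getValue().entrySet()) {
                if (candidate.getValue() > bestCount) {
                    best = candidate.getKey();
                    bestCount = candidate.getValue();
                }
            }
            golden.put(field.getKey(), best);
        }
        return golden;
    }

    public static Map<String, String> person(String first, String middle, String last) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("First Name", first);
        record.put("Middle Name", middle);
        record.put("Last Name", last);
        return record;
    }

    public static void main(String[] args) {
        List<Map<String, String>> matched = Arrays.asList(
                person("John", "David", "Smith"),
                person("Mike", "Steve", "Smith"),
                person("Mike", "David", "Smith"));
        System.out.println(consolidate(matched));
        // {First Name=Mike, Middle Name=David, Last Name=Smith}
    }
}
```

A vote is only one option; rules that weight authoritative sources above crowdsourced ones fit the same shape by replacing the count with a weighted score.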

Page 6

Legacy System

§ Relational DB

§ JBoss

§ JBoss MQ

§ 1 week to process a record through the system

Page 7

Our Solutions

§ Business Needs: Finance & Legal, Business Systems, Compliance, Sales & Marketing

§ Solutions: Compliance Outsourcing; Data Assessment, Integration, & Enrichment Services

§ Provider Data

§ Market Intelligence

§ HMS Authoritative Sources: PDC, Federal, State, Medical Claims, Web Derived

§ Advanced Technology: Storm, HMS MDM

Page 8

Data Model

§ Think in terms of the full entity

§ Build the entity as you go

§ Get the full view upon fetch

§ Choose the primary key (PK) carefully
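As a rough illustration of building the entity as you go and getting the full view on fetch, the sketch below models one column family as an in-memory map from primary key to a sorted set of columns. It is not Cassandra client code; the class name, the PK "hcp-123" and the column names are hypothetical. Because every feed writes under the same row key, the PK has to be something every source can be resolved to, which is why it needs choosing carefully.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: an in-memory stand-in for one Cassandra column family,
// where each row key (the entity's primary key) maps to a sorted set of columns.
public class EntityStore {
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    // Build the entity as you go: each feed adds or overwrites columns on the same entity row.
    public void write(String primaryKey, String column, String value) {
        rows.computeIfAbsent(primaryKey, k -> new TreeMap<>()).put(column, value);
    }

    // Full view upon fetch: one read by primary key returns every column written so far.
    public Map<String, String> fetch(String primaryKey) {
        return new TreeMap<>(rows.getOrDefault(primaryKey, new TreeMap<>()));
    }

    public static void main(String[] args) {
        EntityStore store = new EntityStore();
        // Hypothetical feeds contribute different attributes of the same provider.
        store.write("hcp-123", "name", "John David Smith");
        store.write("hcp-123", "specialty", "Cardiology");
        store.write("hcp-123", "state_license_PA", "MD000000");
        System.out.println(store.fetch("hcp-123"));
        // {name=John David Smith, specialty=Cardiology, state_license_PA=MD000000}
    }
}
```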

Page 9

Cassandra-Indexing

§ Fast wide-row alternate-key (AK) index for Cassandra

§ Two-row pull process ›  Fetch the PKs matching the AK ›  Use each PK to fetch your data
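The two-row pull can be sketched with two in-memory maps standing in for an index column family (AK value to a wide row whose column names are PKs) and a data column family (PK to columns). This is a hypothetical illustration of the idea, not the cassandra-indexing library's API; the class, method and column names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of a two-row pull over in-memory stand-ins for two column families.
public class TwoRowPull {
    // Index CF: alternate-key value -> wide row whose column names are the matching primary keys.
    private final Map<String, SortedSet<String>> indexRows = new HashMap<>();
    // Data CF: primary key -> (column name -> value).
    private final Map<String, Map<String, String>> dataRows = new HashMap<>();

    // Write the data row and add its PK as a column in the index row for its AK value.
    // This sketch does not remove stale index entries if an existing row's AK value changes.
    public void put(String pk, Map<String, String> columns, String akColumn) {
        dataRows.put(pk, new HashMap<>(columns));
        String ak = columns.get(akColumn);
        if (ak != null) {
            indexRows.computeIfAbsent(ak, k -> new TreeSet<>()).add(pk);
        }
    }

    // Pull 1: read the index row for the AK. Pull 2: read each data row by PK.
    public List<Map<String, String>> findByAlternateKey(String ak) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String pk : indexRows.getOrDefault(ak, Collections.emptySortedSet())) {
            Map<String, String> row = dataRows.get(pk);
            if (row != null) {
                results.add(row);
            }
        }
        return results;
    }

    private static Map<String, String> row(String name, String state) {
        Map<String, String> columns = new HashMap<>();
        columns.put("name", name);
        columns.put("state", state);
        return columns;
    }

    public static void main(String[] args) {
        TwoRowPull store = new TwoRowPull();
        store.put("hcp-1", row("John Smith", "PA"), "state");
        store.put("hcp-2", row("Steve Smith", "PA"), "state");
        store.put("hcp-3", row("David Jones", "NJ"), "state");
        for (Map<String, String> match : store.findByAlternateKey("PA")) {
            System.out.println(match.get("name")); // John Smith, then Steve Smith
        }
    }
}
```

Keeping the index in its own wide row means a lookup costs one index read plus one read per match, and the index row can hold many PKs without touching the data rows.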

https://github.com/hmsonline/cassandra-indexing

Page 10

Cassandra-Indexing

§ Key: Col1:Col2

§ Index: Col2:Col1

https://github.com/hmsonline/cassandra-indexing

Page 11

Cassandra-Indexing Example

§ Key: <First Name>:<Last Name>

§ Index: <Last Name>:<First Name>

§ Data ›  John:Smith ›  Steve:Smith ›  David:Jones

§ Index fetch “Smith” => John:Smith, Steve:Smith

§ Index fetch “Jones” => David:Jones
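The example above can be reproduced with a sorted map standing in for a wide row's sorted column names: the index stores <Last Name>:<First Name>, so all entries for one last name sort next to each other and a fetch is a single range slice. This is a hypothetical in-memory sketch, not the cassandra-indexing API, and it assumes names never contain ':'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CompositeIndexExample {
    // Index entry "<Last Name>:<First Name>" -> primary key "<First Name>:<Last Name>".
    private final NavigableMap<String, String> index = new TreeMap<>();

    public void add(String primaryKey) {
        String[] parts = primaryKey.split(":", 2);
        index.put(parts[1] + ":" + parts[0], primaryKey);
    }

    public List<String> fetch(String lastName) {
        // ';' is the character right after ':' in ASCII, so [lastName + ":", lastName + ";")
        // covers exactly the entries for this last name (and excludes e.g. "Smithson").
        return new ArrayList<>(index.subMap(lastName + ":", true, lastName + ";", false).values());
    }

    public static void main(String[] args) {
        CompositeIndexExample idx = new CompositeIndexExample();
        for (String pk : Arrays.asList("John:Smith", "Steve:Smith", "David:Jones")) {
            idx.add(pk);
        }
        System.out.println(idx.fetch("Smith")); // [John:Smith, Steve:Smith]
        System.out.println(idx.fetch("Jones")); // [David:Jones]
    }
}
```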

https://github.com/hmsonline/cassandra-indexing

Page 12

System Phase 1

Page 13

System Phase 2

Page 14

System Phase 3

Page 15

Oracle Advanced Queue

§ Integrate the relational DB with JMS

§ Near-real-time processing of data ›  Table trigger

§ Bulk exports ›  Keep only what you need on the queue

Page 16

Oracle Advanced Queue (cont)

§ Distributed processing ›  Write to Cassandra as of queue time ›  Write only IDs and query back for the data
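The "write only IDs and query back" pattern can be sketched without Oracle AQ or a JMS provider by using in-memory stand-ins: a map for the relational table, a `BlockingQueue` for the queue, and a map for the Cassandra column family. All names here are hypothetical; a real system would enqueue from a table trigger and consume through JMS.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: in-memory stand-ins replace the Oracle table, the AQ/JMS queue and Cassandra.
public class IdOnlyQueueSketch {
    final Map<Long, String> relationalTable = new ConcurrentHashMap<>();
    final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
    final Map<Long, String> cassandra = new ConcurrentHashMap<>();

    // Plays the role of a table trigger: the row changes and only its ID is enqueued.
    void updateRow(long id, String value) {
        relationalTable.put(id, value);
        queue.add(id);
    }

    // Consumer: take each queued ID, query back for the row's current data, write it to Cassandra.
    void drain() {
        Long id;
        while ((id = queue.poll()) != null) {
            String current = relationalTable.get(id);
            if (current != null) {
                cassandra.put(id, current);
            }
        }
    }

    public static void main(String[] args) {
        IdOnlyQueueSketch s = new IdOnlyQueueSketch();
        s.updateRow(42L, "John Smith, Philadelphia");
        s.updateRow(42L, "John Smith, King of Prussia");
        s.drain();
        System.out.println(s.cassandra.get(42L)); // John Smith, King of Prussia
    }
}
```

With ID-only messages the consumer writes the row as it is at query-back time, so a burst of updates to one row collapses to its latest state and messages stay small. Carrying the full payload on the queue instead would record each version as of queue time, at the cost of larger messages.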

Page 17

Unit testing

§ Module level ›  In-memory mock ›  Map<String, Map<String, Map<String, Map<String, String>>>> ›  Map<Keyspace, Map<Column Family, Map<Column, Map<Row Key, Value>>>>

§ Integration ›  Embedded Cassandra superclass ›  Schema migration
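A module-level mock built from the nested map on this slide could look like the sketch below. The class and method names are hypothetical; the nesting order follows the slide (keyspace, column family, column, row key), so reading a whole row means scanning the columns of a column family.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory Cassandra mock for module-level unit tests.
// Nesting follows the slide: keyspace -> column family -> column -> row key -> value.
public class InMemoryCassandraMock {
    private final Map<String, Map<String, Map<String, Map<String, String>>>> store = new HashMap<>();

    public void put(String keyspace, String columnFamily, String rowKey, String column, String value) {
        store.computeIfAbsent(keyspace, k -> new HashMap<>())
             .computeIfAbsent(columnFamily, k -> new HashMap<>())
             .computeIfAbsent(column, k -> new HashMap<>())
             .put(rowKey, value);
    }

    public String get(String keyspace, String columnFamily, String rowKey, String column) {
        return columnsOf(keyspace, columnFamily)
                .getOrDefault(column, Collections.emptyMap())
                .get(rowKey);
    }

    // Reassemble one row by collecting every column that holds a value for the row key.
    public Map<String, String> getRow(String keyspace, String columnFamily, String rowKey) {
        Map<String, String> row = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> column : columnsOf(keyspace, columnFamily).entrySet()) {
            String value = column.getValue().get(rowKey);
            if (value != null) {
                row.put(column.getKey(), value);
            }
        }
        return row;
    }

    private Map<String, Map<String, String>> columnsOf(String keyspace, String columnFamily) {
        return store.getOrDefault(keyspace, Collections.emptyMap())
                    .getOrDefault(columnFamily, Collections.emptyMap());
    }

    public static void main(String[] args) {
        InMemoryCassandraMock mock = new InMemoryCassandraMock();
        mock.put("hms", "providers", "hcp-1", "name", "John Smith");
        mock.put("hms", "providers", "hcp-1", "specialty", "Cardiology");
        System.out.println(mock.getRow("hms", "providers", "hcp-1"));
        // {name=John Smith, specialty=Cardiology}
    }
}
```

Such a mock keeps module tests fast and independent of a running node, while the embedded-Cassandra superclass covers behavior a map cannot reproduce, such as schema migration.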

Page 18

QA

§ Fail fast and early

§ SoapUI and Maven

Page 19

Organization Design

§ Project Manager

§ Business Analyst

§ Quality Assurance

§ Software Developer

§ Development Operations

Page 20

DevOps

§ Virtual hardware (VMware)

§ Puppet ›  Puppet Master ›  Jenkins

§ Promote using configuration ›  The same scripts run in DEV and in PROD

Page 21

Real-time System

§ Kafka queue(s) and offsets

§ Processing stages A, B and C

§ Cassandra (C*)

§ Elasticsearch (ES1, ES2)

§ REST API

Page 22

Storm

•  Guaranteed message processing

•  Well-designed processing abstraction

•  Beats BYODP

•  Momentum

Page 23

Storm and Cassandra

§ Use cases:
›  Write Storm tuple data to C*
    §  Computation results
    §  Pre-computed indices
›  Read data from C* and emit Storm tuples
    §  Dynamic lookups

http://github.com/hmsonline/storm-cassandra

Page 24

Storm-Cassandra Project

§ ColumnsMapper interface ›  Tells the CassandraLookupBolt how to transform a C* row into Storm tuples

§ Given a C* row key and a list of columns ›  Returns a list of Storm tuples
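The slide gives the mapper's contract (row key plus columns in, list of tuples out) but not its exact Java signature, so the sketch below defines its own hypothetical `RowToTuplesMapper` interface and `Column` class instead of reproducing storm-cassandra's types, and models a Storm tuple as a plain `List<Object>`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnsMapperSketch {
    // Hypothetical stand-in for a Cassandra column; not the storm-cassandra type.
    static final class Column {
        final String name;
        final String value;

        Column(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Hypothetical interface mirroring the slide: a C* row key and its columns -> a list of tuples.
    interface RowToTuplesMapper {
        List<List<Object>> mapToTuples(String rowKey, List<Column> columns);
    }

    // Example mapper: emit one (rowKey, columnName, columnValue) tuple per column,
    // so one looked-up row fans out into several downstream tuples.
    static final RowToTuplesMapper ONE_TUPLE_PER_COLUMN = (rowKey, columns) -> {
        List<List<Object>> tuples = new ArrayList<>();
        for (Column c : columns) {
            tuples.add(Arrays.asList(rowKey, c.name, c.value));
        }
        return tuples;
    };

    public static void main(String[] args) {
        List<Column> row = Arrays.asList(
                new Column("name", "John Smith"),
                new Column("specialty", "Cardiology"));
        System.out.println(ONE_TUPLE_PER_COLUMN.mapToTuples("hcp-123", row));
        // [[hcp-123, name, John Smith], [hcp-123, specialty, Cardiology]]
    }
}
```

Keeping the row-to-tuple translation behind an interface lets one generic lookup bolt serve different column families, with only the mapper changing per use case.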

http://github.com/hmsonline/storm-cassandra

Page 25

Vision

Data Engine

§ Unstructured Data ›  Unpredictable schema/layout ›  Expand data storage structure dynamically ›  Fuzzy search

§ Graph Database ›  Traversing relationships ›  Building connections ›  Real-time relationship changes

§ Structured Data ›  Traditional database ›  Predictable, logical structure ›  Faceted search

§ Distributed Processing ›  Scalability ›  Performance ›  Processing power ›  Virtual grow/shrink

Page 26

Summary

§ Cassandra-Indexing

§ Oracle Advanced Queue

§ Storm-Cassandra

Page 27

THE SCIENCE OF BETTER RESULTS

www.healthmarketscience.com

2700 Horizon Drive • King of Prussia, PA 19406 • 800.593.4467 • [email protected]

Questions?
