Health Care Claims System Prototype · Health Care Claims System Prototype ... Hadoop map‐reduce...

7701 Greenbelt Road, Suite 400, Greenbelt, MD 20770 Tel: (301) 614-8600 Fax: (301) 614-8601 www.sgt-inc.com

©2015 SGT, Inc. All Rights Reserved

SGT WHITE PAPER

Health Care ClaimsSystem Prototype

MongoDB and Hadoop

Health Care Claims System Prototype

Purpose:

The prototype health care claims system database was built to explore the possibilities of using

MongoDB and Hadoop for big data processing across a cluster of servers in the cloud. The idea was to

use MongoDB to store hundreds of millions of records and then to use Hadoop’s map‐reduce

functionality to efficiently process the data. MongoDB is a NoSQL, document based, database that

provides easy scalability, thus it was a fitting choice for this project. Apache Hadoop is a framework that

enables the distributed processing of large data sets across clusters of computers. Hadoop is designed to

scale from use on a single server, to use on thousands of servers. Hadoop’s next generation map‐reduce,

YARN, is a framework used for job scheduling, managing cluster resources, and provides the ability for

parallel processing of large data sets.

Components:

1. The simulation data generator is a java program that generates all of the simulation data for the

health care providers, the beneficiaries, and the claims. This is the only part of the health care

claims system that is not required to live on the cloud because it only needs to be used once to

generate the simulation data, it doesn’t need to be continually running like the rest of the

system’s components.

2. MongoDB is a NoSQL database used for storing all of the data from the simulation data

generator and the map‐reduce jobs.

3. Hadoop is a tool which is designed for parallel processing of large data sets. Hadoop’s processing

power is completely scalable from one machine to a cluster built from any number of machines.

4. The Mongo‐Hadoop Connector is a program that converts BSON (Binary JSON) data from

MongoDB to a serialized format that Hadoop can use and then writes it back to a BSON data and

into MongoDB. The Mongo‐Hadoop Connector is essential for communication between

MongoDB and Apache Hadoop.

5. The Google Map Overlay is simply used for marking any locations of interest on a map. For

example, one of the map‐reduce programs finds all claims over $1,000,000, the program would

then read in the whole collection of claims over $1,000,000 and mark the beneficiary’s location

on the map. This way if there are a suspicious number of high claims in any one area it will be

easy to determine. This Google map overlay reads data from MongoDB and then marks

locations on the map.

6. Amazon Web Services (Amazon cloud) is used to run the MongoDB, Hadoop, the Mongo‐

Hadoop Connector, and the program for the Google map overlays on the cloud across a cluster

of multiple servers.

SGT Innovation Center Health Care Claims System Prototype White Paper ©2015 SGT, Inc. All Rights Reserved

A high‐level model of the whole system looks like:

Figure 1: Component Diagram of the System

How Does MongoDB store data?

MongoDB stores all of its data in structured documents which each contain a collection of key‐

value pairs. For example, in this project a document for a Provider might look something like:

{ "_id" : ObjectId("53207ae0ac4fb2e887177162"), "Provider_NPI_Number" : NumberLong("3905304760"), "Name" : "AAAA 10", "Address" : "43499 5th Street ", "City" : "Houston", "State" : "TX", "Zip" : 77213, "Longitude" : -95.434241, "Latitude" : 29.83399, "Claim 1" : { "Therapy Code" : 71, "Covered Paid Amount" : 352248.0684500998, "HCPCS Code" : "0145T", "HCPCS Code Modifier" : "GP", "Facility Type Code" : 1, "Class Function Code" : 10, "Beneficiary ID" : 43, "Year" : 2014, "Month" : 1


}, "Claim 2" : { "Therapy Code" : 62, "Covered Paid Amount" : 707612.6592194028, "HCPCS Code" : "90207", "HCPCS Code Modifier" : "GW", "Facility Type Code" : 9, "Class Function Code" : 3, "Beneficiary ID" : 16, "Year" : 2014, "Month" : 12 }, "Claim 3" : { "Therapy Code" : 40, "Covered Paid Amount" : 1067684.5161930372, "HCPCS Code" : "G0455", "HCPCS Code Modifier" : "GO", "Facility Type Code" : 10, "Class Function Code" : 5, "Beneficiary ID" : 41, "Year" : 2014, "Month" : 12 } }

Everything in every single document in MongoDB is a key‐value pair. For example, in the line ‘"Address" : "43499 5th Street "’ the key is “Address” and the value is “43499 5th Street”. All subdocuments, such as Claim 1, Claim 2, and Claim 3, in MongoDB are also considered values in a key‐value pair. For example, in the above document, the key “Claim 3” is matched with the value of the subdocument: { "Therapy Code" : 40, "Covered Paid Amount" : 1067684.5161930372, "HCPCS Code" : "G0455", "HCPCS Code Modifier" : "GO", "Facility Type Code" : 10, "Class Function Code" : 5, "Beneficiary ID" : 41, "Year" : 2014, "Month" : 12 }

What does Hadoop map‐reduce do?

Hadoop map‐reduce accepts input parameters for a key, which is a key‐value pair from a

document, and a value, which is also a key‐value pair from a document, it then combines (maps) all of

the keys with an equal value into a single document and then allows the programmer to perform any

calculations necessary with the value data(reduce). The general data flow for a hadoop map‐reduce job

using MongoDB for input and output looks as follows:


For example, in this project there is a map reduce program called AVGperNPI (Appendices A, B,

& C), it takes in the key‐value pair “Provider_NPI_Number” for the key for the mapper, and it takes in

the key‐value pair “Covered Paid Amount” as the value. The mapper then combines all claims that have

the same NPI number and sends the NPI number and a list of all the “Covered Paid Amount” values as

doubles (or DoubleWritable) to the Reducer. The reducer then iterates through the list of doubles, sums

them together, and divides by the number of elements in the list to calculate the average claim amount

for all the claims with the matching NPI number. The reducer writes the NPI number and the calculated

average to a BSON object and writes this BSON object into a new collection in MongoDB. This new

document will look something like:

{ "_id" : ObjectId("5320855cac4f1874ff31ce7d"), "Provider NPI Number" : NumberLong("2389959670"), "Average Claim Amount" : 613212.3526743455 }

This map‐reduce program maps all claims with the same NPI number together, and reduces

them to one document with the average claim amount for the NPI number listed instead of the

individual claim amounts. It should be noted, that doubles, floats, ints, etc. cannot be directly used for a

mapper key or passed to the reducer. Instead classes that implement Writable must be used, such as

DoubleWritable, FloatWritable, IntWritable, etc. A complete list of these classes is in the

org.apache.hadoop.io package which can be viewed here. Hadoop cannot read or write any data that is


not contained in a class that implements Writable, because Hadoop needs the data in a serialized format

that Writable provides.

Custom Map‐Reduce Key Types:

Hadoop also provides the ability to write custom keys as well for when data is used for a key

that is not simply a primitive data type. For example, if the desired outcome was the average amount of

each claim for each NPI number for each Year, the only additional code that would be required would be

a custom key class definition as can be seen in Appendix F, and slight modifications to the mapper and

reducer class where LongWritable would be replaced with the custom key type as seen in Appendices D

and E.

To create a custom Key, a class must be created that implements Hadoop’s WritableComparable

interface and overrides the hashCode() and equals() methods. The custom key class must override the

methods equals(), hashcode(), write(), readFields(), and compareTo(). If the map‐reduce program is

being written in the Eclipse IDE, it can generate the hashCode() and the equals() methods, but the

methods for the WritableComparable interface must still be written manually.

After using Hadoop to run the Jar file containing the code in Appendices D‐G, the output in

MongoDB will be a collection of documents with the average claim amount per NPI number, per year:

{ "_id" : ObjectId("53209a94ac4f6eb2433b7012"), "Provider NPI Number" : NumberLong("8884844805"), "Year" : 2014, "Average Claim Amount" : 496265.4353595429 }


Appendix A: AvgPerNPIMapper

package avgPerNPI; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.DoubleWritable; import org.apache.hadoop.mapreduce.Mapper; import org.bson.BSONObject; import java.io.IOException; public class AvgPerNPIMapper extends Mapper<Object, BSONObject, LongWritable, DoubleWritable> { @Override public void map(final Object pKey, final BSONObject pValue, final Context pContext) throws IOException, InterruptedException { BSONObject temp; long npi = (long)pValue.get("Provider_NPI_Number"); int i = 0; while(true) { try { i++; temp = ((BSONObject)pValue.get("Claim " + Integer.toString(i))); pContext.write(new LongWritable(npi), new DoubleWritable((double)temp.get("Covered Paid Amount"))); } catch(Throwable t) { break; }//end try/catch }//end while(true) } }


Appendix B: AvgPerNPIReducer

package avgPerNPI; import com.mongodb.hadoop.io.BSONWritable; import org.apache.hadoop.mapreduce.Reducer; import org.bson.BasicBSONObject; import java.io.IOException; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.DoubleWritable; public class AvgPerNPIReducer extends Reducer<LongWritable, DoubleWritable, BSONWritable, BSONWritable> { @Override public void reduce(final LongWritable pKey, final Iterable<DoubleWritable> pValues, final Context pContext) throws IOException, InterruptedException { double avg = 0; int cnt = 0; for (final DoubleWritable value : pValues) { cnt++; avg += value.get(); } avg = avg/cnt; BasicBSONObject output = new BasicBSONObject(); output.put("Provider NPI Number", pKey.get()); output.put("Average Claim Amount", avg); pContext.write(null, new BSONWritable(output)); } }


Appendix C: TestXMLConfig

package avgPerNPI; import com.mongodb.hadoop.util.MongoTool; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.util.ToolRunner; public class TestXMLConfig extends MongoTool { static { Configuration.addDefaultResource("test.xml"); } public static void main(final String[] pArgs) throws Exception { System.exit(ToolRunner.run(new TestXMLConfig(), pArgs)); } }


Appendix D: AvgPerNPIMapper (Year)

package avgPerNPI.year; import org.apache.hadoop.io.DoubleWritable; import org.apache.hadoop.mapreduce.Mapper; import org.bson.BSONObject; import java.io.IOException; public class AvgPerNPIMapper extends Mapper<Object, BSONObject, YearNPI_Key, DoubleWritable> { @Override public void map(final Object pKey, final BSONObject pValue, final Context pContext) throws IOException, InterruptedException { BSONObject temp; long npi = (long)pValue.get("Provider_NPI_Number"); int i = 0; System.out.println("27"); while(true) { try { i++; temp = ((BSONObject)pValue.get("Claim " + Integer.toString(i))); pContext.write(new YearNPI_Key(npi, (int)temp.get("Year")), new DoubleWritable((double)temp.get("Covered Paid Amount"))); } catch(Throwable t) { break; }//end try/catch }//end while(true) } }


Appendix E: AvgPerNPIReducer (Year)

package avgPerNPI.year; import com.mongodb.hadoop.io.BSONWritable; import org.apache.hadoop.mapreduce.Reducer; import org.bson.BasicBSONObject; import java.io.IOException; import org.apache.hadoop.io.DoubleWritable; public class AvgPerNPIReducer extends Reducer<YearNPI_Key, DoubleWritable, BSONWritable, BSONWritable> { @Override public void reduce(final YearNPI_Key pKey, final Iterable<DoubleWritable> pValues, final Context pContext) throws IOException, InterruptedException { double avg = 0; int cnt = 0; for (final DoubleWritable value : pValues) { cnt++; avg += value.get(); } avg = avg/cnt; BasicBSONObject output = new BasicBSONObject(); output.put("Provider NPI Number", pKey.npi); output.put("Year", pKey.year); output.put("Average Claim Amount", avg); pContext.write(null, new BSONWritable(output)); } }


Appendix F: YearNPI_Key

package avgPerNPI.year; import java.io.DataInput; import java.io.DataOutput; import java.io.IOException; import org.apache.hadoop.io.WritableComparable; public class YearNPI_Key implements WritableComparable<YearNPI_Key> { public long npi; public int year; public YearNPI_Key() { npi = 0; year = 0; } public YearNPI_Key(long _npi, int _year) { npi = _npi; year = _year; } @Override public void readFields(DataInput in) throws IOException { npi = in.readLong(); year = in.readInt(); } @Override public void write(DataOutput out) throws IOException { out.writeLong(npi); out.writeInt(year); } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + (int) (npi ^ (npi >>> 32)); result = prime * result + year; return result; } @Override public boolean equals(Object obj)


{ if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; YearNPI_Key other = (YearNPI_Key) obj; if (npi != other.npi) return false; if (year != other.year) return false; return true; } @Override public int compareTo(YearNPI_Key temp) { int ret_val = 1; if(npi == temp.npi && year == temp.year) { ret_val = 0; } return ret_val; } }


Appendix G: TestXMLConfig (Year)

package avgPerNPI.year; import com.mongodb.hadoop.util.MongoTool; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.util.ToolRunner; public class TestXMLConfig extends MongoTool { static { Configuration.addDefaultResource("test.xml"); Configuration.addDefaultResource("mongo‐defaults.xml"); } public static void main(final String[] pArgs) throws Exception { System.exit(ToolRunner.run(new TestXMLConfig(), pArgs)); } }

‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐

Some Screenshots that I might use


Health Care Claims System Prototype · Health Care Claims System Prototype ... Hadoop map‐reduce...

Documents

Transcript of Health Care Claims System Prototype · Health Care Claims System Prototype ... Hadoop map‐reduce...