
7701 Greenbelt Road, Suite 400, Greenbelt, MD 20770 Tel: (301) 614-8600 Fax: (301) 614-8601 www.sgt-inc.com

©2015 SGT, Inc. All Rights Reserved

SGT WHITE PAPER

Health Care Claims System Prototype

MongoDB and Hadoop


Health Care Claims System Prototype 

 

 

Purpose: 

  The prototype health care claims system database was built to explore the use of MongoDB and Hadoop for big data processing across a cluster of servers in the cloud. The idea was to use MongoDB to store hundreds of millions of records and to use Hadoop's map-reduce functionality to process that data efficiently. MongoDB is a NoSQL, document-based database that scales easily, which made it a fitting choice for this project. Apache Hadoop is a framework that enables the distributed processing of large data sets across clusters of computers, and it is designed to scale from a single server to thousands of servers. Hadoop's next-generation map-reduce framework, YARN, handles job scheduling and cluster resource management and enables parallel processing of large data sets.

Components: 

1. The simulation data generator is a Java program that generates all of the simulation data for the health care providers, the beneficiaries, and the claims. This is the only part of the health care claims system that is not required to live on the cloud: it only needs to run once to generate the simulation data, rather than running continually like the rest of the system's components. (A minimal sketch of a generated provider document being written to MongoDB appears after this list.)

2. MongoDB is a NoSQL database used for storing all of the data from the simulation data generator and the map-reduce jobs.

3. Hadoop is a tool designed for parallel processing of large data sets. Hadoop's processing power scales from one machine to a cluster built from any number of machines.

4. The Mongo-Hadoop Connector converts BSON (Binary JSON) data from MongoDB into a serialized format that Hadoop can use, and then writes results back as BSON data into MongoDB. The Mongo-Hadoop Connector is essential for communication between MongoDB and Apache Hadoop.

5. The Google Map overlay is used for marking locations of interest on a map. For example, one of the map-reduce programs finds all claims over $1,000,000; the overlay program then reads in the whole collection of claims over $1,000,000 and marks each beneficiary's location on the map. This makes a suspicious concentration of high claims in any one area easy to spot. The Google Map overlay reads its data from MongoDB and then marks the locations on the map.

6. Amazon Web Services (the Amazon cloud) is used to run MongoDB, Hadoop, the Mongo-Hadoop Connector, and the Google Map overlay program on the cloud across a cluster of multiple servers.
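As a rough illustration of what the simulation data generator hands to MongoDB, the sketch below builds a provider document with one claim embedded as a subdocument and inserts it with the MongoDB Java driver. It is an illustration only, not the prototype's actual generator: the field values and the database and collection names ("claims" and "providers") are assumptions made for this example.

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class SimulationDataSketch
{
    public static void main(String[] args)
    {
        MongoClient client = new MongoClient("localhost", 27017);
        MongoDatabase db = client.getDatabase("claims");                      // assumed database name
        MongoCollection<Document> providers = db.getCollection("providers");  // assumed collection name

        // A single claim is stored as a subdocument of key-value pairs.
        Document claim1 = new Document("Therapy Code", 71)
                .append("Covered Paid Amount", 352248.07)
                .append("HCPCS Code", "0145T")
                .append("Beneficiary ID", 43)
                .append("Year", 2014)
                .append("Month", 1);

        // The provider document embeds its claims under the keys "Claim 1", "Claim 2", ...
        Document provider = new Document("Provider_NPI_Number", 3905304760L)
                .append("Name", "AAAA 10")
                .append("City", "Houston")
                .append("State", "TX")
                .append("Claim 1", claim1);

        providers.insertOne(provider);
        client.close();
    }
}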

 


A high‐level model of the whole system looks like: 

 

Figure 1: Component Diagram of the System 

 

How does MongoDB store data? 

  MongoDB stores all of its data in structured documents which each contain a collection of key-value pairs. For example, in this project a document for a Provider might look something like: 

{ "_id" : ObjectId("53207ae0ac4fb2e887177162"), "Provider_NPI_Number" : NumberLong("3905304760"), "Name" : "AAAA 10", "Address" : "43499 5th Street ", "City" : "Houston", "State" : "TX", "Zip" : 77213, "Longitude" : -95.434241, "Latitude" : 29.83399, "Claim 1" : { "Therapy Code" : 71, "Covered Paid Amount" : 352248.0684500998, "HCPCS Code" : "0145T", "HCPCS Code Modifier" : "GP", "Facility Type Code" : 1, "Class Function Code" : 10, "Beneficiary ID" : 43, "Year" : 2014, "Month" : 1

SGT Innovation Center Health Care Claims System Prototype White Paper ©2015 SGT, Inc. All Rights Reserved

Page 4: Health Care Claims System Prototype · Health Care Claims System Prototype ... Hadoop map‐reduce accepts input parameters for a key, which is a key‐value pair from a ... Custom

}, "Claim 2" : { "Therapy Code" : 62, "Covered Paid Amount" : 707612.6592194028, "HCPCS Code" : "90207", "HCPCS Code Modifier" : "GW", "Facility Type Code" : 9, "Class Function Code" : 3, "Beneficiary ID" : 16, "Year" : 2014, "Month" : 12 }, "Claim 3" : { "Therapy Code" : 40, "Covered Paid Amount" : 1067684.5161930372, "HCPCS Code" : "G0455", "HCPCS Code Modifier" : "GO", "Facility Type Code" : 10, "Class Function Code" : 5, "Beneficiary ID" : 41, "Year" : 2014, "Month" : 12 } }

  Everything in every document in MongoDB is a key-value pair. For example, in the line '"Address" : "43499 5th Street "' the key is "Address" and the value is "43499 5th Street". Subdocuments, such as Claim 1, Claim 2, and Claim 3, are themselves values in key-value pairs. In the document above, the key "Claim 3" is matched with the value of the subdocument:

    {
        "Therapy Code" : 40,
        "Covered Paid Amount" : 1067684.5161930372,
        "HCPCS Code" : "G0455",
        "HCPCS Code Modifier" : "GO",
        "Facility Type Code" : 10,
        "Class Function Code" : 5,
        "Beneficiary ID" : 41,
        "Year" : 2014,
        "Month" : 12
    }
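To make the key-value structure concrete, the following sketch reads the provider document shown above with the MongoDB Java driver and pulls a value out of the "Claim 3" subdocument. The sketch is not part of the prototype; the database and collection names ("claims" and "providers") are assumptions made for this example.

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;

public class ReadProviderSketch
{
    public static void main(String[] args)
    {
        MongoClient client = new MongoClient("localhost", 27017);
        MongoDatabase db = client.getDatabase("claims");                      // assumed database name
        MongoCollection<Document> providers = db.getCollection("providers");  // assumed collection name

        // Look the provider up by the value stored under the key "Provider_NPI_Number".
        Document provider = providers.find(eq("Provider_NPI_Number", 3905304760L)).first();

        if (provider != null)
        {
            // "Claim 3" is itself a value: a subdocument of key-value pairs.
            Document claim3 = (Document) provider.get("Claim 3");
            double paid = claim3.getDouble("Covered Paid Amount");
            System.out.println(provider.getString("Name") + ", Claim 3 paid amount: " + paid);
        }

        client.close();
    }
}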

  

What does Hadoop map‐reduce do? 

  Hadoop map-reduce accepts input parameters for a key, which is drawn from a key-value pair in a document, and a value, which is also drawn from a key-value pair in a document. It then groups (maps) all of the values that share an equal key together, and allows the programmer to perform any calculations necessary on the grouped value data (reduce). The general data flow for a Hadoop map-reduce job using MongoDB for input and output looks as follows: 


Figure: Data Flow of a Map-Reduce Job Using MongoDB for Input and Output

For example, this project includes a map-reduce program called AvgPerNPI (Appendices A, B, and C). It takes the key-value pair "Provider_NPI_Number" as the key for the mapper, and the key-value pair "Covered Paid Amount" as the value. The mapper combines all claims that have the same NPI number and sends the NPI number, along with a list of all the "Covered Paid Amount" values as doubles (DoubleWritable), to the reducer. The reducer then iterates through the list of doubles, sums them, and divides by the number of elements in the list to calculate the average claim amount for all claims with that NPI number. The reducer writes the NPI number and the calculated average to a BSON object and writes this BSON object into a new collection in MongoDB. The new document will look something like: 

{ "_id" : ObjectId("5320855cac4f1874ff31ce7d"), "Provider NPI Number" : NumberLong("2389959670"), "Average Claim Amount" : 613212.3526743455 }

This map-reduce program maps all claims with the same NPI number together and reduces them to one document containing the average claim amount for that NPI number instead of the individual claim amounts. It should be noted that doubles, floats, ints, etc. cannot be used directly as a mapper key or passed to the reducer. Instead, classes that implement Writable must be used, such as DoubleWritable, FloatWritable, IntWritable, etc. A complete list of these classes is in the org.apache.hadoop.io package. Hadoop cannot read or write any data that is not contained in a class that implements Writable, because Hadoop needs the data in the serialized format that Writable provides. 
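In this prototype the AvgPerNPI job is configured and launched through MongoTool and the test.xml configuration file (Appendix C). For readers who prefer to see the wiring spelled out in code, the sketch below is a rough programmatic equivalent written against the standard Mongo-Hadoop Connector classes MongoInputFormat, MongoOutputFormat, and MongoConfigUtil; the driver class name and the MongoDB URIs are assumptions, not part of the prototype.

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.BSONWritable;
import com.mongodb.hadoop.util.MongoConfigUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;

public class AvgPerNPIDriver
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // Collection the mapper reads from and collection the reducer writes to
        // (database and collection names here are placeholders, not the prototype's actual names).
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/claims.providers");
        MongoConfigUtil.setOutputURI(conf, "mongodb://localhost:27017/claims.avgPerNPI");

        Job job = Job.getInstance(conf, "AvgPerNPI");
        job.setJarByClass(AvgPerNPIDriver.class);
        job.setInputFormatClass(MongoInputFormat.class);    // reads BSONObject values from MongoDB
        job.setOutputFormatClass(MongoOutputFormat.class);  // writes BSONWritable output back to MongoDB

        job.setMapperClass(AvgPerNPIMapper.class);
        job.setReducerClass(AvgPerNPIReducer.class);
        job.setMapOutputKeyClass(LongWritable.class);       // NPI number
        job.setMapOutputValueClass(DoubleWritable.class);   // one claim's Covered Paid Amount
        job.setOutputKeyClass(BSONWritable.class);
        job.setOutputValueClass(BSONWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}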

 

Custom Map‐Reduce Key Types: 

Hadoop also provides the ability to write custom keys for cases where the key is not simply a primitive data type. For example, if the desired outcome were the average claim amount for each NPI number for each year, the only additional code required would be a custom key class definition, as seen in Appendix F, plus slight modifications to the mapper and reducer classes, where LongWritable is replaced with the custom key type, as seen in Appendices D and E. 

To create a custom key, a class must be written that implements Hadoop's WritableComparable interface, which means overriding the methods equals(), hashCode(), write(), readFields(), and compareTo(). Because Hadoop uses compareTo() to sort keys during the shuffle, it should define a consistent ordering of keys rather than only testing for equality. If the map-reduce program is being written in the Eclipse IDE, Eclipse can generate the hashCode() and equals() methods, but the methods required by the WritableComparable interface must still be written manually. 

After using Hadoop to run the JAR file containing the code in Appendices D through G, the output in MongoDB will be a collection of documents with the average claim amount per NPI number, per year: 

{ "_id" : ObjectId("53209a94ac4f6eb2433b7012"), "Provider NPI Number" : NumberLong("8884844805"), "Year" : 2014, "Average Claim Amount" : 496265.4353595429 }

 

   


Appendix A: AvgPerNPIMapper 

package avgPerNPI;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.bson.BSONObject;
import java.io.IOException;

public class AvgPerNPIMapper
    extends Mapper<Object, BSONObject, LongWritable, DoubleWritable>
{
    @Override
    public void map(final Object pKey,
                    final BSONObject pValue,
                    final Context pContext)
        throws IOException, InterruptedException
    {
        // Each input value is one provider document; emit (NPI number, Covered Paid Amount)
        // for every claim embedded in the document.
        BSONObject temp;
        long npi = (long) pValue.get("Provider_NPI_Number");
        int i = 0;
        while (true)
        {
            try
            {
                i++;
                // Claims are embedded as subdocuments "Claim 1", "Claim 2", ...; when
                // "Claim i" does not exist the null access throws, which breaks the loop.
                temp = (BSONObject) pValue.get("Claim " + Integer.toString(i));
                pContext.write(new LongWritable(npi),
                               new DoubleWritable((double) temp.get("Covered Paid Amount")));
            }
            catch (Throwable t)
            {
                break;
            } // end try/catch
        } // end while(true)
    }
}

 

   


Appendix B: AvgPerNPIReducer 

package avgPerNPI;

import com.mongodb.hadoop.io.BSONWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BasicBSONObject;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.DoubleWritable;

public class AvgPerNPIReducer
    extends Reducer<LongWritable, DoubleWritable, BSONWritable, BSONWritable>
{
    @Override
    public void reduce(final LongWritable pKey,
                       final Iterable<DoubleWritable> pValues,
                       final Context pContext)
        throws IOException, InterruptedException
    {
        // Average all "Covered Paid Amount" values that were mapped to this NPI number.
        double avg = 0;
        int cnt = 0;
        for (final DoubleWritable value : pValues)
        {
            cnt++;
            avg += value.get();
        }
        avg = avg / cnt;

        // Write the result back to MongoDB as a new BSON document.
        BasicBSONObject output = new BasicBSONObject();
        output.put("Provider NPI Number", pKey.get());
        output.put("Average Claim Amount", avg);
        pContext.write(null, new BSONWritable(output));
    }
}


Appendix C: TestXMLConfig 

package avgPerNPI;

import com.mongodb.hadoop.util.MongoTool;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class TestXMLConfig extends MongoTool
{
    static
    {
        // Load the job configuration (input/output URIs, mapper, reducer, etc.) from test.xml.
        Configuration.addDefaultResource("test.xml");
    }

    public static void main(final String[] pArgs) throws Exception
    {
        System.exit(ToolRunner.run(new TestXMLConfig(), pArgs));
    }
}

 

   


 

Appendix D: AvgPerNPIMapper (Year) 

package avgPerNPI.year;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.bson.BSONObject;
import java.io.IOException;

public class AvgPerNPIMapper
    extends Mapper<Object, BSONObject, YearNPI_Key, DoubleWritable>
{
    @Override
    public void map(final Object pKey,
                    final BSONObject pValue,
                    final Context pContext)
        throws IOException, InterruptedException
    {
        // Emit (custom NPI + year key, Covered Paid Amount) for every claim embedded
        // in the provider document.
        BSONObject temp;
        long npi = (long) pValue.get("Provider_NPI_Number");
        int i = 0;
        while (true)
        {
            try
            {
                i++;
                // Stop when "Claim i" no longer exists (the null access throws and breaks the loop).
                temp = (BSONObject) pValue.get("Claim " + Integer.toString(i));
                pContext.write(new YearNPI_Key(npi, (int) temp.get("Year")),
                               new DoubleWritable((double) temp.get("Covered Paid Amount")));
            }
            catch (Throwable t)
            {
                break;
            } // end try/catch
        } // end while(true)
    }
}

   


Appendix E: AvgPerNPIReducer (Year) 

package avgPerNPI.year;

import com.mongodb.hadoop.io.BSONWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BasicBSONObject;
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;

public class AvgPerNPIReducer
    extends Reducer<YearNPI_Key, DoubleWritable, BSONWritable, BSONWritable>
{
    @Override
    public void reduce(final YearNPI_Key pKey,
                       final Iterable<DoubleWritable> pValues,
                       final Context pContext)
        throws IOException, InterruptedException
    {
        // Average all claim amounts mapped to this (NPI number, year) key.
        double avg = 0;
        int cnt = 0;
        for (final DoubleWritable value : pValues)
        {
            cnt++;
            avg += value.get();
        }
        avg = avg / cnt;

        // Write one result document per NPI number and year back to MongoDB.
        BasicBSONObject output = new BasicBSONObject();
        output.put("Provider NPI Number", pKey.npi);
        output.put("Year", pKey.year);
        output.put("Average Claim Amount", avg);
        pContext.write(null, new BSONWritable(output));
    }
}

   


Appendix F: YearNPI_Key 

package avgPerNPI.year;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Custom composite key: one key per (NPI number, year) pair.
public class YearNPI_Key implements WritableComparable<YearNPI_Key>
{
    public long npi;
    public int year;

    public YearNPI_Key()
    {
        npi = 0;
        year = 0;
    }

    public YearNPI_Key(long _npi, int _year)
    {
        npi = _npi;
        year = _year;
    }

    // Deserialize the key from Hadoop's wire format.
    @Override
    public void readFields(DataInput in) throws IOException
    {
        npi = in.readLong();
        year = in.readInt();
    }

    // Serialize the key to Hadoop's wire format.
    @Override
    public void write(DataOutput out) throws IOException
    {
        out.writeLong(npi);
        out.writeInt(year);
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + (int) (npi ^ (npi >>> 32));
        result = prime * result + year;
        return result;
    }

    @Override
    public boolean equals(Object obj)
    {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        YearNPI_Key other = (YearNPI_Key) obj;
        if (npi != other.npi)
            return false;
        if (year != other.year)
            return false;
        return true;
    }

    // Hadoop sorts keys with compareTo() during the shuffle, so it must define a
    // consistent ordering: compare by NPI number first, then by year.
    @Override
    public int compareTo(YearNPI_Key temp)
    {
        int ret_val = Long.compare(npi, temp.npi);
        if (ret_val == 0)
        {
            ret_val = Integer.compare(year, temp.year);
        }
        return ret_val;
    }
}

 

 

   


Appendix G: TestXMLConfig (Year) 

package avgPerNPI.year;

import com.mongodb.hadoop.util.MongoTool;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class TestXMLConfig extends MongoTool
{
    static
    {
        // Load the job configuration from test.xml and the connector defaults from mongo-defaults.xml.
        Configuration.addDefaultResource("test.xml");
        Configuration.addDefaultResource("mongo-defaults.xml");
    }

    public static void main(final String[] pArgs) throws Exception
    {
        System.exit(ToolRunner.run(new TestXMLConfig(), pArgs));
    }
}
