Advanced XML Integration

September 13-14, 2005 St. Louis, Missouri

Sponsored by the U.S. Department of Housing and Urban Development

David Talbot, Data Systems International

Eric Jahn, United Way of Sarasota County


What Is Data Integration and Why Should Our HMIS Do It?

Data integration is moving data from one computer system to another.

• Many local agencies already have their own data tracking systems.

• For these agencies, inputting into HMIS's web interface is duplicate data entry. Duplication = Bad

• Agencies can keep using their existing systems.

• Integration allows many Continua to combine their data into one place for reporting.


How Does Data Integration Work?

• Two types of data integration:
  • Live Integration
  • Episodic

• Transport and messaging protocols being worked on by NHSDC and HUD’s National HMIS TA team.


Data Warehousing

• Analysis and Reporting across continua
  • Multiple related systems in a city
  • Example: Domestic Violence Shelters, AIDS Foundations, Homeless Consortium all using different systems but frequently serving the same requirements.
  • Reporting for one or more systems statewide or nationally.
  • Statewide or national picture of homelessness.

• Why not just use the APRs generated from each system?
  • APR only answers specific questions, doesn’t allow the data to be “sliced and diced”.


How multiple systems report together

[Diagram labels, in order: ClientTrack.NET; Other Proprietary/Home Grown Applications; HUD XML; Uncleansed Data; Warehouse Vendor Proprietary; HUD XML; Data Cleansing/De-duplication; Cleansed Data Warehouse; Continua Wide Reporting; Cube Analysis]


De-Duplication

• Specific rules depending on how aggressive you want the results de-duplicated.
  • First/Last Name, SSN, DOB are common fields for de-duplication.

• De-duplication without passing client identifiable information (Hashed Identifiers)
  • SHA-1 one-way hash of identifying information.
  • Feature demanded by Domestic Violence shelters for good reason.
  • Each additional system a client’s data touches increases the possible places where a confidentiality breach can occur.


De-Duplication Without Passing Client Identifiable Info

[Diagram: System 1 and System 2 each run “David Talbot” through SHA-1 and pass the resulting identifier “7&SA(*HAJAKYZM” to the data warehouse in HUD XML.]

Data warehouse knows that “7&SA(*HAJAKYZM” is the same client but cannot turn “7&SA(*HAJAKYZM” into “David Talbot”.
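
The hashed-identifier idea can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular HMIS product: it assumes the one-way hash is SHA-1 (via the standard `hashlib` module), and the function name `hashed_identifier`, the `|` separator, the normalization rules, and the sample date of birth are all hypothetical.

```python
import hashlib


def hashed_identifier(first_name: str, last_name: str, date_of_birth: str) -> str:
    """Return a one-way SHA-1 digest of the normalized identifying fields."""
    # Normalize case and surrounding whitespace so trivial formatting
    # differences between source systems do not change the hash.
    parts = [first_name.strip().upper(), last_name.strip().upper(), date_of_birth.strip()]
    key = "|".join(parts)  # hypothetical separator
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# Two source systems hash the same client independently (placeholder DOB).
system_1 = hashed_identifier("David", "Talbot", "1970-01-01")
system_2 = hashed_identifier(" david ", "TALBOT", "1970-01-01")
print(system_1 == system_2)  # prints True
```

Only the digest leaves each source system, so the warehouse can match records on it without ever receiving the name or date of birth.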


Common Strategies for Hashed Identifiers

• Precise: Concatenate first name, last name and date of birth or last 4 chars of SSN.
  • Few false “de-duplications”
  • Many records that should have been de-duplicated won’t be due to things like misspelled names.

• More De-Duplications: Concatenate a SOUNDEX of first name, last name and date of birth.
  • More false “de-duplications”
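
A sketch of the SOUNDEX-based strategy follows, to show why it tolerates misspellings. It assumes standard American Soundex and SHA-1 as the hash; the function names, the `|` separator, and the sample date of birth are hypothetical, and a real deployment would need every contributing system to agree on the exact same rules.

```python
import hashlib

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code
    return (result + "000")[:4]


def soundex_identifier(first_name: str, last_name: str, date_of_birth: str) -> str:
    """Hash the Soundex codes of the names plus the date of birth."""
    key = "|".join([soundex(first_name), soundex(last_name), date_of_birth.strip()])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# A misspelled record still matches (placeholder DOB).
print(soundex_identifier("David", "Talbot", "1970-01-01")
      == soundex_identifier("Davyd", "Talbott", "1970-01-01"))  # prints True
```

The same tolerance is what produces more false matches: any two clients whose names share Soundex codes and who share a date of birth collapse into one identifier.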


Why HUD XML?

• Allows vendors to support one import/export format instead of many.
  • www.hmis.info lists 28 vendors!
  • There are hundreds, maybe even thousands of home grown applications out there in use by only one or two groups.

• Result:
  • Without HUD XML, bringing data together from multiple vendors requires custom import routines to be written for each vendor.
  • With HUD XML, all vendors write to a common specification and only one import routine needs to be written.


Basic Technology Behind HUD XML

• XSD – XML Schema
  • Supported by all modern programming languages.
  • Provides validation that a file conforms to the schema.

• Unfortunately just because XSD validates the file doesn’t mean it will just magically work 100%.
  • Many systems currently in use lack basic data validation at data entry. Garbage In/Garbage Out.
  • Slight variances in interpretation of the schema can lead to unintended results even though the export file is technically valid.
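
To illustrate the gap between "schema-valid" and "usable", here is a small sketch of a sanity check an importer might run after XSD validation has already passed. The element names (`Clients`, `Client`, `PersonID`, `DateOfBirth`), the sample data, and the 120-year threshold are hypothetical and chosen only for illustration; they are not taken from the HUD XML schema.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical, well-formed sample; the second date of birth is implausible.
SAMPLE = """<Clients>
  <Client><PersonID>123</PersonID><DateOfBirth>1970-01-01</DateOfBirth></Client>
  <Client><PersonID>124</PersonID><DateOfBirth>2999-01-01</DateOfBirth></Client>
</Clients>"""


def find_suspect_birth_dates(xml_text: str, today: date) -> list:
    """Return PersonIDs whose date of birth is in the future or over 120 years ago."""
    suspects = []
    for client in ET.fromstring(xml_text).iter("Client"):
        person_id = client.findtext("PersonID")
        dob = date.fromisoformat(client.findtext("DateOfBirth"))
        if dob > today or today.year - dob.year > 120:
            suspects.append(person_id)
    return suspects


print(find_suspect_birth_dates(SAMPLE, date(2005, 9, 13)))  # prints ['124']
```

Both dates are syntactically valid, so a schema check alone would accept them; only a business rule catches the bad one.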


HUD XML: State of the Union

• What are people currently doing with the standard?
  • Suncoast Partnership agency example: csv->HUD XML
  • Open Source Reference Implementation to be supported by vendors.

[Diagram: Agency Database -> XML/SOAP/HTTP/SSL -> Internet -> XML/SOAP/HTTP/SSL -> Data Warehouse Application -> PostgreSQL Database]


Vendor Support for HUD XML

• Vendor support matrix for HUD XML being published

• Why aren’t more vendors supporting HUD XML already?
  • Fear of “making it easy for customers to switch.”
  • Expectation that one of their customers should pay for it as “custom work”.
  • Software built on outdated and/or non-mainstream technologies that lack good tools to work with XML.
  • Lack of understanding on just how easy it is to implement.


How do I get my vendor to support HUD XML?

• Peer pressure. Vendor X is doing it and you should too.

• RFP Pressure. If you haven’t chosen a vendor, please include HUD XML as a baseline requirement in your RFP.

• Pay the vendor to implement the feature.


Technical Explanation-Header Information

• SourceDatabase
  • Contains core information about the data source.
  • Assign a database ID to each contributing system in your continua.

• Export
  • Assigns a unique id, for the source database, identifying the export.
  • Used so destination system can discover if it has imported this exact file before.
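
One way a destination system could use these header values is to remember every (database ID, export ID) pair it has processed. The sketch below assumes an in-memory set; the class name `ImportLog` and its method are hypothetical, and a real warehouse would persist the log in its database.

```python
class ImportLog:
    """Tracks which exports have already been imported, per source database."""

    def __init__(self):
        self._seen = set()

    def should_import(self, database_id: str, export_id: str) -> bool:
        """Return True the first time a (database, export) pair is seen."""
        key = (database_id, export_id)
        if key in self._seen:
            return False  # this exact export was imported before
        self._seen.add(key)
        return True


log = ImportLog()
print(log.should_import("1", "25"))  # prints True
print(log.should_import("1", "25"))  # prints False
print(log.should_import("2", "25"))  # prints True
```

The pair is needed because export IDs are only unique within one source database, so export 25 from database 2 is a different file from export 25 from database 1.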


Technical Explanation-Program

• The word “program” has a thousand definitions in the gov/non-profit world.
  • Organizational Unit
  • Grant/Funding Source
  • Grouping of services packaged for special need

• In HUD XML it is all of the above; program is a hierarchical structure.

• Strategies for importing
  • Recommend treating root “programs” as organizational units and sub-programs as groupings of services packaged for special need.


Technical Explanation-ClientExport

• Optional element used to associate clients with exports.

• Example:
  • Client David Talbot is exported from Database 1 for the first time in the export identified as 25.
  • A few months later, David Talbot is included in a new export identified as 29.
  • The ClientExport record in export 29 would be:

    <ClientExport>
      <CEPersonID>123</CEPersonID>
      <CEExportID>
        <CEExportIDNum>25</CEExportIDNum>
      </CEExportID>
    </ClientExport>

  • Target database knows this is the same PersonID 123 it was sent before.

• ClientExport is technically redundant data and won’t be needed for most systems if the DatabaseIDs are consistent.


Technical Explanation-Client

• PersonID is unique to the source system. Destination system can use DatabaseID and PersonID together to uniquely identify a client.

• ProgramParticipation aligns to an enrollment.

• ClientHistorical contains periodically gathered data such as data gathered at entry and at exit.
  • ClientHistorical contains some of the most important data for rich reporting because it shows client change over time.

• ServiceEvents has both a HUD category and an AIRS Code.
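
The composite key described in the first bullet can be sketched as a dictionary keyed by the pair (DatabaseID, PersonID). The function name `upsert_client` and the stored fields are hypothetical; the point is only that the same PersonID from two source systems does not collide.

```python
# Warehouse client table keyed by (database_id, person_id).
clients = {}


def upsert_client(database_id: str, person_id: str, fields: dict) -> None:
    """Insert a client, or update it if the same source record arrives again."""
    clients.setdefault((database_id, person_id), {}).update(fields)


upsert_client("1", "123", {"last_name": "Talbot"})
upsert_client("2", "123", {"last_name": "Jahn"})    # same PersonID, different system
upsert_client("1", "123", {"first_name": "David"})  # later export, same record

print(len(clients))         # prints 2
print(clients[("1", "123")])
```

Whether the two warehouse rows describe the same person is a separate question, answered by de-duplication rather than by the key.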


Core Data Cleansing Strategies

• Client Identifiers are the first line of defense in data cleansing.

• Identify which systems/participants are providing the best data and give them “precedence” in the data warehouse.

• Set intelligent null values.

• Cleanse using carefully chosen business rules.

• AIRS Codes are very helpful for cleansing services if most of your source data systems can support them.
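
The "precedence" and "null value" ideas can be combined in one small merge routine. This is a sketch under stated assumptions: the list of null tokens, the source names, and the field names are hypothetical, and the rule shown (take the first non-null value from the highest-precedence source) is just one possible business rule.

```python
# Hypothetical tokens that should be treated as "no data".
NULL_TOKENS = {"", "UNKNOWN", "N/A"}


def is_null(value) -> bool:
    """Treat None and placeholder strings as missing data."""
    return value is None or (isinstance(value, str) and value.strip().upper() in NULL_TOKENS)


def merge_by_precedence(records: list, precedence: list) -> dict:
    """Merge (source_id, fields) records, preferring sources listed first."""
    rank = {source: i for i, source in enumerate(precedence)}
    # Unlisted sources sort after every listed one.
    ordered = sorted(records, key=lambda r: rank.get(r[0], len(rank)))
    merged = {}
    for _source, fields in ordered:
        for name, value in fields.items():
            if not is_null(value):
                merged.setdefault(name, value)  # keep the higher-precedence value
    return merged


records = [
    ("shelter_b", {"first_name": "Dave", "veteran": "No"}),
    ("shelter_a", {"first_name": "David", "veteran": "Unknown"}),
]
print(merge_by_precedence(records, ["shelter_a", "shelter_b"]))
```

Here `shelter_a` wins on the name, but its "Unknown" veteran status is treated as null, so the value from `shelter_b` fills the gap.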


Related Standards

• AIRS XML is for exporting provider/organizational data.

• AIRS XML does not support exporting client data.

• AIRS XML is broadly supported among I&R and 211 software vendors.

• AIRS has agreed in principle to look at using HUD XML as the export format for client data.

• The NHSDC and the HMIS TA team are currently working to define standards around how HUD XML files will be passed and processed.


Further Information

Listserv at: http://groups-beta.google.com/group/HMIS_Data_Integration

Open Source Data Warehouse Project: http://casesync.net

Vendors: Watch hmis.info for pending vendor comparison table.