CCEGA Informatics Project: Developing Shared Infrastructure and Data Models

Project Leader: Brad Hemminger (bmh at ils.unc.edu)

School of Information and Library Science

University of North Carolina at Chapel Hill

Participants

• Brad Hemminger bmh at ils.unc.edu
• Kaye Balke balke at ils.unc.edu
• Kirk Wilhelmsen kirk at neurology.unc.edu
• David Threadgill dwt at med.unc.edu
• Dong Xiang dxiang at email.unc.edu
• Min Xu xumin at med.unc.edu
• Joel Kingsolver jgking at bio.unc.edu
• Paul Brown paul.brown at unc.edu
• Lavanya Ramakrishnan lavanya at renci.org
• Roger Akers akers at unc.edu
• Peter DeSaix pdesaix at email.unc.edu
• Clark Jeffries clark_jeffries at med.unc.edu
• Xiaojun Guan xguan at renci.org
• Kevin Gamiel kgamiel at renci.org
• Erik Scott escott at renci.org
• Barrie Hayes bhayes at email.unc.edu

Project Aims

Goal: Development of a common data model and informatics infrastructure for UNC

• Determine needs of research labs on campus
• Determine applicable global standards that can be utilized
• Determine issues that affect whether research labs would utilize a common infrastructure and common data model
• Understand and address security issues
• Based on this information, develop the model

Lab Surveys

• Bioinformatics Research labs at UNC were invited to provide details of their data infrastructure, in particular their data models (and example data).

• PIs and database administrators from the projects met with our full committee for interviews, and afterwards we followed up to obtain dumps of their data schemas.

Labs that provided in-depth interviews and complete data models

• Kirk Wilhelmsen (alcoholism and addiction projects)
• Paul Brown (Cell Biology, multiple projects)
• Roger Akers (Epidemiology Specimen Tracking)
• Lineberger (multiple cancer projects)
• Mike Knowles (Pulmonary and Cystic Fibrosis)
• Kari North (case control and family based studies of cardiovascular disease)
• Proteomics Center (earlier project)

Global Standards

• While no overarching standard provides common definitions for all the necessary data elements, standards exist in many individual domains (microarrays, genetic sequences, proteins, etc.). Additionally, larger-scale efforts are under way, such as CDISC (clinical trials) and caBIG (cancer). caBIG has a whole workgroup devoted to vocabularies and common data elements (VCDE).

Issues affecting user acceptance

• Almost all research projects prefer to have their own database
– Specific projects
– No need to tie into other researchers' data
– No need to preserve data generated by study
– Easier to build themselves
– More control when managed themselves

• Core facilities
– Require specific control, privacy of data

• Clinical facilities
– Rigorous requirements regarding sharing of data (ELSI, HIPAA)

Reasons for Sharing

• More studies are required to share data between projects (larger studies, multicenter studies)

• More projects depend on outside resources (databanks)

• Free or inexpensive disk space

• Dependable archiving of data

• Assistance in designing data models for study

Security

Possible security design requirements:

• Identification tables of entities (as in Trusted Broker doc)
• Translation tables among entities
• Authentication (two-way) between broker and entities
• Authorization of entities by broker
• Encrypted channels (SSL, IPSec, other)
• Protection against various denial of service attack types (limiting multiple accesses or very frequent access requests from any one researcher, etc.)
• Multiple types of access requirements for the human trusted broker (something you have, you know, or you are)
• Other requirements on trusted broker (bonded staff, permission to modify databases requiring at least two separate trusted brokers cooperating, etc.)
• Remote backup system...
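
One of the requirements above, limiting very frequent access requests from any one researcher, could be sketched as a sliding-window limiter. This is only an illustration under stated assumptions: the class name, the researcher identifiers, and the limit of 3 requests per 60 seconds are hypothetical and do not come from the project or the Trusted Broker document.

```python
from collections import defaultdict, deque


class RequestLimiter:
    """Sliding-window limiter: at most max_requests per window_seconds per researcher."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # researcher id -> timestamps (seconds) of requests that were accepted
        self._history = defaultdict(deque)

    def allow(self, researcher_id, now):
        """Return True and record the request if the researcher is under the limit."""
        stamps = self._history[researcher_id]
        # Discard accepted requests that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_requests:
            return False
        stamps.append(now)
        return True


# Hypothetical policy: 3 requests per 60 seconds for each researcher.
limiter = RequestLimiter(max_requests=3, window_seconds=60)
results = [limiter.allow("lab_a", t) for t in (0, 10, 20, 30, 61)]
print(results)
```

The timestamp is passed in by the caller so the behavior is easy to test; a deployed broker would more likely read a monotonic clock and combine this check with the authentication and authorization steps listed above.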

Common Data Model

• Had a general framework from previous work

• Built new model from ground up

– Took all data elements from all the research labs and pooled them to define an overall set of elements, including which elements from different labs mapped to the same “common” elements.

– Produced set of core elements that were common to many projects and important for sharing.

• Integrated new model with overall design principles from general framework to develop final “common data model”.
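
The pooling and mapping step described above can be sketched in a few lines. All lab names, field names, and common element names here are hypothetical, and the "used by at least two labs" threshold is an assumption standing in for "common to many projects"; the actual mapping lived in the project's integration spreadsheet.

```python
from collections import defaultdict

# Hypothetical mapping: (lab, lab-specific field) -> common element name.
FIELD_MAP = {
    ("lab_a", "subj_id"): "subject_id",
    ("lab_a", "dob"): "birth_date",
    ("lab_a", "tissue"): "sample_type",
    ("lab_b", "patient_no"): "subject_id",
    ("lab_b", "sample_kind"): "sample_type",
    ("lab_b", "freezer_slot"): "storage_location",
    ("lab_c", "participant"): "subject_id",
    ("lab_c", "birthdate"): "birth_date",
}


def labs_per_element(field_map):
    """Group the pooled fields by common element and collect the labs using each."""
    labs = defaultdict(set)
    for (lab, _field), element in field_map.items():
        labs[element].add(lab)
    return dict(labs)


def core_elements(field_map, min_labs=2):
    """Return common elements used by at least min_labs labs, sorted by name."""
    usage = labs_per_element(field_map)
    return sorted(e for e, labs in usage.items() if len(labs) >= min_labs)


def to_common(lab, record, field_map):
    """Rename one lab record's fields to common element names; unmapped fields are dropped."""
    return {field_map[(lab, k)]: v for k, v in record.items() if (lab, k) in field_map}


core = core_elements(FIELD_MAP)
converted = to_common(
    "lab_b",
    {"patient_no": "P-17", "sample_kind": "blood", "notes": "n/a"},
    FIELD_MAP,
)
print(core)
print(converted)
```

Keeping the mapping as plain data rather than code mirrors the spreadsheet-driven approach: adding a lab means adding rows, not changing logic.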

Example of integrating data

• View integration spreadsheet, look at example (samples) of before and after.

Final Common Model

• Developing the final model by taking the common data elements and putting them into a database system for testing.
– Database schema design (see printout)
– Integrate standards in definition of data elements
– Incorporate into actual database

• Test model database by incorporating actual data from volunteer labs (Kirk, Roger)
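
As a rough idea of what loading volunteer-lab data into a test database might look like, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The schema is purely hypothetical (the real schema is on the printout referenced above and is not reproduced here), and the rows are made-up placeholders, not lab data.

```python
import sqlite3

# Hypothetical two-table schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE subject (
    subject_id TEXT PRIMARY KEY,
    source_lab TEXT NOT NULL
);
CREATE TABLE sample (
    sample_id   TEXT PRIMARY KEY,
    subject_id  TEXT NOT NULL REFERENCES subject(subject_id),
    sample_type TEXT NOT NULL
);
"""


def build_test_db(subjects, samples):
    """Create an in-memory database, apply the schema, and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the reference
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO subject VALUES (?, ?)", subjects)
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", samples)
    conn.commit()
    return conn


# Placeholder rows standing in for data contributed by two labs.
conn = build_test_db(
    subjects=[("S1", "lab_a"), ("S2", "lab_b")],
    samples=[("X1", "S1", "blood"), ("X2", "S1", "tissue"), ("X3", "S2", "blood")],
)
counts = conn.execute(
    "SELECT source_lab, COUNT(*) FROM subject JOIN sample USING (subject_id) "
    "GROUP BY source_lab ORDER BY source_lab"
).fetchall()
print(counts)
```

A simple per-lab count like this is one way to check that every contributed record landed in the common tables before moving on to more detailed validation.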

Next Steps

• The aim of this P20 planning project is to prepare for further grants in this area, and to hopefully help lay the groundwork for building a common biomedical informatics infrastructure at UNC

• In Jan 2007, we submitted a CTSA grant (Clinical and Translational Science Award). This grant aims to integrate all biomedical informatics infrastructure on campus.

CTSA: Overview

• The TraCS Biomedical Informatics Core will unite the silos of biomedical informatics research excellence at UNC and across North Carolina to maximize re-use of data, knowledge and processes. With the establishment of the North Carolina Collaboratory for Biomedical Informatics (NCCBI), TraCS will support research, patient care, education and policy-making while building upon, leveraging and extending the current biomedical informatics infrastructure at UNC-CH. This core involves several external partners with a strong presence in NC and world-wide: Red Hat, IBM, SAS, Allscripts, Quintiles and NCHICA. We are committed to achieving a national leadership role in the design and development of best practices for the inclusion of clinical data into shared repositories of biomedical data.

CTSA: Tie in clinical data

• To support the goals of the TraCS Institute, the Biomedical Informatics Core will create a statewide interdisciplinary and inter-institutional collaboratory (collaborative laboratory): the North Carolina Collaboratory for Biomedical Informatics (NCCBI). It will build on the transformative technology used by the NIH to create Entrez for the NCBI. The long-term goal is to create a shared biomedical informatics data repository connecting clinical enterprises across the State of North Carolina to create a demonstration project for clinical data that will be a model for sharing and re-use of clinical data. This repository will contain appropriately de-identified data from clinical trials and clinical care. With the establishment of the NCCBI, the TraCS Biomedical Informatics Core will transform the excellent but fragmented biomedical informatics capabilities at UNC-CH into a coherent and connected system that facilitates routine re-use of research knowledge, data and processes throughout UNC and North Carolina, serving as a prototype for the nation.

Example Centers Included

• General Clinical Research Center, the Collaborative Studies Coordinating Center, the Lineberger Comprehensive Cancer Center, the Carolina Center for Exploratory Genetic Analysis, the Carolina Center for Genome Sciences, the Carolina Exploratory Center for Cheminformatics Research, the Biomedical Imaging Research Center, the Carolina Environmental Bioinformatics Center, the Center for Bioinformatics, the Renaissance Computing Institute, and the Odum Institute for Research in Social Science

CTSA

• In short, the CTSA proposal builds on the work of the P20, and offers us the potential to truly transform the way scientists and clinicians work at UNC, and bring about unprecedented integration and data sharing.

Summary: Timeline

• Initial workshop beginning the project (spring 2005)
• Analysis of data requirements, policies, and existing infrastructure at UNC. Internal interviews with labs (spring through fall 2005)
• Development of a complete list of data elements, review with labs, and finalization of elements for the common model (fall 2005-spring 2006)
• Development of draft model (fall 2006-spring 2007)
• Testing of draft model using example labs' data (fall 2007)
• Review by labs and researchers at UNC. Share with outside experts to solicit critiques. (fall 2007)
• Use this work to develop new grants to fund actual deployment of common data models, policies and infrastructure at UNC. (spring 2007-current)
