caBIG Data Structures
CS584 Lecture on 4/6/2007
Patrick McConnell
Duke Comprehensive Cancer Center
CS584 Lecture on 4/6/2007 caBIG Data Structures
Agenda
• caBIG background (5 min, 8 slides)
• Goals, program structure, organizations
• caTRIP background (5 min, 6 slides)
• Background, use cases, architecture
• caBIG compatibility (30 min, 21 slides + demonstration)
• Interoperability, compatibility, syntactics, and semantics
• Building caBIG compatible systems (10 min, 7 slides)
• Interoperability, compatibility, syntactics, and semantics
• caGrid (10 min, 8 slides)
• Background, service creation, metadata
• caTRIP demonstration (10 min, 2 slides + demo)
• Demonstration
• Discussion/questions (5 min + throughout)
caBIG Background
Goals, program structure, organizations
caBIG background Biomedical information tsunami
• overwhelming volume of data
• multitude of sources
caBIG background Informatics tower of Babel
• Each cancer research community speaks its own scientific “dialect”
• Integration critical to achieve promise of molecular medicine
caBIG background Goals and principles
• Through the cancer Biomedical Informatics Grid (caBIG™), an initiative of the National Cancer Institute Center for Bioinformatics (NCICB), 50 cancer centers are working towards a common goal of integrated data, tools, and methodologies to accelerate cancer research
• The goal of caBIG™ is to create a virtual web of interconnected data, individuals, and organizations which will redefine how:
• research is conducted
• care is provided
• patients / participants interact with the biomedical research enterprise
• The principles driving caBIG™ are:
• Open Source
• Open Access
• Open Development
• Federated Model
caBIG background caBIG facilitates sharing
caBIG background Workspaces
• Domain Workspace 1, Clinical Trial Management Systems: addresses the need for consistent, open and comprehensive tools for clinical trials management.
• Domain Workspace 2, Integrative Cancer Research: provides tools and systems to enable integration and sharing of information.
• Domain Workspace 3, Tissue Banks & Pathology Tools: provides for the integration, development, and implementation of tissue and pathology tools.
• Domain Workspace 4, Imaging: provides for the sharing and analysis of in vivo imaging data.
• Cross Cutting Workspace 1, Vocabularies & Common Data Elements: responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.
• Cross Cutting Workspace 2, Architecture: developing architectural standards and architecture necessary for other workspaces.
caBIG background Communities
9Star Research; Albert Einstein; Ardais; Argonne National Laboratory; Burnham Institute; California Institute of Technology-JPL; City of Hope; Clinical Trial Information Service (CTIS); Cold Spring Harbor; Columbia University-Herbert Irving; Consumer Advocates in Research and Related Activities (CARRA); Dartmouth-Norris Cotton; Data Works Development; Department of Veterans Affairs; Drexel University; Duke University; EMMES Corporation; First Genetic Trust; Food and Drug Administration; Fox Chase; Fred Hutchinson; GE Global Research Center; Georgetown University-Lombardi; IBM; Indiana University; Internet 2; Jackson Laboratory; Johns Hopkins-Sidney Kimmel; Lawrence Berkeley National Laboratory; Massachusetts Institute of Technology; Mayo Clinic; Memorial Sloan Kettering; Meyer L. Prentis-Karmanos; New York University; Northwestern University-Robert H. Lurie; Ohio State University-Arthur G. James/Richard Solove; Oregon Health and Science University; Roswell Park Cancer Institute; St Jude Children's Research Hospital; Thomas Jefferson University-Kimmel; Translational Genomics Research Institute; Tulane University School of Medicine; University of Alabama at Birmingham; University of Arizona; University of California Irvine-Chao Family; University of California, San Francisco; University of California-Davis; University of Chicago; University of Colorado; University of Hawaii; University of Iowa-Holden; University of Michigan; University of Minnesota; University of Nebraska; University of North Carolina-Lineberger; University of Pennsylvania-Abramson; University of Pittsburgh; University of South Florida-H. Lee Moffitt; University of Southern California-Norris; University of Vermont; University of Wisconsin; Vanderbilt University-Ingram; Velos; Virginia Commonwealth University-Massey; Virginia Tech; Wake Forest University; Washington University-Siteman; Wistar; Yale University
caBIG background Duke’s role in caBIG
• Pankaj Agarwal • Bob Annechiarico • Bill Banks • Vijaya Chadaram • Jamie Cuticchia • Raj Dash • Mohammad Farid • Seth Fehrs • Patrick McConnell • Salvatore Mungal • Mark Peedin
• CALGB • CCR • Coalition of Cooperative Groups • Dana Farber • Georgetown • Mayo • Oregon Health Sciences University • SemanticBits LLC • University of Pennsylvania • Wake Forest • Yale
• Integrative Cancer Research: workspace participant, RProteomics developer, caTRIP developer
• Architecture: workspace participant, caGrid developer, caGrid scientific liaison, Guide to Mentors
• Vocabularies and Common Data Elements: workspace participant, Guide to Mentors
• Clinical Trials Management Systems: workspace participant, C3PR developer, CTMS Interoperability architect, C3D developer
• Tissue Banking and Pathology Tools: workspace participant, caTissue adopter
• Strategic Planning: workspace participant
The Cancer Translational Research Informatics Platform (caTRIP)
Background, use cases, architecture
caTRIP Who is involved?
•Duke Bioinformatics
• Jamie Cuticchia (PI)
• Patrick McConnell (lead architect)
•Duke Information Systems
• Bob Annechiarico (PM)
• Wilma Stanley (developer)
• Mark Peedin (developer)
• Mohamad Farid (DBA)
• Jeff Allred (IT manager)
•Duke Pathology
• Raj Dash (domain expert)
• Chris Hubbard (developer)
•Duke Oncology
• Kelley Marcom (domain expert)
• Gretchen Kimmick (domain expert)
• Kimberly Blackwell (domain expert)
• Lee Wilke (domain expert)
•Duke CALGB
• Kimberly Johnson (DataMart liaison)
•SemanticBits
• Ram Chilukuri (lead developer)
• Srini Akkala (developer)
• Sanjeev Agarwal (developer)
•5 AM Solutions
• Bill Mason (developer)
•NCI
• Julie Klemm (ICR WS lead)
• Carl Shaefer (NCI rep)
• Subha Madhavan (caIntegrator PM)
•BAH
• Curtis Lockshin
• Mehul Shah (tech support)
Managers and Architects
Database Developers and IT
Domain Experts
Software Developers
NCI/BAH
caTRIP What is translational research?
• Bench-to-Bedside
• Wikipedia (the source of all knowledge):
Translational medicine is a branch of medical research that attempts to more directly connect basic research to patient care.
• Basic research occurs in the lab
• Patient care occurs in the clinic
• Translational research broadened…
Translational medicine can also have a much broader definition, referring to the development and application of new technologies in a patient driven environment - where the emphasis is on early patient testing and evaluation.
…facilitate the interaction between basic research and clinical medicine, particularly in clinical trials.
caTRIP Initial focus
• Our initial focus will be on connecting existing data systems, including basic science data, to enhance patient care
• Initial problem scenario: outcomes analysis
• Use data from existing patients to inform the treatment of another patient
• Leverage clinical, pathology, tissue, and basic science data
• Scenario:
Patient A enters the clinic. What treatments were applied with success on other patients with similar characteristics (race, sex, symptoms, pathology results, adverse events, biomarkers)?
caTRIP Broadened focus: scientific use cases
• Find available tumor tissue
• What are all the tissue specimens from HER2/neu positive patients that have a primary tumor in the breast and are BRCA1 positive?
• Find factors of survival
• What are all the ER positive patients that have survived breast cancer after radiation treatment?
• Find patients for trials
• What are all the patients that are triple negative (ER, PR, and HER2/neu negative)?
• Determine the distribution of disease factors over time
• Does a change in pathology biomarkers over time contribute to recurrence or death?
• Determine correlation of factors pre and post surgery
• Does a change in ER or PR status before and after surgery correlate with other factors?
• Find pathology reports of interest
• Show me all of the pathology reports for HER2/neu positive patients with a lobular carcinoma.
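Each of these questions comes down to a filter over patient attributes. The sketch below works through the triple-negative question in Python. The `Patient` class, its field names, and the sample records are all made up for illustration; the slides do not show caTRIP's actual data model or query API.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields; real caTRIP objects are not shown in the slides.
    mrn: str     # medical record number (sample values only)
    er: bool     # estrogen receptor positive?
    pr: bool     # progesterone receptor positive?
    her2: bool   # HER2/neu positive?


def is_triple_negative(p: Patient) -> bool:
    """True when ER, PR, and HER2/neu are all negative."""
    return not (p.er or p.pr or p.her2)


patients = [
    Patient("MRN-001", er=True, pr=False, her2=False),
    Patient("MRN-002", er=False, pr=False, her2=False),
    Patient("MRN-003", er=False, pr=False, her2=True),
]

# Keep only the record numbers of patients that pass the filter.
triple_negative = [p.mrn for p in patients if is_triple_negative(p)]
```

In practice the three markers live in different systems, which is why the remaining slides focus on linking those systems.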
caTRIP Connecting disparate data systems
• Tumor Registry: diagnosis, treatment, recurrence, follow-up
• CAE: pathology biomarkers
• caTissue CORE: tissue bank
• caTIES: pathology reports
• caIntegrator: SNP data
• The systems are linked by medical record number (MRN)
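Linking on MRN can be pictured as grouping each system's records under a shared key. This is a minimal Python sketch with invented records and field names; it shows the linking idea only and is not how caTRIP is implemented.

```python
from collections import defaultdict

# Hypothetical extracts from three systems, each record carrying an MRN.
tumor_registry = [
    {"mrn": "1001", "diagnosis": "lobular carcinoma"},
    {"mrn": "1002", "diagnosis": "ductal carcinoma"},
]
cae = [{"mrn": "1001", "her2_neu": "positive"}]
catissue = [
    {"mrn": "1001", "specimen": "S-17"},
    {"mrn": "1001", "specimen": "S-18"},
]


def link_by_mrn(**sources):
    """Group the records of every named source under their shared MRN."""
    linked = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            linked[record["mrn"]][source_name].append(record)
    return linked


linked = link_by_mrn(tumor_registry=tumor_registry, cae=cae, catissue=catissue)

# All tissue specimens on file for the patient with MRN 1001.
specimens_1001 = [r["specimen"] for r in linked["1001"]["catissue"]]
```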
caTRIP Architecture overview
[Figure: architecture diagram. Labels: GUI; Distributed Query Engine (query, discover); Core Grid Services: Index Service, IdP Service (authenticate), GridGrouper (authorize); Domain Controller; Domain Grid Services at Duke: caTissue CORE, caTIES, CAE, TR, caIntegrator; data sources: MAW3, Tumor Registry, Illumina, CGEMS SNP.]
caBIG Compatibility
Interoperability, compatibility, syntactics, and semantics
caBIG compatibility Interoperability defined
• Interoperability: ability of a system to access and use the parts or equipment of another system
• Syntactic interoperability
• Semantic interoperability
Courtesy: Charlie Mead
caBIG compatibility How does this apply to caBIG?
• Connect scientists and practitioners through a shareable and interoperable infrastructure
• Develop standard rules and a common language to more easily share information (compatibility guidelines)
• Build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.
“The cancer community is united in its mission to eliminate suffering and death due to cancer. It is now connected by caBIG™.”
caBIG compatibility What is compatibility in caBIG?
The four areas of the caBIG compatibility guidelines:
• Information Models - Individual types of data are rarely collected or presented in isolation. Rather, they are assembled into a contextual environment that includes closely and more distantly associated data and information. These associations and relationships can be presented in the form of an information model.
• CDEs - Data that is collected on a given study or trial must be defined and described such that remote users of that data can understand what it means. These metadata descriptions are referred to as data elements.
• Vocabularies and Ontologies - Biomedical information includes a substantial body of specialized concepts that are represented by terms. Agreement upon the basic concepts, terms and definitions that are inherent in all biomedical information is essential for achieving semantic interoperability.
• Programming and Messaging Interfaces - Computer programs and the people who write them are able to access resources from other programs through programming and messaging interfaces. Each of these interfaces responds to a particular syntax for its communications. Agreement upon standards for these interfaces is necessary to overcome barriers to syntactic interoperability.
caBIG compatibility Levels of compatibility
The four levels of the caBIG™ compatibility guidelines:
• Legacy - Implies no interoperability with an external system or resource. A system that was designed without awareness of or prior to the availability of these compatibility guidelines, and which does not meet any of the requirements for interoperability.
• Bronze - Classifies the minimum requirements that must be met to achieve a basic degree of interoperability.
• Silver - A rigorous set of requirements that, when met, significantly reduce the barrier to use of a resource by a remote party who was not involved in the development of that resource.
• Gold - Currently being defined by caBIG. Is expected to provide for a formalized grid architecture and data standards that will enable standardized advertising, discovery, and use of all federated caBIG resources.
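The four levels form an ordered scale, and the previous slide names four areas that each receive a level. The Python sketch below models that. The rule that a system's overall level equals its weakest area is an assumption made for this sketch, not something the slides state, and the example ratings are invented.

```python
from enum import IntEnum


class CompatibilityLevel(IntEnum):
    LEGACY = 0
    BRONZE = 1
    SILVER = 2
    GOLD = 3


def overall_level(area_levels):
    """Assumption: a system is only as compatible as its weakest area."""
    return min(area_levels.values())


# Invented ratings for an imaginary system, one per guideline area.
areas = {
    "information_models": CompatibilityLevel.SILVER,
    "common_data_elements": CompatibilityLevel.SILVER,
    "vocabularies": CompatibilityLevel.BRONZE,
    "interfaces": CompatibilityLevel.SILVER,
}

level = overall_level(areas)
```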
caBIG compatibility caBIG compatibility guidelines
[Figure: the compatibility guidelines grid, with regions labeled Syntactic, Semantic, and Semantic & Syntactic.]
caBIG compatibility Syntactic interoperability
• The solution for syntactic interoperability in caBIG at the Silver level of compatibility is for all systems to provide an object-oriented Application Programming Interface (API).
• Object-oriented interfaces can be implemented in many programming languages.
• This interface can be connected to caGrid so that the local data repository is globally accessible in a language-independent way.
• The interface is described by an information model, which acts as the junction between the syntactic components and the semantic components.
Gene
+ name: String
+ hugoGeneSymbol: String
+ sequence: String
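The Gene class in the diagram maps directly onto a data class in most object-oriented languages. Here it is sketched in Python with the attribute names copied from the diagram, which is why they stay camelCase. The sample values are placeholders, not real annotation data.

```python
from dataclasses import dataclass, asdict


@dataclass
class Gene:
    # Attribute names mirror the UML box above.
    name: str
    hugoGeneSymbol: str
    sequence: str


# Placeholder values for illustration only.
gene = Gene(name="example gene", hugoGeneSymbol="BRCA1", sequence="ACGT")

# A plain dict view, handy when handing the object to a serializer.
gene_as_dict = asdict(gene)
```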
caBIG compatibility Programming and messaging interfaces
• Types of APIs
• Client APIs in a programming language
• Messaging APIs via a messaging protocol
• Types of systems
• Data services provide access to an information model
• Query method
• Associations are “traversable”
• Analytical services provide methods to manipulate data
• Hybrid services provide methods to manipulate information models
• Analytical tools consume Silver-compatible data, but don’t produce it
caBIG compatibility Programming and messaging interfaces details
• Legacy: No programmatic interfaces to the system are available. Only local data files in a custom format can be read. Data transfer mechanisms implemented only on an ad hoc basis. Examples: executables.
• Bronze: Programmatic access to data from an external resource is possible. Examples: proprietary API/data format.
• Silver: Well-described APIs provide access to data in the form of data objects. Standards-based electronic data formats are supported for both input to and output from the system. Standards-based messaging protocols are supported wherever messaging is relevant. Examples: JavaDocs; XML, ASN.1; SOAP, CORBA.
• Gold: All features of Silver, plus: service-oriented components produce or consume resources in the form of grid services; interoperable with data grid architecture to be defined by caBIG. Examples: Globus; caGrid-based services.
caBIG compatibility caTRIP API
[Hyperlinks to the caTRIP API docs]
caBIG compatibility caTRIP grid service WSDL
[Hyperlinks to the caTRIP API WSDL]
caBIG compatibility caTRIP grid service WSDL
cd Logical Model
• «interface» engine::FederatedQueryEngine
+ execute(Document) : CQLQueryResults
• engine::FederatedQueryProcessor
+ processDCQLQueryPlan(DCQLQueryDocument) : CQLQuery
• engine::FederatedQueryExecutor
+ executeCQLQuery(CQLQuery, String) : CQLQueryResults
• ResultAggregator
+ aggregateGroups(Group[]) : Group
+ buildGroup(List) : Group
+ processResults(CQLQueryResults) : List
• ServiceClientFactory
+ getSeviceClient() : Object
• caGridDataService1Client
+ query(CQLQuery) : CQLQueryResults
• caGridDataService2Client
+ query(CQLQuery) : CQLQueryResults
• Relationship labels: executes / obtains; executes
[Hyperlinks to the caTRIP FQP UML]
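The diagram shows a division of labor: an engine hands per-service queries to data service clients, and an aggregator combines their results. The Python sketch below imitates that flow in memory. Only the class name `FederatedQueryEngine` is borrowed from the diagram. The query format (a dict of attribute/value pairs), the `make_client` helper, and the aggregation rule (keep MRNs returned by every service) are simplifying assumptions, not DCQL or caTRIP behavior.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
# Stand-in for a grid data service client: takes criteria, returns records.
DataServiceClient = Callable[[Dict[str, str]], List[Record]]


def make_client(rows: List[Record]) -> DataServiceClient:
    """Build an in-memory client that filters rows by exact attribute match."""
    def query(criteria: Dict[str, str]) -> List[Record]:
        return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]
    return query


def aggregate(per_service: Dict[str, List[Record]]) -> List[Record]:
    """Merge records whose MRN appears in every service's result."""
    mrn_sets = [{r["mrn"] for r in rows} for rows in per_service.values()]
    shared = set.intersection(*mrn_sets) if mrn_sets else set()
    merged = {mrn: {"mrn": mrn} for mrn in sorted(shared)}
    for rows in per_service.values():
        for r in rows:
            if r["mrn"] in merged:
                merged[r["mrn"]].update(r)
    return list(merged.values())


class FederatedQueryEngine:
    def __init__(self, clients: Dict[str, DataServiceClient]):
        self.clients = clients

    def execute(self, plan: Dict[str, Dict[str, str]]) -> List[Record]:
        """Run each per-service query, then aggregate the results."""
        per_service = {name: self.clients[name](criteria)
                       for name, criteria in plan.items()}
        return aggregate(per_service)


engine = FederatedQueryEngine({
    "cae": make_client([{"mrn": "1001", "her2_neu": "positive"},
                        {"mrn": "1002", "her2_neu": "negative"}]),
    "tumor_registry": make_client([{"mrn": "1001", "site": "breast"},
                                   {"mrn": "1003", "site": "breast"}]),
})
results = engine.execute({"cae": {"her2_neu": "positive"},
                          "tumor_registry": {"site": "breast"}})
```

The clients are plain callables so that the engine never needs to know which system it is talking to, which is the role the client factory plays in the diagram.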
caBIG compatibility Semantic interoperability
• The solution for semantic interoperability lies in object-oriented UML design of the service, an unambiguous description of elements within the system, and storage of the description in a publicly accessible repository (metadata).
• UML model
• Use of publicly accessible terminologies/vocabularies/ontologies (EVS-NCI Thesaurus)
• Use of publicly accessible metadata repository (caDSR)
caBIG compatibility Common data element (CDE) details
• Legacy: No structured metadata is recorded. Examples: free-text pathology reports.
• Bronze: Data element descriptions have sufficient detail for a subject matter expert to unambiguously interpret. Data elements are built using controlled terminology. Metadata is stored and publicized in an electronic format that is separate from the resource that is being described. Examples: GeneOntology from GO website.
• Silver: Common Data Elements (CDEs) built from controlled terminologies and according to practices validated by the VCDE workspace are used throughout. CDEs are registered as ISO/IEC 11179 metadata components in the cancer Data Standards Repository (caDSR). Examples: NCI Thesaurus; GeneOntology registered in EVS.
• Gold: All features of Silver, plus: Common Data Elements (CDEs) designated as caBIG Standards by the VCDE workspace are used; metadata is advertised and discoverable via the caBIG grid services registry. Examples: NCI Thesaurus.
caBIG compatibility Metadata stored in caDSR
Enterprise Vocabulary Services
• Storage of Metadata
• caDSR = cancer Data Standards Repository
• Common Data Elements = CDEs
• Enable end-users to access information about data and services without having to access human developers
• = Fusion of UML models + Concepts/Definitions
caDSR Search Tree: Displays all the current caDSR Contexts. Users can search for groups of DEs by navigating the tree.
Data Element Search Pane: This is the main search window. Users looking for Data Elements can enter a key word or phrase.
Navigation Menu: Use these buttons to navigate to the CDE cart, Form Builder, or back to Home (that is, back to this page).
caBIG compatibility caTRIP CDEs
[Hyperlinks to the caTRIP CDEs]
caBIG compatibility Vocabulary/terminology details
• Legacy: Free text used throughout for data collection. Examples: free-text pathology reports.
• Bronze: Use of publicly accessible controlled vocabularies as well as local terminologies. Terminologies must include definitions of terms that meet caBIG VCDE workspace guidelines. Examples: GeneOntology from GO website.
• Silver: Terminologies reviewed and validated by the caBIG Vocabulary/Common Data Element (VCDE) Workspace used for all relevant data collection fields. Examples: NCI Thesaurus; GeneOntology registered in EVS.
• Gold: All features of Silver, plus: full adoption of caBIG terminology standards as approved by the VCDE Workspace. Examples: NCI Thesaurus.
caBIG compatibility Publicly accessible terminologies
Enterprise Vocabulary Services
• Controlled vocabulary resources for the cancer research community
• Vocabulary Products and Services
• NCI Thesaurus
• NCI Metathesaurus
• External Vocabularies
• NCI Thesaurus - controlled vocabulary source for metadata
• Has excellent coverage of cancer terminology
• Expands based on needs for additional terminology
• Based on concepts rather than terms
• Each concept has a unique identifier or CUI with definitions and synonyms
• Housed by the Enterprise Vocabulary Service (EVS)
• LexBIG
• A caBIG-funded vocabulary server to enable a Federated Vocabulary environment.
caBIG compatibility caTRIP CDEs
[Hyperlinks to a caTRIP concept]
caBIG compatibility Information model (UML) details
• Legacy: No model describing the system is available in electronic format.
• Bronze: Diagrammatic representation of the information model is available in electronic format.
• Silver: Information models are defined in UML as class diagrams and are reviewed and validated by the VCDE workspace.
• Gold: All features of Silver, plus: information models are harmonized across the caBIG Domain Workspaces.
Examples: database diagram; UML class diagram (cd StatML, below)
cd StatML
• statml::Data
• statml::Array
- base64Value: String
- dimensions: String
- name: String
- type: String
• statml::List
- length: Integer
- name: String
- type: String
• statml::Null
• statml::Scalar
- name: String
- type: String
- value: String
• Association ends shown: +scalar 0..*, +null 0..*, +list 0..*, +array 0..*, +context 0..1
caBIG compatibility Domain information modeling
• A Domain Information Model is a representation of our understanding of an area of knowledge.
• Domain Information Models consist of ‘Classes’ that represent ‘things’ in the real world
• Classes contain ‘attributes’ that are characteristics of different instances of things in the real world.
• Relationships between the classes are described by ‘associations’ and indicated by lines with directionality and cardinality
• Each class plus attribute creates one Common Data Element (CDE)
cd Central Dogma
• Gene
+ name: String
+ hugoGeneSymbol: String
+ sequence: String
• Transcript
+ sequence: String
+ length: String
• Protein
+ name: String
+ aminoAcidSequence: String
+ molecularWeight: double
• Association ends shown: +gene 1, +transcriptCollection 1..*, +transcript 1, +protein 1
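The Central Dogma model can be sketched as Python data classes to show both ideas from this slide: associations that can be traversed, and one candidate CDE per class/attribute pair. Class and attribute names are copied from the diagram. The association direction (a Gene holds a `transcriptCollection`, a Transcript holds one `protein`) is read off the role names and is an assumption, and the `cde_names` helper is invented for this sketch.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Protein:
    name: str
    aminoAcidSequence: str
    molecularWeight: float


@dataclass
class Transcript:
    sequence: str
    length: str          # typed String in the diagram, so kept as str here
    protein: Protein     # assumed direction of the 1-to-1 association


@dataclass
class Gene:
    name: str
    hugoGeneSymbol: str
    sequence: str
    transcriptCollection: List[Transcript] = field(default_factory=list)


def cde_names(cls):
    """List one candidate CDE per (class, primitive attribute) pair."""
    primitive = (str, int, float)
    return [f"{cls.__name__}.{f.name}" for f in fields(cls) if f.type in primitive]


# Placeholder values only; none of this is real biological data.
protein = Protein("example protein", "MKV", 0.0)
gene = Gene("example gene", "BRCA1", "ACGT",
            [Transcript(sequence="ACGU", length="4", protein=protein)])

# Traversing associations: gene -> transcript -> protein.
first_protein_name = gene.transcriptCollection[0].protein.name
```

Association fields are skipped by `cde_names` because they point at other classes rather than carrying a value of their own.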
caBIG compatibility Tumor Registry model
• Participant
• Diagnosis
• Treatment
• Follow up and Recurrence
• Collaborative Staging
[Hyperlinks to the caTRIP UML]
Building caBIG Compatible Systems
Building caBIG compatible systems Steps for creating an analytical system
• Step 1: model and register metadata
• Model the domain objects
• Register metadata
• Step 2: implement the analytical system
• Implement an interface
• Map data objects to existing inputs
• Plug-in analytics
• Step 3: create the data service
• Create an XML Schema
• Use the caGrid 1.0 Introduce toolkit to create a service
• Configure the service
• Deploy
• Step 4: invoke the service
• Java-based client
• Use caTRIP
Building caBIG compatible systems Steps for creating a data system
• Step 1: model and register metadata
• Model the domain objects
• Register metadata
• Step 2: implement the information system
• Model the databases (via scripts or EA)
• Build the database
• Generate Java beans
• Create Hibernate mappings
• Jar it all up
• Step 3: create the data service
• Create an XML Schema
• Use the caGrid 1.0 Introduce toolkit to create a service
• Configure the service
• Deploy
• Step 4: invoke the service
• Java-based client
• Use caTRIP
Building caBIG compatible systems N-tier architecture
[Figure: N-tier stack. Labels: database; object-relational mapping; domain model; caCORE SDK; CQL Engine; caGrid Data Service; Index Service (advertise); Distributed Query Engine (CQL Query).]
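The lower tiers named on this slide can be imitated in a few lines: a database, a mapping from domain attributes to columns, and a small engine that turns an object-level query into SQL. This Python sketch uses the standard `sqlite3` module. The query shape (a target class plus one attribute/value pair), the `MAPPING` table, and the column names are assumptions made for illustration; they are not the CQL schema or the caCORE SDK's generated mappings.

```python
import sqlite3

# Hypothetical object-relational mapping: domain class -> table and columns.
MAPPING = {
    "Gene": {
        "table": "gene",
        "columns": {"name": "name", "hugoGeneSymbol": "hugo_symbol"},
    }
}


def run_query(conn, query):
    """Translate an object-level query into parameterized SQL and run it."""
    mapping = MAPPING[query["target"]]
    attr = query["attribute"]
    column = mapping["columns"][attr["name"]]  # KeyError for unmapped attributes
    select_cols = ", ".join(
        f"{col} AS {prop}" for prop, col in mapping["columns"].items()
    )
    sql = f"SELECT {select_cols} FROM {mapping['table']} WHERE {column} = ?"
    cur = conn.execute(sql, (attr["value"],))
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


# Database tier, with placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (name TEXT, hugo_symbol TEXT)")
conn.executemany("INSERT INTO gene VALUES (?, ?)",
                 [("example gene A", "BRCA1"), ("example gene B", "TP53")])

# Object-level query: callers name domain attributes, never columns.
rows = run_query(conn, {"target": "Gene",
                        "attribute": {"name": "hugoGeneSymbol", "value": "BRCA1"}})
```

Only the attribute value is passed as a SQL parameter; table and column names come from the fixed mapping, so callers cannot inject SQL through them.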
Building caBIG compatible systems caCORE SDK
[Figure: caCORE SDK workflow diagram. Labels: UML Model; XMI File; Semantic Integration Workbench (SIW); EVS; Terminology Services; Verified EVS Report; Fixed XMI; Verified Annotated Fixed XMI; Approved Annotated Fixed XMI; Compatibility Review; UML Loader; Load to Stage; caDSR Stage; caDSR Production; caDSR Services; Metadata Retrieval; Code Generator; Public APIs. Decision points: “Using CodeGen?” and “Successful Test?” (YES/NO). The four compatibility areas are shown alongside: Info Model, Messaging Interfaces/API, Common Data Elements, Vocabularies.]
caBIG compatibility Mapping UML to CDEs
• Common Data Element (CDE) = Data Element Concept (DEC) + Value Domain (VD); corresponds to UML Class Attribute Datatype
• Data Element Concept (DEC) = Object Class (OC) + Property; corresponds to UML Class Attribute
• Object Class (OC): corresponds to UML Class
• Property: corresponds to UML Attribute
• Value Domain (VD): corresponds to UML Datatype
• EVS Concept
caBIG compatibility Mapping UML to CDEs example
• Class: Gene
• Attribute: entrezGeneID
• Datatype: String
• Object Class: Gene
• Property: Entrez Gene Genomic Identifier
• Value Domain: java.lang.String
• Created Data Element: Gene Entrez Gene Genomic Identifier java.lang.String
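The mapping in this example can be written out as a small composition: class and attribute form a data element concept, and adding the datatype yields the data element. The Python sketch below reproduces the Gene example. The lookup tables `CONCEPT_NAMES` and `DATATYPES` are hypothetical stand-ins for the curated concept names and datatype mappings; in caBIG these come from EVS and caDSR rather than from a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str   # derived from the UML class
    prop: str           # derived from the UML attribute


@dataclass(frozen=True)
class CommonDataElement:
    concept: DataElementConcept
    value_domain: str   # derived from the UML datatype

    @property
    def long_name(self) -> str:
        return f"{self.concept.object_class} {self.concept.prop} {self.value_domain}"


# Hypothetical lookups standing in for curated concept and datatype names.
CONCEPT_NAMES = {"Gene": "Gene", "entrezGeneID": "Entrez Gene Genomic Identifier"}
DATATYPES = {"String": "java.lang.String"}


def to_cde(uml_class: str, uml_attribute: str, uml_datatype: str) -> CommonDataElement:
    """Compose a CDE from a UML class, attribute, and datatype."""
    dec = DataElementConcept(CONCEPT_NAMES[uml_class], CONCEPT_NAMES[uml_attribute])
    return CommonDataElement(dec, DATATYPES[uml_datatype])


cde = to_cde("Gene", "entrezGeneID", "String")
```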
caBIG compatibility Use SIW to designate existing CDEs
caGrid
Background, service creation, metadata
caGrid What is caGrid?
• What is Grid?
• Evolution of distributed computing to support sciences and engineering
• Sharing of resources (computational, storage, data, etc.)
• Secure Access (global authentication, local authorization, policies, trust, etc.)
• Open Standards
• Virtualization
• What is caGrid?
• Development project of Architecture Workspace
• Helping define and implement Gold Compliance
• Implementation of Grid technology
• Leverages open standards, community open source projects
• No requirements on implementation technology necessary for compliance
• Specifications will be created defining requirements for interoperability
• caGrid provides core infrastructure, and tooling to provide “a way” to achieve Gold compliance
• Gold compliance creates the G in caBIG™
• Gold => Grid => connecting Silver Systems
caGrid Metadata infrastructure goals
• Support strongly typed grid
• Syntactic and Semantic interoperability
• Programmatic!
• Smooth transition from Application to Grid and back
• Leverage wealth of existing metadata
• Enable service Advertisement and Discovery
caGrid Service development process
• Service developers first create a service using a simple wizard to specify information (target directory, type of service, service name, etc.)
• Next, developers locate the data types they will use for inputs or outputs
• Can be discovered from the caDSR, GME, file system, etc.
• Operations are then defined that take some number of the data types as input, and produce some number as output
• Metadata and Service Properties can be added and configured
• The service’s security can be completely configured
• Some or all of these steps may be automatically handled by extensions
caGrid Introduce
• GUI for creating and manipulating a grid service
• Provides means of simple creation of service skeleton that a developer can then implement, build, and deploy
• Automatic code generation of complete caBIG compliant grid service which is configured to provide:
• Advertisement
• Standard Metadata
• Security
• Complete Client API
caGrid caGrid data description infrastructure
• Client and service APIs are object oriented, and operate over well-defined and curated data types
• Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
• Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described
• XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)
[Figure: data description infrastructure. A grid service has a service definition (WSDL) and data type definitions (XSD) behind its service API; a grid client uses a client API. Object definitions are registered in the Cancer Data Standards Repository and semantically described in the Enterprise Vocabulary Services. Objects serialize to XML, which validates against XSDs registered in the Global Model Exchange (GME).]
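The serialization step can be illustrated with Python's standard `xml.etree.ElementTree`: an object becomes a namespaced XML element and is read back into an object on the other side. The namespace URI, the attribute-per-field layout, and the helper names are assumptions made for this sketch. Validating against a registered schema is left out because the standard library has no XSD validator.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Gene:
    name: str
    hugoGeneSymbol: str
    sequence: str


# Hypothetical namespace; real schemas would be looked up in the GME.
NAMESPACE = "gme://example.org/1.0/gene"


def to_xml(obj, namespace: str) -> str:
    """Serialize a dataclass instance as one element with an attribute per field."""
    element = ET.Element(f"{{{namespace}}}{type(obj).__name__}")
    for f in fields(obj):
        element.set(f.name, str(getattr(obj, f.name)))
    return ET.tostring(element, encoding="unicode")


def from_xml(text: str, cls):
    """Rebuild an instance of cls from the element's attributes."""
    element = ET.fromstring(text)
    return cls(**{f.name: element.get(f.name) for f in fields(cls)})


gene = Gene(name="example gene", hugoGeneSymbol="BRCA1", sequence="ACGT")
xml_text = to_xml(gene, NAMESPACE)
round_tripped = from_xml(xml_text, Gene)
```

The round trip works here because every field is a string; a fuller version would convert types according to the schema.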
caGrid Metadata services
• Cancer Data Standards Repository (caDSR)
• caBIG projects register their data models as Common Data Elements (CDEs), which are semantically harmonized and then centrally stored and managed in the caDSR
• The caDSR grid service provides:
• Model discovery and traversal
• caGrid standard metadata generation capabilities
• Enterprise Vocabulary Services (EVS)
• EVS is a set of services and resources that address the need for controlled vocabulary
• The EVS grid service provides:
• Query access to the data semantics and controlled vocabulary managed by the EVS
• Global Model Exchange (GME)
• GME is a DNS-like data definition registry and exchange service that is responsible for storing and linking together data models in the form of XML schema.
• The GME grid service provides:
• Access to the authoritative structural representation of data types on the grid
• Globus Information Services: Index Service
• The Globus Information Services infrastructure provides a generic framework for aggregation of service metadata, a registry of running Grid services, and a dynamic data-generating and indexing node, suitable for use in a hierarchy or federation of services
• The Index grid service provides:
• Yellow and white pages for the grid
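The "yellow and white pages" idea can be shown with a toy registry: services advertise themselves, and clients look them up either by who runs them (white pages) or by what they offer (yellow pages). This Python sketch is an in-memory illustration only. The class, its methods, and the example URLs are invented and do not reflect the Globus Index Service API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ServiceEntry:
    url: str               # hypothetical endpoint
    provider: str          # hosting organization
    data_types: Set[str]   # domain classes the service exposes


class IndexService:
    """Toy registry of advertised services."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}

    def advertise(self, entry: ServiceEntry) -> None:
        # Re-advertising the same URL replaces the earlier entry.
        self._entries[entry.url] = entry

    def find_by_provider(self, provider: str) -> List[str]:
        """White pages: look services up by who runs them."""
        return sorted(e.url for e in self._entries.values() if e.provider == provider)

    def find_by_data_type(self, data_type: str) -> List[str]:
        """Yellow pages: look services up by what they offer."""
        return sorted(e.url for e in self._entries.values() if data_type in e.data_types)


index = IndexService()
index.advertise(ServiceEntry("https://example.org/grid/TumorRegistry", "Duke",
                             {"Participant", "Diagnosis"}))
index.advertise(ServiceEntry("https://example.org/grid/TissueBank", "Duke",
                             {"Participant", "Specimen"}))

participant_services = index.find_by_data_type("Participant")
specimen_services = index.find_by_data_type("Specimen")
```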
caGrid caGrid production environment
The Cancer Translational Research Informatics Platform (caTRIP)
Demonstration
caTRIP Clinical and research scenarios
• Clinical scenario for demonstration
• A patient enters the clinic and is diagnosed with a lobular carcinoma
• The HER2/neu biomarker test comes back positive
• What are the treatments and outcomes of other patients with similar characteristics?
• Query for diagnosis date, treatment, treatment date, survival, recurrence, and BRCA1 and BRCA2 status
• Look for treatments given with success, and for a correlation with BRCA status in case that test should be ordered
• Research scenario for demonstration
• Is there a correlation between recurrence, mortality, histologic grade, and HER2/neu status for breast cancer patients diagnosed with lobular carcinoma?
• Query caTRIP for recurrence type, date of death, histologic grade, and HER2/neu status for patients diagnosed with lobular carcinoma
• Correlation is determined in Microsoft Excel
• Investigate gene biomarkers that correlate with a HER2/neu status of negative and survival
• Query caTRIP for all available tissue to order for microarray experiments
• Query sharing
• What are all the triple negative patients?
CS584 Lecture on 4/6/2007 caBIG Data Structures
caTRIP Why the Simple GUI?
• What are all the tissue specimens from her2/neu positive patients that have a primary tumor in the breast and are BRCA1 positive?
[Figure: data services involved in the query: caTissue CORE, CAE, Tumor Registry, CGEMS. Shared label: Participant Medical Record Number]
Discussion/questions
Backup Slides
CTMS Interoperability Project
Goals, scope, BRIDG, architecture, demo
CTMSi A collaborative effort
11 Organizations
• Booz Allen Hamilton
• Dana-Farber
• Duke University
• Ekagra
• Harvard University
• Mayo Clinic
• NCICB
• Nortel Government Solutions
• Northwestern University
• ScenPro
• SemanticBits
8 Locations
• Maryland
• Minnesota
• Virginia
• Georgia
• Massachusetts
• North Carolina
• Illinois
• France
35+ Team Members / 5 Applications
• Cancer Central Clinical Participant Registry (C3PR)
• Cancer Central Clinical Database (C3D)
• Patient Study Calendar (PSC)
• caXchange: LabViewer and the Clinical Trials Object Model (CTOM)
• Cancer Adverse Events Reporting System (caAERS)
8 Roles
• Analysts
• Architects
• Developers
• Project Director
• Project Manager
• Project Sponsor
• Project Tech Leads
• Subject Matter Experts
CTMSi Credits
Project Director: Meg Gronvall (BAH); Charles N. Mead, M.D. (BAH)
NCICB CTMS Lead: Christo Andonyadis, D.Sc. (NCICB)
Project Manager: Edmond Mulaire (SemanticBits)
Project Architects: Patrick McConnell (Duke), Niket Parikh (BAH)
Analysts: Smita Hastak (ScenPro), Wendy Ver Hoef (ScenPro)
Subject Matter Experts: Sharon Elcombe (Mayo Clinic), Vijaya Chadaram (Duke), Jomol Mathew (Dana-Farber), Renee Webb (Northwestern)
NCICB Systems Support: Gavin Brennan (TerpSys), Vanessa Caldwell (TerpSys), Doug Kanoza (TerpSys), Wei Lu (TerpSys), Ralph Rutherford (TerpSys)
Project Technical Leads: Ram Chilukuri (SemanticBits), Charles Griffin (Ekagra), Vinay Kumar (SemanticBits), Stephen Reckford (Nortel Government Solutions), Rhett Sutphin (Northwestern), Sean Whitaker (Northwestern)
caAERS: Ram Chilukuri (SemanticBits), Krikor Krumlian (Akaza Research), Vinay Kumar (SemanticBits), Rhett Sutphin (Northwestern), Kulasekaran Sethumadhavan (SemanticBits), Sujith Thayylithodi (SemanticBits)
caGrid: Manav Kher (SemanticBits), Vinay Kumar (SemanticBits), Joshua Phillips (SemanticBits)
caXchange (Lab Viewer/CTOM): Charles Griffin (Ekagra), Smita Hastak (ScenPro), Mukesh Mediratta (Ekagra), Kunal Modi (Ekagra), Wendy Ver Hoef (ScenPro)
caXchange Extensions: Ekagra, SemanticBits
C3D: Srinivas Batchu (Ekagra), Patrick Conrad (Ekagra), Rangaraju Gadiraju (Ekagra), Stephen Reckford (Nortel)
C3PR: Kruttik Aggarwal (SemanticBits), Ram Chilukuri (SemanticBits), Ramakrishna Gundala (SemanticBits), Manav Kher (SemanticBits), Patrick McConnell (Duke), Priyatam Mudivarti (SemanticBits)
PSC: Rhett Sutphin (Northwestern), Sean Whitaker (Northwestern)
CTMSi Goal
[Figure: Integrate Patient Scheduling, Participant Registration, Lab Results, Clinical Trials DB, and Adverse Events using caGrid and caXchange]
CTMSi BRIDG extract
[UML class diagram. Name: CTMS Interoperability BRIDG-Based Analysis Model for Data Exchange. Author: Smita Hastak. Version: 1.0. Created: 8/13/2001 12:00:00 AM. Updated: 1/12/2007 9:50:44 AM. Diagram regions: Subject, Site, Study, Labs, Adverse Events, Eligibility.]
Classes (the parent label shown on the class box is given in brackets; "::Name" marks attributes inherited from Name):
• Clinical Research Entities and Roles::Person: + administrativeGenderCode: BRIDGCodedConcept; + dateOfBirth: dateTime; - ethnicGroup: string; - firstName: string; - lastName: string; - race: string
• Clinical Research Entities and Roles::Participant [PersonRole]: ::Person + administrativeGenderCode: BRIDGCodedConcept; + dateOfBirth: dateTime; - ethnicGroup: string; - firstName: string; - lastName: string; - race: string; ::Role + id: BRIDGID
• Clinical Research Activities and Participation::StudySubject [Participation]: + studySubjectIdentifier: BRIDGID; ::Participation + endDate: dateTime; + identifier: BRIDGID; + startDate: dateTime; + status: BRIDGStatus
• Clinical Research Activities and Participation::PerformedActivity: + endDateTime: dateTime; + startDateTime: dateTime
• BRIDG Shared Classes::Activity: + codedDescription: BRIDGCodedConcept; + description: BRIDGDescription; + status: BRIDGStatus; + type: BRIDGCodedConcept
• Clinical Research Activities and Participation::StudySite [Participation]: ::Participation + endDate: dateTime; + identifier: BRIDGID; + startDate: dateTime; + status: BRIDGStatus
• Clinical Research Entities and Roles::Organization: + identifier: BRIDGID; + name: string
• Clinical Research Entities and Roles::HealthCareSite [OrganizationRole]: ::Organization + identifier: BRIDGID; + name: string; ::Role + id: BRIDGID
• Clinical Research Activities and Participation::LabTest
• Clinical Research Activities and Participation::LabResult [parent label as extracted: ObjectiveResult QuantitativeMeasurement]: + textResult: string; ::QuantitativeMeasurement + numericResult: float; + numericUnits: BRIDGCodedConcept; + referenceRangeComment: string; + referenceRangeHigh: int; + referenceRangeLow: int
• StudyParticipantEligibility: + isEligible: boolean
• Observations::AdverseEvent [Observation]: - verbatimTerm: String; ::Activity + codedDescription: BRIDGCodedConcept; + description: BRIDGDescription; + status: BRIDGStatus; + type: BRIDGCodedConcept
• Identifier: + identifier: BRIDGID; + type: BRIDGCodedConcept
• Clinical Research Activities and Participation::Study: + id: BRIDGID; + longTitle: string
Association labels: +are performed at, +participate in, +labTest 1, +labResult 0..1. The remaining multiplicities (0..*, 1, 1..*) cannot be matched to their associations in this extraction.
Notes on the diagram:
• In implementation: do NOT use endDate, startDate, status
• In implementation: do NOT use identifier; C3PR only uses SubjectIdentifier
• Green notes mark classes where attributes inherited from the same superclass are inherited in two different subclasses but are not necessarily used in both.
• Note to Implementers: This is an analysis model, not an implementation model, and therefore supplemental attributes may be required in your implementation model to support data exchange between applications (e.g. extra ids). Furthermore, it may be that not all attributes included here are required for data exchanges and may be eliminated from this model. It is also likely that an implementation based on this model may collapse associations to simplify the structure of data exchanges.
• Disclaimer: BRIDG classes used in this model have been pared down to only what is needed for data exchange in the CTMS Interoperability project and this in no way indicates or suggests changes to the official BRIDG model.
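The class boxes in the BRIDG extract can be read as plain data structures. The following is a minimal Python sketch, assuming only the attribute names shown on the slide. The `BridgId` fields follow BasicTypes::BRIDGID (source, version, value); the `register` helper, the `"C3PR"` source string, and the idea that a StudySubject holds direct references to a Participant and a study id are illustrative assumptions, not part of BRIDG or of the CTMSi code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class BridgId:
    """Illustrative stand-in for BRIDGID (source, version, value)."""
    source: str
    version: str
    value: str


@dataclass
class Person:
    first_name: str
    last_name: str
    date_of_birth: Optional[datetime] = None
    race: Optional[str] = None
    ethnic_group: Optional[str] = None


@dataclass
class Participant(Person):
    """A Person in a research role; adds the Role id from the diagram."""
    id: Optional[BridgId] = None


@dataclass
class StudySubject:
    """Assumed shape: ties one Participant to one Study by subject identifier."""
    study_subject_identifier: BridgId
    participant: Participant
    study_id: BridgId


def register(participant: Participant, study_id: BridgId, subject_value: str) -> StudySubject:
    """Hypothetical helper: build the StudySubject carried by a registration."""
    ident = BridgId(source="C3PR", version="1", value=subject_value)
    return StudySubject(ident, participant, study_id)
```

Because `Participant` subclasses `Person`, the inherited "::Person" attributes in the diagram come along without being repeated, which mirrors how the class box lists them.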
[UML class diagram. Name: Comprehensive Logical Model. Author: Fridsma. Version: 1.0. Created: 7/22/2005 2:53:51 PM. Updated: 7/29/2005 2:33:32 PM.]
Classes, in extraction order (parent label in brackets; attributes without a type are listed by name only):
• Entities and Roles::Access
• Entities and Roles::ActivityRoleRelationship: + relationshipCode: PSMCodedConcept; + sequenceNumber: NUMBER; + negationIndicator: BOOLEAN; + time: TimingSpecification; + contactMediumCode: PSMCodedConcept; + targetRoleAwarenessCode: PSMCodedConcept; + signatureCode: PSMCodedConcept; + signature: PSMDescription; + slotReservationIndicator: BOOLEAN; + substitionConditionCode: PSMCodedConcept; + id: PSMID; + status: PSMCodedConcept
• Entities and Roles::Device: - manufacturerModelName; - softwareName; - localRemoteControlStateCode; - alertLevelCode; - lastCalibrationTime
• Entities and Roles::Employee: + jobCode: PSMCodedConcept
• Entities and Roles::Entity: + instantiationType: ENUM {Placeholder, Actual}; + id: SET <PSMID>; + name: string; + code: PSMCodedConcept; + quantity: int; + description: PSMDescription; + statusCode: BRIDGStatus; + existenceTime: BRIDGInterval; + riskCode: PSMCodedConcept; + handlingCode: PSMCodedConcept; + contactInformation: SET <PSMContactAddr>
• Entities and Roles::LivingEntity: + birthTime; + sex; + deceasedInd: boolean; + deceasedTime; - multipleBirthInd: boolean; - multipleBirthOrderNumber: int; - organDonorInd: boolean
• Entities and Roles::ManufacturedMaterial: - lotNumberText: string; - expirationTime; - stabilityTime
• Entities and Roles::Material: + formCode
• Entities and Roles::NonPersonLivingEntity: + strain; - genderStatusCode
• Entities and Roles::Organization: + geographicAddress; + electronicCommAddr; + standardIndustryClassCode
• Entities and Roles::Patient: + confidentialityCode
• Entities and Roles::Person: + geographicAddress; - maritalStatusCode; - educationLevelCode; + raceCode; - disabilityCode; - livingArrangementCdoe; + electronicCommAddr; - religiousAffiliationCode; + ethnicGroupCode
• Entities and Roles::Place: + gpsText; - mobileInd: boolean; - addr; - directionsText; - positionText
• Entities and Roles::ResearchProgram: + type
• Entities and Roles::Role: + id; + code: PSMCodedConcept; + name; + status; + effectiveStartDate; + effectiveEndDate; + geographicAddress; + electronicCommAddr; + certificate/licenseText
• Entities and Roles::Study
• OProtocolStructure::ActivityDerivedData
• OProtocolStructure::ElectronicSystem
• OProtocolStructure::ResponsibilityAssignment
• BasicTypes::RIMActivity [AbstractActivity]: + businessProcessMode: PSMBusinessProcessMode; + code: PSMCodedConcept; + derivationExpression: TEXT; + status: PSMCodedConcept; + availabilityTime: TimingSpecification; + priorityCode: PSMCodedConcept; + confidentialityCode: PSMCodedConcept; + repeatNumber: rangeOfIntegers; + interruptibleIndicator: BOOLEAN; + uncertaintyCode: CodedConcept; + reasonCode: PSMCodedConcept
• BasicTypes::RIMActivityRelationship: + relationshipCode: PSMCodedConcept; + sequenceNumber: NUMBER; + pauseCriterion; + checkpointCode; + splitCode; + joinCode; + negationIndicator: BOOLEAN; + conjunctionCode
• «ODM ItemData» Design Concepts::DiagnosticImage
• OStudy Design and Data Collection::OEncounterDefinitionList--???: + listOfDataCollectionInstruments
• OStudy Design and Data Collection::OBRIDGDerivationExpression: + type: ENUM{transformation, selection}; + rule: TEXT; + id: PSMID; + name: TEXT
• OStudy Design and Data Collection::OBRIDGTransition: + criterion: RULE; + eventName: TEXT
• Plans::Protocol/Plan
• BusinessObjects::Amendment
• Protocol Concepts::Bias
• «implementationClass» BusinessObjects::BusinessRule
• BusinessObjects::ClinicalDevelopmentPlan
• BusinessObjects::CommunicationRecord
• Protocol Concepts::Concurrency
• Protocol Concepts::Configuration
• Protocol Concepts::Constraint
• Protocol Concepts::Control
• Protocol Concepts::DesignCharacteristic: + synopsis; + type: test value domain = a,d,f,g; + summaryDescription; + summaryCode; + detailedMethodDescription; + detailedMethodCode
• Protocol Concepts::StudyDocument: + effectiveEndDate: DATETIME; + version; + author: SET; + effectiveStartDate: DATETIME; + ID: SET PSMID; + documentID; + type: ENUMERATED = formal plus non...; + description: PSMDescription; + title; + status: PSMStatus; + confidentialityCode: PSMCodedConcept; + businessProcessMode: PSMBusinessProcessMode
• Protocol Concepts::EligibilityCriterion
• Protocol Concepts::ExclusionCriterion
• BusinessObjects::IntegratedDevelopmentPlan
• Design Concepts::Masking: + level; + objectOfMasking (set); + procedureToBreak; + unmaskTriggerEvent (set)
• Protocol Concepts::Milestone
• BasicTypes::BRIDGAnalysisVariable: + name: TEXT; + value; + controlledName: PSMCodedConcept; + businessProcessMode: PSMBusinessProcessMode
• BasicTypes::BRIDGBusinessProcessMode: + modeValue: ENUM {Plan, Execute}
• BasicTypes::BRIDGContactAddr: + type: PSMCodedConcept; + effectiveTime: BRIDGInterval; + usage: PSMCodedConcept
• BasicTypes::BRIDGID: + source: Text; + version: Text; + value: Text
• BasicTypes::BRIDGInterval: - startTime: timestamp; + endTime: timestamp
• BasicTypes::BRIDGStatus: + effectiveEndDate; + effectiveStartDate; + statusValue
• BusinessObjects::ProtocolReview: + date; + result
• Design Concepts::Randomization: + minimumBlockSize; + maximumBlockSize
• Protocol Concepts::Scope
• BusinessObjects::SiteStudyManagementProjectPlan
• BusinessObjects::SiteSubjectManagementProjectPlan
• BusinessObjects::SponsorStudyManagementProjectPlan
• BusinessObjects::Study: + startDate: Date; + endDate: Date; + type: PSMCodedConcept; + phase: PSMCodedConcept; + randomizedIndicator: Text; + SubjectType: PSMCodedConcept
• Protocol Concepts::StudyBackground (why): + description: PSMDescription; + summaryOfPreviousFindings: PSMDescription; + summaryOfRisksAndBenefits: PSMDescription; + justificationOfObjectives: PSMDescription; + justificationOfApproach: PSMDescription; + populationDescription: PSMDescription; + rationaleForEndpoints: PSMDescription; + rationaleForDesign: PSMDescription; + rationaleForMasking: PSMDescription; + rationaleForControl: PSMDescription; + rationaleForAnalysisApproach: PSMDescription
• Protocol Concepts::StudyObjective (what): + description: PSMDescription; + intentCode: SET ENUMERATED; + objectiveType: ENUM{Primary,Secondary,Ancillary}; + id: PSMID
• Protocol Concepts::StudyObjectiveRelationship: + type: PSMCodedConcept
• Protocol Concepts::StudyObligation: + type: ENUMERATED; + description: PSMDescription; + commissioningParty; + responsibleParty
• BusinessObjects::ActivitySchedule (the "how", "where", "when", "who"): + description: PSMDescription
• BusinessObjects::SupplementalMaterial: + type; + description: PSMDescription; + version; + ID: SET PSMID
• Protocol Concepts::Variance
• BusinessObjects::Waiver
• BusinessObjects::AdverseEventPlan
• BusinessObjects::DataManagementPlan
• BusinessObjects::ContingencyPlan
• BusinessObjects::SubjectRecruitmentPlan
• BusinessObjects::DataMonitoringCommitteePlan
• BusinessObjects::SafetyMonitoringPlan
• BusinessObjects::InvestigatorRecruitmentPlan
• BusinessObjects::AssayProcedures
• BusinessObjects::ClinicalTrialMaterialPlans
• BusinessObjects::BiospecimenPlan
• BusinessObjects::ProtocolDocument
• BusinessObjects::ClinicalStudyReport
• BusinessObjects::EnrollmentRecord
• BusinessObjects::FinalRandomizationAssignment
• BusinessObjects::Guide
• BusinessObjects::RandomizationAssignment: + randomizationCode; + subjectID; + assignmentDateTime
• BusinessObjects::RegulatoryRecord
• Protocol Concepts::Outcome: - description: BRIDGDescription; - ranking: OutcomeRank; - associatedObjective: Set; - analyticMethods: Set; - asMeasuredBy: Set; - outcomeVariable; - threshold
• Statistical Concepts::Hypothesis: + statement: PSMDescription; - associatedObjective; - clinicallySignificantDiff: char
• Statistical Concepts::Computation [AbstractActivity]: - description: PSMDescription; - algorithm: char; - input: AbstractStatisticalParameter; - output: AbstractStatisticalParameter
• Statistical Concepts::StatisticalModel: + description: PSMDescription; # outputStatistic: StudyVariable; - computations: Set; - assumptions: Set
• Statistical Concepts::SampleSizeCalculation: + clinicalJustification: TEXT
• Statistical Concepts::AnalysisSetCriterion: - description: char; - subgroupVariable: StudyDatum; - sequence: int
• Statistical Concepts::StatisticalAnalysisSet: + description: PSMDescription; - scopeType: AnalysisScopeTypes
• Statistical Concepts::StatisticalAssumption: + description: PSMDescription
• Statistical Concepts::SequentialAnalysisStrategy: + alphaSpendingFunction; + timingFunction; + analysis; + trialAdjustmentRule
• Statistical Concepts::StatisticalConceptArea: - evaluableSubjectDefinition: char; - intentToTreatPopulation: char; - clinicallyMeaningfulDifference: char; - proceduresForMissingData: char; - statSoftware: char; - methodForMinimizingBias: char; - subjectReplacementStrategy: char; - randAndStratificationProcedures: char
• Statistical Concepts::HypothesisTest: + significanceLevel: double; + lowerRejectionRegion: int; + upperRejectionRegion: int; + testStatistic; + comparisonType: AnalyticComparisonTypes; # associatedSummaryVariables
• Statistical Concepts::Analysis [AbstractActivity]: + description: PSMDescription; + analysisType: Set{AnalysisTypes}; + analysisRole; + rationaleForAnalysisApproach: PSMDescription; # associatedStrategy; # associatedHypotheses
• Design Concepts::StudySchedule: - Periods: Set; - Tasks: Set; - TaskVisits: Set; - associatedArms: Set
• «Period» Design Concepts::Element [AbstractActivity]: - Children: Set; - epochType: EpochTypes
• Design Concepts::PlannedTask [AbstractActivity]: - displayName: char[]; - whoPerforms: int; - sequence: int; - procDefID: PSMCodedConcept; - sourceText: char[]
• Design Concepts::EventTask [AbstractActivity]: - localFacilityType: LocalFacilityType; - centralFacilitityType: CentralFacilitiyType; - eventID: OID; - taskID: OID; - purposes: Set
• Design Concepts::ProtocolEvent [SubjectEvent]: - parent: AbstractActivity; - eventType: ScheduledEventType; - studyOffset: PSMInterval; - studyDayOrTime: char
• Design Concepts::EventTaskPurpose: - isBaseline: boolean; - purposeType: PurposeType; - associatedOutcome
• Design Concepts::UnscheduledEvent [SubjectEvent]: - eventType: UnscheduledEventType
• BusinessObjects::StatisticalAnalysisPlan
• Design Concepts::StudyActivityRef: - activityID: OID
• «ODM ItemData» Design Concepts::Observation: - transactionType
• «ODM:ItemData» Design Concepts::TreatmentConfirmed
• «ODM:ItemDef» Design Concepts::PlannedIntervention
• «ODM:ItemDef» Design Concepts::PlannedObservation
• «abstract» Design Concepts::StudyActivityDef [AbstractActivity]
• «implementationClass» Design Concepts::ClinicalDecision
• «implementationClass» Design Concepts::TemporalRule
• BasicTypes::StudyVariable: - OID: long; - Name: char; - unitOfMeasureID: OID; - minValid; - maxValid; - controlledName: ENUM
• BasicTypes::StudyDatum: - complete: bool; - value: Value; - timestamp: timestamp; - itemOID
• BasicTypes::ActActRelation: - description: BRIDGDescription; - relationQualifier: BRIDGCodedConcept; - mode: PSMBusinessProcessMode; - effectiveTime: BRIDGInterval; + priorityNumber: NUMBER; - negationRule: AbstractRule; - detail: char; - sourceAct: AbstractActivity; - destAct: AbstractActivity; - sequence: int. Operations: + «property» relationQualifier() : PSMCodedConcept; + «property» sourceAct() : AbstractActivity; + «property» destAct() : AbstractActivity
• BasicTypes::AbstractRule: - isExclusive: bool. Operations: + run() : bool
• BasicTypes::AnalysisVariableInst: - roleInAnalysis: RoleInAnalysisTypes
• Design Concepts::Arm: - nameOfArm: char[]; - plannedEnrollmentPerArm: char[]; - randomizationWeightForArn: int; - associatedSchedules: Set
• BasicTypes::BRIDGCodedConcept: - code: TEXT; - codeSystem; - codeSystemName: TEXT; - codeSystemVersion: NUMBER; - displayName: TEXT; - originalText: TEXT; - translation: SET{PSMCodedConcept}
• «ODM:ItemData» Design Concepts::SubjectDatum: - subjectID: int
[Association labels from the diagram; multiplicities and endpoints cannot be matched to classes in this extraction. Association names: hasAnalysisSets, hasAssumptions, hasModel, kindOfAnalysis, hasHypotheses, hasPurposes, hasAnalyses, kindOfActRelation, isKindOf, hasComputations, hasChildAnalyses, Defined By, Scheduled Sub Activities, restates Objective, hasStrategy, hasElements, tasksPerformedThisSchedule, hasArms, as Measured By, hasUnscheduledEvents, hasOngoingEvents, Implements, hasCriteria, kindOfActivityRelation, associatedVariable, kindOf, HasSubElements, hasSchedules, hasScheduledEvents, taskAtEvent. Role names: +source 1, +target 0..*, +correlativeStudy 0..*, +primaryStudy 1, -sourceobjective, +target activity, -sourceactivity, +TerminatingActivity 1..*, +EndEvent 1, +StartEvent 1, +FirstActivity 1..*, +passedTo, +targetActivity, +contains, +IsContainedIn, +generates, +sourceActivity. Stereotypes: «abstraction», «execution mode».]
[Diagram regions: Protocol Authoring and Documentation; Clinical Trial Design; Structured Statistical Analysis; Clinical Trial Registration; Eligibility Determination; Protocol activities and Safety monitoring (AE)]
CTMSi Architectural overview
[Figure: the caXchange Enterprise Service Bus (inbound binding component, routing rules, messages, outbound binding component) connects five applications: C3PR (grid service, Oracle), C3D (web service, Oracle), PSC (grid service, PostgreSQL), LabViewer/CTOM (grid service, Oracle), and caAERS (grid service, PostgreSQL). caGrid supplies the security services GTS, Dorian, and Grid Grouper, labeled Authentication, Trust, and Authorization.]
CTMSi Demonstration
[UML sequence diagram: overview sequence. Participants: SME, C3PR, caXchange, PSC, LabViewer, caAERS, CTOM, C3D WS.]
Notes on the diagram:
• User will create a new patient and register the patient to a protocol, checking the eligibility status. The protocol is already prepopulated amongst all the systems.
• The user will have a hot-link from the C3PR interface to the PSC interface. The user will see the patient registered on the prepopulated protocol.
• The user will hot-link over to the Lab Viewer to view Lab activities.
• We may not be able to hot-link to C3D, but the data should be propagated there and viewable from the C3D interface.
• caXchange (or some component hooked into caXchange) will load data into C3D.
• The user will hot-link from the LabViewer to caAERS, where he can edit and submit the AE.
• A new AE with some minimal information will be created and sent to caAERS through caXchange.
• The user hot-links from caAERS to PSC, where they will see the AE notification and make appropriate changes.
Messages, in extraction order:
• registerPatient
• registerPatient(Participant, StudySubject, StudySite, HealthCareSite)
• isValidProtocol(studyId)
• patientPositionId = getPatientPosition(site, studyId)
• registerPatient(Participant, StudySubject, StudySite, HealthCareSite) (shown four more times)
• viewSchedule
• scheduleActivity
• viewLabActivities
• viewLabData(Patient)
• viewLabData
• Lab[] = query(id[])
• loadLabData
• loadLabData(Participant, StudySubject, Study, LabTest, LabResult)
• loadLabData(mrn, studyId, lab, labTest)
• viewPatient
• viewLabData
• selectLabForAE
• Lab[] = query(id[])
• newAE(Participant, StudySubject, Study, LabTest, LabResult)
• id = newAE(Participant, StudySubject, Study, LabTest, LabResult, AE)
• editAE
• submitAE
• submitAE(Participant, StudySubject, Study, AE)
• flagAE(Participant, StudySubject, Study, AE)
• login
• aeNotification
• modifySchedule
Service Metadata: All Services
• Common Service Metadata
• Provided by all services
• Details service’s capabilities, operations, contact information, hosting research center
• Service operation’s inputs and outputs defined in terms of structure and semantics extracted from caDSR and EVS
• Majority auto-generated by Introduce
Service Metadata: Service Security
• Service Security Metadata
• Provided by all services
• Details the service’s requirements on communication channel for each operation
• Can be used by client to programmatically negotiate an acceptable means of communication
• For example: Does operation X allow anonymous clients, or are credentials required?
• Auto-generated by Introduce
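The negotiation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the field names `anonymous_allowed` and `mechanisms`, and the mechanism labels are all hypothetical, and the real caGrid security metadata is an XML document whose schema is not reproduced here.

```python
from typing import Optional

# Hypothetical per-operation security metadata, as a client might hold it
# after fetching and parsing a service's security description.
SERVICE_SECURITY = {
    "query": {
        "anonymous_allowed": False,
        "mechanisms": ["transport_security", "secure_message"],
    },
    "getServiceMetadata": {
        "anonymous_allowed": True,
        "mechanisms": ["none", "transport_security"],
    },
}

# Mechanisms this (hypothetical) client supports, most preferred first.
CLIENT_PREFERENCES = ["transport_security", "none"]


def negotiate(operation: str, has_credentials: bool) -> Optional[str]:
    """Return a mechanism acceptable to both sides, or None if there is none."""
    meta = SERVICE_SECURITY.get(operation)
    if meta is None:
        return None  # the service does not describe this operation
    if not meta["anonymous_allowed"] and not has_credentials:
        return None  # credentials are required but the client has none
    for mechanism in CLIENT_PREFERENCES:
        if mechanism in meta["mechanisms"]:
            return mechanism
    return None
```

With this toy metadata, an anonymous client can still call `getServiceMetadata` but is refused `query`, which is the "does operation X allow anonymous clients" question from the slide.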
Service Metadata: Data Service
• Data Service Metadata
• Provided by all data services
• Describes the Domain Model being exposed, in terms of a UML model linked to semantics
• Provides information needed to formulate the Object-Oriented Query
• As with common metadata, data types defined in terms of structure and semantics extracted from caDSR and EVS
• Auto-generated by Introduce
caTRIP in-depth: Architecture Security
[Figure labels. Authentication: User Credentials, caGrid Authentication Service, Duke Authentication Plugin, Duke Domain Controller (NT Security), SAML Assertion, Dorian, User Grid Certificate. Authorization: Grid Data Service, CSM, GridGrouper, backend data. Also shown: Trust Fabric.]
caTRIP in-depth: Data sharing Challenges in data sharing
• Building data-oriented systems
• Duke requires IRB approval to gain access to identifiable data
• We worked around this by leveraging people already on IRB protocols
• Deidentifying data
• Data is owned by different groups across the cancer center
• Traditional deidentification: data manager deidentifies an entire dataset then throws away the key
• Distributed deidentification: trusted service provider (TSP) deidentifies discrete values
• Traditional approach is not scalable – requires a middle-man
• IRB approval required for distributed approach because it deviates from traditional deidentification (at Duke)
caTRIP in-depth: Data sharing Distributed deidentification
[Figure: a Trusted Service Provider holds a PHI-to-DEID table with randomly generated DEID values (MRN1 → ABC123, MRN2 → DEF456, MRN3 → GHI789, ...). The lookup MRN3 → GHI789 is shown twice. Labels: "Has IRB approval to store identifiable data" (once), "Has IRB approval to see identifiable data" (twice), "Secure connection".]
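The mapping kept by the trusted service provider can be sketched as follows. This is a minimal Python illustration, not the caTRIP implementation: the class name, the six-character identifier format, and the in-memory dictionary are assumptions, and a real TSP would sit behind a secure connection with persistent, access-controlled storage.

```python
import secrets
import string


class TrustedServiceProvider:
    """Toy TSP: maps each identifier (e.g. an MRN) to a random DEID."""

    ALPHABET = string.ascii_uppercase + string.digits

    def __init__(self) -> None:
        self._phi_to_deid = {}  # the only place the PHI value is stored
        self._issued = set()    # guards against handing out a DEID twice

    def deidentify(self, mrn: str) -> str:
        """Return the DEID for mrn, generating one the first time it is seen."""
        if mrn not in self._phi_to_deid:
            deid = self._new_deid()
            self._phi_to_deid[mrn] = deid
            self._issued.add(deid)
        return self._phi_to_deid[mrn]

    def _new_deid(self) -> str:
        # Randomly generated, so the DEID carries no information about the MRN.
        while True:
            candidate = "".join(secrets.choice(self.ALPHABET) for _ in range(6))
            if candidate not in self._issued:
                return candidate
```

Two data services that ask the same provider about the same MRN receive the same DEID, so their records stay joinable without either service exposing the MRN itself.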
caTRIP in-depth: Architecture Simple GUI configuration
[Figure: Service A and Service B each have a Target with Associated Classes (an Associated Object Tree with Association Directions). A Foreign Association connects Service A to Service B through Linking Objects and a Join Condition on a CDE (e.g. MRN), with a Foreign Association Outbound Path, Foreign Association Inbound Paths, and a Filter Object. Example classes: TissueSpecimen, ParticipantMedicalIdentifier, SpecimenCharacteristics, SpecimenCollectionGroup, ClinicalReport, BreastCancerBiomarkers.]
caTRIP in-depth: Architecture caBIG compatibility
• Challenge
• Silver-compatibility is in some ways (and for good reason) stringent
• Grid technologies were still in development (caGrid 1.0 is now released)
• caTRIP is a silver-compatible application (in theory)
• Compatibility submission package completed
• Going through review now for silver-compatible data services
• caTRIP leverages caCORE technologies
• Common Security Module (CSM) provides authorization
• caCORE-SDK provides tooling to create Java classes from UML (XMI), XML schemas, and castor mappings
• caTRIP leverages caGrid technologies
• Index Service provides advertisement and discovery
• Authentication Service provides
• Dorian helps provide authentication
• GTS provides trust fabrics
Next steps
• Aggregate data from multiple services of the same type
• Scenario: caTissue Suite deployed at 13 cancer centers
• Add datasets and data types
• CTMS, population sciences, basic science, etc.
• Add analytical services
• Integrate with workflow
• Add visualization components
• Enhanced reporting
• Automate Excel pivot table
• Data mining results
• Enhanced querying
• Asynchronous, parallel querying
• Querying multiple deployed distributed query services
• Continue refinement of user interface
• Synchronization of advanced and simple GUI
• Additional usability features
caGrid caBIG Resources
• caBIG™ Website: http://cabig.cancer.gov/index.asp
• caBIG™ Compatibility Guidelines: https://cabig.nci.nih.gov/compatibility_guidelines_documentation/
• Cancer Common Ontologic Representation Environment (caCORE): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview
• Enterprise Vocabulary Services (EVS): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/vocabulary
• Cancer Data Standards Repository (caDSR): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/cadsr
• caCORE Software Developer’s Kit (caCORE SDK): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacoresdk
• caCORE Training: http://ncicb.nci.nih.gov/NCICB/training/cadsr_training
• Model Driven Architecture: http://www.omg.org/mda/
• UML Modeling: http://www.sparxsystems.com.au/UML_Tutorial.htm
caTRIP Why can’t I just write DCQL?
• What are all the tissue specimens from her2/neu positive patients that have a primary tumor in the breast and are BRCA1 positive?
<DCQLQuery xmlns="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
  <TargetObject name="edu.wustl.catissuecore.domainobject.impl.TissueSpecimenImpl" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTissueCore">
    <Association name="edu.wustl.catissuecore.domainobject.impl.SpecimenCollectionGroupImpl" roleName="specimenCollectionGroup">
      <Association name="edu.wustl.catissuecore.domainobject.impl.ClinicalReportImpl" roleName="clinicalReport">
        <Association name="edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl" roleName="participantMedicalIdentifier">
          <Group logicRelation="AND">
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier</Object>
                  <Property>medicalRecordNumber</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CAE">
                <Association name="edu.duke.catrip.cae.domain.general.Participant" roleName="participant">
                  <Association name="edu.pitt.cabig.cae.domain.general.AnnotationEventParameters" roleName="annotationEventParametersCollection">
                    <Association name="edu.pitt.cabig.cae.domain.breast.BreastCancerBiomarkers" roleName="annotationSetCollection">
                      <Attribute name="HER2Status" predicate="LIKE" value="POSITIVE%"/>
                    </Association>
                  </Association>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>edu.duke.cabig.tumorregistry.domain.PatientIdentifier</Object>
                  <Property>medicalRecordNumber</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="edu.duke.cabig.tumorregistry.domain.PatientIdentifier" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTRIPTumorRegistry">
                <Association name="edu.duke.cabig.tumorregistry.domain.Patient" roleName="patient">
                  <Association name="edu.duke.cabig.tumorregistry.domain.Diagnosis" roleName="diagnosisCollection">
                    <Attribute name="primarySite" predicate="LIKE" value="BREAST%"/>
                  </Association>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant</Object>
                  <Property>studySubjectIdentifier</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CGEMS">
                <Association name="gov.nih.nci.caintegrator.domain.analysis.snp.bean.SNPAnalysisGroup" roleName="analysisGroupCollection">
                  <Attribute name="name" predicate="LIKE" value="BRCA1%"/>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
          </Group>
        </Association>
      </Association>
    </Association>
  </TargetObject>
</DCQLQuery>
[Slide callouts on the query: Select tissue (TargetObject); Foreign Join w/ CAE (HER2/NEU Positive); Foreign Join w/ Tumor Registry (Primary Site Breast); Foreign Join w/ CGEMS (BRCA1 Positive)]
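The three foreign joins in the DCQL above share one shape: a join condition on two properties, then a foreign object with a chain of associations ending in one attribute filter. A small Python sketch of a builder for that shape, using the standard `xml.etree.ElementTree` module, shows why a tool is preferable to writing the query by hand. The function name and its parameters are hypothetical, and namespace handling is left out for brevity (the real query declares the DCQL namespace on the root element).

```python
import xml.etree.ElementTree as ET


def foreign_association(left_obj, left_prop, right_obj, right_prop,
                        service_url, path, attribute):
    """Build one <ForeignAssociation> element.

    path is a list of (class name, role name) pairs nested below the
    foreign object; attribute is (name, predicate, value) and is attached
    to the innermost association.
    """
    fa = ET.Element("ForeignAssociation")
    join = ET.SubElement(fa, "JoinCondition")
    for tag, obj, prop in (("LeftJoin", left_obj, left_prop),
                           ("RightJoin", right_obj, right_prop)):
        side = ET.SubElement(join, tag)
        ET.SubElement(side, "Object").text = obj
        ET.SubElement(side, "Property").text = prop
    node = ET.SubElement(fa, "ForeignObject", name=right_obj, serviceURL=service_url)
    for cls, role in path:
        node = ET.SubElement(node, "Association", name=cls, roleName=role)
    name, predicate, value = attribute
    ET.SubElement(node, "Attribute", name=name, predicate=predicate, value=value)
    return fa


# Rebuild the Tumor Registry join from the slide (values copied from the query).
tumor_registry_join = foreign_association(
    "edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl",
    "medicalRecordNumber",
    "edu.duke.cabig.tumorregistry.domain.PatientIdentifier",
    "medicalRecordNumber",
    "http://152.16.96.114/wsrf/services/cagrid/CaTRIPTumorRegistry",
    [("edu.duke.cabig.tumorregistry.domain.Patient", "patient"),
     ("edu.duke.cabig.tumorregistry.domain.Diagnosis", "diagnosisCollection")],
    ("primarySite", "LIKE", "BREAST%"),
)
print(ET.tostring(tumor_registry_join, encoding="unicode"))
```

Each additional criterion in the question then becomes one more call with different arguments rather than another thirty lines of nested XML.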
caTRIP Distributed query engine
[Figure: a DCQL query enters the Distributed Query Engine, which issues CQL queries to three caGrid data services, each in front of its own database; data objects flow back from the data services and out of the engine.]
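The fan-out and join performed by a distributed query engine can be imitated with in-memory data. The Python sketch below is an assumption-laden toy: services are plain functions over lists of dictionaries, the join key `mrn` and all record values are made up for illustration, and nothing here uses the caGrid API or real CQL. It only shows the idea of sending one sub-query per service and keeping the target records whose key matches in every foreign result.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Predicate = Callable[[Record], bool]
# Stand-in for a data service: takes a filter and returns matching records.
Service = Callable[[Predicate], List[Record]]


def make_service(rows: List[Record]) -> Service:
    """Wrap in-memory rows so they can be queried like a remote service."""
    return lambda predicate: [row for row in rows if predicate(row)]


def starts_with(field: str, prefix: str) -> Predicate:
    """Mimic a LIKE 'PREFIX%' filter on one field."""
    return lambda row: row.get(field, "").startswith(prefix)


def distributed_query(target: Service,
                      joins: List[Tuple[Service, Predicate]],
                      key: str = "mrn") -> List[Record]:
    """Keep target records whose join key appears in every foreign result."""
    results = target(lambda row: True)
    for service, predicate in joins:  # one sub-query per foreign service
        matching_keys = {row[key] for row in service(predicate)}
        results = [row for row in results if row[key] in matching_keys]
    return results


# Toy data only; not taken from any real service.
tissue = make_service([{"mrn": "MRN1", "specimen": "T-1"},
                       {"mrn": "MRN2", "specimen": "T-2"}])
cae = make_service([{"mrn": "MRN1", "HER2Status": "POSITIVE"},
                    {"mrn": "MRN2", "HER2Status": "NEGATIVE"}])
registry = make_service([{"mrn": "MRN1", "primarySite": "BREAST"},
                         {"mrn": "MRN2", "primarySite": "BREAST"}])

hits = distributed_query(tissue, [(cae, starts_with("HER2Status", "POSITIVE")),
                                  (registry, starts_with("primarySite", "BREAST"))])
```

The sub-queries here run one after another; running them in parallel would not change the result, since each only narrows the set of matching keys.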
CTMSi BRIDG dynamic modeling
• *Process flow
• *story boards
• *Scenarios
• *Use cases
• *Text UML activity diagrams
• *Links to static structures
• Interaction diagrams (?)
• Sequence diagrams
• Collaboration diagrams (UML 2.0)
CTMSi Patient registration message
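[Figure: the User works in C3PR, which sends a Registration Message through GridBC to the JMS IN Queue of the ESB. The Router passes Registration Messages to the JMS OUT Queue for the PSC Grid Service and the caAERS Grid Service, which front PSC and caAERS. An Acknowledgement is returned.]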
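The queue-and-router pattern in the registration flow can be sketched with Python's standard `queue` module. This is an illustration only: the routing table, the message fields, and the acknowledgement format are assumptions, and the real system uses JMS queues inside an enterprise service bus rather than in-process queues.

```python
import queue

# Hypothetical routing rule: message type -> destination services.
ROUTING_RULES = {"registration": ["PSC", "caAERS"]}


def route(inbound: "queue.Queue", outbound: dict) -> list:
    """Drain the inbound queue, copy each message to its destinations,
    and return one acknowledgement per message processed."""
    acks = []
    while not inbound.empty():
        message = inbound.get()
        for destination in ROUTING_RULES.get(message["type"], []):
            outbound[destination].put(message)
        acks.append({"ack_for": message["id"]})
    return acks


inbound = queue.Queue()
outbound = {"PSC": queue.Queue(), "caAERS": queue.Queue()}
inbound.put({"id": "reg-1", "type": "registration", "participant": "MRN1"})
acks = route(inbound, outbound)
```

Because the sender only talks to the inbound queue, adding another consumer of registration messages means editing the routing table, not the sending application.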
caBIG compatibility CDE Browser
caBIG compatibility CDE Browser permissible values
caBIG compatibility NCI Thesaurus
[Screenshot callouts: Preferred Name, Synonyms, Definition, Relationships, Concept Code]
caGrid caGrid community involvement
• caGrid itself provides no real “data” or “analysis” to caBIG
• It’s the enabling infrastructure which allows the community to do so
• Community members add value to the grid as applications, services, and processes (for example: shared workflows)
• caGrid provides the necessary core services, APIs, and tooling
• The real “value” of the grid comes from bringing this information to the “end user”
• Data Services: expose data to the grid in a unified way
• Analytical Services: expose analytical operations to the grid
• Community members develop end user applications which consume the resources provided by the grid
caGrid caGrid exposing silver systems
• Object Oriented APIs and data resources are developed using Object types and information models registered in the caDSR
• These “silver systems” are grid-enabled by defining a grid service interface that defines the functionality to be exposed to the grid
• The grid service interface uses the same Object types as the existing system, but leverages a platform and language neutral representation (XML) of them
• The grid service implementation maps service invocations to API calls or queries into the existing system
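The wrapping pattern described on this slide can be sketched roughly as follows. This is a minimal illustration and not caGrid code: `Gene`, `ExistingGeneApi`, `GeneGridService`, and `query_genes` are hypothetical names, and the XML layout is an assumption rather than the real caGrid serialization.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Gene:
    """Stand-in for an object type registered in the caDSR (hypothetical fields)."""
    symbol: str
    fullName: str


class ExistingGeneApi:
    """Stand-in for the existing object-oriented API of a silver system."""

    def __init__(self, genes):
        self._genes = list(genes)

    def find_by_symbol_prefix(self, prefix):
        return [g for g in self._genes if g.symbol.startswith(prefix)]


def to_xml(obj):
    """Serialize a domain object to a language-neutral XML element."""
    elem = ET.Element(type(obj).__name__)
    for f in fields(obj):
        elem.set(f.name, str(getattr(obj, f.name)))
    return elem


class GeneGridService:
    """Hypothetical grid service interface: exposes one operation and
    maps each invocation to a call on the existing API."""

    def __init__(self, api):
        self._api = api

    def query_genes(self, symbol_prefix):
        results = ET.Element("Results")
        for gene in self._api.find_by_symbol_prefix(symbol_prefix):
            results.append(to_xml(gene))
        return ET.tostring(results, encoding="unicode")


# Illustrative data only.
api = ExistingGeneApi([
    Gene("BRCA1", "breast cancer 1"),
    Gene("TP53", "tumor protein p53"),
])
service = GeneGridService(api)
response = service.query_genes("BRCA")
print(response)
```

The service layer holds no data of its own. It forwards the call to the existing API and returns the same object type as XML, which is the point the bullets above make.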
caGrid Federated Query Processor
• Provides a mechanism to perform basic distributed aggregations and joins of queries over multiple data services
• As caGrid data services all use a uniform query language, CQL, the Federated Query Infrastructure can be used to express queries over any combination of caGrid data services
• Federated queries are expressed with a query language, DCQL, which is an extension to CQL to express such concepts as joins, aggregations, and target services
• Implemented as a stateful grid service, queries may be executed asynchronously and results retrieved at a later time
• Supports secure deployments wherein result ownership is enforced
• Coupled with semantic discovery capabilities of caGrid, provides a powerful framework for data discovery, mining, and integration
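The idea of a join across data services can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the caGrid Federated Query Processor: `DataService`, `federated_join`, the example.org URLs, and the records are all hypothetical, and a real engine would send CQL to remote services instead of filtering Python lists.

```python
from dataclasses import dataclass


@dataclass
class DataService:
    """Stand-in for a caGrid data service holding records as dicts."""
    url: str
    records: list

    def query(self, attribute, predicate):
        """Return the records whose attribute satisfies the predicate."""
        return [r for r in self.records if predicate(r.get(attribute))]


def federated_join(target, target_filter, left_property,
                   foreign, foreign_filter, right_property):
    """Keep target records whose left_property equals the right_property
    of at least one record selected from the foreign service."""
    foreign_attr, foreign_pred = foreign_filter
    foreign_keys = {r[right_property]
                    for r in foreign.query(foreign_attr, foreign_pred)}
    target_attr, target_pred = target_filter
    return [r for r in target.query(target_attr, target_pred)
            if r[left_property] in foreign_keys]


# Made-up records and URLs, for illustration only.
tissue = DataService("http://example.org/tissue", [
    {"medicalRecordNumber": "MRN-1", "specimenType": "Tissue"},
    {"medicalRecordNumber": "MRN-2", "specimenType": "Tissue"},
    {"medicalRecordNumber": "MRN-3", "specimenType": "Fluid"},
])
registry = DataService("http://example.org/registry", [
    {"medicalRecordNumber": "MRN-1", "primarySite": "BREAST, NOS"},
    {"medicalRecordNumber": "MRN-2", "primarySite": "LUNG"},
    {"medicalRecordNumber": "MRN-3", "primarySite": "BREAST, NOS"},
])

matches = federated_join(
    tissue, ("specimenType", lambda v: v == "Tissue"), "medicalRecordNumber",
    registry, ("primarySite", lambda v: v.startswith("BREAST")), "medicalRecordNumber",
)
print([m["medicalRecordNumber"] for m in matches])  # ['MRN-1']
```

The foreign query runs first, and its join values then restrict the target results. Other evaluation orders are possible; this one was chosen only because it is the shortest to write down.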
caGrid Data service common query language
• Specifies a target object (result) type and selects the instances which satisfy the specified properties and nested object properties
• Allows path navigation
• Provides logical grouping
• Provides name/predicate/value filtering on properties of objects
• Recursively defined
• Ability to return full Objects, Set of attributes, count of results, or distinct attribute values
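The recursive definition can be made concrete with a small evaluator. This is a sketch under assumptions, not the caGrid implementation: queries are represented as plain Python dicts instead of CQL XML, objects are dicts, only the LIKE and EQUAL_TO predicates are covered, and only two of the return modes (full objects and count) are shown. The function names and the data are hypothetical.

```python
import re


def like(value, pattern):
    """SQL-style LIKE where % matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, str(value)) is not None


PREDICATES = {
    "EQUAL_TO": lambda value, expected: value == expected,
    "LIKE": like,
}


def matches(obj, node):
    """Recursively test one object (a dict) against a query node."""
    kind = node["kind"]
    if kind == "attribute":
        # name/predicate/value filtering on a property
        test = PREDICATES[node["predicate"]]
        return node["name"] in obj and test(obj[node["name"]], node["value"])
    if kind == "association":
        # path navigation into a nested object or collection
        related = obj.get(node["roleName"])
        if related is None:
            return False
        candidates = related if isinstance(related, list) else [related]
        return any(matches(c, node["constraint"]) for c in candidates)
    if kind == "group":
        # logical grouping of child constraints
        results = (matches(obj, child) for child in node["children"])
        return all(results) if node["logicRelation"] == "AND" else any(results)
    raise ValueError(f"unknown node kind: {kind}")


def run_query(objects, constraint, return_count=False):
    """Select matching objects, or only count them."""
    selected = [o for o in objects if matches(o, constraint)]
    return len(selected) if return_count else selected


# Made-up objects, for illustration only.
genes = [
    {"symbol": "BRCA1", "taxon": {"scientificName": "Homo sapiens"}},
    {"symbol": "BRCA2", "taxon": {"scientificName": "Mus musculus"}},
    {"symbol": "TP53", "taxon": {"scientificName": "Homo sapiens"}},
]
constraint = {
    "kind": "group", "logicRelation": "AND", "children": [
        {"kind": "attribute", "name": "symbol",
         "predicate": "LIKE", "value": "BRCA%"},
        {"kind": "association", "roleName": "taxon", "constraint":
            {"kind": "attribute", "name": "scientificName",
             "predicate": "EQUAL_TO", "value": "Homo sapiens"}},
    ],
}
print([g["symbol"] for g in run_query(genes, constraint)])  # ['BRCA1']
print(run_query(genes, constraint, return_count=True))      # 1
```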
caGrid Example CQL query
Return all Genes that have a symbol beginning with BRCA and an associated Taxon with a scientificName equal to “Homo sapiens”:
<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <Target name="gov.nih.nci.cabio.domain.Gene">
    <Group logicRelation="AND">
      <Attribute name="symbol" predicate="LIKE" value="BRCA%"/>
      <Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
        <Attribute name="scientificName" predicate="EQUAL_TO" value="Homo sapiens"/>
      </Association>
    </Group>
  </Target>
</CQLQuery>
LIKE “BRCA%”
= “Homo sapiens”
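To check that a query like the one on this slide is well-formed XML and to see its nesting, it can be parsed with a standard XML library. The sketch below uses Python's `xml.etree.ElementTree` with the query typed in straight quotes; `describe` is a hypothetical helper, not part of any caGrid tooling, and nothing is sent to a data service.

```python
import xml.etree.ElementTree as ET

NS = "{http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery}"

CQL = """<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <Target name="gov.nih.nci.cabio.domain.Gene">
    <Group logicRelation="AND">
      <Attribute name="symbol" predicate="LIKE" value="BRCA%"/>
      <Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
        <Attribute name="scientificName" predicate="EQUAL_TO" value="Homo sapiens"/>
      </Association>
    </Group>
  </Target>
</CQLQuery>"""


def describe(elem, depth=0):
    """Return one indented line per query element, walking the tree recursively."""
    tag = elem.tag.replace(NS, "")
    if tag == "Attribute":
        detail = f'{elem.get("name")} {elem.get("predicate")} {elem.get("value")!r}'
    elif tag == "Group":
        detail = elem.get("logicRelation")
    elif tag == "Association":
        detail = f'{elem.get("roleName")} -> {elem.get("name")}'
    else:
        detail = elem.get("name", "")
    lines = ["  " * depth + f"{tag} {detail}".rstrip()]
    for child in elem:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(CQL)
summary = describe(root)
print("\n".join(summary))
```

The printed outline shows the target type, the AND group, and the two filters (the LIKE on symbol and the EQUAL_TO on the associated Taxon), which correspond to the two callouts on the slide.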
caBIG compatibility Metadata and concepts example