Towards a Data Model for the Australian Microbial Resources Information Network (AMRiN)
Lynette Woodburn, Atlas of Living Australia
Version: 0.03, 17/09/2010
TIP
Each slide in this presentation comes with accompanying Notes.
You can’t see them if you display this presentation in ‘Slide Show’ mode.
If you’d like to see the Notes:
• view the presentation in ‘Normal’ mode, and
• expand the pane below the slide (the Notes pane) to see extra text.
Only then will you have a chance of understanding all the crazy diagrams.
Towards a data model for AMRiN
Requirement
a standard set of data fields for all micro-organisms
. to support the sharing and integration of data through AMRiN
. to pre-configure BioloMICS
Options
. choose an existing set
. develop something new
Recommendation
. surprise!
1. Requirements
2. Options
3. Recommendation
[Diagram: AMRiN and the AMRiN community]
1. Requirements
2. Options - existing
. CABRI
. MCL
3. Recommendation
Common Access to Biological Resources and Information (CABRI)
http://www.cabri.org/
a European organization of partner collections who contribute data to searchable ‘catalogues’ covering:
• bacteria & archaea
• fungi & yeasts
• animal & human cell lines
• plant cell lines
• hybridomas
• phages
• plasmids
• plant cell viruses
• genomic libraries
CABRI’s sets of data elements (elements per set)
• bacteria & archaea: 26
• fungi & yeasts: 23
• animal cell lines: 29
• plant cell lines: 17
• hybridomas: 15
• phages: 33
• plasmids: 30
• plant cell viruses: 12
• genomic libraries: 7
Example elements: Original_host_plant, Doubling_time, Lysogenicity, Isolated_from, Morphology
Common Access to Biological Resources and Information (CABRI)
For each different kind of biological resource, CABRI defines nested sets of data elements: Mandatory, Recommended and Full.
CABRI : bacteria & archaea
Mandatory: Strain_number, Other_collection_numbers, Restrictions, Organism_type, Name, Infrasubspecific_names, Status, History, Conditions_for_growth, Form_of_supply
Recommended adds: Serovar, Other_names, Isolated_from, Geographic_origin, Mutant, Genotype, Literature
Full adds: Sexual_state, Pathogenicity, Enzyme_production, Metabolite_production, Applications, Catalogue_entry, Remarks, Price_code, Plasmids
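To make the "nested sets" idea concrete, the sketch below expands the three bacteria & archaea groups into cumulative sets. It assumes that each group on the slide lists only the elements *added* at that level; the names `LEVELS` and `nested_sets` are hypothetical and not part of CABRI.

```python
# Elements listed at each CABRI level for bacteria & archaea, transcribed from
# the slide. Assumption: each level's list names only the elements it *adds*.
LEVELS = {
    "Mandatory": [
        "Strain_number", "Other_collection_numbers", "Restrictions",
        "Organism_type", "Name", "Infrasubspecific_names", "Status",
        "History", "Conditions_for_growth", "Form_of_supply",
    ],
    "Recommended": [
        "Serovar", "Other_names", "Isolated_from", "Geographic_origin",
        "Mutant", "Genotype", "Literature",
    ],
    "Full": [
        "Sexual_state", "Pathogenicity", "Enzyme_production",
        "Metabolite_production", "Applications", "Catalogue_entry",
        "Remarks", "Price_code", "Plasmids",
    ],
}


def nested_sets(levels):
    """Expand each level to the union of itself and every level before it."""
    result, so_far = {}, set()
    for name, added in levels.items():  # dicts keep insertion order
        so_far = so_far | set(added)
        result[name] = so_far
    return result


for name, elements in nested_sets(LEVELS).items():
    print(name, len(elements))
# prints:
# Mandatory 10
# Recommended 17
# Full 26
```

The Full count of 26 matches the number CABRI gives for its bacteria & archaea set, which is consistent with reading the three lists as increments.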
CABRI : fungi & yeasts
Mandatory: Strain_number, Other_collection_numbers, Name, Status, Organism_type, History, Restrictions, Form_of_supply, Conditions_for_growth
Recommended adds: Misapplied_names, Race, Substrate, Geographic_origin, Literature, Applications, Mutant, Sexual_state
Full adds: Price_code, Remarks, Pathogenicity, Metabolite_production, Enzyme_production, Genotype
CABRI : animal & human cell lines
Mandatory: Accession_number, Cell_line_name, Brief_description, Description, Depositor, Bibliographic_references, Morphology, Culture_conditions, Viruses, Properties, Release_conditions, Hazard, Passage_number
Recommended adds: Species_validation
Full adds: Tumorigenicity, Karyology, Freezing_medium, Sterility, Validation_assays, Further_bibliography, Comments, Storage, Doubling_time, Mycoplasma, Fingerprint, Cytogenetics, Karyotype, Comments, Research_council_deposit, BIOMED_1
CABRI’s sets of data elements: 192 elements in total across the 9 sets
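The total can be checked against the per-set counts with a few lines of Python; the dictionary simply transcribes the counts CABRI gives for each of its 9 sets.

```python
# Number of data elements in each CABRI set, as listed on the slides.
CABRI_SET_SIZES = {
    "bacteria & archaea": 26,
    "fungi & yeasts": 23,
    "animal cell lines": 29,
    "plant cell lines": 17,
    "hybridomas": 15,
    "phages": 33,
    "plasmids": 30,
    "plant cell viruses": 12,
    "genomic libraries": 7,
}

total = sum(CABRI_SET_SIZES.values())
print(f"{len(CABRI_SET_SIZES)} sets, {total} data elements in total")
# prints: 9 sets, 192 data elements in total
```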
Sharing data about one kind of biological resource is easy (eg. phages, or plasmids).
Sharing data about multiple kinds of biological resources is hard (eg. Other_culture_collection_numbers vs Other_collection_numbers).
What is the prospect of deriving a common model from CABRI for describing several different kinds of biological resources?
133 distinct data elements … distributed across 9 sets
[Diagram: the 9 CABRI sets: bacteria & archaea, fungi & yeasts, animal cell lines, plant cell lines, hybridomas, phages, plasmids, plant cell viruses, genomic libraries]
CABRI as a common model ?
• each of 92 elements is found in only one set
• only 41 elements are found in more than one set: 27 in two sets, 10 in three, 4 in four
• no element is found in more than 4 sets
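These figures are mutually consistent, which a short calculation shows: counting each element once gives the 133 distinct elements, and weighting each element by the number of sets it appears in recovers the 192 set memberships. The dictionary below only transcribes the slide's figures.

```python
# How many distinct CABRI data elements occur in exactly k of the 9 sets
# (figures taken from the slide).
elements_by_set_count = {1: 92, 2: 27, 3: 10, 4: 4}

distinct = sum(elements_by_set_count.values())  # each element counted once
shared = sum(n for k, n in elements_by_set_count.items() if k > 1)
occurrences = sum(k * n for k, n in elements_by_set_count.items())  # set memberships

print(distinct, shared, occurrences)  # prints: 133 41 192
```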
[Chart: Distribution of data elements across CABRI sets, showing for each of the 9 sets the count of its data elements found in one set, two, three or four sets]
CABRI data element ‘themes’ (across the 9 sets)
• ID of item in collection
• name / classification of item
• item admin
• handling & distribution regulations
• care / maintenance
• characteristics
• literature
• ….origin
CABRI : comparison of elements across sets
• different names, same meaning (definition)
Accession_number, Strain_number
History, History_of_deposit
Bibliographic_references, Reference_paper, Literature, Reference, Further_bibliography
Restricted_distribution, Release_conditions, Restrictions, Distribution
Morphology, Morphology_and_growth
….
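One way to absorb "different names, same meaning" when integrating records is a synonym table inverted into a lookup. This is a minimal sketch: the groups come from the slide, but the canonical name chosen for each group and the `harmonise` helper are hypothetical.

```python
# Groups of CABRI element names that share a meaning, transcribed from the slide.
# The canonical name chosen for each group (the dict key) is hypothetical.
SYNONYMS = {
    "Strain_number": ["Accession_number", "Strain_number"],
    "History": ["History", "History_of_deposit"],
    "Literature": [
        "Bibliographic_references", "Reference_paper", "Literature",
        "Reference", "Further_bibliography",
    ],
    "Restrictions": [
        "Restricted_distribution", "Release_conditions", "Restrictions",
        "Distribution",
    ],
    "Morphology": ["Morphology", "Morphology_and_growth"],
}

# Inverted index: any element name -> its canonical name.
CANONICAL = {alias: canon for canon, aliases in SYNONYMS.items() for alias in aliases}


def harmonise(record):
    """Rename a record's keys to canonical names.

    Values are collected into lists because two local elements (for example
    Reference and Further_bibliography) can map onto the same canonical name.
    Names with no synonym group pass through unchanged.
    """
    out = {}
    for key, value in record.items():
        out.setdefault(CANONICAL.get(key, key), []).append(value)
    return out


print(harmonise({"Accession_number": "X-1", "Reference": "ref A"}))
# prints: {'Strain_number': ['X-1'], 'Literature': ['ref A']}
```

Collecting values into lists is a deliberate choice: it avoids silently overwriting one value when two synonyms meet in the same record.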
CABRI : comparison of elements across sets
• same name, different meanings
Type
. phages: type of element (phage, transposon, minitransposon, IS element, …)
. plasmids: type of element (plasmid, phasmid, cosmid, shuttle vector, transposon, minitransposon, IS element, …)
. genomic libraries: type of library (PAC, BAC, YAC, PI, cDNA, …)
Brief_description
. hybridomas: listing of species, strain, antibody specificity
. animal cell lines: listing of species, strain, tissue, tumour, pathology, transformed/transfected
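"Same name, different meanings" implies that an element name on its own is not a safe key when sets are merged. The sketch below keys definitions by (set, element name) instead. It assumes the "type of …" definitions belong to Type and the "listing of …" definitions to Brief_description; `DEFINITIONS` and `meanings` are hypothetical names.

```python
# Definitions keyed by (CABRI set, element name): the same element name can mean
# different things in different sets, so the name alone is not a safe key.
DEFINITIONS = {
    ("phages", "Type"): "type of element",
    ("plasmids", "Type"): "type of element",
    ("genomic libraries", "Type"): "type of library",
    ("hybridomas", "Brief_description"):
        "listing of species, strain, antibody specificity",
    ("animal cell lines", "Brief_description"):
        "listing of species, strain, tissue, tumour, pathology, transformed/transfected",
}


def meanings(element):
    """Return the distinct definitions recorded for an element name across sets."""
    return {d for (_, name), d in DEFINITIONS.items() if name == element}


print(sorted(meanings("Type")))  # prints: ['type of element', 'type of library']
```

Any element with more than one entry in `meanings` is a candidate for renaming or splitting before it can join a common model.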
CABRI : comparison of data element sets
• varying levels of scope
Conditions_for_growth (bacteria & archaea; fungi & yeasts) covers:
. culture medium
. atmospheric and light conditions
. temperature conditions
. additional remarks on cultivation
Medium (plasmids, phages)
Medium_1, Light_regime, Light_conditions, Temperature, Humidity (plant cell lines)
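Elements of differing scope can usually be reconciled only in one direction: narrow elements can be rolled up into a broad one, but broad free text cannot be reliably split back into narrow elements. A minimal sketch of the roll-up, assuming a hypothetical assignment of each plant cell line element to one of the aspects that Conditions_for_growth covers:

```python
# Hypothetical roll-up from narrowly scoped elements (plant cell lines) to the
# aspects covered by the broadly scoped Conditions_for_growth element.
ROLL_UP = {
    "Medium_1": "culture medium",
    "Light_regime": "atmospheric and light conditions",
    "Light_conditions": "atmospheric and light conditions",
    "Humidity": "atmospheric and light conditions",
    "Temperature": "temperature conditions",
}


def to_conditions_for_growth(record):
    """Combine a record's narrow elements into one Conditions_for_growth text value.

    Elements not listed in ROLL_UP are ignored.
    """
    parts = [f"{ROLL_UP[k]}: {v}" for k, v in record.items() if k in ROLL_UP]
    return "; ".join(parts)


print(to_conditions_for_growth({"Medium_1": "medium X", "Temperature": "25 C"}))
# prints: culture medium: medium X; temperature conditions: 25 C
```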
CABRI : fitness for our purpose
• 9 sets of data elements (but does not cover algae)
. good for sharing information about one kind of organism
• few elements common to several sets
. hard to share information about more than one kind of organism
• does not lend itself to the derivation of a common set
. elements of ‘different names, same meaning’
. elements of ‘same name, different meanings’
. elements with meanings of varying scope
• has international acceptance / presence (but no longer funded?)
1. Requirements
2. Options - existing
. CABRI
. MCL
3. Recommendation
Microbiological Common Language (MCL)
• a new data exchange standard for microbiological information
Research in Microbiology, 161(6), 439-445
http://www.straininfo.net/projects/mcl
• a pluggable framework, easily extended
• has the same ancestor as CABRI (MINE)
• underpins StrainInfo (www.straininfo.net)
“ a world-wide, virtual catalog integrating the information from BRC [Biological Resource Centres] catalogs with related information”
CABRI compared with MCL
CABRI: partitioned by kind of biological resource
MCL: partitioned by workflow step
[Diagram: MCL components Sample, Isolation, Culture, Deposit, Medium, Publication, Strain]
The abstract model of Microbiological Common Language (MCL)
… follows the logical flow from sampling to subsequent deposits
mcl : Sample
sampleDate
sampleCultureStrainNumber
sampleCollector, sampleCollectorInstitute
comments
sampleDescription, sampleLocationDescription
sampleLocationCountry, sampleLocationPlace
sampleAlt, sampleLat, sampleLong
sampleHabitatEnvoTerm, sampleHabitat
sampleCulture
mcl : Culture
[otherStrainNumbers]
id
cultureLastUpdateDate, otherStrainNumber, strainNumber
catalogURL
speciesName
history, isolationDate, isolator, isolatorInstitute, isolationMethod
typeStrainOfSpecies, typeStrainOf
typeStrainOfGenus
comments
minimalGrowthTemperature, [growthTemperature]
optimalGrowthTemperature, maximalGrowthTemperature
oxygenRelationship
nomenclaturalPublication, publication
environmentPublication, historyPublication, taxonomicPublication
hasSample, recommendMedium
some Object Properties (links from Culture to other classes)
Culture to Sample: hasSample
Culture to Medium: recommendMedium
Culture to Publication: nomenclaturalPublication, publication, environmentPublication, historyPublication, taxonomicPublication
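The role of object properties can be sketched with Python dataclasses: a Culture holds references to Sample, Medium and Publication objects rather than copies of their details. This is not an official MCL binding. Only a few properties from the slides are included, the cardinalities (single Sample, lists of Media and Publications) are guesses, and the camelCase attribute names deliberately mirror MCL's names rather than Python style. The collection numbers are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    sampleLocationCountry: Optional[str] = None
    sampleHabitat: Optional[str] = None


@dataclass
class Medium:
    mediumName: str
    mediumNumber: Optional[str] = None


@dataclass
class Publication:
    bibliographicCitation: str


@dataclass
class Culture:
    strainNumber: str
    speciesName: Optional[str] = None
    otherStrainNumber: List[str] = field(default_factory=list)
    # Object properties: these hold references to other objects, not literals.
    hasSample: Optional[Sample] = None
    recommendMedium: List[Medium] = field(default_factory=list)
    publication: List[Publication] = field(default_factory=list)


# Two deposits of the same isolate (hypothetical collection numbers) can point
# at one shared Sample object instead of each copying the sampling details.
soil = Sample(sampleLocationCountry="Australia", sampleHabitat="soil")
c1 = Culture(strainNumber="AAA 1", otherStrainNumber=["BBB 2"], hasSample=soil)
c2 = Culture(strainNumber="BBB 2", otherStrainNumber=["AAA 1"], hasSample=soil)
print(c1.hasSample is c2.hasSample)  # prints: True
```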
mcl : Medium
mediumName, mediumNumber, mediumURL, mediumDescription, comments
mcl : Publication
dcterms:bibliographicCitation, dc:title, dc:creator, prism:publicationName, prism:volume, prism:number, prism:startingPage, prism:pageRange, dcterms:issued
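Because mcl:Publication reuses Dublin Core and PRISM terms, a publication record can be written as ordinary RDF. The sketch below emits Turtle by hand for the MCL paper cited earlier (Research in Microbiology, 161(6), 439-445). The dc, dcterms and PRISM basic namespace URIs are the published ones; the mcl: URI and the subject IRI are placeholders (example.org), not the official MCL namespace, and `publication_to_turtle` is a hypothetical helper.

```python
import json

# Namespace URIs for the vocabularies used by mcl:Publication. The mcl: URI is
# a placeholder (example.org), not the official MCL namespace.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
    "mcl": "http://example.org/mcl#",
}


def publication_to_turtle(subject_iri, props):
    """Serialise one mcl:Publication as Turtle.

    props maps a prefixed term such as "prism:volume" to a plain string value.
    json.dumps is used only to quote and escape the string literals.
    """
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{subject_iri}> a mcl:Publication ;")
    pairs = [f"    {term} {json.dumps(value, ensure_ascii=False)}"
             for term, value in props.items()]
    lines.append(" ;\n".join(pairs) + " .")
    return "\n".join(lines)


print(publication_to_turtle("http://example.org/pub/1", {
    "prism:publicationName": "Research in Microbiology",
    "prism:volume": "161",
    "prism:number": "6",
    "prism:startingPage": "439",
    "prism:pageRange": "439-445",
}))
```

For anything beyond a sketch, an RDF library would be the safer way to serialise, since it handles IRIs, datatypes and escaping systematically.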
MCL : fitness for our purpose
• MCL offers a broadly-applicable suite of data elements
. data elements are grouped according to workflow steps, not organism type
. applicable to algae and cyanobacteria
. the Strain concept supports the logical linking of related cultures
• the model is modular and easily extensible
. model cohesion is achieved through Object Properties
. links easily with genomic standards (see StrainInfo)
• born and raised in Europe (StrainInfo), but now going global
. Asian biorepositories network is considering adoption
. we’re invited to contribute to ongoing development
• primarily devised (custom-built) as a data exchange standard
1. Requirements
2. Options
3. Recommendation
Recommendation : dip a toe into the water
• MCL, custom-built for describing microbiological data, deserves consideration
Proposal
undertake a pilot, involving a small group of AMRiN participants,
to assess the suitability of MCL for AMRiN’s purpose.
[Diagram: AMRiN and the AMRiN community]
AMRiN participants’ input
• map local elements to MCL elements
. Note: some MCL elements may not have a local equivalent
• identify local elements to be kept ‘private’
• identify other local elements to be shared, and provide English definitions to enable reconciliation with other participants’ elements
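A participant's input to the pilot could be captured as a simple worksheet and split into the three groups the steps above describe. Everything here is hypothetical: the local element names, which ones are shared, and the assumption that some shared elements have no MCL equivalent are invented for illustration.

```python
# One participant's pilot worksheet (all names hypothetical): each local element
# is mapped to an MCL element (or None), flagged as shared or private, and given
# an English definition for reconciliation with other participants.
WORKSHEET = [
    # (local element, MCL element or None, shared?, English definition)
    ("AccNo", "strainNumber", True, "number of the culture in this collection"),
    ("Species", "speciesName", True, "current name of the organism"),
    ("IsolDate", "isolationDate", True, "date the culture was isolated"),
    ("PigmentColour", None, True, "colour of pigment produced in culture"),
    ("FreezerSlot", None, False, "storage location of the master stock"),
]


def split_for_sharing(rows):
    """Return (shared elements mapped to MCL, shared but unmapped, private)."""
    mapped = {local: mcl for local, mcl, shared, _ in rows if shared and mcl}
    unmapped = [local for local, mcl, shared, _ in rows if shared and not mcl]
    private = [local for local, _, shared, _ in rows if not shared]
    return mapped, unmapped, private


mapped, unmapped, private = split_for_sharing(WORKSHEET)
print(mapped)    # {'AccNo': 'strainNumber', 'Species': 'speciesName', 'IsolDate': 'isolationDate'}
print(unmapped)  # ['PigmentColour']
print(private)   # ['FreezerSlot']
```

The shared but unmapped elements are the ones whose English definitions matter most, since they are the candidates for additional common elements.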
Pilot assessment
• Coverage?
• What additional common elements exist amongst the sets to be shared?
How much orange overlaps purple?
How much purple overlaps purple?
• Other assessment criteria?
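Two of the assessment questions could be given rough numbers. The sketch assumes "coverage" means the fraction of a chosen list of MCL elements that some local element maps onto, and that "additional common elements" means elements (by reconciled name) that every participant shares beyond MCL. Both definitions, and all the figures, are hypothetical.

```python
def coverage(mcl_elements, mapping):
    """Fraction of the given MCL elements that at least one local element maps onto.

    mcl_elements: non-empty collection of MCL element names.
    mapping: dict of local element name -> MCL element name.
    """
    targets = set(mcl_elements)
    covered = targets & set(mapping.values())
    return len(covered) / len(targets)


def common_extras(*shared_sets):
    """Elements (by reconciled name) that every participant shares beyond MCL.

    Requires at least one participant's set.
    """
    return set.intersection(*(set(s) for s in shared_sets))


# Hypothetical pilot figures: a subset of mcl:Sample elements and one local mapping.
sample_elements = ["sampleDate", "sampleLocationCountry", "sampleLat", "sampleLong"]
local_map = {"CollDate": "sampleDate", "Country": "sampleLocationCountry"}
print(coverage(sample_elements, local_map))  # prints: 0.5
print(common_extras({"pigmentColour", "odour"}, {"pigmentColour"}))  # prints: {'pigmentColour'}
```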
Pulling the pieces together
Please consider the foregoing proposal.
Does it seem reasonable to you?
Do you think there’s a better way?