Gene Expression Data Annotation – an application of the cell type ontology

Helen Parkinson, PhD

19 May 2010


EBI Core Databases

... and ~40 others

Use Cases

• Query support and expansion

• Data visualization and exploration

• Summary level data presentation

• Data integration via ontology terms

• Meta-analysis – human and mouse

• Semantic distance queries across experiments

• Cross products between cell lines, tissues, cell types, diseases, ...

• Users – curators, biologists, engineers

• Intelligent template generation for different experiment types in submission or data presentation

• Detection of annotation inconsistency

• Annotator support, term suggestion

• Text mining of GEO data at acquisition/submission, and post hoc

• Literature text mining
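The first use case, query support and expansion, can be shown with a minimal sketch. It assumes the ontology has already been reduced to a child-to-parent `is_a` map. A query for one term is expanded to that term plus all of its subclasses, and only then matched against sample annotations. The term labels, sample IDs and the helper names `expand_query` and `matching_samples` are hypothetical. A real deployment would load the hierarchy from EFO or the Cell Type Ontology instead of hard-coding it.

```python
from collections import deque

# Hypothetical toy hierarchy (child -> parents); labels are illustrative only.
IS_A = {
    "leukocyte": ["hematopoietic cell"],
    "T cell": ["leukocyte"],
    "B cell": ["leukocyte"],
    "erythrocyte": ["hematopoietic cell"],
    "hepatocyte": ["epithelial cell"],
}

# Hypothetical sample annotations: sample ID -> annotated cell type.
SAMPLES = {"S1": "T cell", "S2": "hepatocyte", "S3": "B cell", "S4": "erythrocyte"}


def children_index(is_a):
    """Invert the child -> parents map into parent -> children."""
    index = {}
    for child, parents in is_a.items():
        for parent in parents:
            index.setdefault(parent, set()).add(child)
    return index


def expand_query(term, is_a):
    """Return the query term plus all of its transitive subclasses (breadth-first)."""
    index = children_index(is_a)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in index.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def matching_samples(term, annotations, is_a):
    """Sample IDs whose annotated cell type is the query term or one of its subclasses."""
    terms = expand_query(term, is_a)
    return sorted(s for s, cell_type in annotations.items() if cell_type in terms)
```

For example, `matching_samples("leukocyte", SAMPLES, IS_A)` returns `["S1", "S3"]`. The T cell and B cell samples fall under leukocyte, while the hepatocyte and erythrocyte samples do not.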

Integration challenges

• 1,000,000 sample annotations in ArrayExpress (Aug 2009)

• Seq DBs, tissues, metagenomics, reactions, etc.

• Cross-database integration issues: EGA/AE/ERA, etc.

• Name–value pairs ('Disease' = 'cancer'), semi-controlled text, papers

• Algorithms, software, methods

• Parameter annotation, e.g. Virtual Physiological Human

• Complex phenotypes, clinical information

• Embedded literature: PubMed abstracts, full-text papers, supplemental information

• Most of the data relate to cell lines, tissues, disease samples, clinical information and phenotypes

• Millions of records, legacy data since ~1985

www.ebi.ac.uk/efo

Diagram: EBI data resources. Panel labels: Phenotypes; Molecules; Molecular databases; Archives of supporting data; Molecular Atlases; EBI Sample Database (European Sample database).

• Genomes, genes: ENSEMBL

• Proteins: UniProt

• Chemicals: ChEBI

• Pathways: Reactome

• European Nucleotide Archive

• Proteomics measurements: PRIDE

• Metabolomics experiments: a new database

• Transcript measurements: ArrayExpress DBs

Archive data – Lucene full-text searching plus ontology searching

Nikolay Kolesnikov, Anna Zhukova

Atlas querying: all genes under- or over-expressed in each cell type, per species, where cell type is annotated as a variable
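The Atlas query above filters differential-expression calls by species and then groups them by cell type and direction. Here is a minimal sketch, assuming the calls are already available as simple records. The `Result` record, its field names and the `GENE_*` symbols are hypothetical placeholders. They are not the Atlas data model.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    """Hypothetical differential-expression call (not the Atlas schema)."""
    gene: str
    species: str
    cell_type: str  # experimental variable the sample was annotated with
    direction: str  # "up" or "down"


def genes_by_cell_type(results, species):
    """Group up- and down-regulated genes by cell type for one species."""
    table = defaultdict(lambda: {"up": set(), "down": set()})
    for r in results:
        if r.species != species:
            continue
        if r.direction not in ("up", "down"):
            raise ValueError(f"unexpected direction: {r.direction!r}")
        table[r.cell_type][r.direction].add(r.gene)
    return dict(table)


# Placeholder calls; gene symbols are invented for illustration.
RESULTS = [
    Result("GENE_A", "Homo sapiens", "T cell", "up"),
    Result("GENE_B", "Homo sapiens", "T cell", "down"),
    Result("GENE_C", "Homo sapiens", "hepatocyte", "up"),
    Result("GENE_A", "Mus musculus", "T cell", "down"),
]
```

With this data, `genes_by_cell_type(RESULTS, "Homo sapiens")["T cell"]` gives `{"up": {"GENE_A"}, "down": {"GENE_B"}}`. The mouse call is excluded because it belongs to a different species.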

EFO Vital Statistics

• May 2010: release 2.3 (23 monthly releases), 2888 classes (832 with no xrefs)

• Built in Protégé in OWL (DL), converted to OBO

• Available via OLS, BioPortal and www.ebi.ac.uk/EFO

• Focus on diseases, cell types, cell lines, 'mammalian anatomy', plant terms, compounds, experimental processes and hardware

• OWL tools available: ontology differ

• Mapped to 24 semantic resources:
  • Malaria Ontology (MALIDO) ver0.2b
  • Mammalian phenotype (MP) ver1.309
  • Medical Subject Headings (MSH) ver2009_2009_02_13
  • International Classification of Diseases (ICD-9) ver9
  • Phenotypic quality (PATO) ver1.188
  • CRISP Thesaurus version 2.5.2.0
  • Mosquito gross anatomy (TGMA) ver1.10
  • Human disease (DOID) ver1.88
  • Chemical entities of biological interest (CHEBI) ver1.59
  • Drosophila gross anatomy (FBbt) ver1.30
  • Foundational Model of Anatomy (FMA) ver3.0
  • The Arabidopsis Information Resource (TAIR) (various dates)
  • The Jackson Lab mouse database
  • SNOMED Clinical Terms (SNOMEDCT) ver2009_01_31
  • Ontology for Biomedical Investigations (OBI) ver2009-11-06 Philly
  • Units of measurement (UO) ver1.21
  • Microarray experimental conditions (MO) ver1.3.1.1
  • Plant structure (PO)
  • Minimal anatomical terminology (MAT) ver1.1
  • NIFSTD (nif) ver1.4
  • NCI Thesaurus (NCIt) ver09.07
  • Cell type (CL) ver1.40
  • Zebrafish anatomy and development (ZFA) ver1.23
  • BRENDA tissue / enzyme source (BTO) ver1.3

• Plus Relations ontology 1.2 and BFO
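A statistic such as "2888 classes (832 with no xrefs)" could in principle be recomputed from the OBO export. Here is a minimal sketch, assuming an OBO 1.2 flat file in which each class is a `[Term]` stanza and cross-references use the `xref:` tag. This is not the EFO team's tooling. Older tags such as `xref_analog` are deliberately not counted, and the `EX:` identifiers in the sample are invented.

```python
def xref_stats(obo_text):
    """Return (number of [Term] stanzas, number of those with no xref: line)."""
    terms = []      # one flag per [Term] stanza: True if it has at least one xref
    current = None  # None while outside a [Term] stanza (header, [Typedef], ...)
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza header closes the previous [Term] stanza, if any.
            if current is not None:
                terms.append(current)
            current = False if line == "[Term]" else None
        elif current is not None and line.startswith("xref:"):
            current = True
    if current is not None:  # close a [Term] stanza that ends the file
        terms.append(current)
    return len(terms), terms.count(False)


# Hypothetical OBO fragment: two terms (one with an xref) and a Typedef.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: example cell type
xref: EX_SOURCE:0000042

[Term]
id: EX:0000002
name: example disease

[Typedef]
id: part_of
name: part of
xref: EX_REL:0000050
"""
```

Here `xref_stats(SAMPLE_OBO)` returns `(2, 1)`. The `xref:` line under `[Typedef]` is ignored, because only class stanzas are counted.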

Building the Experimental Factor Ontology

• Position of EFO in the 'bigger picture'

• Key is orthogonal coverage, reuse of existing resources and shared frameworks

Diagram: EFO in relation to the Disease Ontology, Anatomy Reference Ontology, Cell Type Ontology, Chemical Entities of Biological Interest (ChEBI), various species anatomy ontologies and the Relation Ontology

Text mining

Deploying EFO

• Text mining at data acquisition

• Ontology-driven queries

• Data mining

• Data-driven ontology development

• Term requests for source ontologies
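Text mining at data acquisition often starts by mapping free-text values from name–value pairs (e.g. 'Disease' = 'cancer') onto ontology terms. Here is a minimal sketch, assuming matching is done only against a term's label and its exact synonyms after light normalisation. This is in line with the desiderata's preference for exact synonyms. The term IDs, synonym lists and helper names are hypothetical. A real pipeline would use a richer matcher.

```python
import re

# Hypothetical mini-ontology: term ID -> (label, exact synonyms).
TERMS = {
    "EX:0001": ("T cell", ["T lymphocyte", "T-cell"]),
    "EX:0002": ("hepatocyte", ["liver cell"]),
    "EX:0003": ("cancer", ["malignant neoplasm"]),
}


def normalise(text):
    """Lower-case, turn runs of whitespace, hyphens or underscores into one space."""
    return re.sub(r"[\s_\-]+", " ", text.strip().lower())


def build_index(terms):
    """Map each normalised label/exact synonym to the set of term IDs using it."""
    index = {}
    for term_id, (label, synonyms) in terms.items():
        for name in [label, *synonyms]:
            index.setdefault(normalise(name), set()).add(term_id)
    return index


def annotate(value, index):
    """Return the single matching term ID, or None if unmatched or ambiguous."""
    hits = index.get(normalise(value), set())
    return next(iter(hits)) if len(hits) == 1 else None
```

A value such as `"  t_Cell "` maps to `EX:0001`. A value such as `"liver"` returns `None`, because only the full synonym "liver cell" is indexed. Unmatched values like this are the natural candidates for curator review or for term requests to the source ontologies.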

Diagram: AE/GEO acquire 310,000 assays → Experiment Archive → re-annotate, summarize, add semantics → ATLAS (Gene Expression Atlas)

Sample annotations: ~1,000,000 in the Archive, ~10,000 in the Atlas

Desiderata for the Cell Type Ontology

• Release with hematopoietic cell types ASAP

• Mass deprecation release ASAP

• All leaf nodes defined textually and logically

• Cross products – anatomy, GO process

• Cell line x cell types

• More orthogonality – CTO as a definitive source

• MIREOT for appropriate terms

• EFO will import CTO namespaces (when?)

• Synonyms – non-exact synonyms are bad

VBO – Vertebrate bridging ontology

• Collaboration between ArrayExpress, MRC Harwell, Cambridge Anatomy and Genetics

• Scope: mouse, human, rat, teleosts

• FMA view creation – 'mammalian view' of anatomy

• Mapping to existing ontologies – single species, Uberon

• Modelling using the 'homologous to' relationship

• Skeletal focus, adult stages

• Evidence for homology – literature, experts, phylogeny

• Workshop June 2010

Production People

Tomasz Adamusiak, Tony Burdett, Emma Hastings, Anna Farne,

Ele Holloway, Margus Lukk, James Malone, Natalja Kurbatova,

Helen Parkinson, Morris Swertz, Raven Travillian, Eleanor Williams

Misha Kapushesky

Alvis Brazma

Ugis Sarkans

Gen2Phen Visitor