UM/UT Microarray Short Course, May 4, 2006

Functional Gene Clustering by Latent Semantic Indexing of MEDLINE Abstracts

Ramin Homayouni, Ph.D.
Department of Neurology, University of Tennessee Health Science Center
Center for Neurobiology of Brain Diseases


Gene Expression Profiling

Alizadeh et al. (2000) Nature 403:503.

Now What?

Some Web Resources

NCBI Sites
• OMIM: http://www.ncbi.nlm.nih.gov/Literature/index.html
• LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/
• PubMed: http://www.ncbi.nlm.nih.gov/entrez/

Others
• HAPI: http://array.ucsd.edu/hapi/
• GenMAPP: http://www.genmapp.org/
• GO Tree Machine: http://genereg.ornl.gov/gotm/
• PubGene: http://www.pubgene.org
• Arrowsmith: http://arrowsmith.psych.uic.edu/
• Chilibot: http://www.chilibot.net/
• iHOP: http://www.ihop-net.org/

Defining Functional Relationships between Genes

Direct Relationship

• Gene relationships already known (e.g., A-B or B-C)
• Term co-occurrence
• Gene symbol: PubGene (Jenssen et al., Nature Genetics 2001, 28:21)
• Gene names (synonyms and aliases) – biochemical

Indirect Relationship

• Gene relationships unknown (e.g., A-C)

[Figure: diagram of genes A, B, and C]
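The direct/indirect distinction can be illustrated with a minimal sketch, assuming gene mentions have already been extracted from each abstract. The toy corpus and the function names `cooccurrence_counts` and `indirect_pairs` are hypothetical and do not reproduce PubGene's implementation. Direct links are gene pairs that co-occur in at least one abstract; indirect A-C candidates are pairs that share a linked intermediate B but never co-occur themselves.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstract_genes):
    """Count how many abstracts mention each unordered gene pair (direct links)."""
    counts = Counter()
    for genes in abstract_genes:
        # sorted() gives each pair a canonical order, so (A, B) and (B, A) match
        for pair in combinations(sorted(set(genes)), 2):
            counts[pair] += 1
    return counts


def indirect_pairs(direct):
    """Return pairs (A, C) that share a partner B but never co-occur directly."""
    neighbors = {}
    for a, b in direct:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    found = {}
    for a, c in combinations(sorted(neighbors), 2):
        if (a, c) in direct:
            continue  # already a direct relationship
        shared = neighbors[a] & neighbors[c]
        if shared:
            found[(a, c)] = sorted(shared)
    return found


# Hypothetical toy corpus: each entry is the set of genes mentioned in one abstract.
abstracts = [{"A", "B"}, {"A", "B"}, {"B", "C"}]
direct = cooccurrence_counts(abstracts)
print(dict(direct))            # {('A', 'B'): 2, ('B', 'C'): 1}
print(indirect_pairs(direct))  # {('A', 'C'): ['B']}
```

Co-occurrence alone only recovers the known A-B and B-C links; the A-C link surfaces only through the shared partner B, which is the gap that motivates a semantic approach.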

Reelin Signaling Pathway

[Figure: Reelin signaling pathway diagram showing Reelin, VLDLR, ApoER2, ApoE, Dab1, Fyn, p35/Cdk5, APP, pTau, and amyloid plaques]


Gene Document Test Set

• Miscellaneous: Trp53, Fos, Nras, Rasa1, Rab1, Src, Notch1, Dll1, Jag1, Robo1, Ptch, Smo
• Reeler: Reln, Dab1, VLDLR, Lrp8
• Alzheimer Disease: APP, Aplp2, Aplp1, Psen1, Psen2, Lrp1, Mapt, Apoe, A2m, Apbb1, Apba1, Cdk5, Cdk5r, Cdk5r2

PubGene Query: Dab1 (http://www.pubgene.org/)

Mouse (co-occurrence counts): Reln 7, Cdk5r 6, Cdk5 5, Gli2 3, Src 3, Dab2 2, Fyn 2, Sam68 1, Cdkn1a 1, Tbr1 1, Gli 1, Scr 1, Shh 1, cdf 1, Ash 1, Dlgh4 1, p80 1, Lck 1, Emx1 1, Pcdh18 1, Agrn 1, Arg2 1

Human (co-occurrence counts): DAB2 3, GAD1 3, RELN 3, GSN 2, TNFSF5 2, HLA-DQA1 1, BAT2 1, GAD2 1

PubMed Query: Dab1 AND Reln = 10
PubMed Query: Dab1 AND reelin = 57!
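The gap between the two PubMed counts comes from searching on the official symbol alone versus searching on the gene name. A minimal sketch of expanding each gene into an OR-group of its symbol and names before joining genes with AND; the helpers `or_group` and `build_query` are hypothetical, and only the strings Dab1, Reln and reelin come from the slide.

```python
def or_group(terms):
    """Join the synonyms of one gene into a parenthesized OR clause."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*genes):
    """AND together one OR-group of synonyms per gene (PubMed Boolean syntax)."""
    return " AND ".join(or_group(terms) for terms in genes)


symbol_only = build_query(["Dab1"], ["Reln"])
expanded = build_query(["Dab1"], ["Reln", "reelin"])
print(symbol_only)  # (Dab1) AND (Reln)
print(expanded)     # (Dab1) AND (Reln OR reelin)
```

The expanded string matches any abstract that mentions Dab1 together with either form of the Reelin name, rather than only the symbol form.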

iHOP Query: Dab1 (http://www.ihop-net.org/)

iHOP Query: Dab1; Sentence Structure (http://www.ihop-net.org/)

iHOP Query: Dab1; Network Building (http://www.ihop-net.org/)

Vector Space Model: Latent Semantic Indexing

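[Figure: query and gene-document vectors in a word space with axes w1, w2, w3]

Word-by-gene-document matrix: rows are words W1 ... Wx, columns are gene documents G1 G2 ... Gx, and the query is a vector over the same words. Each entry is

$a_{ij} = l_{ij}\, g_i$

where $l_{ij}$ is the local weight of word i in gene document j and $g_i$ is the global weight of word i.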
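A minimal numpy sketch of the LSI steps named on this slide, assuming a small word-by-gene-document matrix that has already been weighted ($a_{ij} = l_{ij} g_i$). The matrix is factored with a truncated SVD, a query is folded into the k-dimensional space, and gene documents are ranked by cosine similarity to it. The toy matrix, the choice k = 2, and the function names are illustrative assumptions, not the settings used by SGO.

```python
import numpy as np


def lsi(A, k):
    """Truncated SVD of the word-by-document matrix A (words x gene documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # Uk, sk, Vk (gene documents are rows of Vk)


def fold_in(q, Uk, sk):
    """Project a query over the same words into the k-dim space: inv(Sk) Uk^T q."""
    return (Uk.T @ q) / sk


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical weighted matrix: rows are words W1..W4, columns gene documents G1..G3.
A = np.array([
    [2.0, 1.0, 0.0],  # W1
    [1.0, 2.0, 0.0],  # W2
    [0.0, 0.0, 3.0],  # W3
    [0.0, 1.0, 1.0],  # W4
])
Uk, sk, Vk = lsi(A, k=2)
q = np.array([1.0, 0.0, 0.0, 0.0])  # query containing only word W1
q_hat = fold_in(q, Uk, sk)
scores = [cosine(q_hat, Vk[j]) for j in range(A.shape[1])]
ranking = np.argsort(scores)[::-1]
print("genes ranked by similarity:", [f"G{j + 1}" for j in ranking])
# genes ranked by similarity: ['G1', 'G2', 'G3']
```

Truncating to k dimensions is what lets two gene documents score as similar even when they share few exact words, because both load on the same latent dimensions.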

Semantic Gene Organizer© User Interface

Reelin Accession # Query

Reelin Keyword Query

50-Gene Document Collection

[Figure: Venn diagram of the 50 genes across the Development, Cancer, and Alzheimer categories; region counts 15, 11, 5, 16, and 3]

Hierarchical Tree

[Figure: hierarchical tree of the gene documents; branch groups labeled Development, Cancer, Alzheimer, and Development]
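A minimal sketch of how a tree like this can be built: average-linkage agglomerative clustering on cosine distances between gene-document vectors (for example, rows of the reduced LSI matrix). Average linkage and cosine distance are assumptions for illustration, since the slide does not state which linkage was used; the 2-D vectors are hypothetical, and the gene labels are simply borrowed from the test set.

```python
import numpy as np


def cosine_distance_matrix(X):
    """Pairwise 1 - cosine similarity between the rows of X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def average_linkage(X, labels):
    """Repeatedly merge the two closest clusters; return the merge history."""
    D = cosine_distance_matrix(X)
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # average linkage: mean distance over all cross-cluster member pairs
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        members = clusters.pop(a) + clusters.pop(b)
        merges.append(([labels[i] for i in members], round(float(d), 3)))
        clusters[next_id] = members
        next_id += 1
    return merges


# Hypothetical vectors: two Reeler-group genes and two Alzheimer-group genes.
labels = ["Reln", "Dab1", "Psen1", "APP"]
X = np.array([[1.0, 0.1], [0.9, 0.3], [0.1, 1.0], [0.2, 0.9]])
merges = average_linkage(X, labels)
for members, dist in merges:
    print(dist, members)
```

Each merge corresponds to one internal node of the tree, and the merge distance gives its height.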

Unrooted Tree (Graph)

Variation in Abstract Representation

Reduce Noise

Abstract References in LocusLink

Gene symbols and names that are not used in the literature

Increase Representation

Alternate Names and Aliases

Log-entropy Term Weighting

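Word-by-gene-document matrix: rows are words W1 ... Wx, columns are gene documents G1 G2 ... Gx, plus a query column. Each entry is

$a_{ij} = l_{ij}\, g_i$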
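The slide gives only the product form $a_{ij} = l_{ij} g_i$. A common log-entropy choice in the LSI literature, assumed here rather than taken from the slide, uses the local weight $l_{ij} = \log_2(1 + f_{ij})$ and the global entropy weight $g_i = 1 + \sum_j p_{ij}\log_2 p_{ij} / \log_2 n$, where $f_{ij}$ is the raw count of word i in gene document j, $p_{ij} = f_{ij}/\sum_j f_{ij}$, and n is the number of gene documents. A word spread evenly over every document gets $g_i = 0$; a word confined to one document gets $g_i = 1$. The sketch also lists the top-weighted words of one document, in the spirit of the "Top Terms" slide; the vocabulary and counts are hypothetical.

```python
import numpy as np


def log_entropy(F):
    """Log-entropy weighting of a raw word-by-document count matrix F (words x docs)."""
    F = np.asarray(F, dtype=float)
    n_docs = F.shape[1]
    local = np.log2(1.0 + F)
    row_totals = F.sum(axis=1, keepdims=True)
    # p_ij = f_ij / sum_j f_ij; words that never occur get p = 0
    P = np.divide(F, row_totals, out=np.zeros_like(F), where=row_totals > 0)
    # substituting 1 where p = 0 makes log2 return 0, so 0*log2(0) terms vanish
    plogp = P * np.log2(np.where(P > 0, P, 1.0))
    global_w = 1.0 + plogp.sum(axis=1) / np.log2(n_docs)
    return local * global_w[:, None]


def top_terms(A, vocab, doc, k=3):
    """Return the k highest-weighted words in column `doc` of the weighted matrix A."""
    order = np.argsort(A[:, doc])[::-1][:k]
    return [(vocab[i], round(float(A[i, doc]), 4)) for i in order]


vocab = ["reelin", "reeler", "protein", "mice"]
F = np.array([
    [6, 0, 0],  # confined to one document -> global weight 1
    [3, 0, 1],
    [2, 2, 2],  # spread evenly over all documents -> global weight 0
    [1, 1, 0],
])
A = log_entropy(F)
print(top_terms(A, vocab, doc=0))  # reelin, reeler, mice in that order
```

The evenly spread word drops to zero weight even though it is frequent, which is why generic words fall out of a gene's top-term list while distinctive ones such as a gene name rise to the top.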

Top Terms in Gene Document

reelin (4.0323), reeler (3.7762), positioning (1.9135), lissencephaly (1.8491), schizophrenia (1.7113), apoer2 (1.5637), cr (1.5544), esophageal (1.5339), dab1 (1.5118), vldlr (1.4973), carcinoma (1.4881), wild-type (1.4862), cask (1.4288), psychiatric (1.4266), apoe (1.3739), positioned (1.3726)

Abstract retrieval by combining weighted terms in gene name, symbol, or aliases

Query    Description                                         # abstracts
symbol   Cdk5r2                                              0
alias    p39                                                 70
name     cyclin-dependent kinase 5, regulatory subunit 2     0
c1       p39 AND cdk5                                        18
c2       p39 AND cyclin-dependent                            17
c3       p39 AND kinase                                      24
c4       p39 AND cdk5 AND cyclin-dependent                   17
c5       p39 AND cdk5 AND cyclin-dependent AND kinase        17

[Figure: Venn diagram comparing the abstracts retrieved by the alias query and by combinations c1 and c3]
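Combinations c1 to c5 pair the alias with terms from the gene name. A minimal sketch of generating such Boolean combinations, assuming the name terms have already been ranked by their log-entropy weights; the numeric weights and the helper `combination_queries` are hypothetical, while the terms p39, cdk5, cyclin-dependent and kinase come from the table. Counting the abstracts each query returns would require submitting the strings to PubMed, which is not shown.

```python
from itertools import combinations


def combination_queries(alias, weighted_terms, max_terms=3):
    """Build 'alias AND term ...' queries from name terms, highest weight first."""
    ranked = [t for t, _ in sorted(weighted_terms.items(), key=lambda kv: -kv[1])]
    queries = []
    for size in range(1, max_terms + 1):
        for combo in combinations(ranked, size):
            queries.append(" AND ".join((alias,) + combo))
    return queries


# Hypothetical weights for terms associated with Cdk5r2.
weights = {"cdk5": 2.1, "cyclin-dependent": 1.4, "kinase": 0.6}
queries = combination_queries("p39", weights)
for q in queries:
    print(q)
```

With three terms this yields seven queries, a superset of c1 to c5; adding more AND terms narrows the result set, which is the lever for trimming over-represented genes.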

Weighted PubMed Queries

[Figure: weighted PubMed query results for under-represented genes (Cdk5r2, Lrp8, Atoh1, Cdk5r) and over-represented genes (kit, egfr, fos, myc)]

Weighted Query Algorithm

• Gene symbol, gene name, gene aliases
• Combination of highest-weighted terms
• Extract overlapping abstracts

RESULTS: 2- to 59-fold increase in the number of abstracts associated with genes compared with those referenced in LocusLink (LL)
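The last two steps can be sketched with sets of PubMed IDs. This sketch assumes that "overlapping" means abstracts returned by at least `min_hits` of the combination queries, which is an interpretation the slide does not spell out, and it computes the fold increase relative to the abstracts cited in LocusLink. The helper names, the threshold, and the small-integer PMIDs are all hypothetical placeholders.

```python
from collections import Counter


def overlapping_abstracts(query_hits, min_hits=2):
    """Keep PMIDs retrieved by at least `min_hits` of the combination queries."""
    counts = Counter(pmid for hits in query_hits for pmid in set(hits))
    return {pmid for pmid, c in counts.items() if c >= min_hits}


def fold_increase(assigned, locuslink_refs):
    """Ratio of abstracts assigned by weighted queries to those cited in LocusLink."""
    if not locuslink_refs:
        raise ValueError("no LocusLink references to compare against")
    return len(assigned) / len(locuslink_refs)


# Hypothetical PMID sets returned by three combination queries.
hits = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]
assigned = overlapping_abstracts(hits)
print(sorted(assigned))              # [2, 3, 4]
print(fold_increase(assigned, {3}))  # 3.0
```

Requiring agreement between several queries trades recall for specificity; raising `min_hits` shrinks the assigned set.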

Summary and Conclusions

Log-entropy weighting identifies descriptive or ‘useful’ aliases for genes.

Weighted PubMed querying increases the number of abstracts for under-represented genes and decreases it for over-represented genes, with high specificity.

This automated method improves gene-abstract assignment 2- to 59-fold beyond the assignments made by LocusLink indexers.


SGO overview

[Figure: SGO overview flowchart contrasting (vs.) two sources of gene documents: PMID citations in LocusLink, and PubMed abstracts retrieved with search-term refinement. Each path builds gene documents, a word × gene-document matrix, word weights, gene descriptors, pairwise scores, and clustering.]


Acknowledgments

UT Memphis

Neurology

Lijing Xu, M.S.

Lai Wei, M.D.

Molecular Sciences

Yan Cui, Ph.D.

Mi Zhou, M.S.

UT Knoxville

Computer Science

Michael Berry, Ph.D.

Kevin Heinrich

Center for Neurobiology of Brain Diseases