Post on 11-Dec-2015

1

Annotation for Gene Expression Analysis with Reactome.db Package

Utah State University – Spring 2012

STAT 6570: Statistical Bioinformatics

Cody Tramp

2

References

Ligtenberg W. 2011. Reactome.db: How to use the reactome.db package.

www.reactome.org

3

Reactome.db Overview

“Open source, open access, manually curated, and peer-reviewed pathway database” – www.reactome.org

Reactome.db is an R interface that allows queries to the SQL database containing pathway information

Contains functions for converting between annotation IDs and names for GO, Entrez, and Reactome

4

Getting Help on Specific Reactome.db Functions

#Load the reactome.db package
library(reactome.db)

#Check for main manual pages
?reactome.db #This won't get the actual manual

#List all reactome.db objects
ls("package:reactome.db")

# [1] "reactome"              "reactome_dbconn"     "reactome_dbfile"
# [4] "reactome_dbInfo"       "reactome_dbschema"   "reactomeEXTID2PATHID"
# [7] "reactomeGO2REACTOMEID" "reactomeMAPCOUNTS"   "reactomePATHID2EXTID"
#[10] "reactomePATHID2NAME"   "reactomePATHNAME2ID" "reactomeREACTOMEID2GO"

#Look up specific manual for an object
?reactome_dbInfo #Still not very useful – poor documentation

5

How IDs and names are stored in Reactome.db

The reactome.db package links to a SQL database

Functions are interfaces to the database

SQL databases are relational databases (think of Excel spreadsheets, but better)

Data is stored as key:value pairs

Key     Value
15869   Homo sapiens: Metabolism of nucleotides
68616   Homo sapiens: Assembly of the ORC complex at the origin of replication
68689   Homo sapiens: CDC6 association with the ORC:origin complex
68827   Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
68867   Homo sapiens: Assembly of the pre-replicative complex
68874   Homo sapiens: M/G1 Transition
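The key:value idea can be sketched in base R without the database. This is only an illustration under stated assumptions: the two rows are copied from the slides, and `path.table`, `id2name`, and `name2id` are hypothetical names, not objects provided by reactome.db.

```r
# Toy stand-in for the ID-to-name table (two rows copied from the slides).
path.table <- data.frame(
  reactome_id = c("15869", "68616"),
  path_name   = c("Homo sapiens: Metabolism of nucleotides",
                  "Homo sapiens: Assembly of the ORC complex at the origin of replication"),
  stringsAsFactors = FALSE
)

# Key:value view: a character vector of values, named by its keys.
id2name <- setNames(path.table$path_name, path.table$reactome_id)

# Look up one value by its key.
id2name[["68616"]]

# Reverse the mapping (value -> key), analogous in spirit to a NAME2ID object.
name2id <- setNames(names(id2name), id2name)
name2id[["Homo sapiens: Metabolism of nucleotides"]]
```

Each mapping runs in one direction only, which is why the package ships paired objects such as reactomePATHID2NAME and reactomePATHNAME2ID.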

6

Reactome.db Function Uses (NOTE: all return a key:value list)

Converting Between Entrez and Reactome

reactomeEXTID2PATHID = Entrez ID to Reactome.db ID

reactomePATHID2EXTID = Reactome.db ID to Entrez ID

> xx <- toTable(reactomeEXTID2PATHID)
> head(xx)
  reactome_id gene_id
1      168253   10898
2      168254   10898
3      168253    8106
4      168254    8106
5      168253    5610
6      168254    5610

Use toTable() rather than the as.list() shown in the manuals; it returns a data.frame that is easy to subset
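To show what the table form offers, here is a minimal sketch on a toy copy of the six rows printed above, so it runs without reactome.db installed. The names `gene2path` and `path2gene` are hypothetical, and `split()` only approximates the shape of an as.list() result.

```r
# Toy copy of the head(xx) output (toTable-style long table).
xx <- data.frame(
  reactome_id = c("168253", "168254", "168253", "168254", "168253", "168254"),
  gene_id     = c("10898", "10898", "8106", "8106", "5610", "5610"),
  stringsAsFactors = FALSE
)

# All pathways for one Entrez gene: a plain logical subset of the table.
xx$reactome_id[xx$gene_id == "8106"]

# List form: one element per gene, each holding that gene's pathway IDs.
gene2path <- split(xx$reactome_id, xx$gene_id)
gene2path[["5610"]]

# Working from the table, the reverse direction needs no second object.
path2gene <- split(xx$gene_id, xx$reactome_id)
path2gene[["168253"]]
```

The long table can be filtered, merged, or regrouped in either direction, whereas a list is fixed to the direction of the object it came from.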

7

Reactome.db Function Uses (NOTE: all return a key:value list)

Converting Between GO ID and Reactome ID

reactomeREACTOMEID2GO = Reactome.db ID to GO IDs

reactomeGO2REACTOMEID = GO ID to Reactome.db ID

> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id      go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003

8

Reactome.db Function Uses (NOTE: all return a key:value list)

Retrieving Pathway Names from Reactome IDs

reactomePATHNAME2ID = Reactome.db Name to Reactome.db ID

reactomePATHID2NAME = Reactome.db ID to Reactome.db Name

> xx <- toTable(reactomePATHID2NAME)
> head(xx)
  reactome_id path_name
1       15869 Homo sapiens: Metabolism of nucleotides
2       68616 Homo sapiens: Assembly of the ORC complex at the origin of replication
3       68689 Homo sapiens: CDC6 association with the ORC:origin complex
4       68827 Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
5       68867 Homo sapiens: Assembly of the pre-replicative complex
6       68874 Homo sapiens: M/G1 Transition
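Translating a whole vector of IDs into names is a common follow-up step. The sketch below does it with `match()` on a toy copy of the six rows above. `name.table`, `query.ids`, and the ID "99999" are made up for illustration, the last to show what happens when an ID is absent.

```r
# Toy copy of the head(xx) output for reactomePATHID2NAME.
name.table <- data.frame(
  reactome_id = c("15869", "68616", "68689", "68827", "68867", "68874"),
  path_name   = c("Homo sapiens: Metabolism of nucleotides",
                  "Homo sapiens: Assembly of the ORC complex at the origin of replication",
                  "Homo sapiens: CDC6 association with the ORC:origin complex",
                  "Homo sapiens: CDT1 association with the CDC6:ORC:origin complex",
                  "Homo sapiens: Assembly of the pre-replicative complex",
                  "Homo sapiens: M/G1 Transition"),
  stringsAsFactors = FALSE
)

# Hypothetical query; "99999" is a made-up ID that is not in the table.
query.ids <- c("68874", "15869", "99999")

# match() gives the table row of each query ID, or NA when the ID is absent.
rows <- match(query.ids, name.table$reactome_id)

# Result keeps the query order; unknown IDs get NA instead of being dropped.
result <- data.frame(reactome_id = query.ids,
                     path_name   = name.table$path_name[rows],
                     stringsAsFactors = FALSE)
result
```

Keeping the NA rows makes IDs with no pathway name visible rather than silently shortening the result.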

9

Reactome.db Function Uses (NOTE: all return a key:value list)

reactomeMAPCOUNTS = number of rows in the relational database table behind each mapping object (not very useful except for error checking)

> xx <- as.list(reactomeMAPCOUNTS)
> xx
$reactomeEXTID2PATHID
[1] 28363

$reactomeGO2REACTOMEID
[1] 3217

$reactomePATHID2EXTID
[1] 8320

$reactomePATHID2NAME
[1] 13778

$reactomePATHNAME2ID
[1] 13876

$reactomeREACTOMEID2GO
[1] 47575

10

Ex: Find apoptosis induction-related ID (compare to Notes 6.1 slide 10)

# Get data.frame summarizing all reactome.db pathways including a certain string
xx <- toTable(reactomePATHNAME2ID)
all.pathways <- xx$path_name # get name of each reactome.db pathway
t <- grep('apoptosis', all.pathways) # get index where pathway name includes 'apoptosis'
#use agrep() for approximate term searching

reactome.Term <- unlist(all.pathways[t])
reactome.IDs <- unlist(xx$reactome_id[t])

reactome.frame <- data.frame(reactome.ID=reactome.IDs, reactome.Term=reactome.Term)

rownames(reactome.frame) <- seq_along(reactome.IDs)
reactome.frame # 13 terms

11

Ex: Find apoptosis induction-related ID (compare to Notes 6.1 slide 10)

12

Ex. Pathway Term Search Function

##Define Function to search for pathways with given key word
##agrep.bool is indicator to use agrep (TRUE) or grep (FALSE)
searchPathways2REACTOMEID <- function(term, agrep.bool) {
  xx <- toTable(reactomePATHNAME2ID)
  all.pathways <- xx$path_name # get name of each reactome.db pathway
  #get index where Term is found
  if (agrep.bool) {
    t <- agrep(term, all.pathways)
  } else {
    t <- grep(term, all.pathways)
  }
  unlist(xx$reactome_id[t])
}

apop.IDs <- searchPathways2REACTOMEID("apoptosis", FALSE)
length(apop.IDs) #13 pathways matched

apop.IDs <- searchPathways2REACTOMEID("apoptosis", TRUE)
length(apop.IDs) #85 pathways matched
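The gap between 13 and 85 matches may come from how the two functions compare strings: grep() needs the pattern verbatim and is case-sensitive by default, while agrep() tolerates a small number of edits. A minimal sketch follows, using made-up pathway names rather than real reactome.db entries.

```r
# Hypothetical pathway names, made up for illustration only.
path.names <- c("Homo sapiens: Apoptosis",
                "Homo sapiens: Regulation of apoptosis",
                "Homo sapiens: Apoptotic execution phase",
                "Homo sapiens: Metabolism of nucleotides")

# Exact, case-sensitive match: only names containing "apoptosis" verbatim.
exact <- grep("apoptosis", path.names)

# Exact match that ignores case.
exact.ci <- grep("apoptosis", path.names, ignore.case = TRUE)

# Approximate match: tolerates a few edits (default max.distance = 0.1).
fuzzy <- agrep("apoptosis", path.names)

path.names[exact]
path.names[exact.ci]
path.names[fuzzy]
```

If capitalization is the main concern, `ignore.case = TRUE` is the narrower fix. agrep() also admits near-misses that are not case variants, so its hits are worth reviewing by eye.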

13

Getting GO Terms from single Reactome ID

##Get List of GO Terms from Reactome ID
xx <- toTable(reactomeGO2REACTOMEID)
t <- xx$reactome_id == "15869"
GOTerms <- xx$go_id[t]

> GOTerms
 [1] "GO:0055086" "GO:0006139" "GO:0044281"
 [4] "GO:0034641" "GO:0044238" "GO:0008152"
 [7] "GO:0006807" "GO:0044237" "GO:0008150"
[10] "GO:0009987"

> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id      go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003

14

Getting GO Terms from list of Reactome IDs

##Define Function to get all GO Terms for all Reactome IDs in a list
getGOTerms <- function(list_reactome) {
  listGO <- list()
  xx <- toTable(reactomeGO2REACTOMEID)
  for (i in seq_along(list_reactome)) { # seq_along() is safe for an empty input
    t <- xx$reactome_id == list_reactome[i]
    temp_list <- xx$go_id[t]
    listGO <- c(listGO, temp_list)
  }
  unlist(listGO)
}

GOTerms.all <- getGOTerms(apop.IDs) #From slide 10
length(GOTerms.all) #136 GO Terms from 13 apop.IDs

Should have yielded 169 terms (Notes 4.1 slide 10) – reactome.db might not be complete
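When comparing term counts, it helps to separate the number of (pathway, GO term) rows from the number of distinct GO terms; the slides do not say which of the two the 136 represents. The sketch below uses a toy table and a vectorized `%in%` lookup in place of the loop. `getGOTermsVec` and the pathway ID "hypo.1" are hypothetical; "hypo.1" was added only to create a GO term shared by two pathways.

```r
# Toy stand-in for toTable(reactomeGO2REACTOMEID).
# Rows for 168276 and 15869 reuse pairs shown in the slides;
# "hypo.1" is a made-up pathway ID that shares a GO term with 15869.
xx <- data.frame(
  reactome_id = c("168276", "168276", "15869", "15869", "hypo.1"),
  go_id       = c("GO:0019054", "GO:0019048", "GO:0055086",
                  "GO:0008152", "GO:0008152"),
  stringsAsFactors = FALSE
)

# Vectorized lookup: keep every row whose pathway ID is in the query set.
# Assumes the query IDs are unique; results come back in table order.
getGOTermsVec <- function(ids, go.table) {
  go.table$go_id[go.table$reactome_id %in% ids]
}

ids <- c("15869", "hypo.1")
go.all <- getGOTermsVec(ids, xx)

length(go.all)         # one entry per (pathway, GO term) row
length(unique(go.all)) # number of distinct GO terms
```

Reporting both counts makes a comparison against another annotation source easier to interpret, since a shared GO term is counted once per pathway in the first figure and only once in the second.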

15

Reactome.org Online Tools

16

Pathway Viewer on reactome.org

http://www.reactome.org/userguide/Usersguide.html#Introduction

17

Pathway Viewer on reactome.org Details Panel

18

Pathway Viewer on reactome.org

http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142

19

Reactome Pathway Symbols

Upregulation and participating proteins

Inhibition

http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142

20

Reactome Database Assignment Method

Genes seem to be assigned to pathways in a similar manner to the GO database

If a gene is up-regulated, it is included

Genes that are down-regulated in a condition are NOT mapped to the condition/pathway

Haven’t received official response from reactome.org, but from general browsing this seems to be the case

21

Pathway Analysis Tool

http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage

22

Pathway Analysis Tool

http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage

23

Expression Set Data Analysis

24

Expression Set Data Analysis

25

Summary

Reactome.db provides an interface to the SQL database containing IDs

Functions for converting between ID types

No functionality for gene testing through R

Online tools include pathway maps and ID lookup tables

Some limited expression testing (with unknown statistical methods)

26

Questions?