Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Page 1: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Working with gene lists: Finding data using GEO & BioMart

June 5, 2014

Page 2: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Analyzing a gene list

- With hundreds of genes but a limited budget and lab personnel, you need to narrow the gene list down to candidate genes for follow-up
- Pick ones that are “interesting”:
  - Known to be involved in other related processes but not (yet) in your process of interest
  - Has protein features which suggest a function in your process, but it has not been characterized
  - No known function or domain, but it shows up in other, related high-throughput experiments, suggesting a key role in your process of interest

Page 3: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Our approach

Analyzing gene lists by:

1. Finding overlap with other high-throughput experiments

2. Finding additional information using BioMart
   1. Mouse/human homologs
   2. Protein domain content
   3. GO classification
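The first step, finding overlap with another experiment, comes down to a set intersection once both gene lists use the same type of identifier. The sketch below is a minimal illustration, not the method used in class; apart from C04F5.7, which appears later in these slides, the gene names are hypothetical placeholders.

```python
def clean(ids):
    """Strip stray whitespace and drop empty entries; IDs are otherwise compared as-is."""
    return {gene_id.strip() for gene_id in ids if gene_id.strip()}


def overlap(my_genes, other_genes):
    """Return the identifiers present in both lists, sorted for stable output."""
    return sorted(clean(my_genes) & clean(other_genes))


# Hypothetical gene lists, for illustration only.
my_list = ["C04F5.7", "geneA", " geneB"]
published_list = ["geneB", "C04F5.7", "geneC"]

shared = overlap(my_list, published_list)
print(shared)  # ['C04F5.7', 'geneB']
```

If the two lists use different identifier types (for example probe IDs versus gene names), they need to be converted to a common type first, otherwise the overlap will be empty or misleading.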

Page 4: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

GEO (Gene Expression Omnibus)

- GEO Datasets
  - Curated gene expression datasets, i.e. there is a backlog of experiments that haven’t made it into the database
  - Can search for experiments and conduct differential gene expression queries on some datasets
  - Can download datasets & do offline analyses
- GEO Profiles
  - Profiles of expression data for genes

Page 5: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Why search GEO?

- What other experiments have been done that are similar to yours?
  - GEO datasets
- How do my genes of interest behave in other large-scale experiments?
  - GEO profiles

Page 6: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

GEO Profile search

- Search on a gene name (C04F5.7):

Page 7: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.
Page 8: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.
Page 9: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

GEO Dataset search

“C. elegans”: 4434

Page 10: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

GEO Dataset searches

Query                          Total datasets   C. elegans datasets
C. elegans                     4434             4072
C. elegans AND response        131              121
C. elegans AND host response   5                5
C. elegans AND immune          24               20
C. elegans AND antimicrobial   109              94
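Searches like these can also be scripted through NCBI's E-utilities rather than typed into the web form. The sketch below only builds the ESearch URL and does not contact NCBI. It assumes that the "C. elegans datasets" column corresponds to an organism restriction and that `retmax=0` is an acceptable way to ask for the hit count alone; both are assumptions, not something the slides state. The function names are hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(keywords, organism=None):
    """Join keywords with AND; optionally restrict hits to one organism."""
    term = " AND ".join(keywords)
    if organism:
        # Assumed way of expressing the organism restriction.
        term = f'({term}) AND "{organism}"[Organism]'
    return term


def geo_search_url(keywords, organism=None, db="gds"):
    """Build an ESearch URL; db="gds" is GEO DataSets, db="geoprofiles" is GEO Profiles."""
    params = {"db": db, "term": build_term(keywords, organism), "retmax": 0}
    return ESEARCH + "?" + urlencode(params)


term = build_term(['"C. elegans"', "immune"], organism="Caenorhabditis elegans")
url = geo_search_url(['"C. elegans"', "immune"], organism="Caenorhabditis elegans")
print(term)
print(url)
```

Passing `db="geoprofiles"` with a gene name such as C04F5.7 as the only keyword would give the scripted counterpart of the GEO Profile search. Counts returned today will differ from the 2014 numbers in the table.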

Page 11: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Once dataset identified

- Download data
  - SOFT format: tab-delimited data
  - Issues:
    - Not necessarily processed such that they have the ratios of experiment/control
    - If starting with raw data, you may not be able to replicate exactly what the authors did, or may lack the expertise/software to generate a list of DE (differentially expressed) genes
- Look for supplementary data from the publication
  - Usually they provide a list of all DE genes
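To make the "ratios of experiment/control" issue concrete, here is a minimal sketch of reading the data table from a GDS-style SOFT file and computing a log2 ratio per gene. The miniature file, sample names, and values are made up for illustration. The sketch assumes the values are untransformed intensities with no missing entries; real files may contain already log-transformed values or "null" cells, and a simple ratio of means is no substitute for a proper differential expression analysis.

```python
import math


def read_soft_table(text):
    """Return (header, rows) for the table between the dataset_table markers."""
    header, rows, in_table = None, [], False
    for line in text.splitlines():
        if line.startswith("!dataset_table_begin"):
            in_table = True
        elif line.startswith("!dataset_table_end"):
            break
        elif in_table:
            fields = line.split("\t")
            if header is None:
                header = fields  # first table line holds the column names
            else:
                rows.append(dict(zip(header, fields)))
    return header, rows


def log2_ratio(row, experiment, control):
    """log2 of mean experiment signal over mean control signal for one row."""
    exp_mean = sum(float(row[s]) for s in experiment) / len(experiment)
    ctl_mean = sum(float(row[s]) for s in control) / len(control)
    return math.log2(exp_mean / ctl_mean)


# Made-up miniature file; real files carry many more metadata lines.
soft = "\n".join([
    "^DATASET = GDS_EXAMPLE",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tGSM_CTL1\tGSM_CTL2\tGSM_EXP1\tGSM_EXP2",
    "probe_1\tgeneA\t100\t120\t400\t480",
    "probe_2\tgeneB\t200\t200\t190\t210",
    "!dataset_table_end",
])

header, rows = read_soft_table(soft)
ratios = {
    row["IDENTIFIER"]: log2_ratio(row, ["GSM_EXP1", "GSM_EXP2"], ["GSM_CTL1", "GSM_CTL2"])
    for row in rows
}
print(ratios)  # {'geneA': 2.0, 'geneB': 0.0}
```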

Page 12: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Choice of dataset for comparison

In class demo

Page 13: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

BioMart – EBI Ensembl

- Use a series of menus
  - Data source: organism (genes, variation, etc.)
  - Filters: reduce the number of results
  - Attributes: what data to return
- Can set up very precise and multilayered queries
- Can query across multiple organisms
- Simple query:
  - Given a list of gene IDs, you can obtain attributes or sequences for the entire list
- Tools
  - ID converter: very useful, easy to use

Page 14: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Two sites for BioMart access

www.biomart.org

Page 15: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Database journal issue on BioMart

Page 16: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.
Page 17: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Filtering in BioMart

Page 18: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Attributes in BioMart

Page 19: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

BioMart

- Filters
  - C. elegans genes with a human homolog
  - Specify only genes with >= # isoforms
  - Protein-coding genes with a transmembrane domain
- Attributes
  - Entrez Gene IDs, WormBase IDs, Affy IDs
  - Sequence data
    - transcript, protein, UTRs, flanking regions, etc.
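The same filter/attribute choices made in the menus can be written as a BioMart XML query and sent to a mart's `martservice` endpoint. The sketch below only builds the query and the request URL; it does not contact any server. The dataset name `celegans_gene_ensembl`, the filter name `with_hsapiens_homolog`, and the attribute names are assumptions based on the current Ensembl mart and may differ from what was available in 2014, so check them against the mart's own filter and attribute lists. `build_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def build_query(dataset, attributes, value_filters=None, boolean_filters=None):
    """Return a BioMart XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Value filters, e.g. a list of gene IDs, are passed as a comma-separated string.
    for name, values in (value_filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": ",".join(values)})
    # Boolean filters: excluded="0" keeps only genes that have the property.
    for name in boolean_filters or []:
        ET.SubElement(ds, "Filter", {"name": name, "excluded": "0"})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


# Assumed names: verify against the mart before use.
xml_query = build_query(
    "celegans_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name", "entrezgene_id"],
    boolean_filters=["with_hsapiens_homolog"],
)
url = MARTSERVICE + "?" + urlencode({"query": xml_query})
print(xml_query)
```

To restrict the query to a specific gene list instead, pass something like `value_filters={"ensembl_gene_id": [...]}` with your own identifiers; the filter name must match the ID type of the list.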

Page 20: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

BioMart

In class demo

Page 21: Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Today’s exercise

- Compare the current dataset from the PLoS Pathogens paper to data from a different dataset
- Identify & retrieve additional information about C. elegans genes using BioMart