BITs: Genome browsers and interpretation of gene lists.

Post on 11-May-2015

Module 5 Genome browsers and interpreting gene lists. Part of training session "Basic Bioinformatics concepts, databases and tools" - http://www.bits.vib.be/training

Basic bioinformatics concepts, databases and tools

Module 5

Genome browsers and interpretation of gene lists

Dr. Joachim Jacob

http://www.bits.vib.be

Updated 21 July 2011

http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod5-intro_H1_2011_genomebrowsers.pdf

Integrating biological information

Genome databases and browsers

– Integrating all biological information on a per-species basis: Ensembl Genome Browser

http://www.ensembl.org/

Table browsers

– Retrieving biological (not only sequence) data by applying various criteria: Biomart

http://www.biomart.org/

Interpreting gene lists

– 'What is the biology behind my gene list?': DAVID

http://david.abcc.ncifcrf.gov/

Reference genome sequences provide a standard genome sequence per species

Genomes

A genome is assembled from various sequence sources

– By NCBI: currently assembly (or 'build') 37 in human (2010)

– By Celera: commercial

Each build differs!

1. Data freeze: all data for assembling is fixed (new data from that point on is ignored)

2. Assembly process and annotation

3. Release of the build: Reference Sequence

Genome http://www.ncbi.nlm.nih.gov/Genomes/

Finding your way in genomes

Annotation and terms (see also the NCBI handbook)

Locus = a place on the genome, ~ a gene (different alleles)

Location:

– Rough location by staining of chromosomes, e.g. 18q12.1 → chromosome 18, long arm (= q; the short arm is p)

– Exact bases on the genome (the assembly must be mentioned!)
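The band notation above can be taken apart mechanically. The following is a minimal sketch, assuming only simple labels of the form chromosome + arm + band (such as 18q12.1); the names CytoBand and parse_band are hypothetical and do not come from any genome browser.

```python
import re
from typing import NamedTuple


class CytoBand(NamedTuple):
    chromosome: str  # "1".."22", "X" or "Y"
    arm: str         # "p" (short arm) or "q" (long arm)
    band: str        # band digits as written, e.g. "12.1"


# Assumed pattern: chromosome name, arm letter, then band digits.
_BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)$")


def parse_band(label: str) -> CytoBand:
    """Split a simple cytogenetic band label into its parts."""
    match = _BAND_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a simple cytogenetic band: {label!r}")
    return CytoBand(*match.groups())


band = parse_band("18q12.1")
arm_name = "long arm" if band.arm == "q" else "short arm"
print(f"chromosome {band.chromosome}, {arm_name}, band {band.band}")
```

A band label is only a rough location. For exact base coordinates, the assembly name has to be stored alongside the position.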

Ensembl Genome browser

We will use this browser in this session

Information is a combination of automatic annotation and manually curated sources (ENS >< Havana (Vega) genes)

All entries can be accessed through the browser, each with its own clear identifier

28 November 2009 Joachim.jacob@gmail.com

Information about the genomes

http://www.ensembl.org

http://www.ensemblgenomes.org

… or click on the figure feature!

[Screenshot of the Ensembl browser page layout. Labelled regions: tab summary, detailed information, information selector, data manager tab, DAS]

Ensembl Genome browser

Usefulness: one place for all information on a particular gene / structure / location / variation

But also: comparison to other species

The Ensembl team has many training movies and examples available. Check them out!

http://www.ensembl.org/info/index.html

http://www.ensembl.org/Help/Movie?id=188

Tracks are a way to display information on a genome sequence

The annotation on a genome-wide scale is displayed in tracks.

– Relevant database content can be formatted in tracks and displayed on a reference genome

[Screenshot of the Ensembl genome browser: genome reference with tracks]

The most used track formats:

- each base receives a value (dense, continuous data): WIG format (e.g. %GC)

- the annotation has a start and a stop coordinate: BED format (e.g. gene annotations)
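To make the difference between the two formats concrete, here is a minimal sketch that reads one BED feature and one fixedStep WIG block. The data lines are made up for illustration, and the helper names parse_bed_line and parse_fixed_step_wig are hypothetical. The sketch assumes the usual conventions: BED starts are 0-based with an exclusive end, and WIG positions are 1-based.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class BedFeature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def parse_bed_line(line: str) -> BedFeature:
    """Read the first four whitespace-separated BED columns."""
    fields = line.split()
    name = fields[3] if len(fields) > 3 else ""
    return BedFeature(fields[0], int(fields[1]), int(fields[2]), name)


def parse_fixed_step_wig(lines: Iterable[str]) -> Iterator[Tuple[str, int, float]]:
    """Yield (chrom, position, value) for fixedStep WIG data."""
    chrom, pos, step = "", 0, 1
    for line in lines:
        if line.startswith(("track", "#")) or not line.strip():
            continue  # skip track definitions, comments and blank lines
        if line.startswith("fixedStep"):
            params = dict(item.split("=") for item in line.split()[1:])
            chrom = params["chrom"]
            pos = int(params["start"])
            step = int(params.get("step", 1))
        else:
            yield chrom, pos, float(line)
            pos += step


# Made-up example data, not taken from a real annotation.
feature = parse_bed_line("chr19\t1000\t5000\tgeneA")
gc_values = list(parse_fixed_step_wig([
    "fixedStep chrom=chr19 start=1001 step=1",
    "0.42",
    "0.45",
    "0.40",
]))
print(feature, gc_values)
```

The BED record describes one interval with a name, while the WIG block assigns a number to every position, which is why WIG suits dense signals such as %GC.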

Example

Variations in genomes are reported in VCF format

http://www.ensembl.org/info/website/upload/bed.html

http://www.bits.vib.be/wiki/index.php/.vcf

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ
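The columns of the two records above can be read with a few lines of code. This is a minimal sketch, not a full VCF parser: it splits on whitespace because the slide text uses spaces (real VCF files are tab-separated), it assumes QUAL is numeric, and it ignores the FORMAT and sample columns. The names VcfRecord, parse_info and parse_vcf_line are hypothetical.

```python
from typing import Dict, NamedTuple, Union

InfoValue = Union[str, bool]


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: str
    qual: float
    filter_status: str
    info: Dict[str, InfoValue]


def parse_info(text: str) -> Dict[str, InfoValue]:
    """Turn 'NS=3;DP=14;DB' into {'NS': '3', 'DP': '14', 'DB': True}."""
    info: Dict[str, InfoValue] = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # keys without '=' are flags
    return info


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF data line."""
    chrom, pos, variant_id, ref, alt, qual, filt, info = line.split()[:8]
    return VcfRecord(chrom, int(pos), variant_id, ref, alt,
                     float(qual), filt, parse_info(info))


records = [
    parse_vcf_line("20 14370 rs6054257 G A 29 PASS "
                   "NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ"),
    parse_vcf_line("20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ"),
]
passed = [r.variant_id for r in records if r.filter_status == "PASS"]
print(passed)
```

Filtering on the FILTER column, as in the last lines, keeps only the first record, since the second one is flagged q10.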

Biomart, your one stop portal to fetch information

Biomart http://www.biomart.org/

– These questions are easy: "Can you tell me how many mouse genes regulate transcription and are located on chromosome 19?"

Ensembl Genes

Genome sequence (Ensembl)

Gene Ontology GO:0009299

The translated question is reflected in the choice of database and Filters

The resulting genes are counted, and the output is set via Attributes
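The same translation from question to database, Filters and Attributes can be written down as a BioMart XML query. The sketch below only builds the query text and does not contact any server. The dataset name (mmusculus_gene_ensembl), the filter names (chromosome_name, go_parent_term), the attribute name (ensembl_gene_id) and the Query settings are assumptions that should be checked against the Filters and Attributes listed in the web interface. The helper build_count_query is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_count_query(dataset: str, filters: Dict[str, str],
                      attributes: List[str]) -> str:
    """Build a BioMart-style XML query that asks for a count of hits."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "count": "1",  # assumed: request the number of hits, not the rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Database choice = mouse genes; Filters = chromosome and GO term;
# Attributes = which columns to report. All names are assumptions.
query_xml = build_count_query(
    dataset="mmusculus_gene_ensembl",
    filters={"chromosome_name": "19", "go_parent_term": "GO:0009299"},
    attributes=["ensembl_gene_id"],
)
print(query_xml)
```

Each part of the question maps to one element: the species to the Dataset, the two restrictions to Filter elements, and the requested output to Attribute elements.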

Biomart is available for an increasing number of databases

Gene lists resulting from different analyses can reveal their biology

DAVID - http://david.abcc.ncifcrf.gov/

DEMO

Alternatives

g:Profiler http://biit.cs.ut.ee/gprofiler/

Babelomics http://www.babelomics.org/

Galaxy allows you to store your data and to (re)analyse it conveniently

Galaxy - http://usegalaxy.org

DEMO

[Screenshot of the Galaxy interface. Labelled panels: tools, results, data sets]