Lecture 3,4

Posted on 18-Dec-2014


Genome projects, Secondgen and Thirdgen genome sequencing, application of genome sequencing in predicting disease genes


Sucheta Tripathy

Genome Sequencing Projects, Genome Size, Application of sequence information for identification of disease genes

Complete Genome Sequencing
• Whole genome shotgun sequencing
• BAC end sequencing
• Chromosome walking
• End sealing

Reference: http://en.wikipedia.org/wiki/File:Genome_Sizes.png

Cost of Genome Sequencing

Nextgen sequencing methods
• 454 sequencing methods (2006)
• Principles of pyrophosphate detection (1985, 1988)
• Illumina (Solexa) genome sequencing methods (2007)
• Applied Biosystems ABI SOLiD System (2007)
• Helicos single molecule sequencing (HeliScope, 2007)
• Pacific Biosciences single-molecule real-time (SMRT) technology (2010)
• Sequenom for nanotechnology-based sequencing
• BioNanomatrix nanofluidics
• RNAP technology

Ref: http://www.ncbi.nlm.nih.gov/books/NBK20261/

Sequencing methods

Ref: http://www.wellcome.ac.uk/Education-resources/Teaching-and-education/Animations/DNA/WTX056046.htm

http://www.wellcome.ac.uk/Education-resources/Teaching-and-education/Animations/DNA/WTX056051.htm

http://www.wellcome.ac.uk/Education-resources/Teaching-and-education/Animations/DNA/WTDV026689.htm

Ion Torrent

SOLiD Sequencing

http://www.insdc.org/

http://www.ebi.ac.uk/embl/Contact/collaboration.html

• JGI – IMG [http://img.jgi.doe.gov/]

• Broad [http://www.broadinstitute.org/]

• TIGR [http://www.jcvi.org/]

• WashU [http://genome.wustl.edu/]

• VBI at Virginia Tech [www.vbi.vt.edu]

Microbial Genome Sequencing

Human Genome Project

• October 1990: Human Genome Project started
• 2000: first publication
• 2003: finished paper
• NHGRI solicited pilot proposals for ENCODE
• 2007: first report on ENCODE published
• RFAs were sought for full ENCODE
• 2012: ENCODE published
• GWAS: 90% lies outside coding regions (2005)

What happens next?

You have 10 million characters. What to do with them?
• Locate genes
• Determine the function of the gene
  • By similarity search
  • By domain search
  • By predicting signal peptide
  • By locating transmembrane region

Ref: http://www.nature.com/nature/journal/v406/n6797/pdf/406799a0.pdf
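The "locate genes" step can be illustrated with a very rough sketch: scanning the three forward reading frames for open reading frames (ATG up to the next in-frame stop codon). This is only an assumption about what a first pass might look like. Real gene finders use statistical models, both strands and much longer length cutoffs. The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG, end is exclusive and includes
    the stop codon, frame is 0, 1 or 2. min_codons counts the codons
    before the stop codon (hypothetical cutoff, chosen for the toy example).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # first ATG opens an ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs


# Toy sequence, not taken from the lecture.
print(find_orfs("CCATGAAATTTGGGTAACC"))
```

Only the forward strand is scanned here to keep the sketch short. A reverse-strand scan would run the same function on the reverse complement.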

Genome Annotation

ATGAAGATAGACAGCATACTAGCAGCATAGAATAGATAAGAGATAGAAATAGAATAAATATAAGAGAGA

• Run 6 frame translation
• Run Blastp with nr
• Match found? If yes: product found
• If no: make an hmmsearch
• Match found? If yes: product found. If NO: unknown genes, hypothesis
• Product found: pathway analysis, other analysis
• Repeat finding, miRNA finding, tRNAscan etc.
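The first step of the flow above, six-frame translation, can be sketched in plain Python. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and using the example sequence from the slide. The function names are hypothetical, and in practice a library such as Biopython or a tool such as EMBOSS would normally be used.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third base varying fastest), as in NCBI translation table 1.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate the three forward (+1..+3) and three reverse (-1..-3) frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Example sequence from the slide.
SLIDE_SEQ = "ATGAAGATAGACAGCATACTAGCAGCATAGAATAGATAAGAGATAGAAATAGAATAAATATAAGAGAGA"

for name, protein in six_frame_translation(SLIDE_SEQ).items():
    print(name, protein)
```

Each translated frame would then be the query for the Blastp and hmmsearch steps, which are not shown here.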

Genome Sizes
• Gametic nuclear DNA content
• Represented as mass in pg (picograms) or length in megabases
• 1 pg = 10^-12 g
• 1 Mb = 10^6 bases
• 1 pg = 978 Mb

Ref: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1669731/
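The unit conversion above is simple arithmetic. A small sketch, using only the factor given on the slide (1 pg = 978 Mb), is shown here. The function names and the 3.5 pg input are illustrative assumptions, not values from the lecture.

```python
# Conversion factor from the slide: 1 pg of DNA corresponds to 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms):
    """Convert DNA mass in picograms to length in megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases):
    """Convert length in megabases to DNA mass in picograms."""
    return megabases / MB_PER_PG


# Illustrative input only (hypothetical gametic DNA content of 3.5 pg).
c_value_pg = 3.5
print(f"{c_value_pg} pg is about {pg_to_mb(c_value_pg):.0f} Mb")
```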

Genome Sizes
• Database of Genome Sizes: http://www.cbs.dtu.dk/databases/DOGS/
• Plant genome database: http://www.kew.org/genomesize/homepage.html
• Mammalian genome size database: http://www.unipv.it/webbio/dbagsdb.htm
• Animal genome size database: www.genomesize.com
• Fungal genome size database: www.zbi.ee/fungal-genomesize

Ref: http://www.kew.org/genomesize/homepage.html

Ref: http://www.genomesize.com/

Ref: http://www-3.unipv.it/webbio/dbagsh.htm

Ref: http://www.zbi.ee/fungal-genomesize/

Identifying Human Disease Genes

Ref: http://www.ncbi.nlm.nih.gov/books/NBK7561/

• Before 1980, very few genes were recognized
• Reverse genetics: know the gene product, go back to the gene and do positional cloning
• Genetic redundancy: multiple genes have the same function

Identification of genes through protein product

1000 Genomes Project
• 1092 genomes of different individuals sequenced
• 14 populations
• Low-coverage whole-genome and exome sequencing
• 38 million SNPs
• 1.4 million short insertions and deletions
• More than 14,000 larger deletions

Ref: http://www.nature.com/nature/journal/v491/n7422/full/nature11632.html