HGP ENG 2014 web - Ústav lékařské biochemie 1.LF...
Sequencing Genomes
Human Genome Project:
History, results and impact
MUDr. Jan Pláteník, PhD.
(December 2014)
Beginnings of sequencing
• 1965: Sequence of a yeast tRNA (80 bp) determined
• 1977: Sanger's and Maxam & Gilbert's techniques invented
• 1981: Sequence of human mitochondrial DNA (16.5 kbp)
• 1983: Sequence of bacteriophage T7 (40 kbp)
• 1984: Epstein-Barr virus (170 kbp)
Homo sapiens
• 1985-1990: Discussion on human genome sequencing
  • "dangerous" - "meaningless" - "impossible to do"
• 1988-1990: Foundation of the HUMAN GENOME PROJECT
  • International collaboration: HUGO (Human Genome Organisation)
• Aims:
  – genetic map of the human genome
  – physical map: a marker every 100 kbp
  – sequencing of model organisms (E. coli, S. cerevisiae, C. elegans, Drosophila, mouse)
  – find all human genes (estimated 60-80 thousand)
  – sequence the entire human genome (estimated 4000 Mbp) by 2005
Other genomes
• July 1995: Haemophilus influenzae (1.8 Mbp) ... first genome of an independent organism
• October 1996: Saccharomyces cerevisiae (12 Mbp) ... first eukaryote
• December 1998: Caenorhabditis elegans (100 Mbp) ... first metazoan
May 1998:
• Craig Venter launches the private biotechnology company CELERA GENOMICS, Inc. and announces the intention to sequence the whole human genome in just 3 years and for 300 million USD using the whole-genome shotgun approach.
• The publicly funded HGP at that time: about 4 % of the genome sequenced
March 2000:
• Celera Genomics & academic collaborators publish a draft genome of Drosophila melanogaster (about 2/3 of 180 Mbp)
• ... whole-genome shotgun is feasible for large genomes as well
• ... Human genome: competition between the Human Genome Project and Celera Genomics
International Human Genome Sequencing Consortium (Human Genome Project, HGP)
• Open to co-operation from any country
• 20 laboratories from USA, Great Britain, Japan, France, Germany and China
• About 2800 workers, main coordinator: Francis Collins, NIH
• Publicly funded (about 3 billion USD)
• Approach: clone-by-clone
• Results: duty to upload to the internet within 24 hours (the Bermuda rule).
Clone-by-clone
genomic DNA
fragments of about 150,000 bp
cloning in BAC (bacterial artificial chromosome)
clones positioned in the genome using physical maps (STS - sequence tagged site, fingerprint - cleavage by restriction enzymes)
digestion of every clone into short fragments of about 500 bp
sequencing
assembly of each clone sequence by computer
Celera Genomics, Inc.
• Private biotechnology company, based in Rockville, Maryland, USA. President: Craig Venter.
• Investments in automation and computer processing, a few dozen employees
• Approach: whole-genome shotgun + utilised publicly shared data from HGP.
• Results: raw data temporarily available on the company website, but all further updates and annotations for commercial purposes.
Whole-genome shotgun
genomic DNA
fragments of 2, 10, 50 kbp
cloned in plasmids in E. coli
sequencing
sequence assembly using sophisticated computer algorithms
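The assembly step of the shotgun approach can be illustrated with a toy example. The sketch below is a minimal greedy overlap-merge, not the algorithm Celera actually used; the function names, the short test sequence and the regular read spacing are all hypothetical choices made for illustration. Real assemblers must also handle sequencing errors, repeats and both strands.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # drop reads that are fully contained in another read
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical 28 bp "genome" cut into overlapping 10 bp reads (illustration only)
genome = "ATGGCGTACGTTAGCCTAGGATCCAAGT"
reads = [genome[i:i + 10] for i in range(0, len(genome) - 9, 3)]
contigs = greedy_assemble(reads)
print(contigs)
```

The toy sequence has no repeated 4-mers, so every overlap of at least 4 bases is genuine and the greedy merge recovers a single contig. In a real genome, repeats break exactly this assumption, which is one reason the assembly algorithms had to be sophisticated.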
February 2001:
• International Human Genome Sequencing Consortium publishes a draft of the human genome in Nature (Feb. 15th 2001)
  • Draft: 90 % of euchromatin (2.95 Gbp, whole genome 3.2 Gbp). 25 % definitive.
• Celera Genomics, Inc. publishes the human genome sequence in Science (Feb. 16th 2001)
  • Sequence of euchromatin (2.91 Gbp)
Advance in sequencing
1985: 500 bp per lab per day
– still the Sanger dideoxynucleotide technique, but
– capillary electrophoresis instead of gel
– fluorescent markers instead of radioactivity
– full automation & robotisation
– computer power
2000: 175,000 bp/day (Celera)
1000 bp/sec. (HGP)
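To put these rates in perspective, the short calculation below converts each quoted throughput into the time needed to read 3.2 Gbp once. It is a rough sketch only: it assumes a single pass with no redundancy or re-sequencing, and it takes the rates exactly as given on the slide. The dictionary labels and the function name are illustrative.

```python
GENOME_BP = 3.2e9  # whole-genome size quoted in the slides (3.2 Gbp)
SECONDS_PER_DAY = 86_400

# Rates as quoted on the slide, converted to bp per day
rates_bp_per_day = {
    "1985, one lab": 500,
    "2000, Celera": 175_000,
    "2000, HGP": 1000 * SECONDS_PER_DAY,  # 1000 bp/sec
}


def days_for_one_pass(rate_bp_per_day, genome_bp=GENOME_BP):
    """Days needed to read every base once at a constant rate (assumption: 1x coverage)."""
    return genome_bp / rate_bp_per_day


for label, rate in rates_bp_per_day.items():
    days = days_for_one_pass(rate)
    print(f"{label}: {days:,.0f} days (~{days / 365:,.1f} years)")
```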
Sequencing continues...
• Human genome now: Definitive version announced 14/4/2003 ... 50 years since the DNA double helix. The reference sequence is still being updated.
• Fugu rubripes: draft of genome in August 2002
• Mouse:
  • Celera Genomics: draft in June 2001
  • Mouse Genome Sequencing Consortium: Nature, December 2002
• Laboratory rat: draft in March 2004
• Chimpanzee: September 2005
• ... and many other genomes: malaria (the causative agent Plasmodium falciparum and the carrier Anopheles gambiae), zebrafish, rice, dog, cattle, sheep, pig, chicken, honeybee, mammoth etc.
Public databases of DNA/RNA sequences
• GenBank, National Center for Biotechnology Information (NCBI), Bethesda, Maryland, USA
• EMBL-Bank, EMBL's European Bioinformatics Institute, Hinxton, UK
• DNA Data Bank of Japan, National Institute of Genetics, Mishima, Japan
• 22/8/2005: content of all three databases exceeded 100,000,000,000 base pairs (100 Gb) ... from genes/genomes of 165,000 species of organisms
Research in the "postgenomic" age
• New approaches to study genes & proteins:
  • GENOMICS ... analysis of the whole genome and its expression
  • PROTEOMICS ... analysis of the whole proteome, i.e. all proteins in a given tissue or organism
  • BIOINFORMATICS ... processing, analysis and interpretation of large data sets (NA or protein sequences, gene arrays, 3D protein structures etc.). Experiments in silico
• Rapid development of new technologies:
  • e.g. DNA microarray - expression of thousands of genes can be studied simultaneously
DNA Microarray ("DNA chip")
Single Nucleotide Polymorphism (SNP)
Occurs on average in one base per 1000 bp, i.e. in 0.1 % of the human genome
About 10 million SNPs with occurrence > 1%
Coding/non-coding
Protein structure changed/unchanged
A G A G T T C T G C T C G
A G G G T T C T G C G C G
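The two example sequences can be compared position by position to locate the variant bases. The snippet below is a minimal sketch; the function name is hypothetical and it assumes the two sequences are already aligned and of equal length.

```python
seq_a = "AGAGTTCTGCTCG"
seq_b = "AGGGTTCTGCGCG"


def snp_positions(a, b):
    """Return (1-based position, base in a, base in b) for every mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for pos, base_a, base_b in snp_positions(seq_a, seq_b):
    print(f"position {pos}: {base_a} -> {base_b}")
```

For the pair shown on the slide this reports two differences, at positions 3 and 11.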
International HapMap Project
• Further international collaboration, 2002-2009
• Genotyping and sequencing of DNA from 270 people from four different populations (USA, Nigeria, Japan, China)
• Aims at finding
  • all important human SNPs (about 10,000,000)
  • their stable combinations (haplotypes)
  • a tag SNP for each haplotype
• Data publicly available for further exploration
Human genetic variation
• Two unrelated humans have 99.5% of the genome identical
  • Single Nucleotide Polymorphisms: 0.1%
  • Copy number variation (insertions, deletions, duplications): 0.4%
• Variable number tandem repeats (... DNA fingerprinting in forensics)
• Epigenetics (methylation)
Second-generation sequencers:
E.g. Illumina Co., December 2008:
• One run (3 days) of the Genome Analyzer made by Illumina Inc. = 60 years of work of an ABI 3730xl (used by Celera Genomics)
• Cost of sequencing one human genome: 40-50,000 $
• First individual human genomes sequenced:
  • 2007: Craig Venter, James Watson - both genomes published on the internet
... and third-generation sequencers
Graph: Nature 458, 719-724 (2009).
Obtained from http://genome.wellcome.ac.uk
Next-Generation Sequencing (NGS)
Current technology, e.g. Illumina HiSeq 2500:
Sequencing by synthesis (SBS)
Whole human genome, 30x coverage, takes 27 hours, cost <5000 USD
www.illumina.com
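"30x coverage" means that each base of the genome is read about 30 times on average, i.e. coverage = (number of reads x read length) / genome size. The sketch below works out what that implies for a 3.2 Gbp genome. The read length of 100 bp is a hypothetical value chosen for illustration, not a specification of the instrument, and the function names are likewise illustrative.

```python
GENOME_BP = 3_200_000_000  # whole-genome size quoted in the slides (3.2 Gbp)


def bases_needed(genome_bp, coverage):
    """Total bases that must be sequenced to reach the given average coverage."""
    return genome_bp * coverage


def reads_needed(genome_bp, coverage, read_len_bp):
    """Number of reads of a fixed length needed for that coverage (rounded up)."""
    total = bases_needed(genome_bp, coverage)
    return (total + read_len_bp - 1) // read_len_bp


READ_LEN_BP = 100  # assumption for illustration only

print(bases_needed(GENOME_BP, 30))               # total bases for 30x
print(reads_needed(GENOME_BP, 30, READ_LEN_BP))  # reads of 100 bp for 30x
```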
(for Illumina technology, Wikimedia Commons)
Archon X Prize for Genomics
$ 10,000,000
Announced in 2006. For the first team that succeeds in sequencing 100 individual human genomes within 30 days, at a certain required quality and at a cost below $1,000 per genome.
Archon X Prize for Genomics: prize cancelled 22/8/2013, "Outpaced by innovation"
Human Genome Project:
Results
The Human Genome
Haploid genome: 3 billion base pairs divided into 23 chromosomes
• 1 meter of DNA if extended
• 750 MB (1 CD)
• 2 million standard printed pages (50 letters/line, 30 lines/page)
Fig. from Bolzer et al. 2005, PLoS Biol. 3(5): e157 DOI: 10.1371/journal.pbio.0030157
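The storage and page figures follow from simple arithmetic. The sketch below reproduces them under one explicit assumption that the slide does not state: each base (A, C, G or T) is stored in 2 bits. Variable names are illustrative.

```python
BASES = 3_000_000_000      # haploid genome, as quoted on the slide
BITS_PER_BASE = 2          # assumption: 4 possible bases fit in 2 bits

size_bytes = BASES * BITS_PER_BASE // 8
print(size_bytes / 1e6, "MB")   # storage in megabytes

LETTERS_PER_LINE = 50
LINES_PER_PAGE = 30
letters_per_page = LETTERS_PER_LINE * LINES_PER_PAGE
pages = BASES // letters_per_page
print(pages, "pages")
```

With these inputs the result is 750 MB and 2,000,000 pages, matching the slide.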
DNA in the cell nucleus
The nucleus of a typical human cell has a diameter of 5-8 µm and contains 2 m of DNA
Comparable to a tennis ball into which 20 km of thin thread has been neatly packed.
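The tennis-ball comparison can be checked with a linear scale-up. The sketch below assumes a nucleus diameter of 6.5 µm (the midpoint of the 5-8 µm range) and a tennis ball diameter of about 6.7 cm; both values are assumptions made for this illustration, not figures from the slide.

```python
NUCLEUS_DIAMETER_M = 6.5e-6    # assumption: midpoint of the 5-8 µm range
DNA_LENGTH_M = 2.0             # DNA per nucleus, as quoted on the slide
TENNIS_BALL_DIAMETER_M = 0.067 # assumption: roughly 6.7 cm

# Scale every length by the same factor that turns the nucleus into a tennis ball
scale = TENNIS_BALL_DIAMETER_M / NUCLEUS_DIAMETER_M
thread_km = DNA_LENGTH_M * scale / 1000

print(f"scale factor: {scale:,.0f}x")
print(f"equivalent thread length: {thread_km:.1f} km")
```

The result is on the order of 20 km, in line with the comparison above.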
Classification of eukaryotic genomic DNA:
• Degree of condensation:
  • Euchromatin
  • Heterochromatin (about 10%, not sequenced!)
• Repetitivity:
  • Highly repetitive
  • Moderately repetitive
  • Non-repetitive (single-copy)
• Function:
  • Structural (centromeres, telomeres)
  • Protein-coding
  • Transcribed to noncoding RNA (introns, rRNA, tRNA, miRNA etc.)
  • Transposons
  • Regulatory sequences
  • Junk...?
Experiments with denaturation & reassociation of DNA:
Rapid reassociation (10-15%):
- highly repetitive DNA
Intermediate reassociation (25-40%):
- moderately repetitive DNA
Slow reassociation (50-60%):
- non-repetitive (single copy) DNA
Fig: Lodish, H. et al.: Molecular Cell Biology (3rd ed.), W.H. Freeman, New York 1995.
Classification of eukaryotic genomic DNA:
• Highly repetitive (simple-sequence DNA):
  • All heterochromatin (centromeres, telomeres, 8% of genome, as yet unsequenced)
  • Minisatellites (3% of euchromatin)
• Moderately repetitive:
  • Tandemly repeated genes for rRNA, tRNA and histones (multiple identical copies to achieve high transcription efficiency, e.g. rRNA genes in eukaryotes >100 copies)
  • Transposons
• Non-repetitive:
  • Protein genes
  • Genes for noncoding RNA
  • Regulatory sequences
Eukaryotic GENE
Fig: Murray, R.K. et al.: Harperova biochemie, Appleton & Lange 1993, in Czech H&H 2002.
Genes are not placed evenly in the genome
• Big differences among chromosomes:
  • chromosome 1: 2968 genes
  • chromosome Y: 231 genes
• Regions rich in genes ("cities") - more C and G
• Regions poor in genes ("deserts") - more A and T, up to 3 Mb!
• CpG islands - "barriers between cities and deserts" ... regulation of gene activity
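The difference in base composition between gene-rich and gene-poor regions is usually expressed as GC content. The sketch below computes it over fixed windows; the function names and the toy sequence are hypothetical, and real analyses use much larger windows than shown here.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """GC fraction for each window, as (start index, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (illustration only): a GC-rich stretch followed by an AT-rich one
toy = "GCGCGGCCGC" + "ATATTAATAT"
for start, frac in windowed_gc(toy, window=10, step=10):
    print(start, frac)
```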
• Solitary gene:
  • present as a single copy in the whole haploid genome (about half of genes)
• Tandemly repeated genes for rRNA, tRNA, histones
• Gene family:
  • cluster of related genes that originated in evolution from a single ancestor, with gradual diversification of sequence and function
• Pseudogene:
  • gene in which mutations have accumulated to such an extent that it cannot be transcribed ("molecular fossil")
• Processed pseudogene:
  • originated from reverse transcription of mRNA and integration into the genome
Genes in the human genome
• Coding genes: 20,364
• Small noncoding genes: 9,673
  • (up to 200 bp, rRNA, miRNA, ncRNA, snRNA, snoRNA ...)
• Long noncoding genes: 14,817
  • (over 200 bp, various noncoding RNA)
• Pseudogenes: 14,415
• Gene transcripts: 196,345
Ensembl release 78, Dec. 2014 (www.ensembl.org)
Protein genes in the human genome
about 20,400
About 25% of the genome is transcribed to pre-mRNA,
of this only 5% are exons
... Human EXOME: about 1.5 % of the genome
Number of genes does not reflect organism complexity?!
Sacch. cerevisiae: 6,000 genes
C. elegans: 18,000 genes
Drosophila: 13,000 genes
Arabidopsis thaliana: 26,000 genes
Comparison of human/mouse genome with genomes of lower organisms (C. elegans, Drosophila):
• low gene density, longer introns
Figure from: Lodish, H. et al.: Molecular Cell Biology (5th ed.), W.H. Freeman, New York 2004.
How to find genes in genomes:
• Bacteria, yeast:
  • open reading frames (ORFs)
• Higher organisms:
  • hybridisation/comparison with cDNA or EST (expressed sequence tag = partial cDNA)
  • by similarity with other known genes
  • prediction of recognition sites for splicing
  • comparison with genomes of other organisms
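The ORF approach used for bacteria and yeast can be sketched in a few lines: scan each reading frame for a start codon (ATG) followed in the same frame by a stop codon. The version below is a simplified illustration with hypothetical names; it scans only the forward strand and ignores nested start codons. Because introns interrupt the reading frame, such a scan is not sufficient in higher organisms, which is why the other methods listed above are needed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    start is the 0-based index of the ATG, end is exclusive and includes
    the stop codon, min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy sequence (illustration only): ATG GCT AAA GGT TGA flanked by CC on both sides
dna = "CCATGGCTAAAGGTTGACC"
for start, end, frame in find_orfs(dna):
    print(frame, start, end, dna[start:end])
```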
Comparison of human/mouse genome with genomes of lower organisms (C. elegans, Drosophila):
• expansion of gene families / new families related to:
  • blood clotting
  • acquired (specific) immunity
  • nervous system
  • intra- and intercellular communication
  • regulation of gene expression
  • programmed cell death (apoptosis)
• only about 7 % of protein domains are entirely new in vertebrates, but
  • expansion of protein families
  • new combinations of domains; proteins are more complex (more domains per protein)
• more proteins from one gene - alternative splicing in up to 95 %
Susumu Ohno, 1972
• Because of mutation load the human haploid genome cannot afford to keep more than about 30,000 gene loci.
• Most of the DNA is redundant ... junk!
http://www.junkdna.com/ohno.html
Mobile DNA elements (transposons)
Autonomous DNA sequences, capable of copying themselves, represent 44 % of the genome
• DNA transposons
• Retrotransposons
  • Virus-like
  • Non-viral
    • Long (LINEs)
    • Short (SINEs)
Mobile elements (transposons):
Fig.: Lodish, H. et al.: Molecular Cell Biology (5th ed.), W.H. Freeman, New York 2004.
DNA transposons
2-3 kb (or shorter), encode transposase, cut & paste in genome without RNA intermediate
Fig: Lodish, H. et al.: Molecular Cell Biology (5th ed.), W.H. Freeman, New York 2004.
Mobile (parasitic) elements in the mammalian genome:
• DNA transposons
  • 2-3 kb (or shorter), encode transposase, cut & paste or copy & paste in the genome without an RNA intermediate
• Virus-like retrotransposons
  • 6-11 kb (or shorter), retroviruses without the gene for the envelope protein (env)
• LINEs (long interspersed repeats)
  • 6-8 kb, e.g. L1, encode 2 proteins (one is reverse transcriptase)
• SINEs (short interspersed repeats)
  • 100-300 bp, e.g. Alu, code for no protein, proliferation depends on LINEs, origin: small noncoding cellular RNA
Census of parasitic elements in the human genome:
LINEs:             850 000x    21 % of genome
SINEs:           1 500 000x    13 % of genome
Retrovirus-like:   450 000x     8 % of genome
DNA transposons:   300 000x     3 % of genome
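The census figures can be combined into totals and into a rough mean length per copy. The sketch below assumes a genome size of 3.2 Gbp (the whole-genome figure quoted earlier in the slides) and uses the rounded percentages as given, so the results are only approximate. Names are illustrative.

```python
GENOME_BP = 3.2e9  # assumption: whole-genome size quoted earlier in the slides

# class: (number of copies, percent of genome), as listed in the census
census = {
    "LINEs": (850_000, 21),
    "SINEs": (1_500_000, 13),
    "Retrovirus-like": (450_000, 8),
    "DNA transposons": (300_000, 3),
}

total_copies = sum(copies for copies, _ in census.values())
total_percent = sum(pct for _, pct in census.values())
print(f"total: {total_copies:,} copies, {total_percent} % of genome")


def mean_copy_length_bp(copies, percent, genome_bp=GENOME_BP):
    """Average length of one copy implied by the copy count and genome share."""
    return genome_bp * percent / 100 / copies


for name, (copies, pct) in census.items():
    print(f"{name}: ~{mean_copy_length_bp(copies, pct):,.0f} bp per copy")
```

The rounded percentages add up to 45 %, close to the 44 % quoted for transposons overall. The implied mean LINE copy is well under 1 kb, far shorter than a full-length 6-8 kb element, which is consistent with most copies being incomplete.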
• Mostly mutated and/or incomplete copies, only a small part (<0.05%) still active:
  • LINEs: 80-100 L1
  • SINEs: 2000-3000 Alu, <100 SVA
  • Retrovirus-like: ? (HERV-K ... really extinct?)
  • DNA transposons: 0
• The mouse genome contains many more functional transposons (...why?)
Significance of transposons in the human genome
• Transposition in germline cells is a rare event (approx. 1 new insertion per 20 live births, mostly Alu)
• Still a significant source of human genetic variability
• Can inactivate genes: documented as a rare cause of inherited diseases
• In somatic cells can result in mosaicism
  • role of L1 in neurogenesis?
• Transposons facilitate recombination ... driving force of evolution!
Fig.: Lodish, H. et al.: Molecular Cell Biology (5th ed.), W.H. Freeman, New York 2004.
Non-classified "spacer" DNA: non-repetitive, noncoding, >1/2 of genome ... likely also dead transposons, too mutated to be recognizable
Project ENCODE, 2012: no junk DNA!
• Up to 80% of the genome has a biological function
• Up to 75% of the genome is transcribed to RNA at least at some time and somewhere
• Despite the fact that only 20% of the genome at best is under evolutionary constraint
...?????.....
Human Genome Project:
Impact
Benefits of genome sequencing
• Facilitates research into the molecular basis of diseases
• Study of human evolution and migration
• What the genome determines ("nature vs. nurture") and how genetic variation causes differences among people
• Genomic medicine, pharmacogenomics, personalized medicine...
Genomic medicine
• 1) Diagnostics at the gene level
  • Rare monogenic diseases
  • Shift to earlier diagnostics
    • Possibility of diagnosis before disease appears
    • Newborn screening
    • Noninvasive prenatal testing
    • Preconception carrier testing, preimplantation genetic analysis in IVF
  • Genomic-based analysis of tumors enables effective targeted therapies
  • In common complex diseases with polygenic predispositions (diabetes, coronary disease etc.) still difficult
Genomic medicine
• 2) Pharmacogenomics
  • Targeted therapy of tumors directed by genetic analysis
    • E.g.: antibody against HER-2 only in breast tumors that express this protein
  • Genomic-based tests predict drug efficacy, occurrence of adverse side effects, or help to optimize dosage.
    • E.g.: treatment of chronic hepatitis C, HIV, possibly dosage of warfarin
... personalized medicine
Genomic medicine
• 3) Microorganisms:
  • Pathogenic:
    • Rapid diagnostics of infectious disease by pathogen sequencing, especially relevant in tracing new epidemic outbreaks (SARS, MRSA...)
  • Nonpathogenic: Human Microbiome
    • E.g. human gut bacteria: metabolic activity comparable to the liver, individually different spectrum, relationships to inflammatory bowel disease, atherosclerosis, obesity...
Personal Genomics: 23andMe
• Saliva sample sent by DHL, genotyping of about 700 000 SNPs
• DNA relatives
• Ancestry:
  • Ancestry Composition
  • Paternal (Y chromosome haplogroup)
  • Maternal (mitochondrial DNA haplogroup)
  • Per cent Neanderthal DNA
• Health
Personal Genomics: 23andMe
• Saliva sample sent by DHL, genotyping of about 700 000 SNPs
• DNA relatives
• Ancestry
• Health:
  • Disease risk: 122 (31 high confidence)
  • Drug response: 25 (12 high confidence)
  • Inherited conditions: 53 (all high confidence)
  • Traits: 61 (13 high confidence)
Why does analysis of SNPs not say more?
• Common SNPs are not sufficient: it is necessary to find individual (rare) polymorphisms
• SNPs are not the main source of human genetic variability: duplications/deletions and insertions of transposons are more significant
• A trait controlled by a single gene is probably a rather uncommon condition: the phenotype is the result of an interplay of numerous genes
• Expression of genes (how the genome is used) is what decides
  • Polymorphisms in noncoding regulatory DNA
  • Epigenetics (DNA methylation etc.), also heritable!
Ethical, legislative and social issues
• Gene privacy:
  • who has the right to know someone else's genetic information and how it can be used, worries about discrimination by an employer, health insurance company...
• Gene testing
• Gene therapy
• Designer babies
• Behavioral genetics:
  • how genes determine human behaviour, possible fall into genetic determinism and loss of responsibility for one's own behaviour
• GMO
• Gene patenting
References:
Alberts, B. et al.: Essential Cell Biology, Garland Publishing, Inc., New York 1998.
Lodish, H. et al.: Molecular Cell Biology, W.H. Freeman, New York 1995, 2004 ("Darnell").
Nature 2001: 409 (6822, 15.2.2001); pp. 813-958.
Science 2001: 291 (5507, 16.2.2001); pp. 1177-1351.
Trends in Genetics 2007: 23, pp. 183-191.
Nature 2009: 458, pp. 719-724.
FEBS Letters 2011: 585; pp. 1589-1594.
Lecture by dr. M. Lebl (Illumina Co.), 1.LF UK, 1.12.2008.
Science Translational Medicine 2013: 5, 189sr4.
PNAS 2014: 111, pp. 6131-6138
http://www.ncbi.nlm.nih.gov
http://genomics.energy.gov
http://en.wikipedia.org
http://www.ensembl.org
http://hapmap.ncbi.nlm.nih.gov
http://www.illumina.com
http(s)://www.23andme.com
Fig. "Human and DNA Shadow": Courtesy of U.S. Department of Energy's Joint Genome Institute, Walnut Creek, CA, http://www.jgi.doe.gov.