Curso de Genomica-Introduction to NGS Data Analysis-V3
-
8/11/2019 Curso de Genomica-Introduction to NGS Data Analysis-V3
NGS Data analysis http://ueb.ir.vhebron.net/NGS
Introduction to NGS
(Now Generation Sequencing) Data Analysis
Statistics and Bioinformatics Research Group, Statistics Department, Universitat de Barcelona
Statistics and Bioinformatics Unit, Vall d'Hebron Institut de Recerca
Alex Sánchez
-
Outline
Introduction
Bioinformatics Challenges
NGS data analysis: Some examples and workflows: Metagenomics, De novo sequencing, Variant detection, RNA-seq
Software: Galaxy, Genome viewers
Data formats and quality control
-
Introduction
-
Why is NGS revolutionary?
NGS has brought high speed not only to genome sequencing and personal medicine; it has also changed the way we do genome research.
Got a question on genome organization? SEQUENCE IT!!!
Ana Conesa, bioinformatics researcher at Principe Felipe Research Center
-
NGS means high sequencing capacity
GS FLX 454 (Roche)
HiSeq 2000 (Illumina)
5500xl SOLiD (ABI)
Ion Torrent
GS Junior
-
454 GS Junior
35MB
NGS Platforms Performance
-
454 Sequencing
-
ABI SOLiD Sequencing
-
Solexa sequencing
-
Applications of Next-Generation Sequencing
-
Comparison of 2nd NGS
-
Some numbers
Platform: 454/FLX | Solexa (Illumina) | AB SOLiD
Read length: ~ bp | 50 bp
Single read: Yes | Yes | Yes
Paired-end reads: Yes | Yes | Yes
Long-insert (several Kbp) mate-paired reads: Yes | Yes | No
Number of reads per instrument run: 5.00K | >100 M | 400M
Max data output: 0.5 Gbp | :0.5 Gbp | :0 Gbp
Run time to 1 Gb: 6 Days | > 1 Day | >1 Day
Ease of use (workflow): Difficult | Least difficult | Difficult
Base calling: Flow space | Nucleotide space | Color space
DNA Application
Whole genome sequencing and resequencing: Yes | Yes | Yes
De novo sequencing: Yes | Yes | Yes
Targeted resequencing: Yes | Yes | Yes
Discovery of genetic variants (SNPs, InDels, CNV, ...): Yes | Yes | Yes
Chromatin Immunoprecipitation (ChIP): Yes | Yes | Yes
Methylation Analysis: Yes | Yes | Yes
Metagenomics: Yes | No | No
RNA Application: Yes | Yes | Yes
Whole Transcriptome: Yes | Yes | Yes
Small RNA: Yes | Yes | Yes
Expression Tags: Yes | Yes | Yes
-
Bioinformatics challenges of NGS
-
I have my sequences/images. Now what?
-
NGS pushes (bio)informatics needs up
Need for computer power
VERY large text files (~10 million lines long). Can't do "business as usual" with familiar tools such as Perl/Python.
Impossible memory usage and execution time
Impossible to browse for problems
Need sequence quality filtering
Need for large amount of CPU power
Informatics groups must manage compute clusters
Challenges in parallelizing existing software or redesigning algorithms to work in a parallel environment
Need for Bioinformatics power!!! The challenge turns from data generation into data analysis!
How should bioinformatics be structured? Bigger centralized bioinformatics services (or research groups providing service)
Distributed model: bioinformaticians must be part of the teams. Interoperability
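As an illustration of the memory problem mentioned on this slide, the sketch below streams a file one line at a time instead of loading it whole. It is a minimal example, not a tool from the course: the function name is hypothetical, and the in-memory `io.StringIO` object stands in for a real multi-gigabyte file opened with `open(path)`.

```python
import io

def count_lines_and_chars(handle):
    """Stream a text handle line by line; only one line is held in memory at a time."""
    n_lines = 0
    n_chars = 0
    for line in handle:
        n_lines += 1
        n_chars += len(line.rstrip("\n"))
    return n_lines, n_chars

# Hypothetical stand-in for a very large file; a real run would use open(path).
demo = io.StringIO("@r1\nACGT\n+\nIIII\n")
print(count_lines_and_chars(demo))
```

The same loop works unchanged on a file handle, so memory use stays flat however long the file is; execution time still grows with file size.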
-
Data management issues
Raw data are large. How long should they be kept?
Processed data are manageable for most people: 20 million reads (50 bp) ~1 Gb
More of an issue for a facility: HiSeq recommends
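The "~1 Gb" figure for processed data can be checked with a line of arithmetic. The sketch below assumes the slide means 20 million reads of 50 bp each and that "Gb" stands for gigabases; the function name is made up for the example.

```python
def total_bases(n_reads, read_length_bp):
    """Total number of sequenced bases in a run of equal-length reads."""
    return n_reads * read_length_bp

bases = total_bases(20_000_000, 50)
print(bases / 1e9, "Gb")  # 20 million reads x 50 bp = 1e9 bases
```

The size on disk is larger than this, because a FASTQ file also stores identifiers and one quality character per base.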
-
So what?
In NGS we have to process really big amounts of data, which is not trivial in computing terms.
Big NGS projects require supercomputing infrastructures
Or put another way: it's not the case that anyone can do everything. Small facilities must carefully choose their projects to be scaled with their computing capabilities.
-
Computational infrastructure for NGS
There is great variety, but a good point to start with:
Computing cluster: multiple nodes (servers) with multiple cores
High performance storage (TB, PB level)
Fast networks (10 Gb ethernet, infiniband)
Enough space and conditions for the equipment ("servers room")
Skilled people (sysadmin, developers)
CNAG, in Barcelona: people, more than 50% of them informaticians
-
Alternatives (1): Cloud Computing
Pros
Flexibility. You pay for what you use.
Don't need to maintain a data center.
Cons
Transferring big datasets over the internet is slow. You pay for consumed bandwidth. That is a problem with big datasets.
Lower performance, especially in disk read/write.
Privacy/security concerns.
More expensive for big and long-term projects.
-
Alternatives (2): Grid Computing
Pros
Cheaper.
More resources available.
Cons
Heterogeneous environment.
Slow connectivity (especially in Spain).
Much time required to find good resources in the grid.
-
In summary
NGS arrived 2007/8
No-one predicted NGS in 2001 (ten years ago)
Therefore we cannot predict what we will come up against
TGS represents specific challenges
• Large Data Storage
• Technology-aware software
• Enables new assays and new science
We would have said the same about NGS...
These are not new problems, but will require new solutions
There is a lag between technology and software...
-
Bioinformatics and bioinformaticians
The term bioinformatician means many things
Some may require a wide range of skills. Others require a depth of specific skills
The best thing we can teach is the ability to learn and adapt
The spirit of adventure. There is a definite skills shortage
There always has been
-
Increasing importance of data analysis needs
-
NGS data analysis
-
NGS data analysis stages
-
Quality control and preprocessing of NGS data
-
Data types
-
Why QC and preprocessing?
Sequencer output:
Reads + quality
Natural questions
Is the quality of my sequenced data OK?
If something is wrong, can I fix it?
Problem: HUGE files... How do they look?
Files are flat files and big... tens of Gbs (even hard to browse them)
-
Preprocessing sequences improves results
-
How is quality measured?
Sequencing systems assign a quality score to each peak
Phred scores provide log(10)-transformed error probability values: if p is the probability that the base call is wrong, the Phred score is
$Q = -10 \log_{10} p$
score = 20 corresponds to a 1% error rate
score =
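The Phred definition above can be turned into two small conversion helpers. This is a minimal sketch of the formula as stated on the slide; the function names are made up for the example.

```python
import math

def phred_to_error_prob(q):
    """Error probability p for a Phred score Q, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score Q for an error probability p (p must be > 0)."""
    return -10 * math.log10(p)

for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

Each step of 10 in the score divides the error probability by 10, so a score of 20 gives p = 0.01, the 1% error rate quoted above.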
-
Data formats
FASTA format (everybody knows about it)
Header line starts with ">" followed by a sequence ID
Sequence (string of nt).
FASTQ format (http://maq.sourceforge.net/fastq.shtml)
First is the sequence (like FASTA but starting with "@"). Then "+" and sequence ID (optional), and in the following line are QVs encoded as single-byte ASCII codes
Different quality encoding variants
Nearly all downstream analyses take FASTQ as input sequence
-
The FASTQ format
A FASTQ file normally uses four lines per sequence.
Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
Line 2 is the raw sequence letters.
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
Different encodings are in use.
Sanger format can encode a Phred quality score from 0 to 93 using ASCII 33 to 126
@Seq description
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC5
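The four-line layout described above maps directly onto a small reader. The sketch below is illustrative only: it assumes strictly four lines per record with no wrapped sequences, it assumes the Sanger encoding (Phred = ASCII code - 33), and the function name and demo record are invented.

```python
import io

def read_fastq(handle):
    """Yield (identifier, sequence, phred_scores) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header)
        # Sanger encoding assumed: Phred score = ASCII code - 33
        yield header[1:], seq, [ord(c) - 33 for c in qual]

# Invented demo record; a real run would pass an open file handle.
demo = io.StringIO("@read1 demo\nGATTA\n+\n!5I+~\n")
for name, seq, quals in read_fastq(demo):
    print(name, seq, quals)
```

Because the reader is a generator, it processes one record at a time and can be used on files far larger than memory.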
-
Some tools to deal with QC
Use FastQC to see your starting state.
Use Fastx-toolkit to optimize different datasets and then visualize the result with FastQC to prove your success!
Hints: Trimming, clipping and filtering may improve quality
But beware of removing too many sequences...
Go to the tutorial and try the exercises...
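To make the trimming and filtering hint concrete, here is a toy sketch of 3' quality trimming followed by a length filter. It is not the algorithm used by the Fastx-toolkit or any other named tool; the thresholds (`min_q`, `min_len`) and function names are illustrative assumptions.

```python
def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end until a base with quality >= min_q is reached."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]

def filter_reads(reads, min_q=20, min_len=3):
    """Trim every read and keep only those still at least min_len bases long."""
    kept = []
    for seq, quals in reads:
        seq, quals = trim_3prime(seq, quals, min_q)
        if len(seq) >= min_len:
            kept.append((seq, quals))
    return kept

reads = [("ACGTAC", [30, 30, 30, 30, 10, 5]), ("ACGT", [30, 8, 7, 6])]
kept = filter_reads(reads)
print(kept, f"{len(kept)}/{len(reads)} reads kept")
```

Reporting the fraction of reads kept, as in the last line, is one simple way to notice when the thresholds are removing too many sequences.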
-
Applications
[1] Metagenomics
[2] De novo sequencing
-
[1] Metagenomics & other community-based omics
Zoetendal E G et al. Gut 2008;57:1605-1615
-
[1] Metagenomic Approaches
SMALL-SCALE: 16S rRNA gene profiling
The basic approach is to identify microbes in a complex community by exploiting universal and conserved targets, such as rRNA genes. Petrosini.
LARGE-SCALE: Whole Genome Shotgun (WGS)
Whole-genome approaches make it possible to identify and annotate microbial genes and their functions in the community.
Environmental Shotgun Sequencing (ESS).
A primer on metagenomics. PLoS Comput Biol. 2010 Feb 26;6(2):e1000667.
Challenges and limitations: chimeric sequences caused by PCR amplification and sequencing errors.
Challenges and limitations:
relatively large amounts of starting material required
potential contamination of metagenomic samples with host genetic material
high numbers of genes of unknown function.
-
[1] A metagenomics workflow
[Figure labels: Short reads (40-150 bps); Assembly; Contigs; Gene prediction; ORFs; Homology searching; Proteins, families, functions; Functional classification, Ontologies; Functional profiles; Binning; Sequences into species]
-
[1] Comparative Metagenomics
Other software based on phylogenetic data: UniFrac.
MEGAN can also be used to compare the OTU composition of two or more frequency-normalized samples.
MG-RAST provides a comparative functional and sequence-based analysis for uploaded samples.
Comparing two or more metagenomes is necessary to understand how genomic differences affect, and are affected by, the abiotic environment.
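As a rough illustration of what comparing the OTU composition of frequency-normalized samples can mean, the sketch below normalizes raw OTU counts to relative frequencies and computes the Bray-Curtis dissimilarity between two samples. This is a generic ecological measure chosen for the example, not a description of how MEGAN, MG-RAST or UniFrac work; the OTU names and counts are invented.

```python
def normalize(counts):
    """Convert raw OTU counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two profiles (0 = identical, 1 = no shared OTUs)."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0.0) - b.get(o, 0.0)) for o in otus)
    den = sum(a.get(o, 0.0) + b.get(o, 0.0) for o in otus)
    return num / den

# Invented example samples.
sample1 = normalize({"OTU_A": 50, "OTU_B": 50})
sample2 = normalize({"OTU_A": 10, "OTU_C": 30})
print(bray_curtis(sample1, sample2))
```

Normalizing first matters because the two samples were sequenced to different depths (100 versus 40 reads here); without it the comparison would mostly reflect sequencing effort.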
-
[1] Some Metagenomics projects
"whole-genome shotgun sequencing": 78 million base pairs of unique DNA sequence were analyzed
"whole-genome shotgun sequencing" was applied to microbial populations. A total of 1.045 billion base pairs of nonredundant sequence were analyzed
To date, 242 metagenomic projects are ongoing and 103 are completed (www.genomesonline.org).
-
[2] De novo sequencing
-
[3] Amplicon analysis
Each amplicon (PCR product) is sequenced individually, allowing for the identification of rare variants and the assignment of haplotype information over the full sequence length
Some applications:
Detection of low-frequency (<1%) variants in complex mixtures: rare somatic mutations, viral quasispecies... (Ultra-deep amplicon sequencing)
Identification of rare alleles associated with hereditary diseases, heterozygote SNP calling... (Ultra-broad amplicon sequencing)
Metabolic profiling of environmental habitats, bacterial taxonomy and phylogeny (16S rRNA amplicon sequencing)
-
[3] Example of raw data generation with GS-FLX
...
-
[3] Data workflow
...
Data Processing
-
[3] Final output examples
...
Bar plots output example (with circular legend for the AA)
NT substitution (error) matrices
AA frequency tables
-
[4] Variant discovery
Your aligner decides the type/amount of variants you can identify
Naive SNP calling: reads counting
Statistic support SNP calling: maximum likelihood, Bayesian
Quality score recalibration: recalibrate quality score from whole alignment
Local realignment around indels: realign reads
Known variants (limited species): dbSNP
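The "naive SNP calling by reads counting" item can be sketched as a simple threshold rule on the bases that reads show at one reference position. This is a toy illustration and not the method of any particular caller; the function name and both thresholds (`min_depth`, `min_alt_fraction`) are arbitrary assumptions.

```python
from collections import Counter

def naive_call(ref_base, pileup, min_depth=8, min_alt_fraction=0.2):
    """Return the most frequent non-reference base if it passes simple count thresholds, else None."""
    depth = len(pileup)
    if depth < min_depth:
        return None  # too few reads to say anything
    counts = Counter(b for b in pileup if b != ref_base)
    if not counts:
        return None  # every read agrees with the reference
    alt, n_alt = counts.most_common(1)[0]
    return alt if n_alt / depth >= min_alt_fraction else None

print(naive_call("A", "AAAAGGGGAA"))  # G: 4 of 10 reads support the alternative base
print(naive_call("A", "AAAAAAAAAG"))  # None: 1 of 10 reads is below the threshold
```

A rule like this ignores base qualities and mapping errors, which is why the slide lists statistical callers, quality recalibration and local realignment as the next steps.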
-
[4] Example: Exome Variant Analysis
-
[4] Genotype calling tools
-
[4] GATK pipeline
-
[4]
-
[4] Many ongoing sequencing projects
-
[5] Transcriptome Analysis using NGS
RNA-Seq, or "Whole Transcriptome Shotgun Sequencing" ("WTSS"), refers to the use of HTS technologies to sequence cDNA in order to get information about a sample's RNA content.
Reads produced by sequencing
Aligned to a reference genome to build transcriptome mappings.
-
[5] Applications (2): Differential expression
1. Reads are mapped to the reference genome or transcriptome.
2. Mapped reads are assembled into expression summaries (tables of counts, showing how many reads are in coding region, exon, gene or junction);
-
10 years or plus of high throughput data analysis
[5] RNA Seq data analysis - Mapping
Main Issues:
Number of allowed mismatches
Number of multihits
Mates expected distance
Considering exon junctions
End up with a list of # of reads per transcript
These will be our (discrete) response variable
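The step from mapped reads to a list of read counts per transcript can be sketched as follows. The input format (tuples of read ID, transcript ID and number of mismatches) is a hypothetical simplification rather than a real aligner output, and discarding every multihit read is only one of several possible policies.

```python
from collections import Counter

def count_reads(alignments, max_mismatches=2):
    """Count uniquely mapped reads per transcript.

    alignments: iterable of (read_id, transcript_id, mismatches) tuples (hypothetical format).
    """
    hits = {}
    for read_id, transcript_id, mismatches in alignments:
        if mismatches <= max_mismatches:
            hits.setdefault(read_id, set()).add(transcript_id)
    counts = Counter()
    for transcripts in hits.values():
        if len(transcripts) == 1:  # reads hitting several transcripts (multihits) are discarded
            counts[next(iter(transcripts))] += 1
    return counts

# Invented alignments: r3 has too many mismatches, r4 is a multihit.
alignments = [("r1", "T1", 0), ("r2", "T1", 1), ("r3", "T2", 3),
              ("r4", "T1", 0), ("r4", "T2", 0)]
print(count_reads(alignments))
```

The resulting integer counts are the discrete response variable that the normalization and differential expression steps work on.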
-
[5] RNA Seq data analysis - Normalization
Two main sources of bias
Influence of length: counts are proportional to the transcript length times the mRNA expression level.
Influence of sequencing depth: the higher the sequencing depth, the higher the counts.
How to deal with this?
Normalize (correct) gene counts to minimize biases.
Use statistical models that take into account length and sequencing depth
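One widely used correction for both biases is RPKM (reads per kilobase of transcript per million mapped reads). The slides do not say which normalization the course uses, so the sketch below is only an example of the idea; the counts, lengths and library sizes are invented.

```python
def rpkm(count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (transcript_length_bp * total_mapped_reads)

# Two invented transcripts with the same expression level but different lengths:
# the longer one collects twice the reads, yet both get the same RPKM.
print(rpkm(500, 2000, 10_000_000))
print(rpkm(1000, 4000, 10_000_000))
# The same transcript sequenced twice as deeply also gets the same RPKM.
print(rpkm(1000, 2000, 20_000_000))
```

Dividing by length removes the length effect and dividing by the number of mapped reads removes the depth effect, which are the two sources of bias listed above.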
-
[5] RNA Seq - Differential expression methods
Fisher's exact test or similar approaches.
Use Generalized Linear Models and model counts using Poisson distribution. Negative binomial distribution.
Transform count data to use existing approaches for microarray data.
...
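As an example of the first approach in the list, the sketch below computes a two-sided Fisher's exact test p-value for one gene from a 2x2 table of read counts (gene reads versus all other reads, in two conditions). It is a plain standard-library implementation written for illustration; the function names and the example counts are invented, and a real analysis would normally rely on an established statistics package and correct for multiple testing.

```python
from math import lgamma, exp

def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def log_p(x):
        # Hypergeometric log-probability of seeing x in the top-left cell.
        return _log_comb(col1, x) + _log_comb(n - col1, row1 - x) - _log_comb(n, row1)

    observed = log_p(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p = sum(exp(log_p(x)) for x in range(lo, hi + 1) if log_p(x) <= observed + 1e-7)
    return min(1.0, p)

# Invented counts: 30 of 1,000,000 reads map to the gene in condition A, 120 of 1,000,000 in B.
print(fisher_exact_two_sided(30, 999_970, 120, 999_880))
```

The test treats each library as a fixed total, so it compares the proportions of reads assigned to the gene rather than the raw counts.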
-
[5] Advantages of RNA-seq
Unlike hybridization approaches, does not require existing genomic sequence
Expected to replace microarrays for transcriptomic studies
Very low background noise
Reads can be unambiguously mapped
Resolution up to 1 bp
High-throughput quantitative measurement of transcript abundance
Better than Sanger sequencing of cDNA or EST libraries
Cost decreasing all the time
Lower than traditional sequencing
Can reveal sequence variations (SNPs)
Automated pipelines available
-
Software for NGS preprocessing and analysis
-
Which software for NGS (data) analysis?
Answer is not straightforward.
Many possible classifications
Biological domains: SNP discovery, Genomics, ChIP-Seq, De-novo assembly, ...
Bioinformatics methods: Mapping, Assembly, Alignment, Seq-QC, ...
Technology: Illumina, 454, ABI SOLiD, Helicos, ...
Operating system: Linux, Mac OS X, Windows, ...
License type: GPLv
-
Some popular tools and places
-
Galaxy Site
http://galaxy.psu.edu/
-
Obtain data from many data sources including the UCSC Table Browser, BioMart, WormBase, or your own data
Prepare data for further analysis by rearranging or cutting data columns, filtering data and many other actions
Analyze data by finding overlapping regions, determining statistics, phylogenetic analysis and much more
-
Contains links to the downloading, pre-processing and analysis tools
Displays menus and data inputs
Shows the history of analysis steps, data and result viewing
Register User
-
Click Get Data
-
Get Data
from Database
-
Upload File
File Format
Upload or paste file
-
FASTQ file manipulation:
format conversion,
summary statistics,
trimming reads,
filtering reads by quality score
List saved histories and shared histories
Work on a current history, create new, share workflow
-
DATA VISUALIZATION
-
History of Genome Visualization
1800s 1900s 2000s
time
-
Why is visualization important?
make large amounts of data more interpretable
glean patterns from the data
sanity check / visual debugging
more