Curso de Genomica-Introduction to NGS Data Analysis-V3

  • 8/11/2019 Curso de Genomica-Introduction to NGS Data Analysis-V3

    NGS Data analysis http://ueb.ir.vhebron.net/NGS

    Introduction to NGS (Next Generation Sequencing) Data Analysis

    Statistics and Bioinformatics Research Group, Statistics Department, Universitat de Barcelona

    Statistics and Bioinformatics Unit, Vall d'Hebron Institut de Recerca

    Alex Sánchez

    Outline

    Introduction

    Bioinformatics challenges

    NGS data analysis: some examples and workflows (Metagenomics, De novo sequencing, Variant detection, RNA-seq)

    Software: Galaxy, Genome viewers

    Data formats and quality control

    Introduction

    Why is NGS revolutionary?

    NGS has brought high speed not only to genome sequencing and personal medicine; it has also changed the way we do genome research.

    Got a question on genome organization?

    SEQUENCE IT!!!

    Ana Conesa, bioinformatics researcher at Principe Felipe Research Center

    NGS means high sequencing capacity

    GS FLX 454 (ROCHE)

    HiSeq 2000 (ILLUMINA)

    5500xl SOLiD (ABI)

    Ion TORRENT

    GS Junior

    NGS Platforms Performance

    (Figure: 454 GS Junior, 35MB)

    454 Sequencing

    ABI SOLiD Sequencing

    Solexa Sequencing

    Applications of Next-Generation Sequencing

    Comparison of 2nd-generation NGS platforms

    Some numbers

    Platform | 454/FLX | Solexa (Illumina) | AB SOLiD
    Read length | ~… bp | 50 bp |
    Single reads | Yes | Yes | Yes
    Paired-end reads | Yes | Yes | Yes
    Long-insert (several Kbp) mate-paired reads | Yes | Yes | No
    Number of reads per instrument run | 5.00K | >100 M | 400 M
    Max data output | 0.5 Gbp | …0.5 Gbp | …0 Gbp
    Run time to 1 Gb | 6 Days | > 1 Day | > 1 Day
    Ease of use (workflow) | Difficult | Least difficult | Difficult
    Base calling | Flow space | Nucleotide space | Color space

    DNA applications | 454/FLX | Solexa (Illumina) | AB SOLiD
    Whole genome sequencing and resequencing | Yes | Yes | Yes
    De novo sequencing | Yes | Yes | Yes
    Targeted resequencing | Yes | Yes | Yes
    Discovery of genetic variants (SNPs, InDels, CNV, ...) | Yes | Yes | Yes
    Chromatin Immunoprecipitation (ChIP) | Yes | Yes | Yes
    Methylation analysis | Yes | Yes | Yes
    Metagenomics | Yes | No | No

    RNA applications | 454/FLX | Solexa (Illumina) | AB SOLiD
    Whole transcriptome | Yes | Yes | Yes
    Small RNA | Yes | Yes | Yes
    Expression tags | Yes | Yes | Yes

    Bioinformatics challenges of NGS

    I have my sequences/images. Now what?

    NGS pushes bioinformatics needs up

    Need for computer power

    VERY large text files (~10 million lines long): you can't do "business as usual" with familiar tools such as Perl/Python.

    Impossible memory usage and execution time

    Impossible to browse for problems

    Need sequence quality filtering

    Need for large amounts of CPU power

    Informatics groups must manage compute clusters

    Challenges in parallelizing existing software or redesigning algorithms to work in a parallel environment

    Need for bioinformatics power!!! The challenge shifts from data generation to data analysis!

    How should bioinformatics be structured? Bigger centralized bioinformatics services (or research groups providing service)?

    Distributed model: bioinformaticians must be part of the teams. Interoperability

    Data management issues

    Raw data are large. How long should they be kept?

    Processed data are manageable for most people: 20 million reads (50 bp) ~ 1 Gb

    More of an issue for a facility: HiSeq recommends
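
The "20 million reads (50 bp) ~ 1 Gb" figure is simple arithmetic, and it is worth seeing how it translates into file size on disk. This is a minimal sketch assuming uncompressed, single-end FASTQ and a hypothetical average header of 30 characters per read; real header lengths vary by instrument.

```python
def total_bases(n_reads: int, read_length_bp: int) -> int:
    """Total number of sequenced bases for fixed-length, single-end reads."""
    return n_reads * read_length_bp


def fastq_bytes(n_reads: int, read_length_bp: int, header_chars: int = 30) -> int:
    """Approximate size in bytes of an uncompressed FASTQ file.

    Each record has four lines (header, sequence, '+', qualities); every
    character is one byte and every line ends with one newline byte.
    header_chars is a hypothetical average header length, '@' included.
    """
    header_line = header_chars + 1
    sequence_line = read_length_bp + 1
    plus_line = 2  # '+' and newline
    quality_line = read_length_bp + 1  # one quality character per base
    return n_reads * (header_line + sequence_line + plus_line + quality_line)


bases = total_bases(20_000_000, 50)
size = fastq_bytes(20_000_000, 50)
print(f"{bases / 1e9:.1f} Gb of sequence, ~{size / 1e9:.1f} GB of FASTQ")
# prints: 1.0 Gb of sequence, ~2.7 GB of FASTQ
```

Under these assumptions each sequenced base costs roughly 2.7 bytes of raw FASTQ, which is why raw data grow much faster than the "1 Gb" of sequence suggests.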

    So what?

    In NGS we have to process really big amounts of data, which is not trivial in computing terms.

    Big NGS projects require supercomputing infrastructures.

    Or, put another way: not everyone can do everything. Small facilities must carefully choose projects that scale with their computing capabilities.

    Computational infrastructure for NGS

    There is great variety, but a good starting point is:

    Computing cluster: multiple nodes (servers) with multiple cores

    High performance storage (TB, PB level)

    Fast networks (10Gb ethernet, InfiniBand)

    Enough space and conditions for the equipment ("server room")

    Skilled people (sysadmins, developers)

    CNAG, in Barcelona: … people, more than 50% of them informaticians

    Alternatives (1): Cloud Computing

    Pros

    Flexibility. You pay for what you use.

    No need to maintain a data center.

    Cons

    Transferring big datasets over the internet is slow. You pay for consumed bandwidth, which is a problem with big datasets.

    Lower performance, especially in disk read/write.

    Privacy/security concerns.

    More expensive for big and long-term projects.

    Alternatives (2): Grid Computing

    Pros

    Cheaper.

    More resources available.

    Cons

    Heterogeneous environment.

    Slow connectivity (especially in Spain).

    Much time required to find good resources in the grid.

    In summary

    NGS arrived in 2007/8

    No-one predicted NGS in 2001 (ten years ago)

    Therefore we cannot predict what we will come up against

    TGS (third-generation sequencing) represents specific challenges:

    - Large data storage

    - Technology-aware software

    - Enables new assays and new science

    We would have said the same about NGS…

    These are not new problems, but they will require new solutions. There is a lag between technology and software…

    Bioinformatics and bioinformaticians

    The term bioinformatician means many things

    Some roles require a wide range of skills; others require depth in specific skills

    The best thing we can teach is the ability to learn and adapt

    The spirit of adventure. There is a definite skills shortage

    There always has been

    Increasing importance of data analysis needs

    NGS data analysis

    NGS data analysis stages

    Quality control and preprocessing of NGS data

    Data types

    Why QC and preprocessing?

    Sequencer output: reads + quality

    Natural questions:

    Is the quality of my sequenced data OK?

    If something is wrong, can I fix it?

    Problem: HUGE files... How do they look?

    Files are flat files and big... tens of Gbs (even hard to browse)

    Preprocessing sequences improves results

    How is quality measured?

    Sequencing systems usually assign a quality score to each peak (base call).

    Phred scores provide log10-transformed error probability values: if p is the probability that the base call is wrong, the Phred score is

    $Q = -10 \log_{10} p$

    score = 20 corresponds to a 1% error rate

    score = …
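
The Phred transform and its inverse each take one line. A minimal sketch, using only the definition Q = -10 log10 p given above:

```python
import math


def phred_from_error(p: float) -> float:
    """Phred score Q = -10 * log10(p), where p is the probability the base call is wrong."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse transform: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    p = error_from_phred(q)
    print(f"Q{q}: error rate {p:.2%}, accuracy {1 - p:.2%}")
```

Q20 maps to a 1% error rate, as on the slide, and every additional 10 points divides the error probability by ten.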

    Data formats

    Fasta format (everybody knows about it): header line starts with > followed by a sequence ID

    Sequence (string of nt).

    FastQ format (http://maq.sourceforge.net/fastq.shtml)

    First comes the sequence (like Fasta, but the header starts with @). Then a + line with the sequence ID (optional), and on the following line the QVs (quality values), encoded as single-byte ASCII codes

    Different quality encoding variants

    Nearly all downstream analysis tools take FastQ as input
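
To show how little structure FASTA has, here is a minimal reader sketch. It assumes plain-text input and keeps only the first word of the header as the ID; it is an illustration, not a replacement for established parsers such as Biopython's `SeqIO`.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from a FASTA stream.

    Sequences may span several lines; the lines are concatenated.
    Lines before the first '>' header are ignored.
    """
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            seq_id = words[0] if words else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


example = io.StringIO(">read1 sample A\nGATTTGGGGT\nTCAAAGCAGT\n>read2\nACGT\n")
for name, seq in read_fasta(example):
    print(name, len(seq), seq)
```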

    The fastq format

    A FASTQ file normally uses four lines per sequence.

    Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).

    Line 2 is the raw sequence letters.

    Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.

    Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence. Different encodings are in use.

    Sanger format can encode a Phred quality score from 0 to 93 using ASCII 33 to 126.

    @Seq description
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
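
The four-line layout and the Sanger encoding translate directly into a small parser. This sketch assumes Phred+33 (Sanger) encoding by default and single-line sequence and quality fields; other encoding variants would need a different `offset`.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str           # line 1 without the leading '@'
    sequence: str         # line 2
    qualities: List[int]  # line 4 decoded to Phred scores


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Parse four-line FASTQ records.

    offset=33 assumes Sanger / Phred+33 encoding (ASCII 33-126 -> Q 0-93).
    Multi-line sequence or quality fields are not supported.
    """
    lines = (raw.rstrip("\r\n") for raw in handle)
    for title in lines:
        if not title:
            continue  # tolerate blank lines between records
        seq = next(lines, None)
        plus = next(lines, None)
        qual = next(lines, None)
        if seq is None or plus is None or qual is None:
            raise ValueError(f"truncated record starting at {title!r}")
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at {title!r}")
        if len(qual) != len(seq):
            raise ValueError(f"sequence/quality length mismatch in {title!r}")
        yield FastqRecord(title[1:], seq, [ord(ch) - offset for ch in qual])


record = next(read_fastq(io.StringIO("@r1 demo\nGATT\n+\n!'*I\n")))
print(record.header, record.sequence, record.qualities)
# prints: r1 demo GATT [0, 6, 9, 40]
```

Reading exactly four lines per record, rather than scanning for lines that start with '@', matters because '@' (ASCII 64) is also a valid quality character.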

    Some tools to deal with QC

    Use FastQC to see your starting state.

    Use FASTX-Toolkit to optimize different datasets and then visualize the result with FastQC to prove your success!

    Hints: trimming, clipping and filtering may improve quality

    But beware of removing too many sequences…

    Go to the tutorial and try the exercises...
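
Trimming and filtering can be illustrated with a deliberately simple rule: cut bases from the 3' end while their Phred quality is below a threshold, then discard reads that become too short. The thresholds (`min_q=20`, `min_len=30`) are illustrative assumptions and this is not the algorithm used by FASTX-Toolkit. The sketch also reports the fraction of reads kept, which is the number to watch when worrying about removing too many sequences.

```python
from typing import Iterable, List, Tuple

Read = Tuple[str, List[int]]  # (sequence, Phred qualities)


def trim_3prime(seq: str, quals: List[int], min_q: int = 20) -> Read:
    """Drop bases from the 3' end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def trim_and_filter(reads: Iterable[Read], min_q: int = 20,
                    min_len: int = 30) -> Tuple[List[Read], float]:
    """Trim every read, keep those still >= min_len, and report the fraction kept."""
    kept: List[Read] = []
    total = 0
    for seq, quals in reads:
        total += 1
        t_seq, t_quals = trim_3prime(seq, quals, min_q)
        if len(t_seq) >= min_len:
            kept.append((t_seq, t_quals))
    return kept, (len(kept) / total if total else 0.0)


reads = [
    ("ACGTACGTAC", [38, 38, 37, 36, 35, 30, 22, 12, 8, 2]),
    ("TTTTGGGGCC", [10, 9, 8, 7, 6, 5, 4, 3, 2, 2]),
]
kept, fraction = trim_and_filter(reads, min_q=20, min_len=5)
print(kept, fraction)
```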

    Applications

    [1] Metagenomics

    [2] De novo sequencing

    [1] Metagenomics & other community-based omics

    Zoetendal E G et al. Gut 2008;57:1605-1615

    [1] Metagenomic Approaches

    SMALL-SCALE: 16S rRNA gene profiling

    The basic approach is to identify microbes in a complex community by exploiting universal and conserved targets, such as rRNA genes (Petrosino).

    LARGE-SCALE: Whole Genome Shotgun (WGS)

    Whole-genome approaches make it possible to identify and annotate microbial genes and their functions in the community.

    Environmental Shotgun Sequencing (ESS).

    A primer on metagenomics. PLoS Comput Biol. 2010 Feb 26;6(2):e1000667.

    Challenges and limitations: chimeric sequences caused by PCR amplification, and sequencing errors.

    Challenges and limitations:

    relatively large amounts of starting material required

    potential contamination of metagenomic samples with host genetic material

    high numbers of genes of unknown function.

    [1] A metagenomics workflow

    (Figure: short reads (40-150 bp) are assembled into contigs; binning groups sequences into species; gene prediction yields ORFs; homology searching assigns proteins, families and functions; functional classification with ontologies produces functional profiles.)
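
The binning step ("sequences into species") can be done in several ways. One family of methods is composition-based: genomes tend to have characteristic tetranucleotide (4-mer) frequencies. This sketch assigns a contig to the reference whose 4-mer profile is most similar by cosine similarity; the reference sequences and the similarity measure are illustrative assumptions, and practical binners combine composition with coverage and work on much longer contigs.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides


def kmer_profile(seq: str) -> List[float]:
    """Relative frequency of each ACGT 4-mer; 4-mers containing other letters are ignored."""
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[km] for km in KMERS)
    return [counts[km] / total if total else 0.0 for km in KMERS]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two profiles (0 when either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_bin(contig: str, references: Dict[str, str]) -> str:
    """Return the name of the reference whose 4-mer profile is most similar to the contig."""
    profile = kmer_profile(contig.upper())
    return max(references,
               key=lambda name: cosine(profile, kmer_profile(references[name].upper())))


refs = {"gc_rich": "GCGCGGCCGCGCGGCC" * 5, "at_rich": "ATATTAATATATTTAA" * 5}
print(assign_bin("GCGGCCGCGCGCGG", refs))
```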

    [1] Comparative Metagenomics

    Other software based on phylogenetic data includes UniFrac.

    MEGAN can also be used to compare the OTU composition of two or more frequency-normalized samples. MG-RAST provides a comparative functional and sequence-based analysis for uploaded samples.

    Comparing two or more metagenomes is necessary to understand how genomic differences affect, and are affected by, the abiotic environment.
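
Comparing frequency-normalized samples can be sketched in a few lines: convert OTU counts to relative abundances (so samples sequenced to different depths become comparable), then compute a dissimilarity. Bray-Curtis is used here only as one widely used example; it is an assumption of this sketch, not necessarily what MEGAN or MG-RAST compute, and the OTU counts are made up.

```python
from typing import Dict


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw OTU counts into relative abundances that sum to 1."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()} if total else {}


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared OTUs."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0.0) - b.get(o, 0.0)) for o in otus)
    den = sum(a.get(o, 0.0) + b.get(o, 0.0) for o in otus)
    return num / den if den else 0.0


sample_a = normalize({"OTU_1": 500, "OTU_2": 450, "OTU_3": 50})
sample_b = normalize({"OTU_2": 30, "OTU_3": 60, "OTU_4": 10})
print(f"Bray-Curtis(sample_a, sample_b) = {bray_curtis(sample_a, sample_b):.2f}")
```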

    [1] Some Metagenomics projects

    "Whole-genome shotgun sequencing": 78 million base pairs of unique DNA sequence were analyzed

    "Whole-genome shotgun sequencing" was applied to microbial populations: a total of 1.045 billion base pairs of nonredundant sequence were analyzed

    To date, 242 metagenomic projects are ongoing and 103 are completed (www.genomesonline.org).

    [2] De novo sequencing

    [3] Amplicon analysis

    Each amplicon (PCR product) is sequenced individually, allowing for the identification of rare variants and the assignment of haplotype information over the full sequence length.

    Some applications:

    Detection of low-frequency (<1%) variants in complex mixtures (rare somatic mutations, viral quasispecies...): ultra-deep amplicon sequencing

    Identification of rare alleles associated with hereditary diseases, heterozygote SNP calling...: ultra-broad amplicon sequencing

    Metabolic profiling of environmental habitats, bacterial taxonomy and phylogeny: 16S rRNA amplicon sequencing
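
Ultra-deep amplicon sequencing works because very high coverage of the same PCR product lets you estimate the frequency of each allele at each position, so variants well below 1% can be separated from the dominant allele. This sketch assumes reads already aligned to the amplicon without indels (read position = amplicon position); `min_count` is a hypothetical guard against single-read errors, and a real pipeline would also model the sequencing error rate.

```python
from collections import Counter
from typing import Dict, List


def low_frequency_variants(reads: List[str], position: int, ref_base: str,
                           max_freq: float = 0.01, min_count: int = 2) -> Dict[str, float]:
    """Non-reference alleles at one position seen in >= min_count reads but below max_freq."""
    counts = Counter(r[position] for r in reads
                     if position < len(r) and r[position] in "ACGT")
    depth = sum(counts.values())
    return {
        base: n / depth
        for base, n in counts.items()
        if base != ref_base and n >= min_count and n / depth < max_freq
    }


# 1000 reads covering one amplicon: 0.5% carry a G>T change at position 2.
reads = ["ACGT"] * 995 + ["ACTT"] * 5
print(low_frequency_variants(reads, position=2, ref_base="G"))
```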

    [3] Example of raw data generation with GS-FLX

    ...

    [3] Data workflow

    ...

    Data Processing

    [3] Final output examples

    ...

    Bar plots output example (with circular legend for the AA)

    NT substitution (error) matrices

    AA frequency tables

    [4] Variant discovery

    Your aligner decides the type/amount of variants you can identify

    Naive SNP calling: read counting

    Statistically supported SNP calling: maximum likelihood, Bayesian

    Quality score recalibration: recalibrate quality scores from the whole alignment

    Local realignment around indels: realign reads

    Known variants (limited species): dbSNP
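
Naive SNP calling by read counting can be made concrete: count the bases observed at a site and call the genotype from the fraction of non-reference reads. The depth cutoff and the 0.2/0.8 allele-fraction thresholds are illustrative assumptions; the statistically supported callers listed above (maximum likelihood, Bayesian) replace such hard cutoffs with genotype likelihoods that also use base qualities.

```python
from collections import Counter
from typing import Optional, Sequence


def naive_genotype(pileup: Sequence[str], ref: str, min_depth: int = 10,
                   het_low: float = 0.2, het_high: float = 0.8) -> Optional[str]:
    """Call a diploid genotype at one site by counting bases.

    pileup: bases observed at the site, one per overlapping read.
    Returns e.g. 'A/A', 'A/G', 'G/G', or None if coverage is too low.
    """
    ref = ref.upper()
    bases = [b.upper() for b in pileup if b.upper() in ("A", "C", "G", "T")]
    if len(bases) < min_depth:
        return None
    counts = Counter(bases)
    alt_counts = [(n, b) for b, n in counts.items() if b != ref]
    if not alt_counts:
        return f"{ref}/{ref}"
    alt_n, alt = max(alt_counts)  # most frequent non-reference base (ties broken by letter)
    fraction = alt_n / len(bases)
    if fraction < het_low:
        return f"{ref}/{ref}"
    if fraction <= het_high:
        return f"{ref}/{alt}"
    return f"{alt}/{alt}"


print(naive_genotype("AAAAAAAAAAGGGGGGGGG", ref="A"))
```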

    [4] Example: Exome Variant Analysis

    [4] Genotype calling tools

    [4] GATK pipeline

    [4] Many ongoing sequencing projects

    [5] Transcriptome Analysis using NGS

    RNA-Seq, or "Whole Transcriptome Shotgun Sequencing" ("WTSS"), refers to the use of HTS (high-throughput sequencing) technologies to sequence cDNA in order to get information about a sample's RNA content. Reads produced by sequencing are aligned to a reference genome to build transcriptome mappings.

    [5] Applications: Differential expression

    1. Reads are mapped to the reference genome or transcriptome.

    2. Mapped reads are assembled into expression summaries (tables of counts), showing how many reads fall in each coding region, exon, gene or junction.
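
Turning mapped reads into a table of counts is essentially an interval-overlap problem. This sketch counts, for each gene, the reads whose alignment overlaps it, and tallies reads that hit several genes or none separately. Coordinates are assumed 0-based and half-open, and the linear scan over genes is for clarity only; dedicated counting tools use interval indexes and richer rules for ambiguous reads.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based, end exclusive


def count_reads_per_gene(reads: Iterable[Interval], genes: Dict[str, Interval]) -> Counter:
    """Count reads overlapping exactly one gene; track ambiguous and unassigned reads."""
    counts: Counter = Counter({gene: 0 for gene in genes})
    for chrom, start, end in reads:
        hits = [gene for gene, (g_chrom, g_start, g_end) in genes.items()
                if g_chrom == chrom and start < g_end and g_start < end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 150, 300), "geneC": ("chr2", 0, 50)}
reads = [("chr1", 110, 140), ("chr1", 160, 190), ("chr1", 250, 280), ("chr2", 60, 90)]
print(dict(count_reads_per_gene(reads, genes)))
```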

    10 years or more of high-throughput data analysis

    [5] RNA Seq data analysis - Mapping

    Main issues:

    Number of allowed mismatches

    Number of multihits

    Mates' expected distance

    Considering exon junctions

    End up with a list of the number of reads per transcript

    These will be our (discrete) response variable

    [5] RNA Seq data analysis - Normalization

    Two main sources of bias

    Influence of length: counts are proportional to the transcript length times the mRNA expression level.

    Influence of sequencing depth: the higher the sequencing depth, the higher the counts.

    How to deal with this? Normalize (correct) gene counts to minimize biases.

    Use statistical models that take into account length and sequencing depth
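
One classic way to correct for both biases at once is RPKM (reads per kilobase of transcript per million mapped reads), widely used in early RNA-Seq work. It is shown here only as a concrete example of length and depth normalization, not as the method recommended in this course; the gene names, lengths and counts are invented.

```python
from typing import Dict, Optional


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = count / (length_kb * total_reads_millions)
         = count * 1e9 / (length_bp * total_reads)
    If total_mapped is not given, it is approximated by the sum of the counts.
    """
    total = total_mapped if total_mapped is not None else sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


# Two genes at the same expression level: the 3x longer one collects 3x the reads.
counts = {"short_gene": 100, "long_gene": 300}
lengths = {"short_gene": 1_000, "long_gene": 3_000}
print(rpkm(counts, lengths))
```

After normalization both genes get the same RPKM, which is exactly the length correction described above; dividing by total mapped reads handles the depth effect.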

    [5] RNA Seq - Differential expression methods

    Fisher's exact test or similar approaches.

    Use Generalized Linear Models and model counts using a Poisson or a negative binomial distribution.

    Transform count data to use existing approaches for microarray data.

    …
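
For a single gene, Fisher's exact test compares the gene's read count with the remaining reads in each condition through a 2x2 table. Below is a dependency-free sketch of the two-sided test (summing the hypergeometric probabilities of all tables no more likely than the observed one); `scipy.stats.fisher_exact` offers the same test. The counts are hypothetical, and note that this test treats reads as independent, which is one reason the Poisson and negative binomial models above are usually preferred when replicates are available.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    Margins are held fixed; the p-value sums the hypergeometric probabilities
    of every table whose probability is <= that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(px for px in (prob(x) for x in range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: reads on one gene vs. all other reads, in conditions A and B.
gene_a, total_a = 120, 1_000_000
gene_b, total_b = 60, 1_000_000
p = fisher_exact_two_sided(gene_a, total_a - gene_a, gene_b, total_b - gene_b)
print(f"p = {p:.2e}")
```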

    [5] Advantages of RNA-seq

    Unlike hybridization approaches, it does not require existing genomic sequence

    Expected to replace microarrays for transcriptomic studies

    Very low background noise; reads can be unambiguously mapped

    Resolution up to 1 bp; high-throughput quantitative measurement of transcript abundance

    Better than Sanger sequencing of cDNA or EST libraries

    Cost decreasing all the time; lower than traditional sequencing

    Can reveal sequence variations (SNPs); automated pipelines available

    Software for NGS preprocessing and analysis

    Which software for NGS (data) analysis?

    The answer is not straightforward.

    Many possible classifications:

    Biological domains: SNP discovery, Genomics, ChIP-Seq, De-novo assembly, …

    Bioinformatics methods: Mapping, Assembly, Alignment, Seq-QC, …

    Technology: Illumina, 454, ABI SOLiD, Helicos, …

    Operating system: Linux, Mac OS X, Windows, …

    License type: GPLv2

    So#e popular tools and places

    Galaxy Site

    http://galaxy.psu.edu/

    Obtain data from many data sources, including the UCSC Table Browser, BioMart, WormBase, or your own data

    Prepare data for further analysis by rearranging or cutting data columns, filtering data and many other actions

    Analyze data by finding overlapping regions, determining statistics, phylogenetic analysis and much more

    Contains links to the downloading, pre-processing and analysis tools

    Displays menus and data inputs

    Shows the history of analysis steps, data and result viewing

    Register User

    Click Get Data

    Get Data from Database

    Upload File: File Format

    Upload or paste file

    FASTQ: file manipulation, format conversion, summary statistics, trimming reads, filtering reads by quality score

    List saved histories and shared histories

    Work on the current history: create new, share, or workflow

    DATA VISUALIZATION

    History of Genome Visualization

    History of Genome Visualization

    (Figure: timeline spanning the 1800s, 1900s and 2000s)

    Why is visualization important?

    Make large amounts of data more interpretable

    Glean patterns from the data

    Sanity check (visual debugging)

    And more