Dr. Sonika Tyagi, - Australia Bioinformatics Resource · Dr Philippa Griffin Open Data Coordinator...
A/Prof Vicky Schneider, Deputy Director · A/Prof Andrew Lonie, Director · Dr Philippa Griffin, Open Data Coordinator
Dr. Sonika Tyagi, Bioinformatics Supervisor, AGRF
Training Coordination, EMBL-ABR
Outline
• Stacks pipeline workflow
  – Single vs PE RADseq
  – Overview of different steps in the pipeline
  – De novo vs reference-based SNP calls
  – Outputs
Single-end RAD
Slides from Yvan le Bras, Anthony Bretaudeau, Cyrin Monjeaud, Gildas le Corguille at CESCO (Centre d'Écologie et des Sciences de la Conservation), IFB (French Institute of Bioinformatics). Some slides originally from Karim Gharbi, Edinburgh Genomics, University of Edinburgh.
RAD vs ddRAD
• classic RAD: reads between the restriction site and a random site (shearing/sonication)
• ddRAD: reads between the 2 restriction sites, so more flexibility in balancing coverage (number of loci) against depth of coverage
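The ddRAD flexibility comes from choosing which fragments to keep: only fragments flanked by one site of each enzyme are retained, and a size-selection window decides how many loci survive (and so, for a fixed sequencing effort, how deep each is covered). A minimal in-silico sketch, assuming a hypothetical enzyme pair (the EcoRI GAATTC and MspI CCGG recognition sequences) and measuring fragment length between site start positions rather than true cut points; the function names are illustrative only:

```python
import re

# Hypothetical enzyme pair, chosen only for illustration (recognition sequences).
RARE_SITE = "GAATTC"   # EcoRI, a rare cutter
COMMON_SITE = "CCGG"   # MspI, a frequent cutter


def site_positions(seq, site):
    """Start positions of every occurrence of `site` (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def ddrad_fragments(seq, min_len, max_len):
    """Return (start, end, length) for fragments flanked by one rare and one
    common site whose length falls inside the size-selection window.

    Simplification: length is measured between site start positions,
    not between the enzymes' actual cut points.
    """
    cuts = sorted(
        [(pos, "rare") for pos in site_positions(seq, RARE_SITE)]
        + [(pos, "common") for pos in site_positions(seq, COMMON_SITE)]
    )
    kept = []
    # Each pair of adjacent cut sites delimits one fragment.
    for (start, kind_a), (end, kind_b) in zip(cuts, cuts[1:]):
        length = end - start
        if kind_a != kind_b and min_len <= length <= max_len:
            kept.append((start, end, length))
    return kept


if __name__ == "__main__":
    toy = "GAATTC" + "A" * 100 + "CCGG" + "T" * 300 + "GAATTC" + "A" * 50 + "CCGG"
    print(ddrad_fragments(toy, 50, 200))   # window keeps the two short fragments
    print(ddrad_fragments(toy, 200, 400))  # window keeps only the long fragment
```

Shifting the window changes which loci are sequenced without changing the enzymes, which is the extra control ddRAD offers over classic RAD.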
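Biases
• Because all reads begin with [half of] the restriction site, every cluster shows the same bases during the first cycles (low base diversity)
• Consequence:
  – The Illumina sequencer has difficulty separating polonies/clusters during the imaging step of the first cycles
• Solution:
  – use a set of barcodes of different lengths
  – mix in libraries from different experiments that use different restriction enzymes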
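The low-diversity problem can be checked by computing, for each cycle, the share of reads that carry the most common base. A minimal sketch, assuming a hypothetical 6-base restriction remnant (TGCAGG) and toy barcodes; the genomic part is held constant only to keep the example small:

```python
from collections import Counter

REMNANT = "TGCAGG"     # hypothetical restriction-site remnant at the read start
GENOMIC = "ACGTACGT"   # held constant only to keep the toy example small


def max_base_fraction_per_cycle(reads):
    """For each cycle, the fraction of reads sharing the most common base.

    Values near 1.0 mean almost no base diversity at that cycle, which is
    what makes cluster separation hard.
    """
    n_cycles = min(len(r) for r in reads)
    fractions = []
    for i in range(n_cycles):
        counts = Counter(r[i] for r in reads)
        fractions.append(counts.most_common(1)[0][1] / len(reads))
    return fractions


# Same-length barcodes: the remnant starts at the same cycle in every read.
fixed_reads = [bc + REMNANT + GENOMIC for bc in ["AAAAA", "CCCCC", "GGGGG", "TTTTT"]]
# Barcodes of different lengths stagger the remnant across cycles.
staggered_reads = [bc + REMNANT + GENOMIC for bc in ["AAAA", "CCCCC", "GGGGGG", "TTTTTTT"]]

print("fixed:    ", max_base_fraction_per_cycle(fixed_reads))
print("staggered:", max_base_fraction_per_cycle(staggered_reads))
```

With same-length barcodes, cycles 5 to 10 show a fraction of 1.0 (every read has the same base); with staggered barcodes no cycle in this toy set exceeds 0.5.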
Main Bioinformatics pipelines
• STACKS
  – Website: http://catchenlab.life.illinois.edu/stacks/
  – mbRAD, ddRAD, ezRAD & 2bRAD?
  – STACKS does not handle INDELs, so any locus near an INDEL is lost
  – STACKS does not call SNPs from paired-end reads natively, and does especially poorly with paired-end fragments that are not of a random length (e.g., ddRAD and ezRAD)
• dDocent
  – Website: https://ddocent.wordpress.com/ddocent-pipeline-user-guide/
  – ddRAD & ezRAD
• PyRAD
  – Website: http://dereneaton.com/software/pyrad/
  – mbRAD, ddRAD, PE-ddRAD, GBS, PE-GBS, EzRAD, PE-EzRAD, 2B-RAD
  – uses an alignment-clustering method (vsearch)
• 2bRAD (Wang et al 2012)
  – de novo: https://github.com/z0on/2bRAD_denovo
  – With reference genome: https://github.com/z0on/2bRAD_GATK
http://catchenlab.life.illinois.edu/stacks
J. Catchen, A. Amores, P. Hohenlohe, W. Cresko, and J. Postlethwait. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics, 1:171-182, 2011.
Stacks
STACKS: piling similar reads together
A. Pile exact matches together
B. Make a dictionary based on k-mers
C. Match reads based on k-mer similarity
Things to remember:
1. Stacks is not optimized for reads of different lengths (all reads from different barcodes should be trimmed uniformly).
2. PCR duplicates are not recognizable.
3. INDELs are not handled*
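Steps A to C can be illustrated with a toy version: collapse identical reads into stacks, index every stack by its k-mers, and compare only stacks that share a k-mer. This is a sketch of the idea, not Stacks' implementation; `min_depth` and `max_mismatches` are hypothetical parameter names that play roughly the roles of the ustacks minimum stack depth (`-m`) and maximum distance between stacks (`-M`):

```python
from collections import Counter, defaultdict
from itertools import combinations


def pile_exact_matches(reads, min_depth=3):
    """Step A: collapse identical reads into stacks, keeping deep enough ones."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_depth}


def kmer_dictionary(stacks, k):
    """Step B: map every k-mer to the set of stacks that contain it."""
    index = defaultdict(set)
    for seq in stacks:
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq)
    return index


def hamming(a, b):
    """Mismatch count; only defined for equal-length (uniformly trimmed) reads."""
    if len(a) != len(b):
        raise ValueError("reads must be trimmed to a uniform length")
    return sum(x != y for x, y in zip(a, b))


def candidate_pairs(stacks, k, max_mismatches):
    """Step C: stacks sharing a k-mer are compared; close pairs are kept."""
    index = kmer_dictionary(stacks, k)
    pairs = set()
    for members in index.values():
        pairs.update(combinations(sorted(members), 2))
    return sorted(p for p in pairs if hamming(*p) <= max_mismatches)
```

The `hamming` check also shows why uneven read lengths are a problem: a mismatch count is only meaningful between sequences of the same length.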
STACKS: piling similar reads together
D. Matches secondary reads that were not initially placed in a stack against putative loci, to increase stack depth.
E. Calls a consensus sequence and records SNP and haplotype data.
F. Puts consensus sequences into the catalogue.
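Step E can be sketched per alignment column: take the majority base as the consensus and test whether the column looks heterozygous. The test below is a simplified likelihood-ratio test in the style of the multinomial model of Hohenlohe et al. (2010), on which Stacks' SNP model is based; the fixed error rate of 0.01 and the function names are assumptions for illustration, and this is not Stacks' code:

```python
import math
from collections import Counter

CHI2_1DF_95 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def call_site(bases, error_rate=0.01):
    """Return 'hom', 'het' or 'unknown' for one column of bases."""
    counts = sorted(Counter(bases).values(), reverse=True) + [0, 0, 0]
    n, n1, n2 = len(bases), counts[0], counts[1]
    e = error_rate  # assumed fixed sequencing error rate
    # Homozygote: the top base is true, every other read is an error.
    ln_hom = n1 * math.log(1 - 3 * e / 4) + (n - n1) * math.log(e / 4)
    # Heterozygote: the top two bases are true, the remaining reads are errors.
    ln_het = (n1 + n2) * math.log(0.5 - e / 4) + (n - n1 - n2) * math.log(e / 4)
    lrt = 2 * (ln_het - ln_hom)
    if lrt > CHI2_1DF_95:
        return "het"
    if lrt < -CHI2_1DF_95:
        return "hom"
    return "unknown"


def consensus_and_snps(stack_reads, error_rate=0.01):
    """Majority-base consensus plus the positions called heterozygous."""
    consensus, snps = [], []
    # zip(*...) walks columns; it silently stops at the shortest read,
    # another reason reads should be trimmed uniformly.
    for pos, column in enumerate(zip(*stack_reads)):
        consensus.append(Counter(column).most_common(1)[0][0])
        if call_site(column, error_rate) == "het":
            snps.append(pos)
    return "".join(consensus), snps
```

At a heterozygous column the "majority" base is a tie, so the consensus simply keeps the first base seen; the SNP list records that the position is polymorphic.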
Results
• Building loci: generates 3 files per sample:
  – sample_BARCODE.alleles.tsv
  – sample_BARCODE.snps.tsv
  – sample_BARCODE.tags.tsv
• Cataloguing of observed SNPs:
  – batch_1001.catalog.alleles.tsv
  – batch_1001.catalog.snps.tsv
  – batch_1001.catalog.tags.tsv
• Verifying individual samples against catalogue
  – batch_1001.catalog.matches.tsv
  – sample_BARCODE.matches.tsv
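The cataloguing and verification stages can be pictured as: merge every sample's consensus loci into one catalogue (allowing a few mismatches), then look up each sample's loci in that catalogue. A simplified sketch only; the mismatch threshold, the first-seen-locus-as-representative rule, and the function names are assumptions, not how Stacks stores or matches loci:

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length loci."""
    if len(a) != len(b):
        raise ValueError("loci must have the same length")
    return sum(x != y for x, y in zip(a, b))


def build_catalog(samples, max_mismatches=1):
    """Merge per-sample consensus loci; list index = catalogue ID.

    The first locus seen for a catalogue entry is kept as its representative.
    """
    catalog = []
    for loci in samples.values():
        for locus in loci:
            if not any(mismatches(locus, rep) <= max_mismatches for rep in catalog):
                catalog.append(locus)
    return catalog


def match_to_catalog(loci, catalog, max_mismatches=1):
    """Map each sample locus to the first catalogue ID it matches."""
    matches = {}
    for locus in loci:
        for cat_id, rep in enumerate(catalog):
            if mismatches(locus, rep) <= max_mismatches:
                matches[locus] = cat_id
                break
    return matches


samples = {"s1": ["AAAA", "CCCC"], "s2": ["AAAT", "GGGG"]}
catalog = build_catalog(samples)
print(catalog)
print(match_to_catalog(["AAAT", "GGGG", "TTTT"], catalog))
```

Loci that match no catalogue entry (here `TTTT`) are simply left out of the result, which is the kind of information the matches files record per sample.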
So I've got my SNPs… what next?
• What is your research question?
• Are you interested in
  – Population structure
  – Genetic diversity
  – Phylogeography
  – Phylogenetic history
  – ???
'Typical' downstream analysis workflow
• If it's the first time you're working with this species and library design:
  – Explore haplotype and SNP calls in the Stacks interface to assess the effects of parameter settings
• Export a fairly 'permissive' vcf file from the Stacks populations program
• Further project-specific filtering using vcftools or custom scripts
  – E.g. 'Exclude individuals with missing data at >50% of loci; then exclude all loci missing in >30% of individuals per population'
• Data visualisation is useful in assessing filtering
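The example filter can be written as a custom script. A minimal sketch, assuming genotypes have already been parsed from the vcf into a dict of `individual -> {locus: genotype}` (missing calls stored as `None` or absent) and a population map; "per population" is interpreted here as dropping a locus if it fails the threshold in any population, which is an assumption:

```python
def filter_missing(genotypes, popmap, max_ind_missing=0.5, max_locus_missing=0.3):
    """Return (kept individuals, kept loci) after the two-step missing-data filter."""
    loci = sorted({locus for calls in genotypes.values() for locus in calls})

    # Step 1: exclude individuals missing data at more than 50% of loci.
    kept_inds = [
        ind for ind, calls in genotypes.items()
        if sum(calls.get(locus) is None for locus in loci) / len(loci) <= max_ind_missing
    ]

    # Group the remaining individuals by population.
    pops = {}
    for ind in kept_inds:
        pops.setdefault(popmap[ind], []).append(ind)

    # Step 2: exclude loci missing in more than 30% of individuals of any population.
    def missing_rate(locus, members):
        return sum(genotypes[i].get(locus) is None for i in members) / len(members)

    kept_loci = [
        locus for locus in loci
        if all(missing_rate(locus, members) <= max_locus_missing for members in pops.values())
    ]
    return kept_inds, kept_loci
```

The order matters: removing poorly sequenced individuals first stops them from dragging good loci over the per-locus threshold.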
Some ideas for downstream analysis
– Population structure
  • Can obtain F-statistics from the populations program itself
  • F-statistics in GENEPOP (populations exports GENEPOP format)
  • F-statistics in R (e.g. adegenet package)
  • More complex clustering approaches to explore structure without a priori population assignments
    – PCA, DAPC
    – STRUCTURE
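To see what an F-statistic summarises, here is a single-SNP calculation. A minimal sketch using the textbook Nei-style form F_ST = (H_T − H_S) / H_T with equal population weights and no sample-size correction, so it will not reproduce the estimators used by populations, GENEPOP or adegenet; genotypes are assumed to be vcf-style strings (`0/0`, `0/1`, `./.`), and the function names are illustrative:

```python
def alt_allele_freq(genotypes):
    """Frequency of the '1' allele among called vcf-style genotypes."""
    alleles = [
        a for g in genotypes
        for a in g.replace("|", "/").split("/")
        if a != "."  # skip missing alleles
    ]
    return alleles.count("1") / len(alleles)


def fst_biallelic(pop_freqs):
    """Nei-style F_ST for one biallelic SNP from per-population allele frequencies."""
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in pop_freqs) / len(pop_freqs)
    # H_T: expected heterozygosity of the pooled (equally weighted) populations.
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


pop_a = ["0/0", "0/0", "0/1", "./."]
pop_b = ["1/1", "0/1", "1/1", "1/1"]
print(fst_biallelic([alt_allele_freq(pop_a), alt_allele_freq(pop_b)]))
```

Fixed differences between populations give 1.0 and identical frequencies give 0.0; genome-wide values are usually averaged or ratio-combined across many SNPs.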
– Genetic diversity
  • populations can output heterozygosity, pi and FIS per population
  • Alternatively, output as vcf and calculate in R or other software
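If you calculate these statistics yourself, the per-site definitions are short. A minimal sketch for one SNP in one population, assuming vcf-style genotype strings; it uses H_e = 1 − Σp², per-site pi = n/(n − 1) · H_e (n = number of sampled alleles) and F_IS = 1 − H_o/H_e, and the values may differ from what populations reports if it applies other corrections:

```python
from collections import Counter


def diversity_stats(genotypes):
    """Observed/expected heterozygosity, per-site pi and F_IS for one SNP."""
    # Keep fully called genotypes only, split into allele pairs.
    calls = [g.replace("|", "/").split("/") for g in genotypes if "." not in g]
    alleles = [a for call in calls for a in call]
    n = len(alleles)
    counts = Counter(alleles)

    h_exp = 1 - sum((c / n) ** 2 for c in counts.values())
    h_obs = sum(call[0] != call[1] for call in calls) / len(calls)
    pi = n / (n - 1) * h_exp if n > 1 else 0.0
    f_is = 1 - h_obs / h_exp if h_exp > 0 else float("nan")
    return {"Ho": h_obs, "He": h_exp, "pi": pi, "Fis": f_is}


print(diversity_stats(["0/0", "0/1", "0/1", "1/1"]))  # Hardy-Weinberg-like sample
print(diversity_stats(["0/0", "0/0", "1/1", "1/1"]))  # heterozygote deficit
```

A positive F_IS (fewer heterozygotes than expected) can reflect inbreeding or hidden substructure, but with RAD data it can also come from allele dropout or low-depth genotype calls.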
– Phylogeography
  • Output as vcf, use tools in R or other software
– Phylogeny
  • Output as fasta or phylip format; easily converted to nexus (for example …)
  • Use any number of tree-building approaches: BEAST, MrBayes, SplitsTree, PAUP...
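The phylip-to-nexus conversion is simple enough to script. A minimal sketch, assuming a sequential phylip file with one line per taxon (name, whitespace, sequence) and DNA data; interleaved phylip is not handled, the `MISSING=?` and `GAP=-` symbols are assumptions you may need to adjust to your export, and the function name is illustrative:

```python
def phylip_to_nexus(phylip_text):
    """Convert sequential (relaxed) phylip text into a NEXUS DATA block."""
    lines = [line for line in phylip_text.strip().splitlines() if line.strip()]
    n_taxa, n_chars = map(int, lines[0].split())
    out = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={n_taxa} NCHAR={n_chars};",
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for line in lines[1:1 + n_taxa]:
        name, seq = line.split(None, 1)
        seq = "".join(seq.split())  # drop any internal or trailing whitespace
        if len(seq) != n_chars:
            raise ValueError(f"{name}: expected {n_chars} characters, got {len(seq)}")
        out.append(f"    {name} {seq}")
    out += ["  ;", "END;"]
    return "\n".join(out) + "\n"


print(phylip_to_nexus("2 4\ns1 ACGT\ns2 ACGA\n"))
```

For anything beyond this simple case, a dedicated converter (e.g. Biopython's AlignIO) is the safer choice.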
Public Galaxy Servers
• Training: http://galaxy-tut.genome.edu.au
• Research work: http://galaxy-mel.genome.edu.au, http://galaxy-qld.genome.edu.au, http://usegalaxy.org
• List of other public Galaxy servers: https://wiki.galaxyproject.org/PublicGalaxyServers
galaxy-mel (galaxy-mel.genome.edu.au)
• Galaxy server for (primarily Melbourne) researchers
• Available to everyone
• Users get 100GB disk but can get more
• Helpdesk available by email
• Stacks software installed and available.