Mapping RNA sequence data Part 1: RNA-Rocket RNAseq pipeline · reference genome choose Plasmodium...
Mapping RNA sequence data Part 1: RNA-Rocket RNAseq pipeline
The goal of this exercise is to retrieve an RNA-seq dataset in FASTQ format and run it through an RNA-sequence analysis pipeline. We will be using Pathogen Portal’s RNA-Rocket, which includes a workflow for mapping RNA-Seq reads to a reference genome, using this mapping to assemble transcripts, mapping transcripts to existing annotations, and determining expression levels. The mapping workflow uses two algorithms: TopHat for aligning reads and Cufflinks for transcript prediction and calculating expression levels. The input required is FASTQ files, and the outputs are read alignments (BAM files) and tab-delimited assembly and expression files for known genes, isoforms and novel transcripts.

1. Create an account on RNA Rocket
a. Go to http://rnaseq.pathogenportal.org/
b. Click on Create an Account and fill in the required information.
Click here to create an account or log in to your existing account
2. Upload the RNA sequencing reads to your RNA Rocket launchpad. RNA Rocket allows you to directly retrieve FASTQ files of the sequencing reads using SRA accession numbers.
a. Background: This exercise will rely on data deposited in the Sequence Read Archive (SRA).
The data is based on transcriptomic analysis of three developmental stages of Plasmodium falciparum:
1. Salivary gland sporozoites
2. Cultured sporozoites, and
3. Cultured asexual stages.
Each developmental stage was assayed by RNA sequencing (2 replicates per sample). The study accession number for this data on SRA is SRP033414, and additional information about this experiment may be obtained from GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52867

Examining the information available in GEO and under the SRA accession numbers, you will notice that this data is paired end. So for each sample there should be two files, one for each of the pairs. More information for each sequencing run can be found at:
Salivary gland sporozoites sample 1: http://www.ncbi.nlm.nih.gov/sra/SRX385640
Salivary gland sporozoites sample 2: http://www.ncbi.nlm.nih.gov/sra/SRX385641
Cultured sporozoites sample 1: http://www.ncbi.nlm.nih.gov/sra/SRX385642
Cultured sporozoites sample 2: http://www.ncbi.nlm.nih.gov/sra/SRX385643
Asexual stage parasites sample 1: http://www.ncbi.nlm.nih.gov/sra/SRX385644
Asexual stage parasites sample 2: http://www.ncbi.nlm.nih.gov/sra/SRX385645

The required input file for RNA Rocket’s analysis pipeline is a FASTQ file, a text file (similar to FASTA) that includes sequence quality information and details in addition to the sequence (i.e. name, quality scores, sequencing machine ID, lane number etc.). FASTQ files are large, and as a result not all sequencing repositories will store this format. However, tools are available to convert, for example, NCBI’s SRA format to FASTQ. Sequence data is housed in three repositories that are synchronized on a regular basis.
▪ The Sequence Read Archive at GenBank
▪ The European Nucleotide Archive at EMBL
▪ The DNA Data Bank of Japan
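To make the FASTQ description above concrete, here is a minimal Python sketch of reading FASTQ records. It assumes the common four-lines-per-record layout (header starting with "@", sequence, a "+" separator line, quality string) and Phred+33 quality encoding. Neither assumption is stated in this exercise, and the two reads in the example are made up rather than taken from SRP033414.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # read identifier (text after the leading '@')
    sequence: str  # base calls
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in " + header.strip())
        yield FastqRecord(header[1:].strip(), sequence, quality)


def mean_phred(record: FastqRecord, offset: int = 33) -> float:
    """Mean quality score of a non-empty read, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in record.quality) / len(record.quality)


# Made-up two-read example for illustration only.
EXAMPLE = "@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCA\n+\n!!!!\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))
for rec in records:
    print(rec.name, rec.sequence, mean_phred(rec))
```

Real FASTQ files from ENA are usually gzip-compressed, so in practice the handle would come from something like `gzip.open(path, "rt")` rather than an in-memory string.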
b. Upload data into your Launchpad. Note: During this exercise you will NOT download any data to your computer. Instead you will be providing information to enable transferring data from ENA/SRA to RNA-Rocket.
i. Click on the “Launch Pad” link in the Galaxy menu bar. Then select “From ENA/SRA”.
ii. On the next page, notice the instructions to use the global search on the ENA site. Click on continue.
iii. Copy and paste the study accession number (SRP033414) into the search box (see red circle below). Click on the search icon.
Note: Depending on RNA-Rocket’s configuration you may be taken to the EBI search results page, where you will need to click on the Study ID link in order to get to the study page. If your page looks like the second screenshot, please proceed to iv.
iv. Click on the link for File 1 in the column called “Fastq files (galaxy)” for the sample assigned to your group, then click on the back button on your browser and click on the link for File 2 from the same sample. This will begin the file transfer to RNA-Rocket. You may need to scroll down to see the Read Files tab, which contains the Fastq files (galaxy) column that you need. You will need to get 2 files, one for each file generated by the paired-end sequencing.
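Because every paired-end run contributes two FASTQ files, it can help to check that both mates arrived before starting the pipeline. The following Python sketch groups file names into pairs. It assumes the files are named `<run>_1.fastq(.gz)` and `<run>_2.fastq(.gz)`, and the run accession shown is hypothetical, not one of the runs in SRP033414.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed naming convention: <run>_1.fastq(.gz) and <run>_2.fastq(.gz).
MATE_PATTERN = re.compile(r"^(?P<run>.+)_(?P<mate>[12])\.fastq(?:\.gz)?$")


def pair_mates(filenames: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Group FASTQ file names into (mate 1, mate 2) tuples keyed by run."""
    mates = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unrecognised FASTQ name: {name}")
        mates[match.group("run")][match.group("mate")] = name

    pairs = {}
    for run, found in mates.items():
        if set(found) != {"1", "2"}:
            raise ValueError(f"run {run} is missing a mate file")
        pairs[run] = (found["1"], found["2"])
    return pairs


# Hypothetical run accession, used only for illustration.
uploaded = ["SRR0000001_2.fastq.gz", "SRR0000001_1.fastq.gz"]
pairs = pair_mates(uploaded)
print(pairs)
```

Raising an error when a mate is missing is a deliberate choice here: a paired-end analysis started with only one of the two files would not be meaningful.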
You should now see a window that looks similar to this:
To view the progress of your upload, click on “Project View” (red square in image above).
You can inspect the contents of completed tasks (like uploaded files) by clicking on the eye icon next to the name of the file (arrow in above image). Inspecting a FASTQ file should look like this:
c. Configure and initiate the RNA sequence analysis pipeline.
i. Background: Pathogen Portal uses two algorithms, one for mapping (TopHat) and one for transcript prediction and expression value calculation (Cufflinks). Note that there are many algorithms and methods for RNA-seq mapping and analysis, each with its advantages and disadvantages. You are encouraged to learn more about the algorithm you are using.
o TopHat: http://tophat.cbcb.umd.edu/
o Cufflinks: http://cufflinks.cbcb.umd.edu/index.html
ii. Navigate to the workflow. Click on the “Launch Pad” link in the upper menu bar. On the next page, scroll down to the “RNA-Seq Analysis” section and click on “Map Reads & Assemble Transcripts”.
iii. Select Analysis Type. On the next page, scroll down and choose Eukaryotic Paired-End Analysis under Select Analysis Type. We are analyzing a paired-end eukaryotic sample.
iv. Select the target project from the drop-down menu. You should only have one or two projects, one of which will contain both FASTQ files you uploaded (probably called “Uploaded Files”). Once you select the correct project you should see the two FASTQ files contained within it. Next click on continue.
v. Configure the pipeline. The pipeline consists of 7 steps.
Step 1: Input dataset – Select the upstream read file (ends in _1) and click on the arrow to move it to the “Selected” window.
Step 2: Input dataset – Select the downstream read file (ends in _2) and click on the arrow to move it to the “Selected” window.
Step 3: TopHat2 – Under Select a reference genome choose Plasmodium falciparum 3D7. There are a number of options that may be modified; however, for the purposes of this exercise the default parameters may be used.
Step 4: Cufflinks – Set the Maximum Intron Length (-I): 5000. The reference annotation should be automatically selected: Plasmodium falciparum 3D7. Select how to use the provided annotation: Assemble Novel + annotated transcripts.
Once again there are a number of options to modify, but we only need to change the Maximum Intron Length.
Step 5: BAM to BigWig – No change needed
Step 6: BAM to BigWig – No change needed
Step 7: Create a BedGraph of genome coverage – No change needed
Click on the Run Workflow button.
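For readers curious what the TopHat2 and Cufflinks steps might look like outside the web interface, the sketch below builds roughly equivalent command lines in Python without running them. The exact commands RNA-Rocket issues are not shown in this exercise, so this is an approximation under stated assumptions: the index prefix, file names and output directories are hypothetical, and the `-g` (GTF guide) option is assumed to be the command-line counterpart of “Assemble Novel + annotated transcripts”.

```python
import shlex
from typing import List


def tophat2_command(index_prefix: str, reads_1: str, reads_2: str,
                    out_dir: str) -> List[str]:
    """TopHat2 call with default parameters for one paired-end sample."""
    return ["tophat2", "-o", out_dir, index_prefix, reads_1, reads_2]


def cufflinks_command(bam: str, annotation_gtf: str, out_dir: str,
                      max_intron_length: int = 5000) -> List[str]:
    """Cufflinks call with the maximum intron length set via -I.

    -g passes the annotation as a guide, so annotated and novel
    transcripts can both be assembled (assumed to match the
    'Assemble Novel + annotated transcripts' setting).
    """
    return ["cufflinks", "-o", out_dir,
            "-I", str(max_intron_length),
            "-g", annotation_gtf,
            bam]


# Hypothetical file names and index prefix, for illustration only.
tophat_cmd = tophat2_command("Pf3D7_index", "reads_1.fastq",
                             "reads_2.fastq", "tophat_out")
cufflinks_cmd = cufflinks_command("tophat_out/accepted_hits.bam",
                                  "Pf3D7.gtf", "cufflinks_out")

print(shlex.join(tophat_cmd))
print(shlex.join(cufflinks_cmd))
```

Building the commands as lists rather than strings keeps file names with spaces safe if they are later handed to `subprocess.run`.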
After you start the workflow you should get a confirmation window listing all the steps that have been added to the queue. The progress of your workflow can be viewed to the right. Completed tasks are in green, running tasks are in yellow and tasks waiting in the queue are in grey. The workflow will run overnight, and we will view the results and calculate differential expression in a subsequent exercise.