Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf ·...

26
Install and run external command line softwares Yanbin Yin 1

Transcript of Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf ·...

Page 1: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

Install and run external command line softwares

YanbinYin

1

Page 2: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

Homework#8Createafolderunderyourhomecalledhw8

Changedirectorytohw8

DownloadEscherichia_coli_K_12_substr__MG1655_uid57779faa filefromNCBIandcopyitasmg1655.faa

DownloadEscherichia_coli_O157_H7_uid57781allfaa files fromNCBIandconcatenatethemaso157h7.faa

Createunix one-linerstocounthowmanyproteinsarethereinthetwonewlymadefaa files

Createunix one-linerstodothefollowing:

- BLASTPsearchofmg1655.faavs.o157h7.faaandfindouthowmanymg1655.faaproteinshavehitsino157h7.faa(useE-value<1e-5tofilterhits)

- BLASTPsearchofo157h7.faavs.mg1655.faaandfindouthowmanyo157h7.faaproteinshavehitsinmg1655.faa(useE-value<1e-5tofilterhits)

2

Writeareport(inwordorppt)toincludealltheoperationsandscreenshots.

DueonNov21(sendbyemail,ifthereare2+files,puttheminazipfile;includeyourlastnameinthefilename)

Page 3: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

Programs/toolsweoftenuse

• BLAST• FASTA• HMMER• EMBOSS• lftp• bioperl• R

• Clustalw• MAFFT• MUSCLE• SRAtoolkit• weblogo• PhyML• FastTree• RaxML• USEARCH• ...

http://gacrc.uga.edu/3

Page 4: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

4

OnWINDOWS,youdownloadsomesoftwares usuallyinexeformat.Youdoubleclickonsomeiconandwindowsinstallerwilldotherestforyou.Usuallyyouwillbeasked:whereyouwanttohavethisprograminstalledto(C:\WINDOWS\Programs\)

OnLinux,it’sverydifferent.Youdownloadthesoftwarepackage,usuallyincompressedformat(.gz or.zip).Youhavetotypeinaseriesofcommandstoinstallit,oftenwriteittoasystemfolder(e.g./usr/bin),whichyoudon’thaveaccessunlessyouaretherootuser

Page 5: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

Linux-basedprogramtypes

• SourcecodesinC,C++,Java,Fortranetc.– Needtobecompiledbeforeexecutethecommand

• Precompiledexecutables orbinarycodes• Sourcecodesinscriptinglanguages(perl,python,Retc.)– Canexecutedirectly

Manyprogramsneeddependencies(ifordertoinstallprogramA,youneedinstallBfirst…)

5

Page 6: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

Onyourownubuntu machine…• Youaretheroot(administrator)andusingthesudo

commandyoucaninstallanythingyouwantintothesystemdirectory(/usr/bin/,/bin/,/lib/etc.)– apt-get(AdvancedPackagingTool)candomanyinstallationsfor

youfromsourceorbinarycodeshttps://help.ubuntu.com/12.04/serverguide/apt-get.html

• OnSer,youarenottherootandyoucanonlyinstallthingsunderyourhomeusingthe“hard”way– Download->unpack->install->editPATHenvironmentalvariable– Makesureyoucreatefoldersforeachtools,e.g.

yourhome/tools/fasta

6

NotonSer

Page 7: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

InstallBLASTonyourownmachine[testifinstalled]sudo apt-get install blast2[testifinstalled]blastall[whereitinstalled]which blastall[ifyouwanttouninstall]sudo apt-get remove blast2[testifit’sgone]blastall

[newversionofblast]sudo apt-get install ncbi-blast+

7

NotonSer

Page 8: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

Useapt-gettoinstall

lftp,emboss,hmmer,bioperl,clustaw,muscle,Rsudo apt-get install xxx

[totestifinstalled,typeinthecommand]

8

Page 9: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

OnyourownMAC

http://www.macobserver.com/tmo/article/install_the_command_line_c_compilers_in_os_x_lionInstallXcode,thenCcompiler,thenyoucaninstallMacporthttp://www.macports.org/install.phpWithmacport,youcaninstallwget:sudo port install wgetlftp: sudo port install lftphmmer:sudo port install hmmeremboss:sudo port install embossR:sudo port install Rblast:http://www.blaststation.com/freestuff/en/howtoNCBIBlastMac.html

9

http://www.digimantra.com/howto/apple-aptget-command-mac/http://superuser.com/questions/173088/apt-get-on-mac-os-x

Page 10: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

10

Installprogramsusingthehardway(yourareNOTtheroot)

Usuallywhenabioinformaticstoolisreleased,thetoolpackagealsoincludesareadme,install ormanualfileinadditiontotheprogramitself.

Inmostcases,therearemorethanonefilesintheprogramfolder.

Sometimesyouwillseefilesinmultipledifferentlanguages.

Thereadmeorinstallfilewilltellyouhowtoinstallthetool.

Basicsteps:downloadthepackage->unpack(untar,unzip)->readthereadme->install

Page 11: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

InstallBLAST executable programBLAST source is written C and C program needs to be compiled to get executable programBut here we will download executables of BLAST

lftp ftp.ncbi.nih.gov:/blast/executables/LATEST

get ncbi-blast-2.7.1+-x64-linux.tar.gz

bye

Now you returned to Serls -l

mkdir tools

cd tools

mkdir blast

mv ../ncbi-blast-2.5.0+-x64-linux.tar.gz blast

cd blast

Unpack the tar balltar -zxf ncbi-blast-2.7.1+-x64-linux.tar.gz

ls -l

cd ncbi-blast-2.7.1+/bin

ls -l

./blastp -h

pwd 11

Page 12: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

12

Youareatyourhomenano .bashrc

Addthefollowinginthebeginningofthefileexport PATH="$HOME/tools/blast/ncbi-blast-2.7.1+/bin:$PATH"

Nowexitfromnano anddothefollowing

. .bashrcblastp -h

Now return to your homecdblastp -hwhich blastpYou will be told that the blast version is BLAST 2.2.25+, and blastp in your current path is /usr/bin/blastp. This is an older version I installed two years ago, NOT the one you just installed. You have to give the full path to run the blast version in your home

/home/yyin/tools/blast/ncbi-blast-2.7.1+/bin/blastp

What if you don’t want to type this long path? You can add it to a hidden file in your home called .bashrc where Shell auto execute every time you log in: Shell will be notified that when you call blastp, go to that folder to find it.

Page 13: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

EnvironmentvariableAnenvironmentvariableisanamedobjectthatcontainsdatausedbyoneormoreapplications.Thevalueofanenvironmentalvariablecanforexamplebethelocationofallexecutablefilesinthefilesystem,thedefaulteditorthatshouldbeused,orthesystemlocalesettings.UsersnewtoLinuxmayoftenfindthiswayofmanagingsettingsabitunmanageable.However,environmentvariablesprovidesasimplewaytoshareconfigurationsettingsbetweenmultipleapplicationsandprocessesinLinux.

env tolistallbuilt-inenvironmentvariable

PATH isaveryimportantenvironmentvariable.Thissetsthepaththattheshellwouldbelookingatwhenithastoexecuteanyprogram.Itwouldsearchinallthedirectoriesthataredefinedinthevariable.Rememberthatentriesareseparatedbya':'.Youcanaddanynumberofdirectoriestothislist.

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

13

Page 14: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

Run BLAST in command line mode

14

http://www.ncbi.nlm.nih.gov/books/NBK52640/

Page 15: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

15

WhyyoudoneedtorunBLASTincommandlineterminal?

- NCBI’sserverdoesnothaveadatabasethatyouwanttosearch

- MillionsofusersareusingNCBIBLASTservertoo

- Yourquerysethasmorethanonesequenceorevenagenome

- TheBLASToutputcouldbeprocessedandusedasinputforotherLinuxsoftwares

Page 16: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

16

Fordoingblastsearch

1. Determinewhatisyourqueryanddatabase,andwhatisthecommand

2. Formatyourdatabase

3. Runtheblastcommand(whatE-valuecutoff,whatoutputformat)

4. Viewtheresult

5. Parsetheresult(hitIDs,hitsequences,alignment,etc.)

TherearetwoversionsofBLAST-blast2(legacyblast)-blast+(thisiswhatyouinstalledinyourhome/tools/)

https://www.biostars.org/p/9480/http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

Page 17: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

BLAST2(legacyblast)blastall - | less-p # specify blastp, blastn, blastx, tblastn, tblastx

More commands in blast package

formatdb (format database)megablast (faster version of blastn)rpsblast (protein seq vs. CDD PSSMs)impala (PSSM vs protein seq)bl2seq (two sequence blast)blastclust (given a fasta seq file, cluster them based on sequence similarity)blastpgp (psi-blast, iterative distant homolog search)

17

QueryProtein

DNA

DatabaseProteinDNA

http://www.ncbi.nlm.nih.gov/books/NBK1763/pdf/ch4.pdf

Page 18: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

blastall options-pprogramname-ddatabasefilename(textfasta sequencefile)-i queryfilename-ee-valuecutoff(showhitslessthanthecutoff)-moutputformat-ooutputfilename(youcanalsouse>)-Ffilterlow-complexityregionsinquery-vnumberofone-linedescriptiontobeshown-bnumberofalignmenttobeshown-anumberofprocesserstobeused

18

Page 19: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

Newversionblast+,e.g.blastp

blastp -help | less

-query query filename-db databasefilename-out outputfilename-evalue e-valuecutoff-outfmt outputformat-num_descriptions

-num_alignments

19

blastpblastntblastnblastxtblastx

Atyourhomels -l tools/ncbi-blast-2.5.0+/bin/22executablefiles(commands)

http://www.ncbi.nlm.nih.gov/books/NBK1763/pdf/CmdLineAppsManual.pdf

Page 20: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

20

Exercise1:findCesA/Csl homologoussequenceincharophytic greenalgalgenome

Klebsormidium flaccidum genomewassequenced,analyzedandpublishedinhttp://www.ncbi.nlm.nih.gov/pmc/articles/PMC4052687/.WewanttofindouthowmanyCesA/Csl genesthisgenomehas.

NotethattherawDNAreadsareavailableinGenBank,buttheassembledgenomeandpredictedgenemodelsarenot

GotoMethodsoftheabovepaperandfindthelinktothewebsitethatprovidesthegenomeandannotationdata

Wget thepredictedproteinsequencefilewget -q http://www.plantmorphogenesis.bio.titech.ac.jp/~algae_genome_project/klebsormidium/kf_download/131203_kfl_initial_genesets_v1.0_AA.fasta

Makeindexfilesforthedatabasemakeblastdb -in 131203_kfl_initial_genesets_v1.0_AA.fasta -parse_seqids -dbtype 'prot’

Checkhowmanyproteinshavebeenpredictedless 131203_kfl_initial_genesets_v1.0_AA.fasta | grep '>' | wc

Page 21: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

21

Wget ourquerysetcp /home/yyin/cesa-pr.fa .

Dotheblastp searchblastp -query cesa-pr.fa -db 131203_kfl_initial_genesets_v1.0_AA.fasta -out cesa-pr.fa.out -outfmt 11

Checkouttheoutputfileless cesa-pr.fa.out

Changetheoutputformattoatab-delimitedfileblast_formatter -archive cesa-pr.fa.out -outfmt 6 | less

FiltertheresultusingE-valuecutoffblast_formatter -archive cesa-pr.fa.out -outfmt 6 | awk '$11<1e-10' | less

blast_formatter -archive cesa-pr.fa.out -outfmt 6 | awk '$11<1e-10' | cut -f2 | less

Page 22: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

22

Howdoyouextractthesequencesoftheblasthits?

DATABASEQuerySeqs HitIDs

HitSeqs

Furtheranalyses:MultiplealignmentPhylogeny,etc.

Linuxcmds

blastdbcmd

blast

blast_formatter

Linuxcmds

Page 23: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

23

SavethehitIDsblast_formatter -archive cesa-pr.fa.out -outfmt 6 | awk '$11<1e-10' | cut -f2 | sort -u > cesa-pr.fa.out.id

Retrievethefasta sequencesofthehitsblastdbcmd -db 131203_kfl_initial_genesets_v1.0_AA.fasta -entry_batch cesa-pr.fa.out.id | less

blastdbcmd -db 131203_kfl_initial_genesets_v1.0_AA.fasta -entry_batch cesa-pr.fa.out.id | sed 's/>lcl|/>/' | sed 's/ .*//' | less

blastdbcmd -db 131203_kfl_initial_genesets_v1.0_AA.fasta -entry_batch cesa-pr.fa.out.id | sed 's/>lcl|/>/' | sed 's/ .*//’ > cesa-pr.fa.out.id.fa

Wget theannotationfilefromtheJapanwebsitewget -q http://www.plantmorphogenesis.bio.titech.ac.jp/~algae_genome_project/klebsormidium/kf_download/131203_gene_list.txt

Checkwhatourgenesareannotatedtobegrep -f cesa-pr.fa.out.id 131203_gene_list.txt

Page 24: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

24

Ifaprogram(e.g.BLAST)runssolongonaremoteLinuxmachinethatitwon’tfinishbeforeyouleaveforhome…

Orifyousomehowwanttorestartyourlaptop/desktopwhereyouhaveaPuttysessionisrunning(Windows)orashellterminalisrunning(Ubuntu)…

Inanycase,youhavetoclosetheterminalsession(orhaveitbeautomaticallyterminatedbytheserver).Ifthishappens,yourprogramwillbeterminatedwithout finishing.Ifyouexpectyourprogramwillrunforaverylongtime,e.g.longerthan10hours,youmayput“nohup”beforeyourcommand;thisensuresthatevenifyouclosetheterminal,theprogramwillstillruninthebackgrounduntilitisfinishedandyoucanloginagainthenextdaytochecktheoutput.Forexample:

nohup blastp -query yeast.aa -db yeast.aa -out yeast.aa.ava.out-outfmt 6 &

Youwillgetanadditionalfilenohup.out intheworkingfolderandthisfilewillbeemptyifnothingwronghappened.

Page 25: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

25

Youareatyourhomecd toolslftp emboss.open-bio.org

Insideembosslftp sitecd pub/EMBOSSget emboss-latest.tar.gz

Byetoexitfromlftpbye

NowyouarebacktoyourhomeonSertar zxf emboss-latest.tar.gz

./configure –prefix=$HOME/tools/emboss

make

make install

Ifyouaretheroot,onecommandcandoalltheaboveforyousudo apt-get install emboss

Testifembosshasbeeninstalled:

water

Exercise2:emboss

Page 26: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download

Changesequenceformatseqret –help

seqret -sequence /home/yyin/Unix_and_Perl_course/Data/GenBank/E.coli.genbank -outseqE.coli.genbank.fa -sformat genbank -osformat fasta

Calculatesequencelengthinfoseq –help

infoseq -sequence /home/yyin/cesa-pr.fa -name –length -only

Morecommandexamples:needle –helpwater –helpfuzznuc –helppepstats -helppepinfo –help

26

plotorf -helptranseq -helpprettyseq –help