NGS Bioinformatics Workshop 1.2 Tutorial – Sequence Formats, Databases and Visualization Tools

March 15th, 2012, BioSci room B9242. Facilitator: Richard Bruskiewich, Adjunct Professor, MBB


Page 1: NGS Bioinformatics Workshop 1.2 Tutorial – Sequence Formats, Databases and Visualization Tools

March 15th, 2012, BioSci room B9242

Facilitator: Richard Bruskiewich, Adjunct Professor, MBB

Page 2: Learning Objectives

• Linux revisited
• Quick dive into the Open-Bio pool (BioPython)
• A first look at NGS data:
– NCBI Short Read Archive
– Processing NGS: FASTX tool kit et al.
– Visualization: IGV

Page 3: Files and Permissions

• Linux user permissions: owner, group, or others
– Owner/user is the person who created the file and “owns” the file / directory
– Group is a team of people associated together, e.g. for a group project or team work
– Others is everyone else on the server
• Each file / directory can have its permissions set to (r)ead, (w)rite, or e(x)ecute

Page 4: chmod: change file permissions

Do a long listing (ls -l), which shows a mode string such as:
– dr-x-wxrw-
Separated into four sections:
– (d)(r - x)(- w x)(r w -)
– directory (d) or file (-), user (owner), group, others

Examples:
chmod o+x foo.txt      grant ‘execute’ permission to ‘others’ on foo.txt
chmod g-rw foo.txt     remove ‘read’ and ‘write’ permission from group
chmod ugo+rwx foo.txt  grant all rights to everyone

To change the user/group (‘owner’) of a file, use chown rather than chmod:
chown ubuntu:ubuntu foo.txt
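The mode string and the chmod examples above can be explored without touching any real files. The following is a minimal sketch using only Python's standard `stat` module. The helper names `split_mode`, `grant` and `revoke` are made up for this illustration, and the starting modes (0o536, 0o640) are arbitrary examples.

```python
import stat


def split_mode(mode_string):
    """Split a 10-character 'ls -l' mode string into its four sections:
    file type, user (owner), group, others."""
    return (mode_string[0], mode_string[1:4],
            mode_string[4:7], mode_string[7:10])


def grant(mode, bits):
    """Return the mode with the given permission bits switched on."""
    return mode | bits


def revoke(mode, bits):
    """Return the mode with the given permission bits switched off."""
    return mode & ~bits


if __name__ == "__main__":
    # A directory with permissions r-x / -wx / rw- (octal 536)
    listing = stat.filemode(stat.S_IFDIR | 0o536)
    print(listing, split_mode(listing))

    # Equivalent of 'chmod o+x' on a regular file that starts as rw-r-----
    mode = grant(stat.S_IFREG | 0o640, stat.S_IXOTH)
    print(stat.filemode(mode))

    # Equivalent of 'chmod g-rw' on a file that starts as rw-rw-r--
    print(oct(revoke(0o664, stat.S_IRGRP | stat.S_IWGRP)))
```

To apply a computed mode to an actual file, `os.chmod(path, mode)` can be used with the permission bits only (for example `0o644`).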

Page 5: A few useful tips…

• Hitting “tab” will auto-complete file or program names (or suggest possible names)
• Up arrow will let you return to previous commands
• Editing of text files: “nano” is an easier alternative to “emacs”, but less powerful
– Alternatively, use an SSH client to transfer files to your Windows desktop, edit them in Windows, then transfer them back. BUT: make sure you use a text editor that knows the difference between a Windows and a Linux text file (e.g. Notepad++)
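The difference between a Windows and a Linux text file mentioned above is the line ending: Windows uses carriage return plus line feed (CR LF), Linux uses line feed (LF) only. As a rough sketch, a file that was edited on Windows could be checked and converted as follows. The function names are invented for this example, and the conversion rewrites the file in place, so try it on a copy first.

```python
def has_windows_line_endings(data: bytes) -> bool:
    """Return True if the data contains at least one CR LF pair."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace every CR LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file at 'path' in place with Linux line endings."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_line_endings(data))


if __name__ == "__main__":
    sample = b">seq1\r\nACGT\r\n"
    print(has_windows_line_endings(sample))
    print(to_unix_line_endings(sample))
```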

Page 6: Some more useful basic Linux commands

• “cd” changes your directory, e.g. ‘cd /usr/local’
• “man” displays the manual for a command, e.g. ‘man ls’
• “pwd” tells you the directory you are currently in (= working directory)
• “history” lists recent commands, enumerated with line numbers. By typing an exclamation point followed by the line number (e.g. !123), you can redo that command

Page 7: Accessing remote servers

• “ssh” – Secure Shell
ssh -i private_keypair user@host
• “scp” – Secure CoPy
scp -i private_keypair [user@host:]sourcefile [user@host:]targetfile

Where user is the account (default: local user) and host is the internet name of the computer (default: local host)
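To make the `[user@host:]` notation concrete, here is a small sketch that assembles ssh and scp command lines from their parts. It only builds and prints the commands; nothing is executed. The helper names, the account `ubuntu`, the host `example.org` and the file names are placeholders for illustration.

```python
import shlex


def remote_path(path, user=None, host=None):
    """Build the [user@host:]path form used by scp.
    Without a host, the path refers to the local machine."""
    if host is None:
        return path
    prefix = f"{user}@{host}" if user else host
    return f"{prefix}:{path}"


def ssh_command(key_file, user, host):
    """Argument list for logging in with a private key file."""
    return ["ssh", "-i", key_file, f"{user}@{host}"]


def scp_command(key_file, source, target):
    """Argument list for copying 'source' to 'target' with a private key file."""
    return ["scp", "-i", key_file, source, target]


if __name__ == "__main__":
    target = remote_path("data/", user="ubuntu", host="example.org")
    command = scp_command("private_keypair", "reads.fastq", target)
    # shlex.join renders the list as it would be typed in a shell
    print(shlex.join(command))
    print(shlex.join(ssh_command("private_keypair", "ubuntu", "example.org")))
```

Passing such a list to `subprocess.run(command, check=True)` would actually run the copy.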

Page 8: OpenBio Case Study: BioPython

http://biopython.org/wiki/Biopython

http://biopython.org/DIST/docs/tutorial/Tutorial.html

Page 9: FIRST LOOK AT NGS DATA

Page 11

http://hannonlab.cshl.edu/fastx_toolkit/

Linux, MacOSX or Unix only

Page 12: Get the precompiled binary

wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2

bunzip2 fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2

tar -xvf fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar

sudo mv bin/* /usr/local/bin

Page 13: FASTX tool kit I

• FASTQ-to-FASTA converter
– Convert FASTQ files to FASTA files.
• FASTQ Information
– Chart Quality Statistics and Nucleotide Distribution
• FASTQ/A Collapser
– Collapsing identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts)
• FASTQ/A Trimmer
– Shortening reads in a FASTQ or FASTA file (removing barcodes or noise).
• FASTQ/A Renamer
– Renames the sequence identifiers in a FASTQ/A file.
• FASTQ/A Clipper
– Removing sequencing adapters / linkers
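To give a feel for what some of these tools do, the sketch below re-creates three of the operations (FASTQ-to-FASTA conversion, trimming and collapsing) in plain Python. It is an illustration only, not the toolkit's own implementation. It assumes well-formed FASTQ with exactly four lines per record, and the three reads are made-up example data.

```python
from collections import Counter

# Made-up example: three reads, two of them identical
EXAMPLE_FASTQ = "\n".join([
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "TTTTACGT", "+", "IIII!!!!",
    "@read3", "ACGTACGT", "+", "IIIIIIII",
])


def parse_fastq(lines):
    """Yield (identifier, sequence, quality) from 4-line FASTQ records."""
    iterator = iter(lines)
    for header in iterator:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between or after records
        sequence = next(iterator).rstrip("\n")
        next(iterator)  # the '+' separator line carries no data we need
        quality = next(iterator).rstrip("\n")
        yield header[1:], sequence, quality


def fastq_to_fasta(records):
    """Drop the quality line and write each read as a FASTA entry."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)


def trim(records, first, last):
    """Keep only bases first..last (1-based, inclusive) of every read."""
    return [(name, seq[first - 1:last], qual[first - 1:last])
            for name, seq, qual in records]


def collapse(records):
    """Merge identical sequences, keeping a count of how often each occurred."""
    return Counter(seq for _, seq, _ in records).most_common()


if __name__ == "__main__":
    records = list(parse_fastq(EXAMPLE_FASTQ.splitlines()))
    print(fastq_to_fasta(records))
    print(trim(records, 1, 4))
    print(collapse(records))
```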

Page 14: FASTX tool kit II

• FASTQ/A Reverse-Complement
– Producing the reverse-complement of each sequence in a FASTQ/FASTA file.
• FASTQ/A Barcode splitter
– Splitting a FASTQ/FASTA file containing multiple samples
• FASTA Formatter
– Changes the width of sequence lines in a FASTA file
• FASTA Nucleotide Changer
– Converts FASTA sequences from/to RNA/DNA
• FASTQ Quality Filter
– Filters sequences based on quality
• FASTQ Quality Trimmer
– Trims (cuts) sequences based on quality
• FASTQ Masker
– Masks nucleotides with 'N' (or other character) based on quality
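Three of the operations listed above (reverse-complement, quality filtering and masking) can be sketched in a few lines of plain Python. Again this is only an illustration of the idea, not the toolkit's code. It assumes quality strings use the Phred+33 encoding (quality score = ASCII code minus 33); older Illumina data may use an offset of 64 instead. The thresholds in the example are arbitrary.

```python
# Translation table for complementing DNA bases (N stays N)
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Complement every base, then reverse the result."""
    return sequence.translate(COMPLEMENT)[::-1]


def phred_scores(quality, offset=33):
    """Decode a quality string into integer scores (assumes Phred+offset)."""
    return [ord(char) - offset for char in quality]


def passes_quality(quality, min_score, min_percent, offset=33):
    """True if at least min_percent of the bases have score >= min_score."""
    scores = phred_scores(quality, offset)
    if not scores:
        return False  # an empty read cannot pass the filter
    good = sum(1 for score in scores if score >= min_score)
    return 100.0 * good / len(scores) >= min_percent


def mask_low_quality(sequence, quality, min_score, mask="N", offset=33):
    """Replace every base whose score is below min_score with the mask character."""
    return "".join(
        base if ord(char) - offset >= min_score else mask
        for base, char in zip(sequence, quality)
    )


if __name__ == "__main__":
    print(reverse_complement("AACGT"))
    print(phred_scores("I!"))
    print(passes_quality("IIII!!!!", min_score=20, min_percent=50))
    print(mask_low_quality("TTTTACGT", "IIII!!!!", min_score=20))
```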

Page 15

www.bioinformatics.bbsrc.ac.uk/projects/download.html

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

Page 16: Integrative Genomics Viewer

http://www.broadinstitute.org/igv/