Using GC content to distinguish Phytophthora sequences from tomato sequences

Mission #1

Calculate the GC content of each sequence in the Phytophthora-tomato interactome

We will use a Perl script to accomplish the mission.
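
For readers who want to see what "GC content" means in practice, here is a minimal sketch in R. It is not the class gc.pl script, whose contents are not shown in these slides. The helper name gc_percent and the toy sequences are made up for illustration, and the sketch assumes GC content is reported as a percentage of the A, C, G and T bases.

```r
# Hypothetical helper (NOT the class gc.pl script): GC content, in percent,
# of one DNA sequence given as a character string.
gc_percent <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  bases <- bases[bases %in% c("A", "C", "G", "T")]  # ignore N, gaps, etc.
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

# Toy FASTA lines, made up for illustration only.
fasta <- c(">seq1", "ATGC", "GGCC", ">seq2", "ATATATAT")

# Group the sequence lines under the header line that precedes them.
is_header <- grepl("^>", fasta)
record_id <- cumsum(is_header)
seqs <- vapply(split(fasta[!is_header], record_id[!is_header]),
               paste, character(1), collapse = "")
names(seqs) <- sub("^>", "", fasta[is_header])

# One GC value per sequence, named by the sequence name.
gc_values <- vapply(seqs, gc_percent, numeric(1))
print(gc_values)
```

The sketch assumes every header line is followed by at least one sequence line.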

Preparation

• Download the Perl script (gc.pl) from the class web site and store it in the C:/BioDownload folder

Running the script

• Open cygwin, or command prompt (Vista users), or terminal (Mac users)

• Change directory (cd) to the BioDownload folder

• Type the command below, with a single space between each item

perl gc.pl PhytophSeq1.txt phytoph_gc.out

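Results

In cygwin (Windows users) or terminal (Mac users)

grep --perl-regexp "\t" -c phytoph_gc.out

grep ">" -c PhytophSeq1.txt

The first command counts the lines in the output file that contain a tab. The second counts the lines in the sequence file that contain ">", that is, the sequence names.

You should get the same number from the two commands.

The number should be 3921.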
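
The same check can also be sketched in R, which may help if the grep options above are not available on your system. The two temporary files and their contents are made-up stand-ins for PhytophSeq1.txt and phytoph_gc.out; with the real files, pass their names to readLines() instead.

```r
# Toy stand-ins for PhytophSeq1.txt and phytoph_gc.out (made-up contents).
fasta_file <- tempfile(fileext = ".txt")
gc_file <- tempfile(fileext = ".out")
writeLines(c(">seq1", "ATGCGGCC", ">seq2", "ATATATAT"), fasta_file)
writeLines(c("seq1\t75", "seq2\t0"), gc_file)

# Same idea as counting lines with ">" in the sequence file.
n_sequences <- sum(grepl(">", readLines(fasta_file), fixed = TRUE))

# Same idea as counting lines with a tab in the output file.
n_gc_lines <- sum(grepl("\t", readLines(gc_file), fixed = TRUE))

print(c(n_sequences, n_gc_lines))

# Stops with an error if the two counts differ.
stopifnot(n_sequences == n_gc_lines)
```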

The output file

GC content column

Name column

Mission #2

Build a histogram of the GC content values

We will use the R program to accomplish this mission.

http://www.r-project.org

Mac users

All Windows users

XP users

Vista users

getwd() to know which folder you are in now

setwd("c:/BioDownload") to change the working directory to C:/BioDownload

setwd("/path/to/biodownload") for Mac users

data<-read.table("phytoph_gc.out",sep="\t",header=FALSE)

to read in the data in the file phytoph_gc.out (your file name may be different)

data[1:10,]

to see the first 10 rows of the data frame “data”

gc<-data[,2]

to assign the values from the 2nd column of “data” to a new vector “gc”
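
As an optional variation, the columns can be given names when the file is read, so the GC values can be picked out by name instead of by position. This is a sketch on a made-up three-line file. The column names "name" and "gc" are our own choice, not something the script produces.

```r
# Toy stand-in for phytoph_gc.out (made-up contents).
gc_file <- tempfile(fileext = ".out")
writeLines(c("seq1\t75.0", "seq2\t0.0", "seq3\t52.3"), gc_file)

# col.names labels the two columns; "name" and "gc" are hypothetical labels.
data <- read.table(gc_file, sep = "\t", header = FALSE,
                   col.names = c("name", "gc"))
str(data)

# Picking the column by name gives the same values as data[,2].
gc <- data$gc
print(gc)
```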

summary(gc)

to get the summary of the values in the vector “gc”

hist(gc,breaks=58)

to draw a histogram of the values in the “gc” vector

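breaks indicates how many cells (bins) you want for the histogram. It was calculated as 78.7 (max) - 21.2 (min) = 57.5, rounded up to 58, so each bin of the histogram is about 1 GC value wide. When breaks is a single number, R treats it only as a suggestion and may pick a different number of bins to get round break points.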
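
The bin calculation can be sketched in R as follows. The gc values here are made up so that the example runs on its own (only the minimum and maximum match the slide); in class, gc comes from data[,2]. Passing explicit break points, instead of a single number, is one way to make sure every bin is exactly 1 GC value wide.

```r
# Made-up GC values for illustration; min and max match the slide.
gc <- c(21.2, 35.4, 48.9, 52.3, 52.8, 61.0, 78.7)

# The slide's calculation: spread of the values, rounded up.
n_bins <- ceiling(max(gc) - min(gc))   # 78.7 - 21.2 = 57.5, rounded up

# Explicit break points, 1 GC value apart, covering every value.
edges <- seq(floor(min(gc)), ceiling(max(gc)), by = 1)

# plot = FALSE computes the bins and counts without drawing anything.
h <- hist(gc, breaks = edges, plot = FALSE)
print(n_bins)
print(length(h$counts))
```

Drop plot = FALSE to draw the histogram with these break points.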

hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")

to make the histogram look better

pdf("gc_histogram.pdf")
hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")
dev.off()

To output the histogram to a PDF file.
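
To confirm that the PDF was written, and to see where it went, something like the following sketch may help. The gc values are made up, and the temporary folder is a hypothetical location used only so the example runs anywhere; in class the file goes to your working directory.

```r
# Made-up GC values for illustration; in class, gc comes from data[,2].
gc <- c(21.2, 35.4, 48.9, 52.3, 52.8, 61.0, 78.7)

# Hypothetical output location: R's temporary folder for this session.
out_file <- file.path(tempdir(), "gc_histogram.pdf")

pdf(out_file)                      # open the PDF file for drawing
hist(gc, xlab = "GC content", main = "Histogram of GC content")
dev.off()                          # close the file so it is written to disk

print(file.exists(out_file))       # was the file created?
print(normalizePath(out_file))     # full path of the file
```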

[Screenshot labels: location, file]