Using GC content to distinguish Phytophthora sequences from tomato sequences.


Mission #1

Calculate the GC content of each sequence in the Phytophthora-tomato interactome

We will use a Perl script to accomplish the mission.
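To make the calculation concrete, here is a minimal sketch of what a GC content calculation might look like. It is not the class script gc.pl, whose source is not shown here. It assumes FASTA input and one output line per sequence in the form name, tab, GC percentage. The function name gc_content and the demo sequences are made up for illustration.

```shell
#!/bin/sh
# Sketch only: NOT the class script gc.pl. Assumes FASTA input and
# one "name<TAB>GC percent" output line per sequence.

# gc_content FILE: print one "name<TAB>GC percent" line per FASTA record.
gc_content() {
  awk '
    # Print the finished record, if any, with GC percent to one decimal.
    function flush_record() {
      if (name != "" && total > 0)
        printf "%s\t%.1f\n", name, 100 * gc / total
    }
    /^>/ {                         # a header line starts a new record
      flush_record()
      name = substr($1, 2)         # sequence name without the ">"
      gc = 0; total = 0
      next
    }
    {
      seq = toupper($0)
      gsub(/[^ACGT]/, "", seq)     # ignore N and other non-ACGT symbols
      total += length(seq)
      gc += gsub(/[GC]/, "", seq)  # gsub returns how many G or C it removed
    }
    END { flush_record() }
  ' "$1"
}

# Tiny made-up example (hypothetical sequences, not course data).
demo_fa=$(mktemp)
printf '>seq1\nGGCC\nAATT\n>seq2\nGCGC\n' > "$demo_fa"
gc_content "$demo_fa"
```

In this sketch GC content is the number of G and C bases divided by the number of A, C, G and T bases, times 100. The real script may treat ambiguous bases differently.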

Preparation

• Download the Perl script (gc.pl) from the class web site and store it in the C:/BioDownload folder

Running the script

• Open Cygwin, Command Prompt (Vista users), or Terminal (Mac users)

• Change directory (cd) to the BioDownload folder

• Run the script, separating the arguments with single spaces:

perl gc.pl PhytophSeq1.txt phytoph_gc.out

Results

In Cygwin (Windows users) or Terminal (Mac users), count the lines in the output file and the sequences in the input file:

grep --perl-regexp "\t" -c phytoph_gc.out

grep ">" -c PhytophSeq1.txt

You should get the same number from the two commands.

The number should be 3921.
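The same check can be scripted so the two counts are compared automatically. The sketch below is one possible way to do it, not part of the class material. The function name count_match and the demo files are hypothetical. It counts tab-containing lines with awk rather than grep --perl-regexp, on the assumption that not every grep build supports that option.

```shell
#!/bin/sh
# Sketch: compare the two counts automatically instead of by eye.

# count_match FASTA OUT: succeed when the number of ">" header lines in FASTA
# equals the number of tab-separated lines in OUT.
count_match() {
  n_seq=$(grep -c '>' "$1")
  # Count lines with at least two tab-separated fields.
  n_out=$(awk -F '\t' 'NF > 1 { n++ } END { print n + 0 }' "$2")
  echo "sequences: $n_seq, output lines: $n_out"
  [ "$n_seq" -eq "$n_out" ]
}

# Made-up demo files (hypothetical, not the course data).
demo_dir=$(mktemp -d)
printf '>a\nACGT\n>b\nGGCC\n' > "$demo_dir/demo.fa"
printf 'a\t50.0\nb\t100.0\n' > "$demo_dir/demo_gc.out"

if count_match "$demo_dir/demo.fa" "$demo_dir/demo_gc.out"; then
  echo "counts agree"
else
  echo "counts differ"
fi
```

With the course files the call would be count_match PhytophSeq1.txt phytoph_gc.out, and both counts should be 3921.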

The output file

The output file has two tab-separated columns: the name column and the GC content column.

Mission #2

Build a histogram of the GC content values

We will use the R program to accomplish this mission.

http://www.r-project.org

[Screenshots: steps shown separately for Mac users, all Windows users, XP users, and Vista users]

getwd() to see which folder you are in now

setwd("c:/BioDownload") to change the working directory to C:/BioDownload

setwd("/path/to/biodownload") for Mac users

data<-read.table("phytoph_gc.out",sep="\t",header=FALSE)

to read in the data in the file phytoph_gc.out (your file name may be different)

data[1:10,]

to see the first 10 rows of the data frame "data"

gc<-data[,2]

to assign the values from the 2nd column of "data" to a new vector "gc"

summary(gc)

to get the summary of the values in the vector "gc"

hist(gc,breaks=58)

to draw a histogram of the values in the "gc" vector

breaks sets roughly how many cells (bins) the histogram has. Here it was calculated from the range of the data: 78.7 (max) - 21.2 (min) = 57.5, rounded up to 58. Each bin of the histogram is therefore about 1 GC unit wide. R treats a single number for breaks as a suggestion and may adjust it to get round breakpoints.
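The breaks arithmetic can also be checked from the command line before opening R. The sketch below is only an illustration. The function name suggest_breaks is hypothetical, and it assumes that column 2 of the tab-separated output file holds the GC content values. The demo file is made up, except that it reuses the minimum (21.2) and maximum (78.7) quoted above.

```shell
#!/bin/sh
# Sketch: derive a breaks value as (max - min), rounded up.
# Assumes column 2 of a tab-separated file holds the GC content values.

# suggest_breaks FILE: print the min, max and suggested number of bins.
suggest_breaks() {
  awk -F '\t' '
    { v = $2 + 0 }                     # force a numeric value
    NR == 1 || v < min { min = v }
    NR == 1 || v > max { max = v }
    END {
      range = max - min
      breaks = int(range)
      if (breaks < range) breaks++     # round up to a whole number
      printf "min=%.1f max=%.1f breaks=%d\n", min, max, breaks
    }
  ' "$1"
}

# Made-up demo rows; only the 21.2 and 78.7 values come from the slides.
demo_out=$(mktemp)
printf 'seqA\t21.2\nseqB\t50.0\nseqC\t78.7\n' > "$demo_out"
suggest_breaks "$demo_out"
```

With the course data the call would be suggest_breaks phytoph_gc.out. The result is only a starting point for the breaks argument.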

hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")

to make the histogram look better

pdf("gc_histogram.pdf")
hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")
dev.off()

to output the histogram to a PDF file

[Screenshot with the labels "location" and "file"]