More loops


Page 1: More loops

4.1 More loops

Page 2: More loops

4.2 Loops

Commands inside a loop are executed repeatedly (iteratively):

my $num=0;
print "Guess a number.\n";
while ($num != 31) {
    $num = <STDIN>;
}
print "correct!\n";

my @names = <STDIN>;
chomp(@names);
my $name;
foreach $name (@names) {
    print "Hello $name!\n";
}

Page 3: More loops

4.3 Loops: for

The for loop is controlled by three statements:

• 1st is executed once, before the first iteration

• 2nd is the loop condition: it is tested before every iteration, and the loop stops when it is false

• 3rd is executed at the end of every iteration, before the condition is tested again

The following two loops are equivalent:

for (my $i=0; $i<10; $i++) {
    print "$i\n";
}

my $i=0;
while ($i<10) {
    print "$i\n";
    $i++;
}
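As a quick check of the equivalence, the sketch below collects the values visited by each loop into an array instead of printing them one by one, and then compares the two. The array names @from_for and @from_while are illustrative and do not come from the slides.

```perl
use strict;
use warnings;

# Collect the values each loop visits (array names are illustrative).
my @from_for;
for (my $i = 0; $i < 10; $i++) {
    push @from_for, $i;
}

my @from_while;
my $i = 0;                 # 1st statement: runs once, before the loop
while ($i < 10) {          # 2nd statement: tested before every iteration
    push @from_while, $i;
    $i++;                  # 3rd statement: runs at the end of every iteration
}

print "for:   @from_for\n";
print "while: @from_while\n";
print "same sequence\n" if "@from_for" eq "@from_while";
```

One difference that this sketch relies on: the $i declared inside the for statement exists only within that loop, which is why a second $i can be declared for the while loop afterwards.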

Page 4: More loops

4.4 Breaking out of loops

next – skip to the next iteration

last – skip out of the loop

my @lines = <STDIN>;
foreach my $line (@lines) {
    if (substr($line,0,1) eq ">") { next; }
    if (substr($line,0,8) eq "**stop**") { last; }
    print $line;
}
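To see which lines survive next and last, the following sketch runs the same loop over a small hard-coded list instead of <STDIN>. The input lines and the @printed array are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input lines standing in for <STDIN>.
my @lines = (
    "> a comment line\n",
    "first\n",
    "> another comment\n",
    "second\n",
    "**stop** here\n",
    "never reached\n",
);

my @printed;    # the lines the loop lets through
foreach my $line (@lines) {
    if (substr($line, 0, 1) eq ">")        { next; }   # skip this line only
    if (substr($line, 0, 8) eq "**stop**") { last; }   # leave the loop for good
    push @printed, $line;
}

print @printed;    # prints "first" and "second", one per line
```

The lines starting with ">" are skipped individually, while everything from the "**stop**" line onwards is never examined.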

Page 5: More loops

4.5 Breaking out of loops

die – end the program and print an error message to the standard error (STDERR)

if ($score < 0) { die "score must be positive"; }

score must be positive at test.pl line 8.

Note: if you end the string with a "\n" then only your message will be printed, without the "at test.pl line 8." part

* warn prints its message in the same way as die, but does not end the program
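The effect of the trailing "\n" can be seen in the sketch below. It uses eval, which is not covered in these slides, to trap each die so that the demonstration can keep running; the trapped message is then available in $@. The helper name check_score and its second parameter are hypothetical.

```perl
use strict;
use warnings;

# check_score is a hypothetical helper, not taken from the slides.
sub check_score {
    my ($score, $newline) = @_;
    if ($score < 0) {
        # With a trailing "\n" Perl leaves the message unchanged;
        # without it, " at FILE line N." is appended.
        die "score must be positive" . ($newline ? "\n" : "");
    }
    return $score;
}

# eval traps the die so this demo can continue; the message ends up in $@.
eval { check_score(-1, 0) };
my $with_location = $@;
eval { check_score(-1, 1) };
my $message_only = $@;

print "without \\n: $with_location";
print "with \\n:    $message_only";

# warn writes to STDERR in the same way, but execution continues.
warn "score looks suspicious\n";
print "still running\n";
```

Without the eval blocks, the first die would end the program at once, which is the behavior described above.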

Page 6: More loops

4.6 The Programming Process

Page 7: More loops

4.7 The programming process

It pays to plan ahead before writing a computer program:

1. Define the purpose of the program

2. Identify the required inputs

3. Decide how to present the outputs

4. Make an overall design of the program

5. Refine the design, specify more details

6. Write the code – one stage at a time and test each stage

7. Debug…

Page 8: More loops

4.8 An example: SAGE libraries

SAGE (Serial Analysis of Gene Expression) is used to identify all transcripts that are expressed in a tissue:

1. Double-stranded cDNA is generated from cell extracts

2. The cDNA is cleaved with a restriction enzyme (NlaIII)

3. The 3'-most fragment of each cDNA is then collected by its poly-A tail

4. The fragments are ligated to linkers containing a recognition site for a type IIS restriction enzyme and a PCR primer site

5. This restriction enzyme cuts 15bp away from its recognition site

6. Ligation, PCR, cleavage, concatenation, cloning and sequencing then yield a 10bp tag sequence from each mRNA

7. The 10bp sequences are searched in an mRNA database and the corresponding genes are identified

[Figure: schematic of the procedure, with panels labeled (1), (2&3) and (4&5)]

Page 9: More loops

4.9 An example: SAGE libraries

SAGE (Serial Analysis of Gene Expression) is used to identify all transcripts that are expressed in a tissue.

Page 10: More loops

4.10 Predicting the SAGE tag of an mRNA

It would be useful to know what tag to expect for each mRNA in the database.

So let's write a script:

1. Purpose: To predict the 10bp sequence of the SAGE tag of a given mRNA

2. Inputs: A list of mRNA sequences in FASTA format

>gi|24646380|ref|NM_079608.2| Mus musculus EH-domain containing 4 (EHD4), mRNA
GTGGTATTTCTTCGTTGTCTCTGGCGTGGTCACGTTGATTGGTCCGCTATCTGGACCGAAAAAAGTCGTA
......
GTCGACGGCGATGGGTTCCTGGACTCTGACGAGTTCGCGCTGGCCTTGCACTTAATCAACGTCAAGCTGGAAGGCTGCGAGCTGCCCACCGTGCTGCCGGAGCACTTAGTACCGCCGTCGAAGCGCTATGACTAGTGTCCTGTAGCATACGCATACGCACACTAGATCACACAGCCTCACAATTCCCAAAAAAAAAAAAAAAA

>gi|71895640|ref|NM_001031040.1| Mus musculus EH-domain containing 3 (EHD3), mRNA
GGTAGGGCGCTACCGCCTCCGCCCGCCTCTCGCGCTGTTCCTCCGCGGTATGCCCGCGCCGGCAGCCGGC
......
TATTATATAGAGAAATATATTGTGTATGTAGGATGTGCTTATTGCATTACATTTATCACTTGTCTTAACTAGAATGCATTAACCTTTTTTGTACCCTGGTCCTAAAACATTATTAAAAAGAAAGGCTAAAAAAAAAAAAAAAAA

>gi|55742710|ref|NM_153068.2| Mus musculus EH-domain containing 2 (Ehd2), mRNA
TGAGGGGGCCTGGGGCCCGCCCTGCTCGCCGCTCCTAGCGCACGCGGCCCCACCCGTCTCACTCCACTGC
......

Page 11: More loops

4.11

3. Decide how to present the results

Simply print the header line of each mRNA and then its predicted 10bp tag, like so:

>gi|24646380|ref|NM_079608.2| Mus musculus EH-domain containing 4 (EHD4), mRNA
ATCACACAGC

>gi|71895640|ref|NM_001031040.1| Mus musculus EH-domain containing 3 (EHD3), mRNA
AATGCATTAA

...

...

Page 12: More loops

4.12

4. Overall design:

1. For each mRNA in the input:

    1. Read the sequence

    2. Find the most downstream recognition site of NlaIII (CTAG)

    3. Get the 10bp tag after that site

    4. Print it
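Steps 2 to 4 of this design can be sketched in a few lines with Perl's built-in rindex, which returns the position of the last occurrence of a substring (or -1 if there is none). This is a shortcut rather than the explicit loop developed in the following slides. The helper name predict_tag is hypothetical, and returning nothing when fewer than 10 bases follow the site is an assumption of this sketch.

```perl
use strict;
use warnings;

# predict_tag is a hypothetical helper name, not taken from the slides.
sub predict_tag {
    my ($seq) = @_;
    my $site = "CTAG";                       # recognition site as given above
    my $pos  = rindex($seq, $site);          # last occurrence, or -1
    return if $pos < 0;                      # no site in this mRNA
    my $tag = substr($seq, $pos + length($site), 10);
    return if length($tag) < 10;             # assumption: require a full 10bp tag
    return $tag;
}

# 3' end of the first example mRNA shown earlier.
my $seq = "CGCACACTAGATCACACAGCCTCACAATTCCCAAAAAAAAAAAAAAAA";
my $tag = predict_tag($seq);
print defined $tag ? "$tag\n" : "no tag\n";   # prints ATCACACAGC
```

The printed tag matches the one listed for the EHD4 record in the planned output format.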

Page 13: More loops

4.13

Flow diagram:

Start -> Read sequence -> Find most downstream CTAG -> Get the 10bp tag -> Print the tag -> End of input?

End of input? No: go back to "Read sequence". Yes: End.

Page 14: More loops

4.14

5. Refine the design, specify more details:

1. For each mRNA in the input (use a loop):

    1. Read the sequence

        1. Store its header line in one string variable

        2. Concatenate all lines of the sequence and store it in another string variable

    2. Find the most downstream recognition site of NlaIII (CTAG)

        1. Go over the sequence with a loop, starting from the 3’ tail, and going back until the first CTAG is found

    3. Get the 10bp tag after that site

        1. Take a substr of length 10

    4. Print it

6. Write the code

Page 15: More loops

4.15

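Flow diagram, with "Read sequence" expanded:

Start -> Read sequence -> Find most downstream CTAG -> Get the 10bp tag -> Print the tag -> End of input? (No: repeat; Yes: End)

Read sequence: Read line -> Save header -> Read line -> Header? (No: Concatenate to sequence, Read line, check again; Yes: the sequence is complete)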

Page 16: More loops

4.16

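Flow diagram, with "Find most downstream CTAG" expanded:

Start -> Read sequence -> Find most downstream CTAG -> Get the 10bp tag -> Print the tag -> End of input? (No: repeat; Yes: End)

Find most downstream CTAG: Start pos. at end of sequence -> Check pos. for “CTAG” -> “CTAG” at pos? (Yes: found; otherwise pos--) -> Pos < 0?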

Page 17: More loops

4.17

Flow diagram for "Find most downstream CTAG":

Start pos. at end of sequence -> Check pos. for “CTAG” -> “CTAG” at pos? (Yes: found; otherwise pos--) -> Pos < 0? (Yes: Print “no tag”; otherwise check the new pos.)

Page 18: More loops

4.18

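Flow diagram for "Find most downstream CTAG":

Start pos. at end of sequence -> Check pos. for “CTAG” -> “CTAG” at pos? (Yes: stop searching; otherwise pos--) -> Pos < 0? (Yes: stop searching; otherwise check the new pos.)

After the search: Pos < 0? (Yes: Print “no tag”; No: Print tag)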
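The search drawn in these diagrams can be sketched as a while loop that steps pos back from the 3' end, followed by the "Pos < 0?" test that decides between printing the tag and printing "no tag". The helper name find_last_site and the two test sequences are made up for illustration; the first one ends like the EHD3 record shown earlier.

```perl
use strict;
use warnings;

# find_last_site is a hypothetical name for the search in the diagrams.
# It returns the start position of the most downstream "CTAG",
# or a negative number if the sequence contains none.
sub find_last_site {
    my ($seq) = @_;
    my $pos = length($seq) - 4;          # last position where 4 bases still fit
    while ($pos >= 0) {
        if (substr($seq, $pos, 4) eq "CTAG") { last; }   # found: stop searching
        $pos--;                          # otherwise step towards the 5' end
    }
    return $pos;                         # negative: the loop ran off the start
}

foreach my $seq ("TTCTAGAATGCATTAACCTTTT", "GGGGCCCCAAAA") {
    my $pos = find_last_site($seq);
    if ($pos < 0) {
        print "no tag\n";
    } else {
        print substr($seq, $pos + 4, 10), "\n";   # the 10 bases after the site
    }
}
```

For the first sequence this prints AATGCATTAA, and for the second, which has no CTAG, it prints "no tag".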

Page 19: More loops

4.19 FASTA: Analyzing complex input

Overall design:

1. Read the sequence

2. Do something

Let’s see how it’s done…

Flow diagram: Start -> Read line -> Save header -> Read line -> Header? (No: Concatenate to sequence, Read line, check again; Yes: Do something) -> End of input? (No: repeat from Save header; Yes: End)

Page 20: More loops

4.20

$line = <STDIN>;

my $endOfInput = 0;
while ($endOfInput==0) {

    # 1.1. Read sequence name from FASTA header
    if (substr($line,0,1) eq ">") {
        $name = substr($line,1);
    } else...

    # 1.2. Read sequence until next FASTA header
    $seq = "";
    $line = <STDIN>;
    while (substr($line,0,1) ne ">") {
        $seq = $seq . $line;
        $line = <STDIN>;
        if (!defined($line)) {
            $endOfInput = 1;
            last;
        }
    }

    # 2. Do something...
}


Page 21: More loops

4.21

###################################
# 1. Foreach sequence in the input
my (@lines, $line, $name, $seq);
$line = <STDIN>;
chomp $line;

my $endOfInput = 0;
while ($endOfInput==0) {

    ################################
    # 1.1. Read sequence name from FASTA header
    if (substr($line,0,1) eq ">") {
        $name = substr($line,1);
    } else {
        die "bad FASTA format";
    }
    # 1.2. Read sequence until next FASTA header
    $seq = "";
    $line = <STDIN>;
    chomp $line;
    # Read until next header or end of input
    while (substr($line,0,1) ne ">") {
        $seq = $seq . $line;
        $line = <STDIN>;
        if (!defined($line)) {
            $endOfInput = 1;
            last;
        }
        chomp $line;
    }

    ################################
    # 2. Do something...

}
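To try the whole idea end to end without a file, the sketch below reads two toy FASTA records from an in-memory string and then "does something" with each one: it prints the header followed by the predicted tag. It uses a different, single-loop arrangement from the nested loops above, with parallel arrays @names and @seqs, and it finds the site with the built-in rindex. The toy records, the variable names and the choice of arrays are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical two-record input, held in a string so the sketch needs no file.
my $fasta = <<'END';
>seq1 first toy record
GGGCTAGAAT
GCATTAACCT
>seq2 second toy record
GGGGCCCC
END

open(my $in, "<", \$fasta) or die "cannot open in-memory input: $!\n";

my (@names, @seqs);          # parallel arrays: header text and full sequence
while (defined(my $line = <$in>)) {
    chomp $line;
    if (substr($line, 0, 1) eq ">") {
        push @names, substr($line, 1);   # header: start a new record
        push @seqs, "";
    } elsif (!@names) {
        die "bad FASTA format\n";        # sequence data before any header
    } else {
        $seqs[-1] .= $line;              # sequence line: concatenate
    }
}
close($in);

# "Do something": print each header and the tag after its last CTAG.
foreach my $i (0 .. $#names) {
    my $pos = rindex($seqs[$i], "CTAG");
    my $tag = $pos < 0 ? "no tag" : substr($seqs[$i], $pos + 4, 10);
    print ">$names[$i]\n$tag\n";
}
```

With this input the first record yields AATGCATTAA and the second yields "no tag". Replacing $in with STDIN would make the same loop read real input.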


Page 22: More loops

4.22 FASTA: An alternative approach (which is more confusing and generally not recommended!)

my @fasta = <STDIN>;
my $oneline = join("", @fasta);   # Concatenate all lines
for ($i=0; $i<length($oneline); $i++) {
    my $c = substr($oneline,$i,1);
    my $sub10 = substr($oneline,$i,10);

    if ($c eq ">") {
        # Save header start position
        $start = ($i+1);
    }
    if ($c eq "]") {
        # Save header end position
        $end = $i;
    }
    if (???) {
        # If we found what we were looking for...
        # Print last header
        $name = substr($oneline,$start,$end-$start+1);
    }
}