Date posted: 19-Dec-2015

BINF350, Tutorial 4

Karen Marshall

Aim

► Examine how BLAST parameters (e.g. scoring scheme, word length) affect the alignment outcome
► Optimise BLAST parameters for alignments with different levels of sequence homology

Practical: Part 1

► Start with an ~200 bp original DNA sequence
► Simulate mutation events over time and collect the sequences
► BLAST the original sequence against the mutated sequences
► Repeat the BLASTs using different parameters

[Figure labels: Mutated sequences; Original sequence]

Simulation of mutated sequences

► Point accepted mutation (PAM) model of molecular evolution
► 1 PAM = 1 mutation per 100 bases on average

 1 PAM   99.0% sequence homology
10 PAM   90.6% sequence homology
50 PAM   63.5% sequence homology
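The slides do not say how the homology figures were computed. As a sketch, they are consistent with a Jukes-Cantor-style model (an assumption here: four bases, all substitutions equally likely, PAM/100 substitutions per site), where repeated and backward mutations at the same site make identity fall more slowly than the raw mutation count suggests. The function name `expected_identity` is illustrative.

```python
import math


def expected_identity(pam: float) -> float:
    """Expected fraction of identical bases after `pam` PAM units.

    Assumption (not stated in the slides): Jukes-Cantor-style model with
    four bases, equally likely substitutions, and pam/100 substitutions
    per site on average.
    """
    d = pam / 100.0  # substitutions per site
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


for pam in (1, 10, 50):
    print(f"{pam:>2} PAM  {100 * expected_identity(pam):.1f}% identity")
```

Under this assumption the three values round to 99.0%, 90.6% and 63.5%, and identity tends towards 25% (chance agreement between random sequences) as the PAM distance grows.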

Concept of forward and backwards mutation

for each successive PAM
    for each nucleotide
        if (rand > 0.01)
            do not mutate
        else (rand <= 0.01)
            mutate by random selection from the non-identical bases
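The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the script used in the tutorial: the function names, the fixed random seed and the use of the first 40 bases of the original sequence are assumptions made for the example.

```python
import random

BASES = "ACGT"


def mutate_one_pam(seq: str, rng: random.Random, rate: float = 0.01) -> str:
    """Apply one PAM step: each base mutates with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() <= rate:
            # Mutate by random selection from the three non-identical bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(seq: str, n_pam: int, rate: float = 0.01, seed: int = 0) -> list[str]:
    """Return the sequence at PAM 0, 1, ..., n_pam (successive steps)."""
    rng = random.Random(seed)
    history = [seq]
    for _ in range(n_pam):
        history.append(mutate_one_pam(history[-1], rng, rate))
    return history


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


original = "AGATTCACTGGTGTGGCAAGTTGTCTCTCAGACTGTACAT"
history = simulate(original, 30)
for pam in (10, 20, 30):
    print(pam, f"{identity(original, history[pam]):.3f}")
```

Because each step starts from the previous step's sequence, a site can mutate more than once or mutate back to its original base, which is the forward and backward mutation idea on the slide.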

Simulated sequences by PAM step (the slide also shows base pairs 1 to 40 of each sequence in separate columns)

PAM  Sequence
 0   AGATTCACTGGTGTGGCAAGTTGTCTCTCAGACTGTACATGCATTAAAATTTTGCTTGGCATTACTCAAAAGCAAAAGAAAAGTAAAAGGAAGAAACAAGAACAAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAACTCTGTGTTTATATTTACCTGTTTATGCTGATTGTTGCTGGTCCAGTGGATCA
 1   AGAGTCAGTGGTGTGGCAAGTTGTCTCTCAGACTGTACATGCATTAAAATTTTGCTTGGCATTACTCAAAACCAAAAGAAAAGTAAAAGGAAGAAACAAGAACAAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTACCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
 2   AGAGTCAGTGGTGTGGCACGTTGTCTCTCAGACTGTACATGCATTTAAATTTTGCTTGGCATTACTCAAAACCAAAAGAAAAGTAAAAGGAAGAAACAAGAACAAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTACCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
 3   AGAGTCAGTGGTGTGGCACGTTGTCTCTCAGACTGTACATGCATTTAAATTTTGCTTGGCATTACTCAAAACCAAAAGAAAAGTAAAAGGAAGAAACAATAACAAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
 4   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACTGTACATGCATTTAAATTTCGCTTGGCATTAATCAAAACCAAAAGAAAAGGAAAAGGAAGAAACAATAACCAGAAAAAAGATTATATTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
 5   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACTGTACATGCATTTAAATTTCGCTTGGCATTAATCAAAACCAAAAGAAAAGGAAAAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
 6   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACGGTACATGCATTTAAATTTCGCTTGGCATTAATCAAAACCATAAGAAAAGGAAAAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
 7   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACGGTACATGCATTTAAATTTCGCTTGGCATTAATCAAAACCATAAGAAAAGGAAAAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
 8   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAGACGGTACATGCATTTAAATTTCGCTCGGCATTAATCAAAACCATAAGAAAAGGAAAAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
 9   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAAACGGTACATGCATTTAAATTTCGCTCGGCATTAATCAAAACCATAAGAAAAGGAAGAGGAAGAAACAATAACCAGAATAAAGATTATTTTGATTTTAAAATCATGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
10   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAAACGGCACGTGCATTTAAATTTAGCTCGGCATTAATCAAAACCATAAGAAAAGTAAGAGGAAGAAACAATAACCAGAATAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
11   AGTGTCAGTGGTGTGGCACGTTGTCTCTCAAACGGCACGTGCATTTAAATTTAGCTCGGCATTAATCAAAACCAAAAGAAAAGTAAGAGGAAGAAACAATAACCAGAATAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
12   AGTGTCAGTGGTGTTGCACGTTGTCTCTCAAACGGCACGTGCATTTCAATTTAGCTCGGCATTAATCAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAATAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
13   AGTGTCAGTGGTGTTGCACGTTGTCTCTCAAACGGCACGTGCATTACAATTTAGCGCGGCATTAATCAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCTGATTGTTGCTGGTCCGGTGGATCA
14   AGTGTCAGTGGTGTTGCACGTTGTCTCTCAAACGGCACGTGCATTACAATTTAGCGCGGCATTAATCAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAAAACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCCGATTGTTGCTGGTCCGGTGGATCA
15   AGTGTCAGTGGTGTTGTACGTTGTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAATAAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTTTCCTGTTTATGCCGATTGTTGCTGGTCCGGTGGACCA
16   AGTGTCAGTGGTGTTGTACGTTGTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAATAAAAACCAAAAGAATAGTAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGGTCCGGTGGACCA
17   AGTGTCAGTGGTGTTGTACGTTGTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAATAAAAACCAAAAGAATAGGAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACCA
18   AGTGTCAGTGGTGTTGTACGTTGTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGAATAGGAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACCA
19   AGTGTCAGTGGTGTTGTAAGTTTTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGAATAGGAAGAGGAAGTAACAATAACCAGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACCA
20   AGTGTCAGTGGTGTTGTAAGTTTTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAATCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCAAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACCA
21   AGTGTCAGTGGTGTTGTAAGTTTTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAATCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTGTTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACCA
22   AGTGTCAGTGGTGTTGTAAGTTTTCTCTCAAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACCA
23   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACCA
24   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACCA
25   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTACGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTGTTGCTGATCCGGTGGACCA
26   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGACCA
27   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACAGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGGCCA
28   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACTGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATCTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGGCCA
29   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACTGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATGTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGGCCA
30   AGTGACAGTGGTGTTGTAAGTTTTCTCTCGAACGGCACGTGCATTATGATTTAGCGCGGCATTAGTAAAAACCAAAAGGATAGGAAGAGGAAGTAACTGTCACCGGAAAAAAGATTATTTTGCTTTTAAAATGTTGCAAATACTGCTAGTCTGTATTTATATTCTCCTGTTTATGCCGATTTTTGCTGATCCGGTGGGCCA

BLAST - Heuristic

[Diagram: Steps 1, 2, 3]

Suffix tree
Lookup table

• Words/seeds
• Location
• Threshold T
• Larger seq file
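The lookup-table idea on this slide can be sketched as follows: every word of length W in the larger (database) sequence is indexed by its location, and each word of the query is looked up to find seed hits. This is a simplified illustration only, with hypothetical function names. It uses exact word matches, as in nucleotide searches, and leaves out the threshold T, which applies to neighbourhood words in protein searches, as well as the extension steps that follow seeding.

```python
from collections import defaultdict


def build_lookup(db_seq: str, w: int) -> dict[str, list[int]]:
    """Map every word of length w in db_seq to its 0-based start positions."""
    table = defaultdict(list)
    for i in range(len(db_seq) - w + 1):
        table[db_seq[i:i + w]].append(i)
    return table


def seed_hits(query: str, table: dict[str, list[int]], w: int) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs where a query word occurs in the database."""
    hits = []
    for q in range(len(query) - w + 1):
        for s in table.get(query[q:q + w], []):
            hits.append((q, s))
    return hits


db = "AGATTCACTGG"
table = build_lookup(db, 4)
print(seed_hits("TTCAC", table, 4))
```

A longer word length gives fewer chance hits and a faster search, but a diverged sequence with a mismatch in every window of W bases produces no seed at all, which is why word length matters for distant homologues.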

BLAST

February 10, 2004: BLAST 2.2.8 released

BLAST 2.2.8 release notes
• Correction to tblastx alignment computation
• ia32-linux now requires glibc 2.2.5

Source code can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/old/20040204/ncbi.tar.gz
Binaries can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.8/

February 2, 2004: BLAST 2.2.7 released

BLAST 2.2.7 release notes
• Standalone BLAST is now available for amd64-linux.
• formatdb now restricts volume sizes to 1G on 32-bit platforms for performance reasons.
• The -A option has been removed from formatdb, that is, all databases will be created with ASN.1 deflines.
• tblastn query concatenation now works correctly on 64-bit platforms.
• The wwwblast source code has been merged into the C toolkit tree and is no longer distributed with the binaries.

Source code can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/old/20040202/ncbi.tar.gz
Binaries can be obtained from: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.7/

http://www.ncbi.nih.gov/BLAST/blast_whatsnew.shtml

BLAST on your own machine

► Allows you to BLAST multiple sequences (most web versions are single sequence only)
► Steps
    Sequence files in FASTA format
        Can have multiple sequences in each file but no duplicates
    Format the larger sequence file into a database
        formatdb -i dbfile.txt -p F -o T
    Perform BLAST using appropriate switches
        blastall -p blastn -d dbfile.txt -i comp.txt -o out.txt
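The two steps above can be scripted. The sketch below only builds the command lines shown on the slide and runs them if the legacy `formatdb` and `blastall` programs are on the PATH. The helper names (`formatdb_cmd`, `blastn_cmd`, `run_if_available`) are hypothetical, and the file names are the ones used on the slide.

```python
import shutil
import subprocess


def formatdb_cmd(db_fasta: str) -> list[str]:
    """Command to format a nucleotide FASTA file (-p F) with parsed IDs (-o T)."""
    return ["formatdb", "-i", db_fasta, "-p", "F", "-o", "T"]


def blastn_cmd(db: str, query: str, out: str) -> list[str]:
    """Command to run blastn with query file `query` against database `db`."""
    return ["blastall", "-p", "blastn", "-d", db, "-i", query, "-o", out]


def run_if_available(cmd: list[str]) -> bool:
    """Run cmd if its program is installed; otherwise just report it."""
    if shutil.which(cmd[0]) is None:
        print("not installed, would run:", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)
    return True


print(" ".join(formatdb_cmd("dbfile.txt")))
print(" ".join(blastn_cmd("dbfile.txt", "comp.txt", "out.txt")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.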

BLAST 2.2.8

► Arguments (see appendix of handout)
    -W  seed word length (default = 11)
    -r  reward for a match (default = 1)
    -q  penalty for a mismatch (default = -3)
    -G  cost to open a gap
    -E  cost to extend a gap
    -F  filter query sequence
    -e  threshold expectation (threshold for HSP before gaps are included)
    -m  to specify different output options
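Repeating the BLASTs with different parameters amounts to looping over combinations of these switches. A minimal sketch, assuming a small illustrative grid of word lengths and reward/penalty pairs (the specific values are examples, not recommendations from the handout, and `blastn_options` is a hypothetical helper):

```python
from itertools import product


def blastn_options(word_size: int, reward: int, penalty: int) -> list[str]:
    """Switches for one parameter combination, to append to a blastall command."""
    return ["-W", str(word_size), "-r", str(reward), "-q", str(penalty)]


word_sizes = (7, 11)            # illustrative seed word lengths
scores = ((1, -3), (1, -1))     # illustrative (reward, penalty) pairs

grid = [blastn_options(w, r, q) for w, (r, q) in product(word_sizes, scores)]
for options in grid:
    print(" ".join(options))
```

Each option list can be appended to the base `blastall -p blastn ...` command, with a different `-o` file per run so the outputs can be concatenated and annotated afterwards.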

                                               Score     E
Sequences producing significant alignments:    (bits)  Value

1_10                                             170   3e-046
0_0                                              170   3e-046
4_10                                             115   2e-029
2_10                                             107   4e-027
5_10                                              96   2e-023
3_10                                              96   2e-023
4_20                                              68   3e-015
2_20                                              68   3e-015
5_20                                              56   1e-011

QUERY 1 agattcactggtgtggcaagttgtctctcagactgtacatgcattaaaattttgcttggc 60
1_10  1 ............................................................ 60
0_0   1 ............................................................ 60
4_10  3   ....t.....c......ag..................a.................... 60
2_10  1 ............a..c....a...........a................g.......... 60
5_10  2  ........c......a.........g............................c.... 60
3_10  1 .................g........t.....................c.....a..... 60
4_20  3   ....t.....c......ag....a.....g.......a.................... 60
2_20  1 ............a..c...ta...........aa......c..a.....g.....      55
5_20  4    ......c..c...a....g....g..............a......c......c.... 60

Example of BLAST output: -m 3
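In this query-anchored view a dot means the subject base matches the query and a letter marks a mismatch, so the identity of each aligned block can be read off by counting dots. A small sketch, with `row_identity` as a hypothetical helper name, applied to the 4_10 row shown above:

```python
def row_identity(row: str) -> float:
    """Fraction of aligned positions that match the query ('.' = match).

    Leading or trailing spaces (unaligned columns) are ignored; any other
    character, including a gap, counts as a non-match.
    """
    aligned = [c for c in row if c != " "]
    matches = sum(c == "." for c in aligned)
    return matches / len(aligned)


row_4_10 = "....t.....c......ag..................a...................."
print(f"{len(row_4_10)} aligned positions, identity {row_identity(row_4_10):.3f}")
```

This gives the identity of the displayed block only, not of the whole 200 bp alignment.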

Substitution scores

► Optimal substitution scores were derived for different PAM distances / sequence homologies (States et al., 1991)
► Of importance is the match to mismatch score ratio
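To see why the ratio matters, one can use the standard log-odds form of a substitution score, s = ln(q / p), where q is the frequency of an aligned pair among true alignments and p is its frequency by chance. The sketch below assumes uniform base composition and equally likely mismatches, so it illustrates the principle and will not necessarily reproduce the published table; the function names are hypothetical.

```python
import math


def log_odds_scores(identity: float) -> tuple[float, float]:
    """(match, mismatch) log-odds scores for a target fraction of identity.

    Assumptions: four equally frequent bases and equally likely mismatches,
    so a given identical pair has target frequency identity/4, a given
    mismatched pair (1 - identity)/12, and any pair occurs by chance at 1/16.
    """
    match = math.log(4.0 * identity)
    mismatch = math.log(4.0 * (1.0 - identity) / 3.0)
    return match, mismatch


def mismatch_to_match_ratio(identity: float) -> float:
    """Size of the mismatch penalty relative to the match reward."""
    match, mismatch = log_odds_scores(identity)
    return -mismatch / match


for ident in (0.99, 0.906, 0.635):
    print(f"identity {ident:.3f}: ratio {mismatch_to_match_ratio(ident):.2f}")
```

Under these assumptions the default +1/-3 scheme (ratio 3) suits near-identical sequences, while more diverged sequences call for a smaller penalty relative to the reward.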

Substitution scores

► ‘Better’ substitution matrices exist, but are not yet implemented in most BLAST software

Practical: Part 2

► Apply concepts from Part 1 to ‘real sequences’
► BLAST the mRNA sequences for human and cattle IFNG against an ~1/2 Mb sequence of human DNA
► Use optimal BLAST parameters for the expected homology

[Figure labels: Human DNA; Human IFNG mRNA; Cattle IFNG mRNA]

Expected levels of sequence homology

► Varies for sequences being considered and genomic region

Human to mouse comparison, from …

Efficiency of BLAST

► Human to cattle coding sequence ~85% homology (~PAM 15)

IFNG mRNA sequences

► Extracted from NCBI website using Batch Entrez

>gi|10835170|ref|NM_000619.1| Homo sapiens interferon, gamma (IFNG), mRNA
TGAAGATCAGCTATTAGAAGAGAAAGATCAGTTAAGTCCTTTGGACCTGATCAGCTTGATACAAGAACTA
CTGATTTCAACTTCTTTGGCTTAATTCTCTCGGAAACGATGAAATATACAAGTTATATCTTGGCTTTTCA
GCTCTGCATCGTTTTGGGTTCTCTTGGCTGTTACTGCCAGGACCCATATGTAAAAGAAGCAGAAAACCTT
AAGAAATATTTTAATGCAGGTCATTCAGATGTAGCGGATAATGGAACTCTTTTCTTAGGCATTTTGAAGA
ATTGGAAAGAGGAGAGTGACAGAAAAATAATGCAGAGCCAAATTGTCTCCTTTTACTTCAAACTTTTTAA
AAACTTTAAAGATGACCAGAGCATCCAAAAGAGTGTGGAGACCATCAAGGAAGACATGAATGTCAAGTTT
TTCAATAGCAACAAAAAGAAACGAGATGACTTCGAAAAGCTGACTAATTATTCGGTAACTGACTTGAATG
TCCAACGCAAAGCAATACATGAACTCATCCAAGTGATGGCTGAACTGTCGCCAGCAGCTAAAACAGGGAA
GCGAAAAAGGAGTCAGATGCTGTTTCAAGGTCGAAGAGCATCCCAGTAATGGTTGTCCTGCCTGCAATAT
TTGAATTTTAAATCTAAATCTATTTATTAATATTTAACATTATTTATATGGGGAATATATTTTTAGACTC
ATCAATCAAATAAGTATTTATAATAGCAACTTTTGTGTAATGAAAATGAATATCTATTAATATATGTATT
ATTTATAATTCCTATATCCTGTGACTGTCTCACTTAATCCTTTGTTTTCTGACTAATTAGGCAAGGCTAT
GTGATTACAAGGCTTTATCTCAGGGGCCAACTAGGCAGCCAACCTAAGCAAGATCCCATGGGTTGTGTGT
TTATTTCACTTGATGATACAATGAACACTTATAAGTGAAGTGATACTATCCAGTTACTGCCGGTTTGAAA
ATATGCCTGCAATCTGAGCCAGTGCTTTAATGGCATGTCAGACAGAACTTGAATGTGTCAGGTGACCCTG
ATGAAAACATAGCATCTCAGGAGATTTCATGCCTGGTGCTTCCAAATATTGTTGACAACTGTGACTGTAC
CCAAATGGAAAGTAACTCATTTGTTAAAATTATCAATATCTAATATATATGAATAAAGTGTAAGTTCACA
ACTACT

>gi|31982948|ref|NM_174086.1| Bos taurus interferon, gamma or immune type [interferon gamma type 2] (IFNG), mRNA
ATTAGAAAAGAAAGATCAGCTACCTCCTTGGGACCTGATCATAACACAGGAGCTACCGATTTCAACTACT
CCGGCCTAACTCTCTCCTAAACAATGAAATATACAAGCTATTTCTTAGCTTTACTGCTCTGTGGGCTTTT
GGGTTTTTCTGGTTCTTATGGCCAGGGCCAATTTTTTAGAGAAATAGAAAACTTAAAGGAGTATTTTAAT
GCAAGTAGCCCAGATGTAGCTAAGGGTGGGCCTCTCTTCTCAGAAATTTTGAAGAATTGGAAAGATGAAA

INFG_refseq.txt
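A multi-sequence FASTA file like the one above has to contain no duplicate entries before it is used with BLAST. A minimal reader that enforces this might look like the sketch below. The function name `read_fasta` and the choice of the first whitespace-delimited token of the header as the identifier are assumptions made for illustration.

```python
def read_fasta(lines) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}, rejecting duplicate IDs.

    The identifier is taken to be the header text up to the first space
    (an assumption; other tools may treat headers differently).
    """
    records: dict[str, list[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            if name in records:
                raise ValueError(f"duplicate sequence identifier: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


example = [
    ">seqA first example",
    "AGATTCACTG",
    "GTGTGGCAAG",
    ">seqB second example",
    "AGAGTCAGTG",
]
for seq_id, seq in read_fasta(example).items():
    print(seq_id, len(seq))
```

To read a file, pass an open file handle, for example `read_fasta(open("INFG_refseq.txt"))`.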

Human Chr12 sub-sequence

► Extracted from UCSC ‘Golden Path’ website
► chr12:66,589,493-67,085,092 ~ ½ Mb (does contain the IFNG gene)
► Repeats masked to lower case

>hg16_dna range=chr12:66589493-67085092 5'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=lower
CATTCATTACTTTTATAAGGTTTCTCTCTGGTATGCATCTGACTTACATCATGGGAAAGCTAGTTTCATGACTCCTTTGGAATAGTTGTGGTCCTGAATATGGAAAATCAATTAATGAATAGCTTAAAGCACAATAGTCAACAAATAGATGTGAAAATTCTTTGTGAACTTTAAAGTCTTACTTAAACGTGAGATATTATATACAGTGTTTTATGTtagactgtgagcttgttaaagaaagaactatgccttctttttctttctaccagttccagtgcctcgtacaacatagaaaccataagtgtttttgaaagagcaaatGAATATTGGAAGGAGTAAGGTGATAGCTAAAGCTAAAACAATGTTTAGGGAGAACAACTGAAACAAAAGCAGCATTTGTGTCTTAAACTCATGGCCTCTGAAACAGCCTTGATAGATAGTAGAGAGGGTCAGATAGAGAGAGCCTGACTCAGAGATTGGGAAGCCCTATATGGTTGGAAGAGAAAGTAAGAGGAGACCCAAAGTATTAGACCACAGAAAGAAGTTCTAATAGTCAGTGTCAAGAGATTCAGCAGGAGGTTGTGTATCAGGATTTGGGTTTGGGAGTGGTATGGAGCTTACCTATCTCTAAAACGAGCAGGAGGGCAAAAATGAATCCCAGTCCCAAAGAATTCACTAATGGCCAGCAAACCAACACAGGAACCCCAGCACAGACACACAAGATAGGAAACCAGTTGTTGAAACTACAATGTAACGGGGCTGATTTAATAAAAACCTGTTACATGAGTTATAGGttttttttttttttttttttttttAATGTATGTGCCCCACCTTAGGAAAGCCAGAAATAATGGCAACGAAGAAATATTCATTCACAGTGAGAAAGCCATTAGAACGTTGGCTGGAACCTAGGGGCATATCGAGGGCCCACGTGGGAAGGACAATGACAACTTGTTTAGTCCTCACTGGTTTCCCAGTCTGTGGATCTTATTTGAAT

hs_chr12_subseq.txt

Human IFNG gene

The exon / intron report from NCBI ‘AceView’ for NM_000619 is as follows:

In variant        Length & DNA   Coordinates on gene   Supporting clone(s)
Exon 1            243bp          1 to 243              M29383
Intron [gt-ag]    1242bp         244 to 1485           M29383 and 32 others
Exon 2            69bp           1486 to 1554          NM_000619 and 32 others
Intron [gt-ag]    95bp           1555 to 1649          NM_000619 and 33 others
Exon 3            183bp          1650 to 1832          NM_000619 and 24 others
Intron [gt-ag]    2425bp         1833 to 4257          NM_000619 and 24 others
Exon 4            725bp          4258 to 4982
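When an mRNA is compared with genomic DNA, the matching segments are expected to follow the exons, so it helps to know where each exon falls on the mRNA when annotating the BLAST output. The sketch below uses the gene coordinates from the table above. The helper names are hypothetical, and the conversion assumes the mRNA is simply the exons joined in order.

```python
# Exon coordinates on the gene, taken from the AceView table above.
EXONS = [(1, 243), (1486, 1554), (1650, 1832), (4258, 4982)]


def lengths(segments):
    """Length of each (start, end) segment, with inclusive coordinates."""
    return [end - start + 1 for start, end in segments]


def introns(exons):
    """(start, end) of the gap between each pair of consecutive exons."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def mrna_ranges(exons):
    """Position of each exon on the spliced mRNA (exons joined in order)."""
    ranges, start = [], 1
    for length in lengths(exons):
        ranges.append((start, start + length - 1))
        start += length
    return ranges


for (g_start, g_end), (m_start, m_end) in zip(EXONS, mrna_ranges(EXONS)):
    print(f"gene {g_start}-{g_end}  ->  mRNA {m_start}-{m_end}")
print("intron lengths:", lengths(introns(EXONS)))
```

The short second exon (69 bp) is the one most likely to be missed when the word length is large or the sequences are diverged, as with the cattle mRNA.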

Human IFNG gene

From the UCSC ‘Golden Path’ website genome browser

IFNG against ~1/2 Mb region of Chr 12

Assessment

► Submit
    for either Part 1 or Part 2, the BLAST output, concatenated into one file and annotated
    a short summary / discussion of the concepts covered in this practical (< 500 words)

References

► Strongly recommend BLAST tutorial on NCBI site
    http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
► Further: “Bioinformatics for quantitative geneticists course notes”, J. McEwan
    http://www-personal.une.edu.au/~jvanderw/aabc_materials2004.htm#ModuleC