Project Proposals
![Page 1: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/1.jpg)
Project Proposals
• Due Monday Feb. 12
• Two Parts:
  – Background: describe the question
    • Why is it important and interesting?
    • What is already known about it?
  – Proposed Work
    • What will you do?
    • How will you do it?
• Include references and figures as needed
![Page 2: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/2.jpg)
Phylogeny
• Reread background papers from weeks 3 & 4
  – Delsuc et al.
  – Holder and Lewis
![Page 3: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/3.jpg)
![Page 4: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/4.jpg)
The twenty amino acids
![Page 5: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/5.jpg)
Protein Weight Matrices
![Page 6: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/6.jpg)
Two main kinds of weight matrices
• Ordered from more divergent to less divergent:
  – BLOSUM: BLOSUM45, BLOSUM62, BLOSUM90
  – PAM: PAM250, PAM160, PAM100
• BLOSUM62 is the BLASTP default
• PAM (Point Accepted Mutation): based on an explicit evolutionary model. Derived from mutations observed throughout a global alignment (which includes both highly conserved and highly mutable regions) of a small protein dataset.
• BLOSUM (Blocks Substitution Matrix): based only on highly conserved regions in a series of alignments that are not allowed to contain gaps. Sensitive for local alignment of related sequences. Based on a larger dataset than PAM.
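As a rough illustration of what an entry in a weight matrix means, the sketch below computes a log-odds score: how much more (or less) often a residue pair is observed in alignments than expected by chance. This is the general idea behind matrices such as BLOSUM, not the exact published procedure; the function name and the frequencies are hypothetical, and the scale factor of 2 (half-bit units) is an assumption chosen for illustration.

```python
import math

def log_odds_score(observed_freq, expected_freq, scale=2.0):
    """Rounded log-odds score for one residue pair.

    observed_freq: how often the pair appears in trusted alignments (hypothetical value)
    expected_freq: how often the pair would appear by chance (hypothetical value)
    scale: multiplier applied to the log2 ratio (assumed: 2 gives half-bit units)
    """
    return round(scale * math.log2(observed_freq / expected_freq))

# Pair seen 4x more often than chance: positive score (favored substitution)
favored = log_odds_score(0.02, 0.005)
# Pair seen half as often as chance: negative score (disfavored substitution)
disfavored = log_odds_score(0.0025, 0.005)
# Pair seen exactly as often as chance: score of zero
neutral = log_odds_score(0.01, 0.01)
```

Positive entries mark substitutions that are tolerated more often than chance predicts, negative entries mark substitutions that are rarely accepted.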
![Page 7: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/7.jpg)
Other Types of BLAST
• MegaBLAST (nt)
  – MegaBLAST uses a greedy algorithm for nucleotide sequence alignment search. Optimized for aligning sequences that differ slightly as a result of sequencing or other similar "errors". Also able to efficiently handle much longer DNA sequences than the blastn program of the traditional BLAST algorithm.
• Discontiguous MegaBLAST (nt)
  – Designed specifically for comparison of diverged sequences, especially sequences from different organisms, which have alignments with a low degree of identity, where the original MegaBLAST is not very effective.
• See also: MUMmer at TIGR
![Page 8: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/8.jpg)
Other BLAST options
• Search for short nearly exact matches (nt or aa)
  – Special page with altered parameters
    • Expect value threshold has been increased
    • Word size decreased to optimize for short hits, which generally have large E values
    • For proteins, a different scoring matrix is used, optimized for smaller evolutionary distances
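To see why short hits tend to have large E values, the sketch below uses the standard bit-score relation E = m × n × 2^(−S'), where m is the query length, n is the database length, and S' is the bit score. A short query cannot accumulate a high bit score, so even a perfect match may be expected many times by chance. The function name and the numbers are hypothetical, and real BLAST uses effective (edge-corrected) lengths, which this sketch ignores.

```python
def expect_value(query_length, database_length, bit_score):
    """Approximate number of chance hits with at least this bit score.

    Uses E = m * n * 2**(-S'); ignores the effective-length correction
    that BLAST applies, so treat the result as an order-of-magnitude guide.
    """
    return query_length * database_length * 2.0 ** (-bit_score)

DATABASE_LENGTH = 10**9  # hypothetical database size in residues

# A short query whose best possible hit reaches only ~20 bits
short_hit = expect_value(10, DATABASE_LENGTH, 20)
# A longer query whose hit reaches ~60 bits
long_hit = expect_value(100, DATABASE_LENGTH, 60)

print(f"short hit E ~ {short_hit:.3g}")
print(f"long hit  E ~ {long_hit:.3g}")
```

With the default threshold the short hit would be discarded, which is why the short-match page raises the Expect threshold.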
![Page 9: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/9.jpg)
• Low complexity sequence
  – Regions of biased composition including homopolymeric runs, short-period repeats, and more subtle overrepresentation of one or a few residues
  – Examples: AAATAAAAAAAATAAAAAAT or PPCDPPPPPKDKKKKDDGPP
  – Filters are used to remove low-complexity sequence because it can cause artifactual hits
    • Filters result in strings of Ns or Xs substituted in your query
  – Without a filter:
    • Some hits may be reported with high scores only because of the presence of a low-complexity region.
    • These hits are usually not the result of homology shared by the sequences.
    • Rather, it is as if the low-complexity region is "sticky" and is pulling out many sequences that are not truly related.
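The filtering idea can be sketched with a sliding-window entropy mask. This is a simplified stand-in, not the SEG or DUST algorithm that BLAST actually uses; the function names, window size, and entropy threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mask_low_complexity(seq, window=12, threshold=1.5, mask_char="X"):
    """Replace every residue that falls in a low-entropy window with mask_char.

    window and threshold are illustrative values, not BLAST defaults.
    Sequences shorter than the window are returned unchanged.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)

# Nucleotide example from the slide: masked with Ns
masked_nt = mask_low_complexity("AAATAAAAAAAATAAAAAAT", mask_char="N")
# A made-up protein sequence with 20 different residues: left untouched
diverse = "MKVLAGHTRESDFWYCNPQI"
masked_aa = mask_low_complexity(diverse)
```

Masked positions cannot seed or extend a hit, so the "sticky" region no longer pulls in unrelated sequences.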
![Page 10: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/10.jpg)
Phylogenetic Profiling
Pattern of presence or absence of genes across genomes
Idea: proteins that function in the same cellular context frequently have similar phylogenetic profiles
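A minimal sketch of the idea: each gene gets a presence/absence vector across genomes, and genes whose vectors match (or nearly match) become candidates for a shared functional context. The gene names, the five-genome profiles, and the use of Hamming distance as the similarity measure are all hypothetical choices for illustration.

```python
def hamming_distance(profile_a, profile_b):
    """Number of genomes in which two presence/absence profiles disagree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))

def similar_profiles(query_gene, profiles, max_distance=0):
    """Genes whose profile differs from the query's in at most max_distance genomes."""
    query = profiles[query_gene]
    return sorted(
        gene for gene, prof in profiles.items()
        if gene != query_gene and hamming_distance(query, prof) <= max_distance
    )

# Hypothetical profiles: 1 = gene present in that genome, 0 = absent
profiles = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 1, 0),
    "geneC": (0, 1, 1, 0, 1),
    "geneD": (1, 1, 0, 0, 0),
}

exact_matches = similar_profiles("geneA", profiles)
near_matches = similar_profiles("geneA", profiles, max_distance=1)
```

Here geneB shares geneA's exact profile, and geneD joins once one mismatch is allowed.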
![Page 11: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/11.jpg)
Environmental Genomic Datasets
• Sargasso Sea
• Station Aloha
• Acid Mine Drainage
• Whale Fall
• sludge
• soils
• marine viromes
• Human Gut
![Page 12: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/12.jpg)
Global Ocean Survey: phase I
Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA)
Online since Jan. 23rd!
![Page 13: Project Proposals](https://reader035.fdocuments.in/reader035/viewer/2022070502/56813aec550346895da3599c/html5/thumbnails/13.jpg)
Today’s Lab
• Use IMG (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi) to explore:
  – precomputed homologs for your gene of interest
  – genomic neighborhoods for your gene of interest
  – phylogenetic profile of your gene of interest
  – genes that fit a specific phylogenetic profile of a subset of genomes of interest to you
• Register as a CAMERA user: http://cameradev.calit2.net/index.php
• See if you can find homologs of your gene of interest in one of the available databases