Doug Raiford
Lesson 3
More and more sequence data is being generated every day
Useless if not made available to other researchers
Not just sequence data
Many other biological experiments
▪ Expression
▪ NMR
▪ Mass spec
▪ Protein X-ray crystallography
With the data come scientific journal articles
Search tools
▪ Find similar genes in other organisms
▪ Find articles
▪ Find
Implemented algorithms
▪ Alignment
▪ Sequence assembly
▪ Protein structure prediction
National Center for Biotechnology Information (NCBI)
GenBank (accessed through NCBI)
▪ Sponsored by the National Institutes of Health (NIH)
RefSeq
▪ Derived from GenBank, curated, non-redundant
European Molecular Biology Laboratory (EMBL)
DNA Data Bank of Japan (DDBJ)
Protein Data Bank (PDB)
▪ PDB files: standardized format for viewers
Protein Information Resource (PIR)
Will revisit later
Can actually perform scientific analysis
▪ Color by charge
▪ Hydrophobicity
▪ Render surface
Entrez Global Query Cross-Database Search System
Single source for searching publications, sequences, proteins, diseases, etc.
▪ Whole Genome DB
▪ Gene Expression Omnibus (GEO)
▪ Online Mendelian Inheritance in Man (OMIM)
▪ PubMed
Map of site
Perl: Practical Extraction and Report Language
▪ Expansion came later
Really good at string manipulation
▪ DNA and proteins represented as strings
Scripting language
Almost all Unix and Linux systems come with it installed
Free download and install for Windows
Make a computer do what we want it to do
Program in a language
Machine language
▪ Low level: 1's and 0's
High-level programming language
▪ C/C++
▪ Java
▪ Compiled into machine language
Very high-level languages
▪ Scripting
▪ Interpreted
▪ Perl lives here
Display something to the screen
Syntax and punctuation
Store something in a variable
Commenting the code
Some easy string manipulation
print "Hello World\n";
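A minimal sketch of the remaining topics in the list above (storing a value in a variable, commenting, and some easy string manipulation). The DNA fragment is made up for illustration, and the variable names ($dna, $rna, $revcomp) are assumptions, not part of the lesson.

```perl
#!/usr/bin/perl
use strict;      # require variables to be declared with "my"
use warnings;    # report common mistakes

# Store something in a variable: a short, made-up DNA fragment.
my $dna = "ATGCGTAC";

# Display something to the screen; the variable is interpolated into the string.
print "DNA:                $dna\n";

# Easy string manipulation 1: count the bases.
my $length = length($dna);
print "Length:             $length\n";

# Easy string manipulation 2: transcribe DNA to RNA by replacing T with U.
my $rna = $dna;
$rna =~ tr/T/U/;
print "RNA:                $rna\n";

# Easy string manipulation 3: reverse complement.
# Reverse the string, then swap each base for its partner.
my $revcomp = scalar reverse($dna);
$revcomp =~ tr/ACGT/TGCA/;
print "Reverse complement: $revcomp\n";
```

Every statement ends with a semicolon, scalar variables start with $, and anything after # on a line is a comment. Saved under a hypothetical name such as lesson3.pl, the script can be run with: perl lesson3.pl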