Administrivia
• What this course is about
• Assumed knowledge and catch-up lecture
• Labs
• Course website
• READ THE COURSE OUTLINE
Introduction to sequence analysis
BINF3010/9010
Topics (next few weeks)
• Overview
• Storing sequence data
• Comparing a sequence with another: dotplots and alignments
• Comparing a sequence with many others: similarity searching
• Comparing many sequences with many others: multiple sequence alignment and family representations. Molecular phylogeny
• Genome project informatics
Sequence analysis
• Representation is key to understanding
• In sequence analysis, macromolecules are represented as strings
QTELATKAGVKQQSIQLIEAGVTK
TATACAAGAAAGTTTGTACT
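Because sequences are plain strings, ordinary string operations apply to them directly. A minimal Python sketch using the two example sequences above; the function name `composition` is illustrative and not part of the course material.

```python
from collections import Counter

# The two example sequences from the slide: one protein, one DNA.
protein = "QTELATKAGVKQQSIQLIEAGVTK"
dna = "TATACAAGAAAGTTTGTACT"

def composition(seq):
    """Count how often each character (residue or base) occurs in a sequence."""
    return Counter(seq)

print(len(protein), len(dna))   # sequence lengths
print(composition(dna))         # base counts for the DNA example
```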
Nucleotide sequences
• DNA: 4 bases: A, G, C, T
• RNA: 4 bases: A, G, C, U
• Ambiguity codes:
– N = A or G or C or T or U (also = X)
– S (Strong) = G or C, W (Weak) = A or T/U
– R (puRine) = G or A, Y (pYrimidine) = C or T/U
– M (aMino) = A or C, K (Keto) = G or T/U
– B = not A, D = not C, H = not G, V = not T/U
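The ambiguity codes can be stored as a lookup table from each code to the set of bases it allows. The sketch below is written for DNA (substitute U for T for RNA); the names `IUPAC`, `matches` and `pattern_matches` are assumptions made for illustration.

```python
# Each code maps to the concrete DNA bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "GC", "W": "AT",        # strong / weak
    "R": "GA", "Y": "CT",        # purine / pyrimidine
    "M": "AC", "K": "GT",        # amino / keto
    "B": "CGT", "D": "AGT",      # not A / not C
    "H": "ACT", "V": "ACG",      # not G / not T
    "N": "ACGT",                 # any base
}

def matches(code, base):
    """Return True if the concrete base is allowed by the (possibly ambiguous) code."""
    return base.upper() in IUPAC[code.upper()]

def pattern_matches(pattern, seq):
    """Return True if seq is one of the sequences an ambiguous pattern describes."""
    return len(pattern) == len(seq) and all(
        matches(p, b) for p, b in zip(pattern, seq)
    )

print(pattern_matches("GANTC", "GAATC"))
```

Storing the allowed bases as short strings keeps the table readable; a set per code would work equally well.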
Nucleotide sequences
5’-GATCCAGA-3’
5’-TCTGGATC-3’
Sequence: 5’-GATCCAGA-3’
Reverse: 3’-AGACCTAG-5’
Complement: 3’-CTAGGTCT-5’
Reverse-complement: 5’-TCTGGATC-3’
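The three operations can be sketched in Python as follows. This is a minimal version for unambiguous DNA only (ambiguity codes are not handled), and the function names are illustrative.

```python
# Translation table pairing each base with its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse(seq):
    """Same strand, written in the opposite direction."""
    return seq[::-1]

def complement(seq):
    """Opposite strand, position by position (reads 3'->5' if seq is 5'->3')."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Opposite strand, written 5'->3'."""
    return complement(seq)[::-1]

seq = "GATCCAGA"
print(reverse(seq), complement(seq), reverse_complement(seq))
```

Applied to the slide's example, these give the same three strings shown above.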
Amino acid sequences
• 20 characters
– Small: G (Gly), A (Ala)
– Polar: S (Ser), T (Thr)
– Hydrophobic: L (Leu), I (Ile), V (Val), M (Met)
– Aromatic: F (Phe), Y (Tyr), W (Trp)
– Acidic: D (Asp), E (Glu)
– Amides: N (Asn), Q (Gln)
– Basic: K (Lys), R (Arg), H (His)
– Cyclic: P (Pro)
– Sulphur-containing: C (Cys)
• Sequence written from N terminal to C terminal
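The grouping of the 20 amino acids can be held as a small table and used to label each residue of a protein string, read from N terminal to C terminal. A hedged sketch: the names `GROUPS`, `RESIDUE_GROUP` and `group_profile` are illustrative, and Asn and Gln are labelled "amide" here because their side chains are amides.

```python
# Group name -> one-letter codes of its member residues (from the slide's list).
GROUPS = {
    "small": "GA",
    "polar": "ST",
    "hydrophobic": "LIVM",
    "aromatic": "FYW",
    "acidic": "DE",
    "amide": "NQ",
    "basic": "KRH",
    "cyclic": "P",
    "sulphur-containing": "C",
}

# Inverted table: one-letter code -> group name.
RESIDUE_GROUP = {aa: group for group, members in GROUPS.items() for aa in members}

def group_profile(seq):
    """Return the group of each residue, in N-terminal to C-terminal order."""
    return [RESIDUE_GROUP[aa] for aa in seq.upper()]

print(group_profile("QTELATKAGVKQQSIQLIEAGVTK"))
```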
Sequence analysis: overview
[Figure: flowchart of sequence analysis tasks; box labels follow]
Sequence entry
• Manual sequence entry
• Sequence database browsing
• Sequencing project management
Nucleotide sequence analysis
• Nucleotide sequence file
• Search databases for similar sequences
• Sequence comparison
• Multiple sequence analysis
• Design further experiments
– Restriction mapping
– PCR planning
• Search for protein coding regions: coding / non-coding
• Translate into protein
• Search for known motifs
• RNA structure prediction
Protein sequence analysis
• Protein sequence file
• Search databases for similar sequences
• Sequence comparison
• Search for known motifs
• Predict secondary structure
• Predict tertiary structure
• Create a multiple sequence alignment
• Edit the alignment
• Format the alignment for publication
• Molecular phylogeny
• Protein family analysis