01-Intro to Sequence.ppt



Bioinformatics lecture introduction


Page 1: 01-Intro to Sequence.ppt


Administrivia

•  What this course is about
•  Assumed knowledge and catch-up lecture
•  Labs
•  Course website
•  READ THE COURSE OUTLINE

Introduction to sequence analysis

BINF3010/9010

Topics (next few weeks)

•  Overview
•  Storing sequence data
•  Comparing a sequence with another: dotplots and alignments
•  Comparing a sequence with many others: similarity searching
•  Comparing many sequences with many others: multiple sequence alignment and family representations. Molecular phylogeny
•  Genome project informatics

Sequence analysis

•  Representation is key to understanding
•  In sequence analysis, macromolecules are represented as strings

QTELATKAGVKQQSIQLIEAGVTK

TATACAAGAAAGTTTGTACT

Nucleotide sequences

•  DNA: 4 bases: A, G, C, T
•  RNA: 4 bases: A, G, C, U
•  Ambiguity codes:
   N = A or G or C or T or U (also = X)
   S (Strong) = G or C, W (Weak) = A or T/U
   R (puRine) = G or A, Y (pYrimidine) = C or T/U
   M (aMino) = A or C, K (Keto) = G or T/U
   B = not A, D = not C, H = not G, V = not T/U
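The ambiguity codes above can be read as a lookup table from one letter to a set of allowed bases. The following is a minimal Python sketch under that reading; the names `IUPAC` and `matches` are illustrative choices, not part of the lecture, and U is treated as T for simplicity.

```python
# Each ambiguity code maps to the bases it may stand for.
# Assumption: DNA letters are used throughout, so U is rewritten as T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "W": "AT",      # Strong / Weak
    "R": "AG", "Y": "CT",      # puRine / pYrimidine
    "M": "AC", "K": "GT",      # aMino / Keto
    "B": "CGT", "D": "AGT",    # not A / not C
    "H": "ACT", "V": "ACG",    # not G / not T
    "N": "ACGT",               # any base
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if every base in `sequence` is allowed by the
    ambiguity code at the same position in `pattern`."""
    pattern = pattern.upper().replace("U", "T")
    sequence = sequence.upper().replace("U", "T")
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, sequence))


print(matches("GAYC", "GATC"))  # True: Y allows C or T
print(matches("GAYC", "GAAC"))  # False: Y does not allow A
```

Storing each code as a string of allowed bases keeps the table readable; a set per code would work equally well.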

Page 2: 01-Intro to Sequence.ppt


Nucleotide sequences

5’- GATCCAGA - 3’
5’- TCTGGATC - 3’

Sequence: 5’-GATCCAGA-3’
Reverse: 3’-AGACCTAG-5’
Complement: 3’-CTAGGTCT-5’
Reverse-complement: 5’-TCTGGATC-3’
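The four forms of the example sequence can be reproduced with a few lines of Python. This is a minimal sketch that assumes unambiguous upper-case DNA (A, C, G, T only); the function names are illustrative.

```python
# Translation table that swaps each base for its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq: str) -> str:
    """Same strand, read in the opposite direction."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Opposite strand, position by position (reads 3' to 5')."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Opposite strand, written in the conventional 5' to 3' direction."""
    return complement(seq)[::-1]


seq = "GATCCAGA"
print(reverse(seq))             # AGACCTAG
print(complement(seq))          # CTAGGTCT
print(reverse_complement(seq))  # TCTGGATC
```

The reverse-complement is the form normally wanted in practice, because it gives the other strand in the same 5’ to 3’ orientation used for writing sequences.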

Amino acid sequences

•  20 characters
–  Small: G (Gly), A (Ala)
–  Polar: S (Ser), T (Thr)
–  Hydrophobic: L (Leu), I (Ile), V (Val), M (Met)
–  Aromatic: F (Phe), Y (Tyr), W (Trp)
–  Acidic: D (Asp), E (Glu)
–  Amides: N (Asn), Q (Gln)
–  Basic: K (Lys), R (Arg), H (His)
–  Cyclic: P (Pro)
–  Sulphur-containing: C (Cys)
•  Sequence written from N terminal to C terminal
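The one-letter and three-letter codes listed above form a simple lookup table. As a minimal Python sketch (the names `THREE_LETTER` and `to_three_letter` are illustrative), a protein string can be expanded residue by residue, reading from the N terminal to the C terminal:

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "G": "Gly", "A": "Ala", "S": "Ser", "T": "Thr",
    "L": "Leu", "I": "Ile", "V": "Val", "M": "Met",
    "F": "Phe", "Y": "Tyr", "W": "Trp",
    "D": "Asp", "E": "Glu", "N": "Asn", "Q": "Gln",
    "K": "Lys", "R": "Arg", "H": "His",
    "P": "Pro", "C": "Cys",
}


def to_three_letter(protein: str) -> str:
    """Expand a one-letter protein string, left (N terminal) to right (C terminal)."""
    return "-".join(THREE_LETTER[residue] for residue in protein.upper())


print(to_three_letter("QTELATK"))  # Gln-Thr-Glu-Leu-Ala-Thr-Lys
```

A letter outside the 20 standard codes raises a `KeyError` here, which is a deliberate simplification for this sketch.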

Sequence analysis: overview

Nucleotide sequence file

Search databases for similar sequences

Sequence comparison

Multiple sequence analysis

Design further experiments
•  Restriction mapping
•  PCR planning

Translate into protein

Search for known motifs

RNA structure prediction

non-coding

coding

Protein sequence analysis

Search for protein coding regions

Manual sequence entry

Sequence database browsing

Sequencing project management

Protein sequence file

Search databases for similar sequences

Sequence comparison

Search for known motifs

Predict secondary structure

Predict tertiary structure

Create a multiple sequence alignment

Edit the alignment

Format the alignment for publication

Molecular phylogeny

Protein family analysis

Nucleotide sequence analysis

Sequence entry