Bioinformatics

Bioinformatics Overview School of B&I TCD May 2010

Page 1: Bioinformatics

Bioinformatics

Overview

School of B&I TCD May 2010

Page 2: Bioinformatics

Who, me?

• Andrew Lloyd

• [email protected]

• 087-225-9850, 053-9255717, 01-896-2450

• Director INCBI 1993-2000

• Population genetics, evolution

• Whole genome analysis

• Immunology, chickens, FIRM

Page 3: Bioinformatics

Definition/scope

• Storage, retrieval and analysis of biological (sequence) information.

• Insert better definition here

• Case can be made for microarray analysis

• NOT:
  – Ecoinformatics (ecology)
  – Image analysis
  – Bar-coding hospital sheets

Page 4: Bioinformatics

Philosophy

“Nothing that is worth knowing can be taught.” (Oscar Wilde)

Page 5: Bioinformatics

Getting bioinformation

• Type it in: A,T,C,C,G,T,C,A (1991)

• Access databases
  – Literature (PubMed)
  – Medical (OMIM)
  – DNA sequence (EMBL/GenBank)
  – Protein sequence (UniProt, Swiss-Prot, PIR)
  – 3-D structure (PDB)
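Sequence records retrieved from these databases are commonly downloaded in FASTA format: a ">" header line followed by one or more sequence lines. A minimal reader, sketched in Python under the assumption of plain, uncompressed FASTA text; read_fasta is a hypothetical helper, not a library API:

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        yield header, "".join(chunks)  # last record in the file


if __name__ == "__main__":
    text = ">seq1 typed in by hand\nATCC\nGTCA\n>seq2\nMPRRFG\n"
    for name, seq in read_fasta(io.StringIO(text)):
        print(name, seq)
```

For real work, Biopython's SeqIO.parse(handle, "fasta") does the same job more robustly.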

Page 6: Bioinformatics

Annotation

• In any DB, half is data and half is context:
  – Gene ontology (language)
  – Parsing sequence (ORF, RBS, intron, α-helix)
  – Recognising similar sequences (evolution!)
  – Complementary info: DB cross-referencing

• (DNA -> Protein -> 3D structure -> motifs)

Page 7: Bioinformatics

Secondary databases

• Protein motifs, domains, families

• RNA structures (16S ribosomal RNA…)

• Taxonomy/classification

• Metabolic pathways (KEGG)

• Enzymes (Brenda, TCD, Ireland)

• SNPs: mutations and variants

• Disease DBs (OMIM)

• Immuno, epitope DBs

Page 8: Bioinformatics

Complete genomes

• Ensembl (complex, basically vertebrate)
  – Uniform look-and-feel; cross-refs

• UCSC GoldenPath browser

• Plants

• Bacterial genomes
  – Including mitochondrial, chloroplast
  – Eubacteria vs Archaea vs Eukaryotes

Page 9: Bioinformatics

Annotated/known genes

• What does my gene do?

• BLAST (or FASTA) search against the DB

• SRS/Entrez to access databases
  – Neighboring (similar entries in the same DB)

• DB cross-references
  – Full picture of attributes
  – What biochemical pathway?

Page 10: Bioinformatics

The territory

• UniProt: protein sequence

• GenBank/EMBL: DNA sequence

• PDB: 3-D structure

• OMIM

• PubMed

• Taxonomy

• Maps & genomes

• Full-text journals

• Prosite, Pfam, PSSM

Page 11: Bioinformatics

Databases

• BIG

• EMBL/GenBank: 200 Gbp, 100 million entries, 2,500 complete genomes, 200,000 species

• Encyclopaedia Britannica: 180 million letters, 40 million words

• EMBL would fill about 1 km of Britannica volumes

• Doubling every 14 to 18 months

• Human genome is X bp?
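The comparison on this slide is simple arithmetic: 200 Gbp divided by 180 million letters is roughly 1,100 Britannicas, and a doubling time of T months means growth by a factor of 2^(12/T) per year. A Python sketch that uses only the slide's circa-2010 figures; the function names are illustrative:

```python
# Figures quoted on the slide (circa 2010); they are not current values.
EMBL_BASES = 200e9          # ~200 Gbp in EMBL/GenBank
BRITANNICA_LETTERS = 180e6  # ~180 million letters in the Encyclopaedia Britannica


def britannica_equivalents(bases: float, letters: float) -> float:
    """Number of Britannica-sized texts holding as many characters as the database."""
    return bases / letters


def annual_growth(doubling_months: float) -> float:
    """Growth factor over 12 months for a given doubling time in months."""
    return 2 ** (12 / doubling_months)


def projected_size(size_now: float, years: float, doubling_months: float) -> float:
    """Size after `years` of steady exponential growth."""
    return size_now * 2 ** (12 * years / doubling_months)


if __name__ == "__main__":
    ratio = britannica_equivalents(EMBL_BASES, BRITANNICA_LETTERS)
    print(f"EMBL ~ {ratio:.0f} Britannicas")
    for months in (14, 18):
        print(f"doubling every {months} months -> x{annual_growth(months):.2f} per year")
```

With these inputs the ratio is about 1,111, and a 14 to 18 month doubling time corresponds to growth of roughly 1.6 to 1.8 times per year.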

Page 12: Bioinformatics

Intrinsic vs Context

Internal

• DNA, protein sequence
  – DNA: purine/pyrimidine
  – AAs: small, hydrophobic, aromatic, polar
  – Variants: SNPs, indels, alternative splicing

• Secondary structure
  – DNA: stem/loops
  – Protein: helix, sheet, turn, loop
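The purine/pyrimidine split is easy to encode, and it is what distinguishes transition substitutions (purine to purine, or pyrimidine to pyrimidine) from transversions. A minimal Python sketch; the helper names are hypothetical, and R/Y follow the IUPAC codes for purine and pyrimidine:

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CTU")  # U replaces T in RNA


def base_class(base: str) -> str:
    """Classify a single nucleotide as 'purine' or 'pyrimidine'."""
    b = base.upper()
    if b in PURINES:
        return "purine"
    if b in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"not a nucleotide base: {base!r}")


def substitution_type(ref: str, alt: str) -> str:
    """Transition if both bases share a class, otherwise transversion."""
    if ref.upper() == alt.upper():
        raise ValueError("ref and alt are identical; not a substitution")
    return "transition" if base_class(ref) == base_class(alt) else "transversion"


def ry_string(seq: str) -> str:
    """Reduce a sequence to the two-letter purine (R) / pyrimidine (Y) alphabet."""
    return "".join("R" if base_class(b) == "purine" else "Y" for b in seq)


if __name__ == "__main__":
    print(ry_string("ATCCGTCA"))
    print(substitution_type("A", "G"), substitution_type("C", "A"))
```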

Page 13: Bioinformatics

Intrinsic vs Context

External, context for your molecule

• In other species (homologs, phylogenetic trees)

• In which cell

• In which cellular location (GO)

• Molecular complex (dimers)

• Which pathway (KEGG)

• Where in genome (neighbors, synteny)

Page 14: Bioinformatics

New Unknown Gene

• Blast homology searching

• Genomic location/neighboring genes

• Where is it expressed?

• How regulated (control sequences)

• Intron/exon structure

• Domain structure

• Restriction sites etc.

• Primer design
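Two items on this list reduce to simple string operations: scanning for a restriction site (EcoRI's GAATTC is used here purely as an example) and estimating a primer's melting temperature. The Tm below uses the Wallace rule of thumb, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which is only a rough guide for short oligonucleotides; dedicated primer-design tools use nearest-neighbour thermodynamics instead. A Python sketch with hypothetical helper names:

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start of every occurrence of `site` in `seq` (one strand, overlaps allowed)."""
    seq, site = seq.upper(), site.upper()
    hits = []
    pos = seq.find(site)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(site, pos + 1)  # +1 so overlapping matches are found
    return hits


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) of a short primer: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    print(find_sites("ttGAATTCaaGAATTC", "GAATTC"))  # EcoRI sites
    print(wallace_tm("ATGCCCAGGAGATTTGGA"))
```

Only one strand is scanned; that is enough for GAATTC because the site is palindromic (its reverse complement is itself), but a non-palindromic site would also need a search of the reverse complement.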

Page 15: Bioinformatics

DNA/gene structure

• Four bases: A, T, C, G (U replaces T in RNA)
  – 2 purines (A, G), 2 pyrimidines (C, T)
  – LOTS of them: how many?

• Open reading frame

• 5’ signals, 3’ signals

• Introns/exons

• Neighbours (operons)

Page 16: Bioinformatics

Two sequences

• Alignment
  – Local
  – Global

• Dotplot

• Threading
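Global and local pairwise alignment are both dynamic-programming methods. Needleman-Wunsch (global) fills a score matrix over the full length of both sequences; Smith-Waterman (local) clamps every cell at zero and keeps the best cell anywhere, so it reports the best-matching subsequences. A score-only Python sketch with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -2) are assumptions, not values from the course:

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Needleman-Wunsch (global) or Smith-Waterman (local) score, linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: a prefix aligned against nothing costs one gap per residue.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                       score[i - 1][j] + gap,    # gap in b
                       score[i][j - 1] + gap)    # gap in a
            if local:
                cell = max(cell, 0)  # local: never carry a negative prefix
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(align_score("GATTACA", "GATTACA"))
    print(align_score("TTTACGTTT", "GGACGGG", local=True))
```

A full implementation would also trace back through the matrix to recover the aligned residues, and would usually use affine gap penalties.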

Page 17: Bioinformatics

One seq vs many

• Homology search vs database

• Special case of 2-seq alignment

• BLAST vs FASTA

• Limit by species/taxon

• Substitution matrices

• Low complexity masking
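BLAST gets its speed by first looking up short exact "words" shared by the query and database entries, and only then extending those seeds into alignments. The word-lookup step alone can be sketched in Python as below. This is a simplification: real BLAST also scores neighbourhood words with a substitution matrix, extends hits, masks low-complexity regions and reports E-values. All function names are hypothetical:

```python
from collections import Counter, defaultdict

Index = dict[str, list[tuple[str, int]]]


def build_index(db: dict[str, str], k: int) -> Index:
    """Map each k-letter word to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in db.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((seq_id, i))
    return dict(index)


def seed_hits(query: str, index: Index, k: int) -> dict[str, list[tuple[int, int]]]:
    """Exact word matches grouped by database entry, as (query offset, subject offset)."""
    hits = defaultdict(list)
    for q in range(len(query) - k + 1):
        for seq_id, s in index.get(query[q:q + k], []):
            hits[seq_id].append((q, s))
    return dict(hits)


def rank_subjects(hits: dict[str, list[tuple[int, int]]]) -> list[str]:
    """Database entries ordered by number of seed hits, most first."""
    return sorted(hits, key=lambda sid: len(hits[sid]), reverse=True)


def best_diagonal(pairs: list[tuple[int, int]]) -> int:
    """Most common subject-minus-query offset among the seed pairs."""
    return Counter(s - q for q, s in pairs).most_common(1)[0][0]


if __name__ == "__main__":
    db = {"seq1": "ATGCCCAGGAGA", "seq2": "TTTTTTTTTTTT"}
    hits = seed_hits("CCCAGG", build_index(db, 4), 4)
    for sid in rank_subjects(hits):
        print(sid, hits[sid], "diagonal", best_diagonal(hits[sid]))
```

Several seeds on the same diagonal (constant subject offset minus query offset) suggest an ungapped match worth extending.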

Page 18: Bioinformatics

Multiple sequence alignment

• MSA

• Progressive alignment

• ClustalW or (better) T-Coffee

Page 19: Bioinformatics

Phylogenetic trees

• Computationally intensive

• Distance matrix methods
  – Neighbor-joining (NJ)
  – UPGMA

• Minimum evolution

• Maximum parsimony

• Maximum likelihood
  – Bayesian methods
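UPGMA is the simplest of the distance-matrix methods listed. It repeatedly merges the closest pair of clusters, places the new node at half their distance, and replaces the pair by its size-weighted average distance to every other cluster. Because it assumes a constant molecular clock, neighbor-joining is usually preferred for real data. A Python sketch that returns a Newick string; the input format (a list of names plus a symmetric distance matrix) is an assumption:

```python
def _key(a: int, b: int) -> tuple[int, int]:
    """Canonical (smaller, larger) key for a pair of cluster ids."""
    return (a, b) if a < b else (b, a)


def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Build a rooted UPGMA tree; returns Newick text with branch lengths."""
    n = len(names)
    # cluster id -> (Newick text, number of leaves, node height)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair
        (ti, ni, hi), (tj, nj, hj) = clusters.pop(i), clusters.pop(j)
        h = dij / 2  # ultrametric: the new node sits halfway between the pair
        for k in clusters:
            # average linkage: size-weighted mean of the two old distances
            d[(k, next_id)] = (ni * d.pop(_key(i, k)) + nj * d.pop(_key(j, k))) / (ni + nj)
        del d[(i, j)]
        clusters[next_id] = (f"({ti}:{h - hi:g},{tj}:{h - hj:g})", ni + nj, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    dist = [[0, 2, 6, 10],
            [2, 0, 6, 10],
            [6, 6, 0, 10],
            [10, 10, 10, 0]]
    print(upgma(names, dist))
```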

Page 20: Bioinformatics

Genefinding

• Special case of DNA analysis

• How to annotate a genome

• Bacterial
  – Find open reading frames (ORFs)
  – With start/stop codons
  – With promoter, RBS, CAAT, TATA

• Eukaryotic
  – As above PLUS
  – Introns/exons
  – Alternative splicing
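The first bacterial step above (find open reading frames running from a start codon to a stop codon) can be sketched as a scan of all six reading frames. This Python version ignores promoters, RBS and alternative start codons; min_codons is an assumed length cutoff, and minus-strand coordinates are reported on the reverse-complemented sequence:

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int]]:
    """(strand, start, end) of ATG...stop ORFs; 0-based, end exclusive, stop included."""
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the frame
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))
```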

Page 21: Bioinformatics

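Typical mammalian gene structure

[Figure: a gene in DNA with a control region upstream of Exons 1 to 4, which are separated by introns (each intron begins with gt and ends with ag); coding runs from the start codon (ATG) to a stop codon. In the RNA, introns are “spliced out” and discarded. Also marked: miRNAs?]

• ATGCCCAGGAGATTTGGA . . . → PROTEIN MetProArgArgPheGly . . .

• Stop codons: TAG, TGA, TAA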
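The path in the gene-structure figure (splice out introns, then read codons from ATG to a stop) can be sketched in Python with the standard genetic code. splice and translate are hypothetical helper names, exon coordinates are assumed to be known already, and one-letter amino acid codes are used, so MetProArgArgPheGly appears as MPRRFG:

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end exclusive), discarding the introns between them."""
    return "".join(pre_mrna[s:e] for s, e in exons)


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate codon by codon from the first base; RNA (U) input is accepted."""
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X = codon with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    pre = "ATGCCC" + "GTAAGTCCCAG" + "AGGAGATTTGGATAA"  # exon, gt...ag intron, exon
    mrna = splice(pre, [(0, 6), (17, 32)])
    print(mrna, translate(mrna))
```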

Page 22: Bioinformatics

Protein substructure

• DNA makes protein, and proteins (enzymes) make everything else.

• 20 Amino acids

• Amino acid properties

• Motifs

• Domains

• Biological units

Page 23: Bioinformatics

Amino acid properties

again … and again and again
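Hydrophobicity is one amino acid property that comes up again and again. A sliding-window average over the Kyte-Doolittle hydropathy scale gives a hydropathy profile, in which long hydrophobic stretches can hint at membrane-spanning segments. A Python sketch; the window length of 9 is an assumed default:

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy(protein: str, window: int = 9) -> list[float]:
    """Mean Kyte-Doolittle score of each complete window (one value per window start)."""
    scores = [KD[aa] for aa in protein.upper()]  # KeyError on non-standard letters
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


if __name__ == "__main__":
    for pos, value in enumerate(hydropathy("MPRRFGAVLIKLLVAG"), start=1):
        print(pos, round(value, 2))
```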

Page 24: Bioinformatics

Protein 3-D structure

• Relationship between sequence & structure

• Secondary structure
  – Alpha helix
  – Beta sheet
  – Coil
  – Turn

• Threading sequence to homologous structure

Page 25: Bioinformatics

Gene Expression

• EST

• SAGE

• MicroArray

• Clustering of co-expressed genes
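Clustering expression data starts from a similarity measure between gene profiles; Pearson correlation across samples is a common choice. The Python sketch below groups genes by single linkage at an assumed correlation threshold of 0.9. The helper names are hypothetical, and real analyses typically apply hierarchical clustering or k-means to normalised data:

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_groups(profiles: dict[str, list[float]],
                       threshold: float = 0.9) -> list[set[str]]:
    """Single linkage: a gene joins a group if r >= threshold with any member."""
    parent = {g: g for g in profiles}  # union-find forest

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups
    groups: dict[str, set[str]] = {}
    for g in profiles:
        groups.setdefault(find(g), set()).add(g)
    return list(groups.values())


if __name__ == "__main__":
    profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8],
                "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2]}
    print(coexpressed_groups(profiles))
```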

Page 26: Bioinformatics

Genomics

• Complete DNA seq for a species

• Gene order

• Gene clusters/operons
  – Missing operons

• Gene duplication

• Whole genome duplication (WGD)

Page 27: Bioinformatics

SNPs

• A key issue in genetics is that two organisms are both the same and different:
  – Humans vs chimps vs mouse
  – Parent vs offspring vs co-national vs human

• Single nucleotide polymorphisms

• Variation between individuals

• Pharmacogenetics
  – Personal tailored medicine
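Given two sequences that have already been aligned to the same length, listing single-nucleotide differences is a position-by-position comparison. A minimal Python sketch; find_snps is a hypothetical helper, and gap characters ("-") are assumed to mark indels, which are skipped rather than reported as SNPs:

```python
def find_snps(ref: str, sample: str) -> list[tuple[int, str, str]]:
    """(1-based position, ref base, sample base) for each mismatch between aligned sequences."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(ref.upper(), sample.upper()))
            if r != s and "-" not in (r, s)]  # gaps are indels, not SNPs


if __name__ == "__main__":
    print(find_snps("ATGCCCAGG", "ATGCTCAGA"))
```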

Page 28: Bioinformatics

Summary/take home

• Course designed to give you access to databases, software tools

• …and ways of thinking about data