Basics of Bioinformatics

BASICS OF BIOINFORMATICS Biotechnology Division North-East Institute of Science & Technology (Council of Scientific & Industrial Research) Jorhat 785 006, Assam Salam Pradeep Email: [email protected]

description

This Basics of Bioinformatics slide deck is intended for students from non-bioinformatics and non-biotechnology backgrounds, such as MS students in Botany, Zoology, Agriculture, Veterinary Science, Fishery, etc.

Transcript of Basics of Bioinformatics

Page 1: Basics of Bioinformatics

BASICS OF BIOINFORMATICS

Biotechnology Division

North-East Institute of Science & Technology (Council of Scientific & Industrial Research)

Jorhat 785 006, Assam

Salam Pradeep

Email: [email protected]

Page 2: Basics of Bioinformatics

Bioinformatics

• Use of techniques including

  • Applied mathematics

  • Informatics

  • Statistics

  • Computer science

  • Artificial intelligence

  • Chemistry & Biochemistry

• To solve biological problems on the molecular level

Page 3: Basics of Bioinformatics

Major Research Efforts & Applications

Page 4: Basics of Bioinformatics

Sequence analysis & alignment

• Comparison of sequences in order to find similar sequences.

• A way of arranging DNA, RNA or amino acid sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships.

• Identification of gene structures, reading frames, distributions of introns & exons, and regulatory elements.

Page 5: Basics of Bioinformatics

Genome annotation

• The process of marking the genes and other biological features in a DNA sequence.

• The first genome annotation software system was designed in 1995 by Dr. Owen White.

• It was applied to the first genome of a free-living organism to be decoded, that of the bacterium Haemophilus influenzae.

• White’s software system finds the genes (places in the DNA sequence that encode a protein), the transfer RNA, and other features.
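The idea of finding "places in the DNA sequence that encode a protein" can be illustrated with a very small open reading frame (ORF) scan. This is only a sketch under simplifying assumptions, not White's system: it looks at the forward strand only, treats ATG as the only start codon, and the function name `find_orfs` and the example sequence are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start/end are 0-based, end-exclusive positions; the stop codon is
    included in the span. min_codons counts the codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk through the sequence codon by codon in this reading frame
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame + 1))
                start = None  # look for the next ORF in this frame
    return orfs


example_dna = "CCATGGCCTGATT"  # made-up sequence
orfs = find_orfs(example_dna)
for start, end, frame in orfs:
    print(frame, start, end, example_dna[start:end])
```

Real gene finders also scan the reverse strand and use statistical models of coding sequence; this scan only shows the reading-frame bookkeeping.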

Page 6: Basics of Bioinformatics

Computational evolutionary biology

• Traces the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone.

• Comparing entire genomes permits the study of more complex evolutionary events, such as gene duplication, horizontal gene transfer and speciation.

• Tracks and shares information on an increasingly large number of species and organisms.

Page 7: Basics of Bioinformatics

Measuring biodiversity

• Biodiversity databases are used to collect species names, descriptions, distributions, genetic information, status & size of populations, habitat needs, and how each organism interacts with other species.

• Computer simulations model such things as population dynamics, or calculate the cumulative genetic health of a breeding pool (in agriculture) or endangered population (in conservation).

• Entire DNA sequences, or genomes, of endangered species can be preserved, allowing the results of Nature's genetic experiment to be remembered in silico, and possibly reused in the future, even if that species is eventually lost.

Page 8: Basics of Bioinformatics

Prediction of protein structure

• Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry.

• Its aim is the prediction of the three-dimensional structure of proteins from their amino acid sequences.

• In other words, it deals with the prediction of a protein's tertiary structure from its primary structure.

• Protein structure prediction is of high importance in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).

Page 9: Basics of Bioinformatics

Comparative genomics

• Comparative genomics is the study of the relationship of genome structure and function across different biological species or strains.

• Gene finding is an important application of comparative genomics, as is the discovery of new, non-coding functional elements of the genome.

• Computational approaches to genome comparison have recently become a common research topic in computer science.

Page 10: Basics of Bioinformatics

Modeling biological systems

• Systems biology involves the use of computer simulations of cellular subsystems (such as the networks of metabolites and enzymes which comprise metabolism, signal transduction pathways and gene regulatory networks) to both analyze and visualize the complex connections of these cellular processes.

• Artificial life or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.

Page 11: Basics of Bioinformatics

Protein-protein interaction & docking

• Protein-protein interactions involve the association of protein molecules.

• These associations are studied from the perspective of biochemistry, signal transduction and networks.

• Wet Lab Techniques: Co-immunoprecipitation, FRET, Bimolecular Fluorescence Complementation

• Protein-protein docking: as of 2006, the prediction of protein-protein interactions based on three-dimensional protein structures alone was not yet satisfactory.

Page 12: Basics of Bioinformatics

Biological Sequence Database

Page 13: Basics of Bioinformatics

Primary Sequence Databases

• The International Nucleotide Sequence Database (INSD) consists of the following databases:

  • DDBJ (DNA Data Bank of Japan)

  • EMBL Nucleotide DB (European Molecular Biology Laboratory)

  • GenBank (National Center for Biotechnology Information)

• They interchange the stored information and are the source for many other databases.

Page 14: Basics of Bioinformatics

NCBI

• The National Center for Biotechnology Information is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health.

• Founded in 1988 through legislation sponsored by Senator Claude Pepper.

• NCBI has been responsible for making the GenBank DNA sequence database available since 1992.

• In addition to GenBank, NCBI provides OMIM, MMDB (3D protein structures), dbSNP, the Unique Human Gene Sequence Collection, a Gene Map of the human genome, a Taxonomy Browser, etc.

Page 15: Basics of Bioinformatics
Page 16: Basics of Bioinformatics

DDBJ

Page 17: Basics of Bioinformatics

EMBL

Page 18: Basics of Bioinformatics

Protein Sequence Database

Page 19: Basics of Bioinformatics

UniProt - Universal Protein Resource

Page 20: Basics of Bioinformatics

Swiss-Prot - Protein Knowledgebase

Page 21: Basics of Bioinformatics

Protein Information Resource

Page 22: Basics of Bioinformatics

Pfam

Page 23: Basics of Bioinformatics

Protein Structure Databases

Page 24: Basics of Bioinformatics

Protein Data Bank (PDB)

Page 25: Basics of Bioinformatics
Page 26: Basics of Bioinformatics

PDB Statistics

Page 27: Basics of Bioinformatics

NCBI Molecular Modeling Database

Page 28: Basics of Bioinformatics
Page 29: Basics of Bioinformatics

Genome Databases

Page 30: Basics of Bioinformatics

Corn

Page 31: Basics of Bioinformatics
Page 32: Basics of Bioinformatics

ERIC (Enteropathogen Resource Integration Center)

Page 33: Basics of Bioinformatics
Page 34: Basics of Bioinformatics

Flybase

Page 35: Basics of Bioinformatics
Page 36: Basics of Bioinformatics

MGI Mouse Genome

Page 37: Basics of Bioinformatics
Page 38: Basics of Bioinformatics

Viral Bioinformatics Resource Center

Page 39: Basics of Bioinformatics
Page 40: Basics of Bioinformatics

Saccharomyces Genome Database

Page 41: Basics of Bioinformatics
Page 42: Basics of Bioinformatics

National Microbial Pathogen Data Resource

Page 43: Basics of Bioinformatics

Other Databases

• Protein-protein interactions - BioGrid, STRING, DIP etc.

• Metabolic pathway databases - KEGG, BioCyc, MANET etc.

• Microarray databases - ArrayExpress, Stanford Microarray Database, GEO

Page 44: Basics of Bioinformatics

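Sequence File Formats

• FASTA – each record always starts with a header line beginning with > (greater-than symbol), followed by the sequence lines

• GenBank – a series of labeled header lines (LOCUS, DEFINITION, …), with ORIGIN marking the start of the sequence

• EMBL – each sequence entry begins with an ID line; every line of an entry starts with a two-letter code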
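Because a FASTA record is just a ">" header line followed by sequence lines, it can be read with a few lines of code. The following is a minimal sketch, assuming well-formed input; the function name `parse_fasta` and the two example records are invented for illustration.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record identifier: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: take the first word after '>' as the identifier
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the record opened by the last header
            records[current_id] += line.upper()
    return records


# Made-up example records, not taken from any database
example = """>seq1 hypothetical example record
ATGGCC
TTAGGA
>seq2 another made-up record
ggctaa
"""

fasta_records = parse_fasta(example.splitlines())
for name, seq in fasta_records.items():
    print(name, len(seq), seq)
```

In practice an established library parser is usually preferable, since real files contain edge cases (duplicate identifiers, comments, very long records) that this sketch ignores.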

Page 45: Basics of Bioinformatics

FASTA Format

Page 46: Basics of Bioinformatics

GenBank Format

Page 47: Basics of Bioinformatics
Page 48: Basics of Bioinformatics

EMBL Format

Page 49: Basics of Bioinformatics
Page 50: Basics of Bioinformatics

Inside NCBI

Page 51: Basics of Bioinformatics

Sitemap

Page 52: Basics of Bioinformatics
Page 53: Basics of Bioinformatics

Taxonomy Browser

Page 54: Basics of Bioinformatics

NCBI Taxonomy Browser Statistics

Page 55: Basics of Bioinformatics

Genome Projects

Page 56: Basics of Bioinformatics

Genome Projects Statistics

Page 57: Basics of Bioinformatics

Map Viewer

Page 58: Basics of Bioinformatics
Page 59: Basics of Bioinformatics

Sequence analysis &

Sequence alignment

Page 60: Basics of Bioinformatics

Sequence analysis & alignment

• Comparison of sequences in order to find similar sequences.

• A way of arranging DNA, RNA or protein sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships.

• Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix.

Page 61: Basics of Bioinformatics

Representations in Sequence alignment

Semi Conservative Substitution

Conservative Substitution

Page 62: Basics of Bioinformatics

Global and Local alignments

• Global alignments attempt to align every residue in every sequence.

• Most useful when the sequences in the query set are similar and of roughly equal size.

• Local alignments are useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context.

• With sufficiently similar sequences, there is no difference between local and global alignments.

Page 63: Basics of Bioinformatics

• Needleman-Wunsch algorithm - A general global alignment technique based on dynamic programming.

• Smith-Waterman algorithm - A general local alignment method, also based on dynamic programming.
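The two algorithms fill the same kind of dynamic programming table and differ mainly in initialization and in whether scores may drop below zero. The sketch below computes only the optimal score (no traceback), assuming a simple linear gap penalty; the scoring values (match +1, mismatch -1, gap -2) and the function name `alignment_score` are illustrative choices, not taken from the slides.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal alignment score of a and b.

    local=False: Needleman-Wunsch style global score.
    local=True:  Smith-Waterman style local score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)  # local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    # Global score is the bottom-right cell; local score is the table maximum
    return best if local else score[-1][-1]


global_score = alignment_score("GATTACA", "GATACA")
local_score = alignment_score("TTTGATTACATTT", "CCCGATTACACCC", local=True)
print(global_score, local_score)
```

The local example shows why local alignment suits dissimilar sequences: the shared GATTACA motif is scored on its own, while the mismatching flanks are ignored.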

Page 64: Basics of Bioinformatics

Pairwise alignment

• Used to find the best-matching piecewise local or global alignments of two query sequences.

• It can only be used between 2 sequences at a time.

• Pairwise alignments are efficient to calculate and are often used for methods such as searching a database for sequences with high homology to a query.

• The primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming, and word methods.
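Of the three methods, the dot-matrix method is the simplest to show: mark every cell where a residue of one sequence equals a residue of the other, and look for diagonal runs. This is a bare sketch without the window or threshold filtering that practical dot-plot tools apply; the function names and sequences are illustrative.

```python
def dot_matrix(seq_a, seq_b):
    """Return a grid with True wherever seq_a[i] equals seq_b[j]."""
    return [[x == y for y in seq_b] for x in seq_a]


def render(seq_a, seq_b, grid):
    """Format the grid as text: '*' for a match, '.' otherwise."""
    lines = ["  " + " ".join(seq_b)]
    for residue, row in zip(seq_a, grid):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


grid = dot_matrix("GATTACA", "GATACA")
print(render("GATTACA", "GATACA", grid))
```

Diagonal runs of `*` correspond to matching segments; a shift from one diagonal to a neighboring one suggests an insertion or deletion.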

Page 65: Basics of Bioinformatics
Page 66: Basics of Bioinformatics

Multiple sequence alignment

• MSA incorporates more than two sequences at a time.

• Multiple alignment methods align all of the sequences in a given query set.

• Often used in identifying conserved sequence regions across a group of sequences.

• Aids in establishing evolutionary relationships by constructing phylogenetic trees.
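Once sequences are aligned as rows of a matrix, conserved regions can be read off column by column. The sketch below assumes the alignment has already been computed (for example by a tool such as ClustalW) and that gaps are written as "-"; the function name and the three aligned sequences are made up.

```python
def conserved_columns(aligned):
    """Return 0-based indices of columns where every sequence has the same residue."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        index
        for index, column in enumerate(zip(*aligned))
        # A column is conserved if it holds one distinct symbol that is not a gap
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Hypothetical alignment of three short sequences
alignment = ["ATG-CA", "ATGACA", "ATGTCA"]
conserved = conserved_columns(alignment)
print(conserved)
```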

Page 67: Basics of Bioinformatics
Page 68: Basics of Bioinformatics

Sequence Similarity Search

Page 69: Basics of Bioinformatics

NCBI BLAST

• An algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.

• A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.

• The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and Webb Miller at the NIH and was published in J. Mol. Biol. in 1990.

Page 70: Basics of Bioinformatics

BLAST Types

• blastn - Nucleotide-nucleotide BLAST

• blastp - Protein-protein BLAST

• blastx - Nucleotide 6-frame translation-protein

• tblastx - Nucleotide 6-frame translation-nucleotide 6-frame translation

• tblastn - Protein-nucleotide 6-frame translation

• megablast - Large numbers of query sequences
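The "6-frame translation" used by blastx, tblastx and tblastn means translating a nucleotide sequence in three reading frames on each strand. The following sketch shows that step only, assuming the standard genetic code and an unambiguous DNA alphabet; it is not BLAST's implementation, and the function names and frame labels (+1 to -3) are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translation(dna):
    """Return translations for frames +1..+3 (forward) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")  # made-up sequence
for name, protein in frames.items():
    print(name, protein)
```

Each of the six translated strings can then be compared against protein sequences, which is why a nucleotide query can be searched against a protein database.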

Page 71: Basics of Bioinformatics
Page 72: Basics of Bioinformatics

BLASTn

Page 73: Basics of Bioinformatics

BLASTp

Page 74: Basics of Bioinformatics

BLASTn: Search Set

Page 75: Basics of Bioinformatics

BLASTp: Search Set

Page 76: Basics of Bioinformatics

BLASTn: Program Selection

Page 77: Basics of Bioinformatics

BLASTp: Program Selection

Page 78: Basics of Bioinformatics

BLASTn Result

Page 79: Basics of Bioinformatics

BLASTn: Graphic Summary

Page 80: Basics of Bioinformatics

BLASTn Description

Page 81: Basics of Bioinformatics

BLASTn Alignment

Page 82: Basics of Bioinformatics

BLASTn Tree View

Page 83: Basics of Bioinformatics

PDB BLASTp

Page 84: Basics of Bioinformatics

BLASTp: Graphic Summary

Page 85: Basics of Bioinformatics

PDB BLASTp Description

Page 86: Basics of Bioinformatics

PDB BLASTp Alignment

Page 87: Basics of Bioinformatics

BLASTp Tree View

Page 88: Basics of Bioinformatics

Multiple Sequence Alignment

Page 89: Basics of Bioinformatics

EBI ClustalW Server

Page 90: Basics of Bioinformatics

Preparing Multiple Sequence

Page 91: Basics of Bioinformatics
Page 92: Basics of Bioinformatics
Page 93: Basics of Bioinformatics
Page 94: Basics of Bioinformatics
Page 95: Basics of Bioinformatics

Phylogenetic Analysis

Page 96: Basics of Bioinformatics

Cladogram

• A cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, in which the branches are of equal length. Cladograms therefore show common ancestry, but do not indicate the amount of evolutionary "time" separating taxa.

Page 97: Basics of Bioinformatics

Phylogram

• A phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, in which branch lengths are proportional to the amount of inferred evolutionary change.

Page 98: Basics of Bioinformatics

JalView – Java Applet

Page 99: Basics of Bioinformatics

Thank You