Application of Bioinformatics in Genetic Research

Instructors:

Dr. Henry Baker

Dr. Luciano Brocchieri

Dr. Michele Tennant

Dr. Lei Zhou

http://159.178.28.30/GMS6014/home.htm

Application of Bioinformatics in Genetic Research

Time and location:

Monday: 12:00-12:50 in CGRC-291.

Wednesday: 12:00-12:50 or 11:40-12:30 in CGRC-291.

Fridays (11/18, 12/2): 12:00-12:50 in CGRC-391 or 11:40-12:30 in CGRC-291.

Evaluation

• 50% classroom participation

• 50% homework

History of bioinformatics – sequence analysis

• Sequence comparison

• Similarity search

• Phylogenetic analysis

• Structure prediction

• Gene prediction

Bioinformatics in the post-genome era

• Information Representation: many new types of data (e.g., function, location, interaction, regulatory pathway, expression profile) need to be recorded.

• Data Management

- Infrastructure for inputting, managing, accessing, and retrieving relevant information in a “sea of databases”; cloud computing.

• Systematics

The opportunity provided by genome sequences and genomic / proteomic technologies is matched by the challenge they pose to bioinformatics / computational biology.

Bioinformatics in the post-genome era

• SNPs and genome-wide association studies (GWAS).

• Genomic expression profiling (RNA and protein levels).

• Comparative genomics, epigenomics, …

• Individual genomes, epigenomes, transcriptomes.

• Regulatory pathway simulation – systems biology.

$1,000 genome and … $500,000 analysis?

Objectives of GMS6014

• Basic skills for retrieving and storing data, using web-based applications.

• Ability to install and run standalone local applications.

• Understanding the basis of bioinformatics applications using sequence similarity search as the example.

• A brief survey of available bioinformatics tools and introduction to functional genomics and systems biology.

Sequence Representation - nucleotide

N G R C W T G Y C Y

A G A C A T G C C CC G T T TGT

For the complete list, see Table 2.1 in Mount, 2nd ed.

Or http://www.ncbi.nlm.nih.gov/blast/fasta.shtml
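Each ambiguity code in the nucleotide example (N, R, W, Y, …) stands for a set of bases. The following is a minimal Python sketch assuming the standard IUPAC code table (check it against Table 2.1 in Mount or the NCBI page above). `IUPAC` and `matches` are hypothetical names used only for illustration, and the example sequences are made up. The sketch tests whether a concrete DNA sequence fits a degenerate pattern.

```python
# Sketch: IUPAC nucleotide ambiguity codes and degenerate-pattern matching.
# IUPAC and matches() are illustrative names, not a library API.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "S": "CG",    # strong
    "W": "AT",    # weak
    "K": "GT",    # keto
    "M": "AC",    # amino
    "B": "CGT",   # not A
    "D": "AGT",   # not C
    "H": "ACT",   # not G
    "V": "ACG",   # not T
    "N": "ACGT",  # any base
}


def matches(pattern: str, seq: str) -> bool:
    """True if each base of seq is allowed by the code at the same position."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC[code]
               for code, base in zip(pattern.upper(), seq.upper()))


if __name__ == "__main__":
    print(matches("NGRCWTGYCY", "AGACATGCCC"))  # True
    print(matches("NGRCWTGYCY", "AGCCATGCCC"))  # False: R allows only A or G
```

A lookup table keeps the code set easy to extend. An unknown code raises `KeyError`, which is arguably better than silently accepting it.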

Sequence Representation - amino acids

Q: What property do the amino acids in each group share?

1. D, E

2. I, L, V, M, F

3. A, S, P

Sequence Representation - amino acids

Example:

Coloring based on amino acid properties.

W D L L A Q I L C Y A L R I Y

W R F L A T V V L E T L R Q Y

W K F L A I T M C K V L K Q F

R C L L C N K L Y Y L L R K V

L N R L L A E L Y E V L C H I

L R L L Q Q Q Q M V L Q R Q Y
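The transcript cannot show the colors. As a plain-text stand-in, each residue can be labeled with a property class. The following is a minimal Python sketch. The grouping used here (acidic, basic, hydrophobic, small, other polar) is one simplified scheme chosen for illustration; it is not the slide's coloring, and published coloring schemes group some residues differently. `CLASSES` and `property_string` are hypothetical names.

```python
# Sketch: label each residue with a coarse physicochemical class, as a
# text stand-in for property-based coloring. The grouping below is ONE
# simplified, assumed scheme; published coloring schemes differ.

CLASSES = {
    "a": "DE",       # acidic (negatively charged)
    "b": "KRH",      # basic (positively charged)
    "h": "ILVMFWC",  # hydrophobic
    "s": "AGPST",    # small
    "p": "NQY",      # other polar
}
# Invert to residue -> class letter (covers the 20 standard amino acids).
RESIDUE_CLASS = {aa: label for label, group in CLASSES.items() for aa in group}


def property_string(seq: str) -> str:
    """Return one class letter per residue ('?' for unknown symbols)."""
    return "".join(RESIDUE_CLASS.get(aa, "?") for aa in seq.upper())


if __name__ == "__main__":
    rows = [
        "WDLLAQILCYALRIY",
        "WRFLATVVLETLRQY",
        "WKFLAITMCKVLKQF",
        "RCLLCNKLYYLLRKV",
        "LNRLLAELYEVLCHI",
        "LRLLQQQQMVLQRQY",
    ]
    for row in rows:
        # Print the sequence and its class string side by side.
        print(row, property_string(row))
```

Printing the class strings under one another can make shared properties visible in columns even where the residues themselves differ.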

Representation of sequence – sequence file format

1.) FASTA – simple and clean

>gene_name (other info)

MASASASKJHKLJLKJLDSDFSF

SSDSASFSFD…
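The FASTA layout shown here, a `>` header line followed by sequence lines, is simple enough to read with a few lines of code. The following is a minimal Python sketch, assuming one or more records per file and sequence lines that may be wrapped. `read_fasta` is a hypothetical helper rather than a library function. The sketch does not validate the sequence characters.

```python
# Sketch of a minimal FASTA reader: a header line starting with '>'
# followed by one or more sequence lines. Not a full validator.

import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()  # drop '>' and surrounding spaces
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over many lines
        # lines before the first header are ignored in this sketch
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    text = ">gene_name other info\nMASASASK\nLDSDFSF\n>second\nACGT\n"
    for name, seq in read_fasta(io.StringIO(text)):
        print(name, seq)
```

To read a saved file instead of a string, pass an open file handle, for example `read_fasta(open("my_sequence.txt"))`. The file name in that example is hypothetical.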

Practice / DIY: retrieve a sequence in FASTA format and save the file on your local computer.
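A sequence can also be saved as a FASTA file directly from a script, which guarantees plain text with no formatting codes. The following is a minimal Python sketch. The 60-character line width is an assumed convention, not a requirement of the format, and both `write_fasta` and the file name `my_sequence.txt` are hypothetical.

```python
# Sketch: save a sequence as a plain-text FASTA file. The 60-residue line
# width is an assumed convention; FASTA readers accept other widths.

import os
import tempfile


def write_fasta(path: str, header: str, seq: str, width: int = 60) -> None:
    """Write one FASTA record as plain ASCII text (no formatting codes).

    Raises UnicodeEncodeError if header or sequence contain non-ASCII text.
    """
    with open(path, "w", encoding="ascii", newline="\n") as out:
        out.write(">" + header + "\n")
        # Slice the sequence into fixed-width lines.
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "my_sequence.txt")  # hypothetical name
    write_fasta(path, "gene_name other info", "ACGT" * 40)
    with open(path) as f:
        print(f.read())
```

Writing with an explicit ASCII encoding makes any stray non-sequence character fail loudly instead of ending up silently in the file.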

How to store sequence files

• Plain-text (.txt) format is clean and allows downstream sequence analysis.

• .doc or .rtf allows formatting during annotation; however, extra formatting information is inserted into the file, so these formats are NOT suitable for computational analysis.
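One way to catch inserted formatting information before an analysis is to check the sequence against the expected alphabet. The following is a minimal Python sketch. The two alphabets are assumptions: the IUPAC nucleotide codes, and the 20 standard amino acids plus `X` and `*`. `unexpected_chars` is a hypothetical helper.

```python
# Sketch: check that a sequence string contains only expected symbols.
# Formatting codes from word processors show up as unexpected characters.
# The allowed alphabets are assumptions (IUPAC nucleotide codes; the 20
# standard amino acids plus X for unknown and '*' for stop).

NUCLEOTIDE = set("ACGTUNRYSWKMBDHV")
PROTEIN = set("ACDEFGHIKLMNPQRSTVWYX*")


def unexpected_chars(seq: str, alphabet: set) -> set:
    """Return characters in seq that are not in the alphabet (whitespace ignored)."""
    return {ch for ch in seq.upper() if not ch.isspace() and ch not in alphabet}


if __name__ == "__main__":
    print(unexpected_chars("ACGT acgt\n", NUCLEOTIDE))
    print(sorted(unexpected_chars("AC{\\b GT}", NUCLEOTIDE)))  # RTF-like residue
```

An empty set means the text looks like a clean sequence. Braces and backslashes are typical leftovers from RTF markup.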

Practice – file types

• Open Windows Explorer (on your own computer), or Internet Explorer with “C:\” in the address bar.

• Change “Tools > Folder Options” so that file extensions (.xxx) are shown.

• Edit the downloaded sequence file in MS Word, highlight a section of the sequence with bold font or color, and save it as .doc.

• Open the .doc file in Notepad and observe the inserted characters.
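Notepad shows the inserted characters because a word-processor file is not plain text. Many such formats can be recognized from their first bytes. The following is a minimal Python sketch that checks a few well-known signatures: legacy .doc (OLE2 compound file), .docx (a ZIP container), and RTF. If none of these match, the sketch treats the file as text when the bytes decode as ASCII. This is a heuristic, not a complete file-type detector. `sniff` and `sniff_file` are hypothetical names.

```python
# Sketch: guess whether a file is plain text or a word-processor format by
# looking at its first bytes ("magic numbers"). Only a few common
# signatures are checked; this is a heuristic, not a full detector.

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .docx is a ZIP container
RTF_MAGIC = b"{\\rtf"                             # Rich Text Format


def sniff(data: bytes) -> str:
    """Classify the leading bytes of a file."""
    if data.startswith(OLE2_MAGIC):
        return "doc"
    if data.startswith(ZIP_MAGIC):
        return "docx/zip"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "binary/unknown"
    return "text"


def sniff_file(path: str) -> str:
    """Read only the first 4 KB of the file and classify it."""
    with open(path, "rb") as f:
        return sniff(f.read(4096))
```

Only the first few kilobytes are read, so this check is cheap even for large files. A "text" result does not guarantee that the content is a valid sequence.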

Practice – file types (Cont.)

• Load the .doc file into Webcutter using “Browse” and then “Upload sequence file”. Notice that the “sequence” in the sequence box consists of nonsense characters.

• Clear the input, use “Browse” to load the .txt file, and run an analysis.

Always keep your sequences in .txt files for downstream analysis.

Representation of sequence

The need to include annotations and functional information with each sequence.

• Structured data entry

• GenBank

• EMBL / SwissProt

Observe: the differences in data structure among SwissProt, NCBI Protein, and NCBI Gene.
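In a structured flat-file entry such as a GenBank record, each field sits under a keyword. The sequence itself follows the `ORIGIN` line and ends at `//`, with position numbers and spaces mixed in. The following is a minimal Python sketch that extracts only that sequence block. It assumes the usual GenBank layout. The record used in the usage example is a made-up fragment, not a real entry. `origin_sequence` is a hypothetical helper; in practice, a full parser such as Biopython's GenBank reader is the safer choice.

```python
# Sketch: pull the sequence out of the ORIGIN section of a GenBank-style
# flat-file record. Real records carry many more fields (LOCUS,
# DEFINITION, FEATURES, ...) that this sketch ignores.

def origin_sequence(record: str) -> str:
    """Return the sequence between 'ORIGIN' and the '//' terminator."""
    seq = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of record
        elif in_origin:
            # Lines look like '        1 acgtacgtac gt': keep letters only,
            # dropping the position numbers and the spaces between blocks.
            seq.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(seq).upper()


if __name__ == "__main__":
    example = (
        "LOCUS       EXAMPLE1    24 bp    DNA\n"  # made-up fragment
        "ORIGIN\n"
        "        1 atgaaacgca ttagcaccac catt\n"
        "//\n"
    )
    print(origin_sequence(example))
```

Everything above `ORIGIN` (annotations, features) is what distinguishes a structured entry from a bare FASTA sequence, and this sketch deliberately skips all of it.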

Representation of sequence

The need to represent associated info with sequence

• Structured data entry

• Specialized databases

- 3-D structure

- Mutation / diseases

- Protein family / protein domain

- Interaction

- Pathway

- …

Representation of sequence

The need to represent associated info with sequence

• Structured data entry

• Specialized databases

• Complex / customized data structure

- Object-oriented data representation (Mount, pp. 44-45)
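In an object-oriented representation, the sequence and its associated information (features such as domains or mutations) are carried together in one object. The following is a minimal Python sketch using dataclasses. The class and field names are hypothetical. This is not the model described in Mount or the schema of any database. Coordinates are assumed to be 1-based and inclusive.

```python
# Sketch of an object-oriented sequence record: the sequence travels
# together with its annotations. Class and field names are hypothetical,
# not the model from Mount or any database schema.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    kind: str    # e.g. "domain", "mutation", "binding site"
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive
    note: str = ""


@dataclass
class SequenceRecord:
    accession: str
    description: str
    sequence: str
    features: List[Feature] = field(default_factory=list)

    def feature_sequence(self, feat: Feature) -> str:
        """Return the subsequence covered by a feature."""
        return self.sequence[feat.start - 1 : feat.end]

    def features_of_kind(self, kind: str) -> List[Feature]:
        """Return all features of a given kind."""
        return [f for f in self.features if f.kind == kind]


if __name__ == "__main__":
    rec = SequenceRecord("EXAMPLE1", "hypothetical protein", "MASKLDSDF")
    rec.features.append(Feature("domain", 2, 4, "made-up domain"))
    print(rec.feature_sequence(rec.features[0]))
```

Because features live inside the record, adding a new kind of information (for example, an interaction partner) needs a new field or feature kind rather than a new file format.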

XML – Extensible Markup Language

Used to define highly structured data for sharing and exchange.

Observe:

1.) The differences between the XML format and the GenPept format.

2.) The differences among XML, TinySeqXML, and INSDXML.
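Because XML tags every field explicitly, a program can pick out fields by name instead of by column position, as a flat file like GenPept requires. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element names are modeled on NCBI's TinySeq style (`TSeq`, `TSeq_accver`, `TSeq_defline`, `TSeq_length`, `TSeq_sequence`), but the snippet is made up. Check the real DTD before relying on any field. `parse_tseqs` is a hypothetical helper.

```python
# Sketch: read a record in a TinySeq-like XML layout with the standard
# library. Element names follow NCBI's TinySeq style, but the snippet is
# made up; check the real DTD before relying on any field.

import xml.etree.ElementTree as ET

EXAMPLE = """<?xml version="1.0"?>
<TSeqSet>
  <TSeq>
    <TSeq_seqtype value="protein"/>
    <TSeq_accver>EXAMPLE1.1</TSeq_accver>
    <TSeq_defline>hypothetical protein</TSeq_defline>
    <TSeq_length>9</TSeq_length>
    <TSeq_sequence>MASKLDSDF</TSeq_sequence>
  </TSeq>
</TSeqSet>"""


def parse_tseqs(xml_text: str):
    """Return one dict per <TSeq> element, keyed by field name."""
    root = ET.fromstring(xml_text)
    records = []
    for tseq in root.iter("TSeq"):
        records.append({
            "accver": tseq.findtext("TSeq_accver"),
            "defline": tseq.findtext("TSeq_defline"),
            "length": int(tseq.findtext("TSeq_length", default="0")),
            "sequence": tseq.findtext("TSeq_sequence"),
        })
    return records


if __name__ == "__main__":
    for rec in parse_tseqs(EXAMPLE):
        print(rec["accver"], rec["length"], rec["sequence"])
```

Missing elements come back as `None` from `findtext`, so a record with fewer fields still parses. This tolerance is one reason XML suits data exchange.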

Bioinformatics / Computational biology

• Bioinformatics - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

• Computational Biology - The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

(NIH / BISTI, Working Definition of Bioinformatics and Computational Biology, July 17, 2000)

Genetic code

• Codon usage

• Special codes – mitochondrial genes
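Codon usage is a count of how often each codon occurs in an in-frame coding sequence. A special code changes how some codons are translated. The following is a minimal Python sketch. It uses the standard code (NCBI translation table 1) and the vertebrate mitochondrial code (table 2), which differs at four codons: TGA is Trp, ATA is Met, and AGA and AGG are stops. Other mitochondrial codes (yeast, invertebrate, and others) differ at other codons and are not covered here. The function names are hypothetical.

```python
# Sketch: codon usage counts and translation under the standard code versus
# the vertebrate mitochondrial code (NCBI translation table 2). Only the
# four table-2 differences below are applied.

from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code: same as standard except these codons.
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def codons(seq: str):
    """Split an in-frame coding sequence into codons (trailing bases dropped)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_usage(seq: str) -> Counter:
    """Count each codon in the reading frame starting at position 1."""
    return Counter(codons(seq))


def translate(seq: str, table: dict) -> str:
    """Translate codon by codon; 'X' marks codons with ambiguous bases such as N."""
    return "".join(table.get(c, "X") for c in codons(seq))


if __name__ == "__main__":
    orf = "ATGATATGAAGATAA"  # made-up example
    print(codon_usage(orf))
    print(translate(orf, STANDARD))
    print(translate(orf, VERT_MITO))
```

Representing a special code as a set of overrides on the standard table makes the differences explicit. It also keeps the two tables from drifting apart.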