On line (DNA and amino acid) Sequence Information Lecture 7.


Page 1: On line (DNA and amino acid) Sequence Information Lecture 7.

On line (DNA and amino acid) Sequence Information

Lecture 7

Page 2: On line (DNA and amino acid) Sequence Information Lecture 7.

Bioinformatics Databases
• The biological data generated by various labs is submitted and stored in specific databases.
• The data can be:
– Nucleotide: DNA and mRNA (cDNA)
– Protein sequences
• The main nucleotide sequence databases are:
– United States: GenBank (NCBI)
– Europe: Nucleotide Sequence Database (EMBL)
– Japan: DNA Data Bank of Japan (DDBJ)
• These databases also contain sequences related to:
– Expressed sequence tags (ESTs): short (about 800 bp) stretches of mRNA that can be used to see which genes are expressed…

Page 3: On line (DNA and amino acid) Sequence Information Lecture 7.

Protein Databases

• The main protein database is UniProt, which contains data from three related databases:
– SWISS-PROT (most up-to-date information)
– TrEMBL (translation of coding sequences)
– PIR (Protein Information Resource)
• Both the nucleotide and protein databases contain much more detail than just sequences. This additional data is referred to as gene annotation data.

Page 4: On line (DNA and amino acid) Sequence Information Lecture 7.


The Annotation of Genes
• Once the gene sequences have been determined, the data must be annotated. This basic annotated data includes (Klug 2010):
– Identification of regulatory regions
– Identification of coding sequences (CDS); the exons/introns (if the sequence is eukaryotic)…
– The amino acid sequence for the gene
– Other organisms in which the DNA/amino acid sequence is found
– Journals/references to where the data came from
– Links to other databases that contain information about the gene

Page 5: On line (DNA and amino acid) Sequence Information Lecture 7.

Bioinformatics Database Searches
• To facilitate finding annotated data about genes and proteins, a number of sites provide specific search engines:
– NCBI has Entrez
– EMBL has the EBI search page (previously the SRS engine)
– SIB has the ExPASy search engine (this is more focused on protein-related information)
• Consider the following query:
– What is the DNA and amino acid sequence for the following gene: Human BTEB
– Type the following into the search text box:
– Human[organism] AND BTEB[title]
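The same fielded query can also be built programmatically. The following is a minimal sketch, assuming the NCBI E-utilities ESearch endpoint and the "nucleotide" database name; the helper names `build_entrez_term` and `build_esearch_url` are illustrative and not part of the lecture. It only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not mentioned in the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_term(organism, title_word):
    """Combine two fielded terms with AND, as typed into the Entrez search box."""
    return f"{organism}[organism] AND {title_word}[title]"


def build_esearch_url(term, db="nucleotide", retmax=20):
    """Return an ESearch URL for the term; db and retmax are illustrative defaults."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


term = build_entrez_term("Human", "BTEB")
url = build_esearch_url(term)
print(term)  # Human[organism] AND BTEB[title]
print(url)
```

The square-bracket field tags restrict each word to one field of the record, which is why the query is narrower than typing "Human BTEB" as free text.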

Page 7: On line (DNA and amino acid) Sequence Information Lecture 7.

BTEB NCBI Nucleotide Record

Page 8: On line (DNA and amino acid) Sequence Information Lecture 7.

Coding section of gene

The exon/intron structure is also available in graphic form.

Page 9: On line (DNA and amino acid) Sequence Information Lecture 7.

Further information

• In the right-hand column you will find links to online analytical resources, e.g. BLAST (PSI-BLAST), a tool to search for similar sequences contained in the database.

• The record also gives the amino acid sequence obtained from the CDS of the gene. The text box also provides a link to information on the protein in the UniProt database.
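How an amino acid sequence follows from a CDS can be illustrated with a short translation routine. This is a sketch only, assuming the standard genetic code and a CDS that starts in frame; the function name `translate_cds` and the toy input are illustrative and are not taken from the BTEB record.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unrecognised codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCTAA"))  # MA
```

Database records normally supply this translation already, so a routine like this is mainly useful for checking that a downloaded CDS and its listed protein agree.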

Page 10: On line (DNA and amino acid) Sequence Information Lecture 7.

An EMBL nucleotide record
• Annotated data can also be found in the EMBL database:
• BTEB EMBL record: shows the main record.
• Clicking on the “text” link at the top right-hand corner will give the essential features of the gene: BTEB-EMBL-EBI_text_record.

• An ExPASy database search gives the following information for this gene: type BTEB, and then BTEB and Human.
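The "text" view of an EMBL record is a flat file in which each line begins with a two-letter code (for example ID, AC, DE, FT, SQ). The sketch below groups lines by that code. The sample entry is a hypothetical placeholder with a made-up accession, not the real BTEB record, and `group_by_line_code` is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical placeholder entry in EMBL flat-file layout (not a real record).
SAMPLE = """\
ID   X00000; SV 1; linear; mRNA; STD; HUM; 12 BP.
XX
AC   X00000;
XX
DE   Hypothetical example entry (placeholder, not a real record)
XX
FT   source          1..12
FT   CDS             1..12
XX
SQ   Sequence 12 BP;
     atggcctaat ga                                                            12
//
"""


def group_by_line_code(text):
    """Group flat-file lines by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        code = line[:2]
        # Skip spacer lines (XX), the terminator (//) and sequence data lines.
        if code in ("XX", "//") or not code.strip():
            continue
        fields[code].append(line[5:].strip())
    return dict(fields)


fields = group_by_line_code(SAMPLE)
for code, values in fields.items():
    print(code, values)
```

Grouping by line code makes it easier to line up the EMBL fields against the corresponding sections of an NCBI record when comparing the two.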

Page 11: On line (DNA and amino acid) Sequence Information Lecture 7.

The BTEB Protein record

A link to a graphic representation of the protein and the relevant annotated data can be found at: BTEB Human Protein

Page 12: On line (DNA and amino acid) Sequence Information Lecture 7.

Other databases

• The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the “raw data” and are referred to as “primary databases”.
– More specific databases derive data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS.
– There are databases which contain information about specific organisms, such as E. coli; these can be located using the Genomes OnLine Database (GOLD).

Page 13: On line (DNA and amino acid) Sequence Information Lecture 7.

Other databases

– Databases for specific types of sequences, such as those associated with promoters and other regulatory elements; dbEST; Homologous Structure Alignment Database.
– Structural databases such as the Protein Data Bank.
– Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders.
• The January edition of the journal Nucleic Acids Research provides an up-to-date analysis of current online bioinformatics databases: Nucleic Acids Research database edition.

Page 14: On line (DNA and amino acid) Sequence Information Lecture 7.

Other important information sources
• PubMed: literature research: journal articles/conference proceedings/books etc.
– Search under many fields: keyword, author…
– Returns: journal articles/abstracts
– Two types: general/review
– BTEB PubMed search found at:

• http://www.ncbi.nlm.nih.gov/pubmed?term=BTEB&cmd=DetailsSearch

• The user can register an NCBI account to manage their activity and store the findings of gene searches, PubMed searches… This information can be downloaded, emailed…

Page 15: On line (DNA and amino acid) Sequence Information Lecture 7.

BTEB pubmed search result

Page 16: On line (DNA and amino acid) Sequence Information Lecture 7.
Page 17: On line (DNA and amino acid) Sequence Information Lecture 7.

Exercise

• The EMBL-EBI record: BTEB_”text”_record.
• The NCBI: BTEB NCBI Nucleotide Record
• The DDBJ: BTEB flatfile Record

• Exercise: write a brief report comparing and contrasting the core elements of these records; refer to pages 8-16 in Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition. The book can be found in the library.

Page 18: On line (DNA and amino acid) Sequence Information Lecture 7.

Exercise
• Search for the following gene “DNA” sequence:
– Human Leukocyte Elastase gene, linear DNA [hint: should be 5292 bp long].
– Retrieve the record, then download and save the FASTA file.
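Once the FASTA file is saved, its length can be checked against the hint. The following is a minimal sketch of a FASTA reader, assuming a plain-text file with ">" header lines; the sample below is a hypothetical placeholder, not the elastase sequence, and `read_fasta` is an illustrative helper name.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical placeholder data; replace with the contents of the saved file.
SAMPLE_FASTA = """>example_record hypothetical placeholder sequence
ATGGCCTAATGA
ACGT
"""

records = read_fasta(SAMPLE_FASTA)
for header, seq in records.items():
    print(header, len(seq), "bp")
```

For the exercise, the text would come from the downloaded file (for example via `open(path).read()`, with `path` pointing to wherever the file was saved), and the reported length would be expected to match the 5292 bp hint.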