Finding prokaryotic genes and non intronic eukaryotic genes


Page 1: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Global Sequence 1

Finding prokaryotic genes and non intronic eukaryotic genes

Lecture 8

Page 2: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Introduction

• Review the structure of prokaryotic genes
• What is an ORF?
• Finding open reading frames
• Eliminating false ORFs
• Algorithm for finding ORFs

Page 3: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Prokaryotic coding sequence structure

• The gene coding sequence in the primary strand begins with an ATG and ends at a stop codon: TAA, TGA or TAG.

• The diagram shows the template strand (TAC is the complement of ATG) and three other amino acids; it does not show a stop codon.

• The codons are contiguous in prokaryotic, and non-intronic eukaryotic, protein-coding sequences.
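As a rough illustration of the structure described on this slide, the Python sketch below checks whether a DNA string has the shape of an uninterrupted coding sequence. The function name `looks_like_cds` is an assumption made for this example, and only the ATG start and the three stop codons named above are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def looks_like_cds(dna):
    """Return True if dna starts with ATG, ends with a stop codon,
    has a length divisible by three and no earlier in-frame stop."""
    dna = dna.upper()
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    # Split the primary strand into contiguous, non-overlapping codons.
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # A stop codon before the last position would end translation early.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])

print(looks_like_cds("ATGGCCTAA"))     # True
print(looks_like_cds("ATGTAAGCCTAA"))  # False (internal stop codon)
```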

Page 4: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Finding potential ORFs

– Translate each reading frame beginning at:

– Base 1: 5’→3’ frame 1
– Base 2: 5’→3’ frame 2
– Base 3: 5’→3’ frame 3

• Get the reverse complement of the given strand and repeat the process: 3’→5’ frame 1, …

• The diagram gives a partial sequence of the first exon in this gene.
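The six-frame translation described on this slide can be sketched in Python as follows. This is a minimal sketch that assumes the standard genetic code; the names `translate` and `six_frames` are chosen for this example only, stop codons are written as `*`, and any codon containing a character other than A, C, G or T is written as `X`.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna, offset=0):
    """Translate dna starting at offset (0, 1 or 2); '*' marks a stop."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )

def six_frames(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq, offset)
    return frames

for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Frames +1 to +3 start at bases 1, 2 and 3 of the given strand; frames -1 to -3 do the same on its reverse complement.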

Page 5: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Finding ORFs

• Look for start and stop codons (as they appear in the translated amino acid sequences).

• An ORF is the sequence that begins with a start codon (ATG in a DNA strand) and ends with a stop codon (TAA/TAG/TGA).

• Which of the translated reading frames in the diagram opposite could contain an ORF, given that it is a partial sequence of the exon?
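The definition of an ORF given on this slide can be turned into a short scan of the DNA itself. The Python sketch below is one possible reading, assuming only the forward strand is searched and that each ORF runs from the first ATG in a frame to the next in-frame stop codon; the name `find_orfs` and the tuple layout are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna):
    """Return (frame, start, end) for each ATG...stop stretch.

    frame is 1, 2 or 3; start and end are 0-based string indices,
    with end pointing just past the stop codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i            # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orfs.append((frame + 1, start, i + 3))
                start = None         # look for the next ORF in this frame
    return orfs

sequence = "CCATGAAATGA"
for frame, start, end in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```

An ATG with no stop codon after it in the same frame is not reported, which matters for partial sequences such as the one in the diagram.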

Page 6: Finding  prokaryotic genes and non  intronic  eukaryotic genes

True ORF gene

• Not all ORFs are “true” genes, so one needs to consider:

– “False” start codons: an ATG can also exist as part of the gene sequence where it does not represent a start codon.
  • What does it represent?

– Length: the ORF should encode at least a minimum number of amino acids (consider that the smallest protein is about 20 aa in length).

• Does an ORF represent a gene or the coding sequence (CDS) of a gene?

– An ORF in prokaryotes can be translated directly into the amino acid sequence.

– However, in eukaryotes the gene sequence is usually a mixture of exons and introns, so the exon sequences must be found in order to determine the true amino acid sequence (covered in the next lecture).
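The two checks on this slide, internal ATG codons and a minimum length, can be sketched in Python as below. The helper names are hypothetical, and the 20 amino acid threshold is simply the figure quoted on the slide, so it should be treated as adjustable.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_AA = 20  # threshold taken from the slide; adjust as needed

def in_frame_atgs(orf):
    """Positions of every in-frame ATG in the ORF; any position after 0
    is an ATG that lies inside the ORF rather than at its start."""
    orf = orf.upper()
    return [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] == "ATG"]

def encoded_length(orf):
    """Number of amino acids encoded, not counting a final stop codon."""
    orf = orf.upper()
    n = len(orf) // 3
    if n and orf[3 * (n - 1):3 * n] in STOP_CODONS:
        n -= 1
    return n

def passes_length_filter(orf, min_aa=MIN_AA):
    """True if the ORF encodes at least min_aa amino acids."""
    return encoded_length(orf) >= min_aa

candidate = "ATGAAAATGCCCTAA"
print(in_frame_atgs(candidate))         # [0, 6]
print(encoded_length(candidate))        # 4
print(passes_length_filter(candidate))  # False
```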

Page 7: Finding  prokaryotic genes and non  intronic  eukaryotic genes

ORFs in prokaryotic genes

• To make it easier to find true genes or ORFs in prokaryotic cells, one needs to consider the structure of prokaryotic operons such as the “lac operon” below:

• E.g. within the lac operon there are three genes (CDSs), all in close proximity, so the ATG of lacY is close to the stop codon of lacZ.
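One way to use this operon structure is to flag candidate ORFs that sit close to a neighbour. The Python sketch below assumes ORFs are given as (start, end) coordinates on the same strand; the function name `close_neighbours` and the 50-base default for `max_gap` are illustrative assumptions, not values from the lecture.

```python
def close_neighbours(orfs, max_gap=50):
    """Return (upstream, downstream, gap) for consecutive ORFs whose
    gap is at most max_gap bases.

    orfs is a list of (start, end) pairs with end exclusive.
    A negative gap means the two ORFs overlap."""
    ordered = sorted(orfs)
    result = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        gap = downstream[0] - upstream[1]
        if gap <= max_gap:
            result.append((upstream, downstream, gap))
    return result

candidates = [(1000, 1200), (0, 100), (110, 300)]
print(close_neighbours(candidates))
```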

Page 8: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Eliminating false positives

• Gene density is about one gene per kilobase, i.e. roughly one ORF every 1000 bases. In some cases the gene density can cause the stop codon of one gene to overlap with the promoter of another [Zvelebil chapter 9], or even the start codon of one gene to overlap with the stop codon of another.

• The coding sequence (CDS) of a gene has a different ratio of bases from non-coding or non-genic DNA. (The student is expected to research this.)

• An ORF should have a promoter region a small distance upstream of the start codon. Promoter regions can be identified by sequences such as a TATA box (a lecture on the basics of finding promoters will discuss this in more detail).

• The CDS of a gene is generally highly conserved, so searching online databases for similar CDS sequences can increase the chance of finding true ORFs.
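The slide leaves the details of base composition to the student. As one simple illustration, the Python sketch below compares the G+C fraction inside a candidate ORF with that of the remaining sequence. Choosing GC content as the measure is an assumption made for this example, and the function names are hypothetical.

```python
def gc_content(dna):
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)

def gc_difference(dna, start, end):
    """GC fraction of dna[start:end] minus GC fraction of the rest."""
    inside = dna[start:end]
    outside = dna[:start] + dna[end:]
    return gc_content(inside) - gc_content(outside)

print(gc_content("ATGC"))                     # 0.5
print(gc_difference("AAAAGGGGAAAA", 4, 8))    # 1.0
```

A difference near zero suggests the candidate region looks like its surroundings in this one respect; on its own it does not decide whether the ORF is real.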

Page 9: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Algorithm to find ORFs in prokaryotic FASTA files

1. Open the file and join its sequence lines (60 characters per line) into one sequence string.

2. Translate the first reading frame of the downloaded strand.

3. Shift one position to the right and translate this sequence; repeat for reading frame three. (Note: with FASTA files there is no need to take the reverse complement, as it can be assumed that the CDS is in the sequence given in the file.)

4. Mark the start and stop positions in each translation.

5. Look for sequences with a start followed by a stop; if there is none, there is no ORF in that reading frame.

6. Determine the length of each ORF; if it is less than 20 amino acids, eliminate it as a “false positive”.
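Steps 1 and 4 to 6 of the algorithm above can be sketched in Python as follows. This is a sketch under stated assumptions: the file holds a single FASTA record, the translation of each frame (steps 2 and 3) has already been produced with stop codons written as `*`, and the function names are chosen for this example.

```python
import re

def read_fasta_sequence(lines):
    """Step 1: drop the '>' header line(s) and join the sequence lines."""
    return "".join(
        line.strip() for line in lines if not line.startswith(">")
    ).upper()

def orfs_in_translation(protein, min_aa=20):
    """Steps 4 to 6: find M...* stretches in one translated frame and
    keep those that encode at least min_aa amino acids.

    Returns (position_in_translation, peptide) pairs."""
    hits = []
    for match in re.finditer(r"M[^*]*\*", protein):
        peptide = match.group()[:-1]  # drop the stop symbol
        if len(peptide) >= min_aa:
            hits.append((match.start(), peptide))
    return hits

record = [">example header\n", "atgc\n", "GGTA\n"]
print(read_fasta_sequence(record))
print(orfs_in_translation("AAMKK*QQMA*", min_aa=3))
```

`read_fasta_sequence` accepts any iterable of lines, so an open file handle can be passed directly. The `min_aa` default of 20 follows step 6.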

Page 10: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Exercise

• The sequence for the TUBAC3 gene can be found at: TUBAC3 gene complete sequence. Note: the diagram shows only part of the exon; the partial sequence shown in the figure above runs from position 57 to position 357.

• Download the file and analyse it in more detail to reveal problems that can arise with polycistronic and eukaryotic sequences with multiple contiguous exons.

Page 11: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Exam question

• Open reading frames (ORFs) are an essential part of finding genes in genomes:

• What is an ORF? (2 marks)

• Using a suitable example, describe how you would predict an ORF. (8 marks)

• Write a basic algorithm to find ORFs in a FASTA file, e.g. the BTEB gene. (6 marks)

• Describe, illustrating your answer with a suitable example, two ways you can eliminate false ORFs. (6 marks)

Page 12: Finding  prokaryotic genes and non  intronic  eukaryotic genes

Reference

• Baxevanis, A.D. 2005. Bioinformatics: a practical guide to the analysis of genes and proteins. Wiley; Chapter 5. [book is in the library]

• Klug, W.A. et al. 2010. Concepts of Genetics. Pearson Education; pp. 596-597.

• Zhang, M.Q. 2002. Computational prediction of eukaryotic coding genes. Nat. Rev. Genet. 3, 698-709.

• Zvelebil, M. Understanding Bioinformatics; Chapter 9. [book is in the library]