Chapter I

Dereje Beyene (PhD), College of Natural Science, Microbial Cellular and Molecular Biology Department, Nov, 2013/14

Genome Science (Genomics)

Figure X: DNA is a double helix. (A) Francis Crick (left) and James Watson (right) proposed that the DNA molecule has a double-helical structure. (B) Biochemists can now pinpoint the position of every atom in a DNA molecule. To see that the essential features of the original Watson-Crick model have been verified, follow with your eyes the double-helical chains of sugar-phosphate groups and note the horizontal rungs of the bases.

Definition of the term Genomics

The term genomics is derived from the term genome.

The term genomics was used for the first time in 1986, when the sequencing and mapping of the entire human genome was being planned.

Genomics is the field of genome studies. It includes intensive efforts to determine DNA and RNA sequences using high-throughput sequencing strategies; generating fine-scale genetic maps; microarrays (microchip arrays); cataloguing genome variation within a population (e.g. the 1000 Genomes Project); ascertaining the transcriptional control of genes; and employing digital technology and computationally intensive analysis to understand the structure, function and evolution of diverse organisms.

Genomics provides essential tools to speed up the work of forward geneticists and is now a scientific discipline in its own right.

The application of genomic science in all areas of biology has lowered the barriers that once separated the plant, animal and microbial research communities.

Experimental techniques for studying transcriptomes and proteomes are providing novel insights into genome expression, and the new discipline of systems biology is linking genome biology with cellular biochemistry.

The investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research of single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.

A genome is the sum total of all an individual organism's genes. Thus, genomics is the study of all the genes of a cell, or tissue, at the DNA (genotype), mRNA (transcriptome), or protein (proteome) levels.

Genome: the entire genetic complement of a living organism.

Gene: a DNA segment containing biological information and hence coding for an RNA and/or polypeptide molecule.

Genome expression: the series of events by which the biological information carried by the genome is released and made available to the cell.

The value of a genome sequence lies in its annotation.

Genome annotation: characterizing genomic features using computational and experimental methods (functional genomics).

Genes: four levels of annotation
Gene prediction: where are the genes?
What do they look like?
Domain: what do the proteins do?
Role: which pathway(s) are they involved in?

A genome sequence can tell us:
Everything about the organism's life
Its developmental program
Which genes are responsible for disease resistance or susceptibility
Novel gene discovery
Where we are going and where we came from (evolution)
How similar we are to apes, trees and yeast (comparative genomics)
The minimum genome size of free-living organisms, which can then be exploited to design a minimal synthetic genome that can support life (a cornerstone of synthetic biology)

Fig. The basic scheme of the genome annotation and delivery pipeline (panels: analysis pipeline, delivery pipeline). STS: sequence-tagged site.

Fig. XX. Bioinformatics uses information technology to manage and analyze information generated by the life sciences.

MAJOR RESEARCH AREAS OF GENOMICS

I) Bacteriophage Genomics

Bacteriophages have played, and continue to play, a key role in bacterial genetics and molecular biology, e.g. as cloning vectors such as the M13 vector.

Bacteriophage genome sequences can be obtained through direct sequencing of isolated bacteriophages, but can also be derived as part of microbial genomes (How?). Analysis of bacterial genomes has shown that a substantial amount of microbial DNA consists of prophage sequences and prophage-like elements. Detailed database mining of these sequences offers insights into the role of prophages in shaping the bacterial genome.

N.B: Bacteriophage genomes are especially mosaic: the genome of any one phage species appears to be composed of numerous individual modules. These modules may be found in other phage species in different arrangements.

II) Cyanobacteria Genomics

Cyanobacteria are prokaryotic organisms that have served as important model organisms for studying oxygenic photosynthesis and have played a significant role in the Earth's history as primary producers of atmospheric oxygen.

IV) Plant Genomics

Recent technological advancements have substantially expanded our ability to analyze and understand plant genomes and to reduce the gap between genotype and phenotype.

The fast-evolving field of genomics allows scientists to analyze thousands of genes in parallel, to understand the genetic architecture of plant genomes and also to isolate the genes responsible for mutations.

Whole plant genomes can now be sequenced; sequences for about 33 plant species are available at http://phytozome.net/.

Fig. Phylogenetic tree of sequenced plant genomes, up to date as of September 4th 2012. Source: http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes

III) Human Genomics

The Human Genome Project (HGP), initiated in 1990, is an international scientific research project with the primary goal of determining the sequence of the chemical base pairs that make up DNA (nitrogenous bases: the purines A and G, and the pyrimidines T and C), and of identifying and mapping the approximately 20,000-25,000 genes of the human genome from both a physical and a functional standpoint.

The estimated size of the human genome is 3,156 Mb = 3.156 Gb.

Displaying the results of the project required significant bioinformatics resources. The sequence of the human reference assembly can be explored using the UCSC Genome Browser or Ensembl.

Fig. Organization of the human genome

V) Metagenomics

Definition: Metagenomics (environmental genomics or community genomics) is the study of genomes recovered from environmental samples without the need to culture the organisms.

Metagenomics is a powerful tool for revealing the diversity of microscopic life that was previously hidden from culture-based studies. Thus, it offers a powerful lens for viewing the microbial world and has the potential to revolutionize our understanding of microscopic organisms (bacteria and fungi).

Metagenomic data are processed using bioinformatics tools.

Fig. Scheme of the major stages of integrative metagenomic ecosystem study of microbial ecology

Examining phylogenetic diversity using the hypervariable regions of 16S rRNA (Bacteria)

Fig. Map of the rRNA operon of bacteria

Why 16S rRNA for bacterial (Eubacteria and Archaea) metagenomics?

The conserved regions of the gene are highly similar across all bacteria, while the variable regions contain specific sites unique to individual bacteria. These variable regions (V1 to V9) enable taxonomic positioning and identification of bacteria.

Importance of PCR primer pair selection

Broad-range primers used in the PCR reaction preceding the sequencing target the conserved regions of the 16S rRNA gene in order to unselectively amplify all bacterial DNA present in the sample. Therefore, critical evaluation of the primer pair coverage is a prerequisite for unbiased and comprehensive sequence information.
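One way to make the evaluation of primer coverage concrete is to test a degenerate primer against a set of target sequences. The sketch below is a minimal illustration under stated assumptions, not a validated tool: the primer is the widely used 27F sequence, the three short targets are made-up toy sequences, the function names are hypothetical, and only exact matches to the degenerate pattern are counted (real evaluations usually tolerate some mismatches).

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Translate a degenerate primer into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_coverage(primer, targets):
    """Return the fraction of target sequences containing an exact primer site."""
    pattern = primer_to_regex(primer)
    hits = sum(1 for seq in targets.values() if pattern.search(seq.upper()))
    return hits / len(targets)


PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"  # M stands for A or C

# Toy targets (hypothetical, for illustration only).
TOY_TARGETS = {
    "taxon_1": "TTAGAGTTTGATCATGGCTCAGATTG",  # M = A, primer site present
    "taxon_2": "TTAGAGTTTGATCCTGGCTCAGGATG",  # M = C, primer site present
    "taxon_3": "TTAGAGTTTGATTCTGGCTCAGGATG",  # one mismatch, no exact site
}

coverage = primer_coverage(PRIMER_27F, TOY_TARGETS)
print(f"27F coverage on toy targets: {coverage:.2f}")
```

A primer pair would be evaluated by requiring both the forward site and the reverse-complemented reverse site in the same sequence; that extension is left out to keep the sketch short.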

Microbial DNA sequence analysis pipeline
Microbial genomic DNA extraction from the samples
PCR amplification of 16S rRNA genes with the most conserved universal primers, and cloning of the products into a cloning vector
Sequencing of the PCR products
Quality confirmation of the resulting 16S rRNA gene fragments
Removal of vector and primer sequences
Data validation and consensus building
Comparison of the sequence data with public databases for identification using the BLAST search tool
Clustering of the sequence data to genus and species level
Assessment of the microbial community composition

Fig. Universal 16S rRNA primers for metagenomics and/or species identification of microbial isolates

Fig. Map of the nuclear rRNA genes of fungi and their internal transcribed spacers (ITS)

Fig. Map of some primers for the ITS1 and ITS2 regions of the nuclear rRNA of fungi

Aims of Metagenomics

Diversity patterns of microorganisms can be used for monitoring and predicting environmental conditions and change. How? E.g. the microbiome of the human gut.

Examining genes/operons for desirable enzyme candidates (e.g. cellulases, chitinases, lipases, antibiotics, other natural products). Identified genes may be exploited for industrial or medical applications.

Examining secretory, regulatory and signal transduction mechanisms associated with samples or genes of interest.

Examining bacteriophage and/or plasmid sequences. These potentially influence the diversity and structure of microbial communities.

Examining potential lateral gene transfer events. Knowledge of genome plasticity may give us an idea of the types of selective pressure that exist for gene capture and evolution within a habitat.

Examining metabolic pathways. This facilitates the design of culture media for the growth of previously uncultured microbes.

Examining genes that predominate in a given environment compared to others.

Finally, metagenomic data and metadata can be leveraged towards designing low- and high-throughput experiments focused on defining the roles of genes and microorganisms in the establishment of a dynamic microbial community.

Why is Metagenomics Important?

All reasons lead to more knowledge:

Organisms can be studied directly in their environments, bypassing the need to isolate each species.

There are significant advantages for viral metagenomics, because of the difficulty of cultivating the appropriate host. (How? Think of bacteriophages.)

It is important for designing low- and high-throughput experiments focused on defining the roles of genes and microorganisms in the establishment of a dynamic microbial community.

Fig. Microbial diversity and community genomics

Fig. Two approaches to community genomics

Community sampling, followed by microbial community DNA extraction, yields the total community genomes (DNA).

Approach 1: amplify a single gene, e.g. 16S rRNA; sequence it and generate a phylogenetic tree.
Outcomes: a phylogenetic snapshot of most members of the community; identification of novel phylotypes.

Approach 2: digest the total DNA with restriction enzymes (REs) and then shotgun sequence it; assemble and annotate the genomes.
Outcomes: the total gene pool of the community; identification of all gene categories; discovery of new genes; linking genes to particular phylotypes.

Pharmacogenomics

There is inter-patient variability in medication response, in both efficacy and toxicity.

Inter-patient variability is due in part to polymorphisms (variants for which the most frequent allele has a frequency of at most 99%) in genes encoding:
Drug-metabolizing enzymes
Drug transporters
Drug targets (e.g. enzymes, receptors)

Thus, pharmacogenomics is a field of study that aims to elucidate the genetic basis for differences in drug efficacy and toxicity. It uses genome-wide approaches to identify the network of genes that govern an individual's response to drug therapy.

Better patient treatments through advanced diagnostics and personalized medicine

Diagnostic tests will guide clinical decision-making on whether to prescribe a specific drug, depending on whether the patient is predicted to be a responder or a non-responder to a given medication.

Fig. DNA sequencing of the target gene. Source: de Lecea and Rossbach, The HUGO Journal 2012, 6:2, doi:10.1186/1877-6566-6-2

Some Terminologies

1) Allelic variant: a variation in the normal sequence of a gene.
2) Genotype: the genetic makeup of an organism.
3) Genotype-phenotype correlation: the association between the presence of genetic variation and the resulting physical characteristics or abnormality.
4) Pharmacogenetics vs. pharmacogenomics: Pharmacogenetics is the study of genetic variation in drug-metabolizing enzymes and its effect on drug response; it is often a study of the variations in a targeted gene, or a group of functionally related genes. Pharmacogenomics is the general study of the entire spectrum of genes that affect drug behavior, i.e. a much broader investigation of genetic variation at the level of the genome.
5) Phenotype: observable features of the expression of genes.
6) Polymorphism: natural variations in genes that exist stably in the population and usually have no adverse effects on the individual.
7) SNP: single nucleotide polymorphism, the most common type of DNA variation, usually with a frequency above 1%.
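The 1% frequency criterion in the SNP definition can be illustrated with a small calculation. The following is a rough sketch under stated assumptions: the genotype counts are invented for illustration, the function names are hypothetical, and the minor allele is simply taken to be the second most frequent allele at the site.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Count alleles in diploid genotypes such as 'AG' and return frequencies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(frequencies):
    """Frequency of the second most common allele (0.0 if only one allele)."""
    ordered = sorted(frequencies.values(), reverse=True)
    return ordered[1] if len(ordered) > 1 else 0.0


def is_common_snp(frequencies, threshold=0.01):
    """Apply the 'frequency above 1%' criterion to the minor allele."""
    return minor_allele_frequency(frequencies) > threshold


# Hypothetical sample of 100 individuals genotyped at one site.
genotypes = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
freqs = allele_frequencies(genotypes)

print(freqs)                 # allele A is the major allele, G the minor allele
print(is_common_snp(freqs))  # minor allele frequency is 11/200 = 0.055
```

In this toy sample the G allele is carried on 11 of 200 chromosomes, so the site passes the 1% criterion.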

Genetic variability is seen both in the area of pharmacokinetics (absorption, distribution, metabolism and excretion) and in the area of pharmacodynamics (drug effects).

The figure at the right shows the relevant pathways, including the positions of genes which are important for drug response and drug effects:
Drug-metabolizing enzymes, DMEs (Phase I enzymes/cytochrome P450 enzymes, e.g. CYP2D6; Phase II enzymes, e.g. N-acetyl transferases)
Drug transporters (Solute Carrier (SLC) and ATP Binding Cassette (ABC) transporters, e.g. organic cation transporters, OCTs, as members of the SLC family)
Drug receptors (ligand-controlled ion channels or class 1 receptors, e.g. glutamate receptor; G-protein coupled receptors (GPCRs) or class 2 receptors, e.g. β-receptor; enzymatic receptors, e.g. insulin receptor; receptors regulating gene expression, e.g. steroid hormone receptor)
G-proteins, e.g. GNAS1 or GNB3

Examples of polymorphisms:

Thiopurine S-methyltransferase: this monogenic trait has a marked effect on pharmacokinetics (drug metabolism). Individuals who inherit an enzyme deficiency must be treated with markedly different doses of the affected medications (e.g. 5%-10% of the standard thiopurine dose).

Beta-adrenergic receptor: polymorphisms can alter the sensitivity of the patient to treatment (e.g. with beta-agonists), changing the pharmacodynamics of drug response.

N.B: Most drug effects are determined by the interplay of several gene products that govern the pharmacokinetics (drug absorption, distribution, metabolism and excretion) and the pharmacodynamics (effect of the drug and mechanisms of action) of medications.

[Pharmacokinetics may be simply defined as what the body does to the drug, whereas pharmacodynamics may be defined as what the drug does to the body.]

The goal of pharmacogenomics research is to elucidate these polygenic determinants of drug effect.

Research outcome of pharmacogenomics: to provide new strategies for optimizing drug therapy based on each patient's genetic determinants of drug efficacy and toxicity.

TPMT: Thiopurine S-methyltransferase

This gene encodes the enzyme that metabolizes thiopurine drugs by S-methylation, with S-adenosyl-L-methionine as the S-methyl donor and S-adenosyl-L-homocysteine as a byproduct.

Thiopurine drugs such as 6-mercaptopurine are used as chemotherapeutic agents.

Genetic polymorphisms that affect this enzymatic activity are correlated with variations in sensitivity and toxicity to such drugs among individuals.

Intolerance (definition in medicine): inability to withstand or consume; inability to absorb or metabolise nutrients.

Table: Summary of TPMT deficiency

The methyl group (CH3) attached to the methionine sulfur atom in S-adenosyl-L-methionine (SAM) is chemically reactive. This allows donation of this group to an acceptor substrate in transmethylation reactions. More than 40 metabolic reactions involve the transfer of a methyl group from SAM to various substrates, such as nucleic acids, proteins, lipids and secondary metabolites.

Drug intolerance:
Inability to continue taking a medication, or difficulty in taking it, because of an adverse side effect that is not immune-mediated
The state of reacting to normal pharmacologic doses of a drug with the symptoms of overdosage

Fig. KEGG pathway map highlighting thiopurine S-methyltransferase (EC:2.1.1.67) and hypoxanthine phosphoribosyltransferase (EC:2.4.2.8). Source: http://www.kegg.jp/kegg-bin/show_pathway?hsa00983+7172

Summary

The adrenergic receptors (subtypes alpha 1, alpha 2, beta 1, and beta 2) are a prototypic family of guanine nucleotide binding regulatory protein-coupled receptors that mediate the physiological effects of the hormone epinephrine and the neurotransmitter norepinephrine. Specific polymorphisms in this gene have been shown to affect the resting heart rate and can be involved in heart failure.

Potential of Pharmacogenomics

To identify patients within a population with the same diagnosis who are genetically predisposed either not to respond to therapy or to develop unacceptable toxicity, and then to prospectively alter their therapy to avoid treatment that is not likely to be optimal.

The remaining, now more homogeneous, population can then be treated with the conventional therapy for which they are not genetically predisposed to fail.

These approaches promise the advent of personalized medicine, in which drugs and drug combinations are optimized for each individual's unique genetic makeup.

Definition: Genome size is the total amount of DNA contained within one copy of a genome (haploid). It is measured in picograms (pg) or in megabase pairs (Mb). There is a genome size database that you can search to get this information.
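Since genome size is reported either as a mass (pg) or as a length (Mb), it helps to be able to convert between the two. The sketch below assumes the commonly cited factor of about 978 Mb per picogram of DNA; this factor is not stated in the slides and the function names are hypothetical. The example values are the human and Mycoplasma genitalium genome sizes quoted in this chapter.

```python
# Assumption: 1 pg of double-stranded DNA corresponds to roughly 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms):
    """Convert a DNA mass in picograms to a length in megabase pairs."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases):
    """Convert a length in megabase pairs to a DNA mass in picograms."""
    return megabases / MB_PER_PG


human_mb = 3156.0          # human genome size quoted in the slides
m_genitalium_mb = 0.58     # Mycoplasma genitalium genome size quoted in the slides

print(f"Human genome: {mb_to_pg(human_mb):.3f} pg")
print(f"M. genitalium genome: {mb_to_pg(m_genitalium_mb):.6f} pg")
```

Under this assumed factor the human genome corresponds to a little over 3 pg of DNA per haploid copy.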

Significance of Genome Size in Genomics

The significance of knowing genome size:
It is important to genomics and the broader scientific community as a fundamental feature of genome structure.
It is used for genomics-based comparative biodiversity studies.
It is a direct estimator of the cost and workload of genome projects.

Mycoplasma genitalium (genome size: 0.58 Mb) is a small parasitic bacterium that lives on the ciliated epithelial cells of the primate genital and respiratory tracts. M. genitalium has the smallest known genome that can constitute a cell, and is the second-smallest bacterium after the endosymbiont Carsonella ruddii. Until the discovery of Nanoarchaeum in 2002, M. genitalium was also considered to be the organism with the smallest genome.

N.B: There is a difference between the smallest parasitic bacteria and the smallest free-living bacteria. The smallest known free-living bacterium is Pelagibacter ubique, with 1.3 Mb.

An alphaproteobacterium that lives in the ocean, with about 25% abundance
The first cultured member of its clade
The smallest genome among free-living organisms

Synthetic Biology and Genome Size

Synthetic biology: its primary focus is building a minimal artificial cell.

Streamlining the genomes of model bacteria has revealed that genome reduction leads to unanticipated beneficial properties, such as:
High electroporation/transformation efficiency
Accurate propagation of recombinant genes and plasmids
Suitability for constructing a robust minimal synthetic genome, which provides a minimal cell as a good chassis on which to assemble various kinds of functional modules

Synthetic biology refers to reliably engineering biological systems that perform human-defined functions. C. Smolke (Nature 441: 277-279)

Promise of Synthetic Biology

The field of synthetic biology holds great promise for the design, construction and development of artificial (i.e. man-made) biological (sub)systems.

It thus offers potentially viable new routes to 'genetically modified' organisms and smart drugs, as well as model systems to examine artificial genomes and proteomes.

The informed manipulation of such biological (sub)systems could have an enormous positive impact on our societies, with its effects being felt across a range of activities such as the provision of healthcare, environmental protection and remediation, etc.

Genome Organization

Fig. Prokaryotic and eukaryotic ribosomes

Prokaryotic genome: a single, circular DNA molecule localized within the nucleoid (the lightly staining area in the center of the cell); some are also linearly arranged.
Eukaryotic genome: linearly arranged, complexed with histones and packaged in an organized fashion.

Prokaryote genomes

Example: E. coli
89% coding
More than 4,000 genes
122 structural RNA genes
Prophage remains
Transposable elements: insertion sequences (IS) and transposons (composite and non-composite)
Horizontal transfers (conjugation)

The Genetic Features of Prokaryotic Genomes

Genome sequence inspection can be used to locate genes because genes are not random series of nucleotides but instead have distinct features.

Fig X. A protein-coding gene is an open reading frame (ORF) of triplet codons.

Fig X. A dsDNA molecule has six reading frames (computational prediction!). Both strands are read in the 5' to 3' direction. Each strand has three reading frames, depending on which nucleotide is chosen as the starting position.

However, simple ORF scans are less effective with the genomes of higher eukaryotes, partly because their genes are often split by introns.
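The six-frame ORF scan described above can be sketched in a few lines. This is a minimal illustration, not a gene finder: it assumes ATG as the only start codon and the three standard stop codons, uses a made-up toy sequence, and all function names are hypothetical. Coordinates for the minus strand are reported on the reverse-complemented sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_codons):
    """Yield (start, end) of ATG...stop ORFs in one reading frame of seq."""
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i                       # first start codon opens the ORF
        elif start is not None and codon in STOP_CODONS:
            if (i + 3 - start) // 3 >= min_codons:
                yield start, i + 3          # end is exclusive, stop included
            start = None


def six_frame_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end, orf) for all six reading frames."""
    seq = seq.upper()
    results = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(template, offset, min_codons):
                results.append((strand, offset + 1, start, end, template[start:end]))
    return results


toy_sequence = "CCATGAAATTTTAGGC"  # hypothetical example sequence
for orf in six_frame_orfs(toy_sequence):
    print(orf)
```

On the toy sequence the scan reports a single ORF, ATGAAATTTTAG, in the third frame of the plus strand. An ORF without a stop codon before the end of the sequence is ignored in this sketch.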

A simplified version of prokaryotic operon organization. Genes A, B, and C are transcribed together onto a single polycistronic transcript, which is then translated to produce three separate proteins.

Proteins originating from genes of a common operon often have similar functions, interact physically through protein-protein interactions, or participate in shared biochemical pathways.

Fig. Gene organization in the prokaryotic genome (labels: promoter, operator, structural gene(s), repressor protein-encoding gene)

Operon

In genetics, an operon is a functioning unit of genomic DNA containing a cluster of genes under the control of a single regulatory signal or promoter.

The genes are transcribed together into an mRNA strand and are either translated together in the cytoplasm, or undergo trans-splicing to create monocistronic mRNAs that are translated separately, i.e. several strands of mRNA that each encode a single gene product. The result of this is that the genes contained in the operon are either expressed together or not at all.

In short, several genes must be both co-transcribed and co-regulated to define an operon.

The operon is the main feature of the prokaryotic genome. It is made of four components: promoter, regulator, operator and structural genes.

An operon is made up of four basic DNA components:

Promoter: a nucleotide sequence that enables a gene to be transcribed. The promoter is recognized by RNA polymerase, which then initiates transcription. In RNA synthesis, promoters indicate which genes should be used for messenger RNA creation and, by extension, control which proteins the cell produces.

Regulator: these genes control the operator in cooperation with certain compounds, called inducers and co-repressors, present in the cytoplasm. A regulator gene is not necessarily adjacent to the operator it controls. The regulator gene codes for a protein called the repressor. The repressor binds to the operator to repress its action.

Operator: a segment of DNA that a repressor binds to. It is classically defined in the lac operon as a segment between the promoter and the genes of the operon. In the case of a repressor, the repressor protein physically obstructs the RNA polymerase from transcribing the genes.

Structural genes: the genes that are co-regulated by the operon.

Operons (groups of genes that are located adjacent to one another in the genome, with perhaps just one or two nucleotides between the end of one gene and the start of the next) are characteristic features of prokaryotic genomes.

All genes in an operon are expressed as a single unit.

Hence, prokaryotic genomes have a more compact genetic organization, with little space between genes. How? Give your explanation (Hint: compare with eukaryotic coding gene organization).

The lac operon encodes three enzymes:

Beta-galactosidase: this enzyme hydrolyzes the bond between the two sugars of lactose, glucose and galactose. It is coded for by the gene lacZ.

Lactose permease: this enzyme spans the cell membrane and brings lactose into the cell from the outside environment. The membrane is otherwise essentially impermeable to lactose. It is coded for by the gene lacY.

Thiogalactoside transacetylase: the function of this enzyme is not known. It is coded for by the gene lacA.
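The adjacency of genes within an operon suggests a simple computational heuristic: group neighbouring genes on the same strand when the gap between them is very small. The sketch below is only an illustration of that idea; the gene names and coordinates are invented, the 50 bp cut-off is an arbitrary assumption, and close spacing alone does not prove that genes are co-transcribed and co-regulated.

```python
def group_candidate_operons(genes, max_gap=50):
    """Group adjacent same-strand genes separated by at most max_gap bp.

    genes: list of (name, start, end, strand) tuples with 1-based coordinates.
    Returns a list of lists of gene names, ordered by genome position.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    groups = []
    for gene in ordered:
        name, start, end, strand = gene
        if groups:
            previous = groups[-1][-1]
            same_strand = previous[3] == strand
            small_gap = start - previous[2] <= max_gap
            if same_strand and small_gap:
                groups[-1].append(gene)   # extend the current candidate operon
                continue
        groups.append([gene])             # start a new candidate operon
    return [[gene[0] for gene in group] for group in groups]


# Hypothetical annotation, for illustration only.
toy_genes = [
    ("geneA", 100, 1000, "+"),
    ("geneB", 1002, 1900, "+"),
    ("geneC", 1905, 2500, "+"),
    ("geneD", 3200, 4000, "+"),
    ("geneE", 4010, 4500, "-"),
]

candidates = group_candidate_operons(toy_genes)
print(candidates)
```

Here geneA, geneB and geneC form one candidate operon, while geneD (large gap) and geneE (opposite strand) are left on their own.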

Regulation of the tryptophan operon

The operon contains five structural genes involved in the biosynthesis of tryptophan: trpE, D, C, B and A. Expression of these genes is controlled at two levels. The trpR gene encodes a repressor that, in the presence of tryptophan, binds to the operator (o) and blocks transcription. In addition, expression is mediated by an attenuator sequence that prematurely terminates transcription when high levels of tryptophan are present. In this case, the attenuated RNA consists of only a short leader sequence (L). P = promoter.

Attenuation (in genetics) is a proposed mechanism of control in some bacterial operons which results in premature termination of transcription and which is based on the fact that, in bacteria, transcription and translation proceed simultaneously.

Attenuation involves a provisional stop signal (attenuator), located in the DNA segment that corresponds to the leader sequence of mRNA. During attenuation, the ribosome becomes stalled (delayed) in the attenuator region in the mRNA leader.

Depending on the metabolic conditions, the attenuator either stops transcription at that point or allows read-through to the structural gene part of the mRNA and synthesis of the appropriate protein.

Attenuation, or dampening, of the trp operon is made possible by the fact that the rate of translation influences RNA structure, which in turn influences the rate of transcription.

Translation therefore interferes with transcription, making this an example of translation-mediated transcription attenuation.

Mechanistically, this kind of attenuation is achieved because special sequences located near the beginning of the transcript, called the leader (trpL), interact to create two possible RNA conformations: one that terminates transcription (the terminator stem), and one that is permissive to transcription (the anti-terminator stem).

Fig. Mechanism of transcriptional attenuation of the trp operon.

Locating genes for functional RNA

An ORF scan is appropriate for protein-coding genes, but what about genes for functional RNAs such as rRNA and tRNA? They have their own distinctive features, which can be used to aid their discovery in the genome sequence. They have the ability to fold into secondary structures, such as the tRNA cloverleaf (formed by intramolecular base pairing).

Fig. The tRNA cloverleaf (label: anticodon)

Non-protein-coding sequences make up only a small fraction of prokaryotic genomes.

Fig. X. A 50 kb segment of the E. coli genome. Insertion sequences (IS) are examples of transposable elements (TEs).

Fig. Percentage of DNA not coding for protein. According to the diagram, as genome size increases, the proportion of DNA that does not code for protein also increases.

Eukaryotic genome

They have multiple linear chromosomes, each containing multiple origins of replication (think about the comparative genome sizes of the two groups, prokaryotes and eukaryotes).

They have specialized sequences at the ends of chromosomes (telomeres) that ensure proper replication of the essential components of chromosomes and also protect them from nuclease degradation.

They have special sequences (centromeres) that ensure the correct segregation of homologous chromosomes during cell division.

How many genes are there? Surprisingly, this question is not very important: gene number says little about an organism's complexity. There is more to genomes than protein-coding genes alone.

Protein-coding genes

Although most prokaryotic chromosomes consist almost entirely of protein-coding genes, such elements make up a small fraction of most eukaryotic genomes (Figure).

As a prime example, the human genome might contain as few as 20,000 genes, comprising less than 1.5% of the total genome sequence.

Introns

Shortly after their discovery, the non-coding intervening sequences within coding genes (introns) were suggested to account for the pronounced discrepancy between gene number and genome size. It has also recently been suggested that most non-coding DNA in animals (but not plants) is intronic, which would imply that most of the genome is transcribed even though protein-coding regions represent a tiny minority.

Fig. Initiation of transcription in eukaryotes

A transcription factor (sometimes called a sequence-specific DNA-binding factor) is a protein that binds to specific DNA sequences, thereby controlling the flow (or transcription) of genetic information from DNA to RNA.

Exon and Intron Splicing Motifs Search

Most introns start with the dinucleotide GU (GT in DNA) and end with the dinucleotide AG (read in the 5' to 3' direction of the mRNA).

GT and AG are referred to as the splice donor and splice acceptor site, respectively

These consensus sequences are known to be critical, because changing one of the conserved nucleotides results in inhibition of splicing.

Upstream from the AG there is a region high in pyrimidines (C and U), or polypyrimidine tract. Upstream from the polypyrimidine tract is the branch point.

The branch point always contains an Adenine, but it is otherwise loosely conserved.

A typical sequence is YNYYRAY, where Y indicates a pyrimidine (C or U), N denotes any nucleotide, R denotes any purine (G or A), and A denotes adenine.

In over 60% of cases, the exon sequence is (A/C)AG at the donor site, and AG at the acceptor site. (Figure in the next slide)
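The splicing motifs described above lend themselves to a simple pattern search. The sketch below is a minimal illustration, assuming DNA letters (T in place of U), a made-up toy intron, and hypothetical function names; it checks the GT...AG boundaries and lists every position that matches the YNYYRAY branch point consensus. Because the consensus is loosely conserved, real sequences will produce both false positives and misses.

```python
import re

# YNYYRAY in DNA letters: Y = C or T, N = any base, R = A or G.
# The lookahead allows overlapping matches to be reported.
BRANCH_POINT = re.compile(r"(?=([CT][ACGT][CT][CT][AG]A[CT]))")


def has_gt_ag_boundaries(intron):
    """True if the intron starts with GT and ends with AG."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")


def branch_point_candidates(intron):
    """Return (position, motif) for every YNYYRAY match in the intron."""
    intron = intron.upper()
    return [(m.start(), m.group(1)) for m in BRANCH_POINT.finditer(intron)]


# Toy intron (hypothetical): GT donor, branch point, pyrimidine tract, AG acceptor.
toy_intron = "GTAAGTCGATACTAACTTTTTTTTCAG"

print(has_gt_ag_boundaries(toy_intron))
print(branch_point_candidates(toy_intron))
```

For the toy intron the only candidate is TACTAAC at position 9, upstream of the polypyrimidine tract, as expected from the description of the motifs.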

Fig. Exon-intron consensus sequences in eukaryotes (labels: polypyrimidine tract, YNYYRAY)

Splicing is controlled by specific intron sequences, called splice-donor (GU) and splice-acceptor (AG) sequences, which flank the exons. Mutations in these sequences may lead to retention of large segments of intronic DNA by the mRNA, or to entire exons being spliced out of the mRNA. These changes could result in production of a nonfunctional protein.

Fig. Mechanism of pre-mRNA splicing (source: www.wisc.edu/pharm). The pre-mRNA consists of exon 1, the intron and exon 2, with a 5' splice site, a 3' splice site and a branch-point adenosine. First trans-esterification: cut at the 5' site and lariat formation. Second trans-esterification: cut at the 3' site, exon joining and lariat release. Products: ligated exons and the lariat intron.

Alternative splicing

It is a post-transcriptional modification in which a single gene can code for multiple proteins (protein isoforms). It takes place in eukaryotes, prior to mRNA translation, by the differential inclusion or exclusion of regions of pre-mRNA. It is an important source of protein diversity.

During a typical gene splicing event, the pre-mRNA transcribed from one gene can lead to different mature mRNA molecules that generate multiple functional proteins.

In conclusion: gene splicing enables a single gene to increase its coding capacity, allowing the synthesis of protein isoforms that are structurally and functionally distinct. Gene splicing is observed in a high proportion of genes. In human cells, about 40-60% of the genes are known to exhibit alternative splicing.

Gene Splicing Mechanism

There are several types of common gene splicing events. These events can occur simultaneously in a gene after the pre-mRNA is formed in the transcription step of the central dogma of molecular biology.

Exon Skipping: This is the most common known gene splicing mechanism, in which exon(s) are included in or excluded from the final gene transcript, leading to extended or shortened mRNA variants. The exons are the coding regions of a gene and are responsible for producing proteins that are utilized in various cell types for a number of functions.

Intron Retention: An event in which an intron is retained in the final transcript. In humans, 2-5% of the genes have been reported to retain introns. The gene splicing mechanism retains the non-coding (intron) portions of the gene and leads to a deformity in protein structure and functionality.

Alternative 3' Splice Site and 5' Splice Site: Alternative gene splicing includes the joining of different 5' and 3' splice sites. In this kind of gene splicing, two or more alternative 5' splice sites compete for joining to two or more alternative 3' splice sites.
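Exon skipping and intron retention can be sketched as choices about which segments of a gene end up in the mature mRNA. The gene model, sequences and function name below are hypothetical and only meant to show how one pre-mRNA yields several isoforms; alternative 5' and 3' splice sites are not modelled.

```python
# Hypothetical gene: alternating exons and introns, indexed 0..4.
GENE = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUUUCAG"),
    ("exon", "GAAUUC"),
    ("intron", "GUGAGUCCCAG"),
    ("exon", "CCGUAA"),
]


def mature_mrna(segments, skipped_exons=(), retained_introns=()):
    """Build an mRNA from segments.

    Exons are kept unless their index is in skipped_exons;
    introns are dropped unless their index is in retained_introns.
    """
    parts = []
    for index, (kind, sequence) in enumerate(segments):
        if kind == "exon" and index not in skipped_exons:
            parts.append(sequence)
        elif kind == "intron" and index in retained_introns:
            parts.append(sequence)
    return "".join(parts)


print(mature_mrna(GENE))                        # all exons, no introns
print(mature_mrna(GENE, skipped_exons={2}))     # exon skipping: shorter mRNA
print(mature_mrna(GENE, retained_introns={1}))  # intron retention: longer mRNA
```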


Fig. Gene models are depicted as exons (colored rectangles) connected by introns (black lines). Green arrows indicate transcription initiation sites, dotted lines indicate splicing patterns and polyadenylation sites are denoted as poly (A). The mRNA products generated by each type of AT are shown to the right of each gene model. Simple transcription is contrasted with alternative transcript initiation, the five major classes of alternative splicing, and alternative polyadenylation. In each model, yellow exons are constitutive and blue exons are alternative.

It should be coupled with exon skipping. Why?


Alternative cleavage and polyadenylation: extent, regulation and function. Ran Elkon, Alejandro P. Ugalde & Reuven Agami. Nature Reviews Genetics 14:496-506 (2013). doi:10.1038/nrg3482

The four different APA types

The simplest alternative polyadenylation (APA) type, which is termed tandem 3' untranslated region (UTR) APA, involves the occurrence of alternative poly(A) sites within the same terminal exon and hence generates multiple isoforms that differ in their 3' UTR length without affecting the protein encoded by the gene. The other three types involve APA events which potentially affect the coding sequences in addition to the 3' UTRs. These types are: alternative terminal exon APA, in which alternative splicing generates isoforms that differ in their last exon; intronic APA, which involves cleaving at a cryptic intronic poly(A) signal (PAS), extending an internal exon and making it the terminal one; and internal exon APA, which involves premature polyadenylation within the coding region.

Sequence-based methods for profiling transcript diversity

Hypothetical transcript sequences consisting of exons (green rectangles) with intervening introns (black lines) are depicted as gapped alignments to a reference genome. The following tracks represent sequences generated by each sequence-based method. Human genes have an average of 10 exons with an average length of 250 bp. The methods are displayed in order of least to most quantitative. Abbreviations: (EST) expressed sequence tag; (SAGE) serial analysis of gene expression; (CAGE) capped analysis of gene expression; (GIS) gene identification signature.


Trans-splicing: Exons located on separate pre-mRNA molecules (intragenic and/or intergenic trans-splicing) are selectively joined to produce mature mRNAs encoding proteins with distinct features and functions.


Take Home Message!

Have you noted that the annotation of eukaryotic genomes is more complex than that of prokaryotic genomes? Why?

What does it mean that the prokaryotic genome is more dense/compact than the eukaryotic genome? How do you explain this statement?

Pseudogenes

The term, coined in 1977 by Jacq et al., is composed of the prefix pseudo, which means false, and the root gene, which is the central unit of molecular genetics.

They are dysfunctional relatives of genes that have lost their protein-coding ability or are otherwise no longer expressed in the cell. Although some do not have introns or promoters (these pseudogenes are copied from mRNA and incorporated into the chromosome, and are called processed pseudogenes), most have some gene-like features (such as promoters, CpG islands, and splice sites). They are nonetheless considered nonfunctional, due to their lack of protein-coding ability resulting from various genetic disablements (premature stop codons, frameshifts, or a lack of transcription) or their inability to encode RNA (such as with rRNA pseudogenes).

Jacq C, Miller JR, Brownlee GG. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12:109-120. 1977.

Pseudogenes cont

Figure X: Origins of pseudogenes (figure labels: repeat motifs, processed pseudogene). A. Retrotransposed pseudogenes: starting from the original gene (the coding sequences are in black, the non-coding introns in gray, and the promoter element is indicated by the large arrow upstream of the gene), transcription generates a primary mRNA (black and gray broken line), from which the introns are excised by RNA splicing. This mature mRNA, which contains only exons and a poly-adenosine tail, is transcribed back into DNA by enzymes called reverse transcriptases, and the DNA is reinserted back into the genome. Hence, the pseudogene product will lack intron and promoter sequences, and will bear characteristic repeat sequences at the insertion site, due to the integration mechanism. B. Duplicated pseudogenes: DNA duplication generates a more-or-less faithful copy of the original gene, including introns and, in many cases, promoter and other transcriptional regulatory elements. In most cases, this duplicated gene will undergo crippling, inactivating mutations and turn into a pseudogene (in rarer cases, the duplicated copy will acquire new functions and become a new gene). (Adapted from D'Errico I, Gadaleta G, Saccone C. Pseudogenes in metazoa: origin and features. Brief Funct Genomic Proteomic. 2004 3:157-67.)

Pseudogenes from the point of view of genome annotation

Pseudogenes are quite difficult to identify and characterize in genomes, because the two requirements of homology and nonfunctionality are implied through sequence calculations and alignments rather than biologically proven.

Homology is implied by sequence identity between the DNA sequences of the pseudogene and parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity (usually between 40% and 100%) means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences were independently created.

Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps in going from a genetic DNA sequence to a fully functional protein: transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional.
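Both criteria can be approximated computationally. The sketch below assumes the two sequences are already aligned (gaps written as "-") and counts identity over ungapped columns only, which is one of several conventions in use. It also scans a coding sequence for an in-frame premature stop codon, one of the disablements listed earlier. The function names and test sequences are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def percent_identity(aligned_a, aligned_b):
    """Percentage of identical bases over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns (an assumed convention)
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def first_premature_stop(cds):
    """Index of the first in-frame stop codon before the final codon, or None."""
    cds = cds.upper()
    last_codon_start = len(cds) - len(cds) % 3 - 3
    for i in range(0, last_codon_start, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


print(percent_identity("ATGGCCTAA", "ATGGACTAA"))  # 8 of 9 columns identical
print(first_premature_stop("ATGTAAGCCTGA"))         # stop codon at index 3
```

A high identity combined with a premature stop codon is suggestive of a pseudogene, but as the text notes, this is an inference from sequence rather than a biological proof.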


Genome and gene duplication

Genome and gene duplication can occur by several mechanisms:

Polyploidization - Autopolyploidization and Allopolyploidization based on genome origin

Segmental duplication and

Tandem gene duplication

Polyploidy is thought to be rare in animals because it can disrupt the dosage compensation that is required for genetic balance. How do animals achieve dosage balance? Why? Hint: Genome imprinting!

Gene Duplication Cont

Fig. Four scenarios (1 to 4) for the outcome of gene duplication, including sub-functionalization and neo-functionalization.

Neofunctionalization, one of the possible outcomes of functional divergence, occurs when one gene copy, or paralog, takes on a totally new function after a gene duplication event. Neofunctionalization is an adaptive mutation process, meaning that one of the gene copies must mutate to develop a function that was not present in the ancestral gene.

Subfunctionalization is one of the possible outcomes of functional divergence that occurs after a gene duplication event, in which pairs of genes that originate from duplication, or paralogs, take on separate functions. Subfunctionalization is a neutral mutation process, meaning that no new adaptations are formed. During the process of gene duplication, paralogs simply undergo a division of labor by retaining different parts (subfunctions) of their original ancestral function. This partitioning event occurs because of segmental gene silencing, leading to the formation of paralogs that are no longer duplicates, because each gene only retains a single function.

It is important to note that the ancestral gene was capable of performing both functions and the descendant duplicate genes can now only perform one of the original ancestral functions.

Chromatin Structure

Chromatin is a term designating the structure in which DNA exists within cells. The structure of chromatin is determined and stabilized through the interaction of the DNA with DNA-binding proteins.

There are 2 classes of DNA-binding proteins. The histones are the major class of DNA-binding proteins involved in maintaining the compacted structure of chromatin. There are 5 different histone proteins, identified as H1 (linker histone) and H2A, H2B, H3 and H4 (core histones).

The other class of DNA-binding proteins is a diverse group of proteins called simply non-histone proteins. This class of proteins includes the various transcription factors, polymerases, hormone receptors and other nuclear enzymes. In any given cell there are more than 1000 different types of non-histone proteins bound to the DNA.

Fig. Structure of the chromosome

The binding of DNA by the histones generates a structure called the nucleosome.

The nucleosome core contains an octamer protein structure consisting of 2 subunits each of H2A, H2B, H3 and H4.

Histone H1 occupies the internucleosomal DNA and is identified as the linker histone.

The nucleosome core contains approximately 150 bp of DNA.

The linker DNA between each nucleosome can vary from 20 to more than 200 bp.

These nucleosomal core structures would appear as "beads-on-a-string" if the DNA were pulled into a linear structure and observed under an electron microscope.
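The figures above allow a rough estimate of how many nucleosomes fit on a stretch of DNA. This is a back-of-the-envelope sketch: the 1 Mb region is a hypothetical example, and uniform spacing is assumed, which real chromatin does not have.

```python
def nucleosome_count(dna_length_bp, core_bp=150, linker_bp=20):
    """Estimate the number of nucleosomes, assuming evenly spaced repeats."""
    repeat_bp = core_bp + linker_bp  # one core plus one linker
    return dna_length_bp // repeat_bp


region_bp = 1_000_000  # hypothetical 1 Mb region

# Shortest and longest linker lengths quoted above (20 and 200 bp).
print(nucleosome_count(region_bp, linker_bp=20))   # 5882
print(nucleosome_count(region_bp, linker_bp=200))  # 2857
```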


Chromatin Structure Cont

Chromatin is found in two varieties: euchromatin and heterochromatin. Originally, the two forms were distinguished cytologically by how intensely they stained.

Euchromatin stains less intensely, while heterochromatin stains intensely, indicating tighter packing.

Heterochromatin mainly consists of genetically inactive satellite sequences, and many genes are repressed to various extents, although some cannot be expressed in euchromatin at all. Both centromeres and telomeres are heterochromatic, as is the Barr body of the second, inactivated X chromosome in a female.

Heterochromatin is a tightly packed form of DNA, which comes in different varieties. These varieties lie on a continuum between the two extremes of constitutive and facultative heterochromatin.

Both play a role in the expression of genes: constitutive heterochromatin can affect the genes near it (position-effect variegation), while facultative heterochromatin is the result of genes that are silenced through a mechanism such as histone methylation or siRNA through RNAi.

The regions of DNA packaged in facultative heterochromatin will not be consistent between the cell types within a species, and thus a sequence in one cell that is packaged in facultative heterochromatin (and the genes within poorly expressed) may be packaged in euchromatin in another cell (and the genes within no longer silenced). However, the formation of facultative heterochromatin is regulated, and is often associated with morphogenesis or differentiation.

DNA Modification and Genome Expression

Important alterations of genome activity can also be achieved by making chemical changes to the DNA itself.

These changes are associated with the semi-permanent silencing of the genome, possibly of an entire chromosome, and often the modified state is inherited by the progeny arising from cell division.

The modifications are brought about by DNA methylation.

CpG islands or CG islands are genomic regions that contain a high frequency of CpG sites, but to date objective definitions for CpG islands are limited.

In mammalian genomes, CpG islands are typically 300-3,000 base pairs in length. They are in and near approximately 40% of promoters of mammalian genes.
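One widely used working definition, which does not come from these slides, is the set of Gardiner-Garden and Frommer criteria: length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. The sketch below applies those thresholds as adjustable assumptions; the function names and test sequences are illustrative.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length
    if c_count == 0 or g_count == 0:
        return gc_fraction, 0.0
    # Expected CpG count if C and G were placed independently: C * G / length.
    observed_over_expected = cpg_count * length / (c_count * g_count)
    return gc_fraction, observed_over_expected


def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed thresholds; they are parameters, not fixed truths."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_length and gc_fraction > min_gc and ratio > min_ratio


print(looks_like_cpg_island("CG" * 150))  # True: CpG-rich, 300 bp
print(looks_like_cpg_island("AT" * 150))  # False: no C or G at all
```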


DNA Modification cont

Fig. Methylcytosine forms the same base pair with guanine as cytosine, because the methyl group does not block the formation of the inter-base hydrogen bonds.

Fig. When cytosine is deaminated, it becomes uracil. Repair enzymes recognize this as an abnormal DNA base and replace the uracil with a cytosine. However, when 5-methylcytosine is deaminated, it becomes thymine, which replaces the cytosine. Because thymine is a normal DNA base, the proofreading enzyme may keep the change and edit the paired G into A.

Fig. Deamination of 5-methylcytosine to thymine has led to the replacement of CpG sequences with TpG (CpA on the complementary strand) over time.
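The different fates of cytosine and 5-methylcytosine can be traced in a toy model. The letter "M" is used here as an ad hoc symbol for 5-methylcytosine (it is not standard notation in this context), and the repair step is simplified to "uracil is always recognized and restored, thymine never is".

```python
# Deamination products; "M" is an ad hoc symbol for 5-methylcytosine.
DEAMINATION_PRODUCT = {"C": "U", "M": "T"}


def deaminate(strand, position):
    """Deaminate the base at position if it is C or 5-methylcytosine."""
    base = strand[position]
    product = DEAMINATION_PRODUCT.get(base, base)
    return strand[:position] + product + strand[position + 1:]


def repair(strand):
    """Simplified repair: uracil is abnormal in DNA and is restored to C."""
    return strand.replace("U", "C")


unmethylated = "ACGT"
methylated = "AMGT"  # same sequence with a methylated CpG

print(repair(deaminate(unmethylated, 1)))  # ACGT: the damage is reversed
print(repair(deaminate(methylated, 1)))    # ATGT: on this strand CpG has become TpG
```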


DNA Modification Cont

Figure 2. Methylation of CpG islands silences gene expression.

Figure 3. Methylation of CpG islands, together with histone deacetylation and other modifications, silences genes through the mechanism of chromatin remodeling and heterochromatin formation.

Figure 4. Methylation of CpG islands leads to long term gene silencing.

The region is also called the DNA methylation Domain (DMA) or Imprinting Control Region (ICR).

Types of methylation

Fig. Maintenance methylation and de novo methylation

There are two types of methylation:

1. Maintenance methylation: In order for genes to remain permanently silenced by this mechanism, the DNA methylation patterns must be stably transmitted to daughter cells. This is accomplished through the activity of DNA maintenance methylase, which detects CpG methylation in one strand of inherited DNA and methylates the other, daughter strand (see figure).

2. De novo methylation: It adds methyl groups at totally new positions and so changes the pattern of methylation in a localized region of the genome.

Reading assignment: Methylation is involved in genome imprinting and X chromosome inactivation. How? What is its significance in dosage balance (in humans)?
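Maintenance methylation can be sketched as copying the methylation mark across each CpG site of a hemimethylated duplex. The representation below is an assumption made for illustration: methylated cytosines are stored as sets of positions in top-strand coordinates, so the bottom-strand cytosine of a CpG at position i sits opposite the top-strand G at i + 1.

```python
def cpg_sites(top_strand):
    """Positions of the C in every CpG on the top strand."""
    return [i for i in range(len(top_strand) - 1) if top_strand[i:i + 2] == "CG"]


def maintenance_methylation(top_strand, top_methylated, bottom_methylated):
    """Methylate the unmethylated partner at every hemimethylated CpG.

    Positions are top-strand coordinates: for a CpG at i, the bottom-strand
    cytosine pairs with the top-strand G at i + 1.
    """
    top = set(top_methylated)
    bottom = set(bottom_methylated)
    for i in cpg_sites(top_strand):
        if i in top or (i + 1) in bottom:
            top.add(i)
            bottom.add(i + 1)
    return top, bottom


# After replication: the old top strand is methylated at the first CpG only,
# and the newly synthesized bottom strand carries no methyl groups.
print(maintenance_methylation("ACGTTCGA", {1}, set()))  # ({1}, {2})
```

The second CpG (position 5) stays unmethylated on both strands: changing it would require de novo methylation, which this sketch deliberately leaves out.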


Histones

In biology, histones are highly alkaline proteins found in eukaryotic cell nuclei that package and order the DNA into structural units called nucleosomes. They are the chief protein components of chromatin, acting as spools around which DNA winds, and play a role in gene regulation. Without histones, the unwound DNA in chromosomes would be very long (a length to width ratio of more than 10 million to one in human DNA).

For example, each human cell has about 1.8 meters of DNA, but wound on the histones it has about 90 micrometers (0.09 mm) of chromatin, which, when duplicated and condensed during mitosis, results in about 120 micrometers of chromosomes.
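The numbers quoted above translate into compaction ratios with a few divisions. The DNA double-helix width of about 2 nm is an assumption added here (it is not given on the slide) and is used only to check the "more than 10 million to one" statement.

```python
DNA_LENGTH_M = 1.8            # DNA per human cell, from the text
CHROMATIN_LENGTH_M = 90e-6    # length when wound on histones, from the text
MITOTIC_LENGTH_M = 120e-6     # duplicated, condensed chromosomes, from the text
DNA_WIDTH_M = 2e-9            # assumed width of the double helix (about 2 nm)

# How many times shorter the DNA becomes when packaged as chromatin.
chromatin_compaction = DNA_LENGTH_M / CHROMATIN_LENGTH_M

# During mitosis the DNA has been duplicated, so twice the length is packed.
mitotic_compaction = 2 * DNA_LENGTH_M / MITOTIC_LENGTH_M

# Length-to-width ratio of the naked DNA.
length_to_width = DNA_LENGTH_M / DNA_WIDTH_M

print(round(chromatin_compaction))  # about 20000-fold
print(round(mitotic_compaction))    # about 30000-fold
print(f"{length_to_width:.1e}")     # far above 10 million to one
```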


Class of Histone

Five major families of histones exist: H1/H5, H2A, H2B, H3, and H4. Histones H2A, H2B, H3 and H4 are known as the core histones, while histones H1 and H5 are known as the linker histones.

Two of each of the core histones assemble to form one octameric nucleosome core particle, and 147 base pairs of DNA wrap around this core particle 1.65 times in a left-handed super-helical turn.

The linker histone H1 binds the nucleosome at the entry and exit sites of the DNA, thus locking the DNA into place and allowing the formation of higher-order structure.

The most basic such formation is the 10 nm fiber or "beads on a string" conformation. This involves the wrapping of DNA around nucleosomes, with approximately 50 base pairs of DNA separating each pair of nucleosomes (also referred to as linker DNA).

The assembled histones and DNA are called chromatin.

Higher-order structures include the 30 nm fiber (forming an irregular zigzag) and the 100 nm fiber; these are the structures found in normal cells. During mitosis and meiosis, the condensed chromosomes are assembled through interactions between nucleosomes and other regulatory proteins.

Chemical Modifications of Histone

Fig. Histone modifications regulate chromatin structure and function

Acetylation (Ac): attachment of acetyl groups to lysine amino acids in the N-terminal regions of each of the core molecules. The enzyme that mediates the acetylation is histone acetyltransferase (HAT).

Acetylation reduces the affinity of the histone for DNA and possibly reduces the interaction between individual nucleosomes, destabilizing the 30 nm chromatin fiber.

Histones in heterochromatin are unacetylated, whereas those in functional domains are acetylated. This indicates that the mechanism is important for DNA packaging and for the regulation of gene expression.

Gene activation is often reversible: deacetylation is carried out by histone deacetylase (HDAC).

Histones Cont

Lysine acetylation is not the only type of histone modification, but it is the best studied form.

Methylation of lysine and arginine residues in the N-terminal regions of H3 and H4. It is a reversible event.

Phosphorylation of serine residues in the N-terminal regions of H2A, H2B, H3 and H4.

Ubiquitination of lysine residues at the C termini of H2A and H2B. This modification involves addition of the small, common (ubiquitous) protein called ubiquitin, or of a related protein, rather unhelpfully called SUMO.

Histone Cont

Chromatin structure: role of acetylation

Some coactivators have HAT (histone acetyltransferase) activity

Links histone acetylation, chromatin structure and gene activation

HAT activity of the co-activator acetylates core histones bound to promoter DNA, causing release of nucleosome core particles or loosening of the histone-DNA interaction

Subsequent binding of transcription factors and RNA polymerase

Once transcription is initiated, RNA polymerase is able to transcribe DNA packaged into nucleosomes

Acetylation is dynamic: enzymes also remove acetyl groups (histone deacetylases (HDACs))

Histone Cont

Chromatin structure: role of deacetylation

Removal of acetyl groups

Histone deacetylases (HDACs)

HDACs are associated with transcriptional repression

HDACs are subunits of larger complexes (corepressors)

HDACs are guided to regions of DNA by methylation patterns

Example: Inactive X chromosome of the female

Largely deacetylated histones

Active X chromosome has a normal level of histone acetylation

Hemizygous:

Fig. Human chromosome karyotype (X and Y chromosomes labeled)

Histone Cont

Fig. Chromatin structure: acetylation / deacetylation

END OF THE LESSON