Grading Study literature - jfmed.uniba.sk · 9/13/2018 1 Human genome –the basics Human genome...



9/13/2018


Human genome – the basics

Human genome project, organization, variations, gene regulation

Organization

The lectures will take place in Martin, Education center 1,2,3,4,5,6, 2 x 45 min plus 15 min

The seminars and practicals will take place in Biomed, Division Oncology/Dept. Mol. Biology, 4th floor, Practical Room No. 5.51

Test 1: lectures 1-3, seminars week 5 and 6 (practical No. 2)

Test 2: lectures 4-6, seminars week 7 and 8 (practical No. 3)

Grading

• Each student

Mandatory:

- 4 practicals (100% attendance)

- 1 presentation, 10 p.

- 2 tests, max. 40 p. each

Optional:

- List of 4 methods of molecular biology with applications in medicine, 4 p.

Study literature

Mandatory: Lectures in Molecular Biology on the website of the Institute of Molecular Biology: https://www.jfmed.uniba.sk/en/pracoviska/scientific-and-teaching-workplaces/pre-clinical-departments/institute-of-molecular-biology/graduate-study/

Optional: Tom Strachan, Read: Genetics and Genomics in Medicine, Garland Science 2015, selected parts

Molecular biology methods in medicine – beginnings in the mid-20th century

• Understanding of the processes of replication, transcription and translation

• Central dogma of molecular biology

• Basic techniques of molecular biology – manual DNA/RNA extraction, Southern/Northern blot, endpoint PCR, restriction analysis, radioactive Sanger DNA sequencing

• Efforts to apply molecular biology methods in DNA diagnostics
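The flow of information named by the central dogma (DNA → RNA → protein) can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical and not part of the lecture; the codon table is the standard genetic code, and translation simply starts at the first AUG and stops at the first stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order ("*" marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA corresponding to a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-nt coding sequence: Met-Ala-Lys-stop
mrna = transcribe("ATGGCCAAGTAA")
print(mrna, translate(mrna))
```

Real transcripts also undergo splicing and other processing, which this sketch deliberately ignores.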

Molecular biology methods in medicine – advances at the start of the 21st century

• Subfield – human molecular genetics and genomics – the study of human gene structure and function

Consequences:

- completion of the Human Genome Project

- development of advanced techniques of molecular biology – automation of processes and high-throughput analyses – automated DNA/RNA extraction, real-time PCR, fluorescent Sanger sequencing, pyrosequencing

- detection of disease-causing genes and other disease-causing genetic and epigenetic changes

- sequencing of whole genomes

- interface with genomics, bioinformatics and computational biology

- diagnostic and therapeutic consequences for human medicine


Major milestones in mapping and sequencing of the human genome - discoveries and methods

1953: The double-helix structure of DNA is discovered by Watson and Crick

1956: The first physical map of the human genome is determined, based on distinguishing chromosomes according to size and shape, using certain stains to produce subchromosomal banding patterns – light microscopy of stained tissue reveals that our cells contain 46 chromosomes, with a total of 24 different types of chromosome

Physical map – a map which provides information on the linear structure of a DNA molecule, showing the location of some physical entities on a chromosome

Metaphase chromosomes

Chromosome banding

• G banding – the chromosomes are subjected to controlled digestion with trypsin before staining with Giemsa, a DNA-binding chemical dye. Positively staining dark bands are known as G bands. Pale bands are G negative.

Locus – unique position or location of a gene or genetic marker on a chromosome

Genetic marker – a characteristic located at the same place on a pair of homologous chromosomes that allows us to distinguish one homolog from the other – a genetic polymorphism

The common allele is usually referred to as wild type

• Wild-type homozygote – when alleles at a given locus are identical, the individual is homozygous

• If the alleles are different on the maternal and the paternal copy of the gene, the individual is heterozygous at this locus

• Homozygous mutated alleles – inheriting identical copies of a mutant allele occurs in many autosomal recessive disorders, particularly in circumstances of consanguinity

• If two different mutant alleles are inherited at a given locus, the individual is said to be a compound heterozygote

• Hemizygous is used to describe males with a mutation in an X chromosomal gene or a female with a loss of one X chromosomal locus.
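The zygosity terms above can be summarized as a small decision procedure. The following Python sketch is only an illustration: the function name, the string labels and the convention of passing `None` for an absent gene copy are assumptions made here, not notation from the lecture.

```python
from typing import Optional


def classify_genotype(maternal: Optional[str],
                      paternal: Optional[str],
                      wild_type: str) -> str:
    """Classify a locus from its two alleles; None means that copy is absent."""
    alleles = [a for a in (maternal, paternal) if a is not None]
    if not alleles:
        raise ValueError("at least one allele must be present")
    if len(alleles) == 1:
        # Only one copy of the locus, e.g. an X-linked gene in a male
        return "hemizygous"
    first, second = alleles
    if first == second:
        return "homozygous wild type" if first == wild_type else "homozygous mutant"
    if wild_type in (first, second):
        return "heterozygous"
    # Two different alleles, neither of them wild type
    return "compound heterozygote"


# Hypothetical single-base alleles, with "A" taken as the wild-type allele
for pair in [("A", "A"), ("A", "T"), ("T", "T"), ("T", "G"), ("T", None)]:
    print(pair, "->", classify_genotype(pair[0], pair[1], wild_type="A"))
```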

Goals of Human Genome Project

Generate working draft of 90% of the human genome (2001)

Obtain complete high quality genomic sequence (2003)

Make all data publicly available and develop bioinformatics software and computational biology tools

Develop novel sequencing technologies

Map sequence variations

Interpret function of genes/genome

Develop comparative genomic strategies.

Ethical, legal and social implications (ELSI)

Bioinformatics and Computational Biology

The human genome www.genome.gov/Education

Collective name for the different DNA molecules found in the cells of Homo sapiens

Comprises 25 different DNA molecules:

1. Nuclear genome: 24 different linear nuclear DNA molecules – 22 autosomes and 2 sex chromosomes, X and Y; approx. 21,000 protein-coding genes and more than 6000 RNA genes

2. Mitochondrial genome: a single type of circular mitochondrial DNA, 37 genes

Protein-coding genes 1.1%, other conserved sequences 4%

Number of protein-coding genes: approx. 21,000

Number of RNA genes: thousands of RNA genes

Molecular definition of a gene

A sequence of chromosomal DNA that is required for production of a functional product, be it a polypeptide or a functional RNA molecule, including regulatory sequences


Nuclear genome

• Size: 3200 Mb

• Number of different DNA molecules: 23 (in XX cells) or 24 (in XY cells); all linear

• Total number of molecules per cell: 46 in diploid cells

• Number of protein-coding genes: approx. 21,000

• Number of RNA genes: uncertain, more than 15,000

• Protein-coding DNA: approx. 1.1%

• Noncoding DNA: 98.9%

- 4% conserved other than coding sequences

- 6.5% constitutive heterochromatin (a chromosomal region that remains highly condensed throughout the cell cycle and shows little or no evidence of active gene expression)

- 45% transposon-based repeats

- 44% other non-conserved (incl. repetitive sequences)

Mitochondrial genome

• Size: 16.6 kb

• Number of different DNA molecules: one circular DNA molecule

• Total number of molecules per cell: often several thousand

• Number of protein-coding genes: 13

• Number of RNA genes: 24

• Protein-coding DNA: 66%

• RNA-coding DNA: 32%

Polypeptide-coding genes

• Single copy genes

• Gene families

- duplication of single copy genes

- degree of sequence similarity and structural similarity

If two different genes make very similar protein products, they most likely originated from an evolutionarily very recent gene duplication and tend to be clustered together

- clustered or dispersed throughout the genome

Major classes of human noncoding (nc)RNA

Genes where the functional product is a non-coding RNA molecule (ncRNA)

• Ribosomal RNA

• Transfer RNA

• Small nuclear RNA

• Small nucleolar RNA

• Small Cajal body RNA

• Ribonuclease RNA

• Micro RNA

• Piwi-binding RNA

• Endogenous short interfering RNA

• Long noncoding regulatory RNA

microRNA – ca. 2000 different types; regulatory RNA about 22 nt in size; antisense regulation of other genes

• Gene families with genes with high degree of sequence homology over most of the length of the gene or coding sequence

histone genes – 86 different histone sequences distributed over 10 different chromosomes; two large clusters

α-globin and β-globin genes


• Gene families defined by a common protein domain – the members may have low sequence homology, but they possess certain sequences that specify one or more specific protein domains

3. Gene superfamilies – the members are much more distantly related in evolutionary terms; they encode products that are functionally related in a general sense and show only weak sequence homology over a large segment, without very significant conserved amino acid motifs, but with

- common structural features

- generally related function

Immunoglobulin superfamily – a very large family encompassing immunoglobulin (Ig) genes, TCR genes and HLA genes; the products are considerably divergent at the DNA level but function in the immune system and contain an Ig-like domain.

Tandemly repeated noncoding human DNA

1. Satellite DNA – often occurs in arrays (blocks) in the 100 kb to several Mb size range

- size of repeat unit is 5-171 bp

- especially at centromeres; not transcribed

Alphoid DNA – bulk of the centromeric heterochromatin; repeat unit 171 bp; important for centromere function

2. Minisatellite DNA – arrays in the 0.1 kb to 20 kb range

- repeat unit 9-64 bp

Telomeric family – 3-20 kb of tandem hexanucleotide repeat units, especially TTAGGG, which are added by the specialized enzyme telomerase; acts as a buffer to protect the ends of the chromosomes

Hypervariable family – highly polymorphic (various individual loci), organized in over 1000 arrays (0.1 to 20 kb long)

Tandemly repeated noncoding human DNA

Microsatellite DNA (simple sequence repeats, SSR, or short tandem repeats, STR)

- small arrays of tandem repeats of a simple sequence (usually less than 10 bp), interspersed throughout the genome, accounting for over 60 Mb (2% of the genome)

- arise by replication slippage

CA/TG repeats are very common, accounting for about 0.5% of the genome, and are often highly polymorphic

tri-, tetra- and pentanucleotide repeats
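To make the idea of a short tandem repeat concrete, here is a minimal Python sketch that scans a sequence for microsatellite-like arrays with a regular expression. The function name, the limits `max_unit=6` and `min_repeats=4`, and the test sequence are assumptions chosen for illustration; real STR callers work on sequencing reads and tolerate imperfect repeats.

```python
import re


def find_microsatellites(seq: str, max_unit: int = 6, min_repeats: int = 4):
    """Return (start, repeat_unit, copy_number) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so an array such as CACACACACA is reported with unit "CA", not "CACA".
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical sequence containing a (CA)5 and a (GATA)4 array
seq = "GGT" + "CA" * 5 + "TT" + "GATA" * 4 + "CC"
print(find_microsatellites(seq))
```

Counting the copies of the repeat unit at a given locus in different individuals is what makes such loci useful as polymorphic markers.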

Make all data publicly available and develop bioinformatics software and computational biology tools

• NCBI

http://www.ncbi.nlm.nih.gov

• Ensembl

http://www.ensembl.org

• UCSC Genome Bioinformatics

http://genome.ucsc.edu

Nomenclature: symbols of human genes are written in ITALICS and CAPITALS, e.g. KRAS


The human genome is variable in ca. 0.1% – genetic variation describes differences between the DNA sequences of individual genomes

Goal of HGP – map sequence variation

The origin of genetic variation is mutation

Mutation is the process that produces altered DNA

A mutation (DNA variant) is also the outcome – any change in the primary nucleotide sequence of DNA, regardless of its functional consequence

Mutations result in alternative forms of DNA at a specific locus that are generally known as DNA variants

For any locus, if more than one DNA variant is common in the population (above a frequency of 0.01), the DNA variation is described as a polymorphism

DNA variants that have frequencies of less than 0.01 are often described as rare variants
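The 0.01 frequency threshold can be applied directly to allele counts. The sketch below is a hypothetical helper, not a standard tool: the function name and the example counts are invented, and treating a frequency of exactly 0.01 as a polymorphism is an assumption, since the definitions above only cover values above and below the threshold.

```python
def classify_variant(allele_count: int, total_alleles: int,
                     threshold: float = 0.01):
    """Return (frequency, label) for a variant observed in a population sample.

    total_alleles is the number of chromosomes sampled, i.e. twice the
    number of individuals for an autosomal locus.
    """
    if total_alleles <= 0 or not 0 <= allele_count <= total_alleles:
        raise ValueError("counts must satisfy 0 <= allele_count <= total_alleles")
    frequency = allele_count / total_alleles
    # Assumption: a frequency exactly equal to the threshold counts as polymorphism
    label = "polymorphism" if frequency >= threshold else "rare variant"
    return frequency, label


# Hypothetical sample of 1000 individuals (2000 alleles)
print(classify_variant(30, 2000))  # frequency 0.015
print(classify_variant(4, 2000))   # frequency 0.002
```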

At any genetic locus the maternal and paternal alleles normally have identical or slightly different sequences

The common allele is usually referred to as wild type

• Wild-type homozygote – when alleles at a given locus are identical (the DNA variants are identical), the individual is homozygous for the common variant

• Homozygous for a rare variant

• If the alleles are different (DNA variants are different) on the maternal and the paternal copy of the gene, the individual is heterozygous at this locus


Variants can occur in the germline (sperm or oocytes); these can be transmitted from parents to progeny

De novo variants occur in sperm or oocytes, but are not present in parents

Alternatively, variants can occur during embryogenesis or in somatic tissues.

• Variants that occur during development lead to mosaicism, a situation in which tissues are composed of cells with different genetic constitution

• Other somatic variants are associated with neoplasia because they confer a growth advantage to cells


The scale of human genetic variations

• Numerical variants or aneuploidy

- an entire chromosome is missing (monosomy)

- an extra copy is present (trisomy)

• Structural DNA variants – ≥ 50–100 bp in size, including insertions, deletions, duplications and inversions of chromosomal regions

- Large copy number variants (CNV) involving hundreds of kb to Mb (≥ 500 kb) of DNA that are either missing or duplicated in tandem (sometimes multiple times); they can be very deleterious, involving many genes, are typically de novo and are rare

- Small CNV – < 100 kb

However, the cut-off between indels and copy number variants is imprecise

• Microsatellites and other polymorphisms due to variable number of tandem repeats

• Small-scale insertions/deletions (indels)

• Single nucleotide variants (SNVs) and single nucleotide polymorphisms (SNPs), when the variant exceeds the frequency of 0.01 in the population

The most common type of genetic variation in the human genome is due to single nucleotide substitution

(Figure: single nucleotide substitution, genotype A vs. genotype T)

Consequences of human genome variations

Depend on the impact on protein production

• Deviations from normal gene expression

- decreased expression – one allele is inactivated by loss of a gene copy – haploinsufficiency

- increased gene expression – one allele is duplicated by gain of a gene copy – triplosensitivity

THIS IS SUFFICIENT TO CAUSE DISEASE IN SOME CASES – NUMERICAL, STRUCTURAL AND LARGE CNV HAVE THE GREATEST POTENTIAL TO DO DAMAGE because they affect a larger number of genes

• Extra copies of genes - overexpression

• Too few copies – underexpression

• Translocation and inversions – fusion genes

Consequences of human genome variations

Depend on the impact on protein production

Small CNVs and SNVs have variable effects ranging from completely innocuous to highly deleterious

- Innocuous – mostly in the non-coding part of the genome or outside of protein-coding sequences

- The most prevalent SNV is the single nucleotide substitution (more than 4 million in one individual)

- It can affect gene product function when located in exons, splice sites or regulatory regions.

Effect of SNVs on protein structure: i) no effect on the amino acid composition (synonymous), ii) a change of one amino acid (nonsynonymous), iii) a change in the length of the protein due to a premature stop signal being introduced
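The three outcomes i) to iii) can be checked mechanically for any single-base change within a codon. The following Python sketch uses the standard genetic code; the function name, the 0-based `position` argument and the example codons are assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code for DNA codons in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(codon: str, position: int, new_base: str) -> str:
    """Classify a substitution at a 0-based position (0, 1 or 2) within a codon."""
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "premature stop"
    return "nonsynonymous"


print(classify_snv("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_snv("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val)
print(classify_snv("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop)
```

The sketch looks only at the coding sequence; variants in splice sites or regulatory regions need a different kind of analysis.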

Consequences of human genome variations

Reference genome/gene sequence – there is no "control" or normal human gene or genome; to provide some sort of standard, a reference genome and reference gene sequences have been assembled

• Reference genome

- assembled as a mosaic of DNA from over a dozen anonymous volunteers; should contain variants that are not associated with diseases

- can contain important variants of health significance that are not necessarily normal

- is constantly updated

• Reference gene sequence – the DNA sequence of a gene with which the patient's sequence is compared; it defines the gene coordinates and variants supposed not to be associated with diseases


Controls on gene expression operate at several levels.

• Transcription of genes is controlled by transcription factors (TFs) binding to specific DNA sequences within the regulatory regions of genes.

• Chromatin conformation: DNA methylation and histone code

Restriction of gene expression

Temporal – cell cycle stage, developmental stage, cell differentiation, induction

Spatial – tissues, cells

Temporal restriction of gene expression

• Cell cycle stage – some genes are only expressed at specific times in the cell cycle (e.g. histones only in S phase)

• Developmental stage – at the very earliest stage of development transcription does not occur; instead cells rely on previously synthesized mRNA; later in development some genes may be expressed transiently at specific stages; some genes are expressed at different developmental stages, as in the case of beta-globin.

• Differentiation stage – as cells differentiate, their genomes are modified, resulting in an altered expression pattern; in some differentiated cells transcription does not occur.

• Inducible expression – some genes are activated in response to environmental cues or extracellular signaling. The expression is easily reversed if the inducing factor is removed.

Spatial restriction of gene expression in mammalian cells

• Tissue-specific gene expression – as in the case of the beta-globin gene, which is expressed in erythroid cells

• Expression in individual cells – some specialized genes produce different products in individual cells belonging to the same cell type; different B lymphocytes in a person express different (cell-specific) antibody molecules

Regulation of gene expression

• Transcriptional

• Post-transcriptional

• Translational

• Protein degradation

Regulation of gene expression

• Transcriptional

- genetic (direct interaction of a control factor with the gene) – cis-acting, trans-acting

- modulation (interaction of a control factor with the transcriptional machinery)

- epigenetic (non-sequence changes in DNA structure)

• Post-transcriptional

• Translational

• Protein degradation


PROMOTERS – combinations of short sequence elements (usually located in the immediate upstream region of the gene, often within 200 bp of the transcription start site) which serve to initiate transcription.

Position of cis-acting elements within promoter sequences

• TATA box, usually found at a position about 25 bp upstream (-25) from the transcriptional start; it is typically found in genes which are actively transcribed by RNA pol II

• GC box found in a variety of housekeeping genes, it appears to function in either orientation

• CAAT box often located at position -80; it is usually the strongest determinant of promoter efficiency
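The positions quoted above (about -25 for the TATA box, about -80 for the CAAT box) can be illustrated by scanning an upstream sequence for simplified motif patterns. In this Python sketch the patterns `TATA[AT]A`, `CCAAT` and `GGGCGG` are simplified consensus sequences assumed for illustration, and the function name and the synthetic 100-bp sequence are hypothetical; real promoter prediction uses position weight matrices and both strands.

```python
import re

# Simplified consensus patterns (assumed for illustration)
MOTIFS = {
    "TATA box": "TATA[AT]A",
    "CAAT box": "CCAAT",
    "GC box": "GGGCGG",
}


def scan_promoter(upstream: str):
    """Return (motif_name, position) pairs sorted by position.

    upstream is the sequence immediately 5' of the transcription start site;
    its last base is position -1, so a motif starting at index i is reported
    at position i - len(upstream).
    """
    upstream = upstream.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(pattern, upstream):
            hits.append((name, match.start() - len(upstream)))
    return sorted(hits, key=lambda hit: hit[1])


# Synthetic 100-bp upstream region with motifs placed at -80 and -25
upstream = "G" * 20 + "CCAAT" + "G" * 50 + "TATAAA" + "G" * 19
print(scan_promoter(upstream))
```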

Cis-acting gene sequences – specific recognition elements recognized by tissue-specific TFs

• ENHANCERS – positive transcriptional control elements which are particularly prevalent in mammals; they serve to increase the basal level of transcription which is initiated through the core promoter elements

• Their function is independent of both their orientation and (to some extent) their distance from the gene

• SILENCERS – serve to reduce transcription levels

• RESPONSE ELEMENTS – modulate transcription in response to specific external stimuli; they are usually located upstream of the promoter element (often within 1 kb of the transcription start site)

• A variety of such elements respond to specific hormones (e.g. retinoic acid or steroid hormones such as glucocorticoids)

THE CONCEPT IS that understanding gene expression changes in physiological and pathological situations enables us to understand and/or monitor these processes – diagnostic and therapeutic consequences

Pathological gene expression – non-physiological gene expression

Genetic changes in the regulatory mechanism of the control elements of gene expression – examples

• 1. mutations within the promoter region

• 2. mutation within enhancers, silencers and response elements

• 3. gene is under control of inappropriate enhancer, silencer or response elements e.g. gene translocation

• 4. mutations in conserved splicing sequences

DETECTION of PATHOLOGICAL GENE EXPRESSION – diagnostic, prognostic and predictive consequences


• Nucleosome – structural unit of chromatin; it consists of a central core of eight histone proteins (two each of H2A, H2B, H3 and H4) around which a stretch of 146 bp of dsDNA is coiled; adjacent nucleosomes are connected by a short length of spacer DNA

• The strings of beads, approx. 10 nm in diameter, are in turn coiled into a chromatin fiber; the interphase chromosomes seem to consist of these chromatin fibers

The histone code

• The histone code concept implies that particular combinations of histone modifications define the conformation of chromatin and hence the activity of the DNA contained therein.

• A good example of the importance of histone modifications for gene expression is provided by the methylation of H3K4 – dimethylated and trimethylated H3K4 appear in discrete peaks in the genome that overlap precisely with promoter regions – a landmark for recruitment of RNA pol II and protection against DNA methylation by methyltransferases

• Epigenetic mechanisms of gene control describe heritable states which do not depend on DNA sequence

• (Genetic mechanisms explain heritable states (characters) which result from changes in DNA sequences (mutations))

• DNA methylation → gene repression

• (Host defense against transposons or foreign DNA)

CpG islands –

GC-rich (more than 50%), unmethylated or hypomethylated DNA sequences, hundreds of nucleotides long, with a significant frequency of CpG dinucleotides

They are targets for DNA methylation, which can cause local condensation of chromatin and inhibit gene expression

DNA methylation at CpG islands is accomplished by DNA methyltransferases

• Genomic regions with high frequency of CpG dinucleotides; CpG islands are typically 300 – 3 000 bp in length

The usual formal definition of a CpG island is a region with at least 500 bp and with a GC percentage that is greater than 55%
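The two criteria in this definition (length of at least 500 bp, GC percentage above 55%) are easy to test on a candidate sequence. The Python sketch below is a hypothetical helper written for illustration; it also reports the observed/expected CpG ratio, a commonly used additional measure that is not part of the definition given above, and the test sequences are artificial.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def looks_like_cpg_island(seq: str, min_len: int = 500,
                          min_gc: float = 0.55) -> bool:
    """Apply the two criteria from the text: length >= 500 bp and GC > 55%."""
    return len(seq) >= min_len and gc_fraction(seq) > min_gc


# Artificial sequences: a 600-bp CG-only stretch and a 600-bp sequence with 50% GC
candidate = "CG" * 300
background = "ACGT" * 150
print(looks_like_cpg_island(candidate), cpg_obs_exp(candidate))
print(looks_like_cpg_island(background), gc_fraction(background))
```

In practice such a test is applied in a sliding window along a chromosome rather than to one isolated fragment.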

Methyl-CpG binding proteins with methyl-CpG-binding domain (MBD)

• MECP2 on the X chromosome – loss-of-function mutations in MECP2 are responsible for dominantly inherited Rett syndrome

Normal delivery; heterozygous girls develop normally for their first year but then regress

Other main criteria include loss of purposeful hand skills, loss of spoken language, gait abnormalities, and stereotypic hand movements.

80-90% – dominant de novo germline loss-of-function mutations (from the father's germline) in MECP2 at Xq28

MECP2 is a transcription factor – methyl-CpG binding protein