Genes & genomes - UniFI - DiSIA - Sito...

Genes & genomes

Duccio Cavalieri

Gene structure

Chromatin structure

Chromosome structure

Genome projects

Genome structure

From genetics to genomics

Mendel 1866 – Inheritance of traits
Sutton 1903 – Genes are on chromosomes
Griffith 1928 – The transforming factor
Avery, MacLeod, McCarty 1944 – DNA is the genetic material
Watson, Crick et al. 1953 – DNA is a double helix
Sanger et al. 1951 – Protein sequencing
Crick 1957 – Discovery of tRNA
Various 1960-65 – Codon mapping
Leder et al. 1977 – Introns
Sanger et al.; Maxam, Gilbert et al. 1977 – DNA sequencing
Mullis 1983 – PCR
Altschul et al. 1990 – BLAST
1990s – Various “gene knockout” and RNAi methodologies
Pat Brown, Ron Davis 1995 – Microarrays
Consortium 1996 – Sequencing of the S. cerevisiae genome
Consortium 1998 – First animal genome: C. elegans
Consortium 2000 – First plant genome: Arabidopsis thaliana
Consortia/Celera 2001 – Draft sequence of the human genome

What is a gene?

If you ask...

A geneticist
A genetic engineer
A biochemist
A molecular biologist
A sociobiologist
A philosopher
A poet

The self-replicating unit that is transmitted according to Mendel's laws

The information a cell needs to produce a protein

The fraction of a chromosome that can be transcribed

The idea that helps us understand the mystery of life and its development

A structure that has existed long enough, and is sufficiently complex, to serve as a basis for evolution

A molecule capable of replicating and recombining that can be transferred into a recipient cell

She will answer...

A gene is a gene, is a gene, is a gene...

Evolution of the gene concept

1866: Mendelian trait
1902: Gene (Sutton & Boveri)
1940: One gene, one enzyme; one gene, one protein (Beadle & Tatum)
1953: DNA structure (Watson & Crick)
1977: Introns and exons (Leder et al.)
1990-2000…: Genome projects

Genes come in pairs, like chromosomes, and the two members of each pair segregate equally into the gametes

Message

A gene is a DNA region that can be transcribed into a functional RNA at a defined and precise moment, based on the specific needs of the cell, to be later translated into a protein

What is Protein Function? The Post-Genomic View

• The biochemical reaction in which it participates?

• The biological process in which it is involved?

• The genes and proteins it interacts with?

• The genes and proteins with which it is co-regulated?

Genome definition

Every cell of an organism contains one or more sets of chromosomes: its genome.

The genome consists of one or more long DNA molecules that are organized into chromosomes.

The prokaryotic cell

The genome consists of a circular chromosome in single copy (haploid).

The eukaryotic cell

It possesses a nucleus, a nuclear genome and organelles.

The nuclear genome consists of several linear chromosomes:

• In single copy, haploid (S. cerevisiae and germ cells)
• In double copy, diploid (animals)
• In multiple copies, polyploid (plants)

Epulopiscium fishelsoni, a bacterium measuring 100 µm x 0.5 mm; Thiomargarita.

Eukaryotic cell diameter: from 2 to 200 µm

S. cerevisiae: 4 µm (haploid), 6 µm (diploid)

Bacteria are smaller than eucarya

Thiomargarita namibiensis

Spirochete

Diplococcus

Sarcina

Staphylococcus

Bacillus

Eucaryotes have mitochondria

DNA: on average 4 x 10^6 bp, ranging from 0.65 to 10 megabases

DNA supercoiled in a circular molecule

3-4 plasmids

Transcription of DNA into RNA and translation of RNA into protein are simultaneous.

Prokaryotic genome

1. It has no nuclear membrane.

2. It has a single circular genome, a DNA double helix.

3. E. coli = 4 x 10^6 bp, 99% of which is coding sequence; a compact genome in which only 1% consists of repeated non-coding regions.

4. Genes are organized in operons: functional units in which related genes (e.g. the enzymes of a metabolic pathway) lie next to each other on the chromosome, are transcribed into a single polycistronic mRNA, and are regulated as a whole.

Prokaryotic gene structure

Regulatory region, responsible for the onset of transcription (promoter, operator)

Coding region

Terminator

What is a promoter?

• It is a DNA region, usually a palindromic sequence, recognized specifically by a given transcription factor.

• In many cases more than one transcription factor is needed to activate transcription; promoters therefore have a complex structure.

TRP operon of E. coli

Proteins E + D = first enzyme of the trp pathway

Protein C = intermediate step

Proteins A + B = tryptophan synthase

Negative and positive transcriptional control of the lac operon by the repressor protein and cAMP-CAP, respectively

Microbial genome projects

E.g. Streptococcus pneumoniae (Gram-positive): pneumonia, bacteremia, meningitis and otitis

• 2,160,837 bp
• 2236 coding regions
• 1440 (64%) with known function
• 5% insertion sequences (IS, former transposons), which might cause rearrangements

The Institute for Genomic Research (TIGR)

Comprehensive Microbial Resource (CMR)

55 microbial genomes completely sequenced in 2004, from 45 species

Genome analysis and the tree of life

Archaea: extremophiles

Bacteria: bacteria

Woese, 1998

The eukaryotic genome

Eukaryotic sequences

1. Single-copy DNA: genes coding for proteins
2. DNA in multiple copies: LINEs, SINEs, junk DNA
3. Spacer DNA

The linear chromosomes have ends, the telomeres, and centromeres, regions in the middle.

Centromeres are rich in repeated sequences and are important for the attachment of chromosomes to the spindle and for chromosome segregation.

A human chromosome

Feulgen staining

Euchromatin: less intensely stained, less packed, contains active genes.

Heterochromatin: more densely stained; its borders often contain genes that can be turned on or off; it is present near centromeres.

Cytological maps: chromosome banding

3H-uridine labelling of the sites of RNA synthesis (silver stain, black). White regions are heterochromatin near the nuclear membrane.

Heterochromatin is transcriptionally inactive

Active RNA synthesis

Euchromatin and heterochromatin

Euchromatin: transcriptionally active
Heterochromatin: inactive

Are the boundaries between eu- and heterochromatin variable?

Are they tissue specific?

Can genes at the borders be switched off, leading to loss of function?

Rearrangements (translocation or inversion) can place a gene near heterochromatin or non-transcribed regions and inactivate the gene

Giemsa band staining

G-light bands: GC rich; contain housekeeping genes, active in every cell type.

G-dark bands: AT rich, late replicating; contain tissue-specific genes.

Chromosomes are organized around histones

Chromatin packing in chromosomes

Eliminating histones

Metaphase: tight packing

Interphase: loose packing

Genes of higher eukaryotes are discontinuous

They contain non-translated regions, introns, that interrupt the coding regions, exons, expanding gene size by as much as 20 times.

Regulatory region, ATG

Coding Region

Terminator

Introns

Exons

Introns are transcribed together with exons but are removed during mRNA maturation, before export from the nucleus, through a mechanism called RNA splicing.

Some genes, such as those for interferons and histones, are an exception, as they do not have introns.
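The removal of introns described above can be sketched in a few lines of Python. This is only an illustration: the function name `splice`, the toy sequence and the 0-based, half-open exon coordinates are assumptions made for the example, not part of the slides, and the real splicing machinery recognizes sequence signals rather than receiving coordinates.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a primary transcript, dropping the introns.

    Exon coordinates are 0-based, half-open (start, end) pairs
    (an assumption of this sketch).
    """
    ordered = sorted(exons)
    # Exons must not overlap one another.
    for (_, prev_end), (start, _) in zip(ordered, ordered[1:]):
        if start < prev_end:
            raise ValueError("exons overlap")
    # Exons must lie inside the transcript.
    if ordered and (ordered[0][0] < 0 or ordered[-1][1] > len(pre_mrna)):
        raise ValueError("exon outside transcript")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Toy transcript with 3 exons (upper case) interrupted by 2 introns (lower case).
pre_mrna = "AAAgtcccagTTTgtggagCCC"
exons = [(0, 3), (10, 13), (19, 22)]
mature = splice(pre_mrna, exons)
```

Leaving one exon out of the `exons` list would give a different mature transcript from the same gene, which is the idea behind alternative splicing.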

RNA processing

Structure of a eukaryotic gene (introns / exons)

3 exons interrupted by 2 introns

The “untranslated regions” (UTRs) are transcribed regions that will not be translated.

At the 5’ end a 7-methylguanylate cap is added (m7Gppp; green). At the 3’ end poly(A) residues are added.

β-globin gene

RNA specific for a protein of 147 aa

Splicing: introns removal

The Walter Gilbert hypothesis on the role of introns

• Exons encode different protein domains; a domain is a functional region of a protein.
• Exon shuffling can cause rapid evolution of a protein by juxtaposing different domains in different splicing variants.
• Great variability can be generated by mixing a relatively small number of sequences.
• Alternative splicing processes are responsible for the tissue-specific variability of different proteins.

Evolution of the gene for triosephosphate isomerase

Ancient origin of introns with equal position

Introns

Exons

Most S. cerevisiae nuclear genes do not have introns

S. cerevisiae mitochondrial genes do have introns

Have S. cerevisiae genes lost their introns during evolution? S. cerevisiae is a unicellular organism and does not need the tissue specificity generated through splicing; splicing can be cumbersome for a very efficient model of development.

Or were introns generated for the first time in mitochondrial DNA, with such a system then positively selected during evolution?

Saccharomyces cerevisiae: the link between prokaryotes and eukaryotes, or a very specialized eukaryotic organism?

In S. cerevisiae the 5 genes that encode the TRP biosynthetic enzymes are located on 4 different chromosomes; regulation occurs through transcription factors with a much more finely tuned interplay

Promoters and transcription factors: the p53 case

Stress signals

Figure labels (upstream kinases and modifiers): ATM, ATR, CHK1, CHK2, DNA-PK, JNK, CKI, PKC, CDK2, SUMO, p38, CAK, HIPK2, CK2/hSPT16/SSRP1, PCAF, p300/CBP, MDM2, HDAC, hSIR2, SIN3

Post-translational modifications

Stabilization and activation: p53 tetramer (N and C termini marked)

The p53 tumor suppressor protein is an inducible sequence-specific transcription factor...

…that binds to a family of different response elements...

RRRCWWGYYY(N)0-13RRRCWWGYYY
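The consensus above uses IUPAC nucleotide codes (R = A or G, W = A or T, Y = C or T, N = any base): two copies of the half-site RRRCWWGYYY separated by a spacer of 0 to 13 bases. A minimal sketch of how such a response element could be searched for with a regular expression follows. The function name `find_p53_sites` and the test sequences are hypothetical, and the search looks at one strand only and allows no mismatches, which real response elements often contain.

```python
import re

# Half-site RRRCWWGYYY written out with explicit base classes.
HALF_SITE = "[AG]{3}C[AT]{2}G[CT]{3}"
# Two half-sites separated by a spacer of 0 to 13 bases.
P53_RE = re.compile(f"{HALF_SITE}[ACGT]{{0,13}}{HALF_SITE}")


def find_p53_sites(seq):
    """Return (start, matched_text) for each non-overlapping consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group()) for m in P53_RE.finditer(seq)]


# Hypothetical sequences: two half-sites with no spacer, then with a 3-base spacer.
no_spacer = find_p53_sites("TTGAACATGTCCAGACATGCCTAA")
with_spacer = find_p53_sites("GAACATGTCCTTTAGACATGCCT")
```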

…and can modulate a wide array of target genes...

Co-activators/adaptors/co-repressors: ADA3, ASPP, p53-BP1, p53-BP2, p33ING1, WRN, BRCA1, TFIID, TFIIH, SIR2, CBP/p300, MDM2, MDMX

…that can regulate:

Cell cycle arrest: p21, cyclin G, 14-3-3σ, CDC25-C, PC3, PA26
DNA repair: Gadd45, PCNA, p53-R2, p48, BTG2, XPC?
Cell death: BAX, PIG3, IGF-BP3, NOXA, PUMA, AIP1, Scotin, PIDD, PERP, Apaf-1, FAS, Killer/DR5, TRAIL

p53 stability: MDM2

p53: a key player in cell cycle control

Stress signals (e.g. DNA damage, nucleotide depletion, hypoxia, activated oncogenes, viral infection)

p14ARF

Upstream activators: ATM, ATR, CHK1, CHK2, DNA-PK, JNK, CKI, PKC, CDC2, SUMO-1

Post-translational modifications: phosphorylation, acetylation, sumoylation

p53 co-activators: CBP/p300, ref1, p33ING1, WRN, BRCA1, ADA3

Transcriptional activation: bax, p21, PC3, PA26, Killer/DR5, Trail, NOX-A, IGF-BP3, Gadd45, PIG3, AIP-1, Scotin, PIDD, p48-XP, PCNA, p53-R2, BTG2, 14-3-3σ, cyclin G, CDC25-C, MDM2

Apoptosis, G1/G2 arrest, DNA repair

Saccharomyces cerevisiae as a model organism

The yeast Saccharomyces cerevisiae is one of the most common models of the eukaryotic cell

Rapid cell proliferation
Easily cultivated in Petri dishes
Possibility to isolate mutants
Well-defined genetic system
Highly versatile in gene manipulation techniques
Non-pathogenic
Available in large amounts

Asci with ascospores (S. cerevisiae)

Saccharomyces cerevisiae genome (1996)

• Genome: 13.4 Mb
• 16 chromosomes
• tRNA: 275
• rRNA: 140 repeats
• Almost 1 Mb of repetitive sequences (junk DNA?)
• Proteins: 5570-6300
• 70% of the genome is coding
• One gene every 2 kb
• 4% of the genes have introns
• 60% of the proteins have a known function
• 20% of the proteins have a function assigned in silico
• Duplicated gene families are in subtelomeric regions

Yeast: ideal model to integrate biological function

Yeast: a great deal of biological function

The human genome project

1994
Genetic map (Gyapay et al., 1994): 23 linkage groups (one per chromosome) with 1,200 markers, spaced 1 cM apart

1995
Physical map (Hudson et al., 1995): 52,000 STSs (sequence-tagged sites) at intervals of 60 kb
Database of 30,000 ESTs (Adams et al., 1995)

1998
Collection of 3000 SNPs (Wang et al., 1998)

2000
First sequence of chromosome 21 (Hattori et al., 2000)

Human Genome Project

(National Institutes of Health & Department of Energy)

Goals:

• Genetic maps.

• Sequence the 3 billion letters of human DNA with an accuracy greater than 99.99% by 2005.

• Identify every human gene (ORFs and ESTs, functional and comparative data).

• Compile a polymorphism database (SNPs).

February 2001

•Venter et al., The sequence of the human genome. Science 2001

•International Human Genome Sequencing Consortium (IHGSC), Initial sequencing and analysis of the human genome. Nature 2001

27,000 – 34,000 genes

DNA source

3/4, 2/3
Only males in the first draft
No ethnic ID

2 males, 3 females
Ethnic base equally distributed

DNA is in large excess with respect to what is needed to make proteins

Only 3% of the sequence is coding

97% is non-coding

1 error every 10,000 bases

Frequency of SNPs: 10 every 10,000 bases

1 error every 10 SNPs
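The last figure follows from the two rates before it: with 1 sequencing error and 10 SNPs per 10,000 bases, there is 1 error for every 10 true SNPs. A small sketch of that arithmetic is given below; the function name `expected_counts` and the 1 Mb region are assumptions chosen for illustration.

```python
def expected_counts(n_bases, errors_per_10kb=1, snps_per_10kb=10):
    """Expected sequencing errors and SNPs in a region, using the slide's rates."""
    errors = n_bases * errors_per_10kb / 10_000
    snps = n_bases * snps_per_10kb / 10_000
    # Errors per true SNP depends only on the two rates, not on region size.
    errors_per_snp = errors_per_10kb / snps_per_10kb
    return errors, snps, errors_per_snp


# Hypothetical 1 Mb region.
errors, snps, errors_per_snp = expected_counts(1_000_000)
```

In practice this means that roughly one in eleven apparent differences in a draft of this quality would be expected to be a sequencing error rather than a real polymorphism.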

Extended centromeric heterochromatin (20% of the genome) was not expected to be sequenced.

Venter et al., The sequence of the human genome. Science 2001

Size of the genome (including gaps): 2.91 Gbp
Size of the genome (excluding gaps): 2.66 Gbp
Percent of A+T in the genome: 54
Percent of G+C in the genome: 38
Percent of undetermined bases in the genome: 9
Most GC-rich 50 kb: Chr. 2 (66%)
Least GC-rich 50 kb: Chr. X (25%)
Percent of genome classified as repeats: 35
Number of annotated genes: 26,383
Percent of annotated genes with unknown function: 42
Number of genes (hypothetical and annotated): 39,114
Percent of hypothetical and annotated genes with unknown function: 59
Gene with the most exons: Titin (234 exons)
Average gene size: 27 kbp
Most gene-rich chromosome: Chr. 19 (23 genes/Mb)
Least gene-rich chromosomes: Chr. 13 (5 genes/Mb), Chr. Y (5 genes/Mb)
Total size of gene deserts (>500 kb with no annotated genes): 605 Mbp
Chromosome with highest proportion of DNA in annotated exons: Chr. 19 (9.33)
Chromosome with lowest proportion of DNA in annotated exons: Chr. Y (0.36)
Longest intergenic region (between annotated + hypothetical genes): Chr. 13 (3,038,416 bp)

Distribution of the functions of 26,383 human genes

Venter et al., Science 2001

Internet resources

1988: National Center for Biotechnology Information (NCBI)

http://www.ncbi.nlm.nih.gov

Human genome resources:

Online Mendelian Inheritance in Man (OMIM) (http://www.ncbi.nlm.nih.gov)

Cancer Genome Anatomy Project (CGAP)

http://cgap.nci.nih.gov/

Alleles, mutations and population genetics

Genes in populations

New alleles appear in populations through mutations in the cells of the germ line (mutations in the somatic line are lost with the death of the individual).

Every allele has a given allele frequency in the population, depending on its fitness.

Mutations alter a status quo acquired through selection; therefore most mutations are deleterious.

Allele frequencies change with time through interaction with the environment, as a result of natural selection and genetic drift.

Natural selection changes the frequency of an allele according to its fitness, based on environmental or sociological considerations. As a result, favourable mutations are preserved (they increase in frequency), while deleterious ones are lost or kept unexpressed (recessive alleles present in heterozygosity): their frequency tends to decrease, but they can be preserved in the population, as they might become useful later on with a change in the environment.

An allele is fixed when it reaches a frequency of 100%.
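The combined action of selection and drift on an allele frequency can be illustrated with a small simulation. The sketch below uses a haploid Wright-Fisher model, which is a standard textbook simplification and not something the slides specify; the function name `wright_fisher`, the selection coefficient `s`, the population size and the seed are all assumptions for illustration.

```python
import random


def wright_fisher(p0, pop_size, s=0.0, generations=100, seed=0):
    """Trajectory of an allele frequency under selection and drift.

    p0: starting frequency; s: selective advantage of the allele
    (relative fitness 1 + s); pop_size: number of gene copies sampled
    each generation.
    """
    rng = random.Random(seed)
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Selection shifts the expected frequency.
        p_sel = p * (1 + s) / (1 + p * s)
        # Drift: binomial sampling of pop_size gene copies.
        copies = sum(rng.random() < p_sel for _ in range(pop_size))
        p = copies / pop_size
        freqs.append(p)
    return freqs


# Hypothetical run: a mildly favourable allele starting at 50%.
trajectory = wright_fisher(0.5, pop_size=100, s=0.05, generations=20, seed=1)
```

Once the frequency reaches 1.0 (fixation) or 0.0 (loss) it stays there, since this model contains no new mutation.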

A single genotype may produce many different phenotypes, depending on the environment

Genotype → Phenotype 1, Phenotype 2, Phenotype 3

A single phenotype may be produced by many different genotypes, depending on the environment

Genotype 1, Genotype 2, Genotype 3 → Phenotype

SNPs

• Single nucleotide polymorphism: a site in the genome where a single nucleotide can be present in two or more forms in a collection of individuals of the same species

• SNPs are usually substitutions, but deletions or insertions of a single nucleotide are also often observed

• Frequency in the human genome: 1 every 1 kb.

Types of SNPs

• Transitions (most frequent)
Purine → purine (A → G; G → A)
Pyrimidine → pyrimidine (C → T; T → C)

• Transversions
Purine → pyrimidine (G → C; G → T; A → C; A → T)
Pyrimidine → purine (C → G; C → A; T → A; T → G)

• Insertions and/or deletions
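The distinction between transitions and transversions comes down to whether the substitution stays within the same chemical class of base. A minimal sketch follows; the function name `classify_substitution` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases among A, C, G, T")
    # Same class (purine to purine, or pyrimidine to pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


examples = {pair: classify_substitution(*pair) for pair in ["AG", "CT", "GC", "TA"]}
```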

Examples of mutations

(A) A replication error can lead to a mismatch in one of the daughter double helices, generating one mutated molecule and one with the correct sequence.

(B) Effect of a mutagen altering an A in the lower strand of the parental molecule. In this case too, a mismatch occurs.

Mechanisms that ensure the accuracy of DNA replication

(A) DNA polymerase actively selects the correct nucleotide to insert at each position.

(B) Errors that do occur can be corrected by a 'proofreading' activity, if the polymerase has a 3’-5’ exonuclease activity.

If the last inserted nucleotide is paired with the complementary base of the template, the polymerase activity prevails.

If instead it is not paired, the exonuclease activity is favoured.

Effect of mutations on coding regions

Non-coding SNPs

They are located in the 5’ or 3’ non-transcribed regions (NTR), in the 5’ or 3’ untranslated regions (UTR), in introns or in intergenic regions.

Coding SNPs

Replacement polymorphism: changes the amino acid

Synonymous polymorphism: changes the codon but not the amino acid

Non-replacement polymorphism

These are the synonymous polymorphisms and the non-coding SNPs.

They have an indirect effect on gene function, altering the regulation of transcription, translation, splicing and RNA stability.
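Whether a coding SNP is a replacement or a synonymous polymorphism can be decided by translating the codon before and after the change. The sketch below assumes the standard genetic code and DNA codons; the function name `classify_coding_snp` and the example codons are chosen for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt):
    """Classify a substitution at `position` (0, 1 or 2) of a DNA codon."""
    codon, alt = codon.upper(), alt.upper()
    if codon[position] == alt:
        raise ValueError("alternative base equals the reference base")
    mutated = codon[:position] + alt + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "replacement"


# GAA and GAG both encode glutamate; GAG to GTG changes glutamate to valine.
third_base_change = classify_coding_snp("GAA", 2, "G")
second_base_change = classify_coding_snp("GAG", 1, "T")
```

A change that creates a stop codon is reported here as a replacement as well, since the encoded residue changes; a fuller classification would treat it separately.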

http://snp.cshl.org

Polymorphism

This term refers to a locus represented by more than one allele or haplotype in the population

The importance of models

Just as in biology nothing makes sense unless interpreted through evolution, in genomics nothing makes sense unless analyzed in a comparative fashion.

Model systems

Comparative genomics

Synteny: conservation of gene order on the chromosomes of evolutionarily related organisms.

Homologous genes: derived from a common ancestral locus

• Orthologous genes: genes present in the genomes of different organisms that derive from a common ancestral locus.

• Paralogous genes: similar genes present in the same genome that derive from a process of gene or genome duplication.

Synteny: human and mouse

A) Synteny blocks: mouse chromosome 11 and parts of 5 human chromosomes.

B) Zoom in on 5q31 (1 Mb) with perfect synteny: 23 genes (4 interleukins)

C) Alignment of a 50 kbp region.

Mouse Genome Sequencing Consortium, Nature 2002

Only 1% is rodent specific, while 14% is common to all mammals.

Mouse Genome Sequencing Consortium, Nature 2002

Taxonomy of mouse proteins

Functional categories

IHGSC, Nature 2001

April 2004

Rat Genome

Synteny blocks

Mammalian evolution

Sequenced genomes

genomes being sequenced

genomes to be sequenced

Nature Reviews Genetics 3; 33-42 (2002); RAT GENETICS: ATTACHING PHYSIOLOGY AND PHARMACOLOGY TO THE GENOME

Data integration for understanding human pathologies

Other invertebrates as models for biomedical research

December 1998: Caenorhabditis elegans
March 2000: Drosophila melanogaster

Functional annotation of gene families

Gene Ontology
www.geneontology.org

Complete genome sequencing (timeline, 1995 to 2001)

Bacteria: 1.6 Mb, 1600 genes
Eukaryote: 13 Mb, ~6000 genes
Animal: 100 Mb, ~20,000 genes
Human: 3 Gb, ~30,000 genes?

http://www.ncbi.nlm.nih.gov/

Bioinformatics is born

Growth in the number of residues in GenBank compared to the demand for people with competence in bioinformatics (as estimated from the number of positions advertised in Nature in March and September of each year). Axes: residues and positions versus year.

Bioinformatics and genomics

128 Pentium processors in parallel, 250 terabytes of memory

Bioinformatics core, Rosetta Resolver.

Bioinformatics allows making sense of the sequence …

DNA genomic sequence
Gene finding
Regulatory site analysis
Variation: SNPs
Exon/intron identification

The molecular “parts-list”: the transcriptome

Type (~10,000 types/cell)
Splice variant (~90,000)
Quantity (copy number)
Expression profiles

Transcriptome snapshots: expression profiling

cDNA bonded on a glass surface
Camera (microarray)
Snapshot (expression profile)
Scanned, hybridized array
Label RNA from cell and hybridize to array

Reference, Treatment
Prepare RNA
Fluorescently labeled cDNA
Mix and hybridize

The microarray procedure. The experimental objective in this example is to compare the transcriptional profile of cells in one growth phase (Treatment) to that of mixed-phase cells (Reference) (Figure courtesy of D. Botstein, Stanford University).

Experiments (columns) and genes (rows)

Cy3     Cy5     Cy5/Cy3   log2(Cy5/Cy3)
200     10000   50.00     5.64
4800    4800    1.00      0.00
9000    300     0.03      -4.91

Extracting data. Slides are scanned at the appropriate excitation/emission spectra and intensities recorded in dye-specific channels. Log2-ratio intensities reveal fold-differences between Reference (green) and Treatment (red). These are color-coded and presented in a GENES * EXPERIMENT matrix (Figure courtesy of D. Botstein, Stanford University).
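The log2 ratios in the example can be reproduced directly from the two channel intensities. The sketch below uses the three Cy3/Cy5 pairs from the slide; the helper name `log2_ratio` is hypothetical, and real pipelines would also subtract background and normalize the channels first.

```python
import math

# (Cy3, Cy5) intensities for the three example genes on the slide.
intensities = [(200, 10000), (4800, 4800), (9000, 300)]


def log2_ratio(cy3, cy5):
    """Log2 of the Treatment (Cy5) over Reference (Cy3) intensity ratio."""
    return math.log2(cy5 / cy3)


table = [
    (cy3, cy5, round(cy5 / cy3, 2), round(log2_ratio(cy3, cy5), 2))
    for cy3, cy5 in intensities
]
```

Taking the logarithm makes induction and repression symmetric around zero: a 2-fold increase gives +1 and a 2-fold decrease gives -1.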

Figure labels: New, Scan, ScanAlyze, GenePix, Database, Data Selection, Complete Data Table (cdt), Hierarchical Clustering, K-Means, SVD, Download, Self-Organizing Maps

Data flow in microarray studies. Following laser scanning, data are entered into a Complete Data Table (cdt). Multiple software packages are available as freeware for post-scan analyses (Figure courtesy of D. Botstein, Stanford University).

The molecular “parts-list”: the proteome

Translation
Degradation
Localization
Modification (binding, cleavage, covalent modification)

0.5-1 x 10^6 variants

Multiple faces of the proteome: expression

Peptide sequence identity; protein expression (modification)

Protein identity (by peptide sequence)
Relative protein expression (by peak ratio)

Multiple faces of the proteome: protein sequence and structure

Bioinformatics and prediction of protein structure

Analyzing existing structures
Identification and classification of folds
Structure alignment and scoring
Association with sequence and function

… Let there be structure

Structure prediction (ab initio, threading, fold recognition, homology modeling)

Domain B-2, aspartate transcarbamoylase
Propeptide of subtilisin
Threading model based on domain B-2 core

Multiple Faces of the Proteome: Protein-protein Interaction

High throughput 2-hybrid analysis of protein-protein interaction in yeast

Extracting what we already know

A protein-protein interaction pathway map automatically constructed from a user query of "effective human cyclin inhibitor".

Multiple faces of the proteome: what we already know …

Over 11 x 10^6 records in Medline

Magical induction won’t turn the data flood into knowledge

The more you know, the harder it is to take decisive action?

The more you know, the greater the need for tools that make it possible to handle complexity

Genomes of parasites

Plasmodium falciparum

• 23 Mb
• 14 chromosomes
• 5,300 genes
• The most (A + T)-rich genome sequenced at the time
• 90% introns and coding regions
• Genes involved in antigenic variation are in subtelomeric regions.
• Most of its genes are transporters or genes involved in evading the host immune system

Drug design