A rapid algorithm for generating minimal pathway
distances: Pathway distance correlates with genome distance but
not enzyme function
Stuart Rison1*, Evangelos Simeonidis2, Janet Thornton1,3, David Bogle2, Lazaros Papageorgiou2#
1 Department of Biochemistry and Molecular Biology and 2 Department of Chemical Engineering, University College London, London, WC1E 6BT, UK
3 Department of Crystallography, Birkbeck College, Malet Street, London, WC1E 7HX, UK
* Corresponding author (biology): [email protected]
# Corresponding author (algorithm): [email protected]
Outline
• What is pathway distance?
• Why calculate pathway distance?
• Original method
• Novel method - mathematical programming
• Application:
– Genomic distance
– Enzyme function
The shortest pathway distance between GltA and Mdh is 8 steps (considering directionality) or 2 steps (without directionality)
Each metabolic transition represents a pathway distance unit (step)
Pathway distance considers distance between metabolic enzymes
Should take into account:
• directionality
• circularity
The pathway distance between GapA and GltA is 7 steps
[Figure: glycolysis + TCA pathway from EcoCyc (http://ecocyc.pangeasystems.com/), annotated to mark one reversible step and one irreversible step]
Pathway Distance
• Reverses the “usual” pathway representation (substrates as nodes, enzymes as edges)
• Pathway distance is inclusive; the source enzyme has a distance of 1 step
Why calculate pathway distance?
• Metabolic pathways are complex networks of interacting enzymes, substrates and co-factors
• Relatively well characterised for certain organisms (e.g. E. coli)
• Much work done on modelling metabolism but now also much interest in pathways as an indicator of “connectivity” between genes
• Pathway distance (Dp) is an extension of this connectivity
Original Method
• Represent pathways as directed acyclic graphs
• Use arbitrary direction for pathways
• “Snip” open any cycle
• Perform depth-first traversal (DFT) of resulting graphs
• Collect set of genes at distances 2,3,…,n along resulting traversals
[Figure: glycolysis + TCA pathway from EcoCyc (http://ecocyc.pangeasystems.com/)]
Original Method
Original EcoCyc pathways include:
• Directionality
• Cycles
Dictate directionality:
• Arbitrarily set direction (top to bottom, clockwise)
[Figure: pathway diagram with the genes mdh and gltA labelled]
“Snip” cycles
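The original method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `dft_distances`, the toy graph, and the rule used to snip cycles (skip any edge that returns to a node already on the current path) are all assumptions made for illustration. Distances are inclusive, so the source sits at distance 1.

```python
from collections import defaultdict


def dft_distances(graph, source):
    """Depth-first traversal from `source`, recording nodes seen at each depth.

    An edge that would return to a node already on the current path is
    skipped, which stands in for "snipping" the cycle open (an assumption;
    the slides do not say where cycles were cut).
    Distances are inclusive: the source itself is at distance 1.
    """
    at_distance = defaultdict(set)

    def visit(node, depth, on_path):
        at_distance[depth].add(node)
        for nxt in graph.get(node, []):
            if nxt not in on_path:  # snip: do not re-enter the cycle
                visit(nxt, depth + 1, on_path | {nxt})

    visit(source, 1, {source})
    return dict(at_distance)


# Hypothetical toy pathway: A -> B, B -> C, B -> D, C <-> D
toy = {"A": ["B"], "B": ["C", "D"], "C": ["D"], "D": ["C"]}
result = dft_distances(toy, "A")
```

A node can turn up at several depths along different traversals (here C and D each appear at depths 3 and 4), so the minimal distance has to be picked out afterwards as the smallest depth at which the node was seen.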
Pathway Distance Algorithm
• For each metabolic pathway
– For each enzyme in the pathway
• Find the minimal distances from the source enzyme to all other enzymes by solving linear programming problems of the type:
Maximise Summation_of_Enzyme_Distances
subject to Enzyme_Connectivity_Constraints
• Post processing “calculations” are integrated in the algorithm (e.g. genome distance or enzyme function conservation)
Algorithm - objective function and constraints
For each source node $i^*$:
Maximise $\sum_i D_i$
subject to:
$D_j \le D_i + 1, \quad \forall (i,j): L_{ij} = 1$
$0 \le D_i \le T, \quad \forall i$
$D_{i^*} = 1$
SETS
– $i, j$: nodes
PARAMETERS
– $L_{ij}$: 1 if there is a link from i to j, 0 otherwise
– $T$: large number
CONTINUOUS VARIABLES
– $D_i$: distance of node i from the source node
[Figure: a link from node i to node j]
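Algorithm - Inequalities
Source node $i^* = A$:
Maximise $D_A + D_B + D_C + D_D$
subject to:
$D_A = 1$
$D_A \le D_B + 1$
$D_B \le D_A + 1$
$D_C \le D_B + 1$
$D_C \le D_D + 1$
$D_D \le D_C + 1$
$D_D \le D_B + 1$
[Figure: four-node network A, B, C, D, shown with the resulting distances A = 1, B = 2, C = 3, D = 3]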
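The four-node example can be checked without an LP solver. Every link constraint only bounds a distance from above, so the componentwise largest feasible point also maximises the sum of distances. The sketch below finds that point by repeatedly tightening the inequalities from the upper bound T. It is an illustrative stand-in for the authors' GAMS model, not a reproduction of it; the function name `minimal_pathway_distances` and the value chosen for T are assumptions.

```python
def minimal_pathway_distances(links, source, big_t=1000):
    """Largest D with D[source] = 1 and D[j] <= D[i] + 1 for every link (i, j).

    `links` is a list of directed (i, j) pairs; `big_t` plays the role of the
    large number T. Nodes unreachable from the source stay at `big_t`.
    """
    nodes = {source} | {n for link in links for n in link}
    dist = {n: big_t for n in nodes}  # start every distance at the bound T
    dist[source] = 1                  # inclusive distance: the source is 1
    changed = True
    while changed:
        changed = False
        for i, j in links:
            # The source is fixed at 1; every other node is lowered only as
            # far as the constraint D[j] <= D[i] + 1 requires.
            if j != source and dist[j] > dist[i] + 1:
                dist[j] = dist[i] + 1
                changed = True
    return dist


# Links read off the inequalities of the four-node example:
# D_A <= D_B + 1 corresponds to a link B -> A, and so on.
links = [("B", "A"), ("A", "B"), ("B", "C"), ("D", "C"), ("C", "D"), ("B", "D")]
distances = minimal_pathway_distances(links, "A")
```

Calling the function once per enzyme in a pathway gives the per-source loop described for the algorithm.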
Key Features of Algorithm
• Hierarchical solution procedure
• Based on linear programming techniques
• Using an enzyme-node network representation
Advantages of Algorithm
• Efficiency in tackling
– pathway circularity
– reaction directionality
• Modest computational times
• Implementation within GAMS software system
Metabolic pathways
• We encoded 68 E. coli small molecule metabolism (SMM) pathways derived from EcoCyc
• This represents a set of 594 enzymes
• Pathway distances ranged from 2 to 15
Pathway Distance and Genome Distance
• Calculate minimal pathway distances for all gene pairs in each pathway
• For the same pairs, calculate the base pair separation of the genes encoding the enzymes in the E. coli genome (Dg)
• Plot percentage of gene pairs within a certain genome distance against pathway distance
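The three steps above can be sketched as follows. The gene names, coordinates and pathway distances are hypothetical, and genome distance is taken here as the absolute difference between gene start coordinates, which is an assumption: the slides do not say how base pair separation was measured or whether the circular chromosome was accounted for. The thresholds are those shown in the plot legend.

```python
from collections import defaultdict

THRESHOLDS_BP = (100, 1_000, 10_000, 100_000)


def percent_within(pairs, thresholds=THRESHOLDS_BP):
    """Percentage of gene pairs under each genome distance threshold.

    `pairs` is an iterable of (pathway_distance, genome_distance_bp).
    Returns {pathway_distance: {threshold: percentage with Dg < threshold}}.
    """
    by_dp = defaultdict(list)
    for dp, dg in pairs:
        by_dp[dp].append(dg)
    return {
        dp: {t: 100.0 * sum(dg < t for dg in dgs) / len(dgs) for t in thresholds}
        for dp, dgs in sorted(by_dp.items())
    }


# Hypothetical gene start coordinates (bp) and pathway distances (Dp)
positions = {"geneA": 1_000, "geneB": 1_050, "geneC": 9_000, "geneD": 500_000}
pathway_pairs = {("geneA", "geneB"): 2, ("geneB", "geneC"): 2, ("geneA", "geneD"): 3}

# Genome distance (Dg) as absolute coordinate difference (an assumption)
pairs = [(dp, abs(positions[a] - positions[b])) for (a, b), dp in pathway_pairs.items()]
table = percent_within(pairs)
```

Each row of `table` corresponds to one pathway distance on the x-axis, and each threshold to one plotted series.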
Shorter genomic distances are more likely at smaller pathway distances
[Figure: cumulative percentages of gene pairs (y-axis, 0% to 20%) against pathway distance (x-axis, 2 to 15), with one series per genome distance threshold: <100 bp, <1000 bp, <10000 bp, <100000 bp]
Genome Distance - Conclusions
• Strong correlation between Dp and Dg
• Genes with small Dp tend to have shorter Dg
• Genes involved in nearby metabolic reactions are genomically clustered
Pathway Distance and Function
• Calculate minimal pathway distances for all gene pairs in each pathway
• Compare the EC numbers assigned to the genes in each pair
EC number example: 1.2.1.12 (e.g. G-3-P dehydrogenase)
– 1: oxidoreductase
– 2: acts on aldehyde or oxo group
– 1: NAD/NADP as acceptor
– 12: enzyme specific
Comparing EC numbers:
– 1.2.1.12 and 1.2.1.20: level 3 conservation (L3 cons)
– 1.2.1.12 and 2.2.1.20: no conservation (No cons)
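The level-by-level comparison of EC numbers can be sketched in a few lines of Python. The function name `conserved_ec_levels` is hypothetical; the two comparisons are the ones from the slide.

```python
def conserved_ec_levels(ec_a, ec_b):
    """Count how many leading EC levels two EC numbers share (0 to 4)."""
    levels = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b:
            break  # conservation stops at the first level that differs
        levels += 1
    return levels


level3 = conserved_ec_levels("1.2.1.12", "1.2.1.20")  # first three levels match
none = conserved_ec_levels("1.2.1.12", "2.2.1.20")    # first level differs
```

Levels are compared from the left because a match at level 2 or 3 is only meaningful when every preceding level also matches.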
Pathway distance and EC number conservation
[Figure: percentage of pairs at pathway distance (y-axis, 0 to 100) against pathway distance (x-axis, 0 to 16), with one series per EC conservation class: None, Level 1, Levels 1+2, Levels 1+2+3, All levels]
Function - Conclusions
• No observable correlation between pathway distance and function (as represented by EC number)
• Enzymatic chemistries are varied along the conversion from one substrate to the next and aren’t performed in ‘blocks’ of similar catalysis
Conclusions - Algorithm
• We have an effective, correct and rapid algorithm to calculate metabolic distance
• The Dp metric is a useful measure of the functional relationship between proteins
Conclusions - Biology
• As expected, pathway distance correlates with genome distance
• Pathway distance does not correlate with function as determined by EC number
Acknowledgements
• Sarah Teichmann, University College London
• Peter Karp, SRI International, Menlo Park, CA
• Monica Riley, Alida Pellegrini-Toole, Marine Biological Laboratory, Woods Hole, MA
All distances
[Figure: cumulative percentages of gene pairs (y-axis, 0% to 100%) against pathway distance (x-axis, 2 to 15), with one series per genome distance threshold: 100, 1000, 10000, 100000, 1000000, 10000000]
Pathway distance and EC number conservation
[Figure: cumulative percentage of pairs (y-axis, 0% to 100%) against pathway distance (x-axis, 2 to 15), with one series per EC conservation class: None, Level 1, Levels 1+2, Levels 1+2+3, All levels]
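• Source node $i^* = A$:
$D_A = 1$
$D_A \le D_B + 1$
$D_B \le D_A + 1$
$D_C \le D_B + 1$
$D_C \le D_D + 1$
$D_D \le D_C + 1$
$D_D \le D_B + 1$
$D_E \le D_D + 1$
$D_E \le D_F + 1$
$D_F \le D_C + 1$
$D_F \le D_E + 1$
[Figure: six-node network A, B, C, D, E, F, shown with the resulting distances A = 1, B = 2, C = 3, D = 3, E = 4, F = 4]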