Applications of information theory in chemical graph theory t

14
Indi an Journa l of Chemistry Vo l. 42A, June 2003, pp. 1227- 12 40 Applications of information theory in chemical graph theory t Elena V Konstantinova *, Vladimir A Skorobogatov, Maxim V Vidyuk Sobo lev Institute of Mathema ti cs. Russian Academy of Sciences. Novosibirsk. 630090. Russia. e konsta@ math.nsc. ru . skrbg@ ma it a r. org. il Budker In stitute of Nuclear Physics . Russian Academy of Sciences. Novosibirsk. 630090. Russia. v id [email protected] Received 30 December 2002 In formation th eo ry has been used in various branches of science. During rece nt years it is applied extensive ly in chem ica l grap h th eo ry for describ in g chemi ca l structur es and for providing good co rrelations between physico-chemical and structural properties. Th e appl ica ti on of inf or ma ti on th eo ry to the probl em of characteriz in g mol ecu lar structures is prese nted in the paper. The in fo rmation in di ces based on the distance in a graph are co nside red with respec t to their correlating ability and discriminating power. We briefly revi ew here selected to pi cs in che mi ca l graph theory, in particular, th ose being conce rned with the use of information th eory to characterizing molec ul ar stru ct ure. Chemical grap h th eory is interested in th e nature of chem ical structure l . A ll structural formulae of chemical compounds are molecular graphs where vertices represent atoms and edges represent che mi ca l bond s. Figure 1 gives the schematic represe nta ti on of the derivation of a molec ul ar graph from an alkane molecule. The molecular graph is the hydroge n- s uppr essed one. Th at is the commonly used re prese ntation in chemi ca l graph theory because hydrogen atoms are sma ll and so add very little to the overa ll size of the mol ec ul e. Using the molec ul ar graph one ca n obtain, for example, the car bo n-numb er index (the numb er of carbon atoms in the hydrocarbon mol ec ule) that is kn ow n sin ce 1844 as one of the first topol og ica l indices used in chemistry to characterize molec ul ar structures. Basically topological index expresses in numeri ca l form the topology of the chemical species it presents. Topol og ica l indices are designed by transforming a molecular graph into a number and possess the remarkable ability of being able to correlate and predict a very wide spectrum of properties for a vast range of mol ec ular species. The carbon-number index is well kn ow n to provide an effective measure of the molecular volume: for the members of homologous series, the molecular volume is known to be directly proportional to the carbon- number index 2 . tThis work was supported by the Russian fund of Fundamental inves ti gations through Grant 03-01-00796. The co nstruction and investiga ti on of topol og ical indices that cou ld uniquely characte ri ze molec ular topo logy is one of the main directions of chemic al graph theor/. The isomer discrimination, structure- property relationships and stru ct ure-actIVIty correlations, and the design of compounds of desired properties are the most imp or tant trends in chemical graph theory where topological indices are commonly A Ik anc molecu le Chemical formul a H H H I I I H- C- C- C- H I I I H H H Siru ctural formul a Mo lec ular hydro gen·suppre ssed grap h II = 3 Topological index Fig. I- Schematic represe nta ti on of the derivation of the mol ec ular graph and the ca rbo n-number index from an alkane mol ec ul e

Transcript of Applications of information theory in chemical graph theory t

Indi an Journal o f Chemistry Vol. 42A, June 2003, pp. 1227- 1240

Applications of information theory in chemical graph theory t

Elena V Konstantinova*, Vladimir A Skorobogatov, Maxim V Vidyuk

Sobolev Institu te o f Mathematics. Ru ss ian Academy of Sc iences. Novosibirsk. 630090. Ru ss ia. e konsta@ math .nsc .ru . skrbg@ ma itar.org. il

Budker Institute of Nuclear Phys ics. Russ ian Academy o f Sciences. Novosibirsk. 630090. Russ ia. vid yuk @inp .nsk.su

Received 30 December 2002

In formati on theory has been used in various branches of sc ience . During recent years it is appli ed ex tens ively in chemical graph theory for describ ing chemical structures and for prov iding good corre lat ions between physico-chemical and structural properties. The appl ica tion of information theory to the problem of characterizing mo lecu lar structures is presented in the paper. The info rmatio n ind ices based on the distance in a graph are conside red with respect to their correlating abi lity and di scriminating power.

We briefly review here selected topics in chemi cal graph theory, in particular, those being concerned with the use of information theory to characterizing molecular structure. Chemical graph theory is interested in the nature of chem ical structure l

. All structural formulae of chemical compounds are molecular graphs where vertices represent atoms and edges represent chemical bonds. Figure 1 gives the schematic representation of the derivation of a molecular graph from an alkane molecule. The molecular graph is the hydrogen-suppressed one. That is the commonly used representation in chemical graph theory because hydrogen atoms are small and so add very little to the overall size of the molecule. Using the molecul ar graph one can obtain, for example, the carbon-number index (the number of carbon atoms in the hydrocarbon molecule) that is known since 1844 as one of the first topological indices used in chemistry to characterize molecular structures. Basically topological index expresses in numerical form the topology of the chemical species it presents. Topological indices are des igned by transforming a molecular graph into a number and possess the remarkable ability of being able to correlate and predict a very wide spectrum of properties for a vast range of molecular species. The carbon-number index is well known to provide an effective measure of the molecular volume: for the members of homologous series, the molecular volume is known to be directly proportional to the carbon­number index2

.

tThi s work was supported by the Russian fund of Fundamental investigations through Grant 03-01-00796.

The construction and investigation of topological indices that cou ld uniquely characterize molecular topology is one of the main direct ions of chemical graph theor/. The isomer di scrimination, structure­property relationships and structure-actIVIty correlations, and the desi gn of compounds of des ired properties are the most important trends in chemical graph theory where topological indices are commonly

A Ik anc mo lecu le

Chemical formul a

H H H

I I I H - C - C - C - H

I I I H H H

Siru ctural formul a

• • • Mo lec ular hydro gen·suppre sse d grap h

II = 3 Topo logical index

Fig. I- Schematic representation of the derivation o f the mo lec ular graph and the carbon-number index from an alkane mo lecule

1228 INDIAN J CHEM, SEC. A, JUNE 2003

used. We concentrate our attention on the first and second trends. Moreover, we consider all the aspects with respect to the information theory application.

It is well-known that application of some ideas from one scientific field to another often gives a new view on the problems. Information, one of the most general ideas in contemporary science, should be expected to penetrate in various branches of science. Indeed, applications of information theory to molecular graphs have produced results that are important in chemistry.

The science of information theory has grown mainly out of the pioneering studies of Shannon3

,

Ashb/ , Brillouin5 and Kolmogorov6. There is more

than one version of information theory . In Shannon's statistical information theory, information is measured as reduced uncertainty of the system. Ashb/ describes information as a measure of variety. In the algorithmic theory of Kolmogorov, the quantity of information is defined as the minimal length of a program that allows a one-to-one transformation of an object (set) into another one.

Applying information theory to material structures like atoms, molecules, crystals, etc., as well as to different mathematical structures like sets, groups, graphs, etc., the interpretation given by Mowshovitz7

in 1968 is more appropriate. Let a given system [ having 11 elements be

regarded according to a certain equivalence relation,

into k equivalence classes with cardinalities Hi'

Considering all the n elements partitioned into k classes, we can define the probability Pi' i = 1, ... , k ,

for a randomly selected element of this system to be found in the i -th class. Therefore, a finite probability scheme may be associated with the following structure:

2, 3, .. .

k 1 112 I1J '" 11k

P2 p) Pk

where n= ~k n., p . =nl n and ~ k Pi =1. L.J,=I I " L..J ,=I The information content of a system [ with n

elements is defined by the relation5

k

[ = nlog2 n - L ni log2 11i .. , (1) ;= 1

The logarithm is taken at basis 2 for measuring the information contents in bits . Another information

measure is the mean information content of one element of the system I defined by means of the total information content or by the Shannon relation3

:

k

I = 1/11 = - L Pi log2 Pi ... (2) ;=1

The application of information theory to different systems or structures is based on the possibility of constructing a finite probability scheme for every sys tem. One can mention here, that the criterion for partitioning the elements of a given system is not unique. The number of information measures is equal to the number of ways in which a set of 11 elements may be partitioned into different subsets, that is, the number of Young diagrams for a given n. It is always possible to select for any system several information measures, each of them closely connected with certain properties of the system. They reflect the essence of the idea of information, given by Ashb/ as a measure of the variety in a given system. This idea was used in graph theory and in chemical graph theory for characterizing graphs as well as molecular graphs and molecular structures.

At first information theory was applied to graphs in 1955 by Rashevskl, who defined the so-called topological information content of the graph [top . His definition is based on the partition ing of the vertices of a given graph into classes of equivalent vertices having the same valencies. Trucc09 in 1956 made this definition more precise on the basis of an automorphism group of the graphs. In the latter case, two vertices are considered equivalent if they belong to the same orbit of the automorphism group, i.e., if they can interchange preserving the adjacency of the graph. Rashevsky 10 used the topological information in studying the possibility of self-generation of the life on earth. As for chemical structures, information theory has been successfully applied in the study of various molecular properties 1 1- 13, in the field of

. 14 15 h ' . 16- 19 molecular dynamics ' and quantum c emlstry , in the description of the electronic structure of atoms20

, in the interpretation of the Pauli principle and Hund rule21

.

A molecular topology determines a large number of molecular properties. It was found in the last years that some biological activities of molecules, and even carcinogenecity, are closely related to a molecular topology. Thus, it is of a pertinent interest for chemistry (as well as for other natural sciences) to

KONSTANTINOVA et 01. : APPLICATIONS OF INFORMATION THEORY IN CHEMICAL GRAPH THEORY 1229

have some quantitative measure reflecting the essent ial features of a given topological structure. As it was mentioned above such measures are usually called topological indices in chemical graph theory. A lot of such indices have been suggested in the last 50 years22-36. They have usually correlated more or less satisfactorily with the molecul ar properties but could not discriminate well between structural isomers, often providing the same index for different isomers. The firs t topological index reflecting the topological structure of a molecular graph was proposed by Harry Wiener22 in 1947. The Wiener number W was defined as the sum of all edges between all pai rs of carbon atoms in hydrocarbons. It gives a good correlation with the thermodynamic properties of saturated hydrocarbon molecules but doesn't discriminate well among structural isomers.

Bonchev and Trinajstic37 applied information theory to the problem of characterizi ng molecular structures and molecular topology38-42 by means of information indices43 that are just the quantitative measure of a given topological structure. The advantage of such kind of indices is in that they may be used directly as si mple numerical descriptors in a comparison with phys ical, chemical or biological parameters of molecules in quantitative structure­property relationships and in structure-actiVIty relationships44-46. It can also be noted that information indices normally have greater discriminating power for isomers than the respecti ve topological indices. The reasons for this are that information indices are not restricted to integral values as topological indices frequently are and information indices are formed from a summation of different magnitudes which is usually greater in number than that for the topological indices.

We present here some results concerning information indices applications to characterizing molecular structures. The paper is organized in the following way . First of all, the information indices based on the distance in a graph are reviewed. Then

I 2 3 4 5 <>----<>------<

the numerical results of discriminating tests of indices on structural isomers and graphs are presented. Lastly, the corre lating ab ility of information indices is demonstrated on the classes of organometallic compound.

Information Indices Based on the Distance in a Graph

One can start looking for poss ible information indices among the graph invariants. Information indices are constructed for various matrices (an adjacency matrix , an incidence matrix, a distance matrix , a layer matrix) and also for some topological indices such as the Wiener number.

In 1977 Bonchev and Trinajstic37 introduced the information on distances to explain the molecular branching that is the critical parameter determining the relative magnitude of various molecular thermodynamic properties. Firstly they used the information indices defined by Rashevsky for graphs. However, these indices are not suitable for describing branching properties of graphs since they cannot refl ect the essence of branching. This may be exemplified by considering trees with five vertices presented in Fig. 2. The five vertices are parti tioned in different orbits In the above three graphs: 7;(2,2,1), T2 (2,1,1,1), T3 (4,1). Using Eq.(l), the

following values fo r the information content in bits are obtained: I T. =7 .61,lT =9.6 1,lT =3.6 1. One can see

I 1 ]

that thi s index cannot reproduce the obvious fact that the branching increases from a chain, through a branched tree, to a star.

So another approach to find an appropriate information measure of branching was used . One of the graph invariants is the distance matrix. Let C be a connected graph with the set of vertices V(C), n =1 V(C) I. The distance d(u, v) between vertices u

and v in a graph C is the length of the shortest path that connects these vertices. The di stance matrix D =11 d ij II, i, j = 1, . .. , n, contains the distances

+ T,: / 1. 51, /2 , 41. 13 1 To: {4, 51, / 11 .12 1, 131 T J : I I, 2, 3, 4 I. 151

Fig. 2- Trees with five vertices and thei r orbits

1230 INDIAN J CHEM , SEC. A, JUNE 2003

d;j = d(i, j) between the every p~ur of connected

vertices. Branching is connected with the distance matrix in an obvious way, since with increasing branching the distances in the graph become smaller. This can easily be seen from the distance matrices of the trees ~,T2, T, presented in Fig. 2:

0 I 2 3 4 0 I 2 3 3 I 0 I 2 3 I 0 I 2 2

O(~ )= 2 I 0 I 2 O(T, ) = 2 I 0 I I

3 2 I 0 I 3 2 I 0 2

4 3 2 0 3 2 I 2 0

0 2 2 2

2 0 2 2

0 (7; )= 2 2 0 2 I

2 2 2 0 I

0

Wiener22 first made use of the connection between the distance matrix and branching defining the topological index

I " W =- I d ij

2 '.j=1

. . . (3)

However, the Wiener number often has the same value for different graphs . For reducing degeneracies, Bonchev and Trinajstic introduced the information I D

on distances in a graph, considering all the matrix elements of di stance matrix d;j as elements of a finite

probability scheme associated with the graph in question. Let the distance of a value i appears 2n;

times in the distance matrix, where I ~ i ~ d(C) and

d(C) = max;,jEV(G) dCi, j) is the diameter of a graph.

Then n2 matrix elements d;j are partitioned into

d(C) + 1 groups, and d(C) + 1 group contains II zeros

that are the diagonal matrix elements. With each one of these d(C) + 1 groups can be associated a certain

probability for a randomly chosen distance d;j to be

in the i -th group:

[ ~. "

2,

2n2

P2

d(C) j 2nd (G)

P d(G)

where p; = 21l / n2 and Po = I/ In2 = lin .

The information on distances of a given graph will then, according to Eqs (I), (2) be

dcm I = 11

2 log2 /1 2 -l1log 2 11 - L 21l; log2 21l;

;-::: 1

- I I d (C) 211 211 I =--Ioa -- "-' loa -' 0 2 L.J 2 0 2 2

n n ; = 1 n /1

. .. (4)

... (5)

Since 0 is a symmetric matrix , Bonchev and Trinajstic consider, for simplicity of discussion, only the upper triangular submatrix that does preserve all properties of the information measure. In that case, the following expressions for the mean and total information on distances are obtained:

( 1) ( 1) d ( Ci )

E _lln- la lln - _" l a I f) - 0 0 2 L.J n; 0 0 2 II ; 2 2 ; = 1

... (6)

- E d (G) 21l 217. I f) =-L ' log2 "

;=1 n(1l -1) n(n -I) .. . (7)

where 11 (,;- 1) is the total number of upper off-diagonal

elements in the distance matrix D. These information indices correspond to the information on the di stribution of distances in the gt'aph according to their equality or nonequality and depend on the partitioning of the total number of distances into classes.

Using Eq. (7) , one can obtain the following information on distances in the three graphs with five vertices presented in Fig. 2:

2 3 4 2 3

~: 4 3 2 11; T2 4 4 2 l1 i

4 3 2 I Pi 10 10 10 10

4 4 2 P i 10 10 III

T; (~) = 1.85 T; (T2) = 1.52

2

T, : 4 6 ni 4 I>

P i 10 10

T; (T, ) = 0.97

KONSTANTINOVA el 01.: APPLICATIONS OF INFORMATION THEORY IN CHEMICAL GRAPH THEORY 1231

One can see that I ~ as well as 1 ~ = 1/(/;- 1) T: reproduces the branching properiies of trees ~, T2 , Tl

decreasing regularly with increased branching. Moreover, Bonchev and Trinajstie have shown that

IZ is a rather sensitive measure of branching having

different values for all trees with 11 = 4,5,6,7,8 (the

total number of trees is 45). The number of all possible distributions of d ;j' i.e., number of different

Ig, increases rapidly with the increase in the number

of vertices in the graph. This makes I g an appropriate

quantity for distinguishing structural isomers . However, there is another possible information

measure that can be defined on the basis of distances in the graph. Bonchev and Trinajstie introduced the

information index I; as the information on the

realized distances in a given graph that depends on the partitioning of the total distance. It is information on the partitioning the Wiener number (that is the total distance of the graph) into groups of distances of the same absolute values. Since the Wiener number is

. ~d(G) . given by formula W = L.J ;~ I I It; and following

Eqs (1) and (2) , we obtain

d i G )

l~ = W iog 2 W - Lin; iog2 i (8) i=1

dIG)· . - W L 1 1 1 =- n-iog-

D . 1 W 2 W 1=1

(9)

For the three five-vertices trees presented in Figure 2, the following values are obtained:

~: I; =62.93, T;; =3.15,

T2 : I ; =57.55, TDw =3.20,

T) : I; = 52.00, T~v = 3.25

It is easy to see that I;~ decreases with branching .

It is a more sensitive quantity than the Wiener number since it can distinguish two graphs having the same Wiener number but different i and 12;. It was checked

that ,,~v increases regularly with branching at lower

values of 12 , but at higher ones some irregularity occur and it cannot be used as a good measure of branching.

As for 1; , it is a sensitive measure of branching

having different values for all trees with n =4,5,6,7 ,8 (the total number of trees is 45). Figure 3 presents the pair of trees having the same value of the Wiener

number and the different values of IZ and I; . The values of information measures Ig and I;

were inspected3? in comparison with several

topological indices such as the Wiener number, the greatest eigenvalue of the characteristic polynomial

XI' the sum of the polynomial coefficients (or Hosoya

index24), the information on polynomial coefficients

1,,(. , and Randie connectivity index25 XR' The

inspection of these values indicates the great

sensitivity of the two information indices IZ, I; to

IV = 48

I: = 41.0774 I;' = 203.0586

IV = 46

I~ = 39.5 676 I; = 195.5544

IV= 71~

I~ = 62 .0606 I; = 328 .0287

IV = 67

I~ = 55.34 48 I; = J 12.3888

IV = 62

I~ = 52 .32 53 I; = 18 9.3857

Fig. J--The pairs of trees having the same value of the Wiener

number and the different values of If; and I~

1232 INDIAN J CHEM, SEC. A, JUNE 2003

all structural detai Is of the tree graphs. There are no two graphs among the 45 graphs examined that have the same information on the graph di stances. All the other listed indices are not so speci fico and they often have the same value for different graphs . The same results were obtained for information indices 7;~ and - IV IJ) . Their values were tested on the set of 45 tree

graphs. There are no two graphs having the same values of these indices.

Thus the information measures introduced on the basis of distance matrix appear to be very appropriate indices for discrimination of graphs . The number of different I; for the graphs having the same number

of vertices is limited by the number of all the possible di stributions n(1/ -1)/2 graph edges into k different

groups. Since the number increases rapidly with increasing values of n, one may expect the information on graph distances to have a good ability of differentiation between structural isomers even for very large systems. It was one of the main results obtained by Bonchev and Trinajstic.

Later Konstantinova, Paleev and Diudea47-49 have

confirmed that the information approach allows to design very sensi ti ve information indices based on the distance in a graph. The information distance index of vertex i was introduced in Ref. 47 and defined as follows:

~ d d H f) (i) = - L.J _ 'J_Iog? _ IJ_, j = l d(i) - d(i)

. .. (10)

where d(i) = '" /I d is the distance of a vertex i. L.J j = l IJ

Then the information distance index of graph vertices takes the form

H; = iHf)(i) ... (11) ;=1

The same approach was applied to the layer matrix A=IIAij II, i=l, ... ,n, j=I, ... ,d(G) , where \ is equal

to the number of vertices located at a distance j from a vertex i. The information layer index of graph vertices is defined by the formula

.. . (12)

where e(i) = max ,,,, V (G) dei , v) IS the vertex

eccentricity. It will be shown later that indices H; and H~ have a great discriminating power among

structural isomers. One more information index based on the distance

matrix was considered by Skorobogatov et al .50 in structure-activity correlations . The information index H2 is defined by

_ Lk d (i)k; I d (i)k; H -- --oa--2 ; =1 2W 0 2 2W '

. .. (13)

where k;, i = 1, ... , k, is the number of vertices having

the distance d(i). This index gives the linear

cOITe lations with information mass-spectrum indices on the several classes of organic and organometallic compounds50

-54

.

D'yachkov and Konstantinova55 consider the

entropy HD, the marginal entropy H~ and the

information ID based on the distance matrix as follows

/I /I d .. d .. H . =-~ ~ _'J loa _ 'J o - L..J L.; b 2

;=1 j = l 2W 2W ... (14)

= - 2W log2 2W - L 2n; i log2 i 1 [ d(G) J 2W ;= 1

H ; == - ~ d(i ) 10 d(i) o f:( 2W g2 2W

.. . (15) 1 ( /I '

= - 2W log2 2W - L dU)log 2 d(i ) ) 2W ;=1 ,

(16)

where /1.; is the number of vertex pairs being at a

distance i from each other and W =L~~IG ) i 12; .

Let I; be the number of matrix elements equal to l.

The entropy H ;. , the margi nal entropies H 1 and H ~

and the information I). are defined by

"d(G ) A.. A.. '" '" IJ I 'J H ). == - L.J L.J ogz .. ;=1 j=1 12(n -1) n(n -1)

= --_- n.(n -1) iogz n(n -1) - II, i iogz i 1 ( max J

n(n. 1) ; = 1

.. . (17)

KONSTANTINOVA el al. : APPLICATIONS OF INFORMATlON THEORY IN CHEMICAL GRAPH THEORY 1233

tI(C ) 211 . 2n . H ' == - L ' 100 '

). j = i n(n -1) 0 2 n(n -I )

1 [ tI«(; j

= _ n(n -1) log2 n(n -1) - L 2nj log2 2n j n(n 1) j = i

... (18)

. n-l n-1 H~ == -n log2 = log2 n

n(n -1) n(n -1) .. . (19)

... (20)

The information indices I;, H D and H 1 are based

on the vector ni and the constants nand W that leads to their correlations. In particular, since

I~v = ~ (w log2 W - L ~~ iG ) ni i log2 i) then followi ng

Eq . (14) we immediately obtain H D = I; + I. So it is

enough to study the only index among them. The

index ~w was investigated in discriminating tests as

the most known one56,57.

Discriminating Tests The discriminating power4

1.58 is one of the basic characteristics of a topological index [ and corresponds to a measure of its ability to distinguish among the nonisomorphic graphs (the structural isomers) by distinct numerical values of an index [. The theoretical evaluation of the index sensitivity S on a fixed set M of non isomorphic graphs can be achieved by the formula

S = N-N1

N ... (21)

where N =1 M 1 is the number of graphs in a set M

and NI is the number of degeneracies of an index I

within a set M. According to the definition, S = 1 means that among the elements of the set considered, no two non isomorphic graphs have the same value of the index l.

Bonchev and Trinajstie7 investigated the discriminating power of information and topological indices between 45 alkane trees. Basak et al. 59 have continued these investigations on the set of 45 alkane trees as well as on the set of 19 monocyclic graphs.

Razinger, Chretien and Dubois58 explicitly pointed out the fact that the discriminating power of the Wiener number is very low in alkane series. The discriminating tests among polycyclic graphs were done by Konstantinova and Paleev47 on the set of 1020 subgraphs of the regular hexagonal and square lattices. Later Konstantinova48 has tested information and topological indices for 2562 subgraphs of the regular hexagonal lattice. Graphs of this class represent the molecular structures of unbranched cata­condensed benzenoid hydrocarbons. The discriminating powers of topological and information indices as well as the Wiener polynomial derivatives were studied by Konstantinova and Diudea49 on 3006 subgraphs of the regular hexagonal lattice and on the set of 347 cycle-containing graphs with ten vertices and three to eight-membered cycle.

An exhaustive analysis of 12 information and topological indices based on the distance in a graph was performed by Konstantinova and Vidyuk56

,57 on 1.443.032 polycyclic graphs and "3.473.141 trees. The

information indices I D' H~,l). , H). , H; ,H; ,H2 ,!:; presented in section 2 and the topological indices such as the Wiener number, the Schultz number, the Balaban number and the Randi6 number were examined in the discriminating tests. The formulas for topological indices are given below.

The Schultz molecular topological index60 is defined by

MTI = L. deg(v)· d(v) + L. deg(v)2, ... (22) "", V(G ) VE V (G)

where deg(v) is the vertex degree. This index has

found interesting applications in chemistri4• Its

discriminating power was investigated by Dobrynin6 1

for cata-condensed benzenoid graphs. The average distance sum connectivity was

introduced by Balaban29 and defined by

m '" _ 1. } = L.J (d(u)· d(v» 2 ,

m - n + 2 ",VEV (G)

.. . (23)

where m is the number of edges in a graph G. The Randi6 index25 X is based on the molecular

connectivity and achieved by the formula

X= L (deg(u).deg(v)rt ... (24) u ,,,,,, V (G )

1234 INDIAN] CHEM, SEC. A, JUNE 2003

The numerical resu lts of discriminating tests for considered indices are given below on the sets of polycyclic graphs and trees .

Polycyclic graphs

The polycyclic graphs embedded to the regular hexagonal, square and triangular lattices are tested. The hexagonal graphs correspond to the structural formulas of planar polycyclic aromatic hydrocarbons62

.63

. The values of 12 information and topological indices were calculated for 849.285 hexagonal, 298.382 square and 295.365 triangular graphs. The calculation accuracy for all indices is 10- 13

. The discriminating powers of indices were obtained in accordance with Eq. (21) and the results are given in Table I, where N is the number of graphs in the respective class.

The data show that the information indices give a much more discriminating power. The indices

H;~ ,lD , H~ ,H 2 have the best result (5=0.999) for

hexagonal graphs. All topological indices, excepting J , could not discriminate between these graphs. The same situation is observed for square and triangular graphs. The degeneracy is high for W , MTI, and very

-w low for H;,l D' The information index I v

discriminates not bad among hexagonal graphs but it doesn ' t discriminate square and triangular graphs. The opposite situation is observed for the Randi6 index X .

Its discriminating power is the lowest one on hexagonal graphs and the highest one on triangular graphs.

Trees

Similar results were obtained on the set of trees. A tree is a connected acyclic graph. The discriminating powers of indices were calculated on the set of

3.490.528 trees up to 21 vertices. The obtained data are given in Table 2. The highest sensitivity

corresponds to the information index H; . There are

no degeneracies of this index, i.e. 5 = 1, on the set of trees up to 17 vertices (the number of trees is N =81.134) . Two trees with 18 vertices and two trees

having 19 vertices give the same values of H; . There

are only 14 degeneracies on the set of 20 vertices trees, N = 823 .065, and only 12 degeneracies on the set of 21 vertices trees, N =2.144.505. All of trees having the same values of this index are presented in Fig. 4. Topological indices show a very low discriminating power.

So the data obtained on the sets of nonisomorphic polycyclic graphs and trees indicate that in common the information indices have greater discriminating power than the topological ones. Another basic characteristic of a topological index is its correlating ability with a molecular property. The information approach to the correlations between structure and reactive capability of molecules is considered in the next section .

Correlating Ability of Information Indices One of the key problems of a modern chemistry is

to find a relation between a structure and a reactive capability of a molecule. The reactive capability of a molecule can be characterized by its mass-'spectrum that contains the information on the ways of a molecule fragmentation and displays the "behavior" of some molecular fragments that can be interpreted as subgraphs of a molecular graph .

Let us define the information index of the chemical mass-spectrum using the Shannon relation . From the information theory point of view, the mass-spectrum

A is the distribution of probabilities p = --'-, i = 1, ... , k, , A

Table I- The discriminating power of indices on the sets of N hexagonal, square and triangular graphs

10 Hi H). H" H2 H:' - w J X MTI W N 0 I). D A 10

849285 0.999 0.999 0.997 0.993 0.999 0.999 0.997 0.659 0.998 0.0001 0.004 0.0006

298382 0.997 0.995 0.954 0.811 0.998 0.994 0.906 0.133 0.993 0.005 0.002 0.0003

295365 0.984 0.982 0.844 0.466 0.992 0.981 0.585 0.021 0.986 0.407 0.0008 0.0001

Table 2- The di scriminating power of indices on the set of N trees up to 21 vertices

N 10 Hi D I). H). H" 0 H2 H" ). T; J X MTI W

3490528 0.998 0.912 0.985 0.321 0.999 0.907 0.428 0.683 0.907 0.017 0.00004 0.00001

KONSTANTINOVA el al.: APPLICATIONS OF INFORMATION THEORY IN CHEMICAL GRAPH THEORY 1235

n = 20. Contd.

/

H ~ = 7 1.067 195826536 I 8 H; = 82.23982070813867

H; = 76.97820768153596

H; = 82.46161593939024

H ~ = 8 1.80354946997848 H; = 82.74262835086996

/

H; = 82.07740940300192 H; = 83.25510903225370

H ~ = 82.16707259857088 H; = 87.30751562364020

Fig. 4 - The pairs of trees having the same values of the information index. H~ . (Conld)

1236 INDIAN J CHEM. SEC. A, JUNE 2003

n = 21. Contd.

~ \

--'r I

/

1< \

H; = 87.93993238547368

~-H; = 88.30758149768472

H~ = 88.38614500060488

H ; =88.5599970657 1380

H ~ = 88.56334987995548

Fig. 4 - The pairs of trees hav ing the same values of the information index H~ .

kj J -

2

I 0---0---6--- 0::'

'0 8

10

4,5 '7.8

I I d(i)

2

I I 16 18 2U 12 14

Fig. 5--The dis tance vertex spectrum

of the ions formation, where A; is the mass-spectrum

amplitude of the i -th ion, A = 2:;=1 A;, and k is the

number of peaks in the mass-spectrum. The amplitude information index HA is defined by

kA A HA =-~ -.100 ---1... L.J 0 2 ;=1 A A

.. . (25)

On the other hand, a molecular graph that represents a structural formula of a molecule could be

used for defining specific structural features of a molecule by means of information indices based on the distance in a molecular graph . As it was mentioned in introduction the topological index is .designed by transforming a molecular graph into a number and it expresses in a numerical form the topology of the chemical species it presents. Moreover, it was shown by Skorobogatov et aI.50 that some information indices have a "chemi cal" spectral interpretation . Let us consider the information index

H 2 that is based on the vertex distance d (i) and the

KONSTANTINOVA el al.: APPLICATIONS OF INFORMATION THEORY IN CHEM ICAL GRAPH THEORY 1237

I , 4 7

- I 2 3 6

7. 8 4, 5 - 8

5

I 6 2 3 -

I I I aUl omel fl clly classes

Fig. 6-The autometricity vertex spectrum

number k; of vertices having the di stance d (i) , and

let us define the pairs (d(i),k) as the points in

Euclidean plane. Then the distance vertex spectrum can be pictured on the plane by the lines

((d (i), k;), (d (i). 0») , i = 1, .. . , k . Figure 5 shows the

distance vertex spectrum for the tree. As one can see, there is even the visual correspondence between the chemical mass-spectra and the topological spectra of molecular graphs.

One more topological spectrum called the autometricity vertex spectrum was defined on the basis of a layer matrix. It could be happen that some rows of this matrix are the same. It means that the corresponding vertices belong to one and the same class of autometricity. By this way the vertex set is divided into the autometricity classes. The autometricity vertex spectrum is defined by the

autometricity classes and by the number Il; of

vertices in the i -th class of autometricity. Figure 6 shows the autometr;city vertex spectrum

for the same tree. Its canonical layer matrix is the following one:

1 ') 3 7, 8 ... I I 3 2 I

): = 3 3 6

2 3 2 2

3 3 4, 5

4 3 3

The rows are ordered with respect to their lengths and then the rows are ordered lexicographically. The numeration of autometricity classes corresponds to the numeration of rows in the canonical layer matrix . Let us notice, that there is a finite probability scheme on the vertex set with respect to the autometric ity ratio and one can define the information index

k I1 A ,l H =-"-' 100 - ' a L..J b2

;=1 Il 11

... (26)

The information indices H 2 ,H " and HA based on

the topological as well as the chemical spectra provide the presentation of a molecular graph and a chemical structure in terms of the same quantitati ve scale because their values are expressed in information bits. These indices are suitable ones for finding structure-activity correlations.

At first this approach was applied to the class of

ferrocene derivatives C" FeCsH . R, where R is a

substituent5o. The linear correlations between the

information indices H2 and HA, and H" and HA

were found . It was shown that the initi al set of molecular structures is divided into three subsets by the linear regression. In the cases considered the correlation ratio ranges from 0 .94 to 0.975. The example of the autometricity vertex spectrum for a molecular structure of this class is given in Fig. 7.

Figure 8 shows the correlations between the information indices !fa and HA on the set of ferrocene derivatives. The correlation ratio r and the number of structures n for each subset are presented.

The similar results were obtained for infomlation indices H 2 and HA . The correlations between them

were found as follows:

1 : HA=I .09+0.76H2 (r=0.950, 11=9) 2 : HA= 1.30+0.99H2 (r=0.975, 11= 17)

3 : HA=3.96+0.40H2 (r=0 .950, 1l=6)

Nekrasov and co-workers51-54 have used this

approach for finding correlation on the several classes ( - 20) of organic and organometall ic compounds. In particular, they have found the correlations on the set

1238 [NDI AN J CHEM, SEC. A. JUNE 2003

H ~/H

H "~c C"-: H H

Fe ---) H?ct\H H~H

H

a) the structure fomlU la

of methylferrocene

3 9

3 3 4 9 7

A= 4 3 9 I 3 9

10 10 3 4 12 7 4 9 10

7 3 9 7

3 7

10

20

14 3

19

9 18

b) the molecular graph

of methylferroce ne

12. 13, 17, 18, 19, 20, 21 :; X, 22,23, 24 :; X,

1, 2,7,8,9, 10, II :; X, 15 :; ~ 14, 16 :; X5 6 :; X. 4 3, 5

:; X,

:; Xs

22

23

c) the canonical layer matrix of the methylferrocene molecul ar

l­n,

7 -

6 -

5 -

4 -

3 -

2 -

1 -

o 2

I I

3 4

I I I 1

567 8 1 autometnclty c1assesXi 9

d) the autometricity ve rtex spec trum

Fig. 7- The example o f the autometric ity vertex spectrum for the molecular graph of methylfe rrocene

of arylsilanes. There are three subsets of arylsilanes and each of them has specific structure peculiarities .

Figure 9 shows the correlations between the

information indices H 2 and HA on the set of arylsilanes, Line 1 corresponds to the set of

phenylmethylsilanes, line 2 corresponds to the set of phenylallyl- and phenylvinylsilanes and line 3 corresponds to the set of vinylmethylsilanes. As one can see from the picture the correlation ratio for all the cases ranges from 0.856 to 0.982.

HA

o

4

H, 1.25 2.25 3. 25 4.25 5.25

I : HA = 1.70 + 0.5 7 H, (r = 0.9 40. 11 = 10)

2: HA = 1.2 4 + 0.98 H, (r = 0.975 , n = 16)

3 : HA = 3.91 + 0.41 H, (r = 0.g40. '1 = 6)

Fi g. 8- The corre lations between the information indices H" and

HA on the set of ferrocene derivati ves

HA

4

2

1.25

PhM eSiR f

2. 25

o /

Vi Il MeSiR!

Ph All(Vin)S iR

3.25 4.2 5

I 5.25

I : HA = - 5.9 + 2. 9 HJ (r = 0.9 45 , II = 12)

2: HA = -8.3 + 2,8 HJ (r = 0.982 . II = 10)

3: HA = -8 .5 + 3.4 Hl (r = 0.85 6, II = 5)

)

Fig. 9- The correlati ons between the in formation indices H 2 and

HA on the set of arylsilanes

KONSTANTINOVA el al.: APPLICATIONS OF IN FORMATION THEORY IN CHEMICAL GRAPH THEORY 1239

Conclusion Information theory is a useful tool in developing

the new invariants for characterizing molecular structures. The only limit to the design of invariants is the imagination and resourcefulness of investigators . The situation is similar to a search for a system of codes and a design of codes for chemical structures in particular. There are graph theoretical invariants considered in the mathematical literature that have not yet been tested for possible chemical applications. Such mayor may not be of interest in chemistry , but without testing we will not know. If some existing examples mathematical invariants are shown to correlate with some of the known molecular properties, the findings may be of considerable interest for chemistry - not only because such invariants may offer a novel predictive tool in structure-property studi es, or an alternative route to a certain molecular property, but they may give additional insight into structure-property relationships. Finally, such results may show novel mathematical concepts of use in chemistry .

References Trinajstic N, Chelllical graph Iheory. 2nd Edn ; (CRC Press, Boca Raton, FL), 1992.

2 Rouvray D H, J II/olec Slruct (Theochell/) , 185 ( 1989) 187. 3 Shannon C & Weaver W, Malhell/alical IheOlY of

communications (U ni vers ity of Illinois, Urbana), 1949. 4 Ashby W R, An inlroduclion 10 cybem elics (Wiley. New

York) , 1956. 5 Brillouin L, Sciellce and in forll/a lioll theOl)' (Academic. New

York) , 1956. 6 Kolmogorov A N, Probl Peredachi Illj. 5 ( 1969) 3. 7 Mowshovitz A. Bullll/alh Biophys, 30 ( 1968) 225; 30 ( 1968)

175; 30 (1968) 387; 30 (1968) 533. 8 Rashevsky N. Sullmalh Biophys. 17 ( 1955) 229. 9 Trucco E, Bull maTh Biophys, 17 ( 1956) 129; 18 ( 1956) 237. 10 Rashevsky N, Bulllllalh Biophys, 22 (1960) 35 1. II Morovitz H, Bull math. Biophys, 17 (1955) 8 1. 12 Valentinuzzi M & Vaientinuzzi M E, Bull math Biophys, 25

(I963) II. 13 Bonchev D, Kamenski D & Kamenska V, Bull maTh Biophys,

38 (1976) 119. 14 Bernstein R B & Levine R D, J chelll Phys, 57 ( 1972) 434. 15 Ben-Shaul A, Levine R D & Bernstein R B, J chelll Phys, 57

( 1972) 5427. 16 Aslangul C, Constanciel R. Daudel R & Kotti s P, Adv

Quanlum Chem, 6 ( 1972) 93. 17 Daudel R, Bader R F, Stephens M E & Borett D S, Callud J

Chem, 52 (1974) 13 /0. 18 Aslangul C, Constanciel R, Daudel R, Esna ult L & Ludena

E, lnt J Qualllum Chem, 8 (1974) 499. 19 Fratev F, Enchev Y, Pulansky O.E & Bonchev D, J mol

Srruct (Theochem), 88 (1982) 105.

20 Bonchev D, MA TCH COI1l IllUII II/(I/h COlllput Chem. 7 ( 1979) 65.

2 1 Bonchev D, III[ J Quall [UIII Chelll. 19 ( 1981 ) 673. 22 Wiener H, J Am chem Soc. 69 ( 1947) 17. 23 Wiener H, J chem Phys, 52 ( 1948) 425. 24 Hosoya H, Bull. chem Soc JplI, 44 ( 1971 ) 2332. 25 Randic M, J Am chem Soc, 69 ( 1975) 6609. 26 Entringer R C, Jackson D E & Snyder D A, Czechoslovak

maTh J, 2 {I 976) 283. 27 Doyle J K & Graver J E, Discrere Malh , 17 (1977) 147. 28 Kier L B & Hall L H, J pharm Sci. 70 ( 1981) 583. 29 Balaban AT., Pure Appl. Chem .. 55 ( 1983) 199-206. 30 Rouvray D H in: King R B (Ed), Chemical applicaliolls oj

TOpology allrl graph theory (Elsevier, Amsterdam). 1983. p 159.

3 1 Randic M, J chem Inj Compu[ Sci, 24 ( 1984) 164. 32 King R B & Rouvray D H (Eds), Graph [heOl), alld [apology

ill chemistry (Elsevier, Amsterdam), 1987. 33 Randic M, J ma[h Chem. 7 ( /991) 155. 34 Nikolic S, Trinajstic N & Mihalic Z, Croal che/II Acra, 68

( 1995) 105. 35 Entringer R C, .I comb in MaTh call/bill Call/pur, 24 (1997) 65. 36 Diudea M Y & Gu tman I, Croat chell/ Acta, 71 (1998) 21. 37 Bonchev D & Trinajstic N, J chell/ Phys, 38 ( 1977) 4517. 38 Bonchev D & Trinajstic N, In[ J Qualltulll Chem. S 12 ( 1978)

293 . 39 Bonchev D, Knop J V & Trinajsti c , MA TCH COII/IIlUII

lIla/h COli/pili Chell/, 6 ( 1979) 2 1. 40 Bonchev D, Mekenyan 0 & Trinajstic N, Inl J QuanlUm

Chell/ . 17 ( 1980) 845. 4 1 Bonchev D, Mekenyan 0 & Trinajstic N, J COlllplll Chem, 2

( 1981) 127. 42 Bonchev D & Trinajstic N, 11l[ J Quan[um Chem SYIIlP, 16

( 1982) 463. 43 Bunchev D. Illjorllla/ioll-Iheoretic ill dices jar

charac[erizalioll oj chemical slrucllires (Research Studies Press, Ch ichester) , 1983.

44 Kier L B & Hall L H, Molecular cOllllecrivirv ill chemislIY and drug research (Academi c Press, New York), 1976.

45 K.ier L B & Hall L H, Molecular cOllnecriviry ill s[ruc[ure-acrivity all(/Ivsis (Research Studies Press, Letchworth), 1986.

46 Balaban A T. Chirac A. Motoc I & Simon Z, Steric fil in quaillilwive structure-activiry relwiollships (Lecture Notes in Chemi stry 15, Springer, Berlin), 1980.

47 Konstantinova E Y & Paleev A A, Vychis Sistemy, 136 ( 1990) 38.

48 Konstantinova E Y, J chem Inj ColI/put Sci, 36 (1996) 54. 49 Konstantinova E Y & Diudea M Y. Crow chell/ Acra, 73

(2000) 383. 50 Skorobogatov Y A, Konstantinova E Y. Nekrasov Y S.

Sukharev Y N & Tepfer EE, MA TCH COII/II/un ma[h Comput Chell/. 26 ( 1991 ) 215.

5 I Sukharev Y N. Nekrasov Y S, Molgacheva N S & Tepfer E E. Org Mass Specrr, 28 (1993) 1555.

52 Nekrasov Y S, Tepfer E E & Sukharev Y N, Russ chem Bull, 42 (1993) 343.

53 Nekrasov Y S, Sukharev Yu N. Moigacheva N S & Tepfer E.E, Russ chem BIIII. 42 (1993) 1986.

54 Nekrasov Y S, Sukharev Y N, Tepfer E E & Molgacheva N S, Russ chem Bul!, 45 (1996) 2542 .

1240 INDIAN J CHEM, SEC. A, JUNE 2003

55 D'yachkov A G & Konstantin(lva E V, (in progress) . -56 Konstantinova E V & Vidyuk M V (submitted in J chem II! f

Comput Sci) 57 Konstuntinova E V & Vielyuk M V (submitted in J chelll Illf

CompUl Sci) 58 Razinger M , Chretien 1 R & Dubois 1 K, J chelll Inf CO/llPllt

Sci, 25 ( 1985) 23. 59 Raychauelhary C. Ray S K, Ghosh 1 1. Roy A B & Basak S

C, J COlllput Chem, 5 (1984) 581.

60 Gutman I, J chem fl if Comput Sci, 34 (1994) 1087.

61 Dobrynin A A, MATCH Commlln match Comput Chem, 38 ( [998) 19.

62 Gutman I & Cyvin S J, Illt roductiolZ to the theory of be f1 zenoid hydrocarbolls (Springer-Verlag, Berlin), 1989.

63 Gutman I & Cyvin S J (Eels.). Advallces in the theory of bellzenoid hydrocarbolls (Springer-Verlag, Berlin), 1990.