
Ligand search and data mining of Structural Genomics Structures

JCSG Ligand Search Server

More than 2800 structures have been deposited into the PDB by the PSI centers as of March 18, 2008, of which the JCSG has contributed over 590 structures. The JCSG Ligand Server provides a tool to query these structures by a variety of search criteria and enables the dissemination of the information gained from these structures to a larger community of researchers. The main objective of this server is to extract ligands, biological or otherwise, bound to the structures, and to explore them further with a number of associated links. In addition, the structures can be queried by a host of other criteria, such as target names, PDB IDs, PFAM family names, structure descriptions, organisms, and PSI centers.

A preliminary analysis indicates that 1706 of these PSI structures contain some type of bound ligand, metal, or solvent molecule, and that 283 of them contain 147 unique biological ligands (Table 1). The distribution of ligands is shown in Figure 1. In addition, 25 ligands were observed for the first time in PSI structures (Tables 2 and 3). We also examined how often the cryo-protectant agents that were used appear in the resulting structures (Table 4). Ethylene glycol (EDO), for example, was observed in 179 of the 230 structures in which it was used (about 78%).

Table 1: Summary of ligands found in PSI structures

Type            PSI structures   PSI ligands   JCSG structures   JCSG ligands
Ligand          283              147           89                25
Co-factors      217              21            55                19
Metals          689              30            162               11
Non-metals      734              22            247               12
Organics        92               26            27                6
Buffers         259              15            97                10
Precipitants    109              13            63                9
Cryos           539              5             280               3

Figure 1: Distribution of ligands (panels: Ligands, Co-factors, Metal Ions, Non-metal Ions, Buffers, Precipitants)

Table 2: Unique ligands (25) found in PSI structures

PDB    Ligand ID   Center    Ligand name
1KPH   10A         TBSGC     Didecyl-Dimethyl-Ammonium
1KPI   10A         TBSGC     Didecyl-Dimethyl-Ammonium
1Z2L   1AL         NYSGXRC   Allantoate Ion
1M33   3OH         MCSG      3-Hydroxy-Propanoic Acid
1VR0   3SL         JCSG      (2r)-3-Sulfolactic Acid
1Y0G   8PP         NYSGXRC   2-[(2e,6e,10e,14e,18e,22e,26e)-3,7,11,15,19,23,27,31-Octamethyldotriaconta-2,6,10,14,18,22,26,30-Octaenyl]Phenol
1O8B   ABF         MCSG      Beta-D-Arabinofuranose-5'-Phosphate
1TUF   AZ1         NYSGXRC   Azelaic Acid
1Y80   B1M         SECSG     Co-5-Methoxybenzimidazolylcobamide
2B4B   B33         NYSGXRC   N-Ethyl-N-[3-(Propylamino)Propyl]Propane-1,3-Diamine
2A3L   CF5         CESG      Coformycin 5'-Phosphate
2Q09   DI6         NYSGXRC   3-[(4s)-2,5-Dioxoimidazolidin-4-Yl]Propanoic Acid
2OSU   DON         MCSG      6-Diazenyl-5-Oxo-L-Norleucine
2NW9   FT6         NESG      6-Fluoro-L-Tryptophan
1P44   GEQ         TBSGC     5-{[4-(9h-Fluoren-9-Yl)Piperazin-1-Yl]Carbonyl}-1h-Indole
2OU3   I3A         JCSG      1h-Indole-3-Carbaldehyde
1X92   M7P         MCSG      D-Glycero-D-Mannopyranose-7-Phosphate
2GVC   MMZ         NYSGXRC   1-Methyl-1,3-Dihydro-2h-Imidazole-2-Thione
1RTW   MP5         NESG      (4-Amino-2-Methylpyrimidin-5-Yl)Methyl Dihydrogen Phosphate
2PUZ   NIG         NYSGXRC   N-(Iminomethyl)-L-Glutamic Acid
2OD6   OHA         JCSG      10-Oxohexadecanoic Acid
1N2H   PAJ         TBSGC     Pantoyl Adenylate
1N2I   PAJ         TBSGC     Pantoyl Adenylate
1QPR   PPC         TBSGC     5-Phosphoribosyl-1-(Beta-Methylene) Pyrophosphate
1XKL   STH         NESG      2-Amino-4h-1,3-Benzoxathiin-4-Ol
1BVR   THT         TBSGC     Trans-2-Hexadecenoyl-(N-Acetyl-Cysteamine)-Thioester
1LW4   TLP         NYSGXRC   3-Hydroxy-2-[(3-Hydroxy-2-Methyl-5-Phosphonooxymethyl-Pyridin-4-Ylmethyl)-Amino]-Butyric Acid
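Table 2 lists 27 PDB entries but only 25 distinct ligand IDs, because 10A and PAJ each appear in two structures. The sketch below shows one way to reproduce that count and to tally the distinct ligands per PSI center. The row data are transcribed from Table 2; the names `ROWS`, `unique_ligands`, and `ligands_per_center` are illustrative assumptions and are not part of the JCSG Ligand Search Server.

```python
from collections import Counter

# (PDB ID, ligand ID, PSI center) rows transcribed from Table 2.
ROWS = [
    ("1KPH", "10A", "TBSGC"), ("1KPI", "10A", "TBSGC"),
    ("1Z2L", "1AL", "NYSGXRC"), ("1M33", "3OH", "MCSG"),
    ("1VR0", "3SL", "JCSG"), ("1Y0G", "8PP", "NYSGXRC"),
    ("1O8B", "ABF", "MCSG"), ("1TUF", "AZ1", "NYSGXRC"),
    ("1Y80", "B1M", "SECSG"), ("2B4B", "B33", "NYSGXRC"),
    ("2A3L", "CF5", "CESG"), ("2Q09", "DI6", "NYSGXRC"),
    ("2OSU", "DON", "MCSG"), ("2NW9", "FT6", "NESG"),
    ("1P44", "GEQ", "TBSGC"), ("2OU3", "I3A", "JCSG"),
    ("1X92", "M7P", "MCSG"), ("2GVC", "MMZ", "NYSGXRC"),
    ("1RTW", "MP5", "NESG"), ("2PUZ", "NIG", "NYSGXRC"),
    ("2OD6", "OHA", "JCSG"), ("1N2H", "PAJ", "TBSGC"),
    ("1N2I", "PAJ", "TBSGC"), ("1QPR", "PPC", "TBSGC"),
    ("1XKL", "STH", "NESG"), ("1BVR", "THT", "TBSGC"),
    ("1LW4", "TLP", "NYSGXRC"),
]


def unique_ligands(rows):
    """Return the sorted list of distinct ligand IDs."""
    return sorted({ligand for _, ligand, _ in rows})


def ligands_per_center(rows):
    """Count distinct ligand IDs contributed by each center."""
    pairs = {(ligand, center) for _, ligand, center in rows}
    return Counter(center for _, center in pairs)


if __name__ == "__main__":
    print(len(ROWS), "entries;", len(unique_ligands(ROWS)), "distinct ligands")
    for center, count in ligands_per_center(ROWS).most_common():
        print(center, count)
```

Deduplicating on the (ligand, center) pair rather than on the PDB ID keeps a ligand solved twice by the same center from being counted twice.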

Table 3: Examples of unique ligands

(R)-2-Hydroxy-3-sulfopropanoic acid (3SL) bound to the putative 2-phosphosulfolactate phosphatase from Clostridium acetobutylicum (1VR0)

Indole-3-carboxaldehyde (I3A) bound to the tellurite resistance protein of COG3793 (ZP_00109916.1) from Nostoc punctiforme PCC 73102 (2OU3)

10-Oxohexadecanoic acid (OHA) bound to a ferredoxin-like protein (JCVI_PEP_1096682647733) from an environmental metagenome (unidentified marine microbe) (2OD6)

Unknown Ligands (UNL)

FB8805A (2Q9K): protein of unknown function

FK9436A (2OH1): acetyltransferase, GNAT family

Table 4: Frequency of use of cryo-protectant agents

Cryo      # of times used   # of times observed in structure
EDO       230               179 (77.8 %)
GOL       213               93 (43.7 %)
MPD       66                23 (34.8 %)
PEG200    56                18 (32.0 %)
PEG400    35                10 (28.6 %)
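The percentages in Table 4 are the number of times a cryo-protectant was observed in the structure divided by the number of times it was used. A minimal sketch of that arithmetic follows, using the counts from the table; the names `CRYO_COUNTS`, `observed_percent`, and `frequency_table` are hypothetical. Recomputing from the counts gives 32.1% for PEG200, slightly different from the tabulated 32.0%; the other values agree to one decimal place.

```python
# (times used, times observed in structure) transcribed from Table 4.
CRYO_COUNTS = {
    "EDO": (230, 179),
    "GOL": (213, 93),
    "MPD": (66, 23),
    "PEG200": (56, 18),
    "PEG400": (35, 10),
}


def observed_percent(used, observed):
    """Percentage of uses in which the agent was seen in the structure."""
    if used <= 0:
        raise ValueError("'used' must be a positive count")
    return 100.0 * observed / used


def frequency_table(counts):
    """Map each agent to its observation percentage, rounded to 0.1."""
    return {
        name: round(observed_percent(used, observed), 1)
        for name, (used, observed) in counts.items()
    }


if __name__ == "__main__":
    for name, pct in frequency_table(CRYO_COUNTS).items():
        print(f"{name:7s} {pct:5.1f} %")
```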

Exploring Binding Modes of Ligands

We have begun to explore ligand binding modes in cases where many structures with a given bound ligand are available. For example, FMN is bound in over 340 structures in the RCSB PDB. The co-factor displays considerable variation in binding mode because of the torsional flexibility of the molecule, as shown in the figure below.

However, unique binding modes can be observed in proteins belonging to specific PFAM families.

Binding Modes in various PFAMs

PFAM                                    PSI   Non-PSI   Total
PF01070 (FMN-dependent dehydrogenase)   0     8         8
PF00881 (DHOdehase)                     9     8         17
PF00258 (Flavodoxin_1)                  3     13        16
PF00724 (Oxidored._FMN)                 2     8         10
PF01613 (Flavin reductase-like)         2     7         9
PF01180 (Nitroreductase)                1     8         9
PF01243 (Pyridox._oxidase)              7     14        21

[Superposed structures for each family: images]
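As a quick consistency check on the per-family counts above, the sketch below confirms that each Total equals PSI plus Non-PSI and sums the columns across the seven families. The numbers are transcribed from the table; `FMN_FAMILIES`, `inconsistent_rows`, and `column_sums` are illustrative names, not part of any existing tool.

```python
# PFAM accession -> (PSI, Non-PSI, Total) FMN-bound structures, from the table.
FMN_FAMILIES = {
    "PF01070": (0, 8, 8),
    "PF00881": (9, 8, 17),
    "PF00258": (3, 13, 16),
    "PF00724": (2, 8, 10),
    "PF01613": (2, 7, 9),
    "PF01180": (1, 8, 9),
    "PF01243": (7, 14, 21),
}


def inconsistent_rows(table):
    """Return accessions whose Total differs from PSI + Non-PSI."""
    return [
        accession
        for accession, (psi, non_psi, total) in table.items()
        if psi + non_psi != total
    ]


def column_sums(table):
    """Return (PSI, Non-PSI, Total) summed over all families."""
    psi = sum(row[0] for row in table.values())
    non_psi = sum(row[1] for row in table.values())
    total = sum(row[2] for row in table.values())
    return psi, non_psi, total


if __name__ == "__main__":
    print("Inconsistent rows:", inconsistent_rows(FMN_FAMILIES))
    print("PSI / Non-PSI / Total:", column_sums(FMN_FAMILIES))
```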


Ligands (269 structures; 140 different ligands): UNL 70, UNX 22, LLP 6, SIN 6, NDP 6, MA7 6, NAG 5, PLM 4, UNK 4, GUN 3, APC 3, SUC 3, BAL 3, GLC 3, PAF 3, APR 2, GAL 2, NCN 2, CSD 2, SAI 2, CEI 2, BIO 2, HMH 2, SAP 2, GNP 2, 144 2, NCA 2, G4P 2, MPO 2, SRT 2, ANP 2, PCP 2, BGC 2, PAJ 2, NIG 1, PRP 1, NIO 1, ABF 1, IPR 1, MTA 1, CP 1, MLT 1, DI6 1, MED 1, MLZ 1, 5GP 1, CSO 1, CDP 1, I3A 1, 2PL 1, HED 1, G1P 1, NBZ 1, CSY 1, FRU 1, PLG 1, THF 1, B1M 1, ACP 1, DU 1, MMZ 1, OHA 1, 16A 1, THT 1, M7P 1, 3GC 1, CF5 1, PEO 1, CTZ 1, ADE 1, FT6 1, KEG 1, LUM 1, XLS 1, BAM 1, ADN 1, PMP 1, ADQ 1, B33 1, DGI 1, G3H 1, OXG 1, NDS 1, SAL 1, 3SL 1, SIB 1, STH 1, FEO 1, G3P 1, OXN 1, FES 1, TYD 1, DGT 1, 8PP 1, CO2 1, MP5 1, NTM 1, PNS 1, AES 1, APK 1, UVW 1, TRE 1, PYR 1, NAI 1, TCL 1, NMN 1, MAN 1, BFD 1, HHP 1, RIP 1, RBF 1, ORO 1, SNN 1, DTP 1, ZID 1, DEP 1, UPG 1, HXA 1, AAT 1, DTY 1, DON 1, NPO 1, C2E 1, AGC 1, BDF 1, PHT 1, OSB 1, NVA 1, CRO 1, BDN 1, TNE 1, SOG 1, AGS 1, TLP 1, 1PS 1, DUT 1, CXS 1, GEQ 1, MRD 1, G6P 1

Co-factors (211 structures; 21 different co-factors): FMN 36, NAD 29, COA 18, NAP 17, PLP 15, ADP 15, FAD 15, SAM 14, ATP 9, SAH 9, AMP 9, HEM 8, ACO 7, GDP 4, FS4 3, U5P 2, MLC 1, COD 1, CNC 1, UTP 1, CTP 1

Metal Ions (647 structures; 30 different metal ions): MG 177, ZN 174, NA 102, CA 83, NI 40, MN 31, FE 26, K 16, FE2 9, CD 8, PT 8, HG 7, CO 5, SM 2, WO4 2, PR 2, AU 2, BA 1, CS 1, MW2 1, SE 1, ARS 1, ZN3 1, O4M 1, YT3 1, LI 1, MO2 1, MO3 1, VO4 1, MO6 1

Non-metal Ions: SO4 324, CL 243, PO4 118, NO3 11, IOD 10, BR 10, SCN 8, CO3 4, CAC 4, POP 3, AZI 3, SUL 2, BCT 2, ALF 2, OXL 2, PER 1, SO3 1, MLI 1, PO3 1, THJ 1, 1AL 1, NH4 1

Organics (90 structures; 26 different organics): IPA 14, EOH 13, BME 9, BEZ 5, TLA 5, SEO 5, AKG 5, ETX 4, TAR 4, PGO 4, DTT 4, OAA 2, ACE 2, DMS 2, MLA 1, DOX 1, XYL 1, MOH 1, 3OH 1, AZ1 1, PPI 1, IOH 1, FOR 1, MYR 1, GTT 1, LMT 1

Buffers (240 structures; 15 different buffers): ACT 86, ACY 47, FMT 37, CIT 27, TRS 16, EPE 15, MES 12, IMD 8, TMN 2, 10A 2, BTB 2, ICT 1, CPS 1, FLC 1, NHE 1

Precipitants: PEG 38, PG4 28, PGE 16, 1PE 8, P6G 7, 2PE 3, PE4 3, P33 3, PE5 2, PEF 1, BU3 1, 1PG 1, PE8 1

Salts (3 structures; 3 different salts): DPO 1, AF3 1, PPC 1

Detergents (2 structures; 1 detergent): BOG 2

Cryos (502 structures; 5 different cryos): GOL 244, EDO 241, MPD 32, EGL 3, CRY 2
