Ligand search and data mining of Structural Genomics Structures
JCSG Ligand Search Server
More than 2800 structures have been deposited into the PDB by the PSI centers as of March 18, 2008, of which the JCSG has contributed over 590 structures. The JCSG Ligand Server provides a tool to query these structures by a variety of search criteria and enables the dissemination of the information gained from these structures to a larger community of researchers. The main objective of this server is to extract ligands, biological or otherwise, bound to the structures, and to explore them further with a number of associated links. In addition, the structures can be queried by a host of other criteria, such as target names, PDB IDs, PFAM family names, structure descriptions, organisms, and PSI centers.
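The ligand-extraction step described above can be illustrated with a minimal sketch. It assumes the legacy fixed-column PDB file format, in which bound non-polymer groups are stored in HETATM records with the residue name in columns 18-20. The names `make_hetatm` and `extract_het_ids` and the demo records are hypothetical, and this is not the JCSG server's actual implementation.

```python
from collections import Counter


def make_hetatm(serial, atom, res_name, chain, res_seq):
    """Build the first 26 columns of a HETATM record (hypothetical demo helper)."""
    return f"HETATM{serial:5d} {atom:<4} {res_name:>3} {chain}{res_seq:4d}"


def extract_het_ids(pdb_text, skip=("HOH",)):
    """Count bound groups per residue name, skipping names listed in `skip`."""
    groups = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()  # columns 18-20: residue (ligand) name
        chain_id = line[21:22]          # column 22: chain identifier
        res_seq = line[22:26].strip()   # columns 23-26: residue sequence number
        if res_name not in skip:
            groups.add((res_name, chain_id, res_seq))
    # One count per distinct (name, chain, number) group, not per atom.
    return Counter(name for name, _, _ in groups)


# Made-up records: EDO in chains A and B, one MG ion, one water molecule.
demo = "\n".join([
    make_hetatm(1, "C1", "EDO", "A", 301),
    make_hetatm(2, "C2", "EDO", "A", 301),
    make_hetatm(3, "MG", "MG", "A", 302),
    make_hetatm(4, "O", "HOH", "A", 401),
    make_hetatm(5, "C1", "EDO", "B", 301),
])
print(extract_het_ids(demo))
```

Water (HOH) is skipped by default here; which residue names count as solvent rather than ligand is a choice left to the caller.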
A preliminary analysis indicates that 1706 of these PSI structures contain some type of bound ligand, metal, or solvent molecule, and that 283 of them contain 147 unique biological ligands (Table 1). The distribution of ligands is depicted in Figure 1. Additionally, 25 ligands were observed for the first time in PSI structures (Tables 2, 3). We also studied how often cryo-protectant agents appear in the structures (Table 4). Ethylene glycol (EDO) was observed in 179 of the 230 structures in which it was used (77.8%).
Table 1: Summary of Ligands found in PSI structures
Type          PSI structures  PSI ligands  JCSG structures  JCSG ligands
Ligand                   283          147               89            25
Co-factors               217           21               55            19
Metals                   689           30              162            11
Non-metals               734           22              247            12
Organics                  92           26               27             6
Buffers                  259           15               97            10
Precipitants             109           13               63             9
Cryos                    539            5              280             3
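A summary of this kind could be produced by grouping ligand codes into categories and counting structures and distinct codes per category. The sketch below is only an illustration: the category assignments for the few codes shown follow the poster's own groupings, but treating every unlisted code as a biological "Ligand", the function name `summarize`, and the structure identifiers `s1` to `s3` are assumptions.

```python
from collections import defaultdict

# Partial lookup table; codes not listed here fall back to "Ligand" (assumption).
CATEGORY_BY_CODE = {
    "FMN": "Co-factors", "NAD": "Co-factors", "PLP": "Co-factors",
    "MG": "Metals", "ZN": "Metals", "CA": "Metals",
    "SO4": "Non-metals", "CL": "Non-metals", "PO4": "Non-metals",
    "IPA": "Organics", "EOH": "Organics",
    "ACT": "Buffers", "TRS": "Buffers",
    "PEG": "Precipitants", "PG4": "Precipitants",
    "GOL": "Cryos", "EDO": "Cryos", "MPD": "Cryos",
}


def summarize(structures):
    """Map category -> (number of structures, number of distinct ligand codes)."""
    structure_ids = defaultdict(set)
    codes = defaultdict(set)
    for structure_id, bound_codes in structures.items():
        for code in bound_codes:
            category = CATEGORY_BY_CODE.get(code, "Ligand")
            structure_ids[category].add(structure_id)
            codes[category].add(code)
    return {c: (len(structure_ids[c]), len(codes[c])) for c in structure_ids}


# Hypothetical input: structure identifier -> ligand codes found in it.
example = {
    "s1": ["FMN", "EDO", "SO4"],
    "s2": ["EDO", "GOL", "3SL"],
    "s3": ["MG"],
}
for category, (n_structures, n_codes) in sorted(summarize(example).items()):
    print(f"{category:12s} {n_structures:3d} structures, {n_codes:3d} codes")
```

Because one structure can contain ligands from several categories, the per-category structure counts need not add up to the total number of structures.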
Figure 1: Distribution of ligands
Table 2: Unique Ligands (25) found in PSI structures
PDB   Ligand code  Center   Ligand name
1KPH  10A          TBSGC    Didecyl-Dimethyl-Ammonium
1KPI  10A          TBSGC    Didecyl-Dimethyl-Ammonium
1Z2L  1AL          NYSGXRC  Allantoate Ion
1M33  3OH          MCSG     3-Hydroxy-Propanoic Acid
1VR0  3SL          JCSG     (2r)-3-Sulfolactic Acid
1Y0G  8PP          NYSGXRC  2-[(2e,6e,10e,14e,18e,22e,26e)-3,7,11,15,19,23,27,31-Octamethyldotriaconta-2,6,10,14,18,22,26,30-Octaenyl]Phenol
1O8B  ABF          MCSG     Beta-D-Arabinofuranose-5'-Phosphate
1TUF  AZ1          NYSGXRC  Azelaic Acid
1Y80  B1M          SECSG    Co-5-Methoxybenzimidazolylcobamide
2B4B  B33          NYSGXRC  N-Ethyl-N-[3-(Propylamino)Propyl]Propane-1,3-Diamine
2A3L  CF5          CESG     Coformycin 5'-Phosphate
2Q09  DI6          NYSGXRC  3-[(4s)-2,5-Dioxoimidazolidin-4-Yl]Propanoic Acid
2OSU  DON          MCSG     6-Diazenyl-5-Oxo-L-Norleucine
2NW9  FT6          NESG     6-Fluoro-L-Tryptophan
1P44  GEQ          TBSGC    5-{[4-(9h-Fluoren-9-Yl)Piperazin-1-Yl]Carbonyl}-1h-Indole
2OU3  I3A          JCSG     1h-Indole-3-Carbaldehyde
1X92  M7P          MCSG     D-Glycero-D-Mannopyranose-7-Phosphate
2GVC  MMZ          NYSGXRC  1-Methyl-1,3-Dihydro-2h-Imidazole-2-Thione
1RTW  MP5          NESG     (4-Amino-2-Methylpyrimidin-5-Yl)Methyl Dihydrogen Phosphate
2PUZ  NIG          NYSGXRC  N-(Iminomethyl)-L-Glutamic Acid
2OD6  OHA          JCSG     10-Oxohexadecanoic Acid
1N2H  PAJ          TBSGC    Pantoyl Adenylate
1N2I  PAJ          TBSGC    Pantoyl Adenylate
1QPR  PPC          TBSGC    5-Phosphoribosyl-1-(Beta-Methylene) Pyrophosphate
1XKL  STH          NESG     2-Amino-4h-1,3-Benzoxathiin-4-Ol
1BVR  THT          TBSGC    Trans-2-Hexadecenoyl-(N-Acetyl-Cysteamine)-Thioester
1LW4  TLP          NYSGXRC  3-Hydroxy-2-[(3-Hydroxy-2-Methyl-5-Phosphonooxymethyl-Pyridin-4-Ylmethyl)-Amino]-Butyric Acid
Table 3: Examples of unique ligands
(R)-2-Hydroxy-3-sulfopropanoic acid (3SL) bound to the putative 2-phosphosulfolactate phosphatase from Clostridium acetobutylicum (1VR0)
Indole-3-carboxaldehyde (I3A) bound to tellurite resistance protein of COG3793 (ZP_00109916.1) from Nostoc punctiforme PCC 73102 (2OU3)
10-Oxohexadecanoic acid (OHA) bound to ferredoxin-like protein (JCVI_PEP_1096682647733) from an environmental metagenome (unidentified marine microbe) (2OD6)
Unknown Ligands (UNL)
FB8805A (2Q9K) Protein of unknown function
FK9436A (2OH1) Acetyltransferase Gnat family
Table 4: Frequency of use of cryo-protectant agents
Cryo    # of times used  # of times observed in structure
EDO                 230  179 (77.8 %)
GOL                 213   93 (43.7 %)
MPD                  66   23 (34.8 %)
PEG200               56   18 (32.0 %)
PEG400               35   10 (28.6 %)
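The percentages in Table 4 are the ratio of the number of structures in which a cryo-protectant was observed to the number of times it was used. As a small worked example, the sketch below recomputes them from the tabulated counts; the names `cryo_counts` and `observed_percent` are illustrative only.

```python
# (times used, times observed in structure), copied from Table 4
cryo_counts = {
    "EDO": (230, 179),
    "GOL": (213, 93),
    "MPD": (66, 23),
    "PEG200": (56, 18),
    "PEG400": (35, 10),
}


def observed_percent(used, observed):
    """Percentage of uses in which the agent was seen bound in the structure."""
    return 100.0 * observed / used


for name, (used, observed) in cryo_counts.items():
    pct = observed_percent(used, observed)
    print(f"{name:7s} {observed:3d}/{used:3d} = {pct:.1f} %")
```

Recomputing from the counts gives 32.1 % for PEG200 (18/56), slightly different from the tabulated 32.0 %; the other rows agree with the table to one decimal place.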
Exploring Binding Modes of Ligands
We have begun to explore ligand binding modes in cases where a large number of structures with a given bound ligand are available. For example, FMN appears bound in over 340 structures in the RCSB PDB. The co-factor displays considerable variation in binding mode due to the torsional flexibility of the molecule, as shown in the figure below.
However, distinct binding modes can be observed in proteins belonging to specific PFAM families.
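One common way to quantify torsional flexibility is to compare dihedral (torsion) angles about rotatable bonds across bound copies of the same ligand. The sketch below computes a dihedral angle from four atom positions using the standard cross-product formulation; it is not necessarily the method used for this analysis, the function name `dihedral_deg` is illustrative, and the coordinates are made up rather than taken from any FMN structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees (range -180 to 180) about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1) / b1_len
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the first three atoms are fixed, the fourth is rotated.
p0, p1, p2 = (1, 0, 0), (0, 0, 0), (0, 0, 1)
for label, p3 in [("cis", (1, 0, 1)), ("perpendicular", (0, 1, 1)), ("trans", (-1, 0, 1))]:
    print(label, dihedral_deg(p0, p1, p2, p3))
```

Collecting such angles for each rotatable bond across many structures gives a simple numerical profile of how much a ligand's conformation varies. The result is not meaningful when three consecutive atoms are collinear.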
Binding Modes in various PFAMs
PFAM     Family                       PSI  Non-PSI  Total  Superposed structures
PF01070  FMN-dependent dehydrogenase    0        8      8  [image]
PF00881  DHOdehase                      9        8     17  [image]
PF00258  Flavodoxin_1                   3       13     16  [image]
PF00724  Oxidored._FMN                  2        8     10  [image]
PF01613  Flavin reductase-like          2        7      9  [image]
PF01180  Nitroreductase                 1        8      9  [image]
PF01243  Pyridox._oxidase               7       14     21  [image]
[Figure 1 panels (bar charts of counts per ligand code): Ligands (callouts: MPO, GAL, NDP), Co-factors (callouts: FS4, PLP, FMN), Metal Ions, Non-metal Ions, Buffers, Precipitants]
Ligand counts by category (ligand code followed by its count)
Ligands: 269 structures; 140 different ligands
UNL 70, UNX 22, LLP 6, SIN 6, NDP 6, MA7 6, NAG 5, PLM 4, UNK 4, GUN 3, APC 3, SUC 3, BAL 3, GLC 3, PAF 3, APR 2, GAL 2, NCN 2, CSD 2, SAI 2, CEI 2, BIO 2, HMH 2, SAP 2, GNP 2, 144 2, NCA 2, G4P 2, MPO 2, SRT 2, ANP 2, PCP 2, BGC 2, PAJ 2, NIG 1, PRP 1, NIO 1, ABF 1, IPR 1, MTA 1, CP 1, MLT 1, DI6 1, MED 1, MLZ 1, 5GP 1, CSO 1, CDP 1, I3A 1, 2PL 1, HED 1, G1P 1, NBZ 1, CSY 1, FRU 1, PLG 1, THF 1, B1M 1, ACP 1, DU 1, MMZ 1, OHA 1, 16A 1, THT 1, M7P 1, 3GC 1, CF5 1, PEO 1, CTZ 1, ADE 1, FT6 1, KEG 1, LUM 1, XLS 1, BAM 1, ADN 1, PMP 1, ADQ 1, B33 1, DGI 1, G3H 1, OXG 1, NDS 1, SAL 1, 3SL 1, SIB 1, STH 1, FEO 1, G3P 1, OXN 1, FES 1, TYD 1, DGT 1, 8PP 1, CO2 1, MP5 1, NTM 1, PNS 1, AES 1, APK 1, UVW 1, TRE 1, PYR 1, NAI 1, TCL 1, NMN 1, MAN 1, BFD 1, HHP 1, RIP 1, RBF 1, ORO 1, SNN 1, DTP 1, ZID 1, DEP 1, UPG 1, HXA 1, AAT 1, DTY 1, DON 1, NPO 1, C2E 1, AGC 1, BDF 1, PHT 1, OSB 1, NVA 1, CRO 1, BDN 1, TNE 1, SOG 1, AGS 1, TLP 1, 1PS 1, DUT 1, CXS 1, GEQ 1, MRD 1, G6P 1
Co-factors: 211 structures; 21 different co-factors
FMN 36, NAD 29, COA 18, NAP 17, PLP 15, ADP 15, FAD 15, SAM 14, ATP 9, SAH 9, AMP 9, HEM 8, ACO 7, GDP 4, FS4 3, U5P 2, MLC 1, COD 1, CNC 1, UTP 1, CTP 1
Metal Ions: 647 structures; 30 different metal ions
MG 177, ZN 174, NA 102, CA 83, NI 40, MN 31, FE 26, K 16, FE2 9, CD 8, PT 8, HG 7, CO 5, SM 2, WO4 2, PR 2, AU 2, BA 1, CS 1, MW2 1, SE 1, ARS 1, ZN3 1, O4M 1, YT3 1, LI 1, MO2 1, MO3 1, VO4 1, MO6 1
Non-metal Ions
SO4 324, CL 243, PO4 118, NO3 11, IOD 10, BR 10, SCN 8, CO3 4, CAC 4, POP 3, AZI 3, SUL 2, BCT 2, ALF 2, OXL 2, PER 1, SO3 1, MLI 1, PO3 1, THJ 1, 1AL 1, NH4 1
Organics: 90 structures; 26 different organics
IPA 14, EOH 13, BME 9, BEZ 5, TLA 5, SEO 5, AKG 5, ETX 4, TAR 4, PGO 4, DTT 4, OAA 2, ACE 2, DMS 2, MLA 1, DOX 1, XYL 1, MOH 1, 3OH 1, AZ1 1, PPI 1, IOH 1, FOR 1, MYR 1, GTT 1, LMT 1
Buffers: 240 structures; 15 different buffers
ACT 86, ACY 47, FMT 37, CIT 27, TRS 16, EPE 15, MES 12, IMD 8, TMN 2, 10A 2, BTB 2, ICT 1, CPS 1, FLC 1, NHE 1
Precipitants
PEG 38, PG4 28, PGE 16, 1PE 8, P6G 7, 2PE 3, PE4 3, P33 3, PE5 2, PEF 1, BU3 1, 1PG 1, PE8 1
Salts: 3 structures; 3 different salts
DPO 1, AF3 1, PPC 1
Detergents: 2 structures; 1 detergent
BOG 2
Cryos: 502 structures; 5 different cryos
GOL 244, EDO 241, MPD 32, EGL 3, CRY 2