SDF file analysis: creation, composition, checking
Uploaded by: blaise-greer
Concerning chemical table files
• Chemical table files are files that contain information about chemicals
• Various formats: Molfile, SDF (SDfile), RGfiles, Rxnfiles, RDfiles, XDfiles and Clipboard
MDL Molfile
• A file format for holding information about the atoms, bonds, connectivity and coordinates of a molecule
• Most cheminformatics packages and some computational chemistry programs can read it
• Standard version: V2000
• Contains a header and a connection table
MDL Molfile content
Generated by Molgen 5.0
11 9 0 0 0 0
-0.0666 -1.5989 0.0514 C 0 0 0 0 0 0 0 0 0 0 0 0 0
1.2913 -1.6184 -0.1221 C 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.9621 -1.2620 -0.9586 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.0783 1.8974 -0.4702 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.4844 1.6346 0.9333 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.5244 -1.8601 1.0528 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.7535 -1.3543 -1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.9833 -1.8974 0.7324 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.9833 -1.2177 -0.8648 H 0 0 0 0 0 0 0 0 0 0 0 0 0
0.8090 1.5332 -0.8167 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.3677 1.1615 1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 6 1 0 0 0 0
2 7 1 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
4 5 1 0 0 0 0
4 10 1 0 0 0 0
5 11 1 0 0 0 0
M END
$$$$
1-3 Header
  1 Molecule name
  2 User/Program/Date/etc. information
  3 Comment (blank)
4-25 Connection table (Ctab)
  4 Counts line: 11 atoms, 9 bonds, ..., V2000 standard
  5-15 Atom block (one line per atom): x, y, z, element, etc.
  16-24 Bond block (one line per bond): 1st atom, 2nd atom, bond type, etc.
  25 M END
26 $$$$ record delimiter (SDF only)
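The role of the `$$$$` delimiter can be illustrated with a minimal Python sketch that splits SD file text into records and then cuts each record after its `M END` line. The function names and the two-record `demo` string are hypothetical and only serve the illustration; they are not taken from the files discussed here.

```python
def split_sdf_records(text):
    """Split SD file text into records; each record ends at a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep trailing lines without a closing delimiter only if they hold content.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


def split_record(record):
    """Return (molfile_lines, data_lines), cutting after the first 'M END' line."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.split() == ["M", "END"]:
            return lines[: i + 1], lines[i + 1:]
    return lines, []


# Hypothetical two-record file (empty connection tables, one data item).
demo = (
    "mol A\n prog\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    "> <Charge>\n0\n\n$$$$\n"
    "mol B\n prog\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
)
records = split_sdf_records(demo)
mol_a, data_a = split_record(records[0])
mol_b, data_b = split_record(records[1])
```

A plain Molfile has no `$$$$` line, so the same `split_sdf_records` call returns it as a single record.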
SDF content §1 – molecular information
./MinCheck/C2_H6_N0_O3_F0_S0_1.log
OpenBabel04161413273D
Gaussian 09 # G3MP2B3 Opt(Cartesian,Tight,CalcAll,MaxStep=1,MaxCycles=300) QCISD
11 9 0 0 0 0 0 0 0 0999 V2000
0.4466 -1.5390 0.0292 C 0 0 0 0 0 0 0 0 0 0 0 0
1.4790 -2.1676 -0.5273 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2693 -0.5704 -0.6322 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.3941 2.0659 0.3307 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.5836 1.3451 0.7668 O 0 0 0 0 0 0 0 0 0 0 0 0
0.1141 -1.7508 1.0446 H 0 0 0 0 0 0 0 0 0 0 0 0
1.7979 -1.9482 -1.5413 H 0 0 0 0 0 0 0 0 0 0 0 0
2.0238 -2.9170 0.0345 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.0239 -0.2837 -0.0806 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0506 1.3459 -0.1697 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.2708 1.8377 0.2828 H 0 0 0 0 0 0 0 0 0 0 0 0
1 6 1 0 0 0 0
2 1 2 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
3 1 1 0 0 0 0
4 5 1 0 0 0 0
7 2 1 0 0 0 0
10 4 1 0 0 0 0
11 5 1 0 0 0 0
M END
1-3 Header
  1 Filename
  2 User/Program/Date/etc. information
  3 Command
4-25 Connection table (Ctab)
  4 Counts line: 11 atoms, 9 bonds, ..., V2000 standard
  5-15 Atom block (one line per atom): x, y, z, element, etc.
  16-24 Bond block (one line per bond): 1st atom, 2nd atom, bond type, etc.
  25 M END
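The line layout above can be checked with a short Python sketch that reads the counts line and then slices the atom and bond blocks accordingly. It uses the record shown on this slide; the placement of the three header lines follows the annotation (filename, program, command). Splitting on whitespace is an assumption made because the transcript lost the column alignment: real V2000 files are fixed-width and are more safely sliced by column. `parse_ctab` is a hypothetical helper name.

```python
from collections import Counter

MOLBLOCK = """\
./MinCheck/C2_H6_N0_O3_F0_S0_1.log
OpenBabel04161413273D
Gaussian 09 # G3MP2B3 Opt(Cartesian,Tight,CalcAll,MaxStep=1,MaxCycles=300) QCISD
11 9 0 0 0 0 0 0 0 0999 V2000
0.4466 -1.5390 0.0292 C 0 0 0 0 0 0 0 0 0 0 0 0
1.4790 -2.1676 -0.5273 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2693 -0.5704 -0.6322 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.3941 2.0659 0.3307 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.5836 1.3451 0.7668 O 0 0 0 0 0 0 0 0 0 0 0 0
0.1141 -1.7508 1.0446 H 0 0 0 0 0 0 0 0 0 0 0 0
1.7979 -1.9482 -1.5413 H 0 0 0 0 0 0 0 0 0 0 0 0
2.0238 -2.9170 0.0345 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.0239 -0.2837 -0.0806 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0506 1.3459 -0.1697 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.2708 1.8377 0.2828 H 0 0 0 0 0 0 0 0 0 0 0 0
1 6 1 0 0 0 0
2 1 2 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
3 1 1 0 0 0 0
4 5 1 0 0 0 0
7 2 1 0 0 0 0
10 4 1 0 0 0 0
11 5 1 0 0 0 0
M END
"""


def parse_ctab(molblock):
    """Parse header, counts line, atom block and bond block of a V2000 record."""
    lines = molblock.splitlines()
    counts = lines[3].split()                  # line 4: counts line
    n_atoms, n_bonds = int(counts[0]), int(counts[1])
    atom_lines = lines[4:4 + n_atoms]          # lines 5 .. 4 + n_atoms
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    atoms = []
    for line in atom_lines:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    # Each bond is (first atom, second atom, bond type), atoms numbered from 1.
    bonds = [tuple(int(v) for v in line.split()[:3]) for line in bond_lines]
    end_index = 4 + n_atoms + n_bonds
    if lines[end_index].split() != ["M", "END"]:
        raise ValueError("expected 'M END' after the bond block")
    return {
        "header": lines[:3],
        "version": counts[-1],
        "atoms": atoms,
        "bonds": bonds,
        "end_line": end_index + 1,             # 1-based line number of M END
    }


ctab = parse_ctab(MOLBLOCK)
composition = Counter(symbol for symbol, *_ in ctab["atoms"])
```

With 11 atoms and 9 bonds the bond block occupies lines 16-24, so `ctab["end_line"]` comes out as 25, and `composition` can be compared with the stoichiometry stored in the data items.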
SDF content §2 – input and calculated parameters
> <Scale factor> 0.96
> <Stoichiometry> C2H6O3
> <Charge> 0
> <Multiplicity> 1
> <Molecular mass> 78.03169
> <DegreeOfFreedom> 27
> <Permanent dipole moment(B3LYP, Debye)> 1.475
> <ABC(cm-1)> 14.133 1.731 1.655
> <Scaled freq(cm-1)> 49.1 59.1 80.1 182.8 222.6 335.5 460.0 529.6 663.0 762.0 812.3 911.3 928.1 944.3 1124.8 1287.3 1299.6 1321.8 1403.2 1483.7 1689.2 3041.9 3064.2 3147.0 3408.9 3472.7 3557.0
> <IR intensities(rel.)> 4.5 3.8 6.6 7.8 25.1 93.3 16.9 79.8 60.8 214.2 73.0 2.9 55.0 16.5 33.8 210.3 56.9 126.8 4.4 22.8 90.0 19.2 0.4 8.3 59.4 559.4 26.8
> <Temp(K)> 298.150
> <Pressure(atm)> 1.00000
> <DfHg_G3MP2B3(kJ/mol)> -269.7
> <Scaled S(J/molK)> 363.4
> <UNScaled CV(J/molK)> 98.9
Scale factor, Stoichiometry, Charge, Multiplicity, Molecular mass, DegreeOfFreedom, Permanent dipole moment, ABC(cm-1), Scaled freq(cm-1), IR intensities(rel.), Temp(K), Pressure(atm), DfHg_G3MP2B3(kJ/mol), Scaled S(J/molK), UNScaled CV(J/molK)
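Several of these data items can be cross-checked against each other. The Python sketch below is a minimal illustration under stated assumptions: it expects the usual SD file layout, in which a `> <name>` header line is followed by the value on the next line(s) and a blank line (the transcript shows name and value joined on one line); it treats the molecule as nonlinear, so that 3N - 6 vibrational modes are expected; and it uses monoisotopic masses for C, H and O. The helper names are hypothetical.

```python
import re

SDF_DATA = """\
> <Stoichiometry>
C2H6O3

> <Molecular mass>
78.03169

> <DegreeOfFreedom>
27

> <Scaled freq(cm-1)>
49.1 59.1 80.1 182.8 222.6 335.5 460.0 529.6 663.0 762.0 812.3 911.3 928.1 944.3 1124.8 1287.3 1299.6 1321.8 1403.2 1483.7 1689.2 3041.9 3064.2 3147.0 3408.9 3472.7 3557.0

"""

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def parse_data_items(text):
    """Collect '> <name>' data items into a {name: value} dictionary."""
    items, name = {}, None
    for line in text.splitlines():
        header = re.match(r">\s*<([^>]+)>", line)
        if header:
            name = header.group(1)
            items[name] = []
        elif name is not None and line.strip():
            items[name].append(line.strip())
        else:
            name = None                       # a blank line closes the item
    return {key: " ".join(value) for key, value in items.items()}


def parse_formula(formula):
    """Turn 'C2H6O3' into {'C': 2, 'H': 6, 'O': 3}."""
    return {el: int(n or 1) for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula)}


items = parse_data_items(SDF_DATA)
formula = parse_formula(items["Stoichiometry"])
n_atoms = sum(formula.values())
frequencies = [float(v) for v in items["Scaled freq(cm-1)"].split()]
expected_modes = 3 * n_atoms - 6              # assumes a nonlinear molecule
mass = sum(MONOISOTOPIC[el] * n for el, n in formula.items())
```

For this record, `expected_modes`, the `DegreeOfFreedom` item and the number of listed frequencies should all agree, and `mass` should reproduce the stored molecular mass to the printed precision.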
SDF content §3 – molecular descriptors
> <MPD> 2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;
> <MNA> -C(-H(-C)-C(-H-H-C)-O(-H-C))-C(-H(-C)-H(-C)-C(-H-C-O))-O(-H(-O)-C(-H-C-O))-O(-H(-O)-O(-H-O))-O(-H(-O)-O(-H-O))-H(-C(-H-C-O))-H(-C(-H-H-C))-H(-C(-H-H-C))-H(-O(-H-C))-H(-O(-H-O))-H(-O(-H-O))
> <SMI> C(=C)O.OO
> <MolRT> 3
> <InChi> InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H
> <InChiKey> JJZZTHKXWWHOAE-UHFFFAOYSA-N
> <MCDL> CH;CHH;3OH[2,3;;;5]
$$$$
MPD, MNA, SMI, MolRT, InChi, InChiKey, MCDL
Molecular fragment schemes
• Developed in the 1950s
• Screens (structural keys, fingerprints) were developed in the 1970s
• They are generally long strings that can be stored efficiently in compressed form
• They play an important role in:
  providing efficient substructure searching in large chemical databases,
  similarity searching,
  clustering large data sets,
  assessing chemical diversity,
  conducting SAR and QSAR studies
Images of the optimized structure (depicted differently): GaussView, ChemDraw, www.chemicalize.org (looked up by InChI)
MPD (MOLPRINT 2D)
• MPD = MOLPRINT 2D (written by Open Babel as the mpd output format)
• A molecular similarity searching technique based on atom environments
• Atom environments are count vectors of the heavy atoms present at each topological distance from each heavy atom of a molecule
> <MPD> 2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;
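The `<MPD>` value can be unpacked with a few lines of Python. This is a sketch under an assumption: each whitespace-separated entry is read as `centre type;layer-count-neighbour type;...`, which follows the description of Open Babel's MolPrint2D output and is consistent with the bond block of this record (the first carbon has three hydrogens two bonds away, matching `2-3-13`). The meaning of the integer atom-type codes is not stated in the slides and is left uninterpreted. `parse_mpd` is a hypothetical helper name.

```python
from collections import Counter

MPD = (
    "2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; "
    "9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; "
    "8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; "
    "13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; "
    "13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;"
)


def parse_mpd(field):
    """Return one (centre_type, [(layer, count, neighbour_type), ...]) per atom."""
    environments = []
    for entry in field.split():
        parts = [p for p in entry.split(";") if p]
        centre = int(parts[0])
        neighbours = [tuple(int(n) for n in p.split("-")) for p in parts[1:]]
        environments.append((centre, neighbours))
    return environments


environments = parse_mpd(MPD)
centre_types = Counter(centre for centre, _ in environments)
```

Two molecules can then be compared by treating each entry string as a set element and computing, for example, a Tanimoto coefficient over the two sets.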
MNA
• MNA = Multilevel Neighbourhood of Atoms
• 2D molecular fragments suitable for use in QSAR modelling
• Output: a complete descriptor fingerprint per molecule
• Fragment: starting at the origin, each atom is appended to the descriptor, immediately followed by a parenthesized list of its neighbours
> <MNA> -C(-H(-C)-C(-H-H-C)-O(-H-C))-C(-H(-C)-H(-C)-C(-H-C-O))-O(-H(-O)-C(-H-C-O))-O(-H(-O)-O(-H-O))-O(-H(-O)-O(-H-O))-H(-C(-H-C-O))-H(-C(-H-H-C))-H(-C(-H-H-C))-H(-O(-H-C))-H(-O(-H-O))-H(-O(-H-O))
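Because every atom's descriptor is closed by its own parenthesized neighbour list, the `<MNA>` value can be cut into per-atom descriptors by tracking parenthesis depth. The Python sketch below assumes well-balanced parentheses and the flattened, single-string form shown here; `split_mna` is a hypothetical helper name.

```python
from collections import Counter

MNA = (
    "-C(-H(-C)-C(-H-H-C)-O(-H-C))-C(-H(-C)-H(-C)-C(-H-C-O))"
    "-O(-H(-O)-C(-H-C-O))-O(-H(-O)-O(-H-O))-O(-H(-O)-O(-H-O))"
    "-H(-C(-H-C-O))-H(-C(-H-H-C))-H(-C(-H-H-C))"
    "-H(-O(-H-C))-H(-O(-H-O))-H(-O(-H-O))"
)


def split_mna(field):
    """Split a concatenated MNA string into one descriptor per atom."""
    descriptors, depth, start = [], 0, 0
    for i, ch in enumerate(field):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:                    # outermost list closed: atom done
                descriptors.append(field[start:i + 1])
                start = i + 1
    return descriptors


descriptors = split_mna(MNA)
# The origin atom is the text between the leading '-' and the first '('.
origins = Counter(d[1:d.index("(")] for d in descriptors)
```

Counting identical descriptors (for instance with `Counter(descriptors)`) gives a simple fragment-count fingerprint that can be compared between isomers.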
SMILES (SMI)
• SMILES = Simplified Molecular Input Line Entry System
• A linear text format which can describe the connectivity and chirality of a molecule
• Specifically represents a valence model of a molecule, not a computer data structure, a mathematical abstraction, or an "actual substance"
> <SMI> C(=C)O.OO
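The dot in this SMILES string separates two disconnected components. A minimal Python sketch can split them and count the heavy atoms in each. The regular expression is a deliberate simplification that only covers plain organic-subset symbols without brackets, charges or aromatic lowercase atoms, which is enough for this record but not a SMILES parser; a cheminformatics toolkit would be the safer choice for real work.

```python
import re
from collections import Counter

SMI = "C(=C)O.OO"

# Organic-subset element symbols only (simplifying assumption, see lead-in).
ATOM_RE = re.compile(r"Cl|Br|[BCNOPSFI]")


def heavy_atom_counts(smiles):
    """Return one Counter of heavy-atom symbols per dot-separated component."""
    return [Counter(ATOM_RE.findall(part)) for part in smiles.split(".")]


components = SMI.split(".")
counts = heavy_atom_counts(SMI)
total_heavy_atoms = sum(sum(c.values()) for c in counts)
```

Here the two components have 3 and 2 heavy atoms; together with the 6 hydrogens of C2H6O3 this accounts for the 11 atoms in the connection table.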
InChI
• InChI = International Chemical Identifier
• A reliable computerized method to represent identities
• A representation of the chemical structure with details
• Simple, but unique identifier for molecules (like a barcode)
• Different layers separated with delimiters (/)
Layers: main layer, charge layer, stereochemical layer, isotopic layer, fixed-H layer, reconnected layer
> <InChi> InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H
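The layered structure can be made visible by splitting the string on `/`. In this Python sketch the first field after `InChI=` is taken as the version, the second as the formula, and every later layer is keyed by its one-letter prefix (`c` for connections, `h` for hydrogens in this record). `split_inchi` is a hypothetical helper; it does not handle repeated prefixes, which can occur in some non-standard InChIs.

```python
INCHI = "InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H"


def split_inchi(inchi):
    """Split an InChI string into its version, formula and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        layers[layer[0]] = layer[1:]          # later duplicates would overwrite
    return layers


layers = split_inchi(INCHI)
is_standard = layers["version"].endswith("S")
formula_parts = layers["formula"].split(".")   # one formula per component
connection_parts = layers["c"].split(";")      # one connection list per component
```

The formula layer shows the same two components as the SMILES string, here written as C2H4O and H2O2.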
InChiKey
• A shortened form of the InChI code that is easier to use in web searches
• Its length is fixed at 27 characters (two hyphens included)
• The first 14 characters encode the molecular skeleton (connectivity)
• The second block contains 8+1+1 characters:
  the first 8 characters encode stereochemistry and isotopic substitution information,
  the next character gives the kind of InChIKey (S = standard, N = non-standard),
  the last character of the block gives the InChI version used
• Final character: protonation indicator
> <InChiKey> JJZZTHKXWWHOAE-UHFFFAOYSA-N
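The fixed layout makes an InChIKey easy to check with a regular expression. The Python sketch below only validates the shape (14 letters, hyphen, 8 letters plus two flag characters, hyphen, one letter) and splits it into the parts listed above; it cannot verify that the key really belongs to a given InChI, since the key is a hash. `split_inchikey` is a hypothetical helper name.

```python
import re

INCHIKEY = "JJZZTHKXWWHOAE-UHFFFAOYSA-N"

# 14 letters, hyphen, 8 letters + kind flag + version letter, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into its blocks, or raise ValueError if malformed."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError("malformed InChIKey: %r" % key)
    skeleton, other_layers, kind, version, protonation = match.groups()
    return {
        "skeleton": skeleton,
        "other_layers": other_layers,
        "kind": kind,
        "version": version,
        "protonation": protonation,
    }


key_parts = split_inchikey(INCHIKEY)
```

Since the first block depends only on the skeleton, comparing `key_parts["skeleton"]` between two records is a quick test of whether they share the same connectivity.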
MCDL
• MCDL = Modular Chemical Descriptor Language; first published in 2001
• Developed for linear representation of structural and other chemical information for chemical databases
• Similar to InChI: both languages are modular; constitution, connectivity, and stereochemistry are represented by individual "modules"
• MCDL provides direct placement of hydrogen atoms, whereas InChI uses a separate block
> <MCDL> CH;CHH;3OH[2,3;;;5]
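The modular layout of the MCDL string for this record can be illustrated with a short Python sketch. It rests on an interpretation that should be checked against the MCDL paper: the part before the bracket is read as the composition module (fragments separated by `;`, with an optional leading repeat count), and the bracketed part as the connectivity module, where the i-th entry lists the higher-numbered fragments bonded to fragment i. Under that reading the result agrees with the heavy-atom bonds in the connection table of this record. `parse_mcdl` is a hypothetical helper name.

```python
import re

MCDL = "CH;CHH;3OH[2,3;;;5]"


def parse_mcdl(mcdl):
    """Return (fragments, bonds) for a simple 'composition[connectivity]' string."""
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]*)\]", mcdl)
    if match is None:
        raise ValueError("unsupported MCDL string")
    composition, connectivity = match.groups()
    fragments = []
    for token in composition.split(";"):
        count, fragment = re.fullmatch(r"(\d*)(\D.*)", token).groups()
        fragments.extend([fragment] * int(count or 1))   # '3OH' -> OH, OH, OH
    bonds = []
    for i, entry in enumerate(connectivity.split(";"), start=1):
        bonds.extend((i, int(j)) for j in entry.split(",") if j)
    return fragments, bonds


fragments, bonds = parse_mcdl(MCDL)
```

The hydrogens sit directly inside the fragments (`CH`, `CHH`, `OH`), which is the contrast with InChI's separate hydrogen layer noted above.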
Other useful links and references
• Todeschini, R.; Consonni, V.: Molecular Descriptors for Chemoinformatics, 2nd, revised and enlarged edition, Wiley-VCH, Weinheim, 2009. ISBN 978-3-527-31852-0
• Bender, A.; Mussa, H. Y.; Glen, R. C.; Reiling, S.: Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance, J. Chem. Inf. Comput. Sci. 2004, 44(5), 1708-18.
• Gakh, A. A.; Burnett, M. N.: Modular Chemical Descriptor Language (MCDL): composition, connectivity, and supplementary modules, J. Chem. Inf. Comput. Sci. 2001, 41(6), 1494-9.
• http://arxiv.org/ftp/arxiv/papers/1311/1311.3723.pdf
• http://openbabel.org/wiki/Multilevel_Neighborhoods_of_Atoms
• http://openbabel.org/wiki/SMILES
• http://www.daylight.com/meetings/summerschool98/course/dave/smiles-intro.html
• http://www.inchi-trust.org/ (and references therein)
• http://www.iupac.org/home/publications/e-resources/inchi/download.html (and references therein)
• http://www.chemspider.com/inchi-resolver/
Your objectives for today
• To check your .sdf file for two chosen isomers
• To collect all the codes
• To compare them with each other and find the differences
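Once the codes of both isomers have been collected into dictionaries, the comparison step can be sketched in Python as below. The first dictionary reuses values from the record discussed above; the second is a purely hypothetical placeholder (another C2H6O3 SMILES written by hand, not taken from any file) and stands in for whatever your second .sdf file contains. `diff_codes` is a hypothetical helper name.

```python
def diff_codes(first, second):
    """Return {tag: (value_in_first, value_in_second)} for tags that differ."""
    tags = sorted(set(first) | set(second))
    return {
        tag: (first.get(tag), second.get(tag))
        for tag in tags
        if first.get(tag) != second.get(tag)
    }


# Values copied from the record shown in these slides.
isomer_1 = {
    "Stoichiometry": "C2H6O3",
    "SMI": "C(=C)O.OO",
    "MCDL": "CH;CHH;3OH[2,3;;;5]",
}
# Hypothetical second isomer for illustration only; MCDL deliberately missing.
isomer_2 = {
    "Stoichiometry": "C2H6O3",
    "SMI": "OCC(O)O",
}

differences = diff_codes(isomer_1, isomer_2)
```

Tags that are identical for both isomers (here the stoichiometry) drop out, and a tag missing from one record shows up with `None` on that side.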