Smiles New 1

SMILES • Simplified Molecular Input Line Entry System (SMILES) • Widely used AND computationally efficient • Uses atomic symbols and a set of intuitive rules • Uses hydrogen-suppressed molecular graphs (HSMG)

Transcript of Smiles New 1

  • SMILES

    Simplified Molecular Input Line Entry System (SMILES)

    Widely used AND computationally efficient

    Uses atomic symbols and a set of intuitive rules

    Uses hydrogen-suppressed molecular graphs (HSMG)

  • SMILES Bonds

    SINGLE*      -
    DOUBLE       =
    TRIPLE       #
    AROMATIC*    :

    * can be omitted
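The bond table above can be read as a simple lookup. The following is a minimal sketch, not a full SMILES parser: the names `BOND_SYMBOLS` and `explicit_bonds` are made up for illustration, and the function only reports bond symbols that are actually written out (omitted single and aromatic bonds are not inferred).

```python
# Bond symbols from the slide; single and aromatic bonds are usually omitted.
BOND_SYMBOLS = {"-": "single", "=": "double", "#": "triple", ":": "aromatic"}


def explicit_bonds(smiles):
    """Return the names of the bond symbols written explicitly in a SMILES string.

    Characters inside square brackets are skipped, because there '-' is a
    charge sign and not a bond.
    """
    found = []
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in BOND_SYMBOLS:
            found.append(BOND_SYMBOLS[ch])
    return found
```

For example, `explicit_bonds("C=C")` reports one double bond, while `explicit_bonds("CC")` reports nothing because the single bond is implied.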

  • Butanols

    2-Butanol

    iso-Butanol

    tert-Butanol

  • SMILES Branches

    Represented by enclosure in parentheses

    Can be nested or stacked

    Examples:

    CC(O)CC is 2-Butanol

    OCC(C)C is iso-Butanol

    OC(C)(C)C is tert-Butanol
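As a rough illustration of nested versus stacked branches, the sketch below checks that parentheses are balanced and reports the deepest nesting level. The function name `branch_depth` is hypothetical, and the check looks only at parentheses; it does not validate the chemistry.

```python
def branch_depth(smiles):
    """Return the deepest level of nested parentheses in a SMILES string.

    Stacked branches such as C(C)(C) stay at depth 1; nested branches
    such as C(C(C)C) reach depth 2. Raises ValueError if the parentheses
    do not balance.
    """
    depth = 0
    deepest = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')' in " + smiles)
    if depth != 0:
        raise ValueError("unmatched '(' in " + smiles)
    return deepest
```

With the examples from the slide, `branch_depth("CC(O)CC")` and `branch_depth("OC(C)(C)C")` both give 1, since tert-Butanol uses stacked rather than nested branches.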

  • SMILES Bonds

    Ethene                      C=C
    Chloroethene                ClC=C
    1,1-Dichloroethene          ClC(Cl)=C
    cis-1,2-Dichloroethene      ClC=CCl
    Trichloroethene             ClC(Cl)=CCl
    Perchloroethene             ClC(Cl)=C(Cl)Cl

  • SMILES Atoms

    Use normal chemical symbols

    Add punctuation symbols if necessary

    No super- or subscripts

  • SMILES Symbols

    String of alphanumeric characters and certain punctuation symbols

    Terminates at the first space encountered when read left to right

    The ORGANIC SUBSET:

    B, C, N, O, P, S, F, Cl, Br, I
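A small sketch of how a reader might pick out organic-subset atoms from a string, assuming only the rules on this slide. The names `ORGANIC_SUBSET` and `organic_atoms` are illustrative; bracketed atoms and lowercase (aromatic) atoms are deliberately ignored here.

```python
import re

# Two-letter symbols come first so that "Cl" is not read as "C" and "Br" not as "B".
ORGANIC_SUBSET = ["Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I"]
_ATOM = re.compile("|".join(ORGANIC_SUBSET))


def organic_atoms(text):
    """List the organic-subset atoms in a SMILES string, left to right.

    The SMILES is taken to end at the first space. Anything inside
    square brackets is dropped before matching.
    """
    smiles = text.split(" ", 1)[0]
    without_brackets = re.sub(r"\[[^\]]*\]", "", smiles)
    return _ATOM.findall(without_brackets)
```

For instance, `organic_atoms("ClC(Cl)=CCl")` yields three chlorines and two carbons, and any text after the first space is ignored.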

  • Other SMILES Atoms

    Aliphatic or nonaromatic carbon: C

    Atom in aromatic ring: lowercase letter

    Designate ring closure with pairs of matching digits, e.g.

    c1ccccc1 (or C1=CC=CC=C1) is Benzene, whereas C1CCCCC1 is Cyclohexane

  • SMILES Charges

    Specify attached hydrogens and charges in square brackets

    Number of attached hydrogens is the symbol H followed by an optional digit

  • SMILES Charges

    [H+]       proton
    [OH-]      hydroxyl anion
    [OH3+]     hydronium cation
    [Fe++]     iron(II) cation
    [NH4+]     ammonium cation
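The bracket notation above can be unpacked with a short regular expression. This is a simplified sketch covering only the pattern shown on the slides (element symbol, optional H count, optional charge); the name `parse_bracket_atom` is hypothetical, and features such as isotopes or chirality marks are not handled.

```python
import re

# symbol, optional "H" + optional digits, optional charge ("+", "++", "-", "+2", ...)
_BRACKET = re.compile(r"\[([A-Z][a-z]?)(?:H(\d*))?([+-]\d+|\++|-+)?\]")


def parse_bracket_atom(token):
    """Split a bracket atom into (symbol, attached_hydrogens, charge)."""
    match = _BRACKET.fullmatch(token)
    if match is None:
        raise ValueError("not a simple bracket atom: " + token)
    symbol, h_digits, charge_text = match.groups()

    if h_digits is None:
        hydrogens = 0            # no H written
    elif h_digits == "":
        hydrogens = 1            # bare H means one hydrogen
    else:
        hydrogens = int(h_digits)

    if not charge_text:
        charge = 0
    elif charge_text[1:].isdigit():
        charge = int(charge_text)                 # forms like +2 or -1
    else:
        sign = 1 if charge_text[0] == "+" else -1
        charge = sign * len(charge_text)          # forms like ++ or --
    return symbol, hydrogens, charge
```

Applied to the slide's examples, `parse_bracket_atom("[NH4+]")` gives nitrogen with four hydrogens and charge +1, and `parse_bracket_atom("[Fe++]")` gives iron with charge +2.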

  • SMILES Cyclic Structures

    Break one single or one aromatic bond in each ring

    Number in any order

    Designate ring-breaking atoms by the same digit following the atomic symbol

  • Cyclic Structures

    Numbers indicate start and stop of ring

    Same number indicates start and end of the ring, entered immediately following the start/end atoms

    Only numbers 1-9 are used

    A number should appear only twice

    An atom can be associated with 2 consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2

  • Naphthalene

    c12ccccc1cccc2
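To make the digit-pairing rule concrete, here is a minimal sketch that matches each ring-opening digit with its closing partner and reports the atom positions involved. The function name `ring_closures` is made up for illustration; it assumes single-digit ring numbers as described above and treats any letter (or bracketed group) as one atom.

```python
def ring_closures(smiles):
    """Return (digit, opening_atom_index, closing_atom_index) for each ring.

    Atoms are counted from 0 in the order they are written. Raises
    ValueError if a ring digit is opened but never closed.
    """
    open_rings = {}      # digit character -> index of the atom that opened it
    pairs = []
    atom_index = -1
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            i = smiles.index("]", i)      # a bracketed group counts as one atom
            atom_index += 1
        elif smiles[i:i + 2] in ("Cl", "Br"):
            i += 1                        # two-letter symbol, one atom
            atom_index += 1
        elif ch.isalpha():
            atom_index += 1
        elif ch.isdigit():
            if ch in open_rings:
                pairs.append((int(ch), open_rings.pop(ch), atom_index))
            else:
                open_rings[ch] = atom_index
        i += 1
    if open_rings:
        raise ValueError("unclosed ring digit(s): " + ", ".join(sorted(open_rings)))
    return pairs
```

For naphthalene, `ring_closures("c12ccccc1cccc2")` shows that the first atom opens both rings: ring 1 closes at atom 5 and ring 2 closes at atom 9.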

  • SMILES Conventions

    Avoid two consecutive left parentheses if possible

    Strive for the fewest possible branches

    Tautomeric bonds are not designated; enter the appropriate form

  • Further Restrictions

    A branch cannot begin a SMILES notation

    A branch cannot immediately follow a double- or triple-bond symbol

    Example: C=(CC)C is invalid, but C(=CC)C or C(CC)=C are valid SMILES
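The two restrictions above are easy to test mechanically. The following sketch checks only those two rules and nothing else about the string; the function name `branch_rule_problems` and the wording of its messages are illustrative.

```python
def branch_rule_problems(smiles):
    """Return a list of the branch restrictions that a SMILES string breaks."""
    problems = []
    if smiles.startswith("("):
        problems.append("a branch begins the notation")
    for bond in ("=", "#"):
        if bond + "(" in smiles:
            problems.append("a branch immediately follows '" + bond + "'")
    return problems
```

With the slide's examples, `branch_rule_problems("C=(CC)C")` reports one problem, while `branch_rule_problems("C(=CC)C")` and `branch_rule_problems("C(CC)=C")` both return an empty list.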

  • SMILES Fragments

    Nitro               N(=O)(=O)
    Nitrate             ON(=O)(=O)
    Nitrite             ON(=O)
    Sulfonic acid       S(=O)(=O)O
    Cyanide/Nitrile     C#N
    Azide               N=N#N
    Azido               N+=N-

  • SMILES Metals

    [Al] [As] [Au] [Be] [Bi] [Cd] [Ca] [Fe] [Hg] [K] [Li] [Mg] [Na] [Ni] [Pt] [Sb] [Sn] [Zn] [Zr]

  • Disconnected Structures

    Indicated by a dot

    Tetramethylammonium bromide

    C[N+](C)(C)C.[Br-]
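Because the dot separates components, a disconnected structure can be split with ordinary string handling. The sketch below is an illustration only: the names `components` and `net_charge` are hypothetical, the sodium chloride string is my own example rather than one from the slides, and the charge count assumes the repeated-sign form (such as `[Fe++]`), not numeric forms like `[Fe+2]`.

```python
import re


def components(smiles):
    """Split a SMILES string into its dot-separated components."""
    return smiles.split(".")


def net_charge(smiles):
    """Sum the charges written inside square brackets.

    Assumes charges are written as repeated signs, e.g. [Fe++] or [Br-].
    """
    total = 0
    for inside in re.findall(r"\[([^\]]*)\]", smiles):
        total += inside.count("+") - inside.count("-")
    return total
```

For a salt such as `[Na+].[Cl-]`, `components` returns the cation and anion separately and `net_charge` returns 0.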

  • Isomeric and Chiral SMILES

    Isomeric configuration indicated by forward and backward slashes: / \

    Examples:

    trans-1,2-dibromoethene: Br/C=C/Br

    Direction of the slash continues

    cis-1,2-dibromoethene: Br/C=C\Br

    Direction of the slash reverses

    Chirality indicated by the @ symbol
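The "continues versus reverses" rule can be sketched for the simplest layout, where one slash precedes the first double-bond atom and one follows the second, as in the two dibromoethene examples. The function name `double_bond_geometry` is hypothetical, and this is not a general stereochemistry parser: it inspects only the first double bond and ignores branches and the @ notation.

```python
import re

# A slash, then a double bond with no other slash in between, then another slash.
_SLASH_PAIR = re.compile(r"([/\\])[^/\\=]*=[^/\\=]*([/\\])")


def double_bond_geometry(smiles):
    """Return 'trans', 'cis', or None for a simple X/C=C/Y style SMILES.

    Same slash direction on both sides means trans; reversed direction
    means cis; None means no slash pair was found around a double bond.
    """
    match = _SLASH_PAIR.search(smiles)
    if match is None:
        return None
    return "trans" if match.group(1) == match.group(2) else "cis"
```

In Python source the backslash must be escaped or placed in a raw string, so the cis example is written as `r"Br/C=C\Br"`.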

  • Some Applications

    JMDraw/SMILESViewer (Christoph Steinbeck)

    JME Molecular Editor (Peter Ertl)

    STN Express (SMILES as output)

    Tripos (dbtranslate: SMILES to MOL)

    Marvin (Ferenc Csizmadia) http://chemaxon.com/marvin/

    CACTVS http://www2.ccc.uni-erlangen.de/cactvs/

  • Another Application

    SMILESCAS Database http://www.syrres.com/esc/smilecas.htm

    Over 103,000 SMILES notations

    Input CAS Registry Number

    Leads to SMILES and thence to a structure search