Cheminformatics & Pharmainformatics


In this presentation...

Part 1 – Molecular Conventions
Part 2 – Resources
Part 3 – Drug Design
Part 4 – Drug Development

Part 1 – Molecular Conventions

Cheminformatics

• Cheminformatics, a combination of chemistry and information technology, is required for the processing and analysis of chemical data

• Cheminformatics is relevant to biologists because chemistry data are important in many areas of molecular biology, e.g., in the study of protein interactions and metabolism

Molecular formulae

• Molecules can be represented by simple formulae, which give the number and type of atoms

• However, this does not show how they are connected

• Structural formulae provide some information about the arrangement of atoms in a molecule and thus allow isomers to be distinguished

Structural representations of ethane showing the tetrahedral distribution of coordinated groups about saturated carbon atoms. Panels (a) and (b) show the two extreme conformations. The energetically favourable conformation (a), which predominates in nature, has the H atoms on opposite sides of the C–C bond, as far as possible from each other (the staggered conformation). The less favourable conformation (b) has the H atoms in the eclipsed conformation. Panels (c) and (d) show the same conformations viewed from the end of the molecule

Structural formulae and full and simplified structural diagrams for some common organic compounds

| Name | Formula | Full structure (written linearly) | Simplified structure |
| --- | --- | --- | --- |
| Methane | CH4 | C with four single bonds to H | no line drawing (single carbon) |
| Ethane | C2H6 | H3C–CH3 | single line |
| Ethene (ethylene) | C2H4 | H2C=CH2 | double line |
| Cyclohexane | C6H12 | ring of six CH2 groups | hexagon |
| Ethanol | C2H5OH | H3C–CH2–O–H | line ending in OH |
| Ethanal (acetaldehyde) | CH3CHO | H3C–CH=O | line ending in =O |

Structural diagrams

• Molecules can be represented using simple graphs, which show atoms as nodes and bonds as links

• For organic molecules, further simplification is achieved by assuming that carbon atoms make up the molecular backbone and that the valency of four is satisfied by hydrogen atoms unless otherwise shown

• Such diagrams present all molecules as planar shapes and do not indicate the spatial distribution of atoms in 3D
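The graph view described above can be sketched in a few lines of Python. This is a minimal illustration, not a full cheminformatics toolkit: the `VALENCE` table covers only three elements, and the function names are assumptions made for this example. Atoms are nodes, bonds are links carrying a bond order, and any valency not used by a drawn bond is filled with hydrogen.

```python
# Typical valencies for a few common elements (an illustrative subset).
VALENCE = {"C": 4, "N": 3, "O": 2}

def implicit_hydrogens(atoms, bonds):
    """Return, per atom, the number of hydrogens needed to fill its valency.

    atoms: list of element symbols, one per node.
    bonds: list of (i, j, order) links between node indices.
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [VALENCE[el] - u for el, u in zip(atoms, used)]

def formula(atoms, bonds):
    """Build a molecular formula: C first, then H, then other elements."""
    counts = {}
    for el in atoms:
        counts[el] = counts.get(el, 0) + 1
    counts["H"] = sum(implicit_hydrogens(atoms, bonds))
    hill = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "")
                   for e in hill if counts.get(e))

# Ethanol drawn as the backbone C-C-O, with no hydrogens shown.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]

# Ethene drawn as C=C.
ethene_atoms = ["C", "C"]
ethene_bonds = [(0, 1, 2)]

print(formula(ethanol_atoms, ethanol_bonds))  # C2H6O
print(formula(ethene_atoms, ethene_bonds))    # C2H4
```

Note that the graph stores connectivity only, so it shares the limitation stated above: nothing in it records where the atoms sit in 3D.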

Chirality

• If four different groups are coordinated around a central carbon atom, the molecule is described as chiral

• Chiral molecules exist in two configurations, called enantiomers, which are mirror images of each other

• Although enantiomers have the same chemical properties in an achiral environment, many enzymes and other proteins show chiral sensitivity, which is important in drug development and related fields

Multi-chiral configuration

• Molecules may contain any number of chiral centers, and a series of forms, called diastereoisomers, may exist

• These may have different chemical properties because of the way different groups interact within the molecule

DL and RS conventions

• The absolute configuration of groups around a chiral carbon atom can be described using a number of conventions

• In the DL system, molecules are named D or L according to whether the coordinated groups are arranged in a similar fashion to those in D-glyceraldehyde or L-alanine

• In the RS system, molecules are named R (rectus) or S (sinister) according to the priority of the chemical groups surrounding the carbon atom, ranked mainly by atomic number

Representation of a tetrahedrally coordinated saturated carbon atom in an organic molecule

(a) The carbon atom is at the centre of a tetrahedron with four coordinated groups (numbered 1 to 4)

(b) Simplified representation with the central carbon removed

(c) Representation of the tetrahedron as a flat image

Chirality representation

(a) The structural formula of glyceraldehyde, CH2OHCHOHCHO, gives no indication of its chirality

(b) If the molecule is represented as a tetrahedron, the D and L enantiomers can be distinguished

(c) These can be shown as 2D graphs using the Fischer convention, with CHO at the top, CH2OH at the bottom, and the OH group on the right for D and on the left for L

Part 2 – Resources

SMILES

• SMILES is a system for representing chemical formulae as strings, based on a valence model in which all valencies are considered to be satisfied by hydrogen atoms unless otherwise shown

• The system has conventions for representing different bond types, cyclic molecules, branches, cis/trans isomers and chirality
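To make the valence model concrete, here is a minimal sketch of a reader for a small SMILES subset. It is an illustration only and not a complete SMILES implementation: it assumes aliphatic C, N and O atoms, the bond symbols `-`, `=` and `#`, branches in parentheses and ring-closure digits, and it ignores aromatic atoms, bracket atoms, charges and the cis/trans and chirality marks mentioned above. The function names are chosen for this example.

```python
VALENCE = {"C": 4, "N": 3, "O": 2}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}

def parse_smiles(smiles):
    """Parse a small SMILES subset into (atoms, bonds)."""
    atoms, bonds = [], []
    prev = None      # index of the atom the next atom will bond to
    pending = 1      # bond order for the next link (single by default)
    stack = []       # atoms at which a branch was opened
    ring_open = {}   # ring digit -> (atom index, bond order)
    for ch in smiles:
        if ch in VALENCE:
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending))
            prev, pending = idx, 1
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isdigit():
            if ch in ring_open:
                start, order = ring_open.pop(ch)
                bonds.append((start, prev, max(order, pending)))
            else:
                ring_open[ch] = (prev, pending)
            pending = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds

def molecular_formula(smiles):
    """Fill unused valencies with hydrogen and return the formula."""
    atoms, bonds = parse_smiles(smiles)
    used = [0] * len(atoms)
    for i, j, bond_order in bonds:
        used[i] += bond_order
        used[j] += bond_order
    counts = {"H": sum(VALENCE[el] - u for el, u in zip(atoms, used))}
    for el in atoms:
        counts[el] = counts.get(el, 0) + 1
    hill = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "")
                   for e in hill if counts.get(e))

# SMILES strings for the compounds tabulated in Part 1.
for name, smi in [("methane", "C"), ("ethane", "CC"), ("ethene", "C=C"),
                  ("cyclohexane", "C1CCCCC1"), ("ethanol", "CCO"),
                  ("ethanal", "CC=O")]:
    print(name, smi, molecular_formula(smi))
```

The strings themselves never mention hydrogen; every H in the printed formulae comes from the "satisfied unless otherwise shown" rule.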

RasMol and Chime

• There are several specialized data formats for chemical structures based on the principle of a molecular formula and associated table of connections

• Viewing utilities such as RasMol and Chime can interpret these file formats and display interactive molecular structures in a variety of user-defined schemes and colors

Chemical structure and databases

• Structural information about different molecules can be obtained from a number of comprehensive WWW resources, including Chemical Abstracts On-Line, Chemfinder and MedChem

• Each of these resources provides a chemical database that can be searched using a variety of query formats, e.g., systematic name, non-systematic name, formula, molecular weight or CAS registry number

• Search results provide physical, chemical and biomedical information with links to other databases and resources

• MedChem also provides the SMILES string
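Two of the query formats listed above, formula and molecular weight, are easy to relate in code. The sketch below is a hypothetical, in-memory stand-in and does not use the interface of any of the named services: `RECORDS` is a made-up six-entry table, the atomic weights are standard values rounded to three decimals, and the parser handles only flat formulae without parentheses.

```python
import re

# Standard atomic weights, rounded to three decimals (illustrative subset).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def molecular_weight(formula):
    """Sum atomic weights over a flat formula such as 'C2H5OH'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total

# Hypothetical records: compound name -> molecular formula.
RECORDS = {
    "methane": "CH4",
    "ethane": "C2H6",
    "ethene": "C2H4",
    "cyclohexane": "C6H12",
    "ethanol": "C2H5OH",
    "ethanal": "CH3CHO",
}

def search_by_weight(low, high):
    """Return the names of records whose molecular weight lies in [low, high]."""
    return sorted(name for name, f in RECORDS.items()
                  if low <= molecular_weight(f) <= high)

print(round(molecular_weight("C2H5OH"), 3))  # 46.069
print(search_by_weight(40, 50))              # ['ethanal', 'ethanol']
```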

QSAR

• A quantitative structure-activity relationship (QSAR) is a statistical method used to determine how the structural features of a molecule are related to biological activity

• The QSAR approach is particularly useful for categorizing the activities of related molecules with multiple functional groups

• Each molecule is broken down into a series of descriptors (molecular properties) and the QSAR determines which descriptors are most likely to promote biological activity

• This gives rise to a set of rules that can be used to evaluate the potential activity of new molecules
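The descriptor idea above can be illustrated with a deliberately simple statistic. The sketch below ranks descriptors by their Pearson correlation with activity and then fits a one-descriptor straight line as a "rule" for new molecules. Real QSAR studies use richer models and far more data; the five molecules, the descriptor names `logP` and `h_donors`, and all the numbers are invented for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_descriptors(data, target="activity"):
    """Sort descriptors by the strength of their correlation with the target."""
    names = [k for k in data[0] if k != target]
    ys = [m[target] for m in data]
    scores = {n: pearson([m[n] for m in data], ys) for n in names}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical training set: two descriptors and a measured activity.
molecules = [
    {"logP": 1.0, "h_donors": 2, "activity": 2.1},
    {"logP": 2.0, "h_donors": 1, "activity": 3.9},
    {"logP": 3.0, "h_donors": 3, "activity": 6.2},
    {"logP": 4.0, "h_donors": 1, "activity": 7.8},
    {"logP": 5.0, "h_donors": 2, "activity": 10.1},
]

ranking = rank_descriptors(molecules)
best = ranking[0][0]
slope, intercept = fit_line([m[best] for m in molecules],
                            [m["activity"] for m in molecules])
print(ranking)
print(f"predicted activity at {best}=6.0: {slope * 6.0 + intercept:.2f}")
```

In this toy data set the first descriptor tracks activity almost linearly and the second barely at all, so the fitted line uses only the first.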

Part 3 – Drug Design

Pharmainformatics

• Pharmainformatics is the combination of biology, chemistry, mathematics and information technology that is essential for efficient data management, processing and analysis in the pharmaceutical industry

Drugs

• Drugs interact with targets in the body, usually proteins, and through these interactions cause physiological responses

• The pharmaceutical industry aims to discover drugs with specific beneficial effects to treat human diseases

Gene – drug – life

• Knowing a gene’s chemical structure and composition is one thing; understanding its actual function is another

• Although sequencing and analysis will help answer questions about aging, diseases, disorders and much more, a new discipline of designer drugs is around the corner, waiting to be tapped

• Even a single nucleotide polymorphism (SNP, pronounced “snip”), for instance a T in one gene sequence where a neighbour has a C, can spell trouble
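Spotting a single-base difference of the kind just described is straightforward once two sequences are aligned. The sketch below assumes the sequences are already aligned and of equal length (real pipelines align first and handle insertions and deletions); both sequences are made up for illustration.

```python
def find_snps(reference, sample):
    """List (position, reference base, sample base) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]

# Hypothetical aligned sequences differing at one position (C versus T).
reference = "ATGGCCATTGTAATG"
sample = "ATGGCTATTGTAATG"

print(find_snps(reference, sample))  # [(6, 'C', 'T')]
```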

Gene – drug – life

• Many drugs work in only 30 percent of the human population

• In extreme cases, a drug that saves one person may poison another. For instance, the type II diabetes drug Rezulin has been linked to more than 60 deaths from liver toxicity worldwide

• This is where in silico drug design would help, not only by reducing design, modeling and testing time but also by reducing expenditure on manpower, resources and the various phases of drug design and development

Areas of drug design

• For drug design, the process must be viewed from three different dimensions, viz., drug design for
– Diseases such as HIV, cancer, etc. that afflict many people
– Lifestyle drugs
– Drugs for repairing genetic disorders

• There is an urgent need to develop drugs for diseases such as hepatitis C, leprosy and malaria, since these diseases are widespread and affect the population at large

• Other infectious diseases such as tuberculosis, HIV, etc. are also highly troublesome

In silico drug design

• Earlier, the drug design process used to take many decades and was carried out haphazardly, without any direction, whereas presently there is a systems approach. Added to this is a tremendous reduction in research and production costs

• The surge in bioinformatics solutions has already redefined the way drug trials are done, with a shift from in vitro to in silico

• In silico drug design could be used to shorten the time taken to design a drug, and this will remain the biggest challenge for years to come

Drugs are insoluble in water…

• Proteins in the cell are surrounded by water (about two-thirds of the human body is water) and hence do not behave like rigid bodies; consequently, behaviour differs from protein to protein

• Many drugs do not dissolve readily in water. In silico drug design (on chips, without water) should take this into account

Important areas for drug design

• The four most important areas of consideration for successful drug design are the
– binding sites
– molecular shape
– molecular size
– inhibitory properties of the proteins

Important areas for drug design

• The crystallization of membrane proteins to determine their structure also plays a vital role in drug design. This area of research is highly challenging and should prove an excellent foundation for further research

• Since the genome of the dengue virus is only about 11 kb long, it is a convenient subject on which a lot of work can be carried out quickly

Medical applications

• Bioinformatics and drug design can be highly useful for the diagnosis and treatment of various neurological disorders. Many neurological disorders have been found to be due to unusual gene structures such as the triple ‘A’ formation “AAA” (the A of the “ATGC” nucleotides) in the genes. The problem becomes more complex with multiple repeats or occurrences of the triple ‘A’. More than eight such repeats are known, and in such cases children are permanently bedridden or have to use wheelchairs
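Counting back-to-back copies of a triplet in a sequence is a simple pattern-matching task. The sketch below is a minimal illustration using Python's standard `re` module; the function name and the test sequences are made up, and it makes no clinical claim about which repeat counts matter.

```python
import re

def longest_triplet_run(sequence, triplet):
    """Return the largest number of back-to-back copies of `triplet`."""
    runs = re.findall(f"(?:{re.escape(triplet)})+", sequence)
    return max((len(run) // len(triplet) for run in runs), default=0)

# Hypothetical sequence containing nine consecutive AAA triplets.
sequence = "ATG" + "AAA" * 9 + "GCT"

print(longest_triplet_run(sequence, "AAA"))           # 9
print(longest_triplet_run("ATGCAGCAGCAGTTT", "CAG"))  # 3
```

The same function works for any triplet, so it can be pointed at whichever repeat unit is of interest.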

Part 4 – Drug Development

Bioinformatics in drug development

• Genomics, proteomics, combinatorial chemistry and high-throughput screening (HTS) have all contributed to a massive increase in the amount of data generated by the pharmaceutical industry

• The role of bioinformatics is to store, track and provide tools for the analysis of these data, something like an automated environment

Bioinformatics in drug development

• Specific applications include the modeling of protein interactions with small molecules allowing rational drug design, the association of genotype and drug response patterns (pharmacogenomics), the design and assessment of chemical diversity in combinatorial libraries, and the processing and storage of data from high-throughput screens of lead compounds

| Area of biology | Application | Role of bioinformatics |
| --- | --- | --- |
| Genomics/proteomics (human genome project) | Characterization of human genes and proteins | Target identification/validation in the human genome; cataloging SNPs and association with drug response patterns (pharmacogenomics) |
| Genomics/proteomics (human pathogen genome project) | Characterization of genes and proteins of organisms that are pathogenic to humans | Target identification/validation in pathogens |
| Functional genomics (protein structures) | Analysis of protein structures (humans and their pathogens) | Prediction of drug/target interactions; rational drug design |
| Functional genomics (expression profiling) | Determining gene expression patterns in disease and health | Gene classification based on drug responses; pathway reconstruction |
| Functional genomics (genome-wide mutagenesis) | Determining the mutant phenotypes for all genes in the genome | Databases of animal models; target identification/validation |
| Functional genomics (protein interactions) | Determining interactions among all proteins | Characterization of protein interactions; reconstruction of pathways; prediction of binding sites |

| Area of chemistry | Application | Role of bioinformatics |
| --- | --- | --- |
| HTS | Highly parallel assay formats for lead identification | Storing, tracking and analyzing data |
| Combinatorial chemistry | Synthesis of large numbers of chemical compounds | Cataloging chemical libraries; assessing library quality/diversity; predicting drug/target interactions |

Principles of drug development

• Drug development begins with the identification of a suitable target, which must contribute significantly to a human disease

• Ideally, altering the activity of this target should have a beneficial effect thus showing its potential for therapeutic intervention

• The next stage of the process is lead discovery, where compounds showing some of the desired activity of an ideal drug are sought

Principles of drug development

• Optimization of lead compounds results in drug candidates that may be registered and submitted for clinical trials, which establish their safety and metabolic behaviour in human subjects

Genetic link to drugs

• An early example of the utility of bioinformatics in drug design is cathepsin K, an enzyme that might turn out to be an important target for treating osteoporosis, a crippling disease caused by the breakdown of bone

• While analyzing osteoclasts (cells that break down bone in the normal course of bone replenishment) taken from people with bone tumors, researchers found sequences that were overexpressed in these cells and that could be overactive in individuals with osteoporosis

• These sequences matched a previously identified class of molecules called cathepsins. Efforts are under way to find a potential drug to block the cathepsin K target

Genetic link to drugs

• Scientists believe that 99.9 percent of your genes perfectly match those of the person sitting beside you. The remaining 0.1 percent vary, and it is these variations in which the drug companies are interested

• Several years after the debut of tests for BRCA1 and BRCA2, scientists are still trying to determine exactly to what degree those genes contribute to a woman’s cancer risk

Chemical diversity

• Diverse chemical libraries are required for efficient lead discovery if little is known about the binding properties of the drug target

• Conversely, focused libraries are required if the structure of the target is known, since this defines a particular set of ligands

• Chemical diversity can be defined by comparing molecules on the basis of descriptors (functional groups) and how these fill chemical space

• A number of software tools are available for the design and assessment of diverse or focused chemical libraries and for virtual screening against drug targets
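One common way to compare molecules on the basis of descriptors is the Tanimoto coefficient, the size of the shared descriptor set divided by the size of the combined set. The slides do not name a specific measure, so its use here is an assumption, and the two small libraries below are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two descriptor sets (1.0 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_distance(library):
    """Average of (1 - Tanimoto) over all pairs; higher means more diverse."""
    names = list(library)
    pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    return sum(1 - tanimoto(library[x], library[y])
               for x, y in pairs) / len(pairs)

# Hypothetical libraries: molecule name -> set of functional-group descriptors.
focused = {
    "m1": {"hydroxyl", "phenyl"},
    "m2": {"hydroxyl", "phenyl", "methyl"},
    "m3": {"hydroxyl", "phenyl", "amine"},
}
diverse = {
    "m4": {"hydroxyl"},
    "m5": {"carboxyl", "phenyl"},
    "m6": {"amine", "methyl"},
}

print(round(mean_pairwise_distance(focused), 3))
print(round(mean_pairwise_distance(diverse), 3))
```

The focused library scores lower because its members share a common core of descriptors, while the diverse library's members share none.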

Computational screening

• Software applications like DOCK and AutoDock match potential ligands to binding sites by calculating steric constraints and bond energies

• These can be used to search chemical databases and find potential drug leads

• Some applications consider the ligand and binding site as inflexible structures, rather like pieces of a jigsaw, while others can incorporate flexibility into the molecules by calculating allowable and compatible bond torsions
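The rigid, jigsaw-like case can be illustrated with a toy scoring scheme. The sketch below is not the algorithm used by DOCK or AutoDock: it treats atoms as points, penalizes pairs that sit too close (a steric clash), rewards pairs at contact distance, and tries a handful of rigid translations of the ligand. The coordinates, distance cut-offs and score weights are all made-up values.

```python
from math import dist

def pose_score(ligand, site, clash=3.0, contact=4.5):
    """Score a rigid pose: -10 per clashing atom pair, +1 per contact."""
    score = 0
    for l_atom in ligand:
        for s_atom in site:
            d = dist(l_atom, s_atom)
            if d < clash:
                score -= 10
            elif d <= contact:
                score += 1
    return score

def translate(points, shift):
    """Move every point by the same shift (a rigid translation)."""
    return [tuple(p + s for p, s in zip(pt, shift)) for pt in points]

def best_rigid_pose(ligand, site, shifts):
    """Return the candidate shift with the highest pose score."""
    return max(shifts, key=lambda sh: pose_score(translate(ligand, sh), site))

# Hypothetical coordinates (in angstroms) for a two-atom site and ligand.
site = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
shifts = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.5), (0.0, 0.0, 10.0)]

for sh in shifts:
    print(sh, pose_score(translate(ligand, sh), site))
print("best:", best_rigid_pose(ligand, site, shifts))
```

A flexible-ligand method would add a further search over bond torsions inside the loop over poses, which is what makes it more expensive.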

Functional genomics

• The large-scale functional annotation of genes is known as functional genomics and incorporates areas such as homology searching, structural analysis, expression analysis, large scale mutagenesis and the analysis of protein interactions

• All of these areas are important in drug development

Genome-scale mutagenesis

• Genome-scale mutagenesis is a rich source of animal disease models for target identification and validation, and large mutant collections in simple organisms can be used for the rapid high-throughput screening of potential lead compounds

Approaches in functional genomics

| Approach | Functional annotation method |
| --- | --- |
| Homology searching | Comparison to related sequences with known function |
| Protein structure determination (structural genomics) | Comparison to molecules with related structure and known function |
| Comparative genomics | Functional annotation by domain conservation, conserved phylogeny or conserved genomic organization |
| Expression analysis | Similar expression profiles indicate conserved function |
| Mutagenesis | Function based on mutant phenotype, e.g. knockout mice |
| Protein interaction screening | Function based on presence in multi-subunit complex or on interaction with proteins of known function |
| Small molecule informatics | Interaction with small molecules |

Pharmacogenomics

• Pharmacogenomics is the study of how variation in the human population correlates with drug response patterns

• The analysis of genomic data and its comparison with drug response data allows patients to be clustered into drug response groups, so that appropriate drugs and dose regimens can be administered

• Variation is catalogued by analyzing data on mutation (particularly SNPs) and gene expression profiles
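The clustering of patients into drug response groups can be sketched with a simple group-by on genotype. This is only an illustration of the idea: the patient records, the single SNP genotype per patient and the response values (fraction of the expected drug effect) are all invented, and a real study would need proper statistics on far larger cohorts.

```python
from collections import defaultdict

# Hypothetical patients: genotype at one SNP and a measured drug response.
patients = [
    {"id": "P1", "genotype": "CC", "response": 0.8},
    {"id": "P2", "genotype": "CC", "response": 0.9},
    {"id": "P3", "genotype": "CT", "response": 0.5},
    {"id": "P4", "genotype": "CT", "response": 0.6},
    {"id": "P5", "genotype": "TT", "response": 0.1},
    {"id": "P6", "genotype": "TT", "response": 0.2},
]

def response_groups(records):
    """Return the mean drug response for each genotype group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["genotype"]].append(rec["response"])
    return {g: sum(vals) / len(vals) for g, vals in groups.items()}

for genotype, mean in sorted(response_groups(patients).items()):
    print(genotype, round(mean, 2))
```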

In lab vs. out of lab effort

• Companies and individuals plug into the drug design effort at various points: collecting and storing data, searching databases, and interpreting the data

• The race is all about who can mine the massive amount of information best

• Modeling or computing a drug design or protein structure is not sufficient on its own; information on test results and clinical trials from outside the lab is also very important

• Most of the time should be spent on this aspect to ensure success in drug design and development

Issues of drug design

• Even though the human genome has been sequenced, a number of problems still await solutions: technical, legal and social

• It is not at all clear how much one must know about a gene in order to patent it

• There is also a need to review all failed drugs, i.e., drugs that failed during clinical trials, since their molecular composition and the experimentation process could yield a lot of valuable information

• Various aspects connected to successful drug design include supercomputing, modeling of proteins through software, biotechnology, computational methods and analysis, biochemistry, in silico drug design, etc.

• It is notable that a drug that works for protein ‘A’ may not work for protein ‘B’, or may behave differently, due to various factors. That is why many drugs fail, and hence an integrated (teamwork) effort is required, with a tremendous amount of information and interaction

• At the moment, many patent applications rely on computerized prediction techniques that are often referred to as “in silico” biology

• With a full or partial gene sequence, scientists enter the data into a computer program that predicts the amino acid sequence of the resulting protein

• By comparing this hypothetical protein with known proteins, the researchers take a guess at what the underlying gene sequence does and how it might be useful in developing a drug, say, or a diagnostic test

• Searches for compounds that bind to and have the desired effect on drug targets still take place mainly in a biochemist’s traditional “wet” lab, where evaluations for activity, toxicity and absorption can take years

• But now with the bioinformatics initiatives, tools and growing databases of protein structures and biomolecular pathways, this aspect of drug development is shifting to computers

• As the saying goes “genomics without bioinformatics will not have much of a payoff”
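The prediction step mentioned above, going from a gene sequence to the amino acid sequence of the resulting protein, can be sketched with the standard genetic code. This is a minimal illustration: it reads a DNA coding sequence in the first reading frame only, stops at the first stop codon, and ignores introns, alternative frames and ambiguous bases. The input sequence is made up.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[i]
               for i, (a, b, c) in enumerate(product(BASES, repeat=3))}

def translate_dna(dna):
    """Translate a coding DNA sequence (frame 0) up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3].upper()]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)

# Hypothetical coding sequence: ATG GCC ATT GTA ATG GGC CGC TGA
print(translate_dna("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR
```

The resulting string is the "hypothetical protein" that would then be compared against known proteins.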

Ayurveda and tribal medicine

• To date, little attention has been paid to biodiversity, especially the research and knowledge base on alternative medicine, Ayurveda, applications of herbs and shrubs from remote villages, etc.

• This area of medicine, and the study of its effect on genes and proteins, could be another challenging and interesting area

Future of pharmainformatics

• Drug companies are gathering the genetic know-how to make medicines tailored to specific genes, an effort called pharmacogenomics

• In the years to come, pharmacists may hand over one version of a blood pressure drug based on your unique genetic profile, while the person behind you in line gets a different version of the same medicine

• The day will come when somebody comes in with cancer and the diagnosis is made not on the basis of the morphology of the cancer but by looking at the detailed patterns of gene expression and protein-binding activities in that cell

Target for the industry

• It is expected that in this decade, the pharmaceutical industry will be faced with evaluating up to 10,000 human proteins against which new therapeutics might be directed

• That is 25 times the number of drug targets that have been evaluated by all the companies since the dawn of the industry

Resources

• For a primer on genetic testing and a directory of genetic tests, visit GeneTests at www.genetests.org

• For more on the ethical, legal and social implications of human genome research, visit the National Human Genome Research Institute’s web site at www.nhgri.nih.gov/ELSI
