Bio inspiring computing and its application in cheminformatics


Transcript of Bio inspiring computing and its application in cheminformatics

Page 1: Bio inspiring computing and its application in cheminformatics

Bio-inspired Computing and its Application in Cheminformatics

By

Abdelazim Galal Hussien

Demonstrator at Faculty of Science, Fayoum University

Supervision by

Professor Mohamed Amin
Faculty of Science
Menofiya University

Professor Aboul Ella Hassanien
Faculty of Computers and Information
Cairo University

Page 2: Bio inspiring computing and its application in cheminformatics

Agenda

Cheminformatics

• Introduction
• Representation
• Molecular descriptors

Bio-Inspiring

• Problems
• Algorithms
• Ant Colony Optimization

Bioinspiring and Cheminformatics

• Classification
• Clustering
• Feature Selection

Application

• Drug Discovery
• Drug Design

Page 3: Bio inspiring computing and its application in cheminformatics

Cheminformatics

Chemoinformatics is concerned with the application of computational methods to tackle chemical problems, with particular emphasis on the manipulation of chemical structural information.

The term was introduced in the late 1990s.

There is not even universal agreement on the correct spelling: cheminformatics, chemical informatics, chemiinformatics and chemoinformatics are all in use.

Page 4: Bio inspiring computing and its application in cheminformatics

Cheminformatics

• Cheminformatics is the use of computer and informational techniques applied to a range of problems in the field of Chemistry.

• Cheminformatics strategies are useful in drug discovery and other efforts where large numbers of compounds are being evaluated for specific properties.

• Cheminformatics is a multidisciplinary science: it combines Chemistry, Biology, Mathematics, Biochemistry, Statistics and Informatics.

Page 5: Bio inspiring computing and its application in cheminformatics

Problems in Cheminformatics

• Storing data generated through experiments or from molecular simulation.
• Retrieval of chemical structures from chemical databases (software libraries).
• Prediction of physical, chemical and biological properties of chemical compounds.
• Elucidation of the structure of a compound based on spectroscopic data.
• Structure, substructure, similarity and diversity searching in chemical databases.
• Docking: interaction between two macromolecules.
• Drug discovery.
• Molecular science, materials science, food science (nutraceuticals), atmospheric chemistry, polymer chemistry, textile industry, combinatorial organic synthesis (COS).

Page 6: Bio inspiring computing and its application in cheminformatics

Problem Statement

Page 7: Bio inspiring computing and its application in cheminformatics

Representation of Chemical Structures

• Chemical structures are usually stored in a computer as molecular graphs. Graph theory is a well-established area of mathematics that has found application not just in chemistry but in many other areas, such as computer science.

nodes = atoms
edges = bonds

The nodes and edges may have properties associated with them.

Two common representations: SMILES and the connection table.

Page 8: Bio inspiring computing and its application in cheminformatics

Connection Table

The simplest type of connection table consists of two sections:
A) A list of the atomic numbers of the atoms in the molecule.
B) A list of the bonds, specified as pairs of bonded atoms.

Hydrogen atoms may be implied, in which case the connection table is hydrogen-suppressed.
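A minimal sketch of such a two-section table, using ethanol as the example. The bond order stored with each pair and the `VALENCE` lookup used to recover the suppressed hydrogens are assumptions added for illustration; they are not part of the slide.

```python
# Hydrogen-suppressed connection table for ethanol (CH3-CH2-OH).
# Section A: atomic numbers of the heavy atoms, indexed from 0.
atoms = [6, 6, 8]                      # C, C, O
# Section B: bonds as (atom index, atom index, bond order).
# The bond order is an illustrative extra; the slide only lists atom pairs.
bonds = [(0, 1, 1), (1, 2, 1)]

# Assumed standard valences, used only to recover implied hydrogens.
VALENCE = {6: 4, 7: 3, 8: 2}


def to_adjacency(atoms, bonds):
    """Turn the two-section table into a molecular graph (adjacency lists)."""
    graph = {i: [] for i in range(len(atoms))}
    for a, b, _order in bonds:
        graph[a].append(b)
        graph[b].append(a)
    return graph


def implicit_hydrogens(atoms, bonds):
    """Count the hydrogens each heavy atom needs to fill its valence."""
    used = {i: 0 for i in range(len(atoms))}
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [VALENCE[z] - used[i] for i, z in enumerate(atoms)]
```

Calling `implicit_hydrogens(atoms, bonds)` recovers the three, two and one hydrogens on the CH3, CH2 and OH groups.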

Page 9: Bio inspiring computing and its application in cheminformatics

SMILES

• SMILES stands for Simplified Molecular Input Line Entry Specification.

• In SMILES, atoms are represented by their atomic symbol.

• Upper case symbols are used for aliphatic atoms and lower case for aromatic atoms.

• Double bonds are written using “=” and triple bonds using “#”
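The rules above can be illustrated with a few well-known SMILES strings. The sketch below is not a SMILES parser: the `ATOM` pattern and the `summarize` helper are hypothetical names, and only a small organic subset is recognised (no bracket atoms, charges or stereochemistry).

```python
import re

# Example SMILES strings for familiar molecules.
EXAMPLES = {
    "ethanol": "CCO",
    "ethene": "C=C",
    "hydrogen cyanide": "C#N",
    "benzene": "c1ccccc1",
}

# Atom tokens for a small organic subset; two-letter halogens are tried first.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def summarize(smiles):
    """Count aliphatic/aromatic atoms and explicit double/triple bonds."""
    atoms = ATOM.findall(smiles)
    return {
        "aliphatic": sum(1 for a in atoms if a[0].isupper()),
        "aromatic": sum(1 for a in atoms if a[0].islower()),
        "double": smiles.count("="),
        "triple": smiles.count("#"),
    }
```

For benzene, all six atoms are lower case and therefore counted as aromatic; ring-closure digits are ignored by the pattern.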

Page 10: Bio inspiring computing and its application in cheminformatics

Morgan algorithm

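• There may be many different ways to construct the connection table or the SMILES string for a given molecule, so a unique (canonical) ordering of the atoms is needed. The Morgan algorithm is a classic way to obtain one.

• In the first iteration, each atom is assigned a connectivity value equal to the number of connected atoms. In the second and subsequent iterations, a new connectivity value is calculated as the sum of the connectivity values of the atom's neighbours.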
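The iteration can be sketched as follows. This assumes the usual formulation of the Morgan algorithm, in which each new value is the sum of the neighbours' current values and the iteration stops once the number of distinct values no longer increases. The final step of numbering the atoms from these values is left out, and `morgan_connectivity` is a hypothetical name.

```python
def morgan_connectivity(graph):
    """Extended connectivity values for a molecular graph.

    graph maps each atom index to the list of its neighbours.
    """
    # First iteration: connectivity = number of connected atoms.
    values = {atom: len(nbrs) for atom, nbrs in graph.items()}
    n_classes = len(set(values.values()))
    while True:
        # Later iterations: sum the current values of the neighbours.
        new_values = {atom: sum(values[n] for n in nbrs)
                      for atom, nbrs in graph.items()}
        new_classes = len(set(new_values.values()))
        if new_classes <= n_classes:
            return values          # no further discrimination gained
        values, n_classes = new_values, new_classes


# Carbon skeleton of n-pentane: a chain 0-1-2-3-4.
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

For the pentane chain, one refinement separates the central carbon from its two neighbours, which the plain neighbour count cannot do.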

Page 11: Bio inspiring computing and its application in cheminformatics

Screening Methods

• Molecule screens are often implemented using binary string representations, called bitstrings, of the molecules and of the query substructure. Bitstrings consist of a sequence of “0”s and “1”s. They are the “natural currency” of computers and so can be compared and manipulated very rapidly, especially if held in the computer’s memory. A “1” in a bitstring usually indicates the presence of a particular structural feature and a “0” its absence.
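A minimal sketch of such a screen, using Python integers as bitstrings. The five-entry `FEATURES` key list is hypothetical; real structural-key sets are far longer.

```python
# Hypothetical structural keys: list position = bit position.
FEATURES = ["carbonyl", "hydroxyl", "amine", "aromatic ring", "halogen"]


def make_bitstring(present):
    """Encode a set of feature names as an integer bitstring."""
    bits = 0
    for position, name in enumerate(FEATURES):
        if name in present:
            bits |= 1 << position   # set the bit for a feature that is present
    return bits


def passes_screen(molecule_bits, query_bits):
    """A molecule can contain the query only if every query bit is also set."""
    return (molecule_bits & query_bits) == query_bits


phenol = make_bitstring({"hydroxyl", "aromatic ring"})
```

A molecule that fails the screen cannot contain the query substructure, so only the molecules that pass need the slower atom-by-atom search.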

Page 12: Bio inspiring computing and its application in cheminformatics

Structure Searching

• Graph theoretic methods can be used to perform substructure searching, which is equivalent to determining whether one graph is entirely contained within another, a problem known as subgraph isomorphism.
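As a rough illustration, the check can be written as a plain backtracking search. This is a sketch, not one of the optimised algorithms used in practice: bond orders are ignored, the run time is exponential in the worst case, and `is_subgraph` is a hypothetical name.

```python
def is_subgraph(query, target, q_labels, t_labels):
    """Return True if the query graph can be embedded in the target graph.

    Graphs map a node to the set of its neighbours; labels map a node to
    its atom symbol.
    """
    q_nodes = list(query)

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return True                      # every query atom is placed
        q = q_nodes[len(mapping)]
        for t in target:
            if t in mapping.values() or q_labels[q] != t_labels[t]:
                continue
            # Each already-mapped neighbour of q must map to a neighbour of t.
            if all(mapping[n] in target[t] for n in query[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]               # backtrack
        return False

    return extend({})


# Target: ethanol skeleton C0-C1-O2. Query: a C-O fragment.
ethanol = {0: {1}, 1: {0, 2}, 2: {1}}
ethanol_labels = {0: "C", 1: "C", 2: "O"}
c_o = {0: {1}, 1: {0}}
c_o_labels = {0: "C", 1: "O"}
```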

Page 13: Bio inspiring computing and its application in cheminformatics

Molecular Descriptors

• The manipulation and analysis of chemical structural information is made possible through the use of molecular descriptors.

• These are numerical values that characterise properties of molecules.

• The molecular descriptor is the final result of a logical and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment.

• The descriptors fall into four classes: topological, geometrical, electronic, and hybrid or 3D descriptors.
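As one concrete example of a topological descriptor, the sketch below computes the Wiener index: the sum of shortest-path distances (in bonds) between all pairs of heavy atoms in the hydrogen-suppressed graph. The slides do not name this descriptor; it is chosen here only because it is simple to compute.

```python
from collections import deque


def wiener_index(graph):
    """Sum of shortest-path distances over all unordered atom pairs."""
    total = 0
    for source in graph:
        # Breadth-first search gives bond-count distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
    return total // 2          # each pair was counted from both ends


# Carbon skeletons of two C4H10 isomers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

The two isomers have the same formula but different values, which is the sense in which a descriptor turns structure into a number.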

Page 14: Bio inspiring computing and its application in cheminformatics

Molecular Descriptors

Page 15: Bio inspiring computing and its application in cheminformatics

Computational Models

• Most molecular discoveries today are the result of an iterative, three-phase cycle of design, synthesis and test. Analysis of the results from one iteration provides information and knowledge that enables the next cycle to be initiated and further improvements to be achieved.

• A common feature of this analysis stage is the construction of some form of model which enables the observed activity or properties to be related to the molecular structure.

• Examples: Quantitative Structure-Activity Relationships (QSARs) and Quantitative Structure–Property Relationships (QSPRs).

Page 16: Bio inspiring computing and its application in cheminformatics

Quantitative Structure-Activity Relationships (QSARs)

QSAR is a mathematical relationship between a biological activity of a molecular system and its geometric and chemical characteristics.

A general formula for a quantitative structure-activity relationship (QSAR) can be given by the following:

activity = f (molecular or fragmental properties)

QSAR attempts to find a consistent relationship between biological activity and molecular properties, so that these “rules” can be used to evaluate the activity of new compounds.
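In the simplest case, f is a straight line fitted by least squares to one descriptor. The sketch below assumes that form; the numbers are synthetic and made up purely for illustration, and `fit_line` and `predict` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic values, NOT measured data: one descriptor per training compound
# and the activity observed for it.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 3.9, 6.1, 7.9]

slope, intercept = fit_line(descriptor, activity)


def predict(x):
    """Apply the fitted rule to the descriptor value of a new compound."""
    return slope * x + intercept
```

Real QSAR models typically use many descriptors and need validation on compounds held out from the fit.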

Page 17: Bio inspiring computing and its application in cheminformatics

Quantitative Structure-Activity Relationships (QSARs) (Cont.)

Page 18: Bio inspiring computing and its application in cheminformatics

Compounds + biological activity → QSAR → New compounds with improved biological activity

Page 19: Bio inspiring computing and its application in cheminformatics

Agenda

Cheminformatics

• Introduction
• Representation
• Molecular descriptors

Bio-Inspiring

• Problems
• Algorithms
• Ant Colony Optimization

Bioinspiring and Cheminformatics

• Classification
• Clustering
• Feature Selection

Application

• Drug Discovery
• Drug Design

Thesis statement

• What I aim to achieve

Page 20: Bio inspiring computing and its application in cheminformatics

Bio-Inspired Computing

Finding the best solution becomes increasingly difficult, if not impossible, because of the very large and dynamic space of solutions and the complexity of the computations. Often, the optimal solution of such an NP-hard problem is a point in an n-dimensional hyperspace, and identifying it is computationally very expensive or even infeasible in limited time.

Page 21: Bio inspiring computing and its application in cheminformatics

Bio-Inspired Computing

• Biologically inspired computing is a field of study based on the social behavior of animals, insects and other living organisms; it also draws on connectionism and emergence.

• Bio-inspired computing uses computers to model nature and the study of nature to improve the usage of computers.

Biological computation

Artificial Intelligence

Bio-inspired computing

Page 22: Bio inspiring computing and its application in cheminformatics

Bio-Inspired Algorithms

Page 23: Bio inspiring computing and its application in cheminformatics

Motivation

• Dealing with highly complex problems
– Cannot be solved by human-proposed solutions
– Absence of a complete mathematical model

• Existence of similar problems in nature
– Adaptation
– Self-organization
– Communication
– Optimization

Page 24: Bio inspiring computing and its application in cheminformatics

Bio-inspired computing methods

Some areas of bio-inspired computing are:
• neural networks
• genetic algorithm
• particle swarm
• ant colony optimization
• artificial bee colony
• bacterial foraging
• cuckoo search
• firefly
• leaping frog
• bat algorithm
• flower pollination
• artificial plant optimization

Page 25: Bio inspiring computing and its application in cheminformatics

Swarm Intelligence

• SI-based algorithms belong to a wider class of algorithms, called bio-inspired algorithms.

• We can observe that SI-based ⊂ bio-inspired ⊂ nature-inspired.

Page 26: Bio inspiring computing and its application in cheminformatics

Swarm Intelligence

• Population of simple agents
• Decentralized
• Self-organized
• No or local communication
• Examples: ant/bee colonies, bird flocking, fish schooling

Page 27: Bio inspiring computing and its application in cheminformatics

Ant Colony Optimization

• Ant colony optimization algorithms mimic the foraging behavior of social ants.

• Ants primarily use pheromone as a chemical messenger.

• Pheromone concentration can be considered an indicator of the quality of solutions to a problem of interest.

• The movement of an ant is controlled by pheromone, which evaporates over time.

• The probability that an ant at a particular node i chooses the route from node i to node j is given by
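The slide's equation is not preserved in this transcript. A form commonly quoted in the ACO literature, assumed here, is p_ij = (tau_ij^alpha * eta_ij^beta) / sum over k of (tau_ik^alpha * eta_ik^beta), where tau is the pheromone concentration on a route, eta is its desirability (for example the inverse of its length), and alpha, beta > 0 weight the two influences. The sum runs over the candidate routes leaving node i. The sketch below follows that assumed form; the default parameter values and the function names are illustrative.

```python
def transition_probabilities(pheromone, desirability, alpha=1.0, beta=2.0):
    """Probability of moving from the current node to each candidate node j.

    pheromone[j] and desirability[j] describe the route to candidate j.
    """
    weights = {j: (pheromone[j] ** alpha) * (desirability[j] ** beta)
               for j in pheromone}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def evaporate(pheromone, rate=0.5):
    """Let the pheromone on every route decay by a constant fraction."""
    return {j: (1.0 - rate) * level for j, level in pheromone.items()}
```

With equal desirability, a route carrying three times the pheromone is chosen three times as often, and evaporation gradually erases routes that ants stop reinforcing.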

Page 28: Bio inspiring computing and its application in cheminformatics

Agenda

Cheminformatics

• Introduction
• Representation
• Molecular descriptors

Bio-Inspiring

• Cheminformatics
• Molecular Descriptors
• Similarity

Bioinspiring and Cheminformatics

• Classification
• Clustering
• Feature Selection

Application

• Drug Discovery
• Drug Design

Thesis statement

• What I aim to achieve

Page 29: Bio inspiring computing and its application in cheminformatics

Bio-Inspiring in Cheminformatics

Bio-inspired computing has many applications in the field of cheminformatics:

Classification: a general process related to categorization, in which molecules are differentiated and understood.

Clustering: the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters).

Feature Selection: a process that chooses an optimal subset of features according to a certain criterion.

Page 30: Bio inspiring computing and its application in cheminformatics

Classification

• In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
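A minimal sketch of this idea for molecules, assuming each molecule is described by the set of bit positions that are "on" in its fingerprint. The nearest-neighbour rule, the Tanimoto similarity measure and the tiny training set are illustrative choices, not taken from the slides.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def classify(new_fp, training):
    """Assign the label of the most similar training molecule (1-NN)."""
    _best_fp, best_label = max(training,
                               key=lambda item: tanimoto(new_fp, item[0]))
    return best_label


# Hypothetical training set: (fingerprint, known category).
TRAINING = [
    (frozenset({0, 1, 2}), "active"),
    (frozenset({0, 1, 3}), "active"),
    (frozenset({5, 6, 7}), "inactive"),
    (frozenset({5, 6, 8}), "inactive"),
]
```

The new observation inherits the category of the training molecule it most resembles, which is the simplest way of using a training set with known membership.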

Page 31: Bio inspiring computing and its application in cheminformatics

Clustering

• Clustering is the process of partitioning a usually large dataset into groups (or clusters), according to a similarity (or dissimilarity) measure.

• If we assume that we have a dataset X = {x1, x2, x3, ...}, which consists of all the data that we want to place into clusters, then we define a clustering of X into m clusters C1, ..., Cm in such a way that the following conditions apply:
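The conditions themselves are not preserved in this transcript. Assuming the usual definition of a hard (non-overlapping) clustering, they are: (1) no cluster Ci is empty, (2) the union of C1, ..., Cm is X, and (3) Ci and Cj share no element when i ≠ j. The sketch below checks these three assumed conditions; `is_valid_clustering` is a hypothetical name.

```python
def is_valid_clustering(dataset, clusters):
    """Check the three assumed hard-clustering conditions."""
    union = set()
    for cluster in clusters:
        union |= set(cluster)
    non_empty = all(len(cluster) > 0 for cluster in clusters)   # condition 1
    covers = union == set(dataset)                              # condition 2
    # Condition 3: if no element is shared, the sizes add up to the union.
    disjoint = sum(len(set(cluster)) for cluster in clusters) == len(union)
    return non_empty and covers and disjoint


X = ["x1", "x2", "x3", "x4"]
```

Fuzzy clustering relaxes condition (3) by letting a point belong to several clusters with different degrees of membership.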

Page 32: Bio inspiring computing and its application in cheminformatics

Feature Selection

• Why do we need FS?
– To improve performance (in terms of speed, predictive power, simplicity of the model).
– To visualize the data for model selection.
– To reduce dimensionality and remove noise.

• Perspectives:
– Searching for the best subset of features.
– Criteria for evaluating different subsets.
– Principle for selecting, adding, removing or changing features during the search.
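The three perspectives map onto a search loop, a scoring function and a stopping rule. The sketch below uses greedy forward selection as one possible search strategy; the `USEFULNESS` weights and the per-feature penalty in `toy_score` are made-up stand-ins for a real evaluation criterion such as cross-validated model accuracy.

```python
def forward_selection(features, score, max_features):
    """Greedy search: repeatedly add the feature that most improves the score."""
    selected = []
    best_score = score(selected)
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        best_candidate = max(candidates, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [best_candidate])
        if candidate_score <= best_score:
            break                  # no remaining feature improves the criterion
        selected.append(best_candidate)
        best_score = candidate_score
    return selected


# Hypothetical per-feature usefulness; not real descriptor statistics.
USEFULNESS = {"logP": 0.6, "mol_weight": 0.3, "n_rings": 0.05, "noise": 0.0}


def toy_score(subset):
    """Reward useful features, penalise every extra feature for simplicity."""
    return sum(USEFULNESS[f] for f in subset) - 0.1 * len(subset)
```

Bio-inspired optimizers replace the greedy loop with a population-based search over subsets while keeping the same kind of scoring function.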

Page 33: Bio inspiring computing and its application in cheminformatics

Agenda

Cheminformatics

• Introduction
• Representation
• Molecular descriptors

Bio-Inspiring

• Cheminformatics
• Molecular Descriptors
• Similarity

Bioinspiring and Cheminformatics

• Classification
• Clustering
• Feature Selection

Application

• Drug Discovery
• Drug Design

Thesis statement

• What I aim to achieve

Page 34: Bio inspiring computing and its application in cheminformatics

Application (I) Drug Design

• Drug design, often referred to as rational drug design or simply rational design, is the inventive process of finding new medications based on the knowledge of a biological target.

• The drug is most commonly an organic small molecule that activates or inhibits the function of a biomolecule such as a protein.

Page 35: Bio inspiring computing and its application in cheminformatics

Application (II) Drug Discovery

Page 36: Bio inspiring computing and its application in cheminformatics

Cheminformatics and Bioinformatics in Drug Design

Page 37: Bio inspiring computing and its application in cheminformatics

Literature Review

• Joerg Kurt Wegner, Aaron Sterling, Rajarshi Guha and Andreas Bender, in their survey “Cheminformatics”, give a comprehensive introduction to the field of cheminformatics.

• Roberto Todeschini and Viviana Consonni, in their book on molecular descriptors, collect a huge number of descriptors. All new descriptors, QSAR approaches and chemometric strategies proposed since 2000 are included in this handbook.

• Aboul Ella Hassanien and Eid Elamry introduce “Swarm Intelligence Methods and Concepts”.

Page 38: Bio inspiring computing and its application in cheminformatics

Literature Review

• Gerald M. Maggiora and Veerabahu Shanmugasundaram, in “Molecular Similarity Measures”, survey methods for measuring the similarity between two graphs and address the maximum subgraph matching problem.

• Arpan Kumar Kar presents a review of bio-inspired computing.

Page 39: Bio inspiring computing and its application in cheminformatics

Thesis Statement

Title: Bio-inspired Computing and its Application in Cheminformatics

Aim:
• To cluster molecules using spectral clustering.
• To measure the similarity between molecules.

Page 40: Bio inspiring computing and its application in cheminformatics

References

1. Andrew R. Leach and Valerie J. Gillet, “An Introduction to Chemoinformatics”, Springer, 2007.
2. Roberto Todeschini and Viviana Consonni, “Molecular Descriptors for Chemoinformatics”, Wiley-VCH, May 2009.
3. Christina Chrysouli and Anastasios Tefas, “Spectral clustering and semi-supervised learning using evolving similarity graphs”, Applied Soft Computing.
4. U. Luxburg, “A tutorial on spectral clustering”, Stat. Comput. 17 (4) (2007) 395–416.
5. R. Dutt and A. K. Madan, “Predicting biological activity: Computational approach using novel distance based molecular descriptors”, Computers in Biology and Medicine, 2012.
6. Yang, X.S., Cui, Z., Xiao, R., Gandomi, A.H. and Karamanoglu, M. eds., 2013. Swarm intelligence and bio-inspired computation: theory and applications. Newnes.
7. Kar, Arpan Kumar. "Bio inspired computing–A review of algorithms and scope of applications." Expert Systems with Applications 59 (2016): 20-32.
8. Emmert-Streib, Frank, Matthias Dehmer and Yongtang Shi. “Fifty years of graph matching, network alignment and network comparison.” Inf. Sci. 346-347 (2016): 180-197.
9. Oduguwa, Abiola, Ashutosh Tiwari, Rajkumar Roy, and Conrad Bessant. "An overview of soft computing techniques used in the drug discovery process." In Applied Soft Computing Technologies: The Challenge of Complexity, pp. 465-480. Springer Berlin Heidelberg, 2006.
10. Maggiora, G.M. and Shanmugasundaram, V., 2004. Molecular similarity measures. Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery, pp.1-50.

Page 41: Bio inspiring computing and its application in cheminformatics

Questions

Page 42: Bio inspiring computing and its application in cheminformatics