
MODULE 1

Introduction to modeling: Theoretical vs. Computational Modeling, Stages of

Computational Modeling, Abstraction of idea-Properties of models, Importance of

virtual experiments in science & technology.


INTRODUCTION TO MODELING

Model

A model is a physical, mathematical, or otherwise logical representation of a system, entity,

phenomenon, or process. A model is a mathematical object that has the ability to predict the behavior of

a real system under a set of defined operating conditions and simplifying assumptions. Simulation is

the process of exercising a model for a particular instantiation of the system and specific set of inputs

in order to predict the system response. A Simulation is the implementation of a model over time. A

Simulation brings a model to life and shows how a particular object or phenomenon will behave. It is useful

for testing, analysis or training where real-world systems or concepts can be represented by a model.

Simply, models serve as representations of events and/or things that are real (such as a historic case

study) or contrived (a use case). They can be representations of actual systems, which are often difficult or impossible to investigate directly. A system might be large and complex, or it might be dangerous to impose the conditions under which the system must be studied. Systems that are expensive or essential

cannot be taken out of service; systems that are notional do not have the physical components to

conduct experiments. Thus, models are developed to serve as a stand-in for systems. As a substitute, the

model is what will be investigated with the goal of learning more about the system.

Technological advancements have paved the way for new approaches to modeling, simulation, and

visualization. Modeling now encompasses high degrees of complexity and holistic methods of data

representation. A model can be physical, such as a scale model of an airplane, which can be used to study the aerodynamic behavior of the airplane through wind-tunnel tests. At times, a model consists of a set of

mathematical equations or logic statements that describe the behavior of the system. These are

notional models. Simple equations often result in analytic solutions or an analytic representation of the

desired system performance characteristic under study.

Modeling and Simulation

Central to Modeling and Simulation is the fundamental notion that models are approximations of the real world.

Models are static representations of the entities or concepts they represent. To depict how models

behave over time, one must implement them in a simulation. Thus, the term simulation is defined as


models that have been implemented in a temporal manner. Modeling and Simulation is at the forefront

of multidisciplinary collaboration that integrates quantitative and qualitative research methods and

diverse modeling paradigms. Significantly, these modeling tools are capable of representing many

aspects of life. Modeling and Simulation has become a very important tool across all applications:

requirements definition; program management; design and engineering; efficient test planning; result

prediction. Modeling and Simulation (M&S) provides virtual duplication of products and processes,

and represents those products or processes in readily available operationally valid environments. Use of

models and simulations can reduce the cost and risk of life cycle activities. M&S is now capable of

prototyping full systems and networks and of interconnecting multiple systems and their simulators, so that simulation technology is moving in every conceivable direction.

Classes of Simulations

The three classes of models and simulations are virtual, constructive, and live:

Virtual simulations represent systems both physically and electronically. Examples are aircraft trainers, the Navy’s Battle Force Tactical Trainer, the Close Combat Tactical Trainer, and built-in training. Virtual simulations put the human in the loop. The operator’s physical interface with the system is duplicated, and the simulated system is made to perform as if it were the real system. The operator is subjected to an environment that looks, feels, and behaves like the real thing. A more advanced version of this is the virtual prototype, which allows the individual to interface with a virtual mockup operating in a realistic computer-generated environment. A virtual prototype is a computer-based simulation of a system or subsystem with a degree of functional realism comparable to that of a physical prototype.

Constructive simulations represent a system and its employment. They include computer models, analytic tools, mockups, IDEF, Flow Diagrams, and Computer-Aided Design/Manufacturing (CAD/CAM). The purpose of systems engineering is to develop descriptions of system solutions. Accordingly, constructive simulations are important products in all key systems engineering tasks and activities. Of special interest are Computer-Aided Engineering (CAE) tools. Computer-aided tools can allow more in-depth and complete analysis of system requirements early in design. They can provide improved communication because data can be disseminated rapidly to several individuals concurrently, and because design changes can be incorporated and distributed expeditiously. Key computer-aided engineering


tools are CAD, CAE, CAM, Continuous Acquisition and Life Cycle Support, and Computer-Aided Systems Engineering.

Live simulations are simulated operations with real operators and real equipment. Examples are fire drills, operational tests, and an initial production run with soft tooling. Live simulations are simulated operations of real systems using real people in realistic situations. The intent is to put the system, including its operators, through an operational scenario in which some conditions and environments are mimicked to provide a realistic operating situation. Examples of live simulations range from fleet exercises to fire drills. Eventually, live simulations must be performed to validate constructive and virtual simulations. However, live simulations are usually costly, and trade studies should be performed to support the balance of simulation types chosen for the program.

Modeling and Simulation Cycle and Relevant Technologies

Figure 1: Modeling and Simulation cycle and relevant technologies (a cycle: Model → Implement → Execute → Analyze)

The process of M&S passes through four phases of a cyclic movement: model, code, execute, and

analyze. Each phase depends on a different set of supporting technologies:

(1) model phase = modeling technologies

(2) code phase = development technologies

(3) execute phase = computational technologies

(4) analyze phase = data/information technologies

(Diagram: Model → Simulation → Results → Insight)
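As a deliberately simplified illustration of this cycle, the following Python sketch models exponential decay, implements it as code, executes it over time, and analyzes the output; the decay rate, function names, and the half-life question are assumptions made only for illustration.

import math

# Model phase: a mathematical statement, x(t) = x0 * exp(-k * t),
# describing how a quantity decays over time (k is an assumed rate).
def decay_model(x0, k, t):
    return x0 * math.exp(-k * t)

# Code/implement phase: wrap the model in a simulation routine that
# steps through time and records the system response.
def run_simulation(x0=100.0, k=0.3, t_end=10.0, dt=1.0):
    times = [i * dt for i in range(int(t_end / dt) + 1)]
    return times, [decay_model(x0, k, t) for t in times]

# Execute phase: run the simulation for a specific set of inputs.
times, values = run_simulation()

# Analyze phase: extract the performance information of interest,
# here the first time at which the quantity falls to half its start.
half_life = next(t for t, v in zip(times, values) if v <= 50.0)
print(f"simulated half-life (coarse grid): {half_life} time units")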


Modeling Technologies : The construction of a model for a system requires data, knowledge, and

insight about the system. Different types of systems are modeled using different constructs or

paradigms. The modeler must be proficient in his or her understanding of these different system classes

and select the best modeling paradigm to capture or represent the system he or she is to model. As

noted previously, modeling involves mathematics and logic to describe expected behavior; as such, only those system behaviors significant to the study or research question need be represented in the

model.

Development Technologies : The development of a simulation is a software design project. Computer

code must be written to algorithmically represent the mathematical statements and logical constructs of

the model. This phase of the M&S cycle uses principles and tools of software engineering.

Computational Technologies : The simulation is next executed to produce performance data for the

system. For simple simulations, this might mean implementing the simulation code on a personal

computer. For complex simulations, the simulation code might be implemented in a distributed,

multiprocessor or multicomputer environment where the different processing units are interconnected

over a high-speed computer network. Such an implementation often requires specialized knowledge of

computer architectures, computer networks, and distributed computing methodologies.

Data/Informational Technologies : During this phase of the M&S process, analysis of the simulation

output data is conducted to produce the desired performance information that was the original focus of

the M&S study. If the model contains variability and uncertainty, then techniques from probability and

statistics will likely be required for the analysis. If the focus of the study is to optimize performance,

then appropriate optimization techniques must be applied to analyze the simulation results.
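As a small, hedged example of this analysis phase, the Python sketch below treats replicated outputs of a stochastic simulation as data and summarizes them with a mean and an approximate 95% confidence interval; the noisy service-time model and all names are assumptions chosen for illustration.

import random
import statistics

def simulate_once(mean_service=5.0, noise=1.5):
    # Stand-in for one replication of a stochastic simulation run.
    return random.gauss(mean_service, noise)

# Replicate the simulation to capture variability in the output.
replications = [simulate_once() for _ in range(200)]

mean = statistics.mean(replications)
stdev = statistics.stdev(replications)
half_width = 1.96 * stdev / (len(replications) ** 0.5)

print(f"estimated performance: {mean:.2f} +/- {half_width:.2f} (approx. 95% CI)")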


THEORETICAL VS COMPUTATIONAL MODELING

‘Theory’ is a word with which most scientists are entirely comfortable. A theory is one or more rules

that are postulated to govern the behavior of physical systems. Often, in science at least, such rules are

quantitative in nature and expressed in the form of a mathematical equation. For example, Einstein's theory states that the energy of a particle, E, is equal to its relativistic mass, m, times the speed of light in a vacuum, c, squared:

E = mc²

The quantitative nature of scientific theories allows them to be tested by experiment. This testing is the

means by which the applicable range of a theory is elucidated. That a theory has limits in its applicability might, at first glance, seem a sufficient flaw to warrant discarding it. However, if a sufficiently large

number of ‘interesting’ systems falls within the range of the theory, practical reasons tend to motivate

its continued use. The goal of a theory tends to be to achieve as great a generality as possible,

irrespective of the practical consequences.

A computational model is a mathematical model in computational science that requires extensive

computational resources to study the behavior of a complex system by computer simulation. The system

under study is often a complex nonlinear system for which simple, intuitive analytical solutions are not

readily available. Rather than deriving a mathematical analytical solution to the problem,

experimentation with the model is done by adjusting the parameters of the system in the computer, and

studying the differences in the outcome of the experiments. Operation theories of the model can be

derived/deduced from these computational experiments.
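The following minimal Python sketch illustrates this style of computational experimentation, using the logistic map as a stand-in for a complex nonlinear system and sweeping one parameter to compare outcomes; the parameter values and function names are assumptions for illustration only.

def logistic_map(r, x0=0.5, steps=200):
    """Iterate x -> r*x*(1-x), a simple nonlinear model with rich behavior."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

# Computational experiment: adjust the growth parameter r and compare the
# final iterate after 200 steps, instead of deriving the outcome analytically.
for r in [2.5, 3.2, 3.5, 3.9]:
    print(f"r = {r}: final iterate {logistic_map(r):.4f}")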

Computational models are created to simulate a set of processes observed in the natural world in order

to gain an understanding of these processes and to predict the outcome of natural processes given a

specific set of input parameters. Conceptual and theoretical modeling constructs are expressed as sets

of algorithms and implemented as software packages. Theorists spend their time developing a model of what is going on, whether it is semi-analytic, semi-empirical, or otherwise. That is their job: purely developing models, and they can also match models to experiments. Computational people tend to take really complicated models (such as the big bang or the formation of the universe, to give a few examples) and figure out ways, using physics, mathematics, and computational tricks, to simulate those models as well as possible and then to match them to what is observed. They don't always make the theories; they are the


ones who figure out how to make simulated data.

While theories and models like the one represented by E = mc² are not particularly taxing in terms of their mathematics, many others can only be efficiently put to use with the assistance of a

digital computer. Indeed, there is a certain synergy between the development of chemical theories and

the development of computational hardware, software, etc. If a theory cannot be tested, say because

solution of the relevant equations lies outside the scope of practical possibility, then its utility cannot be

determined. Similarly, advances in computational technology can permit existing theories to be applied

to increasingly complex systems to better gauge the degree to which they are robust. ‘Computation’ is

the use of digital technology to solve the mathematical equations defining a particular theory or model.

Theory is enormously attractive as a tool to reduce the costs of doing experiments. Theory is applied

post facto to a situation where some ambiguity exists in the interpretation of existing experimental

results. Theory may be employed in a simultaneous fashion to optimize the design and progress of an

experimental program. Theory may be used to predict properties which might be especially difficult or

dangerous (i.e., costly) to measure experimentally. In the difficult category are such data as rate

constants for the reactions of trace, upper-atmospheric constituents that might play an important role in

the ozone cycle. A computational model is the more expensive option in terms of computational resources. The talent of the well-trained computational chemist lies in knowing how to maximize the accuracy of a prediction while minimizing the investment of such resources. A clear computational model can be worth a thousand words.

A theoretical model is a theory designed to explain an entire situation or behavior, with the idea that it

would eventually be able to predict that behavior. An example of a model would be the theory of planned behavior. This is a theoretical model because it takes the components of decision making, theorizes about them, and attempts to predict how these will affect the intention to do something. Theoretical modeling begins with the need to understand some real-world phenomenon. The researcher then constructs an environment, which he or she calls a model, in which the actions to be explained take place. A model

is specified by a series of assumptions. Some assumptions are purely mathematical; their purpose is to

make the analysis tractable. Other assumptions are substantive, with verifiable empirical content.

Once a theoretical model has been built, the researcher analyzes its logical implications for the

phenomenon being explained. Then another model, substantively different from the first, is built, very


likely by another researcher, and its implications are analyzed. The process continues with a third and a

fourth model, if necessary, until all ramifications of the explanation being proposed have been

examined. By comparing the implications of one model with those of another, and tracing the

differences to the model design, we hope to understand the cause-effect relationships governing the

phenomenon in question. This is as though a logical experiment were being run, with the various

models as the treatments and the phenomenon being explained as the "dependent variables." The key

difference from empirical experiments is that in empirical experiments the subjects produce the effects,

whereas here the researcher produces the effects by logical argument.

Models help us to understand phenomena

A computational model can provide novel sources of insight into behavior, for example by providing a

counterintuitive explanation of a phenomenon, or by reconciling seemingly contradictory phenomena

(e.g., by complex interactions among components). Seemingly different phenomena can also be related

to each other in non-obvious ways via a common set of computational mechanisms.

Computational models can also be lesioned and then tested, providing insight into behavior following specific types of brain damage and, in turn, into normal functioning. Often, lesions have effects that are far from obvious, and computational models can help explain them. By virtue of being able to translate between

functional desiderata and the biological mechanisms that implement them, computational models

enable us to understand not just how the brain is structured, but why it is structured in the way it is.

Models deal with complexity

A computational model can deal with complexity in ways that verbal arguments cannot, producing

satisfying explanations of what would otherwise just be vague hand-wavy arguments. Further,

computational models can handle complexity across multiple levels of analysis, allowing data across

these levels to be integrated and related to each other. For example, computational models of this kind show how biological properties give rise to cognitive behaviors in ways that would be impossible

with simple verbal arguments.

Models are explicit

Making a computational model forces you to be explicit about your assumptions and about exactly how

the relevant processes actually work. Such explicitness carries with it many potential advantages.


First, explicitness can help in deconstructing psychological concepts that may rely on homunculi to do

their work. A homunculus is a "little man," and many theories of cognition make unintended use of them by endowing particular components (often "boxes") of the theory with magical powers that end up doing all the work in the theory. A canonical example is the "executive" theory of prefrontal cortex function: if you posit an executive without explaining how it makes all those good decisions and coordinates all the other brain areas, you haven't explained much (you might as well just put pinstripes and a tie on the box).

Second, an explicitly specified computational model can be run to generate novel predictions. A

computational model thus forces you to accept the consequences of your assumptions. If the model

must be modified to account for new data, it becomes very clear exactly what these changes are, and

the scientific community can more easily evaluate the resulting deviance from the previous theory.

Predictions from verbal theories can be tenuous due to lack of specificity and the flexibility of vague

verbal constructs.

Third, explicitness can contribute to a greater appreciation for the complexities of otherwise seemingly

simple processes. For example, before people tried to make explicit computational models of object

recognition, it didn't seem that difficult or interesting a problem; there is an anecdotal story about a scientist in the 1960s who was going to implement a model of object recognition over the summer.

Needless to say, he didn't succeed.

Fourth, making a computational model forces you to confront aspects of the problem that you might

have otherwise ignored or considered to be irrelevant. Although one sometimes ends up using

simplifications or stand-ins for these other aspects, it can be useful to at least confront these problems.


STAGES OF COMPUTATIONAL MODELING

Figure 2: Stages of Computational Modeling (Pose the Question → Period of Study → Create a Process Model → Create a Computational Representation → Evaluate the Computational Model → Explore the Model's Behavior → Experiment with the Model → Modify the Theory about the Real World; grouped into Theory Development, Model Development, Model Evaluation, and Refinement)


I. Theory Development

Lave and March (1993) provide an invaluable lesson by beginning their primer on modeling with “an

introduction to speculation”. The genesis of any good modeling project is some observation or question

that piques one's curiosity. Sometimes, as with most of Lave and March's examples, the observation is a

simple empirical regularity, and the question asks for a simple process explanation, which ideally draws

on existing theory. Obviously, theory development cannot occur in a vacuum. One needs a solid grasp

of existing substantive knowledge and theory about the research domain. Too often, researchers

become committed to particular analytic tools, such as computational modeling, and lose sight of the

substantive questions. This stage of the project, then, often requires original empirical analysis and

careful review of the relevant empirical and theoretical literatures.

In the social sciences, this period of study is likely to reveal one of several cases: (a) a simple, elegant

theory, possibly already expressed as a mathematical process model, that appears to explain the original

observation; (b) a reasonable, though underdeveloped, general theory that offers a promising

explanation but that has seemed too complex for formal analysis; (c) a variety of unconnected

theoretical snippets (perhaps expressed as mathematical process models), many of which find some empirical support but none of which seems capable of explaining the observation alone; (d) many

separate quantitative empirical results, perhaps generated by “black box” models, none of which are

capable of explaining the observation; or (e) many rich qualitative studies, with little attempt at

developing rigorous theory, perhaps because the underlying processes intuitively seem to be too

complex for existing theory-building tools. In all but the first case, a computational modeling project

may be appropriate and helpful. Theory must be adopted, adapted, or created at this point, but this

process is well beyond our scope in this monograph.

Future of theory development in Science

Theoretics can help us not only in the development of new and better theories but it can also help us in

determining the validity of others’ research and much more. A theory is a set of interrelated statements

that provides an explanation for a class of events. It is “a way of binding together a multitude of facts


so that one may comprehend them all at once.” The value of the knowledge yielded by the application

of theory lies in the control it gives us over our experience. Theory serves as a guide to action. By

formulating a theory, we attempt to make sense of our experiences. We must somehow “catch” fleeting

events and find a way to describe and explain them. Only then can we predict and influence the world

around us. Theory is the “fabric” we weave to accomplish these ends, just as a fine garment is crafted

from pieces of fabric and thread, carefully sewn together, and worn for a particular purpose. More

specifically, a theory performs a number of functions. First, it allows us to organize our observations

and to deal meaningfully with information that would otherwise be chaotic and useless.

A scientific theory summarizes a hypothesis or group of hypotheses that have been supported with

repeated testing. If enough evidence accumulates to support a hypothesis, it moves to the next step

known as a theory in the scientific method and becomes accepted as a valid explanation of a

phenomenon. When used in a non-scientific context, the word “theory” implies that something is

unproven or speculative. As used in science, however, a theory is an explanation or model based on

observation, experimentation, and reasoning, especially one that has been tested and confirmed as a

general principle helping to explain and predict natural phenomena. Any scientific theory must be

based on a careful and rational examination of the facts. In the scientific method, there is a clear

distinction between facts, which can be observed and/or measured, and theories, which are scientists’

explanations and interpretations of the facts. Scientists can have various interpretations of the outcomes

of experiments and observations, but the facts, which are the cornerstone of the scientific method, do

not change. Scientists obtain a great deal of the evidence they use by observing natural and

experimentally generated objects and effects.

The key components of Theoretics:

1. The application of logic to theory development and validation.

This includes the use of syllogisms, which are propositions that, if agreed to, will necessarily lead

to a specific conclusion. One example is that Space physically exists:

A photon physically exists.

If a photon physically exists, then it must have physical dimensions.

Anything with physical dimensions must have mass.


If a photon has mass but it can travel infinitely through the void/medium of 'space', then its

density must be less than that of the medium in which it travels.

Therefore space must have a density that is greater than that of light.

Since something must have physical dimensions and mass in order to have

density, then Space must have physical dimensions and mass.

If you agree to the above propositions then you have to agree that Space physically exists.

Another use of logic is in determining if a theory suffers from a fallacy. There are dozens of different

types of fallacies and they can be sometimes hard to spot. Finding the fallacy often takes a bit of

detective work. For instance, many organizations (e.g. AFL-CIO, AMWA) have touted the U.S.

Department of Labor, Bureau of Labor Statistics studies as showing that there is a pay discrepancy

between male and female physicians, but when you take into account the number of hours put in (women worked fewer hours than men), seniority (older physicians are paid more than younger ones),

and specialty (women chose lower paying specialties than men such as pediatrics, family practice, etc.),

there was no pay discrepancy.

2. The use of good science in theory development.

This can be a tough task. What can we use for the building blocks of our own theories? You can

use what are called facts (that which is known to a virtual certainty) as well as that which has been

demonstrated to probably be true (that which has been found to be valid to date). Science, on the other hand, has an interesting definition in that it has previously applied to those fields of study which utilize

the scientific method. The scientific method is fine for experimentation but it is inadequate in

determining what is Science. Therefore those fields of study which attempt to describe and understand

the nature of the universe on a "whole" scale such as physics and chemistry would fit our definition but

so would those fields which study it in "part" such as biology whose field has been limited to only

those life forms on Earth. Theoretics is a field separate from (albeit integrated with) experimentation, not

only because good theory development is actually quite complex and intricate, but also because the


people who make good experimentalists rarely make good theoreticians and vice versa. Theoreticians usually possess a more creative and unbound mind (e.g., Einstein) while the experimentalist maintains

a more meticulous and practical mindset (e.g. Edison). Rarely are the creative theoretical mind and the

meticulous experimental mind contained within the same skull. So what we currently have are

experimentalists trying to do theory development, and not very well. Theory is the wheel that the ship

of experimentation needs in order to steer a true course (which incidentally is where our logo below

came from) and Theoretics is not only the field of theory development but also its practical application

(e.g. gravity waves). Who could deny the positive impact there would be if the field of Theoretics were

to be formalized into our centers of higher education as well as in the scientific community at large?

3. Look for better theories.

Just because a theory is valid to date, does not make it true. It was only a few centuries ago that

it was thought that the Earth was the center of the universe and for the most part the evidence seemed

to validate the theory. But as man looked at it from a more detailed perspective and developed better

tools, it was found to be invalid. Take gravity for a current example. Currently it is thought that gravity

waves “pull” objects together but a theory that makes more sense and explains gravity better is that of

Space density where Space physically exists and is displaced by matter thus causing a pressure around

the mass.

4. Effective communication is critical.

If a person cannot communicate effectively then the best theory in the world could be rendered

useless. Scientists must agree on the definitions of the words that they are using and use them correctly.

Scientists are recognising the need not only to communicate more freely among themselves – hence the

growth of the open access movement – but also to communicate the significance of their work to both

policy-makers and to the general public, particularly when it has important social implications, whether

for good or, potentially, for ill. “Effective communication of science gives people accurate information

upon which to base decisions. By making science accessible, science communicators help counter the

misinformation and misconceptions which clutter public debate.”

5. Theory is just as important as experimentation (if not more so).


Science cannot progress without theory. Hence Theoretics. All too often the theories in today’s

scientific journals are usually single hypothesis studies and not broader theories which can bring about

a revolution of thought and understanding. Theory development is not easy and is often quite complex

and intricate, with experimentalists rarely making good theoreticians and vice versa. Theoreticians

usually possess a more creative and unbound mind (e.g. Einstein) while the experimentalist maintains a

more meticulous and practical mindset (e.g. Edison). Rarely are the creative theoretical mind and the

meticulous experimental mind contained within the same skull. It is difficult to compare theoretical and

experimental research because they are the two halves of the whole. Without one the other could not

exist. Over the last half century, theoretical research has taken a back seat to single-hypothesis studies.

In doing so we have been missing many great advances or at least delaying them. New theories, no

matter how credible, have difficulty gaining acceptance because they challenge current thought, and the forum for publishing and disseminating such theories is limited. It is for this reason that

the Journal of Theoretics was created. Theoretical research must be evaluated by the validity of its

arguments, its logic of thought, and its basis in credible facts. It must be accepted or rejected on those

criteria rather than "p" values and null hypotheses. Though both Einstein and Edison contributed greatly

to mankind, I envision Einstein when I think of theory and Edison when I think of experimentation.

Though both theory and experimentation are necessary components of science and research, the

emphasis on theory has been losing ground in the current scientific literature. It is therefore the purpose and goal of this journal to renew the spirit and vigor of theory in scientific research.

Theory + Experimentation = Scientific Progress

II. Model Development

Developing a model presumes the existence of theory. For computational modeling this stage divides into two parts: building a process model, and expressing that model as a computer program. The first step should be to create a process model of the theory. The computational language chosen should be a language that most naturally expresses the concepts and processes in the model. It often is useful to create a more detailed flowchart at this point to maintain focus and clarity in the actual programming task. When the theory does not specify a link in the overall process flow, guesses must be made. These may be competing theoretical explanations of subprocesses, or they might be relatively theory-neutral. In the latter case, empirical analysis might provide enough information to guess the distributional effect of the missing subprocess, which can be functionally implemented using random variables.
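A hedged Python sketch of this stage: a simple process model of innovation adoption is expressed as a program, and an unspecified subprocess (peer influence) is stood in for by a random variable, as described above. The propensities, threshold, and names are hypothetical.

import random

def peer_influence():
    # The theory does not specify this subprocess, so its distributional
    # effect is approximated with a random variable (a guess).
    return random.uniform(0.0, 0.3)

def adopts(base_propensity=0.4):
    # Process model: adoption occurs when propensity plus peer
    # influence exceeds an assumed threshold.
    return (base_propensity + peer_influence()) > 0.5

# Run the computational representation many times to see aggregate behavior.
adoption_rate = sum(adopts() for _ in range(1000)) / 1000
print(f"simulated adoption rate: {adoption_rate:.2f}")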


III. Model Evaluation

One must ensure that no errors were introduced when translating the model into its computational representation. The first stage of model evaluation examines the degree of correspondence between the model and its covering theory, which is called its internal validity. Once one is content with internal validity, its external validity should be tested. The general standards for evaluating any model are Truth, Beauty, and Justice, with the most important being Truth. The standards for truth divide into internal validity, outcome validity, and process validity. If the model passes initial evaluation, we are ready to explore its behavior. This may include running experiments within the simulated theoretical world, and it may include counterfactual analysis, which allows researchers to address what might happen if the world were different from observation but still followed the theorized process.

IV. Refinement

In dynamic simulation, we are usually interested in the final values and time histories of particular numerical variables, called state variables, given a set of initial values for the state variables and parameters. Another choice concerns whether a model will incorporate continuous or discrete variables. Most important is how variables change with time: if they vary continuously with time, the state variables will be defined by differential equations; if they vary in discrete time steps, difference equations are appropriate. Prospect theory, like many other models from the social sciences, contains nonlinear equations. Whether a simulation model represents a linear or nonlinear process is also an important modeling choice.
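A brief Python sketch of this choice, expressing the same logistic growth idea once as a differential equation advanced with small Euler time steps and once as a difference equation in unit time steps; the growth rate, carrying capacity, and function names are illustrative assumptions.

r, K = 0.5, 100.0          # assumed growth rate and carrying capacity

# Continuous formulation: dN/dt = r*N*(1 - N/K), advanced with small Euler steps.
def logistic_ode(n0=10.0, dt=0.01, t_end=20.0):
    n = n0
    for _ in range(int(t_end / dt)):
        n += dt * r * n * (1 - n / K)
    return n

# Discrete formulation: N[t+1] = N[t] + r*N[t]*(1 - N[t]/K), one step per time unit.
def logistic_difference(n0=10.0, t_end=20):
    n = n0
    for _ in range(t_end):
        n = n + r * n * (1 - n / K)
    return n

print(f"continuous (ODE) result at t=20: {logistic_ode():.1f}")
print(f"discrete (difference) result at t=20: {logistic_difference():.1f}")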


ABSTRACTION OF IDEA

Models are significant as they are used to provide a supposedly reliable description or representation of

the world. Most of the models that scientists attempt to generate and analyze are based on assumptions

that are only believed to be true since such models would not consider irregularities and inconsistencies

with common theory. Scientists spend most of their time formulating and analyzing models.

For example, consider the most commonly used models of the earth: flat, spherical, and ellipsoid. These

models do not account for the bumps and grooves. A perfect replica of the earth would reproduce every

contour, but such a representation would be impractical. There are seven key properties, whether they

already be widely accepted or have yet to be accepted at all, that a good model should possess. A real-

world situation has a tremendous potential for variation and an enormous amount of detail. A real-world

object is too unwieldy a thing to be studied mathematically. In many cases the departure from the real

world into the abstract world of modeling does not seem particularly troublesome; in fact it is useful, and part of our everyday thinking. It is always important, however, when a model is extracted and its objects analyzed, manipulated, and reapplied to the real world, that we think about the "fit". Features that were disregarded when the model was extracted may turn out to have been important in ways that were unforeseen. It does not always follow that what is true mathematically of a model extracted from

the real world will also be true in the real world. Abstraction involves induction of ideas or the

synthesis of particular facts into one general theory about something. It is the opposite of specification,

which is the analysis or breaking-down of a general idea or abstraction into concrete facts. Computer

scientists use abstraction and communicate their solutions with the computer in some

particular computer language. Abstraction allows program designers to separate categories and concepts

from instances of implementation, so that they do not depend on concrete details of software or hardware, but on an abstract contract.
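A short Python sketch of this abstract-contract idea, tied to the earlier example of models of the earth: client code depends only on an abstract interface, not on any concrete implementation. The class and method names are illustrative assumptions.

from abc import ABC, abstractmethod

class EarthModel(ABC):
    """Abstract contract: any Earth model must be able to report a radius."""
    @abstractmethod
    def radius_km(self, latitude_deg: float) -> float: ...

class SphericalEarth(EarthModel):
    # One concrete implementation; its details stay hidden behind the contract.
    def radius_km(self, latitude_deg: float) -> float:
        return 6371.0

def circumference_km(model: EarthModel, latitude_deg: float = 0.0) -> float:
    # Depends only on the abstract contract, not on a specific model.
    return 2 * 3.141592653589793 * model.radius_km(latitude_deg)

print(f"{circumference_km(SphericalEarth()):.0f} km")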

PROPERTIES OF MODELS

1. Parsimony

2. Tractability

3. Conceptual Insightfulness

4. Generalizability


5. Falsifiability

6. Empirical consistency

7. Predictive precision

1. Parsimony

Parsimonious models are simple models in the sense that they rely on relatively few special assumptions

and they leave the researcher with relatively few degrees of freedom. Parsimonious models are

desirable because they prevent the researcher from consciously or subconsciously manipulating the

model so that it over-fits the available facts. Over-fitting occurs when a model works very well in a

given situation, but fails to make accurate out-of-sample predictions. For example, if a model

incorporates a large set of qualitative psychological biases, then the model is not parsimonious, since

selective combination of those biases will enable the researcher to tweak the model so that it explains

almost any pattern of observations. Likewise, if a model has many free parameters – for instance, a

complex budget constraint or complex household preferences – then the model is relatively non-parsimonious. When models are flexible and complex, the researcher can combine the myriad elements

to match almost any given set of facts. Such flexibility makes it easy to explain in-sample data,

producing the false impression that the model will have real (out-of-sample) explanatory power. It is a

principle urging one to select, among competing hypotheses, that which makes the fewest

assumptions and thereby offers the simplest explanation of the effect. The role of parsimony and

simplicity is important in all forms of induction, learning, statistical learning theory, and in the debate

about rationalism vs empiricism.

The principle of parsimony means that the simplest possible model should be chosen. One can view the

problem of statistical modelling as choosing an adequate statistical model which is the most

parsimonious. In mathematical programming terminology we could say that the problem of statistical

modelling has an objective function, which is to minimize the model complexity (Model Parsimony), subject to the constraint of Model Adequacy.
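To make the over-fitting point concrete, the Python sketch below fits a parsimonious (degree-1) and a flexible (degree-9) polynomial to the same small noisy sample and compares in-sample and out-of-sample errors; the data-generating process, sample sizes, and degrees are assumptions chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 12)
x_test = np.linspace(0, 1, 200)
truth = lambda x: 2.0 * x + 1.0                      # assumed simple true process
y_train = truth(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = truth(x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)    # flexible model when degree is high
    in_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    out_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: in-sample MSE {in_err:.3f}, out-of-sample MSE {out_err:.3f}")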


2. Tractability

Tractable models are easy to analyze. Models with maximal tractability can be solved with analytic

methods i.e. paper and pencil calculations. At the other extreme, minimally tractable models cannot be

solved even with a computer, since the necessary computations/simulations would take too long. For

instance, optimization is typically not computationally feasible when there are dozens of continuous state variables; in such cases, numerical solution times are measured on the scale of years or centuries.

Tractable means easily managed; tractability concerns how easily something can be done.

Computational Tractability: In many relevant environments optimal mechanisms are computationally

intractable. A mechanism that is computationally intractable, i.e., where no computer could calculate

the outcome of the mechanism in a reasonable amount of time, seems to violate even the loosest

interpretation of our general desideratum of simplicity. We will try to address this computational

intractability by considering approximation. In particular we will look for an approximation

mechanism, one that is guaranteed to achieve a performance that is close to the optimal, intractable,

mechanism’s performance. A first issue that arises in this approach is that approximation algorithms

are not generally compatible with mechanism design. The one approach we have discussed thus far,

following from generalization of the second-price auction, fails to generically convert approximation

algorithms into dominant-strategy incentive-compatible approximation mechanisms. Our philosophy

for mechanism design is that a mechanism that is not computationally tractable is not a valid solution to

a mechanism design problem. To make this criterion formal we review the most fundamental concepts

from computational complexity.

3. Conceptual Insightfulness

Conceptually insightful models reveal fundamental properties of economic behavior or economic

systems. For example, the model of concave utility identifies the key property of risk aversion. The

concept of concave utility is useful even though it makes only qualitative predictions. An optimizing

framework like Nash Equilibrium is also conceptually insightful even though it relies on an assumption

that is empirically false – perfect rationality. The concept of Nash Equilibrium clarifies some abstract

ideas about equilibrium that are important to understand even if the Nash framework is an incomplete

explanation of real-world behavior. Finally, many models are conceptually useful because they provide


normative insights.

4. Generalizability

Generalizable models can be applied to a relatively wide range of situations. For example, a generalizable

model of risk aversion could be used to analyze risk aversion in settings with small or large stakes, as

well as risk aversion with respect to losses or gains. A generalizable model of learning could be used to

study learning dynamics in settings with a discrete action set or a continuous action set or an action set

with a mixture of discrete and continuous actions. A generalizable model of inter-temporal choice could

be used to study decisions with consequences that occur in minutes or decades. Generalizability is a

statistical framework for conceptualizing, investigating, and designing reliable observations. It is used

to determine the reliability (i.e., reproducibility) of measurements under specific conditions. Facets are

similar to the “factors” used in analysis of variance, and may include persons, raters, items/forms, time,

and settings among other possibilities. These facets are potential sources of error and the purpose of

generalizability is to quantify the amount of error caused by each facet and interaction of facets. The

usefulness of data gained from a study is crucially dependent on the design of the study. Therefore, the

researcher must carefully consider the ways in which he/she hopes to generalize any specific results.

Generalizability theory acknowledges and allows for variability in assessment conditions that may

affect measurements. The advantage of theory lies in the fact that researchers can estimate what

proportion of the total variance in the results is due to the individual factors that often vary in

assessment, such as setting, time, items, and raters.

5. Falsifiability

Falsifiability and prediction are the same concept. A model is falsifiable if and only if the model makes

nontrivial predictions that can in principle be empirically falsified. If a model makes no falsifiable

predictions, then the model cannot be empirically evaluated. Falsifiability or refutability is the logical

possibility that an assertion could be shown false by a particular observation or physical experiment.

That something is "falsifiable" does not mean it is false; rather, it means that if the statement were false,

then its falsehood could be demonstrated. Falsifiability, particularly testability, is an important concept

in science and the philosophy of science. The concept was made popular by Karl Popper in his

philosophical analysis of the scientific method. Popper concluded that a hypothesis, proposition,


or theory is "scientific" only if it is, among other things, falsifiable. That is, falsifiability is a necessary

(but not sufficient) criterion for scientific ideas. Popper asserted that unfalsifiable statements are non-

scientific, although not without relevance. For example, metaphysical or religious propositions have

cultural or spiritual meaning, and the ancient metaphysical and unfalsifiable idea of the existence of

atoms has led to corresponding falsifiable modern theories. A falsifiable theory that has withstood

severe scientific testing is said to be corroborated by past experience, though in Popper's view this is

not equivalent with confirmation and does not guarantee that the theory is true or even partially true.

6. Empirical consistency

Empirically consistent models are broadly consistent with the available data. In other words,

empirically consistent models have not yet generated predictions that have been falsified by the data.

Empirically consistent models can be ranked by the strength of their predictions. At one extreme, a

model can be consistent with the data if the model makes only weak predictions that are verified

empirically. At the other extreme, models can achieve empirical consistency by making many strong –

i.e. precise – predictions that are verified empirically.

7. Predictive precision

Models have predictive precision when they make precise or strong predictions. Strong predictions are

desirable because they facilitate model evaluation and model testing. When an incorrect model makes

strong predictions it is easy to empirically falsify the model, even when the researcher only has access

to a small amount of data. A model with predictive precision also has greater potential to be practically

useful if it survives empirical testing. Models with predictive precision are useful tools for decision

makers who are trying to forecast future events or the consequences of new policies.

A model with predictive precision may even be useful when it is empirically inaccurate. For instance,

policymakers would value a structural model that predicts the timing of recessions, even if the model

usually generated small predictive errors. An alternative model that correctly predicted that a recession

would occur at some unspecified time over a ten year horizon would not be as useful. In general,

models that make approximately accurate strong predictions are much more useful than models that

make exactly accurate weak predictions.


Scientific models are mathematical approximations of the world. In every other scientific field, these

approximations are judged by something akin to the seven criteria listed above and not by a model’s

adoption of particular axiomatic assumptions. In the history of science, every axiomatic litmus test has

been discarded. In retrospect, it is easy to see that axioms like the flat earth, the earth at the center of the

universe, or the Euclidean structure of space, should not have been viewed as inviolate. It is likely that

every axiom that we currently use in economics will suffer the same fate that these earlier axioms

suffered. Indeed, it is now believed that no branch of science will ever identify an inviolate set of

axioms [Kuhn 1962]. This does not mean that we should abandon axioms, only that we should not use

axiomatic litmus tests to define a field of scientific inquiry. Even highly successful models do not have

all seven properties. Many of the properties are in conflict with one another. For example, generalizing a model sometimes makes the model unfalsifiable: the most general form of the theory of revealed

preference can’t be rejected by behavioral data. Models that make quantitatively precise predictions are

the norm in other sciences. Models with predictive precision are easy to empirically test and when such

models are approximately empirically accurate they are likely to be useful.

IMPORTANCE OF VIRTUAL EXPERIMENTS IN SCIENCE AND TECHNOLOGY

Virtual worlds are typically designed to elicit synchronous reactions while participants are in the virtual

environment. For example, an approach or attack by a character in a video game may elicit an

immediate psychological, physiological, and behavioral response by a player. Recent research,

however, has revealed that experiences in virtual worlds also have the power to influence behaviors in

the physical world after exposure (Anderson & Bushman, 2001; Fox & Bailenson, 2009; Price &

Anderson, 2007; Rizzo & Kim, 2005; Yee, Bailenson, & Ducheneaut, 2009). The progress of

technology has allowed immersive virtual environments to become increasingly realistic. These

advances may increase presence, the user’s feelings that the virtual environment is real and that the

user’s sensations and actions are responsive to the virtual world as opposed to the real, physical one

(Biocca, Harms, & Burgoon, 2003; Lee, 2004; Lombard & Ditton, 1997; Loomis, 1992; Slater &

Steed, 2000; Steuer, 1992; Witmer & Singer, 1998). The experience of presence may be a result of

characteristics of the technology used (IJsselsteijn, de Ridder, Freeman, Avons, & Bouwhuis, 2001),

aspects of the environment such as graphic realism (Ivory & Kalyanaraman, 2007), or individual

differences among users.


Research has determined several variables that influence a user’s experience of presence, including

features of the virtual environment, characteristics of the user, and the task in which the participant is

engaged. Regardless of the objective features of virtual worlds, the user’s psychological, subjective

experience of presence may enhance the experience and effects of a virtual environment both during

immersion and subsequently in the real world. Virtual experiments are required in order to simulate in

the models precisely the same protocols employed in generating the experimental data used to develop

or test the models. In defining virtual experiments there is a balance to be struck between a

standardized language that is reasonably concise (so that providing support for it in many tools is not too difficult) and allowing flexibility for researchers to represent new and varied kinds of experiments. The

major consequence that we see is that in order for standards to be taken up by the end users, use of

standards must be made easy. In addressing this problem we consider the following aspects.

Firstly, through examining the kinds of experiments required by our scientific applications, we are

determining the minimal set of semantic constructs required in a “protocol language” that still allows

the largest possible set of common experiments to be encoded. Note that we are not seeking to encode

every possible experiment – unusual or especially complex cases may well be better expressed using

general purpose programming languages and/or workflow systems.

Secondly, we argue that there is great value in the protocol language supporting the definition of

common generic components that may be parameterized, and hence instantiated for specific scenarios.

A library of such components may then be built up, facilitating the creation of new experiment

descriptions. For example, a common experiment type in cardiac electrophysiology is the voltage

clamp, where a potential is applied to the cell membrane, and the current response analyzed. This

generic protocol is used with different trans-membrane currents and different applied voltage traces –

these would become inputs to a parameterized protocol. Any voltage clamp experiment could then be

specified quickly and easily.
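A hedged Python sketch of such a parameterized protocol component: a generic voltage-clamp description takes the applied voltage trace and the current of interest as inputs, so specific experiments can be instantiated from one reusable definition. The class, field names, and units are hypothetical and are not drawn from any particular standard.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class VoltageClampProtocol:
    """Generic, reusable protocol; its parameters turn it into a specific experiment."""
    current_of_interest: str                      # e.g. a trans-membrane current name
    voltage_trace_mv: List[Tuple[float, float]]   # (time in ms, holding potential in mV)

    def describe(self) -> str:
        steps = ", ".join(f"{v} mV at {t} ms" for t, v in self.voltage_trace_mv)
        return f"Clamp membrane to [{steps}] and record {self.current_of_interest}."

# Instantiate the generic component for one specific scenario.
step_protocol = VoltageClampProtocol(
    current_of_interest="I_Kr",
    voltage_trace_mv=[(0.0, -80.0), (100.0, 20.0), (600.0, -40.0)],
)
print(step_protocol.describe())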

Importance of Virtual Experiences

There is no risk: no danger, no loss of money or resources (other than the cost of designing and

doing the activity). Minimal loss of time. Not so in the real world.

Succeed through failure: The effectiveness of trial and error should not be underestimated. It’s

often said that we learn more from our failures than we do from our successes. Perhaps it’s


because we tend to do more analysis when we fail. Or that the emotional toll it takes on us

makes the experience more memorable and drives us to avoid it in the future (as some brain

research suggests). In any event, failure in the real world has consequences that discourage or

prevent us from even trying.

Simulate any condition we want: For example, how can a pilot learn to fly an airplane in poor

weather conditions? Does he jump into a real plane and go looking for a storm? How does he

learn to fly with mechanical failure?

Control and accelerate the timing of events: Not so in the real world. For example, as a

business owner, how long would it take you to experience customer, human resource, financial,

and other issues before you gained the wisdom to anticipate and avoid such problems in the

future? Would it take months, years, decades?

Isolate and exaggerate cause and effect: In the real world, the consequences of our decisions

(and indecisions) may not be easily apparent. They may be hidden from view, or they may be

influenced by an endless number of other variables. For example, let’s say you invest in a stock,

but the value of the stock falls within days of your purchase. Does this mean that something has

gone wrong with the company? Or that you chose to invest in the wrong stock? Are all stocks

bad? Perhaps it is the economy? If we cannot recognize and evaluate cause and effect, how can

we truly learn how the world works?

Guide the learner towards making the correct conclusions: In the real world, our brain is

constantly making connections and conclusions, whether we realize it or not. And some of those

conclusions are plain wrong (or at the very least, based on insufficient data). For example, if ice-

cream sales increase during crime sprees in Central Park, does that mean eating ice-cream

encourages criminal activity? Closer investigation might reveal that weather is the driver, not

the ice-cream. But, how can we even know when we are coming to incorrect conclusions? It

takes a very well-trained mind to see the world objectively, and few people have this ability.

Virtual experience can be personalized and measured: Learner strengths and weaknesses can

be captured and used to guide the learner towards overcoming deficiencies while refining


existing skills. Performance can be directly monitored and assessed in an accurate, authentic,

and meaningful way. The inefficient and subjective trial-and-error, multiple-choice testing, and

peer-review methods used in the real world pale in comparison. And, as the saying goes, “you

can’t manage what you can’t measure.”

Virtual experience can be highly motivating: The real world often delivers significant

emotional consequences to a learner, including stress, loss of self-esteem, loss of time, money,

and more. I remember hating my first day on the job at IBM. But, eventually, the job turned out

great and shaped my life. What if I had quit on that first day? The safety of a virtual experience,

combined with well-designed gamification techniques, can create an encouraging and rewarding

environment that is free of the many factors that discourage people from pursuing their dreams.

Virtual experience can be highly scalable and widely accessible: Potentially millions of people

can participate in a virtual experience at the same time, at a comparatively minimal cost. It’s

pretty well impossible to conjure up the massive amounts of time, money, and resources needed

to make meaningful real-world experience available to the masses.

There are limitations to virtual experience as well. Not everything can be simulated in an effective way.

It may also be argued that the designer of the virtual experience may be able to knowingly, or

unknowingly, apply undue influence, incorrect data, or subliminal messaging into the learning

experience. True. But the same can be said of all other educational methods and the real world too.


MODULE 2

Simulation of dynamical systems, Simple ODE and Testing the solution against the data, Numerical solutions in SCILAB, Numerical versus Analytic methods, Compartmental Models, Complex ODE - Limited Population growth, Formulating the Differential Equation, Equilibrium Solutions, Logistic Equation, Systems of ODEs - Modeling Predators, Coupled ODEs, Numerical solutions to the fox and rabbit problem, Phase Plane Analysis, Equilibrium Populations, Explaining the oscillations, Adding Logistic Growth, Numerical methods for boundary value problems, Extended Predator-Prey Model, Functional Response, Numerical Response, Implementing the Extended Model, Neural networks, Cellular automata


Simulation of dynamical systems

In practice we wish to study the behavior of complex systems subject to general input disturbances. It is then not feasible to solve the (differential) equations which describe the system analytically. Instead, it is necessary to use numerical simulations.

In numerical differential equation solvers the systems are usually represented in terms of coupled first-order differential equations. Then the system is characterized using a number of state variables,

and the time derivative of each variable is expressed as a function of the other variables and the input(s):

Example:Second-order system

Define the state variables

Then

which has the desired form.

There are many good ordinary differential equation (ODE) solvers available, for example in Matlab. The example below shows how a second-order system with a step change in the input can be simulated.
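A minimal sketch of such a simulation (the damping ratio, natural frequency, and time horizon below are assumed values for illustration, not taken from the original example):

% Step response of a generic second-order system
%   y'' + 2*zeta*wn*y' + wn^2*y = wn^2*u
% rewritten as two coupled first-order ODEs with states x(1) = y, x(2) = y'.
zeta = 0.3;  wn = 1;            % assumed damping ratio and natural frequency
u = 1;                          % unit step applied to the input at t = 0
f = @(t,x) [x(2); -2*zeta*wn*x(2) - wn^2*x(1) + wn^2*u];
[t,x] = ode45(f, [0 20], [0; 0]);   % start from rest
plot(t, x(:,1)), xlabel('t'), ylabel('y(t)')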


Example:Simulation of second-order system controlled by a PID controller

The same approach can be applied to simulate more complex system connections. For illustration, consider the second-order system.

where u(t) is determined by the PID control law

(Observe that the derivative action is applied only to y.) Then we can define the state variables:

then

where
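A minimal sketch of this kind of closed-loop simulation (the plant coefficients and controller gains below are assumed purely for illustration; as noted above, the derivative action acts on y only):

% Plant  y'' + a1*y' + a0*y = b*u  under PID control
%   u = Kp*(r - y) + Ki*integral(r - y)dt - Kd*y'
% States: x(1) = y, x(2) = y', x(3) = integral of (r - y).
a0 = 1; a1 = 0.5; b = 1;        % assumed plant coefficients
Kp = 2; Ki = 1; Kd = 0.5;       % assumed controller gains
r = 1;                          % step change in setpoint
f = @(t,x) [x(2); -a0*x(1) - a1*x(2) + b*(Kp*(r - x(1)) + Ki*x(3) - Kd*x(2)); r - x(1)];
[t,x] = ode45(f, [0 30], [0; 0; 0]);
plot(t, x(:,1)), xlabel('t'), ylabel('y(t)')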

Simulink

As seen from the above simple examples, when the number of state variables required to describe a system increases, it may become rather tedious to construct programs which compute the time derivatives of all state variables. Therefore, various tools which facilitate the analysis and simulation of dynamical systems have been developed. One widely used tool of this kind is Simulink, for use with Matlab, in which the block diagram and the individual blocks can be constructed using a graphical user interface.


Below is shown an example of a Simulink program consisting of a first-order system

and a time delay controlled by a PID controller, where there is a step change in setpoint. The block Scope plots the associated variable.

The parameters of the various blocks can be defined by opening the individual blocks.

The blocks may have internal structure; for example, the structure of the PID block is shown below.


Numerical solutions in SCILAB

This section is designed to familiarise you with solving ordinary differential equations using the Scilab procedure ode.


Exercise 1


Chemical Reaction System


Numerical versus Analytic methods

Suppose you have a mathematical model and you want to understand its behavior. That is, you want to find a solution to the set of equations. The best case is when you can use calculus, trigonometry, and other math techniques to write down the solution. Then you know exactly how the model will behave under any circumstances. This is called the analytic solution, because you used analysis to figure it out. It is also referred to as a closed form solution. But this tends to work only for simple models. For more complex models, the math becomes much too complicated. Then you turn to numerical methods of solving the equations, such as the Runge-Kutta method. For a differential equation that describes behavior over time, the numerical method starts with the initial values of the variables, and then uses the equations to figure out the changes in these variables over a very brief time period. It is only an approximation, but it can be a very good approximation under certain circumstances.

A computer must be used to perform the thousands of repetitive calculations involved. The result is a long list of numbers, not an equation. This long list of numbers can be used to drive an animated simulation, as we do with the models presented here. There is also a middle ground between these two methods. There are many important non-linear equations for which it is not possible to find an analytic solution. However, there are techniques where you can find approximate analytic solutions that are close to the true solution, at least within a certain range. One such method is called the perturbation method. The advantage over a numerical solution is that you wind up with an equation (instead of just a long list of numbers) which you can gain some insight from. Analytical solutions are the only methods taught to solve math problems until students take upper-level calculus classes. A simple example of an analytical solution is the equation for a line: y=mx+b. You can see that in an analytical solution, all variables of the system are accounted for at the same time which leads to large and complex equations.
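As a small illustration of the difference, the sketch below compares the analytic (closed form) solution of the simple model dy/dt = k y with a numerical solution obtained by Euler's method (the rate k and step size dt are assumed values):

% Analytic versus numerical solution of dy/dt = k*y with y(0) = 1.
k = -0.5;  dt = 0.1;  t = 0:dt:10;
y_exact = exp(k*t);                          % analytic solution y(t) = e^(kt)
y_num = zeros(size(t));  y_num(1) = 1;
for n = 1:length(t)-1
    y_num(n+1) = y_num(n) + dt*k*y_num(n);   % Euler step: rise = rate x run
end
plot(t, y_exact, t, y_num, '--'), legend('analytic', 'Euler approximation')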

Analytical: Solve a partial differential equation with initial and boundary conditions

• Need solution for each particular problem


• Gives dependence on variables (S, T, etc.)
• Only available for relatively simple problems (homogeneous, simple geometry)
• Examples: Theis, Thiem, Analytical Element Method (AEM)

Numerical: Replace partial derivative with algebraic equation

• one solution can handle multiple problems
• heterogeneous as well as complex geometry
• some loss in accuracy if large region
• does not give a continuous solution
• Examples: Finite Difference Method (FDM), Finite Element Method (FEM)

Note that the numerical solution is an approximation. In most cases it will be close enough, which is fine for most engineering problems. Typically mathematicians have more time and funding to find an analytical solution, but the demands of business usually necessitate an approximate solution for engineers working in industry. Analytical solutions are typically only obtainable for “simple” models. Numerical methods, in contrast, provide ways to manipulate complex math problems so that they may be solved by simple processes. These methods allow imperfect and complex models to be approximated, usually with great accuracy. Numerical methods can account for more variables and dimensions than would be solvable when using analytical methods. They are implemented by many different industries, from meteorologists creating weather models to automobile engineers generating crash test simulations. They are even employed to optimize models for financial firms and insurance companies. Imagine a car company that is certifying the crash test rating on a new car model. There are several stringent requirements that new vehicles must meet in regard to passenger and pedestrian safety. Building complete prototype cars to crash test is expensive, so car companies run mathematical simulations of crash tests as part of an iterative design process. An analytical solution would have to account for every possible interaction between all components of the vehicle and with the objects of collision. In the case of a car crash, this would be too difficult to model analytically. To make better use of mathematical resources, a car is broken down into several systems, each of which is individually mathematically modeled. Simulations of different scenarios are then computed and the results of these systems are combined over a specified time interval.

When to use Numerical Methods

It is important to note the limitations of numerical methods so they can be used effectively.

Numerical methods can only deliver approximate solutions to problems over a defined interval such as time or distance. The error present in the solution depends on how the problem is solved. Analytical methods provide solutions that are valid at all points. Numerical methods are employed with large systems involving many interactions. Numerical methods are also used when evaluating empirical information such as experimental data. Such data, no matter how exactly and carefully the experiment was conducted, will involve some degree of error. It would be a waste of time to employ exact analytical techniques in such a situation, because the answer can never be more “correct” than the input data.

Numerical

• The solution is only valid for a defined interval (time, distance)
• A close approximation is good enough


• The solution can be applied to many systems
• Large systems with many interactions
• Evaluating experimental data
• Short time / little resources available to construct a solution
• Offline solutions which are not time-sensitive

Analytical

• The solution must be valid at any interval (time, distance)
• An exact solution is required
• The solution can be only applied to one system
• Small closed systems
• Large budget, time, and manpower available to construct a solution
• No computational aids available (computers with math modeling software)
• Real-time computations on limited computer hardware

When deciding to use analytical or numerical methods consideration must be given to many criteria. In practice some important factors are the time available to solve, the monetary resources, the computational resources available, and the allowable error. Analytical methods will require more involved mathematical computations and can result in extended computational times, which will both cause increased cost. Numerical methods use simpler mathematical computations, but require frequently repeated computations. Numerical methods may at times be impractical if there are no computational aids such as computer programs or tables of previously calculated values. One needs to consider the interval on which the solution needs to be valid. A larger interval requires more computations for numerical methods and thus more computational time. The amount of error in a solution when using numerical methods is dependent upon the method or series of methods used and the order in which they are used. Not all methods are valid for all problems. However, some methods allow for quicker computations and thus provide an advantage when they are able to be used. When choosing a method or series of methods to use it is important to determine how accurate your solution needs to be and to track the error in your solution.

Compartment models

Modeling of dynamical systems plays a very important role in applied science, and compartment models are among the most important tools used for analyzing dynamical systems. Compartment models are often used to describe transport of material in biological systems. A compartment model contains a number of compartments, each containing well mixed material. Compartments exchange material with each other following certain rules. Figure 1.1 shows a sketch of such a system. In this figure, compartments are represented by boxes and the connections between the compartments are represented by arrows. Every compartment (that is, every box) has a number of connections leading to the box (inflows) and a number of arrows leading from the box (outflows). Material can either flow from one compartment to another, it can be added from the outside through a source, or it can be removed through a drain or a sink. Think of a bathtub, where water (the well-mixed material) is added through the faucet and leaves through the drain. In that example the material is water, but the idea can be used in a more abstract way. Generally, the material represents the amount of something that we wish to account for. To account for the material the model must fulfill some conservation law. In the bathtub example, we could develop a model based on conservation of mass. Most compartment models (such as the one shown in Figure 1.1) have more than one compartment, and equations for such a model are obtained by describing a conservation law for each compartment. Conservation laws state that the difference between what


flows in and what flows out amounts to how much will be stored in the compartment.

Figure 1.1: Sketch of a compartment model. Material can be stored in the boxes and transported between boxes following the arrows.

A compartment model could also represent an ecological system where the material could be energy, the compartments could represent different species of animals and plants, and the flow between compartments could account for uptake and loss of food (or energy). In this case we would base the equations on laws describing conservation of energy. Compartment models also arise in physiology, where the material could be oxygen that is transported with the blood between different organs (compartments) in the body. It should be emphasized that one cannot think of compartments and the flows in and out of compartments as individual components where each part can be described independently of the others. Both the in- and the outflow from any compartment may depend on the volume inside the compartment. Similarly, the inflow into a compartment may depend on the outflow from another compartment. In other words, it is important to think of the system as a whole, where the parameter representing the material in the compartment (the state variable) can depend on what flows in and what flows out. In addition, since what flows into one compartment typically flows out of another compartment, the state variables depend on each other and on the state of the system as a whole. The important point to remember is that it is the person modeling the system who chooses how the model parameters and variables, in a consistent way, depend on each other.

Example : A Bathtub

Figure 1.2: A bathtub with an external source (the faucet) and sink (the drain).


Figure 1.2 shows a bathtub that at the start of the investigation (at t = 0) contains V0 = 100 l (liter) of water. When the faucet is opened water flows into the bathtub with a velocity of 5 l/min. If the drain is opened, an additional 4 l/min will flow out of the bathtub. Now, let us derive an equation that can be used to describe the volume V(t) l of water in the bathtub as a function of time. The material that is transported in and out of the bathtub is water, and it is our job to determine how the volume of water changes with time, i.e. we want to determine the function V(t).

The function V(t) can be determined using simple calculations; however, we will use this example to establish a more general method that also works for more complicated examples, such as example 2 discussed in the next section. It is our aim to develop an equation that describes how the volume changes during a small period of time (from t to t + ∆t). The change in volume during the time increment ∆t is given by V(t + ∆t) − V(t) (this can be both positive and negative). Since water is an incompressible fluid, conservation of mass states that the change in volume equals the difference between what flows in and what flows out of the bathtub. Assume that both the faucet and the drain are open. Then the flow into the bathtub is 5 l/min and the flow out of the bathtub is 4 l/min. In other words,

V(t + ∆t) − V(t) = (5 − 4) ∆t.

If both sides of the equation are divided by ∆t we get the following difference equation:

[V(t + ∆t) − V(t)] / ∆t = 5 − 4 = 1.

If we let ∆t → 0 we recover the derivative of V(t):

dV/dt = 1.

From this equation it becomes clear that our method does not provide a direct way of determining the volume V(t); instead we get its derivative dV/dt. However, using integration we can solve this equation for V(t). For any constant C, the solution is given by

V(t) = t + C.    (1.1)

All functions of the above type are candidates for the complete solution. To find the exact solution we need one more piece of information. For example, if we know that V(0) = 100 l we can find C as

C = V(0) = 100.

Inserting this value for C into the general solution (1.1) we find the specific form of V(t). For this example, the specific solution is given by

V(t) = t + 100.


If both the inlet and the outlet are open, and we start with 100 l of water, then every minute the volume of water in the bathtub increases by 1 l.

In this example, we could have reached the same conclusion without introducing this notation. We have used a compartment with the volume of water representing the state variable V(t). Water flows into the bathtub at a rate of 5 l/min (this is represented by an arrow pointing into the compartment, Vinflow) and water flows out of the bathtub at a rate of 4 l/min (this is represented by an arrow pointing away from the compartment, Voutflow). Note that the arrows indicate which way the water is flowing.
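A quick numerical check of this result is sketched below (a minimal example using Matlab's ode45; the one-hour time horizon is an arbitrary choice):

% Bathtub balance dV/dt = 5 - 4 = 1 l/min with V(0) = 100 l,
% compared against the analytic solution V(t) = t + 100.
f = @(t,V) 5 - 4;                 % inflow minus outflow, in l/min
[t,V] = ode45(f, [0 60], 100);    % simulate one hour
plot(t, V, t, t + 100, '--')
xlabel('t (min)'), ylabel('V (l)'), legend('ode45', 'analytic')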

Limitations of Compartment Models

The basic idea with compartment models is to describe a system as a number of compartments and to derive equations of mass balance for each of these compartments. In itself, writing equations for mass balance is a very sound technique, which ensures that the model has a solid foundation and that it is possible to judge the validity of the model.

Is the system closed?

The equation for conservation of mass is only correct if all the material that is added to or removed from the system is described in the model, that is, if the system is closed in some sense. In other words, the compartments may not include unaccounted-for sources or sinks. Normally, a closed system is obtained by assuming that the total amount of material is constant. Our definition of closed is less strict, since we also allow the study of systems where material is continuously added or removed. However, in such systems it is crucial to describe how material is added and/or removed.

Is homogeneity a reasonable assumption?

When we use compartment models we assume that all material in a compartment is homogeneous. This assumption cannot directly be seen from the equations; the equations only include information about the total amount of material in the compartment, i.e. we cannot include a more detailed description of the system. An example could be a compartment model of the blood in the circulatory system. We can describe the arteries, the veins, and the heart by compartments. These compartments would contain blood, but, for example, it would not be possible to distinguish between plasma, red blood cells, and white blood cells.

It is not always possible to assume homogeneity. For example, let us try to use a one-compartment model to describe the concentration of phosphorus in a lake. Following this strategy, we assume that the lake has an inflow, an outflow, and a compartment representing the concentration of phosphorus. This implies that we assume that all material in the lake is uniformly mixed and that the amount of phosphorus leaving the lake is constant. This is not correct. For example, in deep lakes (deeper than 5-10 meters) it is normal to see a partition into several layers with different concentrations of phosphorus. This has to do with the fact that the temperature of the water changes with depth. Often, most of the water flowing into the lake stays at the surface and flows through the lake without exchange with water in the


deeper layers. Hence, we cannot calculate the amount of phosphorus using a one-compartment model. One way to solve this problem is to let the lake consist of several layers, each having a different concentration, with a different amount of phosphorus transported between the layers. The disadvantage of splitting the lake into several compartments is that more information is needed to describe the system. For example, if the water in the lake is split into two compartments we need to know the concentration of phosphorus in the water flowing into and out of the deep layer. This could be very difficult to measure. Even if it is not possible to determine all parameters analytically, it may be advantageous to split the model into smaller subsystems. It is possible, using optimization, to estimate some unknown parameters, but care must be taken that enough data exist to obtain realistic parameter values.

Is the balance equation accurate enough?

When the equation of mass balance is derived it is important that all essential transports are known. In real biological systems this is typically not the case. In a realistic system, some of the mass-balances are known but others are not. Before a model can be completed all parameters must be estimated either based on experiments, or calculated based on values found in the literature.

Is the balance of mass relevant?

This last question may seem rather surprising. However, not all systems can be described in terms of mass balance. The question is easier to understand by discussing a counterexample. In 1950 a compartment model was used to single out rabbits as the biggest threat to the survival of the grasslands in Australia. The model was based on observations that rabbits could eat all the grass on an entire field. These observations overestimated the number of rabbits and the amount of damage they could do. The reason for this was that all grassland was lumped into one compartment, independent of location. As a result the Australian state started to control rabbits by exposing them to the deadly myxomatosis virus, which killed 90% of all rabbits. Today, rabbits are no threat to Australia, and the original study has been criticized for being too crude.

Sensitivity analysis

When a compartment model is derived, not all model parameters and initial conditions are known precisely. In addition, the solution to the model will change depending on the model parameters and initial conditions. Therefore, it is important to investigate the sensitivity of the solution to the parameters and initial conditions. This can be done by varying each parameter and recording the change in the results.


Complex ODE - Limited Population growth

Using a limited-growth population model (also known as a logistic growth model), investigate several ways to visualize solutions to autonomous first-order differential equations (those that involve only the first derivative and that do not depend explicitly on time). Plot slope-field and solution graphs, and learn about a pictorial tool called a phase line. You may have studied recently the "natural" growth of biological populations, starting with assumptions of growth rate proportional to the population and no restrictions on growth. As you know, these assumptions lead to a model formula that is exponential. In this module we model a population for which there is a limit on the population size. Rather than start from data on a real population, we start from a theoretical assumption about the growth rate, and we study what this assumption predicts about the population. Our Theoretical Assumption: Let M be the maximum population that the environment will support. We assume (as biologists often do) that the rate of change of the population is proportional to the product of

• the population, and
• M minus the population.

• In your worksheet, write this Theoretical Assumption as a formula, using c to denote the proportionality constant and P to denote the population.

Here are two reasons for assuming such a relationship:

◦ When the population P is small relative to M, the difference M - P is essentially the same as M (a constant), and then dP/dt is essentially proportional to just the population P. This is the "natural growth" situation you have studied already. Hence, for small populations, our new model should result in population predictions that will look a lot like "natural population growth." This is what a biologist would expect to happen with most populations.

◦ At the other extreme, when the population gets close to M, the factor M - P is close to zero, and this effectively shuts down further growth. Thus, the factor M - P captures in a natural way the effect of a limited environment.

In the next Part we consider a way to generate numerical and graphical solutions to any problem of the form

Find a function P = P(t) such that

◦ dP/dt = some formula in P, and
◦ P has some known value at the starting time.

This problem has two pieces of given information:

• an equation involving dP/dt, called a differential equation, and
• a starting value P0 for P at the starting time t = 0; P0 is called an initial value.

Such a problem is called a differential-equation-with-initial-value-problem, which is usually abbreviated to initial value problem.


Stepping Out: Rise = Rate x Run

Here is our procedure for generating a solution function P = P(t) for the initial value problem

• dP/dt = c P (M - P), and
• P = P0 at time t = 0.

Starting from the known value P0, we will generate a set of values

P0 , P1 , P2 , P3 , ..., Pn

at corresponding times

0 , t1 , t2 , t3 , ..., tn ,

where the times are equally spaced at (small) steps of Delta-t. In brief, we will use the fact that we know both P and dP/dt at time 0 to calculate a rise from P0 to P1. Once we have a value for P1, we can find a slope at that point as well, so we can repeat the procedure to find P2. Continuing in the same way, we can find as many values of P as we like. As we will see, the success of this procedure depends on how small our step sizes are, and, in any case, this is an approximation procedure, because we are using instantaneous rates of change as if they stay unchanged over each interval of length Delta-t. We illustrate the procedure graphically. The following diagram shows a graph of the starting situation. Plotting population as the dependent variable and time as the independent variable, the initial population P(0) is represented by a vertical line segment of length P(0) at the starting time t = 0.

Now we step forward in time to t1 = Delta-t. We use the slope at time 0, which we call slope0, to draw a line from (0, P0) to (t1, P1):


The slope of a line segment is the rise divided by the run, so the rise is the slope times the run. In our situation, the run is Delta-t, so rise = slope0 x Delta-t.

The line segment representing P1 is the sum of two pieces, one of length P0, and the other of length rise = slope0 x Delta-t. Hence

P1 = P0 + slope0 x Delta-t.

Now enter the following formulas in your worksheet.

• The formula to compute P1 from P0.

• The same reasoning applies to computing P2 from P1. Enter the formula for P2.

• The same reasoning applies to computing Pk from Pk-1. (See the next figure.) Enter the formula for Pk.

Great! It looks like we have a wonderful recursion relation for computing Pk from Pk-1, except for one little problem: What is slope0? More generally, what is slopek-1 for k = 1, 2, ... ? Answer: This is where our Theoretical Assumption from Part 1 enters the picture! The assumption gives us a formula for


the rate of change of population (the slope!) in terms of the current population.

• Use this observation to enter the formula for slope0 in terms of P0.

• The same reasoning applies to computing each slopek from the corresponding Pk. Enter the resulting equation.

We thus have an interesting interweaving -- we have to compute our quantities in the following order:

P0 , slope0, P1, slope1, P2, slope2, P3, slope3, ..., Pn, slopen.
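A minimal sketch of this stepping procedure (the values of c, M, P0 and the step Delta-t below are assumed for illustration):

% Stepping out the limited-growth model dP/dt = c*P*(M - P)
% using P(k) = P(k-1) + slope(k-1)*dt.
c = 0.001; M = 1000; P0 = 10;        % assumed constant, limit, and initial value
dt = 0.1;  t = 0:dt:20;
P = zeros(size(t));  P(1) = P0;
for k = 2:length(t)
    slope = c*P(k-1)*(M - P(k-1));   % rate of change at the previous point
    P(k) = P(k-1) + slope*dt;        % rise = slope x run
end
plot(t, P), xlabel('t'), ylabel('P(t)')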

Formulating the Differential Equation

In most cases, being able to come up with a differential equation that accurately describes the problem you need to solve is just as important as being able to solve the equation. The process of writing an equation describing the situation is called mathematical modeling. In order to successfully model a problem with, and solve, a differential equation, it might be helpful to ask yourself the questions listed below.

• Identify the real problem. Identify the problem variables. What do we need to find out? What is the problem asking for?

• Construct an appropriate relation between the variables - a differential equation. What is the dependent variable, what is the independent variable, and what is the rate of change? Figuring out how these quantities are related will result in a differential equation that models the problem.

• Obtain the mathematical solution. Recognize the type of the equation. Decide if you can solve it analytically ('by hand') or if you need to use technology. In both cases, decide on the method that you will use (e.g. is the equation separable, linear or some other type; could Euler's method, ODE45 or another numerical solution be used).

• Interpret the mathematical solution. After solving the equation, check if the mathematical answer agrees with the context of the original problem. Check the validity: Does your answer make sense? Do the predictions agree with real data? Do the values have the correct sign? Correct units? Correct size? Check effectiveness: Could a simpler model be used? Have I found the right balance between greater precision (i.e. greater complexity) and simplicity?

Example 1. A bacteria culture starts with 500 bacteria and grows at a rate proportional to its size. After 3 hours there are 8000 bacteria. Find the number of bacteria after 4 hours.

Discussion. Identifying variables: let y stand for the size of the bacteria culture and t for the time passed. The first part of the problem, “A bacteria culture starts with 500 bacteria...”, tells us that y(0) = 500. The second part, “... and grows at a rate proportional to its size”, is the key to the mathematical model. Recall that the rate is the derivative and that “is proportional to” corresponds to “is equal to a constant multiple of”. So the equation relating the variables is dy/dt = ky. The solution of this differential equation is y = y0 e^(kt). Since y0 = 500, it remains to determine the proportionality constant k. From the condition “After 3 hours there are 8000 bacteria” we obtain 8000 = 500 e^(3k), which gives k = (1/3) ln 16 ≈ 0.924. Thus, the number of bacteria after t hours can be described by


y = 500 e^(0.924t).

Solution. Using the function we have obtained, we find the number of bacteria after 4 hours to be y(4) = 500 e^(0.924·4) ≈ 20159 bacteria.

Example 2.

A population of field mice inhabits a certain rural area. In the absence of predators, the mice population increases so that each month, the population increases by 50%. However, several owls live in the same area and they kill 15 mice per day. Find an equation describing the population size and use it to predict the long term behavior of the population.

Solution: Identifying variables: Let y stand for the size of the mice population and t be the time in months. Note that without the predators, the equation describing the population of mice would be dy/dt = 0.5y. Incorporating the information about the owls, we must subtract the monthly loss in the number of mice. As 15 are killed daily, 15 · 30 = 450 are killed monthly, and so the equation is dy/dt = 0.5y − 450.

Solving the equation we obtain

y(t) = 900 + (y(0) − 900) e^(0.5t).

Graphing the equation for different initial conditions we can see that the number of mice will

• drop to 0 if the initial number is smaller than 900;
• stay constant at 900 if the initial number is equal to 900; and
• keep increasing if the initial number is larger than 900.

Thus y = 900 is an unstable equilibrium solution in this case.
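A minimal sketch that reproduces this behavior numerically (the initial values and the five-month horizon are assumed for illustration):

% Mice population dy/dt = 0.5*y - 450 for several initial values,
% showing that y = 900 is an unstable equilibrium.
f = @(t,y) 0.5*y - 450;
hold on
for y0 = [850 900 950]
    [t,y] = ode45(f, [0 5], y0);
    plot(t, y)
end
hold off, xlabel('t (months)'), ylabel('mice')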


Equilibrium Solutions

Clearly a population cannot be allowed to grow forever at the same rate. The growth rate of a population needs to depend on the population itself. Once a population reaches a certain point the growth rate will start to reduce, often drastically. A much more realistic model of population growth is given by the logistic growth equation:

dP/dt = r P (1 − P/K).

In the logistic growth equation r is the intrinsic growth rate and is the same r as in the last section. In other words, it is the growth rate that will occur in the absence of any limiting factors. K is called either the saturation level or the carrying capacity.

Now, we claimed that this was a more realistic model for a population. Let’s see if that in fact is correct. To allow us to sketch a direction field let’s pick a couple of numbers for r and K. We’ll use r = 1/2 and K = 10. For these values the logistic equation is

dP/dt = (1/2) P (1 − P/10).

If you need a refresher on sketching direction fields go back and take a look at that section. First notice that the derivative will be zero at P = 0 and P = 10. Also notice that these are in fact solutions to the differential equation. These two values are called equilibrium solutions since they are constant solutions to the differential equation. We’ll leave the rest of the details on sketching the direction field to you. Here is the direction field, with a couple of solutions sketched in.

Note that we included a small portion of negative P’s in here even though they really don’t make any sense for a population problem. The reason for this will be apparent down the road. Also, notice that a population of, say, 8 doesn’t make all that much sense, so let’s assume that the population is in thousands or millions, so that 8 actually represents 8,000 or 8,000,000 individuals in a population.
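A minimal sketch that produces this kind of picture (the grid spacing and the chosen starting populations are assumptions made for illustration):

% Direction field for dP/dt = 0.5*P*(1 - P/10) with a few solution curves.
[T,P] = meshgrid(0:0.5:10, -1:0.5:14);
dP = 0.5*P.*(1 - P/10);
quiver(T, P, ones(size(dP)), dP)        % slope segments: run = 1, rise = dP/dt
hold on
f = @(t,p) 0.5*p*(1 - p/10);
for p0 = [2 8 12]
    [t,p] = ode45(f, [0 10], p0);
    plot(t, p, 'LineWidth', 1.5)
end
hold off, xlabel('t'), ylabel('P')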


Notice that if we start with a population of zero, there is no growth and the population stays at zero; the logistic equation correctly captures that. Next, notice that if we start with a population in the range 0 < P(0) < 10 then the population will grow, but start to level off once we get close to a population of 10. If we start with a population of 10, the population will stay at 10. Finally, if we start with a population that is greater than 10, then the population will actually die off until we start nearing a population of 10, at which point the population decline will start to slow down.

Now, from a realistic standpoint this should make some sense. Populations can’t just grow forever without bound. Eventually the population will reach such a size that the resources of an area are no longer able to sustain the population and the population growth will start to slow as it comes closer to this threshold. Also, if you start off with a population greater than what an area can sustain there will actually be a die off until we get near to this threshold.

In this case that threshold appears to be 10, which is also the value of K for our problem. That should explain the name that we gave K initially. The carrying capacity or saturation level of an area is the maximum sustainable population for that area.

So, the logistic equation, while still quite simplistic, does a much better job of modeling what will happen to a population.

Now, let’s move on to the point of this section. The logistic equation is an example of an autonomous differential equation. Autonomous differential equations are differential equations of the form

dy/dt = f(y).

The only place that the independent variable, t in this case, appears is in the derivative.

Notice that if f(y0) = 0 for some value y = y0, then the constant function y(t) = y0 will also be a solution to the differential equation. These values are called equilibrium solutions or equilibrium points.

Return now to the logistic equation dP/dt = (1/2) P (1 − P/10).

As we pointed out, there are two equilibrium solutions to this equation, P = 0 and P = 10. If we ignore the fact that we’re dealing with a population, these points break up the P number line into three distinct regions.


We will say that a solution starts “near” an equilibrium solution if it starts in a region that is on either side of that equilibrium solution. So solutions that start “near” the equilibrium solution P = 10 will start in either

0 < P(0) < 10 or P(0) > 10,

and solutions that start “near” P = 0 will start in either

P(0) < 0 or 0 < P(0) < 10.

For regions that lie between two equilibrium solutions we can think of any solutions starting in that region as starting “near” either of the two equilibrium solutions as we need to.

Now, solutions that start “near” P = 0 all move away from the solution as t increases. Note that moving away does not necessarily mean that they grow without bound as they move away. It only means that they move away. Solutions that start out greater than P = 0 move away, but do stay bounded as t grows. In fact, they move in towards P = 10.

Equilibrium solutions in which solutions that start “near” them move away from the equilibrium solution are called unstable equilibrium points or unstable equilibrium solutions. So, for our logistics equation, P = 0 is an unstable equilibrium solution.

Next, solutions that start “near” P = 10 all move in toward P = 10 as t increases.Equilibrium solutions in which solutions that start “near” them move toward the equilibrium solution are called asymptotically stable equilibrium points or asymptotically stable equilibrium solutions. So, P = 10 is an asymptotically stable equilibrium solution.

Example 2 Find and classify all the equilibrium solutions to the following differential equation.

Solution

First, find the equilibrium solutions. This is generally easy enough to do.

So, it looks like we’ve got two equilibrium solutions. Both y = -2 and y = 3 are equilibrium solutions. Below is the sketch of some integral curves for this differential equation. A sketch of the integral curves or direction fields can simplify the process of classifying the equilibrium solutions.


From this sketch it appears that solutions that start “near” y = -2 all move towards it as t increases and so y = -2 is an asymptotically stable equilibrium solution and solutions that start “near” y = 3 all move away from it as t increases and so y = 3 is an unstable equilibrium solution.
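As an illustration, assume for the right-hand side the polynomial dy/dt = y^2 - y - 6 = (y + 2)(y - 3), which has exactly these equilibrium solutions (this particular choice is an assumption made here for the sketch, not necessarily the equation of the example). Integral curves computed from a few starting values then show y = -2 attracting and y = 3 repelling:

% Assumed example: dy/dt = y^2 - y - 6, equilibria at y = -2 (stable) and y = 3 (unstable).
f = @(t,y) y.^2 - y - 6;
hold on
for y0 = [-4 -1 2 3.2]
    [t,y] = ode45(f, [0 0.5], y0);   % short horizon; solutions above y = 3 grow quickly
    plot(t, y)
end
hold off, xlabel('t'), ylabel('y')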

The Logistic Equation

The logistic equation (sometimes called the Verhulst model or logistic growth curve) is a model of population growth first published by Pierre-François Verhulst (1845, 1847). The model is continuous in time, but a modification of the continuous equation to a discrete quadratic recurrence equation, known as the logistic map, is also widely studied.

The continuous version of the logistic model is described by the differential equation

dN/dt = r N (1 − N/K),

where r is the Malthusian parameter (rate of maximum population growth) and K is the carrying capacity (i.e. the maximum sustainable population). Dividing both sides by K and defining X = N/K then gives the differential equation

dX/dt = r X (1 − X).

The discrete version of the logistic model is written

Xn+1 = r Xn (1 − Xn).

The simple logistic equation is a formula for approximating the evolution of an animal population over time. Many animal species are fertile only for a brief period during the year and the young are born in a particular season so that by the time they are ready to eat solid food it will be plentiful. For this reason, the system might be better described by a discrete difference equation than a continuous differential equation. Since not every existing animal will reproduce (a portion of them are male after all), not every female will be fertile, not every conception will be successful, and not every pregnancy will be successfully carried to term, the population increase will be some fraction of the present population. Therefore, if "An" is the number of animals this year and "An+1" is the number next year, then

An+1= rAn

where "r" is the growth rate or fecundity, will approximate the evolution of the population. This model

produces exponential growth without limit. Since every population is bound by the physical limitations of its

surrounding, some allowance must be made to restrict this growth. If there is a carrying-capacity of the

environment then the population may not exceed that capacity. If it does, the population would become extinct.

This can be modeled by multiplying the population by a number that approaches zero as the population


approaches its limit. If we normalize the "An" to this capacity then the multiplier (1−An) will suffice and the

resulting logistic equation becomes

An+1=rAn(1−An)

or in functional form

ƒ(x)=rx(1−x).

The logistic equation is parabolic like the quadratic mapping with ƒ(0)=ƒ(1)=0 and a maximum of ¼r at ½.

Varying the parameter changes the height of the parabola but leaves the width unchanged. (This is different

from the quadratic mapping which kept its overall shape and shifted up or down.) The behavior of the system

is determined by following the orbit of the initial seed value. All initial conditions eventually settle into one of

three different types of behavior.

Fixed: The population approaches a stable value. It can do so by approaching asymptotically from one

side in a manner something like an overdamped harmonic oscillator or asymptotically from both sides

like an underdamped oscillator. Starting on a seed that is a fixed point is something like starting an

SHO at equilibrium with a velocity of zero. The logistic equation differs from the SHO in the existence

of eventually fixed points. It's impossible for an SHO to arrive at its equilibrium position in a finite

amount of time (although it will get arbitrarily close to it).

Periodic: The population alternates between two or more fixed values. Likewise, it can do so by

approaching asymptotically in one direction or from opposite sides in an alternating manner.

The nature of periodicity is richer in the logistic equation than the SHO. For one thing, periodic

orbits can be either stable or unstable. An SHO would never settle in to a periodic state unless

driven there. In the case of the damped oscillator, the system was leaving the periodic state for

the comfort of equilibrium. Second, a periodic state with multiple maxima and/or minima can

arise only from systems of coupled SHOs (connected or compound pendulums, for example, or

vibrations in continuous media). Lastly, the periodicity is discrete; that is, there are no

intermediate values.

Chaotic: The population will eventually visit every neighborhood in a subinterval of (0, 1).


Nested among the points it does visit, there is a countably infinite set of fixed points and periodic points of every period. The points are equivalent to a Cantor middle-thirds set and are wildly unstable. It is highly unlikely that any real population would ever begin with exactly one of these values. In addition, chaotic orbits exhibit sensitive dependence on initial conditions such that any two nearby points will eventually diverge in their orbits to any arbitrary separation one chooses.

The behavior of the logistic equation is more complex than that of the simple harmonic oscillator. The type of orbit depends on the growth rate parameter, but in a manner that does not lend itself to "less than", "greater than", "equal to" statements. The best way to visualize the behavior of the orbits as a function of the growth rate is with a bifurcation diagram. Pick a convenient seed value, generate a large number of iterations, discard the first few and plot the rest as a function of the growth factor. For parameter values where the orbit is fixed, the bifurcation diagram will reduce to a single line; for periodic values, a series of lines; and for chaotic values, a gray wash of dots.
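A minimal sketch that generates such a bifurcation diagram (the range of growth rates, the seed value, and the number of discarded iterations are arbitrary choices):

% Bifurcation diagram for the logistic map x(n+1) = r*x(n)*(1 - x(n)).
figure, hold on
for r = 2.5:0.005:4
    x = 0.5;                              % convenient seed value
    for n = 1:300, x = r*x*(1 - x); end   % discard the transient
    xs = zeros(1,100);
    for n = 1:100, x = r*x*(1 - x); xs(n) = x; end
    plot(r*ones(1,100), xs, 'k.', 'MarkerSize', 1)
end
hold off, xlabel('growth rate r'), ylabel('x')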


1. Systems of ODEs

1.1 Systems of First-Order Linear Differential Equations

In many (perhaps most) applications of differential equations, we have not one but several quantities which change over time and interact with one another.

• Example: The populations of the various species in an ecosystem.
• Example: The concentrations of molecules involved in a chemical reaction.
• Example: The production of goods, availability of labor, prices of supplies, and many other quantities over time in economic processes.

We would like to develop methods for solving systems of differential equations. Alas, as is typical with differential equations, we cannot solve arbitrary systems in full generality: in fact it is very difficult even to solve individual nonlinear differential equations, let alone a system of nonlinear equations. The most we will be able to do in general is to solve systems of linear equations with constant coefficients, and give an existence-uniqueness theorem for general systems of linear equations.

1.2 General Theory of (First-Order) Linear Systems

• We can reduce any system of linear differential equations to a system of first-order linear differential equations (in more variables): if we define new variables equal to the higher-order derivatives of our old variables, then we can rewrite the old system as a system of first-order equations (in more variables).

Example:

Consider the single 3rd-order equation

If we define new variables

then the original equation tells us that

so

Thus, this single 3rd-order equation is equivalent to the first-order system
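As an illustrative sketch (the coefficients and forcing term below are assumed, not those of the example above), take a generic third-order equation y''' + a y'' + b y' + c y = g(x). Defining y1 = y, y2 = y', y3 = y'' gives y1' = y2, y2' = y3, and y3' = g(x) - a y3 - b y2 - c y1, a first-order system that can be handed directly to a numerical solver:

% Assumed example: y''' + a*y'' + b*y' + c*y = sin(x), rewritten as a first-order system.
a = 1; b = 2; c = 1;                          % assumed coefficients
g = @(x) sin(x);                              % assumed forcing term
f = @(x,Y) [Y(2); Y(3); g(x) - a*Y(3) - b*Y(2) - c*Y(1)];
[x,Y] = ode45(f, [0 20], [0; 0; 0]);          % zero initial conditions
plot(x, Y(:,1)), xlabel('x'), ylabel('y(x)')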


Example:

Consider the system

If we define new variables

then

and

So this system is equivalent to the first-order system

• Thus, whatever we can show about solutions of systems of first-order linear equations will carry over to arbitrary systems of linear differential equations. So we will talk only about systems of first-order linear differential equations from now on.

A system of first-order linear differential equations (with unknown functions y1, · · · , yn) has the general form

y1' = a1,1(x) y1 + a1,2(x) y2 + · · · + a1,n(x) yn + p1(x)
y2' = a2,1(x) y1 + a2,2(x) y2 + · · · + a2,n(x) yn + p2(x)
. . .
yn' = an,1(x) y1 + an,2(x) y2 + · · · + an,n(x) yn + pn(x)


for some functions ai,j(x) and pj(x).

Most of the time we will be dealing with systems with constant coefficients, in which all of the ai,j(x) are constant functions.

• We say a first-order system is homogeneous if each of p1(x), p2(x), · · · , pn(x) is zero.

• An initial condition for this system consists of n pieces of information y1(x0) = b1, y2(x0) = b2, . . . , yn(x0) = bn, where x0 is the starting value for x and the bi are constants.

• Many of the theorems about general systems of first-order linear equations are very similar to the theorems about nth order linear equations.

• Theorem (Homogeneous Systems): If the coefficient functions ai,j(x) are continuous, then the set of solutions (y1, y2, · · · , yn) to the homogeneous system is an n-dimensional vector space.

The fact that the set of solutions forms a vector space is not so hard to show using the subspace criteria.

The real result of this theorem, which follows from the existence-uniqueness theorem below, is that the set of solutions is n-dimensional.

• Theorem (Existence-Uniqueness): For a system of first-order linear differential equations, if the coefficient functions ai,j (x) and nonhomogeneous terms pj (x) are each continuous in an interval around x = x0 , then the system

with initial conditions


has a unique solution (y1, y2, · · · , yn) in some (possibly smaller) interval around x = x0.

Example : The system

has a unique solution for every initial condition

Definition:

Given n vectors s1 = (y1,1, y1,2, · · · , y1,n), . . . , sn = (yn,1, yn,2, · · · , yn,n) with functions as entries, their Wronskian is defined as the determinant

W = det(M), where M is the n × n matrix whose i-th row is si (that is, the matrix with entries Mi,j = yi,j).

The n vectors s1 , · · · , sn are linearly independent if their Wronskian is nonzero.

1.3 Higher Order ODE


Example : Mechanical Oscillations

Mechanical oscillations
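A minimal sketch (with assumed mass, damping, and stiffness values) of a damped mass-spring oscillator, showing both the solution curve and the corresponding phase-plane trajectory:

% Damped oscillator m*y'' + c*y' + k*y = 0 with states x(1) = y, x(2) = y'.
m = 1; c = 0.4; k = 2;              % assumed mass, damping, stiffness
f = @(t,x) [x(2); (-k*x(1) - c*x(2))/m];
[t,x] = ode45(f, [0 30], [1; 0]);   % released from y = 1 at rest
subplot(1,2,1), plot(t, x(:,1)), xlabel('t'), ylabel('y(t)')        % solution curve
subplot(1,2,2), plot(x(:,1), x(:,2)), xlabel('y'), ylabel('dy/dt')  % phase plane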


Phase space of the linear system

Solution curve of the linear system


Solution curve for Case 2(i)

Solution curve for Case 2(ii)


Solution curve for Case 2(iii)


and

Page 62: MODULE 1 - iiitmk.ac.in · tools are CAD, CAE, CAM, Continuous Acquisition and Life Cycle Support, and Computer-Aided Systems Engineering. Live simulations are simulated operations

Input-Response curve

Modeling Predators

Predator-Prey Model

Models of population growth.

The simplest model for the growth, or decay, of a population says that the growth rate, or the decay rate, is proportional to the size of the population itself. Increasing or decreasing the size of the population results in a proportional increase or decrease in the number of births and deaths. Mathematically, this is described by the differential equation

dy/dt = k y

The proportionality constant k relates the size of the population, y(t), to its rate of growth, dy/dt. If k is positive, the population increases; if k is negative, the population decreases. As we know, the solution to this equation is a function y(t) that is proportional to the exponential function

y(t) = η exp(k t)

where η = y(0).

This simple model is appropriate in the initial stages of growth when there are no restrictions or constraints on the population. A small sample of bacteria in a large Petri dish, for example. But in more realistic situations there are limits to growth, such as finite space or food supply. A more realistic model


says that the population competes with itself. As the population increases, its growth rate decreases linearly:

dy/dt = k (1 − y/μ) y

This differential equation is sometimes called the logistic equation.


Figure 1: Exponential growth and logistic growth.

The new parameter μ is the carrying capacity. As y(t) approaches μ, the growth rate approaches zero and the growth ultimately stops. It turns out that the solution is

y(t) = μ η exp(k t) / (η exp(k t) + μ − η)

You can easily verify for yourself that as t approaches zero, y(t) approaches η and that as t approaches infinity, y(t) approaches μ. If you know calculus, then with quite a bit more effort, you can verify that y(t) actually satisfies the logistic equation.

Figure 1 shows the two solutions when both η and k are equal to one. The exponential function

y(t) = exp(t)

gives the rapidly growing green curve. With carrying capacity μ = 20, the logistic function

y(t) = 20 exp(t) / (exp(t) + 19)

gives the more slowly growing blue curve. Both curves have the same initial value and initial slope. The exponential function grows exponentially, while the logistic function approaches, but never exceeds, its carrying capacity.

Figure 1 was generated with the following code.

k = 1
eta = 1
mu = 20
t = 0:1/32:8;
y = mu*eta*exp(k*t)./(eta*exp(k*t) + mu - eta);
plot(t,[y; exp(t)])
axis([0 8 0 22])

If you don’t have the formula for the solution to the logistic equation handy, you can compute a numerical solution with ode45, one of the Matlab ordinary differential equation solvers. Try running the following code. It will automatically produce a plot something like the blue curve in figure 1.

k = 1
eta = 1
mu = 20
ydot = @(t,y) k*(1-y/mu)*y
ode45(ydot,[0 8],eta)

The @ sign and @(t,y) specify that you are defining a function of t and y. The t is necessary even though it doesn’t explicitly appear in this particular differential equation.

The logistic equation and its solution occur in many different fields. The logistic function is also known as the sigmoid function and its graph is known as the S-curve.

Populations do not live in isolation. Everybody has a few enemies here and there. The Lotka-Volterra predator-prey model is the simplest description of competition between two species. Think of rabbits and foxes, or zebras and lions, or little fish and big fish. The idea is that, if left to themselves with an infinite food supply, the rabbits or zebras would live happily and experience exponential population growth. On the other hand, if the foxes or lions were left with no prey to eat, they would die faster than they could reproduce, and would experience exponential population decline.

The predator-prey model is a pair of differential equations involving a pair of competing populations, y1(t) and y2(t). The growth rate for y1 is a linear function of y2 and vice versa:

dy1/dt = (1 − y2/μ2) y1
dy2/dt = −(1 − y1/μ1) y2

We are using notation y1 (t) and y2 (t) instead of, say, r(t) for rabbits and f (t) for foxes, because our Matlab program uses a two-component vector y.

The extra minus sign in the second equation distinguishes the predators from the prey. Note that if y1 ever becomes zero, then

dy2/dt = −y2

and the predators are in trouble. But if y2 ever becomes zero, then

dy1/dt = y1

and the prey population grows exponentially.

We have a formula for the solution of the single-species logistic model. However, it is not possible to express the solution to this predator-prey model in terms of exponential, trigonometric, or any other elementary functions. It is necessary, but easy, to compute numerical solutions.


Figure 2: A typical solution of the predator-prey equations.

There are four parameters, the two constants μ1 and μ2 , and the two initial conditions,

η1 = y1 (0)

η2 = y2 (0)

If we happen to start with η1 = μ1 and η2 = μ2, then both dy1/dt and dy2/dt are zero and the populations remain constant at their initial values. In other words, the point (μ1, μ2) is an equilibrium point. The origin, (0, 0), is another equilibrium point, but not a very interesting one.

The following code uses ode45 to automatically plot the typical solution shown in figure 2.

mu = [300 200]'
eta = [400 100]'
signs = [1 -1]'
pred_prey_ode = @(t,y) signs.*(1-flipud(y./mu)).*y
period = 6.5357
ode45(pred_prey_ode,[0 3*period],eta)

There are two tricky parts of this code. Matlab vector operations are used to define pred_prey_ode, the differential equations, in one line. And the calculation that generates figure 3 provides the value assigned to period. This value specifies a time t at which the populations return to their initial values


given by eta. The code integrates over three of these time intervals, and so at the end we get back to where we started.

The circles superimposed on the plots in figure 2 show the points where ode45 computes the solution. The plots look something like trig functions, but they're not. Notice that the curves near the minima are broader, and require more steps to compute, than the curves near the maxima. A plot of sin t would look the same at the top as at the bottom.

Figure 3: The predprey experiment.

Our Matlab program exm/predprey shows a red dot at the equilibrium point, (μ1, μ2), and a blue-green dot at the initial point, (η1, η2). When you drag either dot with the mouse, the solution is recomputed by ode45 and plotted. Figure 3 shows that two plots are produced: a phase plane plot of y2(t) versus y1(t) and a time series plot of y1(t) and y2(t) versus t. Figures 2 and 3 have the same parameters, and consequently show the same solution, but with different scaling of the axes.

The remarkable property of the Lotka-Volterra model is that the solutions are always periodic. The populations always return to their initial values and repeat the cycle. This property is not obvious and not easy to prove. It is rare for nonlinear models to have periodic solutions.

The difficult aspect of computing the solution to the predator-prey equations is determining the length of the period. Our predprey program uses a feature of the Matlab ODE solvers called "event handling" to compute the length of a period. If the initial values (η1, η2) are close to the equilibrium point (μ1, μ2), then the length of the period is close to a familiar value. An exercise asks you to discover that value experimentally.
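A minimal sketch of how event handling could be used to find the period is shown below. It simply stops the integration at the first time y1 crosses back up through its initial value η1; the function name return_event is illustrative, and the actual exm/predprey program may implement this differently.

mu = [300 200]';
eta = [400 100]';
signs = [1 -1]';
pred_prey_ode = @(t,y) signs.*(1-flipud(y./mu)).*y;
% Event value is zero when y1 returns to eta(1); direction = 1 keeps only
% upward crossings, isterminal = 1 stops the integration at the first one.
return_event = @(t,y) deal(y(1) - eta(1), 1, 1);
opts = odeset('Events',return_event,'RelTol',1e-8);
[t,y,te] = ode45(pred_prey_ode,[0 20],eta,opts);
period = te(1)

With the parameters above, this should reproduce a value close to the 6.5357 used earlier.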


Coupled ODEs

A coupled system is formed of two differential equations with two dependent variables and one independent variable. Coupled systems of ordinary differential equations (ODEs) generally arise from ODEs of order two or higher. Usually it is not possible to obtain an analytic solution to these systems.

We focus on systems with two dependent variables, so that

dx/dt = f(x, y, t)
dy/dt = g(x, y, t)

Most of the analysis will be for autonomous systems, in which f and g do not depend explicitly on t:

dx/dt = f(x, y)
dy/dt = g(x, y)

A useful compact notation is to write the system in vector form, dz/dt = f(z), where z = (x, y).

An example is the linear system

dx/dt = a x + b y
dy/dt = c x + d y

where a, b, c and d are given constants, and both y and x are functions of t. Use elimination to convert the system to a single second-order differential equation. Another initial condition is worked out, since we need two initial conditions to solve a second-order problem. Solve this equation and find the solution for one of the dependent variables (i.e. y or x), and then use this solution to work out the other dependent variable.

For example:

with initial conditions

and

Step 1: First make x the subject of (1),


Step 2: Substitute in (2) to get

which simplifies to

with initial conditions

and

Step 3: The roots of the auxiliary equation,

are 2, 1. Hence the solution to the homogeneous problem is

Step 4: Substituting the initial conditions gives

Step 5: Now we have

Hence the solution is
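As a separate illustration of the elimination procedure, consider the hypothetical system x' = y, y' = -2x + 3y with x(0) = 1 and y(0) = 0 (this is not the system of the example above). Eliminating y gives x'' - 3x' + 2x = 0, whose auxiliary equation has roots 1 and 2, so x(t) = 2 exp(t) - exp(2t) and y(t) = x'(t) = 2 exp(t) - 2 exp(2t). The short check below compares this analytic solution with a numerical one from ode45; the handle name f is illustrative.

% Hypothetical coupled system: x' = y, y' = -2x + 3y, x(0) = 1, y(0) = 0.
f = @(t,z) [z(2); -2*z(1) + 3*z(2)];
[t,z] = ode45(f,[0 2],[1 0]);
x_exact = 2*exp(t) - exp(2*t);
max(abs(z(:,1) - x_exact))   % agrees to within ode45's default tolerance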

Numerical Solution to the Fox and Rabbit Problem

Consider a simple model for the population dynamics of two interacting species, rabbits and foxes. We take r(t) and f(t) to be the rabbit and fox population densities (respectively) at time t. In the absence of any interaction between the two species, we assume that the rabbit population would grow exponentially, while the fox population would decay exponentially. This suggests the model

We model the interaction between the species (foxes eat rabbits) by assuming that the rate at which this happens is proportional to the product of the rabbit and fox population densities. This suggests adding additional terms to the right-hand sides:

Clearly if r = f = 0 or r = f = 1, the populations will be in a steady state.


But what about other initial conditions? For example, what happens if the initial conditions are r(0) = 2, f(0) = 1?
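The coefficients of the model are not reproduced above, so, as an assumption consistent with the stated steady states r = f = 0 and r = f = 1, take the nondimensional form r' = r(1 - f), f' = f(r - 1). A short ode45 run from r(0) = 2, f(0) = 1 then shows the two populations cycling around the equilibrium (1, 1); the handle name rabbit_fox is illustrative.

% Assumed nondimensional rabbit-fox model with steady states at (0,0) and (1,1).
rabbit_fox = @(t,y) [y(1)*(1 - y(2)); y(2)*(y(1) - 1)];
[t,y] = ode45(rabbit_fox,[0 20],[2; 1]);
plot(t,y(:,1),t,y(:,2))
legend('rabbits r(t)','foxes f(t)'), xlabel('t')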

In this example we are concerned with the populations of rabbits and foxes in a national park. Suppose that 1000 rabbits and 1000 foxes are introduced into the park, which previously contained no rabbits or foxes, and that for every nonnegative integer n the numbers of rabbits and foxes in the park after n months are R(n) and F(n), respectively. Suppose finally that for each n we have

We want to determine what will happen to the populations of rabbits and foxes in the long term. We

begin our study of this problem with the observation that if then, for each n,

and so

By pointing at the matrix A and clicking on Eigenvalues we see that the two eigenvalues of this matrix are 1 and 0.7. Since the eigenvalue 1 has multiplicity 1 and the other eigenvalue has absolute value less than 1, the above theory tells us that the sequence (A^n) is convergent.


Solving the Problem Numerically

By clicking on Evaluate Numerically we see that

Looking at these matrices we can conjecture that the powers A^n converge as n → ∞, and that, in the long term, the numbers of rabbits and foxes will approach the coordinates of the limiting vector.

In other words, in the long term, there will be twice as many rabbits in the park as there are foxes.
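The matrix A itself is not reproduced above. Purely as an illustration, the hypothetical matrix below has the spectral properties described in the text (eigenvalues 1 and 0.7, with the eigenvector (2, 1) belonging to the eigenvalue 1), so its powers reproduce the same long-term conclusion.

% Hypothetical matrix with eigenvalues 1 and 0.7; eigenvector [2;1] for eigenvalue 1.
A = [1.1 -0.2; 0.2 0.6];
eig(A)                 % returns 1 and 0.7
v0 = [1000; 1000];     % initial rabbits and foxes
vn = A^60*v0           % approaches [2000/3; 1000/3]: twice as many rabbits as foxes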


Phase Plane Analysis

1. Phase Plane Analysis

Phase plane analysis is a graphical method for studying second-order systems. It is one of the most important techniques for studying the behavior of nonlinear systems, since there is usually no analytical solution for a nonlinear system. The response characteristics (relative speed of response) for unforced systems were dependent on the initial conditions. Eigenvalue/eigenvector analysis allowed us to predict the fast and slow (or stable and unstable) initial conditions. Another way of obtaining a feel for the effect of initial conditions is to use a phase plane plot.

2. Concepts of Phase Plane Analysis

Phase portraits

The phase plane method is concerned with the graphical study of second-order autonomous systems described by

dx1/dt = f1(x1, x2)
dx2/dt = f2(x1, x2)     (2.1)

where f1 and f2 are nonlinear functions of the states x1 and x2. Geometrically, the state space of this system is a plane having x1, x2 as coordinates. This plane is called the phase plane. The solution of (2.1), as time varies from zero to infinity, can be represented as a curve in the phase plane. Such a curve is called a phase plane trajectory. A family of phase plane trajectories is called a phase portrait of a system.

Example 2.1: Phase portrait of a mass-spring system

Fig. 2.1 A mass-spring system and its phase portrait


The governing equation of the mass-spring system in Fig. 2.1 is the familiar linear second-order differential equation

x'' + x = 0     (2.2)

(taking unit mass and unit spring constant). Assume that the mass is initially at rest, at length x0. Then the solution of this equation is

x(t) = x0 cos t
x'(t) = −x0 sin t

Eliminating time t from the above equations, we obtain the equation of the trajectories

x² + (x')² = x0²

This represents a circle in the phase plane. Its plot is given in Fig. 2.1(b).

The nature of the system response corresponding to various initial conditions is directly displayed on the phase plane. In the above example, we can easily see that the system trajectories neither converge to the origin nor diverge to infinity. They simply circle around the origin, indicating the marginal nature of the system’s stability.
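A short sketch that draws these circular trajectories for several initial lengths x0 is given below; the normalization to unit mass and stiffness is an assumption carried over from the example above.

% Phase portrait of x'' + x = 0: circles (x1, x2) = (x0 cos t, -x0 sin t).
theta = linspace(0,2*pi,200);
hold on
for x0 = 1:3
    plot(x0*cos(theta), -x0*sin(theta))
end
hold off
axis equal, xlabel('x_1 = x'), ylabel('x_2 = dx/dt')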

A major class of second-order systems can be described by the differential equations of the form

In state space form, with x1 = x and x2 = x', this dynamics can be represented as follows:

Singular points

A singular point is an equilibrium point in the phase plane. Since an equilibrium point is defined as a point where the system states can stay forever, this implies that dx/dt = 0, i.e., using (2.1), f1(x1, x2) = 0 and f2(x1, x2) = 0.


For a linear system, there is usually only one singular point although in some cases there can be a set of singular points.

Example 2.2: A nonlinear second-order system

Figure 2.2: Phase portrait of a nonlinear second-order system

Consider the system

whose phase portrait is plotted in Fig. 2.2.

The system has two singular points, one at (0,0) and the other at (−3,0) . The motion patterns of the system trajectories in the vicinity of the two singular points have different natures. The trajectories move towards the point x = 0 while moving away from the point x = −3 .

Why is an equilibrium point of a second-order system called a singular point? Let us examine the slope of the phase portrait. The slope of the phase trajectory passing through a point (x1, x2) is determined by

dx2/dx1 = f2(x1, x2) / f1(x1, x2)

where f1, f2 are assumed to be single-valued functions. This implies that the phase trajectories will not intersect. At a singular point, however, the value of the slope is 0/0, i.e., the slope is indeterminate. Many trajectories may intersect at such a point, as seen from Fig. 2.2. This indeterminacy of the slope accounts for the adjective "singular".


Singular points are very important features in the phase plane. Examining the singular points can reveal a great deal of information about the properties of a system. In fact, the stability of linear systems is uniquely characterized by the nature of their singular points. Although the phase plane method is developed primarily for second-order systems, it can also be applied to the analysis of first-order systems of the form

The difference now is that the phase portrait is composed of a single trajectory.

Example 2.3: A first-order system

Consider the system

dx/dt = −4x + x³

There are three singular points, defined by −4x + x³ = 0, namely x = 0, −2, 2. The phase portrait of the system consists of a single trajectory, and is shown in Fig. 2.3.

Figure 2.3 : Phase trajectory of a first-order system

The arrows in the figure denote the direction of motion, and whether they point toward the left or the right at a particular point is determined by the sign of dx/dt at that point. It is seen from the phase portrait of this system that the equilibrium point x = 0 is stable, while the other two are unstable.

Symmetry in phase plane portrait

Let us consider the second-order dynamics

The slope of trajectories in the phase plane is of the form

Since symmetry of the phase portraits also implies symmetry of the slopes (equal in absolute value but opposite in sign), we can identify the following situations:

Constructing Phase Portraits

There are a number of methods for constructing phase plane trajectories for linear or nonlinear systems, such as the so-called analytical method, the method of isoclines, the delta method, Lienard's method, and Pell's method.

Analytical method

There are two techniques for generating phase plane portraits analytically. Both techniques lead to a functional relation between the two phase variables x1 and x2 in the form

where the constant c represents the effects of initial conditions (and, possibly, of external input signals). Plotting this relation in the phase plane for different initial conditions yields a phase portrait.

The first technique involves solving (2.1) for x1 and x2 as functions of time t, i.e., x1(t) = g1(t) and x2(t) = g2(t), and then eliminating time t from these equations. This technique was already illustrated in Example 2.1.

The second technique, on the other hand, involves directly eliminating the time variable, by noting that

dx2/dx1 = f2(x1, x2) / f1(x1, x2)

and then solving this equation for a functional relation between x1 and x2. Let us use this technique to solve the mass-spring equation again.

Example 2.4 Mass-spring system

By noting that

x'' = (dx'/dx) x'

we can rewrite (2.2) as

x' dx' + x dx = 0

Integration of this equation yields

(x')² + x² = x0²

Most nonlinear systems cannot be easily solved by either of the above two techniques. However, for piecewise linear systems, an important class of nonlinear systems, these techniques can be conveniently used, as the following example shows.

Example 2.5 A satellite control system

Fig. 2.4 shows the control system for a simple satellite model. The satellite, depicted in Fig. 2.5.a, is simply a rotational unit inertia controlled by a pair of thrusters, which can provide either a positive constant torque U (positive firing) or a negative torque −U (negative firing). The purpose of the control system is to maintain the satellite antenna at a zero angle by appropriately firing the thrusters.

The mathematical model of the satellite is

θ'' = u

where u is the torque provided by the thrusters and θ is the satellite angle. Let us examine on the phase plane the behavior of the control system when the thrusters are fired according to the control law

u = −U if θ > 0,   u = +U if θ < 0

which means that the thrusters push in the counterclockwise direction if θ is positive, and vice versa.

As the first step of the phase portrait generation, let us consider the phase portrait when the thrusters provide a positive torque U. The dynamics of the system is

θ'' = U

which implies that

θ' dθ' = U dθ

Therefore, the phase portrait trajectories are a family of parabolas defined by

(θ')² = 2Uθ + c1

where c1 is a constant. The corresponding phase portrait of the system is shown in Fig. 2.5.b.

When the thrusters provide a negative torque −U, the phase trajectories are similarly found to be

(θ')² = −2Uθ + c2

with the corresponding phase portrait as shown in Fig. 2.5.c.

The complete phase portrait of the closed-loop control system can be obtained simply by connecting the trajectories on the left half of the phase plane in 2.5.b with those on the right half of the phase plane in 2.5.c, as shown in Fig. 2.6.

The vertical axis represents a switching line, because the control input and thus the phase trajectories are switched on that line. It is interesting to see that, starting from a nonzero initial angle, the satellite will oscillate in periodic motions under the action of the jets. One can conclude from this phase portrait that the system is marginally stable, similar to the mass-spring system in Example 2.1. Convergence of the system to the zero angle can be obtained by adding rate feedback.


The method of isoclines

The basic idea in this method is that of isoclines. Consider the dynamics in (2.1).

At a point (x1, x2) in the phase plane, the slope of the tangent to the trajectory can be determined by (2.5). An isocline is defined to be the locus of the points with a given tangent slope. An isocline with slope α is thus defined by

f2(x1, x2) / f1(x1, x2) = α

This is to say that points on this curve all have the same tangent slope α.

In the method of isoclines, the phase portrait of a system is generated in two steps. In the first step, a field of directions of tangents to the trajectories is obtained. In the second step, phase plane trajectories are formed from the field of directions.

Let us explain the isocline method on the mass-spring system in (2.2): x'' + x = 0. The slope of the trajectories is easily seen to be

dx2/dx1 = −x1/x2

Therefore, the isocline equation for a slope α is

x2 = −x1/α

i.e., a straight line. Along the line, we can draw a number of short line segments with slope α. By taking α to be different values, a set of isoclines can be drawn, and a field of directions of tangents to trajectories is generated, as shown in Fig. 2.7. To obtain trajectories from the field of directions, we assume that the tangent slopes are locally constant. Therefore, a trajectory starting from any point in the plane can be found by connecting a sequence of line segments.
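A minimal sketch of the corresponding field of tangent directions, generated with quiver, is shown below; the grid spacing is arbitrary.

% Field of tangent directions for x'' + x = 0; along the isocline x2 = -x1/alpha
% every arrow has slope alpha.
[x1,x2] = meshgrid(-2:0.25:2, -2:0.25:2);
quiver(x1, x2, x2, -x1)      % (dx1/dt, dx2/dt) = (x2, -x1)
axis equal, xlabel('x_1'), ylabel('x_2')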

Example 2.6 The Van der Pol equation

For the Van der Pol equation

an isocline of slope α is defined by

Therefore, the points on the curve

all have the same slope α.

By taking different values of α, different isoclines can be obtained, as plotted in Fig. 2.8.


Short line segments are drawn on the isoclines to generate a field of tangent directions. The phase portrait can then be obtained, as shown in the plot. It is interesting to note that there exists a closed curve in the portrait, and the trajectories starting from both outside and inside converge to this curve.
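The same limit-cycle behavior can be checked numerically. Assuming the standard form of the Van der Pol equation, x'' + mu (x² − 1) x' + x = 0 with mu = 1 (the specific form and coefficient used above are not reproduced in these notes), a numerical integration shows trajectories from inside and outside both converging to the closed curve.

% Standard Van der Pol form (assumed), written as a first-order system.
mu = 1;
vdp = @(t,x) [x(2); mu*(1 - x(1)^2)*x(2) - x(1)];
[~,xa] = ode45(vdp,[0 30],[0.1; 0]);   % starts inside the limit cycle
[~,xb] = ode45(vdp,[0 30],[3; 3]);     % starts outside the limit cycle
plot(xa(:,1),xa(:,2), xb(:,1),xb(:,2))
xlabel('x'), ylabel('dx/dt')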

Determining Time from Phase Portraits

Time t does not explicitly appear in the phase plane having x1 and x2 as coordinates. We now describe two techniques for computing the time history from the phase portrait. Both techniques involve a step-by-step procedure for recovering time.

Obtaining time from ∆t ≈ ∆x/ẋ: in a short time ∆t, the change of x is approximately

∆x ≈ ẋ ∆t     (2.8)

where ẋ is the velocity corresponding to the increment ∆x. From (2.8), the length of time corresponding to the increment ∆x is ∆t ≈ ∆x/ẋ. This implies that, in order to obtain the time corresponding to the motion from one point to another point along the trajectory, we should divide the corresponding part of the trajectory into a number of small segments (not necessarily equally spaced), find the time associated with each segment, and then add up the results. To obtain the time history of states corresponding to a certain initial condition, we simply compute the time t for each point on the phase trajectory, and then plot x with respect to t and ẋ with respect to t.

Obtaining time from t − t0 = ∫ (1/ẋ) dx: since ẋ = dx/dt, we can write dt = dx/ẋ. Therefore,

t − t0 = ∫ from x0 to x of (1/ẋ) dx

where x corresponds to time t and x0 corresponds to time t0. This implies that, if we plot a phase plane portrait with new coordinates x and 1/ẋ, then the area under the resulting curve is the corresponding time interval.

Phase Plane Analysis of Linear Systems

The general form of a linear second-order system is

Transform these equations into a scalar second-order differential equation in the form


Consequently, differentiation of (2.9a) and then substitution of (2.9b) leads to

Therefore, we will simply consider the second-order linear system described by

x'' + a x' + b x = 0     (2.10)

To obtain the phase portrait of this linear system, we first solve for the time history

x(t) = k1 exp(λ1 t) + k2 exp(λ2 t)     (for λ1 ≠ λ2)

where the constants λ1, λ2 are the solutions of the characteristic equation

λ² + a λ + b = 0

The roots λ1, λ2 can be explicitly represented as

λ1 = (−a + √(a² − 4b))/2,   λ2 = (−a − √(a² − 4b))/2

For linear systems described by (2.10), there is only one singular point (b ≠ 0) , namely the origin. However, the trajectories in the vicinity of this singularity point can display quite different characteristics, depending on the values of a and b . The following cases can occur

• λ1, λ2 are both real and have the same sign (+ or −)
• λ1, λ2 are both real and have opposite signs
• λ1, λ2 are complex conjugates with non-zero real parts
• λ1, λ2 are complex conjugates with real parts equal to 0

We now briefly discuss each of the above four cases

Stable or unstable node (Fig. 2.9.a -b) The first case corresponds to a node. A node can be stable or unstable:

λ1 , λ2 < 0 : singularity point is called stable node. λ1 , λ2 > 0 : singularity point is called unstable node.

There is no oscillation in the trajectories.

Saddle point (Fig. 2.9.c) The second case ( λ1 < 0 < λ2 ) corresponds to a saddle point. Because of the unstable pole λ2 , almost all of the system trajectories diverge to infinity.


Stable or unstable focus (Fig. 2.9.d-e) The third case corresponds to a focus. Re(λ1, λ2) < 0: stable focus. Re(λ1, λ2) > 0: unstable focus.

Center point (Fig. 2.9.f)


The last case corresponds to a center point. All trajectories are ellipses and the singularity point is the centre of these ellipses.

⊗ Note that the stability characteristics of linear systems are uniquely determined by the nature of their singularity points. This, however, is not true for nonlinear systems.

Phase Plane Analysis of Nonlinear Systems

In discussing the phase plane analysis of nonlinear system, two points should be kept in mind:

1. Phase plane analysis of nonlinear systems is related to that of linear systems, because the local behavior of a nonlinear system can be approximated by the behavior of a linear system.

2. Nonlinear systems can display much more complicated patterns in the phase plane, such as multiple equilibrium points and limit cycles.

Local behavior of nonlinear systems

If the singular point of interest is not at the origin, by defining the difference between the original state and the singular point as a new set of state variables, we can shift the singular point to the origin. Therefore, without loss of generality, we may simply consider Eq.(2.1) with a singular point at 0. Using Taylor expansion, Eqs. (2.1) can be rewritten in the form

where g1, g2 contain higher-order terms. In the vicinity of the origin, the higher-order terms can be neglected, and therefore the nonlinear system trajectories essentially satisfy the linearized equation

As a result, the local behavior of the nonlinear system can be approximated by the patterns shown in Fig. 2.9.

Limit cycle

In the phase plane, a limit cycle is defined as an isolated closed curve. The trajectory has to be both closed, indicating the periodic nature of the motion, and isolated, indicating the limiting nature of the cycle (with nearby trajectories converging to it or diverging from it).

Depending on the motion patterns of the trajectories in the vicinity of the limit cycle, we can distinguish three kinds of limit cycles.

• Stable Limit Cycles: all trajectories in the vicinity of the limit cycle converge to it as t → ∞ (Fig. 2.10.a).

• Unstable Limit Cycles: all trajectories in the vicinity of the limit cycle diverge from it as t → ∞ (Fig. 2.10.b).

• Semi-Stable Limit Cycles: some of the trajectories in the vicinity of the limit cycle converge to it as t → ∞, while the others diverge from it (Fig. 2.10.c).

Example 2.7 Stable, unstable, and semi-stable limit cycle

Consider the following nonlinear systems

By introducing polar coordinates

the dynamics of (2.12) are transformed as

When the state starts on the unit circle, the above equation shows that ṙ(t) = 0. Therefore, the state will circle around the origin with a period 1/2π. When r < 1, then ṙ > 0. This implies that the state tends to the circle from inside. When r > 1, then ṙ < 0. This implies that the state tends to the unit circle from outside. Therefore, the unit circle is a stable limit cycle. This can also be concluded by examining the analytical solution of (2.12).

Similarly, we can find that the system (b) has an unstable limit cycle and the system (c) has a semi-stable limit cycle.
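A representative system of the kind used in Example 2.7 (the specific equations are not reproduced above) is x1' = x2 + x1(1 − x1² − x2²), x2' = −x1 + x2(1 − x1² − x2²); in polar coordinates this reduces to ṙ = r(1 − r²), θ' = −1, so the unit circle is a stable limit cycle. A short numerical check:

% Representative system with a stable limit cycle on the unit circle (illustrative).
f = @(t,x) [ x(2) + x(1)*(1 - x(1)^2 - x(2)^2);
            -x(1) + x(2)*(1 - x(1)^2 - x(2)^2)];
[~,xa] = ode45(f,[0 20],[0.1; 0]);   % spirals outward to the unit circle
[~,xb] = ode45(f,[0 20],[2; 0]);     % spirals inward to the unit circle
plot(xa(:,1),xa(:,2), xb(:,1),xb(:,2)), axis equal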

Existence of Limit Cycles

Theorem 2.1 (Poincare): If a limit cycle exists in the second-order autonomous system (2.1), then N = S + 1,

where N represents the number of nodes, centers, and foci enclosed by the limit cycle, and S represents the number of enclosed saddle points.

This theorem is sometimes called the index theorem.

Theorem 2.2 (Poincare-Bendixson): If a trajectory of the second-order autonomous system remains in a finite region Ω, then one of the following is true:

(a) the trajectory goes to an equilibrium point;
(b) the trajectory tends to an asymptotically stable limit cycle;
(c) the trajectory is itself a limit cycle.

Theorem 2.3 (Bendixson) For a nonlinear system (2.1), no limit cycle can exist in the region Ω of the phase plane in which ∂f1 / ∂x1 + ∂f 2 / ∂x 2 does not vanish and does not change sign.

Example 2.8: Consider the nonlinear system

Since ∂f1/∂x1 + ∂f2/∂x2 is always strictly positive (except at the origin), the system does not have any limit cycles anywhere in the phase plane.


Numerical Methods for Boundary Value Problems

Boundary Value Problems (BVP) for higher order ordinary differential equations are frequently encountered in applications. These require the determination of a function of a single independent variable satisfying a given differential equation and subject to specified values at the boundaries of the solution domain. Second and fourth order BVP are most common in engineering applications.

Initial value problems require determination of the function subject to specified value(s) at one end of the domain (typically t = 0). In contrast, in BVPs involving second-order ODEs, function values are specified at the two ends of the solution domain (typically x = 0 and x = L). Many problems in engineering and science can be formulated as BVPs. Examples include steady state conduction heat transfer in a thin heated wire, the electric potential inside a thin conductor, the deflection of a thin elastic thread under load, and many others.

In contrast with initial value problems, boundary value problems often have multiple solutions or even fail to have a solution.

For instance, consider the problem of determining y(x) such that

y'' + y = 0

By inspection (or by use of Frobenius's method), the general solution of this differential equation is of the form

y(x) = A sin(x) + B cos(x)

Now, if the boundary conditions are

the unique solution to the BVP is y(x) = sin(x).

Alternatively, if the boundary conditions are

y(0) = 0; y(π) = 0


multiple solutions of the form y(x) = A sin(x), with A arbitrary, exist.

Finally, if the boundary conditions are

y(0) = 0; y(π) = 1

no solution y(x) exists.

Three commonly used numerical methods for BVPs are: shooting, finite difference and finite element. All three can be used both for linear and nonlinear problems.

Higher Order Equations and Systems of Equations

A higher order IVP can be rewritten as a system of first-order IVPs

dy/dt = f(t, y)

with

y(a) = (α1, α2, ..., αn)

where y(t) = (y1, y2, ..., yn) and f(t, y) = (f1(t, y), f2(t, y), ..., fn(t, y)). For instance, the third-order equation

y''' = f(t, y, y', y'')

subject to the initial conditions

y(0) = α1
y'(0) = α2
y''(0) = α3

by introducing the new variables y1 = y, y2 = y', y3 = y''


can be transformed into the equivalent system of three first-order equations

y1' = y2
y2' = y3
y3' = f(t, y1, y2, y3)

subject to

y1(0) = α1,  y2(0) = α2,  y3(0) = α3


Uniqueness Theorem

If f is continuous and Lipschitz in all the variables y, the system of IVP’s has a unique solution y(t).

Therefore, any of the various Initial Value Problem algorithms used to solve single equations can be readily extended to deal with systems of equations. In Maple the solution of systems is implemented with the dsolve command.

Recall that the fourth-order Runge-Kutta method is a single-step procedure which produces an approximation wi to the solution y(ti) of the initial value problem

y' = f(t, y)

on a ≤ t ≤ b, subject to the condition

y(a) = α

given by

w0 = α
wi+1 = wi + (K1 + 2 K2 + 2 K3 + K4)/6

where

K1 = h f(ti, wi)
K2 = h f(ti + h/2, wi + K1/2)
K3 = h f(ti + h/2, wi + K2/2)
K4 = h f(ti + h, wi + K3)

and h = ∆t is the step size.


Runge-Kutta Method for Systems Algorithm

• Given the number of equations m, the function f(t, y), the initial conditions αi, the endpoints a and b, and the number of subintervals N.

• For each variable yj , j = 1, 2, ..., m compute K1,j , K2,j , K3,j , K4,j and the approximations wj
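A minimal m-file sketch of this algorithm (the classical fourth-order Runge-Kutta method applied to a system y' = f(t, y)) is given below; the function name rk4_system and its interface are illustrative, not part of the notes.

function [t,w] = rk4_system(f,a,b,alpha,N)
% Classical RK4 for y' = f(t,y) on [a,b], y(a) = alpha (a column vector), N steps.
h = (b - a)/N;
t = a + (0:N)'*h;
w = zeros(N+1,numel(alpha));
w(1,:) = alpha(:)';
for i = 1:N
    yi = w(i,:)';
    K1 = h*f(t(i),       yi);
    K2 = h*f(t(i) + h/2, yi + K1/2);
    K3 = h*f(t(i) + h/2, yi + K2/2);
    K4 = h*f(t(i) + h,   yi + K3);
    w(i+1,:) = (yi + (K1 + 2*K2 + 2*K3 + K4)/6)';
end

For example, [t,w] = rk4_system(@(t,y) [y(2); -y(1)], 0, 2*pi, [1; 0], 100) approximates one period of the mass-spring oscillation discussed earlier.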

Linear Shooting

A boundary value problem (BVP) involving an ordinary differential equation of order two consists of determining the function y(x) for x ∈ [a, b] such that

y'' = f(x, y, y')

subject to the boundary conditions

y(a) = α

and

y(b) = β

Theorem of Uniqueness: If the function f(x, y, y') as well as ∂f/∂y and ∂f/∂y' are continuous in the set D = {(x, y, y') : a ≤ x ≤ b, −∞ < y < +∞, −∞ < y' < +∞}, and if ∂f/∂y > 0 for all (x, y, y') ∈ D and |∂f/∂y'| ≤ M for all (x, y, y') ∈ D, where M is a constant, then the BVP has a unique solution.

The shooting method is based on the idea of replacing the original BVP by an equivalent set of initial value problems (IVPs) which are solved using standard IVP methods.

Corollary on Uniqueness of the Linear Problem: If the BVP is linear, i.e.

y'' = p(x) y' + q(x) y + r(x)

in

a ≤ x ≤ b


subject to

y(a) = α

and

y(b) = β

and the coefficients p(x), q(x) and r(x) satisfy

1. p(x), q(x), and r(x) are continuous on [a, b],

2. q(x) > 0 on [a, b],

then the problem has a unique solution given by

y(x) = y1(x) + [(β − y1(b)) / y2(b)] y2(x)

where y1 (x) and y2 (x) are, respectively, the solutions to the following two IVPs

y1'' = p(x) y1' + q(x) y1 + r(x)

in

a ≤ x ≤ b

subject to

y1(a) = α

and

y1'(a) = 0


and

y2'' = p(x) y2' + q(x) y2

in

a ≤ x ≤ b

subject to

y2(a) = 0

and

y2'(a) = 1

Linear Shooting Algorithm
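A minimal sketch of the linear shooting idea in Matlab is given below. It assumes the linear BVP y'' = p(x)y' + q(x)y + r(x) with y(a) = α, y(b) = β and the two IVPs stated above; the file name linear_shooting.m and the fixed 101-point output grid are illustrative choices.

function [x,y] = linear_shooting(p,q,r,a,b,alpha,beta)
% Solve y'' = p(x)y' + q(x)y + r(x), y(a) = alpha, y(b) = beta by linear shooting.
f1 = @(x,u) [u(2); p(x)*u(2) + q(x)*u(1) + r(x)];   % full equation, y1(a)=alpha, y1'(a)=0
f2 = @(x,u) [u(2); p(x)*u(2) + q(x)*u(1)];          % homogeneous equation, y2(a)=0, y2'(a)=1
xs = linspace(a,b,101);
[~,u1] = ode45(f1,xs,[alpha; 0]);
[~,u2] = ode45(f2,xs,[0; 1]);
c = (beta - u1(end,1))/u2(end,1);   % weight chosen so that y(b) = beta
x = xs(:);
y = u1(:,1) + c*u2(:,1);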

Nonlinear Shooting

If the BVP is nonlinear, its solution cannot be obtained by linear combination of the solutions of the associated IVPs. Instead one must produce a sequence of solutions y(x, t) corresponding to an IVP of the form


in

subject to

and

where t is a parameter of the method whose values must be selected so that

If the preceding differential equation is differentiated with respect to t, and z(x, t) = ∂y(x, t)/∂t, the second associated IVP is obtained

in

with


and

Therefore, nonlinear shooting proceeds by solving the above two IVPs and using Newton's method to determine subsequent iterate values tk as follows

Nonlinear Shooting Algorithm.

Finite Difference Methods for Linear Problems

Consider the linear BVP

in

subject to

y(a) = α

and


y(b) = β

Introduce a mesh in [a, b] by dividing the interval into N + 1 equal subintervals of size h. This produces two boundary mesh points x0 and xN +1 , and N interior mesh points xi , i = 1, 2, .., N . The values y(x0 ) = y(a) and y(xN +1 ) = y(b) are known but the values y(xi ), i = 1, 2, .., N must be determined.

From the Taylor expansions for y(xi+1) and y(xi−1) the following centered difference formula is obtained

y''(xi) = [ y(xi+1) − 2 y(xi) + y(xi−1) ] / h²  −  (h²/12) y^(4)(ξi)

for some ξi ∈ (xi−1, xi+1). Similarly, an approximation for y'(xi) is

y'(xi) = [ y(xi+1) − y(xi−1) ] / (2h)  −  (h²/6) y'''(ηi)

for some ηi ∈ (xi−1, xi+1).

If the higher order terms are discarded from the above formulae and the approximations are used in the original differential equation, the second order accurate finite difference approximation to the BVP becomes a system of simultaneous linear algebraic equations

or, alternatively

for i = 1, 2, .., N .

In matrix notation the system becomes Aw = b

where A is a tridiagonal matrix.

Theorem of Uniqueness: If p(x), q(x) and r(x) are continuous and q(x) > 0 on [a, b], the problem Aw = b has a unique solution provided h < 2/L, where L = max over a ≤ x ≤ b of |p(x)|. Further, if y^(4) is continuous on [a, b], the truncation error is O(h²).

Linear Finite Difference Method Algorithm.
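A minimal sketch of the linear finite difference method for y'' = p(x)y' + q(x)y + r(x), y(a) = α, y(b) = β, assembling the system Aw = b row by row with centered differences and solving it with Matlab's backslash, is given below. The function name linear_fd_bvp is illustrative, and a full dense matrix is used for simplicity rather than a dedicated tridiagonal solver.

function [x,w] = linear_fd_bvp(p,q,r,a,b,alpha,beta,N)
% Centered finite differences on N interior mesh points for the linear BVP.
h = (b - a)/(N + 1);
x = a + (1:N)'*h;                         % interior mesh points
A = zeros(N);  rhs = zeros(N,1);
for i = 1:N
    A(i,i) = 2 + h^2*q(x(i));
    if i > 1, A(i,i-1) = -(1 + h*p(x(i))/2); end
    if i < N, A(i,i+1) = -(1 - h*p(x(i))/2); end
    rhs(i) = -h^2*r(x(i));
end
rhs(1) = rhs(1) + (1 + h*p(x(1))/2)*alpha;   % boundary value y(a) = alpha
rhs(N) = rhs(N) + (1 - h*p(x(N))/2)*beta;    % boundary value y(b) = beta
w = A\rhs;
x = [a; x; b];  w = [alpha; w; beta];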


Finite Difference Methods for Nonlinear Problems

Let the BVP

subject to

y(a) = α

and

y(b) = β

be such that

• f, ∂f/∂y = fy and ∂f/∂y' = fy' are continuous on D;

• ∂f/∂y ≥ δ > 0;

• constants k and L exist such that k = max |∂f/∂y| and L = max |∂f/∂y'|.

Then, a unique solution exists. Introducing finite differences and since f is nonlinear, the following system of nonlinear algebraic equations is obtained


for each interior mesh point i = 1, 2, .., N . The nonlinear system has a unique solution as long as h < 2/L.

The Jacobian matrix associated with this system is tridiagonal and it is given by

Newton’s method applied to this problem requires first the solution of the linear system

to produce v(k) which is then added to w(k−1) to obtain w(k) , i.e.

w(k) = w(k−1) + v(k)

for each i = 1, 2, .., N .

Nonlinear Finite Difference Method Algorithm.

• Give functions p(x), q(x), r(x), endpoints a, b, boundary conditions α, β, the number of subintervals N + 1, the tolerance TOL and the maximum number of iterations M.

• Set h = (b − a)/(N + 1), w0 = α, wN +1 = β.

• For i = 1, .., N set wi = α + i[(β − α)/(b − a)]h.

• Set k = 1, and while k ≤ M compute ti , ai , bi , ci , di for i = 1, 2, ..., N .

• Solve tridiagonal system for zi .


• Set vN = zN, wN = wN + vN.

• Compute vi = zi − ui vi+1 and wi .

• If ||v|| ≤ TOL stop; else continue until k = M.

Rayleigh-Ritz Method

Instead of using finite difference approximation to the derivatives in the original differential equation one can invoke variational principles.

Theorem of Uniqueness: Let p ∈ C¹[0, 1] and q, f ∈ C[0, 1], with p(x) ≥ δ > 0 and q(x) ≥ 0 for 0 ≤ x ≤ 1. Then the function y(x) ∈ C²[0, 1] with y(0) = y(1) = 0 is the solution to the differential equation

−d/dx( p(x) dy/dx ) + q(x) y = f(x),   0 ≤ x ≤ 1,

if and only if y(x) is the unique function which minimizes the integral

I[y] = ∫ from 0 to 1 of { p(x) [y'(x)]² + q(x) [y(x)]² − 2 f(x) y(x) } dx

In practice one searches for the solution among the linear combinations of a set of linearly independent, well defined basis functions φi(x), i = 1, 2, ..., n. The solution y(x) is first approximated by

y(x) ≈ φ(x) = c1 φ1(x) + c2 φ2(x) + ... + cn φn(x)

and then the coefficients ci are chosen so that I[φ] is minimized, i.e.

∂I[φ]/∂cj = 0

for each j = 1, 2, ..., n.

Substituting


into I[y], differentiating and introducing the minimization condition leads to

Ac = b

where

are the entries of the n × n stiffness matrix A,

is the load vector and c = (c1 ,c2 , .., cn )t is the vector of unknown coefficients. The simplest kind of basis functions is the set of piecewise linear polynomials defined by first introducing in [0, 1] a collection of nodes x0 , x1 , ..., xn+1 with hi = xi+1 − xi and then making

for each i = 1, 2, ..., n. This approximation is continuous but it is not differentiable. Note that φi(x)φj(x) = 0 and φi'(x)φj'(x) = 0 except when j is equal to i − 1, i, or i + 1, and this makes A tridiagonal.

Evaluation of the matrix entries requires the calculation of 6 integrals for each node. One can approximate the functions p, q, and f by first order interpolating polynomials and then integrate numerically.

Piecewise Linear Rayleigh-Ritz Algorithm.

Give functions p(x), q(x), f (x), endpoints 0, 1, boundary conditions α = 0, β = 0, and the number of nodes n + 2.

• For each node i = 1, ..., n define the basis function φi (x).


• For i = 1, ..., n − 1 compute the integrals Qj,i for j = 1, ..., 6.

• Compute entries in the tridiagonal matrix.

• Solve tridiagonal system for the unknown coefficients ci.

• Compute

Piecewise linear basis functions produce a continuous solution which is however non-differentiable. Differentiability of the approximation can be obtained by using spline functions and a Cubic Spline Rayleigh-Ritz algorithm can be obtained.

Extended Predator-Prey Model

Predator-Two Prey

One population could be the predator, such as a fox, and the second and third could be the prey, such as rabbits and turkeys. The populations will be modeled by three or more differential equations. The Matlab command ode45 can be used to solve such systems of differential equations.

Differential Equation Model

There are three basic continuous predator-prey models:

• one predator and one prey with constant birth and death rates,
• one predator and one prey with variable birth and death rates, and
• one predator and two prey.

We will consider the last two. We take the predator to be a fox and the prey to be either rabbits or turkeys. One could also consider different species of fish, such as sharks and bass.

One Predator, and One Prey with Variable Birth and Death Rates.

The equation for the fox population will remain the same. The rabbit population will now have a logistic model when there are no foxes. This means the birth and death rate for the rabbit population will vary and be set equal to k(M - y). So, with no foxes the differential equation model for the rabbits is y' = k(M - y)y. If there is a nonzero fox population, then there will be an additional death rate for the rabbit population so that birth rate minus death rate equals k(M - y) - cx. The constants k, M and c must be determined from careful observations of the populations. The new mathematical model is

x' = (-d + ey)x,  x(0) = xo          (Fox Equation)
y' = (k(M - y) - cx)y,  y(0) = yo    (Rabbit Equation)


One Predator, and Two Prey with Variable Birth and Death Rates.

Consider one fox and two prey: rabbits and turkeys. Let the turkey population be given by z(t). Since the turkey population is also a prey, the birth rate minus the death rate of the turkeys must be similar to the foxes so that birth rate minus death rate equals kT(MT - z) - cTx. The birth rate minus the death rate for the fox population will increase to -d + ey + eTz. Therefore, the new system of differential equations will have three equations

x' = (-d + ey + eTz)x,  x(0) = xo        (Fox Equation)
y' = (k(M - y) - cx)y,  y(0) = yo        (Rabbit Equation)
z' = (kT(MT - z) - cTx)z,  z(0) = zo     (Turkey Equation)

Method of Solution

We can use the Matlab command ode45 to solve our systems of differential equations. This command is a robust implementation for systems of differential equations, which uses a variable step size method based on fourth- and fifth-order Runge-Kutta formulas.

Matlab Implementation.

In the following calculations we have attempted to solve the second predator-prey model

x' = -.5x + .01xy and y' = .005(100 - y)y - .01xy.

The steady state solutions are given by

x' = -.5x + .01xy = 0 and y' = .005(100 - y)y - .01xy = 0.

The nonzero steady state solutions are x = 25 and y = 50. Our numerical solution spirals in towards the steady state solution.


m-file yprf.m:

function yprf = yprf(t,y)
yprf(1) = -.5*y(1) + .01*y(1)*y(2);
yprf(2) = .005*(100 - y(2))*y(2) - .01*y(1)*y(2);
yprf = [yprf(1) yprf(2)]';

m-file rf.m:

%your name, your student number, lesson number
clear;
to = 0; tf = 50;
yo = [80 100];
[t y] = ode45('yprf',[to tf],yo);
plot(y(:,1),y(:,2))
title('your name, your student number, lesson number')
ylabel('rabbits')
xlabel('fox')
%plot(t,y(:,1),t,y(:,2))
%xlabel('time')
%ylabel('rabbits and fox')

In the yprf.m file the x(t), fox, is associated with the symbol y(1), and y(t), rabbits, is associated with the symbol y(2). The right sides of the differential equations are given by yprf(1) and yprf(2). In rf.m the yo = [80 100] is a 2x1 array of initial populations where x(0) = 80 and y(0) = 100. The output [t y] from the rf.m file has three column vectors where the first column is for all the time values and the second and third columns correspond to the values of x(t) and y(t). These values can be viewed by typing [t y] at the Matlab prompt. Also the following graph of x(t) versus t and y(t) versus t was generated by typing plot(t, y(:,1),t,y(:,2)) at the Matlab prompt.
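The same pattern extends directly to the three-equation fox, rabbit and turkey model. The m-file below is a sketch; the file name yprft.m and the coefficient values are illustrative, not taken from the notes.

function yprft = yprft(t,y)
% y(1) = fox, y(2) = rabbit, y(3) = turkey; illustrative coefficients.
d = .5;  e = .01;  eT = .005;
k = .005;  M = 100;  c = .01;
kT = .01;  MT = 50;  cT = .02;
yprft(1) = (-d + e*y(2) + eT*y(3))*y(1);     % fox equation
yprft(2) = (k*(M - y(2)) - c*y(1))*y(2);     % rabbit equation
yprft(3) = (kT*(MT - y(3)) - cT*y(1))*y(3);  % turkey equation
yprft = yprft(:);

It can be driven exactly like rf.m, for example [t y] = ode45('yprft',[0 50],[80 100 50]); plot(t,y).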


FUNCTIONAL AND NUMERICAL RESPONSE OF PREDATION

Predation is the consumption of one living organism by another, a relationship in which one organism benefits at the other’s expense. Interactions between predator and prey have been described by the mathematical models of LOTKA and VOLTERRA, modified by others. Relationships between predator and prey populations result in two distinct responses.

According to Holling , predation rates increased with increasing prey population density. This resulted from 2 effects:

• each predator increased its consumption rate when exposed to a higher prey density, and
• predator density increased with increasing prey density.

Holling considered these effects as 2 kinds of responses of predator population to prey density:

• The Functional response
• The Numerical response

Modeling Functional Response

A model of functional response is often called "disc equation" because Holling used paper discs to simulate the area examined by predators.

This model illustrates the principal of time budget in behavioral ecology. It assumes that a predator spends its time on 2 kinds of activities:

• Searching for prey
• Prey handling, which includes: chasing, killing, eating and digesting.

Consumption rate of a predator is limited in this model because even if prey are so abundant that no time is needed for search, a predator still needs to spend time on prey handling.

Total time equals the sum of the time spent searching and the time spent handling prey:

T = Tsearch + Thandling

Assume that a predator captured Ha prey during time T. Handling time should be proportional to the number of prey captured:

Thandling = Th · Ha

where Th is the time spent handling one prey.


Capturing prey is assumed to be a random process. A predator examines an area a per unit time (only search time is considered here) and captures all prey found there. The parameter a is often called the "area of discovery"; however, it can be called the "search rate" as well.

After spending time Tsearch searching, a predator has examined an area a·Tsearch and captured a·H·Tsearch prey, where H is the prey density per unit area:

Ha = a · H · Tsearch

Hence:

Tsearch = Ha / (a · H)

Now we can balance the time budget:

T = Tsearch + Thandling = Ha / (a · H) + Th · Ha

The last step is to solve this equation for the number of attacked prey Ha:

Ha = a · H · T / (1 + a · H · Th)

The graph of functional response that corresponds to this equation is shown below:


This function indicates the number of prey killed by 1 predator at various prey densities. This is a typical shape of functional response of many predator species. At low prey densities, predators spend most of their time on search, whereas at high prey densities, predators spend most of their time on prey handling.
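A short sketch that plots this Type II (disc equation) functional response, Ha = aHT/(1 + aHTh), is given below; the parameter values are arbitrary illustrations, not the experimental estimates discussed later.

% Type II functional response for illustrative parameters.
a = 0.3;  Th = 0.06;  T = 2;        % search rate, handling time, total time
H = 0:1:100;                         % prey density
Ha = a*H*T./(1 + a*H*Th);
plot(H,Ha), xlabel('prey density H'), ylabel('prey attacked H_a')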


Holling (1959) considered 3 major types of functional response:

Type I functional response is found in passive predators like spiders. The number of flies caught in the net is proportional to fly density. Prey mortality due to predation is constant.

Type II functional response is most typical and corresponds to the equation above. Search rate is constant. Plateau represents predator saturation. Prey mortality declines with prey density. Predators of this type cause maximum mortality at low prey density. For example, small mammals destroy most of gypsy moth pupae in sparse populations of gypsy moth. However in high-density defoliating populations, small mammals kill a negligible proportion of pupae.

Type III functional response occurs in predators which increase their search activity with increasing prey density. For example, many predators respond to kairomones (chemicals emitted by prey) and increase their activity. Polyphagous vertebrate predators (e.g., birds) can switch to the most abundant prey species by learning to recognize it visually. Mortality first increases with increasing prey density, and then declines.

Estimation of Parameters of Functional Response

Experiments should be done as follows: predators are kept individually in large cages. Large cages are important because the search abilities of predators should not be artificially limited. Different numbers of prey are released in these cages. Each prey density should be replicated to get sufficient accuracy. More experiments should be done at low prey density than at high prey density, because the error of mortality estimates depends on the total number of prey. Experiments are usually set for a fixed time interval. At the end of the experiment, surviving prey are counted in each cage.


This is an example of experimental data:

Cage area was 10 sq.m., and duration of experiment was T=2 days. Holling's equation can be transformed to a linear form:

The linear regression has the following coefficients:

y = 3.43 x + 0.0612

Th = 0.0612 T = 0.1224 days = 2.9 hours

a = 1/3.43 = 0.29 cages = 2.9 sq.m.


Another possible method of parameter estimation is non-linear regression. It may give better results at high prey density than the linear regression method.

a=KH or a0 KH

Numerical Response

Numerical response means that predators become more abundant as prey density increases. However, the term "numerical response" is rather confusing because it may result from two different mechanisms:

• Increased rate of predator reproduction when prey are abundant (numerical response per se)
• Attraction of predators to prey aggregations ("aggregational response")

Reproduction rate of predators naturally depends on their predation rate. The more prey consumed, the more energy the predator can allocate for reproduction. Mortality rate also reduces with increased prey consumption.

The simplest model of a predator's numerical response is based on the assumption that the reproduction rate of predators is proportional to the number of prey consumed. This amounts to a conversion of prey into new predators: for example, for every 10 prey consumed, a new predator is born.
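A minimal sketch of such a conversion model follows; the search rate, handling time, conversion efficiency and predator mortality are all illustrative assumptions, not values from the text.

# Sketch: numerical response in which predator reproduction is
# proportional to the number of prey consumed (made-up parameters).
a, Th = 0.3, 0.1      # search rate and handling time (Type II response)
conversion = 0.1      # new predators produced per prey consumed
mortality = 0.2       # per-step predator mortality

def step(H, P, T=1.0):
    """One time step: prey attacked per predator, then predator update."""
    Ha = a * H * T / (1.0 + a * Th * H)   # Holling Type II, per predator
    eaten = Ha * P
    H_next = max(H - eaten, 0.0)
    P_next = P * (1.0 - mortality) + conversion * eaten
    return H_next, P_next

H, P = 100.0, 5.0
for t in range(5):
    H, P = step(H, P)
    print(f"t={t+1}: prey={H:.1f}, predators={P:.2f}")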

Aggregation of predators in areas of high prey density is often called an "aggregational response". This term is better than "numerical response" because it is not ambiguous. Aggregational response has been shown to be very important for several predator-prey systems. Predators selected for biological control of insect pests should have a strong aggregational response; otherwise they would not be able to suppress prey populations. Aggregational response also increases the stability of a spatially distributed predator-prey (or host-parasite) system.


Neural Network

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurones) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurones. This is true of ANNs as well.

Neural network simulations appear to be a recent development. However, this field was established before the advent of computers, and has survived at least one major setback and several eras. Many important advances have been boosted by the use of inexpensive computer emulations. Following an initial period of enthusiasm, the field survived a period of frustration and disrepute. During this period, when funding and professional support were minimal, important advances were made by relatively few researchers. These pioneers were able to develop convincing technology which surpassed the limitations identified by Minsky and Papert, whose 1969 book summed up a general feeling of frustration with neural networks among researchers and was accepted by most without further analysis. Currently, the neural network field enjoys a resurgence of interest and a corresponding increase in funding.

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyse. This expert can then be used to provide projections given new situations of interest and answer "what if" questions.

Other advantages include:

• Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.
• Self-organisation: an ANN can create its own organisation or representation of the information it receives during learning time.
• Real-time operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.


• Fault tolerance via redundant information coding: partial destruction of a network leads to a corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

Architecture of neural networks

Feed-forward networks

Feed-forward ANNs (figure 1) allow signals to travel one way only, from input to output. There is no feedback (loops); i.e., the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition. This type of organisation is also referred to as bottom-up or top-down.

Figure 1.

Feedback networks

Feedback networks (figure 1) can have signals travelling in both directions by introducing loops in the network. Feedback networks are very powerful and can get extremely complicated. Feedback networks are dynamic; their 'state' is changing continuously until they reach an equilibrium point. They remain at the equilibrium point until the input changes and a new equilibrium needs to be found. Feedback architectures are also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organisations.


Figure 2: An example of a simple feedforward network

Figure 3: An example of a complicated network

Network layers

The commonest type of artificial neural network consists of three groups, or layers, of units: a layer of "input" units is connected to a layer of "hidden" units, which is connected to a layer of "output" units (see Figure 2).

• The activity of the input units represents the raw information that is fed into the network.

• The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units.

• The behaviour of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
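The three-layer computation described above can be sketched in a few lines; the layer sizes, the sigmoid activation and the random weights are illustrative assumptions, not values from the text.

# Sketch of a three-layer feed-forward pass.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_input, n_hidden, n_output = 4, 3, 2
W_ih = rng.normal(size=(n_hidden, n_input))   # input-to-hidden weights
W_ho = rng.normal(size=(n_output, n_hidden))  # hidden-to-output weights

x = np.array([0.2, 0.7, 0.1, 0.9])            # activity of the input units
hidden = sigmoid(W_ih @ x)                    # activity of each hidden unit
output = sigmoid(W_ho @ hidden)               # behaviour of the output units
print(hidden, output)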

Artificial Intelligence has drawn a fair share of its inspiration from the field of neuroscience, the study of the nervous system, particularly the brain. How the brain enables human beings to think has remained a mystery until the present day, but significant advances in the field have enabled scientists to come closer to the nature of the thought processes inside a brain. A neural network is, in essence, an attempt to simulate the brain. Neural network theory revolves around the idea that certain key properties of biological neurons can be extracted and applied to simulations, thus creating a simulated (and very much simplified) brain. The first important thing to understand, then, is that the components of an artificial neural network are an attempt to recreate the computing potential of the brain. The second important thing to understand, however, is that no one has ever claimed to simulate anything as complex as an actual brain. Whereas the human brain is estimated to have something on the order of ten to a hundred billion neurons, a typical artificial neural network (ANN) is not likely to have more than 1,000 artificial neurons.

This simple type of network is interesting because the hidden units are free to construct their own representations of the input. The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.

We also distinguish single-layer and multi-layer architectures. The single-layer organisation, in which all units are connected to one another, constitutes the most general case and is of more potential computational power than hierarchically structured multi-layer organisations. In multi-layer networks, units are often numbered by layer, instead of following a global numbering.

Perceptrons

The most influential work on neural nets in the 1960s went under the heading of 'perceptrons', a term coined by Frank Rosenblatt. The perceptron (Figure 4) turns out to be an MCP model (a neuron with weighted inputs) with some additional, fixed, pre-processing. Units labelled A1, A2, Aj, Ap are called association units and their task is to extract specific, localised features from the input images. Perceptrons mimic the basic idea behind the mammalian visual system. They were mainly used in pattern recognition, even though their capabilities extended a lot further.

Figure 4
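As an illustration of how such a unit can be trained, the following sketch applies Rosenblatt's weight-update rule to learn the logical AND function; the learning rate, epoch count and threshold activation are illustrative choices.

# Sketch of perceptron learning on the AND function.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)      # AND targets

w = np.zeros(2)
b = 0.0
lr = 0.1

for epoch in range(20):
    for xi, target in zip(X, y):
        output = 1.0 if (w @ xi + b) > 0 else 0.0
        error = target - output
        w += lr * error * xi                 # Rosenblatt update rule
        b += lr * error

print("weights:", w, "bias:", b)
print("predictions:", [(1.0 if (w @ xi + b) > 0 else 0.0) for xi in X])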


In 1969 Minsky and Papert wrote a book in which they described the limitations of single-layer perceptrons. The impact that the book had was tremendous and caused many neural network researchers to lose interest. The book was very well written and showed mathematically that single-layer perceptrons could not do some basic pattern recognition operations, such as determining the parity of a shape or determining whether a shape is connected or not. What they did not realise, until the 1980s, is that given the appropriate training, multi-layer perceptrons can do these operations.


Cellular automata

A Cellular Automaton (CA) is an infinite, regular lattice of simple finite state machines that change their states synchronously, according to a local update rule that specifies the new state of each cell based on the old states of its neighbours. Imagine a rectangular grid of light bulbs, such as those you can see displaying scrolling messages in shops and airports. Each light bulb can be either on or off. Suppose that the state of a light bulb depended only on the state of the other light bulbs immediately around it, according to some simple rules. Such an array of bulbs would be a cellular automaton (CA)

A CA has the following features

• It consists of a number of identical cells (often several thousand or even millions) arranged in a regular grid. The cells can be placed in a long line (a one-dimensional CA), in a rectangular array or even occasionally in a three-dimensional cube. In social simulations, cells may represent individuals or collective actors such as countries.

• Each cell can be in one of a few states – for example, ‘on’ or ‘off’, or ‘alive’ or ‘dead’. We shall encounter examples in which the states represent attitudes (such as supporting one of several political parties), individual characteristics (such as racial origin) or actions (such as cooperating or not cooperating with others)

• Time advances through the simulation in steps. At each time step, the state of each cell may change.

• The state of a cell after any time step is determined by a set of rules which specify how that state depends on the previous state of that cell and the states of the cell’s immediate neighbours. The same rules are used to update the state of every cell in the grid. The model is therefore homogeneous with respect to the rules.

• Because the rules only make reference to the states of other cells in a cell’s neighbourhood, cellular automata are best used to model situations where the interactions are local. For example, if gossip spreads by word of mouth and individuals only talk to their immediate neighbours, the interaction is local and can be modelled with a CA

To summarize, cellular automata model a world in which space is represented as a uniform grid, time advances by steps, and the ‘laws’ of the world are represented by a uniform set of rules which compute each cell’s state from its own previous state and those of its close neighbours. Cellular automata have been used as models in many areas of physical science, biology and mathematics, as well as social science. As we shall see, they are good at investigating the outcomes at the macro scale of millions of simple micro-scale events. One of the simplest examples of cellular automata, and certainly the best-known, is Conway’s Game of Life (Berlekamp et al. 1982).

The Game of Life

In the Game of Life, a cell can only survive if there are either two or three other living cells in its immediate neighbourhood, that is, among the eight cells surrounding it. Without these companions, it dies, either from overcrowding if it has too many living neighbours, or from loneliness if it has too few. A dead cell will burst into life provided that there are exactly three living neighbours. Thus, for the Game of Life, there are just two rules:

1. A living cell remains alive if it has two or three living neighbours, otherwise it dies.

2. A dead cell remains dead unless it has exactly three living neighbours, in which case it becomes alive.
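A minimal sketch of one time step under these two rules, assuming a NumPy grid with toroidal wrap-around (the grid size and random initial pattern are illustrative):

# Sketch of one Game of Life time step on a toroidal grid.
# 1 = alive, 0 = dead; neighbour counts use the Moore neighbourhood.
import numpy as np

def life_step(grid):
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    survive = (grid == 1) & ((neighbours == 2) | (neighbours == 3))  # rule 1
    born = (grid == 0) & (neighbours == 3)                           # rule 2
    return (survive | born).astype(int)

rng = np.random.default_rng(1)
grid = rng.integers(0, 2, size=(10, 10))
for _ in range(12):
    grid = life_step(grid)
print(grid)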

Figure 1: An example of the evolution of a pattern using the rules of the Game of Life

Surprisingly, with just these two rules, many ever-changing patterns of live and dead cells can be generated. Figure 1 shows the evolution of a small pattern of cells over 12 time steps. To form an impression of how the Game of Life works in practice, let us follow the rules by hand for the first step, shown enlarged in Figure 2.

The black cells are ‘alive’ and the white ones are ‘dead’ (see Figure 2). The cell at b3 has three live neighbours, so it continues to live in the next time step. The same is true of cells b4, b6 and b7. Cell c3 has four live neighbours (b3, b4, c4 and d4), so it dies from overcrowding. So do c4, c6 and c7. Cells d4 and d6 each have three neighbours and survive. Cells e2 and e8 die because they only have one living neighbour each, but e4 and e6, with two living neighbours, continue. Cell f1, although dead at present, has three live neighbours at e2, f2 and g2, and it starts to live. Cell f2 survives with three living neighbours, and so do g2 (two neighbours alive) and g3 (three neighbours alive). Gathering all this together gives us the second pattern in the sequence shown in Figure 1.

Figure 2: The initial arrangement of cells

It is clear that simulating a CA is a job for a computer. Carrying out the process by hand is very tedious and one is very likely to make mistakes (although Schelling, whose work on segregation in the 1970s is discussed below, did everything with pencil and paper, while Conway is reputed to have worked out his Game of Life using dinner plates on a tiled kitchen floor!). The eighth pattern in Figure 1 is the same as the first, but inverted. If the sequence is continued, the fifteenth pattern will be seen to be the same as the first pattern, and thereafter the sequence repeats every 14 steps. There are a large number of patterns with repeating and other interesting properties, and much effort has been spent on identifying these. For example, there are patterns that regularly ‘shoot’ off groups of live cells, which then march off across the grid (Berlekamp et al. 1982).

Other cellular automata models

The Game of Life is only one of a family of cellular automata models. All are based on the idea of cells located in a grid, but they vary in the rules used to update the cells’ states and in their definition of which cells are neighbours. The Game of Life uses the eight cells surrounding a cell as the neighbourhood that influences its state. These eight cells, the ones to the north, north-east, east, south-east, south, south-west, west and north-west, are known as its Moore neighbourhood, after an early CA pioneer.


Figure 3: Cell neighbourhoods

The parity model

A model of some significance for modelling physical systems is the parity model. This model uses just four cells, those to the north, east, south and west, as the neighbourhood (the von Neumann neighbourhood, shown in Figure 3). The parity model has just one rule for updating a cell’s state: the cell becomes ‘alive’ or ‘dead’ depending on whether the sum of the number of live cells, counting itself and the cells in its von Neumann neighbourhood, is odd or even. Figure 4 shows the effect of running this model for 124 steps from a starting configuration of a single filled square block of five by five live cells. As the simulation continues, the pattern expands. After a few more steps it returns to a simple arrangement of five blocks, the original one plus four copies, one at each corner of the starting block. After further steps, a richly textured pattern is created once again, until after many more steps, it reverts to blocks, this time consisting of 25 copies of the original. The regularity of these patterns is due to the properties of the parity rule. For example, the rule is ‘linear’: if two starting patterns are run in separate grids for a number of time steps and the resulting patterns are superimposed, this is the same pattern one finds if the starting patterns are run together on the same grid.

Figure 4: The pattern produced by applying the parity rule to a square block of live cells after 124 time steps

As simulated time goes on, the parity pattern enlarges. Eventually it will reach the edge of the grid. We then have to decide what to do with cells that are on the edge. Which cell is the west neighbour of a cell at the left-hand edge of the grid? Rather than devise special rules for this situation, the usual choice is to treat the far right row of cells as the west neighbour of the far left row and vice versa, and the top row of cells as the south neighbours of the bottom row. Geometrically, this is equivalent to treating the grid as a two-dimensional projection of a torus (a doughnut-shaped surface). The grid now no longer has any edges which need to be treated specially, just as a doughnut has no edges.
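A sketch of the parity rule with the von Neumann neighbourhood and the toroidal wrap-around just described; the grid size and the five-by-five starting block follow the text, while everything else is an implementation choice.

# Sketch of the parity (XOR) rule: a cell's next state is the parity of
# the sum of its own state and its four von Neumann neighbours.
import numpy as np

def parity_step(grid):
    total = (grid
             + np.roll(grid, 1, axis=0) + np.roll(grid, -1, axis=0)   # north, south
             + np.roll(grid, 1, axis=1) + np.roll(grid, -1, axis=1))  # west, east
    return total % 2

grid = np.zeros((64, 64), dtype=int)
grid[30:35, 30:35] = 1          # a five-by-five block of live cells
for _ in range(124):
    grid = parity_step(grid)
print(grid.sum(), "live cells after 124 steps")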

One-dimensional models


The grids we have used so far have been two-dimensional. It is also possible to have grids with one or three dimensions. In one-dimensional models, the cells are arranged along a line (which has its left-hand end joined in a circle to its right-hand end in order to avoid edge effects). A cell and its two neighbours, one to the left and one to the right, can be in 2^3 = 8 different combinations of alive and dead states, so there are 2^8 = 256 possible update rules for such a one-dimensional CA. Wolfram (1986) devised a classification and numbering scheme for the rules of one-dimensional automata.

Figure 5: The pattern produced after 120 steps by rule 22 starting from a single live cell at the top centre

For example, Figure 5 shows the patterns that emerge from a single seed cell in the middle of the line, using a rule that Wolfram classifies as rule 22. Under this rule, a cell becomes alive if and only if exactly one of the three cells, the cell itself and its left and right neighbours, was alive at the previous step; in every other situation the cell becomes (or stays) dead. Figure 5 shows the changing pattern of live cells after successive time steps, starting at time 0 at the top and moving down to step 120 at the bottom. Further time steps yield a steadily expanding but regular pattern of triangles as the influence of the initial live cell spreads to its left and right.
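A sketch of such a one-dimensional CA, using the standard Wolfram rule-number encoding; the grid width, toroidal wrap-around and the number of printed rows are illustrative choices.

# Sketch: evolve an elementary (two-state, nearest-neighbour) CA.
# The rule number's binary digits give the new state for each of the
# eight (left, cell, right) configurations, Wolfram-style.
import numpy as np

def ca_step(row, rule=22):
    table = [(rule >> i) & 1 for i in range(8)]        # rule bits 0..7
    left = np.roll(row, 1)
    right = np.roll(row, -1)
    index = 4 * left + 2 * row + right                 # configuration 0..7
    return np.array([table[i] for i in index])

width, steps = 121, 120
row = np.zeros(width, dtype=int)
row[width // 2] = 1                                    # single live seed cell
history = [row]
for _ in range(steps):
    row = ca_step(row, rule=22)
    history.append(row)

# Print the first rows of the pattern: '#' for live cells, '.' for dead ones.
for r in history[:20]:
    print("".join("#" if c else "." for c in r))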


Module 3

Knowledge based systems: rule-based expert systems, Components of rule based system: working memory, rule base, inference engine, conflict resolution, clausal form of logic, Semantic networks: Search in semantic networks, Inheritance, criticisms of semantic network, Frame systems: Properties of frames, Identifying situations and objects.


Knowledge based systems

The human brain can store many thousand-fold of the world's knowledge; still, it is said that the human brain is not fully utilized. Advances in human knowledge are tied directly to the ability to analyse data to form information, process it into knowledge and communicate it to others. The human brain has approximately 10^11 nerve cells called biological neurons. It is probably the most complex and least understood part of the human body. It is continuously thinking in declarative and procedural ways for problem solving, but even today how the human mind works remains a mystery. This new millennium brings an opportunity to attack such questions with the help of new knowledge, new tools and new resources. The development of systems that make use of knowledge, wisdom and intelligence is a step towards meeting this challenge. The ability of intelligent systems to capture and redistribute expertise has significant implications for the development of a nation, community or population. Such systems allow the documentation of the knowledge of one or more experts and utilize that knowledge for problem solving in a cost-effective way. They allow, in a controlled manner, the import of expertise in various areas that the nation lacks, the export of knowledge relating to domestic areas of expertise, and the duplication and redistribution of scarce knowledge in a cost-effective manner (Darek and Jain 1991). Thus, areas of expertise that the selected domain/region/nation is deficient in or possesses exclusively are potential candidates for knowledge-based systems.

A Knowledge-Based System (KBS), which is a step towards an intelligent system, is particularly justified when a few individuals hold the majority of the knowledge.

KBS Structure

Knowledge-Based Systems (KBS) are one of the major family members of the AI group. With the availability of advanced computing facilities and other resources, attention is now turning to more and more demanding tasks which might require intelligence. Society and industry are becoming knowledge oriented and rely on different experts' decision-making ability. A KBS can act as an expert on demand, without wasting time, anytime and anywhere. A KBS can save money by leveraging experts, allowing users to function at a higher level and promoting consistency. One may consider the KBS a productive tool, retaining the knowledge of more than one expert over a long period of time. In fact, a KBS is a computer-based system which uses and generates knowledge from data, information and knowledge. Such systems are capable of understanding the information under process and can take decisions based on the information/knowledge residing in the system, whereas traditional computer systems do not know or understand the data/information they process.

A KBS consists of a Knowledge Base and a search program called the Inference Engine (IE). The IE is a software program which infers the knowledge available in the knowledge base. The knowledge base can be used as a repository of knowledge in various forms. It may include an empty workspace to store temporary results and information/knowledge pieces/chunks. As an expert's power lies in his explanation and reasoning capabilities, the expert system's credibility also depends on the explanation and reasoning of the decisions made or suggested by the system. Also, human beings have an ability to learn new things and forget unused knowledge. Simulation of such learning is an essential component of a KBS, and the life of a KBS may vary according to the degree of such simulation. A KBS may be either manually updated (manual update) or automatically updated by machine (machine learning). Ideally, the basic frame of a KBS rarely needs to be modified. In addition to all these, there should be an appropriate User Interface, which may have a Natural Language Processing facility. These components are shown in Figure 1.

Figure 1: Architecture of a Knowledge-Based System

KBS ADVANTAGES AND LIMITATIONS

Knowledge-based systems are more useful in many situations than traditional computer-based information systems. Some major situations include:

When an expert is not available.

When expertise is to be stored for future use, or when expertise is to be cloned or multiplied.

When intelligent assistance and/or training are required for decision making or problem solving.

When the knowledge of more than one expert has to be grouped on one platform.


With the proper utilization of knowledge, knowledge-based systems increase productivity, document rare knowledge by capturing scarce expertise, and enhance problem-solving capabilities in a most flexible way. Such systems also document knowledge for future use and training. This leads to increased quality in the problem-solving process. However, the scarcity and nature of knowledge make the KBS development process difficult and complex. The transparent and abstract nature of knowledge is mainly responsible for this. In addition, this field needs more guidelines to accelerate the development process. The following are some of the major limitations of KBS:

Acquisition, representation and manipulation of large volumes of data/information/knowledge.

The high-tech image of the AI field.

The abstract nature of knowledge.

Limitations of cognitive science and other scientific methods.

Knowledge-based systems offer several advantages over humans (natural intelligent systems). Some of the major advantages can be listed as follows:

Knowledge-based systems provide efficient documentation of important knowledge in a secure and reliable way.

Knowledge-based systems solve unstructured, large and complex problems in a quick and intelligent fashion and provide justification for the decisions suggested.

Knowledge-based systems offer the knowledge of more than one expert in an integrated fashion.

Knowledge-based systems are able to infer (create) new knowledge and learn from cases or data instead of just referring to the stored content.

KBS DEVELOPMENT

Figure 2 presents an overview of the KBS development process. The knowledge of the expert(s) is stored in his mind in a very abstract way. Also, not every expert may be familiar with knowledge-based systems terminology and the way to develop an intelligent system. The Knowledge Engineer (KE) is the person responsible for acquiring, transferring and representing the experts' knowledge in the form of a computer system. People, experts, teachers, students and testers are the main user groups of knowledge-based systems.


Figure 2: Development of a Knowledge-Based System

The knowledge acquisition process incorporates typical fact-finding methods like interviews, questionnaires, record reviews and observation to acquire factual and explicit knowledge. However, these methods are not very effective for extracting tacit knowledge, which is stored in the subconscious mind of experts and reflected in their mental models, insights, values and actions. For this, techniques like concept sorting, concept mapping and protocol analysis are used. The acquired knowledge should be immediately documented in a knowledge representation scheme. At this initial stage, the selected knowledge representation strategy might not be permanent. However, the documented knowledge will lead the knowledge engineer/developer to a better understanding of the system and provide guidelines to proceed further. Rules, frames, scripts and semantic networks are typical examples of knowledge representation schemes. It is the responsibility of the knowledge engineer to select an appropriate knowledge representation scheme that is natural, efficient, transparent, and developer friendly. One may also consider hybrid knowledge representation strategies, such as rules within frames in slots like "on need" and "on request", or a semantic network of default frames. More about knowledge acquisition, knowledge representation and knowledge-based system development models is available in the book Knowledge-Based Systems (Akerkar and Sajja 2009).

KBS PURE APPLICATIONS

Knowledge-based systems applications are divided into two broad categories, namely: (i) pure knowledge-based systems applications and (ii) applied knowledge-based systems applications. Pure applications include research contributing to knowledge-based systems and AI development techniques, such as knowledge acquisition, knowledge representation, models of automated knowledge-based systems development (knowledge engineering approaches, models and CASE tools for KBS), and knowledge discovery and knowledge management types of tools. Table 1 presents a few possible research areas in pure KBS development.

Table 1: Research Areas in Pure KBS development


Rule-Based Expert Systems

WHAT ARE RULE-BASED SYSTEMS?

Conventional problem-solving computer programs make use of well-structured algorithms, data structures, and crisp reasoning strategies to find solutions. For the difficult problems with which expert systems are concerned, it may be more useful to employ heuristics: strategies that often lead to the correct solution, but that also sometimes fail. Conventional rule-based expert systems use human expert knowledge to solve real-world problems that normally would require human intelligence. Expert knowledge is often represented in the form of rules or as data within the computer. Depending upon the problem requirement, these rules and data can be recalled to solve problems. Rule-based expert systems have played an important role in modern intelligent systems and their applications in strategic goal setting, planning, design, scheduling, fault monitoring, diagnosis and so on.

Figure 3: Architecture of a basic Rule-Based System

With the technological advances made in the last decade, today's users can choose from dozens of commercial software packages having friendly graphic user interfaces (Ignizio, 1991). Conventional computer programs perform tasks using a decision-making logic containing very little knowledge other than the basic algorithm for solving that specific problem. The basic knowledge is often embedded as part of the programming code, so that as the knowledge changes, the program has to be rebuilt. Knowledge-based expert systems collect the small fragments of human know-how into a knowledge base, which is used to reason through a problem, using the knowledge that is appropriate. An important advantage here is that, within the domain of the knowledge base, a different problem can be solved using the same program without reprogramming effort. Moreover, expert systems can explain the reasoning process and handle levels of confidence and uncertainty, which conventional algorithms do not handle (Giarratano and Riley, 1989). Some of the important advantages of expert systems are as follows:

Ability to capture and preserve irreplaceable human experience;

Ability to develop a system more consistent than human experts;

Ability to minimize the human expertise needed at a number of locations at the same time (especially in a hostile environment that is dangerous to human health);

Solutions can be developed faster than by human experts.

The basic components of an expert system are illustrated in Figure 4. The knowledge base stores all relevant information, data, rules, cases, and relationships used by the expert system. A knowledge base can combine the knowledge of multiple human experts. A rule is a conditional statement that links given conditions to actions or outcomes. A frame is another approach used to capture and store knowledge in a knowledge base; it relates an object or item to various facts or values. A frame-based representation is ideally suited for object-oriented programming techniques, and expert systems making use of frames to store knowledge are also called frame-based expert systems.

The purpose of the inference engine is to seek information and relationships from the knowledge base and to provide answers, predictions, and suggestions in the way a human expert would. The inference engine must find the right facts, interpretations, and rules and assemble them correctly. Two types of inference method are commonly used: backward chaining is the process of starting with conclusions and working backward to the supporting facts, while forward chaining starts with the facts and works forward to the conclusions.

Figure 4: Architecture of a simple expert system

The explanation facility allows a user to understand how the expert system arrived at certain results. The overall purpose of the knowledge acquisition facility is to provide a convenient and efficient means for capturing and storing all components of the knowledge base. Very often, specialized user interface software is used for designing, updating, and using expert systems. The purpose of the user interface is to ease the use of the expert system for developers, users, and administrators.

Structure of a Typical Expert System

Figure 5 shows the structure of a typical expert system and its component parts. Running on an interactive computer, typically a personal computer (PC) or a workstation, is a piece of software called an inference engine. Associated with the inference engine is a knowledge base, containing rules of inference and related factual information about a particular domain. Along with the inference engine, there is a need for a good interface for interaction with the expert who creates knowledge bases and with the naive end-user of the ES. This interface should offer good support for man-machine interaction, making it easy to put in rules and to edit them. The whole of the software that enables you to build and use ESs is called an expert system shell. Shells are usually quite generic: you can easily develop diverse ESs using a shell as a base. The domain-specific knowledge is in the knowledge base, which is usually created by a knowledge engineer (KE) collaborating with a domain expert.


Figure 5: Structure of a Typical Expert System

RULE BASED SYSTEM

Instead of representing knowledge in a relatively declarative, static way (as a collection of things that are true), rule-based systems represent knowledge in terms of a collection of rules that tell you what you should do or what you could conclude in different situations. A rule-based system consists of a set of IF-THEN rules, a set of facts, and some interpreter controlling the application of the rules, given the facts. There are two broad kinds of rule system: forward chaining systems and backward chaining systems. In a forward chaining system you start with the initial facts, and keep using the rules to draw new conclusions (or take certain actions) given those facts. In a backward chaining system you start with some hypothesis (or goal) you are trying to prove, and keep looking for rules that would allow you to conclude that hypothesis, perhaps setting new subgoals to prove as you go. Forward chaining systems are primarily data-driven, while backward chaining systems are goal-driven.

Components of a Rule Based System

A typical rule based system consists of three components (Figure 6). They are:

the working memory

the rule base

the inference engine


The rule base and the working memory are the data structures which the system uses, and the inference engine is the basic program which is used. The advantage of this framework is that there is a clear separation between the data (the knowledge about the domain) and the control (how the knowledge is to be used).

Figure 6: Components of Rule Based Systems

Working Memory

The working memory (WM) represents the set of facts known about the domain. The elements reflect the current state of the world. In an expert system, the WM typically contains information about the particular instance of the problem being addressed. For example, in a medical expert system, the WM could contain the details of a particular patient being diagnosed. The working memory is the storage medium in a rule-based system and helps the system focus its problem solving. It is also the means by which rules communicate with one another. The actual data represented in the working memory depends on the type of application. The initial working memory, for instance, can contain a priori information known to the system. The inference engine uses this information in conjunction with the rules in the rule base to derive additional information about the problem being solved.

Working memory elements take the form:

color(car5, black) — to represent the fact that the colour of car car5 is black

father(mohan, vivek) — to represent the fact that Mohan is the father of Vivek

This is only one way of representing information in the WM. There could be other ways depending on the application.
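One simple way to hold such facts in a program is as a set of predicate/argument tuples; the sketch below is an illustrative representation, not one prescribed by the text.

# Sketch: working memory as a set of (predicate, arguments...) tuples.
working_memory = {
    ("color", "car5", "black"),     # the colour of car car5 is black
    ("father", "mohan", "vivek"),   # Mohan is the father of Vivek
}

# Adding a fact simply adds a tuple; a rule's antecedent is a membership test.
working_memory.add(("color", "car7", "red"))
print(("father", "mohan", "vivek") in working_memory)   # True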


Rule Base

The rule base (also called the knowledge base) is the set of rules which represents the knowledge about the domain. The general form of a rule is:

If cond1
and cond2
and cond3
...
then action1, action2, ...

The conditions cond1, cond2, cond3, etc. (also known as antecedents) are evaluated based on what is currently known about the problem being solved (i.e., the contents of the working memory). Some systems would allow disjunctions in the antecedents. For example, rules like the following would be allowed:

If cond1
and cond2
or cond3
then action1, action2, ...

Such rules are interpreted to mean that if the antecedents of the rule together evaluate to true (i.e., if the boolean combination of the conditions is true), the actions in the consequents (i.e., action1, action2, etc.) can be executed. Each antecedent of a rule typically checks if the particular problem instance satisfies some condition. For example, an antecedent in a rule in a medical expert system could be: the patient has previously undergone heart surgery. The complexity of antecedents can vary a lot depending on the type of language used. For instance, in some languages, one could have antecedents such as: the person's age is between 10 and 30.

The consequents of a rule typically alter the WM to incorporate the information obtained by application of the rule. This could mean adding more elements to the WM, modifying an existing WM element or even deleting WM elements. They could also include actions such as reading input from a user, printing messages, accessing files, etc. When the consequents of a rule are executed, the rule is said to have been fired. Sometimes the knowledge which is expressed in the form of rules is not known with certainty. In such cases, typically, a degree of certainty is attached to the rules.


Consider rules with only one consequent and one or more antecedents which are combined with the operator and. We will use a representation of the form:

ruleid: If antecedent1 and antecedent2 ... then consequent

For instance, to represent the rule that all birds can_fly, we use:

f1: If bird(X) then can_fly(X)

This representation, though simple, is often sufficient. For instance, if you want to represent the knowledge that either a bird or a plane can fly, you can do this by using two rules f1 and f2 as follows:

f1: If bird(X) then can_fly(X)
f2: If plane(X) then can_fly(X)

Therefore the disjunction (ORing) of a set of antecedents can be achieved by having different rules with the same consequent. Similarly, if multiple consequents follow from the conjunction (ANDing) of a set of antecedents, this knowledge can be expressed in the form of a set of rules with one consequent each. Each rule in this set will have the same set of antecedents.

Inference Engine

Expert systems have become a popular method for representing large bodies of knowledge for a given field of expertise and solving problems by use of this knowledge. An expert system often consists of three parts, namely: a knowledge base, an inference engine, and a user interface. A dialogue is conducted by the user interface between the user and the system. The user provides information about the problem to be solved, and the system then attempts to provide insights derived (or inferred) from the knowledge base. These insights are provided by the inference engine after examining the knowledge base. This interaction is illustrated in Figure 7.


Figure 7: An Expert System

Knowledge bases consist of some encoding of the domain of expertise for the system. This can be in the form of semantic nets, procedural representations, production rules, or frames. We shall consider only production rules for our knowledge base. These rules occur in sequences and are expressions of the form

if <conditions> then <actions>

where, if the conditions are true, then the actions are executed. When rules are examined by the inference engine, actions are executed if the information supplied by the user satisfies the conditions in the rules. Two methods of inference are often used: forward and backward chaining. Forward chaining is a top-down method which takes facts as they become available and attempts to draw conclusions (from satisfied conditions in rules) which lead to actions being executed. Backward chaining is the reverse. It is a bottom-up procedure which starts with goals (or actions) and queries the user about information which may satisfy the conditions contained in the rules. It is a verification process rather than an exploration process. An example of backward chaining is MYCIN, and an example of forward chaining is Expert. A system which uses both is Prospector.

A rule-based system consists of if-then rules, a set of facts, and an interpreter controlling the application of the rules, given the facts. These if-then rule statements are used to formulate the conditional statements that comprise the complete knowledge base. A single if-then rule assumes the form 'if x is A then y is B'; the if-part of the rule ('x is A') is called the antecedent or premise, while the then-part of the rule ('y is B') is called the consequent or conclusion. There are two broad kinds of inference engine used in rule-based systems: forward chaining and backward chaining systems. In a forward chaining system, the initial facts are processed first, and the rules are used to draw new conclusions given those facts. In a backward chaining system, the hypothesis (or solution/goal) we are trying to reach is processed first, and the system keeps looking for rules that would allow that hypothesis to be concluded. As the processing progresses, new subgoals are also set for validation. Forward chaining systems are primarily data-driven, while backward chaining systems are goal-driven. Consider an example with the following set of if-then rules:

Rule 1: If A and C then Y
Rule 2: If A and X then Z
Rule 3: If B then X
Rule 4: If Z then D

Suppose the task is to prove that D is true, given that A and B are true. According to forward chaining, start with Rule 1 and go on downward until a rule that fires is found. Rule 3 is the only one that fires in the first iteration. After the first iteration, it can be concluded that A, B, and X are true. The second iteration uses this information: Rule 2 fires, adding the fact that Z is true, which in turn allows Rule 4 to fire, proving that D is true. The forward chaining strategy is especially appropriate in situations where data are expensive to collect but few in quantity. However, special care must be taken when these rules are constructed, with the preconditions specifying as precisely as possible when different rules should fire.
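A minimal forward-chaining interpreter for Rules 1-4 above; encoding each rule as a set of conditions plus a conclusion is an illustrative choice, not a representation prescribed by the text.

# Sketch: forward chaining over Rules 1-4, starting from facts A and B.
rules = [
    ({"A", "C"}, "Y"),   # Rule 1: If A and C then Y
    ({"A", "X"}, "Z"),   # Rule 2: If A and X then Z
    ({"B"}, "X"),        # Rule 3: If B then X
    ({"Z"}, "D"),        # Rule 4: If Z then D
]
facts = {"A", "B"}

changed = True
while changed:                      # repeat the recognise-act cycle
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)   # fire the rule, add its conclusion
            changed = True

print("D" in facts)                 # True: D has been proved
print(facts)                        # {'A', 'B', 'X', 'Z', 'D'}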

In the backward chaining method, processing starts with the desired goal and then attempts to find evidence for proving that goal. Returning to the same example, the task of proving that D is true would be initiated by first finding a rule that proves D. Rule 4 does so, and also provides a subgoal: to prove that Z is true. Now Rule 2 comes into play, and as it is already known that A is true, the new subgoal is to show that X is true. Rule 3 provides the next subgoal of proving that B is true. But that B is true is one of the given assertions. Therefore, it can be concluded that X is true, which implies that Z is true, which in turn implies that D is true. Backward chaining is useful in situations where the quantity of data is potentially very large and where some specific characteristic of the system under consideration is of interest. If there is some idea of what the conclusion might be, or there is some specific hypothesis to test, forward chaining systems may be inefficient. In principle, the same set of rules can be used for both forward and backward chaining. In the case of backward chaining, since the main concern is with matching the conclusion of a rule against some goal that is to be proved, the 'then' (consequent) part of the rule is usually not expressed as an action to take but merely as a state, which will be true if the antecedent part(s) are true (Donald, 1986).

The inference engine tries to derive new information about a given problem using the rules in the rule base and the situation-specific knowledge in the WM. At this stage, we need to understand the notion of an instantiation. Consider an example to illustrate the idea. Suppose the working memory contains the elements:

bird(crow)
bird(eagle)
aircraft(helicopter)

With these working memory elements, rule f1 (given in the previous section) can fire. However, the antecedent of f1 actually matches two working memory elements. The instantiated antecedents and the corresponding consequents are given below:

f1: bird(crow) → can_fly(crow)
f1: bird(eagle) → can_fly(eagle)
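The matching that produces these instantiations can be sketched as follows; representing rules and working-memory elements as tuples is an illustrative assumption.

# Sketch: computing instantiations of rule f1 against the working memory.
working_memory = [("bird", "crow"), ("bird", "eagle"), ("aircraft", "helicopter")]

# f1: If bird(X) then can_fly(X)  -- antecedent predicate, consequent predicate
rules = {"f1": ("bird", "can_fly")}

conflict_set = []
for rule_id, (antecedent, consequent) in rules.items():
    for predicate, x in working_memory:
        if predicate == antecedent:                 # antecedent matches this element
            conflict_set.append((rule_id, (consequent, x)))

print(conflict_set)
# [('f1', ('can_fly', 'crow')), ('f1', ('can_fly', 'eagle'))]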

Each of these matches (rule id and matching antecedent(s)) is called an instantiation of the rule f1. Given the contents of the working memory, the inference engine determines the set of rules which can be fired. These are the rules for which the antecedents are satisfied. The set of rules which can be fired is called the conflict set. Out of the rules in the conflict set, the inference engine selects one rule based on some predefined criteria. This process is called conflict resolution and is described in the next section. For example, a simple conflict resolution criterion could be to select the first rule in the conflict set. The creation of the conflict set, selection of a rule to fire and the firing of the rule are together called the recognise-act cycle of the inference engine. The action corresponding to a rule could be to add an element to the working memory, delete an element from the working memory or change an existing working memory element. It could also include some actions which do not affect the working memory, for instance printing out the value of a working memory element. Since firing a rule may modify the working memory, in the next cycle some instantiations present earlier may no longer be applicable; instead, some new rules may be applicable. The conflict set will therefore change and a different instantiation may be chosen in the next cycle. This process is repeated till, in a given cycle, the conflict set is empty or an explicit halt action is executed, at which point the inference engine terminates.

Forward and Backward Chaining

Inference engines in rule-based systems can use different strategies to derive the goal (i.e., new facts), depending on the types of applications they are aimed at. The most common strategies are:

Forward Chaining

Backward Chaining

Systems can use either one of these strategies or a combination of both. Some systems allow users to specify which strategy they need for their application. These strategies are briefly described below.

Forward Chaining Systems

In a forward chaining system the facts in the system are represented in a working memory which is continually updated. Rules in the system represent possible actions to take when specified conditions hold on items in the working memory; they are sometimes called condition-action rules. The conditions are usually patterns that must match items in the working memory, while the actions usually involve adding or deleting items from the working memory. The interpreter controls the application of the rules, given the working memory, thus controlling the system's activity. It is based on a cycle of activity sometimes known as a recognise-act cycle. The system first checks to find all the rules whose conditions hold, given the current state of working memory. It then selects one and performs the actions in the action part of the rule. (The selection of a rule to fire is based on fixed strategies, known as conflict resolution strategies.) The actions will result in a new working memory, and the cycle begins again. This cycle will be repeated until either no rules fire, or some specified goal state is satisfied.


Figure 8: Forward Chaining Procedure

Backward Chaining Systems

So far we have looked at how rule-based systems can be used to draw new conclusions from existing data, adding these conclusions to a working memory. This approach is most useful when you know all the initial facts, but don't have much idea what the conclusion might be. If you DO know what the conclusion might be, or have some specific hypothesis to test, forward chaining systems may be inefficient. You COULD keep on forward chaining until no more rules apply or you have added your hypothesis to the working memory, but in the process the system is likely to do a lot of irrelevant work, adding uninteresting conclusions to working memory. In other problems, a goal is specified and the AI must find a way to achieve that specified goal. For example, if there is an epidemic of a certain disease, this AI could presume a given individual had the disease and attempt to determine if its diagnosis is correct based on available information. A backward chaining, goal-driven system accomplishes this. To do this, the system looks for the action in the THEN clause of the rules that matches the specified goal. In other words, it looks for the rules that can produce this goal. If a rule is found and fired, it takes each of that rule's conditions as goals and continues until either the available data satisfies all of the goals or there are no more rules that match.
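A sketch of such a goal-driven search, reusing Rules 1-4 from the earlier example and recursing on each condition of a rule whose conclusion matches the current goal; the encoding is the same illustrative one used for the forward-chaining sketch.

# Sketch: backward chaining -- try to prove a goal from given facts,
# recursively proving each condition of a rule whose conclusion matches.
rules = [
    ({"A", "C"}, "Y"),   # Rule 1
    ({"A", "X"}, "Z"),   # Rule 2
    ({"B"}, "X"),        # Rule 3
    ({"Z"}, "D"),        # Rule 4
]
facts = {"A", "B"}

def prove(goal, seen=frozenset()):
    if goal in facts:
        return True
    if goal in seen:                      # avoid circular subgoals
        return False
    for conditions, conclusion in rules:
        if conclusion == goal and all(prove(c, seen | {goal}) for c in conditions):
            return True
    return False

print(prove("D"))   # True: D <- Z <- (A, X), X <- B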


Figure 9: Backward Chaining Procedure

Conflict Resolution

In a forward chaining expert system, information collected during a consultation is added to the working memory. Information added to the working memory can result in rules in the knowledge base becoming ready to fire (a rule that can fire is one where all the conditions attached to the rule are known to be true; if all the conditions are true, then the advice from the rule can be generated). In an expert system (especially a large one with a large number of rules), at any one moment in time there may be a series of rules that are ready to fire. The rules which could fire at any moment in time are known as the conflict set. A conflict resolution strategy is required to make the decision as to which rule should be fired first. Some of these strategies are listed below:

1. Rule Ordering (First Come, First Served) - Simple. The rule fired is the first rule in the conflict set. If the set

contains rules 2, 5, 7 and 9, then fire rule 2.

2. Recency - The rule fired is the one which uses the data added most recently to the working
memory (a common mistake is to take this as the rule added most recently to the
knowledge base, which is wrong).


E.g.:

Rule 1: IF A AND B THEN C

Rule 2: IF D AND E THEN F

Rule 4: IF G AND H THEN I

If the contents of working memory are B, A, H, G, E, D (added in that order, with D the most
recent addition), then rule 2 will fire, as D and E are the most recent additions.

3. Specificity - Fire the rule with the most conditions attached.

E.g.

Rule 1: IF A AND B THEN C

Rule 2: IF D AND E AND F THEN G

Rule 2 is fired because it has 3 conditions attached as opposed to the 2 conditions in rule 1.

4. Refractoriness - This prevents any rule that has already fired from firing
again, which could otherwise cause an infinite loop in the system. Once fired, a rule should be
permanently removed from the conflict set.

The strategy used for selecting one rule to fire from the conflict set is called the conflict

resolution strategy. The behaviour of the system is dependent on the conflict resolution

strategy or strategies used. There are a number of sophisticated conflict resolution

strategies which are used by commercial systems. These include:

Specificity

Recency

Refractoriness

Specificity

Using this strategy, rules with more antecedents (conditions) are preferred to rules with
fewer conditions. That is, more specific rules are selected in preference to general rules.


For example, consider the two rules:
p1: If bird(X) then can_fly(X)
which represents the knowledge that if X is a bird then X can fly.
p2: If bird(X) and ostrich(X) then not can_fly(X)

which represents the knowledge that if X is a bird and X is an ostrich then

X cannot fly. Suppose the working memory contains the facts:

bird(b1) b1 is a bird

ostrich(b1) b1 is an ostrich

Then by binding the variable X in the antecedent bird(X) of p1 to b1, we find that the

antecedent succeeds. Similarly, both the antecedents of rule p2 are true, if X is bound to

b1. Using specificity, rule p2 would be selected to fire, because its antecedents are a

superset of the antecedents for p1. As can be seen, one use of specificity is to put

knowledge about exceptions into the rule base. The general idea behind specificity is that

the more specific rules are tailored to specific problems and hence should be given

preference.

Recency
With this strategy, every element of the working memory is tagged with a number

indicating how recent the data is. When a rule has to be selected from the conflict set, the

rule with an instantiation which uses the most recent data is chosen. If the most recent

data item is matched by more than one instantiation, the time of the second most recent

data element for each of these rules is compared and so on. The idea here is that a rule

which uses more recent data is likely to be more relevant than one which uses older data.

This criterion provides a simple attention focussing mechanism.

Refractoriness

Refractoriness prevents the same rule from applying again and again. If an instantiation

has been applied in a cycle, it will not be allowed to fire again. Refractoriness is

important for two reasons. It prevents the system from going into a loop (i.e., repeated


firing of the same rule with the same instantiation). It also improves the efficiency of the

system by avoiding unnecessary matching.
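
As an illustration only (not a prescribed implementation), the three strategies can be combined into a single selection function. The encoding of an instantiation as (rule index, number of conditions, time tags of the matched data) and the final tie-break by textual order are assumptions made for this sketch.

# Hedged sketch: choose one instantiation from the conflict set using
# refractoriness, then recency, then specificity, then textual order.
# An instantiation is encoded as (rule_index, n_conditions, matched_time_tags).

def select(conflict_set, already_fired):
    # Refractoriness: drop instantiations that have fired before.
    candidates = [c for c in conflict_set if c not in already_fired]
    if not candidates:
        return None
    def key(c):
        rule_index, n_conditions, time_tags = c
        # Recency: compare matched time tags from the most recent downwards.
        # Specificity: prefer rules with more conditions.
        # Textual order: lower rule index wins any remaining tie.
        return (sorted(time_tags, reverse=True), n_conditions, -rule_index)
    return max(candidates, key=key)

conflict_set = [
    (1, 2, (5, 3)),    # rule 1, 2 conditions, matched data with time tags 5 and 3
    (3, 2, (4, 2)),    # rule 3, 2 conditions, matched data with time tags 4 and 2
]
print(select(conflict_set, already_fired=set()))   # (1, 2, (5, 3)): more recent data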

An Example

To get a better feel for rule based systems, we will consider a simple example
which deals with relationships between members of a family.

Consider the following rule, defining the relationship mother.

p1: If father(X,Y) and wife(Z,X) then mother(Z,Y)

which represents the knowledge that if X is the father of Y and Z is the wife

of X then Z is the mother of Y.

We can add similar rules to represent other relationships. Some rules are

given below:

p1: If father(X,Y) and wife(Z,X) then mother(Z,Y)

p2: If mother(X,Y) and husband(Z,X) then father(Z,Y)

p3: If wife(X,Y) then husband(Y,X)

p4: If husband(X,Y) then wife(Y,X)

p5: If father(X,Z) and mother(Y,Z) then husband(X,Y)

p6: If father(X,Z) and mother(Y,Z) then wife(Y,X)

Consider the following facts which comprise the working memory. Each element has a

time tag which indicates the chronological order in which the element was added (more

recent elements have higher time tags):

father(rama, mohan) 1

mother(alka, lata) 2

wife(lata, hari) 3

wife(alka, rama) 4

father(hari, uma) 5

The relationships between the different people given in the working memory
are also pictorially depicted in Figure 10.


Figure 10: A Family Hierarchy

The conditions in rule p1 could match with the facts

father(hari, uma)

wife(lata, hari)

If p1 is chosen for firing with these matches, the fact mother(lata, uma) would be added

to the working memory. Now consider all the rules given earlier. The system will derive

new information based on the initial contents of the working memory and the set of rules

it has. The facts the other rules would match are:

p2: no facts matched

p3: wife(lata, hari) and wife(alka, rama)

p4: no facts matched

p5: no facts matched

p6: no facts matched

Assume that the conflict resolution strategy is a combination of three different
strategies in the order given below:

Refractoriness

Recency

Specificity


Refractoriness is used to ensure that the same rule will not be fired more than once with

the same instantiation. Recency is used to select between different instantiations of a

single rule which match the working memory elements. Specificity is to ensure that more

specific rules are fired first. If after the application of these three strategies, the conflict

set still contains more than one rule, then the textual order of the rules in the rule base is

used to resolve the conflict. The rule which comes first in textual order is selected to fire.

The exact sequence of rule firings under this conflict resolution strategy and the facts

which are added to the working memory are given below. The system will keep on

deriving new facts until the conflict set is empty. Rules which add working memory

elements which are already present have been omitted from the list. Try simulating the

inference engine to see how these new facts were derived (the facts are preceded by the

name of the rules which were fired to derive them).

p1: mother(lata, uma)

p5: husband(hari, lata)

p1: mother(alka, mohan)

p5: husband(rama, alka)

p2: father(rama, lata)
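
The run above can be reproduced mechanically. Below is a small, self-contained sketch of a forward chainer with pattern variables, written for this example rather than taken from any shell. The tuple encoding of facts, the convention that strings starting with an upper-case letter are variables, and the simplified conflict resolution (refractoriness plus textual order, standing in for the full recency and specificity ordering) are all assumptions, so the order of derivations may differ slightly from the listing above although the same facts are obtained.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, fact, env):
    """Extend env so that pattern equals fact; return None on failure."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    env = dict(env)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if p in env and env[p] != f:
                return None
            env[p] = f
        elif p != f:
            return None
    return env

def match_all(conditions, facts, env=None):
    """Yield every variable binding that satisfies all conditions."""
    env = {} if env is None else env
    if not conditions:
        yield env
        return
    first, rest = conditions[0], conditions[1:]
    for fact in facts:
        extended = match(first, fact, env)
        if extended is not None:
            yield from match_all(rest, facts, extended)

def substitute(pattern, env):
    return tuple(env.get(t, t) if is_var(t) else t for t in pattern)

# Rules p1..p6 and the initial working memory from the text.
rules = [
    ("p1", [("father", "X", "Y"), ("wife", "Z", "X")], ("mother", "Z", "Y")),
    ("p2", [("mother", "X", "Y"), ("husband", "Z", "X")], ("father", "Z", "Y")),
    ("p3", [("wife", "X", "Y")], ("husband", "Y", "X")),
    ("p4", [("husband", "X", "Y")], ("wife", "Y", "X")),
    ("p5", [("father", "X", "Z"), ("mother", "Y", "Z")], ("husband", "X", "Y")),
    ("p6", [("father", "X", "Z"), ("mother", "Y", "Z")], ("wife", "Y", "X")),
]
facts = [("father", "rama", "mohan"), ("mother", "alka", "lata"),
         ("wife", "lata", "hari"), ("wife", "alka", "rama"),
         ("father", "hari", "uma")]

fired = set()
while True:
    # Conflict set: (rule, instantiation) pairs that add a genuinely new fact
    # and have not fired before (refractoriness).
    conflict_set = []
    for name, conditions, conclusion in rules:
        for env in match_all(conditions, facts):
            new_fact = substitute(conclusion, env)
            key = (name, tuple(sorted(env.items())))
            if new_fact not in facts and key not in fired:
                conflict_set.append((name, key, new_fact))
    if not conflict_set:
        break
    name, key, new_fact = conflict_set[0]   # textual order as a simple tie-break
    fired.add(key)
    facts.append(new_fact)
    print(name + ":", new_fact)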

Advantages of Rule Based Systems

Some of the advantages of rule based systems are:

• Homogeneity

Because of the uniform syntax, the meaning and interpretation of each

rule can be easily analyzed.

• Simplicity

Since the syntax is simple, it is easy to understand the meaning of rules. Domain

experts can often understand the rules without an explicit translation. Rules

therefore can be self-documenting to a good extent.


• Independence

While adding new knowledge one need not be worried about where in the rule base

the rule is added, or what the interactions with other rules are. In theory, each rule

is an independent piece of knowledge about the domain. However, in practice, this

is not completely true, as we shall see in the next section.

• Modularity

The independence of rules leads to modularity in the rule base. You can create a

prototype system fairly quickly by creating a few rules. This can be improved by

modifying the rules based on performance and adding new rules.

• Knowledge is Separated from Use and Control

The separation of the rule base from the inference engine separates the

knowledge from how it is used to solve the problem. This means that the same

inference engine can be used with different rule bases and a rule base can be

used with different inference engines. This is a big advantage over conventional

programs where data and control are

intermixed.

• Procedural Interpretations

Apart from their declarative interpretation, rule based systems also have procedural
interpretations, which enable them to be viewed as computational models.

Drawbacks of Rule Based Systems

In spite of the advantages mentioned above, rule based systems have their own

drawbacks. Some of the drawbacks are listed below:

• Lack of Methodology

There is, as yet, no methodology (i.e., systematic procedure) for creating rule based
systems. Most systems are built based on intuition, prior experience, and trial and
error.


• Interaction among Rules

An advantage of the rule based representation was stated to be the relative

independence of the different pieces of knowledge. However, in many systems

you cannot assume that the rules do not interact among themselves. In certain

cases, ignoring rule interaction could lead to unexpected results.

• Opacity

Rule based systems provide no mechanism to group together related pieces of

knowledge. This makes any structure/relationships in the domain opaque in the

rule base.

• Lack of Structure

The simplicity of rules leads to the drawback that all rules are at the same level.

In many domains it would be useful to have rules at different levels in a

hierarchy, but the pure production system model does not support this.

• Representing Procedural Tasks

Some tasks which can be easily represented in terms of procedural
representations are not very easy to represent using rule based representations.
For instance, it requires about 10 rules to model two-column subtraction using
production rules. This sort of modelling is useful because it gives you a fine-grained
analysis of human problem solving. However, if you just wanted a system to
solve subtraction problems, using production rules is tedious, to say the least.

• Inefficiency

As mentioned earlier a large amount of time is taken in each cycle to match

applicable rules in the rule base. For large rule bases, this often leads to

inefficiencies. However, there is work going on in creating more efficient

matching algorithms. In addition, structuring the rule base can also lead to

increase in efficiency. The matching procedure can also be made faster by doing

the matching in parallel, if a parallel machine is used.


Clausal Form Logic

In an expert system, knowledge about the domain under consideration has to be encoded

in a form which can be used by the computer. The computer has to make use of the

encoded knowledge to solve problems in the domain. A number of standard formalisms

(or languages) have been developed to encode the knowledge about a domain. Some of

the well understood formalisms are: if-then rules, clausal form logic, semantic networks

and frames. If-then rules are the most popular formalism used in expert systems. The

popularity of if-then rules can be explained by their conceptual simplicity, availability of

efficient implementations and reasonably high expressive power. The expressive power

indicates the ability to represent different types of knowledge.

The expressive power of logic based knowledge representation languages is much better

than that of if-then rules. But, at present, logic based knowledge representation languages

are not as popular among expert system implementors as if-then rules. This is due to lack

of efficient implementations. We believe that in the near future the efficiency of

implementations of logic based languages will improve and they will be used extensively

in implementing expert systems and other AI systems. We can identify three aspects of a

logic – syntax, semantics and proof procedures. The syntax defines the permissible

structures. It specifies the basic structures and rules to combine them to form complex

structures. The semantics defines the meanings of the structures. The proof procedures

are the formal inference mechanisms which can be used to infer new (logically valid)

structures out of the existing structures. The newly inferred structures should be truth

preserving – that is, their meanings should be consistent with the meanings of the

existing structures.

Syntax of CFL

We first consider the basic structures and then the formation of complex structures out of
the basic structures. The syntax of a knowledge representation language specifies a set of

basic structures, and rules to combine them to form complex structures. Typically, in

formal languages, the complex structures are called sentences. We use computer-related

symbols (that is, upper-case and lower-case alphabets, digits and special characters) as


the basic ingredients to define CFL structures.

Basic Structures

The four types of basic structures in CFL are: constant, variable, functor and literal. The

complex structure, which can be formed out of these basic structures, is called a clause.

The generic name for a constant, variable, or functor is term.

Constant

A constant begins with a lower-case letter or a digit, followed by zero or more

upper-case or lower-case letters, underscores or digits.

A few examples of constants are: bombay, india, ravi and 25. The following

are not constants: AirPort, NEXTSTATE and TEMP.

Variable

A variable begins with an upper-case letter, followed by zero or more upper-case or

lower-case letters, underscores or digits.

A few examples of variables are: Person, Port and X1. The following are not

variables: area, base and TEMP.
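
These two syntactic categories are easy to check mechanically. The sketch below (an illustration, not part of CFL itself) encodes the constant and variable syntax as regular expressions and classifies a few of the tokens mentioned above.

import re

# Constant: a lower-case letter or digit, then letters, digits or underscores.
CONSTANT = re.compile(r"[a-z0-9][A-Za-z0-9_]*$")
# Variable: an upper-case letter, then letters, digits or underscores.
VARIABLE = re.compile(r"[A-Z][A-Za-z0-9_]*$")

for token in ["bombay", "india", "ravi", "25", "Person", "Port", "X1",
              "AirPort", "NEXTSTATE", "TEMP", "area", "base"]:
    kind = ("constant" if CONSTANT.match(token)
            else "variable" if VARIABLE.match(token)
            else "neither")
    print(token, "->", kind)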

Functor

A functor begins with a functor-name followed by an opening bracket, followed by one or

more arguments (separated by commas), followed by a closing bracket. A functor with n
arguments is called an n-ary functor. The general form of a functor is:
functor-name(argument1, argument2, ..., argumentn)

The form of the functor-name is the same as that of a constant. Each of the arguments of

a functor can be either a constant, variable, or functor. Because a functor can be an

argument of another (possibly same) functor, one can construct nested (or recursive)

functors.

A few examples of functors are: date(11, 12, 1992), date(11, december, 1992) and


name(kowalski, robert). The following are not functors: Address(NCST, Gulmohar Road,

Juhu, Bombay) and Circle(origin(10, 10), radius(5)).

Literal

A literal is the basic element of a clause in CFL. A clause is composed of zero or more

literals. A literal begins with a predicate-name followed by an opening bracket, followed

by one or more arguments (arguments separated by commas), followed by a closing

bracket. A literal with n arguments is called an n-ary literal. The general form
of a literal is:
predicate-name(argument1, argument2, ..., argumentn)

The form of a predicate-name is the same as that of a constant. The arguments of the

literal have to be constants, variables or functors. Though, syntactically, a functor and a

literal look alike, semantically, they are totally different entities – a functor is used to

represent an object, whereas a literal is used to represent a relationship among objects.

A few examples of literals are: parent(ravi, gopal), part_of(bombay, india) and
greater_than(25, 17). The following are not literals: Capital(delhi, india) and
Head(ComputerScience, Vivek).

Now let us see how the basic structures are combined to form clauses.

Clause

A clause is formed by combining a number of literals using connectives. The permissible

connectives are ⇐ (implication), ∧ (and), and ∨ (or). A clause begins with a consequent-part,
followed by an implication and then an antecedent-part. The general form of a
clause is:
consequent-part <= antecedent-part

This is to be read as: the antecedent-part implies the consequent-part or as the

consequent-part is true if the antecedent-part is true.

The consequent-part consists of a number of literals connected together by

or’s. The general form of the consequent-part is:


literal1 ∨ literal2 ∨ ... ∨ literalN

The antecedent-part consists of a number of literals connected together by and’s. The

general form of the antecedent-part is:

literal1 ∧ literal2 ∧ ... ∧ literalN

Example

A few examples for clauses are:

mother(X, Y) ∨ father(X, Y) ⇐ parent(X, Y)

mother(X, Y) ⇐ parent(X, Y) ∧ female(X)

The following are not clauses:

capital(City, Country) ∧ part_of(City, Country) <= contains(Country, City)

PartOf(Person, cabinet) <= PartOf(Person, government)
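
As a simple illustration (an assumed encoding, not part of the formalism), a clause can be held as a pair of literal lists: the consequent-part, read as a disjunction, and the antecedent-part, read as a conjunction.

# Sketch: a literal as a tuple (predicate, arg1, ..., argn); a clause as
# (consequents, antecedents), read as "c1 ∨ ... ∨ cM <= a1 ∧ ... ∧ aN".

def clause_to_str(clause):
    consequents, antecedents = clause
    lit = lambda l: l[0] + "(" + ", ".join(l[1:]) + ")"
    return (" ∨ ".join(lit(c) for c in consequents) + " <= " +
            " ∧ ".join(lit(a) for a in antecedents))

c = ([("mother", "X", "Y"), ("father", "X", "Y")],   # consequent-part
     [("parent", "X", "Y")])                          # antecedent-part
print(clause_to_str(c))   # mother(X, Y) ∨ father(X, Y) <= parent(X, Y)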

Semantics of CFL

The semantics of a knowledge representation language deal with the meanings of the

structures. In this section, we first give the semantics of CFL in an intuitive manner. We

later provide an introductory treatment of the formal aspects of the semantics.

Intuitive Semantics

Our primary interest is to use CFL as a knowledge representation language. CFL is a

language which can be used to describe a world of our choice. We want to use the clauses

of CFL to encode (or represent) the facts (or knowledge) of a certain domain (or world).

A domain can be viewed as consisting of individuals (or objects) with relationships

among them. An object can be a human being, animal, inanimate object, or some abstract

object. The nature of the relationships among the objects can be simple or complex

depending on the domain. A relationship can be between two objects. For example,

consider the following sentence: The daughter of Jawaharlal Nehru is Indira Gandhi. In

this sentence, the symbols Jawaharlal Nehru and Indira Gandhi represent two individuals.


And, daughter of represents the relationship of being a daughter.

A relationship can be for a single object. For example, consider the following sentence:

Indira Gandhi is the Prime Minister. In this sentence, the symbol Indira Gandhi is used to

represent the individual. And, is the Prime Minister is used to represent the relation

(property) of being the Prime Minister. More complex relationships can exist among three

or more objects. We will take an example domain and illustrate how CFL can be used to

represent the knowledge in the domain. The domain that we are interested in is the

famous Nehru dynasty.

A part of the knowledge available about the Nehru dynasty can be stated as follows:

Jawaharlal Nehru was the son of Motilal Nehru. He was also the father of Indira Gandhi.

Indira’s husband was Feroz Gandhi. Indira had two sons: Rajiv Gandhi and Sanjay

Gandhi. Rajiv and Sanjay were married to Sonia and Maneka, respectively. Sonia has a

daughter, Priyanka, and a son, Rahul. Maneka has a son Varun.

How can CFL be used to represent what has been stated above? It can be done in the

following way (in general, any given knowledge can be represented using CFL in more

than one way).

son_of(jawahar, motilal) <= (C1)
father_of(jawahar, indira) <= (C2)
husband_of(feroz, indira) <= (C3)
son_of(rajiv, indira) <= (C4)
son_of(sanjay, indira) <= (C5)
husband_of(rajiv, sonia) <= (C6)
husband_of(sanjay, maneka) <= (C7)
daughter_of(priyanka, sonia) <= (C8)
son_of(rahul, sonia) <= (C9)
son_of(varun, maneka) <= (C10)

The above clauses are self-explanatory. Their informal meanings can be analysed by

considering a couple of them. The clause C1 represents the fact that Jawaharlal Nehru is


a son of Motilal Nehru. The constants, jawahar and motilal stand for Jawaharlal Nehru

and Motilal Nehru, respectively. The predicate son_of represents the relation of somebody

being a son of somebody else. Similarly, the clause C2 represents the fact that Jawaharlal

Nehru is the father of Indira Gandhi. In clause C2, the constants jawahar and indira stand

for the individuals Jawaharlal Nehru and Indira Gandhi, respectively. The two

occurrences of the constant jawahar, in C1 and C2, stand for the same individual,

Jawaharlal Nehru.

Some general knowledge, relevant to any family, can be represented as follows:

wife_of(X, Y) <= husband_of(Y, X) (C11)
son_of(X, Y) <= son_of(X, Z) ∧ husband_of(Y, Z) (C12)
son_of(X, Y) <= son_of(X, Z) ∧ wife_of(Y, Z) (C13)
father_of(X, Y) ∨ mother_of(X, Y) ⇐ son_of(Y, X) (C14)
father_of(X, Y) ∨ mother_of(X, Y) ⇐ daughter_of(Y, X) (C15)

The clause C11 represents the fact that if a person is the husband of someone, then that

someone is the wife of the person. The same knowledge can also be stated as follows: if

Y is the husband of X, then X is the wife of Y. In this clause, the variables, X and Y, can

stand for any individual (not one particular individual) in the domain.

Now let us see how the meaning of the structures can be defined in a mathematical way.

Formal Semantics

The ability to define the meanings of structures mathematically is an important property

of CFL. This property enables the use of CFL to encode the facts and knowledge of a

domain precisely. In fact, this is one of the primary reasons behind the attempt to use

CFL as a knowledge representation language for AI systems. The mathematical definition

of meanings for the structures has another important consequence. It enables one to

determine (by means of some procedure) if a newly inferred structure has a logically

valid meaning or not. The newly inferred structure is logically valid if it is truth
preserving – that is, the meaning of the structure is consistent with the meanings of the

existing structures.


In defining formal semantics, we emphasise abstract domains (or worlds), consisting of

abstract objects and abstract relationships. The abstract world might correspond to the

physical world, but we do not necessarily have to limit ourselves to reality. The formal

semantics of CFL can be defined in terms of an interpretation and a model. The primary

purpose of the definition is to find out a world in which all the clauses are true. However,

for some sets of clauses, it may not be possible to define such a world. An interpretation
consists of two parts: a domain of discourse, and an interpretation function. The domain

of discourse is the set of objects which we are considering. For example, the domain of

discourse might be the set of all staff of an organisation or the set of all colors. The

interpretation function is a function which maps each of the basic structures into an
object, relation, or function in the domain of discourse.

Semantic networks

Semantic Networks were created essentially out of the seminal work of Ross Quillian

[Quillian, 1967; 1968] which dealt with a simulation program intended to be able to

“compare and contrast the meanings of arbitrary pairs of English words”. The basic idea

behind Semantic Networks (or Semantic Nets) is simple: the meaning of a concept is

defined by the manner in which it is related (or linked) to other concepts. A semantic

network consists of nodes representing concepts interconnected by different kinds of

associative links. These links represent binary relations between a pair of concepts. N-

ary relations have to be broken up into binary relations, as discussed in the previous

section. Using standard graph notation, these networks can be displayed and manipulated.

A semantic network differs from a graph in that there is meaning (semantics) associated

with each link. This semantics usually comes from the fact that the links are labelled with

standard English words. Quillian distinguished between type nodes and token nodes in a

semantic network. Type nodes defined classes of objects, while tokens denoted specific

instances of a type of object. Thus, human or woman would be examples of types, while

Gita would be a token, or a particular instance, of the type woman. a-kind-of, is-a,
a-part-of, and has-part were the main relations linking type and token nodes. Thus a

semantic network consists of a set of related concepts, linked to form a graph. These

concepts can be very general information about the world (constituting world

knowledge), or can be concepts specific to some particular domain. Concepts which are


semantically closer will be closer to each other in the network, while concepts which

have no real connection will be far apart. The path(s) through the network connecting two

concepts are quite significant; they indicate how the two concepts are related, and how

closely they are related. To understand how to locate the connecting path(s), we need to

understand the notion of search in semantic networks.

Search in Semantic Networks

Quillian’s aim was to compare and contrast two given word concepts, say W1 and W2.

The user can traverse the semantic network from the nodes W1 and W2, trying to detect

an intersection between the two concepts. The traversal is done by spreading activation

along the network. Following normal graph traversal procedure, to avoid wasteful moves,

it is useful to mark each node visited in the network.

Thus, we start with concept W1, and take one step forward in a direction which takes us

closer to W2. (This is an important step, which could be directed, say, by domain-

dependent heuristics. It is very easy to get lost in any network.) This starts the path from

the W1 side. To contain the movement, we then switch to the W2 side, and take a step

towards W1,starting the W2 path. As the search proceeds, each node that is visited is

tagged with its starting node name (the name of its patriarch), and concepts are expanded

from the W1 side and the W2 side alternately.

If a node is visited from the W1 side, then it is given a W1 tag; else it is given a W2 tag.

If the node is already tagged, it is either because we have reached an intersection, or

because there is a loop in the traversal. Thus, when you are expanding the W1 path, if

you visit a node which already has a W1 tag, you are looping. If it is a W2 tag, an

intersection of the two paths has been found! The path connecting the two concepts

defines the commonality between the two concepts. Each concept in the path is related in

some way to both W1 and W2.

Figure 11 shows the relation between concepts cry and comfort. The path from cry goes

through cry2 , which is the sense of cry meaning to make a sad sound, and through the

concept sad. The path from comfort goes through the concept comfort3, relating to


making less sad, and then through the concept sad. Thus, there is an intersection at the

concept sad. Quillian’s program could generate natural language sentences to talk about

the relation between these concepts. For this example, the following sentences were

generated:

Comparing: CRY, COMFORT

Intersection node: SAD

(1) CRY IS AMONG OTHER THINGS TO MAKE A SAD SOUND.

(2) TO COMFORT3 CAN BE TO MAKE2 SOMETHING LESS2 SAD.

Figure 11: Relation between concepts cry and comfort. The intersection

node is the concept sad.

This use of intersection search for inference with the help of a spreading activation

mechanism is one of the innovative features of Quillian’s semantic network formalism.
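
A minimal sketch of this intersection search follows. The toy word-sense network and the alternating breadth-first expansion with origin tags are simplifications of Quillian's spreading-activation scheme, written only for illustration.

from collections import deque

# A toy semantic network: each node maps to its neighbouring concepts.
network = {
    "cry":      ["cry2"],
    "cry2":     ["sad", "sound"],       # cry2: to make a sad sound
    "comfort":  ["comfort3"],
    "comfort3": ["sad", "less"],        # comfort3: to make less sad
    "sad":      [],
    "sound":    [],
    "less":     [],
}

def intersection_search(w1, w2):
    """Expand from w1 and w2 alternately; return the first common node found."""
    tags = {w1: w1, w2: w2}                       # node -> its patriarch
    frontiers = {w1: deque([w1]), w2: deque([w2])}
    while frontiers[w1] or frontiers[w2]:
        for origin, other in ((w1, w2), (w2, w1)):
            if not frontiers[origin]:
                continue
            node = frontiers[origin].popleft()
            for neighbour in network.get(node, []):
                if tags.get(neighbour) == other:  # already tagged from the other side
                    return neighbour              # intersection found
                if neighbour not in tags:         # unvisited: tag it and expand later
                    tags[neighbour] = origin
                    frontiers[origin].append(neighbour)
    return None

print(intersection_search("cry", "comfort"))      # sad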

Inheritance

For any domain, we can visualise a type hierarchy. Types inherit properties from their

supertypes. Such hierarchies are normally represented using links called a-kind-of and
is-a (or instance-of). The link a-kind-of (often abbreviated to ako) relates types (of
objects/events) to sub-types (of objects/events). is-a relates types to instances. Thus we


can say:

Professor a-kind-of Person

Anaka is-a Professor

One basic idea that is true of such a hierarchy is that the properties of an object at a
particular level include all the properties defined for it at that node, plus all those defined
for all the supertypes of that node. Thus Anaka will have all the properties of Professor;
in addition, Anaka will also inherit properties of the Person node (as well as those of the
ancestors of that node). Similarly, if Alka is-a Girl and Girl is a-kind-of Human and
Human is a-kind-of Mammal, Alka will have the properties defined at the level of the
Alka node. In addition, all the properties of the nodes Girl, Human and Mammal will be
inherited by Alka.

Inheritance has several advantages. Information that is common to all objects of a class

needs to be represented just once. Avoiding duplication has other beneficial side-effects.

It becomes easier to modify information, without the need to explicitly propagate the

changes to other levels. It also reduces or eliminates the problem of maintaining a

consistent set of facts. Thus inheritance leads to a compact representation.
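
The lookup involved can be sketched very simply; the dictionary encoding of the hierarchy and of the properties stored at each level below is an illustrative assumption.

# Sketch: property lookup with inheritance over an a-kind-of / is-a hierarchy.

parents = {                       # is-a / a-kind-of links (one parent each here)
    "Alka": "Girl",
    "Girl": "Human",
    "Human": "Mammal",
}
properties = {                    # properties stored once, at the appropriate level
    "Mammal": {"warm_blooded": True},
    "Human":  {"legs": 2},
    "Girl":   {"gender": "female"},
    "Alka":   {"name": "Alka"},
}

def lookup(node, prop):
    """Walk up the hierarchy until the property is found."""
    while node is not None:
        if prop in properties.get(node, {}):
            return properties[node][prop]
        node = parents.get(node)
    return None

print(lookup("Alka", "warm_blooded"))   # True, inherited from Mammal
print(lookup("Alka", "legs"))           # 2, inherited from Human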

So far, we have discussed objects with at most one supertype. In general, you could have

more than one parent (supertype) for a given class (or object of a class). In this case, the

child class or object may inherit information from all its supertypes, leading to multiple

inheritance. There are several issues raised by multiple inheritance – what to do if there

is conflicting information, for instance.

There have been reaction-time studies done [Collins and Quillian, 1972] to see if

inheritance is indeed a mechanism used by humans. For example, consider a human who

is told:

Birds have wings.

A crow is a bird.

A crow is black.


Given that the more general properties are stored higher in the generalization hierarchy,

does it take more time to realise that crows have wings, than that crows are black? If so,

perhaps a mechanism such as inheritance is being used. There is conflicting evidence

about this, but the results seem to indicate the plausibility of humans using inheritance

mechanisms. These reaction time experiments also gave rise to the notion of semantic

distance – the number of nodes, for example, between any two concepts (or the number of

links/arcs to be traversed between any two concepts). This acts as a measure of the

“closeness” of the concepts. Two concepts which are closely linked, for example, cry and

sad, will have a very small semantic distance.

Criticisms of Semantic Networks

There have been several criticisms of semantic networks, prominent among them a paper

entitled What’s in a Link [Woods, 1975]. Until Woods’ paper, the semantics of networks

were very unclear. Woods argued that it is important to look at the nature of primitives

proposed. By pointing out problems and inadequacies of semantic networks, he

stimulated more formal work in this area.

Consider the following example: Rocks can be classified as igneous rocks, metamorphic

rocks and basaltic rocks. They can also be classified by colour: black rocks, white rocks,

orangish-red rocks etc. This can be represented in the semantic network formalism by a

set of nodes connected to their super-type node rock, as shown in Figure 12.

Figure 12: A Hierarchy of Rock-types

Does this mean that there are as many types of rocks as there are labels attached? This

would be a rather simplistic assumption, besides being wrong! However, many semantic


networks in use assumed that the number of child links from an object node represented

the number of types you could classify the object into. The type dimension (basaltic,

igneous etc.) and the colour dimension are not really independent of each other. Is it not

logical that the dimensions of representation be independent? And should not dimension

information be reflected in the semantic net formalism? Since little thought was paid to

the dimensions of representation, in most semantic networks, these questions went

unanswered.

However, there has been continuing interest in semantic networks, with attempts to go

beyond the limitations of early semantic network systems. Brachman surveyed the growth

and use of semantic networks and provided one of the first formal specifications of the

role of semantic networks [Brachman, 1979]. [Sowa, 1991] is a recent collection of

excellent articles on the semantic network formalism and its applications.

Frames

Frames are general purpose representations of a cluster of facts or properties, which

describe a prototypical situation or object in detail. In a sense, they are equivalent to

semantic nets, where each node has considerable structure. The general idea behind

frames is very old.

Why are Frames Useful?

Consider a traffic accident situation. Suppose we hear that a car has collided with a truck, leaving two

people dead. There are various questions that immediately arise, for example:

Where was this accident?

When?

Was anyone injured?

How did the accident occur?


Other related questions could also be asked:

Whose fault was it?

Could it have been prevented?

When we hear of an accident, we immediately take a lot of things for granted, such as the fact that

usually accidents cause damage to people and property. We also know that accidents are due to

negligence, or sometimes weather conditions, or some strange combination of causes. We need some

mechanism to store the standard (default) knowledge that people have about events and objects.

Traffic accidents are one particular type of accident. Aircraft accidents are another type. Whatever is
true of accidents in general will be true of traffic accidents in particular. In addition, there may be

more specific issues that arise. Similarly, whatever is true of an accident, and more than that too, will

be true of aircraft accidents. Thus we say that traffic accidents and aircraft accidents (among other

types of accidents) inherit all the properties of the prototypical accident. We need a formalism to

represent information such as that about accidents. This formalism must also provide mechanisms such

as inheritance. Frames satisfy these requirements, and provide other facilities as well.

Properties of Frames

Frames can be used to describe objects, classes of objects (for example, a room) or typical events such

as a birthday party, or a traffic accident. As we shall see, frames can be used, among other things, to

store such prototypical information,

represent default information,

provide multiple perspectives (viewpoints) of a situation, and

provide for analogies.

The power of the frame formalism derives from all these features.


Subsumption and disjointness are properties of frame-like structured types. These properties are not

available in formalisms such as semantic nets, where all types are atomic. One class is said to subsume

(encompass) another, if every element of the subsumed class is a member of the subsuming class. For

example, the class Human subsumes the class Woman. If two classes are disjoint, no element of one

class can be an element of the other class. For example, the classes Woman and Man are disjoint.

Subsumption and disjointness relations help in having efficient and meaningful procedures to

manipulate and use frames.

Structure of Frames

A frame can be thought of as a network of nodes and relations, just like a semantic network. One part

of the frame has fixed information, consisting of information that is always true of the situation

represented. The other part of the frame has many slots, which must be filled with specific

values, instances or data. A frame can be looked upon as a form to be filled. A generic event or object

is described by such a form, and an instance of the event or object is described by a (partially) filled-in

form. Thus, to represent all available information about an object or an event, one would have to create

an instance of (i.e., instantiate) an appropriate class frame. To represent world knowledge, or

knowledge about a domain, we need several frames related in an intricate fashion. A frame system

provides for this.

Nested Frames

Slots may be filled with values that are themselves frames. Thus frames can be nested; for example, a

house frame can have embedded room frames in it. Nested frames let you view the world at different

levels of granularity. At some level, a house is just a set of rooms; at another level, each room has

certain characteristics, such as size, height, color of the walls, number and type of windows and doors

etc.

Inheritance and Masking in Frames

Frames inherit properties and values of properties just as nodes in semantic networks inherit properties.

That is, all the properties and values that are defined in a particular frame at a particular level in a

frame system are inherited (become accessible) by all its descendant frames (either class frames or


instance frames). Inheritance hierarchies are typically set up by a-kind-of and is-a links. The part-of

link is another important link for property inheritance.

Sometimes, it is useful to prevent such automatic inheritance. The area of a continent is a property of

continent. But it is not a parameter that can be inherited by the countries in that continent. Parameters

such as number-of-political-units, area, perimeter etc. are made specific to certain levels. These
parameters are "blocked" from being inherited by nodes lower down, using a "do-not-inherit" attribute;
attributes marked as "do-not-inherit" are not inherited by the current frame and its descendants. This is

also referred to as masking, since attributes are being masked by this method.

Figure 13: Top-level View of Geography Frame System

As in semantic networks, a frame may have more than one supertype. In this case, the child frame may

inherit information from all its supertypes, leading to multiple inheritance. For example, in Figure 13

we have defined India to be an instance of country, and so India will inherit all the properties of

country. But if we had also declared India to be a part of Asia, we would have a situation where India


inherits from the frame for Asia as well. Multiple inheritance could occur with the use of any of the

standard links - is-a (instance-of), a-kind-of and a-part-of. There may be problems of consistency

when multiple inheritance is permitted, and all these links are considered together. Implementors of

frame systems have to keep this in mind, and provide appropriate methods to unambiguously resolve

problems that may occur with multiple inheritance.

Constraints on Values

Each slot can specify conditions that its values must satisfy. For example, we may have a condition on

a slot for Age, constraining it to be a number, or, more specifically, a number between 0 and 120, say.

This will prevent invalid numbers such as 423 being entered as a value for Age. We may also have

constraints on the type of the value, on the range of values, or a complex condition involving several

such constraints. There could be constraints which specify that a certain slot must only be filled by a

frame of a certain type.

Some slots may have multiple correct values. The number-of-borders of a country varies, ranging from

1 to (perhaps even) 10 values. We can mark this as :min 1 :max 10 indicating that we must have at least

one value, and up to ten values for that slot. Note that min and max are not constraints on the value of

the attribute number-of-borders; they specify only the number of possible values that this parameter can

take. To define the range of values that a parameter can take, we may define sub-ranges for each such

range specification. Alternatively, we may specify the range using predicates such as in-range(X,Y) or

member-of(S). Here the constraint on the value is specified by the predicate invoked to test it.
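
A slot constraint of this kind can be pictured as a small validation step. The slot description format used below (a type, an optional range, and the minimum and maximum number of values) is purely illustrative.

# Sketch: validating a slot's values against simple constraints.

age_slot = {"type": int, "in_range": (0, 120)}
borders_slot = {"type": int, "min_values": 1, "max_values": 10}

def check_values(slot, values):
    """Return True if the list of values satisfies the slot's constraints."""
    n_min = slot.get("min_values", 1)
    n_max = slot.get("max_values", 1)
    if not (n_min <= len(values) <= n_max):
        return False
    for v in values:
        if not isinstance(v, slot["type"]):
            return False
        if "in_range" in slot:
            low, high = slot["in_range"]
            if not (low <= v <= high):
                return False
    return True

print(check_values(age_slot, [42]))        # True
print(check_values(age_slot, [423]))       # False: outside 0..120
print(check_values(borders_slot, []))      # False: at least one value required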

Default Values

Slots may also have default values associated with them. Thus, we could have a frame for a Car, which

specifies that, by default, it has four wheels. So, if no other information is provided, we understand

from this frame that all instances of Car have four wheels. These default values can be overridden, if

required. Thus, we can still describe a three-wheeled car as a particular instance of the Car class, by

redefining the wheels parameter at that level as having the value three.


Expectations

Default values in frames, coupled with constraints, define expectations. Even if the actual value of a

slot is not known, the constraints and/or the default value provide a partial picture of what is expected.

Thus, if a slot is specified as:

Model : (default black), or as

Model : (member-of (black, blue, red, green))

we can guess that this slot cannot be filled with a completely random value (say, Bombay) – clearly,

color words are expected in these slots. Thus frames also delineate possible values of attributes. This

makes the task of representation much easier.

Procedures in Frames

It is not necessary that all the slots have some pre-defined values (such as a number, another frame, a

list of names etc.). Some information may be stored as a formula, which has to be evaluated. In some

implementations, the user may be provided with demons which are triggered by certain conditions.

These demons lead to certain expressions being evaluated to provide additional information.

For example, we may wish to represent a rectangle by its length and breadth. However, we may require

other attributes of the rectangle, such as the area or perimeter of the rectangle, as parameters. We can

define these as attributes whose values are dependent on the attributes length and breadth. The exact

dependency can be specified as a procedure or as a formula (or expression), which may be evaluated

when the value of the attribute is required by the system.

For example, we could attach an expression to the slot area of a rectangle, thus:

area: (if-needed (length * breadth))

This feature provides enormous power to frames. An if-needed procedure attached to a slot tells you

how to compute its value if no value has been specified. Sometimes, you need a specific set of

actions to be taken as soon as a value is defined for a parameter, or if a value is erased. An if-added (or

if-erased) demon, attached to a parameter, defines a set of actions to be carried out if a value for the


attribute is added (or erased). Similarly, an if-accessed demon can be triggered if the value of a

parameter is accessed.
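
These ideas (slot defaults, inheritance from a parent frame, and if-needed procedures) can be combined in a single small sketch. The Frame class below is written for this note and is not an actual frame language.

# Sketch: a tiny frame with inherited slots, default values and if-needed demons.

class Frame:
    def __init__(self, name, parent=None, slots=None, if_needed=None):
        self.name = name
        self.parent = parent                     # a-kind-of / is-a link
        self.slots = dict(slots or {})           # explicit slot values (or defaults)
        self.if_needed = dict(if_needed or {})   # slot -> procedure to compute it

    def get(self, slot):
        if slot in self.slots:                   # 1. explicit or default value
            return self.slots[slot]
        if slot in self.if_needed:               # 2. compute it on demand
            return self.if_needed[slot](self)
        if self.parent is not None:              # 3. inherit from the parent frame
            return self.parent.get(slot)
        return None

car = Frame("Car", slots={"wheels": 4})                    # default: four wheels
reva = Frame("Reva", parent=car, slots={"colour": "red"})  # inherits wheels
trike = Frame("Trike", parent=car, slots={"wheels": 3})    # default overridden

rectangle = Frame("Rectangle",
                  slots={"length": 4, "breadth": 3},
                  if_needed={"area": lambda f: f.get("length") * f.get("breadth")})

print(reva.get("wheels"))      # 4, inherited default
print(trike.get("wheels"))     # 3, overriding the default
print(rectangle.get("area"))   # 12, computed by the if-needed procedure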

Multiple Viewpoints

A frame can be used to represent a particular instance of an event (present a snap-shot, as it were). A

frame system can be used to represent the state of the world at different times during the event. This

can be done by using a frame corresponding to each point in time that you wish to represent, and

building up a montage of these snap-shot frames to provide a complete picture of the event. It is

meaningless, in this resulting frame system, to replicate all slots and their values. Values which are

common across frames can be shared (using, for instance, some pointer-like mechanism). Only those

values that change need be represented anew. Thus frames offer an economical method of representing

events.

In [Minsky, 1975;1981], frames were seen as a mechanism supporting multiple viewpoints. Consider

two walls A and B, both perpendicular to a given wall C. We can have the frames for walls A and B

marked as perpendicular-to C. If now we take a right turn in the model, one of these walls (say A)

becomes the wall under focus; the other two walls (B and C) become adjacent walls. Minsky showed

how these transformations could be efficiently coded in frames, so that frames offered an efficient

mechanism to represent multiple viewpoints of the same objects.

Identifying Situations and Objects

We have seen various properties of frames and frame systems. The major question to ask now is: How

can frames be used?

One simple answer to this is that frames can be used as sophisticated data structures to represent

information in an elegant and economical fashion. The application which uses frames would explicitly

create or instantiate frames to represent new information. Simple retrieval mechanisms would permit

partial match of frames; thus, an application can provide a partial description of a frame and retrieve all

matching frames. It may happen that frames may not have exact matches and we may need to locate the

nearest (or approximate) matching frames. This may be required to identify a new situation or a new


object, and classify it into one of a known set of prototypical objects or events. While the process of

fitting in the event

or object may not be precise, it will point out ways in which the new object is different from the

prototype. This will, in turn, lead to the generalization of the relevant frames, or the creation of new

class frames.

We outline here the use of frames in the identification of situations. The procedure outlined is equally

applicable to the identification of objects.

Given a situation, to identify it, we select a candidate frame based on some heuristics. If more than,

say, two attributes match, then this candidate frame is instantiated. Values are assigned to the attributes,

wherever available. Attributes which do not have a value in the new situation may take the default

value from the candidate frame.

If a matching candidate frame is not detected, we try again with a small relaxation in one or more of the

constraints. Instead of looking for exactly the same frame, we may decide to look for a frame which

subsumes the (ideal) target frame. Thus we are looking for more general versions of the same frame.

For instance, instead of looking for a woman with some properties, we may look for a person with

similar properties. This relaxation is intuitive and a natural way to search.

Related mechanisms such as scripts [Schank and Abelson, 1977] use similar match procedures. Scripts

can be viewed as special purpose frames, describing specific situations. We can have scripts for a

birthday party, or for the ritual of eating out at a restaurant. In matching a real situation to the script for

the situation, events that are not mentioned are predicted assuming a normal sequence of sub-events. A

coherent interpretation of the situation becomes possible, since the script is used with available

information to understand the situation. Scripts can be modified on the basis of unusual or unexpected

ideas, or new scripts can be created if the unexpected ideas are really different.


MODULE IV

Models of machine learning, Connected models, Genetic algorithms, Evaluation of Computational models in terms of truth, Visualisation techniques in arts, Modeling genetic and biochemical networks


Models of machine learning

In general, a learning problem considers a set of n samples of data and then tries to predict properties of

unknown data. If each sample is more than a single number and, for instance, a multi-dimensional

entry, it is said to have several attributes or features.

We can separate learning problems into a few large categories:

• supervised learning, in which the data comes with additional attributes that we want to predict.

This problem can be either:

• classification: samples belong to two or more classes and we want to learn from already labeled

data how to predict the class of unlabeled data. An example of a classification problem would be

the handwritten digit recognition example, in which the aim is to assign each input vector to one

of a finite number of discrete categories. Another way to think of classification is as a discrete

(as opposed to continuous) form of supervised learning where one has a limited number of

categories and for each of the n samples provided, one is to try to label them with the correct

category or class.

• regression: if the desired output consists of one or more continuous variables, then the task is

called regression. An example of a regression problem would be the prediction of the length of

a salmon as a function of its age and weight. (A small numeric sketch of both supervised tasks is given after this list.)

• Unsupervised learning, in which the training data consists of a set of input vectors x without any

corresponding target values. The goal in such problems may be to discover groups of similar

examples within the data, where it is called clustering, or to determine the distribution of data

within the input space, known as density estimation, or to project the data from a high-

dimensional space down to two or three dimensions for the purpose of visualization.
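
The following sketch, with invented numbers, illustrates the two supervised tasks side by side: a regression fitted by ordinary least squares and a classification made with a simple nearest-class-mean rule.

import numpy as np

# Regression: fit a continuous target (e.g. length) from features (age, weight)
# by ordinary least squares.
X = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 5.0], [4.0, 6.5]])   # age, weight
y = np.array([30.0, 45.0, 60.0, 75.0])                            # length
X1 = np.hstack([X, np.ones((len(X), 1))])          # add an intercept column
w, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("regression predictions:", X1 @ w)            # continuous outputs

# Classification: assign each sample to one of a finite set of classes,
# here with a nearest-class-mean rule on labelled 2-D points.
points = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
labels = np.array([0, 0, 1, 1])
means = np.array([points[labels == c].mean(axis=0) for c in (0, 1)])
new_point = np.array([0.9, 1.0])
predicted = int(np.argmin(((means - new_point) ** 2).sum(axis=1)))
print("classification prediction:", predicted)       # a discrete class label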

Machine Learning is a hybrid of Statistics and algorithmic Computer Science. In this review, we will

mostly be concerned with the statistical side. Statistics is about managing and quantifying uncertainty.

Uncertainty may arise due to many different reasons, for example:

• Measurement noise: Measurements of physical processes are always subject to inaccuracies.


Sometimes, low quality data may be obtained more economically. Data items may be missing.

• Model uncertainty: Models are almost never exact; we abstract away complexity in order to

allow for predictions to be feasible.

• Parameter uncertainty: Variables in a model can never be identified exactly and without doubt

from finite data.

The calculus of uncertainty is probability theory. Some phenomenon of
interest is mapped to a model, being a set of random variables and probabilistic relationships between
them. Variables are observed or latent (unobserved). To give an example, consider the linear model
y = wᵀx + ε, where ε is independent noise. This model describes a functional relationship x → y ∈ ℝ.
It is a cornerstone of Statistics, and we will see much of it in the following. Suppose we can measure
(x, y) repeatedly and independently. Here, x and y are observed, w and ε are latent. Latent variables are
either query or nuisance variables; we want to know only about the former. For example, w may be query (is there a
linear trend in the data? are some features more relevant than others?), while ε is nuisance. w is also called
the parameters or weights. It is important to note that the classification of model variables into observed,

query, or nuisance depends on the task which is addressed by the model.
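
As a concrete illustration (with invented numbers), the linear model can be simulated and the latent w recovered from the observed (x, y) pairs by least squares; the estimate differs from the true value because the data are finite and corrupted by the nuisance noise ε.

import numpy as np

# Sketch of the linear model y = w*x + eps: w and eps are latent, (x, y) observed.
rng = np.random.default_rng(0)
true_w = 2.5                                  # the latent "query" variable
x = rng.uniform(0, 10, size=100)              # observed inputs
eps = rng.normal(0, 1.0, size=100)            # latent nuisance noise
y = true_w * x + eps                          # observed outputs

# Estimate w from the finite sample; it will not equal true_w exactly
# (parameter uncertainty), because the data are noisy and finite.
w_hat = np.sum(x * y) / np.sum(x * x)
print("estimated w:", w_hat)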

The origins of the field of machine learning go back at least to the middle of the last century. However,

it was only in the early 1990s that the field began to have widespread practical impact. Over the last

decade in particular, there has been a rapid increase in the number of successful applications, ranging

from web search to autonomous vehicles, and from medical imaging to speech recognition. This has

been driven by the increased availability of inexpensive computers, the development of improved

machine learning algorithms, greater interest in the area from both the research community and the

commercial sector, and most notably by the ‘data deluge’ characterized by an exponentially increasing


quantity of data being gathered and stored on the world’s computers.

Figure 1: A neural network with two layers of adjustable parameters, in which each parameter corresponds to one
of the links in the network.

During this time, large numbers of machine learning techniques have been

developed, with names such as logistic regression, neural networks, decision trees, support vector

machines, Kalman filters and many others. Contributions to this multi-disciplinary effort have come

from the fields of statistics, artificial intelligence, optimization, signal processing, speech, vision and

control theory, as well as from the machine learning community itself. In the traditional approach to

solving a new machine learning problem, the practitioner must select a suitable algorithm or technique

from the set with which they are familiar, and then either make use of existing software, or write their

own implementation. If the technique requires modification to meet the particular requirements of their

specific application, then they must be sufficiently familiar with the details of the software to make the

required changes.

An example of a traditional machine learning technique is the two-layer neural network, illustrated

diagrammatically in figure 1. The neural network can be viewed as a flexible nonlinear parametric

function from a set of inputs {xi} to a set of outputs {yk}. First, linear combinations of the inputs are formed, and these are transformed using a nonlinear function h(·), so that zj = h( Σi w(1)ji·xi ), where h(·) is often chosen to be the 'tanh' function. These intermediate variables {zj} are then linearly combined to produce the outputs, yk = Σj w(2)kj·zj. The variables {w(1)ji} and {w(2)kj} are the adjustable parameters of the network, and their values are set

by minimizing an error function defined with respect to a set of training examples, each of which

consists of a set of values for the input variables together with the corresponding desired values for the

output variables. In a typical application of a neural network, the parameters are tuned using a training

dataset, with the number of hidden units optimized using separate validation data. The network

parameters are then fixed, and the neural network is then applied to new data in which the network

makes predictions for the outputs given new values for the input variables.
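As a small illustration of the network just described, the following sketch (hypothetical NumPy code, with made-up weight shapes) implements the forward pass only: linear combinations of the inputs are passed through tanh to give hidden units, which are then linearly combined into the outputs. Training, i.e. setting the weights by minimizing an error function, is not shown.

import numpy as np

def two_layer_forward(X, W1, W2):
    # X  : (n_samples, n_inputs)  inputs {xi}
    # W1 : (n_hidden, n_inputs)   first-layer weights {w(1)ji}
    # W2 : (n_outputs, n_hidden)  second-layer weights {w(2)kj}
    Z = np.tanh(X @ W1.T)    # hidden units zj = tanh(sum_i w(1)ji * xi)
    Y = Z @ W2.T             # outputs yk = sum_j w(2)kj * zj
    return Y

# Toy usage with random weights (in practice the weights are set by training).
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
print(two_layer_forward(X, W1, W2).shape)   # (5, 2)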

Model-based machine learning

The central idea of the model-based approach to machine learning is to create a bespoke model

tailored specifically to each new application. In some cases, the model (together with an associated

inference algorithm) might correspond to a traditional machine learning technique,while in many cases

it will not. Typically, model-based machine learning will be implemented using a model specification

language in which the model can be defined using compact code, from which the software

implementing that model can be generated automatically. The key goals of a model-based approach

include the following

The ability to create a very broad range of models, along with suitable inference or learning algorithms,

in which many traditional machine learning techniques appear as special cases.

— Each specific model can be tuned to the individual requirements of the particular application:

for example, if the application requires a combination of clustering and classification in the


context of time-series data, it is not necessary to mash together traditional algorithms for each

of these elements (Gaussian mixtures, neural networks and hidden Markov models (HMMs), for

instance), but instead a single, integrated model capturing the desired behaviour can be

constructed.

— Segregation between the model and the inference algorithm: if changes are made to the model,

the corresponding modified inference software is created automatically. Equally, advances in

techniques for efficient inference are available to a broad range of models.

— Transparency of functionality: the model is described by compact code within a generic

modelling language, and so the structure of the model is readily apparent. Such modelling code

can easily be shared and extended within a community of model builders.

— Pedagogy: newcomers to the field of machine learning have only to learn a single modelling

environment in order to be able to access a wide range of modelling solutions. Because many

traditional methods will be subsumed as special cases of the model-based environment, there is

no need for newcomers to study these individually, or indeed to learn the specific terminology

associated with them.

Connected models

Decision Tree based methods

The fundamental learning approach is to recursively divide the training data into buckets of

homogeneous members through the most discriminative dividing criteria. The measurement of

"homogeneity" is based on the output label; when it is a numeric value, the measurement will be the

variance of the bucket; when it is a category, the measurement will be the entropy or gini index of the

bucket. During learning, various dividing criteria based on the input will be tried (in a greedy manner); when the input is a category (Mon, Tue, Wed ...), it is first turned into binary indicators (isMon, isTue, isWed ...) and the true/false values are used as decision boundaries to evaluate homogeneity; when the input is a numeric or ordinal value, lessThan/greaterThan tests at each training input value are used as decision boundaries. The training process stops when there is no significant gain in homogeneity from splitting the tree further. The members of the bucket represented at a leaf node vote for the prediction; the majority wins when the output is a category, and the member average is used when the output is numeric.
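The following sketch (hypothetical Python code, not from the original text) shows the two ingredients described above: a homogeneity measure for a bucket (variance for numeric outputs, entropy for categorical outputs) and a greedy search for the best lessThan/greaterThan split on a single numeric input.

import numpy as np
from collections import Counter

def impurity(labels, numeric=False):
    # Homogeneity of a bucket: variance for numeric targets, entropy otherwise
    # (the gini index could be used in the same way).
    labels = np.asarray(labels)
    if numeric:
        return labels.var()
    counts = np.array(list(Counter(labels.tolist()).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_numeric_split(x, y, numeric_target=False):
    # Greedily pick the threshold on one numeric input that most reduces the
    # size-weighted impurity of the two resulting buckets.
    best = (None, np.inf)
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = (len(left) * impurity(left, numeric_target) +
                 len(right) * impurity(right, numeric_target)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best   # (threshold, weighted impurity after the split)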

The good part of the tree is that it is very flexible in terms of the data types of the input and output variables, which can be categorical, binary or numeric. The level of a decision node also indicates the degree of influence of the corresponding input variable. One limitation is that each decision boundary at each split point is a hard binary decision. Also, the decision criteria consider only one input attribute at a time, not a combination of multiple input variables. Another weakness of trees is that, once learned, they cannot be updated incrementally: when new training data arrives, you have to throw away the old tree and retrain from scratch. However, trees combined with ensemble methods (e.g. Random Forests, boosted trees) address many of the limitations mentioned above. For example, gradient boosted decision trees consistently beat the performance of other ML models on many problems and are among the most popular methods these days.

Linear regression based methods

The basic assumption is that the output variable (a numeric value) can be expressed as a linear combination (weighted sum) of a set of input variables (which are also numeric values):

y = w1x1 + w2x2 + w3x3 + ...

The whole objective of the training phase is to learn the weights w1, w2, ... by minimizing the error function loss(y, w1x1 + w2x2 + ...). Gradient descent is the classical technique for solving this problem, with the general idea of adjusting w1, w2, ... in the direction of steepest descent of the loss function.

The input variables are required to be numeric. A binary variable is represented as 0/1. For a categorical variable, each possible value is represented as a separate binary variable (and hence 0/1). For the output, if it is a binary variable (0, 1), then a logistic (sigmoid) function is used to transform the range -infinity to +infinity into 0 to 1. This is called logistic regression, and a different loss function (based on maximum likelihood) is used.

To avoid overfitting, regularization (L1 and L2) is used to penalize large values of w1, w2, ... L1 adds the absolute values of the weights to the loss function, while L2 adds their squares. L1 has the property of driving the weights of redundant or irrelevant features towards zero, and is therefore a good tool for selecting highly influential features.
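A minimal sketch of this training procedure (hypothetical NumPy code; the learning rate, number of epochs and penalty strengths are arbitrary choices) is shown below: batch gradient descent on the squared-error loss, with optional L2 and L1 penalties on the weights.

import numpy as np

def fit_linear_gd(X, y, lr=0.01, l2=0.0, l1=0.0, epochs=500):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        resid = X @ w - y                   # prediction error
        grad = X.T @ resid / n              # gradient of 0.5 * mean squared error
        grad += l2 * w + l1 * np.sign(w)    # penalties on large weights
        w -= lr * grad                      # step against the gradient
    return w

# Usage on synthetic data where the third feature is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)
print(fit_linear_gd(X, y, l1=0.05))   # third weight is driven towards zero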

The strength of Linear model is that it has very high performance in both scoring and learning. The

Stochastic gradient descent-based learning algorithm is highly scalable and can handle incremental

learning.

The weakness of the linear model is the assumption of a linear relationship with the input features, which is often false. Therefore, an important feature-engineering effort is required to transform each input feature, which usually involves a domain expert. Another common approach is to try different transformation functions, such as 1/x, x^2 or log(x), in the hope that one of them has a linear relationship with the output. Linearity can be checked by observing whether the residual (y - predicted_y) is normally distributed (using a Q-Q plot against the Gaussian distribution).

Neural Network

A neural network can be considered as multiple layers of perceptrons (each a logistic regression unit with multiple binary inputs and one binary output). With multiple layers, this is equivalent to:

z = logistic(v1·y1 + v2·y2 + ...), where y1 = logistic(w11·x1 + w12·x2 + ...)

This multi-layer model enables the neural network to learn non-linear relationships between the input x and the output z. The typical learning technique is "backward error propagation", where the error is propagated from the output layer back to the input layer to adjust the weights.

Notice that this formulation expects binary inputs, which means we need to transform a categorical input into multiple binary variables. A numeric input variable can be transformed into a binary-encoded string such as 101010. Categorical and numeric outputs can be transformed in a similar way.
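As a small illustration of these transformations (hypothetical Python code; the feature values are made up), a categorical input can be expanded into one binary indicator per category, and a small non-negative integer can be written as a fixed-width bit string.

import numpy as np

def one_hot(values, categories):
    # One binary (0/1) column per possible category value.
    return np.array([[1 if v == c else 0 for c in categories] for v in values])

def to_bits(x, n_bits=8):
    # Fixed-width binary encoding of a non-negative integer.
    return np.array([(x >> i) & 1 for i in reversed(range(n_bits))])

days = ["Mon", "Wed", "Mon"]
print(one_hot(days, ["Mon", "Tue", "Wed"]))   # three indicator columns
print(to_bits(42))                            # [0 0 1 0 1 0 1 0]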

Bayesian Network

It is basically a dependency graph where each node represents a binary variable and each (directed) edge represents a dependency relationship. If NodeA and NodeB both have an edge to NodeC, this means the probability that C is true depends on the different combinations of the boolean values of A and B. NodeC can point to NodeD, and then NodeD depends on NodeA and NodeB as well. Learning is about finding, at each node, the conditional probability distribution given all incoming edges. This is done by counting the observed values of A, B and C and then updating the conditional probability table at NodeC.

Once we have the probability table at every node, we can compute the probability of any hidden node (output variable) from the observed nodes (input variables) by using Bayes' rule.
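The following sketch (hypothetical Python code with made-up observations) illustrates this counting-and-Bayes procedure on the tiny network in which A and B point to C: the conditional probability table at NodeC is estimated by counting, and Bayes' rule is then used to query a hidden variable from an observed one.

from collections import Counter

data = [  # observed (A, B, C) triples
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 0, 0),
    (0, 1, 0), (0, 1, 1), (0, 0, 0), (0, 0, 0),
]

# Learn P(C=1 | A, B) by counting observed combinations.
counts = Counter()
for a, b, c in data:
    counts[(a, b, "n")] += 1
    counts[(a, b, "c")] += c
cpt = {(a, b): counts[(a, b, "c")] / counts[(a, b, "n")]
       for (a, b) in {(x, z) for x, z, _ in data}}

# Query P(A=1 | C=1) with Bayes' rule, using counted priors for the root nodes A and B.
p_a = sum(a for a, _, _ in data) / len(data)
p_b = sum(b for _, b, _ in data) / len(data)
num = sum(cpt[(1, b)] * p_a * (p_b if b else 1 - p_b) for b in (0, 1))
den = sum(cpt[(a, b)] * (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
          for a in (0, 1) for b in (0, 1))
print(cpt, num / den)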


The strength of the Bayesian network is that it is highly scalable and can learn incrementally, because all we do is count the observed variables and update the probability tables. Similar to the neural network above, this formulation expects all data to be binary; a categorical variable needs to be transformed into multiple binary variables as described above. Numeric variables are generally not a good fit for this kind of Bayesian network.

Support Vector Machine

The Support Vector Machine takes numeric inputs and a binary output. It is based on finding a linear plane with maximum margin separating the two output classes. Categorical inputs can be turned into numeric inputs as before, and a categorical output can be modeled as multiple binary outputs. With a different loss function, the SVM can also do regression (called SVR).

The strength of the SVM is that it can handle a large number of dimensions. With a kernel function, it can handle non-linear relationships as well.
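A minimal usage sketch is shown below; it assumes the scikit-learn library is available and uses its SVC class with an RBF kernel on a toy data set whose classes are not linearly separable.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # non-linear class boundary

clf = SVC(kernel="rbf", C=1.0)   # the kernel handles the non-linear relationship
clf.fit(X, y)
print(clf.score(X, y))           # training accuracy, close to 1.0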

Nearest Neighbor

We are not learning a model at all. The idea is to find the K most similar data points in the training set and use them to interpolate the output value: the majority value for a categorical output, or the average (or weighted average) for a numeric output. K is a tunable parameter that needs to be chosen by cross-validation.

Nearest neighbor methods require the definition of a distance function that is used to find the nearest neighbors. For numeric inputs, the common practice is to normalize them by subtracting the mean and dividing by the standard deviation. Euclidean distance is commonly used when the inputs are independent; otherwise the Mahalanobis distance (which accounts for correlation between pairs of input features) should be used instead. For binary attributes, the Jaccard distance can be used.
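A minimal sketch of the procedure (hypothetical NumPy code) is given below: the features are standardized, Euclidean distances to all training points are computed, and the K nearest points vote on the output.

import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    Xs = (X_train - mu) / sigma            # subtract the mean, divide by the std
    q = (x_query - mu) / sigma
    dist = np.linalg.norm(Xs - q, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(dist)[:k]
    return np.bincount(y_train[nearest]).argmax()   # majority vote (categorical output)

X_train = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [4.8, 5.2]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 4.9]), k=3))   # -> 1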

The strength of K nearest neighbors is its simplicity, as no model needs to be trained. Incremental learning is automatic when more data arrives (and old data can be deleted as well). The data, however, needs to be organized in a distance-aware tree so that finding the nearest neighbor is O(log N) rather than O(N). On the other hand, the weakness of KNN is that it does not handle a high number of dimensions well. Also, the weighting of the different factors needs to be hand-tuned (by cross-validation over different weighting combinations), which can be a very tedious process.

Genetic algorithm

Genetic algorithm (GA) is a powerful stochastic search and optimization method based on the

mechanics of natural selection and natural genetics. The main advantage of GA is that it can search the

whole parameter space with the ability to skip local-optimal points. Therefore, the GA has been applied

to parameter estimation problems.

Charles Darwin’s principle “Survival of the fittest” can be used as a starting point in introducing

evolutionary computation. Evolved organisms demonstrate optimized complex behavior at every level: the cell, the organ, the individual and the population. Biological species have solved problems of chaos, chance, non-linear interactions and temporality, problems that are difficult for classical methods of optimization. The evolutionary concept can be applied to problems where heuristic solutions are not available or lead to unsatisfactory results. As a result, evolutionary algorithms are of great current interest, particularly for practical problem solving.

The theory of natural selection proposes that the plants and animals that exist today are the result of

millions of years of adaptation to the demands of the environment. At any given time, a number of

different organisms may co-exist and compete for the same resources in an ecosystem. The organisms

that are most capable of acquiring resources and successfully procreating are the ones whose

descendants will tend to be numerous in the future. Organisms that are less capable, for whatever

reason, will tend to have few or no descendants in the future. The former are said to be more fit than

the latter, and the distinguishing characteristics that caused the former to be fit are said to be selected

for over the characteristics of the latter. Over time, the entire population of the ecosystem is said to

evolve to contain organisms that, on average, are more fit than those of previous generations of the

population because they exhibit more of those characteristics that tend to promote survival.

Evolutionary computation (EC) techniques abstract these evolutionary principles into algorithms that

may be used to search for optimal solutions to a problem. In a search algorithm, a number of possible

solutions to a problem are available and the task is to find the best solution possible in a fixed amount


of time. For a search space with only a small number of possible solutions, all the solutions can be

examined in a reasonable amount of time and the optimal one found. This exhaustive search, however,

quickly becomes impractical as the search space grows in size. Traditional search algorithms randomly

sample (e.g., random walk) or heuristically sample (e.g., gradient descent) the search space one

solution at a time in the hope of finding the optimal solution. The key

aspect distinguishing an evolutionary search algorithm from such traditional algorithms is that it is

population-based. Through the adaptation of successive generations of a large number of individuals, an

evolutionary algorithm performs an efficient directed search. Evolutionary search is generally better

than random search and is not susceptible to the hill-climbing behaviours of gradient-based search.

Evolutionary computing (EC) began by lifting ideas from biological evolutionary theory into computer

science, and continues to look toward new biological research findings for inspiration. However, an

over enthusiastic “biology envy” can only be to the detriment of both disciplines by masking the

broader potential for two-way intellectual traffic of shared insights and analogizing from one another.

Three fundamental features of biological evolution illustrate the range of potential intellectual flow

between the two communities: particulate genes carry some subtle consequences for biological

evolution that have not yet been translated into mainstream EC; the adaptive properties of the genetic code

illustrate how both communities can contribute to a common understanding of appropriate evolutionary

abstractions; finally, EC exploration of representational language seems pre-adapted to help biologists

understand why life evolved a dichotomy of genotype and phenotype.

Why Genetic Algorithm...?

The Genetic Algorithm has a couple of important features. First, it is a stochastic algorithm; randomness plays an essential role in genetic algorithms. Both selection and reproduction need random procedures. A second very important point is that genetic algorithms always consider a population of solutions. Keeping more than a single solution in memory at each iteration offers many advantages: the algorithm can recombine different solutions to obtain better ones, and so it can exploit the benefits of assortment. A population-based algorithm is also very amenable to parallelization. The robustness of the algorithm should also be mentioned as essential to its success. Robustness refers to the ability to perform consistently well on a broad range of problem types. There is no particular requirement on the problem before using GAs, so they can be applied to almost any problem. All these features make the GA a really powerful optimization tool.


With the success of Genetic Algorithms, other algorithms based on the same principle of natural evolution have also emerged. Evolution strategies and Genetic Programming are some of these similar algorithms. The classification is not always clear between these different algorithms, so to avoid any confusion they are all gathered under the name Evolutionary Algorithms.

The analogy with nature gives these algorithms something exciting and enjoyable. Their ability to deal successfully with a wide range of problem areas, including those which are difficult for other methods to solve, makes them quite powerful. But today GAs may be suffering from too much trendiness. GAs are a comparatively young field, and parts of the theory have still to be properly established. We can find almost as many opinions on GAs as there are researchers in the field. Things evolve quickly in genetic algorithms, and some statements may not be very accurate in a few years.

What are Genetic Algorithms...?

“Genetic Algorithms are search and optimization techniques based on Darwin’s Principle of Natural

Selection.”

Evolutionary computing was introduced in the 1960s by I. Rechenberg in his work on "Evolution strategies". This idea was then developed by other researchers. Genetic Algorithms (GAs) were invented by John Holland, who developed the idea in his book "Adaptation in Natural and Artificial Systems" in 1975. Holland proposed the GA as a heuristic method based on "survival of the fittest", and it has since proved a useful tool for search and optimization problems. Genetic algorithms are search

algorithms based on the mechanics of natural selection and natural genetics. They combine survival of

the fittest among string structures with a structured yet randomized information exchange to form a

search algorithm with some of the innovative flair of human search. In every generation, a new set of

artificial creatures (strings) is created using bits and pieces of the fittest of the old; an occasional new

part is tried for good measure. Each chromosome (string) consists of “genes” (e.g., bits), each gene

being an instance of a particular "allele" (e.g., 0 or 1). The selection operator chooses those chromosomes in the population that will be allowed to reproduce, and on average the fitter chromosomes produce more offspring than the less fit ones.


Crossover exchanges subparts of two chromosomes, roughly mimicking biological recombination between two single-chromosome (haploid) organisms. Mutation randomly changes the allele values at some locations in the chromosome. Inversion reverses the order of a contiguous section of the chromosome, thus rearranging the order in which genes are arrayed. GAs efficiently exploit historical information to speculate on new search points with expected improved performance.

Evolution and Genetic Algorithms

John Holland, from the University of Michigan, began his work on genetic algorithms at the beginning of the 1960s. A first achievement was the publication of Adaptation in Natural and Artificial Systems in 1975. Holland had a double aim: to improve the understanding of the natural adaptation process, and to design artificial systems having properties similar to natural systems.

The basic idea is as follows: the genetic pool of a given population potentially contains the solution, or

a better solution, to a given adaptive problem. This solution is not “active” because the genetic

combination on which it relies is split between several subjects. Only the association of different

genomes can lead to the solution. Simply speaking, we could for example consider that the shortening of the paw and the extension of the fingers of our basilosaurus are controlled by two "genes". No individual has such a genome, but during reproduction and crossover new genetic combinations occur and, finally, an individual can inherit a "good gene" from both parents: its paw is now a flipper.

Holland's method is especially effective because he not only considered the role of mutation (mutation alone very seldom improves the search), but also utilized genetic recombination (crossover): this recombination, the crossover of partial solutions, greatly improves the capability of the algorithm to approach, and eventually find, the optimum.

Recombination, or sexual reproduction, is a key operator in natural evolution. Technically, it takes two genotypes and produces a new genotype by mixing the genes found in the originals. In biology, the most common form of recombination is crossover: two chromosomes are cut at one point and the

halves are spliced to create new chromosomes. The effect of recombination is very important because it

allows characteristics from two different parents to be assorted. If the father and the mother possess

different good qualities, we would expect that all the good qualities will be passed into the child. Thus

the offspring, just by combining all the good features from its parents, may surpass its ancestors. Many


people believe that this mixing of genetic material via sexual reproduction is one of the most powerful

features of Genetic Algorithms. As a quick aside about sexual reproduction, the representation used in Genetic Algorithms usually does not differentiate between male and female individuals. As in many living species (e.g., snails), any individual can be either a male or a female. In fact, for almost all recombination operators, mother and father are interchangeable.

Mutation is the other way to obtain new genomes. Mutation consists in changing the value of genes. In natural evolution, mutation mostly engenders non-viable genomes; in fact, mutation is not a very frequent operator in natural evolution. Nevertheless, in optimization, a few random changes can be a good way of exploring the search space quickly.

Through these low-level notions of genetics, we have seen how living beings store their characteristic information and how this information can be passed on to their offspring. It is very basic, but it is more than enough to understand Genetic Algorithm theory. Darwin was totally unaware of the biochemical basis of genetics. Now we know that the inheritable genetic information is coded in DNA, RNA and proteins, and that the coding principles are actually digital, much resembling information storage in computers. Information processing is, however, in many ways totally different.

The magnificent phenomenon called the evolution of species can also give some insight into

information processing methods and optimization in particular. According to Darwinism, inherited

variation is characterized by the following properties:

1) Variation must be copying, because selection does not directly create anything but presupposes a large population to work on.

2) Variation must be small-scaled in practice. Species do not appear suddenly.

3) Variation is undirected. This is also known as the blind watchmaker paradigm.

While the natural sciences approach to evolution has for over a century been to analyze and

study different aspects of evolution to find the underlying principles, the engineering sciences are

happy to apply evolutionary principles, that have been heavily tested over billions of years, to attack

the most complex technical problems, including protein folding.


Nature to Computer Mapping Analogy

Nature → Computer
Population → Set of solutions
Individual → Solution to a problem
Fitness → Quality of a solution
Chromosome → Encoding for a solution
Gene → Part of the encoding of a solution
Reproduction → Crossover

Figure: Nature to computer mapping

Flowchart of Genetic Algorithm

Figure 3.2: Flowchart of Genetic Algorithm

The steps involved in the Genetic Algorithm are:

Step1: Create a random Initial Population

• An initial population is created from a random selection of solutions.


• The solutions are represented by chromosomes, as in living organisms.

• The genetic information defines the behaviour of the individual.

• The genetic principles (the way in which that information encodes the individual) enable the individuals to evolve in a given environment.

• A chromosome is a packet of genetic information organized in a standard way that completely defines an individual (solution).

• The genetic structure (the way in which that information is packed and defined) enables the solutions to be manipulated.

• The genetic operators (the way in which that information can be manipulated) enable the solutions to reproduce and evolve.

Step 2: Evaluate Fitness

• A value for fitness is assigned to each solution (chromosome) depending on how close it

actually is to solving the problem.

• Therefore we need to define the problem, model it, simulate it or have a data set as sample

answers.

• Each possible solution has to be tested in the problem and the answer evaluated (or

marked) on how good it is.

• The overall mark of each solution relative to all the marks of all solutions produces a

fitness ranking.

Step 3: Produce Next Generation

• Those chromosomes with a higher fitness value are more likely to reproduce offspring.

• The population for the next Generation will be produced using the genetic operators.

• Reproduction by Copy or Crossing Over and Mutation will be applied to the chromosomes

according to the selection rule.

• This rule states that the fitter an individual is, the higher its probability of reproducing.

Step 4: Next Generation or Termination

• If the population in the last generation contains a solution that produces an output that is close


enough or equal to the desired answer then the problem has been solved.

• This is the ideal termination criterion of the evolution.

• If this is not the case, then the new generation will go through the same process as their parents

did, and the evolution will continue.

• A termination criterion that must always be included is a time-out, since one drawback of evolutionary programming is that it is very difficult (most of the time impossible) to know whether the ideal termination criterion will ever be satisfied.

The sketch below walks through these four steps on a toy problem.
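The following sketch (hypothetical Python code; the chromosome length, population size, rates and the "onemax" fitness function are arbitrary choices made only for illustration) runs the four steps on a toy problem: maximize the number of 1-bits in a chromosome.

import random

CHROM_LEN, POP_SIZE, GENERATIONS = 20, 30, 50
CROSS_RATE, MUT_RATE = 0.9, 1.0 / CHROM_LEN

def fitness(chrom):                      # Step 2: evaluate fitness
    return sum(chrom)

def select(pop):                         # Step 3a: fitness-proportionate selection
    total = sum(fitness(c) for c in pop)
    return random.choices(pop, weights=[fitness(c) / total for c in pop], k=2)

def crossover(p1, p2):                   # Step 3b: single-point crossover
    if random.random() < CROSS_RATE:
        cut = random.randint(1, CHROM_LEN - 1)
        return p1[:cut] + p2[cut:]
    return p1[:]

def mutate(chrom):                       # Step 3c: bit-flip mutation
    return [1 - g if random.random() < MUT_RATE else g for g in chrom]

# Step 1: create a random initial population.
pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):           # Step 4: next generation or termination
    if max(fitness(c) for c in pop) == CHROM_LEN:
        break                            # ideal termination criterion reached
    pop = [mutate(crossover(*select(pop))) for _ in range(POP_SIZE)]

print(max(pop, key=fitness))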

Terminologies and Operators of GA

Genetic Algorithm uses a metaphor where an optimization problem takes the place of an

environment and feasible solutions are considered as individuals living in that environment. In genetic

algorithms, individuals are encoded as binary digits or as symbols drawn from some other finite set. As computer memory is made up of arrays of bits, anything that can be stored in a computer can also be encoded by a bit string of sufficient length. Each encoded individual in the population can be viewed as a representation, according to an appropriate encoding, of a particular solution to the problem. For Genetic Algorithms to find the best (optimal) solution, it is necessary to perform certain

operations over these individuals. The basic terminologies and operators used in Genetic Algorithms to

achieve a good enough solution for possible terminating conditions are discussed here.

Key Elements

The two distinct elements in the GA are individuals and populations. An individual is a single

solution while the population is the set of individuals currently involved in the search process.

Individuals

An individual is a single solution. An individual groups together two forms of solutions, as given below:

1. The chromosome, which is the raw 'genetic' information (genotype) that the GA deals with.

2. The phenotype, which is the expression of the chromosome in terms of the model.


Figure 3.3: Representation of genotype and phenotype. The phenotype is the solution set (Factor 1, Factor 2, Factor 3, ..., Factor N); the corresponding genotype is the chromosome (Gene 1, Gene 2, Gene 3, ..., Gene N).

Figure: Representation of a chromosome as a bit string, e.g. 1 0 1 0 1 0 1 1 1 0 1 0 1 1 0

A chromosome is subdivided into genes. A gene is the GA's representation of a single control factor. Each factor in the solution set corresponds to a gene in the chromosome. Figure 3.3

shows the representation of a genotype.

A chromosome should in some way contain information about the solution that it represents. The morphogenesis function associates each genotype with its phenotype. This simply means that each chromosome must define one unique solution, but it does not mean that each solution is encoded by exactly one chromosome. Indeed, the morphogenesis function is not necessarily bijective, and bijectivity is even sometimes impossible to achieve (especially with a binary representation). Nevertheless, the morphogenesis function should at least be surjective: all the candidate solutions of the problem must correspond to at least one possible chromosome, to be sure that the whole search space can be explored. When the morphogenesis function that associates each chromosome with one solution is not injective, i.e. different chromosomes can encode the same solution, the representation is said to be degenerate. A slight degeneracy is not so worrying, even if the space in which the algorithm looks for the optimal solution is inevitably enlarged. But too much degeneracy can be a more serious problem. It can badly affect the behaviour of the GA, mostly because if several chromosomes can represent the same phenotype, the meaning of each gene will not correspond to a specific characteristic of the solution. This may add some kind of confusion to the search.


Genes

Genes are the basic "instructions" for building a Genetic Algorithm. A chromosome is a sequence of genes. Genes may describe a possible solution to a problem, without actually being the solution. A gene is a bit string of arbitrary length. The bit string is a binary representation of a number of intervals above a lower bound. A gene is the GA's representation of a single factor value for a control factor, where the control factor must have an upper bound and a lower bound. This range can be divided into the number of intervals that can be expressed by the gene's bit string. A bit string of length n can represent (2^n - 1) intervals, and the size of each interval is (range)/(2^n - 1).

The structure of each gene is defined in a record of phenotyping parameters. The phenotype parameters

are instructions for mapping between genotype and phenotype. This can also be described as encoding a solution set into a chromosome and decoding a chromosome back into a solution set. The mapping between

genotype and phenotype is necessary to convert solution sets from the model into a form that the GA

can work with, and for converting new individuals from the GA into a form that the model can

evaluate. In a chromosome, the genes are represented as in Figure 3.5.

Figure 3.5: Representation of genes within a chromosome, e.g. the bit string 1 0 1 0 | 1 1 1 0 | 1 1 1 1 | 0 1 0 1 split into Gene 1, Gene 2, Gene 3 and Gene 4.

Fitness

The fitness of an individual in a genetic algorithm is the value of an objective function for its

phenotype. For calculating fitness, the chromosome has to be first decoded and the objective function

has to be evaluated. The fitness not only indicates how good the solution is, but also corresponds to

how close the chromosome is to the optimal one. In the case of multi-criterion optimization, the fitness

function is definitely more difficult to determine. In multi-criterion optimization problems, there is

often a dilemma as how to determine if one solution is better than another. What should be done if a


solution is better for one criterion but worse for another? But here, the trouble comes more from the

definition of a 'better' solution than from how to implement a GA to resolve it. Although a fitness function obtained by a simple combination of the different criteria can sometimes give good results, this presupposes that the criteria can be combined in a consistent way. For more advanced problems, it may be useful to consider something like Pareto optimality or other ideas from multi-criteria optimization theory.

Populations

A population is a collection of individuals. A population consists of a number of individuals being

tested, the phenotype parameters defining the individuals, and some information about the search space.

The two important aspects of population used in Genetic Algorithms are:

1. The initial population generation.

2. The population size.

Figure 3.6: A population of four chromosomes, e.g.
Chromosome 1: 1 1 1 0 0 0 1 0
Chromosome 2: 0 1 1 1 1 0 1 1
Chromosome 3: 1 0 1 0 1 0 1 0
Chromosome 4: 1 1 0 0 1 1 0 0

For each problem, the population size will depend on the complexity of the problem. Often a random initialization of the population is carried out. In the case of a binary-coded chromosome this means that each bit is initialized to a random zero or one. But there may be instances where the initialization of the population is carried out with some known good solutions (see Figure 3.6).

Ideally, the first population should have a gene pool as large as possible in order to be able to explore

the whole search space. All the different possible alleles of each gene should be present in the population. To

achieve this, the initial population is, in most of the cases, chosen randomly. Nevertheless, sometimes a

kind of heuristic can be used to seed the initial population. Thus, the mean fitness of the population is

already high and it may help the genetic algorithm to find good solutions faster. But for doing this one


should be sure that the gene pool is still large enough. Otherwise, if the population badly lacks

diversity, the algorithm will just explore a small part of the search space and never find global optimal

solutions.

The size of the population raises a few issues too. The larger the population is, the easier it is to explore the search space. However, it has been established that the time required by a GA to converge is O(n log n) function evaluations, where n is the population size. We say that the population has converged when all the individuals are very much alike and further improvement is only possible through mutation. Goldberg has also shown that the GA's efficiency in reaching a global optimum instead of a local one is largely determined by the size of the population. To sum up, a large population is quite useful, but it requires much more computational cost, memory and time. In practice, a population size of around 100 individuals is quite frequent, but this size can be adjusted according to the time and memory available on the machine and the quality of the result to be reached. A population, being a combination of various chromosomes, is represented as in Figure 3.6; the population shown there consists of four chromosomes.

Genetic Operators

Selection

Selection is the process of choosing two parents from the population for crossing. After

deciding on an encoding, the next step is to decide how to perform selection i.e., how to choose

individuals in the population that will create offspring for the next generation and how many offspring

each will create. The purpose of selection is to emphasize fitter individuals in the population in hopes

that their offspring have higher fitness. Chromosomes are selected from the initial population to be

parents for reproduction. The problem is how to select these chromosomes. According to Darwin’s

theory of evolution the best ones survive to create new offspring. The Figure 3.7 shows the basic

selection process.

Selection is a method that randomly picks chromosomes out of the population according to their

evaluation function. The higher the fitness, the more chance an individual has of being selected. The selection pressure is defined as the degree to which the better individuals are favoured: the higher the selection pressure, the more the better individuals are favoured. This selection pressure drives the


GA to improve the population fitness over the successive generations.

Figure 3.7: Selection (mating pool, the two best individuals, new population)

The convergence rate of the GA is largely determined by the magnitude of the selection pressure, with higher selection pressure resulting in a higher convergence rate.

Genetic Algorithms should be able to identify optimal or nearly optimal solutions under a wide range of selection pressures. However, if the selection pressure is too low, the convergence rate will be slow and the GA will take unnecessarily long to find the optimal solution. If the selection pressure is too high, there is an increased chance of the GA prematurely converging to an incorrect (sub-optimal) solution. In addition to providing selection pressure, selection schemes should also

preserve population diversity, as this helps to avoid premature convergence.

Typically we can distinguish two types of selection scheme, proportionate selection and ordinal-based

selection. Proportionate-based selection picks out individuals based upon their fitness values relative to

the fitness of the other individuals in the population. Ordinal-based selection schemes select individuals

not upon their raw fitness, but upon their rank within the population. This means that the selection pressure is independent of the fitness distribution of the population and is based solely upon the relative ordering (ranking) of the population. The sketch below contrasts the two schemes.
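A minimal sketch of the two schemes (hypothetical Python code, with fitness taken as the number of 1-bits) is given below: roulette-wheel selection is proportionate, while tournament selection uses only the relative ranking within a random sample.

import random

def roulette_select(pop, fitness):
    # Proportionate selection: probability of being picked is proportional to fitness.
    return random.choices(pop, weights=[fitness(c) for c in pop], k=1)[0]

def tournament_select(pop, fitness, k=3):
    # Ordinal-based selection: sample k individuals and keep the best-ranked one.
    return max(random.sample(pop, k), key=fitness)

pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(10)]
print(roulette_select(pop, sum))
print(tournament_select(pop, sum))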

It is also possible to use a scaling function to redistribute the fitness range of the population in order to


adapt the selection pressure. For example, if all the solutions have their fitness in the range [999, 1000], the probability of selecting a better individual rather than any other using a proportionate-based method will be very small. If the fitness of every individual is rescaled equitably to the range [0, 1], the probability of selecting a good individual instead of a bad one becomes much larger.

Selection has to be balanced with variation from crossover and mutation. Too strong a selection means that sub-optimal, highly fit individuals will take over the population, reducing the diversity needed for

change and progress; too weak selection will result in too slow evolution.

Crossover (Recombination)

Crossover is the process of taking two parent solutions and producing from them a child. After the

selection (reproduction) process, the population is enriched with better individuals. Reproduction

makes clones of good strings but does not create new ones. The crossover operator is applied to the mating pool in the hope that it creates better offspring.

Figure: Crossover. Parent 1 = 1 0 1 0 0 0 0 0 0 0 and Parent 2 = 1 0 0 1 0 1 1 1 1 1 produce Offspring 1 = 1 0 1 1 0 1 1 1 1 1 and Offspring 2 = 1 0 1 0 0 0 0 0 0 0.

Crossover is a recombination operator that proceeds in three steps:

i. The reproduction operator selects at random a pair of two individual strings for the

mating.

ii. A cross site is selected at random along the string length.

iii. Finally, the position values are swapped between the two strings following the cross site.

That is, the simplest way to do this is to choose a crossover point at random, copy everything before this point from the first parent, and then copy everything after the crossover point from the other parent, as illustrated in the sketch below. Other crossover techniques (for example two-point and uniform crossover) follow the same idea.
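A minimal sketch of single-point crossover (hypothetical Python code, using the parent strings from the figure above) follows the three steps directly.

import random

def single_point_crossover(parent1, parent2):
    # Cut both strings at one random cross site and swap the tails.
    cut = random.randint(1, len(parent1) - 1)
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2

p1 = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
p2 = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(single_point_crossover(p1, p2))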


Mutation

After crossover, the strings are subjected to mutation. Mutation prevents the algorithm from being trapped in a local minimum. Mutation plays the role of recovering lost genetic material as well as randomly disturbing genetic information. It is an insurance policy against the irreversible loss of genetic material. Mutation has traditionally been considered a simple search operator: if crossover is supposed to exploit the current solutions to find better ones, mutation is supposed to help explore the whole search space. Mutation is viewed as a background operator to maintain genetic diversity in the population. It introduces new genetic structures into the population by randomly modifying some of its building blocks. Mutation helps escape the trap of local minima and maintains diversity in the population. It also keeps the gene pool well stocked, thus ensuring ergodicity. A

search space is said to be ergodic if there is a non-zero probability of generating any solution from any

population state.

Figure: Mutation. Offspring 1 = 1 0 1 1 0 1 1 1 1 1 mutates to 1 0 1 1 0 0 1 1 1 1, and Offspring 2 = 1 0 1 0 0 0 0 0 0 0 mutates to 1 0 0 0 0 0 0 0 0 0 (a single bit is flipped in each).

There are many different forms of mutation for the different kinds of representation. For binary

representation, a simple mutation can consist in inverting the value of each gene with a small

probability. The probability is usually taken to be about 1/L, where L is the length of the chromosome. It is also possible to implement a kind of hill-climbing mutation operator that performs the mutation only if it improves the quality of the solution. Such an operator can accelerate the search, but care should be taken, because it might also reduce the diversity in the population and make the algorithm converge towards some local optimum. Mutating a bit simply means flipping it, changing 0 to 1 and vice versa, as in the sketch below.
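A minimal sketch of bit-flip mutation with the usual 1/L probability (hypothetical Python code) is:

import random

def bitflip_mutation(chrom, rate=None):
    # Flip each bit independently with probability 1/L by default.
    if rate is None:
        rate = 1.0 / len(chrom)
    return [1 - g if random.random() < rate else g for g in chrom]

print(bitflip_mutation([1, 0, 1, 1, 0, 1, 1, 1, 1, 1]))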

Replacement


Replacement is the last stage of any breeding cycle. Two parents are drawn from a fixed-size population and breed two children, but not all four can return to the population, so two must be replaced; that is, once offspring are produced, a method must determine which of the current members of the population, if any, should be replaced by the new solutions. The technique used to decide which individuals stay in the population and which are replaced is on a par with selection in its influence on convergence. Basically, there are two kinds of methods for maintaining the population: generational updates and steady-state updates.

The basic generational update scheme consists in producing N children from a population of size N to form the population at the next time step (generation); this new population of children completely replaces the parent population. Clearly this kind of update implies that an individual can only reproduce with individuals from the same generation. Derived forms of generational update are also used, such as the (μ + λ)-update and the (μ, λ)-update. Here, from a parent population of size μ, a set of children of size λ ≥ μ is produced. Then the μ best individuals, taken from either the offspring population alone or the combined parent and offspring populations (for the (μ, λ)- and (μ + λ)-update respectively), form the next generation. A small sketch of both update rules is given below.
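A minimal sketch of the survivor-selection step for both update rules (hypothetical Python code, with fitness again taken as the number of 1-bits) is:

def next_generation(parents, offspring, fitness, plus=True):
    # Keep the mu best individuals, drawn from parents + offspring for the
    # (mu + lambda) update, or from the offspring alone for the (mu, lambda) update.
    mu = len(parents)
    pool = parents + offspring if plus else offspring
    return sorted(pool, key=fitness, reverse=True)[:mu]

parents = [[1, 1, 0, 0], [0, 0, 0, 1]]
offspring = [[1, 1, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
print(next_generation(parents, offspring, sum, plus=True))   # the two fittest overall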

In a steady state update, new individuals are inserted in the population as soon as they are created, as

opposed to the generational update where an entire new generation is produced at each time step. The

insertion of a new individual usually necessitates the replacement of another population member. The

individual to be deleted can be chosen as the worst member of the population (which leads to a very strong selection pressure) or as the oldest member of the population, but those methods are quite radical. Generally, steady-state updates use an ordinal-based method for both the selection and the replacement, usually a tournament method. Tournament replacement is exactly analogous to tournament selection, except that the less good solutions are picked more often than the good ones. A subtle alternative is to replace the most similar member of the existing population.

Search Termination (Convergence Criteria)

In short, the various stopping conditions are listed as follows:

• Maximum generations: The genetic algorithm stops when the specified numbers of generations

have evolved.


• Elapsed time: The genetic process will end when a specified time has elapsed.

Note: If the maximum number of generations has been reached before the specified time has elapsed, the process will end.

• No change in fitness: The genetic process will end if there is no change to the population's best fitness for a specified number of generations.

Note: If the maximum number of generations has been reached before the specified number of generations with no change has been reached, the process will end.

• Stall generations: The algorithm stops if there is no improvement in the objective function for a

sequence of consecutive generations of length Stall generations.

• Stall time limit: The algorithm stops if there is no improvement in the objective function during

an interval of time in seconds equal to Stall time limit.

The termination or convergence criterion finally brings the search to a halt.

Advantages of Genetic Algorithm

The Genetic Algorithm belongs to the broader field of evolutionary algorithms and offers practical advantages for several kinds of optimization problem. The advantages include the simplicity of the approach, its robust response to changing circumstances, its flexibility, and so on. This section outlines some of these advantages and offers suggestions for designing evolutionary algorithms for real-world problem solving.

Conceptual Simplicity

A key advantage of evolutionary computation is that it is conceptually simple. The algorithm consists

of initialization, iterative variation and selection in light of a performance index. In particular, no

gradient information needs to be presented to the algorithm. Over iterations of random variation and

selection, the population can be made to converge to optimal solutions. The effectiveness of an

evolutionary algorithm depends on the variation and selection operators as applied to a chosen


representation and initialization.

Broad Applicability

Evolutionary algorithms can be applied to any problem that can be formulated as a function optimization problem. Solving such a problem requires a data structure to represent solutions, a way to evaluate solutions, and variation operators to generate new solutions from old ones. Representations can be chosen by the human designer based on intuition. The representation should allow for variation operators that maintain a behavioural link between parent and offspring: small changes in the structure of a parent lead to small changes in the offspring, and large changes in the parent lead to drastic alterations in the offspring. Evolutionary algorithms can also be developed so that they tune themselves in a self-adaptive manner. This allows evolutionary computation to be applied to broad areas, including discrete combinatorial problems, mixed-integer problems and so on.

Parallelism

Evolution is a highly parallel process. As distributed processing computers become more popular and readily available, there will be increased potential for applying evolutionary computation to more

complex problems. Generally the individual solutions are evaluated independently of the evaluations

assigned to competing solutions. The evaluation of each solution can be handled in parallel and

selection only requires some serial operation. In effect, the running time required for an application

may be inversely proportional to the number of processors. Also, the current computing machines

provide sufficient computational speed to generate solutions to difficult problems in reasonable time.

Robust to Dynamic Changes

Traditional methods of optimization are not robust to dynamic changes in the environment and they

require a complete restart for providing a solution. In contrast, evolutionary computation can be used to

adapt solutions to changing circumstances. The generated population of evolved solutions provides a

basis for further improvement and in many cases, it is not necessary to reinitialize the population at

random. This method of adapting in the face of a dynamic environment is a key advantage.

Solves Problems that have no Solutions


Another advantage of evolutionary algorithms is their ability to address problems for which there is no human expertise. Even though human expertise should be used when it is needed and available, it often proves less adequate for automated problem-solving routines. Certain problems exist with expert systems: the experts may not agree, may not be qualified, may not be self-consistent, or may simply be in error. Artificial intelligence techniques may be applied to several difficult problems requiring high computational speed, but they cannot compete with human intelligence.

Limitations of Genetic Algorithm

• The problem of identifying the fitness function.

• The problem of choosing the various parameters, such as the population size, mutation rate, crossover rate, and the selection method and its strength.

• Cannot easily incorporate problem-specific information.

• Not good at identifying local optima.

• No effective terminator.

• Not effective for smooth unimodal functions.

• Needs to be coupled with a local search technique.

• Requires a large number of response (fitness) function evaluations.

Applications of Genetic Algorithm

Genetic Algorithms have drawn much attention as optimization methods over the last two decades. From the optimization point of view, their main advantage is that they do not impose many mathematical requirements on the optimization problem. All they need is an evaluation of the

objective function. As a result, they are applied to non-linear problems, defined on discrete, continuous

or mixed search spaces, constrained or unconstrained.

The applications of evolutionary computation include the following fields:

• Medicine (for example in breast cancer detection).

• Engineering applications (including electrical, mechanical, civil, production, aeronautical and robotics).


• Travelling salesman problem.

• Machine intelligence.

• Expert system

• Network design and routing

• Wired and wireless communication networks and so on.

Many activities involve unstructured, real-life problems that are difficult to model, since they involve several unusual factors. Certain engineering problems are complex in nature: job shop scheduling problems, timetabling, travelling salesman or facility layout problems. For all these applications, evolutionary computation provides a near-optimal solution at the end of an optimization run. Evolutionary algorithms are efficient for such problems because they are flexible and relatively easy to hybridize with domain-dependent heuristics.
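To make the basic mechanics concrete, here is a minimal genetic algorithm sketch in Python. It is not taken from the text: the toy OneMax objective (maximize the number of 1-bits), the bit-string representation and all parameter values are assumptions chosen only to show the loop of selection, crossover and mutation.

import random

# Minimal genetic algorithm sketch (illustrative only; the toy OneMax objective
# and all parameter values below are assumptions, not from the text).
GENES, POP, GENERATIONS, P_MUT, P_CROSS = 20, 30, 50, 0.02, 0.7

def fitness(ind):
    # OneMax: count of 1-bits; any evaluation of the objective function works here.
    return sum(ind)

def tournament(pop):
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    if random.random() < P_CROSS:
        cut = random.randrange(1, GENES)
        return p1[:cut] + p2[cut:]
    return p1[:]

def mutate(ind):
    return [1 - g if random.random() < P_MUT else g for g in ind]

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for gen in range(GENERATIONS):
    pop = [mutate(crossover(tournament(pop), tournament(pop))) for _ in range(POP)]
best = max(pop, key=fitness)
print("best fitness:", fitness(best))

The same loop applies unchanged to any representation and fitness function; only fitness, crossover and mutate need to be adapted to the problem at hand.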

Modeling genetic and biochemical networks

While the completed sequence of the human genome, today's largely unreadable Rosetta stone, still awaits deciphering, the first milestones have been passed in the adventure of deciphering genome

function. Examples are the development of microarray techniques that currently revolutionize gene

expression studies on up to genome wide scales, as well as advanced data analysis tools to reconstruct

gene regulatory interactions. Several interesting model systems are currently being studied where first

genome-wide expression analyses provide a complementary view on gene functioning, well beyond the

one-gene-one-protein perspective. Furthermore, detailed studies of separable genetic subcircuits as

functional modules of genomes are successfully performed with a number of model organisms.

However, a full understanding of genome dynamics on a larger scale is not as easily available and will

probably require a much bigger effort. Not only experimentalists, but also the theoretical sciences feel

challenged by this problem. They study possible approaches to an understanding of genome dynamics

and how a theorist can contribute to the toolbox of molecular biology and bioinformatics. One such

approach is a systems scale view of the genome as a complex interacting system of many components.

Indeed, mathematical and physical sciences have found ways to approach complex dynamic systems in

various branches of science, and one may ask whether such approaches could be applicable to the

genome and the new challenges of data-driven branches of the biosciences. For example, comparing

the basic mechanism of transcriptional regulators to that of simple switches makes approaches

applicable to basic questions in gene regulation that study complex dynamic systems and use tools


from theoretical physics. An interesting question is what could be in principle the dynamics of large

systems of interconnected (genetic) switches? While currently experimentally inaccessible in the

regulatory circuit of a genome, such a question can well be answered in theoretical model systems of

many switches. One question studied in such models directly addresses the dynamics of networks of

regulatory genes, as observed in cell control and differentiation. A second complex of questions

addresses evolutionary genomics and how gene regulation interacts with biological evolution, as for

example seen in speciation in the face of a strong requirement of stability of genetic networks.

Reconstruction of Global Genetic Network Properties

Bioinformatics algorithms for gene expression modeling face considerable difficulties, from the high

quantitative error of expression experiments to an insufficient number of data points when aimed at

genetic network reconstruction. For genetic subnetworks that are mostly modular and only loosely

connected to the rest of the genome, modeling works quite well, in particular clustering of expression data into groups of co-expressed genes (Wen et al., 1998; Basset et al., 1999). In

networks where data are not as well separated, however, clustering often is ambiguous. Also, clustering

relies on strictly linear gene-gene interactions (Bittner et al., 1999).
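As a toy illustration of this kind of analysis, the sketch below groups genes by the similarity of their expression profiles using hierarchical clustering from SciPy. The random expression matrix, the correlation-based distance and the choice of three clusters are all assumptions made purely for illustration; they are not the methods of the studies cited above.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy expression matrix: rows = genes, columns = microarray conditions.
# (Random data stands in for real measurements; this is only an illustration.)
rng = np.random.default_rng(0)
expression = rng.normal(size=(10, 6))

# Distance between genes = 1 - Pearson correlation of their expression profiles.
corr = np.corrcoef(expression)
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)

# Average-linkage hierarchical clustering on the condensed distance matrix.
tree = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(tree, t=3, criterion="maxclust")   # cut the tree into 3 clusters
print("cluster label per gene:", labels)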

Beyond clustering co-expressed genes from array data, we finally are faced with the underlying

complexity of genome-wide function. In a simple toy model at least, one finds that this problem may not be as severe in practice: a simple estimate of the number of required experiments for a full

reconstruction of a network shows that only about K log(N) microarray experiments would be

necessary for an approximate reconstruction of a full network, where K is the average number of

regulatory genes that affect a given gene and N is the overall number of genes (Hertz, 1998). While this

is clearly more than currently available datasets can offer, experiments of this size are not

inconceivable in the future. How this problem scales for real genetic networks and in the face of noisy

data is, however, quite open today.

A conceptual problem in the reconstruction of genetic networks from raw expression data that remains

unsolved is the trivial fact that measuring correlation in general is not sufficient to infer causality

between genes. Here, a combination of algorithms that closely interact with experimental data could

obtain causal information (D’haeseleer et al., 2000). As predictive genetic network models will be out

of reach for quite some time it will be worthwhile to ask some more basic questions about what


dynamical properties such model networks can, in principle, exhibit. For such an approach we will use

types of networks that are also used in the above reconstruction approaches and their basic assumptions

about transcriptional regulation.

Modeling Genetic Network Evolution Without Fitness

A computer study of artificial neutral genetic network evolution without any explicit definition of a fitness function, which explores this viewpoint further, was first published in (Bornholdt and Sneppen, 1998, 2000). It will be recapitulated in this section. An observation that challenges the role

often ascribed to fitness in evolution is the fact that one often observes different phenotypes for the

same genotype, as enabled by gene regulation and observed in such diverse examples as cell

differentiation, metamorphosis, and other epigenetic phenomena. An important non-trivial mechanism

for evolution may thus be the exposure of the same species to different environments. The species then

faces a variable selection criterion, with the consequence that what is phenotypically neutral at some

instant may not be phenotypically neutral at later instants. Thus, in contrast to the molecular neutrality

where many RNA genotypes have the same phenotype (Schuster, 1997), in genetic network neutrality

more than one phenotype for each genotype may occur.

In the following, a class of model systems is studied that exhibits epigenetics as a simple model for

transcriptional regulation. It is represented by logical networks, where nodes in the network take values

on or off, as a function of the output of specified other nodes. In terms of these models it is natural to

define genotypes in form of the topology and rules of the nodes in the network. The phenotypes are

similarly associated to the dynamical expression patterns of the network. As a prerequisite a model for

evolution should fulfill the requirement of robustness. Robustness is defined as the ability to function

in spite of substantial change in components (Savageau, 1971; Hartwell, 1997; Alon et al., 1999; Little

et al., 1999). Robustness is an important ingredient in simple molecular networks and probably also an

important feature of gene regulation on both, small and large scale. In the framework of an

evolutionary model based on logical networks, robustness is implemented by requiring that mutations

of the regulatory network do not change expression patterns. Network types that exhibit epigenetics are

Boolean networks (Kauffman, 1969), and a subset of those are the threshold networks (Kürten, 1988a,

b). In these networks each node takes on one of two discrete values, σi = ±1, that at each time step is a

function of the value of some fixed set of other nodes. The links that provide input to node i are

denoted by {wij} with wij = ±1. A crucial structural parameter of the network is its connectivity K,


defined as the average number of incoming (non-zero) weights per node. The updating rule for the dynamics on the network is defined as follows: for the threshold network case it is additive, each node being set to the sign of the weighted sum of its inputs,

σi(t + 1) = sign( Σj wij σj(t) ).

The threshold networks are well known as a type of neural networks, where a certain number of

input firings are necessary to induce firing in a given neuron (Kürten, 1988a, b). Boolean networks are

mostly discussed in connection with genetic networks, as the specificity of protein binding in principle

enables the implementation of more detailed logical functions. On the other hand, threshold networks

to a good approximation represent the basic principle of transcriptional regulation (Wagner, 1996). The

basic property of logical networks is a dynamics of the state vector {σi} characterized by transients that

lead to subsequent attractors, the periodic activity pattern to which the network dynamics converges.

The attractor length depends on the topology of the network. Below a critical connectivity Kc ~ 2

(Derrida and Pomeau, 1986; Kauffman, 1993) the network decouples into many disconnected regions,

i. e., the corresponding genome expression would become modular, with essentially independent gene

activities. Above Kc any local damage will initiate an avalanche of activity that may propagate

throughout most of the system. For any K above Kc the attractor period diverges exponentially with

respect to the number of nodes N, and in some interval above Kc the period length in fact also increases

nearly exponentially with connectivity K (Bastola and Parisi, 1996). Note that here the critical

connectivity (or coordination number) equals 2, compared to unity in usual random graphs (Erdös and

Renyi, 1960; Bollobas, 1985), due to the Boolean logic. Criticality means that a change at a node in the

network spreads marginally throughout the network. This picture is particularly simple for Boolean net-

works where any activity change of a node has the probability 1/2 to propagate along any link for

random Boolean rules, so that an average of 2 links have to leave each node to create the critical state.

For threshold networks similar arguments apply.

The evolution of the network topology is defined as a change in the wiring {wij} → {w’ij} that takes

place on a much slower time scale than the {σj} updating. The evolution of such networks represents

the extended degree of genetic network engineering that seems to be needed to account for the large

differences in the structure of species genomes (Shapiro, 1998), given the slow and steady speed of


single protein evolution (Kimura, 1983). The model will extend neutral evolution on the molecular

scale to neutral evolution on the regulatory level, and demonstrate that neutrality in itself enforces

constraints on the evolved graphs.

First it has been proposed to evolve Boolean networks with the sole constraint of continuity in

expression pattern (Bornholdt and Sneppen, 1998). Later this model has been simplified to

transcriptional regulators combined with a simple test of damage spreading (Bornholdt and

Sneppen, 2000): the model evolves a new single network from an old network by accepting rewiring

mutations with a rate determined by expression overlap. This is a minimal constraint scenario with no

outside fitness imposed. Furthermore, the model tends to select for networks which have high overlap

with neighbor mutant networks, thus securing robustness. The model is defined as follows: consider a

threshold network with N nodes. To each of these a logical variable σi = – 1 or + 1 is assigned.

The states {σi} of the N nodes are simultaneously updated according to the threshold rule given above, where the links wij are

specified by a matrix. The entry value of the connectivity matrix wij may take the values – 1 and +1 in

case of a link between i and j, and the value 0 if i is not connected to j. The system

that is evolved is the set of couplings wij in a single network. One evolutionary time step of the

network is:

(1) create a daughter network by (a) adding, (b) removing, or (c) adding and removing a weight in

the coupling matrix wij at random, each option occurring with probability p = 1/3. This means turning a

wij = 0 to a randomly chosen ±1 or vice versa.

(2) Select a random input state {σi}. Iterate simultaneously both the mother and the daughter system from this state until they either have reached and completed the same attractor cycle, or until a time where {σi} differs between the two networks. In case their dynamics is identical then replace the mother with the daughter network. In case their dynamics differs, keep the mother network.

Thus, the dynamics looks for mutations which are phenotypically silent, i. e., they are neutrally

inherited under at least some external condition. Note that adding a link involves selecting a new wij,

thus changing the rule on the same time scale as the network connectivity. Iterating these steps

represents an evolution which proceeds by checking overlap in expression pattern between networks. If

there are many states {σi} that give the same expression of the two networks, then transitions between

them are fast. On the other hand, if there are only very few states {σi} which result in the same

expression for the two networks, then the transition rate from one network to the other is small. If this


is true for all its neighbors, then the evolutionary process will be hugely slowed down. Interestingly, in contrast to existing concepts of selective neutrality (Schuster, 1997), these transition rates are not constant in this model of regulatory neutrality; instead, they are a function of the evolving connectivity K of the network.
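To make the procedure concrete, here is a small Python sketch of one version of this evolutionary loop. It is an illustrative approximation, not the published model: the network size, the sign(0) = -1 convention, the single-entry rewiring move, and the fixed-horizon test for identical dynamics (instead of iterating until a common attractor is completed) are all simplifying assumptions.

import numpy as np

rng = np.random.default_rng(1)
N = 16                       # number of nodes (genes); an arbitrary toy size

def step(w, sigma):
    # Additive threshold update; the convention sign(0) -> -1 is an assumption.
    s = np.sign(w @ sigma)
    s[s == 0] = -1
    return s

def same_dynamics(w_mother, w_daughter, horizon=50):
    # Simplified neutrality test: both networks are iterated from the same random
    # state and must agree at every step over a fixed horizon (the model in the
    # text instead iterates until a common attractor cycle is completed).
    sigma_m = sigma_d = rng.choice([-1, 1], size=N)
    for _ in range(horizon):
        sigma_m, sigma_d = step(w_mother, sigma_m), step(w_daughter, sigma_d)
        if not np.array_equal(sigma_m, sigma_d):
            return False
    return True

def mutate(w):
    # Rewire one randomly chosen entry: turn wij = 0 into +/-1, or a link into 0
    # (a simplification of the three mutation moves described above).
    w_new = w.copy()
    i, j = rng.integers(N, size=2)
    w_new[i, j] = rng.choice([-1, 1]) if w_new[i, j] == 0 else 0
    return w_new

w = np.zeros((N, N), dtype=int)          # start from an empty wiring
for t in range(2000):                    # evolutionary time steps
    w_daughter = mutate(w)
    if same_dynamics(w, w_daughter):     # accept only phenotypically silent mutations
        w = w_daughter
print("evolved average connectivity K =", np.count_nonzero(w) / N)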

Genetic Network Models and Experiment

For the neutral evolution scenario, a link to macroevolution can be drawn as the intermittent evolution

of the networks is reminiscent of punctuated equilibrium as observed for species in the fossil record

(Gould and Eldredge, 1993). Quantitatively, the 1/t² distribution of lifetimes for single networks that

one finds for this model, as well as for the earlier version (Bornholdt and Sneppen, 1998), compares

well with similar scaling observed for the statistics of birth and death of individual species in the

evolutionary record (Bak and Sneppen, 1993). In fact, the analogy can even be fine-grained into a sum

of characteristic lifetimes, each associated to a given structural feature of the networks (Bornholdt and

Sneppen, 1998). A similar decomposition is known from the fossil record (VanValen, 1973), where

groups of related species display Poisson-distributed lifetimes and, therefore, similar evolutionary

stability. Testing the models at the molecular level of gene regulation can be based either on direct probing of genetic networks or on evolution experiments with fast-lived organisms such as E. coli

(Papadopoulos et al., 1999). Information on the overall organization of these genetic networks is

obtained from correlated gene knock-out experiments. A quantitative estimate for the overall degree of

connectivity in the genome can be deduced from Elena and Lenski’s experiments (1999) on double

mutants, which demonstrated that about 30 – 60% of these (dependent on interpretation) change their

fitness in a cooperative manner. In terms of the artificial network models, one should expect a coupled

genetic expression for about half of the pairs of genes. Although the evolved networks can give such

correlations for current connectivity estimates, the uncertainty is still so large that random networks

also are in accordance with the data. Further, one should keep in mind that the E. coli genome is large and not well represented by threshold dynamics of all nodes, and also that only between 45 and 178 of E. coli's 4290 genes are likely to mediate regulatory functions (Blattner, 2000). Thus, most of the detected gene-gene correlations presumably involve genes which are not even regulatory but metabolic, and their effect on each other is more indirect than in the case of the regulatory ones. One

would obtain stronger elements of both, coupling and correlation, if one specialized on regulatory

genes. Thus one may wish to perform experiments where one- and two-point mutations are performed


in regulatory genes only. A more direct test of the hypothesis of robustness in form of damage control

as a selection criterion may be obtained from careful analysis of the evolution of gene regulation in

evolving E. coli cultures.

A further recent experimental approach is the study of the divergence of duplicate genes and the

divergence of their expression patterns. In a study in yeast (Wagner, 2000a) it was observed that the

expression patterns of duplicate genes diverge at speeds almost uncorrelated to the divergence of the

original sequence, pointing to a high flexibility on the genetic network level. Again, for the computer

experiments discussed here only coupled knock-out experiments would be conclusive, which

would be particularly interesting in duplicate genes. Another interesting experimental observation is the

simplicity of biological expression patterns. For example as observed in yeast many genes are only

active one or two times during the expression cycle (Cho et al., 1998), thus switching from off to on or

on to off occurs for each gene in this system only a few times during expression. For random dynamic

networks of comparable size one would expect a much higher activity. Thus surprisingly simple

expression patterns are observed in biological gene regulatory circuits. This bears resemblance to the first model's observation, where simplicity of expression patterns emerges as a result of the evolutionary constraint of robustness.

A common observation of the models discussed above is the emergence of networks that are

mutationally robust compared to random networks. A similar observation is made experimentally in

yeast where the robustness of the gene regulation networks against single gene mutations has been

tested (Wagner, 2000b). A main observation is that single gene mutations are often phenotypically

silent, possibly due to a buffering of the intact gene regulation circuit for this single error. Wagner’s

study seems to indicate that quite unrelated genes are major agents in this buffering, rather than quasi-

redundant copies of the mutated genes in the form of closely related genes. As an effect, this might be

an evolved response of the global genetic network to stabilizing selection.

A further key observation is the estimated average connectivity K of 2–3 in the E. coli genome (Thieffry et al., 1998). The second model of genetic network evolution by local adaptations demonstrates how such an intermediate connectivity of a regulatory network may emerge by self-organization. With respect to genetic networks, one may discuss whether biological evolution exerts selection pressure on the single-gene level that results in a selection rule similar to that of the model. Namely,

for a frozen regulatory gene which is practically non-functional to obtain a new function (obtain a new


link), as well as for a highly active gene to reduce functionality (remove a link). It is interesting to note

that the robust self-organizing algorithm described here provides a mechanism that in principle predicts

a value in the observed range.

Modeling of Biochemical Networks

Computational modeling and simulation of biochemical networks is at the core of systems biology and

this includes many types of analyses that can aid understanding of how these systems work.

Biochemical networks are intrinsically complex, not only because they encompass a large number of

interacting components, but also because those interactions are nonlinear. Like many other nonlinear

phenomena in nature, their behavior is often unintuitive and thus quantitative models are needed to

describe and understand their function. While the concept of biochemical networks arose from the

reductionist process of biochemistry, where the focus was on studying isolated enzymatic reactions, it

is now better understood in the framework of systems biology, where the focus is on the behavior of the

whole system, or at least several reactions, and particularly on what results from the interactions of its

parts. Computational modeling is thus a technique of systems biology as important as its experimental

counterparts.

From a modeling perspective, biochemical networks are a set of chemical species that can be converted

into each other through chemical reactions. The focus of biochemical network models is usually on the

levels of the chemical species and this usually requires explicit mathematical expressions for the

velocity at which the reactions proceed. The most popular representation for these models uses

ordinary differential equations (ODEs) to describe the change in the concentrations of the chemical

species. Another representation that is gaining popularity in systems biology uses probability

distribution functions to estimate when single reaction events happen and therefore track the number of

particles of the chemical species. As a general rule, the latter approach, known as stochastic simulation, is preferred when the number of particles of a chemical species is small; the ODE approach is required when the number of particles is large, because the stochastic approach would then be computationally intractable.

ODE-Based Models

Each chemical species in the network is represented by an ODE that describes the rate of change of that


species over time. The ODE is composed of an algebraic sum of terms that represent the rates of the reactions that affect the chemical species. For a chemical species X:

dX/dt = Σi si · vi

where si is a stoichiometry coefficient that is the number of molecules of X consumed or produced in

one cycle of reaction i, with a positive sign if it is produced or negative if consumed, and vi is the

velocity of reaction i. Obviously, for reactions that do not produce or consume X the corresponding si is

zero. The velocity of each reaction is described by a rate law that depends on the concentrations of the

reaction substrates, products, and modifiers. Rate laws are the subject of chemical and enzyme kinetics. Often these rate laws are saturable functions, i.e., have finite limits for high concentrations of

substrates, products, and also for many modifiers and are generally nonlinear (except the case of first-

order mass action kinetics).

An example of a rate law is depicted in Eq. 2, which represents a rate law of reaction with one substrate

(S), one product (P), and a competitive inhibitor (I)

In Eq. 2, the limiting

rate of reaction (“Vmax”) is directly represented as a product of the concentration of the enzyme and

the turnover number (E·kcat). It is usually good practice to make this product explicit, since it is then

possible to have the enzyme concentration be a variable of the model too. This is important if the

model includes protein synthesis, degradation, or protein–protein interactions. These ODE models can be used to simulate the dynamics of the concentrations of the chemical species over time, given their initial values. This is achieved by numerical integration of the system of ODEs, which can be carried out with well-established algorithms. It is also useful to find steady states of the system, which are

conditions when the concentrations of the chemical species do not change. If the steady state is such

that the fluxes are also zero, then the system is in chemical equilibrium, otherwise the fluxes are finite


meaning that the concentrations do not change because the rates of synthesis balance with the rates of

degradation for every chemical species. Steady states can be found using the Newton–Raphson

method which finds the roots of the right-hand side of the ODE (which must be zero by the definition

of steady state). Alternatively steady states can also be found by integration of the ODE.
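The following short Python sketch illustrates both uses for a hypothetical two-step pathway with a constant inflow and two irreversible Michaelis-Menten reactions. The pathway, the rate laws and every parameter value are assumptions made only for illustration; SciPy's solve_ivp performs the time-course integration, and fsolve finds a root of the right-hand side, i.e., a steady state.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import fsolve

# Toy pathway:  -> S1 -> S2 ->  (constant inflow, two enzymatic steps).
# All parameter values and the irreversible Michaelis-Menten rate laws are
# illustrative assumptions, not taken from the text.
k_in = 1.0                       # constant production rate of S1
Vmax1, Km1 = 5.0, 0.5            # enzyme 1: consumes S1, produces S2
Vmax2, Km2 = 3.0, 0.8            # enzyme 2: consumes S2

def rates(y):
    s1, s2 = y
    v0 = k_in
    v1 = Vmax1 * s1 / (Km1 + s1)
    v2 = Vmax2 * s2 / (Km2 + s2)
    # dX/dt is the stoichiometry-weighted sum of the reaction velocities.
    return np.array([v0 - v1, v1 - v2])

# Time-course simulation by numerical integration of the ODE system.
sol = solve_ivp(lambda t, y: rates(y), (0.0, 20.0), y0=[0.0, 0.0])
print("concentrations at t = 20:", sol.y[:, -1])

# Steady state: root of the right-hand side (Newton-type iteration).
steady = fsolve(rates, x0=[0.1, 0.1])
print("steady state:", steady)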

Stochastic Models

When analyzing a biochemical system which contains small numbers of particles of each reactant, the

assumption of continuous concentrations fails and consequently the underlying basis of the ODE

representation also fails. Moreover, in such conditions, stochastic effects become more pronounced and

may lead to dynamics that differ significantly from those that would result from the ODE approach. In

the conditions described above, one should then use a stochastic discrete approach for the simulation of

the system dynamics. Stochastic models represent the number of particles of each chemical species and

use a reaction probability density function (PDF) to describe the timing of reaction events. Gillespie

developed a Monte Carlo simulation algorithm, known as the stochastic simulation algorithm (SSA), with the direct and first reaction methods as its variants, that simulates the stochastic dynamics of the system by sampling this PDF (3, 4). It is important to stress that one simulation run according to this approach is only one realization of a probabilistic representation, and thus provides a limited amount of information on its own. In the stochastic formalism, it is therefore very important that simulations are repeated a sufficient number of times in order to reveal the entire range of behavior presented by such a system (i.e., to estimate a distribution for each chemical species and its dynamic evolution).
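As a minimal illustration, the sketch below applies Gillespie's direct method to a hypothetical birth-death process: constant production of a species X and first-order degradation. The reaction scheme, the rate constants and the time horizon are assumptions chosen only to show how waiting times and reaction choices are sampled.

import numpy as np

# Gillespie direct-method SSA for a toy birth-death system:
#   0 -> X  (rate k1)      X -> 0  (rate k2 * X)
# Rate constants and time horizon are arbitrary illustrative values.
rng = np.random.default_rng(42)
k1, k2 = 10.0, 0.5
x, t, t_end = 0, 0.0, 50.0
times, counts = [t], [x]

while t < t_end:
    a1, a2 = k1, k2 * x              # reaction propensities
    a0 = a1 + a2
    t += rng.exponential(1.0 / a0)   # waiting time to the next reaction event
    if rng.random() * a0 < a1:       # choose which reaction fires
        x += 1                       # birth
    else:
        x -= 1                       # death
    times.append(t)
    counts.append(x)

print("final particle count:", counts[-1], "after", len(times) - 1, "events")

Repeating such runs and averaging over them gives the distribution of particle numbers mentioned above; a single trajectory is only one realization of the process.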


MODULE V

Typical Modeling Projects: Protein Folding, Financial modeling, Medical

modeling: surgery simulation, Stochastic modeling of genetic and biochemical

networks, Population explosion in insects, epidemics


Module V

Typical Modeling Projects:

1. Protein Folding

The function of a protein can only be interpreted from its structure. The nervous

system is a network of cells, and the peculiar functional properties of these cells can be

derived from the properties and interactions of their proteins. Proteins are involved in

all stages of neural activity. Those embedded wholly or partly in membranes regulate

the transport of ions and molecules as a means of signal exchange with other cells and

the external medium. Some of them have enzymatic functions to catalyze the chemical

processes essential for function. The diverse and highly specific function of proteins is

a consequence of their sophisticated, individual surface pattern regarding shape, charge, and hydrophobicity. The surface pattern is a consequence of the unique three-dimensional structure of the polypeptide chain. Proteins are linear polymers with a nonrepetitive, specific covalent structure. The covalent structure is determined by the order

of amino acids in which they are linked together. Since Anfinsen‘s famous experiments

(1973) in the 1960s, it has been believed and today generally accepted that folding and

the resulting native structure of proteins are autonomously governed and determined by

the amino acid sequence of a particular protein and its natural solvent environment.

Protein folding is the process by which a protein structure assumes its functional shape

or conformation. It is the physical process by which a polypeptide folds into its

characteristic and functional three-dimensional structure from random coil. Each protein

exists as an unfolded polypeptide or random coil when translated from a sequence of

mRNA to a linear chain of amino acids. This polypeptide lacks any developed three-dimensional structure. Amino acids interact with each other to produce a well-defined three-dimensional structure, the folded protein, known as the native state. The

resulting three-dimensional structure is determined by the amino acid sequence. Protein

molecules are responsible for almost all biological functions in cells. In order to fulfil


their various biological roles, these chain-like molecules must fold into precise three-

dimensional shapes. Incorrect folding and clumping together of proteins is being

recognized as the cause for a growing number of age-related diseases, including

Alzheimer‘s and Parkinson‘s disease as well as other neurodegenerative disorders.

Relationship between protein and amino acid sequence

The correct three-dimensional structure is essential to function, although some parts of

functional proteins may remain unfolded. Failure to fold into native structure produces

inactive proteins that are usually toxic. Several neurodegenerative and other diseases are

believed to result from the accumulation of amyloid fibrils formed by misfolded

proteins. Many allergies are caused by the folding of the proteins, for the immune

system does not produce antibodies for certain protein structures. The amino-acid

sequence of a protein determines its native conformation. A protein molecule folds

spontaneously during or after biosynthesis. While these macromolecules may be

regarded as "folding themselves", the process also depends on the solvent (water orlipid

bilayer), the concentration of salts, the temperature, and the presence of molecular

chaperones.

Folded proteins usually have a hydrophobic core in which side-chain packing stabilizes the folded state, and charged or polar side chains occupy the solvent-exposed surface where they interact with

surrounding water. Minimizing the number of hydrophobic side-chains exposed to

water is an important driving force behind the folding process. Formation of

intramolecular hydrogen bonds provides another important contribution to protein

stability. The strength of hydrogen bonds depends on their environment, thus H-bonds

enveloped in a hydrophobic core contribute more than H-bonds exposed to the aqueous

environment to the stability of the native state. The process of folding often begins co-

translationally, so that the N-terminus of the protein begins to fold while the C-terminal

portion of the protein is still being synthesized by the ribosome. Specialized proteins

called chaperones assist in the folding of other proteins. A well studied example is the

bacterial GroEL system, which assists in the folding of globular proteins. In eukaryotic

organisms chaperones are known as heat shock proteins. Although most globular


proteins are able to assume their native state unassisted, chaperone-assisted folding is

often necessary in the crowded intracellular environment to prevent aggregation;

chaperones are also used to prevent misfolding and aggregation that may occur as a

consequence of exposure to heat or other changes in the cellular environment.

There are two models of protein folding that are currently being confirmed. The first is the diffusion-collision model, in which a nucleus is formed, then the secondary structure elements are formed, and finally these secondary structures collide and pack tightly together. The second is the nucleation-condensation model, in which the secondary and tertiary structures of the protein are made at the same time. Recent studies have shown that some proteins show characteristics of both of these folding models.

Disruption of the native state

Under some conditions proteins will not fold into their biochemically functional forms.

Temperatures above or below the range that cells tend to live in will cause thermally

unstable proteins to unfold or "denature" (this is why boiling makes an egg white turn

opaque). High concentrations of solutes, extremes of pH, mechanical forces, and the

presence of chemical denaturants can do the same. Protein thermal stability is far from

constant, however. For example, hyperthermophilic bacteria have been found that grow

at temperatures as high as 122 °C, which of course requires that their full complement

of vital proteins and protein assemblies be stable at that temperature or above.

A fully denatured protein lacks both tertiary and secondary structure, and exists as a so-

called random coil. Under certain conditions some proteins can refold; however, in many cases, denaturation is irreversible. Cells sometimes protect their proteins against

the denaturing influence of heat with enzymes known as chaperones or heat shock

proteins, which assist other proteins both in folding and in remaining folded. Some

proteins never fold in cells at all except with the assistance of chaperone molecules,

which either isolate individual proteins so that their folding is not interrupted by

interactions with other proteins or help to unfold misfolded proteins, giving them a


second chance to refold properly. This function is crucial to prevent the risk of

precipitation into insoluble amorphous aggregates.

Folding Mechanisms and Kinetics

A unified view of protein folding should be general enough to interpret the

diverse experimental findings of the field. Thermodynamics offers such a universal

approach. Thermodynamic systems in equilibrium occupy the states with lowest Gibbs

free energy at constant pressure and temperature. The Gibbs free energy (G) consists of

an enthalpy and an entropic term

G(q) = H(q) − T·S(q)

where H is the enthalpy, T the absolute temperature, and S the entropy of the

protein, and q represents the reaction coordinate used to describe the progress of the

protein advancing from the unfolded toward the native state. Under physiological

conditions, proteins maintain their native structure because the favorable enthalpic term

arising from the solvent and protein interactions exceeds in magnitude the unfavorable

entropic term, and therefore the native state has a smaller Gibbs free energy than the

denatured state. The stability of the protein depends on the solvent–solvent, protein–solvent, and protein–protein interactions. These interactions depend on the intensive parameters that describe the thermodynamic state of the system.

The enthalpic and the entropic terms are large, but of opposite sign, and almost

cancel each other. The Gibbs free energy difference between the biologically active and

denatured states of the proteins is rather small (Scharnagl et al., 2005). Proteins are

stable only within a narrow range of conditions and can be denatured by changing

virtually any of the intensive parameters (Shortle, 1996). Experiments prove that

proteins can be unfolded by heat (Tsai et al., 2002; Prabhu and Sharp, 2005), cold

(Franks, 1995; Kunugi and Tanaka, 2002), high pressure (Smeller, 2002; Meersman et

al., 2006), extreme pH (Puett, 1973; Fitch et al., 2006), and addition of salts (Pfeil,

1981).
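A small numerical sketch makes the near-cancellation explicit. The enthalpy and entropy values below are assumed, order-of-magnitude figures only (they are not from the text); the point is that two large terms of opposite sign leave a small net Gibbs free energy of folding and hence a marginally stable native state.

import numpy as np

# Illustrative two-state stability calculation (all numbers are assumptions):
# folding enthalpy and entropy almost cancel, leaving a small net Gibbs energy.
R = 8.314e-3            # gas constant, kJ/(mol*K)
T = 310.0               # temperature, K
dH = -300.0             # enthalpy change of folding, kJ/mol (assumed)
dS = -0.87              # entropy change of folding, kJ/(mol*K) (assumed)

dG = dH - T * dS        # Gibbs free energy of folding
K_eq = np.exp(-dG / (R * T))          # equilibrium constant [F]/[U]
frac_folded = K_eq / (1.0 + K_eq)     # fraction of molecules in the native state

print(f"dG = {dG:.1f} kJ/mol, fraction folded = {frac_folded:.3f}")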


Studies of protein stability and folding systematically change one or more of the

intensive parameters and follow the kinetics of the change and/or the shift of

equilibrium. There is a broad selection of methods that can be used to follow the

structural changes of the proteins, including fluorescence (Isenman et al., 1979;

Vanhove et al., 1998), phosphorescence (Mersol et al., 1993; Mazhul‘ et al., 2003),

circular dichroism (Kelly and Price, 2000), infrared spectroscopy (Fabian and

Naumann, 2004; Ma and Gruebele, 2005),nuclear magnetic resonance (Englander and

Mayne, 1992; Kamatari et al., 2004), and mass spectroscopy (Miranker et al., 1996;

Konermann and Simmons, 2003).

Both theoretical and experimental results indicate that a single reaction

coordinate in general is not enough to describe protein folding, and multiple reaction

coordinates must be used. (Becker and Karplus, 1997; Ma and Gruebele, 2005).

Finding the adequate reaction coordinates for protein folding is not straightforward.

Several kinetic and thermodynamic coordinates have been used to describe the

'nativeness' of a given protein state. Thermodynamic reaction coordinates use a thermodynamic parameter, e.g., Gibbs free energy and/or entropy, to define the

distance between the native state and the actual state of a protein. The kinetic reaction

coordinate measures the time needed for the protein to reach the native state from a

given starting state. An important thermodynamic reaction coordinate often used to

describe the folding process is the number of native contacts present in the

conformation, which proved useful in interpreting simple folding processes.

Thermodynamic reaction coordinates, however, are inadequate to describe folding

dominated by kinetic traps because they completely ignore the Gibbs free-energy

barriers separating the different states (Sali et al., 1994; Wolynes et al., 1995; Chan and

Dill, 1998).

The Gibbs free energy barrier to folding is determined by the unfavorable loss

in configurational entropy upon folding and the gain in stabilizing native interactions.

Starting from the unfolded protein, the polypeptide chain has to fold partially in order

to bring together the residues that need to form the contacts stabilizing the native

structure. The constrained polypeptide chain has smaller entropy, which means higher

Gibbs free energy. As native contacts form, the enthalpy term decreases, the protein is


stabilized. The rate limiting step in the folding process is the formation of the transition

state, i.e., the conformation that has the highest Gibbs free energy on the folding

pathway (Chan and Dill, 1998; Lindorff Larsen et al., 2005).

The simplest model for unfolding and refolding involves a single cooperative

folding step, in which the unfolded (U) and folded (F) states of the protein

interconvert: U ↔ F. This simple mechanism well describes the folding of several

small proteins (Gillespie and Plaxco, 2004). The formation of a contact between two

residues in the transition state involves an entropic cost which depends on the

sequence separation of the two residues: the longer the chain between them the greater

the entropic cost, and this entropic cost contributes to the height of the Gibbs free-energy barrier between the unfolded and the folded state. If nonnative

interactions play a marginal role in the transition state, it is possible to estimate the

folding kinetics from the average sequence separation of the contacts in the native

structure (Plaxco et al., 1998; Grantcharova et al., 2001; Zarrine Afsar et al., 2005).
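A tiny numerical sketch of this two-state U ↔ F scheme is given below. The rate constants are assumed illustrative values; starting from fully unfolded protein, the folded fraction relaxes exponentially toward its equilibrium value with observed rate k_f + k_u.

import numpy as np

# Two-state folding kinetics U <-> F (illustrative; rate constants are assumed).
k_f, k_u = 100.0, 1.0          # folding and unfolding rate constants, 1/s (assumed)
t = np.linspace(0.0, 0.05, 6)  # times in seconds

f_eq = k_f / (k_f + k_u)                          # equilibrium folded fraction
folded = f_eq * (1.0 - np.exp(-(k_f + k_u) * t))  # analytic solution from all-U start
for ti, fi in zip(t, folded):
    print(f"t = {ti:.3f} s   folded fraction = {fi:.3f}")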

Intermediate structures were observed to accumulate during the folding

of many proteins (Englander, 2000). Such intermediate states are trapped structures that

have low Gibbs free energy. Mass action models that involve one or more intermediate

states were constructed to explain more complex folding kinetics. Mass action models

distinguish between 'on pathway' and 'off pathway' intermediates depending on

whether the intermediate is on the folding pathway between the unfolded and native

states (Baldwin, 1996). Off pathway intermediates often correspond to misfolded

structures that must completely or partially unfold to allow formation of the native fold

(Evans et al., 2005). The most general theory of protein folding is a statistical

mechanical model that uses the concept of the energy landscape, which is discussed in

more detail later. Here we only want to clarify that in the energy landscape view, there

is no clear distinction between on and off pathway intermediates. Folding mechanisms

involving these two types of intermediates only differ in the distribution of traps on a

Gibbs free energy landscape (Onuchic and Wolynes, 2004; Jahn and Radford, 2005).

Energy landscape theory of protein folding predicts that the enthalpic and the entropic

term of the transition state Gibbs free energy can cancel each other, leading to folding


that lacks an activation barrier (Bryngelson et al., 1995). Although such downhill

folding has indeed been found experimentally, it seems to be atypical, probably because such proteins are evolutionarily unfavorable (Yang and Gruebele, 2004a). The probable

reason for this is that proteins that fold downhill lack the barrier that prevents partial

unfolding, and thus become more prone to aggregation (Yang and Gruebele, 2003;

Gruebele, 2005).

The folding process is usually not restricted to a narrow path in conformational space.

Each molecule may follow a different path, and the molecule population can be very

heterogeneous. Mass action models are inadequate to describe these folding processes,

and the more complex energy landscape view must be used.

Computational methods for studying protein folding

Energy landscape of protein folding

The protein folding phenomenon was largely an experimental endeavor until the

formulation of an energy landscape theory of proteins by Joseph Bryngelson and Peter

Wolynes in the late 1980s and early 1990s. This approach introduced the principle of

minimal frustration. This principle says that nature has chosen amino acid sequences so

that the folded state of the protein is very stable. In addition, the undesired interactions

between amino acids along the folding pathway are reduced making the acquisition of

the folded state a very fast process. Even though nature has reduced the level of

frustration in proteins, some degree of it remains up to now as can be observed in the

presence of local minima in the energy landscape of proteins. A consequence of these

evolutionarily selected sequences is that proteins are generally thought to have globally

"funneled energy landscapes" that are largely directed toward the native state. This

"folding funnel" landscape allows the protein to fold to the native state through any of a

large number of pathways and intermediates, rather than being restricted to a single

mechanism. The theory is supported by both computational simulations of model

proteins and experimental studies, and it has been used to improve methods for protein

structure prediction and design. The description of protein folding by the leveling free-energy landscape is also consistent with the second law of thermodynamics. Physically,


thinking of landscapes in terms of visualizable potential or total energy surfaces simply

with maxima, saddle points, minima, and funnels, rather like geographic landscapes, is

perhaps a little misleading. The relevant description is really a highly dimensional phase

space in which manifolds might take a variety of more complicated topological forms.

Modeling of protein folding

Molecular Dynamics (MD) is an important tool for studying protein folding and

dynamics in silico. The first equilibrium folding simulations were done using an implicit solvent model and umbrella sampling. Because of computational cost, ab initio MD

folding simulations with explicit water are limited to peptides and very small proteins.

MD simulations of larger proteins remain restricted to dynamics of the experimental

structure or its high-temperature unfolding. In order to simulate long-time folding

processes (beyond about 1 microsecond), like folding of small-size proteins (about 50

residues) or larger, some approximations or simplifications in protein models may be

introduced to speed up the calculation process. The 5-petaFLOP distributed computing

project Folding@home simulates protein folding using the idle processing time of

PlayStation 3s and the CPU and GPU of personal computers from volunteers. The

project aims to understand protein misfolding and accelerate drug design for disease

research.

Protein folding: mechanisms and role in disease

Most

biological functions in living cells are performed by proteins, chain-polymers of amino

acids that are synthesized on ribosomes based on genetic information. Upon synthesis,

protein chains must fold into unique three-dimensional structures in order to become

biologically active. While in the test-tube this folding process can occur spontaneously,

in the cell most proteins require assistance for proper folding by so-called 'molecular chaperones'. These are specialized proteins which protect other, not-yet folded proteins

from mis-folding and clumping together (aggregating) in the highly crowded cellular

environment. However, proteins do not always fold correctly, despite the existence of a

complex cellular machinery of protein quality control. In particular, an increasing

number of neurodegenerative diseases have been recognized in recent years to be

caused by the accumulation of protein aggregates in the brain and other parts of the

central nervous system. These disorders, including Alzheimer‘s dementia, Parkinson‘s

disease and Chorea Huntington, are typically age-related and impose a large social and

economic burden on the aging societies of industrialized nations. Currently, there is no

cure for any of these diseases, and it is believed that a concerted research effort in the

areas of protein folding, combined with a systems biology analysis of the networks of

protein quality control will provide the knowledge base for the development of new

therapeutic strategies.

Simulations of Protein Folding and Unfolding

Computational models and simulations have greatly advanced our understanding

of protein folding. The information from such studies is complementary to experiments.

In fact, there is a synergy between theory and experiment: theory provides testable

models and experiments provide the means to test and validate the models. The

outcome from this combination is a much richer view of the system in question than

what either approach could provide alone. In particular, simulations can help identify or

predict transition and intermediate states along the folding pathway, provide predictions

of the rate of folding and in some cases, predict the final, folded structure.

Simulating protein folding presents a significant challenge. Small proteins typically fold

in the several microseconds to seconds timescale; detailed atomistic simulations,

however, are currently limited to the nanosecond to microsecond regime. Therefore,

simulation of folding requires either simplified models or special sampling methods,

both of which introduce new approximations.


All Atom Models

The most straightforward approach to simulating protein folding and unfolding is to use

an all atom model with a force field like AMBER or CHARMM and apply traditional

molecular dynamics simulation. These force fields describe the energies of the

deformations of covalent bonds as well as van der Waals interactions, charge–charge

interactions, hydrogen bonds, and so on. Traditional molecular dynamics numerically

solves Newton‘s equations of motion by calculating the forces acting on atoms and

computing accelerations, velocities, and atomic displacements. Temperature is assigned

to the system by assigning appropriate velocities to the atoms. To simulate unfolding,

the simplest method of increasing sampling is to increase the temperature of the

simulation to 498 K or more. At these temperatures, the native structure of the protein is

usually lost within a few nanoseconds. This technique has been applied to numerous

examples, including bovine pancreatic trypsin inhibitor (Kazmirski and Daggett,

1998a), lysozyme (Kazmirski and Daggett, 1998b), myoglobin (Tirado-Rives and Jorgensen, 1993), barnase (Wong et al., 2000), ubiquitin (Alonso and Daggett, 1998),

the SH3 domain (Tsai et al., 1999), etc. Features of the unfolding process such as the

transition state ensemble or the unfolded ensemble have shown remarkable agreement

with experimental results (Day and Daggett, 2003).
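To illustrate what numerically solving Newton's equations of motion means in practice, here is a toy Python sketch of the velocity Verlet integration scheme for a single particle in a one-dimensional harmonic potential. Only the integration step is the point: the potential, mass and units are assumptions, and a real force field such as AMBER or CHARMM of course involves many atoms and interaction terms.

import numpy as np

# Toy molecular dynamics step: velocity Verlet integration of Newton's equations
# for one particle in a 1-D harmonic potential U(x) = 0.5*k*x^2.
# (A stand-in for a real force field; all units are arbitrary.)
k, m, dt, n_steps = 1.0, 1.0, 0.01, 1000

def force(x):
    return -k * x               # F = -dU/dx

x, v = 1.0, 0.0                 # initial position and velocity
f = force(x)
for step in range(n_steps):
    x += v * dt + 0.5 * (f / m) * dt**2        # update position
    f_new = force(x)
    v += 0.5 * (f + f_new) / m * dt            # update velocity with averaged force
    f = f_new

energy = 0.5 * m * v**2 + 0.5 * k * x**2       # total energy should stay ~constant
print(f"x = {x:.3f}, v = {v:.3f}, total energy = {energy:.4f}")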

A direct simulation of folding is a much harder problem. The longest continuous all

atom molecular dynamics simulation so far is still the 1 μs simulation of the villin headpiece subdomain performed by Duan and Kollman in 1998 (Duan and Kollman, 1998). In this simulation, the chain sampled a large number of conformations after initial collapse, and near-native structures appeared but the true native conformation

was not reached. Although processor speeds have dramatically increased since 1998,

computational power is still insufficient to allow a meaningful all atom simulation of

the entire folding process. Even if the native state could be reached in a single

trajectory, multiple simulations would have to be performed to construct a believable

folding pathway.

An interesting alternative is the massively distributed method employed in the 'Folding at Home' project (Pande et al., 2003). A large number of computers run the simulation


in parallel. As soon as a transition is detected (as a momentary surge in the heat

capacity) in one of the simulations, all computers receive a copy of the posttransition

conformation and the simulation is continued until the native conformation is reached.

Although this approach has been criticized as being flawed (Fersht and Daggett, 2002),

it has been successfully applied to fold several small, fast-folding proteins (Snow et al.,

2004; Sorin and Pande, 2005).

Simplified Models

In simplified (coarse grained) models (Dokholyan, 2006), effective particles (beads)

represent amino acids or groups of atoms. An empirical potential function, usually

derived from protein structures, is used to describe the interaction between these beads.

The shape of this potential is often very simple, such as a square well function. In many

simplified models, the positions of the beads are restricted to points on a lattice. Perhaps

the most minimal model is the one where there are only two types of amino acids:

hydrophobic and polar, and the chain is restricted to a two-dimensional lattice (Dill et al., 1995). Smaller on-lattice model proteins allow an exhaustive enumeration of all

possible states of the given system. This approach allows a complete thermodynamic

description of the phase space and has greatly enhanced our understanding of protein

folding. The funnel view and the concept of energy landscapes arose directly from the

exhaustive sampling allowed by these minimal models (Bryngelson et al., 1995).

In the case of larger, more complex simplified models, exhaustive enumeration of states

is not possible. Monte Carlo is a common choice for simulating simplified models. In

Monte Carlo simulation, small moves are generated randomly and accepted or rejected

based on the energy of the new conformation. This is often performed in the framework

of advanced sampling schemes such as Replica Exchange Monte Carlo, where several

replicas of the system are simulated at various temperatures (Kihara et al., 2001;

Pokarowski et al., 2003). A more recent simulation approach, termed discrete (or

discontinuous) molecular dynamics (DMD) (Smith and Hall, 2001; Ding and

Dokholyan, 2005), extends the accessible simulation time by using long integration time

steps with approximated energy functions. Simple models like this start showing

remarkable success. Recently, the Trp-cage, a 20-residue miniprotein, has been folded to a


conformation very close to the experimental structure (Ding et al., 2005a). It is believed

that this technique will be applicable to larger proteins.
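The core of such a Monte Carlo simulation is the Metropolis acceptance rule, sketched below. The one-dimensional toy energy function and the temperature are assumptions standing in for a real coarse-grained conformational energy; only the accept/reject logic is the point of the example.

import math, random

# Minimal Metropolis Monte Carlo sketch (illustrative; the toy "energy" below is
# an arbitrary assumed function, not a protein force field). A small random move
# is proposed and accepted with probability min(1, exp(-dE/kT)).
random.seed(0)
kT = 1.0

def energy(conf):
    # Toy energy: a rugged 1-D landscape standing in for a conformational energy.
    return math.sin(3 * conf) + 0.5 * conf**2

conf = 2.0                              # current "conformation" (one coordinate)
E = energy(conf)
for step in range(10000):
    trial = conf + random.uniform(-0.1, 0.1)      # small random move
    E_trial = energy(trial)
    dE = E_trial - E
    if dE <= 0 or random.random() < math.exp(-dE / kT):
        conf, E = trial, E_trial                  # accept the move
print(f"final conformation = {conf:.3f}, energy = {E:.3f}")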

Multiscale Modeling

Approaches using simplified, coarse grained models can be combined with fine grained,

all atom simulations in what is called multiscale molecular modeling. Bradley and

coworkers (2005) reported high resolution structure prediction for proteins up to 85

residues using a multiscale approach that sampled low and high resolution

conformations. DMD simulations combined with all-atom, traditional molecular

dynamics have been used to simulate the formation of a β-helix (Khare et al., 2005) and

to identify the transition state of the SH3 domain (Ding et al., 2005b). Although it

remains to be seen whether the force fields used are transferable to larger proteins,

multiscale modeling has the potential to break the 1 ms barrier of direct folding

simulations.

II. Financial modeling

Financial modeling is the task of building an abstract representation (a model) of a

financial decision making situation. This is a mathematical model designed to represent

(a simplified version of) the performance of a financial asset or portfolio of a business,

project, or any other investment. Financial modeling is a general term that means

different things to different users; the reference usually relates either to accounting and

corporate finance applications, or to quantitative finance applications. While there has

been some debate in the industry as to the nature of financial modeling - whether it is a

trade-craft, such as welding, or a science - the task of financial modeling has been

gaining acceptance and rigor over the years. Typically, financial modeling is understood

to mean an exercise in either asset pricing or corporate finance, of a quantitative nature.

In other words, financial modelling is about translating a set of hypotheses about the

behavior of markets or agents into numerical predictions; for example, a firm's decisions

about investments (the firm will invest 20% of assets), or investment returns (returns on

"stock A" will, on average, be 10% higher than the market's returns).


In corporate finance, investment banking, and the accounting profession, financial

modeling is largely synonymous with cash flow forecasting. This usually involves the

preparation of detailed company specific models used for decision making purposes and

financial analysis.

Applications include:

Business valuation, especially discounted cash flow, but including other

valuation problems

Scenario planning and management decision making ("what is"; "what if"; "what

has to be done")

Capital budgeting

Cost of capital (i.e. WACC) calculations

Financial statement analysis (including operating and finance leases, and

R&D)

Project finance

To generalize as to the nature of these models: firstly, as they are built around financial

statements, calculations and outputs are monthly, quarterly or annual; secondly, the

inputs take the form of "assumptions", where the analyst specifies the values that will

apply in each period for external / global variables (exchange rates, tax percentage,

etc.) and internal / company-specific variables (wages, unit costs, etc.).

Correspondingly, both characteristics are reflected (at least implicitly) in the

mathematical form of these models: firstly, the models are in discrete time; secondly,

they are deterministic. For discussion of the issues that may arise, see below; for

discussion as to more sophisticated approaches sometimes employed, see Corporate

finance: Quantifying uncertainty.
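A minimal sketch of such a deterministic, discrete-time model is given below: per-period assumptions drive a free-cash-flow forecast that is discounted to a present value, with a Gordon-growth terminal value. All figures (growth, margin, discount rate, terminal growth) are hypothetical placeholders, not recommendations.

    # Minimal sketch of a deterministic, discrete-time discounted-cash-flow model.
    def dcf_valuation(revenue0, growth, margin, discount_rate,
                      terminal_growth, years=5):
        """Value a stream of free cash flows plus a Gordon-growth terminal value."""
        value = 0.0
        revenue = revenue0
        fcf = 0.0
        for t in range(1, years + 1):
            revenue *= (1 + growth)           # assumption: constant revenue growth
            fcf = revenue * margin            # assumption: constant FCF margin
            value += fcf / (1 + discount_rate) ** t
        terminal = fcf * (1 + terminal_growth) / (discount_rate - terminal_growth)
        value += terminal / (1 + discount_rate) ** years
        return value

    print(round(dcf_valuation(100.0, 0.08, 0.15, 0.10, 0.02), 2))

Spreadsheet implementations follow the same logic, with each period occupying a column and each assumption a labelled input cell.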

Modellers are sometimes referred to (tongue in cheek) as "number crunchers", and are

often designated "financial analyst". Typically, the modeller will have completed an

MBA or MSF with (optional) coursework in "financial modeling". Accounting

qualifications and finance certifications such as the CIIA and CFA generally do not


provide direct or explicit training in modeling. At the same time, numerous commercial

training courses are offered, both through universities and privately.

Although purpose-built software does exist, the vast proportion of the market is

spreadsheet-based; this is largely because the models are almost always company-specific.

Microsoft Excel now has by far the dominant position, having overtaken Lotus 1-2-3 in

the 1990s. Spreadsheet-based modelling can have its own problems, and several

standardizations and "best practices" have been proposed. "Spreadsheet risk" is

increasingly studied and managed.

One critique here is that model outputs, i.e. line items, often incorporate "unrealistic

implicit assumptions" and "internal inconsistencies". (For example, a forecast for

growth in revenue but without corresponding increases in working capital, fixed assets

and the associated financing, may embed unrealistic assumptions about asset turnover,

leverage and / or equity financing.) What is required, but often lacking, is that all key

elements are explicitly and consistently forecast. Related to this is that modellers

often additionally "fail to identify crucial assumptions" relating to inputs, "and to

explore what can go wrong". Here, in general, modellers "use point values and simple

arithmetic instead of probability distributions and statistical measures" - i.e., as

mentioned, the problems are treated as deterministic in nature - and thus calculate a

single value for the asset or project, but without providing information on the range,

variance and sensitivity of outcomes. Other critiques discuss the lack of adequate

spreadsheet design skills, and of basic computer programming concepts. More serious

criticism, in fact, relates to the nature of budgeting itself, and its impact on the

organization.
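The sketch below illustrates the remedy this critique points toward: replacing point-value assumptions with probability distributions and reporting a range of outcomes rather than a single number. The distributions and their parameters are purely illustrative.

    import random
    import statistics

    # Sketch of a Monte Carlo NPV: point estimates for growth and margin are
    # replaced by (hypothetical) probability distributions.
    def simulate_npv(n_trials=10000, discount_rate=0.10, years=5):
        npvs = []
        for _ in range(n_trials):
            growth = random.gauss(0.08, 0.03)      # uncertain revenue growth
            margin = random.gauss(0.15, 0.02)      # uncertain cash-flow margin
            revenue, npv = 100.0, 0.0
            for t in range(1, years + 1):
                revenue *= (1 + growth)
                npv += revenue * margin / (1 + discount_rate) ** t
            npvs.append(npv)
        return npvs

    npvs = simulate_npv()
    print("mean NPV:", round(statistics.mean(npvs), 2))
    print("5th percentile:", round(sorted(npvs)[len(npvs) // 20], 2))

The output now conveys variance and downside risk, not just a single central value.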

Quantitative finance

In quantitative finance, financial modeling entails the development of a sophisticated

mathematical model. Models here deal with asset prices, market movements, portfolio

returns and the like. A key distinction is between models of the financial situation of a

large, complex firm or "quantitative financial management", models of the returns of


different stocks or "quantitative asset pricing", models of the price or returns of

derivative securities or "financial engineering" and models of the firm's financial

decisions or "quantitative corporate finance".

Applications include:

Other derivatives, especially Interest rate derivatives and Exotic derivatives

Modeling the term structure of interest rates (short rate modelling) and credit

spreads

Credit scoring and provisioning

Corporate financing activity prediction problems

Portfolio problems

Real options

Risk modeling and Value at risk.

These problems are often stochastic and continuous in nature, and models here thus

require complex algorithms, entailing computer simulation, advanced numerical

methods (such as numerical differential equations, numerical linear algebra, dynamic

programming) and/or the development of optimization models. Modellers are generally

referred to as "quants" (quantitative analysts), and typically have advanced (Ph.D. level)

backgrounds in quantitative disciplines such as physics, engineering, computer science,

mathematics or operations research. Alternatively, or in addition to their quantitative

background, they complete a finance masters with a quantitative orientation, such as the

Master of Quantitative Finance, or the more specialized Master of Computational

Finance or Master of Financial Engineering.

Although spreadsheets are widely used here also (almost always requiring extensive

VBA), custom C++ or numerical analysis software such as MATLAB is often preferred,

particularly where stability or speed is a concern. Matlab is the tool of choice for doing

economics research because of its intuitive programming, graphical and debugging


tools, but C++/Fortran are preferred for conceptually simple but computationally

intensive applications where MATLAB is too slow. Additionally, for many (of the standard)

derivative and portfolio applications, commercial software is available, and the choice

as to whether the model is to be developed in-house, or whether existing products are to

be deployed, will depend on the problem in question.
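As one small example of the simulation-based methods referred to above, the sketch below estimates a one-day Value at Risk for a single position by simulating returns. The assumption of i.i.d. normally distributed returns and the chosen parameters are illustrative only; production VaR models are considerably more elaborate.

    import numpy as np

    # Sketch of a simulation-based Value-at-Risk estimate for one position.
    rng = np.random.default_rng(0)

    portfolio_value = 1_000_000.0     # hypothetical position size
    mu, sigma = 0.0005, 0.02          # assumed daily mean return and volatility

    daily_returns = rng.normal(mu, sigma, size=100_000)
    pnl = portfolio_value * daily_returns

    # 99% one-day VaR: the loss exceeded on only 1% of simulated days
    var_99 = -np.percentile(pnl, 1)
    print(f"1-day 99% VaR: {var_99:,.0f}")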

The complexity of these models may result in incorrect pricing or hedging or both. This

Model risk is the subject of ongoing research by finance academics, and is a topic of

great, and growing, interest in the risk management arena.

Criticism of the discipline (often preceding the financial crisis of 2007-2008 by several

years) emphasizes the differences between the mathematical and physical sciences and

finance, and the resultant caution to be applied by modelers, and by traders and risk

managers using their models. Notable here are Emanuel Derman and Paul Wilmott,

authors of the Financial Modelers' Manifesto. Some go further and question whether

mathematical- and statistical modeling may be applied to finance at all, at least with the

assumptions usually made (for options; for portfolios). In fact, these may go so far as to

question the "empirical and scientific validity... of modern financial theory". Notable

here are Nassim Taleb and Benoit Mandelbrot.

The modeling of the financial statements components of an entity is a unique area of

spreadsheet modeling, because it involves the systematic linking in of information from

almost all of the other spreadsheet modeling areas. This section is designed to provide:

An overview of the concepts that are required to be understood in order to

undertake financial statements modeling;

An explanation of the links between the three financial statements that ensure

that the relationships between them are maintained at all times; and

A general understanding of the different ways in which information is correctly

and logically linked into each of the financial statements.


If undertaken according to the principles enunciated in this documentation, with

the correct use of error checks, the modeling of the financial statements component of

an entity should be the easiest part of the spreadsheet model development process.

Financial Statement Impacts

There are five different ways in which information may correctly impact the financial

statements. It is important to understand each of these types of financial statement

impacts because each financial statement precedent module will usually impact the

financial statements in one of these ways. Additionally, any customisation of the

Financial Statements modules should always be undertaken in accordance with one of

these impact types, to ensure that the financial statements remain logical and correct.

Income Statement & Balance Sheet Impact

When information impacts the financial statements via the Income Statement and

Balance Sheet, a revenue or expense is reported on the Income Statement, resulting in

the creation of an asset or liability on the Balance Sheet. This type of impact on the

financial statements does not impact cash flow, because it does not result in a change in

cash on the Cash Flow Statement.

Income Statement & Cash Flow Statement Impact

When information impacts the financial statements via the Income Statement and Cash

Flow Statement, a revenue or expense is reported on the Income Statement and is

received or paid in cash in the same accounting period (and therefore recorded as a

change in cash on the Cash Flow Statement). This type of impact on the financial

statements does not directly impact assets, liabilities or equity on the Balance Sheet –

i.e. all Balance Sheet impacts take place indirectly via Net Profit After Tax (NPAT)

from the Income Statement and the change in cash on the Cash Flow Statement.


Balance Sheet & Cash Flow Statement Impact

When information impacts the financial statements via the Balance Sheet and Cash

Flow Statement, a cash inflow or outflow causes a movement in an asset, liability or

equity account on the Balance Sheet. This type of impact on the financial statements

does not impact earnings on the Income Statement. The reduction in a working capital

asset or liability in this way would therefore usually take place in a period subsequent to

a period in which revenues or expenses were reported on the Income Statement but not

received or paid as cash, resulting in the creation of an associated working capital asset

or liability. Hence, a Balance Sheet and Cash Flow financial statements impact would

often follow a prior period Income Statement and Balance Sheet financial statements

impact.

Balance Sheet Only Impact

When information impacts the financial statements via the Balance Sheet only, a

movement in an asset, liability or equity account on the Balance Sheet is offset by a

counter-acting movement in another asset, liability or equity account on the Balance

Sheet. This type of financial statements impact has no impact on earnings or cash, and

therefore nothing is reported on the Income Statement or Cash Flow Statement. The

layout of a Balance Sheet is governed by the accounting standards and reporting

requirements applicable to each entity. It is also governed by the choices the entity

makes (within the boundaries of its reporting requirements) as to how it structures the

presentation of its assets, liabilities and equity accounts on its Balance Sheet.

All Financial Statements Impact

When information impacts all three financial statements, a revenue or expense is

reported on the Income Statement, a change in cash is reported on the Cash Flow

Statement and an asset, liability or equity account is created on the Balance Sheet.

Hence, this type of financial statements impact directly impacts earnings, cash and

Balance Sheet accounts.
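The sketch below illustrates the first of these impact types in code: a credit sale recognises revenue on the Income Statement and creates a receivable on the Balance Sheet, while cash is untouched. The account structure and the amount are hypothetical, but the final balance check shows how the linked statements stay consistent.

    # Sketch of the "Income Statement & Balance Sheet" impact type:
    # revenue is recognised, a receivable is created, and cash does not move.
    statements = {
        "income_statement": {"revenue": 0.0, "expenses": 0.0},
        "balance_sheet": {"cash": 0.0, "receivables": 0.0,
                          "liabilities": 0.0, "equity": 0.0},
        "cash_flow": {"change_in_cash": 0.0},
    }

    def recognise_credit_sale(amount):
        """Revenue on the Income Statement, an asset on the Balance Sheet, no cash."""
        statements["income_statement"]["revenue"] += amount
        statements["balance_sheet"]["receivables"] += amount
        # retained earnings (equity) rise via net profit, keeping the sheet balanced
        statements["balance_sheet"]["equity"] += amount

    recognise_credit_sale(500.0)
    bs = statements["balance_sheet"]
    assets = bs["cash"] + bs["receivables"]
    assert abs(assets - (bs["liabilities"] + bs["equity"])) < 1e-9
    print(statements)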


Advantage of using Financial Models

Models try to minimise financial risk.

Models provide quick answers to things that may take months to actually

happen. Automatic recalculation means that if a change is made in the model

then all related formulae and values change.

Graphs can be produced to help understand the result: these will automatically

change as any values are added.

III. Medical modeling

Research into medical modeling and the application of design and product development

technologies in medicine and surgery requires a multidisciplinary approach. Designed to

be accessible to all disciplines, with medical and technical terms explained as clearly

and simply as possible, Medical modeling provides a genuinely useful text to help the

broadest possible range of professionals to understand not only the technologies,

techniques and methods, but also what is required to apply them in medical treatments.

Medical modeling describes steps in the process from acquisition of medical scan data,

transfer and translation of data formats, methods of utilizing the data and finally using

the information to produce physical models using rapid prototyping techniques for use

in surgery or prosthetic rehabilitation. Technologies are fully described, highlighting

their key characteristics, advantages and disadvantages.

The term 'medical model' is frequently used in psychiatry with denigration, suggesting

that its methods are paternalistic, inhumane and reductionist. This view has influenced

mental health organizations, which in certain areas advocate a departure from the

medical model, and contributes to the difficulties in leadership being played out between

politicians, professionals and patients.


Surgery simulation

Modeling the deformation of human organs for surgery simulation systems has turned

out to be quite a challenge. In the fields of elasticity and related modeling paradigms,

the main interest has been the development of accurate mathematical models. The speed

of these models has been a secondary interest. But for surgery simulation systems, the

priorities are reversed. The main interest is the speed and robustness of the models, and

accuracy is of less concern. We have been in the development of different practical

modeling techniques that take into account the reversed priorities and can be used in

practice for real-time modeling of deformable organs.

In the field of virtual reality (VR) (which includes simulation of surgical procedures),

conditions are different. Virtual environments are interactive and reactive, allowing the

user to modify and interact with objects in the virtual scene using virtual tools. Effective

surgical simulation is even more difficult. Not only do we need real-time interactive

graphics but the objects in the scene should also exhibit physically correct behaviors

corresponding to the behaviors of real human organs and tissues. Unfortunately, human

tissue is very complex and often behaves viscoelastically. In addition, human body

parts consist of layers of different tissues interlaced with ligaments and fascias. Very

complex models are needed to model these objects realistically.

To build a general surgery simulator, the following main components are needed.

1) Computer graphics:

Graphics is needed to render realistic views of the virtual surgery scene and provide the

surgeon with a visual illusion of reality.

2) Haptic interface:


This interface is provided to represent the instruments and tools that the surgeon uses to

work on the surgery simulator. By tracking the position of these tools and sensing their

state, the computer is able to determine the surgeon's actions and provide them as input

to the simulation system. In reaction to these inputs, a haptic interface can provide the

surgeon with a physical sensation of touching and sensing objects in the virtual scene

using force-feedback techniques. The haptic interface thus closes the loop between

action and reaction by providing the tactile illusion of reality.

3) Physical modeling:

Physical models provide the surgeon with a behavioral illusion of reality. By modeling

the viscoelastic deformation of human skin, the fluid flow of blood from a wound, etc.,

these models ensure that the virtual scene reflects the behavior of the physical reality.

The demand for real-time performance has forced most researchers to develop or adapt

very simplistic models of elastic deformation to the needs of surgery simulation.
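A typical example of such a simplistic real-time model is a mass-spring system integrated with an explicit scheme. The sketch below deforms a one-dimensional chain of damped springs under an external force standing in for a virtual tool pulling on tissue; all parameters are illustrative, not tissue measurements.

    import numpy as np

    # Minimal sketch of a real-time deformable model: a 1-D chain of point masses
    # connected by damped springs, integrated with explicit Euler.
    n = 10                                   # number of mass points
    mass, k, damping = 0.05, 50.0, 0.5       # hypothetical parameters
    rest_length, dt = 1.0, 0.001

    pos = np.arange(n, dtype=float) * rest_length
    vel = np.zeros(n)

    def step(external_force_on_last=0.0):
        """Advance the chain one time step; node 0 is anchored."""
        force = np.zeros(n)
        for i in range(n - 1):
            stretch = (pos[i + 1] - pos[i]) - rest_length
            f = k * stretch
            force[i] += f                     # spring pulls node i toward i+1
            force[i + 1] -= f
        force -= damping * vel                # simple velocity damping
        force[-1] += external_force_on_last   # e.g. a virtual tool pulling the tissue
        vel[1:] += dt * force[1:] / mass      # node 0 stays fixed
        pos[1:] += dt * vel[1:]

    for _ in range(2000):
        step(external_force_on_last=2.0)
    print("displacement of free end:", round(pos[-1] - (n - 1) * rest_length, 3))

More realistic simulators use two- or three-dimensional meshes and better integrators, but the trade-off between speed and physical accuracy is the same.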

Biomedical Model

Biomedical models can be of many types—from animal models of human

diseases to animal, in vitro, or modelling systems for studying any aspect of human

biology or disease. A biomedical model is a surrogate for a human being, or a human

biologic system, that can be used to understand normal and abnormal function from

gene to phenotype and to provide a basis for preventive or therapeutic intervention in

human diseases. For example, characterization of mouse models of various dwarfing

syndromes, cloning of mutated genes, and parallel comparative genetic mapping and

cloning of genes for similar human syndromes have led to an understanding of various

human dwarfing conditions and have suggested therapies based on biologic knowledge,

rather than shotgun testing. Mouse models with targeted mutations in the cystic fibrosis

gene are providing a means for testing gene therapy delivered by aerosol into the lungs

(Dorin and others 1996). The use of nonhuman primates that are genomically similar is

beginning to shed light on complex human diseases. Squid giant axons are important

model systems in neurobiologic research because their size allows a variety of

manipulations not possible with vertebrate axons and because there are 40 years of data


on the anatomy, physiology, biophysics, and biochemistry of those neurons. Clams, sea

urchins, and fishes are models in developmental biology (for example, for study of

transcriptional regulation during early cell differentiation) because they have high

fecundity, short generation times, and transparent eggs that develop externally. Those

are but a few examples among thousands that illustrate the breadth and utility of

comparative models in biomedicine.

A model need not be an exact replica of a human condition or disease. For example,

mice with mutations in the homologue of the human Duchenne-Becker muscular

dystrophy gene are less severely affected than human patients and can regenerate

degenerating muscle (Anderson and others 1988); they have been used successfully to

test muscle implantation therapy for this debilitating disease (Ragot and others 1993).

Many targeted-mutation (so-called knockout) mice exhibit unexpected phenotypes,

revealing previously unidentified roles for known genes (Homanics and others 1995;

Shastry 1994). Finally, to the extent that biologic processes in living organisms are

predictable, computer modelling might be able to predict the outcome of perturbing a

metabolic pathway or treating a metabolic disease; this can lead to hypothesis-driven

research with an animal model.

Essential and Emerging Research Fields and Technologies

FUNCTIONAL GENOMICS

Gene maps of human and selected model organisms' genomes are developing to the

point where serious work on gene function is a large-scale reality. The Human Genome

Initiative has been very successful in achieving the goal that it set out less than

15 years ago; early in the 21st century virtually all genes in the human genome will be

identified. This success is leading to a re-focusing of genomic research toward

understanding how those genes function, from gene expression at the molecular level to

translation into phenotypic features at the functional level. Understanding gene

interaction and how genes function within whole organisms will provide the basis for

translating basic research into clinical therapy and disease prevention and will benefit

human health on a broader scale than ever possible before. Although progress in

developing therapy for and prevention of some specific human diseases has taken place


coincidentally with the genome projects, the completion of gene identification will open

the way for a redirection of major efforts in gene-function studies. The popular term for

the study of how genes function to control the whole organism is "functional genomics."

The attention of the biomedical research community is refocusing from detailed analysis

of the genome to functional genomics. The American Physiological Society organized a

workshop at the Banbury Center, Cold Spring Harbor, NY, in 1997, "Genomics to

Physiology and Beyond: How Do We Get There?" at which the phrase "Genes to Health

Initiative" was coined. A meeting to begin detailed planning of next steps for the

Physiome Project was held in St. Petersburg, Russia, in 1997, and a followup meeting

will be held in San Francisco in 1998 (Cowley 1997).

Models are critical to enable us to move from genomic to functional genomic analysis.

Gene function can be assessed only by moving beyond molecular biology to the study

of whole animals, whole cellular systems in culture, or computer modelling of complex

biologic systems. Moreover, no gene acts alone. The interaction of genes in whole

animals to produce phenotypes or diseases can be understood only by performing

experiments in whole animals or with the aid of highly sophisticated computer

modelling systems. Sophisticated computer systems will be required to organize,

analyze, and interpret the complex data generated by such experiments.

BEHAVIOR AND NEUROBIOLOGY

Although a strong tradition in basic behavioral research exists, tools and techniques are

only now beginning to be available for dissecting out cellular, molecular, and genetic

components of behavior. Advances never before thought possible are being made in

understanding and treating human "behavioral" conditions. The aging of the human

population increases the need for more research on ways to improve quality of life and

to lessen the burden of age-related services for everyone. That need has already

increased the emphasis on studies of cognitive diseases of aging and

memory, such as Alzheimer's disease and other forms of senile dementia. We are

becoming increasingly aware that the severity of many diseases and rates of recovery

from them have psychological components. In addition, as a society we are trying to

improve our children's quality of life. It is clear that early learning and conditioning

affect individual lives and behavior throughout life, as well as society as a whole. Proper


diet and regular exercise can improve a person's health. Witness the volume of

information put out by voluntary health organizations, such as the American Heart

Association or American Cancer Society and the Public Health Service encouraging

people to change their behavior to decrease their risk for cancer or heart disease. AIDS

is a dramatic example of the impact of social behavior on health. Individual behavior,

such as lack of self-control and violence, is directly reflected in the rising incidence of

juvenile crime.

Emerging fields in which behavior is viewed as an end point include biologic

psychiatry, developmental biology (the modelling of specific psychiatric disorders, such

as anxiety, depression, and schizophrenia) and cognitive processes (such as spatial

learning, memory and age-related declines in cognition) (Palmour and others 1997).

Fields in which behavior itself can affect physiologic, cellular, and even molecular

processes include the use of pharmacologic and genetic models to study the effects of

drug addiction and relapse and psychoimmunology (the relationship of behavior to

disease resistance and recovery). Aquatic organisms have been used for many years in

behavioral studies and will continue to be valuable models. For example, zebrafish are a

burgeoning developmental model, in particular for their expected role in molecular

genetics and will probably provide advances in embryology, neurobiology, and other

fields. With sophisticated new microscopes, such as two-photon detectors, and the use of

resonant fluorescence probes, any cellular component can be followed in transparent

zebrafish embryos from inception throughout the acquisition of normal adult behaviors.

Mutational analysis can be conducted to assess the role of single gene loci in defined

behaviors, thus offering insight into basic mechanisms of development and

neuropathology. The types of technological advances needed for that research involve all

the emerging fields of understanding of brain function in living animals. Relating

neurobiology to behavior requires, for example, advances in brain imaging techniques

for real-time assessment of the chemistry and physiology of individual cells in awake

animals. This objective includes sophisticated telemetry and video for monitoring

behavior in ethologically relevant settings. Ethological assessment will require

improvements in electrophysiologic measuring systems, noninvasive imaging

systems, and baseline behavioral measures of each model species.


MATHEMATICAL MODELLING, COMPUTATIONAL SIMULATIONS, AND

SCIENTIFIC DATABASES

Computer management of data related to biomedical models has two components: 1)

biomathematical modeling and statistical analysis of data and 2) databases that store

information that can be used to support functional genomic and other research. The

importance of mathematical modeling and computation in biomedical research grows as

the ability to collect and distribute data increases. The rapidly growing volume of data

generated in current research efforts will require data-analysis and data-management

resources that are not now widely available. Many kinds of biomedical research,

including work with animal models, have not taken full advantage of advances in

mathematical and computational modeling technology. Many investigators might be

unaware of the variety and utility of models, computational tools, and simulation

environments.

The database issue has more to do with the need for public databases to provide

phenotypic and physiologic information to the modeling community. Two kinds of

needs are apparent. First, investigators developing small databases for their own work

are often willing to share the information in them with fellow scientists. But most

biologists will require easy-to-use database-design tools or user-friendly templates

to make their databases public. Such templates would ensure good design and foster

standards for the ultimate integration of local information into larger biologic data

resources. Second, larger community databases are needed to provide phenotypic or

model information. Such databases require support both for startup and continued

maintenance.

IV. Stochastic modeling of genetic and biochemical networks

Computational modeling and simulation of biochemical networks is at the core of

systems biology and this includes many types of analyses that can aid understanding of

how these systems work. From a modeling perspective, biochemical networks are a set

of chemical species that can be converted into each other through chemical reactions.

The focus of biochemical network models is usually on the levels of the chemical

species and this usually requires explicit mathematical expressions for the velocity at

which the reactions proceed. The most popular representation for these models uses


ordinary differential equations (ODEs) to describe the change in the concentrations of

the chemical species. Another representation that is gaining popularity in systems

biology uses probability distribution functions to estimate when single reaction events

happen and therefore track the number of particles of the chemical species. As a general

rule, the latter approach, known as stochastic simulation, is preferred where the number

of particles of a chemical species is small; the ODE approach is required when the

number of particles is large because the stochastic approach would be computationally

intractable.

Stochastic models represent the number of particles of each chemical species and use a

reaction probability density function (PDF) to describe the timing of reaction events.

Gillespie developed a Monte Carlo simulation algorithm, known as the stochastic

simulation algorithm (SSA) first-reaction method, which simulates the stochastic dynamics

of the system. In the stochastic formalism, it is very important that simulations are

repeated for a sufficient number of times in order to reveal the entire range of behavior

presented by such a system (i.e., to estimate a distribution for each chemical species and

its dynamic evolution). Once a new model has been entered or loaded it is ready to be

used for simulation. There are two basic types of simulation: Time Course and Steady

State, which are entries under the Tasks branch.
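The sketch below implements the first-reaction variant of Gillespie's SSA for a hypothetical birth-death system (constant synthesis of a species X and first-order degradation) and repeats the simulation to estimate the distribution of copy numbers, as recommended above. The reaction system and the rate constants are assumptions chosen purely for illustration.

    import math
    import random

    # Sketch of Gillespie's first-reaction SSA for a hypothetical birth-death system.
    k_synth, k_deg = 10.0, 0.1        # synthesis and per-molecule degradation rates

    def propensities(x):
        return [k_synth, k_deg * x]   # a1: synthesis, a2: degradation

    def first_reaction_ssa(x0=0, t_end=100.0):
        t, x = 0.0, x0
        trajectory = [(t, x)]
        while t < t_end:
            a = propensities(x)
            # draw a tentative firing time for every reaction channel
            taus = [(-math.log(1.0 - random.random()) / aj if aj > 0 else math.inf)
                    for aj in a]
            j = min(range(len(taus)), key=lambda idx: taus[idx])
            if taus[j] == math.inf:
                break
            t += taus[j]
            x += 1 if j == 0 else -1  # synthesis adds a particle, degradation removes one
            trajectory.append((t, x))
        return trajectory

    # repeat runs to estimate the distribution, as the text recommends
    finals = [first_reaction_ssa()[-1][1] for _ in range(200)]
    print("mean copy number at t_end:", sum(finals) / len(finals))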

There are several issues that have to be considered to carry out successful stochastic

simulations. The first consideration is that in this approach reversible reactions must be

handled as two separate irreversible reactions (the forward and reverse directions). In

ODE-based simulations, the forward and backward reaction rates are usually aggregated

and thus can cancel each other out (resulting in a null rate); in stochastic simulations

each single reaction event has to be considered separately and even if there is no net

rate, the actual cycling rate will be explicitly represented. For stochastic simulations the

volume of the system is crucial: the volume should not be too big, so that the computed

particle numbers stay within the numerical limits of a computer. Thus,

it is important that the volume of the system be defined in the compartment description

in such a way that the particle numbers are not too high.


Stochastic modeling of gene regulatory networks

Gene regulatory networks are dynamic and stochastic in nature, and exhibit

exquisite feedback and feed-forward control loops that regulate their biological function

at different levels. Modeling of such networks poses new challenges due, in part, to the

small number of molecules involved and the stochastic nature of their interactions. The

large amounts of data being generated by high-throughput technologies have motivated

the emergence of systems biology, a discipline which emphasizes a systems-level

characterization of biological networks. Such a systems view of biological organization

aims to draw on mathematical methods developed in the context of dynamical

systems and computational theories in order to create powerful simulation and analysis

tools to decipher existing data and devise new experiments. One goal is to generate an

integrated knowledge of biological complexity that unravels functional properties and

sources of health and disease in cells, organs, and organisms. This is commonly known

as the 'reverse engineering' problem. Another goal is to successfully interface naturally

occurring genetic circuits with de novo designed and implemented systems that interfere

with the sources of disease or malfunction and reverse their effects. This is known as the

'forward engineering' problem. Although reverse and forward engineering of even the

simplest biological systems has proven to be a daunting task, mathematical approaches

coupled to rounds of iteration with experimental approaches greatly facilitate the road to

biological discovery in a manner that is otherwise difficult. This, however, necessitates

a serious effort devoted to the characterization of salient biological features, followed

by the successful extension of engineering/mathematical tools, and the creation of new

tools and theories to accommodate and/or exploit them.

Much of the mathematical modeling of genetic networks represents gene expression and

regulation as deterministic processes. There is now, however, considerable experimental

evidence indicating that significant stochastic fluctuations are present in these

processes, both in prokaryotic and eukaryotic cells. Furthermore, studies of engineered

genetic circuits designed to act as toggle switches or oscillators have revealed large

stochastic effects. Stochasticity is therefore an inherent feature of biological dynamics,

and as such, should be the subject of in-depth investigation and analysis. A study of

stochastic properties in genetic systems is a challenging task. It involves the formulation

of a correct representation of molecular noise, followed by the formulation of


mathematically sound approximations for these representations. It also involves devising

efficient computational algorithms capable of tackling the complexity of the dynamics

involved.

MODELING GENETIC NETWORKS

The mathematical approaches used to model gene regulatory networks differ in the level

of resolution they achieve and their underlying assumptions. A broad classification of

these methods separates the resulting models into deterministic and stochastic, each

class embodying various subclasses with their different mathematical formalisms.

1. Deterministic rate equations modeling

Cellular processes, such as transcription and translation, are often perceived to be

systems of distinct chemical reactions that can be described using the laws of mass-

action, yielding a set of differential equations (linear or nonlinear) that give the

succession of states (usually concentration of species) adopted by the network over

time. The equations are usually of the form

dx/dt = f(x)

where x = [x1, x2, ..., xn]^T is a vector of non-negative real numbers describing

concentrations and f : R^n → R^n is a function of the concentrations. Ordinary

differential equations are arguably the most widespread formalism for modelling gene

regulatory networks and their use goes back to the 'operon' model of Jacob and Monod

and the early work of Goodwin. The main rationale of deterministic chemical kinetics is

that at constant temperature, elementary chemical reaction rates vary with reactant

concentration in a simple manner. These rates are proportional to the frequency at which

the reacting molecules collide, which is again dependent on the molecular

concentrations in these reactions.
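As a small illustration of this formalism, the sketch below integrates dx/dt = f(x) for a hypothetical two-species gene-expression model (mRNA m and protein p) using SciPy; the rate constants are illustrative, not measured values.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Sketch of the deterministic rate-equation formalism for a hypothetical
    # gene-expression model: constant transcription, first-order degradation,
    # translation proportional to mRNA level.
    k_m, d_m = 2.0, 0.5    # mRNA synthesis and degradation rates
    k_p, d_p = 1.0, 0.1    # translation and protein degradation rates

    def f(t, x):
        m, p = x
        dm = k_m - d_m * m
        dp = k_p * m - d_p * p
        return [dm, dp]

    sol = solve_ivp(f, (0.0, 50.0), [0.0, 0.0], t_eval=np.linspace(0, 50, 101))
    print("steady state (mRNA, protein):",
          round(float(sol.y[0, -1]), 2), round(float(sol.y[1, -1]), 2))

The solution converges to the steady state m = k_m/d_m and p = k_p m/d_p, as expected from setting the derivatives to zero.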


2. Stochastic modeling

Although the time evolution of a well-stirred chemically reacting system is traditionally

described by a set of coupled, ordinary differential equations that characterize the

evolution of the molecular populations as a continuous, deterministic process,

chemically reacting systems in general, and genetic networks in particular, actually

possess neither of those attributes: molecular populations are whole numbers, and

when they change they always do so by discrete, integer amounts. Furthermore, a

knowledge of the system's current molecular populations is not by itself sufficient to

predict with certainty the future molecular populations. Just as rolled dice are essentially

random or 'stochastic' when we do not precisely track their positions and velocities and

all the forces acting on them, so is the time evolution of a well-stirred chemically

reacting system for all practical purposes stochastic. If discreteness and stochasticity are

not noticeable, for example in chemical systems of 'test-tube' size or larger, then the

traditional continuous deterministic description seems to be adequate. But if the

molecular populations of some reactant species are very small, or if the dynamic

structure of the system makes it susceptible to noise amplification, as is often the case in

cellular systems, discreteness and stochasticity can play an important role. Whenever

that happens, the ordinary differential equations approach does not accurately describe

the true behavior of the system. Alternatively, one should resort to an overtly discrete

and stochastic description evolving in real (continuous) time that accurately reflects

how chemical reactions physically occur at the molecular level. We first give a number

of motivating examples that illustrate the importance of such stochastic methods, then

present the details of stochastic chemical kinetics.

Computational Modeling of Biochemical Networks

Computational modeling and simulation of biochemical networks is at the core of

systems biology and this includes many types of analyses that can aid understanding of

how these systems work. Biochemical networks are intrinsically complex, not only

because they encompass a large number of interacting components, but also because

those interactions are nonlinear. Like many other nonlinear phenomena in nature, their


behavior is often unintuitive and thus quantitative models are needed to describe and

understand their function. While the concept of biochemical networks arose from the

reductionist process of biochemistry, where the focus was on studying isolated

enzymatic reactions, it is now better understood in the framework of systems biology,

where the focus is on the behavior of the whole system, or at least several reactions, and

particularly on what results from the interactions of its parts. Computational modeling is

thus a technique of systems biology as important as its experimental counterparts. From

a modeling perspective, biochemical networks are a set of chemical species that can be

converted into each other through chemical reactions. The focus of biochemical network

models is usually on the levels of the chemical species and this usually requires explicit

mathematical expressions for the velocity at which the reactions proceed. The most

popular representation for these models uses ordinary differential equations (ODEs) to

describe the change in the concentrations of the chemical species. Another

representation that is gaining popularity in systems biology uses probability distribution

functions to estimate when single reaction events happen and therefore track the number

of particles of the chemical species. As a general rule, the latter approach, known as

stochastic simulation, is preferred where the number of particles of a chemical species

is small; the ODE approach is required when the number of particles is large because

the stochastic approach would be computationally intractable.

Stochastic Models

When analyzing a biochemical system which contains small numbers of particles of

each reactant, the assumption of continuous concentrations fails and consequently the

underlying basis of the ODE representation also fails. Moreover, in such conditions,

stochastic effects become more pronounced and may lead to dynamics that differ

significantly from those that would result from the ODE approach. In the conditions

described above, one should then use a stochastic discrete approach for the simulation

of the system dynamics.

Stochastic models represent the number of particles of each chemical species and use a

reaction probability density function (PDF) to describe the timing of reaction events.

Gillespie developed a Monte Carlo simulation algorithm, known as the stochastic


simulation algorithm (SSA) first-reaction method, which simulates the stochastic

dynamics of the system by sampling. It is important to stress that one simulation run

according to this approach is only one realization of a probabilistic representation, and

thus provides a limited amount of information on its own. In the stochastic formalism, it

is very important that simulations are repeated for a sufficient number of times in order

to reveal the entire range of behavior presented by such a system (i.e., to estimate a

distribution for each chemical species and its dynamic evolution).

V. Population explosion in insects

Modeling of population dynamics is the essential part of both research and management

of forest pest insects. Forest pest insect models are present in all classes of population

models considered here: regression-, theoretical-, non-parametric-, phenology-,

and life-system models both of medium and huge size. Modeling success depends on

selection of appropriate complexity level that corresponds to modeling objectives and to

available data. Huge models that attempt to integrate all existing knowledge about an

insect species and to serve all possible purposes become obsolete before completion

and have unexpectedly limited usage because of the lack of flexibility. Simple models

were shown to be efficient in simulation of stationary time-series of population

dynamics. However, non-stationary population dynamics usually require more complex

models. It seems most beneficial to combine models of different complexity, when

complex (but not huge!) models can be used for biological interpretation of simple

models.

The shift in forest pest management objectives from simple (e.g., prevention of

defoliation) to complex (e.g., minimization of impact or slowing the spread of pest

population) requires adjustment of modeling strategies. We expect the models to

increase in their complexity and scope. The increase in model complexity can be

achieved using object-oriented programming which makes models modular, flexible and

re-usable. The cost and time (one year maximum) of model development should not be

increased. Any population model of a forest pest insect that attempts to be realistic

should consider spatial dynamics. Spatial models require different modeling techniques


as compared to models of local populations. These include: geostatistics, cellular

automata, and metapopulation models in which local populations are considered as

individuals. Computers with parallel processors may become useful for simulation of

spatial processes in ecological systems. Any tactic of forest pest management requires

prediction of pest population change over time and/or space. However, the scope of

prediction depends on management objectives. The simplest objective -- to prevent

defoliation in the current year -- requires information about the area to be treated with

pesticide and about the time of sampling and spraying. Two kinds of forecasts are most

useful for this management objective: 1) prediction of possible local defoliation from

population density samples and 2) prediction of insect development from temperature

data.

Current forest pest-management practice is shifting from simple objectives to more

complex ones. An example of complex objective is to minimize the impact of pest

population on forest ecosystems during several years. The impact can be changed by

silvicultural methods, biological control, or pesticide spraying. Another possible

objective is to reduce the spread rate of introduced pest population. These complex

objectives require prediction of population change over long time intervals and over

large areas. Of course, it is impossible to predict pest abundance at a specific location 10

years ahead, but it may be possible to predict the change in average pest population

density as a result of some change in environment (e.g., thinning).

Mathematical modeling is the major tool for predicting population dynamics. The

changes in pest-management objectives should be followed by a change in modeling

methodology. In general, models will become more complex and their temporal and

spatial scope will increase. Thus, it is important to reconsider advantages and

disadvantages of different kinds of models and to delineate the most promising

modeling methodology. Theoretical models are relatively simple and represent one or

a few of the most important mechanisms of population change. They have been developed for

more than a century starting from exponential and logistic models. These models were

improved and generalized by adding delayed density-dependence, asymmetric

population growth, group effect, multiple equilibria and other features (Berryman and

Millstein 1990). The major advantage of these models is their simplicity. Model


structure is usually very "transparent", and all effects are easy to explain. Regression

(= empirical) models are based on a linear or polynomial relationship between a predicted

value (e.g., population density) and one or several factors (e.g., temperature, abundance

of food or enemies). These models do not represent the mechanisms of population

change even at a phenomenological level. They are used to estimate population density

from one or several predictors which may be: population abundance in the previous

year, weather parameters, site characteristics, etc. These models are useful for making

immediate decisions concerning population treatment. Qualitative models can be

considered as a subclass of theoretical models. Instead of equations, these models use

sets of conditions the equations should satisfy (Berryman and Stenseth 1989). The result

is a set of conditions that the trajectory of population change should satisfy. These

models have no parameters and are often called "non-parametric". Qualitative models

are theoretical tools only. To make a quantitative prediction, the model should be

parametrized.
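A minimal sketch of such a theoretical model is given below: a discrete-time Ricker model with an optional delayed density-dependence term. The parameter values are illustrative only.

    import math

    # Sketch of a discrete-time Ricker population model with optional delayed
    # density dependence. All parameter values are hypothetical.
    def ricker(n0=10.0, r=1.5, carrying_capacity=1000.0, delay_weight=0.0, steps=50):
        """N[t+1] = N[t] * exp(r * (1 - (w*N[t-1] + (1-w)*N[t]) / K))."""
        series = [n0, n0]
        for _ in range(steps):
            n_now, n_prev = series[-1], series[-2]
            density = delay_weight * n_prev + (1 - delay_weight) * n_now
            series.append(n_now * math.exp(r * (1 - density / carrying_capacity)))
        return series

    print([round(x) for x in ricker(delay_weight=0.5)[-5:]])

Raising r or the delay weight produces cycles and more complex dynamics, which is one reason such simple models have been so useful for interpreting population time series.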

INSECT POPULATION GROWTH

The model is sequential. In the initial period, the farmer observes the initial insect

population, market parameters such as pesticide cost and output price, and regulations

including the length of the preharvest interval (PHI). Using this information, the farmer forms

expectations about profits at the date of harvest with and without pesticide application.

He then chooses, in the second period, whether or not to apply chemicals. In the third

period, which is the period of the PHI, the insect population grows or declines. Finally,

harvest occurs in the fourth period, at which time the farmer realizes some level of

profit.

The insect population growth rate is modeled as an increasing function of the current

insect population and time. However, insect growth is a stochastic process, due in part

to the influence of random factors such as weather on reproduction rates (Varley,

Gradwell, and Hassell; Trumble; Minks and Harrewijn). Formally, insect population

growth is modeled as a geometric Brownian motion process with a drift component as

follows:


(1)  dN = αN dt + σN dz

where N is the current insect population, α is the intrinsic insect growth rate, dt is an

increment of time, σ is a variance coefficient, and dz = ε√dt is the increment of a

Wiener process, with ε a standard normal random variable.

The process in equation (1) has a number of features that make it appropriate for

modeling insect population levels. The process is such that per-period growth rates are

normally distributed; this property matches closely with experimental evidence relating

insect growth rates to environmental conditions. Since percentage changes in N are

changes in the natural logarithm of N, population levels themselves are lognormally

distributed in this formulation. Thus, N is bounded from below by zero, so that

population levels can never be negative. Another important property of this process is

that short-run changes in N are dominated by the volatility component of (1), whereas

long-run changes are more influenced by the trend component.
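The sketch below simulates equation (1) directly, using the exact lognormal update of geometric Brownian motion with one period per time step; the growth rate, volatility and initial population are hypothetical values chosen for illustration.

    import numpy as np

    # Sketch of simulating the geometric Brownian motion in equation (1),
    # dN = alpha*N*dt + sigma*N*dz, with the exact lognormal step (dt = 1).
    rng = np.random.default_rng(1)
    alpha, sigma = 0.05, 0.20            # assumed growth rate and volatility per period
    n0, periods, n_paths = 100.0, 30, 5000

    eps = rng.standard_normal((n_paths, periods))
    log_increments = (alpha - 0.5 * sigma**2) + sigma * eps
    paths = n0 * np.exp(np.cumsum(log_increments, axis=1))

    print("mean population after", periods, "periods:",
          round(float(paths[:, -1].mean()), 1))
    print("fraction of paths below the initial level:",
          round(float((paths[:, -1] < n0).mean()), 3))

Over a few periods the simulated paths scatter widely around the starting level, while over many periods the drift term dominates, matching the short-run versus long-run behaviour described above.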

VI. Epidemics

The study of epidemic disease has always been a topic where biological issues mix with

social ones. When we talk about epidemic disease, we will be thinking of contagious

diseases caused by biological pathogens — things like influenza, measles, and sexually

transmitted diseases, which spread from person to person. Epidemics can pass

explosively through a population, or they can persist over long time periods at low

levels; they can experience sudden flare-ups or even wave-like cyclic patterns of

increasing and decreasing prevalence. In extreme cases, a single disease outbreak can

have a significant effect on a whole civilization, as with the epidemics started by the

arrival of Europeans in the Americas, or the outbreak of bubonic plague that killed 20%

of the population of Europe over a seven-year period in the 1300s.


Diseases and the Networks that Transmit Them

The patterns by which epidemics spread through groups of people are determined not just

by the properties of the pathogen carrying it (including its contagiousness, the length of

its infectious period, and its severity) but also by network structures within the

population it is affecting. The social network within a population (recording who knows

whom) determines a lot about how the disease is likely to spread from one person to

another. But more generally, the opportunities for a disease to spread are given by a

contact network: there is a node for each person, and an edge if two people come into

contact with each other in a way that makes it possible for the disease to spread from

one to the other.

This suggests that accurately modeling the underlying network is crucial to

understanding the spread of an epidemic. This has led to research studying how travel

patterns within a city or via the worldwide airline network could affect the spread of a

fast-moving disease. Contact networks are also important in understanding how diseases

spread through animal populations with researchers tracing out the interactions within

livestock populations during epidemics such as the 2001 foot-and-mouth outbreak in the

United Kingdom as well as plant populations, where the affected individuals occupy

fixed locations and diseases tend to have a much clearer spatial footprint. And similar

models have been employed for studying the spread of computer viruses, with malicious

software spreading between computers across an underlying communication network.

The pathogen and the network are closely intertwined: even within the same population,

the contact networks for two different diseases can have very different structures,

depending on the diseases' respective modes of transmission. For a highly contagious

disease, involving airborne transmission based on coughs and sneezes, the contact

network will include a huge number of links, including any pair of people who sat

together on a bus or an airplane. For a disease requiring close contact, or a sexually

transmitted disease, the contact network will be much sparser, with many fewer pairs of

people connected by links. Similar distinctions arise in studying computer viruses,

where a piece of software infecting computers across the Internet will have a much

broader contact network than one that spreads by short-range wireless communication

between nearby mobile devices.


Connections to the Diffusion of Ideas and Behaviors.

There are clear connections between epidemic disease and the diffusion of ideas through

social networks. Both diseases and ideas can spread from person to person, across

similar kinds of networks that connect people, and in this respect, they exhibit very

similar structural mechanisms, to the extent that the spread of ideas is often referred to as "social contagion". The biggest difference between biological and social contagion lies in the process by which one person "infects" another. With social contagion, people make conscious decisions about whether to adopt a new idea or behavior. With diseases, on the other hand, not only is there a lack of decision-making in the transmission of the disease from

one person to another, but the process is sufficiently complex and unobservable at the

person-to-person level that it is most useful to model it as random. That is, we will

generally assume that when two people are directly linked in the contact network, and

one of them has the disease, there is a given probability that he or she will pass it to the

other. This use of randomness allows us to abstract away questions about the mechanics

of how one person catches a disease from another, for which we have no useful simple

models.
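As a minimal sketch of this assumption (the function and variable names are illustrative, not taken from the text), one round of transmission can be simulated by letting every infected person pass the disease to each of their contact-network neighbours independently with probability p:

    import random

    def transmission_step(contact_network, infected, p, rng):
        # Each currently infected person passes the disease to each
        # not-yet-infected neighbour independently with probability p.
        newly_infected = set()
        for person in infected:
            for neighbour in contact_network[person]:
                if neighbour not in infected and rng.random() < p:
                    newly_infected.add(neighbour)
        return newly_infected

    # Reusing the small illustrative network from the earlier sketch:
    contact_network = {
        "Anna":   {"Bob", "Carlos"},
        "Bob":    {"Anna", "Dana"},
        "Carlos": {"Anna"},
        "Dana":   {"Bob"},
    }
    infected = {"Anna"}
    print(transmission_step(contact_network, infected, p=0.5, rng=random.Random(42)))

Repeating this step on the union of the old and newly infected sets traces how far the disease spreads through the network.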

This, then, will be the concrete difference in our discussion of biological as opposed to

social contagion: not so much the new context as the new classes of models, based on

random processes in networks, that will be employed.

Branching Processes

The simplest model of contagion is known as the branching process. It works as

follows.

1. (First wave.) Suppose that a person carrying a new disease enters a population,

and transmits it to each person he meets independently with a probability of p.

Further, suppose that he meets k people while he is contagious; let's call these k

people the first wave of the epidemic. Based on the random transmission of the


disease from the initial person, some of the people in the first wave may get

infected with the disease, while others may not.

2. (Second wave.) Now, each person in the first wave goes out into the population

and meets k different people, resulting in a second wave of k · k = k² people.

Each infected person in the first wave passes the disease independently to each

of the k second-wave people they meet, again independently with probability p.

3. (Subsequent waves.) Further waves are formed in the same way, by having each

person in the current wave meet k new people, passing the disease to each

independently with probability p.

Thus the contact network for this epidemic can be drawn as in Figure 21.1(a) (with k = 3

and only the first three waves shown). We refer to such a network as a tree: it has a single

node at the top called the root; every node is connected to a set of nodes in the level below

it; and every node but the root is also connected to a single node in the level above it. The

tree that forms the contact network for the branching process is in fact infinite, since we

continue defining waves indefinitely.


The branching process model is a simple framework for reasoning about the spread of

an epidemic as one varies both the amount of contact among individuals and the level of

contagion. Now, what is the behavior of an epidemic in this model? We can picture the

spread of the epidemic by highlighting the edges of the contact network on which the

disease passes successfully from one person to another; recall that each of these

infections happens independently with probability p. Thus, Figure (b) shows an

aggressive epidemic that infects two people in the first wave, three in the second wave,

five in the third wave, and presumably more in future waves (not shown in the picture).

Figure (c), on the other hand, shows a much milder epidemic (for a less contagious


disease, with a smaller value of p): of the two people infected in the first wave, one

doesn't infect anyone else, and the other infects only one further person, who in turn

doesn't pass it on. This disease has completely vanished from the population after the

second wave, having infected only four people in total.
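The same contrast between aggressive and mild epidemics can be reproduced with a short simulation. The following sketch is illustrative only (the function name simulate_branching and the cap on the number of waves are assumptions, not part of the model): each person in the current wave meets k new people and infects each one independently with probability p, and the run stops once a wave produces no new infections.

    import random

    def simulate_branching(p, k, max_waves=20, seed=None):
        # Returns the number of infected people in each wave of the
        # branching process with contact number k and transmission
        # probability p.
        rng = random.Random(seed)
        wave_sizes = [1]                      # wave 0: the initial carrier
        for _ in range(max_waves):
            currently_infected = wave_sizes[-1]
            # every infected person exposes k fresh contacts
            new_infections = sum(1 for _ in range(currently_infected * k)
                                 if rng.random() < p)
            wave_sizes.append(new_infections)
            if new_infections == 0:           # the epidemic has died out
                break
        return wave_sizes

    print(simulate_branching(p=0.2, k=3, seed=1))   # mild: tends to vanish
    print(simulate_branching(p=0.6, k=3, seed=1))   # aggressive: tends to grow

Runs like these suggest that the product p · k, the expected number of new cases caused by each infected person, is what separates epidemics that quickly die out from those that keep spreading.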

An epidemic model is a simplified means of describing the transmission of

communicable disease through individuals. The outbreak and spread of disease has been

questioned and studied for many years. The ability to make predictions about diseases

could enable scientists to evaluate inoculation or isolation plans and may have a

significant effect on the mortality rate of a particular epidemic. The modelling of

infectious diseases is a tool which has been used to study the mechanisms by which

diseases spread, to predict the future course of an outbreak and to evaluate strategies to

control an epidemic.

Types of Epidemic Models

Stochastic

"Stochastic" means being or having a random variable. A stochastic model is a tool for

estimating probability distributions of potential outcomes by allowing for random

variation in one or more inputs over time. Stochastic models depend on the chance

variations in risk of exposure, disease and other illness dynamics. They are used when

these fluctuations are important, as in small populations.

Deterministic

When dealing with large populations, as in the case of tuberculosis, deterministic or

compartmental mathematical models are used. In the deterministic model, individuals in

the population are assigned to different subgroups or compartments, each representing a

specific stage of the epidemic. Letters such as M, S, E, I, and R are often used to

represent different stages. The transition rates from one class to another are

mathematically expressed as derivatives, hence the model is formulated using

differential equations. While building such models, it must be assumed that the


population size in a compartment is differentiable with respect to time and that the

epidemic process is deterministic. In other words, the changes in population of a

compartment can be calculated using only the history used to develop the model.
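As one common concrete instance of this formulation (the standard SIR special case, using the contact rate β, the recovery rate γ, and the total population N summarized under Terminology below), the transition rates between compartments can be written as

    dS/dt = −β · S · I / N
    dI/dt = β · S · I / N − γ · I
    dR/dt = γ · I

where S, I, and R are the numbers of susceptible, infective, and removed individuals at time t and N = S + I + R. More elaborate variants (SEIR, MSIR, and so on) add further compartments from the list of letters above in the same way.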

Another approach is through discrete analysis on a lattice (such as a two-dimensional

square grid), where the updating is done through asynchronous single-site updates

(Kinetic Monte Carlo) or synchronous updating (Cellular Automata). The lattice

approach enables inhomogeneities and clustering to be taken into account. Lattice

systems are usually studied through computer simulation.
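A minimal sketch of the synchronous (cellular-automaton) variant, under the simplifying assumptions that contacts are limited to the four nearest neighbours on a square grid and that infection and recovery probabilities are fixed per update (the function step_sir_grid and the specific parameter values are illustrative, not a standard routine):

    import random

    def step_sir_grid(grid, p_infect, p_recover, rng):
        # One synchronous update of an S/I/R lattice: every cell is updated
        # based on the previous state of its four nearest neighbours.
        n = len(grid)
        new_grid = [row[:] for row in grid]
        for i in range(n):
            for j in range(n):
                if grid[i][j] == 'S':
                    for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                        if 0 <= a < n and 0 <= b < n and grid[a][b] == 'I':
                            if rng.random() < p_infect:
                                new_grid[i][j] = 'I'
                                break
                elif grid[i][j] == 'I' and rng.random() < p_recover:
                    new_grid[i][j] = 'R'
        return new_grid

    rng = random.Random(0)
    n = 20
    grid = [['S'] * n for _ in range(n)]
    grid[n // 2][n // 2] = 'I'              # seed one infection at the centre
    for _ in range(50):
        grid = step_sir_grid(grid, p_infect=0.3, p_recover=0.2, rng=rng)
    print({state: sum(row.count(state) for row in grid) for state in 'SIR'})

Because infection can only jump between neighbouring cells, runs of this kind show the clustered, spatially localized outbreaks that the lattice approach is designed to capture.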

Terminology

The following is a summary of the notation used in this and the next sections.

M: Passively Immune Infants

S: Susceptibles

E: Exposed Individuals in the Latent Period

I: Infectives

R: Removed with Immunity

β: Contact Rate

µ: Average Death Rate

B: Average Birth Rate

1/ε: Average Latent Period

1/γ: Average Infectious Period

R0: Basic Reproduction Number

N: Total Population

f: Average Loss of Immunity Rate of Recovered Individuals

δ: Average Temporary Immunity Period
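Putting the notation to work, the following sketch numerically integrates the basic SIR special case with a simple forward-Euler step (the function name, step size, and parameter values are illustrative assumptions, not prescribed by the model):

    def simulate_sir(beta, gamma, N, I0, days, dt=0.1):
        # Integrate dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
        # dR/dt = gamma*I with a forward-Euler scheme.
        S, I, R = N - I0, float(I0), 0.0
        history = [(0.0, S, I, R)]
        for step in range(1, int(days / dt) + 1):
            new_infections = beta * S * I / N * dt
            new_recoveries = gamma * I * dt
            S -= new_infections
            I += new_infections - new_recoveries
            R += new_recoveries
            history.append((step * dt, S, I, R))
        return history

    # beta = 0.3 contacts per day, average infectious period 1/gamma = 10 days,
    # so R0 = beta / gamma = 3; one initial infective in a population of 1000.
    t, S, I, R = simulate_sir(beta=0.3, gamma=0.1, N=1000, I0=1, days=160)[-1]
    print(f"after {t:.0f} days: S = {S:.0f}, I = {I:.0f}, R = {R:.0f}")

In a deterministic run like this the outcome is fully determined by the parameters; contrast this with the stochastic sketches earlier, where different random seeds give different outbreak histories.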