FUNCTIONAL DIFFERENTIAL EQUATIONS

Guest Editor A. SLAVOVA

VOLUME 13, 2006 No.1

SPECIAL ISSUE ON

NEURAL NETWORKS

THE COLLEGE OF JUDEA & SAMARIA ARIEL, ISRAEL


The College of Judea and Samaria

© All Rights Reserved 2006 Printed in Israel ISSN 0793-1786

TABLE OF CONTENTS

Angela Slavova Guest Editor Preface. 3

P.Arena, L.Fortuna, M.Frasca, L.Patane. VLSI implementation of control schemes based on dynamical central pattern generators. 5

F.Sapuppo, M.Bucolo, M.Intaglietta, L.Fortuna, P.Arena. Cellular nonlinear network in microcirculation characterization. 23

P.Kozma, P.Sonkoly, P.Szolgay. Seismic wave propagation modeling on CNN-UM architecture. 43

Z.Nagy, P.Szolgay. Solving partial differential equations on emulated digital CNN-UM architectures. 61

I.Szatmari. Spatio-temporal nonlinear wave metric for binary and gray-scale object comparison on analogic cellular wave computers. 89

V.Mladenov. Spatio-temporal phenomena in two-dimensional cellular nonlinear networks based on second order cells. 99

A.Slavova. Hysteresis in CNN model of bacteria growth. 107


GUEST EDITOR PREFACE

This issue is a collected work of the recent advances in the theory and applications of Cellular Neural/Nonlinear Networks.

Cellular Neural Networks (CNNs) were introduced in 1988 by L.O.Chua and L.Yang as a novel class of information processing systems, which possesses some of the key features of Neural Networks and which has important potential applications in such areas as image processing and pattern recognition. Many complex computational problems can be formulated as well-defined tasks where the signal values are placed on a regular geometric 2-D or 3-D grid, and the direct interaction between signal values is limited to a finite local neighborhood. A CNN is an analog dynamic processor array which reflects just this property: the processing elements interact directly within a finite local neighborhood. The dynamical system which describes CNNs is a system of functional differential equations.

The purpose of the issue is to present recent results on the basic concepts of dynamics, stability analysis and CNN models of some partial differential equations. The main emphasis of the Special Issue is on modelling spatio-temporal phenomena via CNNs. A VLSI implementation of a class of nonlinear systems used as Central Pattern Generators for artificial locomotion control is presented. It is known that CNN technology represents an extremely suitable solution for spatio-temporal dynamics, especially in microcirculation characterization of fluids such as blood flow. In this sense very good examples of the successful interactions between biology and engineering, so-called bio-inspired robotics, are given in the issue.

Another aspect of the Special Issue is the CNN representation of some partial differential equations (PDEs) modelling wave phenomena, for example seismic waves. The array structure and local connectivity of the CNN paradigm make it a natural framework for the solution of PDEs. Using an analog CNN-UM chip the computation can be carried out in real time, and the accuracy can be improved by a configurable emulated digital CNN-UM. The existing VLSI-implemented CNN chips are shown to implement very well a spatio-temporal nonlinear wave metric for object comparison and classification problems.


Providing recent results in CNN theory and important applications, the Special Issue on Cellular Neural Networks: Theory and Applications will be of interest to all researchers and Ph.D. students in the areas of applied mathematics, electrical engineering, artificial intelligence and computer science.

July, 2005

Angela Slavova Guest Editor

FUNCTIONAL DIFFERENTIAL EQUATIONS

VOLUME 13

2006, NO 1

PP. 5-21

VLSI IMPLEMENTATION OF CONTROL SCHEMES BASED ON DYNAMICAL CENTRAL PATTERN GENERATORS

PAOLO ARENA, LUIGI FORTUNA, MATTIA FRASCA,

AND LUCA PATANE*

Abstract. In this paper the problems related to the VLSI implementation of a class of nonlinear systems used as Central Pattern Generators for artificial locomotion control are discussed and an efficient solution based on the switched-capacitor technique is proposed. Moreover, an example of application of the proposed methodology is presented: a VLSI chip implementing a CPG for hexapod gait control has been designed, realized and then successfully used to control a prototype of six-legged walking robot.

Key Words. Locomotion control, Central Pattern Generator (CPG), legged robots, analog VLSI chip, switched-capacitor.

1. Introduction. The paradigm of the Central Pattern Generator (CPG) is one of the most significant examples of the successful interactions between biology and engineering which are at the base of the so-called bio-inspired robotics [1, 2, 3]. This relatively recent research field includes two different perspectives: on one hand, inspiration from nature provides robotic engineers with many insightful principles to design novel devices and machines, and more in general to face engineering problems; on the other hand, robotics constitutes an accurate environment to check the validity of a biological model and to formulate new questions on the model itself, providing valuable feedback to neurophysiologists, biologists and neuroethologists.

The CPG is a network of neurons able to generate the rhythmic locomotion pattern without the need of sensory feedback and signals from high-level neuronal centers. Its role is central in the hierarchical organization of the motor system of various animals [4].

* Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi, Università degli Studi di Catania, viale A. Doria 6, 95125 Catania, Italy. E-mail: [parena, lfortuna, mfrasca, lpatane]@diees.unict.it.


CPGs and more in general hierarchical controllers [5, 6, 7, 8] were applied to control locomotion of walking machines, which have better maneuverability than wheeled robots but are more difficult to control. Seminal works on bio-inspired approaches include those by Brooks [9], and Beer et al. [6].

Concurrently, many theoretical CPG models were investigated. Among these works a widespread approach is to use nonlinear dynamical systems to model CPGs [10]-[15]. CPGs based on neural network oscillators (for instance, leaky integrators computing the average firing frequency of neurons) have been applied to biped robots [16] or robot arms [17]: the control system of these robots is based on the oscillator introduced by Matsuoka [18]. Learnable dynamical pattern generators are treated in [19]. All of these are based on systems of nonlinear differential equations implementing the dynamics of connected neurons. Most of these approaches stop at the software simulation level: when a low-level implementation is attempted, it is usually performed using finite state machines realised with digital microcontrollers. Attention is therefore rarely focused on a hardware architecture serving as a paradigm for the topographic implementation of CPGs. Moreover, even if a digital implementation is convenient in many cases, when the number of degrees of freedom to be concurrently actuated grows, as in the case of bio-inspired structures, the analog topographic approach becomes appealing. Here the controlling network is topographically distributed like the actuator network, avoiding the bottleneck of data/signal transfer, and information processing takes place in close analogy with the neurobiological case.

One of the few examples dealing with VLSI implementation of the proposed CPG architectures is the work of Lewis [20], focusing on a hardware system able to control a biped robot which, up to now, is not able to walk autonomously owing to the lack of posture control. However, to take full advantage of the bio-inspired analog control, general strategies for the design of VLSI controllers should be introduced. This is fundamental to design fully autonomous walking micro-robots with on-board intelligence.

This paper focuses on the main problems connected with the VLSI design of a CPG made of dynamical interconnected nonlinear units and proposes a suitable strategy for its implementation. The proposed solution is based on the switched-capacitor technique and simplifies the control of various gait parameters, for instance the walking speed.

Moreover, the paper discusses an example of the application of the introduced methodology. The example deals with a VLSI chip for locomotion control in a hexapod robot and is an experimental confirmation of the suitability of the approach.


The rest of the paper is organized as follows: Section 2 introduces the class of nonlinear systems modelling CPGs; Section 3 discusses the methodology for the design of a VLSI CPG; in Section 4 an example of the application of the methodology is shown; Section 5 concludes the paper.

2. Dynamical Central Pattern Generators. In nature locomotion (walking, swimming, flying) is often the result of stereotyped movements. The control of these movements is realised by CPGs, groups of neurons whose rhythmic pattern of firing is mapped onto the pattern of locomotor activity: CPGs generate and regulate the different motor patterns of the animal. This concept has been found useful in robotics, especially when the control of a locomotion system requires the concurrent activation of many actuators. CPG models have addressed several aspects (at several levels) of the neurophysiological mechanisms involved in neural control of locomotion: associative neural networks [21], synergetics [22], software architectures [23]. However, a popular approach is to consider networks of coupled nonlinear oscillators as possible models of CPGs [10]-[15]. The key point of this approach is to include in the model the dynamics of the neurons, an essential feature of biological networks. We refer to this class of CPGs as dynamical CPGs.

Focusing on locomotion control of walking, several models that differ in the choice of the nonlinear oscillator, synapses, level of abstraction and accuracy have been developed. Most of these models share some important features that can be summarized in the following points:

• the nonlinear oscillators in the network are identical;
• each oscillator may represent either a single motor-neuron, an interneuron or a population of neurons;
• the movements of a leg are controlled by a single oscillator;
• interlimb coordination is provided by the connections between the oscillators.

A set of ordinary differential equations can be used to describe a dynamical CPG. A formal definition of a dynamical CPG is now introduced. We refer to the single oscillator as the CPG basic unit and introduce some preliminary notation.

Let us consider n CPG units, where each unit is an m-th order nonlinear system. Moreover, let us number the CPG units according to the associated legs, from front to rear, and label them as left (L) or right (R). The dynamical CPG is described by the following equations:

$$
\begin{cases}
\dot{x}_1 = f(x_1, p, u) + \sum_{j \neq 1} g_{1j}(x_j, p, u) \\
\quad \vdots \\
\dot{x}_n = f(x_n, p, u) + \sum_{j \neq n} g_{nj}(x_j, p, u)
\end{cases}
\qquad (1)
$$

where $x_i \in \mathbb{R}^m$ with $i = 1, \dots, n$ are the state variables of each CPG unit, $p \in \mathbb{R}^{m_p}$ is the parameter vector and $u \in \mathbb{R}^{m_u}$ is a set of external inputs (representing for example signals coming from sensors). $f$ and $g_{ij}$ are nonlinear functions; they represent the dynamics of the single CPG unit and the connections between the CPG units, respectively.

Moreover, let us introduce a set of N output variables as follows:

$$
y_j = h_j(x), \qquad j = 1, \dots, N \qquad (2)
$$

where $x = [x_1 \dots x_n]^T$. These output variables are obtained from the state variables and are effectively used to drive the N actuators of the robotic structure. In fact the further nonlinearities $h_j$ have been introduced to obtain signals suitable for the control of the joints of the motion structure. The simplest example of such an output transformation is a saturation nonlinearity accounting for the upper and lower limits of a motor working range.

It should be noticed that CPGs based on recurrent networks [24]-[25], on Cellular Neural Networks [15], on networks of coupled oscillators [10]-[17] can all be modeled under the general paradigm represented by equations (1) and (2).
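The general paradigm of equations (1) and (2) can be illustrated with a minimal numerical sketch. Everything named here is an assumption made for illustration only: the forward-Euler integrator, the function names `cpg_step`, `saturate` and `outputs`, and the choice of a plain saturation as output map (the example the text itself mentions). The unit dynamics `f` and the couplings `g` are passed in as callables, so the sketch does not commit to any specific CPG model.

```python
def cpg_step(x, f, g, p, u, dt):
    """One forward-Euler step of equations (1).

    x  : list of n unit states, each a list of m floats
    f  : f(x_i, p, u) -> list of m floats (dynamics of a single unit)
    g  : g(i, j, x_j, p, u) -> list of m floats (connection from unit j to unit i)
    dt : integration step (illustrative choice, not from the paper)
    """
    n = len(x)
    new_x = []
    for i in range(n):
        dx = list(f(x[i], p, u))
        for j in range(n):
            if j != i:  # sum over all the other units
                dx = [a + b for a, b in zip(dx, g(i, j, x[j], p, u))]
        new_x.append([xi + dt * d for xi, d in zip(x[i], dx)])
    return new_x


def saturate(value, lower, upper):
    """Saturation nonlinearity modelling the limits of a motor working range."""
    return max(lower, min(upper, value))


def outputs(x, lower=-1.0, upper=1.0):
    """Equations (2) with every h_j chosen as a saturation of one state component."""
    return [saturate(c, lower, upper) for unit in x for c in unit]


# Toy usage: two first-order units with linear decay and linear coupling.
decay = lambda xi, p, u: [-xi[0]]
link = lambda i, j, xj, p, u: [0.5 * xj[0]]
next_state = cpg_step([[1.0], [0.0]], decay, link, p=None, u=None, dt=0.1)
```

The toy usage only exercises the structure (identical units plus pairwise couplings); a real CPG would plug in nonlinear oscillators for `f`.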

2.1. Example: a CPG for fast gait of six-legged robots. An example of dynamical CPG is the CPG for generating the fast gait for a six-legged robot discussed in [15]. Fast gait (also called alternating tripod) is a very common gait of hexapod walkers in which two tripods (R1, L2, R3 and L1, R2, L3) alternate. The CPG consists of six CPG units, each of which is connected to a leg of the structure and controls the two active joints of the leg. Moreover, each CPG unit is a second-order system, thus m = 2 and n = 6.

Let us indicate with $x_i = [x_{1,i}, x_{2,i}]^T$, $p = [\mu, s, c]^T$ and $u = [i_1, i_2]^T$ the state variables of the i-th nonlinear oscillator, the parameter vector ($m_p = 3$) and the external inputs ($m_u = 2$), respectively.

TABLE 1
Parameter values of system (3).

$\mu$    $s$    $i_1$    $i_2$    $i_c$     $c$
0.5      1      -0.3     0.3      0.3484    -0.6

The following equations describe the fast gait CPG:

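$$
\begin{aligned}
\dot{x}_{1,L1} &= -x_{1,L1} + (1+\mu)\tanh(2x_{1,L1}) - s\tanh(2x_{2,L1}) + i_1 + c\tanh(2x_{1,R3}) \\
\dot{x}_{2,L1} &= -x_{2,L1} + s\tanh(2x_{1,L1}) + (1+\mu)\tanh(2x_{2,L1}) + i_2 \\[4pt]
\dot{x}_{1,L2} &= -x_{1,L2} + (1+\mu)\tanh(2x_{1,L2}) - s\tanh(2x_{2,L2}) + i_1 + \tfrac{c}{2}\tanh(2x_{1,R2}) + \tfrac{c}{2}\tanh(2x_{1,L3}) \\
\dot{x}_{2,L2} &= -x_{2,L2} + s\tanh(2x_{1,L2}) + (1+\mu)\tanh(2x_{2,L2}) + i_2 \\[4pt]
\dot{x}_{1,L3} &= -x_{1,L3} + (1+\mu)\tanh(2x_{1,L3}) - s\tanh(2x_{2,L3}) + i_1 + \tfrac{c}{2}\tanh(2x_{1,L2}) + \tfrac{c}{2}\tanh(2x_{1,R1}) \\
\dot{x}_{2,L3} &= -x_{2,L3} + s\tanh(2x_{1,L3}) + (1+\mu)\tanh(2x_{2,L3}) + i_2 \\[4pt]
\dot{x}_{1,R1} &= -x_{1,R1} + (1+\mu)\tanh(2x_{1,R1}) - s\tanh(2x_{2,R1}) + i_1 + c\tanh(2x_{1,L3}) \\
\dot{x}_{2,R1} &= -x_{2,R1} + s\tanh(2x_{1,R1}) + (1+\mu)\tanh(2x_{2,R1}) + i_2 \\[4pt]
\dot{x}_{1,R2} &= -x_{1,R2} + (1+\mu)\tanh(2x_{1,R2}) - s\tanh(2x_{2,R2}) + i_1 + \tfrac{c}{2}\tanh(2x_{1,L2}) + \tfrac{c}{2}\tanh(2x_{1,R3}) \\
\dot{x}_{2,R2} &= -x_{2,R2} + s\tanh(2x_{1,R2}) + (1+\mu)\tanh(2x_{2,R2}) + i_2 \\[4pt]
\dot{x}_{1,R3} &= -x_{1,R3} + (1+\mu)\tanh(2x_{1,R3}) - s\tanh(2x_{2,R3}) + i_1 + \tfrac{c}{2}\tanh(2x_{1,R2}) + \tfrac{c}{2}\tanh(2x_{1,L1}) \\
\dot{x}_{2,R3} &= -x_{2,R3} + s\tanh(2x_{1,R3}) + (1+\mu)\tanh(2x_{2,R3}) + i_2
\end{aligned}
\qquad (3)
$$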

Given the choice of parameters as in Table 1, each second-order CPG unit admits a periodic solution, providing rhythmic movements to the effector organ. Moreover, there exists a critical value of $i_2$ (indicated as $i_c$) such that, when $i_2 > i_c$, a bifurcation occurs in the dynamics of the CPG unit (the limit cycle collapses into a stable equilibrium point) and the leg can be stopped. This mechanism is used to implement direction control in the model [26].

Connections between the nonlinear oscillators labelled L1, L2, ..., R3, represented by terms such as $g_{L1,R3}(x_{R3}) = c \tanh(2x_{1,R3})$, determine the phase lags needed for a proper locomotion gait (fast gait). Other locomotion gaits can be implemented by choosing other connections [15]. As regards the output variables, these can be defined as follows:

$$
\begin{cases}
y_{1,h} = \alpha_1 \tanh(2x_{1,h}) + \beta_1 \\
y_{2,h} = \alpha_2 \tanh(2x_{2,h}) + \beta_2
\end{cases}
\qquad (4)
$$

where $h = L1, L2, \dots, R3$ and $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ are parameters that depend on the leg structure [15].
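The fast gait CPG of equations (3) and the output map (4) can be explored numerically with the following sketch. It is not the authors' simulation code: the forward-Euler scheme, the step size, the initial conditions, the default values of `alpha` and `beta`, and all function names are assumptions chosen for illustration. The coupling weights use $c$ for units with one connection and $c/2$ for units with two, which is a reading of the printed coefficients and should also be treated as an assumption.

```python
import math

# Parameter values of Table 1 (i_c is not needed to integrate the system)
MU, S, I1, I2, C = 0.5, 1.0, -0.3, 0.3, -0.6
LEGS = ["L1", "L2", "L3", "R1", "R2", "R3"]

# Connections of equations (3): target unit -> [(source unit, weight)]
COUPLING = {
    "L1": [("R3", C)],
    "L2": [("R2", C / 2), ("L3", C / 2)],
    "L3": [("L2", C / 2), ("R1", C / 2)],
    "R1": [("L3", C)],
    "R2": [("L2", C / 2), ("R3", C / 2)],
    "R3": [("R2", C / 2), ("L1", C / 2)],
}


def derivative(state, i2=None):
    """Right-hand side of equations (3); i2 optionally overrides the bias of single legs."""
    i2 = i2 or {}
    d = {}
    for leg in LEGS:
        x1, x2 = state[leg]
        t1, t2 = math.tanh(2 * x1), math.tanh(2 * x2)
        coupling = sum(w * math.tanh(2 * state[src][0]) for src, w in COUPLING[leg])
        dx1 = -x1 + (1 + MU) * t1 - S * t2 + I1 + coupling
        dx2 = -x2 + S * t1 + (1 + MU) * t2 + i2.get(leg, I2)
        d[leg] = (dx1, dx2)
    return d


def simulate(steps=20000, dt=0.005, i2=None):
    """Forward-Euler integration in dimensionless time; returns the list of visited states."""
    # Slightly different initial conditions per leg (arbitrary choice)
    state = {leg: (0.1 * (k + 1), 0.0) for k, leg in enumerate(LEGS)}
    history = []
    for _ in range(steps):
        d = derivative(state, i2)
        state = {leg: (state[leg][0] + dt * d[leg][0],
                       state[leg][1] + dt * d[leg][1]) for leg in LEGS}
        history.append(state)
    return history


def joint_outputs(x1, x2, alpha=(1.0, 1.0), beta=(0.0, 0.0)):
    """Output map (4); alpha and beta are placeholders for the leg-dependent parameters."""
    return (alpha[0] * math.tanh(2 * x1) + beta[0],
            alpha[1] * math.tanh(2 * x2) + beta[1])
```

Passing for example `i2={"R2": 0.5}` to `simulate` raises the bias of unit R2 above the critical value of Table 1, which is how the direction-control mechanism described in the text could be tried out in this sketch.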

Remark. This CPG model is based on Cellular Neural Networks (CNNs) [27, 28]. This approach has been shown to be suitable for the control of hexapod robots and other bio-mimetic structures [29, 30]. A very important point of this dynamical CPG is that the locomotion gait emerging from the interactions of a distributed network of nonlinear circuits is naturally mapped in the topographic architecture of the CNN circuit paradigm. In other words both hierarchical and spatially-distributed subdivisions are exploited to solve locomotion control in real-time.

3. Switched-capacitor implementation. In order to guarantee the feasibility of the implementation, equations (1), which are dimensionless, have to be rewritten as follows:

$$
\begin{cases}
\dfrac{dx_1}{dt} = \dfrac{1}{\tau}\Big( f(x_1, p, u) + \sum_{j \neq 1} g_{1j}(x_j, p, u) \Big) \\
\quad \vdots \\
\dfrac{dx_n}{dt} = \dfrac{1}{\tau}\Big( f(x_n, p, u) + \sum_{j \neq n} g_{nj}(x_j, p, u) \Big)
\end{cases}
\qquad (5)
$$

This introduces a time-scaling factor in equations (1) that changes nothing but the frequency of the oscillations of the CPG units. The parameter $\tau$ in equations (5) constitutes the time-scaling factor; when the CPG is implemented in an electronic circuit, this is usually given by $\tau = RC$. It has been implicitly assumed that the state variables are voltage signals.
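That the scaling only changes the oscillation frequency can be checked with a change of time variable. The symbols $\tilde{x}_i$, $s$ and $T_0$ below are introduced only for this check and do not appear in the paper.

```latex
% Let \tilde{x}_i(s) solve the dimensionless system (1) in the time variable s,
% with oscillation period T_0, and define x_i(t) = \tilde{x}_i(t/\tau).
\begin{align*}
\frac{dx_i}{dt}
  &= \frac{1}{\tau}\,\frac{d\tilde{x}_i}{ds}\bigg|_{s = t/\tau}
   = \frac{1}{\tau}\Big( f(x_i, p, u) + \sum_{j \neq i} g_{ij}(x_j, p, u) \Big), \\
T &= \tau\, T_0, \qquad f_o = \frac{1}{T} = \frac{1}{\tau\, T_0}.
\end{align*}
```

So $x_i(t)$ solves the scaled system with the same waveform, and the oscillation period is simply stretched by the factor $\tau$.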

A peculiarity of dynamical CPGs (1) is that their typical oscillation frequency is of the order of hertz. This derives from the typical stepping frequency of both the biological model and the legged robot. An efficient hardware implementation of dynamical CPGs thus requires a solution to the problem of integrating large time constants.

Another peculiarity of the dynamical CPG leads to a further constraint in the design of VLSI chips for artificial locomotion control. Dynamical CPGs aim to be generic models representing the universal functional features of nonspecific oscillator networks. In order to preserve the generality of the approach, the VLSI implementation should not be constrained to a parameter-dependent model. This mainly applies to the oscillation frequency: fixing this frequency restricts the application range of the developed circuit.


These design issues are solved by the choice of the switched-capacitor technique to implement the circuit.

In SC integrated circuits, resistors are realized by means of switched capacitors. An equivalent resistor circuit can be implemented by a capacitor $C_R$ switched by a clock with frequency $f_c = \frac{1}{T_c}$. The equivalent circuit approximates a resistor of value:

$$
R = \frac{T_c}{C_R} = \frac{1}{f_c C_R} \qquad (6)
$$

Referring to equations (5) and adopting such a circuit, the time-scaling factor $\tau = RC$ is given by:

$$
\tau = \frac{C}{C_R}\,\frac{1}{f_c} \qquad (7)
$$

Equation (7) is fundamental to achieve the mentioned features of the locomotion control:

• Integration of large time constants;
• Suitability of the approach for different locomotion structures.

As regards the latter point, by changing the clock frequency, the oscillation frequency can be adapted in a large range to locomotion structures requiring different actuating frequencies (servo-motors, piezoelectric actuators, ... ).

The first point is exemplified with the help of the CPG for the hexapod fast gait in equations (3). The typical stepping frequency of insects (and of walking robots) is of the order of hertz. An integrated circuit implementing this CPG could require big capacitors, which either occupy large silicon areas or must be external. Neither solution is appealing. Alternatively, by using SC integrated circuits, it is possible to obtain low stepping frequencies. For instance, by selecting $C/C_R = 10$ and a clock frequency of $f_c = 100$ Hz, the time-scaling factor is equal to $\tau = 0.1$ s; since each single CPG unit oscillates with a frequency of $f_o \approx \frac{1}{7\tau}$, a stepping frequency in the desired order of magnitude can be achieved.
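The arithmetic of this example can be reproduced with a few lines of Python. The function names are illustrative, and the capacitor value $C_R = 1$ pF is a hypothetical figure chosen only to show the size of the equivalent resistance; it is not taken from the chip. The factor 7 in the oscillation frequency is read from the relation $f_o \approx 1/(7\tau)$, which agrees with the simulated period of 700 ms reported for $f_c = 100$ Hz.

```python
def sc_equivalent_resistance(f_clock_hz, c_r_farad):
    """Equivalent resistance of a capacitor C_R switched at f_c: R = 1 / (f_c * C_R)."""
    return 1.0 / (f_clock_hz * c_r_farad)


def time_constant(cap_ratio, f_clock_hz):
    """Time-scaling factor tau = (C / C_R) / f_c."""
    return cap_ratio / f_clock_hz


def oscillation_frequency(tau_s, periods_per_tau=7.0):
    """Approximate oscillation frequency of a CPG unit, f_o ~ 1 / (7 * tau)."""
    return 1.0 / (periods_per_tau * tau_s)


tau = time_constant(cap_ratio=10.0, f_clock_hz=100.0)   # 0.1 s
f_o = oscillation_frequency(tau)                         # about 1.43 Hz

# Hypothetical C_R = 1 pF: the switched capacitor behaves like a 10 gigaohm
# resistor at 100 Hz, so C = 10 * C_R = 10 pF is enough for tau = 0.1 s.
r_eq = sc_equivalent_resistance(100.0, 1e-12)
```

The point of the design choice is visible in the last line: the time constant depends only on a capacitor ratio and on the clock, so it can be retuned from outside the chip without changing any on-chip component.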

There are two further characteristics that make the switched-capacitor approach particularly appealing. Although the control of walking speed in insects is complex and passes through a continuous transition between distinct gaits via a progressive shortening of the stance phase, in SC circuits a simple and efficient mechanism can be implemented. Given a locomotion gait, its speed can be controlled by slightly varying the clock frequency. Moreover, different connection topologies can be implemented and selected by switches driven by external digital control signals. Hence this approach provides two different strategies: switching between a given set of possible locomotion patterns and a continuous variation of the speed within each pattern. The choice of the right locomotion pattern is clearly delegated to a higher-level control, which takes into account any source of information to establish which speed has to be adopted.

These considerations lead to the second characteristic of the SC approach. The dynamical CPG, a continuous-time system, is implemented by a digital technique for analog circuits. Hence, the approach is a hybrid scheme, in which the core of the control is analog and is devoted to the locomotion pattern generation through the implementation of equations (5), while the behavior of the CPG is modulated by signals coming from sensory feedback or high-level control through the parameter vector p or the input vector u. The idea of hybrid control is schematically illustrated in Fig. 1.

The advantages of an analog, digitally programmable circuit are evident since this strategy exploits the peculiar capabilities of the digital and analog world, by delegating the feedforward gait generation (the most difficult task for a micro-controller, since the number of actuators of the locomotion system can be high) to the analog core and the feedback control law to the digital control.

4. Example of application. In this Section the introduced methodology is applied to an example; a VLSI implementation of the CPG (3) is illustrated [33].

4.1. Design of the chip. The chip is constituted by six CPG units, locally connected, and includes a phase generator block generating the two-phase clock starting from the external sinusoidal clock provided to the chip. Moreover, it allows the CPG (3) to be modulated by two external inputs $i_{2,R2}$ and $i_{2,L2}$ in order to implement direction control. Therefore it is suitable for controlling locomotion and speed of locomotion of a bio-inspired hexapod robot with 12 servomotors (2 for each leg).

[Figure 1: block diagram, Simple Logic Unit driving the Dynamical CPG]

FIG. 1. Scheme of the hybrid control: the analog core (the dynamical CPG) is controlled by a digital controller.

TABLE 2
Characteristics of the CNN-based CPG chip.

Power Supply           ±2.5 V
Power Consumption      50 mW
Pins                   19
Area                   1400 μm × 1200 μm
Clock (sinusoidal)     amplitude 2 V, frequency $f_c$
2 Modulating Inputs    $i_{2,L2}$, $i_{2,R2}$
12 Outputs             $x_{1,L1}$, $x_{2,L1}$, ..., $x_{1,R3}$, $x_{2,R3}$

The design of a single CPG unit follows the operational amplifier implementation of the CNN cell [32] and the SC design technique [31]. The values of the circuit capacitors have been chosen to match the parameters reported in Table 1 while minimizing the on-chip area. The operational amplifiers involved in the motor-neuron schematic are canonical two-stage CMOS operational transconductance amplifiers [31] with dominant-pole compensation, static gain $A \approx 55$ dB, phase margin $M_\varphi = 60°$ and unity-gain frequency $f_T = 15$ MHz.

The characteristics of the chip are reported in Table 2. The inputs of the chip are the control signals $i_{2,L2}$ and $i_{2,R2}$, while the outputs are the two state variables of each CPG unit. The technology adopted is CMOS AMS 0.8 μm. A photo of the first prototype is reported in Fig. 2.

4.2. Experimental results. The experimental results, obtained on the first chip prototype, illustrate the key points of the control approach discussed above. All the data have been acquired by using a data acquisition board (National Instruments AT-MIO 1620E) and subsequently plotted by using MATLAB.

4.2.1. Range of oscillation frequency. First of all, the behavior of the single CPG unit is illustrated. Fig. 3 shows the trends of state variables $x_{1,L1}$ and $x_{2,L1}$ at two different clock frequencies (Fig. 3(a) refers to $f_c = 10$ kHz, while Fig. 3(b) to $f_c = 100$ Hz) and the phase plane $x_1$-$x_2$ for the case of $f_c = 100$ Hz. The corresponding oscillation frequencies are $f_o = 132$ Hz and $f_o = 1.32$ Hz, respectively. These two cases demonstrate the suitability of the approach to control different actuators (for instance servomotors and piezoelectric actuators). Moreover the low stepping frequency of the latter case is an experimental confirmation that this technique makes it possible to realize large time constants with relatively small capacitances.

The frequency range has been further investigated by measuring the period of oscillations versus the clock frequency in the interval 100 mHz - 500 kHz. For each clock frequency the data have been stored and then the period of oscillations has been calculated. The experimental results show that the period of oscillation is inversely proportional to the clock frequency (i.e. the oscillation frequency depends linearly on it), as expected from equation (7). The operating range extends up to $f_c = 3$ MHz, beyond which the chip outputs are saturated.

FIG. 2. Photo of the CNN-based CPG chip.

FIG. 3. Waveforms of state variables $x_{1,L1}$ and $x_{2,L1}$ at two different clock frequencies $f_c = 10$ kHz (a) and $f_c = 100$ Hz (b). (c) Phase plane $x_1$-$x_2$ for $f_c = 100$ Hz.

TABLE 3
Comparison between simulated and measured oscillation period for different clock frequencies.

$f_c$ [Hz]    $T_{sim}$ [ms]    $T_m$ [ms]    $(T_m - T_{sim})/T_{sim}$
90            780               845           8.3%
100           700               760           8.6%
110           630               690           9.5%

4.2.2. Locomotion gait. Fig. 4 reports the waveforms of the variable x1 for each CPG unit at fc = 100Hz. Similar results have been obtained in the full operating range of the circuit. As can be noticed the signals of the two tripods are suitably synchronized demonstrating that fast gait is correctly working. The good agreement between simulations and experimental results for the fast gait guarantees the possibility of obtaining other locomotion gaits following the approach described in [15].

4.2.3. Speed control. The speed control issue can be addressed by providing slight changes of the clock frequency. As an example we considered small variations of the clock frequency around fc = 100Hz, since this value is suitable for the control of a hexapod robot. The same linear behavior observed in the analysis of the operating frequency has been experimentally verified even for slight changes of the clock frequency. Table 3 reports a comparison of simulated (Tsim) and measured (Tm) periods. As can be noticed, a good agreement between simulation and experimental results has been obtained.
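The figures of Table 3 can be cross-checked with a short script. It recomputes the relative deviation between measured and simulated periods and, as an additional check not made in the text, the product of period and clock frequency (the number of clock cycles per oscillation), which should be roughly constant if the period scales with the clock period. Variable and function names are illustrative.

```python
# Rows of Table 3: (f_c [Hz], T_sim [ms], T_m [ms])
TABLE_3 = [
    (90, 780, 845),
    (100, 700, 760),
    (110, 630, 690),
]


def relative_error_percent(t_sim, t_m):
    """Deviation of the measured period from the simulated one, in percent."""
    return 100.0 * (t_m - t_sim) / t_sim


errors = [relative_error_percent(t_sim, t_m) for _, t_sim, t_m in TABLE_3]

# Clock cycles per oscillation period: f_c * T (T converted from ms to s)
cycles_sim = [f_c * t_sim / 1000.0 for f_c, t_sim, _ in TABLE_3]
cycles_meas = [f_c * t_m / 1000.0 for f_c, _, t_m in TABLE_3]

for (f_c, _, _), err, cs, cm in zip(TABLE_3, errors, cycles_sim, cycles_meas):
    print(f"f_c = {f_c} Hz: error = {err:.1f}%, cycles sim = {cs:.2f}, meas = {cm:.2f}")
```

Both the simulated and the measured products stay within about one clock cycle of a constant value across the three rows, which is consistent with speed control by small changes of the clock frequency.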

4.2.4. Direction control. The two biases $i_{2,L2}$ and $i_{2,R2}$ provide a suitable way to control the direction of the robot. The results given here illustrate the possibility of changing the behavior of the CPG by acting on these inputs. One of these two inputs has been kept constant, while a periodic square wave has been applied to the pin corresponding to the other input. The suitability of the direction control has been verified at different clock frequencies. In particular, Fig. 5 refers to $f_c = 100$ Hz. In this case the frequency of the square wave signal applied to the input $i_{2,R2}$ is 80 mHz. Fig. 5 shows that, when the square wave signal is high, the movements of the leg R2 stop, since the state $x_{1,R2}$ of the CPG unit R2 is maintained below -1 and thus from relation (4) it follows that the output is $y_{1,R2} \simeq -\alpha_1 + \beta_1$: this value corresponds to a leg in support phase. For the same reason the changes in the trends of $x_{1,L2}$ and $x_{1,R3}$, visible in Fig. 5 during the turning period, do not affect the locomotion gait.

FIG. 4. Trend of $x_1$ for each CPG unit (fast gait) at $f_c = 100$ Hz.

FIG. 5. Direction control at $f_c = 100$ Hz: waveforms of $x_1$ for each cell (panels L1, L2, L3, R1, R2, R3; horizontal axis t [s]). When the square wave signal is high, the oscillations of leg R2 stop.
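The support-phase value quoted in the direction-control discussion follows from the output relation $y_{1,h} = \alpha_1 \tanh(2x_{1,h}) + \beta_1$ by a one-line estimate. The numerical bound below is a worked check added here, not a figure from the paper.

```latex
\begin{align*}
x_{1,R2} \le -1 \;&\Longrightarrow\; -1 < \tanh(2x_{1,R2}) \le \tanh(-2) \approx -0.964, \\
\big|\, y_{1,R2} - (-\alpha_1 + \beta_1) \,\big|
  &= |\alpha_1| \,\big( 1 + \tanh(2x_{1,R2}) \big) \le 0.036\, |\alpha_1| .
\end{align*}
```

As long as the state stays below -1, the joint command therefore sits within a few percent of its saturated value, so residual variations of the state do not move the leg appreciably.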

4.2.5. Robot control. The VLSI chip has been tested on a hexapod robot prototype. The structure of the robot, in aluminium, is similar to the robot described in [34]. An electronic board was designed to provide all the control signals to the chip. For instance as regards the clock, a circuit included in the board is devoted to the generation of a square wave signal with an adjustable frequency.

The outputs of the chip, namely the state variables of the CPG units, are used to generate the Pulse Width Modulated (PWM) signals driving the servomotors (Futaba S3003) of the robot. An electronic board is devoted to the generation of PWM signals. The experimental setup thus consists of only two electronic boards carried on the robot.

The experimental tests have shown the suitability of the chip, which works correctly on the hexapod robot. Fast gait, turning and speed control were successfully tested. Movies of the robot are available on-line at the web page http://www.scg.dees.unict.it/activities/biorobotics/movie.htm.

In Fig. 6 a photo of the hexapod robot while walking with fast gait is shown. Although a signal ($x_{1,R3}$) is not perfectly synchronized with the other two signals of the tripod (as shown in Fig. 4), in real tests this slight deviation is not significant: during walking the legs of the robot are practically synchronized and the robot is able to walk on quite regular terrains.

As concerns speed control, the stepping frequency modification as a function of the clock frequency was verified. The stepping frequency changes appeared evident both in the trend of the state variable (x1,L1) visualized on the oscilloscope and in the movement of the robot legs: when the clock frequency is changed, the frequency of this signal correspondingly changes and the legs of the robot move faster or slower. It has been experimentally verified that the stepping frequency changes do not affect the leg synchronization.


4.2.6. Sensory feedback. A common issue of bio-inspired control is how to handle sensory input. We would like to remark that this depends more on the specific CPG equations than on the SC technique. Moreover, in many CPGs this is not a key point: in fact, by definition, CPGs do not need sensory feedback to produce a locomotion pattern, but sensory feedback may be fundamental for adaptive walking on uneven terrain [25]. We discuss how to handle sensory input in relation to our example (the CNN-based CPG).

In [25] entrainment of external periodic input was successfully applied to control walking of a quadruped robot on irregular terrain. We have taken into consideration the problem of feedback from ground contact signals in [26]. The results obtained by performing several simulations agree with those of [25]: the entrainment of external periodic input from ground contact sensors is needed for irregular terrains. Real tests on the hexapod robot demonstrate that in our case this entrainment is not strictly needed even on irregular terrains. For this reason, although this feature can be implemented on our chip, it has not been included in our design.

An example of sensory feedback is the direction control implemented on the robot. The robot has been equipped with two infrared sensors (one on the left side, the other on the right side). An infrared emitter diode has been used as target. The robot is able to turn right or left according to the sensor detecting the transmitted infrared signal. The sensor outputs are connected to the input biases $i_{2,L2}$ and $i_{2,R2}$ through two monostable circuits, used to prolong the duration of the input spike and thus to activate turning in one direction. A video included in our web page shows how the robot is able to orient towards the target and to follow it.
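The role of the monostable circuits can be sketched in discrete time as follows. This is a behavioural illustration only: the retriggerable behaviour, the hold length, the bias values (`i2_walk` equal to the nominal bias 0.3 and a hypothetical `i2_stop` above the critical value 0.3484) and the mapping of each sensor to the leg on the same side are all assumptions, not details given in the text.

```python
def monostable(spikes, hold_samples):
    """Retriggerable one-shot: the output stays high for hold_samples samples after each spike."""
    out, remaining = [], 0
    for spike in spikes:
        if spike:
            remaining = hold_samples  # (re)start the pulse
        out.append(remaining > 0)
        if remaining > 0:
            remaining -= 1
    return out


def turning_biases(left_high, right_high, i2_walk=0.3, i2_stop=0.5):
    """Map the stretched sensor signals to the two modulating inputs of the chip.

    Assumed mapping: the left sensor stops leg L2, the right sensor stops leg R2.
    """
    return {
        "L2": i2_stop if left_high else i2_walk,
        "R2": i2_stop if right_high else i2_walk,
    }


# A single detection on the right sensor, stretched to three samples
right = monostable([0, 1, 0, 0, 0, 0], hold_samples=3)
left = [False] * len(right)
bias_sequence = [turning_biases(l, r) for l, r in zip(left, right)]
```

Stretching the spike matters because a leg has to stay in support phase for a sizeable part of a step for the robot to change heading; a detection lasting one sample would otherwise have no visible effect.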

5. Conclusions. In this paper the problems related to the VLSI implementation of a class of nonlinear dynamical systems modelling biological CPGs have been addressed. The importance of these nonlinear controllers is connected with the possibility of controlling legged robots in an analog, bio-inspired, effective way. As in the biological model, in CPGs realized through coupled nonlinear oscillators the locomotion pattern is the result of the self-organization of the system and is modulated by external inputs. The whole approach is a hybrid one, in which the generation of the gait is analog and its modulation is accomplished by digital signals related to the sensor status.

The solution introduced in this paper is to apply the switched-capacitor (SC) technique to design these dynamical CPGs. This solution is particularly appealing because it makes it possible to implement the large time constants required to achieve a low stepping frequency with small area consumption. Moreover, the SC technique has the further advantage of being suitable for the implementation of a hybrid scheme, based on an analog but digitally programmable circuit. In this sense, the control methodology is in several respects strictly related to the VLSI technique adopted for the design. An example is the speed control achieved by modulating the clock of the SC circuit.
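The dependence of the stepping speed on the SC clock can be illustrated with the standard switched-capacitor equivalence, in which a capacitor C_s toggled at frequency f_clk behaves like a resistor R_eq = 1/(f_clk C_s). The sketch below is only an illustration: the function name and the component values are hypothetical and are not taken from the chip design.

```python
def sc_time_constant(c_int, c_sw, f_clk):
    """Time constant of a switched-capacitor integrator stage.

    A capacitor c_sw toggled at f_clk behaves like a resistor
    R_eq = 1 / (f_clk * c_sw), so tau = R_eq * c_int.
    """
    r_eq = 1.0 / (f_clk * c_sw)
    return r_eq * c_int


# Hypothetical values, chosen only for illustration.
C_INT = 10e-12   # integrating capacitor, 10 pF
C_SW = 0.1e-12   # switched capacitor, 0.1 pF

tau_slow = sc_time_constant(C_INT, C_SW, f_clk=1e3)   # 1 kHz clock
tau_fast = sc_time_constant(C_INT, C_SW, f_clk=2e3)   # 2 kHz clock
print(tau_slow, tau_fast)
```

With these assumed values, doubling the clock frequency halves the time constant, and a time constant of the order of 0.1 s is obtained from picofarad capacitors, which is the area advantage referred to above.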

The second part of the paper illustrates an application of the proposed methodology. The design of a VLSI chip implementing the CPG for the hexapod alternating tripod gait has been briefly presented, with emphasis placed on the experimental results. The VLSI chip works correctly and is able to control a simple prototype of a walking robot. These results stimulate further studies directed towards the design of a fully autonomous bio-inspired microrobot with on-board intelligence, in which the generation and control of the locomotion gait are handled by the VLSI chip introduced in this paper.

FIG. 6. A photograph of the walking robot controlled by the VLSI chip.


Acknowledgements. This work was partially supported by the Italian "Ministero dell'Istruzione, dell'Università e della Ricerca" (MIUR) under the project PRIN "Innovative Bio-Inspired Strategies for the Control of Motion Systems" and by the EU under the project FP6-004690 SPARK.

REFERENCES

[1] J. Ayers, J. L. Davis, A. Rudolf, Neurotechnology for biomimetic robots. MIT Press, 2002.

[2] B. Webb, T. R. Consi, Biorobotics. MIT Press, 2001.

[3] F. G. Barth, J. A. C. Humphrey, T. W. Secomb, Sensors and sensing in biology and engineering. Springer Wien New York, 2003.

[4] S. Grillner, G. N. Orlovsky, T. G. Deliagina. Neural Control of Locomotion. Oxford Press, 1999.

[5] T. Zielinska, J. Heng, "Development of a walking machine: mechanical design and control problems", Mechatronics, 12:737-754, 2002.

[6] R. D. Beer, H. J. Chiel, R. D. Quinn, K. S. Espenschied and P. Larsson, "A distributed neural network architecture for hexapod robot locomotion", Neural Computation, 4:356-365, 1992.

[7] H. Cruse, T. Kindermann, M. Schumm, J. Dean and J. Schmitz, "Walknet - a biologically inspired network to control six-legged walking", Neural Networks, 11:1435-1447, 1998.

[8] J. Ayers, P. Zavracky, N. McGruer, D. Massa, W. Varus, R. Mukherjee and S. Currie. A modular behavioral-based architecture for biomimetic autonomous underwater robots. In Proc. of the Autonomous Vehicles in Mine Countermeasures Symposium. Naval Postgraduate School, 1998.

[9] R. Brooks, "New Approaches to Robotics", Science, Vol. 253, pp. 1227-1232, 1991.

[10] A. H. Cohen, P. J. Holmes, R. H. Rand, "The nature of the coupling between segmental oscillators of the lamprey spinal generator for locomotion: a mathematical model", J. Math. Biol., 3, pp. 345-369, 1982.

[11] J. J. Collins and I. N. Stewart. Coupled nonlinear oscillators and the symmetries of animal gaits. J. Nonlinear Sci., 3:349-392, 1993.

[12] J. J. Collins and I. Stewart. A group-theoretic approach to rings of coupled biological oscillators. Biol. Cybern., 71:95-103, 1994.

[13] J. J. Collins and I. Stewart. Hexapodal gaits and coupled nonlinear oscillator models. Biol. Cybern., 68:287-298, 1993.

[14] M. Golubitsky, I. Stewart, P. Buono, J. J. Collins. A modular network for legged locomotion. Physica D, 115:56-72, 1998.

[15] P. Arena, L. Fortuna, M. Frasca, "Multi-Template Approach to realize Central Pattern Generators for Artificial Locomotion Control", Int. J. Circ. Theor. Appl., Vol. 30, 2002, pp. 441-458.

[16] G. Taga, Y. Yamaguchi, H. Shimizu, "Self-organized control of bipedal locomotion by neural oscillators", Biological Cybernetics, Vol. 65, 1991, pp. 147-159.

[17] M. M. Williamson, "Neural control of rhythmic arm movements", Neural Networks, Vol. 11, 1998, pp. 1379-1394.

[18] K. Matsuoka, "Mechanisms of frequency and pattern control in neural rhythm generators", Biological Cybernetics, Vol. 56, 1987, pp. 345-353.

[19] S. Schaal, "Is imitation learning the route to humanoid robots?", Trends in Cognitive Sciences, Vol. 3, No. 6, 1999, pp. 233-242.

[20] M. A. Lewis, R. Etienne-Cummings, A. H. Cohen, M. Hartmann, "Toward biomorphic control using custom aVLSI CPG chips", In International Conference on Robotics and Automation, San Francisco, April 1998.

[21] D. Kleinfeld, H. Sompolinsky, "Associative neural network model for the generation of temporal patterns: Theory and application to central pattern generators", Journal of Biophysics, 54, pp. 1039-1051, 1988.

[22] G. Schoner, W. Y. Yang, J. A. S. Kelso, "A synergetic theory of quadrupedal gaits and gait transitions", Journal of Theoretical Biology, 142, pp. 359-361, 1990.

[23] R. C. Arkin, Behavior-Based Robotics, MIT Press: Cambridge, MA, 1998.

[24] H. J. Chiel, R. D. Beer and J. C. Gallagher, "Evolution and analysis of model CPGs for walking I. Dynamical modules", J. Computational Neuroscience, 7(2):99-118, 1998.

[25] Y. Fukuoka, H. Kimura and A. H. Cohen, "Adaptive Dynamic Walking of a Quadruped Robot on Irregular Terrain based on Biological Concepts", Int. Journal of Robotics Research, vol. 22, No. 3-4, pp. 187-202, 2003.

[26] P. Arena, L. Fortuna, M. Frasca, L. Patane, "Sensory Feedback in CNN-Based Central Pattern Generators", International Journal of Neural Systems, Vol. 13, No. 6, 2003, pp. 469-478.

[27] L. O. Chua, L. Yang, "Cellular Neural Networks: Theory and Applications", IEEE Transactions on Circuits and Systems I, 35, pp. 1257-1290, 1988.

[28] L. O. Chua, T. Roska, "The CNN paradigm", IEEE Transactions on Circuits and Systems, 40, pp. 147-156, 1993.

[29] P. Arena, L. Fortuna, M. Branciforte, "Reaction-diffusion CNN algorithms to generate and control artificial locomotion", IEEE Trans. CAS I, 46(2):253-260, 1999.

[30] P. Arena, C. Bonomo, L. Fortuna, M. Frasca, "Electro-active polymers as CNN actuators for locomotion control", In Circuits and Systems, 2002. ISCAS 2002. IEEE International Symposium on, vol. 4, pp. 281-284, 2002.

[31] D. A. Johns, K. Martin, Analog Integrated Circuit Design, John Wiley & Sons, 1997.

[32] G. Manganaro, P. Arena, L. Fortuna, Cellular Neural Networks: Chaos, Complexity and VLSI Processing, Springer-Verlag, 1999.

[33] P. Arena, S. Castorina, L. Fortuna, M. Frasca, M. Ruta, "A CNN-based chip for robot locomotion control", Proc. of IEEE Int. Conference ISCAS03, vol. 3, pp. 510-513, 2003.

[34] P. Arena, L. Fortuna, M. Frasca, "Attitude control in walking hexapod robots: an analogic spatio-temporal approach", Int. J. Circ. Theor. Appl., Vol. 30, 2002, pp. 349-362.


VOLUME 13 2006, NO 1 PP. 23-42

CELLULAR NONLINEAR NETWORK IN MICROCIRCULATION CHARACTERIZATION *

FRANCESCA SAPUPPO †, MAIDE BUCOLO ‡, MARCOS INTAGLIETTA §, LUIGI FORTUNA ¶ AND PAOLO ARENA ∥

Abstract. A new non-invasive real-time system based on Cellular Nonlinear Networks (CNNs) for the analysis and study of dynamic phenomena in the microcirculation is described. CNN technology is well suited to the spatio-temporal characterization of fluidic phenomena at micrometric to nanometric scales, such as blood flow in microvessels and its interactions and exchanges with the surrounding environment. This characterization is fundamental to describing the physiology of the peripheral vascular network, which is central to the understanding of the global cardiovascular regulatory system in both experimental and clinical applications. Image processing algorithms based on the structure of a CNN universal machine (CNN-UM) were implemented for the study and reconstruction of incomplete and degraded images of capillary networks and for the characterization of the oxygen distribution in the microcirculation system, both of which are basic to the understanding of microcirculation functionality. The algorithms are applied to microvessel images obtained by intravital microscopy during in vivo experiments on hamsters. They perform a directional reconstruction of degraded network images and exploit experimental data as a starting point for the oxygen distribution modeling. The algorithms were implemented in hardware on the ACE16k CNN chip, and the resulting images representing the oxygen distribution were validated through analytical studies in order to verify the implemented oxygen gradient and diffusion laws.

* This work was partially supported by the Italian "Ministero dell'Istruzione, dell'Università e della Ricerca" (MIUR) under the FIRB project RBNE01CW3M and by the University of Catania in relation with "Progetto Giovani Ricercatori - Anno 2000". This study was supported in part by the USPHS Bioengineering Research Partnership grant R24-HL 64395 and grants R01-HL 62318 and R01-HL 62354.

† Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi, Università degli Studi di Catania, viale A. Doria 6, 95125 Catania, Italy. E-mail: fsapuppo@diees.unict.it

‡ Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi, Catania, Italy. E-mail: [email protected]

§ Department of Bioengineering of the University of California, San Diego, La Jolla, 92093 CA. E-mail: mintagli@ucsd.edu

¶ Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi, Catania, Italy. E-mail: [email protected]

∥ Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi, Catania, Italy. E-mail: [email protected]

Key Words. CNN, microcirculation, image processing, directional network reconstruction, oxygen distribution.

1. Introduction. The characterization of microvascular networks, and of capillary networks in particular, in terms of their development and of the functional oxygen distribution in tissues at a micrometric scale is considered important in research applications such as the development of artificial blood and the diagnosis of pathologies like retinal abnormalities, hypertension and cancer through the angiogenesis phenomena. A complete picture of the oxygen gradients from blood to tissue is, in itself, critically important to understand the process of oxygen delivery to tissue, its capacity, and its limitations [1]. Further information that completes the characterization of delivery phenomena in the microcirculation is the capillary network map. It was shown [2] that the oxygen distribution in tissue is strictly related to the capillary density; a global view of the morphological characteristics of the vascular network together with the oxygen distribution map can therefore be considered advantageous compared with conventional measurement methodologies, which give fragmented information. This paper presents the development of a CNN-based analysis system that allows real-time study and reconstruction of capillary maps and oxygen distribution and that overcomes the limitations inherent in conventional measurement methodologies.

Conventional methodologies such as the Microelectrode Methods, the Hemoglobin Spectrophotometric Methods and the Porphyrin Phosphorescence Methods present several drawbacks, and the results obtained by the different methods are often inconsistent. The Microelectrode Methods have limited spatial resolution because of the electrode size, which causes the measurement to average the oxygen sources in a hemisphere of radius 6.7 µm from the surface of the electrode. This method furthermore requires continuous calibration of the electrodes and further research from the technological point of view.

The Hemoglobin Spectrophotometric Methods are an indirect technique considered attractive because it uses optical means that are easily implemented at the microscope [3]. However, with respect to its use to determine pO2, it depends on a precise knowledge of the hemoglobin absorption


spectrum and on the relationship between the oxygen dissociation curve for hemoglobin and pO2, a relation that is strongly influenced by local carbon dioxide concentration and pH, parameters that are not easily determined in blood in the microcirculation. A further limitation of this technique is that it cannot be used to measure tissue pO2.

The Porphyrin Phosphorescence Methods, based on the phosphorescence quenching technique, allow high spatial resolution measurements of oxygen tension in the microvessels and the surrounding tissue by optical means. The phosphorescence method is based on the relationship between the decay rate of excited phosphorescence from porphyrin bound to albumin and the partial pressure of oxygen according to the Stern-Volmer equation. In this method, animals receive a slow intravenous injection of the porphyrin dye before pO2 measurements. The dye is made to phosphoresce by excitation with light flashes but, as a drawback, oxygen is consumed as the phosphorescence decays, further distorting the decay signal [4]. These problems could be solved by analytical postprocessing or by using a more complex technology with repeated light excitation of low intensity. As a corollary, microelectrode studies of oxygen distribution in the neighborhood of the blood vessel interface in general do not agree with the phosphorescence studies.

The CNN-based system, thanks to its technological characteristics, allows image processing methods to be implemented and data acquisition and processing to be parallelized, thereby avoiding time latency in the experiment and errors in the collection of data. The proposed methodology therefore provides a non-invasive way to obtain a global view of the morphological characteristics of the vascular network together with the associated oxygen information. It will contribute to validating in vivo experimental measurements in real time and to ensuring their consistency and reliability.

This methodology is applied to microvessel images obtained during in vivo experiments from a transparent window chamber surgically implanted in the dorsum of hamsters [1]. The system processes video frame sequences of the microvessel networks in order to map the capillary network and to characterize the oxygen distribution. In Section 2 the biological model used to describe and analyze the oxygen distribution phenomena in the microcirculation is reported. In Section 3 a global monitoring system that exploits the features of the CNN-UM is described. In Section 4 the implementation on the ACE16K CNN chip of two algorithms


for the reconstruction of incomplete capillary maps and for the oxygen distribution characterization is shown.

2. Biological Model. The biological model we consider to describe oxygen delivery in the microcirculation consists of two different phenomena: a radial contribution given by the oxygen diffusion law and a longitudinal contribution along the vessel. To account for the radial phenomenon, the vessel model we analyze consists of three concentric cylinders [5]. The inner cylinder, with radius ri, is the lumen of the vessel. The second cylinder has radius ro and extends from ri to the outside of the arteriolar wall, i.e. it is the wall. The third, outer cylinder has radius rt and includes the region of tissue supplied by the vessel, i.e. from ro outward. We assume full cylindrical symmetry, so there is no angular dependence of the variables. In addition, we assume translational symmetry along the z-axis, i.e., we consider a two-dimensional model; this means that flows and concentrations do not change along the length of the vessel. Finally, we assume steady-state conditions, which eliminates any time dependence. With these assumptions, the diffusion equation in cylindrical coordinates reduces to a one-dimensional differential equation. Let St be the rate at which oxygen is consumed in the tissue per unit volume and Dt be the diffusion constant for oxygen in the tissue. The partial pressure of oxygen p(r) at a radial distance r in the tissue must then satisfy:

(1)  $\frac{1}{r}\,\frac{d}{dr}\!\left(r\,\frac{dp}{dr}\right) = \frac{S_t}{D_t}$

Solving this differential equation with the boundary conditions p(ro) = po = 19 mmHg and p(rt) = pt = 13 mmHg, which are taken from the experimental data in [6], and assuming that capillary walls are part of the tissue region, so that no oxygen is exchanged across the outer boundary of the tissue cylinder

(2)

it leads to:

(3)  $p(r) = p_o + \frac{S_t\,(r^2 - r_o^2)}{4 D_t} + \left(p_o - p_t + \frac{S_t\,(r_t^2 - r_o^2)}{4 D_t}\right) \frac{\ln(r/r_o)}{\ln(r_o/r_t)}$

which can be generalized to:

(4)  $p(r) = A + B r^2 + C \ln\frac{r}{r_o}$

where A, B and C can be calculated by solving (1) as $A = 23.65$, $B = -2.9 \times 10^{-3}$, $C = -5.20$.
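As a quick numerical check of the closed-form radial profile (3), the sketch below evaluates p(r) = p_o + S_t(r^2 - r_o^2)/(4 D_t) + (p_o - p_t + S_t(r_t^2 - r_o^2)/(4 D_t)) ln(r/r_o)/ln(r_o/r_t) and verifies the two boundary values. Only p_o = 19 mmHg and p_t = 13 mmHg come from the text; the radii, the ratio S_t/D_t and the function name are hypothetical values used for illustration.

```python
import math


def p_radial(r, r_o, r_t, p_o, p_t, s_over_d):
    """Radial profile (3): oxygen partial pressure at radius r, r_o <= r <= r_t.

    s_over_d stands for the ratio S_t / D_t.
    """
    def quad(x):
        return s_over_d * (x ** 2 - r_o ** 2) / 4.0

    log_ratio = math.log(r / r_o) / math.log(r_o / r_t)
    return p_o + quad(r) + (p_o - p_t + quad(r_t)) * log_ratio


# p_o and p_t are the boundary values quoted in the text;
# the radii and S_t/D_t below are hypothetical.
R_O, R_T = 5.0, 30.0
P_O, P_T = 19.0, 13.0
S_OVER_D = 0.01

at_wall = p_radial(R_O, R_O, R_T, P_O, P_T, S_OVER_D)
at_edge = p_radial(R_T, R_O, R_T, P_O, P_T, S_OVER_D)
print(at_wall, at_edge)
```

Whatever values are assumed for the radii and for S_t/D_t, the profile returns p_o at r = r_o and p_t at r = r_t, which is what the boundary conditions require.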

Equation (4) was approximated, for implementation reasons, by a polynomial function, using a curve fitting algorithm to calculate the coefficients:

(5)  $p(r) = p_1 r^4 + p_2 r^3 + p_3 r^2 + p_4 r + p_5$

The coefficients $p_j$ are $p_1 = 0.0001906$, $p_2 = -0.00478$, $p_3 = 0.04574$, $p_4 = -0.2405$, $p_5 = 1.191$, and they approximate the original curve with $R^2 \approx 1$.
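The fourth-order polynomial (5) is cheap to evaluate, which is presumably why it suits a hardware implementation. A minimal sketch using Horner's rule with the coefficients quoted above is given below; the function name and the sample radii are illustrative assumptions, and the units of r are those of the original fit, which the text does not state.

```python
# Coefficients p1..p5 of polynomial (5), highest power first.
P = [0.0001906, -0.00478, 0.04574, -0.2405, 1.191]


def p_poly(r, coeffs=P):
    """Evaluate p(r) = p1*r^4 + p2*r^3 + p3*r^2 + p4*r + p5 by Horner's rule."""
    value = 0.0
    for c in coeffs:
        value = value * r + c
    return value


# Sample radii chosen only for illustration.
profile = [p_poly(r) for r in (0.0, 1.0, 2.0, 3.0)]
print(profile)
```

Over these sample points the polynomial decreases monotonically from its value at r = 0, in line with the radial decay described in Section 4.2.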

The longitudinal oxygen gradient in the hamster microcirculation is characterized using a general law extrapolated from the experimental data shown in Fig. 1 [Intaglietta et al., 1996]. The figure shows the oxygen tension curve for arterioles and venules of different radii and the pO2 for different types of capillaries according to their function and position in the capillary network. Considering a capillary range between the arteriolar capillary and the tissue space, the linear law that represents the longitudinal oxygen gradient can be written as:

(6)

3. CNN Based System. The core of the developed system is a new hardware prototype system called the Aladdin Visual Computer Stack. This system consists of an SPM6020 DSP board and an ACE16K-based platform. The ACE16K CNN-UM can be basically described as an array of 128x128 identical, locally interacting, analog processing units designed for high-speed image processing tasks requiring moderate accuracy (around 8 bits). The system contains a set of on-chip peripheral circuitries that, on the one hand, allow a completely digital interface with the host and, on the other, provide high algorithmic capability by means of conventional programming memories where


the algorithms are stored. The ACE16K is conceived to be used in two alternative ways: first, in applications where the images to be processed are directly acquired by the optical input module of the chip, and second, as a conventional image co-processor working in parallel with a digital hosting system that provides and receives the images in electrical form [7]. In this paper the second solution has been exploited in order to test the CNN-based algorithms and to easily integrate the new monitoring system into the microscopic apparatus. In this configuration the ACE BOX follows the PC104+ standard. A PC104+ machine consists of commonly used hardware components, such as a PCI motherboard with a Pentium-class CPU, display adapter, network module, frame grabber, etc. The ACE BOX system can also be hosted in a desktop PC, plugged into a normal PCI slot by using the PCIADAl adapter card [8]. The PCI bus interface provides fast image data transfer. Images from a frame grabber or a hard drive are processed and leave the system through the PCI bus. The image can be displayed on a PC monitor (Fig. 2).

A global view of the monitoring system developed for the characterization of the microvessels is shown in Fig. 3. The visualization system includes an inverted microscope (IMT-2 Olympus with a x20 objective) and a CCD analog camera (Cohu 4815-2000). The animal is positioned under the microscope in a restraint that minimizes its movements. Images were recorded using a black and white CCD camera at 30 fps, and each frame was digitized with a resolution of 320x240 pixels. The system exploits the CNN-UM, which has eight grey-scale and two binary image memories (LAM and LLM) on chip, 32 template memories (TEM) and 4096 digital instructions; it is therefore able to execute complex image processing tasks on any chip-size image without accessing any location outside the chip. The platform processes arbitrarily sized images with automatic cutting and merging. In this task an automatic overlap guarantees the correct operation of the CNN-based algorithm.

The frame sequence analysis on the CNN architecture was carried out via a fully developed procedure, in which each step of the algorithm is described and executed in terms of single operations carried out by the templates contained in the CNN template library. The algorithm was written in a dedicated programming language, the Analogic Macro Code (AMC), and compiled to analogic binary code (ABC). AMC is a CNN-UM specific language, capable of directly managing the internal memories and the 4096 internal instructions of the chip, and also of operating with a frame grabber to acquire the image directly from the camera.
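The automatic cutting of a 320x240 frame into chip-size windows with overlap can be pictured with the following sketch. It is not the platform's actual procedure, which the text does not detail: the function names and the overlap of 8 pixels are assumptions made for illustration, while the 128-pixel tile size and the frame resolution come from the text.

```python
def tile_origins(size, tile=128, overlap=8):
    """Start offsets of overlapping tiles that cover `size` pixels along one axis."""
    if size <= tile:
        return [0]
    step = tile - overlap
    origins = list(range(0, size - tile, step))
    origins.append(size - tile)  # last tile is flush with the image border
    return origins


def tiles(width, height, tile=128, overlap=8):
    """(x, y) origins of the tile-by-tile windows that cover the whole frame."""
    return [(x, y)
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]


print(tile_origins(320))      # horizontal offsets for a 320-pixel-wide frame
print(tile_origins(240))      # vertical offsets for a 240-pixel-high frame
print(len(tiles(320, 240)))   # number of chip-size windows per frame
```

Neighbouring windows share at least `overlap` pixels, so a template with a small neighbourhood sees valid data on both sides of every seam; on merging, the overlapping border of each window can be discarded.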


At a higher level, the ABC program is executed through a console program called CNNrun, a graphical device interface that allows the user to interact with the hardware platform.

4. CNN Based Algorithm and Results. Programming by templates, along with the analog operating mode, makes it possible to perform complex algorithms in a short time and at a very high computing speed compared with digital microprocessor technologies. It should be noted that the CNN-UM works with grey-scale images. New optimized CNN-based algorithms were designed in order to extract microhemodynamic parameters from video recordings. The developed procedures perform a capillary network map reconstruction starting from a degraded image and the oxygen distribution characterization in the reconstructed map. The algorithms were implemented in hardware on the ACE16K CNN chip and applied to in vivo experiments using a transparent window chamber surgically implanted in the dorsum of hamsters [9]. The tissue was transilluminated and observed by means of intravital microscopy.

4.1. Capillary Network Map Reconstruction. To reconstruct the capillary map, the algorithm takes as input an image representing an incomplete capillary network map and carries out the sequence of operations shown in the flowchart in Fig. 4:

1. direction detection, which gives the network components at 0°, 45°, 90° and 135°;

2. pixel counting of each component;

3. evaluation of the two main direction components.
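The three steps above can be sketched in software as follows. This is a plain NumPy stand-in, not the CNN templates used on the chip: the neighbour test used to assign a pixel to a direction, the function names and the toy image are all assumptions made for illustration.

```python
import numpy as np

# Neighbour offsets (row, col) for each direction; image rows grow downwards.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def direction_component(img, offset):
    """Pixels of a binary image whose two neighbours along +/- offset are also set."""
    dr, dc = offset
    h, w = img.shape
    padded = np.pad(img.astype(bool), 1)          # zero border
    centre = padded[1:h + 1, 1:w + 1]
    ahead = padded[1 + dr:h + 1 + dr, 1 + dc:w + 1 + dc]
    behind = padded[1 - dr:h + 1 - dr, 1 - dc:w + 1 - dc]
    return centre & ahead & behind


def main_directions(img):
    """Pixel count of each directional component and the two largest ones."""
    counts = {angle: int(direction_component(img, off).sum())
              for angle, off in DIRECTIONS.items()}
    ranked = sorted(counts, key=counts.get, reverse=True)
    return counts, ranked[:2]


# Toy map: a horizontal segment crossed by a shorter vertical one.
img = np.zeros((7, 7), dtype=np.uint8)
img[3, :] = 1
img[1:6, 2] = 1
counts, main = main_directions(img)
print(counts, main)
```

For this toy map the horizontal and vertical components collect the most pixels, so they would be selected as the two main directions.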

The starting binary image is obtained using an incremental time derivative in order to detect the moving RBCs, implemented by subtracting two consecutive grey-scale input images of the microscopic blood flow movie. An integration in time was then performed in order to rebuild the network map from the traces of the moving RBCs, as shown in the root image of the image tree in Fig. 5. The children of the tree represent the extracted direction components, and the pixel counting results show that the 45° and 90° components can be considered the two main ones. The next step is the actual reconstruction of the network, starting from the incomplete capillary map and the main direction components obtained, through the following steps (Fig. 6):

1. growth of the starting network map in the two main directions. This step is repeated for a number of iterations that is proportional to

[Fig. 1 plot residue: panels ARTERIOLAR, CAPILLARY and VENULAR; horizontal axis DIAMETER, µm.]

FIG. 1. Distribution of the oxygen tension in the hamster microcirculation.

FIG. 2. Block diagram of the Aladdin Visual Computer.


[Fig. 3 block labels: Microscope, Analog Camera, Hosting Desktop PC, PCI bus.]

FIG. 3. View of the global monitoring system.

[Fig. 4 flowchart boxes: Incomplete Network; 0° Component; 45° Component; 90° Component; 135° Component; First privileged direction; Second privileged direction.]

FIG. 4. Detection of the capillary direction: Flowchart.


the contribution of the corresponding directional component, in order to obtain the reconstructed map;

2. concave filler, in order to fill possible discontinuities in the reconstructed network;

3. skeletonization, which can be performed in order to obtain the actual length of the capillary map.

The image flow resulting from the reconstruction algorithm is shown in Fig. 7. The directional growth is applied to the incomplete capillary network map (Fig. 7 a)), giving the image in Fig. 7 b). The principal direction growth is applied for a number of iterations fixed by trial and error; the secondary direction growth is repeated for the same number of iterations weighted by the ratio between the contributions of the two directional components. Fig. 7 c) shows a smoother image of the map obtained with the concave filler template, and the map skeleton is shown in Fig. 7 d).
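The directional growth step can be sketched as a binary dilation restricted to one direction. The code below is a NumPy illustration rather than the CNN template actually used; the function names are hypothetical, and the weighting of the secondary iterations by the ratio of the two directional pixel counts is an assumption about the ratio mentioned above.

```python
import numpy as np


def grow(img, offset, iterations):
    """Dilate a binary map along +/- offset, one pixel per iteration."""
    dr, dc = offset
    out = img.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1)                   # zero border
        ahead = padded[1 + dr:h + 1 + dr, 1 + dc:w + 1 + dc]
        behind = padded[1 - dr:h + 1 - dr, 1 - dc:w + 1 - dc]
        out = out | ahead | behind
    return out


def reconstruct(img, main, second, count_main, count_second, iters_main):
    """Grow along the principal direction, then along the secondary one for a
    number of iterations scaled by the (assumed) ratio of the pixel counts."""
    iters_second = round(iters_main * count_second / count_main)
    grown = grow(img, main, iters_main)
    return grow(grown, second, iters_second)


# Toy map: a horizontal capillary with a two-pixel gap.
img = np.zeros((5, 9), dtype=np.uint8)
img[2, 0:3] = 1
img[2, 5:9] = 1
filled = reconstruct(img, main=(0, 1), second=(-1, 0),
                     count_main=5, count_second=0, iters_main=1)
print(filled.astype(int))
```

In this toy case one iteration of horizontal growth closes the gap, and because the secondary component has no pixels the map is not thickened in the vertical direction.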

4.2. Oxygen Distribution Characterization. The oxygen distribution characterization has been obtained by an algorithm that combines experimental data with the analytical model of the oxygen gradient and diffusion in microvascular networks. The algorithm steps are reported in the flow diagram in Fig. 8:

1. a diffusion of the reconstructed capillary map, weighted by the polynomial function (5), which characterizes the radial diffusion from the vessel wall;

2. a multiplication of the radial diffusion map by the longitudinal oxygen gradient given by (6), which yields the complete characterization of the oxygen distribution.

The image flow resulting from the implementation of the oxygen distribution characterization is shown in Fig. 9. The diffusion template is applied to the reconstructed network image, weighted by the polynomial diffusion law so that the intensity value of the diffused capillary decreases in the radial direction (Fig. 9 b). In order to take into account the longitudinal oxygen gradient, a mask image (Fig. 9 c) whose diagonal intensity profile represents the linear law (6) was created. The image resulting from the multiplication of the two main effects, the radial and the longitudinal gradient, is shown in Fig. 9 d). The results of the oxygen characterization algorithm were validated on the resulting image representing the oxygen distribution in the capillary network.
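The combination of the two effects can be sketched as follows. This is a software illustration under stated assumptions, not the chip implementation: the radial term weights each pixel by polynomial (5) of its distance to the nearest vessel pixel, and the longitudinal mask is a linear ramp along the image columns whose end values are hypothetical, since the coefficients of (6) are not available here and the original mask uses a diagonal profile.

```python
import numpy as np

# Coefficients p1..p5 of polynomial (5), highest power first.
P = [0.0001906, -0.00478, 0.04574, -0.2405, 1.191]


def radial_map(vessels):
    """Weight each pixel by polynomial (5) of its distance to the nearest vessel pixel."""
    rows, cols = np.nonzero(vessels)
    yy, xx = np.indices(vessels.shape)
    # Brute-force distance transform, adequate for a small example.
    dist = np.sqrt((yy[..., None] - rows) ** 2
                   + (xx[..., None] - cols) ** 2).min(axis=-1)
    return np.polyval(P, dist)


def longitudinal_mask(shape, start=1.0, end=0.5):
    """Linear ramp along the columns; start and end are hypothetical stand-ins for (6)."""
    return np.tile(np.linspace(start, end, shape[1]), (shape[0], 1))


vessels = np.zeros((7, 8), dtype=bool)
vessels[3, :] = True                        # one horizontal capillary
oxygen = radial_map(vessels) * longitudinal_mask(vessels.shape)
print(np.round(oxygen, 3))
```

The resulting map is highest on the vessel at its upstream end and decreases both away from the vessel and along it, which is the qualitative behaviour checked in the intensity profiles of Fig. 10 and Fig. 11.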

FIG. 5. Detection of the capillary direction: Results.

[Fig. 6 flowchart boxes: First privileged direction; Second privileged direction; iteration test; Reconstructed Network; Reconstructed Network Skeleton.]

FIG. 6. Capillary Network Reconstruction: Flowchart.


FIG. 7. Capillary Network Reconstruction: Results.


Fig. 10 shows that the intensity profile of the longitudinal section of the area along a capillary presents a decreasing trend, which was imposed through (6). The intensity profile of a radial section across two neighboring capillaries is a curve that has its maxima at the capillary walls and decreases with distance from the capillary wall, which is consistent with (5) (Fig. 11).

5. Conclusion. A system for the reconstruction of incomplete capillary maps and for the characterization of the oxygen distribution in microvessels was designed and implemented using CNN technology. CNN-based algorithms were applied to digital video coming from a microscopic setup during in vivo experiments on hamsters. The microscopic microhemodynamic field was observed and its functional and morphological parameters were analysed. These algorithms were implemented in hardware on the ACE16K CNN chip. In the analysis process, the image processing algorithms make it possible to overcome the drawbacks of conventional systems and to support them with complementary information. The new methodology gives a complete view of the microcirculation field, providing a morphological characterization associated with the oxygen distribution. By exploiting the analogic and parallel processing characteristics of the CNN, it avoids complex collection and interpretation of data and provides a non-invasive way to collect and predict information on tissue and vessel oxygen consumption. This CNN-based approach can be generalized to microvessels of different types and sizes (arterioles, venules), where the oxygen consumption laws are more complex [2], and it could be useful for modeling and monitoring in real time the oxygen distribution and the consequent dynamic changes of the vascular network in control and pathologic experimental studies. Its application is not restricted to network mapping and oxygen measurement, and can potentially be extended to other microcirculation parameters such as RBC density, particle aggregation in blood, and other information contained in the microscopic image.

Moreover, the possibility of building CNN chips directly interfaced with the microscopic imaging system, with images acquired directly by the optical input module of the chip, opens the way to the realization of a powerful system-on-a-chip for real-time monitoring and analysis of dynamic phenomena in the microcirculation. The fast execution time for image processing can be exploited for high-speed frame-by-frame analysis.


[Fig. 8 flowchart boxes: Reconstructed Network; Radial Diffusion; Radial Oxygen Distribution Capillary Network Map; pO2 Longitudinal Gradient Mask; Multiplication; Oxygen Distribution Capillary Network Map.]

FIG. 8. Oxygen Distribution: Flowchart.


FIG. 9. Oxygen Distribution: Results.


FIG. 10. Image Intensity Profile: Longitudinal.


FIG. 11. Image Intensity Profile: Longitudinal and Radial Gradient.


REFERENCES

[1] A. G. Tsai, P.C. Johnson, and M. Intaglietta, "Oxygen Gradients in the Microcirculation", Physiol Rev, 83: 933-963, 2003.

[2] B.J. McGuire and T.W. Secomb, "Estimation of capillary density in human skeletal muscle based on maximal oxygen consumption rates", Am J Physiol Heart Circ Physiol, 285: H2382-H2391, 2003.

[3] R.N. Pittman and B.R. Duling, "Measurement of percent hemoglobin in the microvasculature", J Appl Physiol, 38: 321-327, 1975.

[4] M. De Francisci, M. Bucolo, M. Intaglietta, L. Fortuna and P. Arena, "Real-Time Estimation of Oxygen Concentration in Micro-Hemo-Vessels", The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2004), San Francisco, Sept. 2004.

[5] A. Vadapalli, R.N. Pittman, A.S. Popel, "Estimating oxygen transport resistance of the microvascular wall", Am. J. Physiol. Heart Circ. Physiol., 279 (2000), H657-671.

[6] S. Bertuglia, A. Limon, B. Andresen, K. Heinz Hoffmann, C. Essex, P. Salamon, "Transport of O2 from arterioles", www.sci.sdsu.edu/ salamonjMicrocirc.pdf

[7] G. Liñán, R. Domínguez-Castro, S. Espejo and A. Rodríguez-Vázquez, "ACE16K: an Advanced Focal-Plane Analog Programmable Array Processor", European Conference on Circuit Theory and Design (ECCTD'2001), Helsinki University of Technology, vol. I, pp. 345-348, 2001.

[8] "Aladdin V. 3.1, Overview", Analogic Computers Ltd., Budapest, 2003.

[9] B. Endrich, K. Asaishi, A. Götz, and K. Messmer, "Technical report: a new chamber technique for microvascular studies in unanaesthetized hamsters", Res Exp Med (Berl), vol. 177, pp. 125-134, 1980.


VOLUME 13 2006, NO 1 PP. 43-60

SEISMIC WAVE PROPAGATION MODELING ON CNN-UM ARCHITECTURE *

P. KOZMA, P. SONKOLY t AND P. SZOLGAY I

Abstract. Recent interest in the field of fine detail from field seismograms has stimulated the research for modeling procedures which can predict the answer of the complex subsurface geometries in acceptable time. It is an important tool to understand wave-field phenomena and how it relates to observations of recorded seismic data. Among the various techniques available for this purpose, the implementation of the two-dimensional wave equation solved by CNN-UM offers distinct advantages. This approach can be expanded to generate the accelerogram of the geological structure. Two CNN-UM models have been examined. The first model is based on the analogue CNN-UM implementation and the second is based on an emulated digital CNN-UM architecture which is reconfigurable. The propagation of seismic waves and synthetic seismograms computed for several models illustrate how the technique may help the interpreter. The Falcon architecture is used to determine the required computational precision for our seismic model.

Key Words. Seismology, Wave equation, Body waves, Algorithms for specific classes of architectures

AMS(MOS) subject classification. 86A15, 35L05, 73D15, 65Y10

1. Introduction. It might be a counter-intuitive idea, but rocks are elastic. This is why waves propagate in the subsurface in the first place. If rocks were not elastic but an infinitely rigid medium, the subsurface would move en masse during an earthquake.

* Supported by OTKA (Grant No.: T042942).

† University of Veszprém, Department of Image Processing and Neurocomputing, Egyetem u. 10, H-8200 Veszprém, Hungary.

‡ Also affiliated to Analogic and Neural Computing Laboratory, Computer and Automation Institute of HAS, Kende u. 13-17, H-1111 Budapest, Hungary.

The size and shape of a solid body can be changed by applying forces to its external surface. These external forces are opposed by internal forces, which resist the changes in size and shape. As a result, the body tends to return to its original condition when the external forces are removed. This property of resisting changes in size or shape and of returning to the undeformed condition is called elasticity. An ideal elastic body is one that recovers completely after being deformed. Many geological structures can be considered perfectly elastic without appreciable error, provided that the deformations are small, as they are in seismic surveys.

Seismic wave propagation modeling is widespread and has been applied successfully in exploration seismology [12],[9]. Initially it was used to simulate the normal-incidence reflectivity of a horizontally stratified medium; more recently it has been employed to obtain the responses of subsurface structural and stratigraphic configurations of ever-increasing complexity. Indeed, the growing interest in numerical seismic modeling has led to a wide range of methods that differ in complexity, accuracy and ease of implementation. Such efforts are stimulated by the awareness that an exact analytical solution of the elastic wave equation does not exist for most subsurface configurations of exploration interest, and that solutions for realistic models can be obtained only by approximate means. Among the numerous techniques available for this purpose, the method of finite differences is particularly versatile. The two-dimensional partial differential equations of motion describing the propagation of stress waves in an elastic medium are approximated by suitable finite-difference equations [12],[9],[8],[6], which can be solved on a discrete spatial grid by strictly numerical procedures.

Whenever a continuous medium is approximated by a discrete grid, the calculated seismic responses are dispersed, and the effect increases as the resolution of the grid decreases. This effect is called grid dispersion and it degrades the quality of the seismological model. The elastic medium may be considered as a collection of locally homogeneous lithologic regions, each characterized by constant values of the density and elastic parameters. Motions in each region can be described by an appropriate finite-difference representation of the elastic equations for that region. This approach is called the "homogeneous" formulation. An alternative, the "heterogeneous" formulation, makes it possible to create more general models in which different density and elastic parameter values can be associated with every grid point. This formulation provides the flexibility required to simulate a variety of complex stratigraphic configurations.


In the next section a mathematical model of wave propagation in an elastic medium is presented. The analog and the emulated digital Cellular Neural Network architectures are then described in the third section. In the fourth section two seismic models are shown, the first for the analog implementation and the second for the emulated digital implementation. The fifth section contains the experimental results and is followed by the conclusions.

2. The mathematical model of the elastic wave propagation. What are elastic waves and why do they propagate in the earth? From the moment in time and point in space of an emission from a seismic source, particles within the surrounding medium are displaced, which results in a disequilibrium in the local pressure regime. The particles displaced by the seismic source undergo compression, since they are unable to move freely, and in turn exert a pressure, or push, on their neighbors. These particles are then displaced and compressed themselves, exerting a further push on their neighbors before coming to rest again, and so on. A seismogram of the examined area is a record of the ground shaking registered by a seismograph. This record can be divided into three parts as follows. The P-waves travel fastest through the Earth, so they arrive at a seismograph first, followed by the S-waves and lastly by the surface waves. The propagation of these elastic waves can be represented mathematically. Let x and z be the horizontal and vertical rectangular coordinates in a two-dimensional medium, and let the z-axis be positive downward. Under these conditions, two coupled, second-order partial differential equations can be used to describe the motion of compressional P-waves and vertically polarized shear SV-waves in an elastic medium. The horizontally polarized shear SH-wave motion will not be treated here, as it is uncoupled from the compressional wave and the vertically polarized shear wave motion. In the heterogeneous formulation the motion is described by the following coupled equations [6]:

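$$
\begin{aligned}
\rho\frac{\partial^2 u}{\partial t^2} &= \frac{\partial}{\partial x}\left[\lambda\left(\frac{\partial u}{\partial x}+\frac{\partial w}{\partial z}\right)+2\mu\frac{\partial u}{\partial x}\right]+\frac{\partial}{\partial z}\left[\mu\left(\frac{\partial w}{\partial x}+\frac{\partial u}{\partial z}\right)\right] \\
\rho\frac{\partial^2 w}{\partial t^2} &= \frac{\partial}{\partial z}\left[\lambda\left(\frac{\partial u}{\partial x}+\frac{\partial w}{\partial z}\right)+2\mu\frac{\partial w}{\partial z}\right]+\frac{\partial}{\partial x}\left[\mu\left(\frac{\partial w}{\partial x}+\frac{\partial u}{\partial z}\right)\right]
\end{aligned}
\tag{1}
$$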

Here $u$ and $w$ are the horizontal and vertical displacements, $\rho$ is the density, $t$ is the time and $\lambda$ and $\mu$ are the Lamé parameters of the particular medium. The assumption that the density $\rho$ is constant in the whole model enables us to write equation (1) as a function of the spatially varying P- and SV-wave velocities, but it reduces the generality of the model. The Lamé parameters of the rock cannot be measured directly, but the velocities of P-waves, denoted by $\alpha$, and of S-waves, denoted by $\beta$, are measurable for a particular medium. The relationship between the Lamé parameters and the wave velocities is

$$\lambda = \rho\left(\alpha^2 - 2\beta^2\right) \quad\text{and}\quad \mu = \rho\beta^2 \tag{2}$$
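As a small numerical illustration of equation (2), the conversion between wave velocities and Lamé parameters can be sketched as follows. The function names and the density value are assumptions made for this example only; the velocities are those quoted later for the homogeneous test model.

```python
def lame_parameters(rho, alpha, beta):
    """Return (lambda, mu) from density and P-/S-wave velocities, as in equation (2)."""
    mu = rho * beta ** 2
    lam = rho * (alpha ** 2 - 2.0 * beta ** 2)
    return lam, mu


def velocities(rho, lam, mu):
    """Invert equation (2): return (alpha, beta) from density and Lame parameters."""
    alpha = ((lam + 2.0 * mu) / rho) ** 0.5
    beta = (mu / rho) ** 0.5
    return alpha, beta


# Illustrative values: alpha = 2100 m/s and beta = 700 m/s as in the homogeneous
# example; the density of 2500 kg/m^3 is an assumed, hypothetical value.
lam, mu = lame_parameters(rho=2500.0, alpha=2100.0, beta=700.0)
alpha, beta = velocities(2500.0, lam, mu)
print(lam, mu, alpha, beta)
```

Because the density is assumed constant, only the velocities enter the finite-difference scheme; the density cancels once equation (1) is divided by $\rho$.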

The time derivatives on the left-hand side of equation (1) can be represented by second-order centered differences. The right-hand side of equation (1) is more complicated, because the differentiated terms contain the spatially dependent elastic velocities. Two types of terms occur in equation (1): those having partial derivatives with respect to one spatial coordinate only, and those containing partial derivatives with respect to both spatial variables (i.e. mixed derivatives). Consider a term typical of the first type

$$\frac{\partial}{\partial x}\left[\alpha^2(x,z)\frac{\partial u}{\partial x}\right] \tag{3}$$

Let $\alpha^2(x,z)$ be replaced by its discrete value $\alpha^2(m,n)$ at the grid point $(m,n)$. Let $\alpha^2(m,n)$ be defined as the average value of $\alpha^2(x,z)$ over a rectangle of sides $\Delta x$ and $\Delta z$ centered at the grid point $(m\Delta x, n\Delta z)$. An approximation which has been found to perform satisfactorily is

$$\frac{\partial}{\partial x}\left[\alpha^2(x,z)\frac{\partial u}{\partial x}\right] \approx \frac{\alpha^2\left(m+\tfrac{1}{2},n\right)\left[u(m+1,n,l)-u(m,n,l)\right]}{(\Delta x)^2} - \frac{\alpha^2\left(m-\tfrac{1}{2},n\right)\left[u(m,n,l)-u(m-1,n,l)\right]}{(\Delta x)^2} \tag{4}$$

where the averages $\alpha^2\left(m+\tfrac{1}{2},n\right)$ and $\alpha^2\left(m-\tfrac{1}{2},n\right)$ are defined as

$$\alpha^2\left(m\pm\tfrac{1}{2},n\right) = \frac{\alpha^2(m\pm 1,n)+\alpha^2(m,n)}{2} \tag{5}$$

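Consider a typical mixed-derivative term

$$\frac{\partial}{\partial z}\left[\alpha^2(x,z)\frac{\partial u}{\partial x}\right] = \frac{\partial}{\partial z}\left[c(x,z,t)\right] \tag{6}$$

where the function $c(x,z,t)$ has been introduced for convenience. The right member of equation (6) can be approximated by the centered first-order difference

$$\frac{\partial}{\partial z}\left[c(x,z,t)\right] \approx \frac{c(m,n+1,l)-c(m,n-1,l)}{2\Delta z} \tag{7}$$

Let

$$c(x,z,t) = \alpha^2(x,z)\frac{\partial}{\partial x}u(x,z,t) \tag{8}$$

be approximated by the centered first-order difference

$$c(m,n,l) \approx \alpha^2(m,n)\,\frac{u(m+1,n,l)-u(m-1,n,l)}{2\Delta x} \tag{9}$$

Substitution of equation (9) into equation (7) yields the expression

$$\frac{\partial}{\partial z}\left[\alpha^2(x,z)\frac{\partial}{\partial x}u(x,z,t)\right] \approx \frac{1}{4\Delta x\Delta z}\Big\{\alpha^2(m,n+1)\left[u(m+1,n+1,l)-u(m-1,n+1,l)\right] - \alpha^2(m,n-1)\left[u(m+1,n-1,l)-u(m-1,n-1,l)\right]\Big\} \tag{10}$$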

The other terms of equation (1) are handled in a similar way. One finally obtains the two coupled finite-difference equations

$$
\begin{aligned}
u(m,n,l+1) ={}& 2u(m,n,l) - u(m,n,l-1) \\
&+ \Delta t^2 \Bigg\{ \frac{1}{\Delta x}\left[\frac{\alpha^2(m+1,n)+\alpha^2(m,n)}{2}\,\frac{u(m+1,n,l)-u(m,n,l)}{\Delta x} - \frac{\alpha^2(m,n)+\alpha^2(m-1,n)}{2}\,\frac{u(m,n,l)-u(m-1,n,l)}{\Delta x}\right] \\
&+ \frac{1}{2\Delta x}\left[\alpha^2(m+1,n)\,\frac{w(m+1,n+1,l)-w(m+1,n-1,l)}{2\Delta z} - \alpha^2(m-1,n)\,\frac{w(m-1,n+1,l)-w(m-1,n-1,l)}{2\Delta z}\right] \\
&- \frac{2}{2\Delta x}\left[\beta^2(m+1,n)\,\frac{w(m+1,n+1,l)-w(m+1,n-1,l)}{2\Delta z} - \beta^2(m-1,n)\,\frac{w(m-1,n+1,l)-w(m-1,n-1,l)}{2\Delta z}\right] \\
&+ \frac{1}{2\Delta z}\left[\beta^2(m,n+1)\,\frac{w(m+1,n+1,l)-w(m-1,n+1,l)}{2\Delta x} - \beta^2(m,n-1)\,\frac{w(m+1,n-1,l)-w(m-1,n-1,l)}{2\Delta x}\right] \\
&+ \frac{1}{\Delta z}\left[\frac{\beta^2(m,n+1)+\beta^2(m,n)}{2}\,\frac{u(m,n+1,l)-u(m,n,l)}{\Delta z} - \frac{\beta^2(m,n)+\beta^2(m,n-1)}{2}\,\frac{u(m,n,l)-u(m,n-1,l)}{\Delta z}\right] \Bigg\}
\end{aligned}
$$

and

$$
\begin{aligned}
w(m,n,l+1) ={}& 2w(m,n,l) - w(m,n,l-1) \\
&+ \Delta t^2 \Bigg\{ \frac{1}{2\Delta z}\left[\alpha^2(m,n+1)\,\frac{u(m+1,n+1,l)-u(m-1,n+1,l)}{2\Delta x} - \alpha^2(m,n-1)\,\frac{u(m+1,n-1,l)-u(m-1,n-1,l)}{2\Delta x}\right] \\
&+ \frac{1}{\Delta z}\left[\frac{\alpha^2(m,n+1)+\alpha^2(m,n)}{2}\,\frac{w(m,n+1,l)-w(m,n,l)}{\Delta z} - \frac{\alpha^2(m,n)+\alpha^2(m,n-1)}{2}\,\frac{w(m,n,l)-w(m,n-1,l)}{\Delta z}\right] \\
&- \frac{2}{2\Delta z}\left[\beta^2(m,n+1)\,\frac{u(m+1,n+1,l)-u(m-1,n+1,l)}{2\Delta x} - \beta^2(m,n-1)\,\frac{u(m+1,n-1,l)-u(m-1,n-1,l)}{2\Delta x}\right] \\
&+ \frac{1}{\Delta x}\left[\frac{\beta^2(m+1,n)+\beta^2(m,n)}{2}\,\frac{w(m+1,n,l)-w(m,n,l)}{\Delta x} - \frac{\beta^2(m,n)+\beta^2(m-1,n)}{2}\,\frac{w(m,n,l)-w(m-1,n,l)}{\Delta x}\right] \\
&+ \frac{1}{2\Delta x}\left[\beta^2(m+1,n)\,\frac{u(m+1,n+1,l)-u(m+1,n-1,l)}{2\Delta z} - \beta^2(m-1,n)\,\frac{u(m-1,n+1,l)-u(m-1,n-1,l)}{2\Delta z}\right] \Bigg\}
\end{aligned}
\tag{11}
$$

Equations (11) may be solved explicitly for the displacements $u(m,n,l+1)$ and $w(m,n,l+1)$ in terms of the displacements at the previous time steps $l$ and $l-1$. Although the above derivation was carried through for a rectangular grid with grid intervals $\Delta x$ and $\Delta z$, the calculations of general seismic wave propagation models were all performed with the square grid $\Delta x = \Delta z = h$. Numerical calculation requires that the finite-difference algorithm be stable, i.e. the difference between the exact and the numerical solution of a finite-difference equation must remain bounded as the time index $l$ increases, $\Delta t$ remaining fixed for all $m$ and $n$. It was shown [1] that the system of equations (11) is stable provided that

$$\frac{\alpha\,\Delta t}{h} \le \left(1+\frac{\beta^2}{\alpha^2}\right)^{-1/2} \tag{12}$$

for $\alpha$ and $\beta$ of all grid points. This inequality can also be written in the more revealing form

$$\Delta t \le \frac{h}{\sqrt{\alpha^2+\beta^2}} \tag{13}$$

which shows that the time increment is limited by the grid interval as well as by the P- and S-wave velocities in the particular medium.
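The stability bound can be checked numerically before a simulation is started. The following is a minimal sketch; the helper names are hypothetical, and the numbers are the grid interval, velocities and time step quoted for the homogeneous example in Section 5.

```python
import math


def max_stable_dt(h, alpha, beta):
    """Largest time step allowed by the stability condition dt <= h / sqrt(alpha^2 + beta^2)."""
    return h / math.sqrt(alpha ** 2 + beta ** 2)


def is_stable(dt, h, alpha, beta):
    """True if the time step satisfies the stability condition for one (alpha, beta) pair."""
    return dt <= max_stable_dt(h, alpha, beta)


# Values taken from the homogeneous example: h = 1 m, alpha = 2100 m/s,
# beta = 700 m/s and a time step of 2**-13 s.
h, alpha, beta = 1.0, 2100.0, 700.0
dt = 2.0 ** -13
print(max_stable_dt(h, alpha, beta), is_stable(dt, h, alpha, beta))
```

For a heterogeneous model the condition has to hold at every grid point, so the check would be applied to the pair of velocities that gives the smallest bound.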

3. Computational environment.

3.1. Original CNN-UM model. Cellular Neural/Nonlinear Networks (CNNs) are analog dynamic processor arrays. A CNN can be described as a 2 or n-dimensional array of identical nonlinear dynamical systems (called cells), that are locally interconnected [4],[5]. The mathematical model of a CNN consists of a large set of coupled nonlinear ordinary differential equations (ODEs), that may exhibit a rich spatio-temporal dynamics. The operation of a cell ( i, j) is described by the following dimensionless equations

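$$\frac{dx_{i,j}(t)}{dt} = -x_{i,j}(t) + A \otimes y_{i,j}(t) + B \otimes u_{i,j}(t) + I \tag{14}$$

$$y_{i,j}(x_{i,j}) = \frac{1}{2}\left(\left|x_{i,j}+1\right| - \left|x_{i,j}-1\right|\right) \tag{15}$$

where $\otimes$ denotes a two-dimensional discrete spatial convolution such that

$$A \otimes y_{i,j} = \sum_{k,l \in N(i,j)} A_{k,l}\, y_{i-k,j-l} \tag{16}$$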

for $k$ and $l$ in the neighborhood $N(i,j)$ of a cell $(i,j)$, which is restricted to the 9 connected cells: 8 neighbors and a self-feedback. $A$ and $B$ are called the feedback and feedforward weighting matrices, $I$ is the cell bias, and $u_{i,j}$, $x_{i,j}$ and $y_{i,j}$ are the input, state and output of a cell, respectively. The set of parameters $A$, $B$ and $I$ is called the cloning template. It is repeated periodically for each cell over the whole network, which implies a reduced set of at most 19 control parameters but nevertheless a large number of possible processing operations. The extended version of the CNN is the CNN Universal Machine (CNN-UM), the first spatio-temporal analogic array computer, which was invented in 1993 [10]. Although the performance of digital processors doubles yearly, there are certain tasks which they cannot complete within a reasonable time. One such hard problem is the analysis of large dynamical systems (for example weather forecasting, geological tests, or the transient behavior of mechanical vibrating systems). The CNN-UM architecture enables high computing speed due to its parallel operating mode. The first question is how the huge computational power of the CNN-UM implementations can be utilized for seismic modeling. The second question is what minimal computational precision of the CNN-UM implementation meets the engineer's requirements.
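The cell dynamics of equations (14)-(16) can be sketched in NumPy as follows. This is only an illustrative software model, not the analog hardware: the forward Euler integration, the zero padding at the array border and all function names are assumptions of this example.

```python
import numpy as np


def output(x):
    """Piecewise-linear output function of equation (15)."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))


def template_sum(T, f):
    """Equation (16): sum over k, l in {-1, 0, 1} of T[k, l] * f[i - k, j - l].

    T is a 3x3 array whose entry T[k + 1, l + 1] stores the weight for offset (k, l).
    Values outside the array are taken as zero (an assumed boundary condition).
    """
    rows, cols = f.shape
    fp = np.pad(f, 1, mode="constant")
    out = np.zeros((rows, cols), dtype=float)
    for k in (-1, 0, 1):
        for l in (-1, 0, 1):
            # fp[1 + i - k, 1 + j - l] corresponds to f[i - k, j - l]
            out += T[k + 1, l + 1] * fp[1 - k:1 - k + rows, 1 - l:1 - l + cols]
    return out


def euler_step(x, u, A, B, I, dt):
    """One forward Euler step of the state equation (14)."""
    dx = -x + template_sum(A, output(x)) + template_sum(B, u) + I
    return x + dt * dx


# Demonstration: a pure self-feedback template drives every cell to saturation,
# so the output settles at the sign of the initial state.
A = np.zeros((3, 3))
A[1, 1] = 2.0
B = np.zeros((3, 3))
x = np.array([[0.1, -0.1], [0.2, -0.3]])
u = np.zeros_like(x)
for _ in range(300):
    x = euler_step(x, u, A, B, 0.0, 0.1)
print(output(x))
```

The 3x3 templates A and B plus the scalar bias give the 19 control parameters mentioned in the text.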

3.2. Falcon, an emulated digital CNN-UM model. Continuous mechanical vibrating systems, whose dynamical behavior is described by partial differential equations, can be modeled by cellular neural networks, and the limits of this approach are well discussed [11],[13]. Multi-layer models can be implemented on a software simulator, on an emulated digital CNN-UM architecture or on the CACE1K [2] analog VLSI CNN-UM chip. The software simulator is a flexible solution, but the computational power of a single processor core is limited. The CACE1K chip has impressive computational power, but its computational precision and its number of layers are not sufficient for our model. The Falcon emulated digital CNN-UM architecture combines the flexibility of software simulation with the high computational performance of the analog VLSI implementation. A wide range of parameters can be configured, such as the number of layers, the accuracy of the value representation and the size and number of the templates; additionally, space-variant templates can be applied. The Falcon architecture [7] contains M x N processing elements in a rectangular grid (Fig. 1). Each processing element solves the full signal range (FSR) model of a CNN cell. The state equation of a processing element is

$$x_{i,j}(m+1) = \sum_{k,l \in N(i,j)} A'_{k,l}\, x_{i+k-n,\,j+l-n}(m) + \sum_{k,l \in N(i,j)} B'_{k,l}\, u_{i+k-n,\,j+l-n}(m) + I_{i,j} \tag{17}$$

where $x$, $u$ and $I$ are the state, input and bias values of a cell, $n$ is the neighborhood size, $A'$ is the feedback template and $B'$ is the feed-forward template. The processed image is partitioned according to the physical processors. Each physical processor column works on a long, narrow vertical stripe of the image. In one cycle a row of processor units gets the result of the previous iteration from the row of processor units above, calculates one iteration and sends the results to the row of processor units below.

FIG. 1. The processor array and the structure of one core processor of the multi-layer Falcon architecture

4. Seismic models on CNN-UM architectures. The main application area of the CNN-UM architecture is two-dimensional image processing. Let us consider the horizontal and the vertical displacements of the ground as two-dimensional images. Each of these images can be represented by a layer of the CNN-UM model, and the layers have to be interconnected. The values of the interconnections between the layers depend on the implementation of the CNN-UM architecture. The template values can be determined from the physical parameters of the examined geological structure, as shown in the next subsections.

4.1. Seismic model on analog CNN-UM architecture. A first-order time derivative can be computed by a template operation on a single-layer CNN-UM architecture, but a second-order time derivative is required to compute the solution of equation (1), so a temporary layer is needed for each displacement component. This means that at least four layers are required to simulate the elastic wave equation on a CNN-UM architecture. The horizontal and the vertical displacements of the ground are stored in the layers u and w. The value of the temporary layer p can be computed from the spatio-temporal derivatives of layer u and an additional term that contains mixed derivatives of layer w. The temporary layer q can be computed similarly. The templates of the model can be determined from equation (1) and equation (14) in the following way

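$$
\begin{aligned}
\frac{\partial u}{\partial t} &= A_{pu} \otimes p_{i,j}(t) \\
\frac{\partial^2 u}{\partial t^2} = \frac{\partial p}{\partial t} &= A_{up} \otimes u_{i,j}(t) + A_{wp} \otimes w_{i,j}(t) \\
\frac{\partial w}{\partial t} &= A_{qw} \otimes q_{i,j}(t) \\
\frac{\partial^2 w}{\partial t^2} = \frac{\partial q}{\partial t} &= A_{wq} \otimes w_{i,j}(t) + A_{uq} \otimes u_{i,j}(t)
\end{aligned}
\tag{18}
$$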

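where

$$
A_{up} = \begin{bmatrix}
0 & \dfrac{\beta^2_{m,n}+\beta^2_{m,n-1}}{2\Delta z^2} & 0 \\[2ex]
\dfrac{\alpha^2_{m,n}+\alpha^2_{m-1,n}}{2\Delta x^2} & -\dfrac{\alpha^2_{m+1,n}+2\alpha^2_{m,n}+\alpha^2_{m-1,n}}{2\Delta x^2}-\dfrac{\beta^2_{m,n+1}+2\beta^2_{m,n}+\beta^2_{m,n-1}}{2\Delta z^2} & \dfrac{\alpha^2_{m+1,n}+\alpha^2_{m,n}}{2\Delta x^2} \\[2ex]
0 & \dfrac{\beta^2_{m,n+1}+\beta^2_{m,n}}{2\Delta z^2} & 0
\end{bmatrix}
$$

$$
A_{uq} = \begin{bmatrix}
\dfrac{\alpha^2_{m,n-1}-2\beta^2_{m,n-1}+\beta^2_{m-1,n}}{4\Delta x\Delta z} & 0 & -\dfrac{\alpha^2_{m,n-1}-2\beta^2_{m,n-1}+\beta^2_{m+1,n}}{4\Delta x\Delta z} \\[2ex]
0 & 0 & 0 \\[2ex]
-\dfrac{\alpha^2_{m,n+1}-2\beta^2_{m,n+1}+\beta^2_{m-1,n}}{4\Delta x\Delta z} & 0 & \dfrac{\alpha^2_{m,n+1}-2\beta^2_{m,n+1}+\beta^2_{m+1,n}}{4\Delta x\Delta z}
\end{bmatrix}
$$

$$
A_{wq} = \begin{bmatrix}
0 & \dfrac{\alpha^2_{m,n}+\alpha^2_{m,n-1}}{2\Delta z^2} & 0 \\[2ex]
\dfrac{\beta^2_{m,n}+\beta^2_{m-1,n}}{2\Delta x^2} & -\dfrac{\beta^2_{m+1,n}+2\beta^2_{m,n}+\beta^2_{m-1,n}}{2\Delta x^2}-\dfrac{\alpha^2_{m,n+1}+2\alpha^2_{m,n}+\alpha^2_{m,n-1}}{2\Delta z^2} & \dfrac{\beta^2_{m+1,n}+\beta^2_{m,n}}{2\Delta x^2} \\[2ex]
0 & \dfrac{\alpha^2_{m,n+1}+\alpha^2_{m,n}}{2\Delta z^2} & 0
\end{bmatrix}
$$

$$
A_{wp} = \begin{bmatrix}
\dfrac{\alpha^2_{m-1,n}-2\beta^2_{m-1,n}+\beta^2_{m,n-1}}{4\Delta x\Delta z} & 0 & -\dfrac{\alpha^2_{m+1,n}-2\beta^2_{m+1,n}+\beta^2_{m,n-1}}{4\Delta x\Delta z} \\[2ex]
0 & 0 & 0 \\[2ex]
-\dfrac{\alpha^2_{m-1,n}-2\beta^2_{m-1,n}+\beta^2_{m,n+1}}{4\Delta x\Delta z} & 0 & \dfrac{\alpha^2_{m+1,n}-2\beta^2_{m+1,n}+\beta^2_{m,n+1}}{4\Delta x\Delta z}
\end{bmatrix}
\tag{19}
$$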

Templates B and I are unused in this model. So far only one multi-layer analog CNN-UM chip, the CACE1K, has been implemented, and it has only two layers. Further restrictions are that the computational precision of analog chips is just 6-8 bits and that only space-invariant templates can be applied. Consequently, our model currently cannot be implemented on an analog CNN-UM chip.


FIG. 2. The seismic model on analog CNN-UM architecture (layers p, u, w and q coupled by the templates A_up, A_pu, A_wp, A_uq, A_qw and A_wq)

FIG. 3. The seismic model on Falcon emulated digital CNN-UM architecture (layers p, u, w and q coupled by the templates A_up, A_uu, A_pu, A_wu, A_wq, A_ww and A_qw)

4.2. Seismic model on emulated digital CNN-UM architecture. There are several numerical integration methods to approximate the continuous state equation of a CNN cell. The forward Euler method is very simple to implement, but its accuracy is not sufficient for our purposes. Another approximation is the second-order centered difference method, which is more accurate than the forward Euler method and no more difficult to implement. Layers u and w contain the current values of the horizontal and vertical displacements of the ground, while layers p and q contain the previous values needed for the next step of the computation. Our model is based on equation (11), and the following templates are required for the emulated digital model


A = b.t2 ww


~ ~] 0 0

~1 ~ ] 0

o:~,n +a:~-1 n

2Llx2

0

0

f3;;.,n +,6;._1 n 26.x2

0

0 0

0

0

a:;,..+ I n +a:~,n 26.x2

0
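The explicit update of equation (11), on which the emulated digital templates are based, can be sketched in floating-point NumPy as follows. This is an illustrative reference model under stated assumptions rather than the Falcon implementation: arrays are indexed as [m, n], the function names are hypothetical, and boundary values are simply held fixed instead of applying the reflecting boundary used in the paper.

```python
import numpy as np


def _sh(f, dm, dn):
    """f(m + dm, n + dn) sampled at every interior grid point (m, n)."""
    M, N = f.shape
    return f[1 + dm:M - 1 + dm, 1 + dn:N - 1 + dn]


def _second(f, v2, dm, dn, d):
    """d/ds [v2 df/ds] as in eqs. (4)-(5); s = x for (dm, dn) = (1, 0), s = z for (0, 1)."""
    c, cv = _sh(f, 0, 0), _sh(v2, 0, 0)
    fwd = 0.5 * (_sh(v2, dm, dn) + cv) * (_sh(f, dm, dn) - c)
    bwd = 0.5 * (cv + _sh(v2, -dm, -dn)) * (c - _sh(f, -dm, -dn))
    return (fwd - bwd) / d ** 2


def _mixed_xz(f, v2, dx, dz):
    """d/dx [v2 df/dz], the counterpart of eq. (10) with x and z exchanged."""
    return (_sh(v2, 1, 0) * (_sh(f, 1, 1) - _sh(f, 1, -1))
            - _sh(v2, -1, 0) * (_sh(f, -1, 1) - _sh(f, -1, -1))) / (4 * dx * dz)


def _mixed_zx(f, v2, dx, dz):
    """d/dz [v2 df/dx], eq. (10)."""
    return (_sh(v2, 0, 1) * (_sh(f, 1, 1) - _sh(f, -1, 1))
            - _sh(v2, 0, -1) * (_sh(f, 1, -1) - _sh(f, -1, -1))) / (4 * dx * dz)


def step(u, w, u_prev, w_prev, a2, b2, dt, dx, dz):
    """One time step of eq. (11); a2 and b2 hold alpha^2 and beta^2 at every grid point."""
    ru = (_second(u, a2, 1, 0, dx) + _mixed_xz(w, a2, dx, dz)
          - 2.0 * _mixed_xz(w, b2, dx, dz) + _mixed_zx(w, b2, dx, dz)
          + _second(u, b2, 0, 1, dz))
    rw = (_mixed_zx(u, a2, dx, dz) + _second(w, a2, 0, 1, dz)
          - 2.0 * _mixed_zx(u, b2, dx, dz) + _second(w, b2, 1, 0, dx)
          + _mixed_xz(u, b2, dx, dz))
    u_new, w_new = u.copy(), w.copy()  # border values are held fixed (assumption)
    u_new[1:-1, 1:-1] = 2 * u[1:-1, 1:-1] - u_prev[1:-1, 1:-1] + dt ** 2 * ru
    w_new[1:-1, 1:-1] = 2 * w[1:-1, 1:-1] - w_prev[1:-1, 1:-1] + dt ** 2 * rw
    return u_new, w_new
```

In the layer picture of the text, `u_prev` and `w_prev` play the role of the layers p and q: after each step the current displacements are copied into them before the next update.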

5. Examples. In this section emulated digital CNN-UM simulations of elastic wave propagation in a two-dimensional medium are presented. The first example shows the result of the model in a homogeneous geological structure, which means that all elements of the layer have the same P- and S-wave velocities. In the second example a real-life geological structure is examined. The models are represented on a 512 x 512 grid. The spatial discretization steps $\Delta x$ and $\Delta z$ are 1 meter, which is therefore the minimal thickness of a layer that can be simulated. The physical parameters are represented with 13 bits, which means that layer velocities can be specified in steps of 1 m/s as long as the maximal velocity in the examined area does not exceed 8192 m/s. Reflecting boundary conditions are used in the model, which means that the full excitation energy remains in the system. The displacement fields are presented for both examples, and the results generated with different computational precisions are compared. The synthetic seismograms generated for the inhomogeneous model are also presented.

FIG. 4. Snapshot of the horizontal and the vertical displacements of the homogeneous model

A single point source is used on the w layer to excite the models, and the excitation function can be described by the following equation

(21) -7!t2

str (t) =sin (27r fmaxt) ~v

where f = 30Hz, ry = 327r f and v = 750~000 .

5.1. Homogeneous geological area. The transient response of a homogeneous model can be seen in Figure 4. The P-wave velocity is 2100 m/s and the S-wave velocity is 700 m/s. The time step $\Delta t$ is $2^{-13}$ s, which ensures that the solution is stable. The error values are computed in every iteration from the ratio of the maximal differences between the absolute values of the floating-point and the fixed-point solutions (Fig. 5). This means that if the maximal displacement during the wave propagation is taken to be 1, the error of the 32 bit fixed-point solution is about 0.001% of it for the examined time interval.
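The effect of a limited word length can be illustrated with a simple software model of fixed-point rounding. The sketch below is one possible reading of the error measure described above, not the authors' code: the rounding model, the normalization by the maximal reference displacement and the function names are assumptions, and the test signal is synthetic.

```python
import numpy as np


def to_fixed(x, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (assumed fixed-point model)."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale


def relative_max_error(fixed, reference):
    """Maximal difference of the absolute values, relative to the maximal reference value."""
    diff = np.abs(np.abs(fixed) - np.abs(reference))
    return np.max(diff) / np.max(np.abs(reference))


# Synthetic stand-in for a displacement trace; not data from the paper.
reference = np.sin(np.linspace(0.0, 2.0 * np.pi, 1001))
for bits in (8, 16, 24):
    err = relative_max_error(to_fixed(reference, bits), reference)
    print(bits, err)

# The 13 bit parameter representation of Section 5: a range of 8192 m/s
# split into 2**13 levels gives a velocity step of 1 m/s.
velocity_step = 8192 / 2 ** 13
print(velocity_step)
```

This only quantizes a finished signal once; in the actual simulation the rounding is applied in every iteration, so errors can accumulate over time as the error curves of the paper show.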

FIG. 5. Errors of different computational precisions compared to the 64 bit floating point solution of homogeneous model (horizontal axis: time [s]; curves for 20, 22, 24, 26, 28 and 30 bit precision)

5.2. A real-life geological area. The structure of the examined geological area can be seen in Figure 6. The source is placed in the upper middle part of the model. The evolution of the horizontal and the vertical displacements can be seen in Figure 7 and the generated synthetic seismograms are shown in Figure 8. The figures are histogram transformed for better visibility. The time step $\Delta t$ is $2^{-13}$ s, which ensures that the solution is stable. The simulation runs for 1000 iterations. The error of the fixed-point computations is compared to the 64 bit floating-point solution and the result can be seen in Figure 9. The error values are computed as described previously. As can be seen, the characteristic of the errors is the same as in the homogeneous case. The local variations of the error functions are caused by the reflections of the waves at the boundaries. Larger errors appear when the waves propagate from a lower-velocity layer into a higher-velocity layer, as seen in Figure 9, because the amplitude of the displacements is reduced.

FIG. 6. The structure of the geological area with the source point

FIG. 7. Snapshot of the horizontal and the vertical displacements of inhomogeneous model

FIG. 8. The horizontal and the vertical synthetic seismograms of inhomogeneous model

FIG. 9. Errors of different computational precisions compared to the 64 bit floating point solution of inhomogeneous model (horizontal axis: time [s]; curves for 20, 22, 24, 26, 28, 30 and 32 bit precision)

These models were implemented on our RC200 prototyping board from Celoxica Ltd. [3], using optimized code to reach the best performance of the system. The Virtex-II 1000 (XC2V1000) FPGA on this card can host one Falcon processor core using 32 bit precision, which makes it possible to compute one cell update per clock cycle. The performance of the system is limited by the speed of the on-board memory, resulting in a maximum clock frequency of 90 MHz. The theoretical performance of the single processor core is therefore 90 million cell update/s. Unfortunately the board has a 72 bit wide data bus, so 8 clock cycles are required to read a new cell value and to store the results. This reduces the achievable performance to 11.25 million cell update/s. To improve the efficiency of our solution, 4 virtual processors are implemented on our board. As a result of this optimization the performance is improved to 45 million cell update/s. The size of the memory is also a limiting factor, because the state values must fit into the 4 Mbyte memory of the board. By using the new Virtex-4 SX [14] devices with larger and faster memory, the architecture can reach a 500 MHz clock rate and can compute a new cell value in each clock cycle. Additionally, the large amount of on-chip memory and the many multipliers on the largest XC4VSX55 FPGA make it possible to implement 14 processor cores, resulting in a computing performance of 7000 million cell update/s. On the other hand, the large number of arithmetic units makes it possible to implement higher-order and more accurate numerical methods. The achievable performance and the speedup compared to conventional microprocessors are summarized in Table 1. The first row shows the physical clock frequencies of the different architectures. The second row shows their computational performances. The computation times of one iteration on a 1024 x 1024 array can be seen in the third row. The computational performances are compared in the fourth row, where the unit is the performance of a Pentium IV 3 GHz processor. The results show that even the limited implementation of the Falcon processor on our RC200 prototyping board can outperform a high-performance desktop PC. If adequate memory bandwidth is provided (a 576 bit wide memory bus running at a 500 MHz clock frequency), the emulated digital solution is about 5000 times faster.

TABLE 1
Performance comparison of different implementations of the seismic model

                                     RC200    XC4VSX55   Athlon64   Pentium IV
Clock freq. (MHz)                    90       500        2200       3000
Performance (10^6 cell update/s)     45       7000       1.132      1.258
Iteration time (ms)                  22       0.15       762        795
Speedup                              36.14    5300       1.04       1
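The throughput figures quoted in the text and the speedup row of Table 1 follow from simple arithmetic, which can be reproduced as below. The throughput model (clock rate times parallel units divided by cycles per cell) and the function name are simplifying assumptions of this sketch; the numbers are those given in the paper.

```python
def cell_updates_per_second(clock_hz, cores=1, virtual_processors=1, cycles_per_cell=1):
    """Simplified throughput model: parallel units times clock rate over cycles per cell."""
    return clock_hz * cores * virtual_processors / cycles_per_cell


# RC200 board: 90 MHz clock, 8 cycles per cell because of the 72 bit data bus.
rc200_single = cell_updates_per_second(90e6, cycles_per_cell=8)
# Four virtual processors on the same board.
rc200_virtual = cell_updates_per_second(90e6, virtual_processors=4, cycles_per_cell=8)
# XC4VSX55: 14 cores at 500 MHz, one cell update per clock cycle.
virtex4 = cell_updates_per_second(500e6, cores=14)

# Iteration times in milliseconds, copied from Table 1.
iteration_ms = {"RC200": 22.0, "XC4VSX55": 0.15, "Athlon64": 762.0, "Pentium IV": 795.0}
# Speedup relative to the Pentium IV, computed from the iteration times.
speedup = {name: iteration_ms["Pentium IV"] / t for name, t in iteration_ms.items()}

print(rc200_single, rc200_virtual, virtex4)
print({name: round(s, 2) for name, s in speedup.items()})
```

The speedup row of the table matches the ratio of the iteration times rather than the ratio of the listed performance values.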

6. Conclusion. In this paper two models are given to simulate the propagation of elastic waves in a heterogeneous medium. The first model is based on the analog CNN-UM architecture, but it cannot be implemented because of the limitations of the analog chips. The second one is based on an emulated digital CNN-UM architecture called Falcon. This architecture can be used to overcome the limitations of the analog VLSI CNN-UM chips, but it uses fixed-point numbers during the solution of the CNN state equation; therefore the required computing precision must be determined before implementation. The performance of emulated digital CNN-UM architectures can be significantly improved by decreasing the computing precision of the architecture. It is therefore very important to examine the accuracy of the solution in the case of low computing precision. The proposed architecture was implemented on a mid-sized FPGA with 1 million equivalent system gates on our RC200 prototyping board. This solution is about 36 times faster than the Pentium IV 3 GHz processor, while a 5000-fold performance increase can be achieved by using a larger FPGA and more memory. The results of the floating-point and the fixed-point solutions are very close even if low precision (20-24 bit) is used. If the precision is increased to 30-32 bit, the fixed-point computations are as accurate as the 64 bit floating-point results while requiring far fewer computing resources in the implementation. The accuracy of the solution can be increased by using higher-order spatial and temporal discretization methods and by using more accurate state variables.

Acknowledgements. The support of OTKA (Grant. No.: T042942) is kindly acknowledged.

REFERENCES

[1] Z. S. Alterman, and D. Loewenthal, Seismic Waves in a Quarter and Three-quarter Plane, Geophysical Journal of the Royal Astronomical Society, vol. 20 (1970), 101-126.

[2] R. Carmona, F. J. Garrido, R. D. Castro, S. Espejo, A. R. Vazquez, Cs. Rekeczky, and T. Roska, A Bioinspired 2-layer Mixed-signal Flexible Programmable Chip for Early Vision, IEEE Transactions on Neural Networks, vol. 14 (2003), 1313-1336.

[3] Celoxica Ltd. Homepage [Online]. Available: http://www.celoxica.com

[4] L. O. Chua, and T. Roska, The CNN Paradigm, IEEE Trans. on Circuits and Systems-I, vol. 40 (1993), 147-156.

[5] L. O. Chua, and T. Roska, Cellular Neural Networks and Visual Computing, Cambridge University Press, U.K., 2002.

[6] K. R. Kelly, R. W. Ward, S. Treitel, and R. M. Alford, Synthetic Seismograms: A Finite-difference Approach, Geophysics, 1976, vol. 41, 1119-1152.

[7] Z. Nagy, and P. Szolgay, Configurable Multi-layer CNN-UM Emulator on FPGA, IEEE Trans. on Circuits and Systems-I, vol. 50 (2003), 774-778.

[8] M. Ottaviani, Elastic-wave Propagation in Two Evenly-welded Quarter-spaces, Bulletin of the Seismological Society of America, 1971, 1119-1152.

[9] E. Robein, Velocities, Time-imaging and Depth-imaging: Principles and Methods, EAGE Publications, 2003.

[10] T. Roska, and L. O. Chua, The CNN Universal Machine. An analogic array computer, IEEE Trans. on Circuits and Systems-II, vol. 40 (1993), 163-173.

[11] T. Roska, T. Kozek, D. Wolf, and L. O. Chua, Solving Partial Differential Equations by CNN, Proc. of European Conf. on Circuits Theory and Design, (1992).

[12] R. E. Sheriff, and L. P. Geldart, Exploration Seismology, Cambridge University Press, U.K., 1995.

[13] P. Szolgay, G. Vörös, Gy. Erőss, On the Applications of the Cellular Neural Network Paradigm in Mechanical Vibrating System, IEEE Trans. Circuits and Systems-I, Fundamental Theory and Appl., vol. 40, no. 3 (1993), 222-227.

[14] Xilinx Products Homepage [Online]. Available: http://www.xilinx.com


VOLUME 13

2006, NO 1

PP. 61-87

SOLVING PARTIAL DIFFERENTIAL EQUATIONS ON EMULATED DIGITAL CNN-UM ARCHITECTURES

Z. NAGY* AND P. SZOLGAY†

Abstract. The solution of partial differential equations (PDE) has long been one of the most important fields of mathematics, due to the frequent occurrence of spatio-temporal dynamics in many branches of physics, engineering and other sciences. The array structure and local connectivity of the CNN paradigm make it a natural framework to describe the behavior of locally interconnected dynamical systems which have an array structure. Using an analog CNN-UM chip the computation can be carried out in real time, but the accuracy of the solution is low for some problems. To improve accuracy while preserving high computing performance, a configurable emulated digital CNN-UM can be used, in which the main parameters (accuracy, template size and number of layers) are configurable.

Key Words. Algorithms for specific classes of architectures, Wave equation, Biharmonic equation, Oceanography

AMS(MOS) subject classification. 65Y10, 35L05, 31A30, 86A05

1. Introduction. The Cellular Neural Network (CNN) [1] paradigm is a natural framework to describe the behavior of locally interconnected dynamical systems which have an array structure. It is therefore quite straightforward to use a CNN to compute the solution of partial differential equations (PDE). Several studies have demonstrated the effectiveness of the CNN-UM solution of different PDEs [2], [3]. However, the results cannot be used in real-life implementations because of the limitations of the analog CNN-UM chips, such as low precision or the application of non-linear templates.

* University of Veszprém, Department of Image Processing and Neurocomputing, Egyetem u. 10, H-8200 Veszprém, Hungary.

† Also affiliated to Analogic and Neural Computing Laboratory, Computer and Automation Institute of HAS, Kende u. 13-17, H-1111 Budapest, Hungary.

Emulated digital CNN-UM architectures [4], [5] seem to be more flexible than their analog counterparts both in cell array size and accuracy while their computing power is just slightly smaller. By implementing these architectures on reconfigurable chips it is possible to change the cell model and evaluate the new architecture in very short time.

In the next section a brief introduction to CNN theory is given. In the third section a small system of ordinary differential equations is solved by using an emulated digital CNN. In the following section a simple PDE is solved by CNN. In the last sections two application case studies are introduced: the state equation of a micro-electromechanical tactile sensor and the state equation of a simple ocean model are solved by CNN. In each case several numerical methods and different computing precisions are used to compute the solutions. The results of the different methods are compared to determine the most suitable method for emulated digital CNN-UM implementation and the required computing precision.

2. The Cellular Neural/Non-linear Network. The Cellular Neural/Non-linear Network [1] contains identical analog processing elements called cells. These cells are arranged on a two- or k-dimensional square grid. Each cell is connected to its local neighborhood via programmable weights which are called the cloning template. The CNN cell array is programmed by changing the cloning template. The local neighborhood of the cell is defined by the following equation:

(1) $S_r(ij) = \{ C(kl) : \max\{ |k - i|, |l - j| \} \le r \}$

In the simplest case the radius r of the sphere of influence is 1, thus the cell is connected only to its nearest neighbors. Input, output and state variables of the CNN cell array are continuous functions of time. The state equation of a cell can be described by the following ordinary differential equation:

(2) $C_x \dot{v}_{xij}(t) = -\frac{1}{R_x} v_{xij}(t) + \sum_{kl \in S_r(ij)} A_{ij,kl} \cdot v_{ykl}(t) + \sum_{kl \in S_r(ij)} B_{ij,kl} \cdot v_{ukl}(t) + z_{ij}$

where $v_{xij}$ is the state, $v_{yij}$ is the output and $v_{uij}$ is the input voltage of the cell. $A_{ij}$ is the feedback and $B_{ij}$ is the feed-forward template. The state of


the cells is connected to the output via a nonlinear element which is described by the following function:

(3) $y_{ij} = f(x_{ij}) = \frac{|x_{ij} + 1| - |x_{ij} - 1|}{2} = \begin{cases} 1 & x_{ij}(t) > 1 \\ x_{ij}(t) & -1 \le x_{ij}(t) \le 1 \\ -1 & x_{ij}(t) < -1 \end{cases}$

In most cases the Cx and Rx values are assumed to be 1 which makes it possible to simplify the state equation as follows:

(4) $\dot{x}_{ij}(t) = -x_{ij}(t) + \sum_{kl \in S_r(ij)} A_{ij,kl} \cdot y_{kl}(t) + \sum_{kl \in S_r(ij)} B_{ij,kl} \cdot u_{kl}(t) + z_{ij}$

where x, y, u and z are the state, output, input and cell bias values of the corresponding CNN cell respectively. The template matrices $A_{ij}$ and $B_{ij}$ are space invariant if their values do not depend on the (i,j) position of the cell, otherwise they are called space variant.

In order to fully specify the dynamics of a CNN cell array the boundary conditions have to be defined. In the simplest case the edge cells are connected to a constant value: this is called the Dirichlet or fixed boundary condition. If the cell values are duplicated at the edges, the system does not lose energy: this is called the Neumann or zero-flux boundary condition. In the case of circular boundary conditions the edge cells see the values at the opposite sides, thus the cell array can be placed on a torus.
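The state equation (4) and the three boundary conditions can be sketched in a few lines of Python. This is only an illustrative sketch, not the Falcon implementation: the function name `cnn_rhs`, its arguments and the use of `numpy.pad` to create the virtual edge cells are assumptions made for this example, and the neighborhood radius is fixed to r = 1 (3 x 3 templates).

```python
import numpy as np

def cnn_rhs(x, u, A, B, z, boundary="fixed", fixed_value=0.0):
    """Right-hand side of the simplified CNN state equation (4) for r = 1.

    x, u : 2-D arrays holding the state and the input of every cell
    A, B : 3 x 3 feedback and feed-forward templates (space invariant)
    z    : cell bias (scalar or array of the same shape as x)
    """
    # Each boundary condition maps to one numpy.pad mode (illustrative choice).
    pad_mode = {"fixed": "constant", "zero_flux": "edge", "periodic": "wrap"}[boundary]
    kwargs = {"constant_values": fixed_value} if boundary == "fixed" else {}

    y = 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))   # output nonlinearity (3)
    yp = np.pad(y, 1, mode=pad_mode, **kwargs)      # outputs with virtual edge cells
    up = np.pad(u, 1, mode=pad_mode, **kwargs)      # inputs with virtual edge cells

    rows, cols = x.shape
    dx = -x + z
    # Accumulate the template-weighted neighbors; A[1, 1] and B[1, 1]
    # weight the cell itself.
    for dk in range(3):
        for dl in range(3):
            dx = dx + A[dk, dl] * yp[dk:dk + rows, dl:dl + cols]
            dx = dx + B[dk, dl] * up[dk:dk + rows, dl:dl + cols]
    return dx
```

With a diffusion-like feedback template and a constant state, the zero-flux and periodic variants return a zero derivative everywhere, while the fixed variant pulls the edge cells towards the constant boundary value.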

By stacking several CNN arrays on each other and connecting them a multi-layer CNN structure can be defined. The state equation of one layer can be described by the following equation:

(5) $\dot{x}_{m,ij}(t) = -x_{m,ij}(t) + \sum_{n=1}^{p} \left( \sum_{kl \in S_r(ij)} A_{mn,ij,kl} \cdot y_{n,kl}(t) + \sum_{kl \in S_r(ij)} B_{mn,ij,kl} \cdot u_{n,kl}(t) \right) + z_{m,ij}$

where p is the number of layers, m is the actual layer and $A_{mn}$ and $B_{mn}$ are the templates which connect the output of the n-th layer to the m-th layer.

3. Simple example: mechanical vibrating system. Using finite difference approximation the solution of a PDE can be reduced to a solution of a set of ordinary differential equations. This set of ODEs can be mapped to a CNN array. The emulation of a CNN dynamics on a digital architecture requires discretization in time and a suitable numerical ODE solver method. The accuracy of three single step algorithms was compared in [6]. In our case


the following three methods will be examined: the forward Euler method, which is widely used in CNN simulation, and the 2nd and 4th order Runge-Kutta methods [7]. The formula for the Euler method is

(6) $u_{n+1} = u_n + h f(t_n, u_n)$

which advances a solution from $t_n$ to $t_{n+1} = t_n + h$. Although the method is very simple, it has some disadvantages:

1. the method is not very accurate when compared to other methods run at the equivalent step-size, and

2. it is not very stable either.

The classical 2nd and 4th order Runge-Kutta methods have the following form:

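(7) $\begin{aligned} k_1 &= h f(t_n, u_n) \\ k_2 &= h f\left(t_n + \tfrac{h}{2}, u_n + \tfrac{k_1}{2}\right) \\ u_{n+1} &= u_n + k_2 + O(h^3) \end{aligned}$

(8) $\begin{aligned} k_1 &= h f(t_n, u_n) \\ k_2 &= h f\left(t_n + \tfrac{h}{2}, u_n + \tfrac{k_1}{2}\right) \\ k_3 &= h f\left(t_n + \tfrac{h}{2}, u_n + \tfrac{k_2}{2}\right) \\ k_4 &= h f(t_n + h, u_n + k_3) \\ u_{n+1} &= u_n + \tfrac{k_1}{6} + \tfrac{k_2}{3} + \tfrac{k_3}{3} + \tfrac{k_4}{6} + O(h^5) \end{aligned}$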

These methods require computing two and four times as many derivatives as the forward Euler case but we shall see that the extra computation is worthwhile.
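The three single step methods can be written down directly from their formulas. The following Python sketch is only an illustration: the helper names (`euler_step`, `rk2_step`, `rk4_step`, `integrate`, `error_at_t1`) and the unit-frequency oscillator used as a test problem are assumptions of this example and do not come from the emulated digital architecture.

```python
import numpy as np

def euler_step(f, t, u, h):
    # Forward Euler: one derivative evaluation per step.
    return u + h * f(t, u)

def rk2_step(f, t, u, h):
    # 2nd order Runge-Kutta (midpoint form), two derivative evaluations.
    k1 = h * f(t, u)
    k2 = h * f(t + h / 2, u + k1 / 2)
    return u + k2

def rk4_step(f, t, u, h):
    # Classical 4th order Runge-Kutta, four derivative evaluations.
    k1 = h * f(t, u)
    k2 = h * f(t + h / 2, u + k1 / 2)
    k3 = h * f(t + h / 2, u + k2 / 2)
    k4 = h * f(t + h, u + k3)
    return u + k1 / 6 + k2 / 3 + k3 / 3 + k4 / 6

def integrate(step, f, u0, h, n_steps):
    u, t = np.asarray(u0, dtype=float), 0.0
    for _ in range(n_steps):
        u = step(f, t, u, h)
        t += h
    return u

def oscillator(t, u):
    # Hypothetical test problem: x' = v, v' = -x, exact solution (cos t, -sin t).
    return np.array([u[1], -u[0]])

def error_at_t1(step, h):
    # Maximum error of the state at t = 1 for the chosen stepper.
    exact = np.array([np.cos(1.0), -np.sin(1.0)])
    n = int(round(1.0 / h))
    return np.max(np.abs(integrate(step, oscillator, [1.0, 0.0], h, n) - exact))
```

Halving the step-size should reduce the error of the 4th order method by a factor close to 16, which can be checked with `error_at_t1(rk4_step, 2**-4) / error_at_t1(rk4_step, 2**-5)`.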

The main issue when these equations are solved on an emulated digital CNN-UM is the required precision (bit width) of the state values to get accurate results. To examine the accuracy of the different fixed-point solutions the state equation of a simple dynamical system, shown in Fig. 1, is solved.

This simple mechanical vibrating system contains five bodies which are connected by springs. The motion of the bodies is described by the following set of equations. (For simplicity we assume unit mass and unit spring


FIG. 1. A simple mechanical vibrating system

constants.)

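(9) $\begin{aligned} \ddot{x}_1 &= -2x_1 + x_2 \\ \ddot{x}_i &= x_{i-1} - 2x_i + x_{i+1}, \quad i = 2, 3, 4 \\ \ddot{x}_5 &= x_4 - 2x_5 \\ x_i(0) &= x_0, \quad \dot{x}_i(0) = v_0, \quad 1 \le i \le 5 \end{aligned}$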

This set of state equations can be solved exactly and the solution has the following general form:

(10) $x_i(t) = \sum_{j=1}^{5} B_{i,j} \cos(\lambda_j t)$

where the $\lambda$ values are the eigen-frequencies of the system and the B values depend on the initial conditions. In our test case the initial parameters were set to 0 except for the central element which had an initial displacement of 1. According to the initial conditions the values of the B matrix and the $\lambda$ eigen-frequencies of the system are the following:

(11) $B = \begin{bmatrix} 0.16667 & 0 & -0.33333 & 0 & 0.16667 \\ -0.28868 & 0 & 0 & 0 & 0.28868 \\ 0.33333 & 0 & 0.33333 & 0 & 0.33333 \\ -0.28868 & 0 & 0 & 0 & 0.28868 \\ 0.16667 & 0 & -0.33333 & 0 & 0.16667 \end{bmatrix}, \quad \lambda = \begin{bmatrix} 1.9319 \\ 1.7321 \\ 1.4142 \\ 1 \\ 0.51764 \end{bmatrix}$
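The values in (11) can be reproduced numerically from (9). The sketch below is an illustration only; it assumes that the eigen-frequencies are the square roots of the eigenvalues of the tridiagonal stiffness matrix of (9) and that the B coefficients are the projections of the initial displacement onto the eigenvectors. The names `K`, `lam`, `B` and `exact_solution` are chosen for this example.

```python
import numpy as np

n = 5
# Stiffness matrix of (9): x'' = -K x with unit masses and unit spring constants.
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

eigvals, V = np.linalg.eigh(K)     # K = V diag(eigvals) V^T, eigenvalues ascending
lam = np.sqrt(eigvals)[::-1]       # eigen-frequencies, largest first as in (11)
V = V[:, ::-1]                     # reorder the eigenvectors accordingly

x0 = np.zeros(n)
x0[2] = 1.0                        # central body displaced by 1, zero initial velocity
B = V * (V.T @ x0)                 # B[i, j] = V[i, j] * (j-th eigenvector . x0)

def exact_solution(t):
    """Displacements x_i(t) of the five bodies according to (10)."""
    return B @ np.cos(lam * t)
```

Because the initial displacement is symmetric, the two antisymmetric modes are not excited, which is why the second and fourth columns of B vanish.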

The state equations of the mechanical vibrating system can be solved on a line of 2-layer CNN cells where the first layer is the displacement and the second is the velocity of the given body. The following two templates are required for the computation:

(12) $A_{12} = 1, \quad A_{21} = \begin{bmatrix} 1 & -2 & 1 \end{bmatrix}$

The state equation of this CNN array is solved by using the previously described numerical methods. To compare the different solutions the first 16 seconds of the analytical solution is computed using a 0.125s timestep and


FIG. 2. The analytical solution of the mechanical vibrating system (horizontal axis: time (s))

the numerical and exact solutions were compared only in these 128 points. The amplitude of the displacement of every body is always inside the [ -1, +1] range. The absolute maximum difference between the exact and the numerical solution using different timesteps is shown in Fig. 3.
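The mapping of the vibrating system onto a line of two-layer cells, and the comparison with the exact solution at every 0.125 s, can be sketched as follows. This is a simplified illustration rather than the emulated digital architecture: it integrates (9) directly with the two inter-layer templates $A_{12} = 1$ and $A_{21} = [1\ {-2}\ 1]$, assumes that the cell outputs equal the states and that the leak term of the CNN state equation is cancelled, and uses 64 bit floating-point arithmetic. The function names are assumptions of this example.

```python
import numpy as np

A12 = 1.0                            # velocity layer feeds the displacement layer
A21 = np.array([1.0, -2.0, 1.0])     # displacement layer feeds the velocity layer

def rhs(state):
    x, v = state                     # layer 1: displacement, layer 2: velocity
    # Fixed (zero) boundary cells at both ends of the line of cells.
    xp = np.pad(x, 1, mode="constant")
    dv = A21[0] * xp[:-2] + A21[1] * xp[1:-1] + A21[2] * xp[2:]
    return np.array([A12 * v, dv])

def rk4_step(state, h):
    k1 = h * rhs(state)
    k2 = h * rhs(state + k1 / 2)
    k3 = h * rhs(state + k2 / 2)
    k4 = h * rhs(state + k3)
    return state + k1 / 6 + k2 / 3 + k3 / 3 + k4 / 6

def exact(t):
    # Modal solution of the same system, used as the reference.
    K = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
    w2, V = np.linalg.eigh(K)
    return V @ (np.cos(np.sqrt(w2) * t) * V[2])

def max_error(h=0.125, t_end=16.0):
    state = np.zeros((2, 5))
    state[0, 2] = 1.0                # central body displaced by 1
    err = 0.0
    for n in range(1, int(round(t_end / h)) + 1):
        state = rk4_step(state, h)
        err = max(err, np.max(np.abs(state[0] - exact(n * h))))
    return err
```

Calling `max_error()` gives the absolute maximum difference over the 128 sample points for the 4th order Runge-Kutta method; swapping `rk4_step` for another stepper gives the corresponding value for that method.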

As we could expect, the forward Euler method has the largest errors and the higher order methods perform better. Moreover, the 2nd and 4th order Runge-Kutta methods converge much faster as the step-size is decreased. In the case of the 4th order Runge-Kutta method the accuracy of the solution cannot be increased if the timestep value is smaller than $2^{-11}$ because the rounding errors and the error of the method are in the same range. In spite of the fact that the 4th order Runge-Kutta method requires 4 times more computations and memory it is worthwhile to implement it if accurate simulation of the dynamics is required.

After a suitable ODE solver method is selected the next question is the required precision of the FPGA implementation, because floating-point arithmetic requires a huge area while fixed-point arithmetic saves a large amount of resources, which can also be traded for performance. The absolute maximum difference between the exact and the different numerical solutions using different state precision and timestep values is plotted in Fig. 3.

In case of fixed-point computation the error of the different numerical methods has a very similar behavior. For large step-sizes the errors of the fixed and floating-point computations are equal. If the step-size is reduced further the error grows again because in these cases the rounding errors are


FIG. 3. Error of the different numerical methods (horizontal axis: timestep, log2(h); curves: FP64 and fixed-point precisions from 8 to 56 bit in 4 bit steps)

FIG. 4. Optimal bit width for the different numerical methods (horizontal axis: timestep, log2(h))


larger than the error of the numerical method. These results show that an optimal bit width can be found for every timestep where the error of the floating-point and the fixed-point solutions are identical and the bit width is minimal. The optimal bit widths for different timestep values and simulation runtimes are plotted in Fig. 4.

In the case of the forward Euler and the 2nd order Runge-Kutta methods very low fixed-point precision is required to get results similar to the floating-point solution, however these methods are not very accurate. To get better results the 4th order Runge-Kutta method should be used, which requires about two times larger bit width but is still more efficient than the lower order methods because a much larger step-size can be used during the computations.

Unfortunately the exact solution is usually not available which makes it hard to determine the optimal fixed-point precision. In these cases only the floating-point results can be used as a reference to determine the error of the fixed-point solution. The difference between the floating-point and the fixed-point solutions is shown in Fig. 5.

As the precision is increased the difference between the two solutions decreases, but how can we tell the optimal bit width for the fixed-point solution? In Fig. 5 the difference between the exact and floating-point solution is also plotted (FP64), but in this case the error of the different fixed-point computations crosses this line instead of following it. These solutions are nevertheless not more accurate than the floating-point solution, as shown in Fig. 3.

Simple algorithm to determine the optimal bit width of the fixed-point computation: Let us assume that the error ($\varepsilon$) of the floating-point solution is given in advance (this is usually true in engineering applications). The optimal bit width can be determined if several fixed-point solutions are computed and the bit width is increased in each iteration until the difference between the floating-point and fixed-point results is smaller than $\varepsilon$. The determined precision of the state values can be used in case of other initial conditions provided that the results are in the same range as the reference solution.
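The bit width search described above can be sketched in Python for the five-body test system. Several details are assumptions of this sketch and not the behavior of the FPGA arithmetic unit: the fixed-point format is emulated by rounding the state to a grid with (bits - 2) fractional bits after every step, no overflow handling is modeled, and the names `simulate` and `find_bit_width` are hypothetical.

```python
import numpy as np

K = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)

def rhs(u):
    # u = [displacements, velocities] of the five-body system (9).
    return np.concatenate([u[5:], -K @ u[:5]])

def rk4_step(u, h):
    k1 = h * rhs(u)
    k2 = h * rhs(u + k1 / 2)
    k3 = h * rhs(u + k2 / 2)
    k4 = h * rhs(u + k3)
    return u + k1 / 6 + k2 / 3 + k3 / 3 + k4 / 6

def simulate(h, n_steps, bits=None):
    """State trajectory; if bits is given, the state is rounded to a grid
    with (bits - 2) fractional bits after every step (emulated fixed point)."""
    u = np.zeros(10)
    u[2] = 1.0                                   # central body displaced by 1
    scale = None if bits is None else 2.0 ** (bits - 2)
    out = []
    for _ in range(n_steps):
        u = rk4_step(u, h)
        if scale is not None:
            u = np.round(u * scale) / scale
        out.append(u.copy())
    return np.array(out)

def find_bit_width(eps, h=2.0 ** -5, t_end=16.0, widths=range(8, 60, 4)):
    """Smallest tested bit width whose difference from the 64 bit
    floating-point run is below eps; returns (bits, difference)."""
    n_steps = int(round(t_end / h))
    reference = simulate(h, n_steps)             # floating-point reference
    for bits in widths:
        diff = np.max(np.abs(simulate(h, n_steps, bits) - reference))
        if diff < eps:
            return bits, diff
    raise ValueError("no tested bit width reaches the requested error")
```

For example, `find_bit_width(1e-6)` returns the first width from the tested list (8, 12, ..., 56 bit) for which the fixed-point and floating-point trajectories differ by less than $10^{-6}$.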

Another interesting question is to determine the limits of the floating-point solution. The error of the solution of our simple example cannot be smaller than $10^{-14}$ as shown in Fig. 3. What happens if the bit width of the fixed-point solution is increased well beyond 56 bit? To answer this question the accuracy of the floating-point solution is decreased to 32 bit from 64 bit. In this case only the 32 bit floating-point results should be computed and compared to the previously computed fixed-point results and we do not have to use special library functions to handle bit widths larger than 56 bits. The

FIG. 5. Difference between the floating-point and the fixed-point solutions in the case of the 4th order Runge-Kutta method (horizontal axis: timestep, log2(h))

error of the fixed-point computations compared to the 32 bit floating-point results is shown in Fig. 6. The difference between the exact solution and the 32 bit floating-point computation is also shown (FP32). Because of the narrower mantissa the 32 bit floating-point solution is very inaccurate compared to the 64 bit floating-point results and the smallest error is in the order of $10^{-7}$. If the step-size is decreased to improve the accuracy, the error of the solution increases due to the larger rounding errors during the computation. If these inaccurate results are used as a reference solution in the computation of the error of the fixed-point solutions, the error values are very similar to the previous case if the precision is smaller than 28 bit. If the precision is increased, the error values are identical to the error of the 32 bit floating-point results. This effect is more visible if the error values are plotted as a function of the computational precision as shown in Fig. 7. If the precision is larger than 32 bit the error function is a horizontal line for all timesteps but these fixed-point solutions are more accurate as shown in Fig. 3.

Simple algorithm to determine the optimal bit width of the fixed-point computation when its accuracy is equal to the floating point solution: In this case the break-point of the error functions should be determined by increasing the precision until the error value remains the same. In this case the fixed-point solution is at least as accurate as the floating-point solution.
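The break-point search can be sketched in the same spirit. In this illustration the reference is a 32 bit floating-point run, the fixed-point runs are emulated by rounding a 64 bit computation to (bits - 2) fractional bits after every step, and the break-point is taken as the first tested width after which the error changes by less than a relative tolerance. The 5% tolerance and the names `simulate` and `break_point` are assumptions of this example.

```python
import numpy as np

K = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)

def simulate(h, n_steps, dtype=np.float64, bits=None):
    """RK4 trajectory of the five-body system (9) in the given floating-point
    type; if bits is given, the state is rounded to (bits - 2) fractional bits."""
    Kd = K.astype(dtype)
    h = dtype(h)
    u = np.zeros(10, dtype=dtype)
    u[2] = 1.0
    scale = None if bits is None else 2.0 ** (bits - 2)

    def rhs(w):
        return np.concatenate([w[5:], -(Kd @ w[:5])])

    out = []
    for _ in range(n_steps):
        k1 = h * rhs(u)
        k2 = h * rhs(u + k1 / 2)
        k3 = h * rhs(u + k2 / 2)
        k4 = h * rhs(u + k3)
        u = u + k1 / 6 + k2 / 3 + k3 / 3 + k4 / 6
        if scale is not None:
            u = np.round(u * scale) / scale
        out.append(u.copy())
    return np.array(out)

def break_point(h=2.0 ** -5, t_end=16.0, widths=range(8, 60, 4), rel_tol=0.05):
    """First tested bit width after which the difference from the 32 bit
    floating-point reference stays the same within rel_tol."""
    n_steps = int(round(t_end / h))
    ref32 = simulate(h, n_steps, dtype=np.float32).astype(np.float64)
    errors = [np.max(np.abs(simulate(h, n_steps, bits=b) - ref32)) for b in widths]
    for b, e_now, e_next in zip(widths, errors, errors[1:]):
        if abs(e_next - e_now) <= rel_tol * e_now:
            return b, errors
    return widths[-1], errors
```

Beyond the returned width the measured difference is dominated by the error of the reference itself, so a further increase of the fixed-point precision is no longer visible in the comparison.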

In this section the state equation of a simple dynamical system is solved

FIG. 6. Difference between the 32 bit floating-point and the fixed-point solutions in the case of the 4th order Runge-Kutta method (horizontal axis: timestep, log2(h))

FIG. 7. Difference between the 32 bit floating-point and the fixed-point solutions in the case of the 4th order Runge-Kutta method (horizontal axis: precision (bit); one curve per timestep, log2(h) from -3 to -20)


and the accuracy of three numerical methods was examined. The results showed that high order methods can be very efficient if the dynamical behavior of the system has to be computed accurately, in spite of the fact that these methods require more computation per timestep. The state equation of this system is solved by using fixed-point numbers. The results showed that only moderate precision (28-32 bit) is required during the computation and the results are very close to the exact solution of the system. Two simple heuristic methods are introduced to determine the optimal fixed-point precision. The first method requires a-priori information about the desired error of the solution while the second method sets the precision so that the solution of the system using fixed-point numbers will be as accurate as the floating-point solution.

4. The wave equation. In this section the solution of the classical wave equation on the Falcon architecture will be described. The frequent appearance of this equation in many scientific and engineering problems makes it very important. Although it is a well-known equation and can be solved analytically, the analytical solution cannot be applied in the case of complicated shapes and initial conditions. Therefore the partial differential equation must be discretized in space and the resulting coupled set of ordinary differential equations can be solved on a CNN array.

The one dimensional wave equation (13) describes the motion of a finite string where the cross section of the string and the amplitude of the wave are small and both ends of the string are fixed.

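(13) $\begin{aligned} \frac{\partial^2 u}{\partial t^2} &= c^2 \frac{\partial^2 u}{\partial x^2} & 0 < x < l,\ t > 0 \\ u(0, t) &= 0 & t \ge 0 \\ u(l, t) &= 0 & t \ge 0 \\ u(x, 0) &= f(x) & 0 \le x \le l \\ \dot{u}(x, 0) &= g(x) & 0 \le x \le l \end{aligned}$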

In (13) l is the length of the string, c is the speed of the wave, u(x, t) is the displacement of the string, f(x) is the initial displacement and g(x) is the initial speed of the string. Equation (13) can be solved analytically and the solution has the following form (14):

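(14) $\begin{aligned} u(x, t) &= \sum_{n=1}^{\infty} \left( a_n \cos\left(\frac{n\pi c}{l} t\right) + b_n \sin\left(\frac{n\pi c}{l} t\right) \right) \sin\left(\frac{n\pi x}{l}\right) \\ a_n &= \frac{2}{l} \int_0^l f(x) \sin\left(\frac{n\pi x}{l}\right) dx \\ b_n &= \frac{2}{n\pi c} \int_0^l g(x) \sin\left(\frac{n\pi x}{l}\right) dx \end{aligned}$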

This infinite series can be used only when the initial conditions are simple and the integrals in the computation of the $a_n$ and $b_n$ values can be determined. For simplicity the length of the string and the speed of the wave are set to 1 and the initial displacement is a reversed parabolic curve described by the following equation:

(15) f (x) = 4x(1 - x)

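After substituting (15) into (14) and performing the integration the following equation is obtained:

(16) $a_n = \frac{16 - 16\cos(n\pi) - 8 n\pi \sin(n\pi)}{n^3 \pi^3}$

where $\sin(n\pi)$ is zero if n is an integer and $\cos(n\pi)$ is -1 if n is odd while it is +1 if n is even, and the $b_n$ values are all zero because the initial speed is zero. Additionally $a_n$ is zero for every even value of n and (16) can be simplified:

(17) $a_n = \frac{32}{n^3 \pi^3}, \quad n = 1, 3, 5, \ldots$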

After computing the $a_n$ values (14) can be used to approximate the analytical solution. The analytical solution is computed on 1025 points of the string on a 4 s interval with a $2^{-5}$ s timestep and the result is shown in Fig. 8.
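A partial sum of the series can be evaluated in a few lines. The sketch below assumes l = c = 1, the parabolic initial displacement (15) and zero initial speed, for which integrating (14) gives $a_n = 32/(n^3\pi^3)$ for odd n and zero otherwise; the function name `string_displacement` and the default number of terms are choices made for this example.

```python
import numpy as np

def string_displacement(x, t, n_terms=200):
    """Partial sum of (14) for l = c = 1, f(x) = 4x(1 - x) and g(x) = 0."""
    n = np.arange(1, 2 * n_terms, 2)              # odd n only, even terms vanish
    a = 32.0 / (n ** 3 * np.pi ** 3)              # coefficients for the parabola
    x = np.atleast_1d(x)
    modes = np.sin(np.pi * np.outer(x, n))        # shape (len(x), n_terms)
    return modes @ (a * np.cos(np.pi * n * t))

# Displacement at the 1025 points of the string at t = 0.5 s.
x = np.linspace(0.0, 1.0, 1025)
u_half = string_displacement(x, 0.5)
```

The coefficients decay as $1/n^3$, so a few hundred terms already reproduce the initial parabola to about $10^{-6}$; the motion is periodic with a period of 2 s for this choice of l and c.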

To solve (13) on a CNN architecture two CNN layers are required: the state of the first layer is the displacement and the second is the speed of the appropriate finite element and at the boundaries fixed boundary conditions are used. The required templates are the following:

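(18) $A_{12} = 1, \quad A_{21} = \frac{c^2}{\Delta x^2} \begin{bmatrix} 1 & -2 & 1 \end{bmatrix}$

where $\Delta x$ is the distance between the grid points.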

In addition to the previously described numerical methods two other finite differencing approximations are also examined: these methods are the



FIG. 8. Motion of the string

leapfrog method and the Lax-Wendroff method. The leapfrog method is very similar to the forward Euler method but in this case the derivatives are added to the previous state instead of the present state [7]:

(19) $u_j^{n+1} = u_j^{n-1} + 2\Delta t\, F(u^n)$

where $F(u^n)$ is the derivative computed according to (13). The Lax-Wendroff scheme is a two-step method which is second order in time, where the first intermediate step $u_{j+1/2}^{n+1/2}$ is computed at the half timesteps $t_{n+1/2}$ and the half mesh points $x_{j+1/2}$ by the Lax method [7]:

(20) $u_{j+1/2}^{n+1/2} = \frac{1}{2} \left( u_{j+1}^n + u_j^n \right) - \frac{\Delta t}{2\Delta x} \left( F_{j+1}^n - F_j^n \right)$

Using these variables the fluxes $F_{j+1/2}^{n+1/2}$ are computed. Then the updated values $u_j^{n+1}$ are calculated by the following expression:

(21) $u_j^{n+1} = u_j^n - \frac{\Delta t}{\Delta x} \left( F_{j+1/2}^{n+1/2} - F_{j-1/2}^{n+1/2} \right)$

Equation (13) should be written in a flux-conservative form to solve it by using the Lax-Wendroff scheme:

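(22) $\frac{\partial r}{\partial t} = c \frac{\partial s}{\partial x}, \quad \frac{\partial s}{\partial t} = c \frac{\partial r}{\partial x}$

where $r = c\, \partial u / \partial x$ and $s = \partial u / \partial t$.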


After substituting (22) into (20) and (21) the following equations can be derived:

Although the previously described methods cannot be implemented directly on the Falcon emulated digital CNN-UM architecture, its structure can be modified to solve the wave equation by using these methods.

The state equation of a vibrating string is solved by these five methods and the difference between the exact solution and the numerical results is shown in Fig. 9.

In this case the forward Euler method has a very poor convergence rate. By using the 2nd and 4th order Runge-Kutta methods the timestep can be increased, however the accuracy of the solution cannot be increased. The leapfrog method requires a 4 times smaller timestep than the 2nd order Runge-Kutta method but in this case the derivatives have to be computed only once. The required timestep in the case of the Lax-Wendroff method is equal to the required timestep of the 4th order Runge-Kutta method while its error is much smaller. However, by increasing the timestep the error values in the case of the Lax-Wendroff method become larger than the error of the other methods.
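As an illustration of the leapfrog update (19), the following Python sketch integrates the vibrating string in its two-layer form (displacement and speed) with fixed boundary cells. It is not the Falcon implementation: the grid size, the timestep, the single forward Euler step used to obtain the second time level, and the function name `leapfrog_string` are all assumptions of this example, and 64 bit floating-point numbers are used throughout.

```python
import numpy as np

def leapfrog_string(n_points=65, dt=2.0 ** -8, t_end=2.0, c=1.0):
    """Leapfrog integration of the string with l = 1 and f(x) = 4x(1 - x)."""
    x = np.linspace(0.0, 1.0, n_points)
    dx = x[1] - x[0]

    def accel(u):
        # Second difference (c^2 / dx^2) [1 -2 1]; the two end cells stay
        # at zero, which realizes the fixed boundary condition.
        a = np.zeros_like(u)
        a[1:-1] = c ** 2 * (u[:-2] - 2.0 * u[1:-1] + u[2:]) / dx ** 2
        return a

    u_prev = 4.0 * x * (1.0 - x)         # initial displacement (15)
    v_prev = np.zeros_like(x)            # initial speed g(x) = 0
    # One forward Euler step supplies the second time level that leapfrog needs.
    u_now = u_prev + dt * v_prev
    v_now = v_prev + dt * accel(u_prev)

    for _ in range(int(round(t_end / dt)) - 1):
        # Derivatives of the present state are added to the previous state.
        u_next = u_prev + 2.0 * dt * v_now
        v_next = v_prev + 2.0 * dt * accel(u_now)
        u_prev, v_prev, u_now, v_now = u_now, v_now, u_next, v_next
    return x, u_now
```

With these settings the product of the timestep and the largest discrete frequency $2c/\Delta x$ is 0.5, which is inside the stability limit of the leapfrog method; after one full period (2 s) the computed shape should again be close to the initial parabola.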

To implement the previously described methods on FPGA fixed-point numbers should be used. The absolute maximum difference between the analytical solution and the different fixed-point results and the optimal bit width for each method are shown in Fig. 10.

The characteristics of the error functions are very similar to the errors of the solution of the mechanical vibrating system. If the fixed-point precision is high enough, its error is equal to the error of the floating-point computation and an optimal bit width can be found for every grid size. If the number of grid points is increased without increasing the computing precision, the accuracy of the solution decreases due to the rounding errors. The required precision is the smallest if the Lax-Wendroff method is used therefore this method seems to be the most efficient in FPGA implementation.

5. Tactile sensor modeling. Humans do precise manipulation tasks largely through tactile perception of objects. When a fingertip touches an object a contact stress profile is induced at its surface. The resulting stress profile has three components: the normal stress $T_n$ and the shear stresses $T_x$, $T_y$ in the x and y dimensions. Sensing these three stress components provides


FIG. 9. Error of the different methods using different spatial discretizations (a) forward Euler (b) 2nd order Runge-Kutta (c) 4th order Runge-Kutta (d) Leapfrog (e) Lax-Wendroff method (horizontal axis: timestep, log2(h))

FIG. 10. Error of the different methods using different precisions (a) forward Euler (b) 2nd order Runge-Kutta (c) 4th order Runge-Kutta (d) Leapfrog (e) Lax-Wendroff method (f) The required precision for the different methods (horizontal axis: number of elements)


humans with a rich source of information about their physical environment. Measurement and processing of the stress components also has a great importance in precise robotic manipulation. These biology inspired systems use a small array of sensing elements to improve sensing capabilities. Analog CNN-UM cells can be used to process the measured stress components in real time [9].

Simulation of the static and dynamic properties of the sensors at design time requires high computing power. Several studies proved the effectiveness of the CNN-UM solution of different PDEs [2], [3]. But in most cases the results cannot be used in real life implementations because of the limitations of the analog CNN-UM chips such as low precision or the application of 5 x 5 sized templates. These drawbacks can be overcome by using an emulated digital CNN-UM. In this section the transient behavior of a simple tactile sensor will be modeled by using the multi-layer Falcon emulated digital CNN-UM architecture. The standard multi-layer architecture is specialized to solve the state equation of the tactile sensor to reduce the area requirements and improve the performance of the architecture.

Tactile sensors are usually composed of a central shuttle plate, which is suspended by four bridges over a pit. The suspension of the whole structure allows deformation of the bridges as normal and shear stress is applied to the central plate. Each bridge contains an embedded piezoresistor. The resistance of the bridges changes due to the deformations and the voltage changes on the bridges can be measured. In our case the bridges are located at the center of each edge as shown in Fig. 11.

The transient response of the central plate due to an applied normal pressure can be described by the following partial differential equation [10]:

(24) $h \rho \frac{\partial^2 w}{\partial t^2} = p - D \left( \frac{\partial^4 w}{\partial x^4} + 2 \frac{\partial^4 w}{\partial x^2 \partial y^2} + \frac{\partial^4 w}{\partial y^4} \right)$

where w is the displacement of the plate, p is the applied pressure, h is the thickness of the plate and $\rho$ is the density of the plate. The flexural rigidity D can be computed by the following expression:

(25) $D = \frac{E h^3}{12 (1 - \nu^2)}$

where E is the Young's modulus and $\nu$ is the Poisson's ratio. The dimension of the plate is 100 μm × 100 μm and the thickness is 2.85 μm.

The width of the suspension bridges is 12.5 μm. For simplicity the suspension bridges themselves are not modeled but our solution can be extended to


FIG. 11. Structure of the tactile sensor

handle them. The tactile sensor is made from silicon so the material constants are the following: E = 47 GPa, $\nu$ = 0.278 and $\rho$ = 2330 kg/m³.
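With the quoted constants the flexural rigidity can be evaluated directly. The short sketch below assumes the standard thin-plate expression $D = E h^3 / (12 (1 - \nu^2))$; the 64 cell grid used to illustrate the template coefficient $D / \Delta x^4$ is a hypothetical choice for this example, as are the function names.

```python
E = 47e9          # Young's modulus [Pa], value quoted in the text
nu = 0.278        # Poisson's ratio
h = 2.85e-6       # plate thickness [m]
rho = 2330.0      # density [kg/m^3]
side = 100e-6     # side length of the square plate [m]

def flexural_rigidity(E, h, nu):
    """Flexural rigidity of a thin plate (standard thin-plate formula)."""
    return E * h ** 3 / (12.0 * (1.0 - nu ** 2))

def template_scale(D, dx):
    """Coefficient D / dx^4 that multiplies the spatial difference operator."""
    return D / dx ** 4

D = flexural_rigidity(E, h, nu)       # about 9.83e-8 N m
dx = side / 64                        # hypothetical grid of 64 cells per side
coefficient = template_scale(D, dx)
```

The coefficient grows with the fourth power of the grid resolution, which is one reason why the width of this constant has to be considered separately from the width of the state values in a fixed-point design.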

To solve (24) on a CNN-UM the plate should be spatially discretized and each finite element is assigned to one CNN cell. Equation (24) is second order in time so two coupled CNN layers are required where the displacement of the plate is computed by the first layer while the velocity is computed by the second. The approximation of the spatial derivatives requires the following 5 x 5 sized template which is the conventional discretized form of the biharmonic operator:

(26) $D \left( \frac{\partial^4 w}{\partial x^4} + 2 \frac{\partial^4 w}{\partial x^2 \partial y^2} + \frac{\partial^4 w}{\partial y^4} \right) \approx \frac{D}{\Delta x^4} \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 2 & -8 & 2 & 0 \\ 1 & -8 & 20 & -8 & 1 \\ 0 & 2 & -8 & 2 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$

where $\Delta x$ is the distance between the grid points. At the free edges of the plate zero-flux boundary conditions are used while fixed boundary conditions are used at the suspensions. Due to the two different boundary conditions space variant templates are required. Equation (24) cannot be solved on the current analog VLSI chips because 5 x 5 sized and space variant templates are not supported in these architectures. Using the Falcon configurable emulated digital CNN-UM architecture the limitations of the analog VLSI chips can be overcome. The Falcon architecture can be configured to support two CNN layers and 5 x 5 sized space variant templates.
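The 5 x 5 template can be checked on a polynomial whose biharmonic is known. The sketch below applies the template to the interior cells only, so the space variant boundary handling described above is deliberately left out; the function name `apply_template` and the test polynomial are assumptions of this example.

```python
import numpy as np

# Discretized biharmonic operator, to be scaled by 1 / dx^4 (template (26) without D).
BIHARMONIC = np.array([[0, 0, 1, 0, 0],
                       [0, 2, -8, 2, 0],
                       [1, -8, 20, -8, 1],
                       [0, 2, -8, 2, 0],
                       [0, 0, 1, 0, 0]], dtype=float)

def apply_template(w, template, dx):
    """Apply a 5 x 5 template to the interior cells of w (no boundary handling)."""
    rows, cols = w.shape[0] - 4, w.shape[1] - 4
    out = np.zeros((rows, cols))
    for k in range(5):
        for l in range(5):
            out += template[k, l] * w[k:k + rows, l:l + cols]
    return out / dx ** 4

# For w = x^4 + x^2 y^2 + y^4 the biharmonic is 24 + 8 + 24 = 56 everywhere.
dx = 0.1
x, y = np.meshgrid(np.arange(12) * dx, np.arange(12) * dx, indexing="ij")
w = x ** 4 + x ** 2 * y ** 2 + y ** 4
biharmonic_w = apply_template(w, BIHARMONIC, dx)
```

The template reproduces the value 56 up to rounding for this polynomial, because the underlying second and fourth differences are exact for polynomials of this degree.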

FIG. 12. Displacement of the plate after 3 μs (horizontal axes: X (m) and Y (m))

FIG. 13. (a) Error of the displacement of the plate (b) Error of the speed of the plate (horizontal axis: precision (bit))

To achieve better numerical stability the leapfrog method (19) is used instead of the forward Euler method during the computation of the new cell value. The implementation of this method requires additional memory elements and doubles the required memory bandwidth of the processor but these modifications are worthwhile because much larger timestep can be used.

A simple test case was used to determine the accuracy of the fixed-point solution. The input function was a step function, which was applied to the center of the plate. The first 3.814 μs ($2^{21}$ steps using a $2^{-39}$ s timestep) of the transient response was computed using 64 bit floating-point numbers. The displacement of the plate after 3 μs, where the amplitude of the oscillation is the largest, is shown in Fig. 12.

The 64 bit floating-point result was compared to the results of the fixed-point computations by using different state precisions. The maximum difference between the two solutions is shown in Fig. 13.

The largest amplitude of the plate is $2.3 \times 10^{-11}$ during the simulated time interval, thus at least 18 bit precision must be used to get about 10% accurate results. If the precision is increased, the fixed-point computation is more and more accurate but the accuracy cannot be improved beyond $5 \times 10^{-16}$. This behavior is very similar to the results obtained in the previous sections where the state equation of the mechanical vibrating system and the wave equation are solved by using 32 bit floating-point numbers. This behavior is explained by the rounding errors of the floating-point reference, because our arithmetic unit does not perform any rounding during the computation of the derivatives. If 36 bit precision is used, the internal precision is 60 bits because the coefficient $D / \Delta x^4$ is 18 bit wide and 6 additional bits are required during the computation of the spatial difference operator: 36 + 18 + 6 = 60 bits altogether. In the case of the standard IEEE 64 bit floating-point numbers the size of the mantissa is 52 bit so it is very likely that some bits are lost when the result of the spatial difference operator is multiplied by $D / \Delta x^4$. In most cases by using 60 bit precision inside the arithmetic unit the results will be more accurate than the floating-point results. By using 32 bit precision the error is about 0.002% compared to the maximum amplitude of the plate, which is acceptable in most engineering applications.

6. Barotropic ocean model. Simulation of compressible and incompressible fluids is one of the most exciting areas of the solution of PDEs because these equations appear in many important applications in aerodynamics, meteorology and oceanography. In this section a CNN-UM simulation of ocean currents will be presented. The governing equations of the ocean model are derived from the Navier-Stokes equations of incompressible fluids. The CNN-UM solution of the Navier-Stokes equations was described in [2]. But the non-linearity of the state equations does not make it possible to utilize the huge computing power of the current analog CNN-UM chips. By using an emulated digital CNN-UM the known limitations of the analog chips such as low precision, small array size and noise sensitivity can be overcome. To improve the performance of our solution the cell model of the architecture is modified to handle the non-linearity of the model.

Building a universal ocean model that can accurately describe the state of the ocean on all spatial and temporal scales is very difficult [11]. Thus ocean modeling efforts can be diversified into different classes, some concerned only with the turbulent surface boundary layers, some with continental shelves and some with the circulation in the whole ocean basin. Fine resolution models can be used to provide real-time weather forecasts for several weeks. These forecasts are very important to the fishing industry, ship routing and search and rescue operations. The coarser resolution models are very efficient in long term global climate simulations such as simulating the El Niño effects of the Pacific Ocean.

In general, ocean models describe the response of the variable density ocean to atmospheric momentum and heat forcing. In the simplest barotropic ocean model a region of the ocean's water column is vertically integrated to obtain one value for the vertically different horizontal currents. The more accurate models use several horizontal layers to describe the motion in the deeper regions of the ocean. Though these models are more accurate, investigation of the barotropic ocean model is still worthwhile because it is relatively simple, easy to implement and it provides a good basis for the more sophisticated 3-D layered models.

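The governing equations of the barotropic ocean model on a rotating Earth can be derived from the Navier-Stokes equations of incompressible fluids. Using Cartesian coordinates these equations have the following form [11]:

$$\frac{du_x}{dt} = 2\Omega\sin\theta\,u_y - gH\frac{\partial\eta}{\partial x} + \tau_{wx} - \tau_{bx} + A\nabla^2 u_x - \frac{u_x}{H}\frac{\partial u_x}{\partial x} - \frac{u_y}{H}\frac{\partial u_x}{\partial y} \tag{27}$$

$$\frac{du_y}{dt} = -2\Omega\sin\theta\,u_x - gH\frac{\partial\eta}{\partial y} + \tau_{wy} - \tau_{by} + A\nabla^2 u_y - \frac{u_x}{H}\frac{\partial u_y}{\partial x} - \frac{u_y}{H}\frac{\partial u_y}{\partial y} \tag{28}$$

$$\frac{d\eta}{dt} = -\frac{\partial u_x}{\partial x} - \frac{\partial u_y}{\partial y} \tag{29}$$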

where $\eta$ is the height above mean sea level, and $u_x$ and $u_y$ are the volume transports in the x and y directions respectively. In the Coriolis term $\Omega$ is the angular rotation of the Earth and $\theta$ is the latitude. The pressure term contains $H(x,y)$, the depth of the ocean, and the gravitational acceleration $g$. The wind and bottom stress components in the x and y directions are represented by $\tau_{wx}$, $\tau_{wy}$, $\tau_{bx}$ and $\tau_{by}$ respectively. The lateral viscosity is denoted by $A$.

The bottom stress components can be linearly approximated by multiplying $u_x$ and $u_y$ by a constant value $\sigma_b$; the recommended value of this parameter is in the range $1 - 5 \times 10^{-7}$. The wind stress components can be computed from the wind speed above the sea surface by the following approximation [11]:

$$\tau_w = \frac{\rho_a}{\rho_w} C_d U_{10}^2 \tag{30}$$

where $\rho_a$ is the density of the air, $\rho_w$ is the density of the sea water, $U_{10}$ is the speed of the wind at 10 meters above the surface and $C_d$ is the drag coefficient. One possible approximation of the drag coefficient is the following:

$$1000\,C_d = \begin{cases} 0.29 + \dfrac{3.1}{U_{10}} + \dfrac{7.7}{U_{10}^2}, & 3 \le U_{10} \le 6\ \text{m/s} \\[2ex] 0.5 + 0.071\,U_{10}, & 6 \le U_{10} \le 26\ \text{m/s} \end{cases} \tag{31}$$
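The wind forcing described by (30) and (31) can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function names, the default air and sea water densities, and the choice to assign the shared boundary value of 6 m/s to the first branch are assumptions made for illustration.

```python
def drag_coefficient(u10):
    """Drag coefficient C_d from approximation (31); u10 is the wind speed in m/s."""
    if 3.0 <= u10 <= 6.0:
        # Low wind branch; the boundary value 6 m/s is assigned here (assumption).
        return (0.29 + 3.1 / u10 + 7.7 / u10**2) / 1000.0
    if 6.0 < u10 <= 26.0:
        # High wind branch.
        return (0.5 + 0.071 * u10) / 1000.0
    raise ValueError("approximation (31) covers 3 <= U10 <= 26 m/s only")


def wind_stress(u10, rho_air=1.2, rho_water=1025.0):
    """Wind stress from (30); the default densities (kg/m^3) are assumed typical values."""
    return rho_air / rho_water * drag_coefficient(u10) * u10**2
```

For the 8 m/s peak wind used later in the test model, `wind_stress(8.0)` evaluates the second branch of (31).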

The horizontal friction parameter $A$ can be computed from the mesh-box Reynolds number $R_c$:

$$R_c = \frac{U\,\Delta x}{A} \tag{32}$$

where $\Delta x$ is the mesh size and $U$ is the magnitude of the velocity in the mesh-box. By approximating $U$ with $\sqrt{gH}$ and setting $R_c = 4$, a value generally used in nonlinear flow simulations, the lateral friction can be computed by the following equation:

$$A = \frac{\Delta x\,\sqrt{gH}}{4} \tag{33}$$
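As a small worked example of the lateral friction estimate, the sketch below assumes that the mesh-box Reynolds number has the usual form $R_c = U\Delta x / A$ and that $U$ is approximated by $\sqrt{gH}$ as stated in the text. The function name and the default value of $g$ are illustrative assumptions.

```python
import math


def lateral_friction(dx, depth, g=9.81, reynolds=4.0):
    """Lateral friction A = dx * sqrt(g * depth) / R_c (assumed form of the estimate)."""
    return dx * math.sqrt(g * depth) / reynolds
```

With the grid resolution and depth used later in the test model, the call would be `lateral_friction(8192.0, 4000.0)`.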

The circulation in the ocean is generally the result of the wind stress at the ocean's surface and the source sink mass flows at the basin boundaries. In our investigation steady wind was used to force our model. In this case the ocean will generally arrive at a steady circulation after an initial transient behavior.

Solution of equations (27)-(29) on a CNN-UM architecture requires finite difference approximation on a uniform square grid. The spatial derivatives can be approximated by the following well known finite difference schemes and CNN templates:

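$$\frac{\partial}{\partial x} \approx \frac{1}{2\Delta x}\begin{bmatrix} 0 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = A_{dx} \tag{34}$$

$$\frac{\partial}{\partial y} \approx \frac{1}{2\Delta x}\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & -1 & 0 \end{bmatrix} = A_{dy} \tag{35}$$

$$\nabla^2 \approx \frac{1}{\Delta x^2}\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} = A_{\nabla} \tag{36}$$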

By using these templates the pressure and lateral viscosity terms can be easily computed on the CNN-UM architecture. However, the computation of the advection terms requires the following non-linear CNN template, which cannot be implemented on the present analog CNN-UM architectures:

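$$u_{x,ij}\frac{\partial}{\partial x} \approx \frac{u_{x,ij}}{2\Delta x}\begin{bmatrix} 0 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = A_{x,x,ij} \tag{37}$$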
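The finite difference templates (34)-(36) and the state-dependent advection template (37) can be tried out in software before any hardware mapping. The sketch below is an illustration only: it assumes that array rows run from top to bottom (so the top template row corresponds to larger y), uses replicated border values, and introduces the hypothetical helper names `apply_template`, `templates` and `advection_x`.

```python
import numpy as np


def apply_template(field, template):
    """Weighted sum over the 3x3 neighborhood of every cell (replicated border)."""
    padded = np.pad(field, 1, mode="edge")
    rows, cols = field.shape
    out = np.zeros_like(field, dtype=float)
    for k in range(3):
        for l in range(3):
            out += template[k, l] * padded[k:k + rows, l:l + cols]
    return out


def templates(dx):
    """Templates (34), (35) and (36) for mesh size dx."""
    a_dx = np.array([[0, 0, 0], [-1, 0, 1], [0, 0, 0]]) / (2 * dx)
    a_dy = np.array([[0, 1, 0], [0, 0, 0], [0, -1, 0]]) / (2 * dx)
    a_lap = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]]) / dx**2
    return a_dx, a_dy, a_lap


def advection_x(u, dx):
    """Non-linear term u * du/dx: the template weights of (37) scale with the cell state."""
    a_dx, _, _ = templates(dx)
    return u * apply_template(u, a_dx)
```

The last function shows why (37) is non-linear: the template weights depend on the state of the cell itself, which a fixed linear template cannot express.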

Most ocean models arrange the time dependent variables $u_x$, $u_y$ and $\eta$ on a staggered grid called the C-grid. In this case the pressure $p$ and height $H$ variables are located at the center of the mesh boxes, and the mass transports $u_x$ and $u_y$ are at the center of the box boundaries facing the x and y directions respectively. The state equation of the ocean model can be solved by a 3-layer CNN-UM. In this case the CNN-UM solution can be described by the following set of equations:

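$$\frac{du_{x,ij}}{dt} = f_{ij}u_{y,ij} - gH_{ij}\sum A_{dx}\eta + \tau_{wx,ij} - \sigma_b u_{x,ij} + A_{ij}\sum A_{\nabla}u_x - \frac{1}{H_{ij}}\left(\sum A_{x,x,ij}u_x + \sum A_{x,y,ij}u_y\right) \tag{38}$$

$$\frac{du_{y,ij}}{dt} = -f_{ij}u_{x,ij} - gH_{ij}\sum A_{dy}\eta + \tau_{wy,ij} - \sigma_b u_{y,ij} + A_{ij}\sum A_{\nabla}u_y - \frac{1}{H_{ij}}\left(\sum A_{y,x,ij}u_x + \sum A_{y,y,ij}u_y\right) \tag{39}$$

$$\frac{d\eta_{ij}}{dt} = -\sum A_{dx}u_x - \sum A_{dy}u_y \tag{40}$$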

At the edges of the model the normal and tangential components of the flow are zero: the former means that there is no flow through the boundary, while the latter means that there is no slip at the solid boundary. To achieve these conditions fixed boundary conditions are used for $u_x$ and $u_y$. For the elevation $\eta$ zero-flux boundary conditions are used because the water can move freely at the boundary.

By using equations (38)-(40) and templates (34)-(37) an analogic algorithm can be constructed to solve the state equation of the barotropic ocean model. However, the non-linear advection terms do not allow us to implement our algorithm on the present analog CNN-UM chips. The non-linear behavior can be modeled by software simulation, but this solution does not differ from the traditional approach and does not provide any performance advantage.

The Falcon emulated digital CNN-UM architecture can be modified to handle the non-linear templates required in the advection terms of the ocean model. The performance can be greatly improved by designing a specialized arithmetic unit which can compute these templates fully parallel. Instead of building a general CNN-UM architecture which can handle the required non-linear templates an array of specialized cells is designed which can solve the state equation of the discretized ocean model directly.

To emulate the behavior of the specialized cells the continuous state equations (38)-(40) must be discretized in time. The leapfrog method is used in the solution, which imposes an upper limit on the timestep $\Delta t$. The maximal value of the timestep can be computed by using the Courant-Friedrichs-Lewy (CFL) stability condition:

$$\Delta t < \frac{\Delta x}{c_w} \tag{41}$$

where $\Delta t$ is the timestep, $\Delta x$ is the distance between the grid points and $c_w$ is the speed of the surface gravity waves, typically $c_w = \sqrt{gH}$.

FIG. 14. (a) Error of the horizontal flow $u_x$ (b) Error of the vertical flow $u_y$ (c) Error of the elevation. Horizontal axes: precision (bit); vertical axes: error on a logarithmic scale.

To evaluate the accuracy of the fixed-point solution a simple model is used. The size of the modeled ocean is 2097 km, the boundaries are closed, the grid size is 256 x 256 and the grid resolution is 8192 m. Six different bottom topographies were used during the computation. In the first two cases the ocean bottom was flat, 4000 m and 1000 m deep respectively. In the next two cases a reversed parabola shape was used, forming an underwater ridge where the deepest point is 4000 m and 1500 m while the top of the ridge is 2000 m and 1000 m deep respectively. In the last two cases the bottom topography was a seamount described by the following equation:

$$H(x,y) = -h_0\left(1 - A\,e^{-(\ldots)/L}\right) \tag{42}$$

where $h_0$ is the maximum depth, $A$ is the height and $L$ is the slope of the seamount. In both cases $h_0$ was 4500 m, while $A$ and $L$ were 0.9 and 0.2 in the fifth case and 0.75 and 0.3 in the sixth case.

The model is forced by a steady wind blowing from the west whose value is constant along the x direction. Along the y direction the wind speed follows a reversed parabolic curve, with zero speed at the edges and 8 m/s in the center. The model was run for 72 hours of simulation time in each case using a 32 s timestep. The result of the fixed-point computations is compared to the 64 bit floating-point results, and the absolute maximum difference between the two solutions is shown in Fig. 14.
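The 32 s timestep can be checked against the CFL condition (41) for the grid and depths quoted above. The following sketch is a plain arithmetic check, with $g = 9.81\ \mathrm{m/s^2}$ and the function name chosen here for illustration.

```python
import math


def max_timestep(dx, depth, g=9.81):
    """Upper bound on the timestep from (41), with wave speed c_w = sqrt(g * depth)."""
    return dx / math.sqrt(g * depth)


# Deepest configuration of the test model: 4500 m, grid resolution 8192 m.
limit_deepest = max_timestep(8192.0, 4500.0)
timestep_is_stable = 32.0 < limit_deepest
```

Since the bound shrinks with increasing depth, the deepest topography is the limiting case; under the stated assumptions the bound is close to 39 s, so 32 s stays below it.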

If the precision is smaller than 12 bit, the state of the ocean model does not change. Increasing the precision slightly makes the state of the model change, but the error of the flow is in the same range as the flow value itself. To achieve 10% accuracy at least 22 bit precision is required; however, in this case the error of the elevation is still too high compared to its maximum value. If the precision is further increased, the accuracy of the solution improves quickly, but the error apparently cannot be decreased further once the precision is larger than 30 bit. This behavior is very similar to the results obtained in the previous sections, and the error of the fixed-point solution in these cases is smaller than the error of the 64 bit floating-point solution. Though 30 bits of precision seem to be enough for the computation of the flow values, the elevation of the ocean model should be computed more accurately to achieve accuracy similar to the 64 bit floating-point solution.
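A state that does not change at low precision is consistent with a familiar fixed-point effect: when the per-step increment is smaller than half of the least significant bit, it rounds to zero. The sketch below illustrates this with a generic explicit update. It is not the arithmetic of the emulated architecture: the number format, the rounding mode, the update rule and the numeric values are all assumptions.

```python
def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def fixed_point_step(state, rate, dt, frac_bits):
    """One explicit update state + dt * rate with the increment and result quantized."""
    increment = quantize(rate * dt, frac_bits)
    return quantize(state + increment, frac_bits)


# Hypothetical numbers: a small tendency and the 32 s timestep.
stalled = fixed_point_step(0.5, 1e-5, 32.0, frac_bits=10)
moving = fixed_point_step(0.5, 1e-5, 32.0, frac_bits=20)
```

With 10 fractional bits the increment of 3.2e-4 is below half a least significant bit and is lost, while with 20 fractional bits the state advances.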

7. Conclusions. In this paper the solution of PDEs on emulated digital CNN-UM architectures was examined. Although several previous studies proved the effectiveness of the CNN solution of PDEs, in most cases these results cannot be applied to real-life problems because of the imperfections of the analog CNN chips. Emulated digital CNN-UM architectures can be used to overcome the limitations of the analog VLSI CNN chips. However, these architectures use fixed-point numbers during the solution of the CNN state equation, so the required computing precision must be determined before implementation. The performance of the emulated digital CNN-UM architectures can be significantly improved by decreasing the computing precision of the architecture. Therefore it is very important to examine the accuracy of the solution in the case of low computing precision. Our results show that the Forward Euler method, which is widely used in the computation of the CNN dynamics, is not ideal for the solution of spatially discretized PDEs. By using more sophisticated numerical methods PDEs can be solved orders of magnitude faster. Comparison of the floating-point and the fixed-point solutions showed that fixed-point arithmetic can be used very efficiently in the solution of PDEs. The results of the floating-point and the fixed-point solutions are very close even if low precision (20-24 bit) is used. If the precision is increased to 30-34 bit, the fixed-point computations are as accurate as the 64 bit floating-point results while requiring far fewer computing resources in implementation.

REFERENCES

[1] T. Roska, and L. O. Chua, The CNN Universal Machine. An analogic array computer, IEEE Trans. on Circuits and Systems-II, vol. 40 (1993), 163-173.

[2] T. Roska, T. Kozek, D. Wolf, and L. O. Chua, Solving Partial Differential Equations by CNN, Proc. of European Conf. on Circuits Theory and Design, (1992).

[3] P. Szolgay, G. Voros, Gy. Eross, On the Applications of the Cellular Neural Network Paradigm in Mechanical Vibrating System, IEEE Trans. Circuits and Systems-I, Fundamental Theory and Appl., vol. 40, no. 3 (1993), 222-227.

[4] P. Keresztes, A. Zarandy, T. Roska, P. Szolgay, T. Bezak, T. Hidvegi, P. Jonas, A. Katona, An emulated digital CNN Implementation, J. of VLSI Signal Processing, vol. 23 (1999), 291-303.

[5] Z. Nagy, and P. Szolgay, Configurable Multi-Layer CNN-UM Emulator on FPGA, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 50 (2003), 774-778.

[6] H. Harrer, A. Schuler, and E. Amelunxen, Comparison of Different Numerical Integration Methods for Simulating Cellular Neural Networks, Proc. of the 1st IEEE Int. Workshop on Cellular Neural Networks and their Applications, (1990), 151-159.

[7] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, [Online] http://www.library.cornell.edu/nr/bookcpdf.html, (1992).

[8] T. Myint-U, and L. Debnath, Partial Differential Equations for Scientists and Engineers, Elsevier Science Publishing Co. Inc., (1987).

[9] A. Kiss, and P. Szolgay, Elementary CNN Algorithms and an Experimental System for Typical Tactile Actions, Proc. of 16th European Conf. on Circuits Theory and Design, (2003).

[10] S. Timoshenko, and J. N. Goodier, Theory of Elasticity, McGraw-Hill, (1951).

[11] R. H. Stewart, Introduction To Physical Oceanography, [Online] http://oceanworld.tamu.edu/resources/ocng_textbook/contents.html, (2003).


VOLUME 13

2006, NO 1

PP. 89-97

SPATIO-TEMPORAL NONLINEAR WAVE METRIC FOR BINARY AND GRAY-SCALE OBJECT COMPARISON ON ANALOGIC CELLULAR WAVE COMPUTERS

I. SZATMARI '

Abstract. In this paper, spatio-temporal dynamic phenomena are investigated and introduced for object comparison. Nonlinear waves explore the geometrical properties of objects, and their recorded dynamics make it possible to discover hidden object properties while also providing quantitative measurement. These nonlinear dynamical processes can take place on VLSI implementations of Cellular Neural/Nonlinear Networks (CNN), offering a high-speed, parallel architecture for real-time applications. The proposed object comparison framework works on both binary and gray-scale object pairs.

Key Words. spatio-temporal dynamics, nonlinear waves, object comparison, Hausdorff metric

1. Introduction. Comparing objects, usually references (models) to their unknown object pairs, is a fundamental task in many image processing and classification applications. Spatio-temporal dynamical processes offer a novel approach to quantitative object comparison. The proposed framework is well suited to parallel array processors such as the Cellular Neural/Nonlinear Network (CNN) [1]-[3]. High-speed programmable devices based on the concept of the CNN Universal Machine (CNN-UM) [4], [5] can serve as very powerful and flexible tools for real-time applications.

The wave approach for pattern analysis has already been considered, e.g. in [6], [7], and the autowave principle was proposed for image analysis and processing [8], [9]. The major contribution of this paper is to show that wave phenomena are useful not only for the successful computation of well-known metrics such as the Hausdorff metric, but that qualitatively novel metrics and distance measures can also be developed. The nonlinear wave metric and distance measure discussed here exploit much more information about objects than either the Hamming distance or the Hausdorff metric and contain these metrics as special cases. In this new methodology, spatio-temporal processes explore objects, and the dynamics during the evolution are recorded and stored in a special wave map. Alongside spatial features, time-related information can be extracted as a new dimension, and this new feature makes it possible to investigate "hidden" properties of objects.

' Analogical and Neural Computing Laboratory of Computer and Automation Research Institute of Hungarian Academy of Sciences, Kende u. 13-17, Budapest, Hungary, H-1116, e-mail:[email protected]

We will consider the following dynamical CNN system consisting of two layers of first-order CNN cells. The dynamics of the two-layer, MxN CNN is described by

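$$\begin{aligned} \frac{d}{dt}x_{1,ij}(t) &= -x_{1,ij}(t) + \sum_{kl \in S} A_{1,kl}\, y_{1,i+k,j+l}(t) + z_1 + f_1\big(y_{1,ij}(t), u_{1,ij}(t)\big) \\ \frac{d}{dt}x_{2,ij}(t) &= -x_{2,ij}(t) + \sum_{kl \in S} A_{2,kl}\, y_{2,i+k,j+l}(t) + z_2 + f_2\big(y_{1,ij}(t), u_{1,ij}(t)\big) \end{aligned} \tag{1}$$

with output nonlinearity

$$y(x) = \frac{1}{2}\big[\,|x + 1| - |x - 1|\,\big] \tag{2}$$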

The input, state, and output are represented by $u$, $x$ and $y$, respectively. The bias (offset) is denoted by $z$. Note that the function $f_2(\cdot)$ also takes its input from the first layer. The control functions $f_1(\cdot)$ and $f_2(\cdot)$ will be defined later.

DEFINITION 1. An MxN binary image:

$$U_A = U_{A,M \times N} = \{-1, +1\} \in \Re^2$$

DEFINITION 2. An MxN gray-scale image:

$$V_A = V_{A,M \times N} = \{[-1, +1]\} \in \Re^2$$

DEFINITION 3. A real value function:

$$V_A = V_{A,1 \times K} = \{[-1, +1]\} \in \Re$$

WAVE METRIC ON CNN 91

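2. Metrics and their properties. A function $D : S \times S \to \Re^+$ is a metric on a nonempty set $S$ if for all $A, B, C \in S$ we have

$$\begin{aligned} & D_{AB} = D(A, B) \ge 0, \quad D_{AB}\big|_{A = B} = 0 \\ & D_{AB} = D_{BA} \\ & D_{AB} + D_{BC} \ge D_{AC} \end{aligned} \tag{3}$$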

A measure is defined as distance if it fulfils the first two postulates (positiveness and symmetry) and it is metric if in addition the third, triangle inequality also holds. Several distance functions can be defined on images, see e.g. [10].

2.1. Hamming distance on binary images. The most obvious criterion of the degree of coincidence of point sets is a measure of symmetrical difference (number of different points). This is a natural choice and it is the well-known Hamming distance (HD) that is the result of a pixel-wise XOR operation on binary images. Formally, the Hamming distance can be defined as

$$HD(A, B) = \big|\{(A \cup B) \setminus (A \cap B)\}\big| \tag{4}$$

The Hamming distance is context free in the sense that each pixel is as important as any other pixel, independently of the position of the pixel. It is obvious that Hamming distance is sensitive to object shift and noise. Another problem is that Hamming distance cannot take into account the shape information because it measures only the area differences.
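The pixel-wise XOR formulation of (4) translates directly into code. The following is a minimal sketch that assumes images are arrays with +1 for object pixels and -1 for background, as in the binary image definition; the function name is illustrative.

```python
import numpy as np


def hamming_distance(a, b):
    """Number of pixels that belong to exactly one of the two binary objects."""
    a_obj = np.asarray(a) > 0
    b_obj = np.asarray(b) > 0
    return int(np.count_nonzero(a_obj ^ b_obj))
```

Because every differing pixel counts equally, shifting an object by one pixel can change the result as much as a genuine change in shape, which is the sensitivity noted above.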

2.2. Hausdorff distance. Another often-used distance is the Hausdorff distance. Given two finite point sets A and B, the Hausdorff distance is defined as

$$HsD(A, B) = \max\big(h(A, B), h(B, A)\big) \tag{5}$$

where

$$h(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\| \tag{6}$$

This measure defines a distance function that is a metric over the set of all closed, bounded sets. The function $h(A,B)$ is called the directed Hausdorff distance from A to B. It identifies the point $a \in A$ that is farthest from any point of B and measures the distance from $a$ to its nearest neighbor in B using the given norm $\|\cdot\|$. In effect, $h(A,B)$ ranks each point of A based on its distance to the nearest point of B and then uses the largest ranked such point as the distance (the most mismatched point of A). Note that in general $h(A,B)$ and $h(B,A)$ can attain very different values (the directed distances are not symmetric). The Hausdorff distance $HsD(A,B)$ is the maximum of $h(A,B)$ and $h(B,A)$. Thus, it measures the degree of mismatch between two sets by measuring the distance of the point of A that is farthest from any point of B and vice versa. Intuitively, if $HsD(A,B) = d$, then each point of A must be within distance $d$ of some point of B and vice versa.
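The directed and symmetric Hausdorff distances of (5) and (6) can be sketched for small finite point sets as follows. The brute-force pairwise computation, the Euclidean norm and the function names are choices made here for illustration.

```python
import numpy as np


def directed_hausdorff(a_points, b_points):
    """h(A, B): largest distance from a point of A to its nearest point of B."""
    a = np.asarray(a_points, dtype=float)
    b = np.asarray(b_points, dtype=float)
    diff = a[:, None, :] - b[None, :, :]          # all pairwise differences
    dists = np.sqrt((diff ** 2).sum(axis=2))      # Euclidean norm
    return float(dists.min(axis=1).max())


def hausdorff_distance(a_points, b_points):
    """HsD(A, B): maximum of the two directed distances."""
    return max(directed_hausdorff(a_points, b_points),
               directed_hausdorff(b_points, a_points))
```

For A = {(0,0), (1,0)} and B = {(0,0), (4,0)} the directed distances are 1 and 3, which illustrates the asymmetry mentioned above.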

2.3. Constraints of commonly used metrics. We demonstrate the limitations of the Hamming and Hausdorff metrics through a simple example that can be treated analytically. The example can be given on both binary and gray-scale images; here the binary case is discussed. Let us determine the Hamming and Hausdorff distances of a Gaussian probability density function compared to the horizontal x axis. The Hamming distance (area difference) equals the integral of the Gaussian function, while the Hausdorff distance is the maximum of the Gaussian function. The examined function is given as

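$$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}} \tag{7}$$

The Hamming distance is

$$HD = \int_{-\infty}^{+\infty} f(x)\,dx = \Phi(-\infty, +\infty) = 1 \tag{8}$$

The Hausdorff distance is

$$HsD = \max\big(f(x)\big) = \varphi(0) = \frac{1}{\sigma\sqrt{2\pi}} \tag{9}$$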

The two discussed metrics are examined as a function of parameters of the Gaussian function. Three cases are examined varying a c (c > 0) parameter, see TABLE 1.

As we can see, in the first case both distances vary linearly with the parameter c. But the last two cases show that either the Hamming or the Hausdorff metric is constant, i.e. it cannot distinguish the different cases. This behavior was also tested and proved on images. The Hamming distance cannot separate objects if they have equal area difference (the integral of the pdf). The Hausdorff distance cannot distinguish objects having equal distances (the maximum of the pdf) to a reference. This has led us to define a new metric based on propagating waves and recording the evolution of wave dynamics.

TABLE 1
Hamming and Hausdorff distances of Gaussian pdf

f(x, c)               f(x) = c·φ(x)         f(x) = φ(x)|σ=c       f(x) = φ(c·x)
Hamming distance      HD = c                HD = 1                HD = 1/c
Hausdorff distance    HsD = c/(σ√(2π))      HsD = 1/(c√(2π))      HsD = 1/(σ√(2π))
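The entries of TABLE 1 can be checked numerically. The sketch below takes (7) to be the usual Gaussian pdf and evaluates the area and the maximum of the three test functions. The values of $\sigma$ and $c$, the integration range and the helper names are arbitrary choices made here.

```python
import numpy as np


def gaussian(x, sigma):
    """Gaussian pdf with standard deviation sigma."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))


def area_and_peak(f, half_width=80.0, samples=400001):
    """Riemann-sum integral (Hamming distance) and maximum (Hausdorff distance)."""
    x = np.linspace(-half_width, half_width, samples)
    values = f(x)
    step = x[1] - x[0]
    return float(values.sum() * step), float(values.max())


sigma, c = 2.0, 3.0   # illustrative values, not taken from the paper
cases = {
    "c*phi(x)": lambda x: c * gaussian(x, sigma),
    "phi(x), sigma=c": lambda x: gaussian(x, c),
    "phi(c*x)": lambda x: gaussian(c * x, sigma),
}
results = {name: area_and_peak(f) for name, f in cases.items()}
```

In the second case the area stays at 1 while the peak changes, and in the third case the peak stays fixed while the area changes, so each metric is blind to one of the two variations.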

3. The nonlinear wave metric. The nonlinear wave mapping based metric computation consists of a three-step transformation.

$$d_w(A, B) = f_d\big(f_w\big(f_w(A, B)\big)\big) \tag{10}$$

where the innermost $f_w(\cdot)$ is a spatio-temporal mapping of images A and B that generates a wave map, the intermediate function computes specific distributions of the wave map, and $f_d(\cdot)$ identifies the final distance via a real function projection. We define a dynamical system which is able to generate and propagate waves, and we assign pixel intensity values to the state values of this system. Let waves start from the intersection of the objects and propagate until all positions are triggered in the union of the objects (only those positions which form a closed set with the intersection). The time required to trigger a given position is recorded in a special wave map. This wave map can be used to define several metrics, and it contains the Hamming and Hausdorff metrics as special cases. The wave map $W_{AB}$ is defined by the function

$$\left.\begin{aligned} f_w &: A \cap B \to A \cup B \\ W_{AB}(i, j) &= t_{A \cap B \to A \cup B}(i, j) \end{aligned}\right\} \tag{11}$$

Here, we define wave based distance calculation as

$$WD(A, B) = \int_{A \cap B}^{A \cup B} W_{AB}(u)\,du \tag{12}$$

We can write

$$WD(A, B) = \int_{A \cap B}^{A \cup B} W_{AB}(u)\,du = \int_{A \cap B}^{A \cup B} HsD(u) \cdot HD(u)\,du = \int_{A \cap B}^{A \cup B} HsD(u)\,du \tag{13}$$

because $HD(u) = 1$. Note that $HsD(u)$ can be interpreted as a local Hausdorff distance, so (12) defines a distance calculation by integrating local Hausdorff distances.
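The idea of recording arrival times and summing them can be illustrated with a purely discrete stand-in for the wave. The sketch below replaces the continuous CNN dynamics with a breadth-first propagation over 8-connected pixels, so the arrival "time" is a step count rather than a physical time. This substitution, the connectivity and the function names are assumptions made for illustration only.

```python
from collections import deque

import numpy as np


def wave_map(a, b):
    """Arrival step of a front started on the intersection and confined to the union."""
    a_obj = np.asarray(a) > 0
    b_obj = np.asarray(b) > 0
    inter, union = a_obj & b_obj, a_obj | b_obj
    arrival = np.full(a_obj.shape, -1, dtype=int)   # -1 marks "never triggered"
    queue = deque()
    for i, j in zip(*np.nonzero(inter)):
        arrival[i, j] = 0
        queue.append((i, j))
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                inside = 0 <= ni < union.shape[0] and 0 <= nj < union.shape[1]
                if inside and union[ni, nj] and arrival[ni, nj] < 0:
                    arrival[ni, nj] = arrival[i, j] + 1
                    queue.append((ni, nj))
    return arrival


def wave_distance(a, b):
    """Discrete analogue of (12): sum of arrival steps over the triggered positions."""
    arrival = wave_map(a, b)
    return int(arrival[arrival > 0].sum())
```

In this discrete picture the number of positions with a positive arrival step plays the role of the Hamming distance and the largest arrival step plays the role of a Hausdorff-like distance, while the sum combines both kinds of information.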

THEOREM 1. The wave mapping based distance measure (Integrated Hausdorff) of binary objects satisfies the metric axioms.

Proof of THEOREM 1: The first and second axioms hold trivially.

(1) $WD(A, B) = 0$ iff $A = B$. If $A \ne B$, then $HsD$ produces a nonzero value and therefore $WD(A, B) \ne 0$.

(2) $WD(A, B) = WD(B, A)$, because $A \cap B$ and $A \cup B$ are the same in both cases.

(3) Finally the triangle inequality: $WD(A, C) \le WD(A, B) + WD(B, C)$. It will be proved for the initial set $U_{A \cap B \cap C}$. Hamming and Hausdorff distances are metrics. The Hausdorff distance is monotonically increasing on the area of the Hamming distance: $HD_2 > HD_1 \Rightarrow HsD_2 > HsD_1$. If we denote the Hausdorff distance on this area as a function $f_{HsD}(HD)$, then we can write

$$\begin{aligned} WD(A, C) &= \lim_{HD(A,C') \to HD(A,C)} \int_{HD(A,C')} f_{HsD}(HD)\,dHD \\ &\le \lim_{HD(A,B') \to HD(A,B)} \int_{HD(A,B')} f_{HsD}(HD)\,dHD + \lim_{HD(B,C') \to HD(B,C)} \int_{HD(B,C')} f_{HsD}(HD)\,dHD \\ &= WD(A, B) + WD(B, C) \end{aligned} \tag{14}$$

The monotonically increasing property ensures that $WD(A, C) \le WD(A, B) + WD(B, C)$ holds. Note that this system is capable of simultaneous comparison of several object pairs.

4. CNN implementation and experimental results.

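4.1. Binary objects. Using (1), the following template and control functions implement wave propagation and wave map generation. Wave propagation takes place on the first layer, while the wave map is generated on the second layer.

$$A_1 = \begin{bmatrix} 0.41 & 0.59 & 0.41 \\ 0.59 & 2 & 0.59 \\ 0.41 & 0.59 & 0.41 \end{bmatrix}, \quad z_1 = 4.5, \quad f_1(\cdot) = 0 \tag{15}$$

$$A_2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad z_2 = 0, \quad f_2(\cdot) = \begin{cases} w, & u_1 - y_1 > 0 \\ 0, & u_1 - y_1 \le 0 \end{cases}, \quad w > 0 \tag{16}$$

with initial conditions

$$\left.\begin{aligned} x_1[0] &= A \cap B, \quad u_1 = A \cup B \\ x_2[0] &= 0 \end{aligned}\right\} \tag{17}$$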

These templates were derived from a discrete mapping of the spatial diffusion operator, with the condition that even a single active (+1) pixel should trigger wave propagation. If we want to restrict wave propagation to the union of objects only, then we need an additional fixed state control to stop propagation. Nevertheless, this does not change the generated wave map. This fixed state control was used in [11].
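The two-layer dynamics (1) with the binary templates (15)-(17) can be explored with a simple software simulation. The following is a rough sketch under several assumptions that are not part of the paper: forward Euler integration, the step size, the value of w, cells beyond the border held at -1, and no fixed state control, so the wave also spreads outside the union. The output nonlinearity is written as a clip, which is the same saturation function evaluated exactly at the limits.

```python
import numpy as np

A1 = np.array([[0.41, 0.59, 0.41],
               [0.59, 2.00, 0.59],
               [0.41, 0.59, 0.41]])   # feedback template of the first layer
Z1 = 4.5                              # bias of the first layer


def output(x):
    """Piece-wise linear saturation between -1 and +1."""
    return np.clip(x, -1.0, 1.0)


def feedback(y, template):
    """Weighted 3x3 neighborhood sum; cells beyond the border are held at -1."""
    padded = np.pad(y, 1, mode="constant", constant_values=-1.0)
    rows, cols = y.shape
    total = np.zeros_like(y)
    for k in range(3):
        for l in range(3):
            total += template[k, l] * padded[k:k + rows, l:l + cols]
    return total


def cnn_wave_map(a, b, w=0.01, dt=0.05, steps=600):
    """Second-layer state after the wave has passed: roughly w times the trigger time."""
    a_obj = np.asarray(a) > 0
    b_obj = np.asarray(b) > 0
    x1 = np.where(a_obj & b_obj, 1.0, -1.0)   # first layer starts from the intersection
    u1 = np.where(a_obj | b_obj, 1.0, -1.0)   # input of the first layer is the union
    x2 = np.zeros_like(x1)
    for _ in range(steps):
        y1, y2 = output(x1), output(x2)
        f2 = np.where(u1 - y1 > 0, w, 0.0)             # control function of layer 2
        x1 = x1 + dt * (-x1 + feedback(y1, A1) + Z1)   # layer 1 with f1 = 0
        x2 = x2 + dt * (-x2 + y2 + f2)                 # layer 2 with a unit centre template
    return x2
```

The value of w has to be small enough that the second layer stays in its linear region during the run, otherwise the recorded values saturate.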

4.2. Gray-scale objects. Templates for wave generation and mapping are

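$$A_2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad z_2 = 0, \quad f_2(\cdot) = \begin{cases} w, & u_1 - y_1 > 0 \\ 0, & u_1 - y_1 \le 0 \end{cases}, \quad w > 0 \tag{19}$$

with initial conditions

$$\left.\begin{aligned} x_1[0] &= \min(A, B), \quad u_1 = \max(A, B) \\ x_2[0] &= 0 \end{aligned}\right\} \tag{20}$$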

Here, there is no need for a fixed control map: the control function $f_1(\cdot)$ restricts wave propagation. A simpler variant of the proposed metric on binary images, with an iterative and a dynamical CNN implementation, was presented in [11]. Experimental results were also given there.


5. Conclusion and potential applications. In this paper, we have shown that spatio-temporal processes offer a qualitatively novel way of exploring objects and, in addition, provide a tool for quantitative measurements. This approach can be used for both binary and gray-scale object comparison, and several object pairs can be measured simultaneously. A concrete application case study was presented in [12]. The wave metric was applied to the bubble debris separation problem, where a huge number of objects had to be classified in a very short time. Another possible application could be the roughness measurement of surfaces. Deformation can be projected onto a reference model, and wave based evaluation could give a very detailed comparison. This wave mapping and recording of the time evolution of dynamical processes make it possible to measure any type of dynamics; for example, it can be used for optical flow estimation.

REFERENCES

[1] L. O. Chua, and L. Yang, Cellular Neural Networks: Theory, IEEE Trans. on Circuits and Systems, 35 (1988), 1257-1272.

[2] L. O. Chua, and L. Yang, Cellular Neural Networks: Applications, IEEE Trans. on Circuits and Systems, 35 (1988), 1273-1290.

[3] L. O. Chua, and T. Roska, The CNN Paradigm, IEEE Trans. on Circuits and Systems, 40 (1993), 147-156.

[4] T. Roska, and L. O. Chua, The CNN Universal Machine: an Analogic Array Computer, IEEE Trans. on Circuits and Systems, 40 (1993), 163-173.

[5] T. Roska, and L. O. Chua, Computer-Sensors: Spatial-Temporal Computers for Analog Array Signals, Dynamically Integrated with Sensors, Journal of VLSI Signal Processing Systems, 23 (1999), 221-238.

[6] V. Krinsky, ed., Self-Organization. Autowaves and Structures Far from Equilibrium, Synergetics, 28, Springer, Berlin, 1984.

[7] H. Haken, Synergetics: From Pattern Formation to Pattern Analysis and Pattern Recognition, International Journal of Bifurcation and Chaos, 4, No. 5, (1994), 1069-1083.

[8] V. I. Krinsky, V. N. Biktashev, and I. R. Efimov, Autowave Principles for Parallel Image Processing, Physica D49 (1991), 247-253.

[9] V. N. Biktashev, V. I. Krinsky, and H. Haken, A Wave Approach to Pattern Recognition (with Application to Optical Character Recognition), International Journal of Bifurcation and Chaos, 4, No. 1, (1994), 193-207.

[10] A. Rosenfeld, and J. Pfaltz, Distance Functions in Digital Pictures, Pattern Recognition, 1 (1968), 33-61.

[11] I. Szatmari, Cs. Rekeczky, and T. Roska, A Nonlinear Wave Metric and its CNN Implementation for Object Classification, Journal of VLSI Signal Processing, Special Issue: Spatiotemporal Signal Processing with Analogic CNN Visual Microprocessors, 23, No 2/3, November-December (1999), 437-447.

[12] I. Szatmari, A. Schultz, Cs. Rekeczky, T. Kozek, T. Roska, and L. O. Chua, Morphology and Autowave Metric on CNN Applied to Bubble-Debris Classification, IEEE Transactions on Neural Networks, 11, No. 6, November (2000), 1385-1393.


VOLUME 13 2006, NO 1 PP. 99- 106

SPATIO-TEMPORAL PHENOMENA IN TWO-DIMENSIONAL CELLULAR NONLINEAR NETWORKS BASED ON SECOND ORDER CELLS

VALERI MLADENOV'

Abstract. Two-dimensional Cellular Nonlinear/Neural Networks (CNN's) based on second order cells coupled by linear resistors are considered in the paper. The building cells are based on piece-wise linear resistors that were proposed in the author's previous paper. In this contribution a new piece-wise linear resistor model is also utilized. The existence of the most commonly observed spatio-temporal phenomena, such as solitary and periodic waves, and their dynamics are presented. The model considered allows the dynamics of the periodic waves to be studied with the Tsypkin method of feedback systems.

1. Introduction. Spatio-temporal phenomena in electronic systems have been considered by many authors. Special attention is paid to wave propagation [1-6]. This process exists in systems of coupled excitable cells, and such systems can be described by a so-called reaction-diffusion mechanism [7, 8]. This mechanism plays an important role in neurophysiology and cardiophysiology, where wave propagation phenomena are of special interest. It has been proved [7] that propagation failure cannot be observed in a continuous one-variable homogeneous reaction-diffusion system. For this reason discrete models like Cellular Nonlinear/Neural Networks (CNN's) are used in studying these phenomena [9]. Cellular Neural/Nonlinear Networks are dynamic nonlinear circuits having mainly locally recurrent circuit topology, i.e. a local interconnection of simple circuits called cells. Each CNN is defined mathematically by its cell dynamics and synaptic law, which specifies each cell's interaction with its neighbors.

Several circuit realizations based on Chua's circuits, and their dynamics, have been investigated in [1-6]. The circuit realization consisting of a chain of identical Chua's circuits (or degenerated ones) can be viewed as a one-dimensional CNN, where each cell is represented by a Chua circuit. In [10] the authors propose autonomous CNN's as a universal and convenient substrate for modeling these phenomena.

• Department Theory of Electrical Engineering, Faculty of Automatics, Technical University of Sofia, Sofia, Bulgaria

The simplest circuit realization is presented in [11, 12]. The equations describing the studied system have properties similar to those of the Nagumo equation. The author shows that for the first order nonlinear cell both wave propagation and its failure are possible. He analyzes the reason why wave propagation failure can occur and determines analytically the critical value of the coupling resistor. In [13] the wave propagation in this system is applied to data processing. However, this realization does not exhibit recovery. For a model of nerve conduction to be realistic there must be a mechanism to return to the zero initial state, so that the nerve may be excited again by the next stimulus. In order to describe the recovery process, an extension of the cell given in [11-13] is proposed in [14]. The corresponding model is based on second order cells and can be viewed as a discrete version of the well-known FitzHugh-Nagumo equation. The recovery process allows for solitary waves in this CNN. Based on a new presentation of the piece-wise linear resistor, an extension of these results is presented in [17], where the dynamics of the periodic waves of a one-dimensional CNN have been studied using Tsypkin's method of feedback systems. In this paper we extend the results of [17] and show the existence of periodic waves in a two-dimensional CNN model based on second order cells.

The paper is organized as follows. In the next section we present the two-dimensional CNN based on second order cells. In Section 3 the solitary traveling pulses in the considered network are investigated. Periodic waves in the two-dimensional CNN are considered in Section 4. Conclusions are given in Section 5.

2. Two-dimensional CNN. We consider the two-dimensional Cellular Nonlinear/Neural Network (CNN) given in Figure 1. It consists of M x N second order resistively coupled nonlinear cells shown in Figure 2.

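The model is described by the following normalized system of equations

$$\begin{aligned} \frac{du_{ij}}{dt} &= d\,\big(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{ij}\big) + f(u_{ij}) - w_{ij} \\ \frac{dw_{ij}}{dt} &= \varepsilon\,\big(u_{ij} - b\,w_{ij}\big) \end{aligned} \tag{1}$$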

where $f(u_{ij})$ is given by

$$f(u_{ij}) = \begin{cases} -0.25\,u_{ij}, & u_{ij} < 0.2 \\ 0.25\,u_{ij} - 0.1, & 0.2 \le u_{ij} \le 0.8 \\ -0.5\,u_{ij} + 0.5, & u_{ij} > 0.8 \end{cases} \tag{2}$$

Figure 1. Two-dimensional CNN based on first order piecewise nonlinear resistively coupled cells

$d = 1/R$ determines the coupling between the cells, $\varphi(u_k)$ is a piece-wise linear function such that the resulting nonlinear function $f(u_k)$ in (1) has the form (2), $C = 1$, and the component values of $R_1$, $R_2$ and $L$ are appropriately chosen.
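The piece-wise linear characteristic (2) is easy to inspect numerically. The sketch below implements it as a plain function; the function name is chosen here for illustration.

```python
def f_pwl(u):
    """Piece-wise linear cell nonlinearity with breakpoints at 0.2 and 0.8."""
    if u < 0.2:
        return -0.25 * u
    if u <= 0.8:
        return 0.25 * u - 0.1
    return -0.5 * u + 0.5
```

The three segments join continuously at the breakpoints, and the function vanishes at u = 0, u = 0.4 and u = 1.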

Figure 2. Basic cell of suggested CNN

Normally $\varepsilon \to 0$ and $w_{ij}$, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$, become slow variables with respect to $u_{ij}$, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$. The considered model can be viewed as a discrete version of the FitzHugh-Nagumo equation [6], [7], [9], [10], [14]. Several circuit realizations of the above CNN model are given in [14]. Here we utilize the following piece-wise linear function [17]

(3)

where $0 < a < 1$. The corresponding piece-wise linear resistor $\varphi(u_k)$ can be modeled by the schematics given in [17].


3. Solitary waves and propagation. To show that wave propagation is possible we will use the approach from [10-12] and extend it to the two-dimensional case. We will look for a "traveling wave" solution of system (1) in the form

$$u_{ij} = u(\xi), \quad \xi = t - ih - jh,$$

where $h > 0$ is a parameter ($1/h$ represents the velocity of the wave front) and $\xi$ is the so-called "moving coordinate".

Substituting the above into (1), we obtain the following equations

$$\begin{aligned} \frac{du}{d\xi} &= 2d\,[u(\xi - h) - 2u(\xi) + u(\xi + h)] + f(u(\xi)) - w(\xi) \\ \frac{dw}{d\xi} &= \varepsilon\,[u(\xi) - b\,w(\xi)] \end{aligned}$$

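If we assume that $h$ is sufficiently small, the difference term in the above system can be replaced approximately by $h^2$ times the second derivative of $u$ (with respect to the moving coordinate), and we get

$$\frac{du}{d\xi} = 2dh^2\,\frac{d^2u}{d\xi^2} + f(u) - w, \qquad \frac{dw}{d\xi} = \varepsilon\,(u - b\,w).$$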

This system can be written in the form

(4) $$\dot{u} = v, \qquad \dot{v} = \frac{-f(u) + v + w}{2dh^2}, \qquad \dot{w} = \varepsilon\,(u - b\,w),$$

where the dot denotes differentiation with respect to $\xi$. Solitary waves of the model considered correspond to non-constant solutions of (4) which satisfy the condition

$$\lim_{|\xi| \to \infty} (u(\xi), v(\xi), w(\xi)) = 0.$$

The above condition is satisfied by the homoclinic orbits ([5], [6], [14], [15]) of system (4). The existence of such orbits and of the corresponding solitary pulses of the proposed CNN has been studied in [14] for a CNN model based on the piece-wise linear characteristic given by (2). The recovering process (solitary wave) restores the zero initial state of the system.
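The small-h step used in this section can be checked numerically. The sketch below is only an illustration: it compares the second difference u(ξ - h) - 2u(ξ) + u(ξ + h) with h² u''(ξ) for a smooth test profile (tanh, an assumption made here, not a solution of the model) and also writes out the right-hand side of system (4). The names `second_difference` and `rhs_system4` are hypothetical.

```python
import numpy as np

def second_difference(u, xi, h):
    """Difference term u(xi - h) - 2 u(xi) + u(xi + h) of the traveling-wave system."""
    return u(xi - h) - 2.0 * u(xi) + u(xi + h)

def rhs_system4(state, f, d, h, eps, b):
    """Right-hand side of system (4); returns (du, dv, dw) with respect to xi."""
    u, v, w = state
    K = 2.0 * d * h**2
    return np.array([v, (-f(u) + v + w) / K, eps * (u - b * w)])

# Smooth test profile and its exact second derivative (illustrative assumption).
profile = np.tanh
profile_xx = lambda x: -2.0 * np.tanh(x) / np.cosh(x) ** 2

xi = 0.3
for h in (0.1, 0.05, 0.025):
    err = abs(second_difference(profile, xi, h) - h**2 * profile_xx(xi))
    print(f"h = {h:<6} approximation error = {err:.3e}")
```

The error shrinks roughly like h⁴, which is why replacing the difference term by the second derivative is reasonable for sufficiently small h.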

4. Periodic waves in the two-dimensional CNN. In this section, based on the piece-wise linear function (3), we study the limit cycles of (4). As already mentioned, a limit cycle results in periodic wave trains.


Without loss of generality we consider $b = 0$. Eliminating the variables $v$ and $w$ in (4) leads to the third order differential equation

$$-K\,\frac{d^3u}{d\xi^3} + \frac{d^2u}{d\xi^2} + \varepsilon\,u = \frac{df(u)}{d\xi},$$

where $K = 2dh^2$. Taking into account (3) we obtain

(5)

where $\chi = 0.5\,[1 + \mathrm{sgn}(u - a)]$. By considering the Laplace transform from $\xi$ to the complex variable $s$, (5) can be clearly separated into a linear and a nonlinear part. The former is dynamic and can be represented by its transfer function $W(s)$ given by

(6) $$W(s) = \frac{U(s)}{X(s)} = \frac{s}{K s^3 - s^2 - s - \varepsilon},$$

while the latter is simply described by the static nonlinearity (relay). The two parts are connected in feedback as shown in Figure 3.

This class of systems has a certain interest in control engineering for its importance in practical applications and gives an opportunity for a special type of analysis [16]. In particular, when a limit cycle occurs, the relay output signal necessarily becomes a periodic square wave of fixed amplitude whose switching frequency is unknown. For determining this frequency we can derive the steady-state output of the linear system W ( s) to this periodic square wave and then match the corresponding switching times at the input of the relay. The solution of this problem can be found in the frequency domain. The idea was originally proposed by Tsypkin [16].

Figure 3. Relay feedback system

The relay output waveform, being the input of the linear part $W(s)$, is written as a Fourier series, which allows one to derive the corresponding Fourier series of the output $u(\xi)$. In a similar way the series of $du/d\xi$ is obtained. The corresponding conditions are


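(7) $$\sum_{n=1}^{\infty} \frac{V_W(n\omega)}{n} = \mathrm{Im}\{A_W(0, \omega)\} = 0$$

(8) $$\sum_{n=1}^{\infty} U_W(n\omega) = \mathrm{Re}\{A_W(0, \omega)\} < 0$$

where

$$U_W(n\omega) = \mathrm{Re}\{W(jn\omega)\}, \qquad V_W(n\omega) = \mathrm{Im}\{W(jn\omega)\}$$

and the complex function $A_W(\theta, \omega)$ is defined as

$$\mathrm{Re}\{A_W(\theta, \omega)\} = \sum_{n=1}^{\infty} [V_W(n\omega)\sin(n\theta) + U_W(n\omega)\cos(n\theta)],$$

$$\mathrm{Im}\{A_W(\theta, \omega)\} = \sum_{n=1}^{\infty} \frac{1}{n}\,[V_W(n\omega)\cos(n\theta) + U_W(n\omega)\sin(n\theta)].$$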

The equality constraint (7) is a nonlinear equation to be solved in $\omega$ for determining the period of the limit cycle of the system. The solution can be found numerically or graphically. The inequality condition (8) has only to be checked. The obtained limit cycle results in a periodic wave for the corresponding two-dimensional CNN model.
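The numerical route mentioned above can be sketched as follows. This is a minimal illustration, not the author's procedure: it truncates the sums in (7) and (8) for the transfer function (6), scans a grid of ω values, and reports where the truncated sum in (7) changes sign. The values of `K` and `eps`, the grid, and the truncation length `n_terms` are assumptions chosen only to exercise the code.

```python
import numpy as np

def W(s, K, eps):
    """Transfer function of the linear part, equation (6)."""
    return s / (K * s**3 - s**2 - s - eps)

def tsypkin_sums(omega, K, eps, n_terms=2000):
    """Truncated sums on the left-hand sides of conditions (7) and (8)."""
    n = np.arange(1, n_terms + 1)
    Wn = W(1j * n * omega, K, eps)
    im_A = np.sum(Wn.imag / n)   # Im{A_W(0, omega)}, condition (7)
    re_A = np.sum(Wn.real)       # Re{A_W(0, omega)}, condition (8)
    return im_A, re_A

# Hypothetical parameter values, not taken from the paper.
K, eps = 0.4, 0.01
omegas = np.linspace(0.05, 5.0, 400)
im_vals = np.array([tsypkin_sums(w, K, eps)[0] for w in omegas])

# Sign changes of the sum in (7) bracket candidate frequencies;
# each candidate must still satisfy the inequality (8).
change = np.sign(im_vals[:-1]) != np.sign(im_vals[1:])
for w in omegas[:-1][change]:
    print("candidate omega near", w, "with Re{A_W} =", tsypkin_sums(w, K, eps)[1])
```

A bracketed candidate could then be refined with any scalar root finder; the loop prints nothing if no sign change occurs on the chosen grid.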

This result is similar to those given in [17]. The only difference is that in the two-dimensional case considered here the coefficient $K$ is $2dh^2$, while in the one-dimensional case [17] this coefficient is $dh^2$. This is due to the fact that here the travelling wave propagates in both directions and, starting from the cell (1, 1), reaches the cell $(M, N)$.

Simulation results for a 5x5 CNN with $d = 0.2$ are given in Figure 4. The initial condition for cell (1, 1) is $u(1, 1) = 1$ and the initial conditions for all other cells are zero, i.e. the periodic wave is initiated at cell (1, 1). The periodic solutions for the cells of the first, second, third, fourth and fifth rows of the two-dimensional CNN model are depicted in Figure 4.
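A simulation of this kind can be sketched in a few lines. The code below is a hedged illustration and not a reproduction of Figure 4. It assumes that the normalized system has the nearest-neighbour form implied by the traveling-wave reduction of Section 3, namely du_ij/dt = d(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_ij) + f(u_ij) - w_ij and dw_ij/dt = ε(u_ij - b w_ij). It uses the characteristic (2) for f, zero-flux boundaries, and forward Euler integration. The values of `eps`, `b`, `dt` and `steps` are assumptions.

```python
import numpy as np

def f(u):
    """Piece-wise linear characteristic (2)."""
    return np.where(u < 0.2, -0.25 * u,
                    np.where(u <= 0.8, 0.25 * u - 0.1, -0.5 * u + 0.5))

def simulate(M=5, N=5, d=0.2, eps=0.01, b=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the assumed M x N second order cell array."""
    u = np.zeros((M, N))
    w = np.zeros((M, N))
    u[0, 0] = 1.0                       # wave initiated at cell (1, 1)
    history = np.empty((steps + 1, M, N))
    history[0] = u
    for k in range(steps):
        p = np.pad(u, 1, mode="edge")   # zero-flux: copy edge values outward
        lap = (p[:-2, 1:-1] + p[2:, 1:-1] +
               p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u)
        du = d * lap + f(u) - w
        dw = eps * (u - b * w)
        u = u + dt * du
        w = w + dt * dw
        history[k + 1] = u
    return history

history = simulate(steps=2000)
print("u at cell (1, 1) after the run:", history[-1, 0, 0])
```

Plotting `history[:, i, j]` against time for the cells of one row gives one panel of the kind described for Figure 4, under the stated assumptions.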


Figure 4. Transient in 5x5 two-dimensional CNN with $d = 0.2$. Zero flux boundary conditions. Initial condition for $u(1, 1)$ at $t = 0$ is 1 and for all other cells is zero. Periodic solutions for the first, second, third, fourth and fifth row cells.

5. Conclusion. In this paper we study the dynamics of the solitary and periodic waves in two-dimensional autonomous Cellular Nonlinear/Neural Networks (CNN's). In particular, we consider the two-dimensional CNN made up of second order cells coupled to each other by linear resistors. Applying the proposed form of the piece-wise linear resistor it is possible to study the dynamics of the periodic waves based on the Tsypkin method of feedback systems.

REFERENCES

[1] V.Perez-Munuzuri, V.Perez-Villar, L.O.Chua: "Travelling wave front and its failure in a one-dimensional array of Chua's circuits", J. Circ. Comp., 3, No. 1, pp. 211-215, 1993.

[2] A.P.Munuzuri, V.Perez-Munuzuri, M.Gomez-Gesteria, L.O.Chua, V.Perez-Villar: "Spatiotemporal structures in discretely-coupled arrays of nonlinear circuits: a review", Int. Journal Bifurcation and Chaos, Vol. 5, No. 1, pp. 17-50, 1995.

[3] V.I.Nekorkin, V.B.Kazantsev, L.O.Chua: "Chaotic attractors and waves in a one-dimensional array of modified Chua's circuits", Int. Journal Bifurcation and Chaos, Vol. 6, No. 7, pp. 1295-1317, 1996.

[4] V.I.Nekorkin, V.B.Kazantsev, M.G.Velarde: "Travelling waves in a circular array of Chua's circuits", Int. Journal Bifurcation and Chaos, Vol. 6, No. 3, pp. 473-484, 1996.

[5] V.I.Nekorkin, V.B.Kazantsev, M.F.Rulkov, M.G.Velarde, L.O.Chua: "Homoclinic orbits and solitary waves in a one-dimensional array of Chua's circuits", IEEE Trans. Circuits and Systems, Part I, Vol. 42, No. 10, pp. 785-801, 1995.

[6] R.Genesio, M.Nitti, A.Torcini: "Analysis and Simulation of Waves in Reaction-Diffusion Systems", Proc. 37th IEEE Conf. on Decision and Control, Tampa, Florida, USA, pp. 2059-2064, Dec. 1998.

[7] J.P.Keener: "Propagation and its failure in coupled systems of discrete excitable cells", SIAM J. Appl. Math., Vol. 47, pp. 556-572, 1987.

[8] T.Roska, L.O.Chua, D.Wolf, T.Kozek, R.Tetzlaff, F.Puffer: "Simulating nonlinear waves and partial differential equations via CNN - Part I: Basic techniques", IEEE Trans. Circuits and Systems, Part I, Vol. 42, No. 10, pp. 807-815, 1995.

[9] G.Genesio, M.Nitti, A.Torcini: "Analysis and Simulation of Waves in Reaction-Diffusion Systems", Proc. of 37th IEEE Conf. on Decision and Control, Tampa, Florida, USA, pp. 2059-2064, 1998.

[10] L.O.Chua, M.Hasler, G.S.Moschytz, J.Neirynck: "Autonomous cellular neural networks: a unified paradigm for pattern formation and active wave propagation", IEEE Trans. Circuits and Systems, Part I, Vol. 42, No. 10, pp. 559-577, 1995.

[11] D.M.W.Leenaerts: "Wave propagation and its failure in piecewise linear Nagumo equations", Proc. European Conf. Circ. Theory and Design, Budapest, pp. 342-347, 1997.

[12] D.M.W.Leenaerts: "On traveling waves in a one-dimensional array of resistively coupled cells", Proc. Int. Symp. NOLTA'97, Honolulu, pp. 257-260, 1997.

[13] D.M.W.Leenaerts: "Data processing based on wave propagation", Int. Journal of Circuit Theory and Applications, 27, pp. 633-645, 1999.

[14] V.Mladenov, H.Hegt: "On Waves and Recovering in One-dimensional Autonomous CNN's", Proc. of 6th IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA 2000), Catania, Italy, May 23-25, pp. 21-26, 2000.

[15] Yu.A.Kuznetsov: "Elements of Applied Bifurcation Theory", Springer-Verlag, New York, Berlin, Heidelberg, 1995.

[16] D.P.Atherton: Nonlinear Control Engineering, van Nostrand Reinhold Company, London, 1982.

[17] V.M.Mladenov, J.A.Hegt, A.H.M.van Roermund: "On Solitary and Periodic Waves in One-dimensional FitzHugh-Nagumo CNN's", Proceedings of the 8th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA 2004, Budapest, Hungary, pp. 88-93, 2004.


VOLUME 13

2006, NO 1 PP. 107-113

HYSTERESIS IN CNN MODEL OF BACTERIA GROWTH

A.SLAVOVA*

Abstract. This paper deals with hysteresis in the CNN model of bacteria growth. The dynamics of the obtained CNN model is studied by using the describing function method. Periodic solutions are predicted and an example is given for the model of bacteria growth.

1. Introduction. Several physical phenomena exhibit hysteresis. In classical continuum mechanics, hysteresis behaviour is inherent in many constitutive laws. In systems and control applications, hysteresis regularly appears via mechanical play and friction, or in the form of a relay or thermostat, often deliberately built into the system. If the hysteretic behaviour is described using a hysteresis operator, then the mathematical model for the dynamical system consists of differential equations coupled with one or several hysteresis operators, complemented by initial and boundary conditions. The oscillator with hysteretic restoring force,

$$x''(t) + F[x](t) = f(t),$$

$F$ being a hysteresis operator, is a basic example.

The coupling of rate independent hysteretic nonlinearities with ordinary differential equations leads to interesting mathematical problems in the theory of nonlinear oscillations. Hysteretic constitutive laws in continuum mechanics formulated in terms of hysteresis operators lead in a natural way to partial differential equations coupled with hysteresis operators, where the former represent the balance laws for mass, momentum and internal energy.

* Institute of Mathematics and Informatics of Bulgarian Academy of Sciences, Sofia 1113, Bulgaria, e-mail: [email protected]

There are two types of hysteresis relations: 1) relay hysteresis, and 2) active hysteresis. In relay hysteresis, the graph $(u, v)$ with output $v(t) = F[u](t)$ moves, for a given continuous piecewise monotone input $u(t)$, on one of two fixed output curves $h_U(u)$, $h_L(u)$ defined, respectively, on $[\alpha, \infty)$, $(-\infty, \beta]$, $\alpha < \beta$, depending on which threshold, $\alpha$ or $\beta$, was last attained. It is known [5] that $h_U$ ($h_L$) is asymptotically constant because of saturation as $u \to +\infty$ ($-\infty$), and $h_U$, $h_L$ need not meet. In relay hysteresis the memory-based relation can be presented by the formula:

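(1) $$F[u](t) = \begin{cases} h_L(u(t)), & u(t) \le \alpha; \\ h_U(u(t)), & u(t) \ge \beta; \\ h_L(u(t)), & u(t) \in (\alpha, \beta),\ u(\tau(t)) = \alpha; \\ h_U(u(t)), & u(t) \in (\alpha, \beta),\ u(\tau(t)) = \beta; \end{cases}$$

where $\tau(t) = \sup\{s \mid s \le t,\ u(s) = \alpha \text{ or } u(s) = \beta\}$. Note that $\tau(t)$ is defined for any continuous input $u(\cdot)$, therefore the domain of $F$ can be taken as $C[0, \infty)$.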
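The memory in formula (1) can be illustrated on a sampled input. The following minimal sketch is an illustration only: it tracks which threshold was last attained and selects the lower or upper output curve accordingly. The function name `relay_hysteresis`, the `start_on_upper` flag (needed when the input starts strictly between the thresholds, where (1) gives no rule), and the constant curves in the example are assumptions made here.

```python
def relay_hysteresis(u_samples, alpha, beta, h_L, h_U, start_on_upper=False):
    """Apply the relay hysteresis rule (1) to a sequence of input samples."""
    if not alpha < beta:
        raise ValueError("thresholds must satisfy alpha < beta")
    on_upper = start_on_upper
    outputs = []
    for u in u_samples:
        if u <= alpha:
            on_upper = False      # threshold alpha attained: lower curve
        elif u >= beta:
            on_upper = True       # threshold beta attained: upper curve
        # inside (alpha, beta) the branch chosen at the last threshold is kept
        outputs.append(h_U(u) if on_upper else h_L(u))
    return outputs

# Example with constant curves: the same input value 0.5 gives different
# outputs depending on which threshold was reached last.
inputs = [-0.5, 0.5, 1.2, 0.5, -0.1, 0.5]
print(relay_hysteresis(inputs, 0.0, 1.0, lambda u: 0.0, lambda u: 1.0))
# prints [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
```

The sampled version only approximates the continuous-time operator: a threshold crossing between two samples is detected at the next sample.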

Fig. 1. Relay hysteresis, panels (a) and (b): output $v$ versus input $u$, with thresholds $\alpha$, $\beta$ and upper curve $h_U(u)$.

Active hysteresis allows trajectories inside the hysteresis region

$$\mathcal{H} = \{(u, v) \mid \alpha < u < \beta,\ h_L(u) < v < h_U(u)\}.$$


The mathematical models for the two types of hysteresis defined above are quite different, even though their memory-based behaviour is similar. Both types have been described for $u(t)$ continuous piecewise monotone, but active hysteresis is easily extended to continuous inputs by using approximations and a limit process [5]. Relay hysteresis $F: u \to F[u]$ is inherently discontinuous as a map between function spaces, since an input function that just reaches a threshold and one that reverses just short of the threshold differ by an arbitrarily small amount, yet they yield different outputs.

Cellular Nonlinear/Neural Networks (CNNs) are complex nonlinear dynamical systems. A CNN [1] is simply an analogue dynamic processor array made of cells, which contain linear capacitors, linear resistors, and linear and nonlinear controlled sources. Let us consider a two-dimensional grid with a 3 x 3 neighborhood system, as shown in Fig. 2.

Fig. 2. Two-dimensional 3 x 3 CNN with cells labelled (1, 1) to (3, 3).

The squares are the circuit units (cells), and the links between the cells indicate that there are interactions between linked cells. One of the key features of a CNN is that the individual cells are nonlinear dynamical systems, but the coupling between them is linear. Roughly speaking, one could say that these arrays are nonlinear but have a linear spatial structure, which makes it attractive to investigate them with techniques common in engineering or physics.

We will give the general definition of a CNN which follows the original one [1]:

DEFINITION 1. The CNN is a
a) 2-, 3-, or n-dimensional array of
b) mainly identical dynamical systems, called cells, which satisfies two properties:
c) most interactions are local within a finite radius r, and
d) all state variables are continuous valued signals.

DEFINITION 2. An M x M cellular neural network is defined mathematically by four specifications:
1) CNN cell dynamics;
2) CNN synaptic law which represents the interactions (spatial coupling) within the neighbor cells;
3) Boundary conditions;
4) Initial conditions.

Now in terms of Definition 2 we can present the dynamical systems describing CNNs. For a general CNN whose cells are made of time-invariant circuit elements, each cell $C(ij)$ is characterized by its CNN cell dynamics:

(2)

where $x_{ij} \in R^m$, $u_{ij}$ is usually a scalar. In most cases, the interactions (spatial coupling) with the neighbor cell $C(i + k, j + l)$ are specified by a CNN synaptic law:

(3) $$I_{ij} = A_{ij,kl}\,x_{i+k,j+l} + A_{ij,kl} * f_{kl}(x_{ij}, x_{i+k,j+l}) + B_{ij,kl} * u_{i+k,j+l}(t).$$

The first term $A_{ij,kl}\,x_{i+k,j+l}$ of (3) is simply a linear feedback of the states of the neighborhood nodes. The second term provides an arbitrary nonlinear coupling, and the third term accounts for the contributions from the external inputs of each neighbor cell that is located in the $N_r$ neighborhood.

In Section 2 we formulate the problem of bacteria growth in the presence of nutrients by the corresponding reaction-diffusion system and its CNN model. Section 3 deals with the analysis of the spatio-temporal phenomena by applying the harmonic balance method. An example for the proposed CNN model is then given.

2. Problem and its CNN model. In this section we outline a PDE model describing the growth of bacteria in the presence of nutrients [2]. This phenomenon exhibits pattern formation: growing rings are formed in response to the diffusion of nutrients. This is analogous to the chemical phenomenon known as Liesegang rings, where nutrients are replaced by ions, and growing bacteria by a precipitating substance.

Let us denote by $B$ the space domain, by $u := (u_1, u_2)$ the concentration of nutrients (here reduced to two, for the sake of simplicity), by $b$ the concentration of bacteria, and by $s$ their activity: $s = 1$ if they are growing, $s = 0$ if they are not. We have the following system of reaction-diffusion equations:

(4) $$\frac{\partial u_i}{\partial t} - D_i \Delta u_i + c_i s = 0, \qquad \frac{\partial u_i}{\partial \nu} = 0 \ \text{in } \Sigma, \qquad u_i|_{t=0} = u_i^0 \ \text{in } B,$$

for $i = 1, 2$, and an equation for bacteria evolution

(5) $$\frac{\partial b}{\partial t} = c\,s \ \text{in } B, \qquad b|_{t=0} = b^0 \ \text{in } B.$$

Here $D_i$, $c_i$, $c$ ($i = 1, 2$) denote positive constants and $u_1^0$, $u_2^0$, $b^0$ are the initial data. A constitutive relation between $u$ and $s$ must be added.

The presence of a bistability region leads to the occurrence of hysteresis in the evolution. Let us consider the following hysteresis relation

(6) $$s(x, t) = \frac{3}{4}\,[F(u(x, \cdot))](t) + \frac{1}{2},$$

where $F(u) = \frac{u^3}{3} - u$ (the factor and the additive constant are inserted to transform the relay values into 0, 1). So $s = 0$ ($s = 1$) if $F(u) < p_1$ ($F(u) > p_2$), where $p_1$, $p_2$ are constants, and $s$ depends on the previous evolution if $p_1 \le F(u) \le p_2$ [2].

We will consider an autonomous CNN with $M \times M$ cells and a one-dimensional discretized Laplacian template in order to approximate our model of bacteria growth (4), (6):

(7) $$\frac{du_{ij}}{dt} - D\,(u_{ij-1} - 2u_{ij} + u_{ij+1}) + c_i s = 0, \qquad \frac{du_{ij}}{d\nu} = 0,$$

$1 \le j \le p = M \cdot M$, $i = 1, 2$. For the boundary conditions we take periodic ones, which make the array circular.


3. Dynamics of the model. We will study the dynamics of our CNN model (7) by applying a suitable double Fourier transform [3]. Then we will reduce the network to a scalar Lur'e scheme [3]. After the transformation we obtain for the model:

(8) $$sU(s, z) = D\,(z^{-1}U(s, z) - 2U(s, z) + zU(s, z)) - cN(U(s, z)),$$

where $s = i\omega_0$ is the temporal frequency, and $z = \exp(i\Omega_0)$ is the spatial frequency. The transfer function in this case is:

(9) $$H(s, z) = \frac{c}{D\,(z^{-1} - 2 + z) - s}.$$

We look for possible periodic solutions of the form (first harmonics):

(10) $$U_{m_0}\sin\omega_0 t, \qquad V_{m_0}\sin\omega_0 t,$$

and we can find the amplitude Vmo of the output:

(11)

Suppose that our CNN model (7) is a finite circular array of $p$ cells. For this case we have a finite set of frequencies [3]

(12) $$\Omega_0 = \frac{2\pi k}{p}, \qquad 0 \le k \le p - 1.$$
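The role of the finite frequency set (12) can be checked numerically. The sketch below is an illustration under assumptions: it builds the circular discretized Laplacian for a small array, verifies that its eigenvalues coincide with $z^{-1} - 2 + z$ evaluated at $z = \exp(i\Omega_0)$ for the frequencies in (12), and evaluates the transfer function (9). The array size `p` and the names `circular_laplacian` and `H` are hypothetical choices.

```python
import numpy as np

def circular_laplacian(p):
    """Matrix of u_{j-1} - 2 u_j + u_{j+1} with periodic (circular) indexing."""
    identity = np.eye(p)
    return (np.roll(identity, 1, axis=1) - 2.0 * identity
            + np.roll(identity, -1, axis=1))

def H(s, z, D, c):
    """Transfer function (9)."""
    return c / (D * (1.0 / z - 2.0 + z) - s)

p = 6                                         # hypothetical array size
Omega = 2.0 * np.pi * np.arange(p) / p        # frequency set (12)
z = np.exp(1j * Omega)
symbol = 1.0 / z - 2.0 + z                    # equals 2 (cos(Omega) - 1)

eigenvalues = np.linalg.eigvalsh(circular_laplacian(p))
assert np.allclose(np.sort(symbol.real), eigenvalues)
assert np.allclose(symbol.imag, 0.0)

# Transfer function on the admissible spatial frequencies for one temporal frequency.
print(H(1j * 1.0, z, D=1.0, c=1.0))
```

Because the array is circular, only these p spatial frequencies are admissible, which is why the harmonic balance analysis is carried out on the set (12).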

Then the following proposition holds according to the harmonic balance method [3]:

PROPOSITION 1. CNN model (7) of the bacteria growth, with a circular array of $p = M \cdot M$ cells and periodic boundary conditions

$$u_0(t) = u_p(t),$$

has a periodic solution with period $T_0 = 2\pi/\omega_0$ and amplitude $U_{m_0}$ for all $\Omega_0 = \frac{2\pi k}{p}$, $0 \le k \le p - 1$.

We will consider our CNN model (7) with the following example of initial condition:

$$u_0 = \begin{cases} 0, & t \le 0, \\ 1 - \cos t, & t > 0. \end{cases}$$

REFERENCES

[1] L.O.Chua, L.Yang: "Cellular neural networks: Theory", IEEE Trans. Circuit Syst., Vol. 35, pp. 1257-1271, Oct. 1988.

[2] A.Lokshin, E.Sagomonian: Nonlinear waves in the mechanics of solid bodies, Edition of the Moscow State University, Moscow, 1989 (in Russian).

[3] A.I.Mees: Dynamics of Feedback Systems, John Wiley, 1981.

[4] T.Roska, L.Chua, D.Wolf, T.Kozek, R.Tetzlaff, F.Puffer: "Simulating nonlinear waves and PDEs via CNN - Part I: Basic Techniques, Part II: Typical Examples", IEEE Trans. Circuit and Syst. - I, Vol. 42, N 10, pp. 809-820, 1995.

[5] Visintin,T: Differential Models of Hysteresis, Springer, 1999.

[6] G.B.Whitham: Linear and Nonlinear Waves, John Wiley, 1974.
