
Microprocessing and Microprogramming 41 (1996) 681-689

Modelling and performance assessment of large ATM switching networks on loosely-coupled parallel processors

W. Liu a, E. Dirkx a, G. Petit b, J. Tiberghien a,*

a Vrije Universiteit Brussel, Dept. INFO, Pleinlaan 2, B-1050 Brussel, Belgium

b Traffic Technology Dept., Research Centre, Bell Telephone Co., F. Wellesplein 1, B-2018 Antwerpen, Belgium

Received 22 February 1995; revised 6 April 1995; accepted 10 October 1995

Abstract

A parallel simulation tool has been developed with the aim of evaluating the performance of a class of ATM switching systems. In this paper, the class of ATM switching networks of interest and the corresponding modelling are addressed. An appropriate decomposition scheme is revealed with respect to the model being built. The performance of the parallel tool is examined, based on a scalable ATM switching network, by means of a loosely-coupled parallel computer system. A combination of general purpose and problem specific algorithms results in a high computational efficiency, portability and scalability.

Keywords: Parallel simulation; Performance evaluation; Asynchronous transfer mode

1. Introduction

Many architectures of the Asynchronous Transfer Mode (ATM) switching system have been proposed in the literature since ATM was chosen as the transfer mode for implementing Broadband Integrated Services Digital Networks (BISDN) by the International Telecommunications Union (ITU). The performance of such a switching network should be evaluated in order to make sure that the design and implementation will satisfy future communication requirements. Two general approaches are available to evaluate the performance of digital switching systems: analytic and simulation techniques. The former analyses the model of a system design mathematically or with computer-aided numerical methods. The latter simulates the system behaviour by means of computer(s), and global performance parameters are measured during simulated operation of the system under study.

* Corresponding author. Email: [email protected]


Analytic techniques are able to reflect the relationships of system-characteristic parameters intuitively over a wide variety of operating conditions. Their functional range is however restricted to a high degree of abstraction due to unrealistic assumptions, and many approximations are necessary to keep a model tractable. It proves to be very difficult to analyse complex systems such as large-scale multistage ATM switching networks at a very low level (i.e. highly detailed) with the current analytic approach. Simulation techniques, on the contrary, are more flexible and can be applied to a wide range of applications at an arbitrary degree of detail. A disadvantage is that a simulation consumes a large amount of CPU time, even for a modest problem. Hence, without strong enough computing power, simulation techniques are also not applicable to a complicated system. The only cost-effective solution to this dilemma is to partition the system under study into a number of components and execute them in parallel on a multicomputer system.

To implement a parallel simulation, it is necessary to decompose the system under study into a set of executable components that can be simulated concurrently. The decomposition of an application can be done according to general parallel heuristics such as pipelining [1,2], farming [3] and/or data-level techniques, or to Parallel Discrete Event Simulation (PDES) techniques such as the Chandy-Misra algorithm [4,5] and the Time Warp algorithm [6].

In the following, the modelling of a class of ATM switching networks of interest is first addressed in Section 2. Section 3 then approaches the appropriate algorithms for the model under study. The implementation of the strategy on a MIMD parallel computer resulted in a general purpose, yet highly efficient software tool. The performance of the implemented parallel tool is examined in Section 4, based on a scalable ATM switching network. Finally, some remarks are given in Section 5 to conclude the paper.

2. System modelling

An ATM switch will be capable of handling a minimum of several hundreds of thousands of cells per second at every input line [8,9]. Such a switch is designed to be able to route all cells from their input lines to their requested output lines with a low enough probability of cell loss and without unbearable cell delay, while preserving the correct cell order for individual virtual connections. The switches proposed in the literature take various architectures, with sizes ranging from a few ports to thousands of ports. Each switch port will typically work at a transmission rate of 155 Mbit/s or higher. The technology used in implementing the switch places certain limitations on the size of the switch and the line speed; thus, to build large switches, many modules are interconnected in a multistage configuration [10].

The time-division switches [9] allow several input-output connections to share the same physical resource, say a conducting medium or memory, based on discrete time slots. A memory module is a typical example for implementing connections based on time division. Such a memory module is the repository in which cells are supplied by input ports and removed by output ports. The switching principle is similar to the Time Slot Interchange (TSI) mechanism [11] for synchronous Time Division Multiplexing (TDM). A controller is required to sequentially process all incoming cells and select all outgoing cells in each time cycle. This class of switches, with buffer memory in the basic switching blocks to solve cell output conflicts, is intended for constructing large switching systems. Many proposed architectures fall into this class in order to construct a scalable ATM switch (e.g. [13-16]).

2.1. Fabrics with shared memory

It is our intention to develop a simulation tool to evaluate the performance of a class of ATM switch architectures with shared memory. Previous work has proven that output buffering can achieve better switch performance, e.g. higher throughput, than input buffering, and that a shared output buffer utilises buffer memory more efficiently than separate output buffers for all traffic patterns [13,17]. Shared buffer type switches are very attractive and many implemented prototypes use them as the building blocks for large switch fabrics. The building blocks, or Switching Elements (SEs), are organised in stages, with interconnection restricted to adjacent stages. In a k-stage network, the inputs are connected to SEs in stage 0, while the outputs are connected to SEs in stage k-1.

Besides adopting shared memory within the SEs, this class of ATM switching systems has the following characteristics:
(1) the switch fabrics are scalable with a Clos topology;
(2) multiple paths are available for cells;
(3) cells are synchronised with the fixed interval of each time cycle;
(4) cells are routed with respect to the self-routing tag within the fabrics (a minimal sketch of such a tag is given below).
The simulation model for this class of ATM switch architectures involves a series of modules: the source, the interface, the SEs, the queueing and routing management, and the analysis.
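As a concrete illustration of characteristic (4), the sketch below shows one possible encoding of a self-routing tag and how an SE could select an output group from it. The field names, field widths and the per-stage routing digit are assumptions made for illustration; the paper does not specify the internal packet layout.

```c
/* Hypothetical internal packet carrying a self-routing tag (illustrative). */
#include <stdint.h>

#define MAX_STAGES 12          /* assumed upper bound on fabric depth       */

typedef struct {
    uint8_t tag[MAX_STAGES];   /* routing digit consumed at each stage      */
    uint8_t header[5];         /* original ATM cell header (VPI/VCI, ...)   */
    uint8_t payload[48];       /* ATM cell payload                          */
} internal_packet_t;

/* At stage 'stage', the SE reads its routing digit and uses it to pick an
   outgoing link (or link group, when multiple paths are allowed).          */
static int route_at_stage(const internal_packet_t *p, int stage)
{
    return p->tag[stage];      /* index of the requested output (group)     */
}
```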

2.2. Data flow

ATM cells are generated at the source and flow towards the switch fabric via external links. However, an internal packet format is required in order to support the self-routing protocol within a switch fabric. At the switch interface, a self-routing tag is added to each ATM cell according to its VPI and VCI. This tag points out the appropriate outlet towards the cell's destination. Together with the tag and other control information, an ATM cell is converted to an internal packet. Such a packet will traverse the switch fabric one stage at a time until it reaches the requested outlet. Thus, the cell traffic forms a unidirectional data flow.

When arriving at an SE, an internal cell will be buffered in the shared memory. The cell may be sent out immediately if the requested output link is free in the current time cycle; otherwise, it has to stay in the corresponding logical queue and wait for its turn. Each logical queue follows a First-In First-Out (FIFO) discipline, even though the buffer is shared by several such queues. SEs at different stages may have different routing modes, which results in a different number of output links in a routing group. An incoming cell is lost if the buffer is already fully occupied in an SE.
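The following sketch illustrates this shared-buffer behaviour: logical FIFO queues, one per output, share a single memory pool, and a cell arriving when the pool is exhausted is counted as lost. The sizes, names and queue representation are assumptions made for illustration and are not taken from the paper.

```c
/* Minimal sketch of a shared-buffer switching element (illustrative).      */
#define OUTPUTS      16      /* assumed number of output links per SE       */
#define BUFFER_CELLS 128     /* assumed shared buffer capacity, in cells    */

typedef struct {
    int head, tail, len;
    int cell_id[BUFFER_CELLS];      /* logical FIFO queue for one output    */
} fifo_t;

typedef struct {
    fifo_t queue[OUTPUTS];          /* one logical queue per output link    */
    int    used;                    /* cells currently held in shared memory*/
    long   lost;                    /* cells dropped on buffer overflow     */
} switching_element_t;

/* Buffer an incoming cell for the requested output; drop it if the shared
   memory is full.  Returns 0 on success, -1 when the cell is lost.         */
static int se_enqueue(switching_element_t *se, int output, int cell_id)
{
    fifo_t *q = &se->queue[output];

    if (se->used >= BUFFER_CELLS) {      /* shared memory exhausted         */
        se->lost++;
        return -1;
    }
    q->cell_id[q->tail] = cell_id;
    q->tail = (q->tail + 1) % BUFFER_CELLS;
    q->len++;
    se->used++;
    return 0;
}
```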

At the output side of the switch interface, internal cells may stay in a resequencing buffer with an engineerable constant delay in order to compensate for the distinct cell transfer delays caused by the multiple paths. When a cell has waited long enough, it will be delivered to the output buffer, where the internal cells are depacketized into ATM cells.

2.3. Performance indices

The performance indices of ATM switching networks involve throughput, connection blocking probability, cell loss probability, cell insertion probability, bit error rate, switching delay, and jitter on the delay [12]. We concentrate mainly on two performance indices in the model of the ATM switching systems: the cell loss probability and the switching delay, though other parameters are also measured (throughput, average link load, packet resequence mechanism, etc.).

Cells may be lost due to resource shortage inside an ATM switch fabric. Cells entering from different input ports can compete for the same output port in the same cycle in an SE, i.e. a cell output conflict. For the ATM switch fabric to be modelled, buffers are used to relieve the cell contention at the routing spots. However, severe congestion may cause buffer memory overflow, and thus result in the loss of the incoming cells at that moment. The cell loss probability is an important performance metric for an ATM switch because it must be restricted to a certain quantity for a class of communication services. In fact, it is the key parameter for determining the buffer memory size of the individual SEs. If the size is too large, precious network resources are wasted and the implementation becomes more complicated; if it is too small, the cell loss ratio increases so that the network cannot guarantee the Quality of Service (QoS) requirements of user applications. A satisfactory size of the buffer memory can only be determined by means of a very detailed evaluation of the internal operation of the switching network.
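A minimal way to collect the cell loss figure during such an evaluation is to keep per-run counters of offered and lost cells and report their ratio at the end of the run. The sketch below assumes this simple counting approach; the counter names and the reporting form are not taken from the paper.

```c
/* Illustrative cell-loss statistics for one simulation run (assumed names). */
typedef struct {
    unsigned long long offered;   /* cells injected on the external inputs   */
    unsigned long long lost;      /* cells dropped anywhere in the fabric    */
} loss_stats_t;

/* Cell loss probability estimated over the whole run. */
static double cell_loss_probability(const loss_stats_t *s)
{
    return s->offered ? (double)s->lost / (double)s->offered : 0.0;
}
```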

Switching delay and delay jitter are other important performance indices for ATM switching networks. The ATM technique multiplexes a diverse mix of sources on a single transmission medium. Real-time messages such as voice are very sensitive to the delay and the delay variation. Cell switching delay can be affected by many factors: link load, cell arrival patterns, cell congestion, and so on. Again, the measurement can be made based on a very detailed evaluation of the internal operation of the switching network.

3. Distributed simulation architecture

In order to obtain details about the performance of the ATM switch under study, a very small time unit should be used to describe the simulation progress. With large quantities of high-speed cells being handled inside, huge computing power and a large memory space are required when simulating a large scale switching network. The conventional event-driven mechanism is not suitable for this kind of simulation because of the global event list. Although techniques for performing event-list manipulation and event simulation in parallel have been suggested [18], large scale performance increases seem unlikely [19]. The strategy of the distributed event list is also unpromising, as the potential parallelism and resulting speedup with the network model are significantly less than anticipated [20].

To achieve results within a reasonable computing period, the model should be partitioned, with suitable decomposition and process-processor mapping methods, into concurrent processes that can be distributed over a network of processors of a MIMD machine.

3.1. Process partition

As there are inherent dependency relationships among the data and the switching blocks, the farming heuristic cannot be applied properly to the model. No matter which way is used to partition the model, the processes could not work independently as that algorithm requires. In other words, communications are necessary between the processes, and the corresponding simulation would therefore not be efficient.

The PDES techniques look attractive for the simulation of communication networks. Indeed, these approaches are based on the principle of using a network of logical processes to simulate the corresponding physical processes in the system to be modelled. Nevertheless, the topology of interactions among the components of the system is the key to determining whether such an approach is reasonable [7].

Significant differences exist between the ATM switch fabric and a conventional communication network. The conventional communication network carries variable (generally large) packets at a relatively low transmission rate. The network topology is more flexible, with relatively little interaction among the nodes of the system. The packets may be sent from any node to any other node. Therefore, the routing algorithm and the communication protocols need to be designed with more effort.

On the other hand, the ATM switch fabric has a regular arrangement of building blocks, and will be used to switch short fixed-size cells at high speed. The cell movements are unidirectional and synchronised with a fixed time interval within the fabric. As ATM switches handle very high speed and high density traffic, arrival and departure events occur in almost every simulation cycle, even for a moderate-size switching system. The components of the ATM switch fabric are tightly coupled, with highly frequent interactions among them. The simulation time becomes a more crucial factor when evaluating the performance of ATM switches. With highly regular configurations of switching blocks and fixed-length packets, the implementation of routing and protocol algorithms becomes simple in contrast to the simulation of classical communication networks.

Based on the above comparisons, it can be seen that it is unlikely that high performance can be achieved for the model of the ATM switching system with the PDES approaches. Even though the conservative approach is suitable for networks with static interconnection, the overhead, such as the Chandy-Misra deadlock avoidance algorithm, is too high for implementing the simulation of the ATM switches. A large ATM switch fabric may involve several hundreds (or thousands) of SEs, while each SE may have several dozens of inputs/outputs. If one process emulates one SE, it must exchange messages with all its predecessors and successors. Even with a light load in the network, the sending of one useful cell means that corresponding null messages must be sent to all other outputs. In addition, the data priorities result in poor lookahead for cell streams. The conservative algorithms appear to be poorly suited to applications with poor lookahead properties, even if a healthy amount of parallelism is available [21]. For an optimistic approach such as the Time Warp algorithm, consider individual switching block(s) of a fabric as a process running on a processor with its local simulation time. The clock of the process is seldom faster than that of its predecessors, due to the unidirectional traffic. Thus, there is little chance for a node to have spare periods in which to schedule speculative events. Even where that chance exists, the cost of roll-back would be excessive for a model of the ATM switch due to the large quantities of cells being processed.

According to the characteristics of the system, a time-driven mechanism is adopted by the simulation tool. We think that asynchronous simulation with local clocks is more applicable to the system of interest. When clocks are local and the simulation is asynchronous, a process can begin simulating the next tick as soon as its predecessors have finished the last tick. Synchronisation of the local clocks is implicitly provided by sending a message from a processor to its successors [7]. The restricted communication overhead is the main benefit derived from this choice; the synchronisation is hidden in the necessary data flow of the model. To save transmission and setup time, the data communicated between processors are organised as a single message at a given simulation clock time.

A drawback of time-driven simulation is that the statistical results would be erroneous if the variance of event arrivals were large. However, the simulation of ATM switches does not suffer from this, due to the operating behaviour. An ATM cell may occupy a number of time slots on a TDM transmission medium. Before entering a switch fabric, cells will be aligned with the time slot for delineation. This implies that cells are synchronised with the time slot boundary within the switch fabric. The slot interval can therefore be used as the basic unit of simulation time, which reflects the very detail of cell movement.
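Under stated assumptions, the sketch below shows how one stage process could advance its local clock in this time-driven, asynchronous style: it waits for the single per-tick message from its predecessor stage, processes the bundled cells (including the empty cells that keep the time base continuous), and forwards one message to its successor. The message layout and the generic receive/send primitives are assumptions for illustration; the original tool ran on transputer links.

```c
/* Illustrative per-stage main loop for time-driven, asynchronous simulation
   with local clocks.  recv_from_prev()/send_to_next() stand in for whatever
   message-passing primitives the target machine provides (assumed here).   */

#define CELLS_PER_TICK 64          /* assumed bound: one slot per input link */

typedef struct {
    long tick;                     /* local simulation time of the sender    */
    int  n_cells;                  /* how many real (non-empty) cells follow */
    int  cell_id[CELLS_PER_TICK];  /* all cells for this tick, one message   */
} tick_message_t;

extern void recv_from_prev(tick_message_t *m);        /* assumed primitives */
extern void send_to_next(const tick_message_t *m);
extern void simulate_stage(long tick, const tick_message_t *in,
                           tick_message_t *out);

static void stage_process(long end_tick)
{
    tick_message_t in, out;
    long local_clock = 0;

    while (local_clock < end_tick) {
        recv_from_prev(&in);               /* predecessor finished this tick */
        simulate_stage(local_clock, &in, &out);
        out.tick = local_clock;
        send_to_next(&out);                /* implicit clock synchronisation */
        local_clock++;                     /* advance local clock by one slot*/
    }
}
```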

3.2. Process assignment

A large ATM switch fabric may involve a dozen stages of switching blocks, and each stage may have several tens or hundreds of such blocks. With the regular arrangement of these switching blocks, it is possible to partition such a switch fabric in the horizontal direction, the vertical direction, or a mix of both. For the model under study, vertical partitioning has several advantages in contrast with the other two. First, the vertical partitioning coincides naturally with the traffic flow. In the model, packets enter the switch from the external links, traverse the switch fabric stage by stage, and finally leave the switch via the respective output ports. Thus, the switch fabric is just like a pipeline; the SEs in a column form a stage of the pipe, and the arriving packets are the flowing data to be handled. With this partitioning, the communication relationships among the concurrent components are clear.

Second, it is easier to achieve granularity balance among the resulting concurrent components. The processing time of each component depends on the amount of data to be processed. For a multi-path self-routing ATM switch, in general, the quantity of arriving packets on individual links (and thus stages) is statistically equivalent. The balanced load in different stages hence means balanced execution time among the processes representing different stages. However, this load balance is only ensured on the premise of identical processors.

Third, it is natural for collecting statistics. The SEs of the same stage within a switching network perform identical functions for the traffic. The SEs of a stage will either distribute incoming packets, or route incoming packets under a specific routing mode. Thus, this partitioning makes the relationship of the concurrent processes very logical.

Fourth, further decomposition can still proceed simply and result in a balanced partitioning. With more available processors, each partitioned component, i.e. a stage, can be further partitioned in the horizontal direction so as to invite more processors to deal with the simulation. The balanced load can still be kept if the same number of identical processors is used for each individual stage. However, speedup may not be achieved by introducing more processors into the simulation if the communication time dominates the computation time.

3.3. Parallel simulation tool

The simulation tool is developed under a UNIX-like environment, programmed in the C language, on a transputer based multicomputer. The processors used are INMOS T800s with one or two Mbytes of memory each. The processors can be configured as a reconfigurable processing network by software through a network configuration unit.

To execute concurrent processes on MIMD parallel machines, much attention should be paid to two important aspects: one is to avoid load imbalance over the different processors, and the other is to decrease the communication overhead among the different processors.

Due to the unidirectional packet flow, the model has been partitioned according to the individual stages, which reflects the packet flow naturally and simplifies the communication relationships among processes. For a small switch fabric of interest, one stage can be mapped to one process on one processor. When the switch fabric grows, each stage can be mapped to more processes, and thus more processors, to speed up the simulation. Each process will dynamically spawn some child processes at execution time, all executed on the same processor. The child processes are responsible for receiving and sending packets from and to the processes on other processors, respectively.

The time-driven mode is adopted, and the simulation is asynchronous with the local clocks of the individual processes. Process local times are synchronised through message passing among processors. The large quantities of dependent data in the system are the main reason to choose the time-driven mode for the simulation.
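As a rough modern analogue of the per-stage communication children described above, the sketch below spawns one receiver and one sender helper next to the stage computation, with POSIX threads standing in for the transputer child processes of the original tool. The thread decomposition and all names are assumptions for illustration only.

```c
/* Hedged analogue of a stage process with separate communication helpers,
   using POSIX threads in place of the transputer child processes of the
   original tool.  All names and the decomposition are illustrative.        */
#include <pthread.h>
#include <stddef.h>

static void *receive_packets(void *arg)    /* placeholder: would fill the   */
{ (void)arg; return NULL; }                 /* stage's input queue           */

static void *send_packets(void *arg)        /* placeholder: would drain the  */
{ (void)arg; return NULL; }                 /* stage's output queue          */

static void simulate_stage_ticks(void)      /* placeholder: the time-driven  */
{ /* tick loop sketched earlier would run here */ }

void run_stage_with_helpers(void)
{
    pthread_t rx, tx;

    /* child "processes" dedicated to communication with neighbouring stages */
    pthread_create(&rx, NULL, receive_packets, NULL);
    pthread_create(&tx, NULL, send_packets, NULL);

    simulate_stage_ticks();                 /* the parent does the real work  */

    pthread_join(rx, NULL);
    pthread_join(tx, NULL);
}
```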

However, the time-driven mechanism also has a negative effect on the simulation. Empty cells have to be inserted into the cell streams in order to maintain the continuity of time. This means spending extra CPU time on checking links that carry no packets at a given time unit.

4. Simulation experiment and performance

A series of simulations has been carried out for a scalable ATM switching network, proposed in [22], by means of the simulation tool on a transputer based multicomputer. The traffic sources generate ATM cell streams on the external links of the switching system. In the interface module, a self-routing tag and control information are added to each ATM cell, thus introducing an overhead. The simulation tick corresponds to one eighth of the transmission time of an internal packet on a link operating at 155.52 Mbit/s. This implies that an internal packet needs eight simulation ticks to be processed, while a tick is about 0.437 microseconds. The fine time cycle may produce precise results for the performance metrics, but at the cost of a long simulation time.
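As a rough consistency check on these figures (the internal packet length itself is not stated in the paper, so the 68-octet value below is a derived assumption): eight ticks of about 0.437 microseconds give roughly 3.5 microseconds per packet, which at 155.52 Mbit/s corresponds to about 544 bits, i.e. a 53-octet ATM cell plus on the order of 15 octets of tag and control overhead.

```c
/* Back-of-the-envelope check of the simulation tick (assumed packet size). */
#include <stdio.h>

int main(void)
{
    const double link_rate   = 155.52e6;        /* bit/s                     */
    const double packet_bits = 68 * 8.0;        /* assumed internal packet   */
    const double packet_time = packet_bits / link_rate;

    printf("packet time = %.3f us, tick = %.3f us\n",
           packet_time * 1e6, packet_time / 8.0 * 1e6);
    /* prints roughly: packet time = 3.498 us, tick = 0.437 us               */
    return 0;
}
```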

In order to achieve reliable results, all the following simulation runs extend over five million ticks. An independent uniform traffic pattern [10] is assumed, with an 80 percent load on each external link. Unfortunately, it is not possible to run the following configurations on a single transputer due to the memory requirement. Thus, the speedup obtained is deduced based on the idea that the simulation time taken on one transputer is approximately equal to the sum of the useful time spent by all transputers involved.

Table 1 gives the simulation performance for three specific ATM switches with six, ten and twelve stages respectively. The mapping strategy of a single processor per stage is applied to the first two simulations, whereas two processors per stage are used for the third simulation. The simulation run for the middle size switch is more efficient than that for the small one because the packet communication cost is decreased by the larger messages.

Table 1
Simulation performance

Int. links   Stages   Running time   Speedup   Efficiency
256          6        44 hours       4.5       75%
512          10       79 hours       8.7       87%
4096         12       301 hours      18.76     72%

The simulation of the large size switch is not as efficient as that of the other two because the communication relationships between stages become more complex. From all three simulation runs, we obtained statistical results for the cell loss ratio and the cell queueing delay down to quantities smaller than 10^-6.
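For reference, the efficiency column is consistent with the usual definition

    Efficiency = Speedup / (number of processors),

e.g. 4.5 / 6 = 75% for the six-stage run and 8.7 / 10 = 87% for the ten-stage run, the processor counts following from the one-processor-per-stage mapping; the total processor count for the third run is not stated explicitly in the text.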

Even though these simulations consume a large amount of time, the speedups are significant. It was also found that, during execution of the simulation, the computation time dominates the communication time. This implies that the efforts to reduce the communication cost to a minimum, by overlapping the communications on the different physical links of each processor with computation, have been very successful.

5. Conclusions

The integration of simulation and parallel processing techniques can provide strong enough computing power for the simulation of large scale ATM switching networks. A parallel simulation tool to evaluate the performance of a class of ATM switching networks has been developed and implemented. First, the simulation model is built based on a class of ATM switching systems. In order to implement an efficient simulation tool, various parallel processing techniques are then approached together with the system model under study. Through this approach, the model has been decomposed into a set of processes for a loosely-coupled parallel computer. With a very high level of detail in the simulation, the benefits from parallelism are significant (and necessary). From the experiments, we can conclude that MIMD multicomputers are quite suitable for the problem of evaluating the performance of large high-speed ATM switching systems.

Even though we emphasise that the simulation tool is used to assess the performance of a class of ATM switching systems, the tool can be generalised to packet switch architectures for variable length packets. Because it is written in the C language and developed under a UNIX-like environment, the simulation tool can be ported to other loosely-coupled MIMD computers without much difficulty. However, attention should be paid to the communication cost of the target machine. For example, distributing processes over a set of processors interconnected by a busy LAN would result in very poor simulation performance.

Although a large gain has been achieved through the parallelization of the model, reliable statistics still require huge computing power for a simulation of a large scale ATM switching network at a detailed level. The progress of semiconductor technology is another aspect that can help in shortening the simulation run time for such applications. For example, new generation processors and parallel machines offer significant improvements in both computation and communication performance. No doubt the combination of advanced microprocessors with suitable algorithms will provide a more cost-effective environment for simulations requiring large quantities of computing power, such as those of large ATM switching networks.

Acknowledgement

The authors would like to thank the anonymous referees for their helpful suggestions on an earlier version of this paper.

References

[1] C.V. Ramamoorthy, Pipeline architecture, Computing Surveys 9(1) (March 1977) 61-102.
[2] K. Hwang and F.A. Briggs, Computer Architecture and Parallel Processing (McGraw-Hill, 1985).
[3] R. Hockney and C. Jesshope, Parallel Computers 2 (Adam Hilger, 1988).
[4] K.M. Chandy, V. Holmes and J. Misra, Distributed simulation of networks, Computer Networks 3(1) (Feb. 1979) 105-113.
[5] J. Misra, Distributed discrete-event simulation, Computing Surveys 18(1) (March 1986) 39-65.
[6] D.R. Jefferson, Virtual time, ACM Trans. Programming Languages and Systems 7(3) (July 1985) 404-425.
[7] R. Righter and J. Walrand, Distributed simulation of discrete event systems, Proc. IEEE 77(1) (Jan. 1989) 99-113.
[8] F.A. Tobagi, T. Kwok and F.M. Chiussi, Architecture, performance, and implementation of the tandem banyan fast packet switch, IEEE J. Sel. Areas in Commun. SAC-9(8) (Oct. 1991) 1173-1193.
[9] P. Newman, ATM technology for corporate networks, IEEE Commun. Mag. (April 1992) 90-101.
[10] F.A. Tobagi, Fast packet switch architectures for broadband integrated services digital networks, Proc. IEEE 78(1) (Jan. 1990) 133-167.
[11] W. Stallings, ISDN: An Introduction (Macmillan, New York, 1989).
[12] M. De Prycker, Asynchronous Transfer Mode: Solution for Broadband ISDN (Ellis Horwood Series in Computer Communication and Networking, London, 1991).
[13] H. Kuwahara et al., A shared buffer memory switch for an ATM exchange, in Proc. ICC '89, Boston (June 1989) 118-122.
[14] D.G. Fisher et al., A flexible network architecture for the introduction of ATM, ISS '90, Stockholm (May 1990) 35-44.
[15] Y. Sakurai, N. Ido, S. Gohara and N. Endo, Large-scale ATM multistage switching network with shared buffer memory switches, IEEE Commun. Mag. (Jan. 1991) 90-96.
[16] M.A. Henrion, G.J. Eilenberger, G.H. Petit and P.H. Parmentier, A multipath self-routing switch, IEEE Commun. Mag. (April 1993).
[17] M.G. Hluchyj and M.J. Karol, Queueing in high-performance packet switching, IEEE J. Sel. Areas in Commun. SAC-6(9) (Dec. 1988) 1587-1597.
[18] J.C. Comfort, The simulation of a master-slave event set processor, Simulation 42 (1984) 117-124.
[19] D.A. Reed, A.D. Malony and B.D. McCredie, Parallel discrete event simulation using shared memory, IEEE Trans. Software Eng. SE-14(4) (April 1988) 541-553.
[20] C.I. Phillips and L.G. Cuthbert, Concurrent discrete event-driven simulation tools, IEEE J. Sel. Areas in Commun. SAC-9(3) (April 1991) 477-485.
[21] R. Fujimoto, Parallel discrete event simulation, Commun. ACM 33(10) (Oct. 1990) 30-53.
[22] M.A. Henrion et al., Switching network architecture for ATM based broadband communications, ISS '90, Stockholm (May 1990) 1-8.


Weixin Liu received the B.S. degree in computer science from Changsha Institute of Technology, Changsha, China, in 1982. He is currently working towards the Ph.D. degree at the Free University of Brussels (Vrije Universiteit Brussel), Brussels, Belgium. His current research interests include parallel computation, ATM switching networks, and performance evaluation techniques.

E. Dirkx holds an M.Sc. in Electronic Engineering (1984), an M.Sc. in Computer Science (1986), a Ph.D. in Computer Science (1990), and an M.Sc. in Management (1990). Dr. Dirkx has been a research scientist at the I.B.M. T.J. Watson Research Centre (Yorktown Heights, 1992) and the Electrotechnical Laboratory (Tsukuba, 1993). Since 1995 he is a lecturer at the Vrije Universiteit Brussel. His research interests are parallel computer architecture, distributed computing, (discrete) simulation and telecommunication networks.

G.H. Petit is traffic technology manager at the Alcatel Corporate Research Center in Antwerp (Alcatel-Bell). He is an IEEE member and, since 1992, a part-time lecturer at the University of Antwerp. He is active in the area of B-ISDN traffic and flow control, resource management strategies and performance evaluation of large ATM switching systems and networks.