EE345 project

  • date post

    02-Jun-2018

  • 8/10/2019 EE345 project sample.pdf

    1/14

    EE345A Modeling and Simulation

    Final Project: Web Server Simulation

    1. The Simulation

    The system to be modeled is a web and database server combination that belongs to one of my other software projects. That project is a file sharing system that uses distributed servlets to disseminate content. The central servers fulfill search and browse requests and present the user with an interface for navigating content.

    The goal of this simulation is to determine the maximum load, in users, that the system can sustain.

    2. The Real System

    The real system consists of a web server running scripts that access a database containing items, folders, and other information. These scripts format and present data to users and also send redirection URLs for downloading and streaming from individual servlets. The servlets sit at the periphery of the system and serve individual items after redirection. Once redirection is complete, the central server is no longer involved.

    There is also a back end server that communicates directly with the database server, transmitting items to be indexed. This server transmits information only when the items a servlet shares change.

    3. The Problem

    The system is complicated. The time needed to fulfill any individual request depends on the amount of content stored, which in turn depends on the number of users and their individual content. Furthermore, the load on the system is split between the processor, for tasks such as formatting and sorting, and the hard disk, for tasks such as searching the database.

    Either hard disk activity or processor activity will be the limiting factor, and this simulation is intended to find out which one it is.

    4. The Model

    To model this system, the database server must be modeled in a high degree of detail. To accomplish this, I intend to model the database server as multiple servers that respond to various requests by analyzing a cache and using a single disk server to serve requests that are not cached. This model is detailed enough to reveal behavior in the system while remaining realistically simple.

    The database servers are fed by two sets of servers: the parallel web servers, which work directly with user requests, and the parallel back end servers, which feed the database new information. Each set has independent queues and is fed by independent sources, as each is governed by a different process.

    The database servers are queued through a read/write locking queue. This queue allows mutually exclusive access to the read and write database servers, with priority for writes. That is, if a write is pending, no reads can occur until all writes complete. While work is waiting, the queues hold the requests until they can be processed.
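    The locking rule described above can be illustrated with a small Python sketch. It is not the implementation used in this project: the class name ReadWriteQueue and its methods are hypothetical, and the sketch assumes that writes run one at a time and that any number of reads may run concurrently, neither of which is stated in the text.

```python
from collections import deque


class ReadWriteQueue:
    """Illustrative read/write locking queue with priority for writes."""

    def __init__(self):
        self.pending_reads = deque()
        self.pending_writes = deque()
        self.active_reads = 0
        self.write_active = False

    def submit(self, kind, request):
        """Queue a request; kind is "read" or "write"."""
        if kind == "write":
            self.pending_writes.append(request)
        else:
            self.pending_reads.append(request)

    def dispatch(self):
        """Return the (kind, request) pairs that may start right now."""
        started = []
        if self.write_active:
            return started  # a write holds the lock; nothing else may start
        if self.pending_writes:
            # Writes have priority: no new reads start while a write waits,
            # and the write itself starts once in-flight reads have drained.
            if self.active_reads == 0:
                self.write_active = True
                started.append(("write", self.pending_writes.popleft()))
            return started
        # No write pending or active: all waiting reads may proceed
        # (assumption: the number of concurrent reads is unbounded).
        while self.pending_reads:
            self.active_reads += 1
            started.append(("read", self.pending_reads.popleft()))
        return started

    def complete(self, kind):
        """Record that a previously started request has finished."""
        if kind == "write":
            self.write_active = False
        else:
            self.active_reads -= 1
```

    In a simulation loop, dispatch() would be called after every arrival and every completion, so that a waiting write starts as soon as the last in-flight read finishes.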

    The database servers communicate via a single queue with a single storage server that models drive access. They also communicate with a single static cache controller that models memory access to cached rows and keys.
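    As an illustration of the single storage server, the following Python sketch computes start and finish times for requests served by one server from one queue. The function name and arguments are hypothetical, and first-come, first-served order is an assumption; the text specifies only that there is a single queue and a single server.

```python
def single_server_fifo(arrivals, service_times):
    """Return (start, finish) times for requests served one at a time in order.

    arrivals must be sorted in increasing order; service_times[i] is the
    drive access time of request i. FIFO order is an assumption.
    """
    schedule = []
    previous_finish = 0.0
    for arrival, service in zip(arrivals, service_times):
        # A request starts when it has arrived and the server is free.
        start = max(arrival, previous_finish)
        previous_finish = start + service
        schedule.append((start, previous_finish))
    return schedule
```

    The gap between a request's arrival and its start time is its queueing delay, which grows quickly once the storage server becomes the limiting resource.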

    See below in section 8.

    5. The Data

    After considering the interarrival timing data for various events, it was determined that almost all input to this system follows complicated patterns and is best modeled with empirical data. All interactive input processes consist of a period of interactive use with very short interarrival times, followed by a long and highly variable delay until the next batch of requests occurs.

    The interarrival time function must take both behaviors into account. This is done by choosing an empirical distribution with an appropriate cumulative distribution function (CDF). An empirical CDF is found from actual interarrival data for various classes of events for 0 to 86400 seconds. This fragment is joined piecewise to an approximated exponential CDF fragment that covers all outliers of the actual dataset, and the composite CDF, treated as a single function, is used to determine interarrival times by inverse transform.
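    The composite CDF and its inverse transform can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the project: the function names, the linear interpolation between empirical breakpoints, and the exact form of the exponential tail, F(t) = 1 - (1 - p_max) exp(-tail_rate (t - t_max)) for t beyond the last empirical point, are all assumptions.

```python
import bisect
import math
import random


def make_inverse_cdf(times, probs, tail_rate):
    """Build the inverse of an empirical CDF joined to an exponential tail.

    times, probs: breakpoints of the empirical fragment, both non-decreasing,
    with probs[0] == 0.0 and probs[-1] < 1.0 (the rest is tail mass).
    tail_rate: rate of the assumed exponential tail (hypothetical parameter).
    """
    if not (probs[0] == 0.0 and probs[-1] < 1.0):
        raise ValueError("probs must start at 0 and end below 1")
    t_max, p_max = times[-1], probs[-1]

    def inverse_cdf(u):
        if u <= p_max:
            # Empirical fragment: interpolate linearly between breakpoints.
            i = bisect.bisect_left(probs, u)
            if i == 0:
                return times[0]
            frac = (u - probs[i - 1]) / (probs[i] - probs[i - 1])
            return times[i - 1] + frac * (times[i] - times[i - 1])
        # Exponential fragment covering the outliers beyond t_max.
        return t_max - math.log((1.0 - u) / (1.0 - p_max)) / tail_rate

    return inverse_cdf


def sample_interarrival(inverse_cdf, rng):
    """Draw one interarrival time by inverse transform of a uniform variate."""
    return inverse_cdf(rng.random())
```

    For example, make_inverse_cdf([0.0, 1.0, 10.0, 86400.0], [0.0, 0.5, 0.9, 0.98], 1e-4) places half of the interarrival times below one second while leaving 2% of the probability mass to the exponential tail beyond 86400 seconds; these numbers are invented for illustration only.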

    The following data were collected to be used in the model:

      • Download requests interarrival times
      • Browse requests interarrival times
      • Search requests interarrival times
      • Back end data interarrival times
      • Back end data counts
      • Back end data delta counts
      • Folder subfolder counts
      • Folder item counts
      • Folder cache status statistics
      • Cache retrieval time
      • Per item cache miss penalty
      • Download request time
      • Search request time
      • Search result count
      • Search word count

    Most data was analyzed using MATLAB. Scripts were created to construct histograms of the source data using appropriate bins. The resulting histogram was used to directly construct a CDF for the input process, as described in the text by Banks. Due to variability in user behavior, a set of CDFs was constructed for many common input processes, each based on a specific real user. For each simulated user, a CDF is selected uniformly at random from this set and used as input.
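    The histogram-to-CDF step and the per-user selection can be sketched in Python as follows. This is not a translation of the MATLAB scripts used in the project, which are not reproduced here; the function names histogram_cdf and pick_user_cdf and the choice to ignore samples outside the bin range are assumptions made for illustration.

```python
import bisect
import random


def histogram_cdf(samples, bin_edges):
    """Return (upper_bin_edges, cumulative_probabilities) for the samples.

    bin_edges must be increasing. Samples outside [bin_edges[0],
    bin_edges[-1]] are ignored (an assumption of this sketch).
    """
    counts = [0] * (len(bin_edges) - 1)
    for x in samples:
        if bin_edges[0] <= x <= bin_edges[-1]:
            # Locate the bin; the last bin also includes its upper edge.
            i = min(bisect.bisect_right(bin_edges, x) - 1, len(counts) - 1)
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the bin range")
    cdf, running = [], 0
    for count in counts:
        running += count
        cdf.append(running / total)
    return list(bin_edges[1:]), cdf


def pick_user_cdf(user_cdfs, rng):
    """Select one per-user CDF uniformly at random for a simulated user."""
    return rng.choice(user_cdfs)
```

    Each entry of user_cdfs would hold the CDF built from one real user's data, so a simulated user inherits the behavior of a randomly chosen real user.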

    The MATLAB scripts used are attached. Two functions, cdf.m and ecdf.m, were used to create the CDFs. Other scripts, invoked by the main script batch.m, are used to create all data at once. Writecdf.m was used to output the CDFs in a format readable by the simulation.

    The CDF script works primarily by creating a histogram of appropriate bins. Other scripts are self-explanatory. The source data and all scripts used are included in the directory Data.

    Graphs of various CDFs are shown below.

    [Figure: Various CDFs of Download Request Interarrival Times. x-axis: Time (seconds), 0 to 600; y-axis: Cumulative Probability.]

    [Figure: Various CDFs of Browse Request Interarrival Times. x-axis: Time (seconds), 0 to 600; y-axis: Cumulative Probability.]

    [Figure: Various CDFs of Search Request Interarrival Times. x-axis: Time (seconds), 0 to 600; y-axis: Cumulative Probability.]

    For these data, it is clear that wide variability is present. Again, this is accommodated by using sets of empirical data collected from users as the inputs to the system.

    [Figure: Various CDFs of Check-in Request Interarrival Times. x-axis: Time (seconds), 0 to 90; y-axis: Cumulative Probability.]

    [Figure: Various CDFs of Check-in Request Item Counts. x-axis: labeled Time (seconds), 0 to 400; y-axis: Cumulative Probability.]

    For these data, it is clear that requests submitted to the back end server span a wide range of time intervals and item counts.

    Single CDF functions were also created for some input processes, shown below:

    [Figure: CDF of Per Item Browse Time. x-axis: Time (seconds per item), 0 to 0.02; y-axis: Cumulative Probability.]

    [Figure: CDF of Per User Item Counts. x-axis: labeled Time (items per user), 0 to 3 x 10^4; y-axis: Cumulative Probability.]

    [Figure: CDF of Per Item Search Time. x-axis: Time (seconds per item), 0 to 0.02; y-axis: Cumulative Probability.]

    [Figure: CDF of Search Result Counts. x-axis: Items, 0 to 120; y-axis: Cumulative Probability.]

    [Figure: CDF of Folder Sub-Item Counts. x-axis: Items, 0 to 400; y-axis: Cumulative Probability.]

    [Figure: CDF of Folder Sub-Folder Counts. x-axis: Items, 0 to 4000; y-axis: Cumulative Probability.]

    Each of these functions plays a large role in determining the input to the system. For instance, the search result count CDF clearly shows that about 35% of the searches return close to 100 results (the limit). The [...] CDF shows that