


SOFTWARE: PRACTICE AND EXPERIENCE

Softw. Pract. Exper. 2004; 34:1–23 (DOI: 10.1002/spe.580)

A Web-based distributed system for hurricane occurrence projection

Shu-Ching Chen1,∗,†, Sneh Gulati2, Shahid Hamid3, Xin Huang1, Lin Luo1, Nirva Morisseau-Leroy4, Mark D. Powell5, Chengjun Zhan1 and Chengcui Zhang1

1 School of Computer Science, Florida International University, Miami, FL 33199, U.S.A.
2 Department of Statistics, Florida International University, Miami, FL 33199, U.S.A.
3 Department of Finance, Florida International University, Miami, FL 33199, U.S.A.
4 Cooperative Institute for Marine and Atmospheric Science, University of Miami, Coral Gables, FL 33124, U.S.A.
5 Hurricane Research Division, NOAA, Miami, FL 33149, U.S.A.

SUMMARY

As an environmental phenomenon, hurricanes cause significant property damage and loss of life in coastal areas almost every year. Research concerning hurricanes and their aftermath is gaining more and more attention nowadays. This paper presents our work in designing and building a Web-based distributed software system that can be used for the statistical analysis and projection of hurricane occurrences. Firstly, our system is a large-scale system and can handle the huge amount of hurricane data and intensive computations in hurricane data analysis and projection. Secondly, it is a distributed system, which allows multiple users at different locations to access the system simultaneously and to share and exchange the data and data model. Thirdly, our system is a database-centered system where the Oracle database is employed to store and manage the large amount of hurricane data, the hurricane model and the projection results. Finally, a three-tier architecture has been adopted to make our system robust and resistant to the potential change in the lifetime of the system. This paper focuses on the three-tier system architecture, describing the design and implementation of the components at each layer. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: distributed system; hurricane statistical analysis; database


∗Correspondence to: Professor Shu-Ching Chen, Florida International University, School of Computer Science, 11200 SW 8th Street, ECS 354, Miami, FL 33199, U.S.A.
†E-mail: [email protected]

Contract/grant sponsor: Florida Department of Insurance under the ‘Hurricane Risk and Insured Loss Projection Model’ project

Received 21 March 2003
Revised 4 September 2003
Accepted 4 September 2003


INTRODUCTION

Because hurricanes pose a significant threat to life and property, it is very important to predict their possible occurrences in order to prevent damage and loss. However, tracking hurricane activity across decades to predict their future impact is a challenging task.

A hurricane is a type of tropical cyclone, which is a generic term for a low-pressure system that generally forms over warm, tropical oceans. Usually a hurricane measures several hundred miles in diameter and is accompanied by violent winds, incredible waves, heavy rains and floods. Normally a hurricane starts as a tropical depression, becomes a tropical storm when the maximum sustained wind speed exceeds 38 mph and finally turns into a hurricane when the winds reach a speed of 74 mph or higher. Hurricanes have an eye and eye wall. The eye is the calm area near the rotational axis of the hurricane. Surrounding the eye are the thick clouds, called the eye wall, which is the violent area of a hurricane [1].

Hurricanes are categorized according to their severity using the Saffir–Simpson hurricane scale, ranging from 1 to 5 [2] as shown in Table I. A category 1 storm has the lowest wind speeds while a category 5 hurricane has the strongest. These are relative terms, because lower category storms can sometimes inflict greater damage than higher category storms, depending on where they strike and the particular hazards they bring. In fact, tropical storms can also produce tremendous damage, mainly due to flooding.
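The wind-speed thresholds described above and in Table I can be sketched as a small lookup. This is only an illustration: the function name `classify_storm` is hypothetical, and treating exactly 74 mph as category 1 follows the lower bound printed in Table I.

```python
def classify_storm(max_wind_mph):
    """Classify a tropical cyclone by maximum sustained wind speed (mph)."""
    if max_wind_mph <= 38:
        return "tropical depression"
    if max_wind_mph < 74:
        return "tropical storm"
    # Upper wind-speed bounds of categories 1 to 4, taken from Table I.
    for category, upper_bound in ((1, 95), (2, 110), (3, 130), (4, 155)):
        if max_wind_mph <= upper_bound:
            return f"category {category} hurricane"
    return "category 5 hurricane"
```

As the text notes, the category reflects wind speed only; it says nothing about flooding or the location of landfall.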

It is reported that every year approximately ten tropical storms develop over the Atlantic Ocean. Although many of these remain over the ocean, some become hurricanes and strike the United States coastline and at least two of them are greater than category 3, posing enormous threats to life and property. For example, storm tides preceding hurricane Camille in 1969 were in excess of 20 ft, and the flooding accompanying hurricane Agnes in 1972 caused 122 deaths and US$6.4 billion in damage in the northeast.

Sophisticated three-dimensional numerical weather prediction models (e.g. [3]) are too computationally expensive to conduct hurricane loss projection simulation studies. In order to project losses associated with landfalling hurricanes, statistical Monte-Carlo simulations [4] are conducted, which attempt to model thousands of years of hurricane activity based on the statistical character of the historical storms in the vicinity of the location of interest.

Another hurricane damage and loss projection model is HAZUS [5,6]. HAZUS, or Hazards U.S., was developed by the Federal Emergency Management Agency (FEMA) as a standardized, national methodology for natural hazards losses assessment. HAZUS can estimate the damage and losses that are caused by various natural disasters such as earthquakes, wind and floods. Some useful databases, such as a national-level basic exposure database, are built into the HAZUS system, which allow the users to run a preliminary analysis without having to collect additional local data. It also provides the functionality to allow the users to plug their own data into the databases.

Although HAZUS is powerful and useful, the necessary software packages, such as the commercial GIS software, need to be installed in every machine on which the HAZUS system runs, which in turn increases both expenses and manual labor.

This paper presents a distributed system for hurricane statistical analysis and projection. First of all, our system is built upon an object-relational database management system, Oracle9i [7], which is one of the core system components and stores and manages the large amount of hurricane data, the hurricane data model and the projection results. The source data sets, such as the HURDAT database [8], are imported into the database and are modeled by applying the object-relational concepts. The user may also import customized data into the database. In addition, the models and projection results produced by the system are stored in and managed by the database for future use. Secondly, in contrast to the existing hurricane projection applications, an important feature of the proposed system is that it aims to support both professional and general-purpose users in a very convenient way. For that purpose, a Web-based distributed system architecture following the client–server architecture is adopted to provide easy and parallel access to multiple users at different locations. Specifically, a Web-based system based on Java Server Pages (JSPs) [9] and J2EE is implemented. All the specific software and hardware are installed only on the server side. Anyone who can surf the Internet using a standard Web browser is able to take advantage of the system without any additional cost, while the underlying principles are seamlessly concealed from the Website visitors. Prototyping the system online also offers great flexibility in content presentation and visualization. Since the hurricane data are constantly being updated and the mathematical models for the hurricane data are also potentially changeable, a three-tier architecture is adopted as our system’s fundamental architecture to provide the transparency among the data layer (hurricane data), application logic layer (the hurricane data model) and the user interface layer. This architecture makes our system more robust and resistant to a potential change in the lifetime of the system.

Table I. Saffir–Simpson hurricane scale.

Category   Wind speed (mph)   Storm surge (ft)   Examples         Damage
1          74–95              4–5                Charley (1998)   minimal
2          96–110             6–8                Bob (1991)       moderate
3          111–130            9–12               Alicia (1983)    extensive
4          131–155            13–18              Andrew (1992)    extreme
5          >155               >18                Camille (1969)   catastrophic

SYSTEM ARCHITECTURE

To achieve the system robustness, flexibility and resistance to potential change, the popular three-tier architecture is deployed in the intended system. The architecture consists of three layers: the user interface layer, the application logic layer and the database layer. The three-tier architecture aims to solve a number of recurring design and development problems, hence to make the application development work easier and more efficient. The interface layer in the three-tier architecture offers the user a friendly and convenient entry to communicate with the system while the application logic layer performs the controlling functionalities and manipulates the underlying logic connection of information flows; finally, the database layer conducts the data modeling job, which can store, index, manage and model information needed for this application.



Figure 1. Detailed architecture of the system. (Diagram labels: tiers User Interface, Application Logic and Database; components Web Browser, Web Server, OC4J Container, Java Bean, JNI, Math Model in C++, IMSL Library and ORACLE DB; links HTTP/SSL and JDBC.)

Web applications are perfect for utilizing three-tier architecture because the presentation layer is necessarily separated, and the logic and data components can be divided up much like a client–server application. A detailed illustration of the system’s architecture is given in Figure 1. Components contained in each tier and the relations among different tiers are described in the following sections.

User interface tier

The first tier is the user interface tier. This tier manages the input/output data and their display. With the intention of offering great convenience for the users, the system is prototyped on the Internet. The users are allowed to access the system by using any existing Web browser software. The user interface tier contains HTML components needed to collect incoming information and to display information received from the application logic tier. The Web visitors communicate with the Web server via application protocols such as HTTP and SSL, sending requests and receiving replies. In our system, the major Web-scripting language exploited in designing the presentation layer is the JSP technique [9].





Application logic tier

The application logic tier is the middle tier, which bridges the gap between the user interface and the underlying database and hides technical details from the users. An Oracle9i Application Server is deployed. Its OC4J container embeds a Web server, which responds to events such as data receiving, translating, dispatching and feedback jobs [10,11]. Components in this tier receive requests coming from the interface tier and translate the requests into appropriate actions controlled by the defined workflow in accordance with certain pre-defined rules. JavaBeans perform the appropriate communication and calculation activities such as getting/pushing information from/to the database and carrying out the necessary computing work with respect to proper statistical and mathematical models. JDBC [12] is utilized for JavaBeans to access the physical database. In the interest of quick system response, the C/C++ language is used to program the computing modules, which are integrated into the Java code via JNI [13].

Database tier

The database tier is responsible for modeling and storing information needed for the system and for optimizing the data access. Data needed by the application logic layer are retrieved from the database, then the computation results produced by the application logic layer are stored back to the database.

Since data constitute one of the most complex aspects of many existing information systems, data modeling is essential in structuring such systems. Both the facts and rules captured during data modeling and processing are important to ensure the data integrity. An Oracle9i database is deployed in our system, and the Object Relational Model is applied to facilitate data reuse and standard adherence.
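The separation of the three tiers can be illustrated with a toy sketch. It is not the system's implementation: an in-memory SQLite table stands in for the Oracle9i database, plain functions stand in for the JavaBeans and JSPs, and the table name, function names and sample rows are all made up for illustration.

```python
import sqlite3
from statistics import mean, variance


# Database tier: an in-memory SQLite table stands in for the Oracle database.
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annual_counts (year INTEGER PRIMARY KEY, storms INTEGER)"
    )
    # Placeholder rows invented for this sketch; they are not HURDAT values.
    conn.executemany(
        "INSERT INTO annual_counts VALUES (?, ?)",
        [(1901, 7), (1902, 9), (1903, 8), (1904, 12)],
    )
    conn.commit()
    return conn


# Application logic tier: query the data tier and compute basic statistics.
def summarize(conn, first_year, last_year):
    rows = conn.execute(
        "SELECT storms FROM annual_counts WHERE year BETWEEN ? AND ? ORDER BY year",
        (first_year, last_year),
    ).fetchall()
    counts = [row[0] for row in rows]
    return {"years": len(counts), "mean": mean(counts), "variance": variance(counts)}


# User interface tier: format the result for display only.
def render(summary):
    return "years={years} mean={mean:.2f} variance={variance:.2f}".format(**summary)
```

Because `render` never touches the database and `open_database` knows nothing about statistics, either end could be replaced without changing the other, which is the property the three-tier design is after.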

USER INTERFACE

The intended system is prototyped on the Internet; therefore, the design and implementation of the system user interface mainly becomes a job of designing and implementing Web pages. The users can gain access to the system through any commonly used commercial browser, such as Internet Explorer or Netscape.

Due to its ‘unlimited’ expressive power and natural coherence with the J2EE architecture, JSP Web-scripting technology is adopted to implement the Web pages [9,14]. JSPs, sitting on top of the Java servlet model, can easily and flexibly generate the dynamic content of a Web page. The basic idea of JSPs is to allow Java code to be mixed together with static HTML or XML templates. The Java logic handles the dynamic content generation, while the markup language controls the structuring and presentation of the data.

Since putting all the Java code into a JSP itself causes unmanageable content, especially when the tasks performed by the Java code are not simple, JavaBeans are imported to perform most of the actual work. For the sake of performance, complex computational tasks are carried out by C/C++ code. The C/C++ code is seamlessly integrated into the corresponding Java code via the Java Native Interface (JNI) mechanism [13]. Java Applet techniques are exploited when necessary to liven up the Web page.



Table II. El Nino and La Nina years.

El Nino year   La Nina year
1925           1933
1929           1938
1930           1942
1940           1944
1941           1945
1951           1948
1953           1949
1957           1950
1963           1954
1965           1955
1969           1956
1972           1961
1976           1964
1977           1967
1982           1970
1986           1971
1987           1973
1990           1974
1991           1975
1993           1978
1994           1988
1997           1995
               1998
               1999
               2000

Annual Hurricane Occurrence projection

The first step to study the hurricane phenomena and their impact is to estimate the frequency of hurricanes in the future. Annual Hurricane Occurrence (AHO) projection is proposed to address this problem. AHO estimates the frequency of hurricanes occurring in a series of years based on an associated hurricane occurrence probability distribution, which is obtained through statistical analysis and calculation on the basis of historical hurricane records.

Rationale

For the estimation of hurricane occurrence distribution to be conducted, a suitable data set needs to be selected. Different data set choices significantly influence the final estimation of a probability distribution.

Hurricane records in the database are categorized into five data sets according to climate cycles or qualifications. The categories are: (1) 1851–2000, (2) 1900–2000, (3) 1944–2000, (4) ENSO and (5) Multi-Decadal. The first three groups contain hurricanes occurring in different year ranges. The ENSO data set is for the El Nino and La Nina years. Table II lists all El Nino and La Nina years up to date. The last group, Multi-Decadal, includes records of hurricanes that occurred in certain years when the climate phase was either warm or cold. The years contained in this category are detailed in Table III.

Table III. Multi-Decadal year ranges and climate phase.

Climate phase (warm)   Climate phase (cold)
1870–1902              1903–1925
1926–1970              1971–1994
1995–2001

Figure 2. Flow chart for AHO. (Flow chart labels: begin; system gives out dataset selection; user select; system gets data; calculate basic statistical features; generate distribution; Oracle DB; IMSL Statistic & Math Library.)
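The five data sets can be expressed as sets of years using the values in Tables II and III. The sketch below is illustrative only: the function name `years_in_dataset` and the `phase` argument are assumptions, since the paper does not say how the ENSO and Multi-Decadal sets are subdivided when queried.

```python
# Year ranges for the first three data sets.
YEAR_RANGES = {
    "1851-2000": (1851, 2000),
    "1900-2000": (1900, 2000),
    "1944-2000": (1944, 2000),
}

# ENSO years transcribed from Table II.
ENSO_YEARS = {
    "el_nino": {1925, 1929, 1930, 1940, 1941, 1951, 1953, 1957, 1963, 1965, 1969,
                1972, 1976, 1977, 1982, 1986, 1987, 1990, 1991, 1993, 1994, 1997},
    "la_nina": {1933, 1938, 1942, 1944, 1945, 1948, 1949, 1950, 1954, 1955, 1956,
                1961, 1964, 1967, 1970, 1971, 1973, 1974, 1975, 1978, 1988, 1995,
                1998, 1999, 2000},
}

# Multi-Decadal year ranges transcribed from Table III.
MULTI_DECADAL_RANGES = {
    "warm": [(1870, 1902), (1926, 1970), (1995, 2001)],
    "cold": [(1903, 1925), (1971, 1994)],
}


def years_in_dataset(name, phase=None):
    """Return the sorted list of years covered by a named data set."""
    if name in YEAR_RANGES:
        first, last = YEAR_RANGES[name]
        return list(range(first, last + 1))
    if name == "ENSO" and phase in ENSO_YEARS:
        return sorted(ENSO_YEARS[phase])
    if name == "Multi-Decadal" and phase in MULTI_DECADAL_RANGES:
        return [
            year
            for first, last in MULTI_DECADAL_RANGES[phase]
            for year in range(first, last + 1)
        ]
    raise ValueError(f"unknown data set or phase: {name!r}, {phase!r}")
```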

The statistical models are generated from the historical data set. Based on the generated probability distribution models, the numbers of hurricane occurrences per year in the future are produced for any number of years the user desires. The detailed description of these models is presented later in the ‘Statistical and mathematical modeling’ section. Figure 2 illustrates the overall workflow for AHO estimation.

Figure 3. Data set selection Web page (AHO).

Implementation

Several JSPs and JavaBeans are constructed to implement the functionalities of AHO projection. JSPs offer interfaces for the user to specify a data set and for displaying results to the user. JavaBeans are responsible for handling communication and computation tasks and for hiding the technical details from the external users. The data are retrieved from and stored back to the database by calling the JDBC API. Simple calculations are performed by the Java code itself while more complicated computing tasks are carried out by C/C++ programs that are integrated into the Java code through JNI in order to improve the computing performance.

Data set selection

First, the Web visitor needs to select a data set and to tell the system to use the selected data set as the basis of the statistical projection. A JSP is built for that task. To avoid typos and illegal data sets, all data sets that are currently available are offered to the user via a drop-down list. The user’s choice is collected by a form. The user chooses a data set he/she wants and submits the selection to the system by clicking the ‘Submit’ button. The actual Web page is portrayed in Figure 3.





Figure 4. Distribution models evaluation Web page.

Statistical models evaluation

Another JSP file handles the submitted selection from the user. In this JSP file, there are two imported JavaBeans. The first JavaBean is the database-querying Bean that communicates with the database. It connects to the database, queries the database with respect to the selected data set, retrieves the corresponding data and stores the data. The second, distribution-evaluating, JavaBean has been devised particularly to evaluate various statistical distribution models using the retrieved data. The data are passed to the distribution-evaluating JavaBean from the database-querying Bean. The statistical distribution models and evaluating standards exploited are elaborated in the ‘Statistical and mathematical modeling’ section.

At the end of the processing, the related information is returned and displayed to the user. In our case, basic statistical characteristics of data in the selected data set, such as mean and variance, are returned to the user. The distribution models are provided to the user as well.

For the purpose of statistical projection in the next step, the user needs to specify N, the number of years for which the projection process generates the estimated numbers of hurricane occurrences. This information is captured by a text field within a form. After the user inputs the desired number of years and clicks the ‘Submit’ button to send the request, the statistical projection is conducted based on the best probability distribution generated from the user-selected data set.

Figure 5. AHO projection result Web page (line).

Figure 4 is a snapshot of the corresponding JSP Web page. The upper part displays information returned to the user; for example, the data set selection information and the statistical values of the selected data set. The lower part uses the text field to obtain the user’s input data.

Statistical projection

Once it obtains the statistical projection request and the necessary information from the user, the system starts the projection process. The calculation part of the projection work is performed by another JavaBean, which generates the N values of the number of hurricane occurrences based on the indicated distribution.
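Generating N annual counts from a fitted distribution can be sketched as follows for the Poisson case. This is an assumption-laden illustration, not the system's code: the paper delegates the computation to C/C++ and the IMSL library, whereas this sketch uses Knuth's multiplication method with the Python standard library, and the function names are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    # Keep multiplying uniform draws until the product falls below exp(-mean).
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def project_annual_occurrences(mean, n_years, seed=None):
    """Return n_years simulated annual hurricane counts."""
    rng = random.Random(seed)
    return [sample_poisson(mean, rng) for _ in range(n_years)]
```

Passing a fixed `seed` makes a projection reproducible, which is convenient when the results are stored for later comparison.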

The statistical projection results, a collection of the number of hurricane occurrences, are sent back to the user. In the meantime, these results are stored in the database for future computation. To offer live visualization, the Java Applet mechanism is introduced. Our Java Applets are implemented based on the Ptolemy Java Applet package from Berkeley [15]. The statistical projection result can be plotted as a line chart or a bar chart, as shown in Figures 5 and 6. The maximum number of years displayed is 100 per screen. There are both ‘Previous 100’ and ‘Next 100’ buttons to allow for the browsing of a very large number of years, screen by screen. In the example illustrated in Figures 5 and 6, the user specified a large number of years for hurricane occurrence projection, and the graphs actually present the third screen of data, which covers years 201 to 300.
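The screen-by-screen browsing described above amounts to simple index arithmetic. A minimal sketch, assuming 1-based screen numbers and a hypothetical helper name `screen_years`:

```python
def screen_years(screen_index, total_years, per_screen=100):
    """Return the (first, last) projected year shown on a 1-based screen."""
    first = (screen_index - 1) * per_screen + 1
    if screen_index < 1 or first > total_years:
        raise ValueError("screen index out of range")
    # The last screen may hold fewer than per_screen years.
    last = min(first + per_screen - 1, total_years)
    return first, last
```

For a 1000-year projection, `screen_years(3, 1000)` gives years 201 to 300, matching the third screen mentioned in the text.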





Figure 6. AHO projection result Web page (bar).

Storm Genesis Time projection

One essential trait of a hurricane is its genesis time. The Storm Genesis Time (SGT) is the date and time that an organized closed cyclonic circulation is first identified in the surface wind field surrounding a low-pressure area, such that a regional forecast center would classify the system as an incipient tropical cyclone.

For each numerically simulated hurricane resulting from the AHO, the associated SGT needs to be produced. SGT projection aims to achieve this target. The prediction of genesis time is grounded in the investigation and analysis of the historical hurricane genesis time data.

Rationale

One data set needs to be determined to serve as the basis of the statistical projection. As in AHO, five data sets are available for the projection of SGT: (1) 1851–2000, (2) 1900–2000, (3) 1944–2000, (4) ENSO and (5) Multi-Decadal. The meaning of each data set is described in Tables II and III.

Genesis time is represented by the first fix data of the selected data set. To record the precise genesis time of a storm once it forms is still beyond the capability of the currently available observational instrumentation and hurricane modeling techniques. The first fix data are a collection of data related to the characteristics of a hurricane the first time it is observed and recorded, including storm name, date, time, position (longitude and latitude), maximum wind speed and pressure, etc. Hence, technically, it is a suitable approximation of the actual SGT. The first fix data are stored in the database and are retrieved at run time. Table IV depicts some data record examples for the first fix data. Each record shows when and where a particular tropical storm originated.

Table IV. Example of first fix data records.

StormId   StormName   GenesisDate   JulianDate   GenesisTime
310       NME         5-Jul-1851    2 397 309    120000
311       NME         16-Aug-1851   2 397 351    000000
1114      NME         19-Aug-1852   2 397 720    000000
1153      NME         5-Sep-1852    2 397 737    000000

Table V. First fix data records after processing.

StormId   StormName   GenesisDate   GenesisTime   SGT
310       NME         5-Jul-1851    120000        1572
311       NME         16-Aug-1851   000000        2592
1114      NME         19-Aug-1852   000000        2640

Among these data fields, those of the utmost concern are the fields representing time information: GenesisDate, JulianDate and GenesisTime. The GenesisDate field records the calendar date on which the storm began. The corresponding Julian date of that calendar date is stored in the JulianDate field. A Julian date is simply a continuous count of days and fractions since noon Universal Time on 1 January, 4713 BCE and is widely used as a time variable within astronomical software. GenesisTime indicates the time point when the storm originated. Since the actual observation is conducted every six hours, the value in that field represents not an exact time point but a time interval. The 24 h day is divided into four intervals: I1 = [0AM, 6AM), I2 = [6AM, 12Noon), I3 = [12Noon, 6PM) and I4 = [6PM, Midnight), which are denoted respectively as values 000000, 060000, 120000 and 180000. For instance, the first record shows that the tropical storm with StormId 310 began during the time interval [12:00, 18:00) on 07/05/1851.
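The mapping from an hour of the day to the four interval codes can be sketched in a few lines. The helper name `genesis_time_code` is hypothetical; the codes themselves are those listed in the text.

```python
def genesis_time_code(hour):
    """Map an hour of the day (0-23) to the code of its six-hour interval."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in the range 0 to 23")
    # Round down to the start of the enclosing six-hour interval.
    interval_start = (int(hour) // 6) * 6
    return f"{interval_start:02d}0000"
```

An observation at 13:00, for example, falls in I3 and is stored as 120000.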

Since the actual estimation is based on the time intervals between consecutive hurricanes, which are estimated in units of hours, the first fix data are processed to produce the interval data for calculation purposes. The conversion is conducted as follows:

SGT = 24 × (Julian date of a storm − Julian date of 05/01/1851) + GenesisTime

For example, for the storm with StormId 311, which happened in the time interval I1 on 08/16/1851, the SGT value is:

24 × (2 397 351 − 2 397 243) + 0 = 2592,

where 2 397 351 is the Julian date of 16 August 1851, and 2 397 243 is the Julian date of 1 May 1851. The resulting SGT data after processing are also shown in Table V.
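The conversion formula can be checked with a short sketch that reproduces the worked example for StormId 311. Two points are assumptions: the GenesisTime term is taken to be the interval's starting hour (the first two digits of the code), and the origin constant is the value the text quotes for 1 May 1851. The function name is hypothetical.

```python
# Julian date that the text gives for the origin, 05/01/1851.
ORIGIN_JULIAN_DATE = 2397243


def sgt_hours(julian_date, genesis_time_code):
    """Convert a first fix record to an SGT value in hours since the origin."""
    # Assumption: the hour offset is the start of the interval, e.g. "120000" -> 12.
    interval_start_hour = int(genesis_time_code[:2])
    return 24 * (julian_date - ORIGIN_JULIAN_DATE) + interval_start_hour
```

With the values from Table IV, `sgt_hours(2397351, "000000")` evaluates to 2592, the figure quoted in the worked example.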





Figure 7. Flow chart for SGT. (Flow chart labels: Begin; System provides dataset selection; User selects; System gets data from database; System estimates the CDF of HBG; System generates the SGT values; Save the SGT data to database; Display result; Oracle DB.)

After this preprocessing, the probability distribution of the SGT values is analyzed based on certain estimating algorithms, which are elaborated in the ‘Statistical and mathematical modeling’ section. Then, according to the estimated distribution, an associated genesis time is produced for each hurricane that is predicted by AHO.

The overall information flow for SGT prediction is shown in Figure 7.

Implementation

The flow chart of SGT indicates that the users first need to select a data set; the system then automatically begins to estimate the distribution and to generate new SGT values. During the whole prediction process, JSP Web pages allow the users to select the desired data set. The JavaBeans deal with the calculation and data retrieval/storage work. The distribution estimation job involves a lot of statistical and mathematical functions, which are implemented in C/C++ code.





Figure 8. Data set selection Web page (SGT).

Data set selection

Similar to AHO, there are a total of five data sets available, and they are provided to the users via a drop-down list. The users select one of them and send the projection request to the system by submitting the selection to the system. A snapshot of the data set selection Web page is illustrated in Figure 8.

Distribution estimation and SGT projection

Based on the data set choice, the system first retrieves the related first fix data from the database and processes them to generate SGT data in conformity to the above-mentioned converting approach. Then the system estimates the distribution of the SGT values and produces an SGT value for each numerically simulated hurricane from AHO. The SGT data are stored into the database at the same time. Example SGT values are dynamically displayed to the user in the format of a table. Figure 9 is the resulting Web page.

STATISTICAL AND MATHEMATICAL MODELING

The modeling approach utilized in our system complies with the popular hurricane projection strategy as detailed in [16], i.e. to model the entire track of a hurricane beginning with its initiation over the open ocean to its dissipation. The characteristics of the storm are modeled at each 6 h point in the storm history.

Figure 9. SGT projection result Web page.

The first step in modeling the complete track of a hurricane is to model the number of hurricanes occurring per year and the genesis time of each individual storm, which are the purposes of the AHO projection and SGT projection, respectively. Specifically, AHO projection aims to model and predict the number of storms occurring per year, and SGT projection attempts to predict the genesis time of each specific storm. A statistical approach is adopted, and the statistical models of the AHO and SGT are built from the historical storm data via statistical analysis.

One meteorological fact is that the statistical properties of AHO vary with different year ranges. For example, the statistical properties of storms in El Nino years are quite different from those in non-El Nino years. Therefore, different statistical models are necessary for different year ranges. In our system, all the historical storm records in the database are categorized into five data sets according to meteorological criteria: (1) 1851–2000, (2) 1900–2000, (3) 1944–2000, (4) ENSO and (5) Multi-Decadal. The meaning of each category has been discussed in the last section. Different statistical models are built for individual data sets.





AHO projection

AHO projection aims to model and predict the number of storms occurring per year. According to domain knowledge in meteorology, the best statistical distribution of the number of storms occurring per year is either the Poisson distribution or the negative binomial distribution. The Poisson distribution has been the classic distribution describing the occurrence of a stochastic process. However, the Poisson distribution assumes that the mean number of hurricanes in any two nonoverlapping time intervals of equal length is the same. Allowing these means to be different leads to the AHO being modeled by a mixture of Poisson distributions, which in effect is the negative binomial distribution.

First, the parameters of both the Poisson distribution and the negative binomial distribution are estimated from the historical data. Then the goodness of fit for the two distributions is evaluated based on the chi-squared statistic, and the distribution with the better fit is picked as the final statistical model of AHO.

Data samples

Since different statistical models are built for different data sets, the user first needs to select one data set from the five categories through the user interface as mentioned in the last section. Then the historical data of the selected data set are retrieved from the database. The retrieved M data samples are denoted by $X = \{x_i\}$ $(i = 1, 2, \ldots, M)$, where M is the number of years in the data set and $x_i$ denotes the number of storms that occurred in the ith year in the data set. The statistical model of AHO is built based on the M data samples.

Estimation of Poisson distribution

The probability distribution of a Poisson random variable $x$ with mean $\gamma$ is $P(x) = \gamma^{x} e^{-\gamma} / x!$. Given the data samples $X = \{x_i\}$ $(i = 1, 2, \ldots, M)$ from the historical storm data, the maximum likelihood estimator of the parameter $\gamma$ is

\[ \gamma = \frac{\sum_{i=1}^{M} x_i}{M} \tag{1} \]
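Equation (1) is simply the sample mean, and the Poisson probabilities follow directly from it. A minimal sketch using only the standard library (function names are illustrative, and the log-space evaluation is a numerical convenience rather than something the paper specifies):

```python
import math


def poisson_mle(samples):
    """Maximum likelihood estimate of the Poisson mean, as in Equation (1)."""
    return sum(samples) / len(samples)


def poisson_pmf(x, gamma):
    """P(x) = gamma^x * exp(-gamma) / x!, evaluated in log space for stability."""
    return math.exp(x * math.log(gamma) - gamma - math.lgamma(x + 1))
```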

Estimation of negative binomial distribution

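The single variable negative binomial distribution can be represented as

\[ P(x) = \frac{\Gamma(x + k)}{\Gamma(x + 1)\,\Gamma(k)} \left(\frac{k}{m + k}\right)^{k} \left(\frac{m}{m + k}\right)^{x} \tag{2} \]

where $\Gamma(\cdot)$ is the gamma function, namely $\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t}\,dt$.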





Given the M data samples $X = \{x_i\}$ $(i = 1, 2, \ldots, M)$ from the historical storm data, the estimates of the parameters m and k are

\[ m = \frac{\sum_{i=1}^{M} x_i}{M} \tag{3} \]

\[ k = \frac{m^{2}}{s^{2} - m} \tag{4} \]

where $s^{2}$ is the variance of the data samples X.
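Equations (2) to (4) can be sketched as follows. The function names are illustrative, and one detail is an assumption: the paper does not say whether the variance uses a divisor of M or M − 1, so the sketch uses the sample variance (M − 1). Note that Equation (4) only yields a positive k when the variance exceeds the mean.

```python
import math
from statistics import mean, variance


def negative_binomial_moments(samples):
    """Estimate m and k from the data as in Equations (3) and (4)."""
    m = mean(samples)
    s2 = variance(samples)  # assumption: sample variance with divisor M - 1
    if s2 <= m:
        raise ValueError("the estimate of k requires variance greater than mean")
    k = m * m / (s2 - m)
    return m, k


def negative_binomial_pmf(x, m, k):
    """Equation (2), evaluated in log space with the log-gamma function."""
    log_p = (
        math.lgamma(x + k) - math.lgamma(x + 1) - math.lgamma(k)
        + k * math.log(k / (m + k))
        + x * math.log(m / (m + k))
    )
    return math.exp(log_p)
```

When the data are not overdispersed, the `ValueError` signals that the Poisson model is the only one of the two that can be fitted this way.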

Model selection

After the estimation of both the Poisson distribution and the negative binomial distribution parameters, the chi-square statistic is calculated to select the final model. The distribution with the higher p-value is selected as the final statistical model of the AHO.

Assume the data are divided into $k$ bins. The test statistic of the chi-square goodness of fit is defined as:

$$p = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{5}$$

where $O_i$ is the observed frequency for bin $i$, and $E_i$ is the expected frequency for bin $i$.

Let $K = \max\{x_i\}$ $(i = 1, 2, \ldots, M)$, which means $K$ is the maximum number of hurricanes occurring per year in the historical data. It is safe to assume that the number of hurricanes occurring per year ranges from 0 to $K$. Then the data are divided into $(K + 1)$ bins with a width of 1. The chi-square test statistic can be rewritten as

$$p = \sum_{i=0}^{K} \frac{(O_i - E_i)^2}{E_i} \tag{6}$$

where $O_i$ is the observed frequency for $i$ hurricanes occurring per year, and $E_i$ is the expected frequency for $i$ hurricanes occurring per year according to the statistical model, which is either the Poisson distribution or the negative binomial distribution. The distribution with the higher p-value is selected as the final statistical model of the AHO.
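The statistic over the bins $0, 1, \ldots, K$ can be sketched in Python as follows. The sketch assumes that the expected frequency of bin $i$ is $M$ times the model probability of $i$ occurrences; it computes only the statistic and leaves the conversion to a p-value aside. The function names and the yearly counts are hypothetical.

```python
import math
from collections import Counter

def chi_square_statistic(counts, pmf):
    """Chi-square statistic over the bins 0..K, where K = max(counts).

    counts: yearly occurrence counts x_1..x_M.
    pmf:    function mapping i to the model probability of i occurrences.
    """
    M = len(counts)
    K = max(counts)
    observed = Counter(counts)        # O_i: number of years with i occurrences
    stat = 0.0
    for i in range(K + 1):
        expected = M * pmf(i)         # E_i: assumed to be M * P(i)
        stat += (observed[i] - expected) ** 2 / expected
    return stat

def poisson_pmf(x, gamma):
    return gamma ** x * math.exp(-gamma) / math.factorial(x)

# Hypothetical yearly counts (illustrative only).
counts = [8, 6, 10, 9, 7, 8, 12, 5]
gamma = sum(counts) / len(counts)
print(chi_square_statistic(counts, lambda i: poisson_pmf(i, gamma)))
```

The same function can be called with a negative binomial probability function to obtain the statistic of the competing model.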

AHO projection validation

To validate the projection performance of the models explored for AHO, a subset of the data set that includes hurricane occurrence data is used for statistical distribution estimation, and then the derived model is used to forecast the number of hurricanes for a number of years. Considering the historical hurricane data stored in the database, the subset used to estimate the distribution contains 100 years' worth of data, namely from the years 1900 to 1999, and the actual data used for comparison include data from the years 1991 to 2001.

On the basis of historical data from 1900 to 1990, the 95% confidence interval for the mean number of hurricanes per year is (7.95, 9.15) using the Poisson distribution and (7.80, 9.29) using the negative binomial distribution. Figure 10 presents side by side the actual frequencies of hurricane occurrences for the years 1991–2001 and the projected frequencies, which are based on the negative binomial distribution model. Since 11 years' worth of data is too small a sample to give accurate predictions, some projected data are not very close to the actual data, as illustrated in this figure.

Figure 10. Frequencies histogram of historical/projected AHOs. (Histogram: x-axis, number of annual hurricane occurrences, 1 to 19; y-axis, frequency, 0 to 6; series: historical and projected.)

SGT projection

The genesis time of a storm is given by the first fix data of that storm. SGT projection aims to predict the genesis time of each specific storm. This goal is achieved by modeling the number of hours, in 6 h resolution, between the genesis of a storm and the start of its hurricane season rather than directly modeling the SGT. A storm season starts on 1 May of one year and ends on 30 April of the next year. After modeling the number of storms using AHO from the historical data, the SGT projection model can be used to predict the time intervals among storms, and thus the SGT of each storm can be predicted as well.
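The conversion from a genesis time to the modeled quantity can be sketched in Python as follows. This is a minimal sketch, not the system's conversion program: the function names are hypothetical, the genesis time is assumed to be a naive `datetime` taken from the first fix, and rounding down to a multiple of 6 h is an assumption about how the 6 h resolution is applied.

```python
from datetime import datetime

def season_start(genesis):
    """Return 1 May of the storm season that contains the genesis time."""
    # A season runs from 1 May of one year to 30 April of the next year.
    year = genesis.year if genesis.month >= 5 else genesis.year - 1
    return datetime(year, 5, 1)

def hours_since_season_start(genesis):
    """Hours between the season start and the genesis, in 6 h resolution."""
    delta = genesis - season_start(genesis)
    hours = delta.total_seconds() / 3600
    # Assumption: round down to the nearest multiple of 6 hours.
    return int(hours // 6) * 6

print(hours_since_season_start(datetime(1992, 8, 16, 18)))  # 2586
print(hours_since_season_start(datetime(1993, 1, 2, 0)))    # 5904
```

A genesis in January to April is attributed to the season that started on 1 May of the previous year, which is why the second example counts from 1 May 1992.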

Data samples

The user first selects one data set from the five categories through the user interface. Since the data set in the database originally contains no values of time intervals, the data conversion, described previously in the ‘SGT projection’ section, is applied first to generate that information; then the time intervals can be retrieved from the database. The retrieved $N$ data samples are denoted by $S = \{s_i\}$ $(i = 1, 2, \ldots, N)$, where $N$ is the number of storms in the data set and $s_i$ denotes the time interval associated with the $i$th storm in the selected data set. The statistical model of SGT is built based on the data samples $S$.





Distribution estimation of time intervals

A nonparametric approach is applied to estimate the cumulative distribution function (CDF) of the time intervals. Let $T$ denote the random variable time (number of hours). The nonparametric approach is described in detail as follows.

All the time intervals $s_i$ are sorted in ascending order. Assume the sorted result is $0 \le T_1 \le T_2 \le \cdots \le T_W$, where $W \le N$. Let $f_i$ denote the frequency of the storms at time $T_i$. The empirical CDF $F_N(t)$, an estimate of the true CDF $F(t) = P(T \le t)$, is calculated using the following equation.

$$F_N(t) = \begin{cases} 0 & \text{if } t < T_1 \\ \dfrac{f_1 + f_2 + \cdots + f_i}{N} & \text{if } T_i \le t < T_{i+1},\ i = 1, 2, \ldots, W - 1 \\ 1 & \text{if } t \ge T_W \end{cases} \tag{7}$$
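A minimal Python sketch of an empirical CDF is given below. It uses the standard definition, the fraction of the $N$ samples that do not exceed $t$, so it is 0 before the smallest time interval and 1 from the largest one onwards. The function name and the sample values are hypothetical.

```python
from bisect import bisect_right

def empirical_cdf(samples):
    """Return a function F_N with F_N(t) = (number of samples <= t) / N."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one time interval is required")

    def F(t):
        # bisect_right counts how many sorted samples are <= t.
        return bisect_right(ordered, t) / n

    return F

# Hypothetical time intervals in hours (illustrative only).
F = empirical_cdf([1200, 1800, 1800, 2400])
print(F(1000), F(1800), F(2400))   # 0.0 0.75 1.0
```

Repeated time intervals are handled by the sort: the two samples at 1800 h together contribute a jump of 2/4 at that point.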

The empirical CDF is then smoothed using standard kernel smoothing techniques. The kernel used is the Epanechnikov kernel, $K(x) = 0.75(1 - 0.2x^{2})/\sqrt{5}$ for $|x| \le \sqrt{5}$ and $K(x) = 0$ otherwise, and the local bandwidth is $h_N(t) = (S/2)(1/N)^{1/3}$. The smooth estimator of $F(t)$ is then calculated as

$$\hat{F}_N(t) = \int_{0}^{\infty} \frac{1}{h_N(t)} K\!\left(\frac{t - x}{h_N(t)}\right) F_N(x)\,dx = \sum_{j=1}^{W} S_j K^{*}\!\left(\frac{t - T_j}{h_N(t)}\right) \tag{8}$$

where $S_j$ is the jump of $F_N$ at $T_j$, that is, $S_j = F_N(T_j) - F_N(T_{j-1})$, $j = 2, 3, \ldots, W$, and $S_1 = F_N(T_1)$. Also, $K^{*}(u)$ is the integral of $K(x)$, that is, $K^{*}(u) = \int_{-\infty}^{u} K(x)\,dx$.
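The sum form of the smooth estimator can be sketched in Python as follows. The sketch assumes that the kernel vanishes outside $|x| \le \sqrt{5}$, in which case the integrated kernel has the closed form $K^{*}(u) = 0.5 + (0.75/\sqrt{5})(u - u^{3}/15)$ on that interval. The bandwidth is passed in as a plain number `h`, since the quantity $S$ in the bandwidth formula is not defined in this section; the function names and sample values are hypothetical.

```python
import math
from collections import Counter

SQRT5 = math.sqrt(5.0)

def kernel_cdf(u):
    """Integral of the Epanechnikov kernel from minus infinity to u."""
    if u <= -SQRT5:
        return 0.0
    if u >= SQRT5:
        return 1.0
    return 0.5 + 0.75 / SQRT5 * (u - u ** 3 / 15.0)

def smooth_cdf(samples, h):
    """Return the kernel-smoothed CDF for a fixed bandwidth h > 0."""
    n = len(samples)
    # Jumps S_j = f_j / N of the empirical CDF at the distinct times T_j.
    jumps = [(t_j, f_j / n) for t_j, f_j in sorted(Counter(samples).items())]

    def F(t):
        return sum(s_j * kernel_cdf((t - t_j) / h) for t_j, s_j in jumps)

    return F

# Hypothetical time intervals in hours and bandwidth (illustrative only).
F = smooth_cdf([1200, 1800, 1800, 2400], h=100.0)
print(F(1750), F(1800), F(1850))
```

Because $K^{*}$ rises continuously from 0 to 1, each jump of the empirical CDF is spread over a window of half-width $\sqrt{5}\,h$ around its time $T_j$.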

SGT projection validation

We do not intend to validate the approach used for SGT modeling here in the same manner as for the AHO. The reason is that SGT is modeled using a nonparametric approach. Although confidence intervals for the smooth estimates exist, they are highly technical, rely on difficult statistical theory, and may not be appropriate to present here. However, as a demonstration of the accuracy of the SGT projection, a comparison histogram is shown in Figure 11. The historical hurricane data from 1900 to 1990 are again used to derive the distribution, and the actual data of the years 1991–2001 are used for comparison. The possible SGT values are divided into a number of bins with an interval of 600. The corresponding frequency histograms of both actual and projected data are plotted, and the result is promising.

DATABASE COMPONENT

The Oracle9i database is incorporated into the system as the information storehouse that stores data records for all storms that have occurred in the Atlantic basin since the year 1851. An object-relational database schema is designed to facilitate data reusability and manageability. The major advantage brought by object-relational concepts is the ability to incorporate higher levels of abstraction into our data models, whereas current relational database models are usually highly normalized but offer little abstraction.





Figure 11. Frequencies histogram of historical/projected SGT. (Histogram: x-axis, SGT range in bins of 600, from 1200-1799 to 6000-6599; y-axis, frequency, 0 to 35; series: number of generated SGT and number of historical SGT.)

The original data set, in the format of textual files, is processed and extracted to fit into the object-relational schema. Several programs in a variety of programming languages are developed to automate the processing and populating tasks.

Hurricane data modeling

Data analysis and modeling is a vital aspect of the database component. In our system, an object-relational design pattern is applied to model hurricane data. Object-relational models can assist the reuse of the database objects. The overall view of the hurricane data schema is depicted in Figure 12.

The database schema for the HURDAT data set consists of six major object types and five major tables. The table Atmosevent_list is used to hold the tracking data for all atmosevents, namely the storms and hurricanes, which are dated from 1851 to the present day in the database. For each atmosevent, an Atmosevent object is used to model its structured information. The table Storm_category is used to store the information about the atmosevent's category and description. The relationship between the table Atmosevent_list and the table Storm_category is built by adding a foreign key into the table Atmosevent_list. The table Landfall stores a storm_id and a nested table of Landfall_type_arr objects. The foreign key storm_id of the table Landfall corresponds to the primary key key_id of the Atmosevent_list table. The table Stormfix_list is used to store the fixes of all the atmosevents, and each stormfix is represented by a Stormfix object. This table is related to the table Atmosevent_list by a foreign key event_id. Furthermore, the for_event field of the Stormfix object refers to an Atmosevent object, and its produced_id and produced_by fields refer to a Platform_type object, while its fixobj field is based on a Fix object. The table Platform_type_list is an object table of Platform_type objects. The primary key key_id of Platform_type_list corresponds to the foreign key produced_by of the table Stormfix_list.





Figure 12. Database schema.

Original data and data processing

Historical hurricane data stored in this database are directly imported from the North Atlantic ‘best track’ HURDAT database that is maintained by the National Hurricane Center in Miami, Florida and the National Climatic Data Center in Asheville, North Carolina. Currently, the ‘best track’ database covers the years 1851 to 2001.

One problem with the original data representation of the storm tracks of the Atlantic basin is that they are recorded in text files, and there is no unified format for the data entries. Hence the original data need to be processed and converted properly before they can be loaded into the database schema.

The first step in processing the original data is to extract the useful data and to remove the unwanted data, such as the format symbols. We use the database table Atmosevent_list as an example. This table stores the high-level information for all storms, and the following corresponding data fields need to be extracted from the original data file: (i) storm number, (ii) begin date of that storm and (iii) storm type. Some of the required data can be obtained directly from the original data set, while others, such as ‘storm type’, need further conversion. The ‘storm type’ field cannot be obtained directly from the original data file; instead, it has to be calculated by converting the maximum wind speed of each storm to its corresponding storm category according to some criteria. A C++ program is developed to retrieve the data and then to automatically assign a correct storm type to each storm.

As another example, the table Stormfix_list stores the detailed information about each storm or hurricane. Such information includes a storm's life line, the exact latitude and longitude, the wind speed and the central pressure at different fix points for each day, etc. Therefore, this information needs to be derived from the original data file. However, the non-uniform data entries make it difficult to directly import the needed data. A Java program is then developed to deal with the various formats of the data entries and to output a text file with unified formats, which can be loaded into the database later on. To ensure data consistency between the extracted data and the original data, data checking is done either manually or automatically through programs.
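The conversion from maximum wind speed to storm type described above can be illustrated with a short Python sketch. It is not the C++ program used in the system, and the criteria applied there are not stated in this paper; the thresholds below follow the commonly cited Saffir-Simpson wind scale in knots and are an assumption, as are the function name and the category labels.

```python
from bisect import bisect_right

# Assumption: lower bounds in knots, following the commonly cited
# Saffir-Simpson wind scale; the paper does not state its criteria.
THRESHOLDS_KT = [34, 64, 83, 96, 113, 137]
LABELS = [
    "tropical depression",
    "tropical storm",
    "category 1 hurricane",
    "category 2 hurricane",
    "category 3 hurricane",
    "category 4 hurricane",
    "category 5 hurricane",
]

def storm_type(max_wind_kt):
    """Map a storm's maximum wind speed (knots) to a storm type label."""
    if max_wind_kt < 0:
        raise ValueError("wind speed must be non-negative")
    # bisect_right counts how many lower bounds the wind speed has reached.
    return LABELS[bisect_right(THRESHOLDS_KT, max_wind_kt)]

for wind in (30, 50, 70, 120):
    print(wind, storm_type(wind))
```

Keeping the thresholds in a single table makes it straightforward to substitute whichever criteria a given data set requires.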

CONCLUSION

In this paper, a Web-based distributed system for the projection of hurricane occurrences is presented. It integrates a group of individual applications by combining hurricane data acquisition, storage, retrieval and analysis functions. The system exhibits a modular, extensible and scalable architecture that makes it possible to adapt it to more complex tasks such as storm track simulation and wind field generation. The well-established three-tier architecture is exploited to build the system. A variety of techniques such as JSP, JNI and JDBC are used in the design and development of the application. Both the Oracle Database and the Oracle Application Server are deployed to integrate the system into a coherent whole. In addition, the system is accessible to any user who is able to connect to the Internet and has an interest in hurricane prediction information.

ACKNOWLEDGEMENT

This work was partially supported by the Florida Department of Insurance (DOI) under the ‘Hurricane Risk and Insured Loss Projection Model’ project. Although the project is funded by the Florida DOI, the DOI is not responsible for the content of this paper.

REFERENCES

1. National Hurricane Center. http://www.nhc.noaa.gov/.30

2. Smith E. Atlantic and east coast hurricanes 1900–98: A frequency and intensity study for the twenty-first century. Bulletinof the American Meteorological Society 1999; 18(12):2717–2720.

3. Kurihara Y, Bender MA, Tuleya RE, Ross RJ. Improvements in the GFDL hurricane prediction system. Monthly WeatherReview 1995; 123:2791–2801.

4. Russell LR. Probability distributions for hurricane effects. Journal of Waterways, Harbors, and Coastal Engineering35

Division, ASCE 1971; 97:139–154.5. HAZUS Home. http://www.fema.gov/hazus/.6. HAZUS Overview. http://www.nibs.org/hazusweb/verview/overview.php.7. http://www.oracle.com/ip/deploy/database/oracle9i/.8. HURDAT data. http://www.aoml.noaa.gov/hrd/hurdat/Data Storm.html.40

9. Java Server Pages (TM) Technology. http://java.sun.com/products/jsp/.


Author Query

Au: Please give access dates for all web site references.

Author Query

Au: ref. 7 Please give further details.

Unc

orre

cted

proo

fs



10. Oracle9iAS Container for J2EE. http://technet.oracle.com/tech/java/oc4j/content.html.11. Panda D. Oracle Container for J2EE (OC4J). http://www.onjava.com/pub/a/onjava/2002/01/16/oracle.html.12. The JDBC API Universal Data Access for the Enterprise. http://java.sun.com/products/jdbc/overview.html.13. Java Native Interface. http://java.sun.com/docs/books/tutorial/native1.1/.14. Morisseau-Leroy N, Solomon MK, Basu J. Oracle8i: Java Component Programming with EJB, CORBA, and JSP. Oracle5

Press (McGraw-Hill/Osborne), 2000.15. The Ptolemy Java Applet package. http://ptolemy.eecs.berkeley.edu/papers/99/HMAD/html/plotb.html.16. Vickery PJ, Skerjl PF, Twisdale LA. Simulation of hurricane risk in the United States using an empirical storm track

modeling technique. Journal of Structural Engineering 2000; 126:12222–1237.

