Population Projection Assignment


Executive Summary

The main purpose of this report is to learn how to use different methods to estimate the future population of a given service area, and to analyze the accuracy and precision of each method to determine which one most accurately represents the given data. The methods used to predict the population are linear, quadratic, exponential and logistic growth regressions.

Based on several parameters (the sum of squared errors, the mean squared error, and both the F and t tests), we concluded that the best model to represent this population is the logistic model (d = 0), which predicts a population of 42,116 people by the end of 2034.

We are told that per capita daily consumption increases constantly by 2% annually, hence we developed an equation to predict the per capita water consumption by the end of 2034: 126.11 gpcd. This gives a design parameter of 3.54 MGD that the water plant must produce to keep up with future demand and avoid an expensive expansion, the construction of another facility, or buying water from other water plants.

Theory

Water demand forecasting is a tool used by utility managers to predict the amount of water that the service area of a particular water plant will need. This water demand must account for domestic, industrial, commercial, public, leakage and wastage use. In this assignment we only consider the water demand of the domestic (housing) area.

To determine the amount of water needed, we need data collected from several sources. One of those sources is the Department of Transportation (DOT). The DOT has divided areas into traffic analysis zones (TAZs), each a group of census blocks with at least one major road passing through it or touching the zone boundary; population or housing and employment data are collected for each zone for use in the traffic demand model.

Housing and population data are collected per TAZ, using the most recent data available from the DOT or from other sources such as regional building permit data and population estimates from the municipalities' Offices of Budget.

Several methods are employed to forecast water demand, but we will do a per capita water demand forecast only. A per capita water demand forecast assumes that, in a given year, each individual uses the same average amount of water (q). That average amount must be adjusted for future water demand; in our case we simply assume a 2% annual increase in per capita water demand. The average amount of water consumed per capita is then multiplied by the total population of the area to be serviced (N) to give the total system water demand (Q):

$$Q_t = N_t \cdot q_t$$

where t represents the calendar year.
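As a minimal sketch of the per capita method, the snippet below applies $Q_t = N_t \cdot q_t$ together with a constant 2% annual growth in q. Taking the 2014 value from Table 1 (84.87 gpcd) as the baseline is our assumption about how the 126.11 gpcd figure was obtained; compounding it for 20 years does reproduce that figure. The function names are hypothetical helpers, not part of any tool used in this report.

```python
GALLONS_PER_MG = 1_000_000  # 1 MGD = one million gallons per day


def per_capita_demand(q_base, base_year, year, growth=0.02):
    """Per capita demand (gpcd) compounded at a constant annual growth rate."""
    return q_base * (1.0 + growth) ** (year - base_year)


def system_demand_mgd(population, q_gpcd):
    """Total system demand Q = N * q, converted from gallons/day to MGD."""
    return population * q_gpcd / GALLONS_PER_MG


# Q = N * q reproduces the 2014 row of Table 1 (19,678 people at 84.87 gpcd).
print(round(system_demand_mgd(19_678, 84.87), 2))  # 1.67

# Compound the 2014 per capita value at 2% per year for 20 years.
q_2034 = per_capita_demand(84.87, 2014, 2034)
print(round(q_2034, 2))  # 126.11
```

Multiplying a projected 2034 population by `q_2034` with `system_demand_mgd` gives the corresponding design flow in MGD.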

In order to calculate the population in the year 2034, we will use several modeling methods: simple linear regression, quadratic regression, and exponential and logistic growth regressions.

In a simple linear model we assume that population, the dependent variable, depends only on a single parameter, the year, which is the independent variable (Soon 2004 p. 335). This model is represented by the following equation:

$$P(t) = a + m\,t$$

In the above equation P represents the population in year t, and a and m are constants that the statistical program (Excel) determines from the data collected over a period of time.
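The constants a and m can be reproduced by hand with the closed-form least-squares solution, $m = S_{xy}/S_{xx}$ and $a = \bar{P} - m\,\bar{t}$. The sketch below applies it to the Table 1 data; `linear_fit` is a hypothetical helper name, not an Excel function.

```python
# Table 1: census years and population
YEARS = list(range(1984, 2015, 2))
POPULATION = [4525, 4902, 5457, 5564, 5806, 6312, 7300, 8307,
              10621, 11543, 12546, 13651, 14563, 15308, 18302, 19678]


def linear_fit(x, y):
    """Ordinary least-squares fit of y = a + m*x; returns (a, m)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    m = s_xy / s_xx          # slope: people per year
    a = y_bar - m * x_bar    # intercept at calendar year 0
    return a, m


a, m = linear_fit(YEARS, POPULATION)
print(f"P(t) = {a:,.0f} + {m:.2f}*t")
```

The slope agrees with the 508.81 people per year of the Excel trendline; the intercept is about −1.007·10⁶ because t is the calendar year rather than years since 1984.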

Another arithmetic linear regression, explained in Reynolds and Richards' book, normalizes the values by taking the natural log of the dependent variable. In some cases where the population vs. time graph shows an S-curve, this method tends to give a better function to represent the data. It yields a function with the following general equation:

$$\ln P_t = a + m\,t \;\Leftrightarrow\; P_t = e^{a + m\,t} = e^{a} \cdot e^{m\,t} = b \cdot e^{m\,t}$$

$$\therefore\; P_t = b \cdot e^{m\,t}, \qquad b = e^{a}$$
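A minimal sketch of this log-linear fit: regress ln P on t with ordinary least squares, then back-transform the intercept with $b = e^{a}$. Run on Table 1, it reproduces the exponential trendline reported in the Analysis section (about 1.37·10⁻⁴¹·e^(0.0516t)). The helper name `exponential_fit` is hypothetical.

```python
import math

YEARS = list(range(1984, 2015, 2))
POPULATION = [4525, 4902, 5457, 5564, 5806, 6312, 7300, 8307,
              10621, 11543, 12546, 13651, 14563, 15308, 18302, 19678]


def exponential_fit(t, p):
    """Fit P = b*exp(m*t) by ordinary least squares on ln P = ln b + m*t."""
    y = [math.log(v) for v in p]  # natural-log transform of the dependent variable
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    m = s_ty / s_tt
    ln_b = y_bar - m * t_bar      # this is a in ln P = a + m*t
    return math.exp(ln_b), m      # b = e^a


b_exp, m_exp = exponential_fit(YEARS, POPULATION)
print(f"P(t) = {b_exp:.3g} * exp({m_exp:.4f} t)")
```

The tiny value of b is only a consequence of measuring t in calendar years; shifting t to years since 1984 would give the same curve with b close to the 1984 population.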

Quadratic modeling is still a linear regression model in the sense that the model is a linear combination of the unknown constants: none of them appears as the base of an exponent with a variable power, such as $a^x$ where a is constant and x varies. This model is represented by the following equation:

$$P(t) = a + b\,t + c\,t^2$$

In the above equation P and t represent population and time, respectively, and a, b and c are constants to be determined by the statistical program.
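Because the quadratic model is linear in a, b and c, it can be fitted by solving the 3×3 least-squares normal equations. The sketch below does that in plain Python, centering the years on their mean first to keep the system well conditioned (a choice of ours, not something Excel documents), then expanding back to calendar-year coefficients. The helper names are hypothetical.

```python
YEARS = list(range(1984, 2015, 2))
POPULATION = [4525, 4902, 5457, 5564, 5806, 6312, 7300, 8307,
              10621, 11543, 12546, 13651, 14563, 15308, 18302, 19678]


def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def quadratic_fit(t, p):
    """Least-squares fit of P = a + b*t + c*t^2; returns (a, b, c)."""
    t0 = sum(t) / len(t)
    u = [ti - t0 for ti in t]                       # centred years
    S = [sum(ui ** k for ui in u) for k in range(5)]
    T = [sum(ui ** k * pi for ui, pi in zip(u, p)) for k in range(3)]
    normal = [[S[0], S[1], S[2]],
              [S[1], S[2], S[3]],
              [S[2], S[3], S[4]]]
    a0, a1, a2 = solve_linear_system(normal, T)
    # Expand a0 + a1*(t - t0) + a2*(t - t0)^2 in powers of t.
    return a0 - a1 * t0 + a2 * t0 ** 2, a1 - 2 * a2 * t0, a2


a_q, b_q, c_q = quadratic_fit(YEARS, POPULATION)
print(f"P(t) = {a_q:.4g} + {b_q:.1f} t + {c_q:.2f} t^2")
```

On Table 1 this lands on the Excel trendline shown in the Analysis section (12.59t² − 49,826t + 49.3·10⁶).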

Exponential growth represents a change of the population relative to the initial population that increases at a consistent rate over time: $P(t) = P_0\, e^{a\,t}$, where $P_0$ represents the initial population and a is a constant to be determined. Comparing the exponential growth equation with the arithmetic linear equation normalized by the natural log shows that both equations have the same form ($P_0$ plays the role of b, and a the role of m). Reynolds and Richards simply demonstrate how the equation is derived and how it relates to a linear fit.

Logistic growth represents a more realistic situation over a long period of time, since every area has a limit to how many people it can support. This model represents rapid growth, followed by steady growth, after which the growth rate decreases gradually until the population reaches its limit, the maximum number of people the current system can support:

$$P(t) = \frac{c}{1 + a\, e^{-b\,t}} + d$$

where a and b set the position and steepness of the curve, the population levels off at c + d, and d is a vertical offset (d = 0 in the basic logistic model).
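To see why the choice of d matters, the sketch below evaluates the logistic form above far in the past, at its midpoint, and far in the future. The lower asymptote is d and the upper asymptote is c + d, so with d = 0 the model implies the population once approached zero. The values c = 27,383 and d = 2,832.68 are taken from the d ≠ 0 fit reported in the Analysis section; a = 1 and b = 0.1 are hypothetical values chosen only for illustration.

```python
import math


def logistic(t, a, b, c, d=0.0):
    """Logistic model P(t) = c / (1 + a*exp(-b*t)) + d.

    Lower asymptote (t -> -inf) is d; upper asymptote (t -> +inf) is c + d.
    """
    return c / (1.0 + a * math.exp(-b * t)) + d


# c and d from the d != 0 fit; a and b are hypothetical illustration values.
c, d = 27_383, 2_832.68
a, b = 1.0, 0.1
for t in (-1000, 0, 1000):
    print(t, round(logistic(t, a, b, c, d), 2))
```

With a = 1 the midpoint falls at t = 0, where the curve sits halfway between the two asymptotes (c/2 + d).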

Due to equipment constraints, the logistic regressions will be developed using the TI-Nspire CAS software, which can perform logistic regression with both d = 0 and d ≠ 0. I will do both logistic regressions to show that, for this set of data, the d ≠ 0 logistic regression should be used, since the population within the analyzed time range is never zero. Other programs, such as Minitab, can also perform this type of regression; Excel cannot.

In order to evaluate which of the models above represents the data most accurately and precisely, we will compare several parameters: standard error (s), mean square error (MSE), hypothesis testing with t and F statistics at a selected significance level (α), correlation coefficient (r), coefficient of determination (r²), and confidence and prediction intervals. We will select the model with the least error and the highest correlation.

The method of least squares determines the linear fit $\hat{P}(t) = a + m\,t$ by minimizing the error over all the data points, where the error is simply the difference between the value given by the regressed equation and the data: $e = \hat{P}(t) - P(t)$. This method minimizes the sum of the squared errors (SSE):

$$SSE = \sum_{i=1}^{n} \left[P(t_i) - (a + m\,t_i)\right]^2 \cong (n-1)\,\sigma^2$$

$$SS = (n-1)\,\sigma^2$$

where σ² is the variance (σ is the standard deviation) and n is the number of samples or data points.
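The error measures used to compare the models can be computed directly from the residuals. In this sketch, MSE divides SSE by the residual degrees of freedom (n minus the number of fitted constants, i.e. n − 2 for a straight line), which is the common regression convention and an assumption on our part; the standard error s is the square root of MSE. The helper name `error_stats` is hypothetical.

```python
import math


def error_stats(observed, predicted, n_params=2):
    """Return (SSE, MSE, s) for a fitted model.

    Residuals follow e_i = P_hat(t_i) - P(t_i); MSE uses n - n_params
    degrees of freedom (n - 2 for a straight-line fit).
    """
    errors = [p_hat - p for p, p_hat in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / (len(observed) - n_params)
    return sse, mse, math.sqrt(mse)


# Toy example: four observations and the values a fitted line predicts for them.
sse, mse, s = error_stats([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(f"SSE = {sse:.2f}, MSE = {mse:.3f}, s = {s:.4f}")
```

For the quadratic model, pass `n_params=3`, since it fits three constants.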

For this assignment we will only do null hypothesis testing, where the null hypothesis states that the two sets of data (the data given and the data generated from the regression) have the same mean or variance and therefore represent the same population. We will compare the calculated F and t statistics against the corresponding 95% confidence values from the respective tables. Note that the F statistical analysis is also known as the F test; likewise, the t statistical analysis is also known as the t test. Excel can compute both the F and t tests, giving a probability of how closely the two sets of data are related.

$$F_{calc} = \frac{\text{larger variance}}{\text{smaller variance}} \qquad\text{and}\qquad t_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

from Benefield and Randall, where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two data sets, $n_1$ and $n_2$ their sizes, and $s_p$ the pooled standard deviation.

If $F_{calc}$ is greater than $F_\alpha$ (the value from the table), reject the null hypothesis and conclude that the variances are not equal. Otherwise, the test fails to reject the null hypothesis: there is not enough evidence to conclude that the variances differ, hence we can infer that the regression does model the data.

If $t_{calc}$ is greater than $t_{\alpha,df}$, where df is the degrees of freedom, equal to the total number of data points compared from both the data given and the regressed model minus two ($df = n_1 + n_2 - 2$), reject the null hypothesis and conclude that the means are not equal. Otherwise, you can only conclude that there is not enough evidence that the means differ.
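Both statistics can be computed with Python's standard `statistics` module. The sketch below follows the formulas above: the larger sample variance over the smaller for F, and the pooled two-sample t with $df = n_1 + n_2 - 2$. The critical values $F_\alpha$ and $t_{\alpha,df}$ still come from the tables; the helper names and the toy data are illustrative assumptions.

```python
import math
from statistics import mean, variance


def f_statistic(x1, x2):
    """F_calc = larger sample variance / smaller sample variance."""
    v1, v2 = variance(x1), variance(x2)
    return max(v1, v2) / min(v1, v2)


def pooled_t_statistic(x1, x2):
    """Two-sample t statistic with a pooled standard deviation; returns (t_calc, df)."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df  # pooled variance
    t = (mean(x1) - mean(x2)) / (math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2))
    return t, df


# Toy data: compare |t_calc| and F_calc with table values at the chosen alpha.
data, fitted = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
t_calc, df = pooled_t_statistic(data, fitted)
print(f"F = {f_statistic(data, fitted):.2f}, t = {t_calc:.4f}, df = {df}")
```

In the report's setting, `data` would be the observed populations and `fitted` the values the regression predicts for the same years.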

The correlation coefficient (r) and the coefficient of determination (r²) measure how closely the equation represents a set of data: r ranges from −1 to 1 and r² from 0 to 1, and values closer to one indicate a more accurate representation. No equations are given for these two parameters, since Excel computes them for us.
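For reference, r can be computed directly as $r = S_{xy}/\sqrt{S_{xx}\,S_{yy}}$ (the Pearson correlation coefficient), and for a straight-line fit r² equals the coefficient of determination. The sketch below applies this to the Table 1 population against year; the helper name `correlation` is hypothetical.

```python
import math

YEARS = list(range(1984, 2015, 2))
POPULATION = [4525, 4902, 5457, 5564, 5806, 6312, 7300, 8307,
              10621, 11543, 12546, 13651, 14563, 15308, 18302, 19678]


def correlation(x, y):
    """Pearson correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = correlation(YEARS, POPULATION)
print(f"r = {r:.5f}, r^2 = {r * r:.5f}")
```

The resulting r² matches the 0.95006 Excel reports for the linear trendline; for curved models, Excel's R² is computed against that model's own fitted values instead.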

Data

Table 1. Population, Flow and Per Capita Data

Year    Population (people)    Flow (MGD)    Flow per Capita (gpcd)
1984     4,525                 0.24          53.04
1986     4,902                 0.26          53.04
1988     5,457                 0.31          56.81
1990     5,564                 0.33          59.31
1992     5,806                 0.35          60.28
1994     6,312                 0.40          63.37
1996     7,300                 0.48          65.75
1998     8,307                 0.55          66.21
2000    10,621                 0.75          70.61
2002    11,543                 0.83          71.91
2004    12,546                 0.92          73.33
2006    13,651                 1.04          76.18
2008    14,563                 1.15          78.97
2010    15,308                 1.23          80.35
2012    18,302                 1.52          83.05
2014    19,678                 1.67          84.87


Analysis

I will do two linear regressions: a simple linear regression and a natural log regression. For a simple linear regression, we plot the year on the x-axis and the population on the y-axis, and Excel computes the rest:

[Figure: Linear Regression. Population (× 10,000) vs. Year, 1980 to 2015; P(t) = 508.81·t − 10⁶, R² = 0.95006]

Linear regression as a function of a natural log:

[Figure: Natural log function. Population (× 10,000) vs. Year, 1980 to 2015; P(t) = 1.02·10⁶·ln(t) − 7.72·10⁶, R² = 0.94926]

As seen above, this transformation loses accuracy (R² = 0.94926 versus 0.95006 for the simple linear regression), hence the natural log regression is not adequate to model this data.


[Figure: Quadratic Regression. Population (× 10,000) vs. Year, 1980 to 2015; y = 12.59x² − 49,826x + 49.3·10⁶, R² = 0.98915]

[Figure: Exponential Regression. Population (× 10,000) vs. Year, 1980 to 2015; y = 1.37·10⁻⁴¹·e^(0.0516x), R² = 0.98291]

Logistic Regressions:

d = 0:

$$P(t) = \frac{83{,}556.09}{1 + 19.6\, e^{-\ldots\,(t-1984)}}$$

Year:      1984   1988   1992   1996   2000   2004   2008   2012   2014  


d ≠ 0:

$$P(t) = \frac{27{,}383}{1 + 2.062 \cdot 10^{\ldots}\, e^{-\ldots\,t}} + 2{,}832.68$$

Year:      1984   1988   1992   1996   2000   2004   2008   2012   2014