Lognormality in the observed size distribution of oil and gas pools as a consequence of sampling...



Page 1: Lognormality in the observed size distribution of oil and gas pools as a consequence of sampling bias

Mathematical Geology, Vol. 24, No. 8, 1992

Lognormality in the Observed Size Distribution of Oil and Gas Pools as a Consequence of Sampling Bias¹

M. Power²

Economic filtration has been offered as an explanation of the observed lognormality in the size distribution of discovered oil and gas deposits. The result leads to the conclusion that one cannot impute the shape of the underlying parent distribution from the observed discoveries size distribution. The fact that the largest pools tend to be discovered early in the exploration history of an area of interest suggests the existence of an inherent sampling bias in the discovery process. The bias is influenced by the levels of geologic knowledge and technological sophistication. Furthermore, the existence of the bias leads to lognormality in the observed discoveries size distribution of oil and gas pools. A discovery process model explicitly incorporating the notion of sampling bias was applied to a series of Weibull parent frequency size distributions. The selected parent distributions are of a class suggested in the literature as more reflective of nature's size distribution and have empirical support. The distributions of discoveries resulting from the application of the model to the chosen parent size distributions were tested for lognormality using a chi-squared test. Lognormality was found to be an acceptable model of the discoveries size distribution over a wide range of resource exhaustion measures. When combined with the notion of economic filtration, sampling bias leads to the conclusion that one should not expect the lognormal distribution to accurately represent the underlying parent size distribution of oil and gas deposits.

KEY WORDS: economic filtration, oil and gas pool-size distributions, resource appraisal.

INTRODUCTION

The observed discoveries size distributions of oil and gas pools in an area of exploratory interest are typically described as lognormal. Arps and Roberts (1958) were among the first to suggest such a distribution. McCrossan (1968) established that the discovered oil and gas reserves of Western Canada were lognormally distributed. Barouch and Kaufman (1976), Lee and Wang (1983a,b), and Forman and Hinde (1985) have all employed the lognormal distribution as an integral part of petroleum exploration modeling frameworks.

¹ Received 19 March 1991; accepted 21 April 1992.
² Department of Management Sciences, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada.

0882-8121/92/1100-0929$06.50/1 © 1992 International Association for Mathematical Geology

Concern about probable oil and gas shortages has focused attention on the number of smaller pools remaining to be discovered and raised questions about the nature of the parent population of oil and gas pools. Many have argued that there is a difference between the observed discoveries size distribution and the parent frequency size distribution. The observed discoveries size distribution has been shown, in many instances, to be a consequence of economic filtration. In such cases, the observed discoveries size distributions are not reliable guides when assessing the probable number of smaller pools remaining to be discovered. Arps and Roberts (1958) first suggested the notion of economic filtration in studying the economics of petroleum exploration in the Denver-Julesburg basin when they wrote that "the tapering-off on the left-hand side of the mode, however, must be largely caused by economic factors." Attanasi and Drew (1985), using discoveries data from the Gulf of Mexico, demonstrated that the observed lognormal size distributions should be regarded as the result of an economic filtering process. Drew et al. (1988) examined the observed discoveries size distribution in the state and federal waters offshore Texas and concluded that the shape of the observed discoveries size distributions was sensitive to changes in key economic parameters, such as the resource price.

The evidence strongly suggests that the observed discoveries size distributions of oil and gas pools are influenced by economic factors such as the price for the resource and its production costs. Clearly, then, as Drew et al. (1988) point out, one "should not be confident about inferring the form and specific parameters of the parent field size distribution from the observed distributions." Baker et al. (1984) argued that "in nature's distribution, numbers of deposits probably increase progressively in successively smaller sizes down to droplets and molecules; such a distribution is not lognormal." The offered explanation for the observed discovery size distributions is, again, economic filtration. Baker et al.'s description of the natural size distribution strongly suggests a class of "J-shaped" distributions as ideal candidates for the underlying parent size distributions. When coupled with the notion of sampling bias (Arps and Roberts, 1958; Attanasi et al., 1980; Drew et al., 1980), the observed lognormality in the size distribution of discovered pools can be demonstrated to arise for reasons other than economic filtration.

In this paper, a number of parent Weibull frequency size distributions are sampled using a sampling without replacement model of the discovery process explicitly incorporating the notion of sampling bias. The resulting discoveries size distributions are then tested for lognormality using a chi-squared goodness-of-fit test and conclusions are drawn about the acceptability of the lognormal model as an accurate representation of the underlying parent frequency size distribution.


THE MODELING FRAMEWORK

The discovery process modeling approach used in this paper relies on the postulates suggested by Barouch and Kaufman (1976) and Arps and Roberts (1958). These may be stated as follows:

1. The discovery of pools within an area of exploratory interest can be modeled statistically as sampling without replacement from an underlying population of pools.

2. The discovery of a particular pool within the available population of undiscovered pools is random with the probability of discovery being proportional to the areal extent of the pool.

Consider a resource base consisting of N pool-size classes with areal sizes $P_1, P_2, \ldots, P_N$. Each occurs within the population of pools with frequency $F_1, F_2, \ldots, F_N$. One of the pool-size classes represents the dry hole outcome, with the area of the dry hole being based on the minimum spacing allowed between exploratory wells. In a completely unexplored basin, if the drilling locations were chosen at random, then the probability of discovering a pool of size $P_j$ with the first exploratory well, i = 1, would be expressed simply as the ratio of the product of its occurrence and size to the area available for exploration:

$$\Pr(X_1 = P_j) = \frac{F_j P_j}{\sum_{k=1}^{N} F_k P_k} \tag{1}$$

Evidence from the drilling histories of basins in the United States, however, suggests that companies are in fact more efficient at discovering pools than the random drilling rule would suggest. Drew et al. (1980) estimated that exploration in the Denver basin was 2.58 times as efficient as sampling proportionally to areal extent. Similar results were found by Arps and Roberts (1958). This suggests that a parameter, β, relating the influence of a pool's areal extent to the speed with which the pool is discovered be included in the modeling process. The introduction of the parameter is crucial to the model. It captures the degree of geologic understanding firms have of the area they are exploring and the degree of technical sophistication. In effect, the β parameter embodies the technical bias existing toward the discovery of larger pools and, as such, may be appropriately regarded as a sampling bias parameter.

In cases when β = 0, pool size has no influence on the speed with which pools in a particular pool-size class are found; hence, β = 0 is an unrealistic value. If β = 1, sampling is directly proportional to pool size and the model represents random choices of drilling locations in the remaining area of interest. If β > 1, pools are sampled from the underlying distribution more than proportionally to relative pool size. This introduces a sampling bias toward the largest pools and produces a result where the largest pools are found more quickly than with random drilling. The β parameter works by increasing the relative weighting of larger pools in the probability expression defined in Eq. (1), above, thus increasing the speed with which they are discovered.
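The effect of the sampling bias parameter on the first-well probabilities can be illustrated with a minimal sketch. It assumes the weighting takes the form of the class frequency times the areal size raised to the power β, which reduces to Eq. (1) when β = 1; the function name and the three-class resource base are hypothetical and are not taken from the paper.

```python
def discovery_probabilities(frequencies, areas, beta):
    """First-well discovery probabilities with size weighting areas**beta.

    With beta = 1 this is the area-proportional rule of Eq. (1).
    """
    weights = [f * a ** beta for f, a in zip(frequencies, areas)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical resource base: many small pools, few large ones.
frequencies = [100.0, 10.0, 1.0]
areas = [1.0, 4.0, 16.0]

for beta in (0.0, 1.0, 2.0):
    probs = discovery_probabilities(frequencies, areas, beta)
    print(beta, [round(p, 3) for p in probs])
```

In this illustration the probability attached to the largest class rises as β increases, which is the sense in which β > 1 biases sampling toward the largest pools.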

The number of pools of size $P_j$ remaining prior to the drilling of the ith well depends upon the number of pools of size $P_j$ discovered by the drilling of wells 1 to i − 1. Defining the previous number of discoveries of size $P_j$ as $M_{j,i-1}$, the probability that the ith well will result in the discovery of pool size $P_j$, $\Pr(X_i = P_j)$, can then be written as a function of the number of pools of size j remaining, the areal extent of those pools, and the sampling bias parameter as follows:

$$\Pr(X_i = P_j \mid M_{j,i-1}) = \frac{(F_j - M_{j,i-1})\, P_j^{\beta}}{\sum_{k=1}^{N} (F_k - M_{k,i-1})\, P_k^{\beta}} \tag{2}$$

for j = 1, ..., N.

Setting the $M_{j,i}$ values to the cumulative expected values of the previously drilled wells and re-normalizing the probability expressions allows the calculation of a vector of successive future discoveries expectations conditioned on previously predicted discovery results. The approach relies on incorporating geological predictions, or assumptions about the form of the parent frequency size distribution, directly into the modeling framework and allows specific estimation of the β parameter based on historical drilling experience (Power and Fuller, 1991). Based on evidence from the literature, the work completed here assumes a Weibull parent frequency size model and uses a FORTRAN program to produce the discovery predictions defined by Eq. (2).
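The expected-value recursion described above can be sketched as follows. This is not the FORTRAN program referred to in the text; it is a minimal Python illustration that assumes each well adds its class probabilities to the cumulative expected discoveries and removes the same amounts from the remaining frequencies. The function name, the clamp on the remaining frequency, and the example resource base (with the first class standing in for dry holes) are all hypothetical.

```python
def expected_discoveries(frequencies, areas, beta, n_wells):
    """Cumulative expected discoveries per class after n_wells wells.

    Each well contributes Pr(X_i = P_j) from Eq. (2) to class j, and the
    remaining frequency F_j - M_j is reduced by the same expected amount.
    """
    remaining = list(frequencies)
    found = [0.0] * len(frequencies)
    for _ in range(n_wells):
        weights = [r * a ** beta for r, a in zip(remaining, areas)]
        total = sum(weights)
        if total <= 0.0:
            break  # nothing left to discover
        for j, w in enumerate(weights):
            # Clamp so a class is never depleted below zero (assumption).
            p = min(w / total, remaining[j])
            found[j] += p
            remaining[j] -= p
    return found


# Hypothetical classes: dry holes, small pools, large pools.
frequencies = [1500.0, 100.0, 10.0]
areas = [1.0, 4.0, 16.0]
found = expected_discoveries(frequencies, areas, beta=2.0, n_wells=40)
print([round(x, 3) for x in found])
```

With β above 1, the fraction of the large-pool class expected to be found after a given number of wells exceeds the fraction of the small-pool class, reflecting the early discovery of the largest pools.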

Alternatively, given a sequence of discoveries, it is possible to derive from Eq. (2) estimates of the parent frequency size distribution and the parameters describing the discovery process. O'Carroll and Smith (1980) and Smith and Ward (1981) provide examples of this second approach. Using volume measures of the discovery size in place of the areal measures used here, their approach has been successfully used to examine the consistency of functional restrictions on the form of the parent frequency size distribution and on the sampling bias parameter with the observed discovery history of the North Sea.

THE PARENT FREQUENCY SIZE DISTRIBUTIONS

The effect of sampling bias on the observed discoveries size distribution was examined using the discovery process model discussed above and a series of Weibull parent frequency size distributions. The use of the Weibull distribution as a model for the parent pool-size distribution can be justified on the grounds that it is of the "J-shaped" class of distributions argued by Baker et al. (1984) to be more representative of nature's size distribution. Furthermore, modeling work completed for Canada's Scotian Shelf and Western Sedimentary Basin (Power, 1990, 1992) found that the Weibull distribution adequately described the estimated pool-size distribution for a number of considered plays in both regions. The results for the Scotian Shelf are unique in the sense that the estimated discoveries size distribution combined information on actual discoveries and geologically probable discoveries to produce a description of the size distribution of all the pools available for discovery. Accordingly, the data set does not incorporate the effects of economic filtration that commonly bias most data sets. Finally, Smith and Ward (1981) report testing results completed for the North Sea discoveries data that suggest a "J-shaped" [Weibull] distribution "would better represent the geological process of petroleum deposition in the North Sea." The agreement between the tested results and the suggestions of Smith and Ward and Baker et al. concerning "J-shaped" distributions is striking. Accordingly, a series of Weibull size distributions was generated using the following cumulative distribution function:

$$F(x) = \begin{cases} 1 - \exp\left[-(x/20)^{\alpha}\right] & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

The Weibull distribution is fully defined by two parameters: a shape parameter, α, and a scale parameter. Here the scale parameter has been set equal to 20. The shape parameter, however, has been varied to produce a series of closely related parent frequency size distributions. When α = 1, an exponential distribution with mean 20 is defined. An α value less than 1 produces a more sharply defined "J-shaped" distribution and an α value greater than 1 produces a less sharply defined "J-shaped" distribution.

Pool-size classes were defined in arbitrary volume units starting at 0 and increasing by increments of 15 as follows: pool-size class 1 ranged from 0 to 15, pool-size class 2 from 15 to 30, and so on. The number of pools falling into each of the defined pool-size classes was determined as follows:

$$\text{Number of pools in size class } j = F(u_j) - F(l_j) \tag{4}$$

where $l_j$ = lower limit of the jth class, $u_j$ = upper limit of the jth class, and F(x) = cumulative distribution function of the Weibull distribution.
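The discretization can be sketched in a few lines of Python. Two points in the sketch are inferences rather than statements from the text: the class probabilities are scaled by a total of 200 pools, and the final class (195-210) is treated as an open-ended upper tail. Both were chosen because, for the shapes checked here, they reproduce the frequencies tabulated in Table 1 to three decimal places. The function names are hypothetical.

```python
import math


def weibull_cdf(x, shape, scale=20.0):
    """Weibull cumulative distribution function of Eq. (3)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def class_frequencies(shape, n_pools=200.0, width=15.0, n_classes=14):
    """Frequencies per pool-size class from differences of the CDF.

    n_pools = 200 and the open-ended last class are assumptions inferred
    from the tabulated values, not stated in the text.
    """
    counts = []
    for j in range(n_classes):
        lower, upper = j * width, (j + 1) * width
        if j == n_classes - 1:
            mass = 1.0 - weibull_cdf(lower, shape)
        else:
            mass = weibull_cdf(upper, shape) - weibull_cdf(lower, shape)
        counts.append(n_pools * mass)
    return counts


for alpha in (0.90, 1.00, 1.10):
    print(alpha, [round(c, 3) for c in class_frequencies(alpha)[:3]])
```

For α = 1.00 the first two classes evaluate to about 105.527 and 49.847, matching the corresponding entries of Table 1.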

Application of the methodology to Weibull distributions with shape parameters of 0.90, 0.95, 1.00, 1.05, and 1.10 produced the discretized size distributions given in Table 1. The choice of the range for α, though somewhat arbitrary, has some basis in geologic observation. Discoveries data for some 26 plays in Canada's Western Sedimentary Basin were tested by Power (1992) using the Anderson-Darling test statistics developed by Stephens (1986). Thirteen of the 26 plays were accepted as being best described by the Weibull model on the basis of their p-values. In those plays that tested Weibull, α averaged 1.007, though values for α ranged from 0.488 to 1.547. Thus, the selected range for the testing methodology discussed represents the mid-20% of the observed range for α. Furthermore, the selected range covers the values of α where the Weibull distribution changes from being predominantly "J-shaped", when α < 1, to being unimodally humped in shape, when α > 1, and, accordingly, considers a range of possible shapes for the distribution.

Table 1. Parent Size Distributions ᵃ

Class interval    α = 1.10   α = 1.05   α = 1.00   α = 0.95   α = 0.90
0-15              103.496    104.509    105.527    106.548    107.572
15-30              54.564     52.215     49.847     47.462     45.061
30-45              24.509     24.070     23.546     22.939     22.249
45-60              10.403     10.800     11.122     11.362     11.513
60-75               4.260      4.765      5.254      5.711      6.122
75-90               1.698      2.077      2.482      2.900      3.318
90-105              0.663      0.897      1.172      1.484      1.824
105-120             0.254      0.384      0.554      0.765      1.014
120-135             0.096      0.163      0.262      0.396      0.569
135-150             0.036      0.069      0.124      0.206      0.322
150-165             0.013      0.029      0.058      0.108      0.184
165-180             0.005      0.012      0.028      0.056      0.105
180-195             0.002      0.005      0.013      0.030      0.061
195-210             0.001      0.004      0.012      0.033      0.085

ᵃ The frequencies in each of the defined pool-size classes are given in the table above for a series of Weibull [α, 20] distributions. The α parameter defines the shape of the resulting distribution.

Before the volumetric information could be effectively used by the discovery process model, it was converted to areal information. Following Harbaugh et al. (1977), the relationship between volume and area was expressed as a fractional power, with area being represented by the square of a linear dimension and volume by the cube of a linear dimension. Pools were further assumed to be geometrically similar such that the area-volume relationship could be defined as follows:

$$A = k V^{2/3} \tag{5}$$

where A = pool area in km², V = volume of the hydrocarbons-in-place, and k is an assumed constant.

The above model was then used to calculate the number of discoveries in each of the defined pool-size classes as a result of drilling in a hypothetically defined area of exploration interest totalling 2000 km². Dry holes were assumed to remove a square kilometer of area from further exploratory interest. This amounted to an area equivalent to about 26% of the smallest pool-size class and is consistent with the minimum spacing required between wells before new wells can be classified as new field wildcats. The resulting distributions of discoveries by pool-size class were then examined for lognormality using a chi-squared goodness-of-fit test, beginning after the completion of the 40th well and at 20-well increments thereafter.
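The area-volume conversion of Eq. (5) can be illustrated with a short sketch. The value of k is not given in the text; the sketch assumes k = 1 and represents the smallest pool-size class by its mid-point volume of 7.5 units. Under those two assumptions a 1 km² dry hole comes out at roughly 26% of the smallest class area, which is in line with the figure quoted above, but this agreement should not be read as confirmation of the constant actually used.

```python
def pool_area(volume, k=1.0):
    """Pool area from volume under geometric similarity, A = k * V**(2/3).

    k = 1.0 is an assumed value; the source only says k is a constant.
    """
    return k * volume ** (2.0 / 3.0)


# Smallest class (0-15 volume units) represented by its mid-point (assumption).
smallest_class_area = pool_area(7.5)
dry_hole_share = 1.0 / smallest_class_area  # 1 km^2 dry hole relative to it

print(round(smallest_class_area, 3), round(dry_hole_share, 3))
```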

THE CHI-SQUARED TEST

The chi-squared goodness-of-fit test is the oldest and best known of the goodness-of-fit tests. It is useful for examining the assumption that a set of data can be reasonably regarded as being a sample drawn from a specified parent distribution. The most problematic aspect of applying the chi-squared test is the choice of the number and size of the intervals. No definitive prescription can be given which will guarantee good results. Law and Kelton (1991) state that the test will be approximately valid if the chosen number of intervals is greater than or equal to 3 and the expected number of observations in each interval equals or exceeds 5. DeGroot (1986) states the test will be satisfactory if the expected number of observations in each interval equals or exceeds 1.5. Just how small the expected number of observations in a class interval, j, can become, however, is not clear. For the distributions generated and tested in this paper there were a number of instances when the average expected value was less than 1. While this might appear to compromise the validity of the test, Slakter (1973) has argued that the number of classes can exceed the number of observations without compromising the appropriateness of the test. That, in turn, implies that the average expected value in the classes can be less than 1.

In testing for lognormality, it is often easier to exploit the connection between the lognormal and the normal distributions. The random variable X is said to be distributed lognormally with parameters μ and σ² if, and only if, the natural logarithm of X is distributed normally with mean μ and variance σ². Thus, as was done here, if the data $X_1, X_2, \ldots, X_N$ are thought to be lognormally distributed, the natural logarithms of the data points, $\ln X_1, \ln X_2, \ldots, \ln X_N$, may be treated as normally distributed data for the purposes of parameter estimation and goodness-of-fit testing.

As the employed discovery process model produced grouped data, the population parameters required to conduct the chi-squared test were estimated using grouped data techniques. Specifically, the mean and variance were estimated as follows:

$$\bar{x} = \frac{1}{n} \sum_{i} m_i f_i, \qquad s^2 = \frac{1}{n-1} \left[ \sum_{i} m_i^2 f_i - \frac{\left( \sum_{i} m_i f_i \right)^2}{n} \right] \tag{6}$$

where $m_i$ = mid-point of the ith class, $f_i$ = frequency of the ith class, n = sample size, and the sums run over the class intervals.
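The grouped-data estimators can be written out as a short sketch. It uses the standard grouped mean and variance formulas with n taken as the sum of the class frequencies; whether the mid-points were logged before or after taking the class centre is not stated in the text, so applying the function to logged mid-points, as in the example, is an assumption. The function name and the example data are hypothetical.

```python
import math


def grouped_moments(midpoints, frequencies):
    """Grouped-data sample mean and variance, following Eq. (6)."""
    n = sum(frequencies)
    s1 = sum(m * f for m, f in zip(midpoints, frequencies))
    s2 = sum(m * m * f for m, f in zip(midpoints, frequencies))
    mean = s1 / n
    variance = (s2 - s1 * s1 / n) / (n - 1)
    return mean, variance


# Hypothetical discoveries by class, with mid-points 7.5, 22.5, 37.5.
class_midpoints = [7.5, 22.5, 37.5]
class_counts = [12.0, 9.0, 4.0]

# For a lognormality test the moments are taken on the log scale (assumption:
# the logarithm of each class mid-point is used as the class value).
log_midpoints = [math.log(m) for m in class_midpoints]
print(grouped_moments(log_midpoints, class_counts))
```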

The expected numbers of observations in each class were calculated using the normal integral algorithm of Hill (1973). The algorithm allows the quick computation of the area under the standard normal curve. The density contained in each of the class intervals was then scaled by the sample size of the tested data to produce estimates of the expected numbers in each of the defined class intervals. Comparison of the observed and expected discoveries numbers in each class was completed using the chi-squared test and the results tabulated for presentation in the tables that follow.
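The calculation of expected class counts and the test statistic can be sketched as follows. The sketch substitutes the error function from the Python standard library for the normal integral algorithm of Hill (1973), and it treats the first and last classes as open-ended so that the expected counts sum to the sample size; both choices are assumptions made for illustration. The function names and example parameters are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via math.erf (stand-in for Hill's algorithm)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_counts(edges, n, mu, sigma):
    """Expected lognormal counts per class for class boundaries `edges`.

    The lowest and highest classes are treated as open tails (assumption).
    """
    cuts = [normal_cdf((math.log(e) - mu) / sigma) for e in edges[1:-1]]
    cumulative = [0.0] + cuts + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cumulative[:-1], cumulative[1:])]


def chi_squared(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


edges = [0.0, 15.0, 30.0, 45.0, 60.0]
observed = [12.0, 9.0, 4.0, 2.0]
expected = expected_counts(edges, sum(observed), mu=math.log(15.0), sigma=0.8)
print(round(chi_squared(observed, expected), 3))
```

The statistic would then be compared with a chi-squared critical value at the chosen significance level, with degrees of freedom equal to the number of classes minus one, less the two parameters estimated from the data.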

EXPERIMENTAL RESULTS

The hypotheses tested were as follows:

The null hypothesis: the discoveries size distribution resulting from a biased sampling procedure is lognormal.

The alternative hypothesis: the discoveries size distribution resulting from a biased sampling procedure is not lognormal.

The results of the chi-squared testing of these hypotheses are presented in Tables 2-6 in terms of the percentage of the in situ geologic resources that are discovered by exploration drilling. After each increment of drilling was completed, the number of discoveries in each pool-size class was multiplied by the average volume of representative pools in the pool-size class. A summation across all class intervals was then completed to arrive at an estimate of the total quantity of discovered resources. The total quantity discovered figures were then divided by the total quantity available for discovery to arrive at an estimate of the area's exhaustion.
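The exhaustion measure described above amounts to a ratio of two volume-weighted sums. A minimal sketch, assuming the same average class volume is used for the discovered and the available pools, is given below; the function name and the two-class example values are hypothetical.

```python
def exhaustion(discoveries, parent_frequencies, average_volumes):
    """Share of the in situ resource discovered so far.

    Numerator: discoveries per class times the average class volume.
    Denominator: parent frequencies per class times the same volumes.
    """
    found = sum(d * v for d, v in zip(discoveries, average_volumes))
    available = sum(f * v for f, v in zip(parent_frequencies, average_volumes))
    return found / available


# Hypothetical two-class example.
print(round(exhaustion([1.0, 2.0], [10.0, 4.0], [7.5, 22.5]), 3))
```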

A total of 380 data sets were tested. These represented a data set for the discoveries distribution for successive 20-well increments of drilling activity in the range of 40-400 wells for each of the considered sampling bias and distributional shape parameters. Of the 380 data sets tested, 225 were compatible, and 155 incompatible, with the lognormal hypothesis at the 0.05 level of significance. Combinations of the wells drilled and sampling bias parameter [β], for a given shape parameter, that allowed acceptance of the lognormal hypothesis report the relevant resource exhaustion measure in the appropriate cell of Tables 2-6. Combinations of the wells drilled and sampling bias parameter [β], for a given shape parameter, that rejected the lognormal hypothesis have the appropriate cell of Tables 2-6 left blank.

Certain patterns in Tables 2-6 are evident. Defining a data grouping as the testing results for a common shape [α] parameter, it is clear that within a data grouping the smaller the sampling bias parameter [β], the larger the drilling horizon over which lognormality is maintained as an acceptable description of the discoveries data. Within a data set, there is no correlation between β and the highest resource exhaustion rate measure for which the lognormal distribution was accepted as an adequate description of the discoveries data. Tests of the correlation between β and exhaustion provided no evidence of significant correlation, which suggests that, regardless of β, lognormality is maintained to a more or less common resource base exhaustion point.

Table 2. Drilling and Exhaustion Results for the Weibull [0.90, 20] Distribution ᵃ

                  Sampling bias
Number of wells   β = 2.00   β = 1.75   β = 1.50   β = 1.25
40                0.389      0.358      0.319      0.273
60                0.529      0.493      0.446      0.387
80                0.640      0.603      0.553      0.487
100               0.727      0.692      0.642      0.574
120               0.794      0.762      0.715      0.648
140               0.845      0.847      0.775      0.711
160               0.883      0.859      0.822      0.764
180               0.912      0.892      0.860      0.808
200               0.934      0.917      0.890      0.844
220               0.952      0.936      0.913      0.874
240               0.966      0.952      0.932      0.898
260               0.977      0.965      0.947      0.918
280                          0.975      0.959      0.935
300                                     0.969      0.948
320                                     0.977      0.959
340                                                0.967
360                                                0.975

ᵃ The distributions for each combination of wells and β were tested for lognormality. For those distributions for which lognormality proved a statistically adequate description of the data, the percentage of the existing resources discovered by exploration (exhaustion) is reported. No figures are reported for the distributions rejected as lognormal.

Table 3. Drilling and Exhaustion Results for the Weibull [0.95, 20] Distribution ᵃ

                  Sampling bias
Number of wells   β = 2.00   β = 1.75   β = 1.50   β = 1.25
40                0.374      0.343      0.306      0.261
60                0.512      0.476      0.430      0.372
80                0.625      0.587      0.536      0.470
100               0.714      0.677      0.626      0.557
120               0.784      0.750      0.701      0.631
140               0.837      0.807      0.762      0.695
160               0.877      0.851      0.811      0.750
180               0.907      0.886      0.851      0.796
200               0.931      0.912      0.883      0.834
220               0.949      0.933      0.908      0.865
240                          0.949      0.927      0.891
260                                     0.943      0.912
280                                                0.929
300                                                0.943

ᵃ The distributions for each combination of wells and β were tested for lognormality. For those distributions for which lognormality proved a statistically adequate description of the data, the percentage of the existing resources discovered by exploration (exhaustion) is reported. No figures are reported for the distributions rejected as lognormal.

Table 4. Drilling and Exhaustion Results for the Weibull [1.00, 20] Distribution ᵃ

                  Sampling bias
Number of wells   β = 2.00   β = 1.75   β = 1.50   β = 1.25
40                0.360      0.330      0.294      0.251
60                0.497      0.462      0.416      0.359
80                0.610      0.572      0.521      0.456
100               0.702      0.664      0.611      0.542
120               0.774      0.739      0.687      0.617
140               0.830      0.798      0.750      0.682
160               0.872      0.844      0.802      0.737
180               0.903      0.880      0.843      0.784
200                          0.908      0.876      0.824
220                                     0.902      0.857
240                                                0.884

ᵃ The distributions for each combination of wells and β were tested for lognormality. For those distributions for which lognormality proved a statistically adequate description of the data, the percentage of the existing resources discovered by exploration (exhaustion) is reported. No figures are reported for the distributions rejected as lognormal.

Table 5. Drilling and Exhaustion Results for the Weibull [1.05, 20] Distribution ᵃ

                  Sampling bias
Number of wells   β = 2.00   β = 1.75   β = 1.50   β = 1.25
40                0.347      0.319      0.284      0.243
60                0.484      0.448      0.403      0.348
80                0.598      0.559      0.508      0.443
100               0.691      0.652      0.598      0.529
120               0.765      0.728      0.675      0.604
140               0.823      0.789      0.740      0.669
160               0.867      0.838      0.793      0.726
180               0.900      0.875      0.836      0.774
200               0.925      0.904      0.870      0.815
220                                     0.897      0.849
240                                                0.877
260                                                0.900

ᵃ The distributions for each combination of wells and β were tested for lognormality. For those distributions for which lognormality proved a statistically adequate description of the data, the percentage of the existing resources discovered by exploration (exhaustion) is reported. No figures are reported for the distributions rejected as lognormal.

Table 6. Drilling and Exhaustion Results for the Weibull [1.10, 20] Distribution ᵃ

                  Sampling bias
Number of wells   β = 2.00   β = 1.75   β = 1.50   β = 1.25
40                0.337      0.309      0.275      0.235
60                0.472      0.437      0.392      0.339
80                0.586      0.547      0.496      0.433
100               0.681      0.641      0.587      0.517
120               0.757      0.719      0.664      0.592
140               0.817      0.782      0.730      0.658
160               0.863      0.832      0.785      0.716
180               0.897      0.871      0.829      0.765
200               0.923      0.901      0.865      0.807
220                          0.924      0.893      0.842
240                                     0.916      0.871
260                                     0.934      0.895

ᵃ The distributions for each combination of wells and β were tested for lognormality. For those distributions for which lognormality proved a statistically adequate description of the data, the percentage of the existing resources discovered by exploration (exhaustion) is reported. No figures are reported for the distributions rejected as lognormal.

When all data sets are considered, there are some patterns with respect to changes in α, β, the number of wells drilled, and the resource exhaustion point at which lognormality was no longer an adequate description of the discoveries size distribution. The sampling bias parameter, β, and the shape parameter, α, were both negatively correlated to the number of wells drilled at the 0.05 level of significance. Increases in β increase the discovery efficiency, thus reducing the number of wells required to exhaust a region, while increases in α concentrate more of the distributional mass in the medium pool-size range. As these pools tend to be found more quickly, for a given β, the area of exploration interest is accordingly exhausted more quickly and with fewer wells.

The correlations suggest regions experiencing high discovery efficiencies and having Weibull distributions defined by a shape parameter greater than 1 will tend to display lognormal discovery distributions over a shorter proportion of their discovery histories. The results, in part, help explain the conclusions of Smith and Ward (1981) regarding the adequacy of the lognormal models as a description of the depositional process in the North Sea. Data contained in O'Carroll and Smith (1980) suggest a North Sea drilling success ratio of 0.225 over the period 1966-1977. Data in Murray (1990) set the drilling success ratio in U.S. new field wildcats over the same period at 0.121 and, when compared to North Sea results, suggest that discovery efficiency has been particularly high in the North Sea. This, in turn, argues that conclusions regarding the adequacy of lognormality would tend to be reached sooner, rather than later, in the North Sea exploration history. The refutation by O'Carroll and Smith (1980) of the lognormal hypothesis and Smith and Ward's suspicions that "the lognormal form is not the appropriate characterization of the depositional process" represent cases in point.

Finally, as Fig. 1 depicts, as α increases toward 1 the average resource exhaustion point beyond which the lognormal hypothesis is no longer valid falls. The result has much to do with the distinctive "J-shape" of the Weibull distribution when α < 1. The majority of the distributional density is concentrated in the lower tail of the distribution. The pools represented at this end of the distribution are consequently selected against in a biased exploration process. Furthermore, when α < 1, less density is concentrated in the middle of the distributional range and more is concentrated in the right-hand tail. The resulting distinctive bow shape of the distribution is also characteristic of some forms of the lognormal distribution (see Figs. 2A and B). As α rises toward 1, the similarities between the Weibull and lognormal models cease. This suggests that as α rises toward 1, smaller discoveries samples, and lower exhaustion measures, are required to detect differences between an observed discoveries size distribution drawn from a Weibull distribution and the lognormal model.

Above α = 1, however, as α increases the average resource exhaustion point beyond which the lognormal hypothesis is no longer valid increases. The explanation of the result depends upon the increasing similarity observed in distributional form for the lognormal and Weibull models. As α increases the Weibull model becomes characteristically humped and increasingly symmetric. This is precisely the distributional form most commonly associated with lognormality and suggests that larger discovery samples, and higher exhaustion measures, are required to detect differences between the observed discoveries size distribution drawn from a Weibull model and the lognormal model.

Fig. 1. Depicts the relationship between resource base exhaustion and the distributional shape parameter used to define the parent frequency size distribution. As the shape parameter increases toward 1, less discoveries information is required to reject the lognormal hypothesis. Above 1, as the shape parameter increases more discoveries information is required to reject the lognormal hypothesis.
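The shape argument above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it uses the standard closed-form moments of the Weibull distribution to show that the density has no interior mode when the shape parameter is at most 1 and that skewness falls as the shape parameter rises. The function names and the particular shape values are illustrative assumptions.

```python
import math


def weibull_skewness(shape: float) -> float:
    """Skewness of a Weibull distribution (independent of the scale parameter)."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    g3 = math.gamma(1 + 3 / shape)
    variance = g2 - g1 ** 2
    # Third central moment divided by the cube of the standard deviation.
    return (g3 - 3 * g1 * variance - g1 ** 3) / variance ** 1.5


def weibull_mode(shape: float, scale: float = 1.0) -> float:
    """Mode of a Weibull distribution; 0 when the density is J-shaped (shape <= 1)."""
    if shape <= 1:
        return 0.0
    return scale * ((shape - 1) / shape) ** (1 / shape)


# Illustrative shape values only; they are not the values tabulated in the paper.
for shape in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"shape={shape:.1f}  mode={weibull_mode(shape):.3f}  "
          f"skewness={weibull_skewness(shape):.3f}")
```

A shape parameter of 1 reduces the Weibull to the exponential distribution, whose skewness is exactly 2, which gives a convenient check on the formula.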

The results clearly demonstrate that lognormality cannot be rejected as a description of the discoveries size distribution over a wide range of exhaustion measures. For all the discoveries distributions tested, lognormality was accepted as an adequate description of the discoveries size distribution, regardless of α and β, when between 38.9% and 88.4% of the area's resources had been discovered. The results imply that over the mid-range of the discovery history of an area, lognormality is an adequate description of the discoveries size distribution irrespective of whether economic filtration has occurred. The point is important. The tested distributions in this experiment assume perfect knowledge of every discovered pool's size. The resulting discoveries size distributions do not filter out the smaller pools. What then is the importance and probable effect of filtration? Given the experimental results of Tables 2-6, one must conclude that the observed lognormality in discoveries size distributions arises naturally as a consequence of sampling bias. Economic filtration serves to augment the tendency toward lognormality in the observed discoveries size distributions by removing the smaller pools from the reported discoveries statistics. It is not, however, the sole cause of lognormality.

Fig. 2. (A) Depicts the influence of the shape parameter on the Weibull distributional model. The scale parameter has been set equal to 1 for ease of comparison with the lognormal distributional models depicted in B. (B) Depicts the influence of the shape parameter on the lognormal distributional model. The scale parameter has been set equal to 1 for ease of comparison with the Weibull distributional models depicted above.
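For readers who wish to reproduce the kind of test referred to here, the following is a minimal sketch of a chi-squared goodness-of-fit test of lognormality. It is an illustration under stated assumptions, not the procedure used to produce Tables 2-6: the use of ten equiprobable classes, the estimation of both lognormal parameters from the logged sizes, and the resulting 7 degrees of freedom are all choices made for this example, as are the function and variable names.

```python
import bisect
import math
import random
from statistics import NormalDist, fmean, pstdev

# Upper 5% point of the chi-squared distribution with 7 degrees of freedom
# (10 classes, minus 1, minus 2 estimated parameters). Assumed for this sketch.
CRITICAL_0_05_DF7 = 14.067


def lognormal_chi_squared(sizes, n_classes=10):
    """Pearson chi-squared statistic for a lognormal fit using equiprobable classes."""
    logs = [math.log(s) for s in sizes]
    fitted = NormalDist(fmean(logs), pstdev(logs))
    # Interior class boundaries chosen so each class has probability 1/n_classes.
    cuts = [fitted.inv_cdf(i / n_classes) for i in range(1, n_classes)]
    observed = [0] * n_classes
    for value in logs:
        observed[bisect.bisect_right(cuts, value)] += 1
    expected = len(logs) / n_classes
    return sum((o - expected) ** 2 / expected for o in observed)


rng = random.Random(1)
lognormal_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
exponential_sample = [rng.expovariate(1.0) for _ in range(5000)]

for label, sample in (("lognormal", lognormal_sample),
                      ("exponential", exponential_sample)):
    stat = lognormal_chi_squared(sample)
    verdict = "reject" if stat > CRITICAL_0_05_DF7 else "do not reject"
    print(f"{label:12s} statistic={stat:9.2f}  {verdict} lognormality at 0.05")
```

Equiprobable classes are used here only because they keep the expected count in every class equal; the exponential sample (a Weibull with shape parameter 1) serves as a large-sample case in which the lognormal model is not appropriate.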

Consider Fig. 3, which represents the discoveries size distribution for the Weibull [1.00, 20] parent distribution when the sampling bias parameter equals 1.75. Figure 3 compares the discoveries size distribution for pool-size classes 1-6 after the completion of 100, 200, and 300 wells to the parent frequency size distribution. Pool-size classes 7-14 are not included in the figure because 100 wells were sufficient to exhaust the available discoveries in those size classes. As drilling progresses the smaller pool sizes become increasingly numerous and the resulting discoveries size distributions evolve toward the parent frequency size distribution. As this occurs, the lognormal model becomes increasingly tenuous as an accurate description of the discoveries size distribution. Above 200 wells, the lognormal model is a statistically inadequate description of the data. If, however, economic filtration had been allowed, then pools below some minimum economic size would have been increasingly likely to be rejected as non-economic as the difference between their size and the minimum economic size grew. This would prevent the discoveries statistics from becoming dominated by the smaller pool-size classes and would result in a discoveries size distribution more representative of the 100-well discoveries size distribution, which was compatible with the lognormal hypothesis. Economic filtration, then, is seen to counteract the tendency of the discovery process to increase the number of small pools entering the discoveries size distribution. Undoubtedly, the offsetting effect of filtration would help to preserve the observation of lognormality in the discoveries data over a greater range of drilling activity.

Fig. 3. Depicts the portion of the frequency size distribution for the Weibull [1.00, 20] parent distribution for pool-size classes 1-6 and the relevant portions of the discoveries size distributions resulting from the completion of 100, 200, and 300 wells with a sampling bias parameter of 1.75. As drilling increases, the resulting discoveries size distributions evolve toward the parent frequency size distribution by filling in the number of discoveries occurring in the smaller pool-size classes.
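The qualitative behaviour described above can be illustrated with a simple simulation. The following is a minimal sketch under several assumptions that are not taken from this section: pools are discovered without replacement with probability proportional to size raised to the power β, every well is counted as a discovery (dry holes are ignored), the notation Weibull [1.00, 20] is read as shape 1.00 and scale 20, the parent population contains a hypothetical 500 pools, and economic filtration is reduced to a hard minimum-size cutoff rather than a rejection probability that grows with the shortfall.

```python
import random
from statistics import median


def simulate_discoveries(pool_sizes, bias, n_discoveries, rng):
    """Draw pools without replacement with probability proportional to size**bias."""
    remaining = list(pool_sizes)
    found = []
    for _ in range(min(n_discoveries, len(remaining))):
        weights = [size ** bias for size in remaining]
        index = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        found.append(remaining.pop(index))
    return found


def economic_filter(discoveries, min_size):
    """Hard cutoff: keep only pools at or above a minimum economic size (simplification)."""
    return [size for size in discoveries if size >= min_size]


rng = random.Random(0)
# random.weibullvariate takes (scale, shape), in that order.
parent = [rng.weibullvariate(20.0, 1.0) for _ in range(500)]  # hypothetical pool count
order = simulate_discoveries(parent, bias=1.75, n_discoveries=len(parent), rng=rng)

for n in (100, 200, 300):
    unfiltered = order[:n]
    filtered = economic_filter(unfiltered, min_size=5.0)  # cutoff of 5 is illustrative
    print(f"after {n} discoveries: median size {median(unfiltered):6.2f}, "
          f"{len(unfiltered) - len(filtered)} pools removed by the cutoff")
```

In a run of this kind the median discovered size declines as drilling proceeds, and the cutoff removes a growing number of small pools, which is the offsetting effect attributed to filtration in the text.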


CONCLUSIONS

The analysis has demonstrated that the observed lognormality in the size distributions of discovered oil and gas pools can arise for reasons other than economic filtration. Using Weibull parent frequency size distributions and a discovery process model specifically incorporating the notion of sampling bias, it has been shown that lognormality arises as a result of the bias inherent in the discovery process over a wide range of resource exhaustion measures. Furthermore, bias was shown not to affect the point at which lognormality was judged to be an acceptable model of the observed discoveries size distribution. Taken together, sampling bias and economic filtration provide strong evidence that the observed discoveries size distributions of discovered oil and gas pools are unlikely to accurately reflect the underlying parent frequency size distribution. This suggests that conclusions regarding the adequacy of the lognormal model as a description of the parent frequency size distribution based on observed discoveries data are not correct. Accordingly, one should refrain from making inferences about the form of the parent frequency size distribution based on the observed discoveries data.

REFERENCES

Arps, J. J., and Roberts, T. G., 1958, Economics of drilling for Cretaceous oil on east flank of Denver-Julesburg Basin: Am. Assoc. Petr. Geol. Bull., v. 42, p. 2549-2566.

Attanasi, E. D., and Drew, L. J., 1985, Lognormal field size distribution as a consequence of economic truncation: Math. Geol., v. 17, p. 335-351.

Attanasi, E. D., Drew, L. J., and Schuenemeyer, J. H., 1980, Petroleum Resource Appraisal and Discovery Rate Forecasting in Partially Explored Regions--An Application to Supply Modelling: USGS Professional Paper 1138-C.

Baker, R. A., Gehman, H. M., James, W. R., and White, D. A., 1984, Geologic field number and size assessments of oil and gas plays: Am. Assoc. Petr. Geol. Bull., v. 68, p. 426-437.

Barouch, E., and Kaufman, G. M., 1976, Oil and Gas Discovery Modelled as Sampling Without Replacement and Proportional to Random Size: Sloan School Working Paper No. 888-76.

DeGroot, M. H., 1986, Probability and Statistics, 2nd ed.: Addison-Wesley, New York, 723 p.

Drew, L. J., Schuenemeyer, J. H., and Root, D. H., 1980, Petroleum Resource Appraisal and Discovery Rate Forecasting in Partially Explored Regions--An Application to the Denver Basin: USGS Professional Paper 1138-A.

Drew, L. J., Attanasi, E. D., and Schuenemeyer, J. H., 1988, Observed oil and gas field size distributions: A consequence of the discovery process and prices of oil and gas: Math. Geol., v. 20, p. 939-953.

Forman, D. J., and Hinde, A. L., 1985, Improved statistical method for assessment of undiscovered petroleum resources: Am. Assoc. Petr. Geol. Bull., v. 69, p. 106-118.

Harbaugh, J. W., Doveton, J. H., and Davis, J. C., 1977, Probability Methods in Oil Exploration: John Wiley and Sons, New York, 269 p.

Hill, I. D., 1973, The normal integral, algorithm AS 66: Appl. Stat., v. 22, p. 424-427.

Law, A. M., and Kelton, W. D., 1991, Simulation Modelling and Analysis, 2nd ed.: McGraw-Hill, New York, 759 p.


Lee, P. J., and Wang, P. C. C., 1983a, Probabilistic formulation of a method for the evaluation of petroleum resources: Math. Geol., v. 15, p. 163-181.

Lee, P. J., and Wang, P. C. C., 1983b, Conditional analysis for petroleum resource evaluation: Math. Geol., v. 15, p. 349-361.

McCrossan, R. G., 1968, An analysis of size frequency distribution of oil and gas reserves of Western Canada: Can. J. Earth Sci., v. 6, p. 201-211.

Murray, T. H., Jr., 1990, North American drilling activity in 1989: Am. Assoc. Petr. Geol. Bull., v. 74, p. 7-36.

O'Carroll, F. M., and Smith, J. L., 1980, Probabilistic methods for estimating undiscovered petroleum resources, in J. R. Moroney (ed.), Advances in the Economics of Energy Resources, Vol. 3: JAI Press, Greenwich, Conn., p. 31-63.

Power, M., 1990, Modelling Natural Gas Exploration and Development on the Scotian Shelf: Ph.D. thesis, Department of Management Sciences, University of Waterloo, Waterloo, Ontario, Canada.

Power, M., 1992, The Appropriateness of the Lognormal Distribution as a Model for Hydrocarbon Frequency-Size Distributions: Proceedings of the 15th Annual International Conference of the International Association for Energy Economics, Tours, France, May 18-20, 1992.

Power, M., and Fuller, J. D., 1991, Predicting the discoveries and findings costs of natural gas: The example of the Scotian Shelf: Energy J., v. 12, p. 77-93.

Slakter, M. J., 1973, Large values for the number of groups with the Pearson chi-squared goodness-of-fit test: Biometrika, v. 60, p. 420-421.

Smith, J. L., and Ward, G. L., 1981, Maximum likelihood estimates of the size distribution of North Sea oil fields: Math. Geol., v. 13, p. 399-413.

Stephens, M. A., 1986, Tests based on EDF statistics, in R. B. D'Agostino and M. A. Stephens (eds.), Goodness-of-Fit Techniques: Dekker, New York, p. 97-193.