Appendix C: Analytical Strategy for Quantifying the Effect of … · 2020. 3. 17. · neither...
Appendix C: Analytical Strategy for Quantifying the Effect of Facility and Operator Characteristics on Venting and Flaring Practices
In this appendix, I review the strategies behind the analysis of the types of facilities and
operators most responsible for Texas oil and gas venting and flaring practices in 2012. In essence, this
appendix provides more details about the methods and findings underlying the research presented in
chapter 4.
C.1. Research Method
C.1.1. Units of Analysis, Population, and Sample
This study involves all producing Texas oil and gas extraction facilities that submitted a
monthly production and disposition report in 2012 and that lie within one mile of a Census block group with at
least one publicly released American Community Survey five-year summary file block group estimate, as
in Appendix B. This study also involves the companies with direct ownership of each oil and
gas extraction facility (i.e., the operators). In 2012, there were 4,713 different operators in control of
producing oil and gas extraction facilities.
C.1.2. Data Sources
In addition to the five sources described in Appendix B, this study relies on the
following three sources: (1) additional Texas Railroad Commission datasets, (2) the United States Energy
Information Administration Interstate and Intrastate Natural Gas Pipeline Shapefile, and (3) corporate structure
information from LexisNexis and Google.
C.1.2.1. Additional Texas Railroad Commission Datasets
C.1.2.1.1. Organization Report (P-5)
The Texas Railroad Commission Organization Report (P-5) dataset provides information on all
organizations that have completed Form P-5, which is required to legally engage in the oil and gas
extraction business in Texas. Since 1981, organizations directly involved in oil and gas activities in Texas,
including organizations involved in drilling, operating, or producing any oil or gas well, have been required to
file an organization report, Form P-5. This dataset is ideal because, to my knowledge, it is the only
dataset that provides researchers with the capacity to link production and disposition at individual gas
wells to specific operating companies.
C.1.2.1.2. 2012 Inspection Extract
An extract of all inspections conducted by the Texas Railroad Commission in 2012 was received
on June 25, 2016. This is an ideal dataset because it provides the most comprehensive information on
inspection activities and when facilities violate state regulations. Since the state (not the federal
government) is primarily responsible for regulating oil and gas extraction facilities, state regulatory
activity is critical to the analysis.
C.1.2.1.3. 2012 Permit Extract
An extract of all venting and flaring permits granted by the Texas Railroad Commission in 2012
was received on August 10, 2015. It includes information regarding approved flaring permits. This is an
ideal dataset because it is maintained by the agency responsible for approving and tracking permits to
vent and flare gas in Texas.
C.1.2.2. United States Energy Information Administration Intrastate and Interstate Natural Gas Pipeline
Shapefile
A shapefile of the natural gas interstate and intrastate pipelines as of January 1, 2012 is publicly
available for download at www.eia.gov/maps/layer_info-m.cfm. This dataset was collected by the
EIA from the Federal Energy Regulatory Commission (FERC). This dataset is ideal because it provides the
most extensive map of all natural gas pipelines in the continental United States. Like the Census
TIGER/Line shapefile, this shapefile uses the NAD83 datum.
C.1.2.3. Corporate Structure Information on LexisNexis and Google
The Texas A&M University Sociology Department Graduate Research Award supported an
outstanding undergraduate student, Garrison Reed Barrilleaux, to collect corporate structure
information on the operators identified in the Texas Railroad Commission Organization Report Form.
First, using the operator names listed in the Texas Railroad Commission Organization Report Form, the
student identified operators listed in the LexisNexis Corporate Affiliations Database. Then, operators not
identified in LexisNexis were searched on Google. All companies that could not be found in LexisNexis
Corporate Affiliations but were found on Google were identified as private companies. The operators
identified through neither Google nor LexisNexis Corporate Affiliations are assumed to be small private
companies or trusts without a multilayered subsidiary form. The procedures for collecting this
information are as follows.
C.1.2.3.1. LexisNexis Corporate Affiliations Data Collection Procedures
I. Set Up Excel Document for Data Collection
   A. Open the excel document 32-operator_20160516.xls
   B. In cell I:1 type opNotes
   C. In cell J:1 type opType
   D. In cell K:1 type opTicker
   E. In cell L:1 type subsidLevel
   F. In cell M:1 type parTicker
   G. In cell N:1 type parName
   H. In cell O:1 type parSubsid
   I. In cell P:1 type otherNotes
      i. If you ever need to note something about your search that is not listed in this document, put your note in this column.
   J. Right click cell B:1
   K. Select Sort A to Z
   L. Save the document as 32-operatorMatch_InProgress.xls.
      i. Always keep a backup of this data by saving it on the computer you are using and in our GoogleDrive project folder.
      ii. To ensure accuracy, make sure the .xls has autocorrect turned off. If you do not know how to turn off autocorrect, Google how to turn off autocorrect in your version of Excel.
II. Go to LexisNexis Corporate Affiliations database
   A. Go to A&M Libraries website at http://library.tamu.edu/
   B. Type lexisnexis corporate affiliations in the search box and select the search button
   C. Click LexisNexis Corporate Affiliations
      i. If there is an issue with accessing the database, contact the library at http://askus.library.tamu.edu/ . Be quick to ask for help.
III. Collect Data
   A. If the row has new company name information:
      i. Copy the name in the cell for operator_name
      ii. Paste the parent company name information in the company search bar and select the search button
         1. If LexisNexis says No Records Match, try shortening the company name (e.g., deleting strings like “USA,” “CORP”, “LTD”, “LLC”, “INC”, “LP”, or “CO”)
            a. If LexisNexis still says No Records Match:
               i. type notFound in the row’s opNotes cell
               ii. type . in the row’s remaining cells
               iii. move on to collect data for the next row
         2. If LexisNexis finds more than one company, use the information in the parState, parZip, or parAddress column to find the correct company.
            a. If you cannot narrow it down, often the companies you can narrow it down to are all a part of the same ultimate parent company (this is the case if the companies you narrow it down to all have the same Hierarchy/Family Role name). If this is the case, type estNotUnique in the row’s opNotes cell, type subsidiary in the opType cell, type . in the opTicker, subsidLevel and parSubsid cells, and move on to collect the ultimate parent company information
      iii. Once the company is found, collect the ultimate parent company information:
         1. Record company match found
            a. Type found in the row’s opNotes cell
         2. Find operating company 2012 information
            a. Select the company name listed on the first column on the left
            b. On the new screen, in the Historical Data scroll box, select 2012 and wait for the data to load
         3. Record operating company type
            a. Record the Company Type listed for 2012 in the opType cell
               i. Type parent in the row’s opType cell if the company type is a parent company
               ii. Type subsidiary in the row’s opType cell if the company type is a subsidiary
               iii. Type private in the row’s opType cell if the company type is a private company
                  1. If the company type is private, there will be no corresponding ticker symbol, subsidiary level, parent company ticker, parent subsidiary levels or parent company name information. As such, type . in the row’s remaining cells and move on to collect information for the next operating company.
         4. Record the operating company ticker symbol information
            a. Scroll down to the Industry/Other table
            b. Find the Ticker Symbol 0 row
               i. If there is not a Ticker Symbol 0 row, or if there is no information in the Ticker Symbol 0 row, type . in the opTicker cell.
               ii. If there is a Ticker Symbol 0 row, copy the ticker symbol information and paste it in the opTicker cell.
         5. Record operating company subsidiary level information
            a. If the company type is parent, type 0 in the operating company’s subsidLevel cell and move on to collect remaining parent company information
            b. If the company is a subsidiary, count the number of subsidiary levels between the operating company and the ultimate parent company. A subsidiary level is designated by a dotted line and dash after an entity with the code S. For example:
               [Figure: example corporate hierarchy with each subsidiary level labeled]
               In the example, if the operating company were Concrete, Inc. the number of subsidiary levels is 3. If the operating company were Knife River Corporation, the number of subsidiary levels is 2. Once you find the number of subsidiary levels between the operating company and its parent, type the number in the row’s subsidLevel cell.
         6. Record operating company ultimate parent company name
            a. Look in the Ultimate Parent Name row in the Company/Financial Table.
               i. If there is not an Ultimate Parent Name row, or if there is no information in the Ultimate Parent Name row, type . in the parName cell.
               ii. If there is an Ultimate Parent Name row, copy the ultimate parent company name information and paste it in the parName cell.
         7. Record operating company ultimate parent company ticker information
            a. If the operating company is the parent company, re-record the operating company ticker symbol information in the parTicker cell (i.e., for operating companies that are ultimate parent companies, like Exxon Mobil, the parTicker cell and the opTicker cell should have the same information).
            b. If the operating company is a subsidiary, find the ultimate parent company information by looking at the corporate hierarchy graph on the bottom of the page, and clicking on the ultimate parent company (i.e., the company on the top of the hierarchy with the symbol P). Wait for the ultimate parent company information to load.
            c. Once the 2012 ultimate parent company information loads, scroll down to the Industry/Other table and look at the Ticker Symbol 0 row
               i. If there is not a Ticker Symbol 0 row, or if there is no information in the Ticker Symbol 0 row, type . in the parTicker cell.
               ii. If there is a Ticker Symbol 0 row, copy the ticker symbol information and paste it in the parTicker cell.
         8. Record ultimate parent company subsidiary levels.
            a. Count the total subsidiary levels within the ultimate parent company. Drawing from the example image in step 5, the value for all of the operating companies within the ultimate parent company MDU Resources Group would be 3, since that is the number of subsidiary levels within the organization.
            b. Once you find the number of subsidiary levels within the operating company’s ultimate parent company organization, type the number in the row’s parSubsid cell.
IV. Save Collected Data
   A. Save the dataset as 32-operatorMatch_COMPLETE_YYYYMMDD.xls where YYYY is the year, MM is the month, and DD is the day you completed the data collection. In addition to saving the document on your computer, also save it in our Google Drive project folder.
   B. Email the dataset. Kate will replicate these procedures on a random sample of the collected dataset to verify the data was correctly collected.
   C. If necessary, Kate may ask you to go back and search for the operating company using other mechanisms, such as a Google Search. If this occurs, another set of data collection procedures will be established.
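Two steps in the LexisNexis procedure are mechanical enough to sketch in code: shortening a company name after a failed search, and counting subsidiary levels. The following is a minimal Python illustration, not the tool used for the data collection (which was done by hand in Excel). The function names and the `PARENT_OF` mapping are hypothetical, and "HOLDING CO A" is a placeholder for the unnamed intermediate entity implied by the level counts in the worked example.

```python
from __future__ import annotations

import re

# Legal-form strings the procedure suggests deleting when a search finds no match.
SUFFIX_TOKENS = {"USA", "CORP", "LTD", "LLC", "INC", "LP", "CO"}


def shorten_company_name(name: str) -> str:
    """Drop legal-form tokens and stray punctuation from an operator name."""
    tokens = re.split(r"[\s,\.]+", name.upper())
    kept = [t for t in tokens if t and t not in SUFFIX_TOKENS]
    return " ".join(kept)


def subsidiary_level(company: str, parent_of: dict[str, str]) -> int:
    """Count subsidiary levels between a company and its ultimate parent.

    The ultimate parent itself has level 0 (the subsidLevel value for parents).
    """
    level = 0
    seen = {company}
    while company in parent_of:
        company = parent_of[company]
        if company in seen:
            raise ValueError("cycle in corporate hierarchy")
        seen.add(company)
        level += 1
    return level


def parent_subsidiary_levels(parent_of: dict[str, str]) -> int:
    """Total subsidiary levels within the ultimate parent (the parSubsid value)."""
    companies = set(parent_of) | set(parent_of.values())
    return max(subsidiary_level(c, parent_of) for c in companies)


# Hypothetical child -> parent mapping patterned on the worked example.
PARENT_OF = {
    "CONCRETE, INC.": "KNIFE RIVER CORPORATION",
    "KNIFE RIVER CORPORATION": "HOLDING CO A",  # placeholder entity
    "HOLDING CO A": "MDU RESOURCES GROUP",
}

print(shorten_company_name("EXAMPLE OIL CO., LLC"))
print(subsidiary_level("CONCRETE, INC.", PARENT_OF))
print(parent_subsidiary_levels(PARENT_OF))
```

Under these assumptions, the counts agree with the worked example: 3 levels for Concrete, Inc., 2 for Knife River Corporation, and 3 for the organization as a whole.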
C.1.2.3.2. Google Data Collection Procedures
I. Set Up Excel Document for Data Collection
   A. Open the excel document Finished-32-operator_Complete_20170705.xls
   B. In cell Q:1 type googleSearch
   C. In cell R:1 type googleOpNotes
   D. In cell S:1 type dateChecked
   E. In cell T:1 type url1
   F. In cell U:1 type url2
   G. In cell V:1 type url3
   H. In cell W:1 type url4
   I. In cell X:1 type url5
   J. Sort cell I:1 opNotes by right clicking it and selecting Sort A to Z
   K. For all rows where opNotes is found, type 0 in the googleSearch column
   L. Hide all rows where opNotes is found
      i. If you ever need to note something about your search that is not listed in this document, put your note in this column.
   M. Save the document as 32-operatorGoogleMatch_InProgress.xls.
      i. Always keep a backup of this data by saving it on the computer you are using and in our GoogleDrive project folder.
      ii. To ensure accuracy, make sure the .xls has autocorrect turned off. If you do not know how to turn off autocorrect, Google how to turn off autocorrect in your version of Excel.
II. Go to Google Search Engine
   A. Go to https://www.google.com/
III. Collect Data
   A. If the row has new company name information:
      i. Copy the name in the cell for operator_name
      ii. Paste the parent company name information in the company search bar and select the search button
      iii. Search the first page of the results to see if they have any information about whether the company is private or public. Stick to the first page of results so that we can maintain the same procedures for all companies. Keep an eye out for the Bloomberg.com results, as this website often has the information we need. You can tell if the company is a private company if the url has the word private in it, for example: https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=145709387 is a private company
         1. If no information is found
            a. re-google the company, this time the search should include the operating company name and the word “Bloomberg”
            b. If still no information is found:
               i. type 1 in the row’s googleSearch cell
               ii. type notFound in the row’s googleOpNotes cell
               iii. type the date it was checked in the row’s dateChecked cell
                  1. use the yyyymmdd format
                  2. for example, if the information is collected on July 10, 2017, type 20170710 in the dateChecked cell
            c. move on to collect data for the next row
      iv. Once information about the parent company is found, collect the following information:
         1. Record company match found
            a. Type found in the row’s googleOpNotes cell
            b. Copy the url where the information was found and paste it in the row’s url1 cell
            c. type the date it was checked in the row’s dateChecked cell
         2. Record operating company type
            a. Record the Company Type listed for 2012 in the opType cell
               i. Type parent in the row’s opType cell if the company type is a parent company
               ii. Type subsidiary in the row’s opType cell if the company type is a subsidiary
               iii. Type private in the row’s opType cell if the company type is a private company
                  1. If the company type is private, there will be no corresponding ticker symbol, subsidiary level, parent company ticker, parent subsidiary levels or parent company name information. As such, type . in the row’s remaining cells and move on to collect information for the next operating company.
         3. Record the operating company ticker symbol information in the opTicker cell.
            a. If this information is not available in your search, but through your search, you found a better name for the company or the ultimate parent company, see if you can find some of the information in LexisNexis by following previous procedures or by conducting a new Google search using the better company name.
            b. If you obtain information about the operating company’s ticker information from a different url from where you found out the company’s type (i.e. if it is private/parent/subsidiary), record this url in the url2 column
         4. Record operating company subsidiary level information in the subsidLevel column.
            a. If this information is not available in your search, but through your search, you found a better name for the company or the ultimate parent company, see if you can find some of the information in LexisNexis by following previous procedures or by conducting a new Google search using the better company name.
            b. If you obtain information about the operating company’s subsidiary level from a different url from where you found out the company’s type (i.e. if it is private/parent/subsidiary) and/or ticker information, record this url in the url2 or url3 column
         5. Record operating company ultimate parent company ticker information in the parTicker cell
            a. If this information is not available in your search, but through your search, you found a better name for the company or the ultimate parent company, see if you can find some of the information in LexisNexis by following previous procedures or by conducting a new Google search using the better company name.
            b. If you obtain information about the ultimate parent company’s ticker from a different url from where you found out the company’s type (i.e. if it is private/parent/subsidiary), ticker information, and/or subsidiary level, record this url in the url2, url3, or url4 column
         6. Record ultimate parent company subsidiary levels in the row’s parSubsid cell
            a. If this information is not available in your search, but through your search, you found a better name for the company or the ultimate parent company, see if you can find some of the information in LexisNexis by following previous procedures or by conducting a new Google search using the better company name.
            b. If you obtain information about the ultimate parent company’s subsidiary levels from a different url from where you found out the company’s type (i.e. if it is private/parent/subsidiary), ticker information, subsidiary level and/or parent ticker, record this url in the url2, url3, url4, or url5 column
IV. Save Collected Data
   A. Save the dataset as 32-operatorGoogleMatch_COMPLETE_YYYYMMDD.xls where YYYY is the year, MM is the month, and DD is the day you completed the data collection. In addition to saving the document on your computer, also save it in our Google Drive project folder.
   B. Email the dataset. Kate will replicate these procedures on a random sample of the collected dataset to verify the data was correctly collected.
   C. If necessary, Kate may ask you to go back and search for the operating company using other mechanisms, such as a Google Search. If this occurs, another set of data collection procedures will be established.
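The Google procedure uses two simple conventions: a yyyymmdd date stamp, and the word "private" in a Bloomberg URL as a cue that the company is private. The sketch below illustrates both in Python. It is an assumption-laden illustration rather than part of the original workflow, which was carried out manually. The function names are hypothetical, and the URL check looks only for a path segment named "private", which is narrower than the procedure's "the url has the word private in it".

```python
from datetime import date
from urllib.parse import urlparse


def date_checked(day: date) -> str:
    """Format a date the way the dateChecked column expects (yyyymmdd)."""
    return day.strftime("%Y%m%d")


def url_suggests_private(url: str) -> bool:
    """Return True when the URL path has a 'private' segment.

    Assumption: the cue appears as its own path segment, as in the
    Bloomberg example given in the procedure.
    """
    segments = urlparse(url).path.lower().split("/")
    return "private" in segments


def not_found_record(day: date) -> dict:
    """Cell values to enter when neither search turns up the company."""
    return {
        "googleSearch": 1,
        "googleOpNotes": "notFound",
        "dateChecked": date_checked(day),
    }


example_url = (
    "https://www.bloomberg.com/research/stocks/private/"
    "snapshot.asp?privcapId=145709387"
)
print(date_checked(date(2017, 7, 10)))
print(url_suggests_private(example_url))
```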
C.1.3. Variables and Measurements
Variables and measures are described in the table below.
Figure C.1: Variables and Measures for Analysis Presented in Chapter 4
Variable | Facility-Level Measure
Venting/Flaring Participation | 1 = Facility vented or flared in 2012; 0 = Not
Venting/Flaring Magnitude | Log(100 * [Volume (in mcf) of gas well or casinghead gas vented or flared at the facility] / [Volume (in mcf) of gas well or casinghead gas produced at the facility])
Income | Median income category of households in block groups within one mile of facility: 1: Less than $10K; 2: $10K – $14,999.99; 3: $15K – $19,999.99; 4: $20K – $24,999.99; 5: $25K – $29,999.99; 6: $30K – $34,999.99; 7: $35K – $39,999.99; 8: $40K – $44,999.99; 9: $45K – $49,999.99; 10: $50K – $59,999.99; 11: $60K – $74,999.99; 12: $75K – $99,999.99; 13: $100K – $124,999.99; 14: $125K – $149,999.99; 15: $150K – $199,999.99; 16: $200K or more
Home Values | Median owner-occupied home value category of households in block groups within one mile of facility: 1: Less than $10K; 2: $10K – $14,999.99; 3: $15K – $19,999.99; 4: $20K – $24,999.99; 5: $25K – $29,999.99; 6: $30K – $34,999.99; 7: $35K – $39,999.99; 8: $40K – $49,999.99; 9: $50K – $59,999.99; 10: $60K – $69,999.99; 11: $70K – $79,999.99; 12: $80K – $89,999.99; 13: $90K – $99,999.99; 14: $100K – $124,999.99; 15: $125K – $149,999.99; 16: $150K – $174,999.99; 17: $175K – $199,999.99; 18: $200K – $249,999.99; 19: $250K – $299,999.99; 20: $300K – $399,999.99; 21: $400K – $499,999.99; 22: $500K – $749,999.99; 23: $750K – $999,999.99; 24: $1 million or more
Poor | 100 * Number of households living at or below the poverty line that live in block groups within one mile of the facility / Number of households living within one mile of the facility
Poor Education | 100 * Number of individuals 25 and older without a high school diploma living in a block group within one mile of the facility / Number of individuals 25 and older residing in a block group within one mile of the facility
Limited English | 100 * Number of households with limited or no English fluency living in block groups within one mile of the facility / Number of households living within one mile of the facility
Population Density | Number of individuals living in block groups within one mile of the facility / Land area of block groups within one mile of the facility (in square miles)
Nonprofit Organizations | Number of registered nonprofits in the county in which the facility is located
Black | 100 * Number of non-Hispanic black individuals residing in block groups within one mile of the facility / Number of individuals residing in a block group within one mile of the facility
Hispanic | 100 * Number of Hispanic individuals residing in block groups within one mile of the facility / Number of individuals residing in a block group within one mile of the facility
Permitted | 1 = Facility had permit to legally vent or flare in 2012; 0 = Not
Violation | Number of violations facility received for venting or flaring in 2012
Oil/Condensate Production | Volume (in barrels) of oil or condensate produced at facility, squared
Gas/Casinghead Production | Volume (in mcf) of gas or casinghead produced at the facility, squared
New Well | 1 = Facility drilled new wells in 2012; 0 = Not
Gas Well | 1 = Facility is a gas well; 0 = Facility is an oil lease
Well Density | Number of other active wells within one mile of the surface locations of active wells on the lease
Wells on Lease | Number of wells active on lease
Distance to Nearest Pipeline | Nearest distance (in feet) between the surface locations of wells active on the lease and natural gas pipeline built by January 1, 2012
Operator Gas Production | Volume of gas well gas and casinghead gas produced at the operator’s facilities (in thousand cubic feet)
Operator Oil Production | Volume of condensate and oil produced at the operator’s facilities (in barrels)
Operator Wells | Number of active wells directly owned by the operator
Operator Gas Ratio | Volume (in mcf) of gas well gas produced by facilities directly owned by the operator / Volume of petrochemicals (barrels of oil, barrels of condensate, mcf of casinghead, and mcf of gas well gas) produced by facilities directly owned by the operator
Multilayered Subsidiary Form | 1 = Facility operator is subsidiary organization; 0 = Not
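To make the arithmetic behind several of these measures concrete, here is a minimal Python sketch of the magnitude, percentage, and density calculations. It assumes the logarithm is the natural log (the later tables label the rate "ln"); the function names and example inputs are hypothetical and are not values from the study.

```python
import math


def venting_flaring_magnitude(vented_mcf: float, produced_mcf: float) -> float:
    """Natural log of the percentage of produced gas that was vented or flared.

    Only defined for facilities that vented or flared a positive volume.
    """
    if vented_mcf <= 0 or produced_mcf <= 0:
        raise ValueError("volumes must be positive to take the log")
    return math.log(100.0 * vented_mcf / produced_mcf)


def percent_share(count: float, total: float) -> float:
    """Percentage measure used for Poor, Poor Education, Limited English, Black, Hispanic."""
    return 100.0 * count / total


def population_density(individuals: float, land_area_sq_mi: float) -> float:
    """Individuals per square mile of block-group land area."""
    return individuals / land_area_sq_mi


# Hypothetical facility: 1 mcf vented or flared out of 100 mcf produced (1 percent).
print(venting_flaring_magnitude(1.0, 100.0))
# Hypothetical community: 25 of 200 households at or below the poverty line.
print(percent_share(25, 200))
```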
C.1.4. Connecting Texas Railroad Commission Datasets
In addition to connecting the Texas Railroad Commission datasets as described in Appendix B,
I took the following steps. First, I parsed the Organization Report dataset and connected it
to the production data query dump using the operator number. Then I connected both the permit extract and
inspection extract to the production data query dump using the district number, lease name, and
operator number.
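The linkage described above amounts to one join on operator number and two joins on a three-part lease key. The sketch below shows that logic in plain Python under stated assumptions. The field names (`district_no`, `lease_name`, `operator_no`, `operator_name`) are hypothetical stand-ins, because the actual Railroad Commission column names are not given here, and the derived `permitted` and `inspections` fields are only one plausible way to summarize the matches.

```python
from collections import defaultdict

# Hypothetical field names for the composite lease key.
LEASE_KEY = ("district_no", "lease_name", "operator_no")


def index_by(rows, key_fields):
    """Group rows (dicts) by a tuple of key fields."""
    index = defaultdict(list)
    for row in rows:
        index[tuple(row[field] for field in key_fields)].append(row)
    return index


def link_facilities(production, organizations, permits, inspections):
    """Attach operator, permit, and inspection information to each production row."""
    org_by_operator = {org["operator_no"]: org for org in organizations}
    permits_by_lease = index_by(permits, LEASE_KEY)
    inspections_by_lease = index_by(inspections, LEASE_KEY)

    linked = []
    for row in production:
        lease = tuple(row[field] for field in LEASE_KEY)
        operator = org_by_operator.get(row["operator_no"], {})
        linked.append({
            **row,
            # None when the operator number has no Organization Report match.
            "operator_name": operator.get("operator_name"),
            "permitted": 1 if lease in permits_by_lease else 0,
            "inspections": len(inspections_by_lease.get(lease, [])),
        })
    return linked
```

Production rows without a match keep a missing operator name and zero counts rather than being dropped, which mirrors a left join onto the production data.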
C.1.5. Connecting Texas Railroad Commission Information with Other Datasets
To connect facility points to the nearest pipeline, I built upon the Geographic Information
System described in Appendix B. I started by adding the 2012 United States Energy Information
Administration Interstate and Intrastate Natural Gas Pipeline shapefile to the geodatabase and
projecting it to the North American Datum 1983 (NAD83) State Plane Texas Central (FIPS 4203) coordinate system.
Then, I used the nearest distance tool to find the nearest distance (in feet) between facility wellbore
surface locations and pipelines established as of January 1, 2012.
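Once both layers share a projected coordinate system, the nearest-distance calculation reduces to planar point-to-segment geometry. The sketch below illustrates that idea in Python. It is not the GIS tool used in the analysis. It assumes well and pipeline coordinates are already projected and expressed in feet, and the function names are hypothetical.

```python
import math


def point_segment_distance(point, seg_start, seg_end):
    """Planar distance from a point to a line segment (same units as the inputs)."""
    px, py = point
    ax, ay = seg_start
    bx, by = seg_end
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_pipeline_distance(wells, pipelines):
    """Smallest distance between any well on a lease and any pipeline segment.

    wells: list of (x, y) surface locations.
    pipelines: list of polylines, each a list of (x, y) vertices.
    """
    return min(
        point_segment_distance(well, line[i], line[i + 1])
        for well in wells
        for line in pipelines
        for i in range(len(line) - 1)
    )


# Hypothetical lease with two wells and one straight pipeline along y = 0.
wells = [(0.0, 300.0), (1000.0, 50.0)]
pipelines = [[(-500.0, 0.0), (2000.0, 0.0)]]
print(nearest_pipeline_distance(wells, pipelines))
```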
C.1.6. Data Analysis
This research uses a two-part/hurdle model in order to determine correlations between facility
and operator characteristics and both (1) whether, among all producing oil and gas extraction facilities,
the facility vented or flared (i.e., participation), and (2) the venting and flaring rate among facilities that
vented or flared (i.e., magnitude). The final model accounts for the clustering of standard errors by
facility operator. A two-part model accounting for the clustering of standard errors by facility operator
was chosen over a multi-level model for two reasons: (1) because there is not enough variation at level
one to run a multi-level regression model, and (2) because multi-level regression model outcomes are
very similar to regression model outcomes that account for the clustering of standard errors. Using
the vce(cluster) option in Stata, I use Huber’s (1967) formula to produce consistent standard errors even
though the data are clustered. To ensure that operators with many facilities are not undersampled,
I produced clustered sandwich variance estimators rather than simply sampling one facility for each
operator.
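For reference, the general form of a clustered sandwich variance estimator can be written as follows. This is the textbook expression, given as a sketch and not as a transcription of Stata's exact computation. Here g indexes operators (clusters), G is the number of operators, s_j is the score contribution of facility j evaluated at the estimated coefficients, and \widehat{A} is the negative Hessian of the log-likelihood. In a linear regression, \widehat{A} = X^{\top}X and s_j = x_j \hat{e}_j. Statistical packages typically also apply a small-sample scaling factor, which is omitted here.

```latex
\widehat{V}_{\mathrm{cluster}}
  = \widehat{A}^{-1}
    \left( \sum_{g=1}^{G} s_g s_g^{\top} \right)
    \widehat{A}^{-1},
\qquad
s_g = \sum_{j \in g} s_j
```

Because the scores are summed within each operator before the outer product is taken, facilities run by the same operator are allowed to have correlated errors, while operators are treated as independent of one another.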
C.1.6.1. Participation Generalized Linear Model
The first part of the model (i.e., the participation model) investigates the direct effects of lease
and operator characteristics on whether or not the lease vents or flares using the following equation:
\[
\log\left(\frac{\varphi_{1j}}{1-\varphi_{1j}}\right) = \gamma_0 + \sum_{k=1}^{K} \beta_k \left(M_{kj} - \bar{M}_k\right) + e_j, \quad \text{where } e_j \sim N(0, \sigma_e^2)
\]
In the full participation model above, $\varphi_{1j}$ denotes the probability that lease j vented or flared; $\gamma_0$
denotes the average log odds that a lease will vent or flare; $\beta_k$ is the coefficient that
represents the direction and strength of explanatory variable k (K is the number of variables at the
lease-level); $M_{kj}$ is the observation of explanatory variable k for lease j, and $\bar{M}_k$ is the mean of
explanatory variable k; $e_j$ represents the random error, which is assumed to be normally distributed
with a mean of 0 and variance of $\sigma_e^2$.
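To show how the participation equation maps coefficients to a predicted probability, here is a minimal Python sketch that evaluates the linear predictor with mean-centered covariates and inverts the logit. The coefficient and covariate values are hypothetical and are not estimates from this study. The error term is left out because the sketch computes a fitted probability.

```python
import math


def participation_probability(gamma0, betas, values, means):
    """Predicted probability that a lease vents or flares.

    gamma0: intercept (log odds when every covariate is at its mean)
    betas:  coefficients, one per explanatory variable
    values: the lease's covariate values
    means:  sample means used for centering
    """
    log_odds = gamma0 + sum(
        b * (x - m) for b, x, m in zip(betas, values, means)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical inputs: an intercept matching a 5 percent baseline probability.
gamma0 = math.log(0.05 / 0.95)
betas = [0.4, -0.2]
means = [10.0, 3.0]

print(participation_probability(gamma0, betas, means, means))
print(participation_probability(gamma0, betas, [11.0, 3.0], means))
```

Because the covariates are centered, a lease whose characteristics all sit at the sample means has log odds equal to the intercept.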
C.1.6.2. Magnitude Generalized Linear Mixed Model
The second part of the model (i.e., the magnitude model) investigates the direct effects of lease,
and operator characteristics on the venting or flaring rate for leases that vented or flared gas using the
following equation:
\[
\log\left(E\left[\varphi_{2j} \mid \varphi_{2j} > 0\right]\right) = \gamma_0 + \sum_{k=1}^{K} \beta_k \left(M_{kj} - \bar{M}_k\right) + e_j, \quad \text{where } e_j \sim N(0, \sigma_e^2)
\]
In the full magnitude model above, $\varphi_{2j}$ denotes the venting or flaring rate at lease j; $\gamma_0$ denotes the
average venting or flaring rate of all leases that vented or flared; $\beta_k$ is the coefficient that
represents the direction and strength of explanatory variable k (K is the number of different
explanatory variables in the model); $M_{kj}$ is the observation of explanatory variable k for lease j, and
$\bar{M}_k$ is the mean of explanatory variable k; $e_j$ represents the random error, which is assumed to be
normally distributed with a mean of 0 and variance of $\sigma_e^2$.
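The magnitude equation can be evaluated the same way as the participation equation, with the exponential undoing the log link. The Python sketch below also shows how the two parts of a two-part model are commonly combined into an unconditional expectation, E[rate] = Pr(rate > 0) * E[rate | rate > 0]. That combination is a standard identity and is not a step reported in this appendix. All numbers are hypothetical.

```python
import math


def conditional_rate(gamma0, betas, values, means):
    """Expected venting or flaring rate among leases that vent or flare."""
    linear_predictor = gamma0 + sum(
        b * (x - m) for b, x, m in zip(betas, values, means)
    )
    return math.exp(linear_predictor)


def unconditional_rate(p_participate, rate_if_positive):
    """Combine both parts: probability of venting times the conditional rate."""
    return p_participate * rate_if_positive


# Hypothetical values, not estimates from the study.
gamma0 = math.log(2.0)
betas = [0.1]
means = [3.0]

rate = conditional_rate(gamma0, betas, means, means)
print(rate)
print(unconditional_rate(0.05, rate))
```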
C.2. Research Findings
C.2.1. Trends in Who Vents and Flares
Summary statistics for facility and operator-level venting and flaring volumes and the
characteristics of communities surrounding all producing oil and gas extraction facilities in 2012 are as
follows.
C.2.1.1. Facility-Level Summary Statistics
Figure C.2: Facility-Level Analysis Summary Statistics
Variable | N | Mean | Standard Deviation | Minimum | Maximum
VENTING AND FLARING PRACTICES
Venting and Flaring Facilities | 126,862 | 0.05 | 0.22 | 0 | 1
Venting and Flaring Rate (log) | 6,651 | -3.62 | 2.96 | -13.929 | 0
COMMUNITY ECONOMIC STATUS
Median Household Income | 126,862 | 9.65 | 1.94 | 1 | 15
Median Owner Occupied Housing Value | 126,862 | 12.81 | 2.73 | 1 | 21
Portion of Households Living At or Below the Poverty Line | 126,862 | 13.99 | 9.43 | 0 | 100
COMMUNITY CULTURAL CAPITAL
Portion of Residents 25 and Older Without High School Diploma | 126,862 | 19.74 | 10.84 | 0 | 78.553
Portion of Households with Limited Fluency in the English Language | 126,861 | 4.91439 | 7.470736 | 0 | 44.85981
COMMUNITY ORGANIZATION CAPACITY
Population Density | 126,862 | 38.905 | 156.712 | 0.007 | 6,707.434
Registered Nonprofit Organizations | 126,862 | 283.443 | 845.645 | 0 | 14,502
COMMUNITY RACE AND ETHNICITY
Portion of Residents that are non-Hispanic black | 126,862 | 4.211 | 7.436 | 0 | 88.075
Portion of Residents that are Hispanic | 126,862 | 25.703 | 23.534 | 0 | 100
STATE REGULATION
Permitted to Vent or Flare | 126,862 | 0.004 | 0.066 | 0 | 1
Venting or Flaring Violations | 126,862 | 0.002 | 0.068 | 0 | 9
Lease Inspections | 126,862 | 0.2970866 | 1.177388 | 0 | 95
FACILITY SIZE
Oil and Condensate Produced (square) | 126,862 | 3.98E+09 | 4.66E+11 | 0 | 1.12E+14
Gas and Casinghead Produced (square) | 126,862 | 4.5E+10 | 4.1E+12 | 0 | 9.84E+14
FACILITY COMPLEXITY
Facility Wellbores | 126,862 | 4.816 | 41.446 | 1 | 5,413
ECONOMIC COSTS
New Wellbores Established | 126,862 | 0.043 | 0.203 | 0 | 1
Gas Well | 126,862 | 0.6078495 | 0.4882319 | 0 | 1
Other Wellbores within One Mile | 126,862 | 43.567 | 5527.855 | 0 | 1,968,882
Nearest Distance to Gas Pipeline | 126,862 | 9,398.80 | 17,562.53 | 0 | 176,658.40
OPERATOR SIZE
Operator Volume of Gas and Casinghead Produced | 126,862 | 1.12e+08 | 2.27e+08 | 0 | 7.61e+08
Operator Volume of Oil and Condensate Produced | 126,862 | 2766057 | 5893063 | 0 | 4.27e+07
OPERATOR COMPLEXITY
Wellbores Controlled by Facility Operator (log) | 126,862 | 6.158 | 2.157 | 0 | 10.296
OPERATOR GAS DEPENDENCE
Gas Ratio | 126,862 | 0.8029547 | 0.2939673 | 0 | 1
ORGANIZATIONAL STRUCTURE
Subsidiary | 126,862 | 0.291 | 0.454 | 0 | 1
SIZE INTERACTION TERM
Facility gas/cond production volume x Operator gas/cond production volume | 126,862 | 9.85e+12 | 4.71e+13 | 0 | 2.34e+15
Figure C.3: Correlations Between Variables- Facility Level Analysis
vent/ flare
vent/flare rate (ln) income
house value poverty uneducated
limited english
income -
0.0301* 0.0352 1.000
house value -0.026* -0.0504* 0.471* 1.000
poverty -0.035* -0.0871* -0.469* -0.404* 1.000
uneducated 0.1312* -0.0464* -0.434* -0.485* 0.5512* 1.000
limited english 0.1616* -0.0775* -0.4706* -0.303* 0.3495* 0.7545* 1.000
pop. density -0.035* -0.0601* 0.0391* 0.1341* 0.0062 -0.0615* 0.0014
nonprofits -0.026* -0.0865* 0.0495* 0.1597* -0.0019 -0.0276* 0.0214*
black -0.055* 0.0472* -0.155* -0.032* 0.0400* -0.045* -0.1431*
hispanic 0.1612* -0.0566* -0.322* -0.404* 0.4608* 0.8049* 0.7789*
permit 0.2083* 0.2001* -0.026* -0.036* 0.0117* 0.0461* 0.0472*
violation 0.0282* 0.0466* -0.012* -0.0067 0.0062 0.0036 -0.0008
oil/cond 0.011* -0.0384 0.0019 -0.0027 -0.0025 0.0059 0.0047
gas/csgd 0.0076* -0.0224 0.0016 -0.0019 -0.0016 0.0021 0.0012
new 0.1588* 0.0360 0.0054 0.0191* -0.022* 0.0393* 0.0297*
gas well -0.143* -0.6385* 0.0148* 0.0517* 0.1267* -0.0734* -0.0975*
well density -0.0008 -0.0619* 0.0009 0.0013 0.0010 0.0002 0.0008
wells 0.0434* 0.0191 0.0153* 0.0004 -0.019* 0.0267 0.0172*
pipe distance -0.007* 0.1696* -0.043* -0.057* -0.055* -0.0226* -0.0128*
oper. oil/cond 0.2214* 0.1003* 0.1016* 0.0504* -0.0358* 0.1007* 0.0756*
op. csgd/gas -0.075* -0.3502* 0.0967* 0.0997* -0.0504* -0.1036* -0.1157*
operator wells 0.0801* -0.2472* 0.1532* 0.1072* -0.051* 0.0328* 0.0316*
gas ratio -0.077* -0.6451* 0.0482* 0.0924* 0.0823* -0.0370* -0.0373*
subsidiary 0.0162* 0.1078* 0.0895* 0.0582* -0.0469* -0.0441* -0.0460*
size interact -0.019* -0.1778* 0.0173* 0.0428* -0.0321* -0.0469* -0.0389*
pop.
density ngos black Hisp. permit violation oil/cond gas/ csgd
pop dens 1.000
ngos 0.4844* 1.000
14
black 0.1261* 0.0775* 1.000
hispanic -0.0112* 0.0238* -0.208* 1.000
permit -0.0138* -0.016* -0.015* 0.0539* 1.00
violation 0.0009 -0.0012 -0.002 0.0045 0.00 1.000
inspect. 0.0216* 0.0479* 0.0016 -0.017* -0.01 0.1275*
oil/cond -0.0014 -0.0018 -0.002 0.0063 0.00 0.0005 1.000
gas/csgd 0.0000 -0.001 0.0013 0.0034 -0.00 -0.0001 0.755* 1.000
new -0.0013 -0.014* -0.027* 0.0474* 0.12* 0.0037 0.01* 0.01*
gas well 0.0863* 0.0593* 0.1195* -0.09* -0.1* -0.026* -0.011* -0.006
well dens 0.0001 0.0002 0.0025 0.0013 -0.00 -0.0001 0.0003 0.000
wells -0.0052 0.0073 -0.015* 0.0299* 0.00 0.0142* 0.158* 0.058*
pipe dist -0.0731* -0.083* -0.144* -0.021* 0.00 0.0346* -0.003 -0.003
op oil/cond -0.0244* -0.036* -0.085* 0.1093* 0.0866* 0.0009 0.1335* 0.0700*
op. csgd/gas 0.0871* 0.0772* 0.0600* -0.13* -0.0197* -0.012* -0.0275* 0.1029*
op wells 0.0141* 0.0066 0.0020 0.0257* 0.0178 -0.0072 0.0055 0.003
gas ratio 0.0448* 0.0016 0.0977* -0.059* -0.03* -0.026* -0.0046 -0.006
subsidiary 0.0285* 0.0276* 0.0286* -0.056* 0.0206* -0.0075 0.0698* 0.0026
size interact 0.0948* 0.0515* 0.0325* -0.066* -0.0074 -0.0048 0.1283* 0.4938*
new gas well well dens wells pipe dist oper. oil/cond op csgd/gas op wells gas ratio subsid
gas/csgd
new 1.000
gas well -0.058* 1.0000
well dens -0.000 0.0024 1.000
wells 0.12* -0.12* 0.001 1.00
pipe dist -0.005 -0.22* -0.002 0.01 1.00
op oil/cond 0.1335* -0.166* -0.0001 0.0730 -0.1* 1.00
op csgd/gas -0.0275* 0.3665* 0.0001* -0.04* -0.1* 0.1396* 1.00
op wells 0.062* 0.063* 0.0027 0.05* -0.1* 0.5068* 0.5394* 1.000
gas ratio -0.061 0.775* 0.0024 -0.1* -0.2* -0.02* 0.2976* 0.063 1.000
subsid 0.036* 0.222* -0.001 0.00 -0.1* 0.1969* 0.6314* 0.528* 0.22* 1.000
size inter 0.1283* 0.1584* -0.0003 0.0025 -0.0* 0.0584* 0.4462* 0.2270* 0.1281* 0.2775*
* = significant at p < 0.001
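The asterisks above mark correlations that differ from zero at p < 0.001. The following is a minimal sketch of how such a flag can be computed from a correlation coefficient and a sample size. It is not the procedure used to produce the tables: the function names, the example coefficients, and the sample size are all hypothetical, and the normal approximation to the t distribution is an assumption that is reasonable only for large samples.

```python
import math
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and approximates the
    t distribution with a standard normal (assumes large n).
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def is_starred(r: float, n: int, alpha: float = 0.001) -> bool:
    """True when the correlation would carry an asterisk at level alpha."""
    return correlation_p_value(r, n) < alpha


# Hypothetical inputs, not values taken from the tables above.
print(is_starred(0.15, 1000))
print(is_starred(0.05, 1000))
```

With large samples, even very small correlations can pass this threshold, so the asterisks indicate statistical significance rather than the strength of the association.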
C.2.1.2. Operator-Level Summary Statistics
I also examined all operators directly responsible for oil and gas facility operations. Summary
statistics for facility operators are as follows:
Figure C.4: Operator-Level Summary Statistics
N Mean Standard Deviation Minimum Maximum
VENTING AND FLARING PRACTICES
Venting and Flaring Operator 6,135 0.11 0.32 0 1
Venting and Flaring Rate (ln) 6,135 0.05 0.58 -13.929 37.96377
COMMUNITY ECONOMIC STATUS
Median Household Income (mean) 6,135 8.99 1.86 1.15709 14
Median Owner Occupied House Value (mean) 6,135 12.00 2.68 1 20.18182
Portion At or Below the Poverty Line (mean) 6,135 13.74 7.08 0 58.36614
COMMUNITY CULTURAL CAPITAL
Portion Without High School Diploma (mean) 6,135 19.07 8.27 0 56.972
Portion with Limited English Language Fluency (mean) 6,135 4.465024 5.613945 0 44.85981
COMMUNITY ORGANIZATION CAPACITY
Population Density (mean) 6,135 35.391 145.222 0.007 4612.576
Registered Nonprofit Organizations (mean) 6,135 289.822 1051.664 0 14,502
COMMUNITY RACE AND ETHNICITY
Portion of Residents that are non-Hispanic Black (mean) 6,135 3.771 6.220 0 88.075
Portion of Residents that are Hispanic (mean) 6,135 24.813 21.138 0 98.74068
STATE REGULATION
Permitted to Vent or Flare (square mean) 6,135 0.001 0.021 0 1
Venting or Flaring Violations (mean) 6,135 0.003 0.085 0 4.5
Lease Inspections 6,135 22.94442 124.2467 0 3490
FACILITY SIZE
Oil and Condensate Produced (square mean) 6,135 1.30E+09 6.46E+10 0 3.57E+12
Gas and Casinghead Produced (square mean) 6,135 1.56E+10 7.33E+11 0 4.05E+13
FACILITY COMPLEXITY
Facility Wellbores (log mean) 6,135 6.265916 19.13736 1 707
ECONOMIC COSTS
New Wellbores Established (mean) 6,135 0.031 0.124 0 1
Gas wells (mean) 6,135 .3180042 .3615361 0 1
Other Wellbores within One Mile (mean) 6,135 17.413 37.31328 0 2,179
Nearest Distance to Gas Pipeline (mean) 6,135 13,558.99 20,400.24 0 175,789.10
OPERATOR COMPLEXITY
Wellbores Controlled by Facility Operator (log) 6,135 2.759 1.721 0 10.3
OPERATOR GAS DEPENDENCE
Gas Ratio 6,135 .5302668 .4331749 0 1
ORGANIZATIONAL STRUCTURE
Subsidiary 6,135 .0176039 .1315174 0 1
C.2.2. Regression Results
My primary analysis involves a facility-level two-part regression model. The development of
both the participation and magnitude models is shown below.
Figure C.5: Development of Facility-Level Regression Models
Model 1 Model 2 Model 3 Model 4
Part. Mag. Part. Mag. Part. Mag. Part. Mag.
N 126,862 6,651 126,862 6,651 126,862 6,651 126,861 6,651
Operator Clusters 4,608 455 4,608 455 4,608 455
R2/Pseudo R2 0.0821 0.028 0.0821 0.028 0.1681 0.4544 0.2057 0.5570
Constant -3.7391* -1.67627* -3.7391* -1.676 -3.3632* -1.358349 -3.2315* 1.001008
SURROUNDING COMMUNITY DEMOGRAPHICS
median income NA -.026922 NA -.0269 NA -.0102889 NA .0639128
housing value .041787* -.095357* .041787 -.0954 .0387254 -.0759229 NA -.022374
percent living at or below the poverty line -.044509* -.038167* -.04451* -.0382 -.033188 .0027215 -.03291* -.002882
percent without high school diploma NA .0077081 NA .0077 NA -.0037406 NA .0041144
percent with limited fluency in the English language NA -.022509* NA -.023* NA -.0083465 NA .007259
population density -.002532* -.000712 -.002532 -.0007 -.001753 -.0001171 -.001491 -.000062
number of NGOs -.000225* -.000209* -.000225 -.0002 -.000068 .000054 .000021 -.00004
percent black -.011402* .0227* -.011402 .0227 -.008752 .0197132 -.002243 .0081067
percent Hispanic .032917* .0009338 .032917* .0009 .030933* -.0007944 .027933* .0005596
STATE REGULATION
permit 3.08145* 1.191801* 2.93243* 1.22202*
violations .730726* .3233757* .741784* .268556*
FACILITY SIZE
oil/cond produced 9.80e-15 -1.08e-13* -1.9e-15 1.e-14*
csgd/gas produced 1.66e-15 -1.92e-15 -1.2e-14 -8.9e-14*
FACILITY COMPLEXITY
facility wellbores .0005629 .0010193* .0001769 .000878*
ECONOMIC COSTS
new 1.4752* .391152 1.41654* .4877963
gas wells -1.0853* -4.061098* -.521976 -2.9638*
wellbores within one mile -.003954 -.0065501* -.004127 -.00279*
nearest distance to pipeline -8.7e-06* .0000159* -6.4e-06* 6.06e-06
OPERATOR SIZE
oil/cond produced 6.71e-08* 2.52e-09*
csgd/gas produced -3.2e-09* 6.65e-09
OPERATOR COMPLEXITY
operator wellbores NA -.3678*
OPERATOR GAS DEPENDENCE
gas portion NA -2.9718*
ORGANIZATIONAL STRUCTURE
subsidiary .0982878 .6126543
SIZE INTERACTION
facility gas production volume x operator gas production volume 2.4e-15* -7.8e-15*
* significant at P < 0.05
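To help read the participation (logit) columns of Figure C.5, the sketch below converts a logit coefficient into an odds ratio and a predicted probability. It rests on two labeled assumptions: that the third value in the permit row (2.93243) and the seventh value in the Constant row (-3.2315) both belong to the Model 4 participation column, and that every other covariate is set to zero, which is an illustrative baseline rather than a realistic facility. The function names are hypothetical.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(beta: float) -> float:
    """Multiplicative change in the odds for a one-unit increase."""
    return math.exp(beta)


# Assumed to be the Model 4 participation estimates from Figure C.5.
CONSTANT = -3.2315
PERMIT = 2.93243

permit_or = odds_ratio(PERMIT)
# Illustrative baseline only: all other covariates held at zero.
p_without_permit = logistic(CONSTANT)
p_with_permit = logistic(CONSTANT + PERMIT)

print(f"odds ratio for permit: {permit_or:.2f}")
print(f"P(vent or flare | no permit, baseline): {p_without_permit:.3f}")
print(f"P(vent or flare | permit, baseline): {p_with_permit:.3f}")
```

In a two-part model, the participation part describes whether a facility vents or flares at all, and the magnitude part describes how much it vents or flares given that it does, so coefficients from the two columns answer different questions and are not directly comparable.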
C.2.3. Post Regression Analysis
C.2.3.1. Checking For Multicollinearity
Prior to determining which regression model to employ, I first determined if multicollinearity
would be a problem. To do this, the regression model was estimated and then variance inflation factors
(VIF) were measured as shown below.
Figure C.6: Table of Variance Inflation Factor Scores for Facility Participation Analysis
VIF 1/VIF
SURROUNDING COMMUNITY DEMOGRAPHICS
percent living at or below the poverty line 3.89 0.257054
population density 1.41 0.707269
number of NGOs 1.45 0.691777
percent black 1.37 0.730625
percent Hispanic 2.90 0.345352
STATE REGULATION
permit 1.03 0.971903
violation 1.00 0.997537
FACILITY SIZE
oil/cond produced 2.40 0.417042
csgd/gas produced 2.39 0.418156
FACILITY COMPLEXITY
facility wellbores 1.07 0.930883
ECONOMIC COSTS
new 1.10 0.907388
gas well 2.60 0.384956
wellbores within one mile 1.00 0.999919
nearest distance to pipeline 1.17 0.855453
OPERATOR SIZE
csgd/gas produced 2.70 0.370656
oil/cond produced 1.35 0.740576
ORGANIZATIONAL STRUCTURE
subsidiary 2.20 0.454092
SIZE INTERACTION
facility gas production volume * operator gas production volume 1.40 0.715139
Mean VIF 1.80
While there was moderate correlation, no VIF score exceeded 5, so multicollinearity was not considered a problem. To ensure confidence in the model estimates, I had removed the community and operator variables with VIF scores substantially larger than 5.
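The screening rule described above can be sketched as follows. The VIF values are copied from Figure C.6; the threshold of 5 comes from the text; the variable names and the list layout are hypothetical and are not taken from the original analysis files.

```python
# VIF scores as reported in Figure C.6 (participation analysis).
VIF_SCORES = [
    ("percent living at or below the poverty line", 3.89),
    ("population density", 1.41),
    ("number of NGOs", 1.45),
    ("percent black", 1.37),
    ("percent Hispanic", 2.90),
    ("permit", 1.03),
    ("violation", 1.00),
    ("facility oil/cond produced", 2.40),
    ("facility csgd/gas produced", 2.39),
    ("facility wellbores", 1.07),
    ("new", 1.10),
    ("gas well", 2.60),
    ("wellbores within one mile", 1.00),
    ("nearest distance to pipeline", 1.17),
    ("operator csgd/gas produced", 2.70),
    ("operator oil/cond produced", 1.35),
    ("subsidiary", 2.20),
    ("size interaction", 1.40),
]

THRESHOLD = 5.0  # cutoff stated in the text

# Variables that would be flagged for removal under the rule.
flagged = [name for name, vif in VIF_SCORES if vif > THRESHOLD]

# The mean VIF is the simple average of the individual scores.
mean_vif = sum(vif for _, vif in VIF_SCORES) / len(VIF_SCORES)

print("flagged:", flagged)
print(f"mean VIF: {mean_vif:.2f}")
```

The recomputed average agrees with the mean VIF reported in the table, and no variable in the participation model crosses the threshold.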
Figure C.7: Table of Variance Inflation Factor Scores for Facility Magnitude Analysis
VIF 1/VIF
SURROUNDING COMMUNITY DEMOGRAPHICS
median income 3.16 0.316646
median owner occupied housing value 1.93 0.518510
percent living at or below the poverty line 2.48 0.403658
percent without high school diploma 5.00 0.200166
percent with limited fluency in the English language 4.96 0.201544
population density 1.26 0.790534
number of NGOs 1.31 0.763892
percent black 1.17 0.853475
percent Hispanic 5.27 0.189803
STATE REGULATION
permit 1.07 0.933906
violation 1.01 0.989903
FACILITY SIZE
oil/cond produced 1.50 0.665079
csgd/gas produced 1.48 0.677098
FACILITY COMPLEXITY
facility wellbores 1.41 0.711328
ECONOMIC COSTS
new 1.17 0.854952
gas well 3.49 0.286199
wellbores within one mile 1.43 0.701469
nearest distance to pipeline 1.18 0.849737
OPERATOR SIZE
csgd/gas produced 2.31 0.433732
oil/cond produced 2.31 0.433381
OPERATOR COMPLEXITY
operator wellbores 2.48 0.402839
OPERATOR GAS DEPENDENCE
gas portion 2.93 0.341042
ORGANIZATIONAL STRUCTURE
subsidiary 1.19 0.843047
SIZE INTERACTION
facility gas production volume * operator gas production volume 2.02 0.495082
Mean VIF 2.23
While there was moderate correlation, no VIF score was much larger than 5 (the largest, 5.27, was for percent Hispanic), so multicollinearity was not considered a problem.
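For reference, the standard definition of the variance inflation factor links each score to the share of a predictor's variance explained by the other predictors. The worked line below applies that definition to the largest score in Figure C.7, using the 1/VIF value reported there; the notation is mine rather than the original analysis's.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad\Longrightarrow\quad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}

% percent Hispanic (Figure C.7): 1/VIF = 0.189803
R_{\text{Hispanic}}^2 = 1 - 0.189803 = 0.810197
```

Here R_j^2 is the coefficient of determination from regressing predictor j on all the other predictors, so roughly 81 percent of the variation in percent Hispanic is shared with the other covariates in the magnitude model.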
C.2.3.2. Checking Residuals
Finally, I examined the residual distributions of the regression models used in the analysis.
Figure C.8: Final Participation Regression Model Residual Distribution
As Figure C.8 shows, the facility-level logit model predicts most observations well. The residuals
range from -12.96196 to 18.50433, with a mean of -.0039995 and a standard deviation of .9616276.
Model assumptions that the residuals are close to normal and approximately independently distributed
are not significantly violated.
Figure C.9: Final Magnitude Regression Model Residual Distribution
Ordinary Least Squares (OLS) regression model assumptions that the residuals are close to normal and
approximately independently distributed are not significantly violated. The residuals range from
-9.63593 to 8.536046, with a mean of -6.56e-10 and a standard deviation of 1.968609.
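A near-zero average residual is expected for any OLS model that includes a constant, because the least squares fit forces the residuals to sum to zero; the spread and shape of the residuals are the more informative checks. The sketch below illustrates this with a one-predictor fit. The data are hypothetical and the helper name is my own; nothing here reproduces the magnitude model itself.

```python
from statistics import fmean, stdev


def ols_fit(x, y):
    """Closed-form simple OLS with an intercept: returns (intercept, slope)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]

intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

mean_residual = fmean(residuals)  # zero up to floating-point error
sd_residual = stdev(residuals)    # sample standard deviation

print(f"mean residual: {mean_residual:.2e}")
print(f"residual standard deviation: {sd_residual:.4f}")
```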
C.3. References
Huber, Peter. 1967. “The behavior of maximum likelihood estimates under nonstandard
conditions.” Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and
Probability. pp. 221–233.