Shaocai Yu *, Brian Eder* ++, Robin Dennis* ++, Shao-hang Chu**, Stephen Schwartz*** *Atmospheric...

1
Shaocai Yu * , Brian Eder* ++ , Robin Dennis* ++ , Shao-hang Chu**, Stephen Schwartz*** *Atmospheric Sciences Modeling Division, National Exposure Research Laboratory, ** Office of Air Quality Planning and Standards, U.S. EPA, NC 27711 *** Atmospheric Sciences Division, Brookhaven National Laboratory, Upton, NY 11973 ++ Air Resources Laboratory, National Oceanic and Atmospheric Administration, RTP, NC 27711 In this study, we propose new metrics to solve the symmetrical problem between overprediction and underprediction following the concept of factor. Theoretically, factor is defined as ratio of model prediction to observation if the model prediction is higher than the observation, whereas it is defined as ratio of observation to model prediction if the observation is higher than the model prediction. Following this concept, the mean normalized factor bias (M NFB ), mean normalized gross factor error (M NGFE ), normalized mean bias factor (N MBF ) and normalized mean error factor (N MEF ) are proposed and defined as follows: where M i and O i are values of model (prediction) and observation at time and/or location i, respectively, N is number of samples (by time and/or location). The values of M NFB , and N MBF are linear and not bounded (range from - to +). Like M NB and M NGE , M NFB and M NGFE can have another general problem when some observation values (denomination) are trivially low and they can significantly influence the values of those metrics. N MBF and N MEF can avoid this problem because the sum of the observations is used to normalize the bias and error,. The above formulas of N MBF and N MEF can be rewritten for case as follows: N MBF and N MEF are actually the results of summaries of normalized bias (M NB ) and error (M NGE ) with the observational concentrations as a weighting function, respectively. N MBF and N MEF have both advantages of avoiding dominance by the low values of observations in normalization like N MB and N ME and maintaining adequate evaluation symmetry. The meanings of N MBF can be interpreted as follows: if N MBF 0, for example, N MBF =1.2, this means that the model overpredicts the observation by a factor of 2.2 (i.e., N MBF +1=1.2+1=2.2); if N MBF 0, for example, N MBF =- 0.2, this means that the model underpredicts the observation by a factor of 1.2 (i.e., N MBF -1=-0.2-1=- 1.2). Table 3 and Figures 3 and 4: (1) For weekly data from CASTNet, both N MBF (0.03 to 0.08) and N MEF (0.24 to 0.27) for SO 4 2- are lower than the performance criteria. (2) For 24-hour data from IMPROVE, SEARCH and STN, both N MBF (-0.19 to 0.22) and N MEF (0.42 to 0.46) for SO 4 2- are slightly higher than the performance criteria. The model performed better on the weekly data of CASTNet and better over the eastern region than western region for SO 4 2- . This is consistent with the results of Dennis et al. [1993]. (3) For PM 2.5 NO 3 - , both N MBF (-0.96 to 0.59) and N MEF (0.80 to 1.70) for SEARCH, CASTNet, and IMPROVE data are larger than the performance criteria. Although the N MBF (0.01) for STN data in 2002 is very small, its N MEF (0.77) is still larger than the performance criteria. (4) CMAQ performed well over the Northeastern region but overpredicted most of observed NO 3 - over the mid-Atlantic region by more than a factor of 2 and underpredicted most of observed NO 3 - over the west region and southeast by more than a factor of 2. Both N MBF and N MEF of winter of 2002 are smaller than those of summer of 1999. This is because the NO 3 - concentrations in winter of 2002 were 7 times higher than those in summer of 1999 although M B values of winter of 2002 were higher than those of summer of 1999. (5) One of major reasons for poor performance of model on PM 2.5 NO 3 - is that the PM 2.5 NO 3 - concentrations are very low and are very sensitive to the errors in PM 2.5 SO 4 2- and NH 4 + , and temperature and RH when thermodynamic model (such as ISSORROPIA model here) partitions total nitrate (i.e., aerosol NO 3 - +gas HNO 3 ) between the gas and aerosol phases. 4. Summary and conclusions A set of new statistical quantitative metrics on the basis of concept of factor has been proposed that can provide a rational operational evaluation of air quality model for the relative difference between modeled and observed results. The new metrics (i.e., N MBF , N MEF ) have advantages of both avoiding dominance by the low values of observations and maintaining adequate evaluation symmetry. The application of these new quantitative metrics in operational evaluation of CMAQ performance on PM 2.5 SO 4 2- and NO 3 - shows that the new quantitative metrics are useful and their meanings are also very clear and easy to explain. Figure 1: a dataset for a real case of model and observation for aerosol NO 3 - was separated into four regions, (1) region 1 for model/observation0.5, (2) region 2 for 0.5model/observation1.0, (3) region 3 for 1.0<model/observation2.0, (4) region 4 for 2.0model/observation. Results in Table 2: (1) for the only data in region 1 with model/observation0.5, M NB , N MB , F B , N MFB , M NFB and N MBF are –0.82, -0.78, -1.43, -1.28, -36.67, and –3.58, respectively. Obviously, only normalized mean bias factor (N MBF ) gives reasonable description of model performance, i.e., the model underpredicted the observations by a factor of 4.58 in this case. (2) For the only data in region 4 with model/observation>2, M NB , N MB , F B , N MFB , M NFB and N MBF are 4.27, 2.25, 1.12, 1.06, 4.27 and 2.25, respectively. The results of N MBF and N MB reasonably indicate that the model overpredicted the observations by a factor of 3.25. (3) For the results of each metrics on combination case of regions 1 and 4 data, M NB , N MB , F B , N MFB , M NFB and N MBF are 1.50, 0.06, -0.27, 0.06, -18.02 and 0.06, respectively. Both N MB and N MBF show that the model slightly overpredicted the observations by a factor of 1.06, while F B (-0.27) shows that the model underpredicted the observations. This shows that the value of F B can sometimes result in a misleading conclusion as well. This specific case shows that it is not wise to use F B as an evaluation metric. N ME and N MEF : gross error between observations and model results is 1.19 times of mean observation. The good model performance can be concluded only under the condition that both relative bias (N MBF ) and relative gross error (N MEF ) meet the certain performance standards. (4) For the all data in Figure 1 (combination case 1+2+3+4 in Table 2), M NB , N MB , F B , N MFB , M NFB and N MBF are 0.96, 0.09, -0.13, 0.09, -10.75 and 0.09, respectively. Both N MB and N MBF show that the mean model only overpredicted the mean observation by a factor of 1.09. However, the gross error (N MGE ) between the model and observation is 0.77 times as high as observation. See Figure 1. On the basis of the above analyses and test, it can be concluded that our proposed new statistical metrics (i.e., N MBF and N MEF ) on the basis of concept of factor can show the model performance reasonably. These new metrics use observational data as only reference for the model evaluation and their meanings are also very clear and easy to explain. 3. Applications of new metrics over the US CMAQ model on PM 2.5 SO 4 2- and NO 3 - over the US: 6/15 to 7/17, 1999 and 1/8 to 2/18, 2002. PM 2.5 SO 4 2- and NO 3 - observational data: (1) at 61 rural sites from IMPROVE. Two 24-hour samples are collected on quartz filters each week, on Wednesday and Saturday, beginning at midnight local time. (2) at 73 rural sites from CASNet. Weekly (Tuesday to Tuesday) samples are collected on Teflon filters. (3) at 8 sites from SEARCH (Southeastern Aerosol Research and Characterization project). Daily samples are collected on quartz filters. (4) at 153 urban sites from STN (Speciated Trends Network). 24-hour samples are usually taken once every six days. The recommended performance criteria for O 3 by US EPA (1991): normalized bias (M NB ) 5 to 15%; normalized gross error (M NGE ) 30% to 35%. +15% of M NB corresponds New unbiased symmetric metrics for evaluation of the air quality model 1. INTRODUCTION Model performance evaluation is a problem of longstanding concern, and has two important objectives, i.e., (1) determine a model’s degree of acceptability and usefulness for a specific task, and (2) establish that one is getting good results for the right reason (Russell and Dennis, 2000). These objectives are accomplished by two different types of model evaluation, i.e., operational evaluation and diagnostic evaluation (or model physics evaluation) (Fox, 1981; EPA, 1991; Weil et al., 1992; Russell and Dennis, 2000), respectively. Although the operational evaluations for different air quality models have been intensively performed for regulatory purposes in the past years, the resulting array of statistical metrics are so diverse and numerous that it is difficult to judge the overall performance of the models (EPA, 1991; Seigneur et al., 2000; Yu et al., 2003). Some statistical metrics can cause misleading conclusions about the model performance (Seigneur et al., 2000). In this paper, a new set of quantitative metrics for the operational evaluation is proposed and applied in real evaluation cases. The other quantitative metrics frequently used in the operational evaluation are tested and their shortcomings are examined. 2. Quantitative metrics related to the operational evaluation and their examinations •Table 1: traditional quantitative metrics,mathematical expressions used in the operational evaluation. •Differences between the model and observations: mean bias (M B ) and mean absolute gross error (M AGE ) (or root mean square error). •Relative differences between the model and observations: The traditional metrics (such as M NB , M NGE , N MB ,and N ME ) •two problems that may mislead conclusions with this approach: (1) the values of M NB and N MB can grow disproportionately for overpredictions and underpredicitons because both values of M NB and N MB are bounded by –100% for underprediction; (2) the values of M NB and M NGE can be significantly influenced by some points with trivially low values of observations (denomination). •To solve this asymmetric evaluation problem between overpredictions and underpredictions, fractional bias (F B ) and fractional gross error (F GE ) (Irwin and Smith, 1984; Kasibhatla et al., 1997; Seigneur et al., 2000). Problems for F B and F GE: (1)what the metrics F B and F GE measure is not clear because the model prediction is not evaluated against observation but average of observation and model prediction, which is contradict to the traditional definition of evaluation. (2)the scales of F B and F GE are not linear and are seriously compressed beyond 1 as F B and F GE are bounded by 2 and +2, respectively. Acknowledgements The authors wish to thank other members at ASMD of EPA for their contributions to the 2002 release version of EPA Models-3/CMAQ during the development and evaluation. This work has been subjected to US Environmental Protection Agency peer review and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation for use. REFERENCES Cox, W.M., Tikvart, J.A., 1990. Atmospheric Environment 24, 2387- 2395. Eder, B., Shaocai, Yu, etc., 2003. Atmospheric Environment (in preparation). EPA, 1991. Guideline for regulatory application of the urban airshed model. USEPA Report No. EPA-450/4-91-013. U.S. EPA, Office of Air Quality Planning and Standards, Research Triangle Park, North Carolina. Dennis, R.L., J.N. McHenry, W.R.Barchet, F.S. Binkowski, and D.W. Byun, Atmos. Environ., 26 A(6), 975-997, 1993. Fox, D.G., 1981. Bulletin American Meteorological Society 62, 599- 609. Irwin, J., and M. Smith, 1984, Bulletin American Meteorological Society 65, 559-568 Russell, A., and R. Dennis, Atmos. Environ., 34, 2283-2324, 2000. Seigneur, C., et al.,., 2000. Journal of the Air & Waste Management Association 50, 588-599. Weil, J.C., R.I. Sykes, and A. Venkatram, 1992, Journal of Applied

Transcript of Shaocai Yu *, Brian Eder* ++, Robin Dennis* ++, Shao-hang Chu**, Stephen Schwartz*** *Atmospheric...

Page 1: Shaocai Yu *, Brian Eder* ++, Robin Dennis* ++, Shao-hang Chu**, Stephen Schwartz*** *Atmospheric Sciences Modeling Division, National Exposure Research.

Shaocai Yu*, Brian Eder*++, Robin Dennis*++, Shao-hang Chu**, Stephen Schwartz*** *Atmospheric Sciences Modeling Division, National Exposure Research Laboratory,

** Office of Air Quality Planning and Standards,

U.S. EPA, NC 27711

*** Atmospheric Sciences Division, Brookhaven National Laboratory, Upton, NY 11973++Air Resources Laboratory, National Oceanic and Atmospheric Administration, RTP, NC 27711

In this study, we propose new metrics to solve the symmetrical problem

between overprediction and underprediction following the concept of factor.

Theoretically, factor is defined as ratio of model prediction to observation if

the model prediction is higher than the observation, whereas it is defined

as ratio of observation to model prediction if the observation is higher than

the model prediction. Following this concept, the mean normalized factor

bias (MNFB), mean normalized gross factor error (MNGFE), normalized mean

bias factor (NMBF) and normalized mean error factor (NMEF) are proposed

and defined as follows:

 

where Mi and Oi are values of model (prediction) and observation at time

and/or location i, respectively, N is number of samples (by time and/or

location). The values of MNFB, and NMBF are linear and not bounded (range

from - to +). Like MNB and MNGE, MNFB and MNGFE can have another

general problem when some observation values (denomination) are trivially

low and they can significantly influence the values of those metrics. NMBF

and NMEF can avoid this problem because the sum of the observations is

used to normalize the bias and error,. The above formulas of NMBF and

NMEF can be rewritten for case as follows:

• NMBF and NMEF are actually the results of summaries of normalized bias

(MNB) and error (MNGE) with the observational concentrations as a weighting

function, respectively.

• NMBF and NMEF have both advantages of avoiding dominance by the low

values of observations in normalization like NMB and NME and maintaining

adequate evaluation symmetry.

• The meanings of NMBF can be interpreted as follows: if NMBF 0, for

example, NMBF =1.2, this means that the model overpredicts the

observation by a factor of 2.2 (i.e., NMBF+1=1.2+1=2.2); if NMBF 0, for

example, NMBF =-0.2, this means that the model underpredicts the

observation by a factor of 1.2 (i.e., NMBF-1=-0.2-1=-1.2).

• Table 3 and Figures 3 and 4:

(1) For weekly data from CASTNet, both NMBF (0.03 to 0.08)

and NMEF (0.24 to 0.27) for SO42- are lower than the

performance criteria.

(2) For 24-hour data from IMPROVE, SEARCH and STN, both

NMBF (-0.19 to 0.22) and NMEF (0.42 to 0.46) for SO42- are

slightly higher than the performance criteria. The model

performed better on the weekly data of CASTNet and

better over the eastern region than western region for

SO42-. This is consistent with the results of Dennis et al.

[1993].

(3) For PM2.5 NO3-, both NMBF (-0.96 to 0.59) and NMEF (0.80 to

1.70) for SEARCH, CASTNet, and IMPROVE data are

larger than the performance criteria. Although the NMBF

(0.01) for STN data in 2002 is very small, its NMEF (0.77) is

still larger than the performance criteria.

(4) CMAQ performed well over the Northeastern region but

overpredicted most of observed NO3- over the mid-Atlantic

region by more than a factor of 2 and underpredicted most

of observed NO3- over the west region and southeast by

more than a factor of 2. Both NMBF and NMEF of winter of

2002 are smaller than those of summer of 1999. This is

because the NO3- concentrations in winter of 2002 were 7

times higher than those in summer of 1999 although MB

values of winter of 2002 were higher than those of summer

of 1999.

(5) One of major reasons for poor performance of model on

PM2.5 NO3- is that the PM2.5 NO3

- concentrations are very

low and are very sensitive to the errors in PM2.5 SO42- and

NH4+, and temperature and RH when thermodynamic

model (such as ISSORROPIA model here) partitions total

nitrate (i.e., aerosol NO3-+gas HNO3) between the gas and

aerosol phases.

 

4.      Summary and conclusions

A set of new statistical quantitative metrics on the basis of

concept of factor has been proposed that can provide a rational

operational evaluation of air quality model for the relative

difference between modeled and observed results. The new

metrics (i.e., NMBF, NMEF) have advantages of both avoiding

dominance by the low values of observations and maintaining

adequate evaluation symmetry. The application of these new

quantitative metrics in operational evaluation of CMAQ

performance on PM2.5 SO42- and NO3

- shows that the new

quantitative metrics are useful and their meanings are also very

clear and easy to explain.

 • Figure 1: a dataset for a real case of model and observation for aerosol

NO3- was separated into four regions,

(1) region 1 for model/observation0.5,

(2) region 2 for 0.5model/observation1.0,

(3) region 3 for 1.0<model/observation2.0,

(4) region 4 for 2.0model/observation.

• Results in Table 2:

(1) for the only data in region 1 with model/observation0.5, MNB, NMB, FB,

NMFB, MNFB and NMBF are –0.82, -0.78, -1.43, -1.28, -36.67, and –3.58,

respectively. Obviously, only normalized mean bias factor (NMBF) gives

reasonable description of model performance, i.e., the model

underpredicted the observations by a factor of 4.58 in this case.

(2) For the only data in region 4 with model/observation>2, MNB, NMB, FB,

NMFB, MNFB and NMBF are 4.27, 2.25, 1.12, 1.06, 4.27 and 2.25, respectively.

The results of NMBF and NMB reasonably indicate that the model

overpredicted the observations by a factor of 3.25.

(3) For the results of each metrics on combination case of regions 1 and 4

data, MNB, NMB, FB, NMFB, MNFB and NMBF are 1.50, 0.06, -0.27, 0.06, -18.02

and 0.06, respectively. Both NMB and NMBF show that the model slightly

overpredicted the observations by a factor of 1.06, while FB (-0.27) shows

that the model underpredicted the observations. This shows that the

value of FB can sometimes result in a misleading conclusion as well. This

specific case shows that it is not wise to use FB as an evaluation metric.

NME and NMEF: gross error between observations and model results is 1.19

times of mean observation. The good model performance can be

concluded only under the condition that both relative bias (NMBF) and

relative gross error (NMEF) meet the certain performance standards.

(4) For the all data in Figure 1 (combination case 1+2+3+4 in Table 2), MNB,

NMB, FB, NMFB, MNFB and NMBF are 0.96, 0.09, -0.13, 0.09, -10.75 and 0.09,

respectively. Both NMB and NMBF show that the mean model only

overpredicted the mean observation by a factor of 1.09. However, the

gross error (NMGE) between the model and observation is 0.77 times as

high as observation. See Figure 1.

On the basis of the above analyses and test, it can be concluded that our

proposed new statistical metrics (i.e., NMBF and NMEF) on the basis of

concept of factor can show the model performance reasonably. These

new metrics use observational data as only reference for the model

evaluation and their meanings are also very clear and easy to explain.

3. Applications of new metrics over the US

• CMAQ model on PM2.5 SO42- and NO3

- over the US: 6/15 to 7/17, 1999

and 1/8 to 2/18, 2002.

• PM2.5 SO42- and NO3

- observational data:

(1) at 61 rural sites from IMPROVE. Two 24-hour samples are collected on

quartz filters each week, on Wednesday and Saturday, beginning at

midnight local time.

(2) at 73 rural sites from CASNet. Weekly (Tuesday to Tuesday) samples

are collected on Teflon filters.

(3) at 8 sites from SEARCH (Southeastern Aerosol Research and

Characterization project). Daily samples are collected on quartz filters.

(4) at 153 urban sites from STN (Speciated Trends Network). 24-hour

samples are usually taken once every six days.

• The recommended performance criteria for O3 by US EPA (1991):

normalized bias (MNB) 5 to 15%; normalized gross error (MNGE) 30% to

35%. +15% of MNB corresponds to +15% of NMBF while –15% of MNB

corresponds to –18% of NMBF.

New unbiased symmetric metrics for evaluation of the air quality model

 1.   INTRODUCTION    

Model performance evaluation is a problem of longstanding concern,

and has two important objectives, i.e., (1) determine a model’s degree of

acceptability and usefulness for a specific task, and (2) establish that

one is getting good results for the right reason (Russell and Dennis,

2000). These objectives are accomplished by two different types of

model evaluation, i.e., operational evaluation and diagnostic evaluation

(or model physics evaluation) (Fox, 1981; EPA, 1991; Weil et al., 1992;

Russell and Dennis, 2000), respectively. Although the operational

evaluations for different air quality models have been intensively

performed for regulatory purposes in the past years, the resulting array

of statistical metrics are so diverse and numerous that it is difficult to

judge the overall performance of the models (EPA, 1991; Seigneur et

al., 2000; Yu et al., 2003). Some statistical metrics can cause

misleading conclusions about the model performance (Seigneur et al.,

2000).

In this paper, a new set of quantitative metrics for the operational

evaluation is proposed and applied in real evaluation cases. The other

quantitative metrics frequently used in the operational evaluation are

tested and their shortcomings are examined.

2. Quantitative metrics related to the operational

evaluation and their examinations

•Table 1: traditional quantitative metrics,mathematical expressions

used in the operational evaluation.

•Differences between the model and observations: mean bias (MB)

and mean absolute gross error (MAGE) (or root mean square error).

•Relative differences between the model and observations: The

traditional metrics (such as MNB, MNGE, NMB,and NME)

•two problems that may mislead conclusions with this approach:

(1) the values of MNB and NMB can grow disproportionately for

overpredictions and underpredicitons because both values of MNB and

NMB are bounded by –100% for underprediction;

(2) the values of MNB and MNGE can be significantly influenced by

some points with trivially low values of observations (denomination).

•To solve this asymmetric evaluation problem between

overpredictions and underpredictions,

fractional bias (FB) and fractional gross error (FGE) (Irwin and

Smith, 1984; Kasibhatla et al., 1997; Seigneur et al., 2000).

Problems for FB and FGE:

(1)what the metrics FB and FGE measure is not clear because the

model prediction is not evaluated against observation but average of

observation and model prediction, which is contradict to the traditional

definition of evaluation.

(2)the scales of FB and FGE are not linear and are seriously

compressed beyond 1 as FB and FGE are bounded by 2 and +2,

respectively.

Acknowledgements

The authors wish to thank other members at ASMD of EPA for their contributions to the 2002

release version of EPA Models-3/CMAQ during the development and evaluation. This work has been

subjected to US Environmental Protection Agency peer review and approved for publication.

Mention of trade names or commercial products does not constitute endorsement or recommendation

for use.

REFERENCES

Cox, W.M., Tikvart, J.A., 1990. Atmospheric Environment 24, 2387-2395.

Eder, B., Shaocai, Yu, etc., 2003. Atmospheric Environment (in preparation).

EPA, 1991. Guideline for regulatory application of the urban airshed model. USEPA Report No.

EPA-450/4-91-013. U.S. EPA, Office of Air Quality Planning and Standards, Research Triangle

Park, North Carolina.

Dennis, R.L., J.N. McHenry, W.R.Barchet, F.S. Binkowski, and D.W. Byun, Atmos. Environ., 26

A(6), 975-997, 1993.

Fox, D.G., 1981. Bulletin American Meteorological Society 62, 599-609.

Irwin, J., and M. Smith, 1984, Bulletin American Meteorological Society 65, 559-568

Russell, A., and R. Dennis, Atmos. Environ., 34, 2283-2324, 2000.

Seigneur, C., et al.,., 2000. Journal of the Air & Waste Management Association 50, 588-599.

Weil, J.C., R.I. Sykes, and A. Venkatram, 1992, Journal of Applied Meteorology, 31, 1121-1145.

Yu, S.C., Kasibhatla, P.S., Wright, D.L., Schwartz, S.E., McGraw, R., Deng, A., 2003.Journal of

Geophysical Research 108(D12), 4353, doi:10.1029/2002JD002890.