Zhichang Guo, Paul A. Dirmeyer, Xiang Gao, and Mei Zhao Center for Ocean-Land-Atmosphere Studies,...

Zhichang Guo, Paul A. Dirmeyer, Xiang Gao, and Mei Zhao

Center for Ocean-Land-Atmosphere Studies, Calverton, Maryland, USA

6/2005

Improving the quality of simulated soil moisture with a multi-model ensemble approach

References:

Barnston, A.G., S.J. Mason, L. Goddard, D.G. DeWitt, and S.E. Zebiak, 2003: Multimodel ensembling in seasonal climate forecasting at IRI. Bull. Amer. Meteor. Soc., 84, 1783-1796.

Gao, X., and P.A. Dirmeyer, 2005: A multimodel analysis, validation and transferability study of global soil wetness products. J. Hydrometeor., (submitted).

Krishnamurti, T.N., C.M. Kishtawal, T.E. LaRow, D.R. Bachiochi, Z. Zhang, C.E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285, 1548-1550.

Dirmeyer, P.A., X. Gao, M. Zhao, Z. Guo, T. Oki, and N. Hanasaki, 2005: The Second Global Soil Wetness Project (GSWP-2): Multi-model analysis and implications for our perception of the land surface. Bull. Amer. Meteor. Soc., (submitted).

Robock, A., K.Ya. Vinnikov, G. Srinivasan, J.K. Entin, S.E. Hollinger, N.A. Speranskaya, S. Liu, and A. Namkhai, 2000: The global soil moisture data bank. Bull. Amer. Meteor. Soc., 81, 1281-1299.

4. Sensitivity of improvements to composition

The degree of improvement depends on the number of models used and on the quality of the individual model simulations. To assess how the multi-model analysis responds to the size of the ensemble and to the quality of its individual members, we examine how the multi-model average changes when one more “good” or “bad” model estimate is included.

[Two figures appeared here; the images are not recoverable from the source.]

Acknowledgements:

The authors would like to thank the providers of the various soil moisture products. In particular, we wish to thank E. Kalnay, M. Kanamitsu, K. Scipal, P. Viterbo, Y. Fan, W. Ebisuzaki, and S. Lu for the reanalysis and non-GSWP-2 products, N. Hanasaki for forwarding all of the GSWP-2 model submissions to us, and A. Robock and J. Entin for providing us the in-situ soil moisture observations.

This work was conducted under support from National Aeronautics and Space Administration grant NAG5-11579.

Conclusions:

1. The multi-model average of simulated soil moisture generally outperforms the individual products in simulating the phasing of the annual cycle, the interannual variability, and the magnitude of observed soil moisture.

2. There is usually an obvious improvement when a high-quality product is included, while there is little or no apparent degradation when a poorer product is included.

3. Reasonably small numbers of “noise products” do not have a large impact on the skill for anomalies in the multi-model analysis.

1. Introduction

Recent studies show that a combined forecast with predictions from several models can perform significantly better than a single-model system (Krishnamurti et al. 1999; Barnston et al. 2003). However, all such work focuses on dynamical fluid systems like the atmosphere or ocean, and little has been done for the land surface. This paper explores whether a multi-model analysis can be used to reduce errors in simulated land surface state variables. Here, we select soil moisture as the variable of focus because it is an important state variable both for AGCM/LSS initialization and for evaluating the performance of AGCMs and LSSs.

Seventeen global soil moisture products, three from coupled land-atmosphere model reanalyses and fourteen from uncoupled LSSs, all covering a baseline ten-year period from 1986 to 1995, are used in this study. The simplest ensembling approach, an arithmetic mean of soil moisture from the individual model simulations, has been found to be quite effective (Dirmeyer et al. 2005; Gao and Dirmeyer 2005) and is employed here. The characteristics of a multi-model analysis of soil moisture are examined against in situ observations from the Global Soil Moisture Data Bank. We validate its ability to estimate the actual column soil moisture, to simulate the phasing of the annual cycle, and to accurately represent observed interannual variability. We then compare it with individual model performance to see whether the multi-model analysis from an ensemble of LSS products outperforms its individual members. The sensitivity of the multi-model analysis to ensemble size and member selection is also examined.
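The arithmetic-mean ensembling described above can be illustrated with a minimal sketch. It assumes each product has already been sampled at the same location and the same time steps; the function name `multi_model_mean` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def multi_model_mean(products):
    """Arithmetic mean across models at each time step.

    products: list of equal-length time series, one per model
    (an assumed data layout, not the study's actual format).
    """
    n_steps = len(products[0])
    if any(len(p) != n_steps for p in products):
        raise ValueError("all products must cover the same time steps")
    # zip(*products) groups the values of all models at each time step.
    return [fmean(values) for values in zip(*products)]


# Three hypothetical soil wetness series at one location (made-up numbers).
model_a = [0.20, 0.30, 0.40]
model_b = [0.30, 0.30, 0.20]
model_c = [0.10, 0.30, 0.30]
ensemble = multi_model_mean([model_a, model_b, model_c])
```

Every member receives equal weight, which is the simplest choice; weighted schemes would need a training period and are outside what the text describes.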

2. Data description

Seventeen model-derived global soil moisture products are used in this study. All models are built around physically based parameterizations of the land surface water balance. Most of the models also include a full representation of the surface energy balance. Eleven of them come from the Second Global Soil Wetness Project (GSWP-2) baseline experiments, in which different land surface models were forced with the same meteorological observations for a ten-year period (1986-1995). The remainder come from other uncoupled land surface model calculations and from coupled land-atmosphere model reanalyses. The following table presents a synopsis of the characteristics of the included models.

The most complete collection of actual soil moisture measurements with broad spatial and temporal coverage is the Global Soil Moisture Data Bank (GSMDB) of Robock et al. (2000). This collection of station measurements covers regions of North America, Europe and Asia.

Most measurements are gravimetric and were taken in agricultural areas. We focus on station data in five regions (Illinois, USA; China; India; Mongolia; and the former Soviet Union). The Soviet Union data are further divided into two categories, representing winter and spring cereal fields. These data sets span from 12 to more than 20 years, although individual stations may have much shorter records.

3. Multi-model analysis and validation

Model estimates of soil moisture are compared to observations over six domains: China (40 stations), Illinois (19 stations), India (10 stations), Mongolia (42 stations), and two sets of largely co-located agricultural stations in Russia, representing spring cereal fields and winter cereal fields (171 stations total).

[Five figures appeared here; the images are not recoverable from the source.]

The above figures compare median correlations between the model estimates and observations of soil moisture for all regions in terms of the total series (left) and the anomalies (right). Overall, there is a tendency for the reanalysis estimates to perform poorly, although there are some locations where the ERA40 product does rather well. The three non-GSWP-2 LSS products generally perform better than the reanalyses, but worse than most of the GSWP-2 estimates, with exceptions most likely over Mongolia and China. The multi-model estimate (black bar) is almost always the best, and ranks very highly on those occasions when it is not the best.

The above figures compare the median root-mean-square error (RMSE) between the model estimates and observations of soil moisture in all regions for both the total series (left) and the anomalies (right). Only some models rank high in some regions, and no individual model ranks high in all regions. The multi-model estimate, however, ranks very highly in every region.

This figure compares median correlations, significance counts, and RMSE between the model estimates and observations of soil moisture for both the total series and the anomalies when all stations over the five regions are considered together. Note that the Russian region has many more stations than the other regions, so the results rely heavily on the models’ performance there. Nevertheless, this approach provides a clear picture of how the multi-model average improves simulated soil moisture: increased skill in simulating the phasing of the annual cycle and in representing interannual variability; an increased fraction of stations where the simulated soil moisture correlates with observations at the 95% significance level; and a much reduced median RMSE between model estimates and observations.
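The skill measures discussed above can be sketched as follows. This is only an illustration under stated assumptions: anomalies are taken here as departures from the mean annual cycle, which the text does not spell out; the series are assumed to cover whole years; and the 95% significance count is omitted because it needs a significance test that the text does not specify. The names `anomalies`, `pearson`, `rmse` and `median_scores` are hypothetical.

```python
from math import sqrt
from statistics import fmean, median


def anomalies(series, period=12):
    """Remove the mean annual cycle (assumes whole years of data)."""
    climatology = [fmean(series[m::period]) for m in range(period)]
    return [v - climatology[i % period] for i, v in enumerate(series)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom


def rmse(x, y):
    """Root-mean-square error between two equal-length series."""
    return sqrt(fmean([(a - b) ** 2 for a, b in zip(x, y)]))


def median_scores(model_by_station, obs_by_station, period=12):
    """Median over stations of correlation and RMSE, totals and anomalies."""
    total = list(zip(model_by_station, obs_by_station))
    anom = [(anomalies(m, period), anomalies(o, period)) for m, o in total]
    return {
        "corr_total": median(pearson(m, o) for m, o in total),
        "corr_anom": median(pearson(m, o) for m, o in anom),
        "rmse_total": median(rmse(m, o) for m, o in total),
        "rmse_anom": median(rmse(m, o) for m, o in anom),
    }


# Two made-up stations with a 3-step "year" to keep the example short.
obs = [[1.0, 2.0, 3.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]]
model = [[2.0, 3.0, 4.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]]
scores = median_scores(model, obs, period=3)
```

In this toy case the first model series has a constant bias, so it hurts the RMSE of the total series but leaves the anomaly scores unaffected, which is why the two views are reported separately.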

We first calculate the median correlation, significance count, and median RMSE of each individual product over all GSMDB stations and rank the products by each measure (dotted lines). We then examine how the multi-model average improves or degrades along the ascending and descending branches of these rankings. On the ascending (descending) branch, we start from the product with the lowest (highest) rank and, at each step, add the product with the next higher (lower) rank to the ensemble, recomputing the multi-model average and its skill measures, shown as dashed (solid) lines, until the full ensemble size is reached. The solid lines following the descending branch are always nearly flat, which indicates that there is no apparent degradation when a worse product is added to the multi-model ensemble.
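The ascending and descending branches described above can be sketched as below. For brevity the sketch scores a single station with correlation as the skill measure, whereas the study uses medians over all stations; for an error measure such as RMSE the ranking direction would need to be reversed. The function name `cumulative_ensemble_skill` and all data values are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom


def cumulative_ensemble_skill(products, obs, skill=pearson, descending=False):
    """Add products one at a time in rank order and score each ensemble mean.

    products: dict mapping product name -> time series (assumed layout).
    descending=False starts from the lowest-ranked product (ascending branch);
    descending=True starts from the highest-ranked one (descending branch).
    Returns a list of (name of product just added, skill of ensemble mean).
    """
    ranked = sorted(products, key=lambda name: skill(products[name], obs),
                    reverse=descending)
    results = []
    for k in range(1, len(ranked) + 1):
        members = [products[name] for name in ranked[:k]]
        ensemble = [fmean(values) for values in zip(*members)]
        results.append((ranked[k - 1], skill(ensemble, obs)))
    return results


# Made-up observations and three made-up products of differing quality.
obs = [1.0, 2.0, 3.0, 4.0]
products = {
    "good": [1.1, 2.0, 2.9, 4.2],
    "fair": [1.0, 3.0, 2.0, 4.0],
    "poor": [2.0, 1.0, 2.0, 1.0],
}
ascending = cumulative_ensemble_skill(products, obs)
descending = cumulative_ensemble_skill(products, obs, descending=True)
```

Both branches end with the same full ensemble, so their final skill values coincide; the branches differ only in the order in which members are added.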

If some ensemble members cannot properly reproduce the interannual variability in observed soil moisture, how does this affect the performance of the multi-model analysis? In the right-side figure, we shift the soil moisture time series by random years at all stations for the eight models with moderate performance and repeat the calculations above.
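One way to produce such “noise products” is sketched below. The text does not say exactly how the years were shifted, so this sketch assumes a circular shift by a random, non-zero number of whole years, which keeps the annual cycle in phase while misaligning the interannual signal. The function name `shift_by_random_years` is hypothetical.

```python
import random


def shift_by_random_years(series, period=12, rng=None):
    """Circularly shift a series by a random non-zero number of whole years.

    Assumes the series covers at least two whole years of `period` steps.
    """
    rng = rng or random.Random()
    n_years, remainder = divmod(len(series), period)
    if n_years < 2 or remainder:
        raise ValueError("series must cover two or more whole years")
    # randrange(1, n_years) excludes 0, so the result always differs in time.
    offset = rng.randrange(1, n_years) * period
    return series[offset:] + series[:offset]


# Three made-up "years" of monthly values; a seeded generator keeps it repeatable.
series = list(range(36))
shifted = shift_by_random_years(series, period=12, rng=random.Random(0))
```

Because the shift is by whole years, each value keeps its calendar month, so only the anomaly skill should be affected, consistent with the focus on interannual variability in the text.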

After the years are shifted, the median correlations for the eight models become very small or even negative, and the disordered data can no longer represent the actual interannual variability (dotted lines). As those eight models are cumulatively included, the performance of the multi-model analysis decreases gradually (solid lines). However, the overall performance increases rapidly when the best models are included (dashed lines), even after the disordered data from the eight models have been included in the analysis. In addition, the overall performance also increases when the poorer models are included (solid lines). Thus, reasonably small numbers of “noise products” do not have a large impact on the skill for anomalies in the multi-model analysis.

The dashed lines following the ascending branch have higher median correlations and significance counts than the corresponding individual products. This demonstrates that the multi-model average of a product together with all products of lower skill still outperforms that product alone.