Anomaly Detection in Problematic GPS Time Series Data and Modeling
Dafna Avraham, Yehuda Bock



Institute of Geophysics and Planetary Physics, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA

Introduction

Anomalous event detection in Global Positioning System (GPS) time series is an important matter in geodetic research. The Scripps Orbit and Permanent Array Center (SOPAC) generates continuous and daily time series in three dimensions for over 1400 global GPS stations that are analyzed using a computerized modeling program, which is limited to fitting slopes (velocities), offsets, periodic (annual and semiannual) terms, and postseismic decays. Currently, anomalous events are not adequately recognized or considered.
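For orientation, a model with these terms is often written in the form below. This is a sketch under stated assumptions, not the exact parameterization of the SOPAC program: the symbols are introduced here for illustration, time t_i is taken to be in years, and the postseismic term is assumed to be an exponential decay (a logarithmic form is also common).

```latex
y(t_i) = a + b\,t_i
       + c\sin(2\pi t_i) + d\cos(2\pi t_i)
       + e\sin(4\pi t_i) + f\cos(4\pi t_i)
       + \sum_{j=1}^{n_g} g_j\,H(t_i - T_{g_j})
       + \sum_{j=1}^{n_k} k_j \exp\!\left(-\frac{t_i - T_{k_j}}{\tau_j}\right) H(t_i - T_{k_j})
       + v_i
```

Here a is the initial position, b the slope (velocity), c through f the annual and semiannual amplitudes, g_j the offsets at epochs T_{g_j}, k_j and \tau_j the amplitude and decay time of a postseismic decay starting at T_{k_j}, H the Heaviside step function, and v_i the residual. The detection methods described in this poster operate on the residual series v_i.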

We have developed anomaly detection algorithms that are capable of detecting signals, outliers, trends in the data, and modeling problems. The algorithms contain modified versions of noise analysis, correlation statistics, and threshold utility. They run on the complete set of global GPS time series, successfully uncovering a majority of the previously undetected anomalies.

We spatially cluster the types of anomalies in order to reveal the geophysical factors that contribute to the occurrence of the incongruities.

We are developing a new interactive environment that will allow users to analyze on-the-fly temporal and spatial subsets of GPS time series in various ways, and to detect anomalous events using these newly developed methods. We are incorporating this into the GPS Explorer data portal, a joint project of SOPAC and JPL to provide user-friendly GPS data products and on-line modeling applications (http://geoapp.ucsd.edu).

Anomalies in GPS Time Series

Modeling Problems: These are seen when the model does not represent the data well. Often, this happens either because the model is missing one or more important terms, or because the data has gaps and jumps that mislead the model.

Signals: Many signals, such as postseismic decays, anthropogenic effects, and volcanic signals, are recognized as data that deviate from the model in particular patterns.

Outliers: Outliers are caused by many different sources. If the outliers are extreme, they can distort the model, and it is therefore very important to detect and remove them from the data.

Trend: Due to geophysical forces, GPS time series inherently contain a linear velocity (trend). Thus, the series are detrended before further analysis is performed. Nevertheless, some series still contain a significant trend, especially when two or more trends are estimated. The existence of trend in a detrended series signifies the need for further modeling of the data, and so trend detection is critical.

Anomaly Detection Algorithms for GPS Time Series: Signal and Modeling Problem Detection Algorithm; Outlier Detection Algorithm; Trend Detection Algorithm

Spatial Clustering of Anomalies

[Map panels: Signals and Modeling Problems; Trend; Outliers]

Above: Spatial diagrams displaying, in orange, the anomalous GPS time series that our algorithms detected in the western United States. It is important to consider the spatial component of problematic sites because spatial clusters (seen here as dense orange areas) often indicate underlying geophysical signals that may have gone unnoticed or unaccounted for in the model. These diagrams were created using GPS Explorer, an on-line data and modeling application created by SOPAC and JPL (http://geoapp.ucsd.edu).

The daily GPS time series data is displayed in sets of three plots per GPS site, representing the north, east, and up components, from top to bottom. Left: Signal and/or Modeling Problems. Middle: Outliers at the beginning of the series. Right: Trends in detrended data.

Signal and Modeling Problem Detection Algorithm

Problem: A model that does not consistently fit the data constitutes a modeling problem. Similarly, data that deviates from the model in a particular pattern represents a signal.

Method: Search the residual series of each GPS site for eight-month windows during which the residuals do not change sign and therefore do not resemble white noise. Such a window signifies that important model terms are unaccounted for.
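The sign-persistence search can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `same_sign_windows`, the use of decimal years for time, and the choice to measure a window by its timestamps (so that data gaps count toward the span) are all assumptions made here.

```python
from typing import List, Sequence, Tuple


def _sign(value: float) -> int:
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def same_sign_windows(
    times: Sequence[float],
    residuals: Sequence[float],
    min_span: float = 8.0 / 12.0,  # eight months, in years (assumed time unit)
) -> List[Tuple[float, float]]:
    """Find maximal runs of residuals that keep one sign for at least min_span.

    Returns a list of (start_time, end_time) pairs. A residual of exactly
    zero is treated as a sign change, which is an assumption of this sketch.
    """
    windows = []
    start = 0
    n = len(residuals)
    for i in range(1, n + 1):
        # A run ends at the end of the series or when the sign differs.
        if i == n or _sign(residuals[i]) != _sign(residuals[start]):
            long_enough = times[i - 1] - times[start] >= min_span
            if _sign(residuals[start]) != 0 and long_enough:
                windows.append((times[start], times[i - 1]))
            start = i
    return windows
```

A site would be flagged when the returned list is non-empty for any of its north, east, or up residual series. Residuals that alternate in sign from day to day, as white noise tends to, produce only short runs and therefore no windows.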

Outlier Detection Algorithm

Problem: Outliers are problematic because they skew the data and, in turn, can bias the model. Extreme outliers must be removed.

Method: Create a threshold for each residual series that is equal to 5 times the interquartile range (IQR). The IQR is a robust estimator of the spread of the series because it is more resistant to outliers than the standard deviation. Residuals that cross this threshold are classified as outliers.
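A minimal sketch of the IQR threshold is given below. The poster does not state what the threshold is measured from, so this example assumes it is the absolute distance from the median of the residual series; the function name `iqr_outliers` and the inclusive quartile method are likewise assumptions.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(residuals: Sequence[float], factor: float = 5.0) -> List[int]:
    """Return the indices of residuals farther than factor * IQR from the median.

    Measuring the distance from the median is an assumption of this sketch;
    the source only says that residuals crossing 5 * IQR are outliers.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
    threshold = factor * (q3 - q1)
    return [i for i, r in enumerate(residuals) if abs(r - median) > threshold]
```

Because the quartiles ignore the tails of the distribution, a handful of extreme values barely moves the threshold, whereas the same values would inflate a threshold based on the standard deviation.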

Trend Detection Algorithm

Problem: A detrended series that still exhibits a significant trend indicates that the data contains information that the model has not accounted for.

Method: Using the correlation coefficient, r, we can measure the strength of the linear association between time (X) and distance (Y) in GPS data. Since -1 ≤ r ≤ 1, with a value of 0 representing no linear association and a value close to 1 or -1 representing a strong linear association, we determined that a value greater than 0.7 or less than -0.7 signifies trend.
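The correlation test can be sketched as follows, assuming r is the ordinary Pearson correlation coefficient between time and the detrended series. The function names `pearson_r` and `has_trend`, and the handling of a constant series, are assumptions of this example.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant series has no defined correlation; this sketch
        # treats it as "no linear association" (an assumption).
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def has_trend(times: Sequence[float], values: Sequence[float],
              threshold: float = 0.7) -> bool:
    """Flag a series whose |r| against time exceeds the threshold."""
    return abs(pearson_r(times, values)) > threshold
```

Applied to a series that has already been detrended, a result of `True` suggests that a single fitted velocity did not capture the motion, for example because the velocity changed partway through the series.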


Geophysical Anomalies and Conclusions

[Map labels: Santa Ana basin; San Gabriel basin; Los Angeles basin]

Anthropogenic effects: The algorithms detect anomalous sites (in orange) in the Los Angeles basin, Santa Ana basin, and San Gabriel basin, which are regions where anthropogenic effects occur.

[Map labels: Long Valley Caldera; Mount St. Helens; Yellowstone; Parkfield Earthquake; San Simeon Earthquake]

Volcanic Signals: Volcanoes affect ground motion in patterns that the anomaly detection algorithms consistently recognize, as seen above in the concentrations of detected sites in volcanic regions (Mount St. Helens, Long Valley Caldera, and Yellowstone).

Postseismic Deformation: The algorithms effectively detect postseismic deformation, the characteristic anomaly that follows medium to large earthquakes (1992 Mw 7.3 Landers, 1999 Mw 7.1 Hector Mine, 2003 Mw 6.5 San Simeon, and 2004 Mw 6.0 Parkfield). The epicenter of each earthquake is circled in red on the map above.

The algorithms we developed successfully detect many GPS time series that exhibit geophysical anomalies, which often occur in the form of anthropogenic effects (such as groundwater removal and oil extraction), volcanic signals, or postseismic deformation.

[Map label: Hector Mine & Landers Earthquakes]

Acknowledgments. Dafna Avraham is a 2009 SCEC intern under the ACCESS-U project. Support is also provided by the NASA MEaSUREs project “Solid Earth Science ESDR System” with JPL. Help in this research was provided by Brendan Crowell, Peng Fang, Paul Jamason and Mindy Squibb at SOPAC.