
Mastering Real-Time Data Quality Control – How to Measure and Manage the Quality of (Rig) Sensor Data

Wolfgang Mathis, TDE Thonhauser Data Engineering, Leoben, Austria

Gerhard Thonhauser, University of Leoben, Austria

This paper was prepared for presentation at the 11th International Conference on Petroleum Data Integration, Information and Data Management in Amsterdam, 19-20 April 2007

Table of Contents

Abstract
Introduction
Data Management Steps
    Data Standardization
        Unit Conversion
        Null Values
        Time Stamping and Time Zone
        Depth Reference
        Data Identification
    Data Quality Control
        Range Check
        Gap Filling
        Outlier Removal
        Noise Reduction
        Logical Checks
    Data Quality Reporting
    Data Compression
        Data Reduction
        Data Decimation
    Data Access and Visualization
Conclusions
References

Abstract

The amount of data collected in the information age has grown to a barely manageable level. Currently available technologies can already transmit the readings of any sensor to locations worldwide at high frequencies and with almost no time delay. With this ever-increasing flow of data, the need for criteria to measure and evaluate data quality is more pressing than ever, as this data forms the basis for many critical business decisions. This paper addresses these problems and presents the essential steps of a successful data and quality management strategy:

• Quality control and improvement
• Data quality benchmarking
• Accessibility of controlled data


Simple but very effective signal processing algorithms are presented to ensure that the data lies in the right value range, that outliers are removed, and that missing values are substituted where possible. More complex control instances may not be able to correct the data fully by automation; here the human expert is still required. To minimize the workload for engineers, a processing engine produces smart alarms whenever an automated data correction is not possible. If no correction is possible at all, the questionable portion of the data must be flagged as invalid, or even deleted, to prevent misuse and wrong conclusions. Because this depends on the application, good data management is essential to provide a minimum standard for each individual case. The final step is to publish and present the data at the correct level of detail to the different parts of the corporation. Key to this procedure is that every individual has easy and fast access to the data needed for decision-making. In this context, the appropriate resolution, as well as the most efficient use of data processing time, is of critical interest. A unique solution to browse large volumes of drilling data is presented.

Introduction

The quality of measurement data collected at a rig site (but also during well production) is a much discussed problem. Who has not experienced the frustration related to bad data quality, especially in a critical situation where decisions require high quality data to be readily available? With the industry establishing more and more real-time operating centers, which deliver decision support to rig operations, the need to assure data quality is constantly increasing. Where in the past value was generated predominantly by human inspection of measurement data, today automated analysis and interpretation are available, and they have to be based on good data quality. At this stage, there is no standardized assessment of measurement data service quality, and the auditing of data streams from rigs is not state of the art. The objective of this paper is to investigate problems related to rig sensor data quality (and measurement data in general), to propose a systematic process for measurement data quality control and auditing, and to manage and navigate the resulting huge data sets.

Data Management Steps

The main quality management steps consist of:

• Data Standardization
• Data Quality Control
• Data Quality Reporting
• Data Compression
• Data Access and Visualization


Data Standardization

Data standardization is the first and most crucial step in any automated data analysis procedure. Different problems need to be addressed to enable data processing; five of them are discussed in this section:

• Unit conversion
• Null values
• Time stamping and time zones
• Depth reference
• Data identification

Unit Conversion

In order to use data, the unit of a measurement needs to be known, and before measurements are used it must be assured that they are converted into the right unit. This prevents fatal errors similar to what happened during the last Mars Climate Orbiter mission: [... it missed its intended 140 - 150 km altitude above Mars during orbit insertion, instead entering the Martian atmosphere at about 57 km. The spacecraft would have been destroyed by atmospheric stresses and friction at this low altitude. A review board found that some data was calculated on the ground in imperial units (pound-seconds) and reported that way to the navigation team, who were expecting the data in metric units (Newton-seconds). The craft was unable to convert the two systems of measurement]. Cited from Wikipedia [4] (Mars Climate Orbiter). To prevent such errors, the first step in any data management process must be to convert all units to a predefined standard set.
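As an illustration, the following minimal sketch (Python) converts incoming readings to such a standard set; the channel names, units and conversion factors are hypothetical examples, not taken from the paper.

```python
# Minimal sketch: convert incoming measurements to a predefined standard unit set.
# Channel names, units and factors are illustrative assumptions only.

# conversion factors into the standard unit of each quantity
TO_STANDARD = {
    ("ft", "m"): 0.3048,        # length: feet -> metres
    ("psi", "bar"): 0.0689476,  # pressure: psi -> bar
    ("m", "m"): 1.0,
    ("bar", "bar"): 1.0,
}

# standard unit expected for each (hypothetical) channel
STANDARD_UNIT = {"mdHole": "m", "presSP": "bar"}

def to_standard(channel: str, value: float, unit: str) -> float:
    """Convert a single reading into the channel's standard unit."""
    target = STANDARD_UNIT[channel]
    try:
        factor = TO_STANDARD[(unit, target)]
    except KeyError:
        raise ValueError(f"no conversion defined from {unit} to {target}")
    return value * factor

if __name__ == "__main__":
    print(to_standard("mdHole", 1000.0, "ft"))   # 304.8 m
    print(to_standard("presSP", 2000.0, "psi"))  # ~137.9 bar
```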

Null Values

Definition of a null value: a value used to signify that a valid measurement is not available. According to [3], the null values in WITS are:

• -8888 for bad sensor readings
• -9999 for null values

However, these values are not commonly used in practice. Normally, the two values -999.25 (for floating point numbers) and -999.00 (for integer numbers) are used. In any case, the null value for a data management system needs to be standardized in a similar way to the units, to prevent misinterpretation.
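A minimal sketch of this standardization, mapping the sentinel values named above to one internal null representation, could look as follows; the choice of NaN as the internal null is an assumption.

```python
import math

# Sentinel values used in practice (see text); mapped to one internal null (NaN).
NULL_SENTINELS = {-999.25, -999.00, -8888.0, -9999.0}

def standardize_null(value: float) -> float:
    """Return NaN for any recognized null/bad-reading sentinel, else the value."""
    return math.nan if value in NULL_SENTINELS else value

readings = [12.3, -999.25, 7.1, -8888.0]
print([standardize_null(v) for v in readings])  # [12.3, nan, 7.1, nan]
```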

Time Stamping and Time Zone

With most organizations being active in many parts of the world, the time reference is a very important topic. A convenient way of defining a specific time would be:

2005-03-05 11:45:32.343 CEST (Central European Summer Time)

Along with the date and time, the time zone must be given; otherwise the timestamp is not complete. Such a textual representation is not well suited for processing by computers, so different time stamping systems were introduced. One of them is UNIX time, which is widely used on many computer systems. This time stamping method is not exactly UTC, because UNIX time does not account for leap seconds. Leap seconds are seconds added to UTC on specific dates to correct for the decreasing rotational speed of the Earth. However, for most applications in the computer world, such as the time-stamping of measurement data, this is accurate enough. The UNIX time stamp is the number of seconds that have passed since 1 January 1970, 00:00:00 UTC.
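The following sketch illustrates the conversion of a time-zone-qualified local timestamp to UTC and to a UNIX timestamp using Python's standard library (Python 3.9+); the zone name "Europe/Vienna" is an assumption used only for the example.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# The example timestamp from the text, interpreted in a Central European zone
# ("Europe/Vienna" is an assumption made for this illustration).
local = datetime(2005, 3, 5, 11, 45, 32, 343000, tzinfo=ZoneInfo("Europe/Vienna"))

utc = local.astimezone(ZoneInfo("UTC"))
unix_seconds = local.timestamp()  # seconds since 1970-01-01 00:00:00 UTC (no leap seconds)

print(local.isoformat())   # local time with its UTC offset
print(utc.isoformat())     # the same instant expressed in UTC
print(unix_seconds)        # the corresponding UNIX time stamp
```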

Depth Reference

The depth reference is also needed to define a measurement. In oil well drilling, the rotary kelly bushing (RKB) is widely used as the standard reference for any depth measurement. However, it is seldom reported or specified when measurements are exchanged. As long as the equipment (such as the drilling rig) is not changed during the well construction process, this does not pose a problem. But as soon as the rig leaves the well site once the well is finished, problems can occur if the data management solution does not allow different depth references to be defined over time.

Data Identification

An often forgotten matter is the identification of the data curves, for which it is very important to apply proper naming conventions. Organizations like Energistics [5] (formerly POSC), with standards like WITS and WITSML, provide catalogs to name measurements properly, and these are widely used standards. However, some organizations use their own naming conventions. This is not considered a problem as long as these conventions are applied without exception, but it is then important to pay attention to naming conventions whenever data is imported into the organization.

Data Quality Control

Data quality control consists of the following steps:

• Range Check
• Gap Filling
• Outlier Removal
• Noise Reduction
• Logical Checks

All steps are normally performed automatically. The only exception is when a logical check fails; then the data manager is informed and required to take action. The high degree of automation is critical to managing large amounts of data.

Range Check

A very simple but nevertheless highly effective approach to improve data quality is the range check, which removes unrealistic values from the data. Holland [1] stated that about 80% of the wrong values (in production data, for example flow rates and well head pressure) can be identified with this technique. Although the authors believe that the success of this method for drilling data is not as high as 80%, it is mandatory that it is applied first, before all other processing steps that follow the standardization. The range check prevents disturbing effects that would otherwise occur in the other quality control and processing steps.


To improve the functionality of the range check, a dynamic range can be applied to parameters like the hook load, which increases more or less linearly with the bit depth.
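A sketch of a static range check combined with a depth-dependent (dynamic) upper bound for the hook load is shown below; all limits and coefficients are illustrative assumptions, not values from the paper.

```python
import math

def range_check(value, lo, hi):
    """Replace values outside [lo, hi] with NaN (flagged as invalid)."""
    return value if lo <= value <= hi else math.nan

def hookload_limits(bit_depth_m):
    """Illustrative dynamic range for the hook load: the upper bound grows
    roughly linearly with bit depth (coefficients are assumptions)."""
    base_t = 30.0   # assumed travelling-block + top-drive weight [t]
    per_m = 0.03    # assumed drill-string weight gradient [t/m]
    return 0.0, base_t + per_m * bit_depth_m

# static check, e.g. stand-pipe pressure limited to 0..500 bar (assumed limits)
print(range_check(650.0, 0.0, 500.0))   # nan -> outside the plausible range

# dynamic check for the hook load at 2000 m bit depth
lo, hi = hookload_limits(2000.0)
print(range_check(75.0, lo, hi))        # 75.0 t is within 0..90 t here
```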

Gap Filling

A further quality control step is to fill small gaps in the data. The general problem with measurement data is that the data points are never equally spaced: although the measurements are recorded at a specified frequency, missing data points always occur due to outliers, sensor failures, etc. Gaps are missing data points as well as points with null values. The applied filtering techniques require the data to be equally spaced and continuous, so a quite simple but effective gap filling algorithm was introduced. It works as follows:

• Gap is bigger than the defined gap time: in this case the beginning of the gap will be marked with a null value.

• Gap is smaller than the defined gap time: in this case the gap is filled by interpolation between the two edge points.

In the second case, artificial data is introduced and written to the database. The artificial data covers only a very limited time period, typically set to 10 seconds for 1 Hz surface-measured drilling data, and its introduction prevents interruption of the applied filters every time a minor gap occurs. Unnecessary interruptions would lead to serious drawbacks in event recognition, where not only the total time but also the individual duration of a certain operation is of interest; an example is the key performance indicator average connection time (see Thonhauser [12] and [13]).
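A minimal sketch of this two-case rule, assuming a 1 Hz series of (unix_time, value) samples with integer timestamps and the 10-second threshold mentioned above:

```python
import math

def fill_gaps(samples, max_gap_s=10):
    """samples: list of (unix_time, value), sorted by time, nominally 1 Hz.
    Gaps <= max_gap_s are filled by linear interpolation between the edge
    points; longer gaps get a single null (NaN) marker at their start."""
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        out.append((t0, v0))
        gap = t1 - t0
        if 1 < gap <= max_gap_s:
            for t in range(t0 + 1, t1):                   # interpolate missing seconds
                frac = (t - t0) / gap
                out.append((t, v0 + frac * (v1 - v0)))
        elif gap > max_gap_s:
            out.append((t0 + 1, math.nan))                # mark the beginning of the gap
    out.append(samples[-1])
    return out

data = [(0, 10.0), (1, 11.0), (4, 14.0), (30, 40.0)]
print(fill_gaps(data))
# small gap 1->4 is filled with 12.0 and 13.0; large gap 4->30 is marked with NaN at t=5
```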

Outlier Removal

After the range check and the gap filling, the data may still contain outliers that lie within the plausible range. Figure 1 (a) shows a clean sample data set; Figure 1 (b) shows the same set with some artificial outliers added (red dots mark outliers). The plausible data range on this data could be 0 to 20 units, so the range check has no effect on the introduced outliers. While outliers do not have a big influence on the total duration of recognized operations, they tremendously influence results where the duration of individual events is important; e.g., an outlier in the hook load can prevent the correct detection of the connection duration. To remove outlier data points, several methods were investigated:

• Mean filter
• Median filter
• Conservative smoothing filter
• Wavelet decomposition

These methods have their origin in the field of image processing. Sonka [2] applied mean and median filters to images; Fisher [7] additionally applied the conservative smoothing filter, which is a modification of the median filter. Wavelets have been utilized for many purposes, in this case to identify outliers. Due to conflicting reports in the literature, this method was also investigated: Oberwinkler [8] applied wavelets to production data and reported good success, whereas Olsen [9] stated that there are no robust and accurate methods to remove outliers with wavelet transforms. The different filters were first tested on a sample set of artificial data. Figure 1 shows the clean data (a) as well as the data including the introduced outliers in red (b). Tests of the four filters mentioned have led to the following conclusions (see Figure 2):

• The mean filter is not suited to the removal of outliers at all. The introduced error is very high, as discontinuities are not well preserved. Nevertheless, this filter is commonly applied in data reduction, so the results are displayed for comparison (see Figure 2 (d) and (g)).

• The median filter removes outliers very well, even in the case of multiple occurrences, if the window size is adjusted accordingly (see Figure 2 (e) and (h)). An outstanding property of this non-linear filter is that it is capable of preserving edges very well.

• The conservative smoothing filter works very well with low numbers of outliers, because the data is not changed at all if no outlier is detected. However, if multiple outliers occur, this filter starts to fail (see Figure 2 (f) and (i)).

• Investigations showed that wavelet decomposition is well suited to detecting outliers; however, replacing them is difficult. Tests with linear interpolation resulted in smears at discontinuities, while using the median filter to replace the detected outliers showed better success. However, using the median filter alone yields similarly good results and does not justify applying this rather complicated method. Furthermore, the fine tuning of this method is much more involved than for the median filter, which would increase configuration effort as well as processing time and is not justified by the results.

Figure 3 shows the results of the above mean, median and conservative smoothing filters with a window size of 3x3 pixels on the example of a picture in which ten percent of the pixels were replaced by outliers. As expected, the mean filter blurs the picture strongly and does not cope with the outliers at all. The median filter managed to remove all outliers while blurring the picture much less than the mean filter. The conservative smoothing filter is not able to remove all the outliers. Depending on the expected occurrence and regularity of the outliers (which is not easy to determine at all), the filter of choice is either the median or the conservative smoothing filter. For data with less than 2% outliers, the conservative smoothing filter shows exceptional results with very little impact on the overall quality of the data (blurring). However, because the median filter is very reliable over a much wider range, with a sufficiently small smoothing effect and exceptional edge-preserving behavior, it is recommended for outlier removal.
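A minimal sketch of such a median filter applied to a one-dimensional data channel is given below; the clamped edge handling is an implementation choice, not prescribed by the paper.

```python
def median_filter(values, window=5):
    """Replace each sample by the median of a centred window (odd size).
    Edges are handled by clamping the window to the series bounds."""
    assert window % 2 == 1, "window size must be odd"
    half = window // 2
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        neighbourhood = sorted(values[lo:hi])
        out.append(neighbourhood[len(neighbourhood) // 2])
    return out

# the outlier (1000.0) inside the plausible range is removed,
# while the step at the end of the series is preserved
signal = [10.0, 10.1, 1000.0, 10.2, 10.1, 10.3, 20.0, 20.1, 20.0]
print(median_filter(signal, window=3))
```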

Noise Reduction

The classical signal processing method, the Fourier transformation, has long been used to de-noise seismic signals. However, this method has a major drawback: its basis functions are localized in frequency but not in time. As a result, a change in the frequency spectrum causes changes in the whole signal.


To overcome the deficiencies of this method, wavelet analysis was developed, and several authors have applied this technology to reduce noise in data. Ouyang [10] presented de-noising as one of the most important steps in permanent downhole gauge data analysis. Olsen [9] presented improved wavelet filtering and compression of production data. Uzoechina [11] investigated noise reduction via moving averages in several variations (simple, weighted and exponential). Depending on the application and the type of noise, the appropriate method should be chosen.
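As a simple illustration of the moving-average variants mentioned above, the following sketch implements a simple and an exponential moving average; the window size and smoothing constant are illustrative assumptions.

```python
def simple_moving_average(values, window=5):
    """Mean over a trailing window; simple noise reduction for slow channels."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        chunk = values[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def exponential_moving_average(values, alpha=0.2):
    """EMA: each output mixes the new sample with the previous smoothed value."""
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

noisy = [10.0, 10.4, 9.7, 10.2, 9.9, 10.5, 10.1]
print(simple_moving_average(noisy, window=3))
print(exponential_moving_average(noisy, alpha=0.3))
```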

Logical Checks

Logical checks capture relations between different channels, as well as physical boundaries that exist. The primary function of these checks is not to correct the data automatically, but to raise alarms for the data system manager whenever something is not working as it is supposed to:

• Hole depth check: Since the measured hole depth (mdHole) is a very important parameter for recognizing operations such as drilling, extended quality control is necessary. The check uses the median filter to remove outliers, which could otherwise cause extraordinarily high and incorrect ROP values. Furthermore, the algorithm does not allow the hole depth to decrease unless specifically configured by the user (e.g. when the hole is plugged back and a sidetrack is drilled). This removes many frequent errors, for example when the hole depth has been reset manually on purpose so that the driller can monitor the speed of his reaming or re-drilling activity on the ROP curve.

• Relation between flow and pressure: If flow into the well is present, pressure at the stand pipe must be present as well. Otherwise something is wrong; for example, the pressure sensor could be failing, or the sensor cable may be severed. Misuse of standards might also be a reason for a violation of this relation. In one actual case, the flow rate of the pump output was delivered on the WITSML curve flowInAv instead of the required flow into the well; obviously the pump pressure was zero when only one pump was used to circulate the mud tanks during tripping.

• Relation between bit depth and block position: If the block position changes concurrently with the bit position, it has to be assumed that the tubular assembly is connected to the hook. The two changes have to be of the same order of magnitude. If not (e.g. the delta of the block position is about three times higher), it can be assumed that the wrong input unit is assigned to one of them, e.g. m vs. ft.

• Block speed: The speed of the block (the first derivative of the block position) is physically limited to a certain value. If it exceeds this limit, something is wrong, e.g. the block position values are delivered in ft instead of m. In this case the velocity (assuming the units are converted the wrong way) would be about three times higher than normal, which gives a hint that the wrong units are being delivered.

These are just a few examples, which pertain to the data used for this work. Any kind of domain-dependent logical check can be implemented in the expert rule system to ensure correct data. It is very important that alarms are raised in an intelligent manner to prevent a 'cry wolf' scenario.
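The sketch below illustrates two of the checks listed above, formulated as alarm rules rather than automatic corrections; all thresholds and channel units are illustrative assumptions.

```python
def check_flow_vs_pressure(flow_lpm, sp_press_bar,
                           min_flow=100.0, min_press=5.0):
    """If flow into the well is present, stand-pipe pressure must be present too.
    Thresholds are illustrative assumptions."""
    if flow_lpm > min_flow and sp_press_bar < min_press:
        return "ALARM: flow without stand-pipe pressure (sensor failure or misused channel?)"
    return None

def check_block_speed(pos_block_m, dt_s=1.0, max_speed_mps=2.0):
    """The block cannot physically move faster than a limit; consistent
    violations hint at a unit problem (e.g. ft delivered instead of m)."""
    alarms = []
    for i in range(1, len(pos_block_m)):
        speed = abs(pos_block_m[i] - pos_block_m[i - 1]) / dt_s
        if speed > max_speed_mps:
            alarms.append(f"ALARM at sample {i}: block speed {speed:.1f} m/s exceeds limit")
    return alarms

print(check_flow_vs_pressure(flow_lpm=1800.0, sp_press_bar=0.0))
print(check_block_speed([10.0, 13.5, 17.0, 20.4]))   # ~3.5 m/s steps -> alarms
```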

Data Quality Reporting

The final step of quality control is the generation of standardized quality control reports. Such reports enable the user of the data to assess its quality and to audit the performance of the different data providers. Examples of such reports are depicted in Table 2 and Table 3. It is recommended to generate these reports on a regular basis (e.g. daily QC reports) to verify the data collection service quality of data providers and for service quality assurance.
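A minimal sketch of how a few of the metrics in Table 2 (total, valid, null and gap time) could be computed for one 1 Hz channel; this is an illustration only, not the full report generator.

```python
import math

def qc_metrics(samples, duration_s):
    """samples: list of (unix_time, value) for one channel over a reporting
    window of duration_s seconds, nominally sampled at 1 Hz."""
    null_s = sum(1 for _, v in samples if math.isnan(v))
    valid_s = len(samples) - null_s
    total_data_s = len(samples)
    gap_s = duration_s - total_data_s
    return {
        "Total Data Time [%]": 100.0 * total_data_s / duration_s,
        "Valid Data Time [%]": 100.0 * valid_s / duration_s,
        "Null Value [%]":      100.0 * null_s / duration_s,
        "Gap Time [%]":        100.0 * gap_s / duration_s,
    }

day = 86400
samples = [(t, float(t % 50)) for t in range(64060)]   # 64060 s of data received
samples[100] = (100, math.nan)                         # one null value
print(qc_metrics(samples, day))
```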

Data Compression

With ever increasing amounts of data recorded and stored, the need for data compression is more evident than ever. In the context of this work, two different strategies of data compression are discussed:

• Reduction: decrease of the data resolution with an equidistant time stamp. In this case, a filter (like average or median) is applied over a certain time interval. The amount of data is significantly reduced, but so is the level of detail.

• Decimation: decrease of the amount of data, but with preservation of the details in the data. This strategy leads to non-equidistant time stamps in the data.

Data Reduction

The first possibility, data reduction, is mainly used for display purposes and for applications which do not need (or cannot cope with) high resolution data. For visualization it does not make sense to fetch hundreds of thousands of data points, because a modern computer display has a resolution of about 1600x1200 pixels; displaying more than 1600 values therefore overlays several values onto a single pixel. The developed software can generate reduced data for any desired time interval. By default, resolutions of 10, 60, 600 and 3600 seconds are generated to display data quickly and efficiently, and any other resolution can be generated for applications which cannot cope with high resolution data. Again, the simplest data reduction algorithm is to calculate the average of a time interval; as with the outlier processing, this mean filter does not give the best results. To be able to analyze the nature of the data after reduction, all resolutions are stored with the following statistics (a sketch of this reduction follows the list):

• Minimum value
• Average value
• Maximum value
• Median value
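A minimal sketch of such a reduction, assuming (unix_time, value) samples and an interval length in seconds:

```python
import statistics

def reduce_interval(samples, interval_s=600):
    """Reduce (unix_time, value) samples to one record per interval_s seconds,
    keeping min, average, max and median so the character of the data survives."""
    buckets = {}
    for t, v in samples:
        buckets.setdefault(t // interval_s * interval_s, []).append(v)
    reduced = []
    for start in sorted(buckets):
        vals = buckets[start]
        reduced.append({
            "time": start,
            "min": min(vals),
            "avg": sum(vals) / len(vals),
            "max": max(vals),
            "median": statistics.median(vals),
        })
    return reduced

samples = [(t, (t % 120) / 10.0) for t in range(0, 1800)]   # 30 min of 1 Hz data
print(reduce_interval(samples, interval_s=600))             # three 10-minute records
```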

Data Decimation

Whereas in the previous examples the reduction of data has no big impact on the application, e.g. displaying data curves, it certainly does when high frequency events must be preserved. In this case, data decimation must be applied in a way that yields sufficiently accurate results.


For this purpose, an algorithm was developed which allows the reduction of data by a factor of 1:17 on average, ranging from 1:7 for high frequency data, like rotary torque, up to 1:64 for low frequency data. The upper limit of 1:64 exists only because it was intended to have at least about one data point every minute. If this boundary condition is relaxed to one data point every 10 minutes, the compression rate would be as high as 1:600 (600 seconds in 10 minutes) for very low frequency channels like the hole depth. Table 1 shows the data reduction for each run in absolute numbers of data points and as percent of the original data (run 0). The big advantage of this algorithm is that with simple linear interpolation (like the gap filling algorithm presented above) the original data can be restored with very little error, as shown in the last three rows of Table 1, where the relative errors of the 2nd, 4th and 8th runs are given. The algorithm uses the following features:

• If the middle of 3 successive data points lies on a straight line (within a certain threshold), the middle one is marked for possible deletion if the time between the first and the third point is smaller than 64 seconds.

• If the difference between the linearly interpolated value and its true value is below a second threshold, the point remains marked, if not, the mark is deleted.

• Marked points are deleted from the data.

• The algorithm is iterated until no significant data reduction occurs.

This ensures that high frequency events are not deleted from the data, whereas for channels with very low frequency content many data points are deleted and the compression is very high.
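A minimal sketch of this iterative decimation is given below; the tolerance, the 64-second span limit and the termination criterion are illustrative choices, not the exact parameters of the production algorithm.

```python
def decimate(samples, tol=0.05, max_span_s=64, max_passes=10):
    """samples: list of (time, value), sorted by time.
    Repeatedly remove points that can be restored by linear interpolation
    between their neighbours within tol, until no significant reduction occurs."""
    data = list(samples)
    for _ in range(max_passes):
        keep = [data[0]]
        i = 1
        while i < len(data) - 1:
            (t0, v0), (t1, v1), (t2, v2) = keep[-1], data[i], data[i + 1]
            if t2 - t0 <= max_span_s:
                v_interp = v0 + (v2 - v0) * (t1 - t0) / (t2 - t0)
                if abs(v_interp - v1) <= tol:
                    i += 1                 # middle point can be restored -> drop it
                    continue
            keep.append(data[i])
            i += 1
        keep.append(data[-1])
        if len(keep) >= len(data) * 0.99:  # no significant reduction any more
            return keep
        data = keep
    return data

series = [(t, 10.0) for t in range(0, 200)]
series[100] = (100, 25.0)                  # a short, high-frequency event
out = decimate(series)
print(len(series), "->", len(out))         # the flat portions shrink strongly
print((100, 25.0) in out)                  # the short event is preserved: True
```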

Data Access and Visualization

The successful collection and quality control of a measurement data stream leads to value-added data sets. The next challenge is the graphical display of the sensor data, giving the user the ability to surf large data sets. If sensor data is collected at 1 Hz, 86,400 data points per day are recorded. If a user wanted to scroll through this data set and look at each individual data point, he would have to skip through 54 individual screens (with a resolution of 1600 pixels). To avoid this problem and allow the user to surf into the detail, multi-resolution data sets are used. This way the user can start by looking at lower resolution data, investigating a minimum-maximum range rather than an average only. Using this approach, the user will not miss any peaks that may be relevant for the analysis, and will be able to zoom into details at any point in time. A pure reduction of data, e.g. taking only every tenth or thirtieth data point, may lead to misinterpretation as detail is lost. Furthermore, if the user had to scroll through a 150-day well, he would be faced with navigating blindly through a set of 12.96 million data points. The solution proposed for this problem is to provide navigation using the time versus depth curve of the well as a reference, as shown in Figure 4. The time versus depth curve, a very familiar graphical representation for the drilling engineer, is used to identify time intervals of interest for the user. It is possible to highlight operations of interest (e.g. reaming and washing, trouble times) and navigate straight to these intervals.
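As an illustration of how a viewer might exploit the pre-computed resolutions (10, 60, 600 and 3600 seconds, plus the raw 1 Hz data), the following sketch picks the coarsest resolution that still provides roughly one record per screen pixel; the pixel width and the selection rule are assumptions.

```python
# Pre-computed resolutions available in the store (seconds per record);
# 1 s is the raw data, the others follow the reduction step described earlier.
AVAILABLE_RESOLUTIONS = [1, 10, 60, 600, 3600]

def pick_resolution(window_s: int, pixel_width: int = 1600) -> int:
    """Choose the coarsest resolution that still gives about one record per
    pixel, so the display never fetches far more points than it can show."""
    target_s_per_record = max(1, window_s // pixel_width)
    for res in reversed(AVAILABLE_RESOLUTIONS):
        if res <= target_s_per_record:
            return res
    return AVAILABLE_RESOLUTIONS[0]

print(pick_resolution(86400))        # one day on 1600 px -> use the 10 s records
print(pick_resolution(150 * 86400))  # a 150-day well -> use the 3600 s records
print(pick_resolution(3600))         # one hour -> use the raw 1 s data
```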


Another unique feature of the new concept is its multi-well capability. Figure 5 shows how data curves of two different wells can be visualized in one plot. Multiple wellbores can be navigated independently, which enables fast and easy depth correlation of all displayed wellbores. Drilling measurement data can be studied in a time based context, but also requires a link from the time based view to the depth dimension. Figure 5 shows how the proposed visualization solution enables the user to link time based and depth based navigation; again, the time versus depth curve is used as a reference. The user can display depth based and time based measurement data at the same time and surf through the well.

Conclusions

This paper shows the importance of automated methods to quality control rig sensor data prior to analysis and decision making. The key elements of the presented concept can be summarized as follows:

• A step-by-step approach has been introduced to quality control measurement data in an automated way.

• The quality control steps are auditable and reproducible.

• Quality control can be performed in real time, which is the basis for real-time decision making.

• Data quality problems are identified, flagged, and corrected automatically where possible (raw data is stored along with quality controlled data).

• Standardized quality control reports are generated, allowing the quality of data delivered by various data providers to be assessed and audited.

• Significant time savings can be achieved compared to manual quality control.

• The user community will build trust in the data, as quality control measures are flagged and quality controlled data is stored permanently for their use.

• A visualization concept has been introduced which allows the surfing of time and depth based data with a unique navigation concept.

References

1. Holland, J., Oberwinkler, C., Huber, M., Zangl, G.: Utilizing the Value of Continuously Measured Data, SPE 90404, presented at the SPE Annual Technical Conference and Exhibition, Houston, Texas, USA, 26-29 September 2004.

2. Sonka, M., Hlavac, V., Boyle, R.: Image Processing, Analysis, and Machine Vision, Thomson Learning, second edition, ISBN 0-534-95393-X, November 1998.

3. API Publication 3855: Conventions and Implementation Guidelines for EDI - Wellsite Information Transfer Specification (WITS), Product Number 038550, July 1991.

4. Wikipedia, the free encyclopedia, http://www.wikipedia.org/, February 2007.

5. Energistics, http://www.energistics.org, 15.01.2007.

6. Smith, A.H.: The Economic Advantages of Managing Data, ONCE!, SPE 78337, presented at the SPE 13th European Petroleum Conference, Aberdeen, Scotland, U.K., 29-31 October 2002.

7. Fisher, B., Perkins, S., Walker, A., Wolfart, E.: HyperMedia Image Processing Reference, Department of Artificial Intelligence, University of Edinburgh, http://www.cee.hw.ac.uk/hipr/html/hipr_top.html.

8. Oberwinkler, C., Stundner, M.: From Real-Time Data to Production Optimization, SPE 87008, presented at the SPE Asia Pacific Conference on Integrated Modelling for Asset Management, Kuala Lumpur, Malaysia, 29-30 March 2004.

9. Olsen, S., Nordtvedt, J.-E.: Improved Wavelet Filtering and Compression of Production Data, SPE 98600, presented at Offshore Europe, Aberdeen, Scotland, UK, 6-9 September 2005.

10. Ouyang, B.-L., Kikani, J.: Improving Permanent Downhole Gauge (PDG) Data Processing via Wavelet Analysis, SPE 78290, presented at the SPE 13th European Petroleum Conference, Aberdeen, Scotland, UK, 29-31 October 2002.

11. Uzoechina, F.: Analysis of Real-Time Production Data Using Wavelet Decomposition, Diploma Thesis, University of Leoben, June 2004.

12. Thonhauser, G., Wallnoefer, G., Mathis, W., Ettl, J.: Use of Real-Time Rig Sensor Data to Improve Daily Drilling Reporting, Benchmarking and Planning - A Case Study, SPE 99880, presented at the SPE Intelligent Energy Conference and Exhibition, Amsterdam, The Netherlands, 11-13 April 2006.

13. Thonhauser, G., Mathis, W.: Automated Reporting Using Rig Sensor Data Enables Superior Drilling Project Management, SPE 103211, presented at the SPE Annual Technical Conference and Exhibition, San Antonio, Texas, U.S.A., 24-27 September 2006.

Mastering Real-Time Data Quality Control 11 Wolfgang Mathis, Gerhard Thonhauser

Page 12: Mastering Real-Time Data Quality Control – How to Measure ......• Accessibility of controlled data Mastering Real-Time Data Quality Control 1 Wolfgang Mathis, Gerhard Thonhauser

Number of data points per run:

Run  spm1   flowinav  rpm    hkldav  tqav   posblock  mdhole  average
0    21600  21600     21600  21600   21600  21600     21600   21600
1    11100  11070     11287  10806   11379  10806     10801   11036
2    5957   5807      6133   5422    6453   5423      5402    5800
3    3526   3182      3562   2741    4612   2739      2702    3295
4    2373   1896      2291   1413    3875   1410      1352    2087
5    1811   1291      1682   756     3533   735       677     1498
6    1541   1008      1409   441     3351   405       340     1214
7    1522   997       1393   440     3281   403       340     1197
8    1497   987       1386   439     3241   403       340     1185
9    1493   987       1381   439     3234   403       340     1182

Number of data points as percent of the original data:

Run  spm1     flowinav  rpm      hkldav   tqav     posblock  mdhole   average
0    100.00%  100.00%   100.00%  100.00%  100.00%  100.00%   100.00%  100.00%
1    51.39%   51.25%    52.26%   50.03%   52.68%   50.03%    50.01%   51.09%
2    27.58%   26.88%    28.39%   25.10%   29.88%   25.11%    25.01%   26.85%
3    16.32%   14.73%    16.49%   12.69%   21.35%   12.68%    12.51%   15.25%
4    10.99%   8.78%     10.61%   6.54%    17.94%   6.53%     6.26%    9.66%
5    8.38%    5.98%     7.79%    3.50%    16.36%   3.40%     3.13%    6.93%
6    7.13%    4.67%     6.52%    2.04%    15.51%   1.88%     1.57%    5.62%
7    7.05%    4.62%     6.45%    2.04%    15.19%   1.87%     1.57%    5.54%
8    6.93%    4.57%     6.42%    2.03%    15.01%   1.87%     1.57%    5.48%
9    6.91%    4.57%     6.39%    2.03%    14.97%   1.87%     1.57%    5.47%

Error after restoring by linear interpolation:

Run  spm1     flowinav  rpm      hkldav   tqav     posblock  mdhole
2    6.9E-04  5.2E-06   5.0E-02  2.4E+00  6.1E-01  4.4E-04   3.0E-04
4    2.1E-03  2.7E-05   2.5E-01  4.6E+01  3.2E+00  3.0E-03   2.2E-03
8    2.6E-03  4.6E-05   4.7E-01  1.2E+02  3.7E+00  8.9E-03   7.0E-03

Error [%] after restoring by linear interpolation:

Run  spm1     flowinav  rpm      hkldav   tqav     posblock  mdhole   average
2    0.0189%  0.0158%   0.0139%  0.0022%  0.0288%  0.0013%   0.0000%  0.0116%
4    0.0584%  0.0812%   0.0700%  0.0438%  0.1491%  0.0090%   0.0001%  0.0588%
8    0.0717%  0.1393%   0.1298%  0.1128%  0.1761%  0.0266%   0.0002%  0.0938%

Table 1: Data Compression Performance


Report period: 2005-02-16 05:00:00 to 2005-02-17 05:00:00

                               hkldav   mdbit    mdhole   posblock  prespumpav
                               [t]      [m]      [m]      [m]       [bar]
1 [s]                          56654    56654    56654    56654     56654
2 [s]                          2765     2765     2765     2765      2765
3 [s]                          450      450      450      450       450
4 [s]                          45       45       45       45        45
5 [s]                          34       34       34       34        34
6 [s]                          12       12       12       12        12
7 [s]                          5        5        5        5         5
8 [s]                          4        4        4        4         4
9 [s]                          3        3        3        3         3
10 [s]                         1        1        1        1         1
Sum of Data [#]                59973    59973    59973    59973     59973
Required Data [#]              86400    86400    86400    86400     86400
Duration [s]                   86400    86400    86400    86400     86400
Min Time Difference [s]        1        1        1        1         1
Max Time Difference [s]        20544    20544    20544    20544     20544
Median Time Difference [s]     1        1        1        1         1
Correct Frequency Time [s]     56654    56654    56654    56654     56654
Different Frequency Time [s]   7406     7406     7406     7406      7406
Correct Frequency Time [%]     65.57%   65.57%   65.57%   65.57%    65.57%
Different Frequency Time [%]   8.57%    8.57%    8.57%    8.57%     8.57%
Total Data Time [s]            64060    64060    64060    64060     64060
Valid Data Time [s]            63774    63666    63557    63338     63754
Null Values [s]                286      394      503      722       306
Gap Time [s]                   22340    22340    22340    22340     22340
Total Data Time [%]            74.14%   74.14%   74.14%   74.14%    74.14%
Valid Data Time [%]            73.81%   73.69%   73.56%   73.31%    73.79%
Null Value [%]                 0.33%    0.46%    0.58%    0.84%     0.35%
Gap Time [%]                   25.86%   25.86%   25.86%   25.86%    25.86%
Min. Check Value               0        0        0        0         0
Max. Check Value               500      10000    10000    50        500
Below Min. Check Value [#]     0        0        0        2342      0
Above Max. Check Value [#]     0        0        0        0         0
Min Value                      7.75     20.00    574.01   -2.56     95.53
Mean Value                     25.97    397.73   649.80   14.87     43.34
Max Value                      55.53    924.93   924.93   40.00     134.75

Table 2: Quality Control Report - Data Details

Nr.  Start [yyyy-mm-dd hh:mm:ss]   End [yyyy-mm-dd hh:mm:ss]   Duration [s]
1    2005-02-16 06:50:33           2005-02-16 07:00:43         610
2    2005-02-16 07:53:55           2005-02-16 08:01:26         451
3    2005-02-16 08:38:53           2005-02-16 08:39:31         38
4    2005-02-16 08:39:31           2005-02-16 08:39:50         19
5    2005-02-16 08:40:11           2005-02-16 08:40:39         28
6    2005-02-16 08:41:20           2005-02-16 08:41:32         12
7    2005-02-16 08:41:38           2005-02-16 08:42:03         25
8    2005-02-16 08:42:46           2005-02-16 08:43:02         16
9    2005-02-16 08:45:18           2005-02-16 08:45:30         12
10   2005-02-16 13:37:55           2005-02-16 13:38:13         18
11   2005-02-16 14:31:39           2005-02-16 14:33:34         115
12   2005-02-16 14:48:33           2005-02-16 14:48:57         24
13   2005-02-16 22:38:19           2005-02-16 22:40:16         117
14   2005-02-16 22:40:41           2005-02-16 22:42:36         115
15   2005-02-16 22:43:31           2005-02-16 22:43:50         19
16   2005-02-16 22:47:12           2005-02-16 22:48:36         84
17   2005-02-16 23:03:17           2005-02-16 23:04:50         93
18   2005-02-16 23:17:36           2005-02-17 05:00:00         20544

Table 3: Quality Control Report - Gap List


[Figure 1: Outlier within plausible Data Range. Panels: (a) original data, (b) original data with artificial outliers.]

[Figure 2: Application of Different Filters to Artificial Data. Panels: (a)-(c) original data with outliers, (d) mean filter (window=3), (e) median filter (window=3), (f) conservative smoothing (window=3), (g) mean filter (window=5), (h) median filter (window=5), (i) conservative smoothing (window=5).]

[Figure 3: Filter Performance Example. Panels: original picture, picture with 10% outliers, mean filter 3x3, median filter 3x3, conservative smoothing filter 3x3.]


[Figure 4: TxD Based Data Navigation - Time and Depth View. A time versus depth controller links a depth based view (Curves 1-6 over a Depth Index [m] of about 1500-2400 m) with a time based view (Curves 1-4 over a Time Index [Date] from 2005-03-04 to 2005-03-06).]

[Figure 5: TxD Based Data Navigation - Multi Well Depth View. Depth based views of two wells (Curves 1-2 over Depth Index [m] ranges of about 1500-2400 m and 1700-2600 m) are displayed side by side and navigated via the time versus depth controller.]
