Time Series Data Concepts

  • 8/9/2019 Time Series Data Concepts

    1/52

    RFCorsello Research Foundation

    Time Series Data Concepts

    Introduction

    Sensors and other continual monitoring data collection efforts are used in many fields

    These forms of data collection have a common underlying premise

    A fixed set of data fields collected at regular time intervals over a longer-term period

    This is the essence of a time series

    How time series data is collected and used has a direct influence on the storage methodology that should be used

    Time series data is a form of temporal data that is managed as a set

    It is the uniformity of the collection that enables and favors specific treatment and management

    Temporal Concepts

    Time is an intrinsic concept familiar to us all

    It marks the when of all events

    All events may be marked by when they occur

    All measurements are collected in time

    A temperature (say 15 °C) is a value measured at a point in time (and space)

    Measurements are taken at the time that is "now" when the measurement occurs

    At any point in time after the measurement was recorded, it can be referred to by the time of the measurement

    This basic concept implies that all data is temporal in nature

    The term temporal in this construct means with respect to or pertaining to time, where our key term is data

    Temporal data is any data that has value measured with respect to time

    Temporal data is about bounding the validity or relevance in time

    If a river is measured to be 15 °C, that measurement is only valid for the time the measurement was taken

    For any data measurement:

    the value measured (15 °C) is non-temporal

    15 °C is a value; it is the thing measured, a river, which

    For certain very specific applications, a measurement may have no variance over time and is therefore temporally static

    This does not imply that the data is not temporal; the temporal validity of the measurement is equivalent to the life of the item measured

    These are two distinct concepts: temporality of the measurement and temporality of the item measured

    Time Series

    A time series is defined as a fixed structure of data collected repeatedly over time at fixed intervals

    This definition is very broad and as such allows for variability in many areas

    Time Domain

    A single time series data set will have a time domain marking the start and end of the time series

    For continual monitoring scenarios, the end may be thought of as both now and the end of time

    Since the data is a time series, now represents the current last record

    Time Interval

    For any time series, there is a fixed interval between value points

    For example, every five minutes is the interval for a time series of data collected at five minute intervals

    It is this exact concept that permits a time series to store only the data and not the time value each measurement is for

    A time series only stores two actual times

    Start date/time

    End date/time

    The time series stores a single interval value that is the return period or sampling interval separating discrete readings

    Five minutes in our example
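
    As a minimal sketch of this idea, the time of any reading can be recomputed from the stored start time, the interval and the position of the reading. The function name timestamp_at and the sample values are illustrative assumptions, not part of the original slides.

```python
from datetime import datetime, timedelta


def timestamp_at(start: datetime, interval: timedelta, index: int) -> datetime:
    """Return the implied time of the value at a zero-based position."""
    return start + index * interval


# Hypothetical five-minute series: only start, interval and values are stored.
start = datetime(2019, 8, 9, 5, 0)
interval = timedelta(minutes=5)
values = [15.0, 15.2, 15.1, 14.9]

# The end time follows from the start, the interval and the number of values.
end = timestamp_at(start, interval, len(values) - 1)
print(end)
```

    No per-reading timestamp is stored in this sketch; every time is derived on demand.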

    Measurement Interval

    An important related concept of time series data is the actual measurement interval

    If a measurement is taken every five minutes, what is the collection method for the measurement?

    If a temperature measurement is recorded every five minutes on the 0 and 5 (e.g. 5:00)

    Is the measure:

    An instantaneous temperature

    An average temperature from the previous time interval

    An average of a split time (5:00 recorded, sampled from 4:57:30-5:02:30)

    This information is not part of the time series itself, but is instead metadata about the time series

    An important concept here is that for continual monitoring time series, changes over time may lead to measuring with different approaches

    In the case of different measurement intervals, the time series should be split for consistency

    Interval Examples

    Relationship to Temporal Data

    Time series data is a special case of temporal data

    A time series is temporal in that each measurement within the time series may be treated as a single temporal measurement

    The fixed interval of measures makes the treatment of the data special, whereas the data itself is not special in any way

    A single time series may have thousands or millions of individual measurements, each separated by fixed intervals

    If a time series were to have only a single measurement (the degenerate case), it would be a temporal measure

    Any collection of temporal measures that have the property of being evenly spaced in time may be treated as a time series

    It is possible to construct a time series from non-evenly spaced data via an interpolation

    It is common to abstract detailed measures (such as hourly temperatures at uneven intervals) into more abstract time series such as daily, weekly or monthly means
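
    One possible way to build an evenly spaced series from uneven readings is linear interpolation. The sketch below is only an illustration under stated assumptions: times are plain numbers (minutes), the requested grid lies inside the range of the readings, and the function name resample_linear is hypothetical. The slides do not say which interpolation method is intended.

```python
def resample_linear(times, values, start, interval, count):
    """Linearly interpolate uneven (times, values) onto a regular grid.

    Assumes `times` is sorted, has at least two entries, and that the grid
    start + k * interval (k = 0..count-1) falls within the range of `times`.
    """
    out = []
    j = 0
    for k in range(count):
        t = start + k * interval
        # Advance to the pair of readings that brackets grid time t.
        while j + 2 < len(times) and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


# Hypothetical readings taken at uneven minutes 0, 4, 11 and 15.
regular = resample_linear([0, 4, 11, 15], [10.0, 14.0, 21.0, 25.0], 0, 5, 4)
print(regular)
```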

    Time Series and Temporal Representation

    Collection

    Time series data may be collected in any of a number of ways

    A simulation or application may generate a time series directly

    A single run of an application generates a full time series at once

    An application may also append to a time series each time it runs

    In the latter case, it is critical the application is consistent in each run to maintain the integrity of time series offsets

    It is often desirable to know which run produced which part of the time series

    In the collection of time series data from sensors or manual entry:

    Each subsequent round of collection is conceptually separate from the previous round of collection

    In the case of a field deployed sensor (non-telemetry)

    Each time the sensor is changed out or data is downloaded there is a new time series created for that batch of data

    This is critical in that each deployment of a sensor may overlap slightly, may have short gaps, or may be skewed (every five minutes, but on the 1s and 6s)

    Collection Example

    Virtual Time Series

    The concept of multiple time series collections that align with each other establishes a need for a virtual time series

    This virtual time series is the defined global time series for a collection definition (fields, interval and domain)

    Composed of individual physical time series that each contain the data records for a collection effort
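
    A rough sketch of composing a virtual time series from physical ones follows. It assumes numeric times, a shared interval, that slightly skewed deployments are snapped to the nearest slot, that gaps are filled with NaN, and that later deployments win where they overlap. These policies and the name build_virtual are assumptions for illustration only; the slides do not prescribe them.

```python
import math


def build_virtual(start, interval, count, physical):
    """Overlay physical series onto one virtual series of `count` slots.

    `physical` is a list of (series_start, values) pairs in collection order.
    """
    virtual = [math.nan] * count  # gaps stay NaN
    for series_start, values in physical:
        # Snap the deployment to the nearest slot of the virtual series.
        offset = round((series_start - start) / interval)
        for i, value in enumerate(values):
            position = offset + i
            if 0 <= position < count:
                virtual[position] = value  # later deployments overwrite overlaps
    return virtual


# Two hypothetical deployments with a two-slot gap between them.
virtual = build_virtual(0, 5, 6, [(0, [1.0, 2.0]), (20, [5.0, 6.0])])
print(virtual)
```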

    Time Series Use

    The long-term purpose of time series is no different than that of any data

    How time series data is used will influence the approach used for storage to ensure adequate performance and storage volumes are available to handle the demand

    It is the nature of how time series data is used that most influences its special treatment

    In many cases a time series is used as a whole (the entire series) rather than as individual measures

    Without such a directed form of use, the notion of a time series would be irrelevant as a separate entity from the more general temporal data

    It is the cost of storage and transmission, which can greatly affect the performance of applications using time series data, that suggests the special treatment of time series to reduce size and increase access performance

    Methodologies of

    Random Extraction

    The most basic form of use for a time series is that of random extraction

    A user needs data from a time series based upon a set of criteria known by the user at extraction time (not planned or expected at data collection)

    This is one of the most common scenarios for any data use and has large implications in storage format

    For random extraction, a user may request all records where temperature is over 32

    This form of access results in a search over the time series to extract the individual elements matching the criteria provided
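
    A minimal sketch of random extraction over a sequentially stored series is shown here. The function name random_extract, the numeric times and the sample temperatures are assumptions for illustration. Because the criterion is only known at extraction time, every value has to be visited.

```python
def random_extract(start, interval, values, predicate):
    """Scan the whole series and return (time, value) pairs that match."""
    return [
        (start + i * interval, value)
        for i, value in enumerate(values)
        if predicate(value)
    ]


# Hypothetical temperatures every 5 minutes; request all records over 32.
matches = random_extract(0, 5, [30.0, 33.5, 31.0, 35.0], lambda v: v > 32)
print(matches)
```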

    Temporal Extraction

    The easiest form of extraction from a time series is temporal extraction

    The user wants a portion of the time series between two dates

    This results in a new time series being returned that is bounded by the most constrained limits between the user defined limits and the time series internal limits

    Such as requesting an extraction starting prior to the start of the time series itself
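
    The bounding described above might be sketched as follows, assuming numeric times and a sequentially stored series. The name temporal_extract and the choice to return None for an empty result are illustrative assumptions.

```python
import math


def temporal_extract(start, interval, values, req_start, req_end):
    """Return (new_start, values) for readings within [req_start, req_end]."""
    series_end = start + (len(values) - 1) * interval
    # Use the most constrained limits of the request and the series itself.
    low = max(req_start, start)
    high = min(req_end, series_end)
    if low > high:
        return None, []
    first = math.ceil((low - start) / interval)
    last = math.floor((high - start) / interval)
    if first > last:
        return None, []
    return start + first * interval, values[first:last + 1]


series = [10, 11, 12, 13, 14]  # hypothetical readings at times 0, 5, 10, 15, 20
print(temporal_extract(0, 5, series, -10, 12))
print(temporal_extract(0, 5, series, 7, 100))
```

    The first request starts before the series does, so the result is clipped to the start of the series.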

    Complete Delivery

    The best case use scenario for a time series is complete delivery

    Notice this is not an extraction, in that the entire data set is delivered as a whole

    No processing is required beyond integrating physical time series into the virtual record

    Enumeration

    Once delivered a user will generally walk through the data in some manner toward a goal

    For example, to compute the average of a time series a full forward scrolling read is performed to sum all values in the time series

    This is a complete linear access from start to finish

    Linear and Partial Access

    Linear Access

    Linear or sequential access is the direct reading of the time series in the order of the data

    Linear access has no special requirements and is one common access scenario

    Partial Access

    Only a portion of the data may need to be reviewed

    The access will only need to visit a portion of the data points within the time series

    Random Access

    The user may need to access any point within the time series at any time

    The user must be able to move within the time series at will

    Random access is the most complex form of access for any data structure, and is commonly required

    One common example of random access is for sorting

    If a user wanted to sort a time series by temperature rather than by time, the sort would use both linear access to enumerate and random access to read specific items

    More significantly, random access allows for access by data field, such as temperature (e.g. get record for temperature = 26)

    This form of random access is closely related to random extraction and has significant impacts for performance

    Index or Ordinal

    Index or ordinal access to a time series is access by time offset or offset into the time series by position (e.g. the 26th data point in the series)

    Index access is closely related to random access

    It is in fact a mechanism for random access without the performance issues of other forms of random access

    In general, index access is the only form of random access without significant performance costs

    Still has implications for large volume time series
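
    As a sketch of why index access is cheap, a requested time can be converted to a position with simple arithmetic instead of a search. The function name index_of and the sample times are assumptions for illustration.

```python
from datetime import datetime, timedelta


def index_of(start: datetime, interval: timedelta, when: datetime) -> int:
    """Return the zero-based position of the value recorded at `when`."""
    steps, remainder = divmod(when - start, interval)
    if steps < 0 or remainder != timedelta(0):
        raise ValueError("time does not fall on a stored interval")
    return steps


# Hypothetical five-minute series starting at 5:00.
position = index_of(
    datetime(2019, 8, 9, 5, 0),
    timedelta(minutes=5),
    datetime(2019, 8, 9, 7, 5),
)
print(position)  # zero-based 25, i.e. the 26th data point
```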

    Storage: Placing the Bytes on Disk

    Storage

    There are many well-defined storage formats for dealing with the storage and transport of time series data such as:

    CDF (Common Data Format)

    NetCDF (Network Common Data Format)

    There are many databases and applications that have support for time series data such as

    Aquarius

    Historis

    Temporal Analyst

    GrADS

    Timescape XDB

    HEC-DSS

    There is a common thread across all time series formats

    A time series is a set of data delimited in time by a fixed interval with a fixed start date (our general definition)

    In specific implementations, there may be constraints on the data stored in a single time series (the fields) or on the maximum size of the time series when stored (Aquarius for the underlying database)

    When planning time series storage, considerations must be made for the collection and use of the data to be stored to ensure adequate capacity and performance

    Each type of data to be stored in a time series (the field set) will require a dedicated time series store

    For example, a water quality time series cannot store sediment data (there are different fields)

    A water/sediment time series may be created that stores both together as a single entity

    Storage Mechanisms

    A time series may be stored:

    In a relational database management system (RDBMS)

    In flat files

    As XML

    The selection of storage location (e.g. flat file or RDBMS) will influence how the data within that location is structured

    For example, in an RDBMS, each time series could be stored as:

    A dedicated table

    A set of rows in a shared table

    A single row in a shared table

    Field Storage

    An important aspect of the time series is the fields within the series

    If a time series stores only a single parameter (such as temperature), the time series storage is relatively trivial

    If the time series stores a complex data structure, the storage of the time series will be equally complex

    Storage Basics

    For storage on a computer:

    Data must be reduced into bytes that are written to and read from disk

    Even in an RDBMS, the same is true

    In any programming language or RDBMS, there is a set of specific data types that are well known and can be directly converted between bytes and the data type (such as a 32-bit integer or text string)

    Each language and database understands a different way of converting between bytes and data types:

    A 32-bit integer in Java does not represent the same byte pattern as a 32-bit integer in Visual Basic

    The conversion of a data type to bytes is called serialization and the reverse is called deserialization

    This is an ongoing issue in computer science and affects all computing applications

    As long as there is a single platform performing all operations across the lifecycle, there is no measurable issue

    The most consistent format across all platforms is text, which is a powerful indicator of why XML has been so successful, as everything is represented as text in XML

    The comparison of data (such as during search) requires the processing software to understand the data stored

    Due to this fundamental concept, the storage format used should be aligned with the ultimate patterns of use and limitations of the platforms (for example maximum allowed field lengths in an RDBMS)
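
    Byte order is one common reason the same 32-bit integer can produce different byte patterns on different platforms. The sketch below uses Python's standard struct module to illustrate this; it is not a statement about how any particular language in the slides serializes its integers.

```python
import struct

value = 1

# Serialization: the same 32-bit integer in two byte orders.
big_endian = struct.pack(">i", value)
little_endian = struct.pack("<i", value)

# Deserialization only works if the reader assumes the writer's byte order.
round_trip = struct.unpack(">i", big_endian)[0]
misread = struct.unpack("<i", big_endian)[0]

# A text representation avoids the byte-order question entirely.
as_text = str(value).encode("ascii")

print(big_endian, little_endian, round_trip, misread, as_text)
```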

    Storage Considerations

    It is critical that storage designers consider:

    Volume (size)

    Access speed (read and write)

    General performance

    If most access will enumerate a data set, the selected storage mechanism should favor that form of access

    If random access is still needed, then no optimizations should be used for enumerations that make random access unusable

    This is always a trade-off and must be evaluated on a case-by-case basis

    Time Series Field Concepts

    Each time series may have multiple fields of data collected

    Each time series may have different fields collected than another time series

    Given both of these premises, the design of the data fields within a time series may be of considerable importance

    Time series data may be stored in any number of ways using various technologies

    In each of these technologies, the time series and the data values are related and may be treated differently based upon the specific technology used

    Single Field Time Series

    This form of time series has a single value collected at each time interval

    This form of time series may be thought of and treated as a basic value stream of discrete values for the single field at the fixed interval of the time series

    The field and storage design for this type of time series only needs to deal with the most primitive anomaly:

    Missing data values

    Within any time series it must be expected that some individual value points may be corrupt and therefore are missing from the series

    In any time series that uses IEEE 754 compliant single (32-bit) or double (64-bit) precision floating point numbers, there is a built-in not a number (NaN) value

    In this case, no special handling is required for the time series except to expect that NaN values may be present anywhere within the value stream

    If a single field time series is storing data in another format, such as integer or string values, accommodations must be made for the absence of value within the value stream
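
    A minimal sketch of enumerating a value stream that may contain NaN values is shown below: the mean is computed with a single forward pass that skips missing values. The function name and the sample data are illustrative assumptions.

```python
import math


def mean_ignoring_missing(values):
    """Average a value stream in one forward pass, skipping NaN entries."""
    total = 0.0
    present = 0
    for value in values:
        if not math.isnan(value):
            total += value
            present += 1
    # With no usable values the mean is itself reported as missing.
    return total / present if present else math.nan


# Hypothetical stream with one missing reading.
average = mean_ignoring_missing([15.0, math.nan, 16.0, 17.0])
print(average)
```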

    For the design of single field time series data, there are two basic approaches:

    Time coupled

    Sequential

    A time coupled single value series will associate each record within the time series as the (T,V) pair of time (T) and value (V)

    This set of pairs becomes the time series

    A sequential single value series will provide all records within the time series as a stream of values with only a single time stored indicating the start of the series and a single interval indicating the temporal spacing of the values within the series

    In this manner, the time series may be thought of simply as an array of values
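
    The two approaches can be sketched as two representations of the same data, with conversions between them. The function names and the numeric times are assumptions for illustration.

```python
def to_time_coupled(start, interval, values):
    """Expand a sequential series into explicit (T, V) pairs."""
    return [(start + i * interval, value) for i, value in enumerate(values)]


def to_sequential(pairs):
    """Collapse evenly spaced (T, V) pairs into (start, interval, values)."""
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to infer the interval")
    times = [t for t, _ in pairs]
    interval = times[1] - times[0]
    for earlier, later in zip(times, times[1:]):
        if later - earlier != interval:
            raise ValueError("pairs are not evenly spaced")
    return times[0], interval, [v for _, v in pairs]


# Hypothetical series: start at minute 0, one value every 5 minutes.
pairs = to_time_coupled(0, 5, [15.0, 15.2, 15.1])
print(pairs)
print(to_sequential(pairs))
```

    The sequential form stores one start and one interval for the whole series, while the time coupled form repeats a time for every value.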

    Multiple Value Time Series

    Each temporal record within the time series has a set of multiple fields

    Based on the definition of a time series, all records have exactly the same set of fields within a single time series

    Each time series defines its own set of fields, and therefore may result in arbitrarily many time series field sets within an organization's corpus of time series data

    The pattern for storing multiple field time series data can take several forms

    The most basic form is to treat each field within the time series as a distinct single field time series

    This approach isolates each data field as a distinct time series and provides the ability to distribute the storage of each time series to different storage locations

    There is however the overhead of additional storage for the time series metadata

    Hub and Spoke Model

    A basic expansion of the single field time series pattern for multiple fields is to create a hub and spoke or star pattern for the time series

    The core time series metadata is recorded as a single entity, with each field modeled as a discrete time series data value stream

    Field Interleaved

    A time series is stored as a series of value streams

    Each value stream is complete for the time series, containing all values for a single field

    This model most closely resembles the result of the hub and spoke model where each parameter is isolated as a series

    The total time series has one value stream per field that can be easily enumerated

    If values of multiple fields must be accessed together, there is additional overhead for enumerating multiple streams

    Field Interleaved Example

    Interval Interleaved

    The fields are stored in order within each temporal interval

    This permits each temporal interval to be the primary unit of separation between each data record

    Within a single temporal record, the fields are consecutive in a predefined order

    Interval interleaved storage provides rapid enumeration of the time series when all fields are used in the enumeration

    If it is most common to enumerate the time series to access only a single parameter, there is overhead in the transport and skipping of unused fields to access the required field
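
    The difference between field interleaved and interval interleaved layouts can be sketched with plain lists. The field choice (temperature and pH), the sample values and the function names are assumptions for illustration.

```python
def to_field_interleaved(records):
    """One complete value stream per field."""
    return [list(stream) for stream in zip(*records)]


def to_interval_interleaved(streams):
    """One record per time interval, fields in a fixed order."""
    return [tuple(record) for record in zip(*streams)]


# Interval interleaved: each tuple is (temperature, pH) for one interval.
records = [(15.0, 7.1), (15.2, 7.0), (15.1, 7.2)]

# Field interleaved: all temperatures, then all pH values.
streams = to_field_interleaved(records)
print(streams)

# Reading a single field touches one stream in the field layout,
# but every record in the interval layout.
temperatures_from_streams = streams[0]
temperatures_from_records = [record[0] for record in records]
```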

    Interval Interleaved Example

    Coupled Interleaved

    For any time series where general enumeration involves specific known groups of fields, a hybrid of field and interval interleaving may be used

    Allows for groups of fields to be represented as field interleaved with the remainder of the dataset interval interleaved

    Provides fast enumeration for the coupled fields while avoiding the cost of skipping unused fields

    If the coupling of fields is not known at design time, this representation is difficult to plan for

    Use of this pattern has the overhead of both interleaving methods if enumerating uncoupled fields (e.g. field 1 and field 5 in example)

    Coupled Interleaved Example

    RDBMS Storage Patterns

    RDBMS Storage

    Time series data can be stored in a number of ways within an RDBMS

    Time series data may be stored as temporal records, one value per record

    Likewise, time series data can be compacted into a single field and stored as a binary object (BLOB) or XML

    Flat Temporal

    In the flat temporal model of storing time series data, there is no notion of a time series

    All data is simply stored as temporal records

    This is the simplest method of storing temporal data overall

    Provides good performance for random access

    Suffers from poor insert performance (mainly when indexed)

    Relatively slow overall sequential access performance due to the table scan nature of retrieval

    Flat Time Series

    Each time series is registered in a time series table that defines the time series reference information (metadata)

    All the actual data for the time series is stored in a values table

    Each record in the values table stores a single time series record (point in time)

    In most cases, each time series will have a different set of fields and will therefore be best represented by a separate values table

    Results in a single master time series table and multiple values tables

    Flat Time Series Example

    Provides similar performance characteristics to the flat temporal model

    Time series table allows for retrieval based upon a specific time series instance

    Allows for a long-term time series (such as continual monitoring) to be identified in a single values table

    If random access to data is the most common, this model will yield the best overall performance characteristics and allow for query by values with no special software capabilities utilized
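
    A small sketch of the flat time series model using Python's built-in sqlite3 module follows. The table names, column names and sample rows are hypothetical; the slides do not define a schema. It shows a master time series table, one values table, and a query by value using plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE time_series (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        start_time TEXT NOT NULL,
        interval_seconds INTEGER NOT NULL
    );
    CREATE TABLE water_temperature_values (
        series_id INTEGER NOT NULL REFERENCES time_series(id),
        measured_at TEXT NOT NULL,
        temperature REAL,
        PRIMARY KEY (series_id, measured_at)
    );
    """
)
# Register the series (metadata), then store one row per point in time.
conn.execute(
    "INSERT INTO time_series VALUES (1, 'river temperature', '2019-08-09T05:00:00', 300)"
)
conn.executemany(
    "INSERT INTO water_temperature_values VALUES (1, ?, ?)",
    [
        ("2019-08-09T05:00:00", 25.0),
        ("2019-08-09T05:05:00", 26.0),
        ("2019-08-09T05:10:00", 27.5),
    ],
)
# Query by value: no special time series software is needed.
rows = conn.execute(
    "SELECT measured_at, temperature FROM water_temperature_values "
    "WHERE series_id = ? AND temperature > ? ORDER BY measured_at",
    (1, 25.5),
).fetchall()
print(rows)
```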

    Entity Time Series

    An entire time series is treated as an entity

    Individual data values treated simply as atoms within the entity

    Time series is stored as a single record in a database table

    Entity time series storage has multiple flavors that each have differences to improve some aspect of the time series storage size or performance

    Flat BLOB

    Flat XML

    External File

    Entity Time Series Example

    Dynamic Time Series

    A further refinement for RDBMS storage of time series data is to dynamically structure the storage rather than use fixed elements as in the previous methodologies

    Dynamic time series storage is a broad class of methodologies that attempt to gain advantages in performance and size for managing time series data within an RDBMS

    In all dynamic time series storage strategies data within the value fields may be encoded as BLOB or XML data

    In dynamic storage, the time series is simply broken into multiple individual records each of which contains multiple data values

    Fixed Size Dynamic Storage

    Each record has a field target size limit (e.g. 100 KB, 10 MB, etc.) for storing data values

    The data value encoding software is responsible for breaking the time series into chunks of data values that do not exceed this size limit

    The goal is to encode the most discrete values possible, in time order, that do not exceed this size limit. In this manner, each record will contain values between a min and max time
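
    One way to picture the chunking step is the sketch below. It assumes every value is a 64-bit float encoded little-endian with Python's struct module and that the size limit is given in bytes; the encoding choice and the name chunk_values are assumptions, not the method used by any product named in the slides.

```python
import struct

BYTES_PER_VALUE = 8  # one IEEE 754 double per value (assumed encoding)


def chunk_values(values, max_bytes):
    """Split values, in time order, into binary records within max_bytes."""
    per_record = max_bytes // BYTES_PER_VALUE
    if per_record < 1:
        raise ValueError("size limit is smaller than one value")
    records = []
    for first in range(0, len(values), per_record):
        chunk = values[first:first + per_record]
        records.append(struct.pack(f"<{len(chunk)}d", *chunk))
    return records


# Five hypothetical values with a 16-byte limit: two values per record.
records = chunk_values([1.0, 2.0, 3.0, 4.0, 5.0], 16)
print([len(record) for record in records])
```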

    There are two basic sub-strategies for fixed size time series storage

    Time Window

    Entity Window

    Fixed Size Dynamic Example

    In the time window strategy, the time series values table maintains a start date and end date for each time series record that indicate the time bounds stored within that record

    The entity window strategy is very similar, except that if the time series records are all of fixed size, it is possible to know a priori the exact maximum number of data values that may be stored within a record of the time series

    Entity Window Computation

    In this strategy, the time series itself indicates the number of values stored within each record, and the offset to any value is computed as:

    Once the computation is completed

    recordOffset indicates the sequenceId (zero-based) containing the value

    elementOffset indicates which value within the record is to be returned

    The need for this computation makes random access possible but slightly computationally costly

    For enumeration of data, there is no such overhead cost
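
    The formula itself is not present in this transcript. A plausible reading, consistent with the descriptions of recordOffset and elementOffset above, is an integer division and its remainder. The sketch below assumes a zero-based value index and a fixed number of values per record; the parameter names are hypothetical.

```python
def locate(value_index: int, values_per_record: int) -> tuple[int, int]:
    """Return (record_offset, element_offset) for a zero-based value index."""
    if value_index < 0 or values_per_record < 1:
        raise ValueError("index must be >= 0 and values_per_record >= 1")
    # record_offset: which record (zero-based sequenceId) holds the value.
    # element_offset: position of the value inside that record.
    record_offset, element_offset = divmod(value_index, values_per_record)
    return record_offset, element_offset


# Hypothetical layout: 10 values per record; find the 26th value (index 25).
print(locate(25, 10))
```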

    Conclusion

    Every organization must evaluate its information strategy and time series data needs to ensure adequate planning and effective implementations are used for an effective lifecycle for its data

    There are many considerations for each type of time series data that comprises the organization's information corpus

    Data modeling and implementation planning is an activity which is critical to ensure the entities are captured in a repeatable, standardized and maintainable manner

    Time series data can be reduced to a simple set of concepts and a small set of general patterns of implementation

    Each actual time series data set within the organization can use these concepts and patterns to create an effective and efficient implementation for that time series that can be reused throughout the organization's lifetime

    Each time series data type will need to be evaluated separately and the most effective storage patterns used

    Questions

    There are no silver bullets