3 PODS Database Concepts and Terms
Presenter
Presentation Notes
30-minute session to introduce students to terms and concepts they will need to understand PODS.
PODS Basics Unit 3 – A training in PODS data management concepts and terms
Intended Audience
• GIS/IT professionals
• New to pipeline industry
• Little or no exposure to PODS
PODS Training – both PODS Basics and PODS Advanced – creates a better understanding of PODS Standards and PODS implementations through geospatial and relational database applications.
Presenter
Presentation Notes
This workshop expects the attendees to have at least a fundamental understanding of GIS. This workshop will help GIS professionals understand the importance of the PODS data model as a foundation for spatially managing pipeline data and assets. It will teach the basic PODS and GIS terminology necessary to understand pipeline data in a GIS, the basics of data models and relational databases, and how Linear Referencing is used to model the location of pipelines and related assets in ArcGIS.
An introduction to PODS Data Management Concepts and Terms
3 PODS BASICS
Presenter
Presentation Notes
In Unit 2, we learned about Linear Referencing, the method by which we can store, manage, and work with pipeline features and their associated assets spatially. We learned that the linear referencing method stores the relative locations of pipeline assets like meters, valves, etc. along an existing pipeline route. These assets and their associated data are stored as Events and Event Tables and are related to the spatially stored pipeline in GIS. Today, we’ll turn our attention away from the spatial aspects of pipelines and towards the data management and data quality aspects of the PODS data model. The theme of this unit is CONNECTEDNESS.
Webinar Series Overview
• Unit 1 – PODS Basics
• Unit 2 – Linear Referencing Concepts and Terms
• Unit 3 – PODS Data Management Concepts and Terms
• Unit 4 – The PODS Schema
• Unit 5 – Spatial Analysis of Pipeline Data
• Unit 6 – PODS Implementation
Introduction
How would you convert this into an information system?
Presenter
Presentation Notes
In our last session we discussed linear referencing and the spatial representation of a pipeline and its associated components. To fully understand this concept, we focused on a pipeline route as a singular object that’s spatially aware of its location and length. We also discussed linear referencing as a way of locating pipeline components (and other data types) using the pipeline route as a basis for measurement. Today, we’re going to “zoom out” and holistically view the PODS data model and its various components. We’ll expand upon Unit 1’s introduction to the data model by digging deeper into the data management aspects of PODS. But first, examine the image above. What do you see? Pipelines, associated equipment, and other assets. What kinds of assets? Production, monitoring, marketing (sales), transmission, mechanical, etc. Also, geographic boundaries. Holistically, you also see connectivity: a network (system) of pipelines, production, equipment, etc. How could you model this as an information system? What kinds of information would need to be associated with each asset type?
Work history
Leak survey
Physical pipeline facilities
How do you design a container for all this mission-critical data?
Regulatory compliance
Risk assessment
Operating measures
Site facilities
Cathodic protection
Compression
Geographic boundaries
Geographic feature crossings
Inline inspection
Close interval surveys
Offshore lines
External documents, reports
Presenter
Presentation Notes
The assets themselves must be stored as well as descriptive information about each one AND connectivity between assets must exist.
With logical groups of tables!
What data?
Organization?
Data Quality?
Governance?
You need a data model.
Presenter
Presentation Notes
So the answer to our question, “How do you design a container for all this mission-critical data?”, is TABLES. You create groupings of logically related tables. OK, now what? What data should be stored? How should these tables be organized? How should the relationships between datasets be established and governed? You need a Data Model.
What is a Data Model?
A data model defines how data is connected, stored, and processed.
It provides the organizational and conceptual model of data relationships
PODS has 3 types of data models:
1. Conceptual – Overview, low detail
2. Logical – detailed model including …
3. Physical – detailed, finished data model with defined data elements
PODS Data Model Example
Presenter
Presentation Notes
This is the conceptual model of PODS version 4.02, one of the most widely used versions of the PODS Data Model. The groups of colored boxes demarcate the various groupings of data tables: Locations, Regulatory Compliance, Sites, Cathodic Protection, Events, Centerline, Pipeline Facilities, etc.
What key drivers shape the PODS Data Model?
Regulatory Compliance
Traceability of pipeline equipment, materials
Quality assurance
Interoperability with other enterprise systems
Project management, monitoring
Industry standards and common language
Data management strategy
Consistency through pipeline phases from design to operation
Presenter
Presentation Notes
Aside from the visual components of the pipeline system, there are other, more intangible associations to be made within the pipeline system. These associations add complexity to our information system model, yet their importance in managing a pipeline network cannot be overstated. The list above represents a few of the key drivers shaping the components of the PODS data model. This is where we find the PODS Data Model: at the intersection of tangible and intangible aspects of a pipeline network.
What is the PODS Data Model?
The PODS Data Model is…
• a plan defining how all vital pipeline data is connected, stored, and processed.
• “pipeline-centric.”
• designed to reside on spatial (ESRI GIS) and non-spatial platforms.
• GIS neutral and vendor neutral
Presenter
Presentation Notes
So, what about the PODS data model? It begins with the PIPE. The data model is a collection of related data tables fully describing pipeline assets. Several varieties of the model are available: PODS 6.0, NextGen (7.0), PODS Lite, PODS Relational, and PODS Spatial.
3 PODS Key Concepts and Terms
• The PODS Data Model
  • Provides the database architecture pipeline operators use to
    • store critical information
    • analyze data about their pipeline systems
    • manage this data geospatially in a linear-referenced database which can then be visualized in any GIS platform.
Presenter
Presentation Notes
Data models are abstractions of reality. The PODS Data Model is platform independent, comprehensive, and RDBMS and GIS independent, and it evolves with members’ input. Most versions are GIS agnostic.
PODS Model Data Building Blocks
Now, let’s go a little deeper in our understanding of how entire pipeline systems and ancillary data are managed within the PODS data model.
Relational Databases
Presenter
Presentation Notes
Next, let’s look at the fundamental building block of the PODS Data Model: the Relational Database. As the name suggests, related data tables are the essential underpinning of the Data Model. These databases and the relationships between them are the engine behind the PODS data model. In this unit, we’ll gain a better understanding of relational databases, how they work, and how they function in PODS. Take a moment to examine the graphic. Notice that the pipe segment is only one of several data types in the model. Within the Pipe Segment route, notice the number of ancillary features associated with it. Also notice the data stored for the pipe segment’s casing and coating.
What is a Database?
• A Database is…
• A structured collection of information stored electronically
• Efficient, flexible data management (storage, retrieval, analysis, sharing, and updating)
• A central data repository
Presenter
Presentation Notes
There are various database types such as Relational, Object-oriented, Distributed, etc. For our purposes, we’ll focus on Relational Databases. “Database” in this series refers to this type of database.
What’s the difference between a Data Model and a Database?
Data Model
• Defines the connections between data – how it’s defined, stored, processed
• Not the actual data
Database
• Physical implementation of the data model
• A central data repository, i.e. “System of Record”
Presenter
Presentation Notes
PODS – Pipeline Open Data STANDARD. Data models are abstractions of reality.
What is a Database?
A Database is a collection of related data stored in Tables
• Tables are organized into Rows (records) and columns (fields)
• Databases also store prescribed rules, roles, and relationships for the data within them
• Controlled by a DBMS (Database Management System)
Records
Fields
Relationships between tables reflect the association of objects in reality.
Database Types
• Oracle or SQL Server supported relational databases
• Enterprise databases can also be an RDBMS, such as Oracle or SQL Server
• Geodatabases – File vs. Enterprise (ESRI)
• In the pipeline industry, databases are typically connected with some sort of mapping software
Oracle
Microsoft SQL Server
ESRI Geodatabase
Presenter
Presentation Notes
There are various database types, such as Relational, Object-oriented, Distributed, etc. For our purposes, we’ll focus on Relational Databases. The term “Database” in this series will refer to this type of database. RDBMS stands for Relational Database Management System. Compare and contrast PODS Relational and PODS Spatial.
Relational Databases
• Normalization of disparate data into a Relational Database
  • A database design method that organizes information into multiple related tables to minimize data redundancy
  • Information goes into the correct table, free of duplicates
  • Tables are related to each other
  • PODS database structure is normalized
Operator
• Name
• UID
• Contact
Pipeline
• Name
• UID
• OP-ID
Meter
• Name
• UID
• PIPE-ID
Presenter
Presentation Notes
A Relational Database has undergone a “normalization” process during which the tables, relationships, and rules are established in the database. This process makes the database operate efficiently. Why is it a Relational Database? Information about an operator is stored only in the Operator table. The same rule applies to Pipelines and Meters. If I need information about the pipeline that a meter is on, I follow the relationship to that pipeline in the Pipeline table and get it from there.
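To make the Operator, Pipeline, and Meter example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names mirror the slide (UID, OP-ID, PIPE-ID), but they are simplified, hypothetical names and not the actual PODS schema; the sample rows are invented for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three normalized, related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE operator (
            uid     INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            contact TEXT
        );
        CREATE TABLE pipeline (
            uid   INTEGER PRIMARY KEY,
            name  TEXT NOT NULL,
            op_id INTEGER NOT NULL REFERENCES operator(uid)   -- foreign key
        );
        CREATE TABLE meter (
            uid     INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            pipe_id INTEGER NOT NULL REFERENCES pipeline(uid) -- foreign key
        );
        -- Hypothetical sample data: operator details are stored exactly once.
        INSERT INTO operator VALUES (1, 'Example Operator', 'ops@example.com');
        INSERT INTO pipeline VALUES (10, 'Line A', 1);
        INSERT INTO meter VALUES (100, 'Meter 1', 10), (101, 'Meter 2', 10);
    """)
    return conn


def pipeline_and_operator_for_meter(conn, meter_name):
    """Follow meter -> pipeline -> operator and return (pipeline, operator) names."""
    return conn.execute(
        """
        SELECT p.name, o.name
        FROM meter m
        JOIN pipeline p ON p.uid = m.pipe_id
        JOIN operator o ON o.uid = p.op_id
        WHERE m.name = ?
        """,
        (meter_name,),
    ).fetchone()


conn = build_demo_db()
print(pipeline_and_operator_for_meter(conn, "Meter 1"))
```

Both meters reach the same operator row through the pipeline they sit on, so changing the operator's contact means updating a single record.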
Relational Database Key Fields
• Primary Key
  • Unique
  • Not null
  • Never changes
• Foreign Key
  • Not unique
  • Multiple
  • Points to some other Primary Key
Operator
• Name
• UID
• Contact
Pipeline
• Name
• UID
• OP-ID
Meter
• Name
• UID
• PIPE-ID
Presenter
Presentation Notes
The Primary Key (PK) is the unique ID field. It uses a GUID data type, so the database creates a unique ID for each record. What is a GUID? It stands for Globally Unique Identifier, and it is more robust than a simple unique ID. GUIDs are extremely large (128-bit) numbers that are, for all practical purposes, guaranteed to be unique, which is important for identifying every component in a pipeline network. GUIDs ensure the PODS relational databases have unique primary keys. PipelineID is an example of a GUID that permeates nearly all tables. Why? Because almost all tables contain asset information relating back to the pipeline.
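As a rough illustration of GUID keys, the sketch below uses Python's standard uuid module to generate identifiers and reuses one pipeline GUID as the foreign key on several meter records. The record layout and field names are hypothetical; a real PODS database would typically let the DBMS or geodatabase generate these values.

```python
import uuid


def new_guid():
    """Return a random (version 4) GUID as a 36-character string."""
    return str(uuid.uuid4())


# Hypothetical records: the pipeline's GUID is its primary key and is
# repeated on each meter as a foreign key pointing back to the pipeline.
pipeline_id = new_guid()
meters = [
    {"meter_id": new_guid(), "name": "Meter 1", "pipeline_id": pipeline_id},
    {"meter_id": new_guid(), "name": "Meter 2", "pipeline_id": pipeline_id},
]

for meter in meters:
    print(meter["name"], meter["meter_id"], "->", meter["pipeline_id"])
```

The primary keys differ for every record, while the foreign key value repeats, which matches the "unique" versus "not unique" distinction on the slide.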
Example – Database Linkages
Presenter
Presentation Notes
Here are some other data relationships in PODS.
3 PODS Key Concepts and Terms
• Databases – Essential PODS building blocks
  • Relational Databases are the basic building blocks of the PODS Data Model.
  • They provide connectivity throughout the PODS Data Model.
  • Primary and foreign keys within databases connect tables together.
Database Schema
The implementation of the Data Model on a specific computer system and relational database.
Data Management Terms
Data Model – Conceptual, logical, physical
Relational Database – Connected data tables
Schema – the physical manifestation of the Data Model
Presenter
Presentation Notes
Sometimes you’ll hear or see these terms used interchangeably, but there are differences between them.
Ensuring Data Integrity
How does PODS enforce data integrity and quality?
• Code Lists
• Enumerators
• Domains/SubTypes
Presenter
Presentation Notes
Data integrity enhancements continue with each version of PODS. The addition of code lists in the newest PODS Data Model (7.0) is a good example of the continued strengthening of data quality management. Code lists are managed in the PODS 7.0 Logical Model and work like lookup tables. The PODS logical data model will support domains (for geodatabase implementations), code lookup tables (for SQL DDL implementations), and code lists (for the XML Schema Data Exchange Specification). Code list values and descriptions are (typically) synonymous.
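One way to picture a code lookup table in a SQL implementation is the sketch below, again using Python's sqlite3 module. The table names (valve_type_cl, valve) and the codes BALL and GATE are hypothetical placeholders, not values taken from a PODS code list. The foreign key makes the database reject any value that is not in the lookup table.

```python
import sqlite3


def build_valve_db():
    """Create a lookup table of valid codes and a table constrained by it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE valve_type_cl (          -- hypothetical code lookup table
            code        TEXT PRIMARY KEY,
            description TEXT NOT NULL
        );
        CREATE TABLE valve (
            uid        INTEGER PRIMARY KEY,
            valve_type TEXT NOT NULL REFERENCES valve_type_cl(code)
        );
        INSERT INTO valve_type_cl VALUES ('BALL', 'Ball'), ('GATE', 'Gate');
    """)
    return conn


def try_insert_valve(conn, uid, valve_type):
    """Return True if the row is accepted, False if the code is not in the list."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO valve VALUES (?, ?)", (uid, valve_type))
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_valve_db()
print(try_insert_valve(conn, 1, "BALL"))  # a listed code
print(try_insert_valve(conn, 2, "Bal"))   # a mistyped code
```

The mistyped entry never reaches the valve table, which is the data-integrity effect the notes describe.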
What is a Code List?
There are 3 Types of Code Lists:
1. Enumerators
2. Managed Code Lists
3. Unmanaged Code Lists
Why are Code Lists Important?
• Enforces Data Integrity
• Data Standardization
• Data Entry
Presenter
Presentation Notes
How often have you looked at a field in a spreadsheet or database and noticed missing or inconsistent data? For example, how many versions of a single company name exist? Code lists reduce “fat fingering.” They smooth the data entry process by providing a pick list or menu of items to choose from rather than typing an entry. The product is standardized data.
PODS CODE LIST TYPES
All possible values are permanently fixed at the time of standardization.
Enumerations
Presenter
Presentation Notes
PODS Code Lists are maintained in the PODS Pipeline Data Model’s Logical Model. Code lists store valid values for certain attributes of pipeline features. Code lists are matched to the type of database/geodatabase the data resides in. Because values are important to the working of the standard, PODS manages the lifecycle of values. Values can be added, retired, deprecated, or superseded ONLY in very rare and extenuating circumstances.
Enumerator Example
Presenter
Presentation Notes
Fixed values.
PODS CODE LIST TYPES
Most possible values are permanently fixed at the time of standardization.
Managed Code Lists
Presenter
Presentation Notes
Because values are important to the working of the standard AND/OR are required for interoperability reasons, PODS manages the lifecycle of values. Values can be added, retired, deprecated, or superseded. Modules/Users shall not add or supersede values without submission to a TBD PODS management process.
Managed Code List Example
Presenter
Presentation Notes
Values in Managed Code lists may be changed by PODS.
PODS CODE LIST TYPES
Values in the list are examples and not managed by PODS.
Unmanaged Code Lists
Presenter
Presentation Notes
These values are not important to the working of the standard; therefore, PODS does not manage the lifecycle of values. Values can be added, retired, deprecated, or superseded. Modules/Users can do whatever they like.
Unmanaged Code List Example
Presenter
Presentation Notes
This doesn’t mean no management is needed! It means the PODS model isn’t managing these lists. In each organization, there should always be individuals designated to ensure the code lists are correct and up to date.
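The three code list types differ mainly in who may change their values. The sketch below is one possible way to express that rule in Python. The class names, the approved_by_standard flag, and the sample values are all hypothetical, and the sketch simplifies the enumeration case by treating it as fully fixed, whereas the notes allow changes in very rare circumstances.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    ENUMERATION = "enumeration"  # values fixed at the time of standardization
    MANAGED = "managed"          # lifecycle managed by the standards body
    UNMANAGED = "unmanaged"      # lifecycle left to each organization


@dataclass
class CodeList:
    name: str
    kind: Kind
    values: set = field(default_factory=set)

    def add_value(self, value, approved_by_standard=False):
        """Add a value, honoring the governance rule for this kind of list."""
        if self.kind is Kind.ENUMERATION:
            raise PermissionError(f"{self.name}: enumeration values are fixed")
        if self.kind is Kind.MANAGED and not approved_by_standard:
            raise PermissionError(f"{self.name}: change needs approval first")
        self.values.add(value)


# Hypothetical unmanaged list: the organization may extend it freely.
local_list = CodeList("inspection_vendor", Kind.UNMANAGED, {"VENDOR_A"})
local_list.add_value("VENDOR_B")
print(sorted(local_list.values))
```

Even for the unmanaged case, someone in the organization still has to decide which values are acceptable; the code only shows that the standard itself does not block the change.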
Database Subtypes and Domains
A type of code list used in Geospatial versions of PODS (PODS Spatial)
Database rules describing the valid values of a field.
Attribute domains constrain allowable values in a table.
Enforces data integrity.
Presenter
Presentation Notes
Domains and subtypes are another way to enhance data integrity and are similar in results to lookup tables and code lists. In some geospatial databases you can choose to employ subtypes and domains for data integrity enforcement. Here’s an example: think of a road database. Subtypes of roads might be local, rural, or highway. Domains would be the individual types of road that fall into each of the subtype categories. So local road domains might be street, boulevard, and avenue. Rural domains might be dirt, path, or trail. Highway domains might be state highway and interstate.
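The road example can be sketched in a few lines of Python. This is only an illustration of the idea that each subtype carries its own set of allowed values; it does not use any geodatabase API, and the dictionary and function names are hypothetical.

```python
# Each subtype maps to the domain (set of valid values) that applies to it.
ROAD_DOMAINS = {
    "local": {"street", "boulevard", "avenue"},
    "rural": {"dirt", "path", "trail"},
    "highway": {"state highway", "interstate"},
}


def is_valid_road(subtype, road_type):
    """Return True only if road_type is in the domain for the given subtype."""
    return road_type in ROAD_DOMAINS.get(subtype, set())


print(is_valid_road("local", "avenue"))      # value from the local domain
print(is_valid_road("rural", "interstate"))  # value from a different subtype's domain
```

A geodatabase applies the same kind of check when a feature is edited, so a rural road cannot be saved with a highway-only value.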
End of Part 3
Any questions?
In Summary
1. The PODS Data Model(s) provide the architecture for storing and managing pipeline systems, equipment, etc.
2. PODS relational databases are the systems of record for pipeline assets and operating environments.
3. PODS lookup tables, domains, and code lists enforce data integrity and standardization.
• Some GIS terms that are important to understanding pipeline modeling
• PODS Databases
  • RDBMS – Relational Database Management System
  • Geodatabases
    • File GDB – ESRI format, stores Feature classes, uses files in a folder and no 3rd party RDBMS
    • Enterprise (ESRI) – Feature classes stored within 3rd party RDBMS
      • Oracle or SQL Server supported relational databases
• Domain / Code Lookup
  • Domain – FGDB way to store limiting values to control acceptable values in a column
  • Done with a Code Lookup table in an RDBMS
Presenter
Presentation Notes
File GDB – ESRI format that stores feature classes as files in a folder, with no 3rd-party RDBMS. Enterprise GDB – feature classes stored within a 3rd-party Relational Database Management System. Domain – FGDB way to store limiting values to control acceptable values in a column; done with a Code Lookup table in an RDBMS.
Essential GIS Terms
Some GIS terms that are important to understanding pipeline modeling:
• Relationship Class – GDB item that preserves a relationship between tables, like a persistent join
• GUID – a Globally Unique Identifier – ensures that the value is not duplicated by any other GUID value in the entire GDB
• Editor Tracking – ESRI enabled mechanism for tracking when a feature is created and last edited
• UTC – Coordinated Universal Time – the official agreed-upon time of the earth, time zone independent
Presenter
Presentation Notes
Rel Class – GDB item that preserves a relationship between tables, like a persistent join. GUID – a Globally Unique Identifier – ensures that the value is not duplicated by any other GUID value in the entire GDB. Editor Tracking – ESRI enabled mechanism for tracking when a feature is created and last edited. UTC – Coordinated Universal Time – the official agreed-upon time of the earth, time zone independent.
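To show how editor tracking and UTC fit together, here is a small Python sketch that stamps a record with creation and last-edited times in UTC. It imitates the idea only; the field names created_utc and last_edited_utc and the helper functions are hypothetical and are not the ESRI editor tracking implementation.

```python
from datetime import datetime, timezone


def track_create(feature):
    """Return a copy of the feature stamped with creation and edit times in UTC."""
    now = datetime.now(timezone.utc)
    return {**feature, "created_utc": now, "last_edited_utc": now}


def track_edit(feature, **changes):
    """Return an edited copy; only the last-edited stamp is refreshed."""
    return {**feature, **changes, "last_edited_utc": datetime.now(timezone.utc)}


valve = track_create({"name": "Valve 1"})
valve = track_edit(valve, name="Valve 1A")
print(valve["created_utc"].isoformat(), valve["last_edited_utc"].isoformat())
```

Because both stamps are stored in UTC, editors working in different time zones produce timestamps that can be compared directly.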
Important Acronyms
• EA – Enterprise Architect by Sparx Systems
• DDL – Data Definition Language
• APR – Esri’s ArcGIS Pipeline Referencing solution
• LRS – Linear Referencing System
• OGC – Open Geospatial Consortium
• SQL – Structured Query Language