Ibm White Paper Data Governance for Gis[1]
8/3/2019 Ibm White Paper Data Governance for Gis[1]
IBM Software
Thought Leadership White Paper
June 2010
Data governance for geographical information systems
Maximize the value of GIS data with IBM InfoSphere Foundation Tools and IBM InfoSphere Information Server
Contents
Introduction
Geographic information systems: Analytics in action
InfoSphere Information Server: A unified foundation for information architectures
The InfoSphere Information Server architecture
InfoSphere Information Server and GIS touch points
Integrating InfoSphere Information Server and GIS with other software components
GIS solutions in the real world
Building trustworthy data with IBM software
Introduction
During the past two decades, government agencies across the
world have made a significant investment in information.
Agencies of all types are using enterprise resource planning
(ERP) software, supply chain management packages and cus-
tomer relationship management (CRM) solutions, molding
these information tools in a nearly endless variety of combina-
tions to better serve their constituents.
Agencies are also taking advantage of innovations for gather-
ing and processing information. These technologies are a mix
of the familiar and the new: Service-oriented architectures
(SOAs), Web services, XML, grid computing and radio
frequency identification (RFID). One technology, the geographic
information system (GIS), holds particular promise for agencies.
A GIS links nearly any type of data with spatial and location
information, enabling agencies to see and analyze the
real-world dimensions that their information inhabits.
For government agencies, this is a powerful promise; by some
estimates 80 percent of all agency data has some type of spatial
or location component. For example, agencies can use GIS
tools to easily plot census, traffic or commerce data onto
maps, uncovering geographic trends that might otherwise
be missed.
But while this new world of information brims with possibilities,
agencies face serious challenges managing the tremendous amounts of data being generated. The sheer volume of
data, combined with changing laws and regulations, can make
it difficult to integrate multiple systems and turn the data
into consistent, timely and accurate information for decision
making.
IBM InfoSphere Information Server can help agencies
derive more value from complex, heterogeneous information.
It helps business users and IT personnel collaborate to under-
stand the meaning, structure and content of information
across a wide variety of sources. With IBM InfoSphere
Information Server, users can access and use information in
new ways to drive innovation, increase operational efficiency
and reduce risk.
InfoSphere Information Server can also help agencies maxi-
mize the value of a GIS by creating a gateway between the
system and data sources that may have been previously inac-
cessible. Using InfoSphere Information Server in conjunction
with a GIS enables an agency to enrich and leverage spatial
information in new ways and provide new perspectives on old
issues.
This white paper provides an overview of InfoSphere
Information Server with an emphasis on how InfoSphere
Information Server and InfoSphere Foundation Tools can be
used in conjunction with GIS. It begins with a brief discussion
of GIS, including the primary GIS vendors, followed by an
overview of InfoSphere Information Server and InfoSphere
Foundation Tools components and capabilities. It will also
show how GIS data sources can be analyzed, transformed and
enhanced with InfoSphere Information Server and InfoSphere
Foundation Tools, providing references to complementary
software from IBM and third parties. The paper also provides
examples of how the capabilities can be used to help agencies
and organizations achieve their missions.
Geographic information systems: Analytics in action
Making electronic maps is probably the most well-known use
for GIS, but there are plenty of other potential uses. Strictly
defined, GIS is a computer system capable of capturing, stor-
ing, analyzing and displaying geographically or spatially
referenced information; in other words, data that is referenced
by a spatial or physical location of some kind. Another way of
describing GIS is to refer to it as software that links geographical
data (in the form of coordinates) to descriptive (also
known as tabular) data, making it possible to analyze the
relationship between geographic data and its descriptive/tabular elements.
Data in a GIS can exist in one of two basic formats: Vector
and raster. Vector data consists of points, lines and polygons
defined by geographic coordinates. Raster data (an image that
covers a range of points where each pixel corresponds to a
geographic location and has an associated value) is usually
represented as a georeferenced picture or photo. A GIS can
integrate those two format types as data layers or themes and
can also link the vectors or images to descriptive or tabular
data, producing a complete geospatial representation.
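The two formats and their link to tabular data can be sketched in a few lines of Python. This is a minimal illustration, not the data model of any GIS product: the class names, the sample school record and the elevation grid are all hypothetical, and the raster lookup assumes a simple north-up grid with square cells and performs no bounds checking.

```python
from dataclasses import dataclass


@dataclass
class VectorFeature:
    """A point, line or polygon defined by geographic coordinates."""
    feature_id: str
    geometry_type: str                       # "Point", "LineString" or "Polygon"
    coordinates: list[tuple[float, float]]   # (longitude, latitude) pairs


@dataclass
class RasterLayer:
    """A georeferenced grid: each cell maps to a location and holds a value."""
    origin_lon: float            # longitude of the upper-left corner
    origin_lat: float            # latitude of the upper-left corner
    cell_size: float             # cell width and height in degrees
    values: list[list[float]]    # rows run north to south

    def value_at(self, lon: float, lat: float) -> float:
        # Convert the location to a column/row index in the grid.
        col = int((lon - self.origin_lon) / self.cell_size)
        row = int((self.origin_lat - lat) / self.cell_size)
        return self.values[row][col]


# Descriptive (tabular) data keyed by feature ID (hypothetical values).
attributes = {"school-1": {"name": "Lincoln Elementary", "enrollment": 420}}

school = VectorFeature("school-1", "Point", [(-117.18, 34.05)])
elevation = RasterLayer(origin_lon=-118.0, origin_lat=35.0, cell_size=0.5,
                        values=[[300.0, 320.0], [410.0, 450.0]])

# Combine the layers: tabular attributes plus the raster value at the point.
lon, lat = school.coordinates[0]
record = {**attributes[school.feature_id],
          "elevation_m": elevation.value_at(lon, lat)}
```

Here the vector feature supplies the location, the raster layer supplies a value for that location and the attribute table supplies the descriptive data, which together form one combined record.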
GIS also helps users anticipate future outcomes by depicting
regression analysis for forecasting future events and processes.
This analytical capability is what separates a true GIS from
digital mapping.
The ability of GIS to manage, correlate, predict, model and
share geographic information makes GIS an essential analytical
tool. But due to the specialized nature of the technology,
and the additional training required to master the systems, GIS is frequently segregated by agencies into well-defined
departments or shops. GIS departments are often viewed by
other agency segments as an organizational black box, with
staff that takes in mapping requirements and data; performs a
series of complicated, esoteric machinations; and then returns
the desired map.
But with so much agency data having a spatial or location
component, agencies are realizing that GIS analysis can pro-
vide unique and invaluable insight, and are looking for ways to
share that resource more broadly across their organizations.
The challenge is that most agency data is not stored in the
GIS, but in ERP systems, CRM solutions, relational databases
and other repositories scattered across the organization, some-
times in legacy applications, systems and formats.
Regardless of the format or source, if data has a spatial or
location component, it can be mapped, and InfoSphere
Information Server can help organizations create gateways
between traditional data sources and GIS resources. Using
InfoSphere Information Server in conjunction with a GIS
can help an agency enrich and leverage spatial information in
new ways.
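As a rough illustration of such a gateway, the sketch below turns a tabular record that carries coordinates into a map-ready feature. It is not an InfoSphere Information Server interface: the function name, the field names and the sample case record are hypothetical, and GeoJSON is chosen here only as a widely supported interchange format.

```python
import json


def record_to_feature(record: dict, lon_field: str = "longitude",
                      lat_field: str = "latitude") -> dict:
    """Turn a tabular record with coordinates into a GeoJSON Feature."""
    properties = {k: v for k, v in record.items()
                  if k not in (lon_field, lat_field)}
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON orders coordinates as [longitude, latitude].
            "coordinates": [float(record[lon_field]), float(record[lat_field])],
        },
        "properties": properties,
    }


# A row as it might come from a CRM or relational source (hypothetical values).
crm_row = {"case_id": "C-1001", "status": "open",
           "longitude": "-77.0365", "latitude": "38.8977"}

feature = record_to_feature(crm_row)
collection = {"type": "FeatureCollection", "features": [feature]}
geojson_text = json.dumps(collection)
```

The descriptive columns travel with the geometry as properties, so the record can be plotted without losing its link to the source system.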
GIS software providers
Many companies provide GIS software and applications;
the largest is Environmental Systems Research Institute
(commonly known as ESRI; www.esri.com), a privately held
software company headquartered in Redlands, California.
According to ESRI, their software is used by more than
300,000 organizations worldwide, including most U.S.
Federal agencies and national mapping agencies, and more than 24,000 U.S. state and local government
agencies.
The ranks of GIS software providers also include compa-
nies such as Intergraph, MapInfo, Bentley and Autodesk.
Regardless of the provider, virtually any database associ-
ated with a GIS can be enhanced by InfoSphere
Information Server.
InfoSphere Information Server: A unified foundation for information architectures
Today, critical agency and business initiatives cannot succeed
without effectively integrated information. Initiatives such as
single view of the constituent/customer, business intelligence
(BI) and supply chain management require consistent, com-
plete and trustworthy information. If the information cannot
be trusted or doesn't meet their needs, end users will either
stop using a system for information or may create local copies
of the data on a spreadsheet. Those additional versions and
extracts of the data result in the central organization losing
control of valuable information.
IBM InfoSphere Information Server is designed to provide a comprehensive, unified foundation for enterprise information
architectures. It can scale to meet growing information
volume requirements, enabling companies to quickly deliver
high-quality business results. InfoSphere Information Server
supports a variety of initiatives:
Business intelligence: InfoSphere Information Server
makes it easy to develop a unified view of the business for
better decisions. It helps users understand existing data
sources; cleanse, correct and standardize information; and
load analytical views that can be reused throughout the agency. IBM provides a BI system through IBM Cognos
software.
Master data management (MDM): InfoSphere
Information Server helps simplify the development of
authoritative master data (creating a golden record) by
showing where and how information is stored across source
systems. It also consolidates disparate data into a single, reli-
able record; cleanses and standardizes information; removes
duplicates; and links records across systems. This master
record can be loaded into operational data stores, data ware-
houses or master data applications. The record can also be
assembled, completely or partially, on demand.
Infrastructure rationalization: InfoSphere Information
Server helps reduce operating costs by showing relationships
between systems and by defining migration rules to consoli-
date instances or move data from obsolete systems. Data
cleansing and matching ensure high-quality data enters the
new system.
Agency transformation: InfoSphere Information Server
can speed development and enhance agency agility by pro-
viding reusable information services that can be plugged
into applications, business processes and portals. Those
standards-based information services are maintained cen-
trally by information specialists, but are widely accessible
throughout the enterprise and can also be accessed by other
authorized agencies/entities.
Risk and compliance: InfoSphere Information Server helps
improve visibility and data governance by enabling com-
plete, authoritative views of information with proof of line-
age and quality. Those views can be made widely available
and reusable as shared services, while the rules inherent in
them are maintained centrally.
Data warehousing: InfoSphere Information Server can
help create data warehousing and data mart applications.
These are applications where data is offloaded from a transactional system and reorganized, usually for analytical or
reporting purposes.
Server, application and database consolidation:
InfoSphere Information Server can help consolidate struc-
tured data sources contained in applications to support the
reduction of the number of physical servers, applications and
databases.
Migrations: InfoSphere Information Server is frequently
used during database or application migrations. While
updating applications or during lengthy procedures such as
ERP implementation, InfoSphere Information Server can be
used to migrate data between legacy systems and ERP mod-
ules, or to provide temporary connectivity between legacy
systems and implemented ERP sections during migration.
InfoSphere Information Server capabilities
InfoSphere Information Server combines a variety of
IBM information integration technologies. Together, they
enable organizations to investigate and understand their data;
cleanse and certify it; and then transform and deliver it as a
fully trusted resource to systems across the agency or enter-
prise (see Figure 1).
Figure 1: InfoSphere Information Server enables businesses to perform three key functions: Gain an understanding of data, cleanse data and integrate data.
[Figure: Data sources feed three functions that share parallel processing and a metadata repository, with results delivered to targets. Understand: Discover, model and govern information structure and content. Cleanse: Standardize, merge and correct information. Integrate: Combine, restructure, synchronize and move information for delivery.]
Understand your information with IBM InfoSphere
Foundation Tools
For an organization to effectively integrate data, it must first
establish a clear picture of what data it has, where the data
resides and its overall condition. InfoSphere Foundation
Tools, which are part of InfoSphere Information Server, can
help organizations automate data profiling and data-quality
auditing to:
Understand data sources and relationships
Eliminate the risk of using or proliferating bad data
Improve productivity
Leverage existing IT investments
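To make the idea of data profiling concrete, the following Python sketch summarizes a single column. It is an illustration only and does not reproduce how InfoSphere Information Analyzer works; the function name, the report fields and the sample postal codes are hypothetical.

```python
from collections import Counter


def profile_column(values: list) -> dict:
    """Summarize one column: row count, nulls, distinct values, inferred types."""
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v not in (None, "")]

    def infer(value) -> str:
        try:
            float(value)
            return "numeric"
        except (TypeError, ValueError):
            return "text"

    types = Counter(infer(v) for v in non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "types": dict(types),
    }


# A postal code column with typical defects (hypothetical values).
zip_codes = ["92373", "92373", "", "9237A", None, "20500"]
report = profile_column(zip_codes)
```

For this sample the report flags two missing values and one entry that is not numeric, the kind of finding an analyst would turn into a cleansing rule before the column is used for geocoding.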
InfoSphere Foundation Tools help agencies collaborate across
user roles. Data analysts can use the analysis and reporting
functionalities to generate integration specifications and busi-
ness rules that can be monitored over time. Meanwhile, sub-
ject matter experts can use Web-based tools to define,
annotate and report on fields of agency data. The common
metadata foundation makes it easier for different types of users
to create and manage metadata by using tools that are opti-
mized for their roles.
InfoSphere Foundation Tools enable organizations to capture
and organize business metadata, provide modeling capabilities,
assist in the translation of business rules into transformation
processes and analyze data lineage by leveraging metadata.
Components of InfoSphere Foundation Tools
InfoSphere Information Analyzer
InfoSphere Business Glossary
InfoSphere Business Glossary Anywhere
InfoSphere Data Architect
InfoSphere FastTrack
InfoSphere Discovery
InfoSphere Metadata Workbench
Cleanse your information
The InfoSphere QualityStage component of InfoSphere
Information Server supports information quality and consis-
tency by standardizing, validating, matching and merging data.
With InfoSphere QualityStage, organizations can certify and enrich common data elements, use trusted data such as postal
records for name and address information and match records
across or within data sources. InfoSphere Information Server
allows a single record to survive from the best information
across sources for each unique entity, helping you to create a
single, comprehensive and accurate view of information.
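The standardize, match and survive sequence can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: it matches on an exact standardized address rather than the matching techniques a product such as InfoSphere QualityStage provides, and the abbreviation table, field names and sample records are hypothetical.

```python
from collections import defaultdict

# A tiny abbreviation table for illustration; real postal rules are far larger.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def standardize(address: str) -> str:
    """Lowercase, strip punctuation and apply the abbreviation table."""
    words = address.lower().replace(".", "").replace(",", "").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def survive(records: list[dict]) -> list[dict]:
    """Group records on the standardized address and keep the best field values."""
    groups = defaultdict(list)
    for rec in records:
        groups[standardize(rec["address"])].append(rec)

    merged = []
    for key, group in groups.items():
        # Prefer the most recently updated non-empty value for each field.
        group.sort(key=lambda r: r["updated"], reverse=True)
        best = {"address": key}
        for field in ("name", "phone"):
            best[field] = next((r[field] for r in group if r.get(field)), None)
        merged.append(best)
    return merged


records = [
    {"name": "J. Smith", "address": "12 Main Street", "phone": "",
     "updated": "2009-03-01"},
    {"name": "John Smith", "address": "12 main st.", "phone": "555-0100",
     "updated": "2010-01-15"},
    {"name": "Ann Lee", "address": "4 Oak Avenue", "phone": "555-0111",
     "updated": "2008-07-09"},
]
golden = survive(records)
```

The two Smith records collapse into one surviving record that takes the name and phone number from the more recent source, while the unmatched record passes through unchanged.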
Integrate your data
InfoSphere Information Server helps organizations transform
and enrich information to ensure that it is in the proper con-
text for new uses. It includes hundreds of prebuilt transformation
functions for combining, restructuring and aggregating information. For example, InfoSphere Information Server
provides in-line validation and transformation of complex data
types, and high-speed joins and sorts of heterogeneous data. It
also provides high-volume, complex data transformation and
movement functionality that can be used for stand-alone
extract, transform and load (ETL) scenarios, or as a real-time
data processing engine for applications or processes.
InfoSphere Information Server data integration tools
InfoSphere DataStage
InfoSphere Change Data Capture
InfoSphere Federation Server
InfoSphere Classic Federation Server
InfoSphere Information Server enables organizations to
virtualize, synchronize and move information to the people,
processes or applications that need it. Information can be
delivered by using federation, time-based or event-based pro-
cessing, moved in large bulk volumes from location to location
or accessed in place when it cannot be consolidated.
InfoSphere Information Server also provides direct, native
access to a wide variety of information sources, both main-
frame and distributed. It enables access to databases, files,
services and packaged applications, as well as to content repos-
itories and collaboration systems. Companion products from
IBM support high-speed replication, data synchronization and
distribution across databases, change data capture capabilities
and event-based publishing of information.
The InfoSphere Information Server architecture
InfoSphere Information Server provides a unified architecture
that supports all types of information integration through
common services, unified parallel processing and unified meta-
data (see Figure 2). To ensure its availability across an organi-
zation, it employs an SOA; the SOA also connects the
individual components of InfoSphere Information Server.
Figure 2: InfoSphere Information Server connects to a wide range of data sources and includes a unified parallel processing engine, a metadata repository and a host of shared services.
[Figure: Layered architecture. Unified user interface: analysis interface, development interface, Web admin interface. Functions: Understand, Cleanse, Transform, Deliver. Common services: metadata services, unified service deployment, security services, logging and reporting services. Unified parallel processing. Unified metadata: design and operational. Common connectivity: structured, unstructured, applications, mainframe.]
Unified parallel processing engine
At the heart of InfoSphere Information Server is a unified par-
allel processing engine that handles everything from analysis
of large databases for InfoSphere Information Analyzer to data
cleansing for InfoSphere QualityStage and complex transfor-
mations for InfoSphere DataStage. The parallel processing
engine delivers outstanding performance, enabling organiza-
tions to handle more data more quickly. Benefits of the engine
include:
Parallelism and data pipelining to complete increasing volumes of work in decreasing time windows
Scalability support to add hardware (for example, processors or nodes in a grid) with no changes to the data integration design
Optimized database, file and queue processing to handle large files that cannot fit in memory all at once or large numbers of small files
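The first two benefits can be illustrated with a small Python sketch. It shows the design idea only and says nothing about how the InfoSphere engine is implemented: the stage functions and the `degree` parameter are hypothetical, and Python threads are used for brevity even though they do not give true parallelism for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def clean(rows):
    """Stage 1: trim whitespace, yielding rows one at a time (pipelining)."""
    for row in rows:
        yield row.strip()


def transform(rows):
    """Stage 2: consumes stage 1 output as soon as each row is produced."""
    for row in rows:
        yield row.upper()


def run_partition(partition: list[str]) -> list[str]:
    # The stages are chained generators, so no intermediate list is built.
    return list(transform(clean(partition)))


def run_job(rows: list[str], degree: int) -> list[str]:
    """Split rows into `degree` partitions and process them concurrently."""
    partitions = [rows[i::degree] for i in range(degree)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        results = list(pool.map(run_partition, partitions))
    # Sort so the output does not depend on how the rows were partitioned.
    return sorted(row for part in results for row in part)


rows = [" redlands ", "armonk", " austin", "boston "]
result = run_job(rows, degree=2)
```

The job definition (the two stages) stays the same whether `degree` is 1 or 4; only the configuration changes, which mirrors the point that hardware can be added without changes to the data integration design.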
Common connectivity
InfoSphere Information Server connects to information sources whether they are structured, unstructured, applications
or on the mainframe. Metadata-driven connectivity is shared
across the InfoSphere Information Server components, and
connection objects are reusable across functions. Connectors
provide design-time importing of metadata, data browsing and
sampling, runtime dynamic metadata access, error handling,
and high-functionality, high-performance runtime data
access. Prebuilt interfaces for packaged applications, called
packs, provide adapters to SAP, Siebel, Oracle and other
applications, enabling integration with enterprise applications
and associated reporting and analytical systems. In some cases,
you can extract specialized metadata associated with those
sources.
Unified metadata
InfoSphere Information Server is built on a unified metadata
infrastructure that enables shared understanding between
business and technical domains. This infrastructure helps
reduce development time and provides a persistent record
that can improve confidence in information. All functions of
InfoSphere Information Server share the same metadata
model, making it easier for different roles and functions to
collaborate.
A common metadata repository provides persistent storage for
all InfoSphere Information Server suite components, all of
which use the repository to navigate, query and update meta-
data. The repository contains four types of metadata:
Technical, business, dynamic and operational.
Technical metadata is information about the format of the
data, such as the tables that are present, the attributes of
those tables, how many characters wide a particular attrib-
ute may be and when the data was last updated.
Business metadata can include a wide range of information
about data usage, such as the owner or steward of a piece of
data, the intended use of the data and definitions for
acronyms or domain values.
Dynamic metadata includes design-time information.
Operational metadata includes performance monitoring, audit
and log data and data profiling sample data.
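One way to picture the four metadata types is as fields on a single record for a data asset. The sketch below is an assumption made for illustration, not the repository's actual model: every class name, field name and sample value (including the parcels table) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    table: str
    columns: dict[str, str]      # column name -> type and width
    last_updated: str


@dataclass
class BusinessMetadata:
    steward: str
    intended_use: str
    definitions: dict[str, str] = field(default_factory=dict)


@dataclass
class OperationalMetadata:
    rows_read: int
    rows_written: int
    status: str


@dataclass
class AssetMetadata:
    technical: TechnicalMetadata
    business: BusinessMetadata
    dynamic: dict[str, str] = field(default_factory=dict)   # design-time information
    operational: list[OperationalMetadata] = field(default_factory=list)


parcels = AssetMetadata(
    technical=TechnicalMetadata(
        table="GIS.PARCELS",
        columns={"PARCEL_ID": "CHAR(12)", "ZONE_CODE": "CHAR(4)"},
        last_updated="2010-05-30",
    ),
    business=BusinessMetadata(
        steward="County assessor's office",
        intended_use="Zoning and tax assessment maps",
        definitions={"ZONE_CODE": "Land-use zoning classification"},
    ),
)

# Each job run appends operational metadata to the same shared record.
parcels.operational.append(OperationalMetadata(1000, 998, "completed"))
```

Keeping all four kinds of metadata on one shared record is what lets a profiling result or a job run recorded by one tool be read by another.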
Because the repository is shared by all suite components,
profiling information that is created by InfoSphere
Information Analyzer is instantly available to users of other
InfoSphere Information Server products, such as InfoSphere
DataStage and InfoSphere QualityStage. The repository is a
Java 2 Platform, Enterprise Edition (J2EE) application
that uses a standard relational database such as IBM DB2,
Oracle or Microsoft SQL Server for persistence (DB2 is
provided with InfoSphere Information Server). Those data-
bases provide backup, administration, scalability, parallel
access, transactions and concurrent access.
Common services
InfoSphere Information Server is built entirely on a set of
shared services that perform core tasks. Design, execution and metadata functions are all available as shared services:
Design: Design services help developers create function-
specific services that can be shared. For example, InfoSphere
Information Analyzer calls a column analyzer service that
was created for enterprise data analysis but can be integrated
with other parts of InfoSphere Information Server because it
exhibits common SOA characteristics.
Execution: Execution services include logging, scheduling,
monitoring, reporting, security and Web framework,
enabling organizations to manage and control all components
from a single interface.
Metadata: Using metadata services, metadata is shared
live across tools so that changes made in one InfoSphere
Information Server component are instantly visible across all
of the suite components. Metadata services are tightly inte-
grated with the common repository and are packaged in
InfoSphere Metadata Server.
The common services layer manages how services are
deployed from any of the product functions, allowing cleans-
ing and transformation rules or federated queries to be pub-
lished as shared services within an SOA, using a consistent and
easy-to-use mechanism. This can help organizations pursuing
SOA-centric architectures by exposing data integration, feder-
ation or cleansing processes directly to an SOA rather than
requiring a separate integration layer.
Unified user interface
InfoSphere Information Server provides a common graphical
interface and tool framework that makes it easy for organiza-
tions to access the full power of the solution. Shared interfaces
such as the InfoSphere Information Server console and the
Web console provide a common look and feel, visual controls
and user experience across products, making it possible to
reduce training time and helping to simplify overall
administration.
Common functions, such as catalog browsing, metadata
import, query and data browsing, all expose underlying com-
mon services in a uniform way. InfoSphere Information Server
provides rich client interfaces for highly detailed development
work and thin clients that run in Web browsers for adminis-
tration. Application programming interfaces (APIs) support a
variety of interface styles, including standard request-reply,
service-oriented, event-driven and scheduled task invocation.
This provides a flexible range of capabilities to meet different
users' specific needs, while also ensuring a standard look and
feel throughout product interfaces.
InfoSphere Information Server and GIS touch points
There are a number of touch points where InfoSphere
Information Server capabilities can be used almost immedi-
ately out of the box with GIS data. They fall into three cate-
gories: Understanding data, cleansing data and integrating/
delivering data. The capabilities can be used either in unison
or separately, but all of them utilize the same integrated
InfoSphere Information Server architecture (see Figure 3).
Figure 3: InfoSphere Information Server comprises many products spread over several touch points.
[Figure: Data sources (RDBMS, ERP, application, mainframe, flat files, Web services) connect through ODBC, FTP and API. Understand: InfoSphere Information Analyzer, InfoSphere Business Glossary/Business Glossary Anywhere, InfoSphere Discovery, InfoSphere Data Architect. Cleanse: InfoSphere QualityStage. Integrate/deliver: InfoSphere DataStage, InfoSphere FastTrack, InfoSphere Federation Server, InfoSphere Change Data Capture. Also shown: InfoSphere Metadata Workbench, InfoSphere Information Services Director, parallel processing, targets, and a metadata repository holding technical, business and operational metadata.]
GIS data quality
Understanding the actual quality, content and structure of data
is an important first step to make critical business decisions.
Overall data quality depends on many factors, such as correct
data types, consistent formatting, retrievability and usability. If
the structure and content of your data is poor, then queries of
that data will be incomplete, organizations will be unable to
make informed decisions and business users will learn to be
-
8/3/2019 Ibm White Paper Data Governance for Gis[1]
10/20
-
8/3/2019 Ibm White Paper Data Governance for Gis[1]
11/20
11IBM Software
or boundaries of the data, information about the source and
capture method, time period when the data was captured, geo-
graphical reference/projection, stewardship and normal display
characteristics. Access to GIS metadata allows users to better
determine if a given data set will work for the intended map or
spatial analysis, or if another data set should be found or cre-
ated. Access to GIS metadata also helps agencies find more
opportunities for sharing existing data sets, rather than devot-
ing time and resources to create a new set for each purpose.
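A minimal sketch of how such metadata might be used to screen data sets follows. It is illustrative only: the `GisMetadata` fields mirror the characteristics listed above, but the class, the helper functions, the catalog entries and the selection criteria are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GisMetadata:
    name: str
    bounds: tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
    source: str
    captured: tuple[int, int]                  # (first_year, last_year)
    projection: str                            # for example, an EPSG code
    steward: str


def covers(meta: GisMetadata, area: tuple[float, float, float, float]) -> bool:
    """True when the data set's bounds fully contain the requested area."""
    min_lon, min_lat, max_lon, max_lat = meta.bounds
    return (min_lon <= area[0] and min_lat <= area[1]
            and max_lon >= area[2] and max_lat >= area[3])


def is_suitable(meta: GisMetadata, area, projection: str,
                earliest_year: int) -> bool:
    """Check extent, projection and currency before reusing a data set."""
    return (covers(meta, area)
            and meta.projection == projection
            and meta.captured[1] >= earliest_year)


catalog = [
    GisMetadata("roads_2004", (-118.0, 33.5, -116.5, 35.0), "field survey",
                (2003, 2004), "EPSG:4326", "Public works"),
    GisMetadata("roads_2009", (-118.0, 33.5, -116.5, 35.0), "aerial imagery",
                (2008, 2009), "EPSG:4326", "Public works"),
]

study_area = (-117.5, 34.0, -117.0, 34.5)
usable = [m.name for m in catalog
          if is_suitable(m, study_area, "EPSG:4326", earliest_year=2008)]
```

Both data sets cover the study area, but only the more recent one meets the currency requirement, so the analyst can reuse it instead of commissioning a new capture.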
Within a more general scope (both inside and outside of the
GIS venue), metadata allows users to determine the availabil-
ity and usefulness of data sets. They have a shared and
accepted definition of what a given term means, and can then
link that accepted definition to a database, database attribute
or another IT asset. Without a common definition and
accepted master record, it is much more likely that users will
either use different data sources or create their own, causing
version control problems and creating distrust of some orga-
nizational data sources.
The problem of distrust is common in the business world. For
example, in a meeting with a number of HR professionals
from a government agency, the group was asked what their
head count was. But the agency had five different definitions
of what their head count actually was, depending on whether
the tally included funded full-time equivalent positions (in
government terms, a full-time equivalent position may be
filled by a single, full-time employee or by two or more
part-time employees); part-time, temporary and intern workers;
personnel on temporary duty; full-time employees only; or
employees currently receiving benefits. Each definition created
a different data set, all were in use and the head-count
number varied depending on who was asked.
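The effect is easy to reproduce. In the sketch below, the same five staff records yield four different head counts depending on which definition is applied. The records and the wording of each definition are hypothetical and are not taken from the agency in the example.

```python
# Five staff records (hypothetical values).
employees = [
    {"id": 1, "type": "full-time", "benefits": True,  "temporary_duty": False},
    {"id": 2, "type": "part-time", "benefits": True,  "temporary_duty": False},
    {"id": 3, "type": "part-time", "benefits": False, "temporary_duty": False},
    {"id": 4, "type": "intern",    "benefits": False, "temporary_duty": False},
    {"id": 5, "type": "full-time", "benefits": True,  "temporary_duty": True},
]

# Each definition is a rule that decides whether a record is counted.
definitions = {
    "full-time employees only": lambda e: e["type"] == "full-time",
    "employees receiving benefits": lambda e: e["benefits"],
    "all workers, including part-time and interns": lambda e: True,
    "excluding personnel on temporary duty": lambda e: not e["temporary_duty"],
}

head_counts = {name: sum(1 for e in employees if rule(e))
               for name, rule in definitions.items()}
```

With these records the four rules return 2, 3, 5 and 4 respectively, so the answer to a question about head count depends entirely on which rule the person answering has in mind.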
It is almost impossible to determine where data comes from
and whether or not the data can be trusted under such conditions.
In many cases, if users don't trust the data, they will
copy the data to a spreadsheet and massage it for their own
purposes, which results in inaccurate data being used to run
the organization.
Many of the InfoSphere Information Server components
create and share metadata in a very transparent fashion, without requiring further effort from developers. InfoSphere
QualityStage and InfoSphere DataStage developers can build
jobs for data quality, cleansing and data integration, creating
operational data in the process. For example, developers using
InfoSphere QualityStage and InfoSphere DataStage can also
read or develop notes and annotations to their processes for
collaboration (documenting what an analyst may have created,
or simply creating notes on how a process works) and then
share those notes with other users and interfaces. Analysts
using InfoSphere Information Analyzer to look at the contents
of data sources not only develop technical metadata in the
process of connecting to data sources, but can also create
notes about what results to share with job developers.
InfoSphere Business Glossary and InfoSphere Business
Glossary Anywhere
To create business metadata, link business metadata to IT
assets and disseminate metadata to users, organizations use
InfoSphere Business Glossary and a companion product called
InfoSphere Business Glossary Anywhere. In the example of
the HR department with multiple ways to count their total
staff, it would enable the user community to collaborate on
the official definition for the term head count, where that
information is stored, descriptive usage information and the
data steward or owner for the term.
InfoSphere Business Glossary provides a Web-based tool for
creating and managing standard definitions of business con-
cepts. Through InfoSphere Business Glossary, users work col-
laboratively to share and build common understanding to
create a classification system that is tailored to an organization's
specific needs and structure. InfoSphere Business
Glossary helps simplify the task of managing, browsing and
customizing the broad variety of metadata that is stored in the
repository of InfoSphere Metadata Server: the critical details about tables, columns, models, schemas, operations and other
components of the data integration process.
Within InfoSphere Business Glossary, metadata is organized
into categories, each of which contains terms. Users can use
terms to classify other objects in the metadata repository based
on the needs of their organization. They can also designate
users or groups as stewards for any metadata object.
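The category, term and steward structure can be pictured with a short sketch. It is a hypothetical model for illustration and does not represent the InfoSphere Business Glossary schema or interfaces; the class names, the sample definition of head count and the asset name are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    name: str
    definition: str
    steward: str = ""
    assigned_assets: list[str] = field(default_factory=list)


@dataclass
class Category:
    name: str
    terms: dict[str, Term] = field(default_factory=dict)
    subcategories: dict[str, "Category"] = field(default_factory=dict)

    def add_term(self, term: Term) -> None:
        self.terms[term.name] = term

    def find(self, term_name: str) -> Optional[Term]:
        """Search this category first, then its subcategories."""
        if term_name in self.terms:
            return self.terms[term_name]
        for sub in self.subcategories.values():
            found = sub.find(term_name)
            if found is not None:
                return found
        return None


hr = Category("Human Resources")
staffing = Category("Staffing")
hr.subcategories[staffing.name] = staffing

# One agreed definition, one steward and a link to the IT asset that holds it.
staffing.add_term(Term(
    name="Head count",
    definition="Number of funded full-time equivalent positions",
    steward="HR data steward",
    assigned_assets=["HRDB.STAFF.FTE_COUNT"],
))

found = hr.find("Head count")
```

A lookup from the top-level category returns the single agreed definition together with its steward and the asset it is assigned to.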
For users, InfoSphere Business Glossary becomes an elec-
tronic data dictionary, providing an easy-to-comprehend
way to navigate the metadata that keeps the entire organiza-
tion speaking the same language. InfoSphere Business
Glossary helps business users:
Develop a common vocabulary between business and technology: A common vocabulary allows multiple users to share a common definition of the meaning of data. Users can assign categories and terms that are meaningful in an organizational context, and create a hierarchy of categories for ease of browsing.
Take part in data governance and stewardship activities: Data assurance programs assign responsibility to business users (data stewards) for the management of data through its life cycle.
Find business information that is derived from metadata: Metadata helps users to understand the meaning of the data, its currency, its lineage and who the data owner is.
Access metadata without complicated tooling and querying: Metadata objects can be arranged in a hierarchical fashion to simplify browsing of the data objects.
Provide collaborative enrichment of business metadata: Maintaining business metadata is an ongoing process: Data inputs evolve and business users collaborate and add notes, annotations, categories and synonyms to enrich the metadata. InfoSphere Business Glossary provides a tool for recording those definitions and for relating business concepts together into taxonomies. This places the business requirements into the same metadata foundation used by the profiling and analysis processes.
InfoSphere Metadata Workbench
InfoSphere Metadata Workbench is an interface that analysts
and developers can use to discover and analyze relationships
between information assets in the metadata repository. It enables users to understand, analyze, audit and manage the
flow of data throughout their organization.
InfoSphere Metadata Workbench provides IT professionals
with a design-time tool for managing and understanding the
assets generated and used by InfoSphere Information Server.
It also permits registration of outside processes (such as
COBOL programs) to be documented within the metadata
flow. By providing data lineage reports and analysis,
InfoSphere Metadata Workbench supports IT professionals
who are responsible for compliance and governance initiatives
(such as Sarbanes-Oxley Act compliance). It also provides for-
ward and backward impact analysis that displays the impact of
proposed changes to information management environments.
InfoSphere Metadata Workbench helps analysts
and developers:
Explore information assets that reside in the metadata repository of InfoSphere Information Server
  Perform simple and advanced asset search and querying
  See information assets in the context of the entire organization with integrated cross-suite viewing capabilities
  Create graphical views of asset relationships/flows
Analyze dependencies and relationships of key InfoSphere Information Server assets and BI reports
  Trace lineage from jobs and databases to Cognos or other BI reports
  Understand columns, tables and other assets
  Perform lineage analysis to understand where data comes from or goes to by using shared table information, job design information or operational metadata from job runs
  Perform impact analysis to understand dependencies and the effects of changes to a column or job across InfoSphere Information Server
  Analyze operational metadata from job runs and report on rows written and read, and on the success or failure of events
Manage InfoSphere Information Server metadata to obtain in-depth analysis reports
  Create and edit descriptions of information assets
  Assign glossary terms to information assets
  Reconcile duplicate assets
  Map databases to database aliases
  Access runtime information to enrich reporting
As an example of how InfoSphere Metadata Workbench can
help enhance operational effectiveness, consider a military
readiness dashboard application with geographical elements
that highlight the location of military assets and their current
readiness level. Data is displayed using a variety of maps
(with color codes to show asset location), gauges and tables.
While the dashboard provides a valuable high-level view, a
commander might want to know more about a particular
readiness metric. In this case, InfoSphere Metadata
Workbench can be used to quickly show that officer the data
feeds and processes used to generate that particular metric,
enabling the officer to evaluate the validity of the metric with-
out deep technical knowledge.
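The lineage and impact questions described above can be pictured as walks over a dependency graph: backward to find what feeds a metric, forward to find what a change would affect. The following Python sketch is only an illustration under that assumption. The asset names and the `LINEAGE` mapping are hypothetical, and nothing here reflects how InfoSphere Metadata Workbench stores or queries metadata.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets it is derived from.
LINEAGE = {
    "readiness_metric": ["readiness_mart"],
    "readiness_mart": ["job_merge_assets"],
    "job_merge_assets": ["logistics_db", "personnel_feed"],
    "logistics_db": [],
    "personnel_feed": [],
}


def upstream(asset, graph):
    """Return every asset that feeds `asset`, in breadth-first order."""
    seen, order, queue = {asset}, [], deque([asset])
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def downstream(asset, graph):
    """Return every asset affected by a change to `asset` (forward impact)."""
    # Invert the edges, then reuse the same traversal in the other direction.
    inverted = {}
    for child, parents in graph.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return upstream(asset, inverted)
```

Calling `upstream("readiness_metric", LINEAGE)` lists the feeds and jobs behind the metric, while `downstream("logistics_db", LINEAGE)` lists what would be touched if that source changed.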
Figure 4 showcases the various software components collec-
tively known as InfoSphere Foundation Tools.
[Figure 4 diagram labels: technical metadata, business metadata, operational metadata.
InfoSphere Business Glossary/Business Glossary Anywhere: robust data dictionary, common definitions, establish stewardship, links from terms to data sources/objects.
InfoSphere Information Analyzer: understand what is actually contained in database fields (nulls, duplicates, format errors, etc.).
InfoSphere Data Architect: database, GIS and GIS metadata data source modeling.
InfoSphere Metadata Workbench: metadata lineage and impact analysis.
InfoSphere Information Server metadata repository: nonspatial; spatial (specifically tabular).]
Figure 4: GIS metadata management is enabled through IBM InfoSphere Foundation Tools, which provide a direct interface for discovering, gathering and exploiting metadata.
Modeling GIS data and metadata
InfoSphere Data Architect (IDA) helps organizations discover
the structure of heterogeneous data sources by examining and
analyzing the underlying metadata, and it assists in modeling
planned data sources/migrations. IDA uses established
Java Database Connectivity (JDBC) connections, enabling
the users to explore existing data structures using native
queries and easily browse the hierarchy of data elements.
With IDA, users develop data models that can be incorporated
into a data integration project at a source and a target level.
IDA can create logical, physical and domain models for a vari-
ety of relational database sources. Elements from logical and
physical data models can be visually represented in diagrams
using Information Engineering (IE) notation; alternatively,
physical data model diagrams can use the Unified Modeling
Language (UML) notation. InfoSphere Data Architect enables
data professionals to create physical data models from scratch,
from logical models using transformation or from the database
using reverse engineering.
IDA also enables modelers and architects to define and imple-
ment standards that help increase data quality and enterprise
consistency for naming, meaning, values, relationships, privi-
leges, privacy and traceability. Standards can be defined once
and associated with diverse models and databases, helping to
improve efficiency and consistency. IDA also includes extensible,
rules-driven analysis that verifies compliance with naming,
syntax, normalization and best-practices standards for both
models and databases.
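To make the idea of rules-driven naming analysis concrete, here is a minimal Python sketch. It assumes a made-up naming standard (upper-case letters, digits and underscores, starting with a letter, at most 30 characters); the rule, the constants and the function name are hypothetical and are not taken from InfoSphere Data Architect.

```python
import re

# Hypothetical naming standard: upper-case letters, digits and underscores,
# starting with a letter, and no longer than 30 characters.
NAME_PATTERN = re.compile(r"^[A-Z][A-Z0-9_]*$")
MAX_LENGTH = 30


def check_names(columns):
    """Return (name, problem) pairs for every name that breaks the standard."""
    violations = []
    for name in columns:
        if len(name) > MAX_LENGTH:
            violations.append((name, "too long"))
        elif not NAME_PATTERN.match(name):
            violations.append((name, "invalid characters or case"))
    return violations
```

Because the rule lives in one place, the same check can be run against column names from any model or database.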
Finally, IDA can import and export logical and physical data
models from other modeling tools, making it possible to take
advantage of existing data models wherever they appear,
reducing the amount of time needed to translate models into
data objects.
Discover business objects hidden within data
Data from multiple heterogeneous sources is often related in
ways that are not immediately obvious. It may also contain
sensitive information that is not clearly identified. This can
create difficulties for agencies working to integrate GIS into
their broader organization, as data analysts may be unfamiliar
with the types of data traditionally managed by a separate GIS
department.
Uncovering these hidden relationships and categorizations is
critical to the success of any data integration or governance
project. However, identifying and documenting an organization's
data, as well as identifying relationships, business objects
and transformational logic between data sources, is not always
a straightforward process. InfoSphere Discovery automates
this process through heuristics and sophisticated algorithms,
helping organizations accelerate data integration and gover-
nance projects, while achieving greater accuracy with less risk.
Cleansing GIS tabular data
As organizations grow, they retain old data systems and augment them with new and improved systems as goals and needs
evolve. Over time, data becomes increasingly difficult to find,
manage and use, decreasing the likelihood that users can
quickly make accurate decisions based on up-to-date,
trusted data.
The cost of poor data quality is illustrated in the following
scenarios:
A military readiness application contains incorrect and outdated
data, so military officers cannot correctly grasp their
readiness to deploy and operate in a hostile environment,
potentially risking lives, equipment and mission success.
A data error in a bank causes hundreds of creditworthy customers
to receive mortgage default notices. The error costs
the bank time, effort and customer goodwill.
A marketing organization sends duplicate direct mail pieces;
redundancy in each mailing costs hundreds of thousands of
dollars a year.
Data quality issues spring from many sources, but can often be
traced back to one of three common themes:
A lack of common standards or instructions for storing data
Inconsistent data entry
Poor or decentralized control over key organizational data
InfoSphere QualityStage is a data reengineering environment
that is designed to help organizations cleanse and enrich data. The solution includes a set of testing stages, design tools for
specifying matches between data, and additional features that
combine to create a development environment for building
data-cleansing tasks.
Using the stages and design components, developers can
quickly and easily process large stores of data, selectively
transforming the data as needed. InfoSphere QualityStage
provides a set of integrated modules for common data reengi-
neering and cleansing tasks:
Investigating
Conditioning (standardizing)
Designing and running matches
Determining which data records survive (survivorship)
With probabilistic matching capabilities and dynamic weight-
ing algorithms, InfoSphere QualityStage helps agencies create
high-quality, accurate data and consistently identify core business
information (such as customer, location and product)
throughout the organization. InfoSphere QualityStage
standardizes and matches any type of information, including
information from disparate data sources, and all types of constituent/customer, product and tabular GIS data, either in
batch or at the transaction level in real time.
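The standardize, match and survive steps listed above can be sketched in a few lines of Python. This is a simplified illustration, not the QualityStage approach: it uses plain string similarity from the standard library's `difflib` instead of probabilistic matching with dynamic weighting, and the abbreviation table, the threshold and the `updated` field are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table used for standardization.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def standardize(address):
    """Lower-case, strip punctuation and expand common abbreviations."""
    words = address.lower().replace(".", "").replace(",", "").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def is_match(a, b, threshold=0.9):
    """Treat two addresses as the same if their standardized forms are similar."""
    ratio = SequenceMatcher(None, standardize(a), standardize(b)).ratio()
    return ratio >= threshold


def survivor(records):
    """Keep the most recently updated record from a group of matches."""
    return max(records, key=lambda record: record["updated"])
```

Standardizing before matching is the important design choice here: "100 Main St." and "100 main street" only compare as equal once both have been reduced to the same form.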
Integrate GIS data with nontraditional GIS data
InfoSphere Information Server can be used to extend the data
sources traditionally available to GIS by integrating GIS data
with data formats that normally aren't readable by a GIS,
with or without additional transformations. Nontraditional
GIS data could be in a mainframe, in a flat file, in a Web service,
in an ERP system or in an application, or in any combination
of these locations.
For example, a state department of transportation system
might contain contractor transaction work-order data in a
mainframe database, and may have department financial
information contained in a relational database or application.
The GIS may have tabular information including a work-order
number, while the work-order mainframe data contains
a work-order number, contractor number and task order, and the
relational data has a contractor number and task order.
InfoSphere Information Server could join the relational data
to the mainframe data based on contractor and task order
data, and join the combined information to the GIS tabular
information to allow further insight into contractor performance on a spatial/geographic basis.
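The two joins in this example can be sketched with Python's built-in `sqlite3` module. The sketch is an assumption-laden illustration: all three sources are loaded into one in-memory database, the table and column names are invented, and the mainframe table is assumed to carry the work-order number needed for the second join. It does not show how InfoSphere Information Server connects to or joins these sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the GIS table, mainframe data and financial data.
    CREATE TABLE gis_features (work_order_no TEXT, district TEXT);
    CREATE TABLE work_orders (work_order_no TEXT, contractor_no TEXT, task_order TEXT);
    CREATE TABLE financials (contractor_no TEXT, task_order TEXT, amount_paid REAL);
    INSERT INTO gis_features VALUES ('WO-1', 'North'), ('WO-2', 'South');
    INSERT INTO work_orders VALUES ('WO-1', 'C-10', 'T-1'), ('WO-2', 'C-20', 'T-7');
    INSERT INTO financials VALUES ('C-10', 'T-1', 1500.0), ('C-20', 'T-7', 900.0);
""")

# First join financial data to work orders on contractor and task order,
# then join the combined rows to the GIS table on work-order number.
rows = conn.execute("""
    SELECT g.district, w.contractor_no, f.amount_paid
    FROM work_orders AS w
    JOIN financials AS f
      ON f.contractor_no = w.contractor_no AND f.task_order = w.task_order
    JOIN gis_features AS g
      ON g.work_order_no = w.work_order_no
    ORDER BY g.district
""").fetchall()
```

Each resulting row pairs a spatial attribute (here a district) with contractor and payment data, which is the kind of combined record a GIS could then map.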
In the event that spatial transformations are required,
InfoSphere Information Server components can work in con-
junction with FME Server, a specialized spatial ETL product
from Safe Software. FME Server integrates directly with
InfoSphere DataStage and by proxy with the other compo-
nents of InfoSphere Information Server to perform specialized
spatial transformations.
With a wide range of data integration and analysis capabilities,
InfoSphere Information Server opens the door to an equally
wide range of GIS projects, from simply analyzing GIS tabular
data to integrating and analyzing GIS data from more conven-
tional data sources.
Extending GIS capabilities with InfoSphere Information
Server and other IBM software
InfoSphere Information Server works with other IBM soft-
ware components, such as IBM Cognos and IBM SPSS, to
help agencies extract even more value from their GIS systems.
Organizations can integrate and augment the investments that
they have already made in people, processes and technology
by integrating data sources in new ways and providing more
organizational insight into data.
Here are a few examples of how extending a GIS with
InfoSphere Information Server can help organizations derive
additional value from their GIS investments:
Data modeling and metadata modeling: GIS data (or
data to be integrated with GIS data) can be modeled using
InfoSphere Data Architect as a modeling tool. InfoSphere
Data Architect is also a convenient gateway for planning and
modeling potential linkages of GIS data to traditional data-
base sources. This helps users better understand exactly
what relationships exist between GIS and non-GIS data sources, and how those relationships can be leveraged for
both existing and planned/future sources.
Integrated metadata management: GIS metadata can be
stored and managed within the construct of the InfoSphere
Information Server metadata repository. This allows organi-
zations to have a better shared understanding of their GIS
data. In addition, applied use of the metadata repository in
conjunction with other InfoSphere Information Server
products, such as InfoSphere Metadata Workbench,
InfoSphere DataStage and InfoSphere QualityStage, offers a
window into data lineage, quality and governance. This
gives users a high level of trust in the data by showing where
data comes from, when it was updated and what kinds of
transformations took place to modify, transform or
augment it.
GIS metadata query/flag/mine/retrieve: GIS and
non-GIS metadata can be stored using InfoSphere Metadata
Server and can currently be queried with InfoSphere
Business Glossary. Theoretically, organizations can also
make map-layer retrieval much more interactive for users
dealing with large numbers of spatial layers. A query inter-
face (which could be a search engine or an ad hoc query
interface) could be combined with a writeback mechanism to
the metadata to speed the map production process. In this scenario, users could type in search terms for their mapping
requirements. The interface could apply those search terms against
spatial metadata entries, score the metadata based on what
users have found useful for mapping, retrieve the appropriate
spatial metadata and provide a checkbox next to each
entry. Users would then check the layers they wanted to use
to generate a map and click a submit button. The selected
layers would be flagged, and the flag would be written back
to the metadata to indicate that a user had selected this data
layer, helping to improve future queries of the metadata.
The selection would also be passed to the GIS interface
(which could be a GIS or a simple spatial viewer) for cre-
ation of a baseline map.
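The search-and-writeback scenario in the last item is described as theoretical, so the following Python sketch is equally speculative. It assumes each spatial metadata entry carries a set of keywords and a `selected_count` field that acts as the writeback flag; the layer names, field names and ranking rule are all hypothetical.

```python
# Hypothetical spatial metadata entries; `selected_count` is the writeback flag.
LAYERS = [
    {"name": "roads_2010", "keywords": {"roads", "transportation"}, "selected_count": 3},
    {"name": "bridges", "keywords": {"bridges", "transportation"}, "selected_count": 0},
    {"name": "wetlands", "keywords": {"water", "environment"}, "selected_count": 5},
]


def search(terms, layers):
    """Rank layers that share at least one keyword with the search terms."""
    wanted = {term.lower() for term in terms}
    hits = [layer for layer in layers if wanted & layer["keywords"]]
    # More keyword overlap first, then layers that users picked more often.
    return sorted(
        hits,
        key=lambda layer: (len(wanted & layer["keywords"]), layer["selected_count"]),
        reverse=True,
    )


def record_selection(chosen_names, layers):
    """Write the user's choices back to the metadata to inform later searches."""
    for layer in layers:
        if layer["name"] in chosen_names:
            layer["selected_count"] += 1
```

The names returned by `search` would populate the checkbox list, and `record_selection` would run when the user submits the chosen layers.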
Integrating InfoSphere Information Server
and GIS with other components
While InfoSphere Information Server is designed to provide
the data integration architecture for an organization, and a
GIS would provide spatial analysis capabilities, there are addi-
tional products that can be used to optimize organizational IT
investments. These capabilities exist in bolt-on, off-the-shelf
products such as analytics, data mining and BI applications
(see Figure 5).
[Figure 5 diagram labels:
Traditional data sources: Native API, ODBC, Web service, FTP.
Spatial data sources: ESRI Geodatabase, Oracle Spatial, DB2 spatial, tabular.
InfoSphere Information Server: data profiling, data cleansing, data transformation/enrichment, parallel processing, integrated metadata, spatial ETL (Safe Software FME), metadata repository.
SPSS: data mining/pattern detection.
IBM Cognos: scheduled reports, ad hoc reporting, scorecard/metrics, graphs/charts, trending.
Geographical Business Intelligence (GBI): Cognos + ESRI ArcGIS Server + SpotOn.]
Figure 5: InfoSphere Information Server connects to data mining applications, analytics applications and BI applications to extend the use and reach of trusted enterprise information.
Spatial ETL: FME Server
FME Server (www.safe.com) can be used to extend InfoSphere
Information Server via InfoSphere DataStage, forming it into
a scalable spatial ETL platform. This enables spatial data
managers to quickly meet diverse data access requirements,
permitting specialized data-conversion processes specific to
the GIS arena. FME Server offers flexible spatial data services
that help users convert, load and distribute large volumes of
data so end users can access it where, when and how they
need to.
FME Server brings the power of Safe Software's proven spatial
data translation, transformation and integration technology from FME Desktop to enterprise server environments,
enabling organizations to take advantage of flexible spatial
data distribution and scalable data loading and conversion
features.
With FME Server, organizations can address diverse spatial data requirements for:
Web-based spatial data access: Downloading and streaming
Scalable data consolidation: Loading and migration
Online quality assurance: Spatial data uploading and validation
Server-based spatial data conversion: Translation and transformation
Advanced statistics, analytics and data mining: SPSS
SPSS, an IBM company, helps organizations find and implement new sources of competitive advantage through predictive
analytics. When analytics are inserted into key business
processes, better decisions are made and the best actions are
taken on a consistent, repeatable basis.
SPSS provides predictive analytics and data mining technolo-
gies that can be used to add predictive intelligence to any data
integration and/or BI solution. These capabilities can further
extend GIS by providing superior analytical capabilities compared
to "plain vanilla" GIS. Advanced statistics, analytics
and data mining can be included in a hybrid InfoSphere
Information Server-GIS in a "grey box" fashion, using an
InfoSphere Information Server component to deliver data to a
model/algorithm developed in SPSS. SPSS then runs the data through the model/algorithm, and enriches the data. The
enriched data can be fed back into InfoSphere Information
Server for further processing and movement to a data ware-
house, data mart or other location where it can be presented
through a BI interface.
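The round trip described above (deliver records to a model, receive them back enriched, continue processing) can be pictured with a very small Python sketch. The `risk_model` function is a hypothetical stand-in for a model developed in a statistics package, and the field names are invented; no SPSS or InfoSphere interface is shown.

```python
def risk_model(record):
    """Hypothetical stand-in for a model developed in a statistics package."""
    return min(1.0, record["overdue_tasks"] / 10)


def enrich(records, model):
    """Score each record and return copies that carry the new field."""
    return [{**record, "risk_score": model(record)} for record in records]
```

Returning copies rather than changing the input keeps the original records intact, so the enriched rows can be loaded into a data warehouse or data mart as a separate step.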
BI: IBM Cognos
BI is a common interface or front end for any data integration
project, allowing users to view and query data for a variety of
purposes. IBM Cognos 8 BI delivers the complete range of BI
capabilities: reporting, analysis, dashboarding and scorecards
on a single SOA. Users can create, share and use reports that draw on data from virtually any combination of data sources
via InfoSphere Information Server.
Reporting gives users access to a list of self-serve report
types. It is adaptable to any data source and operates from a
single metadata layer to provide benefits such as multilingual
reporting, ad hoc query and scheduling and bursting.
Analysis enables the guided exploration of information that
pertains to all dimensions of your business, regardless of
where the data is stored. You can analyze and report against
online analytical processing (OLAP) and dimensionally
aware relational sources.
Dashboards communicate complex information quickly.
They translate information from various corporate systems
and data using gauges, charts and other graphical elements
to show the relative health of your organization.
Scorecarding features align your business units and tactics
with strategy, providing the ability to communicate goals
consistently and monitor performance against targets.
Linking BI to GIS: SpotOn
SpotOn Vantage (www.spotonsystems.com) extends
IBM Cognos BI by seamlessly integrating with geospatial ana-
lytic capabilities from ESRI ArcGIS Server. Organizational
data can easily be presented in a geographic manner alongside
tabular and chart formats without the need for custom devel-
opment. Users can navigate and interact with ESRI maps and
Cognos report objects without leaving their current view.
Information flows between map and report while the user retains a single unified view. Information is presented synchronized
and in context, with high-impact and easy-to-understand
visualizations.
SpotOn Vantage can:
Embed live, interactive, high-impact maps within Cognos
reports
Develop additional map layers in ESRI with business data
from Cognos reports
Provide multidirectional interaction that allows freedom of
analysis: Dashboard interaction, map-to-report, report-to-map,
map-to-map, map-based prompting, drill down, drill
through and so on
Finally, it is important to note that there are additional methods
for displaying geospatial data via a simplistic spatial viewing
interface such as Google Earth. However, those viewers
cannot provide the analysis capabilities of
a GIS; they are designed as a display mechanism and lack
many spatial analytical capabilities.
GIS solutions in the real world
The following examples illustrate solutions that employ a combination of GIS, InfoSphere Information Server and
IBM software components.
Biosurveillance/food supply surveillance
The earlier that food supply contamination incidents can be
detected, located and quarantined, the more lives can be saved.
A biosurveillance or food supply surveillance solution must be
capable of integrating data from multiple heterogeneous
sources, including GIS systems, hospitals, healthcare centers,
doctors' offices and pharmacies. Most healthcare facilities use
an ICD-10 (International Statistical Classification of Diseases and Related
Health Problems) code for patient diagnosis or symptoms. By
monitoring diagnostic codes from healthcare facilities, a system
can look for different combinations of codes that may
indicate conditions of interest, such as food poisoning or unusual
medical symptoms that might indicate a terrorist attack.
Searching for a particular pattern may be done via statistical
analytics or data mining algorithms provided by IBM SPSS,
and the data scored to test the likelihood that a particular
record may be of interest. If the data scores high enough, an
alert can be generated and sent to the appropriate authority.
The system could also be used to track natural disease occurrences,
such as influenza outbreaks. If a historical baseline
of data is present, it allows monitoring to determine what
is normal for a given region, time of year or weather condi-
tion and what may indicate an outbreak of interest to health
officials. This type of information could be displayed in a
Cognos dashboard (to show where diseases are below normal
thresholds and where there may be outbreaks occurring), as
well as spatially analyzed within a GIS for simple mapping
purposes or to determine the origin or potential spread of a
disease.
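The baseline comparison described above can be illustrated with a simple z-score test: how far is the current count from the historical average, measured in standard deviations? This Python sketch is a deliberately basic stand-in, not the statistical or data mining algorithms provided by IBM SPSS. The function name, the threshold of 3.0 and the sample counts are assumptions made for the example.

```python
from statistics import mean, stdev


def is_outbreak(history, current, z_threshold=3.0):
    """Flag `current` if it sits `z_threshold` standard deviations above the baseline.

    `history` holds past counts for the same region and season and needs at
    least two values so that a standard deviation can be computed.
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any increase is unusual.
        return current > baseline
    return (current - baseline) / spread >= z_threshold
```

With a history of weekly case counts such as `[12, 15, 11, 14, 13]`, a new count of 14 stays within the normal range while a count of 30 is flagged, at which point an alert could be raised and the cases mapped in the GIS.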
Departments of Transportation
Departments of Transportation (DOTs) usually have massive
amounts of disparate data: GIS information for roadway maps
and engineering/construction projects; asset management and
tracking systems for construction and maintenance equipment;
video information on pavement conditions in linear referenc-
ing systems (LRS); financial, budgeting and contracting infor-
mation; work orders that may be in 20-year-old legacy systems
and so on.
In most cases, this information is kept in separately designed
and siloed systems and sources. As a result, it may contain
many redundancies and overlaps, making it difficult for officials
to query their data or use data in the most efficient way
possible. In turn, it becomes difficult to answer questions for
political officials or taxpaying constituents in an effective and
timely manner.
InfoSphere Information Server and other IBM software com-
ponents, such as Cognos and SPSS, can help DOTs create an
overarching information architecture to provide faster access
and more accurate information, increasing efficiency and providing greater taxpayer value.
Military readiness
Military readiness activities typically involve combining and
leveraging complex and disparate data sources, many of which
are completely isolated from each other. These data sources
may contain GIS information on where different military
assets are located, as well as more conventional data on sup-
plies, logistics, asset management, manpower and critical skills
information.
For example, supply and logistics systems provide information on systems and parts availability and location. If integrated
with supplier data, a point-of-view may be developed where
inventory replenishment and pipeline can also be determined.
Combining that with shipping information (associating an
RFID or other tracking system) can further determine parts
inventory availability.
Asset management is another example: It requires tracking the
location of completed or deployed systems. For the Navy, it
might be a ship; for the Air Force, a radar system or delivery
platform; for the Army, troop carriers or main battle tanks.
Battlefield commanders are typically interested in the location
of their assets, the collective availability of those assets and
the readiness of assets for deployment, relocation or use.
Manpower information is also concerned with not only where
billets may be assigned, but also who is filling that billet, what
skills that person has (if they are fluent in Arabic, for example)
and where that person is currently located.
With InfoSphere Information Server, all of those information
sources can be combined to deliver commanders a better view
of their operational environment. As an example, as part of an
operational deployment, a high-level commander may want to
move a squadron of AH-64 Apache helicopters from the continental
U.S. to a desert location, a much harsher environment.
The burn rate on many mechanical parts, such as turbine
blades and other engine parts, goes up in such environments.
Data on how much the burn rate increases can be modeled
and compared to existing parts inventories as well as antici-
pated replacement schedules. This information can then be
used to determine if and when additional orders need to
be placed to ship additional parts, or if additional parts need to
be built.
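The parts calculation in this example comes down to simple arithmetic: divide the stock on hand by the increased daily consumption, then compare the result with the time a new order takes to arrive. The Python sketch below shows that arithmetic. The function names, parameters and sample numbers are hypothetical, and a real model would draw its rates from the integrated inventory and maintenance data.

```python
def days_until_stockout(on_hand, daily_use, burn_multiplier):
    """Estimate how many days the current inventory lasts at the higher burn rate."""
    return on_hand / (daily_use * burn_multiplier)


def must_reorder(on_hand, daily_use, burn_multiplier, lead_time_days):
    """True if stock would run out before a new order could arrive."""
    return days_until_stockout(on_hand, daily_use, burn_multiplier) <= lead_time_days
```

For instance, 120 parts consumed at 2 per day with a 1.5 times higher burn rate last 40 days, so an order with a 45-day lead time would need to be placed immediately.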
From a manpower perspective, personnel/billet location can
be combined with skills and compared against mission plans to
determine the availability of personnel with the critical skills
needed to complete a particular mission. In turn, that informa-
tion can be combined with logistical and spatial/positional
information and presented to a regional commander through a
dashboard, scorecard and/or reports. This gives the com-
mander a much more cohesive and complete picture of asset
location and readiness, as well as presenting the information
necessary for predicting when critical capabilities may become
unavailable due to logistical shortfalls.
Building trustworthy data with
IBM software
Currently available off-the-shelf IBM software components,
such as InfoSphere Information Server, InfoSphere
Foundation Tools, Cognos and SPSS, can be used not only to
increase the range of data sources available to a GIS, but also
to greatly increase the quality and trustworthiness of that data.
IBM solutions help you extend and leverage your existing
investments in systems and data. Using these existing and
tested components in new ways can improve your agency's or
organization's ability to meet new and evolving goals.
About the author
Dave McDermott
Information Technology Specialist
Federal/InfoSphere Information Server
IBM Software Group
For more information
To learn more about IBM InfoSphere Information Server,
InfoSphere Foundation Tools and GIS, please visit:
ibm.com/software/data/infosphere
ibm.com/software/data/integration/info_server
ibm.com/software/data/infosphere/foundation-tools/index.htm
To learn more about SPSS, an IBM company, please visit:
ibm.com/software/data/info/spss
To learn more about IBM Cognos solutions, please visit:
ibm.com/software/data/cognos
Copyright IBM Corporation 2010
IBM Software Group
Route 100
Somers, NY 10589
Produced in the United States of America
June 2010
All Rights Reserved
IBM, the IBM logo, ibm.com and InfoSphere are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both.
Microsoft is a trademark of Microsoft Corporation in the United States, other countries or both.
Other company, product or service names may be trademarks or service marks of others.