Managing Open Transportation Data at the U.S. Department of Transportation

Post on 28-Jul-2015



Dennis D. McDonald, Ph.D.



May 29, 2015¹

By Dennis D. McDonald²

On May 27 I attended a symposium in Washington, D.C. sponsored by Data Innovation DC at the Georgetown University School of Continuing Studies. The topic was “Get Moving with Data - The US Department of Transportation and its Data.” Presentations were made by U.S. Department of Transportation speakers including Dan Morgan, the DOT’s first Chief Data Officer.

Morgan and the others walked us through the variety of data sets generated and published by the Department.

"Published" is the key word here. Many of these data are available online, many in a standardized form accompanied by metadata dictionaries and in some cases APIs to facilitate data access and use. Some of the data sets discussed included highway data (all roads and highway performance, bridge inventory, freight analysis), research data sets for intelligent, connected vehicles, the National Transit Database, and data on aviation safety, roadway fatalities and crashes, product recalls, and transportation statistics.

Of special interest to me were comments by the presenters on how the Department’s data management practices are changing now that, in this day of “open data,” use and reuse of data are being encouraged.

What follows are my own comments about some of the key points I took away from the meeting regarding:

Changing data management practices

Continuing importance of partnerships

Encouraging reuse

Funding calculations

Increasing importance of real-time data

Data specialization & data generalization

¹ Revised June 1, 2015.

² Copyright © 2015 by Dennis D. McDonald, Ph.D. Dennis is an independent management consultant based in Alexandria, Virginia. His experience includes consulting company ownership and management, database publishing and data transformation projects, managing the consolidation of large systems, open data, statistical research, corporate IT strategy, and IT cost analysis. Clients have included the U.S. Department of Veterans Affairs, the U.S. Environmental Protection Agency, the National Academy of Engineering, the World Bank, and the National Library of Medicine. He has worked as a project manager, analyst, and researcher in the U.S. and in Europe, Egypt, and China. His web site is located at www.ddmcd.com and his email address is ddmcd@yahoo.com. On Twitter he is @ddmcd.

Changing data management practices

Some of the data sets generated and published by DOT go back to the 1940s and are based on data manually gathered and submitted on a regular monthly basis by state and local agencies. Over time, some data have come to be gathered more frequently (where funding and cooperation are available). In some cases heavy use is still being made of phone- and fax-based data collection.

Some data are gathered automatically via in-road sensors. Other data on road conditions are gathered photographically.

It's a real mix. Efforts are currently underway via the ARNOLD system to integrate all highway-related data into a single network model incorporating geolocation, road condition, traffic, incident, weather, and other data elements.

It’s a massive effort. As a former number cruncher I’m seriously impressed.

Continuing importance of partnerships

DOT doesn’t generate all this data by itself but depends on the cooperation of many state and local entities to supply and update data.

This partnership model is one of the first things you learn about Federal open data management efforts, regardless of whether you are discussing roadway passenger traffic volume, incidents and accidents, expenditures, miles driven, or headcounts. Data ultimately originate at the state and local level and usage occurs at all levels. It’s useful to keep this "partnership" concept in mind when applying a "data management lifecycle" model to tracking and managing data from the time of origination to publication, use in modeling or calculations, updating, and retirement. Given the multiple stakeholders involved (and the multiple political interest groups, FAA data being a very good example), "managing" how all these parties work together is a major DOT concern. How DOT does this management has to change with the times as well, as data management and access methods – and policies – continue to evolve.
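For readers who think in code, the lifecycle stages named above can be pictured as a small state model. What follows is only an illustrative sketch in Python and not a description of any DOT system. The stage names come from the paragraph above. The `DataSet` class, the `ALLOWED` transition rules, and the partner labels are my own assumptions, made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stage(Enum):
    """Lifecycle stages as listed in the text above."""
    ORIGINATION = "origination"
    PUBLICATION = "publication"
    USE = "use in modeling or calculations"
    UPDATING = "updating"
    RETIREMENT = "retirement"


# Hypothetical rules for which stage may follow which (an assumption,
# not a documented DOT policy).
ALLOWED = {
    Stage.ORIGINATION: {Stage.PUBLICATION},
    Stage.PUBLICATION: {Stage.USE, Stage.UPDATING, Stage.RETIREMENT},
    Stage.USE: {Stage.UPDATING, Stage.RETIREMENT},
    Stage.UPDATING: {Stage.PUBLICATION},
    Stage.RETIREMENT: set(),
}


@dataclass
class DataSet:
    """A data set together with the partners that supply or maintain it."""
    name: str
    partners: List[str]
    stage: Stage = Stage.ORIGINATION
    history: List[Stage] = field(default_factory=list)

    def advance(self, new_stage: Stage) -> None:
        """Move to a new stage, rejecting transitions the rules do not allow."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.stage.name} -> {new_stage.name} is not allowed"
            )
        self.history.append(self.stage)
        self.stage = new_stage


# Example: a data set supplied by state and local partners and published by DOT.
bridges = DataSet("bridge inventory", partners=["state agency", "local agency", "DOT"])
bridges.advance(Stage.PUBLICATION)
bridges.advance(Stage.USE)
```

Keeping the list of partners on the same record as the lifecycle stage is a deliberate choice here. It reflects the point that every transition typically involves more than one organization.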

Encouraging reuse of data

DOT staff want their data to be used and reused and not just specifically for legislated funding apportionment. DOT encourages use of data by a variety of means including "hackathons" to promote interest among analytically oriented innovators and entrepreneurs. DOT also encourages innovative use of its data by commercial ventures including the publishing of FAA data for private pilot iPads and the analysis of truck incident data patterns by insurance companies.


Such uses may not always be specified by DOT’s enabling legislation. One of the great things about “open data” is that such data are available for use and combination in original ways with other data.

In some ways this is similar to what NOAA is doing with its big data project where major cloud vendors are encouraged to support both public access and data reuse. One major difference, though, is NOAA’s massive data volume compared with Transportation’s.

Funding calculations

Some data gathered and published by DOT are specifically designed for calculating how Federal funds are to be allocated. This requirement is both a blessing and a curse.

Blessing-wise, this means that data collection and publishing efforts can evolve to become dependable, reliable, and sustainable (although there may be occasional hiccups introduced by periodic funding sequesters).

The curse is that this focus on the data needs of specific programs may limit the resources that DOT can devote to seeking out and encouraging innovative and potentially commercially viable uses of DOT data. The end result of such resource limitations might be that, by adhering to legislated program priorities, the public could be losing out if new or innovative data uses aren't being surfaced. Again, making such data “open” is one way to encourage innovative usage.

Increasing importance of real-time data

Transportation data that may have been collected on a biennial or annual basis decades ago might now be collected annually or monthly. Other data are being collected more frequently or in near real time. For some applications the argument for more frequent collection is straightforward: increased accuracy of data for users. More frequent collection of data, of course, generates increased costs all the way from collection through processing, storage, and release.

Not all types of data and data uses can justify the expense of increased frequency or real-time collection. Deciding where to place priorities becomes a complex matter that raises interesting governance, management, policy, and technology issues. It might also call for a more open and transparent process for making such decisions so that fairness and objectivity can be maintained.

Data specialization and data generalization

DOT data cover a variety of specialized and generalized topics. Everyone can appreciate the significance of data describing traffic and accidents on the ground or in the air but the language used to describe the data may vary widely according to specialty and understandability. This places a strong emphasis on availability of good documentation describing DOT data and metadata. DOT does provide much documentation along with its data files.

Viewed strategically, the mix of standards, terminology, semantics, and vocabularies places a premium on managing data across the board as a strategic asset in ways that are aligned with national priorities. At the same time there is the need to maintain data quality and utility for the many vertical specialties represented by DOT programs. These are reasons why DOT and other organizations now have a Chief Data Officer, so that both within-department and cross-department goals and objectives can be supported efficiently.

Discussion

One thing I found refreshing about the DOT presentations was the enthusiasm of the program managers for "their data." Coming from a background in research and statistics I find this both appropriate and inspiring. Maintaining data quality and utility requires professional skill and discipline. The focus on analytics and visualization was impressive.

Still, one topic that wasn't addressed much during the presentations -- if at all -- was the overall governance and management of the Department’s data-related operations. This is the aspect of "big data" and "open data" that fascinates me given the need to coordinate the many stakeholders involved. The way data-related processes permeate data-intensive organizations such as DOT might argue for a "flatter" data management architecture where participants and influencers all along the nodes of various data management lifecycles are able -- and encouraged -- to collaborate, share information, and work together.

This need for collaboration is a pretty basic requirement where stakeholders and decision makers are distributed throughout an organization. At the same time it’s not uncommon for hierarchically structured bureaucracies to resist change. Sometimes the resulting stability is good -- and sometimes it’s bad.

In the case of the data management operations at organizations like DOT, the dedication of their IT and data professionals will be an important force for good. However, as more changes are demanded in how data are generated, managed, and released, it’s not only the IT managers, data administrators, programmers, and analysts who have to work together to make things happen. All impacted departments and budgets need to be involved in planning, implementation, and oversight as more data are generated, standardized, released, and supported.

This calls for more collaboration and coordination than can be accomplished via a series of quarterly or even monthly meetings among department heads. Such challenges are not unique to the Federal Government. All large organizations desiring to take a more strategic position in how data -- the lifeblood of organizational processes -- are managed and released will have to address such governance issues.

Related reading:

Breakthrough Financial Open Data Legislation To Be Introduced May 20

The Continuing Evolution of Data.gov

Interim Report on the Generalizability of the NOAA Big Data Project’s Management Model

Moving to the Cloud: Business as Usual or Opportunity for Change?