Post on 28-Jul-2015
Managing Open Transportation Data at the U.S. Department of Transportation
May 29, 2015¹
By Dennis D. McDonald, Ph.D.²
On May 27 I attended a symposium in Washington D.C.
sponsored by Data Innovation DC at the Georgetown University
School of Continuing Studies. The topic was “Get Moving with
Data - The US Department of Transportation and its Data.”
Presentations were made by U.S. Department of Transportation
speakers including Dan Morgan, the DOT’s first Chief Data
Officer.
Morgan and the others walked us through the variety of data
sets generated and published by the Department.
"Published" is the key word here. Many of these data are
available online, many in a standardized form accompanied by metadata dictionaries and in some
cases APIs to facilitate data access and use. Some of the data sets discussed included highway data (all
roads and highway performance, bridge inventory, freight analysis), research data sets for intelligent,
connected vehicles, the National Transit Database, and data on aviation safety, roadway fatalities and
crashes, product recalls, and transportation statistics.
Of special interest to me were comments by the presenters on how the Department’s data
management practices are changing as, in this day of “open data,” use and reuse of data are being
encouraged.
What follow are my own comments about some of the key points I took away from the meeting
regarding:
Changing data management practices
Continuing importance of partnerships
Encouraging reuse
Funding calculations
Increasing importance of real-time data
Data specialization & data generalization
1 Revised June 1, 2015.
2 Copyright © 2015 by Dennis D. McDonald, Ph.D. Dennis is an independent management consultant based in Alexandria, Virginia. His experience includes consulting company ownership and management, database publishing and data transformation projects, managing the consolidation of large systems, open data, statistical research, corporate IT strategy, and IT cost analysis. Clients have included the U.S. Department of Veterans Affairs, the U.S. Environmental Protection Agency, the National Academy of Engineering, the World Bank, and the National Library of Medicine. He has worked as a project manager, analyst, and researcher in the U.S. and in Europe, Egypt, and China. His web site is located at www.ddmcd.com and his email address is ddmcd@yahoo.com. On Twitter he is @ddmcd.
Changing data management practices
Some of the data sets generated and published by DOT go back to the 1940s and are based on data
manually gathered and submitted on a regular monthly basis by state and local agencies. Over time
some data have come to be gathered more frequently (where funding and cooperation are available). In some
cases heavy use is still being made of phone- and fax-based data collection.
Some data are gathered automatically via in-road sensors. Other data on road conditions are gathered
photographically.
It's a real mix. Efforts are currently underway via the ARNOLD system to integrate all highway-related
data into a single network model incorporating geolocation, road condition, traffic, incident, weather,
and other data elements.
It’s a massive effort. As a former number cruncher I’m seriously impressed.
Continuing importance of partnerships
DOT doesn’t generate all this data by itself but depends on the cooperation of many state and local
entities to supply and update data.
This partnership model is one of the first things you learn about Federal open data management
efforts, regardless of whether you are discussing roadway passenger traffic volume, incidents and
accidents, expenditures, miles driven, or headcounts. Data ultimately originate at the state and local
level and usage occurs at all levels. It’s useful to keep this “partnership” concept in mind when applying
a “data management lifecycle” model to tracking and managing data from the time of origination to
publication, use in modeling or calculations, updating, and retirement. Given the multiple stakeholders
involved (and the multiple political interest groups, FAA data being a very good example), “managing”
how all these parties work together is a major DOT concern. How DOT manages this has to
change with the times as data management and access methods (and policies) continue to
evolve.
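One way to picture the lifecycle described above is as a small state model in which every change of stage is attributed to a responsible party. What follows is only a minimal Python sketch under stated assumptions, not anything DOT presented: the stage names come from the paragraph above, while the `DataSet` class, the `ALLOWED` transition table, and the example data set and party names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    ORIGINATION = "origination"
    PUBLICATION = "publication"
    USE = "use in modeling or calculations"
    UPDATING = "updating"
    RETIREMENT = "retirement"


# Hypothetical transition table: which stages may follow each stage.
ALLOWED = {
    Stage.ORIGINATION: {Stage.PUBLICATION},
    Stage.PUBLICATION: {Stage.USE, Stage.UPDATING, Stage.RETIREMENT},
    Stage.USE: {Stage.UPDATING, Stage.RETIREMENT},
    Stage.UPDATING: {Stage.PUBLICATION},
    Stage.RETIREMENT: set(),
}


@dataclass
class DataSet:
    name: str
    stage: Stage = Stage.ORIGINATION
    # Each entry is (stage entered, party responsible for the move).
    history: list = field(default_factory=list)

    def advance(self, new_stage, party):
        """Move to new_stage if the transition table allows it."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.value} to {new_stage.value}"
            )
        self.stage = new_stage
        self.history.append((new_stage, party))


# Hypothetical example: different partners act at different stages.
bridges = DataSet("bridge inventory")
bridges.advance(Stage.PUBLICATION, "federal program office")
bridges.advance(Stage.USE, "state planner")
parties = [party for _, party in bridges.history]
```

Recording a party with each transition is a deliberate choice here: it keeps the partnership idea visible, since the history shows which stakeholder acted at each point in the data set's life.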
Encouraging reuse of data
DOT staff want their data to be used and reused and not just specifically for legislated funding
apportionment. DOT encourages use of data by a variety of means including "hackathons" to promote
interest among analytically oriented innovators and entrepreneurs. DOT also encourages innovative
use of its data by commercial ventures including the publishing of FAA data for private pilot iPads and
the analysis of truck incident data patterns by insurance companies.
Such uses may not always be specified by DOT’s enabling legislation. One of the great things about
“open data” is that such data are available for use and combination in original ways with other data.
In some ways this is similar to what NOAA is doing with its big data project where major cloud vendors
are encouraged to support both public access and data reuse. One major difference, though, is NOAA’s
massive data volume compared with DOT’s.
Funding calculations
Some data gathered and published by DOT are specifically designed for calculating how Federal funds
are to be allocated. This requirement is both a blessing and a curse.
The blessing is that data collection and publishing efforts can evolve to become dependable,
reliable, and sustainable (although there may be occasional hiccups introduced by periodic funding
sequesters).
The curse is that this focus on the data needs of specific programs may limit the resources that DOT can
devote to seeking out and encouraging innovative and potentially commercially viable uses of DOT
data. The end result of such resource limitations might be that, by adhering to legislated program
priorities, the public could be losing out if new or innovative data uses aren't being surfaced. Again,
making such data “open” is one way to encourage innovative usage.
Increasing importance of real-time data
Transportation data that may have been collected on a biennial or annual basis decades ago might now
be collected annually or monthly. Other data are being collected more frequently or in near real time.
For some applications the argument for more frequent collection is straightforward: increased
accuracy of data for users. More frequent collection of data, of course, generates increased costs all
the way from collection through processing, storage, and release.
Not all types of data and data uses can justify the expense of increased frequency or real-time collection.
Deciding where to place priorities becomes a complex issue that raises interesting governance,
management, policy, and technology issues. It might also call for a more open and transparent process
for making such decisions so that fairness and objectivity can be maintained.
Data specialization and data generalization
DOT data cover a variety of specialized and generalized topics. Everyone can appreciate the
significance of data describing traffic and accidents on the ground or in the air but the language used
to describe the data may vary widely according to specialty and understandability. This places a strong
emphasis on availability of good documentation describing DOT data and metadata. DOT does provide
much documentation along with its data files.
Viewed strategically the mix of standards, terminology, semantics, and vocabularies places a premium
on managing data across the board as a strategic asset in ways that are aligned with national priorities.
At the same time there is the need to maintain data quality and utility for the many vertical specialties
represented by DOT programs. These are reasons why DOT and other organizations now have a Chief
Data Officer, so that goals and objectives both within and across departments can be supported
efficiently.
Discussion
One thing I found refreshing about the DOT presentations was the enthusiasm of the program
managers for "their data." Coming from a background in research and statistics I find this both
appropriate and inspiring. Maintaining data quality and utility requires professional skill and discipline.
The focus on analytics and visualization was impressive.
Still, one topic that wasn't addressed much during the presentations -- if at all -- was the overall
governance and management of the Department’s data-related operations. This is the aspect of "big
data" and "open data" that fascinates me given the need to coordinate the many stakeholders involved.
How data-related processes permeate data-intensive organizations such as DOT might argue for a
"flatter" data management architecture where participants and influencers all along the nodes of
various data management lifecycles are able -- and encouraged -- to collaborate, share information,
and work together.
This need for collaboration is a pretty basic requirement where stakeholders and decision makers are
distributed throughout an organization. At the same time it’s not uncommon for hierarchically
structured bureaucracies to resist change. Sometimes the resulting stability is good -- and sometimes
it’s bad.
In the case of the data management operations at organizations like DOT, the dedication of their IT and
data professionals will be an important force for good. However, as more changes are demanded in
how data are generated, managed, and released, it’s not only the IT managers, data administrators,
programmers, and analysts who have to work together to make things happen. All impacted
departments and budgets need to be involved in planning, implementation, and oversight as more
data are generated, standardized, released, and supported.
This calls for more collaboration and coordination than can be accomplished via a series of quarterly or
even monthly meetings among department heads. Such challenges are not unique to the Federal
Government. All large organizations desiring to take a more strategic position in how data -- the
lifeblood of organizational processes -- are managed and released will have to address such governance
issues.
Related reading:
Breakthrough Financial Open Data Legislation To Be Introduced May 20
The Continuing Evolution of Data.gov
Interim Report on the Generalizability of the NOAA Big Data Project’s Management Model
Moving to the Cloud: Business as Usual or Opportunity for Change?
Observations and Questions about Open Data Program Governance
OMB Releases Federal Data Inventories – So What?
On Defining the "Maturity" of Open Data Programs
Open Data Management at the U.S. Environmental Protection Agency (EPA)
Recommendations for Collaborative Management of Government Data Standardization Projects
USAID’s “Frequently Asked Questions” and the Management of Open Data Programs
Will NOAA’s “Big Data Partnership” be a Model for Other Government Agencies?