Data dynamite presentation

Transcript of Data dynamite presentation

Page 1: Data dynamite presentation

Today I am going to give you an overview of my new book, “Data Dynamite: how liberating information will transform our world.”

Originally I was to co-author the book with Vivek Kundra, Chief Technology Officer of the District of Columbia, and a true trailblazer in this field. However, fortunately for the US, unfortunately for me, President Obama chose Vivek to become the US’s first CIO.

Page 2: Data dynamite presentation

I’m convinced I was chosen to write this book through some sort of cosmic joke, because I’m the least-likely person to write a book on data. You see, I’m right-brained and intuitive. For me, data used to be good for one thing, and one thing only: figuring the Red Sox’ batting averages. But in reality, that makes me ideally suited to write this book, because it’s time that people like me no longer be disenfranchised when it comes to data. It’s time for data for the rest of us!

Page 3: Data dynamite presentation

When I got interested in data, I found it was pretty hard to get at.

We pay taxes so government can collect data, and you can bet companies know all about our shopping habits. Our activities and lives are data’s raw material.

But once it’s collected, most citizens -- and a lot of employees for that matter -- don’t have a clue where data is stored or how it’s used. It’s like that last scene in “Raiders of the Lost Ark,” where the Ark is boxed up and stored in a government warehouse: you knew it wouldn’t be found again. Substitute a data warehouse and you’ve got the picture of the too-frequent reality.

Page 4: Data dynamite presentation

Today, there are signs of hope. Closely-controlled and long-lost data is being liberated by the growing demand for transparency.

Perhaps the best example is one of Vivek Kundra’s primary accomplishments while he was the U.S. CIO: Data.gov. The government launched it in the spring of 2009 with about 20 data sets. By the end of its first three months in use, more than 100,000 government data sets – many of them valuable real-time geo-spatial ones – had been uploaded. Now, nearly 400,000 data sets are hosted on Data.gov, demonstrating how much data has been trapped in data warehouses, waiting only to be liberated to serve the common good.

Page 5: Data dynamite presentation

The time has come to liberate data! “Liberating data makes it automatically available to those who need it (based on their roles and responsibilities), when and where they need it, in forms they can use, and with freedom to use as they choose -- while simultaneously protecting security and privacy.”

Page 6: Data dynamite presentation

The result will be change and benefits in every aspect of our lives, changes that are particularly critical given the current global challenges and that will improve our lives:

• give workforces real-time information
• automate previously manual processes, saving time & increasing efficiency
• improve government regulatory processes by making access to reports instantaneous and shareable by all agencies
• reduce corporate regulatory costs
• restore public confidence through transparency
• empower the public as full partners in government and business.

Page 7: Data dynamite presentation

However, we are a long way from fully realizing these benefits. Despite Data.gov and its counterparts in about 20 other countries, the reality is that, by and large, data has not been liberated by either government or businesses -- and when it has been liberated, we’re often unprepared to capitalize on it.

The potential for transformation is not all that different from 1520, when Martin Luther’s translation of the Latin Bible into German, and his decision to print copies rather than have them copied by hand, gave most people direct access to the printed word for the first time. They no longer had to rely on the clergy as intermediaries.

The results were quick and dramatic: Luther’s works led not only to the Reformation, but also to a tremendous push for literacy and the printed word.

Just as the printing press transformed learning and people’s access to the word, so too the Internet and a handful of new web-based tools, none of them radically innovative by themselves but revolutionary when combined, are making it possible, in many cases for the first time, for workers and the general public to have direct access to actionable, valuable data. I believe the benefits and revolution for numbers will be as dramatic as what Luther set in motion for words.

Page 8: Data dynamite presentation

The first step to begin this transition is a strategic one: It’s time to switch to data-centric organizations, in which usable data is accessible to all sorts of applications and devices, automatically, and all of the organization’s functions are arranged around the data.

Page 9: Data dynamite presentation

The second step to liberate data is to assure that data is valuable. That means that instead of data becoming captured and altered by applications, it must remain as “data nuggets,” accessible to all applications and machines that can act on it. To create those data nuggets we must “structure” data using XML, KML or other systems that attach “tags,” such as the XBRL ones you see here, to the numbers. This information about information, or metadata, transforms mere numbers into valuable data. In this case, instead of just the number 882,000,000, we now know it refers to the company’s net income. That income data can flow automatically, and in real time, to any place where the same tags are inserted.

These tag systems are universal, open standards, available to all, at no charge. I want to emphasize standards, incidentally: it’s precisely because XML, XBRL and KML are universally recognized and not proprietary that they are valuable: they, and the data tagged by them, can be shared by all.

One of the most important aspects of XML and variants is that once the tags are attached to the data, they remain attached: the package of metadata and data can be automatically shared by other applications as well as devices. That reduces errors because the data doesn’t have to be rekeyed: you get a “single version of the truth.”
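To make the idea of a tagged “data nugget” concrete, here is a minimal sketch in Python. It is not real XBRL: the namespace URI, the element names, the reporting period and the revenue figure are all invented for illustration, and only the 882,000,000 net income value comes from the example above. It shows how any program that knows the tag can pull out the number, together with its meaning, without anyone rekeying it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XBRL-style fragment. The namespace URI, element
# names, "FY2008" period and the Revenue value are invented for illustration;
# only the 882,000,000 net income figure comes from the talk.
SAMPLE = """\
<report xmlns:ex="http://example.com/finance">
  <ex:NetIncome contextRef="FY2008" unit="USD">882000000</ex:NetIncome>
  <ex:Revenue contextRef="FY2008" unit="USD">5400000000</ex:Revenue>
</report>
"""

# Prefix-to-URI mapping used when searching for namespaced tags.
NS = {"ex": "http://example.com/finance"}


def read_fact(xml_text, tag):
    """Return (value, unit, period) for the first child element named `tag`."""
    root = ET.fromstring(xml_text)
    node = root.find(f"ex:{tag}", NS)
    if node is None:
        raise KeyError(tag)
    # The tag says what the number means; the attributes carry unit and period.
    return int(node.text), node.get("unit"), node.get("contextRef")


value, unit, period = read_fact(SAMPLE, "NetIncome")
print(f"Net income for {period}: {value:,} {unit}")
```

Because the meaning travels with the number, a second application could call the same `read_fact` helper (a hypothetical name) on the same document and get the identical figure, which is the “single version of the truth” described above.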

Page 10: Data dynamite presentation

The third step for effective liberating data programs is to provide users with Web 2.0-based tools, such as Gapminder (shown here), that will make it possible for them to really capitalize on that data. Even for trained statisticians, let alone the rest of us, data visualization tools aid in understanding complex data sets, relationships, and so on, because they take statistics and portray them graphically, which makes it easier to understand trends, possible causality, and other factors. As one of the acknowledged thought leaders in data visualization, Edward Tufte, says, “Graphics reveal data. Indeed, graphics can be more precise and revealing than conventional statistical computations.”

In recent years a number of lower-cost dashboard applications such as Tableau, as well as free web-based data visualization tools such as Many Eyes, have become available, allowing non-statisticians to easily take data and turn it into a wide range of highly informative visual representations, while Web 2.0 tools such as tags, threaded discussions and topic hubs encourage robust discussion of the results. That’s important, too: when data is discussed by people with differing backgrounds, interests and skills, aspects of the data are discovered and explored that even the brightest person, working in isolation, would never uncover.

Page 11: Data dynamite presentation

Curiously, although a growing range of government agencies release public data streams, almost none provide them to their own workforces, to give workers actionable data precisely when and where they need it, to do their work more efficiently.

The fourth element of an effective liberating data strategy is for agencies -- and corporations -- to follow the District of Columbia's lead, and apply the same strategy behind the firewall first, giving workers access to the same data they disclose in public data feeds.

After all, employees may be struggling with incompatible databases or may need to reach across departmental “silos” to see if there might be synergies between programs, and employees from another department may be able to provide new insights simply because of their differing life experiences and expertise.

As more young workers, who have never known life without the Web, join workforces, they’ll naturally ask why tools they’ve used can’t be used in the workplace. A data graphics project can empower them and tap their expertise.

Running your organization on the same data feeds that you furnish to the public and others can be a powerful way of earning public trust. You’re in essence saying: we stand behind this data, and we’re so confident in it that we use the same data to run our daily operations as we furnish to you.

Page 12: Data dynamite presentation

Finally, the cutting edge of liberating data is using it to invite your customers or citizens to become co-creators of products and services.

That’s what Beth Noveck, the former Obama Administration deputy CTO, did prior to joining the Administration, with the Peer-to-Patent program, which allows interested experts and laymen to become active partners in the patent review process. They have already significantly reduced the patent application backlog.

With liberating data, crowdsourcing will become commonplace and will result in both improved services to the public and entrepreneurial opportunities.

Page 13: Data dynamite presentation

But what if you liberate data but nobody comes? We have to realize, and deal with, the reality that a majority of the American population is innumerate, i.e., doesn’t have the basic skills needed for basic numeric calculations. This rate was probably masked by indifference during the era when data was hard to obtain, but now that it is potentially ubiquitous, that high failure rate is unacceptable.

Fortunately, the same tools that can make data intelligible and interesting to adults can also be used in the classroom to make dry numbers come alive and let students learn by playing with numbers. The private sector should partner with educators to make this transition a reality, to build numeracy and people’s ability to deal with statistical information.

Page 14: Data dynamite presentation

One reason for optimism that a new data-centric society could overcome innumeracy is the way that users of a wide range of social media have been quick to adopt data tagging and have quickly learned to use it accurately. In this case, use of the #wxreport tag assures that the National Weather Service’s computers will receive Tweets referring to breaking local weather observations, making the public valuable adjuncts to other information sources. If this kind of alteration in user behavior can happen spontaneously, imagine what could happen if there were formal programs designed to increase data numeracy!
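As a rough illustration of why a shared tag makes public reports machine-readable, the sketch below filters a list of messages for the #wxreport hashtag. It is only an assumption about how such filtering might work, not a description of the National Weather Service’s actual system; the `has_tag` helper and the sample messages are invented for the example.

```python
import re

# Matches a hashtag and captures the word after the "#".
HASHTAG = re.compile(r"#(\w+)")


def has_tag(text, tag):
    """Return True if `text` carries the given hashtag (case-insensitive)."""
    wanted = tag.lstrip("#").lower()
    return any(found.lower() == wanted for found in HASHTAG.findall(text))


# Invented sample messages, for illustration only.
messages = [
    "#wxreport Medfield MA: heavy snow, about 3 inches so far",
    "Go Red Sox!",
    "Hail the size of peas in Norwood #WXReport",
    "reading about #wxreports today",  # near miss: a different tag
]

# Keep only the messages that were explicitly tagged as weather reports.
reports = [m for m in messages if has_tag(m, "#wxreport")]
for report in reports:
    print(report)
```

The point of the exact-match rule is that the tag acts as a small, agreed-upon standard: a message either carries it or does not, so a computer can route the report without having to interpret free text.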

Page 15: Data dynamite presentation

Thank you.

To learn more about liberating data and how to create the processes and policies to make it a reality, contact:

Stephenson Strategies, 335 Main Street, Medfield, MA 02052, (617) 314-7858