Dark Data: Where the Future Lies

Vince Kellen, Ph.D.
Senior Vice Provost, Analytics and Technologies, University of Kentucky
[email protected]
March 5, 2014
This is a living document subject to substantial revision.

Description

Innovation and economic growth depend on a company's ability to gain insight from data. Data is growing exponentially, but our ability to make use of it is not. Untapped economic value resides in this unused data, called "dark data." This presentation looks at some of the causes of the explosion of data, some of the impediments to exploring dark data and creating business value from it, and some ideas for ways around those impediments.

Transcript of Dark Data: Where the Future Lies

Page 1: Dark Data: Where the Future Lies

Page 2: Dark Data: Where the Future Lies

The economic case

The global economy is now [permanently] fueled by information

Innovation is becoming the merging of human creativity and increasingly automated information extraction

Data is growing exponentially, human creativity ‘cycles’ are not

We are going to need [novel, surprising, freaky] ways of increasing the speed of information extraction from vast and growing data reserves

Finally, we are going to have to develop [novel, surprising, freaky] economic ‘infrastructure’ to foster emergent designs for turning extracted information into wealth creation faster

Page 3: Dark Data: Where the Future Lies

[Figure: population, wealth, technology and knowledge across five eras: hunting and foraging, the agricultural revolution, the rise of the ‘world system’, the industrial revolution and the post-information revolution]

Sources: Wikipedia; various; UN Report World Population to 2300 (2004)

Diffusion accelerates technology adoption

Communications technology accelerates diffusion

Page 4: Dark Data: Where the Future Lies

World’s technological installed capacity to store information

Hilbert M, Lopez P. 2011. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science. Vol. 332 no. 6025 pp. 60-65.

Page 5: Dark Data: Where the Future Lies

The total world's information, which is 1.8 zettabytes, could be stored in about four grams of DNA.
Harvard stores 70 billion books using DNA.
Research team stores 5.5 petabits (about 5.5 million gigabits) per cubic millimeter in DNA storage medium
http://www.computerworld.com/s/article/9230401/Harvard_stores_70_billion_books_using_DNA

Photo: Kelvin Ma for the Wall Street Journal

Dr. Church keeps a vial of DNA encoded with copies of his latest book.

Page 6: Dark Data: Where the Future Lies

Cause…

http://andrewmcafee.org/2011/01/jevons-computation-efficicency-hardware-investmen/

Page 7: Dark Data: Where the Future Lies

http://andrewmcafee.org/2011/01/jevons-computation-efficicency-hardware-investmen/

Effect!

Page 8: Dark Data: Where the Future Lies

Desperately seeking productivity

Page 9: Dark Data: Where the Future Lies

Moore’s law, growth in data, IT investments

Page 10: Dark Data: Where the Future Lies

As data grows exponentially, so does dark data

Dark data
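The claim that dark data grows exponentially when total data does, while the capacity to analyze it does not, can be checked with a toy model. This is a minimal sketch under assumed numbers: total data compounds at a hypothetical 40% a year, and the amount an organization can actually analyze grows linearly by a fixed yearly increment. The class name `DataGrowthModel` and all parameter values are illustrative. They are not figures from this presentation.

```python
from dataclasses import dataclass


@dataclass
class DataGrowthModel:
    """Toy model: stored data compounds yearly, analytic capacity grows linearly.

    Every default below is a hypothetical value chosen only for illustration.
    """
    initial_data: float = 1.0      # total data at year 0 (arbitrary units)
    data_growth: float = 0.40      # assumed 40% compound growth per year
    initial_capacity: float = 0.5  # data that can be analyzed at year 0 (same units)
    capacity_gain: float = 0.1     # assumed fixed yearly increase in capacity

    def total_data(self, year: int) -> float:
        # Exponential (compound) growth.
        return self.initial_data * (1 + self.data_growth) ** year

    def capacity(self, year: int) -> float:
        # Linear growth of human and organizational capacity to analyze data.
        return self.initial_capacity + self.capacity_gain * year

    def dark_data(self, year: int) -> float:
        # Data that is stored but never analyzed; clipped at zero.
        return max(self.total_data(year) - self.capacity(year), 0.0)

    def dark_fraction(self, year: int) -> float:
        return self.dark_data(year) / self.total_data(year)


if __name__ == "__main__":
    model = DataGrowthModel()
    for year in (0, 5, 10, 20, 30):
        print(f"year {year:2d}: dark data = {model.dark_data(year):10.1f}, "
              f"dark fraction = {model.dark_fraction(year):.3f}")
```

A compounding quantity eventually outruns any linear one, so the dark fraction `1 - capacity/total` climbs toward 1 for any positive growth rate. The assumed parameters only change how quickly that happens.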

Page 11: Dark Data: Where the Future Lies

Rate of innovation, pace of urban life

In order to sustain exponential economic growth rates, the rate of innovation must increase. Otherwise we will not have exponential growth

Information flowing through human culture (cities) is akin to blood flowing through a circulatory system.
• Both cities and animals conserve physical energy (molecules). As both get bigger, they conserve energy

However, the two systems behave in fundamentally different ways when it comes to ‘output’
• As cities get bigger, their ‘pace of life’ and economic output increase. The rate of information flow quickens
• As animals get bigger, their ‘pace of life’ and metabolic output decrease. The rate of metabolic flow decreases

Bettencourt, et al. (2007). Growth, innovation, scaling and the pace of life in cities. www.pnas.org/cgi/doi/10.1073/pnas.0610172104
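The contrast between cities and animals on this slide can be written as a power law. Bettencourt et al. (2007) fit the form Y = Y_0 N^β to city data. The specific exponents used here come from that literature, not from this deck: Kleiber's 3/4 for animal metabolism, and roughly 1.1 to 1.3 (about 1.15 is used here) for urban wealth and innovation measures. The doubling factors are simple arithmetic for illustration.

```latex
% General scaling law: output Y versus system size N
Y = Y_0\, N^{\beta}
\quad\Longrightarrow\quad
\frac{Y}{N} = Y_0\, N^{\beta - 1}

% Animals: N = body mass, metabolic rate follows Kleiber's law, \beta = 3/4
\frac{Y}{N} \propto N^{-1/4},
\qquad \text{doubling size: } 2^{-1/4} \approx 0.84 \ \text{(per-unit rate falls about 16\%)}

% Cities: N = population, wealth and innovation measures, \beta \approx 1.15
\frac{Y}{N} \propto N^{0.15},
\qquad \text{doubling size: } 2^{0.15} \approx 1.11 \ \text{(per-capita output rises about 11\%)}
```

When β < 1, bigger systems are more economical per unit. This holds for animal metabolism and for urban infrastructure such as road surface, which the same paper reports at roughly 0.8. When β > 1, per-capita rates, and with them the ‘pace of life’, speed up as the system grows.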

Page 12: Dark Data: Where the Future Lies

Information rules

Information quickening drives economic growth, encouraging consumption (and conservation) of molecules
• While fears of a Malthusian collapse have haunted economists for more than two centuries, innovation and technology have enabled growth so far
• Analytics can lead to productivity increases

Information’s dominance in the economy appears to be causing population growth rates to slow or even reverse
• Rising populations drive rates of innovation and economic growth. Zero population growth might be worrisome
• Is rising information unexpectedly going to cool down the economy?

While innovation allows both growth and efficient use of resources, to sustain growth we are going to need more innovation, not less!
• Increasing stores of information and means of action will be needed
• DARK DATA WILL NEED TO BE MINED

Page 13: Dark Data: Where the Future Lies

Bits versus Atoms

Physical material exhibits limits to scale. Data does not. Growth in computing cost-effectiveness enables exponential data growth

Page 14: Dark Data: Where the Future Lies

Two overlapping, interacting systems

The two systems now interact. Fewer molecules create more data. Information fuels economic growth, reduces population growth rates and improves utilization of molecules. The rate at which dark data is applied will affect all these rates

[Figure: the interacting systems of molecules, dark data and information]

Page 15: Dark Data: Where the Future Lies

Pause. Where are we?

Data and information are very important at this point in human history. How do we take advantage of these megatrends?

Page 16: Dark Data: Where the Future Lies

Production and consumption of information

In order to unleash dark data, we have to worry about two problems: better production of information from dark data reserves and better means of applying mined information to economic activity (consumption)

Production
• We will need new purely human, purely technical and human-technical ways of extracting information from growing reserves of data

Consumption
• We are desperately going to need [old, new] human beings with a very different orientation to data and decision making

Page 17: Dark Data: Where the Future Lies

Production ideas

Crowd-sourced and community-sourced analytics
• Skills will be scarce. We have to do a better job of matching analytics work to global skill sets

Dark data exchanges
• Can we sell our dark data to others for their exploitation? Can we buy others’ dark data?

Dark data reserves exploration
• More use of automated means of discovering data reserves and cataloging their location. Idea generation on possible value from mining

Data refineries
• We need to improve the rate at which data can be refined: automated metadata extraction, automated data quality detection, semi-automated model construction, elimination of ‘one-off’ models and better reuse of partial or complete models
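The ‘data refinery’ steps of automated metadata extraction and automated data quality detection can start very simply. The sketch below profiles a list of records, assuming the input is plain Python dictionaries. It infers a type for each column, measures the share of missing values, and flags two illustrative problems: too many missing values and mixed value types. The function names, the 20% threshold and the sample records are hypothetical. A production refinery would use far richer checks.

```python
from typing import Any, Dict, List

# Hypothetical sample: one missing department, one blank, one non-numeric GPA.
SAMPLE_RECORDS: List[Dict[str, Any]] = [
    {"id": 1, "dept": "Math", "gpa": 3.2},
    {"id": 2, "dept": "", "gpa": 3.9},
    {"id": 3, "dept": "Math", "gpa": "n/a"},
    {"id": 4, "gpa": 2.8},
]


def infer_type(values: List[Any]) -> str:
    """Crude type inference over non-missing values."""
    kinds = sorted({type(v).__name__ for v in values})
    if not kinds:
        return "empty"
    if kinds == ["float", "int"]:
        return "float"  # treat int/float mixes as numeric
    return kinds[0] if len(kinds) == 1 else "mixed"


def profile_records(records: List[Dict[str, Any]],
                    max_null_rate: float = 0.2) -> Dict[str, Dict[str, Any]]:
    """Extract per-column metadata and flag simple quality problems."""
    columns = sorted({key for row in records for key in row})
    report: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in records]
        # Treat absent keys, None and empty strings as missing.
        present = [v for v in values if v is not None and v != ""]
        null_rate = 1 - len(present) / len(values)
        col_type = infer_type(present)
        issues = []
        if null_rate > max_null_rate:
            issues.append(f"missing rate {null_rate:.0%} exceeds {max_null_rate:.0%}")
        if col_type == "mixed":
            issues.append("mixed value types")
        report[col] = {
            "type": col_type,
            "null_rate": null_rate,
            "distinct": len({repr(v) for v in present}),
            "issues": issues,
        }
    return report


if __name__ == "__main__":
    for column, meta in profile_records(SAMPLE_RECORDS).items():
        print(column, meta)
```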

Page 18: Dark Data: Where the Future Lies

Production ideas

Make widespread use of rapid data discovery tools. The ability to go from the first question to the final answer quickly matters greatly

Combine purely automated technical methods of extraction and refinement with human, collaborative processes to further refine the data

Develop and use refined, automated data movement tools

Increase data’s ‘surface area’ through careful model design aimed at facilitating regular analysts’ use

Increase data transparency; make data available to many more analysts

Use new ways of obscuring data to improve privacy and security without sacrificing pattern discovery
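Keyed-hash pseudonymization is one familiar way to obscure data while keeping patterns discoverable, and it is chosen here purely for illustration. Each identifier is replaced by an HMAC token, so the same person always maps to the same token. Counts, joins and repeat-behavior patterns survive, while the raw identifier never appears in the analytic copy. This is a minimal sketch using Python's standard `hmac` and `hashlib` modules. The function names, sample rows and key are hypothetical. Pseudonymized data can still be re-identified through other columns, so this is one ingredient of privacy protection, not a guarantee.

```python
import hashlib
import hmac
from collections import Counter
from typing import Dict, List


def pseudonymize(value: str, key: bytes) -> str:
    """Map an identifier to a stable token with HMAC-SHA256 (truncated to 16 hex chars)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shorter tokens are easier to read but raise collision odds


def pseudonymize_column(rows: List[Dict[str, str]], column: str,
                        key: bytes) -> List[Dict[str, str]]:
    """Return copies of the rows with one column replaced by tokens."""
    return [{**row, column: pseudonymize(row[column], key)} for row in rows]


if __name__ == "__main__":
    secret_key = b"hypothetical-key-store-it-outside-the-dataset"
    visits = [
        {"student_id": "A123", "building": "Library"},
        {"student_id": "B456", "building": "Gym"},
        {"student_id": "A123", "building": "Library"},
    ]
    masked = pseudonymize_column(visits, "student_id", secret_key)
    # Raw IDs are gone, yet the repeat visitor is still visible in the counts.
    print(Counter(row["student_id"] for row in masked))
```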

Page 19: Dark Data: Where the Future Lies

Information consumption dysfunction

The No. 1 impediment to improved use of dark data is human psychology. The dominant regime for managing information and power must end

This regime has the following attributes:
• Define goals and try to achieve them
• Maximize winning, minimize losing
• Unilateral control and accountability

This regime causes the following dysfunctions:
• Information is power, thus data is hoarded, metadata formation is guarded and ‘framing of problems’ becomes a competitive battlefield
• Gamers who rely on data obfuscation to make untestable claims
• Reliance on personal anecdote and sample sizes of 1
• Threat-induced reactions to difficult data, causing data suppression
• Cover-ups, manipulation of others, assaults on autonomy and agency

See Chris Argyris and double-loop learning. http://en.wikipedia.org/wiki/Chris_Argyris

Page 20: Dark Data: Where the Future Lies

The problems with the dominant regime

It’s in our nature: all humans are highly skilled at this behavior. It is part of being a child and a parent

It is toxic to creative, high-IQ talent

It inhibits team performance

It creates internal political theater

It severely limits the application of insights from dark data

It causes very bad, if not tragic, public spectacles

Page 21: Dark Data: Where the Future Lies

Needed: A new culture of information

A new cultural model needs to develop, based on the following attributes:
• Transparency. Provide equal access to all sides of a debate
• Rapid validation. Find and use tools that let all sides of a debate analyze, validate or refute insights into data
• Instead of maximizing winning and minimizing losing, encourage small, fast failure. Instead of ‘punishing’ individuals, put the focus on team rewards and multilateral control
• Instead of empowering leaders so that accountability can be overly simple, establish more intricate performance measurement systems that stabilize the enterprise and provide better feedback to many

The future of exploitation of dark data will be owned by teams that can collaborate well, challenge members productively and stay together long enough to turn the data into economic wealth

Page 22: Dark Data: Where the Future Lies

How can you spot the person who can’t succeed?

Shine light on their data and data management processes. Ask them to document and share details about their model. See if they will allow others to independently verify their results. Engage in a conversation about their model assumptions

Gamers playing under the old rules will typically do the following:
• Defer, delay and avoid the meeting or producing the evidence
• Refer to concepts like ‘we’re the experts’ or ‘we can’t explain it to non-experts’
• Change the subject
• Cite powers outside of their control that limit their ability to respond
• Go undercover and hide for a while

You can’t succeed with a house full of gamers

Page 23: Dark Data: Where the Future Lies

Building expert teams takes skill and time

Expert teams share a clear and common purpose and a strong mission

Expert teams share mental models
• Their members anticipate each other. They can communicate without the need for overt communication

They are adaptive
• They are self-correcting. Their members compensate for each other. They reallocate functions. They engage in a cycle of prebrief-performance-debrief, giving feedback to each other. They establish and revise team goals. They differentiate between high and low priorities. They have mechanisms for anticipating and reviewing issues and problems of members. They periodically review and diagnose team effectiveness and team vitality

They have clear (but not overly clear or rigid) roles and responsibilities
• Members understand their roles and how they fit together

They have strong team leadership
• Led by someone with good leadership, not just technical skills. They have team members who believe the leader cares about them. They provide situation updates. They foster teamwork, coordination and cooperation. They self-correct first

They develop a strong sense of "collective"
• Trust, teamness and confidence are important. They manage conflict well. Members confront each other effectively. They trust each other’s intentions

They optimize performance outcomes
• They make fewer errors. They communicate often enough, ensuring members have the information to be able to contribute. They make better decisions

They cooperate and coordinate
• They identify team task work requirements. They ensure, through staffing and development, that the team possesses the right mix of competencies. They consciously integrate new members. They distribute and assign work thoughtfully. They examine and adjust the physical workspace to optimize communication and coordination

Page 24: Dark Data: Where the Future Lies

Other consumption ideas

Examine decision-making within the enterprise. Find bottlenecks to faster decisions. Draw a new line separating central from local agency. Let projects proceed with light/fast approval, with follow-up and audit later

More rapid or time-boxed decision making. Use agile approaches. Minimum viable products. Incremental releases

Reward spontaneous collaboration. Design committees, teams, units based on collaboration IQ rather than representativeness

Automate more decisions, starting with the mundane or risk-free

Define new roles with complementary analysis and application skills. Hire more generalists with excellent critical thinking
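Two of the ideas on this slide can be combined in a simple rule: ‘automate more decisions, starting with the mundane or risk-free’ and ‘light/fast approval with follow-up and audit later’. The rule decides automatically only when a request is both low-risk and small, and it logs every automatic decision so it can be audited afterwards. This is a minimal sketch. The `Request` fields, the three-level risk scale and the 5,000 cost ceiling are hypothetical placeholders that a real organization would set for itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Request:
    requester: str
    cost: float
    risk: str  # hypothetical scale: "low", "medium" or "high"


@dataclass
class AuditLog:
    entries: List[Tuple[str, Request]] = field(default_factory=list)

    def auto_approved(self) -> List[Request]:
        """Requests decided without a human, for after-the-fact review."""
        return [req for outcome, req in self.entries if outcome == "auto-approved"]


def triage(request: Request, log: AuditLog, max_auto_cost: float = 5000.0) -> str:
    """Auto-approve mundane, low-risk requests; escalate everything else."""
    if request.risk == "low" and request.cost <= max_auto_cost:
        log.entries.append(("auto-approved", request))
        return "approved"
    log.entries.append(("escalated", request))
    return "needs human review"


if __name__ == "__main__":
    audit = AuditLog()
    for req in (Request("lab", 300.0, "low"),
                Request("dean", 300.0, "high"),
                Request("it", 12000.0, "low")):
        print(req.requester, "->", triage(req, audit))
    print("to audit later:", [r.requester for r in audit.auto_approved()])
```

Escalation is the default here. Anything the rule does not explicitly recognize as mundane goes to a person, which keeps the automated path narrow while the thresholds are being tuned.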

Page 25: Dark Data: Where the Future Lies

CEO imperative

Designing an organization that can take advantage of dark data is very difficult. It is a CEO problem

The challenge has many layers:
• Understanding where to strategically apply dark data findings and how to compete on analytics
• Ascertaining organizational and infrastructure readiness
• Establishing executive and employee incentive models that help
• Managing and monitoring progress at the technical, individual, team and enterprise level
• Enforcing evidence-based decision making and changing the culture
• Designing the models to be used throughout the enterprise

CIOs can play a strong role, but the CEO, IMHO, has to own this

Page 26: Dark Data: Where the Future Lies

CEO Advisory Engagement

1. Strategic possibilities
• Examine the firm’s business model and value-creating activities
• Identify areas where analytics and data may help, through ideation sessions

2. Dark data inventory
• Document the data assets across the enterprise
• Categorize and rank by quality and availability

3. Value network assessment
• Evaluate the value for upstream and downstream players
• Identify potential sources and uses for dark data

4. Economic estimates
• Identify use cases; evaluate potential benefits and risks
• Prioritize opportunities

5. Organizational development and change management
• Identify culture issues, skill gaps, org structure changes, incentives, additional resources needed, communications approach, timelines and sequencing
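Step 2, ‘categorize and rank by quality and availability’, can be prototyped as a weighted score over an inventory. This is a minimal sketch. The 0-to-1 scores, the equal 50/50 weighting and the sample assets are assumptions for illustration, and an actual engagement would define its own scoring criteria.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataAsset:
    name: str
    category: str
    quality: float       # assumed 0-1 score, e.g. share of records passing checks
    availability: float  # assumed 0-1 score, e.g. ease of access for analysts


def rank_assets(assets: List[DataAsset], quality_weight: float = 0.5) -> List[DataAsset]:
    """Sort assets by a weighted blend of quality and availability, best first."""
    def score(asset: DataAsset) -> float:
        return quality_weight * asset.quality + (1 - quality_weight) * asset.availability
    return sorted(assets, key=score, reverse=True)


def by_category(assets: List[DataAsset]) -> Dict[str, List[str]]:
    """Group asset names by category, preserving input order."""
    groups: Dict[str, List[str]] = {}
    for asset in assets:
        groups.setdefault(asset.category, []).append(asset.name)
    return groups


if __name__ == "__main__":
    inventory = [
        DataAsset("general ledger", "finance", quality=0.9, availability=0.8),
        DataAsset("web server logs", "IT", quality=0.4, availability=0.3),
        DataAsset("CRM contacts", "sales", quality=0.7, availability=0.9),
    ]
    ranked = rank_assets(inventory)
    print([a.name for a in ranked])
    print(by_category(ranked))
```

Changing `quality_weight` shifts the ranking toward clean data (closer to 1.0) or toward readily reachable data (closer to 0.0). Making that trade-off explicit is itself a useful conversation in an inventory exercise.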

Page 27: Dark Data: Where the Future Lies

Summary

Information is redefining humanity in ways we still don’t understand. The future is not certain. It will be written by winners

Economic growth depends on rates of innovation. Innovation depends on new insights which come chiefly from data

Data is growing exponentially. Human ability to process it is not. Thus, dark data is growing exponentially too

Firms differ widely in their [in]ability to mine data for information (production) and apply information in decisions (consumption)

A [largely, partially] semi-automated analytic discovery and refining capability is imminent

Winners will find new ways of organizing themselves and their ecosystems to gain advantage, speeding up timeframes

Page 28: Dark Data: Where the Future Lies

Questions?

“Get your facts first, and then you can distort them as you please.”

-Mark Twain