
■ Research Paper

The False Promise of Big Data: Can Data Mining Replace Hypothesis-Driven Learning in the Identification of Predictive Performance Metrics?

David Apgar*
Social Impact, Inc., Arlington, VA, USA

This paper argues US manufacturers still fail to identify metrics that predict performance results despite two decades of intensive investment in data-mining applications because indicators with the power to predict complex results must have high information content as well as a high impact on those results. But data mining cannot substitute for experimental hypothesis testing in the search for predictive metrics with high information content—not even in the aggregate—because the low-information metrics it provides require improbably complex theories to explain complex results. So theories should always be simple, but predictive factors may need to be complex. This means the widespread belief that data mining can help managers find prescriptions for success is a fantasy. Instead of trying to substitute data mining for experimental hypothesis testing, managers confronted with complex results should lay out specific strategies, test them, adapt them—and repeat the process. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords predictive performance metrics; complexity; information content; hypothesis-driven learning; data mining

INTRODUCTION

Despite decades of intensive investment in data analysis applications, US firms still have trouble identifying metrics that can predict and explain performance results. Research on complexity and predictive metrics is beginning to show why. Data mining, or the search for patterns among variables captured by data applications, cannot replace a form of experimental hypothesis testing here called hypothesis-driven learning in the identification of complex predictive metrics. Confronted with complex results, managers should lay out hypotheses in the form of highly specific strategies, test them in practice, adapt them—and then repeat the process.

This paper rejects as a fantasy the widespread belief that data mining can identify prescriptions for success. The reason is that it cannot generate metrics with the potential to predict, explain or control complex patterns of results. It may be useful in testing the predictive power of metrics once managers propose them, as in Karl Popper’s conjectures and refutations model of scientific discovery (Popper, 2002).

*Correspondence to: David Apgar, 2804 Dumbarton Street NW, Washington, DC 20007, USA. E-mail: [email protected]

Received 15 March 2012
Accepted 10 August 2013
Copyright © 2013 John Wiley & Sons, Ltd.

Systems Research and Behavioral Science
Syst. Res. 32, 28–49 (2015)
Published online 20 September 2013 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sres.2219


Indeed, others have suggested this approach as a planning model for the US Army (Dewar et al., 1993) and start-ups (McGrath and MacMillan, 1995). But data mining cannot replace experimental hypothesis testing, or simply hypothesis-driven learning.

Hypothesis-driven learning for the purposes of this paper is defined as recursive experimental hypothesis testing. In the context of business these hypotheses are generally strategies. Experimental hypothesis testing is taken in the Bayesian sense (pace Popper) of updating the probabilities assigned to competing hypotheses or strategies in the light of new evidence or results, based on how much more or less likely each hypothesis makes that evidence. Recursive hypothesis testing is Bayesian in the sense that each new set of results updates the probabilities resulting from the last update to the extent possible, while always allowing for the introduction of novel hypotheses or strategies. In sum, hypothesis-driven learning refers here to the repeated use of results in testing business hypotheses to find one with high predictive power.
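To make the recursive step concrete, here is a minimal sketch of a single Bayesian update; the strategies, prior beliefs and likelihood judgments are hypothetical illustrations, and the update would simply be repeated as each new set of results arrives.

```python
# A minimal sketch of one round of hypothesis-driven learning: Bayesian
# updating of the probabilities assigned to competing strategies.
# Strategies, priors and likelihoods below are hypothetical illustrations.

def bayes_update(priors, likelihoods):
    """priors: {strategy: P(strategy)}; likelihoods: {strategy: P(observed results | strategy)}."""
    unnormalized = {s: priors[s] * likelihoods[s] for s in priors}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

# Competing strategies and the manager's current degrees of belief in them.
beliefs = {"barbell line-up": 0.4, "all-in on hybrids": 0.3, "all-in on SUVs": 0.3}

# How likely each strategy makes the latest quarter's results (a judgment call).
likelihoods = {"barbell line-up": 0.7, "all-in on hybrids": 0.2, "all-in on SUVs": 0.1}

beliefs = bayes_update(beliefs, likelihoods)   # repeat after each new set of results
print(beliefs)  # probabilities shift towards the strategy that best predicted the results
```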

The first section of the paper illustrates the argument by contrasting HP’s recent efforts to derive strategy from data with the tradition of hypothesis-driven learning at Toyota. The second section shows what is at stake by assessing the weak impact of decades of intensive investment in data analysis on the capacity of US manufacturers to explain performance results.

The main argument lies in the third and fourth sections. The third section links ideas about requisite variety, mutual information and probabilistic causation to show why metrics with the power to predict complex results must have high information content as well as a high impact on those results. The fourth section proposes an approach to proving Pareto’s conjecture that only a few metrics are vital to most problems (Pareto, 1896). Data mining cannot substitute for hypothesis-driven learning in the search for predictive metrics with high information content—not even in the aggregate—because it relies on linking multiple low-information metrics in theories that turn out to be improbably complex. A final section illustrates the thesis by contrasting WellPoint’s reliance on data-driven requirements with Nestlé’s continual negotiated experiments to cope with complex results.

DIVERGENT APPROACHES TO SIMILAR CHALLENGES AT HP AND TOYOTA

This section contrasts HP’s efforts to derive strategy from data with Toyota’s tradition of hypothesis-driven learning. The strategic challenges are similar, but Toyota is faring better.

HP’s recent succession of chief executive officers (CEOs) has made the broad outlines of its strategic debates a matter of public record. The story starts with Carly Fiorina, appointed CEO in 1999 as the firm debated whether to split the businesses serving companies from the ones making printers and other consumer devices. Fiorina’s answer was a bold move sideways—the audacious 2002 purchase of Compaq to shore up HP’s consumer business—before deciding the larger question.

Industry analysts credited Fiorina with a well-executed merger, but the results were lacklustre. Whereas the combined PC shares of Compaq and HP were 21% against Dell’s 12% in 2000, the 2004 share of Dell was 18% against 16% for the merged firm (Gaitonde, 2005). Moreover, HP’s deeper dive into PCs shed little light on the value of synergies between its profitable printer business and growing enterprise business.

Mark Hurd took over as CEO from 2005 to 2010, and his HP bio summarizes his strategic direction for the firm. He concentrated on enterprise data centre architecture and services, mobile technologies, and the move from analogue to digital imaging and printing. In other words, except for a nod at the evolution from personal to mobile computing, he stayed the course.

Analysts remember Hurd fondly for the steady 7.5% per year growth in revenue he oversaw and his relentless focus on cost control and execution. As did Fiorina, he managed a major acquisition—data services firm Electronic Data Systems (EDS)—without really inflecting strategy. But he consolidated HP data centres from 85 to 6; he pruned software applications from 6000 to 1500, and he fattened operating margins from 4% in 2005 to 8.8% in 2008 (Lashinsky, 2009).

A self-described ‘rack-and-stack’ guy, Hurd was so focused on numbers that he reportedly disliked graphics in presentations.


After HP’s board ousted him over inappropriate expenses and allegations about sexual advances, he crossed Route 101 to help run Oracle, one of the three largest enterprise data management companies in the world.

And then HP hired a strategist from SAP, another of the three largest enterprise data managers in the world. Once he took the reins, Leo Apotheker announced decisively that HP would let go of its PC business and buy British enterprise software star Autonomy.

Pascal Gobry wrote in August of 2011 that HP was ‘ditching doomed…businesses and focusing on where…it has a competitive advantage by being a big company based in Silicon Valley with a tradition of R&D…’ (Gobry, 2011). Whether or not the strategy was right, the important thing, Gobry argued, was to execute it and find out. But the share price sagged as PC sentimentalists protested, and the board replaced Apotheker after just 10 months with eBay’s Meg Whitman.

At her September 2011 press conference, Whitman said the important thing was to rebuild confidence by executing well because HP’s strategy was sound. And yet the firm made no move to sell its PC business or double down on enterprise software. In fact, the only clear move related to strategy was the departure of long-time chief strategist Shane Robison and the decision not to replace him. There was a lot of market analysis while Autonomy floundered.

HP’s board may be afraid to jump into the deep end of the enterprise software swimming pool. But its experiment-shy stewardship of HP’s decade of management changes could not be a more heartfelt testimonial to the hope that some kind of advance in analysis will somehow free the modern enterprise from the messy business of trial and error.

Toyota looks out at a landscape similar in many ways to that of HP. It has led its industry. Changes in automobile power sources are in some ways as momentous as changes in mobile and enterprise connectivity. Doubts about Toyota’s quality standards—similar to those about HP’s—grew as the firm consolidated its leadership position. And both firms have distinctive cultures—the Toyota Way of continuous improvement and the egalitarian, decentralized HP Way.

Moreover, both firms face a growing rift between the two parts of their business. For HP, the needs of consumers focused on mobility, and businesses focused on data capture and recall, are diverging. For Toyota, rural drivers looking to improve on the performance of a pickup, and urban drivers looking for lower emissions, are at a proverbial crossroads. And if HP has spread its resources across both sides of a growing customer divide, so has Toyota.

With gas prices scraping $1.50 per gallon and all of its competitors building bigger and taller sport utility vehicles (SUVs), Toyota introduced the hybrid Prius to the US market in 2001. That looked smart when oil prices doubled from 2003 to 2006. Toyota had foreseen peak oil, the end of cheap fuel and a rush to environmentalism on US roads, crooned the critics and the case writers.

That ignores the fact, however, that Toyota added two mid-sized to large-sized lines of SUVs—the Highlander (fourth largest) and the Sequoia (second largest)—to the three it already had in the US market in 2001. And it did not stop there. It added the FJ Cruiser (sixth largest) in 2007 and the Venza (fifth largest) in 2009. There has been no effort to reform the American driver or even to predict the American driver’s preferences about car sizes.

What has emerged, instead, is a distinctly barbell-shaped line-up of vehicles. With five lines of hybrid vehicles—including the Highlander Hybrid SUV—Toyota now has twice as many models of SUVs and hybrids as sedan cars in the US market. That line-up, moreover, reflects constant tinkering with configuration, size, price point and power source. Both the SUV lines and the hybrid lines are evolving as quickly as they can in their separate tracks. The appearance of a hybrid adapted for city driving—the Prius C—extended the trend.

In sum, both firms kept straddling divided businesses. And yet the record suggests Toyota went about it in a very different way. Nearly every year of the past decade brought major new hypothesis-driven experiments in hybrids, SUVs or both.


In keeping with the continuous improvement on the factory floor for which it is famous, Toyota tends to explore the ends of the range of any line of vehicles before filling in the middle—as if it wanted to understand the full range of engineering challenges each vehicle line will generate. At no point did Toyota management debate grandiose ideas about abandoning hybrids (e.g. when the Prius had a slow start) or SUVs (even when oil prices peaked before the financial crisis) without testing them. The firm probed the specifics of the two sectors with new offerings every year.

And Toyota is the one that seems to have come up with an answer. If there is an overarching bet that the company is playing out, it is that US driving habits will keep polarizing. The markets, at any rate, have substantially more confidence that Toyota has determined what will drive results in its sector than they have in HP. Zacks Investment Research reported a consensus 2012 5-year earnings growth forecast for HP of 8% per year compared with 15% for the computer sector—or 7% below average. The forecast for Toyota was 23% per year compared with 19% for the automobile industry—or 4% above average.

US MANUFACTURING FIRMS STILL CHALLENGED TO IDENTIFY PREDICTIVE METRICS

US manufacturers have not overcome the challenge of identifying metrics that can predict or explain performance results despite two decades of intensive investment in data analysis. This section starts with evidence from the scarcity of sustained high performance over the past 40 years that US companies have not adapted well to changes in their business environment. Research on the proportion of US business stalls due to controllable factors shows that the root of this performance sustainability problem is a failure of causal analysis—confusion about the drivers of organizational performance. The disparity between pay and labour productivity over the past decade suggests this failure persists despite the prior decade’s widespread investment in large-scale data systems for performance improvement.

In a study of over 6700 companies from 1974 to 1997, Robert Wiggins and Tim Ruefli found that very few companies sustain superior performance (2002, 2005). Defining superior performance relative to each company’s sector competitors at levels ensuring statistical significance, they determined that just one out of 20 firms in their enormous sample sustained superior results for at least a decade. Just one out of 200 companies (32 in their sample) managed it for 20 years. And only 3M, American Home Products, and Eli Lilly kept it up for 50 years. Perhaps more important, the authors found that the average duration of periods of sustained superior performance declined towards the end of the sample period. This means that the amount of performance churning accelerated through the late 1990s.

Citing the result in his survey of evolutionary economics The Origin of Wealth, Eric Beinhocker takes it as evidence of what he, following evolutionary biologists, calls a Red Queen race in business competition. As the Red Queen warned Lewis Carroll’s Alice, it takes all the running you can do to stay in place. Much as the species studied by those biologists, Beinhocker suggests, firms are ‘locked in a never-ending co-evolutionary arms race with each other’ (2006: 332). In fact, one can go further—as Beinhocker does later in his book—and argue that companies are not adapting to the dynamism that the complexity of their competitive interactions creates in their business environments.

In Stall Points, Matt Olson and Derek van Bever anatomize this adaptive failure. Although more than three quarters of large companies stall seriously and never recover, they report, the reasons for 87% of these stalls are controllable (2008: 35). Each of the four principal controllable reasons that they identify, furthermore, turns out to be a kind of failure in diagnosing the causes of change in organizational performance. Because these four reasons account for 55% of all stalls, the failure to understand what drives performance change appears to be a major reason firms do not sustain strong performance.

Updating a major strategic research project of the Corporate Executive Board inspired by an internal study and request from HP, Olson and van Bever looked at the more than 500 companies that have ranked at some point in the top half of the Fortune 100 over the 50 years to 2006.


76% suffered stalls defined as points after which 10-year growth averaged at least 4% per year less than average growth for the preceding 10 years. Thus defined, stalls are serious—nine-tenths of the companies that suffered them lost at least 50% of their market capitalization.

The authors find that premium position captivity is the largest reason for stalls. As does the innovator with a dilemma in Clay Christensen’s book, these firms focus resources too long on a proven technology or innovation (Christensen, 1997). They correspond to organizations in Stafford Beer’s Viable System Model (VSM) whose strategic planning process (system five) favours current operations (system three) over new ones (system four) (Beer, 1972). They tend to deprive internal technology experiments of resources until a competitor concentrating on an upstart technology develops it to the point of creating an insuperable competitive advantage. The failure here is one of causal analysis—misjudging the value of product or service attributes. For example, most laptop manufacturers failed to understand how increasing connectivity would affect the trade-off consumers make between computing power and mobility.

The steady stream of innovations that companies bring to the market might seem to belie this first reason for stalls. A long-term series of studies by Michael Hannan and John Freeman on the organizational ecology of markets suggests, however, that most firms bring new ideas to a market by entering it (1984a, 1984b). Once they have entered the market, they tend to be much less creative, often executing their original game plan past its period of prime relevance (Hannan and Freeman, 1977; Hannan and Carroll, 1992). Hannan and his collaborators give the example of the shift from transistors to very large-scale integration chips in the 1980s. Among the top 10 semiconductor firms of the transistor era, only Texas Instruments and Motorola adapted to chip technology and sustained their competitive edge. The leaders in the earlier period—Hughes, Philco and Transitron—did not adapt, whereas new entrants such as Hitachi, Intel and Philips revolutionized computer hardware. This pattern of innovative entry and lack of later adaptability suggests that while companies rarely test their causal assumptions, markets always do. The start-ups with well-adapted ideas survive until their performance drivers evolve, and we tend not to see the rest.

The second largest category of stalls is innovation management breakdown, including inconsistent funding, disorganized research and development (R&D), inattention to time and simplicity, and conflicts with core firm technologies. The common failure of causal analysis underlying stalls in this category is misjudging the value of past innovation. The third category, premature core abandonment, mirrors the first. Stalls here reflect confusion about the drivers of customer satisfaction and overestimate the potential to increase it through new features and services. Companies heading for this kind of stall look elsewhere and miss nearby opportunities. Volvo’s 1970s expansion into oil exploration, food products, biotech and leisure products is an example.

The fourth category, talent bench shortfall, explicitly turns on failures of causal analysis regarding the contribution to success of specific skills, experience, kinds of talent and key individuals. Accounting for 9% of stalls, these stalls raise questions about firms’ ability to determine the drivers of their employees’ productivity. As do the other leading kinds of stalls, they reflect a failure to identify what drives success.

The long time-frame of Stall Points shows that the problem of a poor understanding of what drives success has been around for many years. But this is the problem that information systems for the large-scale analysis of performance data—variously called business analytics or just big data—specifically set out to address over the past decade. The Stall Points authors nevertheless see no drop in the threats to growth they have identified for the coming generation of corporate giants, which they dub the class of 2015. Indeed, they believe premium position captivity, and the failures of causal analysis underlying it, are growing.

There is already some evidence from the US labour market that they may be right. During the decade from 2001 through 2010, the corporate use of data analysis tools exploded.


In general, expenditure grew disproportionately in the 1990s, whereas usage widened and deepened in the 2000s. Most analyses show US business expenditure on IT rising faster than GDP every year of the earlier decade (e.g. Goldman Sachs, IT spending survey, 2 Nov 2009). The pace falls off in the later decade, when the Census Bureau’s information and communication technology (ICT) survey finds US manufacturer spending on IT rising to $36bn a year (or 0.2% of GDP) before dropping 20% in 2009, as it usually does in recessions.

In the years from 2001 to 2010, however, businesses put large-scale data analysis to work more intensively in performance planning. This is when many terms for the strategic analysis of enterprise data became widespread. Business analytics, business intelligence and enterprise data visualization all became mainstream categories of IT tools during the decade.

More powerful computing tools no doubt contributed to this most recent flowering of business empiricism. The 1998 appearance of the first online analytic processing (OLAP) server, Microsoft Analysis Services, provides a useful marker. But the Sarbanes–Oxley audit reforms following the Enron scandal of 2001, and particularly their focus on the adequacy of controls for financial reporting processes, may have been equally important. Following Sarbanes–Oxley, businesses systematized their controls for business processes ranging far beyond accounting. And that provided more coherent structures for a wide range of business data that OLAP could chew over.

The decade is notable here for the dog that did not bark—wage growth failed to keep up with inflation. Between 2001 and 2010, unit labour costs in manufacturing grew 0.4% per year, whereas the consumer price index grew 2.4% per year in the US [Bureau of Labor Statistics (BLS) data]. It is true that hourly manufacturing compensation rose 3.7% per year over the decade. But output per hour grew 3.3% per year in manufacturing, or 2% per year more than the 1.3% per year real increase in hourly compensation. For all of the decade’s new uses of data analysis, labour compensation growth continued to lag labour productivity growth.

Granted, other factors beyond labour affect labour productivity, not least technology capital investment. The data given earlier suggest we should attribute as much as 2% per year of the growth in labour productivity to these other factors. That same 2% per year impact of other factors on labour productivity growth would arguably also explain the 2% gap between inflation and growth in manufacturing unit labour costs. A starting point for much of the research over decades of labour productivity analysis, however, is the expectation that the cost of labour will reflect its marginal productivity in a competitive market. For that to be true between 2001 and 2010 in US manufacturing, the 1.3% per year rise in real hourly compensation would have to have reflected a similar rise in marginal output per hour—that is, the rise in output per hour keeping other factors constant. If so, the rise in marginal output per hour was 2% per year less than the 3.3% rise in average output per hour in US manufacturing over the decade. But why would technology affect average more than marginal labour productivity?

A number of studies of total factor productivity try to explain the part of the growth in labour productivity that hourly compensation does not capture (Jorgenson et al., 2008). Perhaps the workforce captured 39% of the growth in US manufacturing labour productivity over the last decade, for example—taking the 1.3% real annual increase in labour compensation as a proportion of the 3.3% annual growth in hourly labour productivity—because the value of skills in using new technology was 39% of the total value of the technology.

The problem is that no allocation of the value of new technology to contributions from new skills and new investment explains the sustained difference between marginal and average productivity growth rates. If anything, one would expect new workers to capture more of their productivity growth as they internalized process innovations and technology advances. A gap of 2% per year is a lot, at any rate, to take out of the pockets of manufacturing workers for an extended period. Over a decade, it amounts to nearly 22% of foregone wage increases.
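As a rough check on that last figure (simple compounding of the 2% annual gap, not a separate calculation from the BLS data):

\[
(1.02)^{10} - 1 \approx 0.219 \approx 22\%.
\]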

Critics of US labour arrangements such as Hedrick Smith simply conclude that the market is unfair.


Even if US employers are exercising unfair power in the labour market, however—by cooperating to ration hiring, for example, or through wage discrimination—it is not clear why average productivity would grow faster than marginal productivity. Moreover, it is unclear why employment of a price-repressed resource would not go up. An ongoing inability of manufacturing firms to evaluate employee contributions is a much stronger explanation because it severs the link between marginal productivity and wages. Indeed, it would be surprising if this failure were limited to firms that stall.

In sum, a failure of causal analysis—and in particular a failure to understand the drivers of performance—seems to lie at the root of performance sustainability problems that US companies have suffered for years. Despite widespread investment in IT systems to analyze performance in the 1990s and strides in applying those systems in the past decade, furthermore, the problems are not going away.

The rest of the paper looks closely at how increasing complexity in performance results intensifies organizations’ need for hypothesis-driven learning. It suggests part of the reason US firms still have trouble identifying explanatory performance metrics is their overreliance on the data analysis systems they procured to generate them.

HIGH INFORMATION CONTENT AS WELL AS IMPACT NEEDED TO PREDICT COMPLEX RESULTS

The next two sections are the heart of the argument. This section shows why only predictive metrics with high information content as well as high impact can explain complex results. The next section argues data mining cannot substitute for hypothesis-driven learning in the search for predictive metrics with high information content, because it relies on low-information metrics. And it can link them to form higher information aggregate metrics only in theories that turn out to be improbably complex. Managers therefore cannot hope to understand complex results by substituting data mining for the ongoing experiments of hypothesis-driven learning.

The story so far is that US manufacturing executives are failing to explore the right kind of causal assumptions about performance results, and this manifests itself as a failure to find predictive performance metrics. Predictive metrics here are indicators whose values, when measured, determine or narrow the range of possible performance results of interest to a management team. If the result is a measure of profit for a company that sells software to improve factory operations, for example, a predictive metric might be the frequency of conversations with prospective customers about their operating problems.

Assumptions about cause and effect are really hypotheses. In the last example, for instance, the company’s executives may believe that asking customers about unsolved problems will help make inroads in the process automation market and that customers will be candid about them. To test a causal assumption, naturally, it is necessary to measure outcomes for both the effect and the proposed cause. Fortunately, measures of the results that matter to most management teams are widely tracked—measures such as profit, return on assets, return on equity, return on invested capital and economic profit. So the challenge is finding indicators that evaluate the causes those teams think matter most.

This is why so many consultants use metrics as a shorthand way of talking about assumptions regarding performance drivers. This paper will also use metrics to discuss the predictive power of performance drivers and the adequacy of the assumptions underlying them. But the reason is not just convenience—metrics have two characteristics that turn out to be the measurable counterparts of what makes performance assumptions and hypotheses useful.

One of these characteristics is the impact of indicator outcomes on the results of interest to managers—the higher the impact, the greater the indicator’s predictive power. In the case of our software company, for example, an indicator for the frequency of problem-solving conversations with customers would have high impact on profit if changes in profit always accompany changes in the indicator. Indicators that have a high impact on a result thus have a high correlation with it (for a more precise statement, please see Appendix).


High impact metrics, in turn, reflect assumptions about causes that bear strongly on results. If the outcomes of a customer-conversation metric have a high impact on profit, for example, the software company’s management team has probably found a strong driver of profitability in the form of problem-seeking conversations with prospects.

Measures of the impact of a cause on a result are related to several well-known statistics. The regression coefficients of independent variables in statistical analysis, for example, directly capture the sensitivity of changes in a result to changes in other variables. The correlation coefficient between two metrics, furthermore, compares the impact of each on the other (without presuming knowledge of which is the cause and which is the effect) with the variability of the two variables. Nonparametric impact measures (independent of any particular model) that capture the indeterminacy of a driver or causal variable given information about the result it is thought to affect are increasingly widespread (see Appendix for more detail).

The impact of metrics reflecting performance drivers lends itself to a useful ranking technique borrowed from risk management. The technique can help managers focus on what potentially matters most to a result, and it makes the concept of impact more concrete. It involves two steps. For any identified driver of a result—whether it is an operating variable or a risk factor—first project expected and worst-case outcomes. Then, project the impact on the result of the scenario producing the worst-case outcome of the driver. Let us call this the worst-case impact (WCI) of the driver. For example, if the expected and worst-case defect rates for a factory are 1% and 4%, and the worst-case scenario reduces the factory’s profit from $100 000 to $90 000, then the WCI of the defect rate is $10 000.

Worst-case impacts are useful because they allow the direct comparison of different drivers’ potential effects on a result. By the same token, they provide an impact ranking of all of the identified drivers of a result and their associated metrics. It is possible to simplify most scorecards by estimating WCIs for each driver and keeping just the ones with the highest WCI. (The worst-case scenarios for each identified driver must be equally remote.)
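Here is a minimal sketch of that two-step ranking in Python. Only the defect-rate figures come from the example above; the other drivers and all of their profit figures are hypothetical illustrations.

```python
# Worst-case impact (WCI): the hit to the result if a driver goes from its
# expected outcome to its (equally remote) worst-case outcome.

def wci(expected_result, worst_case_result):
    return expected_result - worst_case_result

drivers = {
    # driver: (profit under expected scenario, profit under that driver's worst case)
    "defect rate (1% -> 4%)":        (100_000, 90_000),
    "on-time delivery (98% -> 90%)": (100_000, 97_000),
    "input price (flat -> +15%)":    (100_000, 93_000),
}

for name, (expected, worst) in sorted(drivers.items(),
                                      key=lambda kv: wci(*kv[1]), reverse=True):
    print(f"{name}: WCI = ${wci(expected, worst):,}")
# Keeping only the few drivers with the highest WCIs simplifies the scorecard.
```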

The WCIs are closely related to value-at-risk calculations for risk factors. Similar to value-at-risk, WCIs combine the concepts of volatility and exposure—the WCI of a factor or driver rises with both its volatility and the sensitivity of results to its changes. So, for example, cooking times have a big impact on enchiladas—not because they burn easily but because the times vary so much from one recipe to another. The kind of oil matters for the opposite reason—not because vegetable and olive oil are wildly different but because enchiladas are exquisitely sensitive to the oil (vegetable, for once, being better).

Moreover, WCIs provide a natural framework for incorporating an analysis of risk factors into performance assessment because there is no reason to treat performance levers under management’s control any differently from risk factors outside of its control. The importance of a performance driver depends on its WCI—not whether it is controllable.

The impact on results of metrics reflecting the operating variables, risk factors and other parameters assumed to affect an organization is fairly easy to understand—even if managers do not always use it to simplify their performance scorecards and measurement systems. The big question is what else metrics need in order to have predictive power.

The answer is that understanding or predicting a complex pattern of results requires metrics with high information content as well as high impact. This answer has emerged repeatedly over the past century in statistics, Bayesian probability, genetics, logic, model theory, cybernetics, statistical mechanics, thermodynamics and decision theory. And yet it still surprises.

The English statistician Ronald Fisher made something of a crusade out of the related argument that correlation does not imply causation. He even claimed to prove that apple imports raise divorce rates in England (because they were correlated) to rebut a paper by British epidemiologists Doll and Hill on the correlation between smoking and lung cancer (Doll and Hill, 1950).


Although he was wrong about cancer, there can be little question about his larger point, because even when there is a causal relation between two variables, correlation does not show which way it points. Moreover, there are plenty of examples of spurious regularities.

The correlation between a drop in a column of mercury and storms is an example of a spurious regularity. Here a storm’s cause may be a drop in barometric pressure, which also causes the change in barometer readings. It would be wrong, of course, to infer from the correlation that barometer readings cause storms. Spurious regularities are so problematic for any attempt to understand causality in terms of things that can be observed that they provoked Hans Reichenbach to define causes as regularities between two variables persisting even when you account for every other variable that might be a common cause of them. He also stipulated the Common Cause Principle that some common cause explains any correlation between two well-defined variables neither of which causes the other (Reichenbach, 1956).

Reichenbach’s definition has been of great value to the understanding of causal hypotheses even though it does not tell you how to find common causes—also called confounding factors because they confound conclusions about causation among other variables if they are not taken into account. Indeed, both Fisher’s point about correlation and causation and Reichenbach’s concern about hidden common causes raise the larger question here: what does a causal variable have to have beyond an impact on or correlation with the result it affects? If not correlation, then what does imply causation?

The English cybernetics pioneer Ross Ashby provided an answer in his 1956 Law of Requisite Variety (Ashby, 1956). According to Ashby’s principle, any control or regulatory system such as a thermostat must have as many alternative states or settings as the external disturbances (such as extreme weather) it is designed to offset. If a bank branch experiences three widely different numbers of customers waiting in line each week, for example, then it should probably have three different staffing plans for the appropriate hours to keep waiting time constant. The same goes for measuring tools: if a thermometer designed to work up to 100°F bursts on a hot afternoon in Austin, its failure is one of requisite variety—the thermometer could not reflect as many possible states as the climate it measured.

Ashby’s law implies that control systems need to be able to match the complexity of their environment. By extension, a predictive system needs to match the complexity of what it tries to predict. So when it tries to predict results that are complex in the sense of having high information content, a predictive system also needs to have high information content.

Before going further, it may help to contrast indicators with high and low information content in intuitive terms. Indicators with high information content compared with a result that one wants to understand or predict are similar to recipes, whereas those with low information content are similar to lists of ingredients. If your palate is good, you can probably guess the ingredients in a dish once you have tasted it. But knowing the list of ingredients is not enough to know how to prepare the dish. If you know a recipe, however, you can probably prepare the dish. No matter how good your palate, just tasting the dish will probably not help you guess the recipe.

Implementation metrics for highly specific strategies are good examples of indicators with high information content. As in the case of recipes, knowing you implemented a proven strategy will let you forecast a successful result with confidence. But knowing that someone hit a target is not enough to tell you what specific strategy she followed.

Requirements are a good example of indicators with low information content relative to the result. As in the case of ingredients, just knowing you hit a target may let you ascertain the value of an indicator for one of its requirements. Knowing that you have met a requirement for hitting a target, however, is not enough to reassure you that you hit it. A requirement for our software company to hit its profit target, for example, is completion of the latest version of its software—but completing that version is no guarantee of any particular level of profits.

Recipes have much higher information content than lists of ingredients, just as good strategies have much higher information content than requirements for hitting a goal. And root causes, as we shall see, have more information content than later causes. That certainly makes recipes—and well-specified strategies—more useful than lists of ingredients and requirements for knowing what to do. But recipes and well-specified strategies offer an additional advantage in understanding or predicting an uncertain result.


The advantage is that you always know when to modify a specific strategy or recipe. The failure to hit a goal after properly executing it shows the recipe or strategy was wrong. If you find a highly specific strategy or recipe that starts to work, in contrast, it will probably help you predict or control results until environmental changes undermine it.

Linking Ashby’s law of requisite variety to Bayesian probability, the American physicist E. T. Jaynes proposed the related principle of maximum entropy (Jaynes, 1957). Suppose a set of data reflects an unknown underlying probability distribution of possible outcomes. Of all the possible probability distributions that could explain the data, the best one to infer until more data are available—the best prior—is the one whose outcomes have the highest information content, or are most surprising or seemingly random on average.

Say, for instance, you have collected a data set large enough to give you confidence in the mean value of the underlying probability distribution it reflects, but not much else. This might be the case if you have repeatedly measured the time it takes to complete a process in a factory. And suppose you want to infer an underlying probability distribution to make some predictions. The distribution you infer should be the one whose outcomes have the highest information content—are most surprising on average—while matching the mean you have measured.

One can speak about the information content of a distribution just as well as its outcomes. So, for example, a distribution that is perfectly flat across all possible outcomes has higher information content than others with the same number of possible outcomes. This is because the average surprise or improbability of its outcomes turns out to be higher than the average surprise or improbability of the outcomes of less even distributions.
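A quick numerical check of that claim, comparing two hypothetical four-outcome distributions:

```python
# Over the same set of possible outcomes, a flat (uniform) distribution has
# higher information content (entropy) than an uneven one.
from math import log2

def entropy(probs):
    """Shannon entropy in bits: the average surprise of an outcome."""
    return -sum(p * log2(p) for p in probs if p > 0)

flat   = [0.25, 0.25, 0.25, 0.25]   # four equally likely outcomes
uneven = [0.70, 0.10, 0.10, 0.10]   # same four outcomes, one dominating

print(entropy(flat))    # 2.0 bits
print(entropy(uneven))  # about 1.36 bits
```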

The rationale behind Jaynes’ principle is modesty about what we know. The probability distributions thought to underlie a data set that have the highest information content—the flattest ones—are also the ones that presume the least knowledge about what gave rise to them. They are the closest to being random. And they are also the ones that reflect the fewest prior causes or forces that might have shaped them. To infer that a data set (or knowledge about the data) arises from a probability distribution with low information content—that is, from an uneven one—is to infer the effect of prior causes for which the data set provides no evidence by itself.

The principle of maximum entropy helps explain why predictive metrics should have high information content. By searching for explanatory probability distributions with high information content, once again, Jaynes made sure he did not presume the effect of factors for which the results he wanted to explain provided no evidence. But that is similar to asking which chapter of a novel you would consider most likely to be true if you knew some events in the middle of the book really happened to someone. If you had to choose one chapter, it would make sense to select the one that happened first because the other chapters all limit the possibilities still open in the earliest one.

The flip side of the chapter in a novel with the most open future possibilities is the great amount you learn by filling in the chapter’s future. You learn about the possibilities open at the end of each chapter of a novel by reading the rest of the novel. What you learn by reading from the end of any chapter to the end of the novel is similar to the outcome of a metaphorical indicator that picks out an actual future from the possibilities still left open by that chapter.

Now ask which of those indicators is most useful in understanding how a novel turns out. You would choose the indicator reflecting what one learns reading from the end of the earliest chapter to the end of the book—this has the highest information content of the indicators reflecting what you learn about the possibilities left open after each chapter.

It is in just this way that indicators with high information content resemble indicators for root causes.


Root causes—such as the earliest chapter of a novel—precede other causes that determine a set of results. But that means they generally have higher information content than indicators for successive causes. This is actually an implication of Ashby’s law of requisite variety. To see why, suppose you measure two indicators A and B and you know that the outcome of A completely determines the outcome of B. A might be income, for example, and B might be income tax. There is no way A could have less information content than B and still determine B. Income might have less information content than tax if there were two levels of tax payment corresponding to a single level of income—but then it would not make sense to say that income determined income tax.
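A minimal numerical illustration of that point, with a hypothetical income distribution and a tax schedule that is a pure function of income:

```python
# If A (income) completely determines B (income tax), B cannot have more
# information content than A. Income levels, probabilities and the tax
# schedule below are hypothetical.
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

income = {40_000: 0.5, 80_000: 0.3, 120_000: 0.2}   # P(income level)

def tax(x):                      # two income levels map to the same tax band
    return 5_000 if x < 100_000 else 20_000

tax_dist = {}
for x, p in income.items():
    tax_dist[tax(x)] = tax_dist.get(tax(x), 0.0) + p

print(entropy(income) >= entropy(tax_dist))   # True: the determining variable
                                              # carries at least as much information
```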

So Jaynes’ search for probability distributions with the least baggage in terms of hidden assumptions is similar to the search for root causes in one tantalizing respect. Both push to the beginning of causal chains. Without aiming directly at it, Jaynes provided an answer to our question about what an indicator needs to have, beyond a high impact on results, to help managers understand, predict or control those results. It needs to have high information content, as does the outcome of a root cause indicator or variable.

Management theorists have occasionally emphasized the dependence of predictive power on information content. A good example is the section on Potential Problem Analysis in Charles Kepner’s and Ben Tregoe’s classic The New Rational Manager (Kepner and Tregoe, 1997). Using weather as an example, the authors focus on the problem-solving power of specificity in identifying risk factors, writing: ‘In Potential Problem Analysis, the purposeful identification of specific potential problems leads to specific actions.’ The general risk of bad weather, for example, involves two possible states of affairs—good and bad weather. The more specific risk of a windstorm with heavy rain involves at least four—weather with and without strong wind and with and without heavy rain. Anyone suspecting a windstorm would prepare for a windstorm rather than just bad weather.

More recently, Ricardo Hausmann, Dani Rodrik and Andres Velasco set up their growth diagnostic approach to economic development with a search for root causes of growth constraints, working backwards through a causal tree (Hausmann et al., 2005). The present analysis suggests these root causes have greater potential to explain disappointing growth results, because they have higher information content than proximate causes.

The dependence of predictive power on information content also seems to motivate Russell Ackoff’s concept of idealized design. In Creating the Corporate Future, Ackoff recommends designing the organization to deal with the future rather than just forecasting the future to know what will happen to the organization (Ackoff, 1981). To accomplish this, Ackoff defines idealized design as the structure with which the designers

…would replace the existing organization right now, if they were free to replace it with any organization they wanted subject to only two constraints (technological feasibility and operational viability) and one requirement (an ability to learn and adapt rapidly and effectively). (Ackoff, 2001: 7)

The concept underlies the widespread use of mission statements in non-profit and for-profit US organizations today, which are supposed to launch the process of self-renewal in idealized design. But the anodyne and indistinct mission statements organizations tend to mail in bear little resemblance to the purposive statements Ackoff recommends. “An organization’s mission statement should be unique, not suitable for any other organization” (Ackoff, 1981). In other words, to renew the organization, it is necessary to state a mission as a means to the organization’s goals so specific that it is unlike any other. Mission statements are effective, Ackoff is telling us, when they have high information content.

Generalizing Bayes’ theorem, the chain rule in modern information theory not only proves the importance of high information content to predictive metrics but provides a way to measure it. Here is a conceptual account. Suppose F stands for the outcome of some predictive factor you are measuring, such as the frequency of in-depth customer conversations. And suppose E stands for a business result, such as earnings, that you want to understand, predict or control.


Then, the mutual information of F and E measures the amount of information that F contains about E and that E contains about F. The entropy of F or E measures the information content of the indicator (which, again, reflects the average improbability or surprise of an outcome). And the conditional entropy of F given E is the amount of information remaining in F, the factor, given an outcome for E, the result. (You might say it measures the information in F that is not in E.)

The amount of information that F contains about E is a good measure of the degree to which outcomes for the predictive factor F determine earnings or outcomes for E. This, once again, is the mutual information of F and E. A foundational theorem in information theory called the chain rule shows it must equal the information content of F less the amount of information in F that is not in E (Cover and Thomas, 1992: 19). Using the definitions earlier, this is the entropy of F less the conditional entropy of F given E. In other words, these two quantities govern predictive power. (See Appendix for an outline of the proof using Bayes’ theorem.)

To summarize the relation between predictive power and information content:

Chain rule: (Mutual info of F and E) = (Entropy of F) less (conditional entropy of F given E)

Predictive power: (What F says about E) = (Info content of F) less (info content of F not in E)

What about impact? The subtracted component—the amount of information in F that is not in E (or the conditional entropy of F given E)—turns out to be a precise negative measure of the impact of F on E. Think of it as a measure of F’s lack of impact on E or E’s insensitivity to F. Because the first component was just the information content of F, you can say the power of a factor F to help us understand, predict or control a result E is the information content of F (its entropy) less the lack of impact of F on E (the conditional entropy of F given E). Both of these are measurable, because they are functions of the average probability of the outcomes of F and E (more in Appendix). Here is how predictive power, information content and impact relate:

Predictive power: (What F says about E) = (Info content of F) less (lack of impact of F on E)
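A minimal numerical check of that relation, using a small hypothetical joint distribution for a factor F (conversation frequency) and a result E (earnings); the probabilities are invented for illustration.

```python
# Verify the chain-rule relation I(F;E) = H(F) - H(F|E) on a small joint
# distribution. All probabilities below are illustrative assumptions.
from math import log2

# p[(f, e)] = probability that F takes outcome f and E takes outcome e
p = {
    ("frequent conversations", "high earnings"): 0.35,
    ("frequent conversations", "low earnings"):  0.10,
    ("rare conversations",     "high earnings"): 0.15,
    ("rare conversations",     "low earnings"):  0.40,
}

def marginal(joint, index):
    out = {}
    for outcomes, prob in joint.items():
        out[outcomes[index]] = out.get(outcomes[index], 0.0) + prob
    return out

pF, pE = marginal(p, 0), marginal(p, 1)

H_F = -sum(q * log2(q) for q in pF.values())                          # entropy of F
H_F_given_E = -sum(q * log2(q / pE[e]) for (f, e), q in p.items())    # conditional entropy H(F|E)
I_FE = sum(q * log2(q / (pF[f] * pE[e])) for (f, e), q in p.items())  # mutual information I(F;E)

print(round(H_F, 4), round(H_F_given_E, 4), round(I_FE, 4))
print(abs(I_FE - (H_F - H_F_given_E)) < 1e-12)   # True: the chain rule holds
```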

Explicit consideration of the information content of predictive metrics may help solve some of the classic challenges in inferring causality based on observed probabilities. Judea Pearl’s causal calculus goes farthest towards inferring the effects of interventions such as a change in marketing or procurement from observed probabilities and correlations (Pearl, 2000). His calculus has been put to work in Bayesian networks—causal models of business, economic and environmental systems—that help characterize the confounding factors Reichenbach sought to identify. As the results in these systems become more complex, however, it becomes more difficult to find a sufficient set of common causes that includes all the factors that could confound conclusions about true causes. This section shows why, as complex results require a set of explanatory factors that are at least similarly complex.

Here, as a practical aside, is a simple way to get a rough measure of the information content of an indicator. It may be an indicator that we already know to have a high impact on a result we want to understand or control. What we want to know is whether it also has information content at least as high as that of the result (Table 1). This tool, by the way, is drawn from a chapter investigating the specificity of indicators, illustrating the tight relationship between specificity and information content (Apgar, 2008).

Table 1  Information content of a predictive indicator versus the results it helps predict

| | Less even distribution of possible indicator outcomes than of possible results | More even distribution of possible indicator outcomes than of possible results |
| More different possible indicator outcomes than possible results | Comparable | Higher |
| Fewer different possible indicator outcomes than possible results | Lower | Comparable |


The tool suggests a way to determine which indicators belong on a performance scorecard that is short but predictive. The first step is to look for indicators with high information content relative to the result you want to understand or control. You can do so by assigning the candidate indicators to the appropriate cells in Table 1. The second step would be to rank the ones in the top right cell by WCI. It is overwhelmingly likely that only three to five will have relatively high WCIs. Throw out the rest. And then add the indicator for the result you are trying to predict or control to update your scorecard.
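A compact sketch of that procedure under stated assumptions: the indicator outcome distributions and WCI figures are invented, and a simple entropy comparison stands in for the Table 1 classification.

```python
# Prune a candidate scorecard: keep indicators with more information content
# than the result, rank the survivors by WCI, and add the result indicator.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

result_probs = [0.6, 0.3, 0.1]          # distribution of the result to be predicted

candidates = {
    # indicator: (distribution of its possible outcomes, worst-case impact in $)
    "strategy implementation score":   ([0.2, 0.2, 0.2, 0.2, 0.2], 12_000),
    "customer-conversation frequency": ([0.3, 0.3, 0.2, 0.2],      10_000),
    "software version completed":      ([0.9, 0.1],                 2_000),
}

survivors = [name for name, (probs, _) in candidates.items()
             if entropy(probs) > entropy(result_probs)]          # Table 1 top-right cell
scorecard = sorted(survivors, key=lambda n: candidates[n][1], reverse=True)[:5]
scorecard.append("profit (the result itself)")
print(scorecard)
```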

The basic information relationship between apredictive metric and the results it helps predictthus suggests a procedure for prioritizing predic-tive indicators and even for designing perfor-mance scorecards. Those relationships may beeven more important for what they say aboutidentifying predictive metrics when the resultsthey are supposed to predict are complex.

The performance management challenge before managers who want to understand and control complex results is to find predictive metrics with high information content as well as a high impact on those results. And plenty of large-scale IT vendors and consultants have a ready answer in generating more, bigger and faster data. The question is whether the answer works.

NO WAY TO IDENTIFY HIGH-INFORMATION METRICS WITHOUT HYPOTHESIS-DRIVEN LEARNING

The last section showed any explanation of complex results requires predictive metrics with high information content as well as a high impact on those results. This section argues data mining cannot substitute for the successive experiments of hypothesis-driven learning in the search for predictive metrics with high information content because data mining relies on widespread, low-information metrics. And it can link low-information metrics to form higher-information aggregates only in theories that turn out to be improbably complex. Managers therefore cannot hope to understand complex results by substituting data mining for hypothesis-driven learning.

At first sight, there could be no more natural answer to the manager's dilemma about finding predictive metrics to understand and control complex results than taking advantage of the huge increases in data capture and analysis now at that manager's disposal. Generating more, bigger and faster data is a natural answer for three reasons.

First, data management is cheaper than ever before. Second, the use of statistical analysis in the social sciences has been so fruitful that the use of related methods in management seems promising. And third, one might hope to address any deficiencies in the predictive power of operating and risk indicators currently used to understand and control results by looking at more of them.

This section shows that, regarding the first of these reasons, a misguided use of business analytics to find predictive metrics will make it very expensive. As to the second, management challenges no longer resemble the kinds of social science problems that statistical methods have illuminated. And as to the third, there are strong reasons to doubt quantity can make up for quality when it comes to indicators with the power to predict complex results.

Ronald Fisher, the English statistician discussed earlier who helped build the foundation of modern data analysis, was acutely aware of the limits of statistical inference. He knew, for instance, that explanatory variables need to have more than a high impact on what they are supposed to explain (Fisher, 1924). The problem for Fisher was that while high-impact variables can explain some of the variability or scatter of a result, they may not be able to explain most of it. (In the terms of the last section, we would say they do not have high enough information content.) Indeed, the modern statistician's F-test—named for him—compares the variability in a result that a set of predictive indicators explains with what they leave unexplained.
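As a reminder of the comparison the F-test makes, here is a minimal sketch with invented data: the explained and unexplained variability of a result, each scaled by its degrees of freedom.

```python
# A minimal sketch (invented data) of the comparison the F-test makes.
import numpy as np

def f_statistic(X, y):
    """X: (n, p) matrix of explanatory indicators; y: (n,) result."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])        # add an intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = ((y - X1 @ beta) ** 2).sum()           # unexplained variability
    tss = ((y - y.mean()) ** 2).sum()            # total variability
    ess = tss - rss                              # explained variability
    return (ess / p) / (rss / (n - p - 1))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 2 * X[:, 0] + rng.normal(size=40)            # only one indicator really matters
print(round(f_statistic(X, y), 1))               # large: far more explained than not
```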

The temptation for the social scientist is to add more explanatory variables or indicators in the hope of minimizing the unexplained variability in the result that is to be explained. Fisher had doubts about that, too.


The challenge is that when the number of explanatory indicators approaches the number of observations of the result to be explained, there is no longer much reason to expect those indicators to be able to predict future results that have not yet been observed. This is the so-called degrees-of-freedom problem, faced in a simplified form if one tries to estimate a trend from just one observation—any trend fits the data. Thus, the F-test asks each explanatory indicator in a regression model to do its fair share of explaining—for fear that even unrelated variables can appear to explain a little of the variability of results at the margin.
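A toy illustration of the degrees-of-freedom problem, with invented numbers: give a model as many coefficients as there are observations and it fits the past perfectly while telling you nothing about the future.

```python
# A toy illustration (invented numbers) of the degrees-of-freedom problem.
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(6.0)
y = 1.5 * x + rng.normal(scale=0.5, size=6)   # a simple noisy trend

wild = np.polyfit(x, y, deg=5)                # 6 coefficients for 6 observations
sane = np.polyfit(x, y, deg=1)                # 2 coefficients: slope and intercept

print(np.polyval(wild, x) - y)                # essentially zero: a "perfect" fit
print(np.polyval(wild, 8.0))                  # extrapolation swings wildly
print(np.polyval(sane, 8.0))                  # stays close to the true trend (~12)
```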

One solution available to the social scientist with an embarrassment of potential explanatory indicators is to try to use them to explain past results going back further in time. If some combination of explanatory indicators can account for the variability in a much larger set of past and present results, it is likely they can do so only by virtue of a deep underlying relation between the indicators and what they are supposed to explain. The strategy does not work, however, if the underlying relationship is changing.

This limitation of statistical inference inspired much work in the 1980s on autoregressive methods. The idea was to determine what explanatory variables matter in the future based on an analysis of historical results. By assuming a regular, predictable change in the relationship between explanatory indicators and results, it is possible to solve this problem. But it is unsafe to assume this relationship changes smoothly in most business environments.

By 1974, moreover, methodologists were formulating broader limits to the strategy of increasing the predictive power of a set of indicators by adding more new indicators to the set. There is a problem with adding explanatory indicators to a model even when ample results and degrees of freedom are available. Hirotsugu Akaike discovered a strict trade-off between the complexity and accuracy of any predictive model (Akaike, 1974).

Akaike's information criterion rewards any model for the accuracy with which it explains a set of complex results but penalizes it for each explanatory indicator that it uses to do so. The reason, broadly, is that as one uses more explanatory indicators to account for a set of results, the quality of the explanation is more likely to arise by chance and less likely to reflect a persistent underlying relationship. Akaike's information criterion has attracted much attention since 2002, especially in computational genetics (Burnham and Anderson, 2002).
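For readers who want the criterion in computable form, a common least-squares version (under the usual Gaussian-error assumption and dropping constants) is AIC = n ln(RSS/n) + 2k, where k counts fitted parameters. The sketch below, with invented data, shows the penalty overtaking the gain in accuracy once the real drivers are in the model.

```python
# A sketch with invented data of Akaike's complexity-accuracy trade-off.
import numpy as np

def aic_least_squares(X, y):
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = ((y - X1 @ beta) ** 2).sum()
    return n * np.log(rss / n) + 2 * (p + 1)

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 10))
y = X[:, 0] - X[:, 1] + rng.normal(size=60)   # only two real drivers

for p in (1, 2, 5, 10):
    print(p, round(aic_least_squares(X[:, :p], y), 1))
# AIC typically falls from p=1 to p=2, then drifts up: extra indicators add
# complexity without explaining enough to repay the penalty.
```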

Given all these limits on statistical inference, then, where have statistical methods successfully used many explanatory variables to shed light on complex results in the social sciences? Two areas, broadly defined, stand out—both differing sharply from the circumstances where most managers need to provide causal explanations.

One involves the comparison of results across similar situations at a point in time. Evaluations of government programs and development projects in health care and education, for example, can often determine the impact of an intervention such as the introduction of new medical services or new curricula. The usual procedure is to compare results in randomly selected locations subject to the intervention with those not subject to the intervention—and to attribute the differences to the intervention.

Even this kind of impact evaluation has its limitations, however. Although it is relatively easy to implement randomized trials for the evaluation of social development programs in education and health care, it is hard or impossible to do so for infrastructure projects, general government policy support, or large-scale agricultural projects.

Similarly, two business sectors are particularly amenable to quantitative analysis with many explanatory metrics: retailing and finance both generate results that can be compared across many similar stores or similar accounts. And both have a range of interventions, such as product positioning and targeted financial services, that are easy to tailor across stores or types of accounts. Most business sectors, however, generate a series of one-off results rather than large sets of simultaneous comparable results. Or they face decisions involving commitments, such as investment in a new factory, that simply cannot be meaningfully tailored from one customer to another.


It is hard to imagine where managers could generate large enough sets of comparable results to test long lists of potential predictive indicators in technology, telecoms, drugs, power, manufacturing, energy and transportation. Their best hope, in fact, is to analyze historical data.

And that brings up the second area where statistical methods using many variables have explained complex results in the social sciences—the comparison of results across time, or time series analysis. As mentioned earlier, arbitrarily long time series provide enough data to test large numbers of explanatory variables. This has been useful in economic analysis, but it has worked for economists only where underlying causal relations are fixed or change smoothly. Economic markets in equilibrium fit these conditions, and empirical economics provides good explanations of their behavior. Markets in disequilibrium do not, and the results show it. Robert Lucas applied this critique to macroeconomic forecasting with policy variables that change the relationship of other variables in the model (Lucas, 1976).

The problem for managers is that the drivers of complex results such as profitability are subject to constant change because their firms are locked in Beinhocker's never-ending coevolutionary arms race with one another. Historical analysis of results can show which explanatory indicators have mattered at some point over long periods of time but not what matters at the moment. The stable processes that can put historical data to good use, such as steps in a manufacturing cycle, are mostly relevant to tactical, not strategic, decisions. Areas such as quality control have indeed become highly analytical because the patterns of outcomes tend to be stable. Managers facing more complex results cannot take that kind of stability for granted, however, and therefore cannot make the same use of historical data.

The success of statistical methods deploying large numbers of explanatory variables in the social sciences, therefore, provides little reason for optimism about the use of data to find the drivers of complex business results. The idea that more data can help managers find new drivers of complex results, nevertheless, has persistent appeal. That appeal needs a closer look.

The challenge for the manager who needs to understand or control results, once again, is to find indicators reflecting operating levers and risk factors that have high information content as well as a high impact on those results. Two indicators will be better than one if together they have higher impact, higher information content, or both (and are no more costly to track). The logic of big data is straightforward: no single indicator needs to have sufficient impact and sufficient information content to explain complex results. Just keep adding indicators that raise the information content or the impact on results of all of the predictive indicators as a group. You can keep going until you approach the number of independent observations of results that your group of indicators needs to explain.

Although the logic is straightforward, it carries a hidden assumption. The assumption is that as you try to explain results better by using more indicators from a data warehouse—be they measures of process quality, customer satisfaction or something else—it is likely that at least some will have a little of the two qualities needed to raise your predictive capacity. That is, at least some will have an impact on results after accounting for the impact of the other indicators on your list. And, at the same time, they will add at least a little information content to those other indicators. The assumption turns out to be a highly unlikely stretch.

To see why the assumption is so fragile, consider what it means for a new indicator to add to the impact on results of other explanatory indicators. It means changes in results must accompany changes in the new indicator if the outcomes of the other indicators hold steady. So there must be some correlation between the new indicator and those results.

Now consider what it means for a new indicator to add to the information content of a set of other explanatory indicators. It means you should not be able to predict outcomes of the new indicator from outcomes of the other indicators. So there must be very low correlation between the new indicator and the other indicators.

These two conditions for new indicators to be useful in understanding or controlling results become harder and harder to meet as you add metrics to the predictive pile.


Each additional indicator must correlate decently with results and very little with all of the other explanatory indicators. It is similar to seeking locations for new community clinics connected to a local hospital. You want each new clinic to be close to the hospital. (This is similar to correlating with results.) But you do not want to build a new clinic close to any other clinic. (This is similar to avoiding correlation with other explanatory indicators.) Fairly soon, you run out of places to build clinics—and you run out of logical space in which to find new indicators that add to information content and impact at the same time (see Appendix for related approaches).
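The same crowding-out can be written as a selection rule. In the sketch below, with invented data and thresholds, a candidate is admitted only if it correlates with the result (impact) and correlates little with the indicators already chosen (new information).

```python
# A sketch (invented data and thresholds) of the two admission conditions.
import numpy as np

def admit(candidate, result, chosen, min_r=0.3, max_r=0.3):
    if abs(np.corrcoef(candidate, result)[0, 1]) < min_r:
        return False                                   # too little impact
    return all(abs(np.corrcoef(candidate, c)[0, 1]) <= max_r
               for c in chosen)                        # otherwise, nothing new

rng = np.random.default_rng(3)
z1, z2, e = rng.normal(size=(3, 500))
result = z1 + z2 + 0.5 * e

candidates = {
    "driver_1":  z1 + 0.1 * rng.normal(size=500),
    "driver_2":  z2 + 0.1 * rng.normal(size=500),
    "echo_of_1": z1 + 0.2 * rng.normal(size=500),      # high impact, nothing new
    "noise":     rng.normal(size=500),                 # new, but no impact
}
chosen = []
for name, series in candidates.items():
    if admit(series, result, [candidates[c] for c in chosen]):
        chosen.append(name)
print(chosen)   # ['driver_1', 'driver_2']: the echo and the noise are excluded
```

The limit is quantitative: mutually uncorrelated indicators that each correlate with the result at level r can number at most roughly 1/r², because their squared correlations with the result cannot sum to more than one.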

An equivalent way of putting this is that the addition of low-information factors to any theory meant to explain complex results makes the theory complex. And complex theories are improbable in any sense of the word. The two conditions are related—complex theories are unlikely for the same reason that a computer printing random figures is likelier to write a simple program by chance than a complex one (Kolmogorov, 1963). The strategy of big data amounts to taking complexity out of the predictive factors of a theory and putting it into the theory itself. But it is an unpromising strategy because complex theories are improbable.

Pearl's work on the causal Markov condition may in time generate a proof of this. He has shown that a model will include enough indicators for a causal analysis of a result if their unexplained variances (i.e. the variances that the other indicators cannot explain) are independent (Pearl, 2000). This condition may be harder to meet as factors multiply.

These considerations help explain the robustness of the assertion Vilfredo Pareto made in his classic work on probability distributions that only a vital few factors drive most results (Pareto, 1896). All versions of the 80–20 rule—that 20% of factors explain 80% of the variability in a result—derive from Pareto's vital few. And quality control methods such as Six Sigma explicitly recognize it in searching for only the vital few root causes of a quality problem.

Business analytics and other kinds of data analysis systems for improving performance make plenty of sense as a way to test the effectiveness of indicators in understanding, predicting or controlling results. But the vast majority of users are hoping for something more. They want to find new indicators in the haystack of newly searchable stores of operating and market data that data analysis applications generate. They are looking in the wrong place.

There is one way to avoid the misuse of data analysis described earlier and its reliance on the fragile assumption that it is possible to build up the joint information content and impact on results of a list of predictive indicators even when the individual indicators have low information content. And that is to find a few indicators, each with high information content, that all have a large impact on results. The trouble is that high-information-content indicators are not likely to be sitting in a data warehouse, no matter how expensive. They are too specific.

Recall that factors and assumptions reflected by indicators with high information content are similar to recipes rather than lists of ingredients. Their specificity makes them hard to find in any repository of widely shared indicators. To take a new example, consider the original microfinance proposal to ensure credit quality by relying on peer pressure from solidarity groups of women providing loan cross-guarantees to one another. The proposal embodies specific assumptions about the effect of solidarity group formation, cross-guarantees and customer knowledge on lending outcomes. It has far more information content than the typical bank strategy of acquiring adjacent banking companies and closing redundant branches.

That information content gives the microfinance formula the potential to account for complex lending outcomes and—if it turns out to be right and is well executed—achieve success. A broader strategy, in contrast, might be right as far as it goes but could never by itself ensure results, because one could never be clear what counted as correct execution. The very information content that gives the more specific formula the potential to determine results, however, makes it unlikely to be found on a virtual shelf of widely measured indicators.

There is a reliable way to find factors, drivers, or assumptions with enough information content to predict or determine complex results, and that is to stipulate specific hypotheses—big, uncertain bets about how things really work.


Such hypotheses are similar to searchlights into the space of root causes—lighting up specific ones that we can readily test. They may be wrong, but they generate prescriptions specific enough to lay down markers for rapid learning.

The learning takes place in the course of deliberating on the outcomes of successive experiments to update those hypotheses. As Richard Jeffrey put it in describing how subjective probabilities shed light on causality: 'In decision making it is deliberation, not observation, that changes your probabilities' (Jeffrey, 2004: 103). Hypothesis-driven learning is not a bad term for this, because it recognizes that deliberating on successive experiments will lead you to metrics with increasing predictive power so long as you search through factors with information content on the order of the results you are trying to predict.

There are more specific kinds of learning that can find predictive metrics where data mining cannot. Ross Ashby defined nested feedback loops that allow both learning within a process and learning to improve the process as double-loop learning, for example, and argued it is the characteristic capability of adaptive systems (Ashby, 1956). But more specific definitions would weaken the conclusion that, in the search for complex predictive factors, data mining cannot replace any kind of learning that revises hypotheses over successive experiments. The point is that any process of stipulating and testing specific hypotheses has a potential that data mining lacks: to explain complex results through simple theories employing high-information factors.

Such learning may sound like a time-consuming ordeal until one considers how broad a swath of experience will be relevant to predictive factors with high information content. For such factors, the tests come fast and furious. Ian Hacking warned against underestimating the power of actual experience to test specific beliefs 26 years before writing his Introduction to Probability and Inductive Logic (2001). In Why Does Language Matter to Philosophy?, he defended an account of meaning by Donald Davidson that takes definitions to be hypotheses tested by their experiential use in sentences with observable truth conditions. Critics doubted experience could constrain speakers' definitions enough to ensure mutual comprehension. In an echo of Alfred Tarski's concept of logical consequence (Tarski, 1936), Hacking disagreed. Specific theories offer more predictions for experience to test, and Davidson's dictionary full of hypotheses is nothing if not explicit—indeed, it '…is an elaborate structure with a large number of, in themselves, trifling consequences that can be matched against our experience' (Hacking, 1975: 143). Specific strategies similarly make more of our business experience available to test and refine our plans.

Managers who want to understand, predict and control complex results therefore face a stark choice. They can sort through large volumes of indicators with low information content and mediocre predictive power from enterprise planning systems and other big data applications. Or they can advance hypotheses about what really drives their results, test them as quickly as possible, and repeat the process.

The first option can generate requirements but not recipes for success—avoiding the embarrassment of wrong starts but never reaching clarity about what drives results. The second option will occasionally lead to a mistaken initiative but will evolve a clear picture of the (possibly changing) root causes of results and thus strengthen firm resilience over time.

Indeed, how organizations evolve to negotiate that stark choice may be the most valuable legacy of the VSM of Stafford Beer (Beer, 1972). Beer contrasted the routine arbitration of the competing demands of production and back-office functions with a more evolved function arbitrating the more complex competing claims of routine management and a function responsible for radical innovation and renewal. Beer's model thus aligns management functions with the varying information content or complexity of the challenges firms face. Indeed, the top layer of management is positioned to manage the most complex problems precisely because it is in the best position to advance hypotheses and experiment. Modern organizational structure, in other words, has evolved to handle complexity.


It is modern management—perhaps persuaded by a misguided empiricism or the false promise of big data—that is not adapting to the rising complexity of business outcomes.

In sum, managers cannot replace hypothesis-driven learning with data mining to identify predictive metrics that can explain complex results, because large-scale data analysis applications are limited to linking simple factors in theories that are improbably complex. Managers confronting complex results, as a consequence, should test specific hypotheses and adapt them until they work. In fact, the misuse of data analysis may help us understand the ongoing failure of many firms to deal with complexity.

DIVERGENT APPROACHES TO SIMILAR CHALLENGES AT WELLPOINT AND NESTLÉ

This section contrasts the widespread use of business analytics at WellPoint with the process of negotiated experimentation at Nestlé. Nestlé's iterative negotiation of highly specific strategy experiments handles complexity better than WellPoint's requirements-based planning.

Of course, companies do not experiment with strategy unless they must. At Nestlé, executives must tinker, because the strategies they are investigating are so tailored and specific that no one could be sure in advance they will work. That kind of specificity is the hallmark of assumptions about operating levers and risk factors with enough information content to have predictive power. The experiments help Nestlé teams find strategies, as we shall see, that also have a high impact on the complex results they need to understand and control.

Since WellPoint Health Networks and Anthem merged in 2004, WellPoint has become one of the top two health insurers in the USA and serves 34 million plan members or policy holders. As an independent licensee of Blue Cross and Blue Shield, it is active in 14 states, including California and New York. It is a notable pioneer in the use of business analytics for its human resources function and for patient care. An early success in human resources was to analyze the reasons why first-year turnover in services jobs varied between 15% and 65% in order to suggest improvements in talent management (Ibarra, 2008).

Among WellPoint’s business intelligence solu-tions for patient care are SAS business analyticsalgorithms that classify patients by severity ofmedical risks. These algorithms search diagnosticlists, run rules-based tests and mine data for pat-terns. Uses include identifying patients whoshould have home monitoring devices afterbeing discharged from a hospital.

The provider also rates 1500 hospitals on 51 metrics to measure quality of care and predict the hospital readmission rates that boost health care costs. Hospitals falling below a certain threshold for quality of care are not eligible for rate increases. The quality-of-care formula is based on health outcomes, patient safety precautions and patient satisfaction.

On 13 September 2011, Analytics magazine reported the first commercial application of IBM's Watson technology in a WellPoint effort to improve the delivery of up-to-date, evidence-based health care. Watson's most notable achievement prior to this was beating two of the top Jeopardy champions. Watson's ability to interpret natural language and process large amounts of data would go to work proposing diagnoses and treatment options to doctors and nurses.

Back in 2010, however, a WellPoint use of business analytics came to light that was far less flattering. Reuters reporter Murray Waas wrote in April of that year that one of WellPoint's algorithms identified policyholders recently diagnosed with breast cancer and triggered fraud investigations of them. Fraud investigations are apparently not limited to fraud and can turn up any of a wide variety of reasons or pretexts for cancelling a patient's coverage. Health and Human Services Secretary Kathleen Sebelius criticized the practice as deplorable. WellPoint CEO Angela Braly responded that the computer algorithm in question did not single out breast cancer, which led critics to wonder whether the insurer was using business intelligence to deny coverage for all pre-existing conditions retroactively (Noah, 2010). WellPoint agreed to stop nullifying policies in force for reasons other than fraud or material misrepresentation 5 months ahead of the deadline set in the Affordable Care Act (ACA).


Some may dismiss this as evidence of weak US business ethics in general, but the real problem appears to be WellPoint's pursuit of too many conflicting requirements subject to measurement. Annual letters to shareholders are rarely the sources of great business insight any more, but WellPoint's 2010 letter is remarkable for its avoidance of hard choices in the strategy it lays out. It calls for increasing share in high-growth domestic businesses such as Medicare and Medicaid, for example, without explicitly cutting back its core national health-benefits accounts business, which will soon be subject to the ACA.

For its national accounts business, moreover, WellPoint identifies four drivers: the size of its medical network, low cost structure, medical management innovation and customer service. This is a classic case of requirements-based planning, in which the items form a list of necessary ingredients of success but provide nothing like a recipe for it. It is a list of low-information factors with low specificity and low predictive power.

By leaving strategy undefined, moreover, such drivers make it more likely that the firm will drift. No doubt one headquarters unit works to grow the medical network, another to lower its cost structure, a third to innovate in medical management, and a fourth to strengthen customer service. But this leaves open critical questions such as whether the network will grow regardless of cost, whether cost containment is more important than customer service, or whether innovative medical management programs can cut back the network, raise indirect costs, or encourage customer self-help. Ideally, strong execution in each category of requirements for success will ensure success. In the real world, however, requirements-based planning systems encourage different teams to pursue goals that turn out to conflict.

That may be precisely what happened in the case of the WellPoint business analytics algorithm that featured in the political debate leading up to the ACA. It is hard to imagine someone in a boardroom saying: 'Let's figure out a way to nullify the policies of women diagnosed with breast cancer.' But it is easy to imagine the team looking for continuous improvement in the application of business intelligence software broadening its mandate without checking in with the team trying to improve critical-care services for patients with serious problems.

In other words, WellPoint’s misstep looks likea symptom of management by checklist wherethe checklist includes every major widely usedvariable subject to measurement. The problem isthat such checklists can never replace experimen-tal learning about specific and coherent strategiesthat may or may not work right away—but thatwill provide the most guidance about whatworks over time.

Under Peter Brabeck, Nestlé refined its recipes for success every time it missed a target. The cumulative effect was to enable the company to grow 5.4% per year over the 10 years up to the food crisis in 2007—faster than all of its rivals despite its enormous size, and in an industry that grows only at the pace of the world economy.

The company set broad cross-product goals for each of its regions and cross-region goals for each of its product groups. Managers then had leeway to shift goals across products or regions until goals for products within regions, and for regions within products, reflected equal tension. The result was detailed recipes for meeting regional goals across all products and product goals across all regions. Flaws showed up as conspicuous patterns of hits and misses across products and regions (author interview with controller Jean-Daniel Luthi, 22 May 2007).

For example, suppose the Kit Kat and Nescafé global managers each expect to sell $50m of product in Vietnam and $100m in Indonesia. Knowing more about local tastes, the Vietnam manager might negotiate higher Nescafé and lower Kit Kat targets (say, $70m and $30m) if the Indonesia manager will accept lower Nescafé and higher Kit Kat targets (say, $80m and $120m). While total country and product targets remain the same, the specific product targets in each country now reflect equal stretch or ambition with respect to the market assumptions of the local managers.
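Written out with the figures from the example, the negotiation reshuffles targets without changing any country or product total:

```python
# The negotiated swap from the example (figures in $m): targets shift between
# products within each country and between countries within each product,
# while every country total and product total is preserved.
before = {("Vietnam", "Kit Kat"): 50, ("Vietnam", "Nescafe"): 50,
          ("Indonesia", "Kit Kat"): 100, ("Indonesia", "Nescafe"): 100}
after = {("Vietnam", "Kit Kat"): 30, ("Vietnam", "Nescafe"): 70,
         ("Indonesia", "Kit Kat"): 120, ("Indonesia", "Nescafe"): 80}

def totals(targets):
    countries, products = {}, {}
    for (country, product), amount in targets.items():
        countries[country] = countries.get(country, 0) + amount
        products[product] = products.get(product, 0) + amount
    return countries, products

print(totals(before))   # ({'Vietnam': 100, 'Indonesia': 200}, {'Kit Kat': 150, 'Nescafe': 150})
print(totals(after))    # the same totals, now reflecting local market knowledge
```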

As a consequence, missed Kit Kat targets in both countries are more likely to tell you something about Kit Kat strategy, and missed targets for both products in Vietnam are more likely to tell you something about Vietnam strategy.


Because equal-stretch goals have high information content, results provide much guidance about how to revise them.

Nestlé generates a succession of negotiated experiments, each benefiting from lessons learned from prior results. So there is a natural interplay between proposing new success drivers, which is necessary to set up specific product/country goals in the first place, and experimentation, which corrects and refines those proposals as quickly as possible.

Unlike WellPoint's pursuit of a list of broad and potentially conflicting requirements, Nestlé's system makes managers negotiate conflicts before setting specific goals. Moreover, the very negotiations that resolve conflicts lead to more specific strategies. The specificity of Nestlé's successive recipes stands in sharp contrast to the list of ingredients WellPoint communicates to its shareholders.

CONCLUSION

In retrospect, it should be no surprise that US manufacturers continue to have trouble identifying metrics that can predict and explain performance results despite two decades of intensive IT investment in data analysis applications. And it should be no surprise that the hypothesis-driven experimentalism of Toyota and Nestlé's stream of negotiated experiments produce better results than HP's efforts to derive strategy from data and WellPoint's pursuit of data-driven requirements. Data mining cannot substitute for the hypothesis-driven learning needed to find metrics with the power to explain complex results.

Reliance on the successive experiments of hypothesis-driven learning to find what improves organizational performance may seem retrograde at a time of unprecedented analytic power. Both systems research and information theory suggest, however, that metrics with the power to predict complex results must have high information content as well as a high impact on those results. And data mining cannot substitute for the experiments of hypothesis-driven learning in the search for predictive metrics with high information content—not even in the aggregate—because it relies on low-information metrics requiring theories that are improbably complex.

This means the widespread belief that data mining can help managers find prescriptions for success is a fantasy. Data analysis can test metrics with the potential to predict or control complex results once identified. But it cannot substitute for hypothesis-driven learning.

REFERENCES

Ackoff RL. 1981. Creating the Corporate Future. Wiley: New York.

Ackoff RL. 2001. A brief guide to interactive planning and idealized design. http://www.ida.liu.se/~steho/und/htdd01/AckoffGuidetoIdealized-Redesign.pdf [10 March 2013].

Akaike H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6): 716–723.

Apgar D. 2008. Relevance: Hitting Your Goals by Knowing What Matters. Jossey-Bass: San Francisco.

Ashby R. 1956. An Introduction to Cybernetics. Chapman & Hall: London.

Beer S. 1972. Brain of the Firm: A Development in Management Cybernetics. Herder and Herder: London.

Beinhocker E. 2006. The Origin of Wealth. Harvard Business School Press: Boston.

Burnham KP, Anderson DR. 2002. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach, 2nd ed. Springer-Verlag: New York.

Christensen C. 1997. The Innovator's Dilemma. Harvard Business School Press: Boston.

Cover T, Thomas J. 1992. Elements of Information Theory. John Wiley & Sons, Inc: New York.

Dewar J, Builder CH, Hix WM, Levin M. 1993. Assumption-Based Planning: A Planning Tool for Very Uncertain Times. Rand Corporation: Santa Monica, CA.

Doll R, Hill AB. 1950. Smoking and carcinoma of the lung. British Medical Journal 2(4682): 739–748.

Fisher R. 1924. On a distribution yielding the error functions of several well known statistics. In Proceedings of the International Congress of Mathematics, Toronto, 2: 805–813.

Gaitonde R. 2005. An analysis of HP's future strategy, post Carly Fiorina. OSNews 24 Feb. http://www.osnews.com/story/9810/An-analysis-of-HPs-future-strategy-post-Carly-Fiorina [8 February 2012].

Gobry P. 2011. Leave HP alone. Business Insider 29 Aug. http://articles.businessinsider.com/2011-08-29/tech/30008570_1_leo-apotheker-hp-mark-hurd [8 February 2012].

Hacking I. 1975. Why Does Language Matter to Philosophy? Cambridge University Press: Cambridge, UK.

Hacking I. 2001. An Introduction to Probability and Inductive Logic. Cambridge University Press: Cambridge, UK.


Hannan MT, Carroll GR. 1992. Dynamics of Organizational Populations: Density, Legitimation, and Competition. Oxford University Press: Oxford.

Hannan MT, Freeman JH. 1977. The population ecology of organizations. American Journal of Sociology 83: 929–964.

Hannan MT, Freeman JH. 1984a. Structural inertia and organizational change. American Sociological Review 49(2): 149–164.

Hannan MT, Freeman JH. 1984b. Organizational Ecology. Harvard University Press: Cambridge, MA.

Hausmann R, Rodrik D, Velasco A. 2005. Growth diagnostics. Working paper summarized in: 2006. Getting the diagnostics right. Finance and Development 43(1).

Ibarra D. 2008. HR driving the business forward: HR analytics at WellPoint. HR.com 22 April. http://www.hr.com/SITEFORUM?t=/documentManager/sfdoc.file.detail&e=UTF-8&i=1116423256281&l=0&fileID=1242361406876 [14 February 2012].

Jaynes ET. 1957. Information theory and statistical mechanics. Physical Review 106: 620–630.

Jeffrey R. 2004. Subjective Probability: The Real Thing. Cambridge University Press: Cambridge, UK.

Jorgenson D, Ho M, Stiroh K. 2008. A retrospective look at U.S. productivity growth resurgence. The Journal of Economic Perspectives 22(1): 3–24.

Kepner C, Tregoe B. 1997. The New Rational Manager. Princeton Research Press: Princeton, NJ.

Kolmogorov A. 1963. On tables of random numbers. Sankhyā Series A 25: 369–375.

Lashinsky A. 2009. Mark Hurd's moment. CNN Money 3 February. http://money.cnn.com/2009/02/27/news/companies/lashinsky_hurd.fortune [8 February 2012].

Lucas RE. 1976. Econometric policy evaluation: a critique. Carnegie-Rochester Conference Series on Public Policy 1(1): 19–46.

McGrath R, MacMillan I. 1995. Discovery-driven planning. Harvard Business Review 1995: 44–54.

Noah T. 2010. Obama vs. WellPoint. Slate 10 May. http://www.slate.com/articles/news_and_politics/prescriptions/2010/05/obama_vs_wellpoint.html [14 February 2012].

Olson M, Van Bever D. 2008. Stall Points. Yale University Press: New Haven, CT.

Pareto V. 1896. Cours d'Économie Politique Professé à l'Université de Lausanne. Université de Lausanne: Lausanne.

Pearl J. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press: Cambridge, UK.

Popper K. 2002. Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge: London.

Reichenbach H. 1956. The Direction of Time. UC Press: Berkeley, CA.

Scholten D. 2010. A primer for Conant and Ashby's good-regulator theorem. (Unpublished.)

Tarski A. 1936. Über den Begriff der logischen Folgerung. Actes du Congrès international de philosophie scientifique, Sorbonne, Paris 1935, Hermann VII: 1–11.

Wiggins RR, Ruefli TW. 2002. Sustained competitive advantage: temporal dynamics and the incidence and persistence of superior economic performance. Organization Science 1: 81–105.

Wiggins RR, Ruefli TW. 2005. Schumpeter's ghost: Is hypercompetition making the best of times shorter? Strategic Management Journal 26: 887–911.

APPENDIX

For those interested in a precise mathematical interpretation of the term impact, it is used here to refer to backscatter or the sensitivity of a result to a cause or driver of that result. Mathematically, it is the entropy of the driver conditioned on the result it drives. The rough measure of a driver's WCI given in the article is related but not equivalent to this. WCIs, however, are useful in practice. The Conant–Ashby theorem that every good regulator of a system must be a model of the system is equivalent to the conditional entropy requirement—it shows there is a mapping from the states of a system to the states of anything that effectively regulates, explains, controls or predicts it, thus minimizing the entropy of the regulator's states conditioned on the states of what it regulates (Scholten, 2010).

The conditional entropy of a driver given outcomes for a result that it drives can be measured as R(h, ei) = Σ p(ei/h) × log2 p(h/ei), where p(h/ei) is the probability of an outcome h for the driver given each of the various outcomes ei for the result (Apgar, 2008: 198). Note, once again, that this is not equivalent to the concept of WCI in the text.
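For readers who prefer standard information-theoretic notation, the textbook form of the conditional entropy of a driver F given a result E (Cover and Thomas, 1992) is:

```latex
H(F \mid E) = -\sum_{e} p(e) \sum_{f} p(f \mid e)\,\log_2 p(f \mid e)
            = -\sum_{f,e} p(f,e)\,\log_2 p(f \mid e)
```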

Here is an analogue to the information-theory relations in the text for those more familiar with Bayesian probability. Let f be an outcome of a predictive factor and let e be a result that it helps predict. For the factor to have predictive power, knowledge of f must transform the probabilities we assign to different possible results e. In other words, p(e/f)/p(e) must be very high or very low for all or most possible results e, where p(e/f) is the probability of e given f and p(e) is the prior probability of e. But by Bayes' theorem, that means p(f/e)/p(f) must also be very high or very low for all or most possible results e.


This in turn means that p(f/e) must be close to 1 or 0 for all or most possible results e, and therefore there can generally be no changes in f without corresponding changes in e—which is close to saying f must have a high impact on e. It also means that the denominator p(f) must be low for typical outcomes of the factor—which is the same as saying the factor must have high information content.
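Spelled out, the Bayes step in this argument is the symmetry of the likelihood ratio:

```latex
\frac{p(e/f)}{p(e)} = \frac{p(e,f)}{p(e)\,p(f)} = \frac{p(f/e)}{p(f)}
```

so any factor outcome that sharply revises the probability of a result is itself sharply revised by that result.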

The argument on hospital clinics bears a resemblance to the so-called Chinese restaurant process, which gives rise to a limiting case of certain simple Dirichlet distributions that can also be used to determine limits on the number of useful explanatory factors. See: Ferguson T. 1973. Bayesian analysis of some nonparametric problems. Annals of Statistics 1(2): 209–230.
