
The University of Strathclyde

Business School

Master of Business Administration

Big Data: A Framework for guiding Big Data Analytics

Ahmad Muammar

1st of March 2014

Bahrain Centre

Total Number of Words without appendices and table of contents: 14,903


Statement of academic honesty

I declare that this dissertation is entirely my own original work.

I declare that, except where fully referenced direct quotations have been included, no aspect of this dissertation has been copied from any other source.

I declare that all other works cited in this dissertation have been appropriately referenced.

I understand that any act of Academic Dishonesty such as plagiarism or collusion may result in the non-award of a Masters degree.

Signed _____Ahmad Muammar___________ Dated __March-1st-2014_______


Contents

Statement of academic honesty ............................................................................................................ i

Contents .............................................................................................................................................. ii

List of figures ..................................................................................................................................... iv

Statement of academic honesty .......... i
Contents .......... ii
List of figures .......... iv
1 Introduction .......... 1
2 Literature Review .......... 3
2.1 Introduction .......... 3
2.2 Big Data – What and Why? .......... 4
2.3 Big Data – How and Who? .......... 8
2.4 Important Related Terms .......... 10
2.5 Analytics Models .......... 11
2.6 Big Data as a strategic initiative .......... 12
3 Cases Review .......... 15
3.1 Introduction .......... 15
3.2 C-PG: The case of Procter & Gamble (P&G) .......... 15
3.3 C-OE: The case of Obama election campaign .......... 17
3.4 C-GE: The case of GE .......... 18
3.5 C-WM: The case of Wal-Mart .......... 20
4 Methodology .......... 22
4.1 The study purpose and the research objectives .......... 22
4.2 Research Choices .......... 23
4.2.1 Research Approach and Design .......... 23
4.2.2 Research Methodologies and Methods .......... 24
4.2.3 Data Collection Techniques .......... 25
4.2.4 Data Analysis .......... 27
4.3 Limitations .......... 28
4.4 Conclusion .......... 29
5 Discussion and Analysis .......... 29
5.1 Big Data and Organizations .......... 31
5.2 Different stage, different measurements .......... 31
5.2.1 The Start .......... 34
5.2.2 The Transformation .......... 37
5.2.3 The Maturity .......... 43
6 Conclusion and further studies .......... 47
6.1 Future studies .......... 47
7 Personal Reflection .......... 49
8 Appendix A: Literature Search .......... 50
9 Appendix B: Common Data Mining Methods (Shearer 2000) .......... 51
10 Appendix C: Sample of the Coding Matrix .......... 56
11 Appendix D: Turnitin Report .......... 57
Bibliography .......... 58
Interview Consent Form
Interview Participant Information Sheet

Bibliography ......................................................................................................................................58

Interview Consent Form ........................................................................................................................

Interview Participant Information Sheet ................................................................................................


List of figures

Figure 1: Gartner Hype Cycle 2012 .......... 1
Figure 2: Literature review map .......... 3
Figure 3: IDC's Digital Universe Study, sponsored by EMC, June 2011 .......... 4
Figure 4: Google Trends for “Big Data” limited to “Business and Industrial” .......... 5
Figure 5: The Vs characterizing Big Data .......... 6
Figure 6: Michael E. Porter “Competitive Strategy: Techniques for Analyzing Industries and Competitors” (Bill Schmarzo 2012) .......... 8
Figure 7: Data Warehouse infrastructure .......... 10
Figure 8: Phases of the CRISP-DM Reference Model (Shearer 2000) .......... 11
Figure 9: The virtuous cycle of data mining focuses on business results .......... 12
Figure 10: Michael Porter’s Value Chain Analysis (Bill Schmarzo 2012) .......... 13
Figure 11: Business Sphere rooms in P&G .......... 15
Figure 12: Example of 1% saving across sectors (Evans & Marco Annunziata n.d.) .......... 19
Figure 13: Themes were printed with the interviewer to guide the discussions .......... 23
Figure 14: Gartner Hype Cycle for emerging technologies as of Aug 2013 .......... 30
Figure 15: IT Industry 3rd platform of growth and innovation (Source: IDC) .......... 31
Figure 16: The stages of KM development (Lopez 2001) .......... 33
Figure 17: Stages used by the author in the current study .......... 33
Figure 18: The IT Strategic Impact Grid (Nolan & Mcfarlan 2005) .......... 37
Figure 19: Balanced scorecard (Kaplan & Norton 1992) .......... 40
Figure 20: Transformational stage BSC for Big Data .......... 43
Figure 21: P&G Value chain and data analytics .......... 44

1 Introduction

The world is facing an exponential growth of data; tremendous amounts of data are created by smart devices, RFID technologies, sensors, social media, video surveillance and more. IDC estimated the data created by humanity in 2000 at two exabytes; a similar amount was created every day in 2011 (Lyman & Varian 2011). While data is created primarily by individuals, organizations are expected to manage this data (Gantz & Reinsel 2011). Isn’t this an unavoidable burden on organizations? Is the problem of managing and storing data a vital concern that needs an immediate resolution? Well, Big Data advocates believe that the information explosion represents a huge opportunity for organizations; mining this mountain of dirt will most likely reveal golden values! In fact, McKinsey (Manyika et al. 2011) estimates the potential annual value of leveraging Big Data in US health care at $300 billion, and at more than that figure in Europe’s public sector administration. Gartner mentioned Big Data more than ten times in its Hype Cycle report of emerging technologies, which evaluates 1900 technologies (Pettey & Meulen 2012). However, a careful review of the hype indicates that Big Data is about to reach the Peak of Inflated Expectations, which is followed by the Trough of Disillusionment. Does that mean that Big Data might be a fad and simply a new IT buzzword to impress the business and sell more of the same stuff?

Figure 1: Gartner Hype Cycle 2012

In parallel to this hype, several companies are competing to create sound technologies to capture, manage and analyze this huge volume of data. At the same time, other companies are creating more smart devices and applications that generate even more data. Several investments are out there with the purpose of collecting more data with no profit, hoping to figure out how to monetize it later, following Facebook’s pathway.

“Because computers have enabled humans to gather more data than we can digest, it is only natural to turn to computational techniques to help us unearth meaningful patterns and structures from the massive volumes of data” (U. M. Fayyad et al. 1996)

This new data is mostly unstructured or semi-structured, a different form of data from what traditional technologies used to deal with. It is also created and streamed at a very fast speed, and dealing with it has to be equally fast, some argue. This represents another challenge for the current traditional technologies.

In this project, my aim is to understand the fascinating topic of Big Data more thoroughly and to try to differentiate realities from myths about Big Data. At the same time, I am hoping to suggest a practical framework that can be used by ambitious organizations to evaluate and guide their performance in terms of Big Data. A critical literature review of the topic, synthesis of inputs from subject matter experts, and a review of successful implementation case studies in contemporary organizations will be the main pillars for this framework.

3

2 Literature Review

2.1 Introduction

Through the literature review process, one can rapidly discover that the “Big Data” topic is in its infancy in business academic journals and is still far from catching up with its counterpart in trade and grey journals. Searching known literature databases shows a small number of hits in business academic journals compared to trade journals (refer to Appendix A), and the number is negligible if we compare it to the more than 6000 hits in Google Scholar and 21M hits in Google general search1. (Lazer et al. 2009) noticed that “the emergence of a data-driven computational social science has been much slower”. A possible explanation of this phenomenon is that some aspects of “Big Data” are not entirely new; for example, a large body of literature has deep coverage of topics like analytics, knowledge discovery, data mining, decision making and business intelligence, from both technological and business points of view. The Big Data term, however, triggered wild imaginations of ideas and possibilities in the media and trade papers due to the value that can be created with today’s available technologies.

A semi-systematic literature review was initially followed to capture the maximum number of relevant papers. Both academic peer-reviewed and non-peer-reviewed relevant business papers were studied. This stage was followed by a non-systematic literature review, where the search continued until certain themes were discovered and converging concepts were reached.

The literature flow and themes are depicted in the figure below.

Figure 2: Literature review map

1 29th of November 2012


2.2 Big Data- What and Why?

The dynamic interplay between technology and social ecology is a historical phenomenon, and both have been shaping each other for a long time. Technology development, the inexpensive availability of the Internet, mobile proliferation and smart phones have allowed the mainstream to stay connected most of the time. (Manyika et al. 2011) estimated the number of mobile phones in use at 5 billion in 2010. This has given the already-rising social media more momentum and wider reach. In fact, Facebook has around one billion users at the time of writing this research. What Kotler (Kotler et al. 2010) calls the age of participation is now equipped with more advanced and cheap tools for people to remain connected longer and to create even further participation and additional collaboration, with greater and more valuable content.

Figure 3: IDC's Digital Universe Study, sponsored by EMC, June 2011

On the other hand, technology is reaching economies of scale faster than before, turning what was once restricted to the rich into gadgets easily accessible to the mainstream. This sharp drop in computing, storage and network prices has not only enabled people to make more collaborative content, but has also enabled humankind to produce more smart digital sensors than before, generating more data, sending it in real time and storing it for analytics.

For decades, organizations have been crunching and analyzing transactional data in pursuit of insight and knowledge discovery. Recently, there has been unprecedented interest in big data and big data analytics. While not particularly reliable, the Google Trends service shows a large search volume for “Big Data”, and a big hype has been created around the concept. This gives an indication of the amount of interest in big data. Arguably, Big Data has established itself as the buzzword of 2013 and for years to come.


Figure 4: Google Trends for “Big Data” limited to “Business and Industrial”

Big data can be described simply as a new type of data that needs different tools and technologies to deal with, and Big Data analytics as the set of methods used to create insight out of it. Mostly appearing in the computing literature, several big data definitions are centered on size and scale, while others have focused on the technological implications. For example, McKinsey defines Big Data as datasets whose size represents a challenge for traditional computing technologies (Manyika et al. 2011). (Eaton et al. 2012) and (Edd Dumbill 2012) have also suggested that the term applies to data that cannot be processed using traditional tools. Those definitions imply that today’s big data will not be big data anymore once technologies progress to overcome today’s obstacles! However, this framing is not new, and (U. M. Fayyad et al. 1996) made a similar proposition when describing knowledge discovery in databases (KDD).

At the same time, the characteristics of Big Data, commonly known as the 3Vs, have occupied a considerable part of the Big Data explanations (Philip Russom 2011), (Eaton et al. 2012), (Carter 2011) and others:

Volume: This V suggests that the amount of data available to organizations is growing exponentially, and data sources are increasing in number and in the content they generate. It also reflects the trend to analyze large chunks of the data rather than small samples, in order to capture more value, some argue (SAS 2012).

Velocity: refers to the speed at which real-time data is captured and the need to rapidly process it in real time.

Variety: highlights the importance of unstructured data (text, audio, blogs, micro blogs, etc.), along with the traditional transactional data.


Figure 5: The Vs characterizing Big Data

Others have added the variability and seasonality of data flow (SAS 2012) as another attribute of big data. Recently, veracity has been proposed to stress the importance of the quality and trustworthiness of data (Paul C. Zikopoulos et al. 2012); some data is uncertain by definition (things like sentiment analysis, economic factors, weather conditions, the truthfulness of humans), which data cleansing cannot traditionally correct. It is also important to highlight that big data and big data analytics have been widely used as synonyms; in fact, some have deliberately re-defined big data to focus on the analysis part (Gantz & Reinsel 2011).

The debate about what value big data adds, and how that value is created, has started to appear in research. Mainstream writers have let their imaginations soar to construct relations between digital traces in order to foretell possibilities and extract insights. Others have taken Big Data further with a bold claim that it is going to redefine science and knowledge as we know them; (Anderson 2008) claims that applied mathematics will replace every other tool we know. Other writers have shown some skepticism, seeing the promise of Big Data as over-simplistic: in the end, social connections are not equal, the frequency of cyber communications is not a relationship, and a higher number of tweets does not mean being more social (Boyd & Kate Crawford 2011). In my view, Big Data analytics is not a replacement for scientific methodologies, market research or even the “gut feeling”, but it will certainly enrich them and enable faster knowledge-based actions, in particular when speed is an important factor; moreover, the challenge of differentiating between causation and correlation is not unique to Big Data analytics. Correlation among search phrases allowed Google in 2009 to predict the spread of H1N1 better than governmental analysts (Mayer-Schönberger et al. 2013).


Undoubtedly, Big Data analysis and discovery will create enormous value, some argue. Its value comes profoundly from extracting sophisticated patterns of relationships between its parts (Boyd & Kate Crawford 2011). Hypothetically, with Big Data, we can rapidly construct detailed knowledge using both deep data, commonly used in humanities studies, and surface data about lots of people, commonly used in quantitative disciplines (Manovich 2011); (Lazer et al. 2009, p722) provided a similar argument on what they call “depth and breadth and scale”.

(Thomas H Davenport et al. 2012) claim that organizations that learn how to use Big Data will react to changes as they occur and will use different sources of data in real time to create new offerings. The speed factor was also emphasized by (LaValle et al. 2011) as an enabler for analyzing complex business decisions based on complex parameters.

There is empirical evidence that companies that use analytics in general outperform the competition. For example, in a survey of 3000 executives in different industries, (LaValle et al. 2011) concluded that top-performing companies use analytics much more than under-performing companies.

(Manyika et al. 2011) have identified five ways for Big Data to add value, which can be summarized as follows:

Creating transparency: making Big Data available across functions can reduce time to market, research and processing time, and improve quality.

Experimentation: statistical process control across the value chain to monitor and improve performance.

Micro-segmenting of populations: to address individual needs.

Automated decision making.

Innovating with new business models, products and services.

(Stubbs 2011) argues that organizations can leverage business analytics at all strategic levels (organization planning, business planning and functional planning). It can be used as an input to several famous strategic tools (i.e. SWOT, Porter’s five forces, PESTEL, etc.). An example of using Big Data with Porter’s five forces is shown below.


Figure 6: Michael E. Porter “Competitive Strategy: Techniques for Analyzing Industries and Competitors” (Bill Schmarzo 2012)

(Davenport & Dyché 2013) suggest that the objectives of Big Data are to reduce cost and time, to support analytics tasks, or to introduce a new product or service.

To conclude, Big Data advocates have stressed the value of Big Data at every point in the value chain and across several management disciplines. However, similar claims were made before the Big Data hype in knowledge discovery and data mining; industries such as airlines, banking and manufacturing have been collecting data and extracting insight for years. However, data types and sources are different today, digital social realities have changed, and computing power is more capable and more cost effective.

2.3 Big Data – How and Who?

Data is created as an outcome of every business process; nevertheless, the sources of data are no longer limited to within the organization. Data sources can be external as well as internal, transactional as well as unstructured. Data can be collected from outside organizational boundaries (i.e., suppliers, customers, partners, channels, environmental data, data banks, etc.). Its value spans multiple business functions and its analytics serves multiple purposes. It is, therefore, imperative to look at the implications from both strategic and tactical points of view. What does it mean to be a data-driven organization?

Organizations that aspire to compete effectively in the digital economy need to look at data and data analytics as a source of competitive advantage. They need to pursue a strategy that is informed and shaped by analytics (Davenport 2006), (Manyika et al. 2011), (Kiron & Shockley 2012). Through a systems lens, analytics needs to be embedded in the organization’s different activities, ranging from operations, forecasting, sales and marketing, supply chain and customer service to business development. Data needs to be transformed efficiently into information and consumable knowledge across the organization’s value chain.

Organizations that leverage Big Data act as lean organizations that process data as it comes rather than stock it for future processing. The real-time processing of data can be used for quicker and automated decision making, or for monitoring the environment (Thomas H. Davenport et al. 2012). The need for data analysis automation, however, is not a new concept; (U. M. Fayyad et al. 1996) emphasized automation and the need for machine processing in their description of KDD. One can argue that the unstructured nature of today’s data and the speed of its creation are among the reasons behind the evolution of Big Data thinking. It is also the value of the information that can be extracted from the free raw data embedded in micro blogs, social media, GPS locations and smart devices that has created different possibilities.

(Thomas H Davenport et al. 2012) believe that the Big Data ecosystem will evolve, creating an information network of external and internal services to create more insight. (LaValle et al. 2011) suggest that the insight created using analytics has to be linked to the organization’s future strategy and tightly connected to daily operations. Skilled data-driven organizations use data not only for cost cutting but also to prescribe actions and choose optimal options. They continuously find new ways to collect, process and consume data.

From a different perspective, the (Kiron & Shockley 2012) survey of 4500 respondents shows that cultural aspects and a lack of commitment to analytics can hinder analytics programs. A data-driven culture, (Kiron & Shockley 2012) argue, sees analytics as a strategic asset, with full management support and company-wide access to the insight analytics creates. While they did not describe how this culture can be systematically built, they claim that a data-oriented culture can be evolved. (Davenport 2006) provided several practical examples of how the data culture would look in action but, again, did not offer insight on how to sequentially develop this culture.

Talent able to cope with Big Data is another critical necessity, and the market is forecast to run short of such resources by an order of magnitude (Manyika et al. 2011). These resources are commonly called “data scientists”. While this term is not precisely defined, several observers have highlighted the need for different skillsets to fulfill today’s analytics needs. They stressed the need for both soft and hard skills in data scientist professionals: professionals who have in-depth expertise in a certain scientific discipline while also enjoying a good understanding of wide business areas (Patil 2011). In his widely cited “Competing On Analytics” paper, (Davenport 2006) highlighted the importance of hiring the right analysts, who have quantitative skills, analytics aptitude, math and statistics, but also the ability to simplify complex ideas, to speak the business language, and to interact with business leaders.

2.4 Important Related Terms

Data Mining: is defined as the extraction of knowledge from large amounts of data (Han et al. 2012). (Linoff S. & Berry A. 2011) have a similar definition with an emphasis on the operational part of data mining, declaring it a business process. Data mining and knowledge discovery in databases (KDD) are often used as synonyms. Others use the data mining term for one step in the knowledge discovery process, where it refers to the intelligent methods used to extract insight and patterns from data. Data mining can also be seen as a step in Big Data analytics; its predictive and descriptive algorithms2 are commonly quoted in writings clarifying the opportunities possible with Big Data analytics.
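As a minimal sketch of a descriptive data mining step, the following hypothetical market-basket example (the transactions and item names are illustrative, not from this study) counts how often pairs of items co-occur and computes their support:

```python
from itertools import combinations
from collections import Counter

# Hypothetical market-basket transactions (illustrative data)
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]

# Descriptive mining step: count how often each pair of items co-occurs
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of transactions containing the pair
support = {pair: count / len(transactions) for pair, count in pair_counts.items()}
print(max(support, key=support.get))  # ('bread', 'milk')
```

Real data mining tools apply the same idea (e.g. association-rule algorithms) at a much larger scale, but the principle of deriving patterns from raw transactions is the same.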

Data Warehousing: is the process of capturing data and collecting it from different sources to make it available for online retrieval (U. Fayyad et al. 1996). In the process, data gets extracted from operational systems, transformed, cleansed, aggregated, summarized and loaded into a repository for processing (Bontempo & Zagelow 1998). A data warehouse helps in simplifying decision support systems and ideally should represent a single point of truth about an organization’s data. A data mart is a subset of the data warehouse, accessed typically by a certain line of business.

Figure 7: Data Warehouse infrastructure

2 please see Appendix B for methods used in data mining
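The extract–transform–load flow described above can be sketched as follows; this is a simplified illustration with hypothetical operational records and function names, not a production implementation:

```python
# Hypothetical operational records from a source sales system
sales_system = [
    {"date": "2014-01-05", "region": "EMEA", "amount": "1200.50"},
    {"date": "2014-01-05", "region": "emea ", "amount": "300.00"},
    {"date": "2014-01-06", "region": "APAC", "amount": "980.25"},
]

def extract(source):
    """Pull raw records from an operational system."""
    return list(source)

def transform(records):
    """Cleanse (normalize region names) and type-convert amounts."""
    return [
        {"date": r["date"],
         "region": r["region"].strip().upper(),
         "amount": float(r["amount"])}
        for r in records
    ]

def load(records, warehouse):
    """Aggregate by region and append to the warehouse repository."""
    for r in records:
        warehouse[r["region"]] = warehouse.get(r["region"], 0.0) + r["amount"]

warehouse = {}
load(transform(extract(sales_system)), warehouse)
print(warehouse)  # {'EMEA': 1500.5, 'APAC': 980.25}
```

In practice the warehouse would be a database rather than a dictionary, and each step would be an ETL job, but the extract, transform and load responsibilities divide the same way.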


2.5 Analytics Models

Based on real-world experience(Shearer 2000), CRISP-DM ((CRoss-Industry Standard Process for

Data Mining) was built as a general blueprint for data mining projects

Figure 8: Phases of the CRISP-DM Reference Model (Shearer 2000)

The model suggests that the process should start with a business understanding phase declaring clear objectives for the analytics project; it should assess the resources needed and produce a plan for the project. The data understanding phase includes the exercises needed to become more familiar with the collected data, identify quality issues, explore and visualize the data and collect more sources if needed. This phase is followed by data preparation, where data gets cleaned, derived attributes get created, and data gets formatted, integrated and aggregated. The data mining models can then be built and tested in the modeling and evaluation phases before the deployment phase is kicked off.
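As an illustration only (the function bodies and the toy churn data are placeholders of my own, not part of Shearer's specification), the CRISP-DM phase sequence can be sketched as a pipeline of functions operating on a shared project state:

```python
# Illustrative CRISP-DM pipeline: each phase is a placeholder function
# passing a shared project state forward.

def business_understanding(state):
    state["objective"] = "identify churn drivers"   # declare project objectives
    return state

def data_understanding(state):
    state["data"] = [3, 28, 7, 14]                  # explore collected data
    return state

def data_preparation(state):
    state["data"] = sorted(state["data"])           # cleanse / format / integrate
    return state

def modeling(state):
    state["model"] = sum(state["data"]) / len(state["data"])  # fit a trivial model
    return state

def evaluation(state):
    state["approved"] = state["model"] > 0          # test against objectives
    return state

def deployment(state):
    return state if state["approved"] else None     # release only if approved

phases = [business_understanding, data_understanding, data_preparation,
          modeling, evaluation, deployment]
state = {}
for phase in phases:
    state = phase(state)
print(state["model"])  # mean tenure of the toy dataset
```

The linear loop hides CRISP-DM's iterative nature; in real projects a failed evaluation sends the team back to business or data understanding rather than forward to deployment.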

(Han et al. 2012) have proposed a similar model for data mining that includes:

Data cleaning

Data integration

Data selection

Data transformation

Data mining

Pattern evaluation

Knowledge presentation


(U. Fayyad et al. 1996) suggest the following methodology, stressing the iterative nature of the process:

Learning the application domain.

Selecting the datasets.

Data cleaning and preprocessing.

Data reduction and projection.

Choosing the function of data mining.

Data mining.

Choosing the data mining algorithm(s).

Interpretation

Using discovered knowledge.

The (Linoff S. & Berry A. 2011) model, shown in Figure 9, is a more abstract methodology, but with details close to those proposed by the other models. Measuring results, however, through financial measures or lifetime customer value, is stressed in this model.

The models are broadly comparable; nevertheless, they tend to zoom in or out on different aspects of data treatment.

Figure 9: The virtuous cycle of data mining focuses on business results (Linoff S. & Berry A. 2011)

2.6 Big Data as a strategic initiative

The previous sections suggest that Big Data analytics can be leveraged across the company value chain, as shown in the figure below; it can also be used as an input for several strategic tools guiding organizations to achieve competitive advantage. It can support companies in creating insight and in ensuring informed and fast decision making. Moreover, Big Data can be used to measure companies’ performance in a more accurate way and to enrich current BI practices.

Figure 10: Michael Porter’s Value Chain Analysis (Bill Schmarzo 2012)

The previous sections equally stress the vital role of talent, or data scientists, as a key requirement that companies have to nurture in order to be classified as data-driven organizations. Those data scientists will most likely use some of the analytics models presented earlier, coupled with cross-discipline knowledge, to explore, present and explain the insight extracted from data.

It makes sense for a critical researcher to assume that Big Data analytics will possibly be seen as an organizational strategic weapon. However, for skeptical practitioners and senior managers who are about to kick off Big Data initiatives, there is most likely a need to measure how they are executing against a set of performance areas. It is anticipated that several managers will be reluctant to pursue data transformation projects when they have experienced failure in either a BI or a data warehouse project; surveys show that a high percentage of those expensive projects have either failed or not achieved their objectives.

Practitioners will most likely have a strong desire to align their Big Data projects with the company's strategic directions and key performance indicators. The major question that I am trying to answer in this dissertation is:

“Assuming that Big Data analytics is a potential differentiator for certain companies, can we build

a framework for measuring the company’s performance in Big Data analytics?”

Additional key questions that can help explore the main question are:

If Big Data sources and consumption span the corporate value chain, do we need senior managers to oversee the whole data management strategy? For example, is a Chief Data Officer role needed?


Is there a need for a separate business unit for analytics, or does it need to be embedded in other business units?

Are there some common characteristics for organizations that have maximized the value of data?

Have they been able to measure this value?

If the availability of data scientists is a key requirement, are they acquired or cultivated, and how?

Is there a need for a regular review of the performance of analytics processes to make sure that companies are maximizing both data assets and people's talents?

How do companies know that the Big Data analytics KPIs are aligned with the corporate KPIs?

Is there a difference in requirements between digitally-born organizations, whose main asset is information, and those that are built on a traditional business model?

Some of these questions have been partially answered in scattered literature. It could be of great value to combine that literature with the opinions of subject matter experts who have witnessed several successes and failures in the analytics field, along with case analyses of contemporary organizations that have achieved tangible results.


3 Cases Review

3.1 Introduction

In this chapter, I will outline four cases of organizations that have achieved tangible success in analytics and Big Data. The cases were purposefully selected because of the organizations' success, the richness of the available online content and the revelatory nature of the cases.

Arguably, the cases would have been more relevant if the selected companies had achieved competitive advantage and sustainable business growth compared to peers that had not leveraged Big Data and analytics. This seems to be a rational proposition; however, it faces practical challenges, some of which are:

Competitive advantage is a complex set of competencies that are difficult to isolate and measure in real life.

The Big Data topic is still in its infancy; with the exception of digitally-born organizations (e.g. Facebook, Google, Amazon), it is challenging to find many organizations that have developed the full competency and published detailed information about those competencies.

Organizations that see analytics as a competitive advantage could be reluctant to disclose their initiatives, to prevent their competitors from copying their strategies.

For the purpose of this project, I will try to discover some commonalities and differences between the organizations under study that have achieved published success in analytics. I have consciously avoided organizations that were built around Big Data (Facebook, LinkedIn, Google, etc.), as their business model could be difficult to replicate and is so far less common.

3.2 C-PG: The case of Procter & Gamble (P&G)

Figure 11: Business Sphere rooms in P&G


With operations in more than 80 countries and 4 billion consumer touches every day, P&G has long been a leader in analytics. As stated in their innovation report3, the Business Sphere - a patent-pending system - is transforming the decision-making process by harnessing real-time data. The report claims that the system improves productivity and collaboration. Complex data is visualized in the Business Sphere rooms and made available to the company's leaders around the globe, driving quick and actionable insight. The report also claims a 25% reduction in inventory.

(Henschen 2011) has described the Business Sphere rooms as follows:

“With a business analyst at the controls, executives see a global map of markets growing or

shrinking compared with expectations, and they can drill down to the countries and categories,

which range at P&G from laundry detergent and shampoo to diapers and potato chips”

The business sufficiency project at P&G allows its leaders to know:

1. What is happening now (i.e. sales, inventory, market share, etc.)?

2. Why is something happening (country sales, marketing campaigns, store level, products)?

3. What actions to take (pricing, product mix, what-if scenarios)?

The “Goldmine” conference is an analytics conference hosted by P&G, to which it invites several noncompetitive companies to share experience, along with academics and industry leaders. After attending the “Goldmine” conference, (Davenport 2013) reported that P&G CEO Bob McDonald said:

“We see business intelligence as a key way to drive innovation, fueled by productivity, in everything

we do. To do this, we must move business intelligence from the periphery of operations to the center

of how business gets done.”

(Davenport 2013) stated that Filippo Passerini, P&G CIO, renamed the IT department Information and Decision Solutions and outsourced all commodity functions.

In an interview with Passerini, (C. Murphy 2012) highlighted the collaborative decision-making environment of the Business Sphere rooms, which are equipped with video and real-time data along with analytics expertise.

In another interesting interview (Chui 2011), McDonald described how he personally follows the “consumer pulse” comments - a project that collects data from social media. This allows him to react to issues happening in the marketplace and provides insight into how to improve a working product. In the same interview, McDonald mentioned that analytics and digitization are touching almost every stage in the value chain: for example, downloading data from the manufacturing plants, the Control Tower project handling inbound and outbound transport, connecting with retailers in an automated way via GDSN4, the simulation of molecules in R&D, or virtual walls that simulate the store shelves.

3 http://www.pg.com/en_US/downloads/innovation/factsheet_BusinessSphere.pdf

Reflecting on his clear strategy of hiring analytically minded people, McDonald said:

“We needed people with backgrounds in computer modeling and simulation. We wanted to find people who had true mastery in computer science, from the basics of coding to advanced programing” (Chui 2011)

The case of P&G shows a high level of commitment to analytics by senior executives, with analytics touching every corner of the organization. It also shows a data-centric strategy for the organization. Moreover, it indicates a data-embracing culture supported by highly skilled data analysts. Data is being collected and analyzed at every stage of the value chain: the CEO listens directly to customers using social media, data is collected from manufacturing and sales, data integration runs from suppliers to retailers, and management puts data at the center of its business reviews.

3.3 C-OE: The case of Obama election campaign

This case shows how two years of data crunching by dozens of data gurus was leveraged to boost personal marketing - or what (Wadhwa 2012) called “political data science” - and how analytics helped drive the Obama campaign to win the presidential race again in 2012. Many of the campaign's secrets will surely not be revealed soon. In fact, much of the published information about the Obama campaign's use of technology was not made available until Obama was re-elected.

It was the second digital campaign for Obama, but this time the business intelligence department was five times larger than the previous one. The department had dozens of analytical positions - (T. Murphy 2012) listed titles like chief digital strategist, chief integration and innovation officer, director of digital analytics, and battleground states election analyst. This highlights not only the importance of the new data science and data scientists, but also the diversity of scope that might be created for this profession in the coming years.

In an interesting TIME report, (Scherer 2012) cited Jim Messina, Obama's campaign manager, saying after taking the job, “We are going to measure every single thing in this campaign”. His team started

4 Global Data Synchronisation Network.


early to consolidate databases of voters and donors; they were able to microtarget voters and predict answers to several questions, like:

Who was going to vote for Obama? Who was going to vote for Romney?

Who was reluctant? Who would not vote at all?

Who would vote if approached?

Which types of people would be persuaded by certain kinds of appeals? (Scherer 2012)

In an MIT Sloan interview, Andrew McAfee, Principal Research Scientist, said:

“I would hope that it becomes increasingly clear that an [analytical] style is increasingly superior

to the pundit style of decision making,”…“I am not saying that intuition doesn’t exist or is bad or is

wrong; our brains are really wonderful computers. - and our tool kit for doing that is really good

right now — we do not need a balance between intuition and being data driven. We need about a

hundred percent market share of the latter.”(Ferguson 2012)

Although McAfee's comment about one hundred percent analytics could be seen as exaggerated, it truly reflects a new reality in which data can give better insight than gut feeling in many situations. The Obama campaign was able to raise $1 billion, of which 50% was raised digitally, and he was able to win the digital race again.

Once again, one can observe a commitment from the campaign executives to data-driven programs to measure everything in the campaign. Several articles discuss the mathematical modeling used, the database consolidations conducted, the social media footprint, and the sophistication of this campaign. Two years of preparation and execution, fully qualified talent - including Chris Hughes, the co-founder of Facebook - and full commitment to analytics drove the Obama campaign to the success we know.

3.4 C-GE: The case of GE

GE, the world leader in industrial technology, is not only a consumer and practitioner of Big Data, but also an investor in technologies that will enable it. GE has invested more than $100M in Pivotal, a new analytics and cloud company created by EMC and VMware, to accelerate new analytic services. This is almost 10% of the company's market value. In the era of Big Data, the Internet of Things or the Industrial Internet, GE expects to aggregate data from machines to create value for the customer (Vellante 2013).

The company is increasingly embedding sensors in an array of “things that spin” to improve machine performance. GE's vision is to connect the world's machines together, leverage the power of analytics and connect people anytime to create more intelligent operations and designs that provide a high quality of service. The figure below shows GE's estimate of the Industrial Internet's potential to leverage Big Data to optimize individual sectors over the coming 15 years (Evans & Marco Annunziata n.d.).

Figure 12: Example of 1% saving across sectors (Evans & Marco Annunziata n.d.)

(Davenport & Dyché 2013) highlighted that GE is recruiting roughly 400 data scientists and developing a special program for them. GE is focused on optimizing service and maintenance intervals for its products. In an interview with Bill Ruh, Vice President and Corporate Officer for GE's Global Software Center, (Davenport & Dyché 2013) quoted Ruh saying:

“We’re making a big bet on Big Data,” says Bill Ruh from GE. “With that said, the pilot projects

we’ve put out there have solved some big problems already. Our early proof-points were important.

Now we’re moving forward with even more complex problem sets. We’re making this a part of

everything we do.

Our sensors collect signals on the health of blades on a gas turbine engine to show things like

‘stress cracks.’ The blade monitor can generate 500 gigabytes per day—and that’s only one sensor

on one turbine. There are 12,000 gas turbines in our fleet.” (Davenport & Dyché 2013)
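A quick back-of-the-envelope calculation puts those quoted figures in perspective. The sketch below uses only the numbers in the quote and assumes one blade monitor per turbine:

```python
# Rough fleet-wide data volume from the figures quoted above:
# 500 GB/day from one blade monitor, 12,000 gas turbines in the fleet.
GB_PER_MONITOR_PER_DAY = 500
TURBINES_IN_FLEET = 12_000

daily_gb = GB_PER_MONITOR_PER_DAY * TURBINES_IN_FLEET  # 6,000,000 GB/day
daily_pb = daily_gb / 1_000_000                        # decimal petabytes
print(f"{daily_pb:.0f} PB per day from a single sensor type across the fleet")
```

On these assumptions, a single sensor type alone would generate on the order of 6 PB of raw data per day, which illustrates why GE treats this as a Big Data problem.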

In GE's case, one can see that GE is not only a consumer and practitioner of Big Data, but is also taking Big Data strategically and investing in other companies to boost what GE calls the Industrial Internet. The company is hiring data scientists into its Global Research organization. The company's software science and analytics website articulates the purpose of its initiatives:


“We develop advanced computing and decision-making tools to analyze, interpret and utilize data,

creating software systems, solutions and architectures that will change the way our customers

create, deliver and manage their businesses”5

One can argue that GE is not only consuming Big Data, but is also enabling more of the Big Data created by the Internet of Things. The investment in both its research center and Pivotal shows high-level sponsorship for analytics. The commitment to analytics is high both in what the company produces (machines) and in how it produces them (operations and services). It has established an impressive analytics department and is very serious about hiring data scientists.

3.5 C- WM: The case of Wal-Mart

Wal-Mart was using Big Data even before the term was coined. In 2004, Wal-Mart predicted that beer sales, rather than obvious items like flashlights, would increase to seven times their normal level while Hurricane Frances was on its way. Trucks sped to restock stores with products that sold quickly (Hays 2004).

According to the Economist, Wal-Mart, the largest retailer, was handling more than 1m customer transactions every hour (Economist 2010). Its famous and continually refined system, “Retail Link”, has been used by suppliers from all over the globe since 1991. Retail Link is used by suppliers to record sales, to trigger inventory reorders and to manage their own supply systems once an item is scanned at the cash register. Using data from all suppliers, Wal-Mart demands that suppliers drop their prices year over year and replaces their products if suppliers cannot keep up with this requirement. Suppliers also need to be quick to remain candidates for Wal-Mart: Wal-Mart forced Levi's to replenish within two days instead of the five days Levi's takes with others. Some claim that Wal-Mart was one of the reasons behind low inflation in the US due to its strategy of squeezing suppliers' prices (Fishman 2003).
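The scan-triggered replenishment loop described above can be sketched in a few lines. The SKU, threshold and quantities below are hypothetical and do not describe Retail Link's actual design:

```python
# Minimal sketch of a scan-driven replenishment trigger, loosely modeled on the
# Retail Link description above. All names and thresholds are hypothetical.
REORDER_POINT = 20   # on-hand units at or below which a supplier order is raised
REORDER_QTY = 100    # units requested per replenishment order

stock = {"SKU-001": 22}   # current on-hand inventory
pending = set()           # SKUs with an outstanding replenishment order
orders = []               # reorder messages that would go to the supplier

def on_scan(sku: str) -> None:
    """Called each time the cash register scans an item at the till."""
    stock[sku] -= 1
    if stock[sku] <= REORDER_POINT and sku not in pending:
        pending.add(sku)
        orders.append({"sku": sku, "qty": REORDER_QTY})

for _ in range(3):        # three sales bring SKU-001 down to 19 units
    on_scan("SKU-001")

print(stock["SKU-001"], orders)
```

The `pending` set mirrors the idea that a supplier manages its own replenishment once notified: the same SKU is not reordered again while an order is outstanding.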

In its 2012 annual report - reporting $443B in revenue - Michael T. Duke, Wal-Mart's President and CEO, stressed the customer-focused culture as a key strategy for his giant organization. Duke referred to the world-class analytics developed by the Global Customer Insight Group as a mechanism to identify customer trends and support marketing decisions (WalMart 2013). (Economist 2010) quoted Rollin Ford, the CIO of Wal-Mart, saying:

5 http://ge.geglobalresearch.com/technologies/software-sciences-analytics/


“Every day I wake up and ask, ‘how can I flow data better, manage data better, analyse data better?’”

Wal-Mart's analytics DNA has also evolved with time: data and analytics are no longer limited to the backward or forward supply chain, nor to data sharing, inventory management, truck scheduling or price information.

In 2011, Wal-Mart launched @WalmartLabs and acquired Kosmix, an analytics company in the social media and mobile space. Several applications were developed, and data sources have expanded to include data from social media. Wal-Mart collects online data about what customers are saying and approaches them with product information and discounts. Wal-Mart applications can help its customers buy presents for their Facebook friends based on their hobbies and interests. Some of these applications use the concept of crowdsourcing, where people pitch their own products in front of a large audience and the best products are sold at Wal-Mart (Rijmenam 2013).

One can quickly appreciate Wal-Mart's long-standing commitment to data and analytics. A data-driven supply chain started early in the company to optimize the most important area for a retailer: inventory management. Price setting was another key area where data was leveraged for comparison and bargaining. Wal-Mart has entered the social media game as well and leveraged data to create more value for both the company and its customers.

The commitment from C-level management toward data is also high, and the company has established a business unit and acquired another company to leverage the new data sources created in the social media era.


4 Methodology

This chapter describes and justifies the research approach chosen and the data collection implemented, and outlines the analysis methods selected while highlighting the limitations of the research.

4.1 The study purpose and the research objectives.

The main aim of this exploratory project has progressively transformed over the course of the study;

it has been evolved from answering generic questions around the being of Big Data into more

focused and practical questions about the maturity of analytics in organizations, with a hope to best

serve the management practice.

The purpose of this study is to understand how to measure an organization's effectiveness in the area of Big Data analytics and to gain more insight into the topic of Big Data. The objective is to suggest a practical framework or model that organizations can use to evaluate and guide their performance in Big Data analytics.

The main question for this study is:

“If Big Data analytics is considered by many observers a potential differentiator for certain companies, can we build a framework for measuring a company's performance in Big Data analytics?”

The following open-ended questions were framed as theme topics put to subject matter experts in order to answer the research's main question.

1. What are the common characteristics of organizations that have maximized the value of data?

2. How do companies make sure they maintain analytics competencies?

3. How do companies know that the Big Data analytics programs are aligned with the

corporate objectives?

4. If Big Data sources and consumption span the corporate value chain, what should

organizations change or keep in their structures to make sure that all business units can

benefit from analytics?

5. What do you suggest as KPIs to measure corporate competencies in analytics?

The following map was also used as an aid for the interviewer to probe further questions related to elements of organizations; the 7S framework was used as a guide for the question themes (Mintzberg et al. 1999). The framework was also relatively descriptive of the patterns discovered during the case studies, as explained in the data analysis section.

Figure 13: Themes printed for the interviewer to guide the discussions.

4.2 Research Choices

4.2.1 Research Approach and Design.

As discussed in the literature review chapter, the topic of Big Data analytics is relatively new and surrounded by a great deal of excitement and, sometimes, exaggeration. It could be tricky to categorize this research as either deductive, the verification of a theory, or inductive, the generation of a theory (Cohen et al. 2007); while there are several performance management and key performance indicator theories and methods in the literature, those theories are not generally Big Data analytics-centric, and the pre-adoption of a certain theory could narrow the view of analytics performance dimensions. It was felt that the grounded approach, although time consuming, could produce better content. The research therefore took the approach of collecting primary and secondary data first, to get a feel for the topic under study and its performance management components in organizations. The outcome was then compared with some existing performance management theories. Hence, while the research could be classified as inductive due to the absence of a foundation theory, it involved a back-and-forth process of induction and deduction. For example, in the early stages of the research, the author did not have any particular theory in mind specific to Big Data performance; however, the author had some understanding of theories around shareholder value, the principal-agent framework, stakeholder theory and the balanced scorecard, but all were kept for later review after the case studies and interviews due to the general nature of these theories. After the coding phase and interpretation of the case studies and interviews, the author found some similarities between knowledge management and Big Data management, and hence the literature was revisited to compare and contrast the available theories around knowledge management.

One can argue that the categorization of this study is of less practical value; however, it could be beneficial to the reader to explain the research approach in order to navigate the material and see how the author built the answers to the research questions under study. The literature review was followed by four cases of organizations that achieved worthy outcomes leveraging analytics, with the objective of extracting insight into how those companies excelled in analytics; the author then interviewed subject matter experts in the analytics field. The outcomes of both the case studies and the interviews were assembled and synthesized to come up with a broader analytics measurement framework that organizations might use to monitor their Big Data analytics competencies.

4.2.2 Research Methodologies and Methods

(Hussey & Hussey 1997) define methodology as the way the research is approached, from the research foundation theory to the collection of data and the way data is synthesized. Methods can be described as the means by which these data are collected and analyzed (Collis & Hussey 2003).

For the purpose of this study, a multi-method qualitative study was selected, combining semi-structured interviews with case study documentary review.

Qualitative research is a term used to refer to analysis whose findings cannot be quantified using quantitative analysis - in other words, analysis done in a non-numerical way (Carl McDaniel & Gates 2012). (Maanen 1983, p.9) defines qualitative methodology as “an array of interpretive techniques which seek to describe, decode, translate and otherwise come to terms with the meaning, not the frequency, of certain more or less naturally occurring phenomena in the social world”.


The Big Data performance management topic is relatively under-researched in the management literature, and there is little evidence of established theories. (Saunders et al. 2009, p.482) argued that ‘the more ambiguous and elastic our concepts, the less possible it is to quantify our data in a meaningful way’, and therefore qualitative analysis could be a better strategy. An enormous part of the researched writings and articles focuses on a single project delivered by a certain organization to answer a specific question (e.g., why do we have a high churn rate? how can we increase profit on certain products? how do we integrate social media feeds into the organization's CRM?) rather than looking at the holistic interplay between Big Data performance measurements and corporate goals and objectives; therefore the project is exploratory in nature, seeking new insights, and grounded theory was selected as the research strategy. (Saunders et al. 2009) argue that there are typically two ways of conducting such an exploratory study:

A search of literature

Conducting individual and group interviews.

(Cooper & Schendel 1998) agree that an exploratory study is likely to include qualitative research. Grounded theory is best used with an inductive approach and is particularly helpful where the research emphasis is on developing and building theory (Saunders et al. 2009).

In this research, the semi-structured interview was selected for the exploratory part, as the problem of Big Data analytics in management is relatively new and little research has been conducted on it (Creswell 2009). (Saunders et al. 2009) suggest that in-depth interviews can be very helpful to find out what is happening in an exploratory study, and that semi-structured interviews can also be helpful. The author selected the semi-structured interview for the following reasons:

The research is dominantly exploratory, with an explanatory element in the cases section.

The flexibility the method provides to deviate from the essential questions when needed.

The question themes are open-ended and sometimes complex.

The question order could be changed, and throw-away and probing questions could be added.

4.2.3 Data Collection Techniques

The research conducted by the author took a mixed approach: a documentary review of successful case studies of organizations that excelled in analytics, along with three interviews with subject matter experts who have either worked in analytics to boost the performance of their own organizations or helped other organizations plan and execute analytics programs.


For the first part, the research considered four organizations about which information was publicly available and whose achievements in analytics were considered successful by many observers and commentators. A free-style internet search was conducted to find more information about those companies from the following sources:

Published interviews with key persons in the organizations under study, from newspapers, books, reports or YouTube videos.

Annual report statements which either show the desire to understand more or describe bold steps, like building a large analytics business unit (the case of Wal-Mart) or mergers and acquisitions (the case of GE).

News articles addressing successful implementations of analytics programs.

Organizations' speakers at data science forums.

For the second part, it was not easy to find experts who know Big Data from both a theoretical and a practical point of view. There are plenty of individuals who understand business intelligence and are experienced in the traditional way of doing analytics. There is also a good number of individuals who have done some data mining from a technical point of view; however, the number of profiles combining an understanding of Big Data and its organizational impact with good experience in the field was limited, according to my research. It took LinkedIn searches and several email exchanges to identify the interviewees (no replies in some cases, apologies in others). The author finally managed to agree with three subject matter experts to speak about their experience in this space; their profiles are listed below.

Code          Title                                       Company                       Location
I-CTO-BDV     CTO, Information Management and Analytics   Confidential Big Data Vendor  US
I-DS-Pivotal  Data Scientists Lead, EMEA                  Pivotal                       UK
I-CIO-QF      Ex-CIO                                      Qatar Foundation              Middle East

Interviews were conducted face-to-face and via internet video conference. In all cases the interviewees were asked for permission to record the audio, and all agreed. The main interview question along with the question themes were sent to the interviewees prior to the interview, and transcripts were sent to them afterwards for review. The author asked the interviewees for their preferred schedules in order to guarantee one hour for each interview, and the timing was mutually agreed.


At the beginning of every interview, the author presented a five-minute introduction on the purpose of the research and what it tries to accomplish, along with a brief on what would happen after the interview. Although the questions were designed to be open-ended, the author several times had to rephrase a question and improvise to gain more insight into a certain idea or topic, and this proved helpful.

On several occasions, the interviewees moved a little beyond the scope due to the multidimensional facets of Big Data. In most of these cases the insight was inspiring, and the author tried to capture it as soon as possible before steering the discussion back to the project scope. On several occasions more elaboration was requested by asking for real-life examples, and the author tended to summarize his understanding after every main theme to confirm the main ideas.

4.2.4 Data Analysis

As discussed in the research design section, the grounded theory approach was selected for this qualitative research, and both case studies from published data and interviews with subject matter experts were conducted, as suggested by (Strauss & Corbin 1998) and (Glaser 1967). (Saunders et al. 2009) argued that qualitative data collection has implications for the way the data is analyzed, and the researcher will most likely summarize, categorize and restructure the data to come up with meaningful analysis. All interviews were recorded and subsequently transcribed in a Word document and then sent to interviewees for comments.

The author listened to the audio recordings and read the written transcripts several times while coding the data, applying a brief description to each segment of the transcribed documents.

The data and input collected were analyzed to identify themes and patterns, which were then compared with the literature to ensure that the patterns were at least associated with the literature findings. The process of coding started with some ideas of what the patterns might look like, inspired by the literature review and the research on the case studies. The Mintzberg 7S general framework was also considered in both the semi-structured interviews and the description of the text at hand. However, the lack of related theories in the area of Big Data performance management did not provide a solid framework of categories.

The initial data analysis stages involved breaking down the data collected from the case studies as well as the interviews into units, and significant categories were identified and chosen by the author - what is called “open coding”. “Axial coding” followed, to analyze the relationships between the identified categories in order to come up with main themes and sub-themes.
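As an illustration only (the actual coding in this study was done by hand on the transcripts), the bookkeeping behind open and axial coding can be sketched as a small script; the segments, codes and themes below are hypothetical:

```python
from collections import defaultdict

# Open coding: each transcript segment receives a brief descriptive code.
# (Hypothetical segments and codes, for illustration only.)
open_codes = {
    "We hired a team of statisticians last year": "talent-acquisition",
    "The CEO reviews the dashboards every Monday": "executive-sponsorship",
    "We started with one pilot on churn data": "pilot-projects",
    "The board approved a dedicated analytics budget": "executive-sponsorship",
}

# Axial coding: related codes are grouped under broader themes.
axial_themes = {
    "talent-acquisition": "People",
    "executive-sponsorship": "Leadership",
    "pilot-projects": "Process maturity",
}

# Assemble the codebook: theme -> list of supporting segments.
codebook = defaultdict(list)
for segment, code in open_codes.items():
    codebook[axial_themes[code]].append(segment)

for theme in sorted(codebook):
    print(f"{theme}: {len(codebook[theme])} segment(s)")
```

The point of the sketch is the two-step mapping: segments to codes (open coding), then codes to themes (axial coding), with the codebook recording which evidence supports each theme.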


At the beginning, the author created a codebook and tried to come up with codes and categories (themes and sub-themes). After repeated investigation of the codebook, it was clear that there were some inconsistent inputs, or at least relations between inputs within the same interview that were not easy to justify. Further examination of the codebook and extra reviews of the transcripts suggested that the interviewees may have been referring to different characteristics during different phases of organizational maturity in analytics; i.e., the characteristics and requirements for organizations to succeed and excel might differ in each phase. The coding was therefore reviewed again to take the phase factor into account.

The outcome of the interviews motivated the author to conduct yet another round of literature research, looking for similar fields of technology with better coverage of performance management; this is explained later in the discussion and analysis.

4.3 Limitations

Like any primary research, this study has its own limitations. Some limitations relate to the “research design” and others to the “data collection” implemented in this research.

This research was limited to selected industry subject matter experts along with purposefully selected case studies. The subject matter experts do not represent the opinions and observations of the whole analytics industry. Although the cases were deliberately selected from different sectors, many other sectors are still missing, and even within the same sector, different business models can lead to slightly different conclusions.

As discussed in the methodology chapter, exploratory inductive research is typically associated with a grounded theory which, despite the sample size and sample quality, needs more investigation in order to claim generalisations. The impracticality of a larger sample size, both for the case studies and for the subject matter experts, could challenge the generalisability of the findings of this study; however, the objective of achieving usable results is felt to be reachable with the expert sample.

There is also an intrinsic risk that the researcher’s knowledge of the area of Big Data could partially influence the questions asked of the interviewees, which could cause some insights from the participants to be set aside (Strauss & Corbin 1998).

The time validity of this research is not discussed, and the way technology is built, managed and consumed could change over time, creating new ways of managing it and therefore new ways of measuring its performance. For example, the nurturing of data scientists was stressed, but the future could reveal that the analytics performed by those data scientists can be automated.


4.4 Conclusion

The purpose of the research is to understand the topic of Big Data and to suggest a practical framework for measuring Big Data analytics in organizations. The literature review led to a better understanding of Big Data and also highlighted an under-researched gap related to how organizations can measure the performance of Big Data analytics. The selected case studies painted some common patterns among organizations that have unleashed the potential of analytics. This was followed by interviews of subject matter experts, which opened up other interesting facts related to the characteristics of these organizations as well as the time dimension in organizations’ journeys toward analytics. A well-defined research design was conducted implementing “grounded theory”, adopting a coding procedure in order to arrive at themes representing the foundation of the new theory.

5 Discussion and Analysis

The author interviewed three subject matter experts in the area of Big Data and analytics during the research period, combined with case studies from leading analytics organizations.

The main question was:

“If Big Data analytics is considered by many observers as a potential differentiator for certain companies, can we build a framework for measuring a company’s performance in Big Data analytics?”

5.1 Big Data and Organizations

There is no doubt that Big Data was the key buzzword of 2013 in business information and information technology, and it is most likely to stay just as hot in 2014. By August 2013, see Figure (14), Big Data had reached the top of the Gartner hype cycle for emerging technologies, moving some steps up from Figure (1), when I started the research in late 2012. With technology hypes, there is some confusion around the best way of leveraging the hype and making sure that business money is spent on the right thing for the right reason.

It is also imperative to appreciate that data touches every part of contemporary organizations’ value chains, as explained in the literature review. The data could be generated from smart machines in the production line, customers’ smart phones and social media comments, smart workers’ and employees’ desktops, fleet sensors, surveillance, web clicks, etc. Companies will act differently toward this information overload: some will keep measuring past business performance; others will extend that to predict and optimize future products and services, sales channels and optimal prices; other companies will even take bold steps to extract insight out of mountains of data to optimize their internal operations and understand market dynamics and forces.

Figure 14: Gartner Hype Cycle for emerging technologies as of Aug 2013

Frank Gens (2012) of IDC predicts that “Vendors' ability (or inability) to effectively and aggressively compete on the 3rd Platform will reorder leadership ranks within the IT market and ultimately beyond it within every other industry that uses technology”. For IDC, the third platform is a hyper-disruptive technology platform which includes “Cloud, Big Data, Mobile and Social”.

Figure 15: IT Industry 3rd platform of growth and innovation (Source: IDC)

All the interviewees stressed the importance of the new technology wave and the impact it could have in creating value for both organizations and customers. While some interviewees argued that the triggers of Big Data initiatives are most likely defensive in nature (regulation, competition, risk, cost, etc.), others stressed the offensive nature of the drivers for Big Data initiatives (more market share, better consumer understanding).

The case studies show examples of successful turnarounds in the way organizations do business. Big Data can help guide organization strategies; as I-CTO-BDV said, “You don’t need a Big Data strategy, you need a strategy that incorporates Big Data”. Big Data can help optimize operations, as P&G has been doing; optimize inventory management, as in the case of Wal-Mart; understand targeted customers, as in the case of the Obama election campaign; and optimize products and services, as in the GE example.

5.2 Different stage, different measurements

Unlike the case studies, where C-level commitment was obvious, interviewees’ observations of organizations that they worked with suggest that Big Data initiatives were typically driven bottom-up rather than top-down. I-DS-Pivotal noted that they were typically driven by customer-facing units (i.e., marketing, customer relationship management, etc.). However, the interviewees agreed that C-level support makes an initiative more likely to succeed.


Further investigation of the interviews and case studies suggests that the context of the observations is highly dependent on the stage of development and on how data-centric the organization under discussion is.

The author believes that the above observation could be significant for several reasons:

1- It can resolve some conflicts between interviews, between interviews and case studies, and sometimes within the same interview. For example:

While one interview suggests that C-level executives are typically not the ones who drive Big Data initiatives, it also suggests that a data culture has to be there and has to spread across the whole organization, with incentives to encourage this culture. In the interviewee’s wording: “Information is power, the moment you give up your data and allow others to look at it, you lose this power…organizations have to have common shared goals behind the vision on what they are trying to do in the company. Not only shared goals, but also incentivized. The less incentivized the people are, the less people want to give up some power and it becomes harder to give up this power”. This is something that typically occurs with executive sponsorship.

In another interview, the interviewee agrees on the typical bottom-up approach of driving data projects, but at the same time proposes a bold assertion: “You don’t need a Big Data strategy, you need a business strategy that incorporates Big Data, that is a big difference”.

On the other hand, almost all the case studies suggest high-level commitment from executives. Therefore, the distinction between organizations could be beneficial and more practical, as it could be misleading to measure a data-driven organization with the same metrics as want-to-be organizations.

2- The observation suggests that, as with measurements of previous trendy technologies, different stages might require different measurement approaches. For example, Lopez (2001) suggested that the five development stages of knowledge management shown in Figure (16) require different approaches to measurement.


Figure 16: The stages of KM development (Lopez 2001)

3- Similar to knowledge management stage 1, two interviews suggest the necessity of a Big Data advocate within the organization in the early stages. In one interview, an example of an advocate was given as an individual who is “absolutely passionate about what Big Data can do and he has enough observations points that he is not afraid to stand in front of a crowd and make some strong statements even at the risk of his own career”. In the other interview, the advocate was a global risk department.

Perhaps it is impractical to apply concrete metrics equally to organizations that have not yet embraced the data culture and those that have achieved a certain degree of maturity. It is also unlikely that the advocate(s) will have the power to enforce such metrics in the early phases of development.

The author divided the codebook into three phases in order to distinguish the measurement of Big Data performance at different stages of maturity, as shown in Figure (17).

Figure 17: Stages used by the author in the current study

Future research might test the suitability of the APQC knowledge management development stages shown in Figure (16) and their applicability to Big Data, but this is beyond the scope of this study: the purpose here is not to construct perfect development stages for Big Data in organizations but rather to measure its performance. The author has proposed three simple stages in Figure (17) to stress the need for different measurements through the different phases of Big Data that organizations could go through.
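The three stages and the measurement focus this chapter associates with each can be summarized as a small lookup structure. This is a reading aid rather than part of the author's framework; the one-line focus descriptions are the present editor's condensation of the sections that follow.

```python
# Hedged sketch of the three stages of Figure (17) and the measurement
# focus discussed for each; the focus strings are condensed summaries,
# not the author's own wording.
STAGES = ("start", "transformation", "maturity")

measurement_focus = {
    "start": "sponsorship, political support, skills and technical readiness",
    "transformation": "balanced scorecard across four perspectives",
    "maturity": "analytics embedded in business performance management",
}

def focus_for(stage: str) -> str:
    """Return the measurement focus for a given maturity stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return measurement_focus[stage]

print(focus_for("transformation"))
```

The point the structure makes explicit is the chapter's central claim: the metric set is a function of the stage, so an organization must first locate itself before choosing measurements.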


5.2.1 The Start

One can argue that common sense suggests that more information will most likely lead to better decisions, and therefore analytics should be at the top of the agenda for many executives. However, most of the interviewees’ observations do not align well with this view.

There is often a trigger that causes the interest in Big Data analytics to appear. This trigger could be a strong advocate, a marketing business unit, or customer relationship management, and sometimes it is more defensive in nature (for example, regulations that need to be enforced by the risk department).

In the case of the advocates (whether a business unit or an individual), the advocate usually has the energy, vision and faith in Big Data that inspire others and push the initiatives forward. He or she is typically focused on success stories and on getting a business win or curing an existing pain in order to prove the point.

In one of the interviews, this person is “more laser-focused on the business to show, demonstrate and deliver quantifiable business value”. This observation, however, should not be confused with a total business transformation; it should be understood as an attempt to prove a case. This person or unit works hard to create coalitions and to search for others who share the same beliefs and are willing to help.

It is also expected that this advocate will inevitably face political resistance. Big Data analytics requires diverse sources of data that are typically scattered across the enterprise, and the ability or inability to acquire this data might lead to the success or failure of the business win that the advocate is trying to achieve. Quoting from one of the interviews: “In organizations, knowledge is power and therefore, the data you have gives you power, it provides you with insight into business that others don’t have. The moment you give up your data and allow others to look at it, you lose this power”.

The author’s personal experience strongly supports this idea. In one of the largest banks in the Middle East, there was an advocate who wanted to push the idea of Big Data, and he worked with the author to bring in a data scientist to demonstrate one business case of Big Data (enhancing the process of “loan prospects” based on more sources of data and more attributes of the customer). The project faced enormous political resistance from several business units. It sometimes caused a lot of frustration for the people involved, as time was passing with little outcome after several failed attempts to acquire the required data. The persistence of this advocate, together with his political understanding of the bank and its organizational structure, got things started after a few setbacks.


While the trigger typically comes from a non-C-level champion, the interviews suggest that C-level sponsorship provides a better chance of success. Once again, the champion needs to realize the importance of political understanding.

“Those people might not be a C-Level people. However, the ones that are successful are the ones

who have strong relations with the C-Level people, who can carry that message forward and get

them to move forward” said I-CTO-BDV.

Most of the interviews, as well as the case studies, implied the necessity of a data culture. For example, one of the interviews suggests that data-driven companies that demonstrate competencies and create value from analytics should have the following traits:

• A “strong desire to let the data tell them what is going on”.

• “The willingness to act on what the data tells them”.

Perhaps, to get started with Big Data, it is risky to begin by measuring the direct correlation between overall business value and the Big Data analytics initiatives; however, some value-added results could be leveraged to influence key sponsors and start pushing the analytics culture. One can assume that this cultural transformation is difficult to achieve in the very early stages of Big Data, and jumping straight to measuring overall business value could be challenging.

Potential measurements in the start stage could reflect some of the above observations. Tangible and intangible performance measurements might need to be observed around several stakeholders (employees, management, business customers), and those measurements could be quantitative or qualitative, for example:

• Sponsorship and support. One should be able to observe and measure how much progress is being made in developing and growing sponsorship inside the organization, for example:

o Number and profile of people recruited (supporters, champions, project sponsors, influencers, etc.). Communications frameworks can be used to keep communications flowing in order to keep these stakeholders informed and increase support (for example, the CAIRO framework).

o How often Big Data and analytics are discussed in front of executives.

o How much funding is available for the pilot projects.

• How much quantifiable business value could be achieved? As stated before, these value-added results and success stories can help in getting more support inside the organization and driving more of a data culture.

• Political support and political resistance. This part should not be underestimated, and coalitions, teamwork, information-sharing flexibility and collaboration should be measured and observed.

• Skills available to move the projects forward.

It could also be important to have a strong understanding of the “technical” readiness of the organization (some aspects were addressed in the literature), for example:

• Data scientist skills available inside the organization or available through contractors.

• Number of data sources.

• Privacy and regulation around user data.

• ..
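The start-stage measurements listed above could be tracked as a simple checklist per area. The sketch below is purely hypothetical: every metric name and value is illustrative, not drawn from the study's data, and is only meant to show how the quantitative subset of these observations might be recorded.

```python
# Hypothetical start-stage measurement checklist; metric names and
# values are illustrative only, not taken from the study.
start_stage_metrics = {
    "sponsorship": {
        "recruited_supporters": 5,
        "executive_discussions_per_quarter": 2,
        "pilot_funding_usd": 50_000,
    },
    "business_value": {"quantified_wins": 1},
    "political_support": {"units_sharing_data": 3, "units_resisting": 2},
    "technical_readiness": {"data_scientists_available": 1, "data_sources": 4},
}

def readiness_summary(metrics: dict) -> dict:
    """Count how many tracked indicators exist per observation area."""
    return {area: len(values) for area, values in metrics.items()}

print(readiness_summary(start_stage_metrics))
```

Even such a crude tally makes blind spots visible: an area with no tracked indicators signals that the advocate has not yet found a way to observe it, which is itself useful information at this stage.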

The author also believes that an overall understanding of the organization’s attitude toward new technology and information systems in general could be of great help in understanding the organizational culture and the ease of change. For example, the Nolan & McFarlan (2005) IT grid shown in Figure (18) could help both the advocates and the company to evaluate the company’s readiness to accept the idea of Big Data and, therefore, its chance to succeed.

Companies in the strategic and turnaround modes could be more likely to accept innovation than those in the factory and support modes, due to their organically innovative nature, ratio of technology spending to capital expenditure, executive support, etc.


Figure 18: The IT Strategic Impact Grid (Nolan & McFarlan 2005)

5.2.2 The Transformation

It is assumed at this stage that proper funding has been allocated to analytics and that formal implementation of Big Data analytics inside the organization has been given the green light, with a fair amount of support from several sponsors, including C-level.

At this stage, it is also assumed that the organization would like to build and retain its competencies in Big Data analytics as an inimitable asset, rather than keeping it as an adventure led merely by some advocates.

Regarding the objective of building and retaining competency and avoiding the loss of “corporate memory”, interviewees proposed different approaches while highlighting the difficulty of achieving this due to the limited number of skilled people in the Big Data space and the exposure associated with skills turnover.

Answering a question about the exposure that organizations might face if experienced data scientists leave, one interviewee suggested that “most of the organizations have not thought beyond the individual, and they are very exposed if this happens”. The interviewee went on to suggest building a center of excellence (COE), where the best practices, trainings, methodologies, data scientists and other competencies are kept. In other words, it suggests a centralized Big Data structure within the organization that is in charge of transforming the organization.

The other interviewee agreed on the difficulty of retaining competencies inside the organization rather than within individuals: “At the moment, there is a lot of focus on the hiring, but if all this knowledge stays within the data scientists, this defeats the purpose of Big Data”. The interviewee also implicitly suggested some extended responsibilities for data scientists, “One of the key tasks of the data scientists team is to help convert the rest of organization into little data scientists”, but proposed a different structure for the data scientists team: “Ideally, you need to have data scientist right next to you to help you understand the data right here, right now. Most executives have a personal assistant, maybe a finance person, maybe a local HR person, never mind there is a larger HR business unit working with them, in the same way, it makes sense to have local data scientist expertise, one or more and spread out through the whole organization”. In other words, this suggests a decentralized team.

While the ultimate argument of the interviewees was to stress the importance of nurturing a data culture as intellectual capital in the organization, along with the vital role of data scientists, they suggested different approaches to getting the organization, rather than merely the individuals, to learn. In brief, the structures proposed were a centralized team of data experts versus a decentralized team.

The author believes that both options could be appropriate for data-driven organizations, and both could achieve the stated objective of spreading the data culture. However, the centralized approach (i.e., the center of excellence) might be more practical at the transformation stage. I argue that at this stage, the organization has already started to be enthusiastic about the Big Data transformation and the sponsoring senior executives are keen to see more evidence of business value. The sponsors want to keep communications flowing about the new pilot projects and want to hear the good news, the challenges and the lessons learned in order to provide better support. The center of excellence could provide a better-structured way of initiating new projects across business units, capturing lessons learned, communicating progress and performance measurements, building skills inside the unit, and keeping communications flowing to the organization.

Perhaps Fairchild’s (2002) observation about the objectives of knowledge management measurement could be applied to the Big Data transformation stage: “to find out how well the organization has converted human capital (individual learning/team capabilities) to structural capital”.

The above observations about cultural readiness, human capabilities and retention, proofs of business value, and future readiness suggest that balanced measurements need to be established.


The measurements should not be restricted to direct financial results; they should also address intangible assets, culture, employees’ and management’s attitude toward data, and technical and infrastructure readiness. Kaplan (2010) highlighted the difficulties of measuring intangible assets in financial terms:

• The value is indirect. The direct impact of insight created by Big Data, technology or knowledge management is not always easy to quantify, as it typically occurs through a chain of cause-and-effect with several stages in between. For example, customer records, mobile apps or social media data can be used to send a greeting message to a customer as he passes by a restaurant. The customer might decide to dine in immediately, so profitability can be measured; dine in later, in which case the financial correlation is lost; or even feel uncomfortable about being tracked.

• The value is dependent on the organizational context. Perhaps the Nolan & McFarlan (2005) grid can clarify this concept. For example, data scientist knowledge in a small bakery shop is not as significant as the knowledge captured by airlines about passengers.

• The value of intangible assets is generally bundled with other assets. For example, data scientists will be of less value if they are not equipped with the right technology, access to diverse sources of data, and political support inside the organization.

The author believes that at the transformation stage, the organization needs a structured approach to measuring both tangible and intangible assets, and a balanced measurement should be adopted in order to track the success of analytics initiatives. These measurements could be owned and communicated by the center of excellence explained earlier, or sponsored as a corporate-wide initiative. The choice will most likely depend on several factors such as support, funding, capacity, access, etc.

5.2.2.1 BSC Review

The balanced scorecard (BSC) has been successfully adopted by several organizations to measure performance; it has also been used as a tool for translating the vision and for communicating and implementing strategy (Kaplan 2010). The BSC is a balanced mixture of financial and non-financial measurements that binds short-term activities to long-term objectives. It is also balanced because it not only measures historical performance but also tracks non-financial metrics that can predict future performance. Kaplan & Norton (1992) proposed four perspectives:

• Financial (i.e., profit, revenue, share price, etc.)

• Customer (i.e., how do customers see us?)

• Internal (i.e., what must we excel at?)

• Learning and growth (i.e., continuous improvement)

Figure 19: Balanced scorecard (Kaplan & Norton 1992)

Martinsons et al. (1999) applied the balanced scorecard as a foundation for strategic information management systems and also proposed four perspectives:

• Measuring business value.

• Measuring internal processes.

• Measuring user orientation.

• Measuring future readiness.

Van Grembergen et al. (2004) applied the IT BSC as a maturity model to determine the maturity of IT at a major Canadian financial group. Fairchild (2002) extended the concept of the BSC and applied it to the knowledge management field through two approaches, leveraging different subsets of the BSC perspectives.


In the first approach, Fairchild (2002) used the customer, internal and learning perspectives and mapped them to social collaboration, structural capital and human capital respectively. He added intellectual capital as a new perspective, which arguably can be combined with learning.

In the second approach, he suggested “viewing the role of KM in organizational strategy via management based approach, focusing on intellectual capital resources combined with business processes of the organization” (Fairchild 2002). In other words, he proposed embedding the KM measurements in the overall organizational BSC.

Knowledge management arguably shares some characteristics with the analytics field, for example:

• Knowledge management and analytics both constitute tangible and intangible assets.

• Both rely on human skills as well as technology engines.

• The impact of both is arguably dependent on cultural transformation.

• The value of both stems from capturing data, analyzing information and sharing insight.

In the following section, the author proposes the use of the BSC as a guiding tool to measure, communicate and guide the implementation of Big Data projects.

5.2.2.2 Balanced scorecard and Big Data

Perhaps at the transformation phase, analytics has not yet reached the mainstream inside the organization: management has not yet grasped the full vocabulary of data, as in the cases of P&G or Walmart; more external and internal data sources are still being added; and the core technology is in place but a large piece of integration work is still ongoing. As for data scientists, hiring and training are going forward but within the funds allocated for these initiatives; “You hire different kind of people, you train different kind of people, you bring more analysts and data scientists, you acquire more data” said I-CTO-BDV.

As discussed earlier, the center of excellence, or any entity in charge for that matter, should focus on different measurements, grouped in this project into four perspectives as shown in Figure (20):

1. Business value

This perspective answers the question of the contribution of Big Data analytics to key organizational objectives, through financial or non-financial measurements: for example, customer churn rate, customer understanding, customer segmentation, inventory and supply chain optimization, employee performance, etc. The higher the impact of analytics on organizational objectives, the better the chance of analytics growing within the organization.

This perspective could also include financial measurements related to the initiative, such as budgets and ROI analysis, or financial measurements related to the outcome of a certain project or projects.

2. Business units and internal users

This perspective might answer the question of how relevant analytics is to the various business units contributing to the value creation process of an organization. The measurements could be lead indicators, for example the number of data sources from applications integrated for the purposes of analytics, or lag indicators, i.e., the business units’ objectives that analytics helped to achieve or measure.

These measurements will help spread the culture of Big Data, gain more buy-in and avoid political resistance from within the organization.

I-DS-Pivotal suggested examples that could be linked to this perspective:

o “What % of decisions that you have to make at various level of the organization,

are driven by data”.

o “Show me the evidence of the analysis you have reached and show me how often

the decision you have actually made agrees with the recommendation that the

analysts have come up with”.

o “What % of the analysis is based on cross functions data and what % is based on

both internal and external? (Meaning traditional vs. non-traditional)”

3. Structural measurement

This perspective answers the question about the performance of the processes and systems needed in order to excel. “At some point after all the awareness and preparation is achieved and all are onboard, it is very important to translate plans into solid initiatives executed on the ground, BIG DATA related projects with clear/measureable objectives, timelines, and budget” (I-CIO-QF).

Perhaps this perspective could address questions about project prioritization, the operation and maintenance of Big Data applications, the degree of success in acquiring more data sources, privacy compliance, the time required to answer analytical questions, etc.

4. Future readiness

This perspective might answer the question of how the COE, or any entity in charge, will continue to create and improve value. This might include questions about the skillsets and motivational aspects inside the COE, the turnover rate for data scientists, the effectiveness of training and development for data scientists and internal users, and emerging technological opportunities and challenges.

Figure 20: Transformational stage BSC for Big Data
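The four perspectives above could be represented as a simple scorecard structure that a center of excellence might use to communicate attainment. This is a hedged sketch only: every KPI name, target and actual value below is a hypothetical illustration, not a measurement from the study, and the attainment formula is one simple choice among many.

```python
# Hypothetical transformation-stage scorecard for Big Data; all KPI
# names, targets and actuals are illustrative only.
scorecard = {
    "business_value": [
        {"kpi": "customer churn reduction (%)", "target": 5, "actual": 3},
    ],
    "business_units_and_users": [
        {"kpi": "decisions driven by data (%)", "target": 40, "actual": 25},
    ],
    "structural": [
        {"kpi": "data sources integrated", "target": 10, "actual": 7},
    ],
    "future_readiness": [
        {"kpi": "data scientists trained", "target": 6, "actual": 4},
    ],
}

def perspective_attainment(entries: list) -> float:
    """Average actual/target ratio for one perspective, capped at 1.0."""
    ratios = [min(e["actual"] / e["target"], 1.0) for e in entries]
    return sum(ratios) / len(ratios)

for name, entries in scorecard.items():
    print(f"{name}: {perspective_attainment(entries):.0%}")
```

Keeping the four perspectives as separate keys, rather than averaging them into a single score, preserves the "balanced" property of the BSC: a strong business-value number cannot mask weak future readiness.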

5.2.3 The Maturity

Perhaps a useful description of this stage is what I-CTO-BDV mentioned in his interview: “You don’t need a Big Data strategy, you need a business strategy that incorporates Big Data”. Likewise, the author believes that at this stage, organizations tend to embed analytics and Big Data within their business performance management and across the value chain. In other words, there is full organizational acceptance and exploitation of analytics. It is not an adventure anymore, as the enterprise depends on it.

Several quotes mentioned in this study support this proposition. For example, P&G’s McDonald mentioned that analytics and digitization touch almost every stage in the value chain: “We see business intelligence as a key way to drive innovation, fueled by productivity, in everything we do. To do this, we must move business intelligence from the periphery of operations to the center of how business gets done” (Davenport 2013). A similar statement was quoted earlier from Obama’s campaign manager: “We are going to measure every single thing in this campaign” (Scherer 2012). GE’s Bill Ruh is cited by Davenport & Dyché (2013): “Our early proof-points were important. Now we’re moving forward with even more complex problem sets. We’re making this a part of everything we do”. The Economist (2010) quoted Rollin Ford, the CIO of Wal-Mart, saying: “Every day I wake up and ask, ‘how can I flow data better, manage data better, analyse data better?’”

It could be challenging to verify all the evidence for McDonald’s claim about P&G, or for the other studied organizations, from the available public information, but the figure below is an attempt to visualize Porter’s value chain against the information captured in this dissertation.

Figure 21: P&G Value chain and data analytics

Figure (21) shows good coverage of data-dependent value creation processes across the value chain. The author believes that market-based-view frameworks (analyzing customer needs, suppliers, environmental and social factors, etc.) and resource-based-view frameworks (Porter’s value chain framework, the (Mintzberg et al. 1999) 7S framework, the (Kaplan & Norton 1992) Balanced Scorecard, and other frameworks about business models and business logic) could really uncover the differences between data-driven companies and others. The former incline to embed analytics in everything they do, as cited by several executives in this dissertation, helping them understand the world around them.

P&G is not an exception, but explicit, publicly available information was found about various projects supporting several primary and secondary activities. Walmart's “Retail Link” project also spans the primary activities, and is now supported by social crowdsourcing applications that feed it with more customer data.

It is worth mentioning that, once again, the observation of (Lopez 2001) about the “Institutionalize KM” phase of knowledge management also sounds applicable to analytics and resonates with the executives' statements quoted in this dissertation: “It does not happen unless KM is embedded in the business model”.

Revisiting the case reviews presented in Chapter 3, one can quickly internalize tangible differences between the characteristics of those organizations and those described in the previous sections. Some of these characteristics were also mentioned in the interviews. The data personalities span the whole organization and shape the way these companies do business, for example:

The analytics benefits have gone beyond proving a value; the value is already proven.

Walmart's CIO was quoted as saying that he always thinks about how he can “flow data better, manage data better, analyse data better” (Economist 2010). P&G's Business Sphere rooms enable the company to visualize, analyze and predict performance across the globe while answering questions about pricing and product mix. GE, the Internet of Things advocate, has been embedding sensors in its machinery, creating more data. Obama's second campaign was doing it for the second time, raising more funds and approaching more voters through the right channels.

Increasing the data sources while continuously increasing data value.

McDonald, the CEO of P&G, follows the consumer comments created on social media; the Obama campaign analyzed unstructured social media data along with the structured database it built about voters. GE is going beyond ERP data and maintenance schedules, collecting data about its machinery through sensors and investing in companies that will potentially enable analyzing even more data. Walmart has created applications that not only collect social media data but also create more data as the company embraces crowdsourcing in its business.

I-DS-Pivotal explained the technical phases that companies go through: “First of all, combining existing in-house different data sources that traditionally were separated, and sometimes were separated functionally, (ERP in one hand never combined with CRM systems)….. The second stage is the incorporation of external data…”

There is senior-level commitment to analytics.

Several executives, including CEOs and CIOs, have been quoted in the case studies with bold statements about the importance of data and data analytics, and with strong faith in the possibilities that data analytics can bring to every part of the value creation process. “In order to keep the high momentum, top management need to keep investing in sourcing new methods and acquiring new data sources in order to keep the organization on the cutting edge. Differentiation using Big Data is a journey not a destination” (I-CIO-QF). The bold statements are supported by bold actions in these organizations; some built Big Data research centers, invested in companies and even acquired others to further enable their capabilities.

There is some evidence of a strong data culture.

The collaborative decision-making environment facilitated by P&G's Business Sphere rooms, the faithful comments from different researchers within the Obama campaign, and GE's claims of putting data at the center of everything it does are all evidence of a culture with a big belief in the big thing called Big Data.

I-CTO-BDV also highlighted the cultural aspects of successful organizations “Once you have a

couple of successes, I think what you will see happen is that cultural change across the

organization about using data to tell you what is going on”.

Data-related roles are not scarce within these organizations.

The number of data-related roles hired by the Obama campaign, the 400-strong workforce of GE's global research entity, and WalmartLabs at Walmart can all be seen as good indications of these organizations' maturity in hiring and nurturing human capital who can make sense of data, as can P&G management's ability to interpret and visualize data while taking action.

In brief, organizations in the maturity phase show soft and hard characteristics of embracing data and data analytics; they see the insight created by data as both intellectual and structural capital, helping them measure everything they do and predict what they should do.


6 Conclusion and further studies

Many Big Data advocates believe that business is poised for change due to the proliferation of data and the increasing ability to analyze it. The ability to act on this data explosion and the insight it creates might restructure industry leadership. However, this should not be taken at face value, as each industry has its own characteristics; airlines, for example, differ from manufacturing or education, as in the (Nolan & Mcfarlan 2005) IT grid. Also, although this is beyond the scope of this thesis, technology and analytics could be easy to imitate if not based on composite competencies.

In brief, no one size fits all, and this applies to analytics as well. Perhaps metrics and measurements should reflect the analytics maturity of the organization under study.

While not a particularly precise description of analytics maturity, three stages were suggested and different measurements were proposed for each stage.

At the start phase, metrics about management support, political alignment, culture, skill sets and gaps, and technical readiness should be closely monitored. Inspiring stories and some business wins should be spread to gain more support.

In the transformation phase, the BSC could be used as a guiding tool to measure performance. Perhaps, as organizations get into the maturity stage, they might incorporate as much analytics as they can into their business performance management.

6.1 Future studies

The topic of Big Data is still, and will possibly continue to be, a fascinating one, attracting a lot of attention from a wide spectrum of observers. To a large extent, the vast amount of literature has concentrated on data mining techniques and algorithms. Justifiably, the media fills its columns with heroic stories about one discovery or another, surprising readers with correlations between what could be seen as unrelated variables. I still believe that there is spacious research room in the Big Data field and its management implications. The following could be valuable in both academic and professional fields:

- Deductive research to test the usability of the APQC knowledge management development framework on Big Data.
- Deductive research leveraging existing frameworks of knowledge management performance theories and testing their applicability to Big Data. The challenge of the suggested research could be the purposive sampling of the cases and/or interviews; the purposive sampling should be clear on which phase of analytics maturity the focal organizations are in.
- Big Data performance management in different management fields; this could also be seen as an inherited BSC for different business units:
  o Operations Strategy
  o Marketing
  o Supply chain
  o Risk Management
- Big Data management for a certain industrial sector.
- Privacy aspects of Big Data and what kind of governance should be applied.
- In-depth study comparing data-centric companies and traditional companies.
- The impact of organization size on new analytics capabilities.
- Building a culture of analytics in organizations.


7 Personal Reflection

As this project was coming to an end, I always had a feeling that this section would be a difficult one. It could be hard to summarize days and nights of personal and professional experience in a few lines, but I will try to get through it. While some readers of this study might expect bulleted, rational points in this section, reflecting not only the nature of a “data analytics” project but also the long hours of structured and critical thinking exercises spent at the school, I decided to start with a feeling statement and free-style writing, avoiding as much as I can the extra dose of management jargon I have consumed over the last years.

Through the course of the MBA studies and of this particular project, I have often surrounded the concepts and ideas I was exposed to with an extra sauce of feelings. Admittedly, there were moments when rationality and feelings were mixed up in a way that I could not tell whether I was in the ‘confirmation bias’ zone or being rational, supported by evidence.

So, here is what I feel right now:

“I’ve just discovered the tip of the iceberg”

I have always been fascinated by how technology can transform humanity, organizations and the super-system we live in, and by how humanity and the system can transform technology. This interplay between technology and humanity is unlikely to end soon. I learnt that decomposing the relation between them is both an art and a science.

I learnt that Big Data is likely to remain a big thing as long as mankind and machines create more data, and this project proved to me that I have just scratched the surface of knowledge about this game-changing technology and how we can manage it.

Although I started this project with some confidence about how this piece of work would eventually look, and was pretty sure of the traits of the island I was heading to, I found myself surfing the knowledge ocean in different directions, seeing different islands and hoping to visit them all. I finally landed on one island, hoping that I managed to provide a map similar to the territory. I also hope that this primitive map will be a brick that helps others craft a better one.


8 Appendix A: Literature Search

The results below were obtained on the 17th of November 2012.

Vendor/Database: ProQuest: ABI/INFORM Complete “Academic”
Search: “Big Data”; All fields, no full text; Date: 2008-2012; Peer reviewed; Source types: Conference papers, Dissertations, Scholarly journals; Subject area: Business
Hits: 32

Vendor/Database: ProQuest: ABI/INFORM Complete “Trade”
Search: “Big Data”; All fields, no full text; Date: 2008-2012; Source types: Trade journals, Reports; Subject area: Business
Hits: 1608

Vendor/Database: EBSCO: OmniFile Full Text Mega/OmniFile Full Text Select
Search: “Big Data” in TI, AB, SU; Date: 2008-2012; Source types: Academic journals, Periodicals, Conference proceedings, Dissertations/Theses
Hits: 25

Vendor/Database: EBSCO: OmniFile Full Text Mega/OmniFile Full Text Select
Search: “Big Data” in TI, AB, SU; Date: 2008-2012; Source type: Trade publications
Hits: 0

Vendor/Database: Emerald
Search: “Big Data”; Journal
Hits: 0


9 Appendix B: Common Data Mining Methods (Shearer 2000)

Data Description and Summarization

Data Description and Summarization provides a concise description of the characteristics of data,

typically in elementary and aggregated form, to give users an overview of the data’s structure. Data

description and summarization alone can be an objective of a data mining project. For instance, a

retailer might be interested in the turnover of all outlets, broken down by categories, summarizing

changes and differences as compared to a previous period. In almost all data mining projects, data

description and summarization is a sub-goal in the process, typically in early stages where initial

exploratory data analysis can help to understand the nature of the data and to find potential

hypotheses for hidden information. Summarization also plays an important role in the presentation

of final results.

Many reporting systems, statistical packages, OLAP, and EIS systems can cover data description

and summarization but do not usually provide any methods to perform more advanced modeling. If

data description and summarization is considered a stand-alone problem type and no further

modeling is required, these tools also are appropriate to carry out data mining engagements.
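As a concrete illustration of description and summarization as a stand-alone task, the retailer scenario above can be sketched in a few lines of Python; the outlets, categories and figures below are invented for illustration, not taken from this dissertation.

```python
from collections import defaultdict

# Toy transaction records: (outlet, category, period, turnover).
sales = [
    ("Outlet-1", "Grocery",     "2013", 120.0),
    ("Outlet-1", "Grocery",     "2014", 150.0),
    ("Outlet-1", "Electronics", "2013",  80.0),
    ("Outlet-1", "Electronics", "2014",  60.0),
    ("Outlet-2", "Grocery",     "2013", 200.0),
    ("Outlet-2", "Grocery",     "2014", 210.0),
]

def turnover_by_category(rows, period):
    """Aggregate turnover per category for a single period."""
    totals = defaultdict(float)
    for _, category, p, amount in rows:
        if p == period:
            totals[category] += amount
    return dict(totals)

current = turnover_by_category(sales, "2014")
previous = turnover_by_category(sales, "2013")
# Change versus the previous period, per category.
change = {cat: current[cat] - previous.get(cat, 0.0) for cat in current}
print(change)
```

The same aggregation is what a reporting or OLAP tool would perform; the point is only that the output is a summary, not a model.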

Segmentation

The data mining problem type segmentation separates the data into interesting and meaningful

subgroups or classes that share common characteristics. For instance, in shopping basket analysis,

one could define segments of baskets, depending on the items they contain. An analyst can segment

certain subgroups as relevant for the business question, based on prior knowledge or based on the

outcome of data description and summarization. However, there also are automatic clustering

techniques that can detect previously unsuspected and hidden structures in data that allow

segmentation.

Segmentation can be a data mining problem type of its own when the detection of segments is the

main purpose. For example, all addresses in ZIP code areas with higher than average age and

income might be selected for mailing advertisements on home nursing insurance. However,

segmentation often is a step toward solving other problem types where the purpose is to keep the

size of the data manageable or to find homogeneous data subsets that are easier to analyze.

Appropriate techniques

Clustering techniques


Neural nets

Visualization

Example:

A car company regularly collects information about its customers concerning their socioeconomic

characteristics. Using cluster analysis, the company can divide its customers into more

understandable subgroups, analyze the structure of each subgroup, and deploy specific marketing

strategies for each group separately.
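The car-company example can be sketched with a deliberately minimal k-means implementation in plain Python; a real analysis would use a library and normalize the features, and the customer records and starting centers below are invented for illustration.

```python
import math

# Hypothetical customer records: (age, annual income in $1000s).
customers = [(25, 30), (27, 35), (24, 28),   # younger, lower-income group
             (58, 90), (61, 95), (55, 88)]   # older, higher-income group

def kmeans(points, centers, iterations=10):
    """Minimal k-means: assign each point to its nearest center, then
    move every center to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                   for cluster in clusters]
    return centers, clusters

centers, clusters = kmeans(customers, centers=[(20, 20), (70, 100)])
```

Each resulting cluster is a customer subgroup whose structure can then be analyzed and targeted separately, as the example describes.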

Concept Descriptions

Concept description aims at an understandable description of concepts or classes. The purpose is

not to develop complete models with high prediction accuracy, but to gain insights. For instance, a

company may be interested in learning more about their loyal and disloyal customers. From a

description of these concepts (loyal and disloyal customers), the company might infer what could be

done to keep customers loyal or to transform disloyal customers to loyal customers. Typically,

segmentation is performed before concept description. Some techniques, such as conceptual

clustering techniques, perform segmentation and concept description at the same time.

Concept descriptions also can be used for classification purposes. On the other hand, some

classification techniques produce understandable classification models, which then can be

considered concept descriptions. The important distinction is that classification aims to be complete

in some sense. The classification model needs to apply to all cases in the selected population. On

the other hand, concept descriptions need not be complete. It is sufficient if they describe important

parts of the concepts or classes.

Appropriate techniques

Rule induction methods

Conceptual clustering

Example

Using data about the buyers of new cars and using a rule induction technique, a car company could

generate rules that describe its loyal and disloyal customers. Below are simplified examples of the

generated rules:

If SEX = male and AGE > 51 then CUSTOMER = loyal
If SEX = female and AGE > 21 then CUSTOMER = loyal
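The two induced rules above can be expressed directly as a hypothetical rule-based classifier. Records matching no rule are left unlabeled, which reflects the point made earlier that concept descriptions need not be complete.

```python
# The induced rules, each as (condition, label); illustrative only.
RULES = [
    (lambda r: r["sex"] == "male" and r["age"] > 51, "loyal"),
    (lambda r: r["sex"] == "female" and r["age"] > 21, "loyal"),
]

def classify(record, default="unknown"):
    """Return the label of the first matching rule, else a default."""
    for condition, label in RULES:
        if condition(record):
            return label
    return default

print(classify({"sex": "male", "age": 60}))    # matches rule 1 -> loyal
print(classify({"sex": "male", "age": 30}))    # no rule matches -> unknown
```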


Classification

Classification assumes that there is a set of objects—characterized by some attribute or feature—

which belong to different classes. The class label is a discrete (symbolic) value and is known for

each object. The objective is to build classification models (sometimes called classifiers) that assign

the correct class label to previously unseen and unlabeled objects. Classification models are mostly

used for predictive modeling.

Many data mining problems can be transformed to classification problems. For example, credit

scoring tries to assess the credit risk of a new customer. This can be transformed to a classification

problem by creating two classes, good and bad customers. A classification model can be generated

from existing customer data and their credit behavior. This classification model then can be used to

assign a new potential customer to one of the two classes and hence accept or reject him or her.

Classification has connections to almost all other problem types.

Appropriate techniques

Discriminant analysis

Rule induction methods

Decision tree learning

Neural nets

K Nearest Neighbor

Case-based reasoning

Genetic algorithms

Example

Banks generally have information on the payment behavior of their credit applicants. By combining

this financial information with other information about the customers, such as sex, age, income,

etc., it is possible to develop a system to classify new customers as good or bad customers, (i.e., the

credit risk in acceptance of a customer is either low or high, respectively).
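The credit-scoring example can be sketched with k-nearest-neighbor (one of the techniques listed above), here with k = 1 and made-up customer records; in practice the features would be normalized so income does not dominate the distance.

```python
import math

# Hypothetical labelled history: (age, income in $1000s) -> credit class.
history = [((23, 20), "bad"), ((30, 25), "bad"),
           ((45, 70), "good"), ((50, 80), "good")]

def classify_1nn(applicant):
    """Assign the class of the single nearest historical customer."""
    _, label = min(history, key=lambda item: math.dist(item[0], applicant))
    return label

print(classify_1nn((48, 75)))   # nearest neighbours are "good" customers
```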

Prediction

Another important problem type that occurs in a wide range of applications is prediction. Prediction

is very similar to classification, but unlike classification, the target attribute (class) in prediction is

not a qualitative discrete attribute but a continuous one. The aim of prediction is to find the

numerical value of the target attribute for unseen objects. This problem type is sometimes called

regression. If prediction deals with time series data, then it is often called forecasting.

Appropriate techniques:

Regression analysis

Regression trees


Neural nets

K Nearest Neighbor

Box-Jenkins methods

Genetic algorithms

Example

The annual revenue of an international company is correlated with other attributes such as

advertisement, exchange rate, inflation rate, etc. Having these values (or their reliable estimations

for the next year), the company can predict its expected revenue for the next year.
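The revenue example can be sketched as a least-squares regression on a single predictor; the advertisement and revenue figures below are invented for illustration.

```python
# Ordinary least squares with one predictor: fit revenue = a + b * ads.
ads     = [10.0, 20.0, 30.0, 40.0]        # advertisement spend
revenue = [120.0, 180.0, 260.0, 320.0]    # observed revenue

n = len(ads)
mean_x = sum(ads) / n
mean_y = sum(revenue) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ads, revenue))
         / sum((x - mean_x) ** 2 for x in ads))
intercept = mean_y - slope * mean_x

def predict(x):
    """Predict revenue at a new spend level from the fitted line."""
    return intercept + slope * x

print(predict(50.0))
```

With several predictors (exchange rate, inflation, etc.) the same idea generalizes to multiple regression, which a statistics library would handle.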

Dependency Analysis Dependency analysis finds a model that describes significant dependencies

(or associations) between data items or events. Dependencies can be used to predict the value of a

data item, given information on other data items. Although dependencies can be used for predictive

modeling, they are mostly used for understanding. Dependencies can be strict or probabilistic.

Associations are a special case of dependencies that have recently become very popular.

Associations describe affinities of data items (i.e., data items or events that frequently occur

together). A typical application scenario for associations is the analysis of shopping baskets.

There, a rule such as “in 30 percent of all purchases, beer and peanuts have been bought together,”

is a typical example of an association. Algorithms for detecting associations are very fast and

produce many associations. Selecting the most interesting ones is often a challenge.


Dependency analysis has close connections to prediction and classification, where dependencies are

implicitly used for the formulation of predictive models. There also is a connection to concept

descriptions, which often highlight dependencies. In applications, dependency analysis often co-occurs with segmentation. In large data sets, dependencies are seldom significant because many

influences overlay each other. In such cases, it is advisable to perform a dependency analysis on

more homogeneous segments of the data.

Sequential patterns are a special kind of dependencies where the order of events is considered. In

the shopping basket domain, associations describe dependencies between items at a given time.

Sequential patterns describe shopping patterns of one particular customer or a group of customers

over time.

Appropriate Techniques

Correlation analysis

Regression analysis


Association rules

Bayesian networks

Inductive Logic Programming

Visualization techniques

Example

Using regression analysis, a business analyst might find a significant dependency between the total

sales of a product and its price and the amount of the total expenditures for the advertisement. Once

the analyst discovers this knowledge, he or she can reach the desired sales level by changing the

price and/or the advertisement expenditure accordingly.
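A beer-and-peanuts style association can be computed by plain counting of item and pair frequencies (full association mining algorithms such as Apriori are optimizations of this counting); the baskets below are invented, with the pattern planted on purpose.

```python
from itertools import combinations
from collections import Counter

# Toy shopping baskets.
baskets = [
    {"beer", "peanuts", "diapers"},
    {"beer", "peanuts"},
    {"bread", "milk"},
    {"beer", "bread", "peanuts"},
    {"milk", "peanuts"},
]

pair_counts = Counter()
item_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

pair = frozenset({"beer", "peanuts"})
# Support: share of all baskets containing both items.
support = pair_counts[pair] / len(baskets)
# Confidence of the rule "beer -> peanuts".
confidence = pair_counts[pair] / item_counts["beer"]
print(support, confidence)
```

Selecting which of the many produced associations are interesting remains the analyst's job, as the text notes.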


10 Appendix C: Sample of the Coding Matrix


11 Appendix D: Turnitin Report

The full Turnitin report is attached to this dissertation as a softcopy.

Originality report including quotes:

Similarity Index: 5% (by source: Internet Sources 5%, Publications 1%, Student Papers 1%)

Originality report excluding quotes:

Similarity Index: 2% (by source: Internet Sources 1%, Publications 1%, Student Papers 1%)


Bibliography

Anderson, C., 2008. The End of Theory: The Data Deluge Makes the Scientific Method

Obsolete. Available at: http://www.wired.com/science/discoveries/magazine/16-

07/pb_theory.

Bill Schmarzo, 2012. Big Data MBA: Course 101A – Unit II.

Bontempo, C. & Zagelow, G., 1998. The IBM data warehouse architecture. ACM, 41(9),

pp.38–48.

Boyd, D. & Crawford, K., 2011. Six Provocations for Big Data. In A Decade in Internet

Time: Symposium on the Dynamics of the Internet and Society. Available at:

http://ssrn.com/abstract=1926431.

McDaniel, C. & Gates, R., 2012. Marketing Research with SPSS, John Wiley & Sons.

Carter, P., 2011. Big Data Analytics: Future Architectures, Skills and Roadmaps for the

CIO, Available at: http://www.sas.com/resources/asset/BigDataAnalytics-

FutureArchitectures-Skills-RoadmapsfortheCIO.pdf.

Chui, M., 2011. Inside P&G’s digital revolution. McKinsey Quarterly. Available at:

https://www.mckinseyquarterly.com/Inside_PGs_digital_revolution_2893#.

Cohen, L., Manion, L. & Morrison, K., 2007. Research Methods in Education, Sixth Edition, New

York: Routledge.

Collis, J. & Hussey, R., 2003. Business Research, 2nd ed., Palgrave Macmillan.

Cooper, A. & Schendel, D., 1998. Strategic Responses to technological threats. Business

Horizons, pp.61–69.

Creswell, J.W., 2009. Research Design: Qualitative, quantitative and mixed methods

approaches, Los Angeles: Sage.

Davenport, T.H., 2006. Competing on analytics. Harvard Business Review, 84(1), pp.98–

107, 134. Available at: http://www.ncbi.nlm.nih.gov/pubmed/20929194.

Davenport, T.H., 2013. P&G Finds a “Goldmine” in Analytics. WSJ. Available at:

http://blogs.wsj.com/cio/2013/02/13/pg-finds-a-goldmine-in-analytics/ [Accessed

March 8, 2013].

Davenport, T.H., Barth, P. & Bean, R., 2012. How “Big Data” Is Different. MIT
Sloan Management Review, 54(1), pp.43–46. Available at:

http://search.ebscohost.com/login.aspx?direct=true&db=ofm&AN=80437210

[Accessed October 31, 2012].



Davenport, T.H. & Dyché, J., 2013. Big Data in Big Companies. International Institution

For Analytics. Available at: http://www.sas.com/resources/asset/Big-Data-in-Big-

Companies.pdf.

Eaton, C. et al., 2012. Understanding Big Data, McGraw-Hill.

Economist, 2010. Data, data everywhere. Economist Special Report. Available at:

http://www.economist.com/node/15557443.

Edd Dumbill, 2012. What is Big Data? In Planning for Big Data A CIO Handbook to the

Changing Data Landscape. New York: O’Reilly.

Evans, P. & Marco Annunziata, Industrial Internet: Pushing the Boundaries of Minds and

Machines, Available at: www.ge.com/docs/chapters/Industrial_Internet.pdf.

Fairchild, A.M., 2002. Knowledge Management Metrics via a Balanced Scorecard

Methodology. Hawaii International Conference on System Sciences, 00(35th), pp.1–8.

Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P., 1996. The KDD process for extracting

useful knowledge from volumes of data. Communications of the ACM, 39(11), pp.27–

34.

Fayyad, U.M., Piatetsky-Shapiro, G. & Smyth, P., 1996. From data mining to knowledge

discovery: an overview. In U M Fayyad et al., eds. Advances in Knowledge Discovery

and Data Mining. American Association for Artificial Intelligence, pp. 1–34.

Available at: http://portal.acm.org/citation.cfm?id=257942.

Ferguson, R.B., 2012. The Obama Election: Analytics Makes the Call. MITSloan

Management Review. Available at: http://sloanreview.mit.edu/article/the-obama-

election-analytics-makes-the-call/.

Fishman, C., 2003. The Wal-Mart You Don’t Know. Available at:

http://www.fastcompany.com/47593/wal-mart-you-dont-know.

Gens, F., 2012. IDC Predictions 2013: Competing on the 3rd Platform, Framingham, MA.

Gantz, B.J. & Reinsel, D., 2011. Extracting Value from Chaos: State of the Universe, An Executive Summary, (June), pp.1–12.

Han, J., Kamber, M. & Pei, J., 2012. Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann.


Henschen, D., 2011. P&G Turns Analysis Into Action. InformationWeek. Available

at: http://www.informationweek.com/global-cio/interviews/pg-turns-analysis-into-

action/231600959.

Hussey, J. & Hussey, R., 1997. Business Research - a pratical guide for undergraduate and

postgraduate students, New York: Palgrave.

Kaplan, R.S., 2010. Conceptual Foundations of the Balanced Scorecard, pp.1–36.

Kaplan, R.S. & Norton, D.P., 1992. The Balanced Scorecard: Measures that Drive Performance. Harvard Business Review,

Kiron, D. & Shockley, R., 2012. Creating Business Value with Analytics. MIT Sloan

Management Review, 53(1), pp.57–63. Available at: http://sloanreview.mit.edu/the-

magazine/2012-spring/53310/creating-value-through-business-model-innovation/.

Kotler, P., Kartajaya, H. & Setiawan, I., 2010. Marketing 3.0: From Products to Customers

to the Human Spirit, Wiley. Available at:

http://books.google.com/books?hl=en&lr=&id=8pk60fGn50oC&oi=fnd&pg=PR7&dq=Marketing+3.0:+From+Products+to+Customers+to+the+Human+Spirit&ots=Ln3_yFSW2v&sig=WiuhiW0QBbI4cWhYXC-pg96GNtk.

LaValle, S., Lesser, E. & Shockley, R., 2011. Big data, analytics and the path from insights

to value. MIT sloan management …, (52205). Available at:

http://www.ibm.com/smarterplanet/global/files/in_idea_smarter_computing_to_big-

data-analytics_and_path_from_insights-to-value.pdf [Accessed October 28, 2012].

Lazer, D. et al., 2009. Computational Social Science. Science, 323(5915), pp.721–723.

Available at: http://www.sciencemag.org.

Linoff, G.S. & Berry, M.J.A., 2011. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Third Edition, Indianapolis: Wiley Publishing Inc.

Lopez, K., 2001. lopez.pdf. Knowledge Management Review, 4(1), pp.20–23.

Van Maanen, J., 1983. Qualitative Methodology, London: Sage.

Manovich, L., 2011. Trending: The Promises and the Challenges of Big Social Data. In M.K. Gold, ed. Debates in the Digital Humanities, pp.1–10. Available at:

http://www.manovich.net/DOCS/Manovich_trending_paper.pdf.

Manyika, J. et al., 2011. Big data: The next frontier for innovation, competition, and

productivity, Available at:

http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Big+data+:+The+ne


xt+frontier+for+innovation+,+competition+,+and+productivity#0 [Accessed October

28, 2012].

Martinsons, M., Davison, R. & Tse, D., 1999. The balanced scorecard: a foundation for the

strategic management of information systems. Decision Support Systems, 25(1),

pp.71–88. Available at:

http://linkinghub.elsevier.com/retrieve/pii/S0167923698000864.

Mayer-Schönberger, V. & Cukier, K., 2013. Big Data: A Revolution That Will Transform How We Live, Work, and Think, Houghton Mifflin.

Mintzberg, H., Quinn, J.B. & Ghoshal, S., 1999. The Strategy Process, Harlow: Prentice Hall.

Murphy, C., 2012. Why P&G CIO Is Quadrupling Analytics Expertise. InformationWeek.

Murphy, T., 2012. Meet Obama’s Digital Gurus. Mother Jones. Available at:

http://www.motherjones.com/politics/2012/10/obama-campaign-tech-staff [Accessed

February 29, 2013].

Nolan, R. & Mcfarlan, F.W., 2005. Information Technology and the Board of Directors.

Harvard Business Review, pp.96–106.

Patil, D., 2011. Building data science teams. O’Reilly Radar. Available at:

http://radar.oreilly.com/2011/09/building-data-science-teams.html [Accessed

November 17, 2012].

Zikopoulos, P.C. et al., 2012. Harnessing the Power of Big Data, McGraw-Hill.
Available at:
http://public.dhe.ibm.com/common/ssi/ecm/en/imm14100usen/IMM14100USEN.PDF.

Pettey, C. & Meulen, R. van der, 2012. Gartner’s 2012 Hype Cycle for Emerging

Technologies Identifies “Tipping Point” Technologies That Will Unlock Long-Awaited Technology Scenarios. Gartner. Available at:

http://www.gartner.com/it/page.jsp?id=2124315.

Russom, P., 2011. Big Data Analytics.

Rijmenam, V., 2013. Walmart Makes Big Data Part of Its DNA. Available at:

http://smartdatacollective.com/bigdatastartups/111681/walmart-makes-big-data-part-

its-social-media.

SAS, 2012. Big Data Meets Big Data Analytics Three Key Technologies for Extracting

Real-Time Business Value from the Big Data That Threatens to Overwhelm

Traditional Computing Architectures, Available at:

http://www.sas.com/resources/whitepaper/wp_46345.pdf.


Saunders, M., Lewis, P. & Thornhill, A., 2009. Research Methods for Business Students, Fifth Edition,

Edinburgh: Pearson Education.

Scherer, M., 2012. Inside the Secret World of the Data Crunchers Who Helped Obama

Win. TIME. Available at: http://swampland.time.com/2012/11/07/inside-the-secret-

world-of-quants-and-data-crunchers-who-helped-obama-win/.

Shearer, C., 2000. The CRISP-DM Model: The New Blueprint for Data Mining. JOURNAL

OF DATA WAREHOUSING, 5, pp.13–22.

Strauss, A. & Corbin, J., 1998. Basics of Qualitative research Techniques and Procedures

for Developing Grounded Theory, London: Sage.

Stubbs, E., 2011. The Value of Business Analytics, John Wiley & Sons.

Vellante, D., 2013. The GE Pivotal Announcement: Rewriting the Rules of Big Data and

Internet of Things. Available at: http://siliconangle.com/blog/2013/04/24/the-ge-

pivotal-announcement-rewriting-the-rules-of-big-data-and-internet-of-things/.

Wadhwa, T., 2012. Nate Silver and the Rise of Political Data Science. HUFF POST

POLITICS. Available at: http://www.huffingtonpost.com/tarun-wadhwa/nate-silver-

election-predictions_b_2090909.html.

WalMart, 2013. Walmart 2012 Annual Report, Available at:

http://www.walmartstores.com/sites/annual-report/2012/WalMart_AR.pdf.

Van Grembergen, W., Saull, R. & De Haes, S., 2004. Linking the IT Balanced

Scorecard to the Business Objectives at a Major Canadian Financial group. UAMS,

(11/27). Available at: www.uams.be/itag.