Clive Longbottom, Service Director, Quocirca Ltd “Big Data” The wrong name for a major issue?...

Post on 28-Dec-2015

Clive Longbottom,

Service Director, Quocirca Ltd

“Big Data” The wrong name for a major issue?

© Quocirca 2013

“Big Data”

• It’s not about databases per se

• It is about:
  – Volume – but not just databases
  – Velocity – results need to be produced in near real-time
  – Variety – the aspect that is missed by many
  – Veracity – how good are the inputs
  – Value – is the data worth it?

Which of the following statements most closely matches your understanding of the term “big data”?

How well do you believe that you understand what tools are needed for “big data”?

From your point of view, big data can be dealt with through:

How important do you believe big data will be to your organisation over the next 2 years?

A basic “rule of thumb”

• 20 years ago:
  – Only 20% of an organisation’s information was in electronic form
  – 80% of this was in a formal database

• Today:
  – Well over 80% of an organisation’s information is in electronic form
  – Less than 20% is in a formal database

The enterprise application dilemma

[Diagram: CRM, ERP and SCM applications, each with its own information silo]

The growth of unstructured

• Not just text – but images, video media assets, VoIP, Videoconferencing

• Replicated/archived data a large part of growth

• But – is it completely unstructured?

Source: Ram Subramanyam Gopalan

File formatting

• XML (or quasi-XML)
• CSV/tab delimited
• Text blocks
• Metadata
• TCP/IP packet header information
• Pattern recognition
• Colour, shape, texture (CST)
• Inferred data
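
The formats listed above carry usable structure even when the content is labelled "unstructured". A minimal sketch, assuming made-up sample data and field names (none of them come from the slides), of pulling flat records out of XML and CSV with the Python standard library:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; the field names are illustrative only.
XML_DOC = "<order id='42'><customer>Acme</customer><total>19.99</total></order>"
CSV_DOC = "id,customer,total\n43,Globex,5.00\n"


def record_from_xml(text):
    """Turn a small XML document into a flat dict of fields."""
    root = ET.fromstring(text)
    record = dict(root.attrib)           # attributes, e.g. id
    for child in root:
        record[child.tag] = child.text   # child elements, e.g. customer
    return record


def records_from_csv(text):
    """Turn CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Both sources end up in the same shape and can be handled together.
records = [record_from_xml(XML_DOC)] + records_from_csv(CSV_DOC)
```

All values arrive as strings; converting types and reconciling field names across sources would be a separate step.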

The open “value chain”

[Diagram: Supplier’s supplier, Supplier, Your Organisation, Customer, Customer’s customer, linked by information flows]

“Open” information from e.g. search engines, social networks

Organisation information sources

• Organisation data:
  – Enterprise application data
  – Office documents
  – Reports, analytics
  – GRC information
  – Information on competitors
  – Financial performance data
  – Images, voice, video…
  – …

Supplier information sources

• Supplier data:
  – Logistics data
  – Inventory data
  – Transactional data
  – Competitive information
  – Credit and background checks
  – Invoices, catalogues, contracts, images…
  – Voice, video…
  – …

Customer information sources

• Customer data:
  – Orders, payment details, returns information
  – Past purchases
  – Credit and background checks
  – Searches, web analytics
  – Social media comments
  – …

Information issues

• You no longer have control
  – The open value chain removes direct control
  – Security of information assets is critical

• Identifying and aggregating information assets
  – Capturing information when and where possible – and legal
  – Bringing structured and unstructured together

• Sifting through the dross to get to the “golden nuggets”

Shrink and filter…

• Information under your control:
  – Deduplicate
  – Taxonomise
  – Index
  – Tag

• Information not under your control:
  – Filter (intelligently)
  – Tag and index when it crosses your boundaries
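
The deduplicate, taxonomise, index and tag steps above can be illustrated in a few lines. This is a minimal sketch only: the taxonomy, the sample documents and the hash-based notion of "duplicate" are all assumptions made for illustration, not something the slides prescribe.

```python
import hashlib
from collections import defaultdict

# Hypothetical taxonomy: tag -> keywords that trigger it.
TAXONOMY = {
    "finance": {"invoice", "payment"},
    "logistics": {"shipment", "inventory"},
}


def deduplicate(docs):
    """Keep the first copy of each document, comparing by content hash."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def tag(doc):
    """Return the sorted taxonomy tags whose keywords appear in the document."""
    words = set(doc.lower().split())
    return sorted(t for t, keywords in TAXONOMY.items() if words & keywords)


def build_index(docs):
    """Build an inverted index: word -> set of document positions."""
    index = defaultdict(set)
    for pos, doc in enumerate(docs):
        for word in doc.lower().split():
            index[word].add(pos)
    return index


docs = ["Invoice for shipment 7", "invoice for shipment 7 ", "Inventory count"]
unique = deduplicate(docs)          # the second document is a near-copy of the first
tags = [tag(d) for d in unique]
index = build_index(unique)
```

Exact-hash matching only catches copies that are identical after trimming and lower-casing; near-duplicates with small edits would need a fuzzier comparison.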

Federate and aggregate

• Link databases
  – Use master data management

• Bring in unstructured data
  – Use Hadoop along with NoSQL datastores (e.g. Cassandra, MongoDB)

• Use cross-function search and reporting tools
  – E.g. HP Autonomy, CommVault Simpana

• Use analytics to present results in meaningful ways
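
To make the idea of linking structured and unstructured sources concrete, here is a minimal sketch. It assumes an in-memory SQLite table as a stand-in for a formal database, plain dicts as a stand-in for documents in a NoSQL store, and a hypothetical lookup table playing the role of master data. It does not use Hadoop, Cassandra or MongoDB themselves.

```python
import sqlite3

# Structured side: an in-memory SQLite table stands in for a formal database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("C1", 100.0), ("C1", 50.0), ("C2", 20.0)],
)

# Unstructured side: plain dicts stand in for documents in a NoSQL store.
comments = [
    {"handle": "@acme", "text": "Delivery was late"},
    {"handle": "@globex", "text": "Great service"},
]

# Hypothetical master data: maps each source's identifier to one master ID.
MASTER_IDS = {"@acme": "C1", "@globex": "C2"}


def customer_view(customer_id):
    """Combine order totals and free-text comments for one master customer ID."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    texts = [c["text"] for c in comments if MASTER_IDS.get(c["handle"]) == customer_id]
    return {"customer_id": customer_id, "order_total": total, "comments": texts}


view = customer_view("C1")
```

The shared master ID is what lets the two sides be joined; without it, the social-media handle and the order record have nothing in common to match on.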

Basic schematic approach

[Diagram: SQL and NoSQL sources, MapReduce, Filter, Apply metadata, App, Search, analyse and report]
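
Reading the schematic labels in the order they appear, the flow could look roughly like the sketch below. It runs the MapReduce pattern in-process in plain Python rather than on Hadoop, and the sample records, stop-word list and field names are assumptions made for illustration.

```python
from collections import defaultdict

# Records from both kinds of source (hypothetical sample data).
records = [
    {"source": "sql", "text": "the delivery was late"},
    {"source": "nosql", "text": "delivery praised"},
]

# Hypothetical list of low-value words to filter out of the results.
STOP_WORDS = {"the", "was"}


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in the record."""
    return [(word, 1) for word in record["text"].split()]


def reduce_phase(pairs):
    """Shuffle and reduce: group pairs by key, then sum the counts."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


# MapReduce over all records.
pairs = [pair for r in records for pair in map_phase(r)]
counts = reduce_phase(pairs)

# Filter: drop words that carry little meaning.
filtered = {w: c for w, c in counts.items() if w not in STOP_WORDS}

# Apply metadata: note which sources fed the result, ready for an app to search.
result = {"counts": filtered, "sources": sorted({r["source"] for r in records})}
```

An application layer would then search and report on `result`; on a real cluster the map and reduce steps would run in parallel across nodes.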

A future glimpse?

• It’s déjà vu all over again
  – Remember in-memory databases?

• Big data cannot remain as a jigsaw solution
  – Full-service solutions will come forward

• Who will be the winners?
  – Oracle, IBM, Microsoft?
  – SAP?
  – EMC, Symantec?
  – The Open Source environment (e.g. 10Gen, Apache/Cassandra, CouchDB)?

Conclusions

• Big Data has many vectors
  – Volume, velocity, variety and veracity: each is as important as the others; value will accrue through getting them right

• More information is outside the realm of your direct control
  – Capturing what can be captured in a useful manner is key

• The evolution of the market is rapid
  – NoSQL and Hadoop provide the underpinnings for a new, information-centric approach

• The formal database is not dead
  – But it is only one aspect of the problem – and the solution

Thank you

Contact details: Clive.Longbottom@Quocirca.com

Further reading:
http://quocirca.com/reports/150
http://quocirca.com/articles/617
http://quocirca.com/articles/637