Finding and Using Big Data in your business

description

Technology challenges and how to introduce big data tools to your organisation, a real use case based on Red Gate's Feature analytics, and some cultural tools

Transcript of Finding and Using Big Data in your business

Page 1: Finding and Using Big Data in your business

Simon Elliston Ball, Head of Big Data

@sireb

Finding (and using) Big Data in your Business

#findBigData

http://bit.ly/findBigData

Page 2: Finding and Using Big Data in your business

Now THAT's Big Data

• A modern Ford kicks out 25GB per car, in a day.

• Ad networks: over a billion event logs per day.

• PayPal: 3 billion transactions a year

• Climate Corporation: soil type record for every square meter in the USA

• Facebook: 10PB a day

Page 3: Finding and Using Big Data in your business

So you're probably not Facebook

• Big Data takes many forms

• Velocity

• Variety

• Volume

• Value

• Veracity

Page 4: Finding and Using Big Data in your business

Feature usage at Red Gate

• We are obsessed with UX

• Knowing what our users do helps us make their life better

• Error reporting

• Feature usage reporting

• Conversations, surveys, sales: everything goes into making products better.

Page 5: Finding and Using Big Data in your business

The default: SQL Server

Page 6: Finding and Using Big Data in your business

The problem: SQL Server

"I used to use FUR all the time! I can't use it anymore, it's too slow." - Michelle, Product Manager

"I'm running a query right now... It started yesterday :(" - Ben, Product Manager

"Hey, this database is taking up a few TBs, can we just delete it?" - Simon, DBA

Page 7: Finding and Using Big Data in your business

DELETE IT!?!?!?

• Thinning out old data

• Archiving to cheaper storage (even tape)

• Turning down collection

Page 8: Finding and Using Big Data in your business

Big Data to the rescue

• Cheap storage in Hadoop

• Scale out, not scale up

• Distributed computing required for speed

• Occasional bursty workloads

• Semi-structured

Page 9: Finding and Using Big Data in your business

Hadoop

• Created by Doug Cutting as a backend for a search engine and crawler (Nutch) in 2005.

• Developed further at Yahoo

• Based on Google's papers on Google Filesystem, and MapReduce

• Since grown into an ecosystem of tools

• Now version 2.0

Page 10: Finding and Using Big Data in your business

Hadoop

Page 11: Finding and Using Big Data in your business

All grown up

Page 12: Finding and Using Big Data in your business

Really complex

• Lots of moving parts

• Integrating into your network can be complex

• Getting all the tools to play nice

• Self build

• Fixing up from a good starting point

• Use a distro

Page 13: Finding and Using Big Data in your business

Sandboxes

• Quick Start

• Great to learn

Page 14: Finding and Using Big Data in your business

The menagerie

Page 15: Finding and Using Big Data in your business

What we did

• Test cloud

• Virtualization is not Hadoop's friend.

• Performance is not good

• “Can we have 2TB on the SAN for /tmp?” Er. No.

• "Borrowed" some old hardware, and got a small cluster running.

Page 16: Finding and Using Big Data in your business

Putting data in

• Sqoop

• Cleaning

• ORC

Page 17: Finding and Using Big Data in your business

How to not kill SQL Server

• To a DBA, Sqoop is a DDoS attack

• Limit the number of mappers Sqoop uses

• Import from a replica, or backup
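
The two Sqoop precautions above can be sketched as a small command builder. This is a minimal sketch under stated assumptions, not the job used in the talk: the host, database, table, user and paths are hypothetical, and the function only assembles the argument list (it does not run Sqoop). `--num-mappers` caps how many parallel connections Sqoop opens against the source, and pointing `--connect` at a read replica keeps that load off the primary.

```python
from typing import List


def build_sqoop_import(
    replica_host: str,
    database: str,
    table: str,
    target_dir: str,
    username: str,
    password_file: str,
    num_mappers: int = 2,
) -> List[str]:
    """Assemble a `sqoop import` argument list that goes easy on the source."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    # Hypothetical JDBC URL: a read replica, not the production primary.
    jdbc_url = f"jdbc:sqlserver://{replica_host}:1433;databaseName={database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,
        "--target-dir", target_dir,
        # Each mapper opens its own connection and runs its own range query,
        # so this number is the concurrency the DBA will see.
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    cmd = build_sqoop_import(
        replica_host="sql-replica.example.internal",
        database="FeatureUsage",
        table="Events",
        target_dir="/data/raw/events",
        username="etl_reader",
        password_file="/user/etl/.sqlserver.password",
    )
    print(" ".join(cmd))
```

With more than one mapper, Sqoop splits the table on its primary key; a table without one would also need `--split-by <column>` or a single mapper.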

Page 18: Finding and Using Big Data in your business

Immediate value

• The data was a lot smaller

• Cheaper to store

• Column formats

• Compression: use LZO. bzip2 costs too much CPU, and gzip is bad for Hadoop because gzipped files cannot be split across mappers.
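
A rough way to see the effect of column formats and codec choice is to compress the same events laid out by row and by column. The sketch below is a toy under loud assumptions: the event data is synthetic, the "column layout" is a plain text stand-in rather than ORC, and Python's standard library has no LZO, so `zlib` (the DEFLATE algorithm gzip uses) and `bz2` stand in as codecs. It reports compressed size and time so the size/CPU trade-off can be measured rather than guessed.

```python
import bz2
import random
import time
import zlib
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str, str]  # (timestamp, product, feature)


def make_events(n: int, seed: int = 0) -> List[Row]:
    """Generate synthetic feature-usage events (hypothetical names)."""
    rng = random.Random(seed)
    products = ["ProductA", "ProductB", "ProductC"]
    features = ["open", "compare", "deploy", "export", "sync"]
    start = 1_400_000_000
    return [(start + i, rng.choice(products), rng.choice(features)) for i in range(n)]


def row_layout(events: List[Row]) -> bytes:
    """One line per event, as a CSV-style export would store it."""
    return "\n".join(f"{ts},{prod},{feat}" for ts, prod, feat in events).encode()


def column_layout(events: List[Row]) -> bytes:
    """All timestamps, then all products, then all features."""
    return "\n".join(",".join(str(v) for v in col) for col in zip(*events)).encode()


def measure(data: bytes, compress: Callable[[bytes], bytes]) -> Tuple[int, float]:
    """Return (compressed size in bytes, seconds taken)."""
    started = time.perf_counter()
    size = len(compress(data))
    return size, time.perf_counter() - started


def compare(n: int = 50_000) -> Dict[str, Tuple[int, float]]:
    """Compress both layouts with both codecs and collect the measurements."""
    events = make_events(n)
    layouts = {"row": row_layout(events), "column": column_layout(events)}
    codecs = {"zlib": zlib.compress, "bz2": bz2.compress}
    results: Dict[str, Tuple[int, float]] = {}
    for layout_name, data in layouts.items():
        results[f"{layout_name}/raw"] = (len(data), 0.0)
        for codec_name, codec in codecs.items():
            results[f"{layout_name}/{codec_name}"] = measure(data, codec)
    return results


if __name__ == "__main__":
    for name, (size, seconds) in compare().items():
        print(f"{name:12s} {size:>10d} bytes  {seconds:.3f}s")
```

The numbers depend heavily on the data, so it is worth running this kind of comparison on a sample of real records before settling on a format.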

Page 19: Finding and Using Big Data in your business

Give it back! Queries and ETL

• Hive. Reuse your SQL

• Pig. New, but worth learning

• MapReduce? (Optional. Warning: may contain Java. Or snakes)
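
For the "snakes" option, the shape of a MapReduce job can be sketched in plain Python. This is an in-memory illustration of the map, shuffle and reduce phases, not Hadoop code: the `user_id,product,feature` line format is a hypothetical stand-in for the usage logs. In Hive the same count would be roughly a `GROUP BY product, feature` with `COUNT(*)` over a hypothetical events table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_event(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit ("product/feature", 1) for each well-formed log line."""
    parts = line.strip().split(",")
    if len(parts) == 3:  # hypothetical format: user_id,product,feature
        _user, product, feature = parts
        yield f"{product}/{feature}", 1
    # Malformed lines are dropped here; a real job would count them.


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the ones emitted for a single key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in one process."""
    mapped = (pair for line in lines for pair in map_event(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    sample = [
        "u1,ProductA,sync",
        "u2,ProductA,sync",
        "u1,ProductB,export",
        "not a valid line",
    ]
    for key, count in sorted(run_job(sample).items()):
        print(key, count)
```

On a cluster the map and reduce functions run on different machines and the framework performs the shuffle, which is why each function only ever sees one line or one key at a time.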

Page 20: Finding and Using Big Data in your business

Give it back to the business

• Summary report in Excel

• Batch jobs

• Pump back into SQL for slicing and dicing

• Give us MORE!
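
Pumping a summary back into SQL can be sketched as a small idempotent load step. The assumptions here are deliberate: `sqlite3` stands in for SQL Server so the example runs anywhere, and the table name, columns and sample rows are hypothetical. In a Hadoop setup this step might instead be a `sqoop export`; the point is that only the small aggregated result goes back, not the raw events.

```python
import sqlite3
from typing import Iterable, List, Tuple

SummaryRow = Tuple[str, str, int]  # (product, feature, uses)


def load_summary(conn: sqlite3.Connection, rows: Iterable[SummaryRow]) -> None:
    """Upsert aggregated rows so that re-running the batch job is safe."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_usage_summary ("
        "product TEXT NOT NULL, feature TEXT NOT NULL, uses INTEGER NOT NULL, "
        "PRIMARY KEY (product, feature))"
    )
    # INSERT OR REPLACE is SQLite syntax; SQL Server would need MERGE or similar.
    conn.executemany(
        "INSERT OR REPLACE INTO feature_usage_summary (product, feature, uses) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def top_features(
    conn: sqlite3.Connection, product: str, limit: int = 3
) -> List[Tuple[str, int]]:
    """The kind of slice-and-dice query the business would run afterwards."""
    cursor = conn.execute(
        "SELECT feature, uses FROM feature_usage_summary "
        "WHERE product = ? ORDER BY uses DESC, feature LIMIT ?",
        (product, limit),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_summary(
        connection,
        [("ProductA", "sync", 10), ("ProductA", "export", 3), ("ProductB", "open", 7)],
    )
    print(top_features(connection, "ProductA"))
```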

Page 21: Finding and Using Big Data in your business

Give it back! The platform

• To the cloud!

• Reuse all our existing queries and workflow

• On demand compute

• Takes time to lift the initial data set into cloud storage, but incremental updates are fast

Page 22: Finding and Using Big Data in your business

Demo HDInsight

Page 23: Finding and Using Big Data in your business

Thinking like a data scientist

• Plan your experiments

• Precision is subjective.

• Show the error bars

• Use whatever tool works

• Embrace uncertainty
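
"Show the error bars" can be made concrete for the most common feature-usage number, a proportion such as "30% of users opened this feature". The sketch below uses the Wilson score interval, which is one reasonable choice among several and is not claimed to be what the talk used; the 95% level (`z = 1.96`) and the sample counts are assumptions for illustration.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Same observed rate, very different certainty.
    for used, total in [(30, 100), (3000, 10000)]:
        low, high = wilson_interval(used, total)
        print(f"{used}/{total}: {used / total:.1%} (95% interval {low:.1%} to {high:.1%})")
```

Reporting the interval alongside the headline figure makes it obvious when a difference between two features is within the noise.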

Page 24: Finding and Using Big Data in your business

Know your business

Page 25: Finding and Using Big Data in your business

Think strategically

• Business buy-in

• Show quick wins

• What is your analysis for?

• What will it deliver to the business?

Page 26: Finding and Using Big Data in your business

Break down the requirements

• Prioritize

• Go for the top value pieces

• Perfect fit for Agile methodologies

Page 27: Finding and Using Big Data in your business

Communication

• Talk to everyone you can

• Before

• After

• During

• Organizational knowledge

• Keep a log

Page 28: Finding and Using Big Data in your business

Communication

• Conversations

• Coffee machine

• Formal talks

Page 29: Finding and Using Big Data in your business

So what's next?

• Denormalize

• Democratize

• Machine learning for alerts

• Marketing

• Sales

Page 30: Finding and Using Big Data in your business

And of course new tools

• We want to talk to you...

Page 31: Finding and Using Big Data in your business

Questions

Simon Elliston Ball

@sireb

http://bit.ly/findBigData

#findBigData