[[The Wikibon Project]]

© Wikibon 2011 | Confidential www.wikibon.org [[The Wikibon Project]] Big Data and Hadoop: Key Drivers, Ecosystem and Use Cases November 2011

Transcript of [[The Wikibon Project]]

Page 1: [[The Wikibon Project]]

© Wikibon 2011 | Confidential | www.wikibon.org

[[The Wikibon Project]]

Big Data and Hadoop: Key Drivers, Ecosystem and Use Cases

November 2011

Page 2: [[The Wikibon Project]]

What is Big Data?

Big Data (n.): Data sets whose size, type and/or speed make them impractical to process and analyze with traditional database technologies and related data management tools.

Page 3: [[The Wikibon Project]]

Why is Big Data Important?

Big Data is the new definitive source of competitive advantage across industries … For those organizations that embrace Big Data, the possibilities for innovation, improved agility and increased profitability are nearly endless.

Page 4: [[The Wikibon Project]]

Three Key Big Data Drivers

1. Volume, Variety, Velocity

2. Hardware Commoditization

3. Cloud Computing

Page 5: [[The Wikibon Project]]

Characteristics of Big Data

Page 6: [[The Wikibon Project]]

Sources of Big Data

Page 7: [[The Wikibon Project]]

Hadoop

Open source framework for processing, storing and analyzing Big Data.

Fundamental concept: Rather than banging away at one huge block of data with a single machine, Hadoop breaks up Big Data into multiple parts so that each part can be processed and analyzed in parallel.
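The split-and-process-in-parallel idea can be illustrated with a small word count. This is only a sketch under stated assumptions: it runs on one machine, uses Python threads as stand-ins for cluster nodes, and does not use any Hadoop API. The function names (split_into_parts, map_part, reduce_counts, word_count) are hypothetical and chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(records, num_parts):
    """Break the data set into num_parts roughly equal parts."""
    return [records[i::num_parts] for i in range(num_parts)]


def map_part(part):
    """Process one part on its own: count the words in its lines."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Combine the per-part results into one overall result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(records, num_parts=4):
    """Split the records, process each part in parallel, then merge."""
    parts = split_into_parts(records, num_parts)
    # Threads stand in for cluster nodes here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        partial_counts = list(pool.map(map_part, parts))
    return reduce_counts(partial_counts)


if __name__ == "__main__":
    lines = [
        "big data big clusters",
        "hadoop splits big data",
        "data in parallel",
    ]
    print(word_count(lines, num_parts=2))
```

Each part is counted without reference to the others, so adding more workers (or, in a real cluster, more machines) does not change the result, only how the work is spread out.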

Page 8: [[The Wikibon Project]]

Hadoop: The Pros and Cons

First the pros … Hadoop is a time- and cost-effective way to store, process and analyze large volumes of unstructured data, allowing for new and unprecedented types of analytics.

Now the cons … Hadoop is complex and difficult to deploy and manage; there’s a dearth of Hadoop-savvy engineers and Data Scientists on the job market; the risk of forking and vendor lock-in remains.

Page 9: [[The Wikibon Project]]

Hadoop: The Pros and Cons cont.

More pros … Many bright minds are contributing to Hadoop, resulting in rapid development, and an ecosystem of vendors is emerging to make Hadoop enterprise-ready.

Page 10: [[The Wikibon Project]]

The Big Data Ecosystem

Page 11: [[The Wikibon Project]]

Big Data Pioneers

• Largest Hadoop instance on the planet … 40,000 nodes handling 200+ PB of data.

• Used to support research for ad systems and Web search.

• Used to match ads with users, detect spam in Yahoo! Mail and pick relevant top stories.

Page 12: [[The Wikibon Project]]

Big Data Pioneers cont.

• Two major clusters processing and storing over 30 PB of data.

• Uses HDFS to store copies of internal log and dimension data.

• Developed Hive to perform large-scale analytics on user data.

• Uses HBase to store, manage and retrieve Facebook Messenger data.

Page 13: [[The Wikibon Project]]

Big Data Pioneers cont.

• Uses Hadoop to support “People You May Know” feature.

• Tailors its search engine to return the most relevant results for recruiters, employers and job seekers.

• Created a visualization tool to allow users to explore their professional network to discover hidden patterns.
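One common way to think about a "People You May Know"-style feature is to count mutual connections between people who are not yet connected. The sketch below illustrates that idea only; it is an assumption for illustration, not the method behind the feature described on this slide, and the function name suggest_connections and the sample names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def suggest_connections(connections):
    """Rank not-yet-connected people by their number of mutual connections.

    connections maps each person to the set of people they are directly
    connected to (assumed symmetric in this sketch).
    """
    mutual = Counter()
    # "Map"-like step: each person emits every ordered pair of their own
    # connections; that person is a mutual connection for the pair.
    for person, friends in connections.items():
        for a, b in permutations(sorted(friends), 2):
            if b not in connections.get(a, set()):
                mutual[(a, b)] += 1
    # "Reduce"-like step: group the counts by the person receiving suggestions.
    suggestions = {}
    for (a, b), count in mutual.items():
        suggestions.setdefault(a, []).append((b, count))
    for ranked in suggestions.values():
        # Most mutual connections first; ties broken alphabetically.
        ranked.sort(key=lambda item: (-item[1], item[0]))
    return suggestions


if __name__ == "__main__":
    network = {
        "ann": {"bob", "cat"},
        "bob": {"ann", "dan"},
        "cat": {"ann", "dan"},
        "dan": {"bob", "cat"},
    }
    print(suggest_connections(network))
```

Because each person's pairs are emitted independently, the first step could be spread across many workers, with the per-pair counts summed afterwards.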

Page 14: [[The Wikibon Project]]

Big Data in Financial Services

• Over 30,000 databases and 15,000 applications spread across 7 business units.

• Uses Hadoop as the basis of its Common Data Platform.

• Looking to establish a 360-degree view of the customer for upsell and cross-sell opportunities.

Page 15: [[The Wikibon Project]]

Big Data in Financial Services cont.

• Risk management and analysis to understand financial exposure.

• Detecting fraudulent transactions and potentially criminal activity.

• Conducting sentiment analysis on social media data.

Page 16: [[The Wikibon Project]]

Thank You

Jeffrey F. Kelly

Principal Research Contributor

The Wikibon Project

[email protected]

@jeffreyfkelly

www.wikibon.org

www.siliconangle.com