Presentation at Google Day on Big Data


Page 1: Presentation at Google Day on Big Data

Big Data

Page 2: Presentation at Google Day on Big Data

Hello

Rezaur Rahman (Jitu)

CTO, G&R Ad [email protected]

@jituboss

Page 3: Presentation at Google Day on Big Data

Data is growing at an exponential rate, and traditional tools such as RDBMSs are not enough to process it

Page 4: Presentation at Google Day on Big Data

Data is everywhere:

• Flickr (87 million registered members and 3.5 million photos per day)
• YouTube (4B videos streamed per day)
• Yahoo! Webmap (3 trillion links, 300TB compressed, 5PB disk)
• Facebook collects 500 terabytes of data a day
• Walmart handles more than 1 million customer transactions every hour
• IDC estimates that by 2020, business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day

Page 5: Presentation at Google Day on Big Data

According to IDC, data is growing at a rate of about 40% per year and will reach nearly 45 ZB by 2020

1 ZB is equal to 1 billion TB
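
The unit arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: it assumes decimal (SI) units and treats the 40% figure as compound annual growth. The function names `zb_to_tb` and `project_volume` are made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
TB = 10**12  # terabyte
ZB = 10**21  # zettabyte


def zb_to_tb(zettabytes):
    """Convert zettabytes to terabytes (1 ZB = 10**9 TB)."""
    return zettabytes * ZB / TB


def project_volume(volume_zb, annual_growth, years):
    """Project a data volume forward, assuming compound annual growth.

    annual_growth is a fraction, e.g. 0.40 for 40% per year (an assumption
    about how the slide's growth rate is meant to be applied).
    """
    return volume_zb * (1 + annual_growth) ** years


if __name__ == "__main__":
    print(zb_to_tb(1))    # terabytes in one zettabyte
    print(zb_to_tb(45))   # terabytes in 45 zettabytes
    print(project_volume(1.0, 0.40, 5))  # 1 ZB after five years of 40% growth
```

By this conversion, 45 ZB is 45 billion TB.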

Page 6: Presentation at Google Day on Big Data

What is Big Data and what is not?

• Order details of an e-commerce site
• All orders across 1000s of e-commerce sites
• One person’s voter ID information
• Every citizen’s voter ID information dataset

Simple definition: Big Data is data that is too big to process on a single machine
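
One way to picture "too big for a single machine" is to split the data into chunks, summarize each chunk independently, and merge the partial results. The sketch below is illustrative only: the `(site, amount)` record shape and the names `partial_totals` and `merge_totals` are assumptions, and the chunks are processed in one process here rather than on separate machines.

```python
from collections import Counter

# Hypothetical record shape: (site_name, order_amount).


def partial_totals(chunk):
    """Sum order amounts per site for one chunk of the data.

    In a real cluster each chunk would live on, and be processed by,
    a different machine.
    """
    totals = Counter()
    for site, amount in chunk:
        totals[site] += amount
    return totals


def merge_totals(partials):
    """Combine per-chunk totals into one overall result."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values per key
    return merged


if __name__ == "__main__":
    chunks = [
        [("shop-a", 10), ("shop-b", 5)],
        [("shop-a", 7), ("shop-c", 3)],
    ]
    print(merge_totals(partial_totals(c) for c in chunks))
```

Because each chunk is summarized without looking at the others, the per-chunk step can run in parallel, and only the small partial results need to be moved and merged.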

Page 7: Presentation at Google Day on Big Data

What is Big Data?

Page 8: Presentation at Google Day on Big Data

The 3 V's of Big Data: Volume, Velocity, Variety

Page 9: Presentation at Google Day on Big Data

Types of Data:

• Relational Data (Tables/Transaction/Legacy Data)
• Unstructured Data – Apache weblogs
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
• Social Network, Semantic Web (RDF)
• Streaming Data

Page 10: Presentation at Google Day on Big Data

Data Processing Tasks:

• Aggregation and Statistics – Data warehouse
• Contextual Advertising – Real Time Bidding, Remarketing
• Indexing, Searching, and Querying – Keyword-based search, Pattern recognition
• Knowledge discovery – Data Mining, Statistical Modeling

Page 11: Presentation at Google Day on Big Data

Traditional Architecture

• Relational Data is everything
  – SQL
  – Embedded
  – Client-Server Based
• Data Stack
  – Web, CDN, Load Balancers, Application, Database and Storage

Page 12: Presentation at Google Day on Big Data

Traditional Scalability

• Scale-up
  – Memory and hardware have limitations
• Scale-out
  – Reading
    • Cache is everything
      – Query Cache
      – Memcache
    • Pre-fetching, Replication
  – Writes
    • Redundant Disk Arrays, RAID
    • Sharding
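
Sharding, the last item above, spreads writes across several databases by routing each key to one shard. The sketch below shows a simple hash-modulo variant, with in-memory dictionaries standing in for database servers. The names `shard_for` and `ShardedStore` are hypothetical, and real deployments often use other schemes such as range-based or consistent hashing.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() so that the mapping
    stays the same across processes and restarts.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one database server."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    store.put("order:1001", {"total": 25})
    print(store.get("order:1001"))
    print([len(s) for s in store.shards])  # how keys are spread over shards
```

A known drawback of the modulo approach is that changing the number of shards remaps most keys, which is one reason other schemes exist.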

Page 13: Presentation at Google Day on Big Data

NoSQL Solution

• Many companies emerged to solve the data problem
• Bigtable: Google started to implement a massively distributed, scalable system
• Many companies followed, building scale-out architectures on commodity hardware
• ACID was considered bad for scaling, so relaxed consistency models emerged
• Google Bigtable and Amazon Dynamo are notable examples

Page 14: Presentation at Google Day on Big Data

Big Data Tools

Page 15: Presentation at Google Day on Big Data

Big Data Landscape

Page 16: Presentation at Google Day on Big Data

Thanks

Page 17: Presentation at Google Day on Big Data

Questions?