Presentation at Google Day on Big Data
Uploaded by rezaur-rahman
Big Data
Data is growing at an exponential rate, and traditional tools such as an RDBMS are not enough to process it
Data is everywhere:
• Flickr (87 million registered members and 3.5 million photos per day)
• YouTube (4B videos streamed per day)
• Yahoo! Webmap (3 trillion links, 300TB compressed, 5PB disk)
• Facebook collects 500 terabytes of data a day
• Walmart handles more than 1 million customer transactions every hour
• IDC estimates that by 2020, business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day
Data is growing at a 40% rate, reaching nearly 45 ZB by 2020 according to IDC
1 ZB is equal to 1 billion TB
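The unit conversion above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes; the helper name `zb_to_tb` is hypothetical and used only for illustration.

```python
# Decimal (SI) storage units are assumed; binary units (TiB, ZiB) give different numbers.
TB = 10**12  # bytes in one terabyte
ZB = 10**21  # bytes in one zettabyte


def zb_to_tb(zettabytes):
    """Convert a whole number of zettabytes to terabytes (hypothetical helper)."""
    return zettabytes * ZB // TB


print(zb_to_tb(1))          # 1000000000, i.e. 1 billion TB
print(zb_to_tb(45))         # 45000000000 TB for the 45 ZB figure
print(zb_to_tb(1) // 500)   # 2000000 days to collect 1 ZB at 500 TB per day
```

The last line puts the scale in perspective: at the 500 TB per day quoted for Facebook, a single zettabyte would take about two million days to accumulate.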
What is Big Data and what is not?
• Order details of an e-commerce site
• All orders across 1000s of e-commerce sites
• One person’s voter ID information
• Every citizen’s voter ID information dataset
Simple definition: Big Data is data that is too big to process with a single machine
What is Big Data?
3 V’s of Big Data
Types of Data:
• Relational Data (Tables/Transaction/Legacy Data)
• Unstructured Data – Apache weblogs
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
• Social Network, Semantic Web (RDF)
• Streaming Data
Data Processing Tasks:
• Aggregation and Statistics – Data warehouse
• Contextual Advertising – Real Time Bidding, Remarketing
• Indexing, Searching, and Querying – Keyword-based search, Pattern recognition
• Knowledge discovery – Data Mining, Statistical Modeling
Traditional Architecture
• Relational Data is everything
  – SQL
  – Embedded
  – Client-Server Based
• Data Stack
  – Web, CDN, Load Balancers, Application, Database and Storage
Traditional Scalability
• Scale-up
  – Memory and hardware have limitations
• Scale-out
  – Reading
    • Cache is everything: Query Cache, Memcache
    • Pre-fetching, Replication
  – Writes
    • Redundant Disk Arrays, RAID
    • Sharding
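To make the sharding item concrete, here is a minimal sketch of hash-based sharding, assuming records are addressed by a string key. The names `shard_for` and `ShardedStore` are hypothetical, and each in-memory dict stands in for what would be a separate database server in practice.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a record key to a shard index using a stable hash."""
    # hashlib is used because Python's built-in hash() of a str varies between processes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store: each dict plays the role of one database server."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        # Every write for a given key always lands on the same shard.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # Reads consult only the one shard that can hold the key.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for user_id in ("alice", "bob", "carol", "dave"):
    store.put(user_id, {"name": user_id})

print(store.get("carol"))
print([len(shard) for shard in store.shards])  # how the four keys were spread
```

One limitation of plain modulo sharding is that changing the number of shards remaps most keys, which is one reason schemes such as consistent hashing are often used instead.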
NoSQL Solution
• Many companies emerged to solve the data problem
• Bigtable: Google started to implement a massively distributed, scalable system
• Many companies followed, building scale-out architectures on commodity hardware
• ACID was deemed bad for scaling, so relaxed consistency models emerged
• Google Bigtable and Amazon Dynamo are notable examples
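One common way a relaxed consistency model shows up is quorum-style replication: a write is acknowledged by W of N replicas and a read consults R of them. The sketch below is a toy simulation under that assumption, not the actual design of Bigtable or Dynamo. The class names, the single shared version counter, and the random replica choice are all simplifications introduced for illustration.

```python
import random


class Replica:
    def __init__(self):
        self.version = 0   # version of the last write this replica has seen
        self.value = None


class QuorumStore:
    """Toy model: a write reaches only W replicas, a read asks only R replicas."""

    def __init__(self, n, w, r, seed=0):
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.rng = random.Random(seed)
        self.version = 0   # simplification: one global version counter

    def write(self, value):
        self.version += 1
        for rep in self.rng.sample(self.replicas, self.w):
            rep.version, rep.value = self.version, value

    def read(self):
        contacted = self.rng.sample(self.replicas, self.r)
        # Return the newest value among the replicas that were asked.
        return max(contacted, key=lambda rep: rep.version).value


# R + W > N: every read set overlaps the latest write set, so reads are never stale.
strict = QuorumStore(n=3, w=2, r=2)
strict.write("v1")
strict.write("v2")
strict_reads = {strict.read() for _ in range(100)}
print(strict_reads)  # {'v2'}

# R + W <= N: cheaper operations, but a read may miss the latest write.
relaxed = QuorumStore(n=3, w=1, r=1)
relaxed.write("v1")
relaxed.write("v2")
relaxed_reads = {relaxed.read() for _ in range(100)}
print(relaxed_reads)  # may contain stale values such as 'v1' or None
```

The trade-off is the one the slide points at: smaller R and W mean fewer machines must respond to each request, at the cost of reads that can temporarily lag behind the latest write.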
Big Data Tools
Big Data Landscape
Thanks
Questions?