Big table

Presented by Kevin Warrick and Manuel Correa

Page 1: Big table

Presented by Kevin Warrick and Manuel Correa

BigTable

Page 2: Big table

BigTable is a distributed storage system for managing structured data

Designed by Google Inc. in 2006

BigTable was designed to scale to petabytes of data and thousands of machines

What is BigTable?

Page 3: Big table

Distributed
Column-oriented
Multidimensional
High availability
High performance
Storage system
Self-managing

What is BigTable?

Page 4: Big table

A SQL database
– No joins
– No query engine
– No types
– No SQL
Normalized schema
Not necessarily a replacement for an RDBMS

BigTable is not...

Page 5: Big table

Google has a lot of data.

Scale of data is too large even for commercial databases. Commercial databases require expensive hardware.

Google’s infrastructure is built on arrays of low-cost commodity hardware, not cutting-edge mainframes.

An internal database solution can be applied across a wide range of Google products.

Absolute control over optimization and customization.

Motivation

Page 6: Big table

BigTable is built on several other innovative, distribution-oriented components.

GFS (Google File System) - backing store
Scheduler - schedules jobs onto machines
Lock service - distributed lock manager for workers
MapReduce - framework for large-scale calculations

Building Blocks

Page 7: Big table

BigTable is a sparse, distributed, persistent, multidimensional sorted map

The map is indexed by row key, column key, and timestamp. Each value is an array of bytes

BigTable Model
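
The model on this page can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory dictionary stands in for the distributed, persistent store; the class name `SparseTable`, its methods, and the sample row and column keys are hypothetical and do not come from the slides.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical in-memory stand-in for the map described above:
# (row key, column key, timestamp) -> array of bytes.
Key = Tuple[str, str, int]


class SparseTable:
    def __init__(self) -> None:
        # Sparse: only cells that were actually written occupy space.
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str, timestamp: int) -> Optional[bytes]:
        return self._cells.get((row, column, timestamp))

    def scan(self) -> Iterator[Tuple[Key, bytes]]:
        # Sorted: rows and columns in lexicographic order; within one cell,
        # newer timestamps come first.
        for key in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            yield key, self._cells[key]


# Illustrative data only; the keys below are made up for this sketch.
model = SparseTable()
model.put("com.example.www", "contents:", 2, b"<html>v2</html>")
model.put("com.example.www", "contents:", 1, b"<html>v1</html>")
model.put("com.example.www", "anchor:cnn.com", 1, b"Example")
keys = [key for key, _ in model.scan()]
```

Keeping the timestamp inside the key is what makes the map multidimensional: the same row and column can hold several values at once.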

Page 8: Big table

Example: WebTable

BigTable Model

Page 9: Big table

BigTable data is ordered lexicographically by row key
– The row range for a table is dynamically partitioned. Each partition is called a tablet
– A tablet is the unit of distribution and load balancing

Example: WebTable
– Pages in the same domain are grouped together in contiguous rows
– This makes it easy to perform analysis, search, and data retrieval, as well as to distribute data across machines

BigTable Model - Rows
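
The effect of sorted row keys and tablets can be sketched as follows. This is a simplified illustration, not the actual partitioning logic: it assumes tablets are cut by row count (`max_rows` is a made-up parameter), and it assumes row keys are host names written with the domain first so that pages of one domain sort next to each other. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List


def split_into_tablets(sorted_rows: List[str], max_rows: int) -> List[List[str]]:
    """Cut a sorted row range into contiguous partitions ("tablets")."""
    return [sorted_rows[i:i + max_rows]
            for i in range(0, len(sorted_rows), max_rows)]


def find_tablet(tablets: List[List[str]], row_key: str) -> int:
    """Return the index of the tablet whose row range would hold row_key."""
    start_keys = [tablet[0] for tablet in tablets]
    # Rightmost tablet whose first key is <= row_key (clamped to tablet 0).
    return max(bisect_right(start_keys, row_key) - 1, 0)


# Hypothetical row keys, domain first, so one domain forms a contiguous run.
rows = sorted([
    "com.cnn.www/index.html",
    "org.example/about",
    "com.cnn.money/markets",
    "com.cnn.www/world",
    "org.example/index",
    "com.acme/home",
])
tablets = split_into_tablets(rows, max_rows=2)
tablet_for_new_page = find_tablet(tablets, "com.cnn.www/sports")
```

Because the rows are sorted, a scan over one domain touches a small number of neighbouring tablets rather than the whole table.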

Page 10: Big table

Column keys are grouped together into units called column families
– A column family is the basic unit of access control
– All data within a column family is usually of the same type
– A family must be created before any column key can be added under it
– Column families rarely change; column keys may change often
– Syntax: family:qualifier

Example: WebTable
– A family anchor with qualifier cnn.com
– anchor:cnn.com, anchor:mydomain.com

BigTable Model – Columns
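
A small sketch of the family:qualifier syntax and of the rule that a family must exist before columns are added under it. The `Row` class and `parse_column_key` function are hypothetical names used only for illustration, and the stored values are made up.

```python
from typing import Dict, Set, Tuple


def parse_column_key(column_key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' at the first colon."""
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError(f"expected family:qualifier, got {column_key!r}")
    return family, qualifier


class Row:
    """Hypothetical single row that enforces 'family must exist first'."""

    def __init__(self, families: Set[str]) -> None:
        self._families = set(families)
        self._columns: Dict[str, bytes] = {}

    def put(self, column_key: str, value: bytes) -> None:
        family, _ = parse_column_key(column_key)
        if family not in self._families:
            raise KeyError(f"column family {family!r} has not been created")
        self._columns[column_key] = value

    def columns_in(self, family: str) -> Dict[str, bytes]:
        # All column keys that belong to one family.
        return {k: v for k, v in self._columns.items()
                if parse_column_key(k)[0] == family}


row = Row(families={"anchor", "contents"})
row.put("anchor:cnn.com", b"CNN")
row.put("anchor:mydomain.com", b"My domain")
anchor_columns = sorted(row.columns_in("anchor"))
```

New qualifiers can be added freely at write time, while the set of families stays fixed, which mirrors the "families rarely change, column keys change often" point above.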

Page 11: Big table

Each cell in BigTable is indexed by timestamp
– Maintains different versions of the same data
– The most recent version comes first: versions are kept in decreasing timestamp order
– The system implements garbage collection, which takes care of unused versions

Example: WebTable
– The contents column family of a Web page holds different versions

BigTable Model – Timestamps
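
Versioning and garbage collection can be sketched like this. The slides do not say which versions count as unused, so the sketch assumes a "keep the newest N versions" policy; `max_versions` and the `VersionedCell` class are hypothetical.

```python
from typing import List, Optional, Tuple


class VersionedCell:
    """Hypothetical cell that keeps its versions newest-first."""

    def __init__(self, max_versions: int) -> None:
        # Assumed retention policy; not taken from the slides.
        self.max_versions = max_versions
        self._versions: List[Tuple[int, bytes]] = []

    def write(self, timestamp: int, value: bytes) -> None:
        # Replace any existing version that has the same timestamp.
        self._versions = [(t, v) for t, v in self._versions if t != timestamp]
        self._versions.append((timestamp, value))
        # Decreasing timestamp order: most recent version first.
        self._versions.sort(key=lambda tv: tv[0], reverse=True)

    def read_latest(self) -> Optional[bytes]:
        return self._versions[0][1] if self._versions else None

    def collect_garbage(self) -> int:
        """Drop versions beyond max_versions; return how many were removed."""
        removed = max(len(self._versions) - self.max_versions, 0)
        del self._versions[self.max_versions:]
        return removed

    def timestamps(self) -> List[int]:
        return [t for t, _ in self._versions]


contents = VersionedCell(max_versions=2)
for ts, page in [(3, b"v3"), (6, b"v6"), (5, b"v5")]:
    contents.write(ts, page)
removed = contents.collect_garbage()
```

Storing the newest version first means a reader who only wants the current value never has to look past the first entry.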

Page 13: Big table

The implementation has three major components
– A library that is linked into every client
– One master server
– Many tablet servers

BigTable runs on top of the Google File System

BigTable data is stored in a structure called an SSTable. Each SSTable is divided into 64 KB blocks. An SSTable can be loaded into memory

BigTable Implementation
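
The block structure of an SSTable can be illustrated with a small in-memory sketch. It assumes, beyond what the slide states, that keys are stored in sorted order and that a small index holding the first key of each block is used to pick the block to read. The class `SSTableSketch` and its members are hypothetical, and nothing here reflects the real on-disk format.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, bytes]


class SSTableSketch:
    """Hypothetical immutable, sorted key/value table split into blocks."""

    def __init__(self, items: Dict[str, bytes], block_size: int = 64 * 1024) -> None:
        self._blocks: List[List[Entry]] = []
        self._index: List[str] = []  # first key of each block
        block: List[Entry] = []
        used = 0
        for key, value in sorted(items.items()):
            entry_size = len(key.encode()) + len(value)
            # Start a new block once the current one would exceed block_size.
            if block and used + entry_size > block_size:
                self._flush(block)
                block, used = [], 0
            block.append((key, value))
            used += entry_size
        if block:
            self._flush(block)

    def _flush(self, block: List[Entry]) -> None:
        self._index.append(block[0][0])
        self._blocks.append(block)

    def get(self, key: str) -> Optional[bytes]:
        # Binary-search the block index, then scan only that one block.
        pos = bisect_right(self._index, key) - 1
        if pos < 0:
            return None
        for k, v in self._blocks[pos]:
            if k == key:
                return v
        return None

    @property
    def block_count(self) -> int:
        return len(self._blocks)


# A tiny block size keeps the example readable; the slide states 64 KB.
sstable = SSTableSketch({"a": b"1111", "b": b"2222", "c": b"3333"}, block_size=10)
```

With an index of this kind, a lookup reads a single block instead of the whole table, which is the main reason for dividing the data into fixed-size blocks.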

Page 14: Big table

Chubby file: provides a namespace for accessing the root table. This is the first entry point for locating a user table. The service is distributed.

The Chubby service is used to:
– Bootstrap the location of BigTable data
– Discover tablet servers
– Finalize tablet server deaths

BigTable Implementation

Page 15: Big table

Root table: contains the access point to the METADATA tablets

METADATA tablets: provide the access point to user tables

The client library caches information about tablet locations

BigTable Implementation
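
The lookup chain described on the last two pages (Chubby file, root table, METADATA tablets, user table) can be sketched with plain dictionaries. Everything named here is hypothetical: the table names, server names, and the `ClientLibrary` class are made up, and the sketch locates a whole table, ignoring the fact that a table is split into tablets by row range.

```python
from typing import Dict, Optional

# Hypothetical, hard-coded stand-ins for the three levels described above.
CHUBBY_FILE = {"root_table": "root"}  # first entry point
ROOT_TABLE = {"root": {"users": "meta-1", "webtable": "meta-2"}}
METADATA_TABLETS = {
    "meta-1": {"users": "tabletserver-a"},
    "meta-2": {"webtable": "tabletserver-b"},
}


class ClientLibrary:
    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}
        self.lookups = 0  # number of full (uncached) lookups performed

    def locate(self, table_name: str) -> Optional[str]:
        if table_name in self._cache:
            return self._cache[table_name]
        self.lookups += 1
        # 1. The Chubby file points at the root table.
        root = ROOT_TABLE[CHUBBY_FILE["root_table"]]
        # 2. The root table points at a METADATA tablet.
        metadata_id = root.get(table_name)
        if metadata_id is None:
            return None
        # 3. The METADATA tablet points at the server holding the user table.
        server = METADATA_TABLETS[metadata_id].get(table_name)
        if server is not None:
            self._cache[table_name] = server
        return server


client = ClientLibrary()
first = client.locate("webtable")
second = client.locate("webtable")  # served from the client-side cache
```

The cache is what keeps the multi-step lookup cheap in practice: only the first request for a location walks all three levels.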

Page 16: Big table

BigTable Demo

Page 17: Big table

Questions?

BigTable