CISC 7610 Lecture 2b: The beginnings of NoSQL
Topics:
● Big Data
● Google’s infrastructure
● Hadoop: open-source Google infrastructure
● Scaling through sharding
● CAP theorem
● Amazon’s Dynamo
5 V’s of big data
● Everyone wants to say they do “Big Data”
● But many people agree true Big Data has 5 V’s:
– Volume: a large amount of existing data
– Velocity: new data is generated and must be processed quickly
– Variety: different types and modalities of data must be combined
– Veracity: it is desirable to understand data provenance
– Value: the data are useful for something
http://www.ibmbigdatahub.com/blog/why-only-one-5-vs-big-data-really-matters
Google’s PageRank is a good example
● Rank search results not only by relevance to query, but also intrinsic quality of a page
● Estimate quality from relationships in link graph
● Important pages link to other important pages
Harrison, G. Next Generation Databases. New York: Apress. 2015
Google’s hardware: massive parallelism on commodity machines
● Traditional approach:
– Big storage cluster
– Big database machine
– Fast network between them
● Google’s approach:
– “Scale out” instead of “scale up”
– Process data where it is stored
– Adding machines adds storage, processing, I/O, network
Google’s software stack (ca 2005)
● Commodity hardware fails
● Software needs to be robust to failures
● Google File System (GFS): distributed file system, redundant copies of data, massive bandwidth
● MapReduce: system for parallelizing jobs across many unreliable machines
● BigTable: non-relational database on GFS
MapReduce basics: word count
[Slide diagrams stepping through the word-count example — mappers emitting (word, 1) pairs, a shuffle by word, reducers summing counts — are not reproduced in the transcript]
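The word-count example from the slides can be sketched in Python. This is an illustrative toy, not Google's implementation: the map phase emits a (word, 1) pair per word, a shuffle step groups pairs by word (the framework does this between phases), and the reduce phase sums each word's counts.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group pairs by key, as the framework does between phases."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [count for _, count in group]

def reduce_phase(word, counts):
    """Reduce: sum the counts for one word."""
    return word, sum(counts)

# Three toy "documents", each handled independently by a mapper
documents = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(w, c) for w, c in shuffle(mapped))
print(counts["the"])  # 3
```

Because each mapper only sees its own document and each reducer only sees one word's pairs, both phases parallelize across many unreliable machines.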
Typical jobs use multiple MapReduce passes / tasks
Hadoop: open-source Google stack
● Hadoop distributed file system (HDFS): like GFS
● HBase: like BigTable
● Node types (v1.0):
– Data node: storage
– Task tracker: processing
– Job Tracker: scheduling
– Name node: data directory
HBase and Hive
● Many more people use SQL than can write MapReduce programs
● Hive converts a SQL-like query language to MapReduce jobs
● But queries run as batch jobs, not in real time like SQL
Hive (Facebook) vs Pig (Yahoo!)
● Both high-level query languages for data in Hadoop
● Hive is declarative (what to do)
● Pig Latin is procedural (how to do it)
● Pig can do everything Hive can and more
● But you need to know how to do it
Hadoop 2.0 moves beyond MapReduce
● YARN: Yet Another Resource Negotiator
● Splits the job tracker’s responsibilities into:
– Resource manager: controls access to resources like memory and CPU
– Application master: controls task execution
Summary: Google and Hadoop
● Massive parallelism on commodity hardware
● Allows scaling out, not up
● Designed to tolerate hardware failures
● Mainly batch-style processing, difficult to deal with data velocity
● No transactional operations
Scaling transactional web sites
● Original Google infrastructure was built to index static web pages
● “Web 2.0” applications allowed interaction, built around databases
– Scaling web server is easy, because HTTP is stateless
– Scaling database is hard
Scaling transactional web sites
● Scale web servers by adding more machines
● Scale databases “up” by adding a bigger machine
– Single point of failure, bottleneck
● Scale database reads by adding caching and read-only slaves
– All writes still go through a single master machine
Scale writes by sharding
● Split biggest table across machines based on a key
● Facebook ca 2011: 4000 MySQL shards, 9000 memcached servers, 1.4B reads/s, 3.5M row changes/s, 8.1M physical I/O ops/s
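The routing logic lives in the application tier: every component must apply the same function to a row's key to find its shard. A minimal sketch (the shard count and key are hypothetical):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical fixed shard count

def shard_for(user_id, num_shards=NUM_SHARDS):
    """Route a row to a shard by hashing its key; every part of the
    application must use this same mapping to find the right server."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# All queries for one user go to one shard; the mapping is deterministic.
shard = shard_for("user42")
# But a join across users on different shards means the application
# must query every shard and merge the results itself.
```

This simple `hash mod N` scheme also illustrates the operational pain: changing `NUM_SHARDS` remaps almost every key, which is the problem consistent hashing (discussed later with Dynamo) addresses.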
Issues with sharding
● Application complexity: application must be able to determine which shard to use
● Crippled SQL: joins across shards very difficult– Basically only programmers can access whole DB
● Loss of transactional integrity: transactions across shards possible, but impractical for performance
● Operational complexity: load balancing across shards is complex, as is adding new shards, changing schema, etc
Brewer's CAP theorem
● These three properties cannot all be achieved by a distributed system simultaneously:
– Strong consistency: all clients see the same data at the same time
– Availability: All requests receive a response as to their success or failure
– Partition tolerance: The system continues to function in the event of network failures
Brewer's real CAP theorem
● A node fails to communicate with another node to keep it in sync (P)
● It can decide to:
– Go ahead anyway, sacrificing consistency (C)
– Wait for the other node, sacrificing availability (A)
● Business considerations: availability > consistency
– Take the customer’s order and sort it out later if necessary
Partitions are inevitable
● A network is partitioned when a component fails and a cluster is divided in two
● Want applications to keep operating
● Can't tell a partition from a node going down
Sacrificing availability
● Wait until all nodes can synchronize
● “Amazon claim that just an extra one tenth of a second on their response times will cost them 1% in sales. Google said they noticed that just a half a second increase in latency caused traffic to drop by a fifth.”
● Examples: Multi-master DBs, Neo4j, Google BigTable
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
Sacrificing consistency
● Go ahead with update (Eventual consistency)
● Problems:
– Pushes complexity from the database into the application
– When is “eventually”?
● Examples:
– Amazon’s Dynamo
– Domain name service
– Facebook’s Cassandra and LinkedIn’s Voldemort
Amazon’s non-relational Dynamo: Requirements
● Continuous availability
● Network partition tolerant
● No-loss conflict resolution: never lose an order
● Efficiency: low latency
● Economy: run on commodity hardware
● Incremental scalability: add servers without downtime or manual maintenance
Amazon’s non-relational Dynamo: Characteristics
● Relaxed consistency, within limits
– Trade-off between consistency and availability, configurable by the application
● Only primary key-based access
● No data model (schema)
● What we now call a key-value store
Amazon’s non-relational Dynamo: Innovative features
● Consistent hashing: allow shards to be added and removed with minimal rebalancing
● Tunable consistency: application specifies trade-off between consistency, read performance, write performance
● Data versioning: keeping multiple versions of each entry allows some automatic conflict resolution
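Dynamo's data versioning is based on vector clocks: each version carries a per-node update counter, and one version supersedes another only when all of its counters are at least as large. A simplified sketch (node names and values are illustrative, not Dynamo's wire format):

```python
def dominates(a, b):
    """Version with clock `a` supersedes `b` if every per-node counter
    in `a` is at least as large as the corresponding counter in `b`."""
    nodes = set(a) | set(b)
    return all(a.get(n, 0) >= b.get(n, 0) for n in nodes)

v1 = {"node-a": 2, "node-b": 1}
v2 = {"node-a": 1, "node-b": 1}  # strictly older: v1 dominates it
v3 = {"node-a": 1, "node-b": 2}  # concurrent with v1: neither dominates

print(dominates(v1, v2))                        # True
print(dominates(v1, v3) or dominates(v3, v1))   # False
```

When one clock dominates, the store can discard the older version automatically; when neither does (concurrent writes during a partition), both versions are kept and the application must reconcile them, e.g. by merging two versions of a shopping cart.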
Consistent hashing
● Decide which shard to put a piece of data on
● Naive mapping of key to shard is difficult to re-balance when adding or removing shards
● Consistent hashing minimizes rebalancing
● Virtual nodes further minimize rebalancing
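The idea above can be sketched in Python: each shard owns many points (virtual nodes) on a hash ring, and a key belongs to the first shard point clockwise from the key's hash. Shard names and the virtual-node count are illustrative choices, not Dynamo's actual parameters.

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each shard owns `vnodes` points on the ring; a key maps to the
    first shard point at or after its hash (wrapping around)."""
    def __init__(self, shards, vnodes=100):
        self.ring = sorted((_hash(f"{s}#{i}"), s)
                           for s in shards for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    def shard_for(self, key):
        idx = bisect.bisect(self.points, _hash(key)) % len(self.points)
        return self.ring[idx][1]

ring3 = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
before = {k: ring3.shard_for(k) for k in map(str, range(1000))}

# Add a fourth shard and count how many keys change owner
ring4 = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
moved = sum(before[k] != ring4.shard_for(k) for k in before)
# Only roughly 1/4 of the keys move (those now owned by shard-d);
# a naive hash(key) % N scheme would relocate about 3/4 of them.
```

The virtual nodes spread each shard's ownership around the ring, so adding or removing a shard rebalances load evenly across the survivors rather than dumping it all on one neighbor.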
Tunable consistency: NWR notation
● N: Number of replicas of data
● W: number to write before returning to application
● R: number to access in a read
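The NWR trade-off reduces to a simple overlap argument: if R + W > N, every read quorum intersects every write quorum in at least one replica, so a read is guaranteed to contact a node holding the latest write. A small sketch (the configurations shown are illustrative):

```python
def is_strongly_consistent(n, w, r):
    """With N replicas, a write acknowledged by W nodes and a read
    contacting R nodes must overlap in at least one node when
    R + W > N, so the read sees the latest write."""
    return r + w > n

# Quorum reads and writes: overlapping, hence consistent
print(is_strongly_consistent(3, 2, 2))  # True
# Fast single-node reads and writes: eventually consistent only
print(is_strongly_consistent(3, 1, 1))  # False
# Write-all / read-one: consistent reads, but writes block on every replica
print(is_strongly_consistent(3, 3, 1))  # True
```

This is exactly the application-level knob the slide describes: lowering W speeds up writes, lowering R speeds up reads, and either choice can push R + W down to N or below, trading consistency for latency and availability.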
Summary: Scaling with low consistency
● Amazon decided to relax consistency requirements in exchange for availability
● CAP theorem says that you can’t have both
● Sharding already loses ACID and SQL-for-non-programmers
● Dynamo’s unstructured “values” are also not useful for non-programmers
– But we will discuss document databases soon
● Since Dynamo, many key-value stores have been released, especially starting in 2008-2009