CISC 7610 Lecture 2b: The beginnings of NoSQL
Topics:
● Big Data
● Google’s infrastructure
● Hadoop: open-source Google infrastructure
● Scaling through sharding
● CAP theorem
● Amazon’s Dynamo
5 V’s of big data
● Everyone wants to say they do “Big Data”
● But many people agree true Big Data has 5 V’s:
– Volume: a large amount of existing data
– Velocity: new data is generated and must be processed quickly
– Variety: different types and modalities of data must be combined
– Veracity: it is desirable to understand data provenance
– Value: the data are useful for something
http://www.ibmbigdatahub.com/blog/why-only-one-5-vs-big-data-really-matters
Google’s PageRank is a good example
● Rank search results not only by relevance to query, but also intrinsic quality of a page
● Estimate quality from relationships in link graph
● Important pages link to other important pages
Harrison, G. Next Generation Databases. New York: Apress. 2015
Google’s hardware: massive parallelism on commodity machines
● Traditional approach:
– Big storage cluster
– Big database machine
– Fast network between them
● Google’s approach:
– “Scale out” instead of “scale up”
– Process data where it is stored
– Adding machines adds storage, processing, I/O, network
Google’s software stack (ca 2005)
● Commodity hardware fails
● Software needs to be robust to failures
● Google File System (GFS): distributed file system, redundant copies of data, massive bandwidth
● MapReduce: system for parallelizing jobs across many unreliable machines
● BigTable: non-relational database on GFS
MapReduce basics: word count
[Slide diagrams stepping through the word-count example — mappers emitting (word, 1) pairs, a shuffle by word, reducers summing counts — are not reproduced in the transcript]
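The word-count example from the slides can be sketched in Python. This is an illustrative toy, not Google's implementation: the map phase emits a (word, 1) pair per word, a shuffle step groups pairs by word (the framework does this between phases), and the reduce phase sums each word's counts.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group pairs by key, as the framework does between phases."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [count for _, count in group]

def reduce_phase(word, counts):
    """Reduce: sum the counts for one word."""
    return word, sum(counts)

# Three toy "documents", each handled independently by a mapper
documents = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(w, c) for w, c in shuffle(mapped))
print(counts["the"])  # 3
```

Because each mapper only sees its own document and each reducer only sees one word's pairs, both phases parallelize across many unreliable machines.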
Typical jobs use multiple MapReduce passes / tasks
Hadoop: open-source Google stack
● Hadoop distributed file system (HDFS): like GFS
● HBase: like BigTable
● Node types (v1.0):
– Data node: storage
– Task tracker: processing
– Job Tracker: scheduling
– Name node: data directory
HBase and Hive
● Many more people use SQL than can write MapReduce programs
● Hive converts a SQL-like query language to MapReduce jobs
● But queries run as batch jobs, not in real time like SQL
Hive (Facebook) vs Pig (Yahoo!)
● Both high-level query languages for data in Hadoop
● Hive is declarative (what to do)
● Pig Latin is procedural (how to do it)
● Pig can do everything Hive can and more
● But you need to know how to do it
Hadoop 2.0 moves beyond MapReduce
● YARN: Yet Another Resource Negotiator
● Splits the job tracker’s responsibilities into:
– Resource manager: controls access to resources like memory and CPU
– Application master: controls task execution
Summary: Google and Hadoop
● Massive parallelism on commodity hardware
● Allows scaling out, not up
● Designed to tolerate hardware failures
● Mainly batch-style processing, difficult to deal with data velocity
● No transactional operations
Scaling transactional web sites
● Original Google infrastructure was built to index static web pages
● “Web 2.0” applications allowed interaction, built around databases
– Scaling web server is easy, because HTTP is stateless
– Scaling database is hard
Scaling transactional web sites
● Scale web servers by adding more machines
● Scale databases “up” by adding a bigger machine
– Single point of failure, bottleneck
● Scale database reads by adding caching and read-only slaves
– All writes still go through a single master machine
Scale writes by sharding
● Split biggest table across machines based on a key
● Facebook ca 2011: 4000 MySQL shards, 9000 memcached servers, 1.4B reads/s, 3.5M row changes/s, 8.1M physical I/O ops/s
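The routing logic lives in the application tier: every component must apply the same function to a row's key to find its shard. A minimal sketch (the shard count and key are hypothetical):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical fixed shard count

def shard_for(user_id, num_shards=NUM_SHARDS):
    """Route a row to a shard by hashing its key; every part of the
    application must use this same mapping to find the right server."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# All queries for one user go to one shard; the mapping is deterministic.
shard = shard_for("user42")
# But a join across users on different shards means the application
# must query every shard and merge the results itself.
```

This simple `hash mod N` scheme also illustrates the operational pain: changing `NUM_SHARDS` remaps almost every key, which is the problem consistent hashing (discussed later with Dynamo) addresses.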
Issues with sharding
● Application complexity: application must be able to determine which shard to use
● Crippled SQL: joins across shards very difficult– Basically only programmers can access whole DB
● Loss of transactional integrity: transactions across shards possible, but impractical for performance
● Operational complexity: load balancing across shards is complex, as is adding new shards, changing schema, etc
Brewer's CAP theorem
● These three properties cannot all be achieved by a distributed system simultaneously:
– Strong consistency: all clients see the same data at the same time
– Availability: All requests receive a response as to their success or failure
– Partition tolerance: The system continues to function in the event of network failures
Brewer's real CAP theorem
● A node fails to communicate with another node to keep it in sync (P)
● It can decide to:
– Go ahead anyway, sacrificing consistency (C)
– Wait for the other node, sacrificing availability (A)
● Business considerations: availability > consistency
– Take the customer’s order and sort it out later if necessary
Partitions are inevitable
● A network is partitioned when a component fails and a cluster is divided in two
● Want applications to keep operating
● Can't tell a partition from a node going down
Sacrificing availability
● Wait until all nodes can synchronize
● “Amazon claim that just an extra one tenth of a second on their response times will cost them 1% in sales. Google said they noticed that just a half a second increase in latency caused traffic to drop by a fifth.”
● Examples: Multi-master DBs, Neo4j, Google BigTable
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
Sacrificing consistency
● Go ahead with update (Eventual consistency)
● Problems:
– Pushes complexity from the database into the application
– When is “eventually”?
● Examples:
– Amazon’s Dynamo
– Domain name service
– Facebook’s Cassandra and LinkedIn’s Voldemort
Amazon’s non-relational Dynamo: Requirements
● Continuous availability
● Network partition tolerant
● No-loss conflict resolution: never lose an order
● Efficiency: low latency
● Economy: run on commodity hardware
● Incremental scalability: add servers without downtime or manual maintenance
Amazon’s non-relational Dynamo: Characteristics
● Relaxed consistency, within limits
– Trade-off between consistency and availability, configurable by the application
● Only primary key-based access
● No data model (schema)
● What we now call a key-value store
Amazon’s non-relational Dynamo: Innovative features
● Consistent hashing: allow shards to be added and removed with minimal rebalancing
● Tunable consistency: application specifies trade-off between consistency, read performance, write performance
● Data versioning: keeping multiple versions of each entry allows some automatic conflict resolution
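Dynamo's data versioning is based on vector clocks: each version carries a per-node update counter, and one version supersedes another only when all of its counters are at least as large. A simplified sketch (node names and values are illustrative, not Dynamo's wire format):

```python
def dominates(a, b):
    """Version with clock `a` supersedes `b` if every per-node counter
    in `a` is at least as large as the corresponding counter in `b`."""
    nodes = set(a) | set(b)
    return all(a.get(n, 0) >= b.get(n, 0) for n in nodes)

v1 = {"node-a": 2, "node-b": 1}
v2 = {"node-a": 1, "node-b": 1}  # strictly older: v1 dominates it
v3 = {"node-a": 1, "node-b": 2}  # concurrent with v1: neither dominates

print(dominates(v1, v2))                        # True
print(dominates(v1, v3) or dominates(v3, v1))   # False
```

When one clock dominates, the store can discard the older version automatically; when neither does (concurrent writes during a partition), both versions are kept and the application must reconcile them, e.g. by merging two versions of a shopping cart.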
Consistent hashing
● Decide which shard to put a piece of data on
● Naive mapping of key to shard is difficult to re-balance when adding or removing shards
● Consistent hashing minimizes rebalancing
● Virtual nodes further minimize rebalancing
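The idea above can be sketched in Python: each shard owns many points (virtual nodes) on a hash ring, and a key belongs to the first shard point clockwise from the key's hash. Shard names and the virtual-node count are illustrative choices, not Dynamo's actual parameters.

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each shard owns `vnodes` points on the ring; a key maps to the
    first shard point at or after its hash (wrapping around)."""
    def __init__(self, shards, vnodes=100):
        self.ring = sorted((_hash(f"{s}#{i}"), s)
                           for s in shards for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    def shard_for(self, key):
        idx = bisect.bisect(self.points, _hash(key)) % len(self.points)
        return self.ring[idx][1]

ring3 = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
before = {k: ring3.shard_for(k) for k in map(str, range(1000))}

# Add a fourth shard and count how many keys change owner
ring4 = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
moved = sum(before[k] != ring4.shard_for(k) for k in before)
# Only roughly 1/4 of the keys move (those now owned by shard-d);
# a naive hash(key) % N scheme would relocate about 3/4 of them.
```

The virtual nodes spread each shard's ownership around the ring, so adding or removing a shard rebalances load evenly across the survivors rather than dumping it all on one neighbor.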
Tunable consistency: NWR notation
● N: Number of replicas of data
● W: number to write before returning to application
● R: number to access in a read
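The NWR trade-off reduces to a simple overlap argument: if R + W > N, every read quorum intersects every write quorum in at least one replica, so a read is guaranteed to contact a node holding the latest write. A small sketch (the configurations shown are illustrative):

```python
def is_strongly_consistent(n, w, r):
    """With N replicas, a write acknowledged by W nodes and a read
    contacting R nodes must overlap in at least one node when
    R + W > N, so the read sees the latest write."""
    return r + w > n

# Quorum reads and writes: overlapping, hence consistent
print(is_strongly_consistent(3, 2, 2))  # True
# Fast single-node reads and writes: eventually consistent only
print(is_strongly_consistent(3, 1, 1))  # False
# Write-all / read-one: consistent reads, but writes block on every replica
print(is_strongly_consistent(3, 3, 1))  # True
```

This is exactly the application-level knob the slide describes: lowering W speeds up writes, lowering R speeds up reads, and either choice can push R + W down to N or below, trading consistency for latency and availability.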
Summary: Scaling with low consistency
● Amazon decided to relax consistency requirements in exchange for availability
● CAP theorem says that you can’t have both
● Sharding already loses ACID and SQL-for-non-programmers
● Dynamo’s unstructured “values” are also not useful for non-programmers
– But we will discuss document databases soon
● Since Dynamo, many key-value stores have been released, especially starting in 2008-2009