Post on 06-Jul-2015
© 2014 Aerospike, Inc. All rights reserved. Confidential. | Berlin Big Data Beers - October 29, 2014
Aerospike (aer·o·spike) [air-oh-spahyk] noun, 1. tip of a rocket that enhances speed and stability
FIRST FLASH OPTIMIZED IN-MEMORY NOSQL DATABASE
NOW OPEN SOURCE!
IN-MEMORY NOSQL
KHOSROW AFROOZEH ENGINEER
Aerospike – Built for the Age of the Millions Of Customers
■ The Gold Standard: 7 of the top 16 are powered by Aerospike (after Google and FB; source: BuiltWith.com)
“We run Aerospike heavily, peaking at 3 million reads per second and well over 1.5 million writes a second in a very cost-effective way. I don’t think there’s any technology we’ve run into that even comes close.”
– Geir Magnusson, CTO of AppNexus
INTELLIGENT & INSTANT INTERNET-SCALE INTERACTIONS
Who Uses Aerospike?
MARKET FORCES
REQUIREMENTS FOR INTERNET ENTERPRISES
1. Know who the interaction is with
   ■ Monitor 200+ million US consumers, 5+ billion mobile devices and sensors
2. Determine intent based on current context
   ■ Page views, search terms, game state, last purchase, friends list, ads served, location
3. Respond now, use big data for more accurate decisions
   ■ Display the most relevant ad
   ■ Recommend the best product
   ■ Deliver the richest gaming experience
   ■ Eliminate fraud…
4. 100% up-time!
DATABASE LANDSCAPE
STRUCTURED DATA
■ TRANSACTIONS (OLTP)
  ■ Response time: Seconds
  ■ Gigabytes of data
  ■ Balanced Reads/Writes
■ ANALYTICS (OLAP)
  ■ Response time: Seconds
  ■ Terabytes of data
  ■ Read Intensive
UNSTRUCTURED DATA
■ BIG DATA ANALYTICS
  ■ Response time: Hours, Weeks
  ■ TB to PB
  ■ Read Intensive
■ REAL-TIME BIG DATA
  ■ Real-time Transactions
  ■ Response time: < 10 ms
  ■ 1-20 TB
  ■ Balanced Reads/Writes
  ■ 24x7x365 Availability
Introduction to Advertising: Real-Time Bidding
North American RTB speeds & feeds
■ 1 to 6 billion cookies tracked
  ■ Some companies track 200M, some track 20B
■ Each bidder has their own data pool
  ■ Data is your weapon
  ■ Recent searches, behavior, IP addresses
  ■ Audience clusters (K-cluster, K-means) from offline Hadoop
■ “Remnant” from Google, Yahoo is about 0.6 million / sec
■ Facebook exchange: about 0.6 million / sec
■ “Other” is 0.5 million / sec
Currently about 2.0M / sec in North America
Advertising requirements
■ 100 millisecond to 150 millisecond ad delivery
  ■ De facto standard set in 2004 by Washington Post and others
■ North America is 70 to 90 milliseconds wide; Europe is about half of that
  ■ Two or three data centers
■ Auction is limited to 30 milliseconds
  ■ Typically closes in 5 milliseconds
■ Winners have more data, better models – in 5 milliseconds
MILLIONS OF CONSUMERS BILLIONS OF DEVICES
APP SERVERS
DATA WAREHOUSEINSIGHTS
Advertising Technology Stack
WRITE CONTEXT
OPERATIONAL DB
WRITE REAL-TIME CONTEXT READ RECENT CONTENT
PROFILE STORE Cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms...
REAL-TIME ANALYTICS Best sellers, top scores, trending tweets
BATCH ANALYTICS Discover patterns, segment data: location patterns, audience affinity
Financial Services – Intraday Positions
LEGACY DATABASE (MAINFRAME)
Read/Write
Start of Day Data Loading
End of Day Reconciliation
Query
REAL-TIME DATA FEED
ACCOUNT POSITIONS
XDR
10M+ user records
Primary key access
1M+ TPS planned
Finance App
Records App
RT Reporting App
Travel Portal
PRICING DATABASE (RATE LIMITED)
Poll for Pricing Changes
PRICING DATA
Store Latest Price
SESSION MANAGEMENT
Session Data Read
Price
XDR
Airlines forced interstate banking
Legacy mainframe technology
Multi-company reservation and pricing
Requirement: 1M TPS allowing overhead
Travel App
SOURCE DEVICE/ USER
QOS & Real-Time Billing for Telcos
■ In-switch per-HTTP-request billing
■ US Telcos: 200M subscribers, 50 metros
■ In-memory use case
Hot Standby
Execute Request
Real-time Checks
DESTINATION
Update Device User Settings
Request
XDR
Real-time Auth. QoS Billing
Config Module App
Old Architecture ( mid 2000s )
Request routing and sharding
APP SERVERS
CACHE
DATABASE
STORAGE
CONTENT DELIVERY NETWORK
LOAD BALANCER
Modern Scale Out Architecture
LOAD BALANCER (simple, stateless)
APP SERVERS (fast, stateless)
IN-MEMORY NoSQL
RESEARCH WAREHOUSE (long-term cold storage)
CONTENT DELIVERY NETWORK
Modern Scale Out Architecture
LOAD BALANCER (simple, stateless)
APP SERVERS (fast, stateless)
IN-MEMORY NoSQL
RESEARCH WAREHOUSE (long-term cold storage, HDFS based)
CONTENT DELIVERY NETWORK
ARCHITECTURE
Architecture – The Big Picture
1) No Hotspots – DHT simplifies data partitioning
2) Smart Client – 1 hop to data, no load balancers
3) Shared Nothing Architecture, every node identical
4) Single row ACID – sync replication in cluster
5) Smart Cluster, Zero Touch – auto-failover, rebalancing, rack aware, rolling upgrades...
6) Transactions and long running tasks prioritized in real time
7) XDR – async replication across data centers ensures Zero Downtime
8) Scale linearly as data sizes and workloads increase
9) Add capacity with no service interruption
SHARED-NOTHING SYSTEM: 100% DATA AVAILABILITY
■ Every node in a cluster is identical, handles both transactions and long running tasks
■ Data is replicated synchronously with immediate consistency within the cluster
■ Data is replicated asynchronously across data centers
OHIO Data Center
ROBUST DHT TO ELIMINATE HOT SPOTS
How Data Is Distributed (Replication Factor 2)
■ Every key is hashed into a 20 byte (fixed length) string using the RIPEMD160 hash function
■ This hash + additional data (fixed 64 bytes) are stored in RAM in the index
■ Some bits from this hash value are used to compute the partition id
■ There are 4096 partitions
■ Partition id maps to node id based on cluster membership

Example key: cookie-abcdefg-12345678
Example hash: 182023kh15hh3kahdjsh

Partition ID   Master node   Replica node
…              1             4
1820           2             3
1821           3             2
4096           4             1
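The key-to-node mapping described on this slide can be sketched roughly as follows. This is a minimal illustration, not Aerospike's implementation: SHA-1 is swapped in for RIPEMD160 only because it also produces a 20-byte digest and is available in every Python build, the choice of which 12 bits form the partition id is an assumption, and `replica_nodes` uses a hypothetical rendezvous-style ranking to stand in for the real cluster-membership logic.

```python
import hashlib

N_PARTITIONS = 4096  # from the slide


def key_digest(key):
    """Return a fixed-length 20-byte digest of the key.

    Stand-in: the slide names RIPEMD160; SHA-1 is used here because it is
    also 20 bytes long and always present in hashlib.
    """
    return hashlib.sha1(key.encode("utf-8")).digest()


def partition_id(digest):
    """Derive a partition id from some bits of the digest.

    Assumption: use the first two bytes (little-endian) reduced to 12 bits.
    """
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def replica_nodes(pid, nodes, replication_factor=2):
    """Pick master and replica(s) for a partition from the current node list.

    Hypothetical scheme: rank nodes by a hash of (partition, node) so that
    the result depends only on cluster membership, not on list order.
    """
    def score(node):
        return hashlib.sha1(f"{pid}:{node}".encode("utf-8")).digest()

    return sorted(nodes, key=score)[:replication_factor]


digest = key_digest("cookie-abcdefg-12345678")
pid = partition_id(digest)
master, replica = replica_nodes(pid, ["node1", "node2", "node3", "node4"])
print(f"partition {pid}: master={master}, replica={replica}")
```

Because every client can compute the same mapping from the key and the node list alone, a request can go straight to the master node without a routing tier.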
REAL-TIME PRIORITIZATION TO MEET SLA
Writing with Immediate Consistency (master, replica)
1. Write sent to row master
2. Latch against simultaneous writes
3. Apply write to master memory and replica memory synchronously
4. Queue operations to disk
5. Signal completed transaction (optional storage commit wait)
6. Master applies conflict resolution policy (rollback / rollforward)
Adding a Node (transactions continue)
1. Cluster discovers new node via gossip protocol
2. Paxos vote determines new data organization
3. Partition migrations scheduled
4. When a partition migration starts, write journal starts on destination
5. Partition moves atomically
6. Journal is applied and source data deleted
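Steps 2 to 5 of the write path above can be sketched in a few lines. This is a single-process toy under stated assumptions: `StorageNode` and `RowMaster` are hypothetical names, the "replica" is just another in-memory object rather than a network peer, and conflict resolution (step 6) is left out.

```python
import queue
import threading


class StorageNode:
    """Hypothetical node: an in-memory table plus a queue of pending disk writes."""

    def __init__(self, name):
        self.name = name
        self.memory = {}
        self.disk_queue = queue.Queue()


class RowMaster:
    """Coordinates writes for the rows it masters (illustrative only)."""

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica
        self._latches = {}
        self._latches_guard = threading.Lock()

    def _latch_for(self, key):
        # One latch per row, created on first use.
        with self._latches_guard:
            return self._latches.setdefault(key, threading.Lock())

    def write(self, key, value):
        with self._latch_for(key):                 # step 2: latch the row
            self.master.memory[key] = value        # step 3: master memory
            self.replica.memory[key] = value       # step 3: replica memory
            self.master.disk_queue.put((key, value))   # step 4: queue to disk
            self.replica.disk_queue.put((key, value))
        return True                                # step 5: signal completion


master = StorageNode("master")
replica = StorageNode("replica")
row_master = RowMaster(master, replica)
row_master.write("cookie-abcdefg-12345678", {"segment": "sports"})
```

The point of the ordering is that the acknowledgement is sent only after both in-memory copies agree, while the disk write is deferred to a queue.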
INTELLIGENT CLIENT TO MAKE APPS SIMPLER
Shield Applications from the Complexity of the Cluster
■ Implements Aerospike API
  ■ Optimistic row locking
  ■ Optimized binary protocol
■ Cluster tracking
  ■ Learns about cluster changes, partition map
■ Transaction semantics
  ■ Global transaction ID
  ■ Retransmit and timeout
■ Linear scale
  ■ No extra hop
  ■ No load balancers
OTHER DATABASE
OS FILE SYSTEM
PAGE CACHE
BLOCK INTERFACE
SSD HDD
BLOCK INTERFACE
SSD SSD
OPEN NVM
SSD
OTHER DATABASE
AEROSPIKE FLASH OPTIMIZED IN-MEMORY DATABASE
"Ask me and I’ll tell you the answer."
"Ask me. I’ll look up the answer and then I’ll let you know."
AEROSPIKE
HYBRID MEMORY SYSTEM™
• Direct Device Access
• Large Block Writes
• Indexes in DRAM
• Highly Parallelized
• Log-structured FS “copy-on-write”
• Fast restart with shared memory
FLASH OPTIMIZED HIGH PERFORMANCE
FLASH PROVIDES DRAM-LIKE PERFORMANCE WITH MUCH LOWER COMPLEXITY & TCO
Actual customer analysis. Customer requires 500K TPS, 10 TB of storage, with 2x replication factor.

Storage type                               DRAM & NoSQL              SSD & DRAM
Storage per server                         180 GB (196 GB server)    2.4 TB (4 x 700 GB)
TPS per server                             500,000                   500,000
Cost per server                            $8,000                    $11,000
Servers required                           186 (other databases)     only 14
Server costs                               $1,488,000                $154,000
Power/server                               0.9 kW                    1.1 kW
Power (2 years), $0.12 per kWh ave. US     $352,000                  $32,400
Maintenance (2 years), $3,600 per server   $670,000                  $50,400
Total                                      $2,510,000                $236,800
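The totals in this comparison can be reproduced with simple arithmetic. The sketch below assumes two years means 2 x 365 x 24 hours and that the $3,600 maintenance figure covers the full two years per server; the function name and dictionary keys are illustrative.

```python
HOURS_IN_TWO_YEARS = 2 * 365 * 24  # assumption: no leap day


def two_year_tco(servers, cost_per_server, kw_per_server,
                 usd_per_kwh=0.12, maintenance_per_server=3600):
    """Return the two-year cost breakdown for a cluster, in US dollars."""
    hardware = servers * cost_per_server
    power = servers * kw_per_server * HOURS_IN_TWO_YEARS * usd_per_kwh
    maintenance = servers * maintenance_per_server
    return {
        "hardware": hardware,
        "power": power,
        "maintenance": maintenance,
        "total": hardware + power + maintenance,
    }


dram_cluster = two_year_tco(servers=186, cost_per_server=8000, kw_per_server=0.9)
flash_cluster = two_year_tco(servers=14, cost_per_server=11000, kw_per_server=1.1)

for name, cost in (("DRAM & NoSQL", dram_cluster), ("SSD & DRAM", flash_cluster)):
    print(f"{name}: ${cost['total']:,.0f} over two years")
```

The computed figures land within rounding of the slide's numbers, which suggests the slide rounds power and maintenance to the nearest few hundred dollars.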
High Availability Through Clustering & Replication
Phases
1) 100K TPS – 4 nodes
2) Clients at max
3) 400K TPS – 4 nodes
4) 400K TPS – 3 nodes
5) 400K TPS – 4 nodes
Aerospike Node Specs: CentOS 6.3, Intel i5-2400 @ 3.1 GHz (quad core), 16 GB RAM @ 1333 MHz
HOT ANALYTICS
Key Value Store + Lists, Maps
■ Namespaces (policy containers)
  ■ Determine storage - DRAM or Flash
  ■ Determine replication factor
  ■ Contain records and sets
■ Sets (tables) of records
  ■ Arbitrary grouping
■ Records (rows) of key/bins
  ■ Block size (128 KiB – 2 MiB)
■ Bin with same name can contain values of different types
  ■ String, integer, bytes (raw, blob, etc.)
  ■ List (an ordered collection of values)
  ■ Map (a collection of keys and values)
■ Bins can be added anytime
■ Metadata
  ■ Generation counter so apps can ensure that a record was not modified since last read
  ■ Time-To-Live value for auto expiration, keeping most recent context or "hot" data, aging out historical context
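The two metadata fields at the end of this slide, the generation counter and the time-to-live, can be illustrated with a toy in-memory store. Everything here (`TinyStore`, `GenerationMismatch`, the `expected_generation` parameter) is a hypothetical sketch of the idea, not the Aerospike client API.

```python
import time


class GenerationMismatch(Exception):
    """Raised when a record changed since the caller last read it."""


class StoredRecord:
    def __init__(self, bins, expires_at):
        self.bins = dict(bins)
        self.generation = 1
        self.expires_at = expires_at


class TinyStore:
    def __init__(self, clock=time.monotonic):
        self._records = {}
        self._clock = clock

    def get(self, key):
        """Return the live record, or None if missing or past its TTL."""
        record = self._records.get(key)
        if record is None:
            return None
        if self._clock() >= record.expires_at:
            del self._records[key]  # age out expired context
            return None
        return record

    def put(self, key, bins, ttl_seconds, expected_generation=None):
        """Write bins; with expected_generation set, act as check-and-set."""
        record = self.get(key)
        expires_at = self._clock() + ttl_seconds
        if record is None:
            if expected_generation is not None:
                raise GenerationMismatch(f"{key!r} no longer exists")
            self._records[key] = StoredRecord(bins, expires_at)
            return 1
        if expected_generation is not None and record.generation != expected_generation:
            raise GenerationMismatch(f"{key!r} is at generation {record.generation}")
        record.bins.update(bins)      # bins can be added at any time
        record.generation += 1
        record.expires_at = expires_at
        return record.generation
```

A reader would remember `record.generation` from its last `get` and pass it back on `put`; a concurrent writer bumps the counter, so the stale write is rejected instead of silently overwriting newer data.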
KVS + Lists, Maps + Queries + UDFs
STREAM AGGREGATIONS (INDEXED MAP-REDUCE)
Pipe Query results through UDFs
■ Filter, Transform, Aggregate: Map, Reduce
■ Enforce security
■ UDFs in Lua to
  ■ CRUD on record
  ■ Calculation based on data within a record
■ Iterate through a set / namespace of records
■ UDFs for real-time analytics and aggregations
LOW SELECTIVITY INDEX QUERIES
1. Query sent to ALL nodes in parallel (“SCATTER”)
2. Secondary index keys in DRAM
   ■ Map to primary keys in DRAM
   ■ Co-located with record on SSD
3. Records read in parallel from ALL SSDs
4. Parallel read results aggregated on node
5. Results from ALL nodes aggregated client-side (“GATHER”)
[Diagram: a client querying several servers; on each server, secondary keys and primary keys are held in DRAM and records (R1, R2, …) on SSD]
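The scatter/gather flow in the five steps above can be sketched with a thread pool standing in for the network. This is an illustrative toy: `IndexedNode` and `scatter_gather_sum` are hypothetical names, plain dictionaries play the roles of the DRAM indexes and the SSD, and the aggregation is fixed to a sum.

```python
from concurrent.futures import ThreadPoolExecutor


class IndexedNode:
    """Hypothetical node holding its own records and a local secondary index."""

    def __init__(self, records, indexed_bin):
        self.records = records      # primary key -> bins (stands in for SSD)
        self.secondary = {}         # indexed value -> primary keys (DRAM)
        for primary_key, bins in records.items():
            self.secondary.setdefault(bins[indexed_bin], []).append(primary_key)

    def query_sum(self, value, sum_bin):
        # Steps 2-4: index lookup, record reads, node-local aggregation.
        primary_keys = self.secondary.get(value, [])
        return sum(self.records[pk][sum_bin] for pk in primary_keys)


def scatter_gather_sum(nodes, value, sum_bin):
    # Step 1: send the query to every node in parallel ("scatter").
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node.query_sum(value, sum_bin), nodes))
    # Step 5: combine the per-node results on the client ("gather").
    return sum(partials)


cluster = [
    IndexedNode({"k1": {"group_id": 1234, "spend": 10},
                 "k2": {"group_id": 99, "spend": 5}}, "group_id"),
    IndexedNode({"k3": {"group_id": 1234, "spend": 7}}, "group_id"),
]
print(scatter_gather_sum(cluster, 1234, "spend"))
```

Each node only touches its own index and records, so the client-side work is limited to merging one small partial result per node.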
SQL & NoSQL
➤ Secondary index
  ▪ Equality, Range, Compound
  ▪ e.g. WHERE group_id = 1234, WHERE last_activity > 1349293398, WHERE branch_id BETWEEN 19812 AND 1987139
➤ Filters
  ▪ SQL: WHERE clause with non-indexed “AND”s (e.g. “AND gender=‘M’ ”)
  ▪ NoSQL: Map step
➤ Aggregation
  ▪ SQL: GROUP BY, ORDER BY, LIMIT, OFFSET
  ▪ NoSQL: Reduce step
[Diagram: a query from the client resolves secondary key to primary key (DRAM) to record (SSD) on the server, then passes through Filter, Map, Aggregate and Reduce stages, with a final Aggregate on the client]
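The SQL-to-NoSQL correspondence on this slide (indexed WHERE, non-indexed AND as a filter/map step, GROUP BY as a reduce step) can be sketched as a small stream pipeline. The sample records, function names, and the linear scan standing in for a secondary-index range lookup are all assumptions made for illustration.

```python
from functools import reduce

RECORDS = [
    {"user": "a", "group_id": 1234, "gender": "M", "last_activity": 1349293400},
    {"user": "b", "group_id": 1234, "gender": "F", "last_activity": 1349293500},
    {"user": "c", "group_id": 1234, "gender": "M", "last_activity": 1349293000},
    {"user": "d", "group_id": 5678, "gender": "M", "last_activity": 1349294000},
    {"user": "e", "group_id": 1234, "gender": "M", "last_activity": 1349299999},
]


def index_range(records, bin_name, low):
    """Stand-in for a secondary-index range query: WHERE bin_name > low."""
    return (r for r in records if r[bin_name] > low)


def filter_step(stream):
    """Non-indexed predicate: AND gender = 'M'."""
    return (r for r in stream if r["gender"] == "M")


def map_step(stream):
    """Emit one partial count per record, keyed by group."""
    return ({r["group_id"]: 1} for r in stream)


def merge_counts(accumulated, partial):
    """Reduce step: merge partial counts, like GROUP BY group_id with COUNT(*)."""
    for group, count in partial.items():
        accumulated[group] = accumulated.get(group, 0) + count
    return accumulated


def active_males_by_group(records, since):
    stream = index_range(records, "last_activity", since)
    return reduce(merge_counts, map_step(filter_step(stream)), {})


print(active_males_by_group(RECORDS, 1349293398))
```

Because `merge_counts` is associative, the same reduce function could combine per-node partial results on the client, which is what makes the final client-side aggregate stage possible.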
Operational + Analytics + Adding servers and Re-balancing
■ 300k TPS operations + process 1 million records
  ■ Runs in 0.5 seconds
■ Add 2 servers, auto-rebalance while running query
Performance
Native Flash ! Performance
HIGH THROUGHPUT
[Chart: throughput, TPS (0 to 400,000) for Balanced and Read-Heavy workloads; series: Aerospike, Cassandra, MongoDB, Couchbase 2.0*]
* “We were forced to exclude Couchbase… since when run with either disk or replica durability on, it was unable to complete the test.” – Thumbtack Technology
LOW LATENCY
[Chart: Balanced Workload Read Latency; average latency, ms (0 to 9) vs. throughput, ops/sec (0 to 200,000); series: Aerospike, Cassandra, MongoDB]
[Chart: Balanced Workload Update Latency; average latency, ms (0 to 14) vs. throughput, ops/sec (0 to 200,000); series: Aerospike, Cassandra, MongoDB]
Updated YCSB Benchmark
■ What’s different?
  ■ Aerospike 3.2.8 instead of Aerospike 2
  ■ Stock irqbalance is smarter
  ■ “After Burner” script maps threads/cores to CPU sockets, no copies across NUMA nodes
  ■ Minimized context switching, branch instructions
  ■ 10G network instead of 1G
2014: 1 M TPS on Single Server
Hot Analytics
■ High throughput queries
■ 2 node cluster, 10 indexes
■ Query returns 100 of 50M records
■ Predictable low latency
UN-PREDICTABLE LATENCY
128 – 300 ms
70 – 760 ms
7 – 10 ms
QPS
Amazon EC2 results
LESSONS LEARNED
1. Keep architecture simple
   ■ No hot spots (e.g., robust DHT)
   ■ Scales out easily (e.g., easy to size)
   ■ Avoids points of failure (e.g., single node type)
2. Avoid manual operation – automate, automate!
   ■ Self-managed cluster responds to node failures
   ■ Data rebalancing requires no intervention
   ■ Real-time prioritization allows unattended system operation
3. Keep system asynchronous
   ■ Shared nothing – nodes are autonomous
   ■ Async writes across data centers
   ■ Independent tuning parameters for different classes of tasks
LESSONS LEARNED (cont’d)
4. Monitor the health of the system extensively
   ■ Growth in load sneaks up on you over weeks
   ■ Early detection means better service
   ■ Most failures can be predicted (e.g., capacity, load, …)
5. Size clusters properly
   ■ Have enough capacity ALWAYS!
   ■ Upgrade SSDs every couple of years
   ■ Reduce cluster sizes to make operations simple
6. Have geographically distributed data centers
   ■ Size the distributed data centers properly
   ■ Use active-active configurations if possible
   ■ Size bandwidth requirements accurately
LESSONS LEARNED (cont’d)
7. Have a plan for unforeseen situations
   ■ Devise scenarios and practice during normal work time
   ■ Ensure you can do rolling upgrades during high load time
   ■ Make sure that your nodes can restart fast (< 1 minute)
8. Constantly test and monitor app end-to-end
   ■ Application level metrics are more important than DB metrics
   ■ Most issues in a service are due to a combination of application, network, database, storage, etc.
9. Separate online and offline workloads
   ■ Reserve real-time edge database for transactions and hot analytics queries (where newest data is important)
   ■ Avoid ad-hoc queries on the online system
   ■ Perform deep analysis in offline system (Hadoop)
10. Use the right data management system for the job
   ■ Fast NoSQL DB for real-time transactions and hot analytics on rapidly changing data
   ■ Hadoop or other comparable systems for exhaustive analytics on mostly read-only data
1. Scaling the Internet of Everything
2. Pushing the limits of modern hardware
3. No data loss (ACID) and no downtime
MODERN REAL-TIME DATA PLATFORM
APP SERVER
AEROSPIKE SERVER
REAL-TIME BIG DATA APPLICATION
AEROSPIKE SMART CLIENT™
• APIs (C, C#, Java, Go, PHP, Python, Ruby, Node, Erlang…)
• Transactions, Cluster awareness
EXTENSIBLE DATA MODEL
• Str, Int, Lists, Maps
• Lookups, Queries, Scans
• Aerospike Alchemy Framework™ with User Defined Functions and Distributed Aggregations
MONITORING & MANAGEMENT
• Aerospike Monitoring Console™
• Command Line Tools
• Plugins-Nagios, Graphite, Zabbix
AEROSPIKE SMART CLUSTER™
AEROSPIKE HYBRID MEMORY SYSTEM™
PROXIMITY & REDUNDANCY
Cross Data Center Replication™ (XDR)
REAL-TIME ENGINE
APP/WEB SERVER
AEROSPIKE CLUSTER
Written in ‘C’, Patents pending
Server, Storage, Cloud benchmarks, partnerships
SUMMARY
Rapid Development
➤ Support for popular languages and tools
  ▪ AQL and Aerospike Client in C, Java, C#, Go, Ruby, Python, …
➤ Complex data types
  ▪ Nested documents (map, list, string, integer)
  ▪ Large (Stack, Set, List) Objects
➤ Queries
  ▪ Single Record
  ▪ Batch multi-record lookups
  ▪ Equality and Range
  ▪ Aggregations and Map-Reduce
Complete Customizability
➤ User Defined Functions
  ▪ In-DB processing
➤ Aggregation Framework
  ▪ UDF Pipeline
  ▪ MapReduce
➤ Time Series Queries
  ▪ Just 2 IOPs for most r/w (independent of object size)
Live analytics without ETL
http://www.aerospike.com/community/labs/
Join us!
@khaf
khosrow@aerospike.com
We are hiring hardcore system developers.