Big data cache-surge-2012

Big Data Without a Big Database
Kate Matsudaira, Decide.com, @katemats

Description

These days it is not uncommon to have hundreds of gigabytes of data that must be sliced and diced, then delivered fast and rendered quickly. Typical solutions involve lots of caching and expensive hardware with lots of memory. While those solutions certainly can work, they aren't always cost-effective or feasible in certain environments (such as the cloud). This talk covers strategies for caching large data sets without tons of expensive hardware, relying instead on software and data design. It is common wisdom that serving your data from memory dramatically improves application performance and is a key to scaling. However, caching large datasets brings its own challenges: distribution, consistency, dealing with memory limits, and optimizing data loading, to name a few. This talk goes through some of the challenges, and solutions, for achieving fast data queries in the cloud. The audience will come away armed with a number of practical techniques for organizing and building caches for non-transactional datasets, techniques that can be applied to scale existing systems or to design new ones.

Transcript of Big data cache-surge-2012

Page 1: Big data cache-surge-2012

Without a Big Database

Kate Matsudaira Decide.com @katemats

Big Data

Page 2: Big data cache-surge-2012

Two kinds of data

                           "user" / "transactional"        "reference" / "non-transactional"
examples:                  user accounts,                  product/offer catalogs,
                           shopping cart/orders,           service catalogs,
                           user messages, ...              static geolocation data,
                                                           dictionaries, ...
created/modified by:       users                           business (you)
sensitivity to staleness:  high                            low
plan for growth:           hard                            easy
access optimization:       read/write                      mostly read

Pages 3-5: Big data cache-surge-2012

Reference data

user data

Page 6: Big data cache-surge-2012

Reference Data Needs Speed

(image credit: http://cache.gawker.com)

Page 7: Big data cache-surge-2012

Performance (a reminder)

main memory read: 0.0001 ms (100 ns)
network round trip: 0.5 ms (500,000 ns)
disk seek: 10 ms (10,000,000 ns)

source: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf

Page 8: Big data cache-surge-2012
Page 9: Big data cache-surge-2012

The Beginning

[diagram: webapps behind a load balancer call services behind another load balancer; all services read from one BIG DATABASE, which a data loader feeds]

availability problems
performance problems
scalability problems

Page 10: Big data cache-surge-2012

Replication

[diagram: the same architecture, with a REPLICA added next to the BIG DATABASE]

scalability problems
performance problems
operational overhead

Page 11: Big data cache-surge-2012

Local Caching

[diagram: the same architecture again, with a local cache added inside each service in front of the BIG DATABASE and its REPLICA]

scalability problems
performance problems
operational overhead
consistency problems
long-tail performance problems

Page 12: Big data cache-surge-2012

The Long Tail Problem

80% of requests query 10% of entries (the head)
20% of requests query the remaining 90% of entries (the tail)

Page 13: Big data cache-surge-2012

Big Cache

[diagram: the services now read from a BIG CACHE, preloaded by the data loader from the database and its replica]

scalability problems
performance problems
consistency problems
long-tail performance problems
operational overhead

Page 14: Big data cache-surge-2012

•  ElastiCache (AWS)
•  memcached(b)
•  Oracle Coherence

"Do I look like I need a cache?"

Page 15: Big data cache-surge-2012

•  Targeted at generic data/use cases
•  Scales horizontally
•  Dynamically rebalances data
•  Dynamically assigns keys to the "nodes"
•  Poor performance on cold starts
•  No assumptions about loading/updating data

Page 16: Big data cache-surge-2012

Big Cache Technologies

performance:
•  extra network hop
•  slow scanning
•  additional deserialization

operational overhead:
•  additional hardware
•  additional configuration
•  additional monitoring

Page 17: Big data cache-surge-2012

NoSQL to the Rescue?

[diagram: the same architecture, with the BIG DATABASE replaced by a NoSQL Database and a NoSQL Replica]

some operational overhead
some performance problems
some scalability problems

Page 18: Big data cache-surge-2012

Remote Store Retrieval Latency

TCP request: 0.5 ms
lookup/write response: 0.5 ms
TCP response: 0.5 ms
read/parse response: 0.25 ms

Total time to retrieve a single value: 1.75 ms

Page 19: Big data cache-surge-2012

Total time to retrieve a single value:
•  from remote store: 1.75 ms
•  from memory: 0.001 ms (10 main memory reads)

Sequential access of 1 million random keys:
•  from remote store: ~30 minutes (1,000,000 x 1.75 ms ≈ 29 minutes)
•  from memory: ~1 second (1,000,000 x 0.001 ms)

Page 20: Big data cache-surge-2012

The Truth About Databases

Page 21: Big data cache-surge-2012

"What I'm going to call the hot data cliff: as the size of your hot data set (data frequently read at sustained rates above disk I/O capacity) approaches available memory, write operation bursts that exceed disk write I/O capacity can create a thrashing death spiral where hot disk pages that MongoDB desperately needs are evicted from disk cache by the OS as it consumes more buffer space to hold the writes in memory."

MongoDB (source: http://www.quora.com/Is-MongoDB-a-good-replacement-for-Memcached)

Page 22: Big data cache-surge-2012

"Redis is an in-memory but persistent on disk database, so it represents a different trade off where very high write and read speed is achieved with the limitation of data sets that can't be larger than memory."

Redis (source: http://redis.io/topics/faq)

Page 23: Big data cache-surge-2012

They are fast if everything fits into memory.

Page 24: Big data cache-surge-2012
Page 25: Big data cache-surge-2012

Can you keep it in memory yourself?

[diagram: each service holds a full in-memory cache, populated by its own data loader from the BIG DATABASE; webapps reach the services through the load balancers]

operational relief
scales infinitely
performance gain
consistency problems

Page 26: Big data cache-surge-2012

Fixing Consistency

[diagram: each webapp is paired with its own service (full cache + data loader) inside a "deployment cell" behind the load balancer]

1.  Deployment "cells"
2.  Sticky user sessions

Page 27: Big data cache-surge-2012


Page 28: Big data cache-surge-2012

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of

noncritical parts of their programs, and these attempts at efficiency actually have a strong negative

impact when debugging and maintenance are considered.

We should forget about small efficiencies, say about

97% of the time: premature optimization is the root of all evil. Yet we should not pass up our

opportunities in that critical 3%.”

Donald Knuth

Page 29: Big data cache-surge-2012

How do you fit all that data in memory?

Page 30: Big data cache-surge-2012

The Answer

Five techniques: (1) domain model design, (2) collection optimization, (3) numeric data optimization, (4) text compression, (5) JVM tuning

Page 31: Big data cache-surge-2012

Domain Model Design

“Domain Layer (or Model Layer): Responsible for representing concepts of the business, information about the business situation, and business rules. State that reflects the business situation is controlled and used here, even though the technical details of storing it are delegated to the infrastructure. This layer is the heart of business software.”

Eric Evans, Domain-Driven Design, 2003


Page 32: Big data cache-surge-2012

Domain Model Design Guidelines

#1 Keep it immutable
#2 Use independent hierarchies
#3 Optimize data

Page 33: Big data cache-surge-2012

intern() your immutables

[diagram: object graphs for values V1 and V2 under keys K1 and K2; without interning, duplicate immutable nodes (B, C, D, E) are copied between versions, while with interning both versions share the same instances]

Page 34: Big data cache-surge-2012

// Requires: java.util.Map, java.util.WeakHashMap,
// java.util.concurrent.ConcurrentHashMap, java.lang.ref.WeakReference,
// and a static import of java.util.Collections.synchronizedMap.
private final Map<Class<?>, Map<Object, WeakReference<Object>>> cache =
    new ConcurrentHashMap<Class<?>, Map<Object, WeakReference<Object>>>();

public <T> T intern(T o) {
  if (o == null)
    return null;
  Class<?> c = o.getClass();
  Map<Object, WeakReference<Object>> m = cache.get(c);
  if (m == null)
    cache.put(c, m = synchronizedMap(new WeakHashMap<Object, WeakReference<Object>>()));
  WeakReference<Object> r = m.get(o);
  @SuppressWarnings("unchecked")
  T v = (r == null) ? null : (T) r.get();
  if (v == null) {
    v = o;
    m.put(v, new WeakReference<Object>(v));
  }
  return v;
}
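For illustration, a hedged usage sketch (assuming the intern() method above lives in a class named Interner, which the slide doesn't show): duplicate immutable values collapse to a single canonical instance, so repeated subtrees across cache versions are stored once.

Interner interner = new Interner();                 // hypothetical wrapper class
String a = interner.intern(new String("Seattle"));
String b = interner.intern(new String("Seattle"));
assert a == b;  // duplicates collapse to one shared instance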

Page 35: Big data cache-surge-2012

Use Independent Hierarchies

Before: one deep aggregate:
Product (id, title) containing Offers, Specifications, Description, Reviews, Rumors, Model History

After: independent top-level hierarchies, each keyed by productId:
Product Summary, Offers, Specifications, Description, Reviews, Rumors, Model History (together forming the Product Info)
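A minimal sketch of this layout (an assumption for illustration; the facet classes are hypothetical): each facet lives in its own map under the same productId, so refreshing one facet never rebuilds the whole product graph.

import java.util.HashMap;
import java.util.Map;

// Hypothetical facet classes; the slide only names them.
final class ProductSummary {
  final long productId; final String title;
  ProductSummary(long productId, String title) { this.productId = productId; this.title = title; }
}
final class Reviews {
  final long productId; final java.util.List<String> texts;
  Reviews(long productId, java.util.List<String> texts) { this.productId = productId; this.texts = texts; }
}

class ProductCaches {
  final Map<Long, ProductSummary> summaries = new HashMap<Long, ProductSummary>();
  final Map<Long, Reviews> reviews = new HashMap<Long, Reviews>();

  // Refreshing reviews swaps one small object; summaries are untouched.
  void updateReviews(Reviews r) { reviews.put(r.productId, r); }
}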

Page 36: Big data cache-surge-2012

Collection Optimization


Page 37: Big data cache-surge-2012

Leverage Primitive Keys/Values

Trove ("High Performance Collections for Java")

collection with 10,000 elements [0 .. 9,999]    size in memory
java.util.ArrayList<Integer>                    200K
java.util.HashSet<Integer>                      546K
gnu.trove.list.array.TIntArrayList              40K
gnu.trove.set.hash.TIntHashSet                  102K
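A small usage sketch of the Trove classes from the table (an illustration, not from the talk; the savings come from storing raw ints instead of boxed Integer objects):

import gnu.trove.list.array.TIntArrayList;
import gnu.trove.set.hash.TIntHashSet;

class PrimitiveCollections {
  public static void main(String[] args) {
    TIntArrayList ids = new TIntArrayList();
    for (int i = 0; i < 10000; i++) {
      ids.add(i);                  // stores a raw int: no boxing, no per-element object header
    }

    TIntHashSet seen = new TIntHashSet();
    seen.add(42);                  // primitive-keyed hash set
    boolean contains = seen.contains(42);
  }
}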

Page 38: Big data cache-surge-2012

Optimize small immutable collections

For collections with a small number of entries (up to ~20), replace general-purpose maps with fixed-arity immutable classes:

class ImmutableMap<K, V> implements Map<K, V>, Serializable { ... }

class MapN<K, V> extends ImmutableMap<K, V> {
  final K k1, k2, ..., kN;   // one field per entry, one class per arity N
  final V v1, v2, ..., vN;
  @Override public boolean containsKey(Object key) {
    if (eq(key, k1)) return true;
    if (eq(key, k2)) return true;
    ...
    return false;
  }
  ...
}

Page 39: Big data cache-surge-2012

Space Savings

java.util.HashMap: 128 bytes + 32 bytes per entry

compact immutable map: 24 bytes + 8 bytes per entry

(for a 10,000-entry map, that is roughly 320 KB versus 80 KB)

Page 40: Big data cache-surge-2012

Numeric Data Optimization


Page 41: Big data cache-surge-2012

Price History Example

Page 42: Big data cache-surge-2012

Example: Price History

Problem:
•  Store daily prices for 1M products, 2 offers per product
•  Average price history length per product: ~2 years

Total price points: (1M + 2M) x 730 = ~2 billion

Page 43: Big data cache-surge-2012

Price History: First Attempt

TreeMap<Date, Double>

88 bytes per entry x 2 billion = ~180 GB

Page 44: Big data cache-surge-2012

[chart: typical shopping price history; the price (around $100) holds steady over long runs of days (days 0-121 shown), changing only at a handful of points]

Page 45: Big data cache-surge-2012

Run Length Encoding

a a a a a a b b b c c c c c c

6 a 3 b 6 c

Page 46: Big data cache-surge-2012

Price History Optimization

•  Drop pennies
•  Store prices in a primitive short (use a scale factor to represent prices greater than Short.MAX_VALUE)

Encoding:
•  positive value: price (adjusted to scale)
•  negative value: run length (precedes its price)
•  zero: unavailable

-20 100 -40 150 -10 140 -20 100 -10 0 -20 100 90 -9 80

Memory: 15 shorts x 2 + 16 (array) + 24 (start date) + 4 (scale factor) = 74 bytes
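To make the encoding concrete, here is a minimal decoder sketch (an assumption, not the talk's code; the price scaling is hypothetical, and input is assumed well-formed):

// Decode the run-length encoding above: a negative entry is a run length
// for the value that follows; 0 marks "unavailable" days (NaN here).
static double[] decodePrices(short[] encoded, int scaleFactor) {
  java.util.List<Double> daily = new java.util.ArrayList<Double>();
  for (int i = 0; i < encoded.length; i++) {
    int runLength = 1;
    if (encoded[i] < 0) {          // negative entry: run length
      runLength = -encoded[i];
      i++;                         // the price for the run follows
    }
    short p = encoded[i];
    double price = (p == 0) ? Double.NaN : (double) p * scaleFactor;  // assumed scaling
    for (int d = 0; d < runLength; d++) daily.add(price);
  }
  double[] out = new double[daily.size()];
  for (int d = 0; d < out.length; d++) out[d] = daily.get(d);
  return out;
}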

Page 47: Big data cache-surge-2012

Savings

Reduction compared to TreeMap<Date, Double>: 155x
Estimated memory for 2 billion price points: 1.2 GB << 180 GB

Page 48: Big data cache-surge-2012

Price History Model

public class PriceHistory {

  private final Date startDate;  // or use org.joda.time.LocalDate
  private final short[] encoded;
  private final int scaleFactor;

  public PriceHistory(SortedMap<Date, Double> prices) { … }  // encode
  public SortedMap<Date, Double> getPricesByDate() { … }     // decode
  public Date getStartDate() { return startDate; }

  // The computations below are implemented directly against the encoded data
  public Date getEndDate() { … }
  public Double getMinPrice() { … }
  public int getNumChanges(double minChangeAmt, double minChangePct, boolean abs) { … }
  public PriceHistory trim(Date startDate, Date endDate) { … }
  public PriceHistory interpolate() { … }
}

Page 49: Big data cache-surge-2012

Know Your Data

Page 50: Big data cache-surge-2012

Compress text


Page 51: Big data cache-surge-2012

String Compression: byte arrays

•  Use the minimum character set encoding

static final Charset UTF8 = Charset.forName("UTF-8");

String s = "The quick brown fox jumps over the lazy dog";  // 43 chars, 136 bytes
byte[] b = "The quick brown fox jumps over the lazy dog".getBytes(UTF8);  // 64 bytes
String s1 = "Hello";  // 5 chars, 64 bytes
byte[] b1 = "Hello".getBytes(UTF8);  // 24 bytes

byte[] toBytes(String s) { return s == null ? null : s.getBytes(UTF8); }
String toString(byte[] b) { return b == null ? null : new String(b, UTF8); }

Page 52: Big data cache-surge-2012

String Compression: shared prefix

•  Great for URLs

public class PrefixedString {
  private PrefixedString prefix;
  private byte[] suffix;

  . . .

  @Override public int hashCode() { … }
  @Override public boolean equals(Object o) { … }
}
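A hypothetical usage sketch (assuming a (prefix, suffix) constructor, which the slide elides, and the UTF8 charset from the previous slide): URLs under the same host and path share one prefix chain instead of each holding the full string.

PrefixedString host = new PrefixedString(null, "http://www.example.com".getBytes(UTF8));
PrefixedString path = new PrefixedString(host, "/products/".getBytes(UTF8));  // shares host bytes
PrefixedString url1 = new PrefixedString(path, "12345".getBytes(UTF8));       // shares host + path bytes
PrefixedString url2 = new PrefixedString(path, "67890".getBytes(UTF8));       // shares host + path bytes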

Page 53: Big data cache-surge-2012

String Compression: short alphanumeric case-insensitive strings

public abstract class AlphaNumericString {
  public static AlphaNumericString make(String s) {
    try {
      return new Numeric(Long.parseLong(s, Character.MAX_RADIX));
    } catch (NumberFormatException e) {
      return new Alpha(s.getBytes(UTF8));
    }
  }

  protected abstract String value();
  @Override public String toString() { return value(); }

  private static class Numeric extends AlphaNumericString {
    long value;
    Numeric(long value) { this.value = value; }
    @Override protected String value() { return Long.toString(value, Character.MAX_RADIX); }
    @Override public int hashCode() { … }
    @Override public boolean equals(Object o) { … }
  }

  private static class Alpha extends AlphaNumericString {
    byte[] value;
    Alpha(byte[] value) { this.value = value; }
    @Override protected String value() { return new String(value, UTF8); }
    @Override public int hashCode() { … }
    @Override public boolean equals(Object o) { … }
  }
}

Page 54: Big data cache-surge-2012

String Compression: Large Strings

Gzip or bzip2: just convert to byte[] first, then compress.

Become the master of your strings!
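A minimal gzip sketch using the JDK's built-in streams (an illustration, not the talk's code):

import java.io.*;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class LargeStringCompression {
  static final Charset UTF8 = Charset.forName("UTF-8");

  // String -> compressed bytes: convert to byte[] first, then gzip.
  static byte[] gzip(String s) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GZIPOutputStream gz = new GZIPOutputStream(bos);
    gz.write(s.getBytes(UTF8));
    gz.close();                      // flushes the gzip trailer
    return bos.toByteArray();
  }

  // Compressed bytes -> String.
  static String gunzip(byte[] b) throws IOException {
    GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(b));
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    for (int n; (n = gz.read(buf)) != -1; ) bos.write(buf, 0, n);
    return new String(bos.toByteArray(), UTF8);
  }
}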

Page 55: Big data cache-surge-2012

JVM Tuning


Page 56: Big data cache-surge-2012

JVM Tuning

•  make sure to use compressed pointers (-XX:+UseCompressedOops)
•  use a low-pause GC (Concurrent Mark Sweep, G1)
•  overprovision the heap by ~30%
•  adjust generation sizes/ratios

Page 57: Big data cache-surge-2012

JVM Tuning (continued)

•  print garbage collection activity
•  if GC pauses are still prohibitive, consider partitioning
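For illustration, one way these suggestions might combine on a 2012-era HotSpot JVM (an assumed flag set, not from the talk; heap sizes are arbitrary):

java -Xms12g -Xmx12g \
     -XX:+UseCompressedOops \
     -XX:+UseConcMarkSweepGC \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -jar service.jar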

Page 58: Big data cache-surge-2012
Page 59: Big data cache-surge-2012

Cache Loading

[diagram: each service's data loader pulls "cooked" datasets from a reliable file store (S3); webapps reach the services and their full caches through the load balancer]

Page 60: Big data cache-surge-2012

•  Final datasets should be compressed and stored durably (e.g. S3)
•  Keep the format simple (CSV, JSON)
•  Poll for updates
•  Poll frequency == data inconsistency threshold
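A minimal polling sketch (an assumption for illustration; listLatestDatasetKey and reloadCache are hypothetical helpers): the poll interval is exactly the staleness you are willing to accept.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DatasetPoller {
  private final ScheduledExecutorService poller =
      Executors.newSingleThreadScheduledExecutor();
  private volatile String loadedKey = "";

  void start() {
    // The 10-minute poll interval IS the data inconsistency threshold.
    poller.scheduleAtFixedRate(new Runnable() {
      public void run() {
        String latest = listLatestDatasetKey();  // hypothetical: newest key in the store
        if (!latest.equals(loadedKey)) {
          reloadCache(latest);                   // hypothetical: fetch, parse, swap in
          loadedKey = latest;
        }
      }
    }, 0, 10, TimeUnit.MINUTES);
  }

  String listLatestDatasetKey() { /* e.g. list S3 keys, return the newest */ return loadedKey; }
  void reloadCache(String key) { /* download, decompress, build, swap */ }
}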

Page 61: Big data cache-surge-2012

Cache Loading Time Sensitivity

Page 62: Big data cache-surge-2012

Cache Loading: Low Time-Sensitivity Data

/tax-rates
  /date=2012-05-01
    tax-rates.2012-05-01.csv.gz
  /date=2012-06-01
    tax-rates.2012-06-01.csv.gz
  /date=2012-07-01
    tax-rates.2012-07-01.csv.gz

Page 63: Big data cache-surge-2012

Cache Loading: Medium/High Time-Sensitivity Data

/prices
  /full
    /date=2012-07-01
      price-obs.2012-07-01.csv.gz
    /date=2012-07-02
      …
  /inc
    /date=2012-07-01
      2012-07-01T00-10-00.csv.gz
      2012-07-01T00-20-00.csv.gz

Page 64: Big data cache-surge-2012

Cache Loading Strategy: Swap

•  Cache is immutable, so no locking is required
•  Works well for infrequently updated data sets
•  And for datasets that need to be refreshed on each update
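A minimal sketch of the swap strategy (an assumption, not the talk's code): readers always see a complete immutable snapshot; the loader builds the next snapshot on the side and publishes it with a single volatile write.

import java.util.Collections;
import java.util.Map;

class SwappableCache<K, V> {
  private volatile Map<K, V> snapshot = Collections.emptyMap();

  public V get(K key) { return snapshot.get(key); }  // no locking on the read path

  public void swap(Map<K, V> freshlyLoaded) {
    snapshot = Collections.unmodifiableMap(freshlyLoaded);  // atomic reference swap
  }
}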

Page 65: Big data cache-surge-2012

•  Deletions can be tricky
•  Avoid full synchronization
•  Use one container per partition

Page 66: Big data cache-surge-2012

Concurrent Locking with a Trove Map

// Requires gnu.trove.map.TLongObjectMap, gnu.trove.map.hash.TLongObjectHashMap,
// and java.util.concurrent.locks.{Lock, ReentrantReadWriteLock}.
public class LongCache<V> {
  private TLongObjectMap<V> map = new TLongObjectHashMap<V>();
  private ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Lock r = lock.readLock(), w = lock.writeLock();

  public V get(long k) {
    r.lock();
    try { return map.get(k); } finally { r.unlock(); }
  }

  public V update(long k, V v) {
    w.lock();
    try { return map.put(k, v); } finally { w.unlock(); }
  }

  public V remove(long k) {
    w.lock();
    try { return map.remove(k); } finally { w.unlock(); }
  }
}

Page 67: Big data cache-surge-2012

•  Keep local copies
•  Periodically generate serialized data/state
•  Validate with CRC or hash

Page 68: Big data cache-surge-2012

Dependent Caches

[diagram: a service instance holds caches A-D with dependencies among them; a status aggregator (servlet) rolls their states up into the health check the load balancer uses]

Page 69: Big data cache-surge-2012

Deployment Cell Status

[diagram: within a deployment cell, per-service status aggregators (service 1, service 2) and the webapp status aggregator report over HTTP or JMX to a cell status aggregator, which backs the load balancer's health check]

Page 70: Big data cache-surge-2012

Hierarchical Status Aggregation

Page 71: Big data cache-surge-2012


Page 72: Big data cache-surge-2012

Two reasons to partition:
•  the data doesn't fit into the heap
•  you want to keep a smaller heap (shorter GC pauses)

Page 73: Big data cache-surge-2012
Page 74: Big data cache-surge-2012

Partitioning Decision Tree

does my data fit in a single VM?
  yes -> don't partition
  no  -> can I partition statically?
           yes -> use fixed partitioning
           no  -> use dynamic partitioning

(difficulty increases down the tree)

Page 75: Big data cache-surge-2012

Fixed Partitions


Page 76: Big data cache-surge-2012

Fixed Partition Assignment

[diagram: a deployment cell behind the load balancer, with partitions p1-p4 statically assigned within each webapp's stack]

Page 77: Big data cache-surge-2012

Dynamic Partitioning


Page 78: Big data cache-surge-2012

Dynamic Partition Assignment

•  Each partition lives on at least 2 nodes: primary/secondary
•  Partition (set) is determined via a simple computation
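One simple computation of the kind the slide mentions (an illustrative assumption, not the talk's formula): hash the key to a fixed number of partitions.

// Map a long key to one of numPartitions buckets; floorMod keeps the
// result non-negative even for negative hash values.
static int partitionOf(long key, int numPartitions) {
  int h = (int) (key ^ (key >>> 32));      // same mixing as Long.hashCode
  return Math.floorMod(h, numPartitions);
}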

Page 79: Big data cache-surge-2012

[diagram: member nodes A-F; a new node E joins]

Member nodes operate with "active state" from the leader node.
The leader is re-elected.

Page 80: Big data cache-surge-2012

[diagram: member nodes A-F and the leader]

Each member node learns its target state from the leader.
All nodes load partitions to achieve their target state, then transition into an active state.

Page 81: Big data cache-surge-2012

Dynamic Partitioning

[diagram: partitions p1-p9 spread across the nodes, each with a primary and a secondary copy; webapps route to them through load balancers]

the extra network hop is back

Page 82: Big data cache-surge-2012
Page 83: Big data cache-surge-2012

Ad-hoc Cache Querying

Page 84: Big data cache-surge-2012

Ad-hoc Querying

Languages

•  Groovy

•  JRuby

•  MVEL (http://mvel.codehaus.org)

•  JoSQL (Java Object SQL, http://josql.sourceforge.net)

Page 85: Big data cache-surge-2012

Organizing Queries

Break up the query:
•  Extractor expressions (like the SQL SELECT clause)
•  Filter (evaluates to a boolean, like the SQL WHERE clause)
•  Sorter (like the SQL ORDER BY clause)
•  Limit (like the SQL LIMIT clause)
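A hypothetical Java decomposition of these four parts (the names are illustrative, not from the talk):

import java.util.Comparator;

// SELECT: pull a projection out of a cached item
interface Extractor<T, R> { R extract(T item); }

// WHERE: evaluates to a boolean
interface Filter<T> { boolean accept(T item); }

// ORDER BY: just a comparator over items
interface Sorter<T> extends Comparator<T> {}

// A query bundles the four parts, including LIMIT
final class Query<T, R> {
  final Filter<T> filter;
  final Sorter<T> sorter;
  final int limit;
  final Extractor<T, R> extractor;
  Query(Filter<T> filter, Sorter<T> sorter, int limit, Extractor<T, R> extractor) {
    this.filter = filter; this.sorter = sorter; this.limit = limit; this.extractor = extractor;
  }
}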

Page 86: Big data cache-surge-2012

Parallel (fan-out) Query Execution

[diagram: prune partitions first; each remaining partition (1..N) runs filter -> sort -> limit -> extract to produce intermediate results; a final sort -> limit over the intermediate results yields the final results]
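To ground the flow above, a minimal fan-out sketch in modern Java (an assumption, not the talk's code): it runs filter -> sort -> limit on every partition in parallel, then merges the intermediates with a final sort -> limit.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FanOutQuery {
  static <T> List<T> run(List<List<T>> partitions, Predicate<T> filter,
                         Comparator<T> sorter, int limit, ExecutorService pool)
      throws InterruptedException, ExecutionException {
    // Run filter -> sort -> limit on every partition in parallel.
    List<Future<List<T>>> intermediates = new ArrayList<>();
    for (List<T> p : partitions) {
      intermediates.add(pool.submit(() ->
          p.stream().filter(filter).sorted(sorter).limit(limit)
           .collect(Collectors.toList())));
    }
    // Gather the intermediate results, then apply the final sort and limit.
    List<T> merged = new ArrayList<>();
    for (Future<List<T>> f : intermediates) merged.addAll(f.get());
    merged.sort(sorter);
    return merged.subList(0, Math.min(limit, merged.size()));
  }
}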

Page 87: Big data cache-surge-2012

Parallel (fan-out) Query: Multi-level Reduction

[diagram: partitions p1-pN feed intermediate reducers r1-r3, which feed a final reducer r4]

Page 88: Big data cache-surge-2012

Optimizing for Network Topology Using Multi-level Reduction

[diagram: partitions grouped by location: location 1 (p1-p6), location 2 (p7-p12), location 3 (p13-p18); reduction runs within each location over the faster/higher-capacity links, and only the reduced results cross the slower/lower-capacity links between locations]

Page 89: Big data cache-surge-2012

JoSQL

Example (from the JoSQL site):

SELECT *
FROM java.io.File
WHERE name $LIKE "%.html"
AND toDate(lastModified)
    BETWEEN toDate('01-12-2004') AND toDate('31-12-2004')

Page 90: Big data cache-surge-2012

JoSQL Example

Find the average number of significant price changes and the average price history, by product category, for a specific seller and time period (the first half of 2012):

select @priceChangeRate, product, @avgPriceHistory
from products
where offers.lowestPrice >= 100
  and offer(offers, 12).priceHistory.startDate <= date('2012-01-01')
  and offer(offers, 12).priceHistory.endDate >= date('2012-06-01')
  and decideRank.rank < 200
group by category(product.primaryCategoryId).name
group by order 1
execute on group_by_results
  sum(getNumberOfDailyPriceChanges(
        trimPriceHistory(offer(offers, 12).priceHistory, '2012-01-01', '2012-06-01'),
        offer(offers, 12).priceHistory.startDate, 15, 5)) / count(1) as priceChangeRate,
  avgPriceHistory(trimPriceHistory(offer(offers, 12).priceHistory,
        '2012-01-01', '2012-06-01')) as avgPriceHistory

Page 91: Big data cache-surge-2012

Total time to execute WHERE clause on all objects: 6010.0 ms
WHERE clause average over 3,393,731 objects: 0.0011 ms
Group column collection and sort: 46.0 ms
Group operation: 1.0 ms
Total time to execute 2 expressions on GROUP_BY_RESULTS objects: 3.0 ms

Pages 92-93: Big data cache-surge-2012

Monitoring

cache readiness
•  rolled-up status

aggregate metrics
•  staleness
•  total counts
•  counts by partition
•  counts by attribute
•  average cardinality of one-to-many relationships

Page 94: Big data cache-surge-2012

The End

http://www.decide.com http://katemats.com

Much of the credit for this talk goes to Decide's Chief Architect, Leon Stein, for developing the technology and creating the content. Thank you, Leon.