Tweaking performance on high-load projects

Architecture of our advertising project.

Transcript of Tweaking performance on high-load projects

Page 1: Tweaking performance on high-load projects

Tweaking performance on high-load projects

Page 2: Tweaking performance on high-load projects

Dmitriy Dumanskiy
Cogniance, mGage project

Java Team Lead

Page 3: Tweaking performance on high-load projects

Project evolution

Ad Project 1 → Ad Project 2 → XXXX

Page 4: Tweaking performance on high-load projects

Ad Project 1 delivery load

3 billion req/month. ~8 c3.xLarge Amazon instances.

Average load: 3000 req/sec
Peak: x10

Page 5: Tweaking performance on high-load projects

Ad Project 2 delivery load

14 billion req/month. ~16 c3.xLarge Amazon instances.

Average load: 6000 req/sec
Peak: x6

Page 6: Tweaking performance on high-load projects

XXXX delivery Load

20 billion req/month. ~14 c3.xLarge Amazon instances.

Average load: 11000 req/sec
Peak: x6

Page 7: Tweaking performance on high-load projects

Is it a lot?

Average load : 11000 req/sec

Page 8: Tweaking performance on high-load projects

Twitter: new tweets

15 billion a month
Average load: 5700 req/sec
Peak: x30

Page 9: Tweaking performance on high-load projects

Delivery load

Project | Requests per month | Max load per instance, req/sec | Requirements | Servers (AWS c3.xLarge)
Ad Project 1 | 3 billion | 300 | HTTP, 95% < 60ms | 8
Ad Project 2 | 14 billion | 400 | HTTP, 95% < 100ms | 16
XXXX | 20 billion | 800 | HTTPS, 99% < 100ms | 14

Page 10: Tweaking performance on high-load projects

Delivery load

c3.xLarge: 4 vCPU, 2.8 GHz Intel Xeon E5-2680
LA: ~2-3

1-2 cores reserved for sudden peaks

Page 11: Tweaking performance on high-load projects

BE tech stacks

Ad Project 1: Spring, iBatis, MySQL, Solr, Vertica, Cascading, Tomcat

Ad Project 2: Spring, Hibernate, Postgres, distributed EhCache, Hadoop, Voldemort, JBoss

XXXX: Spring, Hibernate, MySQL, Solr, Cascading, Redis, Tomcat

Page 12: Tweaking performance on high-load projects

Initial problem

● ~1000 req/sec
● Peaks 6x
● 99% HTTPS with response time < 100ms

Page 13: Tweaking performance on high-load projects

Real problem

● ~85 mln active users, ~115 mln registered users
● 11.5 messages per user per day
● ~11000 req/sec
● Peaks 6x
● 99% HTTPS with response time < 100ms
● Reliable and scalable for future growth up to 80k req/sec

Page 14: Tweaking performance on high-load projects

Architecture

AdServer
Console (UI)
Reporting

Page 15: Tweaking performance on high-load projects

Architecture

Console (UI)

MySql

SOLR Master

SOLR Slave | SOLR Slave | SOLR Slave

Page 16: Tweaking performance on high-load projects

SOLR? Why?

● Pros:
○ Quick search on complex queries
○ Has a lot of built-in features (master-slave replication, RDBMS integration)

● Cons:
○ Only HTTP, embedded performs worse
○ Not easy for beginners
○ Max load is ~100 req/sec per instance

Page 17: Tweaking performance on high-load projects

“Simple” query

"-(-connectionTypes:" + "\"" + getConnectionType() + "\"" + " AND connectionTypes:[* TO *]) AND "
+ "-connectionTypeExcludes:" + "\"" + getConnectionType() + "\"" + " AND "
+ "-(-OSes:" + "(\"" + osQuery + "\" OR \"" + getOS() + "\")" + " AND OSes:[* TO *]) AND "
+ "-osExcludes:" + "(\"" + osQuery + "\" OR \"" + getOS() + "\")"
+ " AND (runOfNetwork:T OR appIncludes:" + getAppId() + " OR pubIncludes:" + getPubId() + " OR categories:(" + categoryList + "))"
+ " AND -appExcludes:" + getAppId() + " AND -pubExcludes:" + getPubId()
+ " AND -categoryExcludes:(" + categoryList + ") AND "
+ keywordQuery + " AND "
+ "-(-devices:" + "\"" + getHandsetNormalized() + "\"" + " AND devices:[* TO *]) AND "
+ "-deviceExcludes:" + "\"" + getHandsetNormalized() + "\"" + " AND "
+ "-(-carriers:" + "\"" + getCarrier() + "\"" + " AND carriers:[* TO *]) AND "
+ "-carrierExcludes:" + "\"" + getCarrier() + "\"" + " AND "
+ "-(-locales:" + "(\"" + locale + "\" OR \"" + langOnly + "\")" + " AND locales:[* TO *]) AND "
+ "-localeExcludes:" + "(\"" + locale + "\" OR \"" + langOnly + "\") AND "
+ "-(-segments:(" + segmentQuery + ") AND segments:[* TO *]) AND "
+ "-segmentExcludes:(" + segmentQuery + ")"
+ " AND -(-geos:" + geoQuery + " AND geos:[* TO *]) AND "
+ "-geosExcludes:" + geoQuery

Page 18: Tweaking performance on high-load projects

Architecture

MySQL → SOLR Master → 3x (SOLR Slave + AdServer) → No-SQL

Page 19: Tweaking performance on high-load projects

AdServer - Solr Slave

Delivery:
volatile DeliveryData cache;

Cron job:
DeliveryData tempCache = loadData();
cache = tempCache;
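This single-volatile-swap is the whole trick, so here is a minimal sketch of the pattern; DeliveryData and loadData() stand in for the project's real types:

class DeliveryCache {
    // volatile: delivery threads always see a fully built snapshot
    private volatile DeliveryData cache;

    // hot path: one volatile read per request, no locks
    DeliveryData current() {
        return cache;
    }

    // cron job: build the replacement off to the side, then publish it atomically
    void refresh() {
        DeliveryData tempCache = loadData();
        cache = tempCache;
    }

    private DeliveryData loadData() {
        return new DeliveryData(); // placeholder: pull fresh data from the SOLR slave
    }
}

class DeliveryData { }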

Page 20: Tweaking performance on high-load projects

Why no-sql?

● Realtime data
● Quick response time
● Simple queries by key
● 1-2 queries to no-sql on every request. Average load 10-20k req/sec and >120k req/sec in peaks.
● Cheap solution

Page 21: Tweaking performance on high-load projects

Why Redis? Pros

● Easy and lightweight
● Low latency and response time. 99% is < 1ms
● Average latency is ~0.2ms
● Up to 100k 'get' commands per second on c1.xLarge
● Cool features (atomic increments, sets, hashes)
● Ready AWS service — ElastiCache

Page 22: Tweaking performance on high-load projects

Why Redis? Cons

● Single-threaded out of the box
● To utilize all cores: sharding/clustering
● Scaling/failover is not easy
● Limited by max instance memory (240GB is the largest AWS instance)
● Persistence/swapping may delay responses
● Cluster solution is not production-ready

Page 23: Tweaking performance on high-load projects

DynamoDB vs Redis

Option | Price per month | Put, 95% | Get, 95% | Req/sec
DynamoDB | $58 | 300ms | 150ms | 50
DynamoDB | $580 | 60ms | 8ms | 780
DynamoDB | $5800 | 16ms | 8ms | 1250
Redis (c1.medium) | $200 | 3ms | <1ms | 4000
ElastiCache (c1.xlarge) | $600 | <1ms | <1ms | 10000

Page 24: Tweaking performance on high-load projects

What about others?

● Cassandra
● Voldemort
● Memcached
● MongoDB

Page 25: Tweaking performance on high-load projects

Redis RAM problem

● 1 user entry ~ from 80 bytes to 3kb
● ~85 mln users
● Required RAM ~ from 1 GB to 300 GB

Page 26: Tweaking performance on high-load projects

Data compression speed

Page 27: Tweaking performance on high-load projects

Data compression size

Page 28: Tweaking performance on high-load projects

Data compression

JSON → Kryo binary → 4x less data → gzipping → 2x less data == 8x less data

Now we need < 40 GB

+ Less load on network stack
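A minimal sketch of that pipeline, assuming Kryo 2.x-era APIs and a hypothetical UserEntry type (the real fields and serialization setup are project-specific):

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class UserEntry { long id; String country; } // hypothetical user record

class UserCodec {
    private final Kryo kryo = new Kryo();

    UserCodec() {
        kryo.register(UserEntry.class); // explicit registration: smaller output
    }

    byte[] pack(UserEntry user) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Kryo binary instead of JSON (~4x smaller), gzip on top (~2x more)
        Output output = new Output(new GZIPOutputStream(bytes));
        kryo.writeObject(output, user);
        output.close(); // flushes and finishes the gzip stream
        return bytes.toByteArray();
    }

    UserEntry unpack(byte[] blob) throws IOException {
        Input input = new Input(new GZIPInputStream(new ByteArrayInputStream(blob)));
        try {
            return kryo.readObject(input, UserEntry.class);
        } finally {
            input.close();
        }
    }
}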

Page 29: Tweaking performance on high-load projects

AdServer BE

Average response time: ~1.2 ms
Load: 800 req/sec with LA ~4

c3.xLarge == 4 vCPU

Page 30: Tweaking performance on high-load projects

AdServer BE

● Logging — 12% of time (5% on SSD)
● Response generation — 15% of time
● Redis request — 50% of time
● All business logic — 23% of time

Page 31: Tweaking performance on high-load projects

AdServer BE

800 req/sec is cool! But 77% of time is spent on network (Redis), logging and response sending: IO operations.

Page 32: Tweaking performance on high-load projects

Reporting

AdServer → Hadoop ETL → MySQL → Console

Page 33: Tweaking performance on high-load projects

Log structure:

{
  "uid": "test",
  "platform": "android",
  "app": "xxx",
  "ts": 1375952275223,
  "pid": 1,
  "education": "Some-Highschool-or-less",
  "type": "new",
  "sh": 1280,
  "appver": "6.4.34",
  "country": "AU",
  "time": "Sat, 03 August 2013 10:30:39 +0200",
  "deviceGroup": 7,
  "rid": "fc389d966438478e9554ed15d27713f51",
  "responseCode": 200,
  "event": "ad",
  "device": "9910",
  "sw": 768,
  "ageGroup": "18-24",
  "preferences": ["beer","girls"]
}

Page 34: Tweaking performance on high-load projects

Log structure

● 1 mln records == 0.6 GB
● ~900 mln records a day == ~0.55 TB
● 1 month is up to 20 TB of data
● Zipped data is 10 times smaller

Page 35: Tweaking performance on high-load projects

Reporting

Customer: “And we need fancy reporting.”

But 20 TB of data per month is huge. So what can we do?

Page 36: Tweaking performance on high-load projects

Reporting

Dimensions: device, country, region, city, carrier, advertisingId, preferences, gender, age, income, etc.

Use case: I want to know how many users saw my ad in San Francisco.

Page 37: Tweaking performance on high-load projects

Reporting

Geo table: Country, City, Region, CampaignId, Date, counters

Device table: Device, Carrier, Platform, CampaignId, Date, counters

Uniques table: CampaignId, UID

Page 38: Tweaking performance on high-load projects

Of course - Hadoop

Predefined report types → aggregation by predefined dimensions → 500-1000 times less data

20 TB per month → 40 GB per month

Page 39: Tweaking performance on high-load projects

Of course - Hadoop

● Pros:
○ Unlimited (well, it depends) horizontal scaling

● Cons:
○ Not real-time
○ Processing time directly depends on code quality and infrastructure cost
○ Not all input can be scaled
○ Cluster startup is so... long

Page 40: Tweaking performance on high-load projects

Alternatives?

● Storm
● Redshift
● Vertica
● Spark

Page 41: Tweaking performance on high-load projects

Elastic MapReduce

● Easy to set up
● Easy to extend
● Easy to monitor

Page 42: Tweaking performance on high-load projects

Timing

● Hadoop (Cascading):
○ 25 GB in peak hour takes ~40 min (-10 min). CSV output 300MB. With a cluster of 4 c3.xLarge.

● MySQL:
○ Put 300MB into DB with insert statements: ~40 min.

Page 43: Tweaking performance on high-load projects

Timing

● Hadoop (Cascading):
○ 25 GB in peak hour takes ~40 min (-10 min). CSV output 300MB. With a cluster of 4 c3.xLarge.

● MySQL:
○ Put 300MB into DB with insert statements: ~40 min.

● MySQL:
○ Put 300MB into DB with optimizations: ~5 min.

Page 44: Tweaking performance on high-load projects

Optimizations

● No “insert into”. Only “load data”: ~10 times faster (sketch below)
● “ENGINE=MyISAM” vs “INNODB” when possible: ~5 times faster
● For “upsert”: temp table with “ENGINE=MEMORY”: IO savings
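A sketch of the “load data” path over JDBC; the table and file names are made up, and the real schema is project-specific:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

class ReportLoader {
    void load() throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/reports?allowLoadLocalInfile=true", "user", "pass");
             Statement st = conn.createStatement()) {
            // one bulk LOAD DATA instead of millions of INSERT statements: ~10x faster
            st.execute("LOAD DATA LOCAL INFILE '/tmp/geo_report.csv' " +
                    "INTO TABLE geo_report " +
                    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
        }
    }
}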

Page 45: Tweaking performance on high-load projects

Why Cascading?

Hadoop Job 1 → Hadoop Job 2 → Hadoop Job 3

The result of one job should be processed by another job.

Page 46: Tweaking performance on high-load projects

Lessons Learned

Page 47: Tweaking performance on high-load projects

Cost of IO

L1 cache: 3 cycles
L2 cache: 14 cycles
RAM: 250 cycles
Disk: 41,000,000 cycles
Network: 240,000,000 cycles

Page 48: Tweaking performance on high-load projects

Cost of IO

@Cacheable is everywhere

Page 49: Tweaking performance on high-load projects

Hadoop

Map input: 300 MB
Map output: 80 GB

Page 50: Tweaking performance on high-load projects

Hadoop

● mapreduce.map.output.compress = true (see the sketch below)
● codecs: GZip, BZ2 - CPU intensive
● codecs: LZO, Snappy
● codecs: JNI

~x10
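Roughly how the setting is applied in job code, with Snappy chosen here as the CPU-cheap option (the job name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

class JobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // compress intermediate map output before it hits disk and network
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.set("mapreduce.map.output.compress.codec",
                "org.apache.hadoop.io.compress.SnappyCodec");
        Job job = Job.getInstance(conf, "report-aggregation");
        // ... set mapper/reducer/paths as usual
    }
}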

Page 51: Tweaking performance on high-load projects

Hadoop

Consider Combiner
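The one-liner that enables it, reusing the Job from the previous sketch; a combiner pre-aggregates map output locally before the shuffle, and reusing the reducer (the stock IntSumReducer here) is valid only for associative, commutative operations such as summing counters:

import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before the shuffle
job.setReducerClass(IntSumReducer.class);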

Page 52: Tweaking performance on high-load projects

Hadoop

Text, IntWritable, BytesWritable, NullWritable, etc.

Simpler is better

Page 53: Tweaking performance on high-load projects

Hadoop

Missing data:

map(T value, ...) {
    Log log = parse(value);
    Data data = dbWrapper.getSomeMissingData(log.getCampId());
}

Page 54: Tweaking performance on high-load projects

Hadoop

Missing data:

map(T value, ...) {
    Log log = parse(value);
    Data data = dbWrapper.getSomeMissingData(log.getCampId());
}

Wrong
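Wrong because it makes one DB round-trip per record, which multiplies network IO by millions. A sketch of the usual fix, loading the lookup data once per mapper in setup(); Log, Data and DbWrapper are placeholders for the project's own types:

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class LogMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    private final Map<Long, Data> campaigns = new HashMap<Long, Data>();

    @Override
    protected void setup(Context context) {
        // one DB round-trip per mapper instead of one per log line
        campaigns.putAll(new DbWrapper().loadAllCampaigns());
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        Log log = Log.parse(value.toString());
        Data data = campaigns.get(log.getCampId()); // in-memory lookup, no network
    }
}

class Log {
    static Log parse(String line) { return new Log(); } // placeholder parser
    long getCampId() { return 1L; }
}
class Data { }
class DbWrapper {
    Map<Long, Data> loadAllCampaigns() { return new HashMap<Long, Data>(); }
}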

Page 55: Tweaking performance on high-load projects

Hadoop

Unnecessary data:

map(T value, ...) {
    Log log = parse(value);
    Key resultKey = makeKey(log.getCampName(), ...);
    output.collect(resultKey, resultValue);
}

Page 56: Tweaking performance on high-load projects

Hadoop

Unnecessary data:

map(T value, ...) {
    Log log = parse(value);
    Key resultKey = makeKey(log.getCampName(), ...);
    output.collect(resultKey, resultValue);
}

Wrong

Page 57: Tweaking performance on high-load projects

Hadoop

Unnecessary data:

RecordWriter.write(K key, V value) {
    Entity entity = makeEntity(key, value);
    dbWrapper.save(entity);
}

Page 58: Tweaking performance on high-load projects

Hadoop

Unnecessary data:

RecordWriter.write(K key, V value) {
    Entity entity = makeEntity(key, value);
    dbWrapper.save(entity);
}

Wrong
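Wrong because one dbWrapper.save() per record means one statement (or round-trip) per record. A sketch of a batched variant; Entity, BatchDao and the batch size are assumptions:

import java.util.ArrayList;
import java.util.List;

class BatchingWriter<K, V> {
    private static final int BATCH_SIZE = 1000; // tune to your DB
    private final List<Entity> buffer = new ArrayList<Entity>();
    private final BatchDao dao = new BatchDao();

    void write(K key, V value) {
        buffer.add(makeEntity(key, value));
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    void close() {
        flush(); // don't lose the tail of the last batch
    }

    private void flush() {
        dao.saveAll(buffer); // one bulk insert per batch, not one per record
        buffer.clear();
    }

    private Entity makeEntity(K key, V value) { return new Entity(); } // placeholder
}

class Entity { }
class BatchDao {
    void saveAll(List<Entity> batch) { /* bulk insert, e.g. multi-row INSERT or LOAD DATA */ }
}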

Page 59: Tweaking performance on high-load projects

Hadoop

public boolean equals(Object obj) {
    EqualsBuilder equalsBuilder = new EqualsBuilder();
    equalsBuilder.append(id, otherKey.getId());
    ...
}

public int hashCode() {
    HashCodeBuilder hashCodeBuilder = new HashCodeBuilder();
    hashCodeBuilder.append(id);
    ...
}

Page 60: Tweaking performance on high-load projects

Hadoop

public boolean equals(Object obj) {
    EqualsBuilder equalsBuilder = new EqualsBuilder();
    equalsBuilder.append(id, otherKey.getId());
    ...
}

public int hashCode() {
    HashCodeBuilder hashCodeBuilder = new HashCodeBuilder();
    hashCodeBuilder.append(id);
    ...
}

Wrong
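Wrong because each call allocates a builder object, and Hadoop calls equals()/hashCode() on keys millions of times while sorting. A garbage-free hand-written version; CompositeKey is a stand-in for the real key class:

class CompositeKey {
    private long id;

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof CompositeKey)) return false;
        return id == ((CompositeKey) obj).id; // plain field checks, zero allocation
    }

    @Override
    public int hashCode() {
        return (int) (id ^ (id >>> 32)); // same recipe as Long.hashCode
    }
}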

Page 61: Tweaking performance on high-load projects

Hadoop

public void map(...) {
    ...
    for (String word : words) {
        output.collect(new Text(word), new IntVal(1));
    }
}

Page 62: Tweaking performance on high-load projects

Hadoop

class MyMapper extends Mapper {
    // reuse Writable instances instead of allocating one per record
    Text wordText = new Text();
    IntVal one = new IntVal(1);

    public void map(...) {
        for (String word : words) {
            wordText.set(word);
            output.collect(wordText, one);
        }
    }
}

Page 63: Tweaking performance on high-load projects

Facts

● HTTP is 2x faster than HTTPS
● HTTPS keep-alive: +80% performance
● Java 7 is 40% faster than Java 6 (in our case)
● All IO operations minimized

Page 64: Tweaking performance on high-load projects

Java 7. Random

return items.get(new Random().nextInt(items.size()))

Page 65: Tweaking performance on high-load projects

Java 7. Random

● private static final Random rand - no new instances
● ThreadLocal<Random> rand - much faster than the above
● ThreadLocalRandom.current() - ~3x faster than the above (sketch below)
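For the list-sampling example above, the Java 7 variant would look like this (randomItem is a hypothetical helper):

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

class RandomPick {
    // no Random allocation per call, no contention on a shared seed
    static <T> T randomItem(List<T> items) {
        return items.get(ThreadLocalRandom.current().nextInt(items.size()));
    }
}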

Page 66: Tweaking performance on high-load projects

Java 7. Less garbage

new ArrayList():
    this.elementData = {};
instead of
    this.elementData = new Object[10];

new HashMap():
    Entry<K,V>[] table = {};
instead of
    this.table = new Entry[16];

Page 67: Tweaking performance on high-load projects

Java 7. Less garbage

Before:
class String {
    int offset;
    int count;
    char value[];
    int hash;
}

After:
class String {
    char value[];
    int hash;
}

Page 68: Tweaking performance on high-load projects

Java 7. HashMap<String>

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
}

Page 69: Tweaking performance on high-load projects

Java 7. String

● Substring
● Split

Page 70: Tweaking performance on high-load projects

Avoid concurrency

volatile is the simplest way

Page 71: Tweaking performance on high-load projects

Avoid concurrency

JedisPool.getResource() - sync
JedisPool.returnResource() - sync
OutputStreamWriter.write() - sync

Page 72: Tweaking performance on high-load projects

Avoid concurrency

JedisPool.getResource()

JedisPool.returnResource()

replace with

ThreadLocal<JedisConnection>

Page 73: Tweaking performance on high-load projects

Avoid concurrency

ThreadLocal<JedisConnection> requires ~1000 open connections to Redis.

More connections — slower Redis response. Dead end.

Page 74: Tweaking performance on high-load projects

Avoid concurrency

OutputStreamWriter.write()

● No flush() on every request, and a big buffered writer
● Async writer

No guarantee against data loss. Dead end.

Page 75: Tweaking performance on high-load projects

Avoid concurrency

OutputStreamWriter.write()

Or buy an SSD =)
+30-60% on disk IO

Page 76: Tweaking performance on high-load projects

Use latest versions

Jedis 2.2.3 uses commons-pool 1.6
Jedis 2.3 uses commons-pool 2.0

commons-pool 2.0 is 2 times faster

Page 77: Tweaking performance on high-load projects

Small tweaks. Date

new Date() vs

System.currentTimeMillis()

Page 78: Tweaking performance on high-load projects

Small tweaks. SimpleDateFormat

return new SimpleDateFormat("MMM yyyy HH:mm:ss Z").parse(dateString)

~0.5 kb of garbage per call

Page 79: Tweaking performance on high-load projects

Small tweaks. SimpleDateFormat

● ThreadLocal (sketch below)
● Joda-Time - thread-safe DateTimeFormat
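A sketch of the ThreadLocal option, in Java 7 style since the talk predates ThreadLocal.withInitial:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class DateParser {
    // one SimpleDateFormat per thread: no per-call allocation,
    // and no sharing of the non-thread-safe formatter
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    return new SimpleDateFormat("MMM yyyy HH:mm:ss Z");
                }
            };

    static Date parse(String dateString) throws ParseException {
        return FORMAT.get().parse(dateString);
    }
}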

Page 80: Tweaking performance on high-load projects

Small tweaks. Pattern

public boolean isValid(String ip) {
    Pattern pattern = Pattern.compile("xxx");
    Matcher matcher = pattern.matcher(ip);
    return matcher.matches();
}

Page 81: Tweaking performance on high-load projects

Small tweaks. Pattern

final Pattern pattern = Pattern.compile("xxx");
// reusing a Matcher is only safe within a single thread
final Matcher matcher = pattern.matcher("");

public boolean isValid(String ip) {
    matcher.reset(ip);
    return matcher.matches();
}

Page 82: Tweaking performance on high-load projects

Small tweaks. String.split

item.getPreferences().split("[_,;,-]");

Page 83: Tweaking performance on high-load projects

Small tweaks. String.split

item.getPreferences().split("[_,;,-]");

vs

static final Pattern PATTERN = Pattern.compile("[_,;,-]");
PATTERN.split(item.getPreferences()); // ~10x faster

Page 84: Tweaking performance on high-load projects

Small tweaks. FOR loop

for (A a : arrayListA) {
    // do something
    for (B b : arrayListB) {
        // do something
        for (C c : arrayListC) {
            // do something
        }
    }
}
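The chart from the next slide is not in the transcript, but the usual point of this example is that for-each over an ArrayList allocates an Iterator per loop. An index-based sketch that avoids those allocations, assuming RandomAccess lists:

for (int i = 0, aSize = arrayListA.size(); i < aSize; i++) {
    A a = arrayListA.get(i);
    for (int j = 0, bSize = arrayListB.size(); j < bSize; j++) {
        B b = arrayListB.get(j);
        for (int k = 0, cSize = arrayListC.size(); k < cSize; k++) {
            C c = arrayListC.get(k); // no Iterator garbage in the hot path
        }
    }
}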

Page 85: Tweaking performance on high-load projects

Small tweaks. FOR loop

Page 86: Tweaking performance on high-load projects

Small tweaks. Primitives

double coord = Double.valueOf(textLine);
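Double.valueOf allocates a boxed Double that is immediately unboxed; if the goal is the primitive, parseDouble avoids the garbage:

double coord = Double.parseDouble(textLine); // primitive straight away, no boxing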

Page 87: Tweaking performance on high-load projects

Network

Per 1 AdServer instance:
Incoming traffic: ~100 Mb/sec
Outgoing traffic: ~50 Mb/sec

LB total traffic: almost 10 Gb/sec

Page 88: Tweaking performance on high-load projects

Amazon

Page 89: Tweaking performance on high-load projects

AWS ElastiCache

SLOWLOG GET
1) 1) (integer) 35
   2) (integer) 1391709950
   3) (integer) 34155
   4) 1) "GET"
      2) "2ads10percent_rmywqesssitmfksetzvj"
2) 1) (integer) 34
   2) (integer) 1391709830
   3) (integer) 34863
   4) 1) "GET"
      2) "2ads10percent_tteeoomiimcgdzcocuqs"

Page 90: Tweaking performance on high-load projects

AWS ElastiCache

35ms for a GET? WTF?
Even Java is faster

Page 91: Tweaking performance on high-load projects

AWS ElastiCache

● Strange timeouts (with SO_TIMEOUT 50ms)
● No replication to another cluster
● «Cluster» is not a cluster
● Cluster uses regular instances, so you pay for 4 cores while using 1

Page 92: Tweaking performance on high-load projects

AWS Limits. You never know where

● Network limit
● PPS rate limit
● LB limit
● Cluster start time up to 20 mins
● Scalability limits
● S3 is slow for many files

Page 93: Tweaking performance on high-load projects

Redis Hashes

UID : adId : freqNum : created : excluded

private long adId;
private short freqNum;
private int created;
private boolean excluded;

Page 94: Tweaking performance on high-load projects

Redis Hashes

incrBy UID:freqNum 10
instead of
get UID:freqNum, incrBy 10, set UID:freqNum

Page 95: Tweaking performance on high-load projects

Redis Hashes

incrBy UID:freqNum 10
instead of
get UID:freqNum, incrBy 10, set UID:freqNum
BUT

Page 96: Tweaking performance on high-load projects

Redis Hashes

incrBy UID:freqNum 10
instead of
get UID:freqNum, incrBy 10, set UID:freqNum
BUT

hGetAll UID = O(N), where N is the number of fields

Page 97: Tweaking performance on high-load projects

Redis Hashes

set UID {"adId":1, "freqNum":"1", "created":"134834848", "excluded":"true"}

much cheaper
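A sketch of that final layout with Jedis; the payload is copied from the slide, while in practice the value would be the Kryo+gzip blob described earlier:

import redis.clients.jedis.Jedis;

class UserStore {
    void example() {
        Jedis jedis = new Jedis("localhost");
        try {
            // one blob per user: plain O(1) SET/GET instead of O(N) hGetAll
            jedis.set("UID", "{\"adId\":1, \"freqNum\":\"1\", \"created\":\"134834848\", \"excluded\":\"true\"}");
            String blob = jedis.get("UID");
        } finally {
            jedis.close();
        }
    }
}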