Caching for Sustainability

Caching for Sustainability Alex Bunch


Transcript of Caching for Sustainability

Page 1: Caching for Sustainability

Caching for Sustainability

Alex Bunch

Page 2: Caching for Sustainability

Agenda

• Intro
• Overview
• Background
• Analysis
• Implementation
• Future

Page 3: Caching for Sustainability

Intro

• Caching is a systems technique to use relatively expensive hardware with special features
• On-chip SRAM is fast but costs more than memory
• Memory is faster than disk but…
• Web caching services (like Akamai) have low network latency to end users but can’t scale like datacenters

How it works: Caching relies on evidence that some pieces of data are more likely to be accessed than others.

Page 4: Caching for Sustainability

Intro

Methods for determining likelihood of access

Spatial Locality: Data near data that has just been accessed is likely to be accessed.

Temporal Locality: Data that has just been accessed is likely to be accessed again.

Page 5: Caching for Sustainability

10000 ft. view

The principal idea behind this research is that green hosts are a new type of hardware with special features.

These hosts either run their service entirely on renewable sources or supplement it by purchasing enough renewable energy credits to offset any dirty energy used.

Page 6: Caching for Sustainability

10000 ft. view

The idea behind Greenmail is that it acts as a cache for emails that are likely to be accessed. Because it is a zero-carbon service, the user’s overall carbon footprint goes down.

Page 7: Caching for Sustainability

10000 ft. view

Page 8: Caching for Sustainability

10000 ft. view

Page 9: Caching for Sustainability

Background

• On green trends
• On green hosting
• On Greenmail locality

Page 10: Caching for Sustainability

Background

One of the fundamental ideas that Greenmail is based on is that people want their services to be green.

This idea is supported by the fact that the customer base for green hosts increased 60% a year from 2002 to 2008 [1].

Page 11: Caching for Sustainability

Background

Beyond simple customer interest, green products need to be competitively priced: 83% of consumers would rather use a green service if it did not cost more than its dirty alternative [2].

Green hosting is becoming significantly more widespread and, in turn, more price-competitive with hosting that runs on dirty energy.

Page 12: Caching for Sustainability

Background

• Green hosts are internet hosting companies that perform ‘green’ actions for their users to offset any carbon emitted by their datacenter, whether through the direct use of renewable energy, planting trees, or buying offsets.

Page 13: Caching for Sustainability

Background

Stating that email exhibits temporal and/or spatial locality is a lofty claim, but intuition argues that a user who accesses an important email will eventually reference it again.

Our hope is that these claims are validated by the data.

Page 14: Caching for Sustainability

Analysis

A classic equation for caches gives the Average Memory Access Time (AMAT):

AMAT = Ht + r*Mt

Where Ht is the cache hit time, r is the miss rate, and Mt is the miss penalty
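As a minimal sketch, the AMAT formula above can be written directly in Python. The function name and the sample numbers are illustrative assumptions, not values from the talk.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time: AMAT = Ht + r * Mt."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Illustrative numbers only: 1 ns hit time, 5% miss rate, 100 ns miss penalty.
print(amat(1.0, 0.05, 100.0))
```

Lowering the miss rate or the miss penalty lowers the average; the hit time is paid on every access.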

Page 15: Caching for Sustainability

Analysis

Beyond serving as a useful high-level analogy, AMAT has a direct counterpart in Greenmail, the Average Carbon Footprint (ACFP):

ACFP = Hc + r*Mc

Where Hc is the carbon associated with a cache hit, r is the miss rate, and Mc is the carbon miss penalty.

Page 16: Caching for Sustainability

Analysis

• Because Greenmail is carbon neutral, Hc is 0. Since Mc is determined by the original email provider, the miss rate (r) is the only element of this equation that we can attempt to minimize, subject to our constraints.
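A small sketch of this argument, assuming carbon is measured in some fixed unit per email access (the unit and the sample values are hypothetical). With Hc fixed at 0, ACFP reduces to r*Mc, so the footprint scales linearly with the miss rate.

```python
def acfp(hit_carbon: float, miss_rate: float, miss_carbon: float) -> float:
    """Average carbon footprint per access: ACFP = Hc + r * Mc."""
    return hit_carbon + miss_rate * miss_carbon


def greenmail_acfp(miss_rate: float, miss_carbon: float) -> float:
    """ACFP for a carbon-neutral cache, where Hc is 0."""
    return acfp(0.0, miss_rate, miss_carbon)


# Hypothetical provider cost of 4.0 carbon units per miss.
for rate in (1.0, 0.5, 0.25):
    print(rate, greenmail_acfp(rate, 4.0))
```

Halving the miss rate halves the average footprint, which is why the rest of the analysis concentrates on r.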

Page 17: Caching for Sustainability

Constraints

As with classic caches, the miss rate is based partially on the size of the cache and the algorithm used to replace data.

While the algorithm can be modified depending on experimental data, the cache size has a cap.
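To illustrate how the miss rate depends on both cache size and replacement algorithm, here is a sketch that replays an access trace through a least-recently-used (LRU) cache. LRU is an assumption chosen for illustration; the talk does not name the replacement algorithm Greenmail uses, and the trace below is made up.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def lru_miss_rate(trace: Sequence[Hashable], capacity: int) -> float:
    """Replay a trace through an LRU cache and return the fraction of misses."""
    cache: "OrderedDict[Hashable, bool]" = OrderedDict()
    misses = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
        else:
            misses += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return misses / len(trace) if trace else 0.0


# Made-up trace of email IDs; "a" is re-read often (temporal locality).
trace = ["a", "b", "a", "c", "a", "b", "d", "a"]
for size in (2, 3):
    print(size, lru_miss_rate(trace, size))
```

On this trace the larger cache misses less often, which is the trade-off the size cap constrains.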

Page 18: Caching for Sustainability

Constraints

Our cache size cap is self-imposed to keep Greenmail economically sound: the cost of maintaining a user’s cache should not exceed what the original email provider spends storing all of that user’s data.

Page 19: Caching for Sustainability

Constraints

The reason that this makes our cache smaller is that email providers have two elements working to reduce their energy costs:

• Dirty energy – costs less than green energy.
• Economy of scale – more users translates into spending less per user.

Page 20: Caching for Sustainability

Constraints example

Email Host A uses dirty power that costs half as much as green power, and due to the number of users it has, it is able to purchase hardware at 75% of the price Greenmail pays. To stay within the host’s per-user cost, Greenmail can hold at most 37.5% (0.5 × 0.75) of the emails that the host does.
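The arithmetic in this example can be sketched as follows. It follows the slide’s simple model, in which the two cost advantages multiply; the function and parameter names are hypothetical.

```python
def max_cache_fraction(power_cost_ratio: float, hardware_cost_ratio: float) -> float:
    """Largest share of the host's stored email Greenmail can afford to cache.

    Each ratio is the host's cost divided by Greenmail's cost for the same
    resource. The multiplicative model is the slide's simplification.
    """
    return power_cost_ratio * hardware_cost_ratio


# Host A: power at half of Greenmail's price, hardware at 75% of it.
print(max_cache_fraction(0.5, 0.75))
```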

Page 21: Caching for Sustainability

Implementation

Our implementation of Greenmail is based on a modified version of SquirrelMail, a free, open-source, web-based email application that has access to an IMAP proxy server.

Page 22: Caching for Sustainability

Implementation

Cache functionality comes from modifying the SquirrelMail IMAP functions.

A single IMAP session consists of many messages being sent between the user running SquirrelMail and the initial email provider, but only a few of them are worth caching.

Page 23: Caching for Sustainability

Implementation

Only two of these messages are ‘worth’ caching, because most of the others are just a few lines long:

• ‘Get Headers’ – Returns a list of all the email subjects in the relevant mailbox/search
• ‘Get Body’ – Returns the body of the email requested

Page 24: Caching for Sustainability

Implementation

‘Get Body’ – An encrypted local copy is made the first time this is called for an email, and any subsequent calls retrieve the local copy.

‘Get Headers’ – In theory this should be easy to cache, but the response has a timestamp baked into it that is used for error checking.
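The ‘Get Body’ behavior can be sketched in Python as a read-through cache. This is not the SquirrelMail modification itself (SquirrelMail is written in PHP). The class, the callables passed in, and the in-memory dictionary standing in for the local copy are all hypothetical, and the placeholder transform below is not real encryption.

```python
from typing import Callable, Dict


class BodyCache:
    """Read-through cache: fetch a body once, then serve the stored copy."""

    def __init__(
        self,
        fetch_body: Callable[[str], bytes],
        encrypt: Callable[[bytes], bytes],
        decrypt: Callable[[bytes], bytes],
    ) -> None:
        self._fetch_body = fetch_body
        self._encrypt = encrypt
        self._decrypt = decrypt
        self._store: Dict[str, bytes] = {}  # stands in for local storage
        self.hits = 0
        self.misses = 0

    def get_body(self, message_id: str) -> bytes:
        stored = self._store.get(message_id)
        if stored is not None:
            self.hits += 1
            return self._decrypt(stored)
        self.misses += 1
        body = self._fetch_body(message_id)  # goes to the original provider
        self._store[message_id] = self._encrypt(body)
        return body


def placeholder_transform(data: bytes) -> bytes:
    # Reversible stand-in so the sketch runs; NOT real encryption.
    return bytes(b ^ 0x5A for b in data)


provider_calls = []


def fake_fetch(message_id: str) -> bytes:
    # Hypothetical provider call; records each request that reaches it.
    provider_calls.append(message_id)
    return f"body of {message_id}".encode()


cache = BodyCache(fake_fetch, placeholder_transform, placeholder_transform)
first = cache.get_body("msg-1")
second = cache.get_body("msg-1")
print(first == second, cache.hits, cache.misses, len(provider_calls))
```

Only the first request reaches the provider; the second is served from the stored copy, which is the case that carries no carbon cost in the ACFP model.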

Page 25: Caching for Sustainability

Implementation

In addition to the modifications made to SquirrelMail, additional scripts were written so that users can quickly and easily set up their own cache.

Separate directories are made for each user due to how SquirrelMail stores IMAP configurations.

Page 26: Caching for Sustainability

Results

We are currently collecting data from real users, as there is no standard test suite or benchmark that models users accessing emails.

In the future, if a good user ‘profile’ is found (x% spam, y% accessed frequently, etc.), it will be possible to automate this.
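As a rough sketch of what such automation might look like, the function below generates a synthetic access trace from a profile. Every parameter and the sample profile are hypothetical; no measured user profile is given in the talk. The model assumes spam is never opened and that a small ‘hot’ set receives most accesses.

```python
import random
from typing import List


def synthetic_trace(
    num_emails: int,
    num_accesses: int,
    spam_fraction: float,
    hot_fraction: float,
    hot_access_prob: float,
    seed: int = 0,
) -> List[int]:
    """Generate email IDs accessed under a simple, assumed user profile."""
    rng = random.Random(seed)
    num_spam = int(num_emails * spam_fraction)
    num_hot = int(num_emails * hot_fraction)
    hot = list(range(num_spam, num_spam + num_hot))    # frequently accessed
    cold = list(range(num_spam + num_hot, num_emails))  # rarely accessed
    if not hot and not cold:
        raise ValueError("profile leaves no email that can be accessed")
    trace = []
    for _ in range(num_accesses):
        use_hot = bool(hot) and (not cold or rng.random() < hot_access_prob)
        trace.append(rng.choice(hot if use_hot else cold))
    return trace


# Hypothetical profile: 50% spam, 10% hot emails drawing 80% of accesses.
trace = synthetic_trace(100, 1000, 0.5, 0.1, 0.8)
print(len(trace), len(set(trace)))
```

A trace like this could be replayed against different cache sizes and replacement algorithms until real user data is available.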

Page 27: Caching for Sustainability

Example Locality Analysis (not from Greenmail)

Page 28: Caching for Sustainability

Future Work

• Heavy data analysis
• Cache algorithms
• Caching headers
• Caching searches
• Used to limit mailbox refresh rate
• Zoolander backend

Page 29: Caching for Sustainability

Questions/References

[1] The AMD Opteron Processor Helps AISO. www.vmware.com.

[2] N. Holdings. The Nielsen Global Online Environmental Survey, 2011.