DCIM: Distributed Cache Invalidation Method
Authors: Kassem Fawaz and Hassan Artail

Alex Buchholz
CS 6204

April 25th, 2013

Presentation Overview
1. Paper overview
  – Main concepts/definitions
2. Related Works
3. DCIM Architecture and Operations
4. Analysis
5. Experimental Results

DCIM Overview

• Client-Based Cache Consistency Scheme
• Pull Based
• Implements:
  1. TTL
  2. Piggybacking
  3. Prefetching

• Provides “near strong consistency”

“this is the first complete client side approach employing adaptive TTL and achieving superior availability, delay, and traffic performance”

Distributed Cache Invalidation Method

Cache Consistency
• Strong: cached items are identical to those on the server
• Weak: client queries might get stale data items
• Delta: data may be stale for a bounded period of time
• Probabilistic: data is consistent with a probability p
• Probabilistic-delta: data item is at most delta units of time stale with a probability not less than p

Three Main Cache Mechanisms

1. Push: server informs clients of changes
2. Pull: client asks the server for changes
3. Hybrid: both

Time to Live (TTL)
• Scheme: TTL stored along with value
  – Data considered good until TTL has passed
• Popular because:
  a) Simplicity
  b) Good performance
  c) Flexibility to assign TTL values
• Completely client based
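
The TTL scheme above can be sketched in a few lines. This is only an illustration, not code from the paper: the names `CacheEntry` and `TTLCache` are hypothetical, and time is passed in explicitly (as `now`, in seconds) so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CacheEntry:
    value: Any
    fetched_at: float  # time (s) at which the value was obtained
    ttl: float         # seconds for which the value is considered good

    def is_fresh(self, now: float) -> bool:
        # The value is considered good until its TTL has passed.
        return now - self.fetched_at < self.ttl


class TTLCache:
    """Hypothetical client-side cache: the TTL is stored along with the value."""

    def __init__(self) -> None:
        self._entries = {}

    def put(self, key: str, value: Any, ttl: float, now: float) -> None:
        self._entries[key] = CacheEntry(value, fetched_at=now, ttl=ttl)

    def get(self, key: str, now: float) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None or not entry.is_fresh(now):
            return None  # miss or expired: the caller must go to the server
        return entry.value


cache = TTLCache()
cache.put("q1", "v1", ttl=10.0, now=0.0)
print(cache.get("q1", now=5.0))   # still within the TTL
print(cache.get("q1", now=10.0))  # TTL has passed
```

No server involvement is needed to decide freshness, which is what makes the scheme completely client based.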

Piggybacking
Two Ways:
1. Cache piggybacks a list of invalidated documents when communicating with the server
2. Server piggybacks a list of updated documents when it communicates with the cache.

Prefetching
• Client pulls data before needed
  – based on that item's request rate
• Helps increase availability and reduce wait states

Pull Based Approaches
1. Client Polling
2. Time to Live

Client Polling
• Cache validation initiated on client’s schedule
• To achieve strong consistency: validate each item before it is served

*** Low bandwidth, high latency ***

Two TTL Approaches
• Fixed: A set TTL value for all data items
• Adaptive: meets higher consistency requirements with lower traffic

Adaptive TTL Approaches

1. TTL = factor * time difference between query and last update
2. TTL = factor * last update interval
3. TTL = time difference between query time and kth most recent update, divided by factor; the server relays the k most recent update times to the cache
4. Consider complete update history and predict future updates
5. TTL = probability describing staleness of cache

Five approaches discussed in the paper
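
The first three formulas can be written out directly. This is a minimal sketch, assuming update times are kept as an ascending list of timestamps in seconds; the function and parameter names are hypothetical, and the factor values in the usage lines are arbitrary.

```python
def ttl_from_age(query_time, last_update_time, factor):
    """Approach 1: TTL = factor * (query time - last update time)."""
    return factor * (query_time - last_update_time)


def ttl_from_last_interval(update_times, factor):
    """Approach 2: TTL = factor * last update interval."""
    return factor * (update_times[-1] - update_times[-2])


def ttl_from_kth_update(query_time, update_times, k, divisor):
    """Approach 3: TTL = (query time - kth most recent update time) / factor."""
    return (query_time - update_times[-k]) / divisor


updates = [100.0, 160.0, 200.0]  # ascending update times (s), made up for illustration
print(ttl_from_age(250.0, updates[-1], factor=0.5))            # 0.5 * (250 - 200)
print(ttl_from_last_interval(updates, factor=0.5))             # 0.5 * (200 - 160)
print(ttl_from_kth_update(250.0, updates, k=2, divisor=2.0))   # (250 - 160) / 2
```

Approaches 4 and 5 depend on a prediction model or a staleness probability that the slide does not specify, so they are left out of the sketch.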

COACS
• DCIM builds on top of COACS
  – Cooperative caching architecture
• Two types of nodes
  – Caching nodes (CNs): cache previously requested items
  – Query Directories (QDs): index queries along with the addresses of the caching nodes
• Either can be a Requesting Node (RN)
  – Node that requests data items

COACS Operation
1. RN wants data: scans QDs sequentially from nearest to farthest for the same query
  – Far fewer QDs than CNs
2. When the same query is found at a QD, it forwards the request to the CN which has the data
3. If all are traversed and no queries found (cache miss), the last QD queries the server
4. RN -> CN
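
The lookup steps above can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the class names, the `fetch_from_server` stand-in, and in particular the reading of step 4 as "the RN caches the item, becomes a CN for it, and the last QD indexes it".

```python
from dataclasses import dataclass, field


@dataclass
class CachingNode:
    name: str
    items: dict = field(default_factory=dict)   # query -> data item


@dataclass
class QueryDirectory:
    name: str
    index: dict = field(default_factory=dict)   # query -> CachingNode holding it


def fetch_from_server(query):
    # Hypothetical stand-in for the real server round trip.
    return f"server-data-for-{query}"


def lookup(query, qds_nearest_first, requester):
    # Steps 1-2: scan the QDs from nearest to farthest for the same query.
    for qd in qds_nearest_first:
        cn = qd.index.get(query)
        if cn is not None:
            return cn.items[query], f"hit via {qd.name} at {cn.name}"
    # Step 3: all QDs traversed without a match, so the last QD asks the server.
    data = fetch_from_server(query)
    # Step 4 (assumed reading): the requester now caches the item and is indexed.
    requester.items[query] = data
    qds_nearest_first[-1].index[query] = requester
    return data, "miss"


cn1 = CachingNode("CN1", {"q1": "d1"})
qd1, qd2 = QueryDirectory("QD1"), QueryDirectory("QD2", {"q1": cn1})
rn = CachingNode("RN")
print(lookup("q1", [qd1, qd2], rn))
print(lookup("q2", [qd1, qd2], rn))
print(lookup("q2", [qd1, qd2], rn))
```

Because there are far fewer QDs than CNs, the sequential scan touches only a handful of nodes before either hitting or falling through to the server.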

Design Methodology

• Pull-based: CNs monitor TTL of cache
• Scalable: anyone can become a CN if what they requested has not been cached
• Adaptive TTL values: each CN estimates the interupdate interval and uses it to set the TTL

GOAL: improve efficiency of cache updating in MANETs without requiring mobile devices to maintain cache state information

Piggybacking in DCIM

• CN polls server often to know update times of items it has cached
• Piggybacks requests to refresh cached items each time it contacts the server
• To avoid unnecessary packets, two-phase approach:
  1. After each polling interval (Tpoll), CNs revalidate data items that have an expired TTL and a high update rate
  2. After a configurable number of polling intervals (Npoll), the CN revalidates if at least one item has passed its TTL

Tpigg = Npoll × Tpoll

Allows for delta consistency when at least one item has an expired TTL

CN ensures that data items are at most one piggybacking interval stale
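
A minimal sketch of the two-phase check follows. The names are hypothetical, and which items are included in the request at the end of a piggybacking interval is an assumption of this sketch (here: every expired item); the slide only states that a revalidation happens if at least one item has passed its TTL.

```python
from dataclasses import dataclass


@dataclass
class CachedItem:
    key: str
    expires_at: float        # time (s) at which the TTL runs out
    high_update_rate: bool   # flag as described on the slide


def piggyback_interval(n_poll, t_poll):
    # Tpigg = Npoll x Tpoll
    return n_poll * t_poll


def items_to_validate(items, now, polls_done, n_poll):
    """Return the keys to revalidate at the end of polling interval `polls_done`."""
    expired = [item for item in items if item.expires_at <= now]
    if polls_done % n_poll == 0:
        # Phase 2: end of a piggybacking interval (assumed: all expired items).
        return [item.key for item in expired]
    # Phase 1: every polling interval, only expired items with a high update rate.
    return [item.key for item in expired if item.high_update_rate]


items = [
    CachedItem("a", expires_at=5.0, high_update_rate=True),
    CachedItem("b", expires_at=5.0, high_update_rate=False),
    CachedItem("c", expires_at=100.0, high_update_rate=True),
]
print(piggyback_interval(n_poll=4, t_poll=2.0))
print(items_to_validate(items, now=10.0, polls_done=1, n_poll=4))
print(items_to_validate(items, now=10.0, polls_done=4, n_poll=4))
```

Item "b" waits until the end of the piggybacking interval, which is why no item is more than one piggybacking interval stale.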

DCIM Basic System Overview

Two CNs are sending cache validation requests to the server (dotted arrows) via gateway nodes and through the Access Point.

The server replies to the CNs with lists of valid and changed data items (short-dashed arrows).

The CNs in turn asynchronously update the corresponding QDs about the items they cache (long-dashed arrows).

Basic Data Request

TTL in DCIM
• All CNs have insight into update patterns
  1. Store last update times for all data on the server
  2. Use this info to predict next update time
• HOWEVER, CNs are constrained
  – Power, processing, storage
• DCIM uses a running average to estimate interupdate interval at the server
• CN only needs to store the estimated interval and the last updated time.
  – simplicity and ease of computation
  – minimum amount of data required
  – diminishing weights assigned to older data

IUI(t) = (1 – α) * IUI(t-1) + α * LUI

IUI(t) = estimated interupdate interval at time t
LUI = last interupdate interval
α = value between 0.1 and 0.2 (0.125 in this paper)
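
The running average can be traced with a short example. This is a sketch only: the update times are made up, and seeding the estimate with the first observed interval is an assumption, since the slide does not say how IUI is initialized.

```python
ALPHA = 0.125  # value used in the paper, per the slide


def update_iui(prev_iui, last_interval, alpha=ALPHA):
    # IUI(t) = (1 - alpha) * IUI(t-1) + alpha * LUI
    return (1 - alpha) * prev_iui + alpha * last_interval


def estimate_iui(update_times, alpha=ALPHA):
    """Fold a list of ascending update times (s) into one interval estimate."""
    intervals = [b - a for a, b in zip(update_times, update_times[1:])]
    iui = intervals[0]  # assumed initialization: first observed interval
    for lui in intervals[1:]:
        iui = update_iui(iui, lui, alpha)
    return iui


print(update_iui(100.0, 180.0))                  # 0.875 * 100 + 0.125 * 180
print(estimate_iui([0.0, 100.0, 180.0, 300.0]))  # intervals 100, 80, 120
```

Only the current estimate and the last update time need to be stored, and each older interval's weight shrinks by a factor of (1 – α) per update, which matches the three properties listed above.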

Server Operations

Decision flow at the server
CURP: Cache Update Request
SVRP: Server Validation Reply
SUDP: Server Update Data

Server only reacts to the received CURP messages -> doesn’t need to be aware of MANET dynamics

Therefore, DCIM could be deployed in an Internet environment using HTTP header fields
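
A sketch of a purely reactive server is given below. The message layouts are assumptions made for illustration (a CURP is modeled as a mapping from item id to the last update time the CN knows about); the paper's actual packet formats are not shown on the slide.

```python
def handle_curp(server_items, curp):
    """React to one Cache Update Request.

    server_items: item id -> (last update time, data) as held by the server
    curp:         item id -> last update time known to the requesting CN
    Returns (svrp, sudp): ids that are still valid, and changed items with data.
    """
    svrp = []   # Server Validation Reply: items the CN may keep using
    sudp = {}   # Server Update Data: items that changed, with fresh data
    for item_id, known_time in curp.items():
        update_time, data = server_items[item_id]
        if update_time > known_time:
            sudp[item_id] = (update_time, data)
        else:
            svrp.append(item_id)
    return svrp, sudp


server_items = {"a": (10.0, "A1"), "b": (50.0, "B2")}
print(handle_curp(server_items, {"a": 10.0, "b": 20.0}))
```

The function keeps no per-client state and never initiates contact, which is the property that lets the server stay unaware of MANET dynamics.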

QD Operations (COACS)
• Elected based on resource capability
• Number of QDs bounded by two limits:
  1. Lower bound: having enough QDs such that an additional QD will not yield enough reduction in workload to be worth it
  2. Upper bound: corresponds to delay threshold
• Number can change dynamically (7<->100 in sims.)
• Typically experience 1.5x the load of a CN

CN Processing

Elements of the General Cache Information Table

Elements of the Query-Specific Cache Information Table

The process that runs on the CN includes two threads:
1. A monitoring thread
2. A processing thread

CN Monitoring Thread
1. Checks for expired data items
2. Issues validation requests
3. Requests data items

It performs these in two functions:
1. Inner loop function
2. Outer loop function

Inner Loop Function

Outer Loop Function

[Flowchart of the outer loop function: inner loop function; Npoll passed?]

CN Processing Thread
1. Processes data requests (DRPs) from RNs
2. Processes SUDP and SVRP messages in response to CURP messages
3. Also computes TTL values

Processing Data Requests
• CN checks the requested item in the DRP
• If INVALID -> issues an update request directly to the server
  – Changes state to TEMP_INVALID. Places query in waiting list
• Otherwise, the query is processed by sending the requested item to the RN via a DREP message
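
The request handling above amounts to a small state machine, sketched here. The class layout and the `outbox` list (standing in for the network) are hypothetical. One labeled assumption goes beyond the slide: a request arriving while the item is already TEMP_INVALID is queued rather than answered, since an update is already pending.

```python
from enum import Enum


class State(Enum):
    VALID = "VALID"
    INVALID = "INVALID"
    TEMP_INVALID = "TEMP_INVALID"


class CachingNode:
    def __init__(self):
        self.items = {}     # query -> (State, data)
        self.waiting = {}   # query -> list of RNs waiting for fresh data
        self.outbox = []    # (destination, message type, payload)

    def on_drp(self, query, rn):
        state, data = self.items[query]
        if state is State.INVALID:
            # Ask the server directly, mark the item, and park the query.
            self.outbox.append(("server", "CURP", query))
            self.items[query] = (State.TEMP_INVALID, data)
            self.waiting.setdefault(query, []).append(rn)
        elif state is State.TEMP_INVALID:
            # Assumption (not on the slide): an update is pending, so just queue.
            self.waiting.setdefault(query, []).append(rn)
        else:
            # Otherwise answer from the cache with a DREP message.
            self.outbox.append((rn, "DREP", data))


cn = CachingNode()
cn.items["q1"] = (State.VALID, "d1")
cn.items["q2"] = (State.INVALID, "old")
cn.on_drp("q1", "RN-A")
cn.on_drp("q2", "RN-B")
print(cn.outbox)
print(cn.waiting)
```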

Processing SVRP and SUDP
• SUDP – if received, must be for an item that has changed at the server
  – CN calculates TTL. If SUDP makes reference to items that have requests placed in the waiting list, those items are sent to the corresponding requesting nodes
• SVRP – sent from the server in response to a CURP packet
  – there are items which were specified in the CURP packet but not sent as part of the SVRP because the actual updated data items were sent to the CNs as part of the SUDP message

SVRP: Server Validation Reply
SUDP: Server Update Data
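
A minimal sketch of how a CN might handle the two replies is shown below, assuming the cache is a plain dictionary and that `ttl_for` is some caller-supplied function returning a TTL in seconds for a query. Both names are hypothetical, and renewing the expiry time on an SVRP is an assumption; the slide only says the CN calculates the TTL when an SUDP arrives.

```python
def on_sudp(cache, waiting, updates, now, ttl_for):
    """Apply changed items and release queued requests.

    updates: query -> new data carried by the SUDP message.
    Returns the (rn, query, data) replies to send to waiting requesting nodes.
    """
    replies = []
    for query, data in updates.items():
        cache[query] = {"data": data, "state": "VALID",
                        "expires_at": now + ttl_for(query)}
        # Serve every RN whose request was parked in the waiting list.
        for rn in waiting.pop(query, []):
            replies.append((rn, query, data))
    return replies


def on_svrp(cache, valid_queries, now, ttl_for):
    """Mark items confirmed by the server as valid (assumed: TTL is renewed)."""
    for query in valid_queries:
        entry = cache[query]
        entry["state"] = "VALID"
        entry["expires_at"] = now + ttl_for(query)


cache = {
    "q1": {"data": "old", "state": "TEMP_INVALID", "expires_at": 0.0},
    "q2": {"data": "d2", "state": "VALID", "expires_at": 0.0},
}
waiting = {"q1": ["RN-A", "RN-B"]}
print(on_sudp(cache, waiting, {"q1": "new"}, now=100.0, ttl_for=lambda q: 50.0))
on_svrp(cache, ["q2"], now=100.0, ttl_for=lambda q: 30.0)
print(cache)
```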

TTL Calculation
• Exact TTL calculation depends on whether the item was expired at the server or not

[Figure: Average TTL versus inverse of interupdate interval. x-axis: Inverse of update rate (s)]

The figure shows that at very low update rates (less than 1 update per 1,000 s), the estimated TTL does not adapt well. In practice, however, operation extends beyond the 2,000 s considered in this simulation, so more item updates will occur on the server over the longer time interval.

Analysis Overview
• DCIM analyzed to assess:
  – Bandwidth gain
  – Query response time gain
• Compared to two consistency schemes:
  – Poll-every-time (PET)
  – Push-based stateful (PBS)

Analysis Defined
• Bandwidth gain: difference in traffic between DCIM and PET, PBS
• Query response time gain: difference between the times it takes to answer queries
• Data requests defined by homogeneous Poisson distributions:
  – λR = rate of requests, λU = item update rate
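
To build intuition for the two rates, the sketch below uses a standard property of independent Poisson processes: the probability that the next update arrives before the next request is λU / (λR + λU). This is an illustration of the traffic model only, not the paper's derivation, and the function names are hypothetical.

```python
import random


def prob_update_before_request(rate_requests, rate_updates):
    # For independent Poisson processes, the first event is an update
    # with probability rate_updates / (rate_requests + rate_updates).
    return rate_updates / (rate_requests + rate_updates)


def simulate(rate_requests, rate_updates, trials=20000, seed=0):
    """Monte Carlo check: draw exponential waiting times and compare them."""
    rng = random.Random(seed)
    update_first = 0
    for _ in range(trials):
        next_request = rng.expovariate(rate_requests)
        next_update = rng.expovariate(rate_updates)
        if next_update < next_request:
            update_first += 1
    return update_first / trials


# Equal rates, as in the lambda_R = lambda_U = 1/500 setting used later in the analysis.
print(prob_update_before_request(1 / 500, 1 / 500))
print(simulate(1 / 500, 1 / 500))
```

When requests are much more frequent than updates, this probability becomes small, which is the regime where serving from the cache is most likely to return consistent data.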

Response Time Gain

Equations (2) and (3) give the derived response time gain of DCIM over PET and PBS

Given HD = 5, HR = 5.21, HC = 5.21, Tin = 5 ms, Tout = 70 ms, λU = 1/500

Bandwidth Gain

Given λR = λU = 1/500, N = 4,000, M = 20, SR = 0.5 KB, SD = 10 KB, SU = 0.15 KB, q = 20, E[Ndisc] = 15, and D = 20 s

D is the disconnection interval
N is the number of cached items
E[Ndisc] is the CN disconnection rate

Analysis Results
• Response time gain mainly depends on update rate
  – In DCIM the majority of the requests are answered from within the MANET, which is why the response time difference is less than 10 ms
• The traffic resulting from large piggybacking intervals is lower than that of small piggybacking intervals
• The traffic demands for DCIM decrease exponentially for small polling intervals
• DCIM is less costly than PBS

*** The polling interval is the main parameter that affects performance ***
Optimal polling interval is the value that results in the lowest traffic consumption – equivalent to the highest bandwidth gain since poll every time does not depend on the polling interval

Experimental Results Overview
• Implemented using ns2
• Two additional schemes were implemented for comparison:
  1. Poll every time: validate each item each time it’s requested
  2. Fixed-TTL: each item has same TTL value
• Three versions of DCIM implemented
  1. Piggybacking is disabled
  2. “prefetch” is always set
  3. Only implements the update rate adaptation mechanism – items are validated when they expire
• 400 × 400 m² area, 100 nodes, 2 Mbps node bitrate, 2 m/s node mobility, etc. Rest shown on next slide

Experimental Results Overview

Experimental Results Overview
• The reported results are from 5 experiments that involve varying:
  – request rate
  – update rate
  – item popularity
  – maximum velocity
  – polling interval
  – data item size
• The reported metrics are:
  1. Consistency ratio
  2. Query delay
  3. Cached data query delay
  4. Uplink traffic
  5. Downlink traffic
  6. Average overhead traffic

Varying Request Rate
• Varied between 5 s and 120 s
• Piggybacking -> appropriate TTL value -> high consistency ratio
• Prefetching -> high hit ratio -> lower delays
• Query delay gets smaller after the item is cached, but increases by a small margin due to less prefetching
• DCIM consumes more traffic on the server side due to prefetching
• As for the node traffic, by piggybacking a large number of items, DCIM consumes more traffic than the other approaches.
• However, as the request rate decreases, prefetching does not happen as often, and this leads to lower traffic as shown in the graph. This is how DCIM adapts prefetching to the request rate of items.

Varying Update Rate
• A TTL below 100 s is less than the interupdate intervals in all of the scenarios simulated -> provides the best consistency level.
• Fixed TTL approaches have higher hit rates than poll every time, but lower than DCIM
• Delay after caching for DCIM < polling every time and fixed-100, but it may exceed that of fixed-500, which keeps the element for a longer time.
• Traffic not high at the server, very low in the MANET
  – less than 10 kbps, while the bandwidth is 2 Mbps
• Piggybacking of requests -> traffic increase
  – Increases in frequency as update rates increase
• Without this traffic though, the CNs would not be able to infer the update rate and calculate reliable TTL estimates.

Varying Zipf (Popularity)
• Effectively varies the popularity of the data items
  – Analogous to varying the items’ request rate
• Zipf parameter increases -> diversity of items decreases
  – smaller subset of the items is requested more
• TTL saved for all items regardless of their request rates
  – constant consistency at 98 percent
• Zipf parameter increases -> smaller set of requests -> increases the probability of hits.
• Through prefetching -> constant hit rate

Varying Node Velocity
• Varied between 0 m/s and 20 m/s
• No special outcome
• Mild increase in the delay

Varying Polling Interval
• Polling interval is increased from 1 to 50 s
• Mild decrease in the consistency ratio and hit ratio
  – an increase in the delay
• Piggyback interval increases due to the decrease of hit rate
• By increasing the polling interval the validation requests from the inner loop function become farther apart in time
• Large piggybacking interval -> CN will predict that items will be requested before the end of this interval
  – leads to more prefetching and consequently more traffic

Varying Data Item Size
• Varied between 5 KB and 1,500 KB
• DCIM at large data item sizes still demands less traffic than the network bandwidth
• Traffic per node increases linearly with the increase in data item size

Effectiveness of DCIM

• Better consistency over its variants
• Piggybacking offers an added value by providing a better estimate of TTL
  – consequently higher data consistency
• Piggybacking induces more overhead traffic
  – validation requests and prefetching
• Prefetching highly requested items saves on traffic and provides acceptable query delay
• Piggybacking increases data consistency
• Prefetching adapted to the request rate controls overhead traffic consumption

Comparison between DCIM and Other Approaches

Conclusions
• DCIM is a client-based cache consistency scheme for MANETs
• Relies on estimating the interupdate intervals of data items to set their expiry time.
• Uses piggybacking and prefetching
  – increases accuracy of estimation
  – reduces both traffic and query delays
• Compared DCIM approach to fixed TTL and client polling and to two server-based approaches, SSUM and UIR
• Showed that DCIM provides better overall performance than the other client-based schemes and comparable performance to SSUM