DCIM: Distributed Cache Invalidation Method Authors: Kassem Fawaz and Hassan Artail

DCIM: Distributed Cache Invalidation MethodAuthors: Kassem Fawaz and Hassan Artail

Alex BuchholzCS 6204

April 25th, 2013

Presentation Overview1. Paper overview– Main concepts/definitions

2. Related Works3. DCIM Architecture and Operations4. Analysis5. Experimental Results

DCIM Overview

• Client-Based Cache Consistency Scheme• Pull Based• Implements:

1. TTL2. Piggybacking3. Prefetching

• Provides “near strong consistency”

“this is the first complete client side approach employing adaptive TTL and achieving superior availability, delay, and traffic performance”

Distributed Cache Invalidation Method

Cache Consistency• Strong : cached items are identical to server• Weak: client queries might get stale data items• Delta: data stale for a period of time• Probabilistic: data is consistent with a

probability p• Probabilistic-delta: data item is at most delta

units of time stale with a probability not less than p

Three Main Cache Mechanisms

1. Push: Server informs change2. Pull: Client asks for change3. Hybrid: Both

Time to Live (TTL)• Scheme: TTL stored along with value– Data considered good until TTL has passed

• Popular because:a) Simplicityb) Good performancec) Flexibility to assign TTL values

• Completely client based

PiggybackingTwo Ways:1. Cache piggybacks a list of invalidated

documents when communicating with the server

2. Server piggybacks a list of updated documents when it communicates with the cache.

Prefetching• Client pulls data before needed– based on that items request rates

• Helps increase availability and reduce wait states

Pull Based Approaches1. Client Polling2. Time to Live

Client Polling• Cache validation initiated on client’s schedule• To achieve strong consistency:

Validate each item before being served

*** Low bandwidth, high latency ***

Two TTL Approaches• Fixed: A set TTL value for all data items• Adaptive: provides higher consistency

requirements along with lower traffic

Adaptive TTL Approaches

1. TTL = factor * time difference btw query and last update2. TTL = factor * last update interval3. TTL = time difference btw query time and kth recent

update divided by factor, server replays cache at k most recent update times

4. Consider complete update history and predict future updates

5. TTL = probability describing staleness of cache

5 Discussed in the paper

COACS• DCIM builds on top of COACS– Cooperative caching architecture

• Two types of nodes– Caching nodes (CNs): cache previously requested

items– Query Directories (QD): index queries along with

the addresses of the caching nodes• Either can be a Requesting Node (RN) – Node that requests data items

COACS Operation1. RN wants data – scans QD’s sequentially from

nearest to farthest for same query– Far less QD’s than CN

2. When the same query is found at a QD, it forwards the request to the CN which has the data

3. If all are traversed and no queries found (cache miss), the last QD queries the server

4. RN -> CN

Design Methodology

• Pull-based: CN’s monitor TTL of cache• Scalable: anyone can become a CN if what

they requested has not been cached• Adaptive TTL values: each CN estimates the

interupdate interval and uses to set TTL

GOAL: improve efficiency of cache updating in MANETs without requiring mobile devices to maintain

cache state information

• CN polls server often to know update times of items it has cached

• Piggybacks requests refresh of cached items each time contacts server

• To avoid unnecessary packets, two-phase approach:1. After each polling interval (Tpoll) CNs revalidate data with expired TTL

with high update rate2. After a configurable number of polling intervals (Npoll), Cn revalidates if

at least one item has passed TTL

Piggybacking in DCIM

Tpigg = Npoll X Tpoll

Allows for delta consistency when at least one item has an expired TTL

CN ensures that data items are at most one piggybacking interval stale

DCIM Basic System Overview

Two CNs are sending cache validation requests to the server (dotted arrows) via gateway nodes and through the Access Point.

The server replies back with lists of valid and changed data items (short-dashed arrows) to the CNs

Which in turn update the corresponding QDs asynchronously about the items they cache (long-dashed arrows).

Basic Data Request

TTL in DCIM• All CN’s have insight into update patterns

1. Store last update times for all data in server2. Uses this info to predict next update time

• HOWEVER, CN are constrained– Power, processing, storage

• DCIM uses a running average to estimate interupdate interval at the server

• CN only needs to store the estimated interval and the last updated time. – simplicity and ease of computation– minimum amount of data required– diminishing weights assigned to older data

IUI(t) = (1 – α) * IUI(t-1) + α * LUI

IUI(t) = estimated interarrival time at time tLUI = last interupdate interval

α = value btw 0.1 and 0.2 (0.125 in this paper)

Server Operations

Decision flow at the serverCURP: Cache Update RequestSVRP: Server Validation ReplySUDP: Server Update Data

Server only reacts to the received CURP messages -> doesn’t need to be aware of

MANET dynamics

Therefore, DCIM could be deployed in an Internet environment using

HTTP header fields

QD Operations (COACS)• Elected based on resource capability• # bounded by two limits:

1. Lower bound: having enough such that an additional QD will not yield enough reduction in workload to be worth it

2. Upper bound: corresponds to delay threshold• # can change dynamically (7<->100 in sims.)• Typically experience 1.5x the load of a CN

CN Processing

Elements the General Cache Information Table

Elements of the Query-Specific Cache Information Table

The process that runs on the CN includes two threads: 1. A monitoring thread2. A processing thread

CN Monitoring Thread1. Checks for expired data items

2. Issues validation requests3. Requests for data items

It performs these in two functions:1. Inner Loop Function: 2. Outer Loop function

Inner Loop Function

Outer Loop Function

Outer Loop FunctionInner loop function Npoll passed?

CN Processing Thread1. Processes data requests (DRPs) from RNs 2. Processes SUDP and SVRP messages in

response to CURP messages3. Also computes TTL values

Processing Data Requests• CN checks the requested item in the DRP• If INVALID -> issues an update request directly

to the server– Changes state to TEMP_INVALID. Places query in

waiting list• Otherwise, the query is processed by sending

the request item to the RN view a DREP message

Processing SVRP and SUDP• SUDP – if received, must be for an item that has

changed at the server– CN calculates TTL. If SUDP makes reference to items that

have requests placed in the waiting list, those items are sent to the corresponding requesting nodes

• SVRP – sent from the server in response to a CURP packet– there are items which were specified in the CURP packet

but not sent as part of the SVRP because the actual updated data items were sent to the CNs as part of the SUDP message

SVRP: Server Validation ReplySUDP: Server Update Data

TTL Calculation• Exact TTL calculation depends on whether

the item was expired at the server or not Average TTL versus inverse of interupdate interval.

Inverse of update rate (s)

Shows that at very low update rates (less than 1 update per 1,000 s), the estimated TTL does not adapt well. However, in actuality, time goes beyond the 2,000 s considered for this simulation time, meaning that more item updates will occur on the server during the longer time interval.

Analysis Overview• DCIM analyzed to assess:– Bandwidth gain– Query response time gain

• Compared to two consistency schemes:– Poll-every-time (PET)– Push-based stateful (PBS)

Analysis Defined• Bandwidth gain: difference btw DCIM and PET, PBS• Query response time gain: difference btw times it takes to

answer queries• Data requests defined by homogeneous Poisson distributions:

– λR = rate of requests, λU = item update rate

Response Time Gain

(2) and (3) derived response time gain of DCIM over PET and PBS

Given HD = 5, HR = 5.21, HC = 5.21, Tin = 5ms, Tout = 70 ms, λU = 1/500

Bandwidth Gain

Given λR = λU = 1/500, N = 4,000, M = 20, SR = 0.5 KB, SD = 10KB, SU = 0.15 KB, q = 20, E[Ndisc] = 15, and D = 20 s

D is the disconnection intervalN is the number of cached itemsE[Ndisc] is the CN disconnection rate

Analysis Results• Response time gain mainly depends on update rate

– In DCIM the majority of the requests are answered from the MANET, this is why the time response difference is less than 10 ms

• The traffic resulting from large piggybacking intervals is lower than that of small piggybacking intervals

• The traffic demands for DCIM decrease exponentially for small polling intervals

• DCIM is less costly than PBS

*** The polling interval is the main parameter that affects performance ***Optimal polling interval is the value that results in the lowest traffic consumption – equivalent to the high bandwidth gain since poll every time does not depend on the

polling interval

Experimental Results Overview• Implemented using ns2• Two additional schemes were implemented for comparison:

1. Poll every time: validate each item each time it’s requested2. Fixed-TTL: each item has same TTL value

• Three version of DCIM implemented1. Piggybacking is disabled2. “prefetch” is always set3. Only implements the update rate adaptation mechanism – items are

validated when they expire

• 400x400 m2 area, 100 nodes, 2 Mbps node bitrate, 2 m/s

node mobility, etc. Rest shown on next slide

Experimental Results Overview

Experimental Results Overview• The reported results are from 5 experiments that involve varying:

– request rate– update rate– item popularity– maximum velocity– polling interval– data item size

• The results are:1. Consistency ratio2. Query delay3. Cached data query delay4. Uplink traffic5. Downlink traffic6. Average overhead traffic

Varying Request Rate• Varied between 5s and 120s • Piggybacking -> appropriate TTL value -> high

consistency ratio• Prefetching -> high hit ratio -> lower delays • Query delay gets smaller after the item is cached

increases by a small margin due to less prefetching • DCIM consumes more traffic on the server side due to

prefetching• As for the node traffic, by piggybacking large amount

of items, DCIM consumes more traffic when compared to other approaches. • However, as the request rate decreases, prefetching

does not happen that often, and this leads to lower traffic as shown in the graph. This is how DCIM adapts prefetching to the request rate of items.

Varying Update Rate• TTL < 100 s is less than the interupdate

intervals in all of the scenarios simulated -> provide the best consistency level. • Fixed TTL approaches have higher hit rates

than poll every time, but less than DCIM•Delay after caching for DCIM < polling every

time and fixed-100, but it may exceed that of fixed-500 that keeps the element for a longer time. • Traffic not high at the server, very low in the

MANET – less than 10 kbps, while the bandwidth is 2 Mbps

• Piggybacking of requests -> traffic increase – Increases in frequency as update rates increase

•Without this traffic though, the CNs would not be able to infer the update rate and calculate reliable TTL estimates.

Varying Zipf (Popularity)• Effectively varies the popularity of the

data items– Analogous to varying the items’ request

rate• Zipf increases -> diversity of items

decreases– smaller subset of the items is requested

more• TTL saved for all items regardless of

their request rates– constant consistency at 98 percent

• Zipf increases -> smaller set of requests -> increases the probability of hits. It is

• Through prefetching -> constant hit rate

Varying Node Velocity• Varied between 0 m/s and 20

m/s• No special outcome• Mild increase in the delay

Varying Polling Interval• Polling interval is increased from 1 to

50 s• Mild decrease in the consistency ratio

and hit ratio– an increase in the delay

• Piggyback interval increases due to the decrease of hit rate

• By increasing the polling interval the validation requests from the inner loop function become farther apart in time

• Piggybacking large - > CN will predict that items will be requested before the end of this interval– leads to more prefetching and

consequently more traffic

Varying Data Item Size• 5 KB and 1,500 KB• DCIM at high data sizes still

demands less traffic than the network bandwidth• Traffic per node increases

linearly with the increase in data item size

• Better consistency over its variants• Piggybacking offers an added value by

providing a better estimate of TTL– consequently higher data consistency

• Piggybacking induces more overhead traffic– validation requests and prefetching

• Prefetching highly requested items saves on traffic and provides acceptable query delay

• Piggybacking increases data consistency

• Prefetching adapted to the request rate controls overhead traffic consumption

Effectiveness of DCIM

Comparison between DCIM and Other Approaches

Conclusions• DCIM is a client-based cache consistency scheme for MANETs • Relies on estimating the inter update intervals of data items to set their

expiry time. • Uses piggybacking and prefetching

– increases accuracy of estimation – reduce both traffic and query delays

• Compared DCIM approach to fixed TTL and client polling and to two server-based approaches SSUM and UIR

• Showed that DCIM provides a better overall performance than the other client- based schemes and comparable performance to SSUM

DCIM: Distributed Cache Invalidation Method Authors: Kassem Fawaz and Hassan Artail

Documents

Transcript of DCIM: Distributed Cache Invalidation Method Authors: Kassem Fawaz and Hassan Artail