CHAPTER 2
LITERATURE REVIEW
2.1 INTRODUCTION
The size and popularity of WWW systems have grown
dramatically over the last couple of decades, and the demand on these
systems to respond quickly is increasing as well. The growth in web users
and web applications has led to increased latency, network congestion
and server overloading. Web caching is one of the principal techniques for
mitigating these problems, and various prefetching techniques have been
developed to augment the caching efforts. Cooperative caching, an
efficient technique for enhancing the user experience on the WWW, has been
extensively researched in the recent past. This chapter reviews the seminal
works carried out by various researchers on web caching in the recent past,
with a special focus on information retrieval systems. More precisely, it
reviews the extant literature on web caching systems and prefetching
techniques. The chapter also discusses recent research on the
effectiveness of cooperative caching techniques in reducing
user-perceived latencies in web applications.
2.2 WEB CACHING
The main function of a caching system is to store the popular web
objects that are most likely to be visited in the near future in the client
machine or the proxy server (Ali et al 2011). The performance of web-based
systems can be improved by employing various web caching
techniques. Acharjee (2006) has listed the main advantages of web caching
to the end users, network managers, and content creators. Some of the
advantages are that web caching decreases user perceived latency, web
caching reduces network bandwidth usage and it also reduces load on the
origin servers.
Cache replacement is at the core of web caching systems, and
hence the design of efficient cache replacement algorithms is vitally
important for achieving a highly sophisticated caching mechanism (Chen 2007).
Considering their importance, cache replacement algorithms are more
popularly called web caching algorithms (Koskela et al 2003).
2.2.1 Web Caching Algorithms
A. Least-Recently-Used (LRU) Algorithm
Least-Recently-Used (LRU) algorithm is the simplest and most
commonly used cache management approach. LRU algorithm removes the
least recently accessed objects so that sufficient space is made available for
the new objects. LRU is easy to implement and is mostly suitable for uniform
size objects, like the traditional memory cache. However, the LRU
algorithm considers neither the size nor the download latency of
objects, so it is not suitable for direct use in web caching
(Koskela et al 2003).
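As an illustration, the LRU policy above can be sketched in a few lines of Python (a minimal toy cache, not drawn from any of the cited systems; the class and object names are our own):

```python
from collections import OrderedDict

class LRUCache:
    """Illustrative LRU cache: evicts the least recently accessed object."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)         # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
```

Note that this sketch treats every object as having the same size, which is precisely the assumption that makes plain LRU unsuitable for variable-sized web objects.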
B. Least-Frequently-Used (LFU)
In the case of Least-Frequently-Used (LFU) algorithm, the objects
with the least number of accesses are replaced. More precisely, LFU keeps the
more popular web objects and evicts the rarely used ones. However, the
drawback of the LFU algorithm is that objects with large reference counts
are never replaced, even if they are not requested again.
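The LFU policy can be sketched similarly (an illustrative toy implementation, not taken from the cited literature):

```python
class LFUCache:
    """Illustrative LFU cache: evicts the object with the fewest accesses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}
        self.freq = {}

    def get(self, key):
        if key not in self.store:
            return None
        self.freq[key] += 1
        return self.store[key]

    def put(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            victim = min(self.freq, key=self.freq.get)  # least frequently used
            del self.store[victim]
            del self.freq[victim]
        self.store[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1
```

The drawback noted above is visible here: an object that accumulated a large count early keeps winning the minimum-frequency comparison and is never evicted, even if it is no longer requested.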
C. SIZE Policy
The SIZE policy is one of the common web caching approaches;
it replaces the largest object(s) in the cache when space is needed for a new
object. It achieves a high cache hit ratio. The drawback of this approach is
that the cache is often filled with small objects that are mostly never
accessed again, so the scheme has a low byte hit ratio.
D. Greedy-Dual-Size (GDS) Policy
Cao & Irani (1997) have suggested Greedy-Dual-Size (GDS) policy
as an extension to the SIZE policy. The algorithm integrates several factors
and assigns a key value or priority for each web object stored in the cache.
When the cache space becomes full and a new object is required to be stored
in cache, the object with the lowest key value is removed. They have proved
that the GDS algorithm achieved better performance compared with other
traditional caching algorithms. When a user requests an object p, the GDS
algorithm assigns the key value K(p) of object p as shown in Equation 2.1:

K(p) = L + C(p) / S(p)                                        (2.1)
where C(p) is the cost of fetching object p from server into the cache, S(p) is
the size of object p; and L is an aging factor. L starts at 0 and is updated to the
key value of the last replaced object. The key value K(p) of object p is
recomputed with the current L value whenever p is accessed again. Thus,
larger key values are assigned to objects that have been visited recently. The
major drawback of the GDS algorithm is that it ignores the usage frequency
of web object.
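The GDS bookkeeping described above (key assignment, eviction of the lowest key, and the aging factor L) can be sketched as follows; this is an illustrative reading of the policy, not Cao & Irani's implementation, and the class and method names are our own:

```python
class GDSCache:
    """Illustrative Greedy-Dual-Size cache: K(p) = L + C(p)/S(p)."""

    def __init__(self, capacity):
        self.capacity = capacity   # total cache size in bytes
        self.used = 0
        self.L = 0.0               # aging factor: key of the last evicted object
        self.objects = {}          # url -> (key, cost, size)

    def access(self, url, cost, size):
        if url in self.objects:
            _, c, s = self.objects[url]
            # On a repeated access, recompute the key with the current L
            self.objects[url] = (self.L + c / s, c, s)
            return
        # Make room for the new object by evicting the lowest-key objects
        while self.objects and self.used + size > self.capacity:
            victim = min(self.objects, key=lambda u: self.objects[u][0])
            self.L = self.objects[victim][0]    # age the cache
            self.used -= self.objects[victim][2]
            del self.objects[victim]
        self.objects[url] = (self.L + cost / size, cost, size)
        self.used += size
```

For example, with a 100-byte cache holding two 50-byte objects, inserting a third object evicts the one with the lowest key and raises L to that key, so long-resident objects gradually lose their advantage over newly fetched ones.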
E. Greedy-Dual-Size-Frequency (GDSF)
Cherkasova (1998) has developed greedy dual size frequency
algorithm as an enhancement to the GDS algorithm. The design integrates a
frequency factor into the key value K(p) as shown in Equation 2.2:

K(p) = L + F(p) * C(p) / S(p)                                 (2.2)
where F(p) is the frequency of visits to object p. When p is first requested
by a user, F(p) is initialized to 1; if p is already in the cache, its
frequency is increased by one. The drawback of the GDSF algorithm is that it
does not take into account predicted future accesses.
2.2.2 Studies Based on Web Caching
Ali et al (2012) have utilized a new approach that incorporates
machine learning techniques, such as support vector machine (SVM) and
decision tree (C4.5) classifiers, into conventional web proxy caching
techniques such as Least-Recently-Used (LRU), Greedy-Dual-Size (GDS) and
Greedy-Dual-Size-Frequency (GDSF). As a result, three intelligent caching
approaches were used, and are known as SVM LRU, SVM GDSF and C4.5
GDS. The intelligent web proxy caching approaches were employed for
making cache replacement decisions. More specifically, the conventional
web proxy caching approaches were extended using machine learning to
enable the algorithms to adapt intelligently over time.
Ali et al (2012) have effectively combined the factors such as
recency, frequency, size, access latency and type of object using an
intelligent classifier to predict whether objects will be requested again in
the future.
Then, this information has been effectively incorporated into the traditional
web proxy caching algorithms to present novel intelligent web proxy caching
approaches with good performance in terms of hit ratio (HR) and byte hit
ratio (BHR).
Ali et al (2012) have also benchmarked the SVM and C4.5
classifiers by comparing with both Back-Propagation Neural Network
(BPNN) and the Adaptive Neuro-Fuzzy Inference System (ANFIS). They have evaluated
the intelligent approaches by employing a trace-driven simulator to meet the
requirement of proxy caching approaches. The results of the simulation were
compared with other relevant web proxy caching polices. Experimental
results have revealed that the SVM LRU, SVM GDSF and C4.5 GDS
significantly improve the performances of LRU, GDSF and GDS
respectively. Furthermore, C4.5 GDS achieved the best HR among all
algorithms across the proxy datasets. Finally, SVM LRU achieves the best
BHR among all algorithms across the proxy datasets. Lastly, SVM GDSF
achieves the best balance between HR and BHR among all algorithms across
the proxy datasets.
One limitation of the approaches of Ali et al (2012) is that
the classifiers have to be trained continuously to ensure effective web
caching. Another limitation is the computational overhead of preparing
the target outputs in the training phase when looking ahead to future
requests. A good alternative could be the use of clustering algorithms
to enhance the performance of web caching policies, since clustering
algorithms do not need any preparation of target outputs.
Yates et al (2007) have studied the trade-offs in designing efficient
caching systems for web search engines. They have explored the impact of
different caching approaches, such as static vs. dynamic caching and
caching query results vs. caching posting lists. The data for the study
consisted of a crawl of documents from the UK domain and query logs
covering one year of
queries. Yates et al (2007) have demonstrated that caching posting lists can
achieve higher hit rates than caching query answers and also suggested an
algorithm for static caching of posting lists. One of the major contributions of
Yates et al (2007) includes the design of an optimal way to split the static
cache between answers and posting lists. They have also measured the impact
of changes in the query log on the effectiveness of static caching. Yates et al
(2007) compared the performance of their algorithm with that of static
caching algorithms as well as dynamic algorithms such as LRU and LFU and
confirmed that the new algorithm outperforms others in terms of hit rate
values.
According to Wong (2006), replacement decisions in web caching
can be affected by various factors: recency, which measures the time since
the last reference to the object; frequency of requests to an object; size
of the web object; cost of fetching the object; and access latency of the
object. Web cache replacement policies are classified on the basis of these
factors into five categories, namely recency-based, frequency-based,
size-based, function-based and randomized policies.
Web caches have been considered effective tools for accessing
web pages with lower latency and are perceived as an important mechanism
for reducing bandwidth usage. However, with the explosive growth in
web-related technologies and user interactions, traditional web caching
systems have needed several modifications. Several disadvantages of web
caching systems have been reported in the literature: caching offers slower
performance when the resource is not found in the cache; caches often hold a
stale copy of a resource and supply it to the user even when an updated copy
is needed; and cache misses sometimes occur because of data losses.
2.3 PREFETCHING
Web caching has been successful in reducing the network and I/O
bandwidth consumption, but still suffers from low hit rates, stale data and
inefficient resource management (Sharma & Goel 2010). Moreover, caching
cannot prove beneficial if the web pages were not visited in the past. Hence,
web prefetching techniques were suggested to overcome this limitation of
web caching mechanisms by fetching content before it is actually
requested by the user. Web prefetching predicts the web objects
which are expected to be requested by the users in the near future, though
these objects are not yet requested by the users. The predicted objects are
usually fetched from the origin server and stored in a cache. In a way, the web
prefetching helps in increasing the cache hits and reducing the user-perceived
latency (Ali et al 2011). Thus prefetching in conjunction with caching can
cater to the needs of WWW users in more than one respect. Research interest
in prefetching has increased in recent years. The following section
discusses the literature related to prefetching, with a special focus on the
present work.
2.3.1 Types of Web Prefetching
Based on location, prefetching techniques can be implemented
on the client side, the server side, or the proxy side (Zhijie et al 2009).
The difference between the various prefetching techniques lies in the
navigation patterns they consider. Client-based prefetching concentrates
on a single user across many web servers. Server-based prefetching
concentrates on all users accessing a single website. Proxy-based
prefetching concentrates on the navigation patterns of a group of users
across many web servers (Ali et al 2011).
2.3.2 Approaches to Web Prefetching
The existing prefetching algorithms can be broadly classified into
two main categories according to the data used for prediction:
content-based prefetching and history-based prefetching.
A. Content-Based Prefetching
The content-based prefetching approach predicts the future user
requests depending on the analysis of web page contents to find Hyper Text
Markup Language (HTML) links that are likely to be followed by the clients.
Some content-based prefetching approaches use an Artificial Neural Network
(ANN) to predict future requests based on keywords in the anchor text of
URLs. The keywords extracted from web documents are given as inputs to the
ANN, which predicts whether a URL needs to be prefetched or not. The major
drawback of content-based prefetching techniques is the high load of parsing
every web page served, and hence they are not recommended for implementation
on the server side (Domenech et al 2010).
B. History-Based Prefetching
The history-based prefetching category predicts future user requests
depending on the observed page access behavior in the past. The algorithms
of this category can be classified into four approaches: approach based on
Dependency Graph (DG), approach based on Markov model, approach based
on cost function and approach based on data mining (Zhijie et al 2009).
2.3.3 Studies on Web Prefetching Techniques
Padmanabhan & Mogul (1996) have presented a dependency graph
algorithm for predicting the next page. The nodes of the dependency graph
represent web pages, while an arc indicates that the target node is accessed
after the source node within a sliding window. The weight assigned to an
arc represents the probability that the target node will be the next
one accessed. The dependency graph based prefetching approach predicts and
prefetches the nodes whose arcs connect to the currently accessed node and
have weights higher than a threshold. Although web prefetching based
on DG can help reduce latency, it increases network traffic.
Another drawback of the DG approach is its low prediction accuracy,
because it examines only pairwise dependencies between two web pages.
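The graph construction and threshold-based prediction can be sketched as follows (an illustrative reading of the DG approach; the function names, session format and threshold value are our own assumptions):

```python
from collections import defaultdict

def build_dependency_graph(sessions, window=2):
    """Arc weight(a -> b) = count(b seen within `window` accesses
    after a) / count(a), estimated from a list of page-access sessions."""
    follows = defaultdict(lambda: defaultdict(int))
    counts = defaultdict(int)
    for session in sessions:
        for i, page in enumerate(session):
            counts[page] += 1
            for nxt in session[i + 1 : i + 1 + window]:
                follows[page][nxt] += 1
    return {a: {b: n / counts[a] for b, n in succ.items()}
            for a, succ in follows.items()}

def predict(graph, current, threshold=0.5):
    """Prefetch candidates: arcs out of `current` above the threshold."""
    return [b for b, w in graph.get(current, {}).items() if w >= threshold]
```

The pairwise-only view of the model is visible here: a weight depends solely on the (a, b) page pair, never on the longer path the user took to reach a, which is the accuracy limitation noted above.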
Lan et al (2002) have developed an Apriori-based mining method to
deduce a rule table for predicting and prefetching the highest-ranked
documents into a proxy buffer. However, too many rules are produced and
maintained in the rule table, which increases complexity.
Xu et al (2004) have used a keyword-based semantic prefetching
approach that predicts future requests based on semantic preferences of
past retrieved objects, rather than on the temporal relationships between web
objects. More precisely, the semantic prefetching techniques are used to
capture the client surfing interest from their past access patterns and predict
future preferences from a list of possible objects when a new web site is
visited. Xu et al (2004) employed ANN to predict future requests depending
on keywords in anchor text of URL. The keywords extracted from web
documents were given as inputs to ANN that predict whether the URL needs
to be prefetched or not.
Domenech et al (2010) have stressed that the structure of current
web pages has to be taken into account when predicting web objects in order
to reduce user-perceived latency in prefetching. They note that an HTML
object typically has a noticeable number of embedded objects. As a result,
they propose a Double Dependency Graph (DDG) algorithm that considers the
characteristics of current web sites by distinguishing between container
objects and embedded objects to create a new prediction model. Their model
achieves better latency reduction while decreasing the need for resources
such as extra bandwidth and extra server load.
2.3.4 Clustering Based Prefetching
Pallis et al (2008) have introduced a clustering-based prefetching
scheme which efficiently integrates caching and prefetching approaches to
improve the performance of the web infrastructure. The main advantage of
adopting prefetching policies at a proxy cache server is that the web
content can be managed effectively by exploiting the temporal as well as the
spatial locality of objects (Pallis et al 2008). In this scheme, user
requests were represented using a Web Navigational Graph (WNG). More
specifically, a graph-based clustering algorithm has been used to identify
clusters of web pages based on the users' access patterns. This scheme can
be easily integrated into a web proxy server, so that its performance can be
improved.
A clustering-based prefetching scheme, according to Pallis et al
(2008), can be efficiently employed to identify clusters of web pages
correlated by users' access patterns. The web pages may belong to
different web sites. Upon each user request, the relevant clusters were
selected and fetched by the proxy cache. In addition, a cache replacement
policy was used by the proxy servers to manage their content. Pallis et al (2008)
have introduced two algorithms, viz. the clustWeb algorithm for clustering
inter-site web pages and the clustPref algorithm, which provides a
clustering-based short-term prefetching scheme.
The algorithms were simulated and experimented using the web
cache traces provided by a proxy cache server (Squid). In order to evaluate
the scheme, Pallis et al (2008) used two performance metrics, namely Hit Rate
(HR) and Byte Hit Rate (BHR). The simulation results showed that the
integrated framework was robust and effective in improving the performance
of the web caching environment.
Pallis et al (2008) have stated that the application of clustWeb
algorithm can be extended to areas like discovering usage patterns and
profiles, detecting copyright violations, and reporting search results.
Similarly, the efficiency of the pre-fetching scheme can be compared with
other clustering algorithms.
Sharma et al (2009) have introduced a clustering approach based on
rough set clustering, which was used to form clusters of sessions. In this
approach, data acquired from past experience was classified as uncertain,
imprecise or incomplete information. Using rough set clustering, only the
meaningful sessions, in which users spend their quality time, are retained.
The authors have developed an RST algorithm based on the concept of rough
sets that calculates the equivalence between objects and then finds the
lower and upper approximations. The lower approximation is the union of all
equivalence classes that are contained in the target set, which is generally
specified by the user. The upper approximation is the union of all
equivalence classes that have a non-empty intersection with the target set.
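The lower and upper approximations described above follow directly from an equivalence partition; the following is a small illustrative sketch (the function name and data layout are our own):

```python
def approximations(partition, target):
    """Lower/upper approximation of `target` with respect to a partition
    of the universe into equivalence classes (a list of sets)."""
    lower, upper = set(), set()
    for block in partition:
        if block <= target:   # class entirely contained in the target set
            lower |= block
        if block & target:    # class overlapping the target set
            upper |= block
    return lower, upper
```

The gap between the two sets (the boundary region) is what rough set theory uses to represent the uncertain or imprecise sessions mentioned above.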
Sharma et al (2009) have also developed the Prediction Prefetching
Engine (PPE), which can reside at a proxy server. The function of the
PPE is to match user requests for web pages against the existing rough set
clusters and then decide whether to prefetch a page or not. One advantage
of RST clustering is that it feeds only meaningful sessions of the web
log to the rule-generator phase of the PPE; hence the complexity of the PPE
is also reduced.
Ahmad et al (2011) have presented an optimized predictive
prefetching technique based on clustering. Clusters of similar pages are
created from the web log data file, and a prediction algorithm is then
applied to these clusters. The authors have optimized the prediction by
taking into account the frequency of each predicted cluster to calculate
the percentage of each web object. The frequency of each page's usage in
each cluster can be determined using association rules. Ahmad et al (2011)
also compared their results with the existing technique of Yang et al (2001).
Overall performance was calculated by summing the percentage of each web
object, and the results showed a higher probability of prediction accuracy.
Sathiyamoorthi & Bhaskaran (2012) have developed a clustering
approach based on a modified ART1 neural network to prefetch web pages
into the proxy cache. The modified ART1 clustering algorithm groups users
based on their web access patterns. The advantage of the modified
ART1 algorithm is that it adapts to changes in users' web access patterns
over time without losing information about their past web access patterns.
Sathiyamoorthi & Bhaskaran (2012) have conducted several
experiments to empirically compare the modified ART1 based clustering
approach with the existing ART1 based pre-fetching technique. The metric
used in the comparison was the average inter- and intra-cluster distance for
all the data sets. Overall, the experimental results indicated that the
cache hit rate increases with their approach, and hence user-perceived
latencies were reduced to a great extent.
Poornalatha & Raghavendra (2012) have suggested a new approach
that integrates the work of Hay et al (2004) on clustering with a
distance-measure technique and that of Poornalatha & Raghavendra (2011a) on
sequence alignment for computing similarities between clusters. In that
model, user sessions were first created from the web access logs based on
factors such as IP address, date and time. A modified k-means
algorithm (Poornalatha & Raghavendra 2011b) was used to cluster the
sessions. When a user requests a web page, the cluster nearest to the
requested page is determined by measuring the distance to all cluster
centers, and the next page in that cluster is retrieved. In addition, the
number of sessions in which this next page follows the requested page
within the cluster is counted. Based on frequency, the top n pages are
selected for the prediction list (Poornalatha & Raghavendra 2012).
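The prediction step of such a model can be sketched roughly as follows (an illustrative simplification; the data layout, the distance function and all names are our own assumptions, not the authors' implementation):

```python
from collections import Counter

def predict_next_pages(request, clusters, distance, top_n=3):
    """Find the cluster whose center is nearest to the requested page,
    then rank the pages that follow `request` in that cluster's sessions
    by frequency and return the top n. `clusters` is a list of
    (center, sessions) pairs; `distance` is a caller-supplied metric."""
    _, sessions = min(clusters, key=lambda c: distance(request, c[0]))
    followers = Counter()
    for session in sessions:
        for i, page in enumerate(session[:-1]):
            if page == request:
                followers[session[i + 1]] += 1
    return [page for page, _ in followers.most_common(top_n)]
```

Counting how often each page follows the request within the nearest cluster mirrors the frequency-based selection of the top n pages described above.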
Ali et al (2011) have highlighted the major drawback of
prefetching-enhanced systems by noting that some prefetched objects might
never actually be requested by the users. In such cases, the prefetching
scheme eventually increases the network traffic as well as the load on the
web server; in addition, cache space is not used optimally. Hence they
suggested that a prefetching approach should be designed carefully in order
to overcome these limitations.
2.4 INTEGRATING WEB CACHING AND PREFETCHING
TECHNIQUES
The performance of the WWW system can be significantly
improved by integrating web proxy caching and prefetching techniques in a
proper manner. The integration can enhance the performance of web by
exploiting the temporal locality in the web proxy caching and spatial locality
in the prefetching of the web objects. In addition, the combination of the
caching and the prefetching helps in improving hit ratio and reducing the
user-perceived latency. Basically, web prefetching requires two steps:
anticipating the future pages of users and preloading them into a cache.
This means that web prefetching also involves caching. However, web
caching and prefetching were addressed separately by many researchers in
the past, and it is important to consider the impact of the two techniques
combined. Few studies have discussed integrating web caching and web
prefetching together.
One of the earliest studies on integrating web caching and
prefetching was carried out by Kroeger et al (1997). Interestingly, they
studied the effect of combining caching and prefetching on end user latency.
Finally, they concluded that the combination of web caching and prefetching
can potentially reduce latency by up to 60%, whereas web caching alone
reduces it by only up to 26%.
Yang et al (2001) have advocated mining the web logs to obtain the
web-document access patterns. These patterns were further used to extend the
GDSF caching policies and prefetching policies.
Teng et al (2005) have suggested a cache replacement algorithm for
Integrating Web Caching and web Prefetching in client-side proxies (IWCP).
Using a normalized profit function, they evaluated the profit from caching an
object according to some prefetching rule.
Ibrahim & Xu (2004) and Acharjee (2006) have used ANNs in both the
prefetching policy and the web cache removal decision. This approach depends
on the keywords of URL anchor text to predict the user's future requests. The
most significant factors (recency and frequency) were ignored in the web
cache replacement decision. Moreover, since the keywords extracted from web
documents are given as inputs to the ANN, applying an ANN in this way may
cause extra overhead on the server.
Balamash et al (2007) have analyzed the effects of integrating
web caching and prefetching techniques and developed a mathematical model
to establish the conditions under which prefetching reduces the average
response time of a requested document. The model accommodates both
passive client and proxy caching along with prefetching. They have
contended that prefetching never degrades the effectiveness of passive
caching and advocated that both can coexist in the same system. As an
outcome of the analysis, Balamash et al (2007) developed an expression for
the prefetching threshold that can be set dynamically to optimize the
effectiveness of prefetching. They also introduced a prefetching protocol
based on the analytical results for optimizing the prefetching gain. A number
of investigations were made to study the effect of the caching system on the
effectiveness of prefetching.
The study observed that the high variability in web file sizes
limits the effectiveness of prefetching. One of the assumptions of Balamash
et al (2007) is that each client runs one browsing session at a time.
Although the one-session assumption is generally acceptable for clients
with low-bandwidth connections, further work is necessary to study the
impact of clients running multiple sessions over high-bandwidth connections.
Jin et al (2007) have suggested a set of algorithms for integrating
web caching and prefetching for wireless local area networks. As a part of the
work, they have developed a sequence mining based prediction algorithm,
context-aware prefetching algorithm and profit-driven caching replacement
policy.
Sulaiman et al (2009) have developed a framework for combining
web caching and prefetching in a mobile environment. Using a combination
of an Artificial Neural Network (ANN) and Particle Swarm Optimization (PSO)
for classifying web objects, they developed a hybrid technique (Rough
Neuro-PSO). A rough set technique was then used to generate rules from log
data on the proxy server. On the prefetching side, an XML-based prefetching
approach was suggested for implementation on mobile devices to handle
communication between client and server.
2.5 COOPERATIVE CACHING
The major drawback of single-proxy systems is their low hit rate,
which can be improved through coordination among the caches. Hence
cooperative web caching was developed as an advanced caching scheme to
achieve effective and efficient cooperation among caches. The main goals of
the cooperative caching mechanisms are to reduce the load on the server and
to improve client-perceived latencies. According to Anderson et al (1996),
sharing and coordination of cache state among multiple communicating
caches using cooperative caching, has been shown to improve the
performance of file and virtual memory systems in a high-speed, local-area
network environment.
2.5.1 Cooperative Caching Mechanisms
A number of studies were conducted to determine an efficient
mechanism for cooperative caching in web applications. Some of the salient
features of the most common mechanisms in cooperative caching like
hierarchical caching, distributed caching and hybrid caching are discussed in
this section.
In hierarchical caching, there are three levels of caches: the
client, proxy and server levels. A proxy cache can be considered the parent
of some client caches, and a server cache the parent of some proxy caches.
A client is connected to one of the three levels of caches, and that cache
becomes the default cache for the client. When a request from the client is
not satisfied by the default cache, it is redirected to the parent cache,
which can in turn forward its unsatisfied requests to its own parent. If the
document is not found at any cache level, the upper-level proxy cache
contacts the origin server directly. When the document is found, either at a
cache or at the origin server, it travels down the hierarchy, and each of
the intermediate caches along its path decides whether a copy of the
document should be cached locally, based on the cache content update
algorithm used (Chankhunthod et al 1996).
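The hierarchical lookup and copy-down behaviour just described can be sketched as follows (an illustrative model; the class and variable names are our own, and the cache-content update decision is simplified to "always keep a copy"):

```python
class CacheNode:
    """One level in a cache hierarchy (e.g. client -> proxy -> server)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # next level up, or None for the top level
        self.store = {}

    def lookup(self, url, origin):
        if url in self.store:
            return self.store[url]                 # hit at this level
        if self.parent is not None:
            doc = self.parent.lookup(url, origin)  # redirect the miss upward
        else:
            doc = origin[url]    # top level contacts the origin server
        self.store[url] = doc    # cache a copy on the way back down
        return doc
```

After one lookup from the client node, the document resides at every level along the path, so subsequent requests from siblings of that client can be served by the shared proxy or server cache.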
In distributed caching, there are no intermediate cache levels.
Hence, whenever a miss is encountered, client caches rely on other
mechanisms to retrieve the missed document. Some of these mechanisms include
the Internet Cache Protocol (ICP), the Cache Array Routing Protocol (CARP),
Summary Cache and Cache Digest (Povey & Harrison 1997, Tewari et al 1999).
Finally, in the case of hybrid caching, caches may cooperate with
other caches at the same level or at a higher level using distributed
caching, so that the document is fetched from a parent or neighbor cache
that has the lowest round-trip time (RTT) (Rabinovich et al 1998).
2.5.2 Cooperative Caching Algorithms
Dahlin et al (2004) have reviewed the various cooperative caching
algorithms and presented the results of the comparison of four algorithms
namely Direct Client Cooperation, Greedy Forwarding, Centrally Coordinated
Caching and N-Chance Forwarding.
In Direct Client Cooperation, each active client uses the memory
of idle machines as a private remote cache. When an active client's local
cache overflows, it forwards cache entries directly to an idle machine. The
active client can then access the private remote cache of the idle machine
to fulfill its read requests until the remote machine becomes active again
and evicts the cooperative cache (Dahlin et al 2004).
In the Greedy Forwarding approach, the cache memories of all
clients in the system are considered a global resource that can be accessed
to fulfill any client's request. However, this approach lacks coordination
among the contents of the cache memories, which ultimately causes
unnecessary data duplication (Dahlin et al 2004).
The Centrally Coordinated Caching scheme improves on the greedy
algorithm by adding coordination among the cache memories. Here, each
client's cache is divided into a locally managed section, controlled
greedily by that client, and a globally managed section, coordinated by the
server as an extension of its central cache (Dahlin et al 2004). Centrally
Coordinated Caching achieves a high global hit rate because of its global
management of the bulk of its memory resources. The main drawbacks are that
clients' local hit rates are reduced, since the local caches are effectively
made smaller, and that the central coordination may impose a significant
load on the server (Dahlin et al 2004).
In the case of N-Chance Forwarding, the fraction of each client's
cache managed cooperatively is adjusted dynamically depending on client
activity. In effect, the N-Chance algorithm modifies Greedy Forwarding by
having clients cooperate to preferentially cache singlets, which are blocks
stored in only one client cache. The advantage of N-Chance Forwarding is
that it provides a simple dynamic trade-off between each client's locally
cached data and the data being cached in the overall system. A major
disadvantage of the N-Chance Forwarding approach is that it produces
unnecessary system load, because a block may be bounced among multiple
caches while living in the cooperative portion of the caches (Dahlin et al 2004).
2.5.3 Studies Based on Cooperative Caching
Khalil & PeiQi (2007) assert that most of the existing
cooperative caching protocols did not examine how to select the proxy
server that would offer the best response time to a web client. They
suggested that selecting the fastest server can improve the file transfer
time compared with conventional cooperative proxy mechanisms, in which the
server is selected randomly. Hence, Khalil & PeiQi (2007) have developed a
technique to facilitate efficient server selection by dynamically measuring
the data transfer rates between proxy servers. The primary objective of the
study was to improve the proxy-to-proxy file transfer time by selecting the
fastest available proxy from a pool of cooperative proxies (Khalil & PeiQi 2007).
The performance of the Fastest Free Server (FFS) strategy was compared to the conventional Random Selection (RS) strategy. The results indicated that FFS offers better performance in terms of mean response time (Khalil & PeiQi 2007). They also studied the performance benefits of integrating an efficient server selection mechanism, and showed that the FFS strategy can be highly beneficial, as file transfer time improves significantly when it is chosen as the preferred method.
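The FFS selection rule can be outlined as below. This is a sketch under assumptions: the function name, the representation of measured rates and the busy set are illustrative stand-ins, not details from Khalil & PeiQi (2007).

```python
# Illustrative Fastest Free Server (FFS) selection: among the proxies that
# hold the requested object and are currently free, pick the one with the
# highest measured transfer rate; fall back to None if all are busy.
def select_proxy(candidates, transfer_rate, busy):
    """candidates    -- proxy ids holding the requested object
    transfer_rate -- dict: proxy id -> measured bytes/s toward this proxy
    busy          -- set of proxies currently serving other transfers"""
    free = [p for p in candidates if p not in busy]
    if not free:
        return None                       # e.g. wait, or fetch from origin
    return max(free, key=lambda p: transfer_rate.get(p, 0.0))
```

Random Selection, the baseline strategy, would instead pick uniformly from `candidates`, ignoring the measured rates.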
The work of Khalil & PeiQi (2007) can be further investigated by including real-life web traces to analyze the effectiveness of server selection strategies. Similarly, the efficiency of the FFS scheme has to be validated in a dynamic network environment with data from real cache networks.
Chen et al (2008) have introduced a hybrid, cooperative browser-level web caching system based on Chord. In that system, nodes can contact each other for web cache sharing. The advantage of the system is that when a miss occurs in the local web cache and the web cache server, the request is automatically sent to a remote node in the Chord ring or to its web cache server. Hence the effective workload of the server decreases and the hit ratio increases because of resource sharing between proxy server web caches (Chen et al 2008).
Initially, Chen et al (2008) designed the hybrid browser-level web caching system on top of the peer-to-peer Chord protocol. Then, using an improved resource searching algorithm, the system was extended to the local web cache and web cache server levels.
Chen et al (2008) evaluated the system through hit ratio simulations, comparing hits in the local web cache, hits in the web cache server and hits in the Chord web cache server for various web cache sizes. The results showed that the system achieved a better hit ratio in comparison with the other two levels.
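The miss-handling order described for this hybrid system can be sketched as below. The dictionary-based structures are simplified stand-ins made here for illustration; in particular, the linear scan over nodes replaces a real Chord key lookup.

```python
# Minimal sketch of the lookup hierarchy in the hybrid browser-level system:
# local browser cache first, then the local web cache server, then a remote
# node reachable through the Chord ring, and finally the origin server.
def lookup(url, local_cache, cache_server, chord_nodes):
    if url in local_cache:
        return "local", local_cache[url]
    if url in cache_server:
        return "server", cache_server[url]
    for node in chord_nodes:              # stand-in for a Chord key lookup
        if url in node:
            return "chord", node[url]
    return "origin", None                 # must fetch from the origin server
```

Each level that answers a request spares the origin server one fetch, which is why the hierarchy lowers server workload as the hit ratio rises.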
Baek et al (2009) have designed a new object management policy that can be applied in a hybrid architecture for cooperative caching. The policy provides for discarding web objects that are unlikely to be accessed by clients. The approach employs a predictive technique based on a table of rules derived from the actual history of web object requests. It also employs summary tables in each proxy cache to limit the number of executions of the expensive predictions.
In the object management policy of Baek et al (2009), each lower-level proxy cache holds a summary table containing its neighboring proxy caches' object information. The summary table is used to identify the location of a requested object when the object is not available in the local proxy cache. In addition, to boost the performance of the proxy cache, their solution limits the number of executions of Finite Inductive (FI) systems depending on the currently available space of the proxy cache. The object management model reduced the response time for requested objects by minimizing unnecessary traffic and bandwidth usage between the lower-level proxy caches and the upper-level proxy cache (Baek et al 2009).
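The role of the summary table in routing a miss can be sketched as follows. The function and field names are hypothetical; the sketch only captures the routing decision, not the FI prediction machinery.

```python
# Hedged sketch of summary-table routing: on a local miss, consult the
# table of neighbors' contents and fetch sideways from a peer proxy if
# possible; only escalate to the upper-level proxy cache as a last resort.
def handle_request(obj_id, local_store, summary_table):
    """local_store   -- set of object ids held by this proxy
    summary_table -- dict: object id -> neighboring proxy believed to hold it"""
    if obj_id in local_store:
        return ("hit", None)              # served locally
    neighbor = summary_table.get(obj_id)
    if neighbor is not None:
        return ("neighbor", neighbor)     # sideways fetch from a peer proxy
    return ("upper", None)                # escalate to the upper-level proxy
```

Serving misses sideways is what cuts the traffic between the lower-level and upper-level caches noted above.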
Wang et al (2013) have designed an intra-Autonomous System (AS) cache cooperation scheme to effectively control the redundancy level within an AS. The scheme enables neighboring nodes in an AS to make cooperative use of limited cache resources. The intra-AS cache cooperation scheme has been effective in solving caching issues in Content-Centric Networking (CCN). Since controlling the redundancy level is very important in improving AS-level caching performance, Wang et al (2013) introduced a greedy heuristic algorithm named CRE-P to eliminate redundancy in the CCN caching network. The performance evaluation consists of two parts. First, the efficiency of the greedy heuristic in yielding an approximate solution to CRE-P was evaluated. Then the core benefits brought by the intra-AS cache cooperation scheme were analyzed from different aspects. The results of trace-based simulation showed that the simple greedy heuristic is very efficient in eliminating redundancy.
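The flavor of greedy redundancy elimination can be outlined as below. This is an assumption-laden sketch in the spirit of, not a reproduction of, the CRE-P heuristic: the real algorithm's objective function and processing order are more involved.

```python
# Rough sketch of greedy horizontal redundancy elimination: scan the caches
# of an AS in a fixed order and keep only the first copy of each item,
# freeing the slots occupied by redundant copies for other popular items.
def eliminate_redundancy(caches):
    """caches: dict node id -> set of cached item ids (mutated in place).
    Returns the number of cache slots freed."""
    seen = set()
    freed = 0
    for node in sorted(caches):           # deterministic greedy order
        keep = set()
        for item in sorted(caches[node]):
            if item in seen:
                freed += 1                # redundant copy: release the slot
            else:
                seen.add(item)
                keep.add(item)
        caches[node] = keep
    return freed
```

The freed slots correspond to the first benefit reported below: capacity reclaimed for caching other popular items.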
According to Wang et al (2013), the intra-AS cache cooperation scheme offered two benefits. First, caching slots released by redundancy elimination could be used to cache other popular items. Second, the scheme provided a broader view of the items cached at neighboring nodes, which helped in serving locally unsatisfied requests.
Wang et al (2013) evaluated the algorithms on network topologies derived from various sources: AS 1755, AS 3967, Brite1 and Brite2. The router-level topologies AS 1755 and AS 3967 were taken from the Rocketfuel project (Spring et al 2004), while Brite1 and Brite2 were generated with the hierarchical top-down topology generator BRITE. The performance of the intra-AS cache cooperation scheme was compared with the ubiquitous LRU scheme under varying cache sizes. The simulation results showed that the intra-AS cache cooperation scheme performs better in terms of hit rate and bandwidth saving across all four topologies. The scheme also improves the caching performance of access routers and reduces AS cross-traffic without overloading the internal links. Finally, the scheme could be further improved by combining a vertical redundancy elimination approach with the horizontal cache cooperation scheme.
Liu et al (2013) have designed a cooperative caching scheme for Content Oriented Networking (CON) with the intention of minimizing the content access delay for mobile users. They formulated the caching problem as a mixed integer programming model and proposed a heuristic solution based on Lagrangian relaxation. Simulation results show that the scheme can greatly reduce content access delay.
Liu et al (2013) compared the performance of their approach with the LRU (removing the least recently used content in the cache-router) and FIFO (removing the content cached first in the cache-router) schemes in the same environment. The primary evaluation metric was the average content access delay, measured while varying the cache size of the cache-router, the number of users and the movement speed. The results showed that the suggested scheme significantly improved delay performance compared to existing algorithms such as LRU and FIFO under various cache sizes, numbers of users and speeds.
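The two baseline policies used for comparison, LRU and FIFO, differ only in whether a hit refreshes a content item's position in the eviction order. A compact sketch (the class and method names are illustrative, not from Liu et al 2013):

```python
# LRU evicts the least recently used content; FIFO evicts the content that
# was cached earliest. An OrderedDict's insertion order tracks age, and a
# hit refreshes that order only under LRU.
from collections import OrderedDict

class CacheRouter:
    def __init__(self, capacity, policy="LRU"):
        self.capacity, self.policy = capacity, policy
        self.store = OrderedDict()          # front of the dict = oldest entry

    def access(self, content):
        if content in self.store:
            if self.policy == "LRU":        # refresh recency on a hit
                self.store.move_to_end(content)
            return "hit"
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the oldest entry
        self.store[content] = True
        return "miss"
```

Under FIFO a popular item is still evicted once it reaches the front of the queue, which is one reason such baselines fare poorly on access delay for mobile users.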
The scheme of Liu et al (2013) can be further developed by including detailed analytical models. In addition, the scheme has to be validated by measuring various metrics to analyze access parameters such as the cache size and the maximum number of users that still ensure an acceptable delay. Similarly, path prediction algorithms that detect regular user movement patterns could be employed to improve caching decisions.
Nikolaou et al (2013) have evaluated the efficiency and cost of different placement strategies for a distributed cache implemented on the clients of an online social network or web service, and introduced a novel cache placement strategy that leverages the relationships between clients. In the developed model, the service maintains a directory that tracks the location of content objects. In addition, the service informs requesting clients of the location of the directory, so that clients can cache, serve, and push content based on the directives provided by the service (Nikolaou et al 2013). The model was compared against three other placement strategies, namely a minimalistic scheme, an opportunistic scheme and a popularity-based algorithm. The metrics employed for the evaluation were the local cache hit ratio, the global hit ratio and the local outgoing bandwidth. Overall, the simulation results revealed that the client relationship based strategy performed favourably on these metrics.
Nikolaou et al (2013) have suggested that the model can be extended further to analyze whether social proximity is detrimental to latency when compared to schemes that rely on geographic locality. They also suggested that further research is necessary to study the possibility of designing a hybrid strategy that combines the best features of the proactive and popularity-based schemes.
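The directory idea at the heart of this model can be sketched as below. The class, its methods and the tie-breaking rule are assumptions made here for illustration, not the authors' design.

```python
# Simplified sketch of a service-maintained content directory: it records
# which clients currently cache each object, so a requesting client can be
# pointed at a peer instead of the service itself.
class ContentDirectory:
    def __init__(self):
        self.locations = {}               # object id -> set of client ids

    def register(self, obj_id, client):
        """Record that `client` now caches `obj_id`."""
        self.locations.setdefault(obj_id, set()).add(client)

    def locate(self, obj_id):
        """Return some client holding obj_id, or None if the object
        must be fetched from the service itself."""
        holders = self.locations.get(obj_id)
        return min(holders) if holders else None   # deterministic pick
```

A placement strategy then decides which clients get registered for which objects; the relationship-based strategy biases that choice toward related clients.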
Taghizadeh et al (2013) have suggested a novel cooperative caching scheme which can be used to minimize the electronic content provisioning cost in Social Wireless Networks (SWNET). Content provisioning cost relies on the service and pricing dependencies among various stakeholders such as content providers (CP), network service providers, and End Consumers (EC). Drawing on the practicalities of the content delivery business, Taghizadeh et al (2013) developed practical network, service, and pricing models which were used for creating object caching strategies with homogeneous and heterogeneous object demands. They analyzed the caching strategies using analytical and simulation models in the presence of selfish nodes that deviate from network-wide cost-optimal policies. They also showed that selfishness can increase user rebate only when the number of selfish nodes in an SWNET is below a critical number.
John et al (2013) have designed A Proxy Agent for Client Systems (APACS), which acts as an intermediate system to control users' access to any webpage or website. APACS enhances client-server communication by adding network features and internet capabilities while taking into account the safety concerns of the networking environment. The client systems were installed with APACS, and users then had to be authorized to gain access to the Internet. APACS also offered users a built-in browser with limited options. The administrator has the privilege of restricting any website for a particular user or a group of users. Thus data usage was controlled and the performance of the system could be improved. John et al (2013) described APACS as a system that works between a client system and a server, where the client system includes a built-in web browser.
John et al (2013) evaluated the performance of APACS by comparing it with other popular web browsers. The results of the comparison showed that APACS performed better than several browsers when features such as speed and success rate were considered. A major advantage of APACS is that it can act as an internet administration tool for Windows OS.
2.6 SUMMARY
This chapter has focused on the important work carried out by various researchers on web caching and prefetching techniques, with particular attention to cooperative web caching systems. A thorough review of the literature was performed to identify the techniques, classifications, algorithms and research works on cooperative web caching systems.
From the survey, it is identified that machine learning algorithms find only limited application in information retrieval. A training data set admitted to the cache without any admission control will admit redundant data for further processing. For pattern identification, intelligent algorithms such as NN, AI and GA are complex in nature and computationally very expensive in making caching decisions. Traditional cache management policies are not suitable for web caching systems, and existing distributed systems do not share their browser objects. The Chord network system is attractive because of the way it allows nodes to share their contents within an authenticated group. Hence, based on the survey, an integrated cluster-based proxy cache with a hybrid cache management system is developed for information sharing, where information sharing is achieved by creating a Chord network within a group.
The function of Access Log Manager (ALM), which is the first
module of the integrated system, is described in the next chapter.