Nsd Note Final
vidula-nandasena -
8/8/2019 Nsd Note Final
NSD Final Short Note
LDAP
• Directories
  o What is a directory?
    ▪ A specialized database
    ▪ DBMS vs directory
      • Directories are optimized for read operations
      • Directories cannot facilitate complex relationships
    ▪ Holds attribute-value pairs
    ▪ Offline directories are static
    ▪ Online directories are
      • Dynamic
      • Flexible
      • Personalized
      • Secure
  o Factors to consider when using a directory
    ▪ Size of files
    ▪ Type of information stored
    ▪ Read : write ratio
    ▪ Search capability
    ▪ Standards-based access
• LDAP
  o Lightweight Directory Access Protocol
    ▪ Alternative to DAP (which was complex)
    ▪ Uses the TCP/IP stack (DAP used OSI)
  o Originally used to access and update information in directories built on the X.500 model. Early adopters had found DAP unsuitable for desktop PCs.
  o LDAP models
    ▪ Information
      • Structure of the information stored in the directory
      • Schemas
        o Define which object classes are allowed
        o Where they are stored
        o What attributes they have
        o Which attributes are optional
        o Syntax of each attribute
      • The LDAP schema must be readable by the client
    ▪ Naming
      • How information is organized and identified
        o Directory information tree (DIT)
        o The name of an LDAP entry is formed by connecting the names of its parent entries back to the root
        o Distinguished names (DN) and relative distinguished names (RDN)
    ▪ Functional operations
      • What operations can be performed on the LDAP directory
      • Protocol operations
        o Authentication/control
          ▪ BIND and UNBIND
          ▪ ABANDON
        o Interrogation
          ▪ Search
          ▪ Compare
        o Update (each operation is atomic: it is applied completely or not at all)
          ▪ Add entry
          ▪ Delete entry
          ▪ Modify entry
          ▪ Modify DN/RDN
      • Client-server interaction
        o Client BINDs
        o Client performs operations
        o Client UNBINDs
    ▪ Security
      • How the information can be protected from unauthorized access
      • LDAP provides encryption, authentication and access control
      • LDAP can operate without authentication
      • LDAP can operate with simple authentication
        o DN and password provided
        o Clear text encoded with Base64
      • LDAP can operate with simple authentication over SSL/TLS
      • LDAP can operate with the Simple Authentication and Security Layer (SASL)
        o Parameters: DN, mechanism and credentials
        o Provides cross-protocol authentication calls
        o Encryption can be optionally negotiated
  o LDIF (LDAP Data Interchange Format)
    ▪ Standard way of representing directory data in textual format
    ▪ Used when importing/exporting data to and from a directory server
    ▪ ASCII (no Unicode until LDAP v3)
  o Schema design
    ▪ Schema design determines what data can be stored in the directory
    ▪ Purpose:
      • Ensure that poorly designed applications do not store redundant data in the directory
      • Can be used to impose size/range/format constraints on data
    ▪ LDAP schemas consist of
      • Attribute types
      • Attribute syntaxes
      • Matching rules
      • Object classes
    ▪ Attributes
      • Directory entries contain a collection of attribute types and values
      • Attribute types hold specific elements such as names or telephone numbers
      • The definition of an attribute type contains the following
        o Unique name
        o Unique object ID
        o Textual description
        o Indication of whether the attribute is single- or multi-valued
        o Associated syntax and matching rules
        o Attribute usage indicator
        o Range and size restrictions on values stored in the attribute
        o Indication of whether it can be modified by regular applications
      • Attribute syntaxes and matching rules
        o Each attribute has an associated syntax which specifies exactly how each value is represented and how comparisons are made
        o The rules for making attribute value comparisons are called matching rules
    ▪ Object classes
      • Object classes are used to group related attributes together
        o Each directory entry belongs to one or more object classes
        o The names of the object classes an entry belongs to are always listed in a special multi-valued attribute called objectclass
      • The set of object classes associated with an entry fulfills the following needs
        o What type of data MUST be in the object
        o What type of data CAN be in the object
        o Provides a convenient way for a client to retrieve a subset of entries from a search operation
      • An object class holds the following information
        o Unique name
        o Unique object ID
        o Textual description
        o Set of mandatory attribute types
        o Set of allowed attribute types
        o A kind (abstract, auxiliary, structural)
      • Structural classes
        o Describe the basic aspects of an object
        o All entries should belong to exactly one structural object class
      • Auxiliary classes
        o Add a set of attributes to an entry that already belongs to a structural class (think of multiple inheritance, but only with required attributes)
      • Abstract classes
        o Only used as ancestors of derived classes
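The MUST/CAN rules and the "exactly one structural class" rule above can be sketched as a small validator. This is an illustration only: the `ObjectClass` type, the two class definitions in `SCHEMA` and the attribute names are simplified assumptions rather than a real server schema, and attribute names are compared case-sensitively for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    name: str
    kind: str                      # "structural", "auxiliary" or "abstract"
    must: frozenset = frozenset()  # mandatory attribute types
    may: frozenset = frozenset()   # allowed (optional) attribute types


# Hypothetical, simplified schema used only for this example.
SCHEMA = {
    "person": ObjectClass("person", "structural",
                          must=frozenset({"cn", "sn"}),
                          may=frozenset({"telephoneNumber"})),
    "mailUser": ObjectClass("mailUser", "auxiliary",
                            must=frozenset({"mail"})),
}


def validate_entry(entry: dict, schema: dict) -> list:
    """Return a list of schema violations; an empty list means the entry is valid."""
    problems = []
    classes = []
    for name in entry.get("objectClass", []):
        if name in schema:
            classes.append(schema[name])
        else:
            problems.append(f"unknown object class: {name}")

    structural = [c.name for c in classes if c.kind == "structural"]
    if len(structural) != 1:
        problems.append(
            f"expected exactly one structural class, found {len(structural)}")

    required = set().union(*(c.must for c in classes))
    allowed = required.union(*(c.may for c in classes)) | {"objectClass"}
    present = set(entry)

    for attr in sorted(required - present):
        problems.append(f"missing mandatory attribute: {attr}")
    for attr in sorted(present - allowed):
        problems.append(f"attribute not allowed: {attr}")
    return problems


good = {"objectClass": ["person"], "cn": ["Ann"], "sn": ["Lee"]}
bad = {"objectClass": ["person", "mailUser"], "cn": ["Ann"], "shoeSize": ["7"]}
print(validate_entry(good, SCHEMA))
print(validate_entry(bad, SCHEMA))
```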
  o Namespace design
    ▪ The directory's namespace provides the means by which information is referenced in the directory
    ▪ Proper namespace design leads to
      • Ease of management
      • Access control flexibility
      • Ability to satisfy a wider variety of directory-based applications
      • More natural navigation
    ▪ Structure of a namespace
      • The LDAP namespace is inherited from X.500
      • Tree structured
      • LDAP does not support namespaces in which an entry might be both a child and a parent of the same entry
    ▪ Naming entries in LDAP
      • A set of attributes is chosen to form the RDN
      • The full name (DN) is formed by combining the entry's RDN with the RDNs of its ancestors
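As a small illustration of the naming rule above, the sketch below joins an entry's RDN with its ancestors' RDNs to form the DN, and splits a DN back into its RDN and parent DN. The function names and sample RDNs are hypothetical, and the code assumes no RDN value contains an escaped comma.

```python
def build_dn(rdns_leaf_first):
    """Join RDNs, ordered from the entry itself up to the root, into a DN."""
    return ",".join(rdns_leaf_first)


def split_dn(dn):
    """Return (RDN of the entry, DN of its parent)."""
    rdn, _, parent = dn.partition(",")
    return rdn, parent


def ancestors(dn):
    """List the DNs of all ancestors, nearest parent first."""
    result = []
    _, parent = split_dn(dn)
    while parent:
        result.append(parent)
        _, parent = split_dn(parent)
    return result


dn = build_dn(["uid=jdoe", "ou=People", "dc=example", "dc=com"])
print(dn)
print(split_dn(dn))
print(ancestors(dn))
```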
    ▪ Purpose of a namespace
      • Data referencing
      • Data organization
      • Data partitioning
        o Dividing data among multiple servers
        o Can only be performed at a branch point in a directory (whole branch)
      • Data replication
      • Access control
      • Application support
    ▪ Analyzing namespace needs
      • Choice of suffixes
        o The directory service may have only a local scope or can be part of a larger directory system
        o Suffixes name the top of the directory tree
        o A directory server may hold multiple suffixes
      • Flat or hierarchical namespace
        o A good design practice is to keep the directory namespace as flat as possible
        o The flatter the namespace, the less likely the names are to change
        o Shorter names take up less space
        o Hierarchical structures
          ▪ Can be used to partition data amongst servers
      • Which attributes should be chosen for the RDN
        o Ensuring unique names
          ▪ Use existing unique names
            • Employee IDs
            • Unix login names
          ▪ Constructing new identifiers
            • Timestamping
            • Append numbers to names
      • Application requirements
      • Administrative concerns
      • Privacy concerns
      • Future needs
        o Avoid choosing the wrong naming attributes
        o Don't choose a hierarchical namespace with a hierarchy based on geographical, organizational or any other scheme that is bound to change
  o Topology design
    ▪ What is topology design?
      • LDAP services are built to support a distributed directory
      • A topology describes the way you divide the directory among physical servers and how these servers are allocated within your organization
      • Benefits
        o Optimal performance for directory applications
        o Increased availability
        o Better manageability
    ▪ When you divide a directory into manageable chunks and assign them to separate servers, you are partitioning the directory
      • The directory is responsible for hiding the partitioning details from the user (transparency)
      • Each server is only responsible for a portion of the tree
      • A partition is a complete subtree of the directory information tree (DIT)
      • All entries in a partition must share a common ancestor: the partition root
    ▪ Knowledge references
      • Relationships between partitioned directories
      • Two types
        o Immediate superior knowledge references
          ▪ Point upward in the directory towards the root and tie the naming context to its ancestor
        o Subordinate references
    ▪ Name resolution
      • The process by which the directory resolves a DN given by a client to an actual entry in the directory
      • Used in
        o Locating the base object of an LDAP search or compare operation
        o Locating an entry to modify, delete or rename
        o Locating the parent of an entry to be added to the directory
      • A DN presented in this way is called a purported name
        o The client claims that the entry exists; the server must check whether that is true
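One way to picture name resolution across partitions is a longest-suffix match of the purported name against the known partition roots. The sketch below is only an illustration under simplifying assumptions: the partition table and server names are hypothetical, DNs are compared case-insensitively, and RDN values are assumed not to contain escaped commas.

```python
def _normalize(dn):
    """Split a DN into lower-cased RDNs, leaf first."""
    return [rdn.strip().lower() for rdn in dn.split(",")]


def find_partition(purported_dn, partitions):
    """Return (partition root, server) for the deepest partition whose root
    is a suffix of the purported DN, or None if no partition matches."""
    target = _normalize(purported_dn)
    best = None
    best_depth = 0
    for root, server in partitions.items():
        root_rdns = _normalize(root)
        depth = len(root_rdns)
        if depth <= len(target) and target[len(target) - depth:] == root_rdns:
            if depth > best_depth:
                best, best_depth = (root, server), depth
    return best


# Hypothetical topology: the Sales subtree lives on its own server.
PARTITIONS = {
    "dc=example,dc=com": "server1",
    "ou=Sales,dc=example,dc=com": "server2",
}

print(find_partition("uid=amy,ou=Sales,dc=example,dc=com", PARTITIONS))
print(find_partition("uid=bob,ou=HR,dc=example,dc=com", PARTITIONS))
print(find_partition("dc=other,dc=org", PARTITIONS))
```

A result of `None` corresponds to the case where the server would follow its immediate superior knowledge reference rather than a subordinate one.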
    ▪ Handling client distribution
      • LDAP referrals and search result continuation references are pieces of knowledge reference information sent from an LDAP server to an LDAP client, indicating that other servers need to be contacted to fulfill the request
      • LDAP referrals
        o Returned when add/delete/modify operations are performed and the server does not hold the entry
      • Search result continuation references
        o Returned when the search scope is not base-level and the server has knowledge of other subordinate directory partitions
      • Client libraries handle the processing of references automatically
      • Structure of an LDAP referral
        o The information is in the form of an LDAP Uniform Resource Locator (URL)
        o A referral has the following information
          ▪ The port number of the server
          ▪ Hostname of the server
          ▪ Base DN (search) or target DN (add, delete, compare)
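The three pieces of referral information listed above can be pulled out of an LDAP URL with the standard library. This is a sketch, not a full LDAP URL parser: the function name and the sample URL are hypothetical, and the optional attribute, scope and filter parts of the URL are ignored. The default ports (389 for ldap, 636 for ldaps) are the conventional ones.

```python
from urllib.parse import urlsplit, unquote

DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}


def parse_referral(url):
    """Return (hostname, port, DN) from an LDAP referral URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not an LDAP URL: {url}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # The DN is the URL path without its leading slash, percent-decoded.
    dn = unquote(parts.path.lstrip("/"))
    return parts.hostname, port, dn


print(parse_referral("ldap://ldap2.example.com:1389/ou=Sales,dc=example,dc=com"))
print(parse_referral("ldap://ldap3.example.com/ou=New%20York,dc=example,dc=com"))
```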
      • Chaining
        o The client sends a request to the server
        o If the server is unable to process the request because it does not hold all the required data, it sends requests to other servers instead of returning a referral to the client
        o It then returns the combined results to the client when the other servers have completed the requests
      • Deciding between distribution handling by client or server
        o Chaining reduces client complexity at the cost of increased server complexity
    ▪ Authentication in a distributed directory
      • The server or servers that ultimately handle a client request must verify the identity of the client so they can enforce access restrictions
        o True even for chained requests
      • Authentication in a chained environment
        o The client connects to server 1 and authenticates
        o If the client is authenticated, server 1 sends a success code
        o The client submits a search operation to server 1
        o Server 1 determines that it does not hold the appropriate partition, so it chains the operation to server 2
        o There are 2 ways server 2 can learn the client's identity
          ▪ Server 1 can just tell server 2 who the client is, and server 2 has to believe it
          ▪ Server 1 can forward the client's identity and credentials so that server 2 can verify them
        o Security implications
          ▪ When using referrals, the client should always make sure that the target server is not a rogue server before sending its authentication credentials
    ▪ When to partition?
      • Too many entries for a single server
      • Applications tend to read and modify entries from the local workgroup
      • Update traffic to a single partition is too large
      • Directory use will expand significantly in the future, beyond the point where a single partition is feasible
    ▪ Replication design
      • Replication increases the availability of a directory server
        o Increases reliability
        o Increases read performance by balancing load
        o Security purposes
          ▪ A replica can function as a public directory service through selective replication
      • Concepts
        o Suppliers, consumers and replication agreements
          ▪ The terms supplier and consumer refer to the source and destination of replication updates
            • Roles are not mutually exclusive
          ▪ Configuration information that tells a supplier server about a consumer server is termed a replication agreement
            • Includes the unit of replication
            • The hostname and port of the remote server, and scheduling information
          ▪ In essence, the replication agreement defines what is to be replicated, where it is to be sent and how it is to be done
        o The unit of replication
          ▪ Defined subtree
            • Specify the DN at the top of a subtree and everything below it will be replicated
          ▪ Filtered replication
            • Select entries in a subtree based on objectclass definitions
            • Select attribute types
            • Replication holes!
              o When performing filtered replication there may be holes in the tree
              o Glue entries are placed in these positions to fill the holes
        o Consistency and convergence
          ▪ Describes how closely the content of replicated servers matches at any given time
          ▪ The supplier and the consumer have converged when they hold the same data
        o Incremental and total updates
          ▪ Incremental updates send the consumer only the data that it does not have
          ▪ Total updates are useful only when creating a server
        o Initial population of a replica
          ▪ The consumer contains no data when it is first replicated
          ▪ Techniques used
            • X.500 Directory Information Shadowing Protocol
            • OpenLDAP uses LDAP itself to perform the initial replication, sending a series of delete operations to remove unwanted entries and a series of add operations to populate the replica
        o Replication strategies
          ▪ The term replication strategy refers to the way updates flow and how they interact when they are propagated to and from the servers
          ▪ Problem: updating all servers when a client modifies an entry on one server
          ▪ 3 solutions
            • Single master
              o Replicas are all read-only
              o Edits have to be done on the master server
              o Has a single point of failure
              o Clients can write data to the master through chaining or referrals
            • Floating master
              o All replicas are read-only and there is a single read/write server. If the read/write server goes down, an algorithm elects a new read/write server
              o Complications arise when masters are rejoined
                ▪ Requires an update conflict resolution policy
              o Never used in directory products
            • Multi-master replication
              o More than one read/write server is available
              o Clients write to any of these servers
              o Servers must ensure that the updates are properly propagated
              o Requires an update conflict resolution policy
          ▪ Replication errors
            • Errors such as network problems in the connection from the master to the slave cause slurpd to requeue a modification
            • Errors such as schema violations and inconsistent states require manual intervention
          ▪ Replication errors are written, together with the replication record, to a reject file
• LDAP vs SNMP
  o Connection-oriented vs connectionless
  o Stateful vs stateless
  o Complex vs simple data structures
  o Complex vs simple operations
Disaster Recovery
• Risk vs consequence
  o Affordability
  o Manageability
  o Service and support
• Component redundancy methods
  o RAID
  o Mirroring
    ▪ Keeping an exact duplicate
  o Clustering
    ▪ A group of servers that share work and may be able to back each other up if one server fails
• Using multiple disks
  o In a multiple-drive system without striping, the I/O load is never perfectly balanced. Some drives will contain files that are frequently accessed and some may contain files that are rarely accessed
  o To maximize throughput, the I/O load must be balanced across all the drives
• Disk striping
  o Combining multiple drives into one logical storage unit
  o The stripes are interleaved in a rotating sequence so that the combined space is composed alternately of stripes from each drive
  o Striping can be done at byte level or in blocks
  o Stripe width
    ▪ Number of parallel stripes that can be written to or read from simultaneously; equal to the number of disks in the array
  o Stripe size
    ▪ Size of the stripes written to each disk
    ▪ The smaller the size, the more pieces are required to store a file of a given size. Increases transfer performance but lowers seek performance
    ▪ The larger the size, the fewer drives are required to store files of a given size, lowering transfer performance
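The rotating placement of stripes and the effect of stripe size can be sketched with a little arithmetic. The function names and the sizes used below are hypothetical; the layout is the simple round-robin order described above, without parity.

```python
import math


def pieces_needed(file_size, stripe_size):
    """Number of stripe-sized pieces needed to store a file."""
    return math.ceil(file_size / stripe_size)


def locate(piece_index, num_disks):
    """Map a piece to (disk index, row on that disk) in rotating order."""
    return piece_index % num_disks, piece_index // num_disks


def layout(file_size, stripe_size, num_disks):
    """Return {disk index: [piece indexes stored on that disk]}."""
    disks = {disk: [] for disk in range(num_disks)}
    for piece in range(pieces_needed(file_size, stripe_size)):
        disk, _row = locate(piece, num_disks)
        disks[disk].append(piece)
    return disks


KIB = 1024
# A 64 KiB file on a 3-disk array: a smaller stripe size means more pieces,
# so more drives take part in the transfer.
print(pieces_needed(64 * KIB, 16 * KIB), layout(64 * KIB, 16 * KIB, 3))
print(pieces_needed(64 * KIB, 4 * KIB))
```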
  o Read/write performance
    ▪ When you write data in a redundant environment, you must write to each redundant location (slow)
    ▪ When you read data you only need to read one of the locations
• RAID
  o Data is reconstructed in the event of failure
  o The number of drives in the array and the way data is split determine
    ▪ The RAID level
    ▪ The capacity
    ▪ Overall performance
    ▪ Data protection characteristics
  o RAID 0
    ▪ Striped disk array without fault tolerance
    ▪ Files are broken into stripe-size pieces and distributed to the separate disks in rotating order
    ▪ Minimum of 2 drives
    ▪ No fault tolerance
    ▪ Enhances performance in either a request-rate-intensive or transfer-rate-intensive environment
    ▪ Best performance is achieved when data is striped across multiple controllers
    ▪ Simple design and easy to implement
    ▪ Not a true RAID because there is no redundancy. If one drive fails, all is lost
  o RAID 1
    ▪ Mirroring and duplexing
    ▪ Mirroring
      • Data is duplicated on two different drives
      • When a drive fails, the system continues on a single drive
    ▪ Duplexing
      • Uses two drive controllers
    ▪ Minimum of two drives
    ▪ Capacity is that of the smallest drive
    ▪ Read performance of a single drive unless the controller can perform concurrent separate reads per mirrored pair
    ▪ Write performance less than a single drive
    ▪ 100% redundancy of data means no rebuild is necessary in case of disk failure
    ▪ Simplest RAID design
    ▪ Disadvantages
      • High disk overhead
      • If the RAID function is done by system software it will load the CPU and degrade throughput
      • If a single controller is used, it is still a single point of failure
  o RAID 2: ECC (error correction code)
    ▪ 5 drives. Each byte of data is written to a different disk drive in the array
    ▪ Each word has its Hamming code word recorded on the ECC disk
    ▪ On a read, the ECC code verifies correct data or corrects single-disk errors
    ▪ Advantages
      • On-the-fly error correction
      • High data transfer rates
      • Relatively simple controller
    ▪ Disadvantages
      • Entry-level cost is very high
    ▪ Most disk drives today offer embedded ECC information within each sector as standard, so there are no significant advantages over RAID 3
  o RAID 3
    ▪ Parallel transfer with parity
    ▪ Data is distributed by bytes or bits. Typically less than 1024 bytes per stripe
    ▪ A parity check byte is created for each data stripe
    ▪ As data is written, a parity check is calculated and stored on the parity disk
    ▪ As data is read, the parity is recalculated and compared with the parity disk
    ▪ If a drive goes down, the parity allows the missing data to be reconstructed
    ▪ Minimum of 3 drives required
    ▪ Can tolerate the loss of one drive
    ▪ Disk failure has an insignificant impact on throughput
    ▪ Controller design is fairly complex
    ▪ Low ratio of parity disks to data disks means high efficiency
    ▪ Very difficult and resource-intensive to do as a software RAID
  o RAID 4
    ▪ Independent data disks with shared parity disk
    ▪ Same as RAID 3 but data is distributed by blocks
    ▪ Larger stripes mean that records can sometimes be read from a single disk
    ▪ Very high read rate with large block sizes
    ▪ Write performance is slow
    ▪ Good for small block transfers like transaction processing applications (high reads, low writes)
  o RAID 5
    ▪ Independent data disks with distributed parity blocks
    ▪ Same as RAID 4 but with parity information distributed in stripes across the array
    ▪ Multiple concurrent reads provide better throughput
    ▪ Most common RAID
    ▪ Minimum of 3 drives
    ▪ Can tolerate the loss of one drive
    ▪ High read transaction rate and slow write performance (can be improved with additional cache)
    ▪ Low ratio of parity disks to data disks means high efficiency
  o RAID 6
    ▪ Independent data disks with two independent distributed parity schemes
    ▪ Same as RAID 5 but with the parity information distributed in stripes across the array and down the array
    ▪ Requires at least 4 disks
    ▪ Very complex controller design
    ▪ Controller overhead to compute parity addresses is extremely high
    ▪ Requires N+2 drives to implement because of the two-dimensional parity scheme
    ▪ Poor write performance
    ▪ Useful for mission-critical applications
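The single-parity idea used by RAID 3, 4 and 5 can be shown with byte-wise XOR: the parity block is the XOR of the data blocks, and any one lost block is the XOR of the survivors and the parity. The sketch below is illustrative only; the block contents and function name are made up, and real arrays also decide where the parity block lives (a dedicated disk in RAID 3/4, rotating in RAID 5).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three hypothetical data disks.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)           # written to the parity disk

# Simulate losing the second disk and rebuilding its block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)

# On a read, recomputing the parity and comparing detects a mismatch.
print(xor_blocks(data) == parity)
```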
  o Nested RAID
    ▪ A multiple RAID level is generally created by taking a number of disks and dividing them into sets
      • Within each set, a single RAID level is applied to form a number of arrays
      • Then the second RAID level is applied to the arrays to create a higher-level array
    ▪ RAID 0+1
      • High data transfer performance
      • RAID 0 performance with RAID 1 data security
      • Most common nested RAID
      • Minimum of 4 drives, expands in 2s
      • Has the same overhead for fault tolerance as mirroring alone
    ▪ RAID 10
      • Very high reliability combined with high performance
      • Requires at least 4 disks, in multiples of two
      • Implemented as a striped array whose segments are RAID 1 arrays
      • RAID 10 has the same fault tolerance as RAID 1
      • RAID 10 has the same overhead for fault tolerance as mirroring alone
      • Very expensive
      • All drives must move in parallel to the appropriate track, lowering sustained performance
  o Types of RAID
    ▪ Hardware fault tolerance is faster
    ▪ Software fault tolerance is less expensive
    ▪ A hardware fault tolerance solution might lock you into a single vendor
    ▪ Some vendors may support hot-swappable disks when there is a failure
    ▪ Fault tolerance does not reduce the need for backups
  o Terminology
    ▪ Hot swap
      • Allows failed hard disks to be removed and replaced without powering down
    ▪ Hot spare
      • Uses an additional, preconfigured disk or disks to automatically replace a failed hard disk
• Clustering
  o A parallel or distributed system that consists of a collection of interconnected whole computers, utilized as a single unified computing resource
  o Two or more computers combined to perform a task which a single computer is not adequate to perform
  o The clients have no knowledge of the underlying hardware of the cluster
  o High-availability clustering
    ▪ Two or more systems configured to monitor each other's health and to take action when a fault is detected
    ▪ Most common causes of server failure
      • Disk and I/O system failure
      • Power outage
      • Software failure
      • Natural disasters
      • Human error
    ▪ Used for key databases, file sharing on a network, business applications and customer services such as e-commerce websites
  o High-performance clustering
    ▪ Split a computational task across many different nodes
    ▪ Distributed computing
      • Different parts of a program run simultaneously on two or more computers that are communicating with each other
    ▪ Parallel computing
      • Simultaneous execution of the same task (split up) on multiple processors in order to obtain results faster
  o Improve scalability
  o Improve manageability
    ▪ Cluster management consoles can monitor and configure numerous individual systems from a centralized location
    ▪ The administrator can identify hot spots or failures across a wide range of applications and take preventive action
    ▪ Advanced cluster consoles can distribute software patches to many nodes with a single command
  o Load balancing
    ▪ Each node is configured to provide the same services
    ▪ All requests come through a load-balancing front end, which distributes the work to a set of back-end servers
    ▪ Primarily for improved performance
  o Cluster architecture
    ▪ Shared disk
      • Independent systems share disk storage for files and databases
      • They have a shared I/O bus but don't share memory
      • Provides high availability
      • Data access has to be synchronized
      • Applications must be specialized for shared disk clustering
      • The larger the number of nodes, the larger the locking protocol overhead
      • Shared databases
    ▪ Mirror
      • Replicating all applications and data to a secondary storage location
    ▪ Shared nothing
  o Tiered network architecture
    ▪ Scale can range from 2 nodes to thousands
    ▪ Single tier
      • A 48-port switching configuration allows up to 48 nodes
      • Can be scaled by increasing the number of ports on the switch
      • Modular switches can scale well
      • Many clusters implement a dedicated network for inter-node communication
  o Shared disk tiered system
    ▪ 3-server cluster
      • When server 1 breaks, it can move its tasks to the other 2 servers (figure not reproduced)
  o Mirroring
    ▪ Each server has its own disk
    ▪ When a server does a write operation, it replicates the write operation to one or more configured servers
    ▪ If that server fails, the secondary server can take the primary position
    ▪ Data mirroring cannot deliver as high availability as shared disk clustering, as there is a small window during which the data on the two servers is not consistent
    ▪ Requires additional disks. Can be very expensive to replicate RAIDs
    ▪ Takes CPU and network resources
    ▪ When a server goes down and comes back up it needs to be resynchronized
  o Shared nothing cluster
    ▪ Each server has its own disk
    ▪ Servers monitor each other's health status and direct system recovery actions if something goes down
    ▪ A node cannot access a disk it does not own unless the owner fails and gives up control
    ▪ Failover time may be slow
    ▪ No overhead of locking protocols
    ▪ Disadvantages
      • Tends to limit a user's storage options, as the storage hardware must be capable of being shared by multiple servers
      • Sharing of non-disk peripheral devices may not be supported
      • Having a dedicated cluster communication network is highly desirable and may require the purchase of additional network components
      • More complex to administer
• Server load balancing
  o Algorithms
    ▪ Round robin
      • Each request is sent to the next server in the pool
      • Assumes that all servers are equal
    ▪ Weighted round robin
      • When servers have different performance, a fractional weight is given to each server and sessions are assigned using these ratios
      • Assumes that all workloads are equal
    ▪ Least load algorithm
      • Based on the number of open sessions a server has, or on tracking response times and favoring the server with the fastest response
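The three algorithms above can be sketched in a few lines. Server names, weights and session counts below are hypothetical. For simplicity the weighted variant assumes small integer weights (a server with weight 2 appears twice per cycle) rather than fractional ones, and the least-load variant uses only the open-session count.

```python
import itertools


def round_robin(servers):
    """Yield servers in order, forever: every server is treated as equal."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in proportion to their integer weights."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_load(open_sessions):
    """Pick the server that currently has the fewest open sessions."""
    return min(open_sessions, key=open_sessions.get)


rr = round_robin(["a", "b", "c"])
print([next(rr) for _ in range(4)])

wrr = weighted_round_robin({"big": 2, "small": 1})
print([next(wrr) for _ in range(6)])

print(least_load({"a": 3, "b": 1, "c": 2}))
```

Expanding the weights into a list keeps the sketch short; it sends consecutive requests to the heavier server, which a production balancer would usually smooth out.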
World Wide Web
• Main components
  o Servers
  o Clients
  o Proxies
• HTTP
  o HTTP is used to transmit information in multiple formats, languages and character sets based on MIME. The contents of an HTTP message body are treated blindly
  o Global URI: HTTP uses Uniform Resource Identifiers in all its transactions to identify resources on the web
  o Request-response exchange: HTTP requests are sent by clients and responses are returned by servers
  o No state is maintained by clients or servers across requests and responses (v1.0)
  o Syntax
    ▪ Request line (PUT /motd HTTP/1.0)
    ▪ General/request/entity header(s)
      • General: (Date: Wed, 22 Mar 2000 08:10:07 GMT)
      • Request: (User-Agent: Mozilla/4.03)
      • Entity: (Content-Length: 23)
    ▪ CRLF
    ▪ Message body (optional)
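The message syntax above (request line, headers, an empty CRLF line, optional body) can be assembled and taken apart as plain bytes. This is a sketch with hypothetical helper names; the header values are the ones from the notes, and the body text is made up so that it happens to be 23 bytes long.

```python
def build_request(method, path, headers, body=b"", version="HTTP/1.0"):
    """Assemble request line + headers + blank line + optional body."""
    all_headers = dict(headers)
    if body:
        all_headers["Content-Length"] = str(len(body))  # entity header
    head = [f"{method} {path} {version}"]
    head += [f"{name}: {value}" for name, value in all_headers.items()]
    return "\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body


def split_message(raw):
    """Split raw bytes back into (start line, header dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("ascii").split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return start_line, headers, body


body = b"Welcome to the server.\n"  # 23 bytes
raw = build_request(
    "PUT", "/motd",
    {"Date": "Wed, 22 Mar 2000 08:10:07 GMT",   # general header
     "User-Agent": "Mozilla/4.03"},              # request header
    body,
)
print(raw.decode("ascii"))
print(split_message(raw))
```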
  o HTTP/TCP interaction
    ▪ HTTP has no defined transport protocol
    ▪ Nearly every implementation of HTTP uses TCP
    ▪ TCP timers
      • TCP relies on timers for operations such as
        o Retransmission of lost packets
        o Repeating the slow-start phase
        o Reclaiming state from a terminated connection
    ▪ Possible delays
      • TCP establishment
        o Loss of a SYN or SYN-ACK packet introduces a longer delay
        o The retransmission timer is doubled for each lost packet, and the delay is increased
        o A frustrated user might repeatedly click the stop and refresh buttons, which will increase congestion in the target network
      • Transmitting the HTTP request
        o Long delays are less likely here because TCP gradually decreases the retransmission timer value, but most web transfers are very small and might spend all their time in the slow-start phase of congestion control
        o With a small congestion window, the likelihood of successfully delivering multiple packets after a loss is low
          ▪ The receiver is unlikely to generate the 3 duplicate ACKs necessary to trigger a fast retransmission
      • Receiving the HTTP response
      • Rendering the resource
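The effect of doubling the retransmission timer during connection establishment can be worked out directly. The sketch below only models the doubling rule from the notes; the 3-second initial timeout is an assumed value for illustration (actual initial timeouts depend on the TCP implementation), and the function names are hypothetical.

```python
def syn_timeouts(initial_timeout, losses):
    """Timeout waited before each retransmission: doubled after every loss."""
    return [initial_timeout * 2 ** attempt for attempt in range(losses)]


def connection_delay(initial_timeout, losses):
    """Total extra delay before the SYN that finally gets through is sent."""
    return sum(syn_timeouts(initial_timeout, losses))


INITIAL_TIMEOUT = 3.0  # seconds; assumed for illustration only

for lost in range(4):
    print(lost, syn_timeouts(INITIAL_TIMEOUT, lost),
          connection_delay(INITIAL_TIMEOUT, lost))
```

With these assumed numbers, losing three SYN packets in a row adds 3 + 6 + 12 = 21 seconds before the connection can even start, which is why a single early loss is so visible to the user.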
    ▪ Persistent connections
      • A web client typically downloads multiple resources from the same website. With persistent connections, these HTTP transfers can travel over a single connection
      • Sending a response message on a previously idle persistent connection generates a large burst of packets
      • To avoid overloading the network, the slow-start phase has to be restarted after a period of inactivity
    ▪ Reducing the slow-start-restart penalty
      • Disabling slow-start restart
      • Larger SSR timeout
      • Gradually decreasing the congestion window
      • Pacing the transmission of packets
        o Don't send bursts; use a delay
    ▪ TIME_WAIT state and its effect
      • Connections in the TIME_WAIT state consume memory in the OS
      • Reducing TIME_WAIT overhead
        o Lowering resource requirements for TIME_WAIT connections
        o Shifting responsibility to clients
    ▪ Policies for closing persistent connections
      • Apply timeouts
        o Small timeouts after the first request, then increase the timeout after additional requests
      • Limit the total number of requests
      • Complicated policies
        o Close the connection that has been idle longest
        o Base the decision on the relative importance of the user
    ▪ HTTP/TCP layering
      • Aborted HTTP transfers
        o Because the HTTP protocol does not have a mechanism for terminating an ongoing transfer, aborting a request requires closing the underlying transport connection
        o If a connection is terminated before the browser receives the response, the user may not know whether the server actually completed the request
        o Aborting one request in the pipeline requires aborting all of the pipelined requests that have not been processed
        o User-level abort operations do not immediately stop the transfer of data from the server
      • Nagle's algorithm
        o Limits the number of small packets transmitted by a TCP sender, which may delay the transfer of the last packet of an HTTP message
        o With Nagle's algorithm enabled, the last packet of an HTTP request or response is slightly delayed (until all outstanding ACKs are received)
        o To disable Nagle's algorithm, set the TCP_NODELAY socket option
          ▪ Disadvantages:
            • Consider a web server that performs a separate system call to write each line of the HTTP response
            • Disabling Nagle's algorithm will then introduce a large amount of TCP overhead
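A short sketch of setting TCP_NODELAY with Python's standard `socket` module, together with one way to avoid the per-line write problem described above. The function names are hypothetical; the idea is simply to build the whole response in memory and hand it to the kernel in a single call, so that disabling Nagle's algorithm does not produce one tiny packet per line.

```python
import socket


def open_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def send_response(sock, lines):
    """Send all response lines with one call instead of one write per line."""
    sock.sendall(b"".join(line + b"\r\n" for line in lines))


sock = open_nodelay_socket()
# A non-zero value means the option is set.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
sock.close()
```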
      • Delayed acknowledgements
        o The TCP receiver may delay the transmission of an ACK in the hope of piggybacking the acknowledgement on an outgoing data packet
        o The receiver should not wait too long to send the ACK
        o The sender may not be able to send additional data if the transfer is not acknowledged
    ▪ Multiplexing TCP connections
      • Multiple connections to a server per client
      • Enables simultaneous downloading of embedded images
      • A proxy acting on behalf of multiple clients allows files to be retrieved in parallel
      • Problems
        o Unfairness to other clients
        o Higher load
        o Higher perceived latency
      • Remedies
        o Removing performance incentives for parallel connections
        o Providing alternatives
Web Traffic Analysis
• 3 main steps
  o Monitoring web transfers at some location
  o Generating measurement records in a format
  o Pre-processing records for subsequent analysis
• Motivation for web measurement
  o Content creators
  o Web hosting companies
  o Network operators
  o Web/networking researchers
• Measurement techniques
  o Web software components can generate logs as part of handling requests
  o Traces of web traffic can be collected by passively monitoring the links and routers in the network
  o Methods
    ▪ Server logging
      • The web server generates a log as part of processing client requests
      • A server log could be used to analyze access patterns and the popularity of web resources, BUT
        o The server does not know how many requests are satisfied by caches
        o Popular resources may be more likely to be returned from a cache
        o Cache-busting can be used to circumvent such side effects
      • Associating requests with users is difficult
        o Requests could come from a proxy
        o Shared client machines
        o Dynamic IP addresses
        o Use cookies to track users
    ▪ Proxy logging
      • A proxy log provides a record of requests to a wide range of websites
      • Can be used to determine the relative popularity of different sites and the effectiveness of policies for web caching
      • Limitations
        o The proxy does not see the requests that are satisfied by the browser cache or by other proxies closer to the user agents (UAs)
        o May not be representative of the rest of the clients on the web
    ▪ Client logging
      • Has the potential to provide a detailed view of user browsing patterns
      • Compared to proxy and server logs, the user agent handles a relatively small number of requests at a time
      • Limitations
        o No de facto standard
        o A realistic study requires a large number of users
    ▪ Packet monitoring
      • Client, proxy and server logging impose an overhead on software components
      • Application-layer logs have little or no information about network activity at the TCP and IP level
      • Limitations
        o Cannot capture requests satisfied by the browser cache, or encrypted messages
        o More difficult with increasing link speed
        o Requires software for reconstructing HTTP messages from packet traces
    ▪ Active measurement
      • Logs and packet traces typically do not include enough information to evaluate the performance experienced by users
      • An alternative approach is to generate requests in a controlled manner and observe their performance
      • Issues
        o Where to locate the modified agents
        o What requests to generate
        o What measurements to collect
      • Limitations
        o The diversity and complexity of the internet make it difficult to draw general conclusions
        o Requires extensive experimentation over a wide range of times, locations and server sites
        o Necessitates a wide-scale, global measurement infrastructure
  o Common log format
    ▪ Remote host (IP or hostname)
    ▪ Remote identity (Account)
    ▪ Authenticated user (name provided by user)
    ▪ Time
    ▪ Request (first line of HTTP header)
    ▪ Response code
    ▪ Content length
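The seven fields listed above can be pulled out of a log line with a regular expression. The following is a minimal sketch, assuming the usual space-separated layout with the time in square brackets and the request line in double quotes; the names `CLF_PATTERN`, `parse_clf` and the sample entry are made up for illustration.

```python
import re
from datetime import datetime

# Field order: remote host, remote identity, authenticated user,
# time, request line, response code, content length.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<length>\d+|-)'
)


def parse_clf(line):
    """Return the seven fields as a dict, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" length means no body was sent.
    entry["length"] = 0 if entry["length"] == "-" else int(entry["length"])
    return entry


# Made-up sample entry, used only for illustration.
SAMPLE = '192.0.2.7 - alice [08/Aug/2019:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 2326'
entry = parse_clf(SAMPLE)
```

Only the request line and response code survive in such an entry; none of the header fields are available to the parser.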
  o Limitations of HTTP header information
    ▪ Logs capture the request line and response code only, but not the header fields
  o Detecting resource modifications
    ▪ Useful for analyzing changes
    ▪ Statistics about resource modifications can be inferred from HTTP headers, response sizes and timestamps
      • Age = request timestamp - last modified time
• Web Caching
  o Movement of web content close to users
  o Goals of caching
    ▪ Reduce user experienced latency between the time of the web request and the time the response is displayed at the browser
    ▪ Reduce the load on the network
    ▪ Reduce the load on the origin server
  o First major technique that attempted to reduce user-perceived latency
  o Why cache?
    ▪ Reduce bandwidth costs
    ▪ Enhance user experience
    ▪ Frees network resources for other data
    ▪ Reduce delay
  o Where is caching done?
    ▪ Web browser cache, to re-fetch pages the user examined during the same session
    ▪ Caching proxy
    ▪ Regional caches can help several geographically collocated caches in one or more administrative domains
    ▪ Reverse proxies
      • Act as a front end to one or more origin servers and are placed close to the server. They act on behalf of the server.
    ▪ Interception proxies
      • Intercept HTTP requests and explicitly receive responses
      • Placed close to clients
      • Do not have to be directly in the path of packets
  o How is caching done?
    ▪ Deciding whether a message is cacheable
      • Protocol requirements that prevent caching
      • Is the content dynamic?
      • Is the cached response likely to be used again?
      • Will the decision to cache a particular response lead to replacement of one or more resources?
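The first two questions above can be approximated with a few header and URL checks. This is a rough sketch for a shared proxy cache, not a complete reading of the HTTP rules: the function name `is_cacheable`, the restriction to `GET` with status 200, and the query-string test for dynamic content are all simplifying assumptions.

```python
from urllib.parse import urlsplit


def is_cacheable(method, url, status, response_headers):
    """Heuristic cacheability check for a shared proxy cache (illustrative only)."""
    # Protocol requirements: only plain successful GET responses are considered here.
    if method.upper() != "GET" or status != 200:
        return False
    cache_control = response_headers.get("Cache-Control", "").lower()
    directives = {d.strip().split("=")[0] for d in cache_control.split(",") if d.strip()}
    # no-store forbids caching; private forbids storage in a shared cache.
    if "no-store" in directives or "private" in directives:
        return False
    # Assumption: a query string signals dynamically generated content.
    if urlsplit(url).query:
        return False
    return True
```

Whether the response is likely to be reused, and what it would displace, are policy questions that the replacement logic has to answer separately.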
    ▪ Cache replacement and storing the response in cache
      • Cache replacement introduces some overhead, especially if many small cached objects must be removed
      • Resources known to be stale are evicted from a cache even if the cache is not full.
      • The most well known policy is the LRU (Least recently used) approach
      • Factors of complex approaches
        o Cost of fetching the resource
        o Cost of storing the resource
        o Number of accesses to the resource
        o Probability of the resource being accessed in the future
        o Time since the last modification
        o Heuristic expiration
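The LRU approach named above can be sketched in a few lines. This is a minimal illustration that counts capacity in number of objects rather than bytes; the class name `LRUCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` responses; evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest access first, newest last

    def get(self, url):
        if url not in self._entries:
            return None
        # A hit makes this entry the most recently used.
        self._entries.move_to_end(url)
        return self._entries[url]

    def put(self, url, response):
        if url in self._entries:
            self._entries.move_to_end(url)
        self._entries[url] = response
        if len(self._entries) > self.capacity:
            # Drop the entry at the front, i.e. the least recently used.
            self._entries.popitem(last=False)
```

For example, with a capacity of 2, storing `/a` and `/b`, reading `/a`, then storing `/c` evicts `/b`, because `/a` was touched more recently.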
    ▪ Cache replacement policies
      • Least recently used
      • Least frequently used
      • Key based policies
        o Size of objects (ditch the biggest)
        o Lowest latency first
        o Hyper-G
          ▪ First consider LFU
          ▪ If there is a tie use LRU
          ▪ If it is still tied, use size
      • Cost based policies
        o Size adjusted LRU
          ▪ Ranks objects according to their cost to size ratio
        o Greedy dual
          ▪ Similar to Size adjusted LRU. Objects in GD's cache are ranked according to an assigned value (utility) specified by the application. Objects with the lowest utility are evicted first. However, every eviction operation lowers the utility of the other objects by the utility value of the evicted object
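The Hyper-G tie-breaking order and the Greedy dual eviction step can be sketched as follows. Both functions follow the wording of these notes literally rather than any particular published implementation; the function names and the dictionary keys (`hits`, `last_access`, `size`) are assumptions made for the example.

```python
def hyper_g_victim(entries):
    """Pick the entry to evict: least frequently used first, then least
    recently used, then the largest object.

    Each entry is a dict with the hypothetical keys 'url', 'hits',
    'last_access' (a timestamp) and 'size'.
    """
    return min(entries, key=lambda e: (e["hits"], e["last_access"], -e["size"]))


def greedy_dual_evict(utilities):
    """Evict the lowest-utility object and lower every remaining utility
    by the evicted value. `utilities` maps url -> utility and is
    modified in place. Returns the evicted url.
    """
    victim = min(utilities, key=utilities.get)
    evicted_value = utilities.pop(victim)
    for url in utilities:
        utilities[url] -= evicted_value
    return victim
```

Lowering the surviving utilities acts as an aging step: an object that is never requested again drifts toward the bottom of the ranking, while a fresh request would restore its assigned value.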
  o Cache coherency
    ▪ Cache has to ensure that a cached response is still fresh before returning it to a client
    ▪ Cache may return an older cached value along with a reason for the staleness of the response (unable to contact origin server)
    ▪ HTTP/1.1 provides several ways to maintain cache coherence
      • Cache-Control request header: lets a client force revalidation, or force the proxy to return a cached response (only-if-cached)
      • If-Modified-Since request header: informs the proxy/server to send a response if and only if the modification time of the resource is later than the time specified
      • Expires response header: informs the client when a resource is expected to go stale
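The If-Modified-Since check described above can be sketched from the server's side. This is a simplified illustration, assuming `last_modified` is a timezone-aware `datetime` and that only this one header is examined; the function name `conditional_get_status` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(request_headers, last_modified):
    """Return 304 if the resource is unchanged since If-Modified-Since, else 200."""
    header = request_headers.get("If-Modified-Since")
    if header is None:
        return 200
    try:
        since = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return 200  # Unparseable date: ignore the condition.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) > since:
        return 200  # Modified later than specified: send the full response.
    return 304  # Not modified: the cached copy can be reused.
```

A 304 carries no body, so a successful revalidation costs a round trip but not a retransmission of the resource.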
    ▪ Weak consistency approaches
      • Lease based approach
        o Cache agrees to store a response for a fixed amount of time
        o This shifts the overhead of revalidation to the server, which must keep track of all the caching proxies it promised to notify
        o Not scalable and never implemented
      • TTL approach
        o Responses have a cache expiration time associated with them
        o During the TTL period, the cache does not revalidate the response, at the risk of staleness.
        o TTL value can be based on
          ▪ Frequency of request for a cached resource
          ▪ Mobile environment
          ▪ Last modified header
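One way to base the TTL on the Last-Modified header is to take a fraction of the resource's age at fetch time. The sketch below assumes a 10% fraction and a one-day cap; both values and the function name `heuristic_ttl` are illustrative choices, not something these notes prescribe.

```python
from datetime import datetime, timedelta, timezone


def heuristic_ttl(response_time, last_modified, fraction=0.1, max_ttl=timedelta(days=1)):
    """TTL = fraction of the resource's age when it was fetched, capped at max_ttl."""
    age = response_time - last_modified
    if age <= timedelta(0):
        # Last-Modified in the future or equal to now: do not trust it.
        return timedelta(0)
    return min(age * fraction, max_ttl)
```

The reasoning is that a resource which has not changed for a long time is unlikely to change soon, so it can be served without revalidation for longer.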
  o Cache related protocols
    ▪ If a set of caches are arranged in a hierarchy, a cache can contact other caches at the same level to see if the requested object is present in other caches.
    ▪ Protocols are needed to reduce inter-cache communication costs
      • Internet Cache Protocol
        o UDP based protocol for querying peer caches for a cached copy of a particular response
        o Used in Squid
        o Lends itself well to hierarchies
          ▪ Requests are propagated towards the origin server
          ▪ ICP requests have to be translated to HTTP if the request is routed back to the web server
          ▪ When a response comes back from the origin server, the intermediate caches can store the response for future use
          ▪ Although regional and national proxies can help a large number of users, each application layer hop can degrade performance
      • Cache Array Resolution Protocol
        o Defines a mechanism by which a set of caching proxies can effectively function as a single logical cache
        o A hash function is used to partition the URLs across the caches
          ▪ A client trying to locate a cached response can target the request to an appropriate cache by applying the function
          ▪ The hash function uses the request URL and the identity of the proxy members to construct a resolution path
          ▪ Compared with ICP, CARP has a deterministic request resolution path
        o Disadvantages
          ▪ Load balancing
          ▪ When configuration of the caching system changes, the cached URLs must be reassigned
          ▪ Responses cannot exist on more than one proxy
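The idea of hashing the request URL together with the identity of each proxy can be sketched as below. This is not the hash function that CARP itself specifies; SHA-256 is used as a stand-in, and the function name `pick_proxy` and the proxy names are made up.

```python
import hashlib


def pick_proxy(url, proxies):
    """Return the proxy with the highest combined hash score for this URL."""

    def score(proxy):
        # Combine the member identity with the URL, then hash the pair.
        digest = hashlib.sha256((proxy + "|" + url).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(proxies, key=score)


proxies = ["cache-a", "cache-b", "cache-c"]
chosen = pick_proxy("http://example.com/index.html", proxies)
```

Every client that knows the same member list computes the same answer without querying any peer, which is what makes the resolution path deterministic. In this sketch, removing a proxy only reassigns the URLs that had been mapped to it.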
      • Cache Digest Protocol
        o Permits the exchange of a digest of a cache's contents
        o When a cache has a digest of all its peers, it can quickly check in the digest first to see if the object of interest is available
        o Disadvantage: false positives.
        o CDP is an extension to ICP
      • Web Cache Coordination Protocol
        o Loosely connected to the network layer
        o Purpose is to intercept the HTTP request and redirect it to the cache engine
        o Performed by a cache coordinator
        o Implemented as part of the Cisco Cache Engine
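The digest idea from the Cache Digest Protocol, including why false positives arise, can be illustrated with a Bloom filter. This sketch assumes a Bloom filter as the digest structure; the class name `CacheDigest`, the bit count and the number of hash functions are arbitrary choices for the example.

```python
import hashlib


class CacheDigest:
    """Compact summary of the URLs a cache holds (a small Bloom filter)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # A Python int used as a bit array.

    def _positions(self, url):
        # Derive num_hashes bit positions by salting the URL with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, url):
        for pos in self._positions(url):
            self.bits |= 1 << pos

    def might_contain(self, url):
        # True can be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(url))
```

A peer that receives this digest can rule out a cache locally when `might_contain` is false. When it is true, the peer may still find on asking that the object is absent, because different URLs can set the same bits.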
  o Impediments of caching
    ▪ Cache Busting
      • An origin server has no way of knowing if a proxy is serving stale resources
      • Origin servers have no way of knowing the number of times a resource has been viewed at the cache
      • Solutions
        o Hit metering
          ▪ New HTTP header named Meter, which would be used by a cache to report to the origin server the number of cache hits for a resource
          ▪ Not commercially used due to lack of confidence
        o Ad-insertion
          ▪ Suggests that a proxy add advertisements to a page to relieve the origin server
        o Clear GIFs
          ▪ Also known as web bugs, are placed on a web page to serve as a container for an advertisement
    ▪ Privacy issues
      • Responses can be cached against the wishes of the origin server
      • The Cache-Control request and response directive no-store prevents a message from being cached along the path.
  o Caching versus Replication
    ▪ Advantages of replication
      • Updating resources in replication is outside the HTTP protocol
      • The list of replicas is known beforehand
  o Content distribution
    ▪ Selective, fine-grained mirroring
    ▪ Akamai's content distribution solution:
      • A site that wishes to have parts of its resources distributed by Akamai would agree to rename those URLs with a specific prefix
      • Resolving the hostname string using DNS yields the IP address of the mirror
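The renaming step can be sketched as a URL rewrite. The prefix `http://cdn.example.net/` is a placeholder and does not reflect Akamai's real URL format; the function name `to_cdn_url` is likewise hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical prefix: the hostname in it is what DNS would resolve to a mirror.
CDN_PREFIX = "http://cdn.example.net/"


def to_cdn_url(original_url):
    """Rename a URL so that its hostname becomes the content distributor's."""
    parts = urlsplit(original_url)
    suffix = parts.path + ("?" + parts.query if parts.query else "")
    # Keep the original host inside the path so the mirror knows where to fetch from.
    return f"{CDN_PREFIX}{parts.netloc}{suffix}"
```

A site would apply such a rewrite only to the resources it wants mirrored, for example embedded images, and leave the remaining URLs pointing at its own server.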