Nsd Note Final


  • 8/8/2019 Nsd Note Final


    NSD Final Shortnote

    LDAP

    • Directories
        o What is a directory?
            - A specialized database
            - DBMS vs. directory
                • Directories are optimized for read operations
                • Directories cannot facilitate complex relationships
            - Holds attribute-value pairs
            - Offline directories are static
            - Online directories are
                • Dynamic
                • Flexible
                • Personalized
                • Secure
        o Factors to consider when using a directory
            - Size of files
            - Type of information stored
            - Read : write ratio
            - Search capability
            - Standards-based access

    • LDAP
        o Lightweight Directory Access Protocol
            - Alternative to DAP (which was complex)
            - Uses the TCP/IP stack (DAP used OSI)
        o Originally used to access and update information built on the X.500 model. Early adopters found DAP unsuitable for desktop PCs.
        o LDAP models
            - Information
                • Structure of the information stored in the directory
                • Schemas
                    o Define which object classes are allowed
                    o Where they are stored
                    o What attributes they have
                    o Which attributes are optional
                    o Syntax of each attribute
                • The LDAP schema must be readable by the client
            - Naming
                • How information is organized and identified
                    o Directory information tree (DIT)
                    o The name of an LDAP entry is formed by connecting the names of its parent entries back to the root
                    o Distinguished names (DN) and relative distinguished names (RDN)

            - Functional operations
                • Which operations can be performed on the LDAP directory
                • Protocol operations
                    o Authentication/control
                        - BIND and UNBIND
                        - ABANDON
                    o Interrogation
                        - Search
                        - Compare
                    o Update (all operations are atomic, i.e. one at a time)
                        - Add entry
                        - Delete entry
                        - Modify entry
                        - Modify DN/RDN
                • Client-server interaction
                    o Client BINDs
                    o Client performs operations
                    o Client UNBINDs

            - Security
                • How the information can be protected from unauthorized access
                • LDAP provides encryption, authentication and access control
                • LDAP can operate without authentication
                • LDAP can operate with simple authentication
                    o DN and password provided
                    o Clear text, encoded with Base64 (an encoding, not encryption)
                • LDAP can operate with simple authentication over SSL/TLS
                • LDAP can operate with Simple Authentication and Security Layer (SASL)
                    o Parameters: DN, mechanism and credentials
                    o Provides cross-protocol authentication calls
                    o Encryption can be optionally negotiated

        o LDIF (LDAP Data Interchange Format)
            - Standard way of representing directory data in textual format
            - Used when importing/exporting data to and from a directory server
            - ASCII (no Unicode until LDAP v3)
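    To make the textual format concrete, here is a minimal sketch of writing one entry as LDIF. The function name `to_ldif` and the sample entry are made up for illustration. The sketch assumes the usual LDIF convention that values which are not plain printable ASCII are Base64-encoded and marked with a double colon, and it leaves out details such as folding of long lines.

```python
import base64

def to_ldif(dn, attributes):
    """Render one directory entry as LDIF text (simplified sketch)."""
    def line(name, value):
        # Plain printable ASCII values are written as "name: value".
        # Anything else (non-ASCII, control characters, or a value that
        # starts with ' ', ':' or '<', or ends with a space) is
        # Base64-encoded and written as "name:: <base64>".
        safe = (value.isascii() and value.isprintable()
                and not value.startswith((" ", ":", "<"))
                and not value.endswith(" "))
        if safe:
            return f"{name}: {value}"
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        return f"{name}:: {encoded}"

    lines = [line("dn", dn)]
    for name, values in attributes.items():
        for value in values:          # multi-valued attributes repeat the name
            lines.append(line(name, value))
    return "\n".join(lines) + "\n"

# Hypothetical entry used only for illustration.
print(to_ldif("uid=jdoe,ou=people,dc=example,dc=com",
              {"objectClass": ["top", "person"],
               "cn": ["John Doe"],
               "sn": ["Doe"]}))
```

    Each attribute value gets its own line, so a multi-valued attribute such as objectClass simply appears more than once.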

        o Schema design
            - Schema design determines what data can be stored in the directory
            - Purpose:
                • Ensure that poorly designed applications do not store redundant data in the directory
                • Can be used to impose size/range/format constraints on data
            - LDAP schemas consist of
                • Attribute types
                • Attribute syntaxes
                • Matching rules
                • Object classes
            - Attributes
                • Directory entries contain a collection of attribute types and values
                • Attribute types hold specific elements such as names or telephone numbers
                • The definition of an attribute type contains the following
                    o Unique name
                    o Unique object ID (OID)
                    o Textual description
                    o Indication of whether the attribute is single- or multi-valued
                    o Associated syntax and matching rules
                    o Attribute usage indicator
                    o Range and size restrictions on values stored in the attribute
                    o Indication of whether it can be modified by regular applications
                • Attribute syntaxes and matching rules
                    o Each attribute has an associated syntax which specifies exactly how each value is represented and how comparisons are made
                    o The rules for making attribute value comparisons are called matching rules


            - Object classes
                • Object classes are used to group related attributes together
                    o Each directory entry belongs to one or more object classes
                    o The names of the object classes an entry belongs to are always listed in a special multi-valued attribute called objectclass
                • The set of object classes associated with an entry fulfills the following needs
                    o What type of data MUST be in the object
                    o What type of data CAN be in the object
                    o Provides a convenient way for a client to retrieve a subset of entries from a search operation
                • An object class holds the following information
                    o Unique name
                    o Unique object ID (OID)
                    o Textual description
                    o Set of mandatory attribute types
                    o Set of allowed attribute types
                    o A kind (abstract, auxiliary, structural)
                • Structural classes
                    o Describe the basic aspects of an object
                    o All entries should belong to exactly one structural object class
                • Auxiliary classes
                    o Add a set of attributes to an entry that already belongs to a structural class (think of multiple inheritance, but only with required attributes)
                • Abstract classes
                    o Only used as ancestors of derived classes
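    The MUST/CAN rules above can be sketched as a small check. This is an illustration only, not how any particular server implements schema checking: `ObjectClass`, `check_entry` and the two-class `SCHEMA` are hypothetical, the classes are only loosely modeled on real ones, and attribute names are assumed to be lowercase already.

```python
from dataclasses import dataclass

@dataclass
class ObjectClass:
    name: str
    kind: str                      # "structural", "auxiliary" or "abstract"
    must: frozenset = frozenset()  # mandatory attribute types
    may: frozenset = frozenset()   # allowed (optional) attribute types

def check_entry(entry, schema):
    """Return a list of schema violations; an empty list means the entry passes."""
    problems = []
    names = entry.get("objectclass", [])
    problems += [f"unknown object class: {n}" for n in names if n not in schema]
    classes = [schema[n] for n in names if n in schema]

    # Every entry should belong to exactly one structural class.
    structural = [c for c in classes if c.kind == "structural"]
    if len(structural) != 1:
        problems.append(f"expected exactly one structural class, found {len(structural)}")

    must = set().union(*(c.must for c in classes))
    allowed = must.union(*(c.may for c in classes)) | {"objectclass"}
    present = set(entry)
    problems += [f"missing mandatory attribute: {a}" for a in sorted(must - present)]
    problems += [f"attribute not allowed: {a}" for a in sorted(present - allowed)]
    return problems

# Toy schema, assumed for illustration only.
SCHEMA = {
    "person": ObjectClass("person", "structural",
                          must=frozenset({"cn", "sn"}),
                          may=frozenset({"telephonenumber"})),
    "mailuser": ObjectClass("mailuser", "auxiliary", must=frozenset({"mail"})),
}

entry = {"objectclass": ["person", "mailuser"], "cn": ["Jane Doe"], "color": ["blue"]}
for problem in check_entry(entry, SCHEMA):
    print(problem)
```

    The entry is checked against the union of all its object classes, which is why adding the auxiliary class makes `mail` mandatory as well.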

        o Namespace design
            - The directory's namespace determines how information is referenced in the directory
            - Proper namespace design leads to
                • Ease of management
                • Access control flexibility
                • Ability to satisfy a wider variety of directory-based applications
                • More natural navigation
            - Structure of a namespace
                • The LDAP namespace is inherited from X.500
                • Tree structured
                • LDAP does not support namespaces in which an entry might be both a child and a parent of the same entry
            - Naming entries in LDAP
                • A set of attributes is chosen to form the RDN
                • The full name (DN) is formed by combining the RDNs of the entry and all its ancestors
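    As a small illustration of how a DN is assembled from RDNs, the sketch below joins the RDNs found on the path from the root down to an entry. The helper names and the sample tree are hypothetical, and the code deliberately ignores escaped commas inside attribute values.

```python
def dn_from_path(rdns_from_root):
    """Build a DN from the RDNs on the path root -> entry.

    LDAP writes the entry's own RDN first and the root's RDN last,
    so the path is reversed before joining.
    """
    return ",".join(reversed(rdns_from_root))

def split_rdn(dn):
    """Split a DN into (own RDN, parent DN). Naive: assumes no escaped commas."""
    rdn, _, parent = dn.partition(",")
    return rdn, parent

# Hypothetical tree used only for illustration.
dn = dn_from_path(["dc=com", "dc=example", "ou=people", "uid=jdoe"])
print(dn)             # uid=jdoe,ou=people,dc=example,dc=com
print(split_rdn(dn))  # ('uid=jdoe', 'ou=people,dc=example,dc=com')
```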

            - Purpose of a namespace
                • Data referencing
                • Data organization
                • Data partitioning
                    o Dividing the data among multiple servers
                    o Can only be performed at a branch point in the directory (a whole branch)
                • Data replication
                • Access control
                • Application support

            - Analyzing namespace needs
                • Choice of suffixes
                    o The directory service may have only a local scope or can be part of a larger directory system
                    o Suffixes name the top of the directory tree
                    o A directory server may hold multiple suffixes
                • Flat or hierarchical namespace
                    o A good design practice is to always make the directory namespace flat
                    o The flatter the namespace, the less likely the names are to change
                    o Shorter names take up less space
                    o Hierarchical structures
                        - Can be used to partition data amongst servers
                • Which attributes should be chosen for the RDN
                    o Ensuring unique names
                        - Use existing unique names
                            • Employee IDs
                            • Unix login names
                        - Constructing new identifiers
                            • Timestamping
                            • Appending numbers to names
                • Application requirements
                • Administrative concerns
                • Privacy concerns
                • Future needs
                    o Avoid choosing the wrong naming attributes
                    o Don't choose a hierarchical namespace with a hierarchy based on geographical, organizational or any other scheme that is bound to change

        o Topology design
            - What is topology design?
                • LDAP services are built to support a distributed directory
                • A topology describes the way you divide the directory among physical servers and how these servers are allocated across your organization
                • Benefits
                    o Optimal performance for directory applications
                    o Increased availability
                    o Better manageability
            - When you divide a directory into manageable chunks and assign them to separate servers, you are partitioning the directory
                • The directory is responsible for hiding the partitioning details from the user (transparency)
                • Each server is only responsible for a portion of the tree
                • A partition is a complete subtree of the directory information tree (DIT)
                • All entries in a partition must share a common ancestor: the partition root

            - Knowledge references
                • Relationships between partitioned directories
                • Two types
                    o Immediate superior knowledge references
                        - Point upward in the directory towards the root and tie the naming context to its ancestor
                    o Subordinate references
            - Name resolution
                • The process by which the directory resolves a DN given by a client to an actual entry in the directory
                • Used when
                    o Locating the base object of an LDAP search or compare operation
                    o Locating an entry to modify, delete or rename
                    o Locating the parent of an entry to be added to the directory
                • A DN presented in this way is called a purported name
                    o The client claims that the entry exists; the server must check whether that is true
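    One way to picture name resolution across partitions is the sketch below: a server compares the purported name against its own naming context and its knowledge references. All names here (`is_under`, `locate`, the server labels) are hypothetical, and DN comparison is reduced to a naive, case-insensitive suffix match.

```python
def is_under(dn, suffix):
    """True if dn equals suffix or lies below it in the tree (naive string match)."""
    dn, suffix = dn.lower(), suffix.lower()
    return dn == suffix or dn.endswith("," + suffix)

def locate(dn, naming_context, subordinates, superior):
    """Decide where a purported name should be resolved.

    naming_context: root DN of the partition this server holds
    subordinates:   dict mapping subordinate partition roots to servers
    superior:       server named by the immediate superior reference
    """
    if not is_under(dn, naming_context):
        return ("superior", superior)          # outside our partition: point upward
    below = [root for root in subordinates if is_under(dn, root)]
    if below:
        # The most specific (longest) subordinate root wins.
        return ("subordinate", subordinates[max(below, key=len)])
    return ("local", None)                     # still has to be checked for existence

subs = {"ou=sales,dc=example,dc=com": "server2"}
print(locate("uid=a,ou=sales,dc=example,dc=com", "dc=example,dc=com", subs, "root-server"))
print(locate("uid=b,ou=hr,dc=example,dc=com", "dc=example,dc=com", subs, "root-server"))
print(locate("uid=c,dc=other,dc=org", "dc=example,dc=com", subs, "root-server"))
```

    A "local" result only means the name falls inside this server's partition; the server must still look the entry up, since the name is merely purported.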

            - Handling client distribution
                • LDAP referrals and search result continuation references are pieces of knowledge reference information sent from an LDAP server to an LDAP client, indicating that other servers need to be contacted to fulfill the request
                • LDAP referrals
                    o Returned when add/delete/modify operations are performed and the server does not hold the entry
                • Search result continuation references
                    o Returned when the search scope is not base-level and the server has knowledge of other subordinate directory partitions
                • Client libraries handle the processing of references automatically
                • Structure of an LDAP referral
                    o The information is in the form of an LDAP Uniform Resource Locator (URL)
                    o A referral has the following information
                        - The hostname of the server
                        - The port number of the server
                        - Base DN (search) or target DN (add, delete, compare)
                • Chaining
                    o The client sends a request to the server
                    o If the server is unable to process the request because it does not hold all the required data, it sends requests to other servers instead of returning a referral to the client
                    o It then returns the combined results to the client when the other servers have completed the requests
                • Deciding between distribution handling by client or server
                    o Chaining reduces client complexity at the cost of increased server complexity
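    The three pieces of information carried by a referral can be pulled out of an LDAP URL with the standard library, as in this minimal sketch. The function name and the example hosts are made up; the fallback ports assume the conventional 389 for ldap and 636 for ldaps, and any URL extensions after the DN (attributes, scope, filter) are ignored.

```python
from urllib.parse import urlparse, unquote

def parse_referral(url):
    """Extract host, port and DN from an LDAP referral URL (simplified)."""
    parts = urlparse(url)
    if parts.scheme not in ("ldap", "ldaps"):
        raise ValueError(f"not an LDAP URL: {url}")
    default_port = 389 if parts.scheme == "ldap" else 636
    return {
        "host": parts.hostname,
        "port": parts.port or default_port,
        # The DN follows the first '/', percent-encoded where necessary.
        "dn": unquote(parts.path.lstrip("/")),
    }

# Hypothetical referral used only for illustration.
print(parse_referral("ldap://ldap2.example.com:1389/ou=people,dc=example,dc=com"))
```

    A client that follows referrals on its own would open a new connection to the returned host and port and repeat the operation against the returned DN.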


            - Authentication in a distributed directory
                • The server or servers that ultimately handle a client request must verify the identity of the client so they can enforce access restrictions
                    o True even for chained requests
                • Authentication in a chained environment
                    o The client connects to server 1 and authenticates
                    o If the client is authenticated, server 1 sends a success code
                    o The client submits a search operation to server 1
                    o Server 1 determines that it does not hold the appropriate partition, so it chains the operation to server 2
                    o There are two ways server 2 can learn the client's identity
                        - Server 1 can just tell server 2 who the client is, and server 2 has to believe it
                        - Server 1 can forward the client's identity and credentials so that server 2 can verify them
                    o Security implications
                        - When using referrals, the client should always make sure that the target server is not a rogue server, so that it does not send its authentication credentials to one

            - When to partition?
                • Too many entries for a single server
                • Applications tend to read and modify entries from the local workgroup
                • Update traffic to a single partition is too large
                • Directory use will expand significantly in the future, beyond the point where a single partition is feasible

        o Replication design
            - Replication increases the availability of a directory server
                • Increases reliability
                • Increases read performance by balancing load
                • Security purposes
                    o A replica can function as a public directory service through selective replication
            - Concepts
                • Suppliers, consumers and replication agreements
                    o The terms supplier and consumer refer to the source and destination of replication updates
                        - The roles are not mutually exclusive
                    o Configuration information that tells a supplier server about a consumer server is termed a replication agreement
                        - Includes the unit of replication
                        - The hostname and port of the remote server
                        - Scheduling information
                    o In essence, the replication agreement defines what is to be replicated, where it is to be sent and how it has to be done

                • The unit of replication
                    o Defined subtree
                        - Specify the DN at the top of a subtree and everything below will be replicated
                    o Filtered replication
                        - Select entries in a subtree based on objectclass definitions
                        - Select attribute types
                        - Replication holes!
                            • When performing filtered replication there may be holes in the tree
                            • Glue entries are placed there to fill the holes

                • Consistency and convergence
                    o Consistency describes how closely the content of replicated servers matches at any given time
                    o The supplier and the consumer have converged when they hold the same data
                • Incremental and total updates
                    o Incremental updates send the consumer only the data that it does not have
                    o Total updates are useful only when creating a server
                • Initial population of a replica
                    o The consumer contains no data when it is first replicated
                    o Techniques used
                        - X.500 Directory Information Shadowing Protocol (DISP)
                        - OpenLDAP uses LDAP itself to perform initial replication, by sending a series of delete operations to remove unwanted entries and a series of add operations to populate the replica

                • Replication strategies
                    o The term replication strategy refers to the way updates flow and how they interact when they are propagated to and from the servers
                    o Problem: updating all servers when a client modifies an entry on one server
                    o Three solutions
                        - Single master
                            • Replicas are all read-only
                            • Edits have to be done on the master server
                            • Has a single point of failure
                            • A client can write data to the master through chaining or referrals
                        - Floating master
                            • There is a single read/write server and all replicas are read-only, but if the read/write server goes down, an algorithm elects a new read/write server
                            • Complications arise when masters rejoin
                                o Requires an update conflict resolution policy
                            • Never used in directory products
                        - Multi-master replication
                            • More than one read/write server is available
                            • A client writes to any of these servers
                            • Servers must ensure that the updates are properly propagated
                            • Requires an update conflict resolution policy

                • Replication errors
                    o Errors such as network problems in the connection from the master server to the slave cause slurpd to requeue a modification
                    o Errors such as schema violations and inconsistent states require manual intervention
                    o Replication errors are written, together with the replication record, to a reject file

    • LDAP vs. SNMP
        o Connection-oriented vs. connectionless
        o Stateful vs. stateless
        o Complex vs. simple data structures
        o Complex vs. simple operations

    Disaster Recovery

    • Risk vs. consequence
        o Affordability
        o Manageability
        o Service and support
    • Component redundancy methods
        o RAID
        o Mirroring
            - Keeping an exact duplicate
        o Clustering
            - A group of servers that share work and may be able to back each other up if one server fails
    • Using multiple disks
        o In a multiple-drive system without striping, the I/O load is never perfectly balanced. Some drives will contain files that are frequently accessed and some may contain files that are rarely accessed.
        o To maximize throughput, the I/O load must be balanced across all the drives
    • Disk striping
        o Combines multiple drives into one logical storage unit
        o The stripes are interleaved in a rotating sequence so that the combined space is composed alternately of stripes from each drive
        o Striping can be done at byte level or in blocks
        o Stripe width
            - The number of parallel stripes that can be written to or read from simultaneously; equal to the number of disks in the array
        o Stripe size
            - The size of the stripes written to each disk
            - The smaller the size, the more pieces are required to store a file of a given size. This increases transfer performance but lowers seek performance.
            - The larger the size, the fewer drives are required to store a file of a given size, lowering transfer performance
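    The rotating placement of stripes can be sketched as simple arithmetic. The function below is an illustration under stated assumptions: block-level striping, a stripe size given in blocks, and no parity or redundancy. The name `locate_block` is made up.

```python
def locate_block(logical_block, stripe_size, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    stripe_size is the number of blocks written to one disk before moving
    to the next; num_disks is the stripe width.
    """
    stripe_index = logical_block // stripe_size   # which stripe, counted across the array
    disk = stripe_index % num_disks               # stripes rotate over the disks
    row = stripe_index // num_disks               # completed rotations so far
    offset = row * stripe_size + logical_block % stripe_size
    return disk, offset

# With 3 disks and 4-block stripes, consecutive stripes land on disks 0, 1, 2, 0, ...
for block in (0, 4, 8, 12, 13):
    print(block, locate_block(block, stripe_size=4, num_disks=3))
```

    A smaller stripe size spreads the same file over more disks, which is the trade-off between transfer and seek performance described above.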

        o Read/write performance
            - When you write data in a redundant environment, you must write to each redundant location (slow)
            - When you read data, you only need to read one of the locations
    • RAID
        o Data is reconstructed in the event of a failure
        o The number of drives in the array and the way data is split determine
            - The RAID level
            - The capacity
            - Overall performance
            - Data protection characteristics
        o RAID 0
            - Striped disk array without fault tolerance
            - Files are broken into stripe-size pieces and distributed to the separate disks in rotating order
            - Minimum of 2 drives
            - No fault tolerance
            - Enhances performance in either a request-rate-intensive or a transfer-rate-intensive environment
            - Best performance is achieved when data is striped across multiple controllers
            - Simple design and easy to implement
            - Not a true RAID because there is no redundancy. If one drive fails, all data is lost.

        o RAID 1
            - Mirroring and duplexing
            - Mirroring
                • Data is duplicated on two different drives
                • When a drive fails, the system continues on a single drive
            - Duplexing
                • Uses two drive controllers
            - Minimum of two drives
            - Capacity is that of the smallest drive
            - Read performance of a single drive, unless the controller can perform concurrent separate reads per mirrored pair
            - Write performance lower than a single drive
            - 100% redundancy of data means no rebuild is necessary in case of disk failure
            - Simplest RAID design
            - Disadvantages
                • High disk overhead
                • If the RAID function is done by system software, it will load the CPU and degrade throughput
                • If a single controller is used, it is still a single point of failure
        o RAID 2: ECC (error correction code)
            - 5 drives
            - Each bit of a data word is written to a different disk drive in the array
            - Each word has its Hamming code word recorded on the ECC disk
            - On a read, the ECC code verifies correct data or corrects single-disk errors
            - Advantages
                • On-the-fly error correction
                • High data transfer rates
                • Relatively simple controller
            - Disadvantages
                • Entry-level cost is very high
                • Most disk drives today offer embedded ECC information within each sector as standard, so RAID 2 has no significant advantages over RAID 3

        o RAID 3
            - Parallel transfer with parity
            - Data is distributed by bytes or bits, typically less than 1024 bytes per stripe
            - A parity check byte is created for each data stripe
            - As data is written, a parity check is calculated and stored on the parity disk
            - As data is read, the parity is recalculated and compared with the parity disk
            - If a drive goes down, the parity allows the missing data to be reconstructed
            - Minimum of 3 drives required
            - Can tolerate the loss of one drive
            - Disk failure has an insignificant impact on throughput
            - Controller design is fairly complex
            - Low ratio of parity disks to data disks means high efficiency
            - Very difficult and resource intensive to do as a software RAID
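    Parity-based reconstruction can be shown with byte-wise XOR, which is the usual way single-parity schemes are explained. This is a sketch of the idea only, with made-up function names and equal-length byte strings standing in for the stripes on each data disk.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(surviving_blocks, parity_block):
    """Recover the single missing block from the survivors and the parity block.

    XOR-ing the parity with every surviving block cancels them out and
    leaves exactly the missing block.
    """
    return parity(list(surviving_blocks) + [parity_block])

data = [b"abcd", b"efgh", b"ijkl"]     # stripes on three data disks
p = parity(data)                       # stored on the parity disk
lost = reconstruct([data[0], data[2]], p)
print(lost)                            # the content of the failed second disk
```

    The same calculation explains why only one failed drive can be tolerated: with two blocks missing, the XOR no longer isolates either of them.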

        o RAID 4
            - Independent data disks with a shared parity disk
            - Same as RAID 3, but data is distributed by blocks
            - Larger stripes mean that records can sometimes be read from a single disk
            - Very high read rate with large block sizes
            - Write performance is slow
            - Good for small-block transfers such as transaction processing applications (high reads, low writes)

        o RAID 5
            - Independent data disks with distributed parity blocks
            - Same as RAID 4, but with the parity information distributed in stripes across the array
            - Multiple concurrent reads provide better throughput
            - Most common RAID level
            - Minimum of 3 drives
            - Can tolerate the loss of one drive
            - High read transaction rate and slow write performance (can be improved with additional cache)
            - Low ratio of parity disks to data disks means high efficiency
        o RAID 6
            - Independent data disks with two independent distributed parity schemes
            - Same as RAID 5, but with the parity information distributed in stripes both across the array and down the array
            - Requires at least 4 disks
            - Very complex controller design
            - Controller overhead to compute parity addresses is extremely high
            - Requires N+2 drives to implement because of the two-dimensional parity scheme
            - Poor write performance
            - Useful for mission-critical applications

        o Nested RAID
            - A multiple (nested) RAID level is generally created by taking a number of disks and dividing them into sets
                • Within each set, a single RAID level is applied to form a number of arrays
                • Then the second RAID level is applied to the arrays to create a higher-level array
            - RAID 0+1
                • High data transfer performance
                • RAID 0 performance with RAID 1 data security
                • Most common nested RAID
                • Minimum of 4 drives, expands in twos
                • Has the same overhead for fault tolerance as mirroring alone
            - RAID 10
                • Very high reliability combined with high performance
                • Requires at least 4 disks, in multiples of two
                • Implemented as a striped array whose segments are RAID 1 arrays
                • Has the same fault tolerance as RAID 1
                • Has the same overhead for fault tolerance as mirroring alone
                • Very expensive
                • All drives must move in parallel to the appropriate track, lowering sustained performance

        o Types of RAID
            - Hardware fault tolerance is faster
            - Software fault tolerance is less expensive
            - A hardware fault tolerance solution might lock you into a single vendor
            - Some vendors may support hot-swappable disks when there is a failure
            - Fault tolerance does not reduce the need for backups
        o Terminology
            - Hot swap
                • Allows failed hard disks to be removed and replaced without powering down
            - Hot spare
                • Uses an additional, preconfigured disk or disks to automatically replace a failed hard disk

    • Clustering
        o A parallel or distributed system that consists of a collection of interconnected whole computers and is utilized as a single, unified computing resource
        o Two or more computers combined to perform a task which a single computer is not adequate to perform
        o The clients have no knowledge of the underlying hardware of the cluster
        o High-availability clustering
            - Two or more systems configured to monitor each other's health and to take action when a fault is detected
            - Most common causes of server failure
                • Disk and I/O system failure
                • Power outage
                • Software failure
                • Natural disasters
                • Human error
            - Used for key databases, file sharing on a network, business applications and customer services such as e-commerce websites

        o High-performance clustering
            - Splits a computational task across many different nodes
            - Distributed computing
                • Different parts of a program run simultaneously on two or more computers that are communicating with each other
            - Parallel computing
                • Simultaneous execution of the same task (split up) on multiple processors in order to obtain results faster
        o Improved scalability
        o Improved manageability
            - Cluster management consoles can monitor and configure numerous individual systems from a centralized location
            - An administrator can identify hot spots or failures across a wide range of applications and take preventive action
            - Advanced cluster consoles can distribute software patches to many nodes with a single command
        o Load balancing
            - Each node is configured to provide the same services
            - All requests come through a load-balancing front end, which distributes the work to a set of back-end servers
            - Primarily for improved performance
        o Cluster architecture
            - Shared disk
                • Independent systems share disk storage for files and databases
                • They have a shared I/O bus but don't share memory
                • Provides high availability
                • Data access has to be synchronized
                • Applications must be specialized for shared-disk clustering
                • The larger the number of nodes, the larger the locking protocol overhead
                • Shared databases
            - Mirror
                • Replicating all applications and data to a secondary storage location
            - Shared nothing


        o Tiered network architecture
            - Scale can be from 2 to thousands of nodes
            - Single tier
                • A 48-port switching configuration allows up to 48 nodes
                • Can be scaled by increasing the number of ports on the switch
                • Modular switches can scale well
                • Many clusters implement a dedicated network for inter-node communication
        o Shared-disk tiered system
            - 3-server cluster
                • When server 1 breaks, it can move its tasks to the other 2 servers
                  [Figure: failover of server 1's tasks to the other two servers]


        o Mirroring
            - Each server has its own disk
            - When a server does a write operation, it replicates the write to one or more configured servers
            - If that server fails, a secondary server can take over the primary position
            - Data mirroring cannot deliver availability as high as shared-disk clustering, as there is a small window in which the data on the two servers is not consistent
            - Requires additional disks; replicating RAID arrays can be very expensive
            - Takes CPU and network resources
            - When a server goes down and comes back up, it needs to be resynchronized

        o Shared-nothing cluster
            - Each server has its own disk
            - Servers monitor each other's health status and direct system recovery actions if something goes down
            - A node cannot access a disk it does not own unless the owner fails and gives up control
            - Failover time may be slow
            - No locking protocol overhead
            - Disadvantages
                • Tends to limit a user's storage options, as the storage hardware must be capable of being shared by multiple servers
                • Sharing of non-disk peripheral devices may not be supported
                • Having a dedicated cluster communication network is highly desirable and may require the purchase of additional network components
                • More complex to administer
    • Server load balancing
        o Algorithms
            - Round robin
                • Each request is sent to the next server in the pool
                • Assumes that all servers are equal
            - Weighted round robin
                • When servers have different performance, a fractional weight is given to each server and sessions are assigned using these ratios
                • Assumes that all workloads are equal
            - Least load algorithm
                • Based on the number of open sessions a server has, or on keeping track of response times and balancing on the fastest response
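    The three algorithms can be sketched in a few lines each. This is a simplified illustration with made-up server names: weights are assumed to be small positive integers rather than fractions, and the least-load variant uses only the open-session count, not response times.

```python
import itertools

def round_robin(servers):
    """Yield servers in turn, forever; treats all servers as equal."""
    return itertools.cycle(servers)

def weighted_round_robin(weights):
    """Yield servers in proportion to integer weights, e.g. {"big": 2, "small": 1}."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)

def least_load(open_sessions):
    """Pick the server that currently has the fewest open sessions."""
    return min(open_sessions, key=open_sessions.get)

rr = round_robin(["s1", "s2", "s3"])
print([next(rr) for _ in range(4)])

wrr = weighted_round_robin({"big": 2, "small": 1})
print([next(wrr) for _ in range(6)])

print(least_load({"s1": 5, "s2": 2, "s3": 7}))
```

    The weighted variant here sends consecutive requests to the heavier server; real balancers usually interleave them more smoothly, which this sketch does not attempt.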


    World Wide Web

    • Main components
        o Servers
        o Clients
        o Proxies
    • HTTP
        o HTTP is used to transmit information in multiple formats, languages and character sets based on MIME. The contents of an HTTP message body are treated blindly.
        o Global URI: HTTP uses Uniform Resource Identifiers in all its transactions to identify resources on the web
        o Request-response exchange: HTTP requests are sent by clients and responses are sent by servers
        o No state is maintained by clients or servers across requests and responses (v1.0)
        o Syntax
            - Request line (PUT /motd HTTP/1.0)
            - General/request/entity header(s)
                • General: (Date: Wed, 22 Mar 2000 08:10:07 GMT)
                • Request: (User-Agent: Mozilla/4.03)
                • Entity: (Content-Length: 23)
            - CRLF
            - Message body (optional)
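    The message syntax can be assembled by hand, as in this minimal sketch. The function name `build_request` and the 23-byte body are invented for illustration; the header values reuse the examples from the notes, and nothing is sent over the network.

```python
def build_request(method, path, headers, body=b""):
    """Assemble an HTTP/1.0 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an empty line separates headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

body = b"Welcome to the server!\n"          # 23 bytes, chosen to match Content-Length: 23
message = build_request(
    "PUT", "/motd",
    {
        "Date": "Wed, 22 Mar 2000 08:10:07 GMT",   # general header
        "User-Agent": "Mozilla/4.03",              # request header
        "Content-Length": str(len(body)),          # entity header
    },
    body,
)
print(message.decode("ascii"))
```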

        o HTTP/TCP interaction
            - HTTP has no defined transport protocol
            - Nearly every implementation of HTTP uses TCP
            - TCP timers
                • TCP relies on timers for operations such as
                    o Retransmission of lost packets
                    o Repeating the slow-start phase
                    o Reclaiming state from a terminated connection
            - Possible delays
                • TCP establishment
                    o Loss of a SYN or SYN-ACK packet introduces a longer delay
                    o The retransmission timer is doubled for each lost packet, so the delay increases
                    o A frustrated user might repeatedly click the stop and refresh buttons, which will increase congestion in the target network
                • Transmitting the HTTP request
                    o Less likely to occur because TCP gradually decreases the retransmission timer value, but most web transfers are very small and might spend all their time in the slow-start phase of congestion control
                    o With a small congestion window, the likelihood of successfully delivering multiple packets after a loss is low
                        - The receiver is unlikely to generate the 3 duplicate ACKs necessary to trigger a fast retransmission
                • Receiving the HTTP response
                • Rendering the resource
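    The cost of losing SYN packets during TCP establishment follows directly from the doubling timer. The sketch below adds up the waiting time; the 3-second initial timeout is only an assumed example value, and the function name is made up.

```python
def establishment_delay(initial_timeout, lost_packets):
    """Total time spent waiting before the first successful (re)transmission.

    Each lost packet costs one full timeout, and the timeout doubles
    after every loss.
    """
    delay, timeout = 0.0, float(initial_timeout)
    for _ in range(lost_packets):
        delay += timeout
        timeout *= 2
    return delay

# Assumed initial timeout of 3 seconds, for illustration only.
for losses in range(4):
    print(losses, "lost ->", establishment_delay(3, losses), "seconds of waiting")
```

    With these assumed numbers, three consecutive losses already cost 3 + 6 + 12 = 21 seconds, which is why a single lost SYN is so visible to the user.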

            - Persistent connections
                • A web client typically downloads multiple resources from the same website. With persistent connections, these HTTP transfers can travel over a single connection.
                • Sending a response message on a previously idle persistent connection generates a large burst of packets
                • To avoid overloading the network, the slow-start phase has to be restarted after a period of inactivity
            - Reducing the slow-start-restart (SSR) penalty
                • Disabling slow-start restart
                • Larger SSR timeout
                • Gradually decreasing the congestion window
                • Pacing the transmission of packets
                    o Don't send bursts; use a delay
            - TIME_WAIT state and its effect
                • Connections in the TIME_WAIT state consume memory in the OS
                • Reducing TIME_WAIT overhead
                    o Lowering resource requirements for TIME_WAIT connections
                    o Shifting responsibility to clients
            - Policies for closing persistent connections
                • Apply timeouts
                    o Small timeout after the first request, then increase the timeout after additional requests
                • Limit the total number of requests
                • Complicated policies
                    o Close the connection that has been idle longest
                    o Base the decision on the relative importance of the user


            - HTTP/TCP layering
                • Aborted HTTP transfers
                    o Because the HTTP protocol does not have a mechanism for terminating an ongoing transfer, aborting a request requires closing the underlying transport connection
                    o If a connection is terminated before the browser receives the response, the user may not know whether the server actually completed the request
                    o Aborting one request in the pipeline requires aborting all of the pipelined requests that have not been processed
                    o User-level abort operations do not immediately stop the transfer of data from the server
                • Nagle's algorithm
                    o Limits the number of small packets transmitted by a TCP sender, which may delay the transfer of the last packet of an HTTP message
                    o With Nagle's algorithm enabled, the last packet of an HTTP request or response is sent with a slight delay (once all outstanding ACKs are received)
                    o To disable Nagle's algorithm, set the TCP_NODELAY socket option
                        - Disadvantages:
                            • Consider a web server that performs a separate system call to write each line of the HTTP response
                            • Disabling Nagle's algorithm will then introduce a large amount of TCP overhead
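    Setting the option looks like this with Python's standard socket module. The helper names are invented; the second helper is one possible way to avoid the disadvantage above, by joining the response lines into a single buffer so that one write carries them instead of one system call per line.

```python
import socket

def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value for TCP_NODELAY turns Nagle's algorithm off.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

def serialize_response(lines):
    """Join response lines into one buffer, to be sent with a single sendall()."""
    return b"\r\n".join(lines) + b"\r\n"

sock = make_nodelay_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()

payload = serialize_response([b"HTTP/1.0 200 OK", b"Content-Length: 0", b""])
print(payload)
```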

                • Delayed acknowledgements
                    o The TCP receiver may delay the transmission of an ACK in the hope of piggybacking the acknowledgement on an outgoing data packet
                    o The receiver should not wait too long to send the ACK
                    o The sender may not be able to send additional data if the transfer is not acknowledged

            - Multiplexing TCP connections
                • Multiple connections to a server per client
                • Enables simultaneous downloading of embedded images
                • A proxy acting on behalf of multiple clients allows files to be retrieved in parallel
                • Problems
                    o Unfairness to other clients
                    o Higher load
                    o Higher perceived latency
                • Remedies
                    o Removing performance incentives for parallel connections
                    o Providing alternatives

    Web Traffic Analysis

    y 3 main steps
      o Monitoring web transfers at some location
      o Generating measurement records in a format
      o Pre-processing records for subsequent analysis
    y Motivation for web measurement
      o Content creators
      o Web hosting companies
      o Network operators
      o Web/networking researchers
    y Measurement techniques
      o Web software components can generate logs as part of handling requests
      o Traces of web traffic can be collected by passively monitoring the links and routers in the network

      o Methods
        Server Logging
          y Web server generates a log as part of processing client requests
          y A server log could be used to analyze access patterns and the popularity of web resources, BUT
            o the server does not know how many requests are satisfied by the caches
            o popular resources may be more likely to be returned from a cache
            o cache-busting can be used to circumvent such side effects
          y Associating requests with users is difficult
            o Requests could come from a proxy
            o Shared client machines
            o Dynamic IP addresses
            o Use cookies to track users

        Proxy Logging
          y Proxy log provides a record of requests to a wide range of websites
          y Can be used to determine the relative popularity of different sites and the effectiveness of policies for web caching
          y Limitations
            o Proxy does not see the requests that are satisfied by the browser cache or by other proxies closer to the user agents (UAs)
            o May not be representative of the rest of the clients in the web
        Client Logging
          y Has the potential to provide a detailed view of user browsing patterns
          y Compared to proxy and server logs, the user agent handles a relatively small number of requests at a time
          y Limitations
            o No de facto standard
            o A realistic study requires a large number of users

        Packet Monitoring
          y Client, proxy and server logging impose an overhead on software components
          y Application layer logs have little or no information about network activity at the TCP and IP level
          y Limitations
            o Cannot capture requests satisfied by the browser cache, or the contents of encrypted messages
            o More difficult with increasing link speed
            o Requires software for reconstructing HTTP messages from packet traces

        Active Measurement
          y Logs and packet traces typically do not include enough information to evaluate the performance experienced by users.
          y Alternative approach is to generate requests in a controlled manner and observe their performance
          y Issues
            o Where to locate the modified agents
            o What requests to generate
            o What measurements to collect
          y Limitations
            o Diversity and complexity of the internet make it difficult to draw general conclusions

            o Requires extensive experimentation over a wide range of times, locations and server sites
            o Necessitates a wide-scale, global measurement infrastructure
      o Common log format
        Remote host (IP address or hostname)
        Remote identity (account)
        Authenticated user (name provided by user)
        Time
        Request (first line of the HTTP request)
        Response code
        Content length
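The seven fields listed above can be pulled out of a log line with a regular expression. The following is a sketch under the assumption that the fields appear in that order, separated by spaces, with the time in square brackets and the request in double quotes. The names CLF_PATTERN, parse_clf and the sample line are made up for illustration.

```python
import re

# One named group per Common Log Format field, in the order listed above.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Return a dict of the seven fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the size field means no content length was logged.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical log line used only as an example input.
SAMPLE = ('192.0.2.7 - alice [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
```

Calling parse_clf(SAMPLE) yields the host, user, request line, status and size as separate values, which is the pre-processing step needed before any popularity or access-pattern analysis.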

      o Limitations of HTTP header information
        Logs capture only the request line and the response code, not the header fields
      o Detecting resource modifications
        Useful for analyzing changes
        Statistics about resource modifications can be inferred from HTTP headers, response sizes and timestamps
          y Age = request timestamp - last modified
    Web Caching

      o Movement of web content close to users
      o Goals of caching
        Reduce user-experienced latency between the time of the web request and the time the response is displayed in the browser
        Reduce the load on the network
        Reduce the load on the origin server

      o First major technique that attempted to reduce user-perceived latency
      o Why cache?
        Reduce bandwidth costs
        Enhance user experience
        Frees network resources for other data
        Reduce delay
      o Where is caching done?
        Web browser cache, to re-fetch pages the user examined during the same session
        Caching proxy

        Regional caches can help several geographically collocated caches in one or more administrative domains
        Reverse proxies
          y Act as a front end to one or more origin servers and are placed close to the server. They act on behalf of the server.
        Interception proxies
          y Intercept HTTP requests and explicitly receive responses
          y Placed close to clients
          y Do not have to be directly in the path of packets

      o How is caching done?
        Deciding whether a message is cacheable
          y Protocol requirements that prevent caching
          y Is the content dynamic?
          y Is the cached response likely to be used again?
          y Will the decision to cache a particular response lead to replacement of one or more resources?

        Cache replacement and storing the response in cache
          y Cache replacement introduces some overhead, especially if many small cached objects must be removed
          y Resources known to be stale are evicted from a cache even if the cache is not full.
          y The best-known policy is the LRU (least recently used) approach
          y Factors of complex approaches
            o Cost of fetching the resource
            o Cost of storing the resource
            o Number of accesses to the resource
            o Probability of the resource being accessed in the future
            o Time since the last modification
            o Heuristic expiration
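As an illustration of the LRU approach mentioned above, here is a small sketch built on Python's OrderedDict. The class name LRUCache and the choice to count capacity in number of entries (rather than bytes) are assumptions made for brevity; a real web cache would also track object sizes and staleness.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` responses; evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, url):
        """Return the cached response for url, or None on a miss."""
        if url not in self._items:
            return None
        self._items.move_to_end(url)  # a hit makes the entry most recent
        return self._items[url]

    def put(self, url, response):
        """Store a response, evicting the least recently used entry if full."""
        if url in self._items:
            self._items.move_to_end(url)
        self._items[url] = response
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry
```

With a capacity of two, storing /a and /b, reading /a, and then storing /c evicts /b, because /b is the entry that was touched least recently.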

        Cache replacement policies
          y Least recently used
          y Least frequently used
          y Key based policies
            o Size of objects (ditch the biggest)
            o Lowest latency first
            o Hyper-G
              First consider LFU
              If there is a tie use LRU

              If it's still tied use size
          y Cost based policies
            o Size adjusted LRU
              Ranks objects according to their cost to size ratio
            o Greedy dual
              Similar to size adjusted LRU, objects in a GD cache are ranked according to an assigned utility value specified by the application. Objects with the lowest utility are evicted first. However, every eviction operation lowers the utility of the other objects by the utility value of the evicted object
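The greedy dual description above can be sketched directly. This is a naive version that follows the wording of the notes (evict the lowest utility, then lower every remaining utility by that amount). The class name GreedyDualCache, the entry-count capacity, and the rule that a hit resets an object to the utility supplied by the caller are assumptions for illustration.

```python
class GreedyDualCache:
    """Naive greedy dual sketch: O(n) work per eviction, for clarity only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._utility = {}  # key -> current (possibly lowered) utility

    def access(self, key, utility):
        """Record an access to key; return the evicted key or None."""
        evicted = None
        if key not in self._utility and len(self._utility) >= self.capacity:
            # Evict the object with the lowest current utility.
            evicted = min(self._utility, key=self._utility.get)
            dropped = self._utility.pop(evicted)
            # Every eviction lowers the utility of the remaining objects.
            for other in self._utility:
                self._utility[other] -= dropped
        # A newly stored or re-accessed object gets its full utility.
        self._utility[key] = utility
        return evicted

    def current_utility(self, key):
        """Return the current utility of a cached key."""
        return self._utility[key]
```

The effect is that objects which are not accessed again gradually lose utility with each eviction, so even a high-cost object is eventually removed if it is never reused.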

      o Cache coherency
        A cache has to ensure that a stored response is still fresh before returning it to a client
        A cache may return an older cached value along with a reason for the staleness of the response (e.g. unable to contact the origin server)
        HTTP/1.1 provides several ways to maintain cache coherence
          y Cache-Control: the only-if-cached directive forces the proxy to return a cached response without revalidation
          y If-Modified-Since request header: tells the proxy/server to send a full response if and only if the modification time of the resource is later than the time specified
          y Expires response header: informs the client when a resource is expected to go stale
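The If-Modified-Since rule above amounts to a date comparison on the server or proxy. A minimal sketch follows, assuming the resource's modification time is available as a timezone-aware datetime. The function name conditional_status is hypothetical, and an unparsable header is simply ignored here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 304 if the resource is unchanged since the header date, else 200.

    last_modified must be a timezone-aware datetime; if_modified_since is the
    raw header value (an HTTP-date string) or None when the header is absent.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: treat the request as unconditional
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP-dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: the cached copy can be reused
    return 200
```

A cache revalidating its copy would send the stored Last-Modified value as If-Modified-Since, for example format_datetime(last_modified, usegmt=True), and keep the copy when the answer is 304.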

        Weak consistency approaches
          y Lease based approach
            o Cache agrees to store a response for a fixed amount of time
            o This shifts the overhead of revalidation to the server, which must keep track of all the caching proxies it promised to reply to
            o Not scalable and never implemented
          y TTL approach
            o Responses have a cache expiration time associated with them
            o During the TTL period, the cache does not revalidate the response, at the risk of staleness.
            o TTL value can be based on
              Frequency of requests for a cached resource
              Mobile environment
              Last-Modified header
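One way to base a TTL on the Last-Modified header is to assume that a resource which has not changed for a long time will stay unchanged for a while longer. The sketch below takes one tenth of the time since the last modification, capped at one day. Both numbers, and the names heuristic_ttl and is_fresh, are illustrative assumptions and not values taken from these notes.

```python
from datetime import datetime, timedelta, timezone


def heuristic_ttl(response_time, last_modified, divisor=10,
                  max_ttl=timedelta(days=1)):
    """Estimate a TTL as a fraction of the time since the last modification."""
    unchanged_for = response_time - last_modified
    if unchanged_for <= timedelta(0):
        return timedelta(0)  # modified "in the future": do not trust it
    return min(unchanged_for / divisor, max_ttl)


def is_fresh(now, stored_at, ttl):
    """True while the cached response is still inside its TTL period."""
    return now - stored_at < ttl
```

During the TTL period the cache answers from its copy without contacting the origin server, which is exactly where the risk of staleness comes from.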

      o Cache related protocols
        If a set of caches is arranged in a hierarchy, a cache can contact other caches at the same level to see if the requested object is present in them.
        Protocols are needed to reduce inter-cache communication costs
          y Internet Cache Protocol (ICP)
            o UDP-based protocol for querying peer caches for a cached copy of a particular response
            o Used in Squid
            o Lends itself well to hierarchies
              Requests are propagated towards the origin server
              ICP requests have to be translated to HTTP if the request is routed back to the web server
              When a response comes back from the origin server, the intermediate caches can store the response for future use
              Although regional and national proxies can help a large number of users, each application layer hop can degrade performance

          y Cache Array Resolution Protocol (CARP)
            o Defines a mechanism by which a set of caching proxies can effectively function as a single logical cache
            o A hash function is used to partition the URLs across the caches
              A client trying to locate a cached response can target the request to an appropriate cache by applying the function
              The hash function uses the request URL and the identity of the proxy members to construct a resolution path
              Compared with ICP, CARP has a deterministic request resolution path
            o Disadvantages
              Load balancing
              When the configuration of the caching system changes, the cached URLs must be reassigned
              Responses cannot exist on more than one proxy
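The idea of hashing the request URL together with the identity of each proxy can be sketched as follows. This is not the hash function defined by CARP itself. It swaps in SHA-256 and a highest-score-wins rule purely to show how every client can compute the same target cache without querying peers. The names pick_proxy and _score and the proxy hostnames are hypothetical.

```python
import hashlib


def _score(proxy, url):
    """Combine proxy identity and URL into one integer score."""
    digest = hashlib.sha256((proxy + "|" + url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_proxy(url, proxies):
    """Return the member of the array responsible for this URL.

    Every client that knows the same member list picks the same proxy,
    so the resolution path is deterministic.
    """
    return max(proxies, key=lambda proxy: _score(proxy, url))
```

Because each URL maps to exactly one member, a response is stored on only one proxy, and a change in the member list moves some URLs to a different proxy, which matches the disadvantages listed above.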

          y Cache Digest Protocol (CDP)
            o Permits the exchange of a digest of a cache's contents
            o When a cache has a digest of all its peers, it can quickly check the digests first to see if the object of interest is available
            o Disadvantage: false positives.
            o CDP is an extension to ICP
          y Web Cache Coordination Protocol
            o Loosely connected to the network layer
            o Purpose is to intercept the HTTP request and redirect it to the cache engine
            o Performed by a cache coordinator
            o Implemented as part of the Cisco Cache Engine

      o Impediments of caching
        Cache busting
          y An origin server has no way of knowing if a proxy is serving stale resources
          y Origin servers have no way of knowing the number of times a resource has been viewed at the cache
          y Solutions
            o Hit metering
              New HTTP header named Meter, which would be used by a cache to report to the origin server the number of cache hits for a resource
              Not commercially used due to lack of confidence
            o Ad-insertion
              Suggests that a proxy add advertisements to a page to relieve the origin server
            o Clear GIFs
              Also known as web bugs; placed on a web page to serve as a container for an advertisement

        Privacy issues
          y Responses can be cached against the wishes of the origin server
          y The Cache-Control request and response directive no-store prevents a message from being cached along the path.
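A cache honouring no-store has to look at the Cache-Control value of both the request and the response. The sketch below uses a deliberately simplified parser that splits on commas and ignores quoted arguments. The function names cache_control_directives and may_store are hypothetical.

```python
def cache_control_directives(header_value):
    """Return the set of lower-cased directive names in a Cache-Control value."""
    directives = set()
    for part in header_value.split(","):
        # Keep only the directive name, dropping any "=value" argument.
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)
    return directives


def may_store(request_cache_control, response_cache_control):
    """False if either side of the exchange carries the no-store directive."""
    return (
        "no-store" not in cache_control_directives(request_cache_control)
        and "no-store" not in cache_control_directives(response_cache_control)
    )
```

A proxy would call may_store before writing a response to its cache and skip storage whenever it returns False.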

      o Caching versus Replication
        Advantages of replication
          y Updating resources in replication happens outside the HTTP protocol
          y The list of replicas is known beforehand

      o Content distribution
        Selective, fine-grained mirroring
        Akamai's content distribution solution:
          y A site that wishes to have parts of its resources distributed by Akamai agrees to rename those URLs with a specific prefix
          y Resolving the hostname string using DNS yields the IP address of the mirror