QoS in Clustered Environments
Overview
• Introduction
• Routing Mechanisms
• Approaches to QoS in Clusters
• Conclusions
Introduction
• Networked applications inject different mixes of traffic into the network.
• Some classes of traffic require QoS treatment.
• The traditional best-effort model cannot handle such QoS demands.
Cluster Systems
• Cost-effective for high-performance environments
– Used in scientific computing, web servers, multimedia servers, and commercial applications
• Two switch/router designs are used to build clusters:
– Virtual cut-through (includes wormhole)
– Packet switching
• Virtual cut-through
– Designed for multicomputers
– Offers low latency and high bandwidth for best-effort traffic
– To support QoS, the switch must be modified
• Packet switching (e.g., ATM)
– QoS support is available for real-time traffic
– Cannot handle best-effort traffic efficiently due to high message latency (compared to virtual cut-through)
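The latency gap between the two designs can be made concrete with a contention-free model, which is a common textbook simplification and an assumption here, not a model of any particular ATM or cluster switch. With store-and-forward, every switch waits for the whole packet before forwarding it; with cut-through, only the header pays the per-hop cost and the body is pipelined behind it. The function names and parameter values below are hypothetical.

```python
# Contention-free latency model (textbook simplification, assumption):
# ignores queuing, link contention, and ATM segmentation into cells.


def store_and_forward_latency(hops: int, packet_bits: int,
                              bandwidth_bps: float, routing_delay_s: float) -> float:
    """Every switch receives the entire packet before forwarding it."""
    return hops * (routing_delay_s + packet_bits / bandwidth_bps)


def cut_through_latency(hops: int, packet_bits: int, bandwidth_bps: float,
                        routing_delay_s: float, header_bits: int) -> float:
    """Only the header pays the per-hop cost; the body is pipelined behind it."""
    per_hop = routing_delay_s + header_bits / bandwidth_bps
    return hops * per_hop + packet_bits / bandwidth_bps


if __name__ == "__main__":
    # Hypothetical parameters: 4 hops, 1 KB packet, 1 Gb/s links, 50 ns routing delay.
    args = dict(hops=4, packet_bits=8192, bandwidth_bps=1e9, routing_delay_s=50e-9)
    saf = store_and_forward_latency(**args)
    vct = cut_through_latency(**args, header_bits=32)
    print(f"store-and-forward: {saf * 1e6:.3f} us, cut-through: {vct * 1e6:.3f} us")
```

With these numbers, store-and-forward pays the full 8.192 µs serialization time at every hop (about 33 µs total), while cut-through pays it once (about 8.5 µs), which is why latency grows much faster with distance in the packet-switched design.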
Bottom Line
• Must reevaluate and optimize the network architecture to handle both types of traffic, best-effort and QoS, in clustered environments.
Routing Mechanisms
• Virtual cut-through & wormhole:
– A packet is composed of small flits (flow control units)
– A header flit leads, and the remaining flits follow in a pipelined manner
– As soon as the header is received at a switch, it is forwarded to the outgoing channel
– If the outgoing channel is busy:
• Virtual cut-through: the whole packet is buffered at the switch
• Wormhole: the packet stalls in place, with a few flits buffered at each of several switches
– Each worm carries its own routing information:
• A virtual channel can therefore carry multiple connections
• Virtual channels and physical links are shared resources
• Real-time applications require predictable scheduling of such resources
• A global priority ordering must be enforced among competing messages
• Example of a limitation:
– Assume that at time t a message of priority p (the highest priority present at that time) occupies a virtual channel
– If another message arrives with priority p′ > p, it must wait until the priority-p message releases the channel
– Limitation: with v virtual channels, only v priority levels can be enforced, although messages may have more priority levels than that
• Pipelined circuit switching (PCS):
– Similar to wormhole switching in its use of flits
– Connection oriented: the header flit first tries to reserve the whole path
– If the path is blocked, the header must backtrack and search for another one
– The remaining flits follow once a path has been reserved
– If no path can be found, the connection is dropped
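The PCS path setup described above can be sketched as a depth-first search that reserves one virtual channel per hop and releases it whenever the header backtracks. This is a minimal sketch assuming an exhaustive backtracking search over a small graph; real PCS routers use specific bounded backtracking protocols that the slides do not detail, and the function name, graph, and VC counts are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]


def pcs_probe(graph: Dict[str, List[str]], free_vcs: Dict[Link, int],
              src: str, dst: str) -> Optional[List[str]]:
    """Reserve one virtual channel per hop from src to dst, backtracking on blockage.

    Returns the reserved path, or None if the connection must be dropped.
    free_vcs is updated in place: reservations on the final path stay held,
    and every hop the header backtracks over is released again.
    """
    path = [src]
    visited = {src}

    def advance(node: str) -> bool:
        if node == dst:
            return True
        for nxt in graph.get(node, []):
            link = (node, nxt)
            if nxt in visited or free_vcs.get(link, 0) == 0:
                continue                 # channel busy (or would loop): try another
            free_vcs[link] -= 1          # header reserves a VC on this link
            visited.add(nxt)
            path.append(nxt)
            if advance(nxt):
                return True
            path.pop()                   # blocked further on: backtrack
            visited.discard(nxt)
            free_vcs[link] += 1          # and release the reservation
        return False

    return path if advance(src) else None


if __name__ == "__main__":
    # Hypothetical 4-node network; link (B, D) has no free virtual channel.
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    free = {("A", "B"): 1, ("B", "D"): 0, ("A", "C"): 1, ("C", "D"): 1}
    print(pcs_probe(graph, free, "A", "D"))  # backtracks out of B, reserves A-C-D
    print(pcs_probe(graph, free, "A", "D"))  # no free VC on any path: dropped
```

The second probe illustrates the drawback noted above: once the virtual channels on every route are taken, the connection is dropped rather than queued.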
Classification of QoS Approaches
• Virtual circuits:
– Paths are virtualized and controlled locally at switches
– Based on the QoS parameters, a separate VC is created for which buffer space and link bandwidth are reserved
– To guarantee end-to-end QoS, switches are responsible for scheduling packets
– Flexible in terms of providing QoS
– Requires large buffers and complex scheduling algorithms
– Increases the hardware complexity of switches
• Physical circuits:
– No virtualization, so switch design is simpler
– A link arbitration policy is used to exert some control over delay and bandwidth
– The policy merges multiple streams onto a physical link
– This couples the streams sharing the link
– The service a QoS stream receives depends on the other traffic flows sharing the link
– Inflexible in managing network resources to support QoS
• Global Scheduling:
– Complexity is moved out of the switches and into the network interfaces
– Switches are much simpler and faster
– Network interfaces are augmented with special hardware responsible for:
• Routing
• Timing the injection of packets into the network
• Negotiating shared resources with other NICs
– Relatively new approach
• Practicality, scalability, synchronization cost, and scheduling remain open issues
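The coupling problem of physical circuits can be illustrated with a simple arbiter. The slides do not name the link arbitration policy, so round-robin is assumed here purely for illustration; the function and stream names are hypothetical. The same QoS stream receives the whole link when it is alone and only a third of it when two other backlogged streams share the link.

```python
from collections import deque
from typing import Deque, Dict, List


def round_robin_link(queues: Dict[str, Deque[int]], cycles: int) -> List[str]:
    """Serve one packet per cycle, rotating over the streams that have packets.

    Returns the name of the stream served in each cycle ("idle" if none).
    """
    order = list(queues)
    served: List[str] = []
    turn = 0  # index of the stream that gets the first chance next cycle
    for _ in range(cycles):
        for i in range(len(order)):
            name = order[(turn + i) % len(order)]
            if queues[name]:
                queues[name].popleft()
                served.append(name)
                turn = (turn + i + 1) % len(order)  # next stream goes first next time
                break
        else:
            served.append("idle")
    return served


if __name__ == "__main__":
    alone = {"qos": deque(range(9))}
    shared = {"qos": deque(range(9)), "bulk1": deque(range(9)), "bulk2": deque(range(9))}
    print(round_robin_link(alone, 9).count("qos"))   # QoS stream gets every cycle
    print(round_robin_link(shared, 9).count("qos"))  # same stream, one third of the link
```

Nothing about the QoS stream changed between the two runs; its bandwidth fell only because other flows were merged onto the same link, which is the dependence the slide describes.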
QoS in Packet Switching Networks
• Rotating Combined Queuing (RCQ):
– Low-cost queuing and scheduling algorithm
– Provides QoS support in multicomputers and point-to-point LANs
– The switch model supports:
• Connection-based switching: routing is decided and bandwidth is reserved at connection setup time
• Output queuing: packets arriving simultaneously for an output link are queued and scheduled for transmission (reduces head-of-line blocking)
• RCQ:
– Reduces queuing cost by combining multiple decoupled queues
• e.g., combines the queues allocated to a few connections with low traffic and large delay bounds
– Supports best-effort traffic using multiple FIFO queues per port
– Uses frame-based scheduling
• Each connection is allocated a number of packet slots per frame (a fixed period of time)
– Extra queues allow a sender to transmit at a higher rate than it reserved
– The queuing structure allows real-time traffic to bypass best-effort traffic
– Permits best-effort traffic to use bandwidth left unused by other connections
How Does It Work?
• Each arriving packet is enqueued in the queue pointed to by its connection's current input queue pointer
• If the connection has reached its maximum number of allowed packets in that queue, its input pointer moves to the next queue
• In each idle cycle of the output channel, a pending packet is sent from the queue pointed to by the output queue pointer
• If that queue has no packets to transmit, the output queue pointer moves to the next queue and the same step is repeated
• Idle connections always set their input queue pointer to the current output queue pointer
• When a QoS packet arrives on an idle connection, it is therefore enqueued in the queue currently pointed to by the output queue pointer, so it is delayed only by the packets ahead of it in that queue
• Guarantees a worst-case delay of one frame time per switch
• The end-to-end worst-case delay is bounded by the distance (number of hops) multiplied by the frame time
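The steps above can be sketched for a single output port. This is a minimal sketch under several assumptions: only the rotating QoS queues are modeled (the best-effort FIFO queues are omitted), each queue stands for one frame's worth of slots, and a packet whose connection is already a full rotation ahead of the output pointer is rejected, a policy the slides do not specify. The class, method, and connection names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class RCQPort:
    """One output port with K rotating combined queues (QoS traffic only)."""

    def __init__(self, num_queues: int, slots_per_queue: Dict[str, int]) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(num_queues)]
        self.slots = slots_per_queue                     # max packets per connection per queue
        self.in_ptr = {c: 0 for c in slots_per_queue}    # input queue pointer per connection
        self.in_count = {c: 0 for c in slots_per_queue}  # packets placed in the in_ptr queue
        self.queued = {c: 0 for c in slots_per_queue}    # packets still waiting at this port
        self.out_ptr = 0                                 # output queue pointer

    def enqueue(self, conn: str) -> bool:
        """Place an arriving packet; returns False if it cannot be buffered."""
        k = len(self.queues)
        if self.queued[conn] == 0:
            # Idle connection: point at the queue currently being served.
            self.in_ptr[conn] = self.out_ptr
            self.in_count[conn] = 0
        if self.in_count[conn] == self.slots[conn]:
            nxt = (self.in_ptr[conn] + 1) % k
            if nxt == self.out_ptr:
                return False  # already a full rotation ahead (assumed policy: reject)
            self.in_ptr[conn] = nxt
            self.in_count[conn] = 0
        self.queues[self.in_ptr[conn]].append(conn)
        self.in_count[conn] += 1
        self.queued[conn] += 1
        return True

    def transmit(self) -> Optional[str]:
        """One idle output cycle: send from the current queue, rotating past empty ones."""
        for _ in range(len(self.queues)):
            queue = self.queues[self.out_ptr]
            if queue:
                conn = queue.popleft()
                self.queued[conn] -= 1
                return conn
            self.out_ptr = (self.out_ptr + 1) % len(self.queues)
        return None  # nothing pending at this port


def end_to_end_bound(hops: int, frame_time_s: float) -> float:
    """Worst-case bound from the slides: distance (hops) x frame time."""
    return hops * frame_time_s


if __name__ == "__main__":
    # Hypothetical port: 3 queues; "video" may put 2 packets per queue, "audio" 1.
    port = RCQPort(3, {"video": 2, "audio": 1})
    for conn in ["video", "video", "video", "audio", "audio"]:
        port.enqueue(conn)
    print([port.transmit() for _ in range(5)])
    print(end_to_end_bound(hops=5, frame_time_s=10e-6))
```

In the usage example, the third "video" packet and the second "audio" packet exceed their per-queue allocation and spill into the next queue, so they are sent only after the current queue drains. Sharing a few rotating queues among many connections, instead of keeping one queue per connection, is what keeps the hardware cost low.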
QoS in PCS Networks
• Wormhole switching may suffer from message blocking, whereas PCS does not
• PCS is connection oriented
– Bandwidth can be reserved at connection setup
• Requires a VC per connection
– Thus, a large number of virtual channels per physical channel (PC) is needed to achieve high link bandwidth
– Switch hardware must support a number of VCs ≥ the maximum number of simultaneous streams in the network
– Otherwise, new connections are not guaranteed, and streams may be dropped
• To support QoS in PCS, a preemption protocol is used for real-time traffic
• Higher-priority messages can preempt lower-priority messages on a virtual channel
• Blocking then occurs only for a low-priority message competing with a high-priority one
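The effect of preemption on the priority limitation from the routing slides can be sketched with a single physical channel split into a fixed number of virtual channels. This is a minimal sketch assuming that a larger number means higher priority and that the lowest-priority holder is the one preempted; what happens to the preempted message (buffered, retried, or torn down) is not specified on the slides, so the sketch simply returns it to the caller. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Message:
    name: str
    priority: int  # larger value = more urgent (assumption)


class PhysicalChannel:
    """A physical link multiplexed into a fixed number of virtual channels."""

    def __init__(self, num_vcs: int, preemptive: bool) -> None:
        self.vcs: List[Optional[Message]] = [None] * num_vcs
        self.preemptive = preemptive

    def request(self, msg: Message) -> Tuple[bool, Optional[Message]]:
        """Try to acquire a VC. Returns (granted, preempted_message)."""
        for i, holder in enumerate(self.vcs):
            if holder is None:
                self.vcs[i] = msg
                return True, None
        if not self.preemptive:
            return False, None  # must wait until some VC is released
        # All VCs busy: pick the lowest-priority holder and preempt it
        # only if the new message strictly outranks it.
        victim_idx = min(range(len(self.vcs)), key=lambda i: self.vcs[i].priority)
        victim = self.vcs[victim_idx]
        if victim.priority >= msg.priority:
            return False, None
        self.vcs[victim_idx] = msg
        return True, victim

    def release(self, msg: Message) -> None:
        for i, holder in enumerate(self.vcs):
            if holder is msg:
                self.vcs[i] = None
                return
        raise ValueError(f"{msg.name} does not hold a virtual channel")


if __name__ == "__main__":
    low, high = Message("bulk", 1), Message("video", 5)
    fifo = PhysicalChannel(1, preemptive=False)
    fifo.request(low)
    print(fifo.request(high))  # high-priority message must wait
    pre = PhysicalChannel(1, preemptive=True)
    pre.request(low)
    print(pre.request(high))   # high-priority message takes over the VC
```

Without preemption, the higher-priority message waits behind the lower-priority one exactly as in the p′ > p example; with preemption, only a lower-priority request arriving at a channel held by a higher-priority message is blocked.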
QoS in Wormhole-Switched Networks
• SuperNet project:
– QoS using a separate subnet:
• Costly in terms of the number of host interfaces
– Imposing a synchronous structure on an asynchronous network:
• Large overhead for small messages
– Virtual channels:
• Better than the two approaches above
• Requires complex scheduling and extra buffer space at switches
Continued
• MediaWorm:
– A wormhole-based router that supports QoS
– Supports two traffic classes: best-effort and QoS
– Instead of FIFO, uses a rate-based algorithm called Virtual Clock to schedule network resources
– Virtual Clock regulates the bandwidth of each connection by assigning it a value vtick, by which the connection's virtual clock advances at each packet arrival
– A higher bandwidth is represented by a smaller vtick
• Example:
– A message requires 50K flits/s
– Its header flit carries a vtick of 1/50K s (20 µs)
– The header flit presents this value to every router it passes through until it reaches the destination
– Thus, no explicit connection setup is needed
– For best-effort traffic, vtick is set to ∞, since such traffic has the maximum slack
– Virtual Clock algorithm can improve QoS delivered to real time traffic compared to FIFO
• MediaWorm can perform as well as a PCS router without dropping any connections
• PCS would be expected to perform better since it is connection oriented, yet it drops connections
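The Virtual Clock rule can be sketched at packet granularity. This minimal sketch assumes the commonly stated form of the algorithm: on each arrival, a flow's clock becomes max(now, clock) + vtick, the packet is stamped with that value, and packets are served in increasing stamp order. MediaWorm applies the idea to flits inside a wormhole router, which this sketch does not model; the class, method, and flow names are hypothetical, and a requested rate of 0 is used here to mean best-effort (vtick = ∞).

```python
import heapq
import itertools
import math
from typing import Dict, List, Tuple


class VirtualClockScheduler:
    """Rate-based scheduler: serve packets in increasing virtual-clock stamp order."""

    def __init__(self) -> None:
        self.vtick: Dict[str, float] = {}
        self.clock: Dict[str, float] = {}
        self.heap: List[Tuple[float, int, str]] = []
        self.seq = itertools.count()  # FIFO tie-break for equal stamps

    def add_flow(self, flow: str, rate_per_s: float) -> None:
        # vtick is the reciprocal of the requested rate; rate 0 means best effort.
        self.vtick[flow] = 1.0 / rate_per_s if rate_per_s > 0 else math.inf
        self.clock[flow] = 0.0

    def arrive(self, flow: str, now: float) -> None:
        # Advance the flow's virtual clock by vtick and stamp the packet with it.
        stamp = max(now, self.clock[flow]) + self.vtick[flow]
        self.clock[flow] = stamp
        heapq.heappush(self.heap, (stamp, next(self.seq), flow))

    def next_packet(self) -> str:
        return heapq.heappop(self.heap)[2]


if __name__ == "__main__":
    sched = VirtualClockScheduler()
    sched.add_flow("video", 50_000)  # vtick = 1/50K s = 20 us, as in the example
    sched.add_flow("audio", 10_000)  # vtick = 100 us
    sched.add_flow("bulk", 0)        # best effort: vtick = infinity
    for flow in ["bulk", "video", "audio", "video", "video"]:
        sched.arrive(flow, now=0.0)
    print([sched.next_packet() for _ in range(5)])
```

Even though the best-effort packet arrived first, its infinite stamp puts it behind every real-time packet, while the three video packets (stamps 20, 40, and 60 µs) are served before the single audio packet (100 µs), reflecting their smaller vtick.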
Conclusions
• For cluster systems, wormhole-like routing seems to be popular
• Supporting QoS is a challenge
• Several approaches were reviewed
• Use of virtual channels with a preemptive protocol to enforce priority among network traffic is a promising technique