Hierarchical resource allocation for robust in-home video streaming
Peter van der Stok1,2, Dmitri Jarnikov1,2, Sergei Kozlov1, Michael van Hartskamp2, Johan Lukkien1
1, Eindhoven, Technical University, Netherlands; 2, Philips Research, Eindhoven, Netherlands
Abstract. High-quality video streaming places high demands on network and processor
resources. The limited bandwidth of the communication medium and the need for timely arrival of the frames
necessitate tight resource allocation. Given a dynamic environment in which videos are
started and stopped and electromagnetic perturbations affect the bandwidth of the wireless
medium, a framework is needed that reacts promptly to changes in network load and
network operating conditions. This paper describes a hierarchical framework that
handles dynamic network resource allocation in a timely manner.
1. Introduction
Today, TVs, video recorders and set-top boxes are mostly interconnected with SCART cables. The advent of
broadband and the introduction of the second PC in the home make digital home networks a more realistic
way to interconnect Consumer Electronic devices. This tendency moves video away from dedicated media to
video streaming across an open and shared network connecting multiple types of devices (e.g. phones, PCs,
and CE-devices). This new multimedia environment introduces the problem of sharing limited network
resources between video- and other applications. Both the need for resources, expressed as bit rate, and the
availability of resources, expressed as bandwidth, fluctuate within intervals of tens of milliseconds. In
addition there is a need for a timely delivery of the video at the destination. The source of the video can be
either live (broadband TV or a camera) or taken from a storage medium. For a live transmission, a low
overall delay from generation to displaying is mandatory, which imposes strict timeliness requirements. The
source can be located inside the home, outside, or connected through some gateway (e.g. a broadband
connection). The quality of the source video can vary from relatively poor - in the order of tens of kbits/s - for
use on a small display, to high quality - in the order of 6 to 40 Mbits/s - for use on a large screen flat TV.
The sharing of the network medium among several applications leads to a lower bandwidth available to a
given video. In addition, a significant part of the home network will be based on wireless technology. The
stok-elsevier-jss-4 1
consequences of wireless connections are reduced security and bandwidth, as well as increased fluctuation of
the bandwidth through interference with other transmission sources and the moving of objects.
Not meeting the resource and timeliness requirements leads to non-optimal viewing experiences in the form
of distortion, hiccups, delayed viewing or stalling. To avoid these severe quality changes (leading to people
refusing to buy networks and TVs) we propose a scheme that allocates the resources in such a manner that
under overload a tolerable quality degradation occurs such that recognizable video is provided at all times.
The scheme combines the video-source, the video coding, and the transport protocol and is especially
advantageous for live broadcasts. It distinguishes fast fluctuations at the frame level (≈ 40 ms) from structural
fluctuations. Fast fluctuations are caused by variations in the frame sizes and distortions in the bandwidth.
Structural fluctuations last longer and come for example from the starting/stopping of another application.
It is not sufficient to come up with technologically viable solutions. Because many manufacturers provide
network equipment and CE-devices, inter-operability must be assured. The situation in the home is very
different from telephone- or service provider networks. The networks for the latter are under control of one
operator who decides the resource management procedures and technology. In the home there is no such
authority, and standardization must assure that the provided technology collaborates to support the policies
wished by the users of the home network.
2. Related work
Most CE-devices (TV, DVD) are resource-constrained to keep them in an acceptable price range. For
telephones, the resource constraints are mostly driven by battery power. The work on video
streaming most related to ours falls into four areas: (1) scheduling of packets on the link,
(2) adaptation of processing power to video requirements in the renderer, (3) multicasting video with a
bit rate adapted to the individual capacities of the receivers, and (4) transporting video over a network.
Scheduling packets. The packets of all applications, which share the network, need to be scheduled. It is
assumed that a network authority (as foreseen by UPnP QoS standardization[32][33]) allocates bandwidth to
the individual applications. To prevent buffer overflows and provide a balanced loading of the network,
packets are scheduled such that the consumed bandwidth is not exceeded and the network load is evenly
distributed in time. The unscheduled packets are viewed as interference to the scheduled applications. Leaky
bucket and token bucket techniques are two well-known examples [2]. Generalized Processor Sharing (GPS) is an
ideal scheduling mechanism, which is applied to networks. Two examples show how scheduling techniques
can allocate bandwidth to streams or separate asynchronous traffic from synchronous (video) traffic [22][23].
Processing power in CE-devices. In [7] and [17] the observation is made that the CPU needs for a video
stream fluctuate from frame to frame and from scene to scene. A distinction is made between the fast
fluctuation at the frame level, and the slower fluctuation at the video scene level. The concept of Scalable
Video Algorithm (SVA) is introduced to adapt the quality of the decoding process to control the CPU
requirements. In this framework it is possible to provide the highest quality while still meeting the deadline of
each frame. In [8] the authors describe the allocation of priorities to video frames such that important video
frames, on which other frames depend, have a higher probability of acquiring CPU resources to maintain the
highest possible video quality under processing overload.
Multicasting. Devices have different processor/memory capabilities, thus not every device may be capable of
processing all video data that is streamed by the sender. To ensure that all devices process video according to
their capacity, a sender sends to each destination the amount of data that the device can successfully process.
In [16] the sender adapts the content. There are several strategies for content adaptation. The three foremost
of those strategies are the simulcast model, transcoding and scalable video coding model. With the simulcast
approach [16] the sender produces several independently encoded copies of the same video content, which
differ in e.g. bit/frame rates, and spatial resolutions. The sender delivers these copies to the destinations, in
agreement with the specifications coming from the destination.
Video transport. Two protocols are generally deployed for the transport of video over a network with the
Internet protocol (IP) facilities defined by Internet Engineering Task Force (IETF): Transmission Control
Protocol (TCP) [20], and Real-time Transport Protocol (RTP) [19]. RTP is very successful on the Internet to
support live video but the quality displayed on the PC is far below the quality accepted by buyers of digital
TV. RTP promotes timely arrival of packets by allowing loss of packets. Efforts are ongoing to extend RTP
with retransmission facilities as provided by TCP [34]. TCP provides loss-less transport of packets but badly
supports live streaming over the Internet. The TCP-RTM protocol is an effort to provide timeliness to audio
packets by skipping late packets [21].
Fluctuating bandwidth on the wireless medium causes the quality of the rendered video to
fluctuate. In [14] a controller at the receiver side selects the
transmitted video parts such that these quality fluctuations are reduced.
Standardization for the home.
Only a few years ago, IEEE 1394 [25] and HiperLAN [26] were considered as promising candidates for
connectivity in consumer electronics home networking as they offered timely delivery of packets and are
therefore suited for multimedia transport. Now wired and wireless Ethernet are recognized as the
predominant connectivity standards. The advantage is that only one technology is used for all networking
applications in the home (e.g. file transport, chatting, audio, and video). Yet Ethernet offers only best-effort delivery, with no
timeliness guarantees. The IEEE 802.11e standard [27], which provides extensions to wireless Ethernet,
offers prioritized and scheduled access. It also offers several other enhancements. Before the IEEE 802.11e
standard was completed in 2005, the Wi-Fi Alliance had started a certification program for wireless
multimedia based on IEEE 802.11e. The program for prioritized access, called Wi-Fi Multimedia (WMM)
[28], was completed in 2004. The Wi-Fi Alliance is currently working on a certification program for
scheduled access. Several other connectivity technologies have recently been developed that include
scheduled access: WiMedia [29], HomePlug AV [30], etc. Even for wired Ethernet, the Ethernet AV initiative
[31] intends to improve its timeliness properties.
As it is expected that home networks remain heterogeneous in their connectivity technologies, middleware
solutions are developed to deal with this heterogeneity. One of the more popular middleware technologies for
the home is UPnP [32]. The UPnP forum standardizes so-called device control protocols for e.g. AV
applications, Internet Gateway devices, but also for Quality of Service (QoS). The UPnP-QoS v1 and
ongoing v2 [33] specifications define the use of priority-based policies. Currently work in UPnP-QoS is
ongoing to develop version 3 on parameterized QoS and scheduled access. The UPnP-QoS makes it possible
to share bandwidth in accordance with application quality criteria. As such it becomes possible to share the
network between different types of applications. For example, the real-time aspects of audio and video
streaming can be guaranteed at the expense of delays for file transport.
The Digital Living Network Alliance (DLNA) is an industry forum that provides interoperability guidelines
to implement digital media servers, players and renderers [34]. Guidelines are written on the use of wired and
wireless Ethernet and Bluetooth, TCP/IP and UPnP. TCP is the mandatory transport protocol for AV content.
The DLNA also defines various profiles for AV media formats [35].
3. Video transport
This section motivates our choice of TCP and the deployment of scalable video coding. Figure 1 shows the
most important features of the network configurations we consider.
[Figure 1, panel a): sender, switch, AP, receiver 1 ... receiver N. Panel b): sender detail with application (real-time video, stored video), TCP and traffic shaper; video data and IP data.]
Figure 1 Example network configuration
The sender contains a video application, which sends stored or live video to a destination on the network. It
invokes a transport protocol, which packs the video frames in packets and the traffic shaper sends the packets
in a regular fashion to the network. The network is composed of a wired (switched Ethernet) part and a
wireless part. Packets are buffered and sent on in the switch and the Access Point (AP). Losses due to buffer
overflow may occur at the sender, the switch, the AP and the receiver. For today’s wired segment there is
almost no packet loss. In addition, measurements showed that losses over the wireless segment do not occur
when the retry counter is set to 4 or higher [24]. Consequently we may assume that packet loss over the
communication media in the home is negligible, and most of the time losses occur due to buffer overflow.
3.1. Transport protocols
When unreliable transport protocols (such as RTP [19]) are used for sending multimedia streams over a large
network (the Internet), it is very difficult to control the data losses happening due to congestion in the routers
or due to the low reliability of the medium, e.g. a wireless link. Usual practice to handle such losses is to use
error recovery mechanisms at the receiver and/or redundancy coding at the sender. However, these
mechanisms are often combined with some content adaptation technique, which uses the feedback from the
network or receiver to inform the sender about the losses. This makes the system difficult to implement.
TCP, being a reliable transport protocol, eliminates uncontrollable losses of data. Applications built upon
TCP see the network as reliable transport means with varying throughput. Nevertheless, if at some moment
the application needs to send more data than TCP can deliver (the network bandwidth drops below the bit rate
of a live encoded stream), loss of data can happen due to application/TCP buffer overflows. Introducing
larger buffers may decrease the probability of buffer over- and under-flows, because often the network
throughput drops below the video bit-rate only for a short time, after which the “recovery” takes place. The
larger the buffers, the longer the periods of insufficient bandwidth can go unnoticed by the end-user. The cost
for the large buffers is increased latency (the time needed for a unit of data – e.g. a video frame – to be
transferred from the sender to the receiver). Latency of more than 200 ms, which corresponds to buffering of
5-7 frames, is not acceptable in real-time video applications. Keeping the buffers small limits the amplitude
and duration of the bandwidth variations that can be handled, so some data is lost in the buffers. However, these losses are easy to control by
applying buffer management techniques that differ from the default Tail-Drop technique. Such
techniques as Partial Buffer Sharing (PBS) and Triggered Buffer Sharing (TBS) [4], as well as Push out
Buffers (POB) [5] are based on dropping lower priority data to accommodate higher priority data when the
buffer cannot accommodate both. Implementations of buffer management techniques in the video-streaming
domain address frame skipping approaches and scalable video coding methods (see below).
TCP selection. The major drawback of TCP is its stalling behavior and its slow start-up after an end-to-end
packet loss. However our measurements (section 5.1) indicated that end-to-end packet losses occur rarely in
home networks and are always immediately restored by acknowledgements resent within 20 ms.
Consequently, stalling behavior is almost completely eliminated. On the positive side, the flow control of
TCP adapts the bit rates of the packets to the bandwidth availability. For live video, packets are lost in the
sender buffer of the application, but the same application has access to this buffer and can decide which parts
can be removed. In contrast, RTP just goes on sending packets leading to uncontrolled losses.
3.2. Video frames
An MPEG-2 video stream is built up of I, P and B frames. Each frame represents one picture. An I-frame
contains enough information to be decoded independently. A P-frame needs additional information from a
directly preceding I-frame or P-frame. Motion vectors describe how a part of the referenced picture must be
moved for a correct visualization in the frame to decode. B-frames need additional information from two
frames, a succeeding P- or I-frame and another preceding P- or I-frame. The video is structured in Groups of
Pictures (GOP), containing one I-frame followed by a sequence of B- and P-frames. Examples of legal GOPs
are I(I), IPP(I), IBPBPB(I) or IBBPBBPBB(I). The (I) denotes the start of the next GOP.
A scalable video coding scheme describes an encoding of video frames into multiple layers, including a Base
Layer (BL) of basic quality and several Enhancement Layers (EL) containing increasingly more video data to
enhance the quality of the base layer and thus resulting in video of increasingly higher quality [18]. Scalable
video coding is represented by a variety of methods that could be applied to many existing video coding
standards [10][11][12]. These methods are based on principles of temporal, signal-to-noise ratio (SNR),
spatial and data partition scalability [15]. In our framework we use a specific form of temporal scalability that
we call I-Frame Delay (IFD) and a form of SNR scalability that is resistant against packet losses.
I-Frame Delay. IFD represents a temporal scaling technique. When the network bandwidth drops below the
bit rate of the video, temporal scalability decreases the bit rate of the video by dropping video frames without
influencing the quality of the surviving frames. A reasonably low amount of dropped frames might not be
noticeable by the end-user. However, dropping frames arbitrarily (as would happen with Tail-Drop buffer
handling) is not a good idea, because the impact of a dropped MPEG frame on the end-user
perceived quality depends on the frame type (I, P, or B). We use the frame type to guide the
frame-skipping process as follows: when the sender buffer gets full, IFD will push the B frames out of the buffer
first, and then, if the buffer is still full (i.e. the bandwidth has dropped significantly for a longer period), the P and I frames.
The cumulative weight of B frames in an MPEG-2 stream often comes to 50% or more. This means that by
only dropping all B frames we can make the resulting example video stream fit into a bandwidth that is half
the bit rate of the original stream, still preserving inter-frame dependencies. The price for this will be a
decreased frame-rate - in the case of an IBBPBB(I) GOP structure, all B frames dropped would lead to 1/3 of
the original frame-rate and 1/2 of the original bit rate.
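The arithmetic of this example can be checked with a small sketch. The function name `after_dropping_b` and the 50% B-frame share passed as `b_share` are illustrative assumptions; the share is taken from the "50% and more" estimate above, not measured.

```python
from fractions import Fraction

def after_dropping_b(gop: str, b_share: Fraction):
    """Return (fraction of frame rate kept, fraction of bit rate kept)
    when every B frame of the GOP is dropped.

    gop     -- frame types of one GOP, e.g. "IBBPBB"
    b_share -- assumed share of the stream's bits carried by B frames
    """
    kept_frames = sum(1 for frame in gop if frame != "B")
    frame_rate_kept = Fraction(kept_frames, len(gop))
    bit_rate_kept = 1 - b_share
    return frame_rate_kept, bit_rate_kept

frame_rate, bit_rate = after_dropping_b("IBBPBB", Fraction(1, 2))
print(frame_rate, bit_rate)
```

For the IBBPBB(I) structure, two of the six frames survive, which gives the 1/3 frame rate quoted in the text.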
SNR scalability. In Figure 2 two possible structures for the enhancement layer are shown. The arrows
indicate the dependence of the frames on each other. A cross suggests the loss of a particular frame in a layer.
A horizontal thick arrow indicates the loss of a frame in a layer dependent on the lost part indicated with a
cross. The BL has a normal standard GOP structure. Using a GOP structure that has P and B frames in the EL
is dangerous from a reliability point of view. If the network condition is bad, there is a high risk of losing a
frame during the transmission. In this case if the lost frame is I or P, the receiver will not be able to decode
the rest of the GOP, which will lead to a considerable loss of frames in current and upper enhancement layers
(Figure 2a). In our coding scheme, the enhancement layer is formed from the residuals of the frames from the
base layer. This means there is no dependency between different frames inside the enhancement layer. Therefore the
loss of any frame from an enhancement layer will not influence subsequent frames (Figure 2b).
Figure 2 Two SNR enhancement layer structures
Choosing TCP together with IFD or SNR scalable video makes it possible to remove a selected part of the
video at the source. The percentage of video to be removed at the source is determined by the bandwidth.
3.3. Example management
Without too much loss of generality we explain the framework techniques by looking at the transmission of a
continuous stream over the network of Figure 1. The wireless channel retransmits packets until they arrive at the
receiver, simultaneously adjusting the bit rate of the channel depending on the packet loss rate.
Figure 3 bandwidth fluctuations for a given stream plus control indication.
Figure 3 shows an example of the bandwidth fluctuations as perceived by a TCP stream sent at maximum
packet rate. During the first 5 seconds the TCP stream is the only user of the link. From 5 to 10 seconds an
additional file transport shares the wireless link; from 10 to 15 seconds interference is added; from 15
to 20 seconds the second stream stops; and finally, after 20 seconds, the interference stops as well. Every
40 ms the number of arrived bits is measured and divided by 40 ms to obtain the effective bit rate of the TCP
stream with a sampling interval of 40 ms. Using Figure 3 some important observations can be made.
1. To exploit the available bandwidth to its fullest, the video bit rate curve should retrace the measured
curve shown in Figure 3. However, this is impossible in practice due to, among others, the variable
bit rate of the video, use of fixed-sized video layers, inertia of the transcoder, lack of calculation
power etc. Even when the video bit rate follows the available bandwidth, the end-user is confronted
with an unpleasant perception of frequent quality changes [3]. Therefore, the notion of quality level is
introduced. In our case, for simplicity, quality level is fully determined by the video bit rate.
2. We could change the quality level based on feedback, which triggers the source to change to another
quality level when appropriate. In Figure 3 we base this triggering on the changes of the two average
bandwidth values denoted with dashed and solid lines. However, answering the question of when the
quality level should change does not answer the question of which value it should
change to. A worst-case level (dashed line) or a more optimistic guaranteed level (solid line) are both possible.
3. Using the pessimistic dashed line, the video gets through with maximum probability. However, the
drawback is the low effectiveness of the bandwidth usage. In Figure 3 we see that due to the
fluctuations in the intervals [0,5) and [20,25), the worst-case dashed line is 1 Mbit/s below the
measured bit rate, while in interval [9,14) the bandwidth fluctuation becomes so high, that the worst
case scenario brings us no video at all.
The solid line depicts a video quality level close to the measured value. The price for this is an occasional
loss of data. Two techniques are used to keep the effects of losses low: (1) layered scalable video and (2) I-Frame
Delay (IFD).
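The 40 ms sampling and the two candidate quality levels discussed in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the pessimistic level is modelled as the minimum of the samples in a window and the optimistic level as their mean. The text does not specify how the dashed and solid lines of Figure 3 were actually computed.

```python
def effective_bit_rates(bits_per_interval, interval_ms=40):
    """Convert the number of bits received in each sampling interval
    into an effective bit rate in bit/s."""
    return [bits * 1000 / interval_ms for bits in bits_per_interval]

def candidate_levels(rates):
    """Return (pessimistic, optimistic) quality levels for a window of
    samples. Assumption: minimum and mean stand in for the dashed and
    solid lines."""
    pessimistic = min(rates)
    optimistic = sum(rates) / len(rates)
    return pessimistic, optimistic

# Hypothetical bit counts for three consecutive 40 ms intervals.
rates = effective_bit_rates([200_000, 160_000, 240_000])
print(candidate_levels(rates))
```

Streaming at the pessimistic level wastes the gap between the two values; streaming at the optimistic level exceeds the available bandwidth in some intervals, which is where layered video and IFD come in.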
4. Management framework
The management is hierarchically ordered. At the highest level, bandwidth allocation is done to permit
bandwidth sharing between videos. At the next level, the bit rate of the video is adapted to the available
bandwidth. At the lowest level less important packets are dropped to assure that more important packets
arrive in time at the destination.
Figure 4 Sender refinement
Figure 4 shows the structure of the sender. The original video enters the application. Inside the application a
transcoder transforms the single layer video into a layered scalable video. The bandwidth allocation algorithm
specifies the sum of the sizes of the layers. The size of the individual layers and the number of layers are
determined by the bandwidth fluctuations. The video layers are presented to TCP, which fragments the
frames into packets. A traffic shaper outputs the packets on the link.
4.1. Slow fluctuations
Two types of slow fluctuations are considered: (1) user interaction to increase or decrease the quality of a
video stream or to start/stop a video stream, and (2) adaptation of the sizes of the video layers in response to
changes in bandwidth availability for example coming from physical stimuli.
User interaction. The UPnP QoS working groups prescribe the elements, which distribute the bandwidth
allocation decisions over the network. Each device holds a QoSDevice module, which receives from the
UPnP QoS manager instructions on the bandwidth it may use (see Figure 4). The traffic shaper takes time
windows in which it sends packets according to the prescription received from the QoS manager.
Layer configurations. Two modules in our framework are involved in determining the number of layers and
the size of each layer: (1) scalable transcoder, and (2) layer configurator (see Figure 4).
Scalable transcoder. The transcoder converts non-scalable video into multi-layered scalable video. The layer
configuration may be changed at run-time. The input to the transcoder is provided via a reliable channel, thus
assuming that there are no losses or delays in the incoming stream.
Layer configurator. The layer configurator chooses the number and bit-rates of layers based on the acquired
information about network conditions and the receiver’s decoding capability. The network information is used to
estimate the currently available network bandwidth, fluctuations and errors. The receiver’s decoding
capability is used to define the maximal number of layers that can be handled by the receiver.
4.2. Fast fluctuations
The scalable video solution makes it possible to react to fast changes by dropping enhancement layers for a
given frame. The layers should be chosen such that the bit-rate of BL is less than the available bandwidth to
assure that BL always arrives. Taking this to its logical consequence means a very small BL. A small BL
yields a very low quality that is difficult to repair with larger EL [1]. Therefore, the BL is chosen as high as
possible. To counteract the fast bandwidth fluctuations two components are employed (1) I-Frame Delay
(IFD) algorithm and (2) layered frame scheduler.
IFD. Our experiments show that impressive improvements (compared to the default Tail-Drop technique) can
be achieved with only two buffers, each of which accommodates one video frame. Let us mark the frames in the
buffer as follows: S – the frame that is being transmitted (and is partially sent), W – the frame waiting in the
buffer. The frame offered for transmission by the application is marked as C. When W is
present, which means that we cannot buffer any more frames, and C arrives from the application, the
scheduling algorithm discards whichever of the two frames (C and W) is less important for the end-quality.
The following algorithm favors I frames and P frames over B frames:
WHILE (TRUE) DO
    WHILE (C is empty) DO Nothing
    IF (W is empty) THEN Store C in W
    ELSE
        IF (C is of type I) THEN Overwrite W with C
        ELSE IF (C is of type B) Discard C
        ELSE IF (W is of type I or P) Discard C
        ELSE Overwrite W with C
A Boolean is added to the algorithm to drop all frames to the next I-frame when a P-frame is dropped.
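The decision rules above can be written as a small Python sketch. The class `IFDBuffer` and its method names are hypothetical, and the handling of the Boolean (`skip_to_next_i`) is one plausible reading of the sentence above: once an arriving P-frame is discarded, every following frame is discarded until the next I-frame arrives.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IFDBuffer:
    waiting: Optional[str] = None   # W: type of the frame waiting in the buffer
    skip_to_next_i: bool = False    # assumed form of the added Boolean

    def offer(self, c: str) -> Optional[str]:
        """Offer frame C ("I", "P" or "B"); return the type of the frame
        that is discarded, or None when nothing is discarded."""
        if self.skip_to_next_i and c != "I":
            return c                          # still waiting for the next I-frame
        if c == "I":
            self.skip_to_next_i = False       # a new GOP starts
        if self.waiting is None:
            self.waiting = c                  # store C in W
            return None
        if c == "I":
            dropped, self.waiting = self.waiting, c   # overwrite W with C
        elif c == "B" or self.waiting in ("I", "P"):
            dropped = c                       # discard C
            if c == "P":
                self.skip_to_next_i = True    # later frames depend on this P
        else:
            dropped, self.waiting = self.waiting, c   # W is a B frame: overwrite
        return dropped

    def transmit(self) -> Optional[str]:
        """Move W to the transmitter, freeing the waiting slot."""
        sent, self.waiting = self.waiting, None
        return sent

buf = IFDBuffer()
print([buf.offer(f) for f in "IBPB"])
```

In this run the I-frame is stored, the B-frame and the P-frame are discarded because an I-frame is waiting, and the last B-frame is discarded because the P-frame it would depend on was lost.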
[Figure 5: two flowcharts. Transmission (left): Start with BL; Check buffer; Is buffer empty?; Choose next layer; Take packet; Is packet outdated?; Drop packet; Send packet. Filling (right): Is buffer full?; Is it BL?; Check BL buffer; Delete packets in buffer for a frame with lowest priority; Put packet into buffer.]
Figure 5 Transmission of packets from sender buffer (left), and filling sender buffer (right)
Frame scheduler. The scheduling combines the layered scalable video with the IFD temporal approach at
the sender. IFD and layered scalable video can be used independently and in isolation. The combination
supports larger fluctuations in bandwidth. However, there is a minimum bandwidth of 1 Mbit/s, associated
with the 802.11 technology. Layers of a scalable video are sent according to a priority scheme. Since BL
information is absolutely necessary, the BL has the highest priority. The priority of each EL decreases with
increasing layer number. When a frame from ELx is being transmitted and a frame from BL arrives, the
sender sends the BL frame after the transmission of the current packet belonging to ELx (if any). When a
frame from ELx arrives, it preempts a frame from ELy (where y>x) in a similar fashion (see Figure 5).
When the channel bandwidth has become lower than the total video bit rate, the sender buffer gets full. To
prevent sending late packets, we introduce a maximum lifetime for EL packets. If the maximum is reached,
the packet is deleted from the buffer. BL packets are removed by IFD independent of their lifetime.
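The priority scheme and the lifetime rule for EL packets can be sketched as follows. This is an illustration under assumptions: the per-layer queues, the `(enqueue_time, payload)` tuples and the function name `next_packet` are hypothetical, and preemption is modelled at packet granularity by re-evaluating the priorities before every send. BL packets are never aged out here, since they are handled by IFD.

```python
from collections import deque

def next_packet(queues, now, max_lifetime):
    """Pick the next packet to send.

    queues[0] holds BL packets and queues[k] holds ELk packets; each entry
    is an (enqueue_time, payload) tuple. Lower index means higher priority.
    Outdated EL packets are dropped on the way.
    """
    for layer, queue in enumerate(queues):
        while queue:
            enqueued_at, payload = queue.popleft()
            if layer > 0 and now - enqueued_at > max_lifetime:
                continue                  # outdated EL packet: drop it
            return layer, payload
    return None                           # all buffers are empty

queues = [deque(), deque([(0.0, "el1-old"), (0.09, "el1-new")]), deque([(0.09, "el2")])]
print(next_packet(queues, now=0.1, max_lifetime=0.04))
```

Calling the function once per transmitted packet means a newly arrived BL packet is sent right after the packet currently in flight, which matches the behaviour described for ELx and BL above.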
4.3. Choosing layers
The layer configurator uses a table that is created off-line to choose the most appropriate layer configuration
as a function of the network conditions. For a predefined set of network conditions we estimate (1) the loss
probability per layer for each layer configuration, and (2) the average quality that can be delivered by this
layer configuration (by looking at loss probabilities and calculating the SNR quality of the video). A fixed
maximum number of layers is used per network condition. If the decoding capacity of the receiver is lower
than the suggested BL-value, part of the bit-rate of BL is reassigned to the first EL. For example, if an
optimal configuration for a given network condition yields a BL of 4 Mbps, an EL1 of 2 Mbps, and an EL2 of
2 Mbps, and due to device requirements the BL should be limited to 1 Mbps, then the BL bit-rate is set to 1
Mbps and the EL1 bit-rate is set to 5 Mbps.
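The reassignment rule of this example can be expressed in a few lines. The function name `cap_base_layer` is hypothetical, and leaving a configuration without any EL unchanged is an assumption, since the text does not cover that case.

```python
def cap_base_layer(layer_rates_mbps, max_bl_mbps):
    """Limit the BL bit-rate to what the receiver can decode and move the
    excess to the first EL. layer_rates_mbps is [BL, EL1, EL2, ...]."""
    bl, *els = layer_rates_mbps
    if bl <= max_bl_mbps or not els:
        return list(layer_rates_mbps)     # nothing to reassign
    excess = bl - max_bl_mbps
    return [max_bl_mbps, els[0] + excess, *els[1:]]

print(cap_base_layer([4, 2, 2], 1))
```

The total bit-rate of the configuration is unchanged; only the split between BL and EL1 moves.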
Offline, a network simulation environment creates strategies for the layer configurator as shown in Figure 6.
The environment consists of five major components: frame size generator, packet generator,
sender/prioritizer, wireless channel simulator, and receiver/quality calculator. The frame size generator
produces normally distributed random values for frame sizes based on the stream bit-rate, assuming that the
mean size of a frame in a stream is equal to the bit-rate divided by frame rate. These values are passed to the
packet generator, which formats an incoming data stream into a set of packets based on video stream syntax
and the network protocol specification. The packets are buffered and sent over in accordance with their
priority by the sender/prioritizer. In accordance with MAC level retransmissions of 802.11-like protocols, we
allow a fixed number of retransmissions for a packet that is lost. The module also uses a maximum lifetime
for packets, so the outdated packets are deleted from the buffer. The Gilbert model [13] was used for the
insertion of errors into the transmission channel of the channel simulation module. The module, based on the
description of the network condition, expressed in average available bandwidth, error rate and burstiness of
errors, calculates the amount of error-free packets, dropped packets and frames. Corrupted packets are
dropped (a packet is considered corrupted if at least one bit of the packet is wrong). A complete frame is
dropped when at least one packet of the frame is dropped.
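The channel simulation step can be illustrated with a common two-state formulation of the Gilbert model, in which errors occur only in the bad state. This is a sketch, not the simulator used for the paper: the function names and the parameters `p_good_to_bad`, `p_bad_to_good` and `error_in_bad` are hypothetical, and retransmissions and packet lifetimes are left out.

```python
import random

def gilbert_errors(n_packets, p_good_to_bad, p_bad_to_good, error_in_bad, rng):
    """Return one boolean per packet; True marks a corrupted packet.
    Burstiness grows as p_bad_to_good shrinks."""
    bad = False
    corrupted = []
    for _ in range(n_packets):
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        corrupted.append(bad and rng.random() < error_in_bad)
    return corrupted

def dropped_frames(corrupted, packets_per_frame):
    """A complete frame is dropped when at least one of its packets is dropped."""
    frames, start = [], 0
    for count in packets_per_frame:
        frames.append(any(corrupted[start:start + count]))
        start += count
    return frames

rng = random.Random(1)                    # fixed seed for a repeatable run
errors = gilbert_errors(12, 0.1, 0.5, 0.8, rng)
print(dropped_frames(errors, [4, 4, 4]))
```

Because one corrupted packet removes the whole frame, frame loss rates are higher than packet loss rates, and bursty errors tend to hurt fewer frames than the same number of isolated errors.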
[Figure 6: pipeline Frame size generator, Packet generator, Sender / prioritizer, Wireless channel simulator, Receiver / quality calculator. Data passed between stages: frame sizes; number and size of packets per stream; packets with priorities; packet and frame error rates. INPUT: layer configuration (number of layers, bit-rates of layers); network conditions (average bandwidth, error rate, burstiness); packetization scheme; buffer sizes, packet lifetime, number of retransmissions. OUTPUT: average PSNR as a function of network condition and layer configuration.]
Figure 6 Network simulation environment (input in italic is implementation specific)
Finally, the packets of a given frame, transmitted over the channel, are merged together into a single frame in
the receiver/quality calculator module. The receiver computes how many times corresponding frames from
different layers are transmitted successfully. Based on these values and knowing (from predefined data) the
mapping between layer size and quality expressed in average Peak Signal to Noise Ratio (PSNR) the module
calculates average PSNR for the received video. The layer configuration with the highest average PSNR is
considered to be the optimal for the given network condition.
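The final selection step can be sketched as follows. The function names, the PSNR table and the per-frame layer counts are made-up illustration values, not results from the paper; in particular, scoring a frame whose BL is lost with a PSNR of 0 dB is an assumption.

```python
def average_psnr(layers_received, psnr_by_layer_count):
    """layers_received[i] is the number of consecutive layers (BL first)
    that arrived for frame i; psnr_by_layer_count[k] is the PSNR obtained
    when k layers are decoded."""
    total = sum(psnr_by_layer_count[k] for k in layers_received)
    return total / len(layers_received)

def best_configuration(psnr_per_configuration):
    """Return the layer configuration with the highest average PSNR."""
    return max(psnr_per_configuration, key=psnr_per_configuration.get)

# Hypothetical outcome of two simulated configurations for one network condition.
results = {
    "BL 2 + EL 2": average_psnr([2, 2, 1, 2], [0.0, 30.0, 34.0]),
    "BL 4":        average_psnr([1, 0, 1, 1], [0.0, 34.0]),
}
print(best_configuration(results))
```

Repeating this for every predefined network condition yields the off-line table that the layer configurator consults at run-time.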
4.4. Interoperability
It is important that the framework not only solves the technical requirement of showing the best possible
video as function of transmission conditions and receiver capacity, but also provides a high level of
interoperability. The UPnP and DLNA standards and recommendations govern all interaction between
devices on the network. The global problem of sharing network resources between applications is solved
within the context of the standard. The problem of optimizing perceived video quality is solved entirely
within the sender. Consequently, it is possible to apply the framework solutions within an interoperable
framework, still allowing the manufacturers to improve the quality of their own senders.
5. Evaluation
Evaluation is done in two parts. Section 5.1 shows the validity of our choice of TCP. Section 5.2 shows how
the framework handles the variations in bandwidth coming from fluctuating operational conditions.
5.1. Video streaming with TCP over wireless medium
The measurements presented in this paper are a selection from the measurements described in [24]. The
measurement setup is as follows. A PC with a wireless card is used as sender. The PC sends video over IEEE
802.11b to an Access Point (AP). The AP is connected through switched Ethernet to the receiver PC, which
renders the video. The following transmission protocols are compared: (A) non-blocking UDP, (B) blocking
UDP, (C) RTP as specified in RFC 2250 [9], which describes the packetization of MPEG video, over blocking UDP,
(D) TCP and (E) TCP with IFD. An MPEG-2 video with a duration of 60 seconds was streamed from sender to
renderer at four different bit rates: 3, 4, 5 and 6 Mbit/s. The MPEG-2 video specifies that 25 frames are sent
per second, i.e. one frame every 40 ms.
For all transmission protocols, frames are sent at the bit rate of the video or at a rate limited by the
bandwidth of the wireless channel. When the bandwidth is smaller than the bit rate of the video, the effective
bit rate is reduced at the sender, so that the duration of the video increases beyond the original 60 seconds
for transmission protocols B, C, and D. For transmission protocol A (non-blocking UDP), packets are lost
inside the driver of the sender in an uncontrolled fashion, which limits the duration to 60 seconds. Transmission
protocol A is rejected for that reason. For transmission protocol E (TCP with IFD) the rendered video
duration remains equal to 60 seconds while losses are controlled. This is explained in more detail below.
The maximum effective transmission rate is 6 Mbit/s for B, 4.5 Mbit/s for C and 5 Mbit/s for D. The
effective transmission rate of protocol C (RTP, the official standardized video protocol) is lower than that of B
and even D (TCP) because the packet overhead is larger and not all packets are completely filled.
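The stretching of the video duration for protocols B, C and D follows from simple arithmetic. The sketch below uses the maximum effective rates quoted above and assumes, as an idealization that is not part of the measurements, that the sender is limited to that rate for the entire video; the function and variable names are illustrative.

```python
# Maximum effective transmission rates reported above, in Mbit/s.
MAX_RATE_MBITS = {"B": 6.0, "C": 4.5, "D": 5.0}
VIDEO_SECONDS = 60.0  # nominal duration of the test video


def stretched_duration(video_rate_mbits: float, max_rate_mbits: float) -> float:
    """Transmission time in seconds when the sender is throttled to max_rate_mbits.

    Assumption: the limit applies during the whole video, so the duration grows
    by the ratio between the video bit rate and the achievable rate.
    """
    slowdown = max(1.0, video_rate_mbits / max_rate_mbits)
    return VIDEO_SECONDS * slowdown


# Durations for the four video bit rates (3, 4, 5 and 6 Mbit/s) per protocol.
durations = {
    protocol: [round(stretched_duration(rate, limit), 1) for rate in (3.0, 4.0, 5.0, 6.0)]
    for protocol, limit in MAX_RATE_MBITS.items()
}
```

Under this idealization the 6 Mbit/s video would take 80 seconds with protocol C and 72 seconds with protocol D, whereas protocol B would still finish in 60 seconds.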
Figure 7 (a) Throughput for TCP versus wireless retry value and (b) latency of TCP versus video bit rate
One of the properties of the wireless link is that the sending of a packet is immediately acknowledged at the
link layer. When the sender receives no acknowledgement, the wireless transmission rate is lowered and the
packet is resent. The retry value determines the number of times a packet can be sent before it is considered
definitively lost. Figure 7(a) shows the throughput of TCP for different retry values of the wireless link. The
horizontal axis shows the time during video transmission. The microwave is switched on after 20 seconds
and switched off after 40 seconds. When the retry value of the wireless link is set to one (no retransmissions),
TCP needs to resend all lost packets and the throughput is below 1 Mbit/s. Not
visible in the figure is that with a retry value of four almost all packets arrive and TCP retransmits only 3 to 4
packets during the full 60 seconds. With blocking UDP we see the same dip between 20 and 40 seconds, with
the difference that the maximum transmission rate is 4.5 Mbit/s, no end-to-end retransmission takes place and
frames are lost.
Figure 7(b) shows the latency of the frames with respect to the time they should have arrived at the receiver,
given the arrival time of the first frame. A retry value of 4 is used, meaning that no losses occur during wireless
transmission. A latency of 80 msec is acceptable with a reception buffer of two frames. For bit rates of 3 and
4 Mbit/s the video transmission is stopped after 60 seconds. During the interval [20,40) for video bit rate 3
Mbit/s a delay builds up and disappears once (probably associated with a TCP retransmission) and for video
bit rate 4 Mbit/s two delays build up and are removed later on. For bit rate 6 Mbit/s the situation is dramatic:
an enormous latency builds up during the whole transmission period. For bit rate 5 Mbit/s latency builds up
during the microwave-on period. The same type of behavior can be measured with blocking UDP, with the
exception that for a video bit rate of 5 Mbit/s the latency builds up as for the video bit rate of 6 Mbit/s, and
video frames are lost from time to time.
A latency value larger than 80 msec means that the video stalls at several moments. In the case of live
video, frames would even be lost in the sender buffer, because live video cannot be delayed, contrary to the
conditions in this experiment. The total effect on the viewer would be disastrous.
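The way latency accumulates once the video bit rate exceeds the available bandwidth can be illustrated with a simple fluid model. The sketch below is not a model of TCP: it ignores retransmissions and congestion control, and the bandwidth profile (5 Mbit/s, reduced to 4 Mbit/s while the microwave is on) is a hypothetical choice made only for the example.

```python
from typing import List

FRAME_INTERVAL_MS = 40                     # 25 frames per second
FRAME_INTERVAL_S = FRAME_INTERVAL_MS / 1000.0


def bandwidth_mbits(t_ms: int) -> float:
    """Hypothetical channel: 5 Mbit/s, reduced to 4 Mbit/s during [20 s, 40 s)."""
    return 4.0 if 20_000 <= t_ms < 40_000 else 5.0


def latency_trace(video_rate_mbits: float, duration_ms: int = 60_000) -> List[float]:
    """Latency in seconds after each frame slot, derived from the sender backlog."""
    backlog_mbit = 0.0
    trace = []
    for t_ms in range(0, duration_ms, FRAME_INTERVAL_MS):
        bw = bandwidth_mbits(t_ms)
        backlog_mbit += video_rate_mbits * FRAME_INTERVAL_S                 # a new frame is queued
        backlog_mbit = max(0.0, backlog_mbit - bw * FRAME_INTERVAL_S)       # the channel drains
        trace.append(backlog_mbit / bw)                                     # time needed to flush
    return trace


def late_frames(trace: List[float], buffer_frames: int = 2) -> int:
    """Number of frame slots whose latency exceeds the reception buffer (2 frames = 80 ms)."""
    budget_s = buffer_frames * FRAME_INTERVAL_S
    return sum(1 for latency in trace if latency > budget_s)


late_by_rate = {rate: late_frames(latency_trace(rate)) for rate in (3.0, 4.0, 5.0, 6.0)}
```

In this idealized model the 3 and 4 Mbit/s videos never exceed the 80 msec budget, the 5 Mbit/s video starts to accumulate latency only while the microwave is on, and the 6 Mbit/s video accumulates latency during the whole transmission. The short delays measured for 3 and 4 Mbit/s are not reproduced, since they stem from effects the model leaves out.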
The IFD protocol is activated on top of TCP to trade latency against frame losses in a controlled fashion.
Figure 8 shows the latency versus time with wireless retry value equal to 4 (TCP retransmissions < 5) for
different video bit rates. For all bit rates the maximum latency remains below 110 msec. This means that live
video is transmitted in time even when overload conditions occur. The 110 msec means that a given frame
can be rendered two to three times consecutively when frames are dropped. This dropping leads to jerky
movements at times. During the 60-second transmission of the 6 Mbit/s video, with a 20-second
microwave perturbation, no I-frames, 3 P-frames and 314 B-frames are removed (about 20% of all frames in
total). As a consequence, the video shows no artifacts and only occasionally a jerky movement,
and no rendered frame is delayed by more than 110 msec.
Figure 8 Latency of TCP with IFD for different video bit rates
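The pattern in the removal counts above (B-frames first, P-frames rarely, I-frames never) is what a priority-based drop policy at the sender produces. The sketch below illustrates such a policy with a bounded send queue; it is not the IFD protocol itself, the class and its parameters are hypothetical, and it ignores the decoding dependencies between frames.

```python
from collections import deque
from typing import Deque, Dict, Optional

DROP_ORDER = ("B", "P", "I")  # least important frame type is dropped first


class PriorityDropQueue:
    """Bounded send queue that discards the least important frame when full."""

    def __init__(self, max_frames: int) -> None:
        self.max_frames = max_frames
        self.frames: Deque[str] = deque()
        self.dropped: Dict[str, int] = {"I": 0, "P": 0, "B": 0}

    def push(self, frame_type: str) -> None:
        """Queue a frame; drop lower-priority frames while the queue is too long."""
        self.frames.append(frame_type)
        while len(self.frames) > self.max_frames:
            self._drop_one()

    def _drop_one(self) -> None:
        for kind in DROP_ORDER:
            if kind in self.frames:
                self.frames.remove(kind)  # removes the oldest queued frame of this type
                self.dropped[kind] += 1
                return

    def pop(self) -> Optional[str]:
        """Hand the oldest remaining frame to the transport, if any."""
        return self.frames.popleft() if self.frames else None


queue = PriorityDropQueue(max_frames=3)
for frame_type in "IBBPBPI":
    queue.push(frame_type)
remaining = list(queue.frames)
```

Bounding the queue length bounds the waiting time of every frame that is eventually sent, which is the trade of latency against frame loss described above.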
The IFD protocol can also be activated on top of UDP. However, the disadvantage is that when a second
wireless segment is introduced (e.g. sender to AP and from AP to renderer), IFD must also be applied for the
following segment (e.g. in the Access Point). From an interoperability point of view this is difficult to realize
in practice (manufacturers have to agree on the specifics of the IFD protocol).
Given all these measurements, TCP coupled with controlled frame losses seems the best
solution to obtain timely live video under restricted network resources, because the TCP throughput is highest
and the chance of artifacts in the rendered video is lowest.
5.2. Framework behavior
An example of the framework behavior is based on Figure 3. Each interval experiences network conditions that are
different from those of both its predecessor and its successor. The wildly moving line represents the bandwidth of the
channel sampled at 40 ms intervals.
Interval  BL (Mbps)  EL (Mbps)  BL loss rate (%)  EL loss rate (%)
1         4          1          0                 4.4
          4.25       1          0.1               12.5
          3.75       1          0                 1.1
2         3.5        1          0.2               9.8
          3.25       1          0                 4.2
          3.75       1          0.7               19.5
3         1.5        1.25       6.5               26.7
          1.5        1          6.5               20.8
          1.75       1          8.8               26.9
          1.25       1.5        5                 26.6
4         2.25       1          2                 17.9
          2          1          0.8               11.7
          2          1.25       0.8               17.6
5         4          1          0                 4.4
          4.25       1          0.1               12.8
          3.75       1          0                 1
Table 1 Configurations that deliver highest average objective quality under different network conditions
Two layers are transmitted. The lower dotted line approximates the average bit rate of the BL. The higher
solid line represents the average bit rate of the BL + EL layers. In practice the video bit rate fluctuates around
the average value with a deviation of ± 25%. As soon as a change of network conditions is detected, the layer
configurator changes the bit rates of the layers. The network conditions of the different time intervals are
expressed in error rate and burstiness of errors. Once these parameters are measured the configurator uses a
simple look up table to choose a layer configuration.
The lookup table is created offline as described in section 4.3. For each of the 5 intervals of Figure 3 the best
fit is calculated with one of the network transmission conditions used by the network simulation environment.
Table 1 shows the three (four for time interval 3) best layer configurations for every time interval (network
condition) of our example. The uppermost configuration of each time interval is preferred as the best choice
for that time interval. Under good network conditions the second and third configurations in a given interval
differ from the best only by the size of BL (±0.25 Mbps). A small change in the bit rate of the BL
produces a more significant difference in quality than a change in an EL.
Under poor network conditions the BL size is very small (interval 3), so even a small increase in the bit rate of the
BL brings a large increase in the objective video quality. However, the penalty for the bit rate increase is a
high loss rate for BL, which influences the subjective quality. The acceptable loss rate for BL is at most 5%,
so for interval 3 a configuration with BL of 1.25 Mbps and EL of 1.5 Mbps is chosen.
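The lookup step can be sketched with the values of Table 1. The selection rule used here (take the first listed configuration whose BL loss rate does not exceed 5%) is an assumption that reproduces the choice described for interval 3; the actual configurator may apply a different rule, and the names below are illustrative.

```python
from typing import Dict, List, Tuple

# (BL Mbps, EL Mbps, BL loss %, EL loss %), in the order listed in Table 1.
Config = Tuple[float, float, float, float]

TABLE_1: Dict[int, List[Config]] = {
    1: [(4.0, 1.0, 0.0, 4.4), (4.25, 1.0, 0.1, 12.5), (3.75, 1.0, 0.0, 1.1)],
    2: [(3.5, 1.0, 0.2, 9.8), (3.25, 1.0, 0.0, 4.2), (3.75, 1.0, 0.7, 19.5)],
    3: [(1.5, 1.25, 6.5, 26.7), (1.5, 1.0, 6.5, 20.8),
        (1.75, 1.0, 8.8, 26.9), (1.25, 1.5, 5.0, 26.6)],
    4: [(2.25, 1.0, 2.0, 17.9), (2.0, 1.0, 0.8, 11.7), (2.0, 1.25, 0.8, 17.6)],
    5: [(4.0, 1.0, 0.0, 4.4), (4.25, 1.0, 0.1, 12.8), (3.75, 1.0, 0.0, 1.0)],
}

MAX_BL_LOSS_PERCENT = 5.0  # acceptable loss rate for the base layer


def choose_configuration(interval: int) -> Tuple[float, float]:
    """Return (BL, EL) bit rates in Mbps for the given interval.

    Assumed rule: the first listed configuration with an acceptable BL loss
    rate wins; if none qualifies, fall back to the lowest BL loss rate.
    """
    candidates = TABLE_1[interval]
    for bl, el, bl_loss, _el_loss in candidates:
        if bl_loss <= MAX_BL_LOSS_PERCENT:
            return (bl, el)
    best = min(candidates, key=lambda config: config[2])
    return (best[0], best[1])


chosen = {interval: choose_configuration(interval) for interval in TABLE_1}
```

For intervals 1, 2, 4 and 5 this rule returns the uppermost configuration, and for interval 3 it skips the three configurations whose BL loss rate exceeds 5%.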
6. Conclusions
Streaming video requires a dynamic adaptation of the video to the available network resources and
destination resources. Even when the bit rate of the video is in agreement with the average network
bandwidth, wireless networks suffer rapid bandwidth fluctuations caused by interference from other
electro-magnetic sources, and the differences in frame sizes imply fluctuating bandwidth requirements. This
calls for a hierarchical approach that handles the slow bandwidth changes at the highest level and the
fast bandwidth changes at a lower level. At the same time the capabilities of the receiver need to be taken into
account to prevent sending data that cannot be handled at the destination. When nothing is done, the video is
rendered with artifacts, which are very annoying to the viewer.
A framework is proposed that removes video data to reduce the bit rate of the video in a controlled fashion. A
transcoder adapts the bit rate to the slow fluctuations, while the fast fluctuations are handled by discarding
layers of SNR scalable streams or by removing entire frames when the bandwidth drop is extremely large
and sudden. Using TCP provides the following advantage: all the intelligence of the system is concentrated
at the sender side, and no network protocol adaptation is needed.
Presenting solutions for optimizing transmission of video is not enough. The solutions should be presented in
an interoperability framework, to be accepted by the CE device manufacturers. The paper shows how such a
framework can be integrated within the UPnP and DLNA standardization efforts.
Acknowledgements
We would like to thank Jeffrey Kang and Jan Ouwens for many helpful discussions and valuable input.
References
[1] R. Haakma, D. Jarnikov, P. van der Stok, Perceived quality of wirelessly transported videos, in Dynamic and
Robust Streaming in and between Connected Consumer-Electronic Devices (ed. P. van der Stok), Series: Philips
Research Book Series, Vol. 3, 2005
[2] A. S. Tanenbaum, Computer Networks, 4th ed. Prentice-Hall, 2003.
[3] M. Zink et al, Subjective Impression of Variations in Layer Encoded Videos, KOM Multimedia
Communications, 2003
[4] Pedro Cuenca et al, Performance Evaluation of Cell Discarding Mechanisms for the Distribution of VBR
MPEG-2 Video Over ATM Networks. IEEE Transactions on Broadcasting, 44(2), June 1998
[5] Tao Tian et al, Priority dropping in network transmission of scalable video. International Conference on Image
Processing, 3:400-3, Sept. 2000
[6] Dmitri Jarnikov, Peter van der Stok, Johan Lukkien, Wireless streaming based on a scalability scheme using
legacy MPEG-2 decoders, Ninth IASTED Int. Conference on Internet & Multimedia Systems & Applications, 2005
[7] C.C. Wust, L. Steffens, R.J. Bril and W.F.J. Verhaegh, “QoS Control Strategies for High Quality Video
Processing”. In Proc. 16th Euromicro Conference on Real-Time Systems (ECRTS), Catania, Italy, 2004.
[8] D. Isovic and G. Fohler, “Quality aware MPEG-2 Stream Adaptation in Resource Constrained Systems”. In
Proc. 16th Euromicro Conference on Real-Time Systems (ECRTS), Catania, Italy, 2004.
[9] D. Hoffman, G. Fernando, V. Goyal and M. Civanlar, “RTP Payload Format for MPEG1/MPEG-2 Video”, RFC
2250, Network Working Group, Jan. 1998.
[10] ISO/IEC International Standard 13818-2, “Generic Coding of Moving Pictures and Associated Audio
Information: Video”, Nov., 1994.
[11] ISO/IEC International Standard 14496-2, “Information Technology – Generic Coding of Audio-Visual Objects,
Part 2: Visual”, MPEG98/N2502a, Oct., 1998.
[12] ITU-T International Telecommunication Union, “Draft ITU-T Recommendation H.263 (Video Coding for Low
Bit Rate Communication)”, KPN Research, The Netherlands, Jan., 1995.
[13] J. R. Yee and E. J. Weldon, Jr., “Evaluation of the performance of error-correcting codes on a Gilbert
channel”, IEEE Trans. on Communications, pp. 2316-2323, Aug. 1995.
[14] D. Jarnikov, P. van der Stok, C.C. Wust, “Predictive Control of Video Quality under Fluctuating Bandwidth
Conditions”. ICME '04, Volume: 2 , pp. 1051 – 1054, June 27-30, 2004
[15] McCanne, S., Vetterli, M., Jacobson, V., “Low-complexity video coding for receiver-driven layered multicast”,
IEEE journal on selected areas in communications, vol. 16, no 6, p.983-1001, 1997.
[16] Peter Amon, Jurgen Pandel, “Evaluation of Adaptive and Reliable Video Transmission Technologies”,
available from http://www.polytech.univ-nantes.fr/pv2003/papers/pv/html/main/all_pap.htm
[17] R.J. Bril, C. Hentschel, E.F.M. Steffens, M. Gabrani, G.C. van Loo and J.H.A. Gelissen, “Multimedia QoS in
consumer terminals”, Proc. IEEE Workshop on Signal Processing Systems (SIPS), pp. 332-343, Sep. 2001.
[18] Yao Wang, Joern Ostermann, and Ya-Qin Zhang, “Video Processing and Communications”, Prentice Hall,
2002.
[19] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson. RTP: A Transport Protocol for Real-Time
Applications. RFC 1889, Internet Engineering Task Force, A/V Transport Working Group, Jan. 1996.
[20] J. Postel. Transmission Control Protocol. RFC 793, Information Sciences Institute, September 1981.
[21] S. Liang and D. Cheriton., TCP-RTM: Using TCP for Real-Time Multimedia Applications, InfoCom 2001.
[22] L. Lenzini, E. Mingozzi, G. Stea, A unifying service discipline for providing rate-based guaranteed and fair
queuing services based on the Timed Token protocol, IEEE transaction on Computers, Vol 51, Nr 9 2002.
[23] J.C.R. Bennett and H. Zhang, Hierarchical packet fair queuing algorithms, Proc of the ACM SIGCOMM 1996.
[24] J. Ouwens, The Performance of Wireless MPEG-2 Video Streaming, Philips Internal note TN-2005/00735.
[25] IEEE 1394 standard
[26] HiperLAN, http://en.wikipedia.org/wiki/HIPERLAN#HIPERLAN.2F2
[27] IEEE 802.11e standard
[28] Wi-Fi CERTIFIED™ for WMM™ - Support for Multimedia Applications with Quality of Service in Wi-Fi®
Networks, http://www.wifi.org/membersonly/getfile.asp?f=WMM_QoS_whitepaper.pdf
[29] WiMedia, http://www.wimedia.org/en/index.asp
[30] HomePlug AV White Paper, http://www.homeplug.org/en/docs/HPAV-White-Paper_050818.pdf
[31] Residential Ethernet Overview, Michael Johas Teener, CommsDesign,
http://www.teener.com/ResidentialEthernet/Residential%20Ethernet.pdf
[32] UPnP forum, www.upnp.org
[33] UPnP Quality of Service specifications, http://www.upnp.org/standardizeddcps/qualityofservice.asp
[34] DLNA Interoperability Guidelines v1.5, March 2006
[35] DLNA Media Format Guidelines v1.5 - Volume 2, March 2006