spyPanda Thesis Report
Wireless Transmission of Video using WLAN
by Alexander Conrad Stevens
School of Information Technology and Electrical Engineering, University of Queensland.
Submitted for the degree of Bachelor of Engineering
in the division of Mechatronic Engineering
November 2011.
November 6, 2011
The Head of School
School of Information Technology and Electrical Engineering
University of Queensland
St Lucia, Q 4072
Dear Professor Paul Strooper,
In accordance with the requirements of the degree of Bachelor of Engineering in
the School of Information Technology and Electrical
Engineering, I present the following thesis entitled “Wireless Transmission of Video
using WLAN”. This work was performed under the supervision of Dr Konstanty
Bialkowski.
I declare that the work submitted in this thesis is my own, except as acknowl-
edged in the text and footnotes, and has not been previously submitted for a degree
at the University of Queensland or any other institution.
Yours sincerely,
Alexander Conrad Stevens
To my parents and grandparents; for supporting my ambitions through life. . .
Acknowledgments
I would love to show my earnest appreciation for the guidance from my supervisor,
Dr Konstanty Bialkowski. He was patient, informative and guided me in the right
direction when I strayed from the path of this work. It would be a pleasure to
continue to liaise with him in the future.
I would also like to thank Mr Ross Finlayson of Live Networks, Inc. for the
Live555 streaming libraries, and Jason Garrett-Glaser and his team of x264
developers for the x264 H.264/AVC encoding library. Their contributions to the
open-source community, which provided the means for me to complete this thesis,
will be forever acknowledged.
Abstract
A call for efficient use of wireless networks for streaming of video has become
apparent in today’s age of on-demand content. This is ever more evident in the
Unmanned Aerial Vehicle (UAV) research and development sectors, where streamed
video must be delivered at reasonable quality, in real time and at a high framerate
over 802.11b/g/n wireless LAN for the research solution utilising the video. Further
compounding the task, most research and consumer UAVs have a form factor far too
small to carry the average consumer desktop computing solution.
This thesis presents a complete open-source software solution that runs on a
low-power, lightweight computing platform and provides the aforementioned features.
Developed within this software is a custom rate-control algorithm that utilises the
Live555 Real-time Transport Protocol (RTP) and Real-time Transport Control Protocol
(RTCP) library, the x264 H.264/AVC encoding library and the Video4Linux2 application
programming interface. The software solution is built and run within the Ubuntu
Linux distribution, upon an ARM development platform called the Pandaboard.
This solution has been tested and optimised for a Pandaboard mounted atop the
popular Parrot AR.Drone consumer UAV. It has shown that streaming high-framerate
H.264 video from a UAV platform is achievable through various rate control
techniques, such as on-the-fly resolution adjustments and adjusting H.264 quality
parameters. However, without an H.264 encoder optimised for the Pandaboard’s
Digital Signal Processor, the Pandaboard cannot encode video of high enough quality
to saturate the wireless network, except when the wireless LAN is at the limit of
its range or is carrying heavy traffic.
Contents
Acknowledgments
Abstract
List of Figures
List of Tables
1 Thesis Overview
1.1 Introduction
1.2 Aim of Thesis
1.3 Thesis Outline
2 Theory
2.1 The H.264 Codec
2.2 IEEE 802.11 Wireless and Protocols
2.3 The RTP and RTCP Protocols
2.4 Rate Control for Streams
3 Literature Review
3.1 Military Implementations
3.2 The AR.Drone by Parrot SA
3.3 UAV Traffic Surveillance
3.4 Real-time Encoding and Transmission of H.264
3.5 Adaptive Rate Control for RTP Streams
4 Design of Platform
4.1 Choice in Hardware
4.2 The Operating System
4.3 The UAV Platform and Camera
5 Design of Software
5.1 Video4Linux2
5.2 x264 Open Broadcast Encoder
5.3 RTP and RTCP with Live555
5.4 The spyPanda Control Algorithm
6 Results of spyPanda
6.1 Pandaboard Encoded Framerate Results
6.2 Pandaboard Output Bitrate Results
6.3 Jitter and Latency Results
6.4 Discussion on Picture Quality
7 Conclusions
7.1 Summary and conclusions
7.2 Possible future work
Appendices
A Program listings
B Companion disk
B.1 Main C File
B.2 Video4Linux2 Implementation
B.3 x264 Implementation
B.4 Live555 Implementation
B.5 Miscellaneous C Implementations
B.6 Sample Results
B.7 Report LaTeX Source and Items
B.8 This Report
List of Figures
2.1 Example of I, P, and B-Frame Prediction. Picture courtesy of Petteri Aimonen
2.2 The RTP Header as defined in RFC-1889 [25]
2.3 Receiver Report for the RTP Control Protocol [25]
3.1 The MQ-1 Predator in full flight [3]
3.2 The RQ-11 Raven being hand launched [15]
3.3 The Parrot AR.Drone with forward facing camera [21]
3.4 The proposed UAV traffic surveillance system [27]
3.5 The Texas Instruments results of the real-time encoding algorithm [26, Page 4]
3.6 Example of TFRC use in an RTP/RTCP environment [29]
3.7 Results of the TFRC method vs. no method [29]
3.8 PID method employed by Tos and Ayav [31]
4.1 The BeagleBoard-xM and its Peripherals [5]
4.2 The Pandaboard and its Peripherals [9]
4.3 The AR.Drone, Pandaboard and Logitech C910 Camera, with and without protective Styrofoam
5.1 Overview block diagram of the spyPanda software
5.2 Illustration displaying stride [11]
6.1 Framerate attained over progressions of dropped resolutions
6.2 Bitrate attained over progressions of dropped resolutions
6.3 Jitter experienced in the stream
6.4 Stable flight of AR.Drone at 320x240 pixels
6.5 Unstable flight of AR.Drone at 320x240 pixels
6.6 High motion flight of AR.Drone at 320x240 pixels
List of Tables
5.1 Sample of spyPanda’s ordered linked list of resolutions and framerates
6.1 Times for Resolution changes in stream for Figures 6.1, 6.2 and 6.3
Chapter 1
Thesis Overview
1.1 Introduction
Generally, video streaming is performed on wired networks, providing almost
unfettered access to multimedia content for the end-user. However, one can see the
benefits of a device utilising existing and cheap wireless networks to provide video
content, whether it be in real time or on demand. The same applies to Closed-circuit
TV (CCTV) security systems, where the video is streamed through physical wires to a
base station; these could be improved by using a modular wireless system.
The choice between wired and wireless networks, though, is there for the consumer
to decide. However, in an application like an Unmanned Aerial Vehicle (UAV), it is
impractical to tether the UAV to a CAT5e/6 LAN cable to provide the end-user with
the UAV’s on-board video. Proprietary digital and analogue solutions have therefore
been developed to stream video over existing wireless technologies. These solutions
generally have static and unchangeable configurations unless licensing agreements
are met, and these agreements can be costly to the party requiring them.
Third and fourth generation (3G/4G) mobile networks have been utilised by video
call applications on mobile phones to provide real-time conferencing to loved ones
and colleagues; though, these networks are costly, even when rented from a carrier.
So for a portable and cheap streaming solution, existing and widespread networks
are required - which leads to the next point. In today’s day and age, almost every
home in the developed nations around the world can purchase and set up their own
802.11 wireless network in as little as 1 hour with little to no configuration. This
IEEE standardised technology requires no licensing fees, can be extended by direc-
tional antennae to reported record ranges of up to 382 kilometres [23], and provides
sufficient network bandwidth capacity to stream high quality video to an end user.
Streaming real-time video over a digital wireless technology like 802.11 requires
a well-established video codec. A video codec uses motion estimation, prediction
and various other techniques to compress video. Like many forms of intellectual
property, some codecs require royalty fees if the video streaming product is to be
sold. For the most part, a research organisation can use such codecs without
restriction, and there exist open-source implementations of quality codecs like
WebM/VP8, Vorbis and H.264/AVC, all free for modification and use. Many of these
codecs are used in home appliances already; for example, the H.264/AVC video codec
is used in Blu-ray Disc™ technology, and both MPEG-2 and H.264/AVC are used for
Digital Television in many countries because of their high compression.
Once a codec is chosen, the video can be compressed and sent through the wireless
network to the client. Some companies implement their own protocols to encapsulate
and packetise the video ready for streaming, which forces clients into product
lock-in. Hobbyists, enthusiasts and research institutions mostly choose an open
protocol named the Real-time Transport Protocol (RTP) and its sister protocol, the
Real-time Transport Control Protocol (RTCP). Both are standardised by the Internet
Engineering Task Force (IETF), and provide a means of encapsulating the video and
exchanging network and video statistics between servers and clients.
Aggregating all of these technologies will make it possible to create a complete sys-
tem for real-time streaming of video over wireless networks from a UAV. Already,
there is a consumer product called the AR.Drone as developed by Parrot SA which
provides a full UAV platform with a video camera, a custom ARM based platform
running Linux, and their own proprietary implementation of real-time video stream-
ing. However, the software is not open-source and is limited in customisation to only
the provided application programming interface (API).
1.2 Aim of Thesis
Like the video streaming software upon the AR.Drone by Parrot SA, and as the
introduction implies, the objective of this thesis is to develop an open-source
software solution for real-time, adaptive video streaming at reasonable quality and
high framerates. It will use libraries to interface to a camera, compress the
captured video frames using an encoder (MPEG-2, H.264, VP8, etc.), and stream using
the standardised RTP/RTCP protocols. On top of these libraries, it will then
implement a control algorithm that adjusts various aspects of quality to maintain a
real-time, high-framerate video stream. The software will be optimised, built and
run upon a low-power, high-performance platform small enough to be mounted atop
either a custom or pre-built UAV for testing purposes.
These points can be summarised in the following:
• Real time encoding of high quality captured video on board the portable plat-
form
• Adaptive control of bit-rate via an algorithm and wireless connection feedback
• Monitor the quality of wireless transmission for dropped Intra-frames
• Maintain an acceptable latency between capture of video and display on client
(under 200ms)
• Ensure end-user can display video stream (standardised streams, or a custom
built client)
• Ensure that the hardware on the portable platform can perform all associated
tasks while keeping a minimal footprint on power usage and size
The main reason for developing this thesis is that open-source implementations that
provide streaming from the camera all the way to the client are hard to come by.
There are implementations that stream video from a camera over RTP, like the media
player VLC, but they generally have no rate control aspects. Also, most of these
implementations are only tested on full desktop computers with x86 architectures
and are not optimised for use on small development boards.
1.3 Thesis Outline
The following chapters should clearly explain:
• Chapter 2, Theory
A summary of the H.264 codec, IEEE 802.11 Wireless Standard, and the Real-
time Transport Protocol and Real-time Transport Control Protocol
• Chapter 3, Literature Review
Examples of prior work placed into the field of real-time streaming of video,
real-time encoding, and other works that relate to this thesis.
• Chapter 4, Design of Platform
An explanation of the decisions in hardware and operating system used to test
and develop the final program.
• Chapter 5, Design of Software
A walkthrough of the design and steps taken to develop the “spyPanda” soft-
ware used to stream the adaptive real-time video.
• Chapter 6, Results and Discussion
Results from the designed software, and what could have been improved or
implemented.
• Chapter 7, Conclusions and Future Work
A summary of how well the designed software fared for the objective, and a
short list of possible future work to improve on the platform.
Chapter 2
Theory
2.1 The H.264 Codec
The H.264 standard was developed by the ITU-T (the Telecommunication
Standardization Sector of the International Telecommunication Union), and was
designed to complement existing and future networking technologies [34]. Built upon
the prior H.263 standard, it aimed to reduce the bit-rate by 50% [20] compared to
previous standards; however, this came at the cost of increased computational
complexity.
While the reduction in bit-rate is ideal for wireless transmission of streamed
video, the increased computational complexity is a problem for real-time encoding.
Several options exist to alleviate the strain on the processor:
1. Optimise the encoder for the specified processor architecture; which V. Iverson,
J. McVeigh and B. Reese have explained for Intel architectures [20]
2. Choose hardware that allows for hardware based H.264 encoding [8]
3. Choose hardware that has optimisations written within the x264 encoder [18]
4. Use a rate control algorithm to dictate limits on the encoded picture and
hardware; as previously investigated by N. Srinivasamurthy, S. Nagori, G.
Murthy and S. Kumar, TI [26]
From these selections, it can be seen that a rate control algorithm (option 4)
can be used in conjunction with any of the other options to great effect. However,
optimising the encoder for the architecture (option 1) is largely mutually exclusive
with choosing hardware that already has optimisations in x264 (option 3). From this
quick analysis, the best results for real-time playback should come from choosing
hardware with optimisations for the x264 encoder and a hardware-based H.264
accelerator, and from developing a rate control algorithm.
Options 2 and 3 are both met by the PandaBoard, which has an ARM Cortex-A9
processor with NEON optimisations in x264 [18] and a digital signal processor which
provides hardware-based encoding of H.264 video [8]. This leaves option 4, a form
of rate control to dictate the encoder’s limits so that it can provide 30 frames
per second in real time. Potentially, option 4 could be made redundant by options 2
and 3 if the manufacturer specifications are upheld, but keeping option 4 as a
secondary system linked with the adaptive control is still a possibility.
As a preface to real-time encoding of H.264, it is essential to understand the
output of an H.264 encoder. As a simplified description, the encoder splits the
captured picture into a series of blocks (as small as 4x4 pixels per block). Each
block is passed through a discrete cosine transform (DCT) to convert it to its
frequency components; this is the first step of compression. Depending on the type
of frame being split, these blocks can also be turned into secondary-layer luma and
chroma types, which compresses the picture further. Combined, these blocks make up
a complete frame when decoded. [34] [16]
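As a rough illustration of the block transform step described above, the sketch below applies the 4x4 integer core transform that H.264 uses as its approximation of the DCT to a single block. The function names and the sample block are illustrative only, and the scaling and quantisation that a real encoder such as x264 applies afterwards are deliberately omitted.

```python
# Forward 4x4 integer core transform used by H.264 as a DCT approximation.
# Scaling and quantisation are left out; all names here are illustrative.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a square matrix."""
    return [list(row) for row in zip(*m)]


def forward_transform_4x4(block):
    """Return CF * block * CF^T for a 4x4 block of integer samples."""
    return matmul(matmul(CF, block), transpose(CF))


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]   # a block with no detail at all
    for row in forward_transform_4x4(flat):
        print(row)
```

For a perfectly flat block, all of the energy lands in the top-left (DC) coefficient and every other coefficient is zero, which is what makes smooth regions cheap to encode.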
At regular intervals in a video sequence, an Intra-frame is taken: a frame which
contains essentially a full representation of a picture in the video. It still uses
Intra-frame prediction, which analyses neighbouring pixels and approximates the
region, and hence compresses the frame further. Between successive Intra-frames
there exist two types of Inter-frames: Predicted-frames and Bi-directional
Predicted-frames. Both use previous frames to approximate and predict movement
(P-frame and B-frame), and B-frames additionally use the following frames to
predict future movement; this is where most of the compression is gained. [34] [16]
The three frames can be summarised as so and are demonstrated in the following
illustration (Figure 2.1):
• Intra-frame (I-frame): Relies on only itself for decompression, minimal com-
pression
• Predicted-frame (P-frame): Relies on previous frames to decompress, medium
compression
• Bi-directional Predicted-frame (B-Frame): Relies on previous and succeeding
frames, maximum compression
Figure 2.1: Example of I, P, and B-Frame Prediction. Picture courtesy of Petteri
Aimonen
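The frame dependencies summarised above can be sketched in a few lines. This is a simplified model under stated assumptions: a P-frame references only the nearest earlier I- or P-frame, and a B-frame references the nearest I- or P-frame on each side. H.264 itself allows more flexible referencing, and the function name is hypothetical.

```python
def frame_references(gop):
    """Map each frame index to the indices of the frames it is predicted from.

    Simplified model (an assumption, not the full H.264 rules):
    I-frames reference nothing, P-frames the nearest earlier I/P frame,
    B-frames the nearest earlier and nearest later I/P frame.
    """
    anchors = [i for i, kind in enumerate(gop) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(gop):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if kind == "I":
            refs[i] = []
        elif kind == "P":
            refs[i] = earlier[-1:]
        elif kind == "B":
            refs[i] = earlier[-1:] + later[:1]
        else:
            raise ValueError(f"unknown frame type {kind!r}")
    return refs


if __name__ == "__main__":
    for index, deps in frame_references("IBBPBBP").items():
        print(index, deps)
```

Because a B-frame depends on a later frame, it cannot be decoded until that later frame has arrived, which is worth keeping in mind for a low-latency stream.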
After each frame is encoded, the encoder splits it, with descriptive headers, into
network-packet-sized portions. This is the Network Abstraction Layer (NAL) of the
H.264 standard, and is designed to pass streamed video effectively over a network
technology. [34] [16] A simple definition of real-time encoding, then, is not that
a frame is output from the encoder instantaneously; rather, at a frame rate of 30
frames per second, each frame is captured, encoded and packetised in under 33.3
milliseconds.
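The 33.3 millisecond figure is simply the reciprocal of the frame rate. A minimal sketch of that budget check follows; the function names and the per-stage timings are hypothetical and are not measurements from this thesis.

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def is_real_time(capture_ms, encode_ms, packetise_ms, fps=30):
    """True if the three stages together fit inside one frame period."""
    return capture_ms + encode_ms + packetise_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    print(round(frame_budget_ms(30), 1))   # budget per frame at 30 fps
    print(is_real_time(5.0, 20.0, 2.0))    # hypothetical stage timings
```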
2.2 IEEE 802.11 Wireless and Protocols
Matthew Gast’s book, 802.11 Wireless Networks: The Definitive Guide [19], covers
the underlying principles behind the 802.11 standard. The book explains the
Media Access Control (MAC) layer and its attempts to hide the complexity of the
wireless system underneath. Signal strength, hidden nodes, obstacles and the
client’s distance from the base station affect the quality of transmission
significantly. Many fall-backs have been designed into the MAC layer to send
data effectively over wireless space.
In high wireless capacity environments, such as a university campus, the hidden
node problem is clearly evident. It creates dead spots between two base stations
which can only be mitigated by RTS/CTS clearing, these being Request to Send
and Clear to Send frames that are sent between wireless base stations. Effectively,
a base station in a busy environment sends out a request to silence the other
base stations in proximity, and their peers, to stop collision of signals. Once the
silencing has been completed, the other base stations return a CTS frame to allow
transmission of the main frame; after the main frame is sent from the original base
station, the original should expect to receive an ACK (acknowledgement) frame
back. If any of these frames is not completed, the process of sending the frame
restarts, and hence transmission latency increases. [19, Pages 34–36]
Whenever a frame is dropped, a resend is attempted and a retry counter is
incremented for that frame. Each time the counter increases, the time allocated to
retry the send (called the contention window) increases; this takes up more
bandwidth and increases the latency of current and subsequent transmissions. To
complicate matters further, each frame carries a Duration/ID field, which notifies
recipients to expect a certain busy time for the transmission currently in
progress. Fragmentation also affects latency: packets sent out by the home base
station may be fragmented, and if any fragment is lost, the whole packet has to be
resent. All of these potential problems increase latency, so it is important to
control the data size of each frame, and hence each packet, being sent to maintain
real-time wireless transmission. [19, Pages 50 & 57]
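The growth of the contention window with each retry can be sketched as a binary exponential backoff. The defaults below (a minimum window of 15 slots, a maximum of 1023 slots and a 9 microsecond slot time) are assumptions typical of OFDM-based 802.11 PHYs; 802.11b uses a larger minimum window and a longer slot. The function names are illustrative.

```python
def contention_window(retries, cw_min=15, cw_max=1023):
    """Contention window, in slots, after a number of failed attempts.

    The window roughly doubles on every retry until it reaches cw_max.
    cw_min and cw_max are assumed defaults, not values from this thesis.
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    return min((cw_min + 1) * (2 ** retries) - 1, cw_max)


def mean_backoff_us(retries, slot_us=9.0, cw_min=15, cw_max=1023):
    """Average backoff delay, assuming a uniform draw from [0, window]."""
    return contention_window(retries, cw_min, cw_max) / 2.0 * slot_us


if __name__ == "__main__":
    for attempt in range(8):
        print(attempt, contention_window(attempt), mean_backoff_us(attempt))
```

Each failed attempt roughly doubles the expected wait before the next try, which is why a burst of losses translates so directly into added latency.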
Another contributor to latency is the Transmission Control Protocol (TCP). It
sits above the MAC layer to ensure highly reliable transport of frames to other
wireless recipients. However, when transmitting real-time video over a mobile
network, TCP proves to be a setback, as can be seen in [32]. PR-SCTP and UDP
(Partially-Reliable Stream Control Transmission Protocol and User Datagram
Protocol) generally transmit frames of MPEG-4 video in under 25 milliseconds on
average. TCP performs similarly most of the time, but spikes in latency to in
excess of 2.5 seconds per frame for as many as 25 frames [32, Figure 7]. This
leads to the conclusion that a UDP-based protocol is needed to stream video
effectively.
2.3 The RTP and RTCP Protocols
The previous section explained that a real-time stream within an 802.11 wireless
network needs a UDP-based protocol or similar. This leads to the Real-time
Transport Protocol and the Real-time Transport Control Protocol.
The Real-time Transport Protocol (under RFC-1889 [25]) was designed for media
and data with real-time characteristics. It sits on top of existing network
infrastructure (like TCP and, more generally, UDP), and packetises the media with a
header that contains a sequence number, a time stamp, a payload type description
and various other RTP-specific identifiers. This header can also be extended to
facilitate custom implementations of the RTP scheme.
The header is laid out in this fashion (Figure 2.2):
Figure 2.2: The RTP Header as defined in RFC-1889 [25]
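To make the header layout concrete, the following sketch decodes the fixed 12-byte part of an RTP header using only the Python standard library. The field names are chosen here for illustration; any CSRC entries and header extensions that may follow the fixed part are ignored.

```python
import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader",
    "version padding extension csrc_count marker payload_type "
    "sequence timestamp ssrc",
)


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 bytes")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronisation source identifier.
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    # Hand-built example: version 2, marker set, dynamic payload type 96.
    raw = struct.pack("!BBHII", 0x80, 0x80 | 96, 1234, 90000, 0xDEADBEEF)
    print(parse_rtp_header(raw + b"payload bytes"))
```

The sequence number and timestamp are the two fields a receiver relies on to detect loss and reordering, and to measure jitter.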
There is one problem with the default RTP header, though: it does not include
statistics about the stream. This was an intentional design decision; if every
RTP header sent with the payload included statistics about network latency,
quality, etc., it would add too much overhead for a real-time implementation. This
is where RTCP was introduced.
The RTCP protocol was designed to address this question of Quality of Service
(QoS). It comprises five different packet types [25]:
• Sender Report (SR): Includes transmission and reception statistics from par-
ticipants that are active senders, and are sent to the receivers
• Receiver Report (RR): Includes transmission and reception statistics from par-
ticipants that are not active senders, and are sent to the senders
• Source Description (SDES): Includes optional details about the source
• Disconnection (BYE): Indicates end of participation
• Application Functions (APP): Indicates custom functions as specified for the
target application
For the purpose of this thesis, though, it is only necessary to have an in-depth
knowledge of the receiver report, since it is the server that controls the encoding
parameters. The receiver report (as can be seen in Figure 2.3) comprises the
details of the source, the fraction of packets lost since the last report, the total
number of packets lost, the highest sequence number received, the interarrival
jitter, the time of the last sender report, and the delay between receiving the
last sender report and sending this receiver report.
Figure 2.3: Receiver Report for the RTP Control Protocol [25]
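The fields listed above occupy one 24-byte report block inside the receiver report. The sketch below decodes such a block; the dictionary keys are illustrative names, and the cumulative loss count is treated as unsigned for simplicity, which is an assumption of this example.

```python
import struct


def parse_report_block(block):
    """Decode one 24-byte reception report block from an RTCP packet."""
    if len(block) < 24:
        raise ValueError("a report block is 24 bytes long")
    ssrc, lost_word, highest_seq, jitter, lsr, dlsr = struct.unpack(
        "!IIIIII", block[:24]
    )
    return {
        "ssrc": ssrc,
        # Top 8 bits: loss fraction as a fixed-point value out of 256.
        "fraction_lost": (lost_word >> 24) / 256.0,
        # Lower 24 bits: cumulative packets lost (treated as unsigned here).
        "cumulative_lost": lost_word & 0xFFFFFF,
        "highest_seq": highest_seq,
        "jitter": jitter,  # in RTP timestamp units
        "lsr": lsr,        # middle 32 bits of the last SR's NTP timestamp
        # Delay since last SR is carried in units of 1/65536 seconds.
        "dlsr_seconds": dlsr / 65536.0,
    }


if __name__ == "__main__":
    raw = struct.pack("!IIIIII", 0x1234, (64 << 24) | 10, 5000, 120, 0, 65536)
    print(parse_report_block(raw))
```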
For streaming of video, the important aspects are the interarrival jitter, the
fraction of packets lost and the delay since the last sender report. Interarrival
jitter is an estimate of how much packet arrival times vary from the spacing
expected from the time stamps of the video/audio. If the data arrives out of order,
or earlier or later than its time stamps imply, the jitter associated with the
stream increases. The delay since the last sender report is the time between the
client receiving the last sender report and sending its receiver report, which
lets the server estimate the round-trip time. These statistics could theoretically
be used to drop the quality of the picture if the network or a client’s computer
cannot keep up with the stream. The fraction of packets lost can also be used in
conjunction with the total number of packets lost to calculate the number of
packets expected.
This is demonstrated in the following equation:
\[
\mathrm{CurrentPacketsExpected} = \sum_{i=0}^{n} \frac{\mathrm{Lost}_i - \mathrm{Lost}_{i-1}}{\mathrm{FractionLost}} \tag{2.1}
\]
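As a hedged sketch of how these statistics might be computed, the code below shows the running jitter estimate defined in the RTP specification, together with one term of Equation 2.1. It assumes arrival times have already been converted to RTP timestamp units and that the loss fraction has been converted from its 8-bit wire form to a value between 0 and 1; the function names are illustrative.

```python
def update_jitter(jitter, prev_arrival, prev_timestamp, arrival, timestamp):
    """One step of the RTP interarrival jitter estimator.

    D is the difference between the spacing at which two packets arrived
    and the spacing implied by their RTP timestamps; the estimate moves
    1/16 of the way towards |D| on every packet.
    """
    d = (arrival - prev_arrival) - (timestamp - prev_timestamp)
    return jitter + (abs(d) - jitter) / 16.0


def expected_in_interval(lost_now, lost_prev, fraction_lost):
    """One term of Equation 2.1: packets expected during one report interval.

    Returns None when no loss was reported, because the interval size
    cannot be recovered from a zero loss fraction.
    """
    if fraction_lost <= 0:
        return None
    return (lost_now - lost_prev) / fraction_lost


if __name__ == "__main__":
    # Packets stamped 3000 units apart that arrive 3300 units apart.
    print(update_jitter(0.0, 0, 0, 3300, 3000))
    # 20 newly lost packets reported with a loss fraction of 0.25.
    print(expected_in_interval(30, 10, 0.25))
```

Summing `expected_in_interval` over successive receiver reports gives the running total on the left-hand side of Equation 2.1.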
Even though RTP has Quality of Service statistics through RTCP, it does not
itself guarantee delivery and requires the underlying network infrastructure
(like TCP) to handle this. As mentioned previously, though, RTP is generally built
upon UDP, which has a “send and forget” principle. Sunhun Lee and Kwangsue Chung
therefore investigate the Real-time Transport Protocol implemented with a
TCP-Friendly Rate Control Scheme [22]. This is to avoid congestion collapse caused
by UDP packets, which do not adhere to TCP’s congestion control. However, with
their TCP-Friendly RTP, they still encountered round trip times of about 160
milliseconds [22, Figure 3]. In the case of this thesis, though, it can be assumed
that there will be no interfering TCP traffic coming from either the UAV or the
base station receiving the video. Hence, plain RTP can be used to obtain UDP-like
transmission times with slightly more reliability.
The benefit of using RTP in this case is that the H.264 encoder packetises its
data in a form suitable for UDP, TCP and RTP/RTCP through its NAL standard. As
mentioned before, RTP has a sister protocol, the Real-time Transport Control
Protocol. Its primary purpose is to accompany RTP and to return information from
the clients about the quality of the data distribution [30]. It was designed for
multiple-user environments, but makes acquisition of network information simple
for a unicast system. Another benefit is that RTP/RTCP streams are supported
through the Live555 project [17] within the VideoLAN media player, VLC, and also
within the MPlayer series of video players.
2.4 Rate Control for Streams
The basic objective of rate control for video streams is to limit the throughput
enough that the medium can keep a consistent flow. This is analogous to the flow
of traffic on a busy stretch of road. If too many cars are sent down the stretch
of road, they pile up and cause congestion. However, if fewer cars are sent, an
even flow of cars can pass unhindered. The traffic can be increased up to the
point just short of congestion, but not enough to cause it. This is, in its
simplest form, bit-rate rate control.
Bit-rate rate control can be used by streams utilising an encoder to drop the
output bit-rate from the encoder enough that the network can handle the stream. A
stream can also demand too much of the encoder, such that the encoder cannot
produce high-quality video at the maximum allowable bit-rate of the network. In
this case, the program has to perform bit-rate rate control on the encoder itself to
allow the maximum throughput that the encoder is capable of. These methods can
be implemented by a simple PID algorithm [33] (Equation 2.2), depending on the
implementation of the encoder.
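\[
\begin{aligned}
\mathrm{Err} &= \mathrm{BitrateAim} - \mathrm{BitrateCurr} \\
\mathrm{Derivative} &= \frac{\mathrm{Err} - \mathrm{ErrOld}}{\mathrm{TimeCurr} - \mathrm{TimeOld}} \\
\mathrm{Integral} &= \mathrm{IntegralOld} + \mathrm{Err} \cdot (\mathrm{TimeCurr} - \mathrm{TimeOld}) \\
\mathrm{OutputBitrate} &= K_p \cdot \mathrm{Err} + K_i \cdot \mathrm{Integral} + K_d \cdot \mathrm{Derivative}
\end{aligned}
\tag{2.2}
\]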
This is a simple yet effective solution to obtain a desired bitrate for the stream.
However, it can be seen that BitrateAim shall need to be initialised and changed
according to the conditions of the network or limits of the encoder. Bitrate could
also be interchanged for different variables; like packet delay, jitter, network bit-rate,
encoder bit-rate, encoder quality, resolution etc.
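Equation 2.2 translates almost line for line into code. The sketch below is a minimal illustration rather than the spyPanda control algorithm itself: the class name, the gains and the handling of the very first sample (where no time step exists yet) are assumptions made for this example.

```python
class BitratePid:
    """Minimal PID controller following the structure of Equation 2.2."""

    def __init__(self, kp, ki, kd, bitrate_aim):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.bitrate_aim = bitrate_aim
        self.integral = 0.0
        self.err_old = 0.0
        self.time_old = None   # no previous sample yet

    def update(self, bitrate_curr, time_curr):
        """Return the controller output for a new bitrate measurement."""
        err = self.bitrate_aim - bitrate_curr
        if self.time_old is None:
            # Assumption: with no earlier sample, skip the derivative and
            # integral terms for this first update.
            derivative = 0.0
        else:
            dt = time_curr - self.time_old
            if dt <= 0:
                raise ValueError("time must increase between updates")
            derivative = (err - self.err_old) / dt
            self.integral += err * dt
        self.err_old = err
        self.time_old = time_curr
        return self.kp * err + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    # Placeholder gains and a target of 1000 kbit/s, chosen for illustration.
    pid = BitratePid(kp=0.5, ki=0.1, kd=0.0, bitrate_aim=1000.0)
    print(pid.update(800.0, 0.0))
    print(pid.update(900.0, 1.0))
```

Swapping the measured variable, for example using jitter or packet delay in place of bitrate, only changes what is passed to `update` and what the target represents.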
Chapter 3
Literature Review
In the following literature review, we will investigate prior or similar attempts
at the problem of streaming H.264 video in real time over wireless, and each of its
inherent difficulties: finding a suitable portable platform to encode video in
real time; encoding H.264 video in real time; and streaming the encoded video
over 802.11 wireless in real time. Other implementations using custom or existing
technologies will also be reviewed, most of which relate to the field of
UAV surveillance.
3.1 Military Implementations
A few military applications of surveillance using UAVs have been implemented,
like the General Atomics MQ-1 Predator and the cheaper AeroVironment
RQ-11 Raven. At a quarter of a million dollars (1x Raven Craft, and Control System
[2]) and 40 million dollars (4x Predator Craft, and Control System [3])
respectively, these systems are inaccessible to the civilian populace.
Figure 3.1: The MQ-1 Predator in full flight [3]
The MQ-1 Predator contains a colour nose camera, a day variable-aperture TV
camera, a variable-aperture infrared camera, and synthetic aperture radar. These
cameras provide full real-time video (except the radar) but utilise a direct line of
sight proprietary wireless link and a satellite link for beyond horizon flight.
The Raven, by contrast, provides three different attachable cameras that connect to
the nose of the UAV. The first has both a forward facing and a side facing camera, the
second is a nose mounted infrared camera and the third is a side mounted infrared
camera. These camera feeds are provided in real time through a line of sight
proprietary wireless link.
The Raven looks remarkably similar to a hobby remote control plane (as can be seen
in Figure 3.2); however, instead of the 20 minute run time of a hobby plane, it
provides 50-60 minutes of flight. Notably though, these systems have had many years
of development and military backing to provide a sophisticated and extremely
reliable system. There are, however, other, cheaper alternatives for surveillance UAVs.
Figure 3.2: The RQ11 Raven being hand launched [15]
3.2 The AR.Drone by Parrot SA
It is not only militaries around the world that are using UAVs; civilian companies
are developing commercial UAVs for educational use and entertainment. One such
company that is making an impact in the civilian sector is Parrot SA with their
quad-rotor (or quadrocopter) AR.Drone - named so for the Augmented Reality
games it provides.
Figure 3.3: The Parrot AR.Drone with forward facing camera [21]
It utilises a 640x480 pixel forward facing camera providing video at 15 frames per
second, and a downward facing camera providing 176x144 pixels at 60 frames per
second. The custom implemented P.264 video is streamed in real-time (about 100ms
latency) via an 802.11 ad-hoc connection to a computer or smart-phone and allows
the user to control the UAV. For a civilian application, it is rather cheap at US$300
for a relatively customisable system. This platform will be mentioned in later chap-
ters as it will be used in testing and comparison.
3.3 UAV Traffic Surveillance
Suman Srinivasan, Haniph Latchman, John Shea, Tan Wong and Janice McNair of
the University of Florida, have explored the use of Surveillance UAVs [27]. The prob-
lem was that the Department of Transportation had to upgrade their surveillance
of highway traffic, from magnetic loop detectors to something far less primitive. So
they investigated the potential of UAVs in use to quantify traffic conditions in real
time.
These UAVs could be dispatched over specific and/or vast areas in a short
amount of time, as opposed to fixed cameras. With this system there would be no
need for wired infrastructure, so the savings would have been considerable.
Figure 3.4: The proposed UAV traffic surveillance system [27]
However, they proposed a system in which the UAV would transmit the video data
directly to an external computer to encode, instead of compressing it in real time
on-board the system. This approach has one major drawback: limited bandwidth
for higher quality video.
The video and added data would then be transmitted to a microwave based tower
from the UAV based on existing TV station infrastructures. This however, has added
cost because of FCC regulations due to the licensing and use of certain bands. This
problem could have been mitigated by using a direct line of sight, cheap, unlicensed,
802.11 wireless technology (like the Venezuelans [23]) to stream the video back to
base.
3.4 Real-time Encoding and Transmission of H.264
As explained in the theory section on the H.264 codec (Section 2.1), real-time video
requires that the time taken to capture, encode and packetise a frame fits within the
period allowed by the framerate; for example, under 33.3 milliseconds for 30 frames
per second video. With this in mind, the Texas Instruments team (Naveen
Srinivasamurthy, Soyeb Nagori, Girish Murthy and Satish Kumar) proposed in [26]
that a rate control algorithm for real-time encoding should limit the total packet
count per frame, and hence the maximum size of each NAL unit. This is due to the
latency involved in creating additional headers for each NAL unit; minimising the
number of headers also helps minimise the amount of data sent through a
transmission stream.
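The real-time condition described here can be written down directly. The following is a minimal sketch in C; the function names are hypothetical and the stage timings are assumed to be measured in milliseconds by the caller.

```c
/* Time available per frame, in milliseconds, at a given framerate. */
double frame_budget_ms(double fps)
{
    return 1000.0 / fps;
}

/* Returns 1 if capture + encode + packetise fits within one frame period,
 * which is the real-time condition described in the text, else 0. */
int is_real_time(double capture_ms, double encode_ms, double packet_ms,
                 double fps)
{
    return capture_ms + encode_ms + packet_ms < frame_budget_ms(fps);
}
```

At 30 frames per second the budget is 1000/30, or roughly 33.3 milliseconds, so a frame that takes 5 ms to capture, 20 ms to encode and 3 ms to packetise meets the condition, while one that takes 30 ms to encode does not.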
In addition to minimising the NAL unit count, they proposed that each slave processor
(like the DSP on the OMAP4430) run an offloaded pipeline task that computes
the motion estimation, intra-frame prediction, transformation and quantisation of the
residual picture, etc. [26, Page 2], while the main processor computes the main
macro-block loop. However, the TI team's results for encoding 1080p video show that,
in reducing the picture size and hence the NAL units, quality loss can be expected
when there is a large amount of motion or a complex scene is encountered. The
overall quality of a non-complex video, on the other hand, is increased since the
encoder is given more time to process a frame. The efficient use of encode times in
their findings can be seen in Figure 3.5.
Figure 3.5: The Texas Instruments results of the real-time encoding algorithm. [26,
Page 4]
3.5 Adaptive Rate Control for RTP Streams
There are generally two camps when it comes to streaming multimedia
with rate control over RTP. One uses RTP on top of a plain UDP framework,
and the other is TCP friendly. TCP-Friendly Rate Control (TFRC)
essentially utilises the UDP layer with a control method that competes fairly with
TCP packets in the network. This is the topic of the work by Ktawut Tappayuthpijarn
and his team on adaptive video streaming over a mobile network [29].
The objective of this scheme was to optimally control a video stream over the
said mobile network using the extended H.264 Scalable Video Coding standard and
an RTP based TFRC method to send the data. It utilises the feedback received from
RTP and RTCP, as shown in Figure 3.6.
Figure 3.6: Example of TFRC use in an RTP/RTCP environment [29]
From this feedback, the server can then calculate the expected rate to adhere to
while being fair to TCP packets within the network, instead of flooding the
connection. This method essentially doubles the receiving rate to obtain the sending rate
(X) until it saturates the network, at which point it uses the method shown in Equation
3.1 [29]:
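\begin{align*}
X &= \min(x_{tcp},\; 2 \cdot \mathit{ReceivingRate}) \\
x_{tcp} &= \frac{s}{R\sqrt{\frac{2p}{3}} + t_{RTO}\left(3\sqrt{\frac{3p}{8}}\right) p \left(1 + 32p^{2}\right)} \tag{3.1}
\end{align*}
Where $s$ is the packet size, $t_{RTO}$ is the packet timeout,
$p$ is the loss event rate, and $R$ is the Round Trip Time.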
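To give a feel for the magnitudes involved, the following worked example evaluates Equation 3.1 for one set of illustrative values. These values are assumptions chosen for the example, not figures from [29]: $s = 1460$ bytes, $R = 0.1$ s, $p = 0.01$, $t_{RTO} = 0.4$ s and a reported receiving rate of 60000 bytes/s.

```latex
\begin{align*}
R\sqrt{\tfrac{2p}{3}} &= 0.1 \times \sqrt{0.006667} \approx 0.008165 \\
t_{RTO}\left(3\sqrt{\tfrac{3p}{8}}\right) p \left(1 + 32p^{2}\right)
  &= 0.4 \times 0.18371 \times 0.01 \times 1.0032 \approx 0.000737 \\
x_{tcp} &\approx \frac{1460}{0.008165 + 0.000737} \approx 1.64 \times 10^{5}\ \text{bytes/s} \\
X &= \min(1.64 \times 10^{5},\; 2 \times 60000) = 1.2 \times 10^{5}\ \text{bytes/s}
\end{align*}
```

With these numbers the doubling term is still the smaller of the two, so the sender would continue to ramp up; once twice the receiving rate exceeds roughly 164 kB/s, the TCP-friendly term takes over as the limit.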
This is an excellent method for calculating an expected bit-rate to send over a
network. However, their implementation is tailored to resend dropped packets in the
application up to 4 times, which, as explained in Chapter 2.2, extends the latency
of sent packets. This means that the TFRC method, while useful for streaming, is
not well suited as a real-time source for multimedia, and this is confirmed by the
results shown in Figure 3.7.
Figure 3.7: Results of the TFRC method vs. no method [29]
This is not the end of the line for rate control over RTP though. In the application
of this thesis, the platform only has to compete with SSH (Secure Shell) over
a wireless connection, which inherently uses little network capacity. So a
plain UDP based RTP/RTCP connection with a simple PID control algorithm,
like the one used by Uras Tos and Tolga Ayav in their Adaptive RTP Rate Control
Method [31], can suffice.
Figure 3.8: PID method employed by Tos and Ayav [31]
The method essentially uses the same PID algorithm as Equation 2.2, except that it
takes the packet loss fraction as its input and outputs an expected bit-rate. The output
is then passed through a limiting function (L(u(t))), which clips it between the lowest
and highest bit-rates. This method can provide low latency, low loss results in
single-stream applications.
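The limiting step can be sketched in a few lines of C. The function name and parameter names are hypothetical; the bounds would be whatever lowest and highest bit-rates the encoder and network allow.

```c
/* Limiting function L(u): clip a controller output u to [lowest, highest]. */
double limit_bitrate(double u, double lowest, double highest)
{
    if (u < lowest)
        return lowest;
    if (u > highest)
        return highest;
    return u;
}
```

Clipping the controller output in this way keeps a large transient error from requesting a bit-rate the encoder cannot produce or the link cannot carry.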
Chapter 4
Design of Platform
4.1 Choice in Hardware
To meet the requirements set for the thesis, the base hardware has to receive,
encode and transmit the captured video via an 802.11 based network. This
has to be done in a portable scenario, and because of this the hardware has to be
power efficient. Capturing video and transmitting it are fairly trivial requirements
to satisfy. Most development boards contain some kind of 802.11
wireless technology, and a video camera can be connected via USB (for example,
a USB web-cam). This narrows the search down to candidates that can
handle the processing needed to encode video.
Processors from Intel have graced the offices of the business sector and also the
homes of the consumer market. Intel though, has mainly designed high processing
capability processors with immense power requirements, much like those of the i7
line of processors with a measured system load of 80 Watts in Idle and a system
draw of 128 Watts under load [12]. While these processors are more than capable
of encoding H.264 video in real time [12], they are in no real position to be placed
on a mobile platform and powered by battery.
Since a general-purpose CPU is inefficient for the task of encoding a video stream,
alternatives had to be found. Two contenders in the development board market, the newly
revised BeagleBoard-xM [4] and the PandaBoard [8], are well suited to a development
environment, since they support expansion headers, UART and JTAG debugging, and
are extremely power efficient (in the sub 5 Watts range). Both boards contain ARM
based processors with digital signal processors, and the x264 encoder supports ARM
NEON optimisations for them. However, since these boards are of an ARM based
RISC architecture, the only options for operating systems are Linux, Android,
QNX, Symbian OS and Windows Mobile CE.
Figure 4.1: The BeagleBoard-xM and its Peripherals [5]
Both are roughly equivalent with respect to hardware, except for their processor
packages and the fact that the BeagleBoard-xM has no built in 802.11 functionality and
has only 512MByte of LPDDR RAM. Overlooking the absence of WLAN in the
BeagleBoard, which can be added via USB anyway, we can see that the Texas
Instruments DM3730 [6] media core in the BeagleBoard is essentially an OMAP3
based media core but with a 1GHz ARM Cortex-A8 processor. The DM3730 has
built in digital signal processing based on the C64x+ line of DSPs produced by
Texas Instruments (TI), which claims it enables 720p decoding and encoding, but
unfortunately does not state any specific codec. The BeagleBoard measures a
meagre 3.35 inches by 3.45 inches and weighs 37 grams, as can be seen in Figure
4.1.
On the other hand, the PandaBoard carries a Texas Instruments OMAP4430
[7] media core. This has a dual core 1GHz ARM Cortex-A9 MPCore, which is
joined by an IVA3 based hardware accelerator with a similar C64x+ DSP, which TI
claims provides 1080p H.264 encoding at up to 30 frames per second (with C64x+
optimisations enabled). The PandaBoard also provides 1GB of DDR2 RAM along
with built in 802.11b/g/n based wireless and is slightly larger, with dimensions of
4.5 inches by 4.0 inches and a weight of just 74 grams, as can be seen in Figure 4.2.
Figure 4.2: The Pandaboard and its Peripherals [9]
From this brief overview of both boards, it seems clear that the newer generation
PandaBoard would be more than capable of producing results with minimal
customisation of hardware.
4.2 The Operating System
As mentioned in the previous section (4.1), ARM based processors are not
supported by many of the popular desktop operating systems. The short list of
operating systems that do run is Linux, Android, QNX, Symbian OS
and Windows Mobile CE. However, the Pandaboard has so far only received active
support from Linux based distributions and Android, which rules out
the rest.
Android is a Linux based mobile operating system with a custom environment
tailored for the small screened smart phones and tablets of today. It is primarily
designed as a user based system with APIs and provided toolkits to develop
applications, and as such does not always have the libraries required for a given
application. In this case, RTP/RTCP and H.264 encoding libraries that have been
tailored for the Android environment would be hard to come by. This is no burden
however, since the Linux kernel underneath Android is also used in servers, desktop
computers, laptops and any device with a processor capable of running it.
There are quite literally hundreds of distributions of Linux tailored for certain
applications, devices and environments. In this case though, the ARM based Pandaboard
supports Angstrom Linux, Gentoo Linux and Ubuntu Linux. Each has its own
perks, and some are simply easier for development than others. Due to interest in
and prior knowledge of the Ubuntu distribution, it was decided that this
easy to use system would be used for development. This choice had a further benefit,
as a team of ARM developers (called Linaro [14]) are contributing to the upstream
Linux kernel for improved ARM support, and have developed pre-made images of
the Ubuntu distribution on top of these improved kernel images for the Pandaboard.
This version of the distribution is nominated as a “headless” or server version,
meaning that it does not include a desktop environment (or user interface) and can
be connected to via a UART serial or SSH connection. This provides a minimal foot-
print in memory and streamlines the system. Ubuntu, being a fork of the Debian
GNU/Linux distribution, also implements the easy to use dpkg/apt package man-
agement system. This provides easy access to a wealth of libraries for development,
including RTP/RTSP and H.264 encoding libraries.
4.3 The UAV Platform and Camera
For the thesis, a UAV platform was chosen that would satisfy the following criteria:
• Have lift capability to lift a camera and the Pandaboard
• Be simple enough to modify and mount camera and Pandaboard
• Provide existing capability to control platform with no need for modification
(but potentially could be extended)
• Provide a stable platform to receive conclusive results
• Be cheap enough for a small thesis project such as this
As can be seen from these criteria, do it yourself hobby remote control
planes and quadrocopters are possible candidates. However, the design of these
systems does not quite cover the aim and topic of this project. So a pre-built system
that can be bought from local retailers was decided on: the previously mentioned
AR.Drone by Parrot SA (Chapter 3.2).
Experiments reported in forums on the internet [28] indicate that the AR.Drone can
reach its maximum altitude (20 feet) with 253 grams of weight located centrally
over the battery. Since the Pandaboard weighs roughly 70 grams, the AR.Drone
has enough power to take off with the Pandaboard mounted. This
leaves the selection of the camera to be determined by weight and capability.
The camera would have to be light enough to be mounted to the AR.Drone, and
be able to provide a range of resolutions, from as small as 160x120 pixels up to
1920x1080 pixels (for testing purposes). It would also have to provide at least
the YUV420 format (for direct encoding with x264, as will be discussed in Chapter
5.2), support the Pandaboard's on-board camera connector or just USB, and fully
support the Video4Linux2 drivers. This narrowed the search down to USB
Video Class (UVC) compliant camera devices, like many Logitech web cameras.
The Logitech C910 is a full HD capable webcam (30 fps) [13] with Carl Zeiss optics
and autofocus, stereo microphones and support for the UVC standard, which
Linux also implements. However, the mounting bracket it comes with weighs
in at around 300 grams and would have to be removed. This proved
beneficial, however, as the bracket utilised a mounting system that could instead be
used to mount the camera to the AR.Drone. The final configuration of the test platform
can be viewed in Figure 4.3. For testing purposes, a Styrofoam protective shield
was mounted on top of the Pandaboard to mitigate potential damage in
unforeseen circumstances.
The Pandaboard is powered by a simple 5 Volt/3 Amp voltage regulator circuit
connected to the AR.Drone's 11.1V, 1000mAh, 10C Lithium Polymer battery. This
circuit is light enough to be carried by the AR.Drone and provides a continuous
source of voltage for the Pandaboard until the Lithium Polymer battery voltage falls
below 5.5 Volts.
Figure 4.3: The AR.Drone, Pandaboard and Logitech C910 Camera - with and
without protective Styrofoam
Chapter 5
Design of Software
This chapter guides the reader through the various aspects of the software developed
within this thesis, beginning with a brief overview of its architecture.
The software, spyPanda (named after the development board
and its inherent potential use), uses the Video4Linux2 driver stack to receive
YUV420 format video from the Logitech C910 connected to the Pandaboard's USB
port. This video is then buffered in memory and retrieved by the open source
x264 H.264/MPEG-4 AVC library, ready for encoding and compression. Once a frame
is compressed, the encoder notifies the Live555 RTP/RTCP streaming library that a
new frame is ready to be packetised and sent over the network using the Real-time
Transport Protocol.
The video is then streamed to and played by the client, statistics are calculated,
and finally a Real-time Transport Control Protocol Receiver Report packet is
sent back every interval (as defined by the server's bandwidth, generally 5 seconds)
with the statistics of the packets received since the last sender report. This data
is then used by the spyPanda control algorithm to adjust the encoder parameters,
increasing or decreasing framerate, resolution and quality to ensure a real-time stream
of video over the wireless connection. This overview is summarised in Figure 5.1.
Figure 5.1: Overview block diagram of the spyPanda software
5.1 Video4Linux2
The Video4Linux2 API [24] is a set of calls that can be used to directly interface
with a piece of video or radio hardware from the Linux host. Initially, a program
opens the device with read and/or write permissions and then issues ioctl calls
which tell the kernel device driver what to do. In the spyPanda
implementation these ioctl calls are encapsulated using the v4l2_ioctl function
located in the libv4l2 library. For error checking, this has been wrapped in the
xioctl function within the v4l2_camera.c source file (Appendix B.2).
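A wrapper of this kind is commonly written as a retry loop around the ioctl call. The following is a minimal sketch of that pattern, not the code from Appendix B.2; it assumes the plain ioctl system call rather than the libv4l2 wrapper so that it has no library dependency.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* Issue an ioctl, retrying if the call was interrupted by a signal (EINTR).
 * Returns the ioctl result: -1 on error with errno set, otherwise the
 * driver's return value. Plain ioctl() is an assumption made for this sketch. */
int xioctl(int fd, unsigned long request, void *arg)
{
    int r;

    do {
        r = ioctl(fd, request, arg);
    } while (r == -1 && errno == EINTR);

    return r;
}
```

Callers can then test the return value once and report errno, instead of repeating the EINTR handling at every call site.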
After opening the device (open_device and v4l2_open), spyPanda probes the
device for resolutions with the same aspect ratio as the resolution that was passed
to it on the command line at startup, or with a default ratio of 4:3. This device
probing function (known as save_device_resolutions) uses a linked list (Table
5.1) to order resolutions from least demanding for the encoder to most demanding,
calculated by the simple formula shown in Equation 5.1:
ProcessorDemand = width ∗ height ∗ framerate (5.1)
Linked List   BOT    -1           HEAD         +1           TOP
WxH@FPS       NULL   160x120@30   320x240@30   320x240@60   NULL
Demand        0      576000       2304000      4608000      0
Table 5.1: Sample of spyPanda’s ordered linked list of resolutions and framerates
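The ordering in Table 5.1 can be reproduced with a sorted insertion into a doubly linked list. The following is a minimal sketch under that assumption; the type and function names (res_node, processor_demand, res_insert) are hypothetical and the actual list layout in spyPanda may differ.

```c
#include <stdlib.h>

/* One entry of a doubly linked list ordered by encoder demand. */
typedef struct res_node {
    int width, height, fps;
    long demand;                  /* width * height * fps (Equation 5.1) */
    struct res_node *prev, *next;
} res_node;

long processor_demand(int width, int height, int fps)
{
    return (long)width * height * fps;
}

/* Insert a mode so the list stays sorted from least to most demanding.
 * Returns the (possibly new) bottom of the list, or NULL if malloc fails,
 * in which case the caller should keep using its previous pointer. */
res_node *res_insert(res_node *bottom, int width, int height, int fps)
{
    res_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->width = width;
    n->height = height;
    n->fps = fps;
    n->demand = processor_demand(width, height, fps);
    n->prev = n->next = NULL;

    /* New least demanding entry: it becomes the bottom of the list. */
    if (bottom == NULL || n->demand < bottom->demand) {
        n->next = bottom;
        if (bottom != NULL)
            bottom->prev = n;
        return n;
    }

    /* Otherwise walk up to the last entry that is not more demanding. */
    res_node *cur = bottom;
    while (cur->next != NULL && cur->next->demand <= n->demand)
        cur = cur->next;
    n->next = cur->next;
    n->prev = cur;
    if (cur->next != NULL)
        cur->next->prev = n;
    cur->next = n;
    return bottom;
}
```

Inserting 320x240@60, 160x120@30 and 320x240@30 in that order yields the demands 576000, 2304000 and 4608000 from bottom to top, matching the table.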
Once spyPanda has verified that the selected (or default) resolution exists in the
list, it can finally execute the init_device function. This function uses xioctl
to set the device format (V4L2_PIX_FMT_YUV420, for efficiency when passing frames
to the encoder) and resolution (e.g. 640x480 pixels). It then requests that the
driver allocate BUF_COUNT memory mapped buffers for the device,
maps those buffers and finally exchanges them with the driver
using VIDIOC_QBUF. Now that the device is ready, the framerate
can be set by set_device_framerate and the stream turned on using
VIDIOC_STREAMON.
Now that the video is ready to be captured, spyPanda can use the start_read_frame
and stop_read_frame functions to indicate to the driver that a frame can be copied
to the memory mapped buffer, ready to be encoded. When the program
needs to reinitialise or just release the device, it calls the
uninit_device (and if closing, close_device) function to turn the stream off, unmap
and free the memory mapped buffers, and finally request that the remaining
buffers be destroyed within the driver.
5.2 x264 Open Broadcast Encoder
The x264 encoding library is an implementation of an H.264/MPEG-4 AVC encoder.
It supports many CPU architectures, such as x86, x86_64, PowerPC, Sparc and ARM,
and is optimised to run on these. In particular, it has NEON optimisations for the
ARM architecture [18], which can be used on the Pandaboard to further accelerate
its capability to encode raw video. However, the x264 library does not implement
C64x+ DSP accelerations, since the framework to offload the processing using
Direct Memory Access (DMA) has not been written. Writing it would have been
a massive task for the current thesis, and one which would require far more
knowledge and experience in DSPs and kernel drivers. So the NEON optimisations
would have to suffice.
Encoding with the x264 library is a relatively simple task, except for the sheer
number of variables to set up initially. There are, however, many presets on which
to base the encoder, which make it a breeze to set up. The initial parameters
used to initialise the encoder are the preset “veryfast” and the tuning named
“zerolatency” (using x264_param_default_preset(x264param, "veryfast", "zerolatency")).
These parameters have been used in addition to the “Constrained
Baseline Profile” (CBP) that is specified in the H.264 standard and that has been
implemented in the x264 encoder (using x264_param_apply_profile(encoder, "baseline")).
Each preset defines slightly different parameters that the encoder needs to abide by;
the x264 encoding library provides a standard set of 10 presets. These
presets generally select different analysis algorithms, inter/intra partitions,
transforms, refinements, etc., all of which can be viewed within the x264 source code
under the function x264_param_apply_preset located in common/common.c. The
tunings, however, generally specify the number of threads and the number of frames
the encoder can buffer. In the case of “zerolatency”, the tuning specifies that the
encoder cannot look ahead in any way, has no B-Frames (since this would require
frames to be buffered), uses the frames per second for the encoder's own rate control,
and enables slice-based threading.
There is, though, a modified x264 library named the Open Broadcast Encoder, which
aims to provide real-time encoding. This library is simple to use,
as it uses a single parameter, x264param.sc.f_speed (the ratio to real-time, e.g.
1.0 is equivalent to real-time), to perform rate control on the encoder. By using
this parameter, the quality of video can be controlled depending on the feedback
from the client and also on whether the Pandaboard is struggling under the load; this
will be explained in Chapter 5.4.
Now that the profile (CBP), preset (“veryfast”) and tuning (“zerolatency”)
have been chosen, they can be applied in an initialising function within spyPanda
named init_encoder_param. This sets the encoder to use these preset
parameters, along with modified parameters that have been optimised for the
Pandaboard. In particular: a default resolution of 320x240 at 30 frames per second,
no B-Frames, a real-time ratio of 1.0, and a constant rate factor of 25 with a limit of
40 (the higher the value, the lower the quality and the stronger the compression in
scenes of motion). The control algorithm will, however, modify some of these default
parameters as the program runs, so they are subject to change.
After the default spyPanda encoding parameters have been set and the encoder
opened, the encoder can finally start encoding frames stored in the buffer by the
Video4Linux2 driver. The encode_frame function essentially calls start_read_frame and
points the 3 planes, located within the enc.pic_in.img plane array, to locations
within the buffer that adhere to the YUV420 specification. From here, the 3 planes
are passed to the library function x264_encoder_encode to encode the frame,
and a pointer to the memory holding the NAL unit is saved in the spyPanda variable
enc.nal. The frame size is then saved in the NAL unit along with the payload,
ready to be handed to the Live555 library using triggerFrameReceived.
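Pointing three planes into a single capture buffer relies on the planar YUV420 layout: a full resolution luma plane followed by two chroma planes at a quarter of its size. The following is a minimal sketch of that arithmetic; the type and function names are hypothetical, and it assumes a tightly packed buffer with even dimensions and no row padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Plane pointers and strides for one planar YUV420 frame. */
typedef struct {
    uint8_t *plane[3];   /* Y, U (Cb), V (Cr) */
    int      stride[3];  /* bytes per row of each plane */
} yuv420_view;

/* Point the three planes into a tightly packed buffer of
 * width * height * 3 / 2 bytes. Assumes even width and height and
 * no row padding, so each stride equals the plane width. */
yuv420_view yuv420_map(uint8_t *buf, int width, int height)
{
    yuv420_view v;
    size_t luma = (size_t)width * height;

    v.plane[0] = buf;                    /* full resolution luma */
    v.plane[1] = buf + luma;             /* quarter size Cb */
    v.plane[2] = buf + luma + luma / 4;  /* quarter size Cr */
    v.stride[0] = width;
    v.stride[1] = width / 2;
    v.stride[2] = width / 2;
    return v;
}
```

For a 320x240 frame the luma plane occupies 76800 bytes, so the two chroma planes start at offsets 76800 and 96000. Because only pointers are set, no pixel data is copied between the capture buffer and the encoder input.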
The average encode time and frame size are also tracked using
the gettimeofday function, by taking the difference in time between the start and
end of an encoded frame. These statistics are then used to calculate the expected
bitrate and framerate for the control algorithm. The code for this implementation
can be found in the x264_control.c source file (Appendix B.3).
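The timing bookkeeping can be sketched as follows. The names are hypothetical, and a running mean over all frames is assumed here only for illustration; the text does not say which kind of average spyPanda keeps.

```c
#include <sys/time.h>

/* Milliseconds elapsed between two gettimeofday() samples. */
double elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_usec - start->tv_usec) / 1000.0;
}

/* Running mean of encode time over all frames seen so far
 * (an assumed choice of average for this sketch). */
typedef struct {
    double mean_ms;
    unsigned long frames;
} enc_stats;

void enc_stats_add(enc_stats *s, double sample_ms)
{
    s->frames++;
    s->mean_ms += (sample_ms - s->mean_ms) / s->frames;
}
```

In use, gettimeofday would be called immediately before and after the encode call, and the difference passed to enc_stats_add. An achievable framerate estimate then follows as 1000 divided by the mean encode time in milliseconds.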
5.3 RTP and RTCP with Live555
The Live555 [17] set of streaming libraries is a standards compliant RTP/RTCP/RTSP
library that can stream many codecs over a network connection. It
is supported on Windows, Linux and Mac, and is integrated as
a client or server in many open source media applications. This means that VLC
media player and MPlayer, which both use Live555 as a client, can be used, along
with many other RTSP compliant media players, to play the streamed video.
The Live555 library, however, does not offer pre-written support for sourcing video
frames or byte streams from encoders. So a subclass of the library class FramedSource
(a means of standardising capture from a file, encoder, another RTP source, etc.)
has to be written to encapsulate the encoder, ready for reading. This is a modified
copy of the DeviceSource example that the library provides, and can be seen
in x264EncoderSource.cpp under Appendix B.4. This code is essentially scheduled
in the library's event loop, and calls deliverReadyFrame to retrieve the NAL
payload produced by the encoder. The event loop learns that a new frame is ready,
and is told the new address of the NAL payload, when the triggerFrameReceived
function is called (located in live555.cpp of Appendix B.4).
Now that the Live555 library can read from the encoder, a dedicated Live555
thread is created using the pthread library. This is due to the design of the library:
it uses an event loop to schedule specified tasks, which would be detrimental to the
concurrency that the x264 encoder requires to process frames efficiently. However,
the Live555 library is in no way considered to be thread safe, so any updates
to the library have to be passed through using triggers (like triggerFrameReceived)
or global variables. Once the thread has started, spyPanda has to wait for
Live555 to complete its setup; this is implemented using pthread_cond_wait and
mutex locks, both in main.c (Appendix B.1).
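Waiting for another thread to finish its setup is a standard condition variable pattern. The following is a minimal sketch of it, not the code from main.c: all names are hypothetical and the thread body is a stand-in for the streaming library's setup.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t setup_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  setup_done = PTHREAD_COND_INITIALIZER;
static int             setup_ready = 0;  /* predicate guarded by setup_lock */

/* Stand-in for the streaming thread: finish setup, then wake the waiter. */
static void *streaming_thread(void *arg)
{
    (void)arg;
    /* ... library setup would happen here ... */
    pthread_mutex_lock(&setup_lock);
    setup_ready = 1;
    pthread_cond_broadcast(&setup_done);
    pthread_mutex_unlock(&setup_lock);
    return NULL;
}

/* Start the thread and block until it reports that setup is complete.
 * Returns 0 on success or the pthread_create error code. */
int start_and_wait(pthread_t *tid)
{
    int rc = pthread_create(tid, NULL, streaming_thread, NULL);
    if (rc != 0)
        return rc;

    pthread_mutex_lock(&setup_lock);
    while (!setup_ready)   /* the loop guards against spurious wakeups */
        pthread_cond_wait(&setup_done, &setup_lock);
    pthread_mutex_unlock(&setup_lock);
    return 0;
}
```

The flag is checked in a loop under the mutex because a condition variable wait may return without a matching signal, and because the signal may be sent before the waiter starts waiting.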
The Live555 runtime can then be set up within this separate thread by specifying
a succession of configurations for the library to read and stream the encoded video.
For spyPanda, an RTSP server has been set up since it is the most widely supported
protocol, and uses the RTP and RTCP application layers with the UDP transport
layer underneath. The implementation used to set up this RTSP server is a
modified clone of the testH264VideoStreamer.cpp
RTSP server that is located within the Live555 “testProgs” source directory. This
modified code can be seen in live555.cpp within Appendix B.4.
Firstly, the port numbers for the RTP and RTCP servers are specified as 18888 and
18889 respectively. Both of these ports are then wrapped in a Groupsock (a Live555
socket-like object) and bound, ready to be used. The RTP Groupsock can then be linked
to an H264VideoRTPSink, which handles the passing of H.264 packets, and
an RTCPInstance can be created to provide QoS statistics. The RTCP instance is
initialised with an estimate of the session bandwidth, acquired from
the wl1271 (Pandaboard wireless card) wireless driver through ioctl
calls, as demonstrated in iw_stats.c in Appendix B.5.
After these modules are loaded, the RTSP server can finally be created, and a
ServerMediaSession with an RTCP subsession added and created. However, the
video source - in this case, the encoder - still has not been added; this is where the
x264EncoderSource sub-class is finally added and used. In the setup of the thread,
a single video frame was captured and encoded, and the memory address of the first
NAL unit from this frame was passed through.
The x264EncoderSource sub-class is then initialised with this NAL unit address,
and instantiated as a FramedSource. However, the frames from the encoder
source still need to be split, ready to be repackaged correctly for use in the RTP
stream. This is the job of the H264VideoStreamFramer class, which is a filter
that breaks the H.264 elementary stream into a form suitable for RTP. Once this is
initialised, the stream is finally ready to be played and the Live555 thread
broadcasts that its setup is complete, allowing the x264/Video4Linux2 thread to proceed.
Once playing though, the spyPanda control algorithm requires stream statistics
from Live555. These are obtained through the H264VideoRTPSink class, which
holds a transmission statistics database (TransmissionStatsDB) populated from every
connected client and their respective Receiver Reports. In the case of spyPanda, an
iterator passes through the database and selects the very last (or the most recently
connected) client's transmission statistics. These statistics contain standard
feedback from Receiver Reports and are requested by the custom functions specified in
the “Quality of Service Section” of live555.cpp in Appendix B.4.
5.4 The spyPanda Control Algorithm
The spyPanda control algorithm is a jitter based system, where an optimal
jitter is discovered and the control algorithm tries to attain that optimal jitter. It
also monitors the framerate produced by the encoder, and acts on the
quality/size of the picture if the encoder's output framerate drops below 66% of the
specified encoder framerate (enc.x264param.i_fps_num).
The control algorithm is simple: keep dropping the quality of the video until the
framerate and jitter levels are back within acceptable limits. If the quality
cannot be dropped any further, take more aggressive action and drop the resolution
to the next least demanding resolution (from the linked list described in
Chapter 5.1). This system essentially attempts to drop the quality and compression
in order to reduce the load that the encoder places on the Pandaboard hardware and
hence increase the bit-rate over the network.
It is implemented using the real-time ratio supplied by the x264 Open
Broadcast Encoder (which drops the quality of video if the ratio is below 1.0, and
increases it if above), together with the measured average encode time
(enc.ave_enc) and the estimated jitter acquired from the TransmissionStatsDB
in Live555. Every time a new RTCP packet is received, spyPanda
reads the data within this packet and decides whether to drop
the real-time ratio or the resolution. When the jitter or framerate is out of bounds,
it drops the real-time ratio in proportion to the last ratio (similar to the PID
algorithm from Equation 2.2), effectively dropping the quality of the video.
If this ratio is dropped too far, the resolution is finally dropped; in this case,
the ratio is reset to 1.0 and the process starts again. However, if the new
resolution has processing leeway, the ratio will increase (increasing quality) until
the network/encoder can no longer handle the quality, or until the ratio exceeds a
top ratio limit. In this case, spyPanda will attempt to increase the resolution,
provided that the jitter previously recorded for the new resolution is within
acceptable bounds, or that the resolution has not been used before. On any
resolution swap, the jitter for the outgoing resolution is saved in the
linked list, ready to be checked if required.
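The behaviour described above can be sketched as follows. This is a minimal
Python illustration under stated assumptions, not the spyPanda source (which is
written in C/C++): the class names, method names and the proportional step GAIN
are hypothetical, while the thresholds (66% of the target framerate, ratio limits
of 0.6 and ∼2.0, and a jitter limit of ∼3000) are the values quoted in Chapters 5
and 6.

```python
from dataclasses import dataclass
from typing import List, Optional

JITTER_LIMIT = 3000.0  # approximate real-time jitter limit quoted in Chapter 6
FPS_FRACTION = 0.66    # encoder must sustain 66% of the specified framerate
RATIO_MIN = 0.6        # below this ratio the resolution is dropped
RATIO_MAX = 2.0        # approximate top ratio limit seen in the results
GAIN = 0.1             # hypothetical proportional step, not from the thesis


@dataclass
class Mode:
    name: str
    target_fps: float
    last_jitter: Optional[float] = None  # None: mode has not been used yet


@dataclass
class Controller:
    modes: List[Mode]   # ordered from most to least demanding
    index: int = 0
    ratio: float = 1.0

    def _switch(self, new_index: int, jitter: float) -> None:
        # Save the jitter of the outgoing mode, then restart at ratio 1.0.
        self.modes[self.index].last_jitter = jitter
        self.index = new_index
        self.ratio = 1.0

    def update(self, jitter: float, measured_fps: float) -> str:
        """Process one feedback report and return the active mode name."""
        mode = self.modes[self.index]
        healthy = (jitter <= JITTER_LIMIT
                   and measured_fps >= FPS_FRACTION * mode.target_fps)
        if not healthy:
            # Drop the ratio in proportion to its last value.
            self.ratio -= GAIN * self.ratio
            if self.ratio < RATIO_MIN:
                if self.index + 1 < len(self.modes):
                    self._switch(self.index + 1, jitter)
                else:
                    self.ratio = RATIO_MIN  # nothing lower to fall back to
        else:
            self.ratio += GAIN * self.ratio
            if self.ratio > RATIO_MAX:
                self.ratio = RATIO_MAX
                if self.index > 0:
                    higher = self.modes[self.index - 1]
                    # Step up only if the higher mode is untried or was stable.
                    if (higher.last_jitter is None
                            or higher.last_jitter <= JITTER_LIMIT):
                        self._switch(self.index - 1, jitter)
        return self.modes[self.index].name
```

In use, update would be called once per received report with the latest jitter
and measured framerate; with the assumed GAIN of 0.1, five consecutive
out-of-bounds reports take the ratio from 1.0 to below 0.6 and trigger a
resolution drop.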
Implementing this control algorithm within spyPanda proved far more difficult in
practice than expected. Modifying the resolution within the encoder would
throw errors about stride, which is the width of a picture row in memory including
any padding after the picture width, as demonstrated in Figure 5.2. In the case of
the x264 implementation, though, the stride was equal to the width of the image,
so a picture could not dynamically increase. This applied to the Video4Linux2
library as well, as the same sized picture could not simply be passed in without
the stride error occurring once again.
Figure 5.2: Illustration displaying stride [11]
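To make the distinction between width and stride concrete, the following short
Python sketch addresses pixels in a padded buffer and copies it into a tightly
packed one (where stride equals width). It assumes one byte per pixel; the
function names are hypothetical, and neither x264 nor Video4Linux2 is invoked.

```python
def pixel_offset(x, y, stride, bytes_per_pixel=1):
    """Byte offset of pixel (x, y) in a buffer whose rows are `stride` bytes apart."""
    return y * stride + x * bytes_per_pixel


def strip_padding(buf, width, height, stride):
    """Copy a padded image into a tightly packed one (stride == width)."""
    if stride < width:
        raise ValueError("stride must be at least the row width")
    out = bytearray()
    for y in range(height):
        start = y * stride          # each row starts on a stride boundary
        out += buf[start:start + width]
    return bytes(out)


# A 3x2 image stored with a stride of 4: one padding byte (0) ends each row.
padded = bytes([1, 2, 3, 0,
                4, 5, 6, 0])
packed = strip_padding(padded, width=3, height=2, stride=4)
```

If a consumer assumes that the stride equals the width while the producer pads
each row (or the reverse), every row after the first is read from the wrong
offset, which is the kind of mismatch that a stride error guards against.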
Using the libswscale (or Software Scale) implementation from the FFmpeg
libraries [10] to scale the larger camera images into smaller ones for x264 was
considered. However, it was deemed too processor intensive, so the encoder is
instead completely reinitialised with the new configuration. This process also
requires that the camera dynamically change its resolution. Initially, the same
idea of restarting was applied to the camera; however, a complete v4l2_close and
v4l2_open proved to take a substantial amount of time. Instead, a simple
uninit_device and init_device (from v4l2_camera.c, Appendix B.2) was implemented
to force the camera to change resolution.
This method proved successful in dropping the resolution, but took about 0.5
seconds to restart and increased latency for the first couple of seconds of the
stream. This was due to the catch-up needed for the client to play the first few
seconds of the stream and flush the buffer, which also happens at the start of
every spyPanda RTSP session.
Chapter 6
Results of spyPanda
The results in this chapter are based on a configuration of spyPanda that uses
Instantaneous Decoding Refresh (IDR) frames and starts at a resolution of 640x480
pixels at 30 frames per second (FPS).
This configuration can be run with the command:
./spyPanda -i -r 640x480@30
Since the Pandaboard's processing capability is limited, separate sections are
dedicated to the results of its encoding performance and its stream performance.
6.1 Pandaboard Encoded Framerate Results
As can be seen from Figure 6.1, the Pandaboard initially struggles to encode
at a rate (FPS) above the encoder's specified framerate (Encoder FPS). At ∼11
seconds, spyPanda drops the specified Encoder FPS to 20 frames per second. At
this point, the Pandaboard is on the verge of delivering the video at the
expected framerate. However, the jitter (as shown in Figure 6.3) dictates
that the resolution needs to be dropped further to produce even results (and hence
low latency). So at ∼22 seconds, the stream is dropped from 640x480
pixels at 20 frames per second to the next least demanding resolution of 320x240
pixels at 60 frames per second. The framerate is then dropped repeatedly in the
camera and encoder until the jitter finally stabilises at a point that exhibits
the least latency.
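One way to see why 320x240 pixels at 60 frames per second counts as less
demanding than 640x480 pixels at 20 frames per second is to compare raw pixel
throughput. The short Python sketch below does this for the five modes that
appear in the results; using pixels per second as the measure of demand is an
assumption made for illustration, since the ordering criterion of the linked
list is not restated here, and the function name is hypothetical.

```python
MODES = ["640x480@30", "640x480@20", "320x240@60", "320x240@30", "320x240@24"]


def pixel_rate(mode):
    """Pixels per second for a mode string of the form WIDTHxHEIGHT@FPS."""
    size, fps = mode.split("@")
    width, height = size.split("x")
    return int(width) * int(height) * int(fps)


for mode in MODES:
    print(f"{mode}: {pixel_rate(mode) / 1e6:.3f} Mpixel/s")
```

Under this measure, 640x480@20 requires 6.144 Mpixel/s while 320x240@60 requires
4.608 Mpixel/s, so the progression of modes is strictly decreasing in load.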
It should be noted, though, that the resulting FPS is the potential framerate that
could be encoded using that resolution. The encoder does not stream a framerate
higher than what is expected, only equal to or lower.
Figure 6.1: Framerate attained over progressions of dropped resolutions
6.2 Pandaboard Output Bitrate Results
The bitrate in this case also exhibits a similar progression as the real-time ratio
and resolutions are changed. As can be seen in Figure 6.2, the bitrate attempts
to keep underneath the estimated Wireless LAN bitrate capacity according to the
jitter exhibited by the stream. The encoder output bitrate slowly increases as the
compression is reduced to alleviate stress on the Cortex-A9 processor that is
encoding the video.
It can also be seen that, using this control scheme, the processing capability of
the Pandaboard will never allow it to stream at over 2000 Kbit/s. This means
that the current bitrate from the encoder will only become a problem at far
ranges, when the link quality is low. However, the constraints on jitter should
alleviate this concern by dropping the resolution further.
The times at which these drops in resolution occur are listed in the following table:

Time (ms)    0            11’012       22’528       25’842       29’542
Resolution   640x480@30   640x480@20   320x240@60   320x240@30   320x240@24
Table 6.1: Times for Resolution changes in stream for Figures 6.1, 6.2 and 6.3
Figure 6.2: Bitrate attained over progressions of dropped resolutions
6.3 Jitter and Latency Results
In the case of jitter, it can be seen that at each encoder start/restart (or
resolution/framerate change) the jitter spikes and then settles after an
arbitrary time. On the client side, this spike is observed as an increase of more
than half a second in the latency between capturing and viewing the image, which
persists until the jitter settles. It can also be seen that the time taken to
change resolution is ∼0.5 seconds, which means that the client is expecting
frames during this down time. The client will then attempt to increase its buffer
size (and hence buffer latency) until a time-out is reached and the stream is cut.
However, when the video does return, this buffer latency is still kept high, and
the new video is played with added latency on top of the encoding latency and
stream latency. This also indicates that jitter can be used as a rough estimate of
how much latency a stream has on the client side, which is expected, since jitter
is a measure of the timing variation of the stream. The next reasonable step was
then to slowly stress the Pandaboard until the stream (at the client side) appeared
to lag or show noticeable latency. At this point, the jitter was recorded and used
to set the maximum jitter a stream may have and still be deemed real-time. The
jitter limit found to ensure a real-time stream was ∼3000.
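For reference, RFC 3550 [25] defines the interarrival jitter carried in Receiver
Reports as a running estimate, updated for each packet as J = J + (|D| - J)/16,
where D is the change in relative transit time between consecutive packets. The
Python sketch below implements that update and a unit conversion. The conversion
assumes that the jitter figure is expressed in RTP timestamp units with a 90 kHz
clock, as is standard for H.264 over RTP; the units of the ∼3000 figure are not
stated in this chapter, so the converted value is only indicative. The function
names are hypothetical.

```python
def update_jitter(jitter, transit_prev, transit_now):
    """One step of the RFC 3550 interarrival jitter estimator."""
    d = abs(transit_now - transit_prev)   # change in relative transit time
    return jitter + (d - jitter) / 16.0


def jitter_to_ms(jitter_units, clock_hz=90000):
    """Convert jitter in RTP timestamp units to milliseconds (assumed clock)."""
    return 1000.0 * jitter_units / clock_hz


# Under the 90 kHz assumption, a jitter of 3000 units is roughly one frame
# period at 30 frames per second.
limit_ms = jitter_to_ms(3000)
```

Under that assumption a jitter of 3000 corresponds to about 33 ms.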
As can be seen, for every resolution change before the change to 320x240 pixels
at 24 frames per second, the jitter is far too high to maintain a low latency
stream. The real-time ratio is therefore dropped in an attempt to reduce the time
taken (and the quality produced) in encoding each frame. However, even with these
measures, the encode time is still too high to reduce the latency. Aggressive
action is therefore taken by dropping the resolution and/or framerate whenever the
ratio drops below 0.6.
However, when the optimal jitter is finally reached, the ratio has room to increase
quality. This can be observed from ∼29 seconds, after which the ratio increases
until it is capped at ∼2.0.
Figure 6.3: Jitter experienced in the stream
6.4 Discussion on Picture Quality
For the sake of a high framerate and low latency, the picture quality does suffer. On
average, the Pandaboard drops the resolution down to 320x240 pixels just to keep
a low latency. This can be seen in Figure 6.4:
Figure 6.4: Stable flight of AR.Drone at 320x240 pixels
However, when high motion is encountered, the Constant Rate Factor (CRF)
algorithm of the x264 encoder kicks in and limits the amount of compression,
removing a large amount of quality, as can be seen in Figure 6.5:
Figure 6.5: Unstable flight of AR.Drone at 320x240 pixels
However, in times of unpredictable motion (simultaneous yaw and elevation), the
quality finally gives in and the picture is pixelated beyond recognition, as can
be seen in Figure 6.6:
Figure 6.6: High motion flight of AR.Drone at 320x240 pixels
Simple motion did not pose a serious issue in the design of the spyPanda solution,
since the CRF algorithm is designed to reduce the quality of the video in motion
only to the extent that the human eye cannot tell the difference. However, where
pixelation occurs, it would have been beneficial to include a motion stabilisation
pre-processing algorithm, so that the encoder does not have to use extra
processing power to predict and compress further.
The trials of the spyPanda program did support the view that a higher framerate
and low latency are far more beneficial than quality in terms of a user's ability
to control and observe the UAV. In a security sense, though, it could be
far more beneficial to include an on-board algorithm that turns up the quality when
suspicious activity is found. In this case, the use of DSP accelerated encoding
could have vastly improved the picture quality and increased the number of
potential resolutions that spyPanda could have used.
Chapter 7
Conclusions
7.1 Summary and conclusions
The final product of the thesis - spyPanda - has demonstrated the ability to provide
a low latency, high framerate H.264 video stream from a UAV to a client.
Unfortunately, to meet these requirements, the control algorithm had to reduce the
quality of the video to allow the Pandaboard’s Cortex-A9 processor to encode the
video. However, it was deemed in user tests that a high framerate and low latency
video stream was far more beneficial to the observation of the UAV than a high
quality, noticeably “laggy” (high latency), low framerate solution.
It was suggested that the use of a DSP accelerated encoder would have improved
the quality of the feed, while retaining the high framerate and low latency qualities.
This also demonstrated that the limiting factor of the platform was the software
based encoding upon the Pandaboard’s processor.
The spyPanda algorithm also successfully changed the resolution of the feed in a
short enough time that the client did not deem the RTP stream to have timed
out. This method made use of the fact that jitter (the timing variation calculated
from the RTP stream) could be successfully used as feedback to optimise a stream
to provide low latency video.
To summarise, the open-source spyPanda platform succeeded in all of the initial
goals of the project by providing a reliable, low-latency, high framerate, adaptive
video stream over a standards-compliant application layer (RTP/RTCP/RTSP).
7.2 Possible future work
As has been discussed, the main area for improvement would be the inclusion of a
DSP optimised encoder. This could be implemented in one of the following ways:
• Purchase a license from Texas Instruments to use their proprietary encoders
• Modify the existing x264 source code to include custom optimisations for the
C64x+ line of DSPs
The second option would most likely be chosen, in order to keep the open source
nature of the spyPanda program.
If the DSP optimisations prove ineffective at reducing the pixelation (due
to unpredictable motion) in the video stream, a pre-processing motion stabilisation
algorithm could be implemented. This would essentially sit between the
Video4Linux2 and x264 encoding layers, and act as a “spring and damper” for the
motion of the video in the hope of reducing motion and blur. Alternatively, a
floating lens could be used on the platform's camera to passively reduce the
effects of extreme motion [1].
Appendix A
Program listings
The spyPanda adaptive and real-time, H.264 video streaming program provides an
open source solution to many real-time streaming products.
It utilises the well established x264 encoder (Open Broadcast Encoder variant) and
the live555 RTSP libraries, with a customisable and extendable control algorithm.
Currently, spyPanda uses the Bazaar revisioning system and stores its code on
Launchpad.
Install Bazaar using:
sudo apt-get install bzr
And get the source code using:
bzr clone lp:~alex-stevens/+junk/spyPanda
The code can also be viewed online at this address:
http://bazaar.launchpad.net/~alex-stevens/+junk/spyPanda/files
The revision that is referenced in this version of the document is revision 72.
Appendix B
Companion disk
Due to the size of the source code for the program, the source code can be found
online or within the companion disk.
B.1 Main C File
This can be located on the companion disc in:
spyPanda/main.c
B.2 Video4Linux2 Implementation
This can be located on the companion disc in:
spyPanda/v4l2_camera.c and spyPanda/v4l2_camera.h
B.3 x264 Implementation
This can be located on the companion disc in:
spyPanda/x264_control.c and spyPanda/x264_control.h
B.4 Live555 Implementation
This can be located on the companion disc in:
spyPanda/x264EncoderSource.cpp and spyPanda/x264EncoderSource.hh
B.5 Miscellaneous C Implementations
Linked Lists can be located on the companion disc in:
spyPanda/linked_list.c and spyPanda/linked_list.h
Wireless LAN statistics can be located on the companion disc in:
spyPanda/iw_stats.c and spyPanda/iw_stats.h
B.6 Sample Results
The results used in this document can be located under Results/stats.csv
These results were used in the creation of the Framerate, Bitrate and Jitter graphs
(Figures 6.1, 6.2 and 6.3 respectively).
B.7 Report LaTeX Source and Items
The source for this document is located within the Report-latex directory under the
companion disc.
B.8 This Report
This report can be located within the root directory of this companion disc, and is
named 41719882_stevens.pdf
Bibliography
[1] What is optical shift image stabilizer? http://www.canon.com/bctv/faq/
optis.html.
[2] Rq-11 raven. http://www.globalsecurity.org/intell/systems/raven.
htm, 2005.
[3] Mq-1 predator unmanned aerial vehicle. http://www.162fw.ang.af.mil/
resources/factsheets/factsheet.asp?id=11932, February 2008.
[4] Beagleboard-xm product reference. http://beagle.s3.amazonaws.com/
design/xM-A/BB_xM_SRM_A2_01.pdf, 2010.
[5] Beagleboard.org - hardware-xm. http://beagleboard.org/hardware-xM,
2010.
[6] Davincitm dm37x video processors. http://focus.tij.co.jp/jp/lit/ml/
sprt571/sprt571.pdf, 2010.
[7] Omap 4 mobile applications platform. http://focus.ti.com/lit/ml/
swpt034a/swpt034a.pdf, 2010.
[8] Pandaboard platform specifications. http://www.pandaboard.org/content/
platform, 2010.
[9] Pandaboard references — pandaboard. http://pandaboard.org/content/
resources/references, 2010.
[10] Ffmpeg. http://ffmpeg.org/, 2011.
[11] Image stride. http://msdn.microsoft.com/en-us/library/aa473780%28v=
vs.85%29.aspx, 8 September 2011.
[12] Intel core i7 2600k cpu benchmark. http://www.anandtech.com/bench/
Product/287, 2011.
[13] Logitech hd pro webcam c910. http://www.logitech.com/en-au/
webcam-communications/webcams/devices/6816, 2011.
[14] Open source software for arm socs. http://www.linaro.org/, 2011.
[15] Sgt. 1st Class Michael Guillory. Up, up and away. http://usarmy.vo.llnwd.
net/e2/-images/2006/11/22/1024/army.mil-2006-11-22-114612.jpg,
November 2006.
[16] Panasonic Corporation. Mpeg-4 avc/h.264 codec technology explanation. http:
//pro-av.panasonic.net/en/technology/technology.pdf.
[17] Ross Finlayson. Live555 streaming media. http://www.live555.com/
liveMedia/.
[18] Jason Garrett-Glaser. Announcing arm support for x264. http://x264dev.
multimedia.cx/archives/142, 24 August 2009.
[19] Matthew Gast. O’Reilly Media, Inc., 2nd edition, 25 April 2005.
[20] V. Iverson, J. McVeigh, and B. Reese. Real-time H.264-AVC codec on Intel
architectures. In ICIP International Conference on Image Processing, volume 2,
pages 757–760, 24-27 October 2004.
[21] Ben Kuchera. Parrot ar.drone to attack this september,
for $300. http://arstechnica.com/gaming/news/2010/06/
parrot-ardrone-to-attack-this-september-for-300.ars.
[22] Sunhun Lee and Kwangsue Chung. TCP-friendly rate control scheme based on
RTP. In Information Networking. Advances in Data Communications and Wireless
Networks, Lecture Notes in Computer Science, volume 3961, pages 660–669,
2006.
[23] Nilay Patel. Venezuelans set new wifi distance record:
237 miles. http://www.engadget.com/2007/06/19/
venezuelans-set-new-wifi-distance-record-237-miles/, June 2007.
[24] Michael H Schimek, Bill Dirk, Hans Verkuil, and Martin Rubli. Video for Linux
Two API Specification, volume 0. Bytesex.org.
[25] Henning Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: a trans-
port protocol for Real-Time applications. RFC 3550, Internet Engineering Task
Force, 2003.
[26] Naveen Srinivasamurthy, Soyeb Nagori, Girish Murthy, and Satish Kumar. Sub-
picture based rate control algorithm for achieving real time encoding and im-
proved video quality for h.264 hd encoder on embedded video socs. In 2010
IEEE 4th International Conference on Internet Multimedia Services Architec-
ture and Application, pages 1–6, 15-17 December 2010.
[27] Suman Srinivasan, Haniph Latchman, John Shea, Tan Wong, and Janice Mc-
Nair. Airborne traffic surveillance systems: video surveillance of highway traffic.
In Proceedings of the ACM 2nd international workshop on Video surveillance
& sensor networks, 10-16 October 2004.
[28] symon. Payload of the a.r. drone. http://www.ardrone-flyers.com/forum/
viewtopic.php?f=7&t=38, 5 September 2010.
[29] Ktawut Tappayuthpijarn, Guenther Liebl, Thomas Stockhammer, and Ecke-
hard Steinbach. Adaptive video streaming over a mobile network with tcp-
friendly rate control. June 2009.
[30] Javvin Technologies. chapter RTCP: RTP Control Protocol, page 145. 2nd
edition.
[31] Uras Tos and Tolga Ayav. Adaptive rtp rate control method. In 2011 35th IEEE
Annual Computer Software and Applications Conference Workshops, 2011.
[32] Hongtao Wang, Yuehui Jin, Wendong Wang, Jian Ma, and Dongmei Zhang.
The performance comparison of prsctp, tcp and udp for mpeg-4 multimedia
traffic in mobile network. In International Conference on Communication Tech-
nology Proceedings, volume 1, pages 403–406, 9-11 April 2003.
[33] Tim Wescott. Pid without a phd. http://igor.chudov.com/manuals/
Servo-Tuning/PID-without-a-PhD.pdf, October 2000.
[34] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the
h.264/avc video coding standard. IEEE Transactions on Circuits and Systems
for Video Technology, 13(6):560–576, July 2003.