spyPanda Thesis Report

Wireless Transmission of Video using WLAN

by Alexander Conrad Stevens

School of Information Technology and Electrical Engineering, University of Queensland.

Submitted for the degree of Bachelor of Engineering

in the division of Mechatronic Engineering

November 2011.

November 6, 2011

The Head of School

School of Information Technology and Electrical Engineering

University of Queensland

St Lucia, Q 4072

Dear Professor Paul Strooper,

In accordance with the requirements of the degree of Bachelor of Engineering in the School of Information Technology and Electrical Engineering, I present the following thesis entitled “Wireless Transmission of Video using WLAN”. This work was performed under the supervision of Dr Konstanty Bialkowski.

I declare that the work submitted in this thesis is my own, except as acknowledged in the text and footnotes, and has not been previously submitted for a degree at the University of Queensland or any other institution.

Yours sincerely,

Alexander Conrad Stevens

To my parents and grandparents, for supporting my ambitions through life...

Acknowledgments

I would like to express my earnest appreciation for the guidance of my supervisor, Dr Konstanty Bialkowski. He was patient and informative, and guided me in the right direction whenever I strayed from the path of this work. It would be a pleasure to continue to liaise with him in the future.

I would also like to thank Mr Ross Finlayson of Live Networks, Inc., for the Live555 streaming libraries, and Jason Garrett-Glaser and his team of x264 developers, for the x264 H.264/AVC encoding library. Their contributions to the open-source community, which provided the means for me to complete this thesis, will be forever acknowledged.

Abstract

The need for efficient use of wireless networks when streaming video has become apparent in today’s age of on-demand content. This is even more evident in the Unmanned Aerial Vehicle (UAV) research and development sectors, where streamed video must be delivered over 802.11b/g/n wireless LAN at reasonable quality, in real time and at a high framerate to be useful to the research application consuming it. Further compounding the task, most research and consumer UAVs are far too small to carry an average consumer desktop computer.

This thesis presents a complete open-source software solution that runs on a low-power, lightweight computing platform and provides the aforementioned features. The software includes a custom rate-control algorithm that utilises the Live555 Real-time Transport Protocol (RTP) and Real-time Transport Control Protocol (RTCP) library, the x264 H.264/AVC encoding library and the Video4Linux2 application programming interface. The software is built and run within the Ubuntu Linux distribution on an ARM development platform called the Pandaboard.

This solution has been tested and optimised on a Pandaboard mounted atop the popular Parrot AR.Drone consumer UAV. The results show that streaming high-framerate H.264 video from a UAV platform is achievable through rate control techniques such as on-the-fly resolution adjustment and tuning of H.264 quality parameters. However, without an H.264 encoder optimised for the Pandaboard’s Digital Signal Processor, the Pandaboard cannot encode video of high enough quality to saturate the wireless network, except when the wireless LAN is at the limits of its range or is carrying heavy traffic.

Contents

Acknowledgments
Abstract
List of Figures
List of Tables
1 Thesis Overview
1.1 Introduction
1.2 Aim of Thesis
1.3 Thesis Outline
2 Theory
2.1 The H.264 Codec
2.2 IEEE 802.11 Wireless and Protocols
2.3 The RTP and RTCP Protocols
2.4 Rate Control for Streams
3 Literature Review
3.1 Military Implementations
3.2 The AR.Drone by Parrot SA
3.3 UAV Traffic Surveillance
3.4 Real-time Encoding and Transmission of H.264
3.5 Adaptive Rate Control for RTP Streams
4 Design of Platform
4.1 Choice in Hardware
4.2 The Operating System
4.3 The UAV Platform and Camera
5 Design of Software
5.1 Video4Linux2
5.2 x264 Open Broadcast Encoder
5.3 RTP and RTCP with Live555
5.4 The spyPanda Control Algorithm
6 Results of spyPanda
6.1 Pandaboard Encoded Framerate Results
6.2 Pandaboard Output Bitrate Results
6.3 Jitter and Latency Results
6.4 Discussion on Picture Quality
7 Conclusions
7.1 Summary and conclusions
7.2 Possible future work
Appendices
A Program listings
B Companion disk
B.1 Main C File
B.2 Video4Linux2 Implementation
B.3 x264 Implementation
B.4 Live555 Implementation
B.5 Miscellaneous C Implementations
B.6 Sample Results
B.7 Report LaTeX Source and Items
B.8 This Report

List of Figures

2.1 Example of I, P, and B-Frame Prediction. Picture courtesy of Petteri Aimonen
2.2 The RTP Header as defined in RFC-1889 [25]
2.3 Receiver Report for the RTP Control Protocol [25]
3.1 The MQ-1 Predator in full flight [3]
3.2 The RQ11 Raven being hand launched [15]
3.3 The Parrot AR.Drone with forward facing camera [21]
3.4 The proposed UAV traffic surveillance system [27]
3.5 The Texas Instruments results of the real-time encoding algorithm. [26, Page 4]
3.6 Example of TFRC use in an RTP/RTCP environment [29]
3.7 Results of the TFRC method vs. no method [29]
3.8 PID method employed by the Tos and Ayav’s Method [31]
4.1 The BeagleBoard-xM and its Peripherals [5]
4.2 The Pandaboard and its Peripherals [9]
4.3 The AR.Drone, Pandaboard and Logitech C910 Camera - with and without protective Styrofoam
5.1 Overview block diagram of the spyPanda software
5.2 Illustration displaying stride [11]
6.1 Framerate attained over progressions of dropped resolutions
6.2 Bitrate attained over progressions of dropped resolutions
6.3 Jitter experienced in the stream
6.4 Stable flight of AR.Drone at 320x240 pixels
6.5 Unstable flight of AR.Drone at 320x240 pixels
6.6 High motion flight of AR.Drone at 320x240 pixels

List of Tables

5.1 Sample of spyPanda’s ordered linked list of resolutions and framerates
6.1 Times for Resolution changes in stream for Figures 6.1, 6.2 and 6.3

Chapter 1

Thesis Overview

1.1 Introduction

Video streaming is generally performed over wired networks, which provide almost unfettered access to multimedia content for the end-user. However, one can see the benefits of a device that uses existing, cheap wireless networks to provide video content, whether in real time or on demand. Similarly, Closed-circuit TV (CCTV) security systems, in which video is streamed through physical wires to a base station, could be improved by a modular wireless system.

The choice between wired and wireless networks, though, is for the consumer to decide. In an application like an Unmanned Aerial Vehicle (UAV), however, it is almost impractical to tether the UAV to a CAT5e/6 LAN cable to provide the end-user with the UAV’s on-board video. Consequently, proprietary digital and analogue solutions have been developed to stream video over existing wireless technologies. These solutions generally have static, unchangeable configurations unless licensing agreements are met, and these agreements can be costly for the party requiring them.

Third and fourth generation (3G/4G) mobile networks have been used by video call applications on mobile phones to provide real-time conferencing with loved ones and colleagues; however, these networks are costly, even when capacity is rented from a carrier. A portable and cheap streaming solution therefore requires existing, widespread networks, which leads to the next point. Today, almost any home in the developed world can purchase and set up its own 802.11 wireless network in as little as an hour with little to no configuration. This IEEE standardised technology requires no licensing fees, can be extended by directional antennae to reported record ranges of up to 382 kilometres [23], and provides sufficient network bandwidth to stream high quality video to an end user.

Streaming real-time video over a digital wireless technology like 802.11 requires a well established video codec. A video codec uses motion estimation and prediction, among various other techniques, to compress video. However, like many forms of intellectual property, some codecs require royalty fees if the video streaming product is to be sold. For the most part, a research organisation can use such codecs without restriction, and there exist open-source implementations of quality codecs like WebM/VP8, Vorbis and H.264/AVC, all free for modification and use. Many of these codecs are already used in home appliances; for example, the H.264/AVC video codec is used in Blu-ray Disc™ technology, and both MPEG-2 and H.264/AVC are used for digital television in many countries because of their high compression.

Once a codec is chosen, the video can be compressed and sent through the wireless network to the client. Some companies implement their own protocols to encapsulate and packetise the video for streaming, which locks clients into their products. Hobbyists, enthusiasts and research institutions mostly choose an open protocol, the Real-time Transport Protocol (RTP), and its sister protocol, the Real-time Transport Control Protocol (RTCP). Both are standardised by the Internet Engineering Task Force (IETF) in RFC 1889 [25], and provide a means of encapsulating the video and of exchanging network and video statistics between servers and clients.

Aggregating all of these technologies makes it possible to create a complete system for real-time streaming of video over wireless networks from a UAV. A consumer product already exists: the AR.Drone, developed by Parrot SA, provides a full UAV platform with a video camera, a custom ARM based platform running Linux, and a proprietary implementation of real-time video streaming. However, its software is not open-source and can only be customised through the provided application programming interface (API).

1.2 Aim of Thesis

As the introduction implies, the objective of this thesis is to develop an open-source counterpart to the video streaming software on Parrot SA’s AR.Drone: a solution for real-time, adaptive video streaming at reasonable quality and high framerates. It will use libraries to interface with a camera, compress the captured video frames using an encoder (MPEG-2, H.264, VP8, etc.), and stream them using the standardised RTP/RTCP protocols. On top of these libraries, it will implement a control algorithm that adjusts various aspects of quality to maintain a real-time, high framerate video stream. The software will be optimised, built and run on a low-power, high performance platform small enough to be mounted atop either a custom or pre-built UAV for testing purposes.

These points can be summarised as follows:

• Real-time encoding of high quality captured video on board the portable platform

• Adaptive control of bit-rate via an algorithm and wireless connection feedback

• Monitor the quality of wireless transmission for dropped Intra-frames

• Maintain an acceptable latency between capture of video and display on the client (under 200 ms)

• Ensure the end-user can display the video stream (standardised streams, or a custom built client)

• Ensure that the hardware on the portable platform can perform all associated tasks while keeping a minimal footprint in power usage and size

The main motivation for this thesis is that open-source implementations providing streaming all the way from the camera to the client are hard to come by. There are implementations that stream video from a camera over RTP, like the media player VLC, but they generally have no rate control. Also, most of these implementations are only tested on full desktop computers with x86 architectures and are not optimised for use on small development boards.

1.3 Thesis Outline

The remaining chapters are organised as follows:

• Chapter 2, Theory

A summary of the H.264 codec, the IEEE 802.11 wireless standard, the Real-time Transport Protocol and Real-time Transport Control Protocol, and rate control for streams.

• Chapter 3, Literature Review

Examples of prior work in the fields of real-time video streaming and real-time encoding, and other work related to this thesis.

• Chapter 4, Design of Platform

An explanation of the choice of hardware and operating system used to develop and test the final program.

• Chapter 5, Design of Software

A walkthrough of the design and the steps taken to develop the “spyPanda” software used to stream the adaptive real-time video.

• Chapter 6, Results and Discussion

Results from the designed software, and what could have been improved or implemented.

• Chapter 7, Conclusions and Future Work

A summary of how well the designed software met its objectives, and a short list of possible future work to improve on the platform.

Chapter 2

Theory

2.1 The H.264 Codec

The H.264 standard was developed by the ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union), jointly with ISO/IEC MPEG, and was designed to complement existing and future networking technologies [34]. Building on the prior H.263 standard, it aimed to reduce the bit-rate by 50% [20] compared with previous standards; the price was an inevitable increase in computational complexity. While the reduction in bit-rate is ideal for real-time wireless transmission of streamed video, the added complexity raises another issue: encoding in real time. Several options exist to alleviate the strain on the processor:

1. Optimise the encoder for the specified processor architecture, as V. Iverson, J. McVeigh and B. Reese have explained for Intel architectures [20]

2. Choose hardware that allows for hardware based H.264 encoding [8]

3. Choose hardware that has optimisations written within the x264 encoder [18]

4. Use a rate control algorithm to dictate limits on the encoded picture and hardware, as previously investigated by N. Srinivasamurthy, S. Nagori, G. Murthy and S. Kumar of TI [26]

From these options, it can clearly be seen that a rate control algorithm (option 4) can be used in conjunction with any of the others to great effect. However, optimising the encoder for the architecture (option 1) is largely mutually exclusive with choosing hardware that already has optimisations in x264 (option 3). This quick analysis suggests that the best results for real-time playback come from choosing hardware that has x264 optimisations and a hardware H.264 accelerator, and developing a rate control algorithm.

Options 2 and 3 are both met by the PandaBoard, whose ARM Cortex-A9 processor has NEON optimisations in x264 [18] and which includes a digital signal processor that provides hardware based encoding of H.264 video [8]. This leaves option 4: a form of rate control that dictates the encoder’s limits so that it can provide 30 frames per second in real time. Option 4 could potentially be made redundant by options 2 and 3 if the manufacturer’s specifications are upheld, but it remains useful as a secondary system linked with the adaptive rate control described in Section 2.4.

As a preface to real-time encoding of H.264, it is essential to understand what an H.264 encoder outputs. In simplified terms, the encoder splits the captured picture, separated into luma (brightness) and chroma (colour) components, into a series of blocks (as small as 4x4 pixels). Each block is predicted from previously coded data, and the prediction residual is passed through a transform (in H.264, an integer approximation of the discrete cosine transform, DCT) that converts it to its frequency components; this is the first step of compression. Depending on the type of frame, blocks are predicted either from within the same frame or from other frames, as described below. Combined, these blocks make up a complete frame when decoded [34] [16].

At regular intervals in a video sequence, an Intra-frame is coded: a frame that contains essentially a full representation of a picture in the video. It does, however, use intra prediction, which predicts each block from neighbouring pixels in the same frame and hence compresses the frame further. Between consecutive Intra-frames there are two types of Inter-frame: Predicted-frames and Bi-directional Predicted-frames. Both use previously coded frames, including the preceding Intra-frame, to predict movement, and B-frames additionally use following frames, including the next Intra-frame [34] [16]. Predicting from other frames in this way is an extremely effective form of compression.

The three frame types can be summarised as follows, and are illustrated in Figure 2.1:

• Intra-frame (I-frame): Relies on only itself for decompression, minimal compression

• Predicted-frame (P-frame): Relies on previous frames to decompress, medium compression

• Bi-directional Predicted-frame (B-Frame): Relies on previous and succeeding frames, maximum compression

Figure 2.1: Example of I, P, and B-Frame Prediction. Picture courtesy of Petteri Aimonen

After each frame is encoded, the encoder splits it into network-packet-sized units, each with a descriptive header. This is the Network Abstraction Layer (NAL) of the H.264 standard, designed to carry video efficiently over a network [34] [16]. Real-time encoding therefore does not mean that the encoder outputs a frame instantaneously; rather, each frame must be captured, encoded and packetised within one frame period, which at 30 frames per second is 1/30 s, or about 33.3 milliseconds.

2.2 IEEE 802.11 Wireless and Protocols

Matthew Gast’s book, 802.11 Wireless Networks: The Definitive Guide [19], describes the underlying principles of the 802.11 standard. The book explains the Media Access Control (MAC) layer and how it attempts to hide the complexity of the wireless system underneath. Signal strength, hidden nodes, obstacles and the client’s distance from the base station significantly affect the quality of transmission, so many fall-backs have been designed into the MAC layer to send data reliably over the wireless medium.

In busy wireless environments, such as a university campus, the hidden node problem is clearly evident: two stations that cannot hear each other may both transmit to the same access point, and their frames collide there. This can be mitigated by RTS/CTS clearing, using Request to Send and Clear to Send frames. A station wishing to transmit first sends an RTS frame to the receiver, which silences other stations in range of the sender. The receiver replies with a CTS frame, which silences stations in range of the receiver, including any hidden from the sender. The main frame is then transmitted, and the sender expects to receive an ACK (acknowledgement) frame back. If any step of this exchange is not completed, the process of sending the frame restarts, and transmission latency increases [19, Pages 34–36].

Whenever a frame is lost, it is resent, and a retry counter is kept for that frame. Each time the counter increases, the range of random waiting time before the retry (called the contention window) grows, which wastes airtime and increases the latency of both the frame being retried and the frames queued behind it. To complicate matters further, each frame carries a Duration/ID field, which tells other stations how long the medium will be busy with the transmission in progress. Fragmentation also affects latency: packets sent by the base station may be fragmented, and if any fragment is lost, the whole packet has to be resent. All of these potential problems increase latency, so it is important to control the size of each frame, and hence of each packet, to maintain real-time wireless transmission [19, Pages 50 & 57].

Another contributor to latency is the Transmission Control Protocol (TCP). It sits above the MAC layer (via IP) to ensure highly reliable, ordered delivery of data. However, when transmitting real-time video over a mobile network, TCP proves to be a setback, as shown in [32]. PR-SCTP and UDP (Partially-Reliable Stream Control Transmission Protocol and User Datagram Protocol) generally transmit frames of MPEG-4 video with an average latency under 25 milliseconds; TCP performs similarly on average, but its latency spikes to in excess of 2.5 seconds per frame for as many as 25 frames [32, Figure 7]. This suggests that a UDP based protocol would stream video far more effectively.

2.3 The RTP and RTCP Protocols

As explained in the previous section, a real-time stream over an 802.11 wireless network calls for a UDP based protocol or similar. This leads to the Real-time Transport Protocol and the Real-time Transport Control Protocol.

The Real-time Transport Protocol (RFC-1889 [25]) was designed for media and data with real-time characteristics. It sits upon existing transport protocols (generally UDP, though TCP is possible), and packetises the media with a header that contains a sequence number, a timestamp, a payload type description and various other RTP specific identifiers. This header can also be extended to facilitate custom implementations of the RTP scheme.

The header is laid out in this fashion (Figure 2.2):

Figure 2.2: The RTP Header as defined in RFC-1889 [25]

The default RTP header does, however, have one limitation: it does not include statistics about the stream. This was intentional; if every RTP header included statistics about network latency, quality and so on, it would add too much overhead for a real-time implementation. RTCP was introduced to address this.

RTCP was designed to address this question of Quality of Service (QoS). It comprises five different packet types [25]:

• Sender Report (SR): Includes transmission and reception statistics from participants that are active senders, and are sent to the receivers

• Receiver Report (RR): Includes transmission and reception statistics from participants that are not active senders, and are sent to the senders

• Source Description (SDES): Includes optional details about the source

• Disconnection (BYE): Indicates end of participation

• Application Functions (APP): Indicates custom functions as specified for the target application

For the purpose of this thesis, only in-depth knowledge of the receiver report is necessary, since the server, which controls the encoding parameters, receives these reports from the client. The receiver report (shown in Figure 2.3) comprises the details of the source, the fraction of packets lost since the previous report, the cumulative number of packets lost, the highest sequence number received, the interarrival jitter, the timestamp of the last sender report, and the delay between receiving the last sender report and sending this receiver report.

Figure 2.3: Receiver Report for the RTP Control Protocol [25]

For streaming video, the important fields are the interarrival jitter, the fraction of packets lost and the delay since the last sender report. Interarrival jitter is a running estimate of the variation in packet transit times, measured in RTP timestamp units and smoothed with a gain of 1/16 [25]. If data arrives out of order, or earlier or later than its timestamp suggests, the jitter associated with the stream increases. The delay since last sender report is the time the client held the last sender report before sending this receiver report; combined with the timestamp of that sender report, it lets the server calculate the round-trip time. These statistics could be used to lower the picture quality when the network or the client cannot keep up with the stream. The fraction of packets lost can also be used in conjunction with the total number of packets lost to estimate the total number of packets expected, and hence received (expected minus lost).

This is demonstrated in the following equation:

$$\text{CurrentPacketsExpected} = \sum_{i=0}^{n} \frac{\text{Lost}_i - \text{Lost}_{i-1}}{\text{FractionLost}_i} \qquad (2.1)$$

Even though RTP provides Quality of Service statistics through RTCP, it does not guarantee delivery, and relies on the underlying transport (like TCP) to do so. As mentioned previously, though, RTP is generally built on UDP, which follows a “send and forget” principle. Sunhun Lee and Kwangsue Chung investigate the Real-time Transport Protocol implemented with a TCP-Friendly Rate Control scheme [22], intended to avoid the congestion collapse that can occur when UDP flows ignore the congestion control that TCP flows obey. However, even with their TCP-Friendly RTP, they still encountered round trip times of about 160 milliseconds [22, Figure 3]. In the case of this thesis, it can be assumed that there will be no competing TCP traffic from either the UAV or the base station receiving the video. Hence, plain RTP over UDP can be used, giving UDP-like transmission times while adding the sequence numbers and timestamps needed to detect loss and reordering.

A further benefit of using RTP is that the H.264 encoder packetises its data in a form suitable for UDP, TCP and RTP/RTCP through its Network Abstraction Layer. Going one step further, RTP’s sister protocol, the Real-time Transport Control Protocol, works alongside RTP and carries information back from the clients about the quality of the data distribution [30]. It was designed for multi-user environments, but also makes acquiring network information simple in a unicast system. Another benefit is that RTP/RTCP streams are supported through the Live555 libraries [17], which are used by the VideoLAN media player, VLC, and by the MPlayer series of video players.

2.4 Rate Control for Streams

The basic objective of rate control for video streams is to limit the throughput just enough that the medium can sustain a consistent flow. This is analogous to traffic on a busy stretch of road: the more cars sent down the road, the more they pile up and cause congestion, whereas if fewer cars are sent, they can flow evenly and unhindered. Traffic can be increased up to the point of congestion, but not beyond it. This is bit-rate rate control in its simplest form.

Bit-rate rate control can be used by streams with an encoder to reduce the encoder’s output bit-rate enough that the network can handle the stream. However, a stream can also demand too much of the encoder, so that the encoder cannot produce high quality video at the maximum bit-rate the network allows. In this case, the program has to perform rate control on the encoder itself, to achieve the maximum throughput the encoder is capable of. Depending on the encoder implementation, these methods can be implemented with a simple PID algorithm [33] (Equation 2.2).

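$$
\begin{aligned}
\text{Err} &= \text{BitrateAim} - \text{BitrateCurr} \\
\text{Derivative} &= \frac{\text{Err} - \text{ErrOld}}{\text{TimeCurr} - \text{TimeOld}} \\
\text{Integral} &= \text{IntegralOld} + \text{Err} \times (\text{TimeCurr} - \text{TimeOld}) \\
\text{OutputBitrate} &= K_p \times \text{Err} + K_i \times \text{Integral} + K_d \times \text{Derivative}
\end{aligned}
\qquad (2.2)
$$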

This is a simple yet effective way to obtain a desired bit-rate for the stream. However, BitrateAim must be initialised and then adjusted according to the conditions of the network or the limits of the encoder. Bit-rate could also be replaced by other controlled variables, such as packet delay, jitter, network bit-rate, encoder bit-rate, encoder quality or resolution.

Chapter 3

Literature Review

This literature review investigates prior and similar attempts at streaming H.264 video in real time over wireless, and the difficulties inherent in the problem: finding a suitable portable platform to encode video in real time; encoding H.264 video in real time; and streaming the encoded video over 802.11 wireless in real time. Other implementations using custom or existing technologies are also reviewed, most of which relate to the field of UAV surveillance.

3.1 Military Implementations

A few military UAV surveillance systems have been implemented, such as the General Atomics MQ-1 Predator and the cheaper AeroVironment RQ-11 Raven. At roughly a quarter of a million dollars (one Raven craft and control system [2]) and 40 million dollars (four Predator craft and control system [3]), these systems are inaccessible to the civilian populace.

Figure 3.1: The MQ-1 Predator in full flight [3]


The MQ-1 Predator carries a colour nose camera, a daylight variable-aperture TV camera, a variable-aperture infrared camera and a synthetic aperture radar. The cameras (but not the radar) provide full real-time video over a proprietary direct line-of-sight wireless link, with a satellite link for flight beyond the horizon.

The Raven, by contrast, provides three interchangeable cameras that attach to the nose of the UAV. The first combines a forward-facing and a side-facing camera, the second is a nose-mounted infrared camera and the third is a side-mounted infrared camera. These camera feeds are provided in real time through a proprietary line-of-sight wireless link.

The Raven looks remarkably similar to a hobby remote control plane (as can be seen in Figure 3.2); however, instead of the 20 minute flight time of a hobby plane, it provides 50 to 60 minutes of flight. Notably, these systems have had many years of development and military backing to make them sophisticated and extremely reliable. There are, however, other, cheaper alternatives for surveillance UAVs.

Figure 3.2: The RQ-11 Raven being hand launched [15]


3.2 The AR.Drone by Parrot SA

It is not only militaries around the world that use UAVs; civilian companies are developing commercial UAVs for education and entertainment. One company making an impact in the civilian sector is Parrot SA, with its quad-rotor (or quadrocopter) AR.Drone, so named for the Augmented Reality games it provides.

Figure 3.3: The Parrot AR.Drone with forward facing camera [21]

It uses a 640x480 pixel forward-facing camera providing video at 15 frames per second, and a downward-facing camera providing 176x144 pixels at 60 frames per second. The video, encoded with Parrot's custom P.264 implementation, is streamed in real time (with about 100 ms latency) over an 802.11 ad-hoc connection to a computer or smartphone, which also lets the user control the UAV. At US$300, it is rather cheap for a relatively customisable civilian system. This platform is revisited in later chapters, as it is used for testing and comparison.

3.3 UAV Traffic Surveillance

Suman Srinivasan, Haniph Latchman, John Shea, Tan Wong and Janice McNair of the University of Florida have explored the use of surveillance UAVs [27]. The motivating problem was that the Department of Transportation had to upgrade its surveillance of highway traffic from magnetic loop detectors to something far less primitive, so the team investigated the potential of UAVs to quantify traffic conditions in real time.


Unlike fixed cameras, these UAVs could be dispatched over specific or vast areas in a short amount of time. Since such a system needs no wired infrastructure, the savings would be considerable.

Figure 3.4: The proposed UAV traffic surveillance system [27]

However, they proposed a system in which the UAV transmits the video data directly to an external computer for encoding, instead of compressing it in real time on board. This approach has one major drawback: without on-board compression, the limited bandwidth of the link restricts the achievable video quality.
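A worked example shows how quickly uncompressed video exhausts a wireless link. The figures here are illustrative assumptions, not taken from [27]: 640x480 video sent digitally without compression in the YUV 4:2:0 format (12 bits per pixel) at 30 frames per second requires

$$
640 \times 480 \times 12\ \text{bit} \times 30\ \text{s}^{-1} = 110\,592\,000\ \text{bit/s} \approx 110.6\ \text{Mbit/s}
$$

which is roughly twice the 54 Mbit/s nominal signalling rate of 802.11g, before any protocol overhead. Compressing the video on board reduces this to a small fraction of the raw rate.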

The video and additional data would then be transmitted from the UAV to a microwave tower, building on existing TV station infrastructure. This, however, adds cost, because FCC regulations require licensing for the use of certain bands. The problem could have been mitigated by using cheap, unlicensed, line-of-sight 802.11 wireless technology (as in the Venezuelan project [23]) to stream the video back to base.

3.4 Real-time Encoding and Transmission of H.264

As explained in the theory chapter (The H.264 Codec, Section 2.1), video is real-time when each frame can be captured, encoded and packetised within the time allocated by the framerate; for example, in under 33.3 milliseconds for 30 frames per second video. With this in mind, the Texas Instruments team (Naveen Srinivasamurthy, Soyeb Nagori, Girish Murthy and Satish Kumar) proposed in [26] a rate control algorithm for real-time encoding that limits the total packet count per frame, and hence the maximum size of each NAL unit. The motivation is that every additional NAL unit adds header overhead and latency, so minimising the NAL unit count helps minimise the amount of data sent through the transmission stream.

In addition to minimising the NAL unit count, they proposed that each slave processor (such as the DSP on the OMAP4430) run an offloaded pipeline task that computes the motion estimation, intra-frame prediction, transformation and quantisation of the residual picture, etc. [26, Page 2], while the main processor runs the main macroblock loop. However, the TI team's results for 1080p video show that reducing the picture size, and hence the NAL units, causes quality loss when a scene contains large amounts of motion or is otherwise complex. On the other hand, the overall quality of non-complex video increases, since the encoder is given more time to process each frame. The efficient use of encode time in their findings can be seen in the graph in Figure 3.5.

Figure 3.5: The Texas Instruments results of the real-time encoding algorithm. [26,

Page 4]


3.5 Adaptive Rate Control for RTP Streams

There are generally two camps when it comes to streaming multimedia with rate control over RTP: one runs RTP over a plain UDP framework, and the other is TCP-friendly. TCP-Friendly Rate Control (TFRC) still uses the UDP layer, but with a control method that competes fairly with TCP traffic in the network. This is the topic of work by Ktawut Tappayuthpijarn and his team on adaptive video streaming over a mobile network [29]. The objective of their scheme was to optimally control a video stream over the mobile network using the H.264 Scalable Video Coding extension and an RTP-based TFRC method to send the data. It uses the RTP and RTCP feedback shown in Figure 3.6.

Figure 3.6: Example of TFRC use in an RTP/RTCP environment [29]

From this feedback, the server can calculate a sending rate that remains fair to TCP traffic in the network instead of flooding the connection. The sending rate X is effectively doubled from the receiving rate until the network saturates, at which point it is capped by the TCP throughput equation x_tcp, as shown in Equation 3.1 [29]:

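$$
\begin{aligned}
X &= \min\left(x_{tcp},\ 2 \cdot ReceivingRate\right) \\
x_{tcp} &= \frac{s}{R\sqrt{\frac{2p}{3}} + t_{RTO}\left(3\sqrt{\frac{3p}{8}}\right) p \left(1 + 32p^2\right)}
\end{aligned}
\tag{3.1}
$$

where s is the packet size, t_RTO is the packet (retransmission) timeout, p is the loss event rate, and R is the round-trip time.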

This is an effective method for calculating the bit-rate to send over a network. However, their implementation is tailored to retransmit dropped packets at the application level up to four times, which, as discussed in Chapter 2.2, extends the latency of sent packets. This means that their TFRC-based scheme, while useful for streaming, is poorly suited to real-time multimedia, as confirmed by the results in Figure 3.7.


Figure 3.7: Results of the TFRC method vs. no method [29]

This is not the end of the line for rate control over RTP, though. In the application of this thesis, the stream only has to compete with SSH (Secure Shell) traffic over the wireless connection, which inherently uses little network capacity. A plain UDP-based RTP/RTCP connection with a simple PID control algorithm, like the one used by Uras Tos and Tolga Ayav in their Adaptive RTP Rate Control Method [31], can therefore suffice.

Figure 3.8: PID method employed in Tos and Ayav’s method [31]

The method essentially uses the same PID algorithm as Equation 2.2, except that its input is the packet loss fraction and its output is an expected bit-rate. This output is then passed through a limiting function, L(u(t)), which clips it between the lowest and highest allowed bit-rates. The method can deliver low-latency, low-loss results in single-stream applications.
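A minimal C sketch of this structure (a PID controller driven by packet loss, followed by the limiting function L(u(t))) is shown below. The error definition (a target loss fraction minus the measured one) and the use of the PID output directly as the bit-rate are assumptions made for illustration and may differ from the exact formulation in [31]; all names are hypothetical.

```c
/* Loss-driven PID controller with an output limiter L(u(t)).
 * Names and the error definition are illustrative assumptions. */
typedef struct {
    double kp, ki, kd;   /* gains, in bit/s per unit of loss fraction */
    double integral;     /* accumulated error * seconds */
    double prev_err;     /* error at the previous RTCP report */
} loss_pid;

/* L(u(t)): clip the controller output to [min_rate, max_rate]. */
static double limit_rate(double u, double min_rate, double max_rate)
{
    if (u < min_rate) return min_rate;
    if (u > max_rate) return max_rate;
    return u;
}

/* One control step per RTCP report; dt is the report interval in s. */
double loss_pid_step(loss_pid *c, double target_loss, double measured_loss,
                     double dt, double min_rate, double max_rate)
{
    /* Positive error: less loss than tolerated, so the rate may rise. */
    double err = target_loss - measured_loss;
    double deriv = dt > 0.0 ? (err - c->prev_err) / dt : 0.0;
    c->integral += err * dt;
    c->prev_err = err;
    double u = c->kp * err + c->ki * c->integral + c->kd * deriv;
    return limit_rate(u, min_rate, max_rate);
}
```

With a large measured loss the output is clipped to the minimum rate, and with large gains and no loss it is clipped to the maximum, which is exactly the role of L(u(t)).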

Chapter 4

Design of Platform

4.1 Choice in Hardware

To meet the requirements of this thesis, the base hardware has to receive, encode and transmit captured video over an 802.11-based network. This must be done on a portable platform, so the hardware also has to be power efficient. Capturing and transmitting video are fairly trivial requirements to satisfy: most development boards include some form of 802.11 wireless, and a video camera can be connected via USB (for example, a USB webcam). This narrows the search down to candidates that can handle the processing needed to encode video.

Intel processors are found throughout business offices and consumers' homes. However, Intel has mainly designed high-performance processors with large power requirements, such as the i7 line, with a measured system draw of 80 Watts at idle and 128 Watts under load [12]. While these processors are more than capable of encoding H.264 video in real time [12], they are unsuitable for a mobile, battery-powered platform.
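A rough calculation illustrates the scale of the problem. Taking the AR.Drone’s 11.1 V, 1000 mAh battery (described in Section 4.3) as a reference, and ignoring regulator losses and the drone’s own motors (simplifying assumptions), the stored energy would run the i7 system under load for only about five minutes, compared with over two hours for a board drawing around 5 Watts:

$$
E = 11.1\ \text{V} \times 1.0\ \text{Ah} = 11.1\ \text{Wh}, \qquad
t_{\text{i7}} = \frac{11.1\ \text{Wh}}{128\ \text{W}} \approx 0.087\ \text{h} \approx 5.2\ \text{min}, \qquad
t_{5\,\text{W}} = \frac{11.1\ \text{Wh}}{5\ \text{W}} \approx 2.2\ \text{h}
$$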

Since a general-purpose CPU is power-inefficient for the task of encoding a video stream, alternatives had to be found. Two contenders in the development board market, the newly revised BeagleBoard-xM [4] and the PandaBoard [8], are well suited to a development environment, since they provide expansion headers, UART and JTAG debugging, and are extremely power efficient (drawing under 5 Watts). Both boards contain ARM-based processors alongside digital signal processors, and the x264 encoder supports ARM NEON optimisations for the ARM cores. However, since these boards use an ARM-based RISC architecture, the only operating system options are Linux, Android, QNX, Symbian OS and Windows Mobile CE.

Figure 4.1: The BeagleBoard-xM and its Peripherals [5]

Both boards are broadly equivalent in hardware, except for their processor packages and the fact that the BeagleBoard-xM has no built-in 802.11 functionality and only 512 MB of LPDDR RAM. Overlooking the BeagleBoard's lack of WLAN (which can be added via USB anyway), the Texas Instruments DM3730 [6] media core in the BeagleBoard is essentially an OMAP3-based media core with a 1 GHz ARM Cortex-A8 processor. The DM3730 has built-in digital signal processing based on the C64x+ line of DSPs produced by Texas Instruments (TI), which TI claims enables 720p decoding and encoding, but unfortunately without stating any specific codec. The BeagleBoard measures a meagre 3.35 by 3.45 inches and weighs 37 grams, as can be seen in Figure 4.1.

The PandaBoard, on the other hand, uses a Texas Instruments OMAP4430 [7] media core. This combines a dual-core 1 GHz ARM Cortex-A9 MPCore with an IVA3-based hardware accelerator and a similar C64x+ DSP, which TI claims provides 1080p H.264 encoding at up to 30 frames per second (with C64x+ optimisations enabled). The PandaBoard also provides 1 GB of DDR2 RAM and built-in 802.11b/g/n wireless, and is slightly larger at 4.5 by 4.0 inches, weighing just 74 grams, as can be seen in Figure 4.2.


Figure 4.2: The Pandaboard and its Peripherals [9]

From this brief overview of both boards, it seems clear that the newer-generation PandaBoard would be more than capable of producing results with minimal hardware customisation.

4.2 The Operating System

As mentioned in the previous section (4.1), ARM-based processors are not supported by many of the popular desktop operating systems. The short list of operating systems that do run on them is Linux, Android, QNX, Symbian OS and Windows Mobile CE. However, the Pandaboard has so far only received active support from Linux-based distributions and Android, which rules out the rest.

Android is a Linux-based mobile operating system with a custom environment tailored to today's small-screened smartphones and tablets. It is primarily designed as a user-oriented system, with APIs and toolkits for developing applications, and as such does not always provide the libraries that development requires. In this case, RTP/RTCP and H.264 encoding libraries tailored to the Android environment would be hard to come by. This is no burden, however, since the Linux kernel underlying Android is also used on servers, desktop computers, laptops and any device with a processor capable of running it.

There are quite literally hundreds of Linux distributions tailored to particular applications, devices and environments. In this case, though, the ARM-based Pandaboard supports Angstrom Linux, Gentoo Linux and Ubuntu Linux. Each has its own perks, and some are simply easier to develop on than others. Due to interest in, and prior knowledge of, the Ubuntu distribution, this easy-to-use system was chosen for development. The choice proved doubly beneficial, as a team of ARM developers (called Linaro [14]) is contributing to the upstream Linux kernel to improve ARM support, and has produced pre-made images of the Ubuntu distribution built on these improved kernels for the Pandaboard.

This version of the distribution is designated a “headless” or server version, meaning that it does not include a desktop environment (or graphical user interface) and is accessed via a UART serial or SSH connection. This keeps the memory footprint minimal and streamlines the system. Ubuntu, being derived from the Debian GNU/Linux distribution, also uses the easy-to-use dpkg/apt package management system, which provides easy access to a wealth of development libraries, including RTP/RTSP and H.264 encoding libraries.

4.3 The UAV Platform and Camera

For this thesis, a UAV platform was chosen that would satisfy the following criteria:

• Have enough lift capacity to carry a camera and the Pandaboard

• Be simple enough to modify to mount the camera and Pandaboard

• Provide existing control of the platform with no need for modification (while potentially allowing it to be extended)

• Provide a stable platform to obtain conclusive results

• Be cheap enough for a small thesis project such as this

Do-it-yourself hobby remote control planes and quadrocopters exist that could meet these criteria. However, designing such a system is outside the aim and topic of this project, so a pre-built system available from local retailers was chosen instead: the previously mentioned AR.Drone by Parrot SA (Chapter 3.2).

Experiments reported on internet forums [28] indicate that the AR.Drone can reach its maximum altitude (20 feet) with 253 grams of weight located centrally over the battery. Since the Pandaboard weighs only 74 grams, the AR.Drone should have ample power to take off with the Pandaboard mounted. This leaves the camera to be selected according to its weight and capability.
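The payload left for the camera follows from these figures. Assuming the 253 gram forum result holds, and using the PandaBoard’s 74 gram weight from Section 4.1:

$$
253\ \text{g} - 74\ \text{g} = 179\ \text{g}
$$

so the camera, its mount and the power regulator together must weigh less than about 179 grams, which rules out carrying any camera with a heavy stock bracket.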

The camera has to be light enough to be mounted on the AR.Drone, and able to provide a range of resolutions, from as small as 160x120 pixels up to 1920x1080 pixels (for testing purposes). It also has to provide at least the YUV420 format (for direct encoding with x264, as discussed in Chapter 5.2), connect through either the Pandaboard’s on-board camera connector or USB, and be fully supported by the Video4Linux2 drivers. This narrowed the search down to USB Video Class (UVC) compliant cameras, such as many Logitech webcams.

The Logitech C910 is a full HD webcam (30 fps) [13] with Carl Zeiss optics, autofocus and stereo microphones, and it supports the UVC standard, which Linux also implements. However, its mounting bracket weighs around 300 grams and would have to be removed. Removing it proved beneficial, as the bracket’s mounting system could then be reused to mount the camera on the AR.Drone. The final configuration of the test platform is shown in Figure 4.3. For testing purposes, a Styrofoam protective shield was mounted on top of the Pandaboard to mitigate potential damage in unforeseen circumstances.

The Pandaboard is powered by a simple 5 Volt, 3 Amp voltage regulator circuit connected to the AR.Drone’s 11.1 V, 1000 mAh, 10C lithium polymer battery. This circuit is light enough for the AR.Drone to carry, and provides a continuous 5 Volt supply to the Pandaboard until the battery voltage falls below 5.5 Volts.


Figure 4.3: The AR.Drone, Pandaboard and Logitech C910 Camera - with and

without protective Styrofoam

Chapter 5

Design of Software

This chapter guides the reader through the various aspects of the software developed in this thesis, beginning with a brief overview of its architecture. The software, spyPanda (named after the development board and its inherent potential use), uses the Video4Linux2 driver stack to receive YUV420 video from the Logitech C910 connected to the Pandaboard’s USB port. This video is buffered in memory and retrieved by the open source x264 H.264/MPEG-4 AVC library for encoding and compression. Once a frame is compressed, the encoder notifies the Live555 RTP/RTCP streaming library that a new frame is ready to be packetised and sent over the network using the Real-time Transport Protocol.

The video is then streamed to, and played by, the client, which calculates reception statistics. At regular intervals (which depend on the session bandwidth, and are generally 5 seconds), the client sends back an RTP Control Protocol (RTCP) Receiver Report packet containing the statistics for the packets received since the last report. The spyPanda control algorithm then uses this data to adjust the encoder parameters, increasing or decreasing the framerate, resolution and quality to maintain a real-time video stream over the wireless connection. This overview is summarised in Figure 5.1.
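The 5 second figure comes from RTCP’s timing rules. The sketch below computes the deterministic part of the report interval along the lines of RFC 3550 (Section 6.3.1): RTCP is allotted 5% of the session bandwidth; when senders make up no more than a quarter of the members, senders share 25% of it and receivers 75%, otherwise all members share it equally; and the interval never falls below 5 seconds. It omits the randomisation factor, the e − 3/2 compensation and the halved minimum before the first report, and the function name is hypothetical; Live555 schedules its reports with its own implementation of these rules.

```c
/* Deterministic RTCP report interval, after RFC 3550 Section 6.3.1.
 * session_bw_bps: session bandwidth in bit/s
 * members/senders: current participant counts
 * we_sent: nonzero if this participant has sent RTP data recently
 * avg_rtcp_size_bytes: average RTCP packet size, including headers */
double rtcp_interval(double session_bw_bps, int members, int senders,
                     int we_sent, double avg_rtcp_size_bytes)
{
    const double t_min = 5.0;                       /* seconds */
    double rtcp_bw = 0.05 * session_bw_bps / 8.0;   /* 5%, in bytes/s */
    double c;                                       /* seconds per packet */
    int n;                                          /* members sharing */

    if (senders > 0 && senders <= 0.25 * members) {
        if (we_sent) {                 /* senders share 25% */
            c = avg_rtcp_size_bytes / (0.25 * rtcp_bw);
            n = senders;
        } else {                       /* receivers share 75% */
            c = avg_rtcp_size_bytes / (0.75 * rtcp_bw);
            n = members - senders;
        }
    } else {                           /* everyone shares equally */
        c = avg_rtcp_size_bytes / rtcp_bw;
        n = members;
    }

    double t_d = n * c;
    return t_d > t_min ? t_d : t_min;
}
```

For a 1 Mbit/s session with one sender and one receiver, the computed value is far below the minimum, so reports are sent every 5 seconds; only at very low bandwidths or with many participants does the interval grow.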

Figure 5.1: Overview block diagram of the spyPanda software


5.1 Video4Linux2

The Video4Linux2 API [24] is a set of calls used to interface directly with video or radio hardware from a Linux host. A program first opens the device with read and/or write permissions and then issues ioctl calls, which tell the kernel device driver what to do. In spyPanda, these ioctl calls are made through the v4l2_ioctl function from the libv4l2 library, which is wrapped for error checking in the xioctl function in the v4l2_camera.c source file (Appendix B.2).
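A typical xioctl wrapper, following the pattern of the sample capture program in the V4L2 API documentation, repeats the call while it is interrupted by a signal (EINTR) and leaves other errors to the caller. The sketch below uses the standard ioctl so that it builds without libv4l2; spyPanda calls libv4l2’s v4l2_ioctl, which takes the same arguments, in its place. It illustrates the pattern and is not the code in Appendix B.2.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Retry an ioctl that was interrupted by a signal (EINTR).
 * spyPanda would call v4l2_ioctl() from libv4l2 here instead.
 * Returns the ioctl result; on failure errno describes the error. */
int xioctl(int fd, unsigned long request, void *arg)
{
    int r;
    do {
        r = ioctl(fd, request, arg);
    } while (r == -1 && errno == EINTR);
    return r;
}
```

Callers check for a return value of -1 and inspect errno, for example EINVAL when a driver does not support a requested ioctl.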

After opening the device (open_device and v4l2_open), spyPanda probes the device for resolutions with the same aspect ratio as the resolution passed on the command line at startup, or the default ratio of 4:3. This probing function (save_device_resolutions) uses a linked list (Table 5.1) to order resolutions from least to most demanding for the encoder, using the simple formula in Equation 5.1:

$$
ProcessorDemand = width \times height \times framerate \tag{5.1}
$$

| Linked list position | BOT | -1 | HEAD | +1 | TOP |
|---|---|---|---|---|---|
| WxH@FPS | NULL | 160x120@30 | 320x240@30 | 320x240@60 | NULL |
| Demand (pixels/s) | 0 | 576000 | 2304000 | 4608000 | 0 |

Table 5.1: Sample of spyPanda’s ordered linked list of resolutions and framerates
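A demand-ordered, doubly linked list like the one in Table 5.1 can be sketched as follows. The node layout and the `res_insert` function are hypothetical, and spyPanda’s actual structure may store more (such as the saved jitter used in Chapter 5.4). Here the returned pointer is the least demanding entry, and the NULL `prev`/`next` pointers at either end play the role of the BOT and TOP entries.

```c
#include <stdlib.h>

/* One capture mode in a demand-ordered list (cf. Table 5.1).
 * Field and function names are hypothetical. */
typedef struct res_node {
    unsigned width, height, fps;
    unsigned long demand;           /* width * height * fps, Equation 5.1 */
    struct res_node *prev, *next;   /* less / more demanding; NULL at ends */
} res_node;

/* Insert a mode so the list stays sorted by ascending demand.
 * Returns the (possibly new) head, the least demanding mode.
 * If allocation fails, the list is returned unchanged. */
res_node *res_insert(res_node *head, unsigned w, unsigned h, unsigned fps)
{
    res_node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return head;
    n->width = w;
    n->height = h;
    n->fps = fps;
    n->demand = (unsigned long)w * h * fps;

    if (head == NULL || n->demand < head->demand) {   /* new lowest */
        n->next = head;
        if (head != NULL)
            head->prev = n;
        return n;
    }
    res_node *cur = head;              /* find the last node <= demand */
    while (cur->next != NULL && cur->next->demand <= n->demand)
        cur = cur->next;
    n->next = cur->next;
    n->prev = cur;
    if (cur->next != NULL)
        cur->next->prev = n;
    cur->next = n;
    return head;
}
```

Inserting 320x240@60, 160x120@30 and 320x240@30 in any order yields the ordering of Table 5.1: demands of 576000, 2304000 and 4608000.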

Once spyPanda has verified that the selected (or default) resolution exists in the list, it can execute the init_device function. This function uses xioctl to set the device’s format (V4L2_PIX_FMT_YUV420, for efficiency when passing frames to the encoder) and resolution (e.g. 640x480 pixels). It then requests that the driver allocate BUF_COUNT memory-mapped buffers for the device, maps those buffers, and finally exchanges them with the driver using VIDIOC_QBUF. With the device now functional, the framerate can be set by set_device_framerate and the stream turned on using VIDIOC_STREAMON.


Now that video is ready to be captured, spyPanda uses the start_read_frame and stop_read_frame functions to signal to the driver that a frame can be copied to the memory-mapped location, ready to be encoded. When the program needs to reinitialise the device or simply remove it from memory, it calls the uninit_device function (and, when closing, close_device) to turn the stream off, unmap and free the memory-mapped buffers, and finally request that the driver destroy its remaining buffers.

5.2 x264 Open Broadcast Encoder

The x264 encoding library is an implementation of an H.264/MPEG-4 AVC encoder. It supports, and is optimised for, many CPU architectures such as x86, x86_64, PowerPC, SPARC and ARM. In particular, it has NEON optimisations for the ARM architecture [18], which can be used on the Pandaboard to further accelerate the encoding of raw video. However, the x264 library does not implement C64x+ DSP acceleration, since the framework to offload processing to the DSP using Direct Memory Access (DMA) has not been written. This would have been a massive task for the current thesis, requiring far more knowledge of and experience with DSPs and kernel drivers, so the NEON optimisations had to suffice.

Encoding with the x264 library is a relatively simple task, apart from the sheer number of variables to set up initially. However, many presets are available on which to base the encoder, which make it easy to configure. The encoder is initialised with the preset “veryfast” and the tuning “zerolatency” (using x264_param_default_preset(x264param, "veryfast", "zerolatency")). These parameters are used together with the Constrained Baseline Profile (CBP), which is specified in the H.264 standard and implemented in the x264 encoder (using x264_param_apply_profile(encoder, "baseline")).

Each preset defines slightly different parameters that the encoder must abide by, and the x264 encoding library simplifies these into a standard set of 10 presets. The presets generally select different analysis algorithms, inter/intra partitions, transforms, refinements, etc., all of which can be viewed in the x264 source code under the function x264_param_apply_preset in common/common.c. The tunings, on the other hand, generally specify the number of threads and the number of frames the encoder can buffer. In the case of “zerolatency”, the encoder cannot look ahead in any way, uses no B-frames (since these would require buffered frames), uses the frames per second for its own rate control, and enables slice-based threading.

There is, however, a modified x264 library named the Open Broadcast Encoder, which aims to provide real-time encoding. This library is simple to use, as it relies on a single parameter, x264param.sc.f_speed (the ratio relative to real time, e.g. 1.0 is equivalent to real time), to perform rate control on the encoder. Using this parameter, the video quality can be controlled depending on the feedback from the client and on whether the Pandaboard is struggling under the load; this is explained in Chapter 5.4.

Now that the profile (CBP), preset (“veryfast”) and tuning (“zerolatency”) have been chosen, they are applied in an initialisation function within spyPanda named init_encoder_param. This sets the encoder to use these presets, with additional parameters optimised for the Pandaboard: in particular, a default resolution of 320x240 at 30 frames per second, no B-frames, a real-time ratio of 1.0, and a constant rate factor of 25 with a limit of 40 (higher values lower the video quality and increase compression, particularly in scenes with motion). The control algorithm modifies some of these defaults as the program runs, so they are subject to change.

After the default spyPanda encoding parameters have been set and the encoder opened, the encoder can start encoding the frames stored in the buffer by the Video4Linux2 driver. The encode_frame function calls start_read_frame and then points the three planes in the enc.pic_in.img.plane array to the locations within the buffer defined by the YUV420 layout. These planes are then passed to the library function x264_encoder_encode, which encodes the frame, and a pointer to the memory holding the resulting NAL units is saved in the spyPanda variable enc.nal. The frame size is saved with the NAL unit along with the payload, ready to be passed to the Live555 library by calling triggerFrameReceived.
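The plane arithmetic for the V4L2_PIX_FMT_YUV420 layout can be sketched as follows. In this planar 4:2:0 format a full-resolution Y plane is followed by a quarter-size U (Cb) plane and then a quarter-size V (Cr) plane, so a 320x240 frame occupies 115200 bytes. The struct and function names are hypothetical; in spyPanda the resulting pointers and strides would be written into the plane and i_stride fields of x264’s x264_image_t (enc.pic_in.img). Tightly packed rows with no padding, and even dimensions, are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Plane pointers and strides for a tightly packed planar YUV 4:2:0
 * (I420) frame: Y (w x h), then U (w/2 x h/2), then V (w/2 x h/2). */
typedef struct {
    uint8_t *plane[3];
    int stride[3];
} yuv420_planes;

yuv420_planes yuv420_split(uint8_t *buf, int width, int height)
{
    yuv420_planes p;
    size_t luma = (size_t)width * height;

    p.plane[0] = buf;                     /* Y */
    p.plane[1] = buf + luma;              /* U follows Y */
    p.plane[2] = buf + luma + luma / 4;   /* V follows U */
    p.stride[0] = width;                  /* bytes per row, no padding */
    p.stride[1] = width / 2;
    p.stride[2] = width / 2;
    return p;
}
```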

The average encode time and frame size are also tracked, using the gettimeofday function to measure the time difference between the start and end of each encoded frame. These statistics are then used to calculate the expected bit-rate and framerate for the control algorithm. The code for this implementation can be found in the x264_control.c source file (Appendix B.3).
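The timing and averaging can be sketched as below. The text does not say how the averages are formed, so a cumulative mean is assumed here; the structure and function names are hypothetical, not those in x264_control.c. The expected bit-rate is the average frame size multiplied by the framerate, and the sustainable framerate is the reciprocal of the average encode time.

```c
#include <sys/time.h>

/* Elapsed milliseconds between two gettimeofday() samples. */
double elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0
         + (end->tv_usec - start->tv_usec) / 1000.0;
}

/* Running encoder statistics (hypothetical layout). */
typedef struct {
    double ave_enc_ms;       /* cumulative mean encode time */
    double ave_size_bytes;   /* cumulative mean encoded frame size */
    unsigned long frames;
} enc_stats;

void enc_stats_update(enc_stats *s, double enc_ms, double size_bytes)
{
    s->frames++;
    /* incremental mean: avg += (x - avg) / n */
    s->ave_enc_ms += (enc_ms - s->ave_enc_ms) / s->frames;
    s->ave_size_bytes += (size_bytes - s->ave_size_bytes) / s->frames;
}

/* Framerate the encoder could sustain at its average encode time. */
double achievable_fps(const enc_stats *s)
{
    return s->ave_enc_ms > 0.0 ? 1000.0 / s->ave_enc_ms : 0.0;
}

/* Bit-rate implied by the average frame size at a given framerate. */
double expected_bitrate_bps(const enc_stats *s, double fps)
{
    return s->ave_size_bytes * 8.0 * fps;
}
```

For example, frames averaging 5000 bytes at 30 frames per second imply 1.2 Mbit/s, and an average encode time of 35 ms limits the encoder to about 28.6 frames per second.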


5.3 RTP and RTCP with Live555

The Live555 [17] set of streaming libraries is a standards-compliant RTP/RTCP/RTSP library that can stream many codecs over a network connection. It is cross-platform (Windows, Linux and Mac) and is integrated as a client or server in many open source media applications. This means that VLC media player and MPlayer, which both use Live555 as a client, can be used, along with many other RTSP-compliant media players, to play the streamed video.

The Live555 library does not, however, offer built-in support for sourcing video frames or byte streams from encoders. A subclass of the library’s FramedSource class (a standard abstraction for capturing from a file, an encoder, another RTP source, etc.) therefore has to be written to wrap the encoder so that it can be read. This is a modified copy of the DeviceSource example provided by the library, and can be seen in x264EncoderSource.cpp in Appendix B.4. This code is scheduled in the library’s event loop and calls deliverReadyFrame to retrieve the NAL payload produced by the encoder. The event loop learns that a new frame is ready, along with the new address of the NAL payload, when the triggerFrameReceived function (located in live555.cpp in Appendix B.4) is called.

Once the Live555 library can read from the encoder, a dedicated Live555 thread is created using the pthread library. This is necessary because of the library’s design: it uses an event loop to schedule tasks, which would otherwise undermine the concurrency the x264 encoder needs to process frames efficiently. However, the Live555 library is not thread safe, so any updates to it have to be passed through triggers (such as triggerFrameReceived) or global variables. Once the thread has started, spyPanda has to wait for Live555 to complete its setup; this is implemented using pthread_cond_wait and mutex locks, both in main.c (Appendix B.1).
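The start-up handshake can be sketched with POSIX threads as follows. The variable and function names are hypothetical, and the Live555 setup itself is elided as a comment. The predicate flag, checked in a loop around pthread_cond_wait, guards against spurious wake-ups and against the streaming thread finishing its setup before the main thread starts waiting.

```c
#include <pthread.h>

/* Start-up handshake: the main thread blocks until the streaming
 * thread signals that its setup is done. Names are hypothetical. */
static pthread_mutex_t setup_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t setup_done = PTHREAD_COND_INITIALIZER;
static int setup_complete = 0;

static void *streaming_thread(void *arg)
{
    (void)arg;
    /* ... create sockets, RTSP server, sink and source here ... */
    pthread_mutex_lock(&setup_lock);
    setup_complete = 1;
    pthread_cond_broadcast(&setup_done);
    pthread_mutex_unlock(&setup_lock);
    /* ... the library's event loop would run here ... */
    return NULL;
}

/* Called from the main thread: spawn the streaming thread and wait. */
int start_streaming_and_wait(pthread_t *tid)
{
    if (pthread_create(tid, NULL, streaming_thread, NULL) != 0)
        return -1;
    pthread_mutex_lock(&setup_lock);
    while (!setup_complete)   /* loop guards against spurious wake-ups */
        pthread_cond_wait(&setup_done, &setup_lock);
    pthread_mutex_unlock(&setup_lock);
    return 0;
}
```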

The Live555 runtime can then be set up within this separate thread by specifying a succession of configurations for the library to read and stream the encoded video. For spyPanda, an RTSP server has been set up, since RTSP is the most widely supported protocol; the stream itself uses the RTP and RTCP application layers, with the UDP transport layer underneath. The implementation used to set up this RTSP server is a modified copy of the Live555 testH264VideoStreamer.cpp RTSP server, located in the Live555 “testProgs” source directory. This modified code can be seen in live555.cpp in Appendix B.4.


First, the port numbers for the RTP and RTCP servers are specified as 18888 and 18889 respectively. Both ports are then wrapped in Groupsock objects (Live555’s socket-like abstraction) and bound, ready to be used. The RTP Groupsock is linked to an H264VideoRTPSink, which handles the packetisation of H.264 data, and an RTCPInstance is created to provide QoS statistics. The RTCP instance is initialised with an estimate of the session bandwidth, obtained from the wl1271 wireless driver (for the Pandaboard’s wireless chip) through ioctl calls, as demonstrated in iw_stats.c in Appendix B.5.

After these modules are set up, the RTSP server can finally be created, along with a ServerMediaSession to which a subsession (using the RTP sink and its RTCP instance) is added. However, the video source, in this case the encoder, has still not been added; this is where the x264EncoderSource subclass is finally used. During the thread’s setup, a single video frame was captured and encoded, and the memory address of the first NAL unit from this frame was passed through.

The x264EncoderSource subclass is then initialised with this NAL unit address and instantiated as a FramedSource. However, the frames from the encoder source still need to be split before they can be repackaged correctly for the RTP stream. This is the job of the H264VideoStreamFramer class, a filter that breaks the H.264 elementary stream into a form RTP can work with. Once this is initialised, the stream is finally ready to be played, and the Live555 thread broadcasts that its setup is complete, allowing the x264/Video4Linux2 thread to proceed.

Once the stream is playing, the spyPanda control algorithm requires stream statistics from Live555. These are obtained through the H264VideoRTPSink class, which holds a transmission statistics database (TransmissionStatsDB) built from every connected client and its Receiver Reports. In spyPanda, an iterator passes through the database and selects the transmission statistics of the last (most recently connected) client. These statistics contain the standard Receiver Report feedback and are retrieved by the custom functions in the “Quality of Service” section of live555.cpp in Appendix B.4.

5.4 The spyPanda Control Algorithm

The spyPanda control algorithm is essentially a jitter-based system: an optimal jitter is determined, and the control algorithm tries to maintain it. It also monitors the framerate produced by the encoder, and acts on the quality/size of the picture if the encoder’s output framerate drops below 66% of the specified encoder framerate (enc.x264param.i_fps_num).

The control algorithm is simple: keep lowering the quality of the video until the framerate and jitter are back within acceptable limits. If the quality cannot be lowered any further, take more aggressive action and drop to the next least demanding resolution (from the linked list described in Chapter 5.1). This system essentially lowers the quality and compression in order to reduce the load the encoder places on the Pandaboard hardware, and hence increase the bit-rate delivered over the network.

It is implemented using the real-time ratio supplied by the x264 Open Broadcast Encoder (which lowers the video quality when the ratio is below 1.0 and raises it when above), together with the measured average encode time (enc.ave_enc) and the estimated jitter obtained from the TransmissionStatsDB in Live555. Each time a new RTCP packet is received, spyPanda reads the data within it and decides whether to drop the real-time ratio or the resolution. When the jitter or framerate is out of bounds, it drops the real-time ratio in proportion to the previous ratio (similar to the PID algorithm of Equation 2.2), effectively lowering the quality of the video.
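One reading of this proportional update is sketched below in C. The gain KP is a hypothetical value chosen for illustration (the thesis does not state one), and the function also lets the ratio grow by the same factor when there is headroom, capped at an upper limit, as described in the next paragraph.

```c
/* Hypothetical proportional gain: fraction of the current ratio removed
   (or added) per RTCP interval. Not a value taken from spyPanda. */
#define KP 0.1

/* Return the new real-time ratio after one Receiver Report.
   out_of_bounds: nonzero when the jitter or framerate limits are violated.
   ceiling:       upper limit on the ratio (quality). */
double update_ratio(double ratio, int out_of_bounds, double ceiling)
{
    if (out_of_bounds)
        return ratio - KP * ratio;   /* proportional drop: lowers quality */
    ratio += KP * ratio;             /* headroom: raise quality again */
    return ratio > ceiling ? ceiling : ratio;
}
```

With KP = 0.1, five consecutive out-of-bounds reports take the ratio from 1.0 to about 0.59, just under the 0.6 threshold at which Section 6.3 reports a resolution drop.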

If the ratio drops too far, the resolution is finally dropped; the ratio is then reset to 1.0 and the process starts again. Conversely, if the new resolution leaves processing headroom, the ratio increases (raising quality) until the network or encoder can no longer sustain the quality, or until the ratio exceeds an upper limit. Once the upper limit is exceeded, the algorithm attempts to step up to the next higher resolution, but only if that resolution's previously recorded jitter was within acceptable bounds, or if it has not been used before. On every resolution change, the jitter of the resolution being left is saved in the linked list, ready to be checked if required.
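The resolution ladder and its step-down/step-up rules can be sketched as a doubly linked list in C. The structure and function names are hypothetical (the actual list implementation is in Appendix B.5 and is not reproduced here); the 0.6 floor, 2.0 ceiling and ~3000 jitter limit are taken from the results in Section 6.3, and resetting the ratio to 1.0 after a step up is an assumption, since the text only states it for a step down.

```c
#include <stddef.h>

#define RATIO_FLOOR   0.6    /* drop resolution below this ratio (Section 6.3) */
#define RATIO_CEILING 2.0    /* try a higher resolution at this ratio (Section 6.3) */
#define MAX_JITTER    3000.0 /* real-time jitter limit (Section 6.3) */

/* Hypothetical node of the resolution ladder, linked from most to least demanding. */
struct mode {
    unsigned width, height, fps;
    int used;             /* nonzero once this mode has been streamed and left */
    double last_jitter;   /* jitter saved when this mode was last left */
    struct mode *lower;   /* next less demanding mode, or NULL */
    struct mode *higher;  /* next more demanding mode, or NULL */
};

/* Decide which mode to use after the ratio has been updated.
   On any change, the jitter of the mode being left is saved and the ratio reset. */
struct mode *select_mode(struct mode *cur, double *ratio, double jitter)
{
    struct mode *next = cur;

    if (*ratio < RATIO_FLOOR && cur->lower != NULL) {
        next = cur->lower;                   /* aggressive action: drop */
    } else if (*ratio >= RATIO_CEILING && cur->higher != NULL &&
               (!cur->higher->used || cur->higher->last_jitter <= MAX_JITTER)) {
        next = cur->higher;                  /* headroom: try to step up */
    }

    if (next != cur) {
        cur->used = 1;
        cur->last_jitter = jitter;           /* remember how this mode behaved */
        *ratio = 1.0;                        /* restart the ratio search */
    }
    return next;
}
```

In this sketch, a mode whose saved jitter was too high is never retried, so the ratio simply stays at the ceiling; this would be consistent with the ratio capping at about 2.0 in Section 6.3 without the resolution being raised.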

Implementing this control algorithm in spyPanda at first seemed unworkable in practice. Changing the resolution within the encoder raised errors about the stride, which is the number of bytes in one image row including any padding after the visible width, as illustrated in Figure 5.2. In the x264 implementation, however, the stride was equal to the width of the image, so the picture size could not simply be changed on the fly. The same applied to the Video4Linux2 library: the same-sized picture could not be passed in without the stride error occurring again.


Figure 5.2: Illustration displaying stride [11]
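The stride problem can be illustrated with a short C sketch. Each row of an image buffer occupies stride bytes, of which only the first width bytes are pixels; V4L2 reports this value as bytesperline in struct v4l2_pix_format, and x264 takes a plane pointer and an i_stride value for each plane in x264_image_t. The hypothetical helper below copies an 8-bit plane between two buffers whose strides differ. It only removes padding and does not scale the image, which is what libswscale would be needed for.

```c
#include <stddef.h>
#include <string.h>

/* Copy a width x height 8-bit plane between buffers with different strides.
   Each row occupies `stride` bytes, of which only the first `width` are pixels;
   copying row by row leaves the padding behind instead of treating it as image. */
void copy_plane(unsigned char *dst, size_t dst_stride,
                const unsigned char *src, size_t src_stride,
                size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, width);
}
```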

Using the libswscale (Software Scale) library from FFmpeg [10] to scale the larger camera images down for x264 was considered. However, it was deemed too processor intensive, so instead the encoder is completely reinitialised with the new configuration. This also requires the camera to change its resolution dynamically. Initially, the same restart approach was applied to the camera; however, a complete v4l2_close and v4l2_open took a substantial amount of time. Instead, a simple uninit_device and init_device pair (from v4l2_camera.c, Appendix B.2) was implemented to force the camera to change resolution.

This method succeeded in dropping the resolution, but took about 0.5 seconds to restart and increased latency for the first few seconds afterwards. This was due to the catch-up the client needed to play the first few seconds of the stream and flush its buffer, an effect that also occurs at the start of every spyPanda RTSP session.

Chapter 6

Results of spyPanda

The results in this chapter were based on a configuration of spyPanda that uses Instantaneous Decoding Refresh (IDR) frames and starts at a resolution of 640x480 pixels at 30 frames per second (FPS).

This configuration can be run with the command:

./spyPanda -i -r 640x480@30
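The -r argument packs the resolution and framerate into one string. A hypothetical parser for this WIDTHxHEIGHT@FPS format is sketched below in C; spyPanda's own option handling is in main.c (Appendix B.1) and may differ.

```c
#include <stdio.h>

/* Hypothetical container for a "-r" value such as 640x480@30. */
struct video_mode {
    unsigned width, height, fps;
};

/* Parse "WIDTHxHEIGHT@FPS"; returns 0 on success, -1 on malformed input. */
int parse_mode(const char *s, struct video_mode *m)
{
    int consumed = 0;

    /* %n records how many characters were matched, so trailing junk is caught. */
    if (sscanf(s, "%ux%u@%u%n", &m->width, &m->height, &m->fps, &consumed) != 3)
        return -1;
    if (s[consumed] != '\0' || m->width == 0 || m->height == 0 || m->fps == 0)
        return -1;
    return 0;
}
```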

Since the Pandaboard's processing capacity is limited, the results are divided into sections on its encoding performance and on its stream performance.

6.1 Pandaboard Encoded Framerate Results

As can be seen from Figure 6.1, the Pandaboard initially struggles to encode at a rate (FPS) that keeps up with the encoder's specified framerate (Encoder FPS). At ∼11 seconds, spyPanda drops the specified Encoder FPS to 20 frames per second. At this point, the Pandaboard is on the verge of delivering the video at the expected framerate. However, the jitter (shown in Figure 6.3) dictates that the resolution must be dropped further to produce even results (and hence low latency). So at ∼22 seconds, the resolution is dropped from 640x480 pixels at 20 frames per second to the next least demanding mode of 320x240 pixels at 60 frames per second. The framerate is then progressively dropped in the camera and encoder until the jitter finally stabilises at the point that exhibits the least latency.

It should be noted, though, that the reported FPS is the potential framerate that could be encoded at that resolution. The encoder never streams at a framerate higher than the specified one, only equal to or lower.


Figure 6.1: Framerate attained over progressions of dropped resolutions


6.2 Pandaboard Output Bitrate Results

The bitrate also follows a similar progression as the real-time ratio and resolution change. As can be seen in Figure 6.2, the bitrate is kept below the estimated wireless LAN capacity, as indicated by the jitter exhibited by the stream. The encoder output bitrate slowly increases as compression is reduced to relieve the Cortex-A9 processor that is encoding the video.

It can also be seen that, under this control scheme, the Pandaboard's processing capability did not allow the stream to exceed 2000 kbit/s. This means that the encoder's current bitrate would only become a problem at long range, where the link quality is low; even then, the jitter constraint should respond by dropping the resolution further.

The times at which the resolution changes occur are listed in the following table:

Time (ms)    Resolution (width x height @ FPS)
0            640x480@30
11012        640x480@20
22528        320x240@60
25842        320x240@30
29542        320x240@24

Table 6.1: Times of resolution changes in the stream for Figures 6.1, 6.2 and 6.3


Figure 6.2: Bitrate attained over progressions of dropped resolutions


6.3 Jitter and Latency Results

The jitter could be seen to spike at the beginning of each encoder start/restart (that is, each resolution/framerate change) and then settle after an arbitrary time. On the client side, this spike was observed as an increase of more than half a second in the latency between capturing and viewing the image, which persisted until the jitter settled. The time taken to change resolution is also ∼0.5 seconds, during which the client is still expecting frames. The client therefore increases its buffer size (and hence its buffering latency) until the frames resume or a timeout is reached and the stream is cut.

When the video does return, however, this buffering latency remains high, and the new video is played with it added on top of the encoding and streaming latency. This also suggests that jitter can serve as a rough estimate of how much latency a stream has on the client side, which is expected, since jitter measures the variation in packet transit times. The next reasonable step was therefore to stress the Pandaboard gradually until the stream (at the client side) appeared to lag or showed noticeable latency. At this point, the jitter was recorded and used as the maximum jitter a stream may have to be deemed real-time. This threshold was found to be a jitter of ∼3000.
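The jitter carried in RTCP Receiver Reports is the interarrival jitter of RFC 3550 [25]: a running estimate J of the mean deviation of packet transit times, updated for each packet as J = J + (|D| - J)/16, where D is the change in relative transit time between consecutive packets. A C sketch following the RFC's reference algorithm is shown below; the struct and function names are hypothetical. Assuming the ∼3000 figure above is in RTP timestamp units of the 90 kHz clock used for H.264 video, it would correspond to roughly 33 ms.

```c
#include <stdint.h>

/* Per-source state for the RFC 3550 interarrival jitter estimate. */
struct jitter_state {
    int have_transit;   /* 0 until the first packet has been seen */
    int32_t transit;    /* relative transit time of the previous packet */
    double jitter;      /* running estimate, in RTP timestamp units */
};

/* arrival: packet arrival time, already converted to RTP timestamp units
   rtp_ts:  RTP timestamp carried in the packet */
void jitter_update(struct jitter_state *s, uint32_t arrival, uint32_t rtp_ts)
{
    int32_t transit = (int32_t)(arrival - rtp_ts);

    if (s->have_transit) {
        int32_t d = transit - s->transit;
        if (d < 0)
            d = -d;
        s->jitter += ((double)d - s->jitter) / 16.0;   /* J += (|D| - J) / 16 */
    }
    s->transit = transit;
    s->have_transit = 1;
}
```

A stream whose packets all arrive with the same transit time keeps J at zero; a single 160-unit deviation raises it by 160/16 = 10.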

As can be seen, until the change to 320x240 pixels at 24 frames per second, the jitter is far too high to maintain a low-latency stream. The real-time ratio is therefore dropped to reduce the time (and quality) spent encoding each frame. Even with these measures, however, the encoding time remains too high to reduce the latency, so more aggressive action is taken by dropping the resolution and/or framerate whenever the ratio falls below 0.6.

Once the optimal jitter is finally reached, the ratio has room to increase quality. This can be observed from ∼29 seconds, after which the ratio increases until it caps at ∼2.0.


Figure 6.3: Jitter experienced in the stream


6.4 Discussion on Picture Quality

For the sake of a high framerate and low latency, the picture quality does suffer. On average, the Pandaboard drops the resolution to 320x240 pixels just to keep latency low, as can be seen in Figure 6.4.

Figure 6.4: Stable flight of AR.Drone at 320x240 pixels

When high motion is encountered, however, the Constant Rate Factor (CRF) algorithm of the x264 encoder spends fewer bits on the affected frames and removes a large amount of detail, as can be seen in Figure 6.5.

Figure 6.5: Unstable flight of AR.Drone at 320x240 pixels

During unpredictable motion (simultaneous yaw and elevation changes), the quality finally gives way and the picture pixelates beyond recognition, as can be seen in Figure 6.6.


Figure 6.6: High motion flight of AR.Drone at 320x240 pixels

Simple motion did not pose a serious issue for the spyPanda design, since the CRF algorithm is designed to reduce the quality of the video in motion only to the extent that the human eye cannot tell the difference. However, given the pixelation that does occur, it would have been beneficial to include a motion-stabilisation pre-processing algorithm, so that the encoder would not need extra processing power to predict motion and compress further.

The trials of spyPanda supported the view that a high framerate and low latency are far more beneficial than picture quality for a user's ability to control and observe the UAV. For security applications, however, it could be far more beneficial to include an on-board algorithm that raises the quality when suspicious activity is detected. In that case, DSP-accelerated encoding could have greatly improved the picture quality and increased the number of resolutions available to spyPanda.

Chapter 7

Conclusions

7.1 Summary and conclusions

The final product of this thesis, spyPanda, has demonstrated the ability to provide a low-latency, high-framerate H.264 video stream from a UAV to a client. Unfortunately, to meet these requirements, the control algorithm had to reduce the quality of the video so that the Pandaboard's Cortex-A9 processor could keep up with the encoding. However, user tests indicated that a high-framerate, low-latency video stream was far more beneficial for observation of the UAV than a high-quality but noticeably "laggy" (high-latency), low-framerate one.

It was suggested that a DSP-accelerated encoder would have improved the quality of the feed while retaining the high framerate and low latency. The results also indicated that the limiting factor of the platform was the software-based encoding on the Pandaboard's processor.

The spyPanda algorithm also changed the resolution of the feed quickly enough that the client did not consider the RTP stream to have timed out. The method relied on the interarrival jitter calculated from the RTP stream, which proved usable as feedback for optimising a stream to provide low-latency video.

To summarise, the open-source spyPanda platform met the initial goals of the project: providing a reliable, low-latency, high-framerate, adaptive video stream over a standards-compliant application layer (RTP/RTCP/RTSP).


7.2 Possible future work

As discussed, the main area for improvement would be a DSP-optimised encoder. This could be implemented in one of the following ways:

• Purchase a license from Texas Instruments to use their proprietary encoders

• Modify the existing x264 source code to include custom optimisations for the C64x+ line of DSPs

The second option would most likely be chosen, to preserve the open-source nature of spyPanda.

If the DSP optimisations prove ineffective at reducing the pixelation (due to unpredictable motion) in the video stream, a pre-processing motion-stabilisation algorithm could be implemented. This would sit between the Video4Linux2 and x264 encoding layers and act as a "spring and damper" on the motion of the video, in the hope of reducing motion and blur. Alternatively, a floating lens could be used on the platform's camera to passively reduce the effects of extreme motion [1].
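The "spring and damper" idea could be sketched as a second-order filter on the camera's estimated motion. This is a hypothetical illustration, not part of spyPanda: it assumes an earlier stage already estimates a global motion offset (in pixels) for each frame, and the stiffness k and damping c are arbitrary example values. The filter tracks a smoothed camera path; shifting each frame by the returned correction would cancel fast shake while still following slow, deliberate motion.

```c
/* Hypothetical spring-damper smoother for per-frame camera motion. */
struct spring_damper {
    double pos;   /* smoothed camera path (pixels) */
    double vel;   /* rate of change of the smoothed path */
    double k;     /* spring stiffness: how quickly the path follows real motion */
    double c;     /* damping: how strongly oscillation is suppressed */
};

/* measured: global motion offset estimated for the current frame (pixels).
   Returns the correction to apply to the frame (smoothed path minus measured). */
double stabilise_step(struct spring_damper *f, double measured)
{
    double accel = f->k * (measured - f->pos) - f->c * f->vel;

    f->vel += accel;   /* semi-implicit Euler update, one frame per step */
    f->pos += f->vel;
    return f->pos - measured;
}
```

With k = 0.2 and c = 0.6, a steady offset is followed within a few tens of frames, so only changes in motion are smoothed; other values of k and c can make this discrete update oscillate or diverge.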


Appendix A

Program listings

The spyPanda adaptive, real-time H.264 video streaming program provides an open-source alternative to many real-time streaming products.

It uses the well-established x264 encoder (Open Broadcast Encoder variant) and the Live555 RTSP libraries, with a customisable and extendable control algorithm.

Currently, spyPanda uses the Bazaar revision control system and stores its code on Launchpad.

Install Bazaar using:

sudo apt-get install bzr

And get the source code using:

bzr clone lp:~alex-stevens/+junk/spyPanda

The code can also be viewed online at this address:

http://bazaar.launchpad.net/~alex-stevens/+junk/spyPanda/files

The revision that is referenced in this version of the document is revision 72.


Appendix B

Companion disk

Due to the size of the program's source code, it is not reproduced in this document; it can be found online or on the companion disc.

B.1 Main C File

This can be located on the companion disc in:

spyPanda/main.c

B.2 Video4Linux2 Implementation


spyPanda/v4l2_camera.c and spyPanda/v4l2_camera.h

B.3 x264 Implementation


spyPanda/x264_control.c and spyPanda/x264_control.h

B.4 Live555 Implementation


spyPanda/x264EncoderSource.cpp and spyPanda/x264EncoderSource.hh


B.5 Miscellaneous C Implementations

Linked Lists can be located on the companion disc in:

spyPanda/linked_list.c and spyPanda/linked_list.h

Wireless LAN statistics can be located on the companion disc in:

spyPanda/iw_stats.c and spyPanda/iw_stats.h

B.6 Sample Results

The results used in this document can be located under Results/stats.csv. These results were used to create the Framerate, Bitrate and Jitter graphs (Figures 6.1, 6.2 and 6.3 respectively).

B.7 Report LaTeX Source and Items

The source for this document is located in the Report-latex directory on the companion disc.

B.8 This Report

This report can be located in the root directory of this companion disc, and is named 41719882_stevens.pdf


Bibliography

[1] What is optical shift image stabilizer? http://www.canon.com/bctv/faq/optis.html.

[2] RQ-11 Raven. http://www.globalsecurity.org/intell/systems/raven.htm, 2005.

[3] MQ-1 Predator unmanned aerial vehicle. http://www.162fw.ang.af.mil/resources/factsheets/factsheet.asp?id=11932, February 2008.

[4] BeagleBoard-xM product reference. http://beagle.s3.amazonaws.com/design/xM-A/BB_xM_SRM_A2_01.pdf, 2010.

[5] BeagleBoard.org - hardware-xM. http://beagleboard.org/hardware-xM, 2010.

[6] DaVinci™ DM37x video processors. http://focus.tij.co.jp/jp/lit/ml/sprt571/sprt571.pdf, 2010.

[7] OMAP 4 mobile applications platform. http://focus.ti.com/lit/ml/swpt034a/swpt034a.pdf, 2010.

[8] Pandaboard platform specifications. http://www.pandaboard.org/content/platform, 2010.

[9] Pandaboard references - pandaboard. http://pandaboard.org/content/resources/references, 2010.

[10] FFmpeg. http://ffmpeg.org/, 2011.

[11] Image stride. http://msdn.microsoft.com/en-us/library/aa473780%28v=vs.85%29.aspx, 8 September 2011.

[12] Intel Core i7 2600K CPU benchmark. http://www.anandtech.com/bench/Product/287, 2011.


[13] Logitech HD Pro Webcam C910. http://www.logitech.com/en-au/webcam-communications/webcams/devices/6816, 2011.

[14] Open source software for ARM SoCs. http://www.linaro.org/, 2011.

[15] Sgt. 1st Class Michael Guillory. Up, up and away. http://usarmy.vo.llnwd.net/e2/-images/2006/11/22/1024/army.mil-2006-11-22-114612.jpg, November 2006.

[16] Panasonic Corporation. MPEG-4 AVC/H.264 codec technology explanation. http://pro-av.panasonic.net/en/technology/technology.pdf.

[17] Ross Finlayson. Live555 Streaming Media. http://www.live555.com/liveMedia/.

[18] Jason Garrett-Glaser. Announcing ARM support for x264. http://x264dev.multimedia.cx/archives/142, 24 August 2009.

[19] Matthew Gast. O'Reilly Media, Inc., 2nd edition, 25 April 2005.

[20] V. Iverson, J. McVeigh, and B. Reese. Real-time H.264-AVC codec on Intel architectures. In ICIP International Conference on Image Processing, volume 2, pages 757–760, 24-27 October 2004.

[21] Ben Kuchera. Parrot AR.Drone to attack this September, for $300. http://arstechnica.com/gaming/news/2010/06/parrot-ardrone-to-attack-this-september-for-300.ars.

[22] Sunhun Lee and Kwangsue Chung. TCP-friendly rate control scheme based on RTP. In Information Networking. Advances in Data Communications and Wireless Networks, Lecture Notes in Computer Science, volume 3961, pages 660–669, 2006.

[23] Nilay Patel. Venezuelans set new WiFi distance record: 237 miles. http://www.engadget.com/2007/06/19/venezuelans-set-new-wifi-distance-record-237-miles/, June 2007.

[24] Michael H Schimek, Bill Dirk, Hans Verkuil, and Martin Rubli. Video for Linux Two API Specification, volume 0. Bytesex.org.

[25] Henning Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A transport protocol for real-time applications. RFC 3550, Internet Engineering Task Force, 2003.


[26] Naveen Srinivasamurthy, Soyeb Nagori, Girish Murthy, and Satish Kumar. Sub-picture based rate control algorithm for achieving real time encoding and improved video quality for H.264 HD encoder on embedded video SoCs. In 2010 IEEE 4th International Conference on Internet Multimedia Services Architecture and Application, pages 1–6, 15-17 December 2010.

[27] Suman Srinivasan, Haniph Latchman, John Shea, Tan Wong, and Janice McNair. Airborne traffic surveillance systems: video surveillance of highway traffic. In Proceedings of the ACM 2nd International Workshop on Video Surveillance & Sensor Networks, 10-16 October 2004.

[28] symon. Payload of the A.R. Drone. http://www.ardrone-flyers.com/forum/viewtopic.php?f=7&t=38, 5 September 2010.

[29] Ktawut Tappayuthpijarn, Guenther Liebl, Thomas Stockhammer, and Eckehard Steinbach. Adaptive video streaming over a mobile network with TCP-friendly rate control. June 2009.

[30] Javvin Technologies. Chapter RTCP: RTP Control Protocol, page 145. 2nd edition.

[31] Uras Tos and Tolga Ayav. Adaptive RTP rate control method. In 2011 35th IEEE Annual Computer Software and Applications Conference Workshops, 2011.

[32] Hongtao Wang, Yuehui Jin, Wendong Wang, Jian Ma, and Dongmei Zhang. The performance comparison of PRSCTP, TCP and UDP for MPEG-4 multimedia traffic in mobile network. In International Conference on Communication Technology Proceedings, volume 1, pages 403–406, 9-11 April 2003.

[33] Tim Wescott. PID without a PhD. http://igor.chudov.com/manuals/Servo-Tuning/PID-without-a-PhD.pdf, October 2000.

[34] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(6):560–576, July 2003.
