
  • Bandwidth Efficient IPTV Distribution

    – On Error Resilience and Fast Channel Change

    Ulf Jennehag

    Department of Information Technology and Media, Mid Sweden University

    Doctoral Thesis No. 39, Sundsvall, Sweden

    2007

  • Mid Sweden University, Information Technology and Media

    ISBN 978-91-85317-77-6, ISSN 1652-893X
    SE-851 70 Sundsvall, SWEDEN

    Academic thesis which, with the permission of Mid Sweden University, is presented for public examination for the degree of Doctor of Technology on Thursday, 17 January 2008, in L111, Mid Sweden University, Holmgatan 10, Sundsvall.

    © Ulf Jennehag, December 2007. Print: Tryckeriet Mittuniversitetet.

  • To my wife

  • Abstract

    Television is now changing from its traditional distribution forms to being distributed digitally over broadband networks. The recent development of broadband Internet connectivity has made the transition to Internet Protocol Television (IPTV) possible. When changing the distribution technique of an existing service, it is important that the new technique does not make the service worse from the user's point of view. Although a broadband network offers high capacity and excellent performance, there will be occasional packet losses and delays, which can negatively influence the user experience of the delivered broadband service. Since bandwidth is a key constraint for video distribution, there is a strong incentive to find schemes that increase bandwidth utilization, especially when distributing high bandwidth IPTV services. In digital video coding it is common to use predictive coding to remove temporal redundancy in video sequences. This technique greatly increases the coding efficiency but makes the sequence more sensitive to information loss or delay. In addition, the use of predictive coding introduces an inter-frame dependency which can make channel changes significantly slower.

    This thesis addresses two important areas related to bandwidth efficient IPTV distribution, namely error resilience and fast channel change. A method to numerically estimate the decoded objective video quality of scalably coded video is presented and evaluated. The method can be used to estimate the objective video quality of a scalable video transmission system subject to packet loss. The quality gain of temporally scalable video in a priority packet dropping environment is also investigated and quantified. Synchronization Frames for Channel Switching (SFCS) is proposed as a method to code and distribute video with IP multicast, which can be used to efficiently combat packet loss, increase bandwidth utilization, and speed up channel changes. The performance of SFCS is analyzed and bandwidth estimation expressions are formulated; the analytical results are complemented with computer simulations. The results show that SFCS deployed in an IPTV delivery system can significantly lower the bandwidth consumption and speed up the channel change.

  • Sammanfattning

    The traditional forms of television distribution are rapidly being replaced by digital distribution over broadband networks. It is the recent development of broadband networks that has made the launch of new high-bandwidth services, such as IPTV, possible. When distribution techniques are replaced in favour of new ones, it is important that the new techniques do not affect the underlying service negatively. Even though modern broadband networks usually have very high capacity and excellent performance, packets are occasionally lost or delayed. Since bandwidth is a bottleneck for video distribution, there are strong incentives to find more efficient distribution schemes, especially when distributing high-bandwidth IPTV. In digital video coding it is common to use predictive coding to reduce the temporal redundancy frequently found in video sequences. The use of predictive coding increases coding efficiency considerably but makes the coded sequence more sensitive to information loss and delay. Predictive coding also creates dependencies between the pictures in a sequence, which can make channel changes slower.

    The thesis covers two important areas concerning IPTV distribution, namely resilience to packet loss and fast channel changes. A method for numerically predicting the objective video quality of scalable video streams is presented and evaluated. The proposed method can be used to estimate the expected objective video quality in a scalable video transmission system subject to packet loss. The quantitative quality gain of a temporally scalable video signal is investigated in a system where more important layers are prioritized in the event of packet loss. Furthermore, Synchronization Frames for Channel Switching (SFCS) is proposed as a method for distributing digital video with IP multicast. SFCS can be used to better handle packet losses, improve bandwidth utilization, and reduce channel change times. The performance of the method is analyzed through analytical expressions for bandwidth utilization, complemented with computer simulations. The results show that the use of SFCS in an IPTV distribution system can significantly reduce bandwidth consumption and the time required for channel changes.

  • Acknowledgements

    Firstly, I would like to thank my supervisor, Docent Tingting Zhang, and my secondary supervisor, Dr. Stefan Pettersson, without whose support, guidance, and commitment this work would have been impossible. I would also like to thank Professor Björn Pehrson and Professor Youzhi Xu, who made it possible for me to start my Ph.D. studies.

    I would also like to thank Dr. Patrik Österberg, Lic.Eng. Daniel Forsgren, and M.Sc. Hans Eric Sandström for being the best working colleagues and friends one could wish for.

    I would also like to thank Dr. Mikael Gidlund for giving me suggestions for improving this thesis, and for making it possible for me to spend almost half a year employed at Acreo AB in Hudiksvall.

    I would also like to extend my gratitude to all the people outside the university supporting our research in the MUCOM group, especially Karin Nygård-Skalman, who has helped us with funding applications and with extending our network of industrial partners.

    I would also like to thank all my colleagues in the MUCOM group and the IKS department for your company and the interesting discussions in the coffee room.

    A big thank you to my mother, father, and brothers. This work is built on your endless and unconditional support and love.

    Last, and most of all, to my wife Johanna: without your love and support this work would have been impossible. I dedicate this thesis to you.

  • Contents

    Abstract
    Sammanfattning
    Acknowledgements
    List of Papers
    List of Figures
    List of Tables
    Terminology

    1 Introduction
    1.1 Background and Problem Motivation
    1.2 Overall Aim
    1.3 Scope
    1.4 Concrete and Verifiable Goals
    1.4.1 Error Resilience in Video Transmission
    1.4.2 IPTV Fast Channel Change
    1.5 Outline
    1.6 Contributions

    2 Background Theory
    2.1 Digital Images
    2.1.1 Color Representation
    2.1.2 Interlace
    2.1.3 Resolution Formats
    2.2 Video Coding
    2.2.1 Intra Coding
    2.2.2 Inter Coding
    2.2.3 Hybrid Coding
    2.2.4 SI/SP-frames
    2.3 Audio Coding
    2.4 MPEG
    2.4.1 MPEG-1
    2.4.2 MPEG-2
    2.4.3 MPEG-4
    2.4.4 MPEG-4 AVC
    2.5 Packet Switched Networks
    2.5.1 Unicast
    2.5.2 Broadcast
    2.5.3 Multicast
    2.6 Internet Protocol Networks
    2.6.1 Internet Protocol
    2.6.2 Transmission Control Protocol
    2.6.3 User Datagram Protocol
    2.6.4 IP Multicasting
    2.7 Quality of Service
    2.7.1 Differentiated Services
    2.8 Video Quality
    2.8.1 Subjective Video Quality
    2.8.2 Objective Video Quality
    2.9 Multimedia Transport
    2.9.1 Multimedia Streaming
    2.9.2 MPEG-2 Transport Stream
    2.9.3 Real-time Transport Protocol
    2.10 Internet Protocol Television
    2.11 Hierarchical Video Coding
    2.11.1 Temporal Scalability
    2.11.2 Frequency Scalability

    3 Synchronization Frames for Channel Switching
    3.1 Background
    3.2 Synchronization Frames for Channel Switching
    3.3 Bandwidth Estimation

    4 Error Resilience in Video Transmission
    4.1 Scalable Video
    4.2 Synchronization Frames for Channel Switching
    4.3 Contributions

    5 Fast Channel Change
    5.1 Tune-in Channel
    5.2 Edge Server
    5.3 Contributions

    6 Summary
    6.1 Error Resilience in Video Transmission
    6.2 Fast Channel Change
    6.3 Future Work
    6.3.1 Error Resilience Coding
    6.3.2 Fast Channel Change
    6.3.3 IPTV

    Bibliography
    Biography
    Included Papers

  • List of Papers

    This thesis is based mainly on the following papers, herein referred to by their Roman numerals:

    I Tingting Zhang, Ulf Jennehag, and Youshi Xu, Numerical modeling of transmission errors and video quality of MPEG-2. In Signal Processing: Image Communication, Volume 16, Issue 8, Pages 817-825, May 2001.

    II Daniel Forsgren, Ulf Jennehag, and Patrik Österberg, Objective End-to-End QoS Gain from Packet Prioritization and Layering in MPEG-2 Streaming. In Proceedings of the 12th International Packet Video Workshop (PV 2002), Pittsburgh PA, USA, April 2002.

    III Ulf Jennehag and Tingting Zhang, Increasing Bandwidth Utilization in Next Generation IPTV Networks. In Proceedings of the 2004 International Conference on Image Processing (ICIP 2004), Volume 3, Pages 2075-2078, Singapore, October 2004.

    IV Ulf Jennehag, Tingting Zhang, and Stefan Pettersson, Improving Transmission Efficiency in H.264 Based IPTV Systems. In IEEE Transactions on Broadcasting, Volume 53, Issue 1, Pages 69-78, March 2007.

    V Ulf Jennehag and Stefan Pettersson, On Synchronization Frames for Channel Switching in a GOP-based IPTV Environment. In Proceedings of the Fifth IEEE Consumer Communications & Networking Conference (CCNC 2008), Las Vegas NV, USA, January 2008.

  • List of Figures

    2.1 Component sampling in YCbCr formats
    2.2 Interlace using alternating odd/even field update
    2.3 Different common screen resolutions compared
    2.4 Common data structures used in video compression
    2.5 Intra encoding and decoding
    2.6 Displaying previous picture, picture to be coded, and the difference picture
    2.7 Displaying the predicted image, the motion vectors, and the residue image
    2.8 Hybrid encoder
    2.9 Hybrid decoder
    2.10 Switching between streams with SP-frames
    2.11 SI-frame acting as a stream entry point
    2.12 H.264 and MPEG-2 encodings of the SVT Fairytale sequence, 1280x720p 50fps
    2.13 Unicasting packets from one host to other hosts
    2.14 Broadcasting packets from one host to all hosts
    2.15 Multicasting packets from a server to a group of hosts
    2.16 OSI and TCP/IP reference models
    2.17 IP header fields
    2.18 UDP header fields
    2.19 Transport Stream packet and header
    2.20 RTP header fields
    2.21 Temporal scalability
    2.22 Frequency partitioning
    3.1 Rate distortion plots for the 1280x720p 50fps SVT sequence "Fairytale" at different GOP sizes
    3.2 Channel switch for a traditional GOP system
    3.3 Channel switch for an SFCS system
    3.4 The different node and link types in the network

  • List of Tables

    2.1 IPv4 Multicast address space

  • Terminology

    Abbreviations and Acronyms

    AAC      Advanced Audio Coding
    ADC      Analogue-to-Digital Converter
    CABAC    Context-Adaptive Binary Arithmetic Coding
    DCT      Discrete Cosine Transform
    DS       Differentiated Services
    DSL      Digital Subscriber Line
    DSLAM    Digital Subscriber Line Access Multiplexer
    DTS      Decoding Time Stamp
    DVB-T    Digital Video Broadcasting, Terrestrial
    FCC      Fast Channel Change
    HDTV     High Definition Television
    IEC      International Electrotechnical Commission
    IP       Internet Protocol
    IPv4     Internet Protocol version 4
    IPSTB    Internet Protocol Set-Top Box
    IPTV     Internet Protocol Television
    ISMA     Internet Streaming Media Alliance
    ISO      International Organization for Standardization
    live-TV  Traditional channel-based television service, with a fixed timeline
    MPEG     Moving Picture Experts Group
    OSI      Open Systems Interconnection
    PTS      Presentation Time Stamp
    RAP      Random Access Point
    RTP      Real-time Transport Protocol
    SDTV     Standard Definition Television
    TCP      Transmission Control Protocol
    TS       MPEG-2 Transport Stream
    UDP      User Datagram Protocol
    VoD      Video on Demand


    Mathematical Notations

    fps        frames per second
    M          number of available channels
    N          total number of clients
    d          average duration between resynchronization requests
    N_m        number of clients tuned to channel m
    P          size of an inter-frame in bits
    R_RSM      RAP_SYNC rate
    R_PSM      inter-frame rate, SFCS main
    R_RG       RAP_GOP rate
    R_PG       inter-frame rate, GOP
    r_RS       size ratio RAP_SYNC / P
    r_RG       size ratio RAP_GOP / P
    B_GRCL     expected GOP bandwidth in one RCL
    B_GRRL,m   expected GOP RRL bandwidth for one channel m
    B_GRRL     expected GOP RRL bandwidth for all channels
    B_SRCL     expected SFCS bandwidth in one RCL
    B_SRRL,m   expected SFCS RRL bandwidth for one channel m
    B_SRRL     expected SFCS RRL bandwidth for all channels

  • Chapter 1

    Introduction

    Over the last fifteen years there has been continuous development associated with the Internet. More people have gained Internet access as well as making use of the services offered. The increasing number of new users and services has caused a surge in demand for connections with high bandwidth, low delay, and high reliability. These network connections, called broadband connections, have increased greatly in number over the years. The increased capacity in the network allows for new and exciting services. Streaming media is one such group of services that has grown rapidly over the last few years. Internet Protocol Television (IPTV) is a specific example. IPTV denotes a group of multimedia services including live-TV, Video on Demand, and pay-per-view.

    Multimedia services such as IPTV rely heavily on streaming video techniques. For a streaming video service to be at all feasible, it must utilize compression techniques in order to reduce the amount of data being transmitted. Modern compression techniques utilize predictive coding, which makes the stream sensitive to information loss. Since streaming video is a real-time service, it is also sensitive to information being delayed or received out of order.

    Some current deployments of IPTV suffer from slow channel change performance. This is in some part due to the long waiting time for the decoder to find a suitable starting point in the new stream.

    In summary, IPTV distribution requires a high performing network infrastructure; information loss probabilities and network delay variance need to be very low, otherwise the service will not function properly.

    This thesis deals with two of the issues regarding IPTV distribution, namely error resilience and fast channel change.


    1.1 Background and Problem Motivation

    The development of network infrastructure is still increasing and more and more households are gaining broadband access. There is no official figure defining the capacity of a broadband connection, but the common notion of a broadband connection is that it is "always on" and often offered at a flat rate. A broadband connection is considered to include Digital Subscriber Line (DSL) connections but excludes modem and single ISDN connections. Sweden is a good example of a market where there has been a rapid expansion in broadband development. In its annual report "Bredband i Sverige", the Swedish Post and Telecom Agency (Post- och telestyrelsen, PTS) has investigated the number of Swedish households with the means to acquire a broadband connection [1] [2] [3]. Over the years the development has been steady and rising. In its latest report, PTS indicated that over 97% of all Swedish households have the means of acquiring a broadband connection.

    This broadband infrastructure development surely increases the possibility of successfully deploying services with high bandwidth demands. Internet Protocol Television (IPTV) [4] [5] [6] [7] is one such service. IPTV is now gaining popularity very rapidly; Informa Research [8] states that the market will grow by a factor of seven by 2011 based on the 2006 numbers. The notion of IPTV as a rapidly growing industry also holds true in the broadband dense country of Sweden [9]. For example, two major Swedish ISPs, TeliaSonera [10] and Bredbandsbolaget [11], are now promoting IPTV services on their homepages and in broadcast TV commercials.

    Distributing multimedia demands high bandwidth. Increased bandwidth means higher quality or more content distributed over the network. The demand for low bandwidth and high quality drives the media compression industry, which is relentlessly researching new and more complex algorithms. Bandwidth issues will always be an interesting and challenging topic in IPTV research.

    Distributing multimedia also places high demands on the network. Multimedia is sensitive to information loss and excessive delay jitter due to the predictive coding used in modern multimedia compression techniques. For example, information loss in video data manifests itself as blocking artifacts in the image sequence. IPTV is very sensitive to packet loss and requires an almost error-free environment [12] to give excellent service quality.

    Modern packet switched broadband networks often have excellent performance in terms of extremely low packet loss ratio and low delay with low variance [13]. This is, however, not the case for most networks, where occasional packets are still lost or delayed.

    One important contributory factor to information loss in packet switched networks is congestion. As compressed multimedia often relies on predictive coding, it is reasonable to assume that some information packets in the stream can be considered more important than others. When a packet is lost, the impact depends on how important the lost packet was. In a congestion situation, it can be beneficial for the congested routing node to throw away packets of low importance first. If a packet is lost, it is also important to know how the receiver is able to recover from that condition without causing too much overhead in the network. Bearing this in mind, it is possible to argue that it is important to have methods which are able to combat this low frequency packet loss, which is often caused by congestion.

    Another issue in IPTV distribution is the increased channel change time compared to analog television distribution. This is caused by several factors, of which the time spent waiting for a suitable decoder entry point in the stream must be considered one of the biggest. Reducing IPTV channel switching time therefore targets an important area of IPTV research.

    1.2 Overall Aim

    This thesis addresses IPTV distribution over networks. More specifically, it addresses error resilience issues for video transmission in terms of objective video quality and bandwidth consumption. It also targets the performance of channel switching in terms of bandwidth efficiency, tune-in time, and objective video quality.

    1.3 Scope

    By definition, multimedia content contains more than one media type. Audio and video are two types of media that are often combined to represent a course of events. Modern audio compression techniques achieve such good results [14] that the audio signal only occupies a fraction of the bandwidth compared to the video signal. Therefore, the studies presented in this thesis focus on different aspects of broadcast quality video distribution over IP networks. The underlying physical media used for packet transport in the network is out of scope for this thesis.

    1.4 Concrete and Verifiable Goals

    The thesis deals with two different tracks related to IPTV distribution.

    1.4.1 Error Resilience in Video Transmission

    Within the area of error resilience in video transmission, three different subareas have been studied.

    • Due to the predictive coding of video sequences, it is reasonable to assume that some information is more important than other information. Packets containing video data can be classified into layers by the importance of the data residing in them; in this case, the lower layers predict the higher layers. By calculating the effect that a packet loss in one layer has on the others, suitable schemes for error resilience and protection can be constructed.

    • Since the main reason for packet loss is congestion, it makes sense to allow routers to utilize information concerning the importance of the data in one packet in relation to other packets in the video stream. By assigning packets to different classes, the router can, in a state of congestion, choose to throw away packets of lower importance first in order to minimize the effect on the quality of the decoded video.

    • When a video packet is lost it will probably influence the decoded result of other packets due to the use of predictive coding. It is important to stop this error propagation as early as possible. Therefore it is important to investigate video distribution solutions that address this area.

    1.4.2 IPTV Fast Channel Change

    The use of predictive coding and the spacing of key frames (also known as random access points) influence the channel switching time. It is of great interest to reduce the channel switching time without compromising steady-state compression performance. This thesis addresses a tune-in channel approach based on one steady-state stream accompanied by a tune-in stream, distributed on two separate multicast addresses. A channel tune-in is performed by first joining the tune-in stream and receiving tune-in information from it, and later joining the steady-state stream. The information from the tune-in stream is spliced with the steady-state information and is then fed to the decoder.

    • One of the incentives to use fast channel change techniques is to reduce bandwidth consumption. Therefore it is of great interest to investigate methods to estimate the bandwidth consumption caused by fast channel change techniques.

    • The primary purpose of a fast channel change technique is to reduce the tune-in time when changing channels. Tune-in time estimations and the associated bandwidth implications are of great interest.

    • When splicing two streams from different encodings there is a risk of experiencing quality distortions caused by differences in the reference pictures. Methods to reduce this distortion or remove it completely are of great interest.

    1.5 Outline

    In chapter 2 the reader is introduced to the area of multimedia coding and transmission as well as related aspects. Chapter 3 introduces the reader to the concept of the developed side-stream tune-in technique, which is a major part of the thesis. Chapter 4 presents related work in the field of error resilience in video transmission and the contributions by the author in the area. Chapter 5 introduces the reader to the field of fast channel change in IPTV distribution, related work, and the contributions by the author in the field. A summary is given in chapter 6 together with suggestions for future work in the two fields presented.

    1.6 Contributions

    The author's contributions to the area are provided in the included published papers. The main contributions are stated here.

    A study in which the mathematical relationship between packet loss and the decoded video quality is presented and evaluated, in the case of temporally layered MPEG-2 transmitted over a simple network experiencing packet loss. The analytical results are verified using software simulations. The results are presented in paper I. The author was solely responsible for the design and implementation of the simulation environment, including development of the video quality measurement method.

    Paper II includes a study of the objective video quality gain attained by using temporally layered MPEG-2 streaming when applying priority dropping techniques at intermediate router nodes. Different queuing strategies and the influence of packet size are also investigated. The author contributed by participating in the design and construction of the simulation environment and the video quality measure.

    Papers III-V deal with an idea regarding the use of side stream information for faster channel tune-in and error recovery in an IPTV environment. A complete mathematical analysis of the system is presented and the results are confirmed by means of simulation. The method is applied both to traditional encoder variants and to more complex ones using switching frames. The presented method can be used to reduce the channel switching time and to combat error propagation caused by packet loss. In paper V a measure of tune-in quality for the system applied to a group-of-pictures environment is presented. In these publications the author has been responsible for all the results, with some additional assistance from Docent Tingting Zhang with regard to the development of the mathematical bandwidth estimation expressions. Dr. Stefan Pettersson provided assistance in the form of suggestions mainly concerned with the writing of the papers and how to present the information in an understandable manner.

  • Chapter 2

    Background Theory

    This chapter introduces the reader to the fields of image and video representation, video compression, audio compression, multimedia packetization, and the transmission of multimedia over IP networks.

    2.1 Digital Images

    Digital images can be visualized as a collection of color values arranged in a matrix of small pixels. The value at a given pixel position within the matrix represents the color of that pixel. If the pixels are sufficiently small, the human eye will perceive the image as smooth and consistent. Several images in a sequence can be considered a video. Images in a video sequence are often captured at a constant frequency, where each image represents a snapshot of the captured scene. The images in a video sequence are also called frames or pictures.

    2.1.1 Color Representation

    There are several standards for representing colors in digital images and digital video. For example, RGB uses a fixed number of bits for each of the three color channels R (red), G (green), and B (blue); this format is often used in computer applications. YCbCr is a component-based color representation scheme: Y represents the luminance value of the pixel and CB and CR represent the two chrominance components. This component system has several modes, each with a different sampling ratio for the chrominance information. The 4:4:4, 4:2:2, and 4:2:0 sampling ratios and positions are outlined in figure 2.1. With the 4:4:4 sampling ratio, the Y and CB CR information are present for every pixel. With the 4:2:2 sampling ratio, Y is present for every pixel and CB CR information is present for every second pixel in the horizontal direction; this reduces the amount of data by 1/3 compared with 4:4:4. The 4:2:0 sampling ratio further reduces the number of chrominance samples: chrominance information is only available for every fourth pixel, where the chrominance values are calculated as the mean of two pixels, as illustrated in figure 2.1.

    Figure 2.1: Component sampling in YCbCr formats
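    To make the sampling ratios concrete, the following Python sketch converts an RGB picture to YCbCr using the ITU-R BT.601 full-range coefficients and then subsamples the chrominance planes 4:2:0. The function name is hypothetical, and averaging each 2x2 neighbourhood is one common variant of the chroma downsampling; it is an illustration, not any particular standard's reference implementation.

```python
import numpy as np

def rgb_to_ycbcr_420(rgb):
    """Convert an (H, W, 3) RGB image with values in [0, 255] to Y'CbCr
    (ITU-R BT.601 full-range coefficients) and subsample the chrominance
    planes 4:2:0, i.e. one Cb/Cr pair per 2x2 luminance block.
    H and W are assumed to be even."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 128.0
    h, w = y.shape
    # 4:2:0: replace each 2x2 neighbourhood of chroma samples by its mean
    cb420 = cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr420 = cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, cb420, cr420
```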

    2.1.2 Interlace

    Interlace scanning [15][16] is a widely used method to increase the perceived video picture update frequency and reduce the information usage. In general, interlace means that temporal resolution is increased at the cost of vertical resolution. The most common variant of interlace is based on updating either the odd or the even lines, called fields, on each update of the viewing screen, as illustrated in figure 2.2. Two fields make up a picture. To further enhance the viewing experience, the two fields in a picture are sampled at different times. Interlace scanning is often contrasted with progressive scanning, in which all lines are updated every time. Interlaced pictures are used in both analog and digital video transmission.

    Figure 2.2: Interlace using alternating odd/even field update
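    A minimal sketch of the field decomposition described above: a frame is split into its two fields by taking every second line, each field carrying half the vertical resolution. (In real interlaced capture the two fields are sampled at different times; here they are simply sliced from one array for illustration.)

```python
import numpy as np

def split_fields(frame):
    """Split a frame (a 2-D array of lines) into its two fields: the even
    field holds lines 0, 2, 4, ... and the odd field holds lines 1, 3, 5, ...
    Each field has half the vertical resolution of the full picture."""
    even_field = frame[0::2, :]
    odd_field = frame[1::2, :]
    return even_field, odd_field

# Example: a 576-line frame yields two 288-line fields
fields = split_fields(np.zeros((576, 720)))
```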

    2.1.3 Resolution Formats

    A large number of screen resolution variants are associated with television and multimedia streaming formats. These include QCIF, CIF, 576i, 720p, 1080i, and 1080p, where i denotes interlace scanning and p denotes progressive scanning. 576i is the same as PAL resolution and is also referred to as SDTV. The 720p format is called HDTV, or half-HD; in contrast, 1080i and 1080p are called Full-HD. The Full-HD format is specified by the EBU in [17]. The size ratios between the different resolutions are presented in figure 2.3.

    Figure 2.3: Different common screen resolutions compared

    2.2 Video Coding

    All common implementations of video encoders/decoders (codecs) rely on several data structures, illustrated in figure 2.4, to aid in the coding process. A picture is typically divided into blocks, generally of size 8 by 8 pixels. Each block is subject to transform coding. Blocks are often grouped together into macroblocks; a common macroblock constellation is four adjacent blocks, two in the horizontal and two in the vertical direction. Motion estimation/compensation is often performed at the macroblock level. A group of macroblocks makes a slice. The slice structure often provides the means for the decoder to resynchronize with the data stream, which is achieved by inserting a unique bit sequence called a start code. A coded picture consists of one or several slices. A picture can be encoded either as an intra (I) picture, utilizing information within the picture itself, or as an inter (P/B) picture, using information from adjacent pictures in time (temporal prediction). Sections 2.2.1 and 2.2.2 describe intra and inter coding in more detail. Pictures are typically arranged into a Group Of Pictures (GOP) where the first picture in the GOP is intra coded.


    Figure 2.4: Common data structures used in video compression
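    The hierarchy of figure 2.4 can be summarized with a few illustrative data types; the names and fields below are hypothetical and only mirror the block/macroblock/slice/picture/GOP structure described above, not any actual codec's bitstream syntax.

```python
from dataclasses import dataclass
from typing import List, Literal, Tuple

@dataclass
class Block:
    """8x8 pixel block; the unit of transform coding."""
    coeffs: List[int]

@dataclass
class Macroblock:
    """Typically four adjacent blocks (2x2); the unit of motion estimation."""
    blocks: List[Block]
    motion_vector: Tuple[int, int] = (0, 0)

@dataclass
class Slice:
    """Group of macroblocks, preceded by a start code so the decoder can
    resynchronize with the data stream."""
    macroblocks: List[Macroblock]

@dataclass
class Picture:
    """One or more slices, coded intra (I) or inter (P/B)."""
    ptype: Literal["I", "P", "B"]
    slices: List[Slice]

@dataclass
class GOP:
    """Group Of Pictures; the first picture is intra coded."""
    pictures: List[Picture]
```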

    Figure 2.5: Intra encoding and decoding ((a) intra encoder, (b) intra decoder)

    2.2.1 Intra Coding

    Coding of pictures with no reference to other temporal information is called intra coding. Intra coded pictures are often referred to as I-pictures or I-frames. Intra coders utilize the spatial redundancy that exists within the picture to be coded. Figure 2.5 illustrates a simple intra coding process. Image blocks, e.g. 8x8 matrices of pixels, are transformed using a frequency transform, e.g. the Discrete Cosine Transform (DCT). The transform coefficients are quantized and zig-zag scanned (Q), then variable length encoded (VLC). The decoding is the reverse of the encoding, with a variable length decoder (VLD) followed by an inverse quantization function (IQ) and an inverse DCT (IDCT). The Joint Photographic Experts Group (JPEG) [18] has developed a popular and widespread intra coding standard, JPEG, utilizing a block-based variant of this intra encoder/decoder.
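    The following sketch works through the T and Q stages for a single 8x8 block: an orthonormal 2-D DCT, uniform quantization, and a zig-zag scan. The flat quantizer step q stands in for a full quantization matrix, and the variable length coding stage is omitted; this is an illustrative rendering of the pipeline in figure 2.5, not production codec code.

```python
import numpy as np

N = 8
# Orthonormal 8-point DCT-II basis; C @ block @ C.T is the 2-D transform (T)
C = np.array([[np.sqrt((1 if k == 0 else 2) / N)
               * np.cos((2 * n + 1) * k * np.pi / (2 * N))
               for n in range(N)] for k in range(N)])

# Zig-zag scan order: walk the anti-diagonals, alternating direction
ZIGZAG = sorted(((i, j) for i in range(N) for j in range(N)),
                key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else p[1]))

def intra_encode_block(block, q=16):
    """Transform, quantize, and zig-zag scan one 8x8 pixel block."""
    coeffs = C @ (block - 128.0) @ C.T            # level shift, then 2-D DCT
    quantized = np.round(coeffs / q).astype(int)  # Q
    return [quantized[i, j] for i, j in ZIGZAG]   # scan to a 1-D sequence
```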

    2.2.2 Inter Coding

    Since video can be interpreted as a sequence of pictures captured at a predetermined rate, a great deal of information is more or less static between pictures. This redundancy can be removed to further increase the coding efficiency. Figure 2.6 illustrates this by showing the pixel difference of two subsequent pictures (a) and (b). The uniform gray areas in 2.6 (c) correspond to areas with low temporal motion. The coding of picture sequences by removing temporal redundancy is called inter coding.

    Figure 2.6: Previous picture, picture to be coded, and the difference picture ((a) Foreman picture 66, (b) Foreman picture 69, (c) difference picture)

    Inter coding relies on techniques to find matching information in the reference picture and reuse this information in the picture to be coded. This minimizes the amount of information required to describe the new picture, thus increasing the compression ratio. Although the technique may seem tied to such a recent phenomenon as the coding of digital images, the idea of reusing known information in moving pictures is actually quite old. As early as 1929, Ray Davis Kell introduced this idea in a patent [19].

    This idea of reusing known information is implemented in the majority of modern video compression schemes. The technique of finding a suitable part of the reference picture to reuse is referred to as Motion Estimation (ME) [16]. The ME process provides information about where in the reference picture the best match can be found. This directional information is called a motion vector. The reverse process, using the motion vector to copy information from the reference picture to the picture to be reconstructed, is called Motion Compensation (MC). ME/MC is often performed at the macroblock level. In figure 2.7 (a) the motion compensated representation of the picture in figure 2.6 (b) is displayed. This picture is reconstructed using only the previous picture, figure 2.6 (a), and the motion vectors in figure 2.7 (b).

    This approach only requires the motion vectors to be transmitted, which reduces the information needed for the reconstruction of the picture even further. However, the result produced may not be perfect, as illustrated in figure 2.7 (c), which contains the difference between figure 2.6 (b) and figure 2.7 (a). This residue information must also be coded and transmitted in order to reconstruct a good representation of figure 2.6 (b). The residue is coded using techniques similar to those described in section 2.2.1.
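    A minimal full-search motion estimator for one macroblock illustrates the ME step described above: every candidate displacement within a ± search range is tried, and the one minimizing the Sum of Absolute Differences (SAD) is kept together with the residue that must still be coded. The block size, search range, and SAD criterion are illustrative choices, not a description of any specific codec.

```python
import numpy as np

def motion_estimate(ref, cur, top, left, bsize=16, srange=8):
    """Full-search motion estimation for one macroblock at (top, left):
    return the (dy, dx) displacement in the reference picture that
    minimizes the SAD, plus the residue left to transform-code."""
    target = cur[top:top + bsize, left:left + bsize].astype(int)
    best, best_sad = (0, 0), np.inf
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference picture
            sad = np.abs(ref[y:y + bsize, x:x + bsize].astype(int) - target).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    dy, dx = best
    residue = target - ref[top + dy:top + dy + bsize,
                           left + dx:left + dx + bsize].astype(int)
    return best, residue
```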

    Figure 2.7: The predicted image, the motion vectors, and the residue image ((a) predicted picture, (b) motion vectors, (c) difference picture)

    Inter coded frames or pictures are often referred to as P-frames or P-pictures. Pictures acting as a reference for other pictures, i.e. I- and P-pictures, are collectively referred to as reference pictures. If an inter coded picture uses information from both the preceding and following reference pictures, it is referred to as a bi-directionally predictive frame, or more concisely as a B-frame or B-picture. B-pictures are generally not used to predict other pictures.

    One obvious drawback associated with inter coding is the necessity of having a reference picture. Without the reference it is impossible to reconstruct a good representation of the coded picture. Bearing this in mind, reference pictures can be considered more valuable than pictures not used as a reference for other pictures. This must be taken into account when transmitting inter coded frames over media prone to information loss.

    2.2.3 Hybrid Coding

    When intra and inter coding are used in conjunction, the result is called a hybrid coder, and most modern coders are variants of this method. Figure 2.8 illustrates the block scheme of a hybrid encoder; the details are explained in the following example. The first decision of the encoder (C) is the mode in which to code the current picture (side information). If the picture is to be coded as an intra picture, the encoder takes the input block and applies a transform function (T). The transform coefficients are then quantized and zig-zag scanned (Q). The result is variable length coded (VLC) and placed in the output buffer. If the picture is to be inter coded, a best match search by the motion estimator (ME) is performed on the reference picture found in the picture memory (PM) for the block to be encoded. The location of this best match is described by the motion vector. The residue between the input block and the match found is transformed (T), quantized and zig-zag scanned (Q), and variable length encoded (VLC). The encoded coefficients are combined with the motion vector indicating the best match in the output buffer.

    The decoding process can be described as the reverse of the encoding; the hybrid decoder block scheme is shown in figure 2.9. Decoding an intra block is performed by variable length decoding the coefficients (VLD), inverse quantizing them (IQ), and running them through an inverse transform function (IDCT). The resulting image block is copied to the output image. Decoding an inter block is performed by copying (MC) the area from the picture memory (PM) indicated by the motion vector, decoding the residue, and finally adding the two together.


    Figure 2.8: Hybrid encoder

    Figure 2.9: Hybrid decoder
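    The per-block decoder decision of figure 2.9 can be sketched as follows, assuming the same orthonormal DCT and flat quantizer step as in the intra example above; the entropy decoding (VLD) stage is omitted and the function signature is hypothetical.

```python
import numpy as np

N = 8
C = np.array([[np.sqrt((1 if k == 0 else 2) / N)
               * np.cos((2 * n + 1) * k * np.pi / (2 * N))
               for n in range(N)] for k in range(N)])

def decode_block(mode, qcoeffs, picture_memory, top, left, mv=(0, 0), q=16):
    """Hybrid decoding of one 8x8 block: inverse quantize (IQ), inverse
    transform (IDCT), and for inter blocks motion compensate (MC) from
    the picture memory and add the decoded residue."""
    rec = C.T @ (qcoeffs * float(q)) @ C      # IQ, then IDCT
    if mode == "intra":
        return rec + 128.0                    # undo the encoder's level shift
    dy, dx = mv                               # MC: copy the referenced area
    ref = picture_memory[top + dy:top + dy + N, left + dx:left + dx + N]
    return ref + rec                          # prediction plus residue
```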

    2.2.4 SI/SP-frames

    The difference between SP- and P-frames is that SP-frames allow identical frames tobe reconstructed even when they are predicted using different reference frames.

    SI/SP-frames were proposed and introduced by Karczewicz and Kurceren [20], based on the work of Färber and Girod [21]. SI/SP-frames are sometimes referred to as switching frames [22]; this term is, however, not used in the paper by Karczewicz and Kurceren, where the new frame types are simply called SI/SP-frames without any further explanation.

    SP-frames are a variant of P-frames but have the property of being able to reconstruct identical frames from different reference frames. The following example further explains the properties of SP-frames and is illustrated in figure 2.10. Stream A, which only contains inter coded material (P- or B-pictures), is currently being decoded, i.e. P_A. A switch to the content in stream B is then performed, and the switching picture SP_AB(t) is decoded instead of the SP_B(t) picture. The decoded output of the SP_AB(t) picture is exactly the same as the decoded output of the SP_B(t) picture. This enables the next picture, i.e. P_B(t+1), to use either of these two pictures as a basis for its prediction. Hence, a switch from stream A to stream B has been made utilizing the temporal redundancy provided by stream A when decoding stream B. The switching picture must be coded using knowledge of both stream A and stream B.

    Figure 2.10: Switching between streams with SP-frames.

    SI-frames behave in exactly the same manner as SP-frames but do not require a reference frame to be decoded. SI-frames can be used as stream entry points, as illustrated in figure 2.11.

    Figure 2.11: SI-frame acting as a stream entry point.
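    The switching behaviour of figure 2.10 can be summarized as a decode schedule: before the switch instant the current stream's P-pictures are decoded, at the switch instant the secondary switching picture SP_AB is decoded in place of SP_B, and the following pictures come from the new stream. The helper below is purely illustrative bookkeeping, not codec logic.

```python
def decode_sequence(times, switch_at, src="A", dst="B"):
    """List which picture is decoded at each time around a stream switch;
    at switch_at the switching picture SP_{src}{dst} substitutes for
    SP_{dst}, reconstructing an identical frame from src's references."""
    schedule = []
    for t in times:
        if t < switch_at:
            schedule.append(f"P_{src}({t})")        # steady state on stream A
        elif t == switch_at:
            schedule.append(f"SP_{src}{dst}({t})")  # the switching picture
        else:
            schedule.append(f"P_{dst}({t})")        # continue on stream B
    return schedule

# decode_sequence(range(6), switch_at=3)
# -> ['P_A(0)', 'P_A(1)', 'P_A(2)', 'SP_AB(3)', 'P_B(4)', 'P_B(5)']
```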

    SI/SP-frames form part of the extended profile of the H.264 standard but are there referred to as SI/SP-slices, since H.264 does not have an explicit frame/picture structure; for more information see section 2.4.4.


    2.3 Audio Coding

    Single channel audio can be regarded as a one-dimensional time-variant signal. Compression of audio appears rather straightforward at first glance, but the imperfections of the human auditory system open it up to many additional coding techniques [23]. For example, temporal masking describes the property whereby a strong sound preceding a weak sound masks the weaker sound to such an extent that it is often inaudible. Another example is frequency masking: a loud sound at a particular frequency causes weaker sounds at nearby frequencies to be masked. This masking causes the weak sound to become inaudible to the human ear. These properties and several others are utilized in most modern audio compression algorithms.

    2.4 MPEG

    The Moving Picture Experts Group (MPEG) is the commonly known name of the ISO/IEC working group ISO/IEC JTC1/SC29 WG11. MPEG consists of approximately 350 people from both industry and academia and was founded in 1988. MPEG proposes and develops standards for multimedia coding and transmission.

    2.4.1 MPEG-1

    In 1993 MPEG-1 [24] was introduced as an ISO/IEC standard. MPEG-1 specifies audio and video compression with the purpose of enabling delivery of acceptable quality at a combined bitrate of 1.5 Mbps. The complete MPEG-1 standard consists of five parts. Part 1, Systems, covers the issues associated with combining one or more streams into a single stream suitable for retrieval and play-out. The systems layer also provides the timing information required to decode the combined stream. Part 2, Video, specifies the video compression layer; it includes ME/MC techniques as well as support for B-frames, but MPEG-1 does not support interlaced video. Part 3, Audio, specifies a compression scheme for mono and stereo audio signals at various bitrates. MPEG-1 Part 3 includes three encoding/decoding layers, of which layer 1 has low complexity and a low compression ratio. Layer 3, also known as mp3, uses a more complex coding model and thus achieves a better compression ratio at a given bitrate compared to the two other layers. Part 4 specifies compliance testing of software and hardware implementations of part 1-3 encoders/decoders. Part 5 is a software implementation of parts 1-3. Gall summarizes the main features of MPEG-1 in [25].


    2.4.2 MPEG-2

    In 1996 MPEG finalized and standardized a new suite for multimedia coding, MPEG-2. MPEG-2 gained international status when the work was accepted by the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) [26]. MPEG-2 targets standard definition television (SDTV) resolution but also includes high definition television (HDTV) resolution. Coding of interlaced signals was included in MPEG-2, which greatly improves the perceived quality when displayed on a standard television set. The MPEG-2 standard also included updated audio encoding schemes and protocols for packetization and distribution. Haskell et al. have published a book [15] covering the features of the MPEG-2 standard in a comprehensive manner.

    MPEG-2 Part 1, Systems: a comprehensive framework for multiplexing and packetizing multimedia data, suited to a wide range of communication channels and protocols. The MPEG-2 Systems standard text is shared with the International Telecommunication Union (ITU) standard H.222.0 [27].

    Part 2, Video: specifies compression of video signals targeted at SDTV and HDTV resolutions. MPEG-2 Video supports compression of interlaced video such as PAL and NTSC signals. As with Systems, MPEG-2 Video shares its standard text with the ITU standard H.262 [28].

    Part 3, Audio: the audio part of MPEG-2, which supports multi-channel audio coding and is backward compatible with MPEG-1. Part 4, Conformance: as with MPEG-1, this part specifies compliance testing of software and hardware implementations of MPEG-2 part 1-3 encoders/decoders. Part 5, Software: software implementations of parts 1-3. Part 6, Digital Storage Media - Command and Control (DSM-CC): specifies a set of protocols with functions to manage MPEG-1 and MPEG-2 streams. Part 7, NBC Audio: multi-channel audio, not backward compatible with MPEG-1. Part 8, 10-bit Video: support for 10-bit video; this part was discontinued due to a lack of industry interest. Part 9, Real Time Interface: specification of a real-time interface for adapting transport streams to the networks carrying them. Part 10, DSM-CC Conformance: conformance testing specification for part 6.

    2.4.3 MPEG-4

    MPEG-4 [29] was standardized at the beginning of 1999 and is a large framework for multimedia compression and communication. The video coding of MPEG-4 is specified in part two of the standard, which is divided into different profiles, including the simple profile (SP) and the advanced simple profile (ASP). Several compression techniques are described within the framework, e.g. block-based hybrid coding, B-pictures, object-based video coding, and advanced audio coding, to mention but a few. MPEG-4 ASP is also used under the name DivX [30] and has for some years been the dominant method for compressing video to fit on a CD-ROM. MPEG-4 has a similar structure to that of both MPEG-1 and MPEG-2 but consists of approximately twenty different parts covered by the scope of the standard.

    [Figure: rate distortion plots for MPEG-2 and H.264 (1280x720p, 50 fps, GOP = 24); x-axis: Bandwidth [Mbps], 0-25; y-axis: Sequence Y-PSNR [dB], 26-38; the H.264 curve lies above the MPEG-2 curve.]

    Figure 2.12: H.264 and MPEG-2 encodings of the SVT Fairytale sequence, 1280x720p 50fps

    2.4.4 MPEG-4 AVC

    In 2003 a new part of the MPEG-4 standards suite was completed, called MPEG-4 Part 10, Advanced Video Coding (ISO/IEC 14496-10). MPEG-4 AVC is also known as H.264 [31] and was previously known as H.26L. MPEG-4 AVC/H.264 was developed by the Joint Video Team (JVT), consisting of the ITU-T Video Coding Experts Group (VCEG) and MPEG. The text of the MPEG-4 AVC standard is identical to that of H.264. MPEG-4 AVC/H.264 introduces a new set of video coding features which increase the compression efficiency and flexibility compared to previous versions of MPEG-4 and to MPEG-2. H.264 offers a far superior compression ratio compared to MPEG-2. Figure 2.12 presents a comparison of MPEG-2 and H.264 encodings of the SVT high definition progressive sequence "Fairytale" at resolution 1280x720 and 50 frames per second. The open source software x264 [32] was used to encode the H.264 sequences and FFMPEG [33] was used to encode the MPEG-2 sequences. Both encodings use variable bitrate settings for constant quality throughout the sequence. The GOP size was set to 24 and two B-frames are used between P-frames. The reported objective quality is the PSNR of the whole sequence. It is worth noting that for a given quality of PSNR equal to 35.0 dB, H.264 requires approximately 60% of the bandwidth required by MPEG-2. For further reading on H.264, the document by Wiegand et al. [34] is excellent and covers most aspects of the standard; Richardson has also written a book [22] covering the subject which is well worth reading.


    Figure 2.13: Unicasting packets from one host to other hosts.

    2.5 Packet Switched Networks

    Information distribution in a packet switched network is performed by enclosing the information to be distributed in a packet and marking the packet with a source and a destination address. The source and destination of the packets are identified by unique addresses. Packets are distributed through the network by a set of intermediate nodes called routers that route the packets to their destinations.

    This section introduces the reader to the different transmission modes that exist in a packet switched network.

    2.5.1 Unicast

    Unicast transmission deals with one-to-one communication, for example one host sending exclusive information to another host. However, it is also possible to use unicast on occasions when one host wants to communicate the same information to several hosts. Figure 2.13 illustrates this with a server sending the same information to several hosts interested in that particular information. As can be seen in the figure, there is substantial wastage of bandwidth on the shared links, since the information is sent independently from the server to each one of the hosts.

    2.5.2 Broadcast

    Broadcast transmission mode is very useful when there is a requirement to send information to a large number of connected hosts. However, broadcasting information means that the information reaches all connected hosts, including those that have not requested it. It is possible to distribute multimedia in broadcast transmission mode but, as described above, it introduces an additional network load. Figure 2.14 illustrates a multimedia delivery situation using broadcast.


    Figure 2.14: Broadcasting packets from one host to all hosts.

    Figure 2.15: Multicasting packets from a server to a group of hosts.

    2.5.3 Multicast

    Multicast works in a manner similar to broadcast but for a selected number of hosts. A host subscribes to a stream and becomes a member of that multicast group. Information is then broadcast within the group. Membership inquiries to the multicast group are handled by routers.

    Multicast applied in a streaming multimedia application is illustrated in figure 2.15 and the following example. A sender sends information (video) to a multicast address (host group); a receiving host (client) joins the group if the video is requested by the client. If a client no longer wants to receive the video carried by the group, the client leaves the group and the multimedia data is no longer forwarded to that client. One example of an application that can benefit from using multicast is an IPTV service, by assigning each TV channel its own multicast address. A similar approach can be used for an Internet radio application.


    Figure 2.16: OSI and TCP/IP reference models

    2.6 Internet Protocol Networks

    This section covers Internet networking protocols in general in addition to Internet Protocol (IP) multicasting. The Internet protocol stack is somewhat different from the seven-layer Open Systems Interconnection (OSI) model [35]. The two approaches are compared in figure 2.16, where the mapping between them can be studied. This thesis only deals with IP networks; the reference to OSI is merely for information.

    2.6.1 Internet Protocol

    The Internet Protocol [36] is located at the network layer of the TCP/IP reference model. The Internet Protocol ensures that packets arrive at the correct destination and acts as an interface for a wide range of underlying network techniques. The IP version 4 (IPv4) header, illustrated in figure 2.17, contains information about the source host, destination host, header length, and total packet length, as well as other additional information.

    2.6.2 Transmission Control Protocol

    Transmission Control Protocol (TCP) [37] handles connection oriented packet transport and control. The working principle of TCP is well suited to the transport of data where the correctness of the data is prioritized over timely delivery.


    Figure 2.17: IP header fields.

    Figure 2.18: UDP header fields.

    2.6.3 User Datagram Protocol

    Datagram, or connectionless, communication is an efficient method of distributing information with a constrained lifespan. The User Datagram Protocol (UDP) is a connectionless transport protocol, and all IP packets are handled according to the best effort principle within the network. UDP is often used to transport real-time data, in particular streaming video. UDP does not have any error recovery capability, since no back-channel is defined. This means that if a UDP packet is lost, it is up to the higher layers in the protocol stack to detect and resolve the packet loss situation. The UDP header only specifies the source and destination ports, the packet length, and a checksum. To assist in the detection of packet loss or packets out of sequence, the Real-time Transport Protocol (RTP), section 2.9.3, has been developed to complement the underlying UDP/IP protocols.

    2.6.4 IP Multicasting

Internet Protocol (IP) multicasting [38][39] (see Section 2.6.1) specifies how to transport IP datagrams from one host to a group of hosts. This group can consist of zero or more hosts. The group is identified by a particular IP destination address from a dedicated multicast address range. The IPv4 address range dedicated to multicast [40] is presented in table 2.1. Datagrams sent to a multicast group are forwarded to all the member hosts of that group. However, a host is not required to be a member of a host group in order to send datagrams to it. Membership is handled by the Internet Group Management Protocol (IGMP) [41].


    Table 2.1: IPv4 Multicast address space

Start address    End address      Description
224.000.000.000  224.000.000.255  Local Network Control Block
224.000.001.000  224.000.001.255  Internetwork Control Block
224.000.002.000  224.000.255.000  AD-HOC Block
224.001.000.000  224.001.255.255  ST Multicast Groups
224.002.000.000  224.002.255.255  SDP/SAP Block
224.252.000.000  224.255.255.255  DIS Transient Block
225.000.000.000  231.255.255.255  RESERVED
232.000.000.000  232.255.255.255  Source Specific Multicast Block
233.000.000.000  233.255.255.255  GLOP Block
234.000.000.000  238.255.255.255  RESERVED
239.000.000.000  239.255.255.255  Administratively Scoped Block

    2.7 Quality of Service

Quality of Service (QoS) is a very wide concept and is used quite loosely in various situations. In a multimedia distribution context, the QoS term often means that resources are reserved and measures are taken to guarantee that these reservations are honoured during the session. Real-time services such as IPTV are very sensitive to variations in available bandwidth and to excessive delay jitter, which makes QoS guarantees particularly important for such services.

    2.7.1 Differentiated Services

Differentiated Services (DS) [42][43] is a method to implement a scalable solution for service differentiation in an IP network. In practice, the Type of Service field in the IP header (see figure 2.17) is used by an ingress router of the DS-domain to assign the packet to a particular traffic class [44]. The packet is then distributed through the DS-domain, whose routers employ a common DS-policy. Individual routers act on the class information and can treat packets belonging to different classes differently.
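As an illustration of how a sending host can request a traffic class, the short Python sketch below marks outgoing packets with a DiffServ code point. The choice of AF41 and the destination address are illustrative assumptions only; the actual class treatment is decided by the DS-domain policy.

```python
import socket

DSCP_AF41 = 34   # assumed code point for streaming video traffic

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# The DSCP occupies the six most significant bits of the former
# Type of Service octet, hence the shift by two.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_AF41 << 2)

# Datagrams sent on this socket now carry the code point, which an
# ingress router of the DS-domain can map to a traffic class.
sock.sendto(b"video payload", ("192.0.2.1", 5004))
```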

    2.8 Video Quality

Video quality assessment can be performed either using a subjective judgment conducted by a human focus group, or by an objective method where the judgment is based on a predefined algorithm.


    2.8.1 Subjective Video Quality

The international standard ITU-R BT.500-10 [45] specifies a set of methods for setting up subjective tests for the assessment of TV pictures. A large number of parameters are specified, such as the audience size, viewing time, and statistical methods, to mention but a few. Although the standard is not really applicable to digital video in general, it at least provides some information as to how to set up a subjective test. Since subjective methods require a focus group and specialized equipment, they are often too complicated and expensive to use for small investigations under a constrained budget.

    2.8.2 Objective Video Quality

Objective methods are ideal for evaluating the degradation when the uncontaminated source material is available for reference. An objective picture quality assessment method uses the supplied information and a predefined algorithm to derive the metric. The effectiveness of objective methods is often validated against subjective data. Objective methods are often ranked with regard to how much information they require in order to judge a picture and are therefore often classified on this basis. Full reference methods use a complete undistorted version of the picture in order to assess it. Peak Signal to Noise Ratio (PSNR) [46] is a popular and widely used full reference objective metric. The Structural Similarity (SSIM) index is a rather new full reference objective method [47] which can be used as a replacement for PSNR in some applications. No-reference methods use no information other than that actually provided in the picture to be judged. In this case the assessment method is often specialized to assess a specific type of impairment and is thus rather tailored to the application.
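Since PSNR is fully specified by its definition, a minimal sketch is easy to give. The following Python function computes the PSNR between a reference frame and a distorted frame, assuming 8-bit luma samples; the random test frames are purely illustrative.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """Full reference PSNR in dB: 10*log10(peak^2 / MSE) over all pixels."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")   # identical pictures
    return 10.0 * np.log10(peak ** 2 / mse)

# Illustrative use: a luma frame and a noisy copy of it.
ref = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
noisy = np.clip(ref + np.random.normal(0.0, 5.0, ref.shape), 0, 255).astype(np.uint8)
print(f"Y-PSNR: {psnr(ref, noisy):.2f} dB")
```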

    2.9 Multimedia Transport

Transporting multimedia information over a packet switched network presents a series of challenges. A multimedia stream generally consists of several individual streams of audio and video. Each of these individual streams is often referred to as an Elementary Stream (ES), and the ESs are typically related to each other, i.e. they represent audio and video from a common capture instance. In this case the timing of the delivery and presentation of the individual ESs is crucial for the perceived quality of the presentation. For example, consider a multimedia stream consisting of one audio ES and one video ES. These two streams have to be delivered and presented in such a fashion that they are synchronized in order to achieve the best possible quality. If the video gets out of sync with the audio, the overall quality impression of the combined multimedia stream is reduced even though the individual streams are error free. If the sync skew is below ±80 ms then the synchronization between the audio and video may be considered to be good [23].

Multimedia transport is also affected by the transmission channel over which the system transmits the signal.


Information loss caused by congestion is one plausible event that can cause distortions in the decoded video. For example, if the video content being delivered is compressed using predictive coding, there is a risk that a single information loss event influences several frames. The distortion propagates until a random access point is reached; this effect is also known as drift. Large variations in delay are another factor that can severely impact the quality of streamed multimedia transmissions. For example, the data buffer in the receiver can be starved and the decoder can be forced to miss a frame, causing unpleasant jerkiness in the video play-out.

    2.9.1 Multimedia Streaming

The term streaming is often used for the transport of multimedia from one host to another. In general, video can either be streamed or downloaded. If multimedia content is downloaded, it is received in full before the play-out of the multimedia begins. When multimedia is streamed, a portion of the multimedia is forwarded to the receiving host in such a manner that the play-out can begin before the stream has been received in full. The amount of buffering used and the rate of the data received determine the time before the play-out of the multimedia can begin. In the strictest sense, downloading can be considered to be streaming with a buffer which is sufficiently large to hold the entire feature.
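As a back-of-the-envelope illustration of this relationship, the following lines estimate the start-up delay from an assumed initial buffer size and arrival rate; both numbers are illustrative, not measurements from this thesis.

```python
# Play-out can begin once the initial buffer is filled at the incoming
# data rate. Both numbers below are illustrative assumptions.
buffer_bits = 2 * 4_000_000        # buffer two seconds of a 4 Mbps stream
arrival_rate_bps = 5_000_000       # data arrives at 5 Mbps

startup_delay_s = buffer_bits / arrival_rate_bps
print(f"play-out can start after {startup_delay_s:.1f} s")   # 1.6 s
```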

    2.9.2 MPEG-2 Transport Stream

MPEG-2 Transport Stream (TS) is a protocol for packetizing and multiplexing MPEG streams. TS is part of the MPEG-2 Systems standard [27][48].

In practice, the audio ES and video ES are multiplexed together with other data into one TS. The TS is a stream consisting of several fixed length packets. Transport stream packets are 188 octets long, of which four octets comprise a fixed header, as illustrated in figure 2.19. The TS header indicates which ES the payload data belongs to, and contains counters for error resilience and fields to indicate scrambling and header extensions. TS is widely used as a transmission technique as specified by Digital Video Broadcasting (DVB). DVB specifies different ways of transmitting TS over various mediums, including terrestrial (DVB-T), cable (DVB-C), and satellite (DVB-S), to mention the most common.

Since DVB as an organization is heavily linked to the broadcast industry, it is natural that it has some influence on the development of the transmission of digital television over networks [49].
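As a sketch of the fixed four-octet header described above, the following Python function extracts its fields from one 188-octet TS packet. The field layout follows the MPEG-2 Systems standard; the function and its dictionary result are an illustrative interface, not code from this thesis.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Extract the fields of the fixed four-octet MPEG-2 TS header."""
    if len(packet) != 188 or packet[0] != 0x47:   # 0x47 is the sync byte
        raise ValueError("not a valid TS packet")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error":    bool(b1 & 0x80),           # set on uncorrectable errors
        "payload_unit_start": bool(b1 & 0x40),           # a new payload unit starts here
        "pid":                ((b1 & 0x1F) << 8) | b2,   # which ES the payload belongs to
        "scrambling":         (b3 >> 6) & 0x03,          # scrambling indication
        "adaptation_field":   (b3 >> 4) & 0x03,          # header extension indication
        "continuity_counter": b3 & 0x0F,                 # 4-bit counter for loss detection
    }
```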

    2.9.3 Real-time Transport Protocol

The Real-time Transport Protocol (RTP) [50][51] specifies header extensions for UDP packets to enable transport of real-time data, such as multimedia.


Figure 2.19: Transport Stream packet and header.

    Figure 2.20: RTP header fields.

The standard RTP header format is illustrated in figure 2.20. RTP payload formats are specified for several video and audio formats, for example H.264 [52][53]. Since RTP is transmitted using UDP, the means for feedback information about the transmission quality are limited. For this purpose the RTP Control Protocol (RTCP) [50] is defined. The primary function of RTCP is to send feedback information about the quality of the RTP distribution.

Given that the recommendation is to use RTP in MPEG2-TS based IPTV distribution [49], it is somewhat surprising that RTP is not actually used in many cases [54], which makes QoS monitoring impossible.

In general, RTP is very well suited for distributing IPTV services. Using native RTP [54] instead of MPEG2-TS removes unnecessary headers introduced by the MPEG2-TS. Native RTP transmission also allows the streams relating to one program to be split into several streams. For example, a program distributed with MPEG2-TS can consist of one video stream and several audio streams. However, since it is very unlikely that a client would be interested in two simultaneous audio streams, it is a waste of bandwidth to packetize the program in such a way. By using native RTP, all the streams associated with the program can be distributed on separate multicast addresses and received by explicitly joining them. The timestamp field in the RTP header can be used to synchronize the different streams in the program.
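As an illustration of the header fields just discussed, the Python sketch below parses the fixed twelve-octet RTP header: the sequence number supports loss and reordering detection, and the timestamp supports inter-stream synchronization. The function interface is an illustrative assumption, not code from this thesis.

```python
import struct

def parse_rtp_header(datagram: bytes) -> dict:
    """Extract the fields of the fixed twelve-octet RTP header."""
    if len(datagram) < 12:
        raise ValueError("datagram too short for an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", datagram[:12])
    return {
        "version":      b0 >> 6,         # 2 for standard RTP
        "marker":       bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,       # identifies the payload format
        "sequence":     seq,             # detects lost or reordered packets
        "timestamp":    timestamp,       # used to synchronize related streams
        "ssrc":         ssrc,            # identifies the stream source
    }
```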


    2.10 Internet Protocol Television

Internet Protocol Television (IPTV) has in recent years gained popularity among the new broadband services, and it is expected to grow rapidly over the next few years [8].

Although IPTV has been much discussed during the last couple of years, some confusion exists around what is or is not IPTV [55]. Even though it appears to be impossible to define IPTV precisely, the definition can actually be very simple: IPTV is TV distributed over IP. Two services that are often mentioned as being part of the IPTV concept are linear TV distributed by multicast (live-TV) and Video on Demand (VoD).

The live-TV service corresponds to the normal TV service of today; that is, TV programs are bundled together in a channel following a predefined schedule. In current IPTV deployments around the world, the live-TV service is distributed by multicast. In general the content is coded using MPEG-2 and multiplexed into an MPEG-2 TS. It is often not necessary to re-encode the video and audio, since they are often delivered to the IPTV play-out location (head end) by either satellite or terrestrial transmission and are already coded in MPEG-2. However, since satellite and terrestrial digital video transmission techniques often bundle several TV channels together into one TS, it is necessary to re-multiplex the information associated with one TV channel into a new TS. Each TV channel then resides inside its own MPEG-2 TS, is packetized into UDP/IP packets, and is sent to its multicast group.

Video on Demand (VoD) is the other of the two mentioned IPTV services. VoD functions as a video rental store accessible through the TV set. This may sound remarkable, but it is actually quite easily implemented in its more basic forms. Several protocols exist which can assist VoD distribution; examples are RTSP, which provides trick-play functionality, and various content scrambling and conditional access protocols.

IPTV is sometimes confused with Internet-TV [56], which offers low resolution streamed TV content over the Internet, or even with sites such as YouTube [57]. On the border between what can be considered Internet-TV and IPTV are services such as Joost [58], Babelgum [59], and Zattoo [60]. These have a more TV-like user interface and in some cases a more developed and advanced distribution method. However, these services require the user to download and install proprietary software before the service can be used.

A good comparison between Internet-TV and IPTV is available in an EBU report [61], where factors such as video quality, resolution, and bandwidth are listed in order to mark the difference between IPTV and Internet-TV. This thesis uses that definition of IPTV.


    Figure 2.21: Temporal scalability.

    2.11 Hierarchical Video Coding

Since the coding of video is based on the removal of redundant information in the video signal, it can be argued that some parts of the coded information are more important than others. Extending this, it could also be stated that some of the coded information is useless without other information being available. One example is that it is impossible to reconstruct a P-frame without access to its reference frame. Hierarchical video coding [16] makes use of this property by ordering the coded video information according to importance. Hierarchical coding of video signals is of great use when examining methods for video transport over capacity constrained channels. Hierarchical video coding makes it possible to reduce the resource requirements of a coded video signal in a graceful way, by dropping the least important information first.

    2.11.1 Temporal Scalability

Temporal scalability [16] is one method of organizing the coded video data into different hierarchical layers. In general, temporal scalability is easy to achieve since it is possible to partition a pre-encoded bitstream. One example of how temporal scalability can be applied to a GOP video sequence is illustrated in Figure 2.21. In this case the hierarchical structure over one GOP places the I-frame in the first layer. The P-frames, predicted from the I-frame, constitute the second layer. The third layer consists of all the B-frames in the GOP. One drawback of applying temporal scalability in this manner is that removing the least important layer drastically reduces the frame update frequency.
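A minimal Python sketch of this three-layer partitioning is given below; the GOP pattern used in the example is an illustrative assumption.

```python
def assign_temporal_layers(gop: str) -> dict[int, list[int]]:
    """Map frame indices of a GOP to layers: I -> 1, P -> 2, B -> 3."""
    layer_of = {"I": 1, "P": 2, "B": 3}
    layers: dict[int, list[int]] = {1: [], 2: [], 3: []}
    for index, frame_type in enumerate(gop):
        layers[layer_of[frame_type]].append(index)
    return layers

# A 12-frame GOP in display order; dropping layer 3 removes all the
# B-frames and thus reduces the frame update frequency.
print(assign_temporal_layers("IBBPBBPBBPBB"))
# {1: [0], 2: [3, 6, 9], 3: [1, 2, 4, 5, 7, 8, 10, 11]}
```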

    2.11.2 Frequency Scalability

Another approach to achieving scalable video encoding is to assign the frequency related information in the picture to different layers. Frequency scalability [16] is


    Figure 2.22: Frequency partitioning.

achieved by dividing the transform coefficients into different layers. Low frequency components are assigned to a base layer and higher frequency components are assigned to higher layers. An example of a frequency scalability approach is illustrated in figure 2.22. The result of adding layers to the base layer can be thought of as an increase in quality.
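The following Python sketch illustrates one way such a coefficient partition can be expressed for an 8x8 transform block; the layer boundaries are illustrative assumptions, and summing the returned layers reconstructs the original block.

```python
import numpy as np

def frequency_order(n: int = 8) -> list:
    """(row, col) positions of an n x n transform block, ordered so that
    low-frequency coefficients (small row+col) come first."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1], rc[0]))

def split_into_layers(block: np.ndarray, boundaries=(6, 22)) -> list:
    """Assign the lowest-frequency coefficients to the base layer and
    progressively higher frequencies to the enhancement layers."""
    order = frequency_order(block.shape[0])
    cuts = (0, *boundaries, len(order))
    layers = [np.zeros_like(block) for _ in range(len(cuts) - 1)]
    for layer, lo, hi in zip(layers, cuts, cuts[1:]):
        for r, c in order[lo:hi]:
            layer[r, c] = block[r, c]
    return layers   # summing all layers reconstructs the original block
```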

Chapter 3

Synchronization Frames for Channel Switching

This chapter familiarizes the reader with the concept of Synchronization Frames for Channel Switching (SFCS). SFCS was originally designed to work with compressed video, but the principles could be applied to any shared information flow subject to temporal redundancy.

    3.1 Background

For efficient coding of digital video it is essential to exploit the temporal redundancy in the signal. Inter coding, see section 2.2.2, minimizes the temporal redundancy of video sequences and achieves a good compression result. However, inter coding introduces dependencies between consecutive frames in a coded video sequence. This dependency is due to the motion estimation/compensation techniques used; in brief, information in one frame can be reused in other frames. In order to be able to initiate decoding of a stream consisting of inter coded frames, intra coded frames are inserted to act as good decoder starting points. The Group of Pictures (GOP) concept specifies an intra coded frame followed by a number of inter coded frames. GOP sizes are often fixed in a sequence, which provides suitable stream resynchronization points at a fixed frequency. A stream synchronization point is also called a stream Random Access Point (RAP) [62]. Since intra coded frames are by definition larger than their inter coded counterparts, it makes sense to have as few of them as possible. This is illustrated in figure 3.1 by the rate/distortion plots of H.264 encodings of the 1280x720p 50fps SVT sequence "Fairytale", produced with the x264 encoder [32] using different GOP sizes. Fewer intra frames mean better quality or lower bitrates. The only remaining reason for having intra coded frames in a stream is to provide RAPs.

The following two sections will address the basics involved in SFCS and expressions for bandwidth estimation of SFCS distributed over network links.


[Figure: Sequence Y-PSNR [dB] plotted against Bandwidth [Mbps], with one curve per GOP size: GOP=6, GOP=12, GOP=48, GOP=192.]

Figure 3.1: Rate distortion plots for the 1280x720p 50fps SVT sequence "Fairytale" at different GOP sizes.


    3.2 Synchronization Frames for Channel Switching

Synchronization Frames for Channel Switching (SFCS), published in the included papers III-V, is based on the idea that it is redundant to send RAPs at a fixed frequency, as is done in GOP based transmission schemes. SFCS separates the RAPs from the inter coded stream and transmits the two on separate multicast addresses. A channel change is performed by first joining the stream carrying the RAPs. After one RAP has been received, the RAP stream is left and the inter coded stream is joined. The received RAP is spliced with the inter coded stream and forwarded to the decoder.

SFCS is designed to function in a streaming environment where several hosts share a common stream originating from one source. A good example is the live-TV service in the IPTV service suite. In such a service, TV channels are transmitted from one head-end server to the hosts using multicast, with one channel per multicast address. When clients want to switch channels, they merely leave the current channel and join the new one. The following example explains how this is done traditionally and is illustrated in figure 3.2, where the transmitted, received, and decoded frames are shown. The client requests a channel switch from the current channel (1) to channel 2 at time instance 4. The client immediately leaves the multicast group for channel 1 and joins the multicast group for channel 2. The client starts to receive channel 2 data at time instance 5. The client's decoder inspects the data and waits until a RAP appears in the stream.


    Figure 3.2: Channel switch for a traditional GOP system.

    Figure 3.3: Channel switch for a SFCS system.

At time instance 9 the client receives a RAP and can start to decode the stream. The client is now synchronized to channel 2.

The SFCS concept divides the channel into two streams. The steady state stream, called the main stream, consists of inter coded frames only, to provide a good compression ratio. The second stream is the synchronization (sync) stream, consisting only of RAPs. The RAPs in the sync stream are provided at a frequency suitable for the situation. The sync stream is only received around the actual synchronization point and is then left. Both the sync and main streams are distributed by means of multicast. The following example, illustrated in figure 3.3, explains a SFCS channel switch; a sketch of the corresponding client-side logic follows the example. The client is synchronized to the main stream of channel 1 and requests a switch to channel 2 at time instance 4. The client leaves the channel 1 main multicast group and immediately joins the sync multicast group of channel 2. The client receives data on the sync group at time instance 9; after the whole RAP has been received, the channel 2 sync multicast group is left. At the same time the main group of channel 2 is joined. The client splices the received data from the sync stream with the data from the main stream and feeds it to the decoder. The client is now synchronized to channel 2.
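To make the join/leave order and the splice point concrete, the following Python sketch outlines the client-side switch logic. The client, channel, and decoder interfaces are hypothetical stand-ins for a real IP-STB stack, not an API defined in this thesis.

```python
def sfcs_channel_switch(client, old_channel, new_channel):
    """Client-side SFCS switch: fetch one RAP from the sync stream,
    then splice it with the inter coded main stream."""
    # 1. Leave the main stream of the current channel.
    client.leave(old_channel.main_group)

    # 2. Join the sync stream of the new channel and wait until one
    #    complete RAP (synchronization frame) has been received.
    client.join(new_channel.sync_group)
    rap = client.receive_complete_rap(new_channel.sync_group)

    # 3. Leave the sync stream and join the main stream, which
    #    consists of inter coded frames only.
    client.leave(new_channel.sync_group)
    client.join(new_channel.main_group)

    # 4. Splice: the RAP gives the decoder a valid starting point for
    #    the inter coded frames that follow on the main stream.
    client.decoder.feed(rap)
    for frame in client.receive(new_channel.main_group):
        client.decoder.feed(frame)
```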

SFCS can also be used to combat the effects of information loss. Suppose, for example, that the client's decoder loses synchronization with the main stream.


    Figure 3.4: The different node and link types in the network.

The reason could be that a packet was lost and the decoder is unable to decode the stream correctly. The client can then join the sync stream and resynchronize to the stream. This approach is also useful in a packet loss situation caused by congestion: the initial back-off caused by leaving the main stream releases resources and therefore helps resolve the congestion situation.

    3.3 Bandwidth Estimation

This section states bandwidth estimation expressions related to SFCS transmission over multicast enabled networks. There are three types of nodes in the network: Video Server (VS), Router (R), and Client (C). The server is the origin of all streams. Routers route traffic from the server to other routers or to clients. Clients request streams and are the sinks for the streams received. The links between the nodes are categorized into two types: a link which connects one router to another or to the video server is called a Router to Router Link (RRL), and a link connecting a router to a client is called a Router Client Link (RCL). A simple network layout with all the components is displayed in figure 3.4.

It is assumed that $N \gg M$, $R_{RG} + R_{PG} = \mathrm{fps}$, and $R_{PSM} = \mathrm{fps}$. All channels are assumed to have the same RAP/inter-frame size ratio. Clients are assumed to behave equally in terms of issuing synchronization requests ($d$).

The expected GOP bandwidth in one RCL becomes

$$B_{G,RCL} = \left( r_{RG} R_{RG} + R_{PG} \right) P. \qquad (3.1)$$

The GOP bandwidth estimation expression for the RRL for one channel $m$ evaluates to

$$B_{G,RRL,m} = \left( r_{RG} R_{RG} + R_{PG} \right) P. \qquad (3.2)$$

And the GOP bandwidth for all $M$ channels in one RRL, assuming $N \gg M$, becomes

$$B_{G,RRL} = M \left( r_{RG} R_{RG} + R_{PG} \right) P. \qquad (3.3)$$

The SFCS bandwidth estimation for one RCL is

$$B_{S,RCL} = \left( R_{PSM} + r_{RSS} R_{RSS} \phi \right) P. \qquad (3.4)$$


The SFCS bandwidth estimation in a RRL for channel $m$ is

$$B_{S,RRL,m} = \left( R_{PSM} + r_{RSS} R_{RSS} \left( 1 - e^{-\frac{N_m d}{R_{RSS}}} \right) \right) P. \qquad (3.5)$$

For all channels in a RRL the SFCS bandwidth estimation then becomes

$$B_{S,RRL} = \left( M R_{PSM} + r_{RSS} R_{RSS} \sum_{m=1}^{M} \left( 1 - e^{-\frac{N_m d}{R_{RSS}}} \right) \right) P. \qquad (3.6)$$

The bandwidth estimation expressions can be utilized when designing IPTV distribution systems.
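To show how the expressions can be used at design time, the Python sketch below evaluates equations (3.5) and (3.6) for one set of parameters. All numerical values, and the interpretation of $P$ as the average inter coded frame size in bits, are illustrative assumptions and not figures from this thesis.

```python
import math

P     = 20_000   # average inter coded frame size [bits] (assumed)
R_PSM = 50       # main stream frame rate [frames/s]
R_RSS = 2        # sync stream RAP rate [RAPs/s]
r_RSS = 8        # RAP size relative to an inter coded frame
d     = 0.05     # per-client synchronization request rate [1/s]
M     = 100      # number of channels
N_m   = 500      # clients receiving channel m

def sfcs_rrl_channel(n_clients: float) -> float:
    """Eq. (3.5): expected SFCS bandwidth of one channel in a RRL [bit/s]."""
    sync_active = 1.0 - math.exp(-n_clients * d / R_RSS)
    return (R_PSM + r_RSS * R_RSS * sync_active) * P

def sfcs_rrl_total(clients_per_channel) -> float:
    """Eq. (3.6): expected SFCS bandwidth of all channels in a RRL [bit/s]."""
    sync_sum = sum(1.0 - math.exp(-n * d / R_RSS) for n in clients_per_channel)
    return (len(clients_per_channel) * R_PSM + r_RSS * R_RSS * sync_sum) * P

print(f"channel m: {sfcs_rrl_channel(N_m) / 1e6:.2f} Mbps")
print(f"all {M} channels: {sfcs_rrl_total([N_m] * M) / 1e6:.2f} Mbps")
```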

Further details of the SFCS concept applied in different environments can be found in the included papers III-V.

Chapter 4

Error Resilience in Video Transmission

Error resilience in video transmission deals with techniques to recover from, or resist being affected by, information loss. Two aspects of error resilience are addressed in this thesis and are presented in the following two sections.

    4.1 Scalable Video

Information loss is sometimes caused by congested routing nodes. Scalable video coding and transmission are based on the idea that it can be beneficial to discard information of low importance first in order to resolve an information loss situation caused by congestion. Receiver-driven layered multicast [63] is one such application.

To understand the impact that a layered video transmission system has on the quality of the decoded video, it is important to study the error propagation between the layers.

Layered video coding and prioritized packet scheduling are two well known methods that may improve the quality of service level of real-time applications with high bandwidth requirements over packet switched networks. However, it is often difficult to obtain an idea of, and to quantify, the actual gains that may be achievable, especially from an end-to-end perspective.

    4.2 Synchronization Frames for Channel Switching

SFCS (see chapter 3) is a method for a receiving decoder to synchronize to a stream consisting of inter-coded frames only, by using a side stream with RAPs. This method can be used to recover from transmission errors by simply issuing a resynchronization to the channel when an information loss situation occurs.



    4.3 Contributions

The author has published three papers, I, II, and IV, in this field, each addressing a different aspect of error resilience related to video distribution over networks.

In paper I, the author presents a mathematical model for the estimation of video quality for temporally layered MPEG-2 video subject to information loss. The temporally layered MPEG-2 video divides the frames in one GOP into five different layers. The base layer consists of the I-frame in the GOP. The second layer is the first P-frame following the I-frame, and the third layer consists of the P-frame following the one in layer two. Layer four consists of the last P-frame. Finally, layer five includes all the B-frames in the GOP.

The paper uses the number of distorted macroblocks in the decoded frame as a measure of impairments in a picture. Two types of impairments are recognized in the paper: direct impairments, which are the result of information loss in frames belonging to that layer, and indirect impairments, which are the result of using an impaired frame as a prediction for the current frame. A mathematical expression for the number of expected impairments is formulated. Parameters such as the relationship between the sizes of the layers, and how the amount of information loss in one layer affects other layers, are investigated experimentally. The results from the mathematical model are then compared with experimental results and the conformance is found to be good.

Although the investigation is focused on temporally layered MPEG-2, the proposed methods can be applied to any scalable video transmission system with a hierarchical layer structure.

In paper II the author presents experimental results obtained from using temporally layered MPEG-2 video combined with a basic per-layer IP packet prioritization. The goal has been to discover whether a basic scheme is at all useful in combination with this particular source coding method and, if so, how much the objective video quality can be increased during bandwidth constrained periods. The quality is measured in terms of PSNR and the results are compared to the case of equal packet priority. In addition, different packet sizes as well as packet queuing disciplines are used. The conclusion drawn is that using even a relatively simple temporal layering strategy in combination with packet prioritization can quite significantly improve the end-to-end quality of MPEG-2 video, especially in moderately bandwidth constrained situations. Furthermore, packet size and queuing discipline are found to have an impact on decoded video quality. It is interesting to note that the proposed solution shows an improvement in objective quality even for low packet loss ratios.

In paper IV, SFCS is evaluated from an error resilience point of view. In that case the sync stream is used to resynchronize the receiver to the channel, so that the receiver can recover from an information loss situation. Different packet loss ratios are investigated using both the presented analytical model and a software simulation.


The effects on the bandwidth in both the RRL and the RCL are studied. The conformance between the simulated and the analytical results for moderate frame-loss frequencies is more than satisfactory. The results indicate that SFCS can be used as a method to increase error resilience in live-TV distribution in an IPTV deployment.

Chapter 5

Fast Channel Change

In live-TV distribution over IP it is common to use multicast to distribute the individual TV channels to the receivers. When a user of the system issues a channel switch, the receiver simply leaves the current multicast channel and joins another. The time between the switch to the new channel being issued and the new channel appearing on the screen is denoted the channel switch time or the channel change time.

Long channel switching times are generally perceived as annoying. This is confirmed by a study by Kooij et al. [64], which suggests that channel switching times below 0.5 seconds are acceptable, while channel switching times above 0.5 seconds are considered annoying.

Fast Channel Change (FCC) describes techniques enabling rapid changes between encodings of different streams. The following example is a good representation of an IPTV channel switch. The client IP-STB receives the channel change order by means of a user request (the user presses a button on the remote). The current multimedia stream is left by issuing an IGMP leave message to the nearest router. Immediately afterwards, the new stream is joined by issuing an IGMP join message to the nearest router. The client IP-STB now awaits packets from the new stream; depending on the availability of the stream in the router and the multicast routing protocols used, this process can consume a considerable amount of time. However, if the stream is available in the router, the waiting time is considered to be small. When packets arrive at the client IP-STB, it starts by identifying the stream and searching for a suitable stream position at which to start the decoding process. This stream position is recognized as a random access point (RAP). A RAP is often an I-frame preceded by information regarding the stream. Compressed audio streams in general have much more frequent RAPs than compressed video. Depending on the RAP distance in the stream, the waiting time can range from zero to a considerable amount. When the stream is received, the client IP-STB buffers the stream and initializes the hardware. When the IP-STB is initialized, the decoder can start to decode the stream. Depending on the number of bidirectional frames used, there can be an additional buffering time before the picture is displayed. To avoid sensitivity to variations caused by the packet arrival rate (jitter), it is wise to build a de-jitter buffer to combat this condition.


The fill time of the de-jitter buffer depends on several factors but cannot be ignored when estimating the channel change time.
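As a rough illustration, the Python sketch below sums the delay contributions described above into a single channel change estimate. Every value is an illustrative assumption; actual delays depend on the network, the stream, and the IP-STB implementation.

```python
def channel_change_time(
    igmp_join_s: float = 0.05,       # leave/join handling in the nearest router
    rap_wait_s: float = 0.25,        # expected wait for a RAP in the stream
    decoder_init_s: float = 0.10,    # hardware and decoder initialization
    reorder_buffer_s: float = 0.08,  # delay from bidirectional frames
    dejitter_fill_s: float = 0.20,   # fill time of the de-jitter buffer
) -> float:
    """Sum the delay contributions of one IPTV channel switch [s]."""
    return (igmp_join_s + rap_wait_s + decoder_init_s
            + reorder_buffer_s + dejitter_fill_s)

print(f"estimated channel change time: {channel_change_time():.2f} s")
# 0.68 s with these assumptions, above the 0.5 s threshold of Kooij et al.
```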

Fast channel change is a rather wide concept and can include several techniques. How these techniques are implemented depends mostly on the underlying network infrastructure. For example, Cho et al. [65] suggest an adjacent groups join method, in which channels neighboring the current channel are forwarded to the home gateway for more rapid access.