
Master of Science Thesis in Electrical Engineering
Department of Electrical Engineering, Linköping University, 2020

Channel Equalization using Machine Learning for Underwater Acoustic Communications

Martin Allander



    LiTH-ISY-EX--20/5301--SE

Supervisor: Dr. Özlem Tugfe Demir, isy, Linköping University

    Systems Engineer Oskar Axelsson, Saab Dynamics

    Examiner: Associate Professor Emil Björnson, isy, Linköping University

    Division of Communication Systems, Department of Electrical Engineering

    Linköping University, SE-581 83 Linköping, Sweden

    Copyright © 2020 Martin Allander

  • Sammanfattning (Swedish abstract)

    Wireless underwater acoustic communication is a developing field with numerous applications. The underwater acoustic channel is very particular, and its behavior depends heavily on the environment in which communication takes place. Compared with wireless radio communication, the usable bandwidth is much smaller and the Doppler effect is much more pronounced, owing to the slower propagation speed of sound. Literature published in recent years highlights machine-learning-assisted channel estimation and equalization in comparison with traditional signal processing methods. Machine learning can be advantageous to use because designing algorithms for underwater communication is difficult, as general channel models have proven hard to find. This study aims to explore whether machine-learning-assisted channel estimation and equalization can offer improved performance compared with traditional methods. The study considers supervised machine learning with a deep neural network and a recurrent neural network, to see whether the networks can improve performance in terms of the number of bit errors. A channel simulator with environment-specific input is used to study a number of different scenarios. The simulation results serve to identify interesting environments in which to test the networks. The results of the study indicate that, in highly time-varying channels, machine learning can lower the bit error rate if the networks are trained with prior information about the channel. Utilizing machine learning without prior information about the channel resulted in no performance improvement.


  • Abstract

    Wireless underwater acoustic (uwa) communications is a developing field with various applications. The underwater acoustic communication channel is very special and its behavior is environment-dependent. The uwa channel is characterized by low available bandwidth and a severe motion-induced Doppler effect, compared to wireless radio communication. Recent literature suggests that machine learning (ml)-based channel estimation and equalization offer benefits over traditional techniques (such as a decision feedback equalizer) in uwa communications. ml can be advantageous due to the difficulty of designing algorithms for uwa communication, as finding general channel models has proven to be difficult. This study aims to explore whether ml-based channel estimation and equalization, as part of a sophisticated physical layer structure, can offer improved performance. In the study, supervised ml using a deep neural network and a recurrent neural network is utilized to improve the bit error rate. A channel simulator with environment-specific input is used to study a wide range of channels. The simulations are utilized to identify the environments in which ml should be tested. It is shown that, in highly time-varying channels, ml outperforms traditional techniques if trained with prior information about the channel. However, utilizing ml without prior information about the channel yielded no improvement in performance.


  • Acknowledgments

    A big thank you to Oskar Axelsson for hosting me at Saab Dynamics, for giving me the freedom to shape this thesis according to my interests, and for all the support and feedback. At Saab Dynamics, I would also like to thank Per Abramhamsson and Simon Keisala: Per for helping me understand all the complications in modeling sonar propagation, and Simon for all the help with tweaking the neural networks.

    Of course, a final thank you to Emil Björnson and Özlem Tugfe Demir. I appreciate all the feedback on the report and the support.

    Linköping, June 2020
    Martin Allander


  • Contents

    List of Figures

    List of Tables

    Notation

    1 Introduction
      1.1 Motivation
      1.2 Purpose
      1.3 Problem Formulation
      1.4 Limitations
      1.5 Background

    2 Theoretical Background
      2.1 Underwater Channel Characteristics
        2.1.1 Attenuation
        2.1.2 Noise
        2.1.3 Multipath Propagation
        2.1.4 Doppler Effect
        2.1.5 Scattering
      2.2 Channel Models
      2.3 Baseline Physical Layer
        2.3.1 Transmitter
        2.3.2 Receiver
      2.4 Artificial Neural Networks
        2.4.1 Input and Output
        2.4.2 The Neuron
        2.4.3 The Network
        2.4.4 Training
        2.4.5 Design Considerations
        2.4.6 Recurrent Neural Networks
        2.4.7 Long Short-Term Memory Architecture
      2.5 Previous Work
        2.5.1 Machine Learning in Wireless Radio Communication
        2.5.2 Machine Learning in Underwater Acoustic Communication
        2.5.3 Key Takeaways

    3 Method
      3.1 System Model
        3.1.1 Baseline Receiver
        3.1.2 Machine Learning Receiver
        3.1.3 Choice of Parameters
        3.1.4 Bit Error Rate Definition
      3.2 Software Simulation Environment
        3.2.1 Machine Learning Software
        3.2.2 Channel Model
      3.3 Channel Simulation Configuration
        3.3.1 Bathymetry
        3.3.2 Sound Speed Profiles
        3.3.3 Bottom Sediment Types
        3.3.4 Channel Geometry and General Parameters
        3.3.5 Channel Variations
        3.3.6 Noise
      3.4 Channel Simulations
        3.4.1 Time-Variant Filter
        3.4.2 Simulation Loop
      3.5 Artificial Neural Network Structure
        3.5.1 Deep Neural Network
        3.5.2 Long Short-Term Memory Network
      3.6 Artificial Neural Network Experiments
        3.6.1 Training Data Generation
      3.7 Miscellaneous Studies

    4 Results
      4.1 Artificial Neural Network Experiments
        4.1.1 High Bit Error Rate Channels
        4.1.2 Low Bit Error Rate Channels
      4.2 Deployment Strategies
        4.2.1 Low to Moderate Time-Variance
        4.2.2 High Time-Variance
      4.3 Miscellaneous Studies

    5 Discussion
      5.1 The Results
        5.1.1 Artificial Neural Network Structure
        5.1.2 Deployment Strategies
        5.1.3 Miscellaneous Studies
      5.2 Error Sources
        5.2.1 Channel Model
        5.2.2 Machine Learning Software
      5.3 Relation to Other Work
      5.4 Sources
      5.5 The Thesis in a Larger Perspective

    6 Conclusions
      6.1 Evaluation
      6.2 Future Work
      6.3 Final Words

    A Bathymetry Profiles
      A.1 Shallow Scenario
      A.2 Deep Scenario

    B Sound Speed Profiles
      B.1 Shallow Scenario
      B.2 Deep Scenario

    C Channel Simulations

    D Time-varying Channels

    Bibliography

  • List of Figures

    1.1 Picture of an underwater sensor node from Saab Dynamics.

    2.1 Transmission loss as a function of frequency, at 1 km range and with k = 1.7.
    2.2 An illustration of the frss modulation format, compared to traditional dsss, from [32].
    2.3 Illustration of an lstm block and its connections, from [6].

    4.1 Performance comparison of the lstm, dnn and dfe-pll in a high-ber channel.
    4.2 Performance comparison of the lstm, dnn and dfe-pll in a low-ber channel.
    4.3 ber as a function of snr, using three different equalizers. Shallow profile, clay bottom, ssp 2019-03-16.
    4.4 ber as a function of snr, using three different equalizers. Shallow profile, clay bottom, ssp 2019-06-15.
    4.5 ber as a function of snr, using three different equalizers. Shallow profile, clay bottom, ssp 2019-08-21.
    4.6 ber as a function of snr, using two different equalizers. Shallow profile, clay bottom, ssp 2019-01-07.
    4.7 ber as a function of snr, for online and offline-trained lstm in cgn and wgn. Shallow obstacle profile, clay bottom, ssp 2019-03-16.

    A.1 Illustration of the shallow flat bottom profile.
    A.2 Illustration of the shallow slope bottom profile.
    A.3 Illustration of the shallow obstacle bottom profile.
    A.4 Illustration of the deep flat bottom profile.
    A.5 Illustration of the deep slope bottom profile.
    A.6 Illustration of the deep obstacle bottom profile.

    B.1 Sound speed as a function of depth, data from 2019-03-16 04:45 REF M1V1.
    B.2 Sound speed as a function of depth, data from 2019-06-15 14:40 REF M1V1.
    B.3 Sound speed as a function of depth, data from 2019-08-21 21:12 REF M1V1.
    B.4 Sound speed as a function of depth, data from 2019-09-14 05:18 REF M1V1.
    B.5 Sound speed as a function of depth, data from 2019-01-07 13:50 Släggö.
    B.6 Sound speed as a function of depth, data from 2019-03-05 07:23 Släggö.
    B.7 Sound speed as a function of depth, data from 2019-05-06 07:27 Släggö.
    B.8 Sound speed as a function of depth, data from 2019-07-02 08:24 Släggö.
    B.9 Sound speed as a function of depth, data from 2019-11-04 08:45 Släggö.

    C.1 Channel simulation, shallow scenario with sandy bottom type.
    C.2 Channel simulation, shallow scenario with clay bottom type.
    C.3 Channel simulation, deep scenario with sandy bottom type.
    C.4 Channel simulation, deep scenario with clay bottom type.

    D.1 Channel impulse response for deep scenario, slope profile, sand bottom, ssp 2019-03-05.
    D.2 Channel impulse response for deep scenario, slope profile, sand bottom, ssp 2019-05-06.
    D.3 Channel impulse response for deep scenario, obstacle profile, clay bottom, ssp 2019-01-07.
    D.4 Channel impulse response for shallow scenario, obstacle profile, clay bottom, ssp 2019-03-16.
    D.5 Channel impulse response for shallow scenario, obstacle profile, clay bottom, ssp 2019-06-15.
    D.6 Channel impulse response for shallow scenario, obstacle profile, clay bottom, ssp 2019-08-21.

  • List of Tables

    1.1 Comparison of fundamental physical properties for radio communication and uwa communication.

    3.1 Configurable equalizer parameters.
    3.2 Bottom sediment properties for two different kinds of bottom.
    3.3 Channel geometry and general channel/simulation properties.
    3.4 Configurable small-scale settings.
    3.5 Configurable large-scale (L-S) settings.
    3.6 Configurable Doppler effect parameters.
    3.7 dnn layer structure.
    3.8 lstm layer structure.
    3.9 Training options.

  • Notation

    Abbreviations

    Abbreviation   Meaning
    adam           Adaptive moment estimation
    ann            Artificial neural network
    auv            Autonomous underwater vehicle
    ber            Bit error rate
    cgn            Colored Gaussian noise
    dfe            Decision feedback equalizer
    dfe-pll        Decision feedback equalizer with a phase-locked loop
    dnn            Deep neural network
    dsss           Direct sequence spread spectrum
    frss           Frequency repetition spread spectrum
    fsk            Frequency-shift keying
    GPU            Graphical processing unit
    isi            Intersymbol interference
    lstm           Long short-term memory
    mcss           Multi-carrier spread spectrum
    ml             Machine learning
    ofdm           Orthogonal frequency division multiplexing
    pll            Phase-locked loop
    psd            Power spectral density
    qpsk           Quadrature phase-shift keying
    relu           Rectified linear unit
    rls            Recursive least squares
    rnn            Recurrent neural network
    siso           Single input single output
    snr            Signal-to-noise ratio
    ssp            Sound speed profile
    uwa            Underwater acoustic
    wgn            White Gaussian noise
    wssus          Wide-sense stationary uncorrelated scattering

  • 1 Introduction

    1.1 Motivation

    Wireless terrestrial communications is a game-changing and defining technology of the 20th and 21st centuries. Huge efforts have been made by industry and the research community to improve and optimize all aspects of modern wireless communications. Modern wireless communications are based on electromagnetic waves, which travel at the speed of light. Underwater communication is not as explored a territory as its terrestrial radio counterpart, since the applications are not necessarily as broad and mainstream. However, applications do exist, both civilian and military. Examples of civil applications are oceanographic studies, in terms of undersea exploration, environmental monitoring, and disaster prevention. Examples of military applications are communications between submarines, surveillance systems, and mine reconnaissance. In other words, underwater networks have interesting and varied applications. A picture of an underwater sensor node designed by Saab Dynamics is shown in Figure 1.1.

    Electromagnetic and optical waves are not feasible in underwater communications, as they are quickly absorbed by the water. So, to communicate under water at a range longer than a few meters, acoustic waves are utilized, although acoustic waves come with some undesired properties. Table 1.1 presents some metrics, comparing an example of the underwater acoustic (uwa) channel with an example of the common terrestrial radio channel, to give the reader some understanding of the implications of using acoustic waves compared to electromagnetic waves. Data for the radio channel is from the Long Term Evolution standard, more commonly known as 4G [1]. As can be seen in the table, the utilizable bandwidth and propagation speed in uwa communications are on vastly different scales compared to radio communications.

    Figure 1.1: Picture of an underwater sensor node from Saab Dynamics.

    Table 1.1: Comparison of fundamental physical properties for radio communication and uwa communication.

    Property                  Radio channel        uwa channel
    Wave propagation speed    3·10^8 m/s           1500 m/s
    Bandwidth                 20 MHz               5-10 kHz
    Center frequency          700 MHz - 2.7 GHz    5-30 kHz

    Communication networks are often modeled by dividing the stack into different layers; an example is the OSI model [2]. Each layer has different tasks, and the lowest level is the physical layer. The physical layer has the task of transmitting the raw bit-stream over the physical channel [2]. The purpose of the transmitter is to map binary data onto a waveform which can be transmitted over the medium. The medium can, for example, be a radio link or an acoustic link. The physical medium disturbs the transmitted waveform by introducing noise, attenuating the signal, etc. The receiver has the task of picking up the disturbed waveform and decoding the information correctly. The physical layer transmitter and receiver should, therefore, be designed according to the communication channel in which they operate.


    The nodes in an underwater network can be static or moving. A static node can be a sensor gathering data, while a moving node can be an autonomous underwater vehicle (auv). Underwater networks are often built ad hoc, as the application and the environment between nodes vary a lot. For example, a node that collects data from the ocean needs to establish a communication link with a node on the surface. The distance between the nodes can be several kilometers, which poses a significant challenge due to the severe signal attenuation at long distances. Another very different scenario would be two auvs wanting to establish a communication link in the Baltic Sea (around 80 meters deep). Here, difficulties do not arise from the vast distance, but instead from echoes caused by reflections of the sound off the surrounding surfaces. The environment in the underwater channel between the nodes is also subject to constant change due to factors such as marine wildlife, ocean vessels, and tides.

    With the very different propagation environments described above, designing a physical layer that can handle very different and difficult conditions is a challenge. The literature contains many suggestions for transmitter/receiver structures for uwa communications, with many variations. In modern wireless radio communication standards such as 4G and 5G, orthogonal frequency division multiplexing (ofdm) is a commonly used method. In ofdm, the available frequency band is divided into many sub-bands, each behaving similarly to an additive white Gaussian noise (wgn) channel. ofdm reduces modem complexity and enables high data rates. ofdm is applicable in uwa communication and can be beneficial. However, ofdm does not always provide the optimal solution in uwa communication, as shown in [31]. The preferred physical layer design depends on whether the application is long- or short-range, high or low signal-to-noise ratio (snr), shallow or deep. Therefore, more complicated physical layer designs for uwa exist and can provide benefits over ofdm: designs where the specifics of the uwa channel are taken into consideration. Developing physical layers for uwa communication is non-trivial, and with the different possible environments, uniformly good performance is not easily guaranteed. It is all worsened by the fact that testing is expensive and time-consuming.

    uwa has some distinguishing characteristics, explained in Section 2.1, but modeling the uwa channel is non-trivial, as detailed in Section 2.2. Finding general models has proven to be difficult, which leads to a model deficit. Physical layer designs are therefore often sub-optimal and unable to perform well in all conditions. In this case, machine learning (ml) is a good candidate to combat the model deficit. Recent literature highlights ml as an alternative or complementary approach to classical signal processing, to improve and generalize performance of physical layer algorithms. ml has also gained popularity in other fields such as speech recognition, image processing, etc. It is therefore of great interest to study if ml algorithms can be utilized to improve the performance of a communication system when the underwater channel introduces difficulties. The desirable property of an ml-based system is that it can be generalized, performing well in multiple circumstances, if trained correctly.


    1.2 Purpose

    Researchers in cooperation with Saab Dynamics have suggested a physical layer (transmitter and receiver) protocol based on the frequency repetition spread spectrum (frss). The protocol is motivated by reliability and performance at low snr, where it outperforms ofdm [31]. The existing frss transmitter and receiver will be used as a baseline for the project. The purpose is then to investigate if the channel estimation and equalization utilized by the frss receiver, based on a decision feedback equalizer with a phase-locked loop (dfe-pll), can be improved by utilizing ml. Channel estimation and equalization are considered among the most difficult tasks in uwa communication, due to the sparse time-varying multipath propagation. The purpose of the study is to explore whether a receiver based on ml methods like deep neural networks (dnns) or recurrent neural networks (rnns) can offer improved performance. The performance will be studied in terms of coded bit error rate (ber) as a function of the snr, when the baseline dfe-pll is compared to the developed ml-based receiver.

    1.3 Problem Formulation

    The thesis aims to study

    1. in which environments dnn or rnn-based channel estimation and channel equalization can improve performance compared to dfe-pll;

    2. why it can offer improved performance and how much the performance can be improved;

    3. and possibilities to develop a solution where the rnn or dnn has no prior knowledge of the deployment scenario, either an online-training or a data-driven approach.

    Various channel simulations will be performed to identify environments where the performance can be improved, which will be a sizeable part of the work.

    1.4 Limitations

    To limit the scope of the thesis, several limitations are set throughout the thesis. Here, some of the initial assumptions are described, but more limitations are set throughout the thesis as theory and models are introduced.

    The distance between transmitter and receiver is assumed to be 1000 meters. It was also decided that the underwater environments should resemble the conditions in Swedish waters. Swedish waters, mainly the Baltic Sea and the waters in Skagerrak and Kattegatt, are shallow, so the intention is to study shallow-water communications. Two depths, a shallow case (18 meters) and a deep case (72 meters), are studied to represent different environments. Note that a depth of 72 meters is still considered shallow-water communications in the field at large. It is assumed that the communication nodes utilize hydrophones in a single input single output (siso) setup, i.e., transmitter and receiver each use only one hydrophone. The beam pattern is assumed to be omnidirectional. The hardware used at Saab Dynamics does not provide a truly omnidirectional pattern, but this assumption reduces the complexity and is common in the literature. Transmitter and receiver locations are assumed to be static, only drifting slightly in the environment. Further limitations introduced in the thesis are intended to replicate a realistic simulation.

    The aspect of computational complexity and hardware limitations of uwa com-munication modems will be disregarded.

    1.5 Background

    This thesis is written in cooperation with Saab Dynamics in Linköping. Saab Dynamics is a subsidiary of Saab AB. Saab AB provides high-technology solutions within military defense, civil defense, and aerospace. Saab Dynamics, in turn, supplies a wide range of products, such as torpedoes, missiles, ground combat equipment, and various naval solutions. The naval solutions include auvs, remotely-operated vehicles, underwater networks, and torpedoes (a mix of civil and military products). For example, remotely-operated vehicles can perform repair missions on underwater pipelines or oceanographic studies.

    Thus, underwater wireless communications is an area of great interest for Saab Dynamics, as it can allow these products to become wireless and autonomous. Saab Dynamics works with many partners to promote and participate in uwa wireless communications research. As an active participant in the research community and involved in product development, Saab Dynamics has an interest in uwa channel modeling and signal processing.

  • 2 Theoretical Background

    This chapter describes the fundamental theory important to the study. First, an introduction to the underwater channel is given in Section 2.1, and channel models are discussed in Section 2.2. Based on the theoretical knowledge about uwa communications, the baseline frss physical layer is described in Section 2.3. A section is devoted to describing the basics of artificial neural networks (anns) and the dnn and rnn types studied in this thesis. The chapter ends with a study of related work in communications.

    2.1 Underwater Channel Characteristics

    Due to the usage of acoustic signals, the signals transmitted in uwa are inherently wideband [25], i.e., the bandwidth of the signal B is large relative to the carrier frequency fc. As highlighted in Table 1.1, the carrier frequency is substantially lower compared to electromagnetic waves, making the available bandwidth lower. The low bandwidth utilized also means that the supported data rates are quite low, as the capacity of the channel increases with the available bandwidth [29]. The intuition behind this result is that more available bandwidth means that more information can be loaded into one transmission. In radio communications, assumptions are often based on the fact that fc ≫ B; this is not viable in the uwa channel [25]. To get a further understanding of the difficulties of the acoustic channel, some crucial aspects are discussed below in separate sections.
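    The bandwidth argument above can be made concrete with the Shannon capacity formula, C = B log2(1 + SNR), for an additive white Gaussian noise channel [29]. The sketch below is only illustrative: the bandwidths are the example figures from Table 1.1, and the 10 dB snr is an arbitrary assumed operating point, not a value taken from the thesis.

```python
import math

def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """AWGN channel capacity C = B * log2(1 + SNR), in bits per second."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Illustrative comparison at an assumed 10 dB SNR, using the Table 1.1 figures:
radio_bps = shannon_capacity_bps(20e6, 10.0)  # 4G-like 20 MHz channel
uwa_bps = shannon_capacity_bps(10e3, 10.0)    # 10 kHz uwa channel
# At equal SNR the capacity ratio is exactly the 2000x bandwidth ratio.
```

    At equal snr, the radio channel's capacity advantage comes purely from its 2000-times-larger bandwidth, which is the point the paragraph above makes.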

    2.1.1 Attenuation

    The amount of power captured at the receiver is a determining factor in whether we can extract any data from the signal, or whether we just receive noise. Thus, understanding the attenuation of a signal through a communication medium is crucial in all kinds of communication systems. In the uwa channel, the signal attenuation is frequency-dependent. The attenuation, i.e., the path loss A(l, f), can be described according to [25] as:

    $$A(l, f) = \left(\frac{l}{l_r}\right)^{k} \alpha(f)^{\,l - l_r}, \qquad (2.1)$$

    where l is the propagation distance relative to a reference distance l_r, f is the frequency of the signal, and α(f) is the absorption coefficient, which increases with increasing frequency. The exponent k, i.e., the path loss exponent, models the spreading factor in the water and lies between 1 and 2. The form of α(f) implies that low-frequency components of a signal transmitted through the water will be received with higher power than the high-frequency components. The frequency-dependent absorption coefficient can be described by Thorp's empirical formula [16] in dB/km:

    $$10 \log \alpha(f) = \frac{0.11 f^2}{1 + f^2} + \frac{44 f^2}{4100 + f^2} + 2.75 \cdot 10^{-4} f^2 + 0.003, \qquad (2.2)$$

    where f is in kHz. The frequency-dependent transmission loss is visualized in Figure 2.1 for a typical bandwidth of 5-10 kHz.

    Figure 2.1: Transmission loss as a function of frequency, at 1 km range and with k = 1.7.
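    Equations (2.1) and (2.2) can be turned into a short numerical sketch reproducing the kind of curve shown in Figure 2.1. This is an illustrative reimplementation, not the simulation code used in the thesis; the 1 km range and k = 1.7 are taken from the figure caption.

```python
import math

def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical formula, Eq. (2.2): 10*log(alpha(f)) in dB/km, f in kHz."""
    f2 = f_khz ** 2
    return (0.11 * f2 / (1.0 + f2)
            + 44.0 * f2 / (4100.0 + f2)
            + 2.75e-4 * f2
            + 0.003)

def transmission_loss_db(l_m: float, f_khz: float, k: float = 1.7,
                         l_ref_m: float = 1.0) -> float:
    """Path loss 10*log(A(l, f)) in dB, Eq. (2.1): a spreading term
    k*10*log10(l/l_r) plus absorption accumulated over the (l - l_r) distance."""
    spreading_db = 10.0 * k * math.log10(l_m / l_ref_m)
    absorption_db = thorp_absorption_db_per_km(f_khz) * (l_m - l_ref_m) / 1000.0
    return spreading_db + absorption_db

# Loss over the 5-10 kHz band of interest, at 1 km range and k = 1.7:
losses = {f: transmission_loss_db(1000.0, f) for f in (5.0, 7.5, 10.0)}
```

    The spreading term dominates at these ranges (51 dB at 1 km for k = 1.7), while the absorption term adds the frequency-dependent tilt seen in the figure.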

    The choice of the spreading factor is determined by the physical properties of the channel. A spreading factor of k = 2 corresponds to spherical spreading, where the transmission loss increases with the square of the range [30, p. 101] and the power is spread over the surface of a sphere. The choice k ≈ 2 is appropriate for deep-ocean communications, where the sound waves propagate through the ocean with few obstacles. A spreading factor of k = 1 corresponds to cylindrical spreading [30, p. 102]. Cylindrical spreading occurs when the sound does not propagate freely horizontally or vertically, for example, when the vertical propagation is limited by the seafloor and the surface. Cylindrical spreading can occur both at moderate and long ranges [30, p. 102], where the sound is trapped between the seafloor and surface.

The reality in most scenarios is somewhere in between cylindrical and spherical spreading. With our interest in shallow-water communication, a choice of k = 2 is not realistic as the sound does not propagate freely in the medium. A choice of k = 1 is not realistic as the trapped sound at a depth of 18 or 72 meters still suffers from attenuation when reflected on the surfaces, and the bending due to sound speed variations yields an inhomogeneous propagation.

    2.1.2 Noise

All communication channels are subject to disturbing noise. A common assumption is to study the performance of a communication system in the presence of ambient wgn, where the white color describes the power spectral density (psd) of the noise being constant in the frequency range of interest and Gaussian describes the probability density function of the noise. In the uwa channel, the ambient noise may be modeled as Gaussian [25], although no specific motivation is provided. Some of the most important articles cited in this thesis assume the noise is Gaussian, but without further references or motivation [23, 26, 32]. The noise psd in uwa communication is in fact colored (frequency-dependent), similar to the path loss [30, p. 206]. Both [25] and [30, p. 210] suggest that the noise psd decreases at approximately 18 dB/decade with increasing frequency.

In the water, there exists a lot of interference, such as ocean wildlife and ship noise. These noise sources can differ a lot depending on the environment, for example in a harbor or in the middle of the sea. Attempts have been made to model specific interference sources, such as shipping lanes [3] and shrimp clicking [5]. Due to the unpredictability of some of these interference sources, they can potentially disrupt even the most optimal receivers, designed under assumptions of colored or white Gaussian noise.

    2.1.3 Multipath Propagation

The environment between the receiver and the transmitter is not obstacle-free; therefore, the acoustic signal is reflected while propagating in the environment. Thus the receiver can pick up multiple delayed instances (from multiple paths) of the originally transmitted signal, as reflective components are received. Multiple delayed instances of signals at the receiver cause intersymbol interference (isi). isi is an issue that must be dealt with by the receiver and, if not, can disrupt any attempts at communication. Multipath propagation is commonly modeled as a tapped delay line impulse response [24], where the tap gains are modeled by stochastic processes. In standard terrestrial wireless communications, multipath components can be numerous. In the uwa channel, the number of propagation paths is not necessarily as large. The issue is rather that the isi can last several symbol intervals [26], due to the speed of sound, as sound travels at a much slower speed in water compared to electromagnetic waves.

    Time-Varying Multipath Propagation

It is important to notice that the multipath propagation properties shift slowly over time. The time-varying multipath channel is described by a time-variant filter h(τ; t), as described in [2, p. 132]. The variable τ corresponds to the delays in the impulse response, i.e., the filter taps. The variable t describes how h(τ; t) varies with time. When the impulse response varies faster with respect to τ in comparison to the variable t, the filter can be considered a sequence of time-invariant filters [2, p. 132]. The output of the filter is given by the convolution:

y(t) = ∫_{−∞}^{∞} h(τ; t) x(t − τ) dτ, (2.3)

    where x(t) is the input signal to the system.
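A discrete-time sketch of equation (2.3) makes the "sequence of time-invariant filters" view concrete: at each output time t, the current row of a tap matrix acts as an ordinary FIR filter. The function name and the matrix layout are ours.

```python
import numpy as np

def time_varying_channel(x, taps):
    """Discrete-time version of eq. (2.3): taps[t, tau] holds the impulse
    response h(tau; t) valid at output time t, so the channel behaves as a
    sequence of (slowly changing) FIR filters."""
    n_out, n_taps = taps.shape
    # zero-pad so that x[t - tau] is defined for t - tau < 0
    x_pad = np.concatenate([np.zeros(n_taps - 1), np.asarray(x, dtype=float)])
    y = np.zeros(n_out)
    for t in range(n_out):
        for tau in range(n_taps):
            y[t] += taps[t, tau] * x_pad[t - tau + n_taps - 1]
    return y
```

With a single unit tap the channel is transparent; putting the unit gain on the second tap delays the signal by one sample, illustrating how delayed instances of the transmitted signal arise.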

    Environmental Variations

The behavior of multipath propagation is determined by the appearance of the physical channel [24]. In shallow waters, the channel impulse response is determined by reflections on the surface and bottom, as well as other objects and the direct path [24]. The appearance of the bottom is called bathymetry and can be compared to topography on land. The bathymetry between two communication nodes depends on where the nodes are deployed. Hence, there is no single bathymetry valid for all communications. The importance of having an understanding of the bathymetry for communications will be highlighted in Section 3.3.1. The surface properties between the communication nodes are not static either. Due to tides, waves, and other phenomena, the surface is subject to variations over time, compared to the bathymetry, which can be considered rather static once known. Attempts at modeling the behavior of the surface will be rather limited in this thesis, but the impact of surface variations should be mentioned. Entire studies have been devoted to modeling the impact of waves on communications and concluded that it is significant [15]. Even details such as air bubbles due to crashing waves affect communications.

    Sound Speed Variations

The speed of sound in water varies with depth [30, p. 111], due to varying levels of temperature, pressure, salinity, etc. This is important to consider as the sound waves do not propagate homogeneously in the water due to this trait. Sound waves are bent, which causes further unpredictability. The sound speed profile (ssp) can vary largely with the seasons. In the spring, cold water from rivers cools down the surface water, while the deeper ocean water keeps a stable temperature, causing a "knee" in the sound speed profile; Figure B.5 illustrates this well. Similar behaviors can be observed in the summer when water close to the surface is heated up. This affects the geometry of the multipath propagation [24].

    2.1.4 Doppler Effect

The Doppler effect is the frequency shift in the observed signal due to the movement of the transmitter or receiver relative to the path traveled by the signal. The Doppler effect is present in all non-stationary communication channels, but as in all previous sections, it is more severe in the underwater case. The magnitude of the Doppler effect is proportional to a = v/c, where c is the propagation speed of the signal and v is the velocity of the moving transmitter/receiver. To give some insight, c ≈ 1500 m/s for underwater sound, while radio waves travel at the speed of light, c ≈ 3 · 10^8 m/s. Hence, for a particular object velocity v, the Doppler effect would be higher by a factor of 200 000 for the uwa channel. As [25] points out, there are few comparable scenarios in radio communications; only in low-orbit satellite communications are similar Doppler effects introduced. The Doppler effect is introduced when we consider a moving transmitter or receiver such as an auv, but even without intentional motion, underwater nodes are subject to drifting with waves, tides, and currents [25]. So the Doppler effect cannot be disregarded even for a non-moving setup.
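The factor of 200 000 can be checked with a few lines; the example velocity of 1.5 m/s is our illustrative choice of a slowly drifting node.

```python
SPEED_OF_SOUND = 1500.0  # m/s, underwater acoustics
SPEED_OF_LIGHT = 3e8     # m/s, radio waves

def doppler_factor(v: float, c: float) -> float:
    """Relative Doppler scaling a = v/c for relative speed v in m/s."""
    return v / c

v = 1.5  # m/s, e.g. a node drifting with a current (illustrative value)
a_uwa = doppler_factor(v, SPEED_OF_SOUND)    # about 1e-3
a_radio = doppler_factor(v, SPEED_OF_LIGHT)  # about 5e-9
ratio = a_uwa / a_radio                      # about 200000
```

Even a drift of 1.5 m/s gives a ≈ 10⁻³ underwater, a Doppler factor that a radio link would only see at enormous velocities.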

    2.1.5 Scattering

The sea contains inhomogeneities which intercept and reradiate portions of the acoustic signal [30, p. 237]. The reradiation is called scattering. The sum of the total scattering is called reverberation, and [30, p. 237] names three types of reverberation: sea-surface reverberation, bottom reverberation, and volume reverberation. The first two are self-explanatory, and volume reverberation occurs due to marine wildlife, other objects, and the inhomogeneous structure of the sea itself.

    2.2 Channel Models

Modeling of the uwa channel is a major obstacle to achieving reliable communications in the uwa channel [12]. Good channel models that can be implemented in software are essential to simulate physical layers, as sea tests are expensive and troubleshooting is more difficult. To create a channel model that takes into consideration all the aspects mentioned in Section 2.1 is non-trivial. In wireless communications, a common approach to deal with multipath propagation and fading is to create stochastic channel models. A common assumption is the wide-sense stationary uncorrelated scattering (wssus) channel [2]. Due to its analytical tractability, a similar model would be desired in uwa communications, but studies conclude that wssus assumptions are violated by non-stationary behavior in the uwa channel [33]. Also, as summarized in [18], the signal envelope has been reported to follow Rayleigh, Rice, and Lognormal distributions in different studies. According to another article [24], claims of shallow-water medium-range channels following a Rayleigh fading model have been challenged. Models such as Rayleigh fading might be feasible but are constantly challenged, and the uwa channel behavior is difficult to generalize. It is concluded that the contradicting results show that a realistic uwa channel simulator is required [18]. A promising approach is ray theory; both [24] and [12] highlight the usage of ray theory to model multipath propagation in the uwa channel.

    2.3 Baseline Physical Layer

The dfe-pll has been considered a suitable and popular receiver structure for uwa communications [12]. The structure was initially suggested in a series of two articles, [23] and [26]. The design is motivated by the unique characteristics of the uwa channel, namely large Doppler fluctuations and long time-varying multipath. The structure has been combined with the multi-carrier spread spectrum (mcss) modulation technique in [34]. The authors suggest a physical layer structure based on mcss, where the receiver utilizes the dfe-pll [34]. mcss was later renamed to frss and was motivated specifically by outperforming the other candidates, namely direct sequence spread spectrum (dsss) and frequency-shift keying (fsk), in low snr scenarios [32]. It did not give the highest data rates at high snrs compared to dsss and fsk, but the low snr performance is very desirable in a tough underwater channel. The frss transmitter and receiver are vital components in this thesis.

    2.3.1 Transmitter

The effective data rate (bit/s) and stability of the reception are determined by the used spreading factor. The spreading factor is determined by a rate parameter R, as it determines the effective data rate (bit/s). A total of six available configurations of R are suggested in [32], R ∈ {1, 2, 3, 4, 5, 6}, but only four are available in the available implementation. The spreading factor is of length K = 2^R − 1. A choice of R = 1 yields a single sub-band, while R = 4 yields fifteen sub-bands. A choice of higher R yields a more stable transmission, but the increased spreading factor reduces the effective data rate (bit/s). Therefore, the choice should be made based on the transmission conditions.
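The relation K = 2^R − 1 between the rate parameter and the number of sub-bands can be tabulated directly (the function name is ours):

```python
def num_subbands(R: int) -> int:
    """Spreading factor length / number of sub-bands K = 2**R - 1
    for rate parameter R, as suggested in [32]."""
    return 2 ** R - 1

# the six rate configurations suggested in [32]
subbands = {R: num_subbands(R) for R in range(1, 7)}
# R = 1 -> 1 sub-band, R = 4 -> 15 sub-bands, R = 6 -> 63 sub-bands
```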

When R is chosen, the data symbols are mapped onto the K = 2^R − 1 sub-bands. Initial training symbols are prepended and information bits are continuously multiplexed with periodic training symbols. The waveform is prepended with a preamble, see [32], which is utilized for detection, synchronization, and Doppler estimation. Figure 2.2 from [32] illustrates the frss modulation format for R = 2 compared to dsss.


Figure 2.2: An illustration of the frss modulation format, compared to traditional dsss, from [32].

    Preamble

All rates use an m-sequence preamble as described in [32]. The m-sequence is mapped onto a raised cosine carrier function. The raised cosine has a roll-off factor of β = 1/3. The preamble is then moved to the passband with carrier frequency f_c.

    Channel Coding

Before the bits are channel coded, they are scrambled. After scrambling, the bits are channel coded using a rate-1/2 convolutional encoder [32]. The bits are then interleaved, i.e., rearranged to protect the data from burst errors.
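The encode-then-interleave step can be sketched as follows. Note that [32] does not specify the generator polynomials here; the classic (7, 5) octal pair with constraint length 3 and the simple block interleaver are our illustrative choices.

```python
def convolutional_encode(bits, g1=0b111, g2=0b101, K=3):
    """Rate-1/2 convolutional encoder sketch. Generators (7,5) octal and
    constraint length K=3 are illustrative assumptions, not from [32]."""
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)  # shift bit into register
        out.append(bin(state & g1).count("1") % 2)   # parity from generator 1
        out.append(bin(state & g2).count("1") % 2)   # parity from generator 2
    return out

def block_interleave(bits, rows=4):
    """Simple block interleaver: write row-wise, read column-wise,
    so a burst of channel errors is spread out after deinterleaving."""
    cols = len(bits) // rows
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]
```

Each input bit yields two coded bits (rate 1/2), and the interleaver only permutes them.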

    Modulation and Training Symbols

The symbol sequence created from the process of channel coding is modulated using Gray-coded quadrature phase-shift keying (qpsk). The training symbols mentioned in Section 2.3.1 are a set of known qpsk symbols, inserting one training symbol for every two data symbols. A sequence of random qpsk symbols is prepended to the signal, to enable initial equalizer training. The resulting sequence, which is ready to be modulated onto the sub-bands, is denoted z(n).

    FRSS generation

The sequence z(n) is mapped onto the K = 2^R − 1 sub-bands utilizing a raised cosine pulse with roll-off factor β = 1/3. After modulation onto the sub-band waveforms, the signal is concatenated with the preamble with a short pause in between of duration ∆t. The final output from the frss transmitter is a time-continuous signal x(t).

    2.3.2 Receiver

The receiver can be divided into several parts as well. The processes are, in order: acquisition, equalization, adaptive filtering, phase tracking, log-likelihood computation, and soft Viterbi decoding. The input to the frss receiver is a time-continuous signal u(t) which has passed through the uwa channel.

    Acquisition and Pre-processing

The received signal u(t) is demodulated to the baseband from the carrier frequency f_c. The baseband signal is then passed to a Doppler filter bank, i.e., the signal is correlated with a bank of Doppler-shifted replicas of the preamble. A brick wall filter is applied to remove out-of-band noise [34]. A filtered signal y_i(t) per sub-band i is obtained from the pre-processing.

    Equalization

The purpose of the equalizer is to eliminate the isi introduced by the channel. The input to the equalizer is a time segment of length T_eq of the pre-processed signal. T_eq should be larger than the channel delay spread; T_eq determines the parameter L. The equalizer is trained using the training sequences described in Section 2.3.1. Training is performed by feeding the samples to the filter and updating the filter coefficients by investigating the error between the known training symbols and the filter output. The output of the equalizer is denoted as ẑ(n).
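The training loop just described — filter, compare against the known symbol, update — can be sketched in a few lines. The receiver in this thesis uses RLS updates (see the adaptive filtering section below); plain complex LMS is shown here only as the simplest instance of the same idea, and the function name is ours.

```python
import numpy as np

def lms_train(inputs, training_symbols, mu=0.01):
    """Sketch of linear equalizer training: for each known training symbol,
    form the filter output, measure the error, and nudge the coefficients.
    (The baseline receiver uses RLS; LMS is shown for clarity only.)"""
    c = np.zeros(inputs.shape[1], dtype=complex)
    for y_vec, z_known in zip(inputs, training_symbols):
        z_hat = np.vdot(c, y_vec)        # filter output c^H y
        err = z_known - z_hat            # error against known symbol
        c += mu * np.conj(err) * y_vec   # step that reduces |err|^2
    return c
```

On a trivial one-tap identity channel the coefficient converges to 1, i.e., the equalizer learns to pass the symbols through unchanged.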

The equalization is performed by a fractionally spaced dfe-pll as described in [23] and [26]. A symbol estimate is yielded by taking the input signal y^(m)_{k,n} and performing multiplication with the filter coefficients c^(m)_{k,n}, see equation (9) in [34]:

ẑ^(m)(n) = Σ_{k=1}^{K} (y^(m)_{k,n})^T c^(m)_{k,n±1}, (2.4)

where y^(m)_{k,n} is given by equation (10) in [34]:

y^(m)_{k,n} = [ y_k(nT − (L−1)bT/(2a)), y_k(nT − (L−3)bT/(2a)), ..., y_k(nT + (L−3)bT/(2a)), y_k(nT + (L−1)bT/(2a)) ]^T exp[−iθ^(m)_k(n ± 1)]. (2.5)


The signal y_k(t) is down-sampled to a samples per symbol. The dfe has a spacing of bT_eq/a; a and b are design parameters [34], and T is the symbol period. θ^(m)_k represents the approximated phase shift of the carrier frequency for sub-band k.

    Adaptive filtering and phase tracking

The parameters utilized in the equalization process are constantly tracked. The known training symbols are used to update c^(m)_{k,n} using an adaptive filter; in [34] a recursive least squares (rls) filter is suggested. For more details on adaptive filtering and the rls filter, see [8]. The phase shift θ^(m)_k is approximated by a pll utilizing the known training symbols; for details see [34].
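For concreteness, one standard RLS coefficient update (as treated in [8]) looks as follows. The variable names, the forgetting factor value, and the initialization are our illustrative choices, not parameters fixed by [34].

```python
import numpy as np

def rls_step(c, P, y_vec, z_known, lam=0.99):
    """One standard RLS update of equalizer coefficients c, where P is the
    inverse-correlation estimate and lam the forgetting factor. A sketch of
    the adaptive filter suggested in [34]; details follow [8]-style RLS."""
    y = y_vec.reshape(-1, 1)
    k = (P @ y) / (lam + (y.conj().T @ P @ y).item())  # gain vector
    err = z_known - np.vdot(c, y_vec)                  # a priori error
    c_new = c + k.flatten() * np.conj(err)             # coefficient update
    P_new = (P - k @ (y.conj().T @ P)) / lam           # update inverse corr.
    return c_new, P_new
```

Repeated on a one-tap identity channel, the coefficient converges toward 1 within a handful of training symbols, much faster than the LMS sketch earlier.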

    Log-likelihood Ratio

The symbol estimates ẑ(n) from the equalization are then utilized to calculate the log-likelihood ratio. The periodic training symbols are utilized in this calculation [34], which compensates for the bias of the equalizer. The output of the log-likelihood ratio computation is fed to the Viterbi decoder.

    Viterbi Decoding

The scaled symbols are deinterleaved using the known interleaving sequence at the transmitter. The symbols are then put through a soft-decision Viterbi decoder [32]. The final step is to unscramble the symbols, which yields the desired bit sequence.

    2.4 Artificial Neural Networks

In this section, the basics of anns are described, as dnns and rnns are specific types of anns that have been utilized in this thesis.

anns are categorized as an ml algorithm. anns have attracted great attention in recent years, and often ml is considered synonymous with anns, even though there are other kinds of algorithms in ml. anns have become very popular due to their ability to solve very complex and non-linear problems and have proven to be applicable in a wide array of subjects. Applications include computer vision, signal processing, medical diagnosis, etc. Fundamental research and theory in the area were made in the 1970s, but it has gained popularity in the past 10-20 years due to the increased accessibility of computing power that enables the training of the parameters in larger anns. In this section, the fundamental theory will be presented to introduce the reader to the basics of anns. This section only studies supervised learning with anns, where the network is trained by feeding the network input/output combinations of the correct behavior. The goal with the training is to learn a function that takes previously unseen data and performs the correct function mapping, i.e., generalization. The basics of anns in this section are based on [19].


    2.4.1 Input and Output

Before explaining how anns work, the input and output have to be explained. As mentioned before, anns have been applied to a variety of problems, and the input and output of an ann vary depending on the application. The input data to an ann is often referred to as features, where selected features of the data are used. If the user wants to identify cats in an image, the features can be 256 × 256 pixel values and the desired output is 0 or 1: 0 if there is no cat, 1 if there is a cat. In a signal processing example, the feature can be an entire received signal in the time domain, its Fourier transform, or other properties of the signal. Feature selection is of utmost importance, and selecting the correct feature and feature size is crucial. It is desirable to pre-process information, to reduce the input dimension and aid the neural network in learning already known properties. However, it might come at a cost of lost information, which can degrade the performance. The important part is that there is a good data set with correctly annotated input and output pairs x_n and y_n, n ∈ {1, ..., N}, respectively. The purpose of the network is then to build an approximation of the desired function f so that y_n = f(x_n) for all n.

    2.4.2 The Neuron

The neuron is the fundamental computational block of an ann. The neuron has a set of input features x_n, and each feature x_n is multiplied with a weight w_n, n ∈ {1, ..., N}. The weighted features are summed up according to:

z = Σ_{n=1}^{N} w_n x_n + b, (2.6)

where b corresponds to an introduced bias weight. The purpose of the bias weight is to be able to represent the output of the neuron in a possibly wider range compared to the input domain. The output of the neuron y is finally given by y = g(z), where g(z) corresponds to the activation function. The purpose of the activation function is to perform a non-linear mapping of the weighted summation to the output, so that it can be decided if the neuron was activated or not. Activation functions are non-linear functions (linear functions may only be utilized in the output layer), as this enables anns to capture non-linear behaviors, a necessity to solve complex problems. Some examples of popular activation functions are given in Section 2.4.5.
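Equation (2.6) followed by the activation amounts to a few lines of code; tanh is used as an example activation here, and the function name is ours.

```python
import math

def neuron(x, w, b, g=math.tanh):
    """Single neuron: weighted sum of features plus bias (eq. 2.6),
    passed through an activation function g to give the output y = g(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return g(z)
```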

    2.4.3 The Network

The artificial neural network is built of multiple layers of neurons. In the simplest case, the input features (input layer) are connected to a set of neurons, which then produce the output (output layer). To solve complex problems, layers of neurons are added in between the input and output layers, which are called hidden layers.

When performing classification tasks, the Softmax layer is commonly used as the output layer in multilabel classification tasks. Each class is then given a probability between zero and one, based on which a classification decision is made.

    2.4.4 Training

Once the network architecture is built, the final step is training the network. The training of an ann is the process of updating the weights (including bias weights) w in all the neurons in the network. First, we have to design a problem which can be optimized. Therefore, a loss function is introduced; in this case the mean square error loss function is selected, but note that there are other variants available. The mean square error loss function is given as:

ε(w) = Σ_m Σ_k (y_{k,m} − p_{k,m}(w))², (2.7)

where k corresponds to all training samples and m to all output nodes. Note that the training sample y_{k,m} is known and the network predicts the output p_{k,m}(w). It is desired to minimize ε(w); this is done by gradient descent:

w^{t+1}_{i,j} = w^t_{i,j} − η ∂ε(w)/∂w_{i,j}, (2.8)

where the parameter η is called the learning rate and the derivative ∂ε(w)/∂w_{i,j} is found using partial derivatives and the chain rule, using that activation functions should be differentiable or piecewise differentiable. The learning rate determines how fast the weights in the network are updated. A high learning rate leads to faster training, and possibly over-fitting. A low learning rate leads to slower training, and possibly non-converging results.

The process of feeding training samples through the network is called forward propagation. The process of updating the weights based on the loss function is called backward propagation. This can be done for one training sample at a time or utilizing multiple samples, which is called batch learning. In batch learning, training samples are put into batches, and then forward and backward propagation are performed. The number of training samples in one batch is called batch size or mini-batch size, as common sizes are relatively small numbers such as 32 or 16.
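Equations (2.7) and (2.8) together with mini-batching can be sketched on a toy linear model (so the gradient is available in closed form; for a real ann the gradient comes from backward propagation). The function names and hyperparameter values are our illustrative choices.

```python
import numpy as np

def mse_loss_grad(w, X, Y):
    """Gradient of the summed squared error (eq. 2.7) for a linear model p = Xw."""
    p = X @ w                      # forward propagation of the batch
    return 2 * X.T @ (p - Y)       # d(loss)/dw

def gd_train(X, Y, eta=0.01, batch_size=2, epochs=200):
    """Mini-batch gradient descent (eq. 2.8): one weight update per batch."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for start in range(0, len(X), batch_size):
            xb, yb = X[start:start + batch_size], Y[start:start + batch_size]
            w = w - eta * mse_loss_grad(w, xb, yb)   # eq. (2.8)
    return w
```

On data generated by y = 2x the weight converges to 2, showing that each small batch update moves w against the loss gradient.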

2.4.5 Design Considerations

Some important design considerations will be discussed in this section, to give some insight into the design of an ann.

    Activation Function

There exist many possible activation functions; here, three classical activation functions are presented. The first one is the sigmoid function:

σ(x) = 1/(1 + e^{−x}). (2.9)


The sigmoid function has a range between 0 and 1 and is easy to apply. However, it suffers from some undesirable properties:

• Vanishing gradient problem, i.e., the partial derivatives become very small when x is too small or too large, which leads to slow updates for early layers in the network.

• The range of the derivative of the sigmoid function is very narrow, which leads to indistinct gradient values.

• It is not zero-centered. Hence, negative-valued outputs cannot be represented by a sigmoid function.
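The vanishing gradient property can be checked numerically: the sigmoid derivative σ'(x) = σ(x)(1 − σ(x)) peaks at only 0.25 and collapses for large |x|. A small sketch (function names are ours):

```python
import math

def sigmoid(x: float) -> float:
    """Sigmoid activation (eq. 2.9)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """d(sigma)/dx = sigma(x) * (1 - sigma(x)); at most 0.25, at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

Already at x = 10 the derivative is below 10⁻⁴, so gradients propagated through several saturated sigmoid layers shrink toward zero.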

    The hyperbolic tangent function:

f(x) = tanh(x) (2.10)

is a popular choice. It is zero-centered and its derivative is not as narrow as the sigmoid function's. It does, however, suffer from the vanishing gradient problem.

    The rectified linear unit (relu):

f(x) = max(0, x) (2.11)

is a function that does not suffer from the vanishing gradient problem, but it is not zero-centered. It is worth noting that Equation (2.11) is not suitable for the output layer and should only be used in the hidden layers if the range of the output is not restricted to the 0-1 interval. A variant of the relu is the leaky relu; it behaves similarly to the relu, with the exception that it allows small negative values to be let through. It can be described as:

f(x) = x for x > 0, and f(x) = 0.01x otherwise. (2.12)

Number of Layers

The number of layers and neurons is a design consideration. The number of layers can be increased to improve the ability to capture complexities in data, i.e., a more shallow network might be well suited for simpler tasks and vice versa. The number of neurons in each layer can also be increased to capture more complex structures, but should also be of suitable dimensions compared to the dimensions of the input.

There are some drawbacks to increasing the number of layers. One danger is that the network is more prone to over-fitting on the training data. Increasing the number of layers increases training and run times. When working with a larger number of layers, it is also important to make sure that earlier layers in the network are updated properly, to avoid the vanishing gradient problem. A dnn is an ann with more than one hidden layer.


    Optimization method

Equation (2.8) presents the simple gradient descent method. There are a lot of variants of the basic gradient descent available. Different methods can reduce training time or modify the learning rate while training. The learning rate should also be selected carefully. An example of a popular optimization method is adaptive moment estimation (adam), which updates the learning rate while training [13].

    Loss Function

The mentioned mean squared error function (2.7) is a popular choice of loss function; however, there are other options available. The choice of loss function depends on how the problem is designed.

    Data Set

The performance of anns is only as good as the provided data set. If the data set is not general enough, the ann might only become applicable to specific situations. The training data has to be selected such that it allows generalization to the various data that will appear when using the ann. Any network, regardless of how many neurons it has or how deep it is, is only as good as the given training data.

    2.4.6 Recurrent Neural Networks

rnns are a class of anns where the connections between neurons form a directed graph. Compared to feedforward anns (where information only moves in one direction, from the input layer to the output layer), rnns can use their internal state/memory to aid in the solving of tasks [10]. Essentially, rnns can utilize information from previous inputs, a property which the basic feedforward network lacks. To learn time-varying underwater channels, this property could be of use. There are several rnn architectures. The only rnn structure provided by MATLAB's Deep Learning Toolbox (version 2019b) is the long short-term memory (lstm) structure.

    2.4.7 Long Short-Term Memory Architecture

An lstm layer is built from a set of recurrently connected blocks, called memory blocks. Each block contains a set of cells, where each cell is connected to three multiplicative gates: the input gate, the output gate, and the forget gate. The gates regulate the flow of information into the cell by being open or closed, thus controlling its behavior [6]. An illustration of an lstm is presented in Figure 2.3, from [6]. The gates open or close based on the signals they receive, so information is let through or blocked based on the signal strength and importance. The signals that lead into the gates are weighted as shown in Figure 2.3. This means that the network learns how to value the signal strength, i.e., learning how to handle the data. The input gate thus controls when data is allowed to enter the cell, the output gate controls when data is allowed to leave the cell, and the forget gate controls when data is allowed to be deleted. The concept of gates allows the neural network to capture remote dependencies: long-term memory, as the name implies.

Figure 2.3: Illustration of an lstm block and its connections, from [6].
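One time step of the gating mechanism can be sketched as follows. The stacked-weight layout and all variable names are ours; MATLAB's lstmLayer and other toolboxes organize the same quantities differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W (input weights), U (recurrent weights) and b
    (bias) stack the pre-activations of the input (i), forget (f) and
    output (o) gates plus the cell candidate (g); naming is ours."""
    n = h.size
    z = W @ x + U @ h + b                 # all four pre-activations at once
    i = sigmoid(z[0:n])                   # input gate: admit new information
    f = sigmoid(z[n:2 * n])               # forget gate: erase old cell content
    o = sigmoid(z[2 * n:3 * n])           # output gate: expose cell content
    g = np.tanh(z[3 * n:4 * n])           # candidate cell update
    c_new = f * c + i * g                 # gated mix of old and new memory
    h_new = o * np.tanh(c_new)            # gated output / hidden state
    return h_new, c_new
```

With all weights at zero, every gate sits half-open (sigmoid(0) = 0.5), so the cell state simply decays by a factor 0.5 per step; training moves the gates away from this neutral point.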

    Bidirectional LSTMs

A bidirectional lstm runs the input in both directions, yielding backward and forward information of the sequence. This allows the network to combine information from the past and the future. This property has proven to be useful, as it learns the context of the provided information better compared to a basic lstm.

    2.5 Previous Work

Before explaining the outlined method in this thesis, recent work and progress within the field will be explained to give some insight into the new approach this thesis takes.

    2.5.1 Machine Learning in Wireless Radio Communication

Terrestrial radio wireless communications have a plethora of literature on utilizing ml for performing channel estimation and equalization. One can learn a lot from this field, but the assumptions are not coherent with the assumptions for the uwa channel. In one article, a dnn is trained online and offline, exploiting knowledge from training symbols to learn a time-varying channel [17]. An interesting approach to this problem is to develop a network that can learn time-varying properties. Recent literature in radio communications highlights the fact that rnns, which can learn temporal dynamic behavior unlike dnns, can be suitable for channel estimation/equalization. rnns are exploited in [4] and [7] to perform online pilot-assisted channel estimation/equalization. rnns utilize the previous outputs of the network to create an internal state, which they can use to process new inputs. The number of works on lstms in communications is rather limited; they have, however, been proven successful in speech recognition [14].

    2.5.2 Machine Learning in Underwater Acoustic Communication

Recent literature on uwa communication suggests that different parts of the receiver chain, or the entire chain, can be replaced by dnns with promising results. In one study, a version of the dfe-pll receiver is replaced by a dnn with improved system performance, namely lower bers compared to the normal receiver at the same snr [37]. The authors considered a single sub-band system, while in this thesis a multi-sub-band system is adopted. A related study proposes a system where the entire ofdm receiver chain is replaced by a dnn [36], implying the possibility that dnns are promising for improving ber in multi-sub-band systems. However, ofdm is very different compared to frss in a lot of aspects; most importantly, the isi is removed by the cyclic prefix in ofdm, which simplifies the rest of the reception. Another work replaces the channel estimation in an ofdm system with a dnn [11]. In [36] and [11], the authors observed lower bers at all snrs compared to the traditional receiver structures which the authors suggested as benchmarks.

One common factor in [11, 36, 37] is that the dnns only consist of three to four hidden layers and have a common topology. The topology follows the pattern that the number of neurons in a hidden layer is half of that in the previous layer. If the input layer is of size 1024, as in [36], the first hidden layer should be of size 512, then 256, and so on. Building a network from a similar structure should be a good start.
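The halving pattern described above can be generated programmatically; the function name is ours.

```python
def halving_topology(input_size: int, n_hidden: int = 3) -> list[int]:
    """Layer sizes following the pattern observed in [11, 36, 37]:
    each hidden layer has half as many neurons as the layer before it."""
    sizes = [input_size]
    for _ in range(n_hidden):
        sizes.append(sizes[-1] // 2)
    return sizes
```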

    2.5.3 Key Takeaways

The strength of rnns is highlighted for wireless radio communication [4, 7], but similar studies in uwa communication were not found. The studies related to uwa communication highlight the usage of dnns [11, 36, 37]. In both areas, training symbol-aided online training has been proven effective for rnns and dnns. Training symbol-aided training is very interesting, as frss training symbols could be utilized in such an approach.

This study will take a new approach by comparing the two different structures in uwa communications and a rather unique multi-sub-band system. The cited literature highlights the possibilities of this approach.

3 Method

In this chapter, the method to answer the problems posed in Section 1.3 is outlined. First, the theory from previous chapters is used to describe the system model. Then the software used to simulate the system is described in Section 3.2. Details regarding the intricate channel simulation are described in Section 3.3, outlining the possible channels. Section 3.4 outlines how the different configurations are simulated. The content in the sections until Section 3.5 serves the purpose of identifying interesting deployment scenarios for the anns, i.e., answering in which environments performance can be improved. The final questions of how much performance can be increased and deployment strategies (generalizable performance) are answered in the final sections. Section 3.5 proposes ann structures based on the related work and Section 3.6 describes how anns are deployed to answer the outlined problem formulation. Finally, some miscellaneous studies are described.

    3.1 System Model

The intention of this section is to describe how the ml receiver and the baseline receiver were implemented on a schematic level. Both systems utilized the same functions and shared most properties, apart from the equalization.

Assuming that the frss transmitter described in Section 2.3.1 yields a continuous-time signal x(t), x(t) was sent through the underwater multipath channel according to equation (2.3), producing the channel output y(t). The received signal at the hydrophone is u(t), where n(t) is additive noise:

    u(t) = y(t) + n(t). (3.1)

The noise n(t) can be either colored or white, see Section 3.3.6. What is important is that the received signal is inevitably embedded in noise. The received signal u(t) was fed to the different receivers.

    3.1.1 Baseline Receiver

The baseline receiver utilizes all the steps outlined in Section 2.3.2. In the end, a sequence of estimated information bits is generated. By comparing the estimated bit sequence with the original bit sequence, the ber could be calculated.

    3.1.2 Machine Learning Receiver

For the performance of the ann receiver to be comparable to the baseline receiver, only the equalization process described in Section 2.3.2 is replaced. First, the frss receiver performs the pre-processing described in Section 2.3.2. The next step was to perform the equalization, which was carried out by an ann. Similar to equation (2.5), the signals were stacked, but the phase offset was disregarded in this implementation, yielding the following expression:

$$
y_{k,n} =
\begin{bmatrix}
y_k\!\left(nT - (L-1)\,\frac{bT}{2a}\right) \\
y_k\!\left(nT - (L-3)\,\frac{bT}{2a}\right) \\
\vdots \\
y_k\!\left(nT + (L-3)\,\frac{bT}{2a}\right) \\
y_k\!\left(nT + (L-1)\,\frac{bT}{2a}\right)
\end{bmatrix}. \qquad (3.2)
$$

The vector yk,n for each symbol n was considered as the input to the ann. The choices of a and b are design parameters which had to be considered, see Section 3.1.3; they essentially determined how many samples the signal was represented by, i.e., the length of the vector yk,n. The selection of a and b is presented in Section 3.1.3, which yielded a vector of length 90. The input was then fed to the ann, which yielded a symbol estimate ẑ(n) for each n. The estimate was fed to the log-likelihood ratio computation and Viterbi decoding described in Section 2.3.2. In the end, a sequence of estimated information bits was generated. By comparing the estimated bit sequence with the original bit sequence, the ber could be calculated.
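The sample offsets around each symbol instant nT in equation (3.2) follow a simple arithmetic pattern, which can be sketched in Python (the symbol period T and the example parameter values are placeholders; the actual values in the reference code are undisclosed):

```python
def sample_offsets(L, T, a, b):
    """Time offsets around the symbol instant nT, per equation (3.2):
    m * b*T / (2*a) for m = -(L-1), -(L-3), ..., (L-3), (L-1)."""
    return [(2 * i - (L - 1)) * b * T / (2 * a) for i in range(L)]

# Example with hypothetical values: L = 5 taps, unit symbol period.
offsets = sample_offsets(L=5, T=1.0, a=4, b=1)
print(offsets)  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```

The offsets are symmetric around zero, so the stacked vector captures samples both before and after the symbol instant.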

    3.1.3 Choice of Parameters

Each rate had a different number of sub-bands, hence the signal dimensions vary between rates. Thus, in this thesis an ann is configured for one rate configuration. Out of the four available rates, the rate R = 2, which utilizes three sub-bands, was studied. The choice of R = 2 was motivated by the fact that fewer sub-bands reduce the input dimension complexity for the ann. R = 1 was not considered, as part of what makes the thesis unique compared to [37] is the study of equalization in a multi-sub-band configuration.

The choices of the equalizer parameters a, b, and Teq (Teq determines the size of L) for the ann are presented in Table 3.1.


    Table 3.1: Configurable equalizer parameters.

Property   baseline       ann
a          undisclosed    4
b          undisclosed    1
Teq        undisclosed    24 ms

The choices of a, b, and Teq utilized by the baseline frss receiver are undisclosed but are comparable to the ones chosen for the ann. Teq was chosen by studying the simulated channel impulse responses, which had a maximum delay spread in the range of 20 ms, see Appendix D. Thus, 24 ms was chosen to leave some margin. A spacing of b = 1 was chosen since high signal fidelity was considered beneficial, i.e., the spacing between samples was minimal. The parameter a was chosen to be comparable to the baseline equalizer.

    3.1.4 Bit Error Rate Definition

The ber refers to the channel-coded ber, the number of bit errors in a received packet. As mentioned, the ber could be calculated by comparing the estimated bit sequence with the original bit sequence. The ber curve as a function of snr is the performance metric utilized in this report. At least 10 errors per point on the curve are required to estimate the ber. In some channels, and at high snr, this could require an extensive number of simulations.

The ber can also be affected by the reception, namely false alarms or undetected packets. The receiver has a probability of raising a false alarm, i.e., a detection in the absence of a signal. There is also a probability that a received packet is not detected. False alarms and undetected packets affect the modem performance and could thus be represented in the ber curve, but in this study perfect reception is assumed, i.e., false alarms and undetected packets are disregarded. False alarms are disregarded in the simulations because they occur with increased probability as the number of simulations grows, which can skew the ber calculation when many simulations are needed. For the same reason, undetected packets are also disregarded.
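The stopping criterion of at least ten errors per snr point can be sketched as a Monte Carlo loop. A hedged Python sketch (the packet size, error probability, and stand-in channel are made up for illustration; the real simulator is the undisclosed MATLAB code):

```python
import random

def estimate_ber(send_packet, min_errors=10, max_packets=100_000):
    """Monte Carlo BER estimation: keep simulating packets until at
    least `min_errors` bit errors have been observed (or a cap is hit)."""
    errors, bits = 0, 0
    while errors < min_errors and bits // 1000 < max_packets:
        errors += send_packet()
        bits += 1000  # assumed packet size of 1000 bits
    return errors / bits

# Stand-in "channel": each bit of a 1000-bit packet flips w.p. 1e-2.
random.seed(0)
ber = estimate_ber(lambda: sum(random.random() < 1e-2 for _ in range(1000)))
print(f"estimated BER ≈ {ber:.3e}")
```

At low error probabilities the loop runs for many packets before ten errors accumulate, which is why high-snr points are expensive to estimate.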

    3.2 Software Simulation Environment

The frss transmitter and receiver described in Section 2.3 came with a MATLAB reference implementation. In this section, the software and setup used to simulate the underwater channel and the ann framework are described.

    3.2.1 Machine Learning Software

MATLAB has a Deep Learning Toolbox [28], which was utilized to implement the dnn and rnn. The Deep Learning Toolbox provides the necessary tools to design and build anns within this thesis, and it also allows the user to speed up training with a graphical processing unit (GPU), which was available in the setup. An advantage of utilizing the toolbox was that the thesis could focus on designing network structures and testing different models, rather than spending time on implementing basic functions. An alternative would have been to use the popular toolboxes PyTorch or TensorFlow, which are available for Python, but the frss code was given in MATLAB, so MATLAB's toolbox was chosen.

    3.2.2 Channel Model

As simple channel models based on assumptions regarding the uwa channel were proven to be non-realistic, as mentioned in Section 2.2, one must resort to more complex channel models. The aspects mentioned in Section 2.1 needed to be taken into consideration one by one. A common choice for modeling multipath propagation and attenuation in underwater conditions is Bellhop [20]. Bellhop is a model based on ray tracing and can utilize information about ssps and bathymetry, along with other environmental properties, to generate a channel impulse response. Bellhop is relatively complicated as it requires a lot of input, but it has become popular as it is considered somewhat realistic. There are plenty of models [21, 35, 38] which utilize Bellhop as a baseline for generating the channel impulse response. [35] proposes a channel model and studies the impact of noise and the Doppler effect to arrive at a complete channel model. Another study [21] proposes a statistical model based on large-scale and small-scale effects of movements and environmental conditions; Doppler effects are also studied in detail. A third article [38] utilizes Bellhop to simulate the physical layer in a NET-layer simulator.

Out of the possible channel models, [21] was chosen for this thesis. All the important aspects mentioned in Section 2.1 are modeled in [21], which makes the model very complete, and it is referenced in recent uwa literature [36, 37]. An additional benefit is that the model has an openly available MATLAB implementation [22]. The model described in [21] contains a lot of hyperparameters, and the next section is devoted to motivating the input to the simulation.

    3.3 Channel Simulation Configuration

The channel simulator, based on [21], yielded an estimated time-variant channel impulse response h(τ; t) from the configured parameters. The channel simulator took the environmental parameters and passed them to Bellhop. The Bellhop simulation was deterministic: a specific configuration always yielded the same signal arrivals. Based on the signal arrivals, the time-variant channel impulse response was generated from the small-scale variations (Table 3.4) and Doppler variations (Table 3.6).


In this section, the configuration utilized to simulate h(τ; t) is described. The bathymetry, ssps, and bottom sediment described in Section 3.3.1 to Section 3.3.3 were input to the Bellhop simulator, while the general parameters and channel variations were input to the statistical model from [21].

    3.3.1 Bathymetry

As mentioned in Section 1.4, two depths were considered. For each depth, three general bathymetry profiles were considered, yielding six possible bathymetries. The purpose was to capture general scenarios. The three general scenarios selected were a flat ocean bottom, a slope, and an obstacle. Motivation and plots of the bathymetries can be found in Appendix A.

    3.3.2 Sound Speed Profiles

Nine different ssps were utilized in the simulations: four for the shallow channel and five for the deeper channel. The ssps were chosen to correspond to the varying seasons. The ssps were taken from the Swedish Meteorological and Hydrological Institute's weather buoys in Skagerrak and the Baltic Sea [27]. Data from the buoy located at Släggö was chosen due to its proximity to the coast, where the effects of freshwater rivers can be noticed. Data from the buoy REF M1V1, located between Öland and Småland, was chosen for the shallow scenario, as it was one of the few buoys with sound speed data in shallower waters. For each buoy, a set of the available dates with data was selected; the chosen dates are found in Appendix B. Data points were written down on paper and then manually entered into MATLAB, creating an approximation of the plots provided by the Swedish Meteorological and Hydrological Institute. ssps were picked at specific times, attempting to capture the varying seasons and their effects on the sea. Plots of the ssps and descriptions of their behaviors are found in Appendix B.

    3.3.3 Bottom Sediment Types

The Bellhop simulator allowed the user to specify properties of the sea bottom sediment. It was noted that altering this parameter yielded significant changes in the simulation (thus, two setups were considered). The properties of the bottom sediment determined how much reflection and reverberation the bottom generates. Two kinds of bottom sediment with very different properties were considered. The sediment types and properties are presented in Table 3.2; the data was obtained from seafloor measurements [9].

    Table 3.2: Bottom sediment properties for two different kinds of bottom.

    Sediment type Sediment sound speed Wet bulk density

    Silty sand 1657.49 m/s 1.91 g/cm3

    Clayey silt 1465.52 m/s 1.63 g/cm3


The intention was to study how important this property is to the reception, since the receiver and transmitter described in Appendix A are located very close to the bottom. The center frequency (carrier frequency) and bandwidth are undisclosed, as they were specific to the given simulation code, which can be confidential information.

    3.3.4 Channel Geometry and General Parameters

    The properties of the basic channel geometry are listed below in Table 3.3.

    Table 3.3: Channel geometry and general channel/simulation properties.

Property                          Value                   Unit
Surface height                    18 or 72                m
TX height                         2 (measured from bottom)  m
RX height                         2 (measured from bottom)  m
Channel distance                  1000                    m
Spreading factor                  1.7                     -
Center frequency                  undisclosed             Hz
Bandwidth                         undisclosed             Hz
Frequency resolution              20                      Hz
Time resolution dt                120                     ms
Small-scale variation duration T  20                      s

Most parameters were based on the limitations set in the introduction. The spreading factor was set to k = 1.7 by default; this was also verified with sonar experts at Saab Dynamics, who considered k = 1.7 suitable for shallow water communications. The time resolution corresponds to the resolution of the time variation t, and the frequency resolution can be interpreted as the resolution of the delays τ. The simulator returned τ as a frequency vector, which could be converted simply by dividing the returned frequency vector by the bandwidth.
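The conversion of the returned frequency vector into delays is a single division. A minimal sketch (the bandwidth value is a made-up placeholder, since the real bandwidth is undisclosed):

```python
# Hypothetical values: the simulator returns tau as a frequency vector
# at the configured 20 Hz frequency resolution.
freq_vec = [0.0, 20.0, 40.0, 60.0]          # Hz
bandwidth = 4000.0                           # Hz (placeholder value)
delays = [f / bandwidth for f in freq_vec]   # delays tau in seconds
print(delays)  # [0.0, 0.005, 0.01, 0.015]
```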

    3.3.5 Channel Variations

Small-scale fading describes movements and displacements within the range of a few wavelengths. Small-scale variations were described by [21] as the variations of the surface and bottom, together with the statistical properties of the scattering. The configurable small-scale fading settings are described in Table 3.4. In [21], each path estimated by the Bellhop simulator is assumed to undergo scattering, splitting each path into several micro-paths. The statistical properties of the micro-paths and their delays were described by the psd, the mean of the intra-path amplitudes, and the variance. The values were the same as the defaults from [22], except for the variance of the bottom variations, which was set to zero as the bottom was considered rather static.


    Table 3.4: Configurable small-scale settings.

Property                        Value   Unit
Variance of surface variations  1.125   m²
Variance of bottom variations   0       m²
3-dB width of psd               0.5     ms
Number of intra-paths           20      -
Mean of intra-path amplitudes   0.025   -
Variance of intra-path          10⁻⁶    -

All large-scale variations in the model [21] were assumed to be zero. The reasoning for this was twofold:

    • Reducing the number of parameters simplifies the analysis of the results.

    • In reality, the change of height and distance is negligible over time.

    The configurable large-scale settings are listed below in Table 3.5.

    Table 3.5: Configurable large-scale (L-S) settings.

Property                                                  Value  Unit
Range of surface height variation                         0      m
Range of TX height variation                              0      m
Range of RX height variation                              0      m
Range of channel distance variation                       0      m
Standard deviation of L-S variations of surface height    0      m
Standard deviation of L-S variations of TX height         0      m
Standard deviation of L-S variations of RX height         0      m
Standard deviation of L-S variations of channel distance  0      m

    Doppler Parameters

It was assumed that the transmitter and receiver were in a constant small horizontal drift along the bottom; all other movements were assumed to be zero. The surface variations were the defaults from [22]. The configurable Doppler effect parameters are described in Table 3.6.


    Table 3.6: Configurable Doppler effect parameters.

Property                     Value            Unit
TX drifting speed            [-0.005 0.005]   m/s
TX drifting angle            0                rad
RX drifting speed            [-0.005 0.005]   m/s
RX drifting angle            0                rad
TX vehicular speed           0                m/s
TX vehicular angle           0                rad
RX vehicular speed           0                m/s
RX vehicular angle           0                rad
Surface variation amplitude  0.05             -
Surface variation frequency  0.01             -

    3.3.6 Noise

Two kinds of additive noise were considered: complex white Gaussian noise (wgn) and complex colored Gaussian noise (cgn). For the channel simulations and for training the anns, wgn was utilized. cgn was studied in the final step, to identify how it affected the studied anns. The additive noise in Section 3.1 was described by its snr.

A red-colored noise psd was used to describe the colored additive noise. As the noise described in Section 2.1.2 decays at 18 dB per decade, using red noise, which decays at 20 dB per decade, could be considered a reasonable approximation. The advantage of using red noise was that additive red noise implementations existed for MATLAB, which reduced uncertainty, as implementing a custom noise profile takes time and increases the risk of unnecessary errors. Thus, from here on, cgn will refer to additive red noise, as it was considered an approximation of the ocean ambient noise psd.

The frss receiver utilized a brick-wall filter to remove out-of-band noise; hence the cgn and wgn energies in the frequency band of interest had to be of similar magnitude to be comparable. To ensure that the snrs of the cgn and wgn were comparable, the noise profiles were bandpass-filtered over the bandwidth utilized for transmission. A Butterworth filter of order twenty was used, yielding roughly the same energies with a maximum difference of one to two percent.
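The energy-matching step can be sketched in Python (the sampling rate and band edges are made-up placeholders, since the real values are undisclosed, and red noise is generated here as integrated white noise, which exhibits the stated −20 dB/decade slope):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs, n = 48_000, 200_000
f_lo, f_hi = 8_000, 12_000           # hypothetical transmission band (Hz)

white = rng.standard_normal(n)
red = np.cumsum(white)                # integrated white noise: -20 dB/decade
red -= red.mean()

# Bandpass both noise profiles with a 20th-order Butterworth filter
# (second-order sections for numerical stability at high order).
sos = signal.butter(20, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
white_bp = signal.sosfilt(sos, white)
red_bp = signal.sosfilt(sos, red)

# Scale the red noise so its in-band energy matches the white noise.
scale = np.sqrt(np.sum(white_bp**2) / np.sum(red_bp**2))
red_scaled = scale * red
diff = abs(np.sum(signal.sosfilt(sos, red_scaled)**2)
           / np.sum(white_bp**2) - 1.0)
print(f"in-band energy mismatch: {diff:.2e}")
```

Because filtering is linear, scaling before or after the filter gives the same in-band energy; in practice, the one-to-two-percent residual reported above comes from comparing independent noise realizations rather than a rescaled copy.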

    3.4 Channel Simulations

To identify how the different channels behaved, several simulations were conducted. With the two depth scenarios, bathymetries, and sound speed profiles (ssps), a total of 27 possible combinations existed. Considering two different bottom types, a total of 54 possible combinations existed. The channel simulation was crucial for identifying the environments in which the anns were to be deployed.

    3.4.1 Time-Variant Filter

The time-variant filter described by (2.3) had to be implemented by considering the channel impulse response as a sequence of time-invariant filters. The channel simulator yields an approximated discrete-time version of h(τ, t), denoted h[i, j]. The indices i and j are discrete samples: i corresponds to discrete instances of the time delays 0, dτ, ..., Td, and j corresponds to discrete instances of time 0, dt, ..., T. The time resolution dt and T are directly available from Table 3.3, while dτ and Td depend on the selected frequency resolution from the same table. With the available frss signal x(t), the implemented filter is described in Algorithm 1. Before the algorithm is run, h[i, j] is up-sampled to the same sampling frequency as x(t). The sampling frequency fs is undisclosed, as it belonged to the given simulation code. The notation ∗ denotes the convolution operator.

    Algorithm 1 Time-variant filter algorithm.

procedure TimeVariantFilter(h, x, t, τ)   ▷ Performs time-variant filtering; h is a function of t and τ
    for i = 1 to length(t) do
        yp[:, i] = x ∗ h[:, i]
    end for
    j = 1
    for i = 1 to length(yp[:, 1]) do
        if (i/fs) > (dt · j) then
            j = j + 1
        end if
        y[i] = yp[i, j]
    end for
end procedure
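Algorithm 1 translates directly into NumPy. A hedged Python sketch (the sampling frequency, time resolution, and filter contents below are made up for the example, since the real values are undisclosed):

```python
import numpy as np

def time_variant_filter(h, x, fs, dt):
    """Apply a time-variant filter: convolve x with each time-invariant
    snapshot h[:, j], then select the output of the snapshot that is
    active at each output sample, as in Algorithm 1."""
    n_out = len(x) + h.shape[0] - 1
    yp = np.stack([np.convolve(x, h[:, j]) for j in range(h.shape[1])],
                  axis=1)
    y = np.empty(n_out)
    j = 0                      # 0-based snapshot index
    for i in range(n_out):
        # advance to the next snapshot once its time window begins
        if (i + 1) / fs > dt * (j + 1) and j + 1 < h.shape[1]:
            j += 1
        y[i] = yp[i, j]
    return y

# Sanity check: a time-invariant h reduces to ordinary convolution.
x = np.arange(5, dtype=float)
h = np.tile(np.array([[1.0], [0.5]]), (1, 3))  # same taps in every snapshot
y = time_variant_filter(h, x, fs=10.0, dt=0.2)
print(np.allclose(y, np.convolve(x, h[:, 0])))  # True
```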

    3.4.2 Simulation Loop

To simulate the different configurations, a simulation script was set up. The intention was to study the channel impact on the ber. The algorithm takes the number of available ssps and bathymetries as S and B, x corresponds to a set of frss signals, n is wgn, and fs is the sampling frequency. The channel type, labeled type, corresponds to the deep or shallow simulation configuration.


    Algorithm 2 Simulation loop

procedure SimulationLoop(S, B, x, n, type)
    for b = 1 to B do
        for s = 1 to S do
            [h, t, τ] = createEnvironment(b, s, type)
            y = timeVaryingFilter(h, x, t, τ)
            ber = frssReceiver(y + n)
        end for
    end for
end procedure

Algorithm 2 was executed for all types of setups to study the channel impact. The simulation loop was run over a range of snrs multiple times to ensure that statistical anomalies in the channel simulator were mitigated.
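Algorithm 2 maps onto a pair of nested loops. A hedged Python skeleton (the createEnvironment, timeVaryingFilter, and frssReceiver calls are stand-ins for the undisclosed MATLAB functions):

```python
def simulation_loop(S, B, x, n, channel_type,
                    create_environment, time_varying_filter, frss_receiver):
    """Run the channel simulation for every bathymetry/SSP pair and
    collect the resulting BER values, following Algorithm 2."""
    bers = {}
    for b in range(1, B + 1):
        for s in range(1, S + 1):
            h, t, tau = create_environment(b, s, channel_type)
            y = time_varying_filter(h, x, t, tau)
            bers[(b, s)] = frss_receiver(y + n)
    return bers

# Stand-in callables to illustrate the control flow only.
bers = simulation_loop(
    S=2, B=3, x=0.0, n=0.0, channel_type="shallow",
    create_environment=lambda b, s, ct: (None, None, None),
    time_varying_filter=lambda h, x, t, tau: 0.0,
    frss_receiver=lambda y: 0.0,
)
print(len(bers))  # 6 bathymetry/SSP combinations
```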

    3.5 Artificial Neural Network Structure

As the literature had highlighted dnns and rnns as viable solutions, one variation of each network was considered. The ann layers are designed according to MATLAB's syntax. Both anns were designed as classifiers. The anns' task was to take the input described in the system model in Section 3.1.2, with the selected parameters from Section 3.1.3, and classify which symbol each time segment corresponds to. As three different sub-bands were utilized, the input from each sub-band had to be treated as a separate input stream. anns do not handle complex numbers, so the real and imaginary samples in each sub-band had to be separated, yielding a total of six input streams. The choices of a, b, and Teq from Table 3.1 yield a total of 90 samples per sub-band. Thus, the feature dimension was x ∈ [6, 90]. The output was a class from the set q = {1, 2, 3, 4}, corresponding to the four possible qpsk symbols, and the classification was based on a Softmax layer. However, in the implementation, the probabilities of the qpsk symbols from the Softmax layer were used to make a soft decision, instead of the hard decision from the classification layer. This proved to be beneficial over using hard decisions, probably because the soft decision carries information about the uncertainty of the symbol decision, which can be exploited by the Viterbi decoding in the receiver.
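The soft-decision step can be sketched as a probability-weighted combination of constellation points. A hedged Python sketch (the QPSK constellation mapping below is a standard Gray-coded example, not the mapping of the undisclosed frss code):

```python
import numpy as np

# Hypothetical QPSK constellation for the four classes q = {1, 2, 3, 4}.
CONSTELLATION = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

def soft_symbol(probs):
    """Soft symbol estimate: probability-weighted mean of the
    constellation points, instead of a hard argmax decision."""
    probs = np.asarray(probs, dtype=float)
    return np.sum(probs * CONSTELLATION)

# A confident Softmax output collapses to the constellation point...
print(soft_symbol([1.0, 0.0, 0.0, 0.0]))      # ≈ 0.707 + 0.707j
# ...while maximal uncertainty yields a zero (uninformative) estimate.
print(soft_symbol([0.25, 0.25, 0.25, 0.25]))  # 0j
```

The magnitude of the soft estimate thus encodes the decision confidence, which is the information the downstream Viterbi decoder can exploit.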

    Offline-training

The adam optimizer was utilized, with the initial learning rate set to the default 0.001. In offline training, over-fitting was combated by using early stopping. A random 10% of the data set was extracted and used as a validation set. The training was then stopped if the accuracy on the validation set did not increase after a predetermined number of passes (the validation patience). The validation patience was set to six. Training was performed in mini-batches of size 651; the specific size was related to the number of symbols in one frss packet. The standard mean square error loss function was used, as the correct equalizer symbol output was known.
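The early-stopping rule with a validation patience can be sketched in a framework-agnostic way (the accuracy sequence below is a made-up placeholder, not measured data):

```python
def train_with_early_stopping(val_accuracies, patience=6):
    """Stop when validation accuracy has not improved for `patience`
    consecutive evaluations; return the best (1-based) epoch and the
    best accuracy observed."""
    best_acc, best_epoch, waited = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # patience exhausted: stop training
    return best_epoch, best_acc

# Accuracy plateaus after epoch 3; training stops six epochs later.
accs = [0.60, 0.70, 0.75] + [0.74] * 10
print(train_with_early_stopping(accs))  # (3, 0.75)
```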

    Online-training

In online training, only the training symbols which were used to train the dfe-pll were used. Training was performed once for each received frss waveform, utilizing all the known training symbols. The adam optimizer was utilized once again, with the default learning rate of 0.001 and mini-batches of size 31, once again related to the number of symbols in one frss packet. Training was always performed for fifteen epochs, without early stopping or any other method to prevent over-fitting; the intention was rather to over-fit on the particular set of training symbols. The standard mean square error loss function was used, as the correct equalizer symbol output was known.

    3.5.1 Deep Neural Network

In Section 2.5.2, a common structure for dnns in uwa communication was highlighted: a topology with a constant halving of the number of neurons. The topology suggested in Table 3.7 almost follows this concept of halving the number of neurons in each layer. The intention was to utilize a similar topology; the network design itself is something which could be studied in more detail. The activation function was chosen by simply trying out different layers; the leaky relu provided the best training results.

    Table 3.7: dnn layer structure.

Layer                  Dimension
Image input layer      [90 1 6]
Fully connected layer  220
Leaky relu layer       -
Fully connected layer  100
Leaky relu layer       -
Fully connected layer  40
Leaky relu layer       -
Fully connected layer  4
Softmax layer          -
Classification layer   -

MATLAB's Deep Learning Toolbox supported two kinds of input layers: image input layers and sequence input layers. Even though the input is not an image, its dimensions were adjusted in MATLAB to adhere to the syntax.


    3.5.2 Long Short-Term Memory Network

In MATLAB's 2019b version, the only available rnn structure was the lstm structure; it was therefore chosen as the recurrent layer. The bidirectional lstm structure was chosen, as early tests showed that the bidirectional structure yielded much higher accuracy on the same data set.

    Table 3.8: lstm layer structure.

Layer                  Dimension
Sequence input layer   6
Bi lstm layer          100
Fully connected layer  4
Softmax layer          -
Classification layer   -

    3.6 Artificial Neural Network Experiments

This section outlines the experiments which were performed to answer the problem formulation, see Section 1.3. The simulations outlined in Section 3.4.2 gave a clear understanding of which channels were suitable and unsuitable for the baseline receiver. This information was utilized to build a test strategy for the anns. Both the lstm and dnn networks were to be tested in:

    • Channels with low ber,

    • Channels with high ber.

Based on the network structures and the results, the second problem from Section 1.3 would be answered, regarding how much and why performance improvements could be offered. To