1053-587X (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSP.2015.2486749, IEEE Transactions on Signal Processing

Two Stage Constant-Envelope Precoding for Low Cost Massive MIMO Systems

An Liu, Member IEEE, and Vincent Lau, Fellow IEEE

Abstract

Massive MIMO is a key technology to meet increasing capacity demands in 5G wireless systems.

However, a base station (BS) equipped with M ≫ 1 antennas requires M radio frequency (RF) chains

with linear power amplifiers, which are very expensive. In this paper, we propose a two stage constant-

envelope (CE) precoding scheme to enable low-cost implementation of massive MIMO BS with S ≪ M

RF chains and nonlinear power amplifiers. Specifically, the MIMO precoder at the BS is partitioned into

an RF precoder and a baseband precoder. The RF precoder is adaptive to the slow timescale channel

statistics to achieve the array gain. The baseband precoder is adaptive to the fast timescale low dimensional

effective channel to achieve the spatial multiplexing gain. Both the RF and baseband precoders are subject

to CE constraints to reduce the implementation cost and the peak-to-average power ratio of the transmit

signal. The two stage CE precoding is a challenging non-convex stochastic optimization problem and we

propose an online alternating optimization algorithm which can autonomously converge to a stationary

solution without explicit knowledge of channel statistics. Simulations show that the proposed solution

has many advantages over various baselines.

Index Terms

Massive MIMO, Constant-Envelope Precoding, Limited RF Chains, PAPR, Online Alternating Optimization

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other

purposes must be obtained from the IEEE by sending a request to [email protected].

This work was partially supported by NSFC Grant No.61571383, and partially supported by RGC 614913 and Huawei.

An Liu and Vincent K. N. Lau are with the Department of ECE, The Hong Kong University of Science and Technology

(email: [email protected]; [email protected]).

I. INTRODUCTION

Massive MIMO is regarded as the main contributor for spectrum efficiency gain in 5G wireless

systems [1]. While theoretical research and even some prototype demonstrations show considerable performance

advantages of massive MIMO, the practical deployment still faces two big challenges.

High RF Chain Cost: For a massive MIMO system with hundreds of antennas, it is very expensive to

have one RF chain behind every antenna. In practical massive MIMO systems, it is desirable to deploy

fewer RF chains than the number of antennas to reduce the hardware cost.

High Peak-to-Average Power Ratio (PAR): In today’s wideband systems, orthogonal frequency-

division multiplexing (OFDM) is widely used to deal with frequency selective channels. However, OFDM

combined with linear MIMO precoding yields transmit signals with very high PAR especially for massive

MIMO systems [2]–[4]. As a result, more expensive linear power amplifiers are required to avoid out-

of-band radiation and signal distortions. Therefore, a better transmission scheme which yields low-PAR

transmit signals would be desirable to enable low-cost and low-power BS implementations with nonlinear

power amplifiers.

There have been some works aiming at solving one of the above practical issues in massive MIMO

systems. For example, to reduce the number of RF chains, hybrid RF/baseband precoding has been

proposed for MIMO and mmWave systems [5]–[8], where the MIMO precoder is a cascade of a high

dimensional RF precoder followed by a low dimensional baseband precoder. However, the high PAR

issue in frequency selective fading channel is not addressed in [5]–[8]. A few low-PAR precoders for

massive MIMO systems have been proposed in [2]–[4]. In [3], [4], a constant envelope (CE) constraint is

imposed on the transmit signals of massive MIMO systems. In this case, extra power is required in order

to achieve the same sum-rate as without the CE constraint. However, the overall power consumption

may decrease because the amplifier back-off is reduced and thus the amplifier efficiency is improved [3],

[4]. Moreover, CE transmit signals are much more RF-friendly (which leads to cheaper and simpler BS

design) and can reduce the out-of-band radiation and signal distortions. However, the existing low-PAR

precoders in [3], [4] suffer from high RF chain cost. To the best of our knowledge, there is no systematic

method reported in the literature to simultaneously solve all of the above issues.

In this paper, we propose a novel two stage constant-envelope precoding architecture for massive

MIMO systems to simultaneously solve all of the aforementioned practical issues. Specifically, the MIMO

precoder at the BS is partitioned into an RF precoder and a baseband precoder as illustrated in Fig. 1. The

baseband precoder generates baseband transmit vectors with CE elements from the input data symbol

vectors. Then the CE baseband transmit vector is converted to an RF signal vector using S ≪ M RF chains

and the resulting RF signal vector is precoded using an RF precoder before being transmitted using the

M antennas. To facilitate low cost implementation of the RF precoder using RF phase shifting networks

[5], [6], the RF precoder adapts only to the channel statistics and is subject to a CE constraint. The proposed two

stage CE precoding has several advantages. First, since only S ≪ M RF chains are required at the BS,

the cost of RF chains can be significantly reduced. Second, the CE constraint on the baseband precoder

ensures that the baseband signal at the input of the RF chain has low PAR, which enables low-cost and

low-power BS with nonlinear power amplifiers. Third, the CSI signaling overhead can also be alleviated

since only low dimensional effective CSI is required at each BS. Finally, the frequency selective fading

can also be combated by the two stage CE precoding and there is no need to use OFDM, which greatly

reduces the complexity at user side. Therefore, the proposed two stage CE precoding architecture is a

good candidate for practical implementation of massive MIMO systems. However, these good features

of the proposed solution cannot be achieved by a naive combination of the existing techniques. There

are several new technical challenges associated with the design of two stage CE precoding.

• Two Stage Non-convex Stochastic Optimization: Due to the mixed timescale precoding structure

and the CE constraints, the design of two stage CE precoding is a two stage non-convex stochastic

optimization problem [9], which cannot be solved by the existing stochastic optimization algorithms

such as stochastic subgradient [10] or stochastic cutting plane [9].

• Infinite Dimensional Problem: When the CSI has a continuous distribution, the two stage CE

precoding problem becomes an infinite dimensional problem with an uncountably infinite number of

optimization variables. In this case, it is difficult to even find a stationary solution¹ of the problem,

because this involves solving a fixed point equation over the functional space.

In this paper, we propose a novel online alternating optimization (AO) algorithm to solve the two stage

non-convex stochastic optimization problem for two stage CE precoding design. The proposed online AO

solution does not require explicit knowledge of the channel statistics and can autonomously converge to a

stationary solution using observations of the (outdated) channel only. Analysis and simulations show that

the proposed two stage CE precoding with online AO optimization achieves the best tradeoff between

performance, hardware cost, power efficiency, CSI signaling overhead and computational complexity,

compared with various state-of-the-art baselines. Moreover, the proposed online AO method can

potentially be used to solve a general class of two stage non-convex stochastic optimization problems.

¹ A stationary solution is a natural extension of a stationary point to an infinite dimensional problem, as will be defined in Definition 1.

Figure 1: An illustration of the massive MIMO downlink under two stage CE Precoding. The blue / red blocks represent long timescale / short timescale processes. The blue / red arrows represent long-term / short-term signaling.

II. MASSIVE MIMO SYSTEM WITH TWO STAGE CE PRECODING

A. Transmit Signal Model under Two Stage CE Precoding

Consider a massive MIMO downlink system where a BS serves K ≪ M single-antenna users as illustrated in Fig. 1. The BS is equipped with M ≫ 1 antennas but only S ≪ M transmit RF chains to reduce the hardware cost. The key components of the transmitter at the BS are elaborated below.

1) Constant-envelope Baseband Precoder: The CE baseband precoder is a mapping from a block of N data symbol vectors $\mathbf{u} \triangleq [\mathbf{u}^T[0], \ldots, \mathbf{u}^T[N-1]]^T \in \mathbb{C}^{NK}$ and the effective channel² $\tilde{\mathbf{h}} \in \mathbb{C}^{KLS}$ to a block of N baseband transmit vectors $\mathbf{x} \triangleq [\mathbf{x}^T[0], \ldots, \mathbf{x}^T[N-1]]^T \in \mathbb{C}^{NS}$. Here $\mathbf{u}[n] = [u_1(n), \ldots, u_K(n)]^T \in \mathbb{C}^{K}$ is the data symbol vector, $u_k(n)$ with $E[|u_k(n)|^2] = 1$ is the data symbol of user k at time n, $\mathbf{x}[n] = [x_1[n], \ldots, x_S[n]]^T \in \mathbb{C}^{S}$ is the baseband transmit vector, and $x_s[n]$ is the input signal of the s-th RF chain at time n. The baseband precoder satisfies the CE constraint:

$$|x_s[n]| = 1, \quad \forall n, s. \tag{1}$$

As a result of the CE constraint in (1), the baseband precoder can be specified by a phase angle vector $\boldsymbol{\theta} = [\theta_1, \ldots, \theta_{NS}]^T$. Specifically, given the phase angle vector $\boldsymbol{\theta}$, the baseband transmit vector is given by

$$x_s[n] = e^{j\theta_{nS+s}}, \quad \forall n, s.$$

² The definition of the effective channel will be given in (7).
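As a minimal sketch of the phase parameterization above, the mapping from the phase vector θ to the block of transmit vectors can be written in a few lines of standard-library Python. The function name `baseband_vectors` and the sizes N = 4, S = 2 are illustrative assumptions, not taken from the paper.

```python
import cmath
import math


def baseband_vectors(theta, N, S):
    """Return x[n][s-1] = exp(j * theta_{nS+s}) for n = 0..N-1, s = 1..S.

    theta is a flat list of N*S phases; the 1-based index nS+s used in the
    text corresponds to the 0-based position n*S + (s-1) used here.
    """
    if len(theta) != N * S:
        raise ValueError("theta must contain N*S phases")
    return [[cmath.exp(1j * theta[n * S + s]) for s in range(S)]
            for n in range(N)]


# Illustrative sizes (assumed): N = 4 symbol times, S = 2 RF chains.
N, S = 4, 2
theta = [-math.pi + 2 * math.pi * i / (N * S) for i in range(N * S)]
x = baseband_vectors(theta, N, S)
# Every entry lies on the unit circle, so constraint (1) holds by construction.
```

Because the constraint is absorbed into the parameterization, an optimizer can work on the unconstrained phases instead of on the unit-modulus entries directly.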

Figure 2: An illustration of frame and block structures.

2) Insertion of Cyclic Prefix: To facilitate communication over a frequency selective fading channel, a cyclic prefix (CP) of length $N_c$ is inserted at the beginning of each block. The transmit vectors from time 0 to time N − 1 are $\mathbf{x}[n]$, n = 0, 1, ..., N − 1, generated by the CE baseband precoder, and the transmit vectors from time $-N_c$ to time −1 form the CP, generated according to

$$\mathbf{x}[n] = \mathbf{x}[N+n], \quad n = -N_c, \ldots, -1.$$

The CP is used to absorb the inter-symbol interference caused by frequency selective fading.
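The CP rule x[n] = x[N + n] amounts to copying the tail of the block to its front. A small sketch, with hypothetical helper names `add_cyclic_prefix` and `remove_cyclic_prefix` and string labels standing in for the transmit vectors:

```python
def add_cyclic_prefix(block, n_cp):
    """Prepend the last n_cp vectors of the block: x[n] = x[N + n], n = -n_cp..-1."""
    if not 0 <= n_cp <= len(block):
        raise ValueError("n_cp must be between 0 and the block length")
    return block[len(block) - n_cp:] + block


def remove_cyclic_prefix(received, n_cp):
    """Drop the first n_cp entries, as the receiver does before detection."""
    return received[n_cp:]


# Labels stand in for the transmit vectors x[0], ..., x[4] (N = 5, n_cp = 2).
block = ["x0", "x1", "x2", "x3", "x4"]
with_cp = add_cyclic_prefix(block, 2)
```

Since the prefix only repeats entries that already satisfy (1), inserting it does not change the constant-envelope property of the transmitted block.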

3) Constant-envelope RF Precoder: The RF precoder $\mathbf{F} \in \mathbb{C}^{M \times S}$ is used to convert the signal vector $\sqrt{P}\,\mathbf{x}[n]$ output from the S power amplifiers to a signal vector $\sqrt{P}\,\mathbf{F}\mathbf{x}[n] \in \mathbb{C}^{M}$, which is eventually transmitted from the M antennas, where P is the transmit power of each power amplifier. The RF precoder $\mathbf{F}$ also satisfies the CE constraints

$$F_{ms} = \frac{1}{\sqrt{M}}\, e^{j\phi_{(m-1)S+s}}, \quad \forall m, s, \tag{2}$$

where $F_{ms}$ denotes the (m, s)-th entry of $\mathbf{F}$. Hence, the RF precoder can be specified by a phase angle vector $\boldsymbol{\phi} = [\phi_1, \ldots, \phi_{MS}]$.
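The RF precoder in (2) can be built from its phase vector in the same way as the baseband vector. The sketch below uses a hypothetical function name `rf_precoder` and assumed sizes M = 8, S = 2; it also computes the energy of each column, which equals 1 because every column holds M entries of squared modulus 1/M.

```python
import cmath
import math


def rf_precoder(phi, M, S):
    """Return F with F[m-1][s-1] = exp(j * phi_{(m-1)S+s}) / sqrt(M)."""
    if len(phi) != M * S:
        raise ValueError("phi must contain M*S phases")
    scale = 1.0 / math.sqrt(M)
    return [[scale * cmath.exp(1j * phi[m * S + s]) for s in range(S)]
            for m in range(M)]


# Illustrative sizes (assumed): M = 8 antennas, S = 2 RF chains.
M, S = 8, 2
phi = [-math.pi + 2 * math.pi * i / (M * S) for i in range(M * S)]
F = rf_precoder(phi, M, S)
# Energy of each column of F; equals 1 for any choice of phases.
column_energy = [sum(abs(F[m][s]) ** 2 for m in range(M)) for s in range(S)]
```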

In the proposed two stage CE precoding, the time domain is divided into frames, where each frame

consists of Tf blocks. The frame and block structures are illustrated in Fig. 2.

B. Frequency Selective MIMO Channel Model

Consider a frequency selective channel which can be modeled as an FIR filter with L taps. The channel is specified by L matrices $\mathbf{H}[l]$, l = 0, ..., L − 1, whose (k, m)-th elements $\{H_{km}[0], \ldots, H_{km}[L-1]\}$ form the impulse response from antenna m to user k. The received signal vector at time n is

$$\mathbf{y}[n] = \sqrt{P} \sum_{l=0}^{L-1} \mathbf{H}[l]\,\mathbf{F}\,\mathbf{x}[n-l] + \mathbf{w}[n],$$

where $\mathbf{w}[n] \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_K)$ is the AWGN vector. After removing the CP at the receiver, the input-output relation of the channel is given by [11]

$$\mathbf{y} = \sqrt{P}\,\bar{\mathbf{H}}\bar{\mathbf{F}}\mathbf{x} + \mathbf{w}, \tag{3}$$

where $\bar{\mathbf{H}} \in \mathbb{C}^{KN \times MN}$ is the composite channel for each block and it is a block-circulant matrix given by

$$\bar{\mathbf{H}} = \begin{bmatrix}
\mathbf{H}[0] & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{H}[L-1] & \cdots & \mathbf{H}[1] \\
\mathbf{H}[1] & \mathbf{H}[0] & \mathbf{0} & \cdots & \cdots & \cdots & \mathbf{H}[2] \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
\mathbf{0} & \cdots & \cdots & \mathbf{0} & \mathbf{H}[L-1] & \cdots & \mathbf{H}[0]
\end{bmatrix}, \tag{4}$$

$\bar{\mathbf{F}} = \mathrm{BlockDiag}[\mathbf{F}, \ldots, \mathbf{F}] \in \mathbb{C}^{MN \times NS}$ is a block diagonal matrix, and $\mathbf{w} = [\mathbf{w}^T[0], \ldots, \mathbf{w}^T[N-1]]^T$.
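The step from the FIR model to the block-circulant relation in (3) and (4) can be checked numerically in the scalar case. The sketch below is a simplified analogue, not the paper's setup: it assumes K = M = S = 1, F = 1, P = 1 and no noise, so the block-circulant matrix reduces to an ordinary N × N circulant matrix. The helper names `fir_with_cp` and `circulant_output` are illustrative.

```python
def fir_with_cp(taps, block, n_cp):
    """Send CP + block through an FIR channel, then drop the CP at the receiver.

    Samples before the start of the CP are taken as zero (an assumption of
    this sketch); they only affect outputs that are discarded with the CP.
    """
    tx = block[len(block) - n_cp:] + block
    rx = [sum(taps[l] * tx[i - l] for l in range(len(taps)) if i - l >= 0)
          for i in range(len(tx))]
    return rx[n_cp:]


def circulant_output(taps, block):
    """Multiply the block by the N x N circulant matrix built from the taps."""
    N = len(block)
    first_col = list(taps) + [0] * (N - len(taps))
    # Entry (n, m) of a circulant matrix is first_col[(n - m) mod N].
    return [sum(first_col[(n - m) % N] * block[m] for m in range(N))
            for n in range(N)]


taps = [1.0, 0.5, -0.25]           # L = 3 taps (illustrative values)
block = [1.0, 2.0, 3.0, 4.0, 5.0]  # N = 5 samples
y_fir = fir_with_cp(taps, block, n_cp=2)  # CP length L - 1
y_circ = circulant_output(taps, block)
```

The two outputs agree whenever the CP length is at least L − 1, which is the condition under which the CP absorbs all inter-symbol interference.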

For convenience, define the concatenated channel vector $\mathbf{h} \triangleq [\mathrm{Vec}^T(\mathbf{H}[0]), \ldots, \mathrm{Vec}^T(\mathbf{H}[L-1])]^T \in \mathbb{C}^{LKM}$. We consider a block fading model where the concatenated channel vector $\mathbf{h}(t)$ at block t is generated according to a general distribution $\mathcal{H}(\zeta(t))$ with a slowly time-varying parameter ζ(t). The parameter ζ(t) is called the channel statistics and it can be used to model large scale fading such as path loss and shadow fading, which usually change at a much slower timescale than the duration of a block. In other words, $\mathcal{H}(\zeta)$ is the conditional distribution of $\mathbf{h}$ when the channel statistics is ζ. This channel model includes multipath Rayleigh and Rician fading channels as special cases.

C. Matched Filter Receiver

To reduce the complexity of the receiver at the user (mobile station), a simple matched filter detection is employed at each user. Specifically, at user k, the baseband received signal after the matched filter is scaled by a complex receive coefficient $G_k$ to obtain the estimated data symbol. From (3), the estimated data symbol vector can be expressed as

$$\hat{\mathbf{u}} = \bar{\mathbf{G}}\mathbf{y} = \mathbf{u} + \boldsymbol{\xi} + \bar{\mathbf{G}}\mathbf{w}, \tag{5}$$

where $\bar{\mathbf{G}} = \mathrm{BlockDiag}[\mathbf{G}, \ldots, \mathbf{G}] \in \mathbb{C}^{NK \times NK}$ is a block diagonal matrix with $\mathbf{G} = \mathrm{Diag}[G_1, \ldots, G_K]$, and $\boldsymbol{\xi} = \sqrt{P}\,\bar{\mathbf{G}}\bar{\mathbf{H}}\bar{\mathbf{F}}\mathbf{x} - \mathbf{u}$ can be interpreted as the multiuser interference (MUI) vector. Assuming $u_k(n), \forall k, n$ to be i.i.d. complex Gaussian with zero mean and unit variance, and following an analysis similar to that in [3], [4], it can be shown that the corresponding achievable rate is lower bounded as

$$r_k(\bar{\mathbf{H}}) = \frac{I(\hat{\mathbf{u}}_k; \mathbf{u}_k \mid \bar{\mathbf{H}})}{N} \ge \left[ -\frac{\log \left| E\left[\boldsymbol{\xi}_k \boldsymbol{\xi}_k^H \mid \bar{\mathbf{H}}\right] + |G_k|^2 \mathbf{I} \right|}{N} \right]^{+}, \tag{6}$$

where $\mathbf{u}_k = [u_k[0], \ldots, u_k[N-1]]^T$ and $\boldsymbol{\xi}_k = [\xi_k[0], \ldots, \xi_k[N-1]]^T$.

III. PROBLEM FORMULATION FOR TWO STAGE CE-PRECODING

A. Two Timescale Optimization Variables

1) Long-term optimization variable (RF Precoder φ): The RF precoder φ is used to achieve the array

gain provided by the massive MIMO. It is adaptive to the channel statistics ζ only to reduce the signaling

overhead [8], [12].

2) Short-term optimization variables (Baseband Precoder θ and Receive Coefficients G): For a fixed RF precoder φ, the effect of the concatenated channel $\mathbf{h}$ on the performance is completely characterized by the effective concatenated channel

$$\tilde{\mathbf{h}} \triangleq [\mathrm{Vec}^T(\mathbf{H}[0]\mathbf{F}), \ldots, \mathrm{Vec}^T(\mathbf{H}[L-1]\mathbf{F})]^T. \tag{7}$$

Hence, the baseband precoder θ and the receive coefficients G are adaptive to the effective CSI $\tilde{\mathbf{h}}$ and the data symbol vector $\mathbf{u}$ to achieve MUI mitigation under constant envelope constraints on the baseband transmit vector $\mathbf{x}$.
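The dimension reduction behind (7) can be made concrete with a small sketch. It builds the effective channel from L tap matrices and an RF precoder and compares its length, KLS, with the length KLM of the full concatenated channel. All names (`matmul`, `vec`, `effective_channel`), the sizes, and the all-zero-phase precoder are illustrative assumptions.

```python
import random


def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def vec(A):
    """Stack the columns of A into one flat list (column-major order)."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


def effective_channel(taps, F):
    """Concatenate Vec(H[l] F) over the L taps, as in (7)."""
    out = []
    for H in taps:
        out.extend(vec(matmul(H, F)))
    return out


# Illustrative sizes (assumed): K = 2 users, M = 8 antennas, S = 2 chains, L = 3 taps.
K, M, S, L = 2, 8, 2, 3
rng = random.Random(0)
taps = [[[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)]
         for _ in range(K)] for _ in range(L)]
F = [[M ** -0.5 for _ in range(S)] for _ in range(M)]  # CE precoder, all phases zero
h_full = [entry for H in taps for entry in vec(H)]     # length K*L*M
h_eff = effective_channel(taps, F)                     # length K*L*S
```

With these sizes the short-term variables need 12 complex coefficients per block instead of 48, which is the source of the reduced CSI signaling.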

B. Two-Timescale Stochastic Optimization Formulation

Define $\mathbf{z} \triangleq [\mathbf{h}^T, \mathbf{u}^T]^T$ as the concatenated channel-symbol vector, and $\Theta = \{\boldsymbol{\theta}(\mathbf{z}), \mathbf{G}(\mathbf{z}) : \forall \mathbf{z}\}$ as the collection of the baseband precoders and receive coefficients for all possible channel-symbol vectors. For a given realization of the channel statistics ζ, the two stage CE precoding design is formulated as the following two timescale stochastic optimization problem which minimizes the MSE conditioned on ζ:

$$\begin{aligned}
\mathcal{P}_0: \ \min_{\boldsymbol{\phi}, \Theta}\ & \bar{I}(\boldsymbol{\phi}, \Theta) \triangleq E\left[ I(\boldsymbol{\phi}, \boldsymbol{\theta}(\mathbf{z}), \mathbf{G}(\mathbf{z}); \mathbf{z}) \mid \zeta \right] \\
\text{s.t.}\ & \boldsymbol{\theta}(\mathbf{z}) \in [-\pi, \pi)^{NS}, \ \forall \mathbf{z}; \quad \boldsymbol{\phi} \in [-\pi, \pi)^{MS},
\end{aligned} \tag{8}$$

where

$$I(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z}) \triangleq E\left[ \|\hat{\mathbf{u}} - \mathbf{u}\|^2 \mid \mathbf{z} \right] = \left\| \sqrt{P}\,\bar{\mathbf{G}}\bar{\mathbf{H}}\bar{\mathbf{F}}(\boldsymbol{\phi})\,\mathbf{x}(\boldsymbol{\theta}) - \mathbf{u} \right\|^2 + \mathrm{Tr}\left( \bar{\mathbf{G}}\bar{\mathbf{G}}^H \right)$$

is the MSE for given $\mathbf{z}$. Note that for clarity, we explicitly write $\bar{\mathbf{F}}$ and $\mathbf{x}$ as functions of φ and θ, respectively.
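The closed form of the per-sample MSE follows in one step from (5) and the noise statistics. The derivation below is a short sketch; it writes $\hat{\mathbf{u}}$ for the estimate in (5) and $\bar{\mathbf{G}}$, $\bar{\mathbf{H}}$, $\bar{\mathbf{F}}$ for the block matrices of Section II, and it assumes that $\mathbf{w}$ has zero mean and identity covariance and is independent of $\mathbf{z}$, so that the MUI vector $\boldsymbol{\xi}$ is deterministic once $\mathbf{z}$ is given.

```latex
\begin{aligned}
E\left[\|\hat{\mathbf{u}} - \mathbf{u}\|^2 \mid \mathbf{z}\right]
&= E\left[\|\boldsymbol{\xi} + \bar{\mathbf{G}}\mathbf{w}\|^2 \mid \mathbf{z}\right] \\
&= \|\boldsymbol{\xi}\|^2
 + 2\,\mathrm{Re}\left(\boldsymbol{\xi}^H \bar{\mathbf{G}}\, E[\mathbf{w}]\right)
 + \mathrm{Tr}\left(\bar{\mathbf{G}}\, E[\mathbf{w}\mathbf{w}^H]\, \bar{\mathbf{G}}^H\right) \\
&= \left\|\sqrt{P}\,\bar{\mathbf{G}}\bar{\mathbf{H}}\bar{\mathbf{F}}\mathbf{x} - \mathbf{u}\right\|^2
 + \mathrm{Tr}\left(\bar{\mathbf{G}}\bar{\mathbf{G}}^H\right).
\end{aligned}
```

Since $\bar{\mathbf{G}}$ stacks N copies of $\mathrm{Diag}[G_1, \ldots, G_K]$, the noise term equals $N \sum_{k=1}^{K} |G_k|^2$: a larger receive coefficient amplifies the noise, which is why the coefficients are optimized jointly with the precoder.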

Problem P0 is a very challenging non-convex stochastic optimization problem and it is highly non-

trivial even to design a sub-optimal algorithm which converges to a local optimal solution. In this paper,

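we will propose a low complexity online algorithm to find an ε-accurate stationary solution of Problem $\mathcal{P}_0$ defined below.

Definition 1 (ε-accurate stationary solution of $\mathcal{P}_0$). A solution $(\boldsymbol{\phi}^\star, \Theta^\star = \{\boldsymbol{\theta}^\star(\mathbf{z}), \mathbf{G}^\star(\mathbf{z}) : \forall \mathbf{z}\})$ is called an ε-accurate stationary solution of Problem $\mathcal{P}_0$ for some ε ≥ 0 if it satisfies the following conditions:

$$\begin{aligned}
&\nabla_{\boldsymbol{\theta}}^T I(\boldsymbol{\phi}^\star, \boldsymbol{\theta}^\star(\mathbf{z}), \mathbf{G}^\star(\mathbf{z}); \mathbf{z})\,(\boldsymbol{\theta} - \boldsymbol{\theta}^\star(\mathbf{z})) \\
&\quad + \sum_{k=1}^{K} \frac{\partial I(\boldsymbol{\phi}^\star, \boldsymbol{\theta}^\star(\mathbf{z}), \mathbf{G}^\star(\mathbf{z}); \mathbf{z})}{\partial G_k^I}\,\left(G_k^I - G_k^{I\star}(\mathbf{z})\right) \\
&\quad + \sum_{k=1}^{K} \frac{\partial I(\boldsymbol{\phi}^\star, \boldsymbol{\theta}^\star(\mathbf{z}), \mathbf{G}^\star(\mathbf{z}); \mathbf{z})}{\partial G_k^Q}\,\left(G_k^Q - G_k^{Q\star}(\mathbf{z})\right) \ge -\varepsilon, \quad \forall \boldsymbol{\theta} \in [-\pi, \pi)^{NS}, \mathbf{G}
\end{aligned} \tag{9}$$

for every $\mathbf{z}$ outside a set of probability zero, and

$$\nabla_{\boldsymbol{\phi}}^T \bar{I}(\boldsymbol{\phi}^\star, \Theta^\star)\,(\boldsymbol{\phi} - \boldsymbol{\phi}^\star) \ge -\varepsilon, \quad \forall \boldsymbol{\phi} \in [-\pi, \pi)^{MS}, \tag{10}$$

where $G_k^I$ and $G_k^Q$ are the real and imaginary parts of $G_k$ respectively. A 0-accurate stationary solution is simply called a stationary solution of $\mathcal{P}_0$.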

The stationary solution is a natural extension of the stationary point for a deterministic optimization

problem. The global optimal solution of P0 must be a stationary solution. However, the set of stationary

solutions may also contain local optimal solutions and a certain type of saddle points. In the simulations, it

is observed that the proposed algorithm always converges to a stationary solution with good performance.

The problem of finding a stationary solution of P0 is still highly non-trivial. First, there is no closed-

form characterization of the MSE I(φ,Θ). Furthermore, when the channel-symbol vector z has a

continuous distribution, finding a stationary solution of P0 involves solving a fixed point equation in

(9) and (10) with an uncountable number of variables.

IV. ONLINE ALTERNATING OPTIMIZATION FOR P0

In this section, we propose an online alternating optimization (AO) algorithm to find a stationary solution of P0. Our target is to develop a robust online solution which does not require explicit knowledge of the channel statistics ζ. The solution should autonomously converge to a stationary solution of P0 using observations of the (outdated) channel only.

Challenge 1 (Design an online algorithm for P0). Design an online AO algorithm to find an ε-accurate stationary solution of the non-convex stochastic optimization problem P0 for arbitrarily small but fixed ε > 0.

We shall first summarize the proposed online AO algorithm. Then we elaborate the key steps and prove the convergence to an ε-accurate stationary solution.

A. Online AO Algorithm

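The proposed online AO algorithm is summarized in Algorithm 1. The indices J and t refer to a frame and a block, respectively.

In each frame, the BS obtains one channel-symbol vector and stores it in memory. In the J-th frame (iteration), the BS has obtained J channel-symbol vectors $\mathbf{z}^{1:J} \triangleq \{\mathbf{z}^q = [\mathbf{h}^{qT}, \mathbf{u}^{qT}]^T, q = 1, \ldots, J\}$, with which the BS can construct the following approximation of $\mathcal{P}_0$:

$$\begin{aligned}
\mathcal{P}_J: \ \min_{\boldsymbol{\phi}, \Theta(\mathbf{z}^{1:J})}\ & \frac{1}{J} \sum_{q=1}^{J} I(\boldsymbol{\phi}, \boldsymbol{\theta}(\mathbf{z}^q), \mathbf{G}(\mathbf{z}^q); \mathbf{z}^q) \\
\text{s.t.}\ & \boldsymbol{\theta}(\mathbf{z}^q) \in [-\pi, \pi)^{NS}, \ q = 1, \ldots, J; \quad \boldsymbol{\phi} \in [-\pi, \pi)^{MS},
\end{aligned}$$

where $\Theta(\mathbf{z}^{1:J}) = \{\boldsymbol{\theta}(\mathbf{z}^q), \mathbf{G}(\mathbf{z}^q), q = 1, \ldots, J\}$. The objective function in $\mathcal{P}_J$ is a sample average approximation of the MSE $\bar{I}(\boldsymbol{\phi}, \Theta)$, and it converges to $\bar{I}(\boldsymbol{\phi}, \Theta)$ as J → ∞. This motivates us to propose Algorithm 1 to solve $\mathcal{P}_0$, where the J-th iteration aims at finding an ε-accurate stationary point of the approximated problem $\mathcal{P}_J$, as elaborated below.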

For fixed φ, Problem $\mathcal{P}_J$ can be decomposed into J independent short-term control subproblems $\mathcal{P}_S(\boldsymbol{\phi}; \mathbf{z}^q)$, q = 1, ..., J, where for fixed $\mathbf{z}^q = \mathbf{z}$, $\mathcal{P}_S$ is defined as

$$\mathcal{P}_S(\boldsymbol{\phi}; \mathbf{z}): \ \min_{\boldsymbol{\theta}, \mathbf{G}} I(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z}), \quad \text{s.t.}\ \boldsymbol{\theta} \in [-\pi, \pi)^{NS}. \tag{11}$$

The subproblem $\mathcal{P}_S$ is still non-convex. However, we can find an ε-accurate stationary point of $\mathcal{P}_S(\boldsymbol{\phi}; \mathbf{z})$ (i.e., a point $(\boldsymbol{\theta}^\star, \mathbf{G}^\star)$ satisfying (9) with $\boldsymbol{\phi}^\star$ replaced by φ) using a low complexity procedure called Procedure PS, as will be elaborated in Section IV-B. For fixed $\Theta(\mathbf{z}^{1:J})$, Problem $\mathcal{P}_J$ becomes a long-term RF precoding subproblem:

$$\mathcal{P}_L^J\left(\Theta(\mathbf{z}^{1:J}); \mathbf{z}^{1:J}\right): \ \min_{\boldsymbol{\phi}} \frac{1}{J} \sum_{q=1}^{J} I(\boldsymbol{\phi}, \boldsymbol{\theta}(\mathbf{z}^q), \mathbf{G}(\mathbf{z}^q); \mathbf{z}^q), \quad \text{s.t.}\ \boldsymbol{\phi} \in [-\pi, \pi)^{MS}. \tag{12}$$

Similarly, although the subproblem $\mathcal{P}_L^J$ is non-convex, we can find an ε-accurate stationary point of $\mathcal{P}_L^J$ using a low complexity procedure called Procedure PL, as will be elaborated in Section IV-C. This motivates us to use an AO-like update to solve $\mathcal{P}_J$ as summarized in the J-th iteration of Algorithm 1. Specifically, in step 2 of the J-th iteration, Procedure PS is called to find an ε-accurate stationary point $(\boldsymbol{\theta}^J(\mathbf{z}^q), \mathbf{G}^J(\mathbf{z}^q))$ of $\mathcal{P}_S(\boldsymbol{\phi}^{J-1}; \mathbf{z}^q)$ for q = 1, ..., J. For convenience, we express Procedure PS as a mapping $F_{PS}$ from some inputs to the short-term control as given in (13) in Algorithm 1. When solving $\mathcal{P}_S(\boldsymbol{\phi}^{J-1}; \mathbf{z}^q)$, the input of Procedure PS includes the threshold ε, the q-th sample of the effective channel-symbol vector $\tilde{\mathbf{z}}^q \triangleq [\tilde{\mathbf{h}}^{qT}, \mathbf{u}^{qT}]^T$, and the initial baseband precoder $\boldsymbol{\theta}^{J-1}(\mathbf{z}^q)$, where $\tilde{\mathbf{h}}^q$ is the effective channel determined by the RF precoder $\boldsymbol{\phi}^{J-1}$ and the q-th channel sample $\mathbf{h}^q$ according to (7). Note that $\boldsymbol{\phi}^J$ and $\Theta^J(\mathbf{z}^{1:J}) = \{\boldsymbol{\theta}^J(\mathbf{z}^q), \mathbf{G}^J(\mathbf{z}^q), q = 1, \ldots, J\}$ denote the RF precoder and short-term control variables after the J-th iteration, respectively. In step 3 of the J-th iteration, Procedure PL is called to find an ε-accurate stationary point of $\mathcal{P}_L^J$ with input ε, $\mathbf{z}^{1:J}$ (the J samples of channel-symbol vectors), $\Theta^J(\mathbf{z}^{1:J})$, and $\boldsymbol{\phi}^{J-1}$ (the initial RF precoder). Similarly, in (14) of Algorithm 1, Procedure PL is expressed as a mapping $F_{PL}$ from the inputs to the RF precoder.

In the (J + 1)-th iteration, the BS obtains a new sample zJ+1 of channel-symbol vector to improve the

sample average approximation of the MSE I(φ,Θ). Then it performs one AO-like update on PJ+1 as

described above and enters the (J + 2)-th iteration. The AO-like iteration is carried out until convergence.

The overall solution is illustrated in Fig. 3. Intuitively, we can conjecture that Algorithm 1 converges to

a stationary solution of P0 as J →∞. However, the formal proof is quite involved as will be elaborated

in Section IV-D.

Note that in the J-th frame, the updated control variables $\boldsymbol{\phi}^J$ and $\Theta^J(\mathbf{z}^{1:J})$ are output at the end of the J-th frame to allow sufficient time for the BS to obtain the sample $\mathbf{z}^J$ and calculate the control variables. Hence, during the J-th frame, the BS only has $\boldsymbol{\phi}^{J-1}$ and $\Theta^{J-1}(\mathbf{z}^{1:J-1})$, and $\boldsymbol{\phi}^{J-1}$ will be used as the RF precoder in the J-th frame for downlink transmission. However, the short-term control variables for each block $t \in [(J-1)T_f + 1, JT_f]$ in the J-th frame are still unknown because we usually have $\mathbf{z}(t) \notin \mathbf{z}^{1:J-1}$. Hence, at the beginning of each block $t \in [(J-1)T_f + 1, JT_f]$, the BS needs to call

Figure 3: Summary of overall solution for P0.

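Algorithm 1 Online Alternating Optimization Algorithm for Finding an ε-accurate Stationary Solution of $\mathcal{P}_0$

Initialization: Choose a fixed ε > 0. Let J = 1. Choose an initial RF precoder $\boldsymbol{\phi}^0$. Let $\mathbf{z}^0 = \mathbf{0}$. Choose an initial baseband precoder $\boldsymbol{\theta}^0(\mathbf{z}^0)$.

Step 1: Obtain one realization of the channel-symbol vector during frame J, indexed as $\mathbf{z}^J = [\mathbf{h}^{JT}, \mathbf{u}^{JT}]^T$. Initialize $\boldsymbol{\theta}^{J-1}(\mathbf{z}^J) = \boldsymbol{\theta}^{J-1}(\mathbf{z}^{q^J_{\mathbf{z}^J}})$ for $\mathbf{z}^J$, where $q^J_{\mathbf{z}^J} = 0$ if J = 1 and $q^J_{\mathbf{z}^J} = \arg\min_{q \in [1, J-1]} \|\mathbf{z}^J - \mathbf{z}^q\|$ otherwise.

Step 2 (Short-term control optimization): For q = 1 to J, let

$$\left(\boldsymbol{\theta}^J(\mathbf{z}^q), \mathbf{G}^J(\mathbf{z}^q)\right) = F_{PS}\left(\varepsilon, \tilde{\mathbf{z}}^q, \boldsymbol{\theta}^{J-1}(\mathbf{z}^q)\right). \tag{13}$$

Step 3 (Long-term RF precoder optimization): Let

$$\boldsymbol{\phi}^J = F_{PL}\left(\varepsilon, \mathbf{z}^{1:J}, \Theta^J(\mathbf{z}^{1:J}), \boldsymbol{\phi}^{J-1}\right). \tag{14}$$

Termination: Let J = J + 1 and return to Step 1 until convergence.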

Procedure PS to calculate the optimized short-term control variables as
\[ \left(\boldsymbol{\theta}^{J\star}(\tilde{\mathbf{z}}(t)), \mathbf{G}^{J\star}(\tilde{\mathbf{z}}(t))\right) = F_{PS}\left(\varepsilon, \tilde{\mathbf{z}}(t), \boldsymbol{\theta}^{J-1}(\mathbf{z}^{q^J_{\mathbf{z}}})\right), \tag{15} \]
using the input $\varepsilon$, $\tilde{\mathbf{z}}(t) = [\tilde{\mathbf{h}}^T(t), \mathbf{u}^T(t)]^T$ and $\boldsymbol{\theta}^{J-1}(\mathbf{z}^{q^J_{\mathbf{z}}})$, where $\tilde{\mathbf{h}}(t)$ is the effective channel determined by the RF precoder $\boldsymbol{\phi}^{J-1}$ and the current channel $\mathbf{h}(t)$, $q^J_{\mathbf{z}} = \arg\min_{q \in [1, J-1]} \|\tilde{\mathbf{z}}(t) - \tilde{\mathbf{z}}^q\|$, and $\mathbf{z}^{q^J_{\mathbf{z}}}$ is the stored channel-symbol sample that has the minimum distance to $\tilde{\mathbf{z}}(t)$.
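The nearest-sample lookup that warm-starts Procedure PS can be sketched as follows. This is only an illustration under stated assumptions: samples are stored as plain lists of complex numbers, the distance is the Euclidean norm, and the names `nearest_sample_index` and `stored` are hypothetical rather than taken from the paper.

```python
import math

def nearest_sample_index(z_t, stored):
    """Return the 1-based index q of the stored sample closest to z_t.

    z_t    : list of complex numbers (current channel-symbol vector)
    stored : list of stored samples z^1, ..., z^{J-1} (lists of complex numbers)
    Returns 0 when no sample is stored yet (the J = 1 case of Algorithm 1).
    """
    if not stored:
        return 0

    def dist(a, b):
        # Euclidean distance between two complex vectors of equal length
        return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

    best = min(range(len(stored)), key=lambda q: dist(z_t, stored[q]))
    return best + 1

# Hypothetical stored samples and a new observation
stored = [[1 + 1j, 0j], [0.2 - 0.1j, 1j], [-1 + 0j, -1j]]
q = nearest_sample_index([0.25 - 0.1j, 0.9j], stored)
```

The baseband precoder stored for sample `q` would then serve as the initial point passed to Procedure PS for the current block.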

In (15), only the effective channel $\tilde{\mathbf{h}}(t)$ is required to calculate the optimized short-term control variables $\boldsymbol{\theta}^{J\star}(\tilde{\mathbf{z}}(t))$ and $\mathbf{G}^{J\star}(\tilde{\mathbf{z}}(t))$. On the other hand, the update of the long-term RF precoder in Step 3 of the $J$-th iteration only requires the outdated CSI $\{\mathbf{h}^q, q = 1, ..., J\}$, where $\mathbf{h}^q$ can be obtained in the $q$-th frame using downlink pilot training and uplink CSI feedback. Since only a low dimensional effective channel $\tilde{\mathbf{h}}(t) \in \mathbb{C}^{KLS}$ is required at each block and one outdated high dimensional channel $\mathbf{h}^q \in \mathbb{C}^{KLM}$ is required at each frame (recall that each frame contains many blocks), the CSI signaling overhead caused by Algorithm 1 is much smaller than that of the conventional single stage precoding schemes.
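To make the overhead comparison concrete, the following sketch counts the complex CSI coefficients needed per frame. It assumes that a single stage scheme needs the full $KLM$-dimensional channel in every block, and it uses the simulation parameters of Section V with $S = 32$ chosen purely for illustration; the function name `csi_per_frame` is hypothetical.

```python
def csi_per_frame(K, L, M, S, T_f):
    """Count complex CSI coefficients required at the BS during one frame.

    two_stage    : one K*L*S effective channel per block, plus one
                   (outdated) K*L*M full channel per frame
    single_stage : one K*L*M full channel per block (assumption)
    """
    two_stage = T_f * K * L * S + K * L * M
    single_stage = T_f * K * L * M
    return two_stage, single_stage

two_stage, single_stage = csi_per_frame(K=8, L=4, M=100, S=32, T_f=20)
ratio = two_stage / single_stage  # fraction of the single stage overhead
```

Under these assumptions the two stage scheme needs 23680 coefficients per frame against 64000 for the single stage scheme, i.e. 37% of the signaling.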

In the following, we elaborate on Procedure PS for solving the short-term control subproblem PS and Procedure PL for solving the long-term RF precoding subproblem $P_L^J$.

B. Procedure PS for Solving PS

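The short-term control subproblem (11) is a deterministic non-convex optimization problem. For fixed $\boldsymbol{\theta}$, the optimal receive coefficients are given by
\[ \mathbf{G}^\star = \arg\min_{\mathbf{G}} I(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z}). \tag{16} \]
For fixed $\boldsymbol{\theta}_{-i} = [\theta_1, ..., \theta_{i-1}, \theta_{i+1}, ..., \theta_{NS}]$ and $\mathbf{G}$, the optimal $\theta_i$ is given by
\[ \theta_i^\star = \arg\min_{\theta_i \in [-\pi, \pi)} I(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z}). \tag{17} \]
Problems (16) and (17) have closed-form solutions, which are given in Lemma 1. Before stating Lemma 1, we first define some notation. Let $\tilde{\mathbf{H}} \in \mathbb{C}^{KN \times SN}$ denote the composite effective channel obtained by replacing the channel matrices $\mathbf{H}(l), l = 0, ..., L-1$ in (4) with the effective channel matrices $\tilde{\mathbf{H}}(l) = \mathbf{H}(l)\mathbf{F}, l = 0, ..., L-1$. Note that $\tilde{\mathbf{H}}$ is determined by the effective channel $\tilde{\mathbf{h}}$. Let $\tilde{\mathbf{H}}_{-i} = [\tilde{\mathbf{H}}_1, ..., \tilde{\mathbf{H}}_{i-1}, \tilde{\mathbf{H}}_{i+1}, ..., \tilde{\mathbf{H}}_{SN}] \in \mathbb{C}^{KN \times (SN-1)}$ denote the matrix obtained by deleting the $i$-th column of $\tilde{\mathbf{H}}$.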

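Lemma 1 (Solutions of Short-term Subproblems). For given $\boldsymbol{\phi}$, $\mathbf{z} = [\mathbf{h}^T, \mathbf{u}^T]^T$ and baseband precoder $\boldsymbol{\theta}$, the solution of (16) is uniquely given by
\[ G_k^\star = \frac{\sqrt{P} \sum_{n=0}^{N-1} \tilde{H}_{\boldsymbol{\theta}}^*[nK + k]\, u_k(n)}{P \sum_{n=0}^{N-1} \left|\tilde{H}_{\boldsymbol{\theta}}^*[nK + k]\right|^2 + N}, \quad \forall k, \tag{18} \]
where $\tilde{\mathbf{H}}_{\boldsymbol{\theta}} = \tilde{\mathbf{H}}\mathbf{x}(\boldsymbol{\theta})$, $\tilde{H}_{\boldsymbol{\theta}}[nK + k]$ is the $(nK + k)$-th element of $\tilde{\mathbf{H}}_{\boldsymbol{\theta}}$, and $\tilde{\mathbf{H}}$ is the composite effective channel corresponding to $\boldsymbol{\phi}, \mathbf{h}$.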


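For given $\boldsymbol{\phi}$, $\mathbf{z} = [\mathbf{h}^T, \mathbf{u}^T]^T$, $\boldsymbol{\theta}_{-i} = [\theta_1, ..., \theta_{i-1}, \theta_{i+1}, ..., \theta_{NS}]$ and $\mathbf{G}$, the solution of (17) is uniquely given by
\[ \theta_i^\star = \pi + \arg\left\{\tilde{\mathbf{H}}_i^H \mathbf{G}^H \left(\sqrt{P}\, \mathbf{G}\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i} - \mathbf{u}\right)\right\}, \tag{19} \]
when $\left|\tilde{\mathbf{H}}_i^H \mathbf{G}^H \left(\sqrt{P}\, \mathbf{G}\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i} - \mathbf{u}\right)\right| \neq 0$, where $\mathbf{x}_{-i} = [e^{j\theta_1}, ..., e^{j\theta_{i-1}}, e^{j\theta_{i+1}}, ..., e^{j\theta_{SN}}]$. When $\left|\tilde{\mathbf{H}}_i^H \mathbf{G}^H \left(\sqrt{P}\, \mathbf{G}\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i} - \mathbf{u}\right)\right| = 0$, the optimal $\theta_i^\star$ can take any value in $[-\pi, \pi)$.

Please refer to Appendix A for the proof.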

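Note that to calculate $\mathbf{G}^\star$ and $\theta_i^\star$ in (18) and (19), the BS only needs to know the effective channel-symbol vector $\tilde{\mathbf{z}}$, since $\boldsymbol{\phi}$ and $\mathbf{h}$ enter only through the composite effective channel $\tilde{\mathbf{H}}$.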
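The two closed forms of Lemma 1 can be checked numerically on toy data. The sketch below is a simplified single-user illustration, not the paper's implementation: `optimal_phase` minimizes $\mathrm{Re}(a e^{-j\theta})$, which is the only $\theta_i$-dependent term of the objective, and `receive_coefficient` evaluates the ratio in (18) for one user with unit noise variance. All function names and the toy inputs are assumptions.

```python
import cmath
import math

def wrap_phase(theta):
    """Map an angle to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi

def optimal_phase(a):
    """Minimizer of Re(a * exp(-1j * theta)) for complex a != 0, as in (19)."""
    return wrap_phase(math.pi + cmath.phase(a))

def receive_coefficient(h_theta, u, P):
    """Closed form of (18) for one user.

    h_theta : N noiseless received samples (entries of H_theta for this user)
    u       : N intended symbols
    P       : transmit power scaling
    """
    N = len(u)
    num = math.sqrt(P) * sum(h.conjugate() * s for h, s in zip(h_theta, u))
    den = P * sum(abs(h) ** 2 for h in h_theta) + N
    return num / den

def mse_cost(g, h_theta, u, P):
    """Per-user objective: sum_n |sqrt(P) g h_n - u_n|^2 + N |g|^2."""
    N = len(u)
    err = sum(abs(math.sqrt(P) * g * h - s) ** 2 for h, s in zip(h_theta, u))
    return err + N * abs(g) ** 2

a = 1 + 1j
theta_star = optimal_phase(a)  # gives Re(a e^{-j theta}) = -|a|

h_toy, u_toy, P_toy = [1 + 0j, 0.5j], [1 + 0j, -1 + 0j], 2.0
g_star = receive_coefficient(h_toy, u_toy, P_toy)
```

Perturbing `g_star` in any direction should not lower `mse_cost`, which is a quick sanity check of the first order optimality condition behind (18).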

The above analysis motivates us to use an alternating optimization method to solve (11). However, the

standard AO method may not converge to a stationary point of a non-convex problem [13]. We need to

address the following challenge.

Challenge 2 (AO algorithm to solve a ε-accurate stationary point of (11)). Design an AO algorithm

which is guaranteed to converge to a ε-accurate stationary point of (11) for arbitrarily small but fixed

ε > 0.

To achieve this, the specific structure of the short-term control subproblems as characterized in Lemma 1 must be exploited in the design of the AO algorithm. Based on Lemma 1, we propose Procedure PS in Algorithm 2 to find a ε-accurate stationary point of (11) using a modified AO method, where in Line 7, we let $\theta_i^{(l)} = \theta_i^{(l-1)}$ whenever $\left|\tilde{\mathbf{H}}_i^H \mathbf{G}^{(l)H} \left(\sqrt{P}\, \mathbf{G}^{(l)}\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i}^{(l)} - \mathbf{u}\right)\right| < \varepsilon$. This modification ensures the convergence to a ε-accurate stationary point of (11) as proved in the following theorem.

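Theorem 1 (Convergence of Procedure PS). Every accumulation point $(\boldsymbol{\theta}^\star, \mathbf{G}^\star)$ of the sequence of iterates $(\boldsymbol{\theta}^{(l)}, \mathbf{G}^{(l)}), l = 1, 2, ...$ generated by Procedure PS is a ε-accurate stationary point of (11). Moreover, any accumulation point $(\boldsymbol{\theta}^\star, \mathbf{G}^\star)$ of $(\boldsymbol{\theta}^{(l)}, \mathbf{G}^{(l)}), l = 1, 2, ...$ is a stationary point of (11) if it satisfies the following condition:
\[ \left|\tilde{\mathbf{H}}_i^H \mathbf{G}^{\star H} \left(\sqrt{P}\, \mathbf{G}^\star\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i}^\star - \mathbf{u}\right)\right| \geq \varepsilon, \quad \forall i, \tag{20} \]
where $\mathbf{x}_{-i}^\star = [e^{j\theta_1^\star}, ..., e^{j\theta_{i-1}^\star}, e^{j\theta_{i+1}^\star}, ..., e^{j\theta_{SN}^\star}]$.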

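Please refer to Appendix B for the proof. In practice, we can set $\varepsilon$ to be a very small number. In this case, the probability that $\left|\tilde{\mathbf{H}}_i^H \mathbf{G}^{\star H} \left(\sqrt{P}\, \mathbf{G}^\star\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i}^\star - \mathbf{u}\right)\right| < \varepsilon$ is very small for randomly generated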


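Algorithm 2 Procedure PS: Finding a ε-accurate stationary point of (11)

1: Input: $\varepsilon$, $\tilde{\mathbf{z}} = [\tilde{\mathbf{h}}^T, \mathbf{u}^T]^T$ and $\boldsymbol{\theta}^{(0)}$.

2: Initialization: Let $l = 1$.

3: Step 1 (Receive coefficients optimization):

4: Let
\[ G_k^{(l)} = \frac{\sqrt{P} \sum_{n=0}^{N-1} \tilde{H}_{\boldsymbol{\theta}^{(l-1)}}^*[nK + k]\, u_k(n)}{P \sum_{n=0}^{N-1} \left|\tilde{H}_{\boldsymbol{\theta}^{(l-1)}}^*[nK + k]\right|^2 + N}, \quad \forall k, \tag{21} \]
where $\tilde{\mathbf{H}}_{\boldsymbol{\theta}^{(l-1)}} = \tilde{\mathbf{H}}\mathbf{x}(\boldsymbol{\theta}^{(l-1)})$ and $\tilde{\mathbf{H}}$ is the composite effective channel corresponding to $\tilde{\mathbf{h}}$.

5: Step 2 (Baseband precoder optimization):

6: For $i = 1$ to $NS$

7: If $\left|\tilde{\mathbf{H}}_i^H \mathbf{G}^{(l)H} \left(\sqrt{P}\, \mathbf{G}^{(l)}\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i}^{(l)} - \mathbf{u}\right)\right| < \varepsilon$, let $\theta_i^{(l)} = \theta_i^{(l-1)}$;
Otherwise, let
\[ \theta_i^{(l)} = \pi + \arg\left\{\tilde{\mathbf{H}}_i^H \mathbf{G}^{(l)H} \left(\sqrt{P}\, \mathbf{G}^{(l)}\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i}^{(l)} - \mathbf{u}\right)\right\}, \tag{22} \]
where $\mathbf{x}_{-i}^{(l)} = [e^{j\theta_1^{(l)}}, ..., e^{j\theta_{i-1}^{(l)}}, e^{j\theta_{i+1}^{(l-1)}}, ..., e^{j\theta_{SN}^{(l-1)}}]$.

8: Termination:

9: Let $l = l + 1$ and return to Step 1 until convergence or $l \geq N_{PS}$.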

channel-symbol vector $\mathbf{z}$. In the simulations, we set $\varepsilon = 10^{-4}$ and Procedure PS is always observed to converge to a stationary point of (11), i.e., condition (20) is always satisfied.
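The baseband precoder step of Procedure PS, including the ε-guard of Line 7, can be sketched on a small toy problem. This is a simplified illustration rather than the full procedure: the matrix `A` stands in for $\sqrt{P}\,\mathbf{G}\tilde{\mathbf{H}}$ and is held fixed (the receive coefficient update is omitted), so the code minimizes $\|\mathbf{A}\mathbf{x} - \mathbf{u}\|^2$ over unit-modulus $\mathbf{x}$. The function names, stopping tolerance, and toy data are assumptions.

```python
import cmath
import math

def residual(A, x, u):
    """Return A x - u for a matrix given as a list of rows."""
    return [sum(A[r][c] * x[c] for c in range(len(x))) - u[r]
            for r in range(len(u))]

def cost(A, x, u):
    return sum(abs(e) ** 2 for e in residual(A, x, u))

def ce_coordinate_descent(A, u, theta, eps=1e-4, max_iter=50):
    """Cyclic phase updates with the epsilon-guard of Algorithm 2, Line 7."""
    theta = list(theta)
    x = [cmath.exp(1j * t) for t in theta]
    history = [cost(A, x, u)]
    for _ in range(max_iter):
        for i in range(len(x)):
            r = residual(A, x, u)
            # a_i = A_i^H (A_{-i} x_{-i} - u): drop column i from the residual
            a_i = sum(A[k][i].conjugate() * (r[k] - A[k][i] * x[i])
                      for k in range(len(u)))
            if abs(a_i) < eps:
                continue  # keep the previous phase when the gain is negligible
            theta[i] = math.pi + cmath.phase(a_i)
            x[i] = cmath.exp(1j * theta[i])
        history.append(cost(A, x, u))
        if history[-2] - history[-1] < 1e-12:
            break
    return theta, history

# Toy data (hypothetical): 2 receive dimensions, 3 constant-envelope entries
A = [[1 + 0j, 0.5j, -0.3 + 0.2j],
     [0.2 - 0.1j, 1 + 0j, 0.4j]]
u = [1 + 0j, -1j]
theta_opt, history = ce_coordinate_descent(A, u, [0.0, 0.0, 0.0])
```

Because each coordinate update is an exact minimization over one phase, the recorded `history` is non-increasing, which mirrors the monotonicity argument used in Appendix B.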

C. Procedure PL for Solving $P_L^J$

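Similarly, we can use an AO method to solve the RF precoding subproblem in (12). For fixed $\boldsymbol{\phi}_{-i} = [\phi_1, ..., \phi_{i-1}, \phi_{i+1}, ..., \phi_{MS}]$, the optimal $\phi_i$ is given by
\[ \phi_i^\star = \arg\min_{\phi_i \in [-\pi, \pi)} \frac{1}{J} \sum_{q=1}^{J} I\left(\boldsymbol{\phi}, \boldsymbol{\theta}(\mathbf{z}^q), \mathbf{G}(\mathbf{z}^q); \mathbf{z}^q\right). \tag{23} \]
In the following lemma, we give a closed-form solution for (23).

We first define some useful notation. Let $\mathbf{F}_{-ms}$ denote the matrix obtained by setting the $(m, s)$-th element of $\mathbf{F}$ to zero and let $\bar{\mathbf{F}}_{-ms} = \mathrm{BlockDiag}[\mathbf{F}_{-ms}, ..., \mathbf{F}_{-ms}] \in \mathbb{C}^{MN \times NS}$ denote the $(m, s)$-th extended RF precoder. Note that $\mathbf{F}_{-ms}$ and $\bar{\mathbf{F}}_{-ms}$ are determined by $\boldsymbol{\phi}_{-i} \triangleq [\phi_1, ..., \phi_{i-1}, \phi_{i+1}, ..., \phi_{MS}]$,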


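Algorithm 3 Procedure PL: Finding a ε-accurate stationary point of (12)

1: Input: $\varepsilon$, $\mathbf{z}^{1:J}$, $\Theta(\mathbf{z}^{1:J})$ and $\boldsymbol{\phi}^{(0)}$.

2: Initialization: Let $l = 1$.

3: For $i = 1$ to $MS$

4: If $\left|\frac{1}{J} \sum_{q=1}^{J} g_i\left(\boldsymbol{\phi}_{-i}^{(l)}, \boldsymbol{\theta}(\mathbf{z}^q), \mathbf{G}(\mathbf{z}^q); \mathbf{z}^q\right)\right| < \varepsilon$, let $\phi_i^{(l)} = \phi_i^{(l-1)}$;

5: Otherwise, let
\[ \phi_i^{(l)} = \pi + \arg\left\{\frac{1}{J} \sum_{q=1}^{J} g_i\left(\boldsymbol{\phi}_{-i}^{(l)}, \boldsymbol{\theta}(\mathbf{z}^q), \mathbf{G}(\mathbf{z}^q); \mathbf{z}^q\right)\right\}, \tag{26} \]
where $\boldsymbol{\phi}_{-i}^{(l)} = [\phi_1^{(l)}, ..., \phi_{i-1}^{(l)}, \phi_{i+1}^{(l-1)}, ..., \phi_{MS}^{(l-1)}]$.

6: Termination:

7: Let $l = l + 1$ and return to Line 3 until convergence or $l \geq N_{PL}$.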

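where $i = (m-1)S + s$. For given $\boldsymbol{\phi}_{-i}$, $\mathbf{z} = [\mathbf{h}^T, \mathbf{u}^T]^T$ and $(\boldsymbol{\theta}, \mathbf{G})$, define a function
\[ g_i\left(\boldsymbol{\phi}_{-i}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z}\right) \triangleq \sum_{n=0}^{N-1} \sqrt{P}\, e^{-j\theta_{nS+s}} \mathbf{H}_{nM+m}^H \mathbf{G}^H \left(\sqrt{P}\, \mathbf{G}\mathbf{H}\bar{\mathbf{F}}_{-ms}\mathbf{x}(\boldsymbol{\theta}) - \mathbf{u}\right), \tag{24} \]
where $m$ and $s$ are determined by $(m-1)S + s = i$, and $\mathbf{H}$ is the composite channel corresponding to $\mathbf{h}$.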

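Lemma 2 (Solution of Long-term Subproblem). For given $\boldsymbol{\phi}_{-i}$, a set of sampled channel-symbol vectors $\mathbf{z}^{1:J}$ and the corresponding short-term control variables $\Theta(\mathbf{z}^{1:J})$, the solution of (23) is uniquely given by
\[ \phi_i^\star = \pi + \arg\left\{\frac{1}{J} \sum_{q=1}^{J} g_i\left(\boldsymbol{\phi}_{-i}, \boldsymbol{\theta}(\mathbf{z}^q), \mathbf{G}(\mathbf{z}^q); \mathbf{z}^q\right)\right\} \tag{25} \]
provided that $\left|\frac{1}{J} \sum_{q=1}^{J} g_i\left(\boldsymbol{\phi}_{-i}, \boldsymbol{\theta}(\mathbf{z}^q), \mathbf{G}(\mathbf{z}^q); \mathbf{z}^q\right)\right| \neq 0$. When $\left|\frac{1}{J} \sum_{q=1}^{J} g_i\left(\boldsymbol{\phi}_{-i}, \boldsymbol{\theta}(\mathbf{z}^q), \mathbf{G}(\mathbf{z}^q); \mathbf{z}^q\right)\right| = 0$, the optimal $\phi_i^\star$ can take any value in $[-\pi, \pi)$.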

The proof is similar to that of Lemma 1.

Based on Lemma 2, we propose Procedure PL to solve the RF precoding subproblem in (12) as

summarized in Algorithm 3. Similar to Theorem 1, it can be shown that Procedure PL converges to a

ε-accurate stationary point of (12).
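The per-element update of Procedure PL differs from the short-term one mainly in that the complex coefficient is averaged over the $J$ stored samples before its argument is taken. A minimal sketch, assuming the values $g_i(\cdot\,; \mathbf{z}^q)$ have already been evaluated and are passed in as a list (the function name and inputs are hypothetical):

```python
import cmath
import math

def long_term_phase_update(g_samples, phi_prev, eps=1e-4):
    """One element update of Algorithm 3 (Lines 4 and 5).

    g_samples : list of complex values g_i(...; z^q), q = 1..J
    phi_prev  : phase from the previous iteration
    Returns phi_prev when the sample average is below eps in magnitude,
    otherwise pi + arg(average), wrapped to [-pi, pi).
    """
    g_bar = sum(g_samples) / len(g_samples)
    if abs(g_bar) < eps:
        return phi_prev
    phi = math.pi + cmath.phase(g_bar)
    return (phi + math.pi) % (2 * math.pi) - math.pi

phi_a = long_term_phase_update([1j, 1j], phi_prev=0.3)      # average is 1j
phi_b = long_term_phase_update([1e-6, -1e-6], phi_prev=0.3)  # average is ~0
```

In the second call the averaged coefficient is below the threshold, so the previous phase is retained, which is the modification that underlies the ε-accurate convergence claim.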


D. Convergence of the Online AO Algorithm

In this section, we address the following challenge.

Challenge 3 (Convergence to a ε-accurate stationary solution of P0). Establish the analytical proof for

convergence of the online AO Algorithm to a ε-accurate stationary solution of P0 for arbitrarily small

but fixed ε > 0.

Compared to the convergence proof of the conventional AO algorithm for deterministic optimization

problems, there are several new technical challenges in the convergence proof of the online AO algorithm.

• Non-monotonic property due to stochastic optimization: In the proposed online AO algorithm,

due to the stochastic nature of Problem P0, the objective value no longer decreases monotonically

after each online AO iteration. Hence, the techniques used to prove the convergence of conventional

AO algorithm cannot be applied to the online AO algorithm.

• Infinite Dimensional Problem: Another challenge is that P0 is an infinite dimensional problem

which has an uncountably infinite number of optimization variables when the channel-symbol vector

z has a continuous distribution. As a result, the convergence proof of the online AO algorithm is

much more difficult than the conventional AO algorithm which only works for a problem with a

finite number of optimization variables.

Despite the above challenges, we prove that the online AO algorithm converges to a ε-accurate

stationary solution of Problem P0 as summarized in the following Theorem.

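Theorem 2 (Convergence of Algorithm 1). Let $\left(\boldsymbol{\phi}^J, \Theta^J, \{\boldsymbol{\theta}^{J\star}(\mathbf{z}), \mathbf{G}^{J\star}(\mathbf{z}) : \forall \mathbf{z}\}\right), J = 1, 2, ...$ be the sequence of iterates generated by Algorithm 1, where $\boldsymbol{\theta}^{J\star}(\mathbf{z}), \mathbf{G}^{J\star}(\mathbf{z})$ are given in (15). Then any accumulation point $(\boldsymbol{\phi}^\star, \Theta^\star)$ of $(\boldsymbol{\phi}^J, \Theta^J), J = 1, 2, ...$ is a ε-accurate stationary solution of P0 with probability 1.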

Please refer to Appendix C for the detailed proof.

Since ε can be set to be an arbitrarily small (but fixed) positive number, we can say that Algorithm 1

converges to a stationary solution of P0 for all practical purposes.

V. SIMULATION RESULTS AND DISCUSSIONS

Consider the downlink of a multi-user massive MIMO cellular system operating in FDD mode. The

coverage area of the BS is a circle with a radius of 250m. There are a total number of K = 8 users, 6 of

whom are clustered around 2 hotspots. The two hotspots and the other two users are uniformly distributed


within the cell. There are $L = 4$ resolvable paths in the frequency selective fading channel. The channel vector of user $k$ corresponding to the $l$-th resolvable path is modeled as $\mathbf{h}_k[l] = \boldsymbol{\zeta}_k^{1/2}\mathbf{h}_k^w, \forall k$, where $\mathbf{h}_k^w \in \mathbb{C}^M$ has i.i.d. complex entries of zero mean and unit variance, and $\boldsymbol{\zeta}_k = E\left[\mathbf{h}_k[l]\mathbf{h}_k^H[l]\right], \forall l$ is the spatial correlation matrix generated according to the local scattering model in [8]. The block length is $N = 128$ and the frame length is $T_f = 20$ blocks. The path gains $PL_k$ are generated using the “Urban Macro NLOS” model in [14]. We compare the performance of the proposed solution with the following 3 baselines.

• Baseline 1 (Single Stage ZF Precoding [15]): OFDM with N = 128 subcarriers is used to combat

frequency selective fading and conventional single stage MU-MIMO ZF precoding [15] is used at

each subcarrier.

• Baseline 2 (Single Stage CE Precoding [4]): There is no RF precoder and a CE constraint is

imposed on the baseband transmit signals. The block length is N = 128.

• Baseline 3 (Two Stage Precoding [16]): OFDM with N = 128 subcarriers is used to combat

frequency selective fading and the two stage precoding in [16] is used at each subcarrier. The

dimension of the pre-beamforming matrix in [16] is set as S for fair comparison.

In the simulations, both cases with perfect CSI and outdated CSI will be considered. Specifically, the outdated CSI is related to the actual CSI by the autoregressive model in [17] with the following parameters: the user speed is 6 km/h; the carrier frequency is 2 GHz; the CSI delay is given by $\frac{8N_h}{M}$ ms, where $N_h$ is the dimension of the per user CSI vector required at the BS (e.g., $N_h = S$ for the proposed solution).
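As a quick numerical reading of the delay formula, the sketch below evaluates $8N_h/M$ ms for two cases. The value $N_h = S = 32$ is chosen only for illustration, and $N_h = M$ for a scheme that needs the full channel vector is an assumption inferred from the text, not a stated parameter; the function name is hypothetical.

```python
def csi_delay_ms(n_h, m):
    """CSI delay in milliseconds for a per-user CSI vector of dimension n_h."""
    return 8.0 * n_h / m

M = 100
delay_effective = csi_delay_ms(32, M)  # proposed solution with S = 32 (illustrative)
delay_full = csi_delay_ms(M, M)        # full channel vector, N_h = M (assumption)
```

Under these assumptions the effective channel is 2.56 ms old when used, against 8 ms for the full channel, which is consistent with the larger CSI error attributed to single stage schemes.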

We compare the ergodic sum rate of different schemes under different simulation setups. Note that the ergodic sum rate depends on the effective total transmit power $P_e = P_T \eta$, which further depends on the total transmit power $P_T$ and the power efficiency $\eta$ of the power amplifiers. Typically, a non-linear power amplifier is 4 to 6 times more power efficient than a highly linear power amplifier [18]. Since the transmit signals of Baseline 1 and Baseline 3 have high PAR, they require linear power amplifiers. On the other hand, the proposed solution and Baseline 2 can use more efficient non-linear power amplifiers. In the simulations, we consider two cases. In the case with ideal power efficiency, we assume that the power efficiency of both linear and non-linear power amplifiers is 1. In the case with practical power efficiency, the power efficiency of non-linear power amplifiers is 4 times that of linear power amplifiers. In this case, the effective total transmit power of Baseline 1 and Baseline 3 is $10\log_{10}(4) \approx 6$ dB smaller than that of the proposed solution and Baseline 2 under the same total transmit power.
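The 6 dB figure follows directly from the assumed efficiency ratio. A minimal check, using a hypothetical helper `to_db`:

```python
import math

def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)

efficiency_ratio = 4.0            # non-linear vs. linear PA efficiency (practical case)
gap_db = to_db(efficiency_ratio)  # about 6.02 dB
```

The same helper shows that the 4 to 6 times range quoted from [18] corresponds to a gap of roughly 6 to 7.8 dB.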


[Plot. Horizontal axis: number of RF chains (S), 15 to 50. Vertical axis: ergodic sum rate (bits/channel use), 16 to 36. Curves: Proposed under ideal PE; Baseline 1 under ideal and practical PE; Baseline 2 under ideal PE; Baseline 3 under ideal and practical PE; Proposed under practical PE; Baseline 2 under practical PE.]

Figure 4: Ergodic sum rate versus S with M = 100 antennas and perfect CSI at the BS.

In Fig. 4, we plot the ergodic sum rate versus $S$ with $M = 100$ antennas and perfect CSI at the BS. The effective total transmit power of Baseline 1 and Baseline 3 is fixed at −5 dB, and the total transmit power of all schemes is set to be identical (i.e., the effective total transmit power of the proposed solution and Baseline 2 is −5 dB under ideal power efficiency and 1 dB under practical power efficiency). With only $S = 32$ RF chains, the proposed solution already achieves performance similar to that of Baseline 2 (single stage CE precoding), which requires $M = 100$ RF chains. Under the same effective total transmit power, the two stage/single stage CE precoding schemes achieve a lower ergodic sum rate than their linear precoding counterparts (Baseline 3/Baseline 1) due to the more stringent CE constraint. However, under practical power efficiency, the two stage/single stage CE precoding schemes achieve a much higher ergodic sum rate (see the dotted curves) than their linear precoding counterparts due to the higher power efficiency of non-linear power amplifiers. In Fig. 5, we consider outdated CSI at the BS. In this case, with only $S = 24$ RF chains, the proposed solution already outperforms Baseline 2, which requires $M = 100$ RF chains. This is because in Baseline 2, the BS requires the full channel vector, which has a higher dimension than the effective channel vector. As a result, the CSI delay (and CSI error) of Baseline 2 is also larger. The other results are similar to those in Fig. 4. In summary, with $S = 16$ RF chains, the proposed solution achieves the best performance under the practical scenario with outdated CSI and non-ideal power efficiency of power amplifiers, as shown in Fig. 5.


[Plot. Horizontal axis: number of RF chains (S), 15 to 50. Vertical axis: ergodic sum rate (bits/channel use), 16 to 32. Curves: Proposed under ideal PE; Baseline 1 under ideal and practical PE; Baseline 2 under ideal PE; Baseline 3 under ideal and practical PE; Proposed under practical PE; Baseline 2 under practical PE.]

Figure 5: Ergodic sum rate versus S with M = 100 antennas and outdated CSI at the BS.

VI. CONCLUSION

We propose a two stage CE precoding solution to resolve the key practical issues associated with the

implementation of massive MIMO systems. While the proposed solution can potentially enable low cost

massive MIMO BS implementations, the optimization of two stage CE precoding is a very challenging

non-convex stochastic optimization problem which cannot be solved by the existing algorithms. We

propose an online AO algorithm which is guaranteed to converge to a stationary solution of this problem

without requiring explicit knowledge of channel statistics. Simulations show that the proposed solution

can achieve the first order gain provided by the massive MIMO array under practical constraints such

as limited number of RF chains, nonlinear power amplifiers and limited resource for CSI signaling.

Therefore, the proposed solution is a good candidate for practical implementation of massive MIMO

systems.

APPENDIX

A. Proof of Lemma 1

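After some calculations, we have
\[ I(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z}) = P\left\|\mathbf{G}\tilde{\mathbf{H}}\mathbf{x}(\boldsymbol{\theta})\right\|^2 + \|\mathbf{u}\|^2 + \mathrm{Tr}\left(\mathbf{G}\mathbf{G}^H\right) - 2\sqrt{P}\, \mathrm{Re}\left[\mathbf{u}^H\mathbf{G}\tilde{\mathbf{H}}\mathbf{x}(\boldsymbol{\theta})\right], \tag{27} \]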


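which is a strictly convex function of $\mathbf{G}$. Then the optimal receive coefficients in (16) can be obtained by solving the first order optimality condition of the convex optimization problem $\min_{\mathbf{G}} I(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z})$.

It can be verified that
\[ I(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z}) = \mathrm{Tr}\left(\mathbf{G}\mathbf{G}^H\right) + \left\|\sqrt{P}\, \mathbf{G}\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i} - \mathbf{u}\right\|^2 + P\left\|\mathbf{G}\tilde{\mathbf{H}}_i\right\|^2 + 2\mathrm{Re}\left(\sqrt{P}\, \tilde{\mathbf{H}}_i^H\mathbf{G}^H\left(\sqrt{P}\, \mathbf{G}\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i} - \mathbf{u}\right)e^{-j\theta_i}\right), \tag{28} \]
from which it is easy to see that the solution in Lemma 1 minimizes $I(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z})$ over $\theta_i \in [-\pi, \pi)$.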

B. Proof of Theorem 1

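For convenience, define $f(\mathbf{x}, \mathbf{G}) = I(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{G}; \mathbf{z})$, where $\boldsymbol{\theta} = \arg(\mathbf{x})$. Clearly, Procedure PS does not increase the objective $f(\mathbf{x}, \mathbf{G})$ in any iteration. Since $f(\mathbf{x}, \mathbf{G})$ is lower bounded by zero, the objective $f(\mathbf{x}, \mathbf{G})$ converges to some value $I^\star$. Let $\mathbf{x}^{(l)} = e^{j\boldsymbol{\theta}^{(l)}}$ and let $(\mathbf{x}^\star, \mathbf{G}^\star)$ denote an accumulation point of $(\mathbf{x}^{(l)}, \mathbf{G}^{(l)}), l = 1, 2, ...$. Then there exists a subsequence $l_k, k = 1, ...$ such that $\lim_{k\to\infty} \mathbf{G}^{(l_k)} = \mathbf{G}^\star$ and $\lim_{k\to\infty} \mathbf{x}^{(l_k)} = \mathbf{x}^\star$.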

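We first prove $\mathbf{G}^\star = \mathbf{G}^\circ \triangleq \arg\min_{\mathbf{G}} f(\mathbf{x}^\star, \mathbf{G})$. Since $\mathbf{G}^{(l_k+1)} = \arg\min_{\mathbf{G}} f(\mathbf{x}^{(l_k)}, \mathbf{G})$, we have $\lim_{k\to\infty} \mathbf{G}^{(l_k+1)} = \mathbf{G}^\circ$. If $\mathbf{G}^\star \neq \mathbf{G}^\circ$, we have
\[ \lim_{k\to\infty} f\left(\mathbf{x}^{(l_k)}, \mathbf{G}^{(l_k)}\right) - f\left(\mathbf{x}^{(l_k)}, \mathbf{G}^{(l_k+1)}\right) = f(\mathbf{x}^\star, \mathbf{G}^\star) - f(\mathbf{x}^\star, \mathbf{G}^\circ) \neq 0, \tag{29} \]
where the last inequality holds because $\min_{\mathbf{G}} f(\mathbf{x}^\star, \mathbf{G})$ has a unique solution $\mathbf{G}^\circ$. (29) contradicts $\lim_{l\to\infty} f(\mathbf{x}^{(l)}, \mathbf{G}^{(l)}) = I^\star$. Hence, we must have $\mathbf{G}^\star = \mathbf{G}^\circ$.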

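Second, we prove $x_1^\star = x_1^\circ \triangleq \arg\min_{|x_1|=1} f(\mathbf{x}_0^\star, \mathbf{G}^\star)$, where $\mathbf{x}_0^\star = [x_1, x_2^\star, ..., x_{NS}^\star]$. There are two cases.

Case 1: $\left|\tilde{\mathbf{H}}_1^H \mathbf{G}^{\star H}\left(\sqrt{P}\, \mathbf{G}^\star\tilde{\mathbf{H}}_{-1}\mathbf{x}_{-1}^\star - \mathbf{u}\right)\right| \geq \varepsilon$, where $\mathbf{x}_{-1}^\star = [x_2^\star, ..., x_{NS}^\star]$. In this case, if $x_1^\star \neq x_1^\circ$, we have
\[ \lim_{k\to\infty} f\left(\mathbf{x}^{(l_k)}, \mathbf{G}^{(l_k+1)}\right) - f\left(\mathbf{b}_1^{(l_k+1)}, \mathbf{G}^{(l_k+1)}\right) = f(\mathbf{x}^\star, \mathbf{G}^\star) - f(\mathbf{b}_1^\circ, \mathbf{G}^\star) \neq 0, \tag{30} \]
where $\mathbf{b}_1^{(l_k+1)} = [x_1^{(l_k+1)}, x_2^{(l_k)}, ..., x_{NS}^{(l_k)}]$, $\mathbf{b}_1^\circ = [x_1^\circ, x_2^\star, ..., x_{NS}^\star]$, the first equality holds because $\lim_{k\to\infty} \mathbf{G}^{(l_k+1)} = \mathbf{G}^\circ = \mathbf{G}^\star$ and $\lim_{k\to\infty} x_1^{(l_k+1)} = x_1^\circ$, and the last inequality holds because $\arg\min_{|x_1|=1} f(\mathbf{x}_0^\star, \mathbf{G}^\star)$ has a unique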


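solution when $\left|\tilde{\mathbf{H}}_1^H \mathbf{G}^{\star H}\left(\sqrt{P}\, \mathbf{G}^\star\tilde{\mathbf{H}}_{-1}\mathbf{x}_{-1}^\star - \mathbf{u}\right)\right| \geq \varepsilon$. (30) contradicts $\lim_{l\to\infty} f(\mathbf{x}^{(l)}, \mathbf{G}^{(l)}) = I^\star$. Hence, $x_1^\star = x_1^\circ$.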

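Case 2: $\left|\tilde{\mathbf{H}}_1^H \mathbf{G}^{\star H}\left(\sqrt{P}\, \mathbf{G}^\star\tilde{\mathbf{H}}_{-1}\mathbf{x}_{-1}^\star - \mathbf{u}\right)\right| < \varepsilon$. It follows from (28) that $f(\mathbf{x}^\star, \mathbf{G}^\star) - \min_{|x_1|=1} f(\mathbf{x}_0^\star, \mathbf{G}^\star) < \varepsilon$. Moreover, there exists a sufficiently large $k_0$ such that $\left|\tilde{\mathbf{H}}_1^H \mathbf{G}^{(l_k)H}\left(\sqrt{P}\, \mathbf{G}^{(l_k)}\tilde{\mathbf{H}}_{-1}\mathbf{x}_{-1}^{(l_k)} - \mathbf{u}\right)\right| < \varepsilon, \forall k \geq k_0$. By the definition of Procedure PS, $x_1^{(l_k+1)} = x_1^{(l_k)}, \forall k \geq k_0$, and thus we also have $\lim_{k\to\infty} x_1^{(l_k+1)} = x_1^\star$.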

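Repeating similar analysis as that for $x_1^\star$, it can be shown that for $i = 1, ..., NS$, we have $x_i^\star = x_i^\circ \triangleq \arg\min_{|x_i|=1} f(\mathbf{x}_{i-1}^\star, \mathbf{G}^\star)$ if $\left|\tilde{\mathbf{H}}_i^H \mathbf{G}^{\star H}\left(\sqrt{P}\, \mathbf{G}^\star\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i}^\star - \mathbf{u}\right)\right| \geq \varepsilon$; and $f(\mathbf{x}^\star, \mathbf{G}^\star) - \min_{|x_i|=1} f(\mathbf{x}_{i-1}^\star, \mathbf{G}^\star) < \varepsilon$ otherwise, where $\mathbf{x}_{i-1}^\star = [x_1^\star, ..., x_{i-1}^\star, x_i, x_{i+1}^\star, ..., x_{NS}^\star]$. Then Theorem 1 follows immediately from this result and the fact that $\mathbf{G}^\star = \mathbf{G}^\circ \triangleq \arg\min_{\mathbf{G}} f(\mathbf{x}^\star, \mathbf{G})$.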

C. Proof of Theorem 2

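Define $I^J(\boldsymbol{\phi}) = \frac{1}{J}\sum_{q=1}^{J} I\left(\boldsymbol{\phi}, \boldsymbol{\theta}^J(\mathbf{z}^q), \mathbf{G}^J(\mathbf{z}^q); \mathbf{z}^q\right)$ and $I_0^J(\boldsymbol{\phi}) = \frac{1}{J}\sum_{q=1}^{J} I\left(\boldsymbol{\phi}, \boldsymbol{\theta}^{J-1}(\mathbf{z}^q), \mathbf{G}^{J-1}(\mathbf{z}^q); \mathbf{z}^q\right)$. Let $\Delta I^J = I^J(\boldsymbol{\phi}^J) - I_0^J(\boldsymbol{\phi}^{J-1})$. Clearly, we have $\Delta I^J \leq 0$. On the other hand,
\[ \Delta I^J = I\left(\boldsymbol{\phi}^J, \tilde{\Theta}^J\right) - I\left(\boldsymbol{\phi}^{J-1}, \tilde{\Theta}^{J-1}\right) + I^J\left(\boldsymbol{\phi}^J\right) - I\left(\boldsymbol{\phi}^J, \tilde{\Theta}^J\right) + I\left(\boldsymbol{\phi}^{J-1}, \tilde{\Theta}^{J-1}\right) - I_0^J\left(\boldsymbol{\phi}^{J-1}\right), \tag{31} \]
where $\tilde{\Theta}^J = \left\{\boldsymbol{\theta}^J(\mathbf{z}^{q_{\mathbf{z}}^{J+1}}), \mathbf{G}^J(\mathbf{z}^{q_{\mathbf{z}}^{J+1}}) : \forall \mathbf{z}\right\}, J = 1, 2, ...$, $q_{\mathbf{z}}^{J+1} = \arg\min_{q \in [1, J]} \|\tilde{\mathbf{z}} - \tilde{\mathbf{z}}^q\|$, and $\tilde{\mathbf{z}}$ is the effective channel-symbol vector corresponding to $\mathbf{z}$. According to the strong law of large numbers, we have
\[ \lim_{J\to\infty} I^J\left(\boldsymbol{\phi}^J\right) - I\left(\boldsymbol{\phi}^J, \tilde{\Theta}^J\right) = 0, \tag{32} \]
\[ \lim_{J\to\infty} I_0^J\left(\boldsymbol{\phi}^{J-1}\right) - I\left(\boldsymbol{\phi}^{J-1}, \tilde{\Theta}^{J-1}\right) = 0, \tag{33} \]
with probability 1. Combining (31) to (33), we have
\[ \lim_{J\to\infty} \Delta I^J - \left(I\left(\boldsymbol{\phi}^J, \tilde{\Theta}^J\right) - I\left(\boldsymbol{\phi}^{J-1}, \tilde{\Theta}^{J-1}\right)\right) = 0 \tag{34} \]
with probability 1. From (34), $\Delta I^J \leq 0$ and the fact that $I(\boldsymbol{\phi}^J, \tilde{\Theta}^J)$ is lower bounded by zero, we have $\lim_{J\to\infty} \Delta I^J = 0$ with probability 1, from which it follows that
\[ \lim_{J\to\infty} I\left(\boldsymbol{\phi}^J, \tilde{\Theta}^J\right) - I\left(\boldsymbol{\phi}^{J-1}, \tilde{\Theta}^{J-1}\right) = 0, \tag{35} \]
with probability 1. Based on the above analysis, we use contradiction to prove that any accumulation point $\left(\boldsymbol{\phi}^\star, \tilde{\Theta}^\star \triangleq \{\tilde{\boldsymbol{\theta}}^\star(\mathbf{z}), \tilde{\mathbf{G}}^\star(\mathbf{z}) : \forall \mathbf{z}\}\right)$ of $(\boldsymbol{\phi}^J, \tilde{\Theta}^J), J = 1, 2, ...$ satisfies the following conditions if (35) is true.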


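C1: For every $\mathbf{z}$ outside a set of probability zero, we have 1) $\tilde{\mathbf{G}}^\star(\mathbf{z}) = \arg\min_{\mathbf{G}} I\left(\boldsymbol{\phi}^\star, \tilde{\boldsymbol{\theta}}^\star(\mathbf{z}), \mathbf{G}; \mathbf{z}\right)$ and 2) if $\left|\tilde{\mathbf{H}}_i^H \tilde{\mathbf{G}}^{\star H}(\mathbf{z})\left(\sqrt{P}\, \tilde{\mathbf{G}}^\star(\mathbf{z})\tilde{\mathbf{H}}_{-i}\mathbf{x}_{-i}^\star(\mathbf{z}) - \mathbf{u}\right)\right| \geq \varepsilon$, $\tilde{\theta}_i^\star(\mathbf{z}) = \arg\min_{\theta_i \in [-\pi, \pi)} I\left(\boldsymbol{\phi}^\star, \tilde{\boldsymbol{\theta}}_{i-1}^\star(\mathbf{z}), \tilde{\mathbf{G}}^\star(\mathbf{z}); \mathbf{z}\right)$, where $\tilde{\boldsymbol{\theta}}_{i-1}^\star(\mathbf{z}) = [\tilde{\theta}_1^\star(\mathbf{z}), ..., \tilde{\theta}_{i-1}^\star(\mathbf{z}), \theta_i, \tilde{\theta}_{i+1}^\star(\mathbf{z}), ..., \tilde{\theta}_{NS}^\star(\mathbf{z})]$ and $\mathbf{x}_{-i}^\star(\mathbf{z}) = [e^{j\tilde{\theta}_1^\star(\mathbf{z})}, ..., e^{j\tilde{\theta}_{i-1}^\star(\mathbf{z})}, e^{j\tilde{\theta}_{i+1}^\star(\mathbf{z})}, ..., e^{j\tilde{\theta}_{SN}^\star(\mathbf{z})}]$; otherwise $I\left(\boldsymbol{\phi}^\star, \tilde{\boldsymbol{\theta}}^\star(\mathbf{z}), \tilde{\mathbf{G}}^\star(\mathbf{z}); \mathbf{z}\right) - \min_{\theta_i \in [-\pi, \pi)} I\left(\boldsymbol{\phi}^\star, \tilde{\boldsymbol{\theta}}_{i-1}^\star(\mathbf{z}), \tilde{\mathbf{G}}^\star(\mathbf{z}); \mathbf{z}\right) < \varepsilon$.

C2: If $\left|E\left[g_i\left(\boldsymbol{\phi}_{-i}^\star, \tilde{\boldsymbol{\theta}}^\star(\mathbf{z}), \tilde{\mathbf{G}}^\star(\mathbf{z}); \mathbf{z}\right)\right]\right| \geq \varepsilon$, $\phi_i^\star = \arg\min_{\phi_i \in [-\pi, \pi)} I\left(\boldsymbol{\phi}_{i-1}^\star, \tilde{\Theta}^\star\right)$, where $\boldsymbol{\phi}_{-i}^\star = [\phi_1^\star, ..., \phi_{i-1}^\star, \phi_{i+1}^\star, ..., \phi_{MS}^\star]$ and $\boldsymbol{\phi}_{i-1}^\star = [\phi_1^\star, ..., \phi_{i-1}^\star, \phi_i, \phi_{i+1}^\star, ..., \phi_{MS}^\star]$; otherwise $I\left(\boldsymbol{\phi}^\star, \tilde{\Theta}^\star\right) - \min_{\phi_i \in [-\pi, \pi)} I\left(\boldsymbol{\phi}_{i-1}^\star, \tilde{\Theta}^\star\right) < \varepsilon$.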

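The proof can be obtained by contradiction. Since $(\boldsymbol{\phi}^\star, \tilde{\Theta}^\star)$ is an accumulation point, there exists a subsequence $J_k, k = 1, ...$ such that $\lim_{k\to\infty} \boldsymbol{\phi}^{J_k} = \boldsymbol{\phi}^\star$ and $\lim_{k\to\infty} \tilde{\Theta}^{J_k} = \tilde{\Theta}^\star$. Suppose C1 and C2 are not satisfied. Using (32), (33) and following similar analysis as that in Appendix B, it can be shown that $\lim_{k\to\infty} I\left(\boldsymbol{\phi}^{J_k}, \tilde{\Theta}^{J_k}\right) - I\left(\boldsymbol{\phi}^{J_k+1}, \tilde{\Theta}^{J_k+1}\right) \neq 0$, which contradicts (35).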

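Next, we show that $\lim_{J\to\infty} \boldsymbol{\theta}^{J\star}(\mathbf{z}) - \boldsymbol{\theta}^J(\mathbf{z}^{q_{\mathbf{z}}^{J+1}}) = 0$ and $\lim_{J\to\infty} \mathbf{G}^{J\star}(\mathbf{z}) - \mathbf{G}^J(\mathbf{z}^{q_{\mathbf{z}}^{J+1}}) = 0$ for all $\mathbf{z}$, and thus any accumulation point $(\boldsymbol{\phi}^\star, \Theta^\star)$ of $(\boldsymbol{\phi}^J, \Theta^J), J = 1, 2, ...$ is also an accumulation point $(\boldsymbol{\phi}^\star, \tilde{\Theta}^\star)$ of $(\boldsymbol{\phi}^J, \tilde{\Theta}^J), J = 1, 2, ...$. Note that $\left(\boldsymbol{\theta}^{J\star}(\mathbf{z}), \mathbf{G}^{J\star}(\mathbf{z})\right)$ is the output of Procedure PS with input $\mathbf{z}$ and $\left(\boldsymbol{\theta}^{J-1}(\mathbf{z}^{q_{\mathbf{z}}^{J}}), \mathbf{G}^{J-1}(\mathbf{z}^{q_{\mathbf{z}}^{J}})\right)$, and $\left(\boldsymbol{\theta}^J(\mathbf{z}^{q_{\mathbf{z}}^{J+1}}), \mathbf{G}^J(\mathbf{z}^{q_{\mathbf{z}}^{J+1}})\right)$ is the output of Procedure PS with input $\mathbf{z}^{q_{\mathbf{z}}^{J+1}}$ and $\left(\boldsymbol{\theta}^{J-1}(\mathbf{z}^{q_{\mathbf{z}}^{J}}), \mathbf{G}^{J-1}(\mathbf{z}^{q_{\mathbf{z}}^{J}})\right)$. It can be verified that Procedure PS is a continuous mapping from the input to the output and $\lim_{J\to\infty} \mathbf{z}^{q_{\mathbf{z}}^{J+1}} - \mathbf{z} = 0$. As a result, we have $\lim_{J\to\infty} \boldsymbol{\theta}^{J\star}(\mathbf{z}) - \boldsymbol{\theta}^J(\mathbf{z}^{q_{\mathbf{z}}^{J+1}}) = 0$ and $\lim_{J\to\infty} \mathbf{G}^{J\star}(\mathbf{z}) - \mathbf{G}^J(\mathbf{z}^{q_{\mathbf{z}}^{J+1}}) = 0$.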

Combining the above results, we have proved that any accumulation point $(\boldsymbol{\phi}^\star, \Theta^\star)$ of $(\boldsymbol{\phi}^J, \Theta^J), J = 1, 2, ...$ satisfies C1 and C2 with probability 1, from which Theorem 2 follows.

REFERENCES

[1] F. Rusek, D. Persson, B. K. Lau, E. Larsson, T. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities

and challenges with very large arrays,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40–60, Jan. 2013.

[2] C. Studer and E. Larsson, “PAR-aware large-scale multi-user MIMO-OFDM downlink,” IEEE J. Select. Areas Commun., vol. 31, no. 2, pp. 303–313, Feb. 2013.

[3] S. Mohammed and E. Larsson, “Per-antenna constant envelope precoding for large multi-user MIMO systems,” IEEE

Trans. Commun., vol. 61, no. 3, pp. 1059–1071, March 2013.

[4] S. Mohammed and E. Larsson, “Constant-envelope multi-user precoding for frequency-selective massive MIMO systems,” IEEE Wireless Communications Letters, vol. 2, no. 5, pp. 547–550, Oct. 2013.

[5] X. Zhang, A. Molisch, and S.-Y. Kung, “Variable-phase-shift-based RF-baseband codesign for MIMO antenna selection,” IEEE Trans. Signal Processing, vol. 53, no. 11, pp. 4091–4103, 2005.

[6] P. Sudarshan, N. Mehta, A. Molisch, and J. Zhang, “Channel statistics-based RF pre-processing with antenna selection,”

IEEE Trans. Wireless Commun., vol. 5, no. 12, pp. 3501–3511, 2006.


[7] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. Heath, “Spatially sparse precoding in millimeter wave MIMO

systems,” IEEE Trans. Wireless Commun., 2014.

[8] A. Liu and V. K. N. Lau, “Phase only RF precoding for massive MIMO systems with limited RF chains,” IEEE Trans.

Signal Processing, vol. 62, no. 17, pp. 4505–4515, Sept 2014.

[9] J. R. Birge and F. Louveaux, Introduction to stochastic programming. Springer, 2011.

[10] S. Boyd and A. Mutapcic, “Stochastic subgradient methods,” 2008. [Online]. Available: http://see.stanford.edu/materials/

lsocoee364b/04-stoch_subgrad_notes.pdf

[11] V. van Zelst and T. Schenk, “Implementation of a mimo ofdm-based wireless lan system,” IEEE Trans. Signal Processing,

vol. 52, no. 2, pp. 483–494, Feb 2004.

[12] A. Liu and V. Lau, “Hierarchical interference mitigation for massive mimo cellular networks,” IEEE Trans. Signal

Processing, vol. 62, no. 18, pp. 4786–4797, Sept 2014.

[13] L. Grippo and M. Sciandrone, “On the convergence of the block nonlinear gauss-seidel method under convex constraints,”

Operat. Res. Lett., vol. 26, pp. 127–136, 2000.

[14] Technical Specification Group Radio Access Network; Further Advancements for E-UTRA Physical Layer Aspects, 3GPP

TR 36.814. [Online]. Available: http://www.3gpp.org

[15] T. Yoo and A. Goldsmith, “On the optimality of multiantenna broadcast scheduling using zero-forcing beamforming,” IEEE

Journal on Selected Areas in Communications, vol. 24, no. 3, pp. 528 – 541, mar. 2006.

[16] A. Adhikary, J. Nam, J.-Y. Ahn, and G. Caire, “Joint spatial division and multiplexing - the large-scale array regime,”

IEEE Trans. Info. Theory, vol. 59, no. 10, pp. 6441–6463, Oct 2013.

[17] K. Baddour and N. Beaulieu, “Autoregressive modeling for fading channel simulation,” IEEE Trans. Wireless Commun.,

vol. 4, no. 4, pp. 1650–1662, 2005.

[18] S. C. Cripps, RF Power Amplifiers for Wireless Communications. Artech Publishing House, 1999.

An Liu (S’07–M’09) received the B.S. and Ph.D. degrees in Electrical Engineering from Peking University, China, in 2004 and 2011, respectively. From 2008 to 2010, he was a visiting scholar at the Department of ECEE, University of Colorado at Boulder. From 2011 to 2013, he was a Postdoctoral Research Fellow with the Department of ECE, HKUST, where he is currently a Research Assistant Professor. His research interests include wireless communication, stochastic optimization, and compressive sensing.

Vincent K. N. Lau (SM’04–F’12) obtained the B.Eng. (Distinction 1st Hons) from the University of Hong Kong (1989–1992) and the Ph.D. from Cambridge University (1995–1997). He was with Bell Labs from 1997 to 2004 and joined the Department of ECE, Hong Kong University of Science and Technology (HKUST), in 2004. He is currently a Chair Professor and the Founding Director of the Huawei-HKUST Joint Innovation Lab at HKUST. His current research focus includes robust and delay-optimal cross-layer optimization for MIMO/OFDM wireless systems, interference mitigation techniques for wireless networks, massive MIMO, M2M, and network control systems.