
One-bit Quantized Massive MIMO Detection Based on Variational Approximate Message Passing

Zhaoyang Zhang, Member, IEEE, Xiao Cai, Chunguang Li, Senior Member, IEEE, Caijun Zhong, Senior Member, IEEE, and Huaiyu Dai, Fellow, IEEE

Abstract—One-bit quantization can significantly reduce the massive multiple-input and multiple-output (MIMO) system hardware complexity, but at the same time it also brings great challenges to the system algorithm design. Specifically, it is difficult to recover information from the highly distorted samples as well as to obtain accurate channel estimation without increasing the number of pilots. In this paper, a novel inference algorithm called variational approximate message passing (VAMP) for the one-bit quantized massive MIMO receiver is developed, which attempts to exploit the advantages of both the variational Bayesian inference (VBI) algorithm and the bilinear generalized approximate message passing (BiG-AMP) algorithm to accomplish joint channel estimation and data detection (JCD) in closed form with first-order complexity. An asymptotic state evolution analysis indicates the fast convergence rate of VAMP and also provides a lower bound for the data detection error. Moreover, through extensive simulations we show that VAMP can achieve excellent detection performance with low pilot overhead in a wide range of scenarios.

Index Terms—Variational approximate message passing (VAMP), variational Bayesian inference (VBI), bilinear generalized approximate message passing (BiG-AMP), massive MIMO, one-bit quantization.

    I. INTRODUCTION

Over the past decades, massive multiple-input-multiple-output (MIMO) systems have been widely regarded as an enabling technology for next-generation wireless communications, due to their great potential in increasing power and spectrum efficiency [1], [2]. However, the large number of antenna elements equipped at the base station (BS) also brings tremendous hardware complexity as well as high circuit power consumption [3], [4], which has become one of the most critical issues in practical implementation. To neutralize the curse of hardware complexity and power consumption, low-resolution quantization [5]–[7], especially one-bit quantization [8], has been introduced to simplify the receiver circuitry, which results in the one-bit quantized massive MIMO system [9]–[12].

However, data detection in a one-bit quantized massive MIMO system is quite different from that of a conventional MIMO system. The high distortion of one-bit measurements

This work was supported in part by the National Natural Science Foundation of China (Nos. 61725104, 61371094, 61631003), and Huawei Technologies Co., Ltd. (YB2013120029, YB2015040053, HF2017010003).

Zhaoyang Zhang (Email: [email protected]), Xiao Cai, Chunguang Li, and Caijun Zhong are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, and also with the Provincial Key Laboratory of Information Processing, Communication and Networking, Zhejiang, China.

Huaiyu Dai (Email: [email protected]) is with the Department of ECE, NC State University.

makes it hard to achieve accurate channel state information [9]–[11] when the pilot length is not long enough [8], [13]. As a way to save the situation, joint channel estimation and data detection (JCD) is adopted [14]–[16], which exploits both the pilot symbols and the data symbols to improve the channel estimation, and thus achieves performance close to that of training-based detection while using a much shorter pilot sequence. Although the idea of JCD is appealing for one-bit quantized massive MIMO systems, there is still much to do in its realization. On the one hand, the one-bit quantization destroys the asymptotic optimality of linear-filtering-based turbo/iterative detection [17], and most of the statistical information is distorted, which undermines the applicability of high-order-moment-based [18] and subspace-based methods [19]. On the other hand, for a large-scale system like massive MIMO, tree-search-based algorithms with exponential complexity or conventional message passing (MP) (see [20] and references therein) with second-order or even higher complexity with respect to the antenna number are in general undesirable.

Recently, approximate message passing (AMP) has attracted increasing attention. Differing from other variants of MP, AMP manipulates messages associated with nodes instead of edges on the factor graph [21]–[24], and thus reduces the number of messages drastically. The bilinear generalized approximate message passing (BiG-AMP) algorithm further extends AMP from the linear model to the quantized bilinear model, using Gaussian approximation and Taylor expansion to cope with the non-linearity in the large-system limit. Moreover, its convergence can be further improved with the aid of adaptive damping [25]. Hence, BiG-AMP is a powerful message passing approach for large-scale systems, and has shown good performance in the contexts of JCD and compressed sensing [13], [25]–[28], among others.

Another inference algorithm with first-order complexity is variational Bayesian inference (VBI) [29], [30], which has also been applied to the JCD of MIMO systems [31], [32]. It updates the posterior probability of the target variable in complicated models by marginalizing out neighboring variables using a variational approximation, with no need for the assumption of a large user number. However, since there is no term like the Onsager reaction term to compensate for the cyclic structure of the factor graph [21], VBI usually suffers a performance loss compared with AMP and BiG-AMP [27].

In this paper, by exploiting the advantages of both BiG-AMP and VBI, a novel vertex-message based inference algorithm called variational approximate message passing (VAMP) is proposed for one-bit quantized massive MIMO detection. It succeeds in completing channel estimation, data detection


and noise level estimation simultaneously with only binary measurements and relatively short pilot sequences at the receiver. In particular, variational approximation is inserted into the framework of AMP as the means to deal with the non-linearity (intractable marginalization). To support its usage, the detection problem is formulated and approximated using exponential functions. In the algorithm derivation, we first develop message passing (MP) on the factor graph using variational approximation, and then replace the edge messages with newly-defined vertex messages, similarly to AMP. The proposed algorithm not only inherits BiG-AMP's advantages of closed-form expressions, low complexity, low pilot overhead and excellent performance, but also makes up for its shortcoming in convergence in some particular scenarios. Our main contributions are summarized as follows.

1) VAMP is developed within the framework of AMP. Different from VBI, VAMP derives the desired vertex messages from the edge messages in MP. In this way, dependence correction is accomplished [21] and the performance loss of VBI is avoided [27]. The first-order complexity of VAMP as well as its low pilot overhead in dealing with the JCD problem is analyzed.

2) An asymptotic analysis based on state evolution is performed to investigate the iterative behavior of VAMP. It applies not only to the JCD problem, but also to the problems of training-based detection and detection with perfect CSI. An analytical performance bound of VAMP is provided in terms of mean square error (MSE) from the state evolution analysis.

3) Extensive experiments validate the theoretical analysis and show the outstanding performance of VAMP. It outperforms both the linear detector and the VBI approach in all signal-to-noise ratio (SNR) regimes. Moreover, in some scenarios with relatively low SNR and low pilot percentage, VAMP can still accomplish the JCD while BiG-AMP cannot.

The remainder of this paper is organized as follows. In Section II, the detection model is presented and the problem is formulated mathematically. In Section III, the variational methodology is introduced and some of the exponential approximations are given. The algorithm expressions of VAMP are derived in Section IV and its asymptotic analysis by state evolution is provided in Section V. In Section VI, simulation results are presented. Finally, Section VII concludes this paper.

Throughout the paper, $x_{ij}$ refers to the $(i,j)$-th entry of matrix $X$; $\ln(\cdot)$ denotes the natural logarithm; $(\cdot)^R$ and $(\cdot)^I$ extract the real and imaginary components, respectively, of a complex scalar or matrix; $(\cdot)^{*}$ is the conjugate of a scalar; $\mathcal{CN}(x;\mu,\tau)$ denotes a complex-valued circularly-symmetric Gaussian probability density function (PDF); $\mathcal{N}(x;\mu,\tau)$ denotes a real-valued Gaussian PDF; $\Psi(x) = \int_{-\infty}^{x}\mathcal{N}(s;0,1)\,ds$ denotes the standard real-valued Gaussian cumulative distribution function (CDF); $\Phi(x) = \mathcal{N}(x;0,1)$ denotes the standard real-valued Gaussian PDF; $\sigma(x) = (1+\exp(-x))^{-1}$ denotes the logistic function; and $\mathrm{KL}(q(x)\|p(x)) = -\int q(x)\ln\frac{p(x)}{q(x)}\,dx$ denotes the Kullback-Leibler divergence between two PDFs.

    II. SYSTEM MODEL

We consider a narrow-band$^1$ uplink MIMO system, where the base station (BS) is equipped with $M$ antennas to serve $N$ single-antenna users. The channel is assumed to be flat block fading, wherein the channel state remains constant over a block of $K$ symbols. Each user inserts $K_{\mathrm{pilot}}$ pilot symbols periodically in each block. On the receiver side, the antenna array collects the analog signals and quantizes them using 1-bit ADCs. Due to the inefficiency of one-bit measurements, $M \gg N$ and $K \gg N$ are assumed to enable channel estimation and symbol detection. The system model is depicted in Fig. 1.

    Fig. 1. System model of one-bit quantized massive MIMO systems.

The received analog signal $\tilde Y = [\tilde y_{mk}] \in \mathbb{C}^{M\times K}$ over the block interval can be written in matrix form as
$$\tilde Y = HX + W = Z + W, \quad (1)$$
where $X = [x_{nk}]\in\mathbb{C}^{N\times K}$ denotes the transmit symbols in the block, $H = [h_{mn}]\in\mathbb{C}^{M\times N}$ denotes the channel matrix, $W = [w_{mk}]\in\mathbb{C}^{M\times K}$ represents the additive complex Gaussian noise with zero mean and element-wise variance $\tau_w$, and we define $Z = [z_{mk}] = HX \in\mathbb{C}^{M\times K}$.

On the receiver side, each received signal is down-converted to the analog baseband $\tilde Y$ and then quantized. Each complex-valued quantizer consists of two real-valued 1-bit ADCs, which operate independently on the in-phase (real) and quadrature (imaginary) parts of each analog sample. They convert $\tilde Y$ into $2MK$ 1-bit measurements, namely, $Y = [y_{mk}]\in\{\pm 1 \pm i\}^{M\times K}$:
$$Y = \mathrm{sgn}(\tilde Y) = \mathrm{sgn}(\tilde Y^R) + i\,\mathrm{sgn}(\tilde Y^I). \quad (2)$$
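For concreteness, the following minimal NumPy sketch generates one-bit quantized observations according to the measurement model (1)-(2); the dimensions, the unit-energy QPSK alphabet and the noise variance `tau_w` are example choices of ours, not values fixed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 64, 8, 128           # antennas, users, block length (example values)
tau_w = 1.0                    # noise variance (placeholder)

# Rayleigh channel H ~ CN(0, 1) and unit-energy QPSK symbols X
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = (rng.choice([-1, 1], (N, K)) + 1j * rng.choice([-1, 1], (N, K))) / np.sqrt(2)
W = np.sqrt(tau_w / 2) * (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))

Z = H @ X                      # noiseless received signal
Y_analog = Z + W               # eq. (1)
Y = np.sign(Y_analog.real) + 1j * np.sign(Y_analog.imag)   # eq. (2): one 1-bit ADC per I/Q rail
```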

Hence, the conditional probability function can be expressed as
$$P(Y\mid H,X) = \prod_{m=1}^{M}\prod_{k=1}^{K} P(y_{mk}\mid z_{mk}), \quad (3)$$
where each $P(y_{mk}\mid z_{mk})$ is factorized as
$$P(y_{mk}\mid z_{mk}) = P(y^R_{mk}\mid z^R_{mk})\, P(y^I_{mk}\mid z^I_{mk}). \quad (4)$$

In this model, entries in $W$ are assumed to be independent and identically distributed (i.i.d.) complex Gaussian variables $\mathcal{CN}(w_{mk};0,16/\pi)$.

$^1$Throughout this paper we focus on a narrow-band channel model, similar to those considered in, e.g., [8]–[13], [33], among others. Wide-band channel models with multi-path effects can still be treated using the general framework of Bayesian inference in MIMO-OFDM systems, and will be treated in our future work; a further discussion is given in Section VII.


Once the noise variance $\tau_w$ is fixed, the logistic function $\sigma(\cdot)$ can be used in the conditional probability function as follows [29]:
$$P(y^R_{mk}=1\mid z^R_{mk}) = \Psi\!\big(\sqrt{\pi/8}\; z^R_{mk}\big) \approx \sigma(z^R_{mk}), \quad (5)$$
$$P(y^R_{mk}\mid z^R_{mk}) = \sigma(z^R_{mk})^{\frac{1+y^R_{mk}}{2}}\, \sigma(-z^R_{mk})^{\frac{1-y^R_{mk}}{2}}. \quad (6)$$

In this way, $P(y^I_{mk}\mid z^I_{mk})$ can be calculated similarly.

Assume that users transmit modulated symbols $X \in \mathbb{C}^{N\times K}$, where the first $K_{\mathrm{pilot}}$ symbols in the block are known pilot symbols and the following $K-K_{\mathrm{pilot}}$ are unknown data symbols. The prior probability can be written as
$$P_X(X) = \prod_{n=1}^{N}\prod_{k=1}^{K_{\mathrm{pilot}}} P_{X_p}(x_{nk}) \prod_{n=1}^{N}\prod_{k>K_{\mathrm{pilot}}}^{K} P_{X_d}(x_{nk}). \quad (7)$$
Symbols are chosen from the alphabet $\mathcal{A}$ of the particular modulation. Without loss of generality, we normalize the average energy of the constellation points in $\mathcal{A}$. Pilot symbols are known to the receiver, namely, $P_{X_p}(x_{nk})$ are Dirac delta functions for all pilot symbols $x_{nk}=x^0_{nk}$, $k\in\{1,2,\cdots,K_{\mathrm{pilot}}\}$. Data symbols are drawn from a discrete uniform distribution $P_{X_d}(x_{nk})$ over $\mathcal{A}$.

Entries in the channel matrix $H\in\mathbb{C}^{M\times N}$ are assumed to be complex Gaussian variables, which can be expressed as
$$P_H(H\mid \alpha) = \prod_{m=1}^{M}\prod_{n=1}^{N} P_H(h_{mn}\mid\alpha_n), \quad (8)$$
$$P_H(h_{mn}\mid\alpha_n) = \mathcal{CN}(h_{mn};0,\alpha_n^{-1}). \quad (9)$$
The joint effect of transmit power and channel fading is represented by the hyper-parameter $\alpha\in\mathbb{R}^N$, whose entries are assumed to obey the Gamma distribution
$$P_\alpha(\alpha_n) = \Gamma(\alpha_n; a_0, b_0). \quad (10)$$
Given that the noise variance is fixed beforehand, the estimation of the noise level is equivalent to the acquisition of $\alpha$.

Our target is to accomplish channel estimation, data detection and noise level estimation simultaneously, namely, to obtain the posterior probability function $P(H,X,\alpha\mid Y)$ from $Y$ and a certain number of known pilots in $X$.

Assume the posterior can be expressed in the form
$$P(H,X,\alpha\mid Y) = \prod_n P(\alpha_n\mid Y)\,\underbrace{\prod_{m,n} P(h_{mn}\mid Y)}_{P(H\mid Y)}\,\underbrace{\prod_{n,k} P(x_{nk}\mid Y)}_{P(X\mid Y)}, \quad (11)$$
and such an independence assumption is common in factor-graph based inference algorithms. As long as the posteriors are obtained, the channel, noise level and data symbols can then be estimated. Take channel estimation as an example. The Bayesian optimal estimator is the posterior mean
$$\hat h_{mn} = \int h_{mn}\, P(h_{mn}\mid Y)\, dh_{mn}, \quad (12)$$
which minimizes the mean square error (MSE) $\mathbb{E}_{H\mid Y}\{\|\hat H - H\|_F^2\}$. Data symbol and noise level estimates can be obtained similarly.

III. VARIATIONAL APPROXIMATION FOR JCD

Before proceeding, we provide a brief review of the variational approximation (please refer to [30] for details). Consider the problem of approximating a set of hidden variables $\Theta=\{\theta_1,\theta_2,\cdots\}$ from the observation $Y$. When the exact posterior $P(\Theta\mid Y)$ is not available, it can be approximated by the variational distribution $q(\Theta)$ that minimizes the Kullback-Leibler divergence $\mathrm{KL}(q(\Theta)\,\|\,P(\Theta\mid Y))$. Since $P(Y)$ is constant, the minimization of the Kullback-Leibler divergence can be translated into the maximization of $\mathcal{L}(q)$, which is defined as
$$\mathcal{L}(q) = \int q(\Theta)\ln\frac{P(Y,\Theta)}{q(\Theta)}\, d\Theta. \quad (13)$$

Under the independence assumption for the hidden variables, namely,
$$q(\Theta) = \prod_l q_l(\theta_l) = \prod_l q_l, \quad (14)$$
we can rewrite $\mathcal{L}(q)$ with respect to $q_l$ and maximize it as in [30], and then obtain $q(\Theta)$ with each $q_l$ calculated as
$$\ln q_l = \int \ln P(Y,\Theta) \prod_{l'\neq l}\big(q_{l'}\, d\theta_{l'}\big) + \mathrm{const}. \quad (15)$$
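To make the coordinate-wise update (15) concrete, the sketch below applies it to a toy bivariate Gaussian target with a fully factorized $q$; this is the standard mean-field example (not the JCD model of this paper), and all numerical values are placeholders.

```python
import numpy as np

# Toy target: p(theta1, theta2) = N(mu, inv(Lam)), approximated as q1(theta1) q2(theta2).
# For a Gaussian target, the update (15) yields Gaussian factors whose means satisfy
# the coordinate-ascent recursions below.
mu = np.array([1.0, -2.0])
Lam = np.array([[2.0, 0.8],
                [0.8, 1.0]])          # precision matrix of the joint

m1, m2 = 0.0, 0.0                     # initial factor means
for _ in range(50):
    # ln q1 = E_{q2}[ln p(theta1, theta2)] + const  ->  Gaussian factor with mean m1
    m1 = mu[0] - Lam[0, 1] / Lam[0, 0] * (m2 - mu[1])
    # ln q2 = E_{q1}[ln p(theta1, theta2)] + const  ->  Gaussian factor with mean m2
    m2 = mu[1] - Lam[1, 0] / Lam[1, 1] * (m1 - mu[0])

print(m1, m2)   # converges to the true means (1.0, -2.0); factor variances are 1/Lam[i, i]
```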

This is the rough idea of the variational approximation. In our studied JCD problem,
$$\Theta = \{H, X, \alpha\}, \quad (16)$$
and the factors $q_l$ represent
$$\{q_l\} = \big\{\{P(h_{mn}\mid Y)\}, \{P(x_{nk}\mid Y)\}, \{P(\alpha_n\mid Y)\}\big\} \quad (17)$$
according to (11). In (15), the logarithms of $q_l$ and $P(Y,\Theta)$ need to be addressed. Using (3), (7) and (8), the joint distribution $P(Y,\Theta)$ can be factorized into (18). Among the probability functions in (17) and (18), the Gaussian-distributed $P(H\mid Y)$, $P_H(H\mid\alpha)$ and the Gamma-distributed $P(\alpha\mid Y)$, $P_\alpha(\alpha)$ are all in exponential form. Therefore, to obtain closed-form expressions for (15), $P(X\mid Y)$, $P_X(X)$ and $P(Y\mid H,X)$ should also be approximated in exponential form, which is the main focus of this section.

A. Exponential Approximation of $P(Y\mid H,X)$

As mentioned in [29], [34], when $-\log(e^{s/2}+e^{-s/2})$ is first-order Taylor expanded at any $s^2=\varepsilon^2$, a global lower bound of the logistic function $\sigma(\cdot)$ can be obtained as
$$\sigma(s) \ge \sigma(\varepsilon)\exp\!\left(\frac{s-\varepsilon}{2} - \lambda(\varepsilon)(s^2-\varepsilon^2)\right), \quad (19)$$
where $\lambda(\varepsilon)$ is a constant calculated as
$$\lambda(\varepsilon) \equiv \frac{1}{2\varepsilon}\left(\sigma(\varepsilon)-\frac{1}{2}\right). \quad (20)$$
This lower bound is an exponential approximation to $\sigma(s)$, which is exact when $\varepsilon^2 = s^2$.
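The bound (19)-(20) and the probit-logit approximation used in (5) can be checked numerically. The sketch below is only illustrative; it uses scipy solely for the Gaussian CDF $\Psi$, and the chosen values of $\varepsilon$ are arbitrary.

```python
import numpy as np
from scipy.stats import norm

def sigma(s):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-s))

def lam(eps):
    """lambda(eps) from (20)."""
    return (sigma(eps) - 0.5) / (2.0 * eps)

def jj_lower_bound(s, eps):
    """Exponential lower bound (19) on sigma(s), tight at s^2 = eps^2."""
    return sigma(eps) * np.exp((s - eps) / 2.0 - lam(eps) * (s ** 2 - eps ** 2))

s = np.linspace(-6, 6, 1001)
for eps in (0.5, 2.0, 4.0):
    assert np.all(sigma(s) - jj_lower_bound(s, eps) >= -1e-12)     # (19) holds globally
    assert np.isclose(jj_lower_bound(eps, eps), sigma(eps))        # and is tight at s = eps

# Probit-logit approximation in (5): Psi(sqrt(pi/8) z) vs sigma(z)
z = np.linspace(-6, 6, 1001)
print(np.max(np.abs(norm.cdf(np.sqrt(np.pi / 8) * z) - sigma(z))))  # maximum gap over this range
```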

Substituting (19) into (6), we obtain
$$\ln P(y^R_{mk}\mid z^R_{mk}) = \frac{y^R_{mk}}{2}\, z^R_{mk} - \lambda(\varepsilon^R_{mk})(z^R_{mk})^2 + \mathrm{const},$$
$$\ln P(y^I_{mk}\mid z^I_{mk}) = \frac{y^I_{mk}}{2}\, z^I_{mk} - \lambda(\varepsilon^I_{mk})(z^I_{mk})^2 + \mathrm{const}, \quad (21)$$


where $(\varepsilon^R_{mk})^2 = \mathbb{E}[(z^R_{mk})^2]$ and $(\varepsilon^I_{mk})^2 = \mathbb{E}[(z^I_{mk})^2]$ are constants$^2$. Therefore, we can use (3), (4) and (21) to obtain the exponential approximation of $P(Y\mid H,X)$.

$$P(Y,H,X,\alpha) = \underbrace{\prod_{m,k} P(y_{mk}\mid z_{mk})}_{P(Y\mid H,X)}\; \underbrace{\prod_{m,n} P_H(h_{mn}\mid\alpha)}_{P_H(H\mid\alpha)}\; \underbrace{\prod_{n,k\le K_{\mathrm{pilot}}}\! P_{X_p}(x_{nk})\prod_{n,k> K_{\mathrm{pilot}}}\! P_{X_d}(x_{nk})}_{P_X(X)}\; \prod_n P_\alpha(\alpha_n). \quad (18)$$

B. Exponential Approximation of Modulated Symbols

Naturally, symbol values are regarded as continuous random variables, and thus $P(X\mid Y)$ and $P_X(X)$ are expressed using complex Gaussian distributions. For $P_X(X)$, $P_{X_p}(x_{nk})$ and $P_{X_d}(x_{nk})$ can be expressed as
$$P_{X_p}(x_{nk}) = \mathcal{CN}(x_{nk}; x^0_{nk}, 0), \quad (22)$$
$$P_{X_d}(x_{nk}) = \mathcal{CN}(x_{nk}; 0, 1). \quad (23)$$

As for $P(X\mid Y)$, it should be noted that $P(x_{nk}\mid Y)$ calculated from (15) is not the complete content of the target posterior. In practical systems, the detector may exchange log-likelihood ratio (LLR) information about the coded bits with the channel decoder. For symbol variable $x_{nk}$, the LLR of its $l$-th bit is calculated as
$$\widetilde{\mathrm{LLR}}_{nk,l} = \ln\frac{\sum_{\{x_{nk}:\,x_{nk}\in\mathcal{A},\,c_{nkl}=1\}} \mathcal{CN}(x_{nk};\tilde\mu_{nk},\tilde\tau_{nk})}{\sum_{\{x_{nk}:\,x_{nk}\in\mathcal{A},\,c_{nkl}=0\}} \mathcal{CN}(x_{nk};\tilde\mu_{nk},\tilde\tau_{nk})}. \quad (24)$$

In this paper, we neglect the LLR update in the decoder and directly construct the output message using $\widetilde{\mathrm{LLR}}_{nk,l}$. Here a projection operation $\mathrm{Proj}[\cdot]$ is introduced to represent this process, and the two distributions involved in the projection are denoted as
$$\mathcal{CN}(x_{nk};\mu_{nk},\tau_{nk}) = \mathrm{Proj}\big[\mathcal{CN}(x_{nk};\tilde\mu_{nk},\tilde\tau_{nk})\big]. \quad (25)$$

For low-order modulation such as BPSK and QPSK, the new message can be constructed by solving the equations
$$\widetilde{\mathrm{LLR}}_{nk,l} = \mathrm{LLR}_{nk,l},\ \forall l, \qquad (\mu^{R/I}_{nk})^2 + \tau^{R/I}_{nk} = \tfrac{1}{2}\ (\text{or } 1 \text{ for BPSK}), \quad (26)$$
where $\mathrm{LLR}_{nk,l}$ is obtained by substituting the output message $\mathcal{N}(x^{R/I}_{nk};\mu^{R/I}_{nk},\tau^{R/I}_{nk})$ into (24). In this way, the full LLR information of the symbol is conserved. For higher-order modulation, however, the output message fails to capture all the LLR information of the symbol, since the three quantities $(\mu^R_{nk},\mu^I_{nk},\tau_{nk})$ cannot satisfy all the LLR equations in (26). Therefore, $\mu_{nk},\tau_{nk}$ are calculated alternatively based on moment matching as in [13], [29] or expectation propagation as in [33]. In this way, $\mathrm{Proj}[\cdot]$ can be described using $F_1(\tilde\mu_{nk},\tilde\tau_{nk})$ and $F_2(\tilde\mu_{nk},\tilde\tau_{nk})$ as

$$\mu_{nk} = F_1(\tilde\mu_{nk},\tilde\tau_{nk}) = \sum_{x_{nk}\in\mathcal{A}} x_{nk}\, P_{\widetilde{\mathrm{LLR}}}(x_{nk}), \qquad \tau_{nk} = F_2(\tilde\mu_{nk},\tilde\tau_{nk}) = \sum_{x_{nk}\in\mathcal{A}} |x_{nk}|^2\, P_{\widetilde{\mathrm{LLR}}}(x_{nk}) - |\mu_{nk}|^2, \quad (27)$$
where $P_{\widetilde{\mathrm{LLR}}}(x_{nk})$ is the discrete probability distribution of $x_{nk}$ given by $\widetilde{\mathrm{LLR}}_{nk,l}$.

$^2$The reason for the choice of expansion points can be found in Chapter 10.6.3 of [29].
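A minimal sketch of the projection (25) via the moment matching of (27) is given below. It assumes a uniform prior over the alphabet and evaluates the Gaussian message at each constellation point to obtain the discrete distribution; the function name and example inputs are hypothetical.

```python
import numpy as np

def proj_moment_match(mu_t, tau_t, alphabet):
    """Sketch of Proj[.] in (25) via moment matching (27): map the Gaussian
    message CN(x; mu_t, tau_t) onto the discrete alphabet, then return the
    mean (F1) and variance (F2) of the resulting discrete distribution."""
    w = np.exp(-np.abs(alphabet - mu_t) ** 2 / tau_t)   # unnormalized discrete posterior
    p = w / w.sum()                                     # P_LLR(x) in the paper's notation
    mu = np.sum(alphabet * p)                           # F1 in (27)
    tau = np.sum(np.abs(alphabet) ** 2 * p) - np.abs(mu) ** 2   # F2 in (27)
    return mu, tau

# Example with a unit-energy QPSK alphabet
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
print(proj_moment_match(0.3 + 0.1j, 0.5, qpsk))
```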

IV. VERTEX-MESSAGE BASED VAMP

A. Message Passing with Variational Approximation

For the presentation of the factor graph, we use the same convention as in [35]. The factorization of the joint distribution in (18) is represented by the factor graph in Fig. 2.

Fig. 2. Factor graph representation of the one-bit quantized massive MIMO system for the JCD problem, with M = 3, N = 2, and K = 2. The node with "=" represents the cloning of variables.

Since the factor graph is loopy, a message passing schedule is required. Messages about the random variables (in the form of PDFs or log-PDFs) are propagated along the edges of the factor graph.

Let $\phi^{t+1}_{mk\to mn}(h_{mn})$ and $\phi^{t+1}_{mk\to nk}(x_{nk})$ denote the messages from the channel transition node $P(y_{mk}\mid z_{mk})$ to the cloning nodes of the variables $h_{mn}$ and $x_{nk}$, respectively. Based on the variational approximation (15), the two messages are calculated as (28) and (29).

To eliminate duplicate computation, their common part is extracted in (30) and defined as $\tilde\phi^t_{mk\to(mn,nk)}(h_{mn}x_{nk})$, the message from the channel transition node $P(y_{mk}\mid z_{mk})$ to the cloning nodes of $h_{mn}$ and $x_{nk}$ simultaneously. Note that $P(Y\mid H,X)$ in (21) yields different variances for the real and imaginary parts. Therefore, to simplify the calculation, we define a circularly-symmetric complex-Gaussian message
$$\phi^t_{mk\to(mn,nk)}(h_{mn}x_{nk}) = \mathcal{CN}\!\left(h_{mn}x_{nk};\; \frac{\zeta^t_{mk\to(mn,nk)}}{\gamma^t_{mk\to(mn,nk)}},\; \frac{1}{\gamma^t_{mk\to(mn,nk)}}\right),$$
whose mean and variance are averaged over the distribution of $\tilde\phi^t_{mk\to(mn,nk)}(h_{mn}x_{nk})$:


$$\ln\phi^{t+1}_{mk\to mn}(h_{mn}) = \iint \ln P(y_{mk}\mid z_{mk}) \prod_{n'\neq n}\phi^t_{mn'\to mk}(h_{mn'})\,dh_{mn'} \prod_{n''}\phi^t_{n''k\to mk}(x_{n''k})\,dx_{n''k} + \mathrm{const}, \quad (28)$$
$$\ln\phi^{t+1}_{mk\to nk}(x_{nk}) = \iint \ln P(y_{mk}\mid z_{mk}) \prod_{n'}\phi^t_{mn'\to mk}(h_{mn'})\,dh_{mn'} \prod_{n''\neq n}\phi^t_{n''k\to mk}(x_{n''k})\,dx_{n''k} + \mathrm{const}. \quad (29)$$
$$\ln\tilde\phi^t_{mk\to(mn,nk)}(h_{mn}x_{nk}) = \int \ln P(y_{mk}\mid z_{mk}) \prod_{n'\neq n}\phi^t_{mn'\to mk}(h_{mn'})\,dh_{mn'}\,\phi^t_{n'k\to mk}(x_{n'k})\,dx_{n'k} + \mathrm{const}. \quad (30)$$

$$\gamma^t_{mk\to(mn,nk)} = \Big(\mathrm{Var}_{\tilde\phi}\big[(h_{mn}x_{nk})^R\big] + \mathrm{Var}_{\tilde\phi}\big[(h_{mn}x_{nk})^I\big]\Big)^{-1}, \qquad \frac{\zeta^t_{mk\to(mn,nk)}}{\gamma^t_{mk\to(mn,nk)}} = \mathbb{E}_{\tilde\phi}[h_{mn}x_{nk}]. \quad (31)$$

Using $\phi^t_{mk\to(mn,nk)}(h_{mn}x_{nk})$, the expressions of $\phi^{t+1}_{mk\to mn}(h_{mn})$ and $\phi^{t+1}_{mk\to nk}(x_{nk})$ can be simplified into (32) and (33). The corresponding messages in the opposite direction are
$$\phi^{t+1}_{mn\to mk}(h_{mn}) = P_H(h_{mn}\mid\alpha^t_n)\prod_{k'\neq k}\phi^{t+1}_{mk'\to mn}(h_{mn}), \quad (34)$$
$$\tilde\phi^{t+1}_{nk\to mk}(x_{nk}) = P_X(x_{nk})\prod_{m'\neq m}\phi^{t+1}_{m'k\to nk}(x_{nk}). \quad (35)$$
Specifically, $\mathrm{Proj}[\cdot]$ in (25) is applied to the message $\tilde\phi^{t+1}_{nk\to mk}(x_{nk})$:
$$\phi^{t+1}_{nk\to mk}(x_{nk}) = \mathrm{Proj}\big[\tilde\phi^{t+1}_{nk\to mk}(x_{nk})\big]. \quad (36)$$
The update of the variable $\alpha$ is not of concern for the moment and will be discussed later.

Approximating the messages above as Gaussian, we denote
$$\phi^t_{mn\to mk}(h_{mn}) = \mathcal{CN}(h_{mn};\nu^t_{mn\to mk},\rho^t_{mn\to mk}), \quad (37)$$
$$\tilde\phi^t_{nk\to mk}(x_{nk}) = \mathcal{CN}(x_{nk};\tilde\mu^t_{nk\to mk},\tilde\tau^t_{nk\to mk}), \quad (38)$$
$$\phi^t_{nk\to mk}(x_{nk}) = \mathcal{CN}(x_{nk};\mu^t_{nk\to mk},\tau^t_{nk\to mk}), \quad (39)$$
and we can then obtain the closed-form message passing procedure in Algorithm 1.

In Algorithm 1, Lines 1-5 calculate $\phi^t_{mk\to(mn,nk)}(h_{mn}x_{nk})$ by substituting (21) into (30). Lines 6-7 calculate $\phi^t_{mn\to mk}(h_{mn})$ by combining (32) and (34). Lines 8-9 are concerned with $\tilde\phi^t_{nk\to mk}(x_{nk})$ by combining (33) and (35). Based on (27), Lines 10-11 obtain $\phi^t_{nk\to mk}(x_{nk})$.

Up to this point, we have a semi-finished inference algorithm for the JCD problem. Variational approximation is adopted in the inference, but the algorithm steps differ from those of VBI [29]. The introduction of $\phi^t_{mk\to(mn,nk)}(h_{mn}x_{nk})$ slightly increases the number of messages, but greatly reduces the total complexity of marginalization, as will be shown in Section IV-D. Besides, (34) and (35) multiply all but one of the incoming messages, whereas VBI multiplies all of them. Such exclusion, which is reflected in our VAMP algorithm by the so-called Onsager reaction term as later derived in (84), mitigates the dependence of the variable estimates in the loopy graph and contributes to the performance superiority over VBI.

Algorithm 1 Message Passing Algorithm
while $t < t_{\max}$ do
  $\forall m, n, k$:
  1. $\delta^t_{mk} = \sum_n\big(\rho^t_{mn\to mk}\tau^t_{nk\to mk} + |\mu^t_{nk\to mk}|^2\rho^t_{mn\to mk} + |\nu^t_{mn\to mk}|^2\tau^t_{nk\to mk}\big)$,
  2. $(\varepsilon^{R,t}_{mk})^2 = \big|\big(\sum_n \nu^t_{mn\to mk}\mu^t_{nk\to mk}\big)^R\big|^2 + \tfrac{1}{2}\delta^t_{mk}$,
  3. $(\varepsilon^{I,t}_{mk})^2 = \big|\big(\sum_n \nu^t_{mn\to mk}\mu^t_{nk\to mk}\big)^I\big|^2 + \tfrac{1}{2}\delta^t_{mk}$,
  4. $\gamma^t_{mk\to(mn,nk)} = \dfrac{4\lambda(\varepsilon^{R,t}_{mk})\lambda(\varepsilon^{I,t}_{mk})}{2\lambda(\varepsilon^{R,t}_{mk}) + 2\lambda(\varepsilon^{I,t}_{mk})}$,
  5. $\zeta^t_{mk\to(mn,nk)} = \gamma^t_{mk\to(mn,nk)}\Big(\dfrac{y^R_{mk}}{4\lambda(\varepsilon^{R,t}_{mk})} + i\,\dfrac{y^I_{mk}}{4\lambda(\varepsilon^{I,t}_{mk})} - \sum_{n'\neq n}\nu^t_{mn'\to mk}\mu^t_{n'k\to mk}\Big)$,
  6. $\rho^{t+1}_{mn\to mk} = \Big(\alpha^t_n + \sum_{k'\neq k}\gamma^t_{mk'\to(mn,nk')}\big(|\mu^t_{nk'\to mk'}|^2 + \tau^t_{nk'\to mk'}\big)\Big)^{-1}$,
  7. $\nu^{t+1}_{mn\to mk} = \rho^{t+1}_{mn\to mk}\sum_{k'\neq k}(\mu^t_{nk'\to mk'})^{*}\zeta^t_{mk'\to(mn,nk')}$,
  if $k > K_{\mathrm{pilot}}$ then
  8. $\tilde\tau^{t+1}_{nk\to mk} = \Big(1 + \sum_{m'\neq m}\gamma^t_{m'k\to(m'n,nk)}\big(|\nu^t_{m'n\to m'k}|^2 + \rho^t_{m'n\to m'k}\big)\Big)^{-1}$,
  9. $\tilde\mu^{t+1}_{nk\to mk} = \tilde\tau^{t+1}_{nk\to mk}\sum_{m'\neq m}(\nu^t_{m'n\to m'k})^{*}\zeta^t_{m'k\to(m'n,nk)}$,
  10. $\mu^{t+1}_{nk\to mk} = F_1(\tilde\mu^{t+1}_{nk\to mk}, \tilde\tau^{t+1}_{nk\to mk})$,
  11. $\tau^{t+1}_{nk\to mk} = F_2(\tilde\mu^{t+1}_{nk\to mk}, \tilde\tau^{t+1}_{nk\to mk})$.
  end if
  12. $t = t + 1$.
end while

B. From Edge Messages to Vertex Messages

The computation of Algorithm 1 is expensive. The exclusion in (34) and (35) produces a large number of edge messages with strong resemblance. To reduce the total complexity, the framework of AMP [21], [22] (or BiG-AMP [25], [27]) is adopted to transform Algorithm 1. Therein, messages corresponding to nodes on the factor graph (i.e., vertex messages) are defined to extract the common parts of the correlated edge messages. As a result, the number of messages can be reduced greatly from $O(MNK)$ to about $MK+MN+NK$.

For readability, the message transformation is given in Appendix A, where vertex messages such as $\phi^t_{mn}(h_{mn})=\mathcal{CN}(h_{mn};\nu^t_{mn},\rho^t_{mn})$ are defined, with
$$\rho^{t+1}_{mn} = \Big(\alpha^t_n + \sum_{k'} \gamma^t_{mk'\to(mn,nk')}\big(|\mu^t_{nk'\to mk'}|^2 + \tau^t_{nk'\to mk'}\big)\Big)^{-1}, \quad (40)$$


$$\ln\phi^{t+1}_{mk\to mn}(h_{mn}) = \int \ln\phi^t_{mk\to(mn,nk)}(h_{mn}x_{nk})\;\phi^t_{nk\to mk}(x_{nk})\,dx_{nk} + \mathrm{const}, \quad (32)$$
$$\ln\phi^{t+1}_{mk\to nk}(x_{nk}) = \int \ln\phi^t_{mk\to(mn,nk)}(h_{mn}x_{nk})\;\phi^t_{mn\to mk}(h_{mn})\,dh_{mn} + \mathrm{const}. \quad (33)$$

Algorithm 2 Variational Approximate Message Passing
Input: $\forall m,k$, $y_{mk}$; $\forall n, k\le K_{\mathrm{pilot}}$, $x^0_{nk}$.
Output: $\forall n$, $\hat\alpha_n$; $\forall m,n$, $\hat h_{mn}$; $\forall n, k>K_{\mathrm{pilot}}$, $\hat x_{nk}$.
Initialize: $t = 1$, $\hat\alpha^t_n = a_0/b_0$.
while $t < t_{\max}$ do
  $\forall m, n, k$:
  1. $z^t_{mk} = \sum_n \nu^t_{mn}\mu^t_{nk} - \zeta^{t-1}_{mk}\sum_n\big(\nu^t_{mn}(\nu^{t-1}_{mn})^{*}\tau^t_{nk} + \mu^t_{nk}(\mu^{t-1}_{nk})^{*}\rho^t_{mn}\big)$,
  2. $\delta^t_{mk} = \sum_n\big(\rho^t_{mn}\tau^t_{nk} + |\mu^t_{nk}|^2\rho^t_{mn} + |\nu^t_{mn}|^2\tau^t_{nk}\big)$,
  3. $(\varepsilon^{R,t}_{mk})^2 = |(z^t_{mk})^R|^2 + \tfrac{1}{2}\delta^t_{mk}$,
  4. $(\varepsilon^{I,t}_{mk})^2 = |(z^t_{mk})^I|^2 + \tfrac{1}{2}\delta^t_{mk}$,
  5. $\gamma^t_{mk} = \dfrac{4\lambda(\varepsilon^{R,t}_{mk})\lambda(\varepsilon^{I,t}_{mk})}{2\lambda(\varepsilon^{R,t}_{mk}) + 2\lambda(\varepsilon^{I,t}_{mk})}$,
  6. $\zeta^t_{mk} = \gamma^t_{mk}\Big(\dfrac{y^R_{mk}}{4\lambda(\varepsilon^{R,t}_{mk})} + i\,\dfrac{y^I_{mk}}{4\lambda(\varepsilon^{I,t}_{mk})} - z^t_{mk}\Big)$,
  7. $\rho^{t+1}_{mn} = \Big(\alpha^t_n + \sum_k \gamma^t_{mk}\big(|\mu^t_{nk}|^2 + \tau^t_{nk}\big)\Big)^{-1}$,
  8. $\nu^{t+1}_{mn} = \nu^t_{mn}\Big(1 - \rho^{t+1}_{mn}\big(\alpha^t_n + \sum_k\gamma^t_{mk}\tau^t_{nk}\big)\Big) + \rho^{t+1}_{mn}\sum_k(\mu^t_{nk})^{*}\zeta^t_{mk}$,
  9. $a^{t+1}_n = a_0 + \tfrac{M}{2}$,
  10. $b^{t+1}_n = b_0 + \tfrac{1}{2}\sum_m\big(|\nu^{t+1}_{mn}|^2 + \rho^{t+1}_{mn}\big)$,
  11. $\alpha^{t+1}_n = a^{t+1}_n / b^{t+1}_n$,
  if $k > K_{\mathrm{pilot}}$ then
  12. $\tilde\tau^{t+1}_{nk} = \Big(1 + \sum_m \gamma^t_{mk}\big(|\nu^t_{mn}|^2 + \rho^t_{mn}\big)\Big)^{-1}$,
  13. $\tilde\mu^{t+1}_{nk} = \mu^t_{nk}\Big(1 - \tilde\tau^{t+1}_{nk}\big(1 + \sum_m\gamma^t_{mk}\rho^t_{mn}\big)\Big) + \tilde\tau^{t+1}_{nk}\sum_m(\nu^t_{mn})^{*}\zeta^t_{mk}$,
  14. $\mu^{t+1}_{nk} = F_1(\tilde\mu^{t+1}_{nk}, \tilde\tau^{t+1}_{nk})$,
  15. $\tau^{t+1}_{nk} = F_2(\tilde\mu^{t+1}_{nk}, \tilde\tau^{t+1}_{nk})$,
  end if
  16. $t = t + 1$.
end while
return $\hat\alpha_n = a^t_n/b^t_n$; $\hat h_{mn} = \nu^t_{mn}$; $\hat x_{nk} = \mathrm{sgn}(\mu^t_{nk})$.

$$\nu^{t+1}_{mn} = \rho^{t+1}_{mn}\sum_{k'} (\mu^t_{nk'\to mk'})^{*}\,\zeta^t_{mk'\to(mn,nk')}. \quad (41)$$
The other vertex messages $\phi^t_{nk}(x_{nk})$, $\tilde\phi^t_{nk}(x_{nk})$ and $\phi^t_{mk}(h_{mn}x_{nk})$ are defined similarly. The edge messages in Algorithm 1 are then all carefully replaced by these newly-defined vertex messages. In this way, the novel variational approximate message passing (VAMP) algorithm is obtained; its steps are summarized in Algorithm 2.

In Algorithm 2, Lines 1-6 calculate the vertex messages for the channel transition nodes. Then, the messages for the variables are calculated: Lines 7-11 are for the channel variables and Lines 12-15 are for the data symbol variables, whose prior probabilities refer to (9), (22) and (23). In Lines 9-11, the hyper-parameter $\alpha^t$ is estimated by averaging the norms of all channel variables. The finite-alphabet property is exploited in Lines 14-15. Finally, VAMP outputs $\hat h_{mn}$, $\hat x_{nk}$ and $\hat\alpha_n$ based on the Bayesian optimal estimator.

Compared with Algorithm 1, the expression changes in VAMP appear mainly in the expectations such as $\nu^t_{mn}$, $\mu^t_{nk}$ and $z^t_{mk}$. Specifically, the expressions of $\nu^t_{mn}$ and $\mu^t_{nk}$ are calculated and their differences from $\nu^t_{mn\to mk}$ and $\mu^t_{nk\to mk}$ are identified. Then $z^t_{mk}$ is re-derived by summing up the correction terms that are non-negligible in the large-system limit. As a result, the second term in Line 1 is the representative Onsager reaction term of AMP-based algorithms, which does not appear in VBI with mere variational approximation.

C. Convergence Analysis

An intuitive but not rigorous explanation is provided for the low pilot overhead of VAMP in the JCD problem. A coefficient $\beta^{t+1}_{mn}$ is defined as
$$\beta^{t+1}_{mn} = 1 - \rho^{t+1}_{mn}\Big(\alpha^t_n + \sum_k \gamma^t_{mk}\tau^t_{nk}\Big), \quad (42)$$
which represents the influence of $\nu^t_{mn}$ on $\nu^{t+1}_{mn}$:
$$\nu^{t+1}_{mn} = \beta^{t+1}_{mn}\nu^t_{mn} + \rho^{t+1}_{mn}\Big(\sum_k (\mu^t_{nk})^{*}\zeta^t_{mk}\Big). \quad (43)$$
During each iteration, the current estimate is obtained by gradually improving the previous one, so an intuitive strategy to achieve convergence is to ensure
$$0 < \beta^{t+1}_{mn} < 1, \quad \forall t. \quad (44)$$
With decreasing uncertainties $\{\tau^t_{nk}\}$, $\beta^{t+1}_{mn}$ gradually increases toward one.

Let us then review the expression of $\rho^t_{mn}$ in VAMP:
$$\rho^{t+1}_{mn} = \Big(\alpha^t_n + \sum_k \gamma^t_{mk}\big(|\mu^t_{nk}|^2 + \tau^t_{nk}\big)\Big)^{-1}. \quad (45)$$
Obviously, (44) can always be satisfied using (45). Therefore, VAMP places no strict limitation on the pilot length $K_{\mathrm{pilot}}$, as long as it is long enough to distinguish different users, which in general can be very close to $N$.

D. Complexity Analysis

The complexity of the proposed VAMP is analyzed and compared with those of MP, VBI and BiG-AMP. As shown in Table I, message passing (MP) requires the largest number of messages and the highest total complexity, so it is not suitable for large-scale systems. VBI requires the smallest number of messages; its complexity is lower than that of MP, but can be further reduced. BiG-AMP and VAMP share a similar algorithm structure. Their message numbers are slightly larger than VBI's, but both have the lowest complexity.


TABLE I
COMPLEXITY COMPARISON BETWEEN FOUR ALGORITHMS

Algorithm    Message number         Total complexity
MP           3MNK + 1               O(MNK(M + N + K))
VBI          MN + NK + 1            O(MN^2 K)
BiG-AMP      MK + MN + NK + 1       O(MNK)
VAMP         MK + MN + NK + 1       O(MNK)
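As a quick illustration of Table I, the snippet below evaluates the message counts and complexity orders for one example system size; the values M = 200, N = 50, K = 500 are borrowed from the simulation settings of Section VI, and the snippet itself is not part of the paper.

```python
# Message counts and complexity orders from Table I, evaluated for one example size
M, N, K = 200, 50, 500
counts = {
    "MP":      3 * M * N * K + 1,
    "VBI":     M * N + N * K + 1,
    "BiG-AMP": M * K + M * N + N * K + 1,
    "VAMP":    M * K + M * N + N * K + 1,
}
orders = {
    "MP":      M * N * K * (M + N + K),
    "VBI":     M * N ** 2 * K,
    "BiG-AMP": M * N * K,
    "VAMP":    M * N * K,
}
for name in counts:
    print(f"{name:8s} messages = {counts[name]:>12,d}   complexity ~ O({orders[name]:,d})")
```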

    V. PERFORMANCE ANALYSIS

In this section, we derive the asymptotic analysis of VAMP based on state evolution. This statistical analysis approach is known as the cavity method in statistical physics [36], [37], density evolution in coding [38], and state evolution in compressed sensing [21], [23] and matrix factorization [28].

In the above literature, the framework of state evolution is usually applied to real-valued problems. Hence, the complex model in (2) needs to be converted using a real-valued decomposition. Specifically, the data detection process can be modeled as
$$\begin{bmatrix} Y^R \\ Y^I \end{bmatrix} = \mathrm{sgn}\!\left(\begin{bmatrix} H^R & -H^I \\ H^I & H^R \end{bmatrix}\begin{bmatrix} X^R \\ X^I \end{bmatrix}\right), \quad (46)$$
and the channel estimation process as
$$[\,Y^R,\; Y^I\,] = \mathrm{sgn}\!\left([\,H^R,\; H^I\,]\begin{bmatrix} X^R & X^I \\ -X^I & X^R \end{bmatrix}\right). \quad (47)$$
In this way, state evolution can be applied directly to our work. The JCD model expressed by (46) and (47) is still quantized bilinear. Dealing with the real and imaginary parts separately, it can be regarded as a variant of (2) with real values and a doubled size, that is, $2MK$ real-valued measurements, $2MN$ real-valued channel variables and $2NK$ real-valued symbol variables. Therefore, for conciseness of the derivation and without loss of generality, we only need to study the state evolution process of (2) in the real number field, which is equivalent to that of the complex JCD problem with half the size.
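A short NumPy check of the stacking in (46), in its noiseless form as written, is given below; the helper name and dimensions are illustrative only.

```python
import numpy as np

def complex_to_real_detection(H, X):
    """Real-valued decomposition (46): stack real/imaginary parts so that
    sgn() of the real model reproduces the one-bit measurements of (2)."""
    H_r = np.block([[H.real, -H.imag],
                    [H.imag,  H.real]])          # 2M x 2N
    X_r = np.vstack([X.real, X.imag])            # 2N x K
    return np.sign(H_r @ X_r)                    # stacked [Y^R; Y^I], 2M x K

rng = np.random.default_rng(1)
M, N, K = 8, 2, 4
H = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
X = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

Y = np.sign((H @ X).real) + 1j * np.sign((H @ X).imag)     # noiseless version of (2)
Y_r = complex_to_real_detection(H, X)
assert np.allclose(Y_r[:M], Y.real) and np.allclose(Y_r[M:], Y.imag)
```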

Two assumptions are made in this section.

Assumption 1: Users transmit BPSK symbols. Hence, the exponential approximation of the posterior probability in (26) does not change the soft information of the bits in the BPSK symbols during the iterations.

Assumption 2: The power of each user is equal, namely, $\alpha_1 = \cdots = \alpha_N = \alpha_0$. This is reasonable because practical systems usually employ uplink power control to avoid the near-far effect in MAC channels.

Besides, we use $K_1 = K_{\mathrm{pilot}}$ and $K_2 = K - K_{\mathrm{pilot}}$ for brevity.

A. State Evolution

Let us first introduce some important statistical parameters. Parameters about the data symbol variables are defined as
$$q^t_x = \mathbb{E}_{n,k>K_1}\big[(\mu^t_{nk})^2\big], \qquad c^t_x = \mathbb{E}_{n,k>K_1}\big[x^0_{nk}\mu^t_{nk}\big]. \quad (48)$$
Parameters about the channel variables are defined as
$$q^t_h = \mathbb{E}_{m,n}\big[(\nu^t_{mn})^2\big], \qquad c^t_h = \mathbb{E}_{m,n}\big[h^0_{mn}\nu^t_{mn}\big], \qquad Q^t_h = \mathbb{E}_{m,n}\big[(\nu^t_{mn})^2 + \rho^t_{mn}\big]. \quad (49)$$
Parameters about the channel transition nodes are defined as
$$q^t_p = \mathbb{E}_{m,k\le K_1}\big[(\zeta^t_{mk})^2\big], \qquad q^t_d = \mathbb{E}_{m,k>K_1}\big[(\zeta^t_{mk})^2\big], \quad (50)$$
$$\chi^t_p = \mathbb{E}_{m,k\le K_1}\big[\gamma^t_{mk}\big], \qquad \chi^t_d = \mathbb{E}_{m,k>K_1}\big[\gamma^t_{mk}\big], \quad (51)$$
$$\Delta^t_p = \mathbb{E}_{m,k\le K_1}\big[-2\partial_z\lambda(\varepsilon^t_{mk})\,z^t_{mk}\big], \qquad \Delta^t_d = \mathbb{E}_{m,k>K_1}\big[-2\partial_z\lambda(\varepsilon^t_{mk})\,z^t_{mk}\big]. \quad (52)$$

The parameters defined in (48)-(52) are used to derive the state evolution process of VAMP. The general strategy of the derivation resembles the argument in [28], but the details are entirely distinct. Proposition 1 describes the whole state evolution process, which contains 11 coupled expressions, much more complicated than its counterparts, such as BiG-AMP's evolution with three [28] and AMP's with only one [21].

Proposition 1. In the large-system limit $M, K, N \gg 1$, the asymptotic evolution of VAMP can be expressed using the parameters defined in (48)-(52) as in Algorithm 3.

    Proof: See Appendix B.

Algorithm 3 State Evolution of VAMP
Input: $M, N, K, K_{\mathrm{pilot}}, \alpha_0$.
Output: $c^t_x, q^t_x$; $c^t_h, q^t_h, Q^t_h$; $q^t_p, \chi^t_p, \Delta^t_p$; $q^t_d, \chi^t_d, \Delta^t_d$.
Initialization: $t = 1$, $c^t_x = q^t_x = 0$, $c^t_h = q^t_h = 0$, $Q^t_h = Q^0_h = (\alpha_0)^{-1}$.
while $t < t_{\max}$ do
  1. Calculate $\chi^t_p$, $q^t_p$, $\Delta^t_p$ as in (104), (109), (106);
  2. Calculate $\chi^t_d$, $q^t_d$, $\Delta^t_d$ as in (110), (112), (111);
  3. Calculate $c^{t+1}_h$, $q^{t+1}_h$ as in (99), (100);
  4. Calculate $Q^{t+1}_h$ as in (101);
  5. Calculate $c^{t+1}_x$, $q^{t+1}_x$ as in (102), (103).
end while

Remark 1: The five steps in Algorithm 3 correspond to the lines of VAMP (Algorithm 2) in sequential order. The first two steps in Algorithm 3 correspond to Lines 1-4 in Algorithm 2, the third step to Lines 5-6, the fourth step to Lines 7-8, and the fifth step to Lines 9-12.

Proposition 1 can be used to track the iterative behavior of VAMP in the JCD problem. We take $M = 500$, $N = 10$, $K = 500$, $K_{\mathrm{pilot}} = 100$ and $\alpha_0 = 1$ as an example. The theoretical values of $c^t_x, q^t_x, c^t_h, q^t_h, Q^t_h$ are plotted together with their simulated counterparts in Fig. 3. It can be observed that Proposition 1 matches the simulated experiments well.

Fig. 3 also shows that the channel variables converge very slowly, taking about 50 iterations in Fig. 3(b), while the symbol variables converge rapidly, leveling off within 10 iterations in Fig. 3(a). This indicates that the algorithm can be terminated even if the channel estimation process is far from over, provided that the data estimates have become stable. This helps reduce the detection delay of VAMP.

Remark 2: If VAMP is used for detection with perfect CSI or for training-based detection, Algorithm 3 still applies.

Case 1 (perfect CSI): In the detection problem assuming perfect CSI, the state evolution of VAMP can be expressed by Algorithm 3 with $K_{\mathrm{pilot}} = 0$ and $c^t_h = q^t_h = (\alpha_0)^{-1}$, $\forall t$. The resulting evolution is simplified to one with only the second and the fifth steps, and thus the number of equations is reduced to five.


Fig. 3. Theoretical and simulated results of VAMP's state evolution with M = 500, N = 10, K = 500, Kpilot = 100 and α0 = 1 (or equivalently, SNR = 6 dB at the receiver). (a) Parameters about the symbol variables ($c^t_x$, $q^t_x$). (b) Parameters about the channel variables ($c^t_h$, $q^t_h$, $Q^t_h$).

Case 2 (training-based): In training-based detection, Algorithm 3 needs to be run twice. The first run is for channel estimation, with $K = K_{\mathrm{pilot}}$ and $c^t_x = q^t_x = 1$; hence the first evolution includes only six equations, namely only the first, third and fourth steps in Algorithm 3. The second run is for data detection. The second evolution process is the same as in Case 1, except that $c^t_h, q^t_h, Q^t_h$, $\forall t$, are fixed at the convergence values $c^\infty_h, q^\infty_h, Q^\infty_h$ obtained from the channel estimation.

B. Bound of Detection Performance

Assume that VAMP's detection performance is evaluated by the mean square error (MSE) of the data estimates, defined as
$$\mathrm{mse}_X = \mathbb{E}_{n,k>K_1}\big[(x^0_{nk} - \mu^\infty_{nk})^2\big]. \quad (53)$$
Substituting (48) into $\mathrm{mse}_X$, we obtain
$$\mathrm{mse}_X = 1 + q^\infty_x - 2c^\infty_x, \quad (54)$$
where $\mu^\infty_{nk}$, $q^\infty_x$ and $c^\infty_x$ denote the convergence values of $\mu^t_{nk}$, $q^t_x$ and $c^t_x$.

The theoretical value of $\mathrm{mse}_X$ can be obtained by substituting the results of Proposition 1 into (54). Unfortunately, however, it is difficult to obtain analytical expressions of $q^\infty_x, c^\infty_x$ from such a complicated evolution. To tackle this problem, we switch from the JCD problem to a simplified training-based detection problem, where all $K$ symbols are regarded as pilot symbols during channel estimation, and the data symbols are then detected using the improved CSI. In this way, channel estimation and symbol detection are decoupled so that $\mathrm{mse}_X$ is much more convenient to obtain. Therefore, we can refer to an evolution process similar to Case 2, and obtain the lower bound of JCD's theoretical $\mathrm{mse}_X$ in Proposition 2.

Proposition 2. Applying VAMP to the training-based scheme whose pilot length is $K_{\mathrm{pilot}} = K$, JCD's performance in terms of MSE can be approximated as
$$\mathrm{mse}_X = \frac{1}{4M^2\big(\chi^\infty_p q^\infty_h + 1/K\big)^2}. \quad (55)$$
Here $(q^\infty_h, \chi^\infty_p)$ is the solution to the state evolution
$$c^{t+1}_h = \frac{K\kappa Q^0_h + K\Delta^t_p c^t_h}{(Q^t_h)^{-1} + K\chi^t_p}, \quad (56)$$
$$q^{t+1}_h = \frac{(K\kappa)^2 Q^0_h + (K\Delta^t_p)^2 q^t_h + 2K^2\kappa\Delta^t_p c^t_h + Kq^t_p}{\big((Q^t_h)^{-1} + K\chi^t_p\big)^2}, \quad (57)$$
$$Q^{t+1}_h = q^{t+1}_h + \frac{1}{(Q^t_h)^{-1} + K\chi^t_p}, \quad (58)$$
where $\chi^t_p$, $q^t_p$ and $\Delta^t_p$ are functions of $c^t_h$, $q^t_h$ and $Q^t_h$ given by (104), (109) and (106), $Q^0_h = 1/\alpha_0$ is the true power of the channel variables, and $\kappa$ is a constant defined in (96).

Proof: See Appendix C.

Proposition 2 provides an analytical expression for the detection performance. Define the SNR at the receiver using $N$ and $\alpha_0$ as
$$\mathrm{SNR} = 10\log\frac{\mathbb{E}[\|z_{mk}\|^2]}{\mathbb{E}[\|w_{mk}\|^2]} = 10\log\frac{N}{16\alpha_0/\pi}. \quad (59)$$

As a lower bound on the MSE, Proposition 2 neglects the effect of the pilot length, but clearly presents the roles of $M$, SNR and $K$ in the detection performance.
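For the simulations that follow, it may help to see how (59) ties the hyper-parameter $\alpha_0$ to the receive SNR. The sketch below reads the paper's "10 log" as base-10 and uses $\tau_w = 16/\pi$; both are assumptions made explicit here.

```python
import numpy as np

def alpha0_from_snr(snr_db, N, tau_w=16 / np.pi):
    """Invert (59), interpreting 'log' as log10: SNR = 10*log10((N/alpha0)/tau_w)."""
    return N / (tau_w * 10 ** (snr_db / 10))

def snr_from_alpha0(alpha0, N, tau_w=16 / np.pi):
    return 10 * np.log10(N / (alpha0 * tau_w))

a0 = alpha0_from_snr(6.0, N=10)       # channel precision yielding a 6 dB receive SNR
print(a0, snr_from_alpha0(a0, N=10))  # round-trip check: second value prints 6.0
```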

In (55), the antenna number $M$ exhibits an inverse-square relationship with $\mathrm{mse}_X$. Besides, the quantity $\chi^\infty_p q^\infty_h$ represents the influence of the SNR. The reason is that $Q^t_h$ and $Kq^t_p$ can be omitted in the expressions of $c^{t+1}_h, q^{t+1}_h$ as $K\to\infty$, and thus $K$ cancels out. As a result, the evolution in (56)-(58) has little dependence on $M$ and $K$, but is mainly determined by $N$ and $\alpha_0$ (i.e., the SNR). Finally, the block length $K$ helps improve the MSE, but its effect is limited. In practice, for given $M, N, \alpha_0$, a suitable $K$ can be inferred from the values of $\chi^\infty_p q^\infty_h$ and $1/K$ to shorten the detection delay at the expense of an acceptable performance loss.

Fig. 4 compares the simulated results with the theoretical ones derived from Propositions 1 and 2. Fig. 4(a) presents the MSE under different $M$ and SNR, with the other parameters set as $N = 10$, $K = 500$, $K_{\mathrm{pilot}} = 100$, $t_{\max} = 200$ (the maximum number of iterations), and shows that the performance improves significantly with larger $M$ and SNR.


Fig. 4. Detection performance of VAMP in terms of MSE with N = 10, Kpilot = 100, and tmax = 200; K = 500 in Fig. 4(a) and M = 200 in Fig. 4(b). (a) MSE versus antenna number and SNR (SNR = 3 dB, 10 dB; JCD simulated, Proposition 1 theoretical, Proposition 2 theoretical). (b) MSE versus block length (same curves).

Fig. 4(b) assumes $M = 200$, $N = 10$, $K_{\mathrm{pilot}} = 100$, $t_{\max} = 200$ and considers the effect of $K$, which appears to have a limited impact on the detection improvement. As expected, the theoretical curves follow trends similar to the simulated ones, with those of Proposition 2 acting as lower bounds.

VI. NUMERICAL RESULTS

In this section, we numerically evaluate the performance of the proposed VAMP algorithm. In the experiments below, the data and pilot sequences are assumed to be random QPSK sequences. The channel matrices are generated according to (9) with $\alpha_1 = \cdots = \alpha_N = \alpha_0$. The hyper-parameter $\alpha_0$ to be estimated determines the noise level via the received SNR defined in (59). The detection performance is evaluated by the bit-error rate (BER)$^3$. We conduct 1000 trials for each setting and average the numerical results.

Three inference detectors are considered for comparison: the proposed VAMP, BiG-AMP [13] and VBI [30]. The BiG-AMP algorithm is implemented based on the open-source "GAMPmatlab" software suite [39].

$^3$BER is similar to $\mathrm{mse}_X$ defined in (54); both measure the squared error between the estimates and the true signals. The difference is that $\mathrm{mse}_X$ concerns soft decisions while BER concerns hard decisions.

Fig. 5. BER performance versus SNR in the relatively high SNR regime with M = 200, N = 50, K = 500, Kpilot = 50 (detectors: LS+MRC, VBI, VAMP, BiG-AMP).

Note that damping is used adaptively (the method can be found in [25]); that is, we use it whenever it improves the performance, and otherwise we do not. Besides, the linear detector in [8], [12] is also regarded as a benchmark. It combines least-squares (LS) channel estimation with maximal-ratio combining (MRC) processing to form a JCD detector, and is thus denoted as "LS+MRC". Although suboptimal, it has been shown to achieve the channel capacity in such non-linear systems. All four algorithms have first-order complexity in terms of the antenna number $M$ and the block length $K$. They are all iterative, and are terminated once the data estimates remain unchanged for 10 iterations or $t_{\max} = 200$ is reached.

    A. BER versus SNR

In order to compare the proposed algorithm with the benchmark algorithms, we first check the consistency with existing works using the same simulation settings. To this end, we examine the quantized MIMO system with antenna number $M = 200$, user number $N = 50$, block length $K = 500$ and pilot length $K_{\mathrm{pilot}} = 50$, as in [13], [40]. As shown in Fig. 5, the performance of BiG-AMP coincides well with the results in [40]; VAMP can also accomplish the joint detection task, and it achieves better performance than VBI and the linear detector, but is not as good as BiG-AMP in the relatively high SNR regime.

However, in the relatively low SNR regime, VAMP shows valuable robustness in comparison with BiG-AMP. Fig. 6 presents the BER versus SNR performance with parameter settings different from Fig. 5, and shows that VAMP can work well in scenarios with relatively small $M$, low SNR and a small pilot percentage ($K_{\mathrm{pilot}}/K$), while BiG-AMP cannot. Specifically, in Fig. 6(a), BiG-AMP performs better than the other three counterparts at SNR = 3 dB, but it fails to converge at SNR lower than 0 dB when the parameters are $M = 200$, $N = 10$, $K = 1000$ and $K_{\mathrm{pilot}} = 10$. The SNR threshold of convergence reduces to -2 dB with a longer pilot length $K_{\mathrm{pilot}} = 20$ (the dashed line in Fig. 6(a)), or with a shorter block length $K = 700$ (the solid line in Fig. 6(b)). Fig. 6(b) also shows that the SNR threshold decreases significantly with the antenna number $M$, e.g., from 0 dB for $M = 150$ to -2 dB for $M = 200$, with $N = 10$, $K = 700$, $K_{\mathrm{pilot}} = 10$.


Fig. 6. BER performance versus SNR in the relatively low SNR regime with different parameter settings (detectors: LS+MRC, VBI, VAMP, BiG-AMP). (a) M = 200, N = 10, K = 1000, Kpilot = 10 (solid), 20 (dashed). (b) M = 200 (solid), 150 (dashed), N = 10, K = 700, Kpilot = 10.

Besides, it can also be seen from Fig. 6 that VBI exhibits robustness similar to the proposed VAMP, but its performance is inferior to VAMP's in all SNR regimes. The linear detector does not work well with $K_{\mathrm{pilot}} = N$ in Figs. 5 and 6. It achieves a BER below $10^{-1}$ only in Fig. 6(a) with $K_{\mathrm{pilot}} = 2N = 20$, which is still inferior to the three inference algorithms.

Remark 3: Although convergence can be guaranteed, VAMP suffers a performance loss compared with BiG-AMP. The reason is that, besides the approximation for marginalization (Taylor expansion in BiG-AMP and variational approximation in VAMP), an additional exponential approximation (19) is used for the conditional probability function of the quantized channel $P(Y\mid H,X)$, and thus extra error is introduced.

Remark 4: An intuitive explanation is provided for why VAMP can work in some scenarios with relatively low SNR, small $M$, and small $K_{\mathrm{pilot}}/K$ where BiG-AMP cannot.

We begin with an example to illustrate the underlying behavior of the bilinear model. An unquantized AWGN channel $y = hx + w$ is considered, where $h$ and $x$ are two scalar variables and $w$ represents Gaussian noise. We denote the two variables as $h\sim\mathcal{N}(\nu^t,\rho^t)$, $x\sim\mathcal{N}(\mu^t,\tau^t)$, $\nu^t,\mu^t\neq 0$, and the offset as $\Delta^t = y - \nu^t\mu^t$. Based on marginalization such as (28) and (29), the variable updates can be expressed as
$$\nu^{t+1} = \frac{(\mu^t)^2}{(\mu^t)^2+\tau^t}\,\nu^t + \frac{\Delta^t\mu^t}{(\mu^t)^2+\tau^t}, \qquad \mu^{t+1} = \frac{(\nu^t)^2}{(\nu^t)^2+\rho^t}\,\mu^t + \frac{\Delta^t\nu^t}{(\nu^t)^2+\rho^t}. \quad (60)$$
A quadratic term in $\Delta^t$ then appears in the new offset $\Delta^{t+1} = y - \nu^{t+1}\mu^{t+1}$. In this way, the necessary condition for convergence, $\|\Delta^{t+1}\| < \|\Delta^t\|$, is not always guaranteed, especially when $\mu^t$ and $\nu^t$ are both inaccurate. On the other hand, the initial channel estimates are calculated from the pilot sequences, and the initial symbol estimates involve $K_{\mathrm{pilot}}$ known pilot symbols and $K-K_{\mathrm{pilot}}$ inaccurate data estimates obtained from the channel estimates. Specifically, low SNR and small $K_{\mathrm{pilot}}$ result in inaccurate channel estimates, while low SNR, small $M$, inaccurate channel estimates and a small $K_{\mathrm{pilot}}/K$ make the overall accuracy of the initial symbol estimates low. Therefore, the divergence risk increases significantly in the bilinear model when low SNR, small $M$, and small $K_{\mathrm{pilot}}/K$ are encountered.
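A small numerical sketch of the scalar updates (60) is given below; the initial values of $(\nu, \rho, \mu, \tau)$ are hypothetical and only serve to show how the offset $\Delta^t$ evolves.

```python
import numpy as np

def scalar_bilinear_updates(y, nu, rho, mu, tau, iters=8):
    """Iterate the scalar updates (60) for y = h*x + w and record |Delta^t|,
    where Delta^t = y - nu^t * mu^t."""
    offsets = []
    for _ in range(iters):
        delta = y - nu * mu
        offsets.append(abs(delta))
        # simultaneous update using the time-t values, as in (60)
        nu, mu = ((mu ** 2 * nu + delta * mu) / (mu ** 2 + tau),
                  (nu ** 2 * mu + delta * nu) / (nu ** 2 + rho))
    return offsets

# Two hypothetical initializations; the offset magnitude need not decrease
# monotonically (in the first run it grows after the first step even though the
# initial estimates are fairly accurate), illustrating the issue discussed above.
print(scalar_bilinear_updates(y=1.0, nu=0.95, rho=0.01, mu=1.05, tau=0.01))
print(scalar_bilinear_updates(y=1.0, nu=0.2,  rho=1.0,  mu=-0.3, tau=1.0))
```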

Fig. 7. Values of ζ in BiG-AMP and VAMP for a quantized channel y = sgn(x + w), assuming w ∼ N(0, √(π/8)), y = 1 and x ∼ N(x; z, 1).

Now we take a close look at the key difference between the two algorithms. It comes from the update expression of $\zeta^t_{mk}$, the correction term applied to the variable estimates in both BiG-AMP and VAMP. As shown in Fig. 7, the two curves of $\zeta$ overlap for $z > 0$, namely, the two algorithms behave nearly identically when the estimate is consistent with the measurement. Differences appear for $z < 0$, where the value of $\zeta$ increases unboundedly in BiG-AMP but is bounded in VAMP. Returning to VAMP's initial symbol estimates, this indicates that messages with positive effects are conserved, such as those from the $K_{\mathrm{pilot}}$ pilot symbols, while messages with negative effects, such as those from the $K-K_{\mathrm{pilot}}$ inaccurate data symbol estimates, are limited. Therefore, VAMP is less sensitive to the pilot percentage $K_{\mathrm{pilot}}/K$ than BiG-AMP, and thus enjoys a wider operating range in SNR and antenna number $M$. This behavior is further verified by the numerical results below.

B. BER versus Other Parameters

Fig. 8 reveals the role of the antenna number $M$ on the detection performance. The parameter settings are SNR = 0 dB, $N = 10$, $K = 1000$, and $K_{\mathrm{pilot}} = 10, 20, 30$.


Fig. 8. BER performance versus antenna number M, with SNR = 0 dB, N = 10, K = 1000, Kpilot = 10 (solid), 20 (dashed), 30 (dot-dash) (detectors: LS+MRC, VBI, VAMP, BiG-AMP).

Fig. 9. BER performance versus pilot length Kpilot with SNR = 0 dB, 5 dB, M = 100, N = 10, K = 1000 (detectors: LS+MRC, VBI, VAMP, BiG-AMP).

It can be seen that the BERs of all four algorithms improve with higher SNR and larger $M$. The linear detector is sensitive to the pilot length, and requires more than $K_{\mathrm{pilot}} = 30$ to achieve its optimal performance. When the pilot length is sufficient, the performance of VAMP is better than VBI and LS+MRC but inferior to BiG-AMP; the gaps are negligible in low SNR regimes but become gradually more pronounced as the SNR increases. In Fig. 8, BiG-AMP requires at least $M = 210$ antennas to accomplish the detection with $K_{\mathrm{pilot}} = 10$. This antenna threshold decreases with the pilot length, e.g., $M = 130$ for $K_{\mathrm{pilot}} = 20$. In addition, $K_{\mathrm{pilot}} = 30$ is needed to accomplish the detection with $M = 100$ antennas.

Fig. 9 presents the role of $K_{\mathrm{pilot}}$ on the detection performance. The parameter settings are $M = 100$, $N = 10$, $K = 1000$ and SNR = 0 dB, 5 dB. It shows that increasing $K_{\mathrm{pilot}}$ helps little to improve the detection performance of the four algorithms. Hence, more pilot symbols are unnecessary once convergence is guaranteed. Specifically, LS+MRC needs about $K_{\mathrm{pilot}} = 20 \sim 50$ to avoid performance loss. BiG-AMP requires $K_{\mathrm{pilot}} = 30$ to avoid divergence at SNR = 0 dB, and $K_{\mathrm{pilot}} = 20$ when the SNR increases to 5 dB. In contrast, VAMP and VBI can work well with only $K_{\mathrm{pilot}} = N = 10$.

Fig. 10. BER performance versus block length K, with SNR = 0 dB, 5 dB, M = 150, N = 10, Kpilot = 10 (solid), 30 (dashed) (detectors: LS+MRC, VBI, VAMP, BiG-AMP).

Fig. 11. BER performance versus user number N (detectors: VBI, VAMP, BiG-AMP): (a) M = 150, SNR = 5 dB, K = 1000, Kpilot = N; (b) M = 150, SNR = 0 dB, K = 1000, Kpilot = N (solid), Kpilot = 3N (dashed); (c) M = 100, SNR = 0 dB, K = 1000, Kpilot = N (solid), Kpilot = 3N (dashed).

Fig. 10 discusses the effect of the block length $K$. The parameter settings are $M = 150$, $N = 10$, $K_{\mathrm{pilot}} = 10, 30$ and SNR = 0 dB, 5 dB. It shows that the BER can be improved with a longer block length $K$, but the gain decreases gradually. Therefore, a suitable block length $K$ should be chosen carefully to shorten the detection delay at the expense of an acceptable performance loss. BiG-AMP fails to converge at SNR = 0 dB and $K_{\mathrm{pilot}} = 10$ when the block length exceeds $K = 800$, which can be avoided with either a higher SNR (5 dB) or a longer pilot length $K_{\mathrm{pilot}} = 30$. VAMP never suffers from this kind of problem, and presents better performance than VBI and LS+MRC.

Fig. 11 discusses the supportable number of users $N$ under different parameters. The parameter settings are divided into three groups: (a) $M = 150$, SNR = 5 dB, $K = 1000$; (b) $M = 150$, SNR = 0 dB, $K = 1000$; (c) $M = 100$, SNR = 0 dB, $K = 1000$. Curves with low pilot overhead $K_{\mathrm{pilot}} = N$ are drawn with solid lines, while those with relatively high pilot overhead $K_{\mathrm{pilot}} = 3N$ are drawn with dashed lines for comparison. For clarity, the linear detector LS+MRC is not presented. It can be


seen that for given M, K, SNR and assuming Kpilot = N, the performances of VBI and the proposed VAMP both degrade as the user number increases; the maximum user numbers that achieve a BER lower than 0.2 in groups (a), (b) and (c) are about N = 50, 30, 20, respectively. BiG-AMP presents the best performance using Kpilot = N with M = 150 and SNR = 5 dB. However, when the antenna number and the SNR are reduced, BiG-AMP fails to support even a small user number with low pilot overhead, such as N = 10 with M = 150, SNR = 0 dB. The problem becomes more serious with M = 100, SNR = 0 dB, where BiG-AMP cannot support any user number with low pilot overhead Kpilot = N, since the ratio Kpilot/K is too low for N = 10, 20, while the antenna number and SNR are also too low for N ≥ 30. In such scenarios, BiG-AMP can only resort to long pilot sequences such as Kpilot = 3N.

    VII. CONCLUSION

In this paper, we propose a vertex-message based inference algorithm called VAMP for the JCD of one-bit quantized massive MIMO systems. The proposed scheme requires only binary measurements and pilot sequences whose length equals the user number to implement channel estimation, data detection and noise level estimation simultaneously. Borrowing ideas from the two related algorithms BiG-AMP and VBI, VAMP overcomes their respective flaws by combining variational approximation with the framework of AMP. An asymptotic state evolution analysis is derived to describe the iterative behavior of VAMP in the large-scale-system limit. Extensive simulations verify the advantages of VAMP: it outperforms the linear detector LS+MRC and the VBI approach in all SNR regimes, and can accomplish the detection task in some scenarios with relatively low SNR and low pilot percentage where BiG-AMP cannot, thus showing a certain robustness.

A number of interesting and important problems remain open beyond this paper, such as extending the state evolution analysis to high-order modulation, considering the effect of channel correlation on the specific detector structure, and taking channel capacity and energy consumption into account to provide a more comprehensive analysis, among others. Additionally, to make this approach effective for wide-band channels, which are more appealing for future communication systems, it is particularly crucial to design the JCD method for quantized MIMO systems in frequency-selective fading channels [41]. When one adopts multi-carrier transceiver architectures like MIMO-OFDM, the one-bit ADCs operate in the time domain rather than the frequency domain. Hence, severe inter-carrier interference due to quantization is inevitable and the decoder needs to properly account for it. This is feasible within the general framework of Bayesian inference (see [42] for reference) but beyond the scope of this paper, and will be treated in our future work.

APPENDIX A
DERIVATION OF ALGORITHM 2

To begin with, we define

\phi^t_{mk}(h_{mn}x_{nk}) = \mathcal{CN}\left(h_{mn}x_{nk};\ \frac{\zeta^t_{mk}}{\gamma^t_{mk}},\ \frac{1}{\gamma^t_{mk}}\right)   (61)

as the vertex message corresponding to the channel transition node P(y_{mk}|z_{mk}), with

\gamma^t_{mk} = \gamma^t_{mk\to(mn,nk)},   (62)

\zeta^t_{mk} = \gamma^t_{mk}\left(\frac{y_{mk}}{2}\left(\frac{1}{2\lambda(\varepsilon^{R,t}_{mk})} + i\,\frac{1}{2\lambda(\varepsilon^{I,t}_{mk})}\right) - \sum_{n'} \nu^t_{mn'\to mk}\,\mu^t_{n'k\to mk}\right).   (63)

Secondly, we turn to the cloning node and deal with the channel variable h_{mn} first. Denote its vertex message as \phi^{t+1}_{mn}(h_{mn}) = \mathcal{CN}(h_{mn};\ \nu^{t+1}_{mn},\ \rho^{t+1}_{mn}), with

\rho^{t+1}_{mn} = \left(\alpha^t_n + \sum_{k'} \gamma^t_{mk'\to(mn,nk')}\left(|\mu^t_{nk'\to mk'}|^2 + \tau^t_{nk'\to mk'}\right)\right)^{-1},   (64)

\nu^{t+1}_{mn} = \rho^{t+1}_{mn}\left(\sum_{k'} (\mu^t_{nk'\to mk'})^* \zeta^t_{mk'\to(mn,nk')}\right).   (65)

The difference between the edge messages \phi^{t+1}_{mn\to mk}(h_{mn}) and the vertex messages \phi^{t+1}_{mn}(h_{mn}) is the one-out-of-K exclusion in the sums of \gamma^t_{mk'\to(mn,nk')} and (\mu^t_{nk'\to mk'})^* \zeta^t_{mk'\to(mn,nk')}. In the large-system limit, we have

\frac{\rho^{t+1}_{mn\to mk}}{\rho^{t+1}_{mn}} = 1 + O\!\left(\frac{1}{K}\right),   (66)

\frac{\sum_{k'\neq k} (\mu^t_{nk'\to mk'})^* \zeta^t_{mk'\to(mn,nk')}}{\sum_{k'} (\mu^t_{nk'\to mk'})^* \zeta^t_{mk'\to(mn,nk')}} = 1 + O\!\left(\frac{1}{\sqrt{K}}\right).   (67)

Regarding O(1/\sqrt{K}) as the leading order and omitting O(1/K), we obtain

\rho^{t+1}_{mn\to mk} \approx \rho^{t+1}_{mn},   (68)

\nu^{t+1}_{mn\to mk} = \rho^{t+1}_{mn}\left(\sum_{k'\neq k} (\mu^t_{nk'\to mk'})^* \zeta^t_{mk'\to(mn,nk')}\right).   (69)

Therefore, replacing \gamma^t_{mk\to(mn,nk)} and \zeta^t_{mk\to(mn,nk)} with \gamma^t_{mk} and \zeta^t_{mk}, we can rewrite \phi^{t+1}_{mn}(h_{mn}) in (64) and (65) using other vertex messages as

\rho^{t+1}_{mn} = \left(\alpha^t_n + \sum_{k'} \gamma^t_{mk'}\left(|\mu^t_{nk'}|^2 + \tau^t_{nk'}\right)\right)^{-1},   (70)

\nu^{t+1}_{mn} = \nu^t_{mn}\left(1 - \rho^{t+1}_{mn}\left(\alpha^t_n + \sum_{k'} \gamma^t_{mk'}\tau^t_{nk'}\right)\right)   (71)
\qquad\qquad + \rho^{t+1}_{mn}\sum_{k'} (\mu^t_{nk'})^* \zeta^t_{mk'},   (72)

and express the relationship between \phi^{t+1}_{mn\to mk}(h_{mn}) and \phi^{t+1}_{mn}(h_{mn}) also using vertex messages as

\rho^{t+1}_{mn\to mk} = \rho^{t+1}_{mn},   (73)

\nu^{t+1}_{mn\to mk} = \nu^{t+1}_{mn} - \rho^{t+1}_{mn}(\mu^t_{nk})^* \zeta^t_{mk}.   (74)
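To make the bookkeeping in (70)-(74) concrete, the following Python sketch evaluates the channel-variable vertex messages and recovers the edge messages from them; the array dimensions and the synthetic random inputs are illustrative assumptions only, not part of Algorithm 2.

```python
import numpy as np

# Illustrative sketch of (70)-(74): vertex messages for the channel variables h_{mn}
# and the edge messages recovered from them. All inputs are synthetic placeholders.
M, N, K = 8, 2, 16
rng = np.random.default_rng(0)

alpha = np.ones(N)                                                        # E[alpha_n^t]
gamma = np.abs(rng.standard_normal((M, K)))                               # gamma_{mk}^t
zeta = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))     # zeta_{mk}^t
mu = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))       # mu_{nk}^t
tau = np.abs(rng.standard_normal((N, K)))                                 # tau_{nk}^t
nu = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))       # nu_{mn}^t

# (70): rho_{mn}^{t+1} = (alpha_n + sum_k gamma_{mk}(|mu_{nk}|^2 + tau_{nk}))^{-1}
rho_new = 1.0 / (alpha[None, :] + gamma @ (np.abs(mu) ** 2 + tau).T)

# (71)-(72): nu_{mn}^{t+1}, with the correction factor applied to the old estimate
nu_new = nu * (1.0 - rho_new * (alpha[None, :] + gamma @ tau.T)) \
    + rho_new * (zeta @ np.conj(mu).T)

# (73)-(74): edge messages expressed through the vertex messages
rho_edge = np.broadcast_to(rho_new[:, :, None], (M, N, K))                # rho_{mn->mk}
nu_edge = nu_new[:, :, None] - rho_new[:, :, None] * np.conj(mu)[None, :, :] * zeta[:, None, :]
```

The per-edge quantities are never stored in Algorithm 2 itself; they are shown here only to illustrate how (73)-(74) reduce each edge message to a rank-one correction of the corresponding vertex message.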


In (71) and (74), \tau^t_{nk\to mk} and \mu^t_{nk\to mk} have been replaced with \tau^t_{nk} and \mu^t_{nk}, which will be defined later; the effect of this difference is negligible here.

Thirdly, based on the variational approximation (15), the message \phi^t(\alpha_n) = \Gamma(\alpha_n; a^t_n, b^t_n) can be calculated as

\phi^t(\alpha_n) = P_\alpha(\alpha_n)\prod_m \int P(h_{mn}|\alpha_n)\,\phi^t_{mn}(h_{mn})\,dh_{mn} + \mathrm{const},   (75)

and thus we can obtain

a^t_n = a_0 + \frac{M}{2},\qquad b^t_n = b_0 + \frac{1}{2}\sum_m \left(|\nu^t_{mn}|^2 + \rho^t_{mn}\right).   (76)

According to the Bayesian optimal estimator, we have

\mathbb{E}[\alpha^t_n] = a^t_n / b^t_n.   (77)
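A minimal sketch of the precision update (76)-(77) is given below; the hyperparameters a_0, b_0 and the message moments are illustrative placeholders.

```python
import numpy as np

# Sketch of (76)-(77): Gamma message on the channel precision alpha_n.
M, N = 8, 2
a0, b0 = 1e-3, 1e-3          # assumed broad Gamma hyperprior (placeholder values)
rng = np.random.default_rng(1)
nu = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))   # nu_{mn}^t
rho = np.abs(rng.standard_normal((M, N)))                             # rho_{mn}^t

a_n = a0 + M / 2.0                                        # (76): identical for every n
b_n = b0 + 0.5 * np.sum(np.abs(nu) ** 2 + rho, axis=0)    # (76): one value per user n
alpha_hat = a_n / b_n                                     # (77): E[alpha_n^t]
```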

Fourthly, \tilde{\phi}^t_{nk}(x_{nk}) = \mathcal{CN}(x_{nk};\ \tilde{\mu}^t_{nk},\ \tilde{\tau}^t_{nk}) can be calculated in a similar way as \phi^t_{mn}(h_{mn}):

\tilde{\tau}^{t+1}_{nk} = \left(1 + \sum_{m'} \gamma^t_{m'k}\left(|\nu^t_{m'n}|^2 + \rho^t_{m'n}\right)\right)^{-1},   (78)

\tilde{\mu}^{t+1}_{nk} = \mu^t_{nk}\left(1 - \tilde{\tau}^{t+1}_{nk}\left(1 + \sum_{m'} \gamma^t_{m'k}\rho^t_{m'n}\right)\right) + \tilde{\tau}^{t+1}_{nk}\sum_{m'} (\nu^t_{m'n})^* \zeta^t_{m'k}.   (79)

Naturally, \phi^t_{nk}(x_{nk}) = \mathcal{CN}(x_{nk};\ \mu^t_{nk},\ \tau^t_{nk}) is defined and can be expressed as

\mu^{t+1}_{nk} = F_1(\tilde{\mu}^{t+1}_{nk}, \tilde{\tau}^{t+1}_{nk}),   (80)

\tau^{t+1}_{nk} = F_2(\tilde{\mu}^{t+1}_{nk}, \tilde{\tau}^{t+1}_{nk}).   (81)

Something to note is that the difference between \mu^{t+1}_{nk\to mk} and \mu^{t+1}_{nk} can be approximated as in [33]:

\mu^{t+1}_{nk\to mk} = \mu^{t+1}_{nk} - \tau^{t+1}_{nk}(\nu^t_{mn})^* \zeta^t_{mk}.   (82)
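The symbol updates (78)-(82) admit a similarly compact sketch. F_1 and F_2 denote the symbol-posterior mean and variance functions defined earlier in the paper; the tanh-based pair below is the standard BPSK choice and is used here only as an assumption for the sake of a runnable example.

```python
import numpy as np

# Sketch of (78)-(82) for real BPSK symbols. The tanh-based F1/F2 below are an
# assumed stand-in for the posterior-moment functions used in the paper.
def F1(mu_t, tau_t):
    return np.tanh(mu_t / tau_t)          # posterior mean of x in {-1,+1}

def F2(mu_t, tau_t):
    return 1.0 - F1(mu_t, tau_t) ** 2     # posterior variance

M, N, K = 8, 2, 16
rng = np.random.default_rng(2)
gamma = np.abs(rng.standard_normal((M, K)))   # gamma_{mk}^t
zeta = rng.standard_normal((M, K))            # zeta_{mk}^t (real-valued sketch)
nu = rng.standard_normal((M, N))              # nu_{mn}^t
rho = np.abs(rng.standard_normal((M, N)))     # rho_{mn}^t
mu = rng.standard_normal((N, K))              # mu_{nk}^t

# (78)-(79): pseudo-observation message for x_{nk}
tau_tilde = 1.0 / (1.0 + (np.abs(nu) ** 2 + rho).T @ gamma)
mu_tilde = mu * (1.0 - tau_tilde * (1.0 + rho.T @ gamma)) + tau_tilde * (nu.T @ zeta)

# (80)-(81): symbol posterior moments
mu_new, tau_new = F1(mu_tilde, tau_tilde), F2(mu_tilde, tau_tilde)

# (82): rank-one correction giving the edge message mu_{nk->mk}
mu_edge = mu_new[None, :, :] - tau_new[None, :, :] * nu[:, :, None] * zeta[:, None, :]
```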

Finally, we return to the beginning and derive \phi^t_{mk}(h_{mn}x_{nk}) using the vertex messages mentioned above. Define z^t_{mk} as

z^t_{mk} = \sum_{n'} \nu^t_{mn'\to mk}\,\mu^t_{n'k\to mk}.   (83)

Using (74) and (82), z^t_{mk} can be approximated as

z^t_{mk} \approx \sum_{n'} \nu^t_{mn'}\mu^t_{n'k} - \zeta^{t-1}_{mk}\sum_{n'}\left((\nu^t_{mn'}\nu^{t-1}_{mn'})^* \tau^t_{n'k} + (\mu^t_{n'k}\mu^{t-1}_{n'k})^* \rho^t_{mn'}\right),   (84)

whose second term is the representative Onsager reaction term in AMP-based algorithms and does not appear in VBI with mere variational approximation.

Using z^t_{mk}, the expression of \varepsilon^t_{mk} can be approximated as

(\varepsilon^{R,t}_{mk})^2 = |(z^t_{mk})^R|^2 + \frac{1}{2}\delta^t_{mk},\qquad (\varepsilon^{I,t}_{mk})^2 = |(z^t_{mk})^I|^2 + \frac{1}{2}\delta^t_{mk},   (85)

where (z^t_{mk})^2 is regarded as the dominant term and \delta^t_{mk} = \sum_n\left(\rho^t_{mn}\tau^t_{nk} + |\mu^t_{nk}|^2\rho^t_{mn} + |\nu^t_{mn}|^2\tau^t_{nk}\right). Substituting z^t_{mk} and \varepsilon^t_{mk} into (62) and (63), \gamma^t_{mk} and \zeta^t_{mk} become

\gamma^t_{mk} = \frac{4\lambda(\varepsilon^{R,t}_{mk})\,\lambda(\varepsilon^{I,t}_{mk})}{2\lambda(\varepsilon^{R,t}_{mk}) + 2\lambda(\varepsilon^{I,t}_{mk})},   (86)

\zeta^t_{mk} = \gamma^t_{mk}\left(\frac{y_{mk}}{2}\left(\frac{1}{2\lambda(\varepsilon^{R,t}_{mk})} + i\,\frac{1}{2\lambda(\varepsilon^{I,t}_{mk})}\right) - z^t_{mk}\right).   (87)

    This completes the derivation.
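Putting the pieces together, the sketch below evaluates the Onsager-corrected estimate (84) and the resulting (gamma, zeta) in (85)-(87). The function lam() is a placeholder for the paper's lambda(.), which is not restated here; the Jaakkola-Jordan form tanh(e/2)/(4e) is assumed purely so that the example runs.

```python
import numpy as np

# Sketch of (84)-(87); lam() is an assumed stand-in for the paper's lambda(.).
def lam(e):
    e = np.maximum(np.abs(e), 1e-12)
    return np.tanh(e / 2.0) / (4.0 * e)

M, N, K = 8, 2, 16
rng = np.random.default_rng(3)
y = np.sign(rng.standard_normal((M, K))) + 1j * np.sign(rng.standard_normal((M, K)))  # one-bit samples
nu, nu_old = rng.standard_normal((2, M, N)) + 1j * rng.standard_normal((2, M, N))     # nu^t, nu^{t-1}
mu, mu_old = rng.standard_normal((2, N, K)) + 1j * rng.standard_normal((2, N, K))     # mu^t, mu^{t-1}
rho = np.abs(rng.standard_normal((M, N)))
tau = np.abs(rng.standard_normal((N, K)))
zeta_old = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))             # zeta^{t-1}

# (84): plug-in estimate minus the Onsager reaction term
z = nu @ mu - zeta_old * (np.conj(nu * nu_old) @ tau + rho @ np.conj(mu * mu_old))

# (85): effective arguments for the real and imaginary parts
delta = rho @ tau + rho @ (np.abs(mu) ** 2) + (np.abs(nu) ** 2) @ tau
eps_R = np.sqrt(np.real(z) ** 2 + 0.5 * delta)
eps_I = np.sqrt(np.imag(z) ** 2 + 0.5 * delta)

# (86)-(87): updated channel-transition-node parameters
gamma = 4.0 * lam(eps_R) * lam(eps_I) / (2.0 * lam(eps_R) + 2.0 * lam(eps_I))
zeta = gamma * ((y / 2.0) * (1.0 / (2.0 * lam(eps_R)) + 1j / (2.0 * lam(eps_I))) - z)
```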

APPENDIX B
DERIVATION OF PARAMETERS IN STATE EVOLUTION

    A. Parameters about Variable Nodes

In the real number field, the expressions of \varepsilon^t_{mk}, \gamma^t_{mk} and \zeta^t_{mk} in Algorithm 2 can be simplified into

(\varepsilon^t_{mk})^2 = (z^t_{mk})^2 + \sum_n\left(\rho^t_{mn} + (\nu^t_{mn})^2\tau^t_{nk}\right),
\gamma^t_{mk} = 2\lambda(\varepsilon^t_{mk}),
\zeta^t_{mk} = \frac{y_{mk}}{2} - \gamma^t_{mk} z^t_{mk}.   (88)

Based on this, the expressions of q^t_h, c^t_h, Q^t_h can be derived as follows; q^t_x, c^t_x can be obtained accordingly.

It can be observed that the update of \nu^t_{mn} has the form \nu^{t+1}_{mn} = A^{t+1}_{mn}/B^{t+1}_{mn}, where

A^{t+1}_{mn} \equiv \sum_k \mu^t_{nk\to mk}\,\zeta^t_{mk\to(mn,nk)},   (89)

B^{t+1}_{mn} \equiv \alpha^t_n + \sum_k \gamma^t_{mk}.   (90)

Easily, B^{t+1}_{mn} can be calculated as

B^{t+1}_{mn} = (Q^t_h)^{-1} + K_1\chi^t_p + K_2\chi^t_d.   (91)

As far as A^{t+1}_{mn} is concerned, the message \mu^t_{nk\to mk} is uncorrelated with all the other incoming messages included in z^t_{mk\to(mn,nk)} and also with all the x^0_{n'k} for n' \neq n. It is, however, correlated with x^0_{nk} through y_{mk}/2 and with \mu^t_{nk\to mk} through \varepsilon^t_{mk}. Hence, the dependent and independent parts have to be treated separately. After an expansion we obtain

\zeta^t_{mk\to(mn,nk)} \approx \left(\frac{y_{mk\backslash n}}{2} - 2\lambda(\varepsilon^t_{mk\backslash n})\,z^t_{mk\to(mn,nk)}\right) + \partial_{z^0}\frac{y_{mk\backslash n}}{2}\, h^0_{mn}x^0_{nk} - 2\left(\partial_z\lambda(\varepsilon^t_{mk\backslash n})\right) z^t_{mk\to(mn,nk)}\,\nu^t_{mn\to mk}\,\mu^t_{nk\to mk},   (92)

where y_{mk\backslash n} = \mathrm{sgn}\!\left(\sum_{n'\neq n} h^0_{mn'}x^0_{n'k} + w^0_{mk}\right) and (\varepsilon^t_{mk\backslash n})^2 = (z^t_{mk\to(mn,nk)})^2 + \sum_{n'\neq n}\left(\rho^t_{mn'\to mk} + (\nu^t_{mn'\to mk})^2\tau^t_{n'k\to mk}\right).

Using (92) to approximate A^{t+1}_{mn}, we obtain

A^{t+1}_{mn} = \sqrt{K_1 q^t_p + K_2 q^t_x q^t_d}\; v + (K_1 + K_2 c^t_x)\,\kappa\, h^0_{mn} + (K_1\Delta^t_p + K_2 q^t_x\Delta^t_d)\,\nu^t_{mn},   (93)

where K_1, K_2, N \gg 1 is assumed. Here v is a standard Gaussian variable, and the first term in (93) represents the Gaussian contribution arising from the central limit theorem, under the approximations

\mathbb{E}_{m,n,k}\!\left[\frac{y_{mk\backslash n}}{2} - 2\lambda(\varepsilon^t_{mk\backslash n})\,z^t_{mk\to(mn,nk)}\right] = 0,   (94)

\mathbb{E}_{m,n,k}\!\left[\left(\frac{y_{mk\backslash n}}{2} - 2\lambda(\varepsilon^t_{mk\backslash n})\,z^t_{mk\to(mn,nk)}\right)^{2}\right] \approx \mathbb{E}_{m,k}\!\left[\left(\frac{y_{mk}}{2} - 2\lambda(\varepsilon^t_{mk})\,z^t_{mk}\right)^{2}\right].   (95)



In the second term, we calculate \kappa = \mathbb{E}_{m,k}\!\left[\partial_{z^0}\frac{y_{mk}}{2}\right] as

\kappa = \iint \phi\!\left(\frac{z^0}{\sqrt{NQ^0_h}}\right)\partial_{z^0}P(y|z^0)\,\frac{y}{2}\,dy\,dz^0 = \frac{1}{\sqrt{2\pi\left(NQ^0_h + \tau_w\right)}}.   (96)

Substituting (91) and (93) into the definitions in (49), we can calculate c^t_h, q^t_h as

c^{t+1}_h = \mathbb{E}_{m,n}\!\left[\int \frac{A^{t+1}_{mn}}{B^{t+1}_{mn}}\, h^0_{mn}\,\Phi(v)\,dv\right],   (97)

q^{t+1}_h = \mathbb{E}_{m,n}\!\left[\int \left(\frac{A^{t+1}_{mn}}{B^{t+1}_{mn}}\right)^{2}\Phi(v)\,dv\right],   (98)

and obtain

c^{t+1}_h = \frac{\kappa(K_1 + K_2 c^t_x)Q^0_h + (K_1\Delta^t_p + K_2 q^t_x\Delta^t_d)\,c^t_h}{(Q^t_h)^{-1} + K_1\chi^t_p + K_2\chi^t_d},   (99)

q^{t+1}_h = \frac{\left((K_1 + K_2 c^t_x)\kappa\right)^{2} Q^0_h}{\left((Q^t_h)^{-1} + K_1\chi^t_p + K_2\chi^t_d\right)^{2}} + \frac{(K_1\Delta^t_p + K_2 q^t_x\Delta^t_d)^{2} q^t_h + (K_1 q^t_p + K_2 q^t_x q^t_d)}{\left((Q^t_h)^{-1} + K_1\chi^t_p + K_2\chi^t_d\right)^{2}} + \frac{2\kappa(K_1 + K_2 c^t_x)(K_1\Delta^t_p + K_2 q^t_x\Delta^t_d)\,c^t_h}{\left((Q^t_h)^{-1} + K_1\chi^t_p + K_2\chi^t_d\right)^{2}}.   (100)

Incidentally, we obtain the expression of Q^t_h:

Q^{t+1}_h = q^{t+1}_h + \frac{1}{(Q^t_h)^{-1} + K_1\chi^t_p + K_2\chi^t_d}.   (101)
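Since (99)-(101) are closed-form scalar updates, they can be evaluated directly once the remaining state-evolution quantities are known; in the sketch below all of those quantities are illustrative numbers, not outputs of the actual recursion.

```python
# Sketch of the scalar updates (99)-(101); every input below is an illustrative
# placeholder rather than a value produced by Algorithm 3.
K1, K2, kappa, Q0h = 10, 990, 0.05, 1.0
c_x, q_x = 0.9, 0.81
c_h, q_h, Q_h = 0.8, 0.7, 0.75
chi_p, chi_d = 0.3, 0.25
Delta_p, Delta_d = 0.2, 0.15
q_p, q_d = 0.1, 0.08

B = 1.0 / Q_h + K1 * chi_p + K2 * chi_d          # shared denominator, cf. (91)

c_h_new = (kappa * (K1 + K2 * c_x) * Q0h
           + (K1 * Delta_p + K2 * q_x * Delta_d) * c_h) / B                               # (99)

q_h_new = (((K1 + K2 * c_x) * kappa) ** 2 * Q0h
           + (K1 * Delta_p + K2 * q_x * Delta_d) ** 2 * q_h
           + (K1 * q_p + K2 * q_x * q_d)
           + 2 * kappa * (K1 + K2 * c_x) * (K1 * Delta_p + K2 * q_x * Delta_d) * c_h) / B ** 2  # (100)

Q_h_new = q_h_new + 1.0 / B                      # (101)
```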

So far the three parameters about the channel variables have been obtained. Assuming M \gg 1, the parameters about the data variables defined in (48) can be derived similarly. Due to space limitations, the results are given directly as

c^{t+1}_x = \int F_1\!\left(\frac{\sqrt{M q^t_h q^t_d}}{1 + M\chi^t_d Q^t_h}\,v + \frac{M c^t_h\kappa}{1 + M\chi^t_d Q^t_h} + \frac{M q^t_h\Delta^t_d}{1 + M\chi^t_d Q^t_h}\,c^t_x,\ \frac{1}{1 + M\chi^t_d Q^t_h}\right)\Phi(v)\,dv   (102)
\qquad\quad \approx F_1\!\left(\frac{M c^t_h\kappa}{1 + M\chi^t_d Q^t_h} + \frac{M q^t_h\Delta^t_d}{1 + M\chi^t_d Q^t_h}\,c^t_x,\ \frac{1}{1 + M\chi^t_d Q^t_h}\right),

q^{t+1}_x = \int F_1^{2}\!\left(\frac{\sqrt{M q^t_h q^t_d}}{1 + M\chi^t_d Q^t_h}\,v + \frac{M c^t_h\kappa}{1 + M\chi^t_d Q^t_h} + \frac{M q^t_h\Delta^t_d}{1 + M\chi^t_d Q^t_h}\,c^t_x,\ \frac{1}{1 + M\chi^t_d Q^t_h}\right)\Phi(v)\,dv   (103)
\qquad\quad \approx F_1^{2}\!\left(\frac{M c^t_h\kappa}{1 + M\chi^t_d Q^t_h} + \frac{M q^t_h\Delta^t_d}{1 + M\chi^t_d Q^t_h}\,c^t_x,\ \frac{1}{1 + M\chi^t_d Q^t_h}\right).

    B. Parameters about Channel Transition Nodes

Firstly, the expression of \chi^t_p can be calculated as

\chi^t_p = \mathbb{E}_{m,k\le K_1}[\gamma^t_{mk}] = \int \Phi\!\left(\frac{s}{\sqrt{N q^t_h}}\right) 2\lambda(\varepsilon^t_p)\,ds,   (104)

where s is defined through s^2 = \mathbb{E}_{m,k\le K_1}\!\left[\left(\sum_n \nu^t_{mn}\mu^t_{nk}\right)^{2}\right] and

(\varepsilon^t_p)^2 = \mathbb{E}_{m,k\le K_1}\!\left[\left(\sum_n h_{mn}x_{nk}\right)^{2}\right] = s^2 + N\left(Q^t_h - q^t_h\right).   (105)

Secondly, \Delta^t_p = \mathbb{E}_{m,k\le K_1}\!\left[-2\,\partial_z\lambda(\varepsilon^t_{mk})\,z^t_{mk}\right] can be calculated as

\Delta^t_p = \int \Phi\!\left(\frac{s}{\sqrt{N q^t_h}}\right)\left(-2\,\partial_s\lambda(\varepsilon^t_p)\right) s\,ds.   (106)

Thirdly, q^t_p = \mathbb{E}_{m,k\le K_1}\!\left[(\zeta^t_{mk})^{2}\right] can be calculated as

q^t_p = \iiint \mathcal{N}(s,z)\,P(y|z)\left(\frac{y}{2} - 2\lambda(\varepsilon^t_p)\,s\right)^{2} dy\,ds\,dz,   (107)

where \mathcal{N}(s,z) is a bivariate Gaussian joint distribution, with the variance of s being N q^t_h, the variance of z being N Q^0_h, and the correlation coefficient c^t_h/\sqrt{Q^0_h q^t_h}. \mathcal{N}(s,z) can also be expressed as

\mathcal{N}(s,z) = \Phi\!\left(\frac{s}{\sqrt{N q^t_h}}\right)\Phi\!\left(\frac{z - (c^t_h/q^t_h)\,s}{\sqrt{N\left(Q^0_h q^t_h - (c^t_h)^{2}\right)/q^t_h}}\right).   (108)

Therefore, q^t_p can be fully expanded as

q^t_p = \frac{1}{4} + \int \Phi\!\left(\frac{s}{\sqrt{N q^t_h}}\right)\left(2\lambda(\varepsilon^t_p)\,s\right)^{2} ds - 2\int \Phi\!\left(\frac{s}{\sqrt{N q^t_h}}\right)\Psi\!\left(\frac{(c^t_h/q^t_h)\,s}{\sqrt{N\dfrac{Q^0_h q^t_h - (c^t_h)^{2}}{q^t_h} + \tau_w}}\right) 2\lambda(\varepsilon^t_p)\,s\,ds.   (109)

Finally, the expressions of \chi^t_d, q^t_d, \Delta^t_d can be obtained similarly:

\chi^t_d = \int \Phi\!\left(\frac{s}{\sqrt{N q^t_h q^t_x}}\right) 2\lambda(\varepsilon^t_d)\,ds,   (110)

\Delta^t_d = \int \Phi\!\left(\frac{s}{\sqrt{N q^t_h q^t_x}}\right)\left(-2\,\partial_s\lambda(\varepsilon^t_d)\right) s\,ds,   (111)

q^t_d = \frac{1}{4} + \int \Phi\!\left(\frac{s}{\sqrt{N q^t_h q^t_x}}\right)\left(2\lambda(\varepsilon^t_d)\,s\right)^{2} ds - 2\int \Phi\!\left(\frac{s}{\sqrt{N q^t_h q^t_x}}\right)\Psi\!\left(\frac{\dfrac{c^t_h c^t_x}{q^t_h q^t_x}\,s}{\sqrt{N\dfrac{Q^0_h q^t_h q^t_x - (c^t_h c^t_x)^{2}}{q^t_h q^t_x} + \tau_w}}\right) 2\lambda(\varepsilon^t_d)\,s\,ds,   (112)

with s^2 = \mathbb{E}_{m,k>K_1}\!\left[\left(\sum_n \nu^t_{mn}\mu^t_{nk}\right)^{2}\right] and (\varepsilon^t_d)^2 = s^2 + N\left(Q^t_h - q^t_h q^t_x\right).
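The channel-transition-node parameters are one-dimensional Gaussian integrals over s and can be evaluated numerically. The sketch below does this for chi_p and Delta_p in (104)-(106) by simple trapezoidal integration; lam() is the same assumed placeholder for lambda(.) as before, and the values of N, q_h, Q_h are illustrative.

```python
import numpy as np

# Numerical sketch of (104)-(106); lam() is an assumed stand-in for lambda(.).
def lam(e):
    e = np.maximum(np.abs(e), 1e-12)
    return np.tanh(e / 2.0) / (4.0 * e)

N, q_h, Q_h = 10, 0.7, 0.75          # illustrative state-evolution values
var_s = N * q_h                       # variance of s for the pilot part

s = np.linspace(-8.0 * np.sqrt(var_s), 8.0 * np.sqrt(var_s), 20001)
pdf = np.exp(-s ** 2 / (2.0 * var_s)) / np.sqrt(2.0 * np.pi * var_s)   # Phi(s / sqrt(N q_h))

eps_p = np.sqrt(s ** 2 + N * (Q_h - q_h))                              # (105)

chi_p = np.trapz(pdf * 2.0 * lam(eps_p), s)                            # (104)

dlam_ds = np.gradient(lam(eps_p), s)                                   # numerical d lambda(eps_p) / d s
Delta_p = np.trapz(pdf * (-2.0 * dlam_ds) * s, s)                      # (106)
```

The data-part quantities in (110)-(112) follow the same pattern, with the variance N q_h replaced by N q_h q_x and eps_p replaced by eps_d.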


APPENDIX C
PROOF OF PROPOSITION 2

In the training-based scheme, we substitute K = K_1 and c^t_x = q^t_x = 1 into Algorithm 3, and can easily obtain the simplified evolution process mentioned in Proposition 2.

Regarding the detection performance, we recall the relationship q^t_x = (c^t_x)^2 from (102) and (103). Assuming that 1 - c^\infty_x is a small quantity, we have

1 - q^\infty_x = 1 - \left(1 - (1 - c^\infty_x)\right)^{2} \approx 2\left(1 - c^\infty_x\right).   (113)

Substituting (113) into \mathrm{mse}_X in (54), we get

\mathrm{mse}_X = \left(1 - c^\infty_x\right)^{2} = \frac{1}{4}\left(1 - q^\infty_x\right)^{2}.   (114)

Then for any symbol variable x_{nk}, there must be \mu^\infty_{nk} \approx \tilde{\mu}^\infty_{nk} \approx 1, otherwise \nu^\infty_{mn} would gradually change to \frac{\mu^\infty_{nk}}{\tilde{\mu}^\infty_{nk}}\nu^\infty_{mn}. Based on the LLR projection rule in (24) and (26) for BPSK,

\mathrm{LLR} = \frac{2\mu^\infty_{nk}}{\tau^\infty_{nk}} = \frac{2\tilde{\mu}^\infty_{nk}}{\tilde{\tau}^\infty_{nk}},   (115)

we can obtain \tilde{\tau}^\infty_{nk} = \tau^\infty_{nk}. Hence, \mathrm{mse}_X can be written as

\mathrm{mse}_X = \frac{1}{4}\left(\mathbb{E}_{n,k>K_1}[\tau^\infty_{nk}]\right)^{2} = \frac{1}{4}\left(\mathbb{E}_{n,k>K_1}[\tilde{\tau}^\infty_{nk}]\right)^{2} = \frac{1/4}{\left(1 + M\chi^\infty_d Q^\infty_h\right)^{2}}.   (116)

As q^\infty_x \to 1, we have \chi^\infty_d = \chi^\infty_p, so we can remove the recursive part in \mathrm{mse}_X and approximate it as

\mathrm{mse}_X = \frac{1/4}{\left(1 + M\chi^\infty_p Q^\infty_h\right)^{2}}.   (117)

At last, using M, K \gg 1 and (101), we arrive at (55).
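A quick numerical check of the small-error approximations (113)-(117) can be done in a few lines; the fixed-point values below are placeholders chosen only to illustrate the arithmetic.

```python
# Sketch checking (113)-(117); the fixed-point values are illustrative placeholders.
c_inf = 0.995                           # assume 1 - c_x^inf is small
q_inf = c_inf ** 2                      # q_x = (c_x)^2, as recalled from (102)-(103)

exact_113 = 1.0 - q_inf
approx_113 = 2.0 * (1.0 - c_inf)        # (113): 1 - q_x^inf ~ 2 (1 - c_x^inf)

mse_direct = (1.0 - c_inf) ** 2         # (114), left-hand form
mse_from_q = 0.25 * (1.0 - q_inf) ** 2  # (114), right-hand form

M, chi_p_inf, Q_h_inf = 100, 0.3, 0.02  # illustrative values only
mse_fixed_point = 0.25 / (1.0 + M * chi_p_inf * Q_h_inf) ** 2   # (117)

print(exact_113, approx_113, mse_direct, mse_from_q, mse_fixed_point)
```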

    REFERENCES

    [1] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta,O. Edfors, and F. Tufvesson, “Scaling up mimo: Opportunities andchallenges with very large arrays,” IEEE Signal Process. Mag., vol. 30,no. 1, pp. 40–60, 2013.

    [2] T. L. Marzetta, “Noncooperative cellular wireless with unlimited num-bers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9,no. 11, pp. 3590–3600, 2010.

    [3] G. Auer, O. Blume, V. Giannini, I. Godor, M. Imran, Y. Jading, E. Ka-tranaras, M. Olsson, D. Sabella, P. Skillermark et al., “D2. 3: Energyefficiency analysis of the reference systems, areas of improvements andtarget breakdown,” EARTH, 2010.

    [4] D. Ha, K. Lee, and J. Kang, “Energy efficiency analysis with circuitpower consumption in massive mimo systems,” in Personal Indoor andMobile Radio Communications (PIMRC). IEEE, 2013, pp. 938–942.

    [5] J. Singh, O. Dabeer, and U. Madhow, “On the limits of communicationwith low-precision analog-to-digital conversion at the receiver,” IEEETrans. Commun., vol. 57, no. 12, pp. 3629–3639, 2009.

    [6] Q. Bai, A. Mezghani, and J. A. Nossek, “On the optimization ofadc resolution in multi-antenna systems,” in IEEE Int. Symp. WirelessCommun. Systems (ISWCS). VDE, 2013, pp. 1–5.

    [7] O. Orhan, E. Erkip, and S. Rangan, “Low power analog-to-digital con-version in millimeter wave systems: Impact of resolution and bandwidthon performance,” in Inform. Theory and Appl. Workshop (ITA), 2015.

    [8] C. Risi, D. Persson, and E. G. Larsson, “Massive mimo with 1-bit adc,”Mathematics, 2014.

    [9] S. Wang, Y. Li, and J. Wang, “Convex optimization based multiuser de-tection for uplink large-scale mimo under low-resolution quantization,”in IEEE Int. Conf. Commun. (ICC), 2014, pp. 4789–4794.

    [10] ——, “Multiuser detection for uplink large-scale mimo under one-bitquantization,” in IEEE Int. Conf. Commun. (ICC), 2014, pp. 4460–4465.

    [11] ——, “Multiuser detection in massive spatial modulation mimo withlow-resolution adcs,” IEEE Trans. Wireless Commun., vol. 14, no. 4,pp. 2156–2168, 2015.

    [12] S. Jacobsson, G. Durisi, M. Coldrey, U. Gustavsson, and C. Studer,“One-bit massive mimo: Channel estimation and high-order modula-tions,” in IEEE Int. Conf. Commun. Workshop (ICCW), 2015.

    [13] C. K. Wen, C. J. Wang, S. Jin, and K. K. Wong, “Bayes-optimal jointchannel-and-data estimation for massive mimo with low-precision adcs,”IEEE Trans. Signal Process., vol. 64, no. 10, pp. 2541–2556, 2015.

    [14] E. De Carvalho and D. Slock, “Semi-blind maximum-likelihood mul-tichannel estimation with gaussian prior for the symbols using softdecisions,” in IEEE Veh. Technol. Conf. (VTC), vol. 2, 1998, pp. 1563–1567.

    [15] J. Ma and L. Ping, “Data-aided channel estimation in large antennasystems,” in IEEE Int. Conf. Commun. (ICC), 2014, pp. 4626–4631.

    [16] V. Buchoux, O. Cappé, É. Moulines, and A. Gorokhov, “On theperformance of semi-blind subspace-based channel estimation,” IEEETrans. Signal Process., vol. 48, no. 6, pp. 1750–1759, 2000.

    [17] A. Scaglione and A. Vosoughi, “Turbo estimation of channel andsymbols in precoded mimo systems,” in Proc. IEEE Int. Conf. Acoustics,Speed and Signal Process. (ICASSP), vol. 4, 2004, pp. iv–413.

    [18] B. Chen and A. P. Petropulu, “Frequency domain blind mimo systemidentification based on second and higher order statistics,” IEEE Trans.Signal Process., vol. 49, no. 8, pp. 1677–1688, 2001.

    [19] C. Shin, R. W. Heath Jr, and E. J. Powers, “Blind channel estimationfor mimo-ofdm systems,” IEEE Trans. Veh. Technol., vol. 56, no. 2, pp.670–685, 2007.

    [20] J. Pearl, Probabilistic reasoning in intelligent systems: networks ofplausible inference. Morgan Kaufmann, 2014.

    [21] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing al-gorithms for compressed sensing,” National Acad. Sciences, vol. 106,no. 45, pp. 18 914–18 919, 2009.

    [22] ——, “Message passing algorithms for compressed sensing: I. motiva-tion and construction,” in IEEE Inform. Theory Workshop (ITW), 2010,pp. 1–5.

    [23] M. Bayati and A. Montanari, “The dynamics of message passing ondense graphs, with applications to compressed sensing,” IEEE Trans.Inform. Theory, vol. 57, no. 2, pp. 764–785, 2011.

    [24] S. Rangan, “Generalized approximate message passing for estimationwith random linear mixing,” in Proc. IEEE Int. Symp. Inform. Theory(ISIT), 2011, pp. 2168–2172.

[25] J. T. Parker, P. Schniter, and V. Cevher, "Bilinear generalized approximate message passing, Part I: Derivation," IEEE Trans. Signal Process., vol. 62, no. 22, pp. 5839–5853, 2014.

    [26] J. T. Parker, V. Cevher, and P. Schniter, “Compressive sensing undermatrix uncertainties: An approximate message passing approach,” inAsilomar Conf. Signals, Systems, and Computers (ASILOMAR). IEEE,2011, pp. 804–808.

[27] J. T. Parker, P. Schniter, and V. Cevher, "Bilinear generalized approximate message passing, Part II: Applications," IEEE Trans. Signal Process., vol. 62, no. 22, pp. 5854–5867, 2014.

    [28] Y. Kabashima, F. Krzakala, M. Mézard, A. Sakata, and L. Zdeborová,“Phase transitions and sample complexity in bayes-optimal matrixfactorization,” IEEE Trans. Inform. Theory, vol. 62, no. 7, pp. 4228–4265, 2016.

    [29] C. M. Bishop, Pattern recognition and machine learning. springer,2006.

    [30] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, “The variationalapproximation for bayesian inference,” IEEE Trans. Signal Process.,vol. 25, no. 6, pp. 131–146, 2008.

    [31] X. Zhang, D. Wang, C. Xiong, and J. Wei, “Joint symbol detectionand channel tracking for mimo-ofdm systems via the variational bayesem algorithm,” in Personal, Indoor and Mobile Radio Communications(PIMRC). IEEE, 2010, pp. 46–51.

    [32] C. N. Manchón, G. E. Kirkelund, E. Riegler, L. P. Christensen, and B. H.Fleury, “Receiver architectures for mimo-ofdm based on a combinedvmp-sp algorithm,” Mathematics, 2011.

    [33] X. Meng, S. Wu, L. Kuang, and J. Lu, “An expectation propagationperspective on approximate message passing,” IEEE Signal Process.Lett., vol. PP, no. 99, pp. 1194–1197, 2015.

    [34] T. S. Jaakkola and M. I. Jordan, “Bayesian parameter estimation viavariational methods,” Statistics and Computing, vol. 10, no. 1, pp. 25–37, 2000.

    [35] H.-A. Loeliger, “An introduction to factor graphs,” IEEE Signal Process.Mag., vol. 21, no. 1, pp. 28–41, 2004.

    [36] M. Mezard and A. Montanari, Information, physics, and computation.Oxford University Press, 2009.

  • 16

    [37] M. Mezard, G. Parisi, and M. Virasoro, Spin glass theory and beyond.World Scientific, 1987.

[38] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge University Press, 2008.
[39] S. Rangan, J. T. Parker, P. Schniter, et al., "GAMPmatlab," http://sourceforge.net/projects/gampmatlab/.
[40] F. Steiner, A. Mezghani, L. Swindlehurst, J. A. Nossek, and W. Utschick, "Turbo-like joint data-and-channel estimation in quantized massive mimo systems," in International ITG Workshop on Smart Antennas (WSA), 2016, pp. 1–5.

    [41] C. Studer and G. Durisi, “Quantized massive mu-mimo-ofdm uplink,”IEEE Trans. Commun., vol. 64, no. 6, pp. 2387–2399, 2016.

    [42] C. Chen, H. Li, and Z. Hu, “An amp based decoder for massive mu-mimo-ofdm with low-resolution adcs,” in IEEE Int. Conf. on Computing,Networking and Commun. (ICNC), 2017.

    [43] S. Wu, L. Kuang, Z. Ni, J. Lu, D. D. Huang, and Q. Guo, “Low-complexity iterative detection for large-scale multiuser mimo-ofdmsystems using approximate message passing,” IEEE J. Sel. Topics SignalProcess., vol. 8, no. 5, pp. 902–915, 2014.

Zhaoyang Zhang (M'02) received his Ph.D. degree in communication and information systems from Zhejiang University, China, in 1998. He is currently a full professor with the College of Information Science and Electronic Engineering, Zhejiang University. His current research interests are mainly focused on some fundamental and interdisciplinary aspects of communication and computation, such as communication-computation convergence and network intelligence, theoretic and algorithmic foundations for Internet-of-Things (IoT) and Internet-of-Data (IoD), etc. He has co-authored more than 200 refereed international journal and conference papers as well as two books. He was awarded the National Natural Science Fund for Distinguished Young Scholars from NSFC in 2017, and was a co-recipient of five international conference best paper awards. He is currently serving as Editor for IEEE Transactions on Communications, IET Communications and some other international journals. He served as General Chair, TPC Co-Chair or Symposium Co-Chair for many international conferences like the VTC-Spring 2017 HMWC Workshop, WCSP 2018/2013 and the Globecom 2014 Wireless Communications Symposium, etc.

Xiao Cai received the B.S. degree in Information and Communication Engineering from Zhejiang University, Hangzhou, China, in 2011. He is currently working toward the Ph.D. degree in Information and Communication Engineering at Zhejiang University, Hangzhou, China. His research interests include signal processing techniques for communications and in networks.

Chunguang Li (M'14-SM'14) received the M.S. degree in pattern recognition and intelligent systems and the Ph.D. degree in circuits and systems from the University of Electronic Science and Technology of China, Chengdu, China, in 2002 and 2004, respectively.

He is currently a Professor with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His current research interests include statistical signal processing, machine intelligence, and wireless sensor networks.

Caijun Zhong (S'07-M'10-SM'14) received the B.S. degree in Information Engineering from Xi'an Jiaotong University, Xi'an, China, in 2004, the M.S. degree in Information Security in 2006, and the Ph.D. degree in Telecommunications in 2010, the latter two from University College London, London, United Kingdom. From September 2009 to September 2011, he was a research fellow at the Institute for Electronics, Communications and Information Technologies (ECIT), Queen's University Belfast, Belfast, UK. Since September 2011, he has been with Zhejiang University, Hangzhou, China, where he is currently an associate professor. His research interests include massive MIMO systems, full-duplex communications, wireless power transfer and physical layer security.

Dr. Zhong is an Editor of the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE COMMUNICATIONS LETTERS, EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, and JOURNAL OF COMMUNICATIONS AND NETWORKS. He is the recipient of the 2013 IEEE ComSoc Asia-Pacific Outstanding Young Researcher Award. He and his coauthors have been awarded a Best Paper Award at WCSP 2013. He was an Exemplary Reviewer for IEEE TRANSACTIONS ON COMMUNICATIONS in 2014.

Huaiyu Dai (F'17) received the B.E. and M.S. degrees in electrical engineering from Tsinghua University, Beijing, China, in 1996 and 1998, respectively, and the Ph.D. degree in electrical engineering from Princeton University, Princeton, NJ, in 2002.

He was with Bell Labs, Lucent Technologies, Holmdel, NJ, in summer 2000, and with AT&T Labs-Research, Middletown, NJ, in summer 2001. He is currently a Professor of Electrical and Computer Engineering with NC State University, Raleigh. His research interests are in the general areas of communication systems and networks, advanced signal processing for digital communications, and communication theory and information theory. His current research focuses on networked information processing and cross-layer design in wireless networks, cognitive radio networks, network security, and associated information-theoretic and computation-theoretic analysis.

He has served as an editor of IEEE Transactions on Wireless Communications, IEEE Transactions on Signal Processing, and IEEE Transactions on Wireless Communications. Currently he is an Area Editor in charge of wireless communications for IEEE Transactions on Communications. He co-edited two special issues of EURASIP journals on distributed signal processing techniques for wireless sensor networks, and on multiuser information theory and related applications, respectively. He co-chaired the Signal Processing for Communications Symposium of IEEE Globecom 2013, the Communications Theory Symposium of IEEE ICC 2014, and the Wireless Communications Symposium of IEEE Globecom 2014. He was a co-recipient of best paper awards at the 2010 IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS 2010), the 2016 IEEE INFOCOM BIGSECURITY Workshop, and the 2017 IEEE International Conference on Communications (ICC 2017).