
arXiv:2004.08856v2 [cs.CR] 22 Dec 2020

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Local Differential Privacy based Federated Learning

for the Internet of Things

Yang Zhao, Graduate Student Member, IEEE, Jun Zhao, Member, IEEE, Mengmeng Yang, Member, IEEE,

Teng Wang, Member, IEEE, Ning Wang, Member, IEEE, Lingjuan Lyu, Member, IEEE,

Dusit Niyato, Fellow, IEEE, and Kwok-Yan Lam, Senior Member, IEEE

Abstract—The Internet of Vehicles (IoV) is a promising branch of the Internet of Things. IoV stimulates a large variety of crowdsourcing applications such as Waze, Uber, and Amazon Mechanical Turk. Users of these applications report real-time traffic information to a cloud server, which trains a machine learning model on the reported traffic information for intelligent traffic management. However, crowdsourcing application owners can easily infer users' location information, traffic information, motor vehicle information, environmental information, etc., which raises severe privacy concerns over users' sensitive personal information. In addition, as the number of vehicles increases, the frequent communication between vehicles and the cloud server incurs an unexpectedly large communication cost. To avoid the privacy threat and reduce the communication cost, in this paper we propose to integrate federated learning and local differential privacy (LDP) to enable crowdsourcing applications to train machine learning models. Specifically, we propose four LDP mechanisms to perturb the gradients generated by vehicles. The proposed Three-Outputs mechanism introduces three different output possibilities to deliver high accuracy when the privacy budget is small; its output possibilities can be encoded with two bits to reduce the communication cost. Besides, to maximize the performance when the privacy budget is large, an optimal piecewise mechanism (PM-OPT) is proposed. We further propose a suboptimal mechanism (PM-SUB) with a simple formula and utility comparable to PM-OPT. Then, we build a novel hybrid mechanism by combining Three-Outputs and PM-SUB. Finally, an LDP-FedSGD algorithm is proposed to coordinate the cloud server and vehicles in training the model collaboratively. Extensive experimental results on real-world datasets validate that our proposed algorithms are capable of protecting privacy while guaranteeing utility.

Yang Zhao and Jun Zhao are supported by 1) Nanyang Technological University (NTU) Startup Grant, 2) Alibaba-NTU Singapore Joint Research Institute (JRI), 3) Singapore Ministry of Education Academic Research Fund Tier 1 RG128/18, Tier 1 RG115/19, Tier 1 RT07/19, Tier 1 RT01/19, and Tier 2 MOE2019-T2-1-176, 4) NTU-WASP Joint Project, 5) Singapore National Research Foundation (NRF) under its Strategic Capability Research Centres Funding Initiative: Strategic Centre for Research in Privacy-Preserving Technologies & Systems (SCRIPTS), 6) Energy Research Institute @NTU (ERIAN), 7) Singapore NRF National Satellite of Excellence, Design Science and Technology for Secure Critical Infrastructure NSoE DeST-SCI2019-0012, 8) AI Singapore (AISG) 100 Experiments (100E) programme, and 9) NTU Project for Large Vertical Take-Off & Landing (VTOL) Research Platform. Mengmeng Yang and Kwok-Yan Lam are supported by the National Research Foundation, Prime Minister's Office, Singapore under its Strategic Capability Research Centres Funding Initiative. Ning Wang is supported by the National Natural Science Foundation of China (61902365) and China Postdoctoral Science Foundation Grant (2019M652473). Dusit Niyato is supported by the National Research Foundation (NRF), Singapore, under Singapore Energy Market Authority (EMA), Energy Resilience, NRF2017EWT-EP003-041, Singapore NRF2015-NRF-ISF001-2277, Singapore NRF National Satellite of Excellence, Design Science and Technology for Secure Critical Infrastructure NSoE DeST-SCI2019-0007, A*STAR-NTU-SUTD Joint Research Grant on Artificial Intelligence for the Future of Manufacturing RGANS1906, Wallenberg AI, Autonomous Systems and Software Program and Nanyang Technological University (WASP/NTU) under grant M4082187 (4080), Singapore Ministry of Education (MOE) Tier 1 (RG16/20), NTU-WeBank JRI (NWJ-2020-004), and Alibaba Group through Alibaba Innovative Research (AIR) Program and Alibaba-NTU Singapore Joint Research Institute (JRI). (Corresponding author: Jun Zhao).

Yang Zhao, Jun Zhao, Mengmeng Yang, Dusit Niyato, and Kwok-Yan Lam are with the School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798. (Emails: [email protected], [email protected], [email protected], [email protected], [email protected]).

Teng Wang is with the School of Cyberspace Security, Xi'an University of Posts & Telecommunications. (Email: [email protected]).

Ning Wang is with the College of Information Science and Engineering, Ocean University of China, 266102. (Email: [email protected]).

Lingjuan Lyu is with the Department of Computer Science, National University of Singapore, 117417. (Email: [email protected]).


The development of sensors and communication technologies for the Internet of Things (IoT) has enabled fast and large-scale collection of user data, which has bred new services and applications such as the Waze application that provides an intelligent transportation routing service. This kind of service benefits users' daily life, but it may raise privacy concerns over sensitive data such as users' location information. To address these concerns, we propose a hybrid approach that integrates federated learning (FL) [1] with local differential privacy (LDP) [2] techniques. FL facilitates collaborative learning with gradients uploaded by users instead of users' raw data; however, an honest-but-curious aggregator may be able to leverage users' uploaded gradients to infer the original data [3,4]. Thus, we add LDP noise to the gradients to ensure privacy while retaining the utility of the gradients.

Federated learning with LDP. In addition to LDP mechanisms, FL itself also provides privacy protection by enabling users to keep their data locally. The FedSGD algorithm [5] allows users to submit gradients instead of true data. However, attackers may reverse the gradients to infer the original data. By adding LDP noise to the gradients before uploading, we obtain the LDP-FedSGD algorithm, which prevents attackers from deducing the original data even if they obtain the perturbed gradients. The FL server gathers and averages the users' submitted perturbed gradients and uses the averaged result to update the global model's parameters.

Existing LDP mechanisms. Since the proposal of LDP in [7], various LDP mechanisms have been proposed in the literature. Mechanisms for categorical data are presented in [9]–[11]. For numerical data, for which new LDP mechanisms are developed in this paper, the prior mechanisms of [6]–[8] are discussed as follows.


TABLE I: A comparison of the worst-case variances of existing ε-LDP mechanisms on a single numeric attribute with a domain [−1, 1]: Duchi of [6] generating a binary output, Laplace of [7] with the addition of Laplace noise, and the Piecewise Mechanism (PM) of [8]. For an LDP mechanism A, its worst-case variance is denoted by V_A. We obtain this table based on results of [8].

Range of ε       | Comparison of mechanisms
0 < ε < 1.29     | V_Duchi < V_PM < V_Laplace
1.29 < ε < 2.32  | V_PM < V_Duchi < V_Laplace
ε > 2.32         | V_PM < V_Laplace < V_Duchi

TABLE II: A comparison of the worst-case variances of our and existing ε-LDP mechanisms on a single numeric attribute with a domain [−1, 1]. Three-Outputs and PM-SUB are our main LDP mechanisms proposed in this paper. The results in this table show the advantages of our mechanisms over existing mechanisms for a wide range of the privacy parameter ε.

Range of ε            | Comparison of mechanisms
0 < ε < ln 2 ≈ 0.69   | V_Duchi = V_Three-Outputs < V_PM-SUB < V_PM
ln 2 < ε < 1.19       | V_Three-Outputs < V_Duchi < V_PM-SUB < V_PM
1.19 < ε < 1.29       | V_Three-Outputs < V_PM-SUB < V_Duchi < V_PM
1.29 < ε < 2.56       | V_Three-Outputs < V_PM-SUB < V_PM < V_Duchi
2.56 < ε < 3.27       | V_PM-SUB < V_Three-Outputs < V_PM < V_Duchi
ε > 3.27              | V_PM-SUB < V_PM < V_Three-Outputs < V_Duchi

For simplicity, we consider data with a single numeric attribute which has a domain [−1, 1] (after normalization if the original domain is not [−1, 1]). Extensions to the case of multiple numeric attributes will be discussed later in the paper.

• Laplace of [7]. LDP can be understood as a variant of DP, where the difference is the definition of "neighboring data(sets)". In DP, two datasets are neighboring if they differ in just one record; in LDP, any two instances of the user's data are neighboring. Due to this connection between DP and LDP, the classical Laplace mechanism (referred to as Laplace hereinafter) for DP can also be used to achieve LDP. Yet, Laplace may not achieve high utility for some ε since ① Laplace does not consider the difference between DP and LDP (a minimal code sketch of this baseline appears after this list).

• Duchi of [6]. In view of the above drawback ① of Laplace, Duchi et al. [6] introduce alternative mechanisms for LDP. For a single numeric attribute, one mechanism, hereinafter referred to as Duchi of [6], flips a coin with two possibilities to generate an output, where the probability of each possibility depends on the input. A disadvantage of Duchi is as follows: ② since the output of Duchi has only two possibilities, the utility may not be high for large ε (intuitively, for large ε the privacy protection is weak, so the output should be close to the input, which means the output should have many possibilities since the input can take any value in [−1, 1])¹.

• Piecewise Mechanism (PM) of [8]. Due to drawback ① of Laplace and drawback ② of Duchi above, Wang et al. [8] propose the Piecewise Mechanism (PM), which achieves higher utility than Duchi for large ε, since the output range of PM is continuous and has infinite possibilities, instead of just two possibilities as in Duchi. In PM, the plot of the output's probability density function with respect to the output value consists of three "pieces", among which the center piece has a higher probability than the other two. As the input increases, the length of the center piece remains unchanged, but the length of the leftmost (resp., rightmost) piece increases (resp., decreases). Since PM is tailored for LDP, unlike Laplace for LDP, PM has a strictly lower worst-case variance (i.e., the maximum variance with respect to the input given ε) than Laplace for any ε.
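The Laplace baseline above is simple enough to state in a few lines. The following is a minimal Python sketch of ours (not code from [7]): under LDP, any two values in [−1, 1] are neighboring, so the sensitivity is 2 and the noise scale is 2/ε, giving an unbiased output with variance 8/ε² regardless of the input.

```python
import numpy as np

def laplace_ldp(x: float, eps: float) -> float:
    """Laplace mechanism used as an LDP primitive on the domain [-1, 1].

    Under LDP, any two inputs x, x' in [-1, 1] are neighboring, so the
    sensitivity is max |x - x'| = 2 and the noise scale is 2 / eps.
    The output is unbiased with variance 2 * (2 / eps)**2 = 8 / eps**2.
    """
    assert -1.0 <= x <= 1.0 and eps > 0
    return x + np.random.laplace(loc=0.0, scale=2.0 / eps)
```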

Proposing new LDP mechanisms. Based on results of [8], Table I shows that among the three mechanisms Duchi, Laplace, and PM, in terms of the worst-case variance, Duchi is the best for 0 < ε < 1.29, while PM is the best for ε > 1.29. A natural research question is then whether we can propose better or even optimal LDP mechanisms. The optimal LDP mechanism for numeric data is still open in the literature, but the optimal LDP mechanism for categorical data has been discussed by Kairouz et al. [12]. In particular, [12] shows that for a categorical attribute (with a limited number of discrete values), the binary and randomized response mechanisms are universally optimal for very small and very large ε, respectively. Although [12] handles categorical data, its results provide the following insight even for continuous numeric data: for very small ε, a mechanism generating a binary output should be optimal; for very large ε, the optimal mechanism's output should have infinite possibilities. This is also consistent with the results of Table I. Based on the above insight, intuitively, there may exist a range of medium ε where a mechanism with three, four, five, ... output possibilities can be optimal. To this end, our first motivation for proposing new LDP mechanisms is to develop a mechanism with three output possibilities such that the mechanism outperforms the existing mechanisms of Table I for some ε. The outcome of this motivation is our mechanism called Three-Outputs. Since the analysis of Three-Outputs is already very complex, we do not consider a mechanism with four, five, ... output possibilities.

In addition, our second motivation for proposing new LDP mechanisms is that, despite the elegance of the Piecewise Mechanism (PM) of [8], we can derive the optimal mechanism PM-OPT under the "piecewise framework" of [8] in order to improve PM. Note that PM-OPT is optimal only under the above framework and may not be the optimal LDP mechanism.

¹Note that any algorithm satisfying DP or LDP has the following property: the set of possible values for the output does not depend on the input (though the output distribution depends on the input). This can easily be seen by contradiction. Suppose an output y is possible for input x but not for x′ (where x and x′ satisfy the neighboring relation in DP or LDP). Then P[y | x] > 0 and P[y | x′] = 0, resulting in P[y | x] > e^ε P[y | x′] and hence violating the privacy requirement (P[·|·] denotes conditional probability).


Since the expressions for PM-OPT are quite complex, we present PM-SUB, which is suboptimal compared with PM-OPT but has simpler expressions and achieves comparable utility.

Table II gives a comparison of our and existing LDP mechanisms. For simplicity, we do not include Laplace in the comparison since Laplace is worse than PM for any ε according to Table I. As shown in Table II, in terms of the worst-case variance for a single numeric attribute with a domain [−1, 1], Three-Outputs outperforms Duchi for ε > ln 2 ≈ 0.69 (and is the same as Duchi for ε ≤ ln 2), while PM-SUB beats PM for any ε; moreover, Three-Outputs outperforms both Duchi and PM for ln 2 < ε < 3.27, while PM-SUB beats both Duchi and PM for ε > 1.19.

We also follow the practice of [8], which combines the different mechanisms Duchi and PM to obtain a hybrid mechanism HM; HM has a lower worst-case variance than those of Duchi and PM. In this paper, we combine Three-Outputs and PM-SUB to propose HM-TP. The intuition is as follows. Given ε, the variances of Three-Outputs and PM-SUB (denoted by T_ε(x) and P_ε(x)) depend on the input x, and their maximal values (i.e., the worst-case variances) may be attained at different x. Hence, for a hybrid mechanism which probabilistically uses Three-Outputs with probability q or PM-SUB with probability 1 − q, given ε, the worst-case variance max_x (q · T_ε(x) + (1 − q) · P_ε(x)) may be strictly smaller than the minimum of max_x T_ε(x) and max_x P_ε(x). We optimize q for each ε to obtain HM-TP. Due to the already complex expression of PM-OPT, we do not consider the combination of PM-OPT and Three-Outputs.
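The hybrid construction itself is only a coin flip between two mechanisms, as the following Python sketch of ours illustrates. Here `three_outputs` and `pm_sub` stand for any two unbiased ε-LDP mechanisms, and the weight q is assumed to have already been optimized per ε as described above.

```python
import numpy as np

def hm_tp(x: float, eps: float, q: float, three_outputs, pm_sub) -> float:
    """Hybrid mechanism sketch: run three_outputs(x, eps) with probability
    q, otherwise pm_sub(x, eps).  Both components are assumed unbiased and
    eps-LDP; a mixture with an input-independent q inherits both properties."""
    if np.random.rand() < q:
        return three_outputs(x, eps)
    return pm_sub(x, eps)
```

The key design point is that q must not depend on the input x; otherwise the mixture probabilities themselves could leak information and the ε-LDP guarantee would no longer follow directly.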

Contributions. Our contributions can be summarized as follows:

• Using the LDP-FedSGD algorithm for federated learning in IoV as a motivating context, we present novel LDP mechanisms for numeric data with a continuous domain. Among our proposed mechanisms, Three-Outputs and PM-SUB outperform existing mechanisms for a wide range of ε, as shown in the theoretical results in Table II and confirmed by experiments. Comparing our Three-Outputs and PM-SUB: Three-Outputs, whose output has three possibilities, is better for small ε, while PM-SUB, whose output can take infinitely many values in an interval, has higher utility for large ε. Our PM-SUB is a slightly suboptimal version of our PM-OPT that simplifies the expressions. We further combine Three-Outputs and PM-SUB to obtain a hybrid mechanism HM-TP, which achieves even higher utility.

• We discretize the continuous output ranges of our proposed mechanisms PM-SUB and PM-OPT. Through this discretization post-processing, we enable vehicles to use our proposed mechanisms. In Section VII, our experiments confirm that the discretization post-processing algorithm maintains utility while reducing the communication cost.

• Experimental evaluation of our proposed mechanisms on real-world and synthetic datasets demonstrates that they achieve higher accuracy than existing approaches in estimating the mean of the data and in performing empirical risk minimization tasks.

Organization. In the following, Section I introduces the preliminaries, and Section II discusses related work. Section III illustrates the system model and the local differential privacy based FedSGD algorithm. Section IV presents the problem formulation. Section V proposes novel solutions for single numerical data estimation, and Section VI extends the proposed mechanisms to multidimensional numerical data estimation. Section VII demonstrates our experimental results. Section VIII concludes the paper.

I. PRELIMINARIES

In local differential privacy, users perform the perturbation by themselves. To protect his or her privacy, each user runs a random perturbation algorithm M and then sends the perturbed results to the aggregator. The privacy budget ε controls the privacy-utility trade-off, where a higher privacy budget means weaker privacy protection. We define local differential privacy as follows:

Definition 1. (Local Differential Privacy.) Let M be a randomized function with domain X and range Y; i.e., M maps each element in X to a probability distribution with sample space Y. For a non-negative ε, the randomized mechanism M satisfies ε-local differential privacy if

$$\left| \ln \frac{P_M[Y \in S \mid x]}{P_M[Y \in S \mid x']} \right| \le \epsilon, \quad \forall x, x' \in X,\ \forall S \subseteq Y. \tag{1}$$

Here P_M[·|·] denotes the conditional probability distribution depending on M. In local differential privacy, the random perturbation is performed by the users instead of a centralized aggregator. The centralized aggregator receives only perturbed results, which ensure that the aggregator is unable to distinguish whether the true tuple is x or x′ with high confidence (controlled by the privacy budget ε).
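To make Definition 1 concrete, below is a minimal Python sketch of our own (not a mechanism proposed in this paper) of binary randomized response, the classical ε-LDP primitive: for any pair of inputs and any output set, the probability ratio is bounded by e^ε, so Eq. (1) holds.

```python
import math, random

def randomized_response(bit: int, eps: float) -> int:
    """Binary randomized response: keep the true bit with probability
    e^eps / (e^eps + 1), otherwise flip it.  For any inputs x, x' and any
    output set S, the probability ratio is at most e^eps, matching Eq. (1)."""
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if random.random() < p_keep else 1 - bit
```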

Fig. 1: Different mechanisms' worst-case noise variance for one-dimensional numeric data versus the privacy budget ε. [Figure omitted in this transcript: the horizontal axis is the privacy budget ε from 0 to 8, and the vertical axis is the worst-case variance from 0 to 6.]

II. RELATED WORK

Recently, local differential privacy has attracted much attention [13]–[20], and several mechanisms for numeric data estimation have been proposed [6]–[8,21]. (i) Dwork et al. [7] propose the Laplace mechanism, which adds Laplace noise to real one-dimensional data directly. The Laplace mechanism was originally used in centralized differential privacy, and it can be applied to local differential privacy directly. (ii) For a single numeric attribute with a domain [−1, 1], Duchi et al. [6] propose an LDP framework that provides output from {−C, C}, where C > 1. (iii) Wang et al. [8] propose the piecewise mechanism (PM), which offers an output with infinite possibilities in the range [−A, A], where A > 1. In addition, they apply an LDP mechanism to preserve the privacy of gradients generated during machine learning tasks. Both the approach of Duchi et al. [6] and that of Wang et al. [8] can be extended to the case of multidimensional numerical data.

Deficiencies of existing solutions. Fig. 1 illustrates that when ε ≤ 2.3, the Laplace mechanism's worst-case noise variance is larger than that of Duchi et al.'s [22] solution; however, the Laplace mechanism outperforms Duchi et al.'s [22] solution for larger ε. The worst-case noise variance of PM is smaller than those of Laplace and Duchi et al.'s [22] solution when ε is large. The HM mechanism outperforms the other existing solutions by taking advantage of Duchi et al.'s [22] solution when ε is small and of PM when ε is large. However, the outputs of PM and HM have infinite possibilities that are hard to encode. We would like to find a mechanism that improves on the utility of existing mechanisms; in addition, we believe there is a mechanism that retains high utility while having outputs that are easy to encode. Based on this intuition, we propose four novel mechanisms that can be used by vehicles in Section V.

In addition, LDP has been widely used in IoT research [23]–[27]. For example, Xu et al. [23] integrate deep learning with local differential privacy techniques and apply them to protect users' privacy in edge computing. They develop an EdgeSanitizer framework that forms a new protection layer against sensitive inference by leveraging a deep learning model to mask the learned features with noise and minimize data. Choi et al. [24] explore the feasibility of applying LDP on ultra-low-power (ULP) systems. They use resampling, thresholding, and a privacy budget control algorithm to overcome the low resolution and fixed-point nature of ULPs. He et al. [25] address the location privacy and usage pattern privacy induced by mobile edge computing's wireless task offloading feature by proposing a privacy-aware task offloading scheduling algorithm based on a constrained Markov decision process. Li et al. [26] propose a scheme for privacy-preserving data aggregation in mobile edge computing to assist IoT applications, with three participants: a public cloud center (PCC), an edge server (ES), and a terminal device (TD). TDs generate and encrypt data and send them to the ES, and then the ES submits the aggregated data to the PCC. The PCC uses its private key to recover the aggregated plaintext data. Their scheme provides source authentication and integrity and guarantees the data privacy of the TDs; in addition, it can save half of the communication cost. To protect the privacy of massive data generated from IoT platforms, Arachchige et al. [27] design an LDP mechanism named LATENT for deep learning: a randomization layer is added between the convolutional module and the fully connected module to perturb data before they leave data owners for machine learning services. Pihur et al. [28] propose the Podium Mechanism, which is similar to our PM-SUB, but their mechanism is applicable to DP instead of LDP.

Moreover, federated learning, or collaborative learning, is an emerging distributed machine learning paradigm that is widely used to address the data privacy problem in machine learning [1,29]. Federated learning has recently been explored extensively in the Internet of Things [30]–[34]. Lim et al. [30] comprehensively survey federated learning applications in mobile edge networks, including algorithms, applications, potential research problems, etc. Besides, Lu et al. [32] propose CLONE, a collaborative learning framework on the edges for connected vehicles, which reduces the training time while guaranteeing prediction accuracy. Different from CLONE, our proposed approach utilizes local differential privacy noise to protect the privacy of the uploaded data. Furthermore, Fantacci et al. [33] leverage FL to protect the privacy of mobile edge computing, while Saputra et al. [34] apply FL to predict the energy demand for electric vehicle networks.

Furthermore, there have been many papers on federated learning and differential privacy, such as [31,35]–[43]. For example, Truex et al. [36] utilize both secure multiparty computation and centralized differential privacy to prevent inference over the messages exchanged in the process of training the model; however, they do not analyze the impact of the privacy budget on the performance of FL. Hu et al. [37] propose a privacy-preserving FL approach for learning effective personalized models. They use the Gaussian mechanism, a centralized DP mechanism, to protect the privacy of the model. Compared with them, our proposed LDP mechanisms provide stronger privacy protection by using local differential privacy. Hao et al. [31] propose a differential privacy enhanced federated learning scheme for industrial artificial intelligence. Triastcyn et al. [39] employ Bayesian differential privacy in federated learning; they make use of a centralized differential privacy mechanism to protect the privacy of gradients, whereas we leverage a stronger privacy-preserving mechanism (LDP) to protect each vehicle's privacy.

Additionally, DP can be applied to various FL algorithms such as FedSGD [5] and FedAvg [1]. FedAvg requires users to upload model parameters instead of the gradients used in FedSGD; the advantage of FedAvg is that it allows users to train the model for multiple rounds locally before submitting updates. McMahan et al. [35] propose to apply centralized DP to the FedAvg and FedSGD algorithms. In our paper, we deploy LDP mechanisms to gradients in the FedSGD algorithm; our future work is to develop LDP mechanisms for state-of-the-art FL algorithms. Zhao et al. [44] propose a SecProbe mechanism to protect the privacy and quality of participants' data by leveraging the exponential mechanism and the functional mechanism of differential privacy. SecProbe guarantees high accuracy as well as privacy protection, and it is robust against unreliable participants in collaborative learning.


III. SYSTEM MODEL AND LOCAL DIFFERENTIAL PRIVACY BASED FEDSGD ALGORITHM

A. System Model

In this work, we consider a scenario where a number of vehicles are connected with a cloud server, as shown in Fig. 2. Each vehicle is responsible for continuously performing training and inference locally, based on the data it collects and the model initialized by the cloud server. The local training dataset is never uploaded to the cloud server. After the vehicles finish the predefined number of local epochs, the cloud server calculates the average of the gradients uploaded by the vehicles and updates the global model with this average. The FL aggregator is honest-but-curious (semi-honest): it follows the FL protocol, but it will try to learn additional information from the received data [36,45]. Without protection, servers or attackers can retrieve users' information by reversing the uploaded gradients [3,4]; with injected LDP noise, they cannot. Thus, there is a need to deploy LDP mechanisms in FL to develop a communication-efficient LDP-based FL algorithm.

Fig. 2: System design. [Figure omitted in this transcript: each vehicle applies LDP to the gradient computed on its local dataset and uploads the noisy gradient; vehicles download the model from the cloud server.]

B. Federated learning with LDP: LDP-FedSGD

We propose a local differential privacy based federated stochastic gradient descent algorithm (LDP-FedSGD) for our proposed system; details are given in Algorithm 1. Unlike the FedAvg algorithm, in the FedSGD algorithm, clients (i.e., vehicles) upload updated gradients instead of model parameters to the central aggregator (i.e., the cloud server) [5]. Compared with the standard FedSGD [5], we add the LDP mechanisms proposed in Section VI to prevent privacy leakage from the gradients. Each vehicle locally takes one step of gradient descent on the current model using its local data, and then perturbs the true gradient with Algorithm 6. The server aggregates and averages the updated gradients from the vehicles and then updates the model. To reduce the number of communication rounds, we separate vehicles into groups, so that the cloud server updates the model after gathering gradient updates from the vehicles in a group. In the following sections, we introduce how we obtain the LDP algorithm in detail.


Algorithm 1: Local Differential Privacy based FedSGD (LDP-FedSGD) Algorithm.

1  Server executes:
2    Server initializes the parameter as θ_0;
3    for t from 1 to the maximal iteration number do
4      Server sends θ_{t−1} to the vehicles in group G_t;
5      for each vehicle i in group G_t do
6        VehicleUpdate(i, ∆L);
7      Server computes the average of the noisy gradients of group G_t and updates the parameter from θ_{t−1} to θ_t: θ_t ← θ_{t−1} − η · (1/|G_t|) Σ_{i∈G_t} M(∆L(θ_{t−1}; x_i)), where η is the learning rate;
8      if θ_t and θ_{t−1} are close enough, or there remains no vehicle which has not participated in the computation, then
9        break;
10     t ← t + 1;
11 VehicleUpdate(i, ∆L):
12   Compute the (true) gradient ∆L(θ_{t−1}; x_i), where x_i is vehicle i's data;
13   Use the local differential privacy-compliant algorithm M to compute the noisy gradient M(∆L(θ_{t−1}; x_i));
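For readers who prefer running code, below is a minimal Python sketch of Algorithm 1 under simplifying assumptions of ours: a linear model with squared loss, gradients clipped coordinate-wise into [−1, 1], and `perturb(g, eps)` standing for any unbiased ε-LDP scalar mechanism from this paper (e.g., Three-Outputs). All names are illustrative, not the paper's API.

```python
import numpy as np

def ldp_fedsgd(vehicles, perturb, eps, eta=0.1, rounds=100, group_size=10):
    """Sketch of LDP-FedSGD: each round, one group of vehicles computes
    local gradients, perturbs each coordinate with an eps-LDP mechanism,
    and the server averages the noisy gradients to update the model.

    `vehicles` is a list of (X_i, y_i) local datasets for a linear model
    with squared loss."""
    dim = vehicles[0][0].shape[1]
    theta = np.zeros(dim)
    for t in range(rounds):
        group = vehicles[t * group_size:(t + 1) * group_size]
        if not group:                        # every vehicle has participated
            break
        noisy_grads = []
        for X, y in group:                   # VehicleUpdate: local gradient
            grad = 2 * X.T @ (X @ theta - y) / len(y)
            grad = np.clip(grad, -1.0, 1.0)  # keep each coordinate in [-1, 1]
            noisy_grads.append([perturb(g, eps) for g in grad])
        theta -= eta * np.mean(noisy_grads, axis=0)   # server averages
    return theta
```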

C. Comparing LDP-FedSGD with other privacy-preserving federated learning paradigms

The LDP-FedSGD algorithm incorporates LDP into federated learning. In addition to LDP-FedSGD, one may be interested in other ways of using DP in federated learning. To explain them, we categorize combinations of DP and federated learning (or distributed computations in general) by considering the place of perturbation (distributed/centralized perturbation) and the privacy granularity (user-level/record-level privacy protection) [46]:

• Distributed/centralized perturbation. Note that differential privacy is achieved by introducing perturbation. Distributed perturbation considers an honest-but-curious aggregator, while centralized perturbation needs a trusted aggregator. Both perturbation methods defend against external inference attacks after model publishing.

• User-level/record-level privacy protection. In general, a differentially private algorithm ensures that the probability distributions of the outputs on two neighboring datasets do not differ much. The distinction between user-level and record-level privacy protection lies in how neighboring datasets are defined. We define two datasets to be user-neighboring if one dataset can be formed from the other by arbitrarily adding or removing all of one user's records. We define two datasets to be record-neighboring if one dataset can be obtained from the other by changing a single record of one user.

Based on the above, we further obtain four paradigms: 1) ULDP (user-level privacy protection with distributed perturbation), 2) RLDP (record-level privacy protection with distributed perturbation), 3) RLCP (record-level privacy protection with centralized perturbation), and 4) ULCP (user-level privacy protection with centralized perturbation). The details are as follows and can also be found in the second author's prior work [46].

1) In ULDP, each user i selects a privacy parameter ε_i and applies a randomization algorithm Y_i such that for any two instances x_i and x_i′ of user i's data (which are user-neighboring), and for any possible subset of outputs² 𝒴_i of Y_i, we have P[Y_i ∈ 𝒴_i | x_i] ≤ e^{ε_i} · P[Y_i ∈ 𝒴_i | x_i′]. Clearly, ULDP is achieved by each user implementing the local differential privacy studied in this paper.

2) In RLDP, each user i selects a privacy parameter ε_i and applies a randomization algorithm Y_i such that for any two record-neighboring instances x_i and x_i′ of user i's data (i.e., x_i and x_i′ differ in only one record), and for any possible subset of outputs 𝒴_i of Y_i, we have P[Y_i ∈ 𝒴_i | x_i] ≤ e^{ε_i} · P[Y_i ∈ 𝒴_i | x_i′]. Clearly, in RLDP, what each user does is simply apply standard differential privacy, whereas in ULDP above, each user applies local differential privacy.

3) In ε-RLCP, the aggregator sets a privacy parameter ε and applies a randomization algorithm Y such that for any user i, any two record-neighboring instances x_i and x_i′ of user i's data, and any possible subset of outputs² 𝒴 of Y, we have P[Y ∈ 𝒴 | x_i] ≤ e^ε · P[Y ∈ 𝒴 | x_i′]. In other words, in RLCP, the aggregator applies standard differential privacy. For the aggregator to implement RLCP well, it should typically be able to bound the impact of each record on the information sent from a user to the aggregator. Further discussion of this can be interesting, but we do not present more details since RLCP is not our paper's focus.

4) In ε-ULCP, the aggregator sets a privacy parameter ε and applies a randomization algorithm Y such that for any user i, any two instances x_i and x_i′ of user i's data (which are user-neighboring), and any possible subset of outputs 𝒴 of Y, we have P[Y ∈ 𝒴 | x_i] ≤ e^ε · P[Y ∈ 𝒴 | x_i′]. The difference between RLCP and ULCP is that RLCP achieves record-level privacy protection while ULCP ensures the stronger user-level privacy protection.

²For simplicity, we slightly abuse the notation and denote the set of outputs of algorithm Y_i (resp., algorithm Y) by 𝒴_i (resp., 𝒴).

In the case of distributed perturbation, when all users set the same privacy parameter ε, we refer to ULDP and RLDP above as ε-ULDP and ε-RLDP, respectively. Table III presents a comparison of ε-ULDP, ε-RLDP, ε-RLCP, and ε-ULCP.

TABLE III: A comparison of different privacy notions. In this paper, we focus on ε-local differential privacy, which achieves user-level privacy protection with distributed perturbation (ULDP). We do not consider record-level privacy protection with distributed perturbation (RLDP), which implements perturbation at each user via standard differential privacy, since we aim to achieve user-level privacy protection instead of the weaker record-level privacy protection (a vehicle is a user in our IoV applications and may have multiple records). We also do not investigate record/user-level privacy protection with centralized perturbation (RLCP/ULCP) since this paper considers an honest-but-curious aggregator instead of a trusted aggregator.

Privacy granularity and place of perturbation      | Privacy property | Adversary model
ε-LDP (defined for distributed perturbation)       | ε-ULDP           | defend against an honest-but-curious aggregator and external attacks after model publishing
ε-DP with distributed perturbation                 | ε-RLDP           | (same as above)
ε-DP with centralized perturbation                 | ε-RLCP           | trusted aggregator; defend against external attacks after model publishing
user-level privacy with centralized perturbation   | ε-ULCP           | (same as above)

In this paper, each user applies ε-local differential privacy, so our framework is under ULDP. The reasons that we consider ULDP instead of RLDP, RLCP, and ULCP are as follows.

• We do not consider RLDP, which implements perturbation at each user via standard differential privacy, since we aim to achieve user-level privacy protection instead of the weaker record-level privacy protection (a vehicle is a user in our IoV applications and may have multiple records). The motivation is that often much of the data from a vehicle may be about the vehicle's regular driver, and it often makes more sense to protect all data about the regular driver instead of just protecting each single record. A similar argument has recently been stated in [35], which incorporates user-level differential privacy into the training process of federated learning for language modeling. Specifically, [35] considers user-level privacy to protect the privacy of all typed words of a user, and explains that such privacy protection is more reasonable than protecting individual words as in the case of record-level privacy. In addition, although we can compute the level of user-level privacy from record-level privacy via the group privacy property of differential privacy (see Theorem 2.2 of [47]), this may significantly increase the privacy parameter and hence weaken the privacy protection if a user has many records (note that a larger privacy parameter ε in ε-DP means weaker privacy protection). More specifically, for a user with m records, according to the group privacy property [47], the privacy protection strength for the user under ε-record-level privacy is just that under mε-user-level privacy (i.e., for a user with m records, ε-RLDP ensures mε-ULDP, and ε-RLCP ensures mε-ULCP); a worked instance follows this list.


• We also do not investigate RLCP and ULCP, since this paper considers an honest-but-curious aggregator instead of a trusted aggregator. Because the aggregator is not completely trusted, the perturbation is implemented at each user (i.e., each vehicle in IoV).
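To make the group-privacy conversion above concrete, here is a small worked instance (the numbers are ours, chosen only for illustration):

$$\epsilon\text{-RLDP} \;\Longrightarrow\; m\epsilon\text{-ULDP}: \qquad m = 10 \text{ records},\ \epsilon = 0.5 \;\Longrightarrow\; 10 \times 0.5 = 5\text{-ULDP},$$

i.e., a per-record budget of 0.5 degrades to a much weaker user-level budget of 5 for a vehicle with ten records, which is why we let each vehicle enforce ε-LDP over all of its data directly.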

IV. PROBLEM FORMULATION

Let x be a user's true value, and Y be the perturbed value. Under the perturbation mechanism M, we use E_M[Y|x] to denote the expectation of the randomized output Y given input x, and Var_M[Y|x] to denote the variance of Y given input x. MaxVar(M) denotes the worst-case Var_M[Y|x] over all inputs x. We are interested in finding a privatization mechanism M that minimizes MaxVar(M) by solving the following constrained minimization problem:

$$\min_{M} \ \mathrm{MaxVar}(M), \quad \text{s.t.} \quad \text{Eq. (1)}, \qquad E_M[Y \mid x] = x, \qquad P_M[Y \in \mathcal{Y} \mid x] = 1.$$

The second constraint states that our estimator is unbiased, and the third constraint enforces a proper distribution, where 𝒴 is the range of the randomized function M. In the following sections, if M is clear from the context, we omit the subscript M for simplicity.
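Although this paper solves the minimization analytically, the objective is easy to estimate empirically. The sketch below (our own tooling, not one of the paper's algorithms) approximates MaxVar(M) for any candidate mechanism by Monte Carlo over a grid of inputs; it is handy for sanity-checking the closed-form variances derived later.

```python
import numpy as np

def max_var(mechanism, eps: float, n: int = 20_000) -> float:
    """Monte-Carlo estimate of MaxVar(M) = max over x of Var[Y | x].

    `mechanism(x, eps)` is any randomized perturbation function; we sweep
    x over a grid of the domain [-1, 1] and take the largest empirical
    variance of the mechanism's output."""
    grid = np.linspace(-1.0, 1.0, 41)
    return max(
        float(np.var([mechanism(x, eps) for _ in range(n)])) for x in grid
    )
```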

V. MECHANISMS FOR ESTIMATION OF A SINGLE NUMERIC ATTRIBUTE

To solve the problem in Section IV, we propose four local differential privacy mechanisms: Three-Outputs, PM-OPT, PM-SUB, and HM-TP. Fig. 1 compares the worst-case noise variances of existing mechanisms and our proposed mechanisms. Three-Outputs has three discrete output possibilities, which incurs little communication cost because two bits are enough to encode three different outputs (a toy encoding follows this paragraph); moreover, it achieves a small worst-case noise variance in the high privacy regime (small privacy budget ε). To maintain a low worst-case noise variance in the low privacy regime (large privacy budget ε), we propose PM-OPT and PM-SUB, both of which achieve higher accuracy than Three-Outputs and other existing solutions when the privacy budget ε is large. Additionally, we discretize their continuous output ranges with a post-processing discretization algorithm so that vehicles can encode the outputs. In the following sections, we explain the four proposed mechanisms and the post-processing discretization algorithm in detail.
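As a toy illustration of the two-bit encoding claim (the code assignment below is our own choice; the paper does not prescribe one), the three outputs −C, 0, C can be transmitted as two bits, with one codeword left unused:

```python
# Illustrative two-bit codec for the outputs of Three-Outputs.
# We represent an output by its sign (-1, 0, +1 stand for -C, 0, +C),
# since C itself is a public constant determined by the privacy budget.
ENCODE = {-1: 0b00, 0: 0b01, +1: 0b10}      # 0b11 is an unused codeword
DECODE = {code: sign for sign, code in ENCODE.items()}

def decode(code: int, C: float) -> float:
    """Recover the perturbed value from its two-bit code."""
    return DECODE[code] * C
```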

A. Three-Outputs Mechanism

Now, we propose a mechanism with three output possibilities, named Three-Outputs, which is illustrated in Algorithm 2. Three-Outputs ensures a low communication cost while achieving a smaller worst-case noise variance than existing solutions in the high privacy regime (small privacy budget ε). Duchi et al.'s [22] solution has two output possibilities, and it outperforms other approaches when the privacy budget is small. However, Kairouz et al. [12] prove that two outputs are not always optimal as ε increases. By outputting three values instead of two, Three-Outputs improves the performance as the privacy budget increases, as shown in Fig. 1. When the privacy budget is small, Three-Outputs is equivalent to Duchi et al.'s [22] solution.

For notational simplicity, given a mechanism M, we often write P_M[Y = y | X = x] as P_{y←x}(M) below; we also sometimes omit M and write P[Y = y | X = x] and P_{y←x}.

Given a tuple x ∈ [−1, 1], Three-Outputs returns a perturbed value Y that equals −C, 0, or C with probabilities defined by

$$P_{-C \leftarrow x} = \begin{cases} \dfrac{1-P_{0\leftarrow 0}}{2} + \left( \dfrac{e^{\epsilon}-P_{0\leftarrow 0}}{e^{\epsilon}(e^{\epsilon}+1)} - \dfrac{1-P_{0\leftarrow 0}}{2} \right) x, & \text{if } 0 \le x \le 1, \\[2mm] \dfrac{1-P_{0\leftarrow 0}}{2} + \left( \dfrac{1-P_{0\leftarrow 0}}{2} - \dfrac{e^{\epsilon}-P_{0\leftarrow 0}}{e^{\epsilon}+1} \right) x, & \text{if } -1 \le x \le 0, \end{cases} \tag{2}$$

Algorithm 2: Three-Outputs Mechanism for One-Dimensional Numeric Data.

Input: tuple x ∈ [−1, 1] and privacy parameter ε.
Output: tuple Y ∈ {−C, 0, C}.
1  Sample a random variable u with the probability distribution P[u = −1] = P_{−C←x}, P[u = 0] = P_{0←x}, and P[u = 1] = P_{C←x}, where P_{−C←x}, P_{0←x}, and P_{C←x} are given in Eq. (2), Eq. (3), and Eq. (4);
2  if u = −1 then
3    Y = −C;
4  else if u = 0 then
5    Y = 0;
6  else
7    Y = C;
8  return Y;

$$P_{C \leftarrow x} = \begin{cases} \dfrac{1-P_{0\leftarrow 0}}{2} + \left( \dfrac{e^{\epsilon}-P_{0\leftarrow 0}}{e^{\epsilon}+1} - \dfrac{1-P_{0\leftarrow 0}}{2} \right) x, & \text{if } 0 \le x \le 1, \\[2mm] \dfrac{1-P_{0\leftarrow 0}}{2} + \left( \dfrac{1-P_{0\leftarrow 0}}{2} - \dfrac{e^{\epsilon}-P_{0\leftarrow 0}}{e^{\epsilon}(e^{\epsilon}+1)} \right) x, & \text{if } -1 \le x \le 0, \end{cases} \tag{3}$$

and

$$P_{0 \leftarrow x} = P_{0\leftarrow 0} + \left( \frac{P_{0\leftarrow 0}}{e^{\epsilon}} - P_{0\leftarrow 0} \right) |x|, \quad \text{if } -1 \le x \le 1, \tag{4}$$

where $P_{0\leftarrow 0}$ is defined by

$$P_{0\leftarrow 0} := \begin{cases} 0, & \text{if } \epsilon < \ln 2, \\[1mm] -\frac{1}{6}\left( -e^{2\epsilon} - 4e^{\epsilon} - 5 + 2\sqrt{\Delta_0}\,\cos\!\left( \frac{\pi}{3} + \frac{1}{3}\arccos\!\left( -\frac{\Delta_1}{2\Delta_0^{3/2}} \right) \right) \right), & \text{if } \ln 2 \le \epsilon \le \epsilon', \\[1mm] \dfrac{e^{\epsilon}}{e^{\epsilon}+2}, & \text{if } \epsilon > \epsilon', \end{cases} \tag{5}$$

in which

$$\Delta_0 := e^{4\epsilon} + 14e^{3\epsilon} + 50e^{2\epsilon} - 2e^{\epsilon} + 25, \tag{6}$$

$$\Delta_1 := -2e^{6\epsilon} - 42e^{5\epsilon} - 270e^{4\epsilon} - 404e^{3\epsilon} - 918e^{2\epsilon} + 30e^{\epsilon} - 250, \tag{7}$$

$$\text{and} \quad \epsilon' := \ln\!\left( \frac{3+\sqrt{65}}{2} \right) \approx \ln 5.53. \tag{8}$$
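Putting Eqs. (2)–(8) together, a direct Python sketch of Three-Outputs looks as follows. This is our own transcription of the formulas above; the support value C = e^ε(e^ε+1)/((e^ε−1)(e^ε−P_{0←0})), i.e., the square root of the leading factor in Eq. (25) derived later, is what makes the output unbiased per Eq. (9b).

```python
import math, random

def p00(eps: float) -> float:
    """Optimal P_{0<-0} from Eq. (5)."""
    if eps < math.log(2):
        return 0.0
    e = math.exp(eps)
    if eps > math.log((3 + math.sqrt(65)) / 2):          # eps' of Eq. (8)
        return e / (e + 2)
    d0 = e**4 + 14 * e**3 + 50 * e**2 - 2 * e + 25       # Eq. (6)
    d1 = (-2 * e**6 - 42 * e**5 - 270 * e**4 - 404 * e**3
          - 918 * e**2 + 30 * e - 250)                   # Eq. (7)
    return -(-e**2 - 4 * e - 5 + 2 * math.sqrt(d0) * math.cos(
        math.pi / 3 + math.acos(-d1 / (2 * d0**1.5)) / 3)) / 6

def three_outputs(x: float, eps: float) -> float:
    """Three-Outputs mechanism (Algorithm 2) for x in [-1, 1]."""
    e, p0 = math.exp(eps), p00(eps)
    C = e * (e + 1) / ((e - 1) * (e - p0))               # support value
    p_zero = p0 + (p0 / e - p0) * abs(x)                 # Eq. (4)
    if x >= 0:                                           # Eq. (3), first case
        p_c = (1 - p0) / 2 + ((e - p0) / (e + 1) - (1 - p0) / 2) * x
    else:                                                # Eq. (3), second case
        p_c = (1 - p0) / 2 + ((1 - p0) / 2 - (e - p0) / (e * (e + 1))) * x
    p_negc = 1 - p_zero - p_c                            # Eq. (2) via Eq. (9c)
    u = random.random()
    if u < p_negc:
        return -C
    if u < p_negc + p_zero:
        return 0.0
    return C
```

One can sanity-check unbiasedness by averaging many samples for a fixed x, which should converge to x.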

Next, we show how we derive the above probabilities. A mechanism which uses x ∈ [−1, 1] as the input and has only the three possibilities −C, 0, C for the output value must satisfy

$$\epsilon\text{-LDP:} \quad \frac{P_{C\leftarrow x}}{P_{C\leftarrow x'}},\ \frac{P_{0\leftarrow x}}{P_{0\leftarrow x'}},\ \frac{P_{-C\leftarrow x}}{P_{-C\leftarrow x'}} \in [e^{-\epsilon}, e^{\epsilon}], \tag{9a}$$

$$\text{unbiased estimation:} \quad C \cdot P_{C\leftarrow x} + 0 \cdot P_{0\leftarrow x} + (-C) \cdot P_{-C\leftarrow x} = x, \tag{9b}$$

$$\text{proper distribution:} \quad P_{y\leftarrow x} \ge 0 \ \text{ and } \ P_{C\leftarrow x} + P_{0\leftarrow x} + P_{-C\leftarrow x} = 1. \tag{9c}$$

To calculate the values of P_{C←x}, P_{0←x}, and P_{−C←x}, we use Lemma 1 below to convert a mechanism M1 satisfying the requirements in (9a) (9b) (9c) into a symmetric mechanism M2. Then, we use Lemma 2 below to further transform the symmetric mechanism into M3, whose worst-case noise variance is smaller than M2's. Next, we express the other probabilities using P_{0←1}, and we prove via Lemma 3 that the variance is minimized when P_{0←0} = e^ε P_{0←1}. Finally, Lemma 4 and Lemma 5 are used to obtain the value of P_{0←0} and the worst-case noise variance of Three-Outputs, respectively. Thus, we can obtain the values of P_{C←x}, P_{0←x}, and P_{−C←x} from P_{0←0}. In the following, we illustrate the above processes in detail.

By symmetry, for any x ∈ [−1, 1], we enforce

$$P_{C\leftarrow x} = P_{-C\leftarrow -x}, \tag{10a}$$
$$P_{0\leftarrow x} = P_{0\leftarrow -x}, \tag{10b}$$

where Eq. (10b) can be derived from Eq. (10a). The formal justification of Eqs. (10a) and (10b) is given by Lemma 1 below: since the input domain [−1, 1] is symmetric, any mechanism satisfying the requirements in (9a) (9b) (9c) can be transformed into a symmetric mechanism without increasing the worst-case noise variance. Thus, we can derive the probabilities for x ∈ [−1, 0] from the probabilities for x ∈ [0, 1] based on the symmetry.

Lemma 1. For a mechanism M1 satisfying the requirements in (9a) (9b) (9c), the following symmetrization process to obtain a mechanism M2 will not increase (i.e., will reduce or not change) the worst-case noise variance, while mechanism M2 still satisfies the requirements in (9a) (9b) (9c). Symmetrization: for x ∈ [−1, 1],

$$P_{C\leftarrow x}(M_2) = P_{-C\leftarrow -x}(M_2) = \frac{P_{C\leftarrow x}(M_1) + P_{-C\leftarrow -x}(M_1)}{2}, \tag{11}$$

$$P_{0\leftarrow x}(M_2) = P_{0\leftarrow -x}(M_2) = \frac{P_{0\leftarrow x}(M_1) + P_{0\leftarrow -x}(M_1)}{2}. \tag{12}$$

Proof. The proof details are given in Appendix A of the submitted supplementary file.

Based on Lemma 1, we define a symmetric mechanism as follows.

Symmetric Mechanism. A mechanism under (9a) (9b) (9c) is called a symmetric mechanism if it satisfies Eqs. (10a) and (10b). In the following, we only consider the symmetric mechanism M2.

Now, we design the probabilities for the symmetric mechanism M2. As M2 satisfies unbiased estimation, which is a linear relationship, we set the probabilities as piecewise linear functions of x as follows.

Case 1: For x ∈ [0, 1],

$$P_{C\leftarrow x} = P_{C\leftarrow 0} + (P_{C\leftarrow 1} - P_{C\leftarrow 0})\, x, \tag{13}$$
$$P_{-C\leftarrow x} = P_{-C\leftarrow 0} - (P_{-C\leftarrow 0} - P_{-C\leftarrow 1})\, x, \tag{14}$$
$$P_{0\leftarrow x} = 1 - P_{-C\leftarrow 0} - P_{C\leftarrow 0} + (P_{C\leftarrow 0} - P_{C\leftarrow 1} + P_{-C\leftarrow 0} - P_{-C\leftarrow 1})\, x. \tag{15}$$

Case 2: For x ∈ [−1, 0],

$$P_{C\leftarrow x} = P_{C\leftarrow 0} + (P_{C\leftarrow 0} - P_{C\leftarrow -1})\, x, \tag{16}$$
$$P_{-C\leftarrow x} = P_{-C\leftarrow 0} - (P_{-C\leftarrow -1} - P_{-C\leftarrow 0})\, x, \tag{17}$$
$$P_{0\leftarrow x} = 1 - P_{-C\leftarrow 0} - P_{C\leftarrow 0} + (P_{C\leftarrow -1} - P_{C\leftarrow 0} + P_{-C\leftarrow -1} - P_{-C\leftarrow 0})\, x. \tag{18}$$

Then, we assign values to the designed probabilities above. We find that a symmetric mechanism obtains a smaller worst-case noise variance if it satisfies Eqs. (19a) and (19b). From Lemma 2 below, we enforce

$$P_{C\leftarrow 1} = e^{\epsilon} P_{C\leftarrow -1}, \tag{19a}$$
$$P_{-C\leftarrow -1} = e^{\epsilon} P_{-C\leftarrow 1}. \tag{19b}$$

Hence, given a symmetric mechanism M2 satisfying Inequality (20), we can transform it into a new symmetric mechanism M3 which satisfies Eqs. (19a) and (19b) through the processes of Eqs. (21) (22) (23), applied until P_{C←1} = e^ε P_{C←−1}. After the transformation, the new mechanism M3 achieves a smaller worst-case noise variance than mechanism M2. Therefore, we use the new symmetric mechanism M3 to replace M2 in the subsequent discussion. Details of the transformation are given in Lemma 2.

Lemma 2. For a symmetric mechanism M2, if

$$P_{C\leftarrow 1}(M_2) < e^{\epsilon} P_{C\leftarrow -1}(M_2), \tag{20}$$

we set a symmetric mechanism M3 as follows: for x ∈ [−1, 1],

$$P_{C\leftarrow x}(M_3) = P_{-C\leftarrow -x}(M_3) = P_{C\leftarrow x}(M_2) - \frac{e^{\epsilon} P_{C\leftarrow -1}(M_2) - P_{C\leftarrow 1}(M_2)}{e^{\epsilon} - 1}, \tag{21}$$

$$P_{-C\leftarrow x}(M_3) = P_{C\leftarrow -x}(M_3) = P_{-C\leftarrow x}(M_2) - \frac{e^{\epsilon} P_{-C\leftarrow 1}(M_2) - P_{-C\leftarrow -1}(M_2)}{e^{\epsilon} - 1}, \tag{22}$$

$$P_{0\leftarrow x}(M_3) = 1 - P_{C\leftarrow x}(M_3) - P_{-C\leftarrow x}(M_3) = P_{0\leftarrow x}(M_2) + \frac{2\left(e^{\epsilon} P_{C\leftarrow -1}(M_2) - P_{C\leftarrow 1}(M_2)\right)}{e^{\epsilon} - 1}. \tag{23}$$

Moreover, the mechanism M3 has a worst-case noise variance smaller than that of M2, while M3 still satisfies the requirements in (9a) (9b) (9c).

Proof. The proof details are given in Appendix B of the submitted supplementary file.

We have proved in Lemma 2 that the symmetric mechanism M3 has a smaller worst-case noise variance than mechanism M2, and we then use mechanism M3 to obtain the relation between P_{0←1} and P_{0←0} that yields the minimum variance. From Lemma 3 below, we enforce

$$P_{0\leftarrow 0} = e^{\epsilon} P_{0\leftarrow 1}. \tag{24}$$

This relation allows us to express P_{C←x}, P_{0←x}, and P_{−C←x} using P_{0←0}.

Lemma 3. Given P_{0←0}, the variance of the output given input x is a strictly decreasing function of P_{0←1} and hence is minimized when P_{0←1} = P_{0←0}/e^ε.

Proof. The proof details are given in Appendix C of the submitted supplementary file.

Lemma 3 shows that we attain the minimum variance when P_{0←1} = P_{0←0}/e^ε. Hence, we replace e^ε P_{0←1} with P_{0←0}. The variance is then equivalent to

$$\mathrm{Var}[Y \mid X = x] = \left( \frac{e^{\epsilon}+1}{(e^{\epsilon}-1)\left(1-\frac{P_{0\leftarrow 0}}{e^{\epsilon}}\right)} \right)^{2} \left( 1 - P_{0\leftarrow 0} + \left( P_{0\leftarrow 0} - \frac{P_{0\leftarrow 0}}{e^{\epsilon}} \right) |x| \right) - x^{2}. \tag{25}$$

Complete details for obtaining Eq. (25) are given in Appendix C of the submitted supplementary file.

Next, we use Lemma 4 to obtain the optimal P_{0←0} in Three-Outputs that achieves the minimum worst-case variance:

Lemma 4. The optimal P_{0←0} minimizing max_{x∈[−1,1]} Var[Y|x] is given by Eq. (5).

Proof. The proof details are given in Appendix E of the submitted supplementary file.

Remark 1. Fig. 3 displays how P_{0←0} changes with ε in Eq. (5). When the privacy budget ε is small, P_{0←0} = 0; thus, Three-Outputs is equivalent to Duchi et al.'s [22] solution when P_{0←0} = 0. However, as the privacy budget ε increases, P_{0←0} increases, which means that the probability of outputting the true value increases.

Fig. 3: Optimal P_{0←0} for privacy budget ε ∈ [0, 8]. [Figure omitted in this transcript: the horizontal axis is ε from 0 to 8, and the vertical axis is P_{0←0} from 0 to 1.]

Summarizing the above, we obtain P_{−C←x}, P_{C←x}, and P_{0←x} in Eq. (2), Eq. (3), and Eq. (4) using P_{0←0}. We can then calculate the optimal P_{0←0} to obtain the minimum worst-case noise variance of Three-Outputs as follows:

Lemma 5. The minimum worst-case noise variance of Three-Outputs is obtained when P_{0←0} satisfies Eq. (5).

Proof. The proof details are given in Appendix F of the submitted supplementary file.

A clarification about Three-Outputs versus Four-Outputs. One may wonder why we consider a perturbation mechanism with three outputs (i.e., our Three-Outputs) instead of a perturbation mechanism with four outputs (referred to as Four-Outputs), since two bits can encode four outputs. The reason is as follows: the approach to designing Four-Outputs is similar to that for Three-Outputs, but the detailed analysis for Four-Outputs would be even more tedious than that for Three-Outputs (which is already quite complex). For this reason, we elaborate Three-Outputs but not Four-Outputs in this paper.

B. PM-OPT Mechanism

Now, we advocate an optimal piecewise mechanism (PM-OPT), shown in Algorithm 3, to obtain a small worst-case variance when the privacy budget is large. As shown in Fig. 1, Three-Outputs's worst-case noise variance is smaller than PM's when the privacy budget ε < 3.2, but it loses this advantage when ε ≥ 3.2. As the privacy budget increases, Kairouz et al. [12] suggest sending more information using more output possibilities. Besides, we observe that it is possible to improve Wang et al.'s [8] PM to achieve a smaller worst-case noise variance. Thus, inspired by these observations, we propose an optimal piecewise mechanism named PM-OPT with a smaller worst-case noise variance than PM.

Algorithm 3: PM-OPT Mechanism for One-Dimensional Numeric Data under Local Differential Privacy.

Input: tuple x ∈ [−1, 1] and privacy parameter ε.
Output: tuple Y ∈ [−A, A].
1  Calculate the value t by Eq. (30);
2  Sample u uniformly at random from [0, 1];
3  if u < e^ε/(t + e^ε) then
4    Sample Y uniformly at random from [L(ε, x, t), R(ε, x, t)];
5  else
6    Sample Y uniformly at random from [−A, L(ε, x, t)) ∪ (R(ε, x, t), A];
7  return Y;

0

( = | )

( , , ) ( , , )

Fig. 4: The probability density function F [Y = y|x] of the

randomized output Y after applying ǫ-local differential pri-

vacy.

For a true input x ∈ [−1, 1], the probability density function

of the randomized output Y ∈ [−A,A] after applying local

differential privacy is given by

F [Y = y|x]={

c,for y∈[L(ǫ, x, t), R(ǫ, x, t)], (26a)

d,for y∈[−A,L(ǫ, x, t)) ∪ (R(ǫ, x, t), A], (26b)where

c =eǫt(eǫ − 1)

2(t+ eǫ)2, (27)

9

Page 10: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

d =t(eǫ − 1)

2(t+ eǫ)2, (28)

A =(eǫ + t)(t+ 1)

t(eǫ − 1), (29)

L(ǫ, x, t) =(eǫ + t)(xt− 1)

t(eǫ − 1),

R(ǫ, x, t) =(eǫ + t)(xt+ 1)

t(eǫ − 1), and

t=

1

2

e2ǫ+22/33√

e2ǫ−e4ǫ+

1

2

2e2ǫ−22/3 3√

e2ǫ−e4ǫ + 4eǫ−2e3ǫ√

e2ǫ+22/3 3√e2ǫ−e4ǫ

− eǫ

2, if ǫ < ln

√2,

− 1

2

e2ǫ+22/33√

e2ǫ−e4ǫ+

1

2

2e2ǫ−22/3 3√

e2ǫ−e4ǫ − 4eǫ−2e3ǫ√

e2ǫ+22/3 3√e2ǫ−e4ǫ

− eǫ

2, if ǫ > ln

√2,

3 + 2√3− 1√

2, if ǫ = ln

√2.

(30a)

(30b)

(30c)

The meaning of t can be seen from t−1t+1 = L(ǫ,1,t)

R(ǫ,1,t) . When

the input is x = 1, the length of the higher probability density

function F [Y = y|x] = eǫt(eǫ−1)2(t+eǫ)2 is R(ǫ, 1, t) − L(ǫ, 1, t).

R(ǫ, 1, t) is the right boundary, and L(ǫ, 1, t) is the left

boundary. If 0 < t < ∞, we can derive limt→0t−1t+1 = −1,

meaning the right boundary is opposite to the left boundary if

t is close to 0. Since limt→∞t−1t+1 = 1, it means that the right

boundary is equal to the left boundary when t is close to ∞.

Moreover, Fig. 4 illustrates that the probability den-

sity function of Eq. (26) contains three pieces. If y ∈[L(ǫ, x, t), R(ǫ, x, t)], the probability density function is equal

to c which is higher than other two pieces y ∈ [−A,L(ǫ, x, t))and y ∈ (R(ǫ, x, t), A]. We calculate the probability of a

variable Y falling in the interval [L(ǫ, x, t), R(ǫ, x, t)] as

P [L(ǫ, x, t) ≤ Y ≤ R(ǫ, x, t)] =∫ R(ǫ,x,t)

L(ǫ,x,t)c dY = eǫ

t+eǫ .

Furthermore, we use the following lemmas to establish how

we get the value t in Eq. (26).

Lemma 6. Algorithm 3 achieves ǫ-local differential privacy.

Given an input value x, it returns a noisy value Y with

E[Y |x] = x and

Var[Y |x] = t+ 1

eǫ − 1x2 +

(t+ eǫ)(

(t+ 1)3 + eǫ − 1)

3t2(eǫ − 1)2. (31)

Proof. The proof details are given in Appendix I of the

submitted supplementary file.

Thus, when x = 1, we obtain the worst-case noise variance

as follows:

maxx∈[−1,1]

Var[Y |x] = t+ 1

eǫ − 1+

(t+ eǫ)(

(t+ 1)3 + eǫ − 1)

3t2(eǫ − 1)2.

(32)

Then, we obtain the optimal t in Lemma 7 to minimize

Eq. (32).

Lemma 7. The optimal t for mint maxx∈[−1,1] Var[Y |x] is

Eq. (30).

Proof. By computing the first-order derivative and second-

order derivative of mint maxx∈[−1,1] Var[Y |x], we get the

optimal t. The proof details are given in Appendix D of the

submitted supplementary file.

C. PM-SUB Mechanism

We propose a suboptimal piecewise mechanism (PM-SUB)

to simplify the sophisticated computation of t in Eq. (4) of

PM-OPT, and details of PM-SUB are shown in Algorithm 4.

Algorithm 4: PM-SUB Mechanism for One-

Dimensional Numeric Data under Local Differential

Privacy.

Input: tuple x ∈ [−1, 1] and privacy parameter ǫ.Output: tuple Y ∈ [−A,A].

1 Sample u uniformly at random from [0, 1];

2 if u < eǫ

eǫ/3+eǫthen

3 Sample Y uniformly at random from

[ (eǫ+eǫ/3)(xeǫ/3−1)

eǫ/3(eǫ−1) , (eǫ+eǫ/3)(xeǫ/3+1)

eǫ/3(eǫ−1) ];

4 else

5 Sample Y uniformly at random from

[−A, (eǫ+eǫ/3)(xeǫ/3−1)eǫ/3(eǫ−1) ) ∪ ( (e

ǫ+eǫ/3)(xeǫ/3+1)eǫ/3(eǫ−1) , A];

6 return Y ;

Fig. 1 illustrates that PM-OPT achieves a smaller worst-

case noise variance compared with PM, but the parameter tfor PM-OPT in Eq. (30) is complicated to compute. Some

vehicles are unable to process the complicated computation.

To make t simple for vehicles to implement, we need to find

a simple expression for it while ensuring the mechanism’s

performance. Then, we find that Wang et al.’s [8] PM is the

case when t = eǫ/2. Inspired by PM, ln t and ǫ can be linearly

related. Then, we find that ln tǫ is close to 1

3 (t for PM-OPT

in Eq. (30)), so we can set eǫ/3 as t in Eq. (26) for a new

mechanism named as PM-SUB. The probability of a variable

Y falling in the interval [L(ǫ, x, eǫ/3), R(ǫ, x, eǫ/3)] is eǫ

eǫ/3+eǫ,

and we give the detail of proof in Appendix J.

Similar to PM-OPT, we derive the worst-case noise variance

of PM-SUB from Lemma 6 with t = eǫ/3 as follows:

maxx∈[−1,1]

Var[Y |x] = 5e4ǫ/3

3(eǫ − 1)2+

5e2ǫ/3

3(eǫ − 1)2+

2eǫ

(eǫ − 1)2.

(33)

As shown in Fig. 5, PM-SUB’s worst-case noise variance is

close to PM-OPT’s, but it is smaller than PM’s, which can be

observed in Fig. 1.

D. Discretization Post-Processing

Both PM-OPT and PM-SUB’s output ranges is [−1, 1] which

is continuous, so that there are infinite output possibilities

given an input x. Thus, it is difficult to encode their outputs

for vehicles. Hence, we consider to apply a post-processing

process to discretize the continuous output range into finite

10

Page 11: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

0 1 2 3 4 5 6 7 8privacy budget

0.955

0.96

0.965

0.97

0.975

0.98

0.985

0.99

0.995

1

Fig. 5: PM-OPT’s worst-case noise variance versus PM-SUB’s

worst-case noise variance.

Algorithm 5: Discretization Post-Processing.

Input: Perturbed data y ∈ [−C,C], and domain

[−C,C] is separated into 2m pieces, where mis a positive integer.

Output: Discrete data Z .

1 Sample a Bernoulli variable u such that

P [u = 1] =

(

C · (⌊

m·yC

+ 1)

m− y

)

· mC;

2 if u = 1 then

3 Z =C·⌊m·yC ⌋

m ;

4 else

5 Z =C·(⌊m·yC ⌋+1)

m ;

6 return Z;

output possibilities. Algorithm 5 shows our discretization post-

processing steps.

The idea of Algorithm 5 is as follows. We discretize the

range of output into 2m parts due to the symmetric range

[−C,C], and then we obtain 2m + 1 output possibilities.

After we get a perturbed data y, it will fall into one of 2msegments. Then, we categorize it to the left boundary or the

right boundary of the segment, which resembles sampling a

Bernoulli variable.

Next, we explain how we derive probabilities for the

Bernoulli variable. Let the original input be x. A random

variable Y represents the intermediate output after the per-

turbation and a random variable Z represents the output after

the discretization. The range of Y is [−C,C]. Because the

range of output is symmetric with respect to 0, we discretize

both [−C, 0] and [0, C] into m parts, where the value of mdepends on the user’ requirement. Thus, we discretize Y to Zto take only the following (2m+ 1) values:

{

i× C

m: integer i ∈ {−m,−m+ 1, . . . ,m}

}

. (34)

When Y is instantiated as y ∈ [−C,C], we have the following

two cases:

① If y is one of the above (2m+1) values, we set Z as y.

② If y is not one of the above (2m + 1) values, and then

there exist some integer k ∈ {−m,−m+ 1, . . . ,m− 1}such that kC

m < y < (k+1)Cm . In fact, this gives k < ym

C <k+1, so we can set k := ⌊ ymC ⌋. Then conditioning on that

Y is instantiated as y, we set Z as kCm with probability

k + 1 − ymC and as

(k+1)Cm with probability ym

C − k, so

that the expectation of Z given Y = y equals y (as we

will show in Eq. (145), this ensures that the expectation

of Z given the original input as x equals x).

The following Lemma 8 shows the probability distribution

of assigning y with a boundary value in the second case above

when the intermediate output y is not one of discrete (2m+1)values.

Lemma 8. After we obtain the intermediate output y after

perturbation, we discretize it to a random variable Z equal tokCm or

(k+1)Cm with the following probabilities:

P [Z = z | Y = y] =

k + 1− ymC , if z = kC

m ,

ymC − k, if z = (k+1)C

m .(35)

Proof. The proof details are given in Appendix K of the

submitted supplementary file.

After discretization, the worst-case noise variance does not

change or get worse proved by Lemma 9 as follows:

Lemma 9. Let local differential privacy mechanism be Mech-

anism M1, and discretization algorithm be Mechanism M2.

Let all of output possibilities of Mechanism M1 be S1, and

output possibilities of Mechanism M2 be S2. S2 ⊂ S1. When

given input x, M1 and M2 are unbiased. The worst-case

noise variance of Mechanism M2 is greater than or equal to

the worst-case noise variance of Mechanism M1.

Proof. The proof details are given in Appendix L of the

submitted supplementary file.

E. HM-TP Mechanism

Fig. 1 shows that Three-Outputs outperforms PM-SUB

when the privacy budget ǫ is small, whereas PM-SUB

achieves a smaller variance if the privacy budget ǫ is large.

To fully take advantage of two mechanisms, we combine

Three-Outputs and PM-SUB to create a new hybrid

mechanism named as HM-TP. Fig. 1 illustrates that HM-TP

obtains a lower worst-case noise variance than other solutions.

Hence, HM-TP invokes PM-SUB with probability β. Oth-

erwise, it invokes Three-Outputs. We define the noisy

variance of HM-TP as VarH[Y |x] given inputs x as follows:

VarH[Y |x] = β · VarP [Y |x] + (1− β) · VarT [Y |x],where VarP [Y |x] and VarT [Y |x] denote noisy outputs’ vari-

ances incurred by PM-SUB and Three-Outputs, respec-

tively. The following lemma presents the value of β:

Lemma 10. The maxx∈[−1,1] VarH[Y |x] is minimized when βis Eq. (150). Due to the complicated equation of β, we put it

in the appendix.

Proof. The proof details are given in Appendix M of the

submitted supplementary file.

11

Page 12: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Since we have obtained the probability β, we can calculate

the exact expression for the worst-case noise variance in

Lemma 11 as follows:

Lemma 11. If β satisfies Lemma 10, we obtain the worst-case

noise variance of HM-TP as

maxx∈[−1,1]

VarH[Y |x] ={

VarH[Y |x∗], if 0 < β < 2(eǫ−a)2(eǫ−1)−aeǫ(eǫ+1)2

2(eǫ−a)2(eǫ+t)−aeǫ(eǫ+1)2 ,

max{VarH[Y |0],VarH[Y |1]}, otherwise,

where x∗ := (β−1)aeǫ(eǫ+1)2

2(eǫ−a)2(β(eǫ+t)−eǫ+1) and a = P0←0 which is

defined in Eq. (5).

Proof. The proof details are given in Appendix U of the

submitted supplementary file.

VI. MECHANISMS FOR ESTIMATION OF MULTIPLE

NUMERIC ATTRIBUTES

Now, we consider a case in which the user’s data record

contains d > 1 attributes. There are three existing solu-

tions to collect multiple attributes: (i) The straightforward

approach which collects each attribute with privacy budget

ǫ/d. Based on the composition theorem [47], it satisfies ǫ-LDP after collecting of all attributes. But the added noise can

be excessive if d is large [8]. (ii) Duchi et al.’s [22] solution,

which is rather complicated, handles numeric attributes only.

(iii) Wang et al.’s [8] solution is the advanced approach

that deals with a data tuple containing both numeric and

categorical attributes. Their algorithm requires to calculate an

optimal k < d based on the single dimensional attribute’s ǫ-LDP mechanism, and a user submits selected k dimensional

attributes instead of d dimensions.

Algorithm 6: Mechanism for Multiple-Dimensional

Numeric Attributes.

Input: tuple x ∈ [−1, 1]d and privacy parameter ǫ.Output: tuple Y ∈ [−A,A]d.

1 Let Y = 〈0, 0, . . . , 0〉;2 Let k = max{1,min{d,

ǫ2.5

}};3 Sample k values uniformly without replacement from

{1, 2, . . . , d};4 for each sampled value j do

5 Feed x[tj ] and ǫk as input to PM-SUB,

Three-Outputs or HM-TP, and obtain a noisy

value yj ;

6 Y [tj ] =dkyj ;

7 return Y ;

Thus, we follow Wang et al.’s [8] idea to extend Sec-

tion V to the case of multidimensional attributes. Algorithm 6

shows the pseudo-code of our extension for our PM-SUB,

Three-Outputs, and HM-TP. Given a tuple x ∈ [−1, 1]d,

the algorithm returns a perturbed tuple Y that has non-zero

value on k attributes, where

k = max{1,min{d,⌊ ǫ

2.5

}}, (36)

and Appendix S of the submitted supplementary file

proves our selected k is optimal after extending PM-SUB,

Three-Outputs, and HM-TP to support d dimensional

attributes.

Overall, our algorithm for collecting multiple attributes

outperforms existing solutions, which is confirmed by our

experiments in the Section VII. But Three-Outputs uses

only one more bit compared with Duchi et al.’s [22] solution

to encode outputs. Moreover, our Three-Outputs obtains a

higher accuracy in the high privacy regime (where the privacy

budget is small) and saves many bits for encoding since PM and

HM’s continuous output range requires infinite bits to encode,

whereas PM-SUB and HM-TP’s advantages are obvious at a

large privacy budget. Furthermore, because vehicles can not

encode continuous range, we discretize the continuous range

of outputs to discrete outputs. Our experiments in Section ??

confirm that we can achieve similar results to algorithms

before discretizing by carefully designing the number of

discrete parts. Hence, our proposed algorithms are obviously

more suitable for vehicles than existing solutions.

Intuitively, Algorithm 6 requires every user to submit kattributes instead of d attributes, such that the privacy budget

for each attribute increases from ǫ/d to ǫ/k, which helps

to minimize the noisy variance. In addition, by setting kas Eq. (36), algorithm 6 achieves an asymptotically optimal

performance while preserving privacy, which we will prove

using Lemma 12 and 13. Lemma 12 and 13 are proved in the

same way as that of Lemma 4 and 5 in [8].

Lemma 12. Algorithm 6 satisfies ǫ-local differential privacy.

In addition, given an input tuple x, it outputs a noisy tuple Y ,

such that for any j ∈ [1, d], and each tj of those k attributes

is selected uniformly at random (without replacement) from

all d attributes of x, and then E[Y [tj ]] = x[tj ].

Proof. Algorithm 6 composes k numbers of ǫ-LDP pertur-

bation algorithms; thus, based on composition theorem of

differential mechanism [48], Algorithm 6 satisfies ǫ-LDP. As

we can see from Algorithm 6, each perturbed output Y equals

to dkyj with probability k

d or equals to 0 with probability 1− kd .

Thus, E[Y [tj ]] =kd · E[ dk · yj ] = E[yj ] = x[tj ] holds.

Lemma 13. For any j ∈ [1, d], let Z[tj] =1n

∑ni=1 Y [tj ] and

X [tj] =1n

∑ni=1 x[tj ]. With at least 1− β probability,

maxj∈[1,d]

|Z[tj ]−X [tj]| = O

(

d ln(d/β))

ǫ√n

)

.

Proof. The proof details are given in Appendix R of the

submitted supplementary file.

VII. EXPERIMENTS

We implemented both existing solutions and our proposed

solutions, including PM-SUB, Three-Outputs, HM-TP

proposed by us, PM and HM proposed by Wang et al. [8],

Duchi et al.’s [22] solution and the traditional Laplace mech-

anism. Our datasets include (i) the WISDM Human Activ-

ity Recognition dataset [49] is a set of accelerometer data

collecting on Android phones from 35 subjects performing 6activities, where the domain of the timestamps of the phone’s

12

Page 13: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

0.5 1 2 4 privacy budget

10-6

10-5

10-4

10-3

10-2

MS

E

(a) MX-Numeric

0.5 1 2 4 privacy budget

10-6

10-5

10-4

10-3

MS

E

(b) BR-Numeric

0.5 1 2 4 privacy budget

10-6

10-5

10-4

10-3

MS

E

(c) WISDM-Numeric

0.5 1 2 4 privacy budget

10-6

10-4

10-2

MSE

(d) Vehicle-Numeric

Fig. 6: Result accuracy for mean estimation (on numeric attributes).

0.5 1 2 4 privacy budget

10-6

10-4

10-2

MS

E

(a) µ = 0

0.5 1 2 4 privacy budget

10-6

10-4

10-2

MS

E

(b) µ = 1/3

0.5 1 2 4 privacy budget

10-6

10-4

10-2

MS

E

(c) µ = 2/3

0.5 1 2 4 privacy budget

10-6

10-4

10-2

MS

E

(d) µ = 1

Fig. 7: Result accuracy on synthetic datasets with 16 dimensions, each of which follows a Gaussian distribution N(µ, 1/16)truncated to [−1, 1].

0.5 1 2 4 privacy budget

0.2

0.3

0.4

Mis

clas

sifi

catio

n ra

tes

(a) MX

0.5 1 2 4 privacy budget

0.2

0.3

0.4

Mis

clas

sifi

catio

n ra

tes

(b) BR

0.5 1 2 4

privacy budget

0.3

0.35

0.4

0.45

0.5

0.55

Mis

clas

sifi

catio

n ra

tes

(c) WISDN

Fig. 8: Logistic Regression.

uptime is removed from the dataset, and the remaining 3numeric attributes are accelerations in x, y, and z directions

measured by the Android phone’s accelerometer and 2 cat-

egorical attributes; (ii) two public datasets extracted from

Integrated Public Use Microdata Series [50] contain census

records from Brazil (BR) and Mexico (MX). BR includes 4M

tuples and 16 attributes, of which 6 are numerical and 10 are

categorical. MX contains 4M records and 19 attributes, of

which 5 are numerical and 14 are categorical; (iii) a Vehicle

dataset obtained by collecting from a distributed sensor net-

work, including acoustic (microphone), seismic (geophone),

and infrared (polarized IR sensor) [51]. The dataset contains

98528 tuples and 101 attributes, where 100 attributes are

numerical representing information such as the raw time series

data observed at each sensor and acoustic feature vectors

extracted from each sensor’s microphone. One attribute is

categorical denoting different types of vehicles, which are

labeled manually by a human operator to ensure high accuracy.

Besides, information about vehicles is gathered to find out

the type or brand of the vehicle. The Vehicle dataset is

also used as the federated learning benchmark by [52]. We

normalize the domain of each numeric attribute to [−1, 1]. In

our experiments, we report average results over 100 runs.

A. Results on the Mean Values of Numeric Attributes

We estimate the mean of every numeric attribute by col-

lecting a noisy multidimensional tuple from each user. To

compare with Wang et al.’s [8] mechanisms, we follow their

experiments and then divide the total privacy budget ǫ into

two parts. Assume a tuple contains d attributes which include

13

Page 14: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

0.5 1 2 4 privacy budget

10-4

10-2

100

MS

E

(a) MX

0.5 1 2 4 privacy budget

10-5

100

MSE

(b) BR

0.5 1 2 4 privacy budget

10-1

100

MSE

(c) WISDN

Fig. 9: Linear Regression.

0.5 1 2 4 privacy budget

0.1

0.2

0.30.4

Mis

clas

sifi

catio

n ra

tes

(a) MX

0.5 1 2 4 privacy budget

0.2

0.3

0.4

Mis

clas

sifi

catio

n ra

tes

(b) BR

0.5 1 2 4 privacy budget

0.3

0.4

0.5

Mis

clas

sifi

catio

n ra

tes

(c) WISDN

0.5 1 2 4 privacy budget

0.1

0.2

0.30.40.5

Mis

clas

sifi

catio

n ra

tes

(d) Vehicle

Fig. 10: Support Vector Machines.

0.5 1 2 4 privacy budget

10-6

10-4

10-2

MSE

(a) MX

0.5 1 2 4 privacy budget

10-6

10-4

10-2

MSE

(b) BR

0.5 1 2 4 privacy budget

10-6

10-4

MSE

(c) WISDN

Fig. 11: Result accuracy for mean estimation with discretization post processing on PM, HM, and HM-TP.

dn numeric attributes and dc categorical attributes. Then, we

allocate dnǫ/d budget to numeric attributes, and dcǫ/d to

categorical ones, respectively. Our approach of using LDP

for categorical data is same as that of Wang et al. [8]. We

estimate the mean value for each of the numeric attributes

using existing methods: (i) Duchi et al.’s [22] solution han-

dles multiple numeric attributes directly; (ii) when using the

Laplace mechanism, it applies ǫ/d budget to each numeric

attribute individually; (iii) PM and HM are from Wang et al. [8].

In Section VI, we evaluate the mean square error (MSE)

of the estimated mean values for numeric attributes using

our proposed approaches. Fig. 6 presents MSE results as a

function of the total budget of ǫ in the datasets (WISDM, MX,

BR and Vehicle). To simplify the complexity, we use last 6numerical attributes of the Vehicle dataset to calculate MSE.

Overall, our experimental evaluation shows that our proposed

approaches outperform existing solutions. HM-TP outperforms

existing solutions in all settings, whereas PM-SUB’s MSE is

smaller than PM’s when privacy budget ǫ is large such as 4, and

Three-Outputs’ performance is better at a small privacy

budget. Hence, experimental results are in accordance with our

theories.

We also run a set of experiments on synthetic datasets

that contain numeric attributes only. We create four synthetic

datasets, including 16 numeric attributes where each attribute

value is obtained by sampling from a Gaussian distribution

with mean value u ∈ {0, 13 , 23 , 1} and standard deviation

of 14 . By evaluating the MSE in estimating mean values of

numeric attributes with our proposed mechanisms, we present

our experimental results in Fig. 7. Hereby, we confirm that

PM-SUB, Three-Outputs and HM-TP outperform existing

solutions.

B. Results on Empirical Risk Minimization

In the following experiments, we evaluate the proposed al-

gorithms’ performance using linear regression, logistic regres-

sion, and SVM classification tasks. We change each categorical

attribute tj with k values into k − 1 binary attributes with a

domain {−1, 1}, for example, given tj , (i) 1 represents the

l-th (l < k) value on the l-th binary attribute and −1 on

each of the rest of k − 2 attributes; (ii) −1 represents the

k-th value on all binary attributes. After the transformation,

the dimension of WISDN is 43, BR (resp. MX) is 90 (resp.

14

Page 15: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

0.5 1 2 4 privacy budget

10-2

100

MSE

(a) MX

0.5 1 2 4 privacy budget

0.1

0.2

0.3

0.4

MSE

(b) BR

0.5 1 2 4 privacy budget

10-1

100

MSE

(c) WISDN

Fig. 12: Linear Regression with discretization post processing on PM, HM, and HM-TP (privacy parameter ǫ = 4).

0.5 1 2 4 privacy budget

0.2

0.3

0.4

Mis

clas

sifi

catio

n ra

tes

(a) MX

0.5 1 2 4 privacy budget

0.2

0.3

0.4

Mis

clas

sifi

catio

n ra

tes

(b) BR

0.5 1 2 4 privacy budget

0.3

0.4

0.5

Mis

clas

sifi

catio

n ra

tes

(c) WISDN

Fig. 13: Logistic Regression with discretization post processing on PM, HM, and HM-TP (privacy budget ǫ = 4).

3 21 201 2001Output Possibilities

0.01

0.02

0.030.04

MSE

PMPM-SUBThree-Outputs

(a) MX

3 21 201 2001Output Possibilities

0.005

0.01

0.0150.02

MSE

PMPM-SUBThree-Outputs

(b) BR

3 21 201 2001Output Possibilities

0.05

0.06

0.07

MSE

PMPM-SUBThree-Outputs

(c) WISDN

Fig. 14: Linear Regression with discretization post processing on PM, HM, and HM-TP (privacy budget ǫ = 4).

200001 2e+06 2e+07 Output Possibilities

0.182

0.184

0.186

Mis

clas

sifi

catio

n ra

te

PMPM-SUBThree-Outputs

(a) BR

Fig. 15: Support Vector Machine with discretization post

processing on PM, HM, and HM-TP (privacy budget ǫ = 5).

94) and Vehicle is 101. Since both the BR and MX datasets

contain the “total income” attribute, we use it as the dependent

variable and consider other attributes as independent variables.

The Vehicle dataset is used for SVM [51,52]. Each tuple in the

Vehicle dataset contains 100-dimensional feature and a binary

label.

Consider each tuple of data as the dataset of a vehicle, so

vehicles calculate gradients and run different LDP mechanisms

to generate noisy gradients. Each mini-batch is a group of

vehicles. Thus, the centralized aggregator i.e. cloud server

updates the model after each group of vehicles send noisy

gradients. The experiment involves 8 competitors: PM-SUB,

Three-Outputs, HM-TP, PM, HM, Duchi et al.’s solution,

Laplace and a non-private setting. We set the regularization

factor λ = 10−4 in all approaches. We use 10-fold cross-

validation 5 times to evaluate the performance of each method

in each dataset. Fig. 8 and Fig. 10 show that the proposed

mechanisms (PM-SUB, Three-Outputs, and HM-TP) have

lower misclassification rates than other mechanisms. Fig. 9

shows the MSE of the linear regression model. We ignore

Laplace’s result because its MSE clearly exceeds those of other

mechanisms. In the selected privacy budgets, our proposed

mechanisms (PM-SUB, Three-Outputs, and HM-TP) out-

perform existing approaches, including Laplace mechanism,

Duchi et al.’s solution, PM, and HM.

C. Results after Discretization

In this section, we add a discretization post processing step

in Algorithm 5 to the implementation of mechanisms with

continuous range of outputs, including PM, PM-SUB, HM and

HM-TP. To confirm that the discretization is effective, we

perform the following experiments. We separate the output

15

Page 16: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

domain [−C,C] into 2000 segments, and then we have 2001possible outputs given an initial input x. We add a discretiza-

tion step to the experiments in Section VII-A. Fig. 11 displays

our experimental results. We confirm that our proposed ap-

proaches outperform existing solutions in estimating the mean

value using three real-world datasets: WISDM, MX, and BR

after discretizing.

In addition, we use log regression and linear regression to

evaluate the performance after discretization. We repeat the

experiments in Section VII-B with an additional discretiza-

tion post processing step. Fig. 12 and Fig. 13 present our

experimental results. Compared with other approaches, the

performance is similar to that before discretizing. Furthermore,

Fig. 14 illustrates how the accuracy changes as output possi-

bilities increase. It shows that the misclassification rate of the

logistic regression task and the MSE of the linear regression

task are related to the size of output possibilities. Although

incurring with randomness, we find that the misclassification

rate and MSE decrease as the number of output possibili-

ties increases. When there are three output possibilities, it

incurs randomness. Moreover, Fig. 15 shows that PM-SUB

outperforms Three-Outputs, when the number of output

possibilities is large. However, when we discretize the range

of outputs into 2000 segments, the performance is satisfactory

and similar to the performance with a continuous range of

outputs. Hence, our proposed approaches combined with the

discretization step help retain the performance while enabling

the usage in vehicles.

VIII. CONCLUSION

In this paper, we propose PM-OPT, PM-SUB,

Three-Outputs, and HM-TP local differential privacy

mechanisms. These mechanisms effectively preserve the

privacy when collecting data records and computing

accurate statistics in various data analysis tasks, including

estimating the mean frequency and machine learning tasks

such as SVM classification, logistic regression, and linear

regression. Moreover, we integrate our proposed local

differential privacy mechanisms with FedSGD algorithm

to create an LDP-FedSGD algorithm. The LDP-FedSGD

algorithm enables the vehicular crowdsourcing applications

to train a machine learning model to predict the traffic

status while avoiding the privacy threat and reducing the

communication cost. More specifically, by leveraging LDP

mechanisms, adversaries are unable to deduce the exact

location information of vehicles from uploaded gradients.

Then, FL enables vehicles to train their local machine

learning models using collected data and then send noisy

gradients instead of data to the cloud server to obtain a global

model. Extensive experiments demonstrate that our proposed

approaches are effective and able to perform better than

existing solutions. Further, we intend to apply our proposed

LDP mechanisms to deep neural network to deal with more

complex data analysis tasks.

REFERENCES

[1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas,“Communication-efficient learning of deep networks from decentralizeddata,” in Artificial Intelligence and Statistics, 2017, pp. 1273–1282.

[2] V. Bindschaedler, R. Shokri, and C. A. Gunter, “Plausible deniabilityfor privacy-preserving data synthesis,” Proceedings of the VLDB En-dowment, vol. 10, no. 5, pp. 481–492, 2017.

[3] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the GAN:information leakage from collaborative deep learning,” in Proceedings of

the 2017 ACM SIGSAC Conference on Computer and Communications

Security, 2017, pp. 603–618.

[4] G. Xu, H. Li, S. Liu, K. Yang, and X. Lin, “Verifynet: Secure and veri-fiable federated learning,” IEEE Transactions on Information Forensicsand Security, vol. 15, pp. 911–926, 2019.

[5] J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz, “Revisitingdistributed synchronous SGD,” arXiv preprint arXiv:1604.00981, 2016.

[6] J. Duchi, M. J. Wainwright, and M. I. Jordan, “Local privacy andminimax bounds: Sharp rates for probability estimation,” in Advances

in Neural Information Processing Systems, 2013, pp. 1529–1537.

[7] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noiseto sensitivity in private data analysis,” in Theory of CryptographyConference (TCC), 2006, pp. 265–284.

[8] N. Wang, X. Xiao, Y. Yang, J. Zhao, S. C. Hui, H. Shin, J. Shin,and G. Yu, “Collecting and analyzing multidimensional data withlocal differential privacy,” in IEEE International Conference on Data

Engineering (ICDE), 2019.

[9] R. Bassily and A. Smith, “Local, private, efficient protocols for succincthistograms,” in Proceedings of the forty-seventh annual ACM symposiumon Theory of computing, 2015, pp. 127–135.

[10] T. Wang, J. Blocki, N. Li, and S. Jha, “Locally differentially private pro-tocols for frequency estimation,” in 26th {USENIX} Security Symposium

({USENIX} Security 17), 2017, pp. 729–745.

[11] U. Erlingsson, V. Pihur, and A. Korolova, “RAPPOR: Randomizedaggregatable privacy-preserving ordinal response,” in Proc. ACM Con-

ference on Computer and Communications Security (CCS), 2014, pp.1054–1067.

[12] P. Kairouz, S. Oh, and P. Viswanath, “Extremal mechanisms for localdifferential privacy,” in Advances in Neural Information ProcessingSystems, 2014, pp. 2879–2887.

[13] L. Ou, Z. Qin, S. Liao, T. Li, and D. Zhang, “Singular spectrum analysisfor local differential privacy of classifications in the smart grid,” IEEE

Internet of Things Journal, 2020.

[14] W. Tang, J. Ren, K. Deng, and Y. Zhang, “Secure data aggregation oflightweight e-healthcare IoT devices with fair incentives,” IEEE Internet

of Things Journal, vol. 6, no. 5, pp. 8714–8726, 2019.

[15] M. Sun and W. P. Tay, “On the relationship between inference anddata privacy in decentralized IoT networks,” IEEE Transactions on

Information Forensics and Security, vol. 15, pp. 852–866, 2019.

[16] P. Zhao, G. Zhang, S. Wan, G. Liu, and T. Umer, “A survey of localdifferential privacy for securing Internet of Vehicles,” The Journal of

Supercomputing, pp. 1–22, 2019.

[17] L. Sun, J. Zhao, and X. Ye, “Distributed clustering in the anonymizedspace with local differential privacy,” arXiv preprint arXiv:1906.11441,2019.

[18] M. E. Gursoy, A. Tamersoy, S. Truex, W. Wei, and L. Liu, “Secure andutility-aware data collection with condensed local differential privacy,”IEEE Transactions on Dependable and Secure Computing, 2019.

[19] S. Ghane, A. Jolfaei, L. Kulik, K. Ramamohanarao, and D. Puthal, “Pre-serving privacy in the internet of connected vehicles,” IEEE Transactionson Intelligent Transportation Systems, 2020.

[20] L. Lyu, J. Yu, K. Nandakumar, Y. Li, X. Ma, J. Jin, H. Yu, and K. S.Ng, “Towards fair and privacy-preserving federated deep models,” IEEE

Transactions on Parallel and Distributed Systems, vol. 31, no. 11, pp.2524–2541, 2020.

[21] L. Sun, X. Ye, J. Zhao, C. Lu, and M. Yang, “Bisample: Bidirectionalsampling for handling missing data with local differential privacy,” 2020.

[22] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Minimax optimalprocedures for locally private estimation,” Journal of the American

Statistical Association, vol. 113, no. 521, pp. 182–201, 2018.

[23] C. Xu, J. Ren, L. She, Y. Zhang, Z. Qin, and K. Ren, “EdgeSanitizer:Locally differentially private deep inference at the edge for mobile dataanalytics,” IEEE Internet of Things Journal, 2019.

[24] W.-S. Choi, M. Tomei, J. R. S. Vicarte, P. K. Hanumolu, and R. Kumar,“Guaranteeing local differential privacy on ultra-low-power systems,” in2018 ACM/IEEE 45th Annual International Symposium on Computer

Architecture (ISCA). IEEE, 2018, pp. 561–574.

[25] X. He, J. Liu, R. Jin, and H. Dai, “Privacy-aware offloading in mobile-edge computing,” in GLOBECOM 2017-2017 IEEE Global Communi-

cations Conference. IEEE, 2017, pp. 1–6.

16

Page 17: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

[26] X. Li, S. Liu, F. Wu, S. Kumari, and J. J. Rodrigues, “Privacypreserving data aggregation scheme for mobile edge computing assistedIoT applications,” IEEE Internet of Things Journal, 2018.

[27] P. C. M. Arachchige, P. Bertok, I. Khalil, D. Liu, S. Camtepe, andM. Atiquzzaman, “Local differential privacy for deep learning,” IEEE

Internet of Things Journal, 2019.

[28] V. Pihur, “The podium mechanism: Improving on the laplace andstaircase mechanisms,” arXiv preprint arXiv:1905.00191, 2019.

[29] L. Zhao, S. Hu, Q. Wang, J. Jiang, C. Shen, X. Luo, and P. Hu,“Shielding collaborative learning: Mitigating poisoning attacks throughclient-side detection,” IEEE Transactions on Dependable and Secure

Computing, vol. PP, no. 99, pp. 1–1, 10.1109/TDSC.2020.2 986 205,2020.

[30] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang,D. Niyato, and C. Miao, “Federated learning in mobile edge networks:A comprehensive survey,” arXiv preprint arXiv:1909.11875, 2019.

[31] M. Hao, H. Li, X. Luo, G. Xu, H. Yang, and S. Liu, “Efficient andprivacy-enhanced federated learning for industrial artificial intelligence,”IEEE Transactions on Industrial Informatics, 2019.

[32] S. Lu, Y. Yao, and W. Shi, “Collaborative learning on the edges: A casestudy on connected vehicles,” in 2nd USENIX Workshop on Hot Topics

in Edge Computing (HotEdge 19), 2019.

[33] R. Fantacci and B. Picano, “Federated learning framework for mobileedge computing networks,” CAAI Transactions on Intelligence Technol-

ogy, vol. 5, no. 1, pp. 15–21, 2020.

[34] Y. M. Saputra, D. T. Hoang, D. N. Nguyen, E. Dutkiewicz, M. D.Mueck, and S. Srikanteswara, “Energy demand prediction with federatedlearning for electric vehicle networks,” arXiv preprint arXiv:1909.00907,2019.

[35] M. Brendan, D. Ramage, K. Talwar, and L. Zhang, “Learning differen-tially private recurrent language models,” in International Conference

on Learning and Representation, 2018.

[36] S. Truex, N. Baracaldo, A. Anwar, T. Steinke, H. Ludwig, R. Zhang, andY. Zhou, “A hybrid approach to privacy-preserving federated learning,”in Proceedings of the 12th ACM Workshop on Artificial Intelligence and

Security, 2019, pp. 1–11.

[37] R. Hu, Y. Guo, H. Li, Q. Pei, and Y. Gong, “Personalized federatedlearning with differential privacy,” IEEE Internet of Things Journal,2020.

[38] E. Bagdasaryan, O. Poursaeed, and V. Shmatikov, “Differential privacyhas disparate impact on model accuracy,” in Advances in Neural Infor-mation Processing Systems, 2019, pp. 15 453–15 462.

[39] A. Triastcyn and B. Faltings, “Federated learning with bayesian differ-ential privacy,” arXiv preprint arXiv:1911.10071, 2019.

[40] Y. Wang, Y. Tong, and D. Shi, “Federated latent dirichlet allocation: Alocal differential privacy based framework.” in AAAI, 2020, pp. 6283–6290.

[41] L. Lyu, J. C. Bezdek, X. He, and J. Jin, “Fog-embedded deep learningfor the Internet of Things,” IEEE Transactions on Industrial Informatics,2019.

[42] T. Li, Z. Liu, V. Sekar, and V. Smith, “Privacy for free: Communication-efficient learning with differential privacy using sketches,” arXiv preprintarXiv:1911.00972, 2019.

[43] Y. Zhao, J. Zhao, L. Jiang, R. Tan, and D. Niyato, “Mobile edge com-puting, blockchain and reputation-based crowdsourcing IoT federatedlearning: A secure, decentralized and privacy-preserving system,” arXiv

preprint arXiv:1906.10893, 2019.

[44] L. Zhao, Q. Wang, Q. Zou, Y. Zhang, and Y. Chen, “Privacy-preservingcollaborative deep learning with unreliable participants,” IEEE Transac-

tions on Information Forensics and Security, vol. 15, no. 1, pp. 1486–1500, 2020.

[45] L. Lyu, H. Yu, and Q. Yang, “Threats to federated learning: A survey,”arXiv preprint arXiv:2003.02133, 2020.

[46] T. Wang, J. Zhao, H. Yu, J. Liu, X. Yang, X. Ren, and S. Shi, “Privacy-preserving crowd-guided AI decision-making in ethical dilemmas,” inACM International Conference on Information and Knowledge Man-agement (CIKM), 2019, pp. 1311–1320.

[47] C. Dwork and A. Roth, “The algorithmic foundations of differentialprivacy,” Foundations and Trends in Theoretical Computer Science (FnT-

TCS), vol. 9, no. 3–4, pp. 211–407, 2014.

[48] F. McSherry and K. Talwar, “Mechanism design via differential privacy,”in IEEE Symposium on Foundations of Computer Science (FOCS), 2007,pp. 94–103.

[49] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, “Activity recognition us-ing cell phone accelerometers,” ACM SigKDD Explorations Newsletter,vol. 12, no. 2, pp. 74–82, 2011.

[50] S. Ruggles, S. Flood, R. Goeken, J. Grover, E. Meyer, J. Pacas, andM. Sobek, “IPUMS USA: Version 10.0,” dataset]. Minneapolis, MN:IPUMS. https://doi.org/10.18128/D010.V10.0, 2020.

[51] M. F. Duarte and Y. H. Hu, “Vehicle classification in distributed sensornetworks,” Journal of Parallel and Distributed Computing, vol. 64, no. 7,pp. 826–838, 2004.

[52] T. Li, M. Sanjabi, A. Beirami, and V. Smith, “Fair resource allocationin federated learning,” arXiv preprint arXiv:1905.10497, 2019.

17

Page 18: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

APPENDIX

A. Proof of Lemma 1

The mechanism M2 satisfies the proper distribution in

Eq. (9c) because of

PC←x(M2) + P−C←x(M2) + P0←x(M2)

=PC←x(M1) + P−C←−x(M1)

2

+P−C←x(M1) + PC←−x(M1)

2

+P0←x(M1) + P0←−x(M1)

2= 1.

Besides, the mechanism M2 satisfies unbiased estimation in

Eq. (9b) because

C · PC←x(M2) + (−C) · P−C←x(M2) + 0 · P0←x(M2)

= C · PC←x(M1) + P−C←−x(M1)

2

+ (−C) · P−C←x(M1) + PC←−x(M1)

2

+ 0 · P0←x(M1) + P0←−x(M1)

2= x.

In addition,PC←x(M2)

PC←x′(M2)=

PC←x(M1) + P−C←−x(M1)

PC←x′(M1) + P−C←−x′(M1)(37)

andP−C←x(M2)

P−C←x′(M2)=

P−C←x(M1) + PC←−x(M1)

P−C←x′(M1) + PC←−x′(M1)(38)

andP0←x(M2)

P0←x′(M2)=

P0←x(M1) + P0←−x(M1)

P0←x′(M1) + P0←−x′(M1). (39)

According to (9a), we obtaine−ǫ(PC←x′(M1) + P−C←−x′(M1))

PC←x′(M1) + P−C←−x′(M1)

≤ Eq. (37) ≤ eǫ(PC←x′(M1) + P−C←−x′(M1))

PC←x′(M1) + P−C←−x′(M1),

which is equavalent to

e−ǫ ≤ Eq. (37) ≤ eǫ. (40)

Similarly, we prove that Eq. (38) (39) satisfy (9a). Hence, we

conclude that M2 satisfies ǫ-LDP requirements.

Then, we prove that the symmetrization process does not

increase the worst-case noise variance as follows:

Since M2 satisfies the unbiased estimation of Eq. (9b),

E[Y |X = x] = x. Hence, the variance of mechanism M2

given x is

VarM2 [Y |X = x] = E[Y 2|X = x]− (E[Y |X = x])2

= C2 · PC←x(M2) + 0 · PC←x(M2)

+ (−C)2 · P−C←x(M2)− x2

= C2(1− P0←x(M2))− x2 (41)

= C2

(

1− P0←x(M1) + P0←−x(M1)

2

)

− x2, (42)

or it changes to

VarM2 [Y |X = −x] = E[Y 2|X = −x]− (E[Y |X = −x])2

= C2 · PC←−x(M2) + 0 · PC←−x(M2)

+ (−C)2 · P−C←−x(M2)− x2

= C2(1− P0←−x(M2))− x2 (43)

= C2

(

1− P0←x(M1) + P0←−x(M1)

2

)

− x2, (44)

when given −x.

The variance of mechanism M1 is

VarM1 [Y |X = x] = E[Y 2|X = x]− (E[Y |X = x])2

= C2 · PC←x(M1) + 0 · PC←x(M1)

+ (−C)2 · P−C←x(M1)− x2

= C2(1 − P0←x(M1))− x2, (45)

or

VarM1 [Y |X = −x] = E[Y 2|X = −x]− (E[Y |X = −x])2

= C2 · PC←−x(M1) + 0 · PC←−x(M1)

+ (−C)2 · P−C←−x(M1)− x2

= C2(1 − P0←−x(M1))− x2. (46)

Hence,

VarM2 [Y |X = x] = VarM2 [Y |X = −x]

=VarM2 [Y |X = x] + VarM2 [Y |X = −x]

2(47)

=Eq. (45) + Eq. (46)

2≤ Max{VarM1 [Y |X = −x],VarM1 [Y |X = x]}.

B. Proof of Lemma 2

In essence, with Eq. (21) (21), we havePC←1(M3)

PC←−1(M3)

=PC←1(M2)− eǫPC←−1(M2)−PC←1(M2)

eǫ−1

PC←−1(M2)− eǫPC←−1(M2)−PC←1(M2)eǫ−1

= eǫ.

Similarly, we can prove thatPC←1(M3)PC←−1(M3)

, PC←1(M3)PC←−1(M3)

= eǫ.Besides,

PC←x + P−C←x + P0←x = 1.

In addition,

C · PC←x + (−C) · P−C←x + 0 · P0←x

= C · (PC←x(M2)− P−C←x(M2)) = x.

Hence, mechanismM3 satisfies requirements in (9a) (9b) (9c).

Because the symmetric mechanism M3 satisfies require-

ments in (9a) (9b) (9c), the variance of M3 is

VarM3 [Y |X = x] = E[Y 2|X = x]− (E[Y |X = x])2

= C2 · PC←x(M3) + 0 · PC←x(M3)

+ (−C)2 · P−C←x(M3)− x2

= C2(PC←x(M3) + P−C←x(M3))− x2

= C2(1− P0←x(M3)) − x2.

By comparing with variance of M2 in Eq. (41), we obtain

VarM3 [Y |X = x]− VarM2 [Y |X = x]

= C2(1− P0←x(M3)) − x2 − (C2(1 − P0←x(M2))− x2)

= C2(P0←x(M2)− P0←x(M3)).

Because of P0←x(M2) > P0←x(M3), based on Eq. (23)

and Inequality (20), we get P0←x(M2) > P0←x(M3) which

means that

VarM2 [Y |X = x] > VarM3 [Y |X = x].

18

Page 19: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Thus, the variance of M3 is smaller than the variance ofM2

when x ∈ [−1, 1], so we obtain that the worst-case noise

variance of M3 is smaller that of M2 using

maxx∈[−1,1]

VarM2 [Y |X = x] > maxx∈[−1,1]

VarM3 [Y |X = x].

C. Proof of Lemma 3

Since P−C←0+PC←0+P0←0 = 1 and unbiased estimation

−C · P−C←0 + C · PC←0 + 0 · P0←0 = 0, we have

PC←0 = P−C←0 =1− P0←0

2. (48)

Then, based on (9a) (9b) (9c) and Lemma 2, we can derive Cwith the following steps:

P−C←1 + PC←1 + P0←1 = 1,

− C · P−C←1 + C · PC←1 + 0 · P0←1 = 1.

Therefore, we have

PC←1 =1− P0←1 +

1C

2,

P−C←1 =1− P0←1 − 1

C

2.

From Lemma 2, we obtain1− P0←1 +

1C

2= eǫ ·

(

1− P0←1 − 1C

2

)

,

which is equivalent to

C =eǫ + 1

(eǫ − 1)(1− P0←1). (49)

Hence,

PC←1 = P−C←−1 =(1− P0←1)e

ǫ

eǫ + 1,

P−C←1 = PC←−1 =(1− P0←1)

eǫ + 1. (50)

Then, we compute the variance as follows:

I. For x ∈ [0, 1], we have

Var[Y |X = x] = E[Y 2|X = x]− (E[Y |X = x])2

= C2 · PC←x + 0 · P0←x + (−C)2 · P−C←x − x2

= C2 (PC←x + P−C←x)− x2. (51)

Substituting Eq. (13) and Eq. (14) into Eq. (51) yields

= C2 (PC←0 + (PC←1 − PC←0)x)

+ C2 (P−C←0 − (P−C←0 − P−C←1)x) − x2

= C2 (PC←0 + P−C←0) + C2(PC←1 + PC←−1)x

− C2(PC←0 + P−C←0)x − x2

= C2 (PC←0 + P−C←0) + C2(1− P0←1)x

− C2(1− P0←0)x− x2

= C2 (1− P0←0) + C2(P0←0 − P0←1)x− x2. (52)

II. For x ∈ [−1, 0], we have

Var[Y |X = x] = E[Y 2|X = x]− (E[Y |X = x])2

= C2 (1− P0←0) + C2(P0←0 − P0←1)(−x) − x2.(53)

Hence, by summarizing Eq. (49), Eq. (52), and Eq. (53), we

get the variance as follows:

Var[Y |X = x]

= C2 (1− P0←0) + C2(P0←0 − P0←1)|x| − x2

=

(

eǫ + 1

(eǫ − 1)(1− P0←1)

)2

(1− P0←0 + (P0←0 − P0←1)|x|)

− x2. (54)

Derive the partial derivative of Var[Y |X = x] to P0←1, and

we getd(Var[Y |X = x])

dP0←1

=(eǫ + 1)2 (2P0←0 + |x|(1 − 2P0←0 + P0←1)− 2)

(P0←1 + 1)2(eǫ − 1)2. (55)

Then, we have the following cases:

I. If |x| = 0, Eq. (55) = (eǫ+1)2(2P0←0−2)(P0←1+1)2(eǫ−1)2 < 0,

II. If |x| = 1, Eq. (55) = (eǫ+1)2(P0←1−1)(P0←1+1)2(eǫ−1)2 < 0.

Therefore, if given P0←0, the variance of the output given

input x is a strictly decreasing function of P0←1. Hence, we

get the minimized variance when P0←1 = P0←0

eǫ . �

D. Solve Eq. (58)

To find the optimal t for mint maxx∈[−1,1] Var[Y |x], we

calculate first-order derivative of the maxx∈[−1,1] Var[Y |x] as

follows:2t

3(eǫ − 1)2+

4

3(eǫ − 1)+

4

3(eǫ − 1)2− 4t−2

3(eǫ − 1)2

− 4t−2

3(eǫ − 1)− 2t−3

3(eǫ − 1)2− 4t−3

3(eǫ − 1)− 2t−3

3

=2

3(eǫ − 1)2[t+ 2eǫ − 2eǫt−2 − e2ǫt−3]. (56)

Next, we calculate the second-order derivative of

maxx∈[−1,1] Var[Y |x] as follows:2

3(eǫ − 1)2[1 + 4eǫt−3 + 3eǫt−4] > 0. (57)

Since the second-order derivative of maxx∈[−1,1] Var[Y |x] >0, we can conclude that maxx∈[−1,1] Var[Y |x] has minimum

point in its domain.

To find t which minimizes maxx∈[−1,1] Var[Y |x], we set

t4 + 2eǫt3 − 2eǫt− e2ǫ = 0. By solving

t4 + 2eǫt3 − 2eǫt− e2ǫ = 0, (58)

we obtain Eq. (30). Define Eq. (58)’s coefficients as c4 :=1, c3 := 2eǫ, c2 := 0, c1 := −2eǫ, c0 := −e2ǫ, and we

obtain

c4 · t4 + c3 · t3 + c1 · t+ c0 = 0. (59)

To change Eq. (59) into a depressed quartic form, we substitute

f := eǫ, t := y − c34c4

= y − f2 into Eq. (59) and obtain

y4 + p · y2 + q · y + r = 0, (60)

where

p =8c2c4 − 3c23

8c24= −3f2

2, (61)

q =c33 − 4c2c3c4 + 8c1c

24

8c34= f3 − 2f, (62)

r =−3c43 + 256c0c

34 − 64c1c3c

24 + 16c2c

23c4

256c44

= − 3

16f4. (63)

Rewrite Eq. (60) to the following:(

y2 +p

2

)

= −qy − r +p2

4. (64)

19

Page 20: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Then, we introduce a variable m into the factor on the left-

hand side of Eq. (64) by adding 2y2m + pm + m2 to both

sides. Thus, we can change the equation to the following:(

y2 +p

2+m

)

= 2my2 − qy +m2 +mp+p2

4− r. (65)

Since m is arbitrarily chosen, we choose the value of m to

get a perfect square in the right-hand side. Hence,

8m3 + 8pm2 + (2p2 − 8r)m− q2 = 0. (66)

To solve Eq. (66), we substitute Eq. (61), Eq. (62) and Eq. (63)

into the following equations:

c′3 := 8,

c′2 := 8p,

c′1 := 2p2 − 8r,

c′0 := −q2,∆0 = (c′2)

2 − 3c′3c′1 = (8p)2 − 3 · 8 · (2p2 − 8r) = 0,

∆1 = 2(c′2)2 − 9c′3 · c′2 · c′1 + 27(c′3)

2 · c′0= 2(8p)3 − 9 · 8 · 8p · (2p2 − 8r) + 27 · 82 · (−q2)= 6912(f4 − f2),

C =3

∆1 ±√

∆21 − 4∆3

0

2= 3√

∆1

= 3√

6912(f4 − f2). (67)

By solving the cubic function Eq. (67), we have roots as

follows :

mk = − 1

3c′3(c′2 + ξkC +

∆0

ξkC)

=f2

2+

3

f2 − f4

2ξk, k ∈ 0, 1, 2. (68)

We only use the real-value root; thus, we get

m =f2

2+

3

f2 − f4

2. (69)

Thus,

y =±1

√2m±2

−(2p+ 2m±1

√2q√m)

2. (70)

Then, the solutions of the original quartic equation are

t = − c34c4

+±1

√2m±2

−(2p+ 2m±1

√2q√m)

2. (71)

Since t is a real number and t > 0, we obtain Eq. (30) after

substituting c3, c4,m, p, q, f into Eq. (71). �

E. Proof of Lemma 4

By summarizing Lemma 1 2 3, our designed mechanism

achieves minimum variance when it satisfies P0←1 = P0←0

eǫ .

Hence, the variance is

Var[Y |X = x] = C2 (1− P0←0) + C2P0←0(1−1

eǫ)|x| − x2,

(72)where

C =eǫ + 1

(eǫ − 1)(1− P0←0

eǫ ). (73)

For simplicity, we set

a = P0←0, (74)

b = P0←0(1−1

eǫ) = a(1− 1

eǫ). (75)

Since x ∈ [−1, 1], the worst-case noise variance is

maxx∈[−1,1]

Var[Y |x] ={

(1 − a)C2 + C4b2

4 , if C2b2 < 1,

(1 − a+ b)C2 − 1, if C2b2 ≥ 1.

(76)

Substituting Eq. (74), Eq. (75) and Eq. (73) into Eq. (76)

yields

maxx∈[−1,1]

Var[Y |x]

=

(eǫ+1)2·e2ǫ(eǫ−1)2

(

1−a(eǫ−a)2 + (eǫ+1)2·a2

4(eǫ−a)4

)

, if C2b2 < 1,

(eǫ+1)2·e2ǫ(eǫ−1)2

(

1−a(eǫ−a)2 + (eǫ−1)·a

eǫ(eǫ−a)2

)

−1,if C2b2 ≥ 1.

=

(eǫ+1)2·e2ǫ(eǫ−1)2

(

1−a(eǫ−a)2 + (eǫ+1)2·a2

4(eǫ−a)4

)

, if C2b2 < 1,

(eǫ+1)2·eǫ(eǫ−1)2·(eǫ−a)2 − 1, if C2b

2 ≥ 1.

(77)

Substituting Eq. (75) and Eq. (73) yieldsC2b

2=

(eǫ + 1)2 · e2ǫ2(eǫ − 1)2(eǫ − a)2

· a(eǫ − 1)

=(eǫ + 1)2 · eǫ · a

2(eǫ − 1)(eǫ − a)2

< 1, (78)

and

2(eǫ − 1)a2 −[

4(eǫ − 1)eǫ + (eǫ + 1)2 · eǫ]

a

+ 2(eǫ − 1)e2ǫ > 0. (79)

To solve Eq. (79), we denote the smaller solution of the

quadratic function as

a∗ =eǫ(e2ǫ + 6eǫ − 3)− (eǫ + 1)eǫ

(eǫ + 1)2 + 8(eǫ − 1)

4(eǫ − 1).

(80)From Eq. (14), we get

P−C←0 ≥ P−C←1. (81)

Then, substituting P−C←0 and P−C←1 with Eq. (48) and

Eq. (50) in Eq. (81) yields1− P0←0

2≥ (1 − P0←1)

eǫ + 1. (82)

Hence,

a = P0←0 ≤eǫ

eǫ + 2. (83)

From Eq. (83), we know that the Eq. (79) will be ensured (i)

when 0 ≤ a < a∗ if a∗ < eǫ

eǫ+2 , or (ii) when 0 ≤ a ≤ eǫ

eǫ+2 if

a∗ ≥ eǫ

eǫ+2 . Hence, by combining with Eq. (77), we obtain

maxx∈[−1,1]

Var[Y |x] =

(eǫ+1)2·e2ǫ(eǫ−1)2

(

1−a(eǫ−a)2 + (eǫ+1)2·a2

4(eǫ−a)4

)

,

for 0 ≤ a < a∗,

(eǫ+1)2·eǫ(eǫ−1)2·(eǫ−a)2 − 1,

for a∗ ≤ a ≤ eǫ

eǫ+2 ,

, if a∗ < eǫ

eǫ+2

(eǫ+1)2·e2ǫ(eǫ−1)2

(

1−a(eǫ−a)2 + (eǫ+1)2·a2

4(eǫ−a)4

)

,

for 0 ≤ a ≤ eǫ

eǫ+2 , if a∗ ≥ eǫ

eǫ+2 .

(84)

20

Page 21: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Substituting Eq. (80) into a∗ = eǫ

eǫ+2 yields

eǫ(e2ǫ + 6eǫ − 3)− (eǫ + 1)eǫ√

(eǫ + 1)2 + 8(eǫ − 1)

4(eǫ − 1)

=eǫ

eǫ + 2. (85)

After solving Eq. (85), we get ǫ = ln 4.

0 0.5 1 1.5 2 2.5 30

0.5

1

1.5

Fig. 16: Compare a∗ with eǫ

eǫ+2 .

According to Fig. 16, we obtain that a∗ ≥ eǫ

eǫ+2 if 0 < ǫ ≤ln 4. Since ǫ = ln 4 is the only solution if ǫ > 0, we conclude

that a∗ > eǫ

eǫ+2 if ǫ > ln 4. Therefore, we can replace the

condition a∗ < eǫ

eǫ+2 and write the variance as follows:

maxx∈[−1,1] Var[Y |x] =

(eǫ+1)2·e2ǫ(eǫ−1)2

(

1−a(eǫ−a)2 + (eǫ+1)2·a2

4(eǫ−a)4

)

,

for 0 ≤ a < a∗,

(eǫ+1)2·eǫ(eǫ−1)2·(eǫ−a)2 − 1,

for a∗ ≤ a ≤ eǫ

eǫ+2 ,

, if ǫ < ln 4,

(eǫ + 1)2 · e2ǫ(eǫ − 1)2

(

1− a

(eǫ − a)2+

(eǫ + 1)2 · a24(eǫ − a)4

)

,

for 0 ≤ a ≤ eǫ

eǫ + 2, if ǫ ≥ ln 4.

(86a)

(86b)

To simplify the calculation of the minimum

maxx∈[−1,1] Var[Y |x] in Eq. (86), we define

f1(a) :=(eǫ + 1)2 · e2ǫ(eǫ − 1)2

(

1− a

(eǫ − a)2+

(eǫ + 1)2 · a24(eǫ − a)4

)

, (87)

and

f2(a) :=(eǫ + 1)2 · eǫ

(eǫ − 1)2 · (eǫ − a)2− 1. (88)

First order derivative of f2(a) in Eq. (88) is

f ′2(a) =2eǫ(eǫ + 1)2

(eǫ − 1)2(eǫ − a)3> 0. (89)

Since f ′2(a) > 0, the worst-case noise variance monotonously

increases if a ∈ [a∗, eǫ

eǫ+2 ], we can get optimal a by analyzing

f1(a) in Eq. (87) when a ∈ [0, a∗) if ǫ < ln 4. First order

derivative of Eq. (87) is

f ′1(a) =2(1− a)

(eǫ − a)3− 1

(eǫ − a)2+

a(eǫ + 1)2

2(eǫ − a)4+

a2(eǫ + 1)2

(eǫ − a)5.

(90)

After simplifying f ′1(a), we have

f ′1(a) =−2a3 − a2(−e2ǫ − 5− 4eǫ)

2(eǫ − a)5

+−a(7eǫ − 4e2ǫ − e3ǫ)− (2e3ǫ − 4e2ǫ))

2(eǫ − a)5. (91)

Since 2(eǫ−a)5 > 0, solving f ′1(a) = 0 is equivalent to solve

the following equation

2a3 + a2(−e2ǫ − 5− 4eǫ) + a(7eǫ − 4e2ǫ − e3ǫ)

+ (2e3ǫ − 4e2ǫ) = 0. (92)

We define coefficients of Eq. (92) as follows:

c3 := 2, (93)

c2 := −e2ǫ − 5− 4eǫ, (94)

c1 := 7eǫ − 4e2ǫ − e3ǫ, (95)

c0 := 2e3ǫ − 4e2ǫ. (96)

The general solution of the cubic equation involves calculation

of

∆0 = c22 − 3c3c1

= (−e2ǫ − 5− 4eǫ)2 − 3× 2(7eǫ − 4e2ǫ − e3ǫ)

= e4ǫ + 14e3ǫ + 50e2ǫ − 2eǫ + 25 > 0,

∆1 = 2c23 − 9c3c2c1 + 27c23c0

= 2(−e2ǫ − 5− 4eǫ)3

− 9× 2(−e2ǫ − 5− 4eǫ)(7eǫ − 4e2ǫ − e3ǫ)

+ 27× 22(2e3ǫ − 4e2ǫ)

= −2e6ǫ − 42e5ǫ − 270e4ǫ − 404e3ǫ − 918e2ǫ

+ 30eǫ − 250 < 0,

C =3

∆1 ±√

∆21 − 4∆3

0

2.

Substituting ∆0 and ∆1 into C yields

∆21 − 4∆3

0

= (−2c6 − 42c5 − 270c4 − 404c3 − 918c2 + 30c− 250)2

− 4(c4 + 14c3 + 50c2 − 2c+ 25)3 < 0,

then√

∆21 − 4∆3

0 = i√

4∆30 −∆2

1, (97)

finally,

C =3

∆1 − i√

4∆30 −∆2

1

2.

To eliminate the imaginary number, we change Eq. (156) using

Euler’s formula. Define mold of C as follows:

|C| = (|C|3)1/3 =

(

∆21

4+ ∆3

0 −∆2

1

4

)1/3

=√

∆0, (98)

C = |C|eiθ, (99)

C3 = |C|3e3iθ =√

∆30e

3iθ. (100)

Therefore, we obtain C =√∆0e

iθ. According to Euler’s

Formula, we have

ei3θ = cos 3θ + i sin 3θ, (101)

cos 3θ =∆1

2∆320

< 0, (102)

sin 3θ = −√

4∆30 −∆2

1

2∆320

< 0. (103)

21

Page 22: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Hereby,

3θ = −π + arccos(− ∆1

2∆320

), (104)

θ = −π

3+

1

3arccos(− ∆1

2∆320

). (105)

The solution of the cubic function is

ak = − 1

3c3(c2 + ξkC +

∆0

ξkC), k ∈ {0, 1, 2}. (106)

To solve Eq. (106), we have the following cases:

• If k = 0, we have

a0 = − 1

3c3(c2 + C +

∆0

C). (107)

Substituting θ (105) and c2 (94) into Eq. (107) yields

a0=−1

6(−e2ǫ−4eǫ− 5

+2√

∆0 cos(−π

3+

1

3arccos(− ∆1

2∆320

))). (108)

• If k = 1, we have

a1 = − 1

3c3(c2 + ξC +

∆0

ξC)

= − 1

3c3(c2 + (−1

2+

√3i

2)C +

∆0

(− 12 +

√3i2 )C

)

= − 1

3c3(c2 + Cei

2π3 +

∆0

Cei

4π3 )

= − 1

3c3(c2 +

∆0ei(θ+ 2

3π) +√

∆0ei( 4π

3 −θ)).

(109)

Simplify Eq. (109) using Eq. (101), we have

a1 = − 1

3c3(c2 + 2

∆0 cos(θ +2π

3)). (110)

Substituting θ (105) and c2 (94) into Eq. (110) yields

a1 = −1

6(−e2ǫ − 4eǫ − 5

+ 2√

∆0 cos(π

3+

1

3arccos(− ∆1

2∆320

))). (111)

• If k = 2, we get

a2 = − 1

3c3(c2 + ξ2C +

∆0

ξ2C)

= − 1

3c3(c2 + (−1

2+

√3i

2)2C +

∆0

(− 12 +

√3i2 )2C

)

= − 1

3c3(c2 + (−1

2−√3i

2)C +

∆0

(− 12 −

√3i2 )C

)

= − 1

3c3(c2 +

∆0ei(θ+ 4π

3 ) +√

∆0ei( 2π

3 −θ)). (112)

Simplify Eq. (112) using Eq. (101), we obtain

a2 = − 1

3c3(c2 + 2

∆0 cos(θ −2π

3)). (113)

Substituting θ (105) and c2 (94) into Eq. (113) yields

a2 = −1

6(−e2ǫ − 4eǫ − 5

+ 2√

∆0 cos(−π +1

3arccos(− ∆1

2∆320

))). (114)

The number of real and complex roots are determined by the

discriminant of the cubic equation as follows:

∆ = 18c3c2c1c0 − 4c32c0 + c22c21 − 4c3c

31 − 27c23c

20. (115)

Substituting c3 (93), c2 (94), c1 (95) and c0 (96) into ∆ (115)

yields

∆ = e2ǫ(eǫ + 1)2(e6ǫ + 30e5ǫ

+ 279e4ǫ + 580e3ǫ − 2385e2eǫ

+ 606eǫ − 775). (116)

If ∆ = 0, we get ǫ = ln(root of ǫ6+30ǫ5+279ǫ4+580ǫ3−2385ǫ2 + 606ǫ− 775 near ǫ = 1.87686) ≈ 0.629598.

• If 0 < ǫ < 0.629598, ∆ < 0, the equation has one real

root and two non-real complex conjugate roots.

• If ǫ = 0.629598, ∆ = 0, the equation has a multiple root

all of its roots are real.

• If ǫ > 0.629598, ∆ > 0, the equation has three distinct

real roots.

From the simplified f ′1(a) Eq. (91), we know that the sign and

roots of f ′1(a) are same as its numerator, defined as follows:

g(a) := −2a3 − a2(−e2ǫ − 5− 4eǫ)

− a(7eǫ − 4e2ǫ − e3ǫ)− (2e3ǫ − 4e2ǫ). (117)

Let c = eǫ, we can change g(a) to the following:

g(a) = −2a3 − a2(−c2 − 5− 4c)− a(7c− 4c2 − c3)

− (2c3 − 4c2). (118)

Case 1: If 0 < ǫ < 0.629598, by observing a0, a1, a2, it

is obvious that a2 is a real root. Fig. 17 shows that a2 > 1.

Since a2 is the only real value root, Eq. (90) f ′1(a) > 0 if

0 < ǫ ≤ 0.629598 and f ′1(a) ≤ 0 if ǫ > 0.629598, f1(a)monotonously increases if 0 < ǫ ≤ 0.629598. Therefore, we

conclude that a = 0.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7privacy budget

4.5

5

5.5

6

6.5

7

7.5

8

8.5

a 2

Fig. 17: a2 if ǫ ∈ [0, 0.629598].

Case 2: If 0.629598 ≤ ǫ < ln 2, ∆ ≥ 0, we get real roots.

• If a = 0, g(0) = −(2c3 − 4c2).• If a = 2, g(2) = 2(8c2 + c+ 2) > 0.

• If a = +∞, lima→∞ g(a) = −∞ < 0.

If there is a root is in [0, eǫ

eǫ+2 ], it means that g(0) ≤ 0. By

solving g(0) = −(2c3−4c2) ≤ 0, we have c ≥ 2 meaning ǫ ≥ln 2. When 0.629598 < ǫ < ln 2, we have a root ∈ (2,+∞).

Based on the properties of cubic function, we have

a0a1 + a0a2 + a1a2 =c1c3

=7eǫ − 4e2ǫ − e3ǫ

2, (119)

a0a1a2 = −c0c3

= −2e3ǫ − 4e2ǫ

2. (120)

Fig. 18 shows that a0a1 + a0a2 + a1a2 < 0 (Eq. (119))

and a0a1a2 > 0 (Eq. (120)). Therefore, we can conclude that

there are one positive real root and two negative real roots or

22

Page 23: Local Differential Privacy based Federated Learning for ...

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69privacy budget

-5

-4

-3

-2

-1

0

1

Fig. 18: a0a1+a0a2+a1a2 and a0a1a2 if ǫ ∈ [0.629598, ln2].

a multiple root. Since two negative roots are out of the a’s

domain, we only discuss the positive root.

• If a ∈ [0, root), g(a) > 0 meaning f ′1(a) > 0.

• If a ∈ [root,+∞), g(a) ≤ 0 meaning f ′1(a) ≤ 0.

From above we know that g(a) > 0, so that f1(a)monotonously increases if a ∈ [0, eǫ

eǫ+2 ]. Therefore, a = 0.

Case 3: If ln 2 ≤ ǫ ≤ ln 5.53, ∆ > 0, there are three distinct

real roots. Since a0a1 + a0a2 + a1a2 < 0 and a0a1a2 < 0,

there are one negative root or three negative roots. If there are

three negative roots, a0a1+a0a2+a1a2 > 0, there is only one

negative root, f ′1(a) have two positive roots and one negative

root.

• If a = 0, g(0) = −(2c^3 − 4c^2) < 0.
• If a = 2, g(2) = 2(8c^2 + c + 2) > 0.
• If a → +∞, lim_{a→∞} g(a) = −∞.
From the above results, we can deduce that there is one positive root in (0, 2), defined as root_1, and the other positive root is in (2, +∞), defined as root_2. Since root_2 > 1, we only discuss root_1.

• If a ∈ [0, root_1], g(a) ≤ 0.
• If a ∈ (root_1, root_2), g(a) > 0.
Therefore, if g(c/(c+2)) ≥ 0, we can conclude that root_1 ≤ c/(c+2). The exact form of g(c/(c+2)) is

g(c/(c+2)) = c^2(c+1)^2(−c^2 + 3c + 14) / (c+2)^3. (121)

Solving g(c/(c+2)) ≥ 0 gives c ≤ (3+√65)/2 ≈ 5.53, i.e., ǫ ≤ ln 5.53. From Fig. 19, we can conclude that a_1 is the correct root, since a_0 < 0 and a_2 > 1. Hence

a = a_1 = −(1/6)·(−e^{2ǫ} − 4e^ǫ − 5 + 2√∆_0 cos(π/3 + (1/3)arccos(−∆_1/(2∆_0^{3/2})))).

Case 4: If ǫ > ln 5.53, ∆ > 0 and there are three distinct real roots. From the analysis in Case 3, we know that if ǫ > ln 5.53, then root_1 > c/(c+2) and g(c/(c+2)) < 0. Hence g(a) ≤ 0 for a ∈ [0, c/(c+2)], meaning f′_1(a) < 0, so f_1(a) monotonically decreases when ǫ > ln 5.53. Since a ∈ [0, e^ǫ/(e^ǫ+2)], we have a = e^ǫ/(e^ǫ+2).

Summarizing the above, we obtain the optimal a, which is denoted P_{0←0} in Eq. (5); a numerical sketch of this piecewise solution follows.
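To make the piecewise solution concrete, the following minimal Python sketch (ours, not part of the paper) evaluates the optimal a numerically. Rather than committing to a sign convention for the cubic resolvents ∆_0 and ∆_1, it recovers the Case-3 root of g(a) in Eq. (117) with np.roots and selects the root in [0, e^ǫ/(e^ǫ+2)], which the case analysis shows is root_1 = a_1.

```python
import numpy as np

def p00(eps):
    """Optimal a = P_{0<-0} of Three-Outputs for privacy budget eps (sketch)."""
    c = np.exp(eps)
    if eps < np.log(2):
        return 0.0                    # Cases 1-2: minimum at a = 0
    if eps > np.log(5.53):
        return c / (c + 2)            # Case 4: boundary of the domain
    # Case 3: g(a) = -2a^3 + (c^2+4c+5)a^2 + (c^3+4c^2-7c)a - (2c^3-4c^2),
    # Eq. (117); select the real root root_1 in [0, c/(c+2)].
    roots = np.roots([-2.0, c**2 + 4*c + 5, c**3 + 4*c**2 - 7*c,
                      -(2*c**3 - 4*c**2)])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return real[(real >= -1e-12) & (real <= c / (c + 2) + 1e-9)][0]

print(p00(1.0))   # optimal P_{0<-0} at eps = 1
```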

[Fig. 19: a_0, a_1 and a_2 versus the privacy budget, ǫ ∈ [ln 2, ln 5.53].]

F. Proof of Lemma 5

By substituting the optimal P_{0←0} of Eq. (5) for a in max_{x∈[−1,1]} Var[Y|x] of Eq. (86), we obtain the worst-case noise variance of Three-Outputs as follows:

min_{P_{0←0}} max_{x∈[−1,1]} Var[Y|x] =
{ (e^ǫ+1)^2 / (e^ǫ−1)^2, for ǫ < ln 2,
  ((e^ǫ+1)^2 e^{2ǫ} / (e^ǫ−1)^2) · ( (1−P_{0←0})/(e^ǫ−P_{0←0})^2 + (e^ǫ+1)^2 P_{0←0}^2 / (4(e^ǫ−P_{0←0})^4) ), for ln 2 ≤ ǫ ≤ ln 5.53,
      where P_{0←0} = −(1/6)·(−e^{2ǫ} − 4e^ǫ − 5 + 2√∆_0 cos(π/3 + (1/3)arccos(−∆_1/(2∆_0^{3/2})))),
  (e^ǫ+2)(e^ǫ+10) / (4(e^ǫ−1)^2), for ǫ > ln 5.53. }
(122)
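The following sketch (ours) evaluates Eq. (122); for the middle regime it expects the optimal P_{0←0} as input, e.g., from the hypothetical p00 helper sketched earlier:

```python
import numpy as np

def worst_case_var(eps, p=None):
    """Piecewise worst-case variance of Three-Outputs, Eq. (122) (sketch)."""
    c = np.exp(eps)
    if eps < np.log(2):
        return (c + 1)**2 / (c - 1)**2
    if eps > np.log(5.53):
        return (c + 2) * (c + 10) / (4 * (c - 1)**2)
    # Middle regime: p is the optimal P_{0<-0}, e.g. p = p00(eps).
    return ((c + 1)**2 * c**2 / (c - 1)**2) * (
        (1 - p) / (c - p)**2 + (c + 1)**2 * p**2 / (4 * (c - p)**4))

print(worst_case_var(0.5))                 # small-budget regime
print(worst_case_var(2.0))                 # large-budget regime
# print(worst_case_var(1.0, p00(1.0)))     # middle regime, using p00 above
```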

G. Proof of (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2) / (2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2) ≤ (e^ǫ−1)/(e^ǫ+t) if t = e^{ǫ/3}

Proposition 1. (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2) / (2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2) ≤ (e^ǫ−1)/(e^ǫ+t) for ǫ > 0.

Define

f := (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2) / (2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2) − (e^ǫ−1)/(e^ǫ+t)
   = ae^ǫ(e^ǫ+1)^2(t+1) / ((ae^ǫ(e^ǫ+1)^2 − 2(e^ǫ−a)^2(e^ǫ+t))(e^ǫ+t)), (123)

and

h := 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2. (124)

When ǫ > 0, we have e^ǫ+t > 0 and ae^ǫ(e^ǫ+1)^2(t+1) ≥ 0 (the numerator of Eq. (123)); note that the denominator of Eq. (123) equals −h(e^ǫ+t).

• If 0 < ǫ < ln 2, we have a = 0, so the numerator of Eq. (123) vanishes and h = 2e^{2ǫ}(e^ǫ + e^{ǫ/3}) > 0. Therefore, we conclude that Eq. (123) = 0 ≤ 0.
• If ln 2 ≤ ǫ ≤ ln 5.53, we have a = −(1/6)·(−e^{2ǫ} − 4e^ǫ − 5 + 2√∆_0 cos(π/3 + (1/3)arccos(−∆_1/(2∆_0^{3/2})))). Fig. 20 shows that h = 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 > 0, so the denominator of Eq. (123) is negative and we obtain Eq. (123) < 0.

• If ǫ > ln 5.53, we have a = e^ǫ/(e^ǫ+2) and h = e^{2ǫ}(e^ǫ+1)^2(−2 + e^ǫ + 2e^{ǫ/3})/(e^ǫ+2)^2. Since e^ǫ > 1 and e^{ǫ/3} > 1, we obtain −2 + e^ǫ + 2e^{ǫ/3} > 0 and h > 0. Hence, we conclude that Eq. (123) < 0.

Based on the above analysis, we have Eq. (123) ≤ 0 when ǫ > 0, meaning that (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2) ≤ (e^ǫ−1)/(e^ǫ+t) when ǫ > 0.

H. Proof of 2(e^ǫ − a)^2(e^ǫ + t) − ae^ǫ(e^ǫ + 1)^2 > 0

From the values of ǫ, we have the following cases:
• If 0 < ǫ ≤ ln 5.53, we have 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 > 0, referring to Fig. 20.
• If ǫ > ln 5.53, we have a = e^ǫ/(e^ǫ+2) and t = e^{ǫ/3}, so that

2(e^ǫ − a)^2(e^ǫ + t) − ae^ǫ(e^ǫ + 1)^2 = e^{2ǫ}(e^ǫ + 1)^2(e^ǫ − 2 + 2e^{ǫ/3}) / (e^ǫ + 2)^2. (125)

Since e^{2ǫ}(e^ǫ+1)^2(e^ǫ − 2 + 2e^{ǫ/3})/(e^ǫ+2)^2 > 0 if ǫ > 0, we have 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 > 0 if ǫ > ln 5.53.

[Fig. 20: value of 2(e^ǫ − a)^2(e^ǫ + t) − ae^ǫ(e^ǫ + 1)^2 versus the privacy budget; it is > 0 when ǫ ∈ (0, ln 5.53].]

Thus, we conclude that 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 > 0 if ǫ > 0. ∎
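As a numerical spot check (ours, assuming the hypothetical p00 helper from the earlier sketch is available for the optimal a), both Proposition 1 and the positivity of h can be verified on a grid:

```python
import numpy as np

# Grid check of Appendix G (Proposition 1) and Appendix H (h > 0).
for eps in np.linspace(0.05, 5.0, 200):
    c, t = np.exp(eps), np.exp(eps / 3)
    a = p00(eps)                                    # optimal a = P_{0<-0}
    h = 2 * (c - a)**2 * (c + t) - a * c * (c + 1)**2
    lhs = (2 * (c - a)**2 * (c - 1) - a * c * (c + 1)**2) / h
    assert h > 0 and lhs <= (c - 1) / (c + t) + 1e-9
print("Proposition 1 and Appendix H hold on the grid")
```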

I. Proof of Lemma 6

From Eq. (26a) and Eq. (26b), for any Y ∈ [−A, A] and any two input values x_1, x_2 ∈ [−1, 1], we have pdf(Y|x_1)/pdf(Y|x_2) ≤ c/d = exp(ǫ). Thus, Algorithm 3 satisfies local differential privacy. For notational simplicity, with a fixed ǫ below, we will write L(ǫ, x, t) and R(ǫ, x, t) as L_x and R_x. Since the output must follow a proper probability distribution, we have

∫_{−A}^{A} F[Y = y|x] dy = c(R_x − L_x) + d[2A − (R_x − L_x)] = 1. (126)

(126)

In addition, unbiasedness requires

E[Y = y|x] = ∫_{−A}^{A} y · pdf(Y = y|x) dy = (d/2)(L_x^2 − A^2) + (c/2)(R_x^2 − L_x^2) + (d/2)(A^2 − R_x^2) = x. (127)

By solving the above Eq. (126) and Eq. (127), we have

{ L_x = x/(1−2Ad) − (1−2Ad)/(2(c−d)),
  R_x = x/(1−2Ad) + (1−2Ad)/(2(c−d)). } (128)

With −A ≤ y ≤ A, the constraint −A ≤ L_x < R_x ≤ A for any −1 ≤ x ≤ 1 in Eq. (26), Eq. (28) and Eq. (29) implies

Ad < 1/2, (129)

A ≥ 1/(1−2Ad) + (1−2Ad)/(2(c−d)). (130)

For notational simplicity, we define α and ξ as

α := Ad, (131)
ξ := (c − d)/d, (132)

where it is clear under privacy parameter ǫ that

ξ = e^ǫ − 1. (133)

Applying Eq. (131) and Eq. (132) to Inequality (129) and Eq. (130), we obtain

α/d ≥ 1/(1−2α) + (1−2α)/(2ξd), (134)

α < 1/2. (135)

The condition Eq. (134) induces d ≤ ((2ξ+4)α − (4+4ξ)α^2 − 1)/(2ξ) = [(2ξ+2)α − 1](1−2α)/(2ξ). In view of 1/(2(ξ+1)) = 1/(2e^ǫ) < α < 1/2, we define t satisfying 0 < t < ∞ such that α = (t+1)/(2(t+e^ǫ)). Note that lim_{t→0} (t+1)/(2(t+e^ǫ)) = 1/(2e^ǫ), lim_{t→∞} (t+1)/(2(t+e^ǫ)) = 1/2, and

d = [(2ξ+2)α − 1](1−2α)/(2ξ) = (1/(2ξ)) · (ξt/(t+1+ξ)) · (ξ/(t+1+ξ)) = t(e^ǫ−1)/(2(t+e^ǫ)^2).

Applying α, d and ξ to Eq. (128), we have

{ L_x = x/(1−2α) − (1−2α)/(2ξd) = x·(e^ǫ+t)/(e^ǫ−1) − (e^ǫ+t)/(t(e^ǫ−1)) = (e^ǫ+t)(xt−1)/(t(e^ǫ−1)),
  R_x = x/(1−2α) + (1−2α)/(2ξd) = x·(e^ǫ+t)/(e^ǫ−1) + (e^ǫ+t)/(t(e^ǫ−1)) = (e^ǫ+t)(xt+1)/(t(e^ǫ−1)). } (136)

Furthermore, the variance of Y is

Var[Y|x] = E[Y^2|x] − (E[Y|x])^2
 = ∫_{−A}^{A} y^2 F[Y = y|x] dy − x^2
 = ∫_{−A}^{L_x} d·y^2 dy + ∫_{L_x}^{R_x} c·y^2 dy + ∫_{R_x}^{A} d·y^2 dy − x^2
 = (d/3)[L_x^3 − (−A)^3] + (c/3)(R_x^3 − L_x^3) + (d/3)(A^3 − R_x^3) − x^2
 = (2d/3)A^3 + ((c−d)/3)(R_x^3 − L_x^3) − x^2. (137)

Substituting Eq. (128) into Eq. (137) yields

Var[Y|x] = (2d/3)A^3 + ((c−d)/3)·[ (x/(1−2Ad) + (1−2Ad)/(2(c−d)))^3 − (x/(1−2Ad) − (1−2Ad)/(2(c−d)))^3 ] − x^2
 = (2d/3)A^3 + ((c−d)/3)·{ 6(x/(1−2Ad))^2 · (1−2Ad)/(2(c−d)) + 2[(1−2Ad)/(2(c−d))]^3 } − x^2
 = (1/(1−2Ad) − 1)x^2 + (2d/3)A^3 + (1−2Ad)^3/(12(c−d)^2). (138)

Substituting 1−2α = (e^ǫ−1)/(e^ǫ+t), d = t(e^ǫ−1)/(2(t+e^ǫ)^2), ξ = (c−d)/d = e^ǫ−1, and α = (t+1)/(2(t+e^ǫ)) into Eq. (138) yields

Var[Y|x] = ((t+1)/(e^ǫ−1))x^2 + (t+e^ǫ)((t+1)^3 + e^ǫ − 1)/(3t^2(e^ǫ−1)^2). (139)
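To make the construction concrete, here is a minimal sampler sketch (ours, not the paper's reference implementation) of this piecewise density; the plateau mass e^ǫ/(t+e^ǫ) used below also matches the computation in Appendix J:

```python
import numpy as np

def pm_perturb(x, eps, t=None, rng=None):
    """One draw of the piecewise mechanism of Appendix I (sketch, ours).

    Density is c = e^eps * d on the plateau [L_x, R_x] (Eq. (136)) and d
    elsewhere on [-A, A]; PM-SUB corresponds to t = e^{eps/3}.
    """
    rng = np.random.default_rng() if rng is None else rng
    ee = np.exp(eps)
    t = np.exp(eps / 3.0) if t is None else t
    d = t * (ee - 1) / (2.0 * (t + ee) ** 2)      # low density (Appendix I)
    A = (t + 1) / (2.0 * (t + ee)) / d            # A = alpha / d, Eq. (131)
    Lx = (ee + t) * (x * t - 1) / (t * (ee - 1))  # plateau ends, Eq. (136)
    Rx = (ee + t) * (x * t + 1) / (t * (ee - 1))
    if rng.random() < ee / (t + ee):              # plateau mass (Appendix J)
        return rng.uniform(Lx, Rx)
    # Otherwise sample the low-density side intervals, proportionally
    # to their lengths (the density d is uniform over both).
    left, right = Lx + A, A - Rx
    if rng.random() < left / (left + right):
        return rng.uniform(-A, Lx)
    return rng.uniform(Rx, A)

# Empirical check of unbiasedness (Eq. (127)) and the variance in Eq. (139):
samples = np.array([pm_perturb(0.4, 1.0) for _ in range(100000)])
print(samples.mean(), samples.var())
```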

J. Calculate the probability of a variable Y falling in the interval [L(ǫ, x, e^{ǫ/3}), R(ǫ, x, e^{ǫ/3})]

By replacing t in Eq. (26) of PM-OPT with e^{ǫ/3}, we obtain the probability as follows:

P[L(ǫ, x, e^{ǫ/3}) ≤ Y ≤ R(ǫ, x, e^{ǫ/3})]
 = ∫_{L(ǫ,x,e^{ǫ/3})}^{R(ǫ,x,e^{ǫ/3})} c dY
 = ∫_{(e^ǫ+e^{ǫ/3})(xe^{ǫ/3}−1)/(e^{ǫ/3}(e^ǫ−1))}^{(e^ǫ+e^{ǫ/3})(xe^{ǫ/3}+1)/(e^{ǫ/3}(e^ǫ−1))} (e^ǫ·t(e^ǫ−1)/(2(t+e^ǫ)^2)) dY
 = e^ǫ/(e^{ǫ/3} + e^ǫ).

K. Proof of Lemma 8

The expressions of the two probabilities in Eq. (141) can be solved from the following requirements: ① Z follows a proper distribution, so that

P[Z = kC/m | Y = y] + P[Z = (k+1)C/m | Y = y] = 1, (140a)

and ② Z is unbiased, E[Z | Y = y] = y, so that

(kC/m)·P[Z = kC/m | Y = y] + ((k+1)C/m)·P[Z = (k+1)C/m | Y = y] = y. (140b)

Summarizing ① and ②, with k := ⌊ym/C⌋, we have

P[Z = z | Y = y] = { k + 1 − ym/C, if z = kC/m,
                     ym/C − k, if z = (k+1)C/m. } (141)
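A minimal sketch (ours) of this unbiased stochastic rounding step, with C and m treated as given parameters:

```python
import numpy as np

def discretize(y, C, m, rng=None):
    """Snap y in [-C, C] onto the grid {iC/m} per Eq. (141) (sketch, ours).

    The rounding is randomized so that E[Z | Y = y] = y while only a few
    bits are needed to transmit the grid index.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = np.floor(y * m / C)
    p_up = y * m / C - k                 # P[Z = (k+1)C/m | Y = y]
    return (k + 1) * C / m if rng.random() < p_up else k * C / m

zs = [discretize(0.37, C=1.0, m=8) for _ in range(100000)]
print(np.mean(zs))                       # ~0.37: unbiased
```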

In the perturbation step, the distribution of Y given the input x is given by

F[Y = y | x] = { p_1, if y ∈ [L(x), R(x)],
                 p_2, if y ∈ [−C, L(x)) ∪ (R(x), C]. } (142)

Hence,

P[Z = z | x] = ∫_y P[Z = z | x and Y = y]·F[Y = y | x] dy = ∫_y P[Z = z | Y = y]·F[Y = y | x] dy. (143)

In addition, our mechanisms are unbiased, such that

E[Y | x] = ∫_y y·F[Y = y | x] dy = x. (144)

Therefore, we obtain

E[Z | x] = Σ_z z·P[Z = z | x]
 = Σ_z z·∫_y P[Z = z | Y = y]·F[Y = y | x] dy
 = ∫_y (Σ_z z·P[Z = z | Y = y])·F[Y = y | x] dy
 = ∫_y y·F[Y = y | x] dy = x. (145)

L. Proof of Lemma 9

To prove Var[Z|X = x] ≥ Var[Y|X = x], it is equivalent to prove

E[Z^2|X = x] − (E[Z|X = x])^2 ≥ E[Y^2|X = x] − (E[Y|X = x])^2.

Since M_1 and M_2 are unbiased, we have E[Z|X = x] = E[Y|X = x] = x. Hence, it is sufficient to prove

E[Z^2|X = x] ≥ E[Y^2|X = x]. (146)

We can derive that

E[Z^2|X = x] = Σ_z z^2·P[Z = z|X = x]
 = Σ_z z^2 ∫_y P[Z = z|Y = y]·F[Y = y|X = x] dy
 = ∫_y Σ_z z^2 P[Z = z|Y = y]·F[Y = y|X = x] dy
 = ∫_y E[Z^2|Y = y]·F[Y = y|X = x] dy, (147)

and

E[Y^2|X = x] = ∫_y y^2·F[Y = y|X = x] dy. (148)

To prove Inequality (146), because of Eq. (147) and Eq. (148), it is sufficient to prove

E[Z^2|Y = y] ≥ y^2, ∀y ∈ Range(Y). (149)

After getting the intermediate output y from M_1, we may discretize y into z_1 with probability p_1 and z_2 with probability p_2. Hereby,

p_1 + p_2 = 1.

Mechanism M_2 is unbiased, so that E[Z|Y = y] = y, and we have

p_1 z_1 + p_2 z_2 = y.

According to the Cauchy–Schwarz inequality, we have

E[Z^2|Y = y] = p_1 z_1^2 + p_2 z_2^2 = [(√p_1)^2 + (√p_2)^2]·[(√p_1 z_1)^2 + (√p_2 z_2)^2] ≥ (p_1 z_1 + p_2 z_2)^2 = y^2.

Thus, we get Inequality (149) and therefore Inequality (146). ∎

M. Proof of Lemma 10

The max_{x∈[−1,1]} Var_H[Y|x] is minimized when

β = { 0, if 0 < ǫ < ǫ*,
      β_1 = (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2) / (2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2), if ǫ* ≤ ǫ < ln 2,
      β_3 = (−√(B/A) + e^ǫ − 1) / (e^ǫ + e^{ǫ/3}), if ǫ ≥ ln 2, } (150)

where

ǫ* := 3 ln(root of 3x^5 − 2x^3 + 3x^2 − 5x − 3 near x = 1.22588) ≈ 0.610986. (151)
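Before the proof, the threshold ǫ* can be checked numerically (our sketch, not part of the original argument):

```python
import numpy as np

# Recover eps* of Eq. (151): 3 * ln(real root of 3x^5 - 2x^3 + 3x^2 - 5x - 3).
roots = np.roots([3, 0, -2, 3, -5, -3])
real = roots[np.abs(roots.imag) < 1e-9].real
x_star = real[np.argmin(np.abs(real - 1.22588))]
print(3 * np.log(x_star))    # ~0.610986
```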

Proof. If x = x* = (β−1)ae^ǫ(e^ǫ+1)^2 / (2(e^ǫ−a)^2(β(e^ǫ+t)−e^ǫ+1)), the variance of Y is as follows:

Var_H[Y|x*] = (β(t+1)/(e^ǫ−1) + β − 1)·( (β−1)ae^ǫ(e^ǫ+1)^2 / (2(e^ǫ−a)^2(β(e^ǫ+t)−e^ǫ+1)) )^2
 + ((1−β)ae^ǫ(e^ǫ+1)^2 / ((e^ǫ−1)(e^ǫ−a)^2))·( (β−1)ae^ǫ(e^ǫ+1)^2 / (2(e^ǫ−a)^2(β(e^ǫ+t)−e^ǫ+1)) )
 + ( (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2)·β + (1−β)(1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) ). (152)

Let γ := β(e^ǫ + t) − e^ǫ + 1 and c := e^ǫ; then Var_H[Y|x*] in Eq. (152) can be transformed into

γ·( a^2c^2(c+1)^4/(4(c+t)^2(c−a)^4(c−1)) − a^2c^2(c+1)^4/(2(c+t)^2(c−1)(c−a)^4) + ((t+1)^3+c−1)/(3t^2(c−1)^2) − (1−a)c^2(c+1)^2/((c+t)(c−1)^2(c−a)^2) )
 + (1/γ)·( (1+t)^2a^2c^2(c+1)^4/(4(c+t)^2(c−a)^4(c−1)) − (1+t)^2a^2c^2(c+1)^4/(2(c+t)^2(c−1)(c−a)^4) )
 − (1+t)a^2c^2(c+1)^4/(2(c+t)^2(c−a)^4(c−1)) + (1+t)a^2c^2(c+1)^4/((c+t)^2(c−1)(c−a)^4)
 + ((t+1)^3+c−1)/(3t^2(c−1)) + (1+t)(1−a)c^2(c+1)^2/((c+t)(c−1)^2(c−a)^2). (153)

Set the coefficient of γ as A:

A := a^2c^2(c+1)^4/(4(c+t)^2(c−a)^4(c−1)) − a^2c^2(c+1)^4/(2(c+t)^2(c−1)(c−a)^4) + ((t+1)^3+c−1)/(3t^2(c−1)^2) − (1−a)c^2(c+1)^2/((c+t)(c−1)^2(c−a)^2). (154)

Set the coefficient of 1/γ as B:

B := −(1+t)^2a^2c^2(c+1)^4 / (4(c+t)^2(c−a)^4(c−1)). (155)

Set C as:

C := −(1+t)a^2c^2(c+1)^4/(2(c+t)^2(c−a)^4(c−1)) + (1+t)a^2c^2(c+1)^4/((c+t)^2(c−1)(c−a)^4) + ((t+1)^3+c−1)/(3t^2(c−1)) + (1+t)(1−a)c^2(c+1)^2/((c+t)(c−1)^2(c−a)^2). (156)

Since γ monotonically increases with β, over the domain β ∈ (0, β_1) the minimum γ is γ_1 := −e^ǫ + 1 at β = 0 and the maximum γ is γ_2 := 0 at β = β_1.

• If 0 < ǫ < ln 2, then a = 0 and we have

A = ((t+1)^3 + c − 1)/(3t^2(c−1)^2) − (c+1)^2/((c+t)(c−1)^2),
B = 0,

so Var_H[Y|x*] = Aγ + C is a linear function of γ.
– If β ∈ (0, β_1), Appendix O proves:
a) A > 0, if 0 < ǫ < 0.610986.
b) A = 0, if ǫ = 0.610986.
c) A < 0, if 0.610986 < ǫ < ln 2.
Therefore, min_β max_{x∈[−1,1]} Var_H[Y|x] is attained at:
a) β = 0, if 0 < ǫ < 0.610986.
b) β = β_1, if 0.610986 ≤ ǫ < ln 2.
– If β ∈ [β_1, β_2], then slope_1 = (t+1)/(e^ǫ−1) + 1 + (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2) − (e^ǫ+1)^2/(e^ǫ−1)^2 and slope_2 = (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2) − (e^ǫ+1)^2/(e^ǫ−1)^2. Fig. 21 shows that slope_1 > 0 when ǫ ∈ [0, ln 2].

[Fig. 21: slope_1 versus the privacy budget, ǫ ∈ [0, ln 2]; slope_1 > 0 throughout.]

When a = 0, β_1 (178) = β_intersection (180) = (e^ǫ−1)/(t+e^ǫ), and the intersection of Var_H[Y|0] and Var_H[Y|1] is at β_1. When slope_2 = 0, we have the root at ǫ = 3 ln(root of 3x^5 − 2x^3 + 3x^2 − 5x − 3 near x = 1.22588) ≈ 0.610986. From Fig. 22, we have:
(1) If 0 < ǫ < 0.610986, slope_2 > 0.
(2) If ǫ = 0.610986, slope_2 = 0.
(3) If ǫ > 0.610986, slope_2 < 0.
Based on the previous analysis, we have:
(1) If 0 < ǫ < 0.610986, min_β max_{x∈[−1,1]} Var_H[Y|x] = max_{x∈[−1,1]} Var_H[Y|x, β = β_1].
(2) If ǫ = 0.610986, min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1, β = β_1].
(3) If ǫ > 0.610986, min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1, β = β_1].

[Fig. 22: slope_2 versus the privacy budget, ǫ ∈ [0, ln 2].]


Therefore, min_β max_{x∈[−1,1]} Var_H[Y|x] is at β = β_1 if β ∈ [β_1, β_2]. Summarizing the above analysis, we can conclude that min_β max_{x∈[−1,1]} Var_H[Y|x] =
– Var_H[Y|x*, β = 0], if 0 < ǫ < 0.610986;
– max_{x∈[−1,1]} Var_H[Y|x, β = β_1], if 0.610986 ≤ ǫ < ln 2.
• If ln 2 ≤ ǫ ≤ ln 5.53:
When β ∈ (0, β_1), Appendix O proves A < 0 and B < 0, and Appendix N proves that when γ = −√(B/A), we have min_β max_{x∈[−1,1]} Var_H[Y|x] = max_{x∈[−1,1]} Var_H[Y|x, β = β_3]. Since γ := β(c+t) − c + 1, we have β_3 := (−√(B/A) + c − 1)/(c+t), so min_β max_{x∈[−1,1]} Var_H[Y|x] = max_{x∈[−1,1]} Var_H[Y|x*, β = β_3].
When β ∈ [β_1, β_2], Fig. 23 shows that slope_1 > 0 if ǫ ∈ [ln 2, ln 5.53], and Fig. 24 shows the behavior of slope_2:
– If ln 2 ≤ ǫ < 1.4338, slope_2 < 0 and β_intersection < β_1 (see Fig. 25), so min_β max_{x∈[−1,1]} Var_H[Y|x] = max_{x∈[−1,1]} Var_H[Y|x, β = β_1].
– If ǫ ≈ 1.4338, slope_2 = 0 and β_intersection < β_1 (see Fig. 25), so min_β max_{x∈[−1,1]} Var_H[Y|x] = max_{x∈[−1,1]} Var_H[Y|x, β = β_1].
– If 1.4338 < ǫ ≤ ln 5.53, slope_2 > 0, so min_β max_{x∈[−1,1]} Var_H[Y|x] = max_{x∈[−1,1]} Var_H[Y|x, β = β_1].
Since Var_H[Y|x, β = β_3] < Var_H[Y|x, β = β_1], min_β max_{x∈[−1,1]} Var_H[Y|x] is at β = β_3.

[Fig. 23: slope_1 versus the privacy budget, ǫ ∈ [ln 2, ln 5.53]; slope_1 > 0 throughout.]

• If ǫ > ln 5.53:
– If β ∈ (0, β_1), Appendix O proves A < 0 and B < 0, and Appendix N proves that when γ = −√(B/A), with β_4 := (−√(B/A) + c − 1)/(c+t), we get min_β max_{x∈[−1,1]} Var_H[Y|x] = max_{x∈[−1,1]} Var_H[Y|x, β = β_4].
– If β ∈ [β_1, β_2], Appendix P proves slope_1 > 0 and Appendix Q proves slope_2 > 0. Therefore, we obtain min_β max_{x∈[−1,1]} Var_H[Y|x] = max_{x∈[−1,1]} Var_H[Y|x, β = β_1].

[Fig. 24: slope_2 versus the privacy budget, ǫ ∈ [ln 2, ln 5.53].]

[Fig. 25: β_intersection − β_1 versus the privacy budget, ǫ ∈ [ln 2, ln 5.53].]

N. Proof of the monotonicity of Var_H[Y|x*]

Substituting A (Eq. (154)), B (Eq. (155)) and C (Eq. (156)) into Var_H[Y|x*] (Eq. (153)) yields

Var_H[Y|x*] = Aγ + B/γ + C. (157)

The first-order derivative of Eq. (157) is

Var_H[Y|x*]′ = A − B/γ^2. (158)

If A − B/γ^2 = 0, we get two roots:

γ_1 = −√(B/A), γ_2 = √(B/A).

Since γ < 0, γ_1 is the eligible one. Hereby,
• If γ ∈ (−∞, −√(B/A)], Var_H[Y|x*]′ < 0 and Var_H[Y|x*] monotonically decreases.
• If γ ∈ (−√(B/A), 0), Var_H[Y|x*]′ > 0 and Var_H[Y|x*] monotonically increases.

O. The sign of A with respect to ǫ

If A = 0, we have ǫ = 3 ln(root of 3x^5 − 2x^3 + 3x^2 − 5x − 3 near x = 1.22588) ≈ 0.610986.

The first-order derivative of A is

A′ = −(25e^ǫ − 27e^{2ǫ} − 9e^{3ǫ} − 12e^{ǫ/3} + 19e^{2ǫ/3} − e^{4ǫ/3} + 41e^{5ǫ/3} + 7e^{7ǫ/3} + 5) / (9e^{2ǫ/3}(e^{2ǫ/3} + 1)^2(e^ǫ − 1)^3). (159)


• If 0 < ǫ < ln 2, Fig. 26 shows that A′ < 0, so A monotonically decreases on (0, ln 2). Therefore, we have
– A > 0, if 0 < ǫ < 0.610986.
– A = 0, if ǫ = 0.610986.
– A < 0, if 0.610986 < ǫ < ln 2.

[Fig. 26: the first-order derivative A′ versus the privacy budget; A′ < 0 if 0 < ǫ ≤ ln 2.]

• If ln 2 ≤ ǫ ≤ ln 5.53, Fig. 27 shows A < 0.

[Fig. 27: the value of A versus the privacy budget; A < 0 if ln 2 < ǫ ≤ ln 5.53.]

• If ǫ > ln 5.53, we obtain

A = −(16e^ǫ + 21e^{2ǫ} + 3e^{3ǫ} + 36e^{ǫ/3} − 12e^{2ǫ/3} − 28e^{5ǫ/3} − 8e^{7ǫ/3} − 12) / (12e^{2ǫ/3}(e^ǫ − e^{2ǫ/3} + e^{5ǫ/3} − 1)^2). (160)

When A = 0, we have three roots:

r_1 ≈ −16.9563, r_2 ≈ −1.2284, r_3 ≈ 0.0463914.

Since the denominator 12e^{2ǫ/3}(e^ǫ − e^{2ǫ/3} + e^{5ǫ/3} − 1)^2 > 0, lim_{ǫ→∞} −(16e^ǫ + 21e^{2ǫ} + 3e^{3ǫ} + 36e^{ǫ/3} − 12e^{2ǫ/3} − 28e^{5ǫ/3} − 8e^{7ǫ/3} − 12) = −∞, and r_3 is the largest real root, the sign of the numerator does not change beyond r_3, so A < 0 when ǫ > ln 5.53.

P. The sign of slope_1 when ǫ > ln 5.53

If

slope_1 = (t+1)/(e^ǫ−1) + 1 − ae^ǫ(e^ǫ+1)^2/((e^ǫ−1)(e^ǫ−a)^2) + (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2) − (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) = 0,

we have two roots:

e^{ǫ_1} ≈ −0.141506, e^{ǫ_2} ≈ 2.21598.

The first-order derivative of slope_1 is

slope_1′ = (−e^{2ǫ/3}(10e^ǫ + 20) + 20e^ǫ + (−27e^ǫ − 45)e^{ǫ/3} + 10) / (9e^{ǫ/3}(e^ǫ − 1)^3).

The denominator of slope_1′ is > 0, and

lim_{ǫ→∞} (−e^{2ǫ/3}(10e^ǫ + 20) + 20e^ǫ + (−27e^ǫ − 45)e^{ǫ/3} + 10) = −∞.

If slope_1′ = 0, we have three roots:

e^{ǫ_1} ≈ −1.35696, e^{ǫ_2} ≈ 0.0169067, e^{ǫ_3} ≈ 4.22192.

Since e^{ǫ_3} ≈ 4.22192 is the largest real root, the sign of slope_1′ does not change when e^ǫ > 4.22192. Therefore, when ǫ > ln 5.53, slope_1′ < 0 and slope_1 monotonically decreases. By simplifying slope_1, we get

slope_1 = −(9e^ǫ − 5e^{2ǫ/3} − 5e^{4ǫ/3} + 3) / (3(e^ǫ − 1)^2).

Then, we obtain lim_{ǫ→∞} slope_1 = 0. Since slope_1 decreases to 0, we have slope_1 > 0 if ǫ > ln 5.53.

Q. The sign of slope_2 when ǫ > ln 5.53

When

slope_2 = (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2) − (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) = 0,

we have roots e^{ǫ_1} ≈ −1.24835 (not attainable) and e^{ǫ_2} ≈ 1.52144.

The first-order derivative of slope_2 is

slope_2′ = (−4e^{2ǫ} + e^{2ǫ/3}(9e^ǫ + 63) − 23e^ǫ + (−20e^ǫ − 10)e^{ǫ/3} − 3) / (9e^{2ǫ/3}(e^ǫ − 1)^3). (161)

The denominator of Eq. (161) is > 0. Besides, from the numerator of Eq. (161), we obtain

lim_{ǫ→∞} (−4e^{2ǫ} + e^{2ǫ/3}(9e^ǫ + 63) − 23e^ǫ + (−20e^ǫ − 10)e^{ǫ/3} − 3) = −∞.

If Eq. (161) = 0, we have two roots:

ǫ_1 = 3 ln(root of 4x^6 − 9x^5 + 20x^4 + 23x^3 − 63x^2 + 10x + 3) ≈ −3.13865,
ǫ_2 = 3 ln(root of 4x^6 − 9x^5 + 20x^4 + 23x^3 − 63x^2 + 10x + 3) ≈ 0.709472.

Since ǫ_2 ≈ 0.709472 is the largest real root, the sign of slope_2′ does not change if ǫ > 0.709472. Therefore, slope_2′ < 0 if ǫ > ln 5.53. Simplifying slope_2 gives

slope_2 = (3e^{ǫ/3} − 3e^ǫ + 5e^{2ǫ/3} + 2e^{4ǫ/3} − 9) / (3(e^ǫ − 1)^2),

and then we get

lim_{ǫ→∞} slope_2 = 0.

Since slope_2 decreases to 0, we have slope_2 > 0 if ǫ > ln 5.53. ∎
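Both simplified slope expressions can be spot-checked numerically (our sketch, not part of the original argument):

```python
import numpy as np

# Verify slope_1, slope_2 > 0 (and -> 0) for eps > ln(5.53), Appendices P-Q.
eps = np.linspace(np.log(5.53) + 1e-6, 12.0, 500)
e, e13 = np.exp(eps), np.exp(eps / 3)
slope1 = -(9 * e - 5 * e13**2 - 5 * e13**4 + 3) / (3 * (e - 1)**2)
slope2 = (3 * e13 - 3 * e + 5 * e13**2 + 2 * e13**4 - 9) / (3 * (e - 1)**2)
assert (slope1 > 0).all() and (slope2 > 0).all()
print(slope1[-1], slope2[-1])      # both near 0 for large eps
```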

R. Proof of Lemma 13

For any i ∈ [1, n], the random variable Y[t_j] − x[t_j] has zero mean based on Lemma 12. In both PM-SUB and the hybrid mechanism HM_{PM-SUB, Three-Outputs}, we have |Y[t_j] − x[t_j]| ≤ (d/k)·(e^{ǫ/k} + e^{ǫ/(3k)})(e^{ǫ/(3k)} + 1) / (e^{ǫ/(3k)}(e^{ǫ/k} − 1)).

By Bernstein's inequality, we have

P[|Z[t_j] − X[t_j]| ≥ λ] = P[ |Σ_{i=1}^n {Y[t_j] − x[t_j]}| ≥ nλ ]
 ≤ 2·exp( −(nλ)^2 / ( 2Σ_{i=1}^n Var[Y[t_j]] + (2/3)·nλ·(d/k)·(e^{ǫ/k} + e^{ǫ/(3k)})(e^{ǫ/(3k)} + 1)/(e^{ǫ/(3k)}(e^{ǫ/k} − 1)) ) ).

In Algorithm 6, Y[t_j] equals (d/k)y_j with probability k/d and 0 with probability 1 − k/d. Moreover, we obtain E[Y[t_j]] = x[t_j] from Lemma 12, and then we get

Var[Y[t_j]] = E[(Y[t_j])^2] − E[Y[t_j]]^2 = (k/d)·E[((d/k)y_j)^2] − (x[t_j])^2 = (d/k)·E[(y_j)^2] − (x[t_j])^2. (162)

In Algorithm 6, if Line 5 uses PM-SUB, we use the variance in Eq. (139) to compute E[(y_j)^2]; the asymptotic expressions involving ǫ are in the sense of ǫ → 0:

E[(y_j)^2] = Var[y_j] + (E[y_j])^2
 = ((t(ǫ/k) + 1)/(e^{ǫ/k} − 1))·(x[t_j])^2 + (t(ǫ/k) + e^{ǫ/k})((t(ǫ/k) + 1)^3 + e^{ǫ/k} − 1)/(3(t(ǫ/k))^2(e^{ǫ/k} − 1)^2) + (x[t_j])^2 = O(k^2/ǫ^2). (163)

In Algorithm 6, if Line 5 uses Three-Outputs, we use the variance in Eq. (172) to compute E[(y_j)^2]; the asymptotic expressions involving ǫ are in the sense of ǫ → 0:

E[(y_j)^2] = Var[y_j] + (E[y_j])^2
 = (1−a)e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) + b|x[t_j]|e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) − (x[t_j])^2 + (x[t_j])^2
 = (1−a)e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) + b|x[t_j]|e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) = O(k^2/ǫ^2).

In Algorithm 6, if Line 5 uses HM-TP, we have:

E[(y_j)^2] = Var[y_j] + (E[y_j])^2
 = { (e^{ǫ/k}+1)^2/(e^{ǫ/k}−1)^2 + (x[t_j])^2, if 0 < ǫ < ǫ*,
     Var_H[Y|1, β_1, ǫ/k] + (x[t_j])^2, if ǫ* ≤ ǫ < ln 2,
     Var_H[Y|1, β_3, ǫ/k] + (x[t_j])^2, if ǫ ≥ ln 2 }
 = O(k^2/ǫ^2), (164)

where ǫ* is defined in Eq. (151). Then,

Var[Y[t_j]] = (d/k)·( ((t(ǫ/k)+1)/(e^{ǫ/k}−1))·(x[t_j])^2 + (t(ǫ/k)+e^{ǫ/k})((t(ǫ/k)+1)^3+e^{ǫ/k}−1)/(3(t(ǫ/k))^2(e^{ǫ/k}−1)^2) + (x[t_j])^2 ) − (x[t_j])^2.

Substituting Eq. (163) into Eq. (162) yields

Var[Y[t_j]] = (d/k)·O(k^2/ǫ^2) − (x[t_j])^2 = O(dk/ǫ^2). (165)

Therefore, we obtain

P[|Z[t_j] − X[t_j]| ≥ λ] ≤ 2·exp( −nλ^2 / (O(dk/ǫ^2) + λ·O(d/ǫ)) ).

By the union bound, there exists λ = O(√(d ln(d/β)) / (ǫ√n)). Therefore, max_{j∈[1,d]} |Z[t_j] − X[t_j]| = λ = O(√(d ln(d/β)) / (ǫ√n)).

S. Calculate k for PM-SUB and Three-Outputs

We calculate the optimal k for PM-SUB and Three-Outputs.

(I) We calculate k for d-dimensional PM-SUB. When x[t_j] = 1, we get

max Var[Y[t_j]] = (d/k)·( (t(ǫ/k)+1)/(e^{ǫ/k}−1) + (t(ǫ/k)+e^{ǫ/k})((t(ǫ/k)+1)^3+e^{ǫ/k}−1)/(3(t(ǫ/k))^2(e^{ǫ/k}−1)^2) + 1 ) − 1.

For PM-SUB (t = e^{ǫ/(3k)}), we have

max Var[Y[t_j]] = (d/k)·( (e^{ǫ/(3k)}+1)/(e^{ǫ/k}−1) + (e^{ǫ/(3k)}+e^{ǫ/k})((e^{ǫ/(3k)}+1)^3+e^{ǫ/k}−1)/(3(e^{ǫ/(3k)})^2(e^{ǫ/k}−1)^2) + 1 ) − 1.

Let s = ǫ/k; then

max Var[Y[t_j]] = (d/ǫ)·s·( (e^{s/3}+1)/(e^s−1) + (e^{s/3}+e^s)((e^{s/3}+1)^3+e^s−1)/(3(e^{s/3})^2(e^s−1)^2) + 1 ) − 1.

Let

f(s) = s·( (e^{s/3}+1)/(e^s−1) + (e^{s/3}+e^s)((e^{s/3}+1)^3+e^s−1)/(3(e^{s/3})^2(e^s−1)^2) + 1 ),

and then we obtain

max Var[Y[t_j]] = (d/ǫ)·f(s) − 1. (166)

[Fig. 28: f(s) versus s; the minimum f(s) ≈ 4.0976 is attained at s = 2.5.]

From the numerical experiments shown in Fig. 28, we conclude that min f(s), and hence min max Var[Y[t_j]], is obtained at s = 2.5, i.e., k = ǫ/2.5.
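The search behind Fig. 28 can be reproduced with a simple grid sweep (our sketch):

```python
import numpy as np

def f(s):
    """The objective of Eq. (166) (vectorized over s)."""
    e3, es = np.exp(s / 3), np.exp(s)
    return s * ((e3 + 1) / (es - 1)
                + (e3 + es) * ((e3 + 1)**3 + es - 1)
                  / (3 * e3**2 * (es - 1)**2)
                + 1)

s = np.linspace(0.5, 10.0, 20000)
i = np.argmin(f(s))
print(s[i], f(s)[i])     # ~2.5, ~4.0976
```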

(II) We calculate k for d-dimensional Three-Outputs. The variance of Y[t_j] is

Var[Y[t_j]] = (d/k)·( (1−a)e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) + b|x[t_j]|e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) ) − (x[t_j])^2
 = (d/ǫ)·s·( (1−a)e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) + b|x[t_j]|e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) ) − (x[t_j])^2,

where b is from Eq. (75) and a is from Eq. (74).

Let x[t_j]′ = d·b·e^{2ǫ/k}(e^{ǫ/k}+1)^2 / (2k(e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2). If 0 < x[t_j]′ < 1, the worst-case noise variance of Y is

max_{x∈[−1,1]} Var[Y[t_j]] = { Var[x[t_j]′], if 0 < x[t_j]′ < 1,
                               max{Var[0], Var[1]}, otherwise. } (167)

Let s = ǫ/k; then

x[t_j]′ = (d/ǫ)·s·b(s)e^{2s}(e^s+1)^2/(2(e^s−1)^2(e^s−a(s))^2) = (d/ǫ)·s·a(s)e^s(e^s+1)^2/(2(e^s−1)(e^s−a(s))^2).

If 0 < (d/ǫ)·s·a(s)e^s(e^s+1)^2/(2(e^s−1)(e^s−a(s))^2) < 1, then

max Var[Y[t_j]] = max Var[x[t_j]′]
 = (d/k)·( (1−a(ǫ/k))e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) + b(ǫ/k)·x[t_j]′·e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) ) − (x[t_j]′)^2
 = (d^2/ǫ^2)·s^2·(b(s))^2e^{4s}(e^s+1)^4/(2(e^s−1)^4(e^s−a(s))^4) − (d^2/ǫ^2)·s^2·(b(s))^2e^{4s}(e^s+1)^4/(4(e^s−1)^4(e^s−a(s))^4) + (d/ǫ)·s·(1−a(s))e^{2s}(e^s+1)^2/((e^s−1)^2(e^s−a(s))^2)
 = (d^2/ǫ^2)·s^2·(b(s))^2e^{4s}(e^s+1)^4/(4(e^s−1)^4(e^s−a(s))^4) + (d/ǫ)·s·(1−a(s))e^{2s}(e^s+1)^2/((e^s−1)^2(e^s−a(s))^2). (168)

Substituting b = a·(e^s−1)/e^s into Eq. (168) yields

max Var[Y[t_j]] = (d^2/ǫ^2)·s^2·(a(s))^2e^{2s}(e^s+1)^4/(4(e^s−1)^2(e^s−a(s))^4) + (d/ǫ)·s·(1−a(s))e^{2s}(e^s+1)^2/((e^s−1)^2(e^s−a(s))^2).

– If ǫ < ln 2, then a = 0 and b = 0, and the first-order derivative of max Var[Y[t_j]] is

max Var[Y[t_j]]′ = (d/ǫ)·(e^s + 1)(−4se^s + e^{2s} − 1)/(e^s − 1)^3. (169)

When max Var[Y[t_j]]′ = 0, we have the root s ≈ 2.18.
– If ln 2 < ǫ < ln 5.53, by numerical experiments, the optimal s ≈ 2.5.
– If ǫ ≥ ln 5.53, by numerical experiments, the optimal s ≈ 2.5.

Therefore, we pick s = 2.5 and k = ǫ/2.5 to simplify the experimental evaluation.

T. Extending Three-Outputs for Multiple Numeric Attributes

Lemma 14. For a d-dimensional numeric tuple x which is perturbed into Y under ǫ-LDP, and for each attribute t_j of the d attributes, the variance of Y[t_j] induced by Three-Outputs is

Var[Y[t_j]] = (d/k)·( (1−a)e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) + b|x[t_j]|e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) ) − (x[t_j])^2.

Proof of Lemma 14. The variance of Y[t_j] is computed as

Var[Y[t_j]] = E[(Y[t_j])^2] − E[Y[t_j]]^2 = (k/d)·E[((d/k)y_j)^2] − (x[t_j])^2 = (d/k)·E[(y_j)^2] − (x[t_j])^2. (170)

We use the variance in Eq. (172) to compute

E[(y_j)^2] = Var[y_j] + (E[y_j])^2
 = (1−a)e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) + b|x[t_j]|e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) − (x[t_j])^2 + (x[t_j])^2
 = (1−a)e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) + b|x[t_j]|e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2).

Then,

Var[Y[t_j]] = (d/k)·( (1−a)e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) + b|x[t_j]|e^{2ǫ/k}(e^{ǫ/k}+1)^2/((e^{ǫ/k}−1)^2(e^{ǫ/k}−a)^2) ) − (x[t_j])^2. ∎

U. Proof of Lemma 11

Given PM-SUB's variance in Eq. (138), we have

Var_P[Y|x] = (1/(1−2Ad) − 1)x^2 + (2d/3)A^3 + (1−2Ad)^3/(12(c−d)^2).

Substituting α = Ad yields

Var_P[Y|x] = (1/(1−2α) − 1)x^2 + 2α^3/(3d^2) + (1−2α)^3/(12(c−d)^2).

In addition, substituting 1−2α = (e^ǫ−1)/(e^ǫ+t), d = t(e^ǫ−1)/(2(t+e^ǫ)^2), ξ := (c−d)/d = e^ǫ−1, and α = (t+1)/(2(t+e^ǫ)), we have

Var_P[Y|x] = ((t+1)/(e^ǫ−1))x^2 + (t+e^ǫ)((t+1)^3 + e^ǫ − 1)/(3t^2(e^ǫ−1)^2). (171)

Given Three-Outputs's variance in Eq. (72), we can simplify it as

Var_T[Y|x] = (1−a)C^2 + C^2 b|x| − x^2.

According to Eq. (73), we have C = e^ǫ(e^ǫ+1)/((e^ǫ−1)(e^ǫ−a)), so that

Var_T[Y|x] = (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) + b|x|e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) − x^2. (172)

Based on Eq. (171) and Eq. (172), we can construct the variance of the hybrid mechanism as follows:

Var_H[Y|x] = β·( ((t+1)/(e^ǫ−1))x^2 + (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2) ) + (1−β)·( (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) + b|x|e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) − x^2 ),

where t = e^{ǫ/3}.

From Eq. (73), we set b = a·(e^ǫ−1)/e^ǫ to get the worst-case noise variance. Then, the variance of the hybrid mechanism is

Var_H[Y|x] = (β(t+1)/(e^ǫ−1) + β − 1)x^2 + ((1−β)ae^ǫ(e^ǫ+1)^2/((e^ǫ−1)(e^ǫ−a)^2))|x| + ( (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2)·β + (1−β)(1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) ), (173)

where t = e^{ǫ/3}. Based on Eq. (173), we get

max_{x∈[−1,1]} Var_H[Y|x] = { Var_H[Y|x*], if β(t+1)/(e^ǫ−1) + β − 1 < 0 and 0 < x* < 1,
                              max{Var_H[Y|0], Var_H[Y|1]}, otherwise, } (174)

where x* := (β−1)ae^ǫ(e^ǫ+1)^2 / (2(e^ǫ−a)^2(β(e^ǫ+t)−e^ǫ+1)).
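A minimal sketch (ours) of Eq. (173), assuming the hypothetical p00 helper from the earlier sketch supplies the optimal a = P_{0←0}:

```python
import numpy as np

def var_h(x, eps, beta):
    """Hybrid-mechanism variance, Eq. (173) (sketch, ours).

    Uses t = e^{eps/3} and the worst-case choice b = a(e^eps - 1)/e^eps.
    """
    c, t = np.exp(eps), np.exp(eps / 3)
    a = p00(eps)
    quad = beta * (t + 1) / (c - 1) + beta - 1          # coefficient of x^2
    lin = (1 - beta) * a * c * (c + 1)**2 / ((c - 1) * (c - a)**2)
    const = (beta * (t + c) * ((t + 1)**3 + c - 1) / (3 * t**2 * (c - 1)**2)
             + (1 - beta) * (1 - a) * c**2 * (c + 1)**2
               / ((c - 1)**2 * (c - a)**2))
    return quad * x**2 + lin * abs(x) + const

# Worst case over x in [-1, 1] via a dense grid, matching Eq. (174):
xs = np.linspace(-1, 1, 2001)
print(max(var_h(x, 1.0, 0.5) for x in xs))
```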

Therefore, we have the following cases to compute max_{x∈[−1,1]} Var_H[Y|x]:

(I) If β(t+1)/(e^ǫ−1) + β − 1 < 0 and 0 < x* < 1, we obtain:
– β < (e^ǫ−1)/(e^ǫ+t);
– for x* := (β−1)ae^ǫ(e^ǫ+1)^2/(2(e^ǫ−a)^2(β(e^ǫ+t)−e^ǫ+1)), the condition 0 < x* < 1 means 0 < (β−1)ae^ǫ(e^ǫ+1)^2/(2(e^ǫ−a)^2(β(e^ǫ+t)−e^ǫ+1)) < 1. If (β−1)ae^ǫ(e^ǫ+1)^2/(2(e^ǫ−a)^2(β(e^ǫ+t)−e^ǫ+1)) < 1, we have:

β·(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2) > 2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2. (175)

– If 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 > 0, we have β > (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2).
– If 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 = 0 and 2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2 > 0, no β satisfies the condition.
– If 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 = 0 and 2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2 ≤ 0, any β satisfies the condition.
– If 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 < 0, we have β < (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2). Since β < (e^ǫ−1)/(e^ǫ+t), to get the correct domain we compare (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2) with (e^ǫ−1)/(e^ǫ+t); by Appendix G, the former is ≤ the latter. Therefore, β < (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2).

Summarizing the above analysis, we have the following cases to compute max_{x∈[−1,1]} Var_H[Y|x]:

1) If 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 > 0, we have:
max_{x∈[−1,1]} Var_H[Y|x] = { Var_H[Y|x*], if (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2) < β < (e^ǫ−1)/(e^ǫ+t),
                              max{Var_H[Y|0], Var_H[Y|1]}, otherwise. }

2) If 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 = 0 and ae^ǫ(e^ǫ+1)^2 + 2(e^ǫ−a)^2(1−e^ǫ) > 0, we have:
max_{x∈[−1,1]} Var_H[Y|x] = max{Var_H[Y|0], Var_H[Y|1]}.

3) If 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 = 0 and 2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2 ≤ 0, we have:
max_{x∈[−1,1]} Var_H[Y|x] = { Var_H[Y|x*], if β < (e^ǫ−1)/(e^ǫ+t),
                              max{Var_H[Y|0], Var_H[Y|1]}, otherwise. }

4) If 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 < 0, we have:
max_{x∈[−1,1]} Var_H[Y|x] = { Var_H[Y|x*], if β < (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2),
                              max{Var_H[Y|0], Var_H[Y|1]}, otherwise. }

Appendix H proves that 2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2 > 0, and Appendix G proves that (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2) ≤ (e^ǫ−1)/(e^ǫ+t). Therefore, we have

max_{x∈[−1,1]} Var_H[Y|x] = { Var_H[Y|x*], if 0 < β < (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2),
                              max{Var_H[Y|0], Var_H[Y|1]}, otherwise. }

(II) Based on the above analysis, to make max_{x∈[−1,1]} Var_H[Y|x] = max{Var_H[Y|0], Var_H[Y|1]}, β should satisfy the constraint (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2)/(2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2) ≤ β ≤ 1.

To get the exact value of max_{x∈[−1,1]} Var_H[Y|x], we compare Var_H[Y|1] and Var_H[Y|0], whose values are:
– Var_H[Y|1] = (β(t+1)/(e^ǫ−1) + β − 1) + (1−β)ae^ǫ(e^ǫ+1)^2/((e^ǫ−1)(e^ǫ−a)^2) + ( (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2)·β + (1−β)(1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) )
 = β·( (t+1)/(e^ǫ−1) + 1 − ae^ǫ(e^ǫ+1)^2/((e^ǫ−1)(e^ǫ−a)^2) + (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2) − (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) ) − 1 + ae^ǫ(e^ǫ+1)^2/((e^ǫ−1)(e^ǫ−a)^2) + (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2);
– Var_H[Y|0] = (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2)·β + (1−β)(1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2)
 = β·( (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2) − (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2) ) + (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2).

Since Var_H[Y|1] and Var_H[Y|0] are linear in β, we compare their slopes with respect to β. We define the slope of β in Var_H[Y|1] as

slope_1 := (t+1)/(e^ǫ−1) + 1 − ae^ǫ(e^ǫ+1)^2/((e^ǫ−1)(e^ǫ−a)^2) + (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2) − (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2), (176)

and the slope of β in Var_H[Y|0] as

slope_2 := (t+e^ǫ)((t+1)^3+e^ǫ−1)/(3t^2(e^ǫ−1)^2) − (1−a)e^{2ǫ}(e^ǫ+1)^2/((e^ǫ−1)^2(e^ǫ−a)^2). (177)

Then, we represent the left boundary of β as

β_1 := (2(e^ǫ−a)^2(e^ǫ−1) − ae^ǫ(e^ǫ+1)^2) / (2(e^ǫ−a)^2(e^ǫ+t) − ae^ǫ(e^ǫ+1)^2), (178)

and the right boundary of β as

β_2 := 1, (179)


and the value of β at the intersection of Var_H[Y|1] and Var_H[Y|0] is

β_intersection := ((c−1)(c−a)^2 − ac(c+1)^2) / ((t+c)(c−a)^2 − ac(c+1)^2). (180)

Then, slope_1 and slope_2 have the following possible combinations:
1) If slope_1 > 0 and slope_2 > 0, β = β_1.
2) If slope_1 < 0 and slope_2 < 0, β = β_2.
3) If slope_1 · slope_2 < 0 and β_1 < β_intersection < β_2, β = β_intersection.
4) If slope_1 · slope_2 < 0 and β_intersection < β_1 or β_intersection > β_2, find β for min{ max{Var[Y|1, β = β_1], Var[Y|0, β = β_1]}, max{Var[Y|1, β = β_2], Var[Y|0, β = β_2]} }.

5) If slope_1 · slope_2 = 0:
Case 1: slope_1 = 0, slope_2 ≠ 0.
a) If slope_1 = 0, slope_2 > 0 and β_intersection ∈ [β_1, β_2], then β ∈ [β_1, β_intersection].
b) If slope_1 = 0, slope_2 < 0 and β_intersection ∈ [β_1, β_2], then β ∈ [β_intersection, β_2].
c) If slope_1 = 0, slope_2 > 0, β_intersection < β_1 or β_intersection > β_2, and Var_H[Y|1, β ∈ [β_1, β_2]] > Var_H[Y|0, β ∈ [β_1, β_2]], then β ∈ [β_1, β_2].
d) If slope_1 = 0, slope_2 > 0, β_intersection < β_1 or β_intersection > β_2, and Var_H[Y|0, β ∈ [β_1, β_2]] > Var_H[Y|1, β ∈ [β_1, β_2]], then β = β_1.
e) If slope_1 = 0, slope_2 < 0, β_intersection < β_1 or β_intersection > β_2, and Var_H[Y|1, β ∈ [β_1, β_2]] > Var_H[Y|0, β ∈ [β_1, β_2]], then β ∈ [β_1, β_2].
f) If slope_1 = 0, slope_2 < 0, β_intersection < β_1 or β_intersection > β_2, and Var_H[Y|1, β ∈ [β_1, β_2]] < Var_H[Y|0, β ∈ [β_1, β_2]], then β = β_2.
Case 2: slope_2 = 0, slope_1 ≠ 0.
a) If slope_1 > 0, slope_2 = 0 and β_intersection ∈ [β_1, β_2], then β ∈ [β_1, β_intersection].
b) If slope_1 < 0, slope_2 = 0 and β_intersection ∈ [β_1, β_2], then β ∈ [β_intersection, β_2].
c) If slope_2 = 0, slope_1 > 0, β_intersection < β_1 or β_intersection > β_2, and Var_H[Y|1, β ∈ [β_1, β_2]] > Var_H[Y|0, β ∈ [β_1, β_2]], then β = β_1.
d) If slope_2 = 0, slope_1 > 0, β_intersection < β_1 or β_intersection > β_2, and Var_H[Y|0, β ∈ [β_1, β_2]] > Var_H[Y|1, β ∈ [β_1, β_2]], then β ∈ [β_1, β_2].
e) If slope_2 = 0, slope_1 < 0, β_intersection < β_1 or β_intersection > β_2, and Var_H[Y|1, β ∈ [β_1, β_2]] > Var_H[Y|0, β ∈ [β_1, β_2]], then β = β_2.
f) If slope_2 = 0, slope_1 < 0, β_intersection < β_1 or β_intersection > β_2, and Var_H[Y|1, β ∈ [β_1, β_2]] < Var_H[Y|0, β ∈ [β_1, β_2]], then β ∈ [β_1, β_2].
Case 3: slope_1 = 0 and slope_2 = 0. Whether Var_H[Y|1, β ∈ [β_1, β_2]] is less than, greater than, or equal to Var_H[Y|0, β ∈ [β_1, β_2]], we have β ∈ [β_1, β_2].

Proof. 1) If slope_1 > 0 and slope_2 > 0, then Var_H[Y|1] and Var_H[Y|0] monotonically increase for β ∈ [β_1, β_2], so min_β max_{x∈[−1,1]} Var_H[Y|x] is at β = β_1.
2) Similarly to 1), we have that min_β max_{x∈[−1,1]} Var_H[Y|x] is at β = β_2 = 1.
3) If β_1 < β_intersection < β_2, we have:
• If slope_1 > 0 and slope_2 < 0, Var_H[Y|1] monotonically increases and Var_H[Y|0] monotonically decreases, so when β ∈ [β_1, β_intersection], max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|0], and when β ∈ [β_intersection, β_2], max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1]. Therefore, min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1] = Var_H[Y|0] at β = β_intersection.
• If slope_1 < 0 and slope_2 > 0, Var_H[Y|1] monotonically decreases and Var_H[Y|0] monotonically increases, so when β ∈ [β_1, β_intersection], max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1], and when β ∈ [β_intersection, β_2], max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|0]. Therefore, min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1] = Var_H[Y|0] at β = β_intersection.
4) If β_intersection < β_1 or β_intersection > β_2, we have:
• If slope_1 > 0, slope_2 < 0 and Var_H[Y|1] > Var_H[Y|0]: since Var_H[Y|1] monotonically increases on the domain, min_β Var_H[Y|1] is at β = β_1, so min_β max_{x∈[−1,1]} Var_H[Y|x] is at β = β_1.
• If slope_1 > 0, slope_2 < 0 and Var_H[Y|1] < Var_H[Y|0]: since Var_H[Y|0] monotonically decreases on the domain, min_β Var_H[Y|0] is at β = β_2, so min_β max_{x∈[−1,1]} Var_H[Y|x] is at β = β_2.
• If slope_1 < 0, slope_2 > 0 and Var_H[Y|1] > Var_H[Y|0]: max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1]; since Var_H[Y|1] monotonically decreases on the domain, min_β Var_H[Y|1] is at β = β_2, so min_β max_{x∈[−1,1]} Var_H[Y|x] is at β = β_2.
• If slope_1 < 0, slope_2 > 0 and Var_H[Y|1] < Var_H[Y|0]: max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|0]; since Var_H[Y|0] monotonically increases on the domain, min_β Var_H[Y|0] is at β = β_1, so min_β max_{x∈[−1,1]} Var_H[Y|x] is at β = β_1.

5)
• Case 1:
a) If slope_1 = 0, slope_2 > 0 and β_intersection ∈ [β_1, β_2], we can conclude that for β ∈ [β_1, β_intersection], Var_H[Y|1] > Var_H[Y|0], and for β ∈ (β_intersection, β_2], Var_H[Y|0] > Var_H[Y|1]. Since Var_H[Y|0] monotonically increases on (β_intersection, β_2], min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1, β ∈ [β_1, β_intersection]].
b) If slope_1 = 0, slope_2 < 0 and β_intersection ∈ [β_1, β_2], we can conclude that for β ∈ [β_1, β_intersection], Var_H[Y|0] > Var_H[Y|1], and for β ∈ (β_intersection, β_2], Var_H[Y|1] > Var_H[Y|0]. Since Var_H[Y|1] does not change and Var_H[Y|0] monotonically decreases, min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|0, β ∈ [β_intersection, β_2]].
c) If slope_1 = 0, slope_2 > 0 and Var_H[Y|1, β ∈ [β_1, β_2]] > Var_H[Y|0, β ∈ [β_1, β_2]], then β_intersection > β_2. Since Var_H[Y|1] does not change for β ∈ [β_1, β_2], min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1, β ∈ [β_1, β_2]].
d) If slope_1 = 0, slope_2 > 0 and Var_H[Y|0, β ∈ [β_1, β_2]] > Var_H[Y|1, β ∈ [β_1, β_2]], then β_intersection < β_1. Since Var_H[Y|0] monotonically increases for β ∈ [β_1, β_2], min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|0, β = β_1].
e) If slope_1 = 0, slope_2 < 0 and Var_H[Y|1, β ∈ [β_1, β_2]] > Var_H[Y|0, β ∈ [β_1, β_2]], then β_intersection < β_1. Since Var_H[Y|1] does not change for β ∈ [β_1, β_2], min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|1, β ∈ [β_1, β_2]].
f) If slope_1 = 0, slope_2 < 0 and Var_H[Y|0, β ∈ [β_1, β_2]] > Var_H[Y|1, β ∈ [β_1, β_2]], then β_intersection > β_2. Since Var_H[Y|0] monotonically decreases for β ∈ [β_1, β_2], min_β max_{x∈[−1,1]} Var_H[Y|x] = Var_H[Y|0, β = β_2].
• Case 2: The proof is similar to Case 1.
• Case 3: Var_H[Y|0] and Var_H[Y|1] are unchanged when β ∈ [β_1, β_2]. Hence, min_β max_{x∈[−1,1]} Var_H[Y|x] = max_{x∈[−1,1]} Var_H[Y|x, β ∈ [β_1, β_2]].
