7/28/2019 Paper-4 Add-In Macros for Privacy-Preserving Distributed Logrank Test Computation
International Journal of Computational Intelligence and Information Security, June 2013, Vol. 4, No. 6, ISSN: 1837-7823
* This paper was supported by NSF CNS-0845149 and CCF-0915374. Part of the results were presented at [UNESST 2012].
Add-in Macros for Privacy-preserving Distributed Logrank Test Computation*
Yu Li¹ and Sheng Zhong¹
¹Department of Computer Science and Engineering, the State University of New York at Buffalo, Buffalo, NY USA 14260
Abstract
Survival analysis is frequently used to study survival outcomes in biological organisms. However, comparing survival curves step by step is a tedious process. In this study, we designed and developed Scorpio, a user-friendly, cloud-based, privacy-preserving Microsoft Excel program that lets institutions combine electronic health-care data using a privacy-preserving logrank test model.
Keywords: Survival curves, Logrank test, Privacy preserving, Excel
1. Introduction
In modern society, people are increasingly concerned about their privacy as information technology develops. In hospitals, patients' medical records are stored on computers so that biomedical scientists can use this information for research. These records include patients' medical histories, such as laboratory test results and prescribed medications. To prevent leaks of personal electronic health records, the federal Health Insurance Portability and Accountability Act (HIPAA) has set a national standard to protect the privacy of this kind of information. With the explosive growth of medical research in recent years, biomedical scientists have proposed using these electronic medical data for joint research across institutions. However, privacy and security remain the main concerns impeding such joint research. For this reason, with advances in information and cryptographic technology, there is a trend toward using computational methods and programs to help medical scientists address the privacy issue without revealing patients' information to others.

Survival analysis, also called time-to-event analysis, is very useful for studying many kinds of events, such as disease onset, earthquakes, and stock market crashes [1]. It can be used for prediction after a set of individuals has been observed from a specific time point and monitored continuously over fixed time intervals. Building a good survival analysis model is therefore the most critical step in obtaining better predictions. In the biomedical field, survival analysis mainly means observing the time to death of experimental subjects. Other things being equal, the more experimental data are used for training, the more precise the model. Therefore, biomedical researchers want to combine data from different institutes to build better survival analysis models, especially models for comparing survival functions [2]. To address the privacy and security issues, computer scientists can use privacy-preserving methods that keep the data from being revealed to anyone. To compare survival curves without revealing the data, [2] proposed a privacy-preserving model that protects data privacy.
However, comparing survival curves step by step is a tedious process. In the medical field, Microsoft Excel is widely used because of its friendly user interface and ease of operation. Other statistical software packages, such as SAS and SPSS, have strong data-management capabilities, but they are complicated to use for medical staff without professional training. Microsoft Excel has been widely applied in medical institutes, whether to store experimental data or to create survival curves, and it helps medical scientists analyse data and make better decisions. In addition, Excel can be controlled by programs written in VBA (Visual Basic for Applications) macros. Therefore, most biomedical scientists prefer to store their experimental data in Microsoft Excel. Consequently, many researchers have developed programs that can be applied in Microsoft Excel immediately and automatically. In [3], Hitoshi Sato et al. presented a package of macro programs (named PK MOMENT) to automatically calculate non-compartmental pharmacokinetic parameters on Microsoft Excel spreadsheets. In [4], Zhang et al. presented PKSolver, a freely available, menu-driven add-in program for Microsoft Excel written in VBA for solving basic problems in pharmacokinetic (PK) and pharmacodynamic (PD) data analysis. In [5], Brown presented a simple, easily understood methodology for solving biologically based models using a Microsoft Excel spreadsheet. In [6], Wera presented a user-friendly, inexpensive Excel-based program to find potential phosphorylation sites in proteins.
In this paper, we develop Scorpio, a user-friendly, cloud-based, privacy-preserving Microsoft Excel program for combining electronic health-care data using a privacy-preserving logrank test model. The program does not require any programming skills or any use of VBA or the macro language: once the data from all institutes are ready, it runs automatically. In the rest of this paper, we describe the method for the privacy-preserving comparison test of survival curves, especially the data storage and collection method, as well as the design and implementation of our program.
2. Methods
The logrank test is a standard test for comparing survival curves. When a research institute wants to initiate a logrank test computation, it needs to collect data from different medical institutes. However, some medical data are very sensitive, so computing the logrank test without revealing these data to parties that do not own them is a major issue. In [2], the authors proposed a privacy-preserving secure-sum method in which an initial random number is generated and added to the first medical institute's data. Here, we introduce their method briefly. They suppose there are n groups of individuals.
Table 1: Summary of Denotations for Logrank Test
$n_{kj}$: the number of individuals alive in group k at the beginning of time interval j.
$d_{kj}$: the number of events occurring in group k in interval j.
$O_k$: the number of observed deaths in group k.
$E_k$: the expected number of deaths in group k.
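The quantities in Table 1 combine in the usual way: $n_j = \sum_k n_{kj}$ and $d_j = \sum_k d_{kj}$ are the interval totals, $O_k = \sum_j d_{kj}$, and $E_k = \sum_j n_{kj} d_j / n_j$. The Python sketch below computes them on pooled (non-private) counts. It assumes the common approximate form $Z = \sum_k (O_k - E_k)^2 / E_k$ for the statistic, which may differ from the exact form used in [2]; the function name `logrank_z` is our own.

```python
def logrank_z(n, d):
    """n[k][j]: number alive in group k at the start of interval j.
    d[k][j]: number of events in group k during interval j.
    Returns (O, E, Z), assuming Z = sum_k (O_k - E_k)^2 / E_k."""
    groups, intervals = len(n), len(n[0])
    # Interval totals over all groups: n_j and d_j.
    n_tot = [sum(n[k][j] for k in range(groups)) for j in range(intervals)]
    d_tot = [sum(d[k][j] for k in range(groups)) for j in range(intervals)]
    # Observed deaths O_k: events of group k summed over intervals.
    O = [sum(d[k]) for k in range(groups)]
    # Expected deaths E_k = sum_j n_kj * d_j / n_j (intervals with n_j = 0 skipped).
    E = [
        sum(n[k][j] * d_tot[j] / n_tot[j] for j in range(intervals) if n_tot[j] > 0)
        for k in range(groups)
    ]
    Z = sum((O[k] - E[k]) ** 2 / E[k] for k in range(groups) if E[k] > 0)
    return O, E, Z


# Two groups observed over two intervals.
n = [[10, 8], [10, 9]]
d = [[2, 1], [1, 0]]
O, E, Z = logrank_z(n, d)
print(O)  # [3, 1]
```

A useful sanity check is that the expected deaths always add up to the observed deaths ($\sum_k E_k = \sum_k O_k$), since each interval's $d_j$ is split across groups in proportion to $n_{kj}$.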
The final Z is the logrank test statistic; a smaller Z indicates weaker evidence against the hypothesis that the survival curves are the same. In [2], the authors assume that s parties (s > 3) are involved in the logrank test computation. They provide a privacy-preserving method in which the first institute participating in the survival analysis computation adds a random number to its data, drawn from the same range as the alive and death counts, and passes the result to the next participant. Similarly, every other participant adds its local values to the sums it receives and sends the new sums to the next party. Finally, the first institute obtains the sums and calculates the logrank test statistic after removing the random number it already knows. In this process, the actual alive and death counts of each institute are hidden behind the random numbers [2].
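The secure-sum step of [2] can be sketched as follows. This is a minimal Python illustration, not Scorpio's code: `secure_sum` and its parameters are hypothetical names, all parties are simulated in one process, and the mask uses plain integer addition as the text describes (textbook secure-sum variants instead work modulo a large number so that the intermediate sums reveal nothing about their magnitude).

```python
import random


def secure_sum(local_values, mask_range, rng=random):
    """Ring-style secure sum: party 1 masks its value with a random number,
    every later party adds its own value, and party 1 removes the mask.
    local_values[i] is the private value of party i + 1.
    Returns (total, transcript), where transcript holds the masked running
    sums as they travel from party to party."""
    mask = rng.randint(*mask_range)      # known only to party 1 (bounds inclusive)
    running = local_values[0] + mask     # party 1 sends this to party 2
    transcript = [running]
    for value in local_values[1:]:
        running += value                 # each party adds its local value
        transcript.append(running)
    total = running - mask               # back at party 1: unmask the sum
    return total, transcript


rng = random.Random(2013)
total, seen = secure_sum([12, 7, 30], mask_range=(0, 1000), rng=rng)
print(total)  # 49
```

Each party only ever sees a running sum shifted by the unknown mask, which is what keeps the individual counts hidden; the first party learns the total but not how it splits across the others.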
Based on this privacy-preserving model, we designed a program that automatically collects data from each participating medical institute and adds these data to the initial file immediately. After collecting all the data, the program calculates, for each interval, the quotient of the number of events divided by the number of individuals alive. Each medical institute then obtains this value automatically, after which each institute can calculate its expected number of deaths and its logrank test result automatically. The program then repeats the secure-sum method: another random number is added to the first medical institute's logrank test result, and all the institutes' results are added up. Finally, the first institute, which initiated the comparison, obtains the final logrank test statistic and informs all other participants.
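The second round can be sketched the same way. Once the interval ratios $d_j / n_j$ are known, institute $i$ can compute its own contributions $O_k^{(i)} = \sum_j d_{kj}^{(i)}$ and $E_k^{(i)} = \sum_j n_{kj}^{(i)} d_j / n_j$, and these add up to the pooled $O_k$ and $E_k$. Because $Z$ itself is not a sum of per-institute terms, this Python sketch assumes that the masked "results" being added are these partial $O_k$ and $E_k$ values, with the first institute forming $Z$ (in the approximate form $\sum_k (O_k - E_k)^2 / E_k$) at the end. All function names are hypothetical.

```python
import random


def local_partials(n_local, d_local, ratios):
    """One institute's contributions, indexed [group][interval]:
    O_k^(i) = sum_j d_kj^(i) and E_k^(i) = sum_j n_kj^(i) * d_j / n_j."""
    O = [sum(row) for row in d_local]
    E = [sum(n_kj * r for n_kj, r in zip(row, ratios)) for row in n_local]
    return O, E


def masked_vector_sum(vectors, rng):
    """Secure-sum style aggregation of equal-length vectors: the first party
    adds a random mask to each entry, the others add theirs, and the first
    party subtracts the mask again."""
    mask = [rng.uniform(0, 1000) for _ in vectors[0]]
    running = [v + m for v, m in zip(vectors[0], mask)]
    for vec in vectors[1:]:
        running = [a + b for a, b in zip(running, vec)]
    return [a - m for a, m in zip(running, mask)]


def combine_institutes(institutes, ratios, rng=random):
    """institutes: list of (n_local, d_local) pairs.
    Returns pooled O_k, E_k and the approximate statistic Z."""
    partials = [local_partials(n, d, ratios) for n, d in institutes]
    O = masked_vector_sum([o for o, _ in partials], rng)
    E = masked_vector_sum([e for _, e in partials], rng)
    Z = sum((o - e) ** 2 / e for o, e in zip(O, E) if e > 0)
    return O, E, Z


inst1 = ([[6, 5], [4, 4]], [[1, 1], [0, 0]])
inst2 = ([[4, 3], [6, 5]], [[1, 0], [1, 0]])
ratios = [3 / 20, 1 / 17]  # d_j / n_j obtained in the first round
O, E, Z = combine_institutes([inst1, inst2], ratios, rng=random.Random(7))
print([round(x, 6) for x in O])  # [3.0, 1.0]
```

Splitting $E_k$ this way works because it is linear in $n_{kj}$ once the ratios are fixed, so no institute needs to reveal its own at-risk counts in the second round.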
Specifically, we use cloud-based storage to collect the data from each institute. Cloud-based storage lets everyone who has permission access the file from anywhere. As shown in Figure 1, party 1 first adds a random number to its data and uploads the file to the server; party 2 then downloads the file, adds its own data to the existing data, and uploads the file to the server again. This continues until the last party is done. The first party can then obtain the sum of the actual data by subtracting the random number. After that, the program automatically calls the Microsoft Excel macros we developed to calculate the required values. Finally, party 1 obtains the final logrank test statistic and informs the other participating institutes.
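The file-passing round over cloud storage can be simulated in a few lines of Python. `InMemoryServer` and `run_round` are hypothetical names: the server is an in-memory stand-in for the shared storage, and each "file" is reduced to two per-interval columns, d (deaths) and n (number alive). Following the AddRandomNumber macro in Section 3.4, party 1 masks the whole d column with one random number and the whole n column with another, both drawn from 1 to 20; the loop mirrors the download, add, upload cycle, and party 1 finally subtracts its masks and forms the ratios $d_j / n_j$ used in the later steps.

```python
import copy
import random


class InMemoryServer:
    """Hypothetical stand-in for the cloud storage: keeps the latest version of each file."""

    def __init__(self):
        self._files = {}

    def upload(self, name, table):
        self._files[name] = copy.deepcopy(table)

    def download(self, name):
        return copy.deepcopy(self._files[name])


def run_round(server, parties, rng=random, name="shared_file"):
    """parties[i] = {"d": [...], "n": [...]}: party i + 1's per-interval counts.
    Returns the unmasked totals d_j, n_j and the ratios d_j / n_j."""
    rnd_d, rnd_n = rng.randint(1, 20), rng.randint(1, 20)  # same range as the macro
    first = parties[0]
    server.upload(name, {"d": [x + rnd_d for x in first["d"]],
                         "n": [x + rnd_n for x in first["n"]]})
    for party in parties[1:]:
        table = server.download(name)  # next party fetches the masked file
        table["d"] = [a + b for a, b in zip(table["d"], party["d"])]
        table["n"] = [a + b for a, b in zip(table["n"], party["n"])]
        server.upload(name, table)     # adds its own data and re-uploads
    final = server.download(name)      # back at party 1
    d_tot = [x - rnd_d for x in final["d"]]
    n_tot = [x - rnd_n for x in final["n"]]
    ratios = [dj / nj if nj else 0.0 for dj, nj in zip(d_tot, n_tot)]
    return d_tot, n_tot, ratios


server = InMemoryServer()
parties = [
    {"d": [2, 1, 0], "n": [20, 18, 17]},
    {"d": [1, 0, 1], "n": [15, 14, 14]},
    {"d": [0, 2, 1], "n": [10, 10, 8]},
]
d_tot, n_tot, ratios = run_round(server, parties, rng=random.Random(42))
print(d_tot, n_tot)  # [3, 3, 2] [45, 42, 39]
```

A mask range of 1 to 20 hides little when counts are large; widening the `randint` bounds is a one-line change.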
3. Program Description
3.1 Software Design
The program is developed using C# combined with Microsoft Excel VBA, which is universally available and very convenient in biomedical research institutes. We assume every medical institute uses Microsoft Excel to store its survival data. To protect the privacy of these survival data, our program adds a random number to the original data of the first institute, which then initiates the logrank test computation for comparing the survival curves. Our program automatically uploads the file to the server and adds the other institutes' data to the existing data. Therefore, the institutes participating in the computation do not learn one another's survival data. Although this can be done manually, clicking through every step when calculating the values in Excel is tedious and time-consuming. Our program instead reads the input file and calculates the logrank survival comparison automatically, without revealing data to others.
Figure 1: The flow chart of our program
3.2 How to use Scorpio
After one institute sets up a server for storing the file, each institute that wants to participate in the logrank test calculation runs our program, as shown in Figure 2. First, every institute connects to the server. Then the medical institute that wants to initiate the calculation uploads its file, whose data have already been masked with a random number, chooses the participants, and clicks the Send button. Each participant then receives a message in turn, after which the program downloads the file, adds that participant's data to the data already in the file, and uploads it. After all participants have finished adding their data, the first party obtains the combined data, still masked by the random number it added.
Figure 2: The program user interface for privacy preserving logrank test
3.3 Comparing survival curves using the logrank test
To compare survival curves using the logrank test, after collecting the data from all medical institutes, the program subtracts the random number that was added to the first institute's original data, obtaining the total numbers of individuals alive and of deaths in every interval. The program then sums the alive counts and the death counts, respectively.
3.4 Program Code
Here we list some of the Excel macros we developed for our program.
Add Random Number
Sub AddRandomNumber()
    ' Mask this institute's counts before upload: column B holds dj and
    ' column C holds nj for rows 2 to 12 (one row per interval).
    Dim rndD As Long, rndN As Long

    ' One random number (1 to 20, as in the original macro) masks every dj.
    rndD = Application.WorksheetFunction.RandBetween(1, 20)
    Range("F1").Value = "Random Number for d"
    Range("F2:F12").Value = rndD
    Range("E1").Value = "dj+RND"
    Range("E2:E12").FormulaR1C1 = "=RC[-3]+RC[1]"   ' E = B + F

    ' A second random number masks every nj.
    rndN = Application.WorksheetFunction.RandBetween(1, 20)
    Range("I1").Value = "Random Number for n"
    Range("I2:I12").Value = rndN
    Range("H1").Value = "nj+RNN"
    Range("H2:H12").FormulaR1C1 = "=RC[-5]+RC[1]"   ' H = C + I
End Sub
Compute Ek
Sub ComputeE()
    ' Expected deaths per interval: column C (number alive) times
    ' column K (assumed to hold the ratio dj/nj), written to column M.
    Application.CutCopyMode = False
    Range("M1").Value = "E"
    Range("M2:M12").FormulaR1C1 = "=RC[-10]*RC[-2]"   ' M = C * K
End Sub
4. Samples of Program Runs
Medical scientists usually prefer to store their experimental data in Microsoft Excel, and they care about privacy when they combine data from different medical institutes for research. The Scorpio program is designed specifically for medical scientists who want to combine their survival data to compare survival curves using the logrank test. The input data are shown in Figure 3: the medical scientists only need to enter the numbers of individuals alive and of deaths for each time interval. After the program collects all required data from the other institutes, the first party uses the macros we provide to obtain the final logrank test statistic, as shown in Figure 4.
Figure 3: Original data owned by each institute, which must be kept confidential from the other parties
Figure 4: The final result of privacy-preserving logrank test statistic
5. Hardware and software specifications
An Intel Core i5 computer (2 GB RAM) running the Windows 7 operating system was used. The program was developed using Microsoft Excel's macro language on the Excel 2010 platform.
6. Conclusion
In this paper, we have designed a Microsoft Excel macro-based, privacy-preserving program for comparing survival curves using the logrank test. To make it easy to use, the program can be applied immediately in Microsoft Excel, which is widely used by clinics and biomedical scientists. The program protects the privacy of the data by adding random numbers to the original data. Experiments on real medical data have shown the effectiveness of our proposed program.
References
[1] Allison, P.D. (2010) Survival Analysis Using SAS: A Practical Guide. SAS Publishing.
[2] Chen, T. and Zhong, S. (2011) Privacy-preserving models for comparing survival curves using the logrank test. Computer Methods and Programs in Biomedicine.
[3] Sato, H., Sato, S., Wang, Y.M. and Horikoshi, I. (1996) Add-in macros for rapid and versatile calculation of non-compartmental pharmacokinetic parameters on Microsoft Excel spreadsheets. Computer Methods and Programs in Biomedicine, 50(1), 43-52.
[4] Zhang, Y., Huo, M., Zhou, J. and Xie, S. (2010) PKSolver: An add-in program for pharmacokinetic and pharmacodynamic data analysis in Microsoft Excel. Computer Methods and Programs in Biomedicine, 99(3), 306-314.
[5] Brown, M. (1999) A methodology for simulating biological systems using Microsoft Excel. Computer Methods and Programs in Biomedicine, 58(2), 181-190.
[6] Wera, S. (1998) An EXCEL-based method to search for potential Ser/Thr-phosphorylation sites in proteins. Computer Methods and Programs in Biomedicine, 58(1), 65-68.
[7] Li, Y. and Zhong, S. (2012) Scorpio: A simple, convenient, Microsoft Excel macro based program for privacy-preserving logrank test. Computer Applications for Database, Education, and Ubiquitous Computing, 86-91.