Searching for Efficacy Biomarker in Early Clinical Development – An Example of Using NGS in Early Oncology Development

Feng Gao, Jacob Zhang, Godwin Yung and Ray Liu

19th May 2015



Introduction

• Advances in next-generation sequencing (NGS) technologies provide a powerful tool for understanding the biological processes underlying diseases, but they also pose new challenges for statistical analysis.

• In this study, we wanted to identify signatures of tumor somatic mutation that are associated with clinical efficacy.

• In the study:
– Sample: archived tumor samples that include 5 different solid tumor types. All patients are under the same treatment (single arm).
– NGS whole exome sequencing (WES): variant calling from tumor-germline pairs on coding regions (coding exons) to identify cancer somatic mutations.
– Clinical endpoints: tumor size change (%), and PFS.
– Sample size and power: started with ~120 samples, with adequate power for testing 20 candidate genes.
– Several tiers of NGS data: from WES, from cancer related gene list, from further reduced top candidate cancer genes.
– Types of statistical analyses: univariate analysis, multivariate analysis and pathway analysis.


Challenges

• Sample quality: archived, FFPE-treated samples had a high attrition rate in the NGS process. Samples from clinically non-evaluable subjects were also removed. As a result, the sample size with both NGS data and clinical data available was reduced to n=47.

• Challenges with small sample size:
– NGS data cannot be analyzed at the variant level because almost all variants are singletons. We have to aggregate the data to the gene level, which requires the assumption that all variants within a gene have effects in the same direction.

– Many genes were non-variant or singletons and had to be removed, reducing the number of genes in the univariate analysis.

– With a small sample size, the mutation patterns of several different genes can appear identical in our data, making the results difficult to interpret.

– Data from different tumor types have to be pooled together, creating heterogeneity issues.

– Univariate analysis did not produce statistically significant findings from the top candidate cancer gene list after multiplicity adjustment. Top genes from univariate analysis on the WES list were heavily influenced by a single patient. The multivariate regression approach generated gene pairs that lack biological interpretation. The recursive partitioning approach did not work well with such a small sample size.

• Preliminary analysis did show that there is biological information in the data. The challenge now is how to identify such useful features.
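The aggregation from variant level to gene level described on this slide can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input layout (sample, gene, variant tuples), the function name, and the `min_carriers` parameter are assumptions, and the gene names are placeholders.

```python
from collections import defaultdict


def aggregate_to_gene_level(variant_calls, min_carriers=2):
    """Collapse variant-level calls into binary gene-level mutation status.

    variant_calls: iterable of (sample_id, gene, variant_id) tuples
                   (hypothetical layout, chosen for illustration).
    Returns {gene: set of mutated sample_ids}, keeping only genes that are
    mutated in at least `min_carriers` samples, so non-variant genes and
    singletons are dropped.
    """
    carriers = defaultdict(set)
    for sample_id, gene, _variant_id in variant_calls:
        # Any somatic variant in the gene marks the sample as mutated (0/1).
        # This encodes the assumption that all variants within a gene act
        # in the same direction.
        carriers[gene].add(sample_id)
    return {g: s for g, s in carriers.items() if len(s) >= min_carriers}


# Toy data: GENE_A is mutated in three samples, GENE_B in one (a singleton).
calls = [
    ("S1", "GENE_A", "v1"),
    ("S2", "GENE_A", "v2"),
    ("S2", "GENE_B", "v3"),
    ("S3", "GENE_A", "v1"),
]
gene_level = aggregate_to_gene_level(calls)
print(gene_level)
```

The singleton filter is exposed as a parameter because the cutoff for "too few carriers" is a judgment call that depends on the sample size.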


Overview of data


How we dealt with the challenges

• Using a multivariate approach that starts from biologically driven gene-sets/pathways, we created 2434 gene-sets by grouping 6410 genes that belong to the same pathway.

• A panel of markers may be more powerful than a single marker
– Sparse distribution of somatic mutations
– Low information content of a single mutation (binary, 0/1)
– Larger coverage of patients

• A panel of markers needs to be supported by biology
– Greater confidence
– Better interpretability

• Known pathways or protein-protein interaction (PPI) networks have been tested as predictive marker panels
– A pathway = a bag of genes grouped by biology, which gives a better chance of finding biologically meaningful markers


Pathway DB                # of pathways
MetaCore™                 912
BIOCYC                    33
KEGG                      794
REACTOME                  1358
Wiki Pathways             225
Pathway interaction DB    183


Construction of gene-sets

[Figure: a pathway database (e.g., KEGG) annotation is rearranged between a genes-to-pathways listing for genes g1, ..., g6410 and a pathways-to-genes listing for pathways 1, ..., 2434, each pathway being a set of genes.]
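The construction of gene-sets from a pathway annotation can be sketched as a simple inversion of a gene-to-pathways mapping. This is an illustrative sketch only: the dictionary layout, function names, and the toy identifiers (`g1`, `pw1`, ...) are assumptions, not the format of any particular pathway database.

```python
from collections import defaultdict


def build_gene_sets(gene_to_pathways):
    """Invert a gene -> pathways annotation into pathway -> set of genes."""
    pathway_to_genes = defaultdict(set)
    for gene, pathways in gene_to_pathways.items():
        for pathway in pathways:
            pathway_to_genes[pathway].add(gene)
    return dict(pathway_to_genes)


def group_identical_sets(pathway_to_genes):
    """Group pathways whose gene-sets are identical.

    Identical gene-sets lead to identical association tests, so it can be
    useful to know about them before correcting for multiple testing.
    """
    groups = defaultdict(list)
    for pathway, genes in pathway_to_genes.items():
        groups[frozenset(genes)].append(pathway)
    return list(groups.values())


# Toy annotation: pw1 and pw2 contain exactly the same genes.
annotation = {
    "g1": ["pw1", "pw2"],
    "g2": ["pw1", "pw2"],
    "g3": ["pw3"],
}
gene_sets = build_gene_sets(annotation)
duplicates = group_identical_sets(gene_sets)
print(gene_sets)
print(duplicates)
```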


Statistical models used in the analysis

• Tumor size change (%) used as efficacy variable

• To increase statistical power, and to accommodate different modes of gene effect within a pathway/gene-set, we assume the following 3 models:

Model                                     Assumption for the effect of genes within a pathway
Sequence kernel association test (SKAT)   Effect sizes are normally distributed with mean 0
Counting-based burden test (cBT)          Effect sizes are the same
Threshold-based burden test (tBT)         Single effect when there are more than T mutations


More details in statistical models

Model   Assumption (gene-set term for subject $i$)                      Test
SKAT    $\sum_j \beta_j g_{ij}$, with $\beta_j \sim N(0, \tau)$         Score
cBT     $\beta \sum_j g_{ij}$                                           Wald
tBT     $\beta \, 1\!\left(\sum_j g_{ij} > \text{threshold}\right)$     Wald

• For all three models, we can test the null hypothesis of no pathway association by testing a single parameter: $H_0: \tau = 0$ for SKAT and $H_0: \beta = 0$ for cBT and tBT.

• In practice, we may wish to reject H0 if at least one of the three tests is significant. Therefore, let us also consider the omnibus test (OT) whose p-value is the minimum of the p-values from SKAT, cBT, and tBT.
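The two burden models and the omnibus minimum can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: SKAT is omitted, the regression has no covariates, the Wald p-value uses a normal approximation to stay within the standard library, and the mutation matrix and outcomes are made-up toy values.

```python
import math


def counting_burden(mutations):
    """cBT score: number of mutated genes in the gene-set, per subject."""
    return [sum(row) for row in mutations]


def threshold_burden(mutations, threshold):
    """tBT score: 1 if a subject carries more than `threshold` mutations."""
    return [1 if sum(row) > threshold else 0 for row in mutations]


def wald_p_value(score, outcome):
    """Two-sided Wald test of the slope in outcome = a + b * score + error.

    Assumption: the p-value uses a normal reference distribution, which is
    only an approximation for small samples.
    """
    n = len(score)
    mean_x = sum(score) / n
    mean_y = sum(outcome) / n
    sxx = sum((x - mean_x) ** 2 for x in score)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(score, outcome))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(score, outcome))
    se = math.sqrt(rss / (n - 2) / sxx)
    return math.erfc(abs(slope / se) / math.sqrt(2))


def omnibus_p_value(p_values):
    """OT p-value: the minimum of the component p-values (still unadjusted)."""
    return min(p_values)


# Toy data: rows are subjects, columns are genes in one gene-set (0/1).
mutations = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
tumor_change = [10.0, -5.0, -20.0, 15.0, -30.0, 0.0]  # hypothetical % change

cbt_score = counting_burden(mutations)
tbt_score = threshold_burden(mutations, threshold=1)
p_cbt = wald_p_value(cbt_score, tumor_change)
p_tbt = wald_p_value(tbt_score, tumor_change)
p_ot = omnibus_p_value([p_cbt, p_tbt])
print(cbt_score, tbt_score, p_cbt, p_tbt, p_ot)
```

Because the omnibus value is a minimum over several tests, it is smaller than a valid p-value would be and needs its own calibration.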


More statistical challenges

• Multiplicity correction, the issues:
– Multiple models: multiple models used to test pathway association.
– Correlation: 2434 pathways are not independent. Many share the same genes (more detail in next slide).

• The solution for multiplicity correction in the presence of feature correlation:
– Resampling-based multiple testing can adjust p-values to account for multiple testing by incorporating correlation and other distributional characteristics.


Many gene-sets are correlated

• Many of the K=2434 gene-sets have overlapping genes. Some are even identical.

• For example, in 10 gene-sets, the following relation holds:

[Equation relating gene-sets G1, G2; G3, G4; G5, G6, G7, G8, G9; and G10. The set operators did not survive extraction.]

• It is of interest to efficiently test multiple hypotheses.

1. Reduce the number of gene-sets for association testing

2. Test all K gene-sets and then efficiently correct for multiple testing


Review: Single-step methods for multiplicity adjustment

• Let us distinguish the random p-value from the experimentally observed p-value using capital and lowercase letters, respectively:

$P = (P_1, \ldots, P_K)$
$p = (p_1, \ldots, p_K)$

Also, denote the complete null hypothesis by

$H_0^C = \bigcap_{i=1}^{K} H_i$

• Single-step methods are simultaneous test procedures that perform equivalent multiplicity adjustments for all tests. E.g.,
– Bonferroni: $\tilde{p}_i = \min(K p_i, 1)$
– Sidak: $\tilde{p}_i = 1 - (1 - p_i)^K$
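The two single-step adjustments are one-liners. The sketch below applies them with K = 2434 (the number of gene-sets in this study) to a hypothetical observed p-value of 1e-5, chosen only for illustration.

```python
def bonferroni(p, k):
    """Bonferroni single-step adjusted p-value: min(K * p, 1)."""
    return min(k * p, 1.0)


def sidak(p, k):
    """Sidak single-step adjusted p-value: 1 - (1 - p)^K."""
    return 1.0 - (1.0 - p) ** k


K = 2434        # number of gene-sets tested
p_obs = 1e-5    # hypothetical observed p-value

p_bonf = bonferroni(p_obs, K)
p_sidak = sidak(p_obs, K)
print(p_bonf, p_sidak)
```

The Sidak value is never larger than the Bonferroni value, and the two are close when K * p is small.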

0


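Bonferroni and Sidak methods require certain assumptions to control the FWER.

$\text{FWER} = \Pr\left(\min_{1 \le i \le K} \tilde{P}_i \le \alpha \mid H_0^C\right)$

• Bonferroni:

$\text{FWER} = \Pr\left(\min_{1 \le i \le K} K P_i \le \alpha \mid H_0^C\right)$
$\le \sum_{i=1}^{K} \Pr\left(P_i \le \alpha/K \mid H_0^C\right)$ (Bonferroni inequality)
$= K(\alpha/K) = \alpha$ for $P_i \mid H_i \sim U(0,1)$ for all $i$

• Sidak:

$\text{FWER} = \Pr\left(\min_{1 \le i \le K} \left[1 - (1 - P_i)^K\right] \le \alpha \mid H_0^C\right)$
$= 1 - \prod_{i=1}^{K} \Pr\left(1 - (1 - P_i)^K > \alpha \mid H_0^C\right)$ (independence)
$= 1 - \left((1 - \alpha)^{1/K}\right)^K = \alpha$ for $P_i \mid H_i \sim U(0,1)$ for all $i$

Without independence, equality becomes ≤ for very general conditions.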


Resampling-based multiple testing (RBMT)

• Bonferroni and Sidak methods fail to incorporate dependence and distributional characteristics of the observed p-values. Both limitations are concerns.

RBMT: If we knew the joint distribution of $P \mid H_0^C$, and hence of $\min_{1 \le j \le K} P_j \mid H_0^C$, then we could compute

$\tilde{p}_i = \Pr\left(\min_{1 \le j \le K} P_j \le p_i \mid H_0^C\right)$

so that

$\text{FWER} = \Pr\left(\min_{1 \le i \le K} \tilde{P}_i \le \alpha \mid H_0^C\right) \le \alpha$

In practice, we may not know the distribution of $P \mid H_0^C$. However, in many cases vectors $P^*$, having the same distribution as $P \mid H_0^C$, may be simulated via resampling.

We can therefore compute

$\tilde{p}_i = \Pr\left(\min_{1 \le j \le K} P_j^* \le p_i\right)$


Illustration

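Gene-set (i)    Observed p-value ($p_i$)    R resampled p-values ($P_i^*$)
1               $p_1$                       $p^*_{1,1}$, $p^*_{1,2}$, …, $p^*_{1,R}$
2               $p_2$                       $p^*_{2,1}$, $p^*_{2,2}$, …, $p^*_{2,R}$
…
K               $p_K$                       $p^*_{K,1}$, $p^*_{K,2}$, …, $p^*_{K,R}$
Column minima                               $\min(\{p^*_{i,1}\})$, $\min(\{p^*_{i,2}\})$, …, $\min(\{p^*_{i,R}\})$

Diagram labels: Distribution of p-value under H0; Dependence between tests; (Multiple testing) adjusted p-value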
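The table above translates directly into a small computation: take the minimum over gene-sets within each resample, then compare each observed p-value against those R minima. The sketch below assumes the resampled p-values are already available as K rows of R values; the numbers are made up for illustration.

```python
def minp_adjusted(observed, resampled):
    """Single-step min-p adjusted p-values.

    observed:  list of K observed p-values.
    resampled: K lists, each holding R resampled p-values, where column r
               of every row comes from the same resample.
    """
    n_tests = len(observed)
    n_resamples = len(resampled[0])
    # Column minima: the smallest p-value across all K tests in resample r.
    minima = [
        min(resampled[i][r] for i in range(n_tests))
        for r in range(n_resamples)
    ]
    # Adjusted p-value: share of resamples whose minimum is <= observed p.
    return [sum(m <= p for m in minima) / n_resamples for p in observed]


observed = [0.01, 0.20, 0.50]
resampled = [
    [0.30, 0.05, 0.60, 0.02],
    [0.40, 0.04, 0.70, 0.90],
    [0.10, 0.80, 0.008, 0.50],
]
adjusted = minp_adjusted(observed, resampled)
print(adjusted)  # [0.25, 1.0, 1.0]
```

Taking minima column by column is what preserves the dependence between tests: every entry in a column was computed from the same resampled data set.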


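Our approach to simulating P*: “Simultaneous” bootstrap test

1. Estimate residuals under the null model: fit $Y = X'\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$ to obtain $\hat{\beta}$, $\hat{\sigma}^2$, $\hat{\varepsilon}$

2. Generate pseudo-data via parametric bootstrap: $Y^* = X'\hat{\beta} + \hat{\varepsilon}^*$, $\hat{\varepsilon}^* \sim N(0, \hat{\sigma}^2)$

3. Perform GSAT on pseudo-data to obtain, for each gene-set $k = 1, \ldots, K$ in resample $r$: $p^*_{k,r,\text{SKAT}}$, $p^*_{k,r,\text{cBT}}$, $p^*_{k,r,\text{tBT}}$, $p^*_{k,r,\text{OT}}$

4. Repeat R times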


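Intuition: How does P* estimate P | H0C?

Data under the complete null: $Y = X'\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, giving p-values $p_{k,\text{SKAT}}$, $p_{k,\text{cBT}}$, $p_{k,\text{tBT}}$, $p_{k,\text{OT}}$ for $k = 1, \ldots, K$, distributed as $P \mid H_0^C$

Pseudo-data: estimates $\hat{\beta}$, $\hat{\sigma}^2$, $\hat{\varepsilon}$, then $Y^* = X'\hat{\beta} + \hat{\varepsilon}^*$, $\hat{\varepsilon}^* \sim N(0, \hat{\sigma}^2)$, giving p-values $p^*_{k,r,\text{SKAT}}$, $p^*_{k,r,\text{cBT}}$, $p^*_{k,r,\text{tBT}}$, $p^*_{k,r,\text{OT}}$ for $k = 1, \ldots, K$, distributed as $P^*$

$P^* \approx P \mid H_0^C$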


NGS study results: Significant gene-sets

        Adjusted p-value
GS      SKAT      cBT       tBT       OT
1       0.931     0.872     0.015*    0.018*
2       0.094     0.010*    0.465     0.022*
3       0.798     0.017*    0.784     0.037*
4       0.803     0.580     0.037*    0.044*
5       0.875     0.482     0.044*    0.052
6       0.038*    0.223     0.057     0.067
7       0.022*    0.063     0.116     0.134
8       0.007*    0.823     0.967     0.178
9       0.044*    0.974     0.999     0.432

* = significant at “α=.05”

Efficiently correcting for multiple testing increases # of significant gene-sets from 0 to 4.


Top pathway from analysis

Resampling-based multiplicity-adjusted p-value = 0.0184


How does the resampling-based approach compare to other single-step methods for p-value adjustment?

• Our approach: resampling-based approach for multiplicity adjustment.

• Other single-step methods for multiplicity adjustment:
– If we knew the effective number of independent tests Ke, then we could apply the Bonferroni or Sidak adjustments: $\tilde{p}_i = \min(K_e p_i, 1)$ or $\tilde{p}_i = 1 - (1 - p_i)^{K_e}$
– One can estimate Ke based on biology or from a purely statistical standpoint:
  • Linkage disequilibrium structure, where Ke = number of major LD blocks
  • Principal components analysis (PCA) of P*, where Ke = number of principal components that explain x% of the variation in the Pi*’s

• Like PCA, resampling-based adjustments that use the minimum statistic rely on P*.


Simulation to compare resampling-based approach with other single-step methods for p-value adjustment

• Rather than consider all K=2434 gene-sets, we considered only 10 gene-sets. These gene-sets are highly related:

[Equation relating gene-sets G1, G2, G3, G4 and G5, G6, G7, G8, G9. The set operators did not survive extraction.]

• Using parameter estimates from the fitted null model of the original data, we randomly simulated outcomes under a null model or a desired alternative model.

• For the first of 10,000 simulated studies, we performed RBMT by generating R=10,000 bootstrap resamples.
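Estimating the FWER of a given per-test cutoff from simulated null studies amounts to counting how often the smallest p-value falls below the cutoff. The sketch below uses a deliberately simple, hypothetical dependence structure (10 tests of which only 3 are distinct, mimicking identical gene-sets) rather than the study's actual gene-sets, so the true FWER is known in closed form.

```python
import random


def estimate_fwer(null_p_vectors, cutoff):
    """Share of simulated null studies with at least one p-value <= cutoff."""
    hits = sum(min(p) <= cutoff for p in null_p_vectors)
    return hits / len(null_p_vectors)


rng = random.Random(2015)
n_studies = 20000
studies = []
for _ in range(n_studies):
    # Three independent uniform p-values, duplicated to give 10 tests:
    # a toy stand-in for gene-sets that are identical to one another.
    u = [rng.random() for _ in range(3)]
    studies.append([u[0]] * 4 + [u[1]] * 4 + [u[2]] * 2)

fwer_unadjusted = estimate_fwer(studies, 0.05)
fwer_bonferroni = estimate_fwer(studies, 0.05 / 10)
print(fwer_unadjusted, fwer_bonferroni)
```

In this toy setup the unadjusted FWER should be near 1 - 0.95^3 (about 0.14), while Bonferroni with K = 10 is conservative because only 3 tests are effectively independent.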


Simulation results: Effective number of independent tests (Ke) for OT

Method          Cutoff for p    Est. FWER    x%
Sidak, Ke=1     0.0500          0.363        29.8
Sidak, Ke=9     0.0057          0.051        83.4
Sidak, Ke=30    0.0017          0.016        100.0
RBMT            0.0056          0.050        −

Table. In order to control FWER ≤ 0.05, different methods propose different cutoffs for the p-values. We provide the estimated FWER for each proposed cutoff, as well as the x% of variation in Pi* explained by the first Ke PCs (slide 24). Notice how similar RBMT is to the Sidak correction with Ke=9.

RBMT adjusted p-value: $\tilde{p} = \Pr\left(\min_{1 \le j \le 10} P_j^* \le p \mid H_0^C\right)$
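The Sidak cutoffs in the table follow from solving 1 - (1 - c)^Ke = α for the per-test cutoff c. A short check, assuming α = 0.05 as in the table (the function name is illustrative):

```python
def sidak_cutoff(alpha, k_e):
    """Per-test p-value cutoff that gives FWER = alpha for k_e independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k_e)


cutoffs = {k_e: round(sidak_cutoff(0.05, k_e), 4) for k_e in (1, 9, 30)}
print(cutoffs)
```

Rounded to four decimals, these reproduce the table's cutoffs of 0.0500, 0.0057, and 0.0017.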


Summary

• The NGS attrition rate was high due to DNA quality and other issues.

• Univariate analysis did not have adequate power due to reduced sample size.

• A multivariate approach based on pathways was considered. We used 4 types of tests (SKAT, cBT, tBT, and OT) for the association between 2434 gene-sets and %TSCB.

• We used a resampling-based approach to correct for multiple testing.

• Our efforts led to the identification of a gene-set as a statistically significant and biologically interesting pathway.


References

• Elbers C.C., et al. (2009) Using genome-wide pathway analysis to unravel the etiology of complex diseases. Genet Epi 33, 419-431.

• Wang K., Li M., and Bucan M. (2007) Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet 81, 1278-1283.

• Yu K., et al. (2009) Pathway analysis by adaptive combination of p-values. Genet Epi 33, 700-709.

• Westfall P.H. and Young S.S. (1993) Resampling-based multiple testing: Examples and methods for p-value adjustment. John Wiley & Sons Inc, New York.

• Wu M.C., et al. (2011) Rare variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89, 82-93.
