8/8/2019 Statistical Issues in Clinical Research - An Overview


www.pharmasri.com



` Answering even relatively simple questions under the best conditions (a controlled clinical trial) can be tricky. Possible sources of bias abound and, if appropriate safeguards are not taken, may combine to give a false or misleading conclusion

` Some of the factors which make clinical research hard
    Formulating the right scientific question can be deceptively tricky
    Logistical complexity, especially the need to use multiple sites
    Trial conduct is highly interdisciplinary, requiring sustained, well-coordinated effort from many groups
    Staggered recruitment of subjects; uncertainty about the accrual pattern is unavoidable
    Patient dropout, particularly in longer trials
    Potential for the goalpost to move mid-trial: unforeseen events can destroy, or severely reduce, the relevance of the study even before it ends


` Lasagna's Law
    The prevalence of any disease under study drops dramatically once study enrolment opens up, and returns to previous levels only once enrolment closes

` Murphy's Law
    Anything that can go wrong, will go wrong
    In particular, the most egregious breach of protocol instructions will occur at the highest-enrolling site

` Giltinan's Law
    The quality of data obtained from any site is inversely proportional to the degree of exaltation of the thought leader or principal investigator at the site (in extreme cases, the role of thought leader is so all-consuming that delays in filing the necessary paperwork result in actual enrolment levels close to zero)

Clearly, these are all just different manifestations of Murphy's Law


` A key concern is that each individual study protocol must achieve its goals, not just on its own terms; it must also make sense within the broader picture

` A major practical issue is the ever-changing nature of the landscape: the long duration of most trials, and the uncertainty about the results, mean that the original target may have shifted by completion of a given trial

` Nonetheless, a key requirement when designing any trial is that the proposed design should give the best chance possible of enabling the development plan to proceed to the next stage, once results from the trial become available

` The previous condition should be met even when results do not correspond to the desired answer; it is important to remember that a failed clinical trial is not one which fails to give the desired answer, but rather one which fails to give an unambiguous answer


` Phase III objectives determined primarily by (i) target product profile (think desired label claim), (ii) norms for the given disease

` Primary and secondary objectives should map readily to corresponding statistical hypotheses

` Safety objectives are given greater emphasis in Phases I and II; Phase III focuses on efficacy and safety

` Objectives should be specified as precisely as possible. At a minimum, include information on
    What measure of efficacy/safety will be used?
    Key features of the target patient population
    Dosing regimen, i.e. amount, frequency, and route of dosing

` Preferable to use neutral language when specifying objectives (personal opinion). Phrases like "to compare (investigate) the efficacy" or "to characterize the pharmacokinetics" are preferable to, e.g., "to demonstrate efficacy" or "to establish superiority"


Examples:

` "To investigate the effect of a single 5 mg dose of rhwonderprotein, administered by transgenic snakebite, on clotting ability in Irish clergymen, as measured by the change from baseline in prothrombin time", rather than "To demonstrate the efficacy of rhwonderprotein in improving clotting ability"

` "To investigate the effect of twice daily SC injection of 40 µg/kg of rhIGF-I for 12 weeks on glycemic control, in subjects with moderate to severe Type II diabetes, as measured by the average change from baseline in HbA1c, compared to subjects in the placebo group"


Source of bias | Safeguard
1. Selection bias | Unambiguous eligibility criteria
2. Allocation bias | Randomization, stratification, blinding
3. Evaluation bias (observer/instrument) | Blinding, standardization (training, or central evaluation)
4. Recall bias | Appropriate data collection instruments
5. Time (systematic change in patient population, treatment, or other aspect of study conduct as trial progresses) | Balanced treatment allocation; protocol should specify salient details of study conduct, avoiding room for differential interpretations
6. Withdrawal / drop out patterns | Pre-specified analysis conventions, sensitivity analyses
7. Lack of compliance with study protocol | Training; engaged study coordinators at site
8. Unblinding (of patient, physician, or study personnel) | Randomized allocation; suitable precautions surrounding treatment codes and drug inventory/supply


` Randomization is the basis for statistical inference

` A significance level represents the probability that differences in outcome can be the result of random fluctuations

` Without randomization, a statistically significant difference may be the result of non-random differences in the distribution of unknown prognostic factors

` Randomization does not ensure that groups are medically equivalent, but it distributes the unknown biasing factors randomly

` Randomization plays an important role in the generalization of the observed clinical trial data


` If prognostic factors are known, use randomization methods that can account for them
    Stratification / blocking
    Adaptive randomization

` If possible, randomize patients within a site

` Patients enrolled early may differ from patients enrolled later
    Watch out for staggered enrollment
    Temporary closing of study sites or arms can cause problems

` Protocol amendments that affect inclusion/exclusion criteria may be tricky

` Even in open-label studies, randomization codes should be locked


` Randomization does not guarantee that there will be no bias by subjective judgment in evaluating and reporting the treatment effect

` Such bias can be minimized by blocking the identity of treatment (blinding)
    Types of blinding

` Challenges
    Ethical considerations
    Unblinding procedures for safety reasons
    Unblinding procedures at final analysis


` Protection against certain types of bias is through appropriate design precautions (stratification, randomization, blinding)

` Other types of bias are prevented only by giving unambiguous instructions to the sites on the intended patient population and how all aspects of the study should be conducted

` Sites will sniff out each ambiguity in the protocol, and interpret and execute the instructions more divergently than you can imagine

` There is often vagueness regarding key aspects of study conduct, e.g. use of concomitant medications, evaluation schedule, endpoint definition, handling of dropouts, how key evaluations will be carried out, etc.

` Major divergence in interpretation (e.g. in deciding eligibility, or how to measure a key response variable) has the potential to torpedo the protocol entirely, and may not become evident until it's too late


` As a routine precaution, it is advisable to limit the contribution to enrolment of any single site to no more than 15% of the total. Note that this limit is generally not specified explicitly in protocol text, but is communicated to sites at study initiation nonetheless

` Non-standard evaluations may require intensive training of site personnel to reduce systematic differences in evaluation among sites

` Centralized (blinded) evaluation, when feasible, is often the best option

` It is a good idea to develop a prospective publication strategy, securing upfront buy-in from key stakeholders

` A plan and timetable for disseminating study results should be developed, following existing SOPs, and communicated to sites prospectively


` Regular, frequent communication with sites is important

` Early monitoring of key variables is advisable, to allow problems to be detected and fixed early

` Appropriate mechanisms should be in place to allow evaluation of aggregated safety data in a timely fashion (remember that individual sites may not be able to discern adverse patterns, based only on their data)

` Each team member should try to attain at least a basic understanding of the role of every other team member


` Discussion here will focus primarily on efficacy endpoints

` What about other kinds of endpoints?

` Pharmacokinetic endpoints are generally standard parameters derived from the observed concentration-time profiles

` Safety endpoints also tend to be fairly standard; most are common across protocols, with occasional disease/drug-specific markers
    Incidence of adverse events (general, protocol-specified, by body system, etc.)
    Changes in key laboratory parameters
    Incidence of antibodies (neutralizing or not)

` Pharmacodynamic endpoints, in contrast, are measures of activity, and will vary from study to study. Recommendations for efficacy endpoints apply.


` No problem in Phase I, where focus is primarily on safety and PK endpoints. Limited sample sizes preclude formal evaluation of efficacy; if it must be mentioned in the protocol, it is preferable to refer to "activity", rather than "efficacy"

` Drug approval requires establishing an acceptable risk-benefit profile. It is important to bear in mind that the regulatory expectation is that of clinical benefit to the patient

` Thus, in general, the primary efficacy endpoint should be a measure of clinical effect (as opposed to, e.g., a biochemical or physiological marker)

` Taking the primary efficacy endpoint in a pivotal trial to be a biomarker which is not a direct measure of clinical benefit is something which should be done only with prior buy-in from all relevant regulatory agencies

` In general, such buy-in can be attained only in the case of an established surrogate endpoint (more on this below)


` Generally speaking, endpoints which can be measured in a completely objective fashion are preferred

` This may not always be possible; some degree of subjectivity may be unavoidable (e.g. in endpoints such as physician's or patient's evaluation of improvement)

` The degree to which this kind of subjectivity may be acceptable is likely to depend on perceptions about the integrity of blinding in the study

` In evaluating quality of life, use of a validated instrument is preferable. In many cases, a disease-specific QOL questionnaire exists

` Consultation with the Health Economics group is highly recommended, to ensure that collection of QOL data supports the target product profile (don't wait until Phase III to do this)


` In general, key efficacy endpoints should be straightforward to measure. Avoid measures which might still be considered experimental, which require highly complex instrumentation, or involve extremely specialized assays. Measurements which rely heavily on technician skill or judgement can also be problematic

` Centralized evaluation of key endpoints may help guard against inter-site variation

` If key variables do involve specialized assays, make sure that assay procedures are thoroughly understood, and consistently implemented


` Multiple secondary endpoints are common *

` Multiple primary endpoints are sometimes used
    If consensus on a single primary (1°) endpoint is impossible
    Should be a course of last resort (personal view)
    Have an associated penalty, in terms of a higher bar to declare statistical significance at a given level α

` A common approach is to require significance at level α/k, where k is the number of co-primary endpoints (Bonferroni)

` Bonferroni works reasonably, provided k is not too large, and if the constituent endpoints are uncorrelated

` For highly correlated endpoints, Bonferroni is inefficient; true attained significance will be < α

` Especially problematic if there is interest in multiple subsets

* Try to show some discipline regarding the number of secondary (2°) endpoints
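The Bonferroni arithmetic can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the familywise calculation assumes the k tests are independent, which is the favourable case for Bonferroni.

```python
def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-endpoint significance level when k co-primary endpoints share level alpha."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return alpha / k


def familywise_error_independent(per_test_alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests."""
    return 1.0 - (1.0 - per_test_alpha) ** k


if __name__ == "__main__":
    alpha, k = 0.05, 3  # illustrative values, not taken from any protocol
    per_test = bonferroni_threshold(alpha, k)
    print(f"per-endpoint level:           {per_test:.4f}")
    print(f"familywise error, unadjusted: {familywise_error_independent(alpha, k):.4f}")
    print(f"familywise error, Bonferroni: {familywise_error_independent(per_test, k):.4f}")
```

With independent endpoints the adjusted familywise error sits just under the nominal level; with strongly correlated endpoints it falls further below, which is the inefficiency noted above.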


1. Continuous, e.g. reduction in cholesterol, HbA1c, visual acuity

2. Categorical
    a) Multiple categories with no natural ordering
    b) Ordered categorical, e.g. different degrees of improvement

3. Dichotomous, e.g. response/non-response*, dead/alive at a specific time post-treatment

4. Time-to-event, e.g. survival, time to progression

Different analysis methods are appropriate for each main endpoint type; sample size requirements differ as well

* (3) is obviously a special case of (2)


` Approximate ordering by information content (from highest to lowest) is
    Continuous > time-to-event ~ ordered categorical > categorical > binary

` As a result, demonstrating an effect when the primary efficacy measure is a response rate is typically most demanding, in terms of sample size

` Although continuous response variables may have preferable statistical properties, it is quite common for FDA to require the primary efficacy variable to be a response rate, defined as the proportion of subjects who reach a specified threshold of improvement on the continuous scale (Raptiva, Lucentis)


` Response rate (where response is based on change in tumor size, according to well-defined criteria; best post-treatment evaluation is counted, so response is not linked to a specific timepoint)

` Duration of response (note that the resolution with which this can be determined will depend on the frequency of scheduled evaluations)

` Survival time

` Time to disease progression, where criteria for progression are well-defined

` Progression-free survival

One major question is the extent to which a treatment effect on response, in terms of reduction of tumor size, is predictive of a treatment effect on survival. Unfortunately, this seems to vary by tumor and treatment class.


In the standard hypothesis testing framework for efficacy
    Type I error: conclude an ineffective drug is effective (false positive)
    Type II error: conclude an effective drug is ineffective (false negative)

` Ideally, both error probabilities should be controlled

` Generally, sample size is chosen to give acceptable power (defined as 1 - Type II error rate, or 1 - β) for a prespecified false positive rate, α

` In phase III efficacy trials, α is 0.05, by regulatory fiat

` Acceptable power is generally taken to be 90% for pivotal studies


` This has implications for sample size, due to tension between both types of error

` Timeline implications, as study duration = treatment duration + accrual time

` Common pitfall: exaggerate extent of the possible treatment effect ("power for the home run"), leading to over-optimistic sample sizes

` General guideline: power study to detect treatment effect specified in the target product profile (regular, not optimistic, scenario)

` In some cases, sample size is dictated by safety, rather than efficacy, considerations (satisfy minimum regulatory requirements)


` For a given value of α, power depends on
    Magnitude of the treatment effect
    Sample size
    Inter-subject variability for continuous measurements
    Response rates for binary responses

` For most pivotal efficacy trials, the standard approach is to calculate the sample size necessary to give adequate (90%) power to detect a clinically meaningful treatment effect, with a type I error rate of 5%

` Calculating the sample size needed for a given power requires some knowledge about variability of continuous responses (or response rates, for binary data)

` "Clinically meaningful" needs to be defined in terms of the target product profile, not as "the effect size which will give acceptable power for the sample size I'm willing/able to use"
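As a rough illustration of how these ingredients combine for a continuous endpoint, the sketch below uses the familiar normal-approximation formula for comparing two means with equal allocation, n per group = 2 (z_{1-α/2} + z_{power})² σ² / δ². The function name and the example values of δ and σ are assumptions made for illustration; a real calculation would follow the protocol's analysis method.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_continuous(delta, sigma, alpha=0.05, power=0.90):
    """Subjects per group to detect a mean difference delta (two-sided test).

    Normal approximation, equal allocation, common standard deviation sigma.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the type I error
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: effect of half a standard deviation
    print(n_per_group_continuous(delta=0.5, sigma=1.0))              # 90% power
    print(n_per_group_continuous(delta=0.5, sigma=1.0, power=0.80))  # 80% power
    # Halving the assumed effect roughly quadruples the sample size
    print(n_per_group_continuous(delta=0.25, sigma=1.0))
```

The last call shows why "powering for the home run" is dangerous: the required n scales with 1/δ², so an optimistic effect size shrinks the trial dramatically.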


` Sample size is not always dictated by this kind of power analysis; in some cases, safety requirements may be the deciding factor (rheumatoid arthritis, psoriasis)

` In earlier phases, it may not be practical to run trials big enough to control both Type I and Type II error rates as well as we might like

` 80% power is generally considered adequate in Phase II; on occasion we may settle for less

` Similarly, requiring significance at the 5% level may be overly stringent in Phase II

` Personal view: it is foolish to allow the hegemony of hypothesis testing to control our thinking prior to Phase III

` Instead, view the issue as an estimation problem

` Precision analysis
    Choose sample size in such a way that there is a desired precision at fixed confidence level
    Small chance of detecting true treatment effect
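One way to read "precision analysis" is to pick n so that the confidence interval for the treatment difference has a chosen half-width. The sketch below does this for a difference in means, under a normal approximation with equal group sizes and a common standard deviation; the function name and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_precision(half_width, sigma, confidence=0.95):
    """Subjects per group so the CI for a difference in means has the given half-width.

    Uses half_width = z * sigma * sqrt(2 / n), solved for n.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(2 * (z * sigma / half_width) ** 2)


if __name__ == "__main__":
    # Hypothetical target: estimate the difference to within +/- 0.5 SD
    print(n_per_group_for_precision(half_width=0.5, sigma=1.0))
```

Note that a trial sized this way makes no promise about power; it only controls how tightly the effect is estimated.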


Challenge
    Power for correctly detecting a clinically meaningful difference at a fixed type I error rate depends primarily on the number of events (deaths, progressions, etc.)
    Specifying the number of events doesn't uniquely determine the number of subjects
    For instance, suppose the required number of events is 280. If 300 subjects per group is sufficient to give the required number of events, then 250 per group must as well; it will just take longer

Thus, sample size calculations are a little more complex for time-to-event responses and will depend on
    calculating the number of events needed to give the desired power
    an assumption about the median time-to-event in the control group
    an assumption about the size of the difference between control and treated groups
    projected accrual patterns
    targeted study duration
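The first step, the required number of events, is often approximated with Schoenfeld's formula for a logrank comparison with 1:1 allocation: d = 4 (z_{1-α/2} + z_{power})² / (log HR)². The sketch below applies it and then converts events to subjects using an assumed overall probability that a subject has an event during the study. That probability is a placeholder: in practice it comes from the control median, the accrual pattern and the study duration listed above, and the function names here are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha=0.05, power=0.90):
    """Approximate number of events for a two-arm logrank test, 1:1 allocation."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(4 * z_sum ** 2 / log(hazard_ratio) ** 2)


def subjects_required(n_events, event_probability):
    """Total subjects needed if each has the given chance of an observed event."""
    return ceil(n_events / event_probability)


if __name__ == "__main__":
    d = events_required(hazard_ratio=0.7)  # hypothetical target effect
    print("events needed:", d)
    # Same event target, different assumed event probabilities:
    # fewer subjects are needed when follow-up is long enough to see more events
    for p in (0.7, 0.5):
        print(f"P(event)={p}: {subjects_required(d, p)} subjects")
```

This mirrors the point in the text: the event count is fixed by power, while the subject count trades off against how long one is prepared to wait.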


` Interim analysis is a tool to protect the welfare of subjects
    By stopping enrollment/treatment as soon as a drug is determined to be harmful
    By stopping enrollment as soon as a drug is determined to be beneficial
    By stopping trials which will yield little additional useful information (or which have negligible chance of demonstrating efficacy if fully enrolled, given results to date)

` The associated statistical methods are generally referred to as group sequential methods


` Should preserve an overall false positive rate of α for the trial: cannot claim statistical significance at level α if the unadjusted p-value at one of the interim analyses happens to be less than α

` In general, the unadjusted p-value for testing treatment effect at any given interim analysis will be compared to a more stringent (lower) bound; to stop early (for efficacy) requires compelling evidence

` Regulatory agencies need to be convinced that interim analyses do not compromise the integrity of the blind

` Regulatory guidelines over the past 10 years have become stricter and stricter, ultimately requiring that interim analyses be conducted by an external, independent group, i.e. study team members are no longer privy to interim results
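Why the interim bound must be more stringent can be seen in a small simulation under the null hypothesis. The sketch below is illustrative only: it assumes a normally distributed endpoint with known unit variance, equally spaced looks, and a hypothetical function name. For the stricter bound it uses a simple Bonferroni split across looks, which is cruder and more conservative than the group sequential boundaries (e.g. O'Brien-Fleming or Pocock type) used in practice.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_false_positive_rate(n_looks, n_per_look, z_threshold,
                                  n_trials=4000, seed=1):
    """Fraction of null trials that cross |z| > z_threshold at any look."""
    rng = random.Random(seed)
    # Each look adds n_per_look subjects per arm; under the null the sum of
    # their treatment-minus-control differences is N(0, 2 * n_per_look).
    block_sd = sqrt(2 * n_per_look)
    crossings = 0
    for _ in range(n_trials):
        running_sum, n = 0.0, 0
        for _ in range(n_looks):
            running_sum += rng.gauss(0.0, block_sd)
            n += n_per_look
            z = running_sum / sqrt(2 * n)  # z-statistic on all data so far
            if abs(z) > z_threshold:
                crossings += 1
                break
    return crossings / n_trials


if __name__ == "__main__":
    alpha, looks = 0.05, 5
    naive_z = NormalDist().inv_cdf(1 - alpha / 2)             # unadjusted 5% test
    strict_z = NormalDist().inv_cdf(1 - alpha / (2 * looks))  # Bonferroni over looks
    print("unadjusted at every look:", simulated_false_positive_rate(looks, 20, naive_z))
    print("stricter bound at every look:", simulated_false_positive_rate(looks, 20, strict_z))
```

Testing at the unadjusted level on every look pushes the overall false positive rate well above the nominal 5%, while the stricter per-look bound keeps it below.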


` Basically, interim results should not be shared with anyone in the sponsor company, or at participating study centers

` The only feedback to the sponsor is in the form of the recommendations from the Data Monitoring Committee

` Details of any proposed interim analysis, including the sponsor's expectations of the DMC, should be laid out prospectively in a written charter

` SOPs and a charter template exist and should be followed

` Although team members do not conduct the actual analyses, scheduled interim analyses can be highly labor-intensive nonetheless. Genentech's biostatistician/statistical programmer will still need to work with the external data group to develop detailed specifications for the analyses and displays to be made available to the Data Monitoring Board


` Early stopping for efficacy is not the only possibility (recent experience notwithstanding). Doing so is generally non-controversial, provided an appropriate group sequential stopping rule, and the role of the DMC, have been identified prospectively

` Early stopping for safety can range from scenarios which are very clear-cut to situations which are considerably more ambiguous. In the latter case, having an experienced DMC chair can be particularly important

` Early stopping for lack of efficacy (futility analysis) is not particularly common (with one exception, discussed on the next slide); the idea that incorporating this option can result in substantial reduction in the number of patients (gating risk) seems slightly misleading (personal opinion)
    Stopping for futility in a controlled trial will typically happen only if the treatment appears considerably inferior to control at the interim analysis
    Enrolment continues during preparation for the interim analysis, which typically occurs at a point where accrual has gained momentum, so the number of subjects saved may not be that great


` An exception is the case of uncontrolled oncology trials focusing on estimation of response rate

` Use of a two-stage (or multi-stage) design is common

` At a given analysis stage, if the observed response rate is so low that it essentially rules out the possibility that the true response rate is acceptable, may choose to stop

` Typically the argument is based on the upper 90% or 95% confidence limit for the true response rate: stop if this is lower than the minimum rate identified as interesting in the TPP

` Recall the "rule of 3", often invoked in the context of safety data. If a particular event (adverse reaction, response) occurs in 0 out of N subjects tested, then the 95% upper confidence limit for the true rate of occurrence is approximately 3/N

` Thus, for instance, if no responses are observed in the first 20 subjects, this effectively rules out values of the true response rate greater than 3/20, or 15%. If the TPP requires a response rate of at least 20%, stopping for futility seems warranted
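The rule of 3 is an approximation to the exact one-sided binomial limit: with 0 events in N subjects, the upper 95% limit p solves (1 - p)^N = 0.05. A minimal sketch comparing the two (function names are hypothetical):

```python
def exact_upper_limit_zero_events(n, confidence=0.95):
    """Exact one-sided upper confidence limit for the true rate when 0 of n respond."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def rule_of_three(n):
    """Quick approximation to the 95% upper limit with 0 events in n subjects."""
    return 3.0 / n


if __name__ == "__main__":
    for n in (20, 50, 100, 300):
        print(f"N={n:4d}  exact={exact_upper_limit_zero_events(n):.4f}  "
              f"3/N={rule_of_three(n):.4f}")
```

For N = 20 the exact limit is a little under 14%, so the 15% from the rule of 3 errs slightly on the conservative side, and the futility argument against a 20% target response rate still holds.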


` A fairly detailed exposition can be found on our website at: gwiz/projects/stathelp (introductory course notes, lecture 4)

` Use of the binomial distribution

` Calculating standard errors; normal approximation for large samples

` Estimation and confidence intervals for a single rate

` Testing for difference between two rates (z-test, χ²-test, Fisher's exact test)

` Estimation and confidence intervals for the difference between two rates

` Testing for differences in rates among several groups (χ²-test, Fisher's exact test)
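As a small illustration of two of the topics listed, the sketch below runs a large-sample z-test for the difference between two response rates and computes a normal-approximation confidence interval for that difference. The counts are invented for illustration and the function name is hypothetical; for small samples Fisher's exact test would be the safer choice.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, confidence=0.95):
    """Return (z, two-sided p-value, CI for p1 - p2) using normal approximations."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # The test uses the pooled rate, since the null hypothesis says the rates are equal
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The interval uses the unpooled standard error
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z, p_value, (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)


if __name__ == "__main__":
    # Hypothetical data: 30/100 responders on treatment, 15/100 on control
    z, p, (lo, hi) = two_proportion_z_test(30, 100, 15, 100)
    print(f"z = {z:.2f}, p = {p:.4f}, 95% CI for difference = ({lo:.3f}, {hi:.3f})")
```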


` If the response of interest is survival time, then specialized methods are needed, for two main reasons
    Frequency distribution of survival times is usually not well-behaved: not normal, not even symmetric
    In the context of clinical studies, cannot wait to observe all survival times; this means, for some subjects, all we know is that their survival time exceeds the observation period

` In statistical jargon, such survival times are called (right)-censored observations

` Methods for survival times are also applicable to any response of type time-to-event, e.g. time to disease progression, etc.


` Definitions: survivor function, hazard function

` Estimation of survival curve: Kaplan-Meier

` Comparison of one or more survival curves: logrank test, Wilcoxon test

` Comparing survival curves, allowing adjustment for other factors (e.g. baseline disease status): proportional hazards regression, aka the Cox model


` We wish to estimate the proportion remaining disease-free at any given time; equivalently, the estimated probability that a member of the population from which the sample is drawn is alive without disease at that time

` Because of the censoring we use the Kaplan-Meier method. For each time interval we estimate the probability that those without disease at the beginning remain so throughout the interval. This is a conditional probability.

` The probability of being disease-free at any time point is calculated as the product of the conditional probabilities of surviving without disease through each interval prior to that time point.

` The calculations are simplified by ignoring times at which there were no recorded events (whether progressions or losses to censorship).

` Censorship is accommodated in the calculations by ensuring that all subjects previously lost to censoring are removed from the risk set when calculating the conditional probability for a given timepoint

` Because the overall probability of being disease-free at a particular timepoint is calculated as a product of the relevant conditional probabilities, this (Kaplan-Meier) method of estimating the survival curve is sometimes referred to as the product-limit estimate
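The product-limit calculation described above can be sketched directly. This is a minimal illustration, not a substitute for a validated statistics package: the function name and the toy data are invented, `events` uses 1 for an observed event and 0 for a censored observation, and subjects censored at time t are treated as still at risk at t (the usual convention).

```python
def kaplan_meier(times, events):
    """Return [(time, estimated survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t (event or censored)
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Conditional probability of getting past t, given at risk at t
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t
        n_at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    # Toy data: months to progression; 0 marks a censored observation
    times = [2, 3, 3, 5, 7, 8]
    events = [1, 1, 0, 0, 1, 0]
    for t, s in kaplan_meier(times, events):
        print(f"t={t}: S(t)={s:.4f}")
```

The curve only steps down at times 2, 3 and 7, where events occurred; the censored subjects at 3, 5 and 8 influence it solely through the shrinking risk set.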


` Survival probabilities are usually presented as a connected "curve". The curve takes the form of a step function, with changes in the estimated probability occurring (only) when an event (progression) was observed

` Observations censored during any interval affect the number still at risk at the start of the next interval. Censoring is thus accommodated when calculating the step sizes; its effect on the curve is relatively subtle, but becomes cumulatively more important over time. Some versions of the Kaplan-Meier curve display censoring times as superimposed short vertical lines (works best for relatively small sample sizes)

` In practice, a computer is used to do these calculations.

` Standard errors and confidence intervals for estimated survival probabilities can be found by using a formula due to Greenwood

` Reporting estimated median survival with associated confidence limits is usual; estimating other percentiles is also possible


Two most common tests are

` Logrank test

` Wilcoxon test

If comparison needs to allow adjustment for other covariates besides group ID (e.g. baseline disease status), the most common approach is

` Cox (proportional hazards) regression

As the name implies, this analysis frames the comparison in terms of the effect a treatment or covariate exerts on the hazard function, rather than directly on the survival function


Logrank test

` Basic idea: at each new event time, figure out the survival pattern that would be expected if the null hypothesis (no difference) were true

` Quantify the difference between the observed survival pattern and that expected under the null hypothesis. This is done at each new event time.

` Obtain a cumulative measure of discrepancy from H0 by adding up the contributions across all event times

` Compare the result to appropriate tables (chi-square) to obtain a p-value

Wilcoxon test: variation of the logrank test which gives greater weight to discrepancies occurring earlier
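The logrank steps above can be sketched for two groups as follows. This is an unweighted, two-group illustration with a hypothetical function name and toy data; `events` uses 1 for an observed event and 0 for censoring, and the p-value comes from the chi-square distribution with 1 degree of freedom.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Return (chi-square statistic, p-value) for a two-group logrank test."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Risk sets: subjects still under observation just before t
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        # Under H0, events at t are shared in proportion to the risk sets
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    chi_square = observed_minus_expected ** 2 / variance
    p_value = erfc(sqrt(chi_square / 2))  # upper tail of chi-square, 1 df
    return chi_square, p_value


if __name__ == "__main__":
    # Toy data: group A fails early, group B fails late, no censoring
    chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
    print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A weighted variant (multiplying each time point's contribution by the number at risk) would give the Wilcoxon-type test mentioned above, which emphasizes early differences.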


Limitations of the logrank test

` Only addresses the question: is there a difference? No direct quantification of the size of the difference

` Doesn't allow adjustment for other relevant prognostic factors (e.g. differences at baseline)

These questions are usually addressed by Cox (proportional hazards) regression. Salient output is

` Estimated coefficient with standard error and/or confidence interval

` Usually interested in whether or not coefficient is zero

` Quantifies effect on hazard, rather than the survival function


For completeness, here are the definitions:

` Survival function
    S(t): probability of surviving past time t

` Hazard function
    h(t): instantaneous rate of dying at time t, given one has survived until that time

For calculus fans, the hazard function turns out to be d/dt [ -log S(t) ]


Safety and efficacy data differ in some key aspects

` Safety hypotheses are not specified a priori

` Failure to achieve statistical significance does not mean that a safety finding can be ignored

` With safety data the goal is to prove a negative

` Safety analyses are usually descriptive

` A few serious medical events can lead to the termination of a product's development; extreme value distributions are relevant to safety analyses

` Concurrent controls may not provide adequate context for interpretation


` Phase III trials are typically sized based on efficacy: what type of safety statements are appropriate?

` Drug exposure: how to summarize, how to correlate with adverse events observed, etc.
    Dose response
    Open label trials
    Placebo-controlled trials

` Sources of bias (under-reporting, longer follow-up leads to more events)

` Adverse events: very many types, so what is an appropriate way to summarize/analyze?
    Multiplicity


` Number of subjects and duration of exposure during development is minimal relative to the number of patients that may receive drug post-approval:
    Only the most common AEs (e.g., incidence of 1% or more) are identified
    Less common AEs (1 in 1000) cannot be reliably detected
    Rare events (1 in 10,000) will almost certainly not be observed at all
    Some patient groups may have been excluded from trials entirely, or insufficiently represented to a degree which precludes identifying any risks specific to them
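The arithmetic behind these statements is simple: if the true incidence of an adverse event is p and n subjects are exposed independently, the chance of seeing at least one case is 1 - (1 - p)^n. The sketch below evaluates this for an assumed safety database of 1500 exposed subjects; that figure is purely illustrative, and the function name is hypothetical.

```python
def prob_at_least_one_event(true_rate, n_subjects):
    """Chance of observing one or more events among n independent subjects."""
    return 1.0 - (1.0 - true_rate) ** n_subjects


if __name__ == "__main__":
    n = 1500  # assumed size of the pre-approval safety database (illustrative)
    for label, rate in (("1 in 100", 0.01), ("1 in 1000", 0.001), ("1 in 10,000", 0.0001)):
        print(f"{label:>12}: P(at least one case) = {prob_at_least_one_event(rate, n):.3f}")
```

Seeing a single case is also a long way from identifying the event as drug-related, so the practical detection thresholds are stricter than these probabilities suggest.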


` Safety: Applicant must demonstrate product safety (FDA has obligation to demand)
    Extent of data: There must be sufficient information to decide whether the drug is safe.
    Adequate analyses: Adequate tests by all methods reasonably applicable must be performed to evaluate safety for labeled use.
    Reasonable results: Tests should show that drug is safe as labeled.
    Risks must be adequately defined.
    Extreme risks (even if rare) must be obvious.


` Efficacy: Applicant must demonstrate substantial evidence of effectiveness claimed.
    Substantial evidence: evidence consisting of adequate and well-controlled investigations, including clinical investigations, from which experts could conclude the drug will have the claimed effect.
    Investigations imply replication or corroboration.
    Typical: 2 Phase III trials with identical or similar designs
    In special circumstances: 1 Phase III trial may be sufficient.
    E.g. life-threatening diseases with very limited therapeutic options (always a good idea to talk to regulatory agencies prior to trial initiation)


` Regulatory Agencies
    FDA
    EEC (European Economic Community)

` U.S. Code of Federal Regulations for Clinical Trials

` ICH (International Conference on Harmonization)
    Initiatives undertaken by regulatory authorities and industry associations to promote international harmonization of regulatory requirements
    Good Clinical Practice (GCP)
    Structure and content of clinical studies
    Clinical safety data management: Definitions and standards for expedited reporting
    Statistical principles for clinical trials


"... a laboratory measurement or physical sign used as a substitute for a clinical endpoint that measures how a patient feels, functions, or survives."

From a definition of the term "surrogate endpoint" by Temple, cited in Fleming and DeMets (1996), Annals of Internal Medicine, 125, pages 605-613 [Surrogate endpoints in clinical trials: are we being misled?]


Some thoughts on biomarkers


Predict clinical efficacy of treatment based on its effect on biomarker (data may be available earlier; may provide answer with fewer subjects)

Use in Phase II is common
    Dose ranging based on biomarker
    Phase III go/no-go decision based on observed treatment effect on biomarker


` Biochemical (cholesterol, HIV viral load, cytokine concentration, hemoglobin A1c, ...)

` Immunological (lymphocyte subpopulation counts, CD4+, CD11a+ T cells, CD20+ B cells, ...)

` Saturation of target cell surface antigen or soluble ligand

` Physiological (e.g. blood pressure, pulmonary function testing, episodes of arrhythmia, ...)

` Imaging (angiography, tumor size, bone density by DEXA scan, ...)


` Lowering of cholesterol level by treatment with statins (survival benefit established)

` Reduction in viral RNA in peripheral blood through treatment with protease inhibitors delays HIV disease progression

` Improved glycemic control (HbA1c) predictive of delayed onset of microvascular complications (retino-, nephro-, neuropathy) in Type I diabetes

` 90-minute TIMI flow (angiography) predictive of 30-day survival following thrombolytic therapy

` Reduction in free IgE following treatment with an anti-IgE antibody correlates with symptom improvement scores in allergic rhinitis and asthma


Experience with biomarkers is not always positive

` CD4 counts as a surrogate in AIDS trials: mixed performance as a predictor of clinical benefit

` Tumor size in cancer trials: experience runs both ways; appears to depend both on tumor type and on class of treatments

` Experience in the CAST trial demonstrated that treatment with encainide/flecainide clearly reduced the incidence of arrhythmias, but increased mortality

` Similar results in context of treating atrial fibrillation

` Blood pressure as surrogate: effect translates to clinical benefit for some drug classes, but not others


` Biomarker not on causal pathway of disease process

` Several pathways: intervention affects the one mediated through the biomarker, but not the others (redundancy)

` Biomarker not on the pathway affected by the intervention, or is insensitive to treatment effect

` Intervention has mechanisms of action unrelated to the disease process (aka the "law of unintended consequences")

` Failure of either type is possible: biomarker could falsely predict, or fail to predict, clinical benefit
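The two failure types can be illustrated with simple arithmetic. The sketch below is a toy additive model with made-up numbers (none come from the slides or from any trial): the treatment's net effect on event risk is taken to be the part mediated through the biomarker plus whatever it does by other routes.

```python
def net_outcome_effect(biomarker_change: float,
                       outcome_per_biomarker_unit: float,
                       off_pathway_effect: float) -> float:
    """Net change in event risk in percentage points (negative means benefit).

    Hypothetical additive model: biomarker-mediated effect plus the effect
    of mechanisms that do not pass through the biomarker.
    """
    mediated = biomarker_change * outcome_per_biomarker_unit
    return mediated + off_pathway_effect


# Scenario A (false promise): biomarker improves by 10 units, each unit is
# worth 0.5 points of risk, but an unrelated mechanism adds 8 points of risk.
false_promise = net_outcome_effect(-10.0, 0.5, 8.0)

# Scenario B (missed benefit): biomarker does not move at all, yet the drug
# lowers risk by 4 points through a pathway the biomarker does not capture.
missed_benefit = net_outcome_effect(0.0, 0.5, -4.0)

print(f"A: biomarker improved, net risk change = {false_promise:+.1f} points")
print(f"B: biomarker unchanged, net risk change = {missed_benefit:+.1f} points")
```

In scenario A the biomarker alone would suggest a 5-point benefit while the net effect is harmful; in scenario B the biomarker would suggest no effect while the drug actually helps.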


Other potential contributing factors include:

Measurement difficulties due to rater effects

GNE experience (K-interferon in renal cell carcinoma) strongly supports advisability of blinded tumor evaluation by a single central review board (avoid bias, minimize center differences)

Measurement difficulties arising from sample preparation, transport, storage, and handling

Time constraints in assaying fresh blood, possible effects of activation of T-cells, lack of standardization of FACS assay protocols and reporting methods, heterogeneity of tumor samples, center differences (use of local or central labs)


Other potential assay-related difficulties include:

Matrix effects

Interference by other proteins can affect assay specificity and/or sensitivity

Development of antibodies

Can be hard to detect; harder to quantify reliably; extremely difficult to assess clinical significance, if any

Inter-laboratory differences

Can be large enough to make biomarker data uninterpretable


` Avoid the "what we can measure is what we should measure" fallacy

` Experience with imaging-based biomarkers to date has been disappointing

` Non-targeted genomic assays (e.g. microarrays followed by data mining) have the potential for much wasted effort

` Avoid the "rearranging the deckchairs on the Titanic" fix, e.g. straining to improve assay precision from a CV of 20% to 15% when the within-subject CV for the marker is 40% and the inter-subject CV is 50%.

` Cytokines make particularly treacherous biomarkers

` Proteomics is not for sissies

` Distinguish between "must know" and "nice to know"

` An understanding of mechanism of action may be nice to know, but is not a requirement for drug approval
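The deckchairs point can be checked numerically. The sketch below assumes the three sources of variation are independent and combine approximately as a root sum of squares on the CV scale; that combination rule is an assumption made for illustration, and only the four percentages come from the slide.

```python
import math


def total_cv(assay_cv: float, within_subject_cv: float, between_subject_cv: float) -> float:
    """Approximate overall CV, assuming independent components that add in quadrature."""
    return math.sqrt(assay_cv ** 2 + within_subject_cv ** 2 + between_subject_cv ** 2)


before = total_cv(0.20, 0.40, 0.50)  # assay CV of 20%
after = total_cv(0.15, 0.40, 0.50)   # assay CV improved to 15%

print(f"total CV with 20% assay CV: {before:.1%}")
print(f"total CV with 15% assay CV: {after:.1%}")
print(f"reduction in total CV:      {before - after:.1%}")
```

Under this approximation the overall CV moves only from about 67% to about 66%, so the effort spent on the assay buys almost nothing while biological variability dominates.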


` If the word "cascade" appears in the description of the disease process, all bets are off

` The topic of biomarkers seems to drive otherwise thoughtful researchers to an irrational frenzy of wishful thinking

` The message so eloquently expounded by Jagger et al. remains as relevant today as it was in 1969

` Lasagna's Law already militates against rapid accrual of eligible subjects to clinical trials

` To slow recruitment from a trickle to a complete grinding halt, only two words are needed in the protocol: "serial biopsy"


` Utility of a particular biomarker depends not only on the disease, but also on the nature of the therapeutic intervention

` Validation of any candidate biomarker must necessarily be considered on a case-by-case basis

` Validity of a marker for a given drug class may not transfer to other drug classes for the same disease

` Success is most likely when intervention clearly affects the biomarker, whose role in the disease process is well-established and clearly understood

` Validation of a putative marker cannot happen without ultimately generating the required clinical outcome data

` Regulatory conservatism is to be expected, and seems appropriate
