A statistician on a NICE committee


June 2010 © 2010 The Royal Statistical Society

The National Institute for Health and Clinical Excellence decides which new therapies should be allowed in the NHS. In Britain it is known as NICE. Some Americans call it a death panel. Mike Campbell works on a NICE appraisal committee. He explains what he does.

NICE is very much in the public eye. “NICE” may be an unfortunate acronym, since it invites sub-editors to exercise their wit when a decision it makes is “not nice”. Its decisions are often unpopular with patients, who may believe they are denied possibly life-saving treatments simply on grounds of cost, and with pharmaceutical companies, which stand to make a great deal of money if one of their products gains NICE approval and to lose a great deal if it does not. I serve on one of NICE’s four appraisal committees, and my committee has been accused of comprising “faceless bean counters”, which (since there are no accountants on the committee) I took to mean statisticians, and so took rather personally. More recently the committees have been accused by the US media of being “death panels”. If my colleagues and I really felt that that was our role, I doubt that many of us would be serving on them. In reality I have found it a wonderful (and challenging) opportunity for the proper use of statistics to influence public policy.

So how do the NICE committees decide whether to approve a new treatment, and what role do statistics and statisticians play in this? There are a number of statisticians on these committees, so I should emphasise that these are my own opinions and that other statisticians may have different ones.

NICE was founded in 1999. It is mainly a “virtual” organisation which employs only about 300 people, with a large number of unpaid people helping on its various committees. Each appraisal committee consists of about 30 people from a wide variety of professions. There are clinicians from various specialties, in particular cardiovascular medicine and cancer, since these are the commonest areas of application of new technologies. There are general practitioners, since many of the therapies will be prescribed by them. There are representatives of NHS management and nursing, and public health doctors. There is usually one representative from industry and one person who represents patients, commonly a member of a patients’ organisation. There are also health economists and medical statisticians. Specialist experts and patient representatives for the medical condition in question are invited for particular appraisals. As one can imagine, there may be conflicts of interest (our decisions can affect share prices), and each member has to sign a document confirming that they have no competing interests.

Some health economics

Many of the decisions hinge on health economics. Vastly different technologies, from wisdom tooth extraction to kidney transplants and cancer drugs, have to be compared. To make such comparisons, two outcomes from trials are considered: extension of life, and health-related quality of life. Clinicians do not only want their patients to live longer; they want them to have independence, mobility, freedom from pain, and so on – the things that add pleasure and value to life.

Quality of life is often assessed using a questionnaire known as the EQ-5D or EuroQol. It is rather crude: it covers five dimensions, each measured on three levels; for example, the anxiety/depression dimension runs from “I am not anxious or depressed” through “I am moderately anxious or depressed” to “I am extremely anxious or depressed”. Another instrument is the SF-36, which is longer and more finely grained. The answers to these instruments are mapped onto a utility, a number ranging from 0 for death to 1 for perfect health.

The benefit of treatment is then projected over the lifetime of the patient and is measured in Quality Adjusted Life Years (QALYs, pronounced “kwollies”). For a non-fatal disease such as arthritis, a drug might improve one’s utility by 0.1; if this benefit is expected to last for 10 years, one would gain 10 × 0.1 = 1 QALY. For a terminal illness, such as advanced cancer, a drug may extend life by 6 months. However, compared with a full-health score of 1, one might expect the patient’s health over this period to be only about 0.4, and so one would gain 0.5 × 0.4 = 0.2 QALYs. In practice, future health gains are discounted (giving higher value to the immediate future). Recent guidelines on end-of-life treatments have suggested that one might value the extra life as if in full health (or even more), so that if you gained 6 months of life from a cancer drug, you would treat this as 0.5 QALYs.
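The QALY arithmetic above, including discounting, can be sketched in a few lines. This is a minimal illustration, not NICE’s own calculation: the function name `qaly_gain` is hypothetical, the 3.5% annual discount rate is an assumption chosen for illustration, and the convention of leaving the first year undiscounted (with a partial final year counted pro rata) is also an assumption.

```python
def qaly_gain(utility_gain: float, years: float, discount_rate: float = 0.0) -> float:
    """QALYs from a constant utility gain held for `years`.

    Year t (t = 0, 1, 2, ...) is discounted by 1 / (1 + discount_rate) ** t,
    so the first year is undiscounted (an assumed convention). A partial
    final year is counted pro rata.
    """
    total = 0.0
    year = 0
    remaining = years
    while remaining > 0:
        fraction = min(1.0, remaining)
        total += utility_gain * fraction / (1.0 + discount_rate) ** year
        remaining -= fraction
        year += 1
    return total


print(round(qaly_gain(0.1, 10), 2))         # arthritis example: 1.0
print(round(qaly_gain(0.4, 0.5), 2))        # cancer example: 0.2
print(round(qaly_gain(1.0, 0.5), 2))        # end-of-life weighting: 0.5
print(round(qaly_gain(0.1, 10, 0.035), 2))  # arthritis, assumed 3.5% discount: 0.86
```

Discounting shrinks the arthritis gain from 1 QALY to about 0.86 QALYs, because the later years of benefit count for less.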

How much the new treatment will cost the NHS also has to be determined. This would include not just drug costs, but also training costs for new equipment, staffing costs, etc. We do not formally take into account the wider costs to society, such as whether a person can hold down a job on the new treatment, but these issues are often weighed up in the committee’s discussions, and this is where patient representatives can be particularly helpful. The other important point is that it is the costs per person that count, so each person is valued the same. The total costs to the NHS are considered but are not the main issue.

The difficulty here is that, for a common disease, approving a new therapy may put considerable pressure on the NHS budget even if the additional cost per person is quite modest. However, it would be unfair to an individual to turn a therapy down simply because many others suffer from the same disease. The other side of the coin is that those suffering from a rare disease may argue that, despite high individual costs, the total cost to the NHS may not be great, and so the treatment should be approved. But this amounts to valuing people with rare diseases more highly. The important principle, often quoted by health economists, is “a QALY is a QALY is a QALY”; that is, we value health on an individual basis. There is an argument that it is more costly to develop drugs for rare diseases, since there are fewer profits to be made, and so people with rare diseases should be treated more favourably. I think this may be a valid argument, but against it I have seen companies suggest that an expensive drug should be approved because it is used for rare cancers, when in fact the drug is also used for more common ones and so would actually have quite a wide market.
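The distinction between cost per person and total cost can be made concrete with a small sketch. The therapy figures below (£5,000 extra per patient for 0.25 extra QALYs) and the patient numbers are invented, and the helper names are hypothetical; the point is only that prevalence changes the budget impact but not the per-person ICER on which the decision rests.

```python
def per_patient_icer(extra_cost_per_patient: float, extra_qalys_per_patient: float) -> float:
    """Extra cost per extra QALY, computed per person."""
    return extra_cost_per_patient / extra_qalys_per_patient


def budget_impact(extra_cost_per_patient: float, n_patients: int) -> float:
    """Total extra spending if every eligible patient is treated."""
    return extra_cost_per_patient * n_patients


# Hypothetical therapy applied to a common and a rare disease.
for label, n in [("common disease", 500_000), ("rare disease", 500)]:
    print(label, per_patient_icer(5_000, 0.25), budget_impact(5_000, n))
```

Both diseases give the same ICER of £20,000 per QALY, even though the common disease costs the NHS a thousand times as much in total.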

The lifetime costs and benefits are often determined using what are known as semi-Markov models. In such a model we imagine a cohort of people proceeding through life. During each year they will have certain risks to health or life, and will also incur certain costs. Some of the cohort will die and no longer be present the following year. (See the article that follows, by James Hanley and Elizabeth Turner, for some history of this.) The cohort is followed up on a regular basis until all have died, and the overall costs and benefits are then assessed. These models can often be run in Excel spreadsheets, although sometimes a more sophisticated package such as WinBUGS is used. Further details of the statistical analysis of cost-effectiveness data are given in Willan and Briggs (2006)1.
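A minimal sketch of the cohort idea, assuming a plain Markov model with annual cycles and three invented states (well, ill, dead). A semi-Markov model would additionally let transition probabilities depend on how long someone has spent in a state; this sketch omits that. Every probability, cost and utility here is hypothetical, as is the function name `run_cohort`; the cohort is followed for 100 cycles, by which point almost everyone has died.

```python
from typing import Dict, Tuple

Transitions = Dict[str, Dict[str, float]]


def run_cohort(transitions: Transitions,
               annual_cost: Dict[str, float],
               utility: Dict[str, float],
               start: Dict[str, float],
               cycles: int = 100,
               discount_rate: float = 0.0) -> Tuple[float, float]:
    """Follow a cohort through annual cycles of a Markov model and return the
    expected cost and expected QALYs per person. Costs and QALYs are counted
    for the state occupancy at the start of each cycle."""
    occupancy = dict(start)
    total_cost = total_qalys = 0.0
    for year in range(cycles):
        discount = 1.0 / (1.0 + discount_rate) ** year
        total_cost += discount * sum(p * annual_cost[s] for s, p in occupancy.items())
        total_qalys += discount * sum(p * utility[s] for s, p in occupancy.items())
        # Move the cohort on one year: redistribute each state's share.
        nxt = {s: 0.0 for s in transitions}
        for s, p in occupancy.items():
            for s2, q in transitions[s].items():
                nxt[s2] += p * q
        occupancy = nxt
    return total_cost, total_qalys


# Hypothetical three-state model (well -> ill -> dead); every number is invented.
standard = {
    "well": {"well": 0.85, "ill": 0.10, "dead": 0.05},
    "ill": {"ill": 0.70, "dead": 0.30},
    "dead": {"dead": 1.0},
}
new = {**standard, "well": {"well": 0.90, "ill": 0.05, "dead": 0.05}}  # slows progression
utility = {"well": 0.9, "ill": 0.5, "dead": 0.0}
cost_standard = {"well": 500.0, "ill": 5_000.0, "dead": 0.0}
cost_new = {**cost_standard, "well": 5_000.0}  # adds £4,500 a year of drug cost while well

c0, q0 = run_cohort(standard, cost_standard, utility, {"well": 1.0})
c1, q1 = run_cohort(new, cost_new, utility, {"well": 1.0})
print(f"extra cost £{c1 - c0:,.0f}, extra QALYs {q1 - q0:.2f}, "
      f"ICER £{(c1 - c0) / (q1 - q0):,.0f} per QALY")
```

With these made-up inputs the new treatment buys roughly 2.7 extra QALYs per person at an ICER of about £16,000 per QALY. The same loop is what a spreadsheet implementation does row by row.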

Having worked out the costs and the benefits of the new therapy and the standard, we obtain the incremental cost effectiveness ratio (ICER) as

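$$\text{ICER} = \frac{\text{extra cost of the new treatment}}{\text{extra QALYs the new treatment gives}} = \frac{C_{\text{new}} - C_{\text{standard}}}{Q_{\text{new}} - Q_{\text{standard}}}$$

where C is the expected lifetime cost and Q the expected QALYs of each treatment.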

This gives the additional cost of gaining one extra QALY from the new treatment. At its simplest, the ICER is the cost of buying an extra year of full health. In general, NICE will approve therapies with an ICER below £20,000 per QALY, will discuss ICERs above this figure, and only exceptionally will approve ICERs above £30,000. Thus £30,000 may be regarded in most cases as a ceiling on the cost of a year of good-quality life.
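The ICER and the threshold bands just described can be written as a short sketch. The helper names `icer` and `threshold_band` are hypothetical, and the costs and QALYs in the example are invented. For simplicity the sketch refuses to compute a ratio when the new treatment adds no QALYs, since a ratio on its own is then uninformative.

```python
def icer(cost_new: float, cost_standard: float,
         qalys_new: float, qalys_standard: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    extra_qalys = qalys_new - qalys_standard
    if extra_qalys <= 0:
        raise ValueError("ICER is not informative unless the new treatment adds QALYs")
    return (cost_new - cost_standard) / extra_qalys


def threshold_band(icer_value: float, lower: float = 20_000, upper: float = 30_000) -> str:
    """Map an ICER (in £ per QALY) onto the rough bands described in the text."""
    if icer_value < lower:
        return "normally approved"
    if icer_value <= upper:
        return "discussed by the committee"
    return "approved only exceptionally"


# Hypothetical: £12,000 extra cost for 0.5 extra QALYs.
value = icer(22_000, 10_000, 2.5, 2.0)
print(value, threshold_band(value))  # 24000.0 discussed by the committee
```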

Recently NICE accepted that people value life more highly towards the end of it, but that drugs which may give a patient an extra 3 months would have to be quite cheap to be affordable under these thresholds. NICE has introduced end-of-life criteria: the treatment must extend life by more than 3 months, the patient’s life expectancy must be less than 24 months, and the treatment must be for a “small” population. In these circumstances approval may be given for drugs over the standard threshold.
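A minimal sketch of how the end-of-life weighting changes an ICER, assuming (as the earlier example suggests) that qualifying extra life is valued as full health. The helper names are hypothetical, the criteria are encoded only as simply as the text states them, and the drug (£15,000 extra for 6 months of life at utility 0.4, in a patient expected to live 12 months) is invented.

```python
def meets_end_of_life_criteria(extension_months: float,
                               life_expectancy_months: float,
                               small_population: bool) -> bool:
    """The three criteria as stated in the text (a simplification)."""
    return extension_months > 3 and life_expectancy_months < 24 and small_population


def icer_for_life_extension(extra_cost: float, extension_years: float,
                            utility: float, end_of_life: bool) -> float:
    """ICER when the only benefit is extra life; if the end-of-life criteria
    apply, that extra life is valued as if in full health (utility 1)."""
    weight = 1.0 if end_of_life else utility
    return extra_cost / (extension_years * weight)


eligible = meets_end_of_life_criteria(6, 12, True)
standard = icer_for_life_extension(15_000, 0.5, 0.4, end_of_life=False)
weighted = icer_for_life_extension(15_000, 0.5, 0.4, end_of_life=eligible)
print(f"standard ICER £{standard:,.0f}; end-of-life ICER £{weighted:,.0f}")
```

The same drug moves from about £75,000 per QALY, far above the threshold, to £30,000 per QALY, at its edge.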

Since a threshold approach does not explicitly take into account how much money the NHS actually has available, it is not the most logical method. A better one would be to rank all the new therapies by their ICERs and then approve successively larger ICERs until all the money is spent. However, the ICERs are not all available at the same time, and so this cannot be done. Another assumption is that since there is a finite amount of money available, money spent on one patient means taking money away from another patient. However, purchasers (hospitals and primary care trusts) often have little idea about which of the therapies they pay for are less cost-effective and so which ones they should stop paying for. Committees should always bear in mind that if they approve an expensive drug, possibly more cost-effective treatments may be stopped. For example, in a recent BBC2 television programme, The Price of Life, it was pointed out that the cost of keeping one terminally ill person alive for 3 months on a particular drug would pay for several palliative nurses for 1 year. The nurses can care for many more patients but have not been subject to a NICE appraisal, do not have a powerful company arguing their case – and may face the sack to pay for expensive new treatments.
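The ranking alternative to a fixed threshold can be sketched as a simple “league table”. This is an illustration under stated assumptions, not a procedure NICE uses: the therapies, their total costs and QALYs, and the budget are all invented, and the names `Therapy` and `fund_by_icer` are hypothetical. The sketch stops at the first therapy that no longer fits the budget; with indivisible programmes, finding the best affordable set is in general a harder (knapsack-type) problem.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Therapy:
    name: str
    extra_cost: float   # total extra cost to the NHS (£) if adopted
    extra_qalys: float  # total extra QALYs if adopted

    @property
    def icer(self) -> float:
        return self.extra_cost / self.extra_qalys


def fund_by_icer(therapies: List[Therapy],
                 budget: float) -> Tuple[List[str], float, Optional[float]]:
    """Fund therapies in order of increasing ICER until the next one no longer fits.
    Returns (funded names, unspent budget, ICER of the last therapy funded)."""
    funded: List[str] = []
    remaining = budget
    cutoff: Optional[float] = None
    for therapy in sorted(therapies, key=lambda th: th.icer):
        if therapy.extra_cost > remaining:
            break
        funded.append(therapy.name)
        remaining -= therapy.extra_cost
        cutoff = therapy.icer
    return funded, remaining, cutoff


therapies = [
    Therapy("A", 2_000_000, 200),  # ICER £10,000
    Therapy("B", 3_000_000, 100),  # ICER £30,000
    Therapy("C", 1_500_000, 100),  # ICER £15,000
    Therapy("D", 4_000_000, 80),   # ICER £50,000
]
print(fund_by_icer(therapies, 5_000_000))
```

With a £5 million budget only A and C are funded, so the effective threshold is set by the budget (here £15,000 per QALY) rather than fixed in advance.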

Since the ICER is a ratio, it has poor statistical properties: when the extra QALYs are close to zero it becomes unstable, and a negative ICER can mean either that the new treatment is cheaper and better or that it is dearer and worse. In addition, the sense of absolute size is lost. A treatment costing an extra £200 but delivering an extra 0.01 QALY has the same ICER as one costing an extra £20,000 but delivering 1 extra QALY. There is often huge uncertainty surrounding the QALY gain, particularly for small gains, and so the uncertainty intervals (confidence or Bayesian posterior) for ICERs are often very wide and asymmetric. A chart designed to help decisions is the cost effectiveness acceptability curve. This looks at different thresholds for the ICER (the “willingness to pay”), and for each threshold works out the probability that a treatment is cost-effective relative to another treatment (effectively a Bayesian argument). Since the more one is willing to pay, the more likely a treatment is to be cost-effective, these curves generally rise as the threshold increases. Figure 1 shows a recent cost effectiveness acceptability curve for a drug, sorafenib, versus best supportive care for advanced kidney cancer2. The steepness of the slope shows that we are reasonably confident where the true value lies. One can see that even if one is willing to pay £40,000 per QALY, sorafenib has only a 25% chance of being cost-effective – that is, of actually delivering that extra year of good life; this rises to 50% if one is willing to pay £50,000 per QALY. The fact that the curve flattens and does not reach 100% indicates that there are possible situations in which the new treatment is worse than standard care, so that, no matter how much one is willing to spend on the new treatment, it won’t be worthwhile.
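A cost effectiveness acceptability curve can be estimated from the draws of a probabilistic sensitivity analysis. The sketch below assumes the common net-monetary-benefit formulation: at threshold λ, a draw counts as cost-effective when λ × (extra QALYs) − (extra cost) > 0. The four draws are invented (a real analysis uses thousands), and the function name `ceac` is hypothetical.

```python
from typing import List, Sequence


def ceac(delta_costs: Sequence[float], delta_qalys: Sequence[float],
         thresholds: Sequence[float]) -> List[float]:
    """For each willingness-to-pay threshold, the share of simulation draws in
    which the incremental net monetary benefit,
    threshold * delta_qaly - delta_cost, is positive."""
    n = len(delta_costs)
    curve = []
    for lam in thresholds:
        wins = sum(1 for dc, dq in zip(delta_costs, delta_qalys) if lam * dq - dc > 0)
        curve.append(wins / n)
    return curve


# Four hypothetical draws of (extra cost, extra QALYs).
delta_costs = [10_000, 12_000, 8_000, 15_000]
delta_qalys = [0.6, 0.2, 0.5, -0.1]  # the last draw has the new drug doing harm
print(ceac(delta_costs, delta_qalys, [0, 20_000, 40_000, 100_000]))  # [0.0, 0.5, 0.5, 0.75]
```

Because one draw has a negative QALY gain, the curve levels off below 1 however high the threshold goes, which is the flattening described for Figure 1. Working with net benefit also sidesteps the instability of the ratio itself.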

The process of review

There are currently two kinds of review: multiple technology appraisals (MTAs) and single technology appraisals (STAs). MTAs involve comparison of several new treatments, often ones that form a class, such as glitazones for treating type II diabetes, and are prepared by a technology assessment group based in a university. The group carries out a systematic review to identify all the trials of the therapies and, if required, performs a meta-analysis. The whole process can take some time, and so STAs were introduced to speed it up. STAs involve only one technology, and the review and modelling are done by the manufacturer. The manufacturer’s report is sent to an evidence review group, which comments on it and may suggest alternative scenarios. Problems can arise when the manufacturer’s model is (wilfully?) obscure, or when the review group’s ICER estimates are much higher than those presented by the company.

At the appraisal committee meeting, two of the members present summaries to the rest. One of the presenters is usually a “technical” person, a statistician or a health economist. Clinicians expert in the area and patients who may benefit from the technology are invited. The appraisal committee members make suggestions and comments, and decide whether to approve the treatment. The fact that on occasion the decision requires a secret ballot shows that this is not a cut-and-dried procedure. The committee’s decision document is then sent out to stakeholders for review and comment. The committee meets again to produce a final appraisal determination, which is then sent to NICE for approval. The stakeholders can launch an appeal, but if there is no appeal, or the appeal fails, the recommendations become mandatory in law. Further details are given by Walker et al.3

Statistical issues

Quantitative thinking is essential in the appraisal process, and so statisticians have much to offer. One general point is that modellers are often rather deterministic in their thinking, and estimates are often not accompanied by any measure of uncertainty. Statisticians, by their training, tend to ask how good an estimate is, how vulnerable it is to sampling error and bias, and what uncertainty surrounds it. An important point is that the models are non-linear, so uncertainty in the inputs also affects the point estimates, not just the width of the intervals around them. However, in the end one has to make a decision. One usually cannot simply “not reject the null hypothesis” and sit on the fence. On occasion one can protest that more evidence is needed before a decision can be made, but in general one has to make decisions under uncertainty.
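Why non-linearity makes uncertainty matter for the point estimate can be shown with a deliberately tiny example. Assume (hypothetically) a constant-hazard survival model, in which life expectancy is 1 / hazard, and suppose the annual death rate is uncertain: equally likely to be 0.1 or 0.5. Running the model at the average input gives a different answer from averaging the model’s outputs over the uncertainty, which is why probabilistic analyses average outputs rather than plug in mean inputs.

```python
def life_expectancy(annual_hazard: float) -> float:
    """Mean survival in years under a constant hazard (an exponential model)."""
    return 1.0 / annual_hazard


# Hypothetical uncertainty: the death rate is equally likely to be 0.1 or 0.5 a year.
hazards = [0.1, 0.5]
mean_hazard = sum(hazards) / len(hazards)

plug_in = life_expectancy(mean_hazard)  # model run once at the mean input
averaged = sum(life_expectancy(h) for h in hazards) / len(hazards)  # mean of outputs
print(f"plug-in: {plug_in:.2f} years; averaged over uncertainty: {averaged:.2f} years")
```

The plug-in estimate is 3.33 years, whereas the average over the uncertainty is 6.00 years: the deterministic run understates life expectancy by almost half.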

Figure 1. Cost effectiveness acceptability curve for sorafenib versus best supportive care. The x-axis shows the value, in pounds per QALY, of the ceiling ratio – the amount one is willing to pay to provide an extra year of good-quality life. The y-axis shows the probability that the drug is cost-effective (i.e. that the ICER is less than the ceiling).

[Figure: x-axis “Value of ceiling ratio” (£0 to £140,000 per QALY); y-axis “Probability cost-effective (%)” (0 to 100)]

Pitfalls in interpretation of studies for appraisal

• Is the manufacturer or the technology assessment group recommending treatment for only a subgroup of the patients considered in the evidence? Is this sensible?

• Is the outcome in the protocol for a trial the same as the outcome in the paper reporting the trial?

• Is the treatment in the control group in the evidence likely to be similar to that replaced by the new treatment in practice?

• Are the patients in the trial typical of patients treated in the NHS?

• Did patients swap treatment in the reported trials and, if so, how has this been handled in the assessment?

• Has the manufacturer omitted something (e.g. a study or a particular analysis such as a probabilistic sensitivity analysis)? If so, why?


The subgroup problem

A recurring theme is what is known as the subgroup problem. Manufacturers of a new drug for a specific disease would like to sell as much of the drug as possible, and so would like approval for treating as wide a range of disease varieties and severities as possible. They may therefore design clinical trials that include as wide a range of patients as possible. The trials may then show that the overall efficacy of the drug is not statistically significant. The manufacturer sometimes then tries to find subgroups of patients for whom the effect is statistically significant. The danger of this was illustrated by Sleight et al. in a study of aspirin for people who had had a heart attack4. Overall, mortality was 9.4% on aspirin and 11.8% on placebo. But for those born under the star signs Libra or Gemini, mortality was 11.1% on aspirin and 10.2% on placebo, so if you were born under Libra or Gemini it appeared that aspirin was bad for you!
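Why subgroup dredging so readily finds “significant” effects can be quantified. A minimal sketch, assuming k disjoint subgroups (such as the 12 star signs), each tested independently at the two-sided 5% level, when the drug truly has no effect: the chance that at least one subgroup looks significant is 1 − 0.95^k. The Monte Carlo check assumes each subgroup’s z-statistic is standard normal under the null; the function names are hypothetical.

```python
import random


def prob_any_significant(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent null tests is 'significant'."""
    return 1 - (1 - alpha) ** k


def simulate_any_significant(k: int, reps: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: under no treatment effect each subgroup's z-statistic is
    standard normal; count replicates in which any |z| exceeds 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if any(abs(rng.gauss(0.0, 1.0)) > 1.96 for _ in range(k)):
            hits += 1
    return hits / reps


print(round(prob_any_significant(12), 3))  # 0.46
print(simulate_any_significant(12))        # close to the analytic value
```

With 12 star signs there is roughly a 46% chance that at least one will show a “significant” effect, in one direction or the other, purely by chance.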

The subgroup problem also arose in the appraisal of photodynamic therapy for age-related macular degeneration5. This is a disease in which new blood vessels grow at the back of the eye, causing vision to become blurred. Photodynamic therapy using the drug verteporfin makes these new blood vessels sensitive to laser light, enabling them to be “zapped”. A trial was conducted, and overall there was a statistically significant benefit of therapy in terms of the proportion of people losing more than 15 letters of visual acuity over 2 years. However, macular degeneration can be divided into “classic”, where the blood vessels can easily be seen, and “occult”, where they are hidden. The results by subgroup are given in Table 1.

One can see that the evidence for the “classic” subgroup appears to be much stronger. The question for the committee was whether this was simply the result of data dredging. In fact the company were able to produce documents showing that there had been a preplanned subgroup analysis, and an understanding of the way the treatment works suggested that it would be more effective in classic age-related macular degeneration. Thus approval was given for the “predominantly classic” subgroup only.

In fact trial data often relate to a specified subgroup of people, and it can be difficult to decide whether the therapy would be cost-effective in a different population. For example, a drug in a trial might be used “first line” (on diagnosis), whereas current practice is to try a cheaper drug first and, if this fails, to use the new drug “second line”. The latter group is obviously a different population from the former.

A related problem arises when an investigator changes the outcome variable between the protocol and the publication, for example from overall survival to “progression-free” survival in cancer. This may be for the rather odd reason that, because fewer than 50% of the patients have died, the median survival time cannot be estimated. Alas, it is often not possible to determine whether an outcome was chosen as a result of an unplanned subgroup analysis or a change of endpoint unless one goes back to the trial protocol.
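Why the median survival time cannot be estimated when fewer than half the patients have died follows from the Kaplan–Meier estimator: the median is the first time the estimated survival curve falls to 0.5 or below, and if too few deaths are observed the curve never gets there. A minimal sketch, with invented follow-up data; the function name `km_median` is hypothetical.

```python
from typing import Optional, Sequence


def km_median(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Kaplan-Meier median survival: the first time the estimated survival
    curve falls to 0.5 or below, or None if it never does.
    events[i] is 1 for a death and 0 for a censored observation."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            if survival <= 0.5:
                return t
        at_risk -= removed
    return None


# Hypothetical trial arm: 10 patients followed for up to 12 months, only 3 deaths.
times = [2, 3, 5, 7, 8, 10, 12, 12, 12, 12]
events = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
print(km_median(times, events))  # None: survival never falls to 50%
```

Here the estimated survival only falls to about 0.66, so no median exists for this arm, however promising the treatment may look.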

Lack of data

A major issue with appraisals is lack of good-quality evidence. One problem is that manufacturers can control the availability of data. For example, in a meta-analysis of three trials of a drug for the treatment of rheumatoid arthritis, one trial was almost three times the size of the other two, but the manufacturer refused to release data related to the outcome variable of interest, stating that the trial was a safety study and so its main outcome was adverse events. The appraisal group were a bit suspicious of this, and so imputed a poor outcome for the drug in this trial. When the manufacturers read the report they were incandescent! However, they still didn’t release the data, and so the analysis stood.
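Why the imputed result for the large trial mattered so much can be seen from how a fixed-effect, inverse-variance meta-analysis weights trials: each is weighted by 1 / variance, and the variance of an estimate shrinks roughly in proportion to sample size. The sketch below uses invented log risk ratios (negative favours the drug), not the actual trial data, which the article does not give; it assumes the large trial is about three times the size of each of the others, so its variance is about one third as large. The function name `fixed_effect_pool` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fixed_effect_pool(estimates: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Inverse-variance fixed-effect pooled estimate, its standard error,
    and each study's share of the total weight."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    shares = [w / total for w in weights]
    return pooled, se, shares


# Hypothetical log risk ratios for the two small trials.
small_trials = [-0.4, -0.3]
small_vars = [0.09, 0.09]
pooled_small, _, _ = fixed_effect_pool(small_trials, small_vars)

# Add the large trial with an imputed poor outcome (log RR = +0.1).
pooled_all, se_all, shares = fixed_effect_pool(small_trials + [0.1], small_vars + [0.03])

print(f"two small trials only: log RR {pooled_small:.2f}")
print(f"with imputed large trial: log RR {pooled_all:.2f}, "
      f"large trial's weight {shares[-1]:.0%}")
```

In this made-up example the large trial carries 60% of the weight, so the imputed result pulls the pooled log risk ratio from −0.35 to −0.08. A manufacturer withholding such a trial therefore has a strong incentive to release the data.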

Conclusions

It is very exciting, and a real privilege, to see statistics used not in a dry academic paper but to help make decisions that affect people directly. It is a considerable responsibility to be one of the few committee members who can judge whether the statistics presented are valid. It can be frustrating, because there are often huge uncertainties about the true results, and in an “academic” environment one would require more evidence before coming to a decision. However, waiting for that evidence may deprive people of potentially beneficial therapies, and so one has to do the best one can under the circumstances. NICE is not perfect, but it is well regarded throughout the world, and many other countries either follow NICE advice or have set up similar institutions. I am proud to be associated with it.

Disclaimer

The opinions expressed in this article are my own and do not necessarily reflect those of NICE.

References

1. Willan, A.R. and Briggs, A.H. (2006) Statistical Analysis of Cost-Effectiveness Data. Chichester: Wiley.

2. Connock, M., Round, J., Bayliss, S., Tubeuf, S., Greenheld, W. and Moore, D. (2009) Sorafenib for advanced hepatocellular carcinoma. West Midlands Health Technology Assessment Evidence Review Group Report for the National Institute for Health and Clinical Excellence.

3. Walker, S., Palmer, S. and Sculpher, M. (2007) The role of NICE technology appraisal in NHS rationing. British Medical Bulletin, 81–82, 51–64.

4. ISIS-2 (Second International Study of Infarct Survival) Collaborative Group (1988) Randomised trial of intravenous streptokinase, oral aspirin, both or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. Lancet, ii(8607), 349–360.

5. Meads, C., Salas, C., Roberts, T., Moore, D., Fry-Smith, A. and Hyde, C. (2002) Clinical effectiveness and cost utility of photodynamic therapy for wet age-related macular degeneration. West Midlands Health Technology Assessment Group Report for National Institute for Health and Clinical Excellence.

Mike Campbell works at the School of Health and Related Research at the University of Sheffield. His prime research interests are in the design and analysis of studies of complex interactions.

Table 1. Results for photodynamic therapy study subgroups – outcome loss of visual acuity of more than 15 letters over 2 years

Subgroup (study TAP2)      Verteporfin n/N   Placebo n/N   RR (95% CI, fixed)
Predominantly classic      65/159            57/83         0.60 [0.47, 0.75]
Minimally classic          106/202           58/104        0.94 [0.76, 1.17]
Occult only                18/41             14/20         0.63 [0.40, 0.98]

RR below 1 favours verteporfin; RR above 1 favours placebo.
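The relative risks and intervals in Table 1 can be recomputed from the counts. This sketch assumes the standard large-sample standard error of the log risk ratio, √(1/a − 1/n₁ + 1/c − 1/n₂), with a normal-approximation 95% interval on the log scale; with that assumption the results agree with Table 1 to two decimal places. The function name `risk_ratio_ci` is hypothetical.

```python
import math
from typing import Tuple


def risk_ratio_ci(a: int, n1: int, c: int, n2: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Risk ratio of group 1 (a events out of n1) versus group 2 (c out of n2),
    with a Wald confidence interval computed on the log scale."""
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Counts from Table 1: (verteporfin events, N, placebo events, N).
table1 = {
    "Predominantly classic": (65, 159, 57, 83),
    "Minimally classic": (106, 202, 58, 104),
    "Occult only": (18, 41, 14, 20),
}
for name, counts in table1.items():
    rr, lower, upper = risk_ratio_ci(*counts)
    print(f"{name:22s} RR {rr:.2f} [{lower:.2f}, {upper:.2f}]")
```

Recomputing the intervals this way makes it easy to check that the “minimally classic” interval includes 1 while the other two do not, which is the pattern the committee had to interpret.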