The value and hazards of standardization in clinical epidemiologic research


J Clin Epidemiol Vol. 41, No. 11, pp. 1125-1134, 1988. Printed in Great Britain. All rights reserved

0895-4356/88 $3.00 + 0.00 Copyright © 1988 Pergamon Press plc

THE VALUE AND HAZARDS OF STANDARDIZATION IN CLINICAL EPIDEMIOLOGIC RESEARCH

CHARLES K. CHAN,* ALVAN R. FEINSTEIN,† JAMES F. JEKEL‡ and CAROLYN K. WELLS§

Departments of Medicine and Epidemiology and the Robert Wood Johnson Clinical Scholar Program, Yale University School of Medicine, New Haven, Connecticut, U.S.A.

(Received in revised form 22 March 1988)

Abstract-The statistical standardization of rates produces a single summary value that converts crude rates of occurrence into “standardized” rates that are adjusted for differences in the composition of compared populations.

Although the process is well described in the epidemiologic literature and is regularly applied in comparisons of large populations, many investigators are not familiar with three important hazards that are magnified for the smaller groups studied in clinical epidemiologic research. This report contains a new “symmetrical” outline of the direct and indirect standardization processes, and an illustration of three pragmatic hazards:

(1) Because the direct standardizing factor uses the observed stratum-specific rates, and because any stratum-specific rates that depend on small denominators may be misleading or unstable, the indirect method is preferred when the observed strata have small denominators.

(2) Both the direct and indirect standardizing methods are highly vulnerable both to the choice of reference population and to the boundaries chosen when strata are demarcated or consolidated. The standardized rates can be altered dramatically according to differences in the stratum proportions of the reference population, or to distinctions produced when standardizing strata are consolidated.

(3) If the stratum-specific rates and stratum proportions have different patterns of variation across the strata of the compared groups, the use of a single summary value, no matter what method of standardization is applied, may obscure cogent patterns of variation and significant differences in the stratum-specific rates.

These hazards can be overcome if the studied group and the reference population are carefully compared for inconsistent variations in the stratum-specific rates and proportions before any standardizing procedure is applied. In many instances, the best approach may be to compare the unaltered stratum-specific rates, without standardization.

Standardization Hazards Clinical epidemiologic research

*Fellow of the Medical Research Council of Canada and Visiting Scholar of the Robert Wood Johnson Clinical Scholar Program, Yale University School of Medicine, New Haven, Connecticut, U.S.A.

†Professor of Medicine and Epidemiology and Director, Clinical Epidemiology Unit, Yale University School of Medicine, New Haven, Connecticut.

‡C. E. A. Winslow Professor of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut.

§Senior Biostatistician, Cooperative Studies Program Coordinating Center, Veterans Administration Medical Center, West Haven, Connecticut.

Reprint requests should be addressed to: Alvan R. Feinstein, M.D., Yale University School of Medicine, Room I-456 SHM, P.O. Box 3333, New Haven, CT 06510, U.S.A.

Supported in part by grants from the Andrew W. Mellon Foundation and The Council for Tobacco Research-U.S.A., Inc. as a Special Project.

INTRODUCTION

The main purpose of adjustment or “standardization” in epidemiologic and vital statistics [1-4] is to allow fair comparisons of the rates of events in populations that have different compositions in age, sex, or other pertinent characteristics. The differences may arise when rates are compared for different clinical settings or geographic regions, or for the same region at different eras in secular time.

A simple way of getting fair comparisons is to divide the compared populations into cogent subgroups or strata, and then to compare the stratum-specific rates of events directly in corresponding strata [5]. If the compared populations contain many strata, however, the contrast of multiple sets of stratum-specific rates becomes unappealing. To avoid these multiple comparisons, investigators usually summarize the individual stratum-specific rates into a single overall measure of the occurrence of events [5-12].

In the customary situation, a total number of events, $t$, is observed in a study group having a total of $n$ members. The crude rate of occurrence for the events is $r_o = t/n$. The corresponding rate of occurrence in each constituent stratum of the observed group, i.e. the stratum-specific rate, will be $r_i = t_i/n_i$. The proportion of the total group in each stratum will be $w_i = n_i/n$, where $n = \sum n_i$. The stratum-specific rates, $r_i$, are multiplied by the stratum proportions, $w_i$, to form a component product, $r_i w_i$, for each stratum. Some relatively simple algebra will show that the crude rate is the sum of these component products, i.e. $r_o = \sum r_i w_i$, because each product $r_i w_i = (t_i/n_i)(n_i/n) = t_i/n$ and $\sum t_i = t$.
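The identity between the crude rate and the sum of the component products can be checked numerically. The sketch below is only an illustration: it uses the 1953-62 counts for men that are listed in Table 2, and the variable names are introduced here rather than taken from the paper.

```python
# Illustrative check that the crude rate t/n equals the sum of r_i * w_i.
# Counts are the 1953-62 entries for men in Table 2 (age strata <20 to >=80);
# the variable names are assumptions made for this sketch.
cases = [0, 0, 1, 0, 1, 5, 6, 3]                     # t_i: events per stratum
members = [732, 79, 133, 322, 572, 718, 691, 377]    # n_i: stratum sizes

t = sum(cases)
n = sum(members)

rates = [t_i / n_i for t_i, n_i in zip(cases, members)]   # r_i = t_i / n_i
props = [n_i / n for n_i in members]                       # w_i = n_i / n

crude = t / n                                              # r_o = t / n
crude_from_products = sum(r_i * w_i for r_i, w_i in zip(rates, props))

print(f"t = {t}, n = {n}")
print(f"t/n          = {crude:.6f}")
print(f"sum(r_i*w_i) = {crude_from_products:.6f}")
```

The two printed rates agree up to floating-point error, which is the algebraic identity stated in the text.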

Each group can be simply summarized with its crude rate, but if two compared groups with similar stratum-specific rates ($r_i$) have a different composition of their stratum proportions ($w_i$), the crude rates may distort the important similarities in the stratum-specific rates. When they are multiplied by disparities in the corresponding $w_i$ values, the final result may be dramatically altered. To avoid the problem caused by these compositional differences, the crude rates can be “adjusted” by one of two general methods: direct or indirect standardization [1-12].

We shall first outline these two methods in a relatively novel approach that shows their general “symmetry”. We shall then describe some striking surprises and inconsistencies when either method was applied to data of a clinical epidemiologic investigation. Although these difficulties are reasonably well known to demographers and have been regularly discussed in biostatistical or epidemiologic literature [6, 11, 13], the magnitude of the problems may not be appreciated when the standardization process is applied to data from clinical groups that are relatively small. Our main purpose in this report is to call these problems to the attention of clinical investigators. The solutions we shall propose are not particularly novel, but many investigators are not aware of either the problems or the solutions.

THE STANDARDIZATION PROCEDURE

For either a direct or indirect standardization, a standard reference population is selected. It is often chosen “externally” from a particular geographic region or nation, but in other situations, the reference population is constructed “internally” as the sum of the observed study groups. The reference population will have $R_i$ as its stratum-specific rates, $W_i$ as stratum proportions, $R_i W_i$ as stratum component products, and crude rate $R_o = \sum R_i W_i$. In these symbols for the reference population, each upper-case letter has the same meaning as the corresponding lower-case letter in the observed group described earlier. The correspondence of symbols is shown in Table 1.

The direct and indirect forms of standardization are often confusing because the two processes seem asymmetrical. The direct process appears straightforward, being done in one algebraic step, whereas the indirect process seems to require several different steps and concepts. The symmetry of the two processes becomes more evident, however, if the algebraic structures are rearranged so that the observed crude rate in the study group ($\sum r_i w_i$) is multiplied by a standardizing factor, SF, to become the standardized rate.

In both the direct and indirect adjustments, the standardizing factor depends on differences in the $W_i$ and $w_i$ values for proportional composition of the reference and study populations. In the direct method, these $W_i$ and $w_i$ values are multiplied by the stratum-specific rates, $r_i$, observed in the study group. Thus, the standardizing factor for the direct adjustment becomes

$$SF_D = \frac{\sum r_i W_i}{\sum r_i w_i}.$$

The direct standardized rate becomes

$$r_{DS} = \sum r_i w_i \times SF_D = \sum r_i w_i \times \frac{\sum r_i W_i}{\sum r_i w_i}.$$

Table 1. Symbols and formulas for summary indexes

                          Study group                Reference population
Stratum-specific rates    $r_i$                      $R_i$
Stratum proportions       $w_i$                      $W_i$
Crude rate                $r_o = \sum r_i w_i$       $R_o = \sum R_i W_i$
Expected rate             $r_e = \sum R_i w_i$       $R_e = \sum r_i W_i$

With cancellation of the $\sum r_i w_i$ term in numerator and denominator, this expression becomes $\sum r_i W_i$. Because the latter expression is usually cited immediately, the direct standardized rate seems to have a single algebraic step, without a “standardizing factor”.

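In the indirect adjustment, the $W_i$ and $w_i$ proportions are multiplied by the stratum-specific rates, $R_i$, observed in the reference population. Thus,

$$SF_I = \frac{\sum R_i W_i}{\sum R_i w_i}.$$

The indirect standardized rate is

$$r_{IS} = \sum r_i w_i \times SF_I = \sum r_i w_i \times \frac{\sum R_i W_i}{\sum R_i w_i}.$$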

In both types of adjustment, the numerators of the standardizing factor contain the $W_i$ proportions for the reference population; the denominators contain the study group's $w_i$ proportions; and the stratum-specific rates of the standardizing factor have the same source in both numerators and denominators. The source of the stratum-specific rates differs, however, in the two methods. The direct method uses the $r_i$ values of the study group; the indirect method uses the $R_i$ values of the reference group.

The expression for the indirect standardized rate is regularly rearranged to form a ratio of the crude and expected rates in the study group. This term ($[\sum r_i w_i / \sum R_i w_i] = [r_o/r_e]$) forms a standard occurrence ratio, which is called a Standard Incidence Ratio (SIR) for incidence data. For mortality rates, the term is called a Standard Mortality Ratio (SMR). This ratio is multiplied by the crude rate in the reference population to form the indirect standardized rate. The SIR (or SMR) is used in many epidemiologic studies because it is analogous to an estimate of relative risk.
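The symmetry of the two standardizing factors can be sketched in a few lines of code. The function below is a minimal illustration, not the authors' procedure: the function names, the dictionary keys, and the two-stratum input values are all hypothetical and were chosen only to exercise the formulas.

```python
# Symmetrical form of direct and indirect standardization.
# All numbers below are hypothetical and chosen only to exercise the formulas.

def weighted_sum(rates, props):
    """Sum of rate * proportion over strata."""
    return sum(r * p for r, p in zip(rates, props))

def standardize(r, w, R, W):
    """Return crude, direct, and indirect summaries for one study group.

    r, w: stratum-specific rates and proportions of the study group
    R, W: stratum-specific rates and proportions of the reference population
    """
    r_o = weighted_sum(r, w)    # crude rate of the study group
    R_o = weighted_sum(R, W)    # crude rate of the reference population
    r_e = weighted_sum(R, w)    # expected rate in the study group
    R_e = weighted_sum(r, W)    # expected rate in the reference population

    sf_direct = R_e / r_o       # direct standardizing factor
    sf_indirect = R_o / r_e     # indirect standardizing factor
    return {
        "crude": r_o,
        "direct": r_o * sf_direct,
        "indirect": r_o * sf_indirect,
        "sir": r_o / r_e,
        "reference_crude": R_o,
        "reference_expected": R_e,
    }

# Hypothetical two-stratum example (younger stratum, older stratum).
result = standardize(r=[0.001, 0.010], w=[0.2, 0.8],
                     R=[0.0005, 0.004], W=[0.7, 0.3])
for name, value in result.items():
    print(f"{name:>18}: {value:.6f}")
```

In this sketch the direct rate collapses to the sum of the study rates weighted by the reference proportions, and the indirect rate equals the SIR multiplied by the reference crude rate, as described in the text.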

When a group's stratum proportions have been obtained, their multiplication by some other stratum-specific rates produces an expected rate for that group. Thus, the expected rates are $R_e = \sum r_i W_i$ for the reference population and $r_e = \sum R_i w_i$ for the study group. With this concept, the adjustment procedure can be described in words and symbols as follows:

Direct Standardization:

$$\text{Crude rate in study group} \times \frac{\text{Expected rate in reference population}}{\text{Crude rate in study group}} = r_o \times \frac{R_e}{r_o}$$

Indirect Standardization:

$$\text{Crude rate in study group} \times \frac{\text{Crude rate in reference population}}{\text{Expected rate in study group}} = r_o \times \frac{R_o}{r_e}$$

ILLUSTRATION OF PROBLEMS

We met the pragmatic problems associated with standardization when we wanted an age adjustment for each sex of the occurrence rates of lung cancer that had been clinically unrecognized during life and that were found as a surprise at necropsy [14]. During a 30-year period from 1953 to 1982, a total of 15,896

Table 2. Occurrence of lung cancers in men, Y-NHH necropsy “surprises” and CSTR reported cases (1953-82)

              Era 53-62      Era 63-72      Era 73-82      Reference population
Age group      S      N       S      N       S      N      No. lung cancer cases in men    Male population
<20            0    732       0    776       0    440                  7                     14,804,541
20-29          0     79       0     95       0     60                 29                      5,766,492
30-39          1    133       0    125       0     60                281                      5,468,118
40-49          0    322       0    220       0    134               1766                      5,112,010
50-59          1    572       2    539       3    256               5730                      4,513,103
60-69          5    718       2    598       5    430               9079                      3,117,946
70-79          6    691      10    547       4    283               6709                      1,596,329
≥80            3    377       5    331       2    162               1859                        645,306
Total         16   3624      19   3231      14   1825             25,460                     41,023,845

Crude rates (× 10^-5)   442          588          767                                             62

Y-NHH: Yale-New Haven Hospital. CSTR: Connecticut State Tumor Registry, Hartford, Connecticut. S: “surprise” lung cancer. N: necropsy.

Table 3. Direct standardization for “surprise” lung cancer in men (1953-82) (Y-NHH and CSTR)

                  Stratum occurrence rates ($r_i$) (× 10^-5)    Stratum proportions ($W_i$)
Age group           53-62      63-72      73-82                  (reference population)
<20                    0          0          0                    0.361
20-29                  0          0          0                    0.141
30-39                752          0          0                    0.133
40-49                  0          0          0                    0.125
50-59                175        371       1172                    0.110
60-69                696        334       1163                    0.076
70-79                868       1828       1413                    0.039
≥80                  796       1511       1234                    0.016

$r_{DS}$ (× 10^-5)   219        162        292

Y-NHH: Yale-New Haven Hospital. CSTR: Connecticut State Tumor Registry, Hartford, Connecticut. $r_{DS}$: Direct standardized rates.

necropsies had been performed at Yale-New Haven Hospital (Y-NHH). For 8680 of the necropsies in men and 6767 in women, a diagnosis of lung cancer had not been previously identified or suspected during life. In 49 of these men and 19 of the women, lung cancer was found as a “surprise” at necropsy. The results to be discussed here are shown only for the men. (Analogous results and problems were noted for women.)

The secular trends in these “surprise” occurrences for three eras, 1953-62, 1963-72, and 1973-82, can be constructed from Table 2, which shows the number of necropsies and the age stratum-specific occurrences for the “surprise” detection of unexpected lung cancer in men for each secular era. Table 3 shows the inconsistent secular trends in the stratum-specific occurrence rates. They rose monotonically in the age groups 50-59 and ≥80. They showed a fall-then-rise pattern in the 60-69 group and a reverse rise-then-fall in the 70-79 group. They had zero values after the first era in the 30-39 age group.

We wanted to compare the necropsy surprise rates for the three secular periods with one another and with the customarily reported occurrence rates for lung cancer. We have elsewhere [14] offered a justification for the epidemiologic strategy of comparing necropsy “surprise” rates with occurrence rates reported during life. Regardless of whether readers agree with this strategy, its propriety is not pertinent for the discussion here, which is concerned solely with numerical problems that arise in the two types of standardization. These problems will occur with analogous forms of data, regardless of their source.

Table 4. Indirect standardization for “surprise” lung cancer in men (1953-82) (Y-NHH and CSTR)

                  Stratum proportions ($w_i$)       Stratum expected rates ($R_i$)* (× 10^-5)
Age group           53-62     63-72     73-82       (reference population)
<20                 0.202     0.240     0.241          0.047
20-29               0.022     0.029     0.033          0.503
30-39               0.037     0.039     0.033          5.139
40-49               0.089     0.068     0.073         34.546
50-59               0.158     0.167     0.140        126.964
60-69               0.198     0.185     0.236        291.185
70-79               0.190     0.169     0.155        420.277
≥80                 0.104     0.102     0.089        288.080

SIR                 2.3       3.3       4.3
$r_{IS}$ (× 10^-5)  143       205       264

*The stratum expected rates were based on the reported lung cancer rates for the period 1953-82 in the State of Connecticut, as provided by the CSTR.

Y-NHH: Yale-New Haven Hospital. CSTR: Connecticut State Tumor Registry, Hartford, Connecticut. $r_{IS}$: Indirect standardized rate. SIR: Standardized incidence ratio.

Table 5. Direct standardization for “surprise” lung cancer in men (1953-82)

                  Stratum occurrence rate ($r_i$) (× 10^-5)     Stratum proportions ($W_i$)
Age group           53-62      63-72      73-82                  (reference population)

Four-Strata Method
<60                  109        114        316                    0.869
60-69                696        334       1163                    0.076
70-79                868       1828       1413                    0.039
≥80                  796       1511       1235                    0.016
$r_{DS}$ (× 10^-5)   194        219        437

Two-Strata Method
<80                  400        483        722                    0.984
≥80                  796       1511       1235                    0.016
$r_{DS}$ (× 10^-5)   406        499        730

$r_{DS}$: Direct standardized rates.

The reference population we selected for men was the data used by the Connecticut State Tumor Registry (CSTR) to calculate male stratum-specific rates of lung cancer for the entire 30-year period. These results, which contain the pooled cases of lung cancers reported to the CSTR and the CSTR's pooled estimated population of men for the State of Connecticut during the study period, 1953-82 [15], are shown on the far right side of Table 2.

Crude rates

The crude rates shown at the bottom of Table 2 are simply the total number of cases of lung cancer found in each group, divided by the total number of people (appropriate necropsies or reference population) in the group. Thus, for men in the period 1953-62, the crude rate of necropsy surprise cases is 16/3624 = 442 cases per 100,000 necropsies.

The necropsy crude rates for surprise cases of lung cancer show a steady monotonic rise from 442 to 767 per 100,000 in the three eras. The corresponding reference crude rate of detected lung cancer in the CSTR was $62 \times 10^{-5}$.

Direct standardized rates

Table 3 shows the components, calculated from the data of Table 2, that are needed for direct standardization, using the formula $\sum r_i W_i$. The $r_i$ values are obtained for each stratum in Table 3 by dividing the number of “surprise” cases (S) by the number of necropsies (N) in the corresponding age stratum of Table 2. The $W_i$ values are calculated by dividing the reference population in each age group by the total reference population.

The direct standardized rates were calculated by the $\sum r_i W_i$ process and are shown in the last row of Table 3. Note that these rates are less than half the size of the corresponding crude rates in Table 2, and that the monotonically rising secular trend in the crude rates has been altered to show a fall-then-rise pattern in the direct standardized rates.
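The direct standardization just described can be sketched as follows. This is an illustrative reconstruction rather than the authors' own computation: the counts are the S and N entries of Table 2, the reference proportions are entered as the three-decimal values printed in Table 3, and the function and variable names are assumptions made for the example.

```python
# Direct standardization, sum of r_i * W_i, for the three eras.
# S and N are the Table 2 counts for men; W holds the reference proportions
# as printed (three decimals) in Table 3. Names are assumptions of this sketch.
surprise_cases = {
    "1953-62": [0, 0, 1, 0, 1, 5, 6, 3],
    "1963-72": [0, 0, 0, 0, 2, 2, 10, 5],
    "1973-82": [0, 0, 0, 0, 3, 5, 4, 2],
}
necropsies = {
    "1953-62": [732, 79, 133, 322, 572, 718, 691, 377],
    "1963-72": [776, 95, 125, 220, 539, 598, 547, 331],
    "1973-82": [440, 60, 60, 134, 256, 430, 283, 162],
}
W = [0.361, 0.141, 0.133, 0.125, 0.110, 0.076, 0.039, 0.016]

def direct_rate(cases, sizes, ref_props):
    """Return sum(r_i * W_i), with r_i = cases_i / sizes_i."""
    return sum((s / n) * w for s, n, w in zip(cases, sizes, ref_props))

# Rates are scaled to cases per 100,000.
direct = {era: direct_rate(surprise_cases[era], necropsies[era], W) * 1e5
          for era in surprise_cases}
crude = {era: sum(surprise_cases[era]) / sum(necropsies[era]) * 1e5
         for era in surprise_cases}

for era in surprise_cases:
    print(f"{era}: crude {crude[era]:.0f}, direct {direct[era]:.0f} per 100,000")
```

With these inputs the sketch shows the crude rates rising across the eras while the direct standardized rates fall and then rise.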

Table 6. A simplified example of mortality rates in metropolitan and nonmetropolitan counties

                     Metropolitan                  Nonmetropolitan               Reference
                     Stratum death   Stratum       Stratum death   Stratum       Stratum death   Stratum
                     rate ($r_i$)    proportion    rate ($r_i$)    proportion    rate ($R_i$)    proportion
Age group            (× 10^-5)       ($w_i$)       (× 10^-5)       ($w_i$)       (× 10^-5)       ($W_i$)
<20                    85            0.30           100            0.25           100            0.30
20-44                 128            0.20           140            0.25           120            0.30
45-79                 140            0.25           120            0.20           130            0.20
≥80                   160            0.25           140            0.30           140            0.20

$r$ (× 10^-5)         126                           126
$r_{DS}$ (× 10^-5)    124                           124
$r_{IS}$ (× 10^-5)    124                           123

$r$: Crude rates. $r_{DS}$: Direct standardized rates. $r_{IS}$: Indirect standardized rates.

Indirect standardized rates

Table 4 shows the components needed for the indirect standardization process. The wi values of the necropsy stratum proportions and the Ri values of the corresponding stratum-specific rates of the reference population were calculated from the appropriate data in Table 2.

To illustrate the subsequent statistical process, the expected occurrence rate, $\sum R_i w_i$, for the period 1953-1962 was calculated as (0.202)(0.047) + (0.022)(0.503) + (0.037)(5.139) + (0.089)(34.546) + (0.158)(126.964) + (0.198)(291.185) + (0.190)(420.277) + (0.104)(288.080) = 191 per 100,000 necropsies. This result can be used in two ways. Since the crude occurrence rate for the standard population ($\sum R_i W_i$) was $62 \times 10^{-5}$ in Table 2, the indirect standardizing factor will be $62 \times 10^{-5} / 191 \times 10^{-5} = 0.3246$. This $SF_I$ can then multiply the crude rate of the study population, which was $442 \times 10^{-5}$ in Table 2, to form the indirect standardized rate of $143 \times 10^{-5}$. Alternatively, the SIR for the observed and expected study group rates can be calculated as $442 \times 10^{-5} / 191 \times 10^{-5}$, which is 2.3. When this value is multiplied by the crude occurrence rate for the reference population, the same indirect standardized rate is calculated as $(2.3)(62 \times 10^{-5}) = 143$ per 100,000 necropsies.
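The same arithmetic can be repeated for all three eras. The following sketch is illustrative only: it takes the stratum proportions and reference rates as printed in Table 4, the reference crude rate of 62 per 100,000 from Table 2, and crude rates computed from the Table 2 totals; the variable names are assumptions made for the example.

```python
# Indirect standardization for the three eras, using values as printed in Table 4.
# R holds reference stratum-specific rates per 100,000; R_o is the reference
# crude rate per 100,000 from Table 2. Names are assumptions of this sketch.
R = [0.047, 0.503, 5.139, 34.546, 126.964, 291.185, 420.277, 288.080]
R_o = 62.0
w = {
    "1953-62": [0.202, 0.022, 0.037, 0.089, 0.158, 0.198, 0.190, 0.104],
    "1963-72": [0.240, 0.029, 0.039, 0.068, 0.167, 0.185, 0.169, 0.102],
    "1973-82": [0.241, 0.033, 0.033, 0.073, 0.140, 0.236, 0.155, 0.089],
}
# Crude rates per 100,000, computed from the Table 2 totals (cases / necropsies).
crude = {
    "1953-62": 16 / 3624 * 1e5,
    "1963-72": 19 / 3231 * 1e5,
    "1973-82": 14 / 1825 * 1e5,
}

summary = {}
for era, props in w.items():
    expected = sum(R_i * w_i for R_i, w_i in zip(R, props))   # sum of R_i * w_i
    sf_indirect = R_o / expected                               # indirect standardizing factor
    sir = crude[era] / expected                                # observed / expected
    summary[era] = {
        "expected": expected,
        "sir": sir,
        "indirect": crude[era] * sf_indirect,                  # same as sir * R_o
    }
    print(f"{era}: expected {expected:.0f}, SIR {sir:.1f}, "
          f"indirect rate {summary[era]['indirect']:.0f} per 100,000")
```

Both routes described in the text, multiplying the crude rate by the standardizing factor or multiplying the SIR by the reference crude rate, are algebraically the same quantity, which is why the sketch stores only one indirect rate per era.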

As shown in the bottom row of Table 4, the indirect standardized rates for the necropsy groups are lower than the direct standardized rates, but the monotonic rise in secular trend has been retained. The two types of standardization thus show substantial differences in the individual magnitudes and secular trend of the results.

Comparison of standardized results

The differences in composition of the compared groups are evident in the stratum proportions for the reference population ($W_i$) in Table 3 and for the observed study groups ($w_i$) in Table 4. The necropsied groups (as might be expected) were substantially older than the general reference population. The distinction is particularly well shown by inspecting the sum of proportions for strata above age 50 in Tables 3 and 4. This sum is 0.241 in the reference population, and 0.650, 0.623, and 0.620 for the three successive eras in the necropsy groups.

Tables 2, 3, and 4 also show a prime source of difficulty in the standardizing procedure. In many of the strata of the younger age (<50 years) groups under study, no surprise cases were found. These groups thus had stratum-specific occurrences of 0 in Table 2, leading to corresponding stratum-specific rates of 0. Consequently, these strata make no contribution to the direct standardized rates, which may thus become misleading.

Because of these distinctions, none of the observed or standardized results seems satisfactory for both statistical and scientific objectives. Statistically, the four sets of crude rates shown in Table 2 cannot be compared with one another, because of differences in composition of the study groups and reference population; and the direct standardized rates in Table 3 may be misleading, because of null values in the cells of numerators for many of the study strata. The indirect standardized rates in Table 4 appear to be statistically satisfactory, because the observed stratum proportions in the study groups and the reference stratum-specific rates all come from substantial numbers that seem to be reasonably stable. Nevertheless, despite the large number of necropsies under study, the standard error estimates for these indirect standardized rates are large enough to make the 95% confidence intervals overlap one another [16].

In addition, the indirect standardization procedure does not seem scientifically satisfactory. It depends on the reference stratum-specific rates of lung cancer diagnosed and reported during life in a general population, but our research was concerned with surprise lung cancers discovered at necropsy. For this type of necropsy discovery, there are no reference occurrence rates in the general population, since the surprise cancers are discovered only in the population receiving necropsy. The indirect standardization could thus provide a simulated analogy, but not a true standardization.

Alternative comparisons

One approach to this dilemma would be simply to compare the occurrence rates within each set of necropsy strata, without attempting to achieve a single standardized value for the entire group. Thus, the stratum-specific occurrence rates in Table 3 could be compared with one another for each era, and with the standard rates shown in Table 4. In men above age 50, this comparison shows that all of the stratum-specific rates for necropsy-surprise cases (Table 3) were higher than the corresponding standard rates of cases detected during life (Table 4). As noted earlier for the secular trends, the stratum-specific rates in Table 3 show a monotonic rise for the age groups 50-59 and ≥80, a fall-then-rise at age 60-69, and a sharp rise followed by a slight fall at age 70-79. The stratum-specific rates are all substantially higher in the third era (1973-82) than in the first era (1953-62), but the trends vary according to irregularities among the strata in the middle era (1963-72). These comparisons of stratum-specific rates offer the most simple examination of the data, unaffected by any adjustments.

Another alternative would be to collapse some of the age strata of Table 2 so that $r_i$ values of 0 can be avoided. This type of consolidation would give numerical stability to the direct standardization, which is the only scientifically appropriate adjustment that can be applied here (because of the absence of true “surprise” rates from a reference population). Unfortunately, however, the direct standardized rates will also differ substantially according to the pattern of collapsed strata. When the age groups are collapsed into four strata, as shown in the upper half of Table 5, the corresponding direct standardized rates ($r_{DS}$) in the three eras become 194, 219, and 437 per 100,000. When the age groups are collapsed into two strata, <80 and ≥80 years of age, the corresponding $r_{DS}$ become 406, 499, and 730, as shown in the lower half of Table 5. Although the secular trends show the same monotonic rise in both of the “collapsed” patterns, the magnitudes of the standardized rates are substantially different.
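The sensitivity to the pattern of collapsed strata can be sketched by recomputing the direct standardized rate under different groupings. The example below is illustrative: the counts and reference populations are taken from Table 2, while the `collapse` helper, the grouping lists, and the other names are assumptions made for this sketch. Because it works from unrounded proportions, its results may differ from the printed Table 5 values by about one unit.

```python
# Effect of collapsing age strata on the direct standardized rate.
# Counts and reference populations are taken from Table 2; the grouping
# helper and all names are assumptions made for this sketch.
cases = {
    "1953-62": [0, 0, 1, 0, 1, 5, 6, 3],
    "1963-72": [0, 0, 0, 0, 2, 2, 10, 5],
    "1973-82": [0, 0, 0, 0, 3, 5, 4, 2],
}
necropsies = {
    "1953-62": [732, 79, 133, 322, 572, 718, 691, 377],
    "1963-72": [776, 95, 125, 220, 539, 598, 547, 331],
    "1973-82": [440, 60, 60, 134, 256, 430, 283, 162],
}
reference_population = [14804541, 5766492, 5468118, 5112010,
                        4513103, 3117946, 1596329, 645306]

def collapse(values, groups):
    """Add up the entries of `values` inside each group of stratum indexes."""
    return [sum(values[i] for i in group) for group in groups]

def direct_rate(s, n, ref_pop, groups):
    """Direct standardized rate per 100,000 after collapsing strata."""
    s_c, n_c, pop_c = (collapse(v, groups) for v in (s, n, ref_pop))
    total = sum(pop_c)
    return sum((s_i / n_i) * (p_i / total)
               for s_i, n_i, p_i in zip(s_c, n_c, pop_c)) * 1e5

groupings = {
    "four strata": [range(0, 5), [5], [6], [7]],   # <60, 60-69, 70-79, >=80
    "two strata": [range(0, 7), [7]],              # <80, >=80
}

results = {}
for label, groups in groupings.items():
    results[label] = [direct_rate(cases[era], necropsies[era],
                                  reference_population, groups)
                      for era in cases]
    print(label, [round(x) for x in results[label]])
```

Both groupings remove the zero-rate strata and give a monotonic rise across the eras, yet the two sets of standardized rates differ by roughly a factor of two.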

The results of these different processes therefore show that the various types of standardization produce conflicting magnitudes and sometimes conflicting trends in the standardized values.

DISCUSSION

The discussion will deal with sources and possible solutions for all three of the problems just cited.

Problem 1: Standardization for small strata

As shown in Table 2, many of the strata had no cases of “surprise” lung cancer, so that the stratum-specific rates ($r_i$) were 0. These rates may be valid if the true occurrence rates are indeed zero in a stratum, but for relatively small occurrence rates, such as 0.001 or smaller, it would not be surprising to find no cases in strata containing fewer than 1000 members. The zero value for $r_i$ would produce a “tight confidence interval”, because $p = 0$ in the formula $\sqrt{pq/n}$ for the standard error used in the confidence interval, but the estimate would be highly unreliable or at least uncertain.
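A short numerical sketch may make this point concrete. It assumes a simple binomial model for the number of cases in a stratum, which is an assumption of this example and is not stated in the paper; the true rate of 0.001 is the hypothetical value mentioned in the text, and the stratum sizes 133 and 322 are two of the 1953-62 necropsy counts in Table 2.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion, sqrt(p * q / n) with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

def prob_zero_cases(true_rate, n):
    """Chance of seeing no cases among n members under a simple binomial model."""
    return (1 - true_rate) ** n

# With an observed rate of 0 the standard error collapses to 0,
# which gives the deceptively "tight" interval described in the text.
print(standard_error(0.0, 133))

# A hypothetical true rate of 0.001 would still often yield zero observed cases.
for n in (133, 322, 1000):
    print(n, round(prob_zero_cases(0.001, n), 3))
```

Under these assumptions, a stratum of about 130 members would show no cases most of the time even when the true rate is not zero, so an observed rate of 0 says little about the underlying rate.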

The impact of these “uncertain” zero values of $r_i$ is particularly prominent in the direct standardization process. Because the $r_i$ values are directly multiplied by the $W_i$ values in the reference population, and because the $r_i$ values were 0 in strata with the largest values of $W_i$, the direct standardization of our data led to a fall in the standardized rate for 1963-72, although the rates were actually rising in three of the four strata where cases were found.
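The stratum-by-stratum contributions to the direct standardized rate can be listed to see where the fall comes from. This is an illustrative sketch, with counts from Table 2, reference proportions as printed in Table 3, and variable names that are assumptions of the example.

```python
# Stratum-by-stratum contributions r_i * W_i to the direct standardized rate
# for the first two eras. Counts are from Table 2; W holds the reference
# proportions as printed in Table 3. Names are assumptions of this sketch.
ages = ["<20", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", ">=80"]
W = [0.361, 0.141, 0.133, 0.125, 0.110, 0.076, 0.039, 0.016]
cases = {
    "1953-62": [0, 0, 1, 0, 1, 5, 6, 3],
    "1963-72": [0, 0, 0, 0, 2, 2, 10, 5],
}
necropsies = {
    "1953-62": [732, 79, 133, 322, 572, 718, 691, 377],
    "1963-72": [776, 95, 125, 220, 539, 598, 547, 331],
}

# Each contribution is scaled to cases per 100,000.
contributions = {
    era: [s / n * w * 1e5 for s, n, w in zip(cases[era], necropsies[era], W)]
    for era in cases
}

for i, age in enumerate(ages):
    print(f"{age:>6}: {contributions['1953-62'][i]:7.1f} "
          f"{contributions['1963-72'][i]:7.1f}")

totals = {era: sum(values) for era, values in contributions.items()}
# Index 2 is the 30-39 stratum, which held a single case in 1953-62.
without_30_39 = {era: totals[era] - contributions[era][2] for era in totals}
print({era: round(v, 1) for era, v in totals.items()})
print({era: round(v, 1) for era, v in without_30_39.items()})
```

In this sketch the single case found among 133 necropsies in the 30-39 stratum in 1953-62 contributes about 100 per 100,000 to that era's standardized rate; when that stratum is set aside, the remaining contributions rise from the first era to the second.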

The instability of the direct adjustment process for small strata can be reduced with the indirect method, which does not rely on the small, unstable $r_i$ values of the study groups. Instead, the indirect method uses the larger, more stable stratum-specific rates ($R_i$) of the reference population and the larger, more stable values of $w_i$ of the study groups. The indirect method would have been preferred for our data if a reliable reference source existed for appropriate values of the $R_i$.

The indirect method is obviously desirable if the results of a vast number of strata will be compared only among the groups under study, ~nrl nnt with ca “exterza!” t~fcmanm= nnnnlcxt;nn UllY ll”C ..nCAa u IW~YIVAIVI y”yUAUCa”II. Thus, in the comparison of etiologic or ther- apeutic agents, the study groups can be readily combined to form an “internal” reference pop- ulation. For example, in the Royal College of General Practitioners’ cohort study [17] of the etiologic relationship between types of con- traception and subsequent morbid events, the investigators required 1764 stratified cells to adjust for differences in age, parity, social class, and other background factors. Although the total cohort contained about 47,000 women, the denominators and numerators for rates of mor- bidity with each form of contraception became very small in many of the cells of the 1764 strata. To avoid the unstable values of ri pro- duced by these small numbers, the investigators used an indirect standardization process, in which the entire cohort under study became an “internal” reference population. In the necropsy study under discussion here, the results for the three individual eras could have been combined to produce an internal reference population. -t-h& JV&..a hnw..wc=r althnmoh cnitslhle fnr I,,,U ~,I”A”I, L‘“..“.WI) .+‘.““..(5” 1.e1.---- --- examining the trends in surprise detection, would not have allowed the necropsy sur- prise rates to be compared with the rates of


cancers detected in the conventional reference population.

Regardless of whether the reference population comes from external or internal sources, the data will often contain many strata with small denominators or with zero occurrence rates when risks of exposure are assessed in most clinical research and occupational studies [18]. If a single adjusted value is desired in such studies, the indirect method of standardization will be preferred. The indirect method should be more statistically robust than the direct method, and should yield a numerically more stable adjusted rate in these circumstances.

If the indirect method is scientifically undesirable, as in the necropsy data under discussion, adjacent strata can be collapsed to increase their sizes and to make the ri values more stable for use in a direct standardization. As shown in Table 5 and in the next section, however, the collapsing process can produce a new set of problems.

Problem 2: Selection of standardizing population

Both the direct and indirect methods of standardization are highly sensitive to the choice of reference population [5, 13, 19]. Distinctively different compositions of the stratum proportions (wi) or different patterns of collapsed strata can drastically alter the crude rates. The different standardized rates might be made to support completely opposite conclusions, particularly if the factitious standardized rates are used for inferential conclusions about hypotheses, rather than for purely descriptive contrasts.

This hazard is evident from a closer look at the formulas used for standardization. In both methods, the crude rate is multiplied by a standardization factor, SF, which contains Wi values in the numerator and wi values in the denominator. Because the values of wi are fixed, and because the same set of stratum-specific rates is used in both factors (ri for the direct, and Ri for the indirect adjustments), the Wi composition of the selected standard population can substantially alter the values of the SF, and thereby the standardized rates.

This process was illustrated in our lung cancer data when the strata were collapsed in Table 5 to alter the values of Wi (as well as wi). In the more conventional data of vital statistics for mortality rates, Yerushalmy [5] has given several examples of the dramatic changes produced when the same set of mortality rates is adjusted with standardizing populations having different age compositions.

This distinction is sometimes ignored when secular trends are reviewed for mortality rates over a period of decades. If the investigators indicate that the rates are “standardized”, but do not state that the same standardizing population was used for all of the adjustments at each decade, the standardized values may be distorted and not really comparable. Even if the same standardizing population was used for all the adjustments, however, the results might be substantially different had some other standardizing population been selected.

Problem 3: Obscured distinctions in a single summary

If the stratum-specific rates vary in different ways across the strata of the study and reference populations, the cogent patterns of variation may be obscured when the adjusted rates are compared with a single summary index, regardless of what method of standardization is used. The single overall adjusted (direct or indirect) rates may show no difference between the two compared populations, although cogent differences exist in the individual stratum-specific rates. (The results would be analogous to a type II error in interpreting statistical results.)

Kitagawa [20] offered a good illustration of this phenomenon in a study of mortality rates in 1960 among the different counties in the U.S. Table 6, which contains a condensed version of Kitagawa’s example, shows the strata, stratum-specific rates, stratum proportions, overall crude rates, and the direct and indirect standardized rates for a comparison of the mortality rates between a set of the metropolitan and nonmetropolitan counties. The stratum-specific rates show some striking differences. They are distinctly lower for the metropolitan counties in the younger age groups (below age 45) and distinctly higher in the older age groups. The crude rates for the two regions are identical, however, and remain essentially unchanged after either the direct or the indirect standardization procedure.

The obscuring effect of these single summary indexes arises from the composition of strata in the observed groups, and from the distribution of stratum proportions and stratum-specific rates in the reference population.

The foregoing problem can be avoided if standardization is replaced by more than one summary index for each population. For example, the results of Table 6 could be consolidated into two strata, one for people under age 45 and another for those who were ≥ 45. The cogent differences in rates would then be preserved for these two age strata in the metropolitan and nonmetropolitan counties. The data analysts would have to be prepared, however, to compare two sets of summary values rather than one.
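The masking effect can be reproduced with a deliberately simple Python sketch. The two age strata, the proportions, and the rates below are hypothetical values chosen for illustration; they are not the figures of Table 6 or of Kitagawa's data, and the function names are likewise assumptions. The reference population is taken here as the average of the two compared groups.

```python
def weighted_rate(props, rates):
    """Sum of stratum proportion * stratum-specific rate."""
    return sum(p * r for p, r in zip(props, rates))


def direct_rate(study_rates, ref_props):
    """Direct standardization: study rates r_i weighted by reference W_i."""
    return weighted_rate(ref_props, study_rates)


def indirect_rate(study_props, study_rates, ref_props, ref_rates):
    """Indirect standardization: crude rate * (sum W_i R_i / sum w_i R_i)."""
    crude = weighted_rate(study_props, study_rates)
    factor = weighted_rate(ref_props, ref_rates) / weighted_rate(study_props, ref_rates)
    return crude * factor


# Hypothetical strata: [under age 45, age 45 and over].
metro_props, metro_rates = [0.5, 0.5], [0.002, 0.020]
nonmetro_props, nonmetro_rates = [0.5, 0.5], [0.004, 0.018]

# Hypothetical reference population: the average of the two groups.
ref_props, ref_rates = [0.5, 0.5], [0.003, 0.019]

crude_metro = weighted_rate(metro_props, metro_rates)
crude_nonmetro = weighted_rate(nonmetro_props, nonmetro_rates)
direct_metro = direct_rate(metro_rates, ref_props)
direct_nonmetro = direct_rate(nonmetro_rates, ref_props)
indirect_metro = indirect_rate(metro_props, metro_rates, ref_props, ref_rates)
indirect_nonmetro = indirect_rate(nonmetro_props, nonmetro_rates, ref_props, ref_rates)

print("crude:   ", crude_metro, crude_nonmetro)
print("direct:  ", direct_metro, direct_nonmetro)
print("indirect:", indirect_metro, indirect_nonmetro)
print("stratum-specific:", metro_rates, nonmetro_rates)
```

With these made-up values the crude, direct, and indirect rates agree between the two groups, although the younger stratum has a rate twice as high in the nonmetropolitan group and the older stratum has a higher rate in the metropolitan group. Only the two stratum-specific comparisons reveal the difference.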

INDICATIONS FOR STANDARDIZATION

Although the use of standardized rates has often been criticized [5-12, 20], and although the most fundamental and most accurate comparisons consist of a direct contrast of stable stratum-specific rates, standardization may be appropriate and necessary in at least three situations. In one situation, as in the first example cited here, the stratum-specific numbers may be too small to be regarded as stable. An indirect standardized rate, if appropriate data are available, may be a reasonable compromise for this type of problem. Second, the stratum-specific rates may not be available in the observed study groups, thus requiring the indirect method of standardization to adjust the various crude rates. Third, the comparison of statistical results for different regions or eras (and sometimes treatments) is much easier with a single summary index than with an array of stratum-specific rates.

Nevertheless, the process has the hazards cited here when stratum-specific rates for ri are unstable (i.e. small rate values with small denominators), when consolidation of strata alters the relative structure of the stratum proportion values for Wi, and when a single summary value obscures important distinctions that can be noted only with direct comparison of rates in individual strata.
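The dependence of a directly standardized rate on the Wi values can be made concrete with a short Python sketch. All names and numbers here are hypothetical and serve only as an illustration; they do not come from Table 5 or from any cited data. The first part applies two different reference populations to the same pair of groups, and the second part collapses four strata into two.

```python
def direct_rate(events, denominators, ref_props):
    """Direct standardization: sum of W_i * r_i, where r_i = events_i / n_i."""
    return sum(W * d / n for d, n, W in zip(events, denominators, ref_props))


def collapse(values, groups):
    """Sum values within each group of stratum indices, e.g. [(0, 1), (2, 3)]."""
    return [sum(values[i] for i in group) for group in groups]


# Part 1: same two groups, two hypothetical reference populations.
n = [1000, 1000]                      # strata: [younger, older]
events_a, events_b = [2, 20], [4, 18]
young_ref, old_ref = [0.8, 0.2], [0.2, 0.8]

a_young = direct_rate(events_a, n, young_ref)
b_young = direct_rate(events_b, n, young_ref)
a_old = direct_rate(events_a, n, old_ref)
b_old = direct_rate(events_b, n, old_ref)
print("young reference:", a_young, b_young)   # group B is higher
print("old reference:  ", a_old, b_old)       # group A is higher

# Part 2: collapsing adjacent strata changes both r_i and W_i.
events4 = [1, 0, 10, 29]
n4 = [5, 10, 200, 300]
ref4 = [0.4, 0.3, 0.2, 0.1]
pairs = [(0, 1), (2, 3)]

rate_four = direct_rate(events4, n4, ref4)
rate_two = direct_rate(collapse(events4, pairs),
                       collapse(n4, pairs),
                       collapse(ref4, pairs))
print("four strata:", rate_four, "two strata:", rate_two)
```

In the first part the ranking of the two groups reverses when the reference population shifts from mostly younger to mostly older people. In the second part the same observed data yield a noticeably lower direct rate once the two small strata are merged.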

A final problem that has been deliberately omitted from this discussion is the use of other-than-demographic variables for the adjustment process. In conventional epidemiologic literature, most rates are standardized according to age, gender, and (sometimes) race. These demographic variables are used because the data are readily available in most groups, and because they have a well known effect in mortality rates of a general community population, whose individual states of health are not under consideration. In the groups of diseased people who are often studied in clinical epidemiology, however, the severity of the disease is a much more cogent prognostic factor [13] than the demographic attributes of age, sex, and race. Any adjusted or unadjusted comparisons will be unsatisfactory if differences in severity of disease are neglected in the compared groups. For comparisons of this type, the “adjustments” are usually done not with standardization from a reference population, but with a multivariable mathematical analysis that includes the appropriate “severity” factors for the observed groups.

Acknowledgements: The authors thank Ms Elizabeth Pesapane for preparing the manuscript, and Mr Paul Sullivan B.A., research analyst of the CSTR, for providing the 1980-82 Connecticut lung cancer data and population estimates.

REFERENCES

1. World Health Statistics Quarterly. Geneva: World Health Organization; 1982: Vol. 35, 22-42.
2. Cancer Incidence in Five Countries. Lyon: International Agency for Research on Cancer; 1976: Vol. 3, 453-584.
3. World Health Statistics Annual. Geneva: World Health Organization; 1979: 22-37.
4. Cancer control objectives for the nation: 1985-2000. Natl Cancer Inst Monogr. Bethesda, Md: National Cancer Institute; 1986.
5. Yerushalmy J. A mortality index for use in place of the age-adjusted death rate. Am J Public Health 1951; 41: 907-922.
6. Fleiss JL. The standardization of rates. In: Fleiss JL, Ed. Statistical Methods for Rates and Proportions. New York: Wiley; 1981: 2nd edn, 237-255.
7. Doll R, Cook P. Summarizing indices for comparison of cancer incidence data. Int J Cancer 1967; 2: 269-279.
8. Woolsey TD. Adjusted death rates and other indices of mortality. In: Linder FE, Grove RD, Eds. Vital Statistics Rates in the United States, 1900-1940. Washington, D.C.: U.S. Government Printing Office; 1943: 60-91.
9. Breslow NE, Day NE. Indirect standardization and multiplicative models for rates, with reference to the age adjustment of cancer incidence and relative frequency data. J Chron Dis 1975; 28: 289-303.
10. Elveback LR. Discussion of “Indices of mortality and tests of their statistical significance”. Hum Biol 1966; 38: 322-324.
11. Kleinbaum DG, Kleinbaum A. Adjusted Rates: The Direct Rate, Undergraduate Mathematics and its Applications. Cambridge, Mass.: Birkhauser, Boston; 1979.
12. Lidell FDK. The measurement of occupational mortality. Br J Indust Med 1960; 17: 228-233.
13. Feinstein AR. Clinical Epidemiology. Philadelphia: W. B. Saunders; 1985.
14. McFarlane MJ, Feinstein AR, Wells CK, Chan CK. The “epidemiologic necropsy”: unexpected detections, demographic selections, and the changing rates of lung cancer. JAMA 1987; 258: 331-338.
15. Heston JF, Kelly JAB, Meigs JW, Flannery JT. Forty-five years of cancer incidence in Connecticut 1935-79. Natl Cancer Inst Monogr 70. Bethesda, Md: National Cancer Institute; 1986. (NIH Publication No. [illegible].)
16. Kahn HA. Adjustment of data without use of multivariate models. In: Kahn HA, Ed. An Introduction to Epidemiologic Methods. New York: Oxford University Press; 1983: 75-78.
17. Royal College of General Practitioners’ oral contraception study. Further analyses of mortality in oral contraceptive users. Lancet 1981; 1: 541-546.
18. Tsai SP, Wen CP. A review of methodological issues of the standardized mortality ratio (SMR) in occupational cohort studies. Int J Epidemiol 1986; 15: 8-21.
19. Wainer H. Minority contributions to the SAT score turnaround: an example of Simpson’s paradox. J Educ Stat 1986; 11: 239-244.
20. Kitagawa EM. Theoretical considerations in the selection of a mortality index, and some empirical comparisons. Hum Biol 1966; 38: 293-308.