The Heritability of Chronometric Variables


Chapter 7

The Heritability of Chronometric Variables

Evidence for a genetic component in chronometric performance assures us that reliable measures of individual differences reflect a biological basis and that variation in the brain mechanisms involved is not solely the result of influences occurring after conception. Any nongenetic effects could be attributable to exogenous biological influences, such as disease or trauma, or nutrition in early development, or to purely experiential effects that transfer to an individual's performance on chronometric tasks, for instance, practice in playing video games.

In addition to the heritability of individual differences in performance on a given chronometric task, it is important to know the nature of the task's correlation with other variables that lend it some degree of ecological validity (i.e., its correlation with 'real life' performances generally deemed important in a given society). Two distinct variables, for example, reaction time (RT) and IQ, could each be highly heritable, and yet the correlation between them could be entirely due to nongenetic factors. RT and IQ, say, could be correlated with each other only indirectly, through each one's correlation with a quite different variable that causally affects both of them, for example, some nutritional factor. On the other hand, two variables could be genetically correlated. A genetic correlation, which is determined by a particular type of genetic analysis, indicates that the variables have certain genetic influences in common, though other specific genetic or environmental factors may also affect each variable independently. In the terminology of behavioral genetics, both genetic and environmental effects may be either common (or shared), in whole or in part, between two or more individuals, or specific (or unshared) to each individual.

There are two types of genetic correlation between traits: (1) simple genetic correlation and (2) pleiotropic correlation.

In a simple genetic correlation, different genetic factors, a and b, for different phenotypic traits, A and B, are correlated in the population because of population heterogeneity due to cross-assortative mating for the two traits. Within any given family, however, the genes for each trait are passed on to each of the offspring independently and at random in meiosis. Because of this independent, random assortment of the genes going to each sibling, the causal connections a → A and b → B have no causal connection in common. A well-established example is the population correlation between height and IQ. These phenotypes are correlated about .20 in the general population, although there is no causal connection whatsoever between genes for height and genes for IQ, as shown by the fact that there is zero correlation between the height and IQ of full siblings (Jensen & Sinha, 1993). All of the population correlation between height and IQ exists between families; none of it exists within families (i.e., between full siblings).
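A minimal Python/NumPy simulation can make the between-family versus within-family distinction concrete. Everything in it is an illustrative assumption rather than data from the source: the variance components, a family-level correlation of 0.5 standing in for cross-assortative mating, and the hypothetical helper names `simulate_sibships` and `population_and_within_family_r`. The genes for A and for B are correlated across families but segregate independently within each family, so the population correlation should come out near .20 while the correlation between sibling differences should be near zero.

```python
import numpy as np


def simulate_sibships(n_families=20_000, family_rho=0.5, seed=0):
    """Hypothetical model of a *simple* genetic correlation between traits A and B.

    Family-level genetic means for A and B are correlated (a stand-in for
    cross-assortative mating); each sibling's segregation deviations for the
    two sets of genes are drawn independently, as in random assortment.
    Returns two arrays of shape (n_families, 2), one column per sibling.
    """
    rng = np.random.default_rng(seed)
    family_cov = 0.5 * np.array([[1.0, family_rho], [family_rho, 1.0]])
    family_means = rng.multivariate_normal([0.0, 0.0], family_cov, size=n_families)
    shape = (n_families, 2)
    # Genetic value = family mean + independent segregation deviation (variance 0.5).
    a = family_means[:, [0]] + rng.normal(0.0, np.sqrt(0.5), shape)
    b = family_means[:, [1]] + rng.normal(0.0, np.sqrt(0.5), shape)
    # Phenotype = genetic value + unshared environment (variance 0.25).
    pheno_a = a + rng.normal(0.0, 0.5, shape)
    pheno_b = b + rng.normal(0.0, 0.5, shape)
    return pheno_a, pheno_b


def population_and_within_family_r(pheno_a, pheno_b):
    """Population r uses one sibling per family; within-family r uses sibling differences."""
    pop_r = np.corrcoef(pheno_a[:, 0], pheno_b[:, 0])[0, 1]
    within_r = np.corrcoef(pheno_a[:, 0] - pheno_a[:, 1],
                           pheno_b[:, 0] - pheno_b[:, 1])[0, 1]
    return pop_r, within_r


pop_r, within_r = population_and_within_family_r(*simulate_sibships())
print(f"population r = {pop_r:.2f}, within-family r = {within_r:.2f}")
```

Under these assumed variances the expected population correlation is 0.25 / 1.25 = .20, while sibling differences share no genetic or environmental source, so their expected correlation is zero.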

In a pleiotropic correlation, a single gene has two (or more) different phenotypic effects, which therefore are necessarily correlated within families.

128 Clocking the Mind

The sibling who has the pleiotropic gene will show both phenotypic traits; the sibling who does not have the gene will show neither. An example is the double-recessive gene for phenylketonuria (PKU), which results in two effects: (1) mental retardation and (2) a lighter pigmentation of hair and skin than is characteristic of the siblings without the PKU gene. (Nowadays, the unfortunate developmental consequences of PKU are ameliorated by dietary means.) Another likely example of pleiotropy is the well-established correlation (about +.25) between myopia and IQ. The absence of a within-family correlation between two distinct phenotypic traits contraindicates a pleiotropic correlation. The presence of a within-family correlation between two distinct phenotypes, however, is not by itself definitive evidence of pleiotropy, because the correlation could possibly be caused by some environmental factor that affects both phenotypes. Pleiotropy can be indicated by the method of cross-twin correlations between two distinct variables (e.g., different test scores), A and B. The method is applicable both to the design using monozygotic twins reared apart (MZA) and to the design contrasting monozygotic twins reared together with dizygotic twins reared together (MZT-DZT). Pleiotropy is indicated if the cross-twin correlations are significant and virtually the same as the within-person correlations. In the MZA design, with the members of each pair labeled MZA1 and MZA2, this means that the correlation between the test A scores of MZA1 and the test B scores of MZA2 is virtually the same as the correlation between the A scores and the B scores of MZA1 (and likewise of MZA2). The two main types of genetic correlation, simple and pleiotropic, are illustrated in Figure 7.1; the direct and cross-twin correlations are illustrated in Figure 7.2.
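The cross-twin logic can be sketched with simulated MZA pairs. This is a toy model, not taken from any of the cited studies: a single pleiotropic genetic factor G influences both tests (the loadings 0.8 and 0.6 are arbitrary), and a hypothetical parameter `env_ab_r` lets an environmental factor within each person also link A and B. When the A-B link is purely genetic, the cross-twin correlation r(A1, B2) should match the direct correlation r(A1, B1); a within-person environmental link raises only the direct correlation.

```python
import numpy as np


def simulate_mza_two_tests(n_pairs=20_000, load_a=0.8, load_b=0.6, env_ab_r=0.0, seed=1):
    """Scores on tests A and B for MZ twins reared apart (hypothetical model).

    Both twins share the genotype G; each twin's environmental deviations are
    independent of the co-twin's, but within a person the environmental parts
    of A and B may correlate (env_ab_r). All scores have unit variance.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n_pairs)
    sd_ea, sd_eb = np.sqrt(1 - load_a**2), np.sqrt(1 - load_b**2)
    env_cov = np.array([[sd_ea**2, env_ab_r * sd_ea * sd_eb],
                        [env_ab_r * sd_ea * sd_eb, sd_eb**2]])
    twins = []
    for _ in range(2):  # twin 1 and twin 2 get independent environments
        env = rng.multivariate_normal([0.0, 0.0], env_cov, size=n_pairs)
        twins.append((load_a * g + env[:, 0], load_b * g + env[:, 1]))
    return twins


def direct_and_cross_twin_r(twins):
    (a1, b1), (a2, b2) = twins
    direct = np.corrcoef(a1, b1)[0, 1]  # same person, tests A and B
    cross = np.corrcoef(a1, b2)[0, 1]   # twin 1 on test A, co-twin on test B
    return direct, cross


for env_ab_r in (0.0, 0.5):
    direct, cross = direct_and_cross_twin_r(simulate_mza_two_tests(env_ab_r=env_ab_r))
    print(f"env_ab_r={env_ab_r}: direct r = {direct:.2f}, cross-twin r = {cross:.2f}")
```

With these assumed values both correlations should be near .48 when `env_ab_r` is 0; with `env_ab_r` = 0.5 the direct correlation rises to about .72 while the cross-twin correlation stays near .48.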

Heritability Defined

Heritability, symbolized as h 2, is a statistic derived from various kinship correlations on a given metrical trait that estimates the proportion of the total phenotypic variance in a defined population that is attributable to genetic variation, i.e., individual differences in genotypes. (An exceptionally thorough critical exposition of the meaning of heritability is provided by Sesardic, 2005.)

In the simplest formulation, a phenotype (P) is represented as the sum of its genetic (G) and environmental (E) components, i.e., P = G + E. The phenotypic variance, $V_P$, therefore (assuming G and E are uncorrelated), is

$$V_P = V_G + V_E.$$

Heritability, then, is defined as

$$h^2 = V_G / V_P.$$

The proportion of phenotypic variance attributable to the environment ($e^2$), i.e., variance due to all nongenetic factors, is not measured directly but is the residual: $e^2 = 1 - h^2$.

[Figure 7.1 diagram. Panels: "Simple" and "Pleiotropic"; genotypes (a, b) in the top row, phenotypes (A, B) in the bottom row.]

Figure 7.1: Genetic correlations, simple and pleiotropic, between phenotypes A and B. Arrows indicate a causal relationship between genes and phenotypes; dashed curved lines indicate a statistical correlation.

[Figure 7.2 diagram. Scores of twins 1 and 2 on tests A and B; left panel, "Twin correlations"; right panel, "Cross-twin correlations".]

Figure 7.2: Illustration of twin and cross-twin correlations, which are usually obtained from samples of both MZ and DZ twins and are used in quantitative genetics to estimate the genetic correlation (or the environmental correlation) between different metric traits. For further discussion and references explicating this method, see Plomin, DeFries, and McClearn, 1990, pp. 235-246.

The two most commonly used kinds of data for estimating $h^2$ in humans are (1) the correlation between identical (monozygotic) twins reared apart (MZA); and (2) the contrast between the correlation for identical twins reared together (MZT) and the correlation for fraternal (dizygotic) twins reared together (DZT). In such studies, the effect of age differences between twin pairs is statistically removed.

The logic of estimating $h^2$ from these twin correlations is as follows:

(1) MZA have all of their genes in common (i.e., identical genotypes) but no shared environment, so the intraclass correlation between MZA for a given trait results entirely from their having all their genes in common. Therefore, the intraclass correlation between MZA twins, labeled $r_{MZA}$, is a direct estimate of the proportion of genetic variance in the measured trait: $r_{MZA} = h^2$ (see Note 1).

(2) The correlation between MZT, $r_{MZT}$, reflects all of their genes plus their shared environment, whereas the correlation between DZT, $r_{DZT}$, reflects (on average) half of their genes plus their shared environment. So the difference $r_{MZT} - r_{DZT}$ estimates half of the genetic variance; therefore $h^2 = 2(r_{MZT} - r_{DZT})$.
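Both estimators can be checked on simulated twins whose true heritability is known. The sketch below assumes a simple additive model P = G + C + E with hypothetical values $h^2$ = .50 and shared-environment variance .20; the function names are illustrative, not from the source. MZA pairs share G but (with random placement) not C, so their intraclass correlation should recover $h^2$ directly, while the MZT-DZT contrast recovers it through $h^2 = 2(r_{MZT} - r_{DZT})$.

```python
import numpy as np


def simulate_twin_pairs(n_pairs, h2, c2, genetic_r, reared_together, rng):
    """Phenotypes P = G + C + E (variances h2, c2, 1 - h2 - c2) for twin pairs.

    genetic_r: correlation of co-twins' genotypic values (1.0 for MZ, 0.5 for DZ,
    assuming purely additive effects). C is shared only if reared together.
    """
    shared = rng.standard_normal(n_pairs)
    g = [np.sqrt(h2) * (np.sqrt(genetic_r) * shared
                        + np.sqrt(1 - genetic_r) * rng.standard_normal(n_pairs))
         for _ in range(2)]
    if reared_together:
        c_common = np.sqrt(c2) * rng.standard_normal(n_pairs)
        c = [c_common, c_common]
    else:  # e.g., MZA with random placement: rearing environments uncorrelated
        c = [np.sqrt(c2) * rng.standard_normal(n_pairs) for _ in range(2)]
    e = [np.sqrt(1 - h2 - c2) * rng.standard_normal(n_pairs) for _ in range(2)]
    return g[0] + c[0] + e[0], g[1] + c[1] + e[1]


def intraclass_r(x1, x2):
    """Double-entry intraclass correlation (twin order within a pair is arbitrary)."""
    return np.corrcoef(np.concatenate([x1, x2]), np.concatenate([x2, x1]))[0, 1]


def falconer_h2(r_mzt, r_dzt):
    return 2.0 * (r_mzt - r_dzt)


rng = np.random.default_rng(7)
true_h2, true_c2, n = 0.5, 0.2, 50_000  # hypothetical population values
r_mza = intraclass_r(*simulate_twin_pairs(n, true_h2, true_c2, 1.0, False, rng))
r_mzt = intraclass_r(*simulate_twin_pairs(n, true_h2, true_c2, 1.0, True, rng))
r_dzt = intraclass_r(*simulate_twin_pairs(n, true_h2, true_c2, 0.5, True, rng))
h2_falconer = falconer_h2(r_mzt, r_dzt)
print(f"h2 from MZA: {r_mza:.2f}")
# e2 here is all nongenetic variance (shared C plus unshared E), as in the text.
print(f"h2 from MZT-DZT: {h2_falconer:.2f}; e2 = 1 - h2 = {1 - h2_falconer:.2f}")
```

Under these assumptions the expected correlations are about .50 (MZA), .70 (MZT), and .45 (DZT), so both routes should give $h^2$ near .50.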


Estimates of $h^2$ in Chronometric Variables

The total published empirical literature on this subject consists of 12 independent studies, as far as I can determine. It is difficult to summarize or compare them all in detail, as they are so heterogeneous in the chronometric paradigms, procedures, specific measures, and subject samples that they used. For this specific information readers are referred to the original sources, which for economy are cited here only by first author and date of publication: Baker, Vernon & Ho (1991), Boomsma (1991), Bouchard et al. (1986), Couvre (1988a, b), Ho (1988), Luciano et al. (2001), McGue et al. (1984, 1989), Neubauer (2000), Petrill (1995), Rijsdijk (1998), and Vernon (1989). Three of the studies (Bouchard, 1986; McGue, 1984, 1989) are based on MZA; all the others are based on MZT-DZT. Both types of studies provide fairly similar estimates of $h^2$, although those based on MZA are less variable and generally give slightly higher values.

The findings of these studies fall into three categories:

(1) the $h^2$ of direct measures of chronometric variables (RT or speed of information processing);

(2) the $h^2$ of derived measures: intraindividual variability in RT (RTSD); difference scores, such as name identity minus physical identity (NI-PI) in the Posner paradigm; and the intercept and slope of RT as a function of information load or task complexity (as in the S. Sternberg memory-scan paradigm); and

(3) the genetic correlation between a direct chronometric variable (e.g., RT) and a psychometric variable (e.g., IQ or g). A genetic correlation of this type is pleiotropic (see Figure 7.1).

The results of these studies can be summarized as follows:

(i) For direct measures of RT (speed of response), the estimates of $h^2$ are Mean = .44, Median = .48, and SD = .19.

(ii) For derived measures (RTSD, NI-PI difference scores, and intercept and slope in the S. Sternberg paradigm), $h^2$ Mean = .29, Median = .23, and SD = .21. Omitting RTSD, the values of $h^2$ based only on difference scores, intercept, and slope are Mean = .20, Median = .20, and SD = .08.

(iii) For genetic correlations between direct measures of RT and IQ, Mean = .90, Median = .95, and SD = .134. In other words, there is a very high degree of genetic determination of the correlations between RT and IQ. The phenotypic correlation between RT and IQ is lowered by nongenetic effects.

Authors of the various studies concluded the following:

"The common factor (among 11 RT tests) was influenced primarily by additive genetic effects, such that the observed relationships among speed and IQ measures are mediated entirely by hereditary factors" (Baker, 1991, p. 351).

". . . the general mental speed component underlying most of these tasks, which are strongly related to psychometric measures of g, do show substantial genetic effects" (Bouchard, Lykken, Segal & Wilcox, 1986, p. 306).


"The phenotypic relationship between the measures of general intelligence and the measures of speed of processing employed are due largely to correlated genetic effects" (Ho, Baker & Decker, 1988, p. 247).

"The results reported here support the existence of a general speed component underlying performance on most experimental cognitive tasks which is strongly related to psychometric measures of g, and for which there are substantial genetic effects" (McGue, Bouchard, Lykken & Feuer, 1984, p. 256).

"The RT tests' correlations with full-scale IQ scores correlated .603 with their heritabilities, and the tests' correlations with g-factor scores extracted from the 10 subtests of the Multidimensional Aptitude Battery correlated .604 with their heritabilities" (Vernon, 1989b).

A large study of the genetic correlation between speed of information-processing (SIP) and IQ in MZT and DZT twins tested at ages between 16 and 18 years concluded, "Multivariate genetic analyses at both ages showed that the RT-IQ correlations were explained by genetic influences. These results are in agreement with earlier findings... and support the existence of a common, heritable biological basis underlying the SIP-IQ relationship" (Rijsdijk, Vernon & Boomsma, 1998, p. 77).

Another large study of MZT and DZT twins concluded, "To summarize, SIP, as measured by ECTs, has been shown to correlate substantially with psychometric intelligence. Variance in ECTs shows substantial genetic influence, although somewhat less than psychometric intelligence. The phenotypic correlations between measures of mental speed and psychometric intelligence are largely, but not exclusively, due to overlapping genetic effects. The same can be concluded for the correlations among independent measures of intelligence. Thus, the speed of information processing is partially heritable and shares part of its genetic influence with psychometric intelligence. But there are also specific genetic influences apart from those pleiotropic genes, affecting both psychometric intelligence and mental speed" (Neubauer, Spinath, Riemann, Borkenau & Angleitner, 2000, p. 285).

In order to obtain the most accurate possible overall estimate of $h^2$ based on all 12 of the extant studies, Beaujean (2005) undertook a sophisticated model-fitting meta-analysis of all the reported data, a type of analysis that, in effect, partials out the irrelevant variance resulting from the marked heterogeneity of the elementary cognitive tasks (ECTs), the different types of twin studies, and the diverse subject samples used in the 12 data sets. His "best estimate" of the average $h^2$ of the various RT measures is .53, which is hardly different from the average $h^2$ of various psychometric tests obtained under similarly heterogeneous conditions. The subjectively simpler ECTs averaged lower heritability (.40) than the subjectively more complex ECTs (.67).

Generalizations from the Present Studies of the Heritability of Chronometric Variables

(1) Direct measures of RT or speed of information processing are unquestionably heritable, with a wide range of $h^2$ values (depending on various conditions) that are generally somewhat smaller than the values of $h^2$ for most psychometric tests, although there is considerable overlap between the distribution of $h^2$ values for direct chronometric variables (RT) and the distribution of $h^2$ values for psychometric test scores. The lower values of $h^2$ for RT than for test scores seem to be related to the relative simplicity and homogeneity of most RT measures and to their greater specificity (i.e., sources of variance unique to each different RT task).

(2) There are two methods of data analysis that invariably increase the estimated $h^2$ of RT measures: (a) a composite RT, and (b) a latent trait analysis of RT.

(a) The composite RT is the overall average RT (either mean or median) obtained from an individual's mean or median RTs on a number of different ECTs. The composite can be formed either from the raw RTs or from the RTs standardized within each ECT. The $h^2$ based on the raw composite RT is generally higher than the $h^2$ based on standardized RTs. The reason is that the raw RTs of different ECTs reflect their different levels of complexity or difficulty, and the more difficult tasks have larger values of RT. The larger values of RT (up to a point) also have higher $h^2$, and so the tasks making up the raw composite are, in effect, weighted in favor of yielding a higher value of $h^2$ for the total composite than does the standardized composite (in a raw average, each task is effectively weighted by its standard deviation, which is typically larger for the slower tasks). The raw composite RT thus takes advantage of what has been termed the "heritability paradox" (explained in the section on "The 'Heritability Paradox'"). The raw composite is also preferable because it retains the RT values in the original and meaningful ratio scale of time measurement. Aggregating different RT measurements in a composite increases $h^2$ simply because it gives increased relative weight to the common factor of RT among a number of different tests, thereby increasing the reliability of the common factor and diminishing the proportion of variance attributable to uncorrelated factors among the different RT tasks, that is, their specificity. Nevertheless, task specificity remains a source of variance that attenuates the true correlation between the phenotypic common factor of the various RTs and other variables of interest, including its genotype. (Recall that $h^2$ is the square of the phenotype × genotype correlation; see Note 1.)

(b) Probably the best available method for estimating the $h^2$ of RT is a latent trait analysis or a factor analysis, in which the latent trait or factor is the largest common factor of a number of different RT measures on the particular ECTs of interest. This is best represented by the first principal factor (PF1) in a common factor analysis. The resulting factor score, which best represents the common factor among the ECTs, is then a weighted composite score comprising all of the ECTs, in which the RTs on each ECT are weighted by their loadings on PF1. As is true for all factor scores, some small degree of error remains, consisting of other minor factors, test specificity, and unreliability. These unwanted sources of individual differences variance are never perfectly eliminated, but they are minimized more by the use of factor scores (particularly when obtained by the method first proposed by Bartlett, 1937) than by perhaps any other present method.
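The three kinds of composite discussed in (a) and (b) can be compared in a short Python sketch. All task means, SDs, and loadings are hypothetical, and a one-factor model is assumed throughout. In the first part, the raw average lets each task contribute in proportion to its SD, so the slow, high-variance task dominates the raw composite, whereas the z-score composite weights the tasks equally. The second part estimates PF1 loadings by iterated principal-axis factoring and computes Bartlett factor scores, which weight each standardized task by $\lambda_j / \psi_j$ (its loading divided by its uniqueness); the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
# Hypothetical ECT battery: slower (more complex) tasks have larger RT means and SDs.
task_means = np.array([300.0, 450.0, 700.0])  # ms, illustrative only
task_sds = np.array([30.0, 60.0, 120.0])      # ms, illustrative only
f = rng.standard_normal(n)                     # common speed factor
z_true = np.sqrt(0.5) * f[:, None] + np.sqrt(0.5) * rng.standard_normal((n, 3))
rt = task_means + task_sds * z_true            # raw RTs, subjects x tasks

raw_composite = rt.mean(axis=1)
z = (rt - rt.mean(axis=0)) / rt.std(axis=0)    # standardize within each ECT
std_composite = z.mean(axis=1)


def r_with_tasks(composite, scores):
    """Correlation of a composite with each task column."""
    return np.array([np.corrcoef(composite, scores[:, j])[0, 1]
                     for j in range(scores.shape[1])])


print("raw composite r with tasks:", r_with_tasks(raw_composite, rt).round(2))
print("standardized composite r with tasks:", r_with_tasks(std_composite, rt).round(2))


def principal_axis_one_factor(R, n_iter=200, tol=1e-8):
    """Iterated principal-axis factoring, first factor only (PF1 loadings)."""
    communality = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # SMC starting values
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, communality)
        vals, vecs = np.linalg.eigh(reduced)              # eigenvalues ascending
        loadings = vecs[:, -1] * np.sqrt(vals[-1])
        new_comm = loadings ** 2
        done = np.max(np.abs(new_comm - communality)) < tol
        communality = new_comm
        if done:
            break
    return loadings if loadings.sum() > 0 else -loadings


def bartlett_scores(Z, loadings):
    """Bartlett (1937) factor scores for a one-factor model on standardized data Z."""
    psi = 1.0 - loadings ** 2      # uniquenesses
    w = loadings / psi
    return Z @ w / np.sum(loadings * w)


true_load = np.array([0.8, 0.8, 0.7, 0.7, 0.6, 0.6])  # hypothetical PF1 loadings
g = rng.standard_normal(n)
X = g[:, None] * true_load + rng.standard_normal((n, 6)) * np.sqrt(1 - true_load ** 2)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
est_load = principal_axis_one_factor(np.corrcoef(Z, rowvar=False))
scores = bartlett_scores(Z, est_load)
print("PF1 loadings:", est_load.round(2))
print("r(factor score, true factor):", round(np.corrcoef(scores, g)[0, 1], 3))
```

Under these assumptions the raw composite should correlate about .68 with the fastest task and about .93 with the slowest, while the standardized composite correlates about .82 with each.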

Age Correction

Age needs to be taken into account in estimating $h^2$, because $h^2$ is known to vary across the total age range, generally increasing gradually from early childhood to later maturity. RT, too, varies with age, as was shown in Chapters 4 and 5. As most of the studies of RT heritability are based on groups that are fairly homogeneous in age (mostly young adults), age is not an important source of distortion in them. In groups that are more heterogeneous in age, the age factor can be removed from the RT data statistically by multiple regression, with chronological age (in months), age squared, and age cubed as the independent variables and RT as the dependent variable; the differences between each subject's obtained and predicted RTs are the age-corrected scores used for estimating $h^2$. This regression procedure is typically sufficient to remove all significant linear and nonlinear effects of age from the RT measures.
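The age correction described above amounts to keeping the residuals from a cubic polynomial regression of RT on age. A minimal sketch follows, using simulated data with an assumed U-shaped age trend; the function name `age_corrected` and all numbers are hypothetical.

```python
import numpy as np


def age_corrected(rt, age_months):
    """Residualize RT on age, age^2 and age^3 (plus an intercept) by least squares.

    Age is centered and scaled before forming the powers; this only improves
    numerical conditioning and leaves the fitted values and residuals unchanged.
    """
    a = (age_months - age_months.mean()) / age_months.std()
    X = np.column_stack([np.ones_like(a), a, a ** 2, a ** 3])
    coef, *_ = np.linalg.lstsq(X, rt, rcond=None)
    return rt - X @ coef  # obtained minus predicted RT


rng = np.random.default_rng(11)
age = rng.uniform(120, 960, size=2_000)  # hypothetical ages, 10 to 80 years, in months
# Hypothetical U-shaped age trend in mean RT (ms), plus individual variation.
rt = 350 + 0.0004 * (age - 300) ** 2 + rng.normal(0, 40, size=age.size)
rt_adj = age_corrected(rt, age)
print(round(np.corrcoef(rt, age)[0, 1], 2), round(np.corrcoef(rt_adj, age)[0, 1], 6))
```

By construction, the corrected scores have mean zero and are uncorrelated in the sample with age, age squared, and age cubed.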

The "Heritability Paradox"

This is the name given to the frequent and seemingly surprising finding that $h^2$ increases as a function of task complexity and also as a function of the degree to which tasks call for previously learned knowledge and skills. The notion that this is paradoxical results from the expectation of some theorists that the more elemental or basic components of performance in the causal hierarchy of cognition, going from stimulus (the problem) to response (the answer), are far less removed from the underlying brain mechanisms than are the final outcomes of complex problem solving. Tests of greater complexity are considered to be more differentially influenced by environmental conditions, such as prior cultural-educational opportunities for having acquired the specific knowledge or skills called for by the test. It is therefore expected that individual differences on tests that are more dependent on previously acquired knowledge and skills, such as typical tests of IQ and scholastic achievement, should reflect genetic factors to a lesser degree than RT on relatively simple ECTs, which are assumed to depend almost exclusively on the most basic or elemental cognitive processes. Presumably, individual differences in RT on such relatively simple ECTs scarcely involve differences in environmental factors influencing specific cultural or scholastic knowledge and skills, and therefore ECTs should have higher $h^2$ than the more experientially loaded psychometric tests.

It turns out, however, that the empirical finding described as the "heritability paradox" is not really paradoxical at all. It is actually an example of a well-known effect in psychometrics: the aggregation of causal effects, whereby the sum or average of a number of correlated factors has greater reliability and generality than the average of the reliability coefficients of the aggregate's separate components. The factor common to a number of somewhat different but correlated ECTs, therefore, should be expected to have a higher phenotype-genotype correlation (and thus higher $h^2$) than its separate elements. Psychometric tests, even the relatively homogeneous subtests of a multi-test battery, are generally much more complex measures than the ECTs typically used in measuring RT. A single psychometric test is typically composed of numerous nonidentical items, each of which samples a different aspect of brain activity. RT on a specific ECT, by contrast, may be measured with a virtually identical reaction stimulus on each and every test trial, and therefore elicits responses involving a much more restricted sample of neural processes, even when it taps some of the same elements of that activity. In view of the aggregation effect, it should also not be surprising to find that the composite RT on a number of varied ECTs shows values of $h^2$ that are very comparable to those found for most psychometric tests of intelligence and achievement. Achievements are also an aggregation, not just because of their variety of knowledge and skill content, but also because they are a cumulative, aggregated product of neural information processes occurring over extended periods of time. In acquiring knowledge and skills over an extended period, individual differences in speed of information processing act much like differences in compound interest rates acting on a given amount of capital. The seemingly slight but consistent individual differences in such relatively elemental processes as those measured by ECTs can therefore, when acting over periods of months or years, result in remarkably large individual differences in achievement. Individual achievements, reflecting largely the accumulated effects of individual differences in speed of information processing acting on cultural-educational input, therefore generally show a greater genetic than environmental influence.
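The aggregation argument can be made quantitative under a deliberately simple, hypothetical model: each ECT score is a common factor G, fully shared by MZ co-twins, plus a task-specific deviation that is unshared and uncorrelated across tasks. A single task's MZA correlation is then $\lambda^2$, while the mean of k tasks has MZA correlation $\lambda^2 / (\lambda^2 + (1 - \lambda^2)/k)$, which rises toward 1 as k grows. The sketch checks this formula by simulation with an arbitrary $\lambda$ = .6 and k = 8; the function name is illustrative.

```python
import numpy as np


def mza_r_single_vs_composite(k_tasks=8, load=0.6, n_pairs=20_000, seed=5):
    """Hypothetical MZA model: each ECT = load * G + sqrt(1 - load^2) * specific part.

    G is fully shared by co-twins; the task-specific parts are assumed to be
    unshared and uncorrelated across tasks. Returns simulated and analytic MZA
    correlations for a single task and for the mean of k tasks.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n_pairs)
    spec = np.sqrt(1 - load ** 2)
    twin1 = load * g[:, None] + spec * rng.standard_normal((n_pairs, k_tasks))
    twin2 = load * g[:, None] + spec * rng.standard_normal((n_pairs, k_tasks))
    r_single = np.corrcoef(twin1[:, 0], twin2[:, 0])[0, 1]
    r_comp = np.corrcoef(twin1.mean(axis=1), twin2.mean(axis=1))[0, 1]
    # Composite: between-twin covariance load^2; variance load^2 + (1 - load^2)/k.
    analytic_single = load ** 2
    analytic_comp = load ** 2 / (load ** 2 + (1 - load ** 2) / k_tasks)
    return r_single, r_comp, analytic_single, analytic_comp


r_single, r_comp, exp_single, exp_comp = mza_r_single_vs_composite()
print(f"single task: {r_single:.2f} (expected {exp_single:.2f}); "
      f"composite of 8: {r_comp:.2f} (expected {exp_comp:.2f})")
```

With these assumed values a single task gives an MZA correlation near .36, while the eight-task composite gives about .82.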

Note

1. A common error in interpreting the correlation between MZA is to take $h^2$ as the square of the MZA correlation; in fact, the MZA correlation itself is a direct estimate of $h^2$, as explained below.

Definitions

$P_1$, phenotype value of twin 1; $P_2$, phenotype value of twin 2 (i.e., the co-twin of twin 1).

$G_1$, genotype value of twin 1; $G_2$, genotype value of twin 2; as the twins are monozygotic, $G_1 = G_2 = G$.

$E_1$, environmental value of twin 1; $E_2$, environmental value of twin 2.

The Model

$$P_1 = G + E_1 \quad \text{and} \quad P_2 = G + E_2$$

Computation

The values P, G, and E are expressed as deviations from their means, so that in a sample of twins the means of P, G, and E are all equal to 0; the phenotypes are in standardized form, with SD (and variance) = 1, so that $V_G + V_E = 1$.

Assume that the rearing environments of the separated MZ twins are uncorrelated, and that G and E are uncorrelated (i.e., random placement of co-twins in different environments).

Then the correlation ($r_{12}$) between twins 1 and 2 in a large sample of N pairs of twins (with the members of each pair called 1 and 2) is

$$r_{12} = \frac{\sum P_1 P_2}{N} = \frac{\sum (G + E_1)(G + E_2)}{N}$$

(Note: read $\sum$ as "the sum of".) Expanded, this becomes

$$r_{12} = \frac{\sum G^2}{N} + \frac{\sum G E_1}{N} + \frac{\sum G E_2}{N} + \frac{\sum E_1 E_2}{N}$$

Since G and E are uncorrelated, and $E_1$ and $E_2$ are uncorrelated, each of the last three terms in the above equation is equal to zero, and so these terms can be dropped, leaving only

$$r_{12} = \frac{\sum G^2}{N},$$

which is the genotypic variance (or genetic variance), that is, the heritability. The present mean value of $r_{12}$ for IQ in all studies of MZ twins reared apart is .75. The estimated population correlation between phenotypes and their corresponding genotypes, then, is $\sqrt{r_{12}}$, because the covariance of P with G is $V_G$, so that $r_{PG} = V_G / \sqrt{V_G} = \sqrt{V_G}$. (The present best estimate of the phenotype-genotype correlation for IQ is $\sqrt{.75} = .87$.)
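A quick numerical check of the derivation above, under its stated assumptions (MZ co-twins share G; their environments are uncorrelated with G and with each other). With $V_G$ = .75, chosen to match the IQ example, the twin correlation should recover .75 itself, not its square, and the phenotype-genotype correlation should come out near $\sqrt{.75} \approx .87$. The sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pairs, v_g = 100_000, 0.75                      # V_G chosen to match the IQ example
G = rng.normal(0.0, np.sqrt(v_g), n_pairs)        # genotype shared by each MZA pair
E1 = rng.normal(0.0, np.sqrt(1 - v_g), n_pairs)   # environments uncorrelated with G
E2 = rng.normal(0.0, np.sqrt(1 - v_g), n_pairs)   # and with each other
P1, P2 = G + E1, G + E2                           # standardized phenotypes (variance 1)

r12 = np.corrcoef(P1, P2)[0, 1]
r_pg = np.corrcoef(P1, G)[0, 1]
print(f"r12 = {r12:.3f} (h2 = {v_g}), r_PG = {r_pg:.3f} (sqrt(h2) = {np.sqrt(v_g):.3f})")
```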


Chapter 8

The Factor Structure of Reaction Time in Elementary Cognitive Tasks

In 1991 John B. Carroll, the doyen of factor analysis in the field of psychometrics, reviewed the existing literature on the factor structure of reaction time (RT) and its relationship to psychometric measures of cognitive abilities (Carroll, 1991a). He factor-analyzed 39 data sets that might possibly yield information on the factorial structure of various kinds of RT measurements. His conclusion:

My hope was to arrive at a clear set of differentiable factors of reaction time, but this hope could not be realized because of large variation in the kinds and numbers of variables that were employed in the various studies. The only rough differentiation that I could arrive at may be characterized as one that distinguished between very simple RT tasks, such as simple and choice RT, and tasks that involved at least a fair degree of mental processing including encoding, categorization, or comparison of stimuli. These two factors are sometimes intercorrelated, but sometimes not; the correlation varies with the type of sample and the types of variables involved. The main conclusion I would draw here is that the available evidence is not sufficient to permit drawing any solid conclusions about the structure of reaction time variables. (p. 6)

The one fairly certain conclusion that Carroll could glean at that time was that the correlation between RTs, on the one hand, and scores on conventional tests of cognitive abilities, on the other, originates in their higher-order factors, namely psychometric g. As he stated it, ". . . the correlation [between chronometric ECTs and psychometric tests] can be traced to the fact that the higher-order ability, in this case [psychometric] g, includes some tendency on the part of higher-ability persons to respond more quickly in ECT tasks" (p. 10). Later, in his great work on the factor structure of cognitive abilities (Carroll, 1993), he was able to draw on newer studies that yielded further conclusions regarding the relationship between chronometric and psychometric measures, the subject of Chapter 9. The present chapter examines only the factor structure of RT in various ECTs, separately from their correlations with psychometric tests. Unfortunately, this task is about as intractable today as it was for Carroll back in 1991, despite the greatly accelerated growth of RT research in recent years. The problem remains much the same as Carroll described it. Interest has been focused so much on the connection between RT and psychometric intelligence that the factor structure of RT tasks has never been systematically studied in its own right. So we are limited to the few possible generalizations about factor structure that can be gathered from the correlations among various ECTs originally obtained to answer different questions.


We are reasonably tempted therefore to ask: why bother with the factor analysis of RTs? Is asking about the factor structure of ECTs even the right question? The answers to these essentially methodological questions and the empirical conclusions that follow depend on being clear about certain factor-analytic concepts and their specific terminology.

Factor Analysis and Factor Structure

Factor analysis is used to describe the structure of a given correlation matrix of n variables in terms of a number of source traits, or latent variables, that cannot be directly measured but are hypothesized to explain the pattern of correlations among the n observed variables. The factor analysis begins with a matrix of correlation coefficients among a set of directly measured variables, $V_1, V_2, \ldots, V_n$, such as test scores. The computational procedure extracts from the correlation matrix a number of factors and factor loadings, representing the latent traits (hypothetical sources of variance) that mathematically account for the structure of the correlation matrix.

Factor analysis can be explained most simply in terms of a Venn diagram, shown in Figure 8.1. The total standardized variance of each of the three variables A, B, and C (e.g., scores on tests of verbal, perceptual-spatial, and numerical abilities) is represented as a circle. The standardized total variance, σ² = 1, of each variable is represented by the area encompassed within each circle. The shaded areas overlapping between any one variable and all the others represent the proportion of the total variance that the variable has in common with all of the other variables (termed the variable's communality, symbolized as h²). The total of all the shaded areas (the sum of the communalities) is the common factor variance in the given set of variables. The coefficient of correlation, r, between any two variables is the square root of the total area of overlap between those two variables. The nonoverlapping area for any variable in the given matrix constitutes variance that is unique to the measurements of that particular variable. It is referred to as that variable's uniqueness, U, and is equal to 1 − h². Each and every measured variable has some degree of U,

Figure 8.1: Venn diagram used to illustrate the gist of factor analysis.

The Factor Structure of RT in ECTs 139

which is composed of two parts: (1) specificity, S, a true (i.e., reliable) source of variance that is not common to any other variable in the given matrix; and (2) random error of measurement, or unreliability (e).
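The decomposition of a variable's standardized variance into h², S, and e can be made concrete in a few lines of Python. This is a minimal sketch, assuming orthogonal factors (so that the communality is simply the sum of the squared factor loadings) and assuming the variable's reliability coefficient is known, so that error variance is 1 minus the reliability. The function name and the numbers are hypothetical.

```python
# Minimal sketch: split a variable's standardized variance (1.0) into
# communality h2, specificity S, and error e. Assumes orthogonal factors and a
# known reliability coefficient; all values here are hypothetical.

def decompose_variance(loadings, reliability):
    """Return h2, U, S and e for one variable.

    loadings    -- the variable's loadings on each (orthogonal) common factor
    reliability -- proportion of the variable's variance that is true-score variance
    """
    h2 = sum(l ** 2 for l in loadings)  # communality: variance shared with other variables
    u = 1.0 - h2                        # uniqueness, U = 1 - h2
    e = 1.0 - reliability               # random error of measurement
    s = u - e                           # specificity: reliable but unshared variance
    if s < 0:
        raise ValueError("communality cannot exceed reliability")
    return {"h2": h2, "U": u, "S": s, "e": e}


# Hypothetical variable: loads .60 on a general factor and .40 on one group
# factor, with a reliability of .85.
parts = decompose_variance([0.60, 0.40], reliability=0.85)
print({k: round(v, 2) for k, v in parts.items()})
# {'h2': 0.52, 'U': 0.48, 'S': 0.33, 'e': 0.15}
```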

The areas of overlap (shaded areas) represent factors, F, or common variance between two or more variables. In Figure 8.1 we see a total of three Fs, each of which has only two variables in common (FAB, FAC, FBC). Because these factors comprise only particular groups of variables but not all of the variables, they are termed group factors (also called primary factors or first-order factors). One factor in Figure 8.1, FG, is common to all of the variables and is referred to as the general factor of the given matrix. (The general factor should be labelled G for any matrix that does not contain a number of complex cognitive variables, such as IQ, that are typically considered the best exemplars of Spearman's g. The degree to which any obtained G factor resembles Spearman's g is a complex question that can only be answered empirically.)

In Figure 8.1 we see that the total variance comprising all three variables and their intercorrelations can be dichotomously divided in two ways: (1) uniqueness (U) versus common factors (all shaded areas), and (2) group factors versus a general factor (FG). A variable's correlation with a particular factor is termed its factor loading on that factor. In Figure 8.1, the factor loading is the square root of the bounded area. Hence the square of a variable's loading on a given factor is the proportion of variance in that variable that is accounted for by the given factor. Factors are named according to the characteristics of the particular variables on which they have their larger loadings (termed salient loadings). It is especially important to note that a factor is definitely not an amalgam or average of two or more variables, but rather is a distillate of the particular source(s) of variance they have in common.

Although this Venn diagram serves to illustrate the gist of factor analysis, with only three variables it is actually far too simple to be realistic. (For example, as a rule in factor analysis at least three variables are required to identify one factor.) But this simple Venn diagram can also be used to illustrate one other feature that is too often unrecognized in the use of factor analysis. When there is a significant G factor in the matrix, it should be clearly represented in the factor analysis of that matrix. The procedure known as varimax rotation of the factor axes is intended to maximize and roughly equalize the variance attributable to each of the group factors. This is fine if there is no G in the matrix. But if the matrix actually harbors a G factor, varimax totally obscures it. The G variance is dispersed among the group factors in a way that makes them all perfectly uncorrelated with each other, while inflating them with the variance that should rightfully be credited to the G factor. This could be illustrated in Figure 8.1 by combining parts of FG with each of the group factors FAB, FAC, and FBC so as to maximize and equalize their variances as much as possible, at the same time maintaining zero correlations among the varimax factors. When a substantial G factor is revealed in the correlation matrix by any factor method properly capable of doing so, it is simply wrong to disperse its variance among the group factors. Varimax factors, however, may be useful in identifying the group factors in the matrix as a preliminary step to performing a hierarchical analysis. But presenting a varimax rotation as a final result should be permissible only when one can first reject the hypothesis that the correlation matrix contains a G factor with significant loadings on every variable, a condition that is virtually assured for a matrix of all-positive correlations. In the domain of cognitive abilities this hypothesis has so far never been legitimately rejected (Jensen, 1998b, p. 117).
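The dispersal of G by varimax can be demonstrated on an artificial correlation matrix. The following Python sketch assumes a hypothetical bi-factor population model (nine tests, each loading .60 on G and .50 on one of three group factors), extracts three principal-axis factors with the true communalities on the diagonal, and applies a standard SVD-based implementation of Kaiser's varimax criterion written here for illustration. All names and values are assumptions, not data from any study.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (Kaiser criterion, standard SVD iteration)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with gamma = 1.
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                     # nearest orthogonal rotation
        new_criterion = np.sum(s)
        if criterion != 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Hypothetical population model: G loading .60 on all 9 tests, plus one of
# three group factors (.50) on each block of 3 tests.
g = np.full(9, 0.60)
groups = np.kron(np.eye(3), np.ones((3, 1))) * 0.50   # 9 x 3 block pattern
true_loadings = np.column_stack([g, groups])           # 9 x 4 bi-factor pattern
reduced_r = true_loadings @ true_loadings.T            # correlations, communalities (.61) on diagonal

# Principal-axis factors from the reduced matrix (communalities assumed known,
# so no iteration is needed); three factors exhaust the common variance.
eigvals, eigvecs = np.linalg.eigh(reduced_r)
top = np.argsort(eigvals)[::-1][:3]
unrotated = eigvecs[:, top] * np.sqrt(eigvals[top])

rotated = varimax(unrotated)
print("unrotated factor 1:", np.round(np.abs(unrotated[:, 0]), 2))
print("varimax loadings:\n", np.round(rotated, 2))
```

In the unrotated solution every test loads about .67 on the first factor. After rotation the communalities are unchanged, but no column has substantial loadings on all nine tests: the variance that belonged to G has been spread across three orthogonal factors, each dominated by one group of tests.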

The term factor structure refers to a model that displays the structure of the correlation matrix in terms of its latent variables (factors). The three factor models most frequently encountered are the hierarchical, the bi-factor (or nested), and principal factors (with or without varimax rotation). These can be explained here without reference to the procedures for determining the number of factors to be extracted from a given matrix or the iterative computational procedures for obtaining the factor loadings. These and many other technical issues are covered in all modern textbooks on factor analysis. Factor analysis and components analysis are usually applied to a correlation matrix, but under special conditions they can also be applied to a variance-covariance matrix.

The common factors are sometimes referred to as latent traits, a term that implies nothing causal, but merely indicates that they are not directly observed or measured variables (e.g., test scores), but emerge from a factor analysis of the correlations among the directly observed variables. There are always fewer common factors than the number of variables, and the common factors comprise only some fraction of the total variance contained in the directly observed variables. Hence the measurement of factors in individuals is problematic; the obtained factor scores are always just estimates of the true factor scores, which can never be known exactly, however useful individuals' estimated factor scores might be theoretically or practically. (The most admirably clear and succinct nonmathematical discussion of the fundamental differences between factor analysis and principal components (PC) analysis I have seen is provided by the British statistician D. J. Bartholomew (2004).) PC analysis is sometimes used in place of factor analysis, usually omitting all components with latent roots (eigenvalues) smaller than 1. The result of a PC analysis looks similar to a factor analysis. But a PC analysis is technically not a latent trait model, as it analyzes the total variance including the uniqueness; therefore the components are "contaminated" by some admixture of uniqueness (i.e., specificity and error variance) and are therefore always a bit larger and a little less clear-cut than the corresponding common factors. (A correlation matrix of n variables contains n PCs, but usually in psychometric research only those PCs with the largest latent roots (typically eigenvalues > 1) are retained in the final analysis.) Strictly speaking, PCs are not common factors, though they contain the common factors and are therefore usually highly correlated with them (Jensen & Weng, 1994). The most distinctly different models of common factor analysis and PCs are illustrated based on a matrix of nine intercorrelated variables. Examination of these factor matrices reveals the typical characteristics of each model.2
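The difference between PC analysis and common factor analysis can be seen by applying both to the same matrix. In this Python sketch the nine-test model is hypothetical (each test loads .60 on G and .50 on one of three group factors). The first PC is computed from the correlation matrix with unities on the diagonal, the first principal factor from the same matrix with the communalities on the diagonal, and the latent-root-greater-than-1 rule is applied to the correlation matrix.

```python
import numpy as np

# Hypothetical 9-test model: loading .60 on G and .50 on one of three group factors.
loadings = np.column_stack([np.full(9, 0.60),
                            np.kron(np.eye(3), np.ones((3, 1))) * 0.50])
common = loadings @ loadings.T      # common variance only (communalities on diagonal)
r = common.copy()
np.fill_diagonal(r, 1.0)            # observed correlation matrix (unities on diagonal)

def first_loadings(matrix):
    """Loadings on the component/factor with the largest latent root."""
    vals, vecs = np.linalg.eigh(matrix)
    i = np.argmax(vals)
    return np.abs(vecs[:, i]) * np.sqrt(vals[i])

pc1 = first_loadings(r)             # first PC: analyzes total variance, incl. uniqueness
pf1 = first_loadings(common)        # first principal factor: common variance only
eigenvalues = np.linalg.eigvalsh(r)

print("first PC loading:    ", round(pc1[0], 3))   # about 0.698
print("first factor loading:", round(pf1[0], 3))   # about 0.666
print("roots > 1:           ", int(np.sum(eigenvalues > 1)))  # 3
```

Because the first PC absorbs some of the uniqueness, its loadings come out a bit larger than the corresponding common-factor loadings, as the text states.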

Problems in Factor-Analyzing RT Variables

All of the factor analyses I have found that were performed exclusively on RT tasks are afflicted with one or more of the several known obstacles to attaining an optimal result. Specifically noting these deficiencies, to which purely psychometric analyses are also liable, helps to indicate the desiderata for future studies of the latent structure of RT in various ECTs. This could help in reducing redundancy in the measurement of RT variables, because the potential varieties of observed RT measurements undoubtedly far exceed the number of latent variables.

(1) The three variables rule. The most conspicuously common violation in factor analyses of RT measures is the failure to apply the rule that each factor must be identified by at least three (or preferably more) different variables. If one hypothesizes the emergence of a particular factor, three different variables tapping fairly large amounts of that factor need to be included in the correlation matrix. The observed measurements of these variables cannot be merely equivalent forms of the same test; their high intercorrelations would not qualify as evidence for an authentic factor. The latent trait of interest must be manifested in more than a single test to qualify as a group factor. Otherwise only that part of its variance that is common to all other variables in the analysis will show up as a general factor, G; all of the remaining non-G variance constitutes the test's uniqueness, U. If a test actually harbors what could be a substantial group factor, that part of its variance remains hidden in the U variance as test specificity, S.

Such has been the fate of many RT variables in the factor analyses I have perused. These RTs appear to consist only of G and U, with small and nondescript loadings on any group factors. In some analyses there are no substantial group factors at all; only G and U emerge. We can identify authentic group factors only by including three or more variables that might possibly contain the same group factor. If a group factor emerges, it then can be further substantiated, or cross-validated, by identifying the same factor when the set of tests that originally identified it is included in a new correlation matrix containing tests other than those used in the original matrix. Such procedures have been used many times over to identify the main group factors in the enormous catalog of psychometric tests of cognitive abilities, which vastly exceeds the number of independent factors that account for most of their reliable covariation (Carroll, 1993).

(2) Experimentally independent variables. Another general rule in factor analysis is that the variables entering into it must be experimentally independent, that is, no two variables should be derived from the very same measurements. If A and B are experimentally independent variables in a given factor analysis, then any other variables containing A and B (e.g., A+B, A/B, etc.) should not be entered in the same correlation matrix. Examples of nonindependent measures are an individual's mean RT over n trials, the standard deviation (RTSD) of the individual's RTs, and the slope and intercept of RT, all derived from the very same data. Nonindependent measurements share identical measurement errors, which distorts their true intercorrelations. Such inflated correlations have entered into many factor analyses of RT data and at times can create the false appearance of a factor. The ideal way to solve this problem, if there is any good reason to include certain derived measures (e.g., individual means and SDs of RTs) in the same factor analysis, is to divide each subject's RT performance into odd and even trials (e.g., measuring mean RTs on the odd trials and RTSDs on the even trials), thereby making the derived variables experimentally independent. This operation can be done twice, to utilize all of the data, by reversing the odd-even variables. The two correlations thus obtained can be averaged (using Fisher's Z transformation). Further, the correlation between the derived variables (e.g., RT and RTSD) can be corrected for attenuation using the average of the two odd-even correlations of the same
variable (boosted by the Spearman-Brown formula) as the reliability coefficient. The odd-even method is obviously not confounded by practice effects or intersession effects on the RT measurements. Note that there is no problem in including both RT and movement time (MT) in the same factor analysis, as they are experimentally independent measures.
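The odd-even procedure for derived measures might be sketched as follows in Python. This is a minimal illustration on simulated data (the lognormal trial model, the sample sizes, and all parameter values are assumptions), and it reads the reliability step as the odd-even correlation of each derived measure boosted by the Spearman-Brown formula. It is not a reproduction of any published analysis.

```python
import numpy as np

def fisher_mean(rs):
    """Average correlations through Fisher's Z (arctanh), then back-transform."""
    return float(np.tanh(np.mean(np.arctanh(rs))))

def spearman_brown(r, k=2.0):
    """Reliability of a measure lengthened k-fold (k = 2 boosts a split-half r)."""
    return k * r / (1.0 + (k - 1.0) * r)

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def odd_even_mean_sd(trials):
    """trials: subjects x trials array of RTs (ms).

    Returns the odd-even correlation of mean RT with RTSD (averaged over both
    half assignments), the boosted reliabilities of each measure, and the
    correlation corrected for attenuation.
    """
    odd, even = trials[:, 0::2], trials[:, 1::2]   # trials 1, 3, 5, ... and 2, 4, 6, ...
    mean_odd, mean_even = odd.mean(axis=1), even.mean(axis=1)
    sd_odd, sd_even = odd.std(axis=1, ddof=1), even.std(axis=1, ddof=1)
    # Mean and SD always come from disjoint halves; then the halves are swapped.
    r_obs = fisher_mean([corr(mean_odd, sd_even), corr(mean_even, sd_odd)])
    rel_mean = spearman_brown(corr(mean_odd, mean_even))
    rel_sd = spearman_brown(corr(sd_odd, sd_even))
    r_corrected = r_obs / np.sqrt(rel_mean * rel_sd)
    return r_obs, rel_mean, rel_sd, r_corrected

# Simulated data (hypothetical): each subject has their own log-RT location and spread.
rng = np.random.default_rng(0)
n_subj, n_trials = 100, 60
mu = rng.normal(6.0, 0.15, n_subj)
sigma = rng.uniform(0.10, 0.30, n_subj)
trials = np.exp(mu[:, None] + sigma[:, None] * rng.standard_normal((n_subj, n_trials)))

r_obs, rel_mean, rel_sd, r_corr = odd_even_mean_sd(trials)
print(f"r = {r_obs:.2f}, reliabilities = {rel_mean:.2f} / {rel_sd:.2f}, corrected r = {r_corr:.2f}")
```

Because the mean and the SD in each correlation never share a trial, their measurement errors are independent, which is the point of the procedure.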

Practice Effects and Intersession Effects

The effects of these conditions have not been thoroughly investigated in factor-analytic studies, but there is good reason to believe that they can be manifested as "noise" in the intercorrelations among RT variables. Subjects show varying degrees of improvement in performance as they become more practiced on a given RT task or on a series of different ECTs in which subjects repeatedly use the same response console. There are intratask and intertask practice effects. Their magnitudes depend largely on task complexity, the more demanding tasks showing the larger practice effects. Because these effects may alter the factor composition of a task over the course of testing, they should be minimized by adequate practice trials and judicious sequencing of different tasks. Suitable data for factor analysis can be obtained from RT when enough practice trials have been given to stabilize not necessarily the absolute level of every individual's RTs per se, but the reliability of individual differences in RT across trials as indicated by the intertrial correlations. This reliability typically becomes fairly stable after relatively few (10-30) practice trials, even as the absolute RT continues to decrease gradually across a great many more trials, approaching asymptote at a decelerating rate. In using any new RT task it is necessary to run a parametric study to determine the course of changes in reliability of RT over n trials as a guide to estimating the optimum number of practice trials for the particular task in a given population.

Even when individual differences in RT have stabilized in a given practice session, there is a rather marked loss of this stability across different test sessions on the same well-practiced tasks, when the same tasks are administered on different days or even a few hours apart on the same day. In other words, the intersession reliability of individual differences is lower than the intrasession reliability. Intersession and intrasession reliability coefficients on the same task have been found to fall mostly within the range of .70 to .95. For relatively simple RT tasks, such as the Hick paradigm, for example, we have found that intersession reliability remains rather consistently lower than intrasession reliability across 10 days of practice sessions. More complex tasks than SRT show greater intersession correlations (reliabilities), which increase over sessions, resulting in a matrix of intersession correlations that resembles a simplex (i.e., increasing correlations between successive sessions). Hence mixing a set of RT tasks administered in a single session together with tasks administered in different sessions may introduce sources of variance into the correlation matrix that can create puzzling inconsistencies in the factor loadings of various tests, depending on whether they contain intersession variance, a transitory kind of specificity.

It should be noted in this context that such transitory specificity is, strictly speaking, not random measurement error, as it does not arise within sessions but arises only between sessions. It represents a true-score day-to-day fluctuation in an individual's RT, reflecting
subtle changes in the physiological state that are generally not detectable with conventional psychometric tests. An individual's mean RT on "good" days may differ significantly from the mean RT on "bad" days. Such fluctuations reflect the great sensitivity of RT measures to slight changes in the individual's physiological state. Interestingly, the simpler RT tasks tend to be more sensitive to such fluctuations. The consistent magnitude of daily fluctuations in RT might itself be an important individual difference variable to be studied in its own right. Hence this sensitivity of RT should not be viewed as a drawback but rather as one of the advantages of RT for certain purposes, despite the correlational "noise" that such transitory specificity adds to a factor analysis. The solution to the problem, if one aims to achieve stable test intercorrelations and clearly interpretable group factor loadings, is to aggregate each individual's mean RTs on a given task obtained on two or more sessions, thereby averaging out the transitory specificity due to intraindividual variation across sessions. If this seems a Herculean labor for RT researchers, whoever said scientific research should be easy? We can compare our problems with the efforts of physicists to prove the existence of, say, neutrinos or antimatter. Our aim here is simply to try to identify reliably the main latent variables in the realm of those ECTs for which mean RT is in the range below 2 s for the majority of adolescents and adults (or 3 s for young children and the elderly). The much longer response latencies elicited by more difficult tasks often evoke strategies that make them factorially more complex and therefore less likely to line up with group factors reflecting more elemental processes. They tend to merge into one or more of the known psychometric group factors.
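How much such aggregation helps can be estimated with the Spearman-Brown formula, if one assumes, for simplicity, that separate sessions behave as parallel measurements with a common intersession correlation (the simplex pattern noted earlier for more complex tasks would violate this assumption). The Python sketch uses .70, the lower end of the range of reliabilities cited earlier, as a hypothetical single-session value; the function name is illustrative.

```python
def aggregated_reliability(r_single, k):
    """Spearman-Brown: reliability of the mean of k sessions, assuming the
    sessions are parallel measures whose pairwise correlation is r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical intersession reliability of .70 for a single session.
for k in (1, 2, 3, 4):
    print(k, round(aggregated_reliability(0.70, k), 3))
# 1 0.7
# 2 0.824
# 3 0.875
# 4 0.903
```

Under these assumptions, averaging just two sessions raises reliability from .70 to about .82, with diminishing returns thereafter.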

Excessive Task Heterogeneity

This is probably the biggest problem in present factor analyses of ECTs. When multiple ECTs are used on the same group of subjects, theoretically nonessential task demands can be too varied for the latent traits to be clearly discernible. They are obscured by method variance. The response console, stimulus display screen, preparatory intervals, speed/accuracy instructions, criteria for determining the number of practice trials, and the like, should all be held constant across the essential manipulated variables in the given ECT. Different ECTs should vary only in the conditions that essentially distinguish one type of ECT from another in terms of their specific cognitive demands, such as attention, stimulus discrimination, or retrieval of information from short-term memory (STM) or long-term memory (LTM). Ideally, method variance should be made as close to nil as possible. Essentially, in designing a factor analytic study of ECTs one must think in terms of revealing their hypothesized latent variables, which dictates minimizing nonessential method variance in the observed RTs on a variety of different paradigms.

Also the different ECTs should be similar enough in complexity or difficulty level to produce fairly similar and preferably very low response-error rates. It is unduly problematic to deal statistically or factor analytically with large differences in error rates, either between different ECTs or between individuals on the same ECT. One solution is to retain only correct responses in the analysis. To ensure exactly the same predetermined number of error-free trials for every subject in the study sample, the sequence of test trials can be automatically programmed to recycle the stimulus that produced an error so that it is presented again later in the sequence. Of course, all errors are recorded for other possible
analyses. There should also be a predetermined criterion for discarding subjects who take an inordinate number of trials to achieve n error-free trials. Explicit rules for discarding outliers in a given study are seldom used in psychological research compared to their more routine use in the natural sciences. Outlier rules are often useful and occasionally necessary in mental chronometry.
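The recycling of error trials and the predetermined discard criterion could be programmed along the following lines. This Python sketch is hypothetical throughout: the function name, the random reinsertion rule, the trial cap, and the simulated responder are illustrative assumptions, not the control software of any particular laboratory.

```python
import random
from typing import Callable, List, Optional, Tuple

def run_with_recycling(stimuli: List[str],
                       respond: Callable[[str], bool],
                       max_trials: int,
                       seed: int = 0) -> Tuple[Optional[int], List[Tuple[str, bool]]]:
    """Present stimuli until each one has drawn a correct response.

    A stimulus that produces an error is reinserted at a random later point in
    the remaining sequence. Returns (trials used, or None if max_trials was
    reached, i.e. the subject is discarded) and the full trial log, so that
    errors remain available for other analyses.
    """
    rng = random.Random(seed)
    queue = list(stimuli)
    log: List[Tuple[str, bool]] = []
    while queue:
        if len(log) >= max_trials:
            return None, log                      # exceeds the predetermined criterion
        stim = queue.pop(0)
        correct = respond(stim)
        log.append((stim, correct))
        if not correct:
            # Recycle the erroneous stimulus somewhere in the remaining sequence.
            queue.insert(rng.randint(0, len(queue)), stim)
    return len(log), log


# Hypothetical responder: errs the first time it sees "B", correct otherwise.
seen = set()
def fake_subject(stim: str) -> bool:
    if stim == "B" and stim not in seen:
        seen.add(stim)
        return False
    return True

n_used, log = run_with_recycling(["A", "B", "C", "D"], fake_subject, max_trials=10)
print(n_used)   # 5: four error-free trials plus one recycled error
```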

Another category of outliers is the small minority of individuals whose performance on a given ECT fails to conform to the information-processing model it is supposed to represent (Roberts & Pallier, 2001). For example, in the Hick paradigm a few rare individuals do not show any linear increase or systematic change in RT as the number of bits of information increases. And one unusual subject (one of the world's greatest mental calculating prodigies) who was tested in my laboratory, although performing exceptionally well on the Sternberg memory scan task, showed not the slightest tendency to conform to the typical scan effect for this paradigm, i.e., a linear increase in RT as a function of the set size of the series of numbers; this subject's RTs were the same for all set sizes from 1 to 7 (Jensen, 1990). In such cases it seems apparent that the subject is actually not performing the "same task" as the great majority of subjects. Such atypical subjects should be treated as outliers, although they may be of interest for study in their own right. But their presence in a typical data set usually attenuates the ECT's correlations with other ECTs and psychometric variables.
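One simple way to screen for such nonconforming subjects in the Hick paradigm is to fit each subject's slope of mean RT on bits and flag slopes that are not clearly positive. In this Python sketch the data and the 5 ms/bit cutoff are hypothetical; in practice any such criterion should be fixed in advance, as recommended for outlier rules generally.

```python
import numpy as np

def hick_slopes(bits, mean_rts):
    """Least-squares slope (ms per bit) of each subject's mean RT on bits.

    mean_rts: subjects x set-size array of mean RTs in ms.
    """
    return np.array([np.polyfit(bits, row, 1)[0] for row in mean_rts])

bits = np.log2([1, 2, 4, 8])          # 0, 1, 2, 3 bits
mean_rts = np.array([
    [300, 330, 360, 390],             # typical: about 30 ms per bit
    [280, 305, 335, 360],             # typical
    [350, 348, 352, 349],             # atypical: no systematic change with bits
])
slopes = hick_slopes(bits, mean_rts)

# Hypothetical screening rule: flag any subject whose slope is below 5 ms/bit.
flagged = np.where(slopes < 5.0)[0]
print(np.round(slopes, 1), flagged)   # slopes about 30, 27 and 0.1; subject 2 flagged
```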

RT and MT Admixture Effects

This is a problematic kind of task heterogeneity. A response console using a home button permits the separate measurement of both RT and MT. But each measure, RT and MT, can contaminate the other one to some degree.

The problem in factor-analyzing response times to various ECTs that differ in type of response (single or double) is that RT and DT are not the same, as was explained in Figure 3.3 (p. 53). RT and MT differ significantly even under otherwise completely identical conditions. The one study of this phenomenon, based on the Hick paradigm, is discussed in Chapter 3 (p. 53). As hypothesized there, the "backward" effects of MT on RT are probably an example of Fitts's Law (Fitts, 1954), which states there is a monotonic relationship between the latency of a response calling for a particular movement and the complexity or precision of the required motor response. This implies that the specific movement has to be mentally programmed before it can be made, and that takes time, which is some fraction of the RT. Hence the RT in the double-response condition requires more mental programming time than is required for an RT when an MT response is not required. Virtually all of the response programming time is included in the RT, while little, if any, of it gets into the MT, which itself is apparently a purely psychomotor variable rather than a cognitive one. And RT and MT seem not to be functionally related. In the Hick paradigm, for example, there is zero intraindividual (or within-subjects) correlation between RT and MT, while there is a low but significant interindividual (or between-subjects) correlation (usually < .30) between RT and MT (Jensen, 1987b). The absence of an RT × MT correlation within subjects and the presence of a significant correlation between subjects generally suggests there is no directly causal connection between RT and MT. That is, although DT and MT are correlated in the population, they do not have a directly functional relationship in an individual.
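The contrast between a between-subjects correlation and a null within-subjects correlation can be illustrated with simulated data. This Python sketch assumes a hypothetical model in which subjects' true mean RT and MT correlate .30 across people while trial-to-trial fluctuations in RT and MT are independent within each person; the means, SDs, and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_trials = 500, 30

# Hypothetical model: true subject means of RT and MT correlate .30 across
# people; trial-level fluctuations are independent within each person.
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
true_means = rng.multivariate_normal([0.0, 0.0], cov, size=n_subj)
rt = 350 + 40 * true_means[:, [0]] + 60 * rng.standard_normal((n_subj, n_trials))
mt = 250 + 30 * true_means[:, [1]] + 50 * rng.standard_normal((n_subj, n_trials))

# Between subjects: correlate each subject's mean RT with their mean MT.
r_between = np.corrcoef(rt.mean(axis=1), mt.mean(axis=1))[0, 1]

# Within subjects: correlate trial deviations from each subject's own means,
# pooled over all subjects.
rt_dev = (rt - rt.mean(axis=1, keepdims=True)).ravel()
mt_dev = (mt - mt.mean(axis=1, keepdims=True)).ravel()
r_within = np.corrcoef(rt_dev, mt_dev)[0, 1]

print(f"between-subjects r = {r_between:.2f}, within-subjects r = {r_within:.2f}")
```

With these settings the between-subjects correlation comes out near .3 (slightly attenuated by trial noise in the subject means), whereas the pooled within-subjects correlation is close to zero, the pattern described for the Hick paradigm.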

Still unknown is whether the increment in RT that consists of the programming time required by the MT response is factorially the same as or different from a theoretically pure RT.

Information Processes versus Information Content

If the aim of measuring RT in various experimental variations of an ECT is to measure individual differences in a certain hypothesized process, such as speed of scanning information in STM, there is a question of how much of the individual differences variance is associated with the particular information process and how much is associated with the ECT's particular content (e.g., verbal, numerical, or figural). Unless these two sources of variance are experimentally manipulated in a processes × content design, a factor analysis of such data on a suitable sample of individuals would be uninformative as to the probable number of orthogonal factors needed to explain the correlation matrix or to estimate the proportions of individual differences variance attributable to each factor. The extant literature has not systematically addressed this question, and the little incidental evidence is still too meager and inconsistent to allow a confident answer. One large-N study, however, is suggestive and provocative (Levine, Preddy, & Thorndike, 1987). Three groups of various standardized psychometric tests known to measure verbal, quantitative, and visuospatial abilities were all correlated with six RT tasks in which the response stimuli of the ECTs consisted of either verbal, quantitative, or spatial materials. The result: the type of content made little difference in the correlations; the average correlation was −.27 when the content was the same for the RT and psychometric variables and −.22 when the content differed. A factor analysis of all the intercorrelations showed that a single general factor, G, common to both the RT and psychometric variables accounted for most of the common factor variance. The average loadings of the RT and psychometric variables on this general factor, G, are quite comparable (−.40 and .43). The RT variables had negligible loadings on psychometric group factors residualized from G.
But there was also another factor, orthogonal to the general factor, with substantial loadings on only the RT variables. This factor probably reflects the strictly psychomotor component of RT, which is unshared by the nonspeeded psychometric tests. It is noteworthy that when the RT factor is residualized from the G factor, its largest loadings are on the cognitively least demanding ECTs, such as SRT and 2-choice RT. (Also, psychometric tests have no nonzero loadings on the residualized RT factor.) It may seem unfortunate to some researchers if it is established that RT generally "reads through" the different contents of cognitive tasks, reflecting only their common factor independent of specific content. It would mean that, except for G, a greater variety of distinct cognitive factors are measured by psychometric than by chronometric tests. At present, however, it still remains to be determined by further investigation whether verbal, quantitative, spatial, and other psychometric group factors (residualized from G) can be measured chronometrically or if mental speed can only reflect their common factor.

Is the Factor Analysis of Reaction Time Asking the Wrong Question?

The answer is Yes and No; it depends entirely on the question. It should be recalled that the correlation coefficients on which factor analysis is based are in turn based on the standardized deviations of individuals from the mean of the particular group that was tested and which is presumably a sample of some defined population. The linear correlation between any pair of variables measured in this sample, therefore, represents, on average, the relative deviations of each of the two variables from their respective group means. Knowing the coefficient of correlation between two given variables tells us the expected mean value of one test's average deviation (in standard deviation units) from its group mean given the other variable's average deviation from its group mean. All the time we are dealing with group averages. A factor analysis of a correlation matrix, therefore, also can only reflect averages. Thus factors represent statistical entities, not individual entities, which can be estimated within this context only with a determinable margin of error.
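The sense in which a correlation describes average deviations can be checked numerically: in standard-score units, the least-squares regression slope of one variable on the other equals r, so the expected deviation on Y for a given deviation on X is r times that deviation. In this Python sketch the bivariate data are simulated with a population correlation of .5, an arbitrary assumption, and the helper name is illustrative.

```python
import numpy as np

def predicted_z(r_xy, z_x):
    """Expected standardized deviation on Y, given a deviation z_x on X."""
    return r_xy * z_x

rng = np.random.default_rng(2)
n = 10_000
x = rng.standard_normal(n)
y = 0.5 * x + np.sqrt(1 - 0.5 ** 2) * rng.standard_normal(n)   # population r = .5

zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
r = np.corrcoef(x, y)[0, 1]
slope = np.polyfit(zx, zy, 1)[0]   # least-squares slope in standard-score units

print(round(r, 3), round(slope, 3), predicted_z(0.5, 2.0))   # slope equals r; 1.0
```

The prediction is an expectation for the group: individuals with the same z_x scatter around r·z_x, which is why factors built from such correlations describe populations rather than individuals.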

The size of the correlations among various measures of cognitive variables indicates the average redundancy of these variables in the description of individual differences (variance) in abilities. In a factor analysis the redundancy among the variables is highlighted by the number of significant factors and also by the magnitudes of the variables' loadings on these factors. Redundant variables have highly similar loadings on each of the significant factors. They are, on average, interchangeable variables. In the extreme, it would be like measuring height both in centimeters and in inches. The factor analysis of chronometric measures is useful for selecting those tests that best measure different orthogonal (i.e., nonredundant) factors (in addition to G) and that have the largest factor loadings. These, then, are the tests that have the highest probability of yielding the most information in the particular population of interest. This is practically useful information and justifies the factor analysis of chronometric tests, in addition to their use in advancing basic research and theory on human cognition.

On the other hand, factor analysis can contribute little if anything to detecting or measuring exceptional conditions that are not detectably reflected by group means or variances in the general population. Yet many highly atypical and abnormal phenomena peculiar to a very few individuals are quite worthy of study. Though they would statistically be regarded as outliers in any population sample, they are entirely real and reliably measurable phenomena. An example in psychometrics is visual and auditory digit span memory, which are perfectly correlated in the normal population (hence they are factorially redundant measures). But they are poorly correlated among the rare individuals who have suffered a brain injury specifically affecting the auditory cortex. The infrequency of such individuals in the general population would preclude discovering the factorial separation of auditory and visual memory span by means of a factor analysis based on a sample of the general population. The same kind of differentiation of abilities under abnormal conditions could also occur in various chronometric measurements that might otherwise appear unitary or redundant in the factor analysis of typical population samples. The interpretation of RT measures in exceptional cases is considered later in reviewing the uses of chronometry for clinical diagnosis of abnormal brain conditions, monitoring their progression, and longitudinally assessing the effects of specific treatments and drugs in individual patients. This is probably the field of the potentially most valuable practical uses of chronometry.

Empirical Evidence

Because of all the problems with the factor analysis of chronometric variables in the present literature, including the inordinate heterogeneity of the RT paradigms, apparatuses, and
sample data, it is wholly unfeasible to attempt a true meta-analysis of all the findings. Nevertheless, there are still a few consistent observations that can be discerned from the extant data. As these generalizations apply to speed-of-processing measures regardless of whether they are based on DT or RT (as previously defined), from here on I will use RT as the generic term for all measures of response time based on a manual response.

RT and MT are Distinct Factors

Although there is generally a slight but significant positive correlation between RT and MT, in factor analyses that include measures of both variables they load on two clear-cut orthogonal group factors. This is probably the most consistent and least ambiguous generalization that can be drawn from the whole factor analytic literature involving measures of RT and MT. Carroll's (1993) review of this evidence characterizes RT as a cognitive factor and MT as a psychomotor factor. Although worthy of study in its own right, MT is not considered in the subsequent discussion.

Elementary Information Processes are not yet Identified as Group Factors

I have not found a factor analysis in which three or more distinct variables intended to measure one of the hypothesized elementary cognitive processes were included in the same correlation matrix with enough other RT variables to allow the possible emergence of factors representing the hypothesized elementary processes. Examples of the best-known elementary processes are derived variables such as the slope of the Hick function (rate of information processing), the slope in the S. Sternberg paradigm (rate of retrieval of information from short-term memory, STM), and the NI-PI measure (name-identity RT minus physical-identity RT) from the Posner paradigm, reflecting the rate of retrieval of information from long-term memory (LTM). The overall mean RTs obtained from these three paradigms do, however, have a large common factor. Whether derived measures of different elementary processes form distinct group factors when residualized from their common factor has not yet been determined.

Significant Group Factors are Unidentified

Group factors orthogonal to G emerge in a few of the factor analyses and are large enough (in terms of the percentage of variance accounted for) to be considered significant factors. But there are two problems in trying to characterize them in psychological terms: (1) the few variables on which they are mainly loaded are too heterogeneous to reveal any common features that would provide a clue to the precise nature of the factor; and (2) there are too few large loadings to identify a factor, and the loadings are too similar in size to highlight any features that would afford a clue to what the factor common to the variables might be.

A Large Common Factor, G, Exists in all Reaction Time Variables

I have found four studies containing independent data sets comprising between six and nine different RT variables in which procedures for the ECT measures within each study


148 Clocking the Mind

were methodologically fairly homogeneous. (The intercorrelation matrices for each of these four data sets are found in the following studies: Hale & Jansen, 1994; Kyllonen, 1985, Table 3, also reported in Carroll, 1993, Table 11.7; Miller & Vernon, 1996; Roberts & Stankov, 1999.) The specific aim here is to determine how many substantial factors can be extracted from each matrix and what percentage of the total variance in each matrix can be accounted for by these factors, with the remaining variance representing the variables' uniqueness. Principal components (PC) analysis is probably best suited for this purpose. The first principal component (PC1) is interpreted as G, and the other substantial components are the raw material for group factors residualized from G in a hierarchical factor analysis. (Remember that the chronometric G being discussed here is not necessarily Spearman's psychometric g. The relationship between G and g is the subject of Chapter 9.)

The criterion used here for deeming a PC "substantial" is based on the eigenvalues (latent roots) of the correlation matrix of n RT variables. By the Kaiser-Guttman rule, only those PCs with eigenvalues of 1 or greater are considered substantial. The logic of this rule, simply, is that if one extracts PCs from the correlations among a large number of variables each consisting of purely random numbers (hence containing no true factors), all of the n extracted PCs' eigenvalues hover around 1. The first one or two are always slightly greater than 1 because of purely chance correlations in the matrix, and all the remaining components have very gradually decreasing eigenvalues of less than 1. (The sum of all of the eigenvalues is always equal to the number of variables, n. A particular PC's eigenvalue divided by n is the proportion of the total variance in all n variables accounted for by that PC.) When a typical correlation matrix is residualized from all of its PCs having eigenvalues of 1 or greater, the residualized correlation matrix typically looks just like a matrix of correlations among variables composed entirely of random numbers. Therefore, obtained PCs with eigenvalues smaller than 1 cannot be claimed to represent substantial latent variables.
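A small simulation can check the logic of the Kaiser-Guttman rule. The sketch below assumes NumPy and uses simulated data only. The sample size, the number of variables and the .8 loading are arbitrary illustrative choices, not values from any study reviewed here. The sketch computes the eigenvalues of a correlation matrix, confirms that they sum to n, and counts how many reach 1.

```python
import numpy as np

def pc_eigenvalues(data):
    """Eigenvalues of the correlation matrix of `data`
    (rows = subjects, columns = variables), largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order

def kaiser_guttman(eigenvalues):
    """Number of 'substantial' PCs: those with eigenvalue >= 1."""
    return int(np.sum(eigenvalues >= 1.0))

rng = np.random.default_rng(seed=1)
n_subjects, n_vars = 2000, 8  # hypothetical sample and battery size

# Purely random numbers: no true factors, so eigenvalues scatter around 1.
random_eigs = pc_eigenvalues(rng.standard_normal((n_subjects, n_vars)))

# Hypothetical one-factor data: every variable loads .8 on a common factor
# (0.8**2 + 0.6**2 = 1, so each variable has unit variance).
common = rng.standard_normal((n_subjects, 1))
unique = rng.standard_normal((n_subjects, n_vars))
factor_eigs = pc_eigenvalues(0.8 * common + 0.6 * unique)

print("random eigenvalues:    ", np.round(random_eigs, 2))
print("one-factor eigenvalues:", np.round(factor_eigs, 2))
print("sum of eigenvalues:", round(float(factor_eigs.sum()), 6))  # equals n
print("substantial PCs:", kaiser_guttman(factor_eigs))
print("PC1 share of total variance:", round(float(factor_eigs[0] / n_vars), 3))
```

With the random data the eigenvalues fall on both sides of 1, and their spread narrows as the sample grows. With the one-factor data only PC1 exceeds 1.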

The PC analyses of these four studies reveal two main points:

(1) All studies show a very large PC1 (or G factor). The PC1, on average, accounted for 57.4 percent (SD = 9.9 percent) of the total variance. This exceeds the percentage of variance accounted for by the PC1 of some standard psychometric test batteries, such as the Wechsler Intelligence Scales (about 40 percent). The chronometric G can only be interpreted at this point as general speed of information processing.

(2) In only one study (based on nine ECTs) was there a second component (PC2) that was substantial (eigenvalue = 1.6), accounting for 17.8 percent of the total variance. (The PC1 accounted for 43.7 percent.) The PC2 is rather ambiguous but seems to contrast RTs for the simpler and the more complex ECTs. In this study, in which the tasks were more liable to method variance than in the other three studies, the PC2 probably reflects different ratios of sensory-motor to cognitive abilities in RT performance on the different types of ECTs.
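The eigenvalues of n standardized variables sum to n, so a reported eigenvalue and a reported percentage of variance can be converted into each other. The short check below uses only the figures given above for the nine-ECT study. The function names are illustrative.

```python
def percent_of_variance(eigenvalue, n_vars):
    """Percentage of the total variance of n standardized variables
    accounted for by a component with the given eigenvalue."""
    return 100.0 * eigenvalue / n_vars

def eigenvalue_from_percent(percent, n_vars):
    """Inverse conversion: the eigenvalue implied by a reported percentage."""
    return percent * n_vars / 100.0

n_ects = 9
pc2_percent = percent_of_variance(1.6, n_ects)     # reported PC2 eigenvalue
pc1_eigen = eigenvalue_from_percent(43.7, n_ects)  # reported PC1 percentage
print(f"PC2: {pc2_percent:.1f}% of the total variance")
print(f"PC1 eigenvalue: {pc1_eigen:.2f}")
```

Both components clear the Kaiser-Guttman threshold of 1. PC1 has an eigenvalue of about 3.93, and 1.6/9 reproduces the 17.8 percent reported for PC2.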

Disregarding the ubiquitous sensory-motor aspect of RT, we are faced with a gaping question: can RTs on various ECTs reliably identify individual differences in any other cognitive latent traits besides a single common factor, the general speed of information processing?

The data provided by Hale and Jansen (1994) in the previous PC analyses seem to favor the hypothesis of an exclusive general factor accounting for all the reliable individual differences variance in different ECTs. This finding was explicated in a different context in Chapter 6 in relation to the high predictability of fast and slow groups' mean RTs on a considerable variety of ECTs, as shown in Figure 6.12.

The same kind of plot was made for individual subjects selected from the high and low extremes and the middle of the total distribution of overall mean RT on all tasks. The resulting plots, shown in Figure 8.2, confirm the very same effect as seen in the group mean RTs in Figure 6.12. That is, individual differences in all the various RT tasks lie along a single time dimension, the general speed of information processing. The differences in RT between the tasks reflect differences in task difficulty or complexity, as represented by the slope measures in Figures 6.12 and 8.2. If this finding is strongly confirmed in other RT data sets, it resolves into two key questions: (1) What accounts for the G factor in RTs? and (2) What accounts for the reliable mean differences in RT across various ECTs? These remain open questions, partly because most of the RT data reviewed here were based on healthy, bright, young university undergraduates. A significantly different picture might result if the same kind of analyses were done in more heterogeneous population samples, especially if they included individuals with various kinds of brain pathology. In any case, if chronometric G is found to have substantial external validity, various measures of it would be important variables in their own right even if no other authentic speed-of-processing factors independent of G could be found.

[Figure 8.2 plot: six panels, each plotting one subject's latencies against the group mean latencies (sec, 0-3), with the slope, intercept, and r² of the fitted regression line shown in each panel.]

Figure 8.2: Plots for six individuals' RTs based on the same tasks shown as group means in Figure 6.12. Subjects were selected from the high and low extremes and the middle of the total distribution of RTs in a sample of 40 undergraduate university students. Note the high values of r² despite the marked differences in the slopes and intercepts of the regression lines. The one exception is subject U14. Retesting of this subject would determine whether the exceptional deviations of the same data points represent a reliable individual differences phenomenon (from Hale & Jansen, 1994, reprinted with permission of the American Psychological Society).
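Each panel of Figure 8.2 summarizes one subject by the slope, intercept and r² of an ordinary least-squares regression of that subject's latencies on the group mean latencies for the same tasks. The sketch below shows how those three numbers are obtained. The latencies are hypothetical, invented to mimic a slow and a fast subject, and are not Hale and Jansen's data.

```python
def fit_line(x, y):
    """Ordinary least-squares regression of y on x.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical group mean latencies (sec) on six ECTs of increasing difficulty.
group_means = [0.40, 0.65, 0.90, 1.30, 1.80, 2.40]
# Hypothetical slow subject: roughly 1.8 x group latency minus 0.5 sec.
slow_subject = [0.23, 0.68, 1.10, 1.86, 2.75, 3.82]
# Hypothetical fast subject: roughly 0.7 x group latency plus 0.02 sec.
fast_subject = [0.30, 0.47, 0.66, 0.92, 1.28, 1.70]

for label, rts in [("slow", slow_subject), ("fast", fast_subject)]:
    b, a, r2 = fit_line(group_means, rts)
    print(f"{label}: slope={b:.2f} intercept={a:+.3f} r2={r2:.3f}")
```

Both hypothetical subjects yield r² close to 1 despite very different slopes. This is the pattern the figure shows: a single speed dimension on which subjects differ by a multiplicative factor.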

Notes

1. It should be noted that RT variables are particularly well suited to the factor analysis or principal components analysis of their raw-score variance-covariance matrix rather than their correlation matrix. The Pearson correlation coefficient is simply the standardized covariance, i.e., $\mathrm{Cov}_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) / N$ and $r_{XY} = \mathrm{Cov}_{XY} / (\sigma_X \sigma_Y)$. It makes no sense to factor analyze a covariance matrix composed of raw-score variables that are not all on a scale with the same equal units of measurement. RT, being a ratio scale, which is the highest level of measurement, is one of the few variables in psychological research that could justify the use of covariances instead of correlations in a factor analysis or components analysis. But I have not come across an instance of its use in RT research. So I have tried factor analyzing a few covariance matrices of RT data to find out how the results differ from those obtained by analyzing the standardized covariances, i.e., correlations. Several points can be noted: (1) The factors that emerge are usually quite similar, but not always, as judged by the correlations of the column vectors of factor loadings and by the congruence coefficients between the factors. (2) The variables' factor loadings obtained from the covariance matrix reflect an amalgam of both (a) the factor structure obtained from the correlation matrix and (b) the differences in the variances of the variables. (3) In the covariance analysis, variables that have similar variance tend to load more similarly on the same factors. (4) In the covariance analysis, the first, most general factor strongly reflects different variables' similarity in variance. The loadings of variables on the general factor are, in a sense, weighted by the magnitudes of their variances. These features of covariance analysis may be most informative in the case of RT variables when looking for those particular variables that are related to an external criterion, such as IQ or other psychometric scores, because it is known that the RT tasks with greater individual differences variance are generally more highly correlated with other cognitive measures, particularly psychometric g. If one wants to obtain factor scores that would best predict performance on psychometric tests, therefore, the optimal method should be to obtain them as component scores from a principal components analysis of the RT variables' raw covariance matrix. When a ratio scale with a true zero point is available, as in RT, some factor analysts (e.g., Burt, 1940, pp. 280-288) go a step further and suggest factor analyzing the mean raw cross products of the variables, i.e., $(\sum XY)/N$. This brings the mean difficulty level of the tasks, as well as their variances and intercorrelations, simultaneously to bear on the results of the analysis, whether by factor analysis or component analysis.
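The three kinds of matrix discussed in this note can be computed side by side. This is a minimal NumPy sketch under one assumption: the RT data are simulated, and the task means, SDs and sample size are invented for illustration. The sketch builds the covariance matrix with divisor N, standardizes it into the correlation matrix, forms Burt's mean raw cross products, and compares the first principal component of the covariance and correlation matrices.

```python
import numpy as np

def moment_matrices(X):
    """X: rows = subjects, columns = RT variables on the same ratio scale.
    Returns (covariance, correlation, mean raw cross products), divisor N."""
    n = X.shape[0]
    dev = X - X.mean(axis=0)
    cov = dev.T @ dev / n                 # Cov_XY = sum(dev_X * dev_Y) / N
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)         # r_XY = Cov_XY / (sd_X * sd_Y)
    cross = X.T @ X / n                   # Burt: (sum XY) / N, no centering
    return cov, corr, cross

def first_pc(m):
    """Unit-length weights of the first principal component of a symmetric matrix."""
    _, vecs = np.linalg.eigh(m)           # eigenvalues come back in ascending order
    v = vecs[:, -1]
    return v * np.sign(v.sum())           # remove the arbitrary sign

rng = np.random.default_rng(seed=0)
n_subjects = 500
speed = rng.standard_normal(n_subjects)           # hypothetical common speed factor
noise = rng.standard_normal((n_subjects, 3))
sds = np.array([20.0, 40.0, 80.0])                # hypothetical task SDs (msec)
means = np.array([300.0, 450.0, 700.0])           # hypothetical task means (msec)
X = means + 0.7 * sds * (speed[:, None] + noise)  # every pair correlates about .5

cov, corr, cross = moment_matrices(X)
print(np.round(corr, 2))
print("PC1 weights, correlation matrix:", np.round(first_pc(corr), 2))
print("PC1 weights, covariance matrix: ", np.round(first_pc(cov), 2))
```

In this simulation the three variables intercorrelate about equally, so the correlation-based PC1 weights them about equally. The covariance-based PC1 is dominated by the variable with the largest variance, which is point (4) of this note in miniature.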

2. Figure N8.1 illustrates the Spearman model, in which one common factor is extracted from a set of variables (V1-V9), with each variable loaded on a single factor (g) common to all the variables. Variance unaccounted for by the general factor is attributed to the variables' uniqueness (u).

[Figure N8.1: path diagram of the Spearman model, with g loading on each of V1-V9 and uniquenesses u1-u9.]

Figure N8.2 illustrates the Thurstone model, in which a number of uncorrelated factors (F1, F2, F3) are extracted. F1 may be a general factor. If the factors are varimax rotated, however, they remain uncorrelated (i.e., orthogonal factor axes), but the general factor variance is dispersed among all the common factors.

[Figure N8.2: path diagram of the Thurstone model, with uncorrelated factors F1, F2, and F3, variables V1-V9, and uniquenesses u1-u9.]

Figure N8.3 illustrates the bi-factor model (also called a nested model), in which a general factor is first extracted from the correlation matrix (as the first principal factor in a common factor analysis) and then the significant group factors are extracted from the variance remaining in the matrix. The group factors are uncorrelated because the general factor accounting for their intercorrelation was previously extracted. There is no hierarchical dependence of g on the group factors; because of this, the g factor is always slightly larger than the g extracted in a hierarchical analysis.

[Figure N8.3: path diagram of the bi-factor model for variables V1-V9, with uniquenesses u1-u9.]

Figure N8.4 shows a hierarchical model in which the general factor is arrived at by first extracting group factors. If these are correlated with one another, they allow a factor analysis of the group factors and the extraction of their common factor, g. In a matrix with very many variables there can be two levels of group factors, and the general factor then emerges from the third level of the factor hierarchy. Factor loadings at each successive lower level of the hierarchy are residualized from the more general factors at the higher levels. This creates an orthogonalized hierarchical structure in which every factor is perfectly uncorrelated with every other factor, thereby representing the correlations among all the measured variables in terms of a limited number of uncorrelated group factors.

[Figure N8.4: path diagram of the hierarchical model, with g at the apex, group factors below it, variables V1-V9, and uniquenesses u1-u9.]

These models can be treated either as exploratory factor analysis (EFA) or as confirmatory factor analysis (CFA). CFA uses statistical tests of the goodness-of-fit of different factor models to the data. Two or more different models are statistically contrasted against one another for their goodness-of-fit to the data in terms of their degrees of parsimony and conformity to certain theoretically derived expectations in explaining the correlational structure. The various models (except varimax when there is a large general factor) yield highly similar results, typically showing very high (>.95) coefficients of congruence between the different models, particularly for the general factor in the domain of cognitive abilities (Jensen & Weng, 1994).
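The coefficient of congruence used in such model comparisons is usually Tucker's. It is the cosine between two columns of loadings and, unlike a Pearson r, involves no centering on the column means. The sketch below assumes that definition and applies it to the hierarchical and bi-factor g loadings printed in Table N8.1.

```python
import math

def congruence(a, b):
    """Tucker's coefficient of congruence between two columns of factor loadings."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# g loadings of V1..V9 taken from Table N8.1.
g_hierarchical = [.72, .63, .54, .56, .48, .40, .42, .35, .28]
g_bifactor = [.74, .66, .57, .59, .52, .44, .44, .37, .30]

phi = congruence(g_hierarchical, g_bifactor)
print(f"congruence of the two g factors: {phi:.4f}")
```

The bi-factor g loadings are uniformly a little larger than the hierarchical ones, as stated in the description of Figure N8.3. Their pattern is nearly proportional, however, so the congruence comes out well above .99.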

Table N8.1 shows examples of different models of factor analysis, each applied to the same correlation matrix. Table N8.2 shows the results of (1) a principal components analysis of the same correlation matrix used in Table N8.1, and (2) the varimax rotation of the components.

Table N8.1: Three factor models applied to the same correlation matrix.¹

          Hierarchical factor       Bi-factor analysis        Varimax rotation of
          analysis                                            principal factors
Variable  g     F1    F2    F3      g     F1    F2    F3      F1    F2    F3
V1        .72   .37                 .74   .29                 .70   .30   .25
V2        .63   .31                 .66   .23                 .61   .26   .22
V3        .54   .26                 .57   .18                 .53   .22   .19
V4        .56         .42           .59         .37           .24   .63   .18
V5        .48         .36           .52         .29           .21   .54   .15
V6        .40         .30           .44         .23           .17   .45   .13
V7        .42               .43     .44               .41     .17   .15   .54
V8        .35               .36     .37               .33     .14   .12   .47
V9        .28               .29     .30               .25     .11   .10   .38

¹All factor loadings <.10 (constituting 0.50 percent of the total variance) are omitted.

Table N8.2: Principal components analysis and varimax rotation of the components based on the same correlation matrix used in the factor analyses in Table N8.1.

           Principal components   Varimax rotation of PCs¹
Variable   PC1    PC2    PC3        1      2      3
V1         .77   -.08   -.31      .76    .28    .20
V2         .72   -.09   -.37      .76    .21    .15
V3         .64   -.10   -.47      .78    .11    .09
V4         .66   -.27    .27      .29    .69    .14
V5         .60   -.29    .34      .21    .71    .11
V6         .52   -.32    .46      .07    .76    .07
V7         .53    .47    .09      .22    .13    .67
V8         .46    .52    .12      .14    .09    .69
V9         .38    .56    .20      .03    .07    .70

¹Rotated PCs are technically no longer principal components (nor are they common factors, as they contain uniqueness) and so are labeled 1, 2, and 3.
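An orthogonal rotation such as varimax leaves each variable's communality unchanged. The communality is the row sum of squared loadings, and the rotation only redistributes variance among the components. The sketch below uses that property to cross-check Table N8.2 against itself, with the loadings copied from the table. The function names are illustrative. The two sets of communalities agree to within about .02, and the small discrepancies presumably reflect the two-decimal rounding of the printed loadings.

```python
# Loadings copied from Table N8.2 (rows V1..V9).
unrotated = [  # PC1, PC2, PC3
    (.77, -.08, -.31), (.72, -.09, -.37), (.64, -.10, -.47),
    (.66, -.27, .27), (.60, -.29, .34), (.52, -.32, .46),
    (.53, .47, .09), (.46, .52, .12), (.38, .56, .20),
]
rotated = [  # varimax components 1, 2, 3
    (.76, .28, .20), (.76, .21, .15), (.78, .11, .09),
    (.29, .69, .14), (.21, .71, .11), (.07, .76, .07),
    (.22, .13, .67), (.14, .09, .69), (.03, .07, .70),
]

def communalities(loadings):
    """Row sums of squared loadings: each variable's variance accounted for."""
    return [sum(x * x for x in row) for row in loadings]

def column_ss(loadings):
    """Column sums of squared loadings (the eigenvalues, for unrotated PCs)."""
    return [sum(row[j] ** 2 for row in loadings) for j in range(len(loadings[0]))]

h2_before = communalities(unrotated)
h2_after = communalities(rotated)
for i, (before, after) in enumerate(zip(h2_before, h2_after), start=1):
    print(f"V{i}: h2 unrotated={before:.3f} rotated={after:.3f}")
print("eigenvalues of PC1-PC3:", [round(v, 2) for v in column_ss(unrotated)])
print("variance of rotated 1-3:", [round(v, 2) for v in column_ss(rotated)])
```

The total variance accounted for is the same before and after rotation. What changes is its distribution: about 3.22 of it sits on PC1 before rotation, while after rotation it is spread fairly evenly over the three components, as described for the Thurstone model.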


Chapter 9

Correlated Chronometric and Psychometric Variables

By far the most extensive literature on the relationship between chronometric and psychometric variables is found in the study of mental abilities, particularly general intelligence. Although the earliest empirical studies in this vein date back at least as far as the research of Galton in the late nineteenth century, over 95 percent of the literature on reaction time (RT) and individual differences in mental ability has accumulated over just the past two decades.

The virtual hiatus in this line of research lasted for about 80 years. It is one of the more bizarre and embarrassing episodes in the history of psychology, and one that historians in the field have not adequately explained. A chronology of the bare facts has been outlined elsewhere (Jensen, 1982, pp. 95-98); Deary (2000a, pp. 66-72) provides the fullest account of the misleading secondhand reports of the early studies perpetuated for decades in psychology textbooks. It is a marvelous demonstration of how utterly deficient studies escape criticism when their conclusions favor the prevailing zeitgeist.

The classic example here is the often-cited study by Clark Wissler (1870-1947), a graduate student working under James McKeen Cattell, the first American psychologist to be personally influenced by Galton. The circumstances of this study, overseen by this eminent psychologist and conducted in the prestigious psychology department of Columbia University, could not have been more auspicious. Published in 1901, Wissler's study tested Galton's notion that RT (and various other sensory-motor tests) is correlated with intelligence. The result was a pitifully nonsignificant correlation of -.02 between "intelligence" and RT. The null result of Wissler's test of Galton's idea is what was most emphasized in three generations of psychology textbooks. What their authors seldom pointed out was that all the cards were outrageously (but naïvely) stacked in favor of the null hypothesis, for example: (1) the severe restriction of the range of talent in the subject sample (Columbia University students), which has the statistical effect of limiting the obtained correlation; (2) "intelligence" was not measured psychometrically but merely estimated from students' grade point average, which in selective colleges is correlated with IQ not more than .50; and (3) the reliability of the RT measurements (based on only three trials) could not have been higher than 0.20, as determined with present-day chronometers. Under such conditions, a nonsignificant result was virtually predestined. Yet for decades this study was credited with having dealt the heaviest blow against the Galtonian position!
It remained the standard teaching about the relationship between RT and IQ until recently. This persisted in apparently total blindness to a now historic classic by the English psychologist Charles Spearman (1863-1945), published in 1904 in the American Journal of Psychology. That paper gave detailed notice of the methodological inadequacies of Wissler's study and also introduced the statistical formula for correcting a correlation coefficient for attenuation (unreliability) due to measurement error.
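Two of the handicaps listed above have standard statistical corrections. One is Spearman's correction for attenuation, $r_{\text{true}} = r_{XY} / \sqrt{r_{XX}\, r_{YY}}$. The other is the correction for restriction of range. The range correction shown here is Thorndike's Case 2 formula for direct selection on one variable, a standard psychometric formula that the text does not itself give. The numbers in the sketch are hypothetical and are not a reanalysis of Wissler's data.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman (1904): estimated correlation between true scores, given the
    observed correlation and the reliabilities of the two measures."""
    return r_xy / math.sqrt(r_xx * r_yy)

def correct_for_range_restriction(r, u):
    """Thorndike's Case 2 correction for direct selection on one variable.
    u = SD in the unrestricted population / SD in the restricted sample."""
    return r * u / math.sqrt(1 + r * r * (u * u - 1))

# Hypothetical values: observed r = -.10, RT reliability .20,
# criterion reliability .80, population SD 1.5 times the sample SD.
r_true = correct_for_attenuation(-0.10, 0.20, 0.80)
r_full_range = correct_for_range_restriction(r_true, 1.5)
print(f"corrected for attenuation: {r_true:.2f}")
print(f"also corrected for range restriction: {r_full_range:.2f}")
```

With an RT reliability as low as .20, even a tiny observed correlation corresponds to a much larger true-score correlation. Range restriction shrinks it further still, which is why a near-zero result under Wissler's conditions says little about the true relationship.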

When I began doing research on the correlation between RT and IQ, in the late 1970s, nearly every psychologist I spoke to about it was at best skeptical or tried to disparage and discourage the idea, in the firm conviction that earlier research had amply proved the effort to be utterly fruitless. Their annoyance with me for questioning this dogma was evident, despite my pointing out that I could find no valid basis for it in the literature. But I did find at least a dozen or so published but generally unrecognized studies (some reviewed by Beck, 1933) that made my venture seem a reasonably good bet. My friends' surprisingly strong conviction and even annoyance that my research program was taking a wrong turn decidedly increased my motivation to pursue the subject further. I was further encouraged by the revival of chronometry for the study of individual differences in the promising research of Earl Hunt and co-workers (1975) at the University of Washington and also that of Robert J. Sternberg (1977) at Yale University. At the time I sensed a changing attitude in the air that perhaps presaged a second chance for the role of mental chronometry in differential psychology.

But I have still often wondered why there was so strong an apparent prejudice against the possibility of finding that RT and mental speed are somehow related to intelligence. Why had this idea been resisted for so long by so many otherwise reasonable psychologists? The most likely explanation, I suspect, is the entrenchment of the following attitudes and implicit beliefs in many psychologists. These attitudes were bedrock in the psychological zeitgeist throughout most of the twentieth century.

(1) Any performance measurable as RT to an elementary task is necessarily much too simple to reflect the marvelously subtle, complex, and multifaceted qualities of the human intellect. A still pervasive legacy from philosophy to psychology is the now largely implicit mind-body dualism, which resists reductionist physical explanations of specifically human psychological phenomena. Any kind of RT was commonly viewed as a merely physical motor reaction rather than as an attribute of mind. Disbelievers in the possibility of an RT-IQ connection pointed out that many lower animals, for instance frogs, lizards, and cats, have much faster RTs than do humans (which in fact is true). And when confronted with good evidence of an RT-IQ correlation, they dismissed it as evidence for the triviality of whatever is measured by the IQ. These obstacles to research on RT are supported by belief systems, not by empirical inquiry.

(2) The speed of very complexly determined cognitive behavior is often confused with the sheer speed of information processing. It is noted, for example, that duffers at chess seldom take more than a minute or two for their moves, while the greatest chess champions, like Fischer and Kasparov, at times take up to half an hour or more to make a single move. Or it is pointed out that acknowledged geniuses, such as Darwin and Einstein, described themselves as "slow thinkers." Or that Rossini could compose a whole opera in less time than Beethoven would take to compose an overture. "Fast but superficial, slow but profound" is a common notion in folk psychology. But these anecdotes take no account of the amount or the "depth" of mental processing that occurs in a highly complex performance. The few times I have played against a chess master (and always lost), I noticed that all their responses to my moves were exceedingly quick, a second or two. But in tournament competition against others near their own level of skill, these chess masters typically take much more time in making their moves. Obviously they must process a lot more complex chess information when competing against their peers.

(3) Applied psychologists have resisted pursuing the RT-IQ relation mainly for practical reasons. There has existed no suitable battery of RT measures shown to have a degree of practical validity for predicting external variables comparable to the validity of psychometric tests (PTs). Nor would RT tasks be as economical, as they require individual testing with a computerized apparatus and special software. So it is unlikely that RT tests could take over the many practical uses of standardized PTs, whether individually or group administered. This is presently true. But RT methods have been conceived as serving mainly the purely scientific purpose of testing analytic hypotheses concerning the elemental sources of individual differences in the established factors of mental ability identified by complex PTs.

(4) Psychometricians have downgraded RT as a measure of cognitive ability because RT is mistakenly assumed to measure the same kind of test-taking speed factor that has been identified in PT batteries. This test-speed factor is observed in very simple tasks that virtually all subjects can easily perform. Individual differences in these highly speeded tests can be reliably measured only if the task is scored in terms of how many equally simple items the subject can execute within a short time limit, such as 1 or 2 min. Examples are the Digit Symbol and Coding subtests of the Wechsler Intelligence Scales. The common variance in these speeded tests typically shows up in a large factor analysis of various PTs as a small first-order factor with a weak relation to psychometric g. Its most distinguishing characteristic is its very low correlation with nonspeeded power tests, such as Vocabulary, General Information, and Similarities, or the Raven matrices. Various types of choice RT, on the other hand, have their highest correlations with the most highly g-loaded nonspeeded PTs. They show their lowest correlations with the speeded tests that define the psychometric speed factors, such as coding, clerical checking, and making Xs, which have the lowest g loadings of any PTs. In this respect, the psychometric speed factor is just the opposite of RT measures. So the mistaken equating of mental speed as measured in chronometric paradigms with scores on highly speeded PTs has given the former a bum rap.

The idea that mental speed may be importantly involved in variation in human intelligence was not universally deprecated in American psychology. Early on, one of the pioneers of psychometrics, Edward L. Thorndike (1874-1949), the most famous student of J. McKeen Cattell, attributed a prominent role to mental speed in his set of principles for the measurement of intelligence (Thorndike, Bregman, Cobb, & Woodyard, 1927). He also referred to these principles as the "products of intelligence." Within certain specified conditions and limits, all of these hypothetical generalizations have since been proved empirically valid. What is now needed is a unified theory that can explain each of them and the basis of their interrelationships. These five principles, stated in Thorndike's The Measurement of Intelligence (1927), well summarize some of the most basic phenomena that a theory of individual differences in intelligence needs to explain, so they are worth quoting in full:

1. Other things being equal, the harder the tasks a person can master, the greater is his intelligence (p. 22).

2. Other things being equal, the greater the number of tasks of equal difficulty that a person masters, the greater is his intelligence (p. 24).

3. Other things being equal, the more quickly a person produces the correct response, the greater is his intelligence (p. 24).

4. Other things being equal, if intellect A can do at each level [of difficulty] the same number of tasks as intellect B, but in less time, intellect A is better. To avoid any appearance of assuming that speed is commensurate with level or with extent, we may replace "better" by "quicker" (p. 33).

5. It is important to know the relation between level [difficulty] and speed for two reasons. If the relation is very close, the speed of performing tasks which all can perform would be an admirable practical measure of intellect. The record would be in time, an unimpeachable and most convenient unit (p. 400).

A year before the appearance of Thorndike's 1927 book, two psychologists at Harvard published a small study based on only five subjects. RT was reported to be correlated a phenomenal -.90 and -1.00 with scores on two intelligence tests. In retrospect, these correlations are recognized as obvious outliers, which is not surprising for such small-sample correlations. It is amazing that the study was not immediately repeated with a much larger sample! Nevertheless, the authors' conclusion was on target in noting the promise suggested by their experiment: "If the relation of intelligence (as the tests have tested it) to RT of any sort can finally be established, great consequences, both practical and scientific, would follow" (Peak & Boring, 1926, p. 94).
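How little a five-subject correlation pins down can be shown with an approximate confidence interval based on Fisher's z transformation, $z = \operatorname{artanh}(r)$, with standard error $1/\sqrt{N-3}$. Fisher's z is a standard technique that the text does not itself mention. The sketch applies it to the reported -.90. The reported -1.00 cannot be handled this way, because its z is infinite.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r via Fisher's z.
    Requires |r| < 1 and n > 3."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = fisher_ci(-0.90, 5)
print(f"r = -.90, N = 5:   95% CI from {lo:.2f} to {hi:.2f}")
lo_big, hi_big = fisher_ci(-0.90, 500)
print(f"r = -.90, N = 500: 95% CI from {lo_big:.2f} to {hi_big:.2f}")
```

With N = 5 the interval runs from nearly -1 to nearly 0, so the observed value is compatible with almost any negative correlation. With N = 500 the same r would be pinned down to within a few hundredths.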

Chronometric Correlations with Conventional Mental Tests

The fact that chronometric measures are correlated with scores on PTs of mental abilities is now firmly established. Active researchers in this field have reviewed much of this evidence from various theoretical perspectives and have drawn remarkably similar conclusions (Caryl et al., 1999; Deary, 2000a, b; Detterman, 1987; Jensen, 1982, 1985, 1987a, 1998, Chapter 8; Lohman, 2000; Neubauer, 1997; Vernon, 1987). A true meta-analysis of all this evidence, however, is neither feasible nor likely to be very informative. The great heterogeneity of the subject samples, the obtained correlations, and the testing conditions of both the chronometric and psychometric variables calls for a different kind of summary of the empirical findings.

Overall, I estimate that less than 5 percent of the correlations reported in the RT-IQ literature are on the "wrong" side of zero, most probably due to errors of sampling and measurement. The vast majority of all the reported correlations between RT and mental test scores are negative, with their distribution centered well below the zero point on the scale of possible correlations. But of greater theoretical interest than the overall mean and SD of these correlations are the task conditions that systematically govern variation in their magnitudes.

There are many possible ways to categorize the relevant data. Probably the most informative from the standpoint of theory development are the following main types of quantitative relationship between RT tasks and PTs:

(a) Comparing the mean levels of selected high- and low-PT criterion groups on various measures of RT.

(b) Zero-order correlations between single RT tasks and single PTs.

(c) Multiple correlations between two or more RT tasks and a single PT, and vice versa.

(d) Correlations between latent traits, e.g., (i) canonical correlation between two or more RT and two or more PT variables; (ii) correlations between PT g factor scores and single RT variables; (iii) factor analysis of a correlation matrix containing a variety of both RT and PT variables.

Nested in the above categories are other conditions that can affect the magnitude of the RT-PT correlation:

(a') the particular kind of RT measurement (e.g., mean RT, reaction time standard deviation (RTSD), slope);

(b') the specific content of the PT variable and the RT variable (e.g., verbal, numerical);

(c') the particular elementary cognitive task (ECT) paradigm on which RT is measured;

(d') the range of RT task difficulty, information load, or features of RT tasks that can be ordered on a "complexity" continuum; and

(e') characteristics of the subject sample (range of ability, age, sex, educational level).

Comparison of Criterion Groups

The simplest RT paradigm on which extensive ability group comparisons are available is the Hick paradigm. Figure 9.1 shows the mean RTs of three large groups of young adults of similar age selected from different regions of the IQ and scholastic achievement continuum. All were tested on the same RT apparatus under virtually identical conditions. An institutionalized group of 60 severely retarded adults (mean IQ about 40) was also tested on the Hick task with the same apparatus, but, on average, they did not conform to Hick's law, i.e., the linear regression of RT on bits (Jensen, Schafer, & Crinella, 1982). It is the only group reported in the RT literature that does not display Hick's law and also the only group for which movement time (MT) is slower than RT. However, in most groups of normal intelligence there are a few individuals who do not conform to Hick's law (see Jensen, 1987a, pp. 119-122). We also found that individuals with an IQ below 20 could not perform the Hick task, usually perseverating on the button used for the 0 bit condition regardless of the number of buttons exposed in subsequent trials. Nor was it possible to inculcate the Hick skill for the full range of information load (0-3 bits) in a half-hour training period.

[Figure 9.1: line plot of mean RT (ms, 200 to 800) against bits (0 to 3) for the Lo, Ave, and Hi groups.]

Figure 9.1: Mean RT in the Hick paradigm with 1-, 2-, 4-, and 8-choice RT tasks corresponding to information loads of 0, 1, 2, and 3 bits, in three young adult groups at three levels of ability labeled Hi (university students [N=588]), Ave (vocational students [N=324]), and Lo (employees in a sheltered workshop [N=104]), with the respective groups' mean IQs of 120, 105, and 75. The intercepts (in milliseconds) and slopes (in milliseconds per bit) of the Hick function for each of the groups are: Hi, 295 and 25; Average, 323 and 35; Lo, 483 and 96. (Data from Jensen, 1987a.)
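Hick's law, as used here, is the linear regression of RT on bits, RT = a + b·log2(n), where n is the number of choice alternatives. As a minimal sketch, the snippet below plugs the intercepts and slopes from the Figure 9.1 caption into that equation. The function names are illustrative, and the outputs are the fitted Hick lines, not the groups' observed means.

```python
import math

# Intercepts (ms) and slopes (ms per bit) as reported in the Figure 9.1 caption.
HICK_PARAMS = {
    "Hi": (295.0, 25.0),
    "Ave": (323.0, 35.0),
    "Lo": (483.0, 96.0),
}


def hick_rt(intercept_ms, slope_ms_per_bit, n_choices):
    """Predicted mean RT under Hick's law: RT = a + b * log2(n)."""
    bits = math.log2(n_choices)  # 1, 2, 4, 8 choices -> 0, 1, 2, 3 bits
    return intercept_ms + slope_ms_per_bit * bits


def group_curves(params=HICK_PARAMS, choices=(1, 2, 4, 8)):
    """Predicted mean RT at each set size for every group."""
    return {
        group: [hick_rt(a, b, n) for n in choices]
        for group, (a, b) in params.items()
    }


for group, rts in group_curves().items():
    print(group, [round(rt) for rt in rts])
```

The Lo group's line rises almost four times as steeply as the Hi group's (96 vs. 25 ms per bit), so the group differences widen as information load increases.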

These three criterion groups (in Figure 9.1) also differed on average in the intraindividual variability of RT (RTSD), measured in milliseconds as the average standard deviation (SD) of each individual's RTs over 30 trials: Hi = 37, Average = 52, Lo = 220. RTSD increases linearly not as a function of bits (i.e., the logarithm of the number n of possible alternatives of the response stimulus) but directly as a function of n itself. RTSD is generally more highly related to intelligence differences than is RT (Jensen, 1992).
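RTSD as described above is computed per subject over trials and then averaged within a group. A minimal sketch follows, assuming the sample (n - 1) standard deviation; the source does not say which SD formula was used, and the trial values are hypothetical.

```python
import statistics


def rt_summary(trial_rts_ms):
    """Return (mean RT, RTSD) for one subject's trials, in ms.

    RTSD is the intraindividual SD of RT across trials; the sample
    (n - 1) standard deviation is assumed here.
    """
    return statistics.mean(trial_rts_ms), statistics.stdev(trial_rts_ms)


def group_mean_rtsd(subjects):
    """Average the per-subject RTSDs over a group of subjects."""
    return statistics.mean(rt_summary(trials)[1] for trials in subjects)


# Hypothetical trials for two subjects (not data from the source).
subject_a = [300, 320, 340]
subject_b = [350, 390, 430]
print(rt_summary(subject_a))
print(group_mean_rtsd([subject_a, subject_b]))
```

Averaging SDs rather than pooling all trials keeps RTSD a measure of each person's own trial-to-trial inconsistency, separate from differences between people's mean RTs.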

RT is also discriminating at the high end of the ability scale, as shown in Figure 9.2, which compares the mean RTs of three criterion groups on each of eight different ECTs that span a considerable range of difficulty, as indicated by their mean RTs. The various RT tests are described elsewhere (Cohn, Carlson, & Jensen, 1985). The three criterion groups were 50 university undergraduates (Un); 60 academically gifted students (G), mean age 13.5 years, all with IQ above 130, enrolled in college-level courses in math and science; and 70 nongifted (NG) junior high students in regular classes, mean age 13.2 years, scoring as a group 1 SD above California statewide norms in scholastic achievement. The Un and G groups do not differ significantly in mean RT, but both differ by about 1.3 SD from the NG group. Even on the simplest test (lifting the index finger from the home button when a single light goes "on"), the G and NG groups differ on average by 54 ms. On all tasks, they differ by 700 ms, on average, a difference amounting to approximately 1.5 SD. Overall, the NG group's processing times are 1.6 times as long as those of the G group. The profiles of mean latencies (RT) for each of the three groups, however, are highly similar, with correlations of .98 between each of the three pairs of profiles. Similar effects were found for measures of RTSD.

[Figure 9.2: mean latency (ms, 300 to 1700) of the NG, G, and Un groups on eight processing tasks labeled RT mean, Digit, DT3 digit, DT2 digit, SD2, DT2 words, DT3 words, and SA2.]

Figure 9.2: Mean latency (RT) on various processing tasks in three groups: university students (Un), gifted (G), and nongifted (NG) children (aged 12-14 years). (From Cohn, Carlson, & Jensen, 1985, with permission of Elsevier.)

Another study compared academically gifted (G) and nongifted (NG) pupils, ages 11-14 years, differing 1.74 SD on Raven's Advanced Progressive Matrices test. Three easy RT tasks of increasing complexity were given: (1) simple reaction time (SRT), (2) 8-choice reaction time (CRT) (in the Hick paradigm), and (3) Odd-Man-Out (OMO) discrimination. They showed the following NG-G differences in mean RT (milliseconds) and RTSD (Kranzler, Whang, & Jensen, 1994).

                                  SRT     CRT     OMO
Difference in mean RT (ms)          9      41     138
Effect size (σ units)            0.15    0.78    1.24
Difference in mean RTSD (ms)        7      22      71
Effect size for RTSD (σ units)   0.23    0.79    1.34
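The effect sizes in this table express the NG-G differences in SD (σ) units. A minimal sketch of the usual computation follows, assuming a pooled within-group SD as the denominator; the source does not state which SD was used, and the means, SDs, and group sizes below are hypothetical.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled within-group SD from two groups' SDs and sizes."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def effect_size(mean_ng, mean_g, sd_ng, n_ng, sd_g, n_g):
    """Difference between group means (NG - G) in pooled-SD units."""
    return (mean_ng - mean_g) / pooled_sd(sd_ng, n_ng, sd_g, n_g)


# Hypothetical: a 41-ms difference with both group SDs equal to 50 ms.
d = effect_size(mean_ng=441.0, mean_g=400.0, sd_ng=50.0, n_ng=30, sd_g=50.0, n_g=30)
print(round(d, 2))
```

Expressing differences this way lets the SRT, CRT, and OMO columns be compared directly even though their raw RT scales differ.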

A Brinley plot is an especially revealing graphical method for contrasting different criterion groups simultaneously on a number of variables, because the goodness-of-fit of the data points (i.e., various tests) to a linear function reflects the degree to which the various tests are measuring a global factor that differentiates the criterion groups. The Brinley plot, of course, is meaningful only for ratio-scale measurements. In one of the important articles in the RT literature, Rabbitt (1996) gives several examples of Brinley plots based on RT, each of them showing about the same picture for criterion groups that differ on PTs of ability. In one example, the subjects were 101 elderly adults (aged 61-83). Two criterion groups, closely matched on age, were selected on the Cattell Culture Fair (CF) Intelligence Test. The low IQ group had CF raw scores between 11 and 29 points; the high group had CF scores between 29 and 40. The low and high CF means differ by about 3σ. In the Brinley plot of these data, shown in Figure 9.3, the bivariate data points for 15 simple tasks with mean RTs ranging between 200 and 1750 ms fall on a regression line that accounts for 99 percent of their variance. As the ECTs increase in information-processing load, the differences between the high and low CF groups increase by a single global multiplicative factor. This factor differs between individuals, strongly affecting processing speed on every test. The larger the task's information load, the more the total processing time is increased by a constant factor for each individual, a factor clearly related to IQ. But it is also instructive to consider Figure 9.4, which shows the relationship between the mean RTs on these 15 ECTs and the magnitudes of their Pearson correlations with the CF IQ. The fact that the data points do not fall very close to the regression line (which accounts for r² = .56 of the variance) indicates that some other factor(s) besides sheer processing speed (for instance, working memory) are probably involved in the ECTs' correlations with IQ.
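In a Brinley plot each task contributes one point: the high-ability group's mean RT on the x axis and the low-ability group's mean RT on the y axis. A fitted line with slope above 1 indicates a single multiplicative slowing factor. The least-squares sketch below is illustrative only: the "low" values are hypothetical, generated exactly from the equation printed in Figure 9.3 (Y = -0.0679 + 1.33X, in seconds), so the fit simply recovers those coefficients.

```python
import statistics


def brinley_fit(high_group_rts, low_group_rts):
    """Least-squares line low = intercept + slope * high, plus r squared.

    Each list holds one mean RT per task, in the same task order.
    """
    mx = statistics.fmean(high_group_rts)
    my = statistics.fmean(low_group_rts)
    sxx = sum((x - mx) ** 2 for x in high_group_rts)
    syy = sum((y - my) ** 2 for y in low_group_rts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(high_group_rts, low_group_rts))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy**2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical task means (s) for the high-IQ group; the low-IQ means are
# generated exactly from the line reported in Figure 9.3.
high = [0.25, 0.50, 0.80, 1.10, 1.40]
low = [-0.0679 + 1.33 * x for x in high]
a, b, r2 = brinley_fit(high, low)
print(f"intercept={a:.4f}  slope={b:.2f}  r^2={r2:.2f}")
```

With real task means, the r² of this fit is the goodness-of-fit index described above, and the slope estimates the global multiplicative factor separating the groups.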


Essentially, the same effect is found for Brinley plots that compare overlapping groups of young adults of average (N=106) and high (N=100) ability (mean IQs = 100 and 120) tested on seven ECTs that vary in information load (Vernon, Nador, & Kantor, 1985). Plots are shown for both mean RT and mean RTSD in Figure 9.5. Though RTSD shows a very large multiplicative effect, less of its variance is explained by the linear regression than in the case of mean RT.

Correlations of Single Chronometric and Psychometric Variables

Typically, the smallest chronometric-psychometric (C-P) correlations are found between single C and P variables. This is particularly true for single C variables because they are usually much more homogeneous in content and cognitive demands than are P tests, which are invariably composed of a considerable number of varied items scored right or wrong for which the total score reflects a considerably broader factor than the score on a single C test. Increasing the number of trials given on a particular C test increases the reliability of measurement of the subject's mean (or median) RT and RTSD, but it has either no effect or a diminishing effect on the breadth of the factor it measures.
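The gain in reliability from adding trials can be illustrated with the Spearman-Brown prophecy formula, r_k = k·r / (1 + (k - 1)·r), where r is the reliability at the original length and k is the factor by which the number of trials is multiplied. This is a standard psychometric result offered as an illustration, not a computation from the source; the baseline reliability is hypothetical. Note that the formula concerns reliability only and says nothing about the breadth of the factor measured, which is the point made above.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying test length by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical: 15 trials give a mean-RT reliability of .60.
for trials in (15, 30, 60, 120):
    k = trials / 15
    print(trials, round(spearman_brown(0.60, k), 3))
```

Each doubling of trials buys less additional reliability than the one before, while the content of what is measured stays the same.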

[Figure 9.3: scatter of mean response times (s) on 15 tasks for the Low IQ group (Y) against the High CF IQ test scorers (X), with fitted line Y = -0.0679 + 1.33X, r² = 0.99.]

Figure 9.3: A Brinley plot of response time measures (in seconds) on 15 tasks given to adult groups in the lower (Low IQ) and upper (High IQ) halves of the distribution of scores on the CF Intelligence Test. The data points are well fitted by the linear regression (r² = .99). The dashed line (added by ARJ) is the hypothetical regression line assuming the Low IQ group performed exactly the same as the High IQ group. (From Rabbitt, 1996, with permission of Ablex.)


[Figure 9.4: scatter of the 15 ECTs' correlations with CF IQ against their mean RTs (ms, 0 to 2500), r = 0.75.]

Figure 9.4: A correlation scatter diagram showing the relationship (r = .75) between the mean RTs on 15 ECTs and these ECTs' correlations with the Cattell CF IQ. (Graph based on data from Rabbitt, 1996, p. 82, Table 1.)

[Figure 9.5: left panel, Average IQ group mean RT against High IQ group mean RT (ms, 400 to 1400); right panel, Average IQ group mean RTSD against High IQ group mean RTSD (ms, 100 to 500), r = 0.76.]

Figure 9.5: Brinley plots comparing High IQ and Average IQ groups on mean RT (left panel) and mean RTSD (right panel) for seven diverse ECTs. The solid line represents the hypothetical regression line assuming the Average IQ group performed the same as the High IQ group. (Graphs based on data from Vernon, Nador, & Kantor, 1985, pp. 142-144, Tables 1 and 3.)


The following brief collection of examples is not intended to be a comprehensive review. The examples were selected from a wide variety of ECTs so as to give a fair idea of the magnitude and variability of the correlations found between single C and P variables. Much of the variability in C-P correlations, even for the same ECT paradigm, is associated with differences in the range of psychometric ability within the different subject samples. How much of the variability in C-P correlations for nominally the same ECT is associated with differences in laboratory equipment, procedures, and test conditions is unknown; these variables have not yet been systematically studied.

Hick Paradigm

The simplicity of the Hick paradigm makes it a good example. Beyond the 0 bit condition (SRT), it is a set of CRTs usually based on 2, 4, and 8 possible choices, which are scaled as 1, 2, and 3 bits. The simplicity of the tasks is shown by the fact that even in the hardest condition (3 bits) normal adults have RTs of less than 500 ms. Figure 9.6 shows the unweighted and N-weighted mean correlations between RT and IQ as a function of bits within 15 independent groups that are each considerably more homogeneous in IQ than is the general population (Jensen, 1987a). The correlations are all quite low, but it is of theoretical interest that they are a linear function of bits. The N-weighted and unweighted correlations between IQ and the aggregated RT over all bits, corrected for both restriction of range and attenuation, are -.31 and -.39, respectively. This is about as close as we can come to estimating the population RT-IQ correlation in the Hick paradigm (Jensen, 1987a). (The corresponding RTSD-IQ correlations are -.32 and -.42.) When Hick data were obtained with a quite different apparatus in a sample of 860 U.S. Air Force enlistees, the RT-IQ correlations were slightly higher than those in Figure 9.6, ranging from -.22 to -.29, but surprisingly in this sample there was no significant trend in the size of the correlations as a function of bits (Detterman, 1987). The IQ-RT correlation for the combined RT data, corrected for restriction of range in the Air Force sample, was -.47, which can be regarded as another statistical estimate of the population correlation.

[Figure 9.6: unweighted and weighted mean RT-IQ correlations (-0.17 to -0.25) plotted against bits (0 to 3).]

Figure 9.6: Unweighted and N-weighted mean correlations between RT and IQ as a function of bits in the Hick paradigm, based on 15 independent samples with total N = 1129. (From Jensen, 1987a, p. 164, with permission of Ablex.)
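Corrections for restriction of range like those above adjust a correlation observed in a sample that is more homogeneous than the population. The source does not say which formula was applied; the sketch below uses Thorndike's Case II correction for direct selection on one variable, a common choice. The correlation and SDs in the example are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_population, sd_sample):
    """Thorndike Case II correction for direct restriction of range.

    u is the ratio of the unrestricted (population) SD to the restricted
    (sample) SD on the selection variable, here IQ.
    """
    u = sd_population / sd_sample
    return (r_restricted * u) / math.sqrt(1 - r_restricted**2 + (r_restricted * u) ** 2)


# Hypothetical: r = -.25 in a sample whose IQ SD is 10, population SD 15.
print(round(correct_for_range_restriction(-0.25, 15.0, 10.0), 3))
```

The sign of the correlation is preserved, and when the sample SD equals the population SD the correction leaves r unchanged.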

In a very heterogeneous group with Wechsler Performance IQs ranging from 57 to 130, the RT-IQ correlations were especially large, and there was a strong linear trend in RT-IQ correlations as a function of bits, shown in Figure 9.7. The distribution of IQs in this specially selected sample does not represent any intact population group, being more rectangular than normal, but it strongly confirms the phenomenon of the Hick RT-IQ correlation and its linear regression on task complexity scaled as bits.

In a truly random population sample of 900 middle-aged adults (age around 56 years) in Scotland, the observed RT-IQ correlations for simple RT and 4-choice RT were -.31 and -.49, respectively. This sample included the full range of IQs in the population, the distributions of both RT and IQ were close to normal (perfectly Gaussian for the middle 95 percent of the distribution), and the RT-IQ correlations did not vary significantly between subgroups defined by age, social class, education, or error rates in the 4-choice RT (Deary, Der, & Ford, 2001; also see Der & Deary, 2003, for further analyses of these data). Given this excellent population sample, it is especially instructive for a theory of the RT-IQ correlation to view the particular form of the relationship between SRT and CRT in relation to IQ (reported in terms of deciles of the total distribution on the Alice Heim 4 IQ test), shown in Figure 9.8. Note especially the linear regression of CRT on SRT (CRT = 260.6 + 1.30 SRT, r = .98). It indicates that, as a function of IQ, CRT changes by a constant multiple (1.30) of the change in SRT. (The goodness-of-fit to linearity would probably be even better for a measure of g.) The same near-linear trend of CRT as a decreasing function of IQ, as shown in Figure 9.8, was also found for each of three different CRT tasks in a group of college students, all of them in the upper half of the population-normed distribution on the CF IQ test (Mathews & Dorn, 1989, p. 311).

[Figure 9.7: correlations between choice RT and IQ (-0.50 to -0.75) plotted against bits (log2 n, 0 to 3).]

Figure 9.7: Correlations between choice RT and IQ as a function of the number of choice alternatives scaled in bits, in a group of 48 persons with Wechsler Performance IQs ranging from 57 to 130. (Data from Lally & Nettelbeck, 1977.)

[Figure 9.8: left panel, SRT and CRT (ms, 300 to 900) by IQ decile from Low (1) to High (10); right panel, CRT (ms) against simple reaction time (ms, 300 to 450), with fitted line CRT = 260.6 + 1.30 SRT, r = 0.98.]

Figure 9.8: Left: SRT and CRT plotted as a function of IQ in deciles from lowest (1) to highest (10). Right: CRT plotted as a function of SRT. (Data from Der & Deary, 2003, Table 2.)

Memory Scan (MS) and Visual Scan (VS) Tasks

These are also known, after their inventors, as the Sternberg (1966) and the Neisser (1967) paradigms. In the MS task, a series of 1-7 digits (called "the positive set") is presented for a brief time (3 s); then a single digit appears, to which the subject responds as quickly as possible by pressing one of two buttons labeled YES and NO, indicating whether the single digit was or was not a member of the positive set. The mean RTs increase as a linear function of set size. In the VS task, the order of stimulus presentation is just the reverse of that in the MS task. First, a single digit is presented (hence making little demand on memory). Then a set of 1 to 7 digits is shown, which the subject visually scans and responds YES or NO according to whether the single digit is present or absent in the scanned set. The RTs for MS and VS hardly differ, and individuals' RTs on MS and VS are correlated .91 (disattenuated r = 1.00) (Jensen, 1987b). This indicates that one and the same latent variable is involved in individual differences on each of these rather dissimilar paradigms, which differ mainly in their demands on short-term memory (STM).
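A disattenuated correlation estimates what the correlation would be if both measures were perfectly reliable: r_xy / sqrt(r_xx · r_yy). The reliabilities of the MS and VS scores are not given here, so the values in this sketch are hypothetical, chosen only to show how an observed .91 can correct to 1.00.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Correlation corrected for unreliability: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical reliabilities (not given in the source): if MS and VS mean RTs
# each had reliability .91, an observed r of .91 would disattenuate to 1.00.
print(round(disattenuate(0.91, 0.91, 0.91), 2))
```

Because the reliabilities are themselves estimates, the corrected value can exceed 1.0 through sampling error.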

The correlations of individual differences in mean (or median) RT (and derived parameters) on MS and VS with "IQ" (i.e., scores on various intelligence tests), averaged over five independent studies totaling 314 subjects, are shown in Table 9.1. The mean RT and RTSD consistently show rather low but significant correlations with psychometric intelligence, whereas the correlations of the derived measures, the intercept and slope of the regression of RT on set size, are much smaller, and some are even on the theoretically "wrong" side of zero. Theoretically, of course, the intercept represents only the sensory-motor component of RT (in effect, RT for set size = 0) rather than any cognitive processing component, so a zero correlation with IQ is not expected. In reality, however, SRT does have some cognitive component because of the subject's uncertainty about the time of onset of the reaction stimulus (RS), so its information load is not strictly zero bits.
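For a single subject, the intercept and slope in Table 9.1 come from regressing that subject's mean RT at each set size on set size. A minimal least-squares sketch follows; the RTs are hypothetical values generated from an exact line (intercept 420 ms, slope 38 ms per item), so the fit recovers them.

```python
def scan_parameters(set_sizes, mean_rts_ms):
    """Intercept (ms) and slope (ms per item) of RT regressed on set size."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(mean_rts_ms) / n
    sxx = sum((x - mx) ** 2 for x in set_sizes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts_ms))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical subject: mean RT at set sizes 1 to 7.
sizes = list(range(1, 8))
rts = [420 + 38 * k for k in sizes]
intercept, slope = scan_parameters(sizes, rts)
print(intercept, slope)
```

The intercept is an extrapolation to set size 0, which is why it is read as the sensory-motor component of RT.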

The nonsignificance of the slope parameter, however, is obviously troubling for any theory that posits processing speed as an important component of intelligence. However, as explained elsewhere (Jensen & Reed, 1990), because of its lower reliability and statistical artifacts, the RT slope is severely handicapped as a measure of individual differences. This handicap does not necessarily apply to group differences. Because measurement errors average out in the means of fairly large groups, it would be of critical theoretical value to determine whether high IQ and low IQ groups differ significantly in the overall slopes of the RT means of MS and VS regressed on set size. I have not found such a study.
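One general source of the slope's low reliability can be illustrated with a difference score: a two-point slope, (RT at set size 7 - RT at set size 1) / 6, is a rescaled difference, and a difference between two highly correlated measures is much less reliable than either one. The formula below assumes equal variances for the two RTs; it is a standard psychometric result offered as a rough illustration, not the specific analysis of Jensen and Reed (1990), and the input values are hypothetical.

```python
def difference_score_reliability(rel_x, rel_y, r_xy):
    """Reliability of D = Y - X, assuming X and Y have equal variances."""
    return ((rel_x + rel_y) / 2 - r_xy) / (1 - r_xy)


# Hypothetical: RTs at the smallest and largest set sizes each have
# reliability .90 and correlate .80 across subjects.
print(round(difference_score_reliability(0.90, 0.90, 0.80), 2))
```

The more strongly the component RTs correlate, the smaller the reliable variance left in their difference, even when each component is measured well.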

Semantic Verification Test

The SVT is intended as a simplified version of the various sentence verification tests that have been used in linguistic and cognitive research. It is fully described elsewhere (Jensen, Larson, & Paul, 1988). Instead of using full sentences to describe a simple stimulus, such as "The star is above the cross," the present SVT uses single words such as BEFORE, AFTER, FIRST, LAST, and BETWEEN (and their negations, NOT BEFORE, etc.) to describe the position of one letter among a set of three. The three letters are always one of the six permutations of the letters A B C. First, the subject presses the home button on a binary response console; then a statement such as C after B is presented on a computer monitor for 3 s. The screen then goes blank for 1 s, the letters A B C appear on the screen, and the subject responds as quickly as possible by moving the index finger from the home button and pushing either the YES or the NO button (in this example the correct response is YES). RT is the time interval between the onset of the reaction stimulus (RS) A B C and the subject's release of the home button. The mean difficulty levels of the various permutations of SVT items differ markedly and consistently. For university students, the average RTs on the different permutations range from about 600 to 1400 ms, and RTs to the negative statements average about 200 ms longer than to the positive statements (Paul, 1984).

Even in the restricted range of IQ for Berkeley undergraduates there is a correlation of -.45 between the total RT on the SVT and scores on the Raven Advanced Progressive Matrices (on which the subjects were told to attempt every item and to take all the time they needed). There are also large mean differences on the SVT between university undergraduates and Navy recruits (Jensen et al., 1988). The SVT was taken by 36 gifted (IQ > 135) seventh graders and their 36 nearest-in-age siblings. The two groups differed by 1.09σ in IQ (Raven Advanced Progressive Matrices). Their mean RTs on the SVT differed almost as much, by 0.91σ. The RT-IQ correlation in the combined groups was -.53; the r was -.43 in the gifted group and -.42 in their siblings. The corresponding correlations for RTSD are -.60 and -.47 (Jensen, Cohn, & Cohn, 1989). It is worth noting that MT showed no significant difference between the gifted and sibling groups and no significant correlations with IQ in either group or in the combined groups, again suggesting that MT is not a cognitive or g-loaded variable. The unique theoretical importance of this sibling study, however, is that it controls the environmental between-families background factors (social class, income, ethnicity, general nutrition, etc.) and yet shows that the RT-IQ correlations remain significant both within families (i.e., among siblings) and between families. In other words, the population correlation between RT and IQ does not depend on differences in the kinds of environmental variables that systematically differ between families.

Table 9.1: Correlations of various RT parameters with "IQ" in memory scan and visual scan tasks(a)

RT parameter     Memory scan   Visual scan
Mean/Med RT         -.293         -.266
RTSD                -.279         -.289
Intercept           -.169         +.060
Slope               +.056         +.016

(a) Average correlations from five independent studies (Jensen, 1987b, Table 14).

Coincidence Timing

Coincidence timing (CT) is one of the simpler types of chronometric measurement. However, it cannot be considered a form of SRT, because the subject's response reflects not only the speed of stimulus apprehension, as does SRT, but also the further information processing involved in the anticipation and prediction required by the CT task. In this procedure, the subject, with a finger poised on the response key, views a computer monitor on which there is a stationary vertical line extending from the top center to the bottom center of the screen. Then a small square enters the screen from either the right or the left side, traveling horizontally in a straight line (or in a random path on one-third of the trials) at a constant speed of 10 cm/s. The subject is instructed to press the button exactly when the small square coincides with the vertical line. Two basic scores are derived from the subject's performance: the mean and the SD over trials of the absolute distance (i.e., error) between the small square and the vertical line at the moment the subject pressed the response key. In a group of 56 eighth-grade students (mean age 13.5 years), the mean and SD of the CT error scores were significantly correlated with IQ (Standard Progressive Matrices), -.29 and -.36, respectively. Curiously, when the effect of the subjects' sex is statistically removed, these correlations increase slightly, to -.36 and -.37; the correlation with IQ of a combined score of the CT mean error and its SD over trials was -.40. All of these correlations have significance levels of p < .003 (Smith & McPhee, 1987).
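The sketch below computes the two CT scores from per-trial errors and shows one standard way of statistically removing a third variable, the first-order partial correlation r_xy.z. The source does not say how the effect of sex was removed, and all numbers here are hypothetical; the signed-error convention for early versus late presses is an assumption.

```python
import math
import statistics


def ct_scores(signed_errors):
    """Mean and SD over trials of the absolute coincidence-timing error.

    signed_errors: distance between square and line at each key press
    (sign assumed to mark early vs. late); only the absolute distance is scored.
    """
    absolute = [abs(e) for e in signed_errors]
    return statistics.mean(absolute), statistics.stdev(absolute)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical trial errors (screen distance units) for one subject.
mean_err, sd_err = ct_scores([-0.5, 0.3, 0.1, -0.2, 0.4])
print(round(mean_err, 2), round(sd_err, 2))

# Hypothetical correlations: x = CT mean error, y = IQ, z = sex (coded 0/1).
print(round(partial_r(-0.29, 0.20, -0.30), 3))
```

When the third variable is correlated with the two measures in opposite directions, partialing it out can increase the magnitude of the correlation, as in the hypothetical example.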

A Variety of Single and Dual Task Paradigms

RT tasks that make greater processing demands generally have longer RTs and larger correlations with IQ tests and particularly with g, the latent factor common to all such tests. For example, some tasks, like the Hick paradigm, require no retrieval of information, whereas paradigms like the Sternberg memory-scanning task call for the retrieval of information from STM, and the Posner Name Identity-Physical Identity task calls for retrieval of information from long-term memory (LTM). The retrieval process takes time. One way to experimentally manipulate the cognitive demands on RT tasks is by requiring the subject to perform two tasks within a brief time period. For example, incoming information has to be momentarily held in STM while the subject performs a second, different processing task, after which the information in STM has to be retrieved. This is called a dual task paradigm (described in detail in Chapter 2, pp. 31-32). Studies that included a battery of both single and dual task RT paradigms are described in Chapter 6 (pp.). Besides the Hick task, the other seven tasks in the battery shown in Figure 9.9 consist of the Sternberg and Posner paradigms, which are presented both as single tasks and, in certain combinations, as dual tasks.

[Figure 9.9: left panel, V-U difference in mean RT (ms, 0 to 500) against mean latency of processing task (ms, 300 to 1300) for tasks numbered 1 to 8; right panel, correlation with g (0 to -0.40) against mean latency of processing task (ms, 300 to 1500), r = -0.98.]

Figure 9.9: Left panel: RT differences between vocational and university students as a function of task difficulty as indexed by mean latency (RT) on each task in the combined groups. Right panel: Correlations of eight RT tasks with ASVAB g factor scores as a function of task difficulty. Single tasks are numbers 1, 2, 5, 8; dual tasks are numbers 3, 4, 6, 7. (From Vernon & Jensen, 1984, with permission of Ablex.)

The panel on the left side of Figure 9.9 shows the mean differences in RT between groups of vocational college (V) and university (U) students on the eight processing tasks (Vernon & Jensen, 1984). Task difficulty is indexed by the overall mean latency (RT) of the task in the combined groups. The V-U differences are closely related to the tasks' processing demands as indicated by their mean latencies. The panel on the right of Figure 9.9 shows the correlation of each task's RT with g factor scores obtained from the 10 subtests of the Armed Services Vocational Aptitude Battery (ASVAB), plotted against task difficulty (indexed by mean latency). Although the single correlations between individual differences in RT and g factor scores (i.e., the ordinate in the right panel of Figure 9.9) are all smaller than .40 in absolute value, the correlation between the V-U mean differences in RT and the tasks' g loadings is .95, indicating that variation in the magnitudes of the V-U mean RT differences is almost entirely a result of the group difference in psychometric g. The g factor is manifested in RT performance to the extent of the task's cognitive demands.
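The .95 reported above is a correlation between two vectors indexed by task: the V-U mean RT difference on each task and that task's g loading. A minimal sketch follows; the eight-task values are hypothetical, not the Vernon and Jensen data, and using the magnitudes of the (negative) task-g correlations as loadings is an assumption made to keep the sign convention simple.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for eight tasks (not the Vernon & Jensen data).
vu_rt_difference_ms = [40, 55, 90, 110, 130, 170, 210, 260]
task_g_correlation = [-0.10, -0.12, -0.20, -0.22, -0.25, -0.30, -0.33, -0.38]

# Faster RT goes with higher g, so the task-g correlations are negative;
# their magnitudes serve as the tasks' g loadings here.
g_loadings = [abs(r) for r in task_g_correlation]
print(round(pearson_r(vu_rt_difference_ms, g_loadings), 2))
```

The unit of analysis is the task, not the person, so even modest individual-level correlations can yield a very high correlation between the two task-level vectors.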

Multiple and Canonical Correlations

Multiple Correlation

A multiple correlation coefficient (R) yields the maximum degree of linear relationship that can be obtained between two or more independent variables and a single dependent variable. (R is never signed as + or -. R² represents the proportion of the total variance in the dependent variable that can be accounted for by the independent variables.) The independent variables are each optimally weighted such that their composite will have the largest possible correlation with the dependent variable. Because these weights (beta coefficients), like any statistic, are affected by sampling error, R is always inflated, so it is properly "shrunken" to correct for this bias. The amount of shrinkage depends on the number of independent variables and the sample size. When the number of independent variables is small (< 10) and the sample size is large (> 100), the shrinkage procedure has a negligible effect. Also, the correlations among the independent variables that go into the calculation of R can be corrected for attenuation (measurement error), which increases R. Furthermore, R can be corrected for restriction of the range of ability in the particular sample when its variance on the variables entering into R is significantly different from the population variance, assuming the latter is adequately estimated. Correction of correlations for restriction of range is frequently used in studies based on students in selective colleges, because they typically represent only the upper half of the IQ distribution in the general population.
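R can be computed directly from correlations: solve R_xx · β = r_xy for the standardized weights β, then R² = Σ β_i r_iy. Several shrinkage formulas exist, and the source does not say which one the cited studies used; the sketch applies Wherry's adjustment, 1 - (1 - R²)(n - 1)/(n - k - 1), as one common choice. The correlations in the example are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_r(r_xx, r_xy, n):
    """Multiple R from correlations, plus a Wherry-adjusted (shrunken) R.

    r_xx: k x k correlations among the predictors (e.g., RT parameters).
    r_xy: k correlations of the predictors with the criterion (e.g., IQ).
    n:    sample size.
    """
    k = len(r_xy)
    betas = solve(r_xx, r_xy)  # standardized regression weights
    r_squared = sum(b * r for b, r in zip(betas, r_xy))
    adjusted = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
    return math.sqrt(r_squared), math.sqrt(max(adjusted, 0.0))


# Hypothetical: two correlated RT parameters, each correlating .40 with IQ.
r_xx = [[1.0, 0.5], [0.5, 1.0]]
print(multiple_r(r_xx, [0.4, 0.4], n=100))
```

With two predictors and n = 100, the shrunken R is only slightly smaller than the raw R, consistent with the rule of thumb above.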

Two examples of the multiple R between several RT variables and a single "IQ" score are given below. To ensure a sharp distinction between RTs based on very simple ECTs and timed scores on conventional PTs, the following examples were selected to exclude any ECTs on which the mean RTs are greater than 1 s for normal adults or 2 s for young children. Obviously, not much cogitation can occur in so little time.

The simplest example is the Hick paradigm. Jensen (1987a) obtained values of R in large samples, where the independent variables are various parameters of RT and MT derived from Hick data, viz. mean RT, RTSD, the intercept and slope of the regression of RT on bits, and mean MT.

Without corrections for attenuation and restriction of range in the samples, R = .35; with both of these corrections, R = .50. This is the best estimate we have of the population value of the largest correlation that can be obtained between a combination of variables obtained from the Hick parameters and IQ as measured by one or another single test, most often the Raven matrices.

Vernon (1988) analyzed four independent studies totaling 702 subjects. Each study used a wide variety of six to eight ECTs that were generally more complex and far more heterogeneous in their processing demands than the much simpler and more homogeneous Hick task. The average value of the multiple R (shrunken but not corrected for restriction of range) relating RT and IQ was .61, and for RTSD and IQ R was .60. For RT and RTSD combined, R was .66.

Canonical Correlation

This might be regarded as the simplest form of a latent trait model. Instead of determining the correlation between observed variables (e.g., test scores), a canonical correlation (C) calculates the correlation between (a) the common latent trait(s) in a given set of two or more observed variables and (b) the common latent trait(s) in another set of two or more observed variables. It is like a multiple R in which both the independent variables and the dependent variables consist of a number of different measurements. However, C divides the common variance between the two sets of variables into orthogonal (i.e., uncorrelated) components, called canonical variates. All of the Cs I have found between chronometric and psychometric variables have only one significant canonical variate; this first canonical variate, however, is quite substantial, and it is the common factor linking both sets of variables. Like R, C is always an absolute (i.e., unsigned) value, and like R, it is inflated by random sampling error and therefore is usually "shrunken" to correct for this statistical bias.
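The first canonical correlation can be computed from the two sets' correlation matrices using the textbook result that the squared canonical correlations are the eigenvalues of Rxx⁻¹ Rxy Ryy⁻¹ Ryx. The sketch below finds the largest one by power iteration in plain Python. It applies no shrinkage, the function names are illustrative, and the example matrices are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve_columns(a, b):
    """Return inv(a) @ b by solving for one column of b at a time."""
    return transpose([solve(a, col) for col in transpose(b)])


def first_canonical_r(r_xx, r_yy, r_xy, iterations=500):
    """Largest canonical correlation between two sets of variables (unshrunken)."""
    m = matmul(solve_columns(r_xx, r_xy), solve_columns(r_yy, transpose(r_xy)))
    size = len(m)
    v = [1.0 / math.sqrt(size)] * size  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no linear relation between the two sets
            return 0.0
        eigenvalue = norm  # ||M v|| for unit v converges to the largest eigenvalue
        v = [x / norm for x in w]
    return math.sqrt(eigenvalue)


# Hypothetical: two RT measures (x) and two test scores (y).
r_xx = [[1.0, 0.5], [0.5, 1.0]]
r_yy = [[1.0, 0.6], [0.6, 1.0]]
r_xy = [[-0.30, -0.35], [-0.25, -0.30]]
print(round(first_canonical_r(r_xx, r_yy, r_xy), 3))
```

Although every cross-correlation in the example is negative, the result is positive, which is why C is reported as an unsigned value.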

The C model of the common latent variable in sets of psychometric and chronometric variables, which represents the highest significant correlation that can be obtained between the two sets of variables, is illustrated in Figure 9.10, from data on 96 college sophomores, in which the general chronometric variate (Gc) of four chronometric measures and that of four psychometric variables (Gp) have a C = .65 (shrunken C = .55). The correlation of each of the variables with the common factor of each set is shown in each of the boxes. The chronometric measures were simple RT, 3-choice RT, 5-choice RT, and inspection time (IT). The psychometric measures were the Wechsler Vocabulary and Block Design subtests, the Scholastic Aptitude Test, and the freshman grade point average (Saccuzzo, Larson, & Rimland, 1986).

[Figure 9.10 diagram: boxes PSY 1 to PSY 4 and Chron 1 to Chron 4, each set linked to its own latent variable; the two latent variables are correlated.]

Figure 9.10: A canonical correlation (C) between four chronometric tests and four PTs illustrates the method applied to correlational data from a study by Saccuzzo et al. (1986) described in the text.

Kranzler and Jensen (1991) obtained the C in much larger sets of chronometric (37) and psychometric (11) variables. The chronometric variables were the RT, RTSD, MT, and MTSD on the following paradigms: simple RT, 8-choice RT, OMO RT, Sternberg memory scan, Neisser visual scan, Posner binary RT for same-different words and for synonyms-antonyms, and IT. The psychometric variables were the Raven Advanced Progressive Matrices and the 10 subtests of the Multivariate Aptitude Battery. In a sample of 100 university undergraduates, the C between these two sets of variables was .60 (shrunken = .57). After correction for restriction of range in the university sample (with mean IQ of 120), the shrunken C was .72, a value about the same as that of the average correlation between various IQ tests (Jensen, 1980b, pp. 314-315).

In a sample of 109 children (aged 4-6 years), Miller and Vernon (1996) obtained RT data on eight diverse ECTs on each of which the RT averaged less than 2 s; the error rate was 8 percent. The psychometric variables were the 10 subtests of the Wechsler Preschool and Primary Scale of Intelligence (WPPSI). The uncorrected C between the RT and WPPSI batteries was .65.

Overall, the several estimates of the canonical correlations between sets of diverse psychometric mental ability tests and sets of various measures of RT in the various studies range from .55 to .72, averaging .62.

Factor Analysis of Chronometric and Psychometric Variables Together

This is the most analytic method of looking at the relationship between the two classes of measurements. In terms of the number of RT measures (both direct and derived) obtained from several ECTs and the number of PTs, the factor analysis by Carroll (1991b) of the data from Kranzler and Jensen (1991), described in the previous section (p. 171), is probably the most revealing. Carroll performed a Schmid-Leiman orthogonalized hierarchical factor analysis of these data, which comprised correlations among 27 chronometric and 11 psychometric variables. The hierarchical structure of the factor matrix is quite complex, but its main gist can be most easily explained in terms of the simplified schematic diagram shown in Table 9.2. Before doing the factor analysis, Carroll properly reflected all of the correlations between the chronometric and psychometric variables so that "goodness" of performance on both types of variables is represented by a positive (+) correlation between them. This resulted in positive signs for all of the salient or significant factor loadings in the whole analysis. All of the PT and all of the ECT:RT variables (but not the ECT:MT variables) have salient loadings on a large common factor labeled as g here. But also note that the PT and ECT variables have no common group factors. Besides their loadings on g, the PTs load only on the PT group factors Verbal and Spatial. The ECTs load on different group factors depending on whether the ECT makes a demand on memory (either STM or LTM), labeled ECT+Mem, as in the Sternberg and Posner paradigms, or makes no memory demand (labeled ECT:RT), as in the Hick and OMO paradigms.

Table 9.2: Orthogonal Factors.

[Schematic table. Columns: g; psychometric group factors V and S; chronometric group factors for RT with a memory demand, RT without a memory demand, and MT. Rows: Psychometric Tests; Chronometric Tasks: RT; Chronometric Tasks: MT. A + marks each salient loading.]

In Carroll's analysis, the average g loading of the PTs is .35, and that of the ECT:RT variables is .43. The group factors had the following average loadings: PT-Verbal = .61, PT-Spatial = .47; ECT:RT = .42, ECT+Mem:RT = .58, and ECT:MT = .51.

The results of Carroll's (1991b) definitive analysis of the Kranzler-Jensen (1991) data help to explain why the simple correlations between single psychometric and single chronometric measures are typically so relatively small. It is because a large part of the variance in each type of variable, both psychometric and chronometric, consists of a group factor plus the variable's relatively large specificity, which the variables do not have in common. It appears at this stage of analysis that there may be only one factor (here called g) that is common to both the psychometric and chronometric domains. Other studies, too, have found that the degree of correlation between RT measures and various PTs depends on the size of the tests' g loadings (Hemmelgarn & Kehle, 1984; Smith & Stanley, 1983). No significant or systematic relationship has been found between RT and any other psychometric factors independent of g.
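The argument that a shared g factor alone produces only small pairwise correlations can be made concrete with the orthogonal factor model, in which the correlation between two variables equals the sum of the products of their loadings on the factors they share. The sketch below builds one illustrative PT profile and one ECT profile from Carroll's average loadings quoted above (g loadings .35 and .43, group loadings .61 and .42). Combining averages into single profiles is a simplification for illustration, and the function names are hypothetical.

```python
def implied_correlation(a, b):
    """Correlation implied by an orthogonal factor model: sum of loading products on shared factors."""
    return sum(a[f] * b[f] for f in a.keys() & b.keys())

def unshared_variance(loadings):
    """Variance left for specificity plus error (1 minus the communality), assuming no other factors."""
    return 1.0 - sum(v * v for v in loadings.values())

# Illustrative profiles built from Carroll's average loadings (a simplification).
pt = {"g": 0.35, "PT-Verbal": 0.61}
ect = {"g": 0.43, "ECT:RT": 0.42}

print(round(implied_correlation(pt, ect), 2))  # 0.15: only g is shared
print(round(unshared_variance(pt), 4))         # 0.5054
```

With only g in common, the implied correlation is .35 × .43 ≈ .15, while about half of the PT's variance is left to specificity and error.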

A quite similar result to that of Carroll's analysis is indicated by the correlations (Miller & Vernon, 1996, from Tables 6, 8, and 10) based on eight diverse RT tests (with mean RTs ranging between 800 and 1800 ms) and 10 subtests of the Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R) given to 109 children of ages 4 to 6 years (mean IQ 107, SD 13). On the full 18 × 18 matrix of correlations among the RT and WPPSI variables, I have performed a nested factor analysis, in which the general factor is extracted first, followed by extraction of the remaining significant factors. There is a large general factor (g), accounting for 36 percent of the total variance, on which all of the 18 variables have substantial loadings; and there are only two group factors (with eigenvalues > 1), accounting for 12 and 6 percent of the total variance. These two factors represent the well-established Verbal and Nonverbal Performance factors found in all of the Wechsler test batteries. The RT tests' significant loadings are entirely confined to the g factor, with the single exception of 2-choice RT (for Colors: same or different), which was loaded on both g and the Performance factor. After the g factor was extracted, the RT variables yielded no coherent group factor(s). There was nothing at all that looked like an RT factor independent of g. Whatever non-g variance remained in the whole battery of eight RT tests was specific to each test. In the Kranzler-Jensen data set factor analyzed by Carroll (1991b), as described above, there were distinct RT group factors, independent of g, representing RT tasks that either did or did not make demands on memory, whether STM or LTM. In the Miller and Vernon study, however, none of the RT tasks involved either STM or LTM. The study by Miller and Vernon also included five separate tests of STM, which, besides having large g loadings, share the same group factor with the WPPSI subtests, but the memory tests have relatively weak correlations (-.20 to -.40) with RT. The interaction of memory and processing speed seems to be fundamental to g.
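The percentages of variance quoted above relate to eigenvalues by simple arithmetic: for a correlation matrix the total variance equals the number of variables, so, if the percentages are read as eigenvalue / 18, 36 percent corresponds to an eigenvalue of about 6.48, and 12 and 6 percent correspond to about 2.16 and 1.08, both above the eigenvalue > 1 threshold. The Python sketch below computes such shares from a correlation matrix, using principal-component eigenvalues as a stand-in; it does not reproduce the nested factor analysis described in the text, and the function names and example matrix are hypothetical.

```python
import numpy as np

def variance_shares(R):
    """Eigenvalues of a correlation matrix (largest first) and each one's percent of total variance."""
    R = np.asarray(R, dtype=float)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]    # eigvalsh returns ascending order
    percent = 100.0 * eigenvalues / R.shape[0]   # trace of a correlation matrix = number of variables
    return eigenvalues, percent

def kaiser_count(eigenvalues):
    """Number of components with eigenvalue > 1 (the Kaiser criterion)."""
    return int(np.sum(np.asarray(eigenvalues) > 1.0))

# Hypothetical one-factor matrix: 4 variables, each loading .6 on a single general factor.
R = np.full((4, 4), 0.36)
np.fill_diagonal(R, 1.0)
eigenvalues, percent = variance_shares(R)
print(np.round(eigenvalues, 2), np.round(percent, 1), kaiser_count(eigenvalues))
```

For this matrix the largest eigenvalue is 2.08 (52 percent of the variance) and the remaining three are .64 each, so only one component passes the eigenvalue > 1 criterion.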

If different sets of PTs that are based on different contents, such as verbal (V), quantitative (Q), and spatial (S), are factor analyzed along with parallel sets of chronometric tests (ECTs) based on the same contents, will the three types of content be represented by three group factors, V, Q, S, or will the group factors represent only the distinction between PT and ECT? This question was answered in a study by Levine, Preddy, and Thorndike (1987), based on samples of school children in the 4th, 7th, and 10th grades. The result of a hierarchical factor analysis is predictable from the previously described studies. All of the PTs and ECTs loaded about equally on the g factor (average loadings for PTs = .43, for ECTs = -.41; the ECT loadings are negative because a longer RT reflects poorer performance). There were three group factors, V, Q, S, on which only the PTs showed significant loadings; and there was a single RT factor, independent of g, with loadings unrelated to the test contents. The specific content features of the ECTs were not represented in the factor structure. The lone RT factor represents a source of variance unique to the ECTs regardless of content; it is probably a psychomotor factor common to all six of the ECTs, which were administered with the same apparatus and procedure. Again, we see that the correlation between nonspeeded psychometric scores and RT is mediated indirectly by the relationship of each type of variable to a higher-order factor (g in this case). But the stark conclusion that RT is a better measure of g than of any psychometric group factors, if it even reflects any of them at all, should be held in abeyance until it is confirmed by further studies.

Although both PTs and ECTs, when factor analyzed together in one correlation matrix, have substantial loadings on the general factor of the whole matrix, the question arises whether the resulting general factor is a different g or might in some way be a better g than that extracted exclusively from a psychometric battery, such as the Wechsler Intelligence Scales. To answer the question as to a "better" g, one must ask "better for what?" Hence, an external criterion is needed to assess which of the two g factors is a better predictor of the criterion.

Exactly this question was posed and researched in two studies using various types of factor analysis and structural equation modeling (another statistical technique for identifying latent variables and their interrelationships) (Luo & Petrill, 1999; Luo, Thompson, & Detterman, 2003). The subjects were 568 elementary school children. The psychometric battery was the 11 subtests of the Wechsler Intelligence Scale for Children-Revised (WISC-R); the chronometric battery was RTs on each of six ECTs; and the criterion variable for scholastic performance was the Metropolitan Achievement Test (MAT), assessing reading, math, and language.

The data in the exceptionally instructive analyses by Luo and Petrill (1999) and Luo et al. (2003, 2006) are like those of the studies previously reviewed, with the typical results depicted in Table 9.2. That is, both the psychometric and the chronometric measures are loaded on a single large common factor, g, and they have no lower-order group factors in common; the group factors are specific to each domain. More importantly, the position of the g axis of the combined psychometric and chronometric tests vis-à-vis the axis of the general factor of scholastic achievement is very close to the g axis of the combined correlation matrix. The g of the combined matrix is, in fact, a slightly better predictor of scholastic achievement than the psychometric battery alone (validity coefficients of about 0.6 and 0.5, respectively). So g, whether psychometric or chronometric, is pretty much one and the same general factor. Because there is virtually no general or scholastic information content in the simple chronometric ECTs, the authors inferred that speed of information processing as measured by the ECTs is an important mechanism mediating the observed correlations between the general factor of the PTs (WISC-R) and the general factor of scholastic achievement (MAT). They also submit that information processing speed may not be the only elemental cognitive component mediating the correlation between g and scholastic performance. Certain nonchronometric variables, particularly memory processes, contribute to the correlation between psychometric g and scholastic performance independently of processing speed. As hypothesized later on, there appears to be a synergistic interaction between processing speed and memory processes that accounts for most, or perhaps all, of the g variance.

Failed Explanations of the RT-IQ Correlation

The revival of research on the RT-IQ relationship engendered numerous criticisms of both the idea of there being a correlation between RT and IQ and the actual findings showing such a correlation. Critics were seldom previously involved in this field of research and apparently had no intention of collecting appropriate data to check their speculations empirically. The seeming aim was not to encourage further research but rather to "explain away" the RT-IQ correlation in various superficial ways that, if accepted as valid, would dampen interest in pursuing this line of investigation. After all, to many psychologists at that time it seemed too implausible that individual differences in anything as apparently mindless and trivial as a person's RT performance could be causally related to individual differences in something so miraculously complex as human intelligence.


The total extant evidence at that time, however, forced the debate to shift from contesting the existence of the RT-IQ correlation itself to its interpretation. The arguments hinged mainly on the direction of causation. Is the RT-IQ correlation explained by bottom-up processing (i.e., faster RT → higher IQ), or by top-down processing (i.e., higher IQ → faster RT)? The top-down view was favored by critics of the essential findings on the RT-IQ correlation. They argued that higher-level cognitive processes are the causal factor in the acquisition and performance of lower-level skills, such as RT performance, and that this explains the observed IQ → RT relationship. According to this view, mental speed per se plays no causal role but only reflects the consequences of higher-level processing. The IQ → RT correlation is claimed to come about because individuals with higher intelligence learn new skills more readily, take greater advantage of subtle cues in the task format, discover more optimal performance strategies, and the like. Hence they are winners in the RT game. This is all quite plausible, but uninteresting. For if we accept the top-down theory to explain the IQ → RT correlation, there is nothing new to be learned about the nature of individual variation in IQ from further studies of this phenomenon. It would be a blind alley for intelligence researchers wishing to go beyond conventional psychometrics and descriptive statistics.

The bottom-up theory, on the other hand, proposes the opposite direction of causality in the RT → IQ connection. It aims to search out the root causes of intelligence, or g, with speed of cognitive processing as a crucial construct and the measurement of RT and IT as essential tools. The program's research rationale is essentially this: Individuals with faster speed of information processing (hence faster RT) thereby take in more information from the environment per unit of exposure time, attending to, processing, and consolidating a larger proportion of the information content into LTM, later to be retrieved as knowledge and cognitive skills, thus learning at a faster rate and developing all the variety of "higher cognitive processes" identified as intelligence. The highest-order common factor in assessments of all these cognitive aspects of individual differences is g. Thus, individual differences in speed of processing are posited as a fundamental, or bottom-up, cause of g. To discover the physical basis of processing speed, cognitive neuroscientists must search further down the causal chain for the precise mechanisms linking the brain to RT and other behavioral expressions of g. In this analytic-reductionist program, experiments require the precise quantitative data afforded by chronometry, on the one hand, and by direct imaging and measurement of brain variables, on the other, each domain acting alternately as the independent or the dependent variable in experimental designs.

Probably the most comprehensive critique of the bottom-up formulation is that of Longstreth (1984), who suggested a number of possible artifacts in RT methods that might account for the IQ-RT correlations. Even the linear slope of the Hick function and its relation to IQ were attributed to the specific order of presenting the 0-3 bits RT tasks. Some of Longstreth's complaints could be contradicted with empirical evidence available at that time (Jensen & Vernon, 1986). Later on, the experimental artifacts and confounds that Longstreth and others held responsible for the RT-IQ findings were empirically investigated in independent studies designed to control or manipulate the possible effects of each claimed artifact. In most of these studies, the artifacts could not account for the IQ-RT correlations or other theoretically relevant features of the RT data, and in some studies, eliminating one or more of the imputed artifacts even increased the RT-IQ correlation (Kranzler, Whang, & Jensen, 1988; Neubauer, 1991; Neubauer & Freudenthaler, 1994; Larson & Saccuzzo, 1986; Larson, Saccuzzo, & Brown, 1994; Smith & Carew, 1987; Vernon & Kantor, 1986; Widaman & Carlson, 1987; for reviews: Deary, 2000a, pp. 156-160; Jensen, 1998a, pp. 238-248).

Rather than detailing each of these often-complex studies, it would be useful here to simply list the most prominent of the various failed hypotheses proposed as explanations of the RT-IQ correlation.

Motivation Higher-IQ subjects are assumed to be more motivated to excel on any cognitive task and hence show faster RT. However, direct indicators of variation in degree of autonomic arousal and effort (e.g., pupillary dilation) show that on tasks at any given level of difficulty higher-IQ subjects register less effort than lower-IQ subjects. The authors of a definitive study of this hypothesis concluded " . . . more intelligent individuals do not solve a tractable cognitive problem by bringing increased activation, 'mental energy' or 'mental effort' to bear. On the contrary, these individuals show less task-induced activation in solving a problem of a given level of difficulty. This suggests that individuals differing in intelligence must also differ in the efficiency of those brain processes which mediate the particular cognitive task" (Ahern & Beatty, 1979, p. 1292).

Practice and learning The idea is that higher-IQ subjects learn the task requirements faster and benefit more from practice and therefore excel at RT tasks. Considering the simplicity of the ECTs on which RT is measured, this seems a most unlikely explanation for the RT-IQ correlation observed in samples of very bright college students performing tasks that are understood even by preschoolers and the mentally retarded. But more telling is the fact that the RT performance of lower-IQ individuals does not gradually catch up to that of higher-IQ individuals, even after many thousands of practice trials spread over many sessions. The idea that RT differences on the simplest ECTs are the result of differences in ability to grasp the task requirements is not at all borne out by any evidence. Beyond the first few practice trials, the rank order of individuals' RTs is maintained throughout extended RT practice up to an asymptotic level. Although prolonged practice results in a slight overall improvement in RT, its magnitude is exceeded by individual differences in RT.

Visual attention, retinal displacement, and eye movements This explanation assigns the causal locus to peripheral rather than central processes. It applies to choice RT tasks in which the choices vary in the spatial location of the RS, which varies randomly from trial to trial. Task difficulty and RT vary with the number of RS alternatives, as in the Hick task. As the number of alternatives increases, the subject's visual focus of attention will, on average, be further removed from the randomly presented RS. The RT on the more difficult task conditions, therefore, could be a function of the degree of "retinal displacement" of the RS and the slight automatic eye movement needed to bring the RS into acute focus (i.e., foveal vision). RT is slightly faster for stimuli focused on the fovea than on other retinal areas (peripheral vision). Thus the increase in RT as the number of RS alternatives is increased might not be a measure of cognitive processing speed as a function of information load, but only a peripheral ocular effect. This explanation, however, is contradicted by the finding that the increase in RT as a function of the number of alternatives in the RS occurs even when the RS alternatives (e.g., different colors) are always presented in exactly one and the same location, on which the subject's vision is directly focused immediately before the appearance of each RS, assuring acute foveal vision on virtually all trials. The result shows that the number of RS alternatives per se (i.e., the ECT's cognitive information load), rather than variation in the visual location of the RS, is the main determinant of a subject's RT. Individual variation in RT is clearly a central, not a peripheral, phenomenon.

Speed-accuracy trade-off and other strategies This is the idea that subjects adopt a particular strategy for maximizing their performance, and smarter subjects can figure out more effective strategies. An obvious strategy is to increase speed at the expense of increasing response errors. When subjects are instructed to use this strategy, it effectively decreases RT and increases error rate. So it was assumed that higher-IQ subjects are more likely to discover this RT/error trade-off strategy, which would explain the negative RT-IQ correlation. The fallacy in this explanation, however, is that the trade-off effect, which can be experimentally manipulated by instructions, turns out to be entirely a within-subjects effect, not a between-subjects effect. Therefore, it cannot be causally related to individual differences in IQ, an entirely between-subjects source of variance. If the brighter subjects in any sample had faster RTs because they are more likely to spontaneously adopt a strategy of speed/accuracy trade-off, then we should expect (a) a positive correlation between IQ and errors and (b) a negative correlation between RT and errors. But what is actually found is the opposite direction of both of these correlations. The brighter subjects have both fewer errors and faster (i.e., shorter) RT. Other strategies also have been experimentally investigated with respect to various ECTs, and there is no evidence that the IQ-RT correlation on ECTs can be in the least explained by individual differences in the use of particular strategies. In probably the most comprehensive examination of the strategies hypothesis, the investigators concluded "Alternatives to strategy theories of g must be pursued if progress is to be made in explaining general intelligence" (Alderton & Larson, 1994, p. 74).
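The distinction between a within-subjects and a between-subjects effect can be illustrated with a small hypothetical data set, invented for illustration: three subjects, each tested under speed and under accuracy instructions. Within each subject, the speed instructions trade errors for speed (a negative RT-error correlation), while across subjects the slower subjects also make more errors (a positive correlation of subject means), the pattern described above. The function name and the data are hypothetical.

```python
import numpy as np

def within_between_correlations(rt, errors):
    """Within- and between-subjects correlations of RT with errors.

    rt, errors: arrays of shape (subjects, conditions).
    Within: correlation of deviations from each subject's own means.
    Between: correlation of the subject means.
    """
    rt = np.asarray(rt, dtype=float)
    errors = np.asarray(errors, dtype=float)
    rt_means = rt.mean(axis=1)
    err_means = errors.mean(axis=1)
    between = np.corrcoef(rt_means, err_means)[0, 1]
    within = np.corrcoef((rt - rt_means[:, None]).ravel(),
                         (errors - err_means[:, None]).ravel())[0, 1]
    return within, between

# Hypothetical data: columns are (speed instructions, accuracy instructions).
rt = [[280, 320], [330, 370], [380, 420]]   # mean RT in ms
errors = [[4, 0], [6, 2], [8, 4]]           # percent errors
within, between = within_between_correlations(rt, errors)
print(round(within, 3), round(between, 3))
```

For these numbers the within-subjects correlation is -1 and the between-subjects correlation is +1: the instructed trade-off says nothing about the direction of the relation between individuals.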

Speeded psychometric tests One of the most frequently proclaimed explanations for the RT-IQ correlation is that both the RT task and the correlated PT share the same common speed factor when all or part of the PT is speeded. This idea is totally invalidated by clear-cut evidence. PTs show correlations with RT at least as high when they are given without time limits as when the same tests are speeded. Since the test-speed notion was the most prevalent explanation of the RT-IQ correlation when I began researching RT some 30 years ago, from the beginning it has been routine practice in our laboratory to administer every PT without time limit and instruct subjects to attempt every item and take as much time as they wish. To prevent subjects' awareness of the time others take on the test (a "keeping up with the Joneses" effect), every subject is tested alone in a quiet room. After giving instructions, the examiner leaves the room. If the subject's performance is timed, it is without the subject's knowledge. So studies of the RT-IQ relationship conducted in our chronometric laboratory are not confounded by test-taking speed. Individual differences in ad libitum time on PTs are not significantly correlated with RT but with a personality factor, extraversion-introversion (assessed by the Eysenck Personality Inventory in our early studies of RT). Though extraversion is negatively correlated (about -.40) with total ad libitum testing time, it is correlated neither with test scores nor with RT.


Special Chronometric Phenomena Related to Psychometric Abilities

Intraindividual variability (RTSD) This is the trial-to-trial fluctuation in RT, measured as the SD of a subject's RTs over a given number of trials (n), henceforth labeled RTSD. It has not yet been established whether RTSD represents a truly random within-subject fluctuation or is a regular periodic phenomenon that is disguised as a random variation because of the asynchrony of response vis-à-vis the randomly timed appearance of the RS. This is a crucial question for future research. If the apparently random fluctuation in RTs represented by RTSD is not just an artifact of such asynchrony but is truly random, it would suggest a kind of "neural noise," for which reliable individual differences in magnitude might explain g. It would be even more important theoretically if it were established that RTSD reflects a true neural periodicity, or oscillation in reaction potential. What has been well established so far is that variation in RTSD is a reliable individual differences phenomenon and, more importantly, individual differences in RTSD are correlated with g and IQ at least as much as, and probably more than, the mean or median RT or any other parameters derived from RT data (Jensen, 1992; Baumeister, 1998). When RTSD shows a lower correlation with IQ than does RT, it is usually because of the much lower reliability of RTSD. The disattenuated correlations of RT and of RTSD with IQ generally favor RTSD. So this is one of the most central issues for RT research. Chapter 11 fully explains the importance of intraindividual variability for a theory of the RT-g correlation.
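A worked illustration of the disattenuation argument, using hypothetical values chosen only for illustration (not taken from any study cited here): suppose observed correlations with IQ of -.30 for RT and -.28 for RTSD, reliabilities of .90 for RT and .60 for RTSD, and an IQ reliability of .90. Spearman's correction for attenuation then gives:

```latex
r_{\text{true}} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}, \qquad
r^{\text{true}}_{\text{RT,IQ}} = \frac{-.30}{\sqrt{.90 \times .90}} \approx -.33, \qquad
r^{\text{true}}_{\text{RTSD,IQ}} = \frac{-.28}{\sqrt{.60 \times .90}} \approx -.38
```

Under these assumed reliabilities the ordering reverses after correction: RTSD's lower observed correlation reflects its lower reliability rather than a weaker relation to IQ.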

A provocative study based on 242 college students found that RTSD measured in various chronometric tasks is positively correlated with a psychometric scale of neuroticism, suggesting that trait neuroticism is a reflection of "noise" in the brain's neural control system (Robinson & Tamir, 2005). It is puzzling, however, that mean RT itself showed no significant correlation with neuroticism, given the intrinsically high correlation between RT and RTSD found in most studies. The finding calls for further investigation before it can be accepted as a broadly generalizable empirical fact. It could turn out to be merely a scale artifact.

In fact, the mean RT and RTSD are so highly correlated as to seem virtually redundant in a statistical and factor analytic sense, much as the diameters and circumferences of various-sized circles are redundant. The near-perfect correlation between RT and RTSD reflects the fact that across individuals there is a virtually constant proportionality between mean RT and RTSD, as measured by the coefficient of variation (CV = RTSD/RT). Yet RT and RTSD behave quite differently in a number of ways, including their relation to the RT task's information load, as shown in Figure 9.11, based on 1400 subjects. The Hick task's mean RTs are plotted as a function of bits, and the means of RTSD are plotted as a function of the actual number of response alternatives for the same data. The difference between these functions is a real phenomenon, not a mathematically necessary artifact. RT increases as a linear function of bits (i.e., log2 of the number of S-R alternatives), while RTSD increases as a linear function of the number of response alternatives itself.
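The quantities in this paragraph (the CV and the two linear functions plotted in Figure 9.11) can be computed as follows. This Python sketch assumes trial RTs in milliseconds. The function names are hypothetical, and the group means are invented values chosen so that mean RT is linear in bits and RTSD is linear in the number of alternatives, mimicking the pattern described; they are not the data behind Figure 9.11.

```python
import math
import statistics

def intraindividual_stats(trial_rts):
    """Return (mean RT, RTSD, CV) for one subject's trial RTs; CV = RTSD / mean RT."""
    mean_rt = statistics.fmean(trial_rts)
    rtsd = statistics.stdev(trial_rts)   # sample SD across trials
    return mean_rt, rtsd, rtsd / mean_rt

def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hick set sizes; bits = log2(number of alternatives).
alternatives = [1, 2, 4, 8]
bits = [math.log2(n) for n in alternatives]

# Hypothetical group means, invented for illustration only.
mean_rt_by_set = [300.0, 330.0, 360.0, 390.0]
mean_rtsd_by_set = [40.0, 45.0, 55.0, 75.0]

rt_intercept, rt_slope = linear_fit(bits, mean_rt_by_set)           # RT regressed on bits
sd_intercept, sd_slope = linear_fit(alternatives, mean_rtsd_by_set)  # RTSD regressed on n
print(rt_intercept, rt_slope)  # 300.0 30.0
```

Regressing RT on bits and RTSD on the raw number of alternatives mirrors the two different x-axes used in the figure.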

[Figure 9.11, two panels. Upper: mean RT (ms, axis 300 to 450) plotted against number of bits (0 to 3), with R = 0.998. Lower: mean RTSD (ms, axis 40 to 90) plotted against number of response alternatives (1, 2, 4, 8).]

Figure 9.11: Mean RT and Mean RTSD, based on 1400 subjects, plotted as a function of bits (for RT) and number of response alternatives (for RTSD). (From data in Jensen, 1987a.)

As explained previously (Chapter 4, pp.), another statistic, the mean square successive difference (MSSD) or √MSSD, should supplement RTSD in future research on intraindividual variability. Unfortunately, I have not found MSSD ever being used in RT research. Because RTSD comprises not only random fluctuation between trials but also any systematic or nonrandom trends in the subject's performance, such as practice effects, it may confound quite different sources of RT variation. MSSD, however, measures purely the absolute RT fluctuations between adjacent trials and therefore is uncontaminated by possible systematic trends in the subject's performance. It would be a major empirical achievement to demonstrate that the magnitude of MSSD can be manipulated experimentally, as it could provide the means for testing the hypothesis that the observed random fluctuations in a subject's trial-to-trial RTs are essentially the result of the timing of the RS being randomly out of synchrony with the subject's regular oscillation of reaction potential. This is a challenge for the techniques of experimental psychology. If behavioral evidence of a regular oscillation is found, the next question obviously is whether individual differences in the rate of oscillation in reaction potential, as an operationally defined construct, have a physical basis in brain functioning. Its investigation would call for the techniques of neurophysiology.
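The contrast between RTSD and MSSD can be sketched in Python. The definition below (the sum of squared differences between adjacent trials, divided by n - 1) is an assumption, since the Chapter 4 definition is not reproduced here; some authors divide by 2(n - 1) instead, because for independent trials the expected squared successive difference is twice the variance. The function name and the trial sequence are hypothetical: the sequence is a pure practice trend with no random fluctuation at all.

```python
import statistics

def mssd(trial_rts):
    """Mean square successive difference: mean of squared differences between adjacent trials."""
    diffs = [later - earlier for earlier, later in zip(trial_rts, trial_rts[1:])]
    return sum(d * d for d in diffs) / len(diffs)

# Hypothetical subject whose RT drops by exactly 10 ms per trial (a pure practice trend).
trend_rts = [500 - 10 * i for i in range(10)]

print(statistics.stdev(trend_rts))  # about 30.3: the trend inflates RTSD
print(mssd(trend_rts) ** 0.5)       # 10.0: only the trial-to-trial step is counted
```

The steady trend inflates RTSD to about 30 ms, whereas √MSSD registers only the 10 ms step between adjacent trials.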

The "worst performance rule" This is another phenomenon that must be explained by any theory of the RT-IQ correlation. Though it has been known for decades, it received little attention. It was not until 1990 that it was given a name, the worst performance rule (WPR), in connection with an excellent large-scale study that definitively substantiated the WPR (Larson & Alderton, 1990). This surprising phenomenon is the fact that, in Larson and Alderton's words, "The worst RT trials reveal more about intelligence than do other portions of the RT distribution." The WPR, tested with quite different RT tasks in a college sample, was further established by Kranzler (1992). One study based on a very heterogeneous sample (ages 18 to 88 years) did not show the WPR, perhaps because of the study's many differences from previous studies in the RT tasks, tests, and procedures used (Salthouse, 1998). Coyle (2003) found the WPR to apply in children in the average range of ability (mean IQ = 109) but not in gifted children (mean IQ = 140), and related this finding to Spearman's "law of diminishing returns," which states that conventional IQ tests are less g loaded for individuals at higher levels of ability, because a larger proportion of their variance in abilities is invested in various, more specialized, group factors, such as verbal, numerical, and spatial abilities, consequently leaving a proportionally smaller investment in g (Jensen, 2003). This suggests that the WPR phenomenon depends mainly on the g factor rather than on a mixture of abilities including their non-g components.


The WPR is demonstrated with RT tasks as follows: across n test trials each subject's RTs are rank ordered from the fastest to the slowest RT. (To minimize outliers, the two most extreme RTs are often discarded, so RTs on n - 2 trials are ranked.) The RT-IQ correlations within each rank are then calculated for the entire subject sample. Consistent with the WPR, the resulting within-rank correlations systematically increase in magnitude from the fastest to the slowest RT. It is also important for a theory of the RT-IQ correlation to note that the within-ranks coefficients of variation (CV = SD/mean) are almost perfectly correlated (r = .998 in the Larson and Alderton study) with the within-ranks RT-IQ correlations. This close connection between the WPR phenomenon and the RTSD-IQ correlation implies that if intraindividual fluctuation in RT across trials (i.e., RTSD) is considered the more "basic" phenomenon, then the WPR is simply a necessary derivative consequence of the RTSD-IQ correlation, which, as proposed in Chapter 11, is the chief causal mechanism in explaining the RT-IQ correlation and possibly the basis of g itself.
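The rank-ordering procedure can be sketched in a few lines of Python. The data are entirely hypothetical: each subject's RTs are generated as a normal component plus an exponential tail whose mean (tau) is assumed to grow as IQ declines, which is one simple way of building an RTSD-IQ correlation into the simulation. Following the procedure just described, each subject's 16 RTs are sorted, the single fastest and single slowest are dropped (one reading of "the two most extreme RTs"), and the RT-IQ correlation and the between-subject CV are computed within each of the remaining 14 ranks.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def within_rank_stats(rt_lists, iq):
    """Return (RT-IQ r, between-subject CV) for each rank, fastest to slowest."""
    # Sort each subject's RTs and drop the single fastest and single slowest trial
    ranked = [sorted(rts)[1:-1] for rts in rt_lists]
    results = []
    for k in range(len(ranked[0])):
        column = [subject[k] for subject in ranked]  # k-th fastest RT of every subject
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (len(column) - 1))
        results.append((pearson(column, iq), sd / mean))
    return results

# Hypothetical data: 300 subjects x 16 trials
rng = random.Random(42)
iq, rt_lists = [], []
for _ in range(300):
    q = rng.gauss(100, 15)
    tau = 80 * math.exp(-(q - 100) / 30)  # assumed: longer slow tail at lower IQ
    mu = rng.gauss(300, 20)               # subject's typical fast RT (ms)
    rt_lists.append([mu + rng.gauss(0, 30) + rng.expovariate(1 / tau) for _ in range(16)])
    iq.append(q)

rank_stats = within_rank_stats(rt_lists, iq)
for rank, (r, cv) in enumerate(rank_stats, start=1):
    print(f"rank {rank:2d}: RT-IQ r = {r:+.2f}, CV = {cv:.3f}")
```

Under these assumptions the correlations become more negative and the CVs larger toward the slow end, which is the WPR pattern. The simulation shows only that the pattern follows from the assumed link between IQ and RT variability. It does not show that this link is the actual mechanism.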

The WPR can also be displayed graphically by comparing two groups that differ in IQ: the difference between the group means increases going from the fastest to the slowest RTs, as shown in Figure 9.12, which is based on simple RT measured with the same apparatus and procedures in both groups.

[Figure 9.12 plot: mean simple RT (vertical axis, 0 to 700) against rank (horizontal axis, 0 to 14), with separate curves for the normal and mentally retarded groups.]

Figure 9.12: Mean simple RT plotted as a function of rank order (from fastest to slowest) of each individual's RTs, for groups of young adults with normal intelligence (mean IQ 120) and with mental retardation (mean IQ 70). (From Jensen, 1982a, p. 291, with permission of Springer.)


Convertibility between RT and response errors In studies of the RT-IQ relation, little investigative attention has been paid to response errors, that is, making the wrong response when there are two or more possible choices. The intense focus on purely mental speed variables in most RT-IQ studies has resulted in a neglect of the role of errors in the RT-IQ correlation. But there is also the fact that experimenters have usually tried to minimize error rates by keeping the RT tasks simple and by stressing accuracy as well as speed in the preliminary instructions. In some studies, only RTs for correct responses are used in the data analysis, and though errors may be automatically counted, they are seldom reported. Moreover, unless the RT task is fairly complex and the number of test trials is quite large, individual differences in the very small error rates are almost like single-digit random numbers and have near-zero reliability. Hence at present most possible generalizations about errors are relatively impressionistic and call for more systematic empirical verification.

However, there are two phenomena that have been rather informally but consistently observed in many of the studies in the Berkeley chronometric laboratory. The first is that response errors increase with task complexity (as indicated by mean RT). The second is that as task complexity increases, up to a point, the increasing errors are increasingly correlated with IQ; but beyond that critical level of task complexity, errors are more correlated with IQ than is RT. As tasks increase in complexity, there is a trade-off between RT and errors in their degree of correlation with IQ.

This suggests that the difficulty level of cognitive tasks when ranked in terms of RT should be similar to that of the same tasks as measured by their error rates, assuming that RTs and error rates both had enough variability to allow their reliable measurement in the same data set. This hypothesis was explicitly tested in a study that compared a group of university students with grade-school children (ages 8 to 9) on the Semantic Verification Test (SVT), previously described on page 20 (Jensen et al., 1988). It was necessary to have two groups of widely differing ability levels in order to obtain relatively error-free speed measures in one group and to get a large enough number of errors in the other group to ensure a sufficiently wide range of variation in the items' error rates under untimed conditions. There were 14 item types, each given (with different permutations of the letters) seven times in a random order, totaling 84 items in all. The university students were given the SVT as an RT task, with both speed and accuracy emphasized in the instructions. (Their mean RTs on each of the 14 item types ranged from about 600 to 1400 msec, with an average error rate of 7 percent. The mean RTs were correlated -.45 with scores on the untimed Advanced Raven Matrices, which is as high as the correlation between the Raven and the Wechsler IQ (WAIS) in university samples.) The very same SVT items were given to the school children in the form of an untimed paper-and-pencil test of 28 items (two items for each of the 14 SVT item types). The instructions made no mention of speed and urged that every item be attempted. The required response was to circle either YES or NO, which accompanied each SVT item in the test booklet. The children's average error rate was 18 percent.
The essential finding: the children's mean errors on the 14 SVT item types had a rank correlation of .79 (.90 when corrected for attenuation) with the mean RTs of the corresponding SVT item types in the university sample. In other words, the more difficult the item was for the children, the greater was its mean RT for the university students. This high congruence between item RTs and item error rates suggests the possibility of using very high-IQ college students' mean RTs on easy test items as a means for obtaining objective ratio-scale measures of item difficulty in untimed PTs for children. This phenomenon of the "conversion" of RTs to response error rates (or conversely to p-values) calls for further study. Aside from its possible practical applications, it seems a crucial point for a theory of the RT-IQ correlation and hence for a theory of g itself.
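The correction for attenuation mentioned above divides the observed correlation by the geometric mean of the two variables' reliabilities. The reliabilities themselves are not reported here, so the sketch below only back-calculates the product of reliabilities that would turn the observed .79 into the corrected .90. The function name and the pair of example reliabilities are illustrative assumptions.

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Product of the two reliabilities implied by the reported .79 (observed) and .90 (corrected)
implied_product = (0.79 / 0.90) ** 2
print(f"implied r_xx * r_yy = {implied_product:.2f}")
# Hypothetical reliabilities with roughly that product, e.g. .88 and .875
print(f"corrected r = {disattenuate(0.79, 0.88, 0.875):.2f}")
```

Any pair of reliabilities whose product is about .77 yields a corrected value of about .90.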

Task complexity and the RT-IQ correlation It has long seemed paradoxical that RT has low to moderate correlations with IQ, the correlation increasing as a function of task complexity (or length of RT), while the time taken per item on conventional PTs is much less correlated with total score (or with IQ on another test) based on the number of items scored as correct. The times taken per Raven matrices item, for example, show near-zero correlations with RT. The true-score variance of test scores depends almost entirely on the number right (or conversely, the number of error responses). The relationship between RT and task complexity or cognitive load of the items to be responded to (i.e., the RS) has been a subject of frequent discussion and dispute in the RT-IQ literature (e.g., Larson & Saccuzzo, 1989). I have examined virtually the entire literature on this seeming paradox, but rather than giving a detailed account of all these empirical studies, I will simply summarize the main conclusions that clearly emerge from a wide range of studies. These findings can be illustrated by a couple of studies that were specifically directed at analyzing the relationship of task complexity to the RT-IQ correlation.

But first, a few words about the definitions of complexity in this context. One or another of five clear operational criteria of task complexity is generally used: (1) the average of the subjective ratings made by N judges of various RT tasks' "complexity"; (2) the amount of uncertainty as an increasing function of the number of choices (response alternatives) associated with the n different RS, such as the difference between SRT and CRTs based on two or more stimulus-response alternatives; (3) the theoretically presumed number of distinct mental operations required for a correct response, such as the difference between adding two digits and adding three digits; (4) the difference between (a) single tasks that make a minimal demand on memory and (b) dual tasks requiring that one item of information, RS1, be held in memory while performing the interposed task RS2-RT2, and then performing RT1; and (5) various tasks' mean RTs used as a measure of complexity. All of the above conditions except 1 and 5 can be experimentally manipulated as independent variables while RT is the dependent variable.

Subjective judgment (condition 1) is probably the most questionable measure, although, as predicted by the Spearman-Brown formula, the mean ranking of tasks for "complexity" would gain in validity by aggregating the rankings of an increasing number of judges. A study of the SVT (described on page ) in which a group of 25 college students were asked to rank the 14 item types of the SVT for "complexity" showed that subjective judgments of item complexity do have a fair degree of objective validity (Paul, 1984). The raters were naive concerning the SVT and its use in RT research. The mean ratings on "complexity" of the 14 SVT items (from least complex = 1 to most complex = 14) had a rank-order correlation of +.61 with the items' mean RTs obtained in another group of students (N = 50).
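A rank-order correlation such as Paul's +.61 can be computed as the Pearson correlation between ranks, with tied values given the mean of the ranks they share. The functions below are a minimal sketch of that computation. The five ratings and RTs in the usage lines are invented for illustration and are not Paul's (1984) data.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical mean complexity ratings and mean RTs (ms) for five item types
ratings = [1, 2, 3, 4, 5]
mean_rts = [500, 520, 510, 600, 650]
print(round(spearman_rho(ratings, mean_rts), 2))
```

Computing rho as a Pearson correlation on average ranks handles ties correctly, whereas the shortcut formula based on squared rank differences is exact only when there are no ties.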

The hypothesized relationship of the RT-IQ correlation to task complexity is shown in Figure 9.13. The level of complexity at the peak of the curve is not constant for groups of different ability levels. Although the relative levels of complexity on different tasks can be ranked with fair consistency, the absolute complexity level varies across different ability levels. The peak of the curve in Figure 9.13 occurs at a shorter RT for adults than for children, and for high-IQ than for low-IQ groups of the same age. The peak level of task complexity for correlation with IQ in college students, for example, is marked by a mean RT of about 1 s; for elementary school children it is between 2 and 3 s. But there has not been enough systematic parametric research on this point to permit statements that go beyond these tentative generalizations.

A direct test of the hypothesis depicted in Figure 9.13 was based on eight novel speed-of-processing tasks devised to differ systematically in difficulty or complexity (Lindley, Wilson, Smith, & Bathurst, 1995). They were administered to a total of 195 undergraduate college students. IQ was measured by the Wonderlic Personnel Test. The results are summarized in Figure 9.14. This study affords a clue to what is probably the major cause of the very wide range of RT-IQ correlations reported in various studies. The correlation is influenced by two conditions: (1) task complexity and (2) the mean and range of IQ in the subject sample, as the peak of the complexity function shifts to longer RTs as the mean IQ declines. Therefore, the significant RT-IQ correlations fall within a relatively narrow range of task complexity for various groups selected from different regions of the whole spectrum of ability in the population. Hence, when it comes to measuring general intelligence by means of RT, there is probably no possibility of finding any single RT task with a level of task complexity that is optimally applicable to different samples that range widely in ability. The average RT-IQ correlation in the general population on any single task, therefore, represents an average of mostly suboptimal complexity levels (hence lower RT-IQ correlations) for most of the ability strata within the whole population.

[Figure 9.13 plot: the RT-IQ correlation, shown as |r| (vertical axis), against task complexity (horizontal axis).]

Figure 9.13: The generalized relationship of the RT-IQ correlation to task complexity. The absolute value of the correlation coefficient |r| is represented here for graphic clarity, although the empirical RT-IQ correlation is always a negative value, with the very rare exceptions being attributable to sampling error.

[Figure 9.14 plot: RT-IQ correlation (vertical axis, 0.0 to -0.40) against mean reaction time in seconds (horizontal axis, 0 to 2.0).]

Figure 9.14: The RT-IQ correlation plotted as a function of mean RT for 105 undergraduates. (From data in Lindley et al., 1995.)

[Figure 9.15 plot: panels showing mean response time (sec), mean response errors, and the RT-IQ correlation.]

The optimum level of task complexity for the IQ-RT correlation is hypothesized to occur near the RT threshold between error-free responses and error responses. This is the point on the complexity continuum beyond which RT becomes less correlated (negatively) with IQ and errors become increasingly correlated (negatively) with IQ.

This hypothesis of a trade-off between RT and errors in the RT-IQ correlation and the Errors-IQ correlation was tested in a study expressly designed for this purpose (Schweizer, 1998). In order to study the relationships between errors, RT, and the RT-IQ correlation, the RTs and the number of errors had to be measured entirely on the high side of the complexity function shown in Figure 9.13, resulting in mean RTs ranging between 3 and 7 s; even then the error rates averaged only 16 percent. Three sets of different RT tasks were used (numbers ordering, figures ordering, mental arithmetic). In each set, task complexity was experimentally controlled as the independent variable to produce three distinct levels of complexity, determined by the number of homogeneous mental operations required to make a correct response. IQ was measured as the averaged scores on the Wechsler test (WAIS-R); subjects were 76 university students (mean IQ = 120.4, SD = 9.6).

Figure 9.15 shows the results (averaged over the three different RT tasks) for the hypothesized functional relationships between the key variables. The consistent linearity of the relationships shows that it is possible to devise cognitive tasks that vary unidimensionally in complexity.

Unfortunately, a graph of the relation between complexity and the Error-IQ correlation is not possible with the given data. The Error-IQ correlations were said to be very small, and only the two largest of the nine possible correlations were reported, both significant (-.24 and -.28, each at p < .05); but they evinced no systematic relationship to task complexity. It would probably require a considerably greater range of complexity and error rate to adequately test the relation between task complexity and the Errors-IQ correlation. In typical PTs it is so problematic to measure item complexity that the term is usually used synonymously with item difficulty, measured as the error rate (or percent passing) when all item responses are scored as either right or wrong. Then, of course, the relationship between item difficulty and the Error-IQ correlation is a foregone conclusion. The correlation between item response times and IQ based on right-wrong scoring is typically very low, but this is mainly because there are so many different causes of error responses to test items, except in item sets that have been specially constructed to differ in difficulty along some unitary dimension of complexity. The meaning of complexity in chronometric tasks is discussed further in Chapter 11.


Chapter 10

Sensory Intake Speed and Inspection Time

Although tests of RT appear to be very simple compared to the typical items in psychometric tests, it turns out that reaction times, even simple reaction time (SRT), are actually complex phenomena with various points of entry for the play of individual differences in sensory, motor, and cognitive components. These can be empirically analyzed. Table 10.1 lists these pivotal points.

Table 10.1, note a: SRT or CRT = total time elapsed between Stage 2 (RS onset) and Stage 2Bc (overt response).

The premotor and motor stages of RT are known to be heterogeneous with respect to individual differences; i.e., they show different correlations with age, IQ, and other external variables. For example, the premotor stage but not the motor stage of RT is correlated with IQ. Here we are not referring to movement time (MT) as this term is used in connection with RT devices that involve the subject's release of a home button and pressing of a response button. The motor aspect of interest here involves only the release of the home button per se. It is the intrinsic motor aspect of the initial overt response (pressing or releasing the home button), i.e., the RT (also referred to as decision time, or DT) when the response console has both a home button and a response button.

The motor stage actually begins well before the initial overt response, as shown by an ingenious experiment on SRT and CRT in a group of school-age children who were prenatally exposed to alcohol (ALC) as compared with a normal control (NC) group (Simmons, Wass, Thomas, & Riley, 2002). Of interest here is only the partitioning of the RT into its premotor and motor components, referred to as the premotor RT (or PMRT) and the motor RT (or MRT). This division of the total RT was accomplished by obtaining electromyographic (EMG) recordings from electrodes attached to the subject's arm muscle while the subject was performing the RT task. A typical result for a single subject on SRT is illustrated in Figure 10.1, in which A indicates the stimulus onset (RS), B the beginning of the motor response, and C the overt response. The time interval A-B is PMRT, and the interval B-C is MRT. For children aged 8-9, the average PMRT is about 300 ms for SRT and about 400 ms for CRT; the corresponding MRTs are about 70 and 80 ms, so PMRT is about four to five times greater than MRT. PMRT is more sensitive than MRT to age and IQ differences. (Both variables showed significant deleterious effects of prenatal exposure to alcohol, with a larger effect for CRT than for SRT.)
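Assuming, as the partitioning implies, that total RT is simply PMRT plus MRT, the approximate group means just quoted can be checked with a few lines of arithmetic. The helper name is illustrative, and the inputs are the rounded values given in the text, not the original data.

```python
def partition_rt(pmrt_ms, mrt_ms):
    """Treat total RT as PMRT + MRT; return total, PMRT/MRT ratio, and PMRT share (%)."""
    total = pmrt_ms + mrt_ms
    return total, pmrt_ms / mrt_ms, 100 * pmrt_ms / total

# Approximate group means quoted above for children aged 8-9 (ms)
for task, (pmrt, mrt) in {"SRT": (300, 70), "CRT": (400, 80)}.items():
    total, ratio, share = partition_rt(pmrt, mrt)
    print(f"{task}: total = {total} ms, PMRT/MRT = {ratio:.1f}, PMRT share = {share:.0f}%")
```

The ratios come out at about 4.3 (SRT) and 5.0 (CRT), and PMRT accounts for about 81 and 83 percent of the total, consistent with the "four to five times" and "about 80 percent" figures in the text.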

An important conclusion from this demonstration is that even SRT is not causally unitary, but is a composite of at least two distinct variables, PMRT and MRT. Including sensory transduction of the visual stimulus (RS) from the retina to the visual cortex (about 50 ms), the total time for perceptual intake and analysis of the RS apparently accounts for about 80 percent of the total RT, slightly more for CRT than for SRT. Moreover, the proportion of RT constituted by PMRT increases as a function of RS complexity. Shortened exposure or weaker intensity of the RS requires longer analysis and results in a slower RT. In terms of signal detection theory, a longer-lasting stimulus better overcomes the level of random background "noise," thus increasing discrimination of the RS and hence the speed and probability of a correct response to the RS, particularly as the RS increases in complexity, as in multiple CRT tasks. The background "noise" consists both of distracting external task-irrelevant stimuli and internal random neural activity. Thus an important question is: how long a stimulus exposure at a given level of stimulus intensity is required to accurately discriminate a given difference between two stimuli, absent any motor component?

Table 10.1: Sequential analysis of sensory, cognitive, and motor components of SRT and choice reaction time (CRT).

Stage: Subject's activity
0. Pretask: (a) Instructed on nature and demands of task; (b) Practice trials; familiarization
Task proper
1. Preparatory signal: (a) Vigilance, expectancy, focused attention
2. Response stimulus (RS) onset (note a)
   A. Premotor stage: (a) Sensory transduction of RS to brain; (b) Stimulus apprehension (in SRT); (b') plus discrimination (in CRT); (c) Response selection (in CRT)
   B. Motor stage: (a) Efferent nerve propagation; (b) Recruitment of motor response; (c) Overt response execution

Inspection Time (Visual)

The first systematic attempt to answer this question, conceived as a problem in psychophysics, is known as the inspection-time paradigm, originally developed by Vickers, Nettelbeck, and Willson (1972). This paradigm attracted little interest outside psychophysics until it was discovered that measures of individual differences in inspection time (IT) had remarkably substantial correlations with IQ (Nettelbeck & Lally, 1976). Since then a large literature on IT, based on a great many empirical studies and theoretical discussions, has grown up, mainly in differential psychology. As there are now comprehensive reviews and meta-analyses of all the research on IT, it would be pointless to present here more than a summary of the main findings and issues, noting the most critical questions that are still in need of further empirical study. For an entry into virtually the entire literature on IT, with detailed summaries and discussions of many specific studies, readers should consult the key references (Brand & Deary, 1982; Deary, 1996, 2000a, Chapter 7; Deary & Stough, 1996; Luciano et al., 2005; Nettelbeck, 1987, pp. 295-346, 2003, pp. 77-91).

The best-known method for measuring IT is described in Chapter 2, pp. 33-35. The test stimulus and backward mask are presented tachistoscopically by means of light-emitting diodes (LEDs). The IT paradigm differs most importantly from all other chronometric paradigms by completely eliminating any demand for a speeded motor response. Measurements of IT, therefore, reflect no motor component whatsoever. The sensory discrimination called for in the IT paradigm is so simple that every individual would perform with 100 percent accuracy if there were no severe time constraint on the presentation of the stimulus. For example, following a preparatory stimulus to focus the subject's attention, the subject must decide which one of two parallel lines presented side by side is shorter, where the shorter line is only half the length of the longer line (see Figure 2.14, p. 33). The two lines are exposed tachistoscopically and are followed after the very short IT exposure interval by a "backward mask," which completely covers and obliterates the test stimulus, thereby allowing no further inspection. The subject is encouraged to take as much time as needed to make a decision. Across multiple trials the interval between the test stimulus and the mask is systematically varied until the experimenter has zeroed in on the stimulus duration for which the subject can discriminate accurately on, say, 75 percent of the trials, i.e., the midpoint between 50 percent chance of guessing and 100 percent accurate discrimination. Figure 10.2 shows the performance of a single subject whose IT is about 12 ms by the 75 percent criterion.

[Figure 10.1 plot: EMG trace against time (milliseconds, 0 to 400), with points B and C and the PMRT and RT intervals marked.]

Figure 10.1: Simple RT of a 9-year-old, showing the time intervals for the PMRT (A-B) and MRT (B-C) and the corresponding electromyograph showing the muscle activation preceding the overt RT by 80 ms. (From Simmons et al., 2002, with permission of the Research Society on Alcoholism.)
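Estimating IT at the 75 percent criterion amounts to reading off the stimulus duration at which an individual's psychometric function crosses .75. The sketch below uses simple linear interpolation between the two bracketing durations. This is a deliberately plain substitute for the curve-fitting or adaptive (staircase) procedures used in actual IT studies. The function name and the durations and proportions correct are hypothetical.

```python
def threshold_duration(durations, p_correct, criterion=0.75):
    """Linearly interpolate the stimulus duration at which p(correct) first reaches criterion."""
    points = sorted(zip(durations, p_correct))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return d0 + (criterion - p0) / (p1 - p0) * (d1 - d0)
    raise ValueError("criterion not bracketed by the data")

# Hypothetical proportions correct at six stimulus durations (ms)
durations = [6, 10, 14, 18, 22, 26]
p_correct = [0.50, 0.60, 0.70, 0.82, 0.93, 1.00]
print(f"estimated IT = {threshold_duration(durations, p_correct):.1f} ms")
```

Here the .75 level falls between 14 ms (.70) and 18 ms (.82), so the interpolated IT is about 15.7 ms. A fitted sigmoid would use all the points rather than only the two that bracket the criterion.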

The originator of the IT paradigm, the psychophysicist Douglas Vickers, first conceived of IT as a theoretical construct: namely, as the minimal amount of time needed for a single inspection, defined conceptually as the minimal sample of sensory input required for the simplest possible discrimination. The particular stimulus to be discriminated, consisting of two vertical parallel lines of conspicuously unequal lengths, was devised as the operational estimate of visual IT (VIT). Because somewhat different discriminanda of similar simplicity produced different values of IT, and some subjects' performance was affected by extraneous perceived cues (e.g., an apparent motion effect on the occurrence of the masking stimulus), critics of the IT paradigm argued that IT probably involves something more than sheer speed of sensory intake. Today most researchers, including Vickers, generally agree with the position stated by Mackenzie, Molley, Martin, Lovegrove, and McNicol (1991) that "IT now appears, not as the time required for initial stimulus intake, but as the time (or some consistent portion of the time) required to solve a problem: to make the required discrimination or matching judgment about the stimulus items" (p. 42).

Studies of RT and studies of IT have been quite insular and, until recently, the two paradigms have seldom entered into one and the same study. Also, the comparatively large amount of time subjects take to make a judgment following the IT discriminandum has been virtually ignored. This IT response interval clearly does not meet any definition of RT, because it is intrinsic to the IT paradigm that there be no explicit time constraint on the subject's response. However, anyone who has taken the IT test, or observed others taking it, will have noticed that as the duration of the IT interval between the test stimulus and the backward mask is made shorter, the overt DT increases. On the more difficult discriminations, the subject must think longer about what was seen in order to increase the probability of making the correct decision. To what extent do subjects' attempts to "read back" or recover the quickly fading memory trace of the stimulus play a part in IT performance? The fact that a decision is involved in IT implies that some cognitive effort is involved, as well as an involuntary sensory speed component. Although IT is only a fraction of the length of SRT or CRT, it is not necessarily any less complex than RT psychologically. The empirically least questionable difference between RT and IT is the absence of a motor component in IT. The discovery of other possible differences between RT and IT will depend on knowing more about the difference between the construct validity of each paradigm, their disattenuated correlation with each other, and differences in their correlations with external psychometric variables.

[Figure 10.2 plot: probability of a correct discrimination (vertical axis, 0.4 to 1.0) against stimulus duration (horizontal axis, ms, 10 to 30).]

Figure 10.2: The probability of a correct discrimination as a function of stimulus duration for an individual. Each data point represents the mean of a number of trials. The individual's IT is usually estimated as the average stimulus duration with .75 probability of correct responses. (From Deary & Stough, 1996, with permission of the American Psychological Association.)

Reliability of IT

It is important to determine the reliability of IT for the specific subject sample and the particular apparatus and procedure used in the study. A proper estimate of reliability is essential for the interpretation of the correlation of the IT measurements with any external variables or for establishing its true construct validity.

The reliability for any given IT procedure is positively related to the number of test trials and the variance of IT in the study sample. However, it has not yet been established, as it has for RT, how closely the reliability of IT is predictable from the Spearman-Brown formula relating reliability to the number of items (or trials) in the test. Such parametric information is useful for any quantitative science. Investigators must also take note of the difference between the two main types of reliability coefficient: internal consistency and test-retest. They are conceptually different, and each is appropriate for a given purpose. The fact that they typically have quite similar values is just an empirical coincidence rather than a conceptual or mathematical necessity. There is no rational basis for inferring one from the other. Internal consistency reliability (split-half or, preferably, coefficient alpha, which is the average of all possible split-half reliability coefficients for the given data set) is an index of the factor homogeneity of the measurements. Test-retest reliability estimates the temporal stability of the mean (or median) measurements obtained on different occasions separated by a specified time interval. It needs to be taken into account in evaluating replications of a study. If the test-retest stability coefficient is low, the results of attempted replications will vary more erratically than if the stability coefficient is high, depending on the number of subjects.
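Coefficient alpha, the internal-consistency index mentioned above, can be computed directly from a subjects-by-trials score matrix using its usual variance formula, alpha = k/(k - 1) x (1 - sum of trial variances / variance of total scores). The sketch below is a minimal illustration. The function names are illustrative, and the three subjects' scores on two blocks of IT trials are hypothetical.

```python
def sample_variance(xs):
    """Unbiased (n - 1) sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(scores):
    """Coefficient alpha for scores[subject][trial]."""
    k = len(scores[0])
    trial_vars = [sample_variance([row[t] for row in scores]) for t in range(k)]
    total_var = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(trial_vars) / total_var)

# Hypothetical IT estimates (ms) for three subjects on two blocks of trials
scores = [[20, 30], [40, 40], [60, 80]]
print(round(cronbach_alpha(scores), 3))
```

Alpha indexes only the consistency of the measurements within one session. Test-retest stability requires a second session and cannot be computed from this matrix.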

Fortunately, there is a full meta-analysis of the reliability of IT, based on virtually all of the IT studies available prior to 2001 that compared reliability coefficients of IT and IQ: a grand total of 92 studies comprising 4197 subjects (Grudnik & Kranzler, 2001, Table 2), shown here in Table 10.2. In those studies that reported both internal consistency and test-retest reliability, only the coefficient with the lower value was used in calculating the mean reliability, thereby resulting in a conservative estimate of the average reliability, weighted by sample size. For the total of all samples the weighted mean reliability is .805. The mean for adults is .815; for children .782. Correction of the sample means for restriction of range increases the reliability by about +.04 overall. These analyses indicate that the reliability of both visual and auditory IT (AIT) is nearly as high as that of the typical psychometric tests used for individual diagnosis and could easily be made higher simply by increasing the number of test trials. The mean reliability of the IQ tests based on their standardization samples is about +.10 to +.15 higher than the mean reliability of IT as obtained in these studies. This is quite remarkable, in fact, because the IQ tests are composed of multiple diverse items that allow chance variation and subtest specificities to "average out," whereas IT is an extremely homogeneous task administered in relatively few trials compared to the number of items in a typical IQ test. With more extensive testing, the reliability of IT could well exceed that of the best IQ tests.

Table 10.2: Mean (M) and variance (V) of reliability coefficients for IT and IQ (a).

Meta-analysis              IT M   IT V   IQ M   IQ V
Total                      .805   .015   .948   .033
Adults                     .815   .015   .942   .031
Children                   .782   .013   .945   .046
IT task type
  Visual                   .833   .008   .942   .031
  Auditory                 .815   .015   .942   .031
Strategy users/nonusers
  Users                    .805   .015   .948   .033
  Nonusers                 .815   .015   .942   .033

(a) Based on Table 2 (p. 527) in Grudnik and Kranzler (2001).
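The claim that IT reliability could be raised simply by adding trials can be quantified with the Spearman-Brown formula, with the caveat noted earlier that it has not been established how closely IT reliability follows that formula. Assuming it does, the sketch below projects reliability when the number of trials is multiplied by a factor k, and inverts the formula to find the factor needed to bring the meta-analytic mean of .805 up to the .948 mean for IQ tests in Table 10.2. The function names are illustrative.

```python
def spearman_brown(r, k):
    """Projected reliability when test length (number of trials) is multiplied by k."""
    return k * r / (1 + (k - 1) * r)

def length_factor_needed(r, r_target):
    """Factor by which test length must be multiplied to raise reliability r to r_target."""
    return r_target * (1 - r) / (r * (1 - r_target))

k = length_factor_needed(0.805, 0.948)
print(f"trials must be multiplied by about {k:.1f}")
print(f"doubling the trials: {spearman_brown(0.805, 2):.3f}")
```

Under this assumption, roughly 4.4 times as many trials would bring IT to the mean reliability of the IQ tests, and merely doubling the trials would raise .805 to about .89.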

Construct Validity of IT

This is the least researched and least satisfactorily answered question regarding IT. The construct measured by IT is still in question. The only way its construct validity can be determined is by means of correlational analyses, such as factor analysis and structural equation modeling, in the company of other well-defined variables, and by establishing its convergent and divergent validity in comparison with other chronometric methods.

Yet we can draw a few tentative conclusions from the relevant correlations now reported. IT is clearly different from RT, because when entered in a multiple regression equation along with RT it makes an independent contribution to the correlation with IQ or psychometric g. Also, IT shows a profile of correlations with various psychometric tests that differs from that of RT. In particular, IT is more highly correlated with visual-spatial tests than with verbal tests. And IT tends to be less strongly g loaded than RT. A second-order general speed factor, however, links IT to the third-order factor of psychometric g and hence to general IQ tests (O'Connor & Burns, 2003). O'Connor and Burns's factor analysis covered several different speed-of-processing tasks, including IT, the Hick paradigm, the odd-man-out (OMO) paradigm, and other speeded tasks. A striking feature of it is that the Hick RT variables have near-zero loadings (average loading = +0.007) on the first-order factor (labeled Visualization Speed) on which IT has its largest loading (+0.350). The OMO RT, by contrast, has its largest loading (+0.469) on this same Visualization factor. IT is loaded -0.021 on the DT factor, on which the Hick RT variables (for 2-, 4-, and 8-response alternatives) have loadings of +0.788, +0.863, and +0.773, respectively. (All the MT variables are loaded entirely on a separate factor.) Thus it appears that IT may not even be a component of 2-, 4-, or 8-CRT in the Hick paradigm, none of which demands much visual discrimination. IT and OMO, however, share a factor that demands a higher degree of visual discrimination. Hence it is certain that IT and RT are not factorially equivalent or interchangeable variables.

Is "speed of sensory intake" a suitable construct label for IT? To qualify for the breadth expected of a construct, visual IT (VIT) would have to be shown to correlate highly with other measures of sensory intake, such as its procedural counterpart, AIT. Strangely, those who research VIT have worked so independently of those who research AIT that the two paradigms have not yet been directly compared empirically in any systematic fashion. Both VIT and AIT, however, are quite comparably correlated with IQ, as shown in Table 10.3.

Table 10.3: Meta-analysis of the IT × IQ correlations (raw and corrected(a)) reported in published studies (from Grudnik & Kranzler, 2001, with permission from Elsevier).

                                             Raw               Corrected(a)
Meta-analysis              N      K(b)    Mean  Variance    Mean  Variance
Participants
  Total                  4197     92      -.30    .03       -.51    .00
  Adults                 2887     62      -.31    .03       -.51    .00
  Children               1310     30      -.29    .03       -.44    .01
IT task type
  Visual                 2356     50      -.32    .03       -.49    .02
  Auditory                390     10      -.30    .04       -.58    .07
Strategy users/nonusers
  Users                   205      9      -.34    .03       -.60    .00
  Nonusers                160      9      -.46    .03       -.77    .00

(a) Correlations corrected for the sampling artifacts of sampling error, attenuation, and range variation.
(b) Number of independent studies. N is the number of individuals.

However, VIT and AIT could possibly have different correlations with different subtests, with different group factors, or with the specificities of the various IQ tests. Three distinct auditory discrimination tests, including AIT, correlate with one another in the range .60 to .75. The latent trait they represent (their general factor) is correlated an amazing .64 with psychometric g, as represented by the general factor of three very highly g-loaded psychometric tests (Deary, 2000, p. 217, Figure 7.9). Although VIT and AIT are similarly correlated with g, they have different specificities (s) and probably one or more unshared group factors (F) related specifically to auditory (a) or visual (v) processes. The simplest likely factor models for VIT and AIT, for example, would be $\mathrm{VIT} = g + F_v + s_v$ and $\mathrm{AIT} = g + F_a + s_a$. (Note: by definition, the specificities $s_v$ and $s_a$ of VIT and AIT are different and uncorrelated.)
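The additive factor models for VIT and AIT given above can be made concrete with a small simulation. The sketch rests on several assumptions. All variables are standardized, and each factor is weighted by a loading instead of the unit weights written in the equations. The loadings themselves are invented for illustration. Because $F_v$, $F_a$, $s_v$ and $s_a$ are mutually uncorrelated, g is the only path linking VIT and AIT. The model therefore implies that their correlation equals the product of their g loadings.

```python
import numpy as np


def simulate_vit_ait(n, g_vit, g_ait, f_vit, f_ait, seed=0):
    """Simulate standardized VIT and AIT scores under the hypothetical models
    VIT = g + F_v + s_v and AIT = g + F_a + s_a, with each component weighted by a loading.

    g, F_v, F_a, s_v and s_a are drawn as mutually uncorrelated standard normal variables.
    """
    rng = np.random.default_rng(seed)
    g, f_v, f_a, s_v, s_a = rng.standard_normal((5, n))
    # The specific loadings absorb whatever variance g and the group factor leave over.
    spec_vit = np.sqrt(1.0 - g_vit**2 - f_vit**2)
    spec_ait = np.sqrt(1.0 - g_ait**2 - f_ait**2)
    vit = g_vit * g + f_vit * f_v + spec_vit * s_v
    ait = g_ait * g + f_ait * f_a + spec_ait * s_a
    return g, vit, ait


# Hypothetical loadings, chosen only for illustration.
g, vit, ait = simulate_vit_ait(200_000, g_vit=0.5, g_ait=0.6, f_vit=0.4, f_ait=0.4)
r_vit_ait = np.corrcoef(vit, ait)[0, 1]
print(f"simulated r(VIT, AIT) = {r_vit_ait:.3f}; implied by the model = {0.5 * 0.6:.3f}")
```

In this sketch, the group factors and specificities raise the variance of each IT measure without adding anything to their correlation. This is why two measures equally correlated with g need not correlate highly with each other.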

A whole area of investigation is opened by questions about the construct validity of IT and about how it is related to RT and other chronometric variables. So far, the main interest in IT has been its quite remarkable correlation with IQ. Few other indicators of IT's external or ecological validity have yet been sought.

Correlation of IT with IQ

Probably the most consistent and indisputable fact about IT is its substantial correlation with IQ. There are three comprehensive reviews and meta-analyses of the findings (Nettelbeck, 1987; Kranzler & Jensen, 1989; Grudnik & Kranzler, 2001). Since the most recent meta-analysis, by Grudnik and Kranzler, is cumulative over all the published data, its conclusions, as far as they go, are definitive. The IT × IQ correlations are shown in Table 10.3. The correlations were also corrected for artifacts due to sampling error, attenuation, and range variation. The results are shown separately for adults and children and for visual and auditory IT. They are also shown for groups claiming to use a conscious strategy (such as an apparent-motion effect) in performing the IT task versus those claiming no such strategy. The overall corrected mean r is -.51 (-.30 prior to correction). An important observation is that there is no significant variation in the corrected correlations across the various studies. Also, the correlation differences between children and adults, and between visual and auditory IT, are statistically nonsignificant. The difference between strategy users (-.60) and nonusers (-.77) appears substantial, but because of the small number of samples (K = 9) it was not tested for significance. Grudnik and Kranzler (2001, p. 528) also point out that children who are diagnosed as reading disabled perform very differently on IT than do children of normal reading ability. These two groups show the same magnitude of IT × IQ correlation, but with opposite signs: disabled r = +.44, nondisabled r = -.44! This finding suggests that IT (and possibly the degree of discrepancy between visual and auditory IT) might be useful in the differential diagnosis and prescription for treatment of various reading disabilities.
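Two of the steps behind Table 10.3 can be illustrated with the table's own numbers. The first is the sample-size-weighted mean correlation, the starting point of a "bare-bones" meta-analysis. The second is the classical correction for attenuation. Using the overall mean reliabilities from Table 10.2 in that correction is an assumption made here for illustration, since a full meta-analysis corrects study by study. The range-variation correction is not reproduced because the range-restriction ratios it needs are not given in the text.

```python
import math

# (N, raw mean r) for the adult and child subgroups in Table 10.3
subgroups = {"adults": (2887, -0.31), "children": (1310, -0.29)}


def weighted_mean_r(groups):
    """Sample-size-weighted mean correlation across groups."""
    total_n = sum(n for n, _ in groups.values())
    return sum(n * r for n, r in groups.values()) / total_n


def disattenuate(r_xy, r_xx, r_yy):
    """Classical correction for attenuation due to unreliability of both measures."""
    return r_xy / math.sqrt(r_xx * r_yy)


r_bar = weighted_mean_r(subgroups)
print(f"weighted mean raw r = {r_bar:.3f}")  # -0.304, consistent with the total of -.30
r_dis = disattenuate(-0.30, 0.805, 0.948)
print(f"corrected for unreliability only = {r_dis:.3f}")  # -0.343
```

Correcting for unreliability alone moves the mean r only from -.30 to about -.34. On this rough calculation, most of the remaining distance to the corrected -.51 would presumably come from the correction for range variation.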

IT in a Psychometric Factor Space

Although it is amply proven that the speed of IT is substantially correlated with IQ (r ≈ .50), it cannot be assumed that IT is mainly correlated with g, even though IQ and g are highly correlated. This is because the IQ is not a unidimensional variable. It is an inconsistent amalgam of a number of different factors and test specificity, such that g seldom accounts for more than 50 percent of the total common factor variance among the various subtests in any given test battery. This allows the possibility that g might contribute less to the IQ × IT correlation than one or more of the other factors. This possibility has been examined in several studies, with mostly ambiguous or inconsistent results. These are probably due to differences in the study samples, in the psychometric test batteries, and in the IT apparatuses and procedures.

Consider a battery composed of 11 psychometric tests (the Raven Advanced Progressive Matrices and the 10 subtests of the Multidimensional Aptitude Battery [MAB]) and 36 chronometric variables based on 7 different elementary cognitive tasks (ECTs), first published in a study by Kranzler and Jensen (1991). Carroll (1991b) performed an orthogonalized hierarchical factor analysis (Schmid-Leiman) of these data. He found that IT was loaded only .19 on the highest-order factor (at the second order), which he labeled "Decision Speed and Psychometric g." For comparison, the average loadings of SRT and CRT (Hick paradigm) on this factor are .37 and .56, respectively, and the OMO RT is loaded .62. The loadings of the 10 MAB psychometric subtests on the same general factor range from .28 to .46, with an average loading of .37. The Raven Advanced Progressive Matrices loaded .41. Apparently, IT has much less common variance with the psychometric tests than do the RT variables. IT has its largest loading on a first-order factor that Carroll labeled "Decision speed: Hick and IT tasks." This is a first-order common factor of Hick RT and IT, residualized from the second-order general factor. Even the MAB subtests that define the first-order factor of "crystallized intelligence" (Vocabulary, Information, Comprehension, Similarities, Picture Completion) have loadings on the second-order g averaging .30 and ranging from .28 to .35. The largest Pearson r that IT has with any one of the six independent RT measures in this study is only .216 (Hick 8-CRT). Yet IT has similar correlations with two of the MAB subtests (Picture Completion .313, Spatial .221).
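Because the Schmid-Leiman solution is orthogonal, a squared loading gives the share of a variable's total variance attributable to that factor. This makes the contrast between IT and the RT measures concrete. The sketch squares the loadings reported above. For the averaged entries it squares the average loading, which only approximates the mean of the squared loadings.

```python
# Loadings on Carroll's second-order "Decision Speed and Psychometric g" factor, as reported above.
g_loadings = {
    "IT": 0.19,
    "SRT (mean)": 0.37,
    "CRT, Hick (mean)": 0.56,
    "OMO RT": 0.62,
    "Raven APM": 0.41,
    "MAB subtests (mean)": 0.37,
}

# In an orthogonal solution, the squared loading is the proportion of variance due to that factor.
for name, loading in sorted(g_loadings.items(), key=lambda kv: kv[1]):
    print(f"{name:<22} loading {loading:.2f} -> {loading**2:.1%} of variance")
```

On this reading, the general factor accounts for under 4 percent of the variance in IT but for more than ten times as much in the OMO RT.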

Another way to examine the relationship of IT to psychometric g is by the method of correlated vectors (fully described in Jensen, 1998b, Appendix B). In this case, the method consists essentially of obtaining the rank-order correlation between two column vectors. The first is the vector of disattenuated g loadings of the subtests of an intelligence battery. The second is the corresponding vector of the subtests' disattenuated Pearson correlations with speed of IT. When this method has been applied to RT measures, there is typically a high positive correlation between the vector of the subtests' g loadings and the vector of the subtests' correlations with speed of processing in RT tasks.
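The correlated-vectors computation described above can be sketched as follows. The disattenuation convention is an assumption made here: each g loading is divided by the square root of the subtest's reliability, and each subtest × IT correlation by the square root of the product of the two reliabilities. The six-subtest values are invented purely for illustration. Dividing by the single IT reliability is a constant rescaling that cannot change the ranks. It is kept only to mirror the description.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n, with tied values given the mean of the ranks they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def correlated_vectors(g_loadings, r_with_it, subtest_rel, it_rel):
    """Spearman rank correlation between disattenuated g loadings and
    disattenuated subtest x IT correlations (Pearson r computed on ranks)."""
    rel = np.asarray(subtest_rel, dtype=float)
    g = np.asarray(g_loadings, dtype=float) / np.sqrt(rel)
    r = np.asarray(r_with_it, dtype=float) / np.sqrt(rel * it_rel)
    return np.corrcoef(average_ranks(g), average_ranks(r))[0, 1]


# Hypothetical six-subtest battery (values invented for illustration only).
g_loadings = [0.80, 0.75, 0.70, 0.65, 0.60, 0.55]
r_with_it_speed = [0.35, 0.30, 0.32, 0.25, 0.20, 0.22]
subtest_reliability = [0.90, 0.85, 0.88, 0.80, 0.82, 0.78]
rho = correlated_vectors(g_loadings, r_with_it_speed, subtest_reliability, it_rel=0.80)
print(f"rho = {rho:.3f}")  # 0.886
```

A large positive rho, as here, is the pattern typically found for RT. The Deary and Crawford vectors for IT would instead yield a negative rho.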

The correlated vectors method was applied to IT by Deary and Crawford (1998), using the 11 subtests of the Wechsler Adult Intelligence Scale-Revised (WAIS-R) to obtain the subtests' g loadings. In three independent representative subject samples, the g and IT vectors were negatively correlated (-.41, -.85, -.31). This is the opposite of the usual result for RT (and for various biological measures, such as brain size and cortical evoked potentials). Evidently, IT gains its correlation with IQ through factors other than g, such as the g-residualized common factor of the WAIS-R Performance subtests (see Deary, 1993). But could this result be peculiar to the WAIS-R, in which the so-called Verbal subtests typically have somewhat larger g loadings than the so-called Performance subtests? Suppose the theoretical or "true" g is conceived to be independent of the verbal-nonverbal distinction, and not intrinsically more (or less) related to verbal than to nonverbal tests. In that case, this finding by Deary and Crawford could be just a peculiar artifact of the WAIS-R battery. Its generality would depend on replications of the same outcome in different test batteries. It might be that verbal tests, which possibly tap the highest cognitive functions to evolve in the human brain, are intrinsically more sensitive than nonverbal tests to g differences. This is a question that can be answered only empirically.

A seemingly contradictory result is found in a battery of 14 diverse subtests, including both verbal and nonverbal subtests as well as 4-CRT and IT. Nettelbeck and Rabbitt (1992) found that both RT and IT have their largest loadings (-.69 and -.71) on the second-order hierarchical g factor. The main difference between IT and CRT is that CRT has its only salient loading on g. IT also has a salient loading (-.33) on a first-order factor that has salient loadings on the WAIS subtests Picture Arrangement and Block Design (with g loadings of .69 and .82). The method of correlated vectors applied to the test battery used by Nettelbeck and Rabbitt shows a correlation of +.95 between the vector of subtests' g loadings and the vector of subtests' correlations with speed of IT. This result is strongly opposite to the finding of Deary and Crawford (1998) described above. No definitive explanation of the huge discrepancy between the results of these two studies can be given. One can only note the two main differences between them. (1) In the Deary and Crawford study the largest g loadings in the battery were on the verbal tests, whereas in the Nettelbeck and Rabbitt study the largest g loadings were on the nonverbal performance tests. (2) The subject samples differ considerably in age and ability level. The Deary and Crawford sample was fairly representative of the general adult population. The Nettelbeck and Rabbitt sample, by contrast, was relatively elderly and of high IQ, with ages ranging from 54 to 85 and a mean IQ at the 90th percentile of the general population. In any case, these conflicting studies are puzzling and leave the present issue unresolved. Factor analyses based on larger batteries than just the 11 subtests of the Wechsler batteries should help in locating IT in the psychometric factor space, as far as that might be possible.

At present, the safest conclusion seems to be that VIT is related to most conventional psychometric batteries both through g and through one or more lower-order factors. These factors represent aspects of visual processing speed (or auditory processing speed in the case of AIT). As is also true for any measure of RT, much of the variance in IT is unrelated to traditional measures of psychometric intelligence (see Chaiken, 1994). Figure 10.3 shows the hypothesized relationships of VIT and AIT to each other and to psychometric g as a Venn diagram. In it, the area of each circle represents the total standardized variance (σ² = 1) of each of the variables (psychometric g, VIT, AIT). The areas of overlap represent latent variables, i.e., proportions of variance common to any two or three of the depicted variables. Two of these variables (VIT and AIT) are directly measured, whereas g is wholly a latent variable common to all cognitive abilities.

Figure 10.3: Venn diagram showing hypothetical relationships between visual and auditory ITs (VIT, AIT) and psychometric g. Areas: 1+2+3+4 = total variance of psychometric g; 2+4 = g variance of VIT; 3+4 = g variance of AIT; 4+5 = common variance of VIT and AIT; 5 = first-order common factor variance of VIT and AIT; 6 = specific variance of VIT; 7 = specific variance of AIT.

It should be remembered, however, that the external validity and importance of IT do not depend on the number or nature of the factors on which it has its salient loadings in any kind of factor analysis of conventional psychometric batteries. Such batteries scarcely encompass the entire array of cognitive abilities. There is a risk in validating new measures of intelligence against older established measures, because these may fail to reflect certain abilities or traits that are important predictors of performance in life. In most factor analyses of psychometric batteries, what appears as the large specificity of IT could have important correlates in cognitive aspects of living that most psychometric tests leave untapped. Such correlates would constitute the external validity of IT. A striking example of this is seen in a study by Stough and Bates (2004). A comprehensive test of secondary school scholastic achievement was correlated with a battery of three standard psychometric tests and IT (correlations in parentheses): Raven Advanced Progressive Matrices (.23), Verbal IQ (.39), Figural-Spatial IQ (.39), and IT (-.74). The multiple R of the three standard tests with Scholastic Achievement is .44. Including IT along with the three standard tests raises the multiple R to .79; hence the incremental validity of IT is .67. A principal components analysis of all five measures yields two significant components (i.e., eigenvalues > 1). IT had the largest loading of any of the variables on the first component (-.75). (The second principal component has significant loadings only on IT and Achievement.) This remarkable finding, based on N = 50, of course calls for replication.
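The text does not say how the incremental validity of .67 was computed. One common definition is the semipartial correlation of the added predictor with the criterion, $\sqrt{R^2_{\text{full}} - R^2_{\text{reduced}}}$. That definition is assumed in the sketch below, which works from the two multiple Rs reported above. With these rounded Rs it gives about .66. This is close to the reported .67, and the small gap may simply reflect rounding. By contrast, the plain difference of the Rs (.79 - .44 = .35) would give a very different figure, so the definition matters.

```python
import math


def incremental_validity(r_full: float, r_reduced: float) -> float:
    """Semipartial (part) correlation of an added predictor: sqrt(R_full^2 - R_reduced^2)."""
    return math.sqrt(r_full**2 - r_reduced**2)


r_three_tests = 0.44  # Raven APM, Verbal IQ, Figural-Spatial IQ
r_with_it = 0.79      # the same three tests plus IT
print(f"incremental validity: {incremental_validity(r_with_it, r_three_tests):.3f}")  # 0.656
print(f"R-squared gain: {r_with_it**2 - r_three_tests**2:.4f}")
```

Squared, the semipartial gives the proportion of achievement variance that IT explains beyond the three standard tests: about 43 percent in this sample.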

The latest addition to the puzzle regarding the nature of the correlation between IT and IQ is a most impressive large-scale behavior genetics study based on 2012 genetically related subjects in Holland and Australia (Luciano et al., 2005). This study used latent trait models to help decide between the "bottom-up" and the "top-down" hypotheses. The question is whether individual differences in perceptual speed influence differences in intelligence (IQ), or differences in IQ cause differences in perceptual speed. The correlation (about -.40) between IQ and IT as a latent trait fit neither the bottom-up nor the top-down causal model. It is best explained as the result of pleiotropic genes affecting variation in both IT and IQ, without implying that either variable has any causal effect on the other. (The genetic phenomenon known as pleiotropy is explained in Chapter 7, pp. 127-129.) The authors concluded: "This finding of a common genetic factor provides a better target for identifying genes involved in cognition than genes which are unique to specific traits" (p. 1).