Journal of Educational and Behavioral Statistics-2015-Liang-5-34
A Quasi-Parametric Method for Fitting Flexible Item Response Functions
Longjuan Liang
Educational Testing Service
Michael W. Browne
Ohio State University
If standard two-parameter item response functions are employed in the analysis
of a test with some newly constructed items, it can be expected that, for some
items, the item response function (IRF) will not fit the data well. This lack of fit
can also occur when standard IRFs are fitted to personality or psychopathology
items. When investigating reasons for misfit, it is helpful to compare item
response curves (IRCs) visually to detect outlier items. This is only feasible if
the IRF employed is sufficiently flexible to display deviations in shape from the
norm. A quasi-parametric IRF that can be made arbitrarily flexible by increasing
the number of parameters is proposed for this purpose. To take capitalization
on chance into account, the use of the Akaike information criterion or the Bayesian
information criterion goodness-of-approximation measures is recommended for
suggesting the number of parameters to be retained. These measures balance
the effect on fit of random error of estimation against systematic error of
approximation. Computational aspects are considered and efficacy of the
methodology developed is demonstrated.
Keywords: item response theory; flexible item response function; monotonic polynomial
1. Introduction
When dichotomous items of an ability test are being analyzed, the most
widely employed item response functions (IRFs) have two parameters. Although
these IRFs have been found to be useful in general, they lack flexibility and there
are situations where they fail to fit some items. When this happens, it could be
either that the items have flaws or the data have characteristics that cannot be
handled by the IRF. In this situation, it is helpful to have access to a flexible IRF
that yields an item response curve (IRC) that will display differences in shape
between items.
Journal of Educational and Behavioral Statistics
2015, Vol. 40, No. 1, pp. 5–34
DOI: 10.3102/1076998614556816
© 2014 AERA. http://jebs.aera.net
A number of articles on flexible IRFs have appeared. For example, Drasgow,
Levine, Williams, McLaughlin, and Candell (1989) describe and illustrate the
use of multilinear formula score theory for nonparametric IRFs; Ramsay and
Winsberg (1991) use monotonic spline basis functions and calculate maximum
marginal likelihood (MML) estimates for the item parameters; Meijer and
Baneke (2004) discuss the use of nonparametric methods when analyzing psychopathology
and personality items. Other approaches for improving goodness
of fit of item response models are also possible. For example, Woods and Thissen
(2006) employ the two-parameter logistic (2PL) function for the IRF in all items
but replace the standard normal ability distribution by a spline-based density. A
Bayesian approach to nonparametric item response modeling has been recently
developed by Duncan and MacEachern (2008, 2013). Although very good estimates
of nonstandard IRCs are produced, a considerable amount of computation
is required.
In his seminal article on nonparametric item response theory, Ramsay (1991)
suggested use of a nonparametric regression of item scores on normalized ability
surrogates, using kernel smoothing, as the IRC. This approach is simple and robust
and allows the shape of the IRC to vary freely from one item to another. It has
made a substantial impact. Kernel smoothing does not constrain the IRF to be
monotonic, so that it can provide option response curves for incorrect options.
When the correct option of ability items is being analyzed, however, it is desirable
to be able to constrain the IRCs to be monotonic. Lee (2007) investigated
this matter and used isotonic regression in conjunction with Ramsay’s approach
to obtain monotonic IRFs. A disadvantage of these nonparametric approaches is
that the IRF is not readily portable to scores of examinees that are not in the calibration
sample, so that scoring test results for future examinees is difficult.
This article presents an IRF that is ‘‘quasi-parametric’’ (Ramsay, 1991, p. 613)
in the sense that it employs parameters that are intended solely for the provision of
a graphical representation of the IRC and not for interpretation in terms of some
underlying psychological process. Monotonicity can be guaranteed with this
approach, which permits flexibility of the IRC and facilitates the use of the existing
parametric Bayesian expected a posteriori (EAP) estimates for the stochastic ability
parameter. Our proposed filtered monotonic polynomial (FMP) IRF is the composition
of a logistic function and a monotonic polynomial. Results concerning
cumulative distribution functions (cdfs) given by Elphinstone (1983) show that the
FMP IRF may be used to approximate any IRF with a continuous derivative arbitrarily
closely by increasing the number of parameters in the monotonic polynomial.
Thus, this FMP IRF not only is flexible but is formulated as an algebraic
expression that is easily portable to future examinees not in the calibration sample.
When the FMP function is required to approximate some “population” IRF
with few parameters, it can happen that the approximating FMP IRF will require
more parameters than the (usually unknown) population IRF. When samples are
very large, and sampling error can be disregarded, the necessity for many FMP
parameters would not matter. In smaller samples where sampling error needs to
be considered, an attempt to use many FMP parameters could result in appreciable
capitalization on chance. In general, it is preferable to retain fewer parameters
in smaller samples than in large samples (see Browne, 2000; Cudeck &
Henly, 1991). We shall implement this principle by using the Akaike information
criterion (AIC; Akaike, 1973) or the Bayesian information criterion (BIC; Schwarz,
1978) as guides when choosing the number of FMP parameters for an item. These
are goodness of approximation measures that make no assumption of an exactly
correct model in the population, take sample size into account, limit the number
of parameters when the sample size is small, and allow more parameters as the
sample size increases.
The FMP approach involves no assumption that the number of item parameters
is the same for all items, so that the shape of the IRC can vary from one
item to another, as is the case with the nonparametric regression approaches.
Because the usual two-parameter logistic (2PL) IRF is a special case of the FMP family of IRFs,
it can be fitted at the same time as more flexible IRFs for comparative purposes.
Furthermore, the FMP requirement of monotonicity for the IRF may be discarded
to result in a filtered unconstrained polynomial (FUP) procedure that can assist
the diagnosis of nonmonotonic items. Although this article concentrates on
extensions to the computationally convenient 2PL IRF, basic theory is presented
in a manner that can be extended to other IRF families.
Unlike the 2PL, the one-parameter logistic (1PL) IRF, or Rasch model, constrains an item
parameter (discrimination) to equality across items (cf. Thissen & Orlando, 2001, p. 76,
equation 3). Consequently, the 1PL does not fit into the FMP computational framework,
which estimates item parameters successively, 1 item at a time, rather than concurrently
for all items. Furthermore, the fundamental philosophy of the FMP
approach is to seek a model that fits given data as well as possible. This contradicts
that of the Rasch model, which requires that data should fit a given model to satisfy
mandatory measurement requirements (cf. Thissen & Orlando, 2001, pp. 90–91).
The following section gives a brief review of the parametric and nonparametric
approaches for estimating the IRF, including the joint maximum likelihood
(JML) and MML parametric estimation methods. Thereafter, we
introduce the filtered polynomial IRF estimation method and consider the
choice of the number of parameters using the AIC information theoretical
approach. Subsequently, we present results from simulation studies and an
example with actual data. A summary of findings and conclusions of the
research is provided in the closing part of the article. Details concerning
parameter estimation are given in Online Appendices A and B.
2. Item Response Theory
Consider an $N \times n$ data matrix, $Y$, with typical element $y_{si}$, which represents the
response of examinee $s$ to item $i$, with $y_{si} = 1$ if the response is correct and $y_{si} = 0$
otherwise. The responses of all examinees to item $i$ are contained in the $N \times 1$
vector, $\mathbf{y}_{\cdot i}$, formed from column $i$ of $Y$. Row $s$ of $Y$ provides the response pattern
for examinee $s$ and will be denoted by the $1 \times n$ vector, $\mathbf{y}'_{s\cdot}$, with $i$th element $y_{si}$.
Thus, a column of $Y$ represents scores of $N$ examinees on an item and a row
represents scores of an examinee on $n$ items.
We assume that there is a single latent trait, $\theta$, that influences an examinee’s
response to each item. The IRF for item $i$,
\[ P_i(\theta) = \mathrm{Prob}(y_i = 1 \mid \theta), \tag{1} \]
gives the probability that an examinee with ability $\theta$ will give the “correct”
answer to item $i$. Because $P_i(\theta)$ represents a probability, it must be bounded
below by 0 and above by 1. With ability or achievement tests, it also makes sense
to assume that the probability of passing an item increases as $\theta$ increases, so that
the IRF will be monotonically increasing, bounded below by 0 and above by 1.
Any IRF $P_i(\theta)$ that decreases as $\theta$ increases would be symptomatic of an unusual
item.
The vectors, $\mathbf{y}_{s\cdot}$, are regarded as independent realizations of a random
vector $\mathbf{y}$. For each examinee $s$, there corresponds a realization $\theta_s$ of the latent
trait, $\theta$, that represents examinee ability. We assume local independence, that
is, that conditionally on $\theta = \theta_s$ the elements of $\mathbf{y}$ are independently distributed.
Consequently, the probability of a specific response pattern $\mathbf{y}_{s\cdot}$ given
$\theta_s$ is
\[ \mathrm{Prob}(\mathbf{y}_{s\cdot} \mid \theta = \theta_s) = \prod_{i=1}^{n} P_{si}^{\,y_{si}} (1 - P_{si})^{(1 - y_{si})}, \tag{2} \]
where
\[ P_{si} = P_i(\theta_s). \tag{3} \]
2.1. Parametric IRFs
In addition to the abilities, $\theta_s$, parametric IRFs involve item-specific
parameters. As examples, we shall consider two well-known IRFs, each
with two parameters, an item discrimination parameter $a_i$ and an item difficulty
parameter $b_i$. The normal ogive IRF for item $i$ is given (e.g., Lord & Novick,
1968, p. 366) by
\[ P_i(\theta) = \int_{-\infty}^{m_i(\theta)} \frac{1}{\sqrt{2\pi}} \exp(-z^2/2)\,dz \tag{4} \]
and the two-parameter logistic (2PL) IRF (Birnbaum, 1968) by
\[ P_i(\theta) = \frac{1}{1 + \exp\{-m_i(\theta)\}}, \tag{5} \]
where $m_i(\theta)$ is a linear function of $\theta$:
\[ m_i(\theta) = a_i(\theta - b_i). \tag{6} \]
Both IRFs are monotonic increasing if $a_i > 0$ and are bounded by 0 and 1.
They are similar but differ in the scale of $a_i$. This difference between the two
IRFs can be reduced substantially by replacing the function $m_i(\theta)$ in Equation
5 by the rescaled function $1.702\,m_i(\theta)$.¹
Early in the development of item response theory, the normal ogive IRF
was used predominantly but was replaced subsequently by the 2PL, which is
computationally more convenient. Two methods are best known for obtaining
parameter estimates using the 2PL. The first was originally suggested by Birnbaum
(1968). The unobservable ability variables, $\theta_s$, are regarded as parameters to be
estimated rather than as realizations of a latent variable with a prespecified normal
distribution. A likelihood function is maximized jointly with respect to the item
parameters $a_i, b_i$, $i = 1, \ldots, n$, and the ability parameters $\theta_s$, $s = 1, \ldots, N$, using
an alternating iterative algorithm. This method of estimation is referred to as JML
estimation. Consistency of the estimates has never been proved (e.g., Baker, 1992,
pp. 104–105) and “tuning” of the algorithm is necessary (Baker, 1992, p. 112).
The second method of estimation, introduced by Bock and Lieberman (1970)
for the 2PL, treated ability as a latent trait with a specified normal distribution
and maximized the marginal likelihood for item parameters alone integrating out
the ability variable, $\theta$. A Newton–Raphson algorithm was proposed and gave
acceptable results for a small number of items but was not practical for many
items. Significant improvements were provided by Bock and Aitkin (1981), who
approximated the density of $\theta$ by a step function that facilitated the use of an
expectation-maximization algorithm (EM algorithm). This method of estimation
is known as MML and is now frequently employed.
The MML estimation method has advantages over JML in that it can obtain
estimates of the item parameters without estimating the ability parameters,
$\theta_s$, $s = 1, \ldots, N$, at the same time. Estimates for the latent traits, $\theta_s$, may be
obtained subsequently using a Bayesian method.
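The two parametric IRFs of this subsection can be compared numerically. The following is a minimal sketch, not part of the original article: the function names and the item parameter values are hypothetical, and the normal ogive is evaluated through the error function. It illustrates the statement that rescaling the linear predictor by 1.702 brings the logistic IRF close to the normal ogive.

```python
import math

def linear_predictor(theta, a, b):
    """Equation 6: m_i(theta) = a_i * (theta - b_i)."""
    return a * (theta - b)

def irf_2pl(theta, a, b, scale=1.0):
    """Equation 5; scale=1.702 applies the rescaled predictor from the text."""
    return 1.0 / (1.0 + math.exp(-scale * linear_predictor(theta, a, b)))

def irf_normal_ogive(theta, a, b):
    """Equation 4: standard normal cdf evaluated at m_i(theta)."""
    m = linear_predictor(theta, a, b)
    return 0.5 * (1.0 + math.erf(m / math.sqrt(2.0)))

# Hypothetical item parameters, chosen only for illustration.
a_demo, b_demo = 1.2, 0.3
grid = [-4.0 + 0.01 * j for j in range(801)]

def max_gap(scale):
    """Largest absolute difference between the two IRFs over the grid."""
    return max(
        abs(irf_2pl(t, a_demo, b_demo, scale) - irf_normal_ogive(t, a_demo, b_demo))
        for t in grid
    )

gap_unscaled = max_gap(1.0)    # logistic with m_i(theta)
gap_rescaled = max_gap(1.702)  # logistic with 1.702 * m_i(theta)
print(gap_unscaled, gap_rescaled)
```

On this grid the rescaled logistic curve stays much closer to the normal ogive than the unscaled one, which is the sense in which the two IRFs differ mainly in the scale of the discrimination parameter.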
2.2. Ramsay’s Nonparametric IRF
Ramsay (1991, 2000) introduced a nonparametric approach to estimating an
IRC using kernel smoothing. This approach requires a surrogate ability value,
$\tilde\theta_s$, for each examinee. All examinees are ranked according to total test score and
$\tilde\theta_r$ is defined to be the estimated quantile of the standard normal distribution
corresponding to rank $r$. A smoothed estimate of the IRF is given by
\[ \hat P_i(\theta) = \frac{\sum_{r=1}^{N} K\!\left(\frac{\tilde\theta_r - \theta}{h}\right) y_{ri}}{\sum_{r=1}^{N} K\!\left(\frac{\tilde\theta_r - \theta}{h}\right)}, \tag{7} \]
where $y_{ri}$ is the score on item $i$ of the examinee with rank $r$. The symmetric
nonnegative weighting function
\[ K(z) = (2\pi)^{-1/2} \exp(-z^2/2), \quad \text{where } z = (\tilde\theta_r - \theta)/h, \tag{8} \]
is known as the Gaussian kernel. It has a maximum when $z = 0$ and decreases
toward zero as $|z|$ increases. An increase in the bandwidth $h$ will result
in slower changes of the function $\hat P_i(\theta)$ but will also increase the bias of the function.
Rapid changes or wiggles due to sampling fluctuation decrease as $N$ increases,
so that bias can be reduced by reducing $h$ as $N$ increases. In the computer program
TestGraf (Ramsay, 2000), the default of $h = 1.1 N^{-0.2}$ is employed, so that $h$ is a
function of $N$ alone and decreases as $N$ increases.
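The kernel-smoothed estimate in Equation 7 can be sketched in a few lines. This is an illustrative sketch only and is not the TestGraf implementation: the function names are hypothetical, the rule (r - 0.5)/N for converting ranks to normal quantiles is an assumption (the text does not specify one), and the item scores are artificial.

```python
import math
from statistics import NormalDist

def gaussian_kernel(z):
    """Equation 8: Gaussian kernel K(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def default_bandwidth(n_examinees):
    """Default bandwidth quoted in the text: h = 1.1 * N ** (-0.2)."""
    return 1.1 * n_examinees ** -0.2

def kernel_irf(theta, surrogates, item_scores, h):
    """Equation 7: kernel-weighted mean of item scores at ability theta."""
    weights = [gaussian_kernel((t - theta) / h) for t in surrogates]
    return sum(w * y for w, y in zip(weights, item_scores)) / sum(weights)

# Artificial data: N surrogates taken as normal quantiles of the ranks.
# The (r - 0.5) / N rule is an assumption made for this sketch.
N = 200
std_normal = NormalDist()
surrogates = [std_normal.inv_cdf((r - 0.5) / N) for r in range(1, N + 1)]
item_scores = [1 if t > 0 else 0 for t in surrogates]  # artificial step pattern

h = default_bandwidth(N)
low = kernel_irf(-2.0, surrogates, item_scores, h)
mid = kernel_irf(0.0, surrogates, item_scores, h)
high = kernel_irf(2.0, surrogates, item_scores, h)
print(h, low, mid, high)
```

Because the estimate is a weighted mean of 0/1 scores, it always lies between 0 and 1, but nothing in the construction forces it to be monotonic in the ability value.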
It is possible to use the nonparametric IRF defined by Equation 7 to compute
maximum likelihood estimates of the ability parameters $\theta_s$, $s = 1, \ldots, N$; feed
them back into the process for reranking the examinees, obtaining new surrogate
variables $\tilde\theta_s$, $s = 1, \ldots, N$; and carry out an iterative procedure. In the
TestGraf (Ramsay, 2000) program, this can be done, but a manual intervention at
each iteration is required. This precludes use of the iterative procedure in random
sampling experiments. If no iterations are carried out, the original surrogate
variable values, $\tilde\theta_s$, are output as estimates of the ability variables.
3. Quasi-parametric IRFs
The extension of the IRF for the 2PL in Equation 5 to yield IRFs that are
simultaneously both flexible and parametric will now be considered.
Elphinstone (1983, 1985) proposed a monotonic polynomial–based approach
for estimating an unknown univariate distribution function. Sinnott (1997) subsequently
named it the “filtered polynomial” distribution estimation method and
extended it to a multivariate setting. Here, the general methodology provided by
Elphinstone (1983) will be adapted to estimate an IRF of unknown functional
form. The likelihood function appropriate here for estimating an unknown IRF
is different from that used by Elphinstone (1985) for estimating a distribution function
of unknown functional form.
3.1. Filtered Polynomials
The IRF $P_i(\theta)$ yields the probability that an examinee with ability $\theta$ will
answer a specified item, $i$, correctly. Unless otherwise stated, each IRF to be considered
here is assumed (i) to be monotonic increasing, (ii) to be bounded by 0
and 1, and (iii) to have a continuous first derivative with respect to $\theta$, implying
that the IRF is also continuous. Suppose that the functional form of some “true”
IRF $\tilde P_i(\theta)$ is not known, but a known scalar valued function, $H(m)$, of a scalar
valued argument, $m$, satisfies the three requirements of an IRF specified previously:
for example, either the logistic function
\[ H(m) = \frac{1}{1 + \exp(-m)}, \tag{9} \]
or the normal ogive
\[ H(m) = \int_{-\infty}^{m} \frac{1}{\sqrt{2\pi}} \exp(-z^2/2)\,dz, \tag{10} \]
would be suitable.
It is known (e.g., Elphinstone, 1983, p. 167) that there exists at least one continuous
monotonic function $\tilde m_i(\theta)$ such that
\[ \tilde P_i(\theta) = H(\tilde m_i(\theta)). \tag{11} \]
This monotonic function, $\tilde m_i(\theta)$, is, in general, not of a known functional form.
It may, however, be approximated arbitrarily closely by a polynomial $m_i(\theta)$ of
odd degree, $2k_i + 1$, where $k_i \ge 0$, if $k_i$ is made sufficiently large (Elphinstone,
1983, section 4). Thus,
\[ m_i(\theta) = b_{0i} + b_{1i}\theta + b_{2i}\theta^2 + \cdots + b_{2k_i+1,i}\theta^{2k_i+1} \approx \tilde m_i(\theta), \tag{12} \]
with $2k_i + 2$ parameters represented by the vector $\mathbf{b}_i' = (b_{0i}, b_{1i}, \ldots, b_{2k_i+1,i})$. A
reparameterization of $\mathbf{b}_i$ that is used to ensure that the polynomial in Equation
12 is monotonic will be described in Subsection 3.2.
Any population IRF $\tilde P_i(\theta)$ in Equation 11 that is of an unknown functional
form may be approximated arbitrarily closely by the IRF of known functional
form
\[ P_i(\theta) = H(m_i(\theta)), \tag{13} \]
if $k_i$ is sufficiently large. That is, the IRF is the composition of the filter with the
monotonic polynomial, $P = H \circ m$. Thus, the “filter” $H(\cdot)$ transforms the
unbounded monotonic polynomial, $m_i(\theta)$, in Equation 12 into a monotonic IRC,
$P_i(\theta)$, that is bounded by 0 and 1. (This terminology is motivated by an analogous
situation in signal processing in which a potentially unbounded signal is transformed
into a bounded signal through a device known as a “filter.”) The IRF
defined in Equation 13 will be consequently referred to as an FMP model. If
no constraints are imposed on the coefficients in Equation 12 to ensure that $m_i(\theta)$
is monotonic, the filtered function in Equation 13 will still be bounded by 0 and 1
but need not be monotonic. The resulting model will then be referred to as a filtered
unconstrained polynomial (FUP) model.
The logistic function in Equation 9 will be used henceforth as a filter because
it is algebraically convenient to do so. Use of the normal ogive as a filter would
give essentially the same results but leads to algebraic expressions that are more
complicated and less easily evaluated. Substitution of the polynomial in Equation
12 into the logistic filter in Equation 9 yields the IRF
\[ P_i(\theta) = P(\theta \mid \mathbf{b}_i) = \frac{1}{1 + \exp\{-(b_{0i} + b_{1i}\theta + b_{2i}\theta^2 + \cdots + b_{2k_i+1,i}\theta^{2k_i+1})\}}, \tag{14} \]
which applies to both the FUP and the FMP models. The difference is that the
coefficient vector, $\mathbf{b}_i$, is unconstrained for the FUP model and constraints are
applied to $\mathbf{b}_i$ to ensure monotonicity of the IRC for the FMP model. These constraints
are applied by means of reparameterizations that will be described in
Subsection 3.2.
Because $k_i$ can vary from one item to another, the shapes of the IRC for
different items may be different. When $k_i = 0$, the IRF in Equation 14 is
equivalent to the 2PL IRF of Equations 5 and 6 with $b_{0i} = -a_i b_i$ and
$b_{1i} = a_i$. The filter, $H(\cdot)$, may be any monotonic function that is bounded
by zero and one and has a continuous first derivative. It is also desirable that
$H(\cdot)$ should have the same domain as the domain hypothesized for the
unknown $\tilde P_i(\theta)$, so that the approximating IRF $P_i(\theta)$ and the approximated
IRF $\tilde P_i(\theta)$ have domains that match. The filter has the same mathematical
properties as a statistical cdf, so that alternative cdfs to those in Equations
9 and 10 could be tried as a filter.
Consequently, in situations where it is plausible to restrict $\theta$ to the nonnegative
real line, the gamma ogive could be tried as a filter. If $\theta$ is assumed to be
contained in a closed interval, the beta ogive could be employed. It is worth bearing
in mind that filters that are close in shape to that of the unknown IRF $\tilde P_i(\theta)$ will
need a lower degree for the monotonic polynomial than those that are more dissimilar.
The choice of filter is not always critical, however, because one can compensate
for an inappropriate filter to some extent by increasing the degree of the
monotonic polynomial. There are, however, practical limits to the degree of the
monotonic polynomial because computational instabilities are associated with
polynomial models of high degree.
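As an illustration of the filtered polynomial IRF with a logistic filter, the following minimal sketch evaluates the curve for a given coefficient vector and checks the special case of a linear polynomial against the 2PL. The function name and all coefficient values are hypothetical and are not taken from the article.

```python
import math

def filtered_polynomial_irf(theta, b):
    """Logistic filter applied to a polynomial in theta.

    b holds the polynomial coefficients in ascending powers of theta.
    No monotonicity check is made, so this covers both FMP and FUP curves.
    """
    m = 0.0
    for coef in reversed(b):  # Horner's rule for the polynomial value
        m = m * theta + coef
    return 1.0 / (1.0 + math.exp(-m))

# Special case of a linear polynomial: the curve is a 2PL IRF with
# intercept -a*b and slope a (hypothetical values a = 1.2, b = 0.3).
a_demo, b_demo = 1.2, 0.3
p_linear = filtered_polynomial_irf(0.7, [-a_demo * b_demo, a_demo])
p_2pl = 1.0 / (1.0 + math.exp(-a_demo * (0.7 - b_demo)))

# A cubic whose derivative 1 + 1.5 * theta**2 is positive everywhere,
# so the resulting curve is monotonic (hypothetical coefficients).
cubic = [0.0, 1.0, 0.0, 0.5]
grid = [-4.0 + 0.1 * j for j in range(81)]
curve = [filtered_polynomial_irf(t, cubic) for t in grid]
print(p_linear, p_2pl, curve[0], curve[-1])
```

The same function serves for any odd degree, so curves with different numbers of coefficients can be computed for different items.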
3.2. Monotonicity Constraints
A necessary condition for the polynomial, $m_i(\theta)$, given in Equation 12 to be
monotonic is that it be of odd degree, $2k_i + 1$. Here, we shall employ a parameterization
of an odd-degree polynomial that ensures that it is monotonic. The key
ideas were contained in a single formula that was presented by Ramsay (1977,
p. 108) in the context of monotonic transformations to additivity. These were
developed in detail by Elphinstone (1983, section 4) in the context of distribution
estimation.
A necessary and sufficient condition for $m_i(\theta)$ to be monotonic is that its first
derivative be a nonnegative polynomial
\[ p_i(\theta) = \frac{d}{d\theta} m_i(\theta) = a_{0i} + a_{1i}\theta + \cdots + a_{2k_i,i}\theta^{2k_i} \ge 0 \quad \text{for all } \theta \tag{15} \]
and must consequently be of even degree, $2k_i$. This polynomial, $p_i(\theta)$, has $2k_i + 1$
coefficients that will be represented by the vector $\mathbf{a}_i' = (a_{0i}, a_{1i}, \ldots, a_{2k_i,i})$.
Given the nonnegative polynomial $p_i(\theta)$ in Equation 15, the corresponding
monotonic polynomial $m_i(\theta)$ in Equation 12 is obtained from the indefinite
integral
\[ m_i(\theta) = \xi_i + \int p_i(\theta)\,d\theta, \tag{16} \]
where $\xi_i$ is the constant of integration. Consequently, the relationships between
the coefficients of $m_i(\theta)$ and those of $p_i(\theta)$ are given by
\[ b_{0i} = \xi_i \quad \text{and} \quad b_{j,i} = \frac{a_{j-1,i}}{j} \quad \text{for } j = 1, 2, \ldots, 2k_i + 1. \tag{17} \]
The polynomial $p_i(\theta)$ in Equation 15 needs to be evaluated subject to the
requirement that $p_i(\theta) \ge 0$ for all admissible $\theta$. This may be accomplished by
using the following reparameterization of $p_i(\theta)$ (Elphinstone, 1983, p. 173):
\[ p_i^{+}(\theta) = \begin{cases} \lambda_i \prod_{j=1}^{k_i} \left[ 1 - 2\alpha_{j,i}\theta + (\alpha_{j,i}^2 + \beta_{j,i})\theta^2 \right], & k_i > 0, \\ \lambda_i, & k_i = 0. \end{cases} \tag{18} \]
The $2k_i + 1$ coefficients $\lambda_i, \alpha_{1,i}, \beta_{1,i}, \ldots, \alpha_{k_i,i}, \beta_{k_i,i}$ of $p_i^{+}(\theta)$ are required to
satisfy the $k_i + 1$ inequality constraints
\[ \lambda_i \ge 0 \quad \text{and} \quad \beta_{j,i} \ge 0, \quad j = 1, \ldots, k_i. \tag{19} \]
Then, given the parameter vector
\[ \boldsymbol{\gamma}_i' = (\xi_i, \lambda_i, \alpha_{1,i}, \beta_{1,i}, \ldots, \alpha_{k_i,i}, \beta_{k_i,i}) \tag{20} \]
for $p_i^{+}(\theta)$, the procedure described in Online Appendix A may be used to compute
the corresponding parameter vector
\[ \mathbf{a}_i' = (a_{0i}, a_{1i}, \ldots, a_{2k_i,i}) \tag{21} \]
for $p_i(\theta)$ that will ensure that $p_i(\theta) = p_i^{+}(\theta) > 0$. This procedure makes use of a
recurrence relation. When $\mathbf{a}_i$ has been obtained, Equation 17 may be used to
obtain the parameter vector
\[ \mathbf{b}_i' = (b_{0i}, b_{1i}, \ldots, b_{2k_i+1,i}) \tag{22} \]
that ensures that $m_i(\theta)$ in Equation 12 will be monotonic increasing in $\theta$. Thus,
the complicated inequality constraints on $\mathbf{b}_i$ that are required for monotonicity
of the polynomial in Equation 12 are imposed by means of a double reparameterization:
The parameter vector $\mathbf{b}_i$ is a function (Equation 17) of $\mathbf{a}_i$, which in turn is a function
of the parameter vector $\boldsymbol{\gamma}_i$ that satisfies the simple linear inequality
constraints in Equation 19.
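The double reparameterization can be sketched as follows. This is an illustrative sketch under stated assumptions: the article obtains the derivative coefficients through a recurrence relation given in Online Appendix A, which is not reproduced here, so the sketch simply multiplies out the quadratic factors of Equation 18 one at a time. The function name and the argument names `xi` (constant of integration), `lam` (leading scale factor), `alphas`, and `betas` are hypothetical, as are the numerical values.

```python
def monotonic_poly_coefficients(xi, lam, alphas, betas):
    """Map the unconstrained-shape parameters to polynomial coefficients.

    Returns (a, b): a holds the coefficients of the nonnegative derivative
    polynomial and b those of the monotonic polynomial, both in ascending
    powers. Nonnegativity holds when lam >= 0 and every beta >= 0.
    """
    a = [lam]  # degree-zero derivative when there are no quadratic factors
    for alpha, beta in zip(alphas, betas):
        factor = [1.0, -2.0 * alpha, alpha * alpha + beta]
        product = [0.0] * (len(a) + 2)
        for i, a_i in enumerate(a):        # polynomial multiplication
            for j, f_j in enumerate(factor):
                product[i + j] += a_i * f_j
        a = product
    # Term-by-term integration (Equation 17): b_0 = xi, b_j = a_{j-1} / j.
    b = [xi] + [a[j - 1] / j for j in range(1, len(a) + 1)]
    return a, b

def poly_value(coefs, theta):
    """Evaluate a polynomial given in ascending powers (Horner's rule)."""
    value = 0.0
    for c in reversed(coefs):
        value = value * theta + c
    return value

# One quadratic factor (hypothetical values): derivative 1 - theta + 0.5*theta^2.
a_demo, b_demo = monotonic_poly_coefficients(0.0, 1.0, [0.5], [0.25])
# No quadratic factors: a linear polynomial with intercept xi and slope lam.
a_lin, b_lin = monotonic_poly_coefficients(-0.36, 1.2, [], [])

grid = [-4.0 + 0.1 * j for j in range(81)]
m_values = [poly_value(b_demo, t) for t in grid]
print(a_demo, b_demo, b_lin)
```

Each quadratic factor equals (1 - alpha*theta)^2 + beta*theta^2, which is why nonnegative values of beta and of the scale factor are enough to keep the derivative nonnegative.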
As an alternative to the reparameterization approach employed here, an
approach due to Hawkins (1994) for dealing with a monotonic polynomial by
applying equality constraints at judiciously chosen values of $\theta$ would be worth
investigation.
3.3. Parameter Estimation
A two-stage estimation method based on Ramsay’s (1991) procedure will be
employed to estimate the item parameters and the abilities. Stage 1 is to obtain
surrogate values, $\tilde\theta_s$, $s = 1, \ldots, N$, for the examinees’ abilities, $\theta_s$. In Ramsay’s
procedure, these surrogates are the quantiles of a standard normal distribution based
on ranked total test scores. A problem with ranking test scores is that ties can occur
very frequently, especially for a short test with many examinees. In Ramsay’s
TestGraf, ranks are randomly assigned to the tied test scores. To avoid this need for
random rank assignment, first principal component scores are used here to assign
ranks. Component scores are obtained from the left singular vector
corresponding to the largest singular value of the centered data matrix $(Y - \mathbf{1}\bar{\mathbf{y}}')$.
If the sum of elements of the corresponding right singular vector is negative, both
the left and right singular vectors are reflected. In addition to eliminating the
occurrence of tied ranks, the first principal component scores optimally summarize the
data matrix $Y$ in one dimension. The principal component score ranks are transformed
to the quantiles, $q_i$, of a standard normal distribution to yield the $N \times 1$
vector, $\tilde{\boldsymbol\theta}$, of ability surrogates. This normalization of surrogate ability scores provides
an identification constraint (Ramsay, 1991, p. 614) required for the model.
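Stage 1 can be sketched as follows. This is an illustrative sketch and not the authors' program. Three points are assumptions of the sketch: the dominant singular vector is found by power iteration rather than by a full singular value decomposition, ranks are converted to normal quantiles with the rule (rank - 0.5)/N, and any rows with identical response patterns are ordered by their row position. The function name and the small data matrix are hypothetical.

```python
from statistics import NormalDist

def ability_surrogates(Y, n_iter=200):
    """Normal-quantile surrogates from first principal component scores.

    Y is a list of rows of 0/1 item scores. Power iteration stands in for
    a singular value decomposition (an assumption of this sketch).
    """
    n_rows, n_cols = len(Y), len(Y[0])
    col_means = [sum(row[i] for row in Y) / n_rows for i in range(n_cols)]
    Yc = [[row[i] - col_means[i] for i in range(n_cols)] for row in Y]

    v = [1.0] * n_cols  # starting vector for the right singular vector
    for _ in range(n_iter):
        u = [sum(r[i] * v[i] for i in range(n_cols)) for r in Yc]  # Yc v
        w = [sum(Yc[s][i] * u[s] for s in range(n_rows)) for i in range(n_cols)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    if sum(v) < 0:  # reflect so that the element sum is nonnegative
        v = [-x for x in v]

    # Scores proportional to the left singular vector; ranks are unaffected
    # by the positive scale factor.
    scores = [sum(r[i] * v[i] for i in range(n_cols)) for r in Yc]
    order = sorted(range(n_rows), key=lambda s: scores[s])
    std_normal = NormalDist()
    surrogates = [0.0] * n_rows
    for rank, s in enumerate(order, start=1):
        surrogates[s] = std_normal.inv_cdf((rank - 0.5) / n_rows)
    return surrogates

# Hypothetical data: five examinees, four items, cumulative response patterns.
Y_demo = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
surr = ability_surrogates(Y_demo)
print(surr)
```

In this small example each examinee answers one more item correctly than the previous one, so the surrogates increase with the row index and are symmetric about zero.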
In the second stage, after the vector, $\tilde{\boldsymbol\theta}$, of normalized ability surrogates is
available, the conditional maximum likelihood estimates,
$\hat{\boldsymbol\gamma}_i$, $i = 1, \ldots, n$, of the
item parameter vectors, given $\tilde{\boldsymbol\theta}$, are obtained. Because of the assumption of local
independence, these estimates may be obtained 1 item at a time by minimizing
the scaled negative log-likelihood objective function
\[ F_i = -N^{-1} \ln L(\boldsymbol\gamma_i \mid \mathbf{y}_{\cdot i}, \tilde{\boldsymbol\theta}) = -N^{-1} \sum_{s=1}^{N} \{ y_{si} \ln(P_{si}) + (1 - y_{si}) \ln(1 - P_{si}) \}, \tag{23} \]
where $P_{si} = P_i(\tilde\theta_s)$. (The scaling by $N^{-1}$ in Equation 23 is convenient because it
avoids dependence of the magnitude of $F_i$ on sample size.) Computational details
are given in Online Appendix B.
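The objective function for a single item can be sketched as below. This is an illustrative sketch only: it evaluates the scaled negative log likelihood for a given coefficient vector and does not reproduce the optimization procedure of Online Appendix B. The function names, the clipping constant, and the tiny data set are hypothetical.

```python
import math

def fmp_irf(theta, b):
    """Logistic filter applied to a polynomial with coefficients b
    (ascending powers of theta)."""
    m = 0.0
    for coef in reversed(b):
        m = m * theta + coef
    return 1.0 / (1.0 + math.exp(-m))

def scaled_neg_loglik(b, item_scores, surrogates):
    """Scaled negative log likelihood for one item, given ability surrogates."""
    n_examinees = len(item_scores)
    total = 0.0
    for y, t in zip(item_scores, surrogates):
        p = fmp_irf(t, b)
        # Clipping guards against log(0); the bound is an arbitrary choice.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n_examinees

# Hypothetical surrogates and item scores for four examinees.
surrogates_demo = [-1.5, -0.5, 0.5, 1.5]
scores_demo = [0, 0, 1, 1]

f_flat = scaled_neg_loglik([0.0, 0.0], scores_demo, surrogates_demo)   # P = 0.5 everywhere
f_slope = scaled_neg_loglik([0.0, 2.0], scores_demo, surrogates_demo)  # increasing curve
print(f_flat, f_slope)
```

In practice the coefficient vector would be expressed through the monotonicity-preserving parameters and passed to a numerical minimizer; the sketch only shows that a curve that tracks the item scores yields a smaller objective value than a flat one.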
This procedure may be viewed as a modified version of the JML estimation
method that is truncated after the first iteration. In the initial stages of our
research, a full JML iterative process for jointly estimating the FMP item
parameters, $\boldsymbol\gamma$, and abilities, $\boldsymbol\theta$, by maximum likelihood was tried out on data that
were randomly generated according to an FMP model. After obtaining the conditional
maximum likelihood estimates $\hat{\boldsymbol\gamma} = (\hat{\boldsymbol\gamma}_1', \ldots, \hat{\boldsymbol\gamma}_n')'$ by minimizing Equation 23,
each examinee’s ability parameter $\theta_s$, $s = 1, \ldots, N$, was estimated, one at
a time, by minimizing the scaled conditional negative log-likelihood objective
function
\[ -N^{-1} \ln L(\theta_s \mid \mathbf{y}_{s\cdot}, \hat{\boldsymbol\gamma}) = -\sum_{i=1}^{n} \{ y_{si} \ln(P_{si}) + (1 - y_{si}) \ln(1 - P_{si}) \}, \quad s = 1, \ldots, N, \tag{24} \]
with respect to $\theta_s$. The estimates obtained were then ranked and normalized to
replace the surrogates and iterative cycles were continued until convergence.
This procedure was not found satisfactory in the present context and was
consequently discarded. During iteration, item parameter estimates often drifted away
from the known values. This tendency increased as $k$ was increased. Concurrently,
there was a tendency for the ability estimates, $\hat\theta_s$, to drift away from the randomly
generated, and therefore known, $\theta_s$, as the cycling procedure continued whether or
not convergence occurred. Thus, the iterated JML estimate of $\boldsymbol\gamma$ was less satisfactory
than the currently used surrogate-based estimate. Further evidence that this
type of iterative algorithm is unsatisfactory will be found in Subsection 4.2.
Rather than regarding the abilities, $\theta_s$, as parameters estimated by minimizing
Equation 24, they are therefore regarded here as realizations of a random variable,
and the EAP approach (cf. Bock & Moustaki, 2007, Subsection 5.3) is used to
obtain Bayesian estimates. To be consistent with the normalization of surrogates
in the first stage, the standard normal density $\varphi(\theta)$ is employed for $\theta$, so that the
expected value of the a posteriori distribution of abilities is given by
\[ E(\theta \mid \mathbf{y}_{s\cdot}, \boldsymbol\gamma) = \frac{\int_{-\infty}^{\infty} \prod_{i=1}^{n} P_i(\theta)^{y_{si}} \{1 - P_i(\theta)\}^{1 - y_{si}}\, \theta\, \varphi(\theta)\, d\theta}{\int_{-\infty}^{\infty} \prod_{i=1}^{n} P_i(\theta)^{y_{si}} \{1 - P_i(\theta)\}^{1 - y_{si}}\, \varphi(\theta)\, d\theta}. \tag{25} \]
This expected value is estimated by replacing item parameters by estimates in
Equation 25 and approximating the two integrals involved using rectangular
quadrature to obtain
\[ \hat\theta_s = \hat{E}(\theta \mid \mathbf{y}_{s\cdot}, \hat{\boldsymbol\gamma}) = \frac{\sum_{r=1}^{Q} \prod_{i=1}^{n} [P_i(\ddot\theta_r)]^{y_{si}} \{1 - P_i(\ddot\theta_r)\}^{1 - y_{si}}\, \varphi(\ddot\theta_r)\, \ddot\theta_r}{\sum_{r=1}^{Q} \prod_{i=1}^{n} [P_i(\ddot\theta_r)]^{y_{si}} \{1 - P_i(\ddot\theta_r)\}^{1 - y_{si}}\, \varphi(\ddot\theta_r)}, \quad s = 1, \ldots, N, \tag{26} \]
where $\ddot\theta_r$, $r = 1, \ldots, Q$, are equally spaced points on the closed interval $[-4, 4]$.
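The rectangular quadrature approximation to the EAP estimate can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, the number of quadrature points (61) is an arbitrary choice because the text does not state one, and the item coefficients are invented for illustration.

```python
import math

def logistic_poly_irf(theta, b):
    """Logistic filter applied to a polynomial with coefficients b
    (ascending powers of theta)."""
    m = 0.0
    for coef in reversed(b):
        m = m * theta + coef
    return 1.0 / (1.0 + math.exp(-m))

def eap_estimate(responses, item_coefs, n_points=61):
    """EAP ability estimate using equally spaced points on [-4, 4].

    n_points is a hypothetical default. The normal density is used up to
    its constant factor, which cancels between numerator and denominator.
    """
    points = [-4.0 + 8.0 * r / (n_points - 1) for r in range(n_points)]
    numerator = 0.0
    denominator = 0.0
    for t in points:
        weight = math.exp(-0.5 * t * t)  # unnormalized standard normal density
        for y, b in zip(responses, item_coefs):
            p = logistic_poly_irf(t, b)
            weight *= p if y == 1 else (1.0 - p)
        numerator += weight * t
        denominator += weight
    return numerator / denominator

# Two hypothetical items, both with a linear polynomial (intercept 0, slope 1).
items_demo = [[0.0, 1.0], [0.0, 1.0]]
eap_mixed = eap_estimate([1, 0], items_demo)
eap_high = eap_estimate([1, 1], items_demo)
eap_low = eap_estimate([0, 0], items_demo)
print(eap_mixed, eap_high, eap_low)
```

Because the estimate is a weighted mean of the quadrature points, it always lies inside the interval spanned by those points, and it is available for any response pattern, including patterns from examinees outside the calibration sample.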
3.4. Choice of the Number of Parameters
In Subsection 3.1, a hypothetical “true” IRF, $\tilde P_i(\theta)$, for item $i$ is specified.
Because its functional form is unknown, it cannot be estimated directly but can
be approximated arbitrarily closely by an IRF, $P_i(\theta)$, of known functional form
(Equation 13) by using a sufficient number of parameters. In this situation, it
is not possible to provide a goodness-of-fit test with a null hypothesis involving
an algebraic specification for a “true” model, $\tilde P_i(\theta)$. The AIC (Akaike, 1973;
Burnham & Anderson, 2004, pp. 266–268) is helpful under these circumstances,
however. It may be regarded as an estimate of an expected cross-validation criterion
using the Kullback–Leibler measure of the distance between two distributions
(De Leeuw, 1992) and is based on information theory (Burnham &
Anderson, 2004, section 2, pp. 264–266) rather than on classical statistical inference.
The AIC is evaluated for each model in a set of candidate models for item $i$, where each
model is obtained by varying the number of parameters in the approximating
IRF, $P_i(\theta)$. The candidate model yielding the smallest AIC tentatively suggests
the number of parameters to be employed. No statistical test, null hypothesis, or
significance level is involved.
The computing procedure described in Online Appendix B produces a
sequence of nested FMP models with $k_i = 0, 1, 2, \ldots, k_{\max}$ because the final
iterated parameter values for one model are employed in the definition of good
starting values for the next model. This sequence of models also provides a
convenient candidate set for the AIC. Because the unknown ‘‘true’’ model (11)
cannot be included in this candidate set, the aim of the analysis can only be
‘‘model approximation’’ and not ‘‘model verification.’’
The AIC is defined as $\mathrm{AIC} = -2 \ln L + 2q$ (e.g., Burnham & Anderson, 2004,
p. 268) where $L$ is the likelihood function and $q$ represents the number of
estimable parameters for a model in the candidate set. Only the rank order of models
according to the AIC is employed in the selection process. Consequently, all
values of the AIC in the candidate set may be multiplied by the same positive
constant without affecting any conclusions. The AIC increases without bound as $N$
increases, but this problem may be corrected by multiplying the AIC for each
candidate model by $N^{-1}$. The scaled AIC for item $i$ then is
\[
\mathrm{AIC}_i = -2 N^{-1} \ln L(\hat{\gamma}_i \mid y_{\cdot i}, \tilde{\theta}) + \frac{2 q_i}{N}
= 2 \bar{\mathcal{L}} + \frac{2}{N} q_i, \qquad (27)
\]
where $\bar{\mathcal{L}} = N^{-1} \sum_{s=1}^{N} \mathcal{L}_{si}$,
\[
\mathcal{L}_{si} = \mathcal{L}(\hat{\gamma}_i \mid y_{si}, \tilde{\theta}_s)
= -\{ y_{si} \ln P_{si} + (1 - y_{si}) \ln (1 - P_{si}) \} > 0, \quad s = 1, \ldots, N, \qquad (28)
\]
is the contribution of examinee $s$ to the negative log likelihood, $P_{si} = P_i(\tilde{\theta}_s)$ and
\[
q_i = 2 k_i + 2, \qquad (29)
\]
is the number of parameters. Because $\bar{\mathcal{L}}$ is the mean of the identically distributed
$\mathcal{L}_{si}$, $s = 1, \ldots, N$, its expected value $E(\bar{\mathcal{L}})$ remains constant as $N$ increases.
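The scaled criterion of Equations 27 to 29 can be sketched in a few lines of Python. This is only an illustration under the assumption that the fitted probabilities $P_{si}$ are already available as a list; the function names are hypothetical and do not come from the authors' software.

```python
import math


def mean_neg_loglik(y, p):
    """Mean over examinees of -{y ln P + (1 - y) ln(1 - P)} (cf. Equation 28)."""
    total = 0.0
    for y_s, p_s in zip(y, p):
        total += -(y_s * math.log(p_s) + (1 - y_s) * math.log(1.0 - p_s))
    return total / len(y)


def scaled_aic(y, p, k):
    """Scaled AIC for one item: twice the mean term plus the penalty (2 / N) q."""
    q = 2 * k + 2  # number of parameters, Equation 29
    return 2.0 * mean_neg_loglik(y, p) + 2.0 * q / len(y)
```

Multiplying the result by $N$ recovers the usual $-2 \ln L + 2q$, so the ranking of candidate models is unchanged by the scaling.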
The negative log likelihood, $-N^{-1} \ln L(\hat{\gamma}_i \mid y_{\cdot i}, \tilde{\theta})$, is minimized with respect
to the parameter vector $\gamma_i$ so that it decreases as the number of parameters, $q_i$,
increases. Given $N$, therefore, the first term in Equation 27 decreases and the
second term, $(2/N) q_i$, increases as $q_i$ increases and, as a result, acts as a penalty on
$\mathrm{AIC}_i$. If sample size, $N$, is very large, however, the effect of an increase of $q_i$ on
the penalty $(2/N) q_i$ will be negligibly small and the candidate model with the
largest number of parameters will yield the smallest $\mathrm{AIC}_i$. Thus, the AIC tends
to favor IRFs with few parameters when samples are small, thereby avoiding
overfitting, and to favor IRFs with many parameters in large samples when
overfitting is not an issue. Use of the AIC is not intended to provide an estimate of
some correct number of parameters in a population but rather to lead to a model,
possibly with few parameters, that will predict optimally outside the calibration
sample. Examples of the effect of sample size on the AIC in the analysis of
covariance structures are given in Browne (2000, Subsection 4.8).
The theoretical justification for the penalty term, $(2/N) q_i$, of AIC involves an
assumption that the likelihood function is correctly specified, so that the item
parameter estimates in $\hat{\gamma}$ are maximum likelihood. This is not the case in the
present situation because item parameter estimates are obtained by regarding the
latent ability variables $\theta_s$ as observed quantities, whereas in practice, they are
unobservable and replaced by surrogates $\tilde{\theta}_s$. Consequently, the item parameter
estimates may only be regarded as some sort of pseudo-maximum likelihood. For
this reason, it is best to regard $\mathrm{AIC}_i$ in Equation 27 as a pseudo-AIC, having the
same formula as a legitimate AIC but being applied under other assumptions.
This pseudo-AIC still has the property of penalizing models with many
parameters when the sample size is small, but the value of the penalty may not be
optimal.
The BIC proposed by Schwarz (1978) is similar to the AIC but has a different
penalty term. Burnham and Anderson (2004) compare the AIC and BIC and point
out that the BIC is not related to information theory. After scaling by $N^{-1}$, the
BIC becomes
\[
\mathrm{BIC}_i = 2 \bar{\mathcal{L}} + \frac{\ln N}{N} q_i. \qquad (30)
\]
Again the BIC penalty term $(\ln N / N) q_i$ tends toward zero as $N$ increases.
Also, as used here, the BIC is in effect a pseudo-BIC. Characteristics of the AIC
and BIC that are shared by the pseudo-AIC and pseudo-BIC are as follows.
Both the AIC and BIC favor a small number of parameters in ‘‘small’’ samples
but can favor many parameters in ‘‘large’’ samples. If $N \geq 8$, the number of
parameters indicated by the BIC will not be greater than that indicated by the AIC.
Neither the AIC nor the BIC is intended to suggest a ‘‘correct’’ model. Rather,
they give an indication of the number of parameters to use in order to give a good
approximation to an unspecified ‘‘true’’ IRF taking sample size into account. In
small samples where coefficient estimates are contaminated by error, fewer poly-
nomial coefficients should be used than in large samples where estimates will be
more accurate (cf. Browne, 2000).
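The way the two penalties interact with sample size can be sketched as follows. The mean negative log-likelihood values are invented purely for illustration, and the function name is hypothetical; only the penalty forms $(2/N) q$ and $(\ln N / N) q$ with $q = 2k + 2$ come from the text.

```python
import math


def select_k(mean_nll_by_k, n_examinees):
    """Return (k_aic, k_bic): the indices minimizing the scaled AIC and BIC."""
    chosen = {}
    for name, weight in (("aic", 2.0), ("bic", math.log(n_examinees))):
        scores = []
        for k, nll in enumerate(mean_nll_by_k):
            q = 2 * k + 2  # number of parameters for index k
            scores.append(2.0 * nll + weight * q / n_examinees)
        # min() returns the first index in case of ties, i.e. the smaller model.
        chosen[name] = min(range(len(scores)), key=scores.__getitem__)
    return chosen["aic"], chosen["bic"]


# Hypothetical mean negative log likelihoods for k = 0, ..., 4.
illustrative_fit = [0.6000, 0.5900, 0.5885, 0.5883, 0.5882]
```

With these invented values, `select_k(illustrative_fit, 2000)` suggests a larger `k` than `select_k(illustrative_fit, 300)` under both criteria, and in each case the BIC choice does not exceed the AIC choice.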
The choice of the number of parameters $q = 2k + 2$ using the AIC or BIC
plays a similar role when using the FMP or FUP models as the choice of the
bandwidth $h$ in Ramsay’s nonparametric IRF. However, $q$ and $h$ operate in
opposite directions. Flexibility of the FMP IRC increases as the positive integer $q$
increases, whereas flexibility of the nonparametric IRC increases as the positive
real number $h$ decreases toward zero. The use of $h = 1.1 N^{-0.2}$ for nonparametric
IRC smoothing and $k$ for flexibility of the FMP IRC have similar aims. Both are
intended to guard against using overflexible IRCs if sample sizes are small so that
random sampling fluctuations are large and overfitting can occur.
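As a small worked check of the bandwidth rule $h = 1.1 N^{-0.2}$, the sketch below evaluates it for the sample sizes used later in the paper. The function name is an assumption made for this illustration.

```python
def default_bandwidth(n_examinees):
    """Bandwidth h = 1.1 * N ** (-0.2); h shrinks slowly as N grows."""
    return 1.1 * n_examinees ** -0.2
```

Rounded to two decimals this gives 0.24 for $N = 2{,}000$ and 0.34 for $N = 379$, so a smaller sample leads to a larger bandwidth and hence a smoother, less flexible curve.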
4. Numerical Studies
Two simulation studies are reported to illustrate properties of the FMP
approach in comparison with other approaches. These are followed by a numer-
ical illustration of the effect of alternative identification conditions for the FMP
model.
4.1. Simulation Study Design
This section deals with notation and with common aspects of the simulations.
FMP_$k$ will represent an FMP model with index $k$ yielding a monotonic
polynomial of degree $2k + 1$ in Equation 12 and an IRF with $q = 2k + 2$ parameters
in Equation 14. In particular, the IRF for FMP_0 is a reparameterization of the
IRF for the 2PL, so that the two models are equivalent, even if estimation
methods differ.
In both of the simulation studies, 100 random samples were generated. Each
sample consisted of 2,000 examinees’ responses on 20 items. Sets of 20 items
had IRFs of the same algebraic form. Population parameter values were chosen
by generating them from specified distributions. Appropriate details will be
given in subsections 4.2 and 4.3.
Ability variables, $\theta_s$, generated for Subsections 4.2 and 4.3, were independently
distributed according to the normal distribution with mean 0 and variance 1. Given
an IRF, $P(\theta)$, and an examinee ability value, $\theta_s$, the examinee’s response, $y_s$, was
computed by drawing a random number, $u_s$, from a uniform distribution $U[0, 1]$ and
defining the response by $y_s = 1$ when $u_s < P(\theta_s)$ and $y_s = 0$ otherwise.
Performance of the models was evaluated from the following two perspectives:
(i) for each item $i$, the closeness of the estimated IRC, $\hat{P}_i(\theta)$, to the chosen
population IRC, $P_i(\theta)$, and (ii) closeness of the $N$ estimated abilities, $\hat{\theta}_s$, estimated
from Equation 26 to the actual randomly generated abilities, $\theta_s$, employed to
produce the data. The root integrated mean square error (RIMSE; Ramsay,
1991, p. 621) was used as a measure of the closeness of the estimated IRC, $\hat{P}_i(\theta)$,
to the IRC, $P_i(\theta)$, used for data generation of item $i$, $i = 1, \ldots, n$. This
is defined as
\[
\mathrm{RIMSE}^{(i)}_{\mathrm{IRC}} =
\left[ \frac{\sum_{r=1}^{R} \{ \hat{P}_i(\ddot{\theta}_r) - P_i(\ddot{\theta}_r) \}^2 \varphi(\ddot{\theta}_r)}
            {\sum_{r=1}^{R} \varphi(\ddot{\theta}_r)} \right]^{1/2}, \qquad (31)
\]
where the $\ddot{\theta}_r$, $r = 1, 2, \ldots, R$, are evaluation points that are equally spaced on
$[-3.5, 3.5]$, and $\varphi(\cdot)$ represents the density function of a standard normal
distribution. Closeness of the estimates, $\hat{\theta}_s$, to the actually generated abilities, $\theta_s$, was
evaluated using the root mean square error
\[
\mathrm{RMSE}_{\theta} = \left[ \frac{\sum_{s=1}^{N} (\hat{\theta}_s - \theta_s)^2}{N} \right]^{1/2}, \qquad (32)
\]
where $N$ is the number of examinees.
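The two accuracy measures in Equations 31 and 32 can be sketched as below. The function names and the default of 71 evaluation points are assumptions for this illustration; the text does not state the value of $R$ used.

```python
import math


def normal_pdf(x):
    """Standard normal density used to weight the evaluation points."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def rimse_irc(p_hat, p_true, n_points=71, lo=-3.5, hi=3.5):
    """Density-weighted root mean square gap between two curves (cf. Equation 31)."""
    step = (hi - lo) / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for r in range(n_points):
        theta = lo + r * step
        weight = normal_pdf(theta)
        numerator += (p_hat(theta) - p_true(theta)) ** 2 * weight
        denominator += weight
    return math.sqrt(numerator / denominator)


def rmse_theta(theta_hat, theta_true):
    """Root mean square error between estimated and generated abilities (cf. Equation 32)."""
    squared = sum((e - t) ** 2 for e, t in zip(theta_hat, theta_true))
    return math.sqrt(squared / len(theta_true))
```

Because the weights are normalized by their sum, a curve that is off by a constant amount everywhere yields a RIMSE equal to that amount, whatever number of evaluation points is chosen.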
Because the rank order of ability estimates is often regarded as more
important than their actual values, the Spearman rank correlation coefficient, $r_{\hat{\theta}, \theta}$,
between the estimates, $\hat{\theta}_s$, and the actually generated ability variables, $\theta_s$, was
also used as a measure of equivalence of the estimated and actual ability
variables.
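The response-generation rule of this subsection and the rank correlation can be sketched with the Python standard library alone. The function names are hypothetical, ties are given average ranks (an assumption, since the text does not say how ties were handled), and `rng.random()` draws from $[0, 1)$ rather than the closed interval.

```python
import math
import random


def simulate_responses(irf, thetas, rng):
    """Set y = 1 when a uniform draw u falls below P(theta), else y = 0."""
    return [1 if rng.random() < irf(theta) else 0 for theta in thetas]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

A typical use would draw abilities with `rng.gauss(0.0, 1.0)`, pass them with an IRF to `simulate_responses`, and later compare estimated and generated abilities with `spearman`.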
4.2. Simulation Study 1
Comparison of FMP_0 With MML and JML for the 2PL Model
The 2PL and FMP_0 IRCs are equivalent. This section compares our FMP_0
estimates of this IRC with two alternative estimates: the MML estimates
obtained using MULTILOG (Thissen, Chen, & Bock, 2003) and the JML esti-
mates obtained using the TESTAT module of the SYSTAT software package
(Version 10.2). MML estimates were chosen as a gold standard for comparison
with FMP_0 estimates because they appear to be the most widely employed. As
pointed out in Subsection 3.3, the FMP_0 estimation procedure may be regarded
as a JML estimation procedure truncated after the first iteration. Although JML
estimates are often regarded less favorably than MML estimates, JML estimates
provided by an independently written commercial program were also included to
demonstrate that the FMP_0 estimates, although related, do not have the same
suboptimal performance as the JML estimates.
Data for this simulation study were generated using the parameterization of
the 2PL IRF defined by Equations 5 and 6. Population parameter values for each
of the 20 items were chosen randomly. Discrimination parameters, $a_j$,
$j = 1, \ldots, 20$, were drawn from a uniform distribution, $a \sim U[1.1, 1.8]$, and the
difficulty parameters, $b_j$, from a normal distribution, $b \sim N(0, 1)$, truncated at $-2.5$ and $+2.5$.
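One way to draw population parameters of this kind is sketched below. The text does not say how the truncation was implemented; redrawing values that fall outside $\pm 2.5$ (rejection sampling) is an assumption of this sketch, as is the function name.

```python
import random


def draw_item_parameters(n_items, rng):
    """Draw (a, b) pairs: a uniform on [1.1, 1.8], b standard normal within +/-2.5."""
    parameters = []
    for _ in range(n_items):
        a = rng.uniform(1.1, 1.8)
        b = rng.gauss(0.0, 1.0)
        # Assumed truncation scheme: redraw until the value lies inside the bounds.
        while abs(b) > 2.5:
            b = rng.gauss(0.0, 1.0)
        parameters.append((a, b))
    return parameters
```

Calling `draw_item_parameters(20, random.Random(1))` gives one reproducible set of 20 item parameter pairs.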
In Figure 1, three different estimates of the same IRC are plotted for 4 selected
items in one of the samples. The FMP_0 estimate of the IRC was obtained using
our FMP computer program, the MML estimate of the IRC with MULTILOG
Version 7 (Thissen et al., 2003), and the JML estimate of the IRC with the TES-
TAT module of the SYSTAT (Version 10.2) software package. For each item, the
population IRC is also shown. In general, the MML estimated curve almost coin-
cides with the population curve, and the FMP_0 estimated curve is very slightly
further away. This suggests that the FMP_0 surrogate-based IRF estimates ($k = 0$) are almost as good as the MML estimates. In all four diagrams in Figure 1, the
JML estimated curve is clearly further away from the population curve than is the
FMP_0 estimated curve. This indicates superiority of the FMP_0 item parameter
estimates over the JML estimates. In view of the fact that the FMP procedure is
a JML algorithm terminated after the first iteration, it appears that the further itera-
tion is harmful rather than helpful. This finding is concordant with comments in
Subsection 3.3 and is not surprising because of difficulties associated with maxi-
mum likelihood estimation when the number of parameters increases as the num-
ber of examinees increases (Neyman & Scott, 1948).
[Figure 1 panels: Item 1, Item 2, Item 3, Item 4. Curves in each panel: true, FMP_0, MML, JML. Horizontal axis: ability, θ; vertical axis: probability.]
FIGURE 1. Comparisons of estimated IRCs among FMP, MML, and JML. IRC = item response curve; FMP = filtered monotonic polynomial; JML = joint maximum likelihood.
Plots of pairs of measures of closeness of estimated abilities, $\hat{\theta}_s$,
$s = 1, \ldots, 100$, to the actual randomly generated abilities, $\theta_s$, are given in
Figure 2. In this figure, the left plot compares FMP_0 with MML and JML
in terms of $\mathrm{RMSE}_{\theta}$ and the right plot compares the rank correlations. MML
estimates have very slightly smaller (better) $\mathrm{RMSE}_{\theta}$’s than the FMP_0
estimates obtained using Equation 26. The rank correlations from FMP_0
estimates and from MML estimates are very close. The FMP_0 ability
estimates produce smaller (better) $\mathrm{RMSE}_{\theta}$ values and higher (better) rank
correlation values than the JML estimates. This finding is in agreement with
the discussion in Subsection 3.3.
Means and standard deviations (in parentheses) of accuracy measures over the
100 generated samples are shown in Table 1. As an overall measure of accuracy
of estimated IRCs, the average $\mathrm{RIMSE}_{\mathrm{IRC}}$ (see Equation 31)
\[
\mathrm{RIMSE}_{\mathrm{IRC}} = \frac{1}{20} \sum_{i=1}^{20} \mathrm{RIMSE}^{(i)}_{\mathrm{IRC}} \qquad (33)
\]
was used. Mean accuracy measures, $\mathrm{RIMSE}_{\mathrm{IRC}}$, are shown in the first row. The
$\mathrm{RIMSE}_{\mathrm{IRC}}$ measures for FMP_0 and MML are quite close, the measure for MML
being smaller (better), as can be expected. The $\mathrm{RIMSE}_{\mathrm{IRC}}$ measure for JML is
clearly inferior to (higher than) those of FMP_0 and MML. This observation is
concordant with the trends visible in Figure 1.
The second and third rows give the mean $\mathrm{RMSE}_{\theta}$ and rank correlation
measures of accuracy of the ability variable estimates, $\hat{\theta}$, provided by the three
estimation procedures. Accuracy as measured by mean $\mathrm{RMSE}_{\theta}$ is essentially the same
[Figure 2 panels: left, RMSE for abilities; right, rank correlations for abilities. Horizontal axis: FMP_0; vertical axis: MML or JML. Legend: MML, JML.]
FIGURE 2. Comparisons of $\mathrm{RMSE}_{\theta}$’s for FMP_0, MML, and JML. RMSE = root mean square error; FMP = filtered monotonic polynomial; JML = joint maximum likelihood.
for FMP_0 and MML and is somewhat inferior for JML. The mean rank
correlation is essentially the same for the three methods.
The overall impression given by this simulation study is that the FMP_0
estimates are nearly as accurate as MML estimates and are clearly more accurate
than JML despite the fact that FMP_0 may be regarded as JML truncated after
one iteration (also see subsection 3.3).
4.3. Simulation Study 2
This simulation study represents the type of situation for which the FMP
model is intended (see subsection 3.1). The true IRF, $\tilde{P}_i(\theta)$, is unknown to the
user and is approximated by the FMP in Equation 13.
In this simulation study, the true IRF was chosen to be the cdf of a mixture of
two normal distributions:
\[
\tilde{P}(\theta \mid \pi, \mu_1, \sigma_1, \mu_2, \sigma_2) = \pi \Phi(\theta \mid \mu_1, \sigma_1) + (1 - \pi) \Phi(\theta \mid \mu_2, \sigma_2), \qquad (34)
\]
where $\pi$ is the selection probability and $\Phi(\theta \mid \mu, \sigma)$ represents the cdf of a normal
distribution with mean $\mu$ and variance $\sigma$. Values of these parameters for each
of the $n = 20$ items were generated randomly using the distributions:
$\pi \sim U[0.3, 0.7]$, $\mu_1 \sim N(-1.5, 0.1)$, $\sigma_1 \sim N(1, 0.1)$, $\mu_2 \sim N(1.0, 0.1)$, and $\sigma_2 \sim N(0.4, 0.1)$.
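A mixture-of-normals IRF of the form in Equation 34 can be evaluated with the error function from the Python standard library. In this sketch the second argument of each component is treated as a standard deviation, and the central values used in the comment (including a negative sign for the first component mean) are assumptions made for illustration; the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cdf of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_irf(theta, weight, mu1, sd1, mu2, sd2):
    """Weighted sum of two normal cdfs, as in the form of Equation 34."""
    first = weight * normal_cdf(theta, mu1, sd1)
    second = (1.0 - weight) * normal_cdf(theta, mu2, sd2)
    return first + second


# Illustrative central values (assumed): weight 0.5, means -1.5 and 1.0,
# spreads 1.0 and 0.4.
central = (0.5, -1.5, 1.0, 1.0, 0.4)
```

Since each component cdf is nondecreasing and the weights are nonnegative, the resulting curve is monotone, with a gradual rise from the wide component and a sharper rise from the narrow one.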
The Ramsay TestGraf model, with the default bandwidth $h = 1.1 \times 2000^{-0.2} = 0.24$,
and FMP_$k$ models with $k = 0, \ldots, 4$ were fitted to 100 random samples
with $N = 2{,}000$ and $n = 20$. Thus, the simplest FMP model was
FMP_0 (2PL) with 2 parameters and the most complex FMP_4 with 10
parameters, while the ‘‘true’’ model, treated as unknown in all analyses, had 5
parameters.
Figure 3 shows IRCs for 4 of the items estimated from one of the samples. Values
for $k_{\mathrm{AIC}}$ ($k$ suggested by AIC) are shown in the lower right-hand corner. All $k_{\mathrm{AIC}}$
turned out to be equal to 1, not only in the 4 selected items but also in the remaining
16 items. The true curve, TestGraf curve, and the FMP curves for $k_{\mathrm{AIC}}$ are shown. It
TABLE 1.
Means and Standard Deviations of Accuracy Measures.
Measure               FMP_0           MML             JML
RIMSE_IRC (≥ 0)       0.024 (0.001)   0.014 (0.001)   0.076 (0.001)
RMSE_θ (≥ 0)          0.382 (0.011)   0.379 (0.011)   0.403 (0.011)
Rank Corr(θ) (≤ 1)    0.928 (0.006)   0.928 (0.006)   0.924 (0.006)
Note. FMP = filtered monotonic polynomial; MML = maximum marginal likelihood; JML = joint maximum likelihood; RIMSE = root integrated mean square error; RMSE = root mean square error.
is difficult to compare the TestGraf and FMP curves because they use different
quantities (h and k ) to control smoothness. In this study, however, TestGraf tends
to fit the straight lower part of the true curve better and FMP the sharp curve in the
upper half.
Figure 4 compares the $\mathrm{RMSE}_{\theta}$ fit measures for estimated ability $\hat{\theta}$
between TestGraf and FMP_$k_{\mathrm{AIC}}$. The top figure shows that the $\mathrm{RMSE}_{\theta}$’s
of FMP_$k_{\mathrm{AIC}}$ are better (closer to zero) than those of TestGraf in this
simulation study. In the bottom figure, the rank correlations for FMP_$k_{\mathrm{AIC}}$ are
again better (closer to 1) than those for TestGraf. It should be borne in mind,
however, that the default choice of normalized total test scores for ability
estimates was used in TestGraf. The alternative iterative facility requires a
user intervention at each iteration and therefore is not practical in simulation
experiments. A possible explanation for the poorer results of TestGraf is that
sample size is $N = 2{,}000$ and number of items is $n = 20$, which would lead to
many ties in the total scores used to provide ranks for the normalization
process. These ties are resolved in TestGraf by generating random orderings
within ties (cf. subsection 2.2).
[Figure 3 panels: Item 13, Item 14, Item 15, Item 16. Curves in each panel: True, TestGraf, FMP_k1_AIC. Horizontal axis: ability, θ; vertical axis: probability.]
FIGURE 3. Comparisons of the estimated IRCs ($N = 2{,}000$). IRC = item response curve.
Table 2 summarizes accuracy measures for TestGraf, FMP_$k_{\mathrm{AIC}}$, and
FMP_$k_{\mathrm{BIC}}$. Entries are means and standard deviations (in parentheses) calculated
over items and random samples (i.e., $20 \times 100 = 2{,}000$ observations). The first
row shows that the three estimation methods yielded essentially the same
$\mathrm{RIMSE}_{\mathrm{IRC}}$, so that there was little to choose in overall accuracy of the three
methods for approximating the chosen true IRCs. When abilities, $\theta$, are
estimated, the situation changes. It can be seen from row 2 that the mean $\mathrm{RMSE}_{\theta}$
was essentially the same for FMP_$k_{\mathrm{AIC}}$ and FMP_$k_{\mathrm{BIC}}$, but these were noticeably
better (smaller) than that for TestGraf. Again row 3 shows that the FMP_$k_{\mathrm{AIC}}$ and
FMP_$k_{\mathrm{BIC}}$ yielded essentially the same Rank_Corr($\theta$), which was noticeably
better (larger) than that for TestGraf.
To evaluate how the FMP model performs with a smaller sample size, the
FMP IRF was also fitted to the first 300 of the 2,000 simulated examinees for
each of the 100 simulated samples. Table 3 summarizes the same information
as in Table 2, but with a sample size of $N = 300$ instead of $N = 2{,}000$.
As can be expected, the IRC fit measures, $\mathrm{RIMSE}_{\mathrm{IRC}}$, in Tables 2 and 3
indicate less accuracy of estimates when sample size drops from 2,000 to 300. On the
other hand, interpretation of IRCs is hardly affected by the reduction of sample
size. Figure 5 shows IRCs based on $N = 300$ for the same 4 items plotted in
[Figure 4 panels: top, RMSE for abilities; bottom, rank correlation for abilities. Horizontal axis: AIC selected model; vertical axis: TestGraf.]
FIGURE 4. Comparisons between TestGraf and FMP_$k_{\mathrm{AIC}}$ of accuracy measures of $\hat{\theta}$. FMP = filtered monotonic polynomial.
Figure 3 for $N = 2{,}000$. Comparison of Figures 5 and 3 suggests that conclusions
drawn from IRCs based on $N = 300$ do not differ much from those drawn from
the corresponding IRCs based on $N = 2{,}000$.
Differences in ability fit measures, $\mathrm{RMSE}_{\theta}$ and Rank_Corr($\theta$), between
Tables 2 and 3 seem sufficiently small to be disregarded.
4.4. A Numerical Experiment to Investigate the Assumption of a Normal Distribution for θ
As pointed out by Ramsay (1991, p. 614, equation 6), a change of distribution
for $\theta$ does not affect model fit, provided that it is accompanied by an appropriate
change in the IRF. Thus, the assumption of a normal distribution for $\theta$ is an
identification condition for the data generation process when the IRFs are
unconstrained. The distribution of $\theta$ cannot be estimated unless constraints are
imposed on the functional form of the IRFs (cf. Woods & Thissen, 2006). It is
not possible to simultaneously estimate the density of $\theta$ and the item IRFs.
The FMP methodology proposed here for estimating an IRF specifies an
$N(0, 1)$ distribution of $\theta$ for identification purposes and uses normalized
surrogate abilities (see subsection 3.3). Furthermore, when the EAP procedure
(Equation 26) for obtaining ability estimates, $\hat{\theta}$, is employed, an $N(0, 1)$
distribution is assumed again as the prior distribution for $\theta$.
TABLE 2.
Means and Standard Deviations of RMSEs for TestGraf and FMP.
Measure               TestGraf        FMP_k_AIC       FMP_k_BIC
RIMSE_IRC (≥ 0)       0.041 (0.003)   0.042 (0.003)   0.042 (0.004)
RMSE_θ (≥ 0)          0.707 (0.048)   0.481 (0.012)   0.482 (0.012)
Rank Corr(θ) (≤ 1)    0.769 (0.037)   0.834 (0.012)   0.835 (0.012)
Note. N = 2,000. FMP = filtered monotonic polynomial; RIMSE = root integrated mean square error; RMSE = root mean square error.
TABLE 3.
Means and Standard Deviations of RMSEs for TestGraf and FMP.
Measure               TestGraf        FMP_k_AIC       FMP_k_BIC
RIMSE_IRC (≥ 0)       0.069 (0.008)   0.064 (0.009)   0.075 (0.007)
RMSE_θ (≥ 0)          0.695 (0.076)   0.492 (0.021)   0.492 (0.021)
Rank Corr(θ) (≤ 1)    0.763 (0.050)   0.828 (0.027)   0.829 (0.027)
Note. N = 300. FMP = filtered monotonic polynomial; RIMSE = root integrated mean square error; RMSE = root mean square error.
We shall demonstrate by means of a numerical example that if
i. in an artificially constructed population, the generation distribution used for $\theta$ is
nonnormal (e.g., bimodal) and simultaneously all IRFs are generated as 2PL
ii. the identification condition that $\theta$ is normal is used in the FMP estimation
procedure by normalizing the surrogates, then
iii. the resulting unconstrained estimates of the item IRFs are not 2PL.
This result is stated at the population level but is investigated here using two
finite, but very large ($N = 100{,}000$) data sets, regarded as finite
pseudo-populations. These are used to demonstrate the effect of changing the distribution
chosen for $\theta$ without changing the IRCs for $n = 20$ items. In one data set, referred
to as DS-B, the distribution used for generating $\theta$ is chosen to be symmetric and
strongly bimodal with a mean of 0 and a standard deviation of 1. This bimodal
distribution is generated by the mixture of a $N(-2, 5^{1/2})$ and an independent
$N(2, 5^{1/2})$ with a probability of .5 for selecting each component distribution.
For the other data set, referred to as DS-N, the distribution used for generating $\theta$
[Figure 5 panels: Item 13, Item 14, Item 15, Item 16. Curves in each panel: True, TestGraf, and the AIC-selected FMP curve (legend entries FMP_k1_AIC, FMP_k2_AIC, FMP_k1_AIC, FMP_k1_AIC). Horizontal axis: ability, θ; vertical axis: probability.]
FIGURE 5. Comparisons of the estimated IRCs ($N = 300$). IRC = item response curve.
is $N(0, 1)$. Item scores, $y_s$, for the two data sets are generated as described in
subsection 4.1 using a 2PL (FMP_0) IRF defined by Equations 5 and 6 for each item.
Item parameter values are equal across the two data sets for each of the 20
items. The only difference in the generation process for the two data sets is that
bimodal $\theta$’s are used for DS-B and normal $\theta$’s for DS-N. Superimposed
kernel-smoothed density functions for $\theta$ in the two data sets are shown in Figure 6.
Both data sets are then analyzed in the same way using the FMP method
described in Subsection 3.3. For item parameter estimation in both DS-B and
DS-N, $k_i = 2$ is chosen for all items to yield equally flexible IRCs in the two data
sets. Thus, the $\theta$ distribution identification condition used when generating DS-N
matches the $\theta$ distribution identification condition made in its analysis. However,
the $\theta$ distribution identification condition used when generating DS-B conflicts
with the $\theta$ distribution identification condition made in its analysis. Because the
two data sets employ the same items and are analyzed in exactly the same
manner, any differences in estimated IRFs can be attributed to the conflict of
identification conditions for the distribution of $\theta$ in the analysis of DS-B.
Superimposed IRCs obtained from the two data sets are shown for 4 of the items
in Figure 7. In all four figures, the estimated B-IRC does not coincide with the
estimated N-IRC although the same IRF was used at the generation stage. This is due
to the conflict in identification conditions on the distribution of $\theta$ in DS-B at the
generation stage with those at the estimation stage. There is no such conflict in
DS-N. The distortion of B-IRC in the two figures in the first row occurs with items
of medium difficulty and is hardly noticeable. In the second row, the distortion is
more visible and occurs with items of high and of low difficulty. Thus, the
difference in identification conditions on $\theta$ employed at the generation and estimation
stages can affect different types of items in different ways.
[Figure 6 curves: Bimodal θ, Normal θ. Horizontal axis: ability, θ; vertical axis: density.]
FIGURE 6. Superimposed normal and bimodal densities for θ.
Without knowledge of the generation process, the two B-IRCs in the second
row could easily be misinterpreted as indicating that a 2PL IRF is inappropriate
for the data. There is, however, a way of detecting (without prior knowledge) a
difference between the $\theta$ distribution at the generation stage and the
assumed normal distribution at the estimation stage. This is to obtain EAP
estimates, $\hat{\theta}_s$, of the abilities using Equation 26 and estimate their density using
kernel smoothing. Figure 8 shows kernel smoothed plots of densities for these
ability estimates obtained from data sets N and B. The density of $\hat{\theta}$ from DS-B
is clearly bimodal, although not as noticeably as that in Figure 6, and the density
of $\hat{\theta}$ from DS-N is essentially normal.
In summary, Figures 7 and 8 indicate that a conflict of distribution assump-
tions affects both the estimated IRFs and the distribution of ability estimates.
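The kernel-smoothing check described in this subsection can be sketched as follows. The Gaussian kernel, the fixed bandwidth, the grid, and the mode-counting rule are all assumptions of this illustration, and the function names are hypothetical; the text does not specify which kernel or bandwidth was used for Figures 6 and 8.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    scale = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        total = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in sample)
        return scale * total

    return density


def count_modes(sample, bandwidth, lo=-4.0, hi=4.0, n_points=161):
    """Count strict local maxima of the smoothed density on an equally spaced grid."""
    density = gaussian_kde(sample, bandwidth)
    step = (hi - lo) / (n_points - 1)
    values = [density(lo + i * step) for i in range(n_points)]
    return sum(
        1
        for i in range(1, n_points - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )
```

Applied to a list of EAP estimates, a count of two would point to a bimodal ability distribution; the count depends on the bandwidth, so a visual plot remains the more reliable diagnostic.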
5. An Example Using Actual Data
In the previous section, the FMP approach was shown to be useful by means of
simulation studies. Here, FMP and FUP models will be applied to an actual data
set that is included with the TestGraf distribution (Ramsay, 2000). The FMP and
FUP results will be compared with those from the TestGraf program that does not
impose monotonicity requirements on the estimated IRCs. The data set consists
[Figure 7: four panels, each with curves Bimodal θ and Normal θ; $k = 2$. Horizontal axis: ability, θ; vertical axis: probability.]
FIGURE 7. Examples of estimated IRCs when density of θ is either bimodal or normal. IRC = item response curve.
of 379 students’ responses to an examination with 100 four-option
multiple-choice questions which was given for Psychology 101, an introductory
psychology course.
The data in the original file were recoded dichotomously with a ‘‘1’’ for a
correct response and a ‘‘0’’ otherwise. Missing responses were treated as
incorrect responses. To decide on the degree of the polynomial, models were fitted
sequentially with $k = 0, 1, \ldots, 4$ yielding corresponding polynomials of degree
1, 3, . . . , 9 and the optimal values of $k$ suggested by both the AIC and BIC were
recorded. This was done independently for the FMP and FUP. The default value
$h = 1.1 \times 379^{-0.2} = 0.34$ of the bandwidth was employed for TestGraf.
FMP, FUP, and TestGraf IRFs may all be regarded as different regressions
with the probability of passing an item as dependent variable on the surrogate
ability score, $\tilde{\theta}$, as independent variable. In order to provide a graphical
representation of the relationship between data and the IRCs, reference points were
plotted on the same graph as the IRCs. To obtain these points, a truncated ability
range of $[-3, 3]$ was first divided into 12 intervals of length .5. Corresponding to
each interval, a single reference point $(\theta, p)$ was obtained with $\theta$ equal to the
midpoint of the interval and $p$ equal to the proportion of examinees with surrogate
abilities, $\hat{\theta}_s$, in the interval who correctly answered the item. If any interval was
empty, the corresponding reference point was omitted. These reference points are
valid for the FMP and FUP IRCs for all $k$ because the same surrogate abilities are
used. For convenience, these reference points could also be used for the IRC from
TestGraf that uses different surrogate values.
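The construction of the reference points can be sketched as below. The function name is hypothetical, and the treatment of interval boundaries (a value exactly on an edge goes to the upper interval, except at the top of the range) and of abilities outside the truncated range (they are dropped) are assumptions, since the text does not spell them out.

```python
def reference_points(abilities, item_scores, lo=-3.0, hi=3.0, n_bins=12):
    """Return (midpoint, proportion correct) for each nonempty ability interval."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    correct = [0] * n_bins
    for theta, y in zip(abilities, item_scores):
        if theta < lo or theta > hi:
            continue  # outside the truncated range (assumed to be dropped)
        index = min(int((theta - lo) / width), n_bins - 1)
        counts[index] += 1
        correct[index] += y
    points = []
    for i in range(n_bins):
        if counts[i] > 0:  # empty intervals yield no reference point
            midpoint = lo + (i + 0.5) * width
            points.append((midpoint, correct[i] / counts[i]))
    return points
```

Plotting these pairs as small circles on top of a fitted curve gives the kind of visual check of fit described above.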
[Figure 8 curves: Bimodal θ, Normal θ; $k = 2$. Horizontal axis: EAP $\hat{\theta}$; vertical axis: density.]
FIGURE 8. Superimposed densities of estimates $\hat{\theta}$ from DS-B and DS-N.
IRC plots for some selected non-2PL items from the introductory psychology test are shown in Figure 9. For each item, three IRCs are plotted: (i) TestGraf; (ii) FMP_$k_{\mathrm{AIC}}$, where $k_{\mathrm{AIC}}$ yields the lowest AIC for FMP; and (iii) FUP_$k_{\mathrm{AIC}}$, where $k_{\mathrm{AIC}}$ yields the lowest AIC for FUP. (In the legend, FMP-k1 stands for FMP with $k_{\mathrm{AIC}} = 1$, and so on.) For each item, the reference points are represented by small circles.
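The selection of $k_{\mathrm{AIC}}$ and $k_{\mathrm{BIC}}$ can be illustrated with a minimal Python sketch. It assumes the usual definitions AIC $= -2\log L + 2q$ and BIC $= -2\log L + q \ln N$, and it assumes that a polynomial of degree $2k + 1$ contributes $q = 2k + 2$ free coefficients per item. The log-likelihood values are invented for illustration, and $N = 379$ is taken from the bandwidth expression on the assumption that it is the number of examinees.

```python
import math
from typing import Dict, Tuple


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: -2 log L + 2 q."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 log L + q ln N."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def n_coefficients(k: int) -> int:
    """Assumption: a polynomial of degree 2k + 1 has 2k + 2 coefficients."""
    return 2 * k + 2


def select_k(log_liks: Dict[int, float], n_obs: int) -> Tuple[int, int]:
    """Return (k_AIC, k_BIC), the values of k with the lowest AIC and BIC."""
    k_aic = min(log_liks, key=lambda k: aic(log_liks[k], n_coefficients(k)))
    k_bic = min(log_liks, key=lambda k: bic(log_liks[k], n_coefficients(k), n_obs))
    return k_aic, k_bic


# Hypothetical maximized log-likelihoods for one item at k = 0, ..., 4.
hypothetical_log_liks = {0: -250.0, 1: -245.0, 2: -243.5, 3: -243.4, 4: -243.3}
k_aic, k_bic = select_k(hypothetical_log_liks, n_obs=379)
```

With these invented values the heavier BIC penalty retains fewer parameters than the AIC does, which is the typical direction of disagreement between the two criteria.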
The following observations may be made from Figure 9. The unconstrained FUP_$k_{\mathrm{AIC}}$ and TestGraf curves tend to be similar, but the FUP curves tend to undulate more smoothly and the TestGraf curves to wiggle more. It is difficult to say whether this difference is due to inherent properties of the two fitting methods or to the different criteria, $k$ and $h$, for controlling flexibility in FUP and TestGraf (cf. Items 69 and 96). It is also of interest to inspect the closeness of the monotonic FMP_$k_{\mathrm{AIC}}$ curves to the nonmonotonic FUP_$k_{\mathrm{AIC}}$ and TestGraf curves. Note that Item 96 appears to be a problematic item. The IRCs from both TestGraf and FUP show that the probability of correctly answering this item decreases as ability increases. With the constraint of monotonicity, the FMP IRC comes out as a flat line.
[Figure 9 plot omitted: nine panels, each showing TestGraf, FMP, and FUP curves with probability (0.0 to 1.0) on the vertical axis and ability (−3 to 3) on the horizontal axis. Legend entries by panel: Item 3, FMP-k0 and FUP-k1; Item 5, FMP-k1 and FUP-k1; Item 13, FMP-k1 and FUP-k2; Item 22, FMP-k0 and FUP-k1; Item 24, FMP-k1 and FUP-k0; Item 39, FMP-k0 and FUP-k2; Item 69, FMP-k1 and FUP-k1; Item 74, FMP-k1 and FUP-k1; Item 96, FMP-k0 and FUP-k2.]
FIGURE 9. Estimated IRCs for Psychology 101 data. IRC = item response curve.
Figure 10 provides plots of the ability estimates $\hat{\theta}$ from TestGraf against those from FMP_$k_{\mathrm{AIC}}$ and FMP_$k_{\mathrm{BIC}}$. The two plots are similar: in both cases, estimated abilities from TestGraf are slightly lower than those from FMP at low values and slightly higher at high values. Overall, the TestGraf $\theta$ estimates are close to the FMP $\theta$ estimates.
6. Summary and Conclusions
General filtered polynomial (FMP/FUP) approaches for constructing a flex-
ible IRF have been developed. The model is quasi-parametric because the para-
meters involved are not intended for interpretation. Their main function is to
define a flexible IRF that simultaneously (i) produces graphical displays of
deviations from the usually assumed S-shape and (ii) is easily portable to future
examinees not present in the calibration sample. Although the usual property of
monotonicity of an IRF is imposed in FMP, the monotonicity constraints are dis-
carded in FUP to provide a filtered unconstrained polynomial IRC that need not
be monotonic but is still bounded by 0 and 1.
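The distinction between FMP and FUP can be made concrete with a short Python sketch of a logistic-filtered polynomial, $P(\theta) = 1 / (1 + \exp(-m(\theta)))$, where $m$ is a polynomial of degree $2k + 1$. The function names and coefficient values are hypothetical. The grid check of the derivative is only a numerical diagnostic used here for illustration; the method described in this article imposes monotonicity through a reparameterization, which this sketch does not reproduce.

```python
import math
from typing import Sequence


def poly(coefs: Sequence[float], theta: float) -> float:
    """Evaluate m(theta) = c0 + c1*theta + c2*theta**2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coefs):
        result = result * theta + c
    return result


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def filtered_irf(coefs: Sequence[float], theta: float) -> float:
    """Logistic filter applied to the polynomial; the result lies between 0 and 1."""
    return logistic(poly(coefs, theta))


def is_monotone_on_grid(coefs: Sequence[float], lo: float = -3.0,
                        hi: float = 3.0, steps: int = 600) -> bool:
    """Check m'(theta) >= 0 on an evenly spaced grid (a diagnostic, not a proof)."""
    deriv = [i * c for i, c in enumerate(coefs)][1:]  # coefficients of m'
    return all(
        poly(deriv, lo + (hi - lo) * j / steps) >= 0.0 for j in range(steps + 1)
    )


# k = 0: a linear m(theta), i.e. a 2PL curve, which is monotonic.
two_pl = [0.0, 1.2]
# k = 1: a cubic whose derivative is negative near theta = 0, as FUP would allow.
unconstrained_cubic = [0.0, -0.5, 0.0, 0.3]
```

Both coefficient sets give probabilities strictly between 0 and 1 on the ability range shown, but only the first passes the monotonicity check; the second is the kind of curve FUP permits and FMP excludes.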
The IRCs developed are intended for visual inspection to obtain diagnostic
information about deviant items. This will be helpful for detecting unsatisfactory
items when constructing ability tests. Another potential application will be in the
analysis of psychopathology scales (Meijer & Baneke, 2004) where the usual
assumptions made for ability tests are no longer applicable. Furthermore, the
FUP facility will be useful for providing option response curves for incorrect
options in multioption tests.
FIGURE 10. Comparison of $\hat{\theta}$s from TestGraf and FMP (Psychology 101 data).
Computational procedures have been developed for estimation purposes and a computer program, FMP, has been written in FORTRAN 90 (see Note 2). Monotonicity constraints are imposed by means of a reparameterization. This methodology has been tried out in two simulation studies and on an actual example and found to compare favorably with existing methods. In Simulation Study 1, where the true IRC was a 2PL (or, equivalently, FMP_0), the FMP IRCs were very close to those from the gold standard MML and were clearly superior to those from JML. This is reassuring because the FMP_0 algorithm may be regarded as a first iteration of JML and difficulties with JML are recognized. In Simulation Study 2, where a nonstandard IRF was used for the generating model, the FMP approach yielded as good an approximation to the actual generating IRF as the well-known nonparametric method implemented in the program TestGraf, and clearly more accurate estimates of the abilities, $\theta$s. In the actual example, the current approach compares favorably with TestGraf but has the additional advantages of being able to produce either monotonic increasing or nonmonotonic IRCs as well as easily portable IRFs. Although the current article has concentrated on the use of a logistic filter, the theory presented can easily be adapted to the use of other filters such as those derived from normal, beta, or gamma ogives.
Acknowledgments
The authors are grateful to Michael Edwards, Steven MacEachern, the editor, and the
reviewers for their thought-provoking comments and helpful suggestions.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by NSF grant SES-0437251. It was carried out in partial fulfillment of the requirements for the first author’s PhD degree in quantitative psychology at the Ohio State University with the second author as advisor.
Notes
1. Hayley (1952) suggested multiplication of the logit by $D = 1.702$ to approximate the Normal Ogive.
2. The program, FMP, is being prepared for distribution on the Internet. Please
address all inquiries to the first author.
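The scaling constant in Note 1 can be checked numerically. The following Python sketch compares the standard normal distribution function with the logistic function evaluated at $D\theta$ over a grid; the grid range and spacing are arbitrary choices made for illustration.

```python
import math

D = 1.702  # scaling constant from Note 1


def normal_cdf(x: float) -> float:
    """Standard normal distribution function (the normal ogive)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic(x: float) -> float:
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def max_gap(scale: float) -> float:
    """Largest absolute difference on a grid from -5 to 5 in steps of 0.001."""
    grid = (i / 1000.0 for i in range(-5000, 5001))
    return max(abs(normal_cdf(x) - logistic(scale * x)) for x in grid)


scaled_gap = max_gap(D)      # logistic with the logit multiplied by D
unscaled_gap = max_gap(1.0)  # logistic without rescaling
```

On this grid the scaled logistic stays within 0.01 of the normal ogive, whereas the unscaled logistic differs from it by more than 0.1.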
Supplementary Material
The online appendices are available at http://jeb.sagepub.com/supplemental.
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory (pp. 267–281). Budapest, Hungary: Akademiai Kiado.
Baker, F. B. (1992). Item response theory: Parameter estimation techniques. New York, NY: Marcel Dekker.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 399–402). Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179–197.
Bock, R. D., & Moustaki, I. (2007). Item response theory in a general framework. In C. R.
Rao & S. Sinharay (Eds.), Handbook of statistics, volume 26: Psychometrics (pp.
469–514). Amsterdam, The Netherlands: North-Holland.
Browne, M. W. (2000). Cross-validation methods. Journal of Mathematical Psychology,
44, 108–132.
Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC
and BIC in model selection. Sociological Methods and Research, 33, 261–304.
Cudeck, R., & Henly, S. J. (1991). Model selection in covariance structures analysis
and the ‘‘problem’’ of sample size: A clarification. Psychological Bulletin, 109,
512–519.
De Leeuw, J. (1992). Introduction to Akaike (1973) information theory and an extension
of the maximum likelihood principle. In S. Kotz & N. L. Johnson (Eds.), Break-
throughs in statistics (Vol. 1, pp. 599–609). London, England: Springer-Verlag.
Drasgow, F., Levine, M. V., Williams, B., McLaughlin, M. E., & Candell, G. L. (1989). Modeling incorrect responses to multiple-choice items with multilinear formula score theory. Applied Psychological Measurement, 13, 285–299.
Duncan, K. A., & MacEachern, S. N. (2008). Nonparametric Bayesian modeling for item response. Statistical Modeling, 8, 41–66.
Duncan, K. A., & MacEachern, S. N. (2013). Nonparametric Bayesian modeling for item
response with a three parameter logistic prior mean. In M. C. Edwards & R. C.
MacCallum (Eds.), Current topics in the theory and application of latent variable
methods. New York, NY: Routledge.
Elphinstone, C. D. (1983). A target distribution model for nonparametric density estima-
tion. Communications in Statistics—Theory and Methods, 12, 161–198.
Elphinstone, C. D. (1985). A method of distribution and density estimation (Unpublished
dissertation). University of South Africa, Pretoria, South Africa.
Hayley, D. C. (1952). Estimation of the dosage mortality relationship when the dose is subject to error (Technical Report No. 15). Stanford, CA: Stanford University, Applied Mathematics and Statistics Laboratory.
Hawkins, D. M. (1994). Fitting monotonic polynomials to data. Computational Statistics,
9, 233–247.
Lee, Y.-S. (2007). A comparison of methods for nonparametric estimation of item characteristic curves for binary items. Applied Psychological Measurement, 31, 121–134.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Meijer, R. R., & Baneke, J. J. (2004). Analyzing psychopathology items: A case for nonparametric item response theory modeling. Psychological Methods, 9, 354–368.
Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32.
Ramsay, J. O. (1977). Monotonic weighted power transformations to additivity. Psycho-
metrika, 42, 83–109.
Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611–630.
Ramsay, J. O. (2000). TestGraf: A program for the graphical analysis of multiple choice test and questionnaire data [Computer program and manual]. Retrieved from http://www.psych.mcgill.ca/faculty/ramsay/ramsay.html
Ramsay, J. O., & Winsberg, S. (1991). Maximum marginal likelihood estimation for semiparametric item analysis. Psychometrika, 56, 365–379.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Sinnott, L. T. (1997). Filtered polynomial density approximations and their application to
discriminant analysis (MS Thesis). The Ohio State University, Columbus, OH.
Thissen, D., Chen, W.-H., & Bock, R. D. (2003). Multilog (version 7) [Computer software]. Lincolnwood, IL: Scientific Software International.
Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two cate-
gories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 73–140). Mahwah, NJ:
Lawrence Erlbaum.
Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent
population distribution using spline-based densities. Psychometrika, 71, 281–301.
Authors
LONGJUAN LIANG is a psychometric manager at Educational Testing Service, Rosedale Rd, Princeton, NJ 08822; e-mail: [email protected]. Her research interests